Tag Archives: paper

Synchronization Views for Event-loop Actors

With Joeri we have been working already for a while on a paper to extend the standard actor model with more parallelism. This work is not completed yet, and there are still some theoretical issues with the approach he designed. But we are working on it!

For the moment, you can have a sneak-peak at the poster abstract for PPoPP’12.

Abstract

The actor model has already proven itself as an interesting concurrency model that avoids issues such as deadlocks and race conditions by construction, and thus facilitates concurrent programming. The tradeoff is that it sacrifices expressiveness and efficiency especially with respect to data parallelism. However, many standard solutions to computationally expensive problems employ data parallel algorithms for better performance on parallel systems.

We identified three problems that inhibit the use of data-parallel algorithms within the actor model. Firstly, one of the main properties of the actor model, the fact that no data is shared, is one of the most severe performance bottlenecks. Especially the fact that shared state can not be read truly in parallel. Secondly, the actor model on its own does not provide a mechanism to specify extra synchronization conditions on batches of messages which leads to event-level data-races. And lastly, programmers are forced to write code in a continuation-passing style (CPS) to handle typical request-response situations. However, CPS breaks the sequential flow of the code and is often hard to understand, which increases complexity and lowers maintainability.

We proposes synchronization views to solve these three issues without compromising the semantic properties of the actor model. Thus, the resulting concurrency model maintains deadlock-freedom, avoids low-level race conditions, and keeps the semantics of macro-step execution.

  • Synchronization Views for Event-loop Actors, Joeri De Koster, Stefan Marr, Theo D’Hondt, Proceedings of PPoPP’12, USA, to appear (2012).
  • Paper: PDF
    ©ACM, 2012. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy

Highlights of HPCC 2010

The 12th IEEE Internal Conference on High Performance Computing and Communications was not the first conference I attended. However, it was the first one where I actually presented a paper in the main research track.

As usual, the conference covered a wide variety of different topics. For me the following presentations were the most interesting, since they discuss problems related to my own research.

GPGPUs are a hot topic and HPCC covered several different aspects. Often it was not the main focus of the presented paper, which was inherently interesting for me, instead it was the problems the authors were facing during their experiments. For instance the presentation on Sparse Matrix Formats Evaluation and Optimization on a GPU highlighted the difficulties of programming systems with such complex memory hierarchies as present in today’s GPGPU systems. Similar, OpenCL: Make Ubiquitous Supercomputing Possible gave an introduction on how to develop applications for GPGPU systems.

The presentation on Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments gave interesting insights in how super computing center operate and what challenges they face. One interesting anecdote was that the power saving functionality of modern CPUs actually causes some unexpected trouble in such large-scale systems. The temperature differences caused by powering down CPUs can cause their dies to crack resulting in destroyed CPUs. Furthermore, the talk also included a projection of how future super computers will look like. Interestingly, this projection says that the tradeoff between power efficiency and sophisticated CPU optimizations for single-core performance will be decided in favor for power efficiency. Thus, they expect future super computers to have dramatically higher number of much simpler cores than what is used today. That means, optimizations like branch prediction and out-of-order execution will most likely not be present on CPU cores for exascale systems to be able to build high performance systems that fit into the practical energy envelope, i.e., are coolable.

Related to cluster computing was the presentation on Enabling GPU and Many-Core Systems in Heterogeneous HPC Environments Using Memory Considerations. Here the memory bandwidth limitations were considered to schedule tasks on heterogeneous clusters. The idea is that tasks that saturate the memory system will slow down the overall systems. Thus, better distribution of tasks, taking the available memory bandwidth into account will lead to better execution times.

The Evaluation of the Task Programming Model in the Parallelization of Wavefront Programs provides the arguments for the intuitive assumption that fine-grained parallelism is inefficient. However, the approach here was based on OpenMP and Intel’s TBB library, thus, there is no automatic solution for problems which are expressed naturally in a very fine-grained way. As far as I remember form the TiC Summer School, X10’s compiler is actually capable to coarsen up fine-grained task parallelism, and thus provides some automatic solution for this problem.

However, my very personal highlight of the conference was moment when I unexpectedly received the Best Student Paper Award for my paper on Insertion Tree Phasers.

Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism

The last half year was an interesting departure from my actual PhD research. First, I though the idea of barriers and phasers might be interesting to incorporate into a virtual machine as part of my thesis, but as it turned out, they are much to high-level and are better off implemented in a library. The gain for direct support in a VM is just not proportional to the effort and restrictions which come with that step.

However, the time was not spend just for the sake of the academic exercise. I had a small idea how to improve the existing approaches and after quite some work, which proved that the initial idea was just broken, I had an algorithm that actually worked. Even so that idea is not contributing anything directly to my thesis, I was lucky enough to have a paper about it accepted at the IEEE HPCC’10 conference.

Abstract

This paper presents an algorithm and a data structure for scalable dynamic synchronization in fine-grained parallelism. The algorithm supports the full generality of phasers with dynamic, two-phase, and point-to-point synchronization. It retains the scalability of classical tree barriers, but provides unbounded dynamicity by employing a tailor-made insertion tree data structure.

It is the first completely documented implementation strategy for a scalable phaser synchronization construct. Our evaluation shows that it can be used as a drop-in replacement for classic barriers without harming performance, despite its additional complexity and potential for performance optimizations. Furthermore, our approach overcomes performance and scalability limitations which have been present in other phaser proposals.

  • Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism, Stefan Marr, Stijn Verhaegen, Bruno De Fraine, Theo D’Hondt, Wolfgang De Meuter, Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC), IEEE CS, September (2010) (to appear).
  • Paper: PDF
    ©IEEE, 2010. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
  • BibTex: BibSonomy

Towards an Actor-based Concurrent Machine Model

Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on the concurrency, especially the actor-based VM model.

After one year, I am back looking at that paper, and it still looks like a great model. Think, I will incorporate it into my manycore VM now :)

Abstract

In this position paper we propose to extend an existing delegation-based machine model with concurrency primitives. The original machine model which is built on the concepts of objects, messages, and delegation, provides support for languages enabling multi-dimensional separation of concerns (MDSOC). We propose to extend this model with an actor-based concurrency model, allowing for both true parallelism as well as lightweight concurrency primitives such as coroutines. In order to demonstrate its expressiveness, we informally describe how three high-level languages supporting different concurrency models can be mapped onto our extended machine model. We also provide an outlook on the extended model’s potential to support concurrency-related MDSOC features.

  • Towards an Actor-based Concurrent Machine Model, Hans Schippers, Tom Van Cutsem, Stefan Marr, Michael Haupt, Robert Hirschfeld, Proceedings of the fourth workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS), New York, NY, USA, ACM (2009), p. 4–9.
  • Paper: PDF
    ©ACM, 2009. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ICOOOLPS’09 July 6, 2009, Genova, Italy. http://doi.acm.org/10.1145/1565824.1565825
  • BibTex: BibSonomy