Jun 7 2010

Trends in Concurrency 2010, part 1

I already posted the presentation I gave at the summer school earlier. In the following posts, I will report a bit about the lectures of the summer school, similar to my posts about the TPLI summer school of last year.

Like last year, my thanks go to FWO and the summer school, for providing me with grants to cover all the costs. Thanks :)

The first two days covered four different topics. Adam Welc started with an introduction to Software Transactional Memory, Jean-Pierre Talpin gave lectures on the topic Virtual Prototyping Embedded Architectures, Satnam Singh reported on his experience with Data-Parallel Programming, and last but not least, Peter Sewell introduced us to the mystics of hardware memory models with his talk titled Low-Level Concurrency – For Real.

Adam Welc: Software Transactional Memory

Adam, who works for Intel Labs, talked in his first presentation mainly about the semantics of STM and the different approaches for designing a STM system. The second presentation concentrated on how the implementation of such a system could work.

His presentations were very much in the spirit of the STM system Intel provides as a prototype, but the general ideas are used in other systems as well.

The different options for semantics of an STM included open or closed transaction nesting, transaction flattening as well as abort and retry. Furthermore, he discussed the interaction for instance with legacy code, which leads to the question of weak and strong atomicity properties, or the semantics of single global lock atomicity. The last semantic question he discussed was how consistency should be provided, and how pointer-privatization could be supported by the system. Afterwards, he outlined the design space for STM systems, introducing optimistic vs. pessimistic transactions as well as write buffering vs. in-place updates.

In the second talk, he presented the implementation details.

For their C++-based STM the compiler needs additional information about the characteristics of all code, which is done by annotations on class and method level. Similar, the compiler supports atomic blocks, abort, and retry statements. Beside all the implementation details, he also discussed several optimizations the compiler attempts to reduce the STM overhead with regard to read/write-barriers wherever possible. My gut feeling tells me, someone should write a STM+TracingJIT story. Not sure whether that buys anything, but what I have seen with STM looks a lot like the trace-guards.

The last slides he presented were about performance. The basic conclusion still is: performance is an issue. However, he gave me an interesting pointer to one of his papers, which basically states, that transaction can be used for looks without contention, quite straight forwardly: Transparently reconciling transactions with locking for Java synchronization.

Jean-Pierre Talpin: Virtual Prototyping Embedded Architectures

Unfortunately, I missed the introductorily part of these lectures due to problems with our accommodation *sigh*. However, as far as I understood, in the domain of embedded systems for large scale applications like aircrafts, there is always a need to combine a large number of different technologies, and the final goal is to simulate, analyze, and verify the software before it is deployed into production use. Especially the verification part is important for critical systems like in avionics.

The systems in this domain are mostly event driven, i.e., in his terminology it is an asynchronous composition of reactive processes.

For these purpose, they developed Polychrony including an Eclipse integration. It allows using data-flow models for computation and mode automata for control. These models are verified with model-checking techniques and allow controller synthesis.

At the heart of this system is a data-flow language, which allows expressing such an asynchronous event system in terms of synchronous modules, which have better characteristics with respect to verification. Furthermore, they established the means to prove certain correctness criteria.

For an aerospace project, they used this as a basis and code-generation framework. One of their techniques uses a SSA as an intermediate representation to model existing software. They use SSA to translate the input to their synchronous data-flow formalism enabling their analysis’.

Satnam Singh: Data-Parallel Programming

The motivation behind his work is to allow users to utilize parallelism in a civil manner, i.e., with a reasonable effort/gain ratio. His premise is that programming models like SIMD, OpenMP, and MPI are inherently low-level and require very detailed knowledge about the target architecture, which restricts their applicability. Not only increases it the development cost, but it also restricts the application to a single platform. With the Microsoft Accelerator, they develop a high-level data-parallel library to target various platforms, starting with standard multi-core CPUs with SIMD instruction extensions, GPGPUs, and FPGAs.

The user describes the computation by defining a data-flow graph, which describes the intention instead of the detailed mapping to an execution. Most notably, they provide parallel array structures, on which operations can be performed without necessarily using an index, but deferring these details to the runtime, which can optimize the relevant array operations for the target architecture. Furthermore, it provides certain standard mechanisms to work in a data-parallel way. This includes various operations on the parallel arrays, which for instance can be combined with appropriate reduce operations. An important operation he emphasized is the array shift operation, which is extensively used to describe necessary data transformations. Instead of doing these kind of transformations explicitly, the runtime system can highly-optimize them to the target platform.

The resulting program description is just-in-time compiled for the target machine, and only then executed. For the typical problems they are approaching, the construction of the program describing graph and the JIT overhead are negligible.

However, this form of nested data parallelism brings also some problems. Especially the distribution of the problem over the computing platform is hard since the shape of the resulting computational graph depends on the input data.

Peter Sewell: Low-Level Concurrency – For Real

The basic conclusion of his talk is: it is impossible to write portable, correct, and efficient concurrent code for any existing hardware platform, because they do not actually tell you all the constrains/properties in their specifications.

So he basically starts to rant about all processor manufacturers and their specs. He starts with a seemingly simple example, and demonstrates that the expected outcome is not guaranteed.

Then he goes on and describes various examples for x86 and relates them to the tricks a CPU applies, which result in rather weak memory models.

Based on that, he presents a model they developed called x86-TSO (Total Store Order), which allows to reason about the correctness of concurrent x86 programs. This model is eventually used to introduce the notion of triangular races. This allows to reason about a class of programs, which can contain data races, for instance lock implementations.

From here, he goes on to rant about all the other existing hardware architectures and their semantics or lack of specification. They tried for years to come up with a formal model like x86-TSO but failed.

To conclude his lectures, he presents 10 different options how to define a sane memory model for instance for a programming language. He starts out with option 1: DON’T i.e. no concurrency. Well, my personal impression still is, that this is his favorite option. It has several advantages. Especially, it is very simple. In the end he comes to option 10, which means to have a memory model similar to C++0x. It is a data-race free model, which provides low-level concurrency primitives, and allows to break abstractions. Beside the notion of the data-race free part, that is something, I have expected/thought about earlier for a good VM.


May 25 2010

Locality and Encapsulation: My Student’s Presentation at the TiC Summer School

On the first day here at the summer school, the organizers gave us the opportunity to present our research ideas to the lecturers and other participants.

I presented my idea on extending the virtual machine models with explicit support for concurrency models. My brief presentation focused on using multiple-language virtual machines to provide support for domain-specific concurrent languages. It basically states the question whether it is possible to use the concepts of locality and encapsulation as a common foundation for a wide range of different models. Since I am at the very beginning of working on this specific idea, there aren’t any answers yet.


Feb 21 2010

Towards an Actor-based Concurrent Machine Model

Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on the concurrency, especially the actor-based VM model.

After one year, I am back looking at that paper, and it still looks like a great model. Think, I will incorporate it into my manycore VM now :)

Abstract

In this position paper we propose to extend an existing delegation-based machine model with concurrency primitives. The original machine model which is built on the concepts of objects, messages, and delegation, provides support for languages enabling multi-dimensional separation of concerns (MDSOC). We propose to extend this model with an actor-based concurrency model, allowing for both true parallelism as well as lightweight concurrency primitives such as coroutines. In order to demonstrate its expressiveness, we informally describe how three high-level languages supporting different concurrency models can be mapped onto our extended machine model. We also provide an outlook on the extended model’s potential to support concurrency-related MDSOC features.

  • Towards an Actor-based Concurrent Machine Model, Hans Schippers, Tom Van Cutsem, Stefan Marr, Michael Haupt, Robert Hirschfeld, Proceedings of the fourth workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS), New York, NY, USA, ACM (2009), p. 4–9.
  • Paper: PDF
    ©ACM, 2009. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ICOOOLPS’09 July 6, 2009, Genova, Italy. http://doi.acm.org/10.1145/1565824.1565825
  • BibTex: BibSonomy

Feb 14 2010

Intermediate Language Design of High-level Language Virtual Machines: Towards Comprehensive Concurrency Support

My second workshop paper got published at the ACM Digital Library. This is actually only an abstract, but nonetheless, it might be interesting for people looking into the design of virtual machines and especially bytecodes/intermediate languages.

Abstract

Today’s major high-level language virtual machines (VMs) are becoming successful in being multi-language execution platforms, hosting a wide range of languages. With the transition from few-core to many-core processors, we argue that VMs will also have to abstract from concrete concurrency models at the hardware level, to be able to support a wide range of abstract concurrency models on a language level. To overcome the lack of sufficient abstractions for concurrency concepts in VMs, we proposed earlier to extend VM intermediate languages by special concurrency constructs[PLACES09].

As a first step towards this goal, we try to fill a gap in the current literature and survey the intermediate language design of VMs. Our goal is to identify currently used techniques and principles as well as to gain an overview over the available concurrency related features in intermediate languages.

Another aspect of interest is the influence of the particular target language, for which the VM is originally intended, on the intermediate language.

  • Paper: PDF
    ©ACM, 2009. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in VMIL’09, October 25, 2009. http://doi.acm.org/10.1145/1711506.1711509
  • BibTex: BibSonomy

Slides of the Talk at VMIL09/OOPSLA


Feb 7 2010

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Finally, my first workshop paper got published, which was a little odyssey with some misunderstandings, but anyway, now it is out. It is just a position paper, thus, do not expect to many insights. However, what it describes is my big plan, and hopefully the story of my PhD. Am working on it…

Abstract

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today’s high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets.

Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets.

As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

Slides of the Talk at PLACES09