The third day started with Brendan Eich’s keynote on JavaScript’s world domination plan. It was a very technical keynote, not very typical I suppose. And he was rushing through his slides with an enormous speed. Good that I have some JavaScript background. Aside all the small things he mentioned, interesting for me is that he seemed to be very interested to get Intel’s RiverTrail approach to data-parallelism into ECMAScript in one or another form. That is kind of contradicting the position I heard so far, that ECMAScript would be done with having WebWorkers as a model for concurrency and parallel programming.
Language Implementation
The first session had two interesting VM talks for me. The first being JIT Compilation Policy for Modern Machines. With the assumption that you cannot get your multicore/manycore machines busy with application threads, they experimented how additional compilation threads can be used to optimize code better. I do not remember the details completely, but I think, after 7 compilation threads they reached a mark where it was not worthwhile anymore to add more threads.
The second paper is a pessimistic approach to implement atomic statements meant for systems programming: Composable, Nestable, Pessimistic Atomic Statements. It is not optimistic like STM, and does not use a global locking order, which would need to be determined statically. Instead they annotate the fields with so-called shelters. Shelters build a hierarchy, which describes the necessary parts to synchronize with.
The third paper was also a very interesting one for me: Delegated Isolation. It is an approach again similar to STM in a sense, but it avoids unbounded number of transaction retries. For that, the object graph is essentially partitioned in growing subgraphs that can be processed in parallel. Only when a conflict, a race-condition occurred, subgraphs are merged logically, and the computation is serialized. A neat idea and an interesting use of the object-ownership idea.
The last talk was on: AC: Composable Asynchronous IO for Native Languages. It was a presentation of work done in the context of the Barrelfish manycore OS. The goal was to have an easy to use programming model, that comes close to sequential programming, but has similar performance properties as typical asynchronous APIs in operating systems. The result is a model based on async/finish, that seems to be relatively nice. As I understand it, it is basically syntactic sugar and some library support to wrap typical asynchronous APIs. But it is a model that is purely focused on such request/response asynchrony, and does not handle concurrency/parallism.
And that was basically it. SPLASH is a nice conference when it comes to the content. Not so interesting when it comes to the social “events”, it wasn’t much of an event anyway. Not even the food was notable…
The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA, but he does research in automatic optimization of programs. In his keynote, he showed a number of examples how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be up to it. Given a naïve implementation, hand-optimized C code can have 10x speedup when dependencies are made explicit, and the compiler knows that no aliasing can happen. He was then discussing how that can be approached in an automated way, and was also thinking about what programming languages could do.
Award Papers
Afterwards, I attended the session with the awarded OOPSLA papers. The Hybrid Partial Evaluation talk presented an approach to avoid the typical cost of use of reflection or ‘interpretation’. The presentation of SugarJ: Library-based Syntactic Languages felt like a déjà vu. I did not get where it is different from Helvetica other than that it is for Java. The third paper on Reactive Imperative Programming with Dataflow Constraints was interesting in that it used also memory protection tricks to realize a reactive model in C++. The last presentation: Two for the Price of One: A Model for Parallel and Incremental Computation was very interesting. I have not used incremental computations as far as I am aware of anywhere other than for course work, but bringing it together with parallel programming in a single programming model, gives plenty of opportunities for super-linear speedups.
Parallel and Concurrent Programming
The second session of the day was on parallel and concurrent programming. Kismet: Parallel Speedup Estimates for Sequential Programs tackled the problem to get an idea of what opportunities for parallelism are available in a given program without having to change the used algorithms and approaches to much. For that, it uses data dependency analysis to characterize the critical path on a data-flow level. Since that usually does not give realistic results because of overestimation of parallelizability, they use in addition a hierarchical model of loops and the knowledge of the available hardware parallelism to better predict possible speedups.
The second and the third talk where almost identical in terms of problem and goal. Essentially, they provide the necessary infrastructure to run different variants of sequential implementations in parallel and then chose either the winner in terms of runtime or precision. These approaches are especially interesting if the available algorithms have very different properties for different input or input sizes. For instance, some mathematical algorithms just do not converge to a solution for certain inputs while they are very fast for others.
The last talk of the session discussed Scalable Join Patterns. Join patterns are an old approach to describe synchronization mechanism flexibly and declaratively. The presented work provided a scalable implementation approach that seems to work quite well and when they would use a compilation based approach for the patters, I guess it could be a very feasible and flexible replacement for standard synchronization mechanism provided as libraries.
Panel
Instead of attending the third paper session of the day, I attended the panel on Multicore, Manycore, and Cloud Computing: Is a new Programming Language Paradigm required?. Well, it was entertaining Nothing really new, no surprising arguments as far as I recall, but certainly interesting to watch. I think, they also recorded it. So it might be floating around the web soon.
The following introduction and analysis of the Sly3 programming language was written by Pablo Inostroza Valdera as part of his course work for the Multicore Programming course of Tom Van Cutsem. The assignment was to write a blog post about a topic of their own choice, and I repost Pablo’s work here with his permission to spread the word about Sly a bit wider. We made his article also available as part of his work for the Renaissance project itself.
Sly3 is a Smalltalk dialect that features two
concepts in order to deal with parallelism: ensembles, which are
collections of objects that can receive parallel messages, and modifiers,
which decorate messages in order to modify their parallel behavior
(adverbs) or to specify how to handle reductions
(gerunds). In this article we analyze Sly3‘s programming model.
Sly3 is a language being developed in the
context of the Renaissance Project, at
IBM. With Sly3, their authors David Ungar and Sam
Adams explore abstractions for nondeterministic parallel programming. Sly3 is the experimental successor of LY,
a javascript-like interpreted language that was implemented on top of
Smalltalk[UA10]. As the 3 in Sly3 indicates, Sly
itself underwent already three iterations of design. The main idea behind
the design of Sly3 is that multicore programming is
commonly addressed by restricting the nondeterminism parts and trying to
enforce strict determinism. As an example of this, Ungar and Adams refer to
the actors paradigm. While the actor model certainly adds some order to
concurrent computations, the truth is that some nondeterminism still
persists because the global order in which messages arrive to an actor is
nondeterministic. They proposed that instead of trying to restrict the
nondeterminism, it should be embraced. The Ly
language and, later on, the Sly3 language are steps
towards that direction. In this article we discuss the latter.
Ensembles, or How to Represent A Flock of Birds
Sly3‘s key elements were conceived by observing the
nature. Think of a flock of birds, an ant colony or a school of fishes.
These are all examples for a multitude of independent agents, acting in a
nondeterministic way, whose aggregated behavior is more complex than that
of the individuals. In other words, a more complex behavior emerges. The idea of a whole formed from independent units
is key to Sly3, and it has been reflected in the concept of
an ensemble. An ensemble is a collection of
objects, subject of receiving messages as a whole. In the following
example, we create an ensemble and execute an operation on it:
This code yields: %{1. 4. 9. 16. 25}. Recall that the curly
braces represent an array in Smalltalk. In Sly3, curly braces preceded by %
denote an ensemble.
Notice how the ensemble received a message as a whole, but it acted on
each individual in order to produce a new ensemble with all of the elements
squared. It is not surprising to know that the default behavior of this
mapping is parallel, i.e., a different thread of execution is
triggered for each individual computation. The standard execution strategy
in this this simple example is as follows:
The message was delegated to members of the ensemble, creating a
parallel process for the evaluation of each of the messages. We can say
that each member was treated individually and
was executing its own task.
For generating the final result, a new ensemble was created from the
results of all the tasks of the previous step. We can say that we are
ensembling together all the individual
results.
Notice that we have put in bold letters two words: individually and
ensembling. We have done so, because both words characterize a strategy for
handling a parallel computation. Actually, this is the default strategy in
Sly3, but the programming model was designed to be able to
customize these behaviors.
Adverbs and Gerunds, or How to Modulate Parallel Behavior
The basic model of message handling for an ensemble is:
The ensemble that receives the message is treated in accordance to
its adverb. A receiver’s adverb precedes the
message name. In the absence of an explicit one, the default adverb for
receiver, namely individualLY, is used.
We elaborate on the meaning of individualLY in subsequent
paragraphs, important here is to note that adverbs’ names always end in LY.
If there are operands, they have to be treated in accordance with
their adverbs. An operand adverb follows the
message name. In the absence of an explicit one, the default adverb for
operands, namely wholLY, is used.
The operation is executed according to an overall strategy, also
determined by an adverb.
The result of the operation is reduced in accordance to a
gerund. In the absence of an explicit one,
the default gerund, namely ensemblING, is used. Gerunds’ names
always end in ING.
Regarding the point 3, it suffices to mention that currently there are
only three overall strategies, namely, serially, plainly and the
default one, which we can call parallelly.
Note that the metaphor of the adverb is taken to an extreme in
Sly3. Many words that do not sound like adverbs are
adverbialized in order to fit in the
conceptual framework.
Serially forces serial executions, so no processes/threads
are spawned to compute the results. This can be useful either for semantic
reasons or when the cost of parallel execution is to high, i.e., when we
are below the sequential threshold. The adverb plainly can be used to turn off ensemble behavior, and
therefore, to treat the ensemble as a collection (e.g., we could ask for
the size of an ensemble by using plainly, and this would not imply sending
a message to each of its members). For the sake of brevity, we are not
going to discuss the overall strategies further, since they are not
relevant for grasping the essence of the Sly3 programming
model.
Adverb
Informal description
wholLY
just pass me along as a whole ensemble
individualLY
create a process/thread for each of my members
collectionLY
treat me as a collection instead of an ensemble
duplicativeLY
copy each of my members, to produce a new ensemble
roundLY
create a process for each combination of the receiver’s members and my members
randomLY
chose n of my members randomly, n is a parameter of this
adverb
valueLY
take a block and apply it to each of my members
Table 1: Some of the principal adverbs in Sly3.
The syntax of Sly3 is cumbersome, so next we present some
examples in order to provide a more clear idea of how to express things in
this language. Let’s analyze them in the light of three of the steps
presented before (1, 2 and 4; 3 is excluded since are not going to discuss
the overall strategy). In order to guide our
analysis, figure 1 presents a table with brief
descriptions of some of the main adverbs in Sly3, and their
impact on the message evaluation process. We have not included a similar
table for gerunds, because their names are in general more self
descriptive.
Example 1
%{true. true. false. true} andING
In this example, there are no adverbs specified, therefore, the
receiver’s adverb is the default one, i.e.,
individually. This implies that the
processing of the members is done independently (i.e. in different
Smalltalk processes).
There are no arguments, therefore, no argument’s adverb.
There is a gerund andING that acts on the result of the
operation reducing the independent results coming from different
processes. As its name indicates, andING performs logical
and over the members of the resulting
ensemble. Notice that there no actual message specified here, thus,
andING is applied to unchanged set of members of the
ensemble.
Therefore, the result of this operation is: false.
Example 2
%{1. 2. 3} plusRoundLY: %{10. 20. 30}
Here are no adverbs specified for the receiver, therefore, the
receiver’s adverb is the default one, i.e.,
individually. This implies that the
processing of the members is done independently (i.e. in different
processes/threads).
roundLY is an argument to the operand (%10. 20. 30
). This adverb says: create a process/thread for each combination of the
receiver’s member and the argument’s member; acting as a cartesian
product.
There is no gerund, which means that the default one
(ensemblING) is used.
Therefore, the result of this operation is: %{11. 12. 13. 21. 22.
23. 31. 32. 33}.
Example 3
%{1. 2. 3} valueLY: [ | x * 10]
valueLY is an argument of the receiver, therefore, the
receiver has to be mapped to a new ensemble using the given block as the
mapper.
There are no arguments, therefore, no argument’s adverb. Recall that
the block is an argument to the adverb. As we can see, the problems of
Sly3′s syntax become evident in these cases.
Therefore, the result of this operation is: %{10. 20.
30}.
An Overview of Sly3′s Implementation
In order to have a comprehensive view of the spectrum of the current
custom behaviors available for direct use, in
figure 1 we show the hierarchy of gerunds and
adverbs that we can find in the current image of Sly3.
Notice that this forms a restricted vocabulary that we can use as a basis
of more complex strategies, in a LEGO-like way. In fact, we can see this
hierarchy as a custom domain-specific language suited for the description
of common strategies for handling parallelism.
Figure 1: The hierarchy of modifiers of Sly3.
Based on what we have discussed so far, we can characterize
Sly3 as:
An Object-oriented language, descendant of Smalltalk, which
incorporates ensembles as first-class entities to model parallelism.
It features parallelism by default. As soon as a message is sent to
an ensemble, it is assumed that the processing will be spawned as
independent tasks over the members of this ensemble.
Implicit parallelism. The processes that handle messages for
ensembles are spawned under the hood, and
programmers interact without having to switch to a parallel mindset.
The way the receiver and the operands of a message are treated is
specified by a set of adverbs.
The way the result is reduced is specified by a set of gerunds.
The messages are not reified as for instance in the case of
Communicating Sequential Processes or
actors.
Communication is assumed to be synchronous.
The processes are not reified as in the case of actor-based
computing.
A Note on the Evaluation Process
The Sly3 language is implemented as an extension of a
Smalltalk Virtual Machine, the RoarVM. The current strategy is that
whenever a message to an ensemble is detected, a custom message dispatcher
is launched in order to handle it. Thus, the message is parsed and
evaluated at runtime. The message is
analyzed by partitioning the receiver according to its adverb, and pairing
it with the arguments according to arguments’ adverbs. Then, the actual
operation is executed in whichever set of operands resulted from the
previous processing. The final result is reduced according to the gerund.
When we use the expression according to, it is
worth to highlight that the actual behavior is encoded in the classes of
the gerunds an adverbs.
Below we give as example the implementation of the totallING
gerund. In the image it is the only method of the class
Sly3ModTotalling. Notice, that it nothing more but the
implementation of a simple summing policy.
; method of the class Sly3ModTotalling
reduceForEvaluator: ev
"Sum the elements in results and return the result.
Assumes these elements understand +."
| result results |
results := ev result.
results isEmpty ifTrue:
[ self error: 'No members, how should we handle? proceed for nil'.
^nil ].
result := 0.
results do: [ :each | result := each + result ].
ev result: result
Below we give the code for the two relevant method of the
Sly3ModCollectionly class that implements the
collectionLY adverb in Sly3. Although it is necessary to deeply
understand the behavior of the process of evaluating a message in order to
understand this code, at least we can have an intuition that what is
happening at some point is that a collection is created and returned (see
line 17). Eventually, this is how the creation
of a collection from an ensemble (which is what collectionLY does)
is implemented.
Therefore, each modifier in Sly3 is self descriptive and
fully determines the impact that their presence provokes on the evaluation
process.
Implementing MapReduce in Sly3
In order to show an example of how to use Sly3, we have
implemented a naive MapReduce framework. Take a look at the
course notes
for an Erlang implementation of the
same examples.
It is worth to stress that the conciseness of
the code that we eventually obtain is a consequence of both using
Sly3 and the application of an Object-oriented approach. The
framework relies on ensembles that receive messages. By doing that, the
messages are implicitly sent in parallel, so we do not have to specify this
in the code. We next discuss the core excerpts of the code.
Imagine two well-known applications of MapReduce. First, we want to
count the number of occurrences of words in a set of documents. Another
application is to know in which documents some words are found. In order to
do that, the MapReduce paradigm requires us to provide the problem-specific
knowledge by means of a map procedure and a reduce procedure.
Since Sly3 being an object-oriented language, we have
modeled the MapReduce problem-independent behavior as a class
MapReduce. Any particular implementation has to subclass this
class and override two abstract methods, namely, mapForKey:value:
and reduceForKey:values:.
Facing the problem of counting words in a document, below are the
methods of the class CountWordsMapReduce that implement this
behavior in Sly3.
Notice that for sakes of simplicity, we have modeled a filesystem as
a Dictionary whose keys are strings that represent file names
and whose values are ensembles corresponding to the set of words in a
file.
"Specific methods of the class CountWordsMapReduce"
mapForKey: ignore value: fileName
| words |
words := self consult: fileName.
^ words valueLY: [:y| {y. 1}]
reduceForKey: word values: counts
^ {word. counts totallING}
consult: file
^ files at: file
Facing the problem of indexing the texts using the words that they
contain, below are the methods of the class TextIndexingMapReduce
that implement this behavior in Sly3.
"Specific methods of the class TextIndexingMapReduce"
mapForKey: ignore value: fileName
|words |
words := self consult: fileName.
^ words valueLY: [:word| {word. fileName}]
reduceForKey: word values: names
^ {word. names members asSet}
consult: file
^ files at: file
Notice that in both cases, it suffices to use combinations of the
existing adverbs to achieve the desired functionality (see lines 5 and 8 of the
first listing and line 5 in the second
one). Moreover let’s now analyze the class MapReduce, that
implements all the machinery of our naive
MapReduce implementation.
As it is possible to see in lines 9, 13 or 23 of the above code,
the framework itself also profits from the fact that the data structures
that are being passed are ensembles. By having ensembles as the main
element being passed around, it is possible to have implicit parallel
behavior, which is the key of the MapReduce approach. Evidently, in this
case we are discussing a very limited version of MapReduce meant to be used
on a single computer, but it is still useful to show how the implicit
parallelism in Sly3 works.
Conclusions
In the spectrum of language abstractions for dealing with parallelism
and concurrency, Sly3 is a language that has opted for a
conservative style, in the sense that it is a specific element of the
language (an ensemble) the one that is subject of parallelism. On the other
hand, the stress has been put on giving a set of tools to programmers,
enabling them to combine these tools. For customized strategies, it is the
impression of the author of this article that the level of granularity is
too coarse-grained. Although it is conceivable to extend the set of adverbs
and gerunds by extending the hierarchy aforementioned, this implies to
understand the Sly3 implementation in very detail, something
that we most likely do not expect from application programmers. It is
possible to think of a meta-programming framework that could inject custom functionalities in the language, lowering the
complexity for extending the set of primitives.
Regarding the approach to parallelism, it is implicit and synchronous.
Programmers are not aware of how computations are spawned or scheduled, and
the only means they have to initiate parallel computations is by using
ensembles.
As regards to the syntax, Sly3 uses decorated keyword
based messages to model the way the message is handled by the virtual
machine. Although it is conceivable that this schema can be improved, there
are some problems originated from the fact that adverbs can decorate
receivers, arguments, and the overall strategy. If we add to this
complexity the gerunds, it is clear that it is difficult to express these
multidimensional facets using a textual syntax.
To sum up, Sly3 is a language that incorporates in its
design a series of patterns to deal with parallelism, but the mechanism for
extending the set of parallel primitives could be reconsidered, in order to
reduce the complexity of extending Sly3. However, the
MapReduce example has shown that it is possible to concisely express
complex patterns of interactions in Sly3, as soon as these
patterns fall within the give set of Sly3‘s primitives.
Last Friday was the annual Lab event of our Software Languages Lab. Like last year, many people related to the lab in one or the other way came to get an overview of what the current topics of our research are.
This year, we presented our research in the form of a Pecha Kucha talk. That means every presenter got 20 slides to present and each of the slides was shown exactly 20 seconds. That gives enough time to convey the general idea, but avoids boring the people with endless technical details.
All in all, that worked out pretty well.
My talk gave an overview of what the Parallel Programming Group is up to, on a very high level. It motivates why we are doing research in languages and language runtimes/virtual machines, and names our approaches to tackle the challenges. Well, for researchers in the field that is probably to vague, but everyone else might get just enough out of it to see in which direction we are going.
I already posted the presentation I gave at the summer school earlier. In the following posts, I will report a bit about the lectures of the summer school, similar to my posts about the TPLI summer school of last year.
Like last year, my thanks go to FWO and the summer school, for providing me with grants to cover all the costs. Thanks
The first two days covered four different topics. Adam Welc started with an introduction to Software Transactional Memory, Jean-Pierre Talpin gave lectures on the topic Virtual Prototyping Embedded Architectures, Satnam Singh reported on his experience with Data-Parallel Programming, and last but not least, Peter Sewell introduced us to the mystics of hardware memory models with his talk titled Low-Level Concurrency – For Real.
Adam Welc: Software Transactional Memory
Adam, who works for Intel Labs, talked in his first presentation mainly about the semantics of STM and the different approaches for designing a STM system. The second presentation concentrated on how the implementation of such a system could work.
The different options for semantics of an STM included open or closed transaction nesting, transaction flattening as well as abort and retry. Furthermore, he discussed the interaction for instance with legacy code, which leads to the question of weak and strong atomicity properties, or the semantics of single global lock atomicity. The last semantic question he discussed was how consistency should be provided, and how pointer-privatization could be supported by the system. Afterwards, he outlined the design space for STM systems, introducing optimistic vs. pessimistic transactions as well as write buffering vs. in-place updates.
In the second talk, he presented the implementation details.
For their C++-based STM the compiler needs additional information about the characteristics of all code, which is done by annotations on class and method level. Similar, the compiler supports atomic blocks, abort, and retry statements. Beside all the implementation details, he also discussed several optimizations the compiler attempts to reduce the STM overhead with regard to read/write-barriers wherever possible. My gut feeling tells me, someone should write a STM+TracingJIT story. Not sure whether that buys anything, but what I have seen with STM looks a lot like the trace-guards.
The last slides he presented were about performance. The basic conclusion still is: performance is an issue. However, he gave me an interesting pointer to one of his papers, which basically states, that transaction can be used for looks without contention, quite straight forwardly: Transparently reconciling transactions with locking for Java synchronization.
Unfortunately, I missed the introductorily part of these lectures due to problems with our accommodation *sigh*. However, as far as I understood, in the domain of embedded systems for large scale applications like aircrafts, there is always a need to combine a large number of different technologies, and the final goal is to simulate, analyze, and verify the software before it is deployed into production use. Especially the verification part is important for critical systems like in avionics.
The systems in this domain are mostly event driven, i.e., in his terminology it is an asynchronous composition of reactive processes.
For these purpose, they developed Polychrony including an Eclipse integration. It allows using data-flow models for computation and mode automata for control. These models are verified with model-checking techniques and allow controller synthesis.
At the heart of this system is a data-flow language, which allows expressing such an asynchronous event system in terms of synchronous modules, which have better characteristics with respect to verification. Furthermore, they established the means to prove certain correctness criteria.
For an aerospace project, they used this as a basis and code-generation framework. One of their techniques uses a SSA as an intermediate representation to model existing software. They use SSA to translate the input to their synchronous data-flow formalism enabling their analysis’.
Satnam Singh: Data-Parallel Programming
The motivation behind his work is to allow users to utilize parallelism in a civil manner, i.e., with a reasonable effort/gain ratio. His premise is that programming models like SIMD, OpenMP, and MPI are inherently low-level and require very detailed knowledge about the target architecture, which restricts their applicability. Not only increases it the development cost, but it also restricts the application to a single platform. With the Microsoft Accelerator, they develop a high-level data-parallel library to target various platforms, starting with standard multi-core CPUs with SIMD instruction extensions, GPGPUs, and FPGAs.
The user describes the computation by defining a data-flow graph, which describes the intention instead of the detailed mapping to an execution. Most notably, they provide parallel array structures, on which operations can be performed without necessarily using an index, but deferring these details to the runtime, which can optimize the relevant array operations for the target architecture. Furthermore, it provides certain standard mechanisms to work in a data-parallel way. This includes various operations on the parallel arrays, which for instance can be combined with appropriate reduce operations. An important operation he emphasized is the array shift operation, which is extensively used to describe necessary data transformations. Instead of doing these kind of transformations explicitly, the runtime system can highly-optimize them to the target platform.
The resulting program description is just-in-time compiled for the target machine, and only then executed. For the typical problems they are approaching, the construction of the program describing graph and the JIT overhead are negligible.
However, this form of nested data parallelism brings also some problems. Especially the distribution of the problem over the computing platform is hard since the shape of the resulting computational graph depends on the input data.
Peter Sewell: Low-Level Concurrency – For Real
The basic conclusion of his talk is: it is impossible to write portable, correct, and efficient concurrent code for any existing hardware platform, because they do not actually tell you all the constrains/properties in their specifications.
So he basically starts to rant about all processor manufacturers and their specs. He starts with a seemingly simple example, and demonstrates that the expected outcome is not guaranteed.
Then he goes on and describes various examples for x86 and relates them to the tricks a CPU applies, which result in rather weak memory models.
Based on that, he presents a model they developed called x86-TSO (Total Store Order), which allows to reason about the correctness of concurrent x86 programs. This model is eventually used to introduce the notion of triangular races. This allows to reason about a class of programs, which can contain data races, for instance lock implementations.
From here, he goes on to rant about all the other existing hardware architectures and their semantics or lack of specification. They tried for years to come up with a formal model like x86-TSO but failed.
To conclude his lectures, he presents 10 different options how to define a sane memory model for instance for a programming language. He starts out with option 1: DON’T i.e. no concurrency. Well, my personal impression still is, that this is his favorite option. It has several advantages. Especially, it is very simple. In the end he comes to option 10, which means to have a memory model similar to C++0x. It is a data-race free model, which provides low-level concurrency primitives, and allows to break abstractions. Beside the notion of the data-race free part, that is something, I have expected/thought about earlier for a good VM.
I started as a PhD student in October 2008 at the Software Languages Lab of the Vrije Universiteit Brussel. My field of research are virtual machines especially with respect to concurrency issues and language constructs for modularization. Recently, I started working on concurrency support at the VM instruction set level.