
Modularity and Conventions for Maintainable Concurrent Language Implementations: A Review of Our Experiences and Practices

Modularity: AOSD’12 will be in Potsdam at the end of March, and I am looking forward especially to the MISS’12 workshop.

My understanding of the workshop’s format is that its goal is to encourage the participants to actively interact. Far too often, workshops are just a collection of semi-related presentations, without a common problem and without a common goal. I fear a bit that the MISS workshop will have a similar problem. Being part of the program committee, I have seen all the submissions, and the authors do tend to prefer business as usual over actual position papers. From my perspective, this is really a pity. It is a lost chance to really exchange ideas actively and perhaps start collaborations with interesting people. A technical paper with a few ideas and a work-in-progress prototype does not qualify as a position paper in my opinion. Usually, that kind of work only encourages discussion between people that have been working on similar things already. But let’s see how it turns out.

Our contribution to the workshop is a little experience report on how concurrency and modularity are related to each other in interpreter implementations. And, to make it short: modularity does matter to manage concurrency invariants, but things like AOP are far less important than some people might hope.

Abstract

In this paper, we review what we have learned from implementing languages for parallel and concurrent programming, and investigate the role of modularity. To identify the approaches used to facilitate correctness and maintainability, we ask the following questions: What guides modularization? Are informal approaches used to facilitate correctness? Are concurrency concerns modularized? And, where is language support lacking most?

Our subjects are AmbientTalk, SLIP, and the RoarVM. All three evolved over the years, enabling us to look back at specific experiments to understand the impact of concurrency on modularity.

We conclude from our review that concurrency concerns are one of the strongest drivers for the definition of module boundaries. It helps when languages offer sophisticated modularization constructs. However, with respect to concurrency, other language features like single-assignment are of greater importance. Furthermore, tooling that enables remodularization taking concurrency invariants into account would be of great value.

  • Modularity and Conventions for Maintainable Concurrent Language Implementations: A Review of Our Experiences and Practices, Stefan Marr, Jens Nicolay, Tom Van Cutsem, Theo D’Hondt, Proceedings of the 2nd Workshop on Modularity In Systems Software (MISS’2012), ACM (2012), to appear.
  • Paper: PDF
    ©ACM, 2012. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy

OOPSLA 2011 @SPLASH2011, Day 3

The third day started with Brendan Eich’s keynote on JavaScript’s world domination plan. It was a very technical keynote, not very typical I suppose. And he was rushing through his slides at an enormous speed. Good that I have some JavaScript background. Aside from all the small things he mentioned, interesting for me is that he seemed to be very interested in getting Intel’s RiverTrail approach to data-parallelism into ECMAScript in one form or another. That kind of contradicts the position I had heard so far, that ECMAScript would be done with having WebWorkers as a model for concurrency and parallel programming.

Language Implementation

The first session had two interesting VM talks for me. The first was JIT Compilation Policy for Modern Machines. With the assumption that you cannot keep your multicore/manycore machines busy with application threads, they experimented with how additional compilation threads can be used to optimize code better. I do not remember the details completely, but I think that after 7 compilation threads they reached a mark where it was not worthwhile anymore to add more threads.

The second talk was on Reducing Trace Selection Footprint for Large-scale Java Applications with no Performance Loss. The goal here is to reduce the number of traces in a tracing JIT that need to be kept around and optimized. Might be something interesting for PyPy and LuaJIT2 to consider.

Parallel and Concurrent Programming

The last session I attended was again on parallel and concurrent programming. The first paper, A Simple Abstraction for Complex Concurrent Indexes, was a pretty formal one.

The second paper is a pessimistic approach to implement atomic statements meant for systems programming: Composable, Nestable, Pessimistic Atomic Statements. It is not optimistic like STM, and does not use a global locking order, which would need to be determined statically. Instead they annotate the fields with so-called shelters. Shelters build a hierarchy, which describes the necessary parts to synchronize with.

The third paper was also a very interesting one for me: Delegated Isolation. It is an approach again similar to STM in a sense, but it avoids an unbounded number of transaction retries. For that, the object graph is essentially partitioned into growing subgraphs that can be processed in parallel. Only when a conflict, i.e., a race condition, occurs are subgraphs merged logically and the computation serialized. A neat idea and an interesting use of the object-ownership idea.

The last talk was on AC: Composable Asynchronous IO for Native Languages. It was a presentation of work done in the context of the Barrelfish manycore OS. The goal was to have an easy-to-use programming model that comes close to sequential programming but has similar performance properties to typical asynchronous APIs in operating systems. The result is a model based on async/finish that seems to be relatively nice. As I understand it, it is basically syntactic sugar and some library support to wrap typical asynchronous APIs. But it is a model that is purely focused on such request/response asynchrony, and does not handle concurrency/parallelism.
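To illustrate the async/finish shape, here is a small sketch using Python's asyncio. It is only an analogy: AC itself targets native code on Barrelfish, and the names `fetch` and `lookup_all` as well as the delays are made up for this example. Starting a task plays the role of "async", and awaiting the group of tasks plays the role of "finish".

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical stand-in for an asynchronous I/O request."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def lookup_all():
    # "async": start both requests without waiting for either of them.
    first = asyncio.create_task(fetch("a", 0.02))
    second = asyncio.create_task(fetch("b", 0.01))
    # "finish": do not continue until every request started above completed.
    return await asyncio.gather(first, second)


if __name__ == "__main__":
    print(asyncio.run(lookup_all()))
```

The code reads almost like sequential code, while both requests are in flight at the same time. As in the talk's model, this covers request/response asynchrony only and says nothing about parallel execution.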

And that was basically it. SPLASH is a nice conference when it comes to the content. Not so interesting when it comes to the social “events”, it wasn’t much of an event anyway. Not even the food was notable…

OOPSLA 2011 @SPLASH2011, Day 2

The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA, but he does research in automatic optimization of programs. In his keynote, he showed a number of examples of how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be up to it. Compared to a naïve implementation, hand-optimized C code can achieve a 10x speedup when dependencies are made explicit and the compiler knows that no aliasing can happen. He then discussed how that can be approached in an automated way, and was also thinking about what programming languages could do.

Award Papers

Afterwards, I attended the session with the awarded OOPSLA papers. The Hybrid Partial Evaluation talk presented an approach to avoid the typical cost of using reflection or ‘interpretation’. The presentation of SugarJ: Library-based Syntactic Languages felt like a déjà vu. I did not get where it is different from Helvetica other than that it is for Java. The third paper on Reactive Imperative Programming with Dataflow Constraints was interesting in that it also used memory protection tricks to realize a reactive model in C++. The last presentation, Two for the Price of One: A Model for Parallel and Incremental Computation, was very interesting. As far as I am aware, I have not used incremental computations anywhere other than for course work, but bringing them together with parallel programming in a single programming model gives plenty of opportunities for super-linear speedups.

Parallel and Concurrent Programming

The second session of the day was on parallel and concurrent programming. Kismet: Parallel Speedup Estimates for Sequential Programs tackled the problem of getting an idea of what opportunities for parallelism are available in a given program without having to change the used algorithms and approaches too much. For that, it uses data dependency analysis to characterize the critical path on a data-flow level. Since that usually does not give realistic results because of overestimation of parallelizability, they additionally use a hierarchical model of loops and the knowledge of the available hardware parallelism to better predict possible speedups.
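To make the critical-path idea concrete, here is a minimal Python sketch. It is not Kismet's actual analysis: the task graph, its costs, and the names `critical_path` and `speedup_bound` are invented for illustration. It compares the total work with the longest dependency chain and caps the resulting bound by the number of cores, which is the crudest possible way of taking hardware parallelism into account.

```python
from functools import lru_cache

# Hypothetical task graph: task -> (cost, tasks it depends on).
TASKS = {
    "load": (2, ()),
    "parse": (3, ("load",)),
    "index": (4, ("load",)),
    "merge": (1, ("parse", "index")),
}


def critical_path(tasks):
    """Length of the longest cost-weighted dependency chain."""

    @lru_cache(maxsize=None)
    def finish_time(name):
        cost, deps = tasks[name]
        return cost + max((finish_time(d) for d in deps), default=0)

    return max(finish_time(name) for name in tasks)


def speedup_bound(tasks, cores):
    """Upper bound on speedup: work / critical path, capped by core count."""
    work = sum(cost for cost, _ in tasks.values())
    return min(work / critical_path(tasks), cores)
```

For this graph the total work is 10 and the longest chain (load, index, merge) costs 7, so no number of cores can give more than a 10/7 speedup.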

The second and the third talk were almost identical in terms of problem and goal. Essentially, they provide the necessary infrastructure to run different variants of sequential implementations in parallel and then choose the winner in terms of either runtime or precision. These approaches are especially interesting if the available algorithms have very different properties for different inputs or input sizes. For instance, some mathematical algorithms just do not converge to a solution for certain inputs while they are very fast for others.
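The basic "run all variants, keep the winner" pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not the infrastructure from either talk; `race`, `sum_by_loop`, and `sum_by_formula` are hypothetical names. Note also that threads in CPython do not run CPU-bound code truly in parallel, so this shows the structure rather than the performance benefit.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def sum_by_loop(n):
    """Slow variant: add the numbers one by one."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_by_formula(n):
    """Fast variant: closed-form solution."""
    return n * (n + 1) // 2


def race(variants, arg):
    """Run all variants concurrently; return (name, result) of the first to finish."""
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = {pool.submit(fn, arg): fn.__name__ for fn in variants}
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        winner = next(iter(done))
        return futures[winner], winner.result()
```

Leaving the `with` block still waits for the losing variants to finish. A real system would want to cancel them, and selecting by precision instead of runtime would require waiting for all results and comparing them.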

The last talk of the session discussed Scalable Join Patterns. Join patterns are an old approach to describing synchronization mechanisms flexibly and declaratively. The presented work provided a scalable implementation approach that seems to work quite well. If they used a compilation-based approach for the patterns, I guess it could be a very feasible and flexible replacement for the standard synchronization mechanisms provided as libraries.
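For readers unfamiliar with join patterns, the following toy Python sketch shows the core semantics: a handler fires only once every channel named in its pattern holds a message, and the messages are consumed together. The class `Join` and its methods are invented for this example. It deliberately uses a single global lock, which is exactly the kind of non-scalable implementation the presented work improves on.

```python
import threading
from collections import deque


class Join:
    """Toy join: a handler fires once every listed channel holds a message."""

    def __init__(self):
        self._lock = threading.Lock()  # one global lock: simple, not scalable
        self._queues = {}
        self._patterns = []

    def channel(self, name):
        self._queues[name] = deque()

    def when(self, names, handler):
        self._patterns.append((tuple(names), handler))

    def send(self, name, message):
        with self._lock:
            self._queues[name].append(message)
            match = self._try_match()
        if match is not None:
            handler, args = match
            handler(*args)  # run the handler outside the lock

    def _try_match(self):
        for names, handler in self._patterns:
            if all(self._queues[n] for n in names):
                # Consume one message per channel while holding the lock.
                return handler, [self._queues[n].popleft() for n in names]
        return None
```

A buffer, for instance, can be declared as the pattern "put and get": sending on `put` alone does nothing, and the handler runs as soon as a matching `get` arrives.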

Panel

Instead of attending the third paper session of the day, I attended the panel on Multicore, Manycore, and Cloud Computing: Is a new Programming Language Paradigm required?. Well, it was entertaining :) Nothing really new, no surprising arguments as far as I recall, but certainly interesting to watch. I think, they also recorded it. So it might be floating around the web soon.

OOPSLA 2011 @SPLASH2011, Day 1

The first day of the technical tracks including OOPSLA started with a keynote by Ivan Sutherland titled The Sequential Prison. His main point was that the way we think and the way we build machines and software is based on sequential concepts. The words we use to communicate and express ourselves are often of a very sequential nature. His examples included: call, do, repeat, program, and instruction. Other examples that shape and restrict our way of thinking are for instance basic data structures and concepts like strings (character sequences). However, we also use words that enable thinking about concurrency and parallelism much better. His examples for these included: configure, pipeline, connect, channel, network, and path.

After the talk, David asked him what he would do on the first day of a class on how to program a massively parallel system. His answer was something like: “I would probably retire!”, making the point that it is a hard problem which requires creative solutions.

Catching Concurrency Bugs

After the keynote, the first technical track started with a session on catching concurrency bugs. The first paper presented was Sheriff: Precise Detection and Automatic Mitigation of False Sharing. As far as I understood, it is a pthread replacement, which abuses the memory-managing features and copy-on-write tricks to know about write-write contention on cacheline-level. The tool can attribute that back to the allocation site, which does not seem to be terribly useful if I manage my heap myself :-/

Accentuating the Positive: Atomicity Inference and Enforcement Using Correct Executions was the second paper presented. They use some inference technique to place locks to prevent data-races that are still in the code. The most severe limitation seems to be that it only works for stack variables. However, the idea of making almost correct code more correct looks interesting.

The third paper, SOS: Saving Time in Dynamic Race Detection with Stationary Analysis, focuses on identifying objects that are not changing their state after a certain initialization period. They call such objects stationary, since they cannot participate in races. Would be interesting to see what we could get out of a similar analysis to decide when to promote an object into the RoarVM’s read-mostly heap.

The last presentation in that session was about Testing Atomicity of Composed Concurrent Operations. Here they use commutativity specifications to reduce the search space/testing effort. The goal is to find for instance pairs of operations, which are meant to be executed atomically but are not properly synchronized.

Parallelizing Compilers

The second session started with Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization. They use a dynamic analysis to determine dependencies. The analysis further uses abstraction, ADT semantics to understand the recorded traces. The ADT semantics are used to abstract from the details, and avoid having to track everything precisely. Based on that analysis approach, they developed a tool that can be used to identify undesirable dependencies and iteratively improve a program.

The second paper was Automatic Fine-Grained Locking using Shape Properties. The goal is to go from unsynchronized code to fine-grained synchronized code. For that purpose, they describe a new locking protocol for object graphs called Domination Locking, which is supposed to be more general than earlier approaches.

The third paper Safe Parallel Programming using Dynamic Dependency Hints presented an extension on earlier work that uses hints like ‘possibly parallel region’. The system can execute such regions in parallel and will use an STM-like approach to make it correct in case there are conflicts. The presented work introduced channels to better describe data dependencies.

The last paper titled Sprint: Speculative Prefetching of Remote Data is targeted at distributed systems, but presents an approach that will automatically prefetch data to reduce the impact of latency and reduce overall runtime. Such a technique could be relevant for manycore systems exhibiting similar tradeoffs.

Memory Management

The last session of the day started with the presentation of a nice VM paper: Why Nothing Matters: The Impact of Zeroing. Certainly a worthwhile read for everyone implementing safe languages concerned with initializing objects/tables/structures/… efficiently with NULL.

The second talk Ribbons: a Partially Shared Memory Programming Model presented a programming model in-between threads and processes using memory-protection tricks to restrict the use of shared memory and isolate components. An interesting approach, especially since I have something similar in mind for the RoarVM.

The last talk of the day was about Asynchronous Assertions. Also something that might be interesting for a paranoid VM hacker like me. It is an approach that allows the runtime to offload the assertion checking to other threads. To make that work, it uses snapshot semantics for the memory at the point where the assertion is offloaded.
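The snapshot semantics can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the paper's mechanism: a deep copy plays the role of the runtime-level snapshot, which a real VM would implement far more cheaply, and `async_assert` and `is_sorted` are hypothetical names.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

# A single background thread that evaluates offloaded assertions.
_checker = ThreadPoolExecutor(max_workers=1)


def async_assert(predicate, state):
    """Check predicate against a snapshot of state on another thread."""
    snapshot = copy.deepcopy(state)  # stand-in for a runtime-level snapshot
    return _checker.submit(predicate, snapshot)


def is_sorted(values):
    return all(a <= b for a, b in zip(values, values[1:]))
```

The returned future reports the result for the state as it was when the assertion was issued. If the program appends an out-of-order element right after calling `async_assert(is_sorted, data)`, the offloaded check still passes, because it only sees the snapshot.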

Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?

As preparation for SPLASH’11, here is my paper for the VMIL workshop. It is a position paper discussing in which direction virtual machines should evolve in the future with regard to the challenges that manycore architectures and concurrent programming bring.

As I said, this is a position paper, which hopefully provokes discussion. Feedback of any kind is welcome, and I am happy to adapt my presentation accordingly.

Abstract

While parallel programming for very regular problems has been used in the scientific community by non-computer-scientists successfully for a few decades now, concurrent programming and solving irregular problems remains hard. Furthermore, we shift from few expert system programmers mastering concurrency for a constrained set of problems to mainstream application developers being required to master concurrency for a wide variety of problems.

Consequently, high-level language virtual machine (VM) research faces interesting questions. What are processor design changes that have an impact on the abstractions provided by VMs to provide platform independence? How can application programmers’ diverse needs be facilitated to solve concurrent programming problems?

We argue that VMs will need to be ready for a wide range of different concurrency models that allow solving concurrency problems with appropriate abstractions. Furthermore, they need to abstract from heterogeneous processor architectures, varying performance characteristics, need to account for memory access cost and inter-core communication mechanisms but should only expose the minimal useful set of notions like locality, explicit communication, and adaptable scheduling to maintain their abstracting nature.

Eventually, language designers need to be enabled to guarantee properties like encapsulation, scheduling guarantees, and immutability also when an interaction between different problem-specific concurrency abstractions is required.

  • Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?, Stefan Marr, Mattias De Wael, Michael Haupt, Theo D’Hondt, Proceedings of the 5th Workshop on Virtual Machines and Intermediate Languages, USA, ACM (2011), to appear.
  • Paper: PDF
    ©ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy
