OOSPLA 2011 @SPLASH2011, Day 2

The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA, but he does research in automatic optimization of programs. In his keynote, he showed a number of examples how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be up to it. Given a naïve implementation, hand-optimized C code can have 10x speedup when dependencies are made explicit, and the compiler knows that no aliasing can happen. He was then discussing how that can be approached in an automated way, and was also thinking about what programming languages could do.

Award Papers

Afterwards, I attended the session with the awarded OOPSLA papers. The Hybrid Partial Evaluation talk presented an approach to avoid the typical cost of use of reflection or ‘interpretation’. The presentation of SugarJ: Library-based Syntactic Languages felt like a déjà vu. I did not get where it is different from Helvetica other than that it is for Java. The third paper on Reactive Imperative Programming with Dataflow Constraints was interesting in that it used also memory protection tricks to realize a reactive model in C++. The last presentation: Two for the Price of One: A Model for Parallel and Incremental Computation was very interesting. I have not used incremental computations as far as I am aware of anywhere other than for course work, but bringing it together with parallel programming in a single programming model, gives plenty of opportunities for super-linear speedups.

Parallel and Concurrent Programming

The second session of the day was on parallel and concurrent programming. Kismet: Parallel Speedup Estimates for Sequential Programs tackled the problem to get an idea of what opportunities for parallelism are available in a given program without having to change the used algorithms and approaches to much. For that, it uses data dependency analysis to characterize the critical path on a data-flow level. Since that usually does not give realistic results because of overestimation of parallelizability, they use in addition a hierarchical model of loops and the knowledge of the available hardware parallelism to better predict possible speedups.

The second and the third talk where almost identical in terms of problem and goal. Essentially, they provide the necessary infrastructure to run different variants of sequential implementations in parallel and then chose either the winner in terms of runtime or precision. These approaches are especially interesting if the available algorithms have very different properties for different input or input sizes. For instance, some mathematical algorithms just do not converge to a solution for certain inputs while they are very fast for others.

The last talk of the session discussed Scalable Join Patterns. Join patterns are an old approach to describe synchronization mechanism flexibly and declaratively. The presented work provided a scalable implementation approach that seems to work quite well and when they would use a compilation based approach for the patters, I guess it could be a very feasible and flexible replacement for standard synchronization mechanism provided as libraries.

Panel

Instead of attending the third paper session of the day, I attended the panel on Multicore, Manycore, and Cloud Computing: Is a new Programming Language Paradigm required?. Well, it was entertaining :) Nothing really new, no surprising arguments as far as I recall, but certainly interesting to watch. I think, they also recorded it. So it might be floating around the web soon.

OOSPLA 2011 @SPLASH2011, Day 1

The first day of the technical tracks including OOPSLA started with a keynote by Ivan Sutherland titled The Sequential Prison. His main point was that the way we think and the way we build machines and software is based on sequential concepts. The words we use to communicate and express ourselves are often of a very sequential nature. His examples included: call, do, repeat, program, and instruction. Other examples that shape and restrict our way of thinking are for instance basic data structures and concepts like strings (character sequences). However, we also use words that enable thinking about concurrency and parallelism much better. His examples for these included: configure, pipeline, connect, channel, network, and path.

After the talk David ask him what he would do in the first day of a class on how to program a massively parallel system. His answer was something like: “I would probably retire!”, making the point that it is a hard problem which requires creative solutions.

Catching Concurrency Bugs

After the keynote, the first technical track started with a session on catching concurrency bugs. The first paper presented was Sheriff: Precise Detection and Automatic Mitigation of False Sharing. As far as I understood, it is a pthread replacement, which abuses the memory-managing features and copy-on-write tricks to know about write-write contention on cacheline-level. The tool can attribute that back to the allocation site, which does not seem to be terribly useful if I manage my heap myself :-/

Accentuating the Positive: Atomicity Inference and Enforcement Using Correct Executions was the second paper presented. They use some inference technique to place locks to prevent data-races that are still in the code. The most severe limitation seems to be that it only works for stack variables. However, the idea of making almost correct code more correct looks interesting.

The third paper, SOS: Saving Time in Dynamic Race Detection with Stationary Analysis, focuses on identifying objects that are not changing their state after a certain initialization period. They call such objects stationary, since they cannot participate in races. Would be interesting to see what we could get out of a similar analysis to decide when to promote an object into the RoarVM’s read-mostly heap.

The last presentation in that session was about Testing Atomicity of Composed Concurrent Operations. Here they use commutativity specifications to reduce the search space/testing effort. The goal is to find for instance pairs of operations, which are meant to be executed atomically but are not properly synchronized.

Parallelizing Compilers

The second session started with Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization. They use a dynamic analysis to determine dependencies. The analysis further uses abstraction, ADT semantics to understand the recorded traces. The ADT semantics are used to abstract from the details, and avoid having to track everything precisely. Based on that analysis approach, they developed a tool that can be used to identify undesirable dependencies and iteratively improve a program.

The second paper was Automatic Fine-Grained Locking using Shape Properties. The goal is go from unsynchronized code to fine-grained synchronized code. For that purpose, they describe a new locking protocol called Domination Locking for objects graphs, which is supposed to be more general then earlier approaches.

The third paper Safe Parallel Programming using Dynamic Dependency Hints presented an extension on earlier work that uses hints like ‘possibly parallel region’. The system can execute such regions in parallel and will use an STM-like approach to make it correct in case there are conflicts. The presented work introduced channels to better describe data dependencies.

The last paper titled Sprint: Speculative Prefetching of Remote Data is targeted at distributed systems, but presents an approach that will automatically prefetch data to reduce the impact of latency and reduce overall runtime. Such a technique could be relevant for manycore systems exhibiting similar tradeoffs.

Memory Management

The last session of the day started with the presentation of a nice VM paper: Why Nothing Matters: The Impact of Zeroing. Certainly a worthwhile read for everyone implementing safe languages concerned with initializing objects/tables/structures/… efficiently with NULL.

The second talk Ribbons: a Partially Shared Memory Programming Model presented a programming model in-between threads and processes using memory-protection tricks to restrict the use of shared memory and isolate components. An interesting approach, especially since I have something similar in mind for the RoarVM.

The last talk of the day was about Asynchronous Assertions. Also something that might be interesting for paranoid VM hacker like me. It is an approach that allows the runtime to offload the assertion checking to other threads. To make that work, it works with snapshot semantics of the memory at the point where the assertion is offloaded.

DLS’11 and VMIL’11 @SPLASH2011

The second conference day was unfortunately full of “conflicts of interest”… It was pretty hard to choose between all the talks on the schedule.

Beside DLS and VMIL, there was also a workshop on Actor systems. However, I did not make it to this one. Cherry picking between DLS and VMIL was already hard enough and worked out only partially.

The two DLS keynotes were certainly highlights of the whole SPLASH conference. Gilad’s talk, on Dart was not just interesting but quite entertaining; especially their take on the optional type system looks useful. I like the reasoning behind their approach a lot. Who needs sound type systems anyway? Well, then again, I had a few bugs were a complete type system could have notified me earlier to make me aware that I am doing something stupid. So, I wonder whether it would be hard to have another optional type system in addition, which could be enabled on demand for certain modules to check a more strict set of constraints?

The second DLS keynote was David’s presentation with the title: “Everything You Know (About Parallel Programming) is Wrong!: A Wild Screed About the Future”. The main conclusion is certainly that Romeo would not have died when there would have been synchronization to avoid the race condition. What a nerdy example :D

In the end, the parallel revolution will need to find the balance between performance and correctness, since correctness is always bought by introducing delays and thus hampers performance. Another point he made more implicitly is that programming these systems becomes not only harder because of the parallel aspect, but also because of the ever-growing complexity of the underlying hardware. We just do not know anymore what is going on and why the performance is not as expected.

The VMIL workshop started with Eliot’s presentation. But, well, if you know how Eliot presents his work on the Cog VM then you know, that it is likely that people not acquainted with the details of Smalltalk VMs have a very hard time to follow. His paper made the message a bit clearer, I think. His main point was the VM integration of the Bochs processor emulator for debugging.

After the first break, Lars talked a bit too much about Dart as a language, and the VM details were rather scarce. They said that they are going for simplicity, and the one aspect that I remember was that they do not have any static initialization to avoid ‘spontaneous’ executions. And well, their Smalltalk and Java history certainly shows between the lines.

The talk titled “Virtual machines should be invisible” was essentially about how to build a VM that reuses the standard systems ABI to integrate better with the surrounding system. They use the DWARF debugging format and dynamic linking techniques to reach that goal. Interesting but not platform-independent and I do not really see why that would be desirable or necessary for what I consider a VM.

I unfortunately missed the talk on “A Microbenchmark Case Study and Lessons Learned”. From the comments made later in the workshop, it seemed to have been an interesting and very relevant talk.

The talk titled “Intermediate Language Extensions for Parallelism” was interesting, but I found that this IL is rather language dependent and is only applicable for a X10 or Habanero-ish language. Sure, the Cilk-style seems to be useful for a number of parallel problems, but it is not the only paradigm out there, so I wonder how that IL would look like if it would be designed for a multi-language VM.

The last talk of the workshop was mine: Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era? So, if you have any answers or thoughts on that topic, please leave me a note.

Transitioning to Multicore @SPLASH2011

And here we go again: SPLASH 2011 has started with its first day of workshops.

I attended the Transitioning to Multicore workshop.

For me the most interesting talks were two empirical studies presented by Fernando Castor. The first talk was about a controlled experiment with students using coarse-grained locks (or actually MVars) and STM in Haskell. While such experiments are usually biased in some way or another, he presented an interesting set of anecdotes along with the actual results. Notably, he observed similar mistakes done by the students as we observed as part of our Multicore Programming course. It seems to be hard to define the right granularity of atomicity, especially when an all-inclusive atomic section is not applicable. His second talk was about analyzing the use of concurrency constructs in open source software. While this was still very early work, it shows how gradual the adoption of libraries like java.util.concurrent is in such software projects.

The other presentations included approaches to minimize conflicts in transactional systems, automated support for memorization on top of STM, and how Intel’s TBB can be used to express parallel pipelined programs with a focus on which kind of pipelines can be easily expressed.

For me, the workshop ended with a panel discussion featuring Doug Lea, Matt Sottlie, Suresh Srinivas, and Tucker Taft. The most interesting point made was that one of the goals should be to facilitate all kind of different flavors of parallel programming (of course! ;) ) Matt pointed out in his introduction that parallel programming models are all good and well, and that being able to spilt up a problem in threads is important, but that the memory wall is often not discussed and that feeding enough data to the threads to keep them busy is becoming increasingly harder. Another point made in different variations is the importance of tools, their accessibility to developers, and their integration with runtimes and hardware. Doug mentioned also that the hardware vendors are making life harder for everyone without apparent reason by not using standards for instance to number cores. Especially since they do not provide the mechanisms to query the cache design properties and core arrangements that inhibits portable and reliable performance for mechanism like work-stealing schedulers.

Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?

As preparation for SPLASH’11, here my paper for the VMIL workshop. It is a position paper discussing in which direction virtual machines should evolve in the future with regard to the challenges manycore architectures and concurrent programming bring.

As I said, this is a position paper, which hopefully provokes discussion. Feedback of any kind is welcome, and I am happy to adapt my presentation accordingly.

Abstract

While parallel programming for very regular problems has been used in the scientific community by non-computer-scientists successfully for a few decades now, concurrent programming and solving irregular problems remains hard. Furthermore, we shift from few expert system programmers mastering concurrency for a constrained set of problems to mainstream application developers being required to master concurrency for a wide variety of problems.

Consequently, high-level language virtual machine (VM) research faces interesting questions. What are processor design changes that have an impact on the abstractions provided by VMs to provide platform independence? How can application programmers’ diverse needs be facilitated to solve concurrent programming problems?

We argue that VMs will need to be ready for a wide range of different concurrency models that allow solving concurrency problems with appropriate abstractions. Furthermore, they need to abstract from heterogeneous processor architectures, varying performance characteristics, need to account for memory access cost and inter-core communication mechanisms but should only expose the minimal useful set of notions like locality, explicit communication, and adaptable scheduling to maintain their abstracting nature.

Eventually, language designers need to be enabled to guarantee properties like encapsulation, scheduling guarantees, and immutability also when an interaction between different problem-specific concurrency abstractions is required.

  • Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?, Stefan Marr, Mattias De Wael, Michael Haupt, Theo D’Hondt, Proceedings of the 5th Workshop on Virtual Machines and Intermediate Languages, USA, ACM (2011), to appear.
  • Paper: PDF
    ©ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy

Slides