Tag Archives: OOPSLA

OOSPLA 2011 @SPLASH2011, Day 3

The third day started with Brendan Eich’s keynote on JavaScript’s world domination plan. It was a very technical keynote, not very typical I suppose. And he was rushing through his slides with an enormous speed. Good that I have some JavaScript background. Aside all the small things he mentioned, interesting for me is that he seemed to be very interested to get Intel’s RiverTrail approach to data-parallelism into ECMAScript in one or another form. That is kind of contradicting the position I heard so far, that ECMAScript would be done with having WebWorkers as a model for concurrency and parallel programming.

Language Implementation

The first session had two interesting VM talks for me. The first being JIT Compilation Policy for Modern Machines. With the assumption that you cannot get your multicore/manycore machines busy with application threads, they experimented how additional compilation threads can be used to optimize code better. I do not remember the details completely, but I think, after 7 compilation threads they reached a mark where it was not worthwhile anymore to add more threads.

The second talk was on Reducing Trace Selection Footprint for Large-scale Java Applications with no Performance Loss. The goal here is to reduce the number of trace in a tracing JIT that need to be kept around and optimized. Might be something interesting for PyPy and LuaJIT2 to consider.

Parallel and Concurrent Programming

The last session I attended was again on parallel and concurrent programing. The first paper A Simple Abstraction for Complex Concurrent Indexes was a pretty formal one.

The second paper is a pessimistic approach to implement atomic statements meant for systems programming: Composable, Nestable, Pessimistic Atomic Statements. It is not optimistic like STM, and does not use a global locking order, which would need to be determined statically. Instead they annotate the fields with so-called shelters. Shelters build a hierarchy, which describes the necessary parts to synchronize with.

The third paper was also a very interesting one for me: Delegated Isolation. It is an approach again similar to STM in a sense, but it avoids unbounded number of transaction retries. For that, the object graph is essentially partitioned in growing subgraphs that can be processed in parallel. Only when a conflict, a race-condition occurred, subgraphs are merged logically, and the computation is serialized. A neat idea and an interesting use of the object-ownership idea.

The last talk was on: AC: Composable Asynchronous IO for Native Languages. It was a presentation of work done in the context of the Barrelfish manycore OS. The goal was to have an easy to use programming model, that comes close to sequential programming, but has similar performance properties as typical asynchronous APIs in operating systems. The result is a model based on async/finish, that seems to be relatively nice. As I understand it, it is basically syntactic sugar and some library support to wrap typical asynchronous APIs. But it is a model that is purely focused on such request/response asynchrony, and does not handle concurrency/parallism.

And that was basically it. SPLASH is a nice conference when it comes to the content. Not so interesting when it comes to the social “events”, it wasn’t much of an event anyway. Not even the food was notable…

OOSPLA 2011 @SPLASH2011, Day 2

The second day of the technical tracks started with a keynote by Markus Püschel. He is not the typical programming language researcher you meet at OOPSLA, but he does research in automatic optimization of programs. In his keynote, he showed a number of examples how to get the best performance for a given algorithm out of a particular processor architecture. Today’s compilers are still not up to the task, and will probably never be up to it. Given a naïve implementation, hand-optimized C code can have 10x speedup when dependencies are made explicit, and the compiler knows that no aliasing can happen. He was then discussing how that can be approached in an automated way, and was also thinking about what programming languages could do.

Award Papers

Afterwards, I attended the session with the awarded OOPSLA papers. The Hybrid Partial Evaluation talk presented an approach to avoid the typical cost of use of reflection or ‘interpretation’. The presentation of SugarJ: Library-based Syntactic Languages felt like a déjà vu. I did not get where it is different from Helvetica other than that it is for Java. The third paper on Reactive Imperative Programming with Dataflow Constraints was interesting in that it used also memory protection tricks to realize a reactive model in C++. The last presentation: Two for the Price of One: A Model for Parallel and Incremental Computation was very interesting. I have not used incremental computations as far as I am aware of anywhere other than for course work, but bringing it together with parallel programming in a single programming model, gives plenty of opportunities for super-linear speedups.

Parallel and Concurrent Programming

The second session of the day was on parallel and concurrent programming. Kismet: Parallel Speedup Estimates for Sequential Programs tackled the problem to get an idea of what opportunities for parallelism are available in a given program without having to change the used algorithms and approaches to much. For that, it uses data dependency analysis to characterize the critical path on a data-flow level. Since that usually does not give realistic results because of overestimation of parallelizability, they use in addition a hierarchical model of loops and the knowledge of the available hardware parallelism to better predict possible speedups.

The second and the third talk where almost identical in terms of problem and goal. Essentially, they provide the necessary infrastructure to run different variants of sequential implementations in parallel and then chose either the winner in terms of runtime or precision. These approaches are especially interesting if the available algorithms have very different properties for different input or input sizes. For instance, some mathematical algorithms just do not converge to a solution for certain inputs while they are very fast for others.

The last talk of the session discussed Scalable Join Patterns. Join patterns are an old approach to describe synchronization mechanism flexibly and declaratively. The presented work provided a scalable implementation approach that seems to work quite well and when they would use a compilation based approach for the patters, I guess it could be a very feasible and flexible replacement for standard synchronization mechanism provided as libraries.

Panel

Instead of attending the third paper session of the day, I attended the panel on Multicore, Manycore, and Cloud Computing: Is a new Programming Language Paradigm required?. Well, it was entertaining :) Nothing really new, no surprising arguments as far as I recall, but certainly interesting to watch. I think, they also recorded it. So it might be floating around the web soon.

OOSPLA 2011 @SPLASH2011, Day 1

The first day of the technical tracks including OOPSLA started with a keynote by Ivan Sutherland titled The Sequential Prison. His main point was that the way we think and the way we build machines and software is based on sequential concepts. The words we use to communicate and express ourselves are often of a very sequential nature. His examples included: call, do, repeat, program, and instruction. Other examples that shape and restrict our way of thinking are for instance basic data structures and concepts like strings (character sequences). However, we also use words that enable thinking about concurrency and parallelism much better. His examples for these included: configure, pipeline, connect, channel, network, and path.

After the talk David ask him what he would do in the first day of a class on how to program a massively parallel system. His answer was something like: “I would probably retire!”, making the point that it is a hard problem which requires creative solutions.

Catching Concurrency Bugs

After the keynote, the first technical track started with a session on catching concurrency bugs. The first paper presented was Sheriff: Precise Detection and Automatic Mitigation of False Sharing. As far as I understood, it is a pthread replacement, which abuses the memory-managing features and copy-on-write tricks to know about write-write contention on cacheline-level. The tool can attribute that back to the allocation site, which does not seem to be terribly useful if I manage my heap myself :-/

Accentuating the Positive: Atomicity Inference and Enforcement Using Correct Executions was the second paper presented. They use some inference technique to place locks to prevent data-races that are still in the code. The most severe limitation seems to be that it only works for stack variables. However, the idea of making almost correct code more correct looks interesting.

The third paper, SOS: Saving Time in Dynamic Race Detection with Stationary Analysis, focuses on identifying objects that are not changing their state after a certain initialization period. They call such objects stationary, since they cannot participate in races. Would be interesting to see what we could get out of a similar analysis to decide when to promote an object into the RoarVM’s read-mostly heap.

The last presentation in that session was about Testing Atomicity of Composed Concurrent Operations. Here they use commutativity specifications to reduce the search space/testing effort. The goal is to find for instance pairs of operations, which are meant to be executed atomically but are not properly synchronized.

Parallelizing Compilers

The second session started with Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization. They use a dynamic analysis to determine dependencies. The analysis further uses abstraction, ADT semantics to understand the recorded traces. The ADT semantics are used to abstract from the details, and avoid having to track everything precisely. Based on that analysis approach, they developed a tool that can be used to identify undesirable dependencies and iteratively improve a program.

The second paper was Automatic Fine-Grained Locking using Shape Properties. The goal is go from unsynchronized code to fine-grained synchronized code. For that purpose, they describe a new locking protocol called Domination Locking for objects graphs, which is supposed to be more general then earlier approaches.

The third paper Safe Parallel Programming using Dynamic Dependency Hints presented an extension on earlier work that uses hints like ‘possibly parallel region’. The system can execute such regions in parallel and will use an STM-like approach to make it correct in case there are conflicts. The presented work introduced channels to better describe data dependencies.

The last paper titled Sprint: Speculative Prefetching of Remote Data is targeted at distributed systems, but presents an approach that will automatically prefetch data to reduce the impact of latency and reduce overall runtime. Such a technique could be relevant for manycore systems exhibiting similar tradeoffs.

Memory Management

The last session of the day started with the presentation of a nice VM paper: Why Nothing Matters: The Impact of Zeroing. Certainly a worthwhile read for everyone implementing safe languages concerned with initializing objects/tables/structures/… efficiently with NULL.

The second talk Ribbons: a Partially Shared Memory Programming Model presented a programming model in-between threads and processes using memory-protection tricks to restrict the use of shared memory and isolate components. An interesting approach, especially since I have something similar in mind for the RoarVM.

The last talk of the day was about Asynchronous Assertions. Also something that might be interesting for paranoid VM hacker like me. It is an approach that allows the runtime to offload the assertion checking to other threads. To make that work, it works with snapshot semantics of the memory at the point where the assertion is offloaded.

Intermediate Language Design of High-level Language Virtual Machines: Towards Comprehensive Concurrency Support

My second workshop paper got published at the ACM Digital Library. This is actually only an abstract, but nonetheless, it might be interesting for people looking into the design of virtual machines and especially bytecodes/intermediate languages.

Abstract

Today’s major high-level language virtual machines (VMs) are becoming successful in being multi-language execution platforms, hosting a wide range of languages. With the transition from few-core to many-core processors, we argue that VMs will also have to abstract from concrete concurrency models at the hardware level, to be able to support a wide range of abstract concurrency models on a language level. To overcome the lack of sufficient abstractions for concurrency concepts in VMs, we proposed earlier to extend VM intermediate languages by special concurrency constructs[PLACES09].

As a first step towards this goal, we try to fill a gap in the current literature and survey the intermediate language design of VMs. Our goal is to identify currently used techniques and principles as well as to gain an overview over the available concurrency related features in intermediate languages.

Another aspect of interest is the influence of the particular target language, for which the VM is originally intended, on the intermediate language.

  • Paper: PDF
    ©ACM, 2009. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in VMIL’09, October 25, 2009. http://doi.acm.org/10.1145/1711506.1711509
  • BibTex: BibSonomy

Slides of the Talk at VMIL09/OOPSLA