Workshops at SPLASH 2010

As usual, I will write about a few of my personal highlights of SPLASH and the co-located workshops. This is based mostly on my spotty notes and on memory, so I don’t guarantee 100% accuracy, especially with respect to what other people might have said.

For an impression of the location itself, I will just cite what Nick wrote on the JOT blog:
“Reno airport was like a gateway into hell, slot machines everywhere [...] The conference venue is almost comically grim. The main floor is a sea of slot machines and haggard looking people.”
So, it was definitely not the most exciting place ever, and I was already worried that my colleagues would start shooting at those zombies ;)

Anyway, from the content point of view, it was actually a nice conference for me.

On Sunday, the Virtual Machine Intermediate Languages workshop took place. As last year, it is the most relevant workshop with respect to VMs that I have come across so far. This year, the invited talks in particular were very interesting.

A JVM Does What????

Cliff Click started by reporting on his perception of JVMs and the illusions they provide to developers. My takeaways from his talk are the following points.

First, garbage collection is still the major issue, and people are willing to pay for better performance here. He kind of implied that JIT compilers are nice to have, but not as high on the list of priorities for his typical customers.

Second, he wants people to explore alternative concurrency models on top of the VM. From his perspective, the JVM is a great platform and things like locks are cheap. He agrees that things like Erlang-like Actors need deeper hooks into the Java Memory Model and possibly the JIT compiler, but in general I understood that he would prefer something on top instead of another thing integrated into the VM. Well, let’s see how my ideas work out.

Related to my ideas, we had a small discussion afterwards with David. I was surprised that Azul uses a Uniform Memory Access model for its systems, but apparently the problem is that current business applications exhibit random access patterns all over the heap. Thus, if you have a system with 16 chips and 16 memory controllers, 15/16 of the accesses go remote anyway. That is why they optimize for that case instead of optimizing local performance. Interesting, but perhaps just the consequence of not having appropriate languages that take locality into account in the first place.
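
The 15/16 figure is a simple back-of-the-envelope calculation: if accesses are spread uniformly at random over n memory controllers, only 1/n of them hit the local one. A minimal sketch in Python, assuming perfectly uniform access (the function name `remote_fraction` is my own, not anything from Azul):

```python
from fractions import Fraction


def remote_fraction(num_controllers: int) -> Fraction:
    """Fraction of memory accesses that go remote under uniformly random access."""
    if num_controllers < 1:
        raise ValueError("need at least one memory controller")
    # Exactly one controller is local to the accessing chip; all others are remote.
    return Fraction(num_controllers - 1, num_controllers)


# The remote share grows quickly with the number of controllers.
for n in (2, 4, 16):
    print(n, remote_fraction(n))
```

With only two controllers, half of the accesses are still local, but at 16 controllers local accesses are already the rare case, which matches the reasoning for optimizing the remote path.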

SPUR: A Trace-Based JIT Compiler for CIL

Nikolai Tillmann reported on the SPUR project at Microsoft Research. He gave an introduction to tracing-based just-in-time compilation and also presented some benchmarks. The interesting part about SPUR is that they actually JIT .NET but experiment mainly with JavaScript.

For me, the most interesting aspect of his talk was the future work section, where he mentioned a few attempts at parallelizing code with the tracing JIT. Their ideas mainly focus on vectorization, which is not that exciting. I hope they will also look into speculative execution, even though Nikolai asks for more hardware support for such an idea.

A Systematic Mapping Study on High-level Language Virtual Machines

The first research paper I am going to mention here was meta research on VM research.

The authors surveyed the body of literature on VMs to find out what people are doing research on. Well, the scope was a bit too narrow to actually cover all interesting papers, but it is a very nice first step. David was a bit disappointed that his Self and other Smalltalk papers were not covered and that the literature identified as relevant only started in the ’90s or so. Well, the authors were already aware of those limitations, but besides this definitely constructive criticism, the audience also came up with proposals to get us as the community involved. There is serious interest in such research, and people would be happy to help with classifying (and certainly promoting their own research) if that could happen on a wiki or so…

The Architecture of DecentVM – Towards a Decentralized Virtual Machine for Many-Core Computing

The second research paper with high relevance for me was about DecentVM. The DecentVM is based on the distributed DecentSTM. It implements a JVM that currently runs on a distributed system. However, they also want to look into how to make it run on Intel’s Single-chip Cloud Computer. So, some interesting work is coming up there.

How’s the Parallel Computing Revolution Going? Towards Parallel Scalable Virtual Machine Services

Kathryn McKinley reported on experiments her students did to compare the speed and power consumption of CPUs over the last few years. Turns out, the power consumption seems to rise faster than the performance, especially since the benchmarks do not scale perfectly for multicore applications. However, there is quite a bit of progress with respect to saving energy instead of increasing performance with Intel’s Atom and related architectures.

Her proposal to parallelize the VM itself was interesting. It is something Theo always asks for, too. However, Cliff Click basically said that HotSpot is already at that point for the most part. So, at least from his perspective, that is not a field where major breakthroughs will come from…

Monday, the second day of workshops, was less interesting. I started the day by giving my presentation at the Doctoral Symposium. I did not get more than meta-feedback, unfortunately. I guess it was just too early for that. What I have is an idea (perhaps with too many open design options) and a plan to validate it. But it was obviously still too fluffy… On the other hand, that meant I was missing great workshops like Evaluate 2010 and the Dynamic Languages Symposium *sigh*

Highlights of HPCC 2010

The 12th IEEE International Conference on High Performance Computing and Communications was not the first conference I attended. However, it was the first one where I actually presented a paper in the main research track.

As usual, the conference covered a wide variety of different topics. For me the following presentations were the most interesting, since they discuss problems related to my own research.

GPGPUs are a hot topic, and HPCC covered several different aspects. Often, what was interesting for me was not the main focus of the presented paper but the problems the authors faced during their experiments. For instance, the presentation on Sparse Matrix Formats Evaluation and Optimization on a GPU highlighted the difficulties of programming systems with memory hierarchies as complex as those in today’s GPGPU systems. Similarly, OpenCL: Make Ubiquitous Supercomputing Possible gave an introduction on how to develop applications for GPGPU systems.

The presentation on Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments gave interesting insights into how supercomputing centers operate and what challenges they face. One interesting anecdote was that the power-saving functionality of modern CPUs actually causes some unexpected trouble in such large-scale systems. The temperature differences caused by powering down CPUs can cause their dies to crack, resulting in destroyed CPUs. Furthermore, the talk also included a projection of what future supercomputers will look like. Interestingly, this projection says that the tradeoff between power efficiency and sophisticated CPU optimizations for single-core performance will be decided in favor of power efficiency. Thus, they expect future supercomputers to have a dramatically higher number of much simpler cores than what is used today. That means optimizations like branch prediction and out-of-order execution will most likely not be present in the CPU cores of exascale systems, so that high-performance systems can be built that fit into a practical energy envelope, i.e., are coolable.

Related to cluster computing was the presentation on Enabling GPU and Many-Core Systems in Heterogeneous HPC Environments Using Memory Considerations. Here, memory bandwidth limitations were taken into account when scheduling tasks on heterogeneous clusters. The idea is that tasks that saturate the memory system will slow down the overall system. Thus, a better distribution of tasks that takes the available memory bandwidth into account will lead to better execution times.

The Evaluation of the Task Programming Model in the Parallelization of Wavefront Programs provides the arguments for the intuitive assumption that fine-grained parallelism is inefficient. However, the approach here was based on OpenMP and Intel’s TBB library; thus, there is no automatic solution for problems that are naturally expressed in a very fine-grained way. As far as I remember from the TiC Summer School, X10’s compiler is actually capable of coarsening fine-grained task parallelism, and thus provides some automatic solution for this problem.
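
To illustrate what coarsening means here, a rough sketch in Python: instead of spawning one task per element, the work is grouped into chunks so that the per-task overhead is paid less often. This only illustrates the general idea for independent work items; it ignores the dependencies of real wavefront programs and is not how X10, OpenMP, or TBB implement it. The names `coarsen`, `process_chunk`, and `sum_of_squares` as well as the grain sizes are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def coarsen(items, grain):
    """Group fine-grained work items into chunks of at most `grain` elements."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    return [items[i:i + grain] for i in range(0, len(items), grain)]


def process_chunk(chunk):
    """Hypothetical unit of work: one task handles a whole chunk sequentially."""
    return sum(x * x for x in chunk)


def sum_of_squares(items, grain, workers=4):
    """Run one task per chunk; return the total and the number of tasks used."""
    chunks = coarsen(items, grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks)), len(chunks)


data = list(range(1000))
fine_total, fine_tasks = sum_of_squares(data, grain=1)        # one task per element
coarse_total, coarse_tasks = sum_of_squares(data, grain=100)  # one task per 100 elements
```

Both variants compute the same result, but the coarse one schedules far fewer tasks. Choosing the grain size is the hard part, which is exactly what one would like a compiler or runtime to do automatically.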

However, my very personal highlight of the conference was the moment when I unexpectedly received the Best Student Paper Award for my paper on Insertion Tree Phasers.

Poster at SPLASH’10

On a related note to the last post, my poster also got accepted for SPLASH’10.

It will get a minor revision, and include some of the latest ideas related to locality and encapsulation. However, the abstract which will appear in the proceedings is rather high-level, and does not go into detail. It is another case of reusing IWT report material to have another little dot on the CV, and free food ;)

Abstract

We propose to search for common abstractions for concurrency models to enable multi-language virtual machines to support a wide range of them. This would enable domain-specific solutions for concurrency problems. Furthermore, such an abstraction could improve portability of virtual machines to the vastly different upcoming many-core architectures.

  • Many-Core Virtual Machines: Decoupling Abstract from Concrete Concurrency, Stefan Marr, Theo D’Hondt, Proceedings of the first SPLASHCon, USA, ACM (2010).
  • Paper: PDF
    ©ACM, 2010. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy

Doctoral Symposium at SPLASH 2010

In October, I will give a brief presentation on the state of affairs with my PhD research at the SPLASH 2010 Doctoral Symposium. The basic idea has not changed since my last presentation at the TiC’10 summer school. I haven’t been able to do a lot of real work for it, but the ideas are a bit clearer now. The following two-page proposal will be published as part of the conference proceedings.

Abstract

We propose to search for common abstractions for different concurrency models to enable high-level language virtual machines to support a wide range of different concurrency models. This would enable domain-specific solutions for the concurrency problem. Furthermore, advanced knowledge about concurrency in the VM model will most likely lead to better implementation opportunities on top of the different upcoming many-core architectures. The idea is to investigate the concepts of encapsulation and locality to this end. Thus, we are going to experiment with different language abstractions for concurrency on top of a virtual machine, which supports encapsulation and locality, to see how language designers could benefit, and how virtual machines could optimize programs using these concepts.

  • Encapsulation And Locality: A Foundation for Concurrency Support in Multi-Language Virtual Machines?, Stefan Marr, Proceedings of the first SPLASHCon, USA, ACM (2010).
  • Paper: PDF
    ©ACM, 2010. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. To appear.
  • BibTex: BibSonomy

Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism

The last half year was an interesting departure from my actual PhD research. At first, I thought the idea of barriers and phasers might be interesting to incorporate into a virtual machine as part of my thesis, but as it turned out, they are much too high-level and are better off implemented in a library. The gain from direct support in a VM is just not proportional to the effort and restrictions that come with that step.

However, the time was not spent just for the sake of the academic exercise. I had a small idea for how to improve the existing approaches, and after quite some work, which proved that the initial idea was just broken, I had an algorithm that actually worked. Even though that idea does not contribute anything directly to my thesis, I was lucky enough to have a paper about it accepted at the IEEE HPCC’10 conference.

Abstract

This paper presents an algorithm and a data structure for scalable dynamic synchronization in fine-grained parallelism. The algorithm supports the full generality of phasers with dynamic, two-phase, and point-to-point synchronization. It retains the scalability of classical tree barriers, but provides unbounded dynamicity by employing a tailor-made insertion tree data structure.

It is the first completely documented implementation strategy for a scalable phaser synchronization construct. Our evaluation shows that it can be used as a drop-in replacement for classic barriers without harming performance, despite its additional complexity and potential for performance optimizations. Furthermore, our approach overcomes performance and scalability limitations which have been present in other phaser proposals.

  • Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fine-grained Parallelism, Stefan Marr, Stijn Verhaegen, Bruno De Fraine, Theo D’Hondt, Wolfgang De Meuter, Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC), IEEE CS, September (2010) (to appear).
  • Paper: PDF
    ©IEEE, 2010. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
  • BibTex: BibSonomy
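
For readers who have not used the classic barriers mentioned in the abstract, here is a minimal sketch of phase-wise synchronization in Python. It uses the standard library’s `threading.Barrier`, which has a fixed number of participants; phasers additionally allow participants to register and deregister dynamically. This is not the insertion tree algorithm from the paper, and the constants and the `worker` function are made up for illustration.

```python
import threading

NUM_WORKERS = 4
NUM_PHASES = 3

barrier = threading.Barrier(NUM_WORKERS)  # fixed number of parties, reusable
log = []  # (phase, worker_id) entries; list.append is atomic in CPython


def worker(worker_id):
    for phase in range(NUM_PHASES):
        log.append((phase, worker_id))  # stand-in for the work of this phase
        barrier.wait()  # block until every worker has finished this phase


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The order of workers within a phase varies between runs,
# but no worker enters phase k+1 before all have finished phase k.
phases = [phase for phase, _ in log]
```

The barrier guarantees that the recorded phases never interleave, while the order of workers within a phase is left to the scheduler.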