Sep 20 2009

Theory and Practice of Language Implementation, part 3

The third and last part of the summer school started with Ondrej Lhotak talking about pointer analysis. David Bacon presented a basic introduction to garbage collection and detailed description of the real-time Metronome GC. The last lecturer for this summer school was Patrick Cousot, who gave a very basic and thus understandable introduction to abstract interpretation.

Pointer Analysis

Ondrej’s lectures were focused on pointer analysis for C and Java-like languages. He introduced it as a static analysis for several problems like for instance optimization, call-graph construction, dependency analysis and optimization, cast elimination, side effect, and escape analysis. The major problem seems to be that it is a very broad subject with many different applications, and thus, many different techniques to do the analysis.

So, before applying pointer analysis the right abstraction for a specific problem has to be chosen. Usually, this kind of analysis will result in an over approximation of the actual problem. Under approximations are unsound and the exact points-to sets are not computable. Thus, a suitable over approximation has to be found which is exact enough for a specific question to be answered.

Just to recapitulate two examples, he introduced abstractions based on type filtering, i.e, points-to sets are calculated on the bases of types of the potential object during a program execution. Furthermore, he explained field sensitive abstractions. Here the main property of interest is the field to which an object is assigned and this information is used to calculate the points-to sets. More examples and more advance techniques like refinement demand-driven analysis are shown in the slide set which will hopefully appear on the summer school webpage.

Garbage Collection and the Metronome GC

David started his lectures by showing off a bit ;) He showed a nice demo of what the Metronome system is able to achieve in terms of real-time constraints. Definitely very interesting and it fortifies my belief that a Java-like language will be the next system programming language succeeding C/C++.

Afterwards, he gave a short introduction to mark/sweep and semi-space collectors. Useful was his explanation about the actual semantics of StopTheWorld, parallel, concurrent, and incremental GCs supported with an illustration (should remember that one). BTW: The world needs a better book on garbage collection. I don’t like the one of Jones and Lins. It does not solve my problems :(

Eventually, he started explaining the Metronome-2 systems. The paging approach let me wonder whether there is anything an operating system is actually doing for a modern VM. Well there is, but is it enough to justify the additional trouble? Dreaming of Singularity… The second lesson was basically dedicated to synchronization issues and parallel/concurrent data structures like lock-free stacks and which strategies are out there to implement them. This was a nice addition to what we discussed during Yannis’ lectures and gave me some basic inspiration to implement a lock-free ring buffer for multiple writers and a single reader.

His explanation of all the details of the metronome algorithm and how the different phases interact was quite difficult, and well, not to easy to follow especially since I lost the overall picture from time to time. But well, it is a hard problem and a lot of engineering without a simple solution as it seems.

Abstract Interpretation

The last topic at the summer school was abstract interpretation and the lectures were given by Mr. Abstract Interpretation Patrick Cousot himself. Even so it is a theoretical topic, I liked his introduction. Probably for some of the people it was to basic, but since I never had looked at it before, for me it was the right level of detail and complexity to get an idea what abstract interpretation is actually about.

Unfortunately, my own notes are very spares, and I was not able to retrieve the slide set up to now. However, abstract interpretation tries to provide a sound approximation of the semantics of a computer program, to enable various kinds of static analysis. The main problem here is, similar to pointer analysis, to choose the right abstraction for a given problem.

The examples I recall could have fit into Summit’s presentation about bound computation of loops as well. He introduced ideas like step-wise approximation and other techniques. But, well, unfortunately my memory is too blurry to write something down here with certainty. Beside the introduction of the formal techniques, he also spent some time on how to implement them with Ocaml examples. He also made quite some advertisement for his product ASTRÉE, which originally was developed to analyze code for avionic systems. Well, if I can get my hands on his slides, I might add some more information about the content, but for the moment that’s it.

Where to go next Year?

Need less to say, that I really enjoyed the summer school. The lectures had different levels and quality, but the overall experience was great. Hope, that there will be a VM summer school next year, somewhere on this nice planet. I would love to attend it, too. :)

Aug 3 2009

Theory and Practice of Language Implementation, part 2

The second part of the summer school was a bit more applied and more in the direction of my own interests. Chandra Krintz talked about managed runtime environments. Yannis Smaragdakis introduced multi-threaded programming and transactional memory. Sumit Gulwani as the third lecturer taught us symbolic bounds computation.

As a cherry on the cake, Oliver Danvy talked about general tips and tricks for PhD students.

Managed Runtime Environments: Implementations and Opportunities

The content-part of her lecture started with an overview of how VMs execute programs. Well, as usual, it was limited to Java, Python, and Ruby. This gives me the feeling, that there are only very few people looking into the similarities between runtimes for functional and imperative languages. She talked about stack-based VM instruction sets and briefly mentioned register-based as well as register-memory-based designs, but mainly referring to CPU instruction set architectures. Interpretation techniques, just-in-time compilation, problems associated with profiling and instrumentation had been discussed, but neither of them in detail.

One topic, she was spending more time on was her own research, i.e., communication between independent JVM instances via shared memory. It was interesting, especially her approach to share classes between the VM instances by using shared pages and direct pointers from the instances.

The last lecture was a broad overview of cloud computing, available techniques and the current challenges, but not really technical.

Multi-Threaded Programming and Transactional Memory

Yannies gave more or less a hands-on introduction to locks and threads with programming examples/exercises. From the poll in the audience, there are actually people, which never have used threads before. I thought this is just one of the assumptions professors usually get wrong, but well… From my point of view, I have heard a lot of it already before, but it was still nice to be reminded of some of the details we discussed. The primitives for monitor-style/traditional shared-memory concurrency he identified were lock/unlock, wait, signal, and broadcast. They allow building any kind of more complex critical sections on top.

The basic points he emphasized are to use while-loops on all conditions and the rule for single signals vs. broadcast signals. In short, a single signal, i.e., a notify, can only be used when all threads waiting on a condition are equal, but only one at a time can make progress. If more then one can make progress, a broadcast, i.e., a notifyAll has to be used to avoid livelocks.

After some typical examples of problems with locks, for instance, locking order leading to deadlocks, he introduced transactional memory. His focus was on the implications of this programming model, the different variations, performance questions, and the problem of irreversible operations.

Art of Invariant Generation applied to Symbolic Bound Computation

The computation of bounds for e.g. loops has a broad range of applications. The knowledge about bounds allows estimating the necessary runtime or memory utilization, thus in general resource utilization. Furthermore, it can be used to identify secrecy properties in terms of information leakage due to repetitive patterns and problems of robustness due to errors and uncertainty for instance introduced by rounding.

After a short introduction and a simple example, he introduced a countless number of approaches to this problem and explained how to apply them to specific problems. His lectures was stuffed with a lot of content, think much more then any other lecturer tried to convey. Moreover, he managed to speak with an incredible speak. Have never heard someone before, putting an infinite number of words into one minute. Sometimes less is more…

Edutainment for the Night Lecture

Oliver was giving a Tips and Tricks lecture about being a PhD student, writing and reviewing papers, and giving talks. Presented in a quite entertaining style :)


Jul 28 2009

Theory and Practice of Language Implementation, part 1

Thanks to the financial support of FWO and a grant by the summer school itself, I was able to go to Eugene, Oregon and learn about theory and practice of language implementation.

The first part of this summer school had three different topics. Ras Bodik presented highly interesting approaches for program synthesis. Oliver Danvy taught us how to use continuations and how to transform a program into continuation passing style. Matt Might tried to explain control-flow analysis(CFA) for high-order programs, starting with 0-CFA(zero-CFA) and k-CFA.

Algorithmic Program Synthesis

The program synthesis lectures were from my perspective the most comprehensible and interesting ones. Ras gave a high-level overview over different approaches to utilize expert knowledge i.e. domain insights to guide the synthesis of algorithms. He presented two different directions: derivative and selective synthesis. The derivative synthesis has its roots in the formal world. The idea is to define a problem with a formal language and to automatically derive a proof for it. As a by-product of this proof, an algorithm to solve the problem is synthesized. This approach is limited mainly by the formal skills of the programmers, and thus, was not successful in a larger scale.

Selective synthesis tries to circumvent these limitations by trying to preserve the usual tools for the programmers as the basic foundation. One example he explained in more detail was SKETCH. Here, the programmer has to describe a space of possible programs and the synthesizer can chose an exemplar of this space by testing/verifying whether it fulfills a specification, which also has to be given by the programmer. An example could be an optimized sort algorithm, which might be tailored to a specific memory model. The programmer implements the basic sketch and leaves some open holes, which needs to be synthesized. She also can give a simple bubble sort as specification, or as an alternative, a checking algorithm, whether the resulting data is sorted. He concluded his lectures with a demonstration of angelic programming, which uses the idea of sketching a solution and automatically deriving an implementation.

Continuations to Go

Oliver was the first lecturer asking us to do exercises. This was a good way to force us to really think about the content of his lectures. At first he taught us the three basic steps to transform a program to continuation-passing style (CPS): 1. name all immediate values, 2. sequentialize immediate evaluations, and 3. introduce the actual continuation. Then, he gave us some exercises, for instance to build the list of suffixes of a list and the list of prefixes. In a language supporting linked lists, the list of suffixes can be realized by sharing the data between all suffixes. The list of suffixes, thus is comprised of pointers pointing to the according element of the original list, where the suffix starts. For the list of prefixes, it is not possible to share the data, but, by implementing it in a continuation-passing style. You can share the computation of the prefixes similar to sharing the data in the case of the suffixes.

Another example, he ask to implement was a convolution of two lists. Thus, the result is a list of pairs of elements of each list, where the second list is reverse beforehand. The interesting thing here is, that it can be implemented in a recursive way with n steps of recursion for lists with n elements. The solution is simple, if you figure out that you can use the return time of a recursion to traverse the second lists, which is equal to a reverse. Later he pointed out, that an implementation of this, using CPS and defunctionalization will result in a plain usual zip/reverse like it would be done in Haskell.

Control-flow Analysis of Higher-Order Languages

Well, and then there was the control-flow analysis (CFA). At the moment, I have not figured out completely how to read the slides, since my Greek is rather limited. What I was able to take away from this course was, that there are several different techniques where 0-CFA is a robust and efficient one. In general, the k in k-CFAs denotes the amount of history, which is considered in the analysis. Usually, only 0-CFA and 1-CFA are applied in practice, i.e. in compilers, since the state space of the analysis explodes very fast. The concrete semantics of a program would be described by an infinite-CFA. He also talked about possible optimizations, like applying garbage collection, to eliminate unreachable states/possibilities during the CFA and thereby increasing the efficiency of the analysis significantly.