SSW@ECOOP'26: On Debugging, Benchmarking and (Meta-)Compilation

Categories: Research June 16, 2026

At this year’s ECOOP, the Institute for System Software will be attending with almost the complete team, and we’re going to present on a variety of topics. Come and talk to us at ICOOOLPS, MPLR, the Demo Track, DEBT, ECOOP Academy, and at the poster session!

Below is a brief overview of the talks and when they are scheduled.

Monday, June 29, 2pm: AOT Meta-Compilation of Dynamic Languages

Towards Ahead-of-Time Meta-Compilation of Dynamic Languages With an Extensible Type Analysis

Christoph A. is going to present initial ideas on how to approach ahead-of-time meta-compilation for dynamic languages. While some dynamic languages can already be compiled fairly successfully ahead of time, we would love to get the compiler for free, ideally for not much more than the effort of implementing the interpreter.

Preprint of the ICOOOLPS position paper.

Full Abstract

Dynamically-typed languages rely on just-in-time (JIT) compilation for execution performance. Meta-compilation systems such as GraalVM's Truffle language implementation framework have reduced the effort needed to enable JIT compilation to that of implementing an interpreter. But dynamic languages are increasingly used in scenarios where ahead-of-time (AOT) compilation would be preferable, for instance, for faster startup or to avoid the memory cost of JIT compilation. Therefore, we plan to extend meta-compilation systems to also support AOT compilation.

For successful AOT compilation of dynamically-typed languages, we need an extensive and robust type analysis. In this position paper, we present first ideas for a framework with an extensible core analysis that will enable us to extract type flow semantics from an interpreter implemented in a meta-compilation system.

To achieve the precision needed for fast machine code, we will need to include heuristic analyses. For this, we envision a plugin system that allows us to integrate various different heuristics into a singular unified analysis. Combining analyses in this way can produce results that are better than the sum of their parts.

While this is a very ambitious goal, given the complexity of compiling dynamic languages, we believe we can achieve better-than-interpreted performance for programs with reasonable behavior. Furthermore, to support the full language semantics we keep a general interpreter as a fallback.

Monday, June 29, 3pm: Pedagogical Annotations in a Debugger

Towards Guided Omniscient Debugging in Education using Pedagogical Execution Traces

Markus is going to present work on teaching programming by using debugging techniques. Specifically, he will look at enriching program visualizations with explanations and interactive questions.

Preprint of the DEBT paper.

Full Abstract

Educators frequently use trace-based debuggers for live classroom demonstrations. Yet, if a student’s attention drops during class, they have to fall back to watching recordings (providing a passive, non-interactive experience) or replaying the debugging session at home (lacking the instructor’s pedagogical context and verbal explanations). We introduce Pedagogical Execution Traces (PETs), a concept that enriches execution traces with explanations, highlights and interactive questions. In this work-in-progress idea paper, we present the conceptual foundation of PETs as interactive learning artifacts, showing their applicability within JavaWiz, an educational trace-based graphical debugger. We explore PET authoring design goals and outline ongoing work regarding collaborative debugging scenarios and leveraging Large Language Models (LLMs) for trace annotation.

Tuesday, June 30, 4pm: Supporting Different GCs in AOT-compiled Binaries

A Unifying Approach to Supporting Multiple Garbage Collectors in AOT-compiled Binaries

Thomas will present a fairly simple but effective approach to enable a single AOT-compiled binary for a Java program to use different GCs. At the moment it supports HotSpot's G1 and a more basic generational GC.

Preprint of the MPLR paper.

Full Abstract

Some language implementations combine garbage collection with ahead-of-time compilation to produce self-contained executables for managed-language programs. In these systems, one can typically choose a garbage collector (GC) only at build time. To use another GC, e.g., for better performance, one needs to build another executable.

In this paper, we present an approach for supporting multiple GCs in the same self-contained executable using unified barriers, object layout, object header, and dynamic dispatch. This enables developers to select a GC at run time. Additionally, isolates, i.e., lightweight virtual machine instances with separate collected heaps but within the same process, can now use different GCs alongside each other.

We evaluate our approach in GraalVM Native Image, supporting the Garbage First (G1) and the Serial GC in the same executable. Our evaluation on the DaCapo Chopin and Renaissance benchmarks shows that G1 has on average no performance change (min. −9 %, max. 14 %). Serial GC shows a peak performance regression of 11 % (min. −10 %, max. 33 %). We believe the simplicity of the approach and that one can now choose the GC at run time and on a per isolate basis make this overhead acceptable.

Tuesday, June 30, 5pm: Reducing Binary Size with Static Heuristics

To Compile or Not To Compile: Evaluating Static Heuristics to Reduce Binary Size of Hybrid Execution Systems

Christoph P. will present his evaluation of how far one can get with basic static compiler heuristics when it comes to reducing the size of AOT-compiled Java binaries while minimizing the impact on performance.

Preprint of the MPLR paper.

Full Abstract

To compile, or not to compile, that is the question: When ’tis nobler to optimize for performance. Modern compilers have many different optimizations and optimization goals. A common one is to balance between peak performance and startup time. A new ahead-of-time compiled native executable that embeds a managed runtime tries to offer both, while solidifying the notion that everything should be compiled. However, the cost of an enlarged binary size raises the question whether it is beneficial to compile everything.

In this paper, we evaluate static heuristics from classical AOT compilers as well as other techniques based on our own observations. Our goal is to identify heuristics that work in a compilation-first environment and that allow us to reduce binary size while maintaining peak performance.

We compare the different policies in a closed-world hybrid execution system for Java, based on GraalVM Native Image, on a set of 5 DaCapo and 13 Renaissance benchmarks. We find that with the best combination of heuristics we can reduce binary size by 20% while slowing down average performance by only 4%, but avoiding the need for any run-time feedback or complex machine-learning-based approaches. The most promising combination for production use combines heuristics based on early returns, estimated CPU cycles, number of parameters, and whether a method is a static initializer.
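As a rough illustration of what such a static policy could look like, here is a sketch that combines the four kinds of heuristics named in the abstract. Everything concrete in it is an assumption made up for this example: the `MethodInfo` fields, the thresholds, the direction of each check, and the voting scheme. None of it is taken from the paper or from GraalVM Native Image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodInfo:
    """Hypothetical build-time facts about one method."""
    name: str
    has_early_return: bool
    estimated_cycles: int
    num_parameters: int
    is_static_initializer: bool


# Made-up thresholds, for illustration only.
CHEAP_CYCLES = 50
FEW_PARAMETERS = 1


def should_compile(m: MethodInfo) -> bool:
    """Decide statically whether to AOT-compile m or leave it to the interpreter."""
    if m.is_static_initializer:
        # Assumption: code that runs once rarely repays its machine-code size.
        return False
    # Assumption: each remaining heuristic casts one vote against compiling.
    votes_against = sum([
        m.has_early_return,
        m.estimated_cycles < CHEAP_CYCLES,
        m.num_parameters <= FEW_PARAMETERS,
    ])
    return votes_against < 2


def split_methods(methods):
    """Partition method names into compiled and interpreted ones."""
    compiled = [m.name for m in methods if should_compile(m)]
    interpreted = [m.name for m in methods if not should_compile(m)]
    return compiled, interpreted


# Invented example methods; the numbers are not measurements.
methods = [
    MethodInfo("Parser.parse", False, 900, 2, False),
    MethodInfo("Config.<clinit>", False, 400, 0, True),
    MethodInfo("Point.getX", False, 5, 0, False),
    MethodInfo("Cache.lookup", True, 120, 2, False),
]
compiled, interpreted = split_methods(methods)
```

The point of the sketch is only that such a policy needs nothing but facts available at build time: no profiles, no run-time feedback, no trained model.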

Date TBC: A Debugger for Teaching Threads and Locks

JavaWiz ThreadViz - A Visual Debugger for Multi-threaded Programs Based on the Espresso Java VM

Melissa is going to present a visual debugger designed for teaching threads and locks in Java. Threads, locks, and their interaction can feel hard to explain. With the right representation in a debugging tool, though, their dynamic interactions can become more understandable.

Preprint of the Demo paper.

Full Abstract

Programming novices often face difficulties understanding how multi-threading works. Visual debuggers such as JavaWiz can support beginners by providing dynamic visualizations of a program’s behavior; however, they usually only work for single-threaded programs. This paper presents ThreadViz, an extension of JavaWiz to support visualizing multi-threaded Java programs.

In ThreadViz, thread information is collected for visualization by using the Truffle Debug API. Instead of real concurrency, threads are executed stepwise, allowing the user to determine the order of execution and preventing any unpredictable behavior. In the user interface, a unique color is associated with each thread to illustrate the effects of different synchronization mechanisms such as locking and indicate thread state changes. To conclude, application examples are presented to highlight the tool's capabilities.
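The idea of stepping threads in a user-chosen order, instead of running them concurrently, can be sketched in a few lines. This is not how ThreadViz is implemented: the sketch uses Python generators as stand-ins for threads, and the names `incrementer` and `run_stepwise` are made up for illustration.

```python
def incrementer(shared, name, log):
    """A 'thread' that increments shared['counter'] in two separate steps."""
    local = shared["counter"]            # step 1: read
    log.append(f"{name} read {local}")
    yield                                # hand control back to the scheduler
    shared["counter"] = local + 1        # step 2: write
    log.append(f"{name} wrote {local + 1}")


def run_stepwise(schedule):
    """Advance the named threads one step at a time, in the given order."""
    shared = {"counter": 0}
    log = []
    threads = {name: incrementer(shared, name, log) for name in set(schedule)}
    for name in schedule:
        # Each next() runs exactly one step; None is returned once a thread is done.
        next(threads[name], None)
    return shared["counter"], log


interleaved_result, interleaved_log = run_stepwise("ABAB")
sequential_result, sequential_log = run_stepwise("AABB")
```

With the schedule `"ABAB"`, both threads read 0 before either writes, so one increment is lost and the counter ends at 1. With `"AABB"`, it ends at 2. Because the order is chosen explicitly, each outcome is reproducible, which is what makes this style of execution useful for teaching.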

Friday, July 3, 11am: A Lecture on Benchmarking

Benchmarking on Modern Hardware: Techniques for Performance Comparisons from Day-To-Day Experimenting to Paper Writing

Last but not least, I'll give a lecture on benchmarking. Modern hardware and software make that quite a bit more complicated than we would like it to be, and I will show a bit of how we approach it in practice.

Full Abstract

Modern systems are great! In many ways, they adapt to our software, and optimize it, despite us not really knowing what we are doing, and to a degree that would have been considered magic just a few decades ago.

Though, once we develop our own research ideas on top of these systems and want to make any argument about performance, all this “magic” makes it hard to understand what measurements mean. Worse yet, making sensible performance claims means we have to understand a good chunk of it. Is this benchmark 20% faster because of what I did, or did the CPU increase the clock frequency for the new but not for the old code? Did the JVM just trigger garbage collection? Did the just-in-time compiler slow down my code? What do you mean, “efficiency core”?

In this lecture, we will have a brief look at why benchmarking on modern systems is hard and what can go wrong. Then we will discuss a range of different research scenarios to get a better feeling of what we may need for our work. Since much of this work may involve gradually building up our own systems, we will also look at what it takes to build them based on reliable feedback.

In the second part, we will look at how we can turn the often chaotic scientific process, with all its trials and errors, into a “scientific engineering process” that enables us to try and try again. I’ll suggest a process that allows us to use the same setup that we use for developing our system to not just understand its performance, but also use it to run the experiments we may want for a scientific paper. I’ll demonstrate how to go from daily pull requests with continuous performance tracking to generating plots and statistics for direct inclusion in LaTeX.
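To give a flavor of the last step, here is a minimal sketch of turning raw measurements into a statistic and a LaTeX macro. It is not the tooling from the lecture: the function names, the macro name, and the sample data are invented for illustration, and a percentile bootstrap over the ratio of medians is just one of several reasonable choices.

```python
import random
import statistics


def bootstrap_ratio_ci(baseline, candidate, n_resamples=2000,
                       confidence=0.95, seed=42):
    """Percentile-bootstrap interval for median(candidate) / median(baseline)."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    ratios = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))    # resample with replacement
        c = rng.choices(candidate, k=len(candidate))
        ratios.append(statistics.median(c) / statistics.median(b))
    ratios.sort()
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return ratios[lo_idx], ratios[hi_idx]


def latex_macro(name, value):
    """Format a number as a LaTeX macro definition."""
    return f"\\newcommand{{\\{name}}}{{{value:.2f}}}"


# Invented run times in milliseconds, not real measurements.
old_ms = [101.2, 99.8, 100.5, 102.1, 98.9, 100.0, 101.7, 99.4]
new_ms = [80.3, 81.1, 79.6, 82.0, 80.8, 79.9, 81.5, 80.1]

ratio = statistics.median(new_ms) / statistics.median(old_ms)
lower, upper = bootstrap_ratio_ci(old_ms, new_ms)
macro = latex_macro("MedianRatio", ratio)
```

Writing such macros to a file that the paper includes means the numbers in the text change whenever the experiments are rerun, without anyone copying values by hand. A confidence interval alone does not rule out the systematic effects mentioned above, such as frequency scaling or garbage collection kicking in for only one of the two variants.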

Tags: metacompilation, compilation, garbage collection, debugging, benchmarking

Programming Language Implementation: In Theory, We Understand. In Practice, We Wish We Would.

Categories: Personal February 2, 2026

It’s February! This means I have been at the JKU for four months. Four months of teaching Compiler Construction and System Software, lots of new responsibilities (most notably signing off on telephone bills and coffee orders…), many new colleagues, and new things to learn for me, not least because of the very motivated students and PhD students here. And when I say motivated, yes, I am very surprised. While the attendance of my 8:30am Compiler Construction lectures was declining throughout the term as expected, the students absolutely aced their exam. I suspect I will have to make it harder next year. Much harder… hmmm 🤔 The good results can likely be attributed in large part to the very extensive exercise sessions run by my colleagues throughout the semester.

At this point, I have to send a big thank you to everyone from the Institute for System Software, past and present. It’s great to be part of such a team! You made my start very easy, and, well, it now gives me the time to think about my inaugural lecture.

What’s an inaugural lecture?

I have been in academia for almost two decades, but I have to admit, I don’t really remember being at an inaugural lecture. According to Wikipedia, in the Germanic tradition an inaugural lecture (Antrittsvorlesung) is these days something of a celebration. It’s a festive occasion for a new professor to present their field to a wider audience, possibly also presenting their research vision.

At the JKU, it indeed seems to be planned as a festive occasion, too.

On March 9th, 2026, starting at 4pm, Prof. Bernhard Aichernig and I will give our Antrittsvorlesungen, and you are cordially invited to attend.

Bernhard will give a talk titled Verification, Falsification, and Learning – a Triptych of Formal Methods for Trustworthy IT Systems.

My own talk is titled, as is this post: Programming Language Implementation: In Theory, We Understand. In Practice, We Wish We Would.

Bernhard will start out by looking at the formal side of things, making the connection between proving correctness, testing systems in the context of where they are used, and learning models from observable data. My talk will narrow in on language implementations, but also look at how formal correctness is helping us there. Unfortunately, provably-correct systems still elude us for many practical languages. Even worse, we are at a point where we rarely understand what’s going on in enough detail to improve performance or perhaps fix certain rare bugs.

If you would like to attend, please register here.

In Theory, We Understand. In Practice, We Wish We Would

Here’s the abstract of my talk:

Our world runs on software, but we understand it less and less. In practice, the complexity of modern systems drains your phone’s battery faster, increases the cost of hosting applications, and consumes unnecessary resources, for instance, in AI systems. All because we do not truly understand our systems any longer. Still, at a basic level, we can fully understand how computers work, from transistors to processors, machine language, all the way up to high-level programming languages.

The convenience of contemporary programming languages is however bought with complexity. Over the last two decades, I admit, I added to that complexity. In the next two decades, I hope we can learn to build programming languages in ways that we can prove to be correct, enable us to generate their implementations automatically, and let systems select optimizations in a way that we can still understand the implications for software running on top of it.

You may now wonder where to go from here. And that’s a very good question. I have another month to figure that out, perhaps more… 😅

So, maybe see you in March?

Until then, suggestions, questions, and complaints, as usual on Mastodon, BlueSky, and Twitter.

Tags: personal, research, teaching, linz

Python, Is It Being Killed by Incremental Improvements?

Categories: Research January 20, 2026

Over the past years, two major players have invested in the future of Python. Microsoft’s Faster CPython team has pushed ahead with impressive performance improvements for the CPython interpreter, which has gotten at least 2x faster since Python 3.9. They also have a baseline JIT compiler for CPython. At the same time, Meta has worked hard on making free-threaded Python a reality to bring classic shared-memory multithreading to Python, without being limited by the still standard Global Interpreter Lock, which prevents true parallelism.

Both projects deliver major improvements to Python, and the wider ecosystem. So, it’s all great, or is it?

In my talk on this topic at SPLASH, which is now online, I discussed some of the aspects the Python core developers and wider community seem to not regard with the same urgency as I would hope for. Concurrency makes me scared, and I strongly believe the Python ecosystem should be scared, too, or look forward to the 2030s being “Python’s Decade of Concurrency Bugs”.

In the talk, I started out by reviewing some of the changes in observable language semantics between Python 3.9 and today and discussed their implications. I previously discussed the changes around the global interpreter lock in my post on the changing “guarantees”. In the talk, I also used an example from a real bug report to illustrate the semantic changes:

request_id = self._next_id
self._next_id += 1

It looks simple, but reveals quite profound differences between Python versions.
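To make the problem concrete, here is a minimal sketch of how the two statements can go wrong. It is not the code from the bug report: the classes `IdAllocator` and `LockedIdAllocator` are hypothetical, and the bad interleaving is replayed by hand rather than produced by real threads, so that the outcome is deterministic.

```python
import threading


class IdAllocator:
    """Hypothetical class wrapping the two statements from above."""

    def __init__(self):
        self._next_id = 0

    def allocate(self):
        request_id = self._next_id   # read
        # Another thread may run here and read the same value.
        self._next_id += 1           # read-modify-write
        return request_id


class LockedIdAllocator(IdAllocator):
    """Same logic, but the read and the increment form one critical section."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            return super().allocate()


def replay_bad_interleaving():
    """Replay, by hand, one possible interleaving of two threads A and B."""
    alloc = IdAllocator()
    id_a = alloc._next_id    # A reads 0
    id_b = alloc._next_id    # B reads 0 before A has incremented
    alloc._next_id += 1      # A increments
    alloc._next_id += 1      # B increments
    return id_a, id_b


def allocate_concurrently(alloc, n_threads=4, per_thread=1000):
    """Let several real threads allocate IDs and collect all of them."""
    results = [[] for _ in range(n_threads)]

    def worker(bucket):
        for _ in range(per_thread):
            bucket.append(alloc.allocate())

    threads = [threading.Thread(target=worker, args=(b,)) for b in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for bucket in results for i in bucket]


duplicate_ids = replay_bad_interleaving()
ids = allocate_concurrently(LockedIdAllocator())
```

In the replayed interleaving, both requests end up with ID 0. With the lock, all 4000 IDs are distinct on any interpreter. Whether the unlocked version shows duplicates in practice depends on the Python version and on whether the interpreter runs with the Global Interpreter Lock, which is exactly why the snippet is interesting.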

Since I have some old ideas lying around, I also propose a way forward. In practice though, this isn’t a small well-defined engineering or research project. So, I hope I can inspire some of you to follow me down the rabbit hole of Python’s free-threaded future.

Incidentally, the latest release of TruffleRuby now uses many of the techniques that would be useful for Python. Benoit Daloze implemented them during his PhD and we originally published the ideas back in 2018.

Questions, pointers, and suggestions are always welcome, for instance, on Mastodon, BlueSky, or Twitter.

Slides

Tags: python, research, concurrency, concurrency models, cpython, interpreters, language implementation, dynamic languages, language design, parallelism, presentation

Older Posts

Nov 17, 2025 Benchmarking Language Implementations: Am I doing it right? Get Early Feedback!
Oct 15, 2025 Can We Know Whether a Profiler is Accurate?
Oct 1, 2025 First Day: A New Chapter at the JKU
Aug 27, 2025 How to Slow Down a Program? And Why it Can Be Useful.
Jul 31, 2025 It's Thursday, and My Last* Day at Kent

Stefan-Marr.de

Stefan Marr

Full Professor (Universitätsprofessor)
