<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stefan-Marr.de &#187; Research</title>
	<atom:link href="http://soft.vub.ac.be/~smarr/tag/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://soft.vub.ac.be/~smarr</link>
	<description>personal and research notes</description>
	<lastBuildDate>Tue, 24 Jan 2012 18:41:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Using R to Understand Benchmarking Results</title>
		<link>http://soft.vub.ac.be/~smarr/2011/09/using-r-to-understand-benchmarking-results/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=using-r-to-understand-benchmarking-results</link>
		<comments>http://soft.vub.ac.be/~smarr/2011/09/using-r-to-understand-benchmarking-results/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 11:15:18 +0000</pubDate>
		<dc:creator>Stefan</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[RoarVM]]></category>
		<category><![CDATA[Smalltalk]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Virtual Machines]]></category>
		<category><![CDATA[VM]]></category>

		<guid isPermaLink="false">http://soft.vub.ac.be/~smarr/?p=441</guid>
		<description><![CDATA[Why R? Evaluating benchmark results with Excel became too cumbersome and error prone for me so that I needed an alternative. Especially, reevaluating new data for the same experiments was a hassle. However, the biggest problem with Excel was that I did not know a good way to query the raw data sets and group [...]]]></description>
			<content:encoded><![CDATA[<script type="text/javascript" src="http://soft.vub.ac.be/~smarr/renaissance/code/shCore.js"></script>
<script type="text/javascript" src="http://soft.vub.ac.be/~smarr/renaissance/code/shBrushR.js"></script>
<link type="text/css" rel="stylesheet" href="http://soft.vub.ac.be/~smarr/renaissance/code/shCoreDefault.css"/>

<h2>Why R?</h2>

<p>Evaluating benchmark results with Excel became
   too cumbersome and error-prone for me, so I needed an alternative.
   In particular, reevaluating new data for the same experiments was a hassle.
   However, the biggest problem with Excel was that I did not know a good
   way to query the raw data sets and group results easily to
   be able to answer different kinds of questions about the data.
   Thus, I decided I needed to learn how to do it in a better way.
   Since I was not really happy with the debuggability, traceability,
   and reusability of my spreadsheets either, I gave up on Excel entirely.
   I bet Excel can do most of the things I need, but I wanted a text-based
   solution for which I can use my normal tools, too.</p>

<p>While working on <a href="https://github.com/smarr/ReBench">ReBench</a>,
   I had already come into contact with
   <a href="http://matplotlib.sourceforge.net/">matplotlib</a> to generate
   simple graphs from benchmark results automatically.
   But Python does not feel like the ultimate language for what I was
   looking for either. Instead, <a href="http://www.r-project.org/">R</a> was
   mentioned from time to time when it came to statistical evaluation of
   measurements. And, at least for me, it turned out to be an interesting
   language with an enormous number of libraries. Actually, a bit
   too enormous for my little needs, but it looked like a good starting
   point to brush up on my statistics knowledge.</p>

<p>By now, I use it regularly and have applied it to a number of problems,
   including my work on the <a href="https://github.com/smarr/RoarVM">RoarVM</a>
   and a paper about
   <a href="http://www.hpi.uni-potsdam.de/hirschfeld/projects/som/index.html#csompl">CSOM/PL</a>.</p>
   
<h2>Benchmark Execution</h2>

<p>Before we can analyze any benchmark results, we need a reliable way to
   execute the benchmarks, ideally as reproducibly as possible. For that purpose,
   I use a couple of tools:</p>
   <dl>
     <dt><a href="https://github.com/smarr/ReBench">ReBench</a></dt>
     <dd>A Python application that executes benchmarks based on a given
         configuration file that defines the variables to be varied,
         the executables to be used and the benchmarks and their
         parameters.</dd>
     <dt><a href="http://www.squeaksource.com/SMark.html">SMark</a></dt>
     <dd>A Smalltalk benchmarking framework that allows one to write
         benchmarks in a style similar to how unit-tests are written.</dd>
     <dt><a href="https://github.com/tobami/codespeed">Codespeed</a></dt>
     <dd>This is mostly for the bigger picture, a web application
         that provides the basic functionality to track long-term
         performance of an application.<br/><br/></dd>
  </dl>

<p>Besides having good tools, serious benchmarking requires some background
   knowledge. Today&#8217;s computer systems are just too complex to get good
   results with the most naive approaches.
   In that regard, I highly recommend reading the following two
   papers, which discuss many of the pitfalls that one encounters when
   working with modern virtual machines, but also on any modern
   operating system and with state-of-the-art processor tricks.</p>
   
   <ul>
     <li><em>Georges, A.; Buytaert, D. &#038; Eeckhout, L.</em><br/>
         <a href="http://dx.doi.org/10.1145/1297105.1297033">Statistically Rigorous Java Performance Evaluation</a><br/>
         SIGPLAN Not., ACM, 2007, 42, 57-76. (<a href="http://itkovian.net/base/files/papers/oopsla2007-georges-preprint.pdf">PDF</a>)</li>
     <li><em>Blackburn, S. M.; McKinley, K. S.; Garner, R.; Hoffmann, C.; Khan, A. M.; Bentzur, R.; Diwan, A.; Feinberg, D.; Frampton, D.; Guyer, S. Z.; Hirzel, M.; Hosking, A.; Jump, M.; Lee, H.; Moss, J. E. B.; Phansalkar, A.; Stefanović, D.; VanDrunen, T.; von Dincklage, D. &#038; Wiedermann, B.</em><br/>
         <a href="http://dx.doi.org/10.1145/1378704.1378723">Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century</a><br/>
         Commun. ACM, ACM, 2008, 51, 83-89. (<a href="http://domino.watson.ibm.com/comm/research_people.nsf/pages/hirzel.index.html/$FILE/cacm08-dacapo-benchmarks.pdf">PDF</a>)</li>
   </ul>
   

<h2>Getting Started with R</h2>

<p>Now, we need to find the right tools to get started with R.
   After searching a bit, I quickly settled on
   <a href="http://www.rstudio.org/">RStudio</a>. It feels pretty much like
   a typical IDE with a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">REPL</a>
   at its heart. As you can see in the screenshot
   below, there are four important areas. Top-left are your *.R files
   in a code editor with syntax highlighting and code completion. Top-right
   is your current R workspace with all defined variables and functions.
   It also lets you view and explore your current data set easily.
   On the bottom-right, there is a pane that usually contains either
   help texts or plots.
   Last but not least, in the bottom-left corner is the actual REPL.</p>

<p><a href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/RStudio.png"><img
class="aligncenter size-medium wp-image-444" title="Screenshot of RStudio"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/RStudio-300x187.png"
alt="Screenshot of RStudio" width="300" height="187" /></a></p>

<p><code>Rscript</code> is the interpreter that can be used on the command line
   and is thus nice to have for automating tasks. I installed it using MacPorts.</p>

<p>This introduction is accompanied by a
   <a href="https://github.com/smarr/BenchR/blob/master/using-R-to-understand-benchmarking-results.R">script</a>.
   It contains all the code discussed here and can be used to follow the
   introduction without too much copy and paste. To stay a little more concise,
   the code shown here is not as complete as the script.</p>

<p>The first thing to do after setting up R is to install some libraries.
   This can be done easily from the REPL by executing the following code:
   <code>install.packages("plyr")</code>. This will
   install the <code>plyr</code> library providing <code>ddply()</code>,
   which we are going to use later to process our data set.
   Before it is going to install the code, it will ask for a mirror nearby
   to download the necessary files.
   To visualize the benchmark results afterwards, we are going to use bean plots
   from the <code>beanplot</code> library. The <code>doBy</code> library
   provides some convenience functions from which we are going to use
   <code>orderBy()</code>.
   After installing all libraries, they can be loaded by executing
   <code>library(lib_name)</code>.</p>

<p>The documentation for functions, libraries, and other topics is always
   just a question mark away. For instance, to find out how libraries are installed,
   execute the following code in the REPL: <code>?install.packages</code>.</p>

<h2>Preprocessing Benchmark Results</h2>

<p>To give a little context, the experiment this introduction is based on
   tries to evaluate the impact of using an object table instead of direct
   references on the performance of our manycore Smalltalk, the RoarVM.</p>

<p>The data set is available here:
   <a href="http://stefan-marr.de/downloads/object-table-data.csv.bz2">object-table-data.csv.bz2</a>.
   Download it together with the <a href="https://raw.github.com/smarr/BenchR/master/using-R-to-understand-benchmarking-results.R">script</a> and
   put it into the same folder.
   After opening the script in RStudio, you can follow the description here
   while using the code in RStudio directly. The run-button will execute a
   line or selection directly in the REPL.</p>

<h3>Loading Data</h3>

<p>First, change the working directory of R to the directory where you put
   the data file: <code>setwd("folder/with/data-file.csv.bz2/")</code>.
   The file is a compressed file of separated values (read as tab-separated,
   as the <code>sep="\t"</code> argument below shows, despite the
   <code>.csv</code> extension) that contains
   benchmark results, but no header line. Furthermore, some of the rows do not
   contain data for all columns, but this is not relevant here and we can just
   fill the gaps automatically.
   Load the compressed data directly into R&#8217;s workspace by
   evaluating the following code:</p>
   
<pre class="brush: r; toolbar: false;">
bench &lt;- read.table("object-table-data.csv.bz2",
                    sep="\t", header=FALSE,
                    col.names=c("Time", "Benchmark", "VirtualMachine",
                                "Platform", "ExtraArguments", "Cores",
                                "Iterations", "None", "Criterion",
                                "Criterion-total"),
                    fill=TRUE)
</pre>
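
<p>To see what <code>fill=TRUE</code> does in isolation, here is a minimal
   sketch on made-up input rather than the real data set. The two inline rows
   and the shortened column list are assumptions for illustration only;
   the second row deliberately lacks its last field.</p>

<pre class="brush: r; toolbar: false;">
# Two made-up, tab-separated rows; the second one has no VirtualMachine field
raw &lt;- "12.5\tBenchA\tvm-OT-full\n13.1\tBenchB\n"

demo &lt;- read.table(text = raw, sep = "\t", header = FALSE,
                   col.names = c("Time", "Benchmark", "VirtualMachine"),
                   fill = TRUE)

# Both rows are read; the missing field is padded instead of causing an error
str(demo)
</pre>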

<p>As you can see in the workspace on the right, the variable <code>bench</code>
   now contains 60432 observations for 10 variables. Type the following
   into the REPL to get an idea of what kind of data we are looking at:
   <code>summary(bench)</code>, or <code>View(bench)</code>.
   Just typing <code>bench</code> into the REPL will print the plain content,
   too.
   As you will see, every row represents one measurement with a set of
   properties, like the executed benchmark, its parameters, and the virtual 
   machine used.
</p>

<h3>Transforming Raw Data</h3>

<p> As a first step, we will do some parsing and rearrange the data to
    be able to work with it more easily. The name of the virtual machine binary
    encodes a number of properties that we want to be able to access directly.
    Thus, we split up that information into separate columns. </p>

<pre class="brush: r; toolbar: false;">
bench &lt;- ddply(bench,
               ~ VirtualMachine, # this formula groups the data by the value in VirtualMachine
               transform,
               # the second part of the VM name indicates whether it uses the object table
               ObjectTable = strsplit(as.character(VirtualMachine), "-")[[1]][2] == "OT",
               # the third part indicates the format of the object header
               Header = factor(strsplit(as.character(VirtualMachine), "-")[[1]][3]))  
</pre>

<p> This operation could use some more explanation, but the most important
    thing to know for now is that we add two new columns and that the data
    is processed in groups, one for each value in the <code>VirtualMachine</code>
    column. </p>
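
<p> The parsing step itself can be tried out on a single string. The following
    is only a sketch: the name <code>rvm-OT-full</code> is a hypothetical example
    of a VM binary name, chosen to match the pattern the code above assumes
    (second part for the object table, third part for the header format). </p>

<pre class="brush: r; toolbar: false;">
# Hypothetical VM binary name, not taken from the data set
vm_name &lt;- "rvm-OT-full"

# strsplit() returns a list with one entry per input string; [[1]] takes the first
parts &lt;- strsplit(vm_name, "-")[[1]]

uses_object_table &lt;- parts[2] == "OT"   # second part: object table or not
header_format     &lt;- parts[3]           # third part: format of the object header
</pre>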

<p> The data set also contains results from another experiment, and some of the
    benchmarks include very detailed information. None of that information
    is required at the moment, so we can drop the irrelevant data points
    by using <code>subset</code>. </p>

<pre class="brush: r; toolbar: false;">
bench &lt;- subset(bench,
                Header == "full"         # concentrate on the VM with full object headers
                 &#038; Criterion == "total", # use only total values of a measurement
                select=c(Time, Platform, # use only a limited number of columns
                         ObjectTable, Benchmark, ExtraArguments, Cores))
</pre>

<p> Furthermore, we assume that all variations in measurements come from the
    same non-deterministic influences. This allows us to order the measurements
    before correlating them pair-wise. The data is ordered based on all columns,
    where <code>Time</code> is used as the first ordering criterion. </p>

<pre class="brush: r; toolbar: false;">
bench &lt;- orderBy(~Time + Platform + ObjectTable + Benchmark + ExtraArguments + Cores, bench)
</pre>
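
<p> For readers without the <code>doBy</code> library at hand, ordering by several
    columns can also be sketched with base R&#8217;s <code>order()</code>.
    The small data frame below is made up for illustration and uses only two
    of the columns. </p>

<pre class="brush: r; toolbar: false;">
# Made-up measurements, deliberately unsorted
toy &lt;- data.frame(Time  = c(3.2, 1.5, 3.2),
                  Cores = c(2, 1, 1))

# Sort by Time first; ties in Time are resolved by Cores
sorted &lt;- toy[order(toy$Time, toy$Cores), ]
sorted
</pre>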

<h3>Normalizing Data</h3>

<p> In the next step, we can calculate the speed ratio between pairs of
    measurements. To that end, we group the data based on the remaining variables
    and divide the measured runtime by the corresponding measurement of the VM
    that uses an object table. Afterwards, we drop the measurements for the VM
    with an object table, since their speed ratio is always 1 by construction.
</p>

<pre class="brush: r; toolbar: false;">
norm_bench &lt;- ddply(bench, ~ Platform + Benchmark + ExtraArguments + Cores,
                    transform,
                    SpeedRatio = Time / Time[ObjectTable == TRUE])
norm_bench &lt;- subset(norm_bench, ObjectTable == FALSE, c(SpeedRatio, Platform, Benchmark, ExtraArguments, Cores))
</pre>
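
<p> The idea behind the normalization can be checked by hand on a tiny example.
    This is a simplified sketch in base R with made-up numbers: it has exactly
    one measurement per benchmark and VM variant, so it does not reproduce the
    pair-wise matching of repeated measurements done above. </p>

<pre class="brush: r; toolbar: false;">
# Made-up runtimes: one measurement per benchmark and VM variant
toy &lt;- data.frame(Benchmark   = c("A", "A", "B", "B"),
                  ObjectTable = c(TRUE, FALSE, TRUE, FALSE),
                  Time        = c(10, 8, 20, 25))

# Baseline: runtime of the VM with object table, indexed by benchmark name
baseline &lt;- setNames(toy$Time[toy$ObjectTable],
                     as.character(toy$Benchmark[toy$ObjectTable]))

# Divide every runtime by the baseline of its own benchmark
toy$SpeedRatio &lt;- toy$Time / unname(baseline[as.character(toy$Benchmark)])

# Keep only the rows of the VM without object table
toy_norm &lt;- subset(toy, ObjectTable == FALSE, c(Benchmark, SpeedRatio))
toy_norm
</pre>

<p> Here, benchmark A ends up with a ratio of 0.8 (faster without object table)
    and benchmark B with 1.25 (slower without object table). </p>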

<h2>Analyzing the Data</h2>

<h3>The Basics</h3>

<p> Now we are at a point where we can start to make sense of the benchmark
    results.
    The summary function now provides a useful overview of the data, and
    we can, for instance, concentrate on the speed ratio alone.
    As you might expect, you can also get properties like the standard
    deviation, arithmetic mean, and the median easily: </p>

<pre class="brush: r; toolbar: false;">
summary(norm_bench$SpeedRatio)
sd(norm_bench$SpeedRatio)
mean(norm_bench$SpeedRatio)
median(norm_bench$SpeedRatio)
</pre>

<p> However, that is very simple. A bit more interesting are R&#8217;s features
    to query and process the data. The first question we are interested
    in is whether the number of cores actually has an impact on the
    performance difference: </p>

<pre class="brush: r; toolbar: false;">
summary(norm_bench$SpeedRatio[norm_bench$Cores==1])
summary(norm_bench$SpeedRatio[norm_bench$Cores==16])
</pre>
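
<p> Instead of querying each core count separately, a statistic can also be
    computed for all groups at once, for example with base R&#8217;s
    <code>tapply()</code>. The sketch below uses made-up speed ratios, not the
    measured ones. </p>

<pre class="brush: r; toolbar: false;">
# Made-up speed ratios for two core counts
toy &lt;- data.frame(Cores      = c(1, 1, 16, 16),
                  SpeedRatio = c(0.9, 1.1, 0.7, 0.8))

# Median speed ratio per number of cores
per_core &lt;- tapply(toy$SpeedRatio, toy$Cores, median)
per_core
</pre>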

<h3>Beanplots</h3>

<p> Staring at numbers is sometimes informative, but usually only useful
    for small data sets.
    Since we humans are better at recognizing patterns in visual representations,
    <a href="http://www.jstatsoft.org/v28/c01/paper">beanplots</a> are a better
    way to make sense of the data. They are an
    elegant visualization of the distribution of measurements. </p>

<pre class="brush: r; toolbar: false;">
beanplot(SpeedRatio ~ Platform,
         data = norm_bench,
         what = c(1,1,1,0), log="",
         ylab="Runtime: noOT/OT",
         las=2)
</pre>

<p><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/overall-distribution-beanplot.png"><img
class="aligncenter size-medium wp-image-454" title="Beanplot of the overall
distribution of all measurements"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/overall-distribution-beanplot-300x180.png"
alt="" width="300" height="180" /></a></p>


<p> This beanplot tells us, like the numbers before, that the mean is smaller
    than 1, which is the expected result and means that the indirection
    of an object table slows the VM down. However, the numbers did not tell
    us that there are various clusters of measurements, which now become visible
    in the beanplot and are worth investigating. </p>

<p> Let&#8217;s look at the results split up by number of cores. </p>

<pre class="brush: r; toolbar: false;">
beanplot(SpeedRatio ~ Cores,
         data = norm_bench,
         what = c(1,1,1,0), log="",
         ylab="Runtime: noOT/OT")
</pre>

<p><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-cores.png"><img
class="aligncenter size-medium wp-image-455" title="Distribution of
measurements split up by cores"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-cores-300x180.png"
alt="" width="300" height="180" /></a></p>

<h3>Digging Deeper</h3>

<p> Besides noticing that the variant without object table is faster
    for more cores, we also see that the speed ratios are distributed unevenly.
    While the bean for 1 core has a two-part shape, the bean for 2 cores has a lot
    more points where measurements clump.
    That is probably related to the benchmarks: </p>

<pre class="brush: r; toolbar: false;">
beanplot(SpeedRatio ~ Benchmark,
         data = subset(norm_bench, Cores==2),
         what = c(1,1,1,0), log="",
         ylab="Runtime: noOT/OT", las=2)
beanplot(SpeedRatio ~ Benchmark,
         data = subset(norm_bench, Cores==16),
         what = c(1,1,1,0), log="",
         ylab="Runtime: noOT/OT", las=2)
</pre>

<p class="aligncenter"><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-benchmarks-cores2.png"><img
class="size-thumbnail wp-image-457" title="Results for 2 cores"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-benchmarks-cores2-150x150.png"
alt="" width="150" height="150" /></a>

<a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-benchmarks-cores16.png"><img
class="size-thumbnail wp-image-458" title="Results for 16 cores"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/distribution-by-benchmarks-cores16-150x150.png"
alt="" width="150" height="150" /></a></p>

<p> Those two graphs are somewhat similar, but you might notice that the float
    loop benchmark has a couple of strong outliers.
    I forgot the exact identifier of the float loop benchmark, so let&#8217;s find out
    with this code: <code>levels(norm_bench$Benchmark)</code><br/>
    Now let&#8217;s filter the data by that benchmark. </p>

<pre class="brush: r; toolbar: false;">
beanplot(SpeedRatio ~ Cores,
         data = subset(norm_bench, Benchmark == "SMarkLoops.benchFloatLoop"),
         what = c(1,1,1,0), log="", ylab="Runtime: noOT/OT")
</pre>

<p><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/float-bench-per-core.png"><img
class="aligncenter size-medium wp-image-463" title="Results for the float loop
benchmark split up by cores"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/float-bench-per-core-300x180.png"
alt="" width="300" height="180" /></a></p>


<p> This visualization confirms my suspicion that there is something strange going on.
    The distribution of results still looks clumped, as if there is another
    parameter influencing the results that we have not considered yet.
    The only column we have not looked at is <code>ExtraArguments</code>,
    so we will add it. Note that <code>droplevels()</code> is applied to the
    data set this time before giving it to the <code>beanplot()</code> function.
    This is necessary since the plot would otherwise contain all unused factor
    levels as well, which would reduce readability considerably. </p>

<pre class="brush: r; toolbar: false;">
beanplot(SpeedRatio ~ Cores + ExtraArguments,
         data = droplevels(subset(norm_bench, Benchmark == "SMarkLoops.benchFloatLoop")),
         what = c(1,1,1,0), log="", ylab="Runtime: noOT/OT", las=2)
</pre>
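
<p> The effect of <code>droplevels()</code> is easy to see on a small factor.
    The data frame in this sketch is made up; the benchmark names are
    placeholders, not the identifiers used in the data set. </p>

<pre class="brush: r; toolbar: false;">
# Made-up results with two benchmark names stored as a factor
toy &lt;- data.frame(Benchmark  = factor(c("FloatLoop", "IntLoop", "FloatLoop")),
                  SpeedRatio = c(0.6, 1.0, 0.7))

float_only &lt;- subset(toy, Benchmark == "FloatLoop")

levels(float_only$Benchmark)              # subset() keeps the unused level "IntLoop"
levels(droplevels(float_only)$Benchmark)  # only the level that is actually used
</pre>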

<p><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/float-bench-per-core-and-extra-arguments.png"><img
class="aligncenter size-medium wp-image-464" title="Results of the float loop
benchmark split up by cores and extra benchmark arguments"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/float-bench-per-core-and-extra-arguments-300x180.png"
alt="" width="300" height="180" /></a></p>


<h3>Fixing up a Mistake</h3>

<p> This plot now shows us that there are three groups of results with
    different <code>ExtraArguments</code>.
    I think I forgot that the data set contains some very specific benchmarks.
    The overall goal of the benchmarks is to test the weak-scaling behavior of
    the VM by increasing the workload and the number of cores together.
    For the questions we are interested in here, only those weak-scaling
    benchmarks are relevant.
    Thus, we need to filter out a few more data points, since the results
    will be unnecessarily biased otherwise.
    To filter the data points we use <code>grepl</code>.
    It matches the strings of <code>ExtraArguments</code> and allows us to
    filter out the single-core and the 10x-load benchmarks.</p>

<pre class="brush: r; toolbar: false;">
norm_bench &lt;- subset(norm_bench,
                     !grepl("^1 ", ExtraArguments)    # those beginning with "1" put load on a single core
                     &#038; !grepl("s0 ", ExtraArguments)) # those having "s0" in it put 10x load on each core
norm_bench &lt;- droplevels(norm_bench)
</pre>
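
<p> How the two patterns behave can be tested on a few strings first.
    The argument strings in this sketch are hypothetical; the real format of
    <code>ExtraArguments</code> in the data set may differ. </p>

<pre class="brush: r; toolbar: false;">
# Hypothetical argument strings, for illustration only
extra_args &lt;- c("1 s1 100", "8 s0 100", "8 s1 100", "16 s1 100")

single_core &lt;- grepl("^1 ", extra_args)   # TRUE only for strings starting with "1 "
ten_x_load  &lt;- grepl("s0 ", extra_args)   # TRUE for strings containing "s0 "

# Keep only the strings that match neither pattern
kept &lt;- extra_args[!single_core &amp; !ten_x_load]
kept
</pre>

<p> Note that the anchored pattern <code>"^1 "</code> does not match
    <code>"16 s1 100"</code>, because the space after the 1 is part of the pattern. </p>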

<p> And since I always wanted to say that: 
    The exercise of regenerating all interesting graphs and re-answering the
    original questions is left to the interested reader. <img src='http://soft.vub.ac.be/~smarr/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>

<h2>Conclusion</h2>

<p> As a final graph, we can plot all benchmarks for all cores
    and get an overview. The <code>par</code> function allows us to adapt the
    margins of the plot, which is necessary here to get the full benchmark
    names on the plot. </p>

<pre class="brush: r; toolbar: false;">
par(mar=c(20, 4, 1, 1))
beanplot(SpeedRatio ~ Cores + Benchmark,
         data = norm_bench,
         what = c(1,1,1,0), log="",  ylab="Runtime: noOT/OT", las=2)
</pre>

<p><a
href="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/final-overview.png"><img
class="aligncenter size-medium wp-image-465" title="Overview split by number
of cores and benchmarks"
src="http://soft.vub.ac.be/~smarr/wp-content/uploads/2011/09/final-overview-300x180.png"
alt="" width="300" height="180" /></a></p>

<p> As we already knew, we see an influence of the number of cores on the
 results, but more importantly, we see most benchmarks benefiting from removing
 the extra indirection through the object table. The float loops benefit
 the most by far. Float objects are so small and usually so short-lived
 that avoiding the object table pays off.
 For the integer loops it does not make a difference, since the VM uses
 immediate values (tagged integers). Thus, the integers used here are not
 objects allocated in the heap and the object table is not used either. </p>

<p> Beyond the insight gained into the performance implications of an object table,
    this analysis also demonstrates the benefits of using a language like
    R. Its language features allow us to filter and reshape data easily.
    Furthermore, regenerating plots and tracing steps becomes easy, too.
    Here it was necessary since some data points needed to be removed from
    the data set to get reasonable results. Reexecuting part of the script
    or just exploring the data is convenient and quick,
    which allows me to ask more questions about the data and understand the
    measurements more deeply. Furthermore, it is less of a hassle to reassess
    the data in case certain assumptions have changed, we made a mistake,
    or the data set changed for other reasons.
    From my experience, it is much more convenient than Excel, but that might
    just be because I spent more time on learning R than on learning Excel. </p>

<p> In case you try it out yourself, you will certainly want to experiment
    with other types of visualizations, save them to files, etc.
    A few of these things can be found in my benchmarking scripts on
    GitHub: <a href="https://github.com/smarr/BenchR">BenchR</a>.</p>

<script type="text/javascript">SyntaxHighlighter.all();</script>
]]></content:encoded>
			<wfw:commentRss>http://soft.vub.ac.be/~smarr/2011/09/using-r-to-understand-benchmarking-results/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Price of the Free Lunch: Programming in the Multicore Era</title>
		<link>http://soft.vub.ac.be/~smarr/2010/12/the-price-of-the-free-lunch-programming-in-the-multicore-era/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-price-of-the-free-lunch-programming-in-the-multicore-era</link>
		<comments>http://soft.vub.ac.be/~smarr/2010/12/the-price-of-the-free-lunch-programming-in-the-multicore-era/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 22:49:30 +0000</pubDate>
		<dc:creator>Stefan</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Actors]]></category>
		<category><![CDATA[automatic parallelization]]></category>
		<category><![CDATA[brussels]]></category>
		<category><![CDATA[free lunch]]></category>
		<category><![CDATA[locks]]></category>
		<category><![CDATA[Manycore]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[over]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[pecha kucha]]></category>
		<category><![CDATA[software languages lab]]></category>
		<category><![CDATA[Virtual Machines]]></category>
		<category><![CDATA[vub]]></category>

		<guid isPermaLink="false">http://soft.vub.ac.be/~smarr/?p=388</guid>
		<description><![CDATA[Last Friday was the annual Lab event of our Software Languages Lab. Like last year, many people related to the lab in one or the other way came to get an overview of what the current topics of our research are. This year, we presented our research in the form of a Pecha Kucha talk. [...]]]></description>
			<content:encoded><![CDATA[<p>Last Friday was the annual Lab event of our Software Languages Lab. Like last year, many people related to the lab in one way or another came to get an overview of the current topics of our research.</p>
<p>This year, we presented our research in the form of Pecha Kucha talks. That means every presenter got 20 slides, and each slide was shown for exactly 20 seconds. That gives enough time to convey the general idea, but avoids boring people with endless technical details.</p>
<p>All in all, that worked out pretty well.</p>
<p>My talk gave a very high-level overview of what the Parallel Programming Group is up to. It motivates why we are doing research in languages and language runtimes/virtual machines, and names our approaches to tackling the challenges. Well, for researchers in the field that is probably too vague, but everyone else might get just enough out of it to see in which direction we are going.</p>
<p>&nbsp;</p>

<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/yiwitL5AFR8?hl=en&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/yiwitL5AFR8?hl=en&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>&nbsp;</p>

<div id="__ss_6066686" style="width: 425px;"><strong><a title="The Price of the Free Lunch: Programming in the Multicore Era" href="http://www.slideshare.net/gron/the-price-of-the-free-lunch-programming-in-the-multicore-era">The Price of the Free Lunch: Programming in the Multicore Era</a></strong><object id="__sse6066686" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=pechakutcha-2010-004-101207163906-phpapp02&amp;stripped_title=the-price-of-the-free-lunch-programming-in-the-multicore-era&amp;userName=gron" /><param name="name" value="__sse6066686" /><param name="allowfullscreen" value="true" /><embed id="__sse6066686" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=pechakutcha-2010-004-101207163906-phpapp02&amp;stripped_title=the-price-of-the-free-lunch-programming-in-the-multicore-era&amp;userName=gron" name="__sse6066686" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
]]></content:encoded>
			<wfw:commentRss>http://soft.vub.ac.be/~smarr/2010/12/the-price-of-the-free-lunch-programming-in-the-multicore-era/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Workshops at SPLASH 2010</title>
		<link>http://soft.vub.ac.be/~smarr/2010/10/workshops-at-splash-2010/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=workshops-at-splash-2010</link>
		<comments>http://soft.vub.ac.be/~smarr/2010/10/workshops-at-splash-2010/#comments</comments>
		<pubDate>Sat, 30 Oct 2010 08:54:03 +0000</pubDate>
		<dc:creator>Stefan</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[Notes]]></category>
		<category><![CDATA[SPLASH]]></category>
		<category><![CDATA[VM]]></category>
		<category><![CDATA[VMIL]]></category>

		<guid isPermaLink="false">http://soft.vub.ac.be/~smarr/?p=362</guid>
		<description><![CDATA[As usual I will write about a few of my personal highlights of SPLASH and the co-located workshops. That is mostly from my spotty notes, and from memory, so I don&#8217;t guarantee 100% accuracy, especially with respect to what other people might have said. For an impression on the location itself, I will just cite [...]]]></description>
			<content:encoded><![CDATA[<p>As usual I will write about a few of my personal highlights of SPLASH and the co-located workshops. That is mostly from my spotty notes, and from memory, so I don&#8217;t guarantee 100% accuracy, especially with respect to what other people might have said.</p>
<p>For an impression of the location itself, I will just cite and refer to what <a href="http://blog.jot.fm/2010/10/25/first-impressions-of-reno-and-oopslasplash/">Nick wrote on the JOT blog</a>:<br />&#8220;Reno airport was like a gateway into hell, slot machines everywhere [...] The conference venue is almost comically grim. The main floor is a sea of slot machines and haggard looking people.&#8221;<br />So, it was definitely not the most exciting place ever, and I was already worried that my colleagues would start shooting at those zombies <img src='http://soft.vub.ac.be/~smarr/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Anyway, from the content point of view, it was actually a nice conference for me.</p>
<p>On Sunday, the <a href="http://www.cs.iastate.edu/~design/vmil/2010/">Virtual Machine Intermediate Languages workshop</a> took place. As last year, it is the most relevant workshop with respect to VMs that I have come across so far. This year, the invited talks in particular were very interesting.</p>
<h2><a href="http://www.cs.iastate.edu/~design/vmil/2010/papers/p01-click.txt">A JVM Does What????</a></h2>
<p><a href="http://www.azulsystems.com/blogs/cliff">Cliff Click</a> started by reporting on his perception of JVMs and the illusions they provide to developers. My takeaways from his talk are the following points.</p>
<p>First, garbage collection is still the major issue, and people are willing to pay for better performance here. He kind of implied that JIT compilers are nice to have, but not as high on the list of priorities for his typical customers.</p>
<p>Second, he wants people to explore alternative concurrency models on top of the VM. From his perspective, the JVM is a great platform and things like locks are cheap. He agrees that things like Erlang-like Actors need deeper hooks into the Java Memory Model and possibly the JIT compiler, but in general I understood that he would prefer something on top instead of another thing integrated into the VM. Well, let&#8217;s see how <a href="http://www.stefan-marr.de/2010/07/doctoral-symposium-at-splash-2010/">my ideas</a> work out.</p>
<p>Related to my ideas, we had a small discussion afterwards with David. I was surprised that Azul uses a Uniform Memory Access model for its systems, but apparently the problem is that current business applications exhibit random access patterns all over the heap. Thus, if you have a system with 16 chips and 16 memory controllers, 15/16 of the accesses are going remote anyway. That is why they optimize for that case instead of optimizing local performance. Interesting, but perhaps just the consequence of not having appropriate languages which take locality into account in the first place.</p>
<h2><a href="http://www.cs.iastate.edu/~design/vmil/2010/papers/p03-tillmann.txt">SPUR: A Trace-Based JIT Compiler for CIL</a></h2>
<p><a href="http://research.microsoft.com/en-us/people/nikolait/">Nikolai Tillmann</a> reported on the SPUR project at Microsoft Research. He gave an introduction to tracing-based just-in-time compilation and also presented some benchmarks. The interesting part about SPUR is that they actually JIT .NET but experiment mainly with JavaScript.</p>
<p>For me, the most interesting aspect of his talk was the future work section, where he mentioned a few attempts at parallelizing code with the tracing JIT. Their ideas mainly focus on vectorization, which is not that exciting. I hope they will also look into speculative execution, even though Nikolai asked for more hardware support for such an idea.</p>
<h2><a href="http://www.cs.iastate.edu/~design/vmil/2010/papers/p05-durelli.pdf">A Systematic Mapping Study on High-level Language Virtual Machines</a></h2>
<p>The first research paper I am going to mention here was meta research on VM research.</p>
<p>The authors surveyed the body of literature on VMs to find out what people are doing research on. Well, the scope was a bit too narrow to actually cover all interesting papers, but it is a very nice first step. David was a bit disappointed that his Self and other Smalltalk papers were not covered and that the literature that was identified as being relevant only started in the 90s or so. Well, the authors were already aware of those limitations, but besides this definitely constructive criticism, the audience also came up with proposals to get us as the community involved. There is serious interest in such research, and people would be happy to help classify papers (and certainly to promote their own research) if that could happen on a wiki or so&#8230;</p>
<h2><a href="http://www.cs.iastate.edu/~design/vmil/2010/papers/p07-bieniusa.pdf">The Architecture of DecentVM &#8211; Towards a Decentralized Virtual Machine for Many-Core Computing</a></h2>
<p>The second research paper with high relevance for me was about DecentVM. DecentVM is based on the distributed DecentSTM. It implements a JVM that currently runs on a distributed system. However, they also want to look into how to make it run on Intel&#8217;s Single-chip Cloud Computer. So, some interesting work is coming up there.</p>
<h2><a href="http://www.cs.iastate.edu/~design/vmil/2010/papers/p08-mckinley.txt">How&#8217;s the Parallel Computing Revolution Going? Towards Parallel Scalable Virtual Machine Services</a></h2>
<p>Kathryn McKinley reported on experiments her students did to compare the speed and power consumption of CPUs over the last few years. It turns out that power consumption seems to rise faster than performance, especially since the benchmarks do not scale perfectly on multicore systems. However, there is quite a bit of progress with respect to saving energy instead of increasing performance with Intel&#8217;s Atom and related architectures.</p>
<p>Her proposal to parallelize the VM itself was interesting, and it is something Theo always asks for, too. However, Cliff Click basically said that HotSpot is already at that point for most parts. So, at least from his perspective, that is not a field where major breakthroughs will come from&#8230;</p>
<p>Monday, the second day of workshops, was less interesting. I started the day by giving my presentation at the <a href="http://www.stefan-marr.de/2010/07/doctoral-symposium-at-splash-2010/">Doctoral Symposium</a>. I did not get more than meta-feedback, unfortunately. I guess it was just too early for that. What I have is an idea (perhaps with too many open design options) and a plan to validate it. But it was obviously still too fluffy&#8230; On the other hand, that meant I was missing great workshops such as Evaluate 2010 and the Dynamic Languages Symposium *sigh*</p>
]]></content:encoded>
			<wfw:commentRss>http://soft.vub.ac.be/~smarr/2010/10/workshops-at-splash-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Towards an Actor-based Concurrent Machine Model</title>
		<link>http://soft.vub.ac.be/~smarr/2010/02/towards-an-actor-based-concurrent-machine-model/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=towards-an-actor-based-concurrent-machine-model</link>
		<comments>http://soft.vub.ac.be/~smarr/2010/02/towards-an-actor-based-concurrent-machine-model/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 23:19:08 +0000</pubDate>
		<dc:creator>Stefan</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Actors]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[Manycore]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[Smalltalk]]></category>
		<category><![CDATA[Virtual Machines]]></category>
		<category><![CDATA[VMs]]></category>
		<category><![CDATA[workshop]]></category>

		<guid isPermaLink="false">http://soft.vub.ac.be/~smarr/?p=289</guid>
		<description><![CDATA[Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on [...]]]></description>
			<content:encoded><![CDATA[<p>Already quite a while ago, I was involved in writing a workshop paper about an actor model for virtual machines. Actually, the main idea was to find a concurrency model for a VM which supports multi-dimensional separation of concerns. However, AOP is not that interesting for me at the moment, so I am focussing on the concurrency, especially the actor-based VM model.</p>
<p>After one year, I am back looking at that paper, and it still looks like a great model. I think I will incorporate it into my manycore VM now <img src='http://soft.vub.ac.be/~smarr/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Abstract</h3>
<blockquote>
<p>In this position paper we propose to extend an existing delegation-based machine model with concurrency primitives. The original machine model which is built on the concepts of objects, messages, and delegation, provides support for languages enabling multi-dimensional separation of concerns (MDSOC). We propose to extend this model with an actor-based concurrency model, allowing for both true parallelism as well as lightweight concurrency primitives such as coroutines. In order to demonstrate its expressiveness, we informally describe how three high-level languages supporting different concurrency models can be mapped onto our extended machine model. We also provide an outlook on the extended model&#8217;s potential to support concurrency-related MDSOC features.</p></blockquote>
<ul>
	<li>Towards an Actor-based Concurrent Machine Model, <em>Hans Schippers, Tom Van Cutsem, Stefan Marr, Michael Haupt, Robert Hirschfeld</em>, Proceedings of the fourth workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS), New York, NY, USA, ACM (2009), p. 4&#8211;9.</li>
	<li>Paper: <a title="Towards an Actor-based Concurrent Machine Model" href="http://soft.vub.ac.be/~smarr/downloads/icooolps09-schippers.pdf">PDF</a><br /> ©ACM, 2009. This is the author&#8217;s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ICOOOLPS’09 July 6, 2009, Genova, Italy. <a href="http://doi.acm.org/10.1145/1565824.1565825">http://doi.acm.org/10.1145/1565824.1565825</a></li>
	<li>BibTeX: <a href="http://www.bibsonomy.org/bibtex/243d4b86261eac5a70d28160c493e70d1/gron">BibSonomy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://soft.vub.ac.be/~smarr/2010/02/towards-an-actor-based-concurrent-machine-model/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models</title>
		<link>http://soft.vub.ac.be/~smarr/2010/02/virtual-machine-support-for-many-core-architectures-decoupling-abstract-from-concrete-concurrency-models/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=virtual-machine-support-for-many-core-architectures-decoupling-abstract-from-concrete-concurrency-models</link>
		<comments>http://soft.vub.ac.be/~smarr/2010/02/virtual-machine-support-for-many-core-architectures-decoupling-abstract-from-concrete-concurrency-models/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 21:37:42 +0000</pubDate>
		<dc:creator>Stefan</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Actors]]></category>
		<category><![CDATA[Bytecode]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[CSOM]]></category>
		<category><![CDATA[Intermediate Languages]]></category>
		<category><![CDATA[locks]]></category>
		<category><![CDATA[Position Paper]]></category>
		<category><![CDATA[STM]]></category>
		<category><![CDATA[threads]]></category>
		<category><![CDATA[Virtual Machines]]></category>
		<category><![CDATA[VMs]]></category>

		<guid isPermaLink="false">http://soft.vub.ac.be/~smarr/?p=282</guid>
		<description><![CDATA[Finally, my first workshop paper got published, which was a little odyssey with some misunderstandings, but anyway, now it is out. It is just a position paper, thus, do not expect too many insights. However, what it describes is my big plan, and hopefully the story of my PhD. I am working on it&#8230; Abstract The upcoming [...]]]></description>
			<content:encoded><![CDATA[<p>Finally, my first workshop paper got published, which was a little odyssey with some misunderstandings, but anyway, now it is out. It is just a position paper, thus, do not expect too many insights. However, what it describes is my big plan, and hopefully the story of my PhD. I am working on it&#8230;</p>
<h3>Abstract</h3>
<blockquote>
<p>The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today&#8217;s high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets.</p>
<p>Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets.</p>
<p>As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.</p></blockquote>
<ul>
	<li>Paper: <a title="Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models" href="http://arxiv.org/pdf/1002.0939v1">PDF</a></li>
	<li>EPTCS: <a href="http://cgi.cse.unsw.edu.au/~rvg/eptcs/content.cgi?PLACES2009">Proceedings Second International Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software</a></li>
	<li>BibTeX: <a href="http://www.bibsonomy.org/bibtex/288f7f1a18d61f2db356e0658d392215c/gron">BibSonomy</a></li>
</ul>
<h3>Slides of the Talk at PLACES09</h3>
<div id="__ss_1872959" style="width: 425px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=places-090817140006-phpapp01&amp;rel=0&amp;stripped_title=virtual-machine-support-for-manycore-architectures-decoupling-abstract-from-concrete-concurrency-models" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=places-090817140006-phpapp01&amp;rel=0&amp;stripped_title=virtual-machine-support-for-manycore-architectures-decoupling-abstract-from-concrete-concurrency-models" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
]]></content:encoded>
			<wfw:commentRss>http://soft.vub.ac.be/~smarr/2010/02/virtual-machine-support-for-many-core-architectures-decoupling-abstract-from-concrete-concurrency-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

