<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;A04FRn48fCp7ImA9WhRUGEo.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152</id><updated>2012-01-30T00:31:57.074+01:00</updated><category term="speed" /><category term="cli" /><category term="valgrind" /><category term="extension modules" /><category term="sun" /><category term="ep2008" /><category term="kcachegrind" /><category term="PyQt4" /><category term="parser" /><category term="CPython" /><category term="RPyC" /><category term="jython" /><category term="jit" /><category term="cpyext" /><category term="profiling" /><category term="pypy" /><category term="compiler" /><category term="sprint" /><category term="roadmap" /><title>PyPy Status Blog</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://morepypy.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Carl Friedrich Bolz</name><uri>http://www.blogger.com/profile/00518922641059511014</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>212</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/PyPyStatusBlog" /><feedburner:info uri="pypystatusblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;DEcERXg_fCp7ImA9WhRUF0g.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6434340612277938795</id><published>2012-01-28T14:06:00.001+01:00</published><updated>2012-01-28T14:06:44.644+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-28T14:06:44.644+01:00</app:edited><title>NumPyPy status update</title><content type="html">&lt;p&gt;Hello.&lt;/p&gt;
&lt;p&gt;This is just a quick status update on the NumPy in PyPy project that very
recently became my day job. I should give my thanks once again to Getco,
Nate Lawson and other contributors who donated above $40000 towards the goal.&lt;/p&gt;
&lt;p&gt;Recently we (Alex Gaynor, Matti Picus and me) implemented a few interesting things
that a lot of people use:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;more ufuncs&lt;/li&gt;
&lt;li&gt;most ufuncs now accept the &lt;tt class="docutils literal"&gt;axis&lt;/tt&gt; parameter (except &lt;tt class="docutils literal"&gt;all&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;any&lt;/tt&gt;)&lt;/li&gt;
&lt;li&gt;fixed string representation of arrays, now it's identical to numpy (uses
pretty much the same code)&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;ndarray.flat&lt;/tt&gt; should be working correctly&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;ndarray.flatten&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;ndarray.ravel&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;ndarray.take&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;indexing arrays by boolean arrays of the same size&lt;/li&gt;
&lt;li&gt;and various bugfixes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We would also like to introduce the &lt;a class="reference external" href="http://buildbot.pypy.org/numpy-status/latest.html"&gt;nightly report&lt;/a&gt; of numpy status. This
is an automated tool that does package introspection. While it gives some
sort of idea how much of numpy is implemented, it's not by far the authority.
Your tests should be the authority. It won't report whether functions
support all kinds of parameters (for example masked arrays and &lt;tt class="docutils literal"&gt;out&lt;/tt&gt; parameter
are completely unsupported) or that functions work &lt;strong&gt;at all&lt;/strong&gt;. We also
reserve the right to incorporate jokes in that website, so don't treat it
that seriously overall :-)&lt;/p&gt;
&lt;p&gt;Thanks, and stay tuned.  We hope to post here regular updates on the
progress.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br/&gt;
fijal &amp;amp; the PyPy team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6434340612277938795?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/18_LesNy4LI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6434340612277938795/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6434340612277938795" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6434340612277938795?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6434340612277938795?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/18_LesNy4LI/numpypy-status-update.html" title="NumPyPy status update" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/numpypy-status-update.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUECSH87eCp7ImA9WhRUFkg.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-3008917396290059758</id><published>2012-01-27T10:46:00.001+01:00</published><updated>2012-01-27T10:47:49.100+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-27T10:47:49.100+01:00</app:edited><title>Py3k and Numpy First Stage: Thanks to all who Gave</title><content type="html">&lt;p&gt;Last year was quite successful for PyPy fundraising through the &lt;a class="reference external" href="http://sfconservancy.org/"&gt;Software
Freedom Conservancy&lt;/a&gt;, and Conservancy and PyPy are very excited to
announce that enough was raised to begin the first stages on the &lt;a class="reference external" href="http://pypy.org/py3donate.html"&gt;Py3k&lt;/a&gt;
and &lt;a class="reference external" href="http://pypy.org/numpydonate.html"&gt;Numpy&lt;/a&gt; grant proposals.&lt;/p&gt;
&lt;p&gt;As of the end of 2011, 135 different individuals gave to the &lt;a class="reference external" href="http://pypy.org/py3donate.html"&gt;Py3k&lt;/a&gt;
campaign, and 114 to the &lt;a class="reference external" href="http://pypy.org/numpydonate.html"&gt;Numpy&lt;/a&gt; campaign.  We thank each of you who
donated to help make this work possible.  Meanwhile, if you haven't given
to support these projects, we do hope you'll give generously now to help
fund their second stages later this year!&lt;/p&gt;
&lt;p&gt;We're also particularly excited that a few donors gave particularly large
donations to support this work; those big donations really filled in the
gap to help us get started!&lt;/p&gt;
&lt;p&gt;Specifically, we're pleased to announce that &lt;a class="reference external" href="http://google.com"&gt;Google&lt;/a&gt;  donated $35000 towards
implementing Python 3 in PyPy.   Google's general support of the Python community
is well known, and their specific support of our grant proposal
is much appreciated.&lt;/p&gt;
&lt;p&gt;Meanwhile, Numpy was supported in part by contributions from Nate Lawson,
&lt;a class="reference external" href="http://www.cantabcapital.com/"&gt;Cantab Capital Partners&lt;/a&gt;, and &lt;a class="reference external" href="http://www.getcollc.com/"&gt;Getco&lt;/a&gt;, as well as more than a hundred
other contributors.&lt;/p&gt;
&lt;p&gt;With these donations combined with many others, we're now starting work on
both projects.  This week, the &lt;a class="reference external" href="http://sfconservancy.org/"&gt;Conservancy&lt;/a&gt; signed contracts with
&lt;a class="reference external" href="http://twitter.com/#!/antocuni"&gt;Antonio Cuni&lt;/a&gt; and Benjamin Peterson to work towards the Stage 1.1 goals in
Py3k proposal (and is negotiating for another contractor as well), and
with &lt;a class="reference external" href="http://baroquesoftware.com/~fijal/"&gt;Maciej Fijałkowski&lt;/a&gt; to work towards the Stage 1 goals in the Numpy
proposal.&lt;/p&gt;
&lt;p&gt;In 2012, PyPy will continue regular &lt;a class="reference external" href="http://morepypy.blogspot.com/2011/12/leysin-winter-sprint.html"&gt;sprint meetings&lt;/a&gt;, at which Py3K and Numpy
efforts will certainly have a place.  We have some limited funds to
fund travels of contributors to those meetings.&lt;/p&gt;
&lt;p&gt;We're very thankful for all who donated so far to support these efforts,
and we hope that now that work has begun, even more donors will come
forward to help us finish the job.  In the meantime, watch for the commits
showing up from these developers and other contributors in the PyPy repositories!&lt;/p&gt;
&lt;p&gt;Cheers,
The PyPy Team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-3008917396290059758?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/Gy25f88Ofew" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/3008917396290059758/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=3008917396290059758" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3008917396290059758?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3008917396290059758?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/Gy25f88Ofew/py3k-and-numpy-first-stage-thanks-to.html" title="Py3k and Numpy First Stage: Thanks to all who Gave" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/py3k-and-numpy-first-stage-thanks-to.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C08DRXs4cSp7ImA9WhRUFUU.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-7255412724168990164</id><published>2012-01-26T13:44:00.000+01:00</published><updated>2012-01-26T13:44:34.539+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-26T13:44:34.539+01:00</app:edited><title>Comparing Partial Evaluation and Tracing, Part 1</title><content type="html">&lt;p&gt;As part of writing my PhD I am currently thinking about the relationship
between PyPy's meta-tracing approach with various previous ideas to
automatically get a (JIT-)compiler from only an interpreter of a language. One
of the most-researched ideas along these lines is that of &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Partial_evaluation"&gt;partial evaluation&lt;/a&gt;.
Partial evaluation has basically the same goals as PyPy when it comes to
compilers: Write an interpreter, and get a compiler for free. The methods for
reaching that goal are a bit different. In this series of blog posts, I am
trying to explore the similarities and differences of partial evaluation and
PyPy's meta-tracing.&lt;/p&gt;
&lt;h2&gt;A Flowgraph Language&lt;/h2&gt;
&lt;p&gt;To be able to clearly understand what &amp;quot;partial evaluation&amp;quot; is and what
&amp;quot;meta-tracing&amp;quot; is I will show an &amp;quot;executable model&amp;quot; of both. To that end, I am
defining a small imperative language and will then show what a partial evaluator
and a tracer for that language look like. All this code will be
implemented in Prolog. (Any pattern-matching functional language would do, but I
happen to know Prolog best. Backtracking is not used, so you can read things
simply as functional programs.) In this post I will start with
the definition of the language, and a partial evaluator for it. The code
written in this blog post can be found fully here: &lt;a class="reference external" href="http://paste.pocoo.org/show/541004/"&gt;http://paste.pocoo.org/show/541004/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The language is conceptionally similar to PyPy's flow graphs, but a bit more
restricted. It does not have function calls, only labelled basic blocks
that consist of a series of linearly executed operations, followed by a
conditional or an unconditional jump. Every operation is assigning a value to a
variable, which is computed by applying some operation to some arguments.&lt;/p&gt;
&lt;p&gt;A simple program to raise &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; to the &lt;tt class="docutils literal"&gt;yth&lt;/tt&gt; power in that language looks like
this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
power:
    res = 1
    if y goto power_rec else goto power_done

power_rec:
    res = res * x
    y = y - 1
    if y goto power_rec else goto power_done

power_done:
    print_and_stop(res)
&lt;/pre&gt;
&lt;p&gt;To represent the same program as Prolog data structures, we use the
following Prolog code:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op1&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;same&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #FF6600"&gt;1&lt;/span&gt;),
             &lt;span style="color: #CC00FF"&gt;if&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;power_done&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;),
                 &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;sub&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #FF6600"&gt;1&lt;/span&gt;),
                 &lt;span style="color: #CC00FF"&gt;if&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;power_done&lt;/span&gt;)))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_done&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;print_and_stop&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;))).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Every rule of &lt;tt class="docutils literal"&gt;block&lt;/tt&gt; declares one block by first giving the label of the
block, followed by the code. Code is a series of &lt;tt class="docutils literal"&gt;op1&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;op2&lt;/tt&gt; statements
terminated by a &lt;tt class="docutils literal"&gt;jump&lt;/tt&gt;, an &lt;tt class="docutils literal"&gt;if&lt;/tt&gt; or a &lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt;. &lt;tt class="docutils literal"&gt;op1&lt;/tt&gt; statements
are operations with one argument of the form &lt;tt class="docutils literal"&gt;op1(res_variable,
operation_name, argument, next_statement)&lt;/tt&gt;. Arguments can be either variables
in the form &lt;tt class="docutils literal"&gt;var(name)&lt;/tt&gt; or constants in the form &lt;tt class="docutils literal"&gt;const(value)&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;To run programs in this flowgraph language, we first need some helper
functionality. The first few helper functions are concerned with the handling of
environments, the data structures the interpreter uses to map variable
names occuring in the program to the variables' current values. In Python
dictionaries would be used for this purpose, but in Prolog we have to emulate
these by lists of key/value pairs (not very efficient, but good enough):&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;, [], &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;throw&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;key_not_found&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;)).
&lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [&lt;span style="color: #003333"&gt;Key&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #003333"&gt;Value&lt;/span&gt; | &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;], &lt;span style="color: #003333"&gt;Value&lt;/span&gt;) :- !.
&lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [&lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Value&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;Value&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;([], &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;V&lt;/span&gt;, [&lt;span style="color: #003333"&gt;X&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #003333"&gt;V&lt;/span&gt;]).
&lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;([&lt;span style="color: #003333"&gt;Key&lt;/span&gt;&lt;span style="color: #CC3300"&gt;/&lt;/span&gt;&lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Value&lt;/span&gt;, [&lt;span style="color: #003333"&gt;Key&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #003333"&gt;Value&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;]) :- !.
&lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;([&lt;span style="color: #003333"&gt;Pair&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Value&lt;/span&gt;, [&lt;span style="color: #003333"&gt;Pair&lt;/span&gt; | &lt;span style="color: #003333"&gt;NewRest&lt;/span&gt;]) :- &lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Value&lt;/span&gt;, &lt;span style="color: #003333"&gt;NewRest&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;([], &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;, []).
&lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;([&lt;span style="color: #003333"&gt;Key&lt;/span&gt;&lt;span style="color: #CC3300"&gt;/&lt;/span&gt;&lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;) :- !.
&lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;([&lt;span style="color: #003333"&gt;Pair&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [&lt;span style="color: #003333"&gt;Pair&lt;/span&gt; | &lt;span style="color: #003333"&gt;NewRest&lt;/span&gt;]) :- &lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;NewRest&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;), &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The implementation of these functions is not too important. The &lt;tt class="docutils literal"&gt;lookup&lt;/tt&gt;
function finds a key in an environment list, the &lt;tt class="docutils literal"&gt;write_env&lt;/tt&gt; function adds a
new key/value pair to an environment, &lt;tt class="docutils literal"&gt;remove_env&lt;/tt&gt; removes a key. The
&lt;tt class="docutils literal"&gt;resolve&lt;/tt&gt; function is used to take either a constant or a variable and return
a value. If it's a constant, the value of that constant is returned, if it's a
variable it is looked up in the environment. Note how the last argument of
&lt;tt class="docutils literal"&gt;lookup&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;resolve&lt;/tt&gt; is actually a return value, which is the typical
approach in Prolog.&lt;/p&gt;
&lt;p&gt;So far we have not specified what the primitive operations that can occur in the
program actually mean. For that we define a &lt;tt class="docutils literal"&gt;do_op&lt;/tt&gt; function which
executes primitive operations:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;same&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;, &lt;span style="color: #003333"&gt;Z&lt;/span&gt;) :- &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;is&lt;/span&gt; &lt;span style="color: #003333"&gt;X&lt;/span&gt; &lt;span style="color: #555555"&gt;*&lt;/span&gt; &lt;span style="color: #003333"&gt;Y&lt;/span&gt;.
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;add&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;, &lt;span style="color: #003333"&gt;Z&lt;/span&gt;) :- &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;is&lt;/span&gt; &lt;span style="color: #003333"&gt;X&lt;/span&gt; &lt;span style="color: #555555"&gt;+&lt;/span&gt; &lt;span style="color: #003333"&gt;Y&lt;/span&gt;.
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;sub&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;, &lt;span style="color: #003333"&gt;Z&lt;/span&gt;) :- &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;is&lt;/span&gt; &lt;span style="color: #003333"&gt;X&lt;/span&gt; &lt;span style="color: #555555"&gt;-&lt;/span&gt; &lt;span style="color: #003333"&gt;Y&lt;/span&gt;.
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;eq&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;, &lt;span style="color: #003333"&gt;Z&lt;/span&gt;) :- &lt;span style="color: #003333"&gt;X&lt;/span&gt; &lt;span style="color: #555555"&gt;==&lt;/span&gt; &lt;span style="color: #003333"&gt;Y&lt;/span&gt; &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #FF6600"&gt;1&lt;/span&gt;; &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #FF6600"&gt;0&lt;/span&gt;.
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;ge&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;, &lt;span style="color: #003333"&gt;Y&lt;/span&gt;, &lt;span style="color: #003333"&gt;Z&lt;/span&gt;) :- &lt;span style="color: #003333"&gt;X&lt;/span&gt; &lt;span style="color: #555555"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color: #003333"&gt;Y&lt;/span&gt; &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #FF6600"&gt;1&lt;/span&gt;; &lt;span style="color: #003333"&gt;Z&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #FF6600"&gt;0&lt;/span&gt;.
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;readlist&lt;/span&gt;, &lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;I&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;nth0&lt;/span&gt;(&lt;span style="color: #003333"&gt;I&lt;/span&gt;, &lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;, &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;, &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;throw&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;missing_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;)).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again the last argument is an output variable.&lt;/p&gt;
&lt;p&gt;Now we can start executing simple operations. For that an &lt;tt class="docutils literal"&gt;interp&lt;/tt&gt; predicate
is defined. It takes as its first argument the current environment and as the
second argument the operation to execute. E.g. to execute primitive operations
with one or two arguments:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;op1&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg1&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg2&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;First the arguments are resolved into values. Afterwards the operation is executed,
and the result is written back into the environment. Then &lt;tt class="docutils literal"&gt;interp&lt;/tt&gt; is called on
the rest of the program. Similarly easy are the unconditional jump and
&lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;Block&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #003333"&gt;Block&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;).


&lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;print_and_stop&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;resolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;Val&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;print&lt;/span&gt;(&lt;span style="color: #003333"&gt;Val&lt;/span&gt;), &lt;span style="color: #CC3300"&gt;nl&lt;/span&gt;.
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the unconditional jump we simply get the target block and continue executing
that. To execute &lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt; we resolve the argument, print the value and
then are done.&lt;/p&gt;
&lt;p&gt;The conditional jump is only slightly more difficult:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;if&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;L1&lt;/span&gt;, &lt;span style="color: #003333"&gt;L2&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;lookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;Val&lt;/span&gt;),
    (&lt;span style="color: #003333"&gt;Val&lt;/span&gt; &lt;span style="color: #555555"&gt;==&lt;/span&gt; &lt;span style="color: #FF6600"&gt;0&lt;/span&gt; &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #003333"&gt;L2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Block&lt;/span&gt;)
    ;
        &lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #003333"&gt;L1&lt;/span&gt;, &lt;span style="color: #003333"&gt;Block&lt;/span&gt;)
    ),
    &lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #003333"&gt;Block&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;First the variable is looked up in the environment. If the variable is zero,
execution continues at the second block, otherwise it continues at the first
block.&lt;/p&gt;
&lt;p&gt;Given this interpreter, we can execute the above example program like this, on a
Prolog console:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #AA0000; background-color: #FFAAAA"&gt;$&lt;/span&gt; &lt;span style="color: #CC3300"&gt;swipl&lt;/span&gt; &lt;span style="color: #555555"&gt;-&lt;/span&gt;&lt;span style="color: #CC3300"&gt;s&lt;/span&gt; &lt;span style="color: #CC3300"&gt;cfglang&lt;/span&gt;.&lt;span style="color: #CC3300"&gt;pl&lt;/span&gt;
&lt;span style="color: #CC3300"&gt;?-&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power&lt;/span&gt;, &lt;span style="color: #003333"&gt;Block&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;interp&lt;/span&gt;(&lt;span style="color: #003333"&gt;Block&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;10&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;10&lt;/span&gt;]).
&lt;span style="color: #FF6600"&gt;10000000000&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Partial Evaluation of the Flowgraph Language&lt;/h2&gt;
&lt;p&gt;Let's look at what a partial evaluator for this simple flowgraph language would
look like. Partial evaluation (PE), also called specialization, is a program
manipuation technique. PE takes an input program and transforms it into a
(hopefully) simpler and faster output program. It does this by assuming that
some variables in the input program are constants. All operations that act only
on such constants can be folded away. All other operations need to remain in the
output program (called residual program). Thus the partial evaluator proceeds
much like an interpreter, just that it cannot actually execute some operations.
Also, its output is not just a value, but also list of remaining operations that
could not be optimized away.&lt;/p&gt;
&lt;p&gt;The partial evaluator cannot use normal environments, because unlike the
interpreter not all variables' values are known to it. It will therefore work on
partial environments, which store just the know variables. For these partial
environments, some new helper functions are needed:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [], &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;)).
&lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [&lt;span style="color: #003333"&gt;Key&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #003333"&gt;Value&lt;/span&gt; | &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;], &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;Value&lt;/span&gt;)) :- !.
&lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, [&lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt; | &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;], &lt;span style="color: #003333"&gt;Value&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;Key&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;Value&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;), &lt;span style="color: #006699; font-weight: bold"&gt;_&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;X&lt;/span&gt;)).
&lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;), &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;) :- &lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;X&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The function &lt;tt class="docutils literal"&gt;plookup&lt;/tt&gt; takes a variable and a partial environment and returns
either &lt;tt class="docutils literal"&gt;const(Value)&lt;/tt&gt; if the variable is found in the partial environment or
&lt;tt class="docutils literal"&gt;var(Key)&lt;/tt&gt; if it is not. Equivalently, &lt;tt class="docutils literal"&gt;presolve&lt;/tt&gt; is like &lt;tt class="docutils literal"&gt;resolve&lt;/tt&gt;,
except that it uses &lt;tt class="docutils literal"&gt;plookup&lt;/tt&gt; instead of &lt;tt class="docutils literal"&gt;lookup&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;With these helpers we can start writing a partial evaluator. The following two
rules are where the main optimization in the form of constant folding happens.
The idea is that when the partial evaluator sees an operation that involves
only constant arguments, it can constant-fold the operation, otherwise it
can't:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;op1&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;), &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg&lt;/span&gt;),
    (&lt;span style="color: #003333"&gt;RArg&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;C&lt;/span&gt;) &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;C&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt;
    ;
        &lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;op1&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg&lt;/span&gt;, &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt;)
    ),
    &lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;Arg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Rest&lt;/span&gt;), &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg1&lt;/span&gt;),
    &lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg2&lt;/span&gt;),
    (&lt;span style="color: #003333"&gt;RArg1&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;C1&lt;/span&gt;), &lt;span style="color: #003333"&gt;RArg2&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;C2&lt;/span&gt;) &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span style="color: #CC00FF"&gt;do_op&lt;/span&gt;(&lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;C1&lt;/span&gt;, &lt;span style="color: #003333"&gt;C2&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;write_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Res&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt;

    ;
        &lt;span style="color: #CC00FF"&gt;remove_env&lt;/span&gt;(&lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #003333"&gt;ResultVar&lt;/span&gt;, &lt;span style="color: #003333"&gt;Op&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg1&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg2&lt;/span&gt;, &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt;)
    ),
    &lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;Rest&lt;/span&gt;, &lt;span style="color: #003333"&gt;NEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;RestResidual&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;tt class="docutils literal"&gt;pe&lt;/tt&gt; predicate takes a partial environment, the current operations and
potentially returns a new operation. To partially evaluate a simple operation, its arguments are
looked up in the partial environment. If all the arguments are constants, the
operation can be executed, and no new operation is produced. Otherwise, we need
to produce a new residual operation which is exactly like the one currently
looked at. Also, the result variable needs to be removed from the partial
environment, because it was just overwritten by an unknown value.&lt;/p&gt;
&lt;p&gt;The potentially generated residual operation is stored into the output argument
&lt;tt class="docutils literal"&gt;NewOp&lt;/tt&gt;. The output argument of the recursive call is the last argument of
the newly created residual operation, which will then be filled by the
recursive call. This is a typical approach in Prolog, but may look strange if
you are not familiar with it.&lt;/p&gt;
&lt;p&gt;Note how the first case of these two rules is just like interpretation. The
second case doesn't really do anything, it just produces a residual operation.
This relationship between normal evaluation and partial evaluation is very
typical.&lt;/p&gt;
&lt;p&gt;The unconditional jump and &lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt; are not much more complex:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;), &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #003333"&gt;LR&lt;/span&gt;)) :-
    &lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;).

&lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;print_and_stop&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;), &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;print_and_stop&lt;/span&gt;(&lt;span style="color: #003333"&gt;RArg&lt;/span&gt;)) :-
    &lt;span style="color: #CC00FF"&gt;presolve&lt;/span&gt;(&lt;span style="color: #003333"&gt;Arg&lt;/span&gt;, &lt;span style="color: #003333"&gt;Env&lt;/span&gt;, &lt;span style="color: #003333"&gt;RArg&lt;/span&gt;).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To partially evaluate an unconditional jump we again produce a jump. The target
label of that residual jump is computed by asking the partial evaluator to
produce residual code for the label &lt;tt class="docutils literal"&gt;L&lt;/tt&gt; with the given partial environment.
&lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt; is simply turned into a &lt;tt class="docutils literal"&gt;print_and_stop&lt;/tt&gt;. We will see the
code for &lt;tt class="docutils literal"&gt;do_pe&lt;/tt&gt; soon.&lt;/p&gt;
&lt;p&gt;Conditional jumps are more interesting:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;if&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;L1&lt;/span&gt;, &lt;span style="color: #003333"&gt;L2&lt;/span&gt;), &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt;) :-
    &lt;span style="color: #CC00FF"&gt;plookup&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;Val&lt;/span&gt;),
    (&lt;span style="color: #003333"&gt;Val&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #003333"&gt;C&lt;/span&gt;) &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
        (&lt;span style="color: #003333"&gt;C&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #FF6600"&gt;0&lt;/span&gt; &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span style="color: #003333"&gt;L&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #003333"&gt;L2&lt;/span&gt;
        ;
            &lt;span style="color: #003333"&gt;L&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #003333"&gt;L1&lt;/span&gt;
        ),
        &lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #003333"&gt;LR&lt;/span&gt;)
    ;
        &lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;L1&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;L1R&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;L2&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;L2R&lt;/span&gt;),
        &lt;span style="color: #003333"&gt;NewOp&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;if&lt;/span&gt;(&lt;span style="color: #003333"&gt;V&lt;/span&gt;, &lt;span style="color: #003333"&gt;L1R&lt;/span&gt;, &lt;span style="color: #003333"&gt;L2R&lt;/span&gt;)
    ).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;First we look up the value of the condition variable. If it is a constant, we
can produce better code, because we know statically that only one path is
reachable. Thus we produce code for that path, and then emit an unconditional
jump there. If the condition variable is not known at partial evaluation time,
we need to partially evaluate both paths and produce a conditional jump in the
residual code.&lt;/p&gt;
&lt;p&gt;This rule is the one that causes the partial evaluator to potentially do much
more work than the interpreter, because after an &lt;tt class="docutils literal"&gt;if&lt;/tt&gt; sometimes both paths
need to be explored. In the worst case this process never stops, so a real
partial evaluator would need to ensure somehow that it terminates. There are
many algorithms for doing that, but I will ignore this problem here.&lt;/p&gt;
&lt;p&gt;Now we need to understand what the &lt;tt class="docutils literal"&gt;do_pe&lt;/tt&gt; predicate is doing. Its most
important task is to make sure that we don't do the same work twice by
memoizing code that was already partially evaluated in the past. For that it
keeps a mapping of &lt;tt class="docutils literal"&gt;Label, Partial Environment&lt;/tt&gt; to &lt;tt class="docutils literal"&gt;Label of the residual
code&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;) :-
    (&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;) &lt;span style="color: #CC3300"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span style="color: #CC3300"&gt;true&lt;/span&gt;
    ;
        &lt;span style="color: #CC00FF"&gt;gensym&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;assert&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;LR&lt;/span&gt;)),
        &lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #003333"&gt;L&lt;/span&gt;, &lt;span style="color: #003333"&gt;Code&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;pe&lt;/span&gt;(&lt;span style="color: #003333"&gt;Code&lt;/span&gt;, &lt;span style="color: #003333"&gt;PEnv&lt;/span&gt;, &lt;span style="color: #003333"&gt;Residual&lt;/span&gt;),
        &lt;span style="color: #CC00FF"&gt;assert&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #003333"&gt;LR&lt;/span&gt;, &lt;span style="color: #003333"&gt;Residual&lt;/span&gt;))
    ).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If the code cache indicates that label &lt;tt class="docutils literal"&gt;L&lt;/tt&gt; was already partially evaluated
with partial environment &lt;tt class="docutils literal"&gt;PEnv&lt;/tt&gt;, then the previous residual code label
&lt;tt class="docutils literal"&gt;LPrevious&lt;/tt&gt;
is returned. Otherwise, a new label is generated with &lt;tt class="docutils literal"&gt;gensym&lt;/tt&gt;, the code cache
is informed of that new label with &lt;tt class="docutils literal"&gt;assert&lt;/tt&gt;, then the block is partially
evaluated and the residual code is added to the database.&lt;/p&gt;
&lt;p&gt;For those who know partial evaluation terminology: This partial evaluator is a
polyvariant online partial evaluator. &amp;quot;Polyvariant&amp;quot; means that for every label,
several specialized version of the block can be generated. &amp;quot;Online&amp;quot; means that
no preprocessing is done before the partial evaluator runs.&lt;/p&gt;

&lt;h2&gt;Partial Evaluation Example&lt;/h2&gt;
&lt;p&gt;With this code we can look at the classical example of partial evaluation (it's
probably the &amp;quot;Hello World&amp;quot; of partial evaluation). We
can ask the partial evaluator to compute a power function, where the exponent
&lt;tt class="docutils literal"&gt;y&lt;/tt&gt; is a fixed number, e.g. 5, and the base &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; is unknown:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC3300"&gt;?-&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;do_pe&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;5&lt;/span&gt;], &lt;span style="color: #003333"&gt;LR&lt;/span&gt;).
&lt;span style="color: #003333"&gt;LR&lt;/span&gt; &lt;span style="color: #555555"&gt;=&lt;/span&gt; &lt;span style="color: #CC3300"&gt;power1&lt;/span&gt;.
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To find out which code was produced, we can use &lt;tt class="docutils literal"&gt;listing&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight" style="background: #f0f3f3"&gt;&lt;pre style="line-height: 125%"&gt;&lt;span style="color: #CC3300"&gt;?-&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;listing&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;code_cache&lt;/span&gt;)
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;5&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power1&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;5&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;res&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;1&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_rec1&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;4&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_rec2&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;3&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_rec3&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;2&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_rec4&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;1&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_rec5&lt;/span&gt;).
&lt;span style="color: #CC00FF"&gt;code_cache&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_done&lt;/span&gt;, [&lt;span style="color: #CC3300"&gt;y&lt;/span&gt;&lt;span style="color: #555555"&gt;/&lt;/span&gt;&lt;span style="color: #FF6600"&gt;0&lt;/span&gt;], &lt;span style="color: #CC3300"&gt;power_done1&lt;/span&gt;).

&lt;span style="color: #CC3300"&gt;?-&lt;/span&gt; &lt;span style="color: #CC00FF"&gt;listing&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;block&lt;/span&gt;)
.... &lt;span style="color: #CC3300"&gt;the&lt;/span&gt; &lt;span style="color: #CC3300"&gt;block&lt;/span&gt; &lt;span style="color: #CC3300"&gt;definition&lt;/span&gt; &lt;span style="color: #CC3300"&gt;of&lt;/span&gt; &lt;span style="color: #CC3300"&gt;the&lt;/span&gt; &lt;span style="color: #CC3300"&gt;user&lt;/span&gt; &lt;span style="color: #CC3300"&gt;program&lt;/span&gt; ....
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_done1&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;print_and_stop&lt;/span&gt;(&lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec5&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_done1&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec4&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec5&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec3&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec4&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec2&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec3&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec1&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;op2&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;res&lt;/span&gt;, &lt;span style="color: #CC3300"&gt;mul&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;const&lt;/span&gt;(&lt;span style="color: #FF6600"&gt;1&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;var&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;x&lt;/span&gt;), &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec2&lt;/span&gt;))).
&lt;span style="color: #CC00FF"&gt;block&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power1&lt;/span&gt;, &lt;span style="color: #CC00FF"&gt;jump&lt;/span&gt;(&lt;span style="color: #CC3300"&gt;power_rec1&lt;/span&gt;)).
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;tt class="docutils literal"&gt;code_cache&lt;/tt&gt; tells which residual labels correspond to which original
labels under which partial environments. Thus, &lt;tt class="docutils literal"&gt;power1&lt;/tt&gt; contains the code of
&lt;tt class="docutils literal"&gt;power&lt;/tt&gt; under the assumption that &lt;tt class="docutils literal"&gt;y&lt;/tt&gt; is 5. Looking at the block listing,
the label &lt;tt class="docutils literal"&gt;power1&lt;/tt&gt; corresponds to code that simply multiplies &lt;tt class="docutils literal"&gt;res&lt;/tt&gt; by &lt;tt class="docutils literal"&gt;x&lt;/tt&gt;
five times without using the variable &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; at all. The loop that was present
in the original program has been fully unrolled, the loop variable &lt;tt class="docutils literal"&gt;y&lt;/tt&gt; has
disappeared. Hopefully this is faster than the original program.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this blog post we saw an interpreter for a simple flow graph language in
Prolog, together with a partial evaluator for it. The partial evaluator
essentially duplicates every rule of the interpreter. If all the arguments of
the current operation are known, it acts like the interpreter, otherwise it
simply copies the operation into the residual code.&lt;/p&gt;
&lt;p&gt;Partial evaluation can be used for a variety of applications, but the most
commonly cited one is that of applying it to an interpreter. To do that, the
program that the interpreter runs is assumed to be constant by the partial
evaluator. Thus a specialized version of the interpreter is produced that does
not use the input program at all. That residual code can be seen as a compiled
version of the input program.&lt;/p&gt;
&lt;p&gt;In the next blog post in this series we will look at writing a simple tracer for
the same flowgraph language.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-7255412724168990164?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/0bmYIqwuv9I" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/7255412724168990164/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=7255412724168990164" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7255412724168990164?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7255412724168990164?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/0bmYIqwuv9I/comparing-partial-evaluation-and.html" title="Comparing Partial Evaluation and Tracing, Part 1" /><author><name>Carl Friedrich Bolz</name><uri>http://www.blogger.com/profile/00518922641059511014</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/comparing-partial-evaluation-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYHQnw_eyp7ImA9WhRVFks.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-2244162842744077724</id><published>2012-01-15T23:38:00.001+01:00</published><updated>2012-01-15T23:38:53.243+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-15T23:38:53.243+01:00</app:edited><title>PyPy internship at NCAR</title><content type="html">&lt;p&gt;Hello, everyone&lt;/p&gt;
&lt;p&gt;I would like to inform you that there is a very interesting opportunity
for doing an internship at &lt;a class="reference external" href="http://ncar.ucar.edu/"&gt;NCAR&lt;/a&gt; in the lovely town of &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Boulder,_Colorado"&gt;Boulder&lt;/a&gt;, situated
on the foothils of Rocky Mountains. Before you read on, make sure you:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;are a student of a US University, who is legally eligible to work in the US&lt;/li&gt;
&lt;li&gt;are at least finishing second year this year&lt;/li&gt;
&lt;li&gt;apply before February 3rd.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The internship itself will focus on using PyPy (in some way) to provide
a high performance numeric kernel for an atmospheric model, and measuring how
fast we can go. This is very much in line with what the current effort on
NumPy in PyPy is about. The internship will be mentored by &lt;a class="reference external" href="http://www.linkedin.com/in/delvento"&gt;Davide del Vento&lt;/a&gt;
and I hope to have some influence over where it goes myself :-)&lt;/p&gt;
&lt;p&gt;A few interesting links:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www2.cisl.ucar.edu/siparcs/"&gt;program website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www2.cisl.ucar.edu/siparcs/opportunities/ad"&gt;internship proposal&lt;/a&gt; - note that the actual roadmap is very flexible, as
long as it's a numeric kernel of an atmospheric model using PyPy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Feel free to contact Davide for details about the proposal and &lt;a class="reference external" href="http://mail.python.org/mailman/listinfo/pypy-dev"&gt;pypy-dev&lt;/a&gt; or
me directly for details about PyPy.&lt;/p&gt;
&lt;p&gt;Cheers,
fijal&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-2244162842744077724?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/ztHZkbWrLCk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/2244162842744077724/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=2244162842744077724" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2244162842744077724?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2244162842744077724?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/ztHZkbWrLCk/pypy-internship-at-ncar.html" title="PyPy internship at NCAR" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/pypy-internship-at-ncar.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0ICQXs5eCp7ImA9WhRVFkg.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-7225309560970774590</id><published>2012-01-14T14:21:00.002+01:00</published><updated>2012-01-15T19:19:20.520+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-15T19:19:20.520+01:00</app:edited><title>Transactional Memory (II)</title><content type="html">&lt;p&gt;Here is an update about the &lt;a class="reference" href="http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html"&gt;previous blog post&lt;/a&gt; about the
Global Interpreter Lock (GIL).  In 5 months, the point of view
changed quite a bit.&lt;/p&gt;
&lt;p&gt;Let me remind you that the GIL is the technique used in both CPython and
PyPy to safely run multi-threaded programs: it is a global lock that
prevents multiple threads from actually running at the same time.  The
reason to do that is that it would have disastrous effects in the
interpreter if several threads access the same object concurrently --- to
the point that in CPython even just manipulating the object's reference
counter needs to be protected by the lock.&lt;/p&gt;
&lt;p&gt;So far, the ultimate goal to enable true multi-CPU usage has been to
remove the infamous GIL from the interpreter, so that multiple threads
could actually run in parallel.  It's a lot of work, but this has been
done in Jython.  The reason that it has not been done in CPython so far
is that it's even more work: we would need to care not only about
carefully adding fine-grained locks everywhere, but also about reference
counting; and there are a lot more C extension modules that would need
care, too.  And we don't have locking primitives as performant as
Java's, which have been hand-tuned since ages (e.g. to use help from the
JIT compiler).&lt;/p&gt;
&lt;p&gt;But we think we have a plan to implement a different model for using
multiple cores.  Believe it or not, this is &lt;em&gt;better&lt;/em&gt; than just removing
the GIL from PyPy.  You might get to use all your cores &lt;em&gt;without ever
writing threads.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You would instead just use some event dispatcher, say from Twisted, from
Stackless, or from your favorite GUI; or just write your own.  From
there, you (or someone else) would add some minimal extra code to the
event dispatcher's source code, to exploit the new transactional features
offered by PyPy.  Then you would run your program on a
special version of PyPy, and voilà: you get some form of automatic parallelization.
Sounds magic, but the basic idea is simple: start handling multiple
events in parallel, giving each one its own &lt;em&gt;transaction.&lt;/em&gt;  More about
it later.&lt;/p&gt;

&lt;h2&gt;&lt;a id="threads-or-events" name="threads-or-events"&gt;Threads or Events?&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, why would this be better than &amp;quot;just&amp;quot; removing the GIL?  Because
using threads can be a mess in any complex program.  Some authors (e.g.
&lt;a class="reference" href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf"&gt;Lee&lt;/a&gt;) have argued that the reason is that threads are fundamentally
non-deterministic.  This makes it very hard to reason about them.
Basically the programmer needs to &amp;quot;trim&amp;quot; down the non-determinism (e.g.
by adding locks, semaphores, etc.), and it's hard to be sure when he's
got a sufficiently deterministic result, if only because he can't write
exhaustive tests for it.&lt;/p&gt;
&lt;p&gt;By contrast, consider a Twisted program.  It's not a multi-threaded
program, which means that it handles the &amp;quot;events&amp;quot; one after the other.
The exact ordering of the events is not really deterministic, because
they often correspond to external events; but that's the only source of
non-determinism.  The actual handling of each event occurs in a nicely
deterministic way, and most importantly, not in parallel with the
handling of other events.  The same is true about other libraries like
GUI toolkits, gevent, or Stackless.&lt;/p&gt;
&lt;p&gt;(Of course the Twisted and the Stackless models, to cite only these two,
are quite different from each other; but they have in common the fact
that they are not multi-threaded, and based instead on &amp;quot;events&amp;quot; ---
which in the Stackless case means running a tasklet from one switch()
point to the next one.)&lt;/p&gt;
&lt;p&gt;These two models --- threads or events --- are the two main models we
have right now.  The latter is more used in Python, because it is much
simpler to use than the former, and the former doesn't give any benefit
because of the GIL.  A third model, which is the only one that gives
multi-core benefits, is to use multiple processes, and do inter-process
communication.&lt;/p&gt;

&lt;h2&gt;&lt;a id="the-problem" name="the-problem"&gt;The problem&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Consider the case of a big program that has arbitrary complicated
dependencies.  Even assuming a GIL-less Python, this is likely enough to
prevent the programmer from even starting a multi-threaded rewrite,
because it would require a huge mess of locks.  He could also consider
using multiple processes instead, but the result is annoying as well:
the complicated dependencies translate into a huge mess of inter-process
synchronization.&lt;/p&gt;
&lt;p&gt;The problem can also be down-sized to very small programs, like the kind
of hacks that you do and forget about.  In this case, the dependencies
might be simpler, but you still have to learn and use subtle locking
patterns or a complex inter-process library, which is overkill for the
purpose.&lt;/p&gt;
&lt;p&gt;(This is similar to how explicit memory management is not very hard for
small programs --- but still, nowadays a lot of people agree that
automatic memory management is easier for programs of all sizes.  I
think the same will eventually be true for using multiple CPUs, but the
correct solution will take time to mature, like garbage collectors did.
This post is a step in hopefully the right direction &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;:-)&lt;/span&gt;&lt;/tt&gt;)&lt;/p&gt;

&lt;h2&gt;&lt;a id="events-in-transactions" name="events-in-transactions"&gt;Events in Transactions&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Let me introduce the notion of &lt;em&gt;independent events&lt;/em&gt;: two events are
independent if they don't touch the same set of objects. In a multi-threaded
world, it means that they can be executed in parallel without needing any lock
to ensure correctness.&lt;/p&gt;
&lt;p&gt;Events might also be &lt;em&gt;mostly independent&lt;/em&gt;, i.e. they rarely access the same
object concurrently.  Of course, in a multi-threaded world we would still need
locks to ensure correctness, but the point is that the locks are rarely causing
pauses: &lt;a class="reference" href="http://en.wikipedia.org/wiki/Lock_%28computer_science%29"&gt;lock contention&lt;/a&gt; is low.&lt;/p&gt;
&lt;p&gt;Consider again the Twisted example I gave above.  There are often several
events pending in the dispatch queue (assuming the program is using 100%
of our single usable CPU, otherwise the whole discussion is moot).  The case I am
interested in is the case in which these events are &lt;em&gt;generally mostly
independent&lt;/em&gt;, i.e. we expect few conflicts between them.  However
they don't &lt;em&gt;have&lt;/em&gt; to be proved independent.  In fact it is fine if
they have arbitrary complicated dependencies as described above.  The
point is the expected common case.  Imagine that you have a GIL-less
Python and that you can, by a wave of your hand, have all the careful
locking mess magically done.  Then what I mean here is the case in which
such a theoretical program would run mostly in parallel on multiple
core, without waiting too often on the locks.&lt;/p&gt;
&lt;p&gt;In this case, the solution I'm proposing is that with minimal tweaks
in the event dispatch loop, we can
handle multiple events on multiple threads, each in its own transaction.
A &lt;a class="reference" href="http://en.wikipedia.org/wiki/Transactional_memory"&gt;transaction&lt;/a&gt; is basically a tentative execution of the corresponding
piece of code: if we detect conflicts with other concurrently executing
transactions, we abort the whole transaction and restart it from
scratch.&lt;/p&gt;
&lt;p&gt;By now, the fact that it can basically work should be clear: multiple
transactions will only get into conflict when modifying the same data
structures, which is the case where the magical wand above would have
put locks.  If the magical program could progress without too many
locks, then the transactional program can progress without too many
conflicts.  In a way, you get even more than what the magical program
can give you: each event is dispatched in its own transaction, which
means that from each event's point of view, we have the illusion that
nobody else is running concurrently.  This is exactly what all existing
Twisted-/Stackless-/etc.-based programs are assuming.&lt;/p&gt;
&lt;p&gt;Note that this solution, without transactions, already exists in some
other languages: for example, Erlang is all about independent events.
This is the simple case where we can just run them on multiple cores,
knowing by construction of the language that you can't get conflicts.
Of course, it doesn't work for Python or for a lot of other languages.
From that point of view, what I'm suggesting is merely that
transactional memory could be a good model to cope with the risks of
conflicts that come from not having a special-made language.&lt;/p&gt;

&lt;h2&gt;&lt;a id="not-a-perfect-solution" name="not-a-perfect-solution"&gt;Not a perfect solution&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Of course, transactional memory
(TM) is not a perfect solution either.  Right now, the biggest issue is
the performance hit that comes from the software implementation (STM).
In time, hardware support (HTM) is &lt;a class="reference" href="http://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29"&gt;likely to show up&lt;/a&gt; and help
mitigate the problem; but I won't deny the fact that in some cases,
because it's simple enough and/or because you really need the top
performance, TM is not the best solution.&lt;/p&gt;
&lt;p&gt;Also, the explanations above are silent on what is a hard point for TM,
namely system calls.  The basic general solution is to suspend other
transactions as soon as a transaction does its first system call, so
that we are sure that the transaction will succeed.  Of course this
solution is far from optimal.  Interestingly, it's possible to do better
on a case-by-case basis: for example, by adding in-process buffers, we
can improve the situation for sockets, by having recv() store in a
buffer what is received so that it can be re-recv()-ed later if the
transaction is aborted; similarly, send() or writes to log files can be
delayed until we are sure that the transaction will commit.&lt;/p&gt;
&lt;p&gt;From my point of view, the most important point is that the TM solution
comes from the correct side of the &amp;quot;determinism&amp;quot; scale.  With threads,
you have to prune down non-determinism.  With TM, you start from a
mostly deterministic point, and if needed, you add non-determinism.  The
reason you would want to do so is to make the transactions shorter:
shorter transactions have less risks of conflicts, and when there are
conflicts, less things to redo.  So making transactions shorter
increases the parallelism that your program can achieve, while at the
same time requiring more care.&lt;/p&gt;
&lt;p&gt;In terms of an event-driven model, the equivalent would be to divide the
response of a big processing event into several events that are handled
one after the other: for example, the first event sets things up and fires the second
event, which does the actual computation; and afterwards a third event
writes the results back.  As a result, the second event's transaction
has little risks of getting aborted.  On the other hand, the writing
back needs to be aware of the fact that it's not in the same transaction
as the original setting up, which means that other unrelated
transactions may have run in-between.&lt;/p&gt;

&lt;h2&gt;&lt;a id="one-step-towards-the-future" name="one-step-towards-the-future"&gt;One step towards the future?&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;These, and others, are the problems of the TM approach.  They are &amp;quot;new&amp;quot;
problems, too, in the sense that the existing ways of programming don't
have these problems.&lt;/p&gt;
&lt;p&gt;Still, as you have guessed, I think that it is overall a win, and
possibly a big win --- a win that might be on the same scale for the age
of multiple CPUs as automatic garbage collection was 20 years ago for
the age of RAM size explosion.&lt;/p&gt;
&lt;p&gt;Stay tuned for more!&lt;/p&gt;
&lt;p&gt;--- Armin (and reviews by Antonio and Fijal)&lt;/p&gt;

&lt;br&gt;
&lt;b&gt;UPDATE:&lt;/b&gt; please look at the tiny &lt;a href="https://bitbucket.org/arigo/arigo/raw/default/hack/stm/transactionmodule/"&gt;transaction module&lt;/a&gt; I wrote as an example.  The idea is to have the same interface as this module, but implemented differently.  By making use of transactional memory internally, it should be possible to safely run on multiple CPUs while keeping the very same programmer interface.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-7225309560970774590?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/POusKEgvNK4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/7225309560970774590/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=7225309560970774590" title="17 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7225309560970774590?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7225309560970774590?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/POusKEgvNK4/transactional-memory-ii.html" title="Transactional Memory (II)" /><author><name>Armin Rigo</name><uri>http://www.blogger.com/profile/06300515270104686574</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>17</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/transactional-memory-ii.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0MFQHc5cCp7ImA9WhRVEkQ.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-3336055571122066974</id><published>2012-01-10T20:21:00.008+01:00</published><updated>2012-01-11T17:30:11.928+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-11T17:30:11.928+01:00</app:edited><title>NumPyPy progress report - running benchmarks</title><content type="html">&lt;p&gt;Hello.&lt;/p&gt;
&lt;p&gt;We're excited to let you know about some of the great progress we've made on
NumPyPy: both completeness and performance. In this blog entry we mostly
will talk about performance and how much progress we have made so far.
&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Word of warning:&lt;/strong&gt; this
work is in progress -- we're maybe half way to where we want to be and there are
many trivial and not so trivial optimizations to be written. (For example, we
haven't even started to implement important optimizations, like vectorization.)&lt;/p&gt;
&lt;div class="section" id="benchmark"&gt;
&lt;h1&gt;Benchmark&lt;/h1&gt;
&lt;p&gt;We chose a laplace equation solver, based on SciPy's &lt;a class="reference external" href="http://www.scipy.org/PerformancePython"&gt;PerformancePython&lt;/a&gt; wiki.
Unfortunately, the different implementations on the wiki page accidentally use
two different algorithms, which have different convergences, and very different
performance characteristics on modern computers. As a result, we implemented
our own versions in both C and Python (with and without NumPy). The full source
can be found in &lt;a class="reference external" href="https://bitbucket.org/fijal/hack2/src/default/bench/laplace"&gt;fijal's hack&lt;/a&gt; repo, all these benchmarks were performed at
revision 18502dbbcdb3.&lt;/p&gt;
&lt;p&gt;First, let me describe various algorithms used. Note that some of them contain
PyPy-specific hacks to work around limitations in the current implementation.
These hacks will go away eventually and the performance will improve.
Numerically the algorithms used are identical, however exact data layout in
memory differs between them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A note about all the benchmarks:&lt;/strong&gt; they each were run once, but the
performance is very stable across runs.&lt;/p&gt;
&lt;p&gt;Starting with the C version, it implements a trivial laplace transform
using two loops and double-reference memory (array of &lt;tt class="docutils literal"&gt;int*&lt;/tt&gt;). The double
reference does not matter for performance and the two algorithms are
implemented in &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;inline-laplace.c&lt;/span&gt;&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;laplace.c&lt;/tt&gt;. They were both compiled
with &lt;tt class="docutils literal"&gt;gcc 4.4.5&lt;/tt&gt; at &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-O3&lt;/span&gt;&lt;/tt&gt;. The inline version modifies array in-place while the non-inline version stores results in a copy. That makes them converge at different rate, hence different number of iterations&lt;/p&gt;
&lt;p&gt;A straightforward version of those in Python is implemented in &lt;tt class="docutils literal"&gt;laplace.py&lt;/tt&gt;
using, respectively, &lt;tt class="docutils literal"&gt;inline_slow_time_step&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;slow_time_step&lt;/tt&gt;.
&lt;tt class="docutils literal"&gt;slow_2_time_step&lt;/tt&gt; does the same thing, except it copies arrays in-place
instead of creating new copies. Table below compares running PyPy against C:&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="35%" /&gt;
&lt;col width="34%" /&gt;
&lt;col width="31%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;bench&lt;/td&gt;
&lt;td&gt;number of iterations&lt;/td&gt;
&lt;td&gt;time per iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;laplace C&lt;/td&gt;
&lt;td&gt;219&lt;/td&gt;
&lt;td&gt;6.3ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;inline-laplace C&lt;/td&gt;
&lt;td&gt;278&lt;/td&gt;
&lt;td&gt;20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;slow python&lt;/td&gt;
&lt;td&gt;219&lt;/td&gt;
&lt;td&gt;17ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;slow 2 python&lt;/td&gt;
&lt;td&gt;219&lt;/td&gt;
&lt;td&gt;14ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;inline_slow python&lt;/td&gt;
&lt;td&gt;278&lt;/td&gt;
&lt;td&gt;23.7ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;An important thing to notice is the data dependency of the inline
version causes a huge slowdown for the C versions. This is not a severe
disadvantage for us though -- the brain-dead Python version takes longer
and PyPy is not able to take advantage of the knowledge that the data is
independent. The results are in the same ballpark as the C versions --
&lt;strong&gt;15% - 170%&lt;/strong&gt; slower, but the algorithm
one chooses matters more than the language. By comparison, the slow versions
take about &lt;strong&gt;5.75s&lt;/strong&gt; each on CPython 2.6 per iteration and, by estimation,
are about &lt;strong&gt;200x&lt;/strong&gt; slower than the PyPy equivalent, if I had the patience to
measure the full run.&lt;/p&gt;
&lt;p&gt;The next step is to use NumPy expressions. The first problem we run into is
that computing the error requires walking the entire array a second time. This
is fairly inefficient in terms of cache access, so I took the liberty of
computing the errors every 15 steps. This results in the convergence being
rounded to the nearest 15 iterations, but speeds things up considerably.
&lt;tt class="docutils literal"&gt;numeric_time_step&lt;/tt&gt; takes the most braindead approach of replacing the array
with itself, like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
                       (u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv
&lt;/pre&gt;
&lt;p&gt;We need 3 arrays here -- one is an intermediate (PyPy only needs one, for all of
those subexpressions), one is a copy for computing the error, and one is the
result. This works automatically because in NumPy &lt;tt class="docutils literal"&gt;+&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;*&lt;/tt&gt; creates an
intermediate, while NumPyPy avoids allocating the intermediate if possible.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;numeric_2_time_step&lt;/tt&gt; works in pretty much the same way:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
src = self.u
self.u = src.copy()
self.u[1:-1, 1:-1] = ((src[0:-2, 1:-1] + src[2:, 1:-1])*dy2 +
                      (src[1:-1,0:-2] + src[1:-1, 2:])*dx2)*dnr_inv
&lt;/pre&gt;
&lt;p&gt;except the copy is now explicit rather than implicit.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;numeric_3_time_step&lt;/tt&gt; does the same thing, but notice one doesn't have to copy
the entire array, it's enough to copy the border pieces and fill rest with
zeros:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
src = self.u
self.u = numpy.zeros((self.nx, self.ny), 'd')
self.u[0] = src[0]
self.u[-1] = src[-1]
self.u[:, 0] = src[:, 0]
self.u[:, -1] = src[:, -1]
self.u[1:-1, 1:-1] = ((src[0:-2, 1:-1] + src[2:, 1:-1])*dy2 +
                      (src[1:-1,0:-2] + src[1:-1, 2:])*dx2)*dnr_inv
&lt;/pre&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;numeric_4_time_step&lt;/tt&gt; is the one that tries hardest to resemble the C version.
Instead of doing an array copy, it actually notices that one can alternate
between two arrays. This is exactly what the C version does. The
&lt;tt class="docutils literal"&gt;remove_invalidates&lt;/tt&gt; call is a PyPy specific hack - we hope to remove this
call in the near future, but, in short, it promises &amp;quot;I don't have any unbuilt
intermediates that depend on the value of the argument&amp;quot;, which means one doesn't
have to compute sub-expressions one is not actually using:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
remove_invalidates(self.old_u)
remove_invalidates(self.u)
self.old_u[:,:] = self.u
src = self.old_u
self.u[1:-1, 1:-1] = ((src[0:-2, 1:-1] + src[2:, 1:-1])*dy2 +
                      (src[1:-1,0:-2] + src[1:-1, 2:])*dx2)*dnr_inv
&lt;/pre&gt;
&lt;p&gt;This one is the most comparable to the C version.&lt;/p&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;numeric_5_time_step&lt;/tt&gt; does the same thing, but notices one doesn't have to copy
the entire array, it's enough to just copy the edges. This is an optimization
that was not done in the C version:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
remove_invalidates(self.old_u)
remove_invalidates(self.u)
src = self.u
self.old_u, self.u = self.u, self.old_u
self.u[0] = src[0]
self.u[-1] = src[-1]
self.u[:, 0] = src[:, 0]
self.u[:, -1] = src[:, -1]
self.u[1:-1, 1:-1] = ((src[0:-2, 1:-1] + src[2:, 1:-1])*dy2 +
                      (src[1:-1,0:-2] + src[1:-1, 2:])*dx2)*dnr_inv
&lt;/pre&gt;
&lt;p&gt;Let's look at the table of runs. As before, &lt;tt class="docutils literal"&gt;gcc 4.4.5&lt;/tt&gt;, compiled at &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-O3&lt;/span&gt;&lt;/tt&gt;,
and PyPy nightly 7bb8b38d8563, on an x86-64 machine. All of the numeric methods
run for 226 steps, slightly more than the 219, rounding to the next 15 when the
error is computed.&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="44%" /&gt;
&lt;col width="25%" /&gt;
&lt;col width="31%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;benchmark&lt;/td&gt;
&lt;td&gt;PyPy&lt;/td&gt;
&lt;td&gt;CPython&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;numeric&lt;/td&gt;
&lt;td&gt;21ms&lt;/td&gt;
&lt;td&gt;35ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;numeric 2&lt;/td&gt;
&lt;td&gt;14ms&lt;/td&gt;
&lt;td&gt;37ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;numeric 3&lt;/td&gt;
&lt;td&gt;13ms&lt;/td&gt;
&lt;td&gt;29ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;numeric 4&lt;/td&gt;
&lt;td&gt;11ms&lt;/td&gt;
&lt;td&gt;31ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;numeric 5&lt;/td&gt;
&lt;td&gt;9.3ms&lt;/td&gt;
&lt;td&gt;21ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We think that these preliminary results are pretty good. They're not as fast as
the C version (or as fast as we'd like them to be), but we're already much
faster than NumPy on CPython -- almost always by more than 2x on this relatively
real-world example. This is not the end, though. In fact, it's hardly the
beginning! As we continue work, we hope to make even more use of the
high level information that we have. Looking at the assembler generated by
gcc for this example, it's pretty clear we can outperform it thanks to better
aliasing information and hence better possibilities for vectorization.
Stay tuned.&lt;/p&gt;
&lt;p&gt;EDIT: fixed the benchmark name
&lt;/p&gt;
&lt;p&gt;EDIT2: added info that first table is about PyPy&lt;/p&gt;
&lt;p&gt;Cheers,
fijal&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-3336055571122066974?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/uj7qkz-DuxM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/3336055571122066974/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=3336055571122066974" title="17 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3336055571122066974?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3336055571122066974?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/uj7qkz-DuxM/numpypy-progress-report-running.html" title="NumPyPy progress report - running benchmarks" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>17</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2012/01/numpypy-progress-report-running.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0YER3o-fSp7ImA9WhRWEE0.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6862532189897876336</id><published>2011-12-27T17:57:00.001+01:00</published><updated>2011-12-27T17:58:26.455+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-27T17:58:26.455+01:00</app:edited><title>Leysin Winter Sprint</title><content type="html">&lt;h2&gt;PyPy Leysin Winter Sprint: 15-22nd January 2012&lt;/h2&gt;

&lt;p&gt;
The next PyPy sprint will be in Leysin, Switzerland, for the
eighth time.  This is a fully public sprint: newcomers and topics
other than those proposed below are welcome.&lt;/p&gt;

&lt;h3&gt;Goals and topics of the sprint&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Py3k: work towards supporting Python 3 in PyPy

&lt;li&gt;NumPyPy: work towards supporting the numpy module in PyPy

&lt;li&gt;JIT backends: integrate tests for ARM; look at the PowerPC 64;
  maybe try again to write an LLVM- or GCC-based one

&lt;li&gt;STM and STM-related topics; or the Concurrent Mark-n-Sweep GC

&lt;li&gt;And as usual, the main side goal is to have fun in winter sports :-)
  We can take a day off for ski.
&lt;/ul&gt;

&lt;h3&gt;Exact times&lt;/h3&gt;

&lt;p&gt;The work days should be 15-21 January 2011 (Sunday-Saturday).  The
official plans are for people to arrive on the 14th or the 15th, and to
leave on the 22nd.&lt;/p&gt;

&lt;p&gt;Interested? &lt;a href="https://bitbucket.org/pypy/extradoc/raw/extradoc/sprintinfo/leysin-winter-2012/announcement.txt"&gt;Read more...&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6862532189897876336?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/4nHdPvgIKRU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6862532189897876336/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6862532189897876336" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6862532189897876336?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6862532189897876336?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/4nHdPvgIKRU/leysin-winter-sprint.html" title="Leysin Winter Sprint" /><author><name>Armin Rigo</name><uri>http://www.blogger.com/profile/06300515270104686574</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/12/leysin-winter-sprint.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkEDQXg_fip7ImA9WhRXFkk.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-610420698450130659</id><published>2011-12-22T23:27:00.003+01:00</published><updated>2011-12-23T14:57:50.646+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-23T14:57:50.646+01:00</app:edited><title>Come see us at PyCon 2012</title><content type="html">&lt;p&gt;&lt;a class="reference external" href="https://us.pycon.org/2012/"&gt;PyCon 2012&lt;/a&gt; is coming up in just a few short months, and PyPy will be well&lt;br /&gt;
represented there.  We'll be delivering a tutorial, two talks, plus we'll be&lt;br /&gt;
around for the sprints.&lt;/p&gt;&lt;p&gt;Here are the abstracts for the tutorials and talks:&lt;/p&gt;&lt;ul class="simple"&gt;&lt;li&gt;&lt;strong&gt;How to get the most out of your PyPy&lt;/strong&gt;, by Maciej Fijalkowski, Alex Gaynor&lt;br /&gt;
and Armin Rigo: For many applications PyPy can provide performance benefits&lt;br /&gt;
right out of the box. However, little details can push your application to&lt;br /&gt;
perform much better. In this tutorial we'll give you insights on how to push&lt;br /&gt;
PyPy to its limits. We'll focus on understanding the performance&lt;br /&gt;
characteristics of PyPy, and learning the analysis tools in order to maximize&lt;br /&gt;
your applications' performance. &lt;em&gt;This is the tutorial.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why PyPy by example&lt;/strong&gt;, by Maciej Fijalkowski, Alex Gaynor and Armin Rigo:&lt;br /&gt;
One of the goals of PyPy is to make existing Python code faster; however an&lt;br /&gt;
even broader goal was to make it possible to write things in Python that&lt;br /&gt;
previously would needed to be written in C or other low-level language. This&lt;br /&gt;
talk will show examples of this, and describe how they represent the&lt;br /&gt;
tremendous progress PyPy has made, and what it means for people looking at&lt;br /&gt;
using PyPy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How the PyPy JIT works&lt;/strong&gt;, by Benjamin Peterson: The Python community is&lt;br /&gt;
abuzz about the major speed gains PyPy can offer for pure Python code. But how&lt;br /&gt;
does the PyPy JIT actually work? This talk will discuss how the PyPy JIT is&lt;br /&gt;
implemented. It will include descriptions of the tracing, optimization, and&lt;br /&gt;
assembly generation phases. I will demonstrate each step with an example loop.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;If you have any questions let us know!  We look forward to seeing people at&lt;br /&gt;
PyCon and chatting about PyPy and the entire Python ecosystem.&lt;/p&gt;&lt;p&gt;See you there,&lt;br /&gt;
Maciej Fijalkowski, Alex Gaynor, Benjamin Peterson, Armin Rigo, and the entire PyPy team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-610420698450130659?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/4OOmhB-scoc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/610420698450130659/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=610420698450130659" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/610420698450130659?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/610420698450130659?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/4OOmhB-scoc/come-see-us-at-pycon-2012.html" title="Come see us at PyCon 2012" /><author><name>Alex</name><uri>http://www.blogger.com/profile/14054821112394577330</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/12/come-see-us-at-pycon-2012.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUMDSXg5fip7ImA9WhRQE0U.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6389240123679375092</id><published>2011-12-08T23:29:00.001+01:00</published><updated>2011-12-08T23:31:18.626+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-08T23:31:18.626+01:00</app:edited><title>Plotting using matplotlib from PyPy</title><content type="html">&lt;p&gt;&lt;strong&gt;Big fat warning&lt;/strong&gt; This is just a proof of concept. It barely works. There are
missing pieces left and right, which were replaced with hacks so I can get this
to run and prove it's possible. Don't try this at home, especially your home.
You have been warned.&lt;/p&gt;
&lt;p&gt;There has been a lot of talking about PyPy not integrating well with the
current scientific Python ecosystem, and &lt;tt class="docutils literal"&gt;numpypy&lt;/tt&gt; (a NumPy reimplementation
on top of pypy) was dubbed &amp;quot;a fancy array library&amp;quot;. I'm going to show that
integration with this ecosystem is possible with our design.&lt;/p&gt;
&lt;p&gt;First, &lt;a class="reference external" href="https://bitbucket.org/fijal/hack2/src/default/embed/embed/matplotwrapper.py"&gt;the demo&lt;/a&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#!/usr/bin/env pypy

# numpy, pypy version
import numpypy as numpy
# DRAGONS LIVE THERE (fortunately hidden)
from embed.emb import import_mod

pylab = import_mod('matplotlib.pylab')

if __name__ == '__main__':
    a = numpy.arange(100, dtype=int)
    b = numpy.sin(a)
    pylab.plot(a, b)
    pylab.show()
&lt;/pre&gt;
&lt;p&gt;And you get:&lt;/p&gt;

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-4WFlo06T9AQ/TuE6jWWCdXI/AAAAAAAABiY/g4gZPlWg0-U/s1600/screen0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 258px;" src="http://3.bp.blogspot.com/-4WFlo06T9AQ/TuE6jWWCdXI/AAAAAAAABiY/g4gZPlWg0-U/s320/screen0.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5683888583686124914" /&gt;&lt;/a&gt;

&lt;p&gt;Now, how to reproduce it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;You need a PyPy without cpyext, I did not find a linker that would support
overriding symbols. Right now there are no nightlies like this, so you have
to compile it yourself, like:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
./translate.py -Ojit targetpypystandalone.py --withoutmod-cpyext
&lt;/pre&gt;
&lt;p&gt;That would give you a PyPy that's unable to load some libraries like PIL, but
perfectly working otherwise.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Speaking of which, you need a reasonably recent PyPy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;The approach is generally portable, however the implementation has been
tested only on 64bit linux. Few tweaks might be required.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;You need to install python2.6, the python2.6 development headers, and have
numpy and matplotlib installed on that python.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;You need a checkout of my &lt;a class="reference external" href="https://bitbucket.org/fijal/hack2"&gt;hacks directory&lt;/a&gt; and put embedded on your
&lt;tt class="docutils literal"&gt;PYTHONPATH&lt;/tt&gt;, your pypy checkout also has to be on the &lt;tt class="docutils literal"&gt;PYTHONPATH&lt;/tt&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="section" id="er-wait-what-happened"&gt;
&lt;h3&gt;Er wait, what happened?&lt;/h3&gt;
&lt;p&gt;What didn't happen is we did not reimplement matplotlib on top of PyPy. What
did happen is we embed CPython inside of PyPy using ctypes. We instantiate it.
and follow the &lt;a class="reference external" href="http://docs.python.org/extending/embedding.html"&gt;embedding&lt;/a&gt; tutorial for CPython. Since numpy arrays are not
movable, we're able to pass around an integer that's represents the memory
address of the array data and reconstruct it in the embedded interpreter. Hence
with a relatively little effort we managed to reuse the same array data on both
sides to plot at array. Easy, no?&lt;/p&gt;
&lt;p&gt;This approach can be extended to support anything that's not too tied with
python objects. SciPy and matplotlib both fall into the same category
but probably the same strategy can be applied to anything, like GTK or QT.
It's just a matter of extending a hack into a working library.&lt;/p&gt;
&lt;p&gt;To summarize, while we're busy making numpypy better and faster, it seems
that all external libraries on the C side can be done using an embedded Python
interpreter with relatively little effort. To get to that point, I spent
a day and a half to learn how to embed CPython, with very little prior
experience in the CPython APIs. Of course you should still keep as much as
possible in PyPy to make it nice and fast :)&lt;/p&gt;
&lt;p&gt;Cheers,
fijal&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6389240123679375092?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/Wj-0eduzUJ0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6389240123679375092/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6389240123679375092" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6389240123679375092?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6389240123679375092?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/Wj-0eduzUJ0/plotting-using-matplotlib-from-pypy.html" title="Plotting using matplotlib from PyPy" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-4WFlo06T9AQ/TuE6jWWCdXI/AAAAAAAABiY/g4gZPlWg0-U/s72-c/screen0.png" height="72" width="72" /><thr:total>6</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/12/plotting-using-matplotlib-from-pypy.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkMASHY9cSp7ImA9WhRRFUU.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-4962523601794245248</id><published>2011-11-29T16:24:00.004+01:00</published><updated>2011-11-29T16:27:29.869+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-29T16:27:29.869+01:00</app:edited><title>PyPy 1.7 on Win32</title><content type="html">&lt;p&gt;Hi all,&lt;/p&gt;

&lt;p&gt;We have fixed &lt;a href="http://doc.pypy.org/en/latest/stackless.html"&gt;_continuation&lt;/a&gt; on Win32 (thanks Stakkars), and so we have now a Win32 version of &lt;a href="http://pypy.org/download.html"&gt;PyPy 1.7.&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-4962523601794245248?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/QOut5II5yRA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/4962523601794245248/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=4962523601794245248" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/4962523601794245248?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/4962523601794245248?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/QOut5II5yRA/pypy-17-on-win32.html" title="PyPy 1.7 on Win32" /><author><name>Armin Rigo</name><uri>http://www.blogger.com/profile/06300515270104686574</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/11/pypy-17-on-win32.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C08MSHo9eip7ImA9WhRSGEo.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-4260962828394182017</id><published>2011-11-21T11:34:00.002+01:00</published><updated>2011-11-21T11:38:09.462+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-21T11:38:09.462+01:00</app:edited><title>PyPy 1.7 - widening the sweet spot</title><content type="html">&lt;p&gt;We're pleased to announce the 1.7 release of PyPy. As became a habit, this
release brings a lot of bugfixes and performance improvements over the 1.6
release. However, unlike the previous releases, the focus has been on widening
the &amp;quot;sweet spot&amp;quot; of PyPy. That is, classes of Python code that PyPy can greatly
speed up should be vastly improved with this release. You can download the 1.7
release here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external" href="http://pypy.org/download.html"&gt;http://pypy.org/download.html&lt;/a&gt;&lt;/blockquote&gt;
&lt;div class="section" id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7. It's fast (&lt;a class="reference external" href="http://speed.pypy.org"&gt;pypy 1.7 and cpython 2.7.1&lt;/a&gt; performance comparison)
due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;This release supports x86 machines running Linux 32/64, Mac OS X 32/64 or
Windows 32. Windows 64 work is ongoing, but not yet natively supported.&lt;/p&gt;
&lt;p&gt;The main topic of this release is widening the range of code which PyPy
can greatly speed up. On average on
our benchmark suite, PyPy 1.7 is around &lt;strong&gt;30%&lt;/strong&gt; faster than PyPy 1.6 and up
to &lt;strong&gt;20 times&lt;/strong&gt; faster on some benchmarks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h3&gt;Highlights&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;Numerous performance improvements. There are too many examples which python
constructs now should behave faster to list them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Bugfixes and compatibility fixes with CPython.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Windows fixes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;PyPy now comes with stackless features enabled by default. However,
any loop using stackless features will interrupt the JIT for now, so no real
performance improvement for stackless-based programs. Contact pypy-dev for
info how to help on removing this restriction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;NumPy effort in PyPy was renamed numpypy. In order to try using it, simply
write:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import numpypy as numpy
&lt;/pre&gt;
&lt;p&gt;at the beginning of your program. There is a huge progress on numpy in PyPy
since 1.6, the main feature being implementation of dtypes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;JSON encoder (but not decoder) has been replaced with a new one. This one
is written in pure Python, but is known to outperform CPython's C extension
up to &lt;strong&gt;2 times&lt;/strong&gt; in some cases. It's about &lt;strong&gt;20 times&lt;/strong&gt; faster than
the one that we had in 1.6.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;The memory footprint of some of our RPython modules has been drastically
improved. This should impact any applications using for example cryptography,
like tornado.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;There was some progress in exposing even more CPython C API via cpyext.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="things-that-didn-t-make-it-expect-in-1-8-soon"&gt;
&lt;h3&gt;Things that didn't make it, expect in 1.8 soon&lt;/h3&gt;
&lt;p&gt;There is an ongoing work, which while didn't make it to the release, is
probably worth mentioning here. This is what you should probably expect in
1.8 some time soon:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Specialized list implementation. There is a branch that implements lists of
integers/floats/strings as compactly as array.array. This should drastically
improve performance/memory impact of some applications&lt;/li&gt;
&lt;li&gt;NumPy effort is progressing forward, with multi-dimensional arrays coming
soon.&lt;/li&gt;
&lt;li&gt;There are two brand new JIT assembler backends, notably for the PowerPC and
ARM processors.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="fundraising"&gt;
&lt;h3&gt;Fundraising&lt;/h3&gt;
&lt;p&gt;It's maybe worth mentioning that we're running fundraising campaigns for
NumPy effort in PyPy and for Python 3 in PyPy. In case you want to see any
of those happen faster, we urge you to donate to &lt;a class="reference external" href="http://pypy.org/numpydonate.html"&gt;numpy proposal&lt;/a&gt; or
&lt;a class="reference external" href="http://pypy.org/py3donate.html"&gt;py3k proposal&lt;/a&gt;. In case you want PyPy to progress, but you trust us with
the general direction, you can always donate to the &lt;a class="reference external" href="http://pypy.org"&gt;general pot&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Cheers,&lt;br/&gt;Maciej Fijałkowki, Armin Rigo and the entire PyPy team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-4260962828394182017?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/YDxdPHGBChk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/4260962828394182017/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=4260962828394182017" title="27 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/4260962828394182017?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/4260962828394182017?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/YDxdPHGBChk/pypy-17-widening-sweet-spot.html" title="PyPy 1.7 - widening the sweet spot" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>27</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/11/pypy-17-widening-sweet-spot.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUIGQnk6fCp7ImA9WhRSEkU.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-8371395613874909242</id><published>2011-11-14T12:42:00.002+01:00</published><updated>2011-11-14T16:12:03.714+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-14T16:12:03.714+01:00</app:edited><title>Gothenburg sprint report</title><content type="html">In the past week, we have been busy hacking on PyPy at the Gothenburg sprint, the second of this 2011.  The sprint was hold at Laura's and Jacob's place, and here is a brief report of what happened.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the first day we welcomed Mark Pearse, who was new to PyPy and at his first sprint.  Mark worked the whole sprint in the new &lt;a class="reference external" href="http://bitbucket.org/pypy/pypy/changesets/tip/branch%28%22SpecialisedTuples%22%29"&gt;SpecialisedTuple&lt;/a&gt; branch, whose aim is to have a special implementation for small 2-items and 3-items tuples of primitive types (e.g., ints or floats) to save memory.  Mark paired with Antonio for a couple of days, then he continued alone and did an amazing job.  He even learned how to properly do Test Driven Development :-).&lt;br /&gt;&lt;br/&gt;
Antonio spent a couple of days investigating whether it is possible to use &lt;cite&gt;application checkpoint&lt;/cite&gt; libraries such as &lt;a class="reference external" href="http://ftg.lbl.gov/projects/CheckpointRestart/"&gt;BLCR&lt;/a&gt; and &lt;a class="reference external" href="http://dmtcp.sourceforge.net/"&gt;DMTCP&lt;/a&gt; to save the state of the PyPy interpreter between subsequent runs, thus saving also the JIT-compiled code to reduce the warmup time.  The conclusion is that these are interesting technologies, but more work would be needed (either on the PyPy side or on the checkpoint library side) before it can have a practical usage for PyPy users.&lt;br /&gt;&lt;br/&gt;
Then, Antonio spent most of the rest of the sprint working on his &lt;a class="reference external" href="http://bitbucket.org/pypy/pypy/changesets/tip/branch%28%22ffistruct%22%29"&gt;ffistruct&lt;/a&gt; branch, whose aim is to provide a very JIT-friendly way to interact with C structures, and eventually implement &lt;tt class="docutils literal"&gt;ctypes.Structure&lt;/tt&gt; on top of that.  The "cool part" of the branch is already done, and the JIT already can compile set/get of fields into a single fast assembly instruction, about 400 times faster than the corresponding ctypes code.  What is still left to do is to add a nicer syntax (which is easy) and to implement all the ctypes peculiarities (which is tedious, at best :-)).&lt;br /&gt;&lt;br/&gt;
As usual, Armin did tons of different stuff, including fixing a JIT bug, improving the performance of &lt;tt class="docutils literal"&gt;file.readlines()&lt;/tt&gt; and working on the &lt;a class="reference external" href="http://bitbucket.org/pypy/pypy/changesets/tip/branch%28%22stm%22%29"&gt;STM&lt;/a&gt; branch (for Software Transactional Memory), which is now able to run RPython multithreaded programs using software transaction (as long as they don't fill up all the memory, because support for the GC is still missing :-)).  Finally, he worked on improving the Windows version of PyPy. While doing so he discovered together with Anto a terrible bug which lead to a continuous leak of stack space because the JIT called some functions using the wrong calling convention.&lt;br /&gt;&lt;br/&gt;
Håkan, with some help from Armin, worked on the &lt;a class="reference external" href="http://bitbucket.org/pypy/pypy/changesets/tip/branch%28%22stm%22%29"&gt;jit-targets&lt;/a&gt; branch, whose goal is to heavily refactor the way the traces are internally represented by the JIT, so that in the end we can produce (even :-)) better code than what we do nowadays.  More details in this &lt;a class="reference external" href="http://mail.python.org/pipermail/pypy-dev/2011-November/008728.html"&gt;mail&lt;/a&gt;.&lt;br /&gt;&lt;br/&gt;
Andrew Dalke worked on a way to integrate PyPy with FORTRAN libraries, and in particular the ones which are wrapped by Numpy and Scipy: in doing so, he wrote &lt;a class="reference external" href="http://bitbucket.org/pypy/f2pypy"&gt;f2pypy&lt;/a&gt;, which is similar to the existing &lt;tt class="docutils literal"&gt;f2py&lt;/tt&gt; but instead of producing a CPython extension module it produces a pure python modules based on &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt;.  More work is needed before it can be considered complete, but &lt;tt class="docutils literal"&gt;f2pypy&lt;/tt&gt; is already able to produce a wrapper for BLAS which passes most of the tests under CPython, although there's still work left to get it working for PyPy.&lt;br /&gt;&lt;br/&gt;
&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-VSYa9zxkn_Y/TsD-Dera4zI/AAAAAAAAAPE/Yoea-XAY0Kg/s1600/5x-cake.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/-VSYa9zxkn_Y/TsD-Dera4zI/AAAAAAAAAPE/Yoea-XAY0Kg/s320/5x-cake.jpg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Armin and Håkan with Laura's "5x faster" cake&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;Christian Tismer worked the whole sprint on the branch to make PyPy compatible with Windows 64 bit.  This needs a lot of work because a lot of PyPy is written under the assumption that the &lt;tt class="docutils literal"&gt;long&lt;/tt&gt; type in C has the same bit size than &lt;tt class="docutils literal"&gt;void*&lt;/tt&gt;, which is not true on Win64.  Christian says that in the past Genova-Pegli sprint he completed 90% of the work, and in this sprint he did the other 90% of the work.  Obviously, what is left to complete the task is the third 90% :-).  More seriously, he estimated a total of 2-4 person-weeks of work to finish it.&lt;br /&gt;&lt;br/&gt;
But, all in all, the best part of the sprint has been the cake that Laura baked to celebrate the "5x faster than CPython" achievement. Well, actually our &lt;a class="reference external" href="http://speed.pypy.org/"&gt;speed&lt;/a&gt; page reports "only" 4.7x, but that's because in the meantime we switched from comparing against CPython 2.6 to comparing against CPython 2.7, which is slightly faster.  We are confident that we will reach the 5x goal again, and that will be the perfect excuse to eat another cake :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-8371395613874909242?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/RFnbC60eu-M" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/8371395613874909242/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=8371395613874909242" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8371395613874909242?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8371395613874909242?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/RFnbC60eu-M/gothenburg-sprint-report.html" title="Gothenburg sprint report" /><author><name>Antonio Cuni</name><uri>http://www.blogger.com/profile/17017456817083804792</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="27" height="32" src="http://3.bp.blogspot.com/-L5bYMW7FznA/Tb3Ms8nMvRI/AAAAAAAAANI/dbKVSIc2dzk/s220/anto-bn3.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-VSYa9zxkn_Y/TsD-Dera4zI/AAAAAAAAAPE/Yoea-XAY0Kg/s72-c/5x-cake.jpg" height="72" width="72" /><thr:total>7</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/11/gothenburg-sprint-report.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0INQXs7fSp7ImA9WhdaF04.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-8937643890263223898</id><published>2011-10-27T17:48:00.004+02:00</published><updated>2011-10-27T20:19:50.505+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-27T20:19:50.505+02:00</app:edited><title>Speeding up JSON encoding in PyPy</title><content type="html">&lt;p&gt;Hi&lt;/p&gt;
&lt;p&gt;Recently I spent a bit of effort into speeding up JSON in PyPy. I started with
writing a &lt;a class="reference external" href="https://bitbucket.org/pypy/benchmarks/src/f04d6d63ba60/own/json_bench.py"&gt;benchmark&lt;/a&gt;, which is admittedly not a very good one, but it's
better than nothing (suggestions on how to improve it are welcome!).&lt;/p&gt;
&lt;p&gt;For this particular benchmark, the numbers are as follow. &lt;strong&gt;Note that CPython by
default uses the optimized C extension, while PyPy uses the pure Python one&lt;/strong&gt;.
PyPy trunk contains another pure Python version which has been optimized
specifically for the PyPy JIT. Detailed optimizations are described later in
this post.&lt;/p&gt;
&lt;p&gt;The number reported is the time taken for the third run, when things are
warmed up. Full session &lt;a class="reference external" href="http://paste.pocoo.org/show/498988/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="68%" /&gt;
&lt;col width="32%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;CPython 2.6&lt;/td&gt;
&lt;td&gt;22s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CPython 2.7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.7s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CPython 2.7 no C extension&lt;/td&gt;
&lt;td&gt;44s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PyPy 1.5&lt;/td&gt;
&lt;td&gt;34s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PyPy 1.6&lt;/td&gt;
&lt;td&gt;22s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PyPy trunk&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.3s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Lessons learned:&lt;/p&gt;
&lt;div class="section" id="expectations-are-high"&gt;
&lt;h3&gt;Expectations are high&lt;/h3&gt;
&lt;p&gt;A lot of performance critical stuff in Python world is already written in a hand
optimized C. Writing C (especially when you interface with CPython C API) is
ugly and takes significant effort. This approach does not scale well when
there is a lot of code to be written or when there is a very tight coupling
between the part to be rewritten and the rest of the code. Still, people would
expect PyPy to be better at &amp;quot;tasks&amp;quot; and not precisely at running equivalent
code, hence a comparison between the C extension and the pure python version
is sound. Fortunately it's possible to outperform the C extension, but requires
a bit of effort on the programmer side as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="often-interface-between-the-c-and-python-part-is-ugly"&gt;
&lt;h3&gt;Often interface between the C and Python part is ugly&lt;/h3&gt;
&lt;p&gt;This is very clear if you look at json module as implemented in CPython's
standard library. Not everything is in C (it would probably be just too
much effort) and the interface to what is in C is guided via profiling not
by what kind of interface makes sense. This especially is evident comparing CPython 2.6 to 2.7.
Just adapting the code to an interface with C made the Python version slower.
Removing this clutter improves the readability a lot and improves PyPy's version
a bit, although I don't have hard numbers.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="jitviewer-is-crucial"&gt;
&lt;h3&gt;JitViewer is crucial&lt;/h3&gt;
&lt;p&gt;In case you're fighting with PyPy's performance, &lt;a class="reference external" href="https://bitbucket.org/pypy/jitviewer"&gt;jitviewer&lt;/a&gt; is worth a shot.
While it's not completely trivial to understand what's going on, it'll
definitely show you what kind of loops got compiled and how.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="no-nice-and-fast-way-to-build-strings-in-python"&gt;
&lt;h3&gt;No nice and fast way to build strings in Python&lt;/h3&gt;
&lt;p&gt;PyPy has a custom thing called &lt;tt class="docutils literal"&gt;__pypy__.builders.StringBuilder&lt;/tt&gt;. It has
a few a features that make it much easier to optimize than other ways like
&lt;tt class="docutils literal"&gt;str.join()&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;cStringIO&lt;/tt&gt;.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;You can specify the start size, which helps a lot if you can even provide
a rough estimate on the size of the string (less copying)&lt;/li&gt;
&lt;li&gt;Only append and build are allowed. While  the string is being built you
can't seek or do anything else. After it's built you can never append any more.&lt;/li&gt;
&lt;li&gt;Unicode version available as well as &lt;tt class="docutils literal"&gt;__pypy__.builders.UnicodeBuilder&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="method-calls-are-ok-immutable-globals-are-ok"&gt;
&lt;h3&gt;Method calls are ok, immutable globals are ok&lt;/h3&gt;
&lt;p&gt;PyPy's JIT seems to be good enough for at least the simple cases. Calling
methods for common infrastructure or loading globals (instead of rebinding as
locals) is fast enough and improves code readability.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="string-copying-is-expensive"&gt;
&lt;h3&gt;String copying is expensive&lt;/h3&gt;
&lt;p&gt;&lt;b&gt;Edit:&lt;/b&gt; see the comment at the end&lt;/p&gt;
&lt;p&gt;If you use &lt;tt class="docutils literal"&gt;re.sub&lt;/tt&gt;, the current implementation will always create a copy
of the string even if there was no match to replace.
If you know your regexp is simple, first try to check if there is
anything to replace. This is a pretty hard optimization to
do automatically -- simply matching the regular expression can be too costly
for it to make sense. In our particular example however, the regexp is really
simple, checking ranges of characters. It also seems that this is by far the
fastest way to escape characters as of now.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="generators-are-slower-than-they-should-be"&gt;
&lt;h3&gt;Generators are slower than they should be&lt;/h3&gt;
&lt;p&gt;I changed the entire thing to simply call &lt;tt class="docutils literal"&gt;builder.append&lt;/tt&gt; instead of
yielding to the main loop where it would be gathered. This is kind of a PyPy
bug that using generators extensively is slower, but a bit hard to fix.
Especially in cases where there is relatively little data being passed around
(few bytes), it makes sense to gather it first. If I were to implement an
efficient version of &lt;tt class="docutils literal"&gt;iterencode&lt;/tt&gt;, I would probably handle chunks of
predetermined size, about 1000 bytes instead of yielding data every few bytes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="i-must-admit-i-worked-around-pypy-s-performance-bug"&gt;
&lt;h3&gt;I must admit I worked around PyPy's performance bug&lt;/h3&gt;
&lt;p&gt;For obscure (although eventually fixable) reasons, this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
for c in s: # s is string
  del c
&lt;/pre&gt;
&lt;p&gt;is faster than:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
for c in s:
  pass
&lt;/pre&gt;
&lt;p&gt;This is a PyPy performance bug and should be fixed, but on a different branch ;-)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="pypy-s-jit-is-good"&gt;
&lt;h3&gt;PyPy's JIT is good&lt;/h3&gt;
&lt;p&gt;I was pretty surprised, but the JIT actually did make stuff work nicely.
The changes that were done were relatively minor and straightforward, once
the module was cleaned to the normal &amp;quot;pythonic&amp;quot; state.
It is worth noting that it's possible to write code in Python and make it
run really fast, but you have to be a bit careful. Again, jitviewer is your
friend when determining why things are slow. I hope we can write more tools
in the future that would more automatically guide people through potential
performance pitfals.&lt;/p&gt;
&lt;p&gt;Cheers,
fijal&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Edit:&lt;/b&gt; I was wrong about re.sub. It just seems to be that the JIT is figuring match better than sub, will be fixed soon&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-8937643890263223898?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/-anuUT2hQ5I" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/8937643890263223898/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=8937643890263223898" title="13 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8937643890263223898?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8937643890263223898?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/-anuUT2hQ5I/speeding-up-json-encoding-in-pypy.html" title="Speeding up JSON encoding in PyPy" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>13</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/10/speeding-up-json-encoding-in-pypy.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkEGRn86fCp7ImA9WhdbGEU.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-7335004338996313725</id><published>2011-10-17T21:26:00.002+02:00</published><updated>2011-10-17T21:43:47.114+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-17T21:43:47.114+02:00</app:edited><title>PyPy Göteborg Post-Hallowe'en Sprint Nov 2nd - Nov 9th</title><content type="html">&lt;p&gt;The next PyPy sprint will be in Gothenburg, Sweden. It is a public sprint,
suitable for newcomers.  We'll focus on making a public kickoff for
both the &lt;a class="reference external" href="http://pypy.org/numpydonate.html"&gt;numpy/pypy integration project&lt;/a&gt;
and the &lt;a class="reference external" href="http://pypy.org/py3donate.html"&gt;Py3k support project&lt;/a&gt;,
as well as whatever interests the Sprint attendees.  Since both of these
projects are very new, there will be plenty of work suitable for newcomers
to PyPy.&lt;/p&gt;
&lt;p&gt;Other topics might include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Helping people get their code running with PyPy&lt;/li&gt;
&lt;li&gt;work on a FSCons talk?&lt;/li&gt;
&lt;li&gt;state of the STM Vinnova project (We most likely, but not for certain will
know whether or not we are approved by this date.)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="section" id="other-useful-dates"&gt;
&lt;h3&gt;Other Useful dates&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.meetup.com/GothPy/events/32864862/"&gt;GothPyCon&lt;/a&gt; - Saturday Oct 29.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://fscons.org/"&gt;FSCONS&lt;/a&gt; Friday Nov 11 - Sunday Nov 12.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="location"&gt;
&lt;h3&gt;Location&lt;/h3&gt;
&lt;p&gt;The sprint will be held in the apartment of Laura Creighton and Jacob Hallén
which is at Götabergsgatan 22 in Gothenburg, Sweden.  Here is a &lt;a class="reference external" href="http://bit.ly/grRuQe"&gt;map&lt;/a&gt;.  This is
in central Gothenburg.  It is between the &lt;a class="reference external" href="http://www.vasttrafik.se/en/"&gt;tram&lt;/a&gt; stops of Vasaplatsen and
Valand, (a distance of 4 blocks) where many lines call -- the 2, 3, 4, 5,
7, 10 and 13.&lt;/p&gt;
&lt;p&gt;Probably cheapest and not too far away is to book accomodation at &lt;a class="reference external" href="http://www.sgsveckobostader.se/en"&gt;SGS
Veckobostader&lt;/a&gt;. The  &lt;a class="reference external" href="http://www.elite.se/hotell/goteborg/park/"&gt;Elite Park Avenyn Hotel&lt;/a&gt; is a luxury hotel just a
few blocks away. There are scores of hotels a short walk away from the
sprint location, suitable for every budget, desire for luxury, and desire
for the unusual.  You could, for instance, stay on a &lt;a class="reference external" href="http://www.liseberg.se/en/home/Accommodation/Hotel/Hotel-Barken-Viking/"&gt;boat&lt;/a&gt;.  Options are
too numerous to go into here. Just ask in the mailing list or on the blog.&lt;/p&gt;
&lt;p&gt;Hours will be
from 10:00 until people have had enough.  It's a good idea to arrive a
day before the sprint starts and leave a day later.  In the middle of
the sprint there usually is a break day and it's usually ok to take
half-days off if you feel like it.  Of course, many of you may be interested
in sticking around for FSCons, held the weekend after the sprint.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="good-to-know"&gt;
&lt;h3&gt;Good to Know&lt;/h3&gt;
&lt;p&gt;Sweden is not part of the Euro zone. One SEK (krona in singular, kronor
in plural) is roughly 1/10th of a Euro (9.36 SEK to 1 Euro).&lt;/p&gt;
&lt;p&gt;The venue is central in Gothenburg.  There is a large selection of
places to get food nearby, from edible-and-cheap to outstanding.  We
often cook meals together, so let us know if you have any food allergies,
dislikes, or special requirements.&lt;/p&gt;
&lt;p&gt;Sweden uses the same kind of plugs as Germany. 230V AC.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="getting-here"&gt;
&lt;h3&gt;Getting Here&lt;/h3&gt;
&lt;p&gt;If are coming train, you will arrive at the &lt;a class="reference external" href="http://bit.ly/fON43p"&gt;Central Station&lt;/a&gt;.  It is
about 12 blocks to the site from there, or you can take a &lt;a class="reference external" href="http://www.vasttrafik.se/en/"&gt;tram&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are two airports which are local to Göteborg, &lt;a class="reference external" href="http://swedavia.se/en/Goteborg/Traveller-information/Traffic-information/"&gt;Landvetter&lt;/a&gt; (the main
one) and &lt;a class="reference external" href="http://www.goteborgairport.se/eng.asp"&gt;Gothenburg City Airport&lt;/a&gt; (where some budget airlines fly).
If you arrive at &lt;a class="reference external" href="http://swedavia.se/en/Goteborg/Traveller-information/Traffic-information/"&gt;Landvetter&lt;/a&gt;  the airport bus stops right downtown at
&lt;a class="reference external" href="http://www.elite.se/hotell/goteborg/park/"&gt;Elite Park Avenyn Hotel&lt;/a&gt; which is the second stop, 4 blocks from the
Sprint site, as well as the end of the line, which is the &lt;a class="reference external" href="http://bit.ly/fON43p"&gt;Central Station&lt;/a&gt;.
If you arrive at &lt;a class="reference external" href="http://www.goteborgairport.se/eng.asp"&gt;Gothenburg City Airport&lt;/a&gt; take the bus to the end of the
line.  You will be at the  &lt;a class="reference external" href="http://bit.ly/fON43p"&gt;Central Station&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can also arrive by &lt;a class="reference external" href="http://www.stenaline.nl/en/ferry/"&gt;ferry&lt;/a&gt;, from either Kiel in Germany or Frederikshavn
in Denmark.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="who-s-coming"&gt;
&lt;h3&gt;Who's Coming?&lt;/h3&gt;
&lt;p&gt;If you'd like to come, please let us know when you will be arriving and
leaving, as well as letting us know your interests  We'll keep a list
of &lt;a class="reference external" href="https://bitbucket.org/pypy/extradoc/src/tip/sprintinfo/gothenburg-2011-2/people.txt"&gt;people&lt;/a&gt; which we'll update (which you can do so yourself if you
have bitbucket pypy commit rights).&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-7335004338996313725?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/Q7nRHhfLY64" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/7335004338996313725/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=7335004338996313725" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7335004338996313725?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/7335004338996313725?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/Q7nRHhfLY64/pypy-goteborg-post-halloween-sprint-nov.html" title="PyPy Göteborg Post-Hallowe'en Sprint Nov 2nd - Nov 9th" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/10/pypy-goteborg-post-halloween-sprint-nov.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUQMSHoyeip7ImA9WhdbFEg.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-2380711174693638392</id><published>2011-10-12T23:02:00.002+02:00</published><updated>2011-10-12T23:03:09.492+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-12T23:03:09.492+02:00</app:edited><title>Numpy funding and status update</title><content type="html">&lt;p&gt;Hi everyone,&lt;/p&gt;
&lt;p&gt;It's been a little while since we wrote about NumPy on PyPy, so we wanted to
give everyone an update on what we've been up to, and what's up next for us.&lt;/p&gt;
&lt;p&gt;We would also like to note that we're launching a &lt;strong&gt;funding campaign&lt;/strong&gt;
for NumPy support in PyPy. Details can be found on the &lt;a class="reference external" href="http://pypy.org/numpydonate.html"&gt;donation page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some of the things that have happened since last we wrote are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;We added &lt;tt class="docutils literal"&gt;dtype&lt;/tt&gt; support, meaning you can now create arrays of a bunch of
different types, including bools, ints of a various sizes, and floats.&lt;/li&gt;
&lt;li&gt;More array methods and ufuncs, including things like comparison methods
(&lt;tt class="docutils literal"&gt;==&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;&amp;gt;&lt;/tt&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Support for more and more argument types, for example you can index by a
tuple now (only works with tuples of length one, since we only have
single-dimension arrays thus far).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of the things we're working on at the moment:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;More dtypes, including complex values and user-defined dtypes.&lt;/li&gt;
&lt;li&gt;Subscripting arrays by other array as indices, and by bool arrays as masks.&lt;/li&gt;
&lt;li&gt;Starting to reuse Python code from the original numpy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of the things on the near horizon are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Better support for scalar data, for example did you know that
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;numpy.array([True],&lt;/span&gt; &lt;span class="pre"&gt;dtype=bool)[0]&lt;/span&gt;&lt;/tt&gt; doesn't return a &lt;tt class="docutils literal"&gt;bool&lt;/tt&gt; object?
Instead it returns a &lt;tt class="docutils literal"&gt;numpy.bool_&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;Multi-dimensional array support.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you're interested in helping out, we always love more contributors,
Alex, Maciej, Justin, and the whole PyPy team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-2380711174693638392?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/KnXOE2XV88k" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/2380711174693638392/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=2380711174693638392" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2380711174693638392?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2380711174693638392?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/KnXOE2XV88k/numpy-funding-and-status-update.html" title="Numpy funding and status update" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>7</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/10/numpy-funding-and-status-update.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEENQH88eyp7ImA9WhdbE04.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-8229304944653956829</id><published>2011-10-11T13:25:00.004+02:00</published><updated>2011-10-11T14:38:11.173+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-11T14:38:11.173+02:00</app:edited><title>More Compact Lists with List Strategies</title><content type="html">&lt;p&gt;Since we come closer to merging the list-strategy branch I want to try to explain this memory optimization today. &lt;/p&gt;&lt;p&gt;Datatypes in PyPy are stored as &lt;tt&gt;W_&amp;lt;type&amp;gt;Objects&lt;/tt&gt; (e.g. &lt;tt&gt;W_StringObject to represent strings, W_IntObject&lt;/tt&gt; to represent ints). This is necessary due to the dynamic nature of Python. So the actual value (e.g. string, integer) is stored inside that box, resulting in an indirection. When having a large amount of such boxed objects, for example in a list, the wasted memory can become quite large. &lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-0-qMMmAxRro/TpQoKo728xI/AAAAAAAAAYA/Z5nNNEYYquk/s1600/overhead_before.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 183px;" src="http://3.bp.blogspot.com/-0-qMMmAxRro/TpQoKo728xI/AAAAAAAAAYA/Z5nNNEYYquk/s400/overhead_before.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662194794763842322" /&gt;&lt;/a&gt;  &lt;p&gt;If you have a closer look at such lists, you will see that in many of them only one type of data is stored and only few (and smaller) lists store mixed types. Another thing to observe is that those lists often won't change the types of the objects they contain at runtime very often. For instance a list of a million integers is very unlikely to suddenly get a string appended to it. &lt;/p&gt;&lt;h2&gt;List Strategies&lt;/h2&gt;&lt;p&gt;The goal of this work is to write an optimization that exploits this behaviour. Instead of wrapping all items in a list, we implement lists in a way that they are optimized for storing certain (primitive) datatypes. These implementations store the content of the list in unwrapped form, getting rid of the extra indirection and wrapper objects. &lt;/p&gt;&lt;p&gt;One approach would be to add a level of indirection, making each &lt;tt&gt;W_ListObject&lt;/tt&gt; instance point to another object that stores the actual content. For this other object, several implementations would exist, for every datatype we want to store without wrapping it (as well as a general one that deals with arbitrary content). The data layout would look something like this:&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-LWtyy4ORb00/TpQohbOjg2I/AAAAAAAAAYM/kgpVemG8-9o/s1600/with_special_impl.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 193px;" src="http://3.bp.blogspot.com/-LWtyy4ORb00/TpQohbOjg2I/AAAAAAAAAYM/kgpVemG8-9o/s400/with_special_impl.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662195186221155170" /&gt;&lt;/a&gt;  &lt;p&gt;This approach has the problem that we need two indirections to get to the data and that the implementation instances need memory themselves.&lt;/p&gt;&lt;p&gt;What we would like to do is to make the &lt;tt&gt;W_ListObject&lt;/tt&gt; point to an RPython list directly, that contains either wrapped or unwrapped data. This plan has the problem that storing different unwrapped data is not directly possible in RPython.  &lt;p&gt;To solve the problem, we use the &lt;tt&gt;rerased&lt;/tt&gt; RPython library module. It allows us to erase the type of an object, in this case lists, and returns something similar to &lt;tt&gt;void-star&lt;/tt&gt; in C, or &lt;tt&gt;Object&lt;/tt&gt; in Java. This object is then stored on the &lt;tt&gt;W_ListObject&lt;/tt&gt; in the field &lt;tt&gt;storage&lt;/tt&gt;. If we want to work with the list, for example to append or delete items, we need to unerase the storage again.&lt;/p&gt;&lt;p&gt;Example for rerase: &lt;/p&gt;&lt;pre&gt;storage = erase([1 ,2 ,3 ,4])
# storage is an opaque object that you can do nothing with
....
l = unerase(storage)
l.clear()
&lt;/pre&gt;&lt;p&gt;Now that we know how to make the &lt;tt&gt;W_ListObject&lt;/tt&gt; point directly to wrapped or unwrapped data, we need to find out how to actually do any operations on this data. This can be accomplished by adding another field to our &lt;tt&gt;W_ListObject&lt;/tt&gt;. This field points to a &lt;tt&gt;ListStrategy&lt;/tt&gt; object. The actual implementation of &lt;tt&gt;W_ListObject&lt;/tt&gt; is now deferred to those &lt;tt&gt;ListStrategy&lt;/tt&gt; classes. For instance, a &lt;tt&gt;W_ListObject&lt;/tt&gt; which holds only integers will use the &lt;tt&gt;IntegerListStrategy&lt;/tt&gt;.&lt;/p&gt;&lt;p&gt;When the type of content is being changed, we need to change the used strategy as well as the storage in compatible ways. For example when we add a string to the list of integers we need to switch to the &lt;tt&gt;ObjectListStrategy&lt;/tt&gt; and change the storage to be a list of wrapped objects. Thus the currently used strategy always knows what to do with what is currently in the storage.&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-hFXLNQ0Ry0I/TpQohnZHRpI/AAAAAAAAAYY/-AUuRfoFYqA/s1600/with_strategies.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 130px;" src="http://4.bp.blogspot.com/-hFXLNQ0Ry0I/TpQohnZHRpI/AAAAAAAAAYY/-AUuRfoFYqA/s400/with_strategies.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662195189486667410" /&gt;&lt;/a&gt;  &lt;p&gt;As you can see, we now save one level of indirections by storing some of the data unwrapped. Of course each operation on a list needs to go via the strategy, but since we save one indirection for each element stored in that list and the &lt;tt&gt;Strategy&lt;/tt&gt; classes are singletons, the benefits outweigh the costs.&lt;/p&gt;&lt;p&gt;Currently there are only strategies for integers and strings since many lists seem to have these datatypes. Other strategies i.e for floats and unicode strings are planned. We also implemented two special strategies for empty lists and range-lists. The &lt;tt&gt;EmptyListStrategy&lt;/tt&gt;'s storage is &lt;tt&gt;None&lt;/tt&gt;. If objects are added to the list we just switch to the appropriate strategy (determined by the item's type). &lt;tt&gt;RangeListsStrategies&lt;/tt&gt; do not store any items at all. Instead they only store values describing the range of the list, i.e. start, step and length. On any operations that changes the data of the list we switch to the &lt;tt&gt;IntegerStrategy&lt;/tt&gt;.&lt;/p&gt;&lt;p&gt;A nice side-effect of storing unwrapped datatypes is that we can implement optimized methods for certain cases. For instance, since comparison of unwrapped integers is now much faster than comparison between arbitrary objects, we can rewrite the sorting methods for lists containing integers.&lt;/p&gt;&lt;h2&gt;Microbenchmarks&lt;/h2&gt;&lt;p&gt;Finally here is an early overview of the memory consumption of different Python implementations: &lt;tt&gt;CPython, PyPy&lt;/tt&gt; and &lt;tt&gt;PyPy-list&lt;/tt&gt; which uses list-strategies. To demonstrate how powerful list-strategies can be in the best case, we wrote benchmarks that create a list of integers, a list of strings and a range-list each with one million elements each and then reads out the heap size of the process as reported by the OS. &lt;/p&gt;&lt;p&gt;The results are as follows: &lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-FG6r9y8tXF4/TpQohzPkeNI/AAAAAAAAAYk/h-oZpthkFEQ/s1600/osheapsize.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 252px;" src="http://2.bp.blogspot.com/-FG6r9y8tXF4/TpQohzPkeNI/AAAAAAAAAYk/h-oZpthkFEQ/s400/osheapsize.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5662195192667863250" /&gt;&lt;/a&gt;   &lt;p&gt;The savings on integers and strings in this ideal case are quite big.&lt;/p&gt;&lt;p&gt;The benchmark for range-lists is a little unfair, since in &lt;tt&gt;CPython&lt;/tt&gt; one could accomplish the same memory behaviour using &lt;tt&gt;xrange&lt;/tt&gt;. However, in PyPy users won't notice that internally the list does not store all items, making it still possible to use all list methods, such as &lt;tt&gt;append&lt;/tt&gt; or &lt;tt&gt;delete&lt;/tt&gt;.&lt;/p&gt;&lt;h2&gt;Conclusion&lt;/h2&gt;&lt;p&gt;We hope that list strategies bring memory savings for applications that use homogeneous lists of primitive types. Furthermore, operations on such lists tend to be somewhat faster as well. This also integrates well with the JIT. The list strategies optimizations will be merged to the PyPy's default branch at some point in the next months. An equivalent optimization for dictionaries has already been merged (and is part of PyPy 1.6), one for sets is coming in the future.&lt;/p&gt;&lt;p&gt;Lukas Diekmann and Carl Friedrich Bolz&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-8229304944653956829?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/cWM5FAsRdt8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/8229304944653956829/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=8229304944653956829" title="16 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8229304944653956829?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8229304944653956829?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/cWM5FAsRdt8/more-compact-lists-with-list-strategies.html" title="More Compact Lists with List Strategies" /><author><name>Carl Friedrich Bolz</name><uri>http://www.blogger.com/profile/00518922641059511014</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-0-qMMmAxRro/TpQoKo728xI/AAAAAAAAAYA/Z5nNNEYYquk/s72-c/overhead_before.png" height="72" width="72" /><thr:total>16</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/10/more-compact-lists-with-list-strategies.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUMNSXw5eyp7ImA9WhdVFk8.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-8139653689520709617</id><published>2011-09-21T18:44:00.003+02:00</published><updated>2011-09-21T18:44:58.223+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-21T18:44:58.223+02:00</app:edited><title>Py3k for PyPy fundraiser</title><content type="html">&lt;p&gt;Hi,&lt;/p&gt;&lt;p&gt;We would like to announce a donation campaign for implementing Python 3 in PyPy.&lt;br /&gt;
Please read our &lt;a class="reference external" href="http://pypy.org/py3donate.html"&gt;detailed plan&lt;/a&gt; for all the details and donate using the&lt;br /&gt;
button on that page!&lt;/p&gt;&lt;p&gt;Thanks,&lt;br /&gt;
The PyPy Team&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-8139653689520709617?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/4SGV2hdYRyA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/8139653689520709617/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=8139653689520709617" title="21 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8139653689520709617?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8139653689520709617?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/4SGV2hdYRyA/py3k-for-pypy-fundraiser.html" title="Py3k for PyPy fundraiser" /><author><name>Alex</name><uri>http://www.blogger.com/profile/14054821112394577330</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>21</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/09/py3k-for-pypy-fundraiser.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QGRHo5fCp7ImA9WhdXF0w.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-3916959558080483711</id><published>2011-08-30T14:08:00.002+02:00</published><updated>2011-08-30T16:42:05.424+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-30T16:42:05.424+02:00</app:edited><title>Wrapping C++ Libraries with Reflection — Status Report One Year Later</title><content type="html">&lt;p&gt;Well over a year ago, &lt;a class="reference external" href="http://morepypy.blogspot.com/2010/07/cern-sprint-report-wrapping-c-libraries.html"&gt;work was started&lt;/a&gt; on the &lt;tt class="docutils literal"&gt;cppyy&lt;/tt&gt; module which lives in the
&lt;tt class="docutils literal"&gt;reflex-support&lt;/tt&gt; branch.
Since then, work has progressed at a varying pace and has included a recent
sprint in Düsseldorf, last July.&lt;/p&gt;
&lt;p&gt;Let's first take a step back and recap why we're interested in doing this,
given that it is perfectly possible to use C++ through generated bindings and
&lt;tt class="docutils literal"&gt;cpyext&lt;/tt&gt;.
&lt;tt class="docutils literal"&gt;cppyy&lt;/tt&gt; makes use of reflection information generated for the C++ classes of
interest, and has that reflection information available at run time.
Therefore, it is able to open up complex C++ types to the JIT in a
conceptually similar manner as simple types are open to it.
This means that it is possible to get rid of a lot of the marshalling layers
when making cross-language calls, resulting in much lower call overhead than
is possible when going through the CPython API, or other methods of wrapping.&lt;/p&gt;
&lt;p&gt;There are two problems that need to be solved: C++ language constructs need to
be presented on the Python side in a natural way; and cross-language impedance
mismatches need to be minimized, with some hints of the user if need be.
For the former, the list of mapped features has grown to a set that is
sufficient to do real work.
There is now support for:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;builtin, pointer, and array types&lt;/li&gt;
&lt;li&gt;namespaces, classes, and inner classes&lt;/li&gt;
&lt;li&gt;global functions, global data&lt;/li&gt;
&lt;li&gt;static/instance data members and methods&lt;/li&gt;
&lt;li&gt;default variables, object return by value&lt;/li&gt;
&lt;li&gt;single and multiple (virtual) inheritance&lt;/li&gt;
&lt;li&gt;templated classes&lt;/li&gt;
&lt;li&gt;basic STL support and pythonizations&lt;/li&gt;
&lt;li&gt;basic (non-global) operator mapping&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The second problem is harder and will always be an on-going process.
But one of the more important issues has been solved at the recent Düsseldorf
sprint, namely, that of reclaiming C++ objects instantiated from the Python
side by the garbage collector.&lt;/p&gt;
&lt;p&gt;Performance has also improved, especially that of the nicer &amp;quot;pythonized&amp;quot;
interface that the user actually sees, although it still misses out on
about a factor of 2.5 in comparison to the lower-level interface (which has
gotten uglier, so you really don't want to use that).
Most of this improvement is due to restructuring so that it plays nicer with
the JIT and libffi, both of which themselves have seen improvements.&lt;/p&gt;
&lt;p&gt;Work is currently concentrated on the back-ends: a &lt;a class="reference external" href="http://root.cern.ch/drupal/content/cint"&gt;CINT&lt;/a&gt; back-end is underway
and a LLVM/CLang pre-compiled headers (PCH) back-end is planned.
The latter is needed for this code to be released in the wild, rather than
just used in high energy physics (HEP), as that would be easier to support.
Also, within HEP, CLang's PCH are foreseen to be the future format of
reflection information.&lt;/p&gt;
&lt;p&gt;At the end of the Düsseldorf sprint, we tried a little code that did something
actually &amp;quot;useful,&amp;quot; namely the filling of a histogram with some random values.
We did get it to work, but trying &lt;tt class="docutils literal"&gt;cppyy&lt;/tt&gt; on a large class library showed
that a good warning system for such things like missing classes was sorely
needed.
That has been added since, and revisiting the histogram example later, here is
an interesting note: the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;pypy-c&lt;/span&gt;&lt;/tt&gt; run takes 1.5x the amount of time of that
of the compiled, optimized, C++ code.
The run was timed start to finish, including the reflection library loading
and JIT warm-up that is needed in the case of Python, but not for the compiled
C++ code.
However, in HEP, scientists run many short jobs while developing their
analysis codes, before submitting larger jobs on the GRID to run during lunch
time or overnight.
Thus, a more realistic comparison is to include the compilation time needed
for the C++ code and with that, the Python code needs only 55% of the time
required by C++.&lt;/p&gt;
&lt;p&gt;The choice of a programming language is often a personal one, and such
arguments like the idea that C++ is hard to use typically do not carry much
weight with the in-crowd that studies quantum field dynamics for fun.
However, getting the prompt with your analysis results back faster is a sure
winner. We hope that &lt;tt class="docutils literal"&gt;cppyy&lt;/tt&gt; will soon have progressed far enough to make it
useful first to particle physicists and then other uses for wrapping C++
libraries.&lt;/p&gt;

Wim Lavrijsen, Carl Friedrich Bolz, Armin Rigo&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-3916959558080483711?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/9wgZSgNeGoQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/3916959558080483711/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=3916959558080483711" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3916959558080483711?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/3916959558080483711?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/9wgZSgNeGoQ/wrapping-c-libraries-with-reflection.html" title="Wrapping C++ Libraries with Reflection — Status Report One Year Later" /><author><name>Carl Friedrich Bolz</name><uri>http://www.blogger.com/profile/00518922641059511014</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/08/wrapping-c-libraries-with-reflection.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEAGQ3g4eSp7ImA9WhdXE04.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6513983438425039230</id><published>2011-08-23T13:53:00.003+02:00</published><updated>2011-08-26T07:32:02.631+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-26T07:32:02.631+02:00</app:edited><title>We need Software Transactional Memory</title><content type="html">&lt;p&gt;Hi all.  Here is (an extract of) a short summary paper about my current position on
Software Transactional Memory as a general tool in the implementation
of Python or Python-like languages.  Thanks to people on IRC for discussion on making
this blog post better (lucian, Alex Gaynor, rguillebert, timonator, Da_Blitz).
For the purpose of the present discussion, we are comparing Java with Python
when it comes to multi-threading.&lt;/p&gt;

&lt;h2&gt;The problem in complex high-level languages&lt;/h2&gt;
&lt;p&gt;Like Java, the Python language gives guarantees: it is not acceptable
for the Python virtual machine to crash due to incorrect usage of
threads.  A primitive operation in Java is something like reading or
writing a field of an object; the corresponding guarantees are along the
lines of: if the program reads a field of an object, and another thread
writes to the same field of the same object, then the program will see
either the old value, or the new value, but not something else entirely,
and the virtual machine will not crash.&lt;/p&gt;
&lt;p&gt;Higher-level languages like Python differ from Java by the fact that a
&amp;quot;primitive operation&amp;quot; is far more complex.  It may for example involve
looking in several hash maps, perhaps doing updates.  In general, it is
completely impossible to map every operation that must be atomic to a
single processor instruction.&lt;/p&gt;

&lt;h2&gt;Jython: fine-grained locking&lt;/h2&gt;
&lt;p&gt;This problem has been solved &amp;quot;explicitly&amp;quot; in the Jython interpreter that
runs on top of Java.  The solution is explicit in the following sense:
throughout the Jython interpreter, every single operation makes careful
use of Java-level locking mechanisms.  This is an application of
&amp;quot;fine-grained locking&amp;quot;.  For example, operations like attribute lookup,
which need to perform look-ups in a number of hash maps, are protected
by acquiring and releasing locks (in __getattribute__).&lt;/p&gt;
&lt;p&gt;A draw-back of this solution is the attention to detail required.
If even one place misses a lock, then there is either a
bug --- and such bugs occur in cases that are increasingly rare and hard
to debug as the previous bugs are fixed --- or we just file it under "differences
from CPython".  There is however the risk of
deadlock, if two threads attempt to lock the same objects in different
order.&lt;/p&gt;

&lt;p&gt;In practice, the situation is actually not as bad as
I may paint it: the number of locks in Jython is reasonable, and allows for
all the &amp;quot;common cases&amp;quot; to work as expected.
(For the uncommon cases, see below.)&lt;/p&gt;

&lt;p&gt;Performance-wise, the Java virtual machine itself comes with locks that
have been heavily optimized over a long period of time, so the
performance is acceptable.  However if this solution were coded in C, it
would need a lot of extra work to optimize the locks manually (possibly
introducing more of the subtle bugs).&lt;/p&gt;

&lt;h2&gt;CPython: coarse-grained locking&lt;/h2&gt;
&lt;p&gt;CPython, the standard implementation of Python in C, took a different
and simpler approach: it has a single global lock, called the Global
Interpreter Lock (GIL).  It uses &amp;quot;coarse-grained locking&amp;quot;: the lock is
acquired and released around the whole execution of one bytecode (or
actually a small number of bytecodes, like 100).  This solution is
enough to ensure that no two operations can conflict with each other,
because the two bytecodes that invoke them are themselves
serialized by the GIL.  It is a solution which avoids --- unlike Jython
--- writing careful lock-acquiring code all over the interpreter.  It
also offers even stronger guarantees: every bytecode runs entirely
atomically.&lt;/p&gt;
&lt;p&gt;Nowadays, the draw-back of the GIL approach is obvious on multi-core
machines: by serializing the execution of bytecodes, starting multiple
threads does not actually let the interpreter use of more than one core.&lt;/p&gt;
&lt;p&gt;PyPy, the Python implementation in Python, takes the same approach so
far.&lt;/p&gt;

&lt;h2&gt;Existing usage&lt;/h2&gt;
&lt;p&gt;As we have seen, we have the following situation: the existing Python
language, as CPython implements it, offers very strong guarantees about
multi-threaded usage.  It is important to emphasize that most existing
multi-threaded Python programs actually rely on such strong guarantees.
This can be seen for example in a problem that takes a populated list
and does in several threads:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
next_item = global_list.pop()
&lt;/pre&gt;
&lt;p&gt;This implicitly relies on the fact that pop() will perform atomic
removal from the list.  If two threads try to pop() from the same list
at the same time, then the two operations will occur in one order or the
other; but they will not e.g. return the same object to both threads or
mess up the internal state of the list object.&lt;/p&gt;
&lt;p&gt;With such an example in mind, it should be clear that we do not want a
solution to the multi-core issue that involves dropping these strong
guarantees.  It is ok however to lower the barrier, as Jython does; but
any Python implementation must offer &lt;i&gt;some&lt;/i&gt; guarantees, or not offer
multi-threading at all.  This includes the fact that a lot of methods on
built-in types are supposed to be atomic.&lt;/p&gt;

&lt;p&gt;(It should be noted that not offering multi-threading at all is actually
also a (partial) solution to the problem.  Recently, several &amp;quot;hacks&amp;quot;
have appeared that give a programmer more-or-less transparent access to
multiple independent processes (e.g. &lt;a href="http://docs.python.org/library/multiprocessing.html"&gt;multiprocessing&lt;/a&gt;).  While these provide appropriate
solutions in some context, they are not as widely applicable as
multi-threading.  As a typical example, they fail to apply when the
mutiple cores need to process information that cannot be serialized at
all --- a requirement for any data exchange between several processes.)&lt;/p&gt;

&lt;p&gt;Here is an example of how Jython's consistency is weaker than CPython's GIL.
It takes uncommon examples to show it, and the fact that it does not work
like a CPython programmer expect them to is generally considered as an
implementation detail.  Consider:&lt;/p&gt;
&lt;pre&gt;Thread 1:  set1.update(set2)
Thread 2:  set2.update(set3)
Thread 3:  set3.update(set1)&lt;/pre&gt;
&lt;p&gt;Each operation is atomic in the case of CPython, but decomposed in two steps
(which can each be considered atomic) in the case of Jython: reading from the
argument, and then updating the target set.  Suppose that initially
set1 = {1}, set2 = {2}, set3 = {3}.  On CPython, independently on
the order in which the threads run, we will end up with at least one of the
sets being {1, 2, 3}.  On Jython, it is possible that all
three sets end up as containing two items only.  The example is a bit
far-fetched but should show that CPython's consistency is strictly stronger
than Jython's.&lt;/p&gt;

&lt;h2&gt;PyPy&lt;/h2&gt;
&lt;p&gt;PyPy is a Python interpreter much like CPython or Jython, but the way it
is produced is particular.  It is an interpreter written in RPython, a
subset of Python, which gets turned into a complete virtual machine (as
generated C code) automatically by a step called the &amp;quot;translation&amp;quot;.  In
this context, the trade-offs are different from the ones in CPython and
in Jython: it is possible in PyPy, and even easy, to apply arbitrary
whole-program transformations to the interpreter at &amp;quot;translation-time&amp;quot;.&lt;/p&gt;
&lt;p&gt;With this in mind, it is possible to imagine a whole-program
transformation that would add locking on every object manipulated in
RPython by the interpreter.  This would end up in a situation similar to
Jython.  However, it would not automatically solve the issue of
deadlocks, which is avoided in the case of Jython by careful manual
placement of the locks.  (In fact, being deadlock-free is a global
program property that cannot be automatically ensured or verified; any
change to Jython can in theory break this property, and thus introduce
subtle deadlocks.  The same applies to non-atomicity.)&lt;/p&gt;
&lt;p&gt;In fact, we can easily check that if the interpreter accesses (for
both reading and writing)
objects A and B in a bytecode of thread 1, and objects B and A (in the
opposite order) in a bytecode of thread 2 --- and moreover if you need to
have accessed the first object before you can decide that you will need
to access the second object --- then there is no way (apart from the GIL) to avoid
a deadlock while keeping the strong guarantee of atomicity.  Indeed, if
both threads have progressed to the middle of the execution of their
bytecode, then A has already been mutated by thread 1 and similarly B
has already been mutated by thread 2.  It is not possible to
successfully continue running the threads in that case.&lt;/p&gt;

&lt;h2&gt;Using Software Transactional Memory&lt;/h2&gt;
&lt;p&gt;Software Transactional Memory (STM) is an approach that gives a solution
to precisely the above problem.  If a thread ended up in a situation
where continuing to run it would be wrong, then we can &lt;i&gt;abort and
rollback.&lt;/i&gt;  This is similar to the notion of transaction on databases.
In the above example, one or both threads would notice that they are
about to run into troubles and abort.  This means more concretely that
they need to have a way to restart execution at the start of the
bytecode, with all the side-effects of what they did so far being either
cancelled or just not committed yet.&lt;/p&gt;
&lt;p&gt;We think that this capacity to abort and rollback is the missing piece
of the puzzle of multi-threaded implementations of Python.
Actually, according to the presentation of the problem given
above, it is unavoidable that any solution that wants to offer the
same level of consistency and atomicity as CPython would involve
the capacity of aborting and rolling back --- &lt;i&gt;which means precisely
that STM cannot be avoided.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;Ok, but why not settle down with Jython's
approach and put careful locks left and right throughout the interpreter?
Because (1) we would have to consider every operation's atomicity and make decisions
(or steal Jython's) and document them
&lt;a href="http://doc.pypy.org/en/latest/cpython_differences.html"&gt;here&lt;/a&gt;;
(2) it would also be really a lot of work, to optimize these locks e.g. with the
JIT as well as the JVM does; and (3) it is not the PyPy way to require manually
tweaking your code everywhere for a feature that should be orthogonal.  Point
(3) is probably the most important here: you need to redo the work for every
language you implement in PyPy.
It also implies my own point (4): &lt;i&gt;it is not fun :-)&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;In more details, the process would work as follows.  (This gives an
overview of one possible model; it is possible that a different model
will end up being better.)  In every thread:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At the start of a bytecode, we start a &amp;quot;transaction&amp;quot;.  This means
setting up a thread-local data structure to record a log of what
occurs in the transaction.&lt;/li&gt;
&lt;li&gt;We record in the log all objects that are read, as well as the
modifications that we would like to make.&lt;/li&gt;
&lt;li&gt;During this time, we detect &amp;quot;read&amp;quot; inconsistencies, shown by the
object's &amp;quot;last-modified&amp;quot; timestamp being later than the start time
of the current transaction, and abort.  This prevents the rest of
the code from running with inconsistent values.&lt;/li&gt;
&lt;li&gt;If we reach the end of the bytecode without a &amp;quot;read&amp;quot; inconsistency,
then we atomically check for &amp;quot;write&amp;quot; inconsistencies.  These are
inconsistencies which arise from concurrent updates to objects
in the other threads --- either our &amp;quot;write&amp;quot; objects, or our &amp;quot;read&amp;quot;
objects.&lt;/li&gt;
&lt;li&gt;If no inconsistency is found, we &amp;quot;commit&amp;quot; the transaction by copying
the delayed writes from the log into main memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The points at which a transaction starts or ends are exactly the
points at which, in CPython, the Global Interpreter Lock is
respectively acquired and released.  If we ignore the fact that (purely for
performance) CPython acquires and releases the GIL only every N bytecodes,
then this means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Before any bytecode we acquire the GIL (start a transaction), and after
the bytecode we release it (ends the transaction); and
&lt;li&gt;Before doing an external call to the C library or the OS we release the GIL
(ends the transaction) and afterwards re-acquire it (start the next transaction).
&lt;/ol&gt;
So in particular this model is well suited to the STM condition that we cannot
do anything in a transaction that cannot be rolled back, like --- precisely ---
system calls.  Indeed, by construction, these system calls occur outside a
transaction, because in CPython they occur with the GIL released.&lt;/p&gt;

&lt;h2&gt;Performance&lt;/h2&gt;
&lt;p&gt;A large number of implementation details are still open for now.
From a user's point of view (i.e. the programmer using Python),
the most relevant one is the overall performance impact.  We
cannot give precise numbers so far, and we expect the initial
performance to be abysmally bad (maybe 10x slower); however, with
successive improvements to the locking mechanism, to the global
program transformation inserting the locks, to the garbage 
collector (GC), and to the Just-in-Time (JIT) compiler, we
believe that it should be possible to get a roughly reasonable
performance (up to maybe 2x slower).  For example, the GC can
maintain flags on the objects to know that they did not escape
their creation thread, and do not need any logging; and the JIT
compiler can aggregate several reads or writes to an object into
one.  We believe that these are the kind of optimizations that
can give back a lot of the performance lost.&lt;/p&gt;

&lt;h2&gt;The state of STM&lt;/h2&gt;
&lt;p&gt;Transactional Memory is itself a relatively old idea, originating
from a 1986 paper by Tom Knight.  At first based on hardware
support, the idea of software-only transactional memory (STM) was
popularized in 1995 and has recently been the focus of intense 
research.&lt;/p&gt;
&lt;p&gt;The approach outlined above --- using STM to form the core of the
implementation of a language --- is new, as far as we know.  So
far, most implementations provide STM as a library feature.  It
requires explicit usage, often in the form of explicitly
declaring which objects must be protected by STM (object-based
STMs).  It is only recently that native STM support has started
to appear, notably in the Clojure language.&lt;/p&gt;
&lt;p&gt;STM is described on Wikipedia as an approach that &amp;quot;greatly
simplifies conceptual understanding of multithreaded programs and
helps make programs more maintainable by working in harmony with
existing high-level abstractions such as objects and modules.&amp;quot;
We actually think that these benefits are important enough to
warrant being exposed to the Python programmer as well, instead
of being used only internally.  This would give the Python
programmer a very simple interface:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
with atomic:
    &amp;lt;these operations are executed atomically&amp;gt;
&lt;/pre&gt;
&lt;p&gt;(This is &lt;a href="http://mail.python.org/pipermail/python-dev/2003-February/033259.html"&gt;an old idea.&lt;/a&gt;  Funny how back in 2003 people, including me, thought that this was a hack.  Now I'm writing a blog post to say "it was not a hack; it's explicitly using locks that is a hack."  I'm buying the idea of &lt;a href="http://en.wikipedia.org/wiki/Software_transactional_memory#Composable_operations"&gt;composability.&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;From a practical point of view, I started looking seriously at
the University of Rochester STM (RSTM), a C++ library that has
been a focus of --- and a collection of results from --- recent
research.  One particularly representative paper is
&lt;a href="http://www.cs.rochester.edu/u/spear/ppopp09.pdf"&gt;A
Comprehensive Strategy for Contention Management in Software
Transactional Memory&lt;/a&gt; by Michael F. Spear, Luke Dalessandro,
Virendra J. Marathe and Michael L. Scott.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Taking these ideas and applying them in the context of an
implementation of a complex high-level language like Python comes
with its own challanges.  In this context, using PyPy makes sense
as both an experimentation platform and as a platform that is
recently gaining attention for its performance.  The alternatives
are unattractive: doing it in CPython for example would mean
globally rewriting the interpreter.  In PyPy instead, we write it
as a transformation that is applied systematically at translation-time.
Also, PyPy is a general platform for generating fast interpreters
for dynamic languages; the STM implementation in PyPy would work
out of the box for other language implementations as well, instead
of just for Python.&lt;/p&gt;
&lt;br&gt;
&lt;p&gt;&lt;b&gt;Update:&lt;/b&gt;
&lt;ul&gt;
&lt;li&gt;This is mostly me (Armin Rigo) ranting aloud and trying experiments;
this post should not be confused as meaning that the whole PyPy team
will now spend the next years working on it full-time.
As I said it is orthogonal to the actual Python interpreter, and it is in
any case a feature that can be turned on or off during translation; I know
that in many or most use cases, people are more interested in getting a
fast PyPy rather than one which is twice as slow but scales well.
&lt;li&gt;Nothing I said is really new.  For proof, see
&lt;a href="http://sabi.net/nriley/pubs/dls6-riley.pdf"&gt;Riley and Zilles (2006)&lt;/a&gt;
as well as &lt;a href="http://www.cs.auckland.ac.nz/~fuad/parpycan.pdf"&gt;Tabba (2010)&lt;/a&gt; who both experimented with &lt;i&gt;Hardware&lt;/i&gt; Transactional Memory, turning CPython or PyPy interpreter's GIL into start/end transactions, as I describe here.
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6513983438425039230?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/u6ms7u_pkdc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6513983438425039230/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6513983438425039230" title="35 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6513983438425039230?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6513983438425039230?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/u6ms7u_pkdc/we-need-software-transactional-memory.html" title="We need Software Transactional Memory" /><author><name>Armin Rigo</name><uri>http://www.blogger.com/profile/06300515270104686574</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>35</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/08/we-need-software-transactional-memory.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8ERnozeyp7ImA9WhdQFkU.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-559424594592497545</id><published>2011-08-18T19:24:00.001+02:00</published><updated>2011-08-18T19:33:27.483+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-18T19:33:27.483+02:00</app:edited><title>PyPy 1.6 - kickass panda</title><content type="html">&lt;p&gt;We're pleased to announce the 1.6 release of PyPy. This release brings a lot
of bugfixes and performance improvements over 1.5, and improves support for
Windows 32bit and OS X 64bit. This version fully implements Python 2.7.1 and
has beta level support for loading CPython C extensions.  You can download it
here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external" href="http://pypy.org/download.html"&gt;http://pypy.org/download.html&lt;/a&gt;&lt;/blockquote&gt;
&lt;div class="section" id="what-is-pypy"&gt;
&lt;h3&gt;What is PyPy?&lt;/h3&gt;
&lt;p&gt;PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7.1. It's fast (&lt;a class="reference external" href="http://speed.pypy.org"&gt;pypy 1.6 and cpython 2.6.2&lt;/a&gt; performance comparison)
due to its integrated tracing JIT compiler.&lt;/p&gt;
&lt;p&gt;This release supports x86 machines running Linux 32/64 or Mac OS X.  Windows 32
is beta (it roughly works but a lot of small issues have not been fixed so
far).  Windows 64 is not yet supported.&lt;/p&gt;
&lt;p&gt;The main topics of this release are speed and stability: on average on
our benchmark suite, PyPy 1.6 is between &lt;strong&gt;20% and 30%&lt;/strong&gt; faster than PyPy 1.5,
which was already much faster than CPython on our set of benchmarks.&lt;/p&gt;
&lt;p&gt;The speed improvements have been made possible by optimizing many of the
layers which compose PyPy.  In particular, we improved: the Garbage Collector,
the JIT warmup time, the optimizations performed by the JIT, the quality of
the generated machine code and the implementation of our Python interpreter.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h3&gt;Highlights&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Numerous performance improvements, overall giving considerable speedups:&lt;ul&gt;
&lt;li&gt;better GC behavior when dealing with very large objects and arrays&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fast ctypes:&lt;/strong&gt; now calls to ctypes functions are seen and optimized
by the JIT, and they are up to 60 times faster than PyPy 1.5 and 10 times
faster than CPython&lt;/li&gt;
&lt;li&gt;improved generators(1): simple generators now are inlined into the caller
loop, making performance up to 3.5 times faster than PyPy 1.5.&lt;/li&gt;
&lt;li&gt;improved generators(2): thanks to other optimizations, even generators
that are not inlined are between 10% and 20% faster than PyPy 1.5.&lt;/li&gt;
&lt;li&gt;faster warmup time for the JIT&lt;/li&gt;
&lt;li&gt;JIT support for single floats (e.g., for &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;array('f')&lt;/span&gt;&lt;/tt&gt;)&lt;/li&gt;
&lt;li&gt;optimized dictionaries: the internal representation of dictionaries is now
dynamically selected depending on the type of stored objects, resulting in
faster code and smaller memory footprint.  For example, dictionaries whose
keys are all strings, or all integers. Other dictionaries are also smaller
due to bugfixes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;JitViewer: this is the first official release which includes the JitViewer,
a web-based tool which helps you to see which parts of your Python code have
been compiled by the JIT, down until the assembler. The &lt;a class="reference external" href="http://morepypy.blogspot.com/2011/08/visualization-of-jitted-code.html"&gt;jitviewer&lt;/a&gt; 0.1 has
already been release and works well with PyPy 1.6.&lt;/li&gt;
&lt;li&gt;The CPython extension module API has been improved and now supports many
more extensions. For information on which one are supported, please refer to
our &lt;a class="reference external" href="https://bitbucket.org/pypy/compatibility/wiki/Home"&gt;compatibility wiki&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Multibyte encoding support: this was of of the last areas in which we were
still behind CPython, but now we fully support them.&lt;/li&gt;
&lt;li&gt;Preliminary support for NumPy: this release includes a preview of a very
fast NumPy module integrated with the PyPy JIT.  Unfortunately, this does
not mean that you can expect to take an existing NumPy program and run it on
PyPy, because the module is still unfinished and supports only some of the
numpy API. However, barring some details, what works should be
blazingly fast :-)&lt;/li&gt;
&lt;li&gt;Bugfixes: since the 1.5 release we fixed 53 bugs in our &lt;a class="reference external" href="https://bugs.pypy.org"&gt;bug tracker&lt;/a&gt;, not
counting the numerous bugs that were found and reported through other
channels than the bug tracker.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Hakan Ardo, Carl Friedrich Bolz, Laura Creighton, Antonio Cuni,
Maciej Fijalkowski, Amaury Forgeot d'Arc, Alex Gaynor,
Armin Rigo and the PyPy team&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-559424594592497545?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/JKCXAZP5WJo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/559424594592497545/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=559424594592497545" title="20 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/559424594592497545?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/559424594592497545?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/JKCXAZP5WJo/pypy-16-kickass-panda.html" title="PyPy 1.6 - kickass panda" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>20</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/08/pypy-16-kickass-panda.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUEHRH46fSp7ImA9WhdQEk4.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6202490807361942120</id><published>2011-08-12T18:39:00.001+02:00</published><updated>2011-08-13T13:07:15.015+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-13T13:07:15.015+02:00</app:edited><title>Visualization of JITted code</title><content type="html">&lt;p&gt;Hello.&lt;/p&gt;
&lt;p&gt;We're proud to announce the first public release of the jitviewer. As of now,
jitviewer is a slightly internal tool that helps understanding how your Python
source code is compiled by the PyPy's JIT all the way down to machine code.&lt;/p&gt;
&lt;p&gt;To install it, you need a very recent version of PyPy
(newer than 9th of August), for example one of the &lt;a class="reference external" href="http://buildbot.pypy.org/nightly/trunk/"&gt;nightly builds&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;install &lt;tt class="docutils literal"&gt;pip&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;distribute&lt;/tt&gt; either by creating a PyPy &lt;a class="reference external" href="http://pypi.python.org/pypi/virtualenv"&gt;virtualenv&lt;/a&gt;
or by following the &lt;a class="reference external" href="http://doc.pypy.org/en/latest/getting-started.html#installing-pypy"&gt;installation instructions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;make sure to have a &lt;a class="reference external" href="http://bitbucket.org/pypy/pypy"&gt;source code checkout&lt;/a&gt; of PyPy and put it in your
PYTHONPATH.&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;pip install jitviewer&lt;/tt&gt;.  Note that you need to run the &lt;tt class="docutils literal"&gt;pip&lt;/tt&gt;
executable which belongs to PyPy, not the globally installed one.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Have a look at the &lt;a class="reference external" href="http://bitbucket.org/pypy/jitviewer/src/24adc3403cd8/README"&gt;README&lt;/a&gt; for how to start it, or try the &lt;a class="reference external" href="http://wyvern.cs.uni-duesseldorf.de:5000/"&gt;online demo&lt;/a&gt; if
you just want to play with it.&lt;/p&gt;
&lt;p&gt;The jitviewer is a web application written with &lt;tt class="docutils literal"&gt;flask&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;jinja2&lt;/tt&gt;.  If
you have experience with web development and you want to help PyPy, don't
hesitate to contact us, there are plenty of things to improve in it :-).&lt;/p&gt;
&lt;div class="section" id="what-does-the-jitviewer-really-do"&gt;
&lt;h3&gt;What does the jitviewer really do?&lt;/h3&gt;
&lt;p&gt;At the top of the page, you will see the list of pieces of code which has been
compiled by the JIT.  You will see entries for both normal loops and for
&amp;quot;entry bridges&amp;quot;.  This is not the right place to discuss the difference
between those, but you most probably want to look at loops, because usually
it's where most of the time is spent.&lt;/p&gt;
&lt;p&gt;Note that for each loop, you will see the name of the function which contains
the &lt;strong&gt;first&lt;/strong&gt; instruction of the loop.  However, thanks to the inlining done
by the JIT, it will contain also the code for other functions.&lt;/p&gt;
&lt;p&gt;Once you select a loop, the jitviewer shows how the JIT has compiled the
Python source code into assembler in a hierarchical way. It displays four
levels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;Python source code: only the lines shown in azure have been compiled for
this particular loop, the ones in gray have not.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Python bytecode, the one you would get by doing:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def f(a, b):
   return a + b

import dis
dis.dis(f)
&lt;/pre&gt;
&lt;p&gt;The opcodes are e.g. &lt;tt class="docutils literal"&gt;LOAD_FAST&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;LOAD_GLOBAL&lt;/tt&gt; etc.  The opcodes
which are not in bold have been completely optimized aways by the JIT.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Intermediate representation of jit code (IR). This is a combination of
operations (like integer addition, reading fields out of structures) and
guards (which check that the assumptions we made are actually true). Guards
are in red.  These operations are &amp;quot;at the same level as C&amp;quot;: so, for example,
&lt;tt class="docutils literal"&gt;+&lt;/tt&gt; takes two unboxed integers which can be stored into the register
of the CPU.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Assembler: you can see it by clicking on &amp;quot;Show assembler&amp;quot; in the menu on the
right.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sometimes you'll find that a guard fails often enough that a new piece of
assembler is required to be compiled. This is an alternative path through the
code and it's called a bridge. You can see bridges in the jitviewer when
there is a link next to a guard. For more information about purpose look up
the &lt;a class="reference external" href="http://doc.pypy.org/en/latest/jit/index.html"&gt;jit documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="i-m-still-confused"&gt;
&lt;h3&gt;I'm still confused&lt;/h3&gt;
&lt;p&gt;Jitviewer is not perfect when it comes to explaining what's going on. Feel free
to pop up on IRC or send us a mail to the mailing list, we'll try to explain
and/or improve the situation. Consult the &lt;a class="reference external" href="http://pypy.org/contact.html"&gt;contact&lt;/a&gt; page for details.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br/&gt;
fijal &amp;amp; antocuni&lt;/p&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6202490807361942120?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/r3quDrGKLUA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6202490807361942120/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6202490807361942120" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6202490807361942120?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6202490807361942120?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/r3quDrGKLUA/visualization-of-jitted-code.html" title="Visualization of JITted code" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>6</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/08/visualization-of-jitted-code.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIHRn45cCp7ImA9WhdRE00.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6756589731691762127</id><published>2011-08-02T19:50:00.000+02:00</published><updated>2011-08-02T19:52:17.028+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-02T19:52:17.028+02:00</app:edited><title>PyPy is faster than C, again: string formatting</title><content type="html">&lt;p&gt;String formatting is probably something you do just about every day in Python,
and never think about.  It's so easy, just &lt;tt class="docutils literal"&gt;&amp;quot;%d %d&amp;quot; % (i, i)&lt;/tt&gt; and you're
done.  No thinking about how to size your result buffer, whether your output
has an appropriate NULL byte at the end, or any other details.  A C
equivalent might be:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
char x[44];
sprintf(x, &amp;quot;%d %d&amp;quot;, i, i);
&lt;/pre&gt;
&lt;p&gt;Note that we had to stop for a second and consider how big numbers might get
and overestimate the size (44 = length of the biggest number on 64bit (20) +
1 for the sign * 2 + 1 (for the space) + 1 (NUL byte)), it took the authors of
this post, fijal and alex, 3 tries to get the math right on this :-)&lt;/p&gt;
&lt;p&gt;This is fine, except you can't even return &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; from this function, a more
fair comparison might be:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
char *x = malloc(44 * sizeof(char));
sprintf(x, &amp;quot;%d %d&amp;quot;, i, i);
&lt;/pre&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;x&lt;/tt&gt; is slightly overallocated in some situations, but that's fine.&lt;/p&gt;
&lt;p&gt;But we're not here to just discuss the implementation of string
formatting, we're here to discuss how blazing fast PyPy is at it, with
the new &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;unroll-if-alt&lt;/span&gt;&lt;/tt&gt; branch.  Given the Python code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def main():
    for i in xrange(10000000):
        &amp;quot;%d %d&amp;quot; % (i, i)

main()
&lt;/pre&gt;
&lt;p&gt;and the C code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;


int main() {
    int i = 0;
    char x[44];
    for (i = 0; i &amp;lt; 10000000; i++) {
        sprintf(x, &amp;quot;%d %d&amp;quot;, i, i);
    }
}
&lt;/pre&gt;
&lt;p&gt;Run under PyPy, at the head of the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;unroll-if-alt&lt;/span&gt;&lt;/tt&gt; branch, and
compiled with GCC 4.5.2 at -O4 (other optimization levels were tested,
this produced the best performance). It took &lt;strong&gt;0.85&lt;/strong&gt; seconds to
execute under PyPy, and &lt;strong&gt;1.63&lt;/strong&gt; seconds with the compiled binary. We
think this demonstrates the incredible potential of dynamic
compilation, GCC is unable to inline or unroll the &lt;tt class="docutils literal"&gt;sprintf&lt;/tt&gt; call,
because it sits inside of libc.&lt;/p&gt;
&lt;p&gt;Benchmarking the C code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;


int main() {
    int i = 0;
    for (i = 0; i &amp;lt; 10000000; i++) {
        char *x = malloc(44 * sizeof(char));
        sprintf(x, &amp;quot;%d %d&amp;quot;, i, i);
        free(x);
    }
}
&lt;/pre&gt;
&lt;p&gt;Which as discussed above, is more comperable to the Python, gives a
result of &lt;strong&gt;1.96&lt;/strong&gt; seconds.&lt;/p&gt;
&lt;p&gt;Summary of performance:&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="20%" /&gt;
&lt;col width="19%" /&gt;
&lt;col width="19%" /&gt;
&lt;col width="12%" /&gt;
&lt;col width="30%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;GCC (stack)&lt;/td&gt;
&lt;td&gt;GCC (malloc)&lt;/td&gt;
&lt;td&gt;CPython&lt;/td&gt;
&lt;td&gt;PyPy (unroll-if-alt)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Time&lt;/td&gt;
&lt;td&gt;1.63s&lt;/td&gt;
&lt;td&gt;1.96s&lt;/td&gt;
&lt;td&gt;10.2s&lt;/td&gt;
&lt;td&gt;0.85s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;relative to C&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;td&gt;0.83x&lt;/td&gt;
&lt;td&gt;0.16x&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Overall PyPy is almost &lt;strong&gt;2x&lt;/strong&gt; faster. This is clearly win for dynamic
compilation over static - the &lt;cite&gt;sprintf&lt;/cite&gt; function lives in libc and so
cannot be specializing over the constant string, which has to be parsed
every time it's executed. In the case of PyPy, we specialize
the assembler if we detect the left hand string of the modulo operator
to be constant.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br/&gt;
alex &amp;amp; fijal&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6756589731691762127?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/rkdXKSuYy8Y" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6756589731691762127/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6756589731691762127" title="50 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6756589731691762127?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6756589731691762127?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/rkdXKSuYy8Y/pypy-is-faster-than-c-again-string.html" title="PyPy is faster than C, again: string formatting" /><author><name>Maciej Fijalkowski</name><uri>http://www.blogger.com/profile/11410841070239382771</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>50</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUYMSXc6cCp7ImA9WhdTEEg.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-6985924592886873374</id><published>2011-07-07T17:24:00.001+02:00</published><updated>2011-07-07T17:39:48.918+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-07-07T17:39:48.918+02:00</app:edited><title>Realtime image processing in Python</title><content type="html">Image processing is notoriously a CPU intensive task.  To do it in realtime,
you need to implement your algorithm in a fast language, hence trying to do it
in Python is foolish: Python is clearly not fast enough for this task. Is it?
:-)&lt;br /&gt;
Actually, it turns out that the PyPy JIT compiler produces code which is fast
enough to do realtime video processing using two simple algorithms implemented
by Håkan Ardö.&lt;br /&gt;
&lt;tt class="docutils literal"&gt;sobel.py&lt;/tt&gt; implements a classical way of locating edges in images, the
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Sobel_operator"&gt;Sobel operator&lt;/a&gt;. It is an approximation of the magnitude of the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Image_gradient"&gt;image
gradient&lt;/a&gt;. The processing time is spend on two &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Convolution"&gt;convolutions&lt;/a&gt; between the
image and 3x3-kernels.&lt;br /&gt;
&lt;tt class="docutils literal"&gt;magnify.py&lt;/tt&gt; implements a pixel coordinate transformation that rearranges
the pixels in the image to form a magnifying effect in the center.
It consists of a single loop over the pixels in the output image copying
pixels from the input image.&lt;br /&gt;
You can try by yourself by downloading the appropriate demo:&lt;br /&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://wyvern.cs.uni-duesseldorf.de/%7Eantocuni/pypy-image-demo.tar.bz2"&gt;pypy-image-demo.tar.bz2&lt;/a&gt;: this archive contains only the source code,
use this is you have PyPy already installed&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://wyvern.cs.uni-duesseldorf.de/%7Eantocuni/pypy-image-demo-full.tar.bz2"&gt;pypy-image-demo-full.tar.bz2&lt;/a&gt;: this archive contains both the source
code and prebuilt PyPy binaries for linux 32 and 64 bits&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
To run the demo, you need to have &lt;tt class="docutils literal"&gt;mplayer&lt;/tt&gt; installed on your system.  The
demo has been tested only on linux, it might (or not) work also on other
systems:&lt;br /&gt;
&lt;pre class="literal-block"&gt;$ pypy pypy-image-demo/sobel.py

$ pypy pypy-image-demo/magnify.py
&lt;/pre&gt;
By default, the two demos uses an example AVI file.  To have more fun, you can
use your webcam by passing the appropriate mplayer parameters to the scripts,
e.g:&lt;br /&gt;
&lt;pre class="literal-block"&gt;$ pypy demo/sobel.py tv://
&lt;/pre&gt;
By default magnify.py uses &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Nearest-neighbor_interpolation"&gt;nearest-neighbor interpolation&lt;/a&gt;.  By adding the
option -b, &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Bilinear_interpolation"&gt;bilinear interpolation&lt;/a&gt; will be used instead, which gives
smoother result:&lt;br /&gt;
&lt;pre class="literal-block"&gt;$ pypy demo/magnify.py -b
&lt;/pre&gt;
There is only a single implementation of the algorithm in
&lt;tt class="docutils literal"&gt;magnify.py&lt;/tt&gt;. The two different interpolation methods are implemented by
subclassing the class used to represent images and embed the
interpolation within the pixel access method. PyPy is able to achieve good
performance with this kind of abstractions because it can inline
the pixel access method and specialize the implementation of the algorithm.
In C++ that kind of pixel access method would be virtual and you'll need to use
templates to get the same effect without incurring in runtime overhead.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://2.gvt0.com/vi/5DtlBC_Zbq4/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/5DtlBC_Zbq4&amp;fs=1&amp;source=uds" /&gt;

&lt;param name="bgcolor" value="#FFFFFF" /&gt;

&lt;embed width="320" height="266"  src="http://www.youtube.com/v/5DtlBC_Zbq4&amp;fs=1&amp;source=uds" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;
The &lt;a class="reference external" href="http://www.youtube.com/watch?v=5DtlBC_Zbq4"&gt;video&lt;/a&gt; above shows PyPy and CPython running &lt;tt class="docutils literal"&gt;sobel.py&lt;/tt&gt; side by
side (PyPy taking input from the webcam, CPython from the test
file). Alternatively, to have a feeling on how much PyPy is faster than
CPython, try to run the demo with the latter.  These are the the average fps
(frames per second) that I get on my machine (Ubuntu 64 bit, Intel i7 920, 4GB
RAM) when processing the default &lt;tt class="docutils literal"&gt;test.avi&lt;/tt&gt; video and using the prebuilt
PyPy binary found in the &lt;a class="reference external" href="http://wyvern.cs.uni-duesseldorf.de/%7Eantocuni/pypy-image-demo-full.tar.bz2"&gt;full&lt;/a&gt; tarball alinked above.  For &lt;tt class="docutils literal"&gt;sobel.py&lt;/tt&gt;:&lt;br /&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;PyPy: ~47.23 fps&lt;/li&gt;
&lt;li&gt;CPython: ~0.08 fps&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
For &lt;tt class="docutils literal"&gt;magnify.py&lt;/tt&gt;:&lt;br /&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;PyPy: ~26.92 fps&lt;/li&gt;
&lt;li&gt;CPython: ~1.78 fps&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
This means that on &lt;tt class="docutils literal"&gt;sobel.py&lt;/tt&gt;, PyPy is &lt;b&gt;590 times faster&lt;/b&gt;.  On
&lt;tt class="docutils literal"&gt;magnify.py&lt;/tt&gt; the difference is much less evident and the speedup is "only"
15x.&lt;br /&gt;
It must be noted that this is an extreme example of what PyPy can do.  In
particular, you cannot expect (yet :-)) PyPy to be fast enough to run an
arbitrary video processing algorithm in real time, but the demo still proves
that PyPy has the potential to get there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-6985924592886873374?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/i2g0NvdWPZo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/6985924592886873374/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=6985924592886873374" title="22 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6985924592886873374?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/6985924592886873374?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/i2g0NvdWPZo/realtime-image-processing-in-python.html" title="Realtime image processing in Python" /><author><name>Antonio Cuni</name><uri>http://www.blogger.com/profile/17017456817083804792</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="27" height="32" src="http://3.bp.blogspot.com/-L5bYMW7FznA/Tb3Ms8nMvRI/AAAAAAAAANI/dbKVSIc2dzk/s220/anto-bn3.jpg" /></author><thr:total>22</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/07/realtime-image-processing-in-python.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEQNSXg5eyp7ImA9WhZaE0s.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-8270246310848099963</id><published>2011-06-29T18:50:00.002+02:00</published><updated>2011-06-29T18:53:18.623+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-29T18:53:18.623+02:00</app:edited><title>Global Interpreter Lock, or how to kill it</title><content type="html">&lt;p&gt;People that listened to my (Armin Rigo) lightning talk at EuroPython know that
suddenly, we have a plan to remove the Global Interpreter Lock --- the
infamous GIL, the thing in CPython that prevents multiple threads from
actually running in your Python code in parallel.&lt;/p&gt;
&lt;p&gt;That's not actually new, because Jython has been doing it all along.
Jython works by very carefully adding locks to
all the mutable built-in types, and by relying on the underlying Java
platform to be efficient about them (so that the result is faster than,
say, very carefully adding similar locks in CPython).  By &amp;quot;very
carefully&amp;quot;, I mean &lt;em&gt;really&lt;/em&gt; &lt;em&gt;really&lt;/em&gt; carefully; for example,
'dict1.update(dict2)' needs to lock both dict1 and dict2, but if you do
it naively, then a parallel 'dict2.update(dict1)' might cause a
deadlock.&lt;/p&gt;
&lt;p&gt;All of PyPy, CPython and IronPython have a GIL.  But for PyPy we are considering
a quite different approach than Jython's, based on &lt;a class="reference" href="http://en.wikipedia.org/wiki/Software_transactional_memory"&gt;Software
Transactional Memory&lt;/a&gt;.  This is a recent development in computer
science, and it gives a nicer solution than locking.  Here is a short
introduction to it.&lt;/p&gt;
&lt;p&gt;Say you want to atomically pop an item from 'list1' and append it to
'list2':&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def f(list1, list2):
    x = list1.pop()
    list2.append(x)
&lt;/pre&gt;
&lt;p&gt;This is not safe in multithreaded cases (even with the GIL).  Say that
you call &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;f(l1,&lt;/span&gt; &lt;span class="pre"&gt;l2)&lt;/span&gt;&lt;/tt&gt; in thread 1 and &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;f(l2,&lt;/span&gt; &lt;span class="pre"&gt;l1)&lt;/span&gt;&lt;/tt&gt; in thread 2.  What
you want is that it has no effect at all (x is moved from one list to
the other, then back).  But what can occur is that instead the top of
the two lists are swapped, depending on timing issues.&lt;/p&gt;
&lt;p&gt;One way to fix it is with a global lock:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def f(list1, list2):
    global_lock.acquire()
    x = list1.pop()
    list2.append(x)
    global_lock.release()
&lt;/pre&gt;
&lt;p&gt;A finer way to fix it is with locks that come with the lists:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def f(list1, list2):
    acquire_all_locks(list1.lock, list2.lock)
    x = list1.pop()
    list2.append(x)
    release_all_locks(list1.lock, list2.lock)
&lt;/pre&gt;
&lt;p&gt;The second solution is a model for Jython's, while the first is a model
for CPython's.  Indeed, in CPython's interpreter, we acquire the GIL,
then we do one bytecode (or actually a number of them, like 100), then
we release the GIL; and then we proceed to the next bunch of 100.&lt;/p&gt;
&lt;p&gt;Software Transactional Memory (STM) gives a third solution:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def f(list1, list2):
    while True:
        t = transaction()
        x = list1.pop(t)
        list2.append(t, x)
        if t.commit():
            break
&lt;/pre&gt;
&lt;p&gt;In this solution, we make a &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;transaction&lt;/span&gt;&lt;/tt&gt; object and use it in all
reads and writes we do to the lists.  There are actually several
different models, but let's focus on one of them.  During a transaction,
we don't actually change the global memory at all.  Instead, we use the
thread-local &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;transaction&lt;/span&gt;&lt;/tt&gt; object.  We store in it which objects we
read from, which objects we write to, and what values we write.  It is
only when the transaction reaches its end that we attempt to &amp;quot;commit&amp;quot;
it.  Committing might fail if other commits have occurred in between,
creating inconsistencies; in that case, the transaction aborts and
must restart from the beginning.&lt;/p&gt;
&lt;p&gt;In the same way as the previous two solutions are models for CPython and
Jython, the STM solution looks like it could be a model for PyPy in the
future.  In such a PyPy, the interpreter would start a transaction, do
one or several bytecodes, and then end the transaction; and repeat.
This is very similar to what is going on in CPython with the GIL.  In
particular, it means that it gives programmers all the same guarantees
as the GIL does.  The &lt;em&gt;only&lt;/em&gt; difference is that it can actually run
multiple threads in parallel, as long as their code does not interfere
with each other.  (In particular, if you need not just the GIL but actual
locks in your existing multi-threaded program, then this will not
magically remove the need for them.  You might get an additional built-in
module that exposes STM to your Python programs, if you prefer it over
locks, but that's another question.)&lt;/p&gt;
&lt;p&gt;Why not apply that idea to CPython?  Because we would need to change
everything everywhere.  In the example above, you may have noted that I
no longer call 'list1.pop()', but 'list1.pop(t)'; this is a way to tell
that the implementation of all the methods needs to be changed, in order
to do their work &amp;quot;transactionally&amp;quot;.  This means that instead of really
changing the global memory in which the list is stored, it must instead
record the change in the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;transation&lt;/span&gt;&lt;/tt&gt; object.  If our interpreter is
written in C, as CPython is, then we need to write it explicitly
everywhere.  If it is written instead in a higher-level language, as
PyPy is, then we can add this behavior as as set of translation rules, and
apply them automatically wherever it is necessary.  Moreover, it can be
a translation-time option: you can either get the current &amp;quot;pypy&amp;quot; with a
GIL, or a version with STM, which would be slower due to the extra
bookkeeping.  (How much slower?  I have no clue, but as a wild guess,
maybe between 2 and 5 times slower.  That is fine if you have enough
cores, as long as it scales nicely :-)&lt;/p&gt;
&lt;p&gt;A final note: as STM research is very recent (it started around 2003),
there are a number of variants around, and it's not clear yet which one
is better in which cases.  As far as I can tell, the approach described
in &amp;quot;A Comprehensive Strategy for Contention Management in Software
Transactional Memory&amp;quot; seems to be one possible state-of-the-art; it also
seems to be &amp;quot;good enough for all cases&amp;quot;.&lt;/p&gt;
&lt;p&gt;So, when will it be done?  I cannot say yet.  It is still at the idea
stage, but I &lt;em&gt;think&lt;/em&gt; that it can work.  How long would it take us to
write it?  Again no clue, but we are looking at many months rather
than many days.  This is the sort of thing that I would
like to be able to work on full time after the &lt;a class="reference" href="http://morepypy.blogspot.com/2010/12/oh-and-btw-pypy-gets-funding-through.html"&gt;Eurostars funding&lt;/a&gt;
runs out on September 1.  We are currently looking at ways to use
&lt;a class="reference" href="http://en.wikipedia.org/wiki/Crowd_funding"&gt;crowdfunding&lt;/a&gt; to raise money so that I can do exactly that.  Expect
a blog post about that very soon.  But this looks like a perfect
candidate for crowdfunding -- there are at least thousands of you who
would be willing to pay 10s of Euros to Kill the GIL.  Now we only
have to make this happen.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-8270246310848099963?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/-QRfm9Eta90" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/8270246310848099963/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=8270246310848099963" title="48 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8270246310848099963?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/8270246310848099963?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/-QRfm9Eta90/global-interpreter-lock-or-how-to-kill.html" title="Global Interpreter Lock, or how to kill it" /><author><name>Armin Rigo</name><uri>http://www.blogger.com/profile/06300515270104686574</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>48</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkQFQno7eCp7ImA9WhZUFUw.&quot;"><id>tag:blogger.com,1999:blog-3971202189709462152.post-2083371215707583264</id><published>2011-06-08T07:18:00.001+02:00</published><updated>2011-06-08T07:18:33.400+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-08T07:18:33.400+02:00</app:edited><title>Report back from our survey</title><content type="html">&lt;p&gt;Hi all,&lt;/p&gt;
&lt;p&gt;I'm here to report back the results of our survey. First, we're very pleased to
report that a number of you guys are happilly running PyPy in production! Most
(97%) of the respondants using PyPy are using it because it's faster, but a
further 26% (respondants could choose multiple answers) are using it because of
lower memory usage. Of users who aren't using PyPy, the most common reason was
C extensions, followed by &amp;quot;Other&amp;quot;.&lt;/p&gt;
&lt;p&gt;From reading the extra comments section there are a few things we've learned:&lt;/p&gt;
&lt;ol class="loweralpha simple"&gt;
&lt;li&gt;Google docs needs a better UI for this stuff&lt;/li&gt;
&lt;li&gt;A huge number of people want NumPy and SciPy, it was easily the most
requested C extension (25% of respondants said somthing about NumPy). We've
already blogged on the topic of &lt;a class="reference external" href="http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html"&gt;our plans for NumPy&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Having packages in the various OS's repositories would be a big help in
getting users up and running.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A huge thanks to everyone who responded! Finally, if you're using PyPy in
production we'd love to get a testimonial from you, if you're willing to spare
a few minutes to give us a quote or two please get in contact with us via &lt;a class="reference external" href="http://mail.python.org/mailman/listinfo/pypy-dev"&gt;our
mailing list&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks,
Alex&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3971202189709462152-2083371215707583264?l=morepypy.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/PyPyStatusBlog/~4/CerU5EJrN60" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://morepypy.blogspot.com/feeds/2083371215707583264/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3971202189709462152&amp;postID=2083371215707583264" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2083371215707583264?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3971202189709462152/posts/default/2083371215707583264?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/PyPyStatusBlog/~3/CerU5EJrN60/report-back-from-our-survey.html" title="Report back from our survey" /><author><name>Alex</name><uri>http://www.blogger.com/profile/14054821112394577330</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total><feedburner:origLink>http://morepypy.blogspot.com/2011/06/report-back-from-our-survey.html</feedburner:origLink></entry></feed>

