
A look at the PyPy 2.0 release

By Jake Edge
May 15, 2013

It's hard to say why, but May appears to be the month where we look in on PyPy. Three years ago, we had a May 2010 introduction to PyPy, followed by an experiment using it in May 2011. This year, the PyPy 2.0 release was made on May 9—that, coupled with our evident tradition, makes for a good reason to look in on this Python interpreter written in Python.

One might ask: "Why write a Python interpreter in Python?" The possibly surprising answer is: "performance". It's not quite as simple as that, of course, as there are some additional pieces to the puzzle. To start with, PyPy is written in RPython, a restricted subset of Python that is less dynamic and acts a bit more like C. PyPy also includes a just-in-time (JIT) compiler that flat out beats "normal" Python (called CPython) on a variety of benchmarks.
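To give a feel for what "less dynamic" means, the sketch below contrasts ordinary Python, where a name may hold values of different types on different paths, with a version in which every variable has one inferable type. It assumes the documented RPython restriction that a variable holds values of at most one type at each point in the control flow. Both function names are hypothetical illustrations, not code from PyPy, and the block runs as plain Python rather than through the RPython toolchain.

```python
def describe_dynamic(n):
    # Ordinary Python: 'result' is a str on one path and an int on the other,
    # so a static type inferencer cannot give it a single type.
    if n < 0:
        result = "negative"
    else:
        result = n * 2
    return result


def describe_restricted(n):
    # Restricted style: 'result' is always an int, so its type can be
    # worked out ahead of time, much as a C compiler would require.
    if n < 0:
        result = -1
    else:
        result = n * 2
    return result
```

Both functions are valid Python; only the second fits the single-type-per-variable style assumed above.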

PyPy has been making steady progress for over ten years now, and has reached a point where it can be used in place of CPython in lots of places. For a long time, compatibility with the standard library and other Python libraries and frameworks was lacking, but that situation has improved substantially over the years. Major frameworks like Django and Twisted already work with PyPy. The 2.0 release adds support for Stackless Python with greenlets, which provide micro-threading for Python. Those two pieces should allow asynchronous programs using gevent and eventlet to work as well (though gevent requires some PyPy-specific changes).

In order to support more Python modules that call out to C (typically for performance reasons), PyPy now includes CFFI 0.6, which is a foreign function interface for calling C code from Python. Unlike other methods for calling C functions, CFFI works well for both CPython and PyPy, while also providing a "reasonable path" for IronPython (Python on .NET) or Jython (Python on the Java virtual machine).
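As a rough illustration of what calling C through CFFI looks like, here is a minimal sketch using the ABI-level interface (`FFI()`, `cdef()`, and `dlopen()`). It assumes a Unix-like system, where `dlopen(None)` opens the standard C library, and it assumes the `cffi` package is importable (it is a separate install on CPython). The wrapper name `c_abs` is made up for this example.

```python
try:
    from cffi import FFI  # third-party on CPython; included with PyPy 2.0
except ImportError:
    FFI = None


def c_abs(value):
    """Call the C library's abs() through CFFI's ABI-level interface."""
    if FFI is None:
        raise RuntimeError("cffi is not installed")
    ffi = FFI()
    # Declare the C function using ordinary C syntax.
    ffi.cdef("int abs(int);")
    # Assumption: a Unix-like system, where dlopen(None) gives access
    # to the standard C library.
    libc = ffi.dlopen(None)
    return libc.abs(value)
```

With `cffi` available, `c_abs(-42)` calls the C function directly, and the same source should run unchanged under CPython and PyPy.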

Trying it out

Getting PyPy 2.0 is a bit tricky, at least for now. Those who are on Ubuntu 10.04 or 12.04 can pick up binaries from the download page (as can Mac OS X and Windows users). While many distributions carry PyPy in their repositories, 2.0 has not arrived yet. There are "statically linked" PyPy binaries, but the 64-bit version (at least) doesn't quite live up to the name—it requires a dozen or so shared libraries, including older versions of libssl, libcrypto, and libbz2 than those available for Fedora 18.

Normally, given constraints like that, building from source is the right approach, but the project has some fairly scary warnings about doing so. According to the docs, building on a 64-bit machine with 4GB or less of RAM will "just swap forever", which didn't sound all that enticing. But there is a workaround that runs the build under PyPy rather than CPython, using some magic incantations (an environment variable and command-line flag) to limit the memory usage. That, however, means making the static PyPy binary work. A little symbolic linking (for libbz2) and some library building (openssl-1.0.0j) resulted in a functioning PyPy. There is no real reason not to use that, but I was a little leery of it and curious to continue with the build process.

Running the PyPy build is a rather eye-opening experience. Beyond the voluminous output (including a colored-ASCII rendering of the Mandelbrot set, lots of status, and some warnings that are seemingly not serious), it took more than 2 hours (8384.6 seconds according to the detailed timing information spit out at the end of the build process) on my not horribly underpowered desktop (2.33 GHz Core 2 Duo). A Linux kernel build only takes six minutes or so on that system.
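To put those two figures side by side, the arithmetic can be sketched as follows. The only assumption is that "six minutes or so" is taken as exactly 360 seconds; the helper name `split_seconds` is just for illustration.

```python
def split_seconds(total):
    """Break a duration in seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(hours), int(minutes), seconds


pypy_build = 8384.6      # seconds, from the build's timing output
kernel_build = 6 * 60    # assumption: "six minutes or so" as 360 seconds

hours, minutes, seconds = split_seconds(pypy_build)
ratio = pypy_build / kernel_build
print("%d h %d min %.1f s, about %.0f times the kernel build"
      % (hours, minutes, seconds, ratio))
```

That works out to roughly 2 hours 20 minutes, or about 23 times as long as the kernel build.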

Calculating the Mandelbrot set while translating is not the only whimsical touch that comes with PyPy. Starting it up leads to a fortune-like quote, though one with a PyPythonic orientation:

    $ pypy
    Python 2.7.3 (b9c3566aa0170aaa736db0491d542c309ec7a5dc, May 11 2013, 17:54:41)
    [PyPy 2.0.0 with GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    And now for something completely different: ``PyPy is a tool to keep otherwise
    dangerous minds safely occupied.''
    >>>>

From the command line, it acts pretty much exactly like Python 2.7.3—as advertised.

As a quick test of PyPy, I ran an amusingly shaped Mandelbrot program in both CPython and PyPy. As expected, the PyPy version ran significantly faster (just over 3 minutes, compared to 8+ minutes for CPython 2.7.3). In addition, the bitmaps produced were identical.
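The program used for that test is not reproduced here, but a comparison of this kind can be sketched with a small pure-Python Mandelbrot computation like the one below. The grid bounds, sizes, iteration limit, and function names are all arbitrary choices for this sketch, and no particular timings are implied.

```python
import time


def escape_count(c, max_iter=100):
    """Return the iteration at which z -> z*z + c first exceeds |z| = 2."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=100):
    """Compute escape counts over a grid covering the Mandelbrot set."""
    rows = []
    for y in range(height):
        imag = -1.2 + 2.4 * y / (height - 1)
        row = []
        for x in range(width):
            real = -2.0 + 3.0 * x / (width - 1)
            row.append(escape_count(complex(real, imag), max_iter))
        rows.append(row)
    return rows


def timed_render(**kwargs):
    """Return the grid along with the wall-clock seconds it took."""
    start = time.time()
    rows = render(**kwargs)
    return rows, time.time() - start
```

Running the same file under `python` and `pypy` with a large grid, then comparing both the elapsed times and the returned rows, gives a speed check and an identical-output check at once.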

PyPy comes with its own set of standard library modules, but additional modules can be picked up from an existing CPython site-packages directory (via the PYTHONPATH environment variable). Trying out a few of those (BeautifulSoup 4 for example) showed no obvious problems, though a PyPy bug report shows problems using the lxml parser, along with some other, more subtle problems. The compatibility page gives an overview of the compatibility picture, while the compatibility wiki has more information on a per-package basis. A scan of the wiki shows there's lots more work to do, but it also shows a pretty wide range of code compatible with PyPy.
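One simple way to try out modules picked up this way is to attempt each import under the interpreter in question and record what happens. The sketch below uses only the standard library; the function name `check_imports` is invented for this example, and a successful import only shows that the module loads, not that it behaves correctly.

```python
import importlib
import platform


def check_imports(module_names):
    """Try importing each module; map its name to True or False."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # A missing module raises ImportError; a C extension that
            # fails to load may raise something else, so catch broadly.
            results[name] = False
    return results


def report(module_names):
    """Print one line per module for the running interpreter."""
    implementation = platform.python_implementation()  # e.g. "PyPy"
    for name, ok in sorted(check_imports(module_names).items()):
        print("%s: %s %s" % (implementation, name, "ok" if ok else "FAILED"))
```

With PYTHONPATH pointing at a CPython site-packages directory, calling `report(["bs4", "lxml"])` under `pypy` would show whether those two packages at least import.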

ARM and other projects

One of the more interesting developments in the PyPy world is ARM support, for which a 2.0 alpha has been released. It supports both ARMv6 (e.g., Raspberry Pi) and v7 (e.g., Beagleboard, Chromebook) and the benchmark results look good, especially given that the ARM code "is not nearly as polished as our x86 assembler". The Raspberry Pi Foundation helped get PyPy onto ARM with "a small amount of funding".

The PyPy project is running several concurrent fundraising efforts, three for specific sub-projects, and one for overall project funding. The transactional memory/automatic mutual exclusion sub-project is an effort to use software transactional memory to allow Python programs to use multiple cores more effectively. It would remove the global interpreter lock (GIL) from PyPy to allow better concurrency. PyPy hackers Armin Rigo and Maciej Fijałkowski gave a presentation at PyCon 2013 on this effort.
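The kind of program that stands to gain is CPU-bound code split across threads. In the minimal sketch below (the function names and the prime-counting workload are made up for illustration), the threads produce correct results, but under a GIL only one of them executes Python bytecode at a time, so adding cores does little for run time. Removing that limit is the goal of the transactional memory work.

```python
import threading


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                break
        else:
            count += 1
    return count


def run_in_threads(limits):
    """Run count_primes() for each limit in its own thread."""
    results = [None] * len(limits)

    def worker(index, limit):
        # Each thread writes to its own slot, so no locking is needed.
        results[index] = count_primes(limit)

    threads = [threading.Thread(target=worker, args=(i, n))
               for i, n in enumerate(limits)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Timing `run_in_threads()` against the same calls made one after another is a quick way to see how much parallelism an interpreter actually delivers.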

Another ongoing sub-project is an effort to add Python 3 support to PyPy. That would allow input programs in Python 3, but would not change the PyPy implementation language (RPython, which is based on Python 2.x). A status report back in March shows good progress. On x86-32 Linux, the CPython regression test suite "now passes 289 out of approximately 354 modules (with 39 skips)".

The third sub-project is to make NumPy work with PyPy. NumPy is an extension for doing math with matrices and multi-dimensional arrays. Much of that work is done in C code, so PyPy's JIT would need to use the vector instructions on modern CPUs. A brief status update from May 11 shows some progress, as does the 2.0 release announcement (though the temporary loss of lazy expression evaluation may not exactly be considered progress).

Overall, the PyPy project seems to be cruising along. While none of the fundraising efforts have hit their targets, some fairly significant money has been raised. Beyond that, some major technical progress has been made. The sub-projects, software transactional memory in particular, are also providing interesting new dimensions for the project. We are getting closer to the day when most Python code is runnable with PyPy, though we still aren't there yet.
