A look at the PyPy 2.0 release
It's hard to say why, but May appears to be the month when we look in on PyPy.
Three years ago, we had a May 2010 introduction to PyPy, followed by an experiment using it in May
2011. This year, the PyPy 2.0 release was made on May 9; that, coupled with our evident
tradition, makes for a good reason to revisit this Python
interpreter written in Python.
One might ask: "Why write a Python interpreter in Python?" The possibly
surprising
answer is: "performance". It's not quite as simple as that, of course, as
there are some additional pieces to the puzzle. To start with, PyPy is
written in a restricted subset of Python, called RPython,
which is less dynamic than the full language and
acts a bit more like C.
PyPy also includes a just-in-time (JIT) compiler that flat out beats "normal" Python (called
CPython) on a variety of benchmarks.
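To give a rough feel for what "less dynamic" means, here is a minimal sketch in ordinary Python. It is not checked by the RPython toolchain, and the function names are made up for illustration; the assumption is simply that a statically analyzable subset wants each variable to hold one type, which the first function violates and the second respects.

```python
def dynamic_total(items):
    # Fine in full Python, but hard to analyze statically:
    # `total` starts as an int and may turn into a str along the way.
    total = 0
    for item in items:
        if isinstance(item, str):
            total = str(total) + item
        else:
            total = total + item
    return total


def static_total(items):
    # Written in a more C-like style: `items` is assumed to be a list
    # of ints, and `total` is an int from start to finish.
    total = 0
    for item in items:
        total += item
    return total
```

Both functions run under any Python interpreter; only the second is written in the type-stable style that a translator to C could reasonably be expected to handle.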
PyPy has been making steady progress for over ten years now, and has
reached a point where it can be used in place of CPython in lots of
places. For a long time, compatibility with the standard library and
other Python libraries and frameworks was lacking, but that situation has
improved
substantially over the years. Major frameworks like Django and Twisted
already work with PyPy. The 2.0 release adds support for Stackless Python with greenlets, which
provide micro-threading for Python. Those two pieces should allow asynchronous programs using gevent and eventlet to work as well (though gevent
requires some PyPy-specific changes).
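For readers unfamiliar with micro-threading, the idea can be sketched with nothing but generators from the standard library. This is an analogy, not the greenlet API: the `worker` and `run_round_robin` names are hypothetical, and real greenlets can switch from anywhere in the call stack rather than only at a `yield`.

```python
from collections import deque


def worker(name, steps, log):
    # A cooperative task: do a little work, then hand control back.
    for i in range(steps):
        log.append((name, i))
        yield


def run_round_robin(tasks):
    # A tiny scheduler: resume each task in turn until all are finished.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task is done, drop it
        queue.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

After the run, `log` shows the two tasks interleaving in a single OS thread, which is the property that libraries like gevent and eventlet build on.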
In order to support more Python modules that call out to C (typically for
performance reasons), PyPy now includes CFFI 0.6, which is a
foreign function interface for calling C code from Python. Unlike other
methods for calling C functions, CFFI works well for both CPython and PyPy,
while also providing a "reasonable path" for IronPython
(Python on .NET) or Jython (Python on
the Java virtual machine).
Trying it out
Getting PyPy 2.0 is a bit tricky, at least for now. Those who are on
Ubuntu 10.04 or 12.04 can pick up binaries from the download page (as can Mac
OS X and Windows users). While many distributions carry PyPy in their
repositories, 2.0 has not arrived yet. There are "statically linked" PyPy
binaries, but the 64-bit version (at least) doesn't quite live up to the name: it requires a
dozen or so shared libraries, including versions of libssl,
libcrypto, and libbz2 that are older than those available for Fedora 18.
Normally, given constraints like that, building from source is the right
approach, but the project has some fairly scary
warnings about doing so. According to the docs, building on a
64-bit machine with 4GB or less of RAM will "just swap
forever", which didn't sound all that enticing. There is a
workaround, though: run the build with PyPy rather than CPython, using some
magic incantations (an environment variable and a
command-line flag) to limit the memory usage. That, however, means making
the static PyPy binary work.
A little symbolic linking (for
libbz2) and some library building (openssl-1.0.0j)
resulted in a
functioning PyPy. There is no real reason not to use that, but I was a
little leery
of it and curious to continue with the build process.
Running the PyPy build is a rather eye-opening experience. Beyond the voluminous
output (including a colored-ASCII Mandelbrot set rendering, lots of
status,
and some warnings that are seemingly not serious), it took more than two
hours (8384.6 seconds according to the detailed timing information spit out
at the end
of the build process) on my not horribly underpowered desktop (2.33 GHz
Core 2 Duo). By comparison, building the Linux kernel only takes six minutes or so on that system.
Calculating
the Mandelbrot set while translating is not the only whimsical
touch that comes with PyPy. Starting it up leads to a
fortune-like quote, though one with a PyPythonic orientation:
$ pypy
Python 2.7.3 (b9c3566aa0170aaa736db0491d542c309ec7a5dc, May 11 2013, 17:54:41)
[PyPy 2.0.0 with GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy is a tool to keep otherwise
dangerous minds safely occupied.''
>>>>
From the command line, it acts pretty much exactly like Python
2.7.3—as advertised.
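Since the prompt looks so much like CPython's, a script may want to know which interpreter it is actually running under. A small sketch using the standard `platform` and `sys` modules follows; the `describe_interpreter` helper is a name invented for this example.

```python
import platform
import sys


def describe_interpreter():
    # Implementation name, e.g. 'CPython' or 'PyPy'.
    impl = platform.python_implementation()
    # Language version that the interpreter implements.
    version = "%d.%d.%d" % sys.version_info[:3]
    return impl, version


impl, version = describe_interpreter()
print("%s implementing Python %s" % (impl, version))
```

Run with the `pypy` binary, the first value should identify PyPy while the second still reports the Python language version.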
As a quick test of PyPy, I ran an amusingly
shaped Mandelbrot program in both CPython and PyPy. As expected, the
PyPy version ran significantly faster (just over 3 minutes, compared to 8+
minutes for CPython 2.7.3). In addition, the bitmaps produced were identical.
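The program used for that test is not reproduced here, but a comparison of the same kind can be sketched in a few lines. The code below is an assumption-laden stand-in: a plain escape-time Mandelbrot renderer with a wall-clock timer, with all names and parameters chosen for illustration. It is written to run unchanged under both interpreters.

```python
import time


def escape_count(c, max_iter=200):
    # Number of iterations before |z| exceeds 2, or max_iter if it never does.
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=200):
    # Text rendering: '#' for points that stay bounded, '.' otherwise.
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 3.0 * i / (width - 1)
            inside = escape_count(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


def timed_render(**kwargs):
    # Returns the rendering plus the elapsed wall-clock seconds.
    start = time.time()
    rows = render(**kwargs)
    return rows, time.time() - start
```

Calling `timed_render(width=600, height=240, max_iter=1000)` under each interpreter and comparing both the elapsed time and the returned rows gives a crude version of the speed and output-equality checks described above. Tight numeric loops like this one are where a JIT tends to help most, so the result says little about other workloads.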
PyPy comes with its own set of standard library modules, but additional
modules can be picked up from an existing CPython site-packages
directory (via the PYTHONPATH environment variable). Trying out a
few of those (BeautifulSoup 4 for example) showed no obvious
problems, though a PyPy bug
report shows problems using the lxml parser, along with some other, more
subtle problems. The compatibility page gives an overview
of the compatibility picture, while the compatibility
wiki has more information on a per-package basis. A scan of the wiki
shows there's lots more work to do, but it also shows a pretty wide range
of code compatible with PyPy.
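A first-pass compatibility check along those lines can be as simple as trying the imports. The sketch below uses the standard `importlib` module; `check_imports` is a hypothetical helper, and a successful import only shows that a module loads, not that it behaves correctly.

```python
import importlib


def check_imports(names):
    # Map each module name to None on success or a short error description.
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results
```

Running it under PyPy, with PYTHONPATH pointing at the CPython site-packages directory and a list such as `["bs4", "lxml"]`, would show which modules fail at import time; the subtler problems mentioned in the bug report would need real tests to uncover.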
ARM and other projects
One of the more interesting developments in the PyPy world is ARM support,
for which a 2.0 alpha has been released.
It supports both ARMv6 (e.g., Raspberry Pi) and v7 (e.g., Beagleboard,
Chromebook) and the benchmark results look good, especially given that the
ARM code "is not nearly as polished as our x86 assembler".
The Raspberry Pi Foundation helped get PyPy onto
ARM with "a
small amount of funding".
The PyPy project is running several concurrent fundraising efforts, three
for specific sub-projects, and one for overall project funding. The transactional memory/automatic mutual
exclusion sub-project is an effort to use software transactional memory
to allow Python programs to use multiple cores more effectively. It would
remove the global interpreter lock (GIL) from PyPy to allow better concurrency.
PyPy hackers Armin Rigo and Maciej Fijałkowski gave a presentation
at PyCon 2013 on this effort.
Another ongoing sub-project is an effort to add Python 3 support to PyPy.
That would allow PyPy to run programs written in Python 3, but would not change the
PyPy implementation language (RPython, which is based on Python 2.x). A status
report back in March shows good progress. On x86-32 Linux, the CPython
regression test suite "now passes 289 out of approximately 354 modules (with 39 skips)".
The third sub-project is to make NumPy
work with PyPy. NumPy is an
extension for doing math with matrices and multi-dimensional arrays. Much
of that work is done in C code, so PyPy's JIT would need to use the vector
instructions on modern CPUs. A brief status
update from May 11 shows some progress, as does the 2.0 release
announcement (though the temporary loss of lazy expression evaluation may
not exactly be considered progress).
Overall, the PyPy project seems to be cruising along. While none of the
fundraising efforts have hit their targets, some fairly significant money has
been raised. Beyond that, some major technical progress has been made.
The sub-projects, software transactional memory in particular, are also
providing interesting new dimensions for the project.
We are getting closer to the day when most Python code is runnable with PyPy,
though we aren't there yet.