My experience with the computer language shootout

Sun, Apr 3, 2011

For a long time we, the PyPy developers, have known the Python implementations on the Computer Language Shootout were not optimal under PyPy, and in fact had been ruthlessly microoptimized for CPython, to the detriment of PyPy. But we didn’t really care or do anything about it, because we figured those weren’t really representative of what people like to do with Python, and who really cares what it says, CPython is over 30 times slower than C, and people use it just the same. However, I’ve recently have a number of discussions about language implementation speed and people tend to cite the language shootout as the definitive source for cross-language comparisons. So I decided to see what I could do about making it faster.

The first benchmark I took a stab at was reverse-complement, PyPy was doing crappily on it, and it was super obviously optimized for CPython: every loop possible was pushed down into functions known to be implemented in C, various memory allocation tricks are played (e.g. del some_list[:] removes the contents of the list, but doesn’t deallocate the memory), and bound method allocation is pulled out of loops. The first one is the most important for PyPy, on PyPy your objective is generally to make sure your hot loops are in Python, the exact opposite of what you want on CPython. So I started coding up my own version, optimized for PyPy, I spent some time with our debugging and profiling tools, and whipped up a nice implementation that was something like 3x faster than the current one on PyPy, which you can see here. Generally the objective here was to make sure the program does as little memory allocation in the hot loops as possible, all of which are in Python. Try that with your average interpreter.

So I went ahead and submitted it, thinking PyPy would be looking 3 times better when I woke up. Naturally I wake up to an email from the shootout, which says that I should provide a Python 3 implementation, and that it doesn’t work on CPython. What the hell? I try to run it myself and indeed it doesn’t. It turns out on CPython sys.stdout.write(buffer(array.array("c"), 0, idx)) raises an exception. Which is a tad unfortunate because it should be an easy way to print out part of an array of characters without needing to allocate memory. After speaking with some CPython core developers, it appears that it is indeed a bug in CPython. And I noticed on PyPy buffer objects aren’t nearly as efficient as they should be, so I set out in search of a new way to work on CPython and PyPy, and be faster if possible. I happened to stuble across the method array.buffer_info which returns a tuple of the memory address of the array’s internal storage and its length, and a brilliant hack occurred to me: use ctypes to call libc’s write() function. I coded it up, and indeed it worked on PyPy and CPython and was 40% faster on PyPy to boot. Fantastic I thought, I’ll just submit this and PyPy will look rocking! Only 3.5x slower than C, not bad for an interpreter, in a language that is notoriously hard to optimize. You can see the implementation right here, it contains a few other performance tricks as well, but nothing too exciting.

So I submitted this, thinking, “Aha! I’ve done it”. Shortly, I had an email saying this has been accepted as an “interesting alternative” because it used ctypes, which is to say it won’t be included in the cumulative timings for each implementation, nor will it be listed with the normal implementations for the per-benchmark scores. Well crap, that’s no good, the whole point of this was to look good, what’s the point if no one is going to see this glorious work. So I sent a message asking why this implementation was considered alternative, since it appeared fairly legitimate. I received a confusing message questioning why this optimization was necessary, followed by a suggestion that perhaps PyPy wasn’t compatible enough with (with what I dare not ask, but the answer obviously isn’t Python the abstract language, since CPython had the bug!).

Overall it was a pretty crappy experience. The language shootout appears to be governed by arbitrary rules. For example the C implementations use GCC builtins, which are not part of the C standard, making them not implementation portable. The CPython pidigits version uses a C extension which is obviously not implementation portable (by comparison every major Python implementation includes ctypes, only CPython, and to varying extents IronPython and PyPy, support the CPython C-API), although here PyPy was allowed to use ctypes. It’s also not possible to send any messages once your ticket has been marked as closed, meaning to dispute a decision you basically need to pray the maintainer reopens it for some reason. The full back and forth is available here. I’m still interested in improving the PyPy submissions there (and further optimizing PyPy where needed). However given the seemingly capricious and painful submission process I’m not really inclined to continue work, nor can I take the shootout seriously as an honest comparison of languages.