The compiler rarely knows best

Thu, Jul 12, 2012

This is a response to http://pwang.wordpress.com/2012/07/11/does-the-compiler-know-best/ (if you haven’t read it yet, start there).

For lack of any other way to say it, I disagree with nearly every premise presented and conclusion derived in Peter’s blog post. The post itself doesn’t appear to have any coherent theme besides the claim that PyPy is not the future of Python, so I’ll attempt to reply to Peter’s statements more or less in order.

First, and perhaps most erroneously, he claims that “PyPy is an even more drastic change to the Python language than Python3”. This is wrong, completely and utterly. PyPy is not a change to Python at all: it faithfully implements the Python language as described by the Python language reference and as verified by the test suite. Moreover, if the claim held for PyPy, it would apply equally to Jython and IronPython. It is pure, unadulterated FUD. Peter is trying to extend the definition of the Python language to things it simply doesn’t cover, such as the C-API and what he thinks the interpreter should look like (more on that below).

Second, he writes, “What is the core precept of PyPy? It’s that ‘the compiler knows best’.” This, too, is wrong. First, PyPy’s central thesis is “any task repeatedly performed manually will be done incorrectly”. That is why we have things like automatic insertion of the garbage collector, in preference to CPython’s “reference counting everywhere”, and a just-in-time compiler generated automatically from the interpreter, in preference to Unladen Swallow’s (and almost every other language’s) manual construction of one. Second, as I alluded to in this post’s title, the PyPy developers would never argue that the compiler knows best. That doesn’t mean you should stop trying to write intelligent compilers, for two reasons. 1) The compiler often knows better than the user. Just as with C, where it’s possible to write better x86 assembler than GCC for specific functions, over the course of a large project GCC will always win. 2) The two aren’t mutually exclusive. Having an intelligent compiler does not prohibit giving the user more control; in fact, it makes that control a necessity! There are no pure-Python hints that you can give to CPython to improve performance, but such hints can easily be added with PyPy’s JIT.

He follows this by saying that, in contrast to PyPy’s (nonexistent) principle of “the compiler knows best”, CPython’s strength is that it can communicate with other platforms because its inner workings are simple. These three things have nothing to do with each other. CPython’s interoperability with other platforms is a function of its C-API. You can build an API like this on top of something monstrously complicated too; look at JNI for the JVM. (I don’t accept that PyPy is so complex, but that’s another post for another time.) In any event, the PyPy developers are deeply committed to interoperability with other platforms, which is why Armin and Maciej have been working on cffi: https://cffi.readthedocs.io/en/latest/index.html
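For readers who haven’t seen cffi, here is a minimal sketch of its “ABI mode”, in which you declare a C signature in plain C syntax and call the function directly from Python, with no C-API glue code at all; the same script runs on CPython and PyPy. It assumes cffi is installed (it ships with PyPy) and a POSIX system such as Linux or macOS, where ffi.dlopen(None) exposes the standard C library. The c_strlen wrapper is a hypothetical helper for illustration only.

```python
# Minimal cffi "ABI mode" sketch: call libc's strlen from Python.
# Assumptions: cffi is installed, and we are on a POSIX platform where
# ffi.dlopen(None) gives access to the already-loaded C standard library.
from cffi import FFI

ffi = FFI()

# Declarations are written in ordinary C syntax, as in a header file.
ffi.cdef("size_t strlen(const char *s);")

# Open the C library the running process is already linked against.
libc = ffi.dlopen(None)


def c_strlen(data: bytes) -> int:
    """Hypothetical helper: return the C library's strlen of a bytes object."""
    # cffi passes a bytes argument as a NUL-terminated char * automatically,
    # and converts the size_t result back to a Python int.
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"hello, pypy"))  # prints 11
```

Note that nothing here depends on the interpreter’s internals, which is exactly why an interface like this can be supported by CPython and PyPy alike.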

The next paragraph is one of the most bizarre things I’ve ever read. He suggests that if you do want the free performance gains PyPy promises, you should just build a Python-to-JS compiler and use Node.js. I have to assume this paragraph is a joke not meant for publication, because it’s nonsense. First, I’ve been told by the scientific Python community (of which Peter is a member) that any solution that isn’t backwards compatible with a mountain of older platforms will never be adopted. So naturally his proposed solution is to throw away all existing work. Next, he implies that Google, Mozilla, Apple, and Microsoft are all collaborating on a single JavaScript runtime, which is untrue; in fact, they each have their own VM. And V8, the one runtime specifically alluded to via Node.js, is not, as he writes, designed to be concurrent; Evan Phoenix, lead developer of Rubinius, comments, “It’s probably the least concurrent runtime I’ve seen.”

He then moves on to discussing the transparency of the levels involved in a runtime. Here I think he’s 100% correct. Being able to understand how a VM is operating, what it’s doing, what it’s optimizing, and how it’s executing is enormously important. That’s why I’m confused that he’s positioning this as an argument against PyPy, since we’ve made the transparency of our system a top priority. We have the jitviewer, a tool which exposes the exact internal operations and machine code generated for everything PyPy compiles, each of which can be correlated to an individual line of Python code. We also have a set of hooks into the JIT for programmatically inspecting what’s happening, including writing your own optimization passes in pure Python (https://pypy.readthedocs.io/en/latest/jit-hooks.html)!

That’s all I have. Hope you enjoyed.