alex gaynor's blago-blog

Posts tagged with pypy

PyPy is the Future of Python

Posted May 15th, 2010. Tagged with python, pypy.

Currently the most common implementation of Python is known as CPython. It's the version of Python you get at python.org, and probably 99.9% of Python developers are using it. However, I think over the next couple of years we're going to see a move away from this towards PyPy, Python written in Python. This is going to happen because PyPy offers better speed, more flexibility, and is a better platform for Python's growth. Most importantly, you can help make this transition happen.

The first thing to consider: speed. PyPy is a lot faster than CPython for a lot of tasks, and they've got the benchmarks to prove it. There's room for improvement, but it's clear that for a lot of benchmarks PyPy screams, and it's not just number crunching (although PyPy is good at that too). Although Python performance might not be a bottleneck for a lot of us (especially us web developers who like to push performance down the stack to our database), would you say no to having your code run 2x faster?

The next factor is flexibility. By writing their interpreter in RPython, PyPy can automatically generate C code (like CPython), but also JVM and .NET versions of the interpreter. Instead of writing entirely separate Jython and IronPython implementations of Python, you can just automatically generate them from one shared codebase. PyPy can also have its binary generated with a stackless option, just like Stackless Python, so again there are no separate implementations to maintain. Lastly, PyPy's JIT is almost totally separate from the interpreter. This means changes to the language itself can be made without needing to update the JIT; contrast this with many JITs that need to statically define fast-paths for various operations.

And finally, it's a better platform for growth. The last point is a good example of this: one can keep the speed from the JIT while making changes to the language. You don't need to be an assembly expert to write a new bytecode or play with the builtin types; the JIT generator takes care of it for you. Also, it's written in Python. It may be RPython, which isn't as high level as regular Python, but compare the implementations of map from CPython and PyPy:

static PyObject *
builtin_map(PyObject *self, PyObject *args)
{
    typedef struct {
        PyObject *it;           /* the iterator object */
        int saw_StopIteration;  /* bool:  did the iterator end? */
    } sequence;

    PyObject *func, *result;
    sequence *seqs = NULL, *sqp;
    Py_ssize_t n, len;
    register int i, j;

    n = PyTuple_Size(args);
    if (n < 2) {
        PyErr_SetString(PyExc_TypeError,
                        "map() requires at least two args");
        return NULL;
    }

    func = PyTuple_GetItem(args, 0);
    n--;

    if (func == Py_None) {
        if (PyErr_WarnPy3k("map(None, ...) not supported in 3.x; "
                           "use list(...)", 1) < 0)
            return NULL;
        if (n == 1) {
            /* map(None, S) is the same as list(S). */
            return PySequence_List(PyTuple_GetItem(args, 1));
        }
    }

    /* Get space for sequence descriptors.  Must NULL out the iterator
     * pointers so that jumping to Fail_2 later doesn't see trash.
     */
    if ((seqs = PyMem_NEW(sequence, n)) == NULL) {
        PyErr_NoMemory();
        return NULL;
    }
    for (i = 0; i < n; ++i) {
        seqs[i].it = (PyObject*)NULL;
        seqs[i].saw_StopIteration = 0;
    }

    /* Do a first pass to obtain iterators for the arguments, and set len
     * to the largest of their lengths.
     */
    len = 0;
    for (i = 0, sqp = seqs; i < n; ++i, ++sqp) {
        PyObject *curseq;
        Py_ssize_t curlen;

        /* Get iterator. */
        curseq = PyTuple_GetItem(args, i+1);
        sqp->it = PyObject_GetIter(curseq);
        if (sqp->it == NULL) {
            static char errmsg[] =
                "argument %d to map() must support iteration";
            char errbuf[sizeof(errmsg) + 25];
            PyOS_snprintf(errbuf, sizeof(errbuf), errmsg, i+2);
            PyErr_SetString(PyExc_TypeError, errbuf);
            goto Fail_2;
        }

        /* Update len. */
        curlen = _PyObject_LengthHint(curseq, 8);
        if (curlen > len)
            len = curlen;
    }

    /* Get space for the result list. */
    if ((result = (PyObject *) PyList_New(len)) == NULL)
        goto Fail_2;

    /* Iterate over the sequences until all have stopped. */
    for (i = 0; ; ++i) {
        PyObject *alist, *item=NULL, *value;
        int numactive = 0;

        if (func == Py_None && n == 1)
            alist = NULL;
        else if ((alist = PyTuple_New(n)) == NULL)
            goto Fail_1;

        for (j = 0, sqp = seqs; j < n; ++j, ++sqp) {
            if (sqp->saw_StopIteration) {
                Py_INCREF(Py_None);
                item = Py_None;
            }
            else {
                item = PyIter_Next(sqp->it);
                if (item)
                    ++numactive;
                else {
                    if (PyErr_Occurred()) {
                        Py_XDECREF(alist);
                        goto Fail_1;
                    }
                    Py_INCREF(Py_None);
                    item = Py_None;
                    sqp->saw_StopIteration = 1;
                }
            }
            if (alist)
                PyTuple_SET_ITEM(alist, j, item);
            else
                break;
        }

        if (!alist)
            alist = item;

        if (numactive == 0) {
            Py_DECREF(alist);
            break;
        }

        if (func == Py_None)
            value = alist;
        else {
            value = PyEval_CallObject(func, alist);
            Py_DECREF(alist);
            if (value == NULL)
                goto Fail_1;
        }
        if (i >= len) {
            int status = PyList_Append(result, value);
            Py_DECREF(value);
            if (status < 0)
                goto Fail_1;
        }
        else if (PyList_SetItem(result, i, value) < 0)
            goto Fail_1;
    }

    if (i < len && PyList_SetSlice(result, i, len, NULL) < 0)
        goto Fail_1;

    goto Succeed;

Fail_1:
    Py_DECREF(result);
Fail_2:
    result = NULL;
Succeed:
    assert(seqs);
    for (i = 0; i < n; ++i)
        Py_XDECREF(seqs[i].it);
    PyMem_DEL(seqs);
    return result;
}

That's a lot of code! It wouldn't be bad, for C code, except for the fact that there's far too much boilerplate: every single call into the C-API needs to check for an exception, and INCREF and DECREF calls are littered throughout the code. Compare this with PyPy's RPython implementation:

def map(space, w_func, collections_w):
    """does 3 separate things, hence this enormous docstring.
       1.  if function is None, return a list of tuples, each with one
           item from each collection.  If the collections have different
           lengths,  shorter ones are padded with None.

       2.  if function is not None, and there is only one collection,
           apply function to every item in the collection and return a
           list of the results.

       3.  if function is not None, and there are several collections,
           repeatedly call the function with one argument from each
           collection.  If the collections have different lengths,
           shorter ones are padded with None
    """
    if not collections_w:
        msg = "map() requires at least two arguments"
        raise OperationError(space.w_TypeError, space.wrap(msg))
    num_collections = len(collections_w)
    none_func = space.is_w(w_func, space.w_None)
    if none_func and num_collections == 1:
        return space.call_function(space.w_list, collections_w[0])
    result_w = []
    iterators_w = [space.iter(w_seq) for w_seq in collections_w]
    num_iterators = len(iterators_w)
    while True:
        cont = False
        args_w = [space.w_None] * num_iterators
        for i in range(len(iterators_w)):
            try:
                args_w[i] = space.next(iterators_w[i])
            except OperationError, e:
                if not e.match(space, space.w_StopIteration):
                    raise
            else:
                cont = True
        w_args = space.newtuple(args_w)
        if cont:
            if none_func:
                result_w.append(w_args)
            else:
                w_res = space.call(w_func, w_args)
                result_w.append(w_res)
        else:
            return space.newlist(result_w)
map.unwrap_spec = [ObjSpace, W_Root, "args_w"]

It's not exactly what you'd write for a pure Python implementation of map, but it's a hell of a lot closer than the C version.
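For a third point of comparison, here is a rough sketch of what a pure Python version of the same Python 2 style map might look like. It is written for Python 3, the name py2_map is made up for this illustration (so it doesn't shadow the builtin), and it is not taken from either codebase.

```python
from itertools import zip_longest


def py2_map(func, *collections):
    """Sketch of Python 2's map(): returns a list, pads short inputs with None."""
    if not collections:
        raise TypeError("map() requires at least two arguments")
    # map(None, S) is the same as list(S).
    if func is None and len(collections) == 1:
        return list(collections[0])
    # zip_longest pads the shorter collections with None by default.
    rows = zip_longest(*collections)
    if func is None:
        return list(rows)
    return [func(*row) for row in rows]
```

Most of the work is handed to zip_longest, which is exactly the kind of bookkeeping the C and RPython versions have to spell out by hand.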

The case for PyPy being the future is strong, I think. However, it's not all sunshine and roses; there are a few issues. It lags behind CPython's version (right now Python 2.5 is implemented), C extension compatibility isn't there yet, and not enough people are trying it out yet. But PyPy is getting there, and you can help.

Right now the single biggest way to help for most people is to test their code. Any pure Python code targeting Python 2.5 should run perfectly under PyPy. If it doesn't, it's a bug; if it's slower than CPython, let us know (unless it involves re, we know it's slow). Maybe try out your C extensions, however cpyext is very alpha and even a segfault isn't surprising (but let us know so we can investigate). Of course help on development is always appreciated. Right now most of the effort is going into speeding up the JIT even more, however I believe there is also going to be work on moving up to Python 2.7 (currently pre-release) this summer. If you're interested in helping out with either you should hop into #pypy on irc.freenode.net, or send a message to pypy-dev. PyPy's doing good work, Python doesn't need to be slow, and we don't all need to write C code!


Making Django and PyPy Play Nice (Part 1)

Posted April 16th, 2010. Tagged with django, python, pypy.

If you track Django's commits aggressively (ok so just me...), you may have noticed that there have been a number of commits to improve the compatibility of Django and PyPy in the last couple of days. In the run up to Django 1.0 there were a ton of commits to make sure Django didn't have too many assumptions that it was running on CPython, and that it could work with systems like Jython and PyPy. Unfortunately, since then, our support has lapsed a little, and a number of tests have begun to fail. In the past couple of days I've been working to correct this. Here are some of the things that were wrong.

The first issue I ran into was, in various tests, response.context and response.template being None, instead of the lists that were expected. This was a pain to diagnose, but the ultimate source of the bug is that Django registers a signal handler in the test client that listens for templates being rendered. However, it doesn't actually unregister that signal receiver. Instead it relies on the fact that signals are stored as weakrefs, and when the function ends, the receivers that were registered (which were local variables) would be automatically deallocated. On PyPy, Jython, and any other system with a garbage collector more advanced than CPython's reference counting, the local variables aren't guaranteed to be deallocated at the end of the function, and therefore the weakref can still be alive. Truthfully, I'm not 100% sure how this results in the next signal being sent not storing the appropriate data, but it does. The solution is to make sure that the signals are manually disconnected at the end of the run. This was fixed in r12964 of Django.
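To make the pattern concrete, here is a minimal sketch of the fix, assuming a heavily simplified Signal class. The class, the template_rendered instance, and run_test_request are all made up for this illustration and are not Django's actual code. The point is the try/finally: the receiver is disconnected explicitly instead of waiting for the garbage collector to clear the weak reference.

```python
import weakref


class Signal:
    """Hypothetical, heavily simplified signal that holds receivers weakly."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(weakref.ref(receiver))

    def disconnect(self, receiver):
        # Drop the given receiver, plus any references that are already dead.
        self._receivers = [
            ref for ref in self._receivers if ref() not in (None, receiver)
        ]

    def live_receivers(self):
        receivers = (ref() for ref in self._receivers)
        return [r for r in receivers if r is not None]

    def send(self, **kwargs):
        for receiver in self.live_receivers():
            receiver(**kwargs)


template_rendered = Signal()  # hypothetical stand-in for the real signal


def run_test_request(template_name):
    rendered = []

    def on_template_rendered(template, **kwargs):
        rendered.append(template)

    template_rendered.connect(on_template_rendered)
    try:
        template_rendered.send(template=template_name)
    finally:
        # Don't rely on refcounting to clean this up: disconnect by hand.
        template_rendered.disconnect(on_template_rendered)
    return rendered
```

With the explicit disconnect, no receiver from an earlier request is left attached, whichever garbage collection strategy the interpreter uses.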

The next issue was actually a problem in PyPy: specifically, it was crashing with a UnicodeDecodeError. When I say crashing I mean crashing in the C sense of the word, not the Python, nice exception and stack trace sense... sort of. PyPy is written in a language named RPython. RPython is Pythonesque (all valid RPython is valid Python) and has exceptions. However, if they aren't caught they sort of propagate to the top and give you a kind-of-ok stacktrace, but its function names are all the generated function names from the C source, not useful ones from the RPython source. Internally, PyPy uses an OperationError to keep track of exceptions at the interpreter level. A trick to debugging RPython is that if running the code on top of CPython works, then running it translated to C will work, and the contrapositive appears true as well: if the C doesn't work, running on CPython won't work. After trying to run the code on CPython, the location of the exception bubbled right to the top, and the fix followed easily.

These are the first two issues I fixed, a couple others have been fixed and committed, and a further few have also been fixed, but not committed yet. I'll be writing about those as I find the time.


Languages Don't Have Speeds, Or Do They?

Posted March 15th, 2010. Tagged with python, compiler, pypy, programming-languages, psyco.

Everyone knows languages don't have speeds, implementations do. Python isn't slow; CPython is. Javascript isn't fast; V8, Squirrelfish, and Tracemonkey are. But what if a language was designed in such a way that it appeared to be impossible to implement efficiently? Would it be fair to say that language is slow, or would we still have to speak in terms of implementations? For a long time I followed the conventional wisdom, that languages didn't have speeds, but lately I've come to believe that we can learn something by thinking about the limits on how fast a language could possibly be, given a perfect implementation.

For example consider the following Python function:

def f(n):
    i = 0
    while i < n:
        i += 1
        n += i
    return n

And the equivalent C function:

int f(int n) {
    int i = 0;
    while (i < n) {
        i += 1;
        n += i;
    }
    return n;
}

CPython probably runs this code 100 times slower than the GCC compiled version of the C code. But we all know CPython is slow right? PyPy or Psyco probably runs this code 2.5 times slower than the C version (I'm just spitballing here). Psyco and PyPy are, and contain, really good just in time compilers that can profile this code, see that f is always called with an integer, and therefore a much more optimized version can be generated in assembly. For example the optimized version could generate just a few add instructions in the inner loop (plus a few more instructions to check for overflow), this would skip all the indirection of calling the __add__ function on integers, allocating the result on the heap, and the indirection of calling the __lt__ function on integers, and maybe even some other things I missed.

But there's one thing no JIT for Python can do, no matter how brilliant. It can't skip the check if n is an integer, because it can't prove it always will be an integer, someone else could import this function and call it with strings, or some custom type, or anything they felt like, so the JIT must verify that n is an integer before running the optimized version. C doesn't have to do that. It knows that n will always be an integer, even if you do nothing but call f until the end of the earth, GCC can be 100% positive that n is an integer.
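A rough way to picture that guard, written in plain Python. This uses a simpler made-up function, sum_to, rather than f, and sum_to_int is only a stand-in for the machine code a JIT might emit; a real JIT does this in generated assembly, not in Python.

```python
def sum_to_generic(n):
    """Slow path: works for any n that can be compared with an int."""
    i = 0
    total = 0
    while i < n:
        i += 1
        total += i
    return total


def sum_to_int(n):
    """Stand-in for specialised code that is only valid when n is an int."""
    return sum(range(1, n + 1))


def sum_to(n):
    # The guard: this check has to run on every call, because nothing
    # stops a caller from passing a float, a string, or a custom type.
    if type(n) is int:
        return sum_to_int(n)
    # Guard failed: fall back to the fully general version.
    return sum_to_generic(n)
```

The C compiler gets to delete the `type(n) is int` line entirely, because the type is part of the function's signature.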

The absolute need for at least a few guards in the resulting assembly guarantees that even the most perfect JIT compiler ever, could not generate code that was strictly faster than the GCC version. Of course, a JIT has some advantages over a static compiler, for example, it can inline at dynamic call sites. However, in practice I don't believe this ability is ever likely to beat a static compiler for a real world program. On the other hand I'm not going to stop using Python any time soon, and it's going to continue to get faster, a lot faster.


PyCon Roundup - Days 2-4

Posted March 8th, 2010. Tagged with pycon, pypy, unladen-swallow.

As I said in my last post PyCon was completely and utterly awesome, but it was also a bit of a blur. Here's my attempt at summarizing everything that happened during the conference itself.

Day 2

Day 2 (the first day of the conference itself) started out with Guido's keynote. He did something rather unorthodox: instead of delivering a formal talk he just took audience questions via Twitter. He started out using Twitterfall, but it was a tad slow so Jacob Kaplan-Moss switched it out for the PyCon Live Stream that Eric Florenzano, Brian Rosner, and I built, which was very awesome (seeing your creation on four projectors in front of an audience is a pretty good ego boost). Next up I had to deliver my own talk. It went well I think; you can find my slides and a video on the PyCon website. The rest of the day was a bit of a blur, but I enjoyed James Bennett's talk, the Form generator's panel, Jonathan Ellis's talk on database scalability, and Alex Martelli's talk.

That evening I got to visit the Django restaurant, I can't help but imagine what the staff there thought about the crazy groups of programmers visiting (I think we mobbed the place every night of the conference), many of us wearing our Django shirts.

Day 3

Day 3 was really about the VMs. It started with the IronPython and PyPy keynotes. These were followed by Mark Shuttleworth's keynote. His slides weren't working (the perils of using an operating system alpha release), but he still delivered an awesome talk on software development processes. From there I went to the single best stretch of talks at PyCon: The Speed of PyPy, Unladen Swallow: fewer coconuts, faster Python, and Understanding the Python GIL. Three great topics, three great speakers, three great talks, one room. Simply put, you should drop whatever you're doing and watch each of these talks, they're all great. I was sad to miss Raymond Hettinger's talk for the Django Software Foundation meeting, but I caught the video and it was excellent (as usual for Raymond). The DSF meeting was interesting, but not super exciting, a lot of "action needed" tasks, some of which are already happening (like getting better buildbots running). Finally I caught the tail end of the Neo4j talk. I'll need to watch the full thing, because the ending caught my attention.

I ended up back at the Django restaurant again, this time for the speakers, volunteers, and sponsors dinner. Django wasn't quite equipped to handle the sheer number of us that showed up, but it was a good time nonetheless. Once again I was blown away by how many Google people there were. There were far too many awesome people to list, but suffice it to say that you should volunteer or speak at PyCon, if for no other reason than to get to attend this dinner, there are awesome people to have a conversation with as far as the eyes can see! After dinner I ended up staying up far too late with another group of awesome people. I'm told the testing birds of a feather was also great (I suppose events that aren't completely awesome don't invent things like the Testing Goat).

Day 4

Here I suffered the effects of the aforementioned late night. I missed all of the keynotes, which is unfortunate considering they all looked quite good; I'll have to catch the videos. The poster session was very cool. I only got to see about half of it, but it's definitely something I'm going to look forward to at future PyCons. Next I saw Donovan Preston's talk on eventlet and a talk on teaching compilers with Python. I missed Scott Chacon's talk on hg-git, though I can't remember what for. I'll have to catch it on video because I was really looking forward to it.

After that there was time for a final pizza with friends, and then I had to head home. I sprinted remotely, but it's not quite the same as being there. It's my hope that next year I can attend the sprints in person, but school never seems to work out for me in that respect.


Committer Models of Unladen Swallow, PyPy, and Django

Posted February 25th, 2010. Tagged with django, python, pypy, unladen-swallow, open-source.

During this year's PyCon I became a committer on both PyPy and Unladen Swallow. In addition, I've been a contributor to Django for quite a long time (as well as having commit privileges to my branch during the Google Summer of Code). One of the things I've observed is the very different models these projects have for granting commit privileges, and what the expectations and responsibilities are for committers.

Unladen Swallow

Unladen Swallow is a Google funded branch of CPython focused on speed. One of the things I've found is that the developers of this project carry over some of the development process from Google, specifically doing code review on every patch. All patches are posted to Rietveld, and reviewed, often by multiple people in the case of large patches, before being committed. Because there is a high level of review it is possible to grant commit privileges to people without requiring perfection in their patches, as long as they follow the review process the project is well insulated against a bad patch.

PyPy

PyPy is also an implementation of Python, however its development model is based largely around aggressive branching (I've never seen a project handle SVN's branching failures as well as PyPy) as well as sprints and pair programming. By branching aggressively PyPy avoids the overhead of reviewing every single patch, and instead only requires review when something is already believed to be "trunk-ready". Further, this model encourages experimentation (in the same way git's lightweight branches do). PyPy's use of sprints and pair programming are two ways to avoid formal code reviews and instead approach code quality as more of a collaborative effort.

Django

Django is the project I've been involved with for the longest, and also the only one I don't have commit privileges on. Django is extremely conservative in giving out commit privileges (there are about a dozen Django committers, and about 500 names in the AUTHORS file). Django's development model is based neither on branching (only changes as large in scope as multiple database support, or an admin UI refactor get their own branch) nor on code review (most commits are reviewed by no one besides the person who commits them). Django's committers maintain a level of autonomy that isn't seen in either of the other two projects. This comes from the period before Django 1.0 was released, when Django's trunk was often used in production and needed to be kept stable at all times, combined with the fact that Django has no paid developers who can guarantee time to do code review on patches. Therefore Django has maintained code quality by being extremely conservative in granting commit privileges and allowing developers with commit privileges to exercise their own judgment at all times.

Conclusion

Each of these projects uses different methods for maintaining code quality, and all seem to be successful in doing so. It's not clear whether there's any one model that's better than the others, or that any of these projects could work with another's model. Lastly, it's worth noting that all of these models are fairly orthogonal to the centralized VCS vs. DVCS debate which often surrounds such discussions.


A Bit of Benchmarking

Posted November 22nd, 2009. Tagged with django, python, compiler, pypy, programming-languages.

PyPy recently posted some interesting benchmarks from the computer language shootout, and in my last post about Unladen Swallow I described a patch that would hopefully be landing soon. I decided it would be interesting to benchmark something with this patch. For this I used James Tauber's Mandelbulb application, at both 100x100 and 200x200. I tested CPython, Unladen Swallow Trunk, Unladen Swallow Trunk with the patch, and a recent PyPy trunk (compiled with the JIT). My results were as follows:

VM                              100x100   200x200
CPython 2.6.4                   17s       64s
Unladen Swallow Trunk           16s       52s
Unladen Swallow Trunk + Patch   13s       49s
PyPy Trunk                      10s       46s

Interesting results. At 100x100 PyPy smokes everything else, and the patch shows a clear benefit for Unladen. However, at 200x200 both PyPy and the patch show diminishing returns. I'm not clear on why this is, but my guess is that something about the increased size causes a change in the parameters that makes the generated code less efficient for some reason.
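To put numbers on that, the timings above can be turned into speedups relative to CPython. This is just arithmetic on the table; the variable names are made up for this snippet.

```python
# Timings in seconds from the table above: (100x100, 200x200).
timings = {
    "CPython 2.6.4": (17, 64),
    "Unladen Swallow Trunk": (16, 52),
    "Unladen Swallow Trunk + Patch": (13, 49),
    "PyPy Trunk": (10, 46),
}

baseline_100, baseline_200 = timings["CPython 2.6.4"]

# Speedup = CPython time / VM time, so bigger is better.
speedups = {
    vm: (baseline_100 / t100, baseline_200 / t200)
    for vm, (t100, t200) in timings.items()
}

for vm, (s100, s200) in speedups.items():
    print(f"{vm:<32}{s100:5.2f}x{s200:7.2f}x")
```

PyPy's speedup over CPython is noticeably smaller at 200x200 than at 100x100, which is the drop-off described above.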

It's important to note that Unladen Swallow has been far less focussed on numeric benchmarks than PyPy, instead focusing on more web app concerns (like template languages). I plan to benchmark some of these as time goes on, particularly after PyPy merges their "faster-raise" branch, which I'm told improves PyPy's performance on Django's template language dramatically.


Things College Taught me that the "Real World" Didn't

Posted November 21st, 2009. Tagged with django, python, compile, ply, lex, yacc, c++, response, compiler, pypy, unladen-swallow, programming-languages, parse, college.

A while ago Eric Holscher blogged about things he didn't learn in college. I'm going to take a different spin on it, looking at both things that I did learn in school that I wouldn't have learned elsewhere (henceforth defined as my job, or open source programming), as well as things I learned elsewhere instead of at college.

Things I learned in college:

  • Big O notation, and algorithm analysis. This is the biggest one, I've had little cause to consider this in my open source or professional work, stuff is either fast or slow and that's usually enough. Learning rigorous algorithm analysis doesn't come up all the time, but every once in a while it pops up, and it's handy.
  • C++. I imagine that I eventually would have learned it myself, but my impetus to learn it was that's what was used for my CS2 class, so I started learning with the class then dove in head first. Left to my own devices I may very well have stayed in Python/Javascript land.
  • Finite automata and pushdown automata. I actually did lexing and parsing before I ever started looking at these in class (see my blog posts from a year ago) using PLY, however, this semester I've actually been learning about the implementation of these things (although sadly for class projects we've been using Lex/Yacc).

Things I learned in the real world:

  • Compilers. I've learned everything I know about compilers from reading papers out of my own interest and hanging around communities like Unladen Swallow and PyPy (and even contributing a little).
  • Scalability. Interestingly, this is a concept related to algorithm analysis/big O, however this is something I've really learned from talking about this stuff with guys like Mike Malone and Joe Stump.
  • APIs, Documentation. These are the core of software development (in my opinion), and I've definitely learned these skills in the open source world. You don't know what a good API or documentation is until it's been used by someone you've never met and it just works for them, and they can understand it perfectly. One of the few required, advanced courses at my school is titled, "Software Design and Documentation" and I'm deathly afraid it's going to waste my time with stuff like UML, instead of focusing on how to write APIs that people want to use and documentation that people want to read.

So these are my short lists. I've tried to highlight items that cross the boundaries between what people traditionally expect are topics for school and topics for the real world. I'd be curious to hear what other people's experience with topics like these are.


Another Pair of Unladen Swallow Optimizations

Posted November 19th, 2009. Tagged with django, python, pypy, unladen-swallow, programming-languages.

Today a patch of mine was committed to Unladen Swallow. In the past weeks I've described some of the optimizations that have gone into Unladen Swallow, in specific I looked at removing the allocation of an argument tuple for C functions. One of the "on the horizon" things I mentioned was extending this to functions with a variable arity (that is the number of arguments they take can change). This has been implemented for functions that take a finite range of argument numbers (that is, they don't take *args, they just have a few arguments with defaults). This support was used to optimize a number of builtin functions (dict.get, list.pop, getattr for example).

However, there were still a number of functions that weren't updated for this support. I initially started porting any functions I saw, but it wasn't a totally mechanical translation so I decided to do a little profiling to better direct my efforts. I started by using the cProfile module to see what functions were called most frequently in Unladen Swallow's Django template benchmark. Imagine my surprise when I saw that unicode.encode was called over 300,000 times! A quick look at that function showed that it was a perfect contender for this optimization: it was currently designated as a METH_VARARGS, but in fact its argument count was a finite range. After about a dozen lines of code to change the argument parsing, I ran the benchmark again, comparing it to a control version of Unladen Swallow, and it showed a consistent 3-6% speedup on the Django benchmark. Not bad for 30 minutes of work.
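The profiling step looks roughly like the sketch below. The workload here is a made-up stand-in (render_name and workload are hypothetical names, not the Django template benchmark), and it is written for Python 3; the cProfile and pstats calls are the standard library ones.

```python
import cProfile
import io
import pstats


def render_name(name):
    # Stand-in for template code that encodes text on every call.
    return name.encode("utf-8")


def workload():
    return [render_name("user%d" % i) for i in range(1000)]


profiler = cProfile.Profile()
profiler.runcall(workload)

# Sort by call count so the most frequently called functions come first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("calls").print_stats(5)
print(report.getvalue())
```

Sorting by call count rather than by time is what surfaces a cheap but extremely hot function, which is the kind of target where shaving per-call overhead pays off.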

Another optimization I want to look at, which hasn't landed yet, is one to optimize various operators. Right now Unladen Swallow tracks various data about the types seen in the interpreter loop, however for various operators this data isn't actually used. What this patch does is check at JIT compilation time whether the operator site is monomorphic (that is, there is only one pair of types ever seen there), and if it is, and it is one of a few pairings that we have optimizations for (int + int, list[int], float - float for example), then optimized code is emitted. This optimized code checks that the types of both arguments are the expected ones. If they are, the optimized code is executed; otherwise the VM bails back to the interpreter (various literature has shown that a single compiled optimized path is better than compiling both the fast and slow paths). For simple algorithmic code this optimization can show huge improvements.
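Here is a toy model of that idea in plain Python. Every name in it (OperatorSite, Bail, compile_add, and so on) is invented for this sketch; the real patch works on LLVM IR inside Unladen Swallow, not on Python closures.

```python
class OperatorSite:
    """Hypothetical record of the operand types seen at one `+` site."""

    def __init__(self):
        self.seen_types = set()

    def record(self, left, right):
        self.seen_types.add((type(left), type(right)))

    def is_monomorphic(self):
        return len(self.seen_types) == 1


class Bail(Exception):
    """Raised when a guard fails and control must return to the interpreter."""


def interpret_add(site, left, right):
    # The interpreter path gathers type feedback and does the generic add.
    site.record(left, right)
    return left + right


def compile_add(site):
    """Return a specialised adder, or None if no optimisation applies."""
    if not site.is_monomorphic() or site.seen_types != {(int, int)}:
        return None

    def fast_int_add(left, right):
        # Guard: only the single type pair seen so far is handled here.
        if type(left) is not int or type(right) is not int:
            raise Bail
        return left + right

    return fast_int_add


def run_add(site, compiled, left, right):
    if compiled is not None:
        try:
            return compiled(left, right)
        except Bail:
            pass  # fall through to the interpreter
    return interpret_add(site, left, right)
```

Only one fast path is ever compiled; anything unexpected goes back to the interpreter rather than to a second compiled slow path.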

The PyPy project has recently blogged about the results of some benchmarks from the Computer Language Shootout run on PyPy, Unladen Swallow, and CPython. In these benchmarks Unladen Swallow showed that for highly algorithmic code (read: mathy) it could use some work; hopefully patches like this can help improve the situation markedly. Once this patch lands I'm going to rerun these benchmarks to see how Unladen Swallow improves. I'm also going to add in some of the more macro benchmarks Unladen Swallow uses to see how it compares with PyPy in those. Either way, seeing the tremendous improvements PyPy and Unladen Swallow have over CPython gives me tremendous hope for the future.


Optimising compilers are there so that you can be a better programmer

Posted October 10th, 2009. Tagged with django, python, compiler, pypy, unladen-swallow.

In a discussion on the Django developers mailing list I recently commented that the performance impact of having logging infrastructure, in the case where the user doesn't want the logging, could essentially be disregarded because Unladen Swallow (and PyPy) are bringing us a proper optimising (Just in Time) compiler that would essentially remove that consideration. Shortly thereafter someone asked me if I really thought it was the job of the interpreter/compiler to make us not think about performance. And my answer is: the job of a compiler is to let me program with best practices and not suffer performance consequences for doing things the right way.

Let us consider the most common compiler optimisations. A relatively simple one is function inlining: in the case where including the body of the function would be more efficient than actually calling it, a compiler can simply move the function's body into its caller. However, we can actually do this optimisation in our own code. We could rewrite:

def times_2(x):
    return x * 2

def do_some_stuff(i):
    for x in i:
        # stuff
        z = times_2(x)
    # more stuff

as:

def do_some_stuff(i):
    for x in i:
        # stuff
        z = x * 2
    # more stuff

And this is a trivial change to make. However, in the case where times_2 is slightly less trivial, and is used a lot in our codebase, it would be exceptionally poor programming practice to repeat this logic all over the place. What if we needed to change it down the road? Then we'd have to review our entire codebase to make sure we changed it everywhere. Needless to say that would suck. However, we don't want to give up the performance gain from inlining this function either. So here it's the job of the compiler to make sure functions are inlined when possible. That way we get the best possible performance, as well as being able to maintain our clean codebase.

Another common compiler optimisation is to transform multiplications by powers of 2 into binary shifts. Thus x * 2 becomes x << 1. A final optimisation we will consider is constant propagation. Many programs have constants that are used throughout the codebase. These are often simple global variables. However, once again, inlining them into methods that use them could provide a significant benefit, by not requiring the code to make a lookup in the global scope whenever they are used. But we really don't want to do that by hand, as it makes our code less readable ("Why are we multiplying this value by this random float?", "You mean pi?", "Oh."), and makes it more difficult to update down the road. Once again our compiler is capable of saving the day: when it can detect a value is a constant it can propagate it throughout the code.
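Done by hand, those two transformations would look something like this. The function and constant names are made up for the illustration, and the "propagated" and "shifted" versions show what we'd like the compiler to produce for us, not what we'd want to write.

```python
import math

# Readable version: a named module-level constant, looked up in the
# global scope every time circumference() runs.
TAU = 2 * math.pi


def circumference(radius):
    return TAU * radius


def circumference_propagated(radius):
    # Hand-propagated constant: no global lookup, but a magic number.
    return 6.283185307179586 * radius


def double(x):
    return x * 2


def double_shifted(x):
    # Multiplication by a power of 2 rewritten as a shift (ints only).
    return x << 1
```

Both pairs compute the same results; the difference is purely in how much the reader has to decode.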

So does all of this mean we should never have to think about writing optimal code, because the compiler can solve all problems for us? The answer to this is a resounding no. A compiler isn't going to rewrite your insertion sort into Timsort, nor is it going to fix the fact that you do 700 SQL queries to render your homepage. What the compiler can do is allow you to maintain good programming practices.

So what does this mean for logging in Django? Fundamentally it means that we shouldn't be concerned with possible overhead from calls that do nothing (in the case where we don't care about the logging) since a good compiler will be able to eliminate those for us. In the case where we actually do want to do something (say write the log to a file) the overhead is unavoidable, we have to open a file and write to it, there's no way to optimise it out.
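As a small sketch of the "calls that do nothing" case, using the standard logging module: the logger name and run_query are made up for this example, and this is not a proposal for how Django's logging should look.

```python
import logging

logger = logging.getLogger("example.queries")  # hypothetical logger name
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)


def run_query(sql):
    # With the level at WARNING this call checks the level and returns
    # without building or emitting a record. The remaining cost is the
    # call itself, which is what a good compiler could optimise away.
    logger.debug("executing %s", sql)
    return len(sql)
```

The logging call stays in the code where it belongs, and whether it costs anything when disabled becomes the implementation's problem rather than the programmer's.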
