alex gaynor's blago-blog

Posts tagged with compile

Thoughts on HipHop PHP

Posted February 2nd, 2010. Tagged with python, compile, c++, compiler, unladen-swallow, programming-languages, php.

This morning Facebook announced, as had been rumored for several weeks, a new, faster, implementation of PHP. To someone, like me, who loves dynamic languages and virtual machines this type of announcement is pretty exciting, after all, if they have some new techniques for optimizing dynamic languages they can almost certainly be ported to a language I care about. However, because of everything I've read (and learned about PHP, the language) since the announcement I'm not particularly excited about HipHop.

Firstly, there's the question of what problem HipHop solves. It aims to improve the CPU usage of PHP applications. For all practical purposes PHP exists exclusively to serve websites (yes, it can do other things, no one does them). Almost every single website on the internet is I/O bound, not CPU bound, web applications spend their time waiting on external resources (databases, memcache, other HTTP resources, etc.). So the part of me that develops websites professionally isn't super interested, Facebook is in the exceptionally rare circumstance that they've optimized their I/O to the point that optimizing CPU gives worthwhile returns. However, the part of me that spends his evenings contributing to Unladen Swallow and hanging around PyPy still thought that there might be some interesting VM technology to explore.

The next issue for consideration was the "VM" design Facebook choose. They've elected to compile PHP into C++, and then use a C++ compiler to get a binary out of it. This isn't a particularly new technique, in the Python world projects like Shedskin and Cython have exploited a similar technique to get good speed ups. However, Facebook also noted that in doing so they had dropped support for "some rarely used features — such as eval()". An important question is which features exactly, they had dropped support for. After all, the reason compiling a dynamic language to efficient machine code is difficult is because the dynamicism defeats the compiler's ability to optimize, but if you remove the dynamicism you remove the obstacles to efficient compilation. However, you're also not compiling the same language. PHP without eval(), and whatever else they've removed is quite simply a different language, for this reason I don't consider either Shedskin or Cython to be an implementation of Python, because they don't implement the entire language.

This afternoon, while I was idling in the Unladen Swallow IRC channel a discussion about HipHop came up, and I learned a few things about PHP I hadn't previous realized. The biggest of these is that a name bound to a function in PHP cannot be undefined, or redefined. If you've ever seen Collin Winter give a talk about Unladen Swallow, the canonical example of Python's dynamicism defeating a static compiler is the len() function. For lists, tuples, or dicts a call to len() should be able to be optimized to a single memory read out of a field on the object, plus a call to instantiate an integer object. However, in CPython today it's actually about 3 function calls, and 3 memory reads to get this data (plus the call to instantiate an integer object), plus the dictionary lookup in the global builtins to see what the object named len is. That's a hell of a lot more work than a single memory read (which is one instruction on an x86 CPU). The reason CPython needs to do all that work is that it a) doesn't know what the len object is, and b) when len is called it has no idea what its arguments will be.

As I've written about previously, Unladen Swallow has some creative ways to solve these problems, to avoid the dictionary lookups, and, eventually, to inline the body of the len() into the caller and optimize if for the types it's called with. However, this requires good runtime feedback, since the compiler simply cannot know statically what any of the objects will actually be at runtime. However, if len could be know to be the len() function at compile time Unladen Swallow could inline the body of the function, unconditionally, into the caller. Even with only static specialization for lists, dicts, and tuples like:

if isinstance(obj, list):
    return obj.ob_size
elif isinstance(obj, tuple):
    return obj.ob_size
elif isisinstance(obj, dict):
    return obj.ma_fill
else:
    return obj.__len__()

This would be quite a bit faster than the current amount of indirection. In PHP's case it's actually even easier, it only has one builtin array type, which acts as both a typical array as well as a hash table. Now extend this possibly optimization to not only every builtin, but every single function call. Instead of the dictionary lookups Python has to do for every global these can just become direct function calls.

Because of differences like this (and the fact that PHP only has machine sized integers, not arbitrary sized ones, and not implementing features of the language such as eval()) I believe that the work done on HipHop represents a fundamentally smaller challenge than that taken on by the teams working to improve the implementations of languages like Python, Ruby, or Javascript. At the time of the writing the HipHop source has not been released, however I am interested to see how they handle garbage collection.

You can find the rest here. There are view comments.

Things College Taught me that the "Real World" Didn't

Posted November 21st, 2009. Tagged with django, python, compile, ply, lex, yacc, c++, response, compiler, pypy, unladen-swallow, programming-languages, parse, college.

A while ago Eric Holscher blogged about things he didn't learn in college. I'm going to take a different spin on it, looking at both things that I did learn in school that I wouldn't have learned else where (henceforth defined as my job, or open source programming), as well as thinks I learned else where instead of at college.

Things I learned in college:

  • Big O notation, and algorithm analysis. This is the biggest one, I've had little cause to consider this in my open source or professional work, stuff is either fast or slow and that's usually enough. Learning rigorous algorithm analysis doesn't come up all the time, but every once in a while it pops up, and it's handy.
  • C++. I imagine that I eventually would have learned it myself, but my impetus to learn it was that's what was used for my CS2 class, so I started learning with the class then dove in head first. Left to my own devices I may very well have stayed in Python/Javascript land.
  • Finite automaton and push down automaton. I actually did lexing and parsing before I ever started looking at these in class (see my blog posts from a year ago) using PLY, however, this semester I've actually been learning about the implementation of these things (although sadly for class projects we've been using Lex/Yacc).

Things I learned in the real world:

  • Compilers. I've learned everything I know about compilers from reading my papers from my own interest and hanging around communities like Unladen Swallow and PyPy (and even contributing a little).
  • Scalability. Interesting this is a concept related to algorithm analysis/big O, however this is something I've really learned from talking about this stuff with guys like Mike Malone and Joe Stump.
  • APIs, Documentation. These are the core of software development (in my opinion), and I've definitely learned these skills in the open source world. You don't know what a good API or documentation is until it's been used by someone you've never met and it just works for them, and they can understand it perfectly. One of the few required, advanced courses at my school is titled, "Software Design and Documentation" and I'm deathly afraid it's going to waste my time with stuff like UML, instead of focusing on how to write APIs that people want to use and documentation that people want to read.

So these are my short lists. I've tried to highlight items that cross the boundaries between what people traditionally expect are topics for school and topics for the real world. I'd be curious to hear what other people's experience with topics like these are.</div>

You can find the rest here. There are view comments.

Writing a Lexer

Posted November 17th, 2009. Tagged with python, compile, lex, programming-languages, parse.

People who have been reading this blog since last year (good lord) may recall that once upon a time I did a short series of posts on lexing and parsing using PLY. Back then I was working on a language named Al. This past week or so I've started working on another personal project (not public yet) and I've once again had the need to lex things, but this time I wrote my lexer by hand, instead of using any sort of generator. This has been an exceptional learning experience, so I'd like to pass some of that on to you.

The first thing to note is that writing a lexer is a great place to TDD (test driven development), I've rewritten various parts of my lexer five or more times, I've needed my tests to keep me sane. Got your tests written? Ok it's time to dive right into our lexer.

I've structured my lexer as a single class that takes an input string, and has a parse method which returns a generator that yields tokens (tokens are just a namedtuple with a name and value field). The parser has two important attributes, state which is a string that says what state the lexer is in (this is used for tokens that are more than one character long), and current_val which is a list containing characters that will eventually become the value for the current token being found.

The parse method iterates through characters in the text and then it checks, if the parser has a state (self.state is not None) it does getattr(self, self.state)(character). Otherwise it calls self.generic(character). Then the various "state methods" are responsible for mutating self.current_val and self.state and returning a Token. So for example the string state looks like this:

def string(self, ch):
    if ch == '"':
        sym = Symbol("STRING", "".join(self.current_val))
        self.current_val = []
        self.state = None
        return sym
    elif ch == "\\":
        self.state = "string_escaped"
    else:
        self.current_val.append(ch)

If the character is a quote then we're closing our string so we return our string Symbol, reset the current_val and state. If the character is a then we switch into a string_escaped state which knows to handle the character as a literal and then go back to string state. If the character is anything else then we just append it to the current_val, it will get handled at the end of the string.

I've found this to be an exceptionally powerful method, and it makes my end result code very readable. Hopefully I'll be able to reveal my project in the coming weeks, as I'm very excited about it, even if it's not ready I'll continue to share these lessons learned as I go.

You can find the rest here. There are view comments.

Diving into Unladen Swallow's Optimizations

Posted November 3rd, 2009. Tagged with python, compile, internals, compiler, unladen-swallow.

Yesterday I described the general architecture of Unladen Swallow, and I said that just by switching to a JIT compiler and removing the interpretation overhead Unladen Swallow was able to get a performance gain. However, that gain is nowhere near what the engineers at Google are hoping to accomplish, and as such they've been working on building various optimizations into their JIT. Here I'm going to describe two particularly interesting ones they implemented during the 3rd quarter (they're doing quarterly releases).

Before diving into the optimizations themselves I should note there's one piece of the Unladen Swallow architecture I didn't discuss in yesterday's post. The nature of dynamic languages is that given code can do nearly anything depending on the types of the variables present, however in practice usually very few types are seen. Therefore it is necessary to collect information about the types seen in practice in order to perform optimizations. Therefore what Unladen Swallow has done is added data collection to the interpreter while it is executing bytecode. For example the BINARY_ADD opcode records the types of both of it's operands, the CALL_FUNCTION opcode records the function it is calling, and the UNPACK_SEQUENCE opcode records the type of the sequence it's unpacking. This data is then used when the function is compiled to generate optimal code for the most likely scenarios.

The first optimization I'm going to look at is one for the CALL_FUNCTION opcode. Python has a number of flags that functions defined in C can have, the two relevant to this optimization are METH_NOARGS and METHO_O. These flags indicate that the function (or method) in question take either 0 or 1 argument respectively (this is excluding the self argument on methods). Normally when Python calls a function it builds up a tuple of the arguments, and a dictionary for keyword arguments. For functions defined in Python CPython lines up the arguments with those the function takes and then sets them as local variables for the new function. For C functions they are given the tuple and dictionary directly and are responsible for parsing them themselves. By contrast functions with METH_NOARGS or METH_O receive their arguments (or nothing in the case of METH_NOARGS) directly.

Because calling METH_NOARGS and METH_O functions is so much easier than the alternate case (which involves several allocations and complex logic) when possible it is best to special case them in the generated assembly. Therefore, when compiling a CALL_FUNCTION opcode if using the data recorded there is only ever one function called (imagine a call to len, it is going to be the same len function every time), and that function is METH_NOARGS or METH_O then instead of generating a call to the usual function call machinery Unladen Swallow instead emits a check to make sure the function is actually the expected one and if it passes emits a call directly to the function with the correct arguments. If this guard fails then Unladen Swallow jumps back to the regular interpreter, leaving the optimized assembly. The reason for this is that the ultimately generated assembly can be more efficient when it only has to consider one best case scenario, as opposed to needing to deal with a large series of if else statements, which catalogue every single best case and the corresponding normal case. Ultimately, this results in more efficient code for calls to functions like len(), which are basically never redefined.

The next optimization we're going to look at is one for the LOAD_GLOBAL function. The LOAD_GLOBAL opcode is used for getting the value of a global variable, such as a builtin function, an imported class, or a global variable in the same module. In the interpreter the code for this opcode looks something like:

name = OPARG()
try:
    value = globals[name]
except KeyError:
    try:
        value = builtins[name]
    except KeyError:
        raise_exception(KeyError, name)
PUSH(value)

As you can see in the case of a builtin object (something like len, str, or dict) there are two dictionary lookups. While the Python dictionary is an exceptionally optimized datastructure it still isn't fast compared to a lookup of a local value (which is a single lookup in a C array). Therefore the goal of this optimization is to reduce the number of dictionary lookups needed to find the value for a global or builtin.

The way this was done was for code objects (the datastructures that hold the opcodes and various other internal details for functions) to register themselves with the globals and builtin dictionaries. By registering themselves the dictionaries are able to notify the code objects (similar to Django signals) whenever they are modified. The result of this is that the generated assembly for a LOAD_GLOBAL can perform the dictionary lookup once at compilation time and then the resulting assembly will be valid until the globals or builtins dictionary notifies the code object that they have been modified, thus rendering the assembly invalid. In practice this is very efficient because globals and builtins are very rarely modified.

Hopefully you've gotten a sense of the type of work that the people behind Unladen Swallow are doing. If you're interested in reading more on this type of work I'd highly recommend taking a look at the literature listed on the Unladen Swallow wiki, as they note that there is no attempt to do any original research, all the work being done is simply the application of existing, proven techniques to the CPython interpreter.

For the rest of this month I'm going to try to give a preview of the next day's post with each post, that way I can start thinking about it well in advance. Tomorrow I'm going to shift gears a little bit and write about the ManyToManyField refactoring I did over the summer and which was just committed to Django.

You can find the rest here. There are view comments.

Introduction to Unladen Swallow

Posted November 2nd, 2009. Tagged with python, compile, internals, compiler, unladen-swallow.

Unless you've been living under a rock for the past year (or have zero interest in either Python or dynamic languages, it which case why are you here?) you've probably heard of Unladen Swallow. Unladen Swallow is a Google funded branch of the CPython interpreter, with a goal of making CPython significantly faster while retaining both API and ABI compatibility. In this post I'm going to try to explain what it is Unladen Swallow is doing to bring a new burst of speed to the Python world.

In terms of virtual machines there are a few levels of complexity, which roughly correspond to their speed. The simplest type of interpreter is an AST evaluator, these are more or less the lowest of the low on the speed totem pole, up until YARV was merged into the main Ruby interpreter, MRI (Matz Ruby Interpreter) was this type of virtual machine. The next level of VM is a bytecode interpreter, this means that the language is compiled to an intermediary format (bytecode) which is then executed. Strictly speaking this is an exceptionally broad category which encompasses most virtual machines today, however for the purposes of this article I'm going to exclude any VMs with a Just-In-Time compiler from this section (more on them later). The current CPython VM is this type of interpreter. The most complex (and fastest) type of virtual machine is one with a Just-In-Time compiler, this means that the bytecode that the virtual machine interprets is also dynamically compiled into assembly and executed. This type of VM includes modern Javascript interpreters such as V8, Tracemonkey, and Squirellfish, as well as other VMs like the Hotspot Java virtual machine.

Now that we know where CPython is, and what the top of the totem pole looks like it's probably clear what Unladen Swallow is looking to accomplish, however there is a bit of prior art here that's worthy of taking a look. There is actually currently a JIT for CPython, named Psyco. Psyco is pretty commonly used to speed up numerical code, as that's what it's best at, but it can speed up most of the Python language. However, Psyco is extremely difficult to maintain and update. It only recently gained support for modern Python language features like generators, and it still only supports x86 CPUs. For these reasons the developers at Google chose to build their JIT rather than work to improve the existing solution (they also chose not to use one of the alternative Python VMs, I'll be discussing these in another post).

I just said that Unladen Swallow looked to build their own JIT, but that's not entirely true. The developers have chosen not to develop their own JIT (meaning their own assembly generator, and register allocator, and optimizer, and everything else that goes along with a JIT), they have instead chosen to utilize the LLVM (Low Level Virtual Machine) JIT for all the code generation. What this means is that instead of doing all the work I've alluded the devs can instead translate the CPython bytecode into LLVM IR (intermediate representation) and then use LLVM's existing JIT infrastructure to do some of the heavy lifting. This gives the devs more time to focus on the interesting work of how to optimize the Python language.

Now that I've layed out the background I'm going to dive into what exactly it is that Unladen Swallow does. Right now the CPython virtual machine looks something like this:

for opcode in opcodes:
    if opcode == BINARY_ADD:
        x, y = POP(), POP()
        z = x + y
        PUSH(z)
    elif opcode == JUMP_ABSOLUTE:
        pc = OPARG()
    # ...

This is both hugely simplified and translated into a Pythonesque psuedocode, but hopefully it makes the point clear, right now the CPython VM runs through the opcodes and based on what the opcode is executes some C code. This is particularly inefficient because there is a fairly substantial overhead to actually doing the dispatch on the opcode. What Unladen Swallow does is count the number of times a given Python function is called (the heuristic is actually slightly more complicated than this, but it's a good approximation of what happens), and when it reaches 10000 (the same value the JVM uses) it stops to compile the function using LLVM. Here what it does is essentially unrolls the interpreter loop, into the LLVM IR. So if you had the bytecode:

BINARY_ADD

Unladen Swallow would generate code like:

x, y = POP(), POP()
z = x + y
PUSH(z)

This eliminates all of the overhead of the large loop in the interpreter. Unladen Swallow also performs a number of optimizations based on Python's semantics, but I'll be getting into those in another post, for now LLVM run it's optimizers, which can improve the generated code somewhat, and then CPython executes the generated function. Now whenever this function is called in the future the optimized, assembly version of it is called.

This concludes the introduction to Unladen Swallow. Hopefully you've learned something about the CPython VM, Unladen Swallow, or virtual machines in general. In future posts I'm going to be diving in to some of the optimizations Unladen Swallow does, as well as what other players are doing in this space (particularly PyPy).

You can find the rest here. There are view comments.

Optimizing a View

Posted January 19th, 2009. Tagged with django, orm, models, python, compile.

Lately I've been playing with a bit of a fun side project. I have about a year and half worth of my own chatlogs with friends(and 65,000 messages total) and I've been playing around with them to find interesting statistics. One facet of my communication with my friends is that we link each other lots of things, and we can always tell when someone is linking something that we've already seen. So I decided an interesting bit of information would be to see who is the worst offender.

So we want to write a function that returns the number of items each person has relinked, excluding items they themselves linked. So I started off with the most simple implementation I could, and this was the end result:

from collections import defaultdict
from operator import itemgetter

from django.utils.html import word_split_re

from logger.models import Message

def calculate_relinks():
    """
    Calculate the number of times each individual has linked something that was
    linked previously in the course of the chat.
    """
    links = defaultdict(int)
    for message in Message.objects.all().order_by('-time').iterator():
        words = word_split_re.split(message.message)
        for word in words:
            if word.startswith('http'):
                if Message.objects.filter(time__lt=message.time).filter(message__contains=word).exclude(speaker=message.speaker).count():
                    links[message.speaker] += 1
    links = sorted(links.iteritems(), key=itemgetter(1), reverse=True)
    return links

Here I iterated over the messages and for each one I went through each of the words and if any of them started with http(the definition of a link for my purposes) I checked to see if this had ever been linked before by someone other than the author of the current message.

This took about 4 minutes to execute on my dataset, it also executed about 10,000 SQL queries. This is clearly unacceptable, you can't have a view that takes that long to render, or hits your DB that hard. Even with aggressive caching this would have been unmaintainable. Further this algorithm is O(n**2) or thereabouts so as my dataset grows this would have gotten worse exponentially.

By changing this around however I was able to achieve far better results:

from collections import defaultdict
from operator import itemgetter

from django.utils.html import word_split_re

from logger.models import Message

def calculate_relinks():
    """
    Calculate the number of times each individual has linked something that was
    linked previously in the course of the chat.
    """
    links = defaultdict(set)
    counts = defaultdict(int)
    for message in Message.objects.all().filter(message__contains="http").order_by('time').iterator():
        words = word_split_re.split(message.message)
        for word in words:
            if word.startswith('http'):
                if any([word in links[speaker] for speaker in links if speaker != message.speaker]):
                    counts[message.speaker] += 1
                links[message.speaker].add(word)
    counts = sorted(counts.iteritems(), key=itemgetter(1), reverse=True)
    return counts

Here what I do is go through each of the messages which contain the string "http"(this is already a huge advantage since that means we process about 1/6 of the messages in Python that we originally were), for each message we go through each of the words in it, and for each that is a link we check if any other person has said it by looking in the caches we maintain in Python, and if they do we increment their count, finally we add the link to that persons cache.

By comparison this executes in .3 seconds, executes only 1 SQL query, and it will scale linearly(as well as is possible). For reference both of these functions are compiled using Cython. This ultimately takes almost no work to do and for computationally heavy operations this can provide a huge boon.

You can find the rest here. There are view comments.

A quick update

Posted November 23rd, 2008. Tagged with python, compile, al, c++.

I've now set up Al to be using GMP for all integers, and I'll be doing the same for floats once they get implemented. I haven't started benchmarking yet, but it can compile and calculate the factorial of 50000 pretty quickly, and in Python vanilla that would result in a RuntimeError due to a stack overflow, so it's a good starting point. Sorry for such a short post, I'm pretty tired today.

You can find the rest here. There are view comments.

My Programming Language - Status Update

Posted November 21st, 2008. Tagged with python, compile, al, c++.

Over the past few weeks I've been working on compiling my programming language. At present it works by translating the source into C++, and then you compile that with your compiler of choice. It's garbage collected, using the excellent Boehm GC library. At present it can only compile a limited subset of what it can actually parse, or what the interpreter supports. As of today thought it can compile and run a factorial function, however it can't calculate any factorial greater than 12, due to integer overflow issues. To solve this I'm either going to use GMP or roll my own Bignum library, and I'm not sure which yet. On the whole though, progress is good. The generated C++ is about as good as it could be considering the limitations inherent in turning an interpreted language into a compiled one. I haven't started benchmarking it yet, that was originally going to be the point of today's post before I ran into the integer overflow issues, however this is an example of the C++ code that is generated.

Give this Al(also valid Python):

def fact(n):
   if n == 1 or n == 0:
       return 1
   return n * fact(n-1)

print(fact(1))
print(fact(12))

It generated the following C++:

#include "src/base.h"

AlObj *fact;
class f0:public AlFunction
{
public:
 virtual AlObj * operator () (ARG_TYPE args, KWARG_TYPE kwargs)
 {
   AlObj *n = args.back ();
     args.pop_back ();
   if (*
 ((*((*(n)) == (AlObj *) (new AlInt (1))))
  || (*(n)) == (AlObj *) (new AlInt (0))))
     {
 return (AlObj *) (new AlInt (1));;
     }
   ARG_TYPE t0;
   t0.push_back ((*(n)) - (AlObj *) (new AlInt (1)));
   return (*(n)) * (*fact) (t0, KWARG_TYPE ());
 }
};

int
main ()
{
 fact = new f0 ();
 ARG_TYPE t1;
 ARG_TYPE t2;
 t2.push_back ((AlObj *) (new AlInt (1)));
 t1.push_back ((*fact) (t2, KWARG_TYPE ()));
 (*print) (t1, KWARG_TYPE ());
 ARG_TYPE t3;
 ARG_TYPE t4;
 t4.push_back ((AlObj *) (new AlInt (12)));
 t3.push_back ((*fact) (t4, KWARG_TYPE ()));
 (*print) (t3, KWARG_TYPE ());
}

All said and done, I'm pretty impressed! You can get all the code here, all the compilation work is in the code-generation branch.

You can find the rest here. There are view comments.

Building a Programming Language with Python

Posted November 6th, 2008. Tagged with python, compile, ply.

One of my side projects of late has been building a programming language in Python, using the PLY library. PLY is essentially a Python implementation of the classic Lex and Yacc tools. The language, at present, has a syntax almost exactly the same as Python's (the notable difference, in so far as features that have been implemented, is that you are not allowed to have multiple statements on the same line, or to put anything following a colon on the same line). The language(currently called 'Al' although that's more of a working name), is a dynamic language that builds up an syntax tree for the code, and than executes it. However, the long term goal is to have it actually be a compiled language, similar to Lisp or C. Essentially the mechanism for doing this will be the same as how a C++ compiler handles multiple dispatch, which is dynamically at run time.

At present however, this mythical fully compiled language is far from complete, I haven't even began to think about the assembly generation, mostly because I don't know assembly at all, and one of the courses I will be taking next semester is one which covers assembly code. However, the question that has to be asked, are what are the advantages of a compiled language, and what are the costs?

First the benefits:

  • It's faster, even a worst case C++ program that fully utilizes multiple dispatch at runtime will go faster than a program using the same algorithms in Python.
  • You get an executable at the end. This is a huge advantage for distribution, you don't need to distribute the source code, and you have an exe to give to people.

There are probably others, but I'm assuming the semantics of a language similar to Python, so I haven't included things like compile time type checking. And now the disadvantages:

  • You lose some of the dynamicism. Doing things like eval(), or dynamic imports is inherently harder, if not impossible.
  • You lose the REPL (interactive interpreter).

So can we overcome those? As far as I can tell the first should be doable, eval() necessitates the inclusion of an interpreter with the language, the thought of this already has to be making people think this is just going to end up as a VM. But, I think this can be overcome, we can know, at compile time, whether or not a user will be using eval, and decide then whether or not to compile the interpreter and link against it. Dynamic imports are, if anything harder, I think, I think this is just an issue of doing run time linking, but I'm not sure. As for the issue of the REPL, this is a non-issue as far as I'm concerned, there is no inherent reason a compiled language can't have a REPL, we just often don't, languages like Common Lisp have long had both.

So now, let's see some code. I hope to have some code to show off, that handles at least a subset of Python, for PyCon 2009, as work begins on assembly generation I will post here. For anyone interested in the code at present, you can see it here.

You can find the rest here. There are view comments.