Everyone knows languages don't have speeds, implementations do. Python isn't slow; CPython is. Javascript isn't fast; V8, Squirrelfish, and Tracemonkey are. But what if a language was designed in such a way that it appeared that it was impossible to be implemented efficiently? Would it be fair to say that language is slow, or would we still have to speak in terms of implementations? For a long time I followed the conventional wisdom, that languages didn't have speeds, but lately I've come to believe that we can learn something by thinking about what the limits on how fast a language could possibly be, given a perfect implementation.
For example consider the following Python function:
def f(n):
i = 0
while i < n:
i += 1
n += i
return n
And the equivilant C function:
int f(int n) {
int i = 0;
while (i < n) {
i += 1;
n += i;
}
return n;
}
CPython probably runs this code 100 times slower than the GCC compiled version of the C code. But we all know CPython is slow right? PyPy or Psyco probably runs this code 2.5 times slower than the C version (I'm just spitballing here). Psyco and PyPy are, and contain, really good just in time compilers that can profile this code, see that f is always called with an integer, and therefore a much more optimized version can be generated in assembly. For example the optimized version could generate just a few add instructions in the inner loop (plus a few more instructions to check for overflow), this would skip all the indirection of calling the __add__ function on integers, allocating the result on the heap, and the indirection of calling the __lt__ function on integers, and maybe even some other things I missed.
But there's one thing no JIT for Python can do, no matter how brilliant. It can't skip the check if n is an integer, because it can't prove it always will be an integer, someone else could import this function and call it with strings, or some custom type, or anything they felt like, so the JIT must verify that n is an integer before running the optimized version. C doesn't have to do that. It knows that n will always be an integer, even if you do nothing but call f until the end of the earth, GCC can be 100% positive that n is an integer.
The absolute need for at least a few guards in the resulting assembly guarantees that even the most perfect JIT compiler ever, could not generate code that was strictly faster than the GCC version. Of course, a JIT has some advantages over a static compiler, for example, it can inline at dynamic call sites. However, in practice I don't believe this ability is ever likely to beat a static compiler for a real world program. On the other hand I'm not going to stop using Python any time soon, and it's going to continue to get faster, a lot faster.
You can find the rest here. There are view comments.
This morning Facebook announced, as had been rumored for several weeks, a new, faster, implementation of PHP. To someone, like me, who loves dynamic languages and virtual machines this type of announcement is pretty exciting, after all, if they have some new techniques for optimizing dynamic languages they can almost certainly be ported to a language I care about. However, because of everything I've read (and learned about PHP, the language) since the announcement I'm not particularly excited about HipHop.
Firstly, there's the question of what problem HipHop solves. It aims to improve the CPU usage of PHP applications. For all practical purposes PHP exists exclusively to serve websites (yes, it can do other things, no one does them). Almost every single website on the internet is I/O bound, not CPU bound, web applications spend their time waiting on external resources (databases, memcache, other HTTP resources, etc.). So the part of me that develops websites professionally isn't super interested, Facebook is in the exceptionally rare circumstance that they've optimized their I/O to the point that optimizing CPU gives worthwhile returns. However, the part of me that spends his evenings contributing to Unladen Swallow and hanging around PyPy still thought that there might be some interesting VM technology to explore.
The next issue for consideration was the "VM" design Facebook choose. They've elected to compile PHP into C++, and then use a C++ compiler to get a binary out of it. This isn't a particularly new technique, in the Python world projects like Shedskin and Cython have exploited a similar technique to get good speed ups. However, Facebook also noted that in doing so they had dropped support for "some rarely used features — such as eval()". An important question is which features exactly, they had dropped support for. After all, the reason compiling a dynamic language to efficient machine code is difficult is because the dynamicism defeats the compiler's ability to optimize, but if you remove the dynamicism you remove the obstacles to efficient compilation. However, you're also not compiling the same language. PHP without eval(), and whatever else they've removed is quite simply a different language, for this reason I don't consider either Shedskin or Cython to be an implementation of Python, because they don't implement the entire language.
This afternoon, while I was idling in the Unladen Swallow IRC channel a discussion about HipHop came up, and I learned a few things about PHP I hadn't previous realized. The biggest of these is that a name bound to a function in PHP cannot be undefined, or redefined. If you've ever seen Collin Winter give a talk about Unladen Swallow, the canonical example of Python's dynamicism defeating a static compiler is the len() function. For lists, tuples, or dicts a call to len() should be able to be optimized to a single memory read out of a field on the object, plus a call to instantiate an integer object. However, in CPython today it's actually about 3 function calls, and 3 memory reads to get this data (plus the call to instantiate an integer object), plus the dictionary lookup in the global builtins to see what the object named len is. That's a hell of a lot more work than a single memory read (which is one instruction on an x86 CPU). The reason CPython needs to do all that work is that it a) doesn't know what the len object is, and b) when len is called it has no idea what its arguments will be.
As I've written about previously, Unladen Swallow has some creative ways to solve these problems, to avoid the dictionary lookups, and, eventually, to inline the body of the len() into the caller and optimize if for the types it's called with. However, this requires good runtime feedback, since the compiler simply cannot know statically what any of the objects will actually be at runtime. However, if len could be know to be the len() function at compile time Unladen Swallow could inline the body of the function, unconditionally, into the caller. Even with only static specialization for lists, dicts, and tuples like:
if isinstance(obj, list):
return obj.ob_size
elif isinstance(obj, tuple):
return obj.ob_size
elif isisinstance(obj, dict):
return obj.ma_fill
else:
return obj.__len__()
This would be quite a bit faster than the current amount of indirection. In PHP's case it's actually even easier, it only has one builtin array type, which acts as both a typical array as well as a hash table. Now extend this possibly optimization to not only every builtin, but every single function call. Instead of the dictionary lookups Python has to do for every global these can just become direct function calls.
Because of differences like this (and the fact that PHP only has machine sized integers, not arbitrary sized ones, and not implementing features of the language such as eval()) I believe that the work done on HipHop represents a fundamentally smaller challenge than that taken on by the teams working to improve the implementations of languages like Python, Ruby, or Javascript. At the time of the writing the HipHop source has not been released, however I am interested to see how they handle garbage collection.
You can find the rest here. There are view comments.
PyPy recently posted some interesting benchmarks from the computer language shootout, and in my last post about Unladen Swallow I described a patch that would hopefully be landing soon. I decided it would be interesting to benchmarks something with this patch. For this I used James Tauber's Mandelbulb application, at both 100x100 and 200x200. I tested CPython, Unladen Swallow Trunk, Unladen Swallow Trunk with the patch, and a recent PyPy trunk (compiled with the JIT). My results were as follows:
| VM | 100 | 200 |
|---|---|---|
| CPython 2.6.4 | 17s | 64s |
| Unladen Swallow Trunk | 16s | 52s |
| Unladen swallow Trunk + Patch | 13s | 49s |
| PyPy Trunk | 10s | 46s |
Interesting results. At 100x100 PyPy smokes everything else, and the patch shows a clear benefit for Unladen. However, at 200x200 both PyPy and the patch show diminishing returns. I'm not clear on why this is, but my guess is that something about the increased size causes a change in the parameters that makes the generated code less efficient for some reason.
It's important to note that Unladen Swallow has been far less focussed on numeric benchmarks than PyPy, instead focusing on more web app concerns (like template languages). I plan to benchmark some of these as time goes on, particularly after PyPy merges their "faster-raise" branch, which I'm told improves PyPy's performance on Django's template language dramatically.
You can find the rest here. There are view comments.
A while ago Eric Holscher blogged about things he didn't learn in college. I'm going to take a different spin on it, looking at both things that I did learn in school that I wouldn't have learned else where (henceforth defined as my job, or open source programming), as well as thinks I learned else where instead of at college.
- Big O notation, and algorithm analysis. This is the biggest one, I've had little cause to consider this in my open source or professional work, stuff is either fast or slow and that's usually enough. Learning rigorous algorithm analysis doesn't come up all the time, but every once in a while it pops up, and it's handy.
- C++. I imagine that I eventually would have learned it myself, but my impetus to learn it was that's what was used for my CS2 class, so I started learning with the class then dove in head first. Left to my own devices I may very well have stayed in Python/Javascript land.
- Finite automaton and push down automaton. I actually did lexing and parsing before I ever started looking at these in class (see my blog posts from a year ago) using PLY, however, this semester I've actually been learning about the implementation of these things (although sadly for class projects we've been using Lex/Yacc).
- Compilers. I've learned everything I know about compilers from reading my papers from my own interest and hanging around communities like Unladen Swallow and PyPy (and even contributing a little).
- Scalability. Interesting this is a concept related to algorithm analysis/big O, however this is something I've really learned from talking about this stuff with guys like Mike Malone and Joe Stump.
- APIs, Documentation. These are the core of software development (in my opinion), and I've definitely learned these skills in the open source world. You don't know what a good API or documentation is until it's been used by someone you've never met and it just works for them, and they can understand it perfectly. One of the few required, advanced courses at my school is titled, "Software Design and Documentation" and I'm deathly afraid it's going to waste my time with stuff like UML, instead of focusing on how to write APIs that people want to use and documentation that people want to read.
So these are my short lists. I've tried to highlight items that cross the boundaries between what people traditionally expect are topics for school and topics for the real world. I'd be curious to hear what other people's experience with topics like these are.</div>
Today a patch of mine was committed to Unladen Swallow. In the past weeks I've described some of the optimizations that have gone into Unladen Swallow, in specific I looked at removing the allocation of an argument tuple for C functions. One of the "on the horizon" things I mentioned was extending this to functions with a variable arity (that is the number of arguments they take can change). This has been implemented for functions that take a finite range of argument numbers (that is, they don't take *args, they just have a few arguments with defaults). This support was used to optimize a number of builtin functions (dict.get, list.pop, getattr for example).
However, there were still a number of functions that weren't updated for this support. I initially started porting any functions I saw, but it wasn't a totally mechanical translation so I decided to do a little profiling to better direct my efforts. I started by using the cProfile module to see what functions were called most frequently in Unladen Swallow's Django template benchmark. Imagine my surprise when I saw that unicode.encode was called over 300,000 times! A quick look at that function showed that it was a perfect contender for this optimization, it was currently designated as a METH_VARARGS, but in fact it's argument count was a finite range. After about of dozen lines of code, to change the argument parsing, I ran the benchmark again, comparing it a control version of Unladen Swallow, and it showed a consistent 3-6% speedup on the Django benchmark. Not bad for 30 minutes of work.
Another optimization I want to look at, which hasn't landed yet, is one of optimize various operations. Right now Unladen Swallow tracks various data about the types seen in the interpreter loop, however for various operators this data isn't actually used. What this patch does is check at JIT compilation time whether the operator site is monomorphic (that is there is only one pair of types ever seen there), and if it is, and it is one of a few pairings that we have optimizations for (int + int, list[int], float - float for example) then optimized code is emitted. This optimized code checks the types of both the arguments that they are the expected ones, if they are then the optimized code is executed, otherwise the VM bails back to the interpreter (various literature has shown that a single compiled optimized path is better than compiling both the fast and slow paths). For simple algorithm code this optimization can show huge improvements.
The PyPy project has recently blogged about the results of the results of some benchmarks from the Computer Language Shootout run on PyPy, Unladen Swallow, and CPython. In these benchmarks Unladen Swallow showed that for highly algorithmic code (read: mathy) it could use some work, hopefully patches like this can help improve the situation markedly. Once this patch lands I'm going to rerun these benchmarks to see how Unladen Swallow improves, I'm also going to add in some of the more macro benchmarks Unladen Swallow uses to see how it compares with PyPy in those. Either way, seeing the tremendous improvements PyPy and Unladen Swallow have over CPython gives me tremendous hope for the future.
You can find the rest here. There are view comments.
People who have been reading this blog since last year (good lord) may recall that once upon a time I did a short series of posts on lexing and parsing using PLY. Back then I was working on a language named Al. This past week or so I've started working on another personal project (not public yet) and I've once again had the need to lex things, but this time I wrote my lexer by hand, instead of using any sort of generator. This has been an exceptional learning experience, so I'd like to pass some of that on to you.
The first thing to note is that writing a lexer is a great place to TDD (test driven development), I've rewritten various parts of my lexer five or more times, I've needed my tests to keep me sane. Got your tests written? Ok it's time to dive right into our lexer.
I've structured my lexer as a single class that takes an input string, and has a parse method which returns a generator that yields tokens (tokens are just a namedtuple with a name and value field). The parser has two important attributes, state which is a string that says what state the lexer is in (this is used for tokens that are more than one character long), and current_val which is a list containing characters that will eventually become the value for the current token being found.
The parse method iterates through characters in the text and then it checks, if the parser has a state (self.state is not None) it does getattr(self, self.state)(character). Otherwise it calls self.generic(character). Then the various "state methods" are responsible for mutating self.current_val and self.state and returning a Token. So for example the string state looks like this:
def string(self, ch):
if ch == '"':
sym = Symbol("STRING", "".join(self.current_val))
self.current_val = []
self.state = None
return sym
elif ch == "\\":
self.state = "string_escaped"
else:
self.current_val.append(ch)
If the character is a quote then we're closing our string so we return our string Symbol, reset the current_val and state. If the character is a then we switch into a string_escaped state which knows to handle the character as a literal and then go back to string state. If the character is anything else then we just append it to the current_val, it will get handled at the end of the string.
I've found this to be an exceptionally powerful method, and it makes my end result code very readable. Hopefully I'll be able to reveal my project in the coming weeks, as I'm very excited about it, even if it's not ready I'll continue to share these lessons learned as I go.
You can find the rest here. There are view comments.
Yesterday I wrote about why I wasn't very interested in Go. Two of my three major complaints were about the syntax of Go, and based on the comments I got here and on Hacker News a lot of people didn't seem to mind the syntax, or at least didn't think it was worth talking about. However, the opposite is true, for me the syntax is among the single most important things about a programming language.
I'd estimate that I spend about 60% of my day thinking about and reading code and 40% actually writing code. This means that code needs to be easy to read, that means no stray punctuation or anything else that distracts me from what I want to see in my code: what does it do when I run it. This means any code I'm looking at better be properly indented. It also means that I find braces and semicolons to be noise, stuff that just distracts me from what I'm reading the code to do. Therefore, code ought to use the existing, nonintrusive, structure, instead of obligating me to add more noise.
"Programs must be written for people to read, and only incidentally for machines to execute." This is a quote from Structure and Interpretation of Computer Programs, by Harold Abelson and Gerald Sussman. It has always struck me as odd that the people who wrote that chose to use Scheme for their text book. In my view Lisp and Scheme are the height of writing for a machine to execute. I think David Heinemeier Hansson got it right when he said, "code should be beautiful", I spent 5+ hours a day reading it, I damned well better want to look at it.
You can find the rest here. There are view comments.
When I first heard Rob Pike and Ken Thompson had created a new programming language my first instinct is, "I'm sure it'll be cool, these are brilliant guys!" Unfortunately that is not the case, mostly because I think they made some bad design choices, and because the "cool new things" aren't actually that new, or cool.
The first major mistake was using a C derived syntax: braces and semicolons. I know some people don't like significant whitespace, but if you aren't properly indenting your code it's already beyond the point of hopelessness. The parser ought to use the existing structure instead of adding more punctuation. Readability is one of the most important attributes of code, and this syntax detracts from that.
The next mistake is having a separate declaration and assignment operator. I understand that the point of this operator is to reduce the repetition of typing out the types name both in declaring the variable and in initializing the value. Yet it seems the purpose of the := operator is to avoid typos in variable names that are possible by making all assignment an implicit declaration if the variable wasn't already declared. I can see myself making many more typos by forgetting to use the := operator, and in cases where I make a typo in a variable name it would inevitably be caught by the compiler when I attempted to actually use it (the fact that this is an attempted declaration means the variable won't have been declared elsewhere).
The final mistake was not providing generics. C++'s templates is one of the things that make the language head and shoulders more useful for me than C for tasks I need to preform; generics allow one to provide reusable data structures. While Go seems to have something akin to generics with their map data structure, it disappointingly doesn't appear to be exposed in any way to user code. One of the things I've found makes me most productive in Python is that any time I need to perform a task I simply pick the data structure that does what I want and it is efficiently implemented. Without generics, I don't see a way for a statically typed language to offer this same breadth of data structures without each of them being a special case.
In terms of features which I believe are overhyped the most important one is the "goroutine". As best I can tell these are an implementation of fibers. Constructs like concurrency should not be granted their own syntax, especially when they can be cleanly implemented using the other constructs of a language, look at the C library libtask as an example.
Further, the handling of interfaces, though interesting, appears to be an implementation of C++0x's proposed concepts (which won't be in the final standard). I view this feature as something that is most useful in the context of generics, which Go doesn't have. The reason for this is to be able to make compile time assertions about the types that are being templated over. Doing anything else is more clearly implemented as abstract inheritance.
I'm not writing off Go permanently, but I'm not enthused about it, someone will probably have to let me know when I need to look again as I won't be following along. Enjoy your internet travels.
You can find the rest here. There are view comments.