This morning Facebook announced, as had been rumored for several weeks, a new, faster implementation of PHP. To someone like me, who loves dynamic languages and virtual machines, this type of announcement is pretty exciting; after all, if they have some new techniques for optimizing dynamic languages, they can almost certainly be ported to a language I care about. However, because of everything I've read (and learned about PHP, the language) since the announcement, I'm not particularly excited about HipHop.
Firstly, there's the question of what problem HipHop solves. It aims to improve the CPU usage of PHP applications. For all practical purposes PHP exists exclusively to serve websites (yes, it can do other things; no one does them). Almost every single website on the internet is I/O bound, not CPU bound: web applications spend their time waiting on external resources (databases, memcache, other HTTP resources, etc.). So the part of me that develops websites professionally isn't super interested; Facebook is in the exceptionally rare circumstance that they've optimized their I/O to the point that optimizing CPU gives worthwhile returns. However, the part of me that spends his evenings contributing to Unladen Swallow and hanging around PyPy still thought that there might be some interesting VM technology to explore.
The next issue for consideration was the "VM" design Facebook chose. They've elected to compile PHP into C++, and then use a C++ compiler to get a binary out of it. This isn't a particularly new technique; in the Python world, projects like Shedskin and Cython have exploited a similar technique to get good speedups. However, Facebook also noted that in doing so they had dropped support for "some rarely used features - such as eval()". An important question is exactly which features they had dropped support for. After all, the reason compiling a dynamic language to efficient machine code is difficult is that the dynamicism defeats the compiler's ability to optimize, but if you remove the dynamicism you remove the obstacles to efficient compilation. However, you're also not compiling the same language. PHP without eval(), and whatever else they've removed, is quite simply a different language. For this reason I don't consider either Shedskin or Cython to be an implementation of Python: they don't implement the entire language.
This afternoon, while I was idling in the Unladen Swallow IRC channel, a discussion about HipHop came up, and I learned a few things about PHP I hadn't previously realized. The biggest of these is that a name bound to a function in PHP cannot be undefined or redefined. If you've ever seen Collin Winter give a talk about Unladen Swallow, the canonical example of Python's dynamicism defeating a static compiler is the len() function. For lists, tuples, or dicts a call to len() should be able to be optimized to a single memory read out of a field on the object, plus a call to instantiate an integer object. However, in CPython today it's actually about 3 function calls and 3 memory reads to get this data (plus the call to instantiate an integer object), plus the dictionary lookup in the global builtins to see what the object named len is. That's a hell of a lot more work than a single memory read (which is one instruction on an x86 CPU). The reason CPython needs to do all that work is that it a) doesn't know what the len object is, and b) when len is called it has no idea what its arguments will be.
As I've written about previously, Unladen Swallow has some creative ways to solve these problems: to avoid the dictionary lookups and, eventually, to inline the body of len() into the caller and optimize it for the types it's called with. However, this requires good runtime feedback, since the compiler simply cannot know statically what any of the objects will actually be at runtime. If len could be known to be the len() function at compile time, though, Unladen Swallow could inline the body of the function, unconditionally, into the caller. Even with only static specialization for lists, dicts, and tuples like:
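    # Pseudocode: ob_size and ma_used are fields on CPython's C structs,
    # not attributes you can read from Python code.
    def len(obj):
        if isinstance(obj, list):
            return obj.ob_size
        elif isinstance(obj, tuple):
            return obj.ob_size
        elif isinstance(obj, dict):
            return obj.ma_used
        else:
            return obj.__len__()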
This would be quite a bit faster than the current amount of indirection. In PHP's case it's actually even easier: it only has one builtin array type, which acts as both a typical array and a hash table. Now extend this possible optimization to not only every builtin, but every single function call. Instead of the dictionary lookups Python has to do for every global, these can just become direct function calls.
Because of differences like this (and the fact that PHP only has machine sized integers, not arbitrary sized ones, and that HipHop doesn't implement features of the language such as eval()) I believe that the work done on HipHop represents a fundamentally smaller challenge than that taken on by the teams working to improve the implementations of languages like Python, Ruby, or Javascript. At the time of writing the HipHop source has not been released; however, I am interested to see how they handle garbage collection.
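To make the len() problem concrete, here is a minimal sketch of why a compiler can't assume what the name len refers to. It is my own illustration, not code from Unladen Swallow, and it assumes Python 3, where the builtins live in the `builtins` module (Python 2 calls it `__builtin__`). The `count_items` helper is a made-up name.

```python
import builtins


def count_items(seq):
    # `len` is resolved by name on every call: module globals first,
    # then the builtins namespace.
    return len(seq)


original_len = builtins.len
before = count_items([1, 2, 3])

# Any code, anywhere, may rebind the builtin while the program runs.
builtins.len = lambda obj: 42
try:
    during = count_items([1, 2, 3])
finally:
    # Always put the real builtin back.
    builtins.len = original_len

after = count_items([1, 2, 3])
print(before, during, after)
```

The same call site gives different answers depending on what the name happens to point to at that moment, which is exactly what a static compiler can't see. If function names can't be rebound, as described above for PHP, this whole class of problem goes away.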
You can find the rest here. There are view comments.
Open source works for a lot of reasons, but there is one that stands out. Open source is basically an application of democracy to a programming community; in fact, it's the most perfect implementation of democracy yet.
The central idea of a democracy is that the governing party draws its authority from the willing consent of the governed. In most modern democracies this is more or less true, but not exactly. In the real world it's often very difficult for a citizen to withdraw their consent to be governed: they have to wait until elections to formally enact change, violent rebellion basically doesn't exist in first world countries, and there are many barriers (economic and otherwise) to just picking up and leaving. Because of the difficulty in withdrawing consent, modern democracies do not (and probably cannot) perfectly embody this spirit.
But open source communities can. In the open source world forking is often considered to be a nuclear option (ignoring the use of the term in the DVCS sense of forking for collaboration), a tactic designed to fragment a community. But it's also the perfect equalizer. Authority in open source communities is derived from the community's willingness to stay under the leadership of that authority. At any point any member of the community can decide to exit the community: to use a different piece of software, or to fork it and continue development however they please. Because there are practical options for withdrawing consent to the governance, open source is probably the most perfect application of democracy.
I read a blog post the other day titled "I Have No Talent", and I found it to be pretty interesting, and what the author says probably resonates with a lot of people. But not with me. I disagree with the article on a basic premise: that perseverance is less of an innate ability than talent.
If you read the post in question, the author argues that he doesn't have any innate ability to program, or use Ruby, but he does have a good work ethic, and this makes it possible for him to still get stuff done, as long as he's still willing to put in the time. Perseverance and the willingness to work hard are just as much a talent as the ability to program, and they're probably more innate than almost anything else. You can create a good work ethic for yourself by setting patterns, or giving yourself incentives, but the willingness to do that work is something that is innate (after all, it'd be way easier to just give yourself a reward instead of forcing yourself to spend the extra 30 minutes learning something new).
I say this because I think I'm exceptionally lucky to have been born with an above average intelligence, and that I can hardly take credit for that. Whatever combination of genetics and luck gives someone their start, I drew a decent hand in the IQ department. The author says that anybody can be good at this stuff if they put enough practice into it, and I think that's probably true, but the amount of practice it takes will vary wildly depending on intelligence (and probably a bunch of other factors). It's not uncommon for me to be working on a problem with a friend, perhaps one of us is helping the other with some homework, and we'll reach some point where we need to get to the next step, and the path for us to take will be completely obvious to me, but not to them. Whatever it is that makes those steps obvious, intelligence, instinctiveness, whatever the name for that attribute is, I don't think it's something that can be learned. It would be like someone telling me that if I practice hard enough I can learn to play basketball like Michael Jordan. I could get a lot better than I am with more practice, but he's got a gift (genetics, something else, whatever) that means it takes a hell of a lot less work for him. I know Michael Jordan put in insane hours practicing, but I also know that if I put in the same hours I still wouldn't be as good as him.
Now that I've said all this does it mean I think people shouldn't practice? Absolutely not, it's the only way you'll get better than you are now (no matter how good you are). But if people are complimenting you, that might be a sign that you've got more going for you than just your hard work, that perhaps you are somehow predisposed to it.
Disclosure: I received a free review copy of Dive into Python 3 from Apress.
Unlike a ton of people I know in the Python world, my experience in learning Python didn't include the original Dive into Python at all, in fact I didn't encounter it until quite a while later when I was teaching a friend Python and I was looking for example exercises. Since Dive into Python is really a book for people who don't know Python a lot of my views on it are based on how helpful I think it would have been while teaching my friend, since it's pretty difficult to imagine myself not knowing Python as I do.
The first thing to note is that Mark Pilgrim has an absolutely brilliant writing style, even when I was reading about stuff I already knew it was an absolute pleasure. The next thing to note is that this book is squarely targeted at people who are already programmers who want to learn Python, I don't think it would make a good "my first programming book". Dive into Python 3 jumps into Python full steam ahead, it dives into Python's datatypes, generators, unit testing, and interacting with the web.
The book is strongly example based, Mark does a great job of showing code and explaining it clearly. He also does a good job of emphasising best practices such as unit testing. It also covers some external libraries like httplib2, plus there's stuff on porting your existing libraries to Python 3, and a great appendix.
For all these reasons I think Dive into Python 3 makes a good introduction to Python 3. But don't take my word for it, Mark has made a point of releasing all of his books online, free of charge. So if you think you're in the target audience (or even if you aren't) check it out, it doesn't cost you a dime, which Mark goes above and beyond the call of duty to ensure.
For a long time it's been possible to deploy a Django project under a WSGI server, and in the run up to Django's 1.0 release a number of bugs were fixed to make Django's WSGI handler as compliant with the standard as possible. However, Django's support for interacting with the world of WSGI applications, middleware, and frameworks has been less than stellar. I recently got inspired to improve this situation.
WSGI (Web Server Gateway Interface) is a specification for how applications and frameworks in Python can interface with a server. There are tons of servers that support the WSGI interface, most notably mod_wsgi (an Apache plugin); however, there are tons of other ones: spawning, twisted, uwsgi, gunicorn, cherrypy, and probably dozens more.
The inspiration for improving Django's integration with the WSGI world was Ruby on Rails 3's improved support for Rack; Rack is the Ruby world's equivalent to WSGI. In Rails 3 every layer of the stack (the entire application, the dispatch, and individual controllers) is exposed as a Rack application. It occurred to me that it would be pretty swell if we could do the same thing with Django: allow individual views and URLConfs to be exposed as WSGI applications, and the reverse, allow WSGI applications to be deployed inside of a Django application (via the standard URLConf mapping system). Another part of this inspiration was discussing gunicorn with Eric Florenzano. gunicorn is an awesome new WSGI server, inspired by Ruby's Unicorn; there's not enough space in this post to cover all the reasons it is awesome, but it is.
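For readers who haven't seen the interface itself, here is a minimal sketch of a WSGI application, using only the standard library and assuming Python 3 (so response bodies are bytes). The names `hello_app` and `call_app` are mine, made up for illustration; `call_app` just plays the role of the server so the example can run without opening a socket.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # `environ` is a dict describing the request; `start_response` is a
    # callable supplied by the server for sending the status and headers.
    body = b"Hello from WSGI!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings.
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app the way a server would, without any networking."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"PATH_INFO": path}
    # Fill in the remaining keys the WSGI spec requires.
    setup_testing_defaults(environ)
    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(hello_app)
print(status, body)
```

Anything that accepts `(environ, start_response)` and returns an iterable of bytes is a WSGI application, which is why servers, middleware, and frameworks can be mixed so freely.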
The end result of this is a new package, django-wsgi, which aims to bridge the gap between the WSGI world and the Django world. Here's an example of exposing a Django view as a WSGI application:
    from django.http import HttpResponse
    from django_wsgi import wsgi_application

    def my_view(request):
        return HttpResponse("Hello world, I'm a WSGI application!")

    application = wsgi_application(my_view)
And now you can point any WSGI server at this and it'll serve it up for you. You can do the same thing with a URLConf:
    from django.conf.urls.defaults import patterns
    from django.http import HttpResponse
    from django_wsgi import wsgi_application

    def hello_world(request):
        return HttpResponse("Hello world!")

    def hello_you(request, name):
        return HttpResponse("Hello %s!" % name)

    urls = patterns("",
        (r"^$", hello_world),
        (r"^(?P<name>\w+)/$", hello_you)
    )

    application = wsgi_application(urls)
Again all you need to do is point your server at this and it just works. However, the point of all this isn't just to make building single file applications easier (although this definitely does), the real win is that you can take a Django application and mount it inside of another WSGI application through whatever process it supports. Of course you can also go the other direction, mount a WSGI application inside of a Django URLconf:
    from django.conf.urls.defaults import patterns, url
    from django_wsgi import django_view

    def my_wsgi_app(environ, start_response):
        start_response("200 OK", [("Content-type", "text/plain")])
        return ["Hello World!"]

    urlpatterns = patterns("",
        # other views here
        url("^my_view/$", django_view(my_wsgi_app))
    )
And that's all there is to it. Write your apps the way you want and deploy them, plug them in to each other, whatever. There's a lot of work being done in the Django world to play nicer with the rest of the Python ecosystem, and that's definitely a good thing. I'd also like to thank Armin Ronacher for helping me make sure this actually implements WSGI correctly. Please use this, fork it, send me hate mail, improve it, and enjoy it!
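The "mount one application inside another" idea isn't specific to Django. As a rough sketch in pure WSGI, assuming Python 3, a dispatcher can route by URL prefix. None of this is django-wsgi's code; `mount`, `make_app`, and `get` are names I made up for the example.

```python
def mount(prefix, mounted_app, fallback_app):
    """Return a WSGI app that sends requests under `prefix` to `mounted_app`."""
    prefix = prefix.rstrip("/")

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == prefix or path.startswith(prefix + "/"):
            # Move the prefix from PATH_INFO onto SCRIPT_NAME, which is how
            # WSGI describes an application mounted below the root.
            child_environ = dict(environ)
            child_environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            child_environ["PATH_INFO"] = path[len(prefix):]
            return mounted_app(child_environ, start_response)
        return fallback_app(environ, start_response)

    return dispatcher


def make_app(name):
    """Build a tiny WSGI app that reports the path it was given."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        text = "%s saw %s" % (name, environ["PATH_INFO"])
        return [text.encode("utf-8")]
    return app


def get(app, path):
    """Call a WSGI app directly and return its body as text."""
    def start_response(status, headers, exc_info=None):
        pass
    environ = {"PATH_INFO": path, "SCRIPT_NAME": ""}
    return b"".join(app(environ, start_response)).decode("utf-8")


combined = mount("/blog", make_app("blog"), make_app("site"))
print(get(combined, "/blog/2010/"))
print(get(combined, "/about/"))
```

Because both sides only speak WSGI, either `mounted_app` or `fallback_app` could just as well be a wrapped Django view or URLconf.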
Welcome to the new home of my blog! I've finally gotten around to setting up everything for myself, no more Blogspot (not that they weren't gracious hosts). This blog runs a custom blogging engine built on Django, with django-taggit for tagging, and Disqus for comments. Hosting is graciously provided by the Steward of Gondor (Brian Rosner). Mike Malone has given me the domain (you can find the remnants of this domain's former glory here). I'll be migrating all my old posts here just as quickly as I can, for now enjoy the decor.
Edit: I forgot to thank James Tauber; I stole most of my design from him.
Lately I've been thinking quite a lot about education, both my own and in general. I ended up writing quite a bit about it. You can find all my thoughts here (PDF warning). I stuck it in a PDF because I think it's a more canonical form. I'm interested in hearing any thoughts people have, both about what I wrote and about education.
I've now been blogging for 30 days in a row, so tonight is just going to be a simple post to finish the month off. First of all, WOW, another month of blogging every day completed. This month was great fun, and I managed to keep the bullshit filler posts to a minimum (3 or 4 by my count). I also finished the month with 42700 total hits (easily a record), including hitting reddit and Hacker News several times. It's been great fun writing every night, and hopefully I'll be able to keep up the regular posts, but for now I plan on taking a nice long nap, then I'll get back to the requests.
Recently I had a bit of an interesting problem: I needed to define a way to represent a C++ API in Python. So, I figured the best way to represent that was one class in Python for each class in C++, with a functions dictionary to track each of the methods on each class. Seems simple enough, right? Do something like this:
    class String(object):
        functions = {
            "size": Function(Integer, []),
        }
We've got a String class with a functions dictionary that maps method names to Function objects. The Function constructor takes a return type and a list of arguments. Unfortunately we run into a problem when we want to do something like this:
    class String(object):
        functions = {
            "size": Function(Integer, []),
            "append": Function(None, [String])
        }
If we try to run this code we're going to get a NameError: String isn't defined yet. Django models have a similar issue with recursive foreign keys. Django's solution is to use the placeholder string "self", and have a metaclass translate it into the right class. Having a slightly more declarative API might also be nice, so something like this:
    class String(DeclarativeObject):
        size = Function(Integer, [])
        append = Function(None, ["self"])
So now that we have a nice pretty API we need our metaclass to make it happen:
    RECURSIVE_TYPE_CONSTANT = "self"

    class DeclarativeObjectMetaclass(type):
        def __new__(cls, name, bases, attrs):
            # Collect the Function attributes and pull them out of the class body.
            functions = dict([(n, attr) for n, attr in attrs.iteritems()
                if isinstance(attr, Function)])
            for attr in functions:
                attrs.pop(attr)
            new_cls = super(DeclarativeObjectMetaclass, cls).__new__(cls, name, bases, attrs)
            # Swap the "self" placeholder for the newly created class.
            new_cls.functions = {}
            for name, function in functions.iteritems():
                if function.return_type == RECURSIVE_TYPE_CONSTANT:
                    function.return_type = new_cls
                for i, argument in enumerate(function.arguments):
                    if argument == RECURSIVE_TYPE_CONSTANT:
                        function.arguments[i] = new_cls
                new_cls.functions[name] = function
            return new_cls

    class DeclarativeObject(object):
        __metaclass__ = DeclarativeObjectMetaclass
And that's all there is to it. We take each of the functions on the class out of the attributes, create a normal class instance without the functions, and then we do the replacements on the function objects and stick them in a functions dictionary.
Simple patterns like this can be used to build beautiful APIs, as is seen in Django with the models and forms API.
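To see the pattern end to end, here is a self-contained sketch of the same idea. It makes a few assumptions that aren't in the post: it uses Python 3 syntax (`metaclass=` and `.items()` rather than `__metaclass__` and `.iteritems()`), and the `Function` and `Integer` classes are minimal stand-ins I wrote so the example runs.

```python
RECURSIVE_TYPE_CONSTANT = "self"


class Function:
    """Stand-in for the post's Function: a return type plus argument types."""
    def __init__(self, return_type, arguments):
        self.return_type = return_type
        self.arguments = list(arguments)


class Integer:
    """Stand-in for a wrapped C++ integer type."""


class DeclarativeObjectMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        # Split the class body into Function declarations and everything else.
        functions = {n: a for n, a in attrs.items() if isinstance(a, Function)}
        plain_attrs = {n: a for n, a in attrs.items() if n not in functions}
        new_cls = super().__new__(mcs, name, bases, plain_attrs)
        # Now that the class exists, replace each "self" placeholder with it.
        for function in functions.values():
            if function.return_type == RECURSIVE_TYPE_CONSTANT:
                function.return_type = new_cls
            function.arguments = [
                new_cls if arg == RECURSIVE_TYPE_CONSTANT else arg
                for arg in function.arguments
            ]
        new_cls.functions = functions
        return new_cls


class DeclarativeObject(metaclass=DeclarativeObjectMetaclass):
    pass


class String(DeclarativeObject):
    size = Function(Integer, [])
    append = Function(None, ["self"])


print(sorted(String.functions))
print(String.functions["append"].arguments == [String])
```

After the class statement runs, `String.functions["append"].arguments` holds the `String` class itself, and `size` and `append` no longer exist as ordinary attributes on the class.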
Following yesterday's post another hotly requested topic was testing in Django. Today I wanted to give a simple overview on how to get started writing tests for your Django applications. Since Django 1.1, Django has automatically provided a tests.py file when you create a new application, that's where we'll start.
For me the first thing I want to test with my applications is, "Do the views work?". This makes sense, the views are what the user sees, they need to at least be in a working state (200 OK response) before anything else can happen (business logic). So the most basic thing you can do to start testing is something like this:
    from django.test import TestCase

    class MyTests(TestCase):
        def test_views(self):
            # The test client makes the request without a running server.
            response = self.client.get("/my/url/")
            self.assertEqual(response.status_code, 200)
By just making sure you run this code before you commit something you've already eliminated a bunch of errors: syntax errors in your URLs or views, typos, forgotten imports, etc. The next thing I like to test is that all the branches of my code are covered. The most common place my views have branches is in views that handle forms, with one branch for GET and one for POST. So I'll write a test like this:
    from django.test import TestCase

    class MyTests(TestCase):
        def test_forms(self):
            response = self.client.get("/my/form/")
            self.assertEqual(response.status_code, 200)

            response = self.client.post("/my/form/", {"data": "value"})
            self.assertEqual(response.status_code, 302) # Redirect on form success

            response = self.client.post("/my/form/", {})
            self.assertEqual(response.status_code, 200) # we get our page back with an error
Now I've tested both the GET and POST conditions on this view, as well as the cases where the form is valid and where it is invalid. With this strategy you can have a good base set of tests for any application without a lot of work. The next step is setting up tests for your business logic. These are a little more complicated: you need to make sure models are created and edited in the right cases, emails are sent in the right places, etc. Django's testing documentation is a great place to read more on writing tests for your applications.
Today I'm starting off doing some of the posts people want to see, and the number one item on that list is Django and Python 3. Python 3 has been out for about a year at this point, and so far Django hasn't really started to move towards it (at least at a first glance). However, Django has already begun the long process towards moving to Python 3, this post is going to recap exactly what Django's migration strategy is (most of this post is a recap of a message James Bennett sent to the django-developers mailing list after the 1.0 release, available here).
One of the most important things to recognize in this is that though there are many developers using Django for smaller projects, or for new projects that they want to start on Python 3, there are a great many more with legacy (as if we can call recent deployments on Python 2.6 and Django 1.1 legacy) deployments that they want to maintain and update. Further, Django's latest release, 1.1, has support for Python releases as old as 2.3, and a migration to Python 3 from 2.3 is nontrivial. However, it is significantly easier to make this migration from Python 2.6. This is the crux of James's plan: people want to move to Python 3.0, and moving towards Python 2.6 makes this easier for them and us. Therefore, since the 1.1 release Django has been removing support for one point version of Python per Django release. So, Django 1.1 will be the last release to support Python 2.3, 1.2 will be the last to support 2.4, etc. This plan isn't guaranteed; if there's a compelling reason to maintain support for a version for longer, it will likely override this plan (for example, if a particularly common deployment platform only offered Python 2.5, removing support for it might be delayed an additional release).
At the end of this process Django is going to end up only supporting Python 2.6. At this point (or maybe even before), a strategy will need to be devised for how to actually handle the switch. Some possibilities are, 1) having an official breakpoint, only one version is supported at a given time, 2) Python 3 support begins in a branch that tracks trunk and eventually it switches to become trunk once Python 3 is the more common deployment, 3) Python 2.6 and 3 are supported from a single codebase. I'm not sure which one of these is easiest, other projects such as PLY have chosen to go with option 3, however my inclination is that option 2 will be best for Django since issues like bytes vs. string are particularly prominent in Django (since it talks to so many external data sources).
For people who are interested, Martin von Löwis actually put together a patch that, at the time, gave Django Python 3 support (at least enough to run the tutorial under SQLite). If you're very interested in Django on Python 3 the best path would probably be to bring that patch up to date (unless it's wildly out of date, I haven't checked), and start fixing new things that have been introduced since the patch was written. This work isn't likely to get any official support, since maintaining Python 2.4 support and Python 3 would be far too difficult; however, there's no reason you can't maintain the patch externally on something like Github or Bitbucket.
Recently Russell Keith-Magee and I decided that the Meta.using option needed to be removed from the multiple-db work on Django, and so we did. Yesterday someone tweeted that this change caught them off guard, so I wanted to provide a bit of explanation as to why we made that change.
The first thing to note is that Meta.using was very good for one specific use case, horizontal partitioning by model. Meta.using allowed you to tie a specific model to a specific database by default. This meant that if you wanted to do things like have users be in one db and votes in another this was basically trivial. Making this use case this simple was definitely a good thing.
The downside was that this solution was very poorly designed, particularly in light of Django's reusable application philosophy. Django emphasizes the reusability of applications, and having the Meta.using option tied your partitioning logic to your models. It also meant that if you wanted to partition a reusable application onto another DB this easily, the solution was to go in and edit the source for the reusable application. Because of this we had to go in search of a better solution.
The better solution we've come up with is having some sort of callback you can define that lets you decide what database each query should be executed on. This would let you do simple things like direct all queries on a given model to a specific database, as well as more complex sharding logic like sending queries to the right database depending on which primary key value the lookup is by. We haven't figured out the exact API for this, and as such this probably won't land in time for 1.2, however it's better to have the right solution that has to wait than to implement a bad API that would become deprecated in the very next release.
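Since the API hasn't been designed yet, the following is only a sketch of the general shape such a callback might take, in plain Python. Every name in it (`choose_database`, the database aliases, the two-shard layout) is hypothetical and not part of Django.

```python
# Hypothetical sketch only: none of these names are a Django API.
def choose_database(model_name, pk=None):
    """Return the alias of the database a query should run against."""
    # Simple case: pin every query for one model to one database.
    if model_name == "User":
        return "users_db"
    # Sharding case: pick a database from the primary key being looked up.
    if model_name == "Vote" and pk is not None:
        return "votes_shard_%d" % (pk % 2)
    # Everything else falls through to the default database.
    return "default"


print(choose_database("User"))
print(choose_database("Vote", pk=7))
```

The point of the design is that this function lives in the project, not in the models, so a reusable application can be partitioned without editing its source.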
Unfortunately, I don't have an interesting, contentful post today. Just a small update about this blog instead. I now have a small widget on the right hand side where you can enter topics you'd like to hear about. I don't always have a good idea of what readers are interested in, and far too often I reject blog post ideas because I think either "no one cares about that" or "everyone already knows that", so hopefully this will be both a good way for me to write interesting content that people want to read, as well as a good way for me to overcome any writer's block. So please submit anything you'd like to hear about: Python, Django, the web, programming in general, compilers, or me ranting about politics. I'm willing to consider any topic.
To my American readers: Happy Thanksgiving!
Disclosure: I received a free review copy of the book.
Today I finished reading the Python Essential Reference and I wanted to share my final thoughts on the book. I'll start by saying I still agree with everything I wrote in my initial review, specifically that it's both a great resource as well as a good way to find out what you don't already know. Reading the second half of the book there were a few things that really exemplified this for me.
The first instance of this is the chapter on concurrency. I've done some concurrent programming with Python, but it's mostly been small scripts, a multiprocess and multithreaded web scraper for example, so I'm familiar with the basic APIs for threading and multiprocessing. However, this chapter goes into the full details, really covering the stuff you need to know if you want to build bigger applications that leverage these techniques. Things like shared data for processes or events and condition variables for threads are the kind of things that the book gives a good explanation of, as well as good examples of how to use them.
The other chapter that really stood out for me is the one on network programming and sockets. This chapter describes everything from the low-level select module up through the included socket servers. The most valuable part is an example of how to build an asynchronous IO system. This example is about 2 pages long and it's a brilliant example of how to use the modules, how to make an asynchronous API feel natural, and what the tradeoffs of asynchronous IO versus concurrency are. In addition, in the wake of the "* in Unix" posts from a while ago I found the section on the socket module interesting, as it's something I've never actually worked directly with.
The rest of the book is a handy reference, but for me these two chapters are the types of things that earn this a place on my bookshelf. The way Python Essential Reference balances depth with conciseness is excellent: it shows you the big picture for everything and gives you super details on the things that are really important. I just got my review copy of Dive into Python 3 today, so I look forward to giving a review of it in the coming days.
I read just about every single ticket that's filed in Django's trac, and at this point I've gotten a pretty good sense of what (subjectively) makes a useful ticket. Specifically, there are a few things that can make your ticket no better than spam, and a few that can instantly bump your ticket to the top of my "TODO" list. Hopefully, these will be helpful in filing tickets for Django as well as other open source projects.
- Search for a ticket before filing a new one. Django's trac, for example, has at least 10 tickets describing "Decoupling urls in the tutorial, part 3". These have all been wontfixed (or closed as a duplicate of one of the others). Each time one of these is filed it takes time for someone to read through it, write up an appropriate closing message, and close it. Of course, the creator of the ticket also invested time in filing the ticket. Unfortunately, for both parties this is time that could be better spent doing just about anything else, as the ticket has been decisively dealt with plenty of times.
- On a related note, please don't reopen a ticket that's been closed before. This one depends more on the policy of the project, in Django's case the policy is that once a ticket has been closed by a core developer the appropriate next step is to start a discussion on the development mailing list. Again this results in some wasted time for everyone, which sucks.
- Read the contributing documentation. Not every project has something like this, but when a project does it's definitely the right starting point. It will hopefully contain useful general bits of knowledge (like what I'm trying to put here) as well as project specific details, what the processes are, how to dispute a decision, how to check the status of a patch, etc.
- Provide a minimal test case. If I see a ticket whose description involves a 30 field model, it drops a few rungs on my TODO list. Large blocks of code like this take more time to wrap one's head around, and most of it will be superfluous. If I see just a few lines of code it takes way less time to understand, and it will be easier to spot the origin of the problem. As an extension to this, if the test case comes in the form of a patch to Django's test suite it becomes even easier for a developer to dive into the problem.
- Don't file a ticket advocating a major feature or sweeping change. Pretty much if it's going to require a discussion the right place to start is the mailing list. Trac is lousy at facilitating discussions, mailing lists are designed explicitly for that purpose. A discussion on the mailing list can more clearly outline what needs to happen, and it may turn out that several tickets are needed. For example filing a ticket saying, "Add CouchDB support to the ORM" is pretty useless, this requires a huge amount of underlying changes to make it even possible, and after that a database backend can live external to Django, so there's plenty of design decisions to go around.
These are some of the issues I've found to be most pressing while reviewing tickets for Django. I realize they are mostly in the "don't" category, but filing a good ticket can sometimes be as good as clearly stating what the problem is, and how to reproduce it.