alex gaynor's blago-blog

Posts tagged with django

Service

Posted May 19th, 2014. Tagged with django, python, open-source, community.

If you've been around an Open Source community for any length of time, you've probably heard someone say, "We're all volunteers here". Often this is given as an explanation for why some feature hasn't been implemented, why a release has been delayed, and in general, why something hasn't happened.

I think when we say these things (and I've said them as much as anyone), often we're being dishonest. Almost always it's not a question of an absolute availability of resources, but rather how we prioritize among the many tasks we could complete. It can explain why we didn't have time to do things, but not why we did them poorly.

Volunteerism does not place us above criticism, nor should it absolve us when we err.

Beyond this however, many Open Source projects (including entirely volunteer driven ones) don't just make their codebases available to others, they actively solicit users, and make the claim that people can depend on this software.

That dependency can take many forms. It usually means an assumption that the software will still exist (and be maintained) tomorrow, that it will handle catastrophic bugs in a reasonable way, that it will be a stable base to build a platform or a business on, and that the software won't act unethically (such as by flagrantly violating expectations about privacy or integrity).

And yet, across a variety of these policy areas, such as security and backwards compatibility, we often fail to properly consider the effects of our actions on our users, particularly in a context of "they have bet their businesses on this". Instead we continue to treat these projects as our hobby projects, as things we casually do on the side for fun.

Working on PyCA Cryptography, and security in general, has greatly influenced my thinking on these issues. The nature of cryptography means that when we make mistakes, we put our users' businesses, and potentially their customers' personal information at risk. This responsibility weighs heavily on me. It means we try to have policies that emphasize review, it means we utilize aggressive automated testing, it means we try to design APIs that prevent inadvertent mistakes which affect security, it means we try to write excellent documentation, and it means, should we have a security issue, we'll do everything in our power to protect our users. (I've previously written about what I think Open Source projects' security policies should look like).

Open Source projects of a certain size, scope, and importance need to take seriously the fact that we have an obligation to our users. Whether we are volunteers, or paid, we have a solemn responsibility to consider the impact of our decisions on our users. And too often in the past, we have failed, and acted negligently and recklessly with their trust.

Often folks in the Open Source community (again, myself included!) have asked why large corporations, who use our software, don't give back more. Why don't they employ developers to work on these projects? Why don't they donate money? Why don't they donate other resources (e.g. build servers)?

In truth, my salary is paid by every single user of Python and Django (though Rackspace graciously foots the bill). The software I write for these projects would be worth nothing if it weren't for the community around them, of which a large part is the companies which use them. This community enables me to have a job, to travel the world, and to meet so many people. So while companies, such as Google, don't pay a dime of my salary, I still gain a lot from their usage of Python.

Without our users, we would be nothing, and it's time we started acknowledging a simple truth: our projects exist in service of our users, and not the other way around.

Security process for Open Source Projects

Posted October 19th, 2013. Tagged with django, python, open-source, community.

This post is intended to describe how open source projects should handle security vulnerabilities. This process is largely inspired by my involvement in the Django project, whose process is in turn largely drawn from the PostgreSQL project's process. For every recommendation I make I'll try to explain why I've made it, and how it serves to protect you and your users. This is largely tailored to large, high-impact projects, but you should be able to apply it to any of your projects.

Why do you care?

Security vulnerabilities put your users, and often, in turn, their users at risk. As an author and distributor of software, you have a responsibility to your users to handle security releases in a way most likely to help them avoid being exploited.

Finding out you have a vulnerability

The first thing you need to do is make sure people can report security issues to you in a responsible way. This starts with having a page in your documentation (or on your website) which clearly describes an email address people can report security issues to. It should also include a PGP key fingerprint which reporters can use to encrypt their reports (this ensures that if the email goes to the wrong recipient, they will be unable to read it).

You also need to describe what happens when someone emails that address. It should look something like this:

  1. You will respond promptly to any reports to that address; this means within 48 hours. This response should confirm that you received the issue, and ideally state whether you've been able to verify it or whether more information is needed.
  2. Assuming you're able to reproduce the issue, now you need to figure out the fix. This is the part with a computer and programming.
  3. You should keep in regular contact with the reporter to update them on the status of the issue if it's taking time to resolve for any reason.
  4. Now you need to inform the reporter of your fix and the timeline (more on this later).

Timeline of events

From the moment you get the initial report, you're on the clock. Your goal is to have a new release issued within two weeks of getting the report email. Absolutely nothing that occurs until the final step is public. Here are the things that need to happen:

  1. Develop the fix and let the reporter know.
  2. You need to obtain a CVE (Common Vulnerabilities and Exposures) number. This is a standardized number which identifies vulnerabilities in packages. There's a section below on how this works.
  3. If you have downstream packagers (such as Linux distributions) you need to reach out to their security contact and let them know about the issue, all the major distros have contact processes for this. (Usually you want to give them a week of lead time).
  4. If you have large, high visibility, users you probably want a process for pre-notifying them. I'm not going to go into this, but you can read about how Django handles this in our documentation.
  5. You issue a release, and publicize the heck out of it.

Obtaining a CVE

In short, follow these instructions from Red Hat.

What goes in the release announcement

Your release announcement needs to have several things:

  1. A precise and complete description of the issue.
  2. The CVE number
  3. Actual releases using whatever channel is appropriate for your project (e.g. PyPI, RubyGems, CPAN, etc.)
  4. Raw patches against all supported releases (these are in addition to the release, some of your users will have modified the software, and they need to be able to apply the patches easily too).
  5. Credit to the reporter who discovered the issue.

Why complete disclosure?

I've recommended that you completely disclose what the issue was. Why is that? A lot of people's first instinct is to want to keep that information secret, to give your users time to upgrade before the bad guys figure it out and start exploiting it.

Unfortunately it doesn't work like that in the real world. In practice, not disclosing gives more power to attackers and hurts your users. Dedicated attackers will look at your release and the diff and figure out what the exploit is, but your average users won't be able to. Even embedding the fix into a larger release with many other things doesn't mask this information.

In the case of yesterday's Node.JS release, which did not practice complete disclosure and put the fix in a larger patch, this did not prevent interested individuals from figuring out the attack; it took me about five minutes to do so, and any serious individual could have done it much faster.

The first step for users in responding to a security release in something they use is to assess exposure and impact. Exposure means "Am I affected and how?", impact means "What is the result of being affected?". Denying users a complete description of the issue strips them of the ability to answer these questions.

What happens if there's a zero-day?

A zero-day is when an exploit is publicly available before a project has any chance to reply to it. Sometimes this happens maliciously (e.g. a black-hat starts using the exploit against your users) and sometimes it happens accidentally (e.g. a user reports a security issue to your mailing list, instead of the security contact). Either way, when this happens, everything goes to hell in a handbasket.

When a zero-day happens basically everything happens in 16x fast-forward. You need to immediately begin preparing a patch and issuing a release. You should be aiming to issue a release on the same day as the issue is made public.

Unfortunately there's no secret to managing zero-days. They're quite simply a race between people who might exploit the issue, and you to issue a release and inform your users.

Conclusion

Your responsibility as a package author or maintainer is to protect your users. The name of the game is keeping your users informed and able to judge their own security, and making sure they have that information before the bad guys do.

Meritocracy

Posted October 12th, 2013. Tagged with politics, community, ethics, django, open-source.

Let's start with a definition: a meritocracy is a group where leadership or authority is derived from merit (merit being skills or ability), and particularly objective merit. I think adding the word objective is important, even though it's not often explicitly stated.

A lot of people like to say open source is a meritocracy, the people who are the top of projects are there because they have the most merit. I'd like to examine this idea. What if I told you the United States Congress was a meritocracy? You might say "gee, how could that be, they're really terrible at their jobs, the government isn't even operational!?!". To which I might respond "that's evidence that they aren't good at their jobs, it doesn't prove that they aren't the best of the available candidates". You'd probably tell me that "surely someone, somewhere, is better qualified to do their jobs", and I'd say "we have an open, democratic process, if there was someone better, they'd run for office and get elected".

Did you see what I did there? It was subtle, a lot of people miss it. I begged the question. Begging the question is the act of responding to a hypothesis with a conclusion that's premised on exactly the question the hypothesis asks.

So what if you told me that Open Source was a meritocracy? Projects gain recognition because they're the best, people become maintainers of libraries because they're the best.

And those of us involved in open source love this explanation, why wouldn't we? This explanation says that the reason I'm a core developer of Django and PyPy is that I'm so gosh-darned awesome. And who doesn't like to think they're awesome? And if I can have a philosophy that leads to myself being awesome, all the better!

Unfortunately, it's not a valid conclusion. The problem with stating that a group is meritocratic is that it's not a falsifiable hypothesis.

We don't have a definition of objective merit. As a result, there's no piece of evidence that I can show you to prove that a group isn't in fact meritocratic. And a central tenet of any sort of rigorous inquisitive process is that we need to be able to construct a formal opposing argument. I can test whether a society is democratic: do the people vote, is the result of the vote respected? I can't test if a society is meritocratic.

It's unhealthy when we consider our groups, or cultures, or our societies as being meritocratic. It makes us ignore questions about who our leaders are, how they got there, and who isn't represented. The best we can say is that maybe our organizations are (perceptions of subjective merit)-ocracies, which is profoundly different from what we mean when we say meritocracy.

I'd like to encourage groups that self-identify as being meritocratic (such as The Gnome Foundation, The Apache Software Foundation, Mozilla, The Document Foundation, and The Django Software Foundation) to reconsider this. Aspiring to meritocracy is reasonable; it makes sense to want the people best capable of leading us to do so, but it's not something we can ever say we've achieved.

Effective Code Review

Posted September 26th, 2013. Tagged with openstack, python, community, django, open-source.

Maybe you practice code review, either as a part of your open source project or as a part of your team at work, maybe you don't yet. But if you're working on a software project with more than one person it is, in my view, a necessary piece of a healthy workflow. The purpose of this piece is to try to convince you it's valuable, and show you how to do it effectively.

This is based on my experience doing code review both as a part of my job at several different companies, as well as in various open source projects.

What

It seems only fair that before I try to convince you to make code review an integral part of your workflow, I precisely define what it is.

Code review is the process of having another human being read over a diff. It's exactly like what you might do to review someone's blog post or essay, except it's applied to code. It's important to note that code review is about code. Code review doesn't mean an architecture review, a system design review, or anything like that.

Why

Why should you do code review? It's got a few benefits:

  • It raises the bus factor. By forcing someone else to have the familiarity to review a piece of code you guarantee that at least two people understand it.
  • It ensures readability. By getting someone else to provide feedback based on reading, rather than writing, the code you verify that the code is readable, and give an opportunity for someone with fresh eyes to suggest improvements.
  • It catches bugs. By getting more eyes on a piece of code, you increase the chances that someone will notice a bug before it manifests itself in production. This is in keeping with Eric Raymond's maxim that, "given enough eyeballs, all bugs are shallow".
  • It encourages a healthy engineering culture. Feedback is important for engineers to grow in their jobs. By having a culture of "everyone's code gets reviewed" you promote a culture of positive, constructive feedback. In teams without review processes, or where reviews are infrequent, code review tends to be a tool for criticism, rather than learning and growth.

How

So now that I've, hopefully, convinced you to make code review a part of your workflow how do you put it into practice?

First, a few ground rules:

  • Don't use humans to check for things a machine can. This means that code review isn't a process of running your tests, or looking for style guide violations. Get a CI server to check for those, and have it run automatically (see the sketch after this list for one way to set that up). This is for two reasons: first, if a human has to do it, they'll do it wrong (this is true of everything), second, people respond to certain types of reviews better when they come from a machine. If I leave the review "this line is longer than our style guide suggests" I'm nitpicking and being a pain in the ass, if a computer leaves that review, it's just doing its job.
  • Everybody gets code reviewed. Code review isn't something senior engineers do to junior engineers, it's something everyone participates in. Code review can be a great equalizer, senior engineers shouldn't have special privileges, and their code certainly isn't above the review of others.
  • Do pre-commit code review. Some teams do post-commit code review, where a change is reviewed after it's already pushed to master. This is a bad idea. Reviewing a commit after it's already been landed promotes a feeling of inevitability or fait accompli, reviewers tend to focus less on small details (even when they're important!) because they don't want to be seen as causing problems after a change is landed.
  • All patches get code reviewed. Code review applies to all changes for the same reasons as you run your tests for all changes. People are really bad at guessing the implications of "small patches" (there's a near 100% rate of me breaking the build on changes that are "so small, I don't need to run the tests"). It also encourages you to have a system that makes code review easy, you're going to be using it a lot! Finally, having a strict "everything gets code reviewed" policy helps you avoid arguments about just how small a small patch is.
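
As mentioned in the first rule, style checks belong to a machine. A minimal sketch of one way to wire that up, using a tox environment that runs flake8 (both the environment name and the choice of flake8 are illustrative, not something this post prescribes):

[testenv:lint]
deps =
    flake8
commands =
    flake8 .

Your CI server then runs tox -e lint on every proposed change, and the nitpicks come from a machine rather than a person.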

So how do you start? First, get yourself a system. Phabricator, Github's pull requests, and Gerrit are the three systems I've used, any of them will work fine. The major benefit of having a tool (over just mailing patches around) is that it'll keep track of the history of reviews, and will let you easily do commenting on a line-by-line basis.

You can either have patch authors land their changes once they're approved, or you can have the reviewer merge a change once it's approved. Either system works fine.

As a patch author

Patch authors only have a few responsibilities (besides writing the patch itself!).

First, they need to express what the patch does, and why, clearly.

Second, they need to keep their changes small. Studies have shown that beyond 200-400 lines of diff, patch review efficacy trails off [1]. You want to keep your patches small so they can be effectively reviewed.

It's also important to remember that code review is a collaborative feedback process; if you disagree with a review note you should start a conversation about it, don't just ignore it, or implement it even though you disagree.

As a reviewer

As a patch reviewer, you're going to be looking for a few things, I recommend reviewing for these attributes in this order:

  • Intent - What change is the patch author trying to make, is the bug they're fixing really a bug? Is the feature they're adding one we want?
  • Architecture - Are they making the change in the right place? Did they change the HTML when really the CSS was busted?
  • Implementation - Does the patch do what it says? Is it possibly introducing new bugs? Does it have documentation and tests? This is the nitty-gritty of code review.
  • Grammar - The little things. Does this variable need a better name? Should that be a keyword argument?

You're going to want to start at intent and work your way down. The reason for this is that if you start giving feedback on variable names, and other small details (which are the easiest to notice), you're going to be less likely to notice that the entire patch is in the wrong place! Or that you didn't want the patch in the first place!

Doing reviews on concepts and architecture is harder than reviewing individual lines of code, that's why it's important to force yourself to start there.

There are three different types of review elements:

  • TODOs: These are things which must be addressed before the patch can be landed; for example a bug in the code, or a regression.
  • Questions: These are things which must be addressed, but don't necessarily require any changes; for example, "Doesn't this class already exist in the stdlib?"
  • Suggestions for follow up: Sometimes you'll want to suggest a change, but it's big, or not strictly related to the current patch, and can be done separately. You should still mention these as a part of a review in case the author wants to adjust anything as a result.

It's important to note which type of feedback each comment you leave is (if it's not already obvious).

Conclusion

Code review is an important part of a healthy engineering culture and workflow. Hopefully, this post has given you an idea of either how to implement it for your team, or how to improve your existing workflow.

[1]http://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/

Doing a release is too hard

Posted September 17th, 2013. Tagged with openstack, django, python, open-source.

I just shipped a new release of alchimia. Here are the steps I went through:

  • Manually edit version numbers in setup.py and docs/conf.py. In theory I could probably centralize this (a sketch of that follows this list), but then I'd still have a place I need to update manually.
  • Issue a git tag (actually I forgot to do that on this project, oops).
  • python setup.py register sdist upload -s to build and upload some tarballs to PyPi
  • python setup.py register bdist_wheel upload -s to build and upload some wheels to PyPi
  • Bump the version again for the now pre-release status (I never remember to do this)
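
On that first point: centralizing the version so there's only one spot to edit is easy enough, it just doesn't remove the manual step. A minimal sketch of the idea (the paths and version number are placeholders, this isn't what alchimia actually does):

# alchimia/__init__.py -- the single place the version is written down
__version__ = "1.0"


# setup.py -- read the version from the package instead of duplicating it
import re

with open("alchimia/__init__.py") as f:
    version = re.search(r'__version__ = "([^"]+)"', f.read()).group(1)

# docs/conf.py can pull the same value, so a version bump is one edit.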

Here's how it works for OpenStack projects:

  • git tag VERSION -s (-s makes it a GPG-signed tag)
  • git push gerrit VERSION (this sends the tag to Gerrit for review)

Once the tag is approved in the code review system, a release will automatically be issued, including:

  • Uploading to PyPi
  • Uploading documentation
  • Landing the tag in the official repository

Version numbers are always automatically handled correctly.

This is how it should be. We need to bring this level of automation to all projects.

You guys know who Philo Farnsworth was?

Posted September 15th, 2013. Tagged with django, python, open-source, community.

Friends of mine will know I'm a very big fan of the TV show Sports Night (really any of Aaron Sorkin's writing, but Sports Night in particular). Before you read anything I have to say, take a couple of minutes and watch this clip:

I doubt Sorkin knew it when he scripted this (I doubt he knows it now either), but this piece is about how Open Source happens (to be honest, I doubt he knows what Open Source Software is).

This short clip actually makes two profound observations about open source.

First, most contributions are not big things. They're not adding huge new features, they're not rearchitecting the whole system to address some limitation, they're not even fixing a super annoying bug that affects every single user. Nope, most of them are adding a missing sentence to the docs, fixing a bug in a wacky edge case, or adding a tiny hook so the software is a bit more flexible. And this is fantastic.

The common wisdom says that the thing open source is really bad at is polish. My experience has been the opposite, no one is better at finding increasingly edge case bugs than open source users. And no one is better at fixing edge case bugs than open source contributors (who overlap very nicely with open source users).

The second lesson in that clip is about how to be an effective contributor. Specifically that one of the keys to getting involved effectively is for other people to recognize that you know how to do things (this is an empirical observation, not a claim of how things ought to be). How can you do that?

  • Write good bug reports. Don't just say "it doesn't work", if you've been a programmer for any length of time, you know this isn't a useful bug report. What doesn't work? Show us the traceback, or otherwise unexpected behavior, include a test case or instructions for reproduction.
  • Don't skimp on the details. When you're writing a patch, make sure you include docs, tests, and follow the style guide, don't just throw up the laziest work possible. Attention to detail (or lack thereof) communicates very clearly to someone reviewing your work.
  • Start a dialogue. Before you send that 2,000 line patch with that big new feature, check in on the mailing list. Make sure you're working in a way that's compatible with where the project is headed, give people a chance to give you some feedback on the new APIs you're introducing.

This all works in reverse too, projects need to treat contributors with respect, and show them that the project is worth their time:

  • Follow community standards. In Python this means things like PEP8, having a working setup.py, and using Sphinx for documentation.
  • Have passing tests. Nothing throws me for a loop worse than when I check out a project to contribute and the tests don't pass.
  • Automate things. Things like running your tests, linters, even state changes in the ticket tracker should all be automated. The alternative is making human beings manually do a bunch of "machine work", which will often be forgotten, leading to a sub-par experience for everyone.

Remember, Soylent Green Open Source is people

That's it, the blog post's over.

Your project doesn't mean your playground

Posted September 8th, 2013. Tagged with community, django, python.

Having your own open source project is awesome. You get to build a thing you like, obviously. But you also get to have your own little playground, a chance to use your favorite tools: your favorite VCS, your favorite test framework, your favorite issue tracker, and so on.

And if the point of your project is to share a thing you're having fun with with the world, that's great, and that's probably all there is to the story (you may stop reading here). But if you're interested in growing a legion of contributors to build your small side project into an amazing thing, you need to forget about all of that and remember these words: Your contributors are more important than you.

Your preferences aren't that important: This means you probably shouldn't use bzr if everyone else is using git. You shouldn't use your own home grown documentation system when everyone is using Sphinx. Your playground is a tiny thing in the giant playground that is the Python (or whatever) community. And every unfamiliar thing a person needs to familiarize themselves with to contribute to your project is another barrier to entry, and another N% of potential contributors who won't actually materialize.

I'm extremely critical of the growing culture of "Github is open source", I think it's ignorant, shortsighted, and runs counter to innovation. But if your primary interest is "having more contributors", you'd be foolish to ignore the benefits of having your project on Github. It's where people are. It has tools that are better than almost anything else you'll potentially use. And most importantly it implies a workflow and toolset with which a huge number of people are familiar.

A successful open source project outgrows the preferences of its creators. It's important to prepare for that by remembering that (if you want contributors) your workflow preferences must always be subservient to those of your community.

Why I support diversity

Posted August 28th, 2013. Tagged with python, diversity, community, programming, django.

I get asked from time to time why I care about diversity in the communities I'm a part of, particularly the Django and Python communities, and the broader software development and open source communities.

There's a lot of good answers. The simplest one, and the one I imagine just about everyone can get behind: diverse groups perform better at creative tasks. A group composed of people from different backgrounds will do better work than a homogeneous group.

But that's not the main reason I care. I care because anyone who knows how to read some statistics knows that it's ridiculous that I'm where I am today. I have a very comfortable job and life, many great friends, and the opportunity to travel and to spend my time on the things I care about. And that's obscenely anomalous for a high school dropout like me.

All of that opportunity is because when I showed up to some open source communities no one cared that I was a high school dropout, they just cared about the fact that I seemed to be interested, wanted to help, and wanted to learn. I particularly benefited from the stereotype of white dropouts, which is considerably more charitable than (for example) the stereotype of African American dropouts.

Unfortunately, our communities aren't universally welcoming, aren't universally nice, and aren't universally thoughtful and caring. Not everyone has the same first experience I did. In particular people who don't look like me, aren't white males, disproportionately don't have this positive experience. But everyone ought to. (This is to say nothing of the fact that I had more access to computers at a younger age then most people.)

That's why I care. Because I benefited from so much, and many aren't able to.

This is why I support the Ada Initiative. I've had the opportunity to see their work up close twice. Once, as a participant in Ada Camp San Francisco's Allies Track. And a second time in getting their advice in writing the Code of Conduct for the Django community. They're doing fantastic work to support more diversity, and more welcoming communities.

Right now they're raising funds to support their operations for the next year; if you can afford to, I hope you'll donate: http://supportada.org

DjangoCon Europe 2011 Slides

Posted June 7th, 2011. Tagged with talk, djangocon, pypy, python, django.

I gave a talk at DjangoCon Europe 2011 on using Django and PyPy (with another talk to be delivered tomorrow!), you can get the slides right on bitbucket, and for those who saw the talk, the PyPy compatibility wiki is here.

Django and Python 3 (Take 2)

Posted February 17th, 2011. Tagged with python3, django, python.

About 18 months ago I blogged about Django and Python 3, I gave the official roadmap. Not a lot has changed since then unfortunately, we've dropped 2.3 entirely, 2.4 is on its way out, but there's no 3.X support in core. One fun thing has changed though: I get to help set the roadmap, which is a) cool, b) means the fact that I care about Python 3 counts for something. I'm not going to get into "Why Py3k", because that's been done to death, but it's a better language, that I'd rather program in, so the only question is how do we get there.

I posted a while ago on Hacker News that I foresaw Python 3 support by the end of this summer. I still think that's reasonable, but there's a tiny unspoken question there: how do we go from 0 to there. I have an answer! Drum roll please... it turns out, as a student, I tend to have a lot of free time over the summer, and Google has this lovely thing called Google Summer of Code, where they give students money to work on open source. So I'd like to make Python 3 support for Django a GSOC project this summer. With myself either mentoring or student-ing. I don't care which role I take, just that someone who's committed to the project takes it on. I think this task is eminently reasonable for the timeline, especially in light of the work done by Martin von Löwis which, even though it probably doesn't apply, lays a lot of the ground work and identifies a lot of the key issues (and features a lot of utilities which will probably be of use).

All that being said, any statements here or elsewhere (especially ones involving timelines) reflect my personal goals and beliefs, not necessarily those of any other Django core developers, and certainly not an official project position.

Announcing VCS Translator

Posted January 21st, 2011. Tagged with python, vcs, software, programming, django, open-source.

For the past month or so I've been using a combination of Google, Stackoverflow, and bugging people on IRC to muddle my way through using various VCS that I'm not very familiar with. And all too often my queries are of the form "how do I do git foobar -q in mercurial?". A while ago I tweeted that someone should write a VCS translator website. Nobody else did, so when I woke up far too early today I decided I was going to get something online to solve this problem, today! About 6 hours later I tweeted the launch of VCS translator.

This is probably not even a minimum viable product. It doesn't handle a huge range of cases, or version control systems. However, it is open source and it provides a framework for answering these questions. If you're interested I'd encourage you to fork it on github and help me out in fixing some of the most requested translations (I remove them once they're implemented).

My future goals for this are to allow commenting, so users can explain the caveats of the translations (very infrequently are the translations one-to-one) and to add a proper API. Moreover my goal is to make this a useful tool to other programmers who, like myself, have far too many VCS in their lives.

Getting the most out of tox

Posted December 17th, 2010. Tagged with testing, python, taggit, programming, django.

tox is a recent Python testing tool, by Holger Krekel. Its stated purpose is to make testing Python projects against multiple versions of Python (or different interpreters, like PyPy and Jython) much easier. However, it can be used for so much more. Yesterday I set it up for django-taggit, and it's an absolute dream: it automates testing against four different versions of Python, two different versions of Django, and it automates building the docs and checking for any warnings from Sphinx. I'll try to give a run through on what exactly you need to do to set this up with your project.

First create a tox.ini at the root of your project (i.e. in the same directory as your setup.py). Next create a [tox] section, and list out the environments you'd like to be tested (i.e. which Pythons):

[tox]
envlist =
    py25, py26, py27, pypy

The environments we've listed out are a few of the ones included with tox, they point at specific versions of Python and use the default testing setup. Now add a [testenv] section which will tell tox how to actually run your tests:

[testenv]
commands =
    python setup.py test
deps =
    django==1.2.3

commands is the list of commands tox will run, and deps specifies any dependencies that are needed to run the tests (tox creates a virtualenv for each environment and doesn't include system wide site-packages, so you need to make sure you list everything needed by default here). If you want to use this same python setup.py test formulation you'll need to be using setuptools or distribute for your setup.py and provide the test_suite argument, Eric Holscher provides a good run down for how to do this for Django projects.

Now you should be able to just type tox into your command line and it will try to run your tests in each of the environments you specified. Hopefully they're all passing (future test runs will go faster, for the first run it has to install all the dependencies). The next thing you may want to do is get it set up to build your documentation. To do this create a [testenv:docs] section:

[testenv:docs]
changedir = docs
deps =
    sphinx
commands =
    sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html

This tells tox a few things. First changedir tells it that to run these commands it should cd into the docs/ directory (if your docs live elsewhere, change as appropriate). Next it has sphinx as a dependency. Finally the commands invoke sphinx-build, -W makes warnings into errors (so you get an appropriate failure status code), -b html uses the HTML builder, and the rest of the parameters tell Sphinx where the docs live and to put the output in the temporary directory that tox creates for each env.

Now all you need to do is add docs to the envlist, and a tox run will build your documentation.

The last thing you might want to do is set it up to test against multiple versions of a package (such as Django 1.1, Django 1.2, and trunk). To do this create another section whose name includes both the Python version and dependency version, e.g. [testenv:py25-trunk]. In it place:

[testenv:py25-trunk]
basepython = python2.5
deps =
    http://www.djangoproject.com/download/1.3-alpha-1/tarball/

This "inherits" from the default testenv, so it still has its commands, but we specify the basepython indicating this testenv is for python 2.5, and a different set of dependencies, here we're using the Django 1.3 alpha. You'll need to do a bit of copy-paste and create one of these for each version of Python you're testing against, and make sure to add each of these to the envlist.

At this point you should have a lean, mean, testing setup. With one command you can test your package with different dependencies, different pythons, and build your documentation. The tox documentation features tons of examples so you should use it as a reference.
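
Putting all the pieces from this post together, the resulting tox.ini looks roughly like this (with just the one extra Django-version environment shown):

[tox]
envlist =
    py25, py26, py27, pypy, py25-trunk, docs

[testenv]
commands =
    python setup.py test
deps =
    django==1.2.3

[testenv:py25-trunk]
basepython = python2.5
deps =
    http://www.djangoproject.com/download/1.3-alpha-1/tarball/

[testenv:docs]
changedir = docs
deps =
    sphinx
commands =
    sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html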

The continuous integration I want

Posted November 2nd, 2010. Tagged with testing, python, tests, django, open-source.

Testing is important, I've been a big advocate of writing tests for a while, however when you've got tests you need to run them. This is a big problem in open source, Django works on something like six versions of Python (2.4, 2.5, 2.6, 2.7, Jython, and PyPy), 4 databases (SQLite, PostgreSQL, MySQL, Oracle, plus the GIS backends, and external backends), and I don't even know how many operating systems (at least the various Linuxes, OS X, and Windows). If I tried to run the tests in all those configurations for every commit I'd go crazy. Reusable applications have it even worse, ideally they should be tested under all those configurations, with each version of Django they support. For a Django application that wants to work on Django 1.1, 1.2, and all of those interpreters, databases, and operating systems you've got over 100 configurations. Crazy. John Resig faced a similar problem with jQuery (5 major browsers, multiple versions, mobile and desktop, different OSs), and the result was Test Swarm (note that at this time it doesn't appear to be up), an automated way for people to volunteer their machines to run tests. We need something like that for Python.

It'd be nice if it were as simple as users pointing their browser at a URL, but that's not practical with Python: the environments we want to test in are more complex than what can be detected, we need to know what external services are available (databases, for example). My suggestion is users should maintain a config file (.ini perhaps) somewhere on their system, it would say what versions of Python are available, and what external services are available (and how they can be accessed, e.g. DB passwords). Then the user downloads a bootstrap script and runs it. This script sees what services the user has available on their machine and queries a central server to see what tests need to be run, given the configuration they have available. The script downloads a test, creates a virtualenv, and does whatever setup it needs to do (e.g. writing a Django settings.py file given the available DB configuration), and runs the tests. Finally it sends the test results back to the central server.
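
To make that concrete, the config file I have in mind might look something like this (every section and key here is hypothetical, since no such tool exists yet):

[pythons]
python2.6 = /usr/bin/python2.6
python2.7 = /usr/bin/python2.7
pypy = /opt/pypy/bin/pypy

[postgresql]
host = localhost
user = ci
password = s3cret

[mysql]
; leave a section out entirely if you don't have that service available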

It's very much like a standard buildbot system, except any user can download the script and start running tests. There are a number of problems to be solved, how do you verify that a project's tests aren't malicious (only allow trusted tests to start), how do you verify that the test results are valid, how do you actually write the configuration for a test suite? However, if solved I think this could be an invaluable resource for the Python community. Have a reusable app you want tested? Sign it up for PonySwarm, add a post-commit hook, and users will automatically run the tests for it.

Priorities

Posted October 24th, 2010. Tagged with programming, django, python, open-source.

When you work on something as large and multi-faceted as Django you need a way to prioritize what you work on, without a system how do I decide if I should work on a new feature for the template system, a bugfix in the ORM, a performance improvement to the localization features, or better docs for contrib.auth? There's tons of places to jump in and work on something in Django, and if you aren't a committer you'll eventually need one to commit your work to Django. So if you ever need me to commit something, here's how I prioritize my time on Django:

  1. Things I broke: If I broke a buildbot, or there's a ticket reported against something I committed this is my #1 priority. Though Django no longer has a policy of trunk generally being perfectly stable it's still a very good way to treat it, once it gets out of shape it's hard to get it back into good standing.
  2. Things I need for work: Strictly speaking these don't compete with the other items on this list, in that these happen on my work's time, rather than in my free time. However, practically speaking, this makes them a relatively high priority, since my work time is fixed, as opposed to free time for Django, which is rather elastic.
  3. Things that take me almost no time: These are mostly things like typos in the documentation, or really tiny bugfixes.
  4. Things I think are cool or important: These are either things I personally think are fun to work on, or are in high demand from the community.
  5. Other things brought to my attention: This is the most important category, I can only work on bugs or features that I know exist. Django's trac has about 2000 tickets, way too many for me to ever sift through in one sitting. Therefore, if you want me to take a look at a bug or a proposed patch it needs to be brought to my attention. Just pinging me on IRC is enough, if I have the time I'm almost always willing to take a look.

In actuality the vast majority of my time is spent in the bottom half of this list, it's pretty rare for the build to be broken, and even rarer for me to need something for work, however, there are tons of small things, and even more cool things to work on. An important thing to remember is that the best way to make something show up in category #3 is to have an awesome patch with tests and documentation, if all I need to do is git apply && git commit that saves me a ton of time.

django-taggit 0.9 Released

Posted September 21st, 2010. Tagged with python, taggit, release, django, application.

It's been a long time coming, since taggit 0.8 was released in June, however 0.9 is finally here, and it brings a ton of cool bug fixes, improvements, and cleanups. If you don't already know, django-taggit is an application for django to make tagging simple and awesome. The biggest changes in this release are:

  • The addition of a bunch of locales.
  • Support for custom tag models.
  • Moving taggit.contrib.suggest into an external package.
  • Changed the filter syntax from filter(tags__in=["foo"]) to filter(tags__name__in=["foo"]); there's an example after this list. This change is backwards incompatible, but was necessary to support lookups on all fields.
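
For example, on a model with a TaggableManager (the Food model here is just illustrative), the same query before and after 0.9:

# django-taggit 0.8
Food.objects.filter(tags__in=["delicious"])

# django-taggit 0.9
Food.objects.filter(tags__name__in=["delicious"])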

You can check out the docs for complete details. The goals for the 1.0 release are going to be adding some useful widgets for use in the admin and forms, hopefully adding a class-based generic view to replace the current one, and adding a migration command to move data from django-tagging into the django-taggit models.

You can get the latest release on PyPi. Enjoy.

DjangoCon 2010 Slides

Posted September 13th, 2010. Tagged with applications, reusable, django, djangocon, open-source.

DjangoCon 2010 was a total blast this year, and deserves a full recap; however, for now I only have the time to post the slides from my talk on "Rethinking the Reusable Application Paradigm". The video has also been uploaded. Enjoy.

Testing Utilities in Django

Posted July 6th, 2010. Tagged with testing, python, fixtures, tests, django.

Lately I've had the opportunity to do some test-driven development with Django, which a) is awesome, I love testing, and b) means I've been working up a box full of testing utilities, and I figured I'd share them.

Convenient get() and post() methods

If you've done testing of views with Django you probably have some tests that look like:

def test_my_view(self):
    response = self.client.get(reverse("my_url", kwargs={"pk": 1}))

    response = self.client.post(reverse("my_url", kwargs={"pk": 1}), {
        "key": "value",
    })

This was a tad too verbose for my tastes so I wrote:

def get(self, url_name, *args, **kwargs):
    return self.client.get(reverse(url_name, args=args, kwargs=kwargs))

def post(self, url_name, *args, **kwargs):
    data = kwargs.pop("data", None)
    return self.client.post(reverse(url_name, args=args, kwargs=kwargs), data)

Which are used:

def test_my_view(self):
    response = self.get("my_url", pk=1)

    response = self.post("my_url", pk=1, data={
        "key": "value",
    })

Much nicer.

login() wrapper

The next big issue I had was that logging in and out as multiple users was too verbose. I often want to switch between users, either to check different permissions or to test some inter-user workflow. That was solved with a simple context manager:

class login(object):
    def __init__(self, testcase, user, password):
        self.testcase = testcase
        success = testcase.client.login(username=user, password=password)
        self.testcase.assertTrue(
            success,
            "login with username=%r, password=%r failed" % (user, password)
        )

    def __enter__(self):
        pass

    def __exit__(self, *args):
        self.testcase.client.logout()

def login(self, user, password):
    return login(self, user, password)

This is used:

def test_my_view(self):
    with self.login("username", "password"):
        response = self.get("my_url", pk=1)

Again, a lot better.
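
Putting these helpers together, the base test case I end up subclassing looks roughly like this (a sketch; BaseTestCase is just a name I'm using here):

from django.core.urlresolvers import reverse
from django.test import TestCase


class login(object):
    def __init__(self, testcase, user, password):
        self.testcase = testcase
        success = testcase.client.login(username=user, password=password)
        self.testcase.assertTrue(
            success,
            "login with username=%r, password=%r failed" % (user, password)
        )

    def __enter__(self):
        pass

    def __exit__(self, *args):
        self.testcase.client.logout()


class BaseTestCase(TestCase):
    def get(self, url_name, *args, **kwargs):
        return self.client.get(reverse(url_name, args=args, kwargs=kwargs))

    def post(self, url_name, *args, **kwargs):
        data = kwargs.pop("data", None)
        return self.client.post(reverse(url_name, args=args, kwargs=kwargs), data)

    def login(self, user, password):
        # Resolves to the module-level context manager above, not to this method.
        return login(self, user, password)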

django-fixture-generator

Not quite a testing utility, but my app django-fixture-generator has made testing a lot easier for me. Fixtures are useful in getting data to work with, but maintaining them is often a pain, you've got random scripts to generate them, or you just check in some JSON to your repository with no way to regenerate it sanely (say if you add a new field to your model). django-fixture-generator gives you a clean way to manage the code for generating fixtures.

In general I've found context managers are a pretty awesome tool for writing clean, readable, succinct tests. I'm sure I'll have more utilities as I write more tests, hopefully someone finds these useful.

Hey, could someone write this app for me

Posted June 8th, 2010. Tagged with applications, testing, python, fixtures, reusable, django.

While doing some work today I realized that generating fixtures in Django is way too much of a pain in the ass, and I suspect it's a pain in the ass for a lot of other people as well. I also came up with an API I'd kind of like to see for it, unfortunately I don't really have the time to write the whole thing, however I'm hoping someone else does.

The key problem with writing fixtures is that you want to have a clean environment to generate them, and you need to be able to edit them in the future. In addition, I'd personally prefer to have my fixture generation specifically be imperative. I have an API that I think solves all of these concerns.

Essentially, in every application you can have a fixture_gen.py file, which contains a bunch of functions that can generate fixtures:

from fixture_generator import fixture_generator

from my_app.models import Model1, Model2


@fixture_generator(Model1, Model2, requires=["my_app.other_dataset"])
def some_dataset():
    # Some objects get created here

@fixture_generator(Model1)
def other_dataset():
    # Some objects get created here

Basically you have a bunch of functions, each of which is responsible for creating some objects that will become a fixture. You then decorate them with a decorator that specifies what models need to be included in the fixture that results from them, and finally you can optionally specify dependencies (these are necessary because a dependency could use models which your fixture doesn't).

After you have these functions there's a management command which can be invoked to actually generate the fixtures:

$ ./manage.py generate_fixture my_app.some_dataset --format=json --indent=4

Which actually creates the clean database environment, handles the dependencies, calls the functions, and dumps the fixtures to stdout. Then you can redirect that stdout off to a file somewhere, for use in testing or whatever else people use fixtures for.

Hopefully someone else has this problem, and likes the API enough to build this. Failing that I'll try to make some time for it, but no promises when (aka if you want it you should probably build it).

DjangoCon.eu slides

Posted May 24th, 2010. Tagged with python, nosql, django, orm, djangocon.

I just finished giving my talk at DjangoCon.eu on Django and NoSQL (also the topic of my Google Summer of Code project). You can get the slides over at slideshare. My slides from my lightning talk on django-templatetag-sugar are also up on slideshare.

A Tour of the django-taggit Internals

Posted May 9th, 2010. Tagged with taggit, django, python.

In a previous post I talked about a cool new customization API that django-taggit has. Now I'm going to dive into the internals.

The public API is almost exclusively exposed via a class named TaggableManager, you attach one of these to your model and it has some cool tagging APIs. This class basically masquerades as a ManyToManyField, this is how it gets cool things like filtering and forms automatically. If you look at its definition you'll see it has a bunch of attributes that it never actually uses, basically all of these act to emulate the Field interface. This class is also the entry point for the new customization API, exposed via the through parameter. This basically acts as an analogue to the through parameter on actual ManyToManyFields (documented here). The final crucial method is __get__, which turns TaggableManager into a descriptor.

This descriptor exposes a _TaggableManager class, which holds some of the internal logic. This class exposes all of the "managery" type methods: add(), set(), remove(), and clear(). This class is pretty simple, basically it just proxies between the methods called and its through model. This class is, unlike TaggableManager, actually a subclass of models.Manager, it just defines get_query_set() to return a QuerySet of all the tags for that model, or instance, and then filtering, ordering, and more fall out naturally.
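
To make the descriptor relationship concrete, here's a heavily simplified sketch of the shape of these two classes (this is not taggit's actual code, and tags_for here is just a stand-in for the real query logic):

from django.db import models


class TaggableManager(object):
    # Masquerades as a ManyToManyField on the model it's attached to.
    def __init__(self, through):
        self.through = through

    def __get__(self, instance, owner):
        # Descriptor protocol: both Food.tags and some_food.tags hand back a
        # real manager, bound to either the model class or a specific instance.
        return _TaggableManager(self.through, model=owner, instance=instance)


class _TaggableManager(models.Manager):
    # The object that add(), set(), remove(), and clear() actually live on.
    def __init__(self, through, model, instance):
        super(_TaggableManager, self).__init__()
        self.through = through
        self.model = model
        self.instance = instance

    def get_query_set(self):
        # All the tags for this model, narrowed to a single object when we
        # were reached through an instance; filtering and ordering then fall
        # out of normal QuerySet behavior.
        return self.through.tags_for(self.model, self.instance)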

Beyond that there's not too much going on. The code is fairly simple, and it's not particularly long. I've found this to be a pretty good pattern for extensibility, and it really resolves the need to have dozens of parameters, or GenericForeignKeys popping out every which way.

Cool New django-taggit API

Posted May 4th, 2010. Tagged with applications, django, python.

A little while ago I wrote about some of the issues with the reusable application paradigm in Django. Yesterday Carl Meyer pinged me about an issue in django-taggit, it uses an IntegerField for the GenericForeignKey, which is great. Except for when you have a model with a CharField, TextField, or anything else for a primary key. The easy solution is to change the GenericForeignKey to be something else. But that's lame, a pain in the ass, and a hack (more of a hack than a GenericForeignKey in the first place).

The alternate solution we came up with:

from django.db import models

from taggit.managers import TaggableManager
from taggit.models import TaggedItemBase


class TaggedFood(TaggedItemBase):
    content_object = models.ForeignKey('Food')

class Food(models.Model):
    # ... fields here

    tags = TaggableManager(through=TaggedFood)

Custom through models for the taggable relationship! This lets the included GenericForeignKey implementation cater to the common case of integer primary keys, and lets other people provide their own implementations when necessary. Plus it means things like adding a ForeignKey to auth.User, or storing the "originally" typed version of the tag (for systems where tags are normalized), become possible; there's a sketch of that below.
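
For instance, extending the TaggedFood model above, the through model can carry extra data about the relationship itself (a sketch; the extra field names are just illustrative):

from django.db import models

from taggit.models import TaggedItemBase


class TaggedFood(TaggedItemBase):
    content_object = models.ForeignKey('Food')
    # Extra data about the tagging itself: who applied the tag, and the exact
    # text they typed (useful when the Tag model normalizes names).
    tagged_by = models.ForeignKey('auth.User', null=True)
    raw_tag = models.CharField(max_length=100, blank=True)

The Food model stays exactly as above; it just points at this richer through model via through=TaggedFood.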

In addition I've finally added some docs, they aren't really complete, but they're a start. I'm planning a release for sometime next week, unless some major issue pops up.

Making Django and PyPy Play Nice (Part 1)

Posted April 16th, 2010. Tagged with pypy, django, python.

If you track Django's commits aggressively (ok so just me...), you may have noticed that there have been a number of commits to improve the compatibility of Django and PyPy in the last couple of days. In the run up to Django 1.0 there were a ton of commits to make sure Django didn't have too many assumptions that it was running on CPython, and that it could work with systems like Jython and PyPy. Unfortunately, since then, our support has lapsed a little, and a number of tests have begun to fail. In the past couple of days I've been working to correct this. Here are some of the things that were wrong.

The first issue I ran into was, in various tests, response.context and response.template being None, instead of the lists that were expected. This was a pain to diagnose, but the ultimate source of the bug is that Django registers a signal handler in the test client that listens for templates being rendered. However, it doesn't actually unregister that signal receiver. Instead it relies on the fact that signals are stored as weakrefs, and when the function ends, the receivers that were registered (which were local variables) would be automatically deallocated. On PyPy, Jython, and any other system with a garbage collector more advanced than CPython's reference counting, the local variables aren't guaranteed to be deallocated at the end of the function, and therefore the weakref can still be alive. Truthfully, I'm not 100% sure how this results in the next signal that's sent not storing the appropriate data, but it does. The solution is to make sure that the signals are manually disconnected at the end of the run. This was fixed in r12964 of Django.
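
The pattern behind the fix is simple enough. A minimal sketch of it (this is not the actual test client code; run_with_template_tracking is just an illustrative helper):

from django.test.signals import template_rendered


def run_with_template_tracking(make_request):
    # Record every template rendered while make_request() runs.
    rendered = []

    def receiver(sender, template, context, **kwargs):
        rendered.append((template, context))

    template_rendered.connect(receiver)
    try:
        response = make_request()
    finally:
        # The important part: explicitly disconnect the receiver. Relying on
        # the weakref dying when the local variable goes out of scope only
        # works under CPython's reference counting, not under PyPy or Jython.
        template_rendered.disconnect(receiver)
    return response, rendered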

The next issue was actually a problem in PyPy, specifically it was crashing with a UnicodeDecodeError. When I say crashing I mean crashing in the C sense of the word, not the Python, nice exception and stack trace sense... sort of. PyPy is written in a language named RPython, RPython is Pythonesque (all valid RPython is valid Python), and has exceptions. However if they aren't caught they sort of propagate to the top, they give you a kind-of-ok stacktrace, but its function names are all the generated function names from the C source, not useful ones from the RPython source. Internally, PyPy uses an OperationError to keep track of exceptions at the interpreter level. A trick to debugging RPython is, if running the code on top of CPython works, then running it translated to C will work, and the contrapositive appears true as well, if the C doesn't work, running on CPython won't work. After trying to run the code on CPython, the location of the exception bubbled right to the top, and the fix followed easily.

These are the first two issues I fixed, a couple others have been fixed and committed, and a further few have also been fixed, but not committed yet. I'll be writing about those as I find the time.

Designer Developer Relations

Posted March 29th, 2010. Tagged with design, html, django.

Lately I've been working with quite a few designers so I've been thinking about what exactly constitutes the ideal working relationship. Here are some of the models I've seen, in order of best to worst.

Best

The ideal relationship would be: I write a view function, let the designer know what template they need to write and what context is available, and then they can ask me if they need any extra logic made available. At the end I can go and add in whatever Javascript is needed. This requires the designer to know the template language, and ideally be able to at least read model definitions so they know what properties are available to them.

Good

The designer provides final templates (in the template syntax, with inheritance, blocks, etc.), with example data, and I plug some template syntax in there to wire it up with the real data. This is pretty convenient; the one downside is that I, the developer, have to know things like "do we truncate the values here", "do we need an ellipsis", or "if we only have 4 values do we have 2 columns with 3 and 1 values, or 2 columns with 2 and 2 values". This model requires whatever HTML/CSS the designer provides to cover all the scenarios, or the developer will be running back and forth asking questions all day.

Ok

The designer provides some HTML/CSS files. For these I have to convert them to the template syntax and substitute in the real values. Here I need to know a lot about the semantics of the markup to figure out what belongs in what blocks, how page elements are used across different pages, etc. For this reason it's not ideal, but it can work, and it requires minimal knowledge about the programming language and tools used on the part of your designers.

Bad

Anything that involves me writing HTML or CSS. I'm bad at those.

These are the working relationships I've experienced so far; the common theme is that the more your designer knows about your tools and languages the better (HTML is good, Django templates are better, being able to actually read a model definition is best).


Towards Application Objects in Django

Posted March 28th, 2010. Tagged with applications, reusable, django, python.

Django's application paradigm (and the accompanying reusable application environment) have served it exceptionally well; however, there are a few well known problems with it. Chief among these are pain in extensibility (as exemplified by the User model) and abuse of GenericForeignKeys where a true ForeignKey would suffice (in the name of being generic). There are also smaller issues, such as wanting to install the same application multiple times, or having applications with the same "label" (in Django parlance this means path.split(".")[-1]). Lately I've been thinking that the solution to these problems is a more holistic approach to application construction.

It's a little difficult to describe precisely what I'm thinking about, so I'll start with an example:

from django.contrib.auth import models as auth_models

class AuthApplication(Application):
    models = auth_models

    def login(self, request, template_name='registration/login.html'):
        pass

    # ... etc

And in settings.py:

from django.core import app

INSTALLED_APPS = [
    app("django.contrib.auth.AuthApplication", label="auth"),
]

The critical elements are that a) all models are referred to by an attribute on the class, so that they can be swapped out by a subclass, and b) applications are now installed using an app object that wraps the app class, with a label (to allow multiple apps of the same name to be registered). But how does this allow swapping out the User model, from the perspective of people who are expecting to just be able to use django.contrib.auth.models.User for any purpose? Instead of explicit references to the model, these could be replaced with get_app("auth").models.User.
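
To make that concrete, here's a rough sketch, entirely hypothetical and building on the imaginary Application/app/get_app API above, of how a project could swap in its own User model:

# myproject/auth_app.py -- hypothetical, extending the sketch above
from django.contrib.auth import AuthApplication

from myproject import auth_models   # a module defining a custom User

class MyAuthApplication(AuthApplication):
    models = auth_models

# settings.py
from django.core import app

INSTALLED_APPS = [
    app("myproject.auth_app.MyAuthApplication", label="auth"),
]

# Reusable apps never import the model directly; instead they write
#     get_app("auth").models.User
# and transparently pick up whatever model is installed under that label.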

What about the issue of GenericForeignKeys? To solve this we'd really need something like C++'s templates or Java's generics, but we'll settle for the next best thing: callables! Imagine a comment app where the models.py looked like:

from django.core import get_app
from django.db import models

def get_models(target_model):
    class Comment(models.Model):
        obj = models.ForeignKey(target_model)
        commenter = models.ForeignKey(get_app("auth").models.User)
        text = models.TextField()

    return [Comment]

Then, instead of providing a module as the models attribute on the application class, this callable would be provided, and Django would know to call it with the appropriate model class based on either a class attribute (for subclasses) or a parameter from the app object (to allow for easily installing more than one copy of the comment app, one for each object that should allow commenting). In practice I think allowing the same app to be installed multiple times would require some extra parameters to the get_models function, so that things like db_table can be adjusted appropriately.

I think this could be done in a backwards compatible manner, by having strings in INSTALLED_APPS automatically generate a default "filler" app object with just a models module, views that ignore self, and a default label. Like I said, this is all just a set of ideas floating around my brain at this point, but hopefully by floating this design it'll get people thinking about big architecture ideas like this.


Committer Models of Unladen Swallow, PyPy, and Django

Posted February 25th, 2010. Tagged with pypy, python, unladen-swallow, django, open-source.

During this year's PyCon I became a committer on both PyPy and Unladen Swallow; in addition, I've been a contributor to Django for quite a long time (as well as having commit privileges to my branch during the Google Summer of Code). One of the things I've observed is the very different models these projects have for granting commit privileges, and what the expectations and responsibilities are for committers.

Unladen Swallow

Unladen Swallow is a Google funded branch of CPython focused on speed. One of the things I've found is that the developers of this project carry over some of the development process from Google, specifically doing code review on every patch. All patches are posted to Rietveld, and reviewed, often by multiple people in the case of large patches, before being committed. Because there is a high level of review it is possible to grant commit privileges to people without requiring perfection in their patches; as long as they follow the review process, the project is well insulated against a bad patch.

PyPy

PyPy is also an implementation of Python; however, its development model is based largely around aggressive branching (I've never seen a project handle SVN's branching failures as well as PyPy) as well as sprints and pair programming. By branching aggressively PyPy avoids the overhead of reviewing every single patch, and instead only requires review when something is already believed to be "trunk-ready"; further, this model encourages experimentation (in the same way git's lightweight branches do). PyPy's use of sprints and pair programming are two ways to avoid formal code reviews and instead approach code quality as more of a collaborative effort.

Django

Django is the project I've been involved with for the longest, and also the only one I don't have commit privileges on. Django is extremely conservative in giving out commit privileges (there are about a dozen Django committers, and about 500 names in the AUTHORS file). Django's development model is based neither on branching (only changes as large in scope as multiple database support, or an admin UI refactor, get their own branch) nor on code review (most commits are reviewed by no one besides the person who commits them). Django's committers maintain a level of autonomy that isn't seen in either of the other two projects. This comes from the period before Django 1.0 was released, when Django's trunk was often used in production and needed to stay stable at all times, combined with the fact that Django has no paid developers who can guarantee time to do code review on patches. Therefore Django has maintained code quality by being extremely conservative in granting commit privileges and allowing developers with commit privileges to exercise their own judgment at all times.

Conclusion

Each of these projects uses different methods for maintaining code quality, and all seem to be successful in doing so. It's not clear whether there's any one model that's better than the others, or that any of these projects could work with another's model. Lastly, it's worth noting that all of these models are fairly orthogonal to the centralized VCS vs. DVCS debate which often surrounds such discussions.


Hot Django on WSGI Action (announcing django-wsgi)

Posted January 11th, 2010. Tagged with release, django, python, wsgi.

For a long time it's been possible to deploy a Django project under a WSGI server, and in the run up to Django's 1.0 release a number of bugs were fixed to make Django's WSGI handler as compliant with the standard as possible. However, Django's support for interacting with the world of WSGI applications, middleware, and frameworks has been less than stellar. I recently got inspired to improve this situation.

WSGI (Web Server Gateway Interface) is a specification for how applications and frameworks in Python can interface with a server. There are tons of servers that support the WSGI interface, most notably mod_wsgi (an Apache plugin), but there are plenty of others: Spawning, Twisted, uWSGI, Gunicorn, CherryPy, and probably dozens more.

The inspiration for improving Django's integration with the WSGI world was Ruby on Rails 3's improved support for Rack; Rack is the Ruby world's equivalent to WSGI. In Rails 3 every layer of the stack (the entire application, the dispatch, and individual controllers) is exposed as a Rack application. It occurred to me that it would be pretty swell if we could do the same thing with Django: allow individual views and URLConfs to be exposed as WSGI applications, and the reverse, allowing WSGI applications to be deployed inside of Django applications (via the standard URLConf mapping system). Another part of this inspiration was discussing gunicorn with Eric Florenzano; gunicorn is an awesome new WSGI server, inspired by Ruby's Unicorn, and there's not enough space in this post to cover all the reasons it is awesome, but it is.

The end result of this is a new package, django-wsgi, which aims to bridge the gap between the WSGI world and the Django world. Here's an example of exposing a Django view as a WSGI application:

from django.http import HttpResponse

from django_wsgi import wsgi_application

def my_view(request):
    return HttpResponse("Hello world, I'm a WSGI application!")


application = wsgi_application(my_view)

And now you can point any WSGI server at this and it'll serve it up for you. You can do the same thing with a URLConf:

from django.conf.urls.defaults import patterns
from django.http import HttpResponse

from django_wsgi import wsgi_application

def hello_world(request):
    return HttpResponse("Hello world!")

def hello_you(request, name):
    return HttpResponse("Hello %s!" % name)


urls = patterns("",
    (r"^$", hello_world),
    (r"^(?P<name>\w+)/$", hello_you)
)

application = wsgi_application(urls)

Again, all you need to do is point your server at this and it just works. However, the point of all this isn't just to make building single file applications easier (although it definitely does that); the real win is that you can take a Django application and mount it inside of another WSGI application through whatever process it supports. Of course you can also go the other direction and mount a WSGI application inside of a Django URLConf:

from django.conf.urls.defaults import *

from django_wsgi import django_view

def my_wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["Hello World!"]

urlpatterns = patterns("",
    # other views here
    url("^my_view/$", django_view(my_wsgi_app))
)

And that's all there is to it. Write your apps the way you want and deploy them, plug them into each other, whatever. There's a lot of work being done in the Django world to play nicer with the rest of the Python ecosystem, and that's definitely a good thing. I'd also like to thank Armin Ronacher for helping me make sure this actually implements WSGI correctly. Please use this, fork it, send me hate mail, improve it, and enjoy it!
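
If you want to kick the tires on any of these examples without setting up a full server, the standard library's wsgiref server is enough (just a quick sketch; any WSGI server will do):

from wsgiref.simple_server import make_server

# "application" is any of the WSGI callables built above
httpd = make_server("127.0.0.1", 8000, application)
httpd.serve_forever()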


You Built a Metaclass for *what*?

Posted November 30th, 2009. Tagged with c++, django, python, metaclass.

Recently I had a bit of an interesting problem: I needed to define a way to represent a C++ API in Python. I figured the best way to represent that was one class in Python for each class in C++, with a functions dictionary to track each of the methods on each class. Seems simple enough, right? Do something like this:

class String(object):
    functions = {
        "size": Function(Integer, []),
    }

We've got a String class with a functions dictionary that maps method names to Function objects. The Function constructor takes a return type and a list of arguments. Unfortunately we run into a problem when we want to do something like this:

class String(object):
    functions = {
        "size": Function(Integer, []),
        "append": Function(None, [String])
    }

If we try to run this code we're going to get a NameError: String isn't defined yet. Django models have a similar issue, with recursive foreign keys. Django's solution is to use the placeholder string "self", and have a metaclass translate it into the right class. Having a slightly more declarative API would also be nice, so something like this:

class String(DeclarativeObject):
    size = Function(Integer, [])
    append = Function(None, ["self"])

So now that we have a nice pretty API we need our metaclass to make it happen:

RECURSIVE_TYPE_CONSTANT = "self"

class DeclarativeObjectMetaclass(type):
    def __new__(cls, name, bases, attrs):
        functions = dict([(n, attr) for n, attr in attrs.iteritems()
            if isinstance(attr, Function)])
        for attr in functions:
            attrs.pop(attr)
        new_cls = super(DeclarativeObjectMetaclass, cls).__new__(cls, name, bases, attrs)
        new_cls.functions = {}
        for name, function in functions.iteritems():
            if function.return_type == RECURSIVE_TYPE_CONSTANT:
                function.return_type = new_cls
            for i, argument in enumerate(function.arguments):
                if argument == RECURSIVE_TYPE_CONSTANT:
                    function.arguments[i] = new_cls
            new_cls.functions[name] = function
        return new_cls

class DeclarativeObject(object):
    __metaclass__ = DeclarativeObjectMetaclass

And that's all there is to it. We take each of the functions on the class out of the attributes, create a normal class instance without the functions, and then do the replacements on the function objects and stick them in a functions dictionary.
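
To see the substitution happen, here's a small usage sketch; the Function class below is a minimal stand-in I'm assuming, consistent with how the metaclass uses return_type and arguments:

class Integer(object):
    pass

class Function(object):
    # minimal assumed implementation: just stores the two attributes
    # the metaclass rewrites
    def __init__(self, return_type, arguments):
        self.return_type = return_type
        self.arguments = arguments

class String(DeclarativeObject):
    size = Function(Integer, [])
    append = Function(None, ["self"])

print String.functions["size"].return_type is Integer    # True
print String.functions["append"].arguments[0] is String  # True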

Simple patterns like this can be used to build beautiful APIs, as is seen in Django with the models and forms API.


Getting Started with Testing in Django

Posted November 29th, 2009. Tagged with tests, django, python.

Following yesterday's post, another hotly requested topic was testing in Django. Today I wanted to give a simple overview of how to get started writing tests for your Django applications. Since Django 1.1, Django has automatically provided a tests.py file when you create a new application, so that's where we'll start.

For me the first thing I want to test with my applications is, "Do the views work?". This makes sense: the views are what the user sees, and they need to at least be in a working state (200 OK response) before anything else can happen (business logic). So the most basic thing you can do to start testing is something like this:

from django.test import TestCase

class MyTests(TestCase):
    def test_views(self):
        response = self.client.get("/my/url/")
        self.assertEqual(response.status_code, 200)

By just making sure you run this code before you commit something you've already eliminated a bunch of errors: syntax errors in your URLs or views, typos, forgotten imports, etc. The next thing I like to test is making sure that all the branches of my code are covered. The most common place my views have branches is in views that handle forms: one branch for GET and one for POST. So I'll write a test like this:

from django.test import TestCase

class MyTests(TestCase):
    def test_forms(self):
        response = self.client.get("/my/form/")
        self.assertEqual(response.status_code, 200)

        response = self.client.post("/my/form/", {"data": "value"})
        self.assertEqual(response.status_code, 302) # Redirect on form success

        response = self.client.post("/my/form/", {})
        self.assertEqual(response.status_code, 200) # we get our page back with an error

Now I've tested both the GET and POST conditions on this view, as well as the form-is-valid and form-is-invalid cases. With this strategy you can have a good base set of tests for any application without a lot of work. The next step is setting up tests for your business logic. These are a little more complicated: you need to make sure models are created and edited in the right cases, emails are sent in the right places, etc. Django's testing documentation is a great place to read more on writing tests for your applications.
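
As a sketch of what one of those business logic tests might look like (the Contact model and the URL here are made up for illustration):

from django.core import mail
from django.test import TestCase

from myapp.models import Contact   # hypothetical model

class BusinessLogicTests(TestCase):
    def test_contact_form_creates_object_and_sends_mail(self):
        response = self.client.post("/my/form/", {"data": "value"})
        self.assertEqual(response.status_code, 302)
        # the form handler should have created exactly one Contact
        self.assertEqual(Contact.objects.count(), 1)
        # Django captures outgoing mail during tests in mail.outbox
        self.assertEqual(len(mail.outbox), 1)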


Django and Python 3

Posted November 28th, 2009. Tagged with django, python.

Today I'm starting off doing some of the posts people want to see, and the number one item on that list is Django and Python 3. Python 3 has been out for about a year at this point, and so far Django hasn't really started to move towards it (at least at first glance). However, Django has already begun the long process of moving to Python 3; this post is going to recap exactly what Django's migration strategy is (most of this post is a recap of a message James Bennett sent to the django-developers mailing list after the 1.0 release, available here).

One of the most important things to recognize here is that though there are many developers using Django for smaller projects, or new projects that they want to start on Python 3, there are also a great many more with legacy (as if we can call recent deployments on Python 2.6 and Django 1.1 legacy) deployments that they want to maintain and update. Further, Django's latest release, 1.1, has support for Python releases as old as 2.3, and a migration to Python 3 from 2.3 is nontrivial. However, it is significantly easier to make this migration from Python 2.6. This is the crux of James's plan: people want to move to Python 3, and moving towards Python 2.6 makes this easier for them and for us. Therefore, since the 1.1 release Django has been removing support for one point version of Python per Django release. So, Django 1.1 will be the last release to support Python 2.3, 1.2 will be the last to support 2.4, etc. This plan isn't guaranteed; if there's a compelling reason to maintain support for a version for longer it will likely override this plan (for example, if a particularly common deployment platform only offered Python 2.5, removing support for it might be delayed an additional release).

At the end of this process Django is going to end up only supporting Python 2.6. At this point (or maybe even before), a strategy will need to be devised for how to actually handle the switch. Some possibilities are: 1) having an official breakpoint, where only one version is supported at a given time; 2) Python 3 support begins in a branch that tracks trunk and eventually switches to become trunk once Python 3 is the more common deployment; 3) Python 2.6 and 3 are supported from a single codebase. I'm not sure which one of these is easiest; other projects such as PLY have chosen to go with option 3, however my inclination is that option 2 will be best for Django, since issues like bytes vs. string are particularly prominent in Django (since it talks to so many external data sources).

For people who are interested, Martin von Löwis actually put together a patch that, at the time, gave Django Python 3 support (at least enough to run the tutorial under SQLite). If you're very interested in Django on Python 3, the best path would probably be to bring that patch up to date (unless it's wildly out of date, I haven't checked), and to start fixing new things that have been introduced since the patch was written. This work isn't likely to get any official support, since maintaining both Python 2.4 support and Python 3 support would be far too difficult, however there's no reason you can't maintain the patch externally on something like GitHub or Bitbucket.


Why Meta.using was removed

Posted November 27th, 2009. Tagged with python, models, django, orm, gsoc.

Recently Russell Keith-Magee and I decided that the Meta.using option needed to be removed from the multiple-db work on Django, and so we did. Yesterday someone tweeted that this change caught them off guard, so I wanted to provide a bit of explanation as to why we made that change.

The first thing to note is that Meta.using was very good for one specific use case: horizontal partitioning by model. Meta.using allowed you to tie a specific model to a specific database by default. This meant that if you wanted to do things like have users in one DB and votes in another, this was basically trivial. Making this use case this simple was definitely a good thing.

The downside was that this solution was very poorly designed, particularly in light of Django's reusable application philosophy. Django emphasizes the reusability of applications, and the Meta.using option tied your partitioning logic to your models; it also meant that if you wanted to partition a reusable application onto another DB, the solution was to go in and edit the source for the reusable application. Because of this we had to go in search of a better solution.

The better solution we've come up with is having some sort of callback you can define that lets you decide what database each query should be executed on. This would let you do simple things like direct all queries on a given model to a specific database, as well as more complex sharding logic like sending queries to the right database depending on which primary key value the lookup is by. We haven't figured out the exact API for this, and as such this probably won't land in time for 1.2, however it's better to have the right solution that has to wait than to implement a bad API that would become deprecated in the very next release.
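
Purely as illustration (the real API hadn't been designed at this point, so every name here is made up), such a callback might look something like:

def db_for_query(model, **hints):
    # illustrative only: users live on one database, votes are sharded
    # by primary key parity, everything else uses the default
    if model.__name__ == "User":
        return "users"
    if model.__name__ == "Vote" and "pk" in hints:
        return "votes_%d" % (hints["pk"] % 2)
    return "default"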


Filing a Good Ticket

Posted November 24th, 2009. Tagged with django, python, software.

I read just about every single ticket that's filed in Django's trac, and at this point I've gotten a pretty good sense of what (subjectively) makes a useful ticket. Specifically, there are a few things that can make your ticket no better than spam, and a few that can instantly bump your ticket to the top of my "TODO" list. Hopefully these will be helpful both in filing tickets for Django and for other open source projects.

  • Search for a ticket before filing a new one. Django's trac, for example, has at least 10 tickets describing "Decoupling urls in the tutorial, part 3". These have all been wontfixed (or closed as a duplicate of one of the others). Each time one of these is filed it takes time for someone to read through it, write up an appropriate closing message, and close it. Of course, the creator of the ticket also invested time in filing the ticket. Unfortunately, for both parties this is time that could be better spent doing just about anything else, as the ticket has been decisively dealt with plenty of times.
  • On a related note, please don't reopen a ticket that's been closed before. This one depends more on the policy of the project, in Django's case the policy is that once a ticket has been closed by a core developer the appropriate next step is to start a discussion on the development mailing list. Again this results in some wasted time for everyone, which sucks.
  • Read the contributing documentation. Not every project has something like this, but when a project does it's definitely the right starting point. It will hopefully contain useful general bits of knowledge (like what I'm trying to put here) as well as project specific details, what the processes are, how to dispute a decision, how to check the status of a patch, etc.
  • Provide a minimal test case. If I see a ticket whose description involves a 30-field model, it drops a few rungs on my TODO list. Large blocks of code like this take more time to wrap one's head around, and most of it will be superfluous. If I see just a few lines of code it takes way less time to understand, and it will be easier to spot the origin of the problem. As an extension to this, if the test case comes in the form of a patch to Django's test suite it becomes even easier for a developer to dive into the problem.
  • Don't file a ticket advocating a major feature or sweeping change. Pretty much if it's going to require a discussion the right place to start is the mailing list. Trac is lousy at facilitating discussions, mailing lists are designed explicitly for that purpose. A discussion on the mailing list can more clearly outline what needs to happen, and it may turn out that several tickets are needed. For example filing a ticket saying, "Add CouchDB support to the ORM" is pretty useless, this requires a huge amount of underlying changes to make it even possible, and after that a database backend can live external to Django, so there's plenty of design decisions to go around.

These are some of the issues I've found to be most pressing while reviewing tickets for Django. I realize they are mostly in the "don't" category, but filing a good ticket can sometimes be as simple as clearly stating what the problem is, and how to reproduce it.


A Bit of Benchmarking

Posted November 22nd, 2009. Tagged with pypy, python, django, compiler, programming-languages.

PyPy recently posted some interesting benchmarks from the computer language shootout, and in my last post about Unladen Swallow I described a patch that would hopefully be landing soon. I decided it would be interesting to benchmark something with this patch. For this I used James Tauber's Mandelbulb application, at both 100x100 and 200x200. I tested CPython, Unladen Swallow trunk, Unladen Swallow trunk with the patch, and a recent PyPy trunk (compiled with the JIT). My results were as follows:

VM                               100x100    200x200
CPython 2.6.4                    17s        64s
Unladen Swallow trunk            16s        52s
Unladen Swallow trunk + patch    13s        49s
PyPy trunk                       10s        46s

Interesting results. At 100x100 PyPy smokes everything else, and the patch shows a clear benefit for Unladen. However, at 200x200 both PyPy and the patch show diminishing returns. I'm not clear on why this is, but my guess is that something about the increased size causes a change in the parameters that makes the generated code less efficient for some reason.

It's important to note that Unladen Swallow has been far less focused on numeric benchmarks than PyPy, instead focusing on more web app concerns (like template languages). I plan to benchmark some of these as time goes on, particularly after PyPy merges their "faster-raise" branch, which I'm told improves PyPy's performance on Django's template language dramatically.


Things College Taught me that the "Real World" Didn't

Posted November 21st, 2009. Tagged with pypy, parse, python, unladen-swallow, compile, django, ply, programming-languages, c++, response, lex, compiler, yacc, college.

A while ago Eric Holscher blogged about things he didn't learn in college. I'm going to take a different spin on it, looking both at things that I did learn in school that I wouldn't have learned elsewhere (henceforth defined as my job, or open source programming), as well as things I learned elsewhere instead of at college.

Things I learned in college:

  • Big O notation, and algorithm analysis. This is the biggest one; I've had little cause to consider this in my open source or professional work, stuff is either fast or slow and that's usually enough. Learning rigorous algorithm analysis doesn't come up all the time, but every once in a while it pops up, and it's handy.
  • C++. I imagine that I eventually would have learned it myself, but my impetus to learn it was that's what was used for my CS2 class, so I started learning with the class then dove in head first. Left to my own devices I may very well have stayed in Python/Javascript land.
  • Finite automata and pushdown automata. I actually did lexing and parsing before I ever started looking at these in class (see my blog posts from a year ago) using PLY, however, this semester I've actually been learning about the implementation of these things (although sadly for class projects we've been using Lex/Yacc).

Things I learned in the real world:

  • Compilers. I've learned everything I know about compilers from reading papers out of my own interest and hanging around communities like Unladen Swallow and PyPy (and even contributing a little).
  • Scalability. Interestingly, this is a concept related to algorithm analysis/big O; however, this is something I've really learned from talking about this stuff with guys like Mike Malone and Joe Stump.
  • APIs, Documentation. These are the core of software development (in my opinion), and I've definitely learned these skills in the open source world. You don't know what a good API or documentation is until it's been used by someone you've never met and it just works for them, and they can understand it perfectly. One of the few required, advanced courses at my school is titled, "Software Design and Documentation" and I'm deathly afraid it's going to waste my time with stuff like UML, instead of focusing on how to write APIs that people want to use and documentation that people want to read.

So these are my short lists. I've tried to highlight items that cross the boundaries between what people traditionally expect to be topics for school and topics for the real world. I'd be curious to hear what other people's experiences with topics like these are.


Another Pair of Unladen Swallow Optimizations

Posted November 19th, 2009. Tagged with pypy, python, unladen-swallow, django, programming-languages.

Today a patch of mine was committed to Unladen Swallow. In the past weeks I've described some of the optimizations that have gone into Unladen Swallow; specifically I looked at removing the allocation of an argument tuple for C functions. One of the "on the horizon" things I mentioned was extending this to functions with a variable arity (that is, the number of arguments they take can change). This has been implemented for functions that take a finite range of argument numbers (that is, they don't take *args, they just have a few arguments with defaults). This support was used to optimize a number of builtin functions (dict.get, list.pop, getattr for example).

However, there were still a number of functions that hadn't been updated for this support. I initially started porting any functions I saw, but it wasn't a totally mechanical translation, so I decided to do a little profiling to better direct my efforts. I started by using the cProfile module to see what functions were called most frequently in Unladen Swallow's Django template benchmark. Imagine my surprise when I saw that unicode.encode was called over 300,000 times! A quick look at that function showed that it was a perfect contender for this optimization: it was currently designated as METH_VARARGS, but in fact its argument count was a finite range. After about a dozen lines of code to change the argument parsing, I ran the benchmark again, comparing it to a control version of Unladen Swallow, and it showed a consistent 3-6% speedup on the Django benchmark. Not bad for 30 minutes of work.
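
The profiling itself was nothing fancy; roughly something like this (the benchmark entry point named here is made up):

import cProfile
import pstats

# run the template benchmark under the profiler and dump the stats
cProfile.run("run_django_template_benchmark()", "template_bench.prof")

# then sort by call count to find the most frequently called functions
stats = pstats.Stats("template_bench.prof")
stats.sort_stats("calls").print_stats(20)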

Another optimization I want to look at, which hasn't landed yet, is one that optimizes various operations. Right now Unladen Swallow tracks various data about the types seen in the interpreter loop, however for various operators this data isn't actually used. What this patch does is check at JIT compilation time whether the operator site is monomorphic (that is, there is only one pair of types ever seen there), and if it is, and it is one of a few pairings that we have optimizations for (int + int, list[int], float - float for example), then optimized code is emitted. This optimized code checks that the types of both arguments are the expected ones; if they are then the optimized code is executed, otherwise the VM bails back to the interpreter (various literature has shown that a single compiled optimized path is better than compiling both the fast and slow paths). For simple algorithmic code this optimization can show huge improvements.

The PyPy project has recently blogged about the results of some benchmarks from the Computer Language Shootout run on PyPy, Unladen Swallow, and CPython. In these benchmarks Unladen Swallow showed that for highly algorithmic code (read: mathy) it could use some work; hopefully patches like this can help improve the situation markedly. Once this patch lands I'm going to rerun these benchmarks to see how Unladen Swallow improves, and I'm also going to add in some of the more macro benchmarks Unladen Swallow uses to see how it compares with PyPy on those. Either way, seeing the tremendous improvements PyPy and Unladen Swallow have over CPython gives me tremendous hope for the future.


Announcing django-admin-histograms

Posted November 19th, 2009. Tagged with admin, release, django, python.

This is just a quick post because it's already past midnight here. Last week I realized I had some potentially useful code that I had extracted from the DjangoDose and typewar codebases. Basically this code lets you get simple histogram reports for models in the admin, grouped by a date field. After I released it David Cramer did some work to make the code slightly more flexible, and to provide a generic templatetag for creating these histograms anywhere. The code can be found on GitHub, and if you're curious what it looks like there's a screenshot on the wiki. Enjoy.


My Next Blog

Posted November 16th, 2009. Tagged with django, blogging.

I've been using this Blogspot powered blog for over a year now, and it is starting to get a little long in the tooth. Therefore I've been planning on moving to a new, shiny blog of my own in the coming months. Specifically, it's going to be my goal to get myself 100% migrated over my winter break. To that end I've started building myself a list of features I want:

  • Able to migrate all my old posts. This isn't going to be a huge deal, but it has to happen as I have quite a few posts I don't want to lose.
  • Accepts reStructuredText. I'm sick of writing my posts in reST and then converting them into HTML for Blogspot.
  • Pretty code highlighting.
  • Disqus for comments. I don't want to have to deal with spam or anything else, let users post with Disqus and they can deal with all the trouble.
  • Looks decent. Design is a huge weak point for me, so most of my time is going to be dedicated to this I think.
  • Django powered!

That's my pony list thus far. There's a good bet I'll use django-mingus since I've heard such good things about it, but for now I'm just dreaming of being able to write these posts in reST.


Why jQuery shouldn't be in the admin

Posted November 14th, 2009. Tagged with python, admin, jquery, django, gsoc.

This summer, as a part of the Google Summer of Code program, Zain Memon worked on improving the UI for Django's admin; specifically, he integrated jQuery for various interface improvements. I am opposed to including jQuery in Django's admin, and as far as I know I'm the only one. I should note that on a personal level I love jQuery; however, I don't think that means it should be included in Django proper. I'm going to try to explain why I think it's a bad idea and possibly even convince you.

The primary reason I'm opposed is because it lowers the pool of people who can contribute to developing Django's admin. I can hear the shouts from the audience, "jQuery makes Javascript easy, how can it LOWER the pool". By using jQuery we prevent people who know Javascript, but not jQuery from contributing to Django's admin. If we use more "raw" Javascript then anyone who knows jQuery should be able to contribute, as well as anyone who knows Mootools, or Dojo, or just vanilla Javascript. I'm sure there are some people who will say, "but it's possible to use jQuery without knowing Javascript", I submit to you that this is a bad thing and certainly shouldn't be encouraged. We need to look no further than Jacob Kaplan-Moss's talks on Django where he speaks of his concern at job postings that look for Django experience with no mention of Python.

The other reason I'm opposed is because selecting jQuery for the admin gives the impression that Django has a blessed Javascript toolkit. I'm normally one to say, "if people make incorrect inferences that's their own damned problem," however in this case I think they would be 100% correct, Django would have blessed a Javascript toolkit. Once again I can hear the calls, "But, it's in contrib, not Django core", and again I disagree, Django's contrib isn't like other projects' contrib directories that are just a dumping ground for random user contributed scripts and other half working features. Django's contrib is every bit as official as parts of Django that live elsewhere in the source tree. Jacob Kaplan-Moss has described what django.contrib is, no part of that description involves it being less official, quite the opposite in fact.

For these reasons I believe Django's admin should avoid selecting a Javascript toolkit, and instead maintain its own handrolled code. Though this brings an increased burden on developers, I believe it is more important to uphold these philosophies than to take small development wins. People saying this stymies the admin's development should note that Django's admin UI has changed only minimally over the past years, and only a small fraction of that can be attributed to difficulties in Javascript development.


When Django Fails? (A response)

Posted November 11th, 2009. Tagged with django, python, response, rails.

I saw an article on reddit (or was it hacker news?) that asked the question: what happens when newbies make typos following the Rails tutorial, and how good of a job does Rails do at giving useful error messages? I decided it would be interesting to apply the same question to Django and see what the results are. I didn't have the time to review the entire Django tutorial, so instead I'm going to make the same mistakes the author of that article did and see what the results are; I've only done the first few where the analogs in Django were clear.

Mistake #1: Point a URL at a non-existent view:

I pointed a URL at the view "django_fails.views.homme" when it should have been "home". Let's see what the error is:

ViewDoesNotExist at /
Tried homme in module django_fails.views. Error was: 'module' object has no attribute 'homme'

So the exception name is definitely a good start, combined with the error text I think it's pretty clear that the view doesn't exist.

Mistake #2: misspell url in the mapping file

Instead of doing url("^$" ...) I did urll:

NameError at /
name 'urll' is not defined

The error is a normal Python exception, which for a Python programmer is probably decently helpful. The icing on the cake is that if you look at the traceback it points to the exact line, in user code, that has the typo, which is exactly what you need.

Mistake #3: Linking to non-existent pages

I created a template and tried to use the {% url %} tag on a nonexistent view.

TemplateSyntaxError at /
Caught an exception while rendering: Reverse for 'homme' with arguments '()' and keyword arguments '{}' not found.

It points me at the exact line of the template that's giving me the error and it says that the reverse wasn't found. It seems pretty clear to me, but it's been a while since I was new, so perhaps a new user's perspective on an error like this would be important.

It seems clear to me that Django does a pretty good job of providing useful exceptions; in particular the tracebacks on template-specific exceptions can show you where in your templates the errors are. One issue I'll note, that I've experienced in my own work, is that when you have an exception from within a templatetag it's hard to get the Python-level traceback, which is important when you are debugging your own templatetags. However, there's a ticket that's been filed for that in Django's trac.


The State of MultiDB (in Django)

Posted November 10th, 2009. Tagged with python, models, django, gsoc, internals, orm.

As you, the reader, may know this summer I worked for the Django Software Foundation via the Google Summer of Code program. My task was to implement multiple database support for Django. Assisting me in this task were my mentors Russell Keith-Magee and Nicolas Lara (you may recognize them as the people responsible for aggregates in Django). By the standards of the Google Summer of Code project my work was considered a success, however, it's not yet merged into Django's trunk, so I'm going to outline what happened, and what needs to happen before this work is considered complete.

Most of the major things happened: settings were changed from a series of DATABASE_* options to a DATABASES setting that's keyed by DB aliases and whose values are dictionaries containing the usual DATABASE* options; QuerySets grew a using() method which takes a DB alias and says which DB the QuerySet should be evaluated against; save() and delete() grew similar using keyword arguments; a using option was added to the inner Meta class for models; transaction support was expanded to include support for multiple databases, as was the testing framework. In terms of internals, almost every internal DB-related function grew explicit passing of the connection or DB alias around, rather than assuming the global connection object as they used to. As I blogged previously, ManyToMany relations were completely refactored. If it sounds like an awful lot got done, that's because it did; I knew going in that multi-db was a big project and it might not all happen within the confines of the summer.
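
In use, the API looks roughly like this (a sketch based on the descriptions above; the Entry model is made up and the exact setting keys may differ slightly from the branch):

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "main",
    },
    "archive": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "archive.db",
    },
}

# elsewhere: pick the database per QuerySet, save, or delete
old_entries = Entry.objects.using("archive").filter(published=True)
entry.save(using="archive")
entry.delete(using="archive")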

So if all of that stuff got done, what's left? Right before the end of the GSOC time frame Russ and I decided that a fairly radical rearchitecting of the Query class (the internal datastructure that both tracks the state of an operation and forms its SQL) was needed. Specifically, the issue was that database backends come in two varieties. One is something like a backend for App Engine, or CouchDB. These have a totally different design than SQL; they need different datastructures to track the relevant information, and they need different code generation. The second type of database backend is one for a SQL database. By contrast these all share the same philosophies and basic structure; in most cases their implementation just involves changing the names of database column types or the way LIMIT/OFFSET is handled. The problem is Django treated all the backends equally. For SQL backends this meant that they got their own Query classes even though they only needed to override half of the Query functionality, the SQL generation half, as the datastructure part was identical since the underlying model is the same. What this means is that if you make a call to using() on a QuerySet halfway through its construction, you need to change the class of the Query representation if you switch to a database with a different backend. This is obviously a poor architecture, since the Query class doesn't need to be changed, just the bit at the end that actually constructs the SQL. To solve this problem Russ and I decided that the Query class should be split into two parts: a Query class that stores bits about the current query, and a SQLCompiler which generates SQL at the end of the process. This refactoring is the primary thing holding up the merger of my multi-db work.

This work is largely done; however, the API needs to be finalized and the Oracle backend ported to the new system. In terms of other work that needs to be done, GeoDjango needs to be shown to still work (or fixed). In my opinion everything else on the TODO list (available here, please don't deface it) is optional for multi-db to be merge ready, with the exception of more example documentation.

There are already people using the multi-db branch (some even in production), so I'm confident about its stability. For the next 6 weeks or so (until the 1.2 feature deadline), my biggest priority is going to be getting this branch into a merge ready state. If this is something that interests you please feel free to get in contact with me (although if you don't come bearing a patch I might tell you that I'll see you in 6 weeks ;)); if you happen to find bugs they can be filed on the Django trac, with version "soc2009/multidb". As always contributors are welcome; you can find the absolute latest work on my Github and a relatively stable version in my SVN branch (this doesn't contain the latest, in progress, refactoring). Have fun.


Towards a Better Template Tag Definition Syntax

Posted November 6th, 2009. Tagged with django, python, template.

Eric Holscher has blogged a few times this month about various template tag definition syntax ideas. In particular he's looked at a system based on Surlex (which is essentially an alternate syntax for certain parts of regular expressions), and a system based on keywords. I highly recommend giving his posts a read, as they explain the ideas he's looked at in far better detail than I could. However, I wasn't particularly satisfied with either of these solutions. I love Django's use of regular expressions for URL resolving; however, for whatever reason I don't really like the look of using regular expressions (or an alternate syntax like Surlex) for template tag parsing. Instead I've been thinking about an object based parsing syntax, similar to PyParsing.

This is an idea I've been thinking about for several months now, but Eric's posts finally gave me the kick in the pants I needed to do the work. Therefore, I'm pleased to announce that I've released django-kickass-templatetags. Yes, I'm looking for a better name; it's already been pointed out to me that a name like that won't fly in the US government, or most corporate environments. This library is essentially me putting to code everything I've been thinking about. But enough talking, let's take a look at the template tag definition syntax:

@tag(register, [Constant("for"), Variable(), Optional([Constant("as"), Name()])])
def example_tag(context, val, asvar=None):
    # return a string to insert into the template, or alter the context
    pass

As you can see it's a purely object based syntax, with different classes for different components of a template tag. For example this would parse something like:

{% example_tag for variable %}
{% example_tag for variable as new_var %}

It's probably clear that this is significantly less code than the manual parsing, manual node construction, and manual resolving of variables that you would have needed to do with a raw templatetag definition. The function you write gets the resolved values for each of its parameters, and at that point it's basically the same as Node.render: it is expected to either return a string to be inserted into the template or alter the context. I'm looking forward to never writing manual template parsing again. However, there are still a few scenarios it doesn't handle: it won't handle something like the logic in the {% if %} tag, and it won't handle tags with {% end %} type tags. I feel like these should both be solvable problems, but since it's a bolt-on addition to the existing tools it ultimately doesn't have to cover every use case, just the common ones (when's the last time you wrote your own implementation of the {% if %} or {% for %} tags?).

It's my hope that something like this becomes popular, as a) developers will be happier, and b) moving towards a community standard is the first step towards including a solution out of the box. The pain and boilerplate of defining templatetags has long been a complaint about Django's template language (especially because the language itself limits your ability to perform any sort of advanced logic), therefore making it as painless as possible ultimately helps make the case for the philosophy of the language itself (which I very much agree with).

In keeping with my promise I'm giving an overview of what my next post will be, and this time I'm giving a full 3-day forecast :). Tomorrow I'm going to blog about pip, virtualenv, and my development workflow. Sunday's post will cover a new optimization that just landed in Unladen Swallow. And finally Monday's post will contain a strange metaphor, and I'm not saying any more :P. Go check out the code and enjoy.

Edit: Since this article was published the name of the library has been changed to django-templatetag-sugar. I've updated all the links in this post.


Django's ManyToMany Refactoring

Posted November 4th, 2009. Tagged with python, models, django, gsoc, internals, orm.

If you follow Django's development, or caught next week's DjangoDose Tracking Trunk episode (what? that's not how time flows you say? too bad), you've seen the recent ManyToManyField refactoring that Russell Keith-Magee committed. This refactoring was one of the results of my work as a Google Summer of Code student this summer. The aim of that work was to bring multiple database support to Django's ORM; however, along the way I ended up refactoring the way ManyToManyFields are handled, and the exact changes I made are the subject of tonight's post.

If you've looked at django.db.models.fields.related you may have come away asking how code that messy could possibly underlie Django's amazing API for handling related objects. Indeed, the mess is so bad that there's a comment which says:

# HACK

which applies to an entire class. However, one of the real travesties of this module was that it contained a large swath of raw SQL in the manager for ManyToMany relations; for example, the clear() method's implementation looked like:

cursor = connection.cursor()
cursor.execute("DELETE FROM %s WHERE %s = %%s" % \
    (self.join_table, source_col_name),
    [self._pk_val])
transaction.commit_unless_managed()

As you can see this hits the trifecta: raw SQL, manual transaction handling, and the use of a global connection object. From my perspective the last of these was the biggest issue. One of the tasks in my multiple database branch was to remove all uses of the global connection object, and since this code uses it, it was a major target for refactoring. However, I really didn't want to rewrite any of the connection logic I'd already implemented in QuerySets. This desire to avoid any new code duplication, coupled with a desire to remove the existing duplication (and flat out ugliness), led me to the simple solution: use the existing machinery.

Since Django 1.0, developers have been able to use a full on model for the intermediary table of a ManyToMany relation, thanks to the work of Eric Florenzano and Russell Keith-Magee. However, that support was only used when the user explicitly provided a through model. This of course leads to a lot of methods that basically have two implementations: one for the case where a through model is provided, and one for the normal case -- which is yet another case of code bloat that I was now looking to eliminate. After reviewing these items my conclusion was that the best course was to use the provided intermediary model if it was there, and otherwise create a full fledged model with the same fields (and everything else) as the table that would normally be specially created for the ManyToManyField.

The end result was dynamic class generation for the intermediary model, and simple QuerySet methods for the methods on the Manager, for example the clear() method I showed earlier now looks like this:

self.through._default_manager.filter(**{
    source_field_name: self._pk_val
}).delete()

Short, simple, and totally readable to anyone with familiarity with Python and Django. In addition this move allowed Russell to fix another ticket with just two lines of code. All in all this switch made for cleaner, smaller code and fewer bugs.

Tomorrow I'm going to be writing about both the talk I'm going to be giving at PyCon, as well as my experience as a member of the PyCon program committee. See you then.


Optimising compilers are there so that you can be a better programmer

Posted October 10th, 2009. Tagged with pypy, python, unladen-swallow, django, compiler.

In a discussion on the Django developers mailing list I recently commented that the performance impact of having logging infrastructure, in the case where the user doesn't want the logging, could essentially be disregarded because Unladen Swallow (and PyPy) are bringing us a proper optimising (Just in Time) compiler that would essentially remove that consideration. Shortly thereafter someone asked me if I really thought it was the job of the interpreter/compiler to make us not think about performance. And my answer is: the job of a compiler is to let me program with best practices and not suffer performance consequences for doing things the right way.

Let us consider the most common compiler optimisations. A relatively simple one is function inlining; in the case where including the body of the function would be more efficient than actually calling it, a compiler can simply move the function's body into its caller. However, we can actually do this optimisation in our own code. We could rewrite:

def times_2(x):
    return x * 2

def do_some_stuff(i):
    for x in i:
        # stuff
        z = times_2(x)
    # more stuff

as:

def do_some_stuff(i):
    for x in i:
        # stuff
        z = x * 2
    # more stuff

And this is a trivial change to make. However, in the case where times_2 is slightly less trivial, and is used a lot in our codebase, it would be exceptionally poor programming practice to repeat this logic all over the place; what if we needed to change it down the road? Then we'd have to review our entire codebase to make sure we changed it everywhere. Needless to say that would suck. However, we don't want to give up the performance gain from inlining this function either. So here it's the job of the compiler to make sure functions are inlined when possible; that way we get the best possible performance, as well as allowing us to maintain our clean codebase.

Another common compiler optimisation is to transform multiplications by powers of 2 into binary shifts; thus x * 2 becomes x << 1. A final optimisation we will consider is constant propagation. Many programs have constants that are used throughout the codebase. These are often simple global variables. However, once again, inlining them into the methods that use them could provide a significant benefit, by not requiring the code to make a lookup in the global scope whenever they are used. But we really don't want to do that by hand, as it makes our code less readable ("Why are we multiplying this value by this random float?", "You mean pi?", "Oh."), and makes it more difficult to update down the road. Once again our compiler is capable of saving the day: when it can detect that a value is a constant it can propagate it throughout the code.
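
For example, sketching the constant propagation case:

TAX_RATE = 0.08

def total(price):
    # readable version: the constant is looked up in the global scope
    # on every call
    return price * (1 + TAX_RATE)

def total_after_propagation(price):
    # what the optimiser can effectively produce for us, without making
    # the source any uglier
    return price * 1.08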

So does all of this mean we should never have to think about writing optimal code, the compiler can solve all problems for us? The answer to this is a resounding no. A compiler isn't going to rewrite your insertion sort into Tim sort, nor is it going to fix the fact that you do 700 SQL queries to render your homepage. What the compiler can do is allow you to maintain good programming practices.

So what does this mean for logging in Django? Fundamentally it means that we shouldn't be concerned with the possible overhead from calls that do nothing (in the case where we don't care about the logging), since a good compiler will be able to eliminate those for us. In the case where we actually do want to do something (say, write the log to a file) the overhead is unavoidable: we have to open a file and write to it, and there's no way to optimise that out.
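
As a rough illustration (the logger name and the render_homepage helper are hypothetical, not part of any Django proposal), this is the shape of call in question:

import logging

logger = logging.getLogger("myapp.views")  # hypothetical logger name

def my_view(request):
    # With no handlers configured this call does (almost) no work, and a good
    # JIT can eliminate that overhead entirely; with a file handler attached
    # the write is real work no compiler can remove.
    logger.debug("handling a request")
    return render_homepage(request)  # hypothetical helper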

You can find the rest here. There are view comments.

Django-filter 0.5 released!

Posted August 14th, 2009. Tagged with release, django, python, filter.

I've just tagged and uploaded Django-filter 0.5 to PyPi. The biggest change this release brings is that the package name has been changed from filter to django_filters in order to avoid conflicts with the Python builtin filter. Other changes include the addition of an initial argument on Filters, as well as the addition of an AllValuesFilter, which is a ChoiceFilter whose choices are whatever values are currently in the DB for that field. Despite the change in package name I will not be changing the name of the project, due to the overhead in moving the repository (Github doesn't set up redirects when you change a project's name) and the PyPi package. I hope everyone enjoys this new release, as a lot of its improvements have come out of my usage of Django-filter in piano-man.

As for what the future holds several people have indicated their interest in the inclusion of django-filter in Django itself as a contrib package, and for usage in the Admin as a new implementation of the list_filter option that is more flexible. Because of this my next work is probably going to be on implementing a custom ModelAdmin class that uses FilterSets for filtering.

You can find the latest release on PyPi and Github.

You can find the rest here. There are view comments.

Announcing pyvcs, django-vcs, and piano-man

Posted July 5th, 2009. Tagged with release, django, python, vcs.

Justin Lilly and I have just released pyvcs, a lightweight abstraction layer on top of multiple version control systems, and django-vcs, a Django application leveraging pyvcs in order to provide a web interface to version control systems. At this point pyvcs exists exclusively to serve the needs of django-vcs, although it is separate and completely usable on its own. Django-vcs has a robust feature set, including listing recent commits, pretty diff rendering of commits, and code browsing. It also supports multiple projects. Both pyvcs and django-vcs currently support Git and Mercurial, although adding support for a new backend is as simple as implementing four methods and we'd love to be able to support additional VCS like Subversion or Bazaar. Django-vcs comes with some starter templates (as well as CSS to support the pretty diff rendering).

It goes without saying that we'd like to thank the authors of the VCSs themselves. In addition we'd like to thank the authors of Dulwich for providing a pure Python implementation of the Git protocol, as well as the Pocoo guys for Pygments, the syntax highlighting library for Python, and for the pretty diff rendering which we lifted out of the lodgeit pastebin application.

Having announced what we have already, we'll now be looking towards the future. As such Justin and I plan to be starting a new Django project "piano-man". Piano-man, a reference to Billy Joel, follows the Django tradition of naming things after musicians (although we've branched out a bit, leaving the realm of Jazz in favour of Rock 'n Roll). Piano-man will be a complete web based project management system, similar to Trac or Redmine. There are a number of logistical details that we still need to sort out, such as whether this will be a part of Pinax as a "code edition" or whether it will be a separate project entirely, like Satchmo and Reviewboard.

Some people are inevitably asking why we've chosen to start a new project, instead of working to improve one of the existing ones I alluded to. The reason is, after hearing coworkers complain about poor Git support in Trac (even with external plugins), and friends complain about the need to modify Redmine just to support branches in Git properly I'd become convinced it couldn't possibly be that hard, and I think Justin and I have proven that it isn't. All the work you see in pyvcs and django-vcs took 48 hours to complete, with both of us working evenings and a little bit during the day on these projects.

You can find both django-vcs and pyvcs on PyPi as well as on Github under my account (http://github.com/alex/); both are released under the BSD license. We hope you enjoy both these projects and find them helpful, and we'd appreciate any contributions; just file bugs on Github. I'll have another blog post in a few days outlining the plan for piano-man once Justin and I work out the logistics. Enjoy.

Edit: I seem to have forgotten all the relevant links, here they are

Sorry about that.

You can find the rest here. There are view comments.

EuroDjangoCon 2009

Posted May 5th, 2009. Tagged with talk, django, python, forms.

EuroDjangoCon 2009 is still going strong, but I wanted to share the materials from my talk as quickly as possible. My slides are on Slide Share:

And the first example's code follows:

from django.forms.util import ErrorList
from django.utils.datastructures import SortedDict

def multiple_form_factory(form_classes, form_order=None):
   if form_order:
       form_classes = SortedDict([(prefix, form_classes[prefix]) for prefix in
           form_order])
   else:
       form_classes = SortedDict(form_classes)
   return type('MultipleForm', (MultipleFormBase,), {'form_classes': form_classes})

class MultipleFormBase(object):
   def __init__(self, data=None, files=None, auto_id='id_%s', prefix=None,
       initial=None, error_class=ErrorList, label_suffix=':',
       empty_permitted=False):
       if prefix is None:
           prefix = ''
       self.forms = [form_class(data, files, auto_id, prefix + form_prefix,
           initial[i], error_class, label_suffix, empty_permitted) for
           i, (form_prefix, form_class) in enumerate(self.form_classes.iteritems())]

   def __unicode__(self):
       return self.as_table()

   def __iter__(self):
       for form in self.forms:
           for field in form:
               yield field

   def is_valid(self):
       return all(form.is_valid() for form in self.forms)

   def as_table(self):
       return '\n'.join(form.as_table() for form in self.forms)

   def as_ul(self):
       return '\n'.join(form.as_ul() for form in self.forms)

   def as_p(self):
       return '\n'.join(form.as_p() for form in self.forms)

   def is_multipart(self):
       return any(form.is_multipart() for form in self.forms)

   def save(self, commit=True):
       return tuple(form.save(commit) for form in self.forms)
   save.alters_data = True

EuroDjangoCon has been a blast thus far and after the conference I'll do a blogpost that does it justice.

You can find the rest here. There are view comments.

Ajax Validation Administrivia

Posted April 16th, 2009. Tagged with ajax_validation, release, django, filter.

A small administrative note: now that Github has its own issue tracker, users wishing to report issues with django-ajax-validation can now do so on Github, instead of the old Google Code page. Django-filter users can also report issues on its Github page, in place of the former system of "message or email me", which, though functional, wasn't very convenient.

You can find the rest here. There are view comments.

ORM Panel Recap

Posted March 30th, 2009. Tagged with python, alchemy, gae, django, orm, web2py, pycon, object, sql.

Having now completed what I thought was a quite successful panel, I figured it would be nice to review some of the decisions I made that people had been asking about. For those who missed it you can find a live blog of the event by James Bennett at his blog, and a video should hopefully be going up sometime soon.

Why Google App Engine

As Guido pointed out, App Engine does not have an ORM, because App Engine doesn't have a relational datastore. However, it does have something that looks and acts quite a lot like other ORMs, and it does fundamentally try to serve the same purpose: offering a persistence layer. Therefore I decided it was at least in the same class of items I wanted to include. Further, with the rise of non-relational DBs that all fundamentally deal with the same issues as App Engine, and the relationship between ORMs and these new persistence layers, I thought it would be advantageous to have one of them represented. Add to that the fact that Guido is a knowledgeable and interesting person, and that's how the cookie crumbled.

Why Not ZODB/Storm/A Talking Pony

Time. I would have loved to have as many different ORMs (and things like them) as exist in the Python eco-sphere, but there just wasn't time. We had 55 minutes to present, and as it is that wasn't enough. I ultimately had time to ask 3 questions (one of which was just background), plus 5 shorter audience questions. I was forced to cut several questions I wanted to ask but didn't have time for; for those who are interested, the major questions I would have liked to ask were:

  • What most often requested feature won't you add to your ORM?
  • What is the connection between an ORM and a schema migration tool? Should they both be part of the same project, should they be tied together, or are they totally orthogonal?
  • What's your support for geographic data? Is this (or other complex data types like it) in scope for the core of an ORM?

Despite these difficulties I thought the panel turned out very well. If there are any other questions about why things were the way they were just ask in the comments and I'll try to post a response.

You can find the rest here. There are view comments.

PyCon Wrapup

Posted March 30th, 2009. Tagged with pycon, django, python.

With PyCon now over for me (and the sprints just beginning for those still there) I figured I'd recap it for those who couldn't be there, and compare notes for those who were. I'll be doing a separate post to cover my panel, since I have quite a brain dump there.

Day One

We start off the morning with lightning talks, much better than last year due to the removal of so many sponsor talks (thanks Jacob); by now I've already met up with quite a few Django people. Next I move on to the "Python for CS1" talk; this was particularly interesting for me since they moved away from C++, which is exactly what my school uses. Next, on to "Introduction to CherryPy" with a few other Djangonauts. CherryPy looks interesting as a minimalist framework, however a lot of the callback/plugin architecture was confusing me, so I feel like I'd end up ignoring it and just staying very close to the metal, which isn't always a bad thing; something to explore in the future. From there I'm off to "How Python is Developed"; it's very clear that Django's development model is based off Python's, and that seems to work for both parties.

Now we head to lunch, which was fantastic, thanks to whoever put it together. From there I go to "Panel: Python VMs"; this was very informative, though I didn't have a chance to ask my question: "Everyone on the panel has mentioned that performance is likely to improve greatly in the future as a potential reason to use their alternate VM; should Unladen Swallow succeed in their goal, how does that affect you?". After that I went to the "Twisted, AMQP, and Thrift" talk; it was fairly good, though this feels like a space I have to look at more, or have a real world use case for, to totally understand.

A quick break, and then we're on to Jesse Noller's multiprocessing talk. Jesse really is the expert in this, and having used the multiprocessing library before, it's clear to me there's still tons of cool stuff to explore. Lastly we had "Behind the scenes of Everyblock.com" from Adrian; it's very cool to see their architecture, and it's getting me excited for their going open source in June. I also got a bit of a shoutout here when Adrian was asked about future scalability and he mentioned sharding across multiple databases. We had lightning talks and that was the end of the official conference for the day.

In the evening a big group of us (officially a Pinax meetup, but I think we had more than that) went to a restaurant; as the evening progressed our group shrank, until they finally kicked us out at closing time. A good time was had by all, I think.

Day two

After staying out exceptionally late the night before I ended up sleeping on the floor with friends rather than heading home (thanks!), so, retrospectively, day two should have been tiring, but it was as high energy as could be. I began the morning with lightning talks followed by Guido's keynote, which was a little odd and jumped around a lot, but I found I really enjoyed it. After that I got to sit in one room for 3 talks in a row, "The State of Django", "Pinax", and "State of TurboGears", which I can say, without qualification, were all great (I received another shoutout during the Django talk; I guess I really need to deliver, assuming the GSOC project is accepted). From there it was off to lunch.

After lunch it was time to take the stage for my "ORM Panel". I'm saving all the details for another blog post, but I think it went well. After this we had a Django Birds of a Feather session, which was very fun. We got to see Honza Král show off the cool features of the Ella admin. The highlight had to be having Jacob Kaplan-Moss rotate us around the room to get Djangonauts to meet each other.

After this I heard Jesse Noller's second concurrency talk, and then Ian Bicking's talk on ... stuff, both talks were great, but Ian's is probably in a "you had to be there" category.

After that it was time for lightning talks, followed by what was supposed to be dinner for people interested in Oebfare (a Django blog application) to make some design decisions. In the end quite a few more people were there and I got to meet some really cool people like Mark Pilgrim. After dinner it was back to the Hyatt for about an hour of hacking on Django before I called it a night.

Day Three

The final day of the conference. Once again I began my day with some lightning talks. After this was the keynote, by the reddit.com creators. They gave a short, but interesting keynote and answered a lot of questions. I have to say that some of their technical answers were less than satisfying. After this I went to an open space on podcasting where I learned that I'm probably the only person who doesn't mind podcasts that are over an hour long.

After this I went to the Eve Online talk, but I ended up leaving for the testing tools panel. This panel was interesting, but I can't help but feel that the Python community would be best served by getting the testing tools that everyone agrees about into the unittest module in the Python standard library. For my last talk of the conference I saw Jacob Kaplan-Moss's talk on "Django's design decisions"; thankfully this didn't step on the toes of the ORM design decision panel, and it was very interesting. From there it was off to lunch, and a final round of lightning talks, capped off by Guido running away with the Django Pony (I hope someone caught this on video), and my conference was done.

This was a really great conference for me, and hopefully everyone else enjoyed it as much as I did. I'll be doing a follow up post where I answer some of the questions I've seen about the ORM panel, as well as put the rest of my thoughts on paper (hard disk). And now if you'll excuse me, I have a virtual sprint to attend.

You can find the rest here. There are view comments.

Google Moderator for PyCon ORM Panel

Posted March 15th, 2009. Tagged with python, alchemy, gae, django, orm, web2py, object, sql.

I'm going to be moderating a panel this year at PyCon between 5 of the Python ORMs (Django, SQLAlchemy, SQLObject, Google App Engine, and web2py). To make my job easier, and to make sure the most interesting questions are asked, I've set up a Google Moderator page for the panel here. Go ahead and submit your questions, and moderate others, to try to ensure we get the best questions possible, even if you can't make it to PyCon (there will be a recording made, I believe). I'll be adding my own questions shortly to make sure they are as interesting as I think they are.

Also, if you aren't already, do try to make it out to PyCon, there's still time and the talks look to be really exceptional.

You can find the rest here. There are view comments.

Announcing django-filter

Posted February 14th, 2009. Tagged with django, filter.

Django-filter is a reusable Django application I have been working on for a few weeks; individuals who follow me on Github may have noticed me working on it. Django-filter is basically an application that provides a generic interface for creating pages like the Django admin changelist page. It has support for defining various filters (including custom ones), filtering a queryset based on the user's selections, and displaying an HTML form for the filter options. It has a design based around Django's ModelForm.
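
As a rough sketch of what that interface looks like (the model, fields, and template name here are hypothetical, and I'm using the django_filters package name; see the docs on Github for the real details):

from django.shortcuts import render_to_response

import django_filters

class ProductFilterSet(django_filters.FilterSet):
    class Meta:
        model = Product  # hypothetical model
        fields = ['price', 'manufacturer']

def product_list(request):
    # the FilterSet takes the GET data and a queryset, much like a ModelForm
    f = ProductFilterSet(request.GET, queryset=Product.objects.all())
    return render_to_response('products/product_list.html', {'filter': f})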

You can get the code, as well as tests and docs, over at Github. For all bug reports, comments, and other feedback you can send me a message on Github (or message me on IRC) until I figure out a proper bug tracking system.

Enjoy.

You can find the rest here. There are view comments.

A Second Look at Inheritance and Polymorphism with Django

Posted February 10th, 2009. Tagged with python, models, django, internals, orm, metaclass.

Previously I wrote about ways to handle polymorphism with inheritance in Django's ORM in a way that didn't require any changes to your model at all (besides adding in a mixin). Today we're going to look at a way to do this that is a little more invasive and involved, but can also provide much better performance. As we saw previously, with no other information we could get the correct subclass for a given object in O(k) queries, where k is the number of subclasses. This means for a queryset with n items we would need to do O(nk) queries; not great performance. For a queryset with 10 items and 3 subclasses we'd need to do 30 queries, which isn't really acceptable for most websites. The major problem here is that for each object we simply guess as to which subclass it is. However, that's a piece of information we could know concretely if we cached it, so let's start off there. We're going to be building a mixin class just like we did last time:

from django.db import models

class InheritanceMixIn(models.Model):
    _class = models.CharField(max_length=100)

    class Meta:
        abstract = True

So now we have a simple abstract model that the base of our inheritance trees can subclass that has a field for caching which subclass we are. Now let's add a method to actually cache it and retrieve the subclass:

from django.db import models
from django.db.models.fields import FieldDoesNotExist
from django.db.models.related import RelatedObject

class InheritanceMixIn(models.Model):
    ...
    def save(self, *args, **kwargs):
        if not self.id:
            parent = self._meta.parents.keys()[0]
            subclasses = parent._meta.get_all_related_objects()
            for klass in subclasses:
                if isinstance(klass, RelatedObject) and klass.field.primary_key \
                    and klass.opts == self._meta:
                    self._class = klass.get_accessor_name()
                    break
        return super(InheritanceMixIn, self).save(*args, **kwargs)

    def get_object(self):
        try:
            if self._class and self._meta.get_field_by_name(self._class)[0].opts != self._meta:
                return getattr(self, self._class)
        except FieldDoesNotExist:
            pass
        return self

Our save method is where all the magic really happens. First, we make sure we're only doing this caching the first time a model is saved. Then we get the first parent class we have (this means it probably won't play nicely with multiple inheritance; that's unfortunate, but it's not as common a use case), then we get all the related objects this class has (this includes the reverse relationships the subclasses have). Then, for each of the subclasses, if it is a RelatedObject, and it is a primary key on its model, and the class it points to is the same as us, we cache the accessor name on the model, break out, and do the normal save procedure.

Our get_object function is pretty simple: if we have our class cached, and the model we are cached as isn't of the same type as ourselves, we get the attribute for the subclass and return it; otherwise we are the last descendant and just return ourselves. There is one (possibly quite large) caveat here: if our inheritance chain is more than one level deep (that is to say, our subclasses have subclasses) then this won't return those objects correctly. The class is actually cached correctly, but since the top level object doesn't have an attribute by the name of the 2nd level subclass it doesn't return anything. I believe this can be worked around, but I haven't found a way yet. One idea would be to actually store the full ancestor chain in the CharField, comma separated, and then just traverse it.
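
As a rough, untested sketch of that idea (assuming _class held a chain of accessor names like "restaurant,fancy_restaurant"), the traversal might look like:

def get_object(self):
    # walk the cached accessor chain one level at a time, ending at the most
    # derived instance
    obj = self
    if self._class:
        for accessor in self._class.split(','):
            obj = getattr(obj, accessor)
    return obj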

There is one thing we can do to make this even easier, which is to have instances automatically become the correct subclass when they are pulled in from the DB. This does have an overhead: pulling in a queryset with n items guarantees O(n) queries. This can be improved (just as it was for the previous solution) by ticket #7270, which allows select_related to traverse reverse relationships. In any event, we can write a metaclass to handle this for us automatically:

from django.db import models
from django.db.models.base import ModelBase
from django.db.models.fields import FieldDoesNotExist
from django.db.models.related import RelatedObject

class InheritanceMetaclass(ModelBase):
    def __call__(cls, *args, **kwargs):
        obj = super(InheritanceMetaclass, cls).__call__(*args, **kwargs)
        return obj.get_object()

class InheritanceMixIn(models.Model):
    __metaclass__ = InheritanceMetaclass
    ...

Here we've created a fairly trivial metaclass that subclasses the default one Django uses for its models. The only method we've written is __call__; on a metaclass, __call__ handles the instantiation of an object, so it's what calls __init__. What we do is whatever the default __call__ does, so that we get an instance as normal, and then we call the get_object() method we wrote earlier and return the result, and that's all.

We've now looked at 2 ways to handle polymorphism, with this way being more efficient in all cases (ignoring the overhead of having the extra CharField). However, it still isn't totally efficient, and it fails in several edge cases. Whether automating the handling of something like this is a good idea needs to be considered on a project by project basis, as the extra queries can be a large overhead; however, they may not be avoidable, in which case automating it is probably advantageous.

You can find the rest here. There are view comments.

Building a Magic Manager

Posted January 31st, 2009. Tagged with models, django, orm, python.

A very common pattern in Django is to create methods on a manager to abstract some usage of one's data. Some people take a second step and actually create a custom QuerySet subclass with these methods and have their manager proxy these methods to the QuerySet; this pattern is seen in Eric Florenzano's Django From the Ground Up screencast. However, this requires a lot of repetition; it would be far less verbose if we could just define our methods once and have them available to us on both our managers and QuerySets.

Django's manager class has one hook for providing the QuerySet, so we'll start with this:

from django.db import models

class MagicManager(models.Manager):
   def get_query_set(self):
       qs = super(MagicManager, self).get_query_set()
       return qs

Here we have a very simple get_query_set method; it doesn't do anything but return its parent's queryset. Now we need to actually get the methods defined on our class onto the queryset:

class MagicManager(models.Manager):
   def get_query_set(self):
       qs = super(MagicManager, self).get_query_set()
       class _QuerySet(qs.__class__):
           pass
       for method in [attr for attr in dir(self) if not attr.startswith('__') and callable(getattr(self, attr)) and not hasattr(_QuerySet, attr)]:
           setattr(_QuerySet, method, getattr(self, method))
       qs.__class__ = _QuerySet
       return qs

The trick here is that we dynamically create a subclass of whatever class the call to our parent's get_query_set method returns; then we take each attribute on ourself, and if the queryset doesn't already have an attribute by that name, and that attribute is a method, we assign it to our QuerySet subclass. Finally we set the __class__ attribute of the queryset to be our QuerySet subclass. The reason this works is that when Django chains queryset methods it makes the copy of the queryset have the same class as the current one, so anything we add to our manager will be available not only on the immediately following queryset, but on any that follow due to chaining.

Now that we have this we can simply subclass it to add methods, and then add it to our models like a regular manager. Whether this is a good idea is debatable: on the one hand, having to write methods twice is a gross violation of Don't Repeat Yourself; on the other, this is exceptionally implicit, which is a major violation of The Zen of Python.
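
As a concrete illustration of that subclassing (the Article model and its status field are hypothetical):

from django.db import models

class ArticleManager(MagicManager):
    def published(self):
        return self.filter(status='published')

class Article(models.Model):
    status = models.CharField(max_length=20)

    objects = ArticleManager()

# the method is defined once, but is available from the manager...
Article.objects.published()
# ...and from the querysets it hands back, thanks to the dynamic subclassing
Article.objects.all().published()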

You can find the rest here. There are view comments.

Django Ajax Validation 0.1.0 Released

Posted January 24th, 2009. Tagged with ajax_validation, release, django, easy_install.

I've just uploaded the first official release of Django Ajax Validation to PyPi. You can get it there. For those that don't know, it is a reusable Django application that allows you to do JS-based validation using your existing form definitions. Currently it only works using jQuery. If there are any problems please let me know.

You can find the rest here. There are view comments.

Optimizing a View

Posted January 19th, 2009. Tagged with python, compile, models, django, orm.

Lately I've been playing with a bit of a fun side project. I have about a year and a half worth of my own chatlogs with friends (65,000 messages total) and I've been playing around with them to find interesting statistics. One facet of my communication with my friends is that we link each other lots of things, and we can always tell when someone is linking something that we've already seen. So I decided an interesting bit of information would be to see who is the worst offender.

So we want to write a function that returns the number of items each person has relinked, excluding items they themselves linked. So I started off with the most simple implementation I could, and this was the end result:

from collections import defaultdict
from operator import itemgetter

from django.utils.html import word_split_re

from logger.models import Message

def calculate_relinks():
    """
    Calculate the number of times each individual has linked something that was
    linked previously in the course of the chat.
    """
    links = defaultdict(int)
    for message in Message.objects.all().order_by('-time').iterator():
        words = word_split_re.split(message.message)
        for word in words:
            if word.startswith('http'):
                if Message.objects.filter(time__lt=message.time).filter(message__contains=word).exclude(speaker=message.speaker).count():
                    links[message.speaker] += 1
    links = sorted(links.iteritems(), key=itemgetter(1), reverse=True)
    return links

Here I iterated over the messages, and for each one I went through each of the words, and if any of them started with http (the definition of a link for my purposes) I checked to see if it had ever been linked before by someone other than the author of the current message.

This took about 4 minutes to execute on my dataset, and it executed about 10,000 SQL queries. This is clearly unacceptable; you can't have a view that takes that long to render, or that hits your DB that hard. Even with aggressive caching this would have been unmaintainable. Further, this algorithm is O(n**2) or thereabouts, so as my dataset grows this will only get quadratically worse.

By changing this around however I was able to achieve far better results:

from collections import defaultdict
from operator import itemgetter

from django.utils.html import word_split_re

from logger.models import Message

def calculate_relinks():
    """
    Calculate the number of times each individual has linked something that was
    linked previously in the course of the chat.
    """
    links = defaultdict(set)
    counts = defaultdict(int)
    for message in Message.objects.all().filter(message__contains="http").order_by('time').iterator():
        words = word_split_re.split(message.message)
        for word in words:
            if word.startswith('http'):
                if any([word in links[speaker] for speaker in links if speaker != message.speaker]):
                    counts[message.speaker] += 1
                links[message.speaker].add(word)
    counts = sorted(counts.iteritems(), key=itemgetter(1), reverse=True)
    return counts

Here what I do is go through each of the messages which contain the string "http" (this is already a huge advantage, since it means we process in Python about 1/6 of the messages we originally did). For each message we go through each of the words in it, and for each word that is a link we check whether any other person has said it by looking in the caches we maintain in Python; if so we increment the current speaker's count, and finally we add the link to that person's cache.

By comparison this executes in .3 seconds, executes only 1 SQL query, and will scale linearly (or as close to it as is possible). For reference, both of these functions are compiled using Cython. This ultimately takes almost no work to do, and for computationally heavy operations it can provide a huge boon.

You can find the rest here. There are view comments.

New Admin URLs

Posted January 14th, 2009. Tagged with django, internals.

I'm very happy to announce that with revision 9739 of Django the admin now uses normal URL resolvers and its URLs can be reversed. This is a tremendous improvement over the previous ad hoc system and it gives users the distinct advantage of being able to reverse the admin's URLs. However, in order to make this work a new feature went into the URL resolving system that any user can use in their own code.

Specifically users can now have objects which provide URLs. Basically this is because include() can now accept any iterable, rather than just a string which points to a urlconf. To get an idea of what this looks like you can look at the way the admin now does it.
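
For instance, hooking the admin up in a project's urls.py now looks roughly like this (a sketch against the current development version; the URL prefix is up to you), and the admin's URLs become reversible via the 'admin' namespace:

from django.conf.urls.defaults import patterns, include
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns('',
    # admin.site.urls is an object that provides URLs, which include() accepts
    (r'^admin/', include(admin.site.urls)),
)

# elsewhere, e.g.: reverse('admin:index')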

There are going to be a few more great additions to Django going in as we move towards the 1.1 alpha so keep an eye out for them.

You can find the rest here. There are view comments.

Building a Read Only Field in Django

Posted December 28th, 2008. Tagged with django, forms.

One commonly requested feature in Django is to have a field on a form (or in the admin) that is read only. Such a thing may be a Django 1.1 feature for the admin exclusively, since that is the level at which it makes sense; a form is by definition for inputting data, not displaying it. However, it is still possible to do this with Django now, and it doesn't even take very much code. As I've said, doing it in this manner (as a form field) isn't particularly intuitive or sensible, however it is possible.

The first thing we need to examine is how we would want to use this, for our purposes we'll use this just like we would a normal field on a form:

from django import forms
from django.contrib.auth.models import User

class UserForm(forms.ModelForm):
    email = ReadOnlyField()

    class Meta:
        model = User
        fields = ['email', 'username']

So we need to write a field. Our field will actually need to be a subclass of FileField; at first glance this makes absolutely no sense, since our field isn't taking files, and it isn't taking any data at all. However, FileFields receive the initial data in their clean() method, which other fields don't, and we need this behavior for our field to work:

class ReadOnlyField(forms.FileField):
    widget = ReadOnlyWidget
    def __init__(self, widget=None, label=None, initial=None, help_text=None):
        forms.Field.__init__(self, label=label, initial=initial,
            help_text=help_text, widget=widget)

    def clean(self, value, initial):
        self.widget.initial = initial
        return initial

As you can see in the clean method we are exploiting this feature in order to give our widget the initial value, which it normally won't have access to at render time.

Now we write our ReadOnlyWidget:

from django.forms.util import flatatt

class ReadOnlyWidget(forms.Widget):
    def render(self, name, value, attrs):
        final_attrs = self.build_attrs(attrs, name=name)
        if hasattr(self, 'initial'):
            value = self.initial
        return "<p>%s</p>" % (flatatt(final_attrs), value or '')

    def _has_changed(self, initial, data):
        return False

Our widget simply renders the initial value into a p tag, instead of as an input tag. We also override the _has_changed method to always return False; this is used in formsets to avoid resaving data that hasn't changed, and since our field can't change data, obviously it won't change.

And that's all there is to it, less than 25 lines of code in all. As I said earlier this is a fairly poor architecture, and I wouldn't recommend it; however, it does work, and serves as proof that Django will allow you to do just about anything if you give it a try.

You can find the rest here. There are view comments.

Building a Function Templatetag

Posted December 25th, 2008. Tagged with django, template.

One of the complaints often lobbed against Django's templating system is that there is no way to create functions. This is intentional; Django's template language is not meant to be a full programming language. However, if one wants to, it is entirely possible to build a templatetag to allow a user to create, and call, functions in the template language.

To get started we need to build a tag to create functions, first we need our parsing function:

from django import template

register = template.Library()

@register.tag
def function(parser, token):
    arglist = token.split_contents()[1:]
    func_name = arglist.pop(0)
    nodelist = parser.parse(('endfunction',))
    parser.delete_first_token()
    return FunctionNode(func_name, arglist, nodelist)

The format for our tag is going to be {% function func_name arglist... %}. So we parse this out of the token: we take everything after the initial piece of the tag, make the first part of it the function name, and the rest is the argument list. The next step is to parse until the {% endfunction %} tag. Finally we return a FunctionNode. Now we need to actually build this node:

class FunctionNode(template.Node):
    def __init__(self, func_name, arglist, nodelist):
        self.func_name = func_name
        self.nodelist = nodelist
        self.arglist = arglist

    def render(self, context):
        if '_functions' not in context:
            context['_functions'] = {}
        context['_functions'][self.func_name] = (self.arglist, self.nodelist)
        return ''

Our __init__ method just stores the data on the instance. Our render method stores the argument list and the nodes that make up the function in the context. This gives us all the information we will need to know in order to call and render our functions. Now we need our actual mechanism for calling our functions:

@register.tag
def call(parser, token):
    arglist = token.split_contents()[1:]
    func_name = arglist.pop(0)
    return CallNode(func_name, arglist)

Like our function tag, we parse out the name of the function, and then the argument list. And now we need to render the result of calling it:

class CallNode(template.Node):
    def __init__(self, func_name, arglist):
        self.func_name = func_name
        self.arglist = arglist

    def render(self, context):
        arglist, nodelist = context['_functions'][self.func_name]
        c = template.Context(dict(zip(arglist, [template.Variable(x).resolve(context) for x in self.arglist])))
        return nodelist.render(c)

render gets the arglist and nodelist out of the context, and then builds a context for the call by zipping together the variable names from the function definition with the resolved values from the call site.

Now we can create functions by doing:

{% function f arg %}
    {{ arg }}
{% endfunction %}

And call them by doing:

{% call f some_var %}
{% call f some_other_var %}

Hopefully this has given you a useful insight into how to build powerful templatetags in Django's template language. One possible improvement the reader may want to make is to have the function tag actually register a templatetag out of the function definition, so it could then be used like a normal templatetag. As a slight warning, I haven't tested this with a wide range of data, so if there are any issues please report them.

You can find the rest here. There are view comments.

Many Thanks to Webfaction

Posted December 25th, 2008. Tagged with deployment, webfaction, django.

One of my recent projects has been to build a website for someone so that they could put their portfolio online. Building the site with Django was, of course, a snap. However, I would like to take a moment to thank Webfaction for the painless deployment. It's a small website, so shared hosting is perfect for it, and Webfaction made it a breeze. Setting up a PostgreSQL database was a snap. Getting Django installed and my code on the server was easy. My one pain point was of my own creation (I broke the PythonPath setting in the WSGI file Webfaction provides), but even with that, deployment didn't take more than an hour. They even made it easy to use the ultra-lightweight Nginx for my static files.

All said, Webfaction was exactly what I needed from a host, and they delivered. For the curious parties you can see the site here.

You can find the rest here. There are view comments.

PyCon '09, Here I come!

Posted December 15th, 2008. Tagged with alchemy, gae, object, sql, django, orm, web2py.

This past year I attended PyCon 2008 in Chicago, which was a tremendous conference. I had a chance to meet people I knew from the community, listen to some amazing talks, meet new people, and get to sprint. As a result of this tremendous experience I decided for this year to submit a talk proposal. I found out just a few minutes ago that my proposal has been accepted.

I proposed a panel on "Object Relational Mapper Philosophies and Design Decisions". This panel is going to look at the design decisions each of several ORMs made, and what philosophies they had, both with respect to their public APIs and their internal code design. Participating in the panel will be:
  • Jacob Kaplan-Moss, representing Django
  • Ian Bicking, representing SQLObject
  • Mike Bayer, representing SQLAlchemy
  • Guido van Rossum, representing Google App Engine
  • Dr. Massimo Di Pierro, representing web2py

I'm tremendously honored to be able to moderate a panel at PyCon, especially with these five individuals. They are all incredibly smart, and they each bring a different insight and perspective to this panel.

PyCon is a great conference and I would encourage anyone who can to attend.

You can find the rest here. There are view comments.

Playing with Polymorphism in Django

Posted December 5th, 2008. Tagged with python, models, django, internals, orm.

One of the most common requests from people using inheritance in Django is to have a queryset from the baseclass return instances of the derived model, instead of instances of the baseclass, as you might see with polymorphism in other languages. This is a leaky abstraction over the fact that our Python classes are actually representing rows in separate tables in a database. Django itself doesn't do this, because it would require expensive joins across all derived tables, which the user probably doesn't want in all situations. For now, however, we can create a function that, given an instance of the baseclass, returns an instance of the appropriate subclass; be aware that this will perform up to k queries, where k is the number of subclasses we have.

First let's set up some test models to work with:

from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=50)

    def __unicode__(self):
        return u"%s the place" % self.name


class Restaurant(Place):
    serves_pizza = models.BooleanField()

    def __unicode__(self):
        return "%s the restaurant" % self.name

class Bar(Place):
    serves_wings = models.BooleanField()

    def __unicode__(self):
        return "%s the bar" % self.name

These are some fairly simple models that represents a common inheritance pattern. Now what we want to do is be able to get an instance of the correct subclass for a given instance of Place. To do this we'll create a mixin class, so that we can use this with other classes.

class InheritanceMixIn(object):
    def get_object(self):
        ...

class Place(models.Model, InheritanceMixIn):
    ...

So what do we need to do in our get_object method? Basically we need to loop over each of the subclasses, try to get the correct attribute, and return it if it's there; if none of them are there, we should just return ourself. We start by looping over the fields:

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]

_meta is where Django stores lots of the internal data about a model, so we get all of the field names; this includes the names of the reverse descriptors that related models provide. Then we get the actual field for each of these names. Now that we have each of the fields we need to test whether it's one of the reverse descriptors for the subclasses:

from django.db.models.related import RelatedObject

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]
            if isinstance(field, RelatedObject) and field.field.primary_key:

We first test whether the field is a RelatedObject, and if it is, we see whether the field on the other model is a primary key, which it will be if it's a subclass (or technically any one-to-one that is a primary key). Lastly we need to find what the name of that attribute is on our model and try to return it:

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]
            if isinstance(field, RelatedObject) and field.field.primary_key:
                try:
                    return getattr(self, field.get_accessor_name())
                except field.model.DoesNotExist:
                    pass
        return self

We try to return the attribute, and if it raises a DoesNotExist exception we move on to the next one; if none of them return anything, we just return ourself.
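
Hypothetical usage, assuming we've saved a few of the Places defined above:

# each object comes back from the ORM as a Place; get_object() upgrades it to
# the most derived instance (at the cost of the extra queries discussed below)
for place in Place.objects.all():
    print place.get_object()
    # e.g. "Joe's the restaurant", "Dive the bar", or "The Park the place"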

And that's all it takes. This won't be super efficient, since for a queryset of n objects this will take O(n*k) queries given k subclasses. Ticket 7270 deals with allowing select_related() to work across reverse one-to-one relations as well, which will allow one to optimise this, since the subclasses would already have been fetched from the database.

You can find the rest here. There are view comments.

Fixing up our identity mapper

Posted December 1st, 2008. Tagged with foreignkey, models, django, internals, orm.

The past two days we've been looking at building an identity mapper in Django. Today we're going to implement some of the improvements I mentioned yesterday.

The first improvement we're going to make is to have it execute the query as usual and just cache the results, to prevent needing to execute additional queries. This means changing the __iter__ method on our queryset class:

def __iter__(self):
    for obj in self.iterator():
        try:
            yield get_from_cache(self.model, obj.pk)
        except KeyError:
            cache_instance(obj)
            yield obj

Now we just iterate over self.iterator(), which is a slightly lower level interface to a queryset's iteration; it bypasses all the caching that normally occurs (this means that, for now at least, if we iterate over our queryset twice we actually execute two queries, whereas Django would normally do just one). However, overall this will be a big win, since before, if an item wasn't in the cache we would do an extra query for it.

The next improvement I proposed was to use Django's built in caching interfaces. However, this won't work, because the built in locmem cache backend pickles everything before caching it and unpickles everything it retrieves from the cache, so we'd end up with different objects (which defeats the point of this).

The last improvement we can make is to have this work on related objects for which we already know the primary key. The obvious route is to start hacking in django.db.models.fields.related; however, as I've mentioned in a previous post, this area of the code is a bit complex. If we know a little bit about how this query is executed, though, we can do the optimisation in a far simpler way. As it turns out, the related object descriptor simply tries to do the query using the default manager's get method. Therefore, we can simply special case this occurrence in order to optimise it. We also have to make a slight change to our manager, as by default the manager won't be used on related object queries:

class CachingManager(Manager):
    use_for_related_fields = True
    def get_query_set(self):
        return CachingQuerySet(self.model)

class CachingQuerySet(QuerySet):
    ...
    def get(self, *args, **kwargs):
        if len(kwargs) == 1:
            k = kwargs.keys()[0]
            if k in ('pk', 'pk__exact', '%s' % self.model._meta.pk.attname, '%s__exact' % self.model._meta.pk.attname):
                try:
                    return get_from_cache(self.model, kwargs[k])
                except KeyError:
                    pass
        clone = self.filter(*args, **kwargs)
        objs = list(clone[:2])
        if len(objs) == 1:
            return objs[0]
        if not objs:
            raise self.model.DoesNotExist("%s matching query does not exist."
                             % self.model._meta.object_name)
        raise self.model.MultipleObjectsReturned("get() returned more than one %s -- it returned %s! Lookup parameters were %s"
                % (self.model._meta.object_name, len(objs), kwargs))

As you can see we just add one line to the manager, and a few lines to the beginning of the get() method. Basically our logic is: if there is only one kwarg to the get() method, and it is a query on the primary key of the model, we try to return our cached instance. Otherwise we fall back to executing the query.

And with this we've improved the efficiency of our identity map. There are almost certainly more places for optimisations, but now we have an identity map in very few lines of code.

You can find the rest here. There are view comments.

A Few More Thoughts on the Identity Mapper

Posted December 1st, 2008. Tagged with foreignkey, models, django, internals, orm.

It's late, and my flight was delayed for several hours, so today is going to be another quick post. With that note, here are a few thoughts on the identity mapper:
  • We can optimize it to actually execute fewer queries by having it run the query as usual, and then use the primary key to check the cache, else cache the instance we already have.
  • As Doug points out in the comments, there are built in caching utilities in Django we should probably be taking advantage of. The only qualification is that whatever cache we use needs to be in memory and in process.
  • The cache is actually going to be more efficient than I originally thought. On a review of the source the default manager is used for some related queries, so our manager will actually be used for those.
  • The next place to optimize will actually be on single related objects(foreign keys and one to ones). That's because we already have their primary key and so we can check for them in the cache without executing any SQL queries.

And lastly, a small note. As you may have noticed I've been doing the National Blog Everyday for a Month Month; since I started two days late, I'm going to be continuing on for another two days.

You can find the rest here. There are view comments.

Building a simple identity map in Django

Posted November 29th, 2008. Tagged with models, django, orm.

In Django's ticket tracker lies ticket 17, the second oldest open ticket. It proposes an optimisation to have instances of the same database object be represented by the same object in Python; essentially that means for this code:

a = Model.objects.get(pk=3)
b = Model.objects.get(pk=3)

a and b would be the same object at the memory level. This can represent a large optimisation in memory usage if your application has the potential to have duplicate objects (for example, related objects). It is possible to implement a very simple identity map without touching the Django source at all.

The first step is to set up some very basic infrastructure; this is going to be almost identical to what Eric Florenzano does in his post, "Drop-dead simple Django caching".

We start with a few helper functions:

_CACHE = {}

def key_for_instance(obj, pk=None):
    if pk is None:
        pk = obj.pk
    return "%s-%s-%s" % (obj._meta.app_label, obj._meta.module_name, pk)

def get_from_cache(klass, pk):
    return _CACHE[key_for_instance(klass, pk)]

def cache_instance(instance):
    _CACHE[key_for_instance(instance)] = instance

We create our cache, which is a Python dictionary, a function to generate the cache key for an object, a function to get an item from the cache, and a function to cache an item. How these work should be pretty simple. Next we need to create some functions to make sure objects get updated in the cache.

from django.db.models.signals import post_save, pre_delete

def post_save_cache(sender, instance, **kwargs):
    cache_instance(instance)
post_save.connect(post_save_cache)

def pre_delete_uncache(sender, instance, **kwargs):
    try:
        del _CACHE[key_for_instance(instance)]
    except KeyError:
        pass
pre_delete.connect(pre_delete_uncache)

Here we set up two signal receivers, when an object is saved we cache it, and when one is deleted we remove it from the cache.

Now we want a way to use our cache the way we already use our connection to the database, this means implementing some sort of hook in a QuerySet, this looks like:

from django.db.models.query import QuerySet

class CachingQuerySet(QuerySet):
    def __iter__(self):
        obj = self.values_list('pk', flat=True)
        for pk in obj:
            try:
                yield get_from_cache(self.model, pk)
            except KeyError:
                instance = QuerySet(self.model).get(pk=pk)
                cache_instance(instance)
                yield instance

    def get(self, *args, **kwargs):
        clone = self.filter(*args, **kwargs)
        objs = list(clone[:2])
        if len(objs) == 1:
            return objs[0]
        if not objs:
            raise self.model.DoesNotExist("%s matching query does not exist."
                             % self.model._meta.object_name)
        raise self.model.MultipleObjectsReturned("get() returned more than one %s -- it returned %s! Lookup parameters were %s"
                % (self.model._meta.object_name, len(objs), kwargs))

We create a subclass of QuerySet and override its __iter__() and get() methods. By default __iter__ does a fair bit of heavy lifting to internally cache the results and allow the usage of multiple iterators properly. We override this to do something simpler: we get the primary key of each item in the queryset and iterate over them; if the object is in the cache we return it, otherwise we execute a database query to get it, and then cache it. We also override get() to make sure it makes use of the caching we just set up.

To use this on a model we need to create a simple manager:

class CachingManager(Manager):
    def get_query_set(self):
        return CachingQuerySet(self.model)

And then we can use this with our models:

class Post(models.Model):
    title = models.CharField(max_length=100)

    objects = CachingManager()

Post.objects.all()

Now all Posts accessed within the same thread will be cached using the strategy we've implemented.

This strategy will not save us database queries; indeed, in some cases it can result in many more queries. It is designed to save memory usage (and to be implemented as simply as possible). It can also be made far more useful by having related objects use this strategy as well (if Post had a foreign key to an author it would be nice to have all post authors share the same instances, since even if you have a large queryset of Posts where all the Posts are unique, they are likely to have duplicate authors).

You can find the rest here. There are view comments.

Other ORM Goodies

Posted November 29th, 2008. Tagged with models, django, orm.

In addition to the aggregate work, the GSOC student had time to finish ticket 7210, which adds support for expressions to filter() and update(). This means you'll be able to execute queries in the form of:

SELECT * FROM table WHERE height > width;

or similar UPDATE queries. This has a syntax similar to that of Q objects, using a new F object. So the above query would look like:

Model.objects.filter(height__gt=F('width'))

or an update query could look like:

Employee.objects.update(salary=F('salary')*1.1)

These objects support the full range of arithmetic operations, and are slated to be a part of Django 1.1.
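
For example (with hypothetical field names), F objects can be combined arithmetically within a single expression:

Order.objects.update(total=F('unit_price') * F('quantity') - F('discount'))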

You can find the rest here. There are view comments.

What aggregates are going to look like

Posted November 27th, 2008. Tagged with models, django, orm.

Prior to Django 1.0 there was a lot of discussion of what the syntax for doing aggregate queries would look like. Eventually a syntax was more or less agreed upon, and over the summer Nicolas Lara implemented it as a Google Summer of Code project, mentored by Russell Keith-Magee. This feature is considered a blocker for Django 1.1, so I'm going to outline what the syntax for these aggregates will be.

To facilitate aggregates, two new methods are being added to the queryset: aggregate and annotate. Aggregate is used to perform basic aggregation on the queryset itself, for example getting the MAX, MIN, AVG, COUNT, and SUM for a given field on the model. Annotate is used for getting information about a related model.

For example, if we had a product model with a price field, we could get the max and minimum price for a product by doing the following:

Product.objects.aggregate(Min('price'), Max('price'))

this will return something like this:

{'price__min': 23.45,
 'price__max': 47.89,
}

We can also give the results aliases, so they're easier to read (if no alias is provided it falls back to using fieldname__aggregate):

Product.objects.aggregate(max_price = Max('price'), min_price = Min('price'))
{'min_price': 23.45,
 'max_price': 47.89,
}

You can also do aggregate queries on related fields, but the idea is the same, return a single value for each aggregate.

In my opinion, annotate queries are far more interesting. Annotate queries let us represent queries such as, "give me all of the Tags that more than 3 objects have been tagged with", which would look like:

Tag.objects.annotate(num_items=Count('tagged')).filter(num_items__gt=3)

This would return a normal queryset where each Tag object has an attribute named num_items, which is the Count() of tagged for it (I'm assuming tagged is a reverse foreign key to a model that represents a tagged relationship). Another query we might want to execute would be to see how many awards the authors of each author's publisher had won; this would look like:

Author.objects.annotate(num_publisher_awards=Count('publisher__authors__awards')).order_by('num_publisher_awards')

This is a little more complicated, but just like when using filter() we can chain this __ syntax. Also, as you've probably noticed, we can filter() and order_by() on these annotated attributes just as we can with regular fields.

If you're interested in seeing more of how this works, Nicolas Lara has written some documentation and doc tests that you can see here. For now none of this is in the Django source tree yet, but there is a patch with the latest work on ticket 366.

Happy thanksgiving!

A timeline view in Django

Posted November 24th, 2008. Tagged with python, models, tips, django, orm.

One thing a lot of people want to do in Django is to have a timeline view that shows all the objects of a given set of models ordered by a common key. Unfortunately the Django ORM doesn't have a way of representing this type of query. There are a few techniques people use to solve this. One is to have all of the models inherit from a common base class that stores all the common information and has a method to get the actual object. The problem with this is that it could execute either O(N) or O(N*k) queries, where N is the number of items and k is the number of models: it's O(N) if your base class stores the subtype on it, in which case you can grab the actual object directly, else it's O(N*k) since you have to try each type. Another approach is to use a generic relation, but this also needs O(N) queries, since you need to get the related object for each generic one. However, there's a better solution.

What we can do is get a queryset for each of the models we want to display (O(k) queries), sorted on the correct key, and then use a simple merge to combine all of these querysets into a single list, comparing on that key. While this may technically do more operations than the other methods, it does fewer database queries, and the database is often the most difficult portion of your application to scale.

Let's say we have 3 models: tickets, changesets, and wiki page edits (what you see in a typical Trac install). We can get our querysets and then merge them like so:

from django.shortcuts import render_to_response

def my_view(request):
    tickets = Ticket.objects.order_by('create_date')
    wikis = WikiEdit.objects.order_by('create_date')
    changesets = Changeset.objects.order_by('create_date')
    objs = merge(tickets, wikis, changesets, field='create_date')
    return render_to_response('my_app/template.html', {'objects': objs})

Now we just need to write our merge function:

def merge_lists(left, right, field=None):
    # A standard two-way merge: repeatedly take whichever head element
    # has the smaller value for the given field.
    i, j = 0, 0
    result = []
    while i < len(left) and j < len(right):
        if getattr(left[i], field) <= getattr(right[j], field):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge(*querysets, **kwargs):
    field = kwargs.pop('field', None)
    if field is None:
        raise TypeError('you need to provide a key to do comparisons on')
    if len(querysets) == 1:
        return list(querysets[0])

    qs = [list(x) for x in querysets]
    q1, q2 = qs.pop(), qs.pop()
    result = merge_lists(q1, q2, field)
    for q in qs:
        result = merge_lists(result, q, field)
    return result

There might be a more efficient way to write our merge function, but for now it merges together an arbitrary number of querysets on a given key.

And that's all there is to it. If you see a good way to make the merge function more efficient, let me know; I would have liked to use Python's included heapq module, but it doesn't have a way to use a custom comparison function that I saw.
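
As an aside for anyone reading this on a newer version of Python: heapq.merge accepts a key argument in Python 3.5 and later, so a sketch like the following would work there, assuming each queryset is already ordered by create_date:

import heapq
from operator import attrgetter

objs = list(heapq.merge(tickets, wikis, changesets, key=attrgetter('create_date')))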

Uncoupled code is good, but doesn't exist

Posted November 19th, 2008. Tagged with python, models, django, orm, turbogears.

Code should try to be as decoupled from the code it depends on as possible. I want my C++ to work with any compiler, I want my web framework to work with any ORM, I want my ORM to work with any database. While all of these are achievable goals, some of the decoupling people are searching for is simply not possible. At DjangoCon 2008 Mark Ramm made the argument that the Django community was too segregated from the Python community, both in terms of the community itself and the code: Django, for example, doesn't take enough advantage of WSGI-level middleware, and has an ORM unto itself. I believe some of these claims to be true, but I ultimately think the level of uncoupling some people want is simply impossible.

One of Django's biggest selling features has always been its automatically generated admin. The admin requires you to be using Django's models. Some people would like it to be decoupled. To them I ask, how? It's not as if Django's admin has a big if not isinstance(obj, models.Model): raise Exception; it simply expects whatever is passed to it to define the same API as it uses. And this points at the larger concern: the Django admin is simply an application, it has no hooks within Django itself, it just happens to live in that namespace. The moment any application does Model.objects.all(), it's no longer ORM agnostic; it has already assumed the use of Django's ORM. However, all this means is that applications themselves are inextricably tied to a given ORM, templating language, and any other module they import; you quite simply can't write reasonable code that works just as well with two different modules unless they both define the same API.

Eric Florenzano wrote a great blog post yesterday about how Django could take better advantage of WSGI middleware, and he's absolutely correct. It makes no sense for a Django project to have its own special middleware for using Python's profiling modules when it can be done more generically a level up; all the code is in Python after all. However, there are also things that you can't abstract out like that, because they require knowledge of what components you are using: SQLAlchemy has one transaction model, Django has another.

The fact that an application is tied to the modules it uses is not an argument against it. A Django application is no more tightly coupled to Django's ORM and template system than a TurboGears application is to SQLAlchemy and Mako, which is to say of course they're tied to them: they import those modules, they use them, and unless the alternative implements the same API you can't just swap them out. And that's not a bad thing.

What Python learned from economics

Posted November 18th, 2008. Tagged with economics, django, python.

I find economics to be a fairly interesting subject; mind you, I'm bored out of my mind hearing about the stock markets, derivatives, and whatever else is on CNBC, but I find what guys like Steven Levitt and Steven E. Landsburg do to be fascinating. A lot of what they write about is why people do what they do, and how to incentivize people to do the right thing. Yesterday I was reading through David Goodger's Code Like a Pythonista when I got to this portion:

LUKE: Is from module import * better than explicit imports?

YODA: No, not better. Quicker, easier, more seductive.

LUKE: But how will I know why explicit imports are better than the wild-card form?

YODA: Know you will when your code you try to read six months from now.

And I realized that Python had learned a lot from these economists.

It's often difficult for a programmer to see the advantage of doing something the right way, which will be beneficial in six months, over just getting something done now. However, Python enforces doing things the right way, and when doing things the right way is just as easy as doing them the wrong way, you make the intuitive decision of doing things the right way. Almost every code base I've worked with (outside of Python) had some basic indentation rules that the code observed; Python just encodes this into the language, which requires all code to have a certain level of readability.

Django has also learned this lesson. For example, the template language flat out prevents you from putting your business logic inside of it without doing some real work; you don't want to do that work, so you do things the right way and put your business logic in your views. Another example would be database queries: in Django it would be harder to write a query that injected unescaped data into your SQL than it would be to do the right thing and use parameterized queries.

Ultimately, this is why I like Python. The belief that best practices shouldn't be optional, and that they shouldn't be difficult, creates a community where you actively want to go and learn from people's code. Newcomers to the language aren't encouraged to "just get something working, and then clean it up later"; the community encourages them to do it right in the first place, and save themselves the time later.

Running the Django Test Suite

Posted November 17th, 2008. Tagged with django, tests.

This question came up in IRC yesterday, so I figured I'd run through it today. Django has a very extensive test suite that tests all of the components of Django itself. If you've ever written a patch for Django you probably know that tests are a requirement, for both new features and bug fixes. I'm going to try to run down how to set up your own testing environment.

First you need to have Django installed somewhere; for now I'll assume you have it in ~/django_src/. Somewhere on your Python path, go ahead and use django-admin.py to create a new project. I've named this project django_test. Next, inside of that project create a folder named settings, move settings.py into that folder, and rename it __init__.py. The reason we're going to have a settings directory is so that we can have subsettings for individual scenarios. Put some default settings for the various fields in there now; for example, my default settings provide the necessary options for SQLite. Now, if there are any other subsettings you wish to set up, create a file for them in the settings directory, and at the top of this file put from django_test.settings import *, followed by whatever settings you wish to override. For example, I have a mysql.py that overrides my default SQLite database settings with MySQL values.
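
A minimal sketch of what such a subsettings file might look like (the values here are just placeholders for whatever your MySQL setup needs):

# django_test/settings/mysql.py
from django_test.settings import *

DATABASE_ENGINE = 'mysql'
DATABASE_NAME = 'django_test'
DATABASE_USER = 'root'
DATABASE_PASSWORD = ''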

Now that we have our test settings, go to the directory where you have Django installed. To run the tests do ./tests/runtests.py --settings=django_test.settings. You can also use the DJANGO_SETTINGS_MODULE environment variable in place of passing in the settings like this. You can also provide a verbosity argument; the default is 0, but I prefer to run with verbosity=1 because this keeps you up to date on the progress of the tests without too much extraneous output.

Usually when you are working on a patch you don't need to run the entire test suite; your patch only affects a few tests. Therefore, you can provide a list of tests you'd like to run; for example, you could do ./tests/runtests.py --settings=django_test.settings.mysql -v 1 model_forms model_formsets. This will run the model_forms and model_formsets tests, with your MySQL settings, at verbosity level 1.

And that's all it takes to run the Django test suite.

What I'm excited about in Django 1.1

Posted November 16th, 2008. Tagged with django.

This past week, Jacob Kaplan-Moss put together the list of all of the features proposed for Django 1.1, and began to solicit comments on them. This is going to be a list of features I'm excited about.
  • Making admin URLs reversible. Currently the Django admin uses a bit of a convoluted scheme to route URLs; the proposal is to have them work using the normal URL resolution scheme. This is something I've been working on for a while, and I'm hoping to see it through to completion.
  • Comment-utils inclusion. The proposal is to include the moderation-level features from comment-utils in Django. I think this is a great idea, and I can't wait to see what sort of spam-checking schemes people implement once moderation facilities are included.
  • Message passing for anonymous users. This is basically session-level message passing. This is something I've had to implement in the past, so I'm looking forward to it.
  • ORM aggregation. As part of the Google Summer of Code, Nicolas Lara, mentored by Russell Keith-Magee, implemented this. I love the API design, and it's a hugely requested feature; I can't wait to point new users to the docs rather than explaining to them that it's coming soon.
  • ORM expression support. This work was also done by Nicolas Lara, and will let you do things like Model.objects.filter(height__gt=F('width')) or Model.objects.update(salary=F('salary') * 1.2).
  • Model validation. Before 1.0 I implemented unique and unique_together checks for model forms; those are pretty much a massive special case of this feature.
  • Class-based generic views. Often one of the first things a new Django user will learn to do is create a view that simply wraps a generic view in order to do some custom filtering on a queryset. This is a great solution for that use case; however, as users want to inject more and more flexibility into a generic view, it can lead to a huge number of settings. Rather than this, subclassing a generic view could provide a nice, clean solution.

These will all be great features, and there are many more proposed (you can see them all here); however, these features only happen because people write code for them. If there's a feature you're excited about, or are interested in making a reality, try to contribute to it, even if it's just writing some unit tests.

And now for the disclaimer

Posted November 14th, 2008. Tagged with disclaimer, django, python, internals.

I've been discussing portions of how the Django internals work, and this is powerful knowledge for a Django user. However, these are internals, and unless they are documented, internals are not guaranteed to continue to work. That doesn't mean they break very frequently (they don't), but you should be aware that they carry no guarantee of compatibility going forward.

Having said that, I've already discussed ways you can use these internals to do powerful things, and you've probably seen other ways to use them in your own code. In my own development I don't really balk at the idea of using the internals, because I track Django's development very aggressively and can update my code as necessary, but for a lot of developers that isn't really an option. Once you deploy something you need it to work, so your options are to either lock your code to a specific version of Django, or to not use these internals. What happens if you want to update to Django 1.1 for aggregation support, but 1.1 also removed some internal helper function you were using? Something similar to this happened to django-tagging: before the queryset-refactor branch was merged into trunk there was a helper function to parse portions of the query, and django-tagging made use of it. However, queryset-refactor obsoleted this function and removed it, and so django-tagging had to update in order to keep working going forward, needing to either handle this situation in the code itself or maintain two separate branches.

In my opinion, while these things may break, they are worth using if you need them, because they let you do very powerful things. This may not be the answer for everyone though. In any event I'm going to continue writing about them, and if they interest you Marty Alchin has a book coming out, named Pro Django, that looks like it will cover a lot of these.

Django Models - Digging a Little Deeper

Posted November 13th, 2008. Tagged with foreignkey, python, models, django, orm, metaclass.

For those of you who read my last post on Django models, you probably noticed that I skirted over a few details; specifically, for quite a few items I said we "added them to the new class". But what exactly does that entail? Here I'm going to look at the add_to_class method that's present on the ModelBase metaclass we looked at earlier, and the contribute_to_class method that's present on a number of classes throughout Django.

So first, the add_to_class method. This is called for each item we add to the new class, and what it does is: if the value has a contribute_to_class method, then we call that with the new class and its name (the name it should attach itself to the new class as) as arguments; otherwise we simply set that attribute to that value on the new class. So, for example, with new_class.add_to_class('abc', 3), 3 doesn't have a contribute_to_class method, so we just do setattr(new_class, 'abc', 3).
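
In rough outline (a simplified sketch, not the actual Django source), add_to_class looks something like:

def add_to_class(cls, name, value):
    # Values that know how to install themselves get to do so;
    # everything else is set as a plain attribute.
    if hasattr(value, 'contribute_to_class'):
        value.contribute_to_class(cls, name)
    else:
        setattr(cls, name, value)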

The contribute_to_class method is more common for things you set on your class, like Fields or Managers. The contribute_to_class method on these objects is responsible for doing whatever is necessary to add the object to the new class and do its setup. If you remember from my first blog post about User Foreign Keys, we used the contribute_to_class method to add a new manager to our class. Here we're going to look at what a few of the built-in contribute_to_class methods do.

The first case is a manager. The manager sets its model attribute to be the model it's added to. Then it checks to see whether or not the model already has a _default_manager attribute; if it doesn't, or if this manager's creation counter is lower than that of the current default manager, it sets itself as the default manager on the new class. The creation counter is essentially a way for Django to keep track of which manager was added to the model first. Lastly, if this is an abstract model, it adds itself to the abstract_managers list in _meta on the model.
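
In rough pseudo-Python (a simplification, not the real code, just to illustrate the flow described above):

class Manager(object):
    creation_counter = 0

    def __init__(self):
        # Each manager remembers the order in which it was instantiated.
        self.creation_counter = Manager.creation_counter
        Manager.creation_counter += 1

    def contribute_to_class(self, model, name):
        self.model = model
        setattr(model, name, self)  # Django actually installs a descriptor here
        default = getattr(model, '_default_manager', None)
        if default is None or self.creation_counter < default.creation_counter:
            model._default_manager = self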

The next case is if the object is a field. Different fields actually do slightly different things, but first we'll cover the general field case. The field first sets a few of its internal attributes: what its name is on the new model, its column name in the db, and its verbose_name if one isn't explicitly provided. Next it calls add_field on the model's _meta to add itself there. Lastly, if the field has choices, it sets the get_FIELD_display method on the class.

Another case is for file fields. They do everything a normal field does, plus some more stuff. They also add a FileDescriptor to the new class, and they also add a signal receiver so that when an instance of the model is deleted the file also gets deleted.

The final case is for related fields. This is also the most complicated case. I won't describe exactly what this code does, but its biggest responsibility is to set up the reverse descriptors on the related model; those are the nice things that let you do author_obj.books.all().

Hopefully this gives you a good idea of what to do if you wanted to create a new field-like object in Django. For another example of using these techniques, take a look at the generic foreign key field in django.contrib.contenttypes, here.

What software do I use?

Posted November 12th, 2008. Tagged with ubuntu, gtk, django, python.

Taking a page from Brian Rosner's book, today I'm going to give an overview of the software I use day to day. I'm only going to cover stuff I use under Ubuntu; I keep Windows XP on my system for gaming, but I'm not going to cover it here.

  • Ubuntu, I've been using the current version, Intrepid Ibex, since Alpha 4, and I love it. You quite simply couldn't get me to go back to Windows.
  • Python, it's my go to language, I fell in love about 14 months ago and I'm never going to leave it.
  • Django, it's my framework of choice, it's simple, clean, and well designed.
  • g++, C++ is the language used in my CS class, so I use my favorite free compiler.
  • gnome-do, this is an incredibly handy application, similar to Quicksilver for OS X, it makes simple things super fast, stuff like spawning the terminal, posting a tweet, searching for a file, or calling on Google's awesome calculator.
  • Firefox, the tried and true free browser, I also have to thank, Gmail Notifier, Firebug, Download them All, and Reload Every.
  • Chatzilla, I figured this extension deserved its own mention, I use it almost 24/7 and couldn't live without it.
  • Gedit, who would think that the text editor that came with my OS would be so great?
  • VLC and Totem, you guys are both great, VLC is a bit nicer for playing flvs, but I love Totem's ability to search and play movies from Youtube.
  • Skype, makes it easy to get conference calls going with 5 friends, couldn't live without it.

As you can see, most of the software I use is open source. I don't imagine anything I use is very far outside the mainstream, but all of these projects deserve a round of applause for being great.

How the Heck do Django Models Work

Posted November 10th, 2008. Tagged with models, django, python, metaclass.

Anyone who has used Django for just about any length of time has probably used a Django model, and possibly wondered how it works. The key to the whole thing is what's known as a metaclass; a metaclass is essentially a class that defines how a class is created. All the code for this lives here. And without further ado, let's see what it does.

So the first thing to look at is the __new__ method. __new__ is sort of like __init__, except instead of returning an instance of the class, it returns a new class. You can sort of see this in the argument signature: it takes cls, name, bases, and attrs. Where __init__ takes self, __new__ takes cls. name is a string which is the name of the class, bases is the tuple of classes this new class subclasses, and attrs is a dictionary mapping names to class attributes.
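
In skeleton form (a simplified sketch, not the actual source), that looks roughly like:

class ModelBase(type):
    def __new__(cls, name, bases, attrs):
        # name:  the class name as a string, e.g. 'Post'
        # bases: the base classes, e.g. (models.Model,)
        # attrs: a dict like {'__module__': ..., 'title': CharField(...), 'Meta': ...}
        new_class = super(ModelBase, cls).__new__(cls, name, bases, {'__module__': attrs.pop('__module__')})
        # ... all of the work described below happens here ...
        return new_class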

The first thing the __new__ method does is check if the new class is a subclass of ModelBase, and if it's not, it bails out and returns a normal class. The next thing it does is get the module of the class and set that attribute on the new class (this is going to be a recurring theme: getting something from the original class and putting it in the right place on the new class). Then it checks if it has a Meta class (where you define your model-level options). It has to look in two places for this: first in the attrs dictionary, which is where it will be if you stick class Meta inside your class; however, because of inheritance, we also have to check if the class already has a _meta attribute (this is where Django ultimately stores a bunch of internal information), and handle that scenario as well.

Next we get the app_label attribute; for this we either use the app_label attribute in the Meta class, or we pull it out of sys.modules. Lastly (at least for Meta), we build an instance of the Options class (which lives at django.db.models.options.Options) and add it to the new class as _meta. Next, if this class isn't an abstract base class, we add the DoesNotExist and MultipleObjectsReturned exceptions to the class, and also inherit the ordering and get_latest_by attributes if we are a subclass.
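
As a quick illustration of where all of this ends up, once the class has been built you can poke at these pieces from a shell (assuming a Post model with a title field):

Post._meta.app_label
Post._meta.get_field('title')
Post.DoesNotExist
Post.MultipleObjectsReturned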

Now we start getting to adding fields and stuff. First we check if we have a _default_manager attribute, and if not, we set it to None. Next we check if we've already defined the class, and if we have, we just return the class we already created. Now we go through each item that's left in the attrs dictionary and call the add_to_class method with it on the new class. add_to_class is a piece of internals that you may recognize from my first two blog posts; I'll explain exactly what it does in another post, but at its most basic level it adds each item in the dictionary to the new class, and each item knows exactly where it needs to get put.

Now we do a bunch of stuff to deal with inherited models. We iterate through every item in bases that's also a subclass of models.Model and do the following: if it doesn't have a _meta attribute, we ignore it. If the parent isn't an abstract base class, then if we already have a OneToOne field to it we set it up as a primary key; otherwise we create a new OneToOne field and install it as a primary key for the model. If, instead, it is an abstract class, we iterate through its fields; if any of these fields has a name that is already defined on our class, we raise an error, otherwise we add that field to our class. Then we move managers from the parents down to the new class: essentially we just copy them over, and we also copy over virtual fields (these are things like GenericForeignKeys, which don't actually have a database field, but which we still need to pass down and set up appropriately).
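
For example (a purely illustrative sketch), with concrete inheritance like the following, Restaurant automatically gets a OneToOne field back to Place that acts as its primary key:

class Place(models.Model):
    name = models.CharField(max_length=50)

class Restaurant(Place):
    serves_pizza = models.BooleanField()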

And then we do a few final pieces of cleanup. We make sure our new class doesn't have abstract=True in its _meta, even if it inherited from an abstract class. We add a few methods (get_next_in_order, and others), we inherit the docstring or set a new one, and we send the class_prepared signal. Finally, we register the model with Django's model loading system, and return the instance from Django's model cache; this is to make sure we don't have duplicate copies of the class floating around.

And that's it! Obviously I've skirted over how exactly some things occur, but you should have a basic idea of what happens. As always with Django, the source is an excellent resource. Hopefully you now have a better idea of what exactly happens when you subclass models.Model.

That's not change we can believe in

Posted November 7th, 2008. Tagged with php, django, python, obama.

Yesterday president-elect Obama's campaign unveiled their transitional website, change.gov. So, as someone who's interested in these things, I immediately began to look at what language, framework, or software package they were using. The first thing I saw was that they were using Apache; however, beyond that there were no distinctive headers. None of the pages had tell-tale extensions like .php or .aspx. However, one thing that struck me was that most pages were at a URL in the form of /page/*/, which is the same format as the Obama campaign website, which I knew was powered by Blue State Digital's CMS. On the Obama campaign's site, however, there were a few pages with those tell-tale .php extensions, so I've come to the conclusion that the new site also uses PHP. And to that I say, that's not change we can believe in.

PHP has been something of a powerhouse in web development for the last few years; noted for its ease of deployment and quick startup times, it's drawn in legions of new users. However, PHP has several notable flaws. Firstly, it doesn't encourage best practices, ranging from things like code organization (PHP currently has no concept of namespaces) to database security (the included mysql database adapter doesn't feature parameterized queries) and beyond. However, this isn't just another post to bash on PHP (as much as I'd like to write one); there are already plenty of those out there. This post is instead meant to offer some of the benefits of switching to Python, or Ruby, or whatever else.

  • You develop faster. Using a framework like Django, or Rails, or TurboGears lets you do things very quickly.
  • You get the benefits of the community: with Django you get all the reusable applications, Rails has plugins, TurboGears has middleware. Things like these quite simply don't exist in the PHP world.
  • You get a philosophy. As far as I can tell, PHP has no philosophy; however, both Python and Ruby do, and so do their respective frameworks. Working within a consistent philosophy makes development remarkably more sane.

If you currently are a user of PHP, I beg of you, take a chance, try out Ruby or Python, or whatever else. Give Django, or TurboGears, or Rails a shot. Even if you don't end up liking it, or switching, it's worth giving it a shot.

More Laziness with Foreign Keys

Posted November 4th, 2008. Tagged with foreignkey, models, django, orm.

Yesterday we looked at building a field to make the process of getting a ForeignKey to the User model simpler, and to provide us with some useful tools, like the manager. But this process can be generalized and made more robust. First we want to have a lazy ForeignKey field for all models (be careful not to confuse the term lazy; here I use it to refer to the fact that I am a lazy person, not to the fact that foreign keys are lazily loaded).

A more generic lazy foreign key field might look like:

from django.db.models import ForeignKey, Manager

class LazyForeignKey(ForeignKey):
    def __init__(self, *args, **kwargs):
        # The target model is either the 'to' kwarg or the first positional argument.
        model = kwargs.get('to')
        if model is None:
            model = args[0]
        try:
            name = model._meta.object_name.lower()
        except AttributeError:
            # The model was given as a string, e.g. "auth.User".
            name = model.split('.')[-1].lower()
        self.manager_name = kwargs.pop('manager_name', 'for_%s' % name)
        super(LazyForeignKey, self).__init__(*args, **kwargs)

    def contribute_to_class(self, cls, name):
        super(LazyForeignKey, self).contribute_to_class(cls, name)

        class MyManager(Manager):
            def __call__(self2, obj):
                return cls._default_manager.filter(**{self.name: obj})

        cls.add_to_class(self.manager_name, MyManager())

As you can see, a lot of the code is the same as before. Most of the new code is in getting the model's name, either through _meta or through the last part of the string (i.e. User in "auth.User"). And now you will have a manager on your class, named either for_X, where X is the lowercased name of the model the foreign key points to, or named whatever the manager_name kwarg is.

So if your model has this:

teacher = LazyForeignKey(Teacher)

You would be able to do:

MyModel.for_teacher(Teacher.objects.get(id=3))

That's all for today. Since tonight is election night, tomorrow I'll probably post about my application election-sim, and about PyGTK and PyProcessing (aka multiprocessing).

Lazy User Foreign Keys

Posted November 3rd, 2008. Tagged with foreignkey, models, django, orm.

A very common pattern in Django is for models to have a foreign key to django.contrib.auth.User for the owner (or submitter, or whatever other relation with User), and then to have views that filter this down to the related objects for a specific user (often the currently logged-in user). If we think ahead, we can make a manager with a method to filter down to a specific user. But since we are really lazy, we are going to make a field that automatically generates the foreign key to User and automatically gives us a manager to filter for a specific User, and we can reuse this for all types of models.

So what does the code look like:

from django.db.models import ForeignKey, Manager
from django.contrib.auth.models import User

class LazyUserForeignKey(ForeignKey):
    def __init__(self, **kwargs):
        kwargs['to'] = User
        self.manager_name = kwargs.pop('manager_name', 'for_user')
        super(LazyUserForeignKey, self).__init__(**kwargs)

    def contribute_to_class(self, cls, name):
        super(LazyUserForeignKey, self).contribute_to_class(cls, name)

        class MyManager(Manager):
            def __call__(self2, user):
                # self2 is the manager instance; self is the field, whose name we filter on.
                return cls._default_manager.filter(**{self.name: user})

        cls.add_to_class(self.manager_name, MyManager())

So now, what does this do?

We are subclassing ForeignKey. In __init__ we make sure to is set to User, and we also set self.manager_name equal to either the manager_name kwarg, if provided, or 'for_user'. contribute_to_class gets called by the model metaclass to add each item to the model itself. So here we call the parent method, to get the ForeignKey itself set on the model, and then we create a new subclass of Manager. We define a __call__ method on it; this lets us call an instance as if it were a function. We make __call__ return the QuerySet you would get by filtering the default manager for the class where the user field is equal to the given user. And then we add it to the class with the name provided earlier.
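
For example (purely illustrative), a model using the field might look like:

from django.db import models

class MyModel(models.Model):
    title = models.CharField(max_length=100)
    user = LazyUserForeignKey()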

And that's all. Now we can do things like:

MyModel.for_user(request.user)

Next post we'll probably look at making this more generic.
