Recently Russell Keith-Magee and I decided that the Meta.using option needed to be removed from the multiple-db work on Django, and so we did. Yesterday someone tweeted that this change caught them off guard, so I wanted to provide a bit of explanation as to why we made that change.
The first thing to note is that Meta.using was very good for one specific use case, horizontal partitioning by model. Meta.using allowed you to tie a specific model to a specific database by default. This meant that if you wanted to do things like have users be in one db and votes in another this was basically trivial. Making this use case this simple was definitely a good thing.
The downside was that this solution was very poorly designed, particularly in light on Django's reusable application philosophy. Django emphasizes the reusability of application, and having the Meta.using option tied your partitioning logic to your models, it also meant that if you wanted to partition a reusable application onto another DB this easily the solution was to go in and edit the source for the reusable application. Because of this we had to go in search of a better solution.
The better solution we've come up with is having some sort of callback you can define that lets you decide what database each query should be executed on. This would let you do simple things like direct all queries on a given model to a specific database, as well as more complex sharding logic like sending queries to the right database depending on which primary key value the lookup is by. We haven't figured out the exact API for this, and as such this probably won't land in time for 1.2, however it's better to have the right solution that has to wait than to implement a bad API that would become deprecated in the very next release.
You can find the rest here. There are view comments.
This summer as a part of the Google Summer of Code program Zain Memon worked on improving the UI for Django's admin, specifically he integrated jQuery for various interface improvements. I am opposed to including jQuery in Django's admin, as far as I know I'm the only one. I should note that on a personal level I love jQuery, however I don't think that means it should be included in Django proper. I'm going to try to explain why I think it's a bad idea and possibly even convince you.
The primary reason I'm opposed is because it lowers the pool of people who can contribute to developing Django's admin. I can hear the shouts from the audience, "jQuery makes Javascript easy, how can it LOWER the pool". By using jQuery we prevent people who know Javascript, but not jQuery from contributing to Django's admin. If we use more "raw" Javascript then anyone who knows jQuery should be able to contribute, as well as anyone who knows Mootools, or Dojo, or just vanilla Javascript. I'm sure there are some people who will say, "but it's possible to use jQuery without knowing Javascript", I submit to you that this is a bad thing and certainly shouldn't be encouraged. We need to look no further than Jacob Kaplan-Moss's talks on Django where he speaks of his concern at job postings that look for Django experience with no mention of Python.
The other reason I'm opposed is because selecting jQuery for the admin gives the impression that Django has a blessed Javascript toolkit. I'm normally one to say, "if people make incorrect inferences that's their own damned problem," however in this case I think they would be 100% correct, Django would have blessed a Javascript toolkit. Once again I can hear the calls, "But, it's in contrib, not Django core", and again I disagree, Django's contrib isn't like other projects' contrib directories that are just a dumping ground for random user contributed scripts and other half working features. Django's contrib is every bit as official as parts of Django that live elsewhere in the source tree. Jacob Kaplan-Moss has described what django.contrib is, no part of that description involves it being less official, quite the opposite in fact.
For these reasons I believe Django's admin should avoid selecting a Javascript toolkit, and instead maintain it's own handrolled code. Though this brings an increase burden on developers I believe it is more important to these philosophies than to take small development wins. People saying this stymies the admin's development should note that Django's admin's UI has changed only minimally over the past years, and only a small fraction of that can be attributed to difficulties in Javascript development.
You can find the rest here. There are view comments.
As you, the reader, may know this summer I worked for the Django Software Foundation via the Google Summer of Code program. My task was to implement multiple database support for Django. Assisting me in this task were my mentors Russell Keith-Magee and Nicolas Lara (you may recognize them as the people responsible for aggregates in Django). By the standards of the Google Summer of Code project my work was considered a success, however, it's not yet merged into Django's trunk, so I'm going to outline what happened, and what needs to happen before this work is considered complete.
Most of the major things happened, settings were changed from a series of DATABASE_* to a DATABASES setting that's keyed by DB aliases and who's values are dictionaries containing the usual DATABASE* options, QuerySets grew a using() method which takes a DB alias and says what DB the QuerySet should be evaluated against, save() and delete() grew similar using keyword arguments, a using option was added to the inner Meta class for models, transaction support was expanded to include support for multiple databases, as did the testing framework. In terms of internals almost every internal DB related function grew explicit passing of the connection or DB alias around, rather than assuming the global connection object as they used to. As I blogged previously ManyToMany relations were completely refactored. If it sounds like an awful lot got done, that's because it did, I knew going in that multi-db was a big project and it might not all happen within the confines of the summer.
So if all of that stuff got done, what's left? Right before the end of the GSOC time frame Russ and I decided that a fairly radical rearchitecting of the Query class (the internal datastructure that both tracks the state of an operation and forms its SQL) was needed. Specifically the issue was that database backends come in two varieties. One is something like a backend for App Engine, or CouchDB. These have a totally different design than SQL, they need different datastructures to track the relevant information, and they need different code generation. The second type of database backend is one for a SQL database. By contrast these all share the same philosophies and basic structure, in most cases their implementation just involves changing the names of database column types or the law LIMIT/OFFSET is handled. The problem is Django treated all the backends equally. For SQL backends this meant that they got their own Query classes even though they only needed to overide half of the Query functionality, the SQL generation half, as the datastructure part was identical since the underlying model is the same. What this means is that if you make a call to using() on a QuerySet half way through it's construction you need to change the class of the Query representation if you switch to a database with a different backend. This is obviously a poor architecture since the Query class doesn't need to be changed, just the bit at the end that actually constructs the SQL. To solve this problem Russ and I decided that the Query class should be split into two parts, a Query class that stores bits about the current query, and a SQLCompiler which generated SQL at the end of the process. And this is the refactoring that's holding up the merger of my multi-db work primarily.
This work is largely done, however the API needs to be finalized and the Oracle backend ported to the new system. In terms of other work that needs to be done, GeoDjango needs to be shown to shown to still work (or fixed). In my opinion everything else on the TODO list (available here, please don't deface) is optional for multi-db to be merge ready, with the exception of more example documentation.
There are already people using the multi-db branch (some even in production), so I'm confident about it's stability. For the next 6 weeks or so (until the 1.2 feature deadline), my biggest priority is going to be getting this branch into a merge ready state. If this is something that interests you please feel free to get in contact with me (although if you don't come bearing a patch I might tell you that I'll see you in 6 weeks ;)), if you happen to find bugs they can be filed on the Django trac, with version "soc2009/multidb". As always contributors are welcome, you can find the absolute latest work on my Github and a relatively stable version in my SVN branch (this doesn't contain the latest, in progress, refactoring). Have fun.
You can find the rest here. There are view comments.
If you follow Django's development, or caught next week's DjangoDose Tracking Trunk episode (what? that's not how time flows you say? too bad) you've seen the recent ManyToManyField refactoring that Russell Keith-Magee committed. This refactoring was one of the results of my work as a Google Summer of Code student this summer. The aim of that work was to bring multiple database support to Django's ORM, however, along the way I ran into refactor the way ManyToManyField's were handled, the exact changes I made are the subject of tonight's post.
If you've looked at django.db.models.fields.related you may have come away asking how code that messy could possibly underlie Django's amazing API for handling related objects, indeed the mess so is so bad that there's a comment which says:
# HACK
which applies to an entire class. However, one of the real travesties of this module was that it contained a large swath of raw SQL in the manager for ManyToMany relations, for example the clear() method's implementation looks like:
cursor = connection.cursor()
cursor.execute("DELETE FROM %s WHERE %s = %%s" % \
(self.join_table, source_col_name),
[self._pk_val])
transaction.commit_unless_managed()
As you can see this hits the trifecta, raw SQL, manual transaction handling, and the use of a global connection object. From my perspective the last of these was the biggest issue. One of the tasks in my multiple database branch was to remove all uses of the global connection object, and since this uses it it was a major target for refactoring. However, I really didn't want to rewrite any of the connection logic I'd already implemented in QuerySets. This desire to avoid any new code duplication, coupled with a desire to remove the existing duplication (and flat out ugliness), led me to the simple solution: use the existing machinery.
Since Django 1.0 developers have been able to use a full on model for the intermediary table of a ManyToMany relation, thanks to the work of Eric Florenzano and Russell Keith-Magee. However, that support was only used when the user explicitly provided a through model. This of course leads to a lot of methods that basically have two implementation: one for the through model provided case, and one for the normal case -- which is yet another case of code bloat that I was now looking to eliminate. After reviewing these items my conclusion was that the best course was to use the provided intermediary model if it was there, otherwise create a full fledged model with the same fields (and everything else) as the table that would normally be specially created for the ManyToManyField.
The end result was dynamic class generation for the intermediary model, and simple QuerySet methods for the methods on the Manager, for example the clear() method I showed earlier now looks like this:
self.through._default_manager.filter(**{
source_field_name: self._pk_val
}).delete()
Short, simple, and totally readable to anyone with familiarity with Python and Django. In addition this move allowed Russell to fix another ticket with just two lines of code. All in all this switch made for cleaner, smaller code and fewer bugs.
Tomorrow I'm going to be writing about both the talk I'm going to be giving at PyCon, as well as my experience as a member of the PyCon program committee. See you then.
You can find the rest here. There are view comments.