alex gaynor's blago-blog

Posts tagged with foreignkey

Fixing up our identity mapper

Posted December 1st, 2008. Tagged with foreignkey, models, django, orm, internals.

The past two days we've been looking at building an identity mapper in Django. Today we're going to implement some of the improvements I mentioned yesterday.

The first improvement we're going to do is it have it execute the query as usual and just cache the results, to prevent needing to execute additional queries. This means changing the __iter__ method on our queryset class:

def __iter__(self):
    for obj in self.iterator():
        try:
            yield get_from_cache(self.model, obj.pk)
        except KeyError:
            cache_instance(obj)
            yield obj

Now we just iterate over self.iterator() which is a slightly lower level interface to a querysets iteration, it bypasses all the caching that occurs(this means that for now at least, if we iterate over our queryset twice we actually execute two queries, whereas Django would normally do just one), however overall this will be a big win, since before if an item wasn't in the cache we would do an extra query for it.

The next improvement I proposed was to use Django's built in caching interfaces. However, this won't work, this is because the built in locmem cache backend pickles and unpickles everything before caching and retrieving everything from the cache, so we'd end up with different objects(which defeats the point of this).

The last improvement we can make is to have this work on related objects for which we already know the primary key. The obvious route to do this is to start hacking in django.db.models.fields.related, however as I've mentioned in a previous post this area of the code is a bit complex, however if we know a little bit about how this query is executed we can do the optimisation in a far simpler way. As it turns out the related object descriptor simply tries to do the query using the default manager's get method. Therefore, we can simply special case this occurrence in order to optimise this. We also have to make a slight chance to our manager, as by default the manager won't be used on related object queries:

class CachingManager(Manager):
    use_for_related_fields = True
    def get_query_set(self):
        return CachingQuerySet(self.model)

class CachingQueryset(QuerySet):
    ...
    def get(self, *args, **kwargs):
        if len(kwargs) == 1:
            k = kwargs.keys()[0]
            if k in ('pk', 'pk__exact', '%s' % self.model._meta.pk.attname, '%s__exact' % self.model._meta.pk.attname):
                try:
                    return get_from_cache(self.model, kwargs[k])
                except KeyError:
                    pass
        clone = self.filter(*args, **kwargs)
        objs = list(clone[:2])
        if len(objs) == 1:
            return objs[0]
        if not objs:
            raise self.model.DoesNotExist("%s matching query does not exist."
                             % self.model._meta.object_name)
        raise self.model.MultipleObjectsReturned("get() returned more than one %s -- it returned %s! Lookup parameters were %s"
                % (self.model._meta.object_name, len(objs), kwargs))

As you can see we just add one line to the manager, and a few lines to the begging of the get() method. Basically our logic is if there is only one kwarg to the get() method, and it is a query on the primary key of the model, we try to return our cached instance. Otherwise we fall back to executing the query.

And with this we've improved the efficiency of our identity map, there are almost definitely more places for optimisations, but now we have an identity map in very few lines of code.

You can find the rest here. There are view comments.

A Few More Thoughts on the Identity Mapper

Posted December 1st, 2008. Tagged with foreignkey, models, django, orm, internals.

It's late, and I've my flight was delayed for several hours so today is going to be another quick post. With that note here are a few thoughts on the identity mapper:
  • We can optimize it to actually execute fewer queries by having it run the query as usual, and then use the primary key to check the cache, else cache the instance we already have.
  • As Doug points out in the comments, there are built in caching utilities in Django we should probably be taking advantage of. The only qualification is that whatever cache we use needs to be in memory and in process.
  • The cache is actually going to be more efficient than I originally thought. On a review of the source the default manager is used for some related queries, so our manager will actually be used for those.
  • The next place to optimize will actually be on single related objects(foreign keys and one to ones). That's because we already have their primary key and so we can check for them in the cache without executing any SQL queries.

And lastly a small note. As you may have noticed I've been doing the National Blog Everyday for a Month Month, since I started two days late, I'm going to be continueing on for another two days.

You can find the rest here. There are view comments.

Django Models - Digging a Little Deeper

Posted November 13th, 2008. Tagged with foreignkey, python, models, django, orm, metaclass.

For those of you who read my last post on Django models you probably noticed that I skirted over a few details, specifically for quite a few items I said we, "added them to the new class". But what exactly does that entail? Here I'm going to look at the add_to_class method that's present on the ModelBase metaclass we look at earlier, and the contribute_to_class method that's present on a number of classes throughout Django.

So first, the add_to_class method. This is called for each item we add to the new class, and what it does is if that has a contribute_to_class method than we call that with the new class, and it's name(the name it should attach itself to the new class as) as arguments. Otherwise we simply set that attribute to that value on the new class. So for example new_class.add_to_class('abc', 3), 3 doesn't have a contribute_to_class method, so we just do setattr(new_class, 'abc', 3).

The contribute_to_class method is more common for things you set on your class, like Fields or Managers. The contribute_to_class method on these objects is responsible for doing whatever is necessary to add it to the new class and do it's setup. If you remember from my first blog post about User Foreign Keys, we used the contribute_to_class method to add a new manager to our class. Here we're going to look at what a few of the builtin contribute_to_class methods do.

The first case is a manager. The manager sets it's model attribute to be the model it's added to. Then it checks to see whether or not the model already has an _default_manager attribute, if it doesn't, or if it's creation counter is lower than that of the current creation counter, it sets itself as the default manager on the new class. The creation counter is essentially a way for Django to keep track of which manager was added to the model first. Lastly, if this is an abstract model, it adds itself to the abstract_managers list in _meta on the model.

The next case is if the object is a field, different fields actually do slightly different things, but first we'll cover the general field case. It also, first, sets a few of it's internal attributes, to know what it's name is on the new model, additionally calculating it's column name in the db, and it's verbose_name if one isn't explicitly provided. Next it calls add_field on _meta of the model to add itself to _meta. Lastly, if the field has choices, it sets the get_FIELD_display method on the class.

Another case is for file fields. They do everything a normal field does, plus some more stuff. They also add a FileDescriptor to the new class, and they also add a signal receiver so that when an instance of the model is deleted the file also gets deleted.

The final case is for related fields. This is also the most complicated case. I won't describe exactly what this code does, but it's biggest responsibility is to set up the reverse descriptors on the related model, those are nice things that let you author_obj.books.all().

Hopefully this gives you a good idea of what to do if you wanted to create a new field like object in Django. For another example of using these techniques, take a look at the generic foreign key field in django.contrib.contenttypes, here.

You can find the rest here. There are view comments.

More Laziness with Foreign Keys

Posted November 4th, 2008. Tagged with foreignkey, models, django, orm.

Yesterday we looked at building a form field to make the process of getting a ForeignKey to the User model more simple, and to provide us with some useful tools, like the manager. But this process can be generalized, and made more robust. First we want to have a lazy ForeignKey field for all models(be careful not to confuse the term lazy, here I use it to refer to the fact that I am a lazy person, not the fact that foreign keys are lazy loaded).

A more generic lazy foreign key field might look like:

from django.db.models import ForeignKey, Manager

class LazyForeignKey(ForeignKey):
    def __init__(self, *args, **kwargs):
        model = kwargs.get('to')
        if model_name is None:
            model = args[0]
        try:
            name = model._meta.object_name.lower()
        except AttributeError:
            name = model.split('.')[-1].lower()
        self.manager_name = kwargs.pop('manager_name', 'for_%s' % name)
        super(ForeignKey, self).__init__(*args, **kwargs)

    def contribute_to_class(self, cls, name):
        super(ForeignKey, self).contribute_to_class(cls, name)

        class MyManager(Manager):
            def __call__(self2, obj):
                return cls._default_manager.filter(**{self.name: obj})

        cls.add_to_class(self.manager_name, MyManager())

As you can see, a lot of the code is the same as before. Most of the new code is in getting the mode's name, either through _meta, or through the last part of the string(i.e. User in "auth.User"). And now you will have a manager on your class, named either for_X where X is the name of the model the foreign key is to lowercase, or named whatever the kwarg manager_name is.

So if your model has this:

teacher = LazyForeignKey(Teacher)

You would be able to do:

MyModel.for_teacher(Teacher.objects.get(id=3))

That's all for today. Since tonight is election night, tomorrow I'll probably post about my application election-sim, and about PyGTK and PyProcessing(aka multiprocessing).

You can find the rest here. There are view comments.

Lazy User Foreign Keys

Posted November 3rd, 2008. Tagged with foreignkey, models, django, orm.

A very common pattern in Django is for models to have a foreign key to django.contrib.auth.User for the owner(or submitter, or whatever other relation with User) and then to have views that filter this down to the related objects for a specific user(often the currently logged in user). If we think ahead, we can make a manager with a method to filter down to a specific user. But since we are really lazy we are going to make a field that automatically generates the foreign key to User, and gives us a manager, automatically, to filter for a specific User, and we can reuse this for all types of models.

So what does the code look like:

from django.db.models import ForeignKey, Manager

from django.contrib.auth.models import User

class LazyUserForeignKey(ForeignKey):
    def __init__(self, **kwargs):
        kwargs['to'] = User
        self.manager_name = kwargs.pop('manager_name', 'for_user')
        super(ForeignKey, self).__init__(**kwargs)

    def contribute_to_class(self, cls, name):
        super(ForeignKey, self).contribute_to_class(cls, name)

        class MyManager(Manager):
            def __call__(self2, user):
                return cls._default_manager.filter(**{self.name: user})

        cls.add_to_class(self.manager_name, MyManager())

So now, what does this do?

We are subclassing ForeignKey. In __init__ we make sure to is set to User and we also set self.manager_name equal to either the manager_name kwarg, if provided or 'for_user'. contribute_to_class get called by the ModelMetaclass to add each item to the Model itself. So here we call the parent method, to get the ForeignKey itself set on the model, and then we create a new subclass of Manager. And we define an __call__ method on it, this lets us call an instance as if it were a function. And we make __call__ return the QuerySet that would be returned by filtering the default manager for the class where the user field is equal to the given user. And then we add it to the class with the name provided earlier.

And that's all. Now we can do things like:

MyModel.for_user(request.user)

Next post we'll probably look at making this more generic.

You can find the rest here. There are view comments.