alex gaynor's blago-blog

Posts tagged with metaclass

You Built a Metaclass for *what*?

Posted November 30th, 2009. Tagged with django, python, metaclass, c++.

Recently I had a bit of an interesting problem, I needed to define a way to represent a C++ API in Python. So, I figured the best way to represent that was one class in Python for each class in C++, with a functions dictionary to track each of the methods on each class. Seems simple enough right, do something like this:

class String(object):
    functions = {
        "size": Function(Integer, []),
    }

We've got a String class with a functions dictionary that maps method names to Function objects. The Function constructor takes a return type and a list of arguments. Unfortunately we run into a problem when we want to do something like this:

class String(object):
    functions = {
        "size": Function(Integer, []),
        "append": Function(None, [String])
    }

If we try to run this code we're going to get a NameError, String isn't defined yet. Django models have a similar issue, with recursive foreign keys. Django's solution is to use the placeholder string "self", and have a metaclass translate it into the right class. Also having a slightly more declarative API might be nice, so something like this:

class String(DeclarativeObject):
    size = Function(Integer, [])
    append = Function(None, ["self"])

So now that we have a nice pretty API we need our metaclass to make it happen:

RECURSIVE_TYPE_CONSTANT = "self"

class DeclarativeObjectMetaclass(type):
    def __new__(cls, name, bases, attrs):
        functions = dict([(n, attr) for n, attr in attrs.iteritems()
            if isinstance(attr, Function)])
        for attr in functions:
            attrs.pop(attr)
        new_cls = super(DeclarativeObjectMetaclass, cls).__new__(cls, name, bases, attrs)
        new_cls.functions = {}
        for name, function in functions.iteritems():
            if function.return_type == RECURSIVE_TYPE_CONSTANT:
                function.return_type = new_cls
            for i, argument in enumerate(function.arguments):
                if argument == RECURSIVE_TYPE_CONSTANT:
                    function.arguments[i] = new_cls
            new_cls.functions[name] = function
        return new_cls

class DeclarativeObject(object):
    __metaclass__ = DeclarativeObjectMetaclass

And that's all their is to it. We take each of the functions on the class out of the attributes, create a normal class instance without the functions, and then we do the replacements on the function objects and stick them in a functions dictionary.

Simple patterns like this can be used to build beautiful APIs, as is seen in Django with the models and forms API.

You can find the rest here. There are view comments.

A Second Look at Inheritance and Polymorphism with Django

Posted February 10th, 2009. Tagged with django, orm, models, python, metaclass, internals.

Previously I wrote about ways to handle polymorphism with inheritance in Django's ORM in a way that didn't require any changes to your model at all(besides adding in a mixin), today we're going to look at a way to do this that is a little more invasive and involved, but also can provide much better performance. As we saw previously with no other information we could get the correct subclass for a given object in O(k) queries, where k is the number of subclasses. This means for a queryset with n items, we would need to do O(nk) queries, not great performance, for a queryset with 10 items, and 3 subclasses we'd need to do 30 queries, which isn't really acceptable for most websites. The major problem here is that for each object we simply guess as to which subclass a given object is. However, that's a piece of information we could know concretely if we cached it for later usage, so let's start off there, we're going to be building a mixin class just like we did last time:

from django.db import models

class InheritanceMixIn(models.Model):
    _class = models.CharField(max_length=100)

    class Meta:
        abstract = True

So now we have a simple abstract model that the base of our inheritance trees can subclass that has a field for caching which subclass we are. Now let's add a method to actually cache it and retrieve the subclass:

from django.db import models
from django.db.models.fields import FieldDoesNotExist
from django.db.models.related import RelatedObject

class InheritanceMixIn(models.Model):
    ...
    def save(self, *args, **kwargs):
        if not self.id:
            parent = self._meta.parents.keys()[0]
            subclasses = parent._meta.get_all_related_objects()
            for klass in subclasses:
                if isinstance(klass, RelatedObject) and klass.field.primary_key \
                    and klass.opts == self._meta:
                    self._class = klass.get_accessor_name()
                    break
        return super(InheritanceMixIn, self).save(*args, **kwargs)

    def get_object(self):
        try:
            if self._class and self._meta.get_field_by_name(self._class)[0].opts != self._meta:
                return getattr(self, self._class)
        except FieldDoesNotExist:
            pass
        return self

Our save method is where all the magic really happens. First, we make sure we're only doing this caching if it's the first time a model is being saved. Then we get the first parent class we have (this means this probably won't play nicely with multiple inheritance, that's unfortunate, but not as common a usecase), then we get all the related objects this class has(this includes the reverse relationship the subclasses have). Then for each of the subclasses, if it is a RelatedObject, and it is a primary key on it's model, and the class it points to is the same as us then we cache the accessor name on the model, break out, and do the normal save procedure.

Our get_object function is pretty simple, if we have our class cached, and the model we are cached as isn't of the same type as ourselves we get the attribute with the subclass and return it, otherwise we are the last descendent and just return ourselves. There is one(possible quite large) caveat here, if our inheritance chain is more than one level deep(that is to say our subclasses have subclasses) then this won't return those objects correctly. The class is actually cached correctly, but since the top level object doesn't have an attribute by the name of the 2nd level subclass it doesn't return anything. I believe this can be worked around, but I haven't found a way yet. One idea would be to actually store the full ancestor chain in the CharField, comma separated, and then just traverse it.

There is one thing we can do to make this even easier, which is to have instances automatically become the correct subclass when they are pulled in from the DB. This does have an overhead, pulling in a queryset with n items guarantees O(n) queries. This can be improved(just as it was for the previous solution) by ticket #7270 which allows select_related to traverse reverse relationships. In any event, we can write a metaclass to handle this for us automatically:

from django.db import models
from django.db.models.base import ModelBase
from django.db.models.fields import FieldDoesNotExist
from django.db.models.related import RelatedObject

class InheritanceMetaclass(ModelBase):
    def __call__(cls, *args, **kwargs):
        obj = super(InheritanceMetaclass, cls).__call__(*args, **kwargs)
        return obj.get_object()

class InheritanceMixIn(models.Model):
    __metaclass__ = InheritanceMetaclass
    ...

Here we've created a fairly trivial metaclass that subclasses the default one Django uses for it's models. The only method we've written is __call__, on a metalcass what __call__ does is handle the instantiation of an object, so it would call __init__. What we do is do whatever the default __call__ does, so that we get an instances as normal, and then we call the get_object() method we wrote earlier and return it, and that's all.

We've now looked at 2 ways to handle polymorphism, with this way being more efficient in all cases(ignoring the overhead of having the extra charfield). However, it still isn't totally efficient, and it fails in several edge cases. Whether automating the handling of something like this is a good idea is something that needs to be considered on a project by project basis, as the extra queries can be a large overhead, however, they may not be avoidable in which case automating it is probably advantages.

You can find the rest here. There are view comments.

Django Models - Digging a Little Deeper

Posted November 13th, 2008. Tagged with django, foreignkey, orm, models, python, metaclass.

For those of you who read my last post on Django models you probably noticed that I skirted over a few details, specifically for quite a few items I said we, "added them to the new class". But what exactly does that entail? Here I'm going to look at the add_to_class method that's present on the ModelBase metaclass we look at earlier, and the contribute_to_class method that's present on a number of classes throughout Django.

So first, the add_to_class method. This is called for each item we add to the new class, and what it does is if that has a contribute_to_class method than we call that with the new class, and it's name(the name it should attach itself to the new class as) as arguments. Otherwise we simply set that attribute to that value on the new class. So for example new_class.add_to_class('abc', 3), 3 doesn't have a contribute_to_class method, so we just do setattr(new_class, 'abc', 3).

The contribute_to_class method is more common for things you set on your class, like Fields or Managers. The contribute_to_class method on these objects is responsible for doing whatever is necessary to add it to the new class and do it's setup. If you remember from my first blog post about User Foreign Keys, we used the contribute_to_class method to add a new manager to our class. Here we're going to look at what a few of the builtin contribute_to_class methods do.

The first case is a manager. The manager sets it's model attribute to be the model it's added to. Then it checks to see whether or not the model already has an _default_manager attribute, if it doesn't, or if it's creation counter is lower than that of the current creation counter, it sets itself as the default manager on the new class. The creation counter is essentially a way for Django to keep track of which manager was added to the model first. Lastly, if this is an abstract model, it adds itself to the abstract_managers list in _meta on the model.

The next case is if the object is a field, different fields actually do slightly different things, but first we'll cover the general field case. It also, first, sets a few of it's internal attributes, to know what it's name is on the new model, additionally calculating it's column name in the db, and it's verbose_name if one isn't explicitly provided. Next it calls add_field on _meta of the model to add itself to _meta. Lastly, if the field has choices, it sets the get_FIELD_display method on the class.

Another case is for file fields. They do everything a normal field does, plus some more stuff. They also add a FileDescriptor to the new class, and they also add a signal receiver so that when an instance of the model is deleted the file also gets deleted.

The final case is for related fields. This is also the most complicated case. I won't describe exactly what this code does, but it's biggest responsibility is to set up the reverse descriptors on the related model, those are nice things that let you author_obj.books.all().

Hopefully this gives you a good idea of what to do if you wanted to create a new field like object in Django. For another example of using these techniques, take a look at the generic foreign key field in django.contrib.contenttypes, here.

You can find the rest here. There are view comments.

How the Heck do Django Models Work

Posted November 10th, 2008. Tagged with django, models, python, metaclass.

Anyone who has used Django for just about any length of time has probably used a Django model, and possibly wondered how it works. The key to the whole thing is what's known as a metaclass, a metaclass is essentially a class that defines how a class is created. All the code for this occurs is here. And without further ado, let's see what this does.

So the first thing to look at is the method, __new__, __new__ is sort of like __init__, except instead of returning an instance of the class, it returns a new class. You can sort of see this is the argument signature, it takes cls, name, bases, and attrs. Where __init__ takes self, __new__ takes cls. Name is a string which is the name of the class, bases is the class that this new class is a subclass of, and attrs is a dictionary mapping names to class attributes.

The first thing the __new__ method does is check if the new class is a subclass of ModelBase, and if it's not, it bails out, and returns a normal class. The next thing is it gets the module of the class, and sets the attribute on the new class(this is going to be a recurring theme, getting something from the original class, and putting it in the right place on the new class). Then it checks if it has a Meta class(where you define your Model level options), it has to look in two places for this, first in the attrs dictionary, this is where it will be if you stick your class Meta inside your class. However, because of inheritance, we also have to check if the class has an _meta attribute already(this is where Django ultimately stores a bunch of internal information), and handle that scenario as well.

Next we get the app_label attribute, for this we either use the app_label attribute in the Meta class, or we pull it out of sys.modules. Lastly(at least for Meta), we build an instance of the Options class(which lives at django.db.models.options.Option) and add it to the new class as _meta. Next, if this class isn't an abstract base class we add the DoesNotExist and MultipleObjectsReturned exceptions to the class, and also inherit the ordering and get_latest_by attributes if we are a subclass.

Now we start getting to adding fields and stuff. First we check if we have an _default_manager attribute, and if not, we set it to None. Next we check if we've already defined the class, and if we have, we just return the class we already created. Now we go through each item that's left in the attrs dictionary and class the add_to_class method with it on the new class. add_to_class is a piece of internals that you may recognize from my first two blog posts, and what exactly it does I'll explain exactly what it does in another most, but at it's most basic level it adds each item in the dictionary to the new class, and each item knows where exactly it needs to get put.

Now we do a bunch of stuff to deal with inherited models. We iterate through ever item in bases, that's also a subclass of models.Model, and do the following: if it doesn't have an _meta attribute, we ignore it. If the parent isn't an abstract base class, if we already have a OneToOne field to it we set it up as a primary key, otherwise we create a new OneToOne field and install it as a primary key for the model. And now, if it is an abstract class, we iterate through the fields, if any of these fields has a name that is already defined on our class, we raise an error, otherwise we add that field to our class. And now we move managers from the parents down to the new class. Essentially we juts copy them over, and we also copy over virtual fields(these are things like GenericForeignKeys, which doesn't actually have a database field, but we still need to pass down and setup appropriately).

And then we do a few final pieces of cleanup. We make sure our new class doesn't have abstract=True in it's _meta, even if it's inherited from an abstract class. We add a few methods(get_next_in_order, and other), we inherit the docstring, or set a new one, and we send the class prepared signal. Finally, we register the model with Django's model loading system, and return the instance in Django's model cache, this is to make sure we don't have duplicate copies of the class floating around.

And that's it! Obviously I've skirted over how exactly somethings occur, but you should have a basic idea of what occurs. As always with Django, the source is an excellent resource. Hopefully you have a better idea of what exactly happens when you subclass models.Model now.

You can find the rest here. There are view comments.