Lately I've had the opportunity to do some test-driven development with Django, which a) is awesome, I love testing, and b) means I've been working up a box full of testing utilities, and I figured I'd share them.
If you've done testing of views with Django you probably have some tests that look like:
def test_my_view(self):
    response = self.client.get(reverse("my_url", kwargs={"pk": 1}))
    response = self.client.post(reverse("my_url", kwargs={"pk": 1}), {
        "key": "value",
    })
This was a tad too verbose for my tastes so I wrote:
def get(self, url_name, *args, **kwargs):
    return self.client.get(reverse(url_name, args=args, kwargs=kwargs))
def post(self, url_name, *args, **kwargs):
    data = kwargs.pop("data", None)
    return self.client.post(reverse(url_name, args=args, kwargs=kwargs), data)
Which are used:
def test_my_view(self):
    response = self.get("my_url", pk=1)
    response = self.post("my_url", pk=1, data={
        "key": "value",
    })
Much nicer.
The next big issue I had was that logging in and out as multiple users was too verbose. I often want to switch between users, either to check different permissions or to test some inter-user workflow. That was solved with a simple context manager:
class login(object):
    def __init__(self, testcase, user, password):
        self.testcase = testcase
        success = testcase.client.login(username=user, password=password)
        self.testcase.assertTrue(
            success,
            "login with username=%r, password=%r failed" % (user, password)
        )
    def __enter__(self):
        pass
    def __exit__(self, *args):
        self.testcase.client.logout()
# This one is a method on the test case class, wrapping the context manager above:
def login(self, user, password):
    return login(self, user, password)
This is used:
def test_my_view(self):
    with self.login("username", "password"):
        response = self.get("my_url", pk=1)
Again, a lot better.
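For what it's worth, the same idea can be written with contextlib.contextmanager. This is only a sketch: FakeClient is a made-up stand-in for Django's test client so the example runs without Django, and unlike the class above it logs in when the with block is entered rather than when the object is constructed.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for Django's test client."""

    def __init__(self, accounts):
        self.accounts = accounts  # maps username -> password
        self.user = None

    def login(self, username, password):
        if self.accounts.get(username) == password:
            self.user = username
            return True
        return False

    def logout(self):
        self.user = None


@contextmanager
def login(client, username, password):
    # Fail loudly on bad credentials, as the class-based version does.
    if not client.login(username=username, password=password):
        raise AssertionError(
            "login with username=%r, password=%r failed" % (username, password)
        )
    try:
        yield client
    finally:
        # Always log out, even if the body of the with block raises.
        client.logout()


client = FakeClient({"alice": "secret", "bob": "hunter2"})
seen = []
with login(client, "alice", "secret"):
    seen.append(client.user)
with login(client, "bob", "hunter2"):
    seen.append(client.user)
```

The try/finally is the part that matters: a failing assertion inside the block still leaves the client logged out for the next user.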
Not quite a testing utility, but my app django-fixture-generator has made testing a lot easier for me. Fixtures are useful in getting data to work with, but maintaining them is often a pain: you've got random scripts to generate them, or you just check in some JSON to your repository with no way to regenerate it sanely (say if you add a new field to your model). django-fixture-generator gives you a clean way to manage the code for generating fixtures.
In general I've found context managers are a pretty awesome tool for writing clean, readable, succinct tests. I'm sure I'll have more utilities as I write more tests, hopefully someone finds these useful.
Every once in a while the topic of multimethods (also known as generic dispatch) comes up in the Python world (see here, and here, here too, and finally here, and probably others). For those of you who aren't familiar with the concept, the idea is that you declare a bunch of functions with the same name, but that take different arguments, and the language routes your calls to that function to the correct implementation, based on what types you're calling it with. Here's a C++ example:
#include <iostream>
#include <string>
void special(int k) {
    std::cout << "I AM THE ALLMIGHTY INTEGER " << k << std::endl;
}
void special(std::string k) {
    std::cout << "I AM THE ALLMIGHTY STRING " << k << std::endl;
}
int main() {
    special(42);
    special("magic");
    return 0;
}
As you can probably guess this will print out:
I AM THE ALLMIGHTY INTEGER 42
I AM THE ALLMIGHTY STRING magic
You, the insightful reader, are no doubt fuming in your seats now, "Alex, you idiot, Python functions don't have type signatures, how can we route our calls based on something that does not exist!", and right you are. However, don't tell me you've never written a function that looks like:
def my_magic_function(o):
    if isinstance(o, basestring):
        return my_magic_function(int(o))
    elif isinstance(o, (int, long)):
        return cache[o]
    else:
        return o
Or something like that, the point is you have one function that has a couple of different behaviors based on the type of its parameter. Perhaps it'd be nice to separate each of those behaviors into their own function (or not, I don't really care what you do).
I was saying that a bunch of people have already implemented these, why am I? Mostly for fun (that's still a valid reason, right?), but also because a bunch of the implementations make me sad. Some of them use crazy hacks (reading up through stack frames), a few of them have global registries, and all of them rely on the name of the function to identify a single "function" to be overloaded. However, they also all have one good thing in common: decorators, yay!
My implementation is pretty simple, so I'll present it, and its test suite, without explanation:
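As a point of comparison, Python's standard library has since gained functools.singledispatch (Python 3.4 and later), which dispatches on the type of the first argument only. Here is a rough Python 3 rendering of the function above, one behavior per function; the contents of cache are made up for the example.

```python
from functools import singledispatch

cache = {1: "one", 2: "two"}  # hypothetical lookup table


@singledispatch
def my_magic_function(o):
    # Fallback for any type without a registered implementation.
    return o


@my_magic_function.register(str)
def _(o):
    # Strings are parsed and re-dispatched to the int implementation.
    return my_magic_function(int(o))


@my_magic_function.register(int)
def _(o):
    return cache[o]
```

Calling my_magic_function("2") goes through the str implementation and then the int one, while a float falls through to the default.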
class MultiMethod(object):
    def __init__(self):
        self._implementations = {}
    def _get_predicate(self, o):
        if isinstance(o, type):
            return lambda x: isinstance(x, o)
        assert callable(o)
        return o
    def register(self, *args, **kwargs):
        def inner(f):
            key = (
                args,
                tuple(kwargs.items()),
            )
            if key in self._implementations:
                # key is a tuple, so wrap it to format it as a single value
                raise TypeError("Duplicate registration for %r" % (key,))
            self._implementations[key] = f
            return self
        return inner
    def __call__(self, *args, **kwargs):
        for spec, func in self._implementations.iteritems():
            arg_spec, kwarg_spec = spec
            kwarg_spec = dict(kwarg_spec)
            if len(args) != len(arg_spec) or set(kwargs) != set(kwarg_spec):
                continue
            if (all(self._get_predicate(spec)(arg) for spec, arg in zip(arg_spec, args)) and
                all(self._get_predicate(spec)(kwargs[k]) for k, spec in kwarg_spec.iteritems())):
                return func(*args, **kwargs)
        raise TypeError("No implementation with a spec matching: %r, %r" % (
            args, kwargs))
And the tests:
import unittest2 as unittest
from multimethod import MultiMethod
class MultiMethodTestCase(unittest.TestCase):
    def test_basic(self):
        items = MultiMethod()
        @items.register(list)
        def items(l):
            return l
        @items.register(dict)
        def items(d):
            return d.items()
        self.assertEqual(items([1, 2, 3]), [1, 2, 3])
        # TODO: dict ordering dependent, 1 item dict?
        self.assertEqual(items({"a": 1, "b": 2}), [("a", 1), ("b", 2)])
        with self.assertRaises(TypeError):
            items(xrange(3))
    def test_duplicate(self):
        m = MultiMethod()
        @m.register(list)
        def m(o):
            return o
        with self.assertRaises(TypeError):
            @m.register(list)
            def m(o):
                return o
if __name__ == "__main__":
    unittest.main()
Bon appétit.
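The tests only register types, but a spec can also be any callable predicate, and keyword arguments can carry specs too. Here is a sketch of that, using a compact Python 3 port of the class so it runs on a current interpreter. The is_even predicate and the describe function are invented for illustration, and the example leans on dicts preserving insertion order (Python 3.7+) so that the more specific spec is tried first.

```python
class MultiMethod:
    def __init__(self):
        self._implementations = {}

    @staticmethod
    def _get_predicate(spec):
        # A type means "isinstance check"; anything else must be a predicate.
        if isinstance(spec, type):
            return lambda x: isinstance(x, spec)
        assert callable(spec)
        return spec

    def register(self, *args, **kwargs):
        def inner(f):
            key = (args, tuple(kwargs.items()))
            if key in self._implementations:
                raise TypeError("Duplicate registration for %r" % (key,))
            self._implementations[key] = f
            # Return the MultiMethod so the decorated name keeps pointing at it.
            return self
        return inner

    def __call__(self, *args, **kwargs):
        for (arg_spec, kwarg_items), func in self._implementations.items():
            kwarg_spec = dict(kwarg_items)
            if len(args) != len(arg_spec) or set(kwargs) != set(kwarg_spec):
                continue
            if (all(self._get_predicate(s)(a) for s, a in zip(arg_spec, args)) and
                    all(self._get_predicate(s)(kwargs[k])
                        for k, s in kwarg_spec.items())):
                return func(*args, **kwargs)
        raise TypeError("No implementation with a spec matching: %r, %r" % (
            args, kwargs))


def is_even(x):
    return isinstance(x, int) and x % 2 == 0


describe = MultiMethod()

@describe.register(is_even)
def describe(n):
    return "even"

@describe.register(int)
def describe(n):
    return "odd"

@describe.register(str, sep=str)
def describe(s, sep):
    return s.split(sep)
```

Note that overlapping specs (is_even and int here) are resolved purely by registration order; under Python 2's unordered dicts the same registrations would be ambiguous.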
While doing some work today I realized that generating fixtures in Django is way too much of a pain in the ass, and I suspect it's a pain in the ass for a lot of other people as well. I also came up with an API I'd kind of like to see for it, unfortunately I don't really have the time to write the whole thing, however I'm hoping someone else does.
The key problem with writing fixtures is that you want to have a clean environment to generate them, and you need to be able to edit them in the future. In addition, I'd personally prefer to have my fixture generation specifically be imperative. I have an API I think solves all of these concerns.
Essentially, in every application you can have a fixture_gen.py file, which contains a bunch of functions that can generate fixtures:
from fixture_generator import fixture_generator
from my_app.models import Model1, Model2
@fixture_generator(Model1, Model2, requires=["my_app.other_dataset"])
def some_dataset():
    # Some objects get created here
    pass
@fixture_generator(Model1)
def other_dataset():
    # Some objects get created here
    pass
Basically you have a bunch of functions, each of which is responsible for creating some objects that will become a fixture. You then decorate them with a decorator that specifies what models need to be included in the fixture that results from them, and finally you can optionally specify dependencies (these are necessary because a dependency could use models which your fixture doesn't).
After you have these functions there's a management command which can be invoked to actually generate the fixtures:
$ ./manage.py generate_fixture my_app.some_dataset --format=json --indent=4
Which actually creates the clean database environment, handles the dependencies, calls the functions, and dumps the fixtures to stdout. Then you can redirect that stdout off to a file somewhere, for use in testing or whatever else people use fixtures for.
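To make the dependency handling a bit more concrete, here is one way the decorator and the ordering step might look. Everything in it is an assumption for illustration: the _REGISTRY dict, the app_label keyword (a real version would presumably derive the label from the module), the resolve helper, and the use of plain strings in place of model classes. The database setup and serialization are left out entirely.

```python
_REGISTRY = {}  # hypothetical: maps "app_label.function_name" -> generator


def fixture_generator(*models, requires=(), app_label="my_app"):
    def decorator(func):
        # Record the metadata on the function and register it by label.
        func.models = models
        func.requires = tuple(requires)
        _REGISTRY["%s.%s" % (app_label, func.__name__)] = func
        return func
    return decorator


def resolve(label, registry=_REGISTRY):
    """Return generators ordered so every dependency comes first."""
    ordered, visiting = [], set()

    def visit(name):
        func = registry[name]
        if func in ordered:
            return
        if name in visiting:
            raise ValueError("circular fixture dependency at %r" % name)
        visiting.add(name)
        for dep in func.requires:
            visit(dep)
        visiting.discard(name)
        ordered.append(func)

    visit(label)
    return ordered


created = []  # stands in for objects being saved to a clean database


@fixture_generator("Model1")
def other_dataset():
    created.append("other")


@fixture_generator("Model1", "Model2", requires=["my_app.other_dataset"])
def some_dataset():
    created.append("some")


plan = resolve("my_app.some_dataset")
for generate in plan:
    generate()
```

After the loop, the models declared on the requested generator (plan[-1].models) are the ones a management command would serialize.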
Hopefully someone else has this problem, and likes the API enough to build this. Failing that I'll try to make some time for it, but no promises when (aka if you want it you should probably build it).
I just finished giving my talk at DjangoCon.eu on Django and NoSQL (also the topic of my Google Summer of Code project). You can get the slides over at slideshare. My slides from my lightning talk on django-templatetag-sugar are also up on slideshare.
Currently the most common implementation of Python is known as CPython, and it's the version of Python you get at python.org, probably 99.9% of Python developers are using it. However, I think over the next couple of years we're going to see a move away from this towards PyPy, Python written in Python. This is going to happen because PyPy offers better speed, more flexibility, and is a better platform for Python's growth, and the most important thing is you can make this transition happen.
The first thing to consider: speed. PyPy is a lot faster than CPython for a lot of tasks, and they've got the benchmarks to prove it. There's room for improvement, but it's clear that for a lot of benchmarks PyPy screams, and it's not just number crunching (although PyPy is good at that too). Although Python performance might not be a bottleneck for a lot of us (especially us web developers who like to push performance down the stack to our database), would you say no to having your code run 2x faster?
The next factor is the flexibility. By writing their interpreter in RPython PyPy can automatically generate C code (like CPython), but also JVM and .NET versions of the interpreter. Instead of writing entirely separate Jython and IronPython implementations of Python, just automatically generate them from one shared codebase. PyPy can also have its binary generated with a stackless option, just like stackless Python, again no separate implementations to maintain. Lastly, PyPy's JIT is almost totally separate from the interpreter, this means changes to the language itself can be made without needing to update the JIT, contrast this with many JITs that need to statically define fast-paths for various operations.
And finally, it's a better platform for growth. The last point is a good example of this: one can keep the speed from the JIT while making changes to the language, you don't need to be an assembly expert to write a new bytecode, or play with the builtin types, the JIT generator takes care of it for you. Also, it's written in Python. It may be RPython, which isn't as high level as regular Python, but compare the implementations of map from CPython and PyPy:
static PyObject *
builtin_map(PyObject *self, PyObject *args)
{
typedef struct {
PyObject *it; /* the iterator object */
int saw_StopIteration; /* bool: did the iterator end? */
} sequence;
PyObject *func, *result;
sequence *seqs = NULL, *sqp;
Py_ssize_t n, len;
register int i, j;
n = PyTuple_Size(args);
if (n < 2) {
PyErr_SetString(PyExc_TypeError,
"map() requires at least two args");
return NULL;
}
func = PyTuple_GetItem(args, 0);
n--;
if (func == Py_None) {
if (PyErr_WarnPy3k("map(None, ...) not supported in 3.x; "
"use list(...)", 1) < 0)
return NULL;
if (n == 1) {
/* map(None, S) is the same as list(S). */
return PySequence_List(PyTuple_GetItem(args, 1));
}
}
/* Get space for sequence descriptors. Must NULL out the iterator
* pointers so that jumping to Fail_2 later doesn't see trash.
*/
if ((seqs = PyMem_NEW(sequence, n)) == NULL) {
PyErr_NoMemory();
return NULL;
}
for (i = 0; i < n; ++i) {
seqs[i].it = (PyObject*)NULL;
seqs[i].saw_StopIteration = 0;
}
/* Do a first pass to obtain iterators for the arguments, and set len
* to the largest of their lengths.
*/
len = 0;
for (i = 0, sqp = seqs; i < n; ++i, ++sqp) {
PyObject *curseq;
Py_ssize_t curlen;
/* Get iterator. */
curseq = PyTuple_GetItem(args, i+1);
sqp->it = PyObject_GetIter(curseq);
if (sqp->it == NULL) {
static char errmsg[] =
"argument %d to map() must support iteration";
char errbuf[sizeof(errmsg) + 25];
PyOS_snprintf(errbuf, sizeof(errbuf), errmsg, i+2);
PyErr_SetString(PyExc_TypeError, errbuf);
goto Fail_2;
}
/* Update len. */
curlen = _PyObject_LengthHint(curseq, 8);
if (curlen > len)
len = curlen;
}
/* Get space for the result list. */
if ((result = (PyObject *) PyList_New(len)) == NULL)
goto Fail_2;
/* Iterate over the sequences until all have stopped. */
for (i = 0; ; ++i) {
PyObject *alist, *item=NULL, *value;
int numactive = 0;
if (func == Py_None && n == 1)
alist = NULL;
else if ((alist = PyTuple_New(n)) == NULL)
goto Fail_1;
for (j = 0, sqp = seqs; j < n; ++j, ++sqp) {
if (sqp->saw_StopIteration) {
Py_INCREF(Py_None);
item = Py_None;
}
else {
item = PyIter_Next(sqp->it);
if (item)
++numactive;
else {
if (PyErr_Occurred()) {
Py_XDECREF(alist);
goto Fail_1;
}
Py_INCREF(Py_None);
item = Py_None;
sqp->saw_StopIteration = 1;
}
}
if (alist)
PyTuple_SET_ITEM(alist, j, item);
else
break;
}
if (!alist)
alist = item;
if (numactive == 0) {
Py_DECREF(alist);
break;
}
if (func == Py_None)
value = alist;
else {
value = PyEval_CallObject(func, alist);
Py_DECREF(alist);
if (value == NULL)
goto Fail_1;
}
if (i >= len) {
int status = PyList_Append(result, value);
Py_DECREF(value);
if (status < 0)
goto Fail_1;
}
else if (PyList_SetItem(result, i, value) < 0)
goto Fail_1;
}
if (i < len && PyList_SetSlice(result, i, len, NULL) < 0)
goto Fail_1;
goto Succeed;
Fail_1:
Py_DECREF(result);
Fail_2:
result = NULL;
Succeed:
assert(seqs);
for (i = 0; i < n; ++i)
Py_XDECREF(seqs[i].it);
PyMem_DEL(seqs);
return result;
}
That's a lot of code! It wouldn't be bad, for C code, except for the fact that there's far too much boilerplate: every single call into the C-API needs to check for an exception, and INCREF and DECREF calls are littered throughout the code. Compare this with PyPy's RPython implementation:
def map(space, w_func, collections_w):
    """does 3 separate things, hence this enormous docstring.
       1.  if function is None, return a list of tuples, each with one
           item from each collection.  If the collections have different
           lengths,  shorter ones are padded with None.
       2.  if function is not None, and there is only one collection,
           apply function to every item in the collection and return a
           list of the results.
       3.  if function is not None, and there are several collections,
           repeatedly call the function with one argument from each
           collection.  If the collections have different lengths,
           shorter ones are padded with None
    """
    if not collections_w:
        msg = "map() requires at least two arguments"
        raise OperationError(space.w_TypeError, space.wrap(msg))
    num_collections = len(collections_w)
    none_func = space.is_w(w_func, space.w_None)
    if none_func and num_collections == 1:
        return space.call_function(space.w_list, collections_w[0])
    result_w = []
    iterators_w = [space.iter(w_seq) for w_seq in collections_w]
    num_iterators = len(iterators_w)
    while True:
        cont = False
        args_w = [space.w_None] * num_iterators
        for i in range(len(iterators_w)):
            try:
                args_w[i] = space.next(iterators_w[i])
            except OperationError, e:
                if not e.match(space, space.w_StopIteration):
                    raise
            else:
                cont = True
        w_args = space.newtuple(args_w)
        if cont:
            if none_func:
                result_w.append(w_args)
            else:
                w_res = space.call(w_func, w_args)
                result_w.append(w_res)
        else:
            return space.newlist(result_w)
map.unwrap_spec = [ObjSpace, W_Root, "args_w"]
It's not exactly what you'd write for a pure Python implementation of map, but it's a hell of a lot closer than the C version.
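For reference, this is roughly what a pure Python version of the same three behaviors could look like. It's a sketch written for Python 3 (where itertools.zip_longest does the None padding), and the name py2_map is made up to avoid clashing with the builtin.

```python
from itertools import zip_longest


def py2_map(func, *collections):
    if not collections:
        raise TypeError("map() requires at least two arguments")
    if func is None and len(collections) == 1:
        # map(None, S) is the same as list(S).
        return list(collections[0])
    result = []
    # zip_longest pads the shorter collections with None.
    for args in zip_longest(*collections):
        result.append(args if func is None else func(*args))
    return result
```

The iterator bookkeeping, the exception checks, and the reference counting from the C version all disappear into the language itself.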
The case for PyPy being the future is strong, I think, however it's not all sunshine and roses, there are a few issues. It lags behind CPython's version (right now Python 2.5 is implemented), C extension compatibility isn't there yet, and not enough people are trying it out yet. But PyPy is getting there, and you can help.
Right now the single biggest way to help for most people is to test their code. Any pure Python code targeting Python 2.5 should run perfectly under PyPy, and if it doesn't: it's a bug, if it's slower than Python: let us know (unless it involves re, we know it's slow). Maybe try out your C-extensions, however cpyext is very alpha and even a segfault isn't surprising (but let us know so we can investigate). Of course help on development is always appreciated, right now most of the effort is going into speeding up the JIT even more, however I believe there is also going to be work on moving up to Python 2.7 (currently pre-release) this summer. If you're interested in helping out with either you should hop into #pypy on irc.freenode.net, or send a message to pypy-dev. PyPy's doing good work, Python doesn't need to be slow, and we don't all need to write C code!
In a previous post I talked about a cool new customization API that django-taggit has. Now I'm going to dive into the internals.
The public API is almost exclusively exposed via a class named TaggableManager, you attach one of these to your model and it has some cool tagging APIs. This class basically masquerades as ManyToManyField, this is how it gets cool things like filtering and forms automatically. If you look at its definition you'll see it has a bunch of attributes that it never actually uses, basically all of these act to emulate the Field interface. This class is also the entry point for the new customization API, exposed via the through parameter. This basically acts as an analogue to the through parameter on actual ManyToManyFields (documented here). The final crucial method is __get__, which turns TaggableManager into a descriptor.
This descriptor exposes a _TaggableManager class, which holds some of the internal logic. This class exposes all of the "managery" type methods: add(), set(), remove(), and clear(). This class is pretty simple, basically it just proxies between the methods called and its through model. This class is, unlike TaggableManager, actually a subclass of models.Manager, it just defines get_query_set() to return a QuerySet of all the tags for that model, or instance, and then filtering, ordering, and more falls out naturally.
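If the descriptor part sounds abstract, here's the bare pattern stripped of everything Django. This is not taggit's code: TagsDescriptor and _BoundTags are invented names, tags live in a plain set on the instance instead of going through a through model, and class-level access is simplified to return the descriptor itself.

```python
class _BoundTags:
    """Hypothetical stand-in for the manager bound to one instance."""

    def __init__(self, instance):
        # Keep the tags on the instance so each object has its own set.
        self._tags = instance.__dict__.setdefault("_tag_store", set())

    def add(self, *names):
        self._tags.update(names)

    def remove(self, *names):
        self._tags.difference_update(names)

    def clear(self):
        self._tags.clear()

    def all(self):
        return sorted(self._tags)


class TagsDescriptor:
    """Hypothetical stand-in for the class-level attribute."""

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself rather than on an instance.
            return self
        return _BoundTags(instance)


class Food:
    tags = TagsDescriptor()


apple, pear = Food(), Food()
apple.tags.add("red", "fruit")
pear.tags.add("green")
apple.tags.remove("red")
```

Every attribute access builds a fresh bound object, which is what lets one class-level declaration serve every instance.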
Beyond that there's not too much going on. The code is fairly simple, and it's not particularly long. I've found this to be a pretty good pattern for extensibility, and it really resolves the need to have dozens of parameters, or GenericForeignKeys popping out every which way.
This semester I've been taking a course in ethics, and while it hasn't changed my perspective on any issues, it has allowed me to form some opinions about ethical systems. In particular, I've found that utilitarianism is an awful ethical system, with almost no merit. The major problems with it being that it is conditional, and that it's impossible to attempt to apply.
The first problem with utilitarianism is that it is conditional. Utilitarianism is a teleological system that says, "seek to maximize utility", and different thinkers have put forth different answers to what that utility is. This makes utilitarianism a conditional system: it only applies so long as the actor agrees with the identified activity, or property that provides utility. If one seeks to maximize pleasure, as Bentham suggests, that's fine, except if I don't want to maximize pleasure the entire system is useless to me. This is a major problem, as an ethical system shouldn't be entirely contingent on an assumption, that happiness is the correct thing to attempt to maximize. David Hume calls this the is-ought problem.
The second, arguably larger issue, is that it's impossible to apply, for two reasons. Because utilitarianism attempts to maximize something we must have a way to quantify it, or at least compare two different items to see which is greater. Except how does one quantify pleasure or pain? Bentham proposes a "pleasure calculus" based on 7 attributes of pleasures or pains, but this is really just moving the goalposts: how do you compare the intensity of pleasure, or the fecundity of pain? These are impossible. John Stuart Mill suggests there are two types of pleasures, higher and lower, but this is just a further attempt to both ignore the impossibility of comparing pleasures and pains as well as create artificial distinctions, grounded not in reason, but in individual intuition. If we can't tell which actions are better, we can't actually make any decisions from our ethical system.
The other issue in the application of utilitarianism is that, even if we could compare pleasures and pains, they're often impossible to predict in advance, or even years later. For example, was the accident at Three Mile Island good or bad? It obviously had devastating effects, but it also was a catalyst for changing nuclear power policy in the US, and even now, 30 years later, we probably can't say whether the benefits in safety policy outweigh the obvious costs.
Because utilitarianism is both logically unsound (it relies on an unproven assumption) and impossible to realistically implement, it is a bad ethical system. I have no understanding of how people try to follow a utilitarian ideology in light of these indisputable flaws. In a future post I'll cover my issues with some deontological ethical systems.
A little while ago I wrote about some of the issues with the reusable application paradigm in Django. Yesterday Carl Meyer pinged me about an issue in django-taggit, it uses an IntegerField for the GenericForeignKey, which is great. Except for when you have a model with a CharField, TextField, or anything else for a primary key. The easy solution is to change the GenericForeignKey to be something else. But that's lame, a pain in the ass, and a hack (more of a hack than a GenericForeignKey in the first place).
The alternate solution we came up with:
from django.db import models
from taggit.managers import TaggableManager
from taggit.models import TaggedItemBase
class TaggedFood(TaggedItemBase):
    content_object = models.ForeignKey('Food')
class Food(models.Model):
    # ... fields here
    tags = TaggableManager(through=TaggedFood)
Custom through models for the taggable relationship! This lets the included GenericForeignKey implementation cater to the common case of integer primary keys, and lets other people provide their own implementations when necessary. Plus it makes it possible to do things like add a ForeignKey to auth.User, or store the "originally" typed version of the tag (for systems where tags are normalized).
In addition I've finally added some docs, they aren't really complete, but they're a start. I'm planning a release for sometime next week, unless some major issue pops up.
If you track Django's commits aggressively (OK, so just me...), you may have noticed that there have been a number of commits to improve the compatibility of Django and PyPy in the last couple of days. In the run up to Django 1.0 there were a ton of commits to make sure Django didn't have too many assumptions that it was running on CPython, and that it could work with systems like Jython and PyPy. Unfortunately, since then, our support has lapsed a little, and a number of tests have begun to fail. In the past couple of days I've been working to correct this. Here are some of the things that were wrong:
The first issue I ran into was, in various tests, response.context and response.template being None, instead of the lists that were expected. This was a pain to diagnose, but the ultimate source of the bug is that Django registers a signal handler in the test client that listens for templates being rendered. However, it doesn't actually unregister that signal receiver. Instead it relies on the fact that signals are stored as weakrefs, and when the function ends, the receivers that were registered (which were local variables) would be automatically deallocated. On PyPy, Jython, and any other system with a garbage collector more advanced than CPython's reference counting, the local variables aren't guaranteed to be deallocated at the end of the function, and therefore the weakref can still be alive. Truthfully, I'm not 100% sure how this results in the next signal being sent not to store the appropriate data, but it does. The solution is to make sure that the signals are manually disconnected at the end of the run. This was fixed in r12964 of Django.
The next issue was actually a problem in PyPy, specifically it was crashing with a UnicodeDecodeError. When I say crashing I mean crashing in the C sense of the word, not the Python, nice exception and stack trace sense... sort of. PyPy is written in a language named RPython, RPython is Pythonesque (all valid RPython is valid Python), and has exceptions. However if they aren't caught they sort of propagate to the top, they give you a kind-of-ok stacktrace, but its function names are all the generated function names from the C source, not useful ones from the RPython source. Internally, PyPy uses an OperationError to keep track of exceptions at the interpreter level. A trick to debugging RPython is, if running the code on top of CPython works, then running it translated to C will work, and the contrapositive appears true as well, if the C doesn't work, running on CPython won't work. After trying to run the code on CPython, the location of the exception bubbled right to the top, and the fix followed easily.
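The pattern is easy to reproduce outside Django. Below is a miniature, made-up Signal class that stores its receivers as weak references, plus the two styles of using it: one that leans on the receiver being collected when the function returns, and one that disconnects explicitly. None of this is Django's actual signal code; the names are placeholders.

```python
import weakref


class Signal:
    """Hypothetical miniature signal that holds its receivers weakly."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(weakref.ref(receiver))

    def disconnect(self, receiver):
        # Drop the given receiver, along with any that are already dead.
        self._receivers = [
            r for r in self._receivers
            if r() is not None and r() is not receiver
        ]

    def live_receivers(self):
        return [r() for r in self._receivers if r() is not None]


template_rendered = Signal()


def request_relying_on_gc():
    def on_render():
        pass
    template_rendered.connect(on_render)
    # No disconnect: the receiver only goes away once it is collected,
    # which is immediate with reference counting but not guaranteed elsewhere.


def request_with_disconnect():
    def on_render():
        pass
    template_rendered.connect(on_render)
    try:
        pass  # the request would be handled here
    finally:
        template_rendered.disconnect(on_render)


request_with_disconnect()
count_after_disconnect = len(template_rendered.live_receivers())
```

The explicit version behaves the same on every implementation, because it never depends on when the collector gets around to the local function.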
These are the first two issues I fixed, a couple others have been fixed and committed, and a further few have also been fixed, but not committed yet. I'll be writing about those as I find the time.
Lately I've been working with quite a few designers so I've been thinking about what exactly constitutes the ideal working relationship. Here are some of the models I've seen, in order of best to worst.
The ideal relationship would be I write a view function, let the designer know what template they need to write, and what context is available, then they can ask me if they need any extra logic stuff available. At the end I can go and add in whatever Javascript is needed. This requires the designer to know the template language, and ideally be able to at least read model definitions so they know what properties are available to them.
The designer provides final templates (in the template syntax, with inheritance, blocks, etc.), with example data, and I plug some template syntax in there to wire it up with the real data. This is pretty convenient, the one downside is that I, the developer, have to know things like "do we truncate the values here", "do we need an ellipsis", or "if we only have 4 values do we have 2 columns with 3 and 1 values, or 2 columns with 2 and 2 values". This model requires whatever HTML/CSS the designer provides to cover all the scenarios, or the developer would be running back and forth asking questions all day.
The designer provides some HTML/CSS files. For these I have to convert them to the template syntax, and substitute the real values here. Here I need to know a lot about the semantics of the table to figure out what belongs in what blocks, how page elements are used across different pages, etc. For this reason it's not ideal, but it can work, and it requires minimal knowledge about the programming language and tools used on the part of your designers.
Anything that involves me writing HTML or CSS. I'm bad at those.
These are the working relationships I've worked with so far, the common theme is that the more your designer knows about your tools and languages the better (HTML is good, Django templates is better, being able to actually read a model definition is best).
Django's application paradigm (and the accompanying reusable application environment) have served it exceptionally well, however there are a few well known problems with it. Chief among these is pain in extendability (as exemplified by the User model), and abuse of GenericForeignKeys where a true ForeignKey would suffice (in the name of being generic), there are also smaller issues, such as wanting to install the same application multiple times, or having applications with the same "label" (in Django parlance this means the path.split(".")[-1]). Lately I've been thinking that the solution to these problems is a more holistic approach to application construction.
It's a little difficult to describe precisely what I'm thinking about, so I'll start with an example:
from django.contrib.auth import models as auth_models
class AuthApplication(Application):
    models = auth_models
    def login(self, request, template_name='registration/login.html'):
        pass
    # ... etc
And in settings.py:
from django.core import app
INSTALLED_APPS = [
    app("django.contrib.auth.AuthApplication", label="auth"),
]
The critical elements are that a) all models are referred to by the attribute on the class, so that they can be swapped out by a subclass, b) applications are now installed using an app object that wraps the app class, with a label (to allow multiple apps of the same name to be registered). But how does this allow swapping out the User model, from the perspective of people who are expecting to just be able to use django.contrib.auth.models.User for any purpose? Instead of explicit references to the model these could be replaced with: get_app("auth").models.User.
What about the issue of GenericForeignKeys? To solve these we'd really need something like C++'s templates, or Java's generics, but we'll settle for the next best thing, callables! Imagine a comment app where the models.py looked like:
from django.core import get_app
from django.db import models
def get_models(target_model):
    class Comment(models.Model):
        obj = models.ForeignKey(target_model)
        commenter = models.ForeignKey(get_app("auth").models.User)
        text = models.TextField()
    return [Comment]
Then instead of providing a module to be models on the application class this callable would be provided, and Django would know to call it with the appropriate model class based on either a class attribute (for subclasses) or a parameter from the app object (to allow for easily installing more than one of the comment app, for each object that should allow commenting), in practice I think allowing the same app to be installed multiple times would require some extra parameters to the get_models function, so that things like db_table can be adjusted appropriately.
I think this could be done in a backwards compatible manner, by having strings that are in INSTALLED_APPS automatically generate an app object that was the default "filler" one with just a models module, and the views ignoring self, and a default label. Like I said this is all just a set of ideas floating around my brain at this point, but hopefully by floating this design it'll get people thinking about big architecture ideas like this.
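Since none of this exists, here's a toy version of the registry to show how the swap would play out. All of it is hypothetical: app here takes the class directly instead of a dotted path, _apps is a made-up module-level dict, and SimpleNamespace stands in for a models module.

```python
from types import SimpleNamespace

_apps = {}  # hypothetical registry: label -> application instance


class Application:
    models = None


def app(app_class, label):
    if label in _apps:
        raise ValueError("an app labelled %r is already installed" % label)
    _apps[label] = app_class()
    return _apps[label]


def get_app(label):
    return _apps[label]


class User:
    """Stand-in for the default user model."""


class CustomUser(User):
    """Stand-in for a project-specific replacement."""


class AuthApplication(Application):
    models = SimpleNamespace(User=User)


class CustomAuthApplication(AuthApplication):
    # A subclass swaps the model without touching code that calls get_app().
    models = SimpleNamespace(User=CustomUser)


INSTALLED_APPS = [app(CustomAuthApplication, label="auth")]
user_model = get_app("auth").models.User
```

Anything that asks for get_app("auth").models.User now gets CustomUser, and installing a second app under the same label is rejected.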
Everyone knows languages don't have speeds, implementations do. Python isn't slow; CPython is. Javascript isn't fast; V8, Squirrelfish, and Tracemonkey are. But what if a language was designed in such a way that it appeared that it was impossible to be implemented efficiently? Would it be fair to say that language is slow, or would we still have to speak in terms of implementations? For a long time I followed the conventional wisdom, that languages didn't have speeds, but lately I've come to believe that we can learn something by thinking about what the limits on how fast a language could possibly be, given a perfect implementation.
For example consider the following Python function:
def f(n):
    i = 0
    while i < n:
        i += 1
        n += i
    return n
And the equivalent C function:
int f(int n) {
    int i = 0;
    while (i < n) {
        i += 1;
        n += i;
    }
    return n;
}
CPython probably runs this code 100 times slower than the GCC compiled version of the C code. But we all know CPython is slow right? PyPy or Psyco probably runs this code 2.5 times slower than the C version (I'm just spitballing here). Psyco and PyPy are, and contain, really good just in time compilers that can profile this code, see that f is always called with an integer, and therefore a much more optimized version can be generated in assembly. For example the optimized version could generate just a few add instructions in the inner loop (plus a few more instructions to check for overflow), this would skip all the indirection of calling the __add__ function on integers, allocating the result on the heap, and the indirection of calling the __lt__ function on integers, and maybe even some other things I missed.
But there's one thing no JIT for Python can do, no matter how brilliant: it can't skip the check that n is an integer, because it can't prove that n always will be one. Someone else could import this function and call it with strings, or some custom type, or anything they felt like, so the JIT must verify that n is an integer before running the optimized version. C doesn't have to do that. It knows that n will always be an integer; even if you do nothing but call f until the end of the earth, GCC can be 100% positive that n is an integer.
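In plain Python, a guard amounts to something like the sketch below. The names `count_up_generic`, `count_up_int_only`, and `count_up` are hypothetical, and a real JIT emits the check in machine code rather than Python. The loop is a variant of f that accumulates into a separate `total`, so it finishes for any input, and the closed-form sum merely stands in for the specialized assembly.

```python
def count_up_generic(n):
    """Slow path: works for any n that supports comparison with an int."""
    i = 0
    total = 0
    while i < n:
        i += 1
        total += i
    return total


def count_up_int_only(n):
    """Stand-in for the specialized code; only valid when n is an int."""
    # Closed form for 1 + 2 + ... + n, standing in for a tight machine-code loop.
    return n * (n + 1) // 2 if n > 0 else 0


def count_up(n):
    # The guard: this check has to run on every call, because nothing
    # stops a caller from passing a float or a custom type.
    if type(n) is int:
        return count_up_int_only(n)
    return count_up_generic(n)


fast_result = count_up(4)    # takes the specialized path
slow_result = count_up(2.5)  # fails the guard, takes the generic path
```

However cheap the `type(n) is int` test is, it is still work that the C version never has to do.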
The absolute need for at least a few guards in the resulting assembly guarantees that even the most perfect JIT compiler ever could not generate code that is strictly faster than the GCC version. Of course, a JIT has some advantages over a static compiler; for example, it can inline at dynamic call sites. However, in practice I don't believe this ability is ever likely to beat a static compiler for a real-world program. On the other hand, I'm not going to stop using Python any time soon, and it's going to continue to get faster, a lot faster.
As I said in my last post, PyCon was completely and utterly awesome, but it was also a bit of a blur. Here's my attempt at summarizing everything that happened during the conference itself.
Day 2 (the first day of the conference itself) started out with Guido's keynote. He did something rather unorthodox: instead of delivering a formal talk he just took audience questions via Twitter. He started out using Twitterfall, but it was a tad slow, so Jacob Kaplan-Moss switched it out for the PyCon Live Stream that Eric Florenzano, Brian Rosner, and I built, which was very awesome (seeing your creation on four projectors in front of an audience is a pretty good ego boost). Next up I had to deliver my own talk. It went well, I think; you can find my slides and a video on the PyCon website. The rest of the day was a bit of a blur, but I enjoyed James Bennett's talk, the form generators panel, Jonathan Ellis's talk on database scalability, and Alex Martelli's talk.
That evening I got to visit the Django restaurant. I can't help but imagine what the staff there thought about the crazy groups of programmers visiting (I think we mobbed the place every night of the conference), many of us wearing our Django shirts.
Day 3 was really about the VMs. It started with the IronPython and PyPy keynotes. These were followed by Mark Shuttleworth's keynote; his slides weren't working (the perils of using an operating system alpha release), but he still delivered an awesome talk on software development processes. From there I went to the single best stretch of talks at PyCon: The Speed of PyPy; Unladen Swallow: fewer coconuts, faster Python; and Understanding the Python GIL. Three great topics, three great speakers, three great talks, one room. Simply put, you should drop whatever you're doing and watch each of these talks; they're all great. I was sad to miss Raymond Hettinger's talk for the Django Software Foundation meeting, but I caught the video and it was excellent (as usual for Raymond). The DSF meeting was interesting, but not super exciting: a lot of "action needed" tasks, some of which are already happening (like getting better buildbots running). Finally I caught the tail end of the Neo4j talk. I'll need to watch the full thing, because the ending caught my attention.
I ended up back at the Django restaurant again, this time for the speakers, volunteers, and sponsors dinner. Django wasn't quite equipped to handle the sheer number of us that showed up, but it was a good time nonetheless. Once again I was blown away by how many Google people there were. There were far too many awesome people to list, but suffice it to say that you should volunteer or speak at PyCon, if for no other reason than to get to attend this dinner; there are awesome people to have a conversation with as far as the eye can see! After dinner I ended up staying up far too late with another group of awesome people. I'm told the testing birds of a feather was also great (I suppose events that aren't completely awesome don't invent things like the Testing Goat).
On Day 4 I suffered the effects of the aforementioned late night. I missed all of the keynotes, which is unfortunate considering they all looked quite good; I'll have to catch the videos. The poster session was very cool. I only got to see about half of it, but it's definitely something I'm going to look forward to at future PyCons. Next I saw Donovan Preston's talk on eventlet and a talk on teaching compilers with Python. I missed Scott Chacon's talk on hg-git, though I can't remember what for. I'll have to catch it on video because I was really looking forward to it.
After that there was time for a final pizza with friends, and then I had to head home. I sprinted remotely, but it's not quite the same as being there. It's my hope that next year I can attend the sprints in person, but school never seems to work out for me in that respect.
This year's PyCon was completely and utterly awesome, to the point where anything I put in writing won't do it justice. But I'm going to try anyway.
Day 0 was Wednesday, I got in pretty late and didn't do much of anything besides meet up with people in the hotel lobby for an hour or two. Still, it was good to be able to catch up with people, and put some faces to people I knew exclusively via the internet. Finally I had to finish up some homework (in Ruby on Rails of all things) so it wouldn't be an albatross around my neck for the rest of the conference.
Day 1 for me was Thursday. For most conference attendees this was the last day of tutorials, but I was going to the language summit instead. The language summit impressed me with the Python community in ways I can barely describe. We had a laundry list of issues to cover, ranging from the state of packaging, to the Unladen Swallow PEP, to what the policies for pure Python and C modules in the standard library should be. Besides a heated discussion about setuptools, distutils, and distribute, every decision was handled extremely professionally, covering pros, cons, and impact on users. A few of the conclusions (which you've probably already heard about) are that The Hitchhiker's Guide to Packaging is awesome, the Unladen Swallow PEP was accepted, stdlib modules that come with a C variety exclusively for speed must also have a pure Python version, and stdlib modules written in C to interface with other systems are ok (though optionally having a ctypes version would be nice for alternate distributions). At lunch I met Antonio Rodriguez (one of the keynote speakers) and two Red Hat (and Fedora) developers. It was interesting to discuss the state of Linux, Ubuntu, and other distributions with people who are professionally involved in them.
I'll be trying to dedicate a full post to each of the remaining days, though as I've said there's a ton to cover. Really the takeaway for you should be: PyCon is awesome and if it's humanly possible for you to make it, you should.
During this year's PyCon I became a committer on both PyPy and Unladen Swallow. In addition, I've been a contributor to Django for quite a long time (as well as having commit privileges to my branch during the Google Summer of Code). One of the things I've observed is the very different models these projects have for granting commit privileges, and what the expectations and responsibilities are for committers.
Unladen Swallow is a Google-funded branch of CPython focused on speed. One of the things I've found is that the developers of this project carry over some of the development process from Google, specifically doing code review on every patch. All patches are posted to Rietveld and reviewed, often by multiple people in the case of large patches, before being committed. Because there is a high level of review, it is possible to grant commit privileges to people without requiring perfection in their patches; as long as they follow the review process, the project is well insulated against a bad patch.
PyPy is also an implementation of Python; however, its development model is based largely around aggressive branching (I've never seen a project handle SVN's branching failures as well as PyPy) as well as sprints and pair programming. By branching aggressively PyPy avoids the overhead of reviewing every single patch, and instead only requires review when something is already believed to be "trunk-ready". Further, this model encourages experimentation (in the same way git's lightweight branches do). PyPy's use of sprints and pair programming are two ways to avoid formal code reviews and instead approach code quality as more of a collaborative effort.
Django is the project I've been involved with for the longest, and also the only one I don't have commit privileges on. Django is extremely conservative in giving out commit privileges (there are about a dozen Django committers, and about 500 names in the AUTHORS file). Django's development model is based neither on branching (only changes as large in scope as multiple database support or an admin UI refactor get their own branch) nor on code review (most commits are reviewed by no one besides the person who commits them). Django's committers maintain a level of autonomy that isn't seen in either of the other two projects. This stems from the period before Django 1.0 was released, when Django's trunk was often used in production and needed to be kept stable at all times, combined with the fact that Django has no paid developers who can guarantee time to do code review on patches. Therefore Django has maintained code quality by being extremely conservative in granting commit privileges and allowing developers with commit privileges to exercise their own judgment at all times.
Each of these projects uses different methods for maintaining code quality, and all seem to be successful in doing so. It's not clear whether there's any one model that's better than the others, or that any of these projects could work with another's model. Lastly, it's worth noting that all of these models are fairly orthogonal to the centralized VCS vs. DVCS debate which often surrounds such discussions.