alex gaynor's blago-blog

Posts tagged with open-source

Announcing VCS Translator

Posted January 21st, 2011. Tagged with python, vcs, software, programming, django, open-source.

For the past month or so I've been using a combination of Google, Stack Overflow, and bugging people on IRC to muddle my way through using various VCS that I'm not very familiar with. All too often my queries take the form "how do I do git foobar -q in mercurial?". A while ago I tweeted that someone should write a VCS translator website. Nobody else did, so when I woke up far too early today I decided I was going to get something online to solve this problem, today! About 6 hours later I tweeted the launch of VCS translator.

This is probably not even a minimum viable product. It doesn't handle a huge range of cases, or version control systems. However, it is open source and it provides a framework for answering these questions. If you're interested I'd encourage you to fork it on GitHub and help me out in fixing some of the most requested translations (I remove them from the list once they're implemented).

My future goals for this are to allow commenting, so users can explain the caveats of the translations (they are very rarely one-to-one), and to add a proper API. More broadly, my goal is to make this a useful tool for other programmers who, like myself, have far too many VCS in their lives.


The continuous integration I want

Posted November 2nd, 2010. Tagged with testing, python, tests, django, open-source.

Testing is important, and I've been a big advocate of writing tests for a while. However, when you've got tests you need to run them. This is a big problem in open source: Django works on something like six versions of Python (2.4, 2.5, 2.6, 2.7, Jython, and PyPy), 4 databases (SQLite, PostgreSQL, MySQL, and Oracle, plus the GIS backends and external backends), and I don't even know how many operating systems (at least the various Linuxes, OS X, and Windows). If I tried to run the tests in all those configurations for every commit I'd go crazy. Reusable applications have it even worse: ideally they should be tested under all those configurations, with each version of Django they support. For a Django application that wants to work on Django 1.1, 1.2, and all of those interpreters, databases, and operating systems you've got over 100 configurations. Crazy. John Resig faced a similar problem with jQuery (5 major browsers, multiple versions, mobile and desktop, different OSs), and the result was Test Swarm (note that at this time it doesn't appear to be up), an automated way for people to volunteer their machines to run tests. We need something like that for Python.
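To make the "over 100 configurations" figure concrete, here is a minimal sketch that enumerates the matrix. It assumes only the items named above (two Django versions, six Pythons, four databases, three operating system families) and ignores the GIS and external backends, so the real number would be higher.

```python
from itertools import product

# Only the items listed in the post; GIS and external backends are left out.
django_versions = ["1.1", "1.2"]
pythons = ["2.4", "2.5", "2.6", "2.7", "Jython", "PyPy"]
databases = ["SQLite", "PostgreSQL", "MySQL", "Oracle"]
operating_systems = ["Linux", "OS X", "Windows"]

# Every combination is one environment a reusable app would ideally be tested in.
configurations = list(
    product(django_versions, pythons, databases, operating_systems)
)

print(len(configurations))  # 2 * 6 * 4 * 3 = 144
```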

It'd be nice if it were as simple as users pointing their browser at a URL, but that's not practical with Python: the environments we want to test in are more complex than what can be detected, and we need to know what external services are available (databases, for example). My suggestion is that users should maintain a config file (.ini perhaps) somewhere on their system. It would say what versions of Python are available, and what external services are available (and how they can be accessed, e.g. DB passwords). Then the user downloads a bootstrap script and runs it. This script sees what services the user has available on their machine and queries a central server to see what tests need to be run, given the configuration they have available. The script downloads a test, creates a virtualenv, does whatever setup it needs to do (e.g. writing a Django settings.py file given the available DB configuration), and runs the tests. Finally it sends the test results back to the central server.
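The first half of that bootstrap script (read the config, then pick the jobs this machine can run) might look something like the sketch below. Everything here is hypothetical: the section names `[pythons]` and `[databases]`, the job dictionaries, and the function names are made up for illustration, and the list of pending jobs stands in for whatever a central server would return. Downloading tests, creating a virtualenv, and reporting results are left out.

```python
import configparser

# Hypothetical config file: which interpreters and services this machine has.
EXAMPLE_CONFIG = """
[pythons]
2.6 = /usr/bin/python2.6
2.7 = /usr/bin/python2.7

[databases]
sqlite = local
postgresql = localhost
"""


def load_capabilities(text):
    """Parse the .ini text into sets of available Pythons and databases."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "pythons": set(parser["pythons"]),
        "databases": set(parser["databases"]),
    }


def runnable_jobs(capabilities, pending_jobs):
    """Keep only the jobs whose requirements this machine can satisfy."""
    return [
        job
        for job in pending_jobs
        if job["python"] in capabilities["pythons"]
        and job["database"] in capabilities["databases"]
    ]


# Stand-in for the central server's answer to "what needs to be run?".
pending = [
    {"project": "some-app", "python": "2.7", "database": "postgresql"},
    {"project": "some-app", "python": "2.4", "database": "sqlite"},
    {"project": "other-app", "python": "2.6", "database": "oracle"},
]

capabilities = load_capabilities(EXAMPLE_CONFIG)
for job in runnable_jobs(capabilities, pending):
    print(job["project"], job["python"], job["database"])
```

With this example config only the first job is selected, since the machine has neither Python 2.4 nor Oracle.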

It's very much like a standard buildbot system, except any user can download the script and start running tests. There are a number of problems to be solved. How do you verify that a project's tests aren't malicious (only allow trusted tests to start)? How do you verify that the test results are valid? How do you actually write the configuration for a test suite? However, if these are solved I think this could be an invaluable resource for the Python community. Have a reusable app you want tested? Sign it up for PonySwarm, add a post-commit hook, and users will automatically run the tests for it.


Priorities

Posted October 24th, 2010. Tagged with django, python, open-source, programming.

When you work on something as large and multi-faceted as Django you need a way to prioritize what you work on. Without a system, how do I decide if I should work on a new feature for the template system, a bugfix in the ORM, a performance improvement to the localization features, or better docs for contrib.auth? There are tons of places to jump in and work on something in Django, and if you aren't a committer you'll eventually need one to commit your work to Django. So if you ever need me to commit something, here's how I prioritize my time on Django:

  1. Things I broke: If I broke a buildbot, or there's a ticket reported against something I committed, this is my #1 priority. Though Django no longer has a policy of trunk generally being perfectly stable, it's still a very good way to treat it: once it gets out of shape it's hard to get it back into good standing.
  2. Things I need for work: Strictly speaking these don't compete with the other items on this list, in that these happen on my work's time, rather than in my free time. However, practically speaking, this makes them a relatively high priority, since my work time is fixed, as opposed to free time for Django, which is rather elastic.
  3. Things that take me almost no time: These are mostly things like typos in the documentation, or really tiny bugfixes.
  4. Things I think are cool or important: These are either things I personally think are fun to work on, or are in high demand from the community.
  5. Other things brought to my attention: This is the most important category, because I can only work on bugs or features that I know exist. Django's Trac has about 2000 tickets, way too many for me to ever sift through in one sitting. Therefore, if you want me to take a look at a bug or a proposed patch it needs to be brought to my attention. Just pinging me on IRC is enough; if I have the time I'm almost always willing to take a look.

In actuality the vast majority of my time is spent in the bottom half of this list. It's pretty rare for the build to be broken, and even rarer for me to need something for work. However, there are tons of small things, and even more cool things to work on. An important thing to remember is that the best way to make something show up in category #3 is to have an awesome patch with tests and documentation. If all I need to do is git apply && git commit, that saves me a ton of time.


DjangoCon 2010 Slides

Posted September 13th, 2010. Tagged with applications, reusable, django, open-source, djangocon.

DjangoCon 2010 was a total blast this year, and deserves a full recap. For now, however, I only have the time to post the slides from my talk on "Rethinking the Reusable Application Paradigm". The video has also been uploaded. Enjoy.


Education Slides

Posted August 16th, 2010. Tagged with education, open-source.

This past weekend I attended a symposium on authentic learning in Santa Barbara, California. I gave a talk there about Open Source, how we emulate the practices of a learning community (as often seen in colleges and universities), and whether the pedagogical practices we engage in (to help get new contributors started, and hopefully guide them to becoming committers) are applicable to other fields. Unfortunately the talk wasn't recorded. However, my slides are available online, and there is also an accompanying paper that will be available in the future.


Committer Models of Unladen Swallow, PyPy, and Django

Posted February 25th, 2010. Tagged with pypy, python, unladen-swallow, django, open-source.

During this year's PyCon I became a committer on both PyPy and Unladen Swallow. In addition, I've been a contributor to Django for quite a long time (as well as having commit privileges to my branch during the Google Summer of Code). One of the things I've observed is the very different models these projects have for granting commit privileges, and what the expectations and responsibilities are for committers.

Unladen Swallow

Unladen Swallow is a Google-funded branch of CPython focused on speed. One of the things I've found is that the developers of this project carry over some of the development process from Google, specifically doing code review on every patch. All patches are posted to Rietveld and reviewed, often by multiple people in the case of large patches, before being committed. Because there is a high level of review it is possible to grant commit privileges to people without requiring perfection in their patches: as long as they follow the review process, the project is well insulated against a bad patch.

PyPy

PyPy is also an implementation of Python, however its development model is based largely around aggressive branching (I've never seen a project handle SVN's branching failures as well as PyPy) as well as sprints and pair programming. By branching aggressively PyPy avoids the overhead of reviewing every single patch, and instead only requires review when something is already believed to be "trunk-ready". Further, this model encourages experimentation (in the same way git's lightweight branches do). PyPy's use of sprints and pair programming are two ways to avoid formal code reviews and instead approach code quality as more of a collaborative effort.

Django

Django is the project I've been involved with for the longest, and also the only one I don't have commit privileges on. Django is extremely conservative in giving out commit privileges (there are about a dozen Django committers, and about 500 names in the AUTHORS file). Django's development model is based neither on branching (only changes as large in scope as multiple database support or an admin UI refactor get their own branch) nor on code review (most commits are reviewed by no one besides the person who commits them). Django's committers maintain a level of autonomy that isn't seen in either of the other two projects. This comes from the period before Django 1.0 was released, when Django's trunk was often used in production and needed to be kept stable at all times, combined with the fact that Django has no paid developers who can guarantee time to do code review on patches. Therefore Django has maintained code quality by being extremely conservative in granting commit privileges and allowing developers with commit privileges to exercise their own judgment at all times.

Conclusion

Each of these projects uses different methods for maintaining code quality, and all seem to be successful in doing so. It's not clear whether there's any one model that's better than the others, or that any of these projects could work with another's model. Lastly, it's worth noting that all of these models are fairly orthogonal to the centralized VCS vs. DVCS debate which often surrounds such discussions.


Why Open Source Works

Posted January 27th, 2010. Tagged with thinking, open-source.

Open source works for a lot of reasons, but there is one that stands out. Open source is basically an application of democracy to a programming community; in fact, it's the most perfect implementation of democracy yet.

The central idea of a democracy is that the governing party draws its authority from the willing consent of the governed. In most modern democracies this is more or less true, but not exactly. In the real world it's often very difficult for a citizen to withdraw their consent to be governed: they have to wait until elections to formally enact change, violent rebellion basically doesn't exist in first world countries, and there are many barriers (economic and otherwise) to just picking up and leaving. Because of the difficulty in withdrawing consent, modern democracies do not (and probably cannot) perfectly embody this spirit.

But open source communities can. In the open source world forking is often considered to be a nuclear option (ignoring the use of the term in the DVCS sense of forking for collaboration), a tactic designed to fragment a community. But it's also the perfect equalizer. Authority in open source communities is derived from the community's willingness to stay under the leadership of that authority. At any point any member of the community can decide to exit the community: to use a different piece of software, or to fork it and continue development however they please. Because there are practical options for withdrawing consent to the governance, open source is probably the most perfect application of democracy.
