alex gaynor's blago-blog

Posts tagged with open-source

Why Crypto

Posted February 12th, 2014. Tagged with python, open-source.

People who follow me on twitter or github have probably noticed over the past six months or so: I've been talking about, and working on, cryptography a lot. Before this I had basically zero crypto experience. Not a lot of programmers know about cryptography, and many of us (myself included) are frankly a bit scared of it. So how did this happen?

At first it was simple: PyCrypto (probably the most used cryptographic library for Python) didn't work on PyPy, and I needed to perform some simple cryptographic operations on PyPy. Someone else had already started work on a cffi based cryptography library, so I started trying to help out. Unfortunately the maintainer had to stop working on it. At about the same time several other people (some with much more cryptography experience than I) expressed interest in the idea of a new cryptography library for Python, so we got started on it.

It's worth noting that at the same time this was happening, Edward Snowden's disclosures about the NSA's activities were also coming out. While this never directly motivated me to work on cryptography, I also don't think it's a coincidence.

Since then I've been in something of a frenzy, reading and learning everything I can about cryptography. And while originally my motivation was "a thing that works on PyPy", I've now grown considerably more bold:

Programmers are used to being able to pick up domain knowledge as we go. When I worked on a golf website, I learned about how people organized golf outings, when I worked at rdio I learned about music licensing, etc. Programmers will apply their trade to many different domains, so we're used to learning about these different domains with a combination of Google, asking folks for help, and looking at the result of our code and seeing if it looks right.

Unfortunately, this methodology leads us astray: Google for many cryptographic problems leaves you with a pile of wrong answers, very few of us have friends who are cryptography experts to ask for help, and one can't just look at the result of a cryptographic operation and see if it's secure. Security is a property much more subtle than we usually have to deal with:

>>> encrypt(b"a secret message")
b'n frperg zrffntr'

Is the encrypt operation secure? Who knows!

Correctness in this case is dictated by analyzing the algorithms at play, not by looking at the result. And most of us aren't trained by this. In fact we've been actively encouraged not to know how. Programmers are regularly told "don't do your own crypto" and "if you want to do any crypto, talk to a real cryptographer". This culture of ignorance about cryptography hasn't resulted in us all contacting cryptographers, it's resulted in us doing bad crypto:

Usually when we design APIs, our goal is to make it easy to do something. Cryptographic APIs seem to have been designed on the same principle. Unfortunately that something is almost never secure. In fact, with many libraries, the path of least resistance leads you to doing something that is extremely wrong.

So we set out to design a better library, with the following principles:

  • It should never be easier to do the wrong thing than it is to do the right thing.
  • You shouldn't need to be a cryptography expert to use it, our documentation should equip you to make the right decisions.
  • Things which are dangerous should be obviously dangerous, not subtly dangerous.
  • Put our users' safety and security above all else.

I'm very proud of our work so far. You can find our documentation online. We're not done. We have many more types of cryptographic operations left to expose, and more recipes left to write. But the work we've done so far has stayed true to our principles. Please let us know if our documentation ever fails to make something accessible to you.

You can find the rest here. There are view comments.

Why Travis CI is great for the Python community

Posted January 6th, 2014. Tagged with python, open-source.

In the unlikely event you're both reading my blog, and have not heard of Travis CI, it's a CI service which specifically targets open source projects. It integrates nicely with Github, and is generally a pleasure to work with.

I think it's particularly valuable for the Python community, because it makes it easy to test against a variety of Pythons, which maybe you don't have at your fingertips on your own machine, such as Python 3 or PyPy (Editor's note: Why aren't you using PyPy for all the things?).

Travis makes this drop dead simple, in your .travis.yml simply write:

language: python
python:
    - "2.6"
    - "2.7"
    - "3.2"
    - "3.3"
    - "pypy"

And you'll be whisked away into a land of magical cross-Python testing. Or, if like me you're a fan of tox, you can easily run with that:

python: 2.7
env:
    - TOX_ENV=py26
    - TOX_ENV=py27
    - TOX_ENV=py32
    - TOX_ENV=py33
    - TOX_ENV=pypy
    - TOX_ENV=docs
    - TOX_ENV=pep8

script:
    - tox -e $TOX_ENV

This approach makes it easy to include things like linting or checking your docs as well.

Travis is also pretty great because it offers you a workflow. I'm a big fan of code review, and the combination of Travis and Github's pull requests are awesome. For basically every project I work on now, I work in this fashion:

  • Create a branch, write some code, push.
  • Send a pull request.
  • Iteratively code review
  • Check for travis results
  • Merge

And it's fantastic.

Lastly, and perhaps most importantly, Travis CI consistently gets better, without me doing anything.

You can find the rest here. There are view comments.

Gender neutral language - An FAQ

Posted November 30th, 2013. Tagged with ethics, diversity, open-source, community.

I'd like to refer to a hypothetical person in my documentation

Try something like this:

When a user visits the website, they will be assigned a session ID, and it will be transmitted to them in the HTTP response and stored in their browser.

But not like this!

When a user visits the website, he will be assigned a session ID, and it will be transmitted to him in the HTTP response and stored in his browser.

Why?

Using gendered pronouns signals to the audience your assumptions about who they are, and very often lets them know that they don't belong. Since that's not your intent, better to just be gender neutral.

And if you don't believe me, some folks did some science (other studies have consistently reproduced this result).

Can I just go 50/50 on male and female pronouns?

It's a nice idea, unfortunately it doesn't work. Your users don't read your documentation cover to cover, so they won't be able to see your good intentions. Instead they'll be linked somewhere in the middle, see your gendered language, and feel excluded.

In addition, not everyone identifies by male or female pronouns. Play it safe, just be gender neutral.

Using the plural pronouns isn't grammatically correct!

I've been assured by people far more knowlegable than I that it's ok, even Shakespeare did it. Personally, I'm comforted by the knowledge that even if I'm wrong about the grammar, I won't have made anyone feel excluded.

Someone sent a pull request to my project changing the languages!

So merge it! If you've got some process that a contributors needs to go through (such as a CLA), let them know. They're just trying to make your community better and bigger!

They said I was being hostile!

I'm sorry, but you were. Your choice of language has an impact on people.

I wasn't trying to be!

That's ok, hostility isn't about intent, your words had an impact whether you meant it or not.

Maybe you didn't know, you're not a native English speaker, your 11th grade English teacher beat you over the head with some bad advice. That's ok, it only takes a moment to fix it, and then you're letting everyone know it's easy to fix!

Aren't there bigger issues we should be dealing with?

There are so many giant issues we face. This one takes 15 seconds to fix, has no downsides, and we can all be a part of making it better. If we can't do this, how could we ever tackle the other challenges?

Has anyone ever asked these questions?

You have no idea.

Some of these aren't questions!

That's ok.

You can find the rest here. There are view comments.

Security process for Open Source Projects

Posted October 19th, 2013. Tagged with django, python, open-source, community.

This post is intended to describe how open source projects should handle security vulnerabilities. This process is largely inspired by my involvement in the Django project, whose process is in turn largely drawn from the PostgreSQL project's process. For every recommendation I make I'll try to explain why I've made it, and how it serves to protect you and your users. This is largely tailored at large, high impact, projects, but you should able to apply it to any of your projects.

Why do you care?

Security vulnerabilities put your users, and often, in turn, their users at risk. As an author and distributor of software, you have a responsibility to your users to handle security releases in a way most likely to help them avoid being exploited.

Finding out you have a vulnerability

The first thing you need to do is make sure people can report security issues to you in a responsible way. This starts with having a page in your documentation (or on your website) which clearly describes an email address people can report security issues to. It should also include a PGP key fingerprint which reporters can use to encrypt their reports (this ensures that if the email goes to the wrong recipient, that they will be unable to read it).

You also need to describe what happens when someone emails that address. It should look something like this:

  1. You will respond promptly to any reports to that address, this means within 48 hours. This response should confirm that you received the issue, and ideally whether you've been able to verify the issue or more information is needed.
  2. Assuming you're able to reproduce the issue, now you need to figure out the fix. This is the part with a computer and programming.
  3. You should keep in regular contact with the reporter to update them on the status of the issue if it's taking time to resolve for any reason.
  4. Now you need to inform the reporter of your fix and the timeline (more on this later).

Timeline of events

From the moment you get the initial report, you're on the clock. Your goal is to have a new release issued within 2-weeks of getting the report email. Absolutely nothing that occurs until the final step is public. Here are the things that need to happen:

  1. Develop the fix and let the reporter know.
  2. You need to obtain a CVE (Common Vulnerabilities and Exposures) number. This is a standardized number which identifies vulnerabilities in packages. There's a section below on how this works.
  3. If you have downstream packagers (such as Linux distributions) you need to reach out to their security contact and let them know about the issue, all the major distros have contact processes for this. (Usually you want to give them a week of lead time).
  4. If you have large, high visibility, users you probably want a process for pre-notifying them. I'm not going to go into this, but you can read about how Django handles this in our documentation.
  5. You issue a release, and publicize the heck out of it.

Obtaining a CVE

In short, follow these instructions from Red Hat.

What goes in the release announcement

Your release announcement needs to have several things:

  1. A precise and complete description of the issue.
  2. The CVE number
  3. Actual releases using whatever channel is appropriate for your project (e.g. PyPI, RubyGems, CPAN, etc.)
  4. Raw patches against all support releases (these are in addition to the release, some of your users will have modified the software, and they need to be able to apply the patches easily too).
  5. Credit to the reporter who discovered the issue.

Why complete disclosure?

I've recommended that you completely disclose what the issue was. Why is that? A lot of people's first instinct is to want to keep that information secret, to give your users time to upgrade before the bad guys figure it out and start exploiting it.

Unfortunately it doesn't work like that in the real world. In practice, not disclosing gives more power to attackers and hurts your users. Dedicated attackers will look at your release and the diff and figure out what the exploit is, but your average users won't be able to. Even embedding the fix into a larger release with many other things doesn't mask this information.

In the case of yesterday's Node.JS release, which did not practice complete disclosure, and did put the fix in a larger patch, this did not prevent interested individuals from finding out the attack, it took me about five minutes to do so, and any serious individual could have done it much faster.

The first step for users in responding to a security release in something they use is to assess exposure and impact. Exposure means "Am I affected and how?", impact means "What is the result of being affected?". Denying users a complete description of the issue strips them of the ability to answer these questions.

What happens if there's a zero-day?

A zero-day is when an exploit is publicly available before a project has any chance to reply to it. Sometimes this happens maliciously (e.g. a black-hat starts using the exploit against your users) and sometimes it is accidentally (e.g. a user reports a security issue to your mailing list, instead of the security contact). Either way, when this happens, everything goes to hell in a handbasket.

When a zero-day happens basically everything happens in 16x fast-forward. You need to immediately begin preparing a patch and issuing a release. You should be aiming to issue a release on the same day as the issue is made public.

Unfortunately there's no secret to managing zero-days. They're quite simply a race between people who might exploit the issue, and you to issue a release and inform your users.

Conclusion

Your responsibility as a package author or maintainer is to protect your users. The name of the game is keeping your users informed and able to judge their own security, and making sure they have that information before the bad guys do.

You can find the rest here. There are view comments.

Meritocracy

Posted October 12th, 2013. Tagged with politics, community, ethics, django, open-source.

Let's start with a definition, a meritocracy is a group where leadership or authority is derived from merit (merit being skills or ability), and particularly objective merit. I think adding the word objective is important, but not often explicitly stated.

A lot of people like to say open source is a meritocracy, the people who are the top of projects are there because they have the most merit. I'd like to examine this idea. What if I told you the United States Congress was a meritocracy? You might say "gee, how could that be, they're really terrible at their jobs, the government isn't even operational!?!". To which I might respond "that's evidence that they aren't good at their jobs, it doesn't prove that they aren't the best of the available candidates". You'd probably tell me that "surely someone, somewhere, is better qualified to do their jobs", and I'd say "we have an open, democratic process, if there was someone better, they'd run for office and get elected".

Did you see what I did there? It was subtle, a lot of people miss it. I begged the question. Begging the question is the act of responding to a hypothesis with a conclusion that's premised on exactly the question the hypothesis asks.

So what if you told me that Open Source was meritocracy? Projects gain recognition because they're the best, people become maintainers of libraries because they're the best.

And those of us involved in open source love this explanation, why wouldn't we? This explanation says that the reason I'm a core developer of Django and PyPy because I'm so gosh-darned awesome. And who doesn't like to think they're awesome? And if I can have a philosophy that leads to myself being awesome, all the better!

Unfortunately, it's not a valid conclusion. The problem with stating that a group is meritocratic is that it's not a falsifiable hypothesis.

We don't have a definition of objective merit. As a result of which there's no piece of evidence that I can show you to prove that a group isn't in fact meritocratic. And a central tenant of any sort of rigorous inquisitive process is that we need to be able to construct a formal opposing argument. I can test whether a society is democratic, do the people vote, is the result of the vote respected? I can't test if a society is meritocratic.

It's unhealthy when we consider or groups, or cultures, or our societies as being meritocratic. It makes us ignore questions about who our leaders are, how they got there who isn't represented. The best we can say is that maybe our organizations are (perceptions of subjective merit)-ocracies, which is profoundly different from what we mean when we say meritocracy.

I'd like to encourage groups that self-identify as being meritocratic (such as The Gnome Foundation, The Apache Software Foundation, Mozilla, The Document Foundation, and The Django Software Foundation) to reconsider this. Aspiring to meritocracy is a reasonable, it makes sense to want for the people who are best capable of doing so to lead us, but it's not something we can ever say we've achieved.

You can find the rest here. There are view comments.

Effective Code Review

Posted September 26th, 2013. Tagged with openstack, python, community, django, open-source.

Maybe you practice code review, either as a part of your open source project or as a part of your team at work, maybe you don't yet. But if you're working on a software project with more than one person it is, in my view, a necessary piece of a healthy workflow. The purpose of this piece is to try to convince you its valuable, and show you how to do it effectively.

This is based on my experience doing code review both as a part of my job at several different companies, as well as in various open source projects.

What

It seems only seems fair that before I try to convince you to make code review an integral part of your workflow, I precisely define what it is.

Code review is the process of having another human being read over a diff. It's exactly like what you might do to review someone's blog post or essay, except it's applied to code. It's important to note that code review is about code. Code review doesn't mean an architecture review, a system design review, or anything like that.

Why

Why should you do code review? It's got a few benefits:

  • It raises the bus factor. By forcing someone else to have the familiarity to review a piece of code you guarantee that at least two people understand it.
  • It ensures readability. By getting someone else to provide feedback based on reading, rather than writing, the code you verify that the code is readable, and give an opportunity for someone with fresh eyes to suggest improvements.
  • It catches bugs. By getting more eyes on a piece of code, you increase the chances that someone will notice a bug before it manifests itself in production. This is in keeping with Eric Raymond's maxim that, "given enough eyeballs, all bugs are shallow".
  • It encourages a healthy engineering culture. Feedback is important for engineers to grow in their jobs. By having a culture of "everyone's code gets reviewed" you promote a culture of positive, constructive feedback. In teams without review processes, or where reviews are infrequent, code review tends to be a tool for criticism, rather than learning and growth.

How

So now that I've, hopefully, convinced you to make code review a part of your workflow how do you put it into practice?

First, a few ground rules:

  • Don't use humans to check for things a machine can. This means that code review isn't a process of running your tests, or looking for style guide violations. Get a CI server to check for those, and have it run automatically. This is for two reasons: first, if a human has to do it, they'll do it wrong (this is true of everything), second, people respond to certain types of reviews better when they come from a machine. If I leave the review "this line is longer than our style guide suggests" I'm nitpicking and being a pain in the ass, if a computer leaves that review, it's just doing it's job.
  • Everybody gets code reviewed. Code review isn't something senior engineers do to junior engineers, it's something everyone participates in. Code review can be a great equalizer, senior engineers shouldn't have special privledges, and their code certainly isn't above the review of others.
  • Do pre-commit code review. Some teams do post-commit code review, where a change is reviewed after it's already pushed to master. This is a bad idea. Reviewing a commit after it's already been landed promotes a feeling of inevitability or fait accompli, reviewers tend to focus less on small details (even when they're important!) because they don't want to be seen as causing problems after a change is landed.
  • All patches get code reviewed. Code review applies to all changes for the same reasons as you run your tests for all changes. People are really bad at guessing the implications of "small patches" (there's a near 100% rate of me breaking the build on change that are "so small, I don't need to run the tests"). It also encourages you to have a system that makes code review easy, you're going to be using it a lot! Finally, having a strict "everything gets code reviewed" policy helps you avoid arguments about just how small is a small patch.

So how do you start? First, get yourself a system. Phabricator, Github's pull requests, and Gerrit are the three systems I've used, any of them will work fine. The major benefit of having a tool (over just mailing patches around) is that it'll keep track of the history of reviews, and will let you easily do commenting on a line-by-line basis.

You can either have patch authors land their changes once they're approved, or you can have the reviewer merge a change once it's approved. Either system works fine.

As a patch author

Patch authors only have a few responsibilities (besides writing the patch itself!).

First, they need to express what the patch does, and why, clearly.

Second, they need to keep their changes small. Studies have shown that beyond 200-400 lines of diff, patch review efficacy trails off [1]. You want to keep your patches small so they can be effectively reviewed.

It's also important to remember that code review is a collaborative feedback process if you disagree with a review note you should start a conversation about it, don't just ignore it, or implement it even though you disagree.

As a review

As a patch reviewer, you're going to be looking for a few things, I recommend reviewing for these attributes in this order:

  • Intent - What change is the patch author trying to make, is the bug they're fixing really a bug? Is the feature they're adding one we want?
  • Architecture - Are they making the change in the right place? Did they change the HTML when really the CSS was busted?
  • Implementation - Does the patch do what it says? Is it possibly introducing new bugs? Does it have documentation and tests? This is the nitty-gritty of code review.
  • Grammar - The little things. Does this variable need a better name? Should that be a keyword argument?

You're going to want to start at intent and work your way down. The reason for this is that if you start giving feedback on variable names, and other small details (which are the easiest to notice), you're going to be less likely to notice that the entire patch is in the wrong place! Or that you didn't want the patch in the first place!

Doing reviews on concepts and architecture is harder than reviewing individual lines of code, that's why it's important to force yourself to start there.

There are three different types of review elements:

  • TODOs: These are things which must be addressed before the patch can be landed; for example a bug in the code, or a regression.
  • Questions: These are things which must be addressed, but don't necessarily require any changes; for example, "Doesn't this class already exist in the stdlib?"
  • Suggestions for follow up: Sometimes you'll want to suggest a change, but it's big, or not strictly related to the current patch, and can be done separately. You should still mention these as a part of a review in case the author wants to adjust anything as a result.

It's important to note which type of feedback each comment you leave is (if it's not already obvious).

Conclusion

Code review is an important part of a healthy engineering culture and workflow. Hopefully, this post has given you an idea of either how to implement it for your team, or how to improve your existing workflow.

[1]http://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/

You can find the rest here. There are view comments.

Doing a release is too hard

Posted September 17th, 2013. Tagged with openstack, django, python, open-source.

I just shipped a new release of alchimia. Here are the steps I went through:

  • Manually edit version numbers in setup.py and docs/conf.py. In theory I could probably centralize this, but then I'd still have a place I need to update manually.
  • Issue a git tag (actually I forgot to do that on this project, oops).
  • python setup.py register sdist upload -s to build and upload some tarballs to PyPi
  • python setup.py register bdist_wheel upload -s to build and upload some wheels to PyPi
  • Bump the version again for the now pre-release status (I never remeber to do this)

Here's how it works for OpenStack projects:

  • git tag VERSION -s (-s) makes it be a GPG signed tag)
  • git push gerrit VERSION this sends the tag to gerrit for review

Once the tag is approved in the code review system, a release will automatically be issue including:

  • Uploading to PyPi
  • Uploading documentation
  • Landing the tag in the official repository

Version numbers are always automatically handled correctly.

This is how it should be. We need to bring this level of automation to all projects.

You can find the rest here. There are view comments.

You guys know who Philo Farnsworth was?

Posted September 15th, 2013. Tagged with django, python, open-source, community.

Friends of mine will know I'm a very big fan of the TV show Sports Night (really any of Aaron Sorkin's writing, but Sports Night in particular). Before you read anything I have to say, take a couple of minutes and watch this clip:

I doubt Sorkin knew it when he scripted this (I doubt he knows it now either), but this piece is about how Open Source happens (to be honest, I doubt he knows what Open Source Software is).

This short clip actually makes two profound observations about open source.

First, most contribution are not big things. They're not adding huge new features, they're not rearchitecting the whole system to address some limitation, they're not even fixing a super annoying bug that affects every single user. Nope, most of them are adding a missing sentence to the docs, fixing a bug in a wacky edge case, or adding a tiny hook so the software is a bit more flexible. And this is fantastic.

The common wisdom says that the thing open source is really bad at is polish. My experience has been the opposite, no one is better at finding increasingly edge case bugs than open source users. And no one is better at fixing edge case bugs than open source contributors (who overlap very nicely with open source users).

The second lesson in that clip is about how to be an effective contributor. Specifically that one of the keys to getting involved effectively is for other people to recognize that you know how to do things (this is an empirical observation, not a claim of how things ought to be). How can you do that?

  • Write good bug reports. Don't just say "it doesn't work", if you've been a programmer for any length of time, you know this isn't a useful bug report. What doesn't work? Show us the traceback, or otherwise unexpected behavior, include a test case or instructions for reproduction.
  • Don't skimp on the details. When you're writing a patch, make sure you include docs, tests, and follow the style guide, don't just throw up the laziest work possible. Attention to detail (or lack thereof) communicates very clearly to someone reviewing your work.
  • Start a dialogue. Before you send that 2,000 line patch with that big new feature, check in on the mailing list. Make sure you're working in a way that's compatible with where the project is headed, give people a chance to give you some feedback on the new APIs you're introducing.

This all works in reverse too, projects need to treat contributors with respect, and show them that the project is worth their time:

  • Follow community standards. In Python this means things like PEP8, having a working setup.py, and using Sphinx for documentation.
  • Have passing tests. Nothing throws me for a loop worse than when I checkout a project to contribute and the tests don't pass.
  • Automate things. Things like running your tests, linters, even state changes in the ticket tracker should all be automated. The alternative is making human beings manually do a bunch of "machine work", which will often be forgotten, leading to a sub-par experience for everyone.

Remember, Soylent Green Open Source is people

That's it, the blog post's over.

You can find the rest here. There are view comments.

You don't have to be a jerk to code review

Posted July 16th, 2013. Tagged with open-source.

By this point it's likely you've read about the discussion Sarah Sharp started on the Linux Kernel Mailing List (LKML) about the abusive nature of some of the comments there, she also wrote about it on her blog.

The negative responses to this broadly fall into two categories: 1) The Linux development process works, stop trying to change it, and 2) professionalism means sugar coating things and that leads to backstabbing and people writing terrible patches. I'll try to respond to each of these.

To the former, all I'll say is that unless you believe the Linux development process is literally perfect you should be ok to investigating changes to it. You want to discuss the merits of a specific change, that's fine, but you can't honestly dismiss change as a whole out of hand.

To the latter, first I should probably say "yes this is a real concern". Quoting Linus here:

Because if you want me to "act professional", I can tell you that I'm not interested. I'm sitting in my home office wearign a bathrobe. The same way I'm not going to start wearing ties, I'm also not going to buy into the fake politeness, the lying, the office politics and backstabbing, the passive aggressiveness, and the buzzwords. Because THAT is what "acting professionally" results in: people resort to all kinds of really nasty things because they are forced to act out their normal urges in unnatural ways.

First, I want to stress that he makes a wholly unsubstantiated claim. That acting professionally (in this context: not being outrageously abusive) inevitably leads to backstabbing, passive aggressiveness, and politics.

I've done code review as a part of my job, I've done code review for several open source projects, including Django, CPython, PyPy, OpenStack, and Twisted. You don't need to be a jerk to review code. And you certainly don't need to be emotionally abusive. A useful code review involves telling someone what's wrong with their patch that prevents it from being merged, and if it's not self-evident, how they can fix it. None of this is made better by profanity-laced tirades.

Sarah, thank you for trying to help improve the Linux kernel community.

You can find the rest here. There are view comments.

Linux on the Desktop Dead

Posted September 3rd, 2012. Tagged with open-source.

At 10:40 AM this morning Linux on the desktop was pronounced dead by Dr. Randall Gnu at GPL Memorial Hospital in Portland, Oregon. The cause of death is, as yet, unknown but it is believed to be the result of two decades of constant exposure to FUD.

Friends and colleagues were shocked and saddened at the loss of Linux on the desktop. Said one, "It was so shocking to hear it passed away, it had been doing so well lately, more users than ever, more non-technical users". Said another, "People always say, 'I never expected to get this news', but I really never expected it."

Linux on the desktop had a rocky two decade history, however the last five years had been characterized by unprecedented growth.

The family has requested that in lieu of flowers donations be made to the Linux Foundation.

You can find the rest here. There are view comments.

Why personal funding

Posted July 4th, 2012. Tagged with open-source.

First, I have to apologize, because this post is incredibly self-serving, I believe everything I write here applies to myself, and if you do what I suggest, it will be to my benefit. I believe every single word I'm writing, but you have no way of knowing that, except trusting me. I can only hope I've written persuasively enough.

If you've ever asked an open source developer why they haven't fixed a bug or added a feature, the answer probably fell into one of three categories:

  • .00001% That is technically impossible
  • 25% I haven't needed that
  • 75% I don't have enough time

(Number may not add up to 100% due to rounding).

But why do you care? If you've made your way to my blog, it's incredibly likely you're a direct user of open source software. First, I think it'd be nice if you gave back. You're under no obligation to legally, morally, or otherwise, but I think it'd be a nice thing to do.

Second, and far more importantly: open source is an obscenely efficient way to develop software. I'm pretty sure if you compared the total person hours invested in V8, SpiderMonkey, the Hotspot JVM, and PyPy you'd find that PyPy had fewer person hours than any of those, by an order of magnitude, at least. And yet we can discuss them in the same sentence.

So here's the idea, there are a lot of developers (Github claims 1.7 million users). If we each spent a little of our money to pay a few open source developer's salaries, let's see what they could build if they did have the time.

If that sounds at all interesting to you, you should go over to Gittip, and help fund someone. I'm not sure Gittip is the perfect implementation of this idea, or even a good one. But I'm glad someone's trying it, because I want to see it work.

You can find the rest here. There are view comments.

Announcing VCS Translator

Posted January 21st, 2011. Tagged with python, vcs, software, programming, django, open-source.

For the past month or so I've been using a combination of Google, Stackoverflow, and bugging people on IRC to muddle my way through using various VCS that I'm not very familiar with. And all too often my queries are of the form of "how do I do git foobar -q in mercurial?". A while ago I tweeted that someone should write a VCS translator website. Nobody else did, so when I woke up far too early today I decided I was going to get something online to solve this problem, today! About 6 hours later I tweeted the launch of VCS translator.

This is probably not even a minimum viable product. It doesn't handle a huge range of cases, or version control systems. However, it is open source and it provides a framework for answering these questions. If you're interested I'd encourage you to fork it on github and help me out in fixing some of the most requested translation (I remove them once they're implemented).

My future goals for this are to allow commenting, so users can explain the caveats of the translations (very infrequently are the translations one-to-one) and to add a proper API. Moreover my goal is to make this a useful tool to other programmers who, like myself, have far too many VCS in their lives.

You can find the rest here. There are view comments.

The continuous integration I want

Posted November 2nd, 2010. Tagged with testing, python, tests, django, open-source.

Testing is important, I've been a big advocate of writing tests for a while, however when you've got tests you need to run them. This is a big problem in open source, Django works on something like six versions of Python (2.4, 2.5, 2.6, 2.7, Jython, and PyPy), 4 databases (SQLite, PostgreSQL, MySQL, Oracle, plus the GIS backends, and external backends), and I don't even know how many operating systems (at least the various Linuxes, OS X, and Windows). If I tried to run the tests in all those configurations for every commit I'd go crazy. Reusable applications have it even worse, ideally they should be tested under all those configurations, with each version of Django they support. For a Django application that wants to work on Django 1.1, 1.2, and all of those interpreters, databases, and operating systems you've got over 100 configurations. Crazy. John Resig faced a similar problem with jQuery (5 major browsers, multiple versions, mobile and desktop, different OSs), and the result was Test Swarm (note that at this time it doesn't appear to be up), an automated way for people to volunteer their machines to run tests. We need something like that for Python.

It'd be nice if it were as simple as users pointing their browser at a URL, but that's not practical with Python: the environments we want to test in are more complex than what can be detected, we need to know what external services are available (databases, for example). My suggestion is users should maintain a config file (.ini perhaps) somewhere on their system, it would say what versions of Python are available, and what external services are available (and how they can be accessed, e.g. DB passwords). Then the user downloads a bootstrap script and runs it. This script sees what services the user has available on their machine and queries a central server to see what tests need to be run, given the configuration they have available. The script downloads a test, creates a virtualenv, and does whatever setup it needs to do (e.g. writing a Django settings.py file given the available DB configuration), and runs the tests. Finally it sends the test results back to the central server.

It's very much like a standard buildbot system, except any user can download the script and start running tests. There are a number of problems to be solved, how do you verify that a project's tests aren't malicious (only allow trusted tests to start), how do you verify that the test results are valid, how do you actually write the configuration for a test suite? However, if solved I think this could be an invaluable resource for the Python community. Have a reusable app you want tested? Sign it up for PonySwarm, add a post-commit hook, and users will automatically run the tests for it.

You can find the rest here. There are view comments.

Priorities

Posted October 24th, 2010. Tagged with programming, django, python, open-source.

When you work on something as large and multi-faceted as Django you need a way to prioritize what you work on, without a system how do I decide if I should work on a new feature for the template system, a bugfix in the ORM, a performance improvement to the localization features, or better docs for contrib.auth? There's tons of places to jump in and work on something in Django, and if you aren't a committer you'll eventually need one to commit your work to Django. So if you ever need me to commit something, here's how I prioritize my time on Django:

  1. Things I broke: If I broke a buildbot, or there's a ticket reported against something I committed this is my #1 priority. Though Django no longer has a policy of trunk generally being perfectly stable it's still a very good way to treat it, once it gets out of shape it's hard to get it back into good standing.
  2. Things I need for work: Strictly speaking these don't compete with the other items on this list, in that these happen on my work's time, rather than in my free time. However, practically speaking, this makes them a relatively high priority, since my work time is fixed, as opposed to free time for Django, which is rather elastic.
  3. Things that take me almost no time: These are mostly things like typos in the documentation, or really tiny bugfixes.
  4. Things I think are cool or important: These are either things I personally think are fun to work on, or are in high demand from the community.
  5. Other things brought to my attention: This is the most important category, I can only work on bugs or features that I know exist. Django's trac has about 2000 tickets, way too many for me to ever sift through in one sitting. Therefore, if you want me to take a look at a bug or a proposed patch it needs to be brought to my attention. Just pinging me on IRC is enough, if I have the time I'm almost always willing to take a look.

In actuality the vast majority of my time is spent in the bottom half of this list, it's pretty rare for the build to be broken, and even rarer for me to need something for work, however, there are tons of small things, and even more cool things to work on. An important thing to remember is that the best way to make something show up in category #3 is to have an awesome patch with tests and documentation, if all I need to do is git apply && git commit that saves me a ton of time.

You can find the rest here. There are view comments.

DjangoCon 2010 Slides

Posted September 13th, 2010. Tagged with applications, reusable, django, djangocon, open-source.

DjangoCon 2010 was a total blast this year, and deserves a full recap, however for now I only have the time to post the slides from my talk on "Rethinking the Reusable Application Paradigm", the video has also been uploaded. Enjoy.

You can find the rest here. There are view comments.

Education Slides

Posted August 16th, 2010. Tagged with education, open-source.

This past weekend I attended a symposium on authentic learning, in Santa Barbara, California. I gave a talk there about Open Source, and how we emulate the practices of a learning community (as often seen in colleges and universities) and whether the pedagogal practices we engange in (to help get new contributors started, and hopefully guide them to becoming committers) are applicable to other fields. Unfortunately the talk wasn't recorded, however my slides are available online, there is also an accompanying paper that will be available in the future.

You can find the rest here. There are view comments.

Committer Models of Unladen Swallow, PyPy, and Django

Posted February 25th, 2010. Tagged with pypy, python, unladen-swallow, django, open-source.

During this year's PyCon I became a committer on both PyPy and Unladen Swallow, in addition I've been a contributer to Django for quite a long time (as well as having commit privileges to my branch during the Google Summer of Code). One of the things I've observed is the very different models these projects have for granting commit privileges, and what the expectations and responsibilities are for committers.

Unladen Swallow

Unladen Swallow is a Google funded branch of CPython focused on speed. One of the things I've found is that the developers of this project carry over some of the development process from Google, specifically doing code review on every patch. All patches are posted to Rietveld, and reviewed, often by multiple people in the case of large patches, before being committed. Because there is a high level of review it is possible to grant commit privileges to people without requiring perfection in their patches, as long as they follow the review process the project is well insulated against a bad patch.

PyPy

PyPy is also an implementation of Python, however its development model is based largely around aggressive branching (I've never seen a project handle SVN's branching failures as well as PyPy) as well as sprints and pair programming. By branching aggressively PyPy avoids the overhead of reviewing every single patch, and instead only requires review when something is already believed to be "trunk-ready", further this model encourages experimentation (in the same way git's light weight branches do). PyPy's use of sprints and pair programming are two ways to avoid formal code reviews and instead approach code quality as more of a collaborative effort.

Django

Django is the project I've been involved with for the longest, and also the only one I don't have commit privileges on. Django is extremely conservative in giving out commit privileges (there about a dozen Django committers, and about 500 names in the AUTHORS file). Django's development model is based neither on branching (only changes as large in scope as multiple database support, or an admin UI refactor get their own branch) nor on code review (most commits are reviewed by no one besides the person who commits them). Django's committers maintain a level of autonomy that isn't seen in either of the other two projects. This fact comes from the period before Django 1.0 was released when Django's trunk was often used in production, and the need to keep it stable at all times, combined with the fact that Django has no paid developers who can guarantee time to do code review on patches. Therefore Django has maintained code quality by being extremely conservative in granting commit privileges and allowing developers with commit privileges to exercise their own judgment at all times.

Conclusion

Each of these projects uses different methods for maintaining code quality, and all seem to be successful in doing so. It's not clear whether there's any one model that's better than the others, or that any of these projects could work with another's model. Lastly, it's worth noting that all of these models are fairly orthogonal to the centralized VCS vs. DVCS debate which often surrounds such discussions.

You can find the rest here. There are view comments.

Why Open Source Works

Posted January 27th, 2010. Tagged with thinking, open-source.

Open source works for a lot of reasons, but there is one that stands out. Open source is basically an application of democracy to a programming community, in fact it's the most perfect implementation of democracy yet.

The central idea of a democracy is that the governing party draws it's authority from the willing consent of the governed. In most modern democracies this is more or less true, but not exactly. In the real world it's often very difficult for a citizen to withdraw their consent to be governed by the governing: they have to wait until elections to formally enact change, violent rebellion basically doesn't exist in first world countries, and there are many barriers (economic and otherwise) to just picking up and leaving. Because of the difficulty in withdrawing consent, modern democracies are not (and probably cannot) perfectly embody this spirit.

But open source communities can. In the open source world forking is often considered to be a nuclear option (ignoring the use of the term in the DVCS sense of forking for collaboration), a tactic designed to fragment a community. But it's also the perfect equalizer. Authority in open source communities is derived from the community's willingness to stay under the leadership of that authority, at any point any member of the community can decide to exit the community: to use a different piece of software, or to fork it, and continue development however they please. Because there are practical options to withdrawing consent to the governance open source is probably the most perfect application of democracy.

You can find the rest here. There are view comments.