alex gaynor's blago-blog

Math Games

Posted August 4th, 2014. Tagged with math.

When I was in the third grade my friend Alexander and I invented a math game (whereby invented I mean, "I'm unable to precisely track down the origins of this game, so I'm assuming we created it entirely on our own"). In this post I'm going to describe how to play the game, and why I think it was really a really excellent tool for teaching several skills.

Rules

To start with, you need a deck of cards, we used some special math cards where the number of cards with each value were not evenly distributed, and the cards went 1-20 (possibly zero was included). Since most people don't have access to these, a deck of playing cards should work in a pinch, note that it's not really important to have a complete deck, or only one deck, and the faces don't matter, so it's fine to mix-and-match decks from that drawer where you accumulate partial card decks.

To start, you deal a row of 4 cards, face down in a line between the two players:

    Player one

C     C     C     C

    Player two

Then, on each side of the stacks in the middle, you deal a pile of 4 cards, face down. (Meaning each player has 4 stacks of 4 cards, each of which is associated with one of the cards in the middle):

    Player one

4C    4C    4C    4C

C     C     C     C

4C    4C    4C    4C

    Player two

To start the game, each player flips two of the cards in the middle over, simultaneously.

Then players look at each of their stacks, and for each one they must find a series of arithmetic operations which lead to the value of the corresponding card in the middle. For example, if my target was 7, and I was dealt 2, 4, 6, 8, I might find: (8 - 4) + (6 / 2). Each number must be used exactly once, and any binary operators are legal (at the time we only knew about addition, subtraction, multiplication, and division, but if you can find a use for logarithms or exponentiation be my guest).

Once a player has found a series for all 4 of their decks, they tell their opponent, and then they explain the series of operations they used for each target. If a player has forgotten one of the solutions, or made a mistake, both of them return to trying to find solutions.

If finding a solution seems impossible, a player can show their cards to their opponent and both of them can think really hard about if it's possible.

Why I like this

I believe this game was an important tool in developing my arithmetic skills at an early age. It teaches a few skills:

  • Arithmetic: Obviously you have to be able to do computations quickly and correctly in order to do well at this game.
  • Estimation: One of the tricks to quickly processing all the possible operations you could perform is to look at the scale of numbers, for example if I'm targetting a 2, and in my hand I have a 7 and an 8, I would never contemplate 7 * 8, because I know it will be difficult to get back down from 56.
  • Cooperation: While it is something of a competitive game, when a player is stuck and thinks a solution is impossible to find, the other player helps them. Further, at the end of a round, a player always explains how they found their solutions, so the other player can learn.
  • While not a skill, as such, this game teaches that math is fun! There's no doubt in my mind that this game is more fun that the memorization of multiplication tables many students are forced to do.

You can find the rest here. There are view comments.

There is a flash of light! Your PYTHON has evolved into ...

Posted July 4th, 2014. Tagged with python, open-source.

This year has been marked, for me, by many many discussions of Python versions. Finally, though, I've acquiesced, I've seen the light, and I'm doing what many have suggested. I'm taking the first steps: I'm changing my default Python.

Yes indeed, my global python is now something different:

$ python
Python 2.7.6 (32f35069a16d, Jun 06 2014, 20:12:47)
[PyPy 2.3.1 with GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>>

Yup, my default python is now PyPy.

Why?

Because I believe PyPy is the future, and I want to stamp out every single possible bug and annoyance a person might hit, and the best way to do that is to subject myself to them constantly. If startup performance is too slow, you know for damned sure I'll get pissed off and try to fix it.

I'm only one day into this, but thus far: I've found one bug in Mercurial's setup.py, and lots of my random scripts run faster. But this shouldn't be just me! In today's revolutionary spirit, I want to encourage you too to cast off the shackles of slow Python, and embrace the possibility of performance without compromises!

If you run into any issues at all: packages that won't install, things that are too slow, or take too much memory. You can email me personally. I'm committed to making this the most fantastic Python experience you could ever have.

Technical details

I changed by default Python, or OS X, using pyenv:

$ brew install pyenv
$ # Muck with my fish config
$ pyenv install pypy-2.3.1
$ pyenv global pypy-2.3.1
$ pip install a nice list of utilities I use

Tada.

You can find the rest here. There are view comments.

Quo Vadimus?

Posted May 26th, 2014. Tagged with open-source, community, python.

I've spent just about every single day for the last 6 months doing something with Python 3. Some days it was helping port a library, other days it was helping projects put together their porting strategies, and on others I've written prose on the subject. At this point, I am very very bored of talking about porting, and about the health of our ecosystem.

Most of all, I'm exhausted, particularly from arguing about whether or not the process is going well. So here's what I would like:

I would like to know what the success condition for Python 3 is. If we were writing a test case for this, when would it pass?

And let's do this with objective measures. Here are some ideas I have:

  • Percentage of package downloads from PyPI performed with Python 3 clients
  • Percentage of packages on PyPI which support Python 3
  • Percentage of Python builds on Travis CI which featured a Python 3 builder

I'd like a measurement, and I'd like a schedule: "At present x% of PyPI downloads use Python 3, in 3 months we'd like it to be at y%, in 12 months we'd like it to be at z%". Then we can have some way of judging whether we're on a successful path. And if we miss our goal, we'll know it's time to reevaluate this effort.

Quo vadimus?

You can find the rest here. There are view comments.

Service

Posted May 19th, 2014. Tagged with django, python, open-source, community.

If you've been around an Open Source community for any length of time, you've probably heard someone say, "We're all volunteers here". Often this is given as an explanation for why some feature hasn't been implemented, why a release has been delayed, and in general, why something hasn't happened.

I think when we say these things (and I've said them as much as anyone), often we're being dishonest. Almost always it's not a question of an absolute availability of resources, but rather how we prioritize among the many tasks we could complete. It can explain why we didn't have time to do things, but not why we did them poorly.

Volunteerism does not place us above criticism, nor should it absolve us when we err.

Beyond this however, many Open Source projects (including entirely volunteer driven ones) don't just make their codebases available to others, they actively solicit users, and make the claim that people can depend on this software.

That dependency can take many forms. It usually means an assumption that the software will still exist (and be maintained) tomorrow, that it will handle catastrophic bugs in a reasonable way, that it will be a stable base to build a platform or a business on, and that the software won't act unethically (such as by flagrantly violating expectations about privacy or integrity).

And yet, across a variety of these policy areas, such as security and backwards compatibility we often fail to properly consider the effects of our actions on our users, particularly in a context of "they have bet their businesses on this". Instead we continue to treat these projects as our hobby projects, as things we casually do on the side for fun.

Working on PyCA Cryptography, and security in general, has grealy influenced my thinking on these issues. The nature of cryptography means that when we make mistakes, we put our users' businesses, and potentially their customers' personal information at risk. This responsibility weighs heavily on me. It means we try to have policies that emphasize review, it means we utilize aggressive automated testing, it means we try to design APIs that prevent inadvertent mistakes which affect security, it means we try to write excellent documentation, and it means, should we have a security issue, we'll do everything in our power to protect our users. (I've previous written about what I think Open Source projects' security policies should look like).

Open Source projects of a certain size, scope, and importance need to take seriously the fact that we have an obligation to our users. Whether we are volunteers, or paid, we have a solemn responsibility to consider the impact of our decisions on our users. And too often in the past, we have failed, and acted negligently and recklessly with their trust.

Often folks in the Open Source community (again, myself included!) have asked why large corporations, who use our software, don't give back more. Why don't they employ developers to work on these projects? Why don't they donate money? Why don't they donate other resources (e.g. build servers)?

In truth, my salary is paid by every single user of Python and Django (though Rackspace graciously foots the bill). The software I write for these projects would be worth nothing if it weren't for the community around them, of which a large part is the companies which use them. This community enables me to have a job, to travel the world, and to meet so many people. So while companies, such as Google, don't pay a dime of my salary, I still gain a lot from their usage of Python.

Without our users, we would be nothing, and it's time we started acknowledging a simple truth: our projects exist in service of our users, and not the other way around.

You can find the rest here. There are view comments.

Best of PyCon 2014

Posted April 17th, 2014. Tagged with python, community.

This year was my 7th PyCon, I've been to every one since 2008. The most consistent trend in my attendance has been that over the years, I've gone to fewer and fewer talks, and spent more and more time volunteering. As a result, I can't tell you what the best talks to watch are (though I recommend watching absolutely anything that sounds interesting online). Nonetheless, I wanted to write down the two defining events at PyCon for me.

The first is the swag bag stuffing. This event occurs every year on the Thursday before the conference. Dozens of companies provide swag for PyCon to distribute to our attendees, and we need to get it into over 2,000 bags. This is one of the things that defines the Python community for me. By all rights, this should be terribly boring and monotonous work, but PyCon has turned it into an incredibly fun, and social event. Starting at 11AM, half a dozen of us unpacked box after box from our sponsors, and set the area up. At 3PM, over one hundred volunteers showed up to help us operate the human assembly line, and in less than two and a half hours, we'd filled the bags.

The second event I wanted to highlight was an open space session, on Composition. For over two hours, a few dozen people discussed the problems with inheritance, the need for explicit interface definition, what the most idiomatic ways to use decorators are, and other big picture software engineering topics. We talked about design mistakes we'd all made in our past, and discussed refactoring strategies to improve code.

These events are what make PyCon special for me: community, and technical excellence, in one place.

PS: You should totally watch my two talks. One is about pickle and the other is about performance.

You can find the rest here. There are view comments.

House and Twitter

Posted March 20th, 2014. Tagged with meta.

When I was younger, I started watching the TV show House M.D., and I really liked it. At some point my mom asked me if I was more sarcastic since I started watching the show. I said of course not, I've always been extremely sarcastic.

I was wrong. Watching House made being sarcastic cool.

Using Twitter makes being snarky and not putting thought into things cool. So I'm quitting Twitter. I'm already snarky and not-thoughtful enough, I don't need something to incentivize it for me.

I'll miss Twitter. Strange as it is to say, I've made many friends via Twitter, I've exposed myself to new perspectives, and I've laughed until it hurt. It's not worth it though.

If you still want to chat with me, or, for some unknown reason, hear what I have to say, you can join ##alex_gaynor on freenode, follow this blog, or email me at alex.gaynor@gmail.com.

You can find the rest here. There are view comments.

Why Crypto

Posted February 12th, 2014. Tagged with python, open-source.

People who follow me on twitter or github have probably noticed over the past six months or so: I've been talking about, and working on, cryptography a lot. Before this I had basically zero crypto experience. Not a lot of programmers know about cryptography, and many of us (myself included) are frankly a bit scared of it. So how did this happen?

At first it was simple: PyCrypto (probably the most used cryptographic library for Python) didn't work on PyPy, and I needed to perform some simple cryptographic operations on PyPy. Someone else had already started work on a cffi based cryptography library, so I started trying to help out. Unfortunately the maintainer had to stop working on it. At about the same time several other people (some with much more cryptography experience than I) expressed interest in the idea of a new cryptography library for Python, so we got started on it.

It's worth noting that at the same time this was happening, Edward Snowden's disclosures about the NSA's activities were also coming out. While this never directly motivated me to work on cryptography, I also don't think it's a coincidence.

Since then I've been in something of a frenzy, reading and learning everything I can about cryptography. And while originally my motivation was "a thing that works on PyPy", I've now grown considerably more bold:

Programmers are used to being able to pick up domain knowledge as we go. When I worked on a golf website, I learned about how people organized golf outings, when I worked at rdio I learned about music licensing, etc. Programmers will apply their trade to many different domains, so we're used to learning about these different domains with a combination of Google, asking folks for help, and looking at the result of our code and seeing if it looks right.

Unfortunately, this methodology leads us astray: Google for many cryptographic problems leaves you with a pile of wrong answers, very few of us have friends who are cryptography experts to ask for help, and one can't just look at the result of a cryptographic operation and see if it's secure. Security is a property much more subtle than we usually have to deal with:

>>> encrypt(b"a secret message")
b'n frperg zrffntr'

Is the encrypt operation secure? Who knows!

Correctness in this case is dictated by analyzing the algorithms at play, not by looking at the result. And most of us aren't trained by this. In fact we've been actively encouraged not to know how. Programmers are regularly told "don't do your own crypto" and "if you want to do any crypto, talk to a real cryptographer". This culture of ignorance about cryptography hasn't resulted in us all contacting cryptographers, it's resulted in us doing bad crypto:

Usually when we design APIs, our goal is to make it easy to do something. Cryptographic APIs seem to have been designed on the same principle. Unfortunately that something is almost never secure. In fact, with many libraries, the path of least resistance leads you to doing something that is extremely wrong.

So we set out to design a better library, with the following principles:

  • It should never be easier to do the wrong thing than it is to do the right thing.
  • You shouldn't need to be a cryptography expert to use it, our documentation should equip you to make the right decisions.
  • Things which are dangerous should be obviously dangerous, not subtly dangerous.
  • Put our users' safety and security above all else.

I'm very proud of our work so far. You can find our documentation online. We're not done. We have many more types of cryptographic operations left to expose, and more recipes left to write. But the work we've done so far has stayed true to our principles. Please let us know if our documentation ever fails to make something accessible to you.

You can find the rest here. There are view comments.

Why Travis CI is great for the Python community

Posted January 6th, 2014. Tagged with python, open-source.

In the unlikely event you're both reading my blog, and have not heard of Travis CI, it's a CI service which specifically targets open source projects. It integrates nicely with Github, and is generally a pleasure to work with.

I think it's particularly valuable for the Python community, because it makes it easy to test against a variety of Pythons, which maybe you don't have at your fingertips on your own machine, such as Python 3 or PyPy (Editor's note: Why aren't you using PyPy for all the things?).

Travis makes this drop dead simple, in your .travis.yml simply write:

language: python
python:
    - "2.6"
    - "2.7"
    - "3.2"
    - "3.3"
    - "pypy"

And you'll be whisked away into a land of magical cross-Python testing. Or, if like me you're a fan of tox, you can easily run with that:

python: 2.7
env:
    - TOX_ENV=py26
    - TOX_ENV=py27
    - TOX_ENV=py32
    - TOX_ENV=py33
    - TOX_ENV=pypy
    - TOX_ENV=docs
    - TOX_ENV=pep8

script:
    - tox -e $TOX_ENV

This approach makes it easy to include things like linting or checking your docs as well.

Travis is also pretty great because it offers you a workflow. I'm a big fan of code review, and the combination of Travis and Github's pull requests are awesome. For basically every project I work on now, I work in this fashion:

  • Create a branch, write some code, push.
  • Send a pull request.
  • Iteratively code review
  • Check for travis results
  • Merge

And it's fantastic.

Lastly, and perhaps most importantly, Travis CI consistently gets better, without me doing anything.

You can find the rest here. There are view comments.

PyPI Download Statistics

Posted January 3rd, 2014. Tagged with python.

For the past few weeks, I've been spending a bunch of time on a side project, which is to get better insight into who uses packages from PyPI. I don't mean what people, I mean what systems: how many users are on Windows, how many still use Python 2.5, do people install with pip or easy_install, questions like these; which come up all the time for open source projects.

Unfortunately until now there's been basically no way to get this data. So I sat down to solve this, and to do that I went straight to the source. PyPI! Downloads of packages are probably our best source of information about users of packages. So I set up a simple system: process log lines from the web server, parse any information I could out of the logs (user agents have tons of great stuff), and then insert it into a simple PostgreSQL database.

We don't yet have the system in production, but I've started playing with sample datasets, here's my current one:

pypi=> select count(*), min(download_time), max(download_time) from downloads;
  count  |         min         |         max
---------+---------------------+---------------------
 1981765 | 2014-01-02 14:46:42 | 2014-01-03 17:40:04
(1 row)

All of the downloads over the course of about 27 hours. There's a few caveats to the data: it only covers PyPI, packages installed with things like apt-get on Ubuntu/Debian aren't counted. Things like CI servers which frequently install the same package can "inflate" the download count, this isn't a way of directly measuring users. As with all data, knowing how to interpret it and ask good questions is at least as important as having the data.

Eventually I'm looking forwards to making this dataset available to the community; both as a way to ask one off queries ("What version of Python do people install my package with?") and as a whole dataset for running large analysis on ("How long does it take after a release before a new version of Django has widespread uptake?").

Here's a sample query:

pypi=> SELECT
pypi->     substring(python_version from 0 for 4),
pypi->     to_char(100 * COUNT(*)::numeric / (SELECT COUNT(*) FROM downloads), 'FM999.990') || '%' as percent_of_total_downloads
pypi-> FROM downloads
pypi-> GROUP BY
pypi->     substring(python_VERSION from 0 for 4)
pypi-> ORDER BY
pypi->     count(*) DESC;
 substring | percent_of_total_downloads
-----------+----------------------------
 2.7       | 75.533%
 2.6       | 15.960%
           | 5.840%
 3.3       | 2.079%
 3.2       | .350%
 2.5       | .115%
 1.1       | .054%
 2.4       | .052%
 3.4       | .016%
 3.1       | .001%
 2.1       | .000%
 2.0       | .000%
(12 rows)

Here's the schema to give you a sense of what data we have:

                                   Table "public.downloads"
          Column          |            Type             |              Modifiers
--------------------------+-----------------------------+-------------------------------------
 id                       | uuid                        | not null default uuid_generate_v4()
 package_name             | text                        | not null
 package_version          | text                        |
 distribution_type        | distribution_type           |
 python_type              | python_type                 |
 python_release           | text                        |
 python_version           | text                        |
 installer_type           | installer_type              |
 installer_version        | text                        |
 operating_system         | text                        |
 operating_system_version | text                        |
 download_time            | timestamp without time zone | not null
 raw_user_agent           | text                        |

Let your imagination run wild with the questions you can answer now that we have data!

You can find the rest here. There are view comments.

About Python 3

Posted December 30th, 2013. Tagged with python.

Python community, friends, fellow developers, we need to talk. On December 3rd, 2008 Python 3.0 was first released. At the time it was widely said that Python 3 adoption was going to be a long process, it was referred to as a five year process. We've just passed the five year mark.

At the time of Python 3's release, and for years afterwards I was very excited about it, evangelizing it, porting my projects to it, for the past year or two every new projects I've started has had Python 3 support from the get go.

Over the past six months or so, I've been reconsidering this position, and excitement has given way to despair.

For the first few years of the Python 3 migration, the common wisdom was that a few open source projects would need to migrate, and then the flood gates would open. In the Django world, that meant we needed a WSGI specification, we needed database drivers to migrate, and then we could migrate, and then our users could migrate.

By now, all that has happened, Django (and much of the app ecosystem) now supports Python 3, NumPy and the scientific ecosystem supports Python 3, several new releases of Python itself have been released, and users still aren't using it.

Looking at download statistics for the Python Package Index, we can see that Python 3 represents under 2% of package downloads. Worse still, almost no code is written for Python 3. As I said all of my new code supports Python 3, but I run it locally with Python 2, I test it locally with Python 2; Travis CI runs it under Python 3 for me; certainly none of my code is Python 3 only. At companies with large Python code bases I talk to no one is writing Python 3 code, and basically none of them are thinking about migrating their codebases to Python 3.

Since the time of the Python 3.1 it's been regularly said that the new features and additions the standard library would act as carrots to motivate people to upgrade. Don't get me wrong, Python 3.3 has some really cool stuff in it. But 99% of everybody can't actually use it, so when we tell them "that's better in Python 3", we're really telling them "Fuck You", because nothing is getting fixed for them.

Beyond all of this, it has a nasty pernicious effect on the development of Python itself: it means there's no feedback cycle. The fact that Python 3 is being used exclusively by very early adopters means that what little feedback happens on new features comes from users who may not be totally representative of the broader community. And as we get farther and farther in the 3.X series it gets worse and worse. Now we're building features on top of other features and at no level have they been subjected to actual wide usage.

Why aren't people using Python 3?

First, I think it's because of a lack of urgency. Many years ago, before I knew how to program, the decision to have Python 3 releases live in parallel to Python 2 releases was made. In retrospect this was a mistake, it resulted in a complete lack of urgency for the community to move, and the lack of urgency has given way to lethargy.

Second, I think there's been little uptake because Python 3 is fundamentally unexciting. It doesn't have the super big ticket items people want, such as removal of the GIL or better performance (for which many are using PyPy). Instead it has many new libraries (whose need is largely filled by pip install), and small cleanups which many experienced Python developers just avoid by habit at this point. Certainly nothing that would make one stop their development for any length of time to upgrade, not when Python 2 seems like it's going to be here for a while.

So where does this leave us?

Not a happy place. First and foremost I think a lot of us need to be more realistic about the state of Python 3. Particularly the fact that for the last few years, for the average developer Python, the language, has not gotten better.

The divergent paths of Python 2 and Python 3 have been bad for our community. We need to bring them back together.

Here's an idea: let's release a Python 2.8 which backports every new feature from Python 3. It will also deprecate anything which can't be changed in a backwards compatible fashion, for example str + unicode will emit a warning, as will any file which doesn't have from __future__ import unicode_literals. Users need to be able to follow a continuous process for their upgrades, Python 3 broke it, let's fix it.

That's my only idea. We need more ideas. We need to bridge this gap, because with every Python 3 release, it grows wider.

Thanks to Maciej Fijalkowski and several others for their reviews, it goes without saying that all remaining errors are my own.

You can find the rest here. There are view comments.

Gender neutral language - An FAQ

Posted November 30th, 2013. Tagged with ethics, diversity, open-source, community.

I'd like to refer to a hypothetical person in my documentation

Try something like this:

When a user visits the website, they will be assigned a session ID, and it will be transmitted to them in the HTTP response and stored in their browser.

But not like this!

When a user visits the website, he will be assigned a session ID, and it will be transmitted to him in the HTTP response and stored in his browser.

Why?

Using gendered pronouns signals to the audience your assumptions about who they are, and very often lets them know that they don't belong. Since that's not your intent, better to just be gender neutral.

And if you don't believe me, some folks did some science (other studies have consistently reproduced this result).

Can I just go 50/50 on male and female pronouns?

It's a nice idea, unfortunately it doesn't work. Your users don't read your documentation cover to cover, so they won't be able to see your good intentions. Instead they'll be linked somewhere in the middle, see your gendered language, and feel excluded.

In addition, not everyone identifies by male or female pronouns. Play it safe, just be gender neutral.

Using the plural pronouns isn't grammatically correct!

I've been assured by people far more knowlegable than I that it's ok, even Shakespeare did it. Personally, I'm comforted by the knowledge that even if I'm wrong about the grammar, I won't have made anyone feel excluded.

Someone sent a pull request to my project changing the languages!

So merge it! If you've got some process that a contributors needs to go through (such as a CLA), let them know. They're just trying to make your community better and bigger!

They said I was being hostile!

I'm sorry, but you were. Your choice of language has an impact on people.

I wasn't trying to be!

That's ok, hostility isn't about intent, your words had an impact whether you meant it or not.

Maybe you didn't know, you're not a native English speaker, your 11th grade English teacher beat you over the head with some bad advice. That's ok, it only takes a moment to fix it, and then you're letting everyone know it's easy to fix!

Aren't there bigger issues we should be dealing with?

There are so many giant issues we face. This one takes 15 seconds to fix, has no downsides, and we can all be a part of making it better. If we can't do this, how could we ever tackle the other challenges?

Has anyone ever asked these questions?

You have no idea.

Some of these aren't questions!

That's ok.

You can find the rest here. There are view comments.

Affirmative action

Posted November 27th, 2013. Tagged with community, ethics, diversity.

Whenever the topic of affirmative action comes up, you can be sure someone will ask the question: "How would you feel if you found out that you got your job, or got into college, because of your race?"

It's funny, no one ever asks: "How would you feel if you got your job, or got into college, because you were systemically advantaged from the moment you were born?"

Interesting.

You can find the rest here. There are view comments.

Security process for Open Source Projects

Posted October 19th, 2013. Tagged with django, python, open-source, community.

This post is intended to describe how open source projects should handle security vulnerabilities. This process is largely inspired by my involvement in the Django project, whose process is in turn largely drawn from the PostgreSQL project's process. For every recommendation I make I'll try to explain why I've made it, and how it serves to protect you and your users. This is largely tailored at large, high impact, projects, but you should able to apply it to any of your projects.

Why do you care?

Security vulnerabilities put your users, and often, in turn, their users at risk. As an author and distributor of software, you have a responsibility to your users to handle security releases in a way most likely to help them avoid being exploited.

Finding out you have a vulnerability

The first thing you need to do is make sure people can report security issues to you in a responsible way. This starts with having a page in your documentation (or on your website) which clearly describes an email address people can report security issues to. It should also include a PGP key fingerprint which reporters can use to encrypt their reports (this ensures that if the email goes to the wrong recipient, that they will be unable to read it).

You also need to describe what happens when someone emails that address. It should look something like this:

  1. You will respond promptly to any reports to that address, this means within 48 hours. This response should confirm that you received the issue, and ideally whether you've been able to verify the issue or more information is needed.
  2. Assuming you're able to reproduce the issue, now you need to figure out the fix. This is the part with a computer and programming.
  3. You should keep in regular contact with the reporter to update them on the status of the issue if it's taking time to resolve for any reason.
  4. Now you need to inform the reporter of your fix and the timeline (more on this later).

Timeline of events

From the moment you get the initial report, you're on the clock. Your goal is to have a new release issued within 2-weeks of getting the report email. Absolutely nothing that occurs until the final step is public. Here are the things that need to happen:

  1. Develop the fix and let the reporter know.
  2. You need to obtain a CVE (Common Vulnerabilities and Exposures) number. This is a standardized number which identifies vulnerabilities in packages. There's a section below on how this works.
  3. If you have downstream packagers (such as Linux distributions) you need to reach out to their security contact and let them know about the issue, all the major distros have contact processes for this. (Usually you want to give them a week of lead time).
  4. If you have large, high visibility, users you probably want a process for pre-notifying them. I'm not going to go into this, but you can read about how Django handles this in our documentation.
  5. You issue a release, and publicize the heck out of it.

Obtaining a CVE

In short, follow these instructions from Red Hat.

What goes in the release announcement

Your release announcement needs to have several things:

  1. A precise and complete description of the issue.
  2. The CVE number
  3. Actual releases using whatever channel is appropriate for your project (e.g. PyPI, RubyGems, CPAN, etc.)
  4. Raw patches against all support releases (these are in addition to the release, some of your users will have modified the software, and they need to be able to apply the patches easily too).
  5. Credit to the reporter who discovered the issue.

Why complete disclosure?

I've recommended that you completely disclose what the issue was. Why is that? A lot of people's first instinct is to want to keep that information secret, to give your users time to upgrade before the bad guys figure it out and start exploiting it.

Unfortunately it doesn't work like that in the real world. In practice, not disclosing gives more power to attackers and hurts your users. Dedicated attackers will look at your release and the diff and figure out what the exploit is, but your average users won't be able to. Even embedding the fix into a larger release with many other things doesn't mask this information.

In the case of yesterday's Node.JS release, which did not practice complete disclosure, and did put the fix in a larger patch, this did not prevent interested individuals from finding out the attack, it took me about five minutes to do so, and any serious individual could have done it much faster.

The first step for users in responding to a security release in something they use is to assess exposure and impact. Exposure means "Am I affected and how?", impact means "What is the result of being affected?". Denying users a complete description of the issue strips them of the ability to answer these questions.

What happens if there's a zero-day?

A zero-day is when an exploit is publicly available before a project has any chance to reply to it. Sometimes this happens maliciously (e.g. a black-hat starts using the exploit against your users) and sometimes it is accidentally (e.g. a user reports a security issue to your mailing list, instead of the security contact). Either way, when this happens, everything goes to hell in a handbasket.

When a zero-day happens basically everything happens in 16x fast-forward. You need to immediately begin preparing a patch and issuing a release. You should be aiming to issue a release on the same day as the issue is made public.

Unfortunately there's no secret to managing zero-days. They're quite simply a race between people who might exploit the issue, and you to issue a release and inform your users.

Conclusion

Your responsibility as a package author or maintainer is to protect your users. The name of the game is keeping your users informed and able to judge their own security, and making sure they have that information before the bad guys do.

You can find the rest here. There are view comments.

Meritocracy

Posted October 12th, 2013. Tagged with politics, community, ethics, django, open-source.

Let's start with a definition, a meritocracy is a group where leadership or authority is derived from merit (merit being skills or ability), and particularly objective merit. I think adding the word objective is important, but not often explicitly stated.

A lot of people like to say open source is a meritocracy, the people who are the top of projects are there because they have the most merit. I'd like to examine this idea. What if I told you the United States Congress was a meritocracy? You might say "gee, how could that be, they're really terrible at their jobs, the government isn't even operational!?!". To which I might respond "that's evidence that they aren't good at their jobs, it doesn't prove that they aren't the best of the available candidates". You'd probably tell me that "surely someone, somewhere, is better qualified to do their jobs", and I'd say "we have an open, democratic process, if there was someone better, they'd run for office and get elected".

Did you see what I did there? It was subtle, a lot of people miss it. I begged the question. Begging the question is the act of responding to a hypothesis with a conclusion that's premised on exactly the question the hypothesis asks.

So what if you told me that Open Source was meritocracy? Projects gain recognition because they're the best, people become maintainers of libraries because they're the best.

And those of us involved in open source love this explanation, why wouldn't we? This explanation says that the reason I'm a core developer of Django and PyPy because I'm so gosh-darned awesome. And who doesn't like to think they're awesome? And if I can have a philosophy that leads to myself being awesome, all the better!

Unfortunately, it's not a valid conclusion. The problem with stating that a group is meritocratic is that it's not a falsifiable hypothesis.

We don't have a definition of objective merit. As a result of which there's no piece of evidence that I can show you to prove that a group isn't in fact meritocratic. And a central tenant of any sort of rigorous inquisitive process is that we need to be able to construct a formal opposing argument. I can test whether a society is democratic, do the people vote, is the result of the vote respected? I can't test if a society is meritocratic.

It's unhealthy when we consider or groups, or cultures, or our societies as being meritocratic. It makes us ignore questions about who our leaders are, how they got there who isn't represented. The best we can say is that maybe our organizations are (perceptions of subjective merit)-ocracies, which is profoundly different from what we mean when we say meritocracy.

I'd like to encourage groups that self-identify as being meritocratic (such as The Gnome Foundation, The Apache Software Foundation, Mozilla, The Document Foundation, and The Django Software Foundation) to reconsider this. Aspiring to meritocracy is a reasonable, it makes sense to want for the people who are best capable of doing so to lead us, but it's not something we can ever say we've achieved.

You can find the rest here. There are view comments.

Thoughts on Lavabit

Posted October 2nd, 2013. Tagged with politics, security, ethics.

If you haven't already, you should start by reading Wired's article on this.

I am not a lawyer. That said, I want to walk through my take on each stage of this.

The government served Lavabit with an order requiring them to supply metadata about every email, as well as mailbox accesses, for a specific user. Because this was "metadata" only the government was not required to supply probable cause.

First, it should be noted that metadata isn't a thing. There's not a definition, it has no meaning. There's simply data.

Lavabit refused to comply, whereupon the government filed a motion requiring them to comply, which a US magistrate so ordered.

And here's where things go wrong. The magistrate erred in ordering compliance. While an argument could be made (note: I'm not making this argument) that in general, certain metadata does not have an expectation of privacy, Lavabit operates a specialized service. Immediately upon receipt of mail, it's encrypted with a user's public key. After that it's technically impossible for the service to read the plaintext of a user's email. This relationship creates a strong expectation of privacy, and the Fourth Amendment very explicitly requires a warrant supported by probably cause at this point.

But let's ignore this first order. Lavabit has, in past, complied with lawful search warrants, and there's no reason to believe they would not have been able to comply with a lawfully constructed one here.

Following this the FBI obtained a warrant requiring that Lavabit turn over their SSL private key. The application for, and issue of, this warrant unambiguously violated Lavabit's constitutional protection. The Fourth Amendment requires that a warrant describe specifically where is to be searched, and what they're looking for.

Access to Lavabit's private key would allow someone with the raw internet traffic (which presumably the FBI, had access to) to decrypt and read any user's emails before they reached Lavabit's servers. Simply put, this was a warrant issued in flagrant violation of the United State Constitution.

The fact that Lavabit refused to cooperate with the government's original order in no way gave them the right to apply (or be granted) the follow up order. Failure to comply with a lawfully issued warrant can result in fines, or even jail time, but it does not grant the government extra-legal authority.

The entirety of this case, but particularly the government's second request, demonstrate a travesty of immense proportions. The assumptions I grew up with about my legal protections as an American are rapidly being shown to be illusory. Lavabit's founder is raising money to support his legal defense, I've donated and I hope you will too.

You can find the rest here. There are view comments.