What's in a version number? (Or: Musings on backwards compatibility)

Software packages have version numbers. Thinking about them from scratch, the first thing we might want from version numbers is to know whether two pieces of software are the same. We could accomplish this by making version numbers an opaque value, like a UUID. Of course, a UUID isn’t a very useful version number, because in practice we also want to order versions, to know which is newer. We could make our version numbers just integers that increment: 1, 2, 3, 4…. But this doesn’t let us express things like “version 2 but with one of the security patches from 3 backported”, so perhaps we need some decimals: 2.5, since it’s halfway between releases 2 and 3. Before too long going down this path, we’ll find we’ve invented the modern version number, which often looks like major.minor.patch, perhaps with optional modifiers like -beta. Having structured our version numbers like this, we can now assign semantics to what it means to increment each of the major, minor, and patch values. Thus, Semantic Versioning (SemVer) is born.
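To make the ordering requirement concrete, here is a minimal Python sketch. The names Version and parse_version are made up for illustration, and the parser assumes a bare major.minor.patch string with no modifiers like -beta.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A structured version number (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'major.minor.patch'. Modifiers such as -beta are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


# Tuples compare field by field, so "which is newer" falls out for free.
assert parse_version("2.5.0") > parse_version("2.0.0")
assert parse_version("2.5.0") < parse_version("3.0.0")

# Comparing the raw strings would get this wrong: as text, "1.10.0" < "1.9.0".
assert parse_version("1.10.0") > parse_version("1.9.0")
```

The structure is what buys us ordering; an opaque value like a UUID can only ever answer "same or different".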

The most important semantic that SemVer expresses is that the major version number is incremented “when you make incompatible API changes”. SemVer doesn’t precisely define an incompatible change, except to make clear it’s only applicable to the public API of a package. The goal of all this is that users should be able to easily and safely update package versions, confident that things won’t break as long as the major version number stays the same. In order to accomplish this, it’s critical that consumers and package maintainers have compatible views of what constitutes a backwards incompatible change.
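The promise users take away from this can be sketched in a few lines of Python. The function name semver_says_safe is hypothetical, and the sketch ignores pre-release modifiers and the special case of 0.y.z versions, where SemVer allows anything to change.

```python
def _as_tuple(version: str) -> tuple:
    """Turn 'major.minor.patch' into a tuple of ints (no modifiers handled)."""
    return tuple(int(part) for part in version.split("."))


def semver_says_safe(current: str, candidate: str) -> bool:
    """Return True if SemVer, taken at its word, promises the upgrade is safe.

    "Safe" here means: the candidate is not older than what we have, and the
    major version number has not changed.
    """
    cur, new = _as_tuple(current), _as_tuple(candidate)
    return new[0] == cur[0] and new >= cur


assert semver_says_safe("1.1.0", "1.2.0")        # minor bump: promised safe
assert not semver_says_safe("1.9.3", "2.0.0")    # major bump: may break
assert not semver_says_safe("1.2.0", "1.1.0")    # a downgrade, not an update
```

Note that this check can only encode the maintainer's promise. It knows nothing about what a given user actually relies on.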

Unfortunately, we run squarely into Hyrum’s Law, which tells us that, “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.” Phrased another way, users will rely on things that are not defined parts of your public API, and which you can therefore break without incrementing the major version. If this is true (it is), it means users cannot update packages within a major version number with total confidence that things won’t break, even if the maintainer fully intends to follow SemVer.

Consider a package with a parse_foo function, which takes some bytes and returns a Foo. It is documented to: “Parse a Foo from its binary representation. Raises an exception if it can’t be parsed.” In version 1.2 of the package, the maintainer fixes a bug that was allowing some malformed Foos to be parsed. Unfortunately, one of its users has a bunch of binary Foos in their database that are malformed, and now when they upgrade they start getting exceptions all over production! This change was clearly consistent with the original documentation, and SemVer permits the maintainer to make it without incrementing the major version number. Nonetheless, the user experienced a significant breakage (and one that may have been difficult to find with unit tests, to boot!). Users are likely to complain about such breakages, regardless of the fact that the maintainer complied with SemVer to the letter.
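Here is one way the parse_foo story could play out, as a hedged Python sketch. The wire format (a one-byte length prefix followed by the payload), the specific bug, and the function names parse_foo_v1_1 and parse_foo_v1_2 are all invented for illustration; only the docstring comes from the example above.

```python
def parse_foo_v1_1(data: bytes) -> bytes:
    """Parse a Foo from its binary representation.

    Raises an exception if it can't be parsed.
    """
    if not data:
        raise ValueError("empty input")
    length = data[0]
    # Bug: never checks that the payload really contains `length` bytes.
    return data[1:1 + length]


def parse_foo_v1_2(data: bytes) -> bytes:
    """Same documented contract as 1.1, with the validation bug fixed."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    if len(payload) != length:
        raise ValueError("length prefix does not match payload")
    return payload


well_formed = b"\x03abc"
malformed = b"\x05abc"  # claims 5 payload bytes but carries only 3

# Well-formed input behaves identically across the two versions.
assert parse_foo_v1_1(well_formed) == parse_foo_v1_2(well_formed) == b"abc"

# The malformed Foo sitting in a user's database parsed "fine" under 1.1...
assert parse_foo_v1_1(malformed) == b"abc"

# ...and raises under 1.2, exactly as the documentation always said it could.
try:
    parse_foo_v1_2(malformed)
except ValueError:
    pass
else:
    raise AssertionError("expected 1.2 to reject the malformed Foo")
```

Both versions satisfy the documented contract, yet the observable behavior changed, which is all Hyrum’s Law needs.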

We also experience problems from the opposite side. Consider a package with a parse_bar function, which takes some bytes and returns a Bar. It is documented to: “Parse a Bar from its binary representation using the Bar 2.4 specification. Raises an exception if it can’t be parsed.” In version 1.2 of this package, the maintainer upgrades to the Bar 3.0 specification, which is stricter about what can be parsed, rejecting some Bars which were allowed in 2.4. This change is clearly a violation of backwards compatibility, and requires a major version number bump. However, if the maintainer knows that it’s very unlikely for someone to be impacted by the changes in the Bar specification (perhaps they affect only very obscure parts of the spec that the maintainer believes no one uses), the maintainer may choose not to bump the major version number and see if they can get away with it. And very often they’ll be correct: if it’s an obscure feature that very few people rely on, it’s entirely possible to violate backwards compatibility and never hear a complaint!

The combination of these two generates a vicious feedback cycle: it’s possible to strictly adhere to SemVer while getting complaints from users that things are breaking, and to ignore SemVer by breaking backwards compatibility without a major version bump and receive no complaints about it. Further, there are many properties of a codebase we care about, such as performance and security, which are rarely directly described in public APIs but which could substantially impact users. Consider a function documented to: “Take a URL and perform an HTTPS request securely, raising an error if it cannot be completed”. Between versions 1.1 and 1.2 of the package, the maintainer is considering changing which TLS ciphersuites are supported. One user argues, “this will break connections to servers which only support older TLS ciphersuites, therefore it is backwards incompatible”. A second user argues, “these TLS ciphersuites are known to be insecure; if we do not remove them then we are breaking backwards compatibility with respect to our claim that requests are made securely”. It seems the maintainer can neither release 1.2 with this change nor release it without this change. Therefore they increment the major version number: 2.0 it is.

In fact, a reasonable person might conclude that fully contemplating whether each change is backwards incompatible is simply too much work. Perhaps they should always increment the major version number, similar to how Firefox’s or Chrome’s versioning works. This is compatible with SemVer as written ("[Major version] MAY also include minor and patch level changes."), but not with people’s intuition of SemVer. It may lead users to stop updating packages, even when it would be safe to do so. That in turn may generate expectations that package authors maintain a supported branch, with security patches backported, for every major version, despite the burden this places on them, lest users who haven’t updated be caught unable to fix security issues.

Frequent major version increments also have a more insidious impact: they encourage more backwards incompatible changes. One of the constraints on how many backwards incompatible changes get made is the desire to avoid a major version increment. If maintainers are incrementing the major version all the time anyway, that constraint disappears, and there appears to be no harm in making lots of backwards incompatible changes. But of course there is harm, as users will find it more difficult to adopt new versions if they contain many backwards incompatibilities.

In view of all of these competing desiderata, is there a better way to think about version numbers, and the semantics they express, than simply “backwards-incompatible or not”? I believe we can find an answer in the SRE concept of error budgets. Originating within Google’s then-nascent SRE team, the idea is that absolutely zero downtime or production errors is unrealistic, and that pursuing it is likely to be incredibly costly with respect to competing desires, such as shipping product features quickly. Instead, the correct formulation is to decide how many errors you are willing to tolerate (e.g. 0.1% of requests may error per month), and then the team can balance ways to achieve that: if you’re under budget, you can ship features more aggressively; if you’re over budget, you have to slow down and invest in testing and infrastructure. We need the analogous concept of a backwards-incompatibility budget.
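The arithmetic behind an error budget is simple enough to sketch in Python. The function names and the request counts below are made up for illustration; only the 0.1% figure comes from the example above.

```python
def remaining_error_budget(total_requests: int,
                           failed_requests: int,
                           allowed_error_rate: float = 0.001) -> float:
    """Return how many more failures the budget tolerates this period.

    A negative result means the budget is already overspent.
    """
    allowed_failures = total_requests * allowed_error_rate
    return allowed_failures - failed_requests


def can_ship_aggressively(total_requests: int, failed_requests: int) -> bool:
    """Under budget: ship features. Over budget: slow down and invest."""
    return remaining_error_budget(total_requests, failed_requests) > 0


# Hypothetical month: one million requests, so about 1,000 failures allowed.
assert can_ship_aggressively(1_000_000, 400)        # well under budget
assert not can_ship_aggressively(1_000_000, 2_500)  # budget overspent
```

The point is that the decision becomes a comparison against an agreed number rather than an argument about whether any failure at all is acceptable.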

Pursuing absolutely no backwards incompatible changes is similarly unrealistic, for all the reasons discussed above. Instead, packages should aspire to articulate what their goals for stability are, and find ways to pursue those. For example, if we have a hypothesis that a change we’d like to make is technically backwards incompatible, but in practice will impact very few people, we should look for ways to quantify and confirm or disprove this hypothesis. Can we run the tests of every package that depends on us with this change and see if their tests pass? Thinking of backwards compatibility as something we have a scarce amount of, and which we need to spend wisely, helps us make better decisions. It helps us focus on the important substantive question of how valuable a backwards incompatible change is and how many users are likely to be impacted by it, and away from the procedural question of whether it is permitted by SemVer without a major version increment. And this aligns with the reality that users are upset by things that break for them, and not upset when they don’t, regardless of whether the changes are consistent with SemVer.
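As a rough sketch of what quantifying that hypothesis might look like in Python: suppose we have already run each dependent package’s test suite against the proposed change and recorded whether it passed. The function names, the package names, and the 5% budget below are all hypothetical, and how the test results are gathered is left out entirely.

```python
from typing import Mapping


def breakage_rate(downstream_results: Mapping[str, bool]) -> float:
    """Fraction of dependent packages whose tests fail with our change.

    `downstream_results` maps a package name to True if its test suite
    passed against the proposed change, and False otherwise.
    """
    if not downstream_results:
        raise ValueError("no downstream results to measure")
    broken = sum(1 for passed in downstream_results.values() if not passed)
    return broken / len(downstream_results)


def within_incompatibility_budget(downstream_results: Mapping[str, bool],
                                  max_breakage_rate: float) -> bool:
    """Compare measured breakage against the budget we chose to allow."""
    return breakage_rate(downstream_results) <= max_breakage_rate


# Made-up results for four dependents; one of them breaks.
results = {"alpha": True, "beta": True, "gamma": False, "delta": True}

assert breakage_rate(results) == 0.25
# With a hypothetical 5% budget, this change costs more than we can afford.
assert not within_incompatibility_budget(results, max_breakage_rate=0.05)
```

A passing test suite is only a proxy, of course: the parse_foo breakage above might not have shown up in anyone’s unit tests. But a measured rate is a better basis for the decision than a guess.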