Scaling Software Development

“Software engineering is programming integrated over time.”

This quote, from Titus Winters, expresses an important notion: that software engineering as a discipline is concerned not just with programming at a point in time, but with programming over an extended period. One of the things that tends to happen to a codebase as time passes is that it scales. We often think of scale in terms of the amount of traffic to be served or the volume of data to be processed, and these parameters do often increase as a codebase ages. But more importantly, the codebase itself scales.

As a codebase ages, the number of developers working on it, the number of lines of code, and the number of design requirements (of which traffic and data scale are examples) all tend to increase. As Titus suggests, and as I’ve seen in my experience, all of these factors change what’s required to develop software that meets its goals.

I would like to contribute an observation of my own about what the integration of programming over time means:

As the scale of a codebase increases, any properties of it which are not programmatically enforced will tend to regress.

A property of a program is any boolean statement or metric about the code or the resulting program. “Runs on a particular input without crashing” is a property, as is “adheres to our style guide”, as is “percent of lines of code covered by automated tests”.

Programmatically enforced means that there is a software-driven process which, on some cadence (e.g. on every pull request, on every merge, or on a fixed schedule), ensures that a given property of the codebase is upheld.
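As a minimal sketch of what that can look like (the choice of black as the formatter and a pull-request cadence are illustrative assumptions, not anything prescribed here), a CI check might be nothing more than a small script that fails whenever the property “the codebase is consistently formatted” does not hold:

```python
# Sketch of a programmatic enforcement step, e.g. run on every pull request.
# The choice of black as the formatter is illustrative, not prescriptive.
import subprocess
import sys


def enforce_formatting() -> int:
    # `black --check` exits non-zero if any file would be reformatted,
    # turning "adheres to our formatter" into a property a machine can
    # verify and a CI system can gate merges on.
    result = subprocess.run(["black", "--check", "."])
    return result.returncode


if __name__ == "__main__":
    sys.exit(enforce_formatting())
```

The specific tool matters far less than the fact that software, rather than human vigilance, checks the property on a regular cadence.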

I believe this observation is important because it connects our understanding of desirable properties of our code to scale. These are often discussed independently of one another, or a particular scale is assumed without being stated, and that leads to people talking past each other.

For example, the appeal of an automatic code formatter such as go fmt or black is not immediately obvious to a small development team. Its members may all have similar notions of code style, and their normal approach to development is capable of maintaining a consistently formatted codebase. However, as you grow to hundreds of developers and millions of lines of code, it becomes impossible to maintain a consistent style without automated tooling.

Another desirable property is correctness. For a small program, manual testing as you make changes is often practical. However, as a project grows, remembering to perform all the manual tests becomes impractical, and automated test suites (whether unit or integration) generally become necessary to protect against regressions and ensure code works correctly in the first place.
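As a sketch of what that looks like in practice (parse_config below is a hypothetical function, standing in for whatever behaviour a project needs to keep working), even a tiny automated test enforces the property “this behaviour still works” on every run, which manual testing cannot reliably do as a project grows:

```python
# A minimal regression test; parse_config is a hypothetical function used
# purely for illustration.
import unittest


def parse_config(text: str) -> dict:
    # Parse "key=value" lines into a dictionary.
    return dict(line.split("=", 1) for line in text.splitlines() if line)


class ParseConfigRegressionTest(unittest.TestCase):
    def test_known_input_still_parses(self):
        self.assertEqual(
            parse_config("host=example.com\nport=8080"),
            {"host": "example.com", "port": "8080"},
        )


if __name__ == "__main__":
    unittest.main()
```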

These two examples highlight a common mistake people make: believing that because attention to detail is enough to maintain a property of their code at a small scale, it is necessarily sufficient at a larger scale.

As the field of software engineering has developed, our notions of programmatic checks have advanced. At first, developers were free to format their code however they pleased. And then we introduced descriptive (but unenforced) style guides such as PEP8 to guide individuals. And then we began programmatically enforcing adherence with tools such as pep8. And then we introduced auto-formatters such as go fmt; more recent auto-formatters (e.g. black or rustfmt) are even more aggressive.

Similarly, the growth of static type checkers for dynamically typed languages (e.g. sorbet and mypy) demonstrates the desire to programmatically enforce type safety in order to facilitate scale.
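A small sketch of the kind of property such a checker enforces (the function here is made up for illustration): the call below is a bug that would only surface at runtime, but mypy rejects it before the program ever runs.

```python
# mypy flags the call below at check time, without running the code:
# the argument has type "int" where "str" is expected.
def greeting(name: str) -> str:
    return "Hello, " + name


print(greeting(42))  # rejected by mypy; at runtime it raises TypeError
```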

All of this brings me to a property that is near and dear to my heart: security, specifically memory safety. I’ve written many times about the problems I see with memory-unsafe languages (principally C and C++). And I often get pushback of the form “I write C and it’s fine, why aren’t the Windows/Chrome/Android/iOS/Firefox developers more disciplined?” I believe scale is the key to understanding this question.

Even a significant project such as OpenSSL is dwarfed by the scale of a web browser or operating system. Browsers and operating systems have more developers, more lines of code, and more competing design requirements (e.g. performance, new features, customizability, security). The differences in scale along these axes can be significant: browsers typically see more than 100 commits per day and comprise tens of millions of lines of code, while OpenSSL sees single-digit commits per day and comprises hundreds of thousands of lines of code. At the scale of a browser or operating system, discipline is empirically irrelevant; programmatic enforcement is the only thing capable of withstanding the deluge of new code and churn in existing code. These projects use automated code formatters, tests, performance measurements, and other tools to cope with the complexity this scale brings.

In the same way that code formatters and coverage measurement are introduced to projects to ensure those properties do not regress at scale, tooling needs to be introduced to ensure that security does not regress in C and C++ codebases. Examples of such tools are the sanitizers and fuzzers, and indeed the very large projects I mentioned above all make extensive use of these. And yet they still struggle to get a handle on their memory unsafety vulnerabilities. That is because none of these tools enforce the property they actually care about: that there are no security vulnerabilities due to memory unsafety. They instead enforce weaker properties, such as “no test case exhibits memory unsafety that ASAN is capable of catching”. The much stronger property of absolute memory safety cannot be enforced over arbitrary C or C++. Hence safer languages such as Rust and Swift, where memory safety can be enforced by disallowing unsafe.

My conclusion from this analysis is that I need to amend my critique: it is possible to write secure C/C++ code; it’s just not possible to do so at scale. However, as Titus’s quote reminds us, scale comes from time. All large codebases start out as small codebases. Therefore the only prudent thing to do is to avoid memory-unsafe languages entirely. And so long as you do have to maintain a C or C++ codebase, enforce as many safety properties programmatically as you can.