Standard Libraries and their Discontents
Standard libraries are among the most debated topics in programming language design. They are at once the most widely used APIs in any programming ecosystem and the most criticized. This post explores what makes standard libraries what they are. It is entirely non-normative; my goal is to describe what standard libraries are, not to specify what they should be.
A standard library is the set of libraries/APIs that are available to users of a programming language without needing to take any additional action. Which APIs are available is specified in the same place as the language’s own semantics. With such a simple definition, how can they be a source of such debate?
Well, let’s start by looking at what people like and dislike about standard libraries. Again, this is purely descriptive; I’m not passing judgement on whether people are right or wrong to like these things.
Likes
Perhaps the single most liked thing about standard libraries is that they’re easy to use. They’re simply there (“batteries included”). They don’t require configuration or installing packages or anything else. This is particularly true in programming environments without universal package management, or where external dependencies are otherwise painful, because such environments magnify the difference in ease of use between standard libraries and third-party dependencies.
Beyond merely being easy to use on a technical level, standard libraries are cognitively easier to use. Whether to use a standard library’s Date type is treated as not being a decision at all, while opting to use a third-party library means choosing between multiple competing options. A standard library isn’t just standard, it’s a default.
Standard libraries also provide standard vocabulary types that enable interoperability between other libraries. For example, a standard DateTime type can enable interoperability between a third-party ASN.1 parsing library that parses DateTime values and a third-party MySQL library that knows how to store DateTime values in the database. Similarly, a standard library Reader interface can allow a third-party .zip library to produce readers that can be consumed by a third-party CSV parser. While ubiquitous third-party libraries can work similarly, standard libraries more easily facilitate this kind of interoperability by virtue of being, well, standard.
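As a rough illustration of the reader example, here is a minimal Python sketch. It assumes Python's file-like object protocol plays the role of the Reader interface, and it uses the standard zipfile and csv modules as stand-ins for the third-party .zip library and CSV parser; the archive contents and the function names build_archive and read_rows are invented for the example.

```python
import csv
import io
import zipfile


def build_archive():
    """Create an in-memory .zip holding one small CSV file (sample data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("people.csv", "name,year\nAda,1815\nGrace,1906\n")
    return buffer.getvalue()


def read_rows(archive_bytes, member):
    """Parse a CSV member directly from the archive, with no temp files."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # ZipFile.open returns a binary file-like object: the shared
        # "reader" vocabulary type that both sides agree on.
        raw = archive.open(member)
        # TextIOWrapper adapts bytes to text; csv.reader only needs
        # something that yields lines, so it never learns about zip files.
        with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            return list(csv.reader(text))


rows = read_rows(build_archive(), "people.csv")
print(rows)
```

Neither side imports the other; the only thing they share is the file-like protocol, which is the point of a vocabulary type.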
Another thing people like about standard libraries is that they inherit policies, such as backwards compatibility and platform support, from the language itself. And they’re presumed to be high quality (or, at least as high quality as the language), to manage security issues responsibly, and to be well tested. Some of these points come from people’s observations and experiences using standard libraries; others are largely assumptions people make. All reflect things that people desire in libraries, and that they generally believe standard libraries are guaranteed to deliver.
As a result of these observations and assumptions, standard libraries are always available in policy-constrained environments (e.g., those with third-party security review policies or those that are air-gapped). In contrast, in these types of environments third-party libraries require reviews which may have extended timelines, if they’re allowed at all. This is yet another way in which a standard library is simply easier to use.
The calculus of how inclusion in a standard library affects supply chain risk (independent of compliance) is slightly more complicated. All other things being equal, a library being maintained by the existing set of standard library maintainers reduces supply chain risk as compared to a third-party library with an independent maintainer. But does a third-party package whose maintainers are also maintainers of the standard library add supply chain risk? Probably not (as long as downstream consumers have some way to notice if the maintainership changes). Similarly, if migrating a third-party library into the standard library requires adding new maintainers to the standard library, how does that impact risk? In general, we tend to think of standard libraries as being so critical and so closely watched that they have negligible supply chain risk, but at the margin it must be the case that expanding the pool of maintainers does introduce some forms of supply chain risk (while possibly also mitigating other forms of risk, by adding more people on the lookout for bad actors).
On an entirely different note, because standard libraries are developed and distributed with a programming language implementation, they may rely on implementation details. This allows them to have APIs or internals that would not be possible to safely reproduce in a third-party library, giving users of a standard library package an advantage that they can’t obtain elsewhere (performance is a common example of the benefits of being able to rely on implementation details). For similar reasons, a desired improvement to a standard library can motivate changes to the language itself to accommodate the functionality, in a way that is more difficult for a third-party library to justify.
Lastly, as a result of all of this, standard libraries are extremely widely used. Therefore, they justify significant investment in performance, polish, documentation, and other axes that benefit from elbow grease. Which is not to say they always receive this justified investment, but users generally carry an assumption that this will be true at least some of the time.
Dislikes
Perhaps the biggest criticism of standard libraries is that they tend to atrophy (“the standard library is where modules go to die”). This is true in several ways. First, because they have strong backwards compatibility policies but limited ability to engage in Semantic Versioning-like processes (see: Python 3), they tend to carry poor APIs for extended periods (often not attempting to deprecate and evolve old APIs at all). They’re also less convenient to develop and contribute to than stand-alone libraries: they’re developed in what are effectively large monorepos (but without a dedicated engineering team to support them) with long CI cycle times and frequently bespoke build processes. Lastly, as a language grows older it becomes harder to add to them, leading to a sense that standard libraries are inconsistent over time: you can tell from looking at a library in which era it was added to the standard library.
Another complaint about standard libraries is that they require upgrading the entire programming language to take advantage of new additions (or bug fixes). An extremely common pattern is for users to get their programming language from an operating system vendor while third-party libraries are installed with a non-system package manager. The effect of this is that standard libraries have upgrade cycles measured in years while third-party libraries can have much shorter upgrade cycles. This dynamic even extends to open source library developers, who have considerable flexibility in the versions of third-party dependencies they rely on, but face more pushback if they attempt to raise the minimum supported programming language version too aggressively.
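One visible consequence is the version-gated fallback that library authors write when they want a newer standard library addition but must still support older interpreters. The sketch below is only an illustration: it uses Python's functools.cache, which was added in Python 3.9, and falls back to the older functools.lru_cache on earlier versions. The fib function is a made-up example workload.

```python
import functools
import sys

# functools.cache arrived in Python 3.9. A library that still supports
# older interpreters cannot use it unconditionally, because users only
# get it by upgrading the whole language.
if sys.version_info >= (3, 9):
    cache = functools.cache
else:
    # Equivalent unbounded memoization, available on older versions.
    cache = functools.lru_cache(maxsize=None)


@cache
def fib(n):
    """Memoized Fibonacci, used here only to exercise the decorator."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))
```

With a third-party dependency, the same author could simply require a newer release of that one package instead of carrying both code paths.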
Finally, there is a concern that distribution in a standard library is effectively an unfair method of competition for a library. By being in a standard library, a library can compete with third-party alternatives on the basis of ease of distribution, rather than on quality measures such as API usability, performance, reliability, or documentation. In effect, a library in a standard library can be worse than third-party alternatives while remaining the default. In extreme cases, a third-party library may be able to out-compete a standard library, but in the common case ease of distribution will win out, to the point of discouraging people from even attempting to compete with something in the standard library.
Conclusion
One thing that should be clear is that there’s incredible symmetry between what people like and dislike about standard libraries. Backwards compatibility is great because it means you can upgrade without fear, and it sucks because you live with every mistake forever. Being pre-installed and easy to use is incredibly convenient and also displaces investment in third-party libraries, reducing competitive pressure to build the best thing possible. How each individual weighs the two sides of the coin is usually a proxy for their beliefs and preferences about whether standard libraries should be larger or smaller.
Ideally, we’d be able to find ways to capture (some of) these benefits without the downsides. It’s my hope that this post will be a useful resource in catalyzing people to think about ways to accomplish this.