Notes on coreutils in Rust

Canonical recently announced that they’re on the path to switching the coreutils that Ubuntu ships from the venerable upstream GNU coreutils to a newer Rust implementation. This has made some people very excited and some people very upset (no one appears to be in the middle, though I suspect that’s just a sampling bias). This post will weigh some of the arguments for and against this change.

The place to start is with the usual strongest argument in favor of programs written in Rust: memory safety. As compared to programs written in C and C++, Rust programs can expect to have significantly fewer vulnerabilities as a result of being memory safe by default. This is widely considered good.

However, unlike many large programs written in C and C++, coreutils does not have a particularly significant history of security vulnerabilities due to memory unsafety. Speculating somewhat from first principles, this is most likely because most individual coreutils binaries are relatively small and have fairly constrained attack surfaces. Further, compared to something like a network protocol parsing library, most coreutils binaries aren’t used in contexts that are security relevant. For this reason, on the security front, coreutils does not present a strong argument for the benefits of Rust.

The next argument in favor of Rust is generally performance. The strongest version of this argument is when comparing Rust to Python or Ruby or even Java or Go, where Rust benefits from precise control over memory allocations, lack of GC, and other overhead-avoiding functionality, while still offering the ability to produce high level abstractions (e.g., writing a program to make 100 HTTP requests looks pretty similar in Rust and Python).

However, as compared to C the arguments in favor of Rust’s performance are more muddied. While some will point out that Rust’s stricter aliasing rules can enable more compiler optimizations, it’s unclear that this provides significant benefits in the large (while it’s often easy to produce small kernels that benefit significantly, I’ve never seen these benefits extend to a measurable performance difference over a large program).

A slightly different argument is that while it’s entirely possible to write equally performing programs in C and Rust, doing so in C can be annoying (particularly for smaller codebases). For example, if you have an algorithm that would benefit from a high performing hash table, Rust comes with one out of the box, and C does not. You could write your own, or depend on a library, but here you’ll face some inconvenience due to C’s lack of generics or a standard package manager. From first principles, a C hash table will be just as fast as a Rust hash table, but for an engineer there are considerable benefits to being able to just work with one that’s already available. All that said, it’s unclear whether the Rust coreutils programs are actually faster than GNU coreutils (or more likely, which ones are faster, which are slower, and by how much).
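To make the hash table point concrete, here is a minimal sketch of counting duplicate lines using only Rust’s standard library. It is purely illustrative: the function name `count_lines` and the sample input are my own inventions, and this is not how GNU coreutils or the Rust coreutils actually implement anything. The point is only that `std::collections::HashMap` is available with no dependencies and no hand-rolled table.

```rust
use std::collections::HashMap;

/// Count how many times each distinct line occurs in `input`.
/// (Hypothetical helper for illustration; not taken from any coreutils implementation.)
fn count_lines(input: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for line in input.lines() {
        // `entry` does a single lookup and inserts 0 if the line is new.
        *counts.entry(line).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Made-up sample input standing in for a file or stdin.
    let input = "apple\nbanana\napple\ncherry\nbanana\napple\n";
    let counts = count_lines(input);

    // HashMap iteration order is unspecified, so sort before printing:
    // most frequent first, ties broken alphabetically.
    let mut rows: Vec<(&str, usize)> = counts.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));

    for (line, n) in rows {
        println!("{n:>7} {line}");
    }
}
```

The equivalent C program would first need a hash table from somewhere, which is exactly the inconvenience described above; nothing here says the Rust version would run any faster.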

On the score so far, the arguments in favor of a coreutils rewrite are relatively weak. Were I SVP of Engineering for The Internet, I would probably not staff this project. But I’m not the SVP of Engineering for The Internet; in fact, no one is. Some folks have, for their own reasons, built a Rust implementation of coreutils. A tremendous feature of open source software is that people can just build stuff and don’t have to justify themselves. Therefore, the question before us is not “should someone rewrite coreutils?”, it’s “given that we now have two implementations of coreutils, which should we use?” And that’s an entirely different question. Based on what I’ve discussed so far, there are some reasons to think a Rust coreutils implementation might have some marginal advantages, but not large ones.

Lastly, I want to consider a different benefit to Ubuntu of doing the work to enable switching to a Rust implementation of coreutils: the process will inevitably uncover, and require resolving, innumerable blockers to using Rust for core parts of the distribution. That work will in turn pave the way for using Rust for other pieces of the distribution that do benefit from Rust’s memory safety.

This was our experience with pyca/cryptography, where the initial release we did with Rust code did not benefit from Rust at all (it was a no-op module to test the build infrastructure). Shipping it, however, flushed out considerable technical and social challenges, everything from clearer error messages and instructions for how to install Rust to the need for a standard for distributing Python binaries for systems that use musl as their libc. Having done that work, we were able to a) use Rust for significantly more of our library, where we did gain considerably in both security and performance (e.g., replacing OpenSSL’s X.509 parsing with our own, netting a 10x speed-up), and b) make it dramatically easier for other Python projects to adopt Rust. So even though our initial release did nothing to showcase Rust’s benefits, it set the stage for big wins down the line.

If Canonical’s adoption of a Rust coreutils can play the same role, of identifying and resolving the blockers that make it difficult to ship Rust at the core of a distribution, that will be incredibly valuable, and enable Rust to be used in the places that it really shines.