A Memory Safety Research Agenda
I’ve been a ferocious critic of C, C++, and other memory unsafe languages, and a booster of memory safe languages such as Swift, Go, and particularly Rust. And though I believe there is a more-than-sufficient body of evidence to support the claim that the time to start migrating is now, there are still open questions related to how we migrate systems to memory safe languages more scalably, and how we maximize the safety of code written in these languages.
The remainder of this post will outline specific areas I believe are open research questions. It is my hope that this will prove useful to software and security engineers, researchers, and funding bodies. I’ll be making references to things in Rust; however, I believe most, if not all, of my examples are equally applicable to other memory safe languages.
Driving adoption
The first set of research projects deals with how we can best ensure more projects adopt memory safe languages.
Automated conversion
There is presently an enormous body of existing C and C++ code. Migrating even
small portions of it by hand is the work of several lifetimes. An important open
question is how we can use tools to automate this migration. Projects such as
C2Rust begin to answer this question by providing automated conversion from C
to Rust. Unsurprisingly, this does not automatically make the Rust code safe:
the generated code makes heavy use of Rust’s unsafe.
I believe exploration of how unsafe Rust derived from C can be converted to
safe Rust, probably utilizing human guided automation, would be an incredibly
valuable accomplishment. Two specific areas I believe could benefit would be
converting pointer plus length pairs to Rust’s slices, and converting raw
malloc invocations to use Rust’s Box. As an initial hypothesis, I imagine this
could work something like:
- An automated tool identifies function arguments that look like they could be pointers plus lengths
- The tool prompts the user to confirm that the function should be rewritten to accept a slice
- If the user confirms, the tool automatically rewrites the function’s arguments to take a slice, its callers to pass a slice, and the body of the function to manipulate the slice safely (possibly with additional human prompting for details that cannot automatically be inferred)
Such a process could be an incredible time saver for reducing the unsafe-ty
in converted code (particularly given Microsoft’s research indicating that
spatial safety is the most common vulnerability category).
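To make the pointer-plus-length case concrete, here is a minimal sketch of what the before and after might look like. The function names (sum_raw, sum_slice, sum_raw_compat) are hypothetical and chosen only for illustration; this is not the output of C2Rust or of any existing tool.

```rust
/// Shape of code a C-to-Rust translator might emit: a raw pointer plus a
/// separate length that the caller must keep in sync by hand.
///
/// # Safety
/// `data` must point to `len` readable, initialized `u32` values.
unsafe fn sum_raw(data: *const u32, len: usize) -> u32 {
    let mut total: u32 = 0;
    for i in 0..len {
        // Nothing checks `i` against the real allocation; a wrong `len`
        // silently reads out of bounds.
        total = total.wrapping_add(unsafe { *data.add(i) });
    }
    total
}

/// The same function after the rewrite: the slice carries its own length,
/// so the body needs no `unsafe` at all.
fn sum_slice(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Transitional shim for callers that have not been migrated yet.
///
/// # Safety
/// Same contract as `sum_raw`; `data` must also be non-null and aligned.
unsafe fn sum_raw_compat(data: *const u32, len: usize) -> u32 {
    // The single remaining `unsafe` operation: building the slice.
    let slice = unsafe { std::slice::from_raw_parts(data, len) };
    sum_slice(slice)
}

fn main() {
    let values = [1u32, 2, 3, 4];
    let before = unsafe { sum_raw(values.as_ptr(), values.len()) };
    let shim = unsafe { sum_raw_compat(values.as_ptr(), values.len()) };
    let after = sum_slice(&values);
    assert_eq!(before, after);
    assert_eq!(shim, after);
    println!("sum = {after}");
}
```

The shim illustrates one possible migration path: the unsafe surface shrinks to a single from_raw_parts call at the boundary, and callers can be moved over to the slice-based function one at a time.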
Build critical abstractions
Lots of memory unsafe code is built on a small number of very popular abstractions and libraries, for example Linux kernel modules and Skia. Both are written in C or C++, and thus effectively all consumers of their APIs are as well. Major projects like these should have bindings in memory safe languages that allow new consumers to be written in memory safe languages, and existing consumers to migrate. Such projects should also, in the long term, assist with inverting this relationship, allowing these projects to be written in a memory safe language and offer C/C++ bindings for backwards compatibility.
An example of a project like this is linux-kernel-module-rust, by Geoffrey Thomas and myself, which allows writing Linux kernel modules entirely in safe Rust. Unfortunately, it currently supports only a narrow subset of the kernel APIs (exposing character devices and sysctls). Expanding its API surface and helping port real world kernel modules would have a profound impact.
Empirical research
Finally, we simply need more research on what strategies are effective for migrating codebases (and which strategies aren’t!). This can include both writing descriptions of successful introductions of memory safety into an existing code base (e.g. Firefox and librsvg) as well as experiments which try different approaches to migration.
Safer unsafe
The second set of research projects deals with how to improve the safety of code
written using unsafe. I use Rust’s terminology here, but other languages’
equivalent keywords are similar.
Lifetimes with Foreign Function Interfaces (FFI)
One of the larger challenges with using C and C++ libraries from a language like Rust is figuring out what lifetimes are required of arguments to functions within the API (truth be told, this is a large challenge when using C and C++ libraries from C and C++ as well!). Libraries often do not document these requirements, and never expose this information programmatically (since C and C++ have no notion of lifetimes or syntax for expressing them). However, correctly maintaining lifetimes across FFI is vital to maintaining memory safety of a program!
We need better tooling for squaring this circle. I do not have a specific proposal for how to address this, but I’m hopeful that bright minds dedicated to solving it could make a meaningful improvement — until all libraries are written in memory safe languages, we’ll need to be able to interoperate safely.
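To illustrate the kind of requirement that goes unstated, here is a minimal sketch of how a hand-written Rust wrapper can encode one today. Everything in it is an assumption made for illustration: RawCursor, raw_cursor_new, and raw_cursor_next stand in for a C library (they are written in Rust only so the example compiles on its own), and the rule that the buffer must outlive the cursor is the sort of thing a binding author currently has to discover by reading the C source.

```rust
use std::marker::PhantomData;

// Stand-in for a C library's handle type. In a real binding this would be
// declared in an `extern "C"` block instead of being defined in Rust.
#[repr(C)]
struct RawCursor {
    buf: *const u8,
    len: usize,
    pos: usize,
}

// Hypothetical C-style API: the cursor stores `buf` and reads from it later,
// so `buf` must outlive the cursor. Nothing in the signature says so.
unsafe fn raw_cursor_new(buf: *const u8, len: usize) -> RawCursor {
    RawCursor { buf, len, pos: 0 }
}

// Returns the next byte, or -1 once the buffer is exhausted.
unsafe fn raw_cursor_next(c: *mut RawCursor) -> i32 {
    let c = unsafe { &mut *c };
    if c.pos >= c.len {
        return -1;
    }
    let byte = unsafe { *c.buf.add(c.pos) };
    c.pos += 1;
    i32::from(byte)
}

/// Safe wrapper: `'a` ties the cursor to the buffer it was created from.
struct Cursor<'a> {
    raw: RawCursor,
    _buf: PhantomData<&'a [u8]>,
}

impl<'a> Cursor<'a> {
    fn new(buf: &'a [u8]) -> Self {
        // SAFETY: pointer and length come from a valid slice that the
        // borrow checker keeps alive for `'a`.
        let raw = unsafe { raw_cursor_new(buf.as_ptr(), buf.len()) };
        Cursor { raw, _buf: PhantomData }
    }

    fn next_byte(&mut self) -> Option<u8> {
        // SAFETY: `self.raw` was built from a slice still borrowed for `'a`.
        let r = unsafe { raw_cursor_next(&mut self.raw) };
        if r < 0 { None } else { Some(r as u8) }
    }
}

fn main() {
    let data = vec![10u8, 20, 30];
    let mut cursor = Cursor::new(&data);
    while let Some(b) = cursor.next_byte() {
        println!("{b}");
    }
    // Moving or dropping `data` while `cursor` is still in use would be
    // rejected by the borrow checker.
}
```

The PhantomData field is what lets the compiler enforce the lifetime, but the wrapper is only as correct as the author's reading of the C library, which is exactly the gap better tooling could close.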
Integer overflows
In C and C++, integer overflows are a classic source of memory corruption.
Memory safe languages address this by enforcing bounds checks (but not by using
checked arithmetic everywhere!). However, unsafe blocks which attempt to
bypass these bounds checks for performance often forget to handle integer
overflows, introducing vulnerabilities.
I believe two avenues of research exist here. The first is better analysis tools
for identifying unhandled integer overflow in the presence of unsafe blocks,
and the second is improving code generation and other optimizations such that
enabling checked arithmetic by default is practical.
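As a small illustration of the bug class, the sketch below contrasts a bounds check that can be defeated by wrapping arithmetic with one that uses checked arithmetic before an unchecked access. The function names are hypothetical, and wrapping_add is used in the flawed check so that the wrap-around is visible regardless of build profile.

```rust
/// Flawed bounds check: if `offset + len` wraps around, the sum is small
/// and the check passes even though the requested range is enormous.
fn naive_check_passes(buf_len: usize, offset: usize, len: usize) -> bool {
    offset.wrapping_add(len) <= buf_len
}

/// Returns `buf[offset..offset + len]`, skipping the second bounds check
/// that ordinary indexing would perform.
fn window(buf: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    // `checked_add` yields `None` on overflow instead of wrapping.
    let end = offset.checked_add(len)?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: `offset <= end <= buf.len()` was established above without
    // any wrapping arithmetic.
    Some(unsafe { buf.get_unchecked(offset..end) })
}

fn main() {
    let buf = [1u8, 2, 3, 4];

    // An in-range request succeeds.
    assert_eq!(window(&buf, 1, 2), Some(&buf[1..3]));

    // 2 + usize::MAX wraps to 1, so the flawed check is satisfied...
    assert!(naive_check_passes(buf.len(), 2, usize::MAX));
    // ...while the checked version rejects the same request.
    assert_eq!(window(&buf, 2, usize::MAX), None);
}
```

An analysis tool of the first kind would aim to flag a check like naive_check_passes when it guards an unchecked access; work of the second kind would aim to make the checked_add form cheap enough to be the default.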
Better static analysis for unsafe
Finally, building on the previous project, there is a general purpose need for
better static analysis tools for finding vulnerabilities in unsafe, and
automatically suggesting safe idioms to replace unnecessary usage of unsafe.
Ideally this work would be empirically driven, based on data sets from
RustSec as well as review of large codebases which make use of unsafe.
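As one small example of what suggesting a safe idiom could mean, the sketch below pairs an unnecessary use of unsafe with a standard-library equivalent. The function names are hypothetical, and the pairing is my own illustration rather than the output of any particular tool.

```rust
/// Unnecessary `unsafe`: reinterpreting an integer as its bytes.
fn bytes_via_transmute(x: u32) -> [u8; 4] {
    // SAFETY: `u32` and `[u8; 4]` have the same size, and every bit
    // pattern is valid for both.
    unsafe { std::mem::transmute::<u32, [u8; 4]>(x) }
}

/// Safe replacement: the standard library exposes the same conversion.
fn bytes_via_std(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

fn main() {
    let x = 0x0102_0304u32;
    // Both produce the native-endian byte representation.
    assert_eq!(bytes_via_transmute(x), bytes_via_std(x));
}
```

A tool that recognizes patterns like this could remove a meaningful share of unsafe blocks mechanically, leaving reviewers to focus on the ones that are actually needed.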
Conclusion
I believe even without these improvements, the case for migration to memory safe languages is acute and compelling. However, we must never be satisfied by the status quo. Improvements which make adoption of memory safe languages easier, and these languages even safer, can have a profound impact on the safety of our overall computing experience.
It is my hope that engineers and researchers have found projects here they are excited to work on, and funding bodies have ideas for projects to fund.