Defining the memory safe problem
I’m a staunch advocate for the need to migrate away from memory-unsafe programming languages in order to address the endemic security issues they produce. This sentence contains a number of terms that are worth defining, and a number of asterisks that are worth explicating. My objective with this blog post is to bring increased precision to this discussion.
What is a memory unsafe language?
It’s a programming language which, by default, allows code to introduce memory-related vulnerabilities (use after free, buffer over/underflow, use of uninitialized memory, type confusion) and undefined behavior, and as a result code written in that language suffers an increased rate of security issues. Each element of this definition is important.
We’re interested in languages that have this behavior by default: Java’s sun.misc.Unsafe, Rust’s unsafe, or Python’s ctypes do not meet this definition.
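To make the “by default” distinction concrete, here’s a small illustrative Rust sketch (mine, not from the post): the default indexing operation is bounds-checked and fails safely, while the raw-pointer access only compiles inside an explicit unsafe block.

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Default behavior: indexing is bounds-checked, so an out-of-range
    // access panics deterministically instead of reading adjacent memory.
    let out_of_bounds = std::panic::catch_unwind(|| v[3]);
    assert!(out_of_bounds.is_err());

    // The escape hatch: raw-pointer arithmetic compiles only inside an
    // explicit, greppable `unsafe` block -- it is opt-in, not the default.
    let second = unsafe { *v.as_ptr().add(1) };
    assert_eq!(second, 20);

    println!("checked by default; unsafe is explicit");
}
```

The point of the sketch is that the unsafe operation is the marked exception: a reviewer can search a codebase for `unsafe` blocks, whereas in a memory-unsafe-by-default language every line is potentially such a block.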
An empirical increase in the rate of vulnerabilities is a necessary condition because some languages technically meet the first two criteria, but the potential for vulnerabilities is sufficiently marginal that in practice they do not occur at a problematic rate. Go is an important example of such a language.
What software are we concerned with?
We’re principally concerned with systems software which has performance requirements that necessitate using a language that provides control over memory allocations and layout, and does not have a garbage collector. If one can use a garbage collected language, there is a very large design space of programming languages that are memory safe.
We’re concerned with software that has a threat model involving the processing of untrusted data or operations: network clients and servers, media parsers, kernels, and sandbox supervisor processes.
We’re principally concerned with larger codebases: empirically, the risk mitigation strategies used to manage memory-unsafe programming languages (e.g., fuzzing, static analysis, red teaming) appear to break down as codebases and engineering teams grow.
We’re not concerned with software that has no attack surface, e.g. a physics simulation where all the parameters and data it runs on are trusted. Even if such software were replete with vulnerabilities, in the absence of any attack surface, there’s no mechanism for an attacker to exploit them. It does, however, bear acknowledging that it’s not uncommon for software to transition from one use case to another (e.g., a research prototype which becomes a product), at which point these security issues become relevant.
Finally, we’re concerned with general purpose software, not software designed and implemented for more constrained and regulated environments, such as automotive, aeronautic, or weapons control software. Software in those environments is built with a set of specialized toolchains such that even where it is nominally written in a memory-unsafe programming language, it bears little resemblance to ordinary code written in that language. By the same token, the existence of these safe toolchains does not make these programming languages safe, because they are not used outside of these specialized domains (because of the burdens they impose on software engineers).
What problems are we addressing?
We’re addressing a set of vulnerabilities and undefined behavior induced by the programming language. Notably, this is not the full set of software security issues: logic vulnerabilities are not addressed at the language level (while some claim that safe languages are also more expressive, and therefore reduce the rate of logic vulnerabilities, this claim does not yet have empirical support).
A significant class of logic vulnerabilities, whose symptoms resemble those of language-induced memory safety vulnerabilities, is vulnerabilities in JIT compilers where an attacker can cause the compiler to emit incorrect, exploitable code (e.g., in the browser context). Such vulnerabilities are not addressed by implementing the JIT compiler in a memory-safe programming language (memory corruption within the compiler itself, as opposed to in the generated code, can of course be addressed by the use of a safe language).