Introduction to Memory Unsafety for VPs of Engineering

Mon, Aug 12, 2019

What is memory unsafety?

Memory unsafety is a property of some programming languages where they allow the programmer to introduce certain types of bugs and allow these bugs to cause serious security issues. These bugs deal with errors in how memory is used spatially and temporally.

To begin understanding these bugs, we’ll consider the example of an application that maintains to do lists for many users. We’ll first look at the spatial errors.

If I have a to do list with ten items, and I ask for the eleventh item, what should happen? Clearly I should receive an error of some sort. And the same should occur if I ask for the negative first item. In a memory unsafe language, unless the programmer explicitly checks which item they’re asking for against the length of the list, the item that happens to be at that position in memory is fetched, whereas in a memory safe language an error is always triggered, often crashing the program. Crashing the program may seem severe, but if we simply allow someone to ask for the eleventh element of a ten item list, they may get to read the first item out of someone else’s list! Similarly the negative first item may be the last item in another user’s list. This would be a severe security vulnerability, and while crashing the program would be unfortunate, it’s still a better choice than letting users steal each other’s data. Allowing programs to read past the front or end of a list is called an out-of-bounds read.
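To make the difference concrete, here is a minimal sketch in Rust, one of the memory safe languages discussed later. The helper names `fetch_item` and `ten_item_list` are hypothetical and exist only for illustration. The point is that the check against the list's length is done by the language's standard library rather than left to the programmer.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: fetch one item from a user's to do list.
/// `index` is signed so that a request for item -1 can be expressed at all.
fn fetch_item(list: &[String], index: i64) -> Option<&String> {
    // A negative position cannot be converted to `usize`, so it is rejected here.
    let position = usize::try_from(index).ok()?;
    // `get` compares the position with the list's length and returns `None`
    // rather than reading whatever happens to sit past the end of the list.
    list.get(position)
}

/// Hypothetical helper that builds a ten item list for the example.
fn ten_item_list() -> Vec<String> {
    (1..=10).map(|n| format!("task {}", n)).collect()
}
```

Calling `fetch_item(&ten_item_list(), 10)` (the eleventh item) or `fetch_item(&ten_item_list(), -1)` yields `None`. Indexing directly with `list[10]` would instead stop the program with a panic, which is the "crash rather than leak" behavior described above.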

A closely related spatial vulnerability is an out-of-bounds write. In this case imagine we tried to change the eleventh or negative first item in our to do list. Now we are changing someone else’s to do list.
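The write case can be sketched the same way. Again this is only an illustration in Rust, and `replace_item` is a hypothetical name: a write to a position outside the list is refused instead of landing in a neighbor's memory.

```rust
/// Hypothetical helper: overwrite one item in a to do list.
/// Returns an error instead of writing outside the list.
fn replace_item(list: &mut Vec<String>, index: usize, text: &str) -> Result<(), String> {
    let len = list.len();
    match list.get_mut(index) {
        // The position exists, so the write stays inside this list.
        Some(slot) => {
            *slot = text.to_string();
            Ok(())
        }
        // The position is past the end: nothing is written anywhere.
        None => Err(format!(
            "index {} is out of bounds for a list of {} items",
            index, len
        )),
    }
}
```

A failed call leaves the list, and everything around it in memory, untouched.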

The other type of error is a temporal error. Imagine I delete a to do list, and then later I request the first item of that list. Clearly I should receive an error: you can’t get items from a deleted to do list. A memory unsafe language allows programs to fetch memory that they’ve said they are done with. In this case, that location in memory may now contain someone else’s to do list! This is called a use-after-free vulnerability.
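In Rust, the most direct form of this mistake (using a reference after its owner has been dropped) is rejected when the program is compiled, so it cannot be shown in code that runs. The sketch below therefore uses the standard library's `Rc` and `Weak` types to show the same guarantee at run time. The function names are hypothetical.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical helper: read the first item through a handle that may
/// outlive the list it refers to.
fn first_item(handle: &Weak<Vec<String>>) -> Option<String> {
    // `upgrade` returns `None` once the list has been freed, so the freed
    // memory is never read.
    let list = handle.upgrade()?;
    list.first().cloned()
}

/// Hypothetical scenario: create a list, keep a handle to it, delete the
/// list, then ask for the first item again.
fn read_before_and_after_delete() -> (Option<String>, Option<String>) {
    let list = Rc::new(vec!["buy milk".to_string()]);
    let handle = Rc::downgrade(&list);
    let before = first_item(&handle);
    drop(list); // the to do list is deleted here
    let after = first_item(&handle);
    (before, after)
}
```

The first read returns the item; the read after deletion returns `None` rather than whatever now occupies that memory.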

Out-of-bounds reads, out-of-bounds writes, and use-after-frees form the majority of memory unsafety vulnerabilities, and for many projects, the majority of vulnerabilities period. While the examples I described showed stealing data as the impact of these vulnerabilities, they can also lead to remote-code-execution exploits. Memory safe languages prevent these by default: the programmer has to go out of their way to introduce these vulnerabilities, whereas in memory unsafe languages the programmer has to do extra work to prevent them! The most prominent languages which are memory unsafe are C, C++, and assembly. Nearly every programming language besides these three is memory safe, meaning programmers cannot introduce these bugs or these bugs cannot cause a security issue: JavaScript, Rust, Python, Java, Ruby, and Swift are all examples of memory safe languages.

Vulnerabilities caused by memory unsafety are the basis of a great many impactful security issues. The Slammer worm from 2003 was a buffer overflow (out-of-bounds write). So was WannaCry (out-of-bounds write). The Trident exploit against iPhones used three different memory unsafety vulnerabilities (two use-after-frees and an out-of-bounds read). Heartbleed was memory unsafety (out-of-bounds read). Stagefright on Android too (out-of-bounds writes). The Ghost vulnerability in glibc? You betcha (out-of-bounds write).

These vulnerabilities and exploits, and many others, are made possible because C and C++ are not memory safe.

Who is this for?

This is for leaders in software engineering organizations that use memory unsafe languages like C and C++, particularly those that write security sensitive software such as operating systems, network servers, and desktop software.

My goal is to introduce you to the perils of continuing to use memory unsafe languages, and to suggest alternatives for your organization.

How common are vulnerabilities due to memory unsafety?

Extremely. A recent study found that 60-70% of vulnerabilities in iOS and macOS are caused by memory unsafety. Microsoft estimates that 70% of all vulnerabilities in their products over the last decade have been caused by memory unsafety. Google estimated that 90% of Android vulnerabilities are memory unsafety. An analysis of 0-days that were discovered being exploited in the wild found that more than 80% of the exploited vulnerabilities were due to memory unsafety ¹.

Organizations which write large amounts of C and C++ inevitably produce large numbers of vulnerabilities that can be directly attributed to memory unsafety. These vulnerabilities are exploited, to the peril of hospitals, human rights dissidents, and health policy experts. Using C and C++ is bad for society, bad for your reputation, and it’s bad for your customers.

Are there other perils of memory unsafety, besides security?

But of course. Memory unsafety also impacts stability, developer productivity, and application performance.

Based on my experience as a security engineer for Firefox, a significant number of crashes experienced by users have their roots in memory unsafety. Even when these crashes are not security sensitive they are still a very poor experience for users.

Worse, these bugs can be incredibly difficult for developers to track down. Memory corruption can often cause crashes to occur very far from where the bug actually is. When multi-threading is involved, additional bugs can be triggered by slight differences in which thread runs when, leading to bugs that are even more difficult to reproduce. The result is that developers often need to stare at crash reports for hours in order to ascertain the cause of a memory corruption bug. These bugs can remain unfixed for months, with developers absolutely convinced a bug exists but having no idea how to make progress on uncovering its cause and fixing it.

Finally, there is performance. In decades past, one could rely on CPUs getting significantly faster every year or two. This is no longer the case; instead, CPUs now grow more cores. To take advantage of these, developers are tasked with writing multi-threaded code.

Unfortunately, multi-threading plays very poorly with memory unsafety, and tends to exacerbate both the stability and security problems inherent in memory unsafe languages. As a result, efforts to take advantage of multi-core CPUs are often intractable in C and C++. Mozilla had multiple failed attempts to introduce multi-threading into Firefox’s C++ CSS subsystem before finally (successfully) rewriting the system in multi-threaded Rust.
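As a rough illustration of why a memory safe language helps here, the sketch below splits a piece of work across threads in Rust. It is not taken from Firefox; the workload and the name `total_len_parallel` are hypothetical. In safe Rust, the compiler rejects code in which two threads could modify the same data without synchronization, so this kind of parallelism can be added without opening new memory corruption bugs.

```rust
use std::thread;

/// Hypothetical workload: add up the byte lengths of many strings,
/// splitting the work across up to `workers` threads.
fn total_len_parallel(items: &[String], workers: usize) -> usize {
    let workers = workers.max(1);
    // Ceiling division so every item lands in a chunk; at least 1 because
    // `chunks` does not accept a chunk size of zero.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `items` because the scope guarantees that
    // every thread finishes before `items` can go away.
    thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|s| s.len()).sum::<usize>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Each thread only reads its own chunk. If a thread tried to modify `items` while the others were reading it, the program would fail to compile instead of misbehaving in production.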

What is the alternative to memory unsafe languages?

Using memory safe languages! There are lots of great ones to choose from. Writing an operating system kernel or web browser? Consider Rust! Building for iOS and macOS? Swift’s got you covered. Network server? Go’s a fine choice. These are just a few examples; there are many other excellent memory safe languages to choose from (and many other wonderful use case pairings!).

Changing the programming language your organization uses is not something to be undertaken lightly. It means changing the skills you’re looking for when you hire, it means retraining your workforce, it means rewriting large amounts of code. Nonetheless, I believe in the long term this is required, so I’d like to lay out why alternatives to adopting a new programming language have not been successful.

If we take for granted that using a memory unsafe language will produce some number of vulnerabilities, the question we’d want to ask is: are there techniques we can undertake to reduce this risk, without forcing ourselves to entirely change programming languages? And the answer is absolutely yes. Not all projects written in memory unsafe languages are equally unsafe and unreliable.

Some practices which can lower the risk of using a memory unsafe language are:

Using some modern C++ idioms which can help produce more safe and reliable code
Using fuzzers and sanitizers to help find bugs before they make it into production
Using exploit mitigations to help increase the difficulty of exploiting vulnerabilities
Using privilege separation so that even when a vulnerability is exploited, the blast radius is smaller

These practices meaningfully lower the risk of using a memory unsafe language, and if I’ve failed to convince you to change languages, and you are going to continue to write C and C++, adopting these is an imperative. Unfortunately, they are also woefully insufficient.

The people who are at the forefront of developing modern C++ idioms, fuzzers, sanitizers, exploit mitigations, and privilege separation techniques are browser and operating system developers, precisely the groups I highlighted earlier with statistics about the prevalence of memory unsafety. Despite these teams' investment in these techniques, their use of memory unsafe languages weighs them down. At Pwn2Own, a large hacking competition, over half of the vulnerabilities exploited in these products in 2019 were due to memory unsafety, and with one exception, every successful attack exploited at least one memory unsafety vulnerability.

Is dropping C and C++ really practical?

Hopefully by now I’ve convinced you that memory unsafe languages like C and C++ are a fundamental root cause of huge swathes of the insecurity in our products, and that while there are practices you can undertake to reduce the risk, you can’t get anywhere close to eliminating it. All of which may still leave you feeling that changing the programming language you use to produce millions of lines of code is an overwhelmingly large task. By breaking it down into manageable pieces, we can start making progress. Our goal is not one big-bang rewrite-the-world, but rather to make progress towards reducing our risk.

The first place to start is with brand new projects. For these, you have the choice to simply not choose a memory unsafe language. These have the lowest risk, because you do not need to start by rewriting any code, though projects like this often do require improvements to testing or deployment infrastructure to support a new programming language. This was the approach taken in ChromeOS’s CrosVM, a brand new component of the operating system.

If you don’t have new projects, the next place to look for opportunities to use a memory safe language is new components of an existing project. Several of the memory safe languages have first-class support for interoperating with C and C++ codebases (both Rust and Swift, for example). This has a slightly higher initial investment required, as it requires integration into build systems, as well as building abstractions in a new language for objects and data that need to be passed across the boundary between the two languages. This is the strategy that was successfully used when WebAuthn was implemented as a new component of Firefox and by my own project to enable writing Linux kernel modules in Rust.
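To give a feel for what such a boundary looks like, here is a minimal sketch in Rust. It is not code from Firefox or the Linux kernel work; the function names are hypothetical. A function with the C calling convention accepts a raw pointer and length from existing C or C++ code, and immediately converts them into a bounds-checked Rust slice so that everything past that point is ordinary safe code.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical entry point for existing C or C++ callers.
/// A real build would also export it under an unmangled symbol name and
/// declare it in a C header; both steps are omitted from this sketch.
///
/// # Safety
/// `data` must be null or point to `len` readable `c_int` values.
pub unsafe extern "C" fn sum_priorities(data: *const c_int, len: usize) -> c_int {
    if data.is_null() {
        return 0;
    }
    // The only unsafe step: trusting the pointer and length from the C side.
    let values = unsafe { slice::from_raw_parts(data, len) };
    sum_safe(values)
}

/// The safe side of the boundary: works on a slice whose length is known
/// and enforced, so it cannot read out of bounds.
fn sum_safe(values: &[c_int]) -> c_int {
    values
        .iter()
        .fold(0, |total: c_int, value| total.wrapping_add(*value))
}
```

Keeping the unsafe conversion in one small, easily reviewed function is the kind of abstraction the initial investment pays for: the rest of the new component never touches raw pointers.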

The thing both of these first two approaches have in common is they deal with new code. This has the advantage of having well defined interaction points with existing code, and not needing to rewrite anything to get started on the effort. It also gives you a chance to stem the bleeding: no new components in a memory unsafe language, and we’ll deal with the existing code incrementally. For projects that don’t have any natural new component to get started with using a memory safe language, adoption is more challenging.

In this case you need to look for some existing component to rewrite from a memory unsafe language to a memory safe language. It’s best if the component you choose is something where you were already considering a rewrite: maybe for performance, or for security, or because the code was too difficult to maintain. You should attempt to pick something with as small a scope as possible for your first memory safety rewrite, in order to help the project be successful and ship as quickly as possible; this helps minimize the risk inherent in a rewrite. Stylo, the rewrite of Firefox’s CSS engine in Rust, is a successful example of this approach.

Regardless of which approach is the right fit for your organization, there are a few things to keep in mind to maximize your chances of success. The first is to make sure you have internal champions and senior engineers who can provide code reviews and mentoring in a language that will be new to many team members. The natural extension of this is to make sure that engineers who will be working in a new language have resources available to them like books, trainings, or internal guides. Finally, you’ll want to make sure you have the same shared infrastructure for your new language that you have for your old one, such as build system, test, deployment, crash reporting, and other integrations.

Conclusion

Adopting a new programming language and beginning the process of migrating to it is not an easy task. It requires planning, resourcing, and ultimately an investment from your entire organization. Life would be much easier if we didn’t have to contemplate such things. Unfortunately, a review of the data makes clear we simply cannot consider continuing to use memory unsafe languages for security sensitive projects.

The data bears out, over and over again, that when projects use memory unsafe languages like C and C++ they are burdened by an avalanche of resulting security vulnerabilities. No matter how talented the engineers, how great the investment in privilege reduction and exploit mitigations, memory unsafety simply results in too many bugs. And these bugs destroy security, as well as stability and productivity.

Fortunately, we do not need to be satisfied with the status quo. The last few years have produced a groundswell of fantastic alternatives to C and C++, such as Rust, Swift, and Go, amongst many others. And this means we don’t have to wear memory corruption vulnerabilities as an albatross around our necks for years and years to come, as long as we choose not to. I look forward to a time when choosing to use a memory unsafe language is considered as negligent as not having multi-factor-authentication or not encrypting data in transit.

¹ This is specifically a measure of software vulnerabilities; it does not include things like credential phishing, which are incredibly common. Incidentally, credential phishing is another security issue that we know how to defeat once and for all: deploy security keys with WebAuthn. This will be the subject of a future article.