Modern C++ Won't Save Us

I’m a frequent critic of memory unsafe languages, principally C and C++, and how they induce an exceptional number of security vulnerabilities. My conclusion, based on reviewing evidence from numerous large software projects using C and C++, is that we need to be migrating our industry to memory safe by default languages (such as Rust and Swift). One of the responses I frequently receive is that the problem isn’t C and C++ themselves, developers are simply holding them wrong. In particular, I often receive defenses of C++ of the form, “C++ is safe if you don’t use any of the functionality inherited from C” 1 or similarly that if you use modern C++ types and idioms you will be immune from the memory corruption vulnerabilities that plague other projects.

I would like to credit C++’s smart pointer types, because they do significantly help. Unfortunately, my experience working on large C++ projects which use modern idioms is that these are not nearly sufficient to stop the flood of vulnerabilities. My goal for the remainder of this post is to highlight a number of completely modern C++ idioms which produce vulnerabilities.

Hide the reference use-after-free

The first example I’d like to describe, originally from Kostya Serebryany, is how C++’s std::string_view can make it easy to hide use-after-free vulnerabilities:

#include <iostream>
#include <string>
#include <string_view>

int main() {
  std::string s = "Hellooooooooooooooo ";
  std::string_view sv = s + "World\n";
  std::cout << sv;
}

What’s happening here is that s + "World\n" allocates a new std::string, and then is converted to a std::string_view. At this point the temporary std::string is freed, but sv still points at the memory that used to be owned by it. Any future use of sv is a use-after-free vulnerability. Oops! C++ lacks the facilities for the compiler to be aware that sv captures a reference to something where the reference lives longer than the referent. The same issue impacts std::span, also an extremely modern C++ type.

Another fun variant involves using C++’s lambda support to hide a reference:

#include <memory>
#include <iostream>
#include <functional>


std::function<int(void)> f(std::shared_ptr<int>& x) {
    return [&]() { return *x; };
}

int main() {
    std::function<int(void)> y(nullptr);
    {
        std::shared_ptr<int> x(std::make_shared<int>(4));
        y = f(x);
    }
    std::cout << y() << std::endl;
}

Here the [&] in f causes the lambda to capture values by reference. Then in main x goes out of scope, destroying the last reference to the data, and causing it to be freed. At this point y contains a dangling pointer. This occurs despite our meticulous use of smart pointers throughout. And yes, people really do write code that handles std::shared_ptr<T>&, often as an attempt to avoid additional increment and decrements on the reference count.

std::optional<T> dereference

std::optional represents a value that may or may not be present, often replacing magic sentinel values (such as -1 or nullptr). It offers methods such as value(), which extract the T it contains and raises an exception if the the optional is empty. However, it also defines operator* and operator->. These methods also provide access to the underlying T, however they do not check if the optional actually contains a value or not.

The following code for example, simply returns an uninitialized value:

#include <optional>

int f() {
    std::optional<int> x(std::nullopt);
    return *x;
}

If you use std::optional as a replacement for nullptr this can produce even more serious issues! Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels). Dereferencing a nullopt however, gives you an uninitialized value as a pointer, which can be a serious security issue. While having a T* with an uninitialized value is also possible, these are much less common than dereferencing a pointer that was correctly initialized to nullptr.

And no, this doesn’t require you to be using raw pointers. You can get uninitialized/wild pointers with smart pointers as well:

#include <optional>
#include <memory>

std::unique_ptr<int> f() {
    std::optional<std::unique_ptr<int>> x(std::nullopt);
    return std::move(*x);
}

std::span<T> indexing

std::span<T> provides an ergonomic way to pass around a reference to a contiguous slice of memory and a length. This lets you easily write code that works over multiple different types; a std::span<uint8_t> can point to memory owned by a std::vector<uint8_t>, a std::array<uint8_t, N>, or even a raw pointer. Failure to correctly check bounds is a frequent source of security vulnerabilities, and in many senses span helps out with this by ensuring you always have a length handy.

Like all STL data structures, span’s operator[] method does not perform any bounds checks. This is regrettable, since operator[] is the most ergonomic and default way people use data structures. std::vector and std::array can at least theoretically be used safely because they offer an at() method which is bounds checked (in practice I’ve never seen this done, but you could imagine a project adopting a static analysis tool which simply banned calls to std::vector<T>::operator[]). span does not offer an at() method, or any other method which performs a bounds checked lookup.

Interestingly, both Firefox and Chromium’s backports of std::span do perform bounds checks in operator[], and thus they’ll never be able to safely migrate to std::span.

Conclusion

Modern C++ idioms introduce many changes which have the potential to improve security: smart pointers better express expected lifetimes, std::span ensures you always have a correct length handy, std::variant provides a safer abstraction for unions. However modern C++ also introduces some incredible new sources of vulnerabilities: lambda capture use-after-free, uninitialized-value optionals, and un-bounds-checked spans.

My professional experience writing relatively modern C++, and auditing Rust code (including Rust code that makes significant use of unsafe) is that the safety of modern C++ is simply no match for memory safe by default languages like Rust and Swift (or Python and Javascript, though I find it rare in life to have a program that makes sense to write in either Python or C++).

There are significant challenges to migrating existing, large, C and C++ codebases to a different language – no one can deny this. Nonetheless, the question simply must be how we can accomplish it, rather than if we should try. Even with the most modern C++ idioms available, the evidence is clear that, at scale, it’s simply not possible to hold C++ right.


  1. I understood this to be referring to raw pointers, arrays-as-pointers, manual malloc/free, and other similar features. However I think it’s worth acknowledging that given that C++ explicitly incorporated C into its specification, in practice most C++ code incorporates some of these elements. ↩︎