Modern C++ Won't Save Us
I’m a frequent critic of memory unsafe languages, principally C and C++, and how they induce an exceptional number of security vulnerabilities. My conclusion, based on reviewing evidence from numerous large software projects using C and C++, is that we need to be migrating our industry to memory safe by default languages (such as Rust and Swift). One of the responses I frequently receive is that the problem isn’t C and C++ themselves; developers are simply holding them wrong. In particular, I often receive defenses of C++ of the form, “C++ is safe if you don’t use any of the functionality inherited from C” [1], or similarly that if you use modern C++ types and idioms you will be immune from the memory corruption vulnerabilities that plague other projects.
I would like to credit C++’s smart pointer types, because they do significantly help. Unfortunately, my experience working on large C++ projects which use modern idioms is that these are not nearly sufficient to stop the flood of vulnerabilities. My goal for the remainder of this post is to highlight a number of completely modern C++ idioms which produce vulnerabilities.
Hide the reference use-after-free
The first example I’d like to describe, originally from Kostya Serebryany, is how C++’s std::string_view can make it easy to hide use-after-free vulnerabilities:
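#include <iostream>
#include <string>
#include <string_view>

int main() {
    std::string s = "Hellooooooooooooooo ";
    // The temporary std::string produced by operator+ dies at the end of
    // this statement, leaving sv pointing at freed memory.
    std::string_view sv = s + "World\n";
    std::cout << sv;  // use-after-free
}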
What’s happening here is that s + "World\n" allocates a new std::string, which is then converted to a std::string_view. At the end of that statement the temporary std::string is destroyed, but sv still points at the memory it used to own. Any future use of sv is a use-after-free vulnerability. Oops!
C++ lacks the facilities for the compiler to be aware that sv captures a reference to something where the reference lives longer than the referent. The same issue impacts std::span, also an extremely modern C++ type.
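One way to sidestep the dangling view is to give the concatenated string a named owner that outlives every view taken from it. The sketch below is only an illustration; the function name joined_greeting is hypothetical and not from the original example.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper illustrating the safe pattern: the result of the
// concatenation is stored in a named std::string, so the view below
// borrows from an object that is still alive.
std::string joined_greeting(const std::string& s) {
    const std::string joined = s + "World\n";  // named owner, not a temporary
    const std::string_view sv = joined;        // sv borrows from joined
    return std::string(sv);                    // copy out while joined is alive
}
```

Nothing in the language enforces this discipline; the compiler accepts the dangling version just as readily, which is the point of the example above.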
Another fun variant involves using C++’s lambda support to hide a reference:
#include <memory>
#include <iostream>
#include <functional>

std::function<int(void)> f(std::shared_ptr<int>& x) {
    // [&] captures x by reference: the lambda refers to the caller's shared_ptr
    return [&]() { return *x; };
}

int main() {
    std::function<int(void)> y(nullptr);
    {
        std::shared_ptr<int> x(std::make_shared<int>(4));
        y = f(x);
    }  // x is destroyed here, freeing the int
    std::cout << y() << std::endl;  // use-after-free
}
Here the [&] in f causes the lambda to capture values by reference, so the lambda refers to the shared_ptr that lives in main rather than holding its own copy. Then in main, x goes out of scope, destroying the last reference to the data and causing it to be freed. At this point y contains a dangling pointer.
This occurs despite our meticulous use of smart pointers throughout. And yes, people really do write code that handles std::shared_ptr<T>&, often as an attempt to avoid additional increments and decrements on the reference count.
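For contrast, here is a minimal sketch of the same shape with the shared_ptr captured by value, so the closure holds its own strong reference. The names make_reader and read_after_scope_exit are hypothetical, chosen only for this illustration.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Takes the shared_ptr by value and moves it into the closure, so the
// lambda owns a strong reference to the int instead of borrowing one.
std::function<int()> make_reader(std::shared_ptr<int> x) {
    return [x = std::move(x)]() { return *x; };
}

int read_after_scope_exit() {
    std::function<int()> y;
    {
        auto x = std::make_shared<int>(4);
        y = make_reader(x);  // copies x, so two owners exist here
    }                        // x is destroyed; the closure keeps the int alive
    return y();
}
```

This costs exactly the reference count traffic that passing std::shared_ptr<T>& was trying to avoid, and nothing stops the next person from changing the capture back to [&].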
std::optional<T> dereference
std::optional represents a value that may or may not be present, often replacing magic sentinel values (such as -1 or nullptr). It offers methods such as value(), which extracts the T it contains and throws an exception if the optional is empty. However, it also defines operator* and operator->. These also provide access to the underlying T, but they do not check whether the optional actually contains a value.
The following code, for example, simply returns an uninitialized value:
#include <optional>

int f() {
    std::optional<int> x(std::nullopt);
    return *x;
}
If you use std::optional as a replacement for nullptr, this can produce even more serious issues! Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels). Dereferencing a nullopt, however, gives you an uninitialized value as a pointer, which can be a serious security issue. While having a T* with an uninitialized value is also possible, this is much less common than dereferencing a pointer that was correctly initialized to nullptr.
And no, this doesn’t require you to be using raw pointers. You can get uninitialized/wild pointers with smart pointers as well:
#include <optional>
#include <memory>

std::unique_ptr<int> f() {
    std::optional<std::unique_ptr<int>> x(std::nullopt);
    return std::move(*x);
}
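The checked accessors behave differently. As a rough sketch (the function names here are hypothetical), value() throws std::bad_optional_access on an empty optional and value_or() substitutes a fallback, so neither reads uninitialized storage:

```cpp
#include <optional>

// value() checks for emptiness and throws std::bad_optional_access.
int checked_get(const std::optional<int>& x) {
    return x.value();
}

// value_or() returns the supplied fallback when the optional is empty.
int get_or_default(const std::optional<int>& x, int fallback) {
    return x.value_or(fallback);
}

// Reports whether reading an empty optional through value() throws.
bool throws_when_empty() {
    try {
        static_cast<void>(checked_get(std::nullopt));
    } catch (const std::bad_optional_access&) {
        return true;
    }
    return false;
}
```

The safe spelling is the longer one, though, and nothing prevents a caller from reaching for *x instead.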
std::span<T> indexing
std::span<T> provides an ergonomic way to pass around a reference to a contiguous slice of memory and a length. This lets you easily write code that works over multiple different types; a std::span<uint8_t> can point to memory owned by a std::vector<uint8_t>, a std::array<uint8_t, N>, or even a raw pointer. Failure to correctly check bounds is a frequent source of security vulnerabilities, and in many senses span helps out with this by ensuring you always have a length handy.
Like all STL data structures, span’s operator[] method does not perform any bounds checks. This is regrettable, since operator[] is the most ergonomic and default way people use data structures. std::vector and std::array can at least theoretically be used safely because they offer an at() method which is bounds checked (in practice I’ve never seen this done, but you could imagine a project adopting a static analysis tool which simply banned calls to std::vector<T>::operator[]). span, as standardized in C++20, does not offer an at() method, or any other method which performs a bounds checked lookup.
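A project that wants checked indexing has to supply it. The following is a minimal sketch of such a helper, not anything the standard library provides; checked_at is a hypothetical name, and it is written generically over any range with a size() and operator[], which is assumed to cover std::span in C++20 as well as std::vector and std::array.

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <stdexcept>

// Hypothetical bounds-checked lookup: throws instead of reading out of range.
// The returned reference points into `range`, so it is only valid while the
// underlying storage is alive.
template <typename Range>
decltype(auto) checked_at(Range&& range, std::size_t index) {
    if (index >= std::size(range)) {
        throw std::out_of_range("checked_at: index out of range");
    }
    return range[index];
}

// Reports whether an out-of-range index is rejected rather than read.
bool out_of_range_is_caught() {
    std::array<int, 3> data{10, 20, 30};
    try {
        static_cast<void>(checked_at(data, 3));  // one past the end
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

A helper like this only protects the call sites that remember to use it; operator[] stays available and unchecked.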
Interestingly, both Firefox and Chromium’s backports of std::span do perform bounds checks in operator[], and thus they’ll never be able to safely migrate to std::span.
Conclusion
Modern C++ idioms introduce many changes which have the potential to improve security: smart pointers better express expected lifetimes, std::span ensures you always have a correct length handy, and std::variant provides a safer abstraction for unions. However, modern C++ also introduces some incredible new sources of vulnerabilities: lambda capture use-after-free, uninitialized-value optionals, and un-bounds-checked spans.
My professional experience writing relatively modern C++, and auditing Rust code (including Rust code that makes significant use of unsafe), is that the safety of modern C++ is simply no match for memory safe by default languages like Rust and Swift (or Python and JavaScript, though I find it rare in life to have a program that makes sense to write in either Python or C++).
There are significant challenges to migrating existing, large, C and C++ codebases to a different language – no one can deny this. Nonetheless, the question simply must be how we can accomplish it, rather than if we should try. Even with the most modern C++ idioms available, the evidence is clear that, at scale, it’s simply not possible to hold C++ right.
[1] I understood this to be referring to raw pointers, arrays-as-pointers, manual malloc/free, and other similar features. However, I think it’s worth acknowledging that, given that C++ explicitly incorporated C into its specification, in practice most C++ code incorporates some of these elements.