Buffers on the edge: Python and Rust

Sun, Oct 23, 2022

One of my least favorite kinds of bug is when two different systems are interacting and the result has bad behavior but it’s difficult to say which (if either!) system is at fault. This is one of those stories, about Python’s buffer protocol and Rust’s memory model.

Python buffer protocol

Python’s Buffer Protocol is a set of APIs which allow Python objects to expose their backing memory, so that 0-copy interoperability is possible between different data structures. For example, it can be used to seamlessly share memory between an image parsing library and numpy. It also supports various metadata to enable more advanced interoperability, such as multi-dimensional arrays and arrays of different types. For the rest of this post, though, we’re going to keep things simple and pretend a buffer is just a uint8_t * and a length.

If you have a Python object and want to obtain its buffer, you can do so with memoryview in Python or PyObject_GetBuffer in C. If you’re defining a class and want to expose a buffer, you can do so in Python by… actually you can’t, only classes implemented in C can implement the buffer protocol. To implement the buffer protocol in C, you provide the bf_getbuffer and bf_releasebuffer functions which are called to obtain a buffer from an object and when that buffer is being released, respectively.

Shifting gears slightly, let’s talk about data races. A data race is a type of race condition that happens when two threads access the same address without synchronization and at least one of the accesses is a write. Synchronization could mean a lock, or an explicit atomic operation. Data races are undefined behavior in C¹. Undefined behavior is Latin for “the code will often do what you want, but the compiler is free to do whatever it wants, including cause security vulnerabilities”. Undefined behavior should be avoided.
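To make the role of synchronization concrete, here is a minimal sketch in Rust (the language the second half of this post is about) of one thread writing a byte while another reads it. The function name concurrent_read_write is made up for illustration. Because both accesses are atomic, the outcome may vary from run to run, but it is never a data race:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

/// One thread writes a byte while another reads it. Both accesses are
/// atomic, so the reader may observe either value (a race condition),
/// but there is no data race and therefore no undefined behavior.
fn concurrent_read_write(shared: &AtomicU8) -> u8 {
    thread::scope(|s| {
        s.spawn(|| shared.store(1, Ordering::Relaxed));
        let reader = s.spawn(|| shared.load(Ordering::Relaxed));
        reader.join().expect("reader thread panicked")
    })
}

fn main() {
    let byte = AtomicU8::new(0);
    let seen = concurrent_read_write(&byte);
    // The reader saw either the old value or the new one.
    assert!(seen == 0 || seen == 1);
    // Once the scope has ended, both threads are joined and the write is visible.
    assert_eq!(byte.load(Ordering::Relaxed), 1);
}
```

Replacing the atomic with a plain u8 accessed through raw pointers would produce the same machine-level access pattern, but without the synchronization that makes it defined.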

Are data races possible with Python buffer objects? Sadly, yes. Imagine we have an object which implements the buffer protocol, and we request two buffers from it. This will give us two pointers to the same memory location. Now we spin up two threads, one reading from the buffer and the other writing to it. We’ve got ourselves a data race.

Perhaps you are thinking, “doesn’t the GIL prevent this?” If we are imagining pure Python code, then yes, the GIL would prevent this – it’s a lock, which means accesses are synchronized. But one of the goals of the buffer protocol is to allow C extensions to release the GIL while processing buffers. Therefore the reads and writes to our buffer could be coming from a C extension which has released the GIL – now we have no synchronization.

If we imagine that our reading and writing code comes from the same C extension, we might say that’s a bug in the extension. But what if they come from totally separate packages (the point of the buffer protocol!)? Neither side is buggy: it’s totally correct to either read from or write to a buffer. That’d mean the Python code which invoked them in parallel was buggy. But Python code (even buggy Python code!) is not supposed to be able to trigger C-level undefined behavior; that’s part of the point of using a high-level language like Python. It seems the design of the buffer protocol and C’s notion of data race undefined behavior may not play nicely together.

Rust

Let’s talk about Rust. In Rust a sequence of objects in memory, and their length, are represented with a slice. A slice of bytes is written &[u8], or &mut [u8] for a mutable slice. Rust implements a simple (but powerful!) rule: references may be mutable XOR shared. This means that if a mutable reference to memory exists then that reference must be the only one that exists; conversely, if there are multiple references to a piece of memory then they must all be immutable references. This has basic implications: if you have a &[u8], it must be the case that no one is mutating it behind your back, because there can’t be any mutable references. This is different from C/C++’s notion of const, which means “this reference may not be used for mutation, but other mutable references may exist”.
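A minimal sketch of the rule in action, using only the standard library. The function names checksum and fill are invented for illustration:

```rust
/// Sums a byte slice. While `data` is borrowed as `&[u8]`, the borrow
/// checker guarantees that nothing can mutate it.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// Overwrites every byte. Holding `&mut [u8]` guarantees that this is the
/// only live reference to the memory.
fn fill(data: &mut [u8], value: u8) {
    for b in data.iter_mut() {
        *b = value;
    }
}

fn main() {
    let mut buf = vec![1u8, 2, 3];

    // Any number of shared references may coexist.
    let a = &buf[..];
    let b = &buf[..];
    assert_eq!(checksum(a), checksum(b));

    // A mutable reference is fine once the shared ones are no longer used.
    fill(&mut buf, 7);
    assert_eq!(buf, vec![7, 7, 7]);

    // Uncommenting the lines below is rejected at compile time (error E0502),
    // because `shared` would still be live while `buf` is mutably borrowed:
    //
    // let shared = &buf[..];
    // fill(&mut buf, 0);
    // println!("{}", checksum(shared));
}
```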

Rust also introduces a notion of “soundness”, which is a stronger notion than C’s undefined behavior. Most C undefined behavior is defined by what happens at runtime. Soundness is about how a function could be used, regardless of how it actually is used. A function is sound if it’s impossible to trigger undefined behavior by calling it, whatever combination of arguments it is given. Conversely, a function is unsound if it’s a safe function (i.e. not declared unsafe fn) and it’s possible to trigger undefined behavior with it. The Rust community considers all instances of unsoundness to be security issues, even if they’re improbable in practice. Using unsafe to violate the mutable XOR shared rule is undefined behavior, and thus unsound.
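As an illustration of the definition, here are three versions of the same tiny accessor. The names are made up, and the example uses out-of-bounds reads rather than aliasing because they are easier to see at a glance:

```rust
/// UNSOUND: this is a safe function, yet safe callers can trigger undefined
/// behavior simply by passing `index >= data.len()`.
fn byte_at_unsound(data: &[u8], index: usize) -> u8 {
    unsafe { *data.get_unchecked(index) }
}

/// Sound: every possible argument is handled without undefined behavior.
fn byte_at(data: &[u8], index: usize) -> Option<u8> {
    if index < data.len() {
        // SAFETY: `index` was bounds-checked on the line above.
        Some(unsafe { *data.get_unchecked(index) })
    } else {
        None
    }
}

/// Not unsound by the definition above: the `unsafe fn` marker moves the
/// proof obligation to the caller.
///
/// # Safety
/// `index` must be less than `data.len()`.
unsafe fn byte_at_unchecked(data: &[u8], index: usize) -> u8 {
    // SAFETY: guaranteed by this function's contract.
    unsafe { *data.get_unchecked(index) }
}

fn main() {
    let data = [10u8, 20, 30];
    // In-bounds calls behave identically for all three versions...
    assert_eq!(byte_at_unsound(&data, 1), 20);
    assert_eq!(byte_at(&data, 1), Some(20));
    // SAFETY: 1 < data.len().
    assert_eq!(unsafe { byte_at_unchecked(&data, 1) }, 20);
    // ...but only `byte_at` can be handed an arbitrary index from safe code.
    assert_eq!(byte_at(&data, 99), None);
}
```

Note that byte_at_unsound is unsound even though this program never calls it with a bad index: what matters is that safe code could.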

Putting it all together

If we have a Python buffer, and we want to represent the data in Rust, how should we do so? The natural answer would be a &[u8], but as you may have picked up, in the face of the possibility of concurrent writes, this would be unsound. Similarly, an &mut [u8] is unacceptable because the Python buffer protocol provides no assurances that only one mutable buffer is handed out at a time. Importantly, because Rust’s notion of unsoundness is a source-code level concern, the code would still be unsound even if Python code never actually creates multiple buffers in this fashion.

pyo3 is a popular Rust library for binding to the CPython C-API. Its solution to this is interior mutability, which is a pattern in Rust code where structures safely encapsulate mutation behind shared references. In pyo3, a Python buffer’s contents are represented as &[ReadOnlyCell<u8>]. This is safe and sound, but unfortunately struggles with interoperability.
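To show what interior mutability buys here, the sketch below uses the standard library’s Cell as a stand-in for pyo3’s ReadOnlyCell (an assumption made so the example runs without pyo3; the two are different types). The function bump_first is invented for illustration:

```rust
use std::cell::Cell;

/// Writes through one shared view and reads the result through another.
/// Both parameters may alias the same bytes, and that is fine: `Cell`
/// never hands out a plain `&u8` that could be invalidated.
fn bump_first(writer: &[Cell<u8>], reader: &[Cell<u8>]) -> u8 {
    let before = reader[0].get();
    writer[0].set(before.wrapping_add(1));
    reader[0].get()
}

fn main() {
    let mut storage = [1u8, 2, 3];
    // Safe conversion from `&mut [u8]` to `&[Cell<u8>]`.
    let cells: &[Cell<u8>] = Cell::from_mut(&mut storage[..]).as_slice_of_cells();

    // Two shared views of the same bytes, one used for writing.
    let view_a = cells;
    let view_b = cells;
    assert_eq!(bump_first(view_a, view_b), 2);
    assert_eq!(storage, [2, 2, 3]);
}
```

Cell is not Sync, so this only models aliasing within a single thread; it illustrates the type-level idea rather than the cross-thread case.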

The challenge is that if you want to pass some bytes to a Rust library to parse them (or do any other processing for that matter), the library almost certainly expects a &[u8], and there’s no way to turn a &[ReadOnlyCell<u8>] into a &[u8] safely, without allocating and copying. And of course, the whole point of the Python buffer protocol is to avoid these sorts of inefficiencies.
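A rough sketch of that copy, again using the standard library’s Cell in place of pyo3’s ReadOnlyCell. Both parse_length_prefixed (a toy stand-in for a third-party parser) and snapshot are hypothetical names:

```rust
use std::cell::Cell;

/// Toy stand-in for a third-party parser. Like most Rust libraries, it
/// expects a plain `&[u8]`: one length byte followed by that many bytes.
fn parse_length_prefixed(input: &[u8]) -> Option<&[u8]> {
    let (&len, rest) = input.split_first()?;
    rest.get(..usize::from(len))
}

/// The safe bridge: copy each byte out of its cell into an owned `Vec<u8>`.
/// This costs an allocation and a copy proportional to the buffer size.
fn snapshot(cells: &[Cell<u8>]) -> Vec<u8> {
    cells.iter().map(Cell::get).collect()
}

fn main() {
    let mut storage = [3u8, b'a', b'b', b'c', b'd'];
    let cells = Cell::from_mut(&mut storage[..]).as_slice_of_cells();

    let owned = snapshot(cells);
    // Later writes through the cells cannot affect the copy.
    cells[1].set(b'z');
    assert_eq!(parse_length_prefixed(&owned), Some(&b"abc"[..]));
}
```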

Therefore, the regrettable conclusion is that, right now, there is no way to have all three of: efficiency, interoperability, and soundness.

A better future?

That’s the current state of the world. What could we do to improve things?

The simplest answer I can come up with is for Python’s buffer protocol to implement Rust’s mutable XOR shared semantics. Providing such semantics would also address the possibility of undefined behavior from C code. It could further be done in a backwards compatible way by providing a flag that allows implementors of the buffer protocol to signal that they provide these semantics – and thus can safely be represented as &[u8]. In fact, an implementor of the buffer protocol could provide these semantics today; the only problem is that code requesting a buffer would have no way of knowing that they were adhered to.
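To give a feel for what such semantics might look like when enforced at run time, here is a sketch in Rust that uses the standard library’s RwLock as the bookkeeping. Everything here is hypothetical: Exporter, get_shared, and get_exclusive are invented names, and no such flag or API exists in the buffer protocol today.

```rust
use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Hypothetical exporter that enforces mutable XOR shared at run time.
struct Exporter {
    data: RwLock<Vec<u8>>,
}

impl Exporter {
    fn new(data: Vec<u8>) -> Self {
        Exporter { data: RwLock::new(data) }
    }

    /// Like requesting a read-only buffer: refused while a writer exists.
    fn get_shared(&self) -> Option<RwLockReadGuard<'_, Vec<u8>>> {
        self.data.try_read().ok()
    }

    /// Like requesting a writable buffer: refused while any other buffer
    /// is outstanding.
    fn get_exclusive(&self) -> Option<RwLockWriteGuard<'_, Vec<u8>>> {
        self.data.try_write().ok()
    }
}

fn main() {
    let exporter = Exporter::new(vec![1, 2, 3]);

    let r1 = exporter.get_shared().expect("no writer yet");
    let r2 = exporter.get_shared().expect("readers may coexist");
    // While only readers exist, viewing the bytes as `&[u8]` is sound...
    let bytes: &[u8] = &r1;
    assert_eq!(bytes, &[1u8, 2, 3][..]);
    // ...and a writable buffer is refused.
    assert!(exporter.get_exclusive().is_none());

    // Dropping the guards plays the role of releasing the buffers.
    drop(r1);
    drop(r2);

    let mut w = exporter.get_exclusive().expect("no buffers outstanding");
    w[0] = 9;
    // While the writer exists, readers are refused.
    assert!(exporter.get_shared().is_none());
    drop(w);
    assert_eq!(exporter.get_shared().unwrap()[0], 9);
}
```

The design choice worth noting is that requests fail rather than block, so a consumer that cannot get the access it needs can fall back to copying.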

Perhaps there are other solutions that address this problem too! I’m very excited to hear other people’s thoughts on how we can address this. As Python extension modules written in Rust become more common, finding an efficient, interoperable, and sound way of handling Python buffers in Rust will become important.

¹ The exact language of the spec is: “The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.” C11 (ISO/IEC 9899:2011) section 5.1.2.4