Concurrency and performance

Code that is easier to reason about, and whose state cannot be changed, matters even more in multithreaded scenarios. Immutability prevents so-called anomalies (or side effects), where the state of an object is changed by another thread without the dependent thread knowing about it.

Locks exist to coordinate changes to the state of a shared object: they secure critical sections, which only a single thread can enter at any given time. Other threads have to "line up" and wait for the lock to be released before they can access that section as well. In Rust, this kind of lock is called a mutex.
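
As a minimal sketch of such a critical section, the following uses the standard library's Arc and Mutex to let several threads increment one shared counter. The function name count_with_mutex and its parameters are made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add 1 to a shared counter
/// `increments` times, and returns the final value.
/// (Hypothetical helper, for illustration only.)
fn count_with_mutex(workers: usize, increments: usize) -> usize {
    // Arc shares ownership across threads; Mutex guards the value.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // Critical section: only one thread holds the guard at a
                    // time. The lock is released when `guard` goes out of scope.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                }
            })
        })
        .collect();

    // Wait for all workers to finish.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling count_with_mutex(4, 1000) should yield 4000, since every increment happens while the lock is held. The price is that the four threads take turns instead of running the increment in parallel.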

Locks and mutex zones are problematic for the following reasons:

  • They have to be acquired and released in the right order; otherwise, threads can deadlock.
  • A thread can panic inside a mutex zone, leaving the protected state half-updated.
  • They are hard to integrate seamlessly into the part of the program that they protect.
  • They are a bottleneck for performance.
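
The panic case from the list above can be sketched with the standard library alone. std::sync::Mutex marks itself as poisoned when a thread panics while holding the guard, and later lock() calls return an error that still gives access to the data. The function name poison_demo is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Lets a worker panic inside the critical section and reports
/// (whether the mutex is poisoned, length of the protected Vec).
/// (Hypothetical helper, for illustration only.)
fn poison_demo() -> (bool, usize) {
    let shared = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut guard = shared.lock().unwrap();
            guard.push(4);
            // The thread dies while it still holds the lock.
            panic!("worker failed inside the critical section");
        })
    };

    // join() returns Err because the worker panicked.
    assert!(worker.join().is_err());

    let poisoned = shared.is_poisoned();

    // lock() now returns Err; into_inner() recovers the guard anyway,
    // but the caller has to decide whether the data can still be trusted.
    let len = match shared.lock() {
        Ok(guard) => guard.len(),
        Err(err) => err.into_inner().len(),
    };

    (poisoned, len)
}
```

In this sketch the push completed before the panic, so the data happens to be intact. In general, a poisoned mutex only signals that an update may have been interrupted halfway.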

Immutability is a simple way to avoid all of these. Many immutable data structure crates are available, including Rust Persistent Data Structures (RPDS) (https://crates.io/crates/rpds), which provides persistent data structures that use a copy-on-write approach with versioning to capture state changes. Since these changes build on top of each other, threads can read one consistent object state at a time without having to wait or acquire a lock.
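
To illustrate the idea of versions that build on top of each other, here is a hand-rolled persistent stack based on Arc. It is a simplified sketch and not the RPDS API; the names Node, PStack, and read_versions_in_parallel are invented for this example.

```rust
use std::sync::Arc;
use std::thread;

// A node is never modified after creation, so it can be shared freely.
enum Node<T> {
    Nil,
    Cons(T, Arc<Node<T>>),
}

/// One immutable version of the stack.
struct PStack<T> {
    head: Arc<Node<T>>,
}

impl<T> PStack<T> {
    fn new() -> Self {
        PStack { head: Arc::new(Node::Nil) }
    }

    /// Returns a new version with `value` on top; `self` stays untouched
    /// and the new version shares all existing nodes with it.
    fn push(&self, value: T) -> Self {
        PStack { head: Arc::new(Node::Cons(value, Arc::clone(&self.head))) }
    }

    /// Returns the version below the top element, if there is one.
    fn pop(&self) -> Option<Self> {
        match &*self.head {
            Node::Nil => None,
            Node::Cons(_, tail) => Some(PStack { head: Arc::clone(tail) }),
        }
    }

    /// Copies the elements out, top first.
    fn to_vec(&self) -> Vec<T>
    where
        T: Clone,
    {
        let mut out = Vec::new();
        let mut node = &self.head;
        while let Node::Cons(value, tail) = &**node {
            out.push(value.clone());
            node = tail;
        }
        out
    }
}

/// Two threads read two different versions without any lock.
fn read_versions_in_parallel() -> (Vec<i32>, Vec<i32>) {
    let v1 = PStack::new().push(1).push(2);
    let v2 = v1.push(3); // v2 reuses the nodes of v1

    let t1 = thread::spawn(move || v1.to_vec());
    let t2 = thread::spawn(move || v2.to_vec());
    (t1.join().unwrap(), t2.join().unwrap())
}
```

Each push allocates exactly one node, so older versions remain valid and cheap to keep around. A production crate adds more sophisticated structures on top of the same sharing principle.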

Lock-free data structures are a specialized kind of data structure that is very challenging to implement. They use atomic operations to modify important parts (for example, the head pointer in a stack) and thereby achieve excellent performance without the caveats of locking.

Persistent data structures aim to be as efficient and as convenient to update as their traditional, mutable counterparts, while being better suited for concurrency. This is achieved by keeping the original data immutable and storing versioned change sets.
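
The atomic operation at the heart of lock-free structures is usually a compare-and-swap retry loop. A full lock-free stack is beyond a short example, so the sketch below applies the same loop to a plain counter instead of a head pointer. The names cas_add and count_lock_free are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Adds `delta` to `cell` without a lock: read the current value, compute
/// the new one, and publish it only if nobody else changed it in between.
fn cas_add(cell: &AtomicUsize, delta: usize) {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        match cell.compare_exchange_weak(
            current,
            current + delta,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            // Another thread won the race (or the weak variant failed
            // spuriously): retry with the value that was actually observed.
            Err(observed) => current = observed,
        }
    }
}

/// Spawns `workers` threads that each call `cas_add` `increments` times.
fn count_lock_free(workers: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    cas_add(&counter, 1);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    counter.load(Ordering::Acquire)
}
```

For a simple counter, AtomicUsize::fetch_add would be the idiomatic choice; the explicit loop is shown only because a stack's head pointer is updated with the same read, compute, and compare-and-swap pattern.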

The idea of immutable data is best thought of in the context of functional programming. Functional programming is built on the principle of mathematical functions. A function is a relation between two sets of data (typically X and Y), where each element of X maps to exactly one element of Y through the function f (in short: f(x) = y, where x ∈ X and y ∈ Y).

As a consequence, the input data, X, is not changed to produce the output data, Y, which makes it easy to run the function f in parallel. The downside is the increased cost at runtime: regardless of the operation, whether it only flips a bit of the input data or overhauls everything, the result is always a full copy.
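
A short sketch of this property: because f only reads its input, several threads can borrow the same slice at once and each produce a part of the output. The example assumes Rust 1.63 or later for std::thread::scope; the function f and the helper parallel_map are made up for illustration.

```rust
use std::thread;

/// A pure function: the result depends only on `x`, and nothing is mutated.
fn f(x: u32) -> u32 {
    x * x + 1
}

/// Applies `f` to every element of `input`, using up to `parts` threads.
/// The input is only borrowed immutably; the output is a freshly built Vec.
fn parallel_map(input: &[u32], parts: usize) -> Vec<u32> {
    let parts = parts.max(1);
    // Ceiling division, with a minimum chunk length of 1.
    let chunk_len = ((input.len() + parts - 1) / parts).max(1);

    thread::scope(|scope| {
        // Each thread gets a read-only view of its own chunk.
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&x| f(x)).collect::<Vec<u32>>())
            })
            .collect();

        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect::<Vec<u32>>()
    })
}
```

No lock is needed anywhere, because no thread writes to shared data. The cost mentioned above is also visible: the result is a complete new vector, even if f changes very little.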

To reduce this inefficiency, the Gang of Four's decorator pattern can be applied to X's iterator: the decorators stack up only the changes and execute them on every call, which reduces runtime complexity and avoids multiple copies of the output data. A remaining problem is that large inputs and outputs still require a lot of memory. This is a tricky situation, and the only way around it is for the programmer to think thoroughly about how to decompose the function better.
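
Rust's iterator adapters can be read as one instance of this decorator idea: each adapter wraps the previous iterator and does its work only when an item is requested. The following is a minimal sketch; the function name scaled_evens is hypothetical.

```rust
/// Stacks two transformations on top of the input's iterator.
/// Nothing is computed or copied until the caller pulls items.
/// (Hypothetical helper, for illustration only.)
fn scaled_evens(input: &[i64]) -> impl Iterator<Item = i64> + '_ {
    input
        .iter()
        .filter(|&&x| x % 2 == 0) // first decorator: keep even numbers
        .map(|&x| x * 10) // second decorator: scale what is left
}
```

A caller decides how much of the result it needs, for example scaled_evens(&data).take(2) to look at only the first two items, or scaled_evens(&data).sum::<i64>() to reduce everything to one number without building an intermediate collection. The input stays unchanged either way.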