A proof of concept implementation of cyclic data structures in stable, safe, Rust.
This demonstrates the combined power of the static-rc crate and the ghost-cell crate.
A simple combination of 2 facts about data structures:
- They are pervasive and often used to store user-controlled input.
- They are extremely error-prone, juggling lifetime and aliasing concerns at the same time.
The combination of this two factors means that a single logical bug can result in Undefined Behavior, opening up the door to a multitude of attacks.
There are typically two obstacles, as mentioned: lifetime, and aliasing.
The state of the art recommendation today is one of:
- A
Vecwith indices instead of pointers. - An arena to handle lifetime, and some form of cell to handle aliasing.
Rc+RefCell.
Vec based solutions will blow up the algorithmic complexity of splice/split operations. In a linked list, splicing in
another list, or splitting a part of the list, are both O(1) operations, but transferring N elements from one Vec to
another is at least an O(N) operation.
Arena based solution will allow implementing all operations with their expected algorithmic complexity, but this comes at the cost of never being able to reclaim memory from the arena as long as a single data structure exists which references the arena.
Finally Rc + RefCell suffer from increased memory consumption and runtime overhead:
Rctypically embeds 2usizecounters: the strong and weak count.- An
Rcimplementation optimized for data structures could ditch the weak count.
- An
RefCelltypically embeds anisizecounter: to count the number of readers and writers.
Maybe.
This crate aims at refining the Rc + RefCell solution by substituting both of them with StaticRc and GhostCell
respectively:
GhostCellis a compile-time cell, splitting the aliasing compile-time checks to the zero-sizedGhostToken.StaticRcis a compile-time reference counted pointer, so is mostly free at run-time.- One exception: joining 2
StaticRcrequires checking that they both refer to the same allocation.
- One exception: joining 2
On the face of it, this seems fairly interesting:
- No memory overhead.
- No run-time overhead.
There's a catch, though.
GhostCell is a straightforward replacement, StaticRc is a bit more complicated. Typical implementations based on
Rc will clone it whenever necessary -- but cloning a StaticRc is impossible.
If you have split your StaticRc in halves, that is, you have 2 instances of StaticRc<T, 1, 2>, then you only have
those two instances. You can split one, but it changes its type. This makes temporarily holding onto one more
instance of a reference-counted pointer... impossible. There's no temporary instance.
To solve this problem, the nodes of the TripodList and TripodTree contain one extra instance of a pointer, compared
to the typical implementation. That is, a node from a TripodList contains 3 pointers, and one from a TripodTree
contains 4 pointers.
That's the same memory overhead as a special-purpose Rc with no weak count. And mostly the same run-time overhead too
as deploying the tripod pointer is a memory write, just as cloning a Rc, and retracting the tripod pointer is
another memory write, just as destroying a Rc. It may be slightly more efficient, and linear logic keeps one honest...
It may be possible to avoid this overhead altogether. LinkedList avoids it... and uses the experimental
GhostCursor instead. It's not clear whether GhostCursor is quite sufficient to implement all the functionality of
a list, though. And then there's the experimental issue...
Mostly?
The core functionality of GhostCell and StaticRc are really strong contenders:
GhostCell(the paper) was developed by Ralf Jung and co, and proven safe. The implementation hopefully is also safe.StaticRcis really basic reference-counting, just at compile-time.
The core functionality, though, is also slightly insufficient. This forces this crate to use extra, experimental functionality from its 2 core crates, and those are much less proven:
static_rc::lift*is at the core of this crate. None of the collections are too useful without it. Its core idea looks sound (scary eh?), though even if it is, the implementation may not be...GhostCursoris even more sketchy, safety-wise. If safe, it may allow writing the collections without an extra tripod pointer... May.
Yes.
At the very least, this crate is a proof-of-concept that GhostCell is really usable.
To the best of our knowledge, this is the first crate making extensive use of GhostCell, and there were some questions
as to whether it was possible to express algorithms with GhostCell. Having implemented a complete linked list and
binary tree around GhostCell I can confirm that it is practical enough.
Replacing RefCell by GhostCell means avoiding both the memory and run-time overhead of RefCell. That's a
significant advantage.
There are some ergonomic downsides, though:
- Using lifetime as a brand means that all the entire program needs to be wrapped within a closure.
- Having to pass the extra
GhostTokeneverywhere means that many traits -- such asDebug-- cannot be implemented on the collections.
Those were all foreseen, so no surprise there, but of course it's still annoying.
This crate also proves that StaticRc works, although it is perhaps not the best tool for cyclic data structures.
Finally, this crate provides a test bed for both of the above.
After putting them through their paces, I am happy to report that MIRI finds no issue when running the full test-suite.
One interesting avenue of investigation is automatic translation of the current code, stripping out StaticRc and/or
GhostCell.
As long as:
- The current code is safe -- meaning, the 2 crates used are safe.
- The translation preserves safety.
Then this opens up the ability to write entirely safe code and automatically translate it into simpler (though unsafe) code -- not unlike the work that compilers do.
Exciting, isn't it?
There is one difficulty, though, it's unclear whether the remove/split operations can be translated automatically:
- Merging 2 pools of nodes -- through
splice-- is trivially safe: the result is a coarser grain. - Splitting 1 pool of nodes into two, however, is not: the result is finer grain.
Today, the reference-counting and borrowing works globally, regardless of which pools of nodes a node belongs to.
However translating StaticRc<GhostCell<T>, N, D> into just NonNull<T> breaks this assumptions, and functional bugs
such as accidentally keeping a link between nodes across pools could lead to aliasing/lifetime issues.
Note: and even though tracing could solve the issue, it would be linear in the number of elements to trace at best, so would not make sense algorithmically.
And thanks for reading.