@@ -29,6 +29,47 @@ use ringbuf::RingBuf;
/// A map based on a B-Tree.
+ ///
+ /// B-Trees represent a fundamental compromise between cache-efficiency and actually minimizing
+ /// the amount of work performed in a search. In theory, a binary search tree (BST) is the optimal
+ /// choice for a sorted map, as a perfectly balanced BST performs the theoretical minimum number of
+ /// comparisons necessary to find an element (log<sub>2</sub>n). However, in practice the way this
+ /// is done is *very* inefficient for modern computer architectures. In particular, every element
+ /// is stored in its own individually heap-allocated node. This means that every single insertion
+ /// triggers a heap-allocation, and every single comparison is likely to be a cache-miss. Since
+ /// both are notably expensive in practice, we are forced to at the very least reconsider
+ /// the BST strategy.
+ ///
+ /// A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing
+ /// this, we reduce the number of allocations by a factor of B, and improve cache efficiency in
+ /// searches. However, this does mean that searches will have to do *more* comparisons on average.
+ /// The precise number of comparisons depends on the node search strategy used. For optimal cache
+ /// efficiency, one could search the nodes linearly. For optimal comparisons, one could search
+ /// the node using binary search. As a compromise, one could also perform a linear search
+ /// that initially only checks every i<sup>th</sup> element for some choice of i.
+ ///
+ /// Currently, our implementation simply performs naive linear search. This provides excellent
+ /// performance on *small* nodes of elements which are cheap to compare. However, in the future we
+ /// would like to further explore choosing the optimal search strategy based on the choice of B,
+ /// and possibly other factors. Using linear search, searching for a random element is expected
+ /// to take O(B log<sub>B</sub>n) comparisons, which is generally worse than a BST. In practice,
+ /// however, performance is excellent. `BTreeMap` is able to readily outperform `TreeMap` under
+ /// many workloads, and is competitive where it doesn't. `BTreeMap` also generally *scales* better
+ /// than `TreeMap`, making it more appropriate for large datasets.
+ ///
+ /// However, `TreeMap` may still be more appropriate to use in many contexts. If elements are very
+ /// large or expensive to compare, `TreeMap` may be the better choice. It won't allocate any
+ /// more space than is needed, and will perform the minimal number of comparisons necessary.
+ /// `TreeMap` also provides much better performance stability guarantees. Generally, very few
+ /// changes need to be made to update a BST, and two updates are expected to take about the same
+ /// amount of time on roughly equal sized BSTs. However, a B-Tree's performance is much more
+ /// amortized. If a node is overfull, it must be split into two nodes. If a node is underfull, it
+ /// may be merged with another. Both of these operations are relatively expensive to perform, and
+ /// it's possible to force one to occur at every single level of the tree in a single insertion or
+ /// deletion. In fact, a malicious or otherwise unlucky sequence of insertions and deletions can
+ /// force this degenerate behaviour to occur on every operation. While the total amount of work
+ /// done on each operation isn't *catastrophic*, and *is* still bounded by O(B log<sub>B</sub>n),
+ /// an operation is certainly much slower when this happens.
#[deriving(Clone)]
pub struct BTreeMap<K, V> {
    root: Node<K, V>,
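The trade-off between node search strategies described in the doc comment can be illustrated with a small sketch. This is not the code from this change: it is written in current Rust rather than the dialect in the diff, and `SearchResult`, `linear_search`, and `binary_search` are hypothetical names chosen for illustration. Each function searches the sorted keys of a single node and also reports how many comparisons it made. The strided linear search mentioned as a compromise is not covered here.

```rust
use std::cmp::Ordering;

/// Outcome of searching one node (hypothetical type for this sketch):
/// either the key sits at an index in this node, or the search should
/// continue in the child at the given edge index.
#[derive(Debug, PartialEq)]
pub enum SearchResult {
    Found(usize),
    GoDown(usize),
}

/// Naive linear search over a node's sorted keys.
/// Returns the outcome together with the number of comparisons performed.
pub fn linear_search<K: Ord>(keys: &[K], key: &K) -> (SearchResult, usize) {
    let mut comparisons = 0;
    for (i, k) in keys.iter().enumerate() {
        comparisons += 1;
        match key.cmp(k) {
            Ordering::Equal => return (SearchResult::Found(i), comparisons),
            // The first key greater than the target marks the edge to follow.
            Ordering::Less => return (SearchResult::GoDown(i), comparisons),
            Ordering::Greater => {}
        }
    }
    // The target is greater than every key: follow the last edge.
    (SearchResult::GoDown(keys.len()), comparisons)
}

/// Binary search over the same keys: fewer comparisons, but the memory
/// accesses jump around the array instead of scanning it front to back.
pub fn binary_search<K: Ord>(keys: &[K], key: &K) -> (SearchResult, usize) {
    let mut comparisons = 0;
    let (mut lo, mut hi) = (0, keys.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        comparisons += 1;
        match key.cmp(&keys[mid]) {
            Ordering::Equal => return (SearchResult::Found(mid), comparisons),
            Ordering::Less => hi = mid,
            Ordering::Greater => lo = mid + 1,
        }
    }
    (SearchResult::GoDown(lo), comparisons)
}
```

For a node holding the keys `[10, 20, 30, 40, 50, 60, 70]`, looking up `60` takes 6 comparisons with `linear_search` and 2 with `binary_search`. Both return `Found(5)`.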
@@ -93,6 +134,8 @@ impl<K: Ord, V> BTreeMap<K, V> {
     }

     /// Makes a new empty BTreeMap with the given B.
+    ///
+    /// B cannot be less than 2.
     pub fn with_b(b: uint) -> BTreeMap<K, V> {
         assert!(b > 1, "B must be greater than 1");
         BTreeMap {
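The restriction on B follows from the node size range given in the doc comment (B-1 to 2B-1 elements per node). A minimal sketch of that relationship, in current Rust, is below. `node_key_bounds` is a hypothetical helper that does not appear in this change; it only mirrors the assertion in `with_b`.

```rust
/// Hypothetical helper: the (minimum, maximum) number of elements a node
/// may hold for a given B, using the B-1 to 2B-1 range from the docs.
pub fn node_key_bounds(b: usize) -> (usize, usize) {
    // Same check as in `with_b`: with B = 1 the minimum would be 0,
    // so the range would permit empty nodes.
    assert!(b > 1, "B must be greater than 1");
    (b - 1, 2 * b - 1)
}
```

With the smallest permitted value, `node_key_bounds(2)` returns `(1, 3)`, and `node_key_bounds(1)` panics.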