-
Notifications
You must be signed in to change notification settings - Fork 321
Description
I'm writing a string-interning data structure in a memory-constrained environment. The strings effectively have 'static lifetime, so I don't need to support deallocation. Therefore, I just append null-terminated strings to a Vec<u8> buffer and identify a string by its (u32) offset into this buffer.
Then, we can query a string by its identifier by starting at the given offset and finding the null byte. But, we'd also like to see if a string is already in the buffer and get its offset if so. I'd like to use hashbrown for maintaining this relation from String to u32 without duplicating each string in memory twice.
The raw entry API is almost there for supporting this. When looking up a string, we can compute its hash and then use RawEntryBuilder::from_hash.
struct Offset(u32);
impl Hash for Offset {
fn hash<H: Hasher>(&self, h: &mut H) {
panic!("We shouldn't be hashing the offsets directly!");
}
}
struct InternedTable {
buf: Vec<u8>,
index: HashMap<Offset, ()>,
}
impl InternedTable {
fn lookup(&self, key: &str) -> Option<u32> {
let key_hash = ...;
let result = self.index.raw_entry().from_hash(key_hash, |offset| {
let offset = offset.0 as usize;
key.as_bytes() == &self.buf[offset...min(offset + key.len(), buf.len())]
});
result.map(|(k, v)| k)
}
}However, doing the same for insertions doesn't quite work. The essential issue is that the raw entry API only lets us control the hash of a single key, and inserting may require recomputing hashes of other keys when resizing.
impl InternedTable {
fn insert(&mut self, key: &str) -> u32 {
let key_hash = ...;
let entry = self.index.raw_entry_mut().from_hash(key_hash, |offset| { ... });
let vacant_entry = match entry {
// String is already in the table, so return its existing offset.
RawEntryMut::Occupied(entry) => return *entry.key(),
RawEntryMut::Vacant(entry) => entry,
};
let offset = self.buf.len() as u32;
self.buf.extend(key.as_bytes());
self.buf.append(0u8);
// Panics here from `Hash` implementation called from `RawTable::resize`
vacant_entry.insert_hashed_nocheck(key_hash, Offset(offset), ());
offset
}
}Today, I get around this by sneaking in a pointer to buf via a thread-local to the Hash implementation. I could also stash a Rc<RefCell<...>> to the buffer on the hash builder for the same effect. However, both of these solutions are ugly, and given how close it is, it feels to me like the RawEntry API should support this use case.
Let me know if accommodating this is acceptable and I can submit a PR. It looks like it should be pretty easy since RawTable already takes in a hasher: impl Fn(&T) -> u64 for many of its methods.