Feature: Binary Serialization for Index and Index Player Tool #239
Add Binary Serialization for Index and Index Player Tool
Overview
This PR introduces binary serialization capabilities for the `Index` structure, along with a web-based debugging tool for exploring FSM indices.
New Features
1. Index Serialization (`save()` and `load()`)
Added two new methods to the `Index` structure:
`save(path: &Path) -> Result<()>`
Serializes the Index to a compressed binary file. This method:
- Compresses the output with gzip (via `flate2`)
Usage:
`load(path: &Path) -> Result<Index>`
Deserializes an Index from a compressed binary file. This static method:
Usage:
Benefits:
2. Binary Format Specification
The serialization uses a custom binary format optimized for FSM representation:
Format Structure (uncompressed)
- `vocab_size`
- `eos_token_id`
- `initial_state_id`
- `num_final_states`
- `final_states`
- `index_type`
- `num_states`
- `state_id`
- `num_transitions`
- `token_id`
- `next_state_id`
Key Features:
- `index_type` field allows for future format extensions
Full specification available in `INDEX_BINARY_FORMAT.md`.
3. Index Player Tool (`tools/index_player.html`)
A standalone HTML/CSS/JavaScript tool for debugging and exploring FSM indices.
Purpose
The Index Player serves as a debug and explanation tool that allows developers to:
How It Works
The tool is a fully static, single-file application that runs entirely in the browser:
- Load Index File: Upload a binary `.outlines` file created with `Index::save()`
  - Decompressed in the browser with the `DecompressionStream` API
- Load Vocabulary (Optional): Upload a `vocab.json` file from HuggingFace
- Interactive Exploration:
- Visual Feedback:
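To make the exploration step more concrete, here is a minimal Rust sketch of stepping through an FSM index one token at a time, which is presumably the kind of walk the tool visualizes. `ToyIndex`, `allowed_tokens`, `next_state`, `is_final`, and `walk` are hypothetical names used only for illustration. They are not taken from this PR, and the tool itself is written in JavaScript.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for a loaded index; NOT the crate's actual `Index` type.
struct ToyIndex {
    initial_state: u32,
    final_states: HashSet<u32>,
    /// state_id -> (token_id -> next_state_id)
    transitions: BTreeMap<u32, BTreeMap<u32, u32>>,
}

impl ToyIndex {
    /// Token ids with an outgoing transition from `state`, in ascending order.
    fn allowed_tokens(&self, state: u32) -> Vec<u32> {
        self.transitions
            .get(&state)
            .map(|edges| edges.keys().copied().collect::<Vec<u32>>())
            .unwrap_or_default()
    }

    /// State reached by consuming `token` in `state`, if that transition exists.
    fn next_state(&self, state: u32, token: u32) -> Option<u32> {
        self.transitions.get(&state)?.get(&token).copied()
    }

    /// Whether `state` is one of the accepting states.
    fn is_final(&self, state: u32) -> bool {
        self.final_states.contains(&state)
    }
}

/// Follows `tokens` from the initial state and returns every visited state,
/// or `None` as soon as a token is not allowed in the current state.
fn walk(index: &ToyIndex, tokens: &[u32]) -> Option<Vec<u32>> {
    let mut state = index.initial_state;
    let mut path = vec![state];
    for &token in tokens {
        state = index.next_state(state, token)?;
        path.push(state);
    }
    Some(path)
}

/// Small made-up automaton: 0 --5--> 1, 0 --7--> 0, 1 --9--> 2, with 2 final.
fn demo_index() -> ToyIndex {
    let mut transitions = BTreeMap::new();
    transitions.insert(0, BTreeMap::from([(5, 1), (7, 0)]));
    transitions.insert(1, BTreeMap::from([(9, 2)]));
    ToyIndex {
        initial_state: 0,
        final_states: HashSet::from([2]),
        transitions,
    }
}

fn main() {
    let index = demo_index();
    println!("allowed from state 0: {:?}", index.allowed_tokens(0));
    match walk(&index, &[7, 5, 9]) {
        Some(path) => {
            let last = *path.last().unwrap();
            println!("path: {:?}, final: {}", path, index.is_final(last));
        }
        None => println!("token sequence rejected"),
    }
}
```

Using ordered maps keeps the list of allowed tokens stable between runs, which is convenient when the output is shown to a person rather than consumed by a program.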
Screenshot
Use Cases
Testing
Added comprehensive Rust tests for serialization:
- `test_save_and_load`: Verifies round-trip serialization preserves Index integrity
- `test_save_and_load_multibyte`: Tests with multi-byte Unicode characters (emojis)
- `test_load_nonexistent_file`: Error handling for missing files
- `test_load_corrupted_file`: Error handling for invalid data
- `test_save_preserves_file_size`: Validates compression is working
All tests pass successfully.
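The round-trip and corrupted-input properties these tests check can be illustrated with a stand-in type. The sketch below is not the PR's implementation. It assumes every field is a little-endian `u32`, that `state_id`/`num_transitions` repeat once per state and `token_id`/`next_state_id` once per transition, and it leaves out the gzip layer so that it needs only the standard library. `ToyIndex`, `write_to`, `read_from`, and `sample` are hypothetical names; the authoritative layout is the one in `INDEX_BINARY_FORMAT.md`.

```rust
use std::collections::BTreeMap;
use std::io::{self, Cursor, Read, Write};

/// Hypothetical stand-in; field names follow the format table, while the
/// widths, endianness, and nesting are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
struct ToyIndex {
    vocab_size: u32,
    eos_token_id: u32,
    initial_state_id: u32,
    final_states: Vec<u32>,
    index_type: u32,
    /// state_id -> (token_id -> next_state_id)
    transitions: BTreeMap<u32, BTreeMap<u32, u32>>,
}

fn write_u32<W: Write>(w: &mut W, v: u32) -> io::Result<()> {
    w.write_all(&v.to_le_bytes())
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?; // fails with UnexpectedEof on truncated input
    Ok(u32::from_le_bytes(buf))
}

impl ToyIndex {
    /// Writes the fields in the order they appear in the format table.
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        write_u32(w, self.vocab_size)?;
        write_u32(w, self.eos_token_id)?;
        write_u32(w, self.initial_state_id)?;
        write_u32(w, self.final_states.len() as u32)?;
        for &state in &self.final_states {
            write_u32(w, state)?;
        }
        write_u32(w, self.index_type)?;
        write_u32(w, self.transitions.len() as u32)?;
        for (&state_id, edges) in &self.transitions {
            write_u32(w, state_id)?;
            write_u32(w, edges.len() as u32)?;
            for (&token_id, &next_state_id) in edges {
                write_u32(w, token_id)?;
                write_u32(w, next_state_id)?;
            }
        }
        Ok(())
    }

    /// Reads the fields back in the same order; any short read is an error.
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let vocab_size = read_u32(r)?;
        let eos_token_id = read_u32(r)?;
        let initial_state_id = read_u32(r)?;
        let num_final_states = read_u32(r)?;
        let mut final_states = Vec::new();
        for _ in 0..num_final_states {
            final_states.push(read_u32(r)?);
        }
        let index_type = read_u32(r)?;
        let num_states = read_u32(r)?;
        let mut transitions = BTreeMap::new();
        for _ in 0..num_states {
            let state_id = read_u32(r)?;
            let num_transitions = read_u32(r)?;
            let mut edges = BTreeMap::new();
            for _ in 0..num_transitions {
                let token_id = read_u32(r)?;
                edges.insert(token_id, read_u32(r)?);
            }
            transitions.insert(state_id, edges);
        }
        Ok(ToyIndex { vocab_size, eos_token_id, initial_state_id, final_states, index_type, transitions })
    }
}

/// Made-up example data: 0 --5--> 1, 0 --7--> 0, 1 --9--> 2, with 2 final.
fn sample() -> ToyIndex {
    let mut transitions = BTreeMap::new();
    transitions.insert(0, BTreeMap::from([(5, 1), (7, 0)]));
    transitions.insert(1, BTreeMap::from([(9, 2)]));
    ToyIndex { vocab_size: 10, eos_token_id: 9, initial_state_id: 0, final_states: vec![2], index_type: 0, transitions }
}

fn main() -> io::Result<()> {
    let index = sample();
    let mut bytes = Vec::new();
    index.write_to(&mut bytes)?;
    let restored = ToyIndex::read_from(&mut Cursor::new(&bytes))?;
    assert_eq!(restored, index); // round trip preserves every field
    // Dropping the last byte stands in for a corrupted file.
    assert!(ToyIndex::read_from(&mut Cursor::new(&bytes[..bytes.len() - 1])).is_err());
    println!("round trip ok, {} bytes", bytes.len());
    Ok(())
}
```

Writing against generic `Write` and `Read` means the same functions could be wrapped in a gzip encoder or decoder without changes, which is one plausible way to add the compression step described in this PR.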
Dependencies
- `flate2` crate for gzip compression/decompression
Files Changed
- `src/index.rs`: Added `save()` and `load()` methods
- `src/error.rs`: Added `IOError` variant for I/O operations
- `Cargo.toml`: Added `flate2` dependency
- `INDEX_BINARY_FORMAT.md`: Complete binary format specification
- `tools/index_player.html`: New interactive debugging tool
- `tests/create_index_binary.py`: Example script for creating binary indices
Breaking Changes
None. This is a purely additive change.