-
Notifications
You must be signed in to change notification settings - Fork 223
feat(kuzu): add HNSW vector index support #1109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implements HNSW vector index support for Kuzu following the same pattern as the Postgres implementation in PR cocoindex-io#1050. Changes: - Remove blanket "Vector indexes are not supported for Kuzu yet" error - Add validation to accept HNSW and reject IVFFlat with clear error message - Implement CREATE_VECTOR_INDEX and DROP_VECTOR_INDEX Cypher generation - Map cocoindex HNSW parameters to Kuzu format (m→mu/ml, ef_construction→efc) - Add vector index lifecycle management (create, update, delete) - Install Kuzu vector extension automatically when needed - Support all similarity metrics (cosine, l2, dotproduct) Technical details: - Add VectorIndexState struct to track index configuration - Update SetupState and GraphElementDataSetupChange for index tracking - Implement diff_setup_states logic for index change computation - Add vector index compatibility checking in check_state_compatibility - Integrate vector index operations in apply_setup_changes Fixes cocoindex-io#1055 Related to cocoindex-io#1051 Follows pattern from cocoindex-io#1050
src/ops/targets/kuzu.rs
Outdated
Ok( | ||
if desired.referenced_node_tables != existing.referenced_node_tables { | ||
SetupStateCompatibility::NotCompatible | ||
} else if desired.vector_indexes != existing.vector_indexes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should revert change at this place.
Whether or not having vector index change shouldn't affect compatibility of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, done! Removed that vector index compatibility check. Commited the latest changes...
The vector index changes should now operate without affecting data compatibility. This should addresses feedback from @georgeh0
Hi @0xTnxl, please fix the failing checks:
Thanks! |
Alright @georgeh0 ... I'll work on them right away |
Hey @georgeh0, fixed the compilation error, also added tests for
Thanks! Please let me know if there's any more corrections |
src/ops/targets/kuzu.rs
Outdated
#[test] | ||
fn test_vector_index_state_equality() { | ||
let state1 = VectorIndexState { | ||
field_name: "embedding".to_string(), | ||
metric: VectorSimilarityMetric::CosineSimilarity, | ||
method: Some(VectorIndexMethod::Hnsw { | ||
m: Some(16), | ||
ef_construction: Some(200), | ||
}), | ||
}; | ||
|
||
let state2 = VectorIndexState { | ||
field_name: "embedding".to_string(), | ||
metric: VectorSimilarityMetric::CosineSimilarity, | ||
method: Some(VectorIndexMethod::Hnsw { | ||
m: Some(16), | ||
ef_construction: Some(200), | ||
}), | ||
}; | ||
|
||
let state3 = VectorIndexState { | ||
field_name: "embedding".to_string(), | ||
metric: VectorSimilarityMetric::L2Distance, // Different metric | ||
method: Some(VectorIndexMethod::Hnsw { | ||
m: Some(16), | ||
ef_construction: Some(200), | ||
}), | ||
}; | ||
|
||
assert_eq!(state1, state2); | ||
assert_ne!(state1, state3); | ||
} | ||
|
||
#[test] | ||
fn test_vector_index_state_serialization() { | ||
let state = VectorIndexState { | ||
field_name: "embedding".to_string(), | ||
metric: VectorSimilarityMetric::CosineSimilarity, | ||
method: Some(VectorIndexMethod::Hnsw { | ||
m: Some(16), | ||
ef_construction: Some(200), | ||
}), | ||
}; | ||
|
||
// Test serialization | ||
let serialized = serde_json::to_string(&state).unwrap(); | ||
assert!(serialized.contains("embedding")); | ||
assert!(serialized.contains("CosineSimilarity")); | ||
|
||
// Test deserialization | ||
let deserialized: VectorIndexState = serde_json::from_str(&serialized).unwrap(); | ||
assert_eq!(state, deserialized); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we don't need these two tests. They're natural properties already guaranteed by these standard macros and we don't need to have extra code to test it.
(exhaustive tests also add some difficulty to maintain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yh, you're right @georgeh0, just wanted to make sure everything added up correctly, I'll prune out those tests rn
Thanks @0xTnxl ! One last thing to confirm: have you get a chance to bring up Kuzu and test it end to end with an example? |
@georgeh0 actually, not yet! I've tested the code logic thoroughly with the unit tests, but haven't spun up an actual Kuzu instance for end-to-end testing. |
…e_serialisation since they're testing standard Rust derive macros
@0xTnxl thanks a lot for the PR and @georgeh0 thanks a lot for the review. @0xTnxl please follow this example And attach a screenshot with kuzu explorer for the test before we merge it. if you have any question please let us know! (You can always find us https://discord.com/invite/zpA9S2DR7s for live chat too) |
@badmonster0 sure! I'll look into it right away, I'll get back to you guys as soon as I can. Cheers! |
Hey @georgeh0 @badmonster0, hope you’re both doing great! I ran into some issues running the tests and wanted to check, which version of the cocoindex module was used in the block/article you shared, @badmonster0? |
https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph Does 0.2.8 work? could you share the error message? thanks for testing! |
Description of the Changes
This PR implements HNSW vector index support for Kuzu, allowing the users to create vector indexes with custom parameters and similarity metrics. The implementation follows the same pattern established by the Postgres vector index support (PR #1050).
Key Changes:
Technical Implementation:
VectorIndexState
struct for tracking index configurationSetupState
andGraphElementDataSetupChange
for index state managementdiff_setup_states()
apply_setup_changes()
Motivation and Context
Issue #1055 requested HNSW vector index support for Kuzu as part of the broader initiative (#1051) to add VectorIndexMethod support across all targets. This should ideally users to:
User Impact:
After this change, users can now do:
Breaking Changes
Overall, there are no breaking changes - This is purely additive functionality, so the existing Kuzu flows without vector indexes will continue to work unchanged.
Related Issues (References)
VectorIndexMethod
in Kuzu #1055 - [FEATURE] support VectorIndexMethod in KuzuVectorIndexMethod
in targets other than Postgres #1051 - Parent issue: [FEATURE] Also support VectorIndexMethod in targets other than PostgresVectorIndexMethod
in qdrant #1052 (Qdrant), [FEATURE] supportVectorIndexMethod
in Neo4j #1053 (Neo4j), [FEATURE] supportVectorIndexMethod
in LanceDB #1054 (LanceDB)Parameter Mapping
The implementation maps cocoindex HNSW parameters to Kuzu's format:
m
mu
+ml
m
formu
,2*m
forml
(satisfies mu < ml constraint)ef_construction
efc
metric
metric
Testing
Notes for Reviewers
The implementation is prod-level functional, but it may need minor adjustments to Kuzu's actual API syntax during testing phase.
Thank you for reviewing! Looking forward to contributing to CocoIndex!