feat: fix Gemini embedding config and expose Postgres index tuning #1050
Update: fixed the Rust formatting error. Could someone kindly run the checks once again for this?
Thanks a lot for contributing!
@georgeh0 thank you for the review. I pushed a change addressing the comments. Updates:
Thanks a lot for the PR!
Great PR, thanks a lot @banrovegrie for your contribution!
@badmonster0 @georgeh0 thank you. We have been using cocoindex for our startup and it has been a pleasure!
Implements HNSW vector index support for Kuzu following the same pattern as the Postgres implementation in PR cocoindex-io#1050.

Changes:
- Remove blanket "Vector indexes are not supported for Kuzu yet" error
- Add validation to accept HNSW and reject IVFFlat with clear error message
- Implement CREATE_VECTOR_INDEX and DROP_VECTOR_INDEX Cypher generation
- Map cocoindex HNSW parameters to Kuzu format (m→mu/ml, ef_construction→efc)
- Add vector index lifecycle management (create, update, delete)
- Install Kuzu vector extension automatically when needed
- Support all similarity metrics (cosine, l2, dotproduct)

Technical details:
- Add VectorIndexState struct to track index configuration
- Update SetupState and GraphElementDataSetupChange for index tracking
- Implement diff_setup_states logic for index change computation
- Add vector index compatibility checking in check_state_compatibility
- Integrate vector index operations in apply_setup_changes

Fixes cocoindex-io#1055
Related to cocoindex-io#1051
Follows pattern from cocoindex-io#1050
This PR rolls in two related improvements, addressing both #1049 and #1043.
First, Gemini embeddings now mirror `outputDimensionality` inside the `config` block, so `gemini-embedding-*` honors custom dimensions as requested by AurumnPegasus.

Second, we expose vector index tuning for Postgres: callers can choose between HNSW and IVFFlat (with parameters) from Python, and the engine creates the matching index with pgvector options and updated naming.
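To make the first change concrete, here is a minimal Python sketch of a payload builder that sets `outputDimensionality` both at the top level and inside `config`. The actual helper lives in `src/llm/gemini.rs` and is written in Rust; the function name `build_embed_payload`, its parameters, and the exact body layout here are assumptions for illustration, not the code from this PR.

```python
from typing import Any, Optional


def build_embed_payload(
    model: str,
    text: str,
    output_dimension: Optional[int] = None,
    task_type: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an embedContent-style request body."""
    payload: dict[str, Any] = {
        "model": f"models/{model}",
        "content": {"parts": [{"text": text}]},
    }
    if task_type is not None:
        payload["taskType"] = task_type
    if output_dimension is not None:
        # Mirror the requested size: once at the top level, once in `config`,
        # so either location is honored by the model being called.
        payload["outputDimensionality"] = output_dimension
        payload["config"] = {"outputDimensionality": output_dimension}
    return payload


payload = build_embed_payload("gemini-embedding-001", "hello", output_dimension=768)
```

When no dimension is requested, the sketch omits both fields so the model's default output size applies.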
Changes
- `src/llm/gemini.rs`: build the embed payload via a helper so Gemini gets both top-level and `config` `outputDimensionality`.
- `src/base/spec.rs`: add `VectorIndexMethod` (HNSW/IVFFlat with optional params), default existing indexes to HNSW, extend `VectorIndexDef` formatting.
- `src/ops/targets/postgres.rs`: honor the selected method when generating `CREATE INDEX`, emit `WITH` options (`m`, `ef_construction`, `lists`), suffix non-default methods in the index name.
- `python/cocoindex/index.py`, `python/cocoindex/__init__.py`: expose matching Python dataclasses (`HnswVectorIndexMethod`, `IvfFlatVectorIndexMethod`) so flows can configure index methods/options.
- `docs/docs/core/flow_def.mdx`, `docs/docs/examples/examples/simple_vector_index.md`: document the new `method` field and show an IVFFlat example.

P.S. most of the changes are schema/SQL plumbing but I couldn't find existing tests covering the vector-index SQL paths (not sure if I missed anything).
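
For readers who want a feel for the SQL path described in the `src/ops/targets/postgres.rs` item, the following is a rough Python sketch of how a selected method could translate into a pgvector `CREATE INDEX` statement. The two dataclasses are local stand-ins that only borrow the names from this PR; their fields, the `create_index_sql` function, the metric keys, and the `__ivfflat` name suffix are all assumptions. The `USING hnsw` / `USING ivfflat` clauses, the `WITH` options, and the operator class names follow pgvector's documented syntax.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HnswVectorIndexMethod:
    # Stand-in for the dataclass named in the PR; fields are assumed.
    m: Optional[int] = None
    ef_construction: Optional[int] = None


@dataclass
class IvfFlatVectorIndexMethod:
    # Stand-in for the dataclass named in the PR; field is assumed.
    lists: Optional[int] = None


VectorIndexMethod = Union[HnswVectorIndexMethod, IvfFlatVectorIndexMethod]

# pgvector operator classes, keyed by hypothetical metric names.
_OPS = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
}


def create_index_sql(
    table: str,
    column: str,
    metric: str,
    method: Optional[VectorIndexMethod] = None,
) -> str:
    """Hypothetical generator: HNSW by default, suffix other methods."""
    method = method or HnswVectorIndexMethod()
    if isinstance(method, HnswVectorIndexMethod):
        using, suffix = "hnsw", ""
        options = {"m": method.m, "ef_construction": method.ef_construction}
    else:
        using, suffix = "ivfflat", "__ivfflat"
        options = {"lists": method.lists}
    # Only emit options the caller actually set.
    set_opts = [f"{k} = {v}" for k, v in options.items() if v is not None]
    with_clause = f" WITH ({', '.join(set_opts)})" if set_opts else ""
    index_name = f"{table}__{column}__{metric}{suffix}"
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} "
        f"USING {using} ({column} {_OPS[metric]}){with_clause};"
    )


print(create_index_sql("doc_embeddings", "embedding", "cosine",
                       IvfFlatVectorIndexMethod(lists=100)))
```

A pure function like this is also easy to cover with string-comparison tests, which may be one way to address the missing coverage for the vector-index SQL paths.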