banrovegrie
Contributor

@banrovegrie banrovegrie commented Sep 25, 2025

This PR rolls in two related improvements, addressing both #1049 and #1043.

First, Gemini embeddings now mirror outputDimensionality inside the config block, so gemini-embedding-* honors custom dimensions as requested by AurumnPegasus.
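As a rough illustration of the mirroring described above, here is a minimal Python sketch of such a payload helper. The actual change lives in Rust (src/llm/gemini.rs); the function name `build_embed_payload` and the exact request layout are assumptions made for illustration, not the PR's code or a verified description of the Gemini API.

```python
from typing import Any, Optional


def build_embed_payload(text: str, output_dimension: Optional[int] = None) -> dict[str, Any]:
    """Build a hypothetical embed request body.

    When a custom dimension is requested, it is written both at the top
    level and inside a `config` block, mirroring the PR description.
    """
    payload: dict[str, Any] = {"content": {"parts": [{"text": text}]}}
    if output_dimension is not None:
        # Top-level field (assumed layout).
        payload["outputDimensionality"] = output_dimension
        # Mirrored copy inside the config block (assumed layout).
        payload["config"] = {"outputDimensionality": output_dimension}
    return payload
```

When no dimension is given, neither field is emitted, so the model's default dimension would apply.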

Second, we expose vector index tuning for Postgres: callers can choose between HNSW and IVFFlat (with parameters) from Python, and the engine creates the matching index with pgvector options and updated naming.

Changes

  • src/llm/gemini.rs: build the embed payload via a helper so Gemini gets both top-level and config outputDimensionality.
  • src/base/spec.rs: add VectorIndexMethod (HNSW/IVFFlat with optional params), default existing indexes to HNSW, extend VectorIndexDef formatting.
  • src/ops/targets/postgres.rs: honor the selected method when generating CREATE INDEX, emit WITH options (m, ef_construction, lists), suffix non-default methods in the index name.
  • python/cocoindex/index.py, python/cocoindex/__init__.py: expose matching Python dataclasses (HnswVectorIndexMethod, IvfFlatVectorIndexMethod) so flows can configure index methods/options.
  • docs/docs/core/flow_def.mdx, docs/docs/examples/examples/simple_vector_index.md: document the new method field and show an IVFFlat example.
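To make the Postgres side concrete, the following Python sketch shows how a CREATE INDEX statement with pgvector options might be assembled from the chosen method. The real generator is the Rust code in src/ops/targets/postgres.rs; the class names `HnswMethod` and `IvfFlatMethod`, the function `create_index_sql`, and the `IF NOT EXISTS` clause are hypothetical. Only the `USING hnsw` / `USING ivfflat` forms and the `m`, `ef_construction`, and `lists` options come from pgvector itself.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HnswMethod:
    m: Optional[int] = None
    ef_construction: Optional[int] = None


@dataclass
class IvfFlatMethod:
    lists: Optional[int] = None


def create_index_sql(
    table: str,
    column: str,
    index_name: str,
    ops_class: str,
    method: Optional[Union[HnswMethod, IvfFlatMethod]] = None,
) -> str:
    """Return a pgvector CREATE INDEX statement for the given method."""
    if method is None:
        method = HnswMethod()  # HNSW is the default when nothing is selected
    if isinstance(method, HnswMethod):
        using = "hnsw"
        options = {"m": method.m, "ef_construction": method.ef_construction}
    else:
        using = "ivfflat"
        options = {"lists": method.lists}
    # Only options that were explicitly set end up in the WITH clause.
    items = [f"{key} = {value}" for key, value in options.items() if value is not None]
    with_clause = f" WITH ({', '.join(items)})" if items else ""
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} "
        f"USING {using} ({column} {ops_class}){with_clause};"
    )
```

For example, `create_index_sql("docs", "embedding", "docs_embedding_idx", "vector_cosine_ops", IvfFlatMethod(lists=100))` yields an `ivfflat` index with `WITH (lists = 100)`, while omitting the method produces a plain `hnsw` index with no WITH clause.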

P.S. Most of the changes are schema/SQL plumbing, but I couldn't find existing tests covering the vector-index SQL paths (I may have missed some).

@banrovegrie banrovegrie marked this pull request as ready for review September 25, 2025 18:53
@banrovegrie
Contributor Author

banrovegrie commented Sep 25, 2025

Update: fixed the Rust formatting error in spec.rs.

Could someone please re-run the checks for this PR?

@georgeh0
Member

Thanks a lot for contributing!

@banrovegrie
Contributor Author

@georgeh0 thank you for the review. I pushed a change addressing the comments. Updates:

  • made method optional
  • switched the serde tag to kind with PascalCase values
  • aligned the Python dataclasses to use kind = "Hnsw" / "IvfFlat"
  • let the Postgres SQL helper default to HNSW when method is None, while still lowercasing the index-name suffix for non-default methods
  • updated the serialized "hnsw"/"ivfflat" strings
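The updates above can be sketched in Python as follows. The class names match the dataclasses named in the PR, but the field layout, the use of `ClassVar` for `kind`, the helper names `serialize_method` and `index_name`, and the `__` separator in the suffix are all assumptions for illustration; the merged code may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Optional, Union


@dataclass
class HnswVectorIndexMethod:
    kind: ClassVar[str] = "Hnsw"  # PascalCase tag value
    m: Optional[int] = None
    ef_construction: Optional[int] = None


@dataclass
class IvfFlatVectorIndexMethod:
    kind: ClassVar[str] = "IvfFlat"  # PascalCase tag value
    lists: Optional[int] = None


Method = Union[HnswVectorIndexMethod, IvfFlatVectorIndexMethod]


def serialize_method(method: Optional[Method]) -> Optional[dict[str, Any]]:
    """Serialize to a `kind`-tagged dict; None means no method was chosen."""
    if method is None:
        return None
    out: dict[str, Any] = {"kind": method.kind}
    for f in fields(method):  # ClassVar fields such as `kind` are skipped here
        value = getattr(method, f.name)
        if value is not None:
            out[f.name] = value
    return out


def index_name(base: str, method: Optional[Method]) -> str:
    """Keep the base name for the default (HNSW); suffix other methods."""
    kind = "Hnsw" if method is None else method.kind
    if kind == "Hnsw":
        return base
    # Hypothetical suffix format: lowercased kind after a double underscore.
    return f"{base}__{kind.lower()}"
```

Keeping the default method out of the index name means indexes created before this change would keep their existing names, which is presumably why only non-default methods get a suffix.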

Member

@georgeh0 georgeh0 left a comment

Thanks a lot for the PR!

@georgeh0 georgeh0 merged commit c65d1e5 into cocoindex-io:main Sep 25, 2025
9 checks passed
@badmonster0
Member

Great PR, thanks a lot @banrovegrie for your contribution!

@banrovegrie
Contributor Author

@badmonster0 @georgeh0 thank you. We have been using cocoindex for our startup and it has been a pleasure!

0xTnxl added a commit to 0xTnxl/cocoindex that referenced this pull request Oct 2, 2025
Implements HNSW vector index support for Kuzu following the same pattern
as the Postgres implementation in PR cocoindex-io#1050.

Changes:
- Remove blanket "Vector indexes are not supported for Kuzu yet" error
- Add validation to accept HNSW and reject IVFFlat with clear error message
- Implement CREATE_VECTOR_INDEX and DROP_VECTOR_INDEX Cypher generation
- Map cocoindex HNSW parameters to Kuzu format (m→mu/ml, ef_construction→efc)
- Add vector index lifecycle management (create, update, delete)
- Install Kuzu vector extension automatically when needed
- Support all similarity metrics (cosine, l2, dotproduct)

Technical details:
- Add VectorIndexState struct to track index configuration
- Update SetupState and GraphElementDataSetupChange for index tracking
- Implement diff_setup_states logic for index change computation
- Add vector index compatibility checking in check_state_compatibility
- Integrate vector index operations in apply_setup_changes

Fixes cocoindex-io#1055
Related to cocoindex-io#1051
Follows pattern from cocoindex-io#1050
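
The validation and parameter mapping listed in this commit message might look roughly like the Python sketch below. The function name `kuzu_create_vector_index` is hypothetical. The commit message says only that m maps to mu/ml, so setting mu = m and ml = 2 * m is an assumption, and the exact `CALL CREATE_VECTOR_INDEX` argument syntax should be checked against the Kuzu vector extension documentation.

```python
from typing import Optional


def kuzu_create_vector_index(
    table: str,
    index_name: str,
    column: str,
    metric: str = "cosine",
    method_kind: str = "Hnsw",
    m: Optional[int] = None,
    ef_construction: Optional[int] = None,
) -> str:
    """Build a Cypher CALL for a Kuzu HNSW vector index (illustrative only)."""
    if method_kind != "Hnsw":
        # Kuzu only offers HNSW here, so other methods are rejected up front.
        raise ValueError(f"Kuzu vector indexes support only HNSW, got {method_kind}")
    args = [f"'{table}'", f"'{index_name}'", f"'{column}'"]
    if m is not None:
        # Assumed mapping: upper-layer degree = m, lower-layer degree = 2 * m.
        args.append(f"mu := {m}")
        args.append(f"ml := {2 * m}")
    if ef_construction is not None:
        args.append(f"efc := {ef_construction}")
    args.append(f"metric := '{metric}'")
    return f"CALL CREATE_VECTOR_INDEX({', '.join(args)});"
```

Rejecting IvfFlat before any statement is built keeps the error message close to the user's configuration rather than surfacing as a database error.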
