Consistent qdrant point ids #1839

shanbady · 2024-11-20T21:25:33Z

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6094

Description (What does it do?)

This PR generates reproducible UUIDs from the resource's readable id for vector points stored in Qdrant. Given a learning resource or content file, this lets us directly reference and check for its existing embeddings in Qdrant. Currently the vector similarity endpoint re-embeds the referenced document even though its embeddings already exist in Qdrant, which causes a slight delay when loading /api/v1/learning_resources/181/vector_similar/. This PR resolves that by reusing the existing embedding.
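A minimal sketch of the idea described above, not the code from this PR. It assumes UUIDv5 under the DNS namespace as the deterministic scheme (the description does not say which one is used), and the names `point_id_for`, `get_or_embed`, `store`, and `embed` are hypothetical. A plain dict stands in for the Qdrant collection so the example runs without a server.

```python
import uuid


def point_id_for(readable_id: str) -> str:
    """Return the same UUID string every time for a given readable id.

    Assumption: UUIDv5 under the DNS namespace, picked for illustration only.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, readable_id))


def get_or_embed(readable_id, store, embed):
    """Reuse a stored vector when one exists; otherwise embed and store it.

    `store` is a dict standing in for a Qdrant collection, and `embed` is any
    callable that maps a readable id to a vector (both hypothetical).
    """
    pid = point_id_for(readable_id)
    if pid not in store:
        # Only pay the embedding cost when no point with this id exists yet.
        store[pid] = embed(readable_id)
    return store[pid]
```

Because the point id can be recomputed from the readable id alone, a second request for the same resource finds the stored vector and skips the embedding call.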

How can this be tested?

Check out main and make sure you have learning resources locally
Clear existing collections and generate the embeddings via python manage.py generate_embeddings --all --skip-contentfiles
Find a learning resource id and load the vector similarity endpoint /api/v1/learning_resources/{resource id}/vector_similar/ and note the delay in loading
Check out this branch
Make sure you have learning resources locally
Clear existing collections and generate the embeddings via python manage.py generate_embeddings --all --skip-contentfiles
Find a learning resource id and load the vector similarity endpoint /api/v1/learning_resources/{resource id}/vector_similar/ and note how much faster it loads

Additional Context

We generate the UUID from the resource's "readable_id" instead of its "id" so that a "master embeddings" snapshot, if we had one, could be reused immediately in any environment.
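To illustrate the reasoning above, here is a small sketch comparing point ids derived from "id" with those derived from "readable_id". The two records and their values are made up, and it assumes that "id" can differ between environments while "readable_id" stays the same. UUIDv5 under the DNS namespace is again an assumption, and `point_id_for` is a hypothetical helper.

```python
import uuid


def point_id_for(key: str) -> str:
    # Assumption: UUIDv5 under the DNS namespace, chosen only for illustration.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, key))


# The same resource as it might appear in two environments (made-up values):
# the readable_id matches, but the database id does not.
production = {"id": 181, "readable_id": "example-course-101"}
local_dev = {"id": 7, "readable_id": "example-course-101"}

ids_from_pk = {point_id_for(str(r["id"])) for r in (production, local_dev)}
ids_from_readable = {point_id_for(r["readable_id"]) for r in (production, local_dev)}

print(len(ids_from_pk))        # distinct point ids when keyed on "id"
print(len(ids_from_readable))  # distinct point ids when keyed on "readable_id"
```

Keyed on "id", the two environments produce two different point ids for the same resource. Keyed on "readable_id", they agree on one, which is what would let a shared snapshot line up with local resources.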

abeglova

lgtm

* adding util method for generating point id
* moving point id generation outside of model and adding to embed command
* fixing vector similarity endpoint
* adding test
* sorting ids in test
* updating hash key for contentfiles

shanbady added 4 commits November 20, 2024 09:13

adding util method for generating point id

6b5ff6a

moving point id generation outside of model and adding to embed command

612231f

fixing vector similarity endpoint

0d2e6ae

adding test

2d53534

shanbady added the Work in Progress label Nov 20, 2024

shanbady added 2 commits November 21, 2024 09:46

sorting ids in test

5e5fed0

updating hash key for contentfiles

a5a1b83

shanbady added the Needs Review label and removed the Work in Progress label Nov 21, 2024

shanbady marked this pull request as ready for review November 21, 2024 15:31

shanbady mentioned this pull request Nov 21, 2024

similar resources carousel #1835

Merged

abeglova self-assigned this Nov 22, 2024

abeglova approved these changes Nov 22, 2024


shanbady merged commit 0370de1 into main Nov 22, 2024
11 checks passed

odlbot mentioned this pull request Nov 25, 2024

Release 0.26.0 #1858

Merged

19 tasks

rhysyngsun deleted the shanbady/qdrant-consistent-ids branch February 7, 2025 20:41
