vector search endpoint #1827

shanbady · 2024-11-15T18:08:14Z

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6077

Description (What does it do?)

This PR adds a learning resource search endpoint (/api/v1/learning_resources_vector_search/) that retrieves results from Qdrant using vector search instead of OpenSearch.
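For readers less familiar with vector search, here is a minimal, framework-free sketch of the idea. The query and every resource are represented as embedding vectors, and results are ranked by cosine similarity rather than by OpenSearch's keyword-based text matching. The toy three-dimensional vectors and the `vector_search` helper are hypothetical. The real endpoint delegates embedding and ranking to Qdrant, which uses an index rather than this brute-force scan.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(
    query_vector: list[float],
    points: dict[str, list[float]],
    limit: int = 10,
    offset: int = 0,
) -> list[tuple[str, float]]:
    """Rank stored points by similarity to the query and return one page.

    Hypothetical helper: a brute-force stand-in for what Qdrant does
    with an index over the stored embeddings.
    """
    scored = [
        (point_id, cosine_similarity(query_vector, vector))
        for point_id, vector in points.items()
    ]
    # Highest similarity first; ties broken by id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[offset : offset + limit]


# Toy embeddings; real ones come from an embedding model.
points = {
    "intro-biology": [0.9, 0.1, 0.0],
    "genetics": [0.8, 0.3, 0.1],
    "linear-algebra": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend this is the embedding of "biology"
print(vector_search(query, points, limit=2))
```

The offset/limit slice applied after ranking is the same pagination model the endpoint exposes, which is why a total count is not a natural by-product of the search itself.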

How can this be tested?

1. Check out this branch.
2. Have some learning resources populated locally.
3. Generate vectors for the learning resources and store them in Qdrant via: python manage.py generate_embeddings --all --skip-contentfiles
4. Hit the vector search endpoint with a query (the "q" GET parameter) and validate the results: http://open.odl.local:8063/api/v1/learning_resources_vector_search/?q=biology
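The last step can also be scripted. Below is a minimal standard-library sketch, assuming the local host from the step above. Because the endpoint was moved to the V0 API during review, the path is a parameter. The `build_vector_search_url` helper is hypothetical; `limit` and `offset` are the pagination parameters the final serializer exposes.

```python
from __future__ import annotations

from urllib.parse import urlencode, urljoin

# Local development host from the testing steps above.
BASE_URL = "http://open.odl.local:8063"


def build_vector_search_url(
    q: str,
    limit: int | None = None,
    offset: int | None = None,
    path: str = "/api/v1/learning_resources_vector_search/",
) -> str:
    """Build a vector search URL, omitting pagination params that are unset."""
    params: dict[str, object] = {"q": q}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return urljoin(BASE_URL, path) + "?" + urlencode(params)


url = build_vector_search_url("biology")
print(url)

# To actually hit a running local instance:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```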

Additional Context

The endpoint currently supports only the "q" query parameter (e.g. q=test), so there is no filtering by topic, etc.
There is a known issue: the Qdrant response does not provide a convenient way to get the total count of results, because that requires a separate call to the count API (see qdrant/qdrant#4882, "Return total count when offset and limit are used for pagination"). For now we return a constant total of 10000 results until we patch this or totals become available in the Qdrant response.
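Here is a minimal sketch of the workaround described above. The response envelope reports a fixed placeholder total unless a separate count lookup is supplied. The `build_page` helper, the `count_fn` hook and the `count`/`results` field names are hypothetical. With qdrant-client, the hook could wrap a `QdrantClient.count(...)` call, which is the extra round trip the issue refers to.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

# Placeholder reported until Qdrant returns totals with paginated results.
PLACEHOLDER_TOTAL = 10000


def build_page(
    hits: list[dict[str, Any]],
    limit: int,
    offset: int,
    count_fn: Optional[Callable[[], int]] = None,
) -> dict[str, Any]:
    """Wrap one page of hits in a paginated response envelope.

    If count_fn is given (e.g. a wrapper around a separate count request),
    its result is used as the total; otherwise the placeholder is used.
    """
    total = count_fn() if count_fn is not None else PLACEHOLDER_TOTAL
    return {
        "count": total,
        "limit": limit,
        "offset": offset,
        "results": hits,
    }
```

The trade-off matches the one in the issue. An accurate total costs an extra request per search. The placeholder keeps pagination controls working, but it may not match the real number of results.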

abeglova · 2024-11-18T16:37:44Z

learning_resources_search/views.py

+    )
+    @extend_schema(summary="Vector Search")
+    def get(self, request):
+        request_data = LearningResourcesSearchRequestSerializer(data=request.GET)


This should have a separate serializer; otherwise the OpenAPI spec shows this endpoint supporting a bunch of options that are OpenSearch-specific (such as search mode) or that are just not implemented yet.

Also, can this be a V0 API for now? The V1 APIs are supposed to be stable, well documented and usable by an outside project. It can be moved to V1 once we build it out more.

Good point. I moved this to the V0 API and created a separate serializer for vector results that exposes only the "q", "limit" and "offset" params.

abeglova · 2024-11-18T16:58:54Z

learning_resources_search/api.py

+        hits = [hit.metadata for hit in search_result]
+    else:
+        results = LearningResource.objects.for_search_serialization().all()
+        hits = serialize_bulk_learning_resources([resource.id for resource in results])


While we are testing this, it might be nicer to return an abbreviated resource response so that it's easier to scan through the results and evaluate their quality. Something like returning just the title, description and platform for each resource.

For now I have reduced it to return just the id, title, description, resource_type, platform and readable_id.
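The trimmed-down response amounts to projecting each full resource onto a handful of fields. Below is a minimal sketch, assuming each hit is a dict shaped like the full serialized resource. `MINIMAL_FIELDS` mirrors the list in the reply above. `to_minimal` and the sample resource (including the shape of `platform`) are hypothetical; the PR itself does this with a minimal serializer.

```python
from __future__ import annotations

from typing import Any

# Fields kept in the abbreviated vector search response (from the reply above).
MINIMAL_FIELDS = ("id", "title", "description", "resource_type", "platform", "readable_id")


def to_minimal(resource: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields useful for eyeballing result quality.

    Missing fields come back as None instead of raising KeyError.
    """
    return {field: resource.get(field) for field in MINIMAL_FIELDS}


# Hypothetical full resource; nested values such as platform are kept as-is.
full = {
    "id": 1,
    "readable_id": "bio-101",
    "title": "Introduction to Biology",
    "description": "Cells, genetics and evolution.",
    "resource_type": "course",
    "platform": {"code": "ocw", "name": "MIT OpenCourseWare"},
    "runs": [],
    "topics": [{"name": "Biology"}],
}
print(to_minimal(full))
```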

abeglova · 2024-11-19T16:02:55Z

learning_resources_search/serializers.py

@@ -469,6 +469,11 @@ class LearningResourcesSearchRequestSerializer(SearchRequestSerializer):
    )


+class LearningResourcesVectorSearchRequestSerializer(SearchRequestSerializer):


This shouldn't inherit from SearchRequestSerializer. The schema at http://api.open.odl.local:8063/api/v0/schema/redoc/#tag/learning_resources_vector_search/operation/learning_resources_vector_search_retrieve is still showing a bunch of filters that are not yet implemented, and some that will never be implemented, such as dev_mode.
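Not inheriting means the request contract, and therefore the generated OpenAPI schema, contains only what the endpoint actually implements. In the PR this is a Django REST Framework serializer. Below is a framework-free stand-in that accepts only "q", "limit" and "offset" and never reads other keys such as dev_mode. All names, the default limit and the bounds are assumptions for illustration. In DRF the equivalent would be a plain serializers.Serializer subclass declaring a CharField and two IntegerFields.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Assumed defaults and bounds for illustration; the PR's values may differ.
DEFAULT_LIMIT = 10
MAX_LIMIT = 100


@dataclass(frozen=True)
class VectorSearchParams:
    """The only parameters the vector search endpoint documents (hypothetical)."""

    q: str = ""
    limit: int = DEFAULT_LIMIT
    offset: int = 0


class ParamError(ValueError):
    """Carries a field -> message mapping, similar to a serializer's errors."""

    def __init__(self, errors: Dict[str, str]) -> None:
        super().__init__(errors)
        self.errors = errors


def _to_int(raw: Optional[str], default: int) -> Optional[int]:
    """Parse an integer parameter; None signals a non-integer value."""
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return None


def parse_vector_search_params(query: Mapping[str, str]) -> VectorSearchParams:
    """Validate q, limit and offset; other keys (e.g. dev_mode) are never read."""
    errors: Dict[str, str] = {}
    limit = _to_int(query.get("limit"), DEFAULT_LIMIT)
    offset = _to_int(query.get("offset"), 0)
    if limit is None or not 1 <= limit <= MAX_LIMIT:
        errors["limit"] = f"must be an integer between 1 and {MAX_LIMIT}"
    if offset is None or offset < 0:
        errors["offset"] = "must be a non-negative integer"
    if errors:
        raise ParamError(errors)
    return VectorSearchParams(q=query.get("q", ""), limit=limit, offset=offset)
```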

abeglova

lgtm

* Release 0.24.3
* Release date for 0.24.3
* Server rendered search page results
* v2 drawer certification updates (#1823)
  * update certification display in v2 drawer to match latest designs
  * don't show price info item if runs have differing data
  * MicroMasters not Micromasters
  * if there is no price for the certificate but it's indicated that one is included, display that
  * if resource is free, includes a certification but has no prices, still display the pill in the info item
  * generate migration for MicroMasters spelling change
  * fix certificate pill padding on mobile
* Unit channel page and search prefetch
* Featured list and testimonials only for unit channels
* v2 learning resource drawer formats and location (#1826)
  * add format info item
  * display location if format is in_person
  * add tests
  * also show location for hybrid courses
* LocalDate and NoSSR components to render localized dates only on client
* Revert "LocalDate and NoSSR components to render localized dates only on client"
  This reverts commit b4ccd6d.
* LocalDate and NoSSR components to render localized dates only on client (#1831)
  * LocalDate and NoSSR components to render localized dates only on client
  * Remove unnecessary React.Fragment
* separate starts and as taught in, show anytime availability (#1828)
  * refactor starts / as taught in functionality to show on separate lines, show "anytime" in starts if availability is anytime
  * fix rebase mishap
* Map address search params
* Search params test
* Update dependency pytest-cov to v6 (#1818)
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Update dependency safety to v3 (#1819)
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* URL search param validation and transforms to align with course-search-utils
* Update dependency django-anymail to v12 (#1815)
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* vector search endpoint (#1827)
  * adding initial vector search view
  * adding working vector results endpoint
  * regenerate openapi spec
  * fixing format of returned results
  * adding test
  * patching qdrant client
  * moving to v0 api
  * switch to custom serializer for vector search
  * fix v0 url
  * using minimal serializer
  * returning minimal response for vector results
  * regenerate spec
  * adding some other useful bits to response
  * fixing response for empty query and adjusting test
  * regenerate spec
  * uninheriting from searchrequest serializer
  * updating oai spec
  * updating oai spec
* Update dependency @mui/lab to v6.0.0-beta.15 (#1830)
  * Update dependency @mui/lab to v6.0.0-beta.15
  * update lockfile
  ---------
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: shankar ambady
* Update to use validators from course-search-utils

---------

Co-authored-by: Doof
Co-authored-by: Carey P Gumaer
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Shankar Ambady
Co-authored-by: shankar ambady


shanbady added 6 commits November 14, 2024 15:57

adding initial vector search view

9736f5e

adding working vector results endpoint

5423b77

regenerate openapi spec

c38004f

fixing format of returned results

8d69c30

adding test

c24b792

patching qdrant client

4d76614

shanbady added the Needs Review label (An open Pull Request that is ready for review) Nov 15, 2024

shanbady marked this pull request as ready for review November 15, 2024 18:08

abeglova self-assigned this Nov 18, 2024

abeglova requested changes Nov 18, 2024

shanbady added 9 commits November 18, 2024 16:24

moving to v0 api

eea7668

switch to custom serializer for vector search

0488d9e

fix v0 url

a1a4e45

using minimal serializer

a6840b8

returning minimal response for vector results

671415f

regenerate spec

5f44ed2

adding some other useful bits to response

1900fb6

fixing response for empty query and adjusting test

c942028

regenerate spec

9c17dd6

shanbady requested a review from abeglova November 19, 2024 15:29

abeglova requested changes Nov 19, 2024

abeglova assigned shanbady and unassigned abeglova Nov 19, 2024

shanbady added 3 commits November 19, 2024 11:10

uninheriting from searchrequest serializer

329935a

updating oai spec

1becb9e

updating oai spec

04f3025

shanbady requested a review from abeglova November 19, 2024 18:10

abeglova approved these changes Nov 19, 2024

shanbady merged commit 04018b7 into main Nov 19, 2024
11 checks passed

shanbady deleted the shanbady/vector-search-endpoint branch November 19, 2024 20:07

odlbot mentioned this pull request Nov 25, 2024

Release 0.26.0 #1858

Merged

19 tasks
