
vector search endpoint #1827


Merged
merged 18 commits into from
Nov 19, 2024


shanbady (Contributor)

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6077

Description (What does it do?)

This PR adds a learning resource search endpoint (/api/v1/learning_resources_vector_search/) that retrieves results from qdrant using vector search instead of opensearch.
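Unlike OpenSearch's keyword matching, vector search ranks stored embeddings by their similarity to an embedding of the query. The sketch below is a brute-force, stdlib-only stand-in for what qdrant does, not the project's code: qdrant uses an approximate index and whatever distance metric the collection is configured with (cosine similarity is assumed here), the point IDs and three-dimensional vectors are made up, and the step that turns a query such as "biology" into a vector is not shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(
    query_vector: list[float],
    points: dict[str, list[float]],
    limit: int = 10,
    offset: int = 0,
) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every point, sort best-first, then page."""
    scored = [(point_id, cosine_similarity(query_vector, vector)) for point_id, vector in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[offset : offset + limit]


if __name__ == "__main__":
    # Made-up IDs and 3-d "embeddings"; real embeddings typically have hundreds of dimensions.
    points = {
        "bio-101": [0.9, 0.1, 0.0],
        "cs-50": [0.1, 0.9, 0.2],
        "chem-1": [0.7, 0.3, 0.1],
    }
    # Pretend [1.0, 0.0, 0.0] is the embedding of the query "biology".
    for point_id, score in vector_search([1.0, 0.0, 0.0], points, limit=2):
        print(point_id, round(score, 3))
```

The limit/offset slicing mirrors the pagination parameters the endpoint exposes; an index-backed engine avoids scoring every point, which is the main thing this toy version gives up.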

How can this be tested?

  1. Check out this branch.
  2. Have some learning resources populated locally.
  3. Generate vectors for the learning resources and store them in qdrant via python manage.py generate_embeddings --all --skip-contentfiles
  4. Hit the vector search endpoint with a query (the "q" GET parameter) and validate the results: http://open.odl.local:8063/api/v1/learning_resources_vector_search/?q=biology
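Step 4 can also be scripted. A minimal stdlib sketch, assuming the local host and path from step 4; the review later moved the endpoint to the V0 API, so the path may differ on the merged code, and the optional limit/offset arguments only apply once the serializer change discussed in review is in place. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Taken from step 4; the endpoint was moved to the v0 API during review,
# so this path may need updating for the merged code.
BASE_URL = "http://open.odl.local:8063"
API_PATH = "/api/v1/learning_resources_vector_search/"


def build_search_url(q: str, limit: Optional[int] = None, offset: Optional[int] = None) -> str:
    """Build the GET URL; limit/offset are only sent when given."""
    params: dict[str, object] = {"q": q}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return f"{BASE_URL}{API_PATH}?{urlencode(params)}"


def fetch_results(q: str, limit: Optional[int] = None, offset: Optional[int] = None) -> dict:
    """Call the endpoint on the running local stack and decode its JSON body."""
    with urlopen(build_search_url(q, limit, offset), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(json.dumps(fetch_results("biology"), indent=2))
```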

Additional Context

  1. The endpoint currently supports only the query parameter "q" (e.g. "q=test"), so there is no filtering by topic, etc.
  2. There is a known issue: the qdrant response does not provide a convenient way to get the total count of results (getting it requires a separate call to the count API; see qdrant/qdrant#4882, "Return total count when offset and limit are used for pagination"). For now we return a constant count of 10000 until we patch this or totals become available in the qdrant response.
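The placeholder total can be illustrated with a small response builder. This is a sketch under assumptions: the envelope keys ("count", "limit", "offset", "results") and the function name are hypothetical, not the endpoint's actual schema. A real total would have to come from a separate count request (qdrant-client exposes a count method for this); here it is simply passed in as the optional total argument.

```python
from typing import Any, Optional

# Constant described in the PR, used until a real total is available.
PLACEHOLDER_TOTAL = 10_000


def build_vector_search_response(
    hits: list[dict[str, Any]],
    limit: int,
    offset: int,
    total: Optional[int] = None,
) -> dict[str, Any]:
    """Wrap one page of hits in a count/results envelope."""
    # Fall back to the placeholder when no separately fetched total is supplied.
    count = PLACEHOLDER_TOTAL if total is None else total
    return {
        "count": count,
        "limit": limit,
        "offset": offset,
        "results": hits,
    }
```

Keeping the total as an optional input means the placeholder can be dropped later without changing the shape of the response.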

shanbady added the Needs Review label (An open Pull Request that is ready for review) Nov 15, 2024
shanbady marked this pull request as ready for review November 15, 2024 18:08
abeglova self-assigned this Nov 18, 2024
)
    @extend_schema(summary="Vector Search")
    def get(self, request):
        request_data = LearningResourcesSearchRequestSerializer(data=request.GET)
abeglova (Contributor), Nov 18, 2024

This should have a separate serializer; otherwise the OpenAPI spec shows this endpoint supporting a number of options that are either OpenSearch-specific (such as search mode) or simply not implemented yet.

Contributor

Also, can this be a V0 API for now? The V1 APIs are supposed to be stable, well documented, and usable by outside projects. It can be moved to V1 once we build it out more.

Contributor, Author

Good point. I moved this to the V0 API and created a separate serializer for vector results that exposes only the "q", "limit", and "offset" params.
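In the project this is a Django REST Framework serializer. As a framework-free illustration of the same idea, the sketch below validates only q, limit and offset and ignores every other query parameter (so OpenSearch-specific options such as search mode never reach the view). The function names, the defaults and the limit cap are assumptions, not values from the PR.

```python
from typing import Any, Mapping, Optional

# Hypothetical defaults; the PR does not state the real ones.
DEFAULT_LIMIT = 10
MAX_LIMIT = 100


def _non_negative_int(raw: Optional[str], default: int, name: str, errors: dict[str, str]) -> int:
    """Parse an optional query-string integer, recording an error instead of raising."""
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        errors[name] = "A valid integer is required."
        return default
    if value < 0:
        errors[name] = "Must be zero or greater."
        return default
    return value


def parse_vector_search_params(query: Mapping[str, str]) -> dict[str, Any]:
    """Return validated q/limit/offset; any other key (search_mode, dev_mode, ...) is ignored."""
    errors: dict[str, str] = {}
    params = {
        "q": query.get("q", "").strip(),
        "limit": _non_negative_int(query.get("limit"), DEFAULT_LIMIT, "limit", errors),
        "offset": _non_negative_int(query.get("offset"), 0, "offset", errors),
    }
    if "limit" not in errors and params["limit"] > MAX_LIMIT:
        errors["limit"] = f"Must be at most {MAX_LIMIT}."
    if errors:
        # A DRF serializer would expose these via serializer.errors instead.
        raise ValueError(errors)
    return params
```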

        hits = [hit.metadata for hit in search_result]
    else:
        results = LearningResource.objects.for_search_serialization().all()
        hits = serialize_bulk_learning_resources([resource.id for resource in results])
Contributor

While we are testing this, it might be nicer to return an abbreviated resource response so that it's easier to scan through the results and evaluate their quality: something like just the title, description, and platform for each resource.

Contributor, Author

For now I have reduced it to return just the id, title, description, resource_type, platform and readable_id
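The trimmed response amounts to a projection over each hit. A minimal sketch, assuming each hit is a plain dict (such as the hit.metadata payloads in the excerpt above); MINIMAL_FIELDS and the helper names are hypothetical, and missing fields come back as None rather than raising.

```python
from typing import Any, Iterable

# The six fields listed in the reply above; everything else in a hit is dropped.
MINIMAL_FIELDS = ("id", "title", "description", "resource_type", "platform", "readable_id")


def minimal_result(resource: dict[str, Any]) -> dict[str, Any]:
    """Keep only the reviewer-friendly fields of one hit, in a fixed order."""
    return {field: resource.get(field) for field in MINIMAL_FIELDS}


def minimal_results(hits: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply the projection to every hit on the page."""
    return [minimal_result(hit) for hit in hits]
```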

shanbady requested a review from abeglova November 19, 2024 15:29
@@ -469,6 +469,11 @@ class LearningResourcesSearchRequestSerializer(SearchRequestSerializer):
)


class LearningResourcesVectorSearchRequestSerializer(SearchRequestSerializer):
Contributor

This shouldn't inherit from SearchRequestSerializer. The API docs at http://api.open.odl.local:8063/api/v0/schema/redoc/#tag/learning_resources_vector_search/operation/learning_resources_vector_search_retrieve still show a number of filters that are not implemented yet, and some that never will be, such as dev_mode.

Contributor, Author

👍 Fixed
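Why inheritance leaks options into the docs: DRF serializers collect declared fields from their base classes, and schema generators such as drf-spectacular (the source of the @extend_schema decorator above) describe every field the serializer ends up with. The toy model below has made-up class names and no DRF dependency; it only mimics that field-collection step to contrast inheriting from a search serializer with declaring the three fields standalone.

```python
class Field:
    """Stand-in for a serializer field; it only carries a help text."""

    def __init__(self, help_text: str = "") -> None:
        self.help_text = help_text


class DeclarativeBase:
    @classmethod
    def declared_fields(cls) -> list[str]:
        """Collect Field attributes across the class hierarchy, base classes first."""
        names: list[str] = []
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field) and name not in names:
                    names.append(name)
        return names


class SearchRequest(DeclarativeBase):
    # Hypothetical subset of the opensearch-oriented request options.
    q = Field("query string")
    search_mode = Field("OpenSearch-specific")
    dev_mode = Field("debugging flag")


class InheritingVectorRequest(SearchRequest):
    limit = Field()
    offset = Field()


class StandaloneVectorRequest(DeclarativeBase):
    q = Field("query string")
    limit = Field()
    offset = Field()


if __name__ == "__main__":
    print(InheritingVectorRequest.declared_fields())
    print(StandaloneVectorRequest.declared_fields())
```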

abeglova assigned shanbady and unassigned abeglova Nov 19, 2024
shanbady requested a review from abeglova November 19, 2024 18:10
abeglova (Contributor) left a comment

lgtm

shanbady merged commit 04018b7 into main Nov 19, 2024
11 checks passed
shanbady deleted the shanbady/vector-search-endpoint branch November 19, 2024 20:07
jonkafton added a commit that referenced this pull request Nov 20, 2024
* Release 0.24.3

* Release date for 0.24.3

* Server rendered search page results

* v2 drawer certification updates (#1823)

* update certification display in v2 drawer to match latest designs

* don't show price info item if runs have differing data

* MicroMasters not Micromasters

* if there is no price for the certificate but it's indicated that one is included, display that

* if resource is free, includes a certification but has no prices, still display the pill in the info item

* generate migration for MicroMasters spelling change

* fix certificate pill padding on mobile

* Unit channel page and search prefetch

* Featured list and testimonials only for unit channels

* v2 learning resource drawer formats and location (#1826)

* add format info item

* display location if format is in_person

* add tests

* also show location for hybrid courses

* LocalDate and NoSSR components to render localized dates only on client

* Revert "LocalDate and NoSSR components to render localized dates only on client"

This reverts commit b4ccd6d.

* LocalDate and NoSSR components to render localized dates only on client (#1831)

* LocalDate and NoSSR components to render localized dates only on client

* Remove unnecessary React.Fragment

* separate starts and as taught in, show anytime availability (#1828)

* refactor starts / as taught in functionality to show on separate lines, show "anytime" in starts if availability is anytime

* fix rebase mishap

* Map address search params

* Search params test

* Update dependency pytest-cov to v6 (#1818)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update dependency safety to v3 (#1819)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* URL search param validation and transforms to align with course-search-utils

* Update dependency django-anymail to v12 (#1815)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* vector search endpoint (#1827)

* adding initial vector search view

* adding working vector results endpoint

* regenerate openapi spec

* fixing format of returned results

* adding test

* patching qdrant client

* moving to v0 api

* switch to custom serializer for vector search

* fix v0 url

* using minimal serializer

* returning minimal response for vector results

* regenerate spec

* adding some other useful bits to response

* fixing response for empty query and adjusting test

* regenerate spec

* uninheriting from searchrequest serializer

* updating oai spec

* updating oai spec

* Update dependency @mui/lab to v6.0.0-beta.15 (#1830)

* Update dependency @mui/lab to v6.0.0-beta.15

* update lockfile

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: shankar ambady <[email protected]>

* Update to use validators from course-search-utils

---------

Co-authored-by: Doof <[email protected]>
Co-authored-by: Carey P Gumaer <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Shankar Ambady <[email protected]>
Co-authored-by: shankar ambady <[email protected]>
mbertrand pushed a commit that referenced this pull request Nov 22, 2024
odlbot mentioned this pull request Nov 25, 2024 (19 tasks)
Labels
Needs Review An open Pull Request that is ready for review
2 participants