embedding content updates and removals #2217
This seems to work well, although I haven't tested it with a delete or update for existing vectors. I will figure out how to test that. In the meantime, here are some comments.
operations = [
    migrations.AddField(
        model_name="contentfile",
        name="archive_checksum",
Can you add a task to the migration to set archive_checksum to the current value of checksum and calculate the checksum from the content for existing contentfiles?
Good call.
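For illustration, the kind of data migration being asked for might look like this minimal sketch (the app label, field names, and hashing choice are assumptions drawn from the surrounding diff; the dependency name is a placeholder, not the PR's actual migration):

```python
# A minimal sketch of the requested backfill; the app label, migration
# dependency name, and hashing choice are assumptions, not the PR's code.
import hashlib

from django.db import migrations


def backfill_checksums(apps, schema_editor):
    ContentFile = apps.get_model("learning_resources", "ContentFile")
    for content_file in ContentFile.objects.iterator():
        # Preserve the existing (archive-level) checksum in the new field,
        # then recompute the checksum from the file content itself.
        content_file.archive_checksum = content_file.checksum
        content_file.checksum = hashlib.md5(
            (content_file.content or "").encode("utf-8")
        ).hexdigest()
        content_file.save(update_fields=["archive_checksum", "checksum"])


class Migration(migrations.Migration):
    dependencies = [
        # placeholder: the migration that adds the archive_checksum field
        ("learning_resources", "00XX_contentfile_archive_checksum"),
    ]

    operations = [
        migrations.RunPython(backfill_checksums, migrations.RunPython.noop),
    ]
```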
learning_resources/serializers.py (outdated)
@@ -1286,13 +1288,15 @@ class Meta:
    "uid",
    "title",
    "description",
    "archive_checksum",
You don't need archive_checksum in the serializer, since it's not useful information for users and isn't used for Elasticsearch/vector search.
unpublished_tasks = [
    tasks.bulk_deindex_learning_resources.si(ids, resource_type),
]
if django_settings.QDRANT_ENABLE_INDEXING_PLUGIN_HOOKS:
I think the Elasticsearch tasks should be chained before the vector embedding tasks, since the vector embedding tasks can potentially take a long time. It's better to bring the Elasticsearch index up to date sooner.
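For example, that ordering could be enforced with a Celery chain along these lines (a sketch only; `remove_qdrant_records` is a hypothetical name for the vector removal task and the import paths are assumed, not taken from the PR):

```python
# Sketch of the suggested ordering: Elasticsearch deindexing first, vector
# removal second. remove_qdrant_records and the module paths are assumptions.
from celery import chain
from django.conf import settings as django_settings

from learning_resources_search import tasks  # assumed module layout
from vector_search.tasks import remove_qdrant_records  # hypothetical task


def deindex_unpublished(ids, resource_type):
    # The fast Elasticsearch task runs first so search results are corrected
    # before the potentially slow embedding/vector work starts.
    unpublished_tasks = [
        tasks.bulk_deindex_learning_resources.si(ids, resource_type),
    ]
    if django_settings.QDRANT_ENABLE_INDEXING_PLUGIN_HOOKS:
        unpublished_tasks.append(remove_qdrant_records.si(ids, resource_type))
    # chain() executes the signatures strictly in the order given.
    return chain(*unpublished_tasks).apply_async()
```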
vector_search/utils.py (outdated)
    based on the resource_type (learning resource vs content file)
    """
    client = qdrant_client()
    if resource_type != CONTENT_FILE_TYPE:
Does it make sense to have a separate version of update_payload(), should_generate_embeddings(), _embedding_context(), etc. for learning resources and content files instead of having all the if statements? You should be able to know which version of the function to call in the task.
vector_search/utils.py (outdated)
)


def should_generate_embeddings(serialized_document, resource_type):
I think it makes sense to have separate content file and learning resource versions of this function instead of all the if statements.
You can consider separate content-file- and resource-specific versions of update_payload() and _embedding_context() too.
Good idea. I updated the code so they each have their own _embedding_context, update_payload, and should_generate_embeddings methods.
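Roughly, that split reads like the sketch below (the function bodies, field checks, and context strings are placeholders for illustration, not the actual implementation):

```python
# Placeholder sketch of per-type helpers replacing the resource_type branches;
# the field checks and context strings here are illustrative only.
CONTENT_FILE_TYPE = "content_file"


def _resource_embedding_context(serialized):
    # Text handed to the embedding model for a learning resource.
    return f"{serialized.get('title', '')} {serialized.get('description', '')}"


def _content_file_embedding_context(serialized):
    # Text handed to the embedding model for a content file.
    return serialized.get("content", "")


def should_generate_resource_embeddings(serialized):
    return bool(serialized.get("published"))


def should_generate_content_file_embeddings(serialized):
    return bool(serialized.get("content"))


def embedding_contexts(serialized_documents, resource_type):
    # The caller already knows the type, so it picks the right pair of helpers
    # once instead of branching inside every function.
    if resource_type == CONTENT_FILE_TYPE:
        context = _content_file_embedding_context
        should_embed = should_generate_content_file_embeddings
    else:
        context = _resource_embedding_context
        should_embed = should_generate_resource_embeddings
    return [context(doc) for doc in serialized_documents if should_embed(doc)]
```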
lgtm
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/6754
Description (What does it do?)
This PR introduces updates/upserts and removals for embedding vectors.
The main changes:
For contentfiles
For resources
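For context, upserting and removing embedding points against Qdrant with the qdrant-client library looks roughly like the sketch below; the collection name, point IDs, payload, and local URL are assumptions, not this PR's actual code.

```python
# Rough sketch of vector upsert/removal with qdrant-client; names are assumed.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local instance
COLLECTION = "learning_resources"  # hypothetical collection name


def upsert_embedding(point_id, vector, payload):
    # Upsert overwrites the point if the ID already exists, so inserts and
    # updates go through the same call.
    client.upsert(
        collection_name=COLLECTION,
        points=[models.PointStruct(id=point_id, vector=vector, payload=payload)],
    )


def remove_embeddings(point_ids):
    # Removal when a resource or contentfile is unpublished or deleted.
    client.delete(
        collection_name=COLLECTION,
        points_selector=models.PointIdsList(points=point_ids),
    )
```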
How can this be tested?
Testing the ETL hooks:
python manage.py backpopulate_micromasters_data
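After running the command, one way to spot-check that the hooks wrote embeddings is to count the points in the collection (assuming a local Qdrant instance; the URL and collection name are hypothetical):

```python
# Quick spot-check; the URL and collection name are assumptions.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.count(collection_name="content_files", exact=True).count)
```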
For reference, here is a list of things that this PR (and functionality before it) should ensure: