-
Notifications
You must be signed in to change notification settings - Fork 3
ingestion of test/sandbox courses #2224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
def configure_test_resources(self, options): | ||
if options.get("test_ids"): | ||
test_ids = options["test_ids"].split(",") | ||
LearningResource.objects.filter(id__in=test_ids).update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that unpublished resources are also ingested and saved to the database by the ETL scripts, but this isn't always the case. For example the mitxonline ETL pipeline skips over unpublished courses altogether:
https://github.com/mitodl/mit-learn/blob/main/learning_resources/etl/mitxonline.py#L141-L144
return list(
_fetch_data(
settings.MITX_ONLINE_COURSES_API_URL,
params={
"page__live": True,
"live": True,
"courserun_is_enrollable": True,
},
)
)
So might need to modify the above, ingesting all mitxonline courses returned by the API without filters, and setting the ones not matching the above criteria to unpublished=False
Also, if the run(s) for any of these test courses are unpublished (which depends on source-specific criteria, for example xpro runs need prices, mitxonline runs need is_enrollable=True
and a live course url, etc), then I think the contentfiles still won't be ingested even with the --test-ids
option.
mit-learn/learning_resources/etl/edx_shared.py
Lines 114 to 117 in 4531af3
runs = LearningResourceRun.objects.filter( | |
learning_resource__etl_source=etl_source, | |
learning_resource_id__in=ids, | |
published=True, |
Will every test course have at least one published run even if the course itself is not published? If not, should all runs of a "test" course be processed regardless of its published status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding the runs filtering in edx_shared. would it make sense to do something like the following?
runs = LearningResourceRun.objects.filter(
learning_resource__etl_source=etl_source,
learning_resource_id__in=ids
).filter(Q(published=True) | Q(learning_resource__test_mode=True))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me
if options.get("test_ids"): | ||
test_ids = options["test_ids"].split(",") | ||
LearningResource.objects.filter(id__in=test_ids).update( | ||
test_mode=True, published=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the standard scheduled ETL pipeline runs, might be good to set test_mode=False
if published
gets set to True
(indicating that the course is no longer in "test" mode and should no longer be treated differently than other courses).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made it so that all the etl management commands (both file and data) do this at the end
LearningResource.objects.filter(published=True, test_mode=True).update(test_mode=False)
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
@@ -139,9 +138,7 @@ def extract_courses(): | |||
_fetch_data( | |||
settings.MITX_ONLINE_COURSES_API_URL, | |||
params={ | |||
"page__live": True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running backpopulate_mitxonline_data
on a fresh database, I'm getting this error:
celery-1 | "published": bool(course_run["is_enrollable"] and course["page"]["live"]),
celery-1 | ~~~~~~~~~~~~~~^^^^^^^^
celery-1 | TypeError: 'NoneType' object is not subscriptable
Seems like sometimes course['page']
is None
now that more results are coming through from the API. Try changing that line to:
"published": bool(course_run.get("is_enrollable", False) and (course.get("page") or {}).get("live", False)),
After that a new exception occurs:
celery-1 | File "/src/learning_resources/etl/mitxonline.py", line 360, in transform_programs
celery-1 | for course in _fetch_courses_by_ids(program["courses"])
celery-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery-1 | File "/src/learning_resources/etl/mitxonline.py", line 329, in _fetch_courses_by_ids
celery-1 | return list(
celery-1 | ^^^^^
celery-1 | File "/src/learning_resources/etl/mitxonline.py", line 47, in _fetch_data
celery-1 | ).json()
celery-1 | ^^^^^^
celery-1 | File "/opt/venv/lib/python3.12/site-packages/requests/models.py", line 978, in json
celery-1 | raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
OpenAPI ChangesShow/hide 190 changes: 0 error, 0 warning, 190 info
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/7170
Description (What does it do?)
This PR gives us the ability to mark specific courses as "test mode" courses which means - they are un-published (dont display anywhere) but still have their contentfiles pulled down and ingested into opensearch and qdrant (and therefore exposed via apis)
How can this be tested?
Additional Context:
Checklist: