@shanbady
Contributor

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/7081

Description (What does it do?)

This PR allows us to scrape multiple pages for a given marketing site (for purposes of embedding).
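
As a rough illustration of the idea (not the code in this PR), multi-page scraping can be sketched as: fetch the landing page, collect links that stay under the same site path, fetch each of those, and join the results into one document. The names `same_site_links`, `scrape_site`, and the injected `fetch` callable are hypothetical, and the "same host and path prefix" rule is an assumption about what counts as part of a marketing site.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(base_url: str, html: str) -> list[str]:
    """Return unique absolute links sharing base_url's host and path prefix."""
    collector = _LinkCollector()
    collector.feed(html)
    base = urlparse(base_url)
    # Seed with the landing page so in-page anchors (#top) are not refetched.
    seen = {urldefrag(base_url).url}
    links: list[str] = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        on_site = parsed.netloc == base.netloc and parsed.path.startswith(base.path)
        if on_site and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def scrape_site(base_url: str, fetch: Callable[[str], str]) -> str:
    """Fetch the landing page plus its same-site subpages and join them.

    `fetch` is a hypothetical callable (URL -> page source); in practice it
    could wrap an HTTP client or a webdriver.
    """
    landing = fetch(base_url)
    pages = [landing] + [fetch(url) for url in same_site_links(base_url, landing)]
    return "\n\n".join(pages)
```

Passing `fetch` in as a parameter keeps the sketch independent of whether pages are retrieved with plain HTTP or a webdriver, and makes it easy to exercise with canned HTML.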

How can this be tested?

  1. Check out this branch.
  2. Rebuild your web and celery containers.
  3. Set settings.EMBEDDINGS_EXTERNAL_FETCH_USE_WEBDRIVER = True.
  4. Restart celery (docker compose down/up celery).
  5. Run the task to fetch marketing page data:
       from learning_resources.tasks import scrape_marketing_pages
       scrape_marketing_pages.run()
  6. Inspect the content of the marketing pages:
       from learning_resources.models import ContentFile
       cfs = ContentFile.objects.filter(file_type="marketing_page")
       print(cfs.first().content)
  7. For MicroMasters program pages, the content should include text from all of the pages (the tabs at the top):
       [Screenshot 2025-04-11 at 4 37 09 PM]
       from learning_resources.models import ContentFile
       ContentFile.objects.filter(file_type="marketing_page", learning_resource__url__icontains='micromasters')
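
The last check can be made less manual with a small helper that reports which expected pieces of text are absent from the scraped content. This is only a sketch: `missing_snippets` is a hypothetical helper, not part of the codebase, and the snippets you pass in (for example, a phrase from each tab) are your own choice.

```python
def missing_snippets(content: str, expected: list[str]) -> list[str]:
    """Return the expected snippets that do not appear in content.

    Matching ignores case and collapses runs of whitespace, since scraped
    text often has irregular spacing.
    """
    haystack = " ".join(content.split()).lower()
    return [
        snippet
        for snippet in expected
        if " ".join(snippet.split()).lower() not in haystack
    ]
```

In a Django shell this could be called as `missing_snippets(cf.content, [...])` for each `ContentFile` returned by the query in the final step; an empty list means every expected snippet was found.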

@github-actions

OpenAPI Changes

No detectable change.

@shanbady changed the title from "multi marketing page scraping" to "multi page marketing site scraping" on Apr 11, 2025
@shanbady marked this pull request as ready for review on April 11, 2025 at 20:40
@abeglova self-assigned this on Apr 14, 2025
@abeglova left a comment

lgtm

@shanbady merged commit 4f82990 into main on Apr 14, 2025
12 checks passed
@shanbady deleted the shanbady/multi-page-scraping branch on April 14, 2025 at 18:45
This was referenced May 9, 2025