
nmdefries (Collaborator)

evalcast pulls truth data from the COVIDcast API one day/week at a time. Because per-request overhead dominates the pull time, this is slow compared to pulling all desired dates at once. We can do this using evalcast's caching feature.

This reduces the time to pull data from 3h 20m to 20m, and maximum memory usage decreases by ~7 GB.
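To illustrate why batching helps when overhead dominates, here is a minimal Python sketch under a made-up cost model. `FakeApi`, `fetch`, `pull_one_by_one` and `pull_with_cache` are hypothetical names for illustration only. This is not evalcast's API or the COVIDcast client, and the simulated timings are not meant to reproduce the numbers above.

```python
from datetime import date, timedelta

# Hypothetical cost model, in milliseconds. The numbers are illustrative
# only: each request pays a fixed overhead plus a small per-day cost.
REQUEST_OVERHEAD_MS = 2000
PER_DAY_COST_MS = 10


def dates(start, end):
    """All dates from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


class FakeApi:
    """Stand-in for a remote API that records simulated time and request count."""

    def __init__(self):
        self.elapsed_ms = 0
        self.requests = 0

    def fetch(self, start, end):
        days = dates(start, end)
        self.requests += 1
        self.elapsed_ms += REQUEST_OVERHEAD_MS + PER_DAY_COST_MS * len(days)
        # Dummy payload: one value per date.
        return {d: d.isoformat() for d in days}


def pull_one_by_one(api, start, end):
    # One request per date, so the fixed overhead is paid once per date.
    return {d: api.fetch(d, d)[d] for d in dates(start, end)}


def pull_with_cache(api, start, end):
    # One request for the whole range fills a local cache; per-date
    # lookups are then served from the cache with no further requests.
    cache = api.fetch(start, end)
    return {d: cache[d] for d in dates(start, end)}


if __name__ == "__main__":
    start = date(2022, 1, 1)
    end = start + timedelta(days=99)  # 100 days
    for pull in (pull_one_by_one, pull_with_cache):
        api = FakeApi()
        pull(api, start, end)
        print(pull.__name__, api.requests, "requests,", api.elapsed_ms, "ms")
```

For 100 days, both approaches return the same data, but the per-date version makes 100 requests (201000 simulated ms) while the cached version makes 1 request (3000 simulated ms). The gap grows with the number of dates because the fixed overhead is paid once instead of once per date.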


brookslogan left a comment


Looks good to me. I left a couple of minor comments below, in case you think they need edits.

Note that I have not been able to test create_reports.R in its entirety.

Running it locally (outside Docker), I get:

Warning in covidHubUtils::get_model_designations(source = "zoltar") :
  get_model_designations() will be deprecated soon. please use get_model_metadata() instead.
get_token(): POST: https://zoltardata.com/api-token-auth/
get_resource(): GET: https://zoltardata.com/api/projects/
Error in data.frame(id = id_column, url = url_column, owner_url = owner_url_column,  : 
  arguments imply differing number of rows: 10, 0

and make build fails with

mkdir dist
test -f dist/score_cards_state_deaths.rds || curl -o dist/score_cards_state_deaths.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_state_deaths.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25.0M  100 25.0M    0     0  10.5M      0  0:00:02  0:00:02 --:--:-- 10.5M
test -f dist/score_cards_state_cases.rds || curl -o dist/score_cards_state_cases.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_state_cases.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14.2M  100 14.2M    0     0  8760k      0  0:00:01  0:00:01 --:--:-- 8759k
test -f dist/score_cards_nation_cases.rds || curl -o dist/score_cards_nation_cases.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_nation_cases.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  364k  100  364k    0     0  30567      0  0:00:12  0:00:12 --:--:-- 93150
test -f dist/score_cards_nation_deaths.rds || curl -o dist/score_cards_nation_deaths.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_nation_deaths.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  651k  100  651k    0     0   756k      0 --:--:-- --:--:-- --:--:--  756k
test -f dist/score_cards_state_hospitalizations.rds || curl -o dist/score_cards_state_hospitalizations.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_state_hospitalizations.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 94.8M  100 94.8M    0     0  14.9M      0  0:00:06  0:00:06 --:--:-- 18.1M
test -f dist/score_cards_nation_hospitalizations.rds || curl -o dist/score_cards_nation_hospitalizations.rds https://forecast-eval.s3.us-east-2.amazonaws.com/score_cards_nation_hospitalizations.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2131k  100 2131k    0     0  1998k      0  0:00:01  0:00:01 --:--:-- 1999k
test -f dist/datetime_created_utc.rds || curl -o dist/datetime_created_utc.rds https://forecast-eval.s3.us-east-2.amazonaws.com/datetime_created_utc.rds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   167  100   167    0     0    401      0 --:--:-- --:--:-- --:--:--   400
docker build --no-cache=true --pull -t ghcr.io/cmu-delphi/forecast-eval: -f devops/Dockerfile .
invalid argument "ghcr.io/cmu-delphi/forecast-eval:" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
make: *** [Makefile:45: build_dashboard] Error 125

Is it worth taking a closer look at these replication issues, or are you already convinced that it will deploy successfully?

nmdefries (Collaborator, Author) commented Dec 5, 2022

This runs successfully in production and locally for me.

The make build failure happens because that is the production target for the dashboard, and it relies on an environment variable being set. The make score_forecast target is the one that runs the scoring pipeline.

The "arguments imply differing number of rows: 10, 0" error is actually a zoltr problem that has been fixed in a newer version of zoltr.

I wouldn't bother trying to test it; it takes a fair amount of setup. Thanks for trying, though. Your feedback shows I need to update the README and make a dev version of the pipeline for local testing so this is easier to use!

@nmdefries nmdefries merged commit 6dc330d into dev Dec 5, 2022
@nmdefries nmdefries deleted the ndefries/use-cache branch December 5, 2022 18:18