@nmdefries commented Jun 9, 2021

Description

Dashboard data pipeline now fetches and filters hospitalization forecasts in addition to deaths and cases.

Changes

  • Pull all daily hospitalization forecasts (aheads 1 to 28)
  • Generalize valid target date/forecast date filters to apply to hospitalizations
  • Generalize save_score_cards and evaluate_chu to apply to hospitalizations
  • Calculate and save hospitalization forecast scores

Implications

This exacerbates existing memory issues since we're loading more forecasts and reference data. Memory issues will be addressed separately in changes to evalcast.

fetch hosp data with all aheads
@nmdefries requested a review from kateharwood June 9, 2021 16:36
  geo_values = state_geos,
  verbose = TRUE,
- use_disk = TRUE)
+ use_disk = TRUE) %>%
+ filter(!(incidence_period == "epiweek" & ahead > 4))
Contributor

Just curious about this addition. It looks like this wasn't here before, yet we were still only saving aheads 1-4 for epiweek predictions (cases and deaths). I thought Jed made this cutoff elsewhere.

Collaborator Author

@nmdefries Jun 9, 2021

By default, get_covidhub_predictions uses ahead = 1:4 for both day and epiweek forecasts. However, daily forecasts actually go up to aheads of 28. To get those without getting epiweek forecasts more than 4 weeks ahead, I switched the ahead setting to 1-28 and added the filter.

We could instead make two separate calls to get_covidhub_predictions here, one for cases + deaths and one for hospitalizations, each with its own ahead setting. However, the underlying get_forecaster_predictions_alt downloads all forecast files every time it's run (are you aware of any particular reason for this?), so the memory/speed tradeoff is poor at the moment.
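
The effect of widening `ahead` and then filtering can be sketched on a toy data frame. This is a minimal base R sketch, not the pipeline's code: the `predictions` data frame and its `forecaster` value are made up for illustration, and only the filter condition is taken from the diff above.

```r
# Toy stand-in for the predictions data frame (hypothetical values).
predictions <- data.frame(
  forecaster = "example-model",  # hypothetical forecaster name
  incidence_period = c("day", "day", "epiweek", "epiweek"),
  ahead = c(1, 28, 4, 5)
)

# Base R equivalent of
#   filter(!(incidence_period == "epiweek" & ahead > 4))
# Daily rows are kept at any ahead; epiweek rows are kept only up to ahead 4.
keep <- !(predictions$incidence_period == "epiweek" & predictions$ahead > 4)
filtered <- predictions[keep, ]

print(filtered)
```

In this toy input, the daily row with ahead 28 survives while the epiweek row with ahead 5 is dropped, which is the behavior the comment describes.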

Collaborator Author

"are you aware of any particular reason for this?"

Ah, perhaps because this was originally intended to run in GitHub Actions, where the files wouldn't persist between sessions anyway.

Contributor

I see, that makes sense. And yes, I believe that is the case re: downloading files.

- # Only accept forecasts made Monday or earlier
+ # For epiweek predictions, only accept forecasts made Monday or earlier.
+ # target_end_date is the date of the last day (Saturday) in the epiweek
+ # For daily predictions, accept any forecast where the target_end_date is later
Contributor

Is there a reason we aren't using the "Monday or earlier" cutoff for hospitalization data?

Collaborator Author

The hospitalization forecasts are produced for every day following the forecast date; the target is "N day ahead inc hosp". My understanding is that the "Monday or earlier" cutoff is only relevant for weekly forecasts, since we want to make sure that a forecast for a given week isn't made with partial information about that week (e.g. it's easy to predict cases for a week if you already know the values for 6 of its 7 days). Will check with Dan.
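
A minimal base R sketch of the rule described here. The data frame, the `monday_cutoff` variable, and the exact date arithmetic are assumptions for illustration, not the pipeline's implementation; the daily rule assumes "later" means later than `forecast_date`.

```r
# Toy forecasts (hypothetical values). 2021-06-12 and 2021-06-19 are Saturdays,
# 2021-06-07 is a Monday, 2021-06-08 is a Tuesday.
forecasts <- data.frame(
  incidence_period = c("epiweek", "epiweek", "epiweek", "day"),
  ahead = c(1, 1, 2, 1),
  forecast_date = as.Date(c("2021-06-07", "2021-06-08",
                            "2021-06-07", "2021-06-08")),
  target_end_date = as.Date(c("2021-06-12", "2021-06-12",
                              "2021-06-19", "2021-06-09"))
)

# Monday of the first epiweek being forecast: step back (ahead - 1) weeks from
# the target Saturday, then 5 more days from that Saturday to its Monday.
monday_cutoff <- forecasts$target_end_date - 7 * (forecasts$ahead - 1) - 5

is_weekly <- forecasts$incidence_period == "epiweek"

# Weekly forecasts must be made on or before that Monday; daily forecasts only
# need a target date after the forecast date (assumed reading of the comment).
forecasts$valid <- ifelse(
  is_weekly,
  forecasts$forecast_date <= monday_cutoff,
  forecasts$target_end_date > forecasts$forecast_date
)

print(forecasts)
```

In this toy input, the weekly forecast made on Tuesday is rejected because it could already include part of the target week, while the daily forecast made on the same Tuesday is kept.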

Collaborator Author

This approach matches Dan's understanding.

Contributor

Got it, thanks.

@nmdefries
Collaborator Author

After incorporating the Docker changes in #125, this updated pipeline runs as expected.

@nmdefries requested a review from kateharwood June 24, 2021 14:05
@nmdefries merged commit ac76cc1 into dev Jun 25, 2021
@nmdefries deleted the support-hospitalizations branch June 25, 2021 20:43