Support hospitalizations in data pipeline #120
fetch hosp data with all aheads
  geo_values = state_geos,
  verbose = TRUE,
- use_disk = TRUE)
+ use_disk = TRUE) %>%
+ filter(!(incidence_period == "epiweek" & ahead > 4))
Just curious about this addition. It looks like this wasn't here before, yet we were still only saving aheads 1-4 for epiweek predictions (cases and deaths). I thought Jed made this cutoff elsewhere.
By default, get_covidhub_predictions uses ahead = 1:4 for both day and epiweek forecasts. However, daily forecasts actually go up to aheads of 28. To get those without getting epiweek forecasts more than 4 weeks ahead, I switched the ahead setting to 1-28 and added the filter.
We could do two separate calls to get_covidhub_predictions here, one for cases + deaths and one for hospitalizations, with different ahead settings. However, the underlying get_forecaster_predictions_alt downloads all forecast files every time it's run (are you aware of any particular reason for this?), so the memory/speed tradeoff is poor at the moment.
are you aware of any particular reason for this?
Ah, perhaps because this was originally intended to run in GitHub Actions, where the files wouldn't persist between sessions anyway.
I see, that makes sense. And yes I believe that is the case re: downloading files.
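To illustrate the filter discussed in this thread, here is a minimal sketch on a toy data frame. Only the column names incidence_period and ahead come from the diff; the data and the object names predictions and filtered are made up for illustration, and the condition is written in base R rather than with dplyr so it runs without extra packages.

```r
# Toy stand-in for the predictions table (hypothetical data).
# Epiweek rows have aheads 1-6; daily rows have aheads up to 28.
predictions <- data.frame(
  incidence_period = c(rep("epiweek", 6), rep("day", 5)),
  ahead = c(1:6, 1, 4, 5, 14, 28),
  stringsAsFactors = FALSE
)

# Same condition as the filter() call in the diff, expressed in base R:
# drop epiweek rows more than 4 weeks ahead, keep every daily row.
keep <- !(predictions$incidence_period == "epiweek" & predictions$ahead > 4)
filtered <- predictions[keep, ]

# Row counts per incidence period after filtering.
print(table(filtered$incidence_period))
```

In this toy case the two epiweek rows with ahead 5 and 6 are dropped, while the daily rows with ahead greater than 4 are kept.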
- # Only accept forecasts made Monday or earlier
+ # For epiweek predictions, only accept forecasts made Monday or earlier.
+ # target_end_date is the date of the last day (Saturday) in the epiweek
+ # For daily predictions, accept any forecast where the target_end_date is later
Is there a reason we aren't using the "Monday or earlier" cutoff for hospitalization data?
The hospitalization forecasts are produced for every day following the forecast date; the target is N day ahead inc hosp. My understanding is that the "Monday or earlier" cutoff is only relevant for weekly forecasts, since we want to make sure that forecasts for a week aren't made with partial information for that week (i.e. it's easy to predict cases for a week if you know the values for 6 out of 7 days for that week). Will check with Dan.
This approach matches Dan's understanding.
Got it, thanks.
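As a rough illustration of the "Monday or earlier" rule for weekly forecasts discussed in this thread, the sketch below checks a forecast date against the Monday of the target epiweek. The helper made_monday_or_earlier is hypothetical and is not the pipeline's code. It assumes target_end_date is the Saturday that closes a Sunday-to-Saturday epiweek, and it only covers the case where the target is the epiweek in which the forecast is made.

```r
# Hypothetical helper, not taken from the pipeline: TRUE when a forecast was
# made on or before the Monday of the epiweek that ends on target_end_date.
# Assumes target_end_date is a Saturday (last day of the epiweek).
made_monday_or_earlier <- function(forecast_date, target_end_date) {
  week_monday <- as.Date(target_end_date) - 5  # Saturday minus 5 days is Monday
  as.Date(forecast_date) <= week_monday
}

target_end <- as.Date("2021-01-09")  # a Saturday
forecast_dates <- as.Date(c("2021-01-03", "2021-01-04",
                            "2021-01-05", "2021-01-08"))

# Sunday and Monday forecasts pass; Tuesday and Friday forecasts would
# already have seen part of the target week, so they fail.
accepted <- made_monday_or_earlier(forecast_dates, target_end)
print(data.frame(forecast_date = forecast_dates, accepted = accepted))
```

A forecast made on the Friday of the target week could already see most of that week's values, which is the partial-information problem the cutoff is meant to avoid.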
After incorporating Docker changes in #125, this updated pipeline runs as expected.
Description
Dashboard data pipeline now fetches and filters hospitalization forecasts in addition to deaths and cases.
Changes
Update save_score_cards and evaluate_chu to apply to hospitalizations
Implications
This exacerbates existing memory issues since we're loading more forecasts and reference data. Memory issues will be addressed separately in changes to evalcast.