Skip to content

total rain between onset and cessation, multiplying size of problem b… #298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

remicousin
Copy link
Contributor

…y munber of years

@remicousin remicousin self-assigned this Mar 21, 2023
@remicousin remicousin requested review from aaron-kaplan and removed request for aaron-kaplan March 27, 2023 14:29
total_mean:
map_max: 1100
total_stddev:
map_max: 300
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be hard to remember to update all these max values in the config file when we get new data. Not sure what to do about it. One idea is to calculate them within the zarrification script and store the values in zarr attributes, but we can't necessarily guess ahead of time all the different values we'll eventually need. Does it make sense to rerun the zarrification script every time we think of a new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by new data, you mean new data for Zambia, then those values won't need updates, or maybe at the end of the century when climate change will have totally change the domain of definition of those quantities. If you mean new data for another country, we will likely know right away that we have to change them because the values will likely be off for another part of the world and so the maps will look funny.

I also don't think we want those values to be coming from the data because, as you said, the possibilities are many, and we also rounded values that make the colorbar reading easy (and maybe even more specific values depending on the colorbar features). Typically, there will be values that we can figure out beforehand like these, or max_taw (even though I computed it live) that we can hard code. And there will be values that are dependent on the choices made in the Maproom page and thus must be computed in-app (like the length of search days in onset maprooms is a user choice and used to assess max of several maps).

Ultimately, all these things should be default values that have the page make sense upon landing (or new options triggered) but we should have a feature across maprooms to allow the user to change the range of their colorscale. So I would reather focus on such a feature.

seasonal_total = precip_tile.where(
(precip_tile["T"] >= onset_dates.rename({"T": "Ty"}))
& (precip_tile["T"] <= cess_dates.rename({"T": "Ty"}))
).sum("T")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? precip_tile has daily or dekadal resolution, while onset_dates and cess_dates are yearly, right? Along what dimension to they get aligned in order to do the comparison?

@remicousin
Copy link
Contributor Author

Please hold: I clicked you as reviewer by mistake this morning. This is work in progress... sorry.

@remicousin
Copy link
Contributor Author

@aaron-kaplan , this is ready for review. I did both the map and the time series in the same PR because I find it informative that the dimensionality of the object against this is applied calls for different solutions.

The commits have sense:

  • total rain between onset... is an FYI mostly: it's the first solution I implemented that blows up the size of the objects being dealt with by creating a days by years array
  • More efficient... corrects the above by using grouping for the map. Since the groups must be the same through independent dimensions, the varying seasons edges are used to mask the data instead of creating groups, then the search dates for onset are used to create groups (so constantly for all X/Y)
  • tot time series... covers the time series case. There is no more the previous dimensional problem raised by the map, so instead I'm jumping straight into grouping using succession of onset and cessation dates as bin edges, and then picking every other group
  • correct paring... and bar plot for total are after rebasing with master and integrating the corrections made this week
  • then the last commit deals with the time series case where some onset or cessation dates are missing, then the pair must be dropped and the rest of the code can remain the same

(you might see at some point some toto= in the code, please ignore, they were not meant to be committed, they were for printing stuff and exploration).

Let me know what you think and then there might be stuff to extract for generalization.

seasonal_total = precip.groupby_bins(
"T",
seasonal_edges,
include_lowest=True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to double check is whether the inclusion is indeed inclusive for both edges. That's what I would want the way it is currently written.

@aaron-kaplan
Copy link
Collaborator

Memory usage is excessive. I'm working on bringing it down.

@remicousin
Copy link
Contributor Author

Memory usage is excessive. I'm working on bringing it down.

3 remarks:

1/ This soil moisture, calculated by water_balance, is the same always, at least in this maproom set up. So we could calculate it once and for all, store it in zarr format and read that to feed into the cessation date calculation

2/ This water_balance function was written quickly to satisfy the need to deliver this maproom. It's a special case of agronomy.soil_plant_water_balance but I don't know if soil_plant_water_balance has a more cost effective implementation.

3/ when/if the soil moisture, one day, is required to be computed live for some reason, we could probably be a bit more subtle and compute it only for a period of time allowing to do the seasonal search for the cessation date, as opposed to computing soil_moisture for all times regardless as it's done now.

@aaron-kaplan
Copy link
Collaborator

I think I see the way forward for computing functions where the value at each time step depends on the previous time steps. We should use DataArray.reduce. Its first argument should be a function func that takes two arguments: a DataArray that represents the result of the cumulative calculation for steps 0..i, and a DataArray that represents the input data for step i+1. Writing it this way will have two advantages: (1) it avoids storing intermediate results longer than necessary, which is currently causing memory problems, and (2) although it won't immediately make the calculation faster, it opens the possibility of someday rewriting func in C, which would make it faster with relatively low effort.

This entails rewriting some of the existing code from previous PRs. I think we should rewrite everything in this paradigm: water balance, onset date, cessation date, and seasonal total. Write each one as a function that takes the result of the calculation from previous steps and the rainfall for the current step, and returns
the result of the calculation for the current step. The seasonal total function will call the onset and cessation date functions, which in turn will each call the water balance function. (I guess that duplicates the water balance calculation; we can probably figure out a way to avoid that.)

@aaron-kaplan
Copy link
Collaborator

BTW precalculating water balance is also a good idea, but then the zarrification/precalculation script should use reduce.

@aaron-kaplan
Copy link
Collaborator

This water_balance function was written quickly to satisfy the need to deliver this maproom. It's a special case of agronomy.soil_plant_water_balance but I don't know if soil_plant_water_balance has a more cost effective implementation.

soil_plant_water_balance uses a function called soil_plant_water_step, which is the type of func that reduce expects. But then it stores the result of the soil moisture calculation for all times (variable sm), whereas I'm suggesting that we should store the soil moisture for time t only long enough to calculate it for t+1.

@aaron-kaplan
Copy link
Collaborator

@remicousin where do we stand on this? Is the ball in your court or mine?

@remicousin
Copy link
Contributor Author

That one is on hold until we finish rewriting the water balance / cessation to save or not to save all time steps, which is in progress in that PR #351 which is on you

@remicousin
Copy link
Contributor Author

But if anything I'd rather you start with PR #280 because Tufa would like to demo it in 2 weeks, whereas this updated onset or the water balance work, it's less clear when they want them finished.

1 similar comment
@remicousin
Copy link
Contributor Author

But if anything I'd rather you start with PR #280 because Tufa would like to demo it in 2 weeks, whereas this updated onset or the water balance work, it's less clear when they want them finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants