-
Notifications
You must be signed in to change notification settings - Fork 141
Remove implicit use of CRS in code base #1020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merge develop into branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the lat lon check function! I think it is very valuable. I am not really sure in which other functions we could add it. What about:
latlon_to_geosph_vector(lat, lon, rad=False, basis=False)
lon_bounds(lon, buffer=0.0)
latlon_bounds(lat, lon, buffer=0.0)
dist_approx()
dist_to_coast_nasa()
These were all I could find in climada.util.coordinates that looked like they would require geographical coordinates. But maybe you thought about it and it does not make sense here?
I only have a few suggestions:
- describe the error message a bit better so the user has a better idea what could be wrong and what to do about it.
- change the range to -540 to 540 (apparently there is no convention in CLIMADA, and this seems soft enough as a error range)
- Could you restructure the PR description a bit? It was unclear what was done and what not. Also, the link to scrum number 129 is a link to issue 129 which I think is different? For the description, e.g.
Changes proposed in this PR:
- Add a cheap test for functions that must be in epsg:4326 but do take only value as input to catch when input coordinates are not geographic (e.g. in meters)
This PR treats scrum user story #129 Consistent use of CRS in CLIMADA
Further information:
...
Suggestions for future PRs
- Update functions that assume epsg:4326 without reason to be able to use other crs. (work in progress / requires more information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it looks good! I would propose a few improvements and clarifications.
The error message is repeated quite often in the code. This should thus rather be handled by a function. Please either write a wrapper, or include into check_if_geo_coords
.
if not check_if_geo_coords(lat, lon):
raise ValueError(
"Input lat and lon coordinates do not seem to correspond"
" to geographic coordinates in degrees. This can be because"
" total extents are > 180 for lat or > 360 for lon, lat coordinates"
" are outside of -90<lat<90, or lon coordinates are outside of -540<lon<540."
" If you use degree values outside of these ranges,"
" please shift the coordinates to the valid ranges."
)
Oh and please update the changelog, thanks!
raise ValueError( | ||
f"Unknown unit: {unit}. Please provide a crs that has a unit " | ||
"of 'degree', 'metre' or 'kilometre'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to forbid other units? I am not sure to understand why at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forbade it essentially because of the threshold
parameter that we use to assign or not the coordinates. If we allow for other units, we would need to ensure that the user overseeds the default threshold value (which is currently set to 100 km), or define default thresholds for other coordinate units. I figured it was simpler to force the input coordinates to be in either SI units (m, km) or degrees, but I guess we could be more flexible if you think that it adds value. Let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it makes sense. A well-spotted subtlety, thanks.
It is a bit unfortunate to introduce a limitation just to avoid incoherent results with some default value, which anyway, is quite sub-optimal at the moment. I think 100km for the threshold is often not very well chosen. I am not sure what to do here.
I would tend not to introduce limitations in this PR, and rather open an issue to better handle the threshold value. So, concretely, I would continue to handle explicitly meters and kilometres, but only for the threshold in the assignement. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit unfortunate to introduce a limitation just to avoid incoherent results with some default value, which anyway, is quite sub-optimal at the moment. I think 100km for the threshold is often not very well chosen. I am not sure what to do here.
Yes, I agree. 100 km is probably far from being optimal in many situations. But on the other hand I would not know what other value would be suited.. Maybe using a relative threshold i.e. that is based on the resolution of the input data would be better suited? E.g. res_coords = np.roll(coord_gdf, 1, axis=(0,1)).mean()
, res_centr = np.roll(centroids, 1, axis=(0,1)).mean()
and then set a default threshold
in function of that and alternatively warn if the difference of resolution between the two input data is too large? Not sure if that would make sense though. And also it would only work with gridded data... But yes maybe it is better to open a new issue for that.
So, concretely, I would continue to handle explicitly meters and kilometres, but only for the threshold in the assignement. What do you think?
Not sure I understood right. Do you suggest to not raise an exception if unit
is not one of (degree, km, m), and then just use a default value of 100 as a threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about you raise a warning when the unit is not among the supported ones in the assign centroids part and explain that the threshold might be unsuitable?
I like the suggestion for improving the default value of the threshold, but maybe this should be done in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now do not convert the coordinates to be consistent with the default threshold in km and warn that the threshold is now considered as being defined in the input unit. I will set up a new issue to discuss how to improve the default behavior for the distance threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. Can you please open an issue to address this?
climada/util/coordinates.py
Outdated
except AttributeError as exc: | ||
# a crs attribute is needed for unit inference | ||
raise ValueError( | ||
"Please provide coordinate GeoDataFrame and Hazard object with a valid crs attribute." | ||
) from exc | ||
|
||
# get unit of coordinate systems from axis of crs | ||
unit = get_crs_unit(coord_gdf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused. Why would the above exception be needed, since the function get_crs_unit(coord_gdf)
is used after? Why would one first check if the CRS are equal, and afterwards check if they exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be able to correctly handle different units, we need to have a crs attribute defined for both coord_gdf
and centroids
. They also need to be equal for consistency. This is essentially what those try-except-raise
blocks do, I think:
try:
- Check if crs of
coord_gdf
and centroids are equal - Check that none of the crs of both
coord_gdf
andcentroids
isNone
except AttributeError:
- If
AttributeError
is raised, then it means that one of the input (e.g.centroids
) does not have acrs
attribute, hence we also need to raise an error as crs are required.
I don't know if that helps.. Also I am not sure why you mention the get_crs_unit(coord_gdf)
; do you mean we should not raise before the call to get_crs_unit(coord_gdf)
and just let an exception be raised from it when there is no crs defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being unclear.
If you check that the centroids are equal, then there is no point in checking whether either is None, as this is only possible if they are unequal. The only question is what to do if both are None, and I do not know what equal_crs returns in this case.
For the AttributeError you are right we can leave it as is. The error message could be improved a bit, since the method does neither have GeoDataFrame nor Hazard as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked and equal_crs
returns True
if both are None
. I simplified the check and clarified the error messages accordingly.
climada/util/coordinates.py
Outdated
raise ValueError( | ||
"Only degree unit is supported for nearest neighbor approximation." | ||
"Please use euclidean distance for non-degree units." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error message is not useful. The method cannot use Euclidian distance. What should the user do to fix this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the message to the following:
raise ValueError( "Only degree unit is supported for nearest neighbor matching using" "'haversine' distance. Please use euclidean distance by setting " "'distance=euclidean' in the match_coordinates or match_centroids " "function when using units that are not degrees." )
Let me know if that is clearer.
climada/util/coordinates.py
Outdated
else: | ||
# if unit is not in degree, check_antimeridian is forced to False | ||
check_antimeridian = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the antimeridian is crossed in a crs without degrees? Is this possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I see two possible cases:
- The user uses a local projected coordinates system which is locally defined around the antimeridian region. Hence, there is no discontinuity at the antimeridian and everything is fine.
- The user uses a global projected coordinates system which is discontinuous at the antimeridian (e.g. EPSG:3857). In which case, I am not sure if crossing the antimeridian should be allowed, as it might be unphysical, and I am even less sure on how to handle this, given our current implementation of handling antimeridian crossing that relies on the haversine distance.
Maybe we could just raise a warning that "handling of antimeridian crossing is not implemented for non-degree coordinates. Check_antimeridian has been set to False. Please use degree-based coordinates system if you want to allow antimeridian crossing". Let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can try that. If this ends up logging warnings all the time we should remove it again as this would not be the goal of a warning.
Co-authored-by: Chahan M. Kropf <[email protected]>
…old in the given unit
… in non-degree units
Please do not forget to update the changelog, thanks! |
…hbor_haversine Co-authored-by: Chahan M. Kropf <[email protected]>
…hbor_approx Co-authored-by: Chahan M. Kropf <[email protected]>
climada/util/coordinates.py
Outdated
@@ -1089,15 +1173,38 @@ def match_coordinates( | |||
|
|||
# assign remaining coordinates to their geographically nearest neighbor | |||
if threshold > 0 and exact_assign_idx.size != coords_view.size: | |||
# convert to proper units before proceeding to nearest neighbor search | |||
if unit == "degree": | |||
# check that coords are indeed geographic before converting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? No coordinate conversion is performed in this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a legacy of when I was converting meters to kilometers. I removed it.
I just tried to implement a solution for the default threshold, issue #1079 . I found that we probably should do this at the same time as this PR in the end. I found some small redundancies in the current implementation which are easier to remove at once. It is still a work in progress on #1080, let's have a direct chat to decide how to proceed. |
The changes proposed here were built upon in the PR #1080. I will close this PR to concentrate on the other one. |
Changes proposed in this PR:
This PR treats scrum user story number 129 Consistent use of CRS in CLIMADA
Further information:
get_country_geometries
ordist_to_coast
that are based on naturalearth data) at least test that input data seems geographic. Currently allowed values are -91 < lat < 90 and -181 < lon < 541 and maximum extension of 181 (latitudinal) and 361 (longitudinal) but it is not exactly clear to me where to do the cut-off. For instance, do we want to allow longitudinal extension greater than 360, or longitudinal values smaller than -180 (some tests related to that e.g.test_track_land_params
are still failing due to longitudinal extension > 360)?trop_cyclone
module but I was not exactly sure of what to do there. For instance, theget_close_centroids
does not check that input centroids and lat and lon from the TC track are geographic; I was not sure if this should be explicitly check or if it can be left as is.PR Author Checklist
develop
)PR Reviewer Checklist