Skip to content

Conversation

luseverin
Copy link
Collaborator

@luseverin luseverin commented Mar 6, 2025

Changes proposed in this PR:

  • Add a cheap test for functions that must be in epsg:4326 but do take only value as input to catch when input coordinates are not geographic (e.g. in meters)
  • Update functions that assume epsg:4326 without reason to be able to use other crs. (work in progress / requires more information)

This PR treats scrum user story number 129 Consistent use of CRS in CLIMADA

Further information:

  • I added a quick test so function that require geographic coordinates (e.g. get_country_geometries or dist_to_coast that are based on naturalearth data) at least test that input data seems geographic. Currently allowed values are -91 < lat < 90 and -181 < lon < 541 and maximum extension of 181 (latitudinal) and 361 (longitudinal) but it is not exactly clear to me where to do the cut-off. For instance, do we want to allow longitudinal extension greater than 360, or longitudinal values smaller than -180 (some tests related to that e.g. test_track_land_params are still failing due to longitudinal extension > 360)?
  • It was specifically mentionned to treat issues in the trop_cyclone module but I was not exactly sure of what to do there. For instance, the get_close_centroids does not check that input centroids and lat and lon from the TC track are geographic; I was not sure if this should be explicitly check or if it can be left as is.
  • It was also supposed to adapt functions to allow use of other crs when possible but I did not find any instances of where this would be worth doing it or could be done without too much trouble. Some inputs on this are welcome!

PR Author Checklist

PR Reviewer Checklist

@luseverin luseverin added conventions Coding conventions or style Code update labels Mar 6, 2025
Copy link
Collaborator

@ValentinGebhart ValentinGebhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the lat lon check function! I think it is very valuable. I am not really sure in which other functions we could add it. What about:
latlon_to_geosph_vector(lat, lon, rad=False, basis=False)
lon_bounds(lon, buffer=0.0)
latlon_bounds(lat, lon, buffer=0.0)
dist_approx()
dist_to_coast_nasa()
These were all I could find in climada.util.coordinates that looked like they would require geographical coordinates. But maybe you thought about it and it does not make sense here?

I only have a few suggestions:

  • describe the error message a bit better so the user has a better idea what could be wrong and what to do about it.
  • change the range to -540 to 540 (apparently there is no convention in CLIMADA, and this seems soft enough as a error range)
  • Could you restructure the PR description a bit? It was unclear what was done and what not. Also, the link to scrum number 129 is a link to issue 129 which I think is different? For the description, e.g.

Changes proposed in this PR:

  • Add a cheap test for functions that must be in epsg:4326 but do take only value as input to catch when input coordinates are not geographic (e.g. in meters)

This PR treats scrum user story #129 Consistent use of CRS in CLIMADA

Further information:
...

Suggestions for future PRs

  • Update functions that assume epsg:4326 without reason to be able to use other crs. (work in progress / requires more information)

@chahank chahank removed the request for review from emanuel-schmid July 24, 2025 09:15
Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks good! I would propose a few improvements and clarifications.

The error message is repeated quite often in the code. This should thus rather be handled by a function. Please either write a wrapper, or include into check_if_geo_coords.

if not check_if_geo_coords(lat, lon):
    raise ValueError(
        "Input lat and lon coordinates do not seem to correspond"
        " to geographic coordinates in degrees. This can be because"
        " total extents are > 180 for lat or > 360 for lon, lat coordinates"
        " are outside of -90<lat<90, or lon coordinates are outside of -540<lon<540."
        " If you use degree values outside of these ranges,"
        " please shift the coordinates to the valid ranges."
    )

Oh and please update the changelog, thanks!

Comment on lines +140 to +142
raise ValueError(
f"Unknown unit: {unit}. Please provide a crs that has a unit "
"of 'degree', 'metre' or 'kilometre'."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to forbid other units? I am not sure to understand why at this stage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forbade it essentially because of the threshold parameter that we use to assign or not the coordinates. If we allow for other units, we would need to ensure that the user overseeds the default threshold value (which is currently set to 100 km), or define default thresholds for other coordinate units. I figured it was simpler to force the input coordinates to be in either SI units (m, km) or degrees, but I guess we could be more flexible if you think that it adds value. Let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it makes sense. A well-spotted subtlety, thanks.

It is a bit unfortunate to introduce a limitation just to avoid incoherent results with some default value, which anyway, is quite sub-optimal at the moment. I think 100km for the threshold is often not very well chosen. I am not sure what to do here.

I would tend not to introduce limitations in this PR, and rather open an issue to better handle the threshold value. So, concretely, I would continue to handle explicitly meters and kilometres, but only for the threshold in the assignement. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit unfortunate to introduce a limitation just to avoid incoherent results with some default value, which anyway, is quite sub-optimal at the moment. I think 100km for the threshold is often not very well chosen. I am not sure what to do here.

Yes, I agree. 100 km is probably far from being optimal in many situations. But on the other hand I would not know what other value would be suited.. Maybe using a relative threshold i.e. that is based on the resolution of the input data would be better suited? E.g. res_coords = np.roll(coord_gdf, 1, axis=(0,1)).mean(), res_centr = np.roll(centroids, 1, axis=(0,1)).mean() and then set a default threshold in function of that and alternatively warn if the difference of resolution between the two input data is too large? Not sure if that would make sense though. And also it would only work with gridded data... But yes maybe it is better to open a new issue for that.

So, concretely, I would continue to handle explicitly meters and kilometres, but only for the threshold in the assignement. What do you think?

Not sure I understood right. Do you suggest to not raise an exception if unit is not one of (degree, km, m), and then just use a default value of 100 as a threshold?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about you raise a warning when the unit is not among the supported ones in the assign centroids part and explain that the threshold might be unsuitable?

I like the suggestion for improving the default value of the threshold, but maybe this should be done in a separate PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now do not convert the coordinates to be consistent with the default threshold in km and warn that the threshold is now considered as being defined in the input unit. I will set up a new issue to discuss how to improve the default behavior for the distance threshold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Can you please open an issue to address this?

Comment on lines 1295 to 1302
except AttributeError as exc:
# a crs attribute is needed for unit inference
raise ValueError(
"Please provide coordinate GeoDataFrame and Hazard object with a valid crs attribute."
) from exc

# get unit of coordinate systems from axis of crs
unit = get_crs_unit(coord_gdf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. Why would the above exception be needed, since the function get_crs_unit(coord_gdf) is used after? Why would one first check if the CRS are equal, and afterwards check if they exist?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to correctly handle different units, we need to have a crs attribute defined for both coord_gdf and centroids. They also need to be equal for consistency. This is essentially what those try-except-raise blocks do, I think:

try:

  1. Check if crs of coord_gdf and centroids are equal
  2. Check that none of the crs of both coord_gdf and centroids is None

except AttributeError:

  • If AttributeError is raised, then it means that one of the input (e.g. centroids) does not have a crs attribute, hence we also need to raise an error as crs are required.

I don't know if that helps.. Also I am not sure why you mention the get_crs_unit(coord_gdf); do you mean we should not raise before the call to get_crs_unit(coord_gdf) and just let an exception be raised from it when there is no crs defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being unclear.

If you check that the centroids are equal, then there is no point in checking whether either is None, as this is only possible if they are unequal. The only question is what to do if both are None, and I do not know what equal_crs returns in this case.

For the AttributeError you are right we can leave it as is. The error message could be improved a bit, since the method does neither have GeoDataFrame nor Hazard as input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and equal_crs returns True if both are None. I simplified the check and clarified the error messages accordingly.

Comment on lines 1424 to 1426
raise ValueError(
"Only degree unit is supported for nearest neighbor approximation."
"Please use euclidean distance for non-degree units."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error message is not useful. The method cannot use Euclidian distance. What should the user do to fix this error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the message to the following:
raise ValueError( "Only degree unit is supported for nearest neighbor matching using" "'haversine' distance. Please use euclidean distance by setting " "'distance=euclidean' in the match_coordinates or match_centroids " "function when using units that are not degrees." )
Let me know if that is clearer.

Comment on lines 1510 to 1512
else:
# if unit is not in degree, check_antimeridian is forced to False
check_antimeridian = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the antimeridian is crossed in a crs without degrees? Is this possible?

Copy link
Collaborator Author

@luseverin luseverin Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I see two possible cases:

  1. The user uses a local projected coordinates system which is locally defined around the antimeridian region. Hence, there is no discontinuity at the antimeridian and everything is fine.
  2. The user uses a global projected coordinates system which is discontinuous at the antimeridian (e.g. EPSG:3857). In which case, I am not sure if crossing the antimeridian should be allowed, as it might be unphysical, and I am even less sure on how to handle this, given our current implementation of handling antimeridian crossing that relies on the haversine distance.

Maybe we could just raise a warning that "handling of antimeridian crossing is not implemented for non-degree coordinates. Check_antimeridian has been set to False. Please use degree-based coordinates system if you want to allow antimeridian crossing". Let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can try that. If this ends up logging warnings all the time we should remove it again as this would not be the goal of a warning.

@luseverin luseverin requested a review from peanutfun as a code owner July 28, 2025 15:03
@luseverin luseverin removed the request for review from peanutfun July 28, 2025 16:10
@chahank
Copy link
Member

chahank commented Jul 30, 2025

Please do not forget to update the changelog, thanks!

@@ -1089,15 +1173,38 @@ def match_coordinates(

# assign remaining coordinates to their geographically nearest neighbor
if threshold > 0 and exact_assign_idx.size != coords_view.size:
# convert to proper units before proceeding to nearest neighbor search
if unit == "degree":
# check that coords are indeed geographic before converting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? No coordinate conversion is performed in this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a legacy of when I was converting meters to kilometers. I removed it.

@chahank chahank mentioned this pull request Aug 5, 2025
13 tasks
@chahank
Copy link
Member

chahank commented Aug 5, 2025

I just tried to implement a solution for the default threshold, issue #1079 . I found that we probably should do this at the same time as this PR in the end. I found some small redundancies in the current implementation which are easier to remove at once. It is still a work in progress on #1080, let's have a direct chat to decide how to proceed.

@chahank
Copy link
Member

chahank commented Aug 26, 2025

The changes proposed here were built upon in the PR #1080.

I will close this PR to concentrate on the other one.

@chahank chahank closed this Aug 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code update conventions Coding conventions or style
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants