-
Notifications
You must be signed in to change notification settings - Fork 126
Add basic ItemCollection implementation #430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic ItemCollection implementation #430
Conversation
|
I think I lean toward I think it's ok to require users having to actively open an ItemCollection with ItemCollection.from_file. As long as they know that's how you open an ItemCollection. I know it doesn't meet the request from @scottyhq , but I think that's just because before it was ambiguous on if an ItemCollection was a STACObject, because there was the ItemCollection class in stac-api-spec, and it had Now that it's clear an ItemCollection is just a GeoJSON FeatureCollection with features that are STAC Items, it's reasonable that it you would open it differently. |
|
Wait, I think I'm wrong here....I don't think a STACObject function should return a non STACObject. But the pystac.read_file is just a convenience function, and now that I look at the code I think this is exactly the right way to do it. STACObject still only returns a STACObject, but the |
|
Finished going through this, looks really good overall, with some comments above. I can see eventually having some high level convenience functions in ItemCollection (such as calculating summaries over an ItemCollection), but I think this great for an initial implementation. |
Codecov Report
@@ Coverage Diff @@
## main #430 +/- ##
==========================================
+ Coverage 89.56% 89.67% +0.11%
==========================================
Files 39 40 +1
Lines 5115 5172 +57
==========================================
+ Hits 4581 4638 +57
Misses 534 534
Continue to review full report at Codecov.
|
|
@lossyrob This should be ready for your review, too, if you want to take a look. |
lossyrob
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm besides the top level read and write methods, which I commented on. I'm -0.5 for the change to those methods, so if there's someone feels strongly it should be that way I'm fine with things getting merged in as is.
pystac/__init__.py
Outdated
|
|
||
|
|
||
| def read_file(href: str) -> STACObject: | ||
| def read_file(href: str) -> Union[STACObject, ItemCollection]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to change these top level package read/write methods to include ItemCollection. It breaks user code that may rely on the STACObject type, forcing the type differentiation to happen by the caller. Also the last two parameters of write_file make it feel a bit shoe-horned. On the other hand, I can see people getting confused about why read_file wouldn't work on an ItemCollection if they didn't know it wasn't a core stac object.
I think my preference would be to keep these methods working with core STAC types, and force users to treat ItemCollections as a separate concept, which makes more sense with the spec as it currently stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with all of the points you make here, and I think my preference would be to not change the top-level functions as well. If that ends up being the decision, I will add more clear documentation to the ItemCollection class indicating that it is not a STACObject and cannot be read using those top-level methods.
@scottyhq I'm curious how much of a priority it is for you to be able to read Item Collections using pystac.read_file vs. ItemCollection.from_file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this! Yes, I'm coming at this primarily from someone new to STAC, who is unfamiliar with spec details. The key expectation I have is that after searching a STAC API for data, and saving 'results.json' to work with later, there is an straightforward way to open that file and navigate it. For Python it seems like pystac_client takes care of the searching and pystac should care of the I/O and navigation of the results. I don't think people should have to understand the concepts of ItemCollections versus Collections, Core vs Not, for this fundamental workflow.
So if pystac.read_file can't handle ItemCollections, and a separate ItemCollection.from_file() is the way forward (or pystac_client.read_file()`?), I think that just needs to be clearly documented.
Also useful (and I think a non-breaking change) would then be for pystac.read_file to have error handing that can recognize the json is an ItemCollection and suggest the correct method to open it, rather than the current KeyError: 'id' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scottyhq, that all makes sense to me. I'll change those top-level functions back to their original signatures and make sure we have clear docs on how to work with ItemCollections. I'm pretty sure the issue with a KeyError being raised in pystac.read_file is fixed by #402, but I'll add a test to be sure.
@matthewhanson Looking back at the code example in the original issue it seems like the name of the ItemSearch.items_as_collection method might also be misleading. Maybe we should rename that to items_as_item_collection so that users don't think they are saving a STAC Collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed ItemCollection handling from top-level functions and added test that pystac.read_file raises a STACTypeError instead of the KeyError in 404bb99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duckontheweb ItemSearch.items_as_collection was misleading. I've renamed it already, there's now get_pages (get the raw JSON of the pages), get_item_collection (gets pages as item collections), get_items (iterator through all pages and items), and get_all_items (gets all items from all pages and returns as single ItemCollection). This matches the get_ syntax used in PySTAC
404bb99 to
2bb16a1
Compare
|
@lossyrob It also just occurred to me that including |
|
@duckontheweb I agree that's not great, and should have been removed when ItemCollection was dropped. I'm +1 for dropping it from that Enum and using another separate constant in the migration/validation places where needed. |
I removed this in 47017df. I ended up just removing any ItemCollection references in the |
|
@lossyrob @matthewhanson Ready for another review whenever you have time. Thanks! |
|
@duckontheweb Looks good to me to merge! |
stac-utils/pystac#430 implemented `pystac.ItemCollection`, which is being removed from `pystac-client` in the next release. This PR adds similar handling for `pystac.ItemCollection`.
Related Issue(s): #
Description:
Adds
ItemCollectionclass to work with GeoJSON Feature Collections containing only STAC Items. This class can be serialized/deserialized from JSON, and also implementsCollectionto make it more convenient to loop over items, get an item by index, and check if an Item is included. Note that until we implement more specific STAC object equality (as per #380),assert item in item_collectionwill always fail unlessclone_items=Falseis used when instantiating theItemCollection.This PR also updates the I/O functions in
pystac.__init__to work with the newItemCollectionclass in response to #371. SinceItemCollections are notSTACObjects, this means that the read functions in that file (read_file,read_dict) are now not guaranteed to return aSTACObject. This means users will need to either handle both cases in their code, orcastthe result of those functions to the appropriate type. I'm interested in feedback on whether this is worth the change, or if we should keep these functions unchanged and make people useItemCollection.from_filein order to read ItemCollections.PR Checklist:
scripts/format)scripts/test)cc: @scottyhq
UPDATE 2021-06-10:
Implemented
__contains__to makeItemCollectionatyped.Collectionsince we now have the option not to clone Items on instantiation and we are no longer manipulating the Item links.