Closed
This issue documents some slowness on moderately large queries. In the snippet below we fetch 8,234 items. It takes about a minute to construct the results.
import rasterio.features
import pystac_client

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-123.46435546875, 46.4605655457854],
            [-119.608154296875, 46.4605655457854],
            [-119.608154296875, 48.26125565204099],
            [-123.46435546875, 48.26125565204099],
            [-123.46435546875, 46.4605655457854],
        ]
    ],
}
bbox = rasterio.features.bounds(area_of_interest)

stac = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = stac.search(
    bbox=bbox,
    datetime="2016-01-01/2020-12-31",
    collections=["sentinel-2-l2a"],
    limit=2500,  # fetch items in batches of 2500
)
print(search.matched())  # 8234
items = list(search.items())
I ran list(search.items()) under snakeviz and came up with this result: https://gistcdn.rawgit.org/TomAugspurger/fb5b3bde8cee09d2d9aa2f7215edf2b2/94e4ec2ae97bec2169f9263e8f41183418e885d9/mosaic-static.html
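For anyone who wants to collect a comparable profile, here is a minimal sketch using only the standard library's cProfile and pstats. The profile_call helper and the build_items workload are hypothetical stand-ins I made up so the snippet runs without network access; in practice you would pass something like lambda: list(search.items()) instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def build_items(n):
    # Hypothetical stand-in workload; replace with the real search call.
    return [{"id": f"item-{i}", "properties": {"index": i}} for i in range(n)]


items, report = profile_call(build_items, 1000)
print(report)
```

To browse the result in snakeviz rather than as text, the profiler's dump_stats method can write a .prof file that snakeviz opens.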
A few notes:
- We spend roughly two-thirds of our time in stac_io.get_pages, which includes I/O and waiting for the endpoint (and maybe parsing the JSON into Python objects?).
- We spend the other third of our time in item_collection.from_dict.
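One way to get a rough feel for how JSON parsing compares with the per-item copy is to time both on a synthetic page. This is only a sketch under assumptions: make_page and time_stages are hypothetical helpers, the synthetic features are far smaller than real Sentinel-2 items, and network wait is not measured at all, so the numbers say nothing definitive about the real split.

```python
import copy
import json
import time


def make_page(n_items):
    """Build a synthetic FeatureCollection-like page (hypothetical shape)."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": f"item-{i}",
                "geometry": {"type": "Point", "coordinates": [-120.0, 47.0]},
                "properties": {"datetime": "2016-01-01T00:00:00Z", "index": i},
                "assets": {f"band-{b}": {"href": f"https://example.com/{i}/{b}.tif"} for b in range(10)},
            }
            for i in range(n_items)
        ],
    }


def time_stages(n_items=500):
    """Time JSON parsing and per-feature deepcopy separately, in seconds."""
    payload = json.dumps(make_page(n_items))

    start = time.perf_counter()
    page = json.loads(payload)
    parsed = time.perf_counter()
    copied = [copy.deepcopy(feature) for feature in page["features"]]
    done = time.perf_counter()

    return {
        "parse_json": parsed - start,
        "deepcopy": done - parsed,
        "n_items": len(copied),
    }


print(time_stages())
```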
Some ideas for optimization:
- Most of the time in item_collection.from_dict is spent on a deepcopy in pystac.Item.from_dict. It might be safe to skip that copy (since these dicts should be coming off the network with no other references) and provide a copy=False flag to pystac.Item.from_dict, allowing it to mutate the incoming dict.
- Maybe pystac_client.Client or .search could provide a raw=True/False flag to allow skipping the construction of pystac Items?
- Maybe some kind of async magic would speed up the reads? Hard to say, since I don't know how much time is spent waiting for results vs. parsing JSON. I don't know if it's a good idea to parse JSON on the asyncio event loop.
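To make the first two ideas concrete, here is a minimal sketch of what the copy and raw flags could look like. Everything in it is hypothetical: SimpleItem, item_from_dict, and iter_items are illustrative stand-ins, not pystac or pystac_client APIs, and the real from_dict does considerably more work than this.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Union


@dataclass
class SimpleItem:
    """Hypothetical stand-in for a pystac Item."""
    id: str
    properties: Dict[str, Any]
    extra: Dict[str, Any]


def item_from_dict(d: Dict[str, Any], copy: bool = True) -> SimpleItem:
    """Build an item from a dict, optionally skipping the defensive copy.

    With copy=False the caller's dict is mutated, which is only safe when
    nothing else holds a reference to it (e.g. freshly parsed JSON).
    """
    if copy:
        d = deepcopy(d)
    item_id = d.pop("id")
    properties = d.pop("properties")
    return SimpleItem(id=item_id, properties=properties, extra=d)


def iter_items(
    pages: Iterable[List[Dict[str, Any]]], raw: bool = False
) -> Iterator[Union[Dict[str, Any], SimpleItem]]:
    """Yield items from pages of feature dicts; raw=True yields the dicts as-is."""
    for page in pages:
        for feature in page:
            yield feature if raw else item_from_dict(feature, copy=False)
```

With copy=True the input dict is left untouched; with copy=False its "id" and "properties" keys are popped and the remaining dict is reused as extra, which is the mutation the flag would opt into.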