Performance to construct a large ItemCollection

This issue documents some slowness on moderately large queries. In the snippet below we fetch 8,234 items. It takes about a minute to construct the results.

```python
import rasterio.features
import pystac_client

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-123.46435546875, 46.4605655457854],
            [-119.608154296875, 46.4605655457854],
            [-119.608154296875, 48.26125565204099],
            [-123.46435546875, 48.26125565204099],
            [-123.46435546875, 46.4605655457854],
        ]
    ],
}
bbox = rasterio.features.bounds(area_of_interest)
stac = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = stac.search(
    bbox=bbox,
    datetime="2016-01-01/2020-12-31",
    collections=["sentinel-2-l2a"],
    limit=2500,  # fetch items in batches of 2500
)

print(search.matched())  # 8234
items = list(search.items())
```

I ran `list(search.items())` under snakeviz and came up with this result: https://gistcdn.rawgit.org/TomAugspurger/fb5b3bde8cee09d2d9aa2f7215edf2b2/94e4ec2ae97bec2169f9263e8f41183418e885d9/mosaic-static.html
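
For anyone who wants to reproduce a profile like this without hitting the endpoint, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The helper `profile_call` and the stand-in workload `build_squares` are made up for illustration; in the setting above the profiled callable would be something like `lambda: list(search.items())`.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Stand-in workload (hypothetical); replace with the real call to profile.
def build_squares(n):
    return [i * i for i in range(n)]


result, report = profile_call(build_squares, 100_000)
print(report)
```

To get a file that snakeviz can open, `profiler.dump_stats(...)` with a `.prof` filename of your choosing should work in place of the text report.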

A few notes:

1. We spend roughly two-thirds of our time in `stac_io.get_pages`, which covers IO: waiting for the endpoint, and maybe also parsing the JSON into Python objects.
2. We spend the remaining third in `item_collection.from_dict`.
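
One way to check how much of the first bucket could be JSON parsing rather than waiting on the network is to time `json.loads` on a payload of similar size. The sketch below uses a synthetic page of 2,500 fake features (not real STAC items, and much smaller than real Sentinel-2 items), so the number it prints is only a rough lower bound. `make_fake_page` and `time_parse` are hypothetical helpers.

```python
import json
import time


def make_fake_page(n_items):
    """Build a synthetic FeatureCollection-like JSON string (not real STAC data)."""
    features = [
        {
            "type": "Feature",
            "id": f"item-{i}",
            "geometry": {"type": "Point", "coordinates": [-120.0, 47.0]},
            "properties": {"datetime": "2020-01-01T00:00:00Z", "eo:cloud_cover": 1.5},
            "assets": {
                f"band-{b}": {"href": f"https://example.com/{i}/{b}.tif"}
                for b in range(12)
            },
        }
        for i in range(n_items)
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


def time_parse(text):
    """Parse a JSON string and return (parsed object, elapsed seconds)."""
    start = time.perf_counter()
    parsed = json.loads(text)
    return parsed, time.perf_counter() - start


body = make_fake_page(2500)
page, seconds = time_parse(body)
print(f"parsed {len(page['features'])} items from {len(body) / 1e6:.1f} MB in {seconds:.3f} s")
```

If the parse time for a realistic page is small next to the time per request, most of `get_pages` is probably spent waiting on the endpoint.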

Some ideas for optimization:

1. Most of the time in `item_collection.from_dict` is spent on a `deepcopy` in `pystac.Item.from_dict`. It *might* be safe to skip that copy (since these dicts should be coming straight off the network with no other references) and provide a `copy=False` flag to `pystac.Item.from_dict`, allowing it to mutate the incoming `dict`.
2. Maybe `pystac_client.Client` or `.search` could provide a `raw=True/False` flag to allow skipping constructing pystac Items?
3. Maybe some kind of async magic would speed up the reads? Hard to say, since I don't know how much time is spent waiting for results vs. parsing JSON. I don't know if it's a good idea to parse JSON on the asyncio event loop.
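
To make the first idea concrete, here is a rough sketch of what an opt-out copy flag could look like and how to measure its effect. `item_from_dict` is a hypothetical stand-in, not `pystac.Item.from_dict`, and the `copy_input` parameter only mirrors the proposed `copy=False` flag. The input dicts are synthetic.

```python
import copy
import time


def item_from_dict(d, copy_input=True):
    """Hypothetical from_dict-style constructor that pops fields off its input.

    With copy_input=True the caller's dict is left untouched (at the cost of a
    deepcopy); with copy_input=False the caller's dict is mutated in place.
    """
    if copy_input:
        d = copy.deepcopy(d)
    item_id = d.pop("id")
    properties = d.pop("properties")
    return {"id": item_id, "properties": properties, "extra_fields": d}


def make_fake_items(n):
    """Build n synthetic item dicts (not real STAC items)."""
    return [
        {
            "id": f"item-{i}",
            "properties": {"datetime": "2020-01-01T00:00:00Z"},
            "assets": {
                f"band-{b}": {"href": f"https://example.com/{i}/{b}.tif"}
                for b in range(12)
            },
        }
        for i in range(n)
    ]


def time_construction(dicts, copy_input):
    """Construct all items and return (items, elapsed seconds)."""
    start = time.perf_counter()
    items = [item_from_dict(d, copy_input=copy_input) for d in dicts]
    return items, time.perf_counter() - start


# Separate inputs, because copy_input=False mutates the dicts it is given.
with_copy_input = make_fake_items(2000)
without_copy_input = make_fake_items(2000)

items_a, t_copy = time_construction(with_copy_input, copy_input=True)
items_b, t_nocopy = time_construction(without_copy_input, copy_input=False)

print(f"copy_input=True:  {t_copy:.3f} s")
print(f"copy_input=False: {t_nocopy:.3f} s")
```

The trade-off is visible in the inputs afterwards: the dicts passed with `copy_input=False` have lost their `id` and `properties` keys, which is only acceptable when nothing else holds a reference to them.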

Performance to construct a large ItemCollection #49