Consider validating sensible key values in `epi_archive`, valid ops for & performance improvements from nonunique keys #89

Currently, we do not check that key values in `epi_archive` are distinct.
- User experience: this is convenient, but it also gives users the opportunity to form unexpected or invalid archives.
- Semantics: we assume there are no duplicate key values. `$as_of` might give the right result if rows with duplicate keys also have duplicate values and we don't want duplicate rows in the output. However, it gives the wrong result if the user is instead trying to take advantage of a particular update-reporting structure. Such a structure might enable some performance improvements, which we could exploit further by being more flexible with the key, as described next.
- Performance: we require a version-search for *every key value* across the *single* huge archive DT.
  - If all (geo_value, time_value, otherkey1, ..., otherkeyn) combinations are re-reported in every `version` (i.e., DT holds full snapshots), then we only need to look up the version once and can key by `version` alone. But we can't use the `unique` lookup for `as_of` here; maybe a rolling join would work and would also generalize to the next case.
  - If all (geo_value, otherkey1, ..., otherkeyn) combinations are re-reported in every `version` for the time_values `version - 1:windowlength`, then we could key by (time_value, version) alone.
  - If we have patch-based reporting and no special guarantees, then we need (geo_value, time_value, otherkey1, ..., otherkeyn, version) as the key, and there should be no duplicates.
  - (Stratifying the DT into multiple DTs, say, one per geo, might increase the number of lookups required but make them faster.)
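To make the trade-off concrete, here is a minimal sketch in plain Python standing in for the R/`data.table` archive; it is not the epiprocess implementation, and every name in it (`Row`, `duplicate_keys`, `as_of_patches`, `as_of_snapshots`) is hypothetical. It assumes a single `(geo_value, time_value)` key with no other keys. `duplicate_keys` is the kind of validation being considered. `as_of_patches` does one version search per `(geo_value, time_value)`, as required for patch-based reporting. `as_of_snapshots` assumes every version is a full snapshot, so a single search over the distinct versions (a `bisect_right`, roughly what a rolling join on `version` alone would do) is enough.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical row layout: (geo_value, time_value, version, value).
# A plain-Python stand-in for the archive DT, not the epiprocess API.
Row = tuple[str, int, int, float]


def duplicate_keys(rows: list[Row]) -> list[tuple[str, int, int]]:
    """Return every (geo_value, time_value, version) key occurring more than once."""
    counts = Counter((geo, time, version) for geo, time, version, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def as_of_patches(rows: list[Row], max_version: int) -> dict[tuple[str, int], float]:
    """Patch-based lookup: a separate version search for every (geo_value, time_value).

    Assumes unique keys; if a key is duplicated at the same version,
    the first row encountered silently wins.
    """
    latest: dict[tuple[str, int], tuple[int, float]] = {}
    for geo, time, version, value in rows:
        if version > max_version:
            continue
        key = (geo, time)
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, value)
    return {key: value for key, (_, value) in latest.items()}


def as_of_snapshots(rows: list[Row], max_version: int) -> dict[tuple[str, int], float]:
    """Snapshot-based lookup: assumes each version is a full snapshot,
    so the version only needs to be searched for once."""
    versions = sorted({version for _, _, version, _ in rows})
    # Latest version <= max_version, like a rolling join keyed by version alone.
    i = bisect_right(versions, max_version)
    if i == 0:
        return {}
    chosen = versions[i - 1]
    # Linear scan for simplicity; in a DT keyed by version this is a keyed subset.
    return {(geo, time): value for geo, time, version, value in rows if version == chosen}


# Two full snapshots (versions 10 and 11) of the same two keys.
snapshots = [
    ("ca", 1, 10, 5.0), ("ny", 1, 10, 7.0),
    ("ca", 1, 11, 6.0), ("ny", 1, 11, 7.0),
]
print(as_of_snapshots(snapshots, 10))  # {('ca', 1): 5.0, ('ny', 1): 7.0}
print(as_of_patches(snapshots, 11) == as_of_snapshots(snapshots, 11))  # True
print(duplicate_keys(snapshots + [("ca", 1, 10, 9.0)]))  # [('ca', 1, 10)]
```

On full snapshots both lookups agree, but only the snapshot version avoids a per-key search; once a duplicate row sneaks in, `as_of_patches` quietly picks one value, which is the silent misbehavior that validating keys would surface.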

Might interact with #87.
