Fastest way to access skims #8

@jiffyclub At some point pretty soon we'll want to work out the fastest way to access skims. Given that we store the skims in OMX format (we might want to consider packing multiple matrices into a single h5 file for convenience?), the big question is how to store and access them in memory.

Given our recent history with the `.loc` command, I'm guessing that looking values up by zone_id label is basically a non-starter. Fortunately, we're storing a dense matrix, so we can make sure every zone_id is exactly 1 greater than its index (i.e. zone 1 is at index 0). That way we can either 1) have a dataframe with a multi-index and call `.take`, or 2) have a 2-D numpy array and access the values directly, but only for one column at a time. Do we think 1) is slower than 2)? I ask because 1) is definitely more attractive from a code perspective. I guess this is the "stacked" vs "unstacked" format question.
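To make the two options concrete, here is a minimal sketch of both lookups on a toy 4-zone skim. The names `skim`, `stacked`, `orig`, and `dest` are made up for illustration, and it assumes zone ids are exactly 1 greater than their positions, as described above. It says nothing about relative speed; that would still need timing on realistic sizes.

```python
import numpy as np
import pandas as pd

n_zones = 4

# Hypothetical dense skim: skim[o, d] holds the value for zero-based
# origin position o and destination position d.
skim = np.arange(n_zones * n_zones, dtype=float).reshape(n_zones, n_zones)

# Assumed convention from the discussion: zone_id == position + 1.
zone_ids = np.arange(1, n_zones + 1)

# Option 1 ("stacked"): one value per (origin, destination) pair,
# stored in row-major order under a multi-index.
stacked = pd.Series(
    skim.ravel(),
    index=pd.MultiIndex.from_product(
        [zone_ids, zone_ids], names=["origin", "destination"]
    ),
)

# Origin/destination zone ids to look up, pairwise.
orig = np.array([1, 3, 4])
dest = np.array([2, 3, 1])

# Stacked lookup: turn each pair into a flat position and use .take,
# which is purely positional and never consults the index labels.
flat_pos = (orig - 1) * n_zones + (dest - 1)
stacked_values = stacked.take(flat_pos).to_numpy()

# Option 2 ("unstacked"): index the 2-D array with two position arrays.
unstacked_values = skim[orig - 1, dest - 1]

print(stacked_values)
print(unstacked_values)
```

One thing worth noting: NumPy integer array indexing accepts a pair of index arrays, so option 2 may not actually be limited to one column at a time.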

At any rate, we should probably write a small abstraction to hide this from the user. Basically we pass in one of the formats above with dimension N, then pass in two series of "origin" and "destination" zone ids, and get back the values.
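A rough sketch of what that abstraction could look like follows. The class name `Skim`, its `get` method, and the `offset` parameter are all hypothetical, not an existing API. It assumes the data is a square N x N matrix, given either as a 2-D array or as a stacked series of length N*N in row-major order.

```python
import numpy as np
import pandas as pd


class Skim:
    """Hypothetical wrapper that hides the storage format from the caller."""

    def __init__(self, data, offset=-1):
        values = np.asarray(data, dtype=float)
        if values.ndim == 1:
            # Stacked input: assume row-major order and length N * N.
            n = int(round(np.sqrt(values.size)))
            if n * n != values.size:
                raise ValueError("stacked data must have N * N entries")
            values = values.reshape(n, n)
        if values.ndim != 2 or values.shape[0] != values.shape[1]:
            raise ValueError("skim data must be a square matrix")
        self._n = values.shape[0]
        self._flat = values.ravel()
        # offset maps zone ids to positions (zone 1 -> index 0 by default).
        self._offset = offset

    def get(self, orig, dest):
        """Return the skim values for pairwise origin/destination zone ids."""
        o = np.asarray(orig) + self._offset
        d = np.asarray(dest) + self._offset
        for pos in (o, d):
            # Guard against ids that would silently wrap around or overflow.
            if (pos < 0).any() or (pos >= self._n).any():
                raise IndexError("zone id outside the skim's range")
        values = self._flat.take(o * self._n + d)
        # Keep the caller's index so results line up with their table.
        index = orig.index if isinstance(orig, pd.Series) else None
        return pd.Series(values, index=index)


# Usage with made-up data: 4 zones, three trips indexed by a person id.
skim = Skim(np.arange(16, dtype=float).reshape(4, 4))
orig = pd.Series([1, 3, 4], index=[10, 11, 12])
dest = pd.Series([2, 3, 1], index=[10, 11, 12])
result = skim.get(orig, dest)
print(result)
```

Storing the flattened array internally is just one possible choice here; the caller never sees it, so the storage could be swapped for whichever format turns out fastest.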

