Skip to content

Fastest way to access skims #8

@fscottfoti

Description

@fscottfoti

@jiffyclub At some point pretty soon we'll want to diagnose the fastest way to access skims. Given that we store the skims in OMX format (we might want to consider packing multiple matrices into a single h5 for convenience?), the big question is how to store/access them in memory.

Given our recent history with the .loc command I'm guessing storing zone_ids directly is basically a non-starter. Fortunately, we're storing a dense matrix so we can make sure every zone_id is in position 1 greater than it's index (i.e. zone 1 is in index 0). That way we can either 1) have a dataframe with a multi-index and call .take or 2) have a 2-D numpy array and access then directly, but only for one column at a time. Do we think that 1) is slower than 2) because 1) is definitely more attractive from a code perspective. I guess this "stacked" vs "unstacked" format.

At any rate, we should probably write a small abstraction to hide this from the user. Basically we pass in one of the formats above with dimension N and then pass in two series of "origin" and "destination" zone ids and get back the values.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions