Description
@jiffyclub At some point pretty soon we'll want to diagnose the fastest way to access skims. Given that we store the skims in OMX format (we might want to consider packing multiple matrices into a single h5 for convenience?), the big question is how to store/access them in memory.
Given our recent history with the .loc command, I'm guessing storing zone_ids directly is basically a non-starter. Fortunately, we're storing a dense matrix, so we can make sure every zone_id is exactly 1 greater than its index (i.e. zone 1 is at index 0). That way we can either 1) have a dataframe with a multi-index and call .take, or 2) have a 2-D numpy array and access it directly, but only for one column at a time. Do we think that 1) is slower than 2)? I ask because 1) is definitely more attractive from a code perspective. I guess this is the "stacked" vs "unstacked" format question.
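To make the two options concrete, here is a minimal sketch of both lookups on a toy 4-zone skim. The names (`skim`, `stacked`, `orig`, `dest`) and the values are made up for illustration, and it assumes zone ids are contiguous and 1-based as described above. It only shows that both layouts return the same values; it says nothing about which is faster, which would still need to be timed on a realistic zone count.

```python
import numpy as np
import pandas as pd

n_zones = 4
zones = np.arange(1, n_zones + 1)  # assumed contiguous, 1-based zone ids

# "Unstacked": dense 2-D array, rows are origins, columns are destinations.
# Values are placeholders, not real skim data.
skim = np.arange(n_zones * n_zones, dtype=float).reshape(n_zones, n_zones)

# "Stacked": the same values as a Series with an (origin, destination) MultiIndex,
# laid out in row-major order so it lines up with skim.ravel().
stacked = pd.Series(
    skim.ravel(),
    index=pd.MultiIndex.from_product([zones, zones], names=["origin", "destination"]),
)

orig = np.array([1, 3, 4])  # origin zone ids
dest = np.array([2, 2, 1])  # destination zone ids

# Option 2: index the 2-D array directly with (zone_id - 1) positions.
unstacked_vals = skim[orig - 1, dest - 1]

# Option 1: convert each pair to a flat position and call .take on the stacked Series.
flat_pos = (orig - 1) * n_zones + (dest - 1)
stacked_vals = stacked.take(flat_pos).to_numpy()
```

Both paths avoid label-based lookup entirely, so the comparison comes down to the cost of `.take` on a Series versus fancy indexing on the array.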
At any rate, we should probably write a small abstraction to hide this from the user. Basically we pass in one of the formats above with dimension N and then pass in two series of "origin" and "destination" zone ids and get back the values.
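A rough sketch of what that abstraction could look like is below. The class name `SkimLookup`, its `get` method, and the `offset` parameter are all hypothetical, not an existing API; it assumes an N x N skim with 1-based contiguous zone ids and accepts either the 2-D array or a stacked Series in row-major order.

```python
import math

import numpy as np
import pandas as pd


class SkimLookup:
    """Hypothetical wrapper hiding the skim storage layout from the caller."""

    def __init__(self, data, offset=-1):
        if isinstance(data, pd.Series):
            # Stacked format: assume N*N values in row-major (origin, destination) order.
            n = math.isqrt(len(data))
            if n * n != len(data):
                raise ValueError("stacked skim length must be a perfect square")
            data = data.to_numpy().reshape(n, n)
        data = np.asarray(data)
        if data.ndim != 2 or data.shape[0] != data.shape[1]:
            raise ValueError("skim must be a square N x N matrix")
        self.data = data
        # offset maps a zone id to its array position (zone 1 -> index 0 by default).
        self.offset = offset

    def get(self, orig, dest):
        """Return skim values for paired origin/destination zone ids."""
        o = np.asarray(orig, dtype=int) + self.offset
        d = np.asarray(dest, dtype=int) + self.offset
        if o.shape != d.shape:
            raise ValueError("orig and dest must have the same length")
        # Guard against ids that would silently wrap around via negative indexing.
        if (o < 0).any() or (d < 0).any():
            raise IndexError("zone id below the first zone")
        values = self.data[o, d]
        # Preserve the caller's index when a Series is passed in.
        index = orig.index if isinstance(orig, pd.Series) else None
        return pd.Series(values, index=index)


# Usage with placeholder values on a 3-zone skim.
lookup = SkimLookup(np.arange(9).reshape(3, 3))
orig = pd.Series([1, 3], index=[10, 11])
dest = pd.Series([2, 1], index=[10, 11])
result = lookup.get(orig, dest)
```

Keeping the result aligned to the index of `orig` means it can be assigned straight back onto the frame the zone ids came from, whichever storage format sits underneath.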