Tracing Memory Improvements with Sharrow #754

@jpn--

Is your feature request related to a problem? Please describe.
When running production-scale ActivitySim simulations with Sharrow turned on, tracing consumes a large amount of memory, because Sharrow materializes very large intermediate arrays. For example, when computing utility values in a logit model, we compute $V = X \beta$. The array $X$ has a row for every observation and a column for every data element (i.e. every line in the SPEC file). When not tracing, the data in $X$ is assembled, consumed, and released dynamically by numba one row at a time, so the memory to store all of $X$ is never needed. For tracing, however, we need to write a (usually small) subset of the rows of $X$ to the trace file. Sharrow currently has no mechanism to save selected rows from the dynamically created values of $X$, so the only way to trace this data is to create all of the rows, which temporarily uses a massive amount of memory.
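
To make the memory difference concrete, here is a minimal sketch in plain NumPy (not Sharrow or numba code). The function `spec_row` is a hypothetical stand-in for evaluating every SPEC line on one observation, and the four expressions in it are invented for illustration. With `n` observations, `k` SPEC lines, and float64 values, the materialized $X$ takes `8 * n * k` bytes, while the streaming version only ever holds one row.

```python
import numpy as np


def spec_row(raw_row):
    """Hypothetical stand-in: evaluate every SPEC expression for one observation."""
    a, b = raw_row
    return np.array([a, b, a * b, float(a > b)])


def utilities_materialized(raw, beta):
    # Builds the full X (n_obs x n_terms) in memory before multiplying by beta.
    X = np.stack([spec_row(r) for r in raw])
    return X @ beta, X.nbytes


def utilities_streaming(raw, beta):
    # Builds one row of X at a time and drops it after the dot product,
    # so only V (one value per observation) is ever stored in full.
    V = np.empty(len(raw))
    for i, r in enumerate(raw):
        V[i] = spec_row(r) @ beta
    return V


rng = np.random.default_rng(0)
raw = rng.random((1000, 2))                 # 1000 observations, 2 raw inputs each
beta = np.array([0.5, -1.0, 2.0, 0.25])     # one coefficient per SPEC line

V_full, x_bytes = utilities_materialized(raw, beta)
V_stream = utilities_streaming(raw, beta)
```

Both functions give the same utilities; only the first one pays for storing all of $X$ (here `1000 * 4 * 8` bytes, which scales linearly with both observations and SPEC lines).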

@dhensle pointed out that tracing outside of a full-scale production run might not work when the effects of the full data are important (e.g. in shadow pricing).

Describe the solution you'd like
Sharrow needs additional capabilities to (a) receive instructions about what to trace, and (b) output an array of tracing values that can then be dumped into the tracing outputs.
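
As a rough illustration of what (a) and (b) could look like, the sketch below uses plain NumPy rather than Sharrow's actual internals. The names `utilities_with_trace`, `trace_rows`, and `spec_row` are hypothetical and not part of any existing API. The caller passes the row positions to trace, and the function returns the utilities together with a small array holding only the traced rows of $X$.

```python
import numpy as np


def spec_row(raw_row):
    """Hypothetical stand-in: evaluate every SPEC expression for one observation."""
    a, b = raw_row
    return np.array([a, b, a * b, float(a > b)])


def utilities_with_trace(raw, beta, trace_rows):
    """Compute V row by row, keeping X only for the rows listed in trace_rows."""
    trace_rows = np.asarray(trace_rows, dtype=np.int64)
    # Map each traced observation index to its slot in the trace output.
    slot = {int(r): j for j, r in enumerate(trace_rows)}
    V = np.empty(len(raw))
    # (b) the trace output: one row per traced observation, one column per SPEC line.
    X_trace = np.full((len(trace_rows), len(beta)), np.nan)
    for i in range(len(raw)):
        x = spec_row(raw[i])
        V[i] = x @ beta
        j = slot.get(i)
        if j is not None:
            X_trace[j] = x  # save this row instead of discarding it
    return V, X_trace


rng = np.random.default_rng(0)
raw = rng.random((1000, 2))
beta = np.array([0.5, -1.0, 2.0, 0.25])

# (a) the instruction about what to trace: positions of two observations.
V, X_trace = utilities_with_trace(raw, beta, trace_rows=[3, 17])
```

The extra memory is proportional to the number of traced rows, not the number of observations. A compiled version would likely need a different lookup than a Python dict, so this shows the interface shape only.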

Describe alternatives you've considered
An alternative would be to implement tracing in an all-or-none mode, and selectively re-run only a subset of households through model components. This would probably be fine in most cases, but as noted above may be undesirable if there are interactions that depend on simulating at scale.
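
A minimal sketch of this alternative, again in plain NumPy with hypothetical names (`rerun_traced_households`, `spec_rows`, and the household IDs are all invented for illustration): after the main run, select only the observations that belong to traced households and materialize $X$ for that subset alone.

```python
import numpy as np


def spec_rows(raw):
    """Hypothetical stand-in: evaluate every SPEC expression for a block of observations."""
    a, b = raw[:, 0], raw[:, 1]
    return np.column_stack([a, b, a * b, (a > b).astype(float)])


def rerun_traced_households(household_ids, raw, beta, traced_households):
    """Re-run only the traced households with the full X materialized."""
    mask = np.isin(household_ids, traced_households)
    X_sub = spec_rows(raw[mask])   # full X, but only for the traced subset
    return household_ids[mask], X_sub, X_sub @ beta


rng = np.random.default_rng(0)
household_ids = np.array([10, 10, 11, 12, 12, 12])
raw = rng.random((6, 2))
beta = np.array([0.5, -1.0, 2.0, 0.25])

ids_sub, X_sub, V_sub = rerun_traced_households(
    household_ids, raw, beta, traced_households=[11, 12]
)
```

This keeps the trace arrays small, but the re-run sees only the subset, which is exactly the limitation noted above for components whose results depend on the full population.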

Metadata

Labels: Feature (new feature or request), Performance (changes that improve performance)

Status: Punt