Implement concurrent filtering and postprocessing #526

@lars-t-hansen

Description

(This bug has morphed a couple of times, but it has mostly been about optimizing the filtering pipeline in the sonarlog and db layers. What remains now is making use of available parallelism; this is not high priority, since performance is already pretty good.)

For the heavy-users script, we end up running the profile command for every job that passes the initial filtering. The profile command is actually pretty slow. I don't know precisely why, but there is one main loop that is quadratic-ish (though there's an assertion that "the inner loop will tend to be very short"), and when we do bucketing, as I do here, there's a bucketing loop that is cubic-ish. One would want to profile this. File I/O should not be a factor, because this is with the caching daemon.

I'm running this on four weeks of fox data: heavy-users.py fox 4w, which covers roughly May 24 through June 21. There are 38 output lines, unless I miscounted.

  • Move filter earlier in the pipeline
  • Avoid allocating a tremendously long array for all the samples
  • Avoid computing bounds when they are not needed, as this is expensive (it must be done before filtering)
  • Maybe: optimize the filter by specializing it for common cases (one value of one attribute + time); see the first sketch after this list, plus branch larstha-526-better-filter for WIP. This looks hard in general, and the best we can do may be to implement special logic for particularly important cases.
  • Maybe: push filtering down into the reading pipeline so it runs concurrently with reading (see the second sketch after this list)
  • Maybe: parallelize postprocessing, though this matters a lot less than efficient reading and filtering
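
To make the "specialize the filter" idea concrete, here is a minimal sketch in Go. The Sample, Clause, and Query types are hypothetical stand-ins, not the actual sonarlog representations; the point is only the shape of the optimization: detect the common query "one equality test on one attribute plus a time window" and compile it to a tight closure, instead of walking a generic clause list for every record.

```go
package filter

import "time"

// Sample is a hypothetical stand-in for a sonarlog sample record.
type Sample struct {
	Timestamp time.Time
	User      string
	Host      string
}

// Clause is a generic (attribute, value) equality test in a query.
type Clause struct {
	Attr  string // e.g. "user" or "host"
	Value string
}

// Query is a conjunction of clauses plus a time window.
type Query struct {
	Clauses  []Clause
	From, To time.Time
}

// Compile returns a predicate for q. For the common case of a single
// equality clause plus the time window, it returns a specialized closure
// that avoids the generic clause loop entirely.
func Compile(q Query) func(*Sample) bool {
	if len(q.Clauses) == 1 {
		c := q.Clauses[0]
		switch c.Attr {
		case "user":
			return func(s *Sample) bool {
				return s.User == c.Value &&
					!s.Timestamp.Before(q.From) && !s.Timestamp.After(q.To)
			}
		case "host":
			return func(s *Sample) bool {
				return s.Host == c.Value &&
					!s.Timestamp.Before(q.From) && !s.Timestamp.After(q.To)
			}
		}
	}
	// General case: walk all clauses.
	return func(s *Sample) bool {
		if s.Timestamp.Before(q.From) || s.Timestamp.After(q.To) {
			return false
		}
		for _, c := range q.Clauses {
			switch c.Attr {
			case "user":
				if s.User != c.Value {
					return false
				}
			case "host":
				if s.Host != c.Value {
					return false
				}
			default:
				return false // unknown attribute: treat as non-matching
			}
		}
		return true
	}
}
```

Most queries from the scripts should hit one of the specialized arms, so even if the general case stays slow, the important paths get the fast closure.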
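For the "push filtering into the reading pipeline" item, a minimal sketch of the fan-out/fan-in shape, again with a hypothetical Sample type and a caller-supplied batch source; none of these names come from the actual sonarlog code. Reading produces batches on a channel, one filter worker per CPU consumes them as they arrive, and survivors are collected at the end.

```go
package pipeline

import (
	"runtime"
	"sync"
)

// Sample is a hypothetical stand-in for a parsed sonarlog record.
type Sample struct{ User string }

// filterConcurrently fans batches of freshly read samples out to one
// filter worker per CPU and collects the survivors. Batching keeps the
// per-sample channel overhead small.
func filterConcurrently(batches <-chan []*Sample, keep func(*Sample) bool) []*Sample {
	out := make(chan []*Sample)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				// Filter in place; the batch is owned by this worker.
				kept := batch[:0]
				for _, s := range batch {
					if keep(s) {
						kept = append(kept, s)
					}
				}
				out <- kept
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	var result []*Sample
	for kept := range out {
		result = append(result, kept...)
	}
	return result
}
```

The same worker-pool shape would serve for parallelizing postprocessing, and if the consumer aggregates the filtered batches as they arrive instead of concatenating them, it also avoids allocating the tremendously long all-samples array.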
