Skip to content

Conversation

gouthamve
Copy link
Contributor

fixes #2906

When we use the caching client (which is what is used in most cases), we
load the entire row (tableName+HashKey) irrespective of what the
rangeKey parameters are. Which means with the optimisation, we are
loading the same row multiple times and then operating on the same data.
This PR moves the optimisation to after the data is loaded which should
be faster.

Signed-off-by: Goutham Veeramachaneni [email protected]

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@gouthamve
Copy link
Contributor Author

➜  cortex git:(fix-2906) ✗ benchcmp old.txt new.txt 
benchmark                                     old ns/op     new ns/op     delta
BenchmarkParseIndexEntries500-8               342057        291739        -14.71%
BenchmarkParseIndexEntries2500-8              1718683       1516488       -11.76%
BenchmarkParseIndexEntries10000-8             6568014       6463280       -1.59%
BenchmarkParseIndexEntries50000-8             34212971      34284051      +0.21%
BenchmarkParseIndexEntriesRegexSet500-8       137045        89742         -34.52%
BenchmarkParseIndexEntriesRegexSet2500-8      647045        441737        -31.73%
BenchmarkParseIndexEntriesRegexSet10000-8     2772264       1881870       -32.12%
BenchmarkParseIndexEntriesRegexSet50000-8     14044719      10336336      -26.40%

benchmark                                     old allocs     new allocs     delta
BenchmarkParseIndexEntries500-8               1503           1504           +0.07%
BenchmarkParseIndexEntries2500-8              7503           7504           +0.01%
BenchmarkParseIndexEntries10000-8             30005          30007          +0.01%
BenchmarkParseIndexEntries50000-8             150007         150008         +0.00%
BenchmarkParseIndexEntriesRegexSet500-8       1503           1522           +1.26%
BenchmarkParseIndexEntriesRegexSet2500-8      7503           7522           +0.25%
BenchmarkParseIndexEntriesRegexSet10000-8     30005          30023          +0.06%
BenchmarkParseIndexEntriesRegexSet50000-8     150007         150024         +0.01%

benchmark                                     old bytes     new bytes     delta
BenchmarkParseIndexEntries500-8               96289         96320         +0.03%
BenchmarkParseIndexEntries2500-8              481975        482008        +0.01%
BenchmarkParseIndexEntries10000-8             1928507       1928559       +0.00%
BenchmarkParseIndexEntries50000-8             9606750       9606808       +0.00%
BenchmarkParseIndexEntriesRegexSet500-8       88222         88693         +0.53%
BenchmarkParseIndexEntriesRegexSet2500-8      441163        441627        +0.11%
BenchmarkParseIndexEntriesRegexSet10000-8     1764739       1765087       +0.02%
BenchmarkParseIndexEntriesRegexSet50000-8     8804033       8804335       +0.00%

Hrm, not such a big change, interesting but still worth having imo.

@gouthamve gouthamve marked this pull request as ready for review August 4, 2020 09:29
Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a CHANGELOG entry? If so, could you rebase master and add the CHANGELOG entry, please?

Comment on lines 559 to 570
Copy link
Contributor

@pstibrany pstibrany Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments here:

  1. Nit: if condition can be merged into single if statement.
  2. No unit test covers continue branch. I think we should cover it by tests.
  3. If labelValue is one of matched values, we still run regex on it on the next line. We can remove matcher.Matches in this case, and save some cpu cycles.

@pstibrany
Copy link
Contributor

This looks useful, and would be great to merge. @gouthamve would you find some time to look at the comments and get the PR into mergeable state?

@gouthamve gouthamve force-pushed the fix-2906 branch 2 times, most recently from 49a7cb6 to 8e64367 Compare August 25, 2020 08:44
fixes cortexproject#2906

When we use the caching client (which is what is used in most cases), we
load the entire row (tableName+HashKey) irrespective of what the
rangeKey parameters are. Which means with the optimisation, we are
loading the same row multiple times and then operating on the same data.
This PR moves the optimisation to after the data is loaded which should
be faster.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pstibrany pstibrany merged commit 5ea4cc6 into cortexproject:master Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regex optimisation on index does redundant lookups
4 participants