Improve throughput through sharding fingerprints #3097
Conversation
Signed-off-by: yuanchao <[email protected]>
When I was testing, I created a test structure. In order to avoid other interference, I directly modified the `Push` method of `Distributor`:
Below is the scale of my data:
The data is not extreme; it comes from our real production environment (from a large number of node_exporters).
Here's a simple, self-contained test:
It prints the following as it runs on my machine:
When you properly increase it, ... In fact, I noticed a shard here:
It can effectively reduce lock contention (I call it lock segmentation), but when the value of ...
Signed-off-by: storyicon <[email protected]>
Thank you, that looks really interesting. I have left some comments in the code.
Would it be possible to write a benchmark that adds the same number of metrics and compares values before and after this change? (I see you already wrote a comparison above... can you add it as a benchmark to the PR, please?)
pkg/ingester/index/index.go
Outdated
```go
func (c *indexValueEntry) shard(fp model.Fingerprint) int {
	return int(math.Floor(float64(len(c.shards)) * float64(fp) / math.MaxUint64))
}
```
This will return `len(c.shards)` for values close to `math.MaxUint64`, which will then cause panics. Perhaps the simplest solution would be to use `fp % len(c.shards)` as the shard number. (That would require a little bit of extra effort when iterating them in sorted order.)
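A quick way to see the edge case (a self-contained sketch, not code from the PR; `numShards` is an illustrative constant):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const numShards = 16
	// Any fingerprint close enough to math.MaxUint64 rounds to 2^64 when
	// converted to float64, so the ratio becomes exactly 1.0 and the
	// computed shard index equals numShards, one past the valid range.
	fp := uint64(math.MaxUint64)
	n := int(math.Floor(float64(numShards) * float64(fp) / math.MaxUint64))
	fmt.Println(n) // prints 16, but valid shard indices are 0..15
}
```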
Thanks for the correction; I made a basic mistake. This fully illustrates the importance of code review.
> Perhaps the simplest solution would be to use `fp % len(c.shards)` as the shard number. (That would require a little bit of extra effort when iterating them in sorted order.)
Breaking the ordering across shards would bring additional costs. I think the simplest fix is this:
```go
func (c *indexValueEntry) shard(fp model.Fingerprint) int {
	n := int(math.Floor(float64(len(c.shards)) * float64(fp) / math.MaxUint64))
	if n == len(c.shards) {
		n = len(c.shards) - 1
	}
	return n
}
```
What is your opinion?
```go
j := sort.Search(len(fps), func(i int) bool {
	return fps[i] >= fp
})
c.shards[num] = fps[:j+copy(fps[j:], fps[j+1:])]
```
If the element is not in `fps`, this will delete an invalid entry or panic. (The original code does the same.)
Indeed, I copied the previous code directly. Now I've added a conditional check here:

```go
if len(fps) == j {
	return
}
```
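For context, a sketch of how the guarded delete might look as a whole (a hypothetical `deleteFP` helper, not the exact PR code; also checking `fps[j] != fp`, which covers the "delete an invalid entry" case the reviewer mentioned):

```go
// deleteFP removes fp from the sorted slice of shard num, guarding both
// the out-of-range case and the "fp not present" case.
func (c *indexValueEntry) deleteFP(num int, fp model.Fingerprint) {
	fps := c.shards[num]
	j := sort.Search(len(fps), func(i int) bool { return fps[i] >= fp })
	if j == len(fps) || fps[j] != fp {
		return // fp is not in this shard; nothing to delete
	}
	// Shift the tail left over position j and truncate by one element.
	c.shards[num] = fps[:j+copy(fps[j:], fps[j+1:])]
}
```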
pkg/ingester/index/index.go
Outdated
```go
func (c *indexValueEntry) fps() []model.Fingerprint {
	var fps []model.Fingerprint
	// ... (remainder of the function omitted in this diff excerpt)
```
To save allocations, we can preallocate this slice with a length of `c.length()`.
That's a good idea.
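A minimal sketch of the preallocated version (assuming `c.length()` returns the total fingerprint count across shards, per the suggestion above; not necessarily the exact code committed):

```go
func (c *indexValueEntry) fps() []model.Fingerprint {
	fps := make([]model.Fingerprint, 0, c.length())
	// Shards partition the fingerprint range, so appending them in shard
	// order keeps the result globally sorted.
	for _, shard := range c.shards {
		fps = append(fps, shard...)
	}
	return fps
}
```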
```go
j := sort.Search(len(fps), func(i int) bool {
	return fps[i] >= fp
})
fps = append(fps, 0)
```
Similar to delete, we don't check whether `fp` is already at the found position, and add it anyway. (But it seems adding duplicates is fine, based on the original code.)
The upper-level logic ensures that there will be no duplicate fingerprints here:
https://github.com/cortexproject/cortex/blob/master/pkg/ingester/user_state.go#L178-L181
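For reference, the truncated insert excerpt above most likely continues with the standard sorted-insert idiom; a sketch under that assumption (`insertFP` is a hypothetical name, not necessarily the PR's):

```go
// insertFP adds fp to the sorted slice of shard num, keeping it sorted.
func (c *indexValueEntry) insertFP(num int, fp model.Fingerprint) {
	fps := c.shards[num]
	j := sort.Search(len(fps), func(i int) bool { return fps[i] >= fp })
	fps = append(fps, 0)     // grow the slice by one element
	copy(fps[j+1:], fps[j:]) // shift the tail right to open slot j
	fps[j] = fp
	c.shards[num] = fps
}
```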
I also would be very interested in this, benchmarking both CPU and memory. For example, the following change may have an impact:

```diff
- fps map[string]indexValueEntry
+ fps map[string]*indexValueEntry
```
Hmmm, the benchmark I will commit is not affected by this.
Signed-off-by: yuanchao <[email protected]>
I submitted a change containing a simple benchmark; this is the result of running it on my machine:

If my benchmark is written correctly, sharding increases the insertion speed by about 60 times. This is close to the results I measured in my production environment.
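The committed benchmark output isn't captured on this page; as a self-contained illustration of the effect being measured, here is a hedged sketch (hypothetical names, with `uint64` standing in for `model.Fingerprint`) contrasting one big sorted slice with a sharded layout:

```go
package index_test

import (
	"math"
	"math/rand"
	"sort"
	"testing"
)

const numShards = 32

// insertSorted inserts fp into the sorted slice fps, keeping it sorted.
func insertSorted(fps []uint64, fp uint64) []uint64 {
	j := sort.Search(len(fps), func(i int) bool { return fps[i] >= fp })
	fps = append(fps, 0)
	copy(fps[j+1:], fps[j:])
	fps[j] = fp
	return fps
}

// BenchmarkSingleSlice inserts b.N random fingerprints into one big
// sorted slice; each insert shifts O(n) elements on average.
func BenchmarkSingleSlice(b *testing.B) {
	var fps []uint64
	for i := 0; i < b.N; i++ {
		fps = insertSorted(fps, rand.Uint64())
	}
}

// BenchmarkSharded spreads the same inserts over numShards sorted
// slices, so each insert only shifts elements within one shard.
func BenchmarkSharded(b *testing.B) {
	shards := make([][]uint64, numShards)
	for i := 0; i < b.N; i++ {
		fp := rand.Uint64()
		n := int(float64(numShards) * float64(fp) / math.MaxUint64)
		if n == numShards {
			n--
		}
		shards[n] = insertSorted(shards[n], fp)
	}
}
```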
Cortex (and Prometheus) is designed to handle time series that have many values over hours; if you have only one sample per series, then metadata (labels, etc.) dominates storage and processing time. Don't you find that the I/O costs of putting this on disk are enormous? I suspect an entirely different storage scheme would be needed to make this style of data manageable.
Sorry, I meant there is only one sample for each series.
Based on further discussion on Slack, I think you are focused on the time when Cortex sees a lot of series for the first time, e.g. after an ingester is restarted, or when a large (tens of millions of series) Prometheus starts sending for the first time. This is a valid concern, though perhaps one that has slipped past other large users. Possibly other users are scaled more horizontally; the copying effect you describe grows as the number of series in one index shard grows. I think the same effect should be observable in the ingester hand-over process, where millions of series are created in a single transaction.
@bboreham Sorry, I was busy with some other things for a few days and only saw your reply today. To make the comparison clearer, I re-tested under the same conditions:
The test data used above comes from our production environment (not special stress-test data) and only accounts for about 20% of the total. (This is also the data source I have been using before.) When most fingerprints already exist in memory, both versions get much faster, but there is no doubt that the modified version is always a little faster. And when a new batch of data arrives, the modified version can digest it quickly.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
What this PR does:
This PR is expected to increase the throughput of the ingester (chunks storage) by 10-100 times.
Which issue(s) this PR fixes:
Fixes #3093
According to the issue description, `Lock` and `Append` take up 95% of the entire `Push` request time. The time consumed by the `Lock` is affected by the amount of concurrency, but in essence it is determined by the time spent in the code region it protects. So the key is to solve the problem that `Copy` takes too long. In fact, this is a truth that any data-structure beginner understands: random insertion into an array is expensive because it requires moving a large number of elements.

To be honest, I first thought of a skiplist, because the scenario here is very similar to Redis's sorted set. After some comparative tests, the random insertion speed of a skiplist is indeed much better than that of an array. But when it was actually integrated into Cortex and used in my environment, it took up too much memory, so I finally went back to the array.
My idea is to divide one big ordered slice into multiple ordered slices to reduce the cost of `Copy`. It sounds a bit simple, but it is really effective and well suited here. The general idea is as follows: divide `math.MaxUint64` (the value range of a fingerprint) into `N` shards (the actual sharding is not quite as exact as this), and keep the fingerprints in each shard in order. In this way, we can handle each shard just as we previously handled the single large fingerprint array. This brings a small cost (computing the shard), but greatly improves throughput, and it even helps to reduce memory consumption to some extent (large arrays take up more memory when growing).
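A minimal sketch of the range-partitioned layout described above (type and field names are illustrative, not the exact code in this PR; `model` is `github.com/prometheus/common/model`):

```go
// Shard i holds the sorted fingerprints that fall into the i-th
// slice of the [0, math.MaxUint64] range.
type shardedEntry struct {
	shards [][]model.Fingerprint
}

func (e *shardedEntry) shard(fp model.Fingerprint) int {
	n := int(float64(len(e.shards)) * float64(fp) / math.MaxUint64)
	if n == len(e.shards) {
		n = len(e.shards) - 1 // clamp the fp == math.MaxUint64 edge case
	}
	return n
}
```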

Comparison:

In my test, each consumer could consume at most `25000 series/s` before the modification, and up to `1400000 series/s` after the modification. You may have doubts about these numbers; I describe the testing process in the conversation.