The throughput of ingester causes a lot of remote write latency #3093

Closed
@storyicon

I run Cortex in production. It had been stable, but remote write latency rose sharply after I added several new remote write sources from Prometheus.

I use the following expression to measure the remote write delay:

(time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds{instance_name=~"$instance_name"}) < 100000

The result is in seconds, and it looks terrible:
[image]

Monitoring gives me good reason to believe that neither the distributor nor the ingester has hit a resource bottleneck (CPU or memory).
The distributor's log shows that a large number of push requests take more than 5 seconds, and some more than 40 seconds, which surprised me. As far as I know, when the ingester receives data it only writes to memory, without any I/O; persistence happens periodically and asynchronously. A push should therefore be very fast.
I then did a detailed timing analysis by recording the time spent in each core step in the context and passing it down layer by layer. I found that 90% of the processing time of a push request (for example, 8s out of 10s) is spent in the following code:
https://github.com/cortexproject/cortex/blob/master/pkg/ingester/index/index.go#L118-L151

The time spent in the linked code is mainly concentrated in two places:

  1. Lock
    https://github.com/cortexproject/cortex/blob/master/pkg/ingester/index/index.go#L119
  2. Copy
    https://github.com/cortexproject/cortex/blob/master/pkg/ingester/index/index.go#L144

In one measured case, a single push of 10000 series took 30s, of which Copy took 14427ms and Lock took 15313ms.

For a time series database, throughput is a core metric, so this problem should be solved.


My remote write rate is 830000 time series/s. In every request, each series contains only one sample and about 10 labels.
