BufferError: Local: Queue full #18624
I'm sure this can be done in other parts of the code too, so it might be happening elsewhere as well, but currently it seems like I have to restart Sentry every X messages.
I am facing exactly the same thing after 40 hours of uptime. Every event produces this error. All queues from ...
So, could anybody add this? Some more metadata from the last line:
I'm not entirely sure why you're getting it, but it's being misdiagnosed. The local queue can't flush because it's unable to write to Kafka. Why it can't write to Kafka, I don't know, but that's the issue. The only reason that queue will get full is if the broker is down or unable to accept data, or something along those lines. So there's nothing wrong here, and nothing wrong with what we're doing.
Also, the "queue" in this case refers to an in-memory queue that is flushed/written to Kafka, since writes are async. Nothing will be yielded from `sentry queues list`.
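To make that distinction concrete, here is a minimal sketch using the confluent-kafka Python client (the broker address, queue size, and topic name are made up for illustration) of how an async producer's local queue fills up and raises this error when nothing drains it:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "queue.buffering.max.messages": 1000,    # small local queue to force the error
})

def on_delivery(err, msg):
    # Delivery callbacks are only served when poll()/flush() is called.
    if err is not None:
        print(f"delivery failed: {err}")

for i in range(10_000):
    try:
        # produce() is asynchronous: it only appends to the local in-memory queue.
        producer.produce("outcomes", value=b"payload", callback=on_delivery)
    except BufferError:
        # Raised as "Local: Queue full" when the queue cannot be drained,
        # e.g. the broker is unreachable or poll() is never called.
        print(f"queue full with {len(producer)} messages still buffered")
        break
```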
I don't see any error logs in the Kafka container. And restarting only the ...
The next thing I have noticed is that all events are processed by Sentry "just fine", so I can see them on the Sentry issue detail page. In Sentry it looks like everything is working, in spite of the Queue full errors bombarding the web container logs.
I have added ... Btw: https://github.com/getsentry/sentry/blob/master/src/sentry/eventstream/kafka/backend.py#L52
@petrprikryl can you tell us if you see any outcomes data on your instance? You should see something under the path ...
@BYK it happened with us today. The errors came exclusively from a specific project, and it's the only one that uses filters for not accepting events. At the moment it happened there were 13,096 filtered events vs 12,862 queue full events. Note: we are using Error Message inbound filtering to filter the message.
@BYK I validated it again (because it happened again), and removing the filters seems to have fixed it (at the cost of receiving spam). I feel like using the Discarded feature will "fix" it, but it would be nice to see why such errors are happening when you get something around 3.5K filtered events in minutes.
@ibm5155 it seems like this is indeed a missing `poll()` call.
Why don't we have an issue with this ourselves in production? I'd be skeptical of slapping down a patch without understanding that. cc @getsentry/sns
We do see this in Snuba from time to time, but only with a producer that follows a similar usage pattern to the one in Sentry. We haven't seen this in any producers that explicitly call `poll()`.
We simply may have more resources at hand. Also, AFAIK relay would mitigate most of the issue for outcomes. /cc @jan-auer
There are multiple issues on the Kafka repo itself strongly recommending the use of a `poll()` call. This is essentially making the call synchronous as ...
This is also what the Rust lib does, from what I heard from @jan-auer, which also supports adding the `poll()` call.
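For reference, a small sketch of the two draining strategies discussed here, again using the confluent-kafka Python client with placeholder settings: `poll(0)` after each `produce()` is a non-blocking drain, while `flush()` blocks until every buffered message is delivered, which is what makes that path effectively synchronous.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

# Option 1: non-blocking drain. poll(0) serves pending delivery reports and
# frees slots in the local queue without waiting.
producer.produce("outcomes", value=b"payload")
producer.poll(0)

# Option 2: blocking drain. flush() polls until every buffered message has
# been delivered (or failed), which makes the call effectively synchronous.
producer.produce("outcomes", value=b"payload")
producer.flush()
```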
Fixes #18624. Kafka needs `poll()` to be called at regular intervals to clear its in-memory buffer and to trigger any callbacks for producers. This patch adds the missing `poll(0)` call, which is essentially free, to the main `KafkaProducer` class, mainly affecting the `track_outcomes` producer, as the other user uses synchronous mode, calling `flush()`, which already calls `poll()` behind the scenes.
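As a rough illustration of the pattern the patch describes (a hypothetical wrapper class, not Sentry's actual `KafkaProducer`), the `poll(0)` call sits next to `produce()` in the publish path:

```python
from confluent_kafka import Producer

class OutcomesPublisher:
    """Hypothetical wrapper, not Sentry's actual KafkaProducer class."""

    def __init__(self, conf):
        self._producer = Producer(conf)

    def publish(self, topic, value, key=None):
        # Serve any pending delivery reports so the local queue can drain;
        # poll(0) returns immediately, so this is essentially free.
        self._producer.poll(0)
        self._producer.produce(topic, value=value, key=key)

    def close(self):
        # Block until everything still buffered has been delivered.
        self._producer.flush()
```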
FWIW, the Kafka library we use in Relay also calls `poll()` regularly.
A tiny clarification and disclaimer: I do not know what the root cause of this is, that is, why sending the message is rejected. With the introduction of Relay to onpremise we have also added an outcomes consumer. So if the problem is that Kafka no longer accepts messages because it fills up or the offset gets too large, then that added consumer will help.
I think this is happening even with that consumer.
I am receiving this error once every 2-4 days and I need to restart Sentry to fix it. This started after moving to the Docker version of Sentry.
I never noticed this being an issue on 9.1.2, which also had ClickHouse and Snuba running, but without Kafka.
I am not sure where to look / poke / monitor to see this queue that is being spoken of and how I can flush it / enlarge it if needed.
`sentry queues list` showed all 0's, so it doesn't look like there is a massive backlog of events. Any help is appreciated!