Replies: 1 comment
I'm going to close as a duplicate of #10369
Hello Team,
According to https://docs.fluentbit.io/manual/administration/scheduling-and-retries#configure-retries , the scheduler's Retry_Limit accepts three kinds of value: N, no_limits and no_retries.
In our case the output is Loki, and we set Retry_Limit to no_limits.
On the Loki side, we accept logs up to 2 days old by setting an appropriate value for ingester.max_chunk_age.
Our scenario is a failure ("rainy day") case: we manually take Loki offline, let fluent-bit buffer chunks locally for more than 2 days, and then bring the Loki service back online.
The retry scheduler then flushes the local chunks to Loki one by one, each under its own task_id.
However, for entries whose timestamps are older than 48h, Loki rejects the write with a "write operation failed, older acceptable timestamp is xxxx" error. From fluent-bit's point of view the flush has failed, so it loops in endless retries.
This is the actual issue for us: when Loki comes back within 48h, all locally buffered chunks are flushed successfully with no_limits, but when the outage lasts longer than 48h, some logs are resent to Loki endlessly and Loki always rejects them.
Therefore, I would like to propose an additional kind of Retry_Limit value that is timeout-based.
For example, with Retry_Limit set to 24h, a given retry would behave like no_limits at first, but once it has been retrying for 24h it would switch to no_retries.
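To make the proposal concrete, here is a minimal sketch of the decision logic in Python. It is only an illustration of the idea: the names `parse_retry_timeout` and `should_retry`, and the `24h` duration syntax, are my own assumptions and not part of Fluent Bit's current code or configuration.

```python
from datetime import datetime, timedelta

# Hypothetical helper: neither this function nor the "24h" syntax exists
# in Fluent Bit today; both are assumptions made for this proposal.
def parse_retry_timeout(value: str) -> timedelta:
    """Parse a duration such as '90s', '30m', '24h' or '2d'."""
    units = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
    if len(value) < 2 or value[-1] not in units or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{units[value[-1]]: int(value[:-1])})


# Hypothetical decision function for one retrying task.
def should_retry(first_attempt: datetime, now: datetime,
                 retry_timeout: timedelta) -> bool:
    """Behave like no_limits until the timeout is reached, then like no_retries."""
    return now - first_attempt < retry_timeout


timeout = parse_retry_timeout("24h")
started = datetime(2025, 1, 1, 0, 0)

# 23 hours after the first attempt: keep retrying.
print(should_retry(started, started + timedelta(hours=23), timeout))
# 24 hours after the first attempt: give up on this chunk.
print(should_retry(started, started + timedelta(hours=24), timeout))
```

The clock here starts at the first flush attempt of a task. Measuring from the chunk's creation time instead would be another option, and it might map more directly onto Loki's age limit.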
I would appreciate any comments or suggestions, and I will raise a feature request later.