SungJin1212
Member

@SungJin1212 SungJin1212 commented Aug 5, 2025

This PR adds the TSDB metrics cortex_ingester_tsdb_wal_replay_unknown_refs_total and cortex_ingester_tsdb_wbl_replay_unknown_refs_total as per-user metrics to track unknown series references encountered during WAL/WBL replay.

The Prometheus PR: prometheus/prometheus#16166

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@friedrichg friedrichg changed the title Add tsdb metrics to track unknown sereis references during wal/wbl re… Add tsdb metrics to track unknown series references during wal/wbl re… Aug 5, 2025
tsdbWBLReplayUnknownRefsTotal: prometheus.NewDesc(
"cortex_ingester_tsdb_wbl_replay_unknown_refs_total",
"Total number of unknown series references encountered during TSDB WBL replay.",
[]string{"user", "type"}, nil),
Contributor

Do we really need these metrics to be per tenant? Let's make them consistent with the other TSDB metrics. I think those are not per tenant.

Member Author

I wasn't sure about that. Yeah, let's make it not per tenant and attach it later if necessary.
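
For readers following the thread, here is a minimal sketch of what "not per tenant" means for a counter like this: the per-tenant values are summed and only the "type" label is kept. It uses only the Go standard library and is not the code in this PR. The names unknownRefs and sumAcrossTenants, the tenant IDs, and the "type" values are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// unknownRefs maps a record type (the "type" label) to a counter value.
// Hypothetical type, used only for this illustration.
type unknownRefs map[string]float64

// sumAcrossTenants collapses per-tenant counters into one total per type,
// which drops the tenant ("user") dimension.
func sumAcrossTenants(perTenant map[string]unknownRefs) unknownRefs {
	total := unknownRefs{}
	for _, byType := range perTenant {
		for typ, v := range byType {
			total[typ] += v
		}
	}
	return total
}

func main() {
	// Illustrative values; the tenant IDs and type names are assumptions.
	perTenant := map[string]unknownRefs{
		"tenant-a": {"samples": 3},
		"tenant-b": {"samples": 2, "exemplars": 1},
	}

	total := sumAcrossTenants(perTenant)

	// Sort the label values so the output order is stable.
	types := make([]string, 0, len(total))
	for typ := range total {
		types = append(types, typ)
	}
	sort.Strings(types)

	for _, typ := range types {
		fmt.Printf("cortex_ingester_tsdb_wal_replay_unknown_refs_total{type=%q} %g\n", typ, total[typ])
	}
}
```

Dropping the tenant label keeps the number of exported series bounded by the number of record types rather than by tenants times types, which is one reason to add the per-tenant dimension only if it turns out to be needed.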

@SungJin1212 SungJin1212 force-pushed the Add-wal-unknown-total-metrics branch from 5a1d6eb to 654f3b3 Compare August 6, 2025 07:53
@SungJin1212 SungJin1212 requested a review from yeya24 August 9, 2025 02:22
@yeya24
Contributor

yeya24 commented Aug 10, 2025

query_fuzz_test.go:1796: case 1344 results mismatch.
        range query: quantile by (status_code, job) (
          (
              scalar(-{__name__="test_series_b",series="4",status_code=~"4.*"})
            -
              ((0.6373642855522483 != bool 0.14686267726480157) ^ (0.5905225600834878 > bool 0.24146324674051756))
          ),
          min without (__name__) (-count_over_time({__name__="test_series_a"}[2m]))
        )
        res1 len: 1 data: {job="test"} =>
        -Inf @[1754615836.178]
        -Inf @[1754615896.178]
        -Inf @[1754615956.178]
        -Inf @[1754616016.178]
        -Inf @[1754616076.178]
        -Inf @[1754616136.178]
        -Inf @[1754616196.178]
        -Inf @[1754616256.178]
        -Inf @[1754616316.178]
        -Inf @[1754616376.178]
        -Inf @[1754616436.178]
        -Inf @[1754616496.178]
        -Inf @[1754616556.178]
        -Inf @[1754616616.178]
        -Inf @[1754616676.178]
        -Inf @[1754616736.178]
        -Inf @[1754616796.178]
        -Inf @[1754616856.178]
        -Inf @[1754616916.178]
        -Inf @[1754616976.178]
        -Inf @[1754617036.178]
        -Inf @[1754617096.178]
        -Inf @[1754617156.178]
        -Inf @[1754617216.178]
        -Inf @[1754617276.178]
        -Inf @[1754617336.178]
        -Inf @[1754617396.178]
        -Inf @[1754617456.178]
        -Inf @[1754617516.178]
        -Inf @[1754617576.178]
        -Inf @[1754617636.178]
        -Inf @[1754617696.178]
        -Inf @[1754617756.178]
        -Inf @[1754617816.178]
        -Inf @[1754617876.178]
        -Inf @[1754617936.178]
        -Inf @[1754617996.178]
        -Inf @[1754618056.178]
        -Inf @[1754618116.178]
        -Inf @[1754618176.178]
        -Inf @[1754618236.178]
        -Inf @[1754618296.178]
        -Inf @[1754618356.178]
        -Inf @[1754618416.178]
        -Inf @[1754618476.178]
        -Inf @[1754618536.178]
        -Inf @[1754618596.178]
        -Inf @[1754618656.178]
        -Inf @[1754618716.178]
        -Inf @[1754618776.178]
        -Inf @[1754618836.178]
        -Inf @[1754618896.178]
        -Inf @[1754618956.178]
        -Inf @[1754619016.178]
        -Inf @[1754619076.178]
        -Inf @[1754619136.178]
        -Inf @[1754619196.178]
        -Inf @[1754619256.178]
        -Inf @[1754619316.178]
        -Inf @[1754619376.178]
        res2 len: 1 data: {job="test"} =>
        NaN @[1754615836.178]
        NaN @[1754615896.178]
        NaN @[1754615956.178]
        NaN @[1754616016.178]
        NaN @[1754616076.178]
        NaN @[1754616136.178]
        NaN @[1754616196.178]
        NaN @[1754616256.178]
        NaN @[1754616316.178]
        NaN @[1754616376.178]
        NaN @[1754616436.178]
        NaN @[1754616496.178]
        NaN @[1754616556.178]
        NaN @[1754616616.178]
        NaN @[1754616676.178]
        NaN @[1754616736.178]
        NaN @[1754616796.178]
        NaN @[1754616856.178]
        NaN @[1754616916.178]
        NaN @[1754616976.178]
        NaN @[1754617036.178]
        NaN @[1754617096.178]
        NaN @[1754617156.178]
        NaN @[1754617216.178]
        NaN @[1754617276.178]
        NaN @[1754617336.178]
        NaN @[1754617396.178]
        NaN @[1754617456.178]
        NaN @[1754617516.178]
        NaN @[1754617576.178]
        NaN @[1754617636.178]
        NaN @[1754617696.178]
        NaN @[1754617756.178]
        NaN @[1754617816.178]
        NaN @[1754617876.178]
        NaN @[1754617936.178]
        NaN @[1754617996.178]
        NaN @[1754618056.178]
        NaN @[1754618116.178]
        NaN @[1754618176.178]
        NaN @[1754618236.178]
        NaN @[1754618296.178]
        NaN @[1754618356.178]
        NaN @[1754618416.178]
        NaN @[1754618476.178]
        NaN @[1754618536.178]
        NaN @[1754618596.178]
        NaN @[1754618656.178]
        NaN @[1754618716.178]
        NaN @[1754618776.178]
        NaN @[1754618836.178]
        NaN @[1754618896.178]
        NaN @[1754618956.178]
        NaN @[1754619016.178]
        NaN @[1754619076.178]
        NaN @[1754619136.178]
        NaN @[1754619196.178]
        NaN @[1754619256.178]
        NaN @[1754619316.178]
        NaN @[1754619376.178]
    query_fuzz_test.go:1801: 
        	Error Trace:	/home/runner/work/cortex/cortex/integration/query_fuzz_test.go:1801
        	            				/home/runner/work/cortex/cortex/integration/query_fuzz_test.go:1584
        	Error:      	finished query fuzzing tests
        	Test:       	TestBackwardCompatibilityQueryFuzz
        	Messages:   	1 test cases failed

This might be worth taking a closer look at, as it keeps failing.

@yeya24 yeya24 requested a review from danielblando August 10, 2025 17:12
@yeya24 yeya24 merged commit d9079f5 into cortexproject:master Aug 11, 2025
72 of 77 checks passed
aclaygray pushed a commit to aclaygray/cortex that referenced this pull request Aug 18, 2025