Commit d22a29d

Integrate CRUD statistics with metrics
If `metrics` [1] is found, metrics collectors are used to store statistics. Version `0.9.0` or greater is required, since the summary collector needs age bucket support; with older versions, local collectors are used instead. The metrics are part of the global registry and can be exported together with other metrics (e.g. to Prometheus) using the default tools, without any additional configuration. Disabling stats destroys the collectors.

If `metrics` is found, the `latency` statistic changes to the 0.99 quantile of request execution time (with aging).

Add a CI matrix to run tests with `metrics` installed. To get a real coverage result from coveralls, the results of different CI jobs need to be merged. See more in #248.

1. https://github.com/tarantool/metrics

Closes #224
1 parent b605917 commit d22a29d

6 files changed, +525 -5 lines changed

.github/workflows/test_on_push.yaml

Lines changed: 11 additions & 1 deletion
@@ -13,13 +13,19 @@ jobs:
       matrix:
         # We need 1.10.6 here to check that module works with
         # old Tarantool versions that don't have "tuple-keydef"/"tuple-merger" support.
-        tarantool-version: ["1.10.6", "1.10", "2.2", "2.3", "2.4", "2.5", "2.6", "2.7"]
+        tarantool-version: ["1.10.6", "1.10", "2.2", "2.3", "2.4", "2.5", "2.6", "2.7", "2.8"]
+        metrics-version: [""]
         remove-merger: [false]
         include:
           - tarantool-version: "2.7"
             remove-merger: true
+          - tarantool-version: "2.8"
+            metrics-version: "0.1.8"
+          - tarantool-version: "2.8"
+            metrics-version: "0.9.0"
           - tarantool-version: "2.8"
             coveralls: true
+            metrics-version: "0.12.0"
       fail-fast: false
     runs-on: [ubuntu-latest]
     steps:
@@ -47,6 +53,10 @@ jobs:
           tarantool --version
           ./deps.sh

+      - name: Install metrics
+        if: matrix.metrics-version != ''
+        run: tarantoolctl rocks install metrics ${{ matrix.metrics-version }}
+
       - name: Remove external merger if needed
         if: ${{ matrix.remove-merger }}
         run: rm .rocks/lib/tarantool/tuple/merger.so

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ### Added
 * Statistics for CRUD operations on router (#224).
+* Integrate CRUD statistics with `metrics` (#224).

 ### Changed

README.md

Lines changed: 41 additions & 2 deletions
@@ -607,6 +607,11 @@ While statistics collection should not affect performance
 in a noticeable way, you may disable it if you want to
 prioritize performance.

+If [`metrics`](https://github.com/tarantool/metrics) is found,
+metrics collectors are used to store statistics.
+It is required to use version `0.9.0` or greater,
+otherwise local collectors will be used.
+
 Enabling stats on non-router instances is meaningless.

 `crud.stats()` contains several sections: `insert` (for `insert` and `insert_object` calls),
@@ -631,9 +636,40 @@ crud.stats()['insert']
 Each section contains different collectors for success calls
 and error (both error throw and `nil, err`) returns. `count`
 is total requests count since instance start or stats restart.
-`latency` is average time of requests execution,
+`latency` is 0.99 quantile of request execution time,
+(if `metrics` is not found, shows average instead).
 `time` is total time of requests execution.

+In `metrics` registry statistics are stored as `tnt_crud_stats` metrics
+with `operation` and `status` label_pairs.
+```
+metrics:collect()
+---
+- - label_pairs:
+      status: ok
+      operation: insert
+    value: 221411
+    metric_name: tnt_crud_stats_count
+  - label_pairs:
+      status: ok
+      operation: insert
+    value: 10.49834896344692
+    metric_name: tnt_crud_stats_sum
+  - label_pairs:
+      status: ok
+      operation: insert
+      quantile: 0.5
+    value: 0.000003523699706
+    metric_name: tnt_crud_stats
+  - label_pairs:
+      status: ok
+      operation: insert
+      quantile: 0.99
+    value: 0.00023606420935973
+    metric_name: tnt_crud_stats
+...
+```
+
 Additionally, `select` section contains `details` collectors.
 ```lua
 crud.stats()['select']['details']
@@ -647,7 +683,10 @@ crud.stats()['select']['details']
 (including those not executed successfully). `tuples_fetched`
 is a count of tuples fetched from storages during execution,
 `tuples_lookup` is a count of tuples looked up on storages
-while collecting response for call.
+while collecting response for call. In `metrics` registry they
+are stored as `tnt_crud_map_reduces`, `tnt_crud_tuples_fetched`
+and `tnt_crud_tuples_lookup` metrics with
+`{ operation = 'select' }` label_pairs.

 ## Cartridge roles
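
The README change above switches `latency` from an average to a 0.99 quantile when `metrics` is available. The plain-Lua sketch below illustrates why the two numbers can differ a lot on the same data. `mean` and `nearest_rank_quantile` are hypothetical helpers written only for this illustration. The exact nearest-rank computation over a full sample is an assumption made for simplicity, and it is not the streaming, aging estimator that the `metrics` summary collector uses.

```lua
-- Hypothetical helpers for illustration only; not part of crud or metrics.

-- Arithmetic mean of all samples (the "average" style of latency).
local function mean(samples)
    local sum = 0
    for _, v in ipairs(samples) do
        sum = sum + v
    end
    return sum / #samples
end

-- Exact nearest-rank quantile over a complete sample.
-- Assumption: the whole sample fits in memory, unlike a streaming summary.
local function nearest_rank_quantile(samples, q)
    local sorted = {}
    for i, v in ipairs(samples) do
        sorted[i] = v
    end
    table.sort(sorted)
    local rank = math.max(1, math.ceil(q * #sorted))
    return sorted[rank]
end

-- 98 fast requests (1 ms) and 2 slow requests (500 ms).
local latencies = {}
for i = 1, 98 do
    latencies[i] = 0.001
end
latencies[99] = 0.5
latencies[100] = 0.5

local avg = mean(latencies)
local p99 = nearest_rank_quantile(latencies, 0.99)
print(avg, p99)
```

With this sample the average stays close to the fast requests, while the 0.99 quantile reports the slow tail, which is usually the more useful signal for request latency.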

crud/stats/metrics_registry.lua

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
+local is_package, metrics = pcall(require, 'metrics')
+
+local label = require('crud.stats.label')
+local dev_checks = require('crud.common.dev_checks')
+local registry_common = require('crud.stats.registry_common')
+
+local registry = {}
+local _registry = {}
+
+local metric_name = {
+    -- Summary collector for all operations.
+    op = 'tnt_crud_stats',
+    -- `*_count` and `*_sum` are automatically created
+    -- by summary collector.
+    op_count = 'tnt_crud_stats_count',
+    op_sum = 'tnt_crud_stats_sum',
+
+    -- Counter collectors for select/pairs details.
+    tuples_fetched = 'tnt_crud_tuples_fetched',
+    tuples_lookup = 'tnt_crud_tuples_lookup',
+    map_reduces = 'tnt_crud_map_reduces',
+}
+
+local LATENCY_QUANTILE = 0.99
+
+local DEFAULT_QUANTILES = {
+    [0.5] = 1e-3,
+    [LATENCY_QUANTILE] = 1e-3,
+}
+
+local DEFAULT_SUMMARY_PARAMS = {
+    max_age_time = 60,
+    age_buckets_count = 5,
+}
+
+--- Check if application supports metrics rock for registry
+--
+-- `metrics >= 0.9.0` is required to use summary with
+-- age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported
+-- due to quantile overflow bug
+-- (https://github.com/tarantool/metrics/issues/235).
+--
+-- @function is_supported
+--
+-- @treturn boolean Returns true if `metrics >= 0.9.0` found, false otherwise.
+--
+function registry.is_supported()
+    if is_package == false then
+        return false
+    end
+
+    -- Only metrics >= 0.9.0 supported.
+    local is_summary, summary = pcall(require, 'metrics.collectors.summary')
+    if is_summary == false or summary.rotate_age_buckets == nil then
+        return false
+    end
+
+    return true
+end
+
+
+--- Initialize collectors in global metrics registry
+--
+-- @function init
+--
+-- @treturn boolean Returns true.
+--
+function registry.init()
+    _registry[metric_name.op] = metrics.summary(
+        metric_name.op,
+        'CRUD router calls statistics',
+        DEFAULT_QUANTILES,
+        DEFAULT_SUMMARY_PARAMS)
+
+    _registry[metric_name.tuples_fetched] = metrics.counter(
+        metric_name.tuples_fetched,
+        'Tuples fetched from CRUD storages during select/pairs')
+
+    _registry[metric_name.tuples_lookup] = metrics.counter(
+        metric_name.tuples_lookup,
+        'Tuples looked up on CRUD storages while collecting response during select/pairs')
+
+    _registry[metric_name.map_reduces] = metrics.counter(
+        metric_name.map_reduces,
+        'Map reduces planned during CRUD select/pairs')
+
+    return true
+end
+
+--- Unregister collectors in global metrics registry
+--
+-- @function destroy
+--
+-- @treturn boolean Returns true.
+--
+function registry.destroy()
+    for _, c in pairs(_registry) do
+        metrics.registry:unregister(c)
+    end
+
+    _registry = {}
+    return true
+end
+
+--- Get copy of global metrics registry
+--
+-- @function get
+--
+-- @treturn table Returns copy of metrics registry.
+function registry.get()
+    local stats = {}
+
+    if next(_registry) == nil then
+        return stats
+    end
+
+    -- Fill empty collectors with zero values.
+    for _, op_label in pairs(label) do
+        stats[op_label] = registry_common.build_collector(op_label)
+    end
+
+    for _, obs in ipairs(_registry[metric_name.op]:collect()) do
+        local operation = obs.label_pairs.operation
+        local status = obs.label_pairs.status
+        if obs.metric_name == metric_name.op then
+            if obs.label_pairs.quantile == LATENCY_QUANTILE then
+                stats[operation][status].latency = obs.value
+            end
+        elseif obs.metric_name == metric_name.op_sum then
+            stats[operation][status].time = obs.value
+        elseif obs.metric_name == metric_name.op_count then
+            stats[operation][status].count = obs.value
+        end
+    end
+
+    local _, obs_tuples_fetched = next(_registry[metric_name.tuples_fetched]:collect())
+    if obs_tuples_fetched ~= nil then
+        stats[label.SELECT].details.tuples_fetched = obs_tuples_fetched.value
+    end
+
+    local _, obs_tuples_lookup = next(_registry[metric_name.tuples_lookup]:collect())
+    if obs_tuples_lookup ~= nil then
+        stats[label.SELECT].details.tuples_lookup = obs_tuples_lookup.value
+    end
+
+    local _, obs_map_reduces = next(_registry[metric_name.map_reduces]:collect())
+    if obs_map_reduces ~= nil then
+        stats[label.SELECT].details.map_reduces = obs_map_reduces.value
+    end
+
+    return stats
+end
+
+--- Increase requests count and update latency info
+--
+-- @function observe
+--
+-- @tparam string op_label
+-- Label of registry collectors.
+-- Use `require('crud.common.const').OP` to pick one.
+--
+-- @tparam boolean success
+-- true if no errors on execution, false otherwise.
+--
+-- @tparam number latency
+-- Time of call execution.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe(op_label, success, latency)
+    dev_checks('string', 'boolean', 'number')
+
+    local label_pairs = { operation = op_label }
+    if success == true then
+        label_pairs.status = 'ok'
+    else
+        label_pairs.status = 'error'
+    end
+
+    _registry[metric_name.op]:observe(latency, label_pairs)
+
+    return true
+end
+
+--- Increase statistics of storage select/pairs calls
+--
+-- @function observe_fetch
+--
+-- @tparam number tuples_fetched
+-- Count of tuples fetched during storage call.
+--
+-- @tparam number tuples_lookup
+-- Count of tuples looked up on storages while collecting response.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe_fetch(tuples_fetched, tuples_lookup)
+    dev_checks('number', 'number')
+
+    local label_pairs = { operation = label.SELECT }
+
+    _registry[metric_name.tuples_fetched]:inc(tuples_fetched, label_pairs)
+    _registry[metric_name.tuples_lookup]:inc(tuples_lookup, label_pairs)
+    return true
+end
+
+--- Increase statistics of planned map reduces during select/pairs
+--
+-- @function observe_map_reduces
+--
+-- @tparam number count
+-- Count of map reduces planned.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe_map_reduces(count)
+    dev_checks('number')
+
+    local label_pairs = { operation = label.SELECT }
+
+    _registry[metric_name.map_reduces]:inc(count, label_pairs)
+    return true
+end
+
+return registry
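
To make the folding loop in `registry.get()` easier to follow, here is a simplified, self-contained sketch that runs on plain Lua without Tarantool. It feeds the sample observations from the README hunk through the same branching on `metric_name` and `quantile`. `fold_observations` is a hypothetical name, and the sketch starts from empty tables rather than from `registry_common.build_collector`, whose output shape is not shown in this diff.

```lua
-- Simplified re-creation of the loop in registry.get(); illustration only.
local LATENCY_QUANTILE = 0.99

-- Hypothetical helper: folds summary observations into
-- stats[operation][status] = { count = ..., time = ..., latency = ... }.
local function fold_observations(observations)
    local stats = {}
    for _, obs in ipairs(observations) do
        local operation = obs.label_pairs.operation
        local status = obs.label_pairs.status
        stats[operation] = stats[operation] or {}
        stats[operation][status] = stats[operation][status] or {}
        local collector = stats[operation][status]

        if obs.metric_name == 'tnt_crud_stats_count' then
            collector.count = obs.value
        elseif obs.metric_name == 'tnt_crud_stats_sum' then
            collector.time = obs.value
        elseif obs.metric_name == 'tnt_crud_stats'
                and obs.label_pairs.quantile == LATENCY_QUANTILE then
            -- Only the 0.99 quantile is reported as latency;
            -- the 0.5 quantile observation is skipped.
            collector.latency = obs.value
        end
    end
    return stats
end

-- Sample observations copied from the README example.
local observations = {
    { metric_name = 'tnt_crud_stats_count', value = 221411,
      label_pairs = { status = 'ok', operation = 'insert' } },
    { metric_name = 'tnt_crud_stats_sum', value = 10.49834896344692,
      label_pairs = { status = 'ok', operation = 'insert' } },
    { metric_name = 'tnt_crud_stats', value = 0.000003523699706,
      label_pairs = { status = 'ok', operation = 'insert', quantile = 0.5 } },
    { metric_name = 'tnt_crud_stats', value = 0.00023606420935973,
      label_pairs = { status = 'ok', operation = 'insert', quantile = 0.99 } },
}

local stats = fold_observations(observations)
print(stats.insert.ok.count, stats.insert.ok.time, stats.insert.ok.latency)
```

A single summary collector therefore supplies all three fields of a `crud.stats()` section: `count`, `time` and `latency`.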

crud/stats/module.lua

Lines changed: 10 additions & 2 deletions
@@ -2,11 +2,19 @@ local clock = require('clock')
 local dev_checks = require('crud.common.dev_checks')
 local utils = require('crud.common.utils')

-local stats_registry = require('crud.stats.local_registry')
-
 local stats = {}
 local _is_enabled = false

+local stats_registry
+local local_registry = require('crud.stats.local_registry')
+local metrics_registry = require('crud.stats.metrics_registry')
+
+if metrics_registry.is_supported() then
+    stats_registry = metrics_registry
+else
+    stats_registry = local_registry
+end
+
 --- Check if statistics module is enabled
 --
 -- @function is_enabled
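
The registry selection above relies on feature detection rather than on a version string: `is_supported()` probes for `rotate_age_buckets` in the summary module. A minimal sketch of that pattern in plain Lua follows. The module names `demo_new_summary`, `demo_old_summary` and `demo_missing_summary`, as well as the helpers `has_feature` and `pick_registry`, are hypothetical and exist only so the example runs without `metrics` installed; the real code requires `metrics.collectors.summary`.

```lua
-- Hypothetical helper: true if the module loads and exposes the given field.
local function has_feature(module_name, field)
    local ok, mod = pcall(require, module_name)
    if not ok or type(mod) ~= 'table' then
        return false
    end
    return mod[field] ~= nil
end

-- Hypothetical helper: returns the name of the registry that would be used.
local function pick_registry(module_name)
    if has_feature(module_name, 'rotate_age_buckets') then
        return 'metrics_registry'
    end
    return 'local_registry'
end

-- Simulate two installed variants through package.preload:
-- a "new" summary module with age bucket support and an "old" one without.
package.preload['demo_new_summary'] = function()
    return { rotate_age_buckets = function() end }
end
package.preload['demo_old_summary'] = function()
    return {}
end

local with_new = pick_registry('demo_new_summary')
local with_old = pick_registry('demo_old_summary')
-- No loader is registered for this name, so require fails inside pcall.
local without = pick_registry('demo_missing_summary')
print(with_new, with_old, without)
```

Both the missing module and the module without the probed function fall back to the local registry, which matches the three `metrics-version` entries added to the CI matrix.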
