Commit f14ffee

Integrate CRUD statistics with metrics
If `metrics` [1] is found, metrics collectors are used to store
statistics. `metrics >= 0.5.0` is required, and at least `0.9.0` is
recommended to support age buckets in summaries. The collectors are
part of the global registry and can be exported together with other
metrics (e.g. to Prometheus) using default tools, without any
additional configuration. Disabling stats destroys the collectors.

If `metrics` is found, the `latency` statistic changes to the 0.99
quantile of request execution time (with aging).

Add a CI matrix to run tests with `metrics` installed.

1. https://github.com/tarantool/metrics

Closes #224
1 parent 0971c76 commit f14ffee
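
The commit message says that `latency` switches from an average to a 0.99 quantile when `metrics` is available. The sketch below illustrates how the two numbers can differ on the same data. It is a plain-Lua illustration that sorts a finite sample and uses the nearest-rank definition; the `metrics` summary collector is not assumed to work this way internally, and the sample latencies and helper names (`quantile`, `mean`) are hypothetical.

```lua
-- Exact nearest-rank quantile over a finite sample (illustration only).
local function quantile(samples, q)
    -- Copy before sorting so the caller's table is left untouched.
    local sorted = {}
    for i, v in ipairs(samples) do
        sorted[i] = v
    end
    table.sort(sorted)
    -- Nearest rank: smallest value with at least q * n samples at or below it.
    local rank = math.max(1, math.ceil(q * #sorted))
    return sorted[rank]
end

local function mean(samples)
    local sum = 0
    for _, v in ipairs(samples) do
        sum = sum + v
    end
    return sum / #samples
end

-- Hypothetical latencies in seconds: 98 fast requests and 2 slow ones.
local latencies = {}
for i = 1, 98 do
    latencies[i] = 0.001
end
latencies[99] = 0.5
latencies[100] = 0.5

local avg = mean(latencies)
local p99 = quantile(latencies, 0.99)
print(('mean = %.5f s, p99 = %.3f s'):format(avg, p99))
```

With this sample the average stays near 0.011 s while the 0.99 quantile is 0.5 s, so the quantile exposes the slow tail that the average hides.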

6 files changed: +526 -4 lines changed

.github/workflows/test_on_push.yaml

Lines changed: 8 additions & 0 deletions
```diff
@@ -14,12 +14,16 @@ jobs:
         # We need 1.10.6 here to check that module works with
         # old Tarantool versions that don't have "tuple-keydef"/"tuple-merger" support.
         tarantool-version: ["1.10.6", "1.10", "2.2", "2.3", "2.4", "2.5", "2.6", "2.7"]
+        metrics-version: ""
         remove-merger: [false]
         include:
+          - tarantool-version: "1.10"
+            metrics-version: "0.12.0"
           - tarantool-version: "2.7"
             remove-merger: true
           - tarantool-version: "2.8"
             coveralls: true
+            metrics-version: ["", "0.12.0"]
       fail-fast: false
     runs-on: [ubuntu-latest]
     steps:
@@ -47,6 +51,10 @@ jobs:
           tarantool --version
           ./deps.sh

+      - name: Install metrics
+        if: matrix.metrics-version != ''
+        run: tarantoolctl rocks install metrics ${{ matrix.metrics-version }}
+
       - name: Remove external merger if needed
         if: ${{ matrix.remove-merger }}
         run: rm .rocks/lib/tarantool/tuple/merger.so
```

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ### Added
 * Statistics for CRUD operations on router (#224).
+* Integrate CRUD statistics with `metrics` (#224).

 ### Changed

```

README.md

Lines changed: 47 additions & 2 deletions
````diff
@@ -604,6 +604,11 @@ crud.disable_stats()
 crud.enable_stats()
 ```

+If [`metrics`](https://github.com/tarantool/metrics) found,
+metrics collectors are used to store statistics.
+It is required to use `>= 0.5.0`, while at least `0.9.0`
+is recommended to support age buckets in summary.
+
 Enabling stats on non-router instances is meaningless.

 `crud.stats()` contains several sections: `insert` (for `insert` and `insert_object` calls),
@@ -628,9 +633,46 @@ crud.stats()['insert']
 Each section contains different collectors for success calls
 and error (both error throw and `nil, err`) returns. `count`
 is total requests count since instance start or stats restart.
-`latency` is average time of requests execution,
+`latency` is 0.99 quantile of request execution time,
+(if `metrics` not found, shows average instead).
 `time` is total time of requests execution.

+In `metrics` registry statistics are stored as `tnt_crud_stats` metrics
+with `operation` and `status` label_pairs.
+```
+metrics:collect()
+---
+- - label_pairs:
+      status: ok
+      operation: insert
+    value: 221411
+    metric_name: tnt_crud_stats_count
+  - label_pairs:
+      status: ok
+      operation: insert
+    value: 10.49834896344692
+    metric_name: tnt_crud_stats_sum
+  - label_pairs:
+      status: ok
+      operation: insert
+      quantile: 0.5
+    value: 0.000003523699706
+    metric_name: tnt_crud_stats
+  - label_pairs:
+      status: ok
+      operation: insert
+      quantile: 0.9
+    value: 0.000006997063523
+    metric_name: tnt_crud_stats
+  - label_pairs:
+      status: ok
+      operation: insert
+      quantile: 0.99
+    value: 0.00023606420935973
+    metric_name: tnt_crud_stats
+...
+```
+
 Additionally, `select` section contains `details` collectors.
 ```lua
 crud.stats()['select']['details']
@@ -644,7 +686,10 @@ crud.stats()['select']['details']
 (including those not executed successfully). `tuples_fetched`
 is a count of tuples fetched from storages during execution,
 `tuples_lookup` is a count of tuples looked up on storages
-while collecting response for call.
+while collecting response for call. In `metrics` registry they
+are stored as `tnt_crud_map_reduces`, `tnt_crud_tuples_fetched`
+and `tnt_crud_tuples_lookup` metrics with
+`{ operation = 'select' }` label_pairs.

 ## Cartridge roles

````
crud/stats/metrics_registry.lua

Lines changed: 223 additions & 0 deletions
```diff
@@ -0,0 +1,223 @@
+local is_package, metrics = pcall(require, 'metrics')
+local label = require('crud.stats.label')
+local dev_checks = require('crud.common.dev_checks')
+local registry_common = require('crud.stats.registry_common')
+
+local registry = {}
+local _registry = {}
+
+local metric_name = {
+    -- Summary collector for all operations.
+    op = 'tnt_crud_stats',
+    -- `*_count` and `*_sum` are automatically created
+    -- by summary collector.
+    op_count = 'tnt_crud_stats_count',
+    op_sum = 'tnt_crud_stats_sum',
+
+    -- Counter collectors for select/pairs details.
+    tuples_fetched = 'tnt_crud_tuples_fetched',
+    tuples_lookup = 'tnt_crud_tuples_lookup',
+    map_reduces = 'tnt_crud_map_reduces',
+}
+
+local LATENCY_QUANTILE = 0.99
+
+local DEFAULT_QUANTILES = {
+    [0.5] = 1e-9,
+    [0.9] = 1e-9,
+    [LATENCY_QUANTILE] = 1e-9,
+}
+
+local DEFAULT_SUMMARY_PARAMS = {
+    max_age_time = 60,
+    age_buckets_count = 5,
+}
+
+--- Check if application supports metrics rock for registry
+--
+--  `metrics >= 0.5.0` is required to use summary,
+--  while at least `metrics >= 0.9.0` is required
+--  for age bucket support in quantiles.
+--
+-- @function is_supported
+--
+-- @treturn boolean Returns true if `metrics >= 0.5.0` found, false otherwise.
+--
+function registry.is_supported()
+    if is_package == false then
+        return false
+    end
+
+    -- Only metrics >= 0.5.0 supported.
+    if metrics.summary == nil then
+        return false
+    end
+
+    return true
+end
+
+
+--- Initialize collectors in global metrics registry
+--
+-- @function init
+--
+-- @treturn boolean Returns true.
+--
+function registry.init()
+    _registry[metric_name.op] = metrics.summary(
+        metric_name.op,
+        'CRUD router calls statistics',
+        DEFAULT_QUANTILES,
+        DEFAULT_SUMMARY_PARAMS)
+
+    _registry[metric_name.tuples_fetched] = metrics.counter(
+        metric_name.tuples_fetched,
+        'Tuples fetched from CRUD storages during select/pairs')
+
+    _registry[metric_name.tuples_lookup] = metrics.counter(
+        metric_name.tuples_lookup,
+        'Tuples looked up on CRUD storages while collecting response during select/pairs')
+
+    _registry[metric_name.map_reduces] = metrics.counter(
+        metric_name.map_reduces,
+        'Map reduces planned during CRUD select/pairs')
+
+    return true
+end
+
+--- Unregister collectors in global metrics registry
+--
+-- @function destroy
+--
+-- @treturn boolean Returns true.
+--
+function registry.destroy()
+    for _, c in pairs(_registry) do
+        metrics.registry:unregister(c)
+    end
+
+    _registry = {}
+    return true
+end
+
+--- Get copy of global metrics registry
+--
+-- @function get
+--
+-- @treturn table Returns copy of metrics registry.
+function registry.get()
+    local stats = {}
+
+    if next(_registry) == nil then
+        return stats
+    end
+
+    -- Fill empty collectors with zero values.
+    for _, op_label in pairs(label) do
+        stats[op_label] = registry_common.build_collector(op_label)
+    end
+
+    for _, obs in ipairs(_registry[metric_name.op]:collect()) do
+        local operation = obs.label_pairs.operation
+        local status = obs.label_pairs.status
+        if obs.metric_name == metric_name.op then
+            if obs.label_pairs.quantile == LATENCY_QUANTILE then
+                stats[operation][status].latency = obs.value
+            end
+        elseif obs.metric_name == metric_name.op_sum then
+            stats[operation][status].time = obs.value
+        elseif obs.metric_name == metric_name.op_count then
+            stats[operation][status].count = obs.value
+        end
+    end
+
+    local _, obs_tuples_fetched = next(_registry[metric_name.tuples_fetched]:collect())
+    if obs_tuples_fetched ~= nil then
+        stats[label.SELECT].details.tuples_fetched = obs_tuples_fetched.value
+    end
+
+    local _, obs_tuples_lookup = next(_registry[metric_name.tuples_lookup]:collect())
+    if obs_tuples_lookup ~= nil then
+        stats[label.SELECT].details.tuples_lookup = obs_tuples_lookup.value
+    end
+
+    local _, obs_map_reduces = next(_registry[metric_name.map_reduces]:collect())
+    if obs_map_reduces ~= nil then
+        stats[label.SELECT].details.map_reduces = obs_map_reduces.value
+    end
+
+    return stats
+end
+
+--- Increase requests count and update latency info
+--
+-- @function observe
+--
+-- @tparam string op_label
+--  Label of registry collectors.
+--  Use `require('crud.common.const').OP` to pick one.
+--
+-- @tparam boolean success
+--  true if no errors on execution, false otherwise.
+--
+-- @tparam number latency
+--  Time of call execution.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe(op_label, success, latency)
+    dev_checks('string', 'boolean', 'number')
+
+    local label_pairs = { operation = op_label }
+    if success == true then
+        label_pairs.status = 'ok'
+    else
+        label_pairs.status = 'error'
+    end
+
+    _registry[metric_name.op]:observe(latency, label_pairs)
+
+    return true
+end
+
+--- Increase statistics of storage select/pairs calls
+--
+-- @function observe_fetch
+--
+-- @tparam number tuples_fetched
+--  Count of tuples fetched during storage call.
+--
+-- @tparam number tuples_lookup
+--  Count of tuples looked up on storages while collecting response.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe_fetch(tuples_fetched, tuples_lookup)
+    dev_checks('number', 'number')
+
+    local label_pairs = { operation = label.SELECT }
+
+    _registry[metric_name.tuples_fetched]:inc(tuples_fetched, label_pairs)
+    _registry[metric_name.tuples_lookup]:inc(tuples_lookup, label_pairs)
+    return true
+end
+
+--- Increase statistics of planned map reduces during select/pairs
+--
+-- @function observe_map_reduces
+--
+-- @tparam number count
+--  Count of map reduces planned.
+--
+-- @treturn boolean Returns true.
+--
+function registry.observe_map_reduces(count)
+    dev_checks('number')
+
+    local label_pairs = { operation = label.SELECT }
+
+    _registry[metric_name.map_reduces]:inc(count, label_pairs)
+    return true
+end
+
+return registry
```
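
In this file, `registry.get()` turns the flat list of observations produced by the summary collector into the nested `stats[operation][status]` table returned by `crud.stats()`. The sketch below reproduces only that folding step in plain Lua, without `metrics` or `crud` installed. The observation values are made up, and `empty_section` stands in for `registry_common.build_collector`, whose actual return shape is not shown in this diff and is assumed here.

```lua
local LATENCY_QUANTILE = 0.99

-- Hypothetical flat observations, shaped like summary collector output.
local observations = {
    { metric_name = 'tnt_crud_stats_count',
      label_pairs = { operation = 'insert', status = 'ok' },
      value = 3 },
    { metric_name = 'tnt_crud_stats_sum',
      label_pairs = { operation = 'insert', status = 'ok' },
      value = 0.006 },
    { metric_name = 'tnt_crud_stats',
      label_pairs = { operation = 'insert', status = 'ok', quantile = 0.5 },
      value = 0.002 },
    { metric_name = 'tnt_crud_stats',
      label_pairs = { operation = 'insert', status = 'ok', quantile = 0.99 },
      value = 0.003 },
}

-- Zero-filled section (assumed shape, see the note above).
local function empty_section()
    return {
        ok = { count = 0, latency = 0, time = 0 },
        error = { count = 0, latency = 0, time = 0 },
    }
end

-- Fold flat observations into stats[operation][status].
local function fold(obs_list)
    local stats = {}
    for _, obs in ipairs(obs_list) do
        local operation = obs.label_pairs.operation
        local status = obs.label_pairs.status
        stats[operation] = stats[operation] or empty_section()
        local target = stats[operation][status]
        if obs.metric_name == 'tnt_crud_stats_count' then
            target.count = obs.value
        elseif obs.metric_name == 'tnt_crud_stats_sum' then
            target.time = obs.value
        elseif obs.label_pairs.quantile == LATENCY_QUANTILE then
            -- Only the 0.99 quantile is reported as `latency`;
            -- the 0.5 and 0.9 quantiles are skipped.
            target.latency = obs.value
        end
    end
    return stats
end

local stats = fold(observations)
print(stats.insert.ok.count, stats.insert.ok.time, stats.insert.ok.latency)
```

Unlike the real function, this sketch creates sections lazily instead of pre-filling one per known operation label, which keeps it independent of the `crud.stats.label` module.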

crud/stats/module.lua

Lines changed: 10 additions & 2 deletions
```diff
@@ -2,11 +2,19 @@ local clock = require('clock')
 local dev_checks = require('crud.common.dev_checks')
 local utils = require('crud.common.utils')

-local stats_registry = require('crud.stats.local_registry')
-
 local stats = {}
 local _is_enabled = false

+local stats_registry
+local local_registry = require('crud.stats.local_registry')
+local metrics_registry = require('crud.stats.metrics_registry')
+
+if metrics_registry.is_supported() then
+    stats_registry = metrics_registry
+else
+    stats_registry = local_registry
+end
+
 --- Check if statistics module is enabled
 --
 -- @function is_enabled
```
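
The selection logic in `module.lua`, combined with `is_supported()` in `metrics_registry.lua`, follows a common optional-dependency pattern: try to load the module with `pcall(require, ...)`, check for the feature that marks a new enough version, and fall back otherwise. Below is a generic sketch of that pattern in plain Lua. The helper `pick_backend` and the `demo_*` module names are hypothetical and exist only for this illustration; stub modules are registered through `package.preload` so the example runs without `metrics` installed.

```lua
-- Hypothetical helper: return the optional module if it loads and exposes
-- the required field, otherwise return the fallback.
local function pick_backend(module_name, required_field, fallback)
    local ok, mod = pcall(require, module_name)
    if ok and type(mod) == 'table' and mod[required_field] ~= nil then
        return mod, true
    end
    return fallback, false
end

local local_backend = { name = 'local' }

-- Case 1: the module is not installed at all, so `require` raises an error
-- that `pcall` catches.
local backend, optional_used =
    pick_backend('no_such_module_for_demo', 'summary', local_backend)
print(backend.name, optional_used)

-- Case 2: an old version is installed that lacks the `summary` field.
package.preload['demo_old_metrics'] = function()
    return { name = 'old metrics', counter = function() end }
end
local backend_old = pick_backend('demo_old_metrics', 'summary', local_backend)

-- Case 3: a new enough version is installed.
package.preload['demo_new_metrics'] = function()
    return { name = 'new metrics', summary = function() end }
end
local backend_new = pick_backend('demo_new_metrics', 'summary', local_backend)

print(backend_old.name, backend_new.name)
```

Checking for a feature (`summary` here) rather than parsing a version string keeps the check short and matches the approach the diff takes with `metrics.summary == nil`.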
