@@ -3,8 +3,8 @@ LuaJIT memory profiler
3
3
======================
4
4
5
5
Starting from version :doc: `2.7.1 </release/2.7.1 >`, Tarantool
6
- has the built- in module called ``misc.memprof `` that implements the LuaJIT memory
7
- profiler and the profile parser (further, *profiler *). The profiler provides
6
+ has the built‑ in module called ``misc.memprof `` that implements the LuaJIT memory
7
+ profiler (further, *profiler *). The profiler provides
8
8
a memory allocation report that helps analyse Lua code and find out the places
9
9
that put the most pressure on the Lua garbage collector (GC).
10
10
@@ -42,6 +42,7 @@ Below is a chunk of simple Lua code named ``test.lua`` to illustrate this.
42
42
.. code-block :: lua
43
43
:linenos:
44
44
45
+ -- Prevent allocations on traces.
45
46
jit.off()
46
47
local str, err = misc.memprof.start("memprof_new.bin")
47
48
-- Lua doesn't create a new frame to call string.rep, and all allocations
@@ -52,49 +53,28 @@ Below is a chunk of simple Lua code named ``test.lua`` to illustrate this.
52
53
53
54
local t = {}
54
55
for i = 1, 1e5 do
55
- -- table.insert is the built- in function and all corresponding
56
+ -- table.insert is the built‑ in function and all corresponding
56
57
-- allocations are reported in the scope of the main chunk.
57
58
table.insert(t,
58
59
append('q', i)
59
60
)
60
61
end
61
62
local stp, err = misc.memprof.stop()
62
63
63
- .. note ::
64
-
65
- Usually, the information about allocations inside Lua built-ins are not really
66
- useful for the developer. That's why if a Lua built-in function is called from
67
- a Lua function, the profiler attributes all allocations to the Lua function.
68
- Otherwise, this event is attributed to a C function.
69
-
70
- Tail call optimization doesn't create a new call frame, so all allocations
71
- inside the function called via the ``CALLT/CALLMT `` `bytecodes <http://wiki.luajit.org/Bytecode-2.0 >`_
72
- are attributed to its caller.
73
-
74
- Example above illustrates these cases.
75
-
76
64
Starting profiler in Lua code:
77
65
78
66
.. code-block :: lua
79
67
80
68
local str, err = misc.memprof.start(FILENAME)
81
69
82
- where ``FILENAME `` is a name of the binary file where profile events are written.
83
- The writer for this function performs ``fwrite() `` for each call retrying
84
- in case of ``EINTR ``. When the profiling is stopped, ``fclose() `` is called.
70
+ where ``FILENAME `` is a name of the binary file where profiling events are written.
85
71
86
72
If it is not possible to open a file for writing or the profiler fails to start,
87
73
``misc.memprof.start() `` returns ``nil `` on failure. Also, in this case
88
74
the function returns an error message as the second result and
89
75
a system-dependent error code as the third result.
90
76
Otherwise, it returns ``true ``.
91
77
92
- .. note ::
93
-
94
- It is recommended to switch the JIT compilation off by calling ``jit.off() ``
95
- before the profiler start. Refer to the following
96
- :ref: `explanation <profiler_usage_internal_jitoff >` for details.
97
-
98
78
Stopping profiler in Lua code:
99
79
100
80
.. code-block :: lua
@@ -110,8 +90,9 @@ Otherwise, it returns ``true``.
110
90
111
91
.. _profiler_usage_generate :
112
92
113
- To generate the ``memprof_new.bin `` file with the memory profile in binary format,
114
- execute the code under Tarantool:
93
+ To generate the file with memory profile in binary format
94
+ (in the :ref: `example above <profiler_usage_example01 >`,
95
+ it's ``memprof_new.bin ``), execute the code under Tarantool:
115
96
116
97
.. code-block :: tarantoolconsole
117
98
@@ -121,6 +102,26 @@ Tarantool collects the allocation events in ``memprof_new.bin``, puts
121
102
the file in its :ref: `working directory <cfg_basic-work_dir >`, and closes
122
103
the session.
123
104
105
+ The :ref: `code example <profiler_usage_example01 >` above also illustrates the memory
106
+ allocation logic in some of the cases that are important to understand for further
107
+ :ref: `reading <profiler_usage_parse >` and :ref: `analysing <profiler_analysis >`
108
+ a profiling report:
109
+
110
+ * Line 2: It is recommended to switch the JIT compilation off by calling ``jit.off() ``
111
+ before the profiler start. Refer to the following
112
+ :ref: `explanation <profiler_usage_internal_jitoff >` for more details.
113
+
114
+ * Lines 6-8: Tail call optimization doesn't create a new call frame, so all
115
+ allocations inside the function called via the ``CALLT/CALLMT `` `bytecodes <http://wiki.luajit.org/Bytecode-2.0#calls-and-vararg-handling >`_
116
+ are attributed to its caller. See also comments to these lines.
117
+
118
+ * Lines 14-16: Usually, the information about allocations inside Lua built‑ins
119
+ are not really
120
+ useful for developers. That's why if a Lua built‑in function is called from
121
+ a Lua function, the profiler attributes all allocations to the Lua function.
122
+ Otherwise, this event is attributed to a C function.
123
+ See also comments to these lines.
124
+
124
125
.. _profiler_usage_parse :
125
126
126
127
Parsing binary profile and generating profiling report
@@ -140,36 +141,35 @@ via Tarantool by using the following command
140
141
where ``memprof_new.bin `` is the binary profile
141
142
:ref: `generated earlier <profiler_usage_generate >`.
142
143
143
- Tarantool generates a profiling report and closes the session.
144
+ Tarantool generates a profiling report that is displayed in console and closes
145
+ the session:
144
146
145
147
.. code-block :: console
146
148
147
149
ALLOCATIONS
148
- @test.lua:0, line 14 : 1002 531818 0
149
- @test.lua:0, line 13 : 1 24 0
150
- @test.lua:0, line 9 : 1 32 0
151
- @test.lua:0, line 7 : 1 20 0
150
+ @test.lua:0, line 15 : 1002 531818 0
151
+ @test.lua:0, line 14 : 1 24 0
152
+ @test.lua:0, line 10 : 1 32 0
153
+ @test.lua:0, line 8 : 1 20 0
152
154
153
155
REALLOCATIONS
154
- @test.lua:0, line 13: 9 16424 8248
155
- Overrides:
156
- @test.lua:0, line 13
157
-
158
- @test.lua:0, line 14: 5 1984 992
156
+ @test.lua:0, line 14: 9 16424 8248
159
157
Overrides:
160
158
@test.lua:0, line 14
161
-
159
+ @test.lua:0, line 15: 5 1984 992
160
+ Overrides:
161
+ @test.lua:0, line 15
162
162
163
163
DEALLOCATIONS
164
164
INTERNAL: 20 0 1481
165
- @test.lua:0, line 14 : 3 0 7168
165
+ @test.lua:0, line 15 : 3 0 7168
166
166
Overrides:
167
- @test.lua:0, line 14
167
+ @test.lua:0, line 15
168
168
169
169
.. note ::
170
170
171
- On MacOS , a report is different because Tarantool and LuaJIT are built
172
- with the GC64 mode enabled for this OS .
171
+ On MacOS, a report will be different for the same chunk of code because
172
+ Tarantool and LuaJIT are built with the GC64 mode enabled for MacOS .
173
173
174
174
Let's examine the report structure. A report has three sections:
175
175
@@ -217,11 +217,11 @@ structures.
217
217
218
218
So, if the JIT compilation is on,
219
219
new traces will be generated and there will be a mixture of events labeled
220
- ``INTERNAL `` in the profiling report : some of them are really caused by
220
+ ``INTERNAL `` in the profiling report: some of them are really caused by
221
221
internal LuaJIT structures, but some of them are caused by allocations on
222
222
traces.
223
223
224
- If you want to have more definite report without new trace allocations,
224
+ If you want to have more definite report without JIT compiler allocations,
225
225
:ref: `call jit.off() <profiler_usage_example01 >` before starting the profiling.
226
226
And if you want to completely exclude the trace allocations from the report,
227
227
remove also the old traces by additionally calling ``jit.flush() `` after
@@ -244,8 +244,8 @@ most probably can arise while using the profiler.
244
244
245
245
.. _profiler_faq :
246
246
247
- Frequently Asked Questions
248
- --------------------------
247
+ FAQ
248
+ ---
249
249
250
250
In this section, some of the profiler-related points are discussed in
251
251
a Q&A format.
@@ -264,7 +264,7 @@ or other non-Lua allocators. You can use ``valgrind`` to debug them.
264
264
What does it mean?
265
265
266
266
**A **: ``INTERNAL `` means that these allocations/reallocations/deallocations are
267
- related to the internal LuaJIT structures or are made on JIT traces.
267
+ related to the internal LuaJIT structures or are made on traces.
268
268
Currently, the memory profiler doesn't report verbosely allocations of objects
269
269
that are made during trace execution. Try to :ref: `add jit.off() <profiler_usage_internal_jitoff >`
270
270
before profiler start.
@@ -334,16 +334,19 @@ from a console.
334
334
335
335
.. _profiler_analysis :
336
336
337
- Report analysis example
338
- -----------------------
337
+ Profiling report analysis example
338
+ ---------------------------------
339
339
340
340
In the example below, the following Lua code named ``format_concat.lua `` is
341
341
investigated with the help of the memory profiler reports.
342
342
343
+ .. _profiler_usage_example03 :
344
+
343
345
.. code-block :: lua
344
346
:linenos:
345
347
346
- jit.off() -- Prevent allocations on new traces.
348
+ -- Prevent allocations on new traces.
349
+ jit.off()
347
350
348
351
local function concat(a)
349
352
local nstr = a.."a"
@@ -355,7 +358,7 @@ investigated with the help of the memory profiler reports.
355
358
return nstr
356
359
end
357
360
358
- collectgarbage() -- Clean up.
361
+ collectgarbage()
359
362
360
363
local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
361
364
@@ -367,7 +370,7 @@ investigated with the help of the memory profiler reports.
367
370
local f = format(i)
368
371
local c = concat(i)
369
372
end
370
- collectgarbage() -- Clean up.
373
+ collectgarbage()
371
374
372
375
local st, err = misc.memprof.stop()
373
376
assert(st, err)
@@ -381,19 +384,18 @@ you will get the following profiling report:
381
384
.. code-block :: console
382
385
383
386
ALLOCATIONS
384
- @format_concat.lua:8 , line 9 : 19998 624322 0
387
+ @format_concat.lua:9 , line 10 : 19998 624322 0
385
388
INTERNAL: 1 65536 0
386
389
387
390
REALLOCATIONS
388
391
389
392
DEALLOCATIONS
390
393
INTERNAL: 19998 0 558816
391
394
Overrides:
392
- @format_concat.lua:8, line 9
393
-
394
- @format_concat.lua:8, line 9: 2 0 98304
395
+ @format_concat.lua:9, line 10
396
+ @format_concat.lua:9, line 10: 2 0 98304
395
397
Overrides:
396
- @format_concat.lua:8 , line 9
398
+ @format_concat.lua:9 , line 10
397
399
398
400
The reasonable questions regarding the report can be:
399
401
@@ -402,53 +404,54 @@ The reasonable questions regarding the report can be:
402
404
* Why are there approximately 20K allocations instead of 10K?
403
405
404
406
First of all, LuaJIT doesn't create a new string if the string with the same
405
- payload exists. This is called the string interning. So, when the string is
407
+ payload exists (see details on `lua-users.org/wiki <http://lua-users.org/wiki/ImmutableObjects >`_).
408
+ This is called the `string interning <https://en.wikipedia.org/wiki/String_interning >`_.
409
+ So, when the string is
406
410
created via the ``format() `` function, there is no need to create the same
407
- string via the ``concat() `` function, and LuaJIT just use the previous one.
411
+ string via the ``concat() `` function, and LuaJIT just uses the previous one.
408
412
409
- That is also the reason why the amount of allocations is not the round numbber
413
+ That is also the reason why the amount of allocations is not the round number
410
414
as can be expected from the cycle operator ``for i = 1, 10000... ``:
411
415
Tarantool creates some
412
- strings for internal needs and built- in modules, so some strings already exist.
416
+ strings for internal needs and built‑ in modules, so some strings already exist.
413
417
414
418
But why are there so many allocations? It's almost twice as big as the expected
415
- amount. This is because the ``string.format() `` built- in function creates
419
+ amount. This is because the ``string.format() `` built‑ in function creates
416
420
another string necessary for the ``%s `` identifier, so there are two allocations
417
421
for each iteration: for ``tostring(i) `` and for ``string.format("%sa", string_i_value) ``.
418
422
You can see the difference in behaviour by adding the
419
- ``local _ = tostring(i) `` line between lines 21 and 22 .
423
+ ``local _ = tostring(i) `` line between lines 22 and 23 .
420
424
421
- To profile only the ``concat() `` function, comment the line 22 , namely,
425
+ To profile only the ``concat() `` function, comment line 23 , namely,
422
426
``local f = format(i) `` and run the profiler.
423
427
424
428
The profiler's output is the following:
425
429
426
430
.. code-block :: console
427
431
428
432
ALLOCATIONS
429
- @format_concat.lua:3 , line 4 : 10000 284411 0
433
+ @format_concat.lua:4 , line 5 : 10000 284411 0
430
434
431
435
REALLOCATIONS
432
436
433
437
DEALLOCATIONS
434
438
INTERNAL: 10000 0 218905
435
439
Overrides:
436
- @format_concat.lua:3, line 4
437
-
438
- @format_concat.lua:3, line 4: 1 0 32768
439
-
440
+ @format_concat.lua:4, line 5
441
+ @format_concat.lua:4, line 5: 1 0 32768
440
442
441
443
**Q **: But what will change if the JIT compilation is enabled?
442
444
443
- **A **: Let's comment the first line of the code, namely, ``jit.off() `` and run
445
+ **A **: In the :ref: `code <profiler_usage_example03 >`, comment line 2, namely,
446
+ ``jit.off() `` and run
444
447
the profiler . Now, there are only 56 allocations in the report, and all other
445
448
allocations are JIT-related (see also the related
446
449
`dev issue <https://github.com/tarantool/tarantool/issues/5679 >`_):
447
450
448
451
.. code-block :: console
449
452
450
453
ALLOCATIONS
451
- @format_concat.lua:3 , line 4 : 56 1112 0
454
+ @format_concat.lua:4 , line 5 : 56 1112 0
452
455
@format_concat.lua:0, line 0: 4 640 0
453
456
INTERNAL: 2 382 0
454
457
@@ -457,24 +460,24 @@ allocations are JIT-related (see also the related
457
460
DEALLOCATIONS
458
461
INTERNAL: 58 0 1164
459
462
Overrides:
460
- @format_concat.lua:3 , line 4
463
+ @format_concat.lua:4 , line 5
461
464
INTERNAL
462
465
463
466
This happens because a trace is compiled after 56 iterations, and the
464
467
JIT-compiler removed the unused ``c `` variable from the trace, and, therefore,
465
468
the dead code of the ``concat() `` function is eliminated.
466
469
467
470
Next, let's profile only the ``format() `` function with JIT enabled.
468
- For that, keep the lines 1 and 23 commented (``jit.off() `` and
469
- ``local c = concat(i) `` respectively), uncomment the line 22
471
+ For that, keep lines 2 and 24 commented (``jit.off() `` and
472
+ ``local c = concat(i) `` respectively), uncomment line 23
470
473
(``local f = format(i) ``), and run the profiler.
471
474
472
475
The profiler's output is the following:
473
476
474
477
.. code-block :: console
475
478
476
479
ALLOCATIONS
477
- @format_concat.lua:8 , line 9 : 19998 624322 0
480
+ @format_concat.lua:9 , line 10 : 19998 624322 0
478
481
INTERNAL: 4 66824 0
479
482
@format_concat.lua:0, line 0: 4 640 0
480
483
@@ -484,19 +487,19 @@ The profiler's output is the following:
484
487
INTERNAL: 19999 0 559072
485
488
Overrides:
486
489
@format_concat.lua:0, line 0
487
- @format_concat.lua:8, line 9
488
-
489
- @format_concat.lua:8, line 9: 2 0 98304
490
+ @format_concat.lua:9, line 10
491
+ @format_concat.lua:9, line 10: 2 0 98304
490
492
Overrides:
491
- @format_concat.lua:8 , line 9
493
+ @format_concat.lua:9 , line 10
492
494
493
495
**Q **: Why is there so many allocations in comparison to the ``concat() `` function?
494
496
495
497
**A **: The answer is simple: the ``string.format() `` function with the ``%s ``
496
498
identifier is not yet compiled via LuaJIT. So, a trace can't be recorded and
497
499
the compiler doesn't perform the corresponding optimizations.
498
500
499
- If we change the ``format() `` function in the following way
501
+ If we change the ``format() `` function in the :ref: `code chunk <profiler_usage_example03 >`
502
+ in the following way
500
503
501
504
.. code-block :: lua
502
505
@@ -510,7 +513,7 @@ the profiling report becomes much prettier:
510
513
.. code-block :: console
511
514
512
515
ALLOCATIONS
513
- @format_concat.lua:8 , line 9 : 110 2131 0
516
+ @format_concat.lua:9 , line 10 : 110 2131 0
514
517
@format_concat.lua:0, line 0: 4 640 0
515
518
INTERNAL: 3 1148 0
516
519
@@ -520,5 +523,5 @@ the profiling report becomes much prettier:
520
523
INTERNAL: 113 0 2469
521
524
Overrides:
522
525
@format_concat.lua:0, line 0
523
- @format_concat.lua:8 , line 9
526
+ @format_concat.lua:9 , line 10
524
527
INTERNAL
0 commit comments