2
2
LuaJIT memory profiler
3
3
======================
4
4
5
- Stating from version :doc: `2.7.1 </release/2.7.1 >`, Tarantool
6
- has the built-in module called ``memprof `` that implements a LuaJIT memory
7
- profiler and a profile parser. The profiler provides
5
+ Starting from version :doc: `2.7.1 </release/2.7.1 >`, Tarantool
6
+ has the built-in module called ``misc. memprof `` that implements the LuaJIT memory
7
+ profiler and the profile parser (further, * profiler *) . The profiler provides
8
8
a memory allocation report that helps analyse Lua code and find out the places
9
- that put the most pressure on the Lua garbage collector.
9
+ that put the most pressure on the Lua garbage collector (GC) .
10
10
11
- //?where to put this paragraph//
12
- Usually developers are not interested in information about allocations
13
- inside built-ins. So if a Lua built-in function is called from a Lua function,
14
- all allocations are attributed to this Lua function.
15
- Otherwise, this event is attributed to a C function.
16
-
17
- //?where to put this paragraph//
18
- Tail call optimization does not create a new call frame, so all allocations
19
- inside the function called via CALLT/CALLMT bytecodes are attributed to its caller.
11
+ .. contents ::
12
+ :local:
13
+ :depth: 2
20
14
21
15
.. _profiler_usage :
22
16
@@ -37,11 +31,11 @@ Collecting binary profile
37
31
~~~~~~~~~~~~~~~~~~~~~~~~~
38
32
39
33
To collect a binary profile for a particular part of the Lua code,
40
- you need to place this part between two ``memprof `` functions,
34
+ you need to place this part between two ``misc. memprof `` functions,
41
35
namely, ``misc.memprof.start() `` and ``misc.memprof.stop() ``, and then execute
42
36
the code under Tarantool.
43
37
44
- Below is a piece of simple Lua code named ``test.lua `` to illustrate this.
38
+ Below is a chunk of simple Lua code named ``test.lua `` to illustrate this.
45
39
46
40
.. _profiler_usage_example01 :
47
41
@@ -57,15 +51,28 @@ Below is a piece of simple Lua code named ``test.lua`` to illustrate this.
57
51
end
58
52
59
53
local t = {}
60
- for _ = 1, 1e5 do
54
+ for i = 1, 1e5 do
61
55
-- table.insert is the built-in function and all corresponding
62
- -- allocations are reported in the scope of main chunk.
56
+ -- allocations are reported in the scope of the main chunk.
63
57
table.insert(t,
64
- append('q', _ )
58
+ append('q', i )
65
59
)
66
60
end
67
61
local stp, err = misc.memprof.stop()
68
62
63
+ .. note ::
64
+
65
+ Usually, the information about allocations inside Lua built-ins are not really
66
+ useful for the developer. That's why if a Lua built-in function is called from
67
+ a Lua function, the profiler attributes all allocations to the Lua function.
68
+ Otherwise, this event is attributed to a C function.
69
+
70
+ Tail call optimization doesn't create a new call frame, so all allocations
71
+ inside the function called via the ``CALLT/CALLMT `` `bytecodes <http://wiki.luajit.org/Bytecode-2.0 >`_
72
+ are attributed to its caller.
73
+
74
+ Example above illustrates these cases.
75
+
69
76
Starting profiler in Lua code:
70
77
71
78
.. code-block :: lua
@@ -161,9 +168,8 @@ Tarantool generates a profiling report and closes the session.
161
168
162
169
.. note ::
163
170
164
- A report can look differently for the same piece of Lua code depending
165
- on the OS used. On MacOS, the report data
166
- //?can be influenced by the LuaJIT GC64 running//.
171
+ On MacOS , a report is different because Tarantool and LuaJIT are built
172
+ with the GC64 mode enabled for this OS.
167
173
168
174
Let's examine the report structure. A report has three sections:
169
175
@@ -181,39 +187,38 @@ An event record has the following format:
181
187
@<filename>:<function_line>, line <line_number>: <number_of_events> <allocated> <freed>
182
188
183
189
* <filename>—a name of the file containing Lua code.
184
- * <function_line>—a number of the line where the function generating the event
185
- is declared. Sometimes <function_line> is `` 0 ``. It means that
186
- the function generating the event is the //? main/entire code of //?file/script itself.
187
- This is exactly the case in the :ref: `example above <profiler_usage_example01 >`.
188
- Comments in the code explain why it happens for each of the functions .
189
- * <line_number>—a number of the line where the event is detected.
190
+ * <function_line>—the line number where the function generating the event
191
+ is declared. In some of the cases, allocations are attributed not to
192
+ the declared function but to the main chunk. In this case, the <function_line>
193
+ is set to `` 0 ``. See the :ref: `code chunk above <profiler_usage_example01> `
194
+ with the explanation in the comments for some examples .
195
+ * <line_number>—the line number where the event is detected.
190
196
* <number_of_events>—a number of events for this code line.
191
- * <allocated>—bytes allocated in memory during the //?event/ events.
192
- * <freed>—bytes freed in memory during //?event/ events.
197
+ * <allocated>—amount of memory allocated during all the events, bytes .
198
+ * <freed>—amount of memory freed during all the events, bytes .
193
199
194
- ``Overrides `` shows what allocation has been overridden.
200
+ The ``Overrides `` label shows what allocation has been overridden.
195
201
196
202
.. _profiler_usage_internal_jitoff :
197
203
198
- ``INTERNAL `` indicates that this event is caused by internal LuaJIT structures.
199
-
200
- //!the note below really needs to be reviewed thoroughly//
204
+ The ``INTERNAL `` label indicates that this event is caused by internal LuaJIT
205
+ structures.
201
206
202
207
.. note ::
203
208
204
209
Important note regarding the ``INTERNAL `` label and the recommendation
205
210
of switching the JIT compilation off (``jit.off() ``): this version of the
206
- profiler doesn't support verbose reporting for allocations //?on/for
211
+ profiler doesn't support verbose reporting for allocations on
207
212
`traces <https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation#Technical_details >`_.
208
- If some memory allocations are made //?on/for a trace,
213
+ If memory allocations are made on a trace,
209
214
the profiler can't associate the allocations with the part of Lua code
210
215
that generated the trace. In this case, the profiler labels such allocations
211
216
as ``INTERNAL ``.
212
217
213
218
So, if the JIT compilation is on,
214
219
new traces will be generated and there will be a mixture of events labeled
215
220
``INTERNAL `` in the profiling report : some of them are really caused by
216
- internal LuaJIT structures, but some of them are caused by allocations //?on/for
221
+ internal LuaJIT structures, but some of them are caused by allocations on
217
222
traces.
218
223
219
224
If you want to have more definite report without new trace allocations,
@@ -249,11 +254,13 @@ a Q&A format.
249
254
inside C code?
250
255
251
256
**Answer (A) **: The profiler reports only allocation events caused by the Lua
252
- allocation functions . All Lua-related allocations, like table or string creation
257
+ allocator . All Lua-related allocations, like table or string creation
253
258
are reported. But the profiler doesn't report allocations made by ``malloc() ``
254
259
or other non-Lua allocators. You can use ``valgrind `` to debug them.
255
260
256
- **Q **: Why is there so many ``INTERNAL `` allocations in my profiling report?
261
+ |
262
+
263
+ **Q **: Why are there so many ``INTERNAL `` allocations in my profiling report?
257
264
What does it mean?
258
265
259
266
**A **: ``INTERNAL `` means that these allocations/reallocations/deallocations are
@@ -262,20 +269,26 @@ Currently, the memory profiler doesn't report verbosely allocations of objects
262
269
that are made during trace execution. Try to :ref: `add jit.off() <profiler_usage_internal_jitoff >`
263
270
before profiler start.
264
271
272
+ |
273
+
265
274
**Q **: Why is there some reallocations/deallocations without the ``Overrides ``
266
275
section?
267
276
268
277
**A **: These objects can be created before the profiler starts. Adding
269
278
``collectgarbage() `` before the profiler's start enables to collect all
270
279
previously allocated objects that are dead when the profiler starts.
271
280
281
+ |
282
+
272
283
**Q **: Why some objects are not collected during profiling? Is it
273
284
a memory leak?
274
285
275
286
**A **: LuaJIT uses incremental Garbage Collector (GC). A GC cycle may not be
276
287
finished at the moment of the profiler's stop. Add ``collectgarbage() `` before
277
288
stopping the profiler to collect all the dead objects for sure.
278
289
290
+ |
291
+
279
292
**Q **: Can I profile not just a current chunk but the entire running application?
280
293
Can I start the profiler when the application is already running?
281
294
@@ -330,7 +343,7 @@ investigated with the help of the memory profiler reports.
330
343
.. code-block :: lua
331
344
:linenos:
332
345
333
- jit.off() -- More verbose reports .
346
+ jit.off() -- Prevent allocations on new traces .
334
347
335
348
local function concat(a)
336
349
local nstr = a.."a"
@@ -384,28 +397,29 @@ you will get the following profiling report:
384
397
385
398
The reasonable questions regarding the report can be:
386
399
387
- * Why are there no allocations related to the ``concat() `` function?
388
- * Why the amount of allocations is not a round number?
389
- * Why are there approximately 20K allocations instead of 10K?
400
+ * Why are there no allocations related to the ``concat() `` function?
401
+ * Why the amount of allocations is not a round number?
402
+ * Why are there approximately 20K allocations instead of 10K?
390
403
391
404
First of all, LuaJIT doesn't create a new string if the string with the same
392
- payload exists. It is called the string interning. So, when the string is
393
- created via
394
- the ``format() `` function, there is no need to create the same string via
395
- the ``concat() `` function, and LuaJIT just use the previous one.
405
+ payload exists. This is called the string interning. So, when the string is
406
+ created via the ``format() `` function, there is no need to create the same
407
+ string via the ``concat() `` function, and LuaJIT just use the previous one.
396
408
397
- This is the reason of //?unpretty amount of allocations: Tarantool creates some
409
+ That is also the reason why the amount of allocations is not the round numbber
410
+ as can be expected from the cycle operator ``for i = 1, 10000... ``:
411
+ Tarantool creates some
398
412
strings for internal needs and built-in modules, so some strings already exist.
399
413
400
414
But why are there so many allocations? It's almost twice as big as the expected
401
415
amount. This is because the ``string.format() `` built-in function creates
402
416
another string necessary for the ``%s `` identifier, so there are two allocations
403
417
for each iteration: for ``tostring(i) `` and for ``string.format("%sa", string_i_value) ``.
404
- You can see the difference in the behaviour by adding the
418
+ You can see the difference in behaviour by adding the
405
419
``local _ = tostring(i) `` line between lines 21 and 22.
406
420
407
- Let's comment the 22nd line, namely, `` local f = format(i) ``
408
- (by adding `` -- `` at the line start) to take a look at the `` concat() `` function .
421
+ To profile only the `` concat() `` function, comment the line 22, namely,
422
+ `` local f = format(i) `` and run the profiler .
409
423
410
424
The profiler's output is the following:
411
425
@@ -424,10 +438,10 @@ The profiler's output is the following:
424
438
@format_concat.lua:3, line 4: 1 0 32768
425
439
426
440
427
- **Q **: But what will change if JIT compilation is enabled?
441
+ **Q **: But what will change if the JIT compilation is enabled?
428
442
429
- **A **: Let's comment the first line of the code, namely, ``jit.off() `` to see what
430
- will happen . Now, there are only 56 allocations in the report, and all other
443
+ **A **: Let's comment the first line of the code, namely, ``jit.off() `` and run
444
+ the profiler . Now, there are only 56 allocations in the report, and all other
431
445
allocations are JIT-related (see also the related
432
446
`dev issue <https://github.com/tarantool/tarantool/issues/5679 >`_):
433
447
@@ -450,7 +464,11 @@ This happens because a trace is compiled after 56 iterations, and the
450
464
JIT-compiler removed the unused ``c `` variable from the trace, and, therefore,
451
465
the dead code of the ``concat() `` function is eliminated.
452
466
453
- Let's now profile only the ``format() `` function with JIT enabled.
467
+ Next, let's profile only the ``format() `` function with JIT enabled.
468
+ For that, keep the lines 1 and 23 commented (``jit.off() `` and
469
+ ``local c = concat(i) `` respectively), uncomment the line 22
470
+ (``local f = format(i) ``), and run the profiler.
471
+
454
472
The profiler's output is the following:
455
473
456
474
.. code-block :: console
0 commit comments