Speeding up v8 heap snapshots #702
---
title: 'Speeding up V8 heap snapshots'
description: 'This post about V8 heap snapshots presents some performance problems found by Bloomberg engineers, and how we fixed them to make JavaScript memory analysis faster than ever.'
author: 'Jose Dapena Paz'
date: 2023-06-23
tags:
  - memory
  - tools
---
*This blog post has been authored by José Dapena Paz (Igalia), with contributions from Jason Williams (Bloomberg), Ashley Claymore (Bloomberg), Rob Palmer (Bloomberg) and Joyee Cheung (Igalia).*
In this post about V8 heap snapshots, I will talk about some performance problems found by Bloomberg engineers, and how we fixed them to make JavaScript memory analysis faster than ever.
## The Problem
Ashley Claymore was working on diagnosing a memory leak in a JavaScript application that was failing with *Out-Of-Memory* errors. Even though the process had access to plenty of system memory, V8 places a hard limit on the amount of memory dedicated to the garbage-collected heap from which all JavaScript objects are allocated. For the tested application, this limit was configured to be around 1400MB. Normally V8's garbage collector should be able to keep heap usage under that limit, so the failures indicated that there was a leak.
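A quick way to observe the limit described above from inside a Node.js process is `v8.getHeapStatistics()`. The sketch below is an illustration rather than part of the original investigation; `heapUsage` is a hypothetical helper name, and `heap_size_limit` may not exactly match the value given to `--max-old-space-size`, because it can also account for other heap regions.

```javascript
const v8 = require('node:v8');

const MB = 1024 * 1024;

// Hypothetical helper: report how close the JavaScript heap is to the hard
// limit V8 enforces for this process.
function heapUsage() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return {
    usedMB: used_heap_size / MB,
    limitMB: heap_size_limit / MB,
    ratio: used_heap_size / heap_size_limit,
  };
}

const { usedMB, limitMB, ratio } = heapUsage();
console.log(`heap: ${usedMB.toFixed(1)} MB of ${limitMB.toFixed(1)} MB (${(ratio * 100).toFixed(1)}%)`);
```

Running it with, for example, `node --max-old-space-size=1400 heap-usage.js` should report a limit close to the configured value. A ratio that keeps climbing even after full garbage collections is the usual symptom of a leak.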
The standard way to debug a routine memory leak scenario like this is to capture a heap snapshot and then inspect the various summaries and object attributes in the DevTools "Memory" tab to find out what is consuming the most memory. In DevTools, you click the round button marked _"Take heap snapshot"_ to perform the capture. For Node.js applications, you can [trigger the snapshot](https://nodejs.org/en/docs/guides/diagnostics/memory/using-heap-snapshot) programmatically using this API:
```javascript
// Writes a .heapsnapshot file to the current working directory and returns its file name.
require('v8').writeHeapSnapshot();
```
The goal was to capture several snapshots at different points in the application's life, so that the DevTools Memory viewer could be used to show the difference between the heaps at different times. The problem was that capturing a single full-size (500MB) snapshot alone was taking **over 30 minutes**!
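To take several snapshots at different points of an application's life, the same API can be called with an explicit file name. The sketch below is an assumption about how one might script this, not the workflow used in the post: `captureSnapshot` is a hypothetical helper that names each file after a label and logs how long the capture took and how large the file is, which is exactly the cost that grew to tens of minutes here.

```javascript
const v8 = require('node:v8');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write a labelled heap snapshot and time the capture.
function captureSnapshot(label, dir = os.tmpdir()) {
  const target = path.join(dir, `${label}-${process.pid}.heapsnapshot`);
  const start = process.hrtime.bigint();
  // writeHeapSnapshot() blocks until the file is written and returns its name.
  const file = v8.writeHeapSnapshot(target);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  const bytes = fs.statSync(file).size;
  console.log(`${label}: ${file} (${bytes} bytes) in ${ms.toFixed(1)} ms`);
  return { file, bytes, ms };
}

// Capture before and after a phase of interest, then load both files in the
// DevTools Memory tab and use its Comparison view to diff them.
const before = captureSnapshot('before');
// ... run the part of the application suspected of leaking ...
const after = captureSnapshot('after');
```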
It was this slowness in the memory analysis workflow that we needed to solve.
## Narrowing down the Problem
Jason Williams started investigating the issue using some V8 parameters. As described in the previous post, V8 has some nice command line parameters that can help with that. These options were used to curate the heap snapshots, simplify the reproduction, and improve observability:
Suggested change:

> Jason Williams started investigating the issue using some V8 parameters. These options were used to create the heap snapshots, simplify the reproduction, and improve observability:
Other than --max-old-space-size, all the flags below are Node-specific. What do you recommend folks do that are trying to debug memory leaks not on Node?
Also what does "improve observability" mean?
Re-reading this section after reading the entire post, I'm unclear on the point of listing the flags in detail. It seems almost incidental: here are the flags that we were using to capture heap snapshots, and where we observed surprising slowdowns. Could this section be shortened? Perhaps I'm missing the intention though?
Initially the problem was detected just by extracting regular heap snapshots from DevTools.
But using these command line arguments allowed us to get a detailed breakdown of what was happening, and also to reproduce the heap snapshot performance problem repeatedly without requiring a remote DevTools connection.
So this is part of the investigation steps (increasing and simplifying reproducibility).
In general, the intent of the whole post is to explain not only the specific fixes, but also the whole investigation procedure that led to them.
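A minimal sketch of that kind of command-line reproduction, assuming a Node.js script that eventually runs out of memory (`leaky-app.js` below is a hypothetical name). It reuses the flags mentioned elsewhere in this thread and passes them as command-line arguments. `reproArgs` and `runRepro` are hypothetical helpers, and `--profile-heap-snapshot` may be rejected by older Node.js/V8 releases.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: build the argument list for one reproduction run.
function reproArgs(script, { maxOldSpaceMB = 100, nearLimitSnapshots = 10, profile = true } = {}) {
  const args = [
    // A small heap limit makes the out-of-memory condition arrive quickly.
    `--max-old-space-size=${maxOldSpaceMB}`,
    // Write up to this many heap snapshots as the heap approaches its limit.
    `--heapsnapshot-near-heap-limit=${nearLimitSnapshots}`,
  ];
  // V8 flag that logs where time goes during snapshot generation.
  if (profile) args.push('--profile-heap-snapshot');
  args.push(script);
  return args;
}

// Hypothetical helper: run the reproduction `runs` times, no DevTools needed,
// and record the exit status and wall-clock time of each run.
function runRepro(script, runs = 3, options = {}) {
  const results = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    const { status, signal } = spawnSync(process.execPath, reproArgs(script, options), {
      stdio: 'inherit',
    });
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    results.push({ run: i + 1, status, signal, seconds });
  }
  return results;
}
```

Calling `runRepro('leaky-app.js', 3)` would repeat the scenario three times; each run is expected to end in an out-of-memory crash after writing its snapshots, so the timings can be compared across runs.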
I'm not sure what this section adds to the article.
This section seems overly detailed to me. The upshot seems to be that there's a Windows-specific profiler that gave you good leads, which I think suffices. The meat of this is what you found out from those leads and how you optimized the code.
In this case I wanted to be explicit about the procedure used for the investigation, so other people investigating performance problems in V8 (and beyond) can benefit from it.
Even though the Windows Performance Toolkit documentation is not bad, the specific steps to investigate this V8 problem are not completely trivial. The actual solutions to the detected problems were not really complex, but identifying exactly where the overhead was took significantly more effort. And for that, the step-by-step walkthrough looks valuable to me.
I think the upside of this is that it teaches people how to address similar issues and would help future contributors. But one downside is that this is very Windows-specific. Maybe we can skip the instructions about the exact buttons to click, because the links to the Microsoft documentation are already more than enough for people who want to get their hands dirty. And we can focus on what to look for instead (so people who are using e.g. Linux perf can apply it to their tool as well), and perhaps touch on the V8/Node.js flags to use, for lack of better documentation on them. So maybe something like:

> To analyze the performance of heap snapshot generation, I followed [these steps](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/wpr-how-to-topics#start-a-recording) to run the Windows Performance Recorder with the failing script using `NODE_OPTIONS="--max-old-space-size=100 --heapsnapshot-near-heap-limit=10 --profile-heap-snapshot"`. I had to modify Node.js to accept `--profile-heap-snapshot` in `NODE_OPTIONS`, as it uses an allowlist to filter the V8 flags that can be configured through the environment variable.

To make this more succinct, we can also move the Narrowing the Problem section (or just the explanation about the flags) here: the first step was adding the flag, which gave us a rough picture of the time spent on snapshot generation. In terms of the readers' understanding, ETW does not come into the picture until we start the second step, so we do not have to explain the flags or introduce ETW/Windows Performance Toolkit before we finish talking about the first step. (The fact that we used ETW before we added the flag is less relevant background information for readers; we really only put it to good use in the second step.) It also seems that the Narrowing the Problem section leaves a lot of gaps that are not filled until we reach this section, and when we finally get here we have to repeat some information mentioned before, so we might as well just merge them here.
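Node.js exposes that allowlist at runtime as `process.allowedNodeEnvironmentFlags`, so a launcher can check which flags are safe to put in `NODE_OPTIONS` and pass the rest on the command line instead of patching Node.js. A minimal sketch; `splitFlags` is a hypothetical helper, and whether a particular V8 flag such as `--profile-heap-snapshot` is on the allowlist depends on the Node.js version.

```javascript
// Hypothetical helper: partition flags into those Node.js accepts in
// NODE_OPTIONS and those that must be passed as command-line arguments.
function splitFlags(flags) {
  const allowed = process.allowedNodeEnvironmentFlags;
  const env = [];
  const argv = [];
  for (const flag of flags) {
    // has() ignores everything from the first '=' onward, so values are fine.
    (allowed.has(flag) ? env : argv).push(flag);
  }
  return { nodeOptions: env.join(' '), argv };
}

const { nodeOptions, argv } = splitFlags([
  '--max-old-space-size=100',
  '--heapsnapshot-near-heap-limit=10',
  '--profile-heap-snapshot',
]);
console.log('NODE_OPTIONS:', nodeOptions);
console.log('command line:', argv.join(' '));
```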
I think it is still worth explaining how to use the specific Windows tooling. It is not trivial to take the right steps with it to get a recording that is informative enough.
The main issue I see with what I wrote here is that explaining how to use the tool for this specific problem is one thing, and writing a step-by-step manual on how to use WPR is another. This post should refer to the link you propose for the step-by-step guide, and only add hints about the specific configuration used.
I will rewrite this accordingly.
The gaps in "Narrowing the problem" are intentional, as they reflect the investigation process. We go from scattered facts to gathering more information (in this case first with logging, then with profiling) that allows us to resolve the problem. I would prefer to keep the flow like that.
I don't think that's a well-understood use of "flatmap", especially in a JS context where flatMap exists. I'd avoid giving it a name and just describe the representation.
What did I do to fix it? As the problem comes mostly from numbers represented as strings that would fall in consecutive positions, I modified the hash function so we would rotate the resulting hash value 2 positions to the left. So, for each number, we would introduce 3 free positions. Why 2? Empirical testing across several work-sets showed this number was the best choice to minimize collisions.

Suggested change:

> What did I do to fix it? As the problem comes mostly from numbers represented as strings that would fall in consecutive positions, I modified the hash function so we would rotate the resulting hash value 2 positions to the left. So each pair of consecutive numbers would be placed with 3 free positions in between. Why 2? Empirical testing across several work-sets showed this number was the best choice to minimize collisions.

(It took me a while to realize that it was talking about ((n + 1) << 2) - (n << 2) = 4, which leaves 3 free positions between consecutive numbers.)
Could you perhaps add code snippets demonstrating the improved hash directly? V8 blog audience should be used to bitwise manipulation code.
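A minimal sketch of the idea in JavaScript, under simplifying assumptions: this is not V8's C++ code. The hypothetical `baseHash` treats a numeric string's hash as its integer value (a stand-in for consecutive numbers getting consecutive hashes) and uses 32-bit FNV-1a for other strings. The table is a plain open-addressing table with linear probing, and `rotl32` and `countProbes` are also hypothetical names.

```javascript
// Hypothetical hash: numeric strings hash to their value, others use FNV-1a.
function baseHash(key) {
  if (/^(0|[1-9]\d{0,8})$/.test(key)) return Number(key) >>> 0;
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV-1a 32-bit prime
  }
  return h;
}

// Rotate a 32-bit unsigned value left by `bits` (1..31) positions.
function rotl32(h, bits) {
  return ((h << bits) | (h >>> (32 - bits))) >>> 0;
}

// Insert keys into a linear-probing table and count the extra probes caused
// by collisions. `capacity` must be a power of two larger than keys.length.
function countProbes(keys, capacity, rotateBits) {
  if (keys.length >= capacity) throw new Error('table too small');
  const slots = new Array(capacity).fill(null);
  const mask = capacity - 1;
  let probes = 0;
  for (const key of keys) {
    const h = rotateBits ? rotl32(baseHash(key), rotateBits) : baseHash(key);
    let i = h & mask;
    while (slots[i] !== null) {
      probes++;
      i = (i + 1) & mask; // linear probing: try the next slot
    }
    slots[i] = key;
  }
  return probes;
}

const keys = [];
for (let n = 0; n < 200; n++) keys.push(String(n)); // "0", "1", ..., "199"
for (let n = 0; n < 200; n++) keys.push(`key${n}`);

console.log('extra probes, plain hash:  ', countProbes(keys, 1024, 0));
console.log('extra probes, rotated by 2:', countProbes(keys, 1024, 2));
```

Without rotation, the numeric strings fill one contiguous run of slots, so any other string hashing into that run must probe to its end. With rotation, consecutive numbers land 4 slots apart, leaving gaps that other strings can use. A rotation rather than a plain shift keeps every bit of the original hash, so distinct hashes stay distinct.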