Description
Current memory usage
On railsbench, with an M1 Mac and a stats build of ruby@0949cd7, yjit_alloc_size is 9.0MB (code_region_size is 6.9MB).
Heap profiling result
Using the yjit-dhat branch, the profiling result was:
- 3.1MB (33.9%): Block by gen_single_block
- 2.2MB (23.9%): Branch by make_branch_entry
- 1.9MB (21.5%): Vec by get_or_create_version_list, etc.
- 0.7MB (7.6%): BranchStub by set_branch_target
- 0.5MB (5.8%): HashMap by assume_method_lookup_stable, etc.
- 0.2MB (2.3%): IseqPayload by get_or_create_iseq_payload
- 0.2MB (2.1%): BranchTarget by set_branch_target
Struct counts
Using the yjit-mem-stats branch, the breakdown of structs was:
# of IseqPayload: 2190
# of Block: 15296
# of Branch: 26968
# of BranchTarget: 28161
# of BranchTarget::Stub: 11836
# of BranchTarget::Block: 16325
size of IseqPayload: 96
size of Block: 176
size of Branch: 56
size of BranchTarget: 16
size of BranchStub: 58
size of Context: 38
Note that these counts might be missing some unreachable structs on the heap. If so, that could also mean those structs are currently leaking.
Known ideas
The following ideas are ordered based on the estimated impact (the earlier, the more impactful).
1. Eliminate extra blocks/branches created by defer_compilation
Related: #462
defer_count was 6883. If we just replace the previous block with a combined block, we could reduce Blocks and Branches by 6883, which would save more memory than any other idea.
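As a rough back-of-envelope check on that claim, the sketch below multiplies defer_count by the inline struct sizes from the listing above. It assumes each deferral costs exactly one extra Block and one extra Branch, and it ignores any heap data those structs own, so it is only a lower-bound estimate. The function name is made up for illustration.

```rust
// Inline struct sizes in bytes, copied from the "Struct counts" listing.
const BLOCK_SIZE: usize = 176;
const BRANCH_SIZE: usize = 56;

/// Lower-bound estimate of the bytes saved if each deferral stops producing
/// one extra Block and one extra Branch (hypothetical helper, not YJIT code).
/// Heap data owned by those structs (Vecs, stubs) is not counted.
fn deferred_savings_bytes(defer_count: usize) -> usize {
    defer_count * (BLOCK_SIZE + BRANCH_SIZE)
}
```

For defer_count = 6883 this gives 1,596,856 bytes, roughly 1.6MB.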
2. Hash consing for Context
The idea was proposed by Maxime.
Context could be a series of "cons" cells, where each cons is a pair of a diff and a pointer to the previous Context, with the chain rooted at the default Context. If you replace Context (38 bytes) in Block and BranchStub with a Box of cons (8 bytes), it would save 30 bytes in each of the 15296 blocks and 11836 branch stubs, which could give a 0.81MB reduction at most (9% of the total).
Background
I already tried deduplicating contexts, but the hash table used for deduplication ended up being large, even with shrink_to_fit. The reduction was too limited despite the complexity.
However, thanks to the patch, I was able to easily collect the distribution of duplicated contexts https://gist.github.com/k0kubun/c1a5854e011db90d7e6efd73a6eda3b3. Many duplicated contexts share the same local_types, temp_types, and temp_mapping, so we could save those 32 bytes if we optimize the data structure for it.
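A minimal sketch of the cons representation, under several assumptions: the CtxDiff variants, the names CtxNode, Ctx, extend, and stack_size are all hypothetical, and Rc is used instead of Box so that several contexts can share the same tail. It only shows the chain of diffs. A full hash-consing scheme would also intern equal nodes in a table, which is the part that turned out to be large in the earlier experiment.

```rust
use std::rc::Rc;

/// Hypothetical single change relative to the previous context.
#[derive(Debug)]
enum CtxDiff {
    StackSize(u8),
    LocalType { idx: u8, ty: u8 },
}

/// A context is either the default (root) context or a diff applied on top
/// of a previous context.
#[derive(Debug)]
enum CtxNode {
    Root,
    Cons { diff: CtxDiff, prev: Ctx },
}

/// Pointer-sized handle that Block and BranchStub would store
/// instead of an inline Context.
type Ctx = Rc<CtxNode>;

/// Build a new context by consing one diff onto an existing one.
fn extend(prev: &Ctx, diff: CtxDiff) -> Ctx {
    Rc::new(CtxNode::Cons { diff, prev: Rc::clone(prev) })
}

/// Resolve one property by walking towards the root: the most recent
/// StackSize diff wins, and the root context is assumed to have size 0.
fn stack_size(ctx: &Ctx) -> u8 {
    let mut node = ctx;
    loop {
        match &**node {
            CtxNode::Root => return 0,
            CtxNode::Cons { diff: CtxDiff::StackSize(n), .. } => return *n,
            CtxNode::Cons { prev, .. } => node = prev,
        }
    }
}
```

The trade-off in this shape is that reading a property walks the chain, so lookups cost more as the chain of diffs grows.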
3. Replace CodePtr with write_pos
If we allow only up to 4GB in --yjit-exec-mem-size, Option<CodePtr> could be changed to Option<NonZeroU32> by storing a write_pos instead of a pointer. (well, write_pos needs to be interpreted as a 1-origin value then 😅)
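The 1-origin encoding could be sketched as follows. CodeOffset and its methods are hypothetical names, not existing YJIT types. The point is that NonZeroU32 leaves zero free for None, so the Option costs no extra space.

```rust
use std::num::NonZeroU32;

/// Hypothetical compact code address: write_pos + 1, so zero is never used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodeOffset(NonZeroU32);

impl CodeOffset {
    /// Encode a 0-origin write_pos. Returns None when write_pos + 1
    /// would not fit in a u32.
    fn from_write_pos(write_pos: usize) -> Option<Self> {
        if write_pos >= u32::MAX as usize {
            return None;
        }
        NonZeroU32::new(write_pos as u32 + 1).map(CodeOffset)
    }

    /// Decode back to the 0-origin write_pos.
    fn write_pos(self) -> usize {
        (self.0.get() - 1) as usize
    }

    /// Recover a raw address from the base of the code region.
    fn to_ptr(self, region_base: *const u8) -> *const u8 {
        region_base.wrapping_add(self.write_pos())
    }
}
```

With this layout, Option<CodeOffset> is 4 bytes, and the pointer is recomputed from the region base whenever it is needed.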
4. Combine targets and shape in Branch
In Branch, we could combine targets and shape into a single enum. Each variant should have either Box<[BranchTarget; 2]> or Box<BranchTarget>. Not a lot of reduction, but it’d still save 8 bytes in every case.
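One possible shape for such an enum is sketched below. The BranchTarget fields are placeholders, and which shape carries one target versus two is an assumption made purely for illustration. The real mapping would have to follow how YJIT actually uses each shape.

```rust
/// Stand-in for YJIT's BranchTarget; the fields are placeholders.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BranchTarget {
    addr: usize,
    ctx_id: usize,
}

/// Hypothetical merge of targets and shape: the variant is the shape,
/// and its payload holds the boxed target(s).
#[derive(Debug)]
enum BranchTargets {
    Next0(Box<BranchTarget>),
    Next1(Box<BranchTarget>),
    Default(Box<[BranchTarget; 2]>),
}

impl BranchTargets {
    /// Uniform view over one or two targets.
    fn targets(&self) -> &[BranchTarget] {
        match self {
            BranchTargets::Next0(t) | BranchTargets::Next1(t) => {
                std::slice::from_ref(&**t)
            }
            BranchTargets::Default(ts) => &ts[..],
        }
    }
}
```

Each variant holds a single pointer, so the whole value is a tag plus one pointer rather than two optional pointers plus a separate shape field.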
Notes
For each Vec, we might be able to save some space by eliminating the capacity if it's no longer modified, e.g. using into_boxed_slice.
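A small sketch of that conversion, with freeze as a hypothetical helper name. Vec::into_boxed_slice drops any excess capacity, and the resulting Box<[T]> handle stores only a pointer and a length, with no capacity field.

```rust
/// Convert a Vec that will no longer grow into a boxed slice.
/// Excess capacity is released, and the handle no longer carries
/// a capacity field.
fn freeze<T>(v: Vec<T>) -> Box<[T]> {
    v.into_boxed_slice()
}
```

The boxed slice can still be indexed and iterated like a Vec, but it cannot be pushed to, so this only fits lists that are final once built.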