YJIT: Profile and report memory used by YJIT Rust #484

@k0kubun

Current memory usage

On railsbench, with an M1 Mac and a stats build of ruby@0949cd7, yjit_alloc_size is 9.0MB (code_region_size is 6.9MB).
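A figure like yjit_alloc_size is usually collected by wrapping the global allocator and counting bytes. The sketch below shows that general technique in Rust; CountingAlloc and rust_heap_bytes are hypothetical names, and this is not necessarily how YJIT's stats build computes yjit_alloc_size.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper: forwards to System and tracks live heap bytes.
pub struct CountingAlloc {
    live: AtomicUsize,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        CountingAlloc { live: AtomicUsize::new(0) }
    }

    /// Bytes currently allocated through this allocator and not yet freed.
    pub fn live_bytes(&self) -> usize {
        self.live.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.live.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Ordering::Relaxed);
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
        // On failure the old block is left untouched, so only adjust the count on success.
        if !new_ptr.is_null() {
            self.live.fetch_sub(layout.size(), Ordering::Relaxed);
            self.live.fetch_add(new_size, Ordering::Relaxed);
        }
        new_ptr
    }
}

// Registering it routes every Rust heap allocation in the binary through the counter.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc::new();

/// Rough analogue of a "Rust heap bytes in use" statistic.
pub fn rust_heap_bytes() -> usize {
    GLOBAL.live_bytes()
}
```

Code memory (code_region_size) is mapped separately rather than allocated through the Rust allocator, so a counter like this only sees the Rust heap.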

Heap profiling result

Using the yjit-dhat branch, the profiling result was:

  • 3.1MB (33.9%): Block by gen_single_block
  • 2.2MB (23.9%): Branch by make_branch_entry
  • 1.9MB (21.5%): Vec by get_or_create_version_list, etc.
  • 0.7MB (7.6%): BranchStub by set_branch_target
  • 0.5MB (5.8%): HashMap by assume_method_lookup_stable, etc.
  • 0.2MB (2.3%): IseqPayload by get_or_create_iseq_payload
  • 0.2MB (2.1%): BranchTarget by set_branch_target

Struct counts

Using the yjit-mem-stats branch, the breakdown of structs was:

# of IseqPayload: 2190
# of Block: 15296
# of Branch: 26968
# of BranchTarget: 28161
# of BranchTarget::Stub: 11836
# of BranchTarget::Block: 16325

Sizes (in bytes):
size of IseqPayload: 96
size of Block: 176
size of Branch: 56
size of BranchTarget: 16
size of BranchStub: 58
size of Context: 38

Note that these counts might miss some structs that are still on the heap but no longer reachable, which would also mean those structs are currently leaking.
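As a rough cross-check, multiplying each count by its size gives the shallow footprint of each struct type. This sketch assumes one BranchStub per BranchTarget::Stub, and it excludes anything the structs point to (Vecs, boxes) as well as allocator overhead, so it is a lower bound rather than a reproduction of the dhat numbers.

```rust
/// (struct name, count, size in bytes), taken from the numbers above.
/// Assumption: one BranchStub per BranchTarget::Stub.
pub const STRUCTS: [(&str, usize, usize); 5] = [
    ("IseqPayload", 2190, 96),
    ("Block", 15296, 176),
    ("Branch", 26968, 56),
    ("BranchTarget", 28161, 16),
    ("BranchStub", 11836, 58),
];

/// Bytes taken by the structs themselves, ignoring heap data they own.
pub fn shallow_bytes(count: usize, size: usize) -> usize {
    count * size
}

/// Sum of shallow bytes over all listed struct types.
pub fn total_shallow_bytes() -> usize {
    STRUCTS
        .iter()
        .map(|&(_, count, size)| shallow_bytes(count, size))
        .sum()
}
```

With these numbers Block comes to 15296 × 176 = 2,692,096 bytes and the total to 5,549,608 bytes (about 5.5MB of the 9.0MB). BranchStub (686,488 bytes) and IseqPayload (210,240 bytes) roughly line up with the 0.7MB and 0.2MB reported by dhat.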

Known ideas

The following ideas are ordered by estimated impact (most impactful first).

1. Eliminate extra blocks/branches created by defer_compilation

Related: #462

defer_count was 6883. If we simply replaced the previous block with a combined block, we could eliminate 6883 Blocks and 6883 Branches, which would save more memory than any other idea.
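The arithmetic behind that claim can be sketched with the struct sizes listed earlier. It counts only the shallow size of each removed Block (176 bytes) and Branch (56 bytes), assuming exactly one of each per deferral; heap data owned by those structs would only add to the savings.

```rust
/// Shallow sizes in bytes, from the struct-size list above.
pub const BLOCK_SIZE: usize = 176;
pub const BRANCH_SIZE: usize = 56;

/// Estimated bytes saved if each deferred compilation stopped creating
/// one extra Block and one extra Branch.
pub fn defer_savings(defer_count: usize) -> usize {
    defer_count * (BLOCK_SIZE + BRANCH_SIZE)
}
```

For defer_count = 6883 this gives 6883 × 232 = 1,596,856 bytes, roughly 1.6MB.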

2. Hash consing for Context

This idea was proposed by Maxime.

Context could be represented as a chain of "cons" cells, where each cons is a pair of a diff and a pointer to the previous Context, rooted at the default Context. If we replace Context (38 bytes) in Block and BranchStub with a Box of cons (8 bytes), it would save 30 bytes in each of the 15296 blocks and 11836 branch stubs, i.e. a reduction of at most 0.81MB (9% of the total).
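A minimal sketch of hash-consed contexts, assuming Rc as the shared pointer (a plain Box cannot be shared between blocks) and a hypothetical CtxDiff enum in place of YJIT's real Context fields. The hypothetical ContextTable interns each (diff, parent) pair, so two blocks that reach the same context through the same diffs hold the same 8-byte pointer.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical change applied on top of a previous context.
/// (YJIT's real Context tracks stack size, local/temp types, temp mapping, etc.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum CtxDiff {
    /// Marks the shared default context at the root of every chain.
    Root,
    /// Push a stack temp with the given (hypothetical) type tag.
    PushStack(u8),
    /// Pop one stack temp.
    PopStack,
    /// Record a new (hypothetical) type tag for a local variable.
    SetLocal { idx: u8, ty: u8 },
}

/// One cons cell: a diff plus a shared pointer to the previous context.
#[derive(Debug)]
pub struct Cons {
    pub diff: CtxDiff,
    pub prev: Option<Rc<Cons>>,
}

/// Hash-consing table: equal (diff, prev) pairs always map to the same cell.
pub struct ContextTable {
    cells: HashMap<(CtxDiff, *const Cons), Rc<Cons>>,
    root: Rc<Cons>,
}

impl ContextTable {
    pub fn new() -> Self {
        let root = Rc::new(Cons { diff: CtxDiff::Root, prev: None });
        ContextTable { cells: HashMap::new(), root }
    }

    /// The default context that every chain starts from.
    pub fn root(&self) -> Rc<Cons> {
        Rc::clone(&self.root)
    }

    /// Returns the unique cell for `prev` + `diff`, allocating only on first use.
    /// Keying on the parent's address is sound: each cell holds an Rc to its
    /// parent and the table holds every cell, so a key's address is never reused.
    pub fn extend(&mut self, prev: &Rc<Cons>, diff: CtxDiff) -> Rc<Cons> {
        let key = (diff, Rc::as_ptr(prev));
        let cell = self
            .cells
            .entry(key)
            .or_insert_with(|| Rc::new(Cons { diff, prev: Some(Rc::clone(prev)) }));
        Rc::clone(cell)
    }

    /// Number of distinct non-root cells allocated so far.
    pub fn len(&self) -> usize {
        self.cells.len()
    }
}
```

The interning table and each Rc's reference counts cost memory of their own, which is the same trade-off described in the Background below.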

Background

I already tried deduplicating contexts, but the hash table used for deduplication ended up being large, even with shrink_to_fit. The reduction was too small to justify the added complexity.

However, thanks to that patch, I was able to easily collect the distribution of duplicated contexts: https://gist.github.com/k0kubun/c1a5854e011db90d7e6efd73a6eda3b3. Many duplicated contexts share the same local_types, temp_types, and temp_mapping, so we could save those 32 bytes if we optimized the data structure for that case.

3. Replace CodePtr with write_pos

If we limit --yjit-exec-mem-size to at most 4GB, Option<CodePtr> could be changed to Option<NonZeroU32> by storing a write_pos instead of a pointer. (Well, write_pos would then need to be interpreted as a 1-origin value 😅)
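A minimal sketch of that encoding, assuming a single contiguous code region whose start address is known. CodePos is a hypothetical name; storing write_pos + 1 keeps 0 unused, so Option<CodePos> stays 4 bytes thanks to NonZeroU32's niche.

```rust
use std::num::NonZeroU32;

/// Hypothetical compact code position: stores write_pos + 1 (1-origin),
/// so the value 0 is never used and Option<CodePos> needs no extra tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CodePos(NonZeroU32);

impl CodePos {
    /// Builds a position from a 0-origin write_pos.
    /// Returns None if write_pos + 1 does not fit in a u32 (i.e. beyond ~4GB).
    pub fn from_write_pos(write_pos: usize) -> Option<CodePos> {
        if write_pos >= u32::MAX as usize {
            return None;
        }
        NonZeroU32::new(write_pos as u32 + 1).map(CodePos)
    }

    /// Recovers the 0-origin write_pos.
    pub fn write_pos(self) -> usize {
        (self.0.get() - 1) as usize
    }

    /// Turns the position back into an address inside the code region.
    pub fn to_ptr(self, region_start: *const u8) -> *const u8 {
        region_start.wrapping_add(self.write_pos())
    }
}
```

Because of the +1 shift, the largest representable write_pos is u32::MAX - 1, i.e. one byte short of 4GB.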

4. Combine targets and shape in Branch

In Branch, we could combine targets and shape into a single enum, where each variant holds either a Box<[BranchTarget; 2]> or a Box<BranchTarget>. It's not a big reduction, but it would still save 8 bytes in every case.
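A sketch of the combined enum, using a 16-byte stand-in BranchTarget and hypothetical variant names that assume a one-target branch is either Next0 or Default while a two-target branch can take any of the three shapes. On a 64-bit host the enum is 16 bytes (tag plus one pointer), whereas two optional boxed targets plus a one-byte shape take 24 bytes inside a struct, which accounts for the 8 bytes in this sketch.

```rust
/// Stand-in for YJIT's BranchTarget (16 bytes); the real type differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BranchTarget {
    pub a: u64,
    pub b: u64,
}

/// Which target (if any) is placed right after the branch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BranchShape {
    Next0,
    Next1,
    Default,
}

/// Current-style layout: two optional boxed targets plus a separate shape field.
pub struct SeparateFields {
    pub targets: [Option<Box<BranchTarget>>; 2],
    pub shape: BranchShape,
}

/// Combined layout: the variant encodes both the shape and the number of targets.
pub enum BranchTargets {
    OneNext0(Box<BranchTarget>),
    OneDefault(Box<BranchTarget>),
    TwoNext0(Box<[BranchTarget; 2]>),
    TwoNext1(Box<[BranchTarget; 2]>),
    TwoDefault(Box<[BranchTarget; 2]>),
}

impl BranchTargets {
    /// Recovers the shape that used to be stored separately.
    pub fn shape(&self) -> BranchShape {
        match self {
            BranchTargets::OneNext0(_) | BranchTargets::TwoNext0(_) => BranchShape::Next0,
            BranchTargets::TwoNext1(_) => BranchShape::Next1,
            BranchTargets::OneDefault(_) | BranchTargets::TwoDefault(_) => BranchShape::Default,
        }
    }

    /// Returns the target at `idx` (0 or 1), if present.
    pub fn target(&self, idx: usize) -> Option<&BranchTarget> {
        match self {
            BranchTargets::OneNext0(t) | BranchTargets::OneDefault(t) => {
                if idx == 0 { Some(&**t) } else { None }
            }
            BranchTargets::TwoNext0(ts)
            | BranchTargets::TwoNext1(ts)
            | BranchTargets::TwoDefault(ts) => ts.get(idx),
        }
    }
}
```

A side effect of this layout is that a two-target branch makes one heap allocation instead of two.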

Notes

For each Vec that is no longer modified, we might be able to save some space by dropping its capacity, e.g. using into_boxed_slice.
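A small illustration of the idea: on a 64-bit host a Vec<T> header is 24 bytes (pointer, capacity, length) while a Box<[T]> is 16, and into_boxed_slice also releases any unused capacity. FrozenOffsets and its field are hypothetical names used only for this example.

```rust
/// Hypothetical read-only container built once a Vec is no longer modified.
pub struct FrozenOffsets {
    offsets: Box<[u32]>,
}

impl FrozenOffsets {
    /// Consumes the Vec; into_boxed_slice shrinks the allocation to exactly len elements.
    pub fn freeze(offsets: Vec<u32>) -> Self {
        FrozenOffsets { offsets: offsets.into_boxed_slice() }
    }

    /// Read-only view of the stored offsets.
    pub fn as_slice(&self) -> &[u32] {
        &self.offsets
    }
}
```

The trade-off is that pushing to a frozen container again would require converting back with into_vec and reallocating.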


cc: @maximecb @XrXr
