YJIT: Profile and report memory used by YJIT Rust

## Current memory usage
On railsbench, on an M1 Mac with a stats build of https://github.com/ruby/ruby/commit/0949cd7107cf19bd7c93b06c4fd0250670719156, `yjit_alloc_size` is 9.0MB (`code_region_size` is 6.9MB).

## Heap profiling result
Using the [yjit-dhat](https://github.com/Shopify/ruby/tree/yjit-dhat) branch, the profiling result was:

* 3.1MB (33.9%): `Block` by `gen_single_block`
* 2.2MB (23.9%): `Branch` by `make_branch_entry`
* 1.9MB (21.5%): `Vec` by `get_or_create_version_list`, etc.
* 0.7MB (7.6%): `BranchStub` by `set_branch_target`
* 0.5MB (5.8%): `HashMap` by `assume_method_lookup_stable`, etc.
* 0.2MB (2.3%): `IseqPayload` by `get_or_create_iseq_payload`
* 0.2MB (2.1%): `BranchTarget` by `set_branch_target`

## Struct counts
Using the [yjit-mem-stats](https://github.com/Shopify/ruby/tree/yjit-mem-stats) branch, the breakdown of structs was:

```
# of IseqPayload: 2190
# of Block: 15296
# of Branch: 26968
# of BranchTarget: 28161
# of BranchTarget::Stub: 11836
# of BranchTarget::Block: 16325

size of IseqPayload: 96
size of Block: 176
size of Branch: 56
size of BranchTarget: 16
size of BranchStub: 58
size of Context: 38
```

Note that the `# of` counts might miss structs that are still on the heap but no longer reachable. If such structs exist, it would also mean they are currently leaking.
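As a rough cross-check, the counts and sizes above can be multiplied to estimate the inline bytes per struct kind. This is only a sketch: it ignores heap data owned by each struct (such as `Vec` buffers), and since the output has no separate `BranchStub` count, it assumes one `BranchStub` per `BranchTarget::Stub`. `inline_bytes` is a hypothetical helper, not part of YJIT.

```rust
// Counts and sizes copied from the yjit-mem-stats output above.
// Assumption: one BranchStub exists per BranchTarget::Stub (11836).
const STRUCTS: [(&str, usize, usize); 5] = [
    ("IseqPayload", 2190, 96),
    ("Block", 15296, 176),
    ("Branch", 26968, 56),
    ("BranchTarget", 28161, 16),
    ("BranchStub", 11836, 58),
];

/// Inline bytes for one struct kind: count * size.
/// Heap data owned by the struct is not included.
fn inline_bytes(count: usize, size: usize) -> usize {
    count * size
}

fn main() {
    for (name, count, size) in STRUCTS {
        let bytes = inline_bytes(count, size);
        // MB here means 10^6 bytes.
        println!("{name}: {bytes} bytes ({:.2} MB)", bytes as f64 / 1e6);
    }
}
```

Under that assumption, `BranchStub` comes to about 0.69MB and `IseqPayload` to about 0.21MB, in line with the 0.7MB and 0.2MB reported by the heap profile.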

## Known ideas
The following ideas are ordered by estimated impact, most impactful first.

### 1. Eliminate extra blocks/branches created by defer_compilation
Related: https://github.com/Shopify/ruby/issues/462

`defer_count` was 6883. If we just replace the previous block with a combined block, we could reduce Blocks and Branches by 6883, which would save more memory than any other idea.
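A back-of-the-envelope estimate of that saving, assuming each deferral costs exactly one extra `Block` and one extra `Branch`, and counting only the struct sizes listed under "Struct counts" (heap data they own is ignored). `defer_savings_bytes` is a hypothetical helper used only for this calculation.

```rust
const DEFER_COUNT: usize = 6883;
const SIZE_OF_BLOCK: usize = 176; // from "size of Block"
const SIZE_OF_BRANCH: usize = 56; // from "size of Branch"

/// Inline bytes saved if every deferral drops one Block and one Branch.
/// Assumption: exactly one of each per deferral.
fn defer_savings_bytes(defer_count: usize) -> usize {
    defer_count * (SIZE_OF_BLOCK + SIZE_OF_BRANCH)
}

fn main() {
    let bytes = defer_savings_bytes(DEFER_COUNT);
    // MB here means 10^6 bytes.
    println!("{bytes} bytes ({:.2} MB)", bytes as f64 / 1e6);
}
```

That is roughly 1.6MB of inline struct data under these assumptions, before counting anything those structs own on the heap.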

### 2. Hash consing for Context
The idea was proposed by Maxime.

`Context` could be a chain of "cons" cells where each cons is a pair of a diff and a pointer to the previous Context, rooted at the default Context. If you replace `Context` (38 bytes) in `Block` and `BranchStub` with a Box of cons (8 bytes), it would save 30 bytes in each of the 15296 blocks and 11836 branch stubs, which could give a reduction of at most 0.81MB (9% of the 9.0MB total).
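A minimal sketch of what such a cons chain could look like. Everything here is hypothetical: `CtxDiff`, `CtxCons`, `FlatContext`, `extend`, and `materialize` are invented names, the two diff variants are placeholders for the real `Context` fields, and `Rc` is used in place of `Box` so that several children can share one parent. The sketch also leaves out any interning table, and the reference counts that `Rc` stores on the heap would eat into the savings.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical diff against the parent context. The real `Context` tracks
/// more state; these two variants exist only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CtxDiff {
    SetStackSize(u8),
    SetSpOffset(i8),
}

/// One cons cell: either the default context or a diff on top of a parent.
enum CtxCons {
    Default,
    Cons { diff: CtxDiff, prev: Ctx },
}

/// Pointer-sized handle that a `Block` or `BranchStub` would hold.
/// Assumption: `Rc` instead of `Box`, so children can share a parent.
type Ctx = Rc<CtxCons>;

/// Flattened view, rebuilt on demand by walking the chain.
#[derive(Debug, Default, PartialEq)]
struct FlatContext {
    stack_size: u8,
    sp_offset: i8,
}

fn extend(prev: &Ctx, diff: CtxDiff) -> Ctx {
    Rc::new(CtxCons::Cons { diff, prev: Rc::clone(prev) })
}

fn materialize(ctx: &Ctx) -> FlatContext {
    // Collect diffs from newest to oldest, then apply them oldest first.
    let mut diffs = Vec::new();
    let mut cur = ctx;
    while let CtxCons::Cons { diff, prev } = &**cur {
        diffs.push(*diff);
        cur = prev;
    }
    let mut flat = FlatContext::default();
    for diff in diffs.into_iter().rev() {
        match diff {
            CtxDiff::SetStackSize(n) => flat.stack_size = n,
            CtxDiff::SetSpOffset(n) => flat.sp_offset = n,
        }
    }
    flat
}

/// Upper bound from the text: (38 - 8) bytes per Block and per BranchStub.
fn max_savings_bytes() -> usize {
    (38 - 8) * (15296 + 11836)
}

fn main() {
    let root: Ctx = Rc::new(CtxCons::Default);
    let a = extend(&root, CtxDiff::SetStackSize(2));
    let b = extend(&a, CtxDiff::SetSpOffset(-1)); // shares `a` as its parent
    println!("{:?}", materialize(&b));
    println!("handle: {} bytes", size_of::<Ctx>());
    println!("max savings: {} bytes", max_savings_bytes());
}
```

The trade-off in this shape is that reading a context means walking the chain, so the saving in `Block` and `BranchStub` is paid for with extra work at lookup time.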

#### Background
I already [tried deduplicating contexts](https://github.com/Shopify/ruby/commit/951a403763c0b50766907adb96d7bfbd9252a816), but the hash table used for deduplication ended up being large, even with `shrink_to_fit`. The reduction was too limited despite the complexity.

However, thanks to the patch, I was able to easily collect the distribution of duplicated contexts https://gist.github.com/k0kubun/c1a5854e011db90d7e6efd73a6eda3b3. Many duplicated contexts share the same `local_types`, `temp_types`, and `temp_mapping`, so we could save those 32 bytes if we optimize the data structure for it.

### 3. Replace CodePtr with write_pos
If we allow only up to 4GB in `--yjit-exec-mem-size`, `Option<CodePtr>` could be changed to `Option<NonZeroU32>` by storing a `write_pos` instead of a pointer. (Well, `write_pos` would then need to be interpreted as a 1-origin value 😅)
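A sketch of the 1-origin encoding, assuming `write_pos` is a byte offset into the executable region. `CodePos` and its methods are hypothetical names, not YJIT's API. The point is that `Option<NonZeroU32>` uses zero as its `None` value, so the whole option fits in 4 bytes.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical compact code position: stores `write_pos + 1` so that
/// zero stays free to represent `None` in `Option<CodePos>`.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodePos(NonZeroU32);

impl CodePos {
    /// Returns `None` when `write_pos + 1` does not fit in a `u32`.
    fn from_write_pos(write_pos: usize) -> Option<CodePos> {
        if write_pos >= u32::MAX as usize {
            return None;
        }
        NonZeroU32::new(write_pos as u32 + 1).map(CodePos)
    }

    /// Converts the stored 1-origin value back to a 0-origin offset.
    fn write_pos(self) -> usize {
        (self.0.get() - 1) as usize
    }
}

fn main() {
    let pos = CodePos::from_write_pos(0).expect("offset 0 is representable");
    println!("round trip: {}", pos.write_pos());
    println!("Option<CodePos>: {} bytes", size_of::<Option<CodePos>>());
    println!("Option<*const u8>: {} bytes", size_of::<Option<*const u8>>());
}
```

One consequence of the 1-origin trick is that the very last offset (`u32::MAX`) cannot be stored, so the usable range is one byte short of 4GB.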

### 4. Combine targets and shape in Branch
In `Branch`, we could combine `targets` and `shape` into a single enum. Each variant should have either `Box<[BranchTarget; 2]>` or `Box<BranchTarget>`. Not a lot of reduction, but it’d still save 8 bytes in every case.
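One possible shape for that enum, as a sketch only. The variant names (`Jump`, `CondNext0`, `CondNext1`) and the choice of which variant carries one target versus two are invented for illustration, and `BranchTarget` is a 16-byte placeholder rather than the real type. The variant itself plays the role of `shape`, so no separate field is needed.

```rust
use std::mem::size_of;
use std::slice;

/// Placeholder with the 16-byte size reported for `BranchTarget`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BranchTarget {
    payload: [u64; 2],
}

/// Hypothetical merged field: the variant encodes the shape and its
/// payload holds the target(s). Names are assumptions of this sketch.
enum BranchTargets {
    Jump(Box<BranchTarget>),
    CondNext0(Box<[BranchTarget; 2]>),
    CondNext1(Box<[BranchTarget; 2]>),
}

impl BranchTargets {
    /// Uniform slice view over one or two targets.
    fn targets(&self) -> &[BranchTarget] {
        match self {
            BranchTargets::Jump(t) => slice::from_ref(&**t),
            BranchTargets::CondNext0(ts) | BranchTargets::CondNext1(ts) => &ts[..],
        }
    }
}

fn main() {
    let t = BranchTarget { payload: [0, 0] };
    let jump = BranchTargets::Jump(Box::new(t));
    let cond = BranchTargets::CondNext0(Box::new([t, t]));
    println!("jump: {} target(s)", jump.targets().len());
    println!("cond: {} target(s)", cond.targets().len());
    // A tag plus one thin pointer.
    println!("BranchTargets: {} bytes", size_of::<BranchTargets>());
}
```

Callers that only need to iterate targets can go through `targets()`, while code that cares about the shape matches on the variant.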

## Notes
For each `Vec`, we might be able to save some space by eliminating the capacity if it's no longer modified, e.g. using `into_boxed_slice`.
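A small sketch of that conversion. `freeze` is a hypothetical helper. `Vec::into_boxed_slice` drops any excess capacity, and the resulting `Box<[T]>` handle holds only a pointer and a length, whereas a `Vec<T>` handle also carries a capacity.

```rust
use std::mem::size_of;

/// Converts a finished Vec into a boxed slice, dropping spare capacity.
fn freeze<T>(v: Vec<T>) -> Box<[T]> {
    v.into_boxed_slice()
}

fn main() {
    // Over-allocated on purpose to mimic a Vec that grew while being built.
    let mut versions: Vec<u32> = Vec::with_capacity(16);
    versions.extend_from_slice(&[1, 2, 3]);

    let frozen = freeze(versions);
    println!("elements kept: {}", frozen.len());
    println!(
        "Vec<u32>: {} bytes, Box<[u32]>: {} bytes",
        size_of::<Vec<u32>>(),
        size_of::<Box<[u32]>>()
    );
}
```

The saving per handle is one machine word, plus whatever unused capacity the `Vec` had allocated on the heap.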

---

cc: @maximecb @XrXr 
