Skip to content

Conversation

@saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Mar 29, 2025

Noticed this while looking at the codegen for floating->integral casts where we use vfixupimms[sd], which is a SIMD scalar instruction that takes a control table as a vector. The table value for that instruction is almost always const, and it's the operand that supports memory load, so we end up emitting a full SIMD16 vector to the data section for what ends up being a DWORD or QWORD reloc. This change identifies vector constants contained by instructions that read only a scalar and shrinks the emitted data section value down to the scalar size.

We were also emitting full vectors to the data section for uncontained CNS_VEC to be loaded to register for use. #92017 had previously addressed compressing large vectors down to SIMD16 broadcast when they were immediately consumed by STOREIND. This PR does compression down as far as 4-byte scalar, and does it for all uncontained CNS_VEC loads.

Diffs

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 29, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 29, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Member Author

@saucecontrol saucecontrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding I've updated this based on the approach we discussed on Discord, including using broadcast loads where possible.

Latest diffs look good. It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.

case NI_AVX2_BroadcastVector128ToVector256:
case NI_AVX512F_BroadcastVector128ToVector512:
case NI_AVX512F_BroadcastVector256ToVector512:
if (GetAuxiliaryJitType() == CORINFO_TYPE_PTR)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these deletions are reverting questionable changes from #92017, which attempted to solve the same problem, but only for uncontained CNS_VEC directly consumed by STOREIND. Those cases are handled better by this PR.

case TYP_DOUBLE:
{
result->f64[arg1] = static_cast<float>(arg2);
result->f64[arg1] = arg2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and the change in valuenum.cpp are unrelated things I noticed while reading how the SIMD constant folding works.

@saucecontrol saucecontrol changed the title JIT: Shrink data for scalar instructions with contained const vectors JIT: Shrink data section for const vector loads Apr 2, 2025
@tannergooding
Copy link
Member

It's overall a small code size regression, but generally we're trading +1 byte of code for -8 or more bytes of data.

Yep, that's pretty typical. I'd actually like if we had a way for SPMI to track the diff of the method local data section as well to help account for such changes. Possibly @EgorBo or @jakobbotsch have ideas on if that's feasible.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC. @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review

@tannergooding
Copy link
Member

Ping @dotnet/jit-contrib, @EgorBo, @jakobbotsch for secondary review

@BruceForstall
Copy link
Contributor

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@BruceForstall BruceForstall merged commit 8b00880 into dotnet:main May 2, 2025
105 of 108 checks passed
@saucecontrol saucecontrol deleted the shrink-relocs branch May 4, 2025 15:20
@github-actions github-actions bot locked and limited conversation to collaborators Jun 4, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants