Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Use saveNext unwind opcode on arm64. #21683

Merged
merged 4 commits into from
Jan 8, 2019
Merged

Conversation

sandreenko
Copy link

@sandreenko sandreenko commented Dec 27, 2018

According to Microsoft ARM64 exception handling doc we can use save_next as unwind code.

And we have had part of it implemented in genPrologSaveRegPair but before the cleaning in #21395 it was tricky to support it in the epilog generation and keep prolog/epilog unwind infos matched.

This PR adds genSetUseSaveNextPairs that marks register pairs that we can save/restore with save_next and teaches genEpilogRestoreRegPair to use save_next (genPrologSaveRegPair has already known how to do that).

Asm diffs for System.Private.CoreLib (arm64 checked, altjit):

1 files had text diffs but not size diffs.
System.Private.CoreLib.dasm had 65544 diffs

that look like:

***** F:\DIFFS\DIFFOUT\DASMSET_86\BASE\System.Private.CoreLib.dasm
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    C9 08       save_regp X#4 Z#8 (0x08); stp x23, x24, [sp, #64]
    C8 86       save_regp X#2 Z#6 (0x06); stp x21, x22, [sp, #48]
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
***** F:\DIFFS\DIFFOUT\DASMSET_86\DIFF\SYSTEM.PRIVATE.CORELIB.DASM
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    E6          save_next
    E6          save_next
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
*****

and there is a tiny improvement in the native image System.Private.CoreLib.dll size.

Tested with GC_Stress=0xc and forced holes on arm64.

@sandreenko
Copy link
Author

PTAL @BruceForstall @dotnet/arm64-contrib

@sandreenko
Copy link
Author

The beginnings of genSaveCalleeSavedRegistersHelp and https://github.com/dotnet/coreclr/blob/03f0b6d5f97a5d65387cad5fd0f60342d3118047/src/jit/codegenarm64.cpp#L704 look like copypaste after cleaning:

void CodeGen::genSaveCalleeSavedRegistersHelp(regMaskTP regsToSaveMask, int lowestCalleeSavedOffset, int spDelta)
{
assert(spDelta <= 0);
unsigned regsToSaveCount = genCountBits(regsToSaveMask);
if (regsToSaveCount == 0)
{
if (spDelta != 0)
{
// Currently this is the case for varargs only
// whose size is MAX_REG_ARG * REGSIZE_BYTES = 64 bytes.
genStackPointerAdjustment(spDelta, REG_NA, nullptr);
}
return;
}
assert((spDelta % 16) == 0);
assert((regsToSaveMask & RBM_FP) == 0); // We never save FP here.

void CodeGen::genRestoreCalleeSavedRegistersHelp(regMaskTP regsToRestoreMask, int lowestCalleeSavedOffset, int spDelta)
{
assert(spDelta >= 0);
unsigned regsToRestoreCount = genCountBits(regsToRestoreMask);
if (regsToRestoreCount == 0)
{
if (spDelta != 0)
{
// Currently this is the case for varargs only
// whose size is MAX_REG_ARG * REGSIZE_BYTES = 64 bytes.
genStackPointerAdjustment(spDelta, REG_NA, nullptr);
}
return;
}
assert((spDelta % 16) == 0);
assert((regsToRestoreMask & RBM_FP) == 0); // We never restore FP here.

but the only way that I saw to fix that was to create another function like genSaveOrRestoreCalleeSavedRegistersHelp with an additional bool argument and move it there. But after all, it looked worse than it had been before (because there are many arguments and they create long headers). So I decided to leave it as is, but will be glad to fix it if anybody has a better idea.

@filipnavara
Copy link
Member

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test

@sandreenko
Copy link
Author

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

@BruceForstall
Copy link

In your example diff, why doesn't

C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]

also get replaced by save_next?

@BruceForstall
Copy link

Are there diff examples where save_next is used for FP?

Apparently, save_next used after the last int register means the first callee-saved FP register pair. Do we support that usage also? (Or does the first FP pair always get its own save_fregp?)

@sandreenko
Copy link
Author

In your example diff, why doesn't
C9 8A save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
also get replaced by save_next?

because x27, x28 were not saved as pair for this frame.

@mikedn
Copy link

mikedn commented Jan 2, 2019

and there is a tiny improvement in the native image System.Private.CoreLib.dll size.

So what's the advantage? Just curious, I'm not very familiar with unwind info.

@sandreenko
Copy link
Author

So what's the advantage? Just curious, I'm not very familiar with unwind info.

I was rewriting this part to fix the issue (#21395) and did not want to leave any commented lines or // TODO. So the main goal was to make this code more readable while I was around, the second goal was to get the encoding improvement.

@janvorli
Copy link
Member

janvorli commented Jan 2, 2019

@mikedn the save_next is encoded as 1 byte, the save_regp as 2 bytes in the unwind info.

@sandreenko
Copy link
Author

Apparently, save_next used after the last int register means the first callee-saved FP register pair. Do we support that usage also? (Or does the first FP pair always get its own save_fregp?)

We do not support this case, but we can easily add this. However, I do not see any examples of saving float registers in System.Private.CoreLib.dasm for arm64, so we won't see any diffs and the testing will be poor. Is it expected that we do not have any floats in regsToSaveMask(that comes from genFuncletInfo.fiSaveRegs)?

@BruceForstall
Copy link

because x27, x28 were not saved as pair for this frame.

@sandreenko I'm not sure I understand this. I think that the x19/x20 case forms the "base", then the save_next entries relate to the "next" register pair from this base, namely x21/x22, x23/x24 in your example. So I don't understand why x25/x26 wouldn't be the next logical save_next case.

@BruceForstall
Copy link

@sandreenko

However, I do not see any examples of saving float registers in System.Private.CoreLib.dasm for arm64

That seems odd; there are 8 callee-saved FP regs on arm64. I tried and see 61 cases of save_freg/save_fregp in System.Private.CoreLib.dll. I even see at least one case where save_next could be used for FP regs (System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double).

continue;
}

if (genCanUseSaveNextPair(prev, curr) && genCanUseSaveNextPair(curr, next))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is && genCanUseSaveNextPair(curr, next) really required?

I would expect the code to not depend on next, so it would simply be:

    for (int i = 1; i < regStack->Height(); ++i)
    {
        RegPair& curr = regStack->BottomRef(i);
        RegPair& prev = regStack->BottomRef(i - 1);

        if (prev.reg2 == REG_NA || curr.reg2 == REG_NA)
        {
            continue;
        }

        if (genCanUseSaveNextPair(prev, curr))
        {
            curr.useSaveNextPair = true;
        }
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, lets imagine we have mask of
r3, r4, r5, r6, r7, r8,
then we can do only
stp r3, r4; store_next; stp r7, r8
in the prolog and the epilog will do it in the reversed order:
stp r7, r8; store_next; stp r3, r4.
We can't start an epilog with store_next that means we can't finish a prolog with store_next to keep matching.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we start epilog with save_next?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, you can see how save_next is handled (and how all the unwinding happens) here: https://github.com/dotnet/coreclr/blob/master/src/unwinder/arm64/unwinder_arm64.cpp

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explaining this, fixed.

@sandreenko
Copy link
Author

That seems odd; there are 8 callee-saved FP regs on arm64. I tried and see 61 cases of save_freg/save_fregp in System.Private.CoreLib.dll. I even see at least one case where save_next could be used for FP regs (System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double).

Thanks, now I see, there was something strange with my search that did not see the file:

Search "save_regp" (58040 hits in 1 file)
Search "save_freg" (0 hits in 0 files)
Search "save_fregp" (0 hits in 0 files)

Now I see them. However, we need sequences of 3 or more of consecutive pairs to be able to use save_next and I do not see any.

I will push the change that supports using of save_next for float after int tomorrow. I will test it with the mode that forces int integers to allocate from the end, it should give us few occurrences of that.

@sandreenko
Copy link
Author

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

@sandreenko
Copy link
Author

The PR was updated to support save_next after the last int pair and save_next in the beginning of an epilog.
Now we have:
System.Private.CoreLib.dasm had 107866 diffs (before was 65544).
Size of checked crossgened System.Private.CoreLib.dll for arm64 went down from 2,110,848 bytes to 12,101,120 bytes (-9,728 bytes) that is smaller than I expected. Looks like padding and alignment fill most bytes that we won with save_next.

please take another look @BruceForstall.

@BruceForstall
Copy link

Can you show some asm diffs here? Both for pure int save_next, and for cases of float save_next following last int save_next?

@sandreenko
Copy link
Author

Can you show some asm diffs here? Both for pure int save_next, and for cases of float save_next following last int save_next?

As disscused above we have examples only for the first case:

***** F:\DIFFS\DIFFOUT\DASMSET_90\BASE\System.Private.CoreLib.dasm
    ---- Epilog start at index 1 ----
    D8 8A       save_fregp X#2 Z#10 (0x0A); stp d10, d11, [sp, #80]
    D8 08       save_fregp X#0 Z#8 (0x08); stp d8, d9, [sp, #64]
***** F:\DIFFS\DIFFOUT\DASMSET_90\DIFF\SYSTEM.PRIVATE.CORELIB.DASM
    ---- Epilog start at index 1 ----
    E6          save_next
    D8 08       save_fregp X#0 Z#8 (0x08); stp d8, d9, [sp, #64]
*****

or

  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    E6          save_next
    E6          save_next
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    87          save_fplr_x #7 (0x07); stp fp, lr, [sp, #-64]!
    E4          end
    E4          end

instead of

  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    C9 06       save_regp X#4 Z#6 (0x06); stp x23, x24, [sp, #48]
    C8 84       save_regp X#2 Z#4 (0x04); stp x21, x22, [sp, #32]
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    87          save_fplr_x #7 (0x07); stp fp, lr, [sp, #-64]!
    E4          end
    E4          end
    E4          end
    E4          end

@sandreenko
Copy link
Author

a new infrastructure failure on arm64 windows:

13:58:52 D:\j\workspace\arm64_cross_d---ffc60a4b>powershell -NoProfile -Command "Add-Type -Assembly 'System.IO.Compression.FileSystem'; [System.IO.Compression.ZipFile]::CreateFromDirectory('.\bin\tests\Windows_NT.arm64.Debug', '.\bin\tests\tests.zip')" 
14:02:21 Exception calling "CreateFromDirectory" with "2" argument(s): "Operation did not complete successfully because the 
14:02:21 file contains a virus or potentially unwanted software.
14:02:21 "
14:02:21 At line:1 char:56
14:02:21 + ... ileSystem'; [System.IO.Compression.ZipFile]::CreateFromDirectory('.\b ...
14:02:21 +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14:02:21     + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
14:02:21     + FullyQualifiedErrorId : IOException
14:02:21  
14:02:21 
14:02:21 D:\j\workspace\arm64_cross_d---ffc60a4b>exit 1 

@dotnet/dnceng PTAL

@sandreenko
Copy link
Author

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

@MattGal
Copy link
Member

MattGal commented Jan 4, 2019

@sandreenko this problem has been around since the mid-2000s; CoreCLR tests just look like viruses, independent of architecture used, and anti-virus must have exceptions for the build output folders to succeed when building them.

I'm unsure how you just started noticing this (I'd guess perhaps more stuff is building, build agent changed, or tests were disabled before... dunno? ) @meganaquinn is actively working to address this via https://github.com/dotnet/core-eng/issues/4555

@sandreenko
Copy link
Author

test OSX10.12 x64 Checked Innerloop Build and Test
test Windows_NT arm Cross Debug Innerloop Build
test Windows_NT arm64 Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test

@sandreenko
Copy link
Author

test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test

@sandreenko
Copy link
Author

Windows_NT arm Cross Debug Innerloop Build fails because of "file contains a virus or potentially unwanted software".

I have checked diffs once more and found that the change improved size for 9508 unwind sections (means Code Words after is lower than Code Words before), i.e.:

after:
Unwind Info:
  >> Start offset   : 0xd1ffab1e (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Code Words        : 2
  Epilog Count      : 0
  E bit             : 1
  X bit             : 0
  Vers              : 0
  Function Length   : 20 (0x00014) Actual length = 80 (0x000050)
  --- One epilog, unwind codes at 0
  ---- Unwind codes ----
    ---- Epilog start at index 0 ----
    E6          save_next
    E6          save_next
    E6          save_next
    E6          save_next
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
    E4          end
before:
Unwind Info:
  >> Start offset   : 0xd1ffab1e (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Code Words        : 3
  Epilog Count      : 0
  E bit             : 1
  X bit             : 0
  Vers              : 0
  Function Length   : 20 (0x00014) Actual length = 80 (0x000050)
  --- One epilog, unwind codes at 0
  ---- Unwind codes ----
    ---- Epilog start at index 0 ----
    CA 0C       save_regp X#8 Z#12 (0x0C); stp x27, x28, [sp, #96]
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    C9 08       save_regp X#4 Z#8 (0x08); stp x23, x24, [sp, #64]
    C8 86       save_regp X#2 Z#6 (0x06); stp x21, x22, [sp, #48]
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
    E4          end

and each Code Word is 4 bytes long. So it means we should expect ~40 Kbytes improvement, but I see only ~10 in crossgened System.Private.CoreLib.dll image size diff, where do we lose other 30?

@BruceForstall
Copy link

where do we lose other 30?

It's very likely we don't gain much due to alignment. I think the full alignment data is 4-byte aligned, and padded.

@sandreenko
Copy link
Author

sandreenko commented Jan 4, 2019

It's very likely we don't gain much due to alignment. I think the full alignment data is 4-byte aligned, and padded.

Unwind Info is currently 4-byte aligned, so we have cases where we replaced one save_pair with save_next and added end to keep the alignment:

after:
  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    E6          save_next
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    85          save_fplr_x #5 (0x05); stp fp, lr, [sp, #-48]!
    E4          end
    E4          end
    E4          end

before:
  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    C8 84       save_regp X#2 Z#4 (0x04); stp x21, x22, [sp, #32]
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    85          save_fplr_x #5 (0x05); stp fp, lr, [sp, #-48]!
    E4          end
    E4          end

but even with that we should see 9508 code words improvement.

@sandreenko
Copy link
Author

@BruceForstall I think it is ready for another round of review.

Copy link

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for the cleanup, too.

@sandreenko
Copy link
Author

@BruceForstall thank you for the review.

@sandreenko sandreenko merged commit 62298e6 into dotnet:master Jan 8, 2019
@sandreenko sandreenko deleted the useNextPair branch January 8, 2019 23:40
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Use `saveNext` opcode on arm64.

* Support using of  `save_next` on int/float border.

* Delete the extra requirement that an epilog sequences can't start from `save_next`.

* response feedback


Commit migrated from dotnet/coreclr@62298e6
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants