Use `saveNext` unwind opcode on arm64. #21683

sandreenko · 2018-12-27T02:09:07Z

According to the Microsoft ARM64 exception handling documentation, we can use save_next as an unwind code.

Part of this was already implemented in genPrologSaveRegPair, but before the cleanup in #21395 it was tricky to support it in epilog generation and keep the prolog and epilog unwind info matched.

This PR adds genSetUseSaveNextPairs, which marks the register pairs that we can save/restore with save_next, and teaches genEpilogRestoreRegPair to use save_next (genPrologSaveRegPair already knows how to do that).

Asm diffs for System.Private.CoreLib (arm64 checked, altjit):

1 files had text diffs but not size diffs.
System.Private.CoreLib.dasm had 65544 diffs

that look like:

***** F:\DIFFS\DIFFOUT\DASMSET_86\BASE\System.Private.CoreLib.dasm
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    C9 08       save_regp X#4 Z#8 (0x08); stp x23, x24, [sp, #64]
    C8 86       save_regp X#2 Z#6 (0x06); stp x21, x22, [sp, #48]
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
***** F:\DIFFS\DIFFOUT\DASMSET_86\DIFF\SYSTEM.PRIVATE.CORELIB.DASM
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    E6          save_next
    E6          save_next
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
*****

and there is a tiny improvement in the native image System.Private.CoreLib.dll size.

Tested with GC_Stress=0xc and forced holes on arm64.

sandreenko · 2018-12-27T02:09:23Z

PTAL @BruceForstall @dotnet/arm64-contrib

sandreenko · 2018-12-27T02:19:44Z

The beginnings of genSaveCalleeSavedRegistersHelp and https://github.com/dotnet/coreclr/blob/03f0b6d5f97a5d65387cad5fd0f60342d3118047/src/jit/codegenarm64.cpp#L704 look like copypaste after cleaning:

coreclr/src/jit/codegenarm64.cpp

Lines 566 to 583 in 03f0b6d

    
    void CodeGen::genSaveCalleeSavedRegistersHelp(regMaskTP regsToSaveMask, int lowestCalleeSavedOffset, int spDelta)
    {
        assert(spDelta <= 0);
        unsigned regsToSaveCount = genCountBits(regsToSaveMask);
        if (regsToSaveCount == 0)
        {
            if (spDelta != 0)
            {
                // Currently this is the case for varargs only
                // whose size is MAX_REG_ARG * REGSIZE_BYTES = 64 bytes.
                genStackPointerAdjustment(spDelta, REG_NA, nullptr);
            }
            return;
        }
        assert((spDelta % 16) == 0);
        assert((regsToSaveMask & RBM_FP) == 0); // We never save FP here.

coreclr/src/jit/codegenarm64.cpp

Lines 704 to 720 in 03f0b6d

    
    void CodeGen::genRestoreCalleeSavedRegistersHelp(regMaskTP regsToRestoreMask, int lowestCalleeSavedOffset, int spDelta)
    {
        assert(spDelta >= 0);
        unsigned regsToRestoreCount = genCountBits(regsToRestoreMask);
        if (regsToRestoreCount == 0)
        {
            if (spDelta != 0)
            {
                // Currently this is the case for varargs only
                // whose size is MAX_REG_ARG * REGSIZE_BYTES = 64 bytes.
                genStackPointerAdjustment(spDelta, REG_NA, nullptr);
            }
            return;
        }
        assert((spDelta % 16) == 0);
        assert((regsToRestoreMask & RBM_FP) == 0); // We never restore FP here.

but the only way that I saw to fix that was to create another function like genSaveOrRestoreCalleeSavedRegistersHelp with an additional bool argument and move it there. But after all, it looked worse than it had been before (because there are many arguments and they create long headers). So I decided to leave it as is, but will be glad to fix it if anybody has a better idea.

filipnavara · 2018-12-27T10:59:44Z

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test

sandreenko · 2018-12-27T18:07:02Z

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

BruceForstall · 2019-01-02T21:04:53Z

In your example diff, why doesn't

C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]

also get replaced by save_next?

BruceForstall · 2019-01-02T21:08:14Z

Are there diff examples where save_next is used for FP?

Apparently, save_next used after the last int register means the first callee-saved FP register pair. Do we support that usage also? (Or does the first FP pair always get its own save_fregp?)

sandreenko · 2019-01-02T21:37:39Z

In your example diff, why doesn't
C9 8A save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
also get replaced by save_next?

because x27, x28 were not saved as pair for this frame.

mikedn · 2019-01-02T21:46:32Z

and there is a tiny improvement in the native image System.Private.CoreLib.dll size.

So what's the advantage? Just curious, I'm not very familiar with unwind info.

sandreenko · 2019-01-02T21:58:05Z

So what's the advantage? Just curious, I'm not very familiar with unwind info.

I was rewriting this part to fix the issue (#21395) and did not want to leave any commented lines or // TODO. So the main goal was to make this code more readable while I was around, the second goal was to get the encoding improvement.

janvorli · 2019-01-02T22:32:37Z

@mikedn the save_next is encoded as 1 byte, the save_regp as 2 bytes in the unwind info.
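To make the one-byte versus two-byte difference concrete, here is a minimal sketch of the two encodings. The bit layout (`110010xx xxzzzzzz` for save_regp, `E6` for save_next) is inferred from the byte values in the dumps in this thread; the names `kSaveNext` and `EncodeSaveRegP` are hypothetical and are not the JIT's own helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical constant: the single byte shown for save_next in the dumps.
constexpr std::uint8_t kSaveNext = 0xE6;

// Hypothetical helper: encode save_regp for the pair starting at x(19 + x),
// stored at [sp + z * 8]. Assumed layout: 110010xx | xxzzzzzz
// (4-bit register index X, 6-bit scaled offset Z).
std::array<std::uint8_t, 2> EncodeSaveRegP(unsigned x, unsigned z)
{
    assert(x < 16 && z < 64);
    return {static_cast<std::uint8_t>(0xC8 | (x >> 2)),
            static_cast<std::uint8_t>(((x & 0x3) << 6) | z)};
}
```

With this layout, `EncodeSaveRegP(2, 6)` yields `C8 86` (stp x21, x22, [sp, #48]) and `EncodeSaveRegP(6, 10)` yields `C9 8A` (stp x25, x26, [sp, #80]), matching the dumps. Each pair that can be expressed as save_next instead shrinks from two bytes to one.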

sandreenko · 2019-01-02T22:42:49Z

Apparently, save_next used after the last int register means the first callee-saved FP register pair. Do we support that usage also? (Or does the first FP pair always get its own save_fregp?)

We do not support this case, but we can easily add it. However, I do not see any examples of saving float registers in System.Private.CoreLib.dasm for arm64, so we won't see any diffs and the testing will be poor. Is it expected that we do not have any floats in regsToSaveMask (which comes from genFuncletInfo.fiSaveRegs)?

BruceForstall · 2019-01-03T00:17:20Z

because x27, x28 were not saved as pair for this frame.

@sandreenko I'm not sure I understand this. I think that the x19/x20 case forms the "base", then the save_next entries relate to the "next" register pair from this base, namely x21/x22, x23/x24 in your example. So I don't understand why x25/x26 wouldn't be the next logical save_next case.

BruceForstall · 2019-01-03T01:59:30Z

@sandreenko

However, I do not see any examples of saving float registers in System.Private.CoreLib.dasm for arm64

That seems odd; there are 8 callee-saved FP regs on arm64. I tried and see 61 cases of save_freg/save_fregp in System.Private.CoreLib.dll. I even see at least one case where save_next could be used for FP regs (System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double).

BruceForstall · 2019-01-03T02:11:56Z

src/jit/codegenarm64.cpp

+            continue;
+        }
+
+        if (genCanUseSaveNextPair(prev, curr) && genCanUseSaveNextPair(curr, next))


Is && genCanUseSaveNextPair(curr, next) really required?

I would expect the code to not depend on next, so it would simply be:

    for (int i = 1; i < regStack->Height(); ++i)
    {
        RegPair& curr = regStack->BottomRef(i);
        RegPair& prev = regStack->BottomRef(i - 1);
        if (prev.reg2 == REG_NA || curr.reg2 == REG_NA)
        {
            continue;
        }
        if (genCanUseSaveNextPair(prev, curr))
        {
            curr.useSaveNextPair = true;
        }
    }
}

I see. Let's imagine we have a mask of
r3, r4, r5, r6, r7, r8;
then we can only do
stp r3, r4; save_next; stp r7, r8
in the prolog, and the epilog will do it in the reversed order:
stp r7, r8; save_next; stp r3, r4.
We can't start an epilog with save_next, which means we can't finish a prolog with save_next if we want to keep them matching.

Why can't we start epilog with save_next?

fyi, you can see how save_next is handled (and how all the unwinding happens) here: https://github.com/dotnet/coreclr/blob/master/src/unwinder/arm64/unwinder_arm64.cpp

Thank you for explaining this, fixed.
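The reason an epilog sequence can begin with save_next is easier to see with a simplified model of the unwinder. The sketch assumes (based on a reading of the linked unwinder_arm64.cpp, not verified line by line here) that save_next codes are accumulated and then folded into the next save_regp, which restores a contiguous range of registers. The types `UnwindCode` and `Restore` and the function `ExpandIntPairs` are hypothetical, and only integer pairs are modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an unwind code: either save_next, or save_regp with
// the first register number and its stack offset in bytes.
struct UnwindCode
{
    bool isSaveNext;
    int  firstReg; // meaningful for save_regp only, e.g. 19 for x19
    int  offset;   // meaningful for save_regp only, e.g. 32 for [sp, #32]
};

struct Restore
{
    int reg;
    int offset;
};

// Walk the codes in the order they appear in the unwind info and list which
// register is reloaded from which stack slot.
std::vector<Restore> ExpandIntPairs(const std::vector<UnwindCode>& codes)
{
    std::vector<Restore> restores;
    int pendingSaveNexts = 0;
    for (const UnwindCode& code : codes)
    {
        if (code.isSaveNext)
        {
            ++pendingSaveNexts; // remembered until the base save_regp shows up
            continue;
        }
        // Assumption: the base pair plus one extra pair per pending save_next,
        // laid out contiguously, 8 bytes per register.
        const int regCount = 2 + 2 * pendingSaveNexts;
        for (int i = 0; i < regCount; ++i)
        {
            restores.push_back({code.firstReg + i, code.offset + 8 * i});
        }
        pendingSaveNexts = 0;
    }
    assert(pendingSaveNexts == 0); // every save_next needs a following base code
    return restores;
}
```

Under this model, a sequence that starts with several save_next codes followed by `save_regp` for x19, x20 describes the same registers and slots as the equivalent list of individual save_regp codes, so nothing requires the first code of an epilog to be a save_regp.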

sandreenko · 2019-01-03T02:37:26Z

That seems odd; there are 8 callee-saved FP regs on arm64. I tried and see 61 cases of save_freg/save_fregp in System.Private.CoreLib.dll. I even see at least one case where save_next could be used for FP regs (System.Globalization.CalendricalCalculationsHelper:EquationOfTime(double):double).

Thanks, now I see, there was something strange with my search that did not see the file:

Search "save_regp" (58040 hits in 1 file)
Search "save_freg" (0 hits in 0 files)
Search "save_fregp" (0 hits in 0 files)

Now I see them. However, we need sequences of 3 or more consecutive pairs to be able to use save_next, and I do not see any.

I will push a change tomorrow that supports using save_next for float after int. I will test it with the mode that forces int registers to be allocated from the end; it should give us a few occurrences of that.


sandreenko · 2019-01-03T21:09:41Z

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

sandreenko · 2019-01-03T21:42:36Z

The PR was updated to support save_next after the last int pair and save_next at the beginning of an epilog.
Now we have:
System.Private.CoreLib.dasm had 107866 diffs (before was 65544).
The size of the checked crossgened System.Private.CoreLib.dll for arm64 went down from 12,110,848 bytes to 12,101,120 bytes (-9,728 bytes), which is a smaller reduction than I expected. It looks like padding and alignment absorb most of the bytes that we won with save_next.
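The relaxed marking rule described in this update can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `Reg`, `RegPair`, `CanUseSaveNextPair` and `MarkSaveNextPairs` are hypothetical stand-ins, single-register entries and stack offsets are not modeled, and the int/float border is assumed to mean x27/x28 followed by d8/d9.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register model: x19..x28 are integer callee-saves,
// d8..d15 are floating-point callee-saves.
struct Reg
{
    bool isFloat;
    int  index;
};

struct RegPair
{
    Reg  reg1;
    Reg  reg2;
    bool useSaveNextPair;
};

static bool IsNext(const Reg& a, const Reg& b)
{
    return a.isFloat == b.isFloat && b.index == a.index + 1;
}

// True when `curr` is exactly the pair that save_next would imply after `prev`.
bool CanUseSaveNextPair(const RegPair& prev, const RegPair& curr)
{
    if (!IsNext(prev.reg1, prev.reg2) || !IsNext(curr.reg1, curr.reg2))
    {
        return false; // both entries must be proper consecutive pairs
    }
    if (IsNext(prev.reg2, curr.reg1))
    {
        return true; // e.g. x19/x20 followed by x21/x22
    }
    // Assumed border case: the last int pair followed by the first FP pair.
    return !prev.reg2.isFloat && prev.reg2.index == 28 && curr.reg1.isFloat && curr.reg1.index == 8;
}

// Pairs are listed in save order, lowest registers first; only the previous
// pair is consulted, so the last pair of a run may also use save_next.
void MarkSaveNextPairs(std::vector<RegPair>& pairs)
{
    for (std::size_t i = 0; i < pairs.size(); ++i)
    {
        pairs[i].useSaveNextPair = (i > 0) && CanUseSaveNextPair(pairs[i - 1], pairs[i]);
    }
}
```

Because the rule no longer looks at the following pair, the highest pair of a consecutive run is marked as well, which is consistent with the diff count going up after the update.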

please take another look @BruceForstall.

BruceForstall · 2019-01-03T21:45:45Z

Can you show some asm diffs here? Both for pure int save_next, and for cases of float save_next following last int save_next?

src/jit/codegenarm64.cpp

sandreenko · 2019-01-03T23:47:38Z

Can you show some asm diffs here? Both for pure int save_next, and for cases of float save_next following last int save_next?

As discussed above, we have examples only for the first case:

***** F:\DIFFS\DIFFOUT\DASMSET_90\BASE\System.Private.CoreLib.dasm
    ---- Epilog start at index 1 ----
    D8 8A       save_fregp X#2 Z#10 (0x0A); stp d10, d11, [sp, #80]
    D8 08       save_fregp X#0 Z#8 (0x08); stp d8, d9, [sp, #64]
***** F:\DIFFS\DIFFOUT\DASMSET_90\DIFF\SYSTEM.PRIVATE.CORELIB.DASM
    ---- Epilog start at index 1 ----
    E6          save_next
    D8 08       save_fregp X#0 Z#8 (0x08); stp d8, d9, [sp, #64]
*****

or

  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    E6          save_next
    E6          save_next
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    87          save_fplr_x #7 (0x07); stp fp, lr, [sp, #-64]!
    E4          end
    E4          end

instead of

  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    C9 06       save_regp X#4 Z#6 (0x06); stp x23, x24, [sp, #48]
    C8 84       save_regp X#2 Z#4 (0x04); stp x21, x22, [sp, #32]
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    87          save_fplr_x #7 (0x07); stp fp, lr, [sp, #-64]!
    E4          end
    E4          end
    E4          end
    E4          end

sandreenko · 2019-01-03T23:59:03Z

a new infrastructure failure on arm64 windows:

13:58:52 D:\j\workspace\arm64_cross_d---ffc60a4b>powershell -NoProfile -Command "Add-Type -Assembly 'System.IO.Compression.FileSystem'; [System.IO.Compression.ZipFile]::CreateFromDirectory('.\bin\tests\Windows_NT.arm64.Debug', '.\bin\tests\tests.zip')" 
14:02:21 Exception calling "CreateFromDirectory" with "2" argument(s): "Operation did not complete successfully because the 
14:02:21 file contains a virus or potentially unwanted software.
14:02:21 "
14:02:21 At line:1 char:56
14:02:21 + ... ileSystem'; [System.IO.Compression.ZipFile]::CreateFromDirectory('.\b ...
14:02:21 +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14:02:21     + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
14:02:21     + FullyQualifiedErrorId : IOException
14:02:21  
14:02:21 
14:02:21 D:\j\workspace\arm64_cross_d---ffc60a4b>exit 1

@dotnet/dnceng PTAL

sandreenko · 2019-01-04T00:34:35Z

test Windows_NT arm Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs0x10 Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test
test Windows_NT arm64 Cross Checked jitstress2_jitstressregs0x1000 Build and Test

MattGal · 2019-01-04T00:39:24Z

@sandreenko this problem has been around since the mid-2000s; CoreCLR tests just look like viruses, independent of architecture used, and anti-virus must have exceptions for the build output folders to succeed when building them.

I'm unsure how you just started noticing this (I'd guess perhaps more stuff is building, build agent changed, or tests were disabled before... dunno? ) @meganaquinn is actively working to address this via https://github.com/dotnet/core-eng/issues/4555

sandreenko · 2019-01-04T06:51:40Z

test OSX10.12 x64 Checked Innerloop Build and Test
test Windows_NT arm Cross Debug Innerloop Build
test Windows_NT arm64 Cross Checked Innerloop Build and Test
test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test

sandreenko · 2019-01-04T19:30:04Z

test Ubuntu16.04 arm64 Cross Checked jitstress2_jitstressregs2 Build and Test

sandreenko · 2019-01-04T21:41:21Z

Windows_NT arm Cross Debug Innerloop Build fails because of "file contains a virus or potentially unwanted software".

I have checked the diffs once more and found that the change reduced the size of 9508 unwind sections (meaning Code Words after is lower than Code Words before), e.g.:

after:
Unwind Info:
  >> Start offset   : 0xd1ffab1e (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Code Words        : 2
  Epilog Count      : 0
  E bit             : 1
  X bit             : 0
  Vers              : 0
  Function Length   : 20 (0x00014) Actual length = 80 (0x000050)
  --- One epilog, unwind codes at 0
  ---- Unwind codes ----
    ---- Epilog start at index 0 ----
    E6          save_next
    E6          save_next
    E6          save_next
    E6          save_next
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
    E4          end
before:
Unwind Info:
  >> Start offset   : 0xd1ffab1e (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Code Words        : 3
  Epilog Count      : 0
  E bit             : 1
  X bit             : 0
  Vers              : 0
  Function Length   : 20 (0x00014) Actual length = 80 (0x000050)
  --- One epilog, unwind codes at 0
  ---- Unwind codes ----
    ---- Epilog start at index 0 ----
    CA 0C       save_regp X#8 Z#12 (0x0C); stp x27, x28, [sp, #96]
    C9 8A       save_regp X#6 Z#10 (0x0A); stp x25, x26, [sp, #80]
    C9 08       save_regp X#4 Z#8 (0x08); stp x23, x24, [sp, #64]
    C8 86       save_regp X#2 Z#6 (0x06); stp x21, x22, [sp, #48]
    C8 04       save_regp X#0 Z#4 (0x04); stp x19, x20, [sp, #32]
    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
    E4          end

and each Code Word is 4 bytes long. So we should expect an improvement of ~40 Kbytes, but I see only ~10 in the crossgened System.Private.CoreLib.dll image size diff. Where do we lose the other 30?

BruceForstall · 2019-01-04T22:13:45Z

where do we lose other 30?

It's very likely we don't gain much due to alignment. I think the full alignment data is 4-byte aligned, and padded.

sandreenko · 2019-01-04T22:29:10Z

It's very likely we don't gain much due to alignment. I think the full alignment data is 4-byte aligned, and padded.

Unwind Info is currently 4-byte aligned, so we have cases where we replaced one save_regp with save_next and then added an extra end code to keep the alignment:

after:
  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    E6          save_next
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    85          save_fplr_x #5 (0x05); stp fp, lr, [sp, #-48]!
    E4          end
    E4          end
    E4          end

before:
  ---- Unwind codes ----
    E1          set_fp; mov fp, sp
    ---- Epilog start at index 1 ----
    C8 84       save_regp X#2 Z#4 (0x04); stp x21, x22, [sp, #32]
    C8 02       save_regp X#0 Z#2 (0x02); stp x19, x20, [sp, #16]
    85          save_fplr_x #5 (0x05); stp fp, lr, [sp, #-48]!
    E4          end
    E4          end

but even with that we should see 9508 code words improvement.
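The padding effect in the dumps above comes down to rounding the unwind-code byte count up to whole 4-byte code words. A minimal sketch of that arithmetic, with `CodeWords` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Unwind codes are padded with `end` bytes up to a multiple of 4 bytes,
// so the number of code words is the byte count rounded up to whole words.
constexpr std::size_t CodeWords(std::size_t unwindCodeBytes)
{
    return (unwindCodeBytes + 3) / 4;
}
```

For the five-pair example, the codes shrink from 12 bytes (five 2-byte save_regp, save_fplr_x, end) to 8 bytes (four save_next, one save_regp, save_fplr_x, end), so `CodeWords` goes from 3 to 2. For the two-pair example, the codes shrink from 7 bytes to 6, and both round up to 2 code words, so the saved byte is replaced by an extra `end`. Only sections that cross a 4-byte boundary get smaller, and 9508 such sections at one word each would account for roughly 38 Kbytes.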

sandreenko · 2019-01-08T19:56:28Z

@BruceForstall I think it is ready for another round of review.

BruceForstall

Looks good! Thanks for the cleanup, too.

sandreenko · 2019-01-08T23:40:47Z

@BruceForstall thank you for the review.

* Use `saveNext` opcode on arm64.
* Support using of `save_next` on int/float border.
* Delete the extra requirement that an epilog sequences can't start from `save_next`.
* response feedback

Commit migrated from dotnet/coreclr@62298e6

Use saveNext opcode on arm64.

e3627fd

sandreenko added optimization area-CodeGen arch-arm64 labels Dec 27, 2018

sandreenko requested a review from BruceForstall December 27, 2018 02:09

BruceForstall suggested changes Jan 3, 2019


Sergey Andreenko added 2 commits January 3, 2019 12:22

Support using of save_next on int/float border.

b6ca00f

Delete the extra requirement that an epilog sequences can't start fro…

fa5923e

…m `save_next`.

BruceForstall reviewed Jan 3, 2019


response feedback

aed2a96

BruceForstall approved these changes Jan 8, 2019


sandreenko merged commit 62298e6 into dotnet:master Jan 8, 2019

sandreenko deleted the useNextPair branch January 8, 2019 23:40
