Skip to content

[CodeGen][WinEH] Update saved esp for inlined inallocas #116585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

MuellerMP
Copy link
Contributor

This fixes issue #116583

When inalloca calls are inlined the static stack pointer saving prolog of X86WinEHState breaks due to dynamic allocas.
In this case we need to update the saved esp for every inalloca and for every stackrestore also related to inalloca.

@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-backend-x86

Author: Mirko (MuellerMP)

Changes

This fixes issue #116583

When inalloca calls are inlined the static stack pointer saving prolog of X86WinEHState breaks due to dynamic allocas.
In this case we need to update the saved esp for every inalloca and for every stackrestore also related to inalloca.


Full diff: https://github.com/llvm/llvm-project/pull/116585.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86WinEHState.cpp (+33-3)
  • (added) llvm/test/CodeGen/WinEH/wineh-inlined-inalloca.ll (+90)
diff --git a/llvm/lib/Target/X86/X86WinEHState.cpp b/llvm/lib/Target/X86/X86WinEHState.cpp
index ef212736730114..c941196dd599cf 100644
--- a/llvm/lib/Target/X86/X86WinEHState.cpp
+++ b/llvm/lib/Target/X86/X86WinEHState.cpp
@@ -23,6 +23,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsX86.h"
 #include "llvm/IR/Module.h"
@@ -75,6 +76,8 @@ class WinEHStatePass : public FunctionPass {
   int getStateForCall(DenseMap<BasicBlock *, ColorVector> &BlockColors,
                       WinEHFuncInfo &FuncInfo, CallBase &Call);
 
+  void updateEspForInAllocas(Function &F);
+
   // Module-level type getters.
   Type *getEHLinkRegistrationType();
   Type *getSEHRegistrationType();
@@ -100,6 +103,9 @@ class WinEHStatePass : public FunctionPass {
   /// fs:00 chain and the current state.
   AllocaInst *RegNode = nullptr;
 
+  // Struct type of RegNode. Used for GEPing.
+  Type *RegNodeTy = nullptr;
+
   // The allocation containing the EH security guard.
   AllocaInst *EHGuardNode = nullptr;
 
@@ -188,11 +194,13 @@ bool WinEHStatePass::runOnFunction(Function &F) {
   // numbers into an immutable analysis pass.
   WinEHFuncInfo FuncInfo;
   addStateStores(F, FuncInfo);
+  updateEspForInAllocas(F);
 
   // Reset per-function state.
   PersonalityFn = nullptr;
   Personality = EHPersonality::Unknown;
   UseStackGuard = false;
+  RegNodeTy = nullptr;
   RegNode = nullptr;
   EHGuardNode = nullptr;
 
@@ -269,9 +277,6 @@ void WinEHStatePass::emitExceptionRegistrationRecord(Function *F) {
   assert(Personality == EHPersonality::MSVC_CXX ||
          Personality == EHPersonality::MSVC_X86SEH);
 
-  // Struct type of RegNode. Used for GEPing.
-  Type *RegNodeTy;
-
   IRBuilder<> Builder(&F->getEntryBlock(), F->getEntryBlock().begin());
   Type *Int8PtrType = Builder.getPtrTy();
   Type *Int32Ty = Builder.getInt32Ty();
@@ -774,3 +779,28 @@ void WinEHStatePass::insertStateNumberStore(Instruction *IP, int State) {
                                               RegNode, StateFieldIndex);
   Builder.CreateStore(Builder.getInt32(State), StateField);
 }
+
+void WinEHStatePass::updateEspForInAllocas(Function& F)
+{
+  for (BasicBlock& BB : F) {
+    for (Instruction &I : BB) {
+      if (auto* Alloca = dyn_cast<AllocaInst>(&I)) {
+        if (!Alloca->isUsedWithInAlloca())
+          continue;
+        IRBuilder<> Builder(Alloca->getNextNonDebugInstruction());
+        // SavedESP = llvm.stacksave()
+        Value *SP = Builder.CreateStackSave();
+        Builder.CreateStore(SP, Builder.CreateStructGEP(RegNodeTy, RegNode, 0));
+      }
+
+      if (auto* II = dyn_cast<IntrinsicInst>(&I)) {
+        if (II->getIntrinsicID() != Intrinsic::stackrestore)
+          continue;
+        IRBuilder<> Builder(II->getNextNonDebugInstruction());
+        // SavedESP = llvm.stacksave()
+        Value *SP = Builder.CreateStackSave();
+        Builder.CreateStore(SP, Builder.CreateStructGEP(RegNodeTy, RegNode, 0));
+      }
+    }
+  }
+}
diff --git a/llvm/test/CodeGen/WinEH/wineh-inlined-inalloca.ll b/llvm/test/CodeGen/WinEH/wineh-inlined-inalloca.ll
new file mode 100644
index 00000000000000..87f6d000e4b58a
--- /dev/null
+++ b/llvm/test/CodeGen/WinEH/wineh-inlined-inalloca.ll
@@ -0,0 +1,90 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32-a:0:32-S32"
+target triple = "i386-pc-windows-msvc"
+
+%struct.Foo = type { i32, i32 }
+
+define dso_local noundef i32 @foo() local_unnamed_addr #0 personality ptr @__CxxFrameHandler3 {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushl %ebp
+; CHECK-NEXT:    movl %esp, %ebp
+; CHECK-NEXT:    pushl %ebx
+; CHECK-NEXT:    pushl %edi
+; CHECK-NEXT:    pushl %esi
+; CHECK-NEXT:    subl $16, %esp
+; CHECK-NEXT:    movl %esp, -28(%ebp)
+; CHECK-NEXT:    movl $-1, -16(%ebp)
+; CHECK-NEXT:    leal -24(%ebp), %eax
+; CHECK-NEXT:    movl $___ehhandler$foo, -20(%ebp)
+; CHECK-NEXT:    movl %fs:0, %ecx
+; CHECK-NEXT:    movl %ecx, -24(%ebp)
+; CHECK-NEXT:    movl %eax, %fs:0
+; CHECK-NEXT:    pushl %eax
+; CHECK-NEXT:    pushl %eax
+; CHECK-NEXT:    movl %esp, %ecx
+; CHECK-NEXT:    movl %esp, -28(%ebp)
+; CHECK-NEXT:    movl $123, (%ecx)
+; CHECK-NEXT:    calll _bar
+; CHECK-NEXT:    movl $0, -16(%ebp)
+; CHECK-NEXT:    calll _alwaysthrows
+; CHECK-NEXT:  # %bb.3: # %unreachable.i
+; CHECK-NEXT:  LBB0_2: # Block address taken
+; CHECK-NEXT:    # %catch.i
+; CHECK-NEXT:    addl $12, %ebp
+; CHECK-NEXT:    jmp LBB0_4
+; CHECK-NEXT:  LBB0_4: # %exit
+; CHECK-NEXT:  $ehgcr_0_4:
+; CHECK-NEXT:    movl -24(%ebp), %eax
+; CHECK-NEXT:    movl %eax, %fs:0
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    leal -12(%ebp), %esp
+; CHECK-NEXT:    popl %esi
+; CHECK-NEXT:    popl %edi
+; CHECK-NEXT:    popl %ebx
+; CHECK-NEXT:    popl %ebp
+; CHECK-NEXT:    retl
+; CHECK-NEXT:    .def "?catch$1@?0?foo@4HA";
+; CHECK-NEXT:    .scl 3;
+; CHECK-NEXT:    .type 32;
+; CHECK-NEXT:    .endef
+; CHECK-NEXT:    .p2align 4
+; CHECK-NEXT:  "?catch$1@?0?foo@4HA":
+; CHECK-NEXT:  LBB0_1: # %catch.i
+; CHECK-NEXT:    pushl %ebp
+; CHECK-NEXT:    addl $12, %ebp
+; CHECK-NEXT:    movl %esp, -28(%ebp)
+; CHECK-NEXT:    movl $LBB0_2, %eax
+; CHECK-NEXT:    popl %ebp
+; CHECK-NEXT:    retl # CATCHRET
+; CHECK-NEXT:  Lfunc_end0:
+entry:
+  %argmem = alloca inalloca <{ %struct.Foo }>, align 4
+  store i32 123, ptr %argmem, align 4
+  call x86_thiscallcc void @bar(ptr noundef nonnull align 4 dereferenceable(8) %argmem)
+  invoke void @alwaysthrows() #1
+          to label %unreachable.i unwind label %catch.dispatch.i
+
+catch.dispatch.i:                                 ; preds = %entry
+  %3 = catchswitch within none [label %catch.i] unwind to caller
+
+catch.i:                                          ; preds = %catch.dispatch.i
+  %4 = catchpad within %3 [ptr null, i32 64, ptr null]
+  catchret from %4 to label %exit
+
+unreachable.i:                                    ; preds = %entry
+  unreachable
+
+exit:                                             ; preds = %catch.i
+  ret i32 0
+}
+
+declare dso_local x86_thiscallcc void @bar(ptr noundef nonnull align 4 dereferenceable(8) %this) local_unnamed_addr
+
+declare dso_local i32 @__CxxFrameHandler3(...)
+
+declare dso_local void @alwaysthrows() local_unnamed_addr
+
+attributes #0 = { norecurse "min-legal-vector-width"="0" "target-cpu"="pentium4" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { noreturn }

Copy link

github-actions bot commented Nov 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is not a fun bug.

for (BasicBlock &BB : F) {
for (Instruction &I : BB) {
if (auto *Alloca = dyn_cast<AllocaInst>(&I)) {
if (!Alloca->isUsedWithInAlloca())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you really want to do here is update the saved ESP after every stack pointer update, which means every dynamic alloca, and every stackrestore operation.

It's only correct to exclude inalloca allocas because it's generally impossible to recover from exceptions inside of a call expression, although it occurs to me that statement expressions make this possible (Ew).

For that reason, and given the general disrepair and non-performance criticality of 32-bit Windows code quality, I would recommend using Alloca->isStaticAlloca() as the criteria here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

First of all thanks for the review.

Also added another test case for dynamic allocas.
I was unsure if this was actually a case we have to handle because my test cases never seemd to generate one.
But I see now that you can just use the c alloca api directly to generate one.

Regarding the inalloca case:
Statement expressions are not the only way to introduce this hazard - inlining does as well as I have demonstrated in the C++ code of issue #116583.
There I overrun the inalloca stack object within the called function itself.
This is actually something I find more often in my test code base then "alloca" calls.

Lifetime annoations and stacksave/restore intrinsics will get properly inlined into a caller if they exist.
My initial thought was to then maybe allow WinEH to process the lifetimes of these dynamically allocated stack objects.
The problem that I encountered with this approach is the following:

; Function Attrs: mustprogress sspstrong
define internal fastcc noundef i32 @"??$invoke@AAV<lambda_0>@?0??..."(ptr noundef nonnull align 4 dereferenceable(48) %_Obj, ptr noundef nonnull align 4 dereferenceable(8) %_Arg1, ptr noundef nonnull align 4 dereferenceable(4) %_Args21, ptr noundef nonnull align 4 dereferenceable(28) %_Args2) unnamed_addr #3 !dbg !7302 {
entry:
  tail call void @llvm.dbg.value(metadata ptr %_Args2, metadata !7306, metadata !DIExpression()), !dbg !7315
  tail call void @llvm.dbg.value(metadata ptr %_Args21, metadata !7307, metadata !DIExpression()), !dbg !7315
  tail call void @llvm.dbg.value(metadata ptr %_Arg1, metadata !7308, metadata !DIExpression()), !dbg !7315
  tail call void @llvm.dbg.value(metadata ptr %_Obj, metadata !7309, metadata !DIExpression()), !dbg !7315
  %argmem = alloca inalloca <{ %"class.std::shared_ptr.16", ptr, ptr }>, align 4
  %call = call x86_thiscallcc noundef ptr @"??0?$shared_ptr@..."(ptr noundef nonnull align 4 dereferenceable(8) %argmem, ptr noundef nonnull align 4 dereferenceable(8) %_Arg1) #18, !dbg !7316
  %0 = getelementptr inbounds <{ %"class.std::shared_ptr.16", ptr, ptr }>, ptr %argmem, i32 0, i32 1, !dbg !7316
  store ptr %_Args21, ptr %0, align 4, !dbg !7316
  %1 = getelementptr inbounds <{ %"class.std::shared_ptr.16", ptr, ptr }>, ptr %argmem, i32 0, i32 2, !dbg !7316
  store ptr %_Args2, ptr %1, align 4, !dbg !7316
  %call3 = call fastcc noundef i32 @"??R<lambda_0>@?0??.."(ptr noundef nonnull %_Obj, ptr nonnull %argmem), !dbg !7316, !range !7232
  ret i32 %call3, !dbg !7316
}

Here we have a std::invoke instantiation (truncated for readability) which does execute the call operator of the lambda.
We do not have lifetimes for the argmem and also the call operator has its inalloca attribute stripped.

I see the case for not emitting lifetime annotations here in the frontend due to this being a single block func.
But what is more problematic in my opinion is that GlobalOpt does strip the inalloca attribute of argmem inside the CallBase but doesnt strip the attribute for the alloca.
If we now inline the call operator func and want to generate lifetimes for the inalloca alloca we would have to walk all instructions to check if the caller has an inalloca alloca.
We can't determine it from the CallBase since the Inliner runs after GlobalOpt.

So TL;DR is:
I guess the lifetime management of inalloca allocas is also suboptimal and could be improved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is any security aspect to this since the original bug was causing a stack object overrun, which could theoretically be used for exploitation in some weird scenario.
But since this is only caused by a miscompile of clang-cl I'm uncertain wheter this is actually relevant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what is more problematic in my opinion is that GlobalOpt does strip the inalloca attribute of argmem inside the CallBase but doesnt strip the attribute for the alloca.

That is a good point!

I would lean away from messing with lifetime attribute placement. There are many aspects of exception handling and optimizations that make it hard to use these reliably.

A low-cost idea that comes to mind is to have a local pass that looks for inalloca allocas that are not used in conjunction with an inalloca call and promote them to static allocas. There are many passes beyond inlining that might delete the consuming inalloca call, so there are good reasons to do this separately from GlobalOpt. However, inalloca allocas are often captured by constructors and destructors, which is why we're using inalloca in the first place, and that makes it hard to prove the alloca is not used later by an inalloca call.

I previously attempted to design a more complete solution to this problem with the preallocated attribute which uses tokens to make it possible to statically identify which calls use the argument memory, but the reality is that it's not a priority for my employer, so I wasn't able to get this work prioritized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the security implications, I would just fix the miscompile. The on-stack exception handling records make this platform extremely easy to attack, and I can't say with confidence that LLVM has implemented all of the hardening techniques to protect the exception registration nodes in the FS:00 linked list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your insight @rnk - greatly appreciated!
Fixing up callless inalloca allocas will probably reduce the impact this fix has on O2.
I'm going to look into it the following couple of weekends.
I guess my current fix is still valid for (atleast) those edge cases of using dynamic allocas or statement expressions.
Maybe I should separate the continued work on hoisting callless inalloca allocas if you have concerns regarding the reliability of proving them (new PR).

PS: Not to be obnoxious but if you find any more spare time to spend on WinEH reviews - a colleague of mine has prepped some state handling improvements for /EHa over at #116546.

This fixes issue llvm#116583

When inalloca calls are inlined the static stack pointer saving prolog of X86WinEHState breaks due to dynamic allocas.
In this case we need to update the saved esp for every inalloca and for every stackrestore also related to inalloca.
Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! I'll merge it

@rnk rnk merged commit 926a71f into llvm:main Nov 21, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants