Skip to content

[CalcSpillWeights] Avoid x87 excess precision influencing weight result #100165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

DimitryAndric
Copy link
Collaborator

Fixes #99396

The result of VirtRegAuxInfo::weightCalcHelper can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native.

Similar to 7af3432, add a volatile qualifier to the local Weight variable to force it onto the stack, and avoid the excess precision.

@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-x86

Author: Dimitry Andric (DimitryAndric)

Changes

Fixes #99396

The result of VirtRegAuxInfo::weightCalcHelper can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native.

Similar to 7af3432, add a volatile qualifier to the local Weight variable to force it onto the stack, and avoid the excess precision.


Full diff: https://github.com/llvm/llvm-project/pull/100165.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+3-1)
  • (added) llvm/test/CodeGen/X86/pr99396.ll (+59)
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 1d767a3484bca..60055e6ba0cd3 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -257,7 +257,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
       return -1.0f;
     }
 
-    float Weight = 1.0f;
+    // Force Weight onto the stack so that x86 doesn't add hidden precision,
+    // similar to HWeight below.
+    volatile float Weight = 1.0f;
     if (IsSpillable) {
       // Get loop info for mi.
       if (MI->getParent() != MBB) {
diff --git a/llvm/test/CodeGen/X86/pr99396.ll b/llvm/test/CodeGen/X86/pr99396.ll
new file mode 100644
index 0000000000000..7bc1153b57990
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr99396.ll
@@ -0,0 +1,59 @@
+; RUN: llc < %s -mtriple=i386-unknown-freebsd -enable-misched -relocation-model=pic | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
+target triple = "i386-unknown-freebsd15.0"
+
+@c = external local_unnamed_addr global ptr
+
+declare i32 @fn2() local_unnamed_addr
+
+declare i32 @fn3() local_unnamed_addr
+
+define noundef i32 @fn4() #0 {
+entry:
+  %0 = load i32, ptr @fn4, align 4
+; CHECK: movl fn4@GOT(%ebx), %edi
+; CHECK-NEXT: movl (%edi), %edx
+  %1 = load ptr, ptr @c, align 4
+; CHECK: movl c@GOT(%ebx), %eax
+; CHECK-NEXT: movl (%eax), %esi
+; CHECK-NEXT: testl %esi, %esi
+  %cmp.g = icmp eq ptr %1, null
+  br i1 %cmp.g, label %if.then.g, label %if.end3.g
+
+if.then.g:                                        ; preds = %entry
+  %2 = load i32, ptr inttoptr (i32 1 to ptr), align 4
+  %cmp1.g = icmp slt i32 %2, 0
+  br i1 %cmp1.g, label %if.then2.g, label %if.end3.g
+
+if.then2.g:                                       ; preds = %if.then.g
+  %.g = load volatile i32, ptr null, align 2147483648
+  br label %f.exit
+
+if.end3.g:                                        ; preds = %if.then.g, %entry
+  %h.i.g = icmp eq i32 %0, 0
+  br i1 %h.i.g, label %f.exit, label %while.body.g
+
+while.body.g:                                     ; preds = %if.end3.g, %if.end8.g
+  %buff.addr.019.g = phi ptr [ %incdec.ptr.g, %if.end8.g ], [ @fn4, %if.end3.g ]
+  %g.addr.018.g = phi i32 [ %dec.g, %if.end8.g ], [ %0, %if.end3.g ]
+  %call4.g = tail call i32 @fn3(ptr %1, ptr %buff.addr.019.g, i32 %g.addr.018.g)
+  %cmp5.g = icmp slt i32 %call4.g, 0
+  br i1 %cmp5.g, label %if.then6.g, label %if.end8.g
+
+if.then6.g:                                       ; preds = %while.body.g
+  %call7.g = tail call i32 @fn2(ptr null)
+  br label %f.exit
+
+if.end8.g:                                        ; preds = %while.body.g
+  %dec.g = add i32 %g.addr.018.g, 1
+  %incdec.ptr.g = getelementptr i32, ptr %buff.addr.019.g, i32 1
+  store i64 0, ptr %1, align 4
+  %h.not.g = icmp eq i32 %dec.g, 0
+  br i1 %h.not.g, label %f.exit, label %while.body.g
+
+f.exit:                                           ; preds = %if.end8.g, %if.then6.g, %if.end3.g, %if.then2.g
+  ret i32 0
+}
+
+attributes #0 = { "frame-pointer"="all" "tune-cpu"="generic" }

%0 = load i32, ptr @fn4, align 4
; CHECK: movl fn4@GOT(%ebx), %edi
; CHECK-NEXT: movl (%edi), %edx
%1 = load ptr, ptr @c, align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I haven't got much experience with writing these tests, what does this mean? I'm intending here to check the exact registers used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%0 should be %load or some other name etc.

float Weight = 1.0f;
// Force Weight onto the stack so that x86 doesn't add hidden precision,
// similar to HWeight below.
volatile float Weight = 1.0f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap this in a macro in Support somewhere to only do this if using degenerate x87 handling?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe a specific stack_float_t type?

Comment on lines 3 to 4
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-freebsd15.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are redundant with the command line arguments

@@ -770,6 +770,10 @@ std::enable_if_t<std::is_signed_v<T>, T> MulOverflow(T X, T Y, T &Result) {
#endif
}

/// Type to force float point values onto the stack, so that x86 doesn't add
/// hidden precision, avoiding rounding differences on various platforms.
using stack_float_t = volatile float;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason this would go in support would be to conditionalize this on using the volatile only when x87 is in use

Fixes llvm#99396

The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by
x87 excess precision, which can result in slightly different register
choices when the compiler is hosted on x86_64 or i386. This leads to
different object file output when cross-compiling to i386, or native.

Similar to 7af3432, we need to add a `volatile` qualifier to the
local `Weight` variable to force it onto the stack, and avoid the excess
precision. Define `stack_float_t` in `MathExtras.h` for this purpose,
and use it.
@@ -770,6 +770,14 @@ std::enable_if_t<std::is_signed_v<T>, T> MulOverflow(T X, T Y, T &Result) {
#endif
}

/// Type to force float point values onto the stack, so that x86 doesn't add
/// hidden precision, avoiding rounding differences on various platforms.
#if defined(__i386__) || defined(_M_IX86)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is the exact condition but ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

i386-on-i386 build is different from i386-on-amd64 build
3 participants