-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[clang] Add optional pass to remove UBSAN traps using PGO #84214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang] Add optional pass to remove UBSAN traps using PGO #84214
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Vitaly Buka (vitalybuka) ChangesWith #83470 it reduces UBSAN overhead from 52.5% to 12.6%. On real server apps we see 95% of code instrumented, with 10% -> 1.5% We have followup patches to improve it even further. Full diff: https://github.com/llvm/llvm-project/pull/84214.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 056f790d41853d..6f9aa262f3d51e 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -76,6 +76,7 @@
#include "llvm/Transforms/Instrumentation/MemProfiler.h"
#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
+#include "llvm/Transforms/Instrumentation/RemoveTrapsPass.h"
#include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h"
#include "llvm/Transforms/Instrumentation/SanitizerCoverage.h"
#include "llvm/Transforms/Instrumentation/ThreadSanitizer.h"
@@ -83,6 +84,7 @@
#include "llvm/Transforms/Scalar/EarlyCSE.h"
#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/JumpThreading.h"
+#include "llvm/Transforms/Scalar/SimplifyCFG.h"
#include "llvm/Transforms/Utils/Debugify.h"
#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -98,6 +100,10 @@ using namespace llvm;
namespace llvm {
extern cl::opt<bool> PrintPipelinePasses;
+cl::opt<bool> ClRemoveTraps("clang-remove-traps", cl::Optional,
+ cl::desc("Insert remove-traps pass."),
+ cl::init(false));
+
// Experiment to move sanitizers earlier.
static cl::opt<bool> ClSanitizeOnOptimizerEarlyEP(
"sanitizer-early-opt-ep", cl::Optional,
@@ -743,6 +749,17 @@ static void addSanitizers(const Triple &TargetTriple,
// LastEP does not need GlobalsAA.
PB.registerOptimizerLastEPCallback(SanitizersCallback);
}
+
+ if (ClRemoveTraps) {
+ PB.registerOptimizerEarlyEPCallback([&](ModulePassManager &MPM,
+ OptimizationLevel Level) {
+ FunctionPassManager FPM;
+ FPM.addPass(RemoveTrapsPass());
+ FPM.addPass(
+ SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
+ MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
+ });
+ }
}
void EmitAssemblyHelper::RunOptimizationPipeline(
diff --git a/clang/test/CodeGen/remote-traps.cpp b/clang/test/CodeGen/remote-traps.cpp
new file mode 100644
index 00000000000000..7d4eb76a7d81e9
--- /dev/null
+++ b/clang/test/CodeGen/remote-traps.cpp
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -O1 -emit-llvm -fsanitize=signed-integer-overflow -fsanitize-trap=signed-integer-overflow %s -o - | FileCheck %s
+// RUN: %clang_cc1 -O1 -emit-llvm -fsanitize=signed-integer-overflow -fsanitize-trap=signed-integer-overflow -mllvm -clang-remove-traps -mllvm -remove-traps-random-rate=1 %s -o - | FileCheck %s --implicit-check-not="call void @llvm.ubsantrap" --check-prefixes=REMOVE
+
+int f(int x) {
+ return x + 123;
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @_Z1fi(
+// CHECK: call { i32, i1 } @llvm.sadd.with.overflow.i32(
+// CHECK: trap:
+// CHECK-NEXT: call void @llvm.ubsantrap(i8 0)
+// CHECK-NEXT: unreachable, !nosanitize !2
+
+// REMOVE-LABEL: define dso_local noundef i32 @_Z1fi(
+// REMOVE: call { i32, i1 } @llvm.sadd.with.overflow.i32(
|
Please ignore buildkite for now, as the patch depends on #83471 which I uploaded without stacked review. |
is there a long term plan to add a driver flag for this? |
73a4120
to
000854e
Compare
yes, but I'd like to that after we collect feedback from first users |
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
They are introduced by earlier transformations Note: I'd like to have special intrinsic for this optimization. When we have it, we likely don't need this SimplifyCFG. |
lgtm with a comment added on why we're adding the extra SimplifyCFG |
oh I missed that you'd added one. but maybe a TODO with your comment about an intrinsic? |
@vitalybuka the test you added |
looking |
The goal is to have ability to change logic compile time based on PGO data. Our primary application is removing UBSAN checks from hot code. Then we'd like to use this for libc++ hardening and regular debug asserts. Previous attempt is #84214. Benefits from special intrinsic vs #84214: 1. Resulting binary is 3% faster than removing traps (on "test-suite/MultiSource/Benchmarks" with PGO+ThinLTO) 2. Intrinsic can be used from source code to change behavior from C/C++ program. E.g. enabling asserts in cold code. 3. Easier to match basic blocks. RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641 --------- Co-authored-by: Nikita Popov <[email protected]>
With #83471 it reduces UBSAN overhead from 44% to 6%.
Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks"
with PGO+ThinLTO build.
On real large server binary we see 95% of code is still instrumented,
with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only
with subset of UBSAN, so base overhead is smaller.
We have followup patches to improve it even further.