Skip to content

[HWASAN] Implement selective instrumentation based on profiling information #83503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 1, 2024

Conversation

kstoimenov
Copy link
Contributor

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Kirill Stoimenov (kstoimenov)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/83503.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp (+49-2)
diff --git a/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
index 33add6d4cd767b..bf59f10723c2db 100644
--- a/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
@@ -15,11 +15,14 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/StackSafetyAnalysis.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -177,6 +180,22 @@ static cl::opt<bool> ClWithTls(
              "platforms that support this"),
     cl::Hidden, cl::init(true));
 
+static cl::opt<bool>
+    CSkipHotCode("hwasan-skip-hot-code",
+                 cl::desc("Do not instument hot functions based on FDO."),
+                 cl::Hidden, cl::init(false));
+
+static cl::opt<int> HotPercentileCutoff("hwasan-percentile-cutoff-hot",
+                                        cl::init(0));
+
+STATISTIC(NumTotalFuncs, "Number of funcs seen by HWASAN");
+STATISTIC(NumHwasanCtors, "Number of HWASAN ctors");
+STATISTIC(NumNoSanitizeFuncs, "Number of no-sanitize HWASAN funcs");
+STATISTIC(NumConsideredFuncs, "Number of funcs considered for HWASAN");
+STATISTIC(NumInstrumentedFuncs, "Number of HWASAN instrumented funcs");
+STATISTIC(NumNoProfileSummaryFuncs, "Number of HWASAN funcs without PS");
+STATISTIC(NumSkippedHotFuncs, "Number of skipped hot HWASAN funcs");
+
 // Mode for selecting how to insert frame record info into the stack ring
 // buffer.
 enum RecordStackHistoryMode {
@@ -1501,11 +1520,39 @@ bool HWAddressSanitizer::instrumentStack(memtag::StackInfo &SInfo,
 
 void HWAddressSanitizer::sanitizeFunction(Function &F,
                                           FunctionAnalysisManager &FAM) {
-  if (&F == HwasanCtorFunction)
+  NumTotalFuncs++;
+  if (&F == HwasanCtorFunction) {
+    NumHwasanCtors++;
     return;
+  }
 
-  if (!F.hasFnAttribute(Attribute::SanitizeHWAddress))
+  if (!F.hasFnAttribute(Attribute::SanitizeHWAddress)) {
+    NumNoSanitizeFuncs++;
     return;
+  }
+
+  NumConsideredFuncs++;
+  if (CSkipHotCode) {
+    auto &MAMProxy = FAM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
+    ProfileSummaryInfo *PSI =
+        MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
+    if (PSI != nullptr && PSI->hasProfileSummary()) {
+      auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);
+      const bool is_hot =
+          (HotPercentileCutoff.getNumOccurrences() && HotPercentileCutoff >= 0)
+              ? PSI->isFunctionHotInCallGraphNthPercentile(HotPercentileCutoff,
+                                                           &F, BFI)
+              : PSI->isFunctionEntryHot(&F);
+
+      if (is_hot) {
+        ++NumSkippedHotFuncs;
+        return;
+      }
+    } else {
+      ++NumNoProfileSummaryFuncs;
+    }
+  }
+  NumInstrumentedFuncs++;
 
   LLVM_DEBUG(dbgs() << "Function: " << F.getName() << "\n");
 

@llvmbot
Copy link
Member

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Kirill Stoimenov (kstoimenov)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/83503.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp (+49-2)
diff --git a/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
index 33add6d4cd767b..bf59f10723c2db 100644
--- a/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
@@ -15,11 +15,14 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/BlockFrequencyInfo.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/StackSafetyAnalysis.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -177,6 +180,22 @@ static cl::opt<bool> ClWithTls(
              "platforms that support this"),
     cl::Hidden, cl::init(true));
 
+static cl::opt<bool>
+    CSkipHotCode("hwasan-skip-hot-code",
+                 cl::desc("Do not instument hot functions based on FDO."),
+                 cl::Hidden, cl::init(false));
+
+static cl::opt<int> HotPercentileCutoff("hwasan-percentile-cutoff-hot",
+                                        cl::init(0));
+
+STATISTIC(NumTotalFuncs, "Number of funcs seen by HWASAN");
+STATISTIC(NumHwasanCtors, "Number of HWASAN ctors");
+STATISTIC(NumNoSanitizeFuncs, "Number of no-sanitize HWASAN funcs");
+STATISTIC(NumConsideredFuncs, "Number of funcs considered for HWASAN");
+STATISTIC(NumInstrumentedFuncs, "Number of HWASAN instrumented funcs");
+STATISTIC(NumNoProfileSummaryFuncs, "Number of HWASAN funcs without PS");
+STATISTIC(NumSkippedHotFuncs, "Number of skipped hot HWASAN funcs");
+
 // Mode for selecting how to insert frame record info into the stack ring
 // buffer.
 enum RecordStackHistoryMode {
@@ -1501,11 +1520,39 @@ bool HWAddressSanitizer::instrumentStack(memtag::StackInfo &SInfo,
 
 void HWAddressSanitizer::sanitizeFunction(Function &F,
                                           FunctionAnalysisManager &FAM) {
-  if (&F == HwasanCtorFunction)
+  NumTotalFuncs++;
+  if (&F == HwasanCtorFunction) {
+    NumHwasanCtors++;
     return;
+  }
 
-  if (!F.hasFnAttribute(Attribute::SanitizeHWAddress))
+  if (!F.hasFnAttribute(Attribute::SanitizeHWAddress)) {
+    NumNoSanitizeFuncs++;
     return;
+  }
+
+  NumConsideredFuncs++;
+  if (CSkipHotCode) {
+    auto &MAMProxy = FAM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
+    ProfileSummaryInfo *PSI =
+        MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
+    if (PSI != nullptr && PSI->hasProfileSummary()) {
+      auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);
+      const bool is_hot =
+          (HotPercentileCutoff.getNumOccurrences() && HotPercentileCutoff >= 0)
+              ? PSI->isFunctionHotInCallGraphNthPercentile(HotPercentileCutoff,
+                                                           &F, BFI)
+              : PSI->isFunctionEntryHot(&F);
+
+      if (is_hot) {
+        ++NumSkippedHotFuncs;
+        return;
+      }
+    } else {
+      ++NumNoProfileSummaryFuncs;
+    }
+  }
+  NumInstrumentedFuncs++;
 
   LLVM_DEBUG(dbgs() << "Function: " << F.getName() << "\n");
 

Copy link

github-actions bot commented Feb 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@kstoimenov kstoimenov merged commit e7c3cd2 into llvm:main Mar 1, 2024
@Enna1
Copy link
Contributor

Enna1 commented Mar 1, 2024

Hi @kstoimenov @vitalybuka

I have some high level questions:

  1. What is the motivation of profile information based selective instrumentation? is it about code size?
  2. Will the profile information based selective instrumentation be implemented for other sanitizers(asan, msan, tsan)?
  3. Will the profile information based selective instrumentation cause some false negatives or false positives?
    IIUR, lets say the compilation unit where the heap-buffer-overflow happens is not instrumented, then there is no hwasan check inserted, so false negative may happen?

Thanks!

edited:
Just noticed this ubsan patch: #83471

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 1, 2024

@kstoimenov @vitalybuka This appears to be breaking some bots:

https://lab.llvm.org/buildbot/#/builders/124/builds/9846
Failed Tests (5):
LLVM :: CodeGen/AArch64/shufflevector.ll
LLVM :: CodeGen/Hexagon/loop_align_count.ll
LLVM :: CodeGen/Hexagon/loop_align_count.mir
LLVM :: Instrumentation/HWAddressSanitizer/pgo-opt-out-no-ps.ll
LLVM :: Instrumentation/HWAddressSanitizer/pgo-opt-out.ll

kstoimenov added a commit that referenced this pull request Mar 1, 2024
@kstoimenov
Copy link
Contributor Author

I reverted it with f6f79d4.

kstoimenov added a commit that referenced this pull request Mar 4, 2024
…ing information (#83503)"

Added REQUIRES: asserts, which should fix the build problem.

This reverts commit f6f79d4.
@kstoimenov
Copy link
Contributor Author

Reapplied in 5675447.

@vitalybuka
Copy link
Collaborator

Hi @kstoimenov @vitalybuka

I have some high level questions:

  1. What is the motivation of profile information based selective instrumentation? is it about code size?

There are use cases when we can't run 100% of instrumentation because of unacceptable performance overhead. However if we opt out hot code we can still keep most of coverage with minimal loss.

  1. Will the profile information based selective instrumentation be implemented for other sanitizers(asan, msan, tsan)?

ubsan, hwasan only.

To avoid false positives msan, tsan even with no_sanitize attribute require some instrumentation, so perf overhead will be smaller, but still significant. So I think it's infeasible here.

asan for good detection requires allocator quarantine and fake stack. So it may benefit, but still expensive. We don't have plans on that, but anyone is welcome to try to port this patch to Asan.

  1. Will the profile information based selective instrumentation cause some false negatives or false positives?
    IIUR, lets say the compilation unit where the heap-buffer-overflow happens is not instrumented, then there is no hwasan check inserted, so false negative may happen?

non-instrumented code with hwasan, like with asan, will not cause false positive.
Some false negative are expected by definition, if code involved into the bug is hot and instrumentation is omitted.

Thanks!

edited: Just noticed this ubsan patch: #83471

kstoimenov added a commit that referenced this pull request Mar 7, 2024
…3942)

1. Change tests to use IR instead of -stats to avoid depending on Debug
mode
2. Add SkipInstrumentationRandomRate 
3. Remove HWASAN from stat strings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants