Skip to content

[DebugInfo] Fix duplicate DIFile when main file is preprocessed #75022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 12, 2023

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Dec 11, 2023

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i => a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.

When the main file is preprocessed and we change `MainFileName` to the
original source file name (e.g. `a.i => a.c`), the source manager does
not contain `a.c`, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

```
% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"
```

Fix llvm#56378 by disassociating md5(a.i) with a.c.
@MaskRay MaskRay requested review from pogo59, dwblaikie and rnk December 11, 2023 03:55
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo labels Dec 11, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-clang-codegen

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i => a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-debuginfo

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i =&gt; a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i &amp;&amp; clang -g -S a.i &amp;&amp; clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-clang

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i =&gt; a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i &amp;&amp; clang -g -S a.i &amp;&amp; clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, thanks!

@MaskRay MaskRay merged commit 831484e into llvm:main Dec 12, 2023
@MaskRay MaskRay deleted the clang-debug-file0 branch December 12, 2023 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category debuginfo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang: "-save-temps -g" option leads to "warning: inconsistent use of MD5 checksums"
3 participants