Add Clang attribute to ensure that fields are initialized explicitly #102040

Merged: 13 commits, Jan 14, 2025

@higher-performance (Contributor) commented Aug 5, 2024

This is a new Clang-specific attribute to ensure that field initializations are performed explicitly.

For example, if we have

struct B {
  [[clang::explicit]] int f1;
};

then B b{}; would trigger the following diagnostic:

field 'f1' is left uninitialized, but was marked as requiring initialization

This prevents callers from accidentally forgetting to initialize fields, particularly when new fields are added to the class.
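
A minimal sketch of the intended usage follows. The macro `REQUIRE_EXPLICIT_INIT` is hypothetical and exists only so the example builds on compilers that lack the attribute. It probes for `clang::require_explicit_initialization`, the spelling discussed at the end of this thread and assumed here to be the released one, rather than the `clang::explicit` spelling used in this first revision.

```cpp
#include <cassert>

// Hypothetical helper macro (not part of Clang): expands to the attribute
// when the compiler recognizes it, and to nothing otherwise.
// Assumption: the released spelling is clang::require_explicit_initialization.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::require_explicit_initialization)
#    define REQUIRE_EXPLICIT_INIT [[clang::require_explicit_initialization]]
#  endif
#endif
#ifndef REQUIRE_EXPLICIT_INIT
#  define REQUIRE_EXPLICIT_INIT
#endif

struct B {
  REQUIRE_EXPLICIT_INIT int f1;  // callers must spell out a value for f1
  int f2;                        // unmarked: may be left implicit
};

// Every marked field receives an explicit initializer, so no diagnostic
// is expected here.
inline B make_ok() { return B{42, 0}; }

// With the attribute available, each of these is expected to warn:
//   B b{};  // f1 is value-initialized implicitly
//   B c;    // f1 is left uninitialized
```

If a new marked field is later added to `B`, every call site like `make_ok` that does not mention it would start to warn, which is the motivating case described above.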

Naming:

We are open to alternative names; we would just like their meanings to be clear. For example, must_init, requires_init, etc. are some alternative suggestions that would be fine. However, we would like to avoid a name such as required or must_specify, as their meanings might be unclear or confusing (e.g., due to confusion with requires).

Note:

I'm running into an issue with duplicated diagnostics (see lit tests) that I'm not sure how to properly resolve, but I suspect it revolves around VerifyOnly. If you know the proper fix please let me know.
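
For context on the duplicated diagnostics: Clang's InitListChecker can run in a verify-only mode that validates an initializer list without building the final result, and then run again for real. A diagnostic emitted unconditionally can therefore appear once per pass. The following is a simplified model of that pattern, not Clang code; `Checker`, `Diags`, `check_field`, and `run_two_passes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Simplified stand-in for a diagnostics sink.
using Diags = std::vector<std::string>;

// Hypothetical checker that may run once in verify-only mode and once more
// to do the real work, like a two-pass initializer check.
struct Checker {
  bool verify_only;
  Diags &diags;

  // Unguarded variant: reports on every pass, so two passes give two warnings.
  void check_field_unguarded(const std::string &field, bool initialized) {
    if (!initialized)
      diags.push_back("field '" + field + "' is left uninitialized");
  }

  // Guarded variant: reports only on the pass that is allowed to diagnose.
  void check_field(const std::string &field, bool initialized) {
    if (!initialized && !verify_only)
      diags.push_back("field '" + field + "' is left uninitialized");
  }
};

// Runs both passes over one uninitialized field and returns the number of
// diagnostics produced.
inline std::size_t run_two_passes(bool guarded) {
  Diags diags;
  for (bool verify_only : {true, false}) {
    Checker c{verify_only, diags};
    if (guarded)
      c.check_field("f1", false);
    else
      c.check_field_unguarded("f1", false);
  }
  return diags.size();
}
```

In this model the unguarded variant yields two identical messages and the guarded one yields a single message, which matches the symptom seen in the lit tests.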

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 5, 2024
@llvmbot (Member) commented Aug 5, 2024

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: None (higher-performance)

Changes

This is a new Clang-specific attribute to ensure that field initializations are performed explicitly.

For example, if we have

  struct B {
    [[clang::explicit]] int f1;
  };

---
Full diff: https://github.com/llvm/llvm-project/pull/102040.diff


10 Files Affected:

- (modified) clang/include/clang/AST/CXXRecordDeclDefinitionBits.def (+8) 
- (modified) clang/include/clang/AST/DeclCXX.h (+5) 
- (modified) clang/include/clang/Basic/Attr.td (+8) 
- (modified) clang/include/clang/Basic/AttrDocs.td (+22) 
- (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1) 
- (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+3) 
- (modified) clang/lib/AST/DeclCXX.cpp (+9) 
- (modified) clang/lib/Sema/SemaDeclAttr.cpp (+7) 
- (modified) clang/lib/Sema/SemaInit.cpp (+11) 
- (modified) clang/test/SemaCXX/uninitialized.cpp (+22) 


``````````diff
diff --git a/clang/include/clang/AST/CXXRecordDeclDefinitionBits.def b/clang/include/clang/AST/CXXRecordDeclDefinitionBits.def
index cdf0804680ad0..a782026462566 100644
--- a/clang/include/clang/AST/CXXRecordDeclDefinitionBits.def
+++ b/clang/include/clang/AST/CXXRecordDeclDefinitionBits.def
@@ -119,6 +119,14 @@ FIELD(HasInitMethod, 1, NO_MERGE)
 /// within anonymous unions or structs.
 FIELD(HasInClassInitializer, 1, NO_MERGE)
 
+/// Custom attribute that is True if any field is marked as explicit in a type
+/// without a user-provided default constructor, or if this is the case for any
+/// base classes and/or member variables whose types are aggregates.
+///
+/// In this case, default-construction is diagnosed, as it would not explicitly
+/// initialize the field.
+FIELD(HasUninitializedExplicitFields, 1, NO_MERGE)
+
 /// True if any field is of reference type, and does not have an
 /// in-class initializer.
 ///
diff --git a/clang/include/clang/AST/DeclCXX.h b/clang/include/clang/AST/DeclCXX.h
index bf6a5ce92d438..8228595d84b0f 100644
--- a/clang/include/clang/AST/DeclCXX.h
+++ b/clang/include/clang/AST/DeclCXX.h
@@ -1151,6 +1151,11 @@ class CXXRecordDecl : public RecordDecl {
   /// structs).
   bool hasInClassInitializer() const { return data().HasInClassInitializer; }
 
+  bool hasUninitializedExplicitFields() const {
+    return !isUnion() && !hasUserProvidedDefaultConstructor() &&
+           data().HasUninitializedExplicitFields;
+  }
+
   /// Whether this class or any of its subobjects has any members of
   /// reference type which would make value-initialization ill-formed.
   ///
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 8ac2079099c85..409a7cd9177e8 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1823,6 +1823,14 @@ def Leaf : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def Explicit : InheritableAttr {
+  let Spellings = [Clang<"explicit", 0>];
+  let Subjects = SubjectList<[Field], ErrorDiag>;
+  let Documentation = [ExplicitDocs];
+  let LangOpts = [CPlusPlus];
+  let SimpleHandler = 1;
+}
+
 def LifetimeBound : DeclOrTypeAttr {
   let Spellings = [Clang<"lifetimebound", 0>];
   let Subjects = SubjectList<[ParmVar, ImplicitObjectParameter], ErrorDiag>;
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 94c284fc73158..66d0044aa979d 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -1419,6 +1419,28 @@ is not specified.
   }];
 }
 
+def ExplicitDocs : Documentation {
+  let Category = DocCatField;
+  let Content = [{
+The ``clang::explicit`` attribute indicates that the field must be initialized
+explicitly by the caller when the class is constructed.
+
+Example usage:
+
+.. code-block:: c++
+
+  struct some_aggregate {
+    int x;
+    int y [[clang::explicit]];
+  };
+
+  some_aggregate create() {
+    return {.x = 1};  // error: y is not initialized explicitly
+  }
+
+  }];
+}
+
 def NoUniqueAddressDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 19c3f1e043349..52af3d3a7af39 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -787,6 +787,7 @@ def Trigraphs      : DiagGroup<"trigraphs">;
 def UndefinedReinterpretCast : DiagGroup<"undefined-reinterpret-cast">;
 def ReinterpretBaseClass : DiagGroup<"reinterpret-base-class">;
 def Unicode  : DiagGroup<"unicode">;
+def UninitializedExplicit : DiagGroup<"uninitialized-explicit">;
 def UninitializedMaybe : DiagGroup<"conditional-uninitialized">;
 def UninitializedSometimes : DiagGroup<"sometimes-uninitialized">;
 def UninitializedStaticSelfInit : DiagGroup<"static-self-init">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 581434d33c5c9..fb493e3c4f7f0 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2325,6 +2325,9 @@ def err_init_reference_member_uninitialized : Error<
   "reference member of type %0 uninitialized">;
 def note_uninit_reference_member : Note<
   "uninitialized reference member is here">;
+def warn_field_requires_init : Warning<
+  "field %0 is left uninitialized, but was marked as requiring initialization">,
+  InGroup<UninitializedExplicit>;
 def warn_field_is_uninit : Warning<"field %0 is uninitialized when used here">,
   InGroup<Uninitialized>;
 def warn_base_class_is_uninit : Warning<
diff --git a/clang/lib/AST/DeclCXX.cpp b/clang/lib/AST/DeclCXX.cpp
index 9a3ede426e914..732117ae263c2 100644
--- a/clang/lib/AST/DeclCXX.cpp
+++ b/clang/lib/AST/DeclCXX.cpp
@@ -81,6 +81,7 @@ CXXRecordDecl::DefinitionData::DefinitionData(CXXRecordDecl *D)
       HasPrivateFields(false), HasProtectedFields(false),
       HasPublicFields(false), HasMutableFields(false), HasVariantMembers(false),
       HasOnlyCMembers(true), HasInitMethod(false), HasInClassInitializer(false),
+      HasUninitializedExplicitFields(false),
       HasUninitializedReferenceMember(false), HasUninitializedFields(false),
       HasInheritedConstructor(false), HasInheritedDefaultConstructor(false),
       HasInheritedAssignment(false),
@@ -1108,6 +1109,10 @@ void CXXRecordDecl::addedMember(Decl *D) {
     } else if (!T.isCXX98PODType(Context))
       data().PlainOldData = false;
 
+    if (Field->hasAttr<ExplicitAttr>() && !Field->hasInClassInitializer()) {
+      data().HasUninitializedExplicitFields = true;
+    }
+
     if (T->isReferenceType()) {
       if (!Field->hasInClassInitializer())
         data().HasUninitializedReferenceMember = true;
@@ -1359,6 +1364,10 @@ void CXXRecordDecl::addedMember(Decl *D) {
         if (!FieldRec->hasCopyAssignmentWithConstParam())
           data().ImplicitCopyAssignmentHasConstParam = false;
 
+        if (FieldRec->hasUninitializedExplicitFields() &&
+            FieldRec->isAggregate() && !Field->hasInClassInitializer())
+          data().HasUninitializedExplicitFields = true;
+
         if (FieldRec->hasUninitializedReferenceMember() &&
             !Field->hasInClassInitializer())
           data().HasUninitializedReferenceMember = true;
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 9011fa547638e..b55c845b74528 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -5943,6 +5943,10 @@ static void handleNoMergeAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(NoMergeAttr::Create(S.Context, AL));
 }
 
+static void handleExplicitAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+  D->addAttr(ExplicitAttr::Create(S.Context, AL));
+}
+
 static void handleNoUniqueAddressAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(NoUniqueAddressAttr::Create(S.Context, AL));
 }
@@ -6848,6 +6852,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
   case ParsedAttr::AT_NoMerge:
     handleNoMergeAttr(S, D, AL);
     break;
+  case ParsedAttr::AT_Explicit:
+    handleExplicitAttr(S, D, AL);
+    break;
   case ParsedAttr::AT_NoUniqueAddress:
     handleNoUniqueAddressAttr(S, D, AL);
     break;
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 90fd6df782f09..47b5c4bd388eb 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -743,6 +743,11 @@ void InitListChecker::FillInEmptyInitForField(unsigned Init, FieldDecl *Field,
         ILE->updateInit(SemaRef.Context, Init, Filler);
       return;
     }
+
+    if (Field->hasAttr<ExplicitAttr>()) {
+      SemaRef.Diag(ILE->getExprLoc(), diag::warn_field_requires_init) << Field;
+    }
+
     // C++1y [dcl.init.aggr]p7:
     //   If there are fewer initializer-clauses in the list than there are
     //   members in the aggregate, then each member not explicitly initialized
@@ -4475,6 +4480,12 @@ static void TryConstructorInitialization(Sema &S,
 
   CXXConstructorDecl *CtorDecl = cast<CXXConstructorDecl>(Best->Function);
   if (Result != OR_Deleted) {
+    if (!IsListInit && Kind.getKind() == InitializationKind::IK_Default &&
+        DestRecordDecl != nullptr && DestRecordDecl->isAggregate() &&
+        DestRecordDecl->hasUninitializedExplicitFields()) {
+      S.Diag(Kind.getLocation(), diag::warn_field_requires_init) << "in class";
+    }
+
     // C++11 [dcl.init]p6:
     //   If a program calls for the default initialization of an object
     //   of a const-qualified type T, T shall be a class type with a
diff --git a/clang/test/SemaCXX/uninitialized.cpp b/clang/test/SemaCXX/uninitialized.cpp
index 8a640c9691b32..3a339b4f9c9c4 100644
--- a/clang/test/SemaCXX/uninitialized.cpp
+++ b/clang/test/SemaCXX/uninitialized.cpp
@@ -1472,3 +1472,25 @@ template<typename T> struct Outer {
   };
 };
 Outer<int>::Inner outerinner;
+
+void aggregate() {
+  struct B {
+    [[clang::explicit]] int f1;
+  };
+
+  struct S : B { // expected-warning {{uninitialized}}
+    int f2;
+    int f3 [[clang::explicit]];
+  };
+
+#if __cplusplus >= 202002L
+  S a({}, 0);  // expected-warning {{'f1' is left uninitialized}} expected-warning {{'f3' is left uninitialized}}
+#endif
+  S b{.f3 = 1}; // expected-warning {{'f1' is left uninitialized}}
+  S c{.f2 = 5}; // expected-warning {{'f1' is left uninitialized}} expected-warning {{'f3' is left uninitialized}} expected-warning {{'f3' is left uninitialized}}
+  c = {{}, 0};  // expected-warning {{'f1' is left uninitialized}} expected-warning {{'f3' is left uninitialized}}
+  S d; // expected-warning {{uninitialized}} expected-note {{constructor}}
+  (void)b;
+  (void)c;
+  (void)d;
+}
``````````
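
The propagation rule in the diff above can be read as follows. A record is flagged when it has a marked field without an in-class initializer, or when a field of aggregate class type is itself flagged and has no in-class initializer. The accessor then masks the flag for unions and for classes with a user-provided default constructor. A simplified model of that rule, using hypothetical `Record` and `Field` types rather than Clang's AST classes:

```cpp
#include <cassert>

struct Record;

// Hypothetical stand-in for a field declaration.
struct Field {
  bool marked_explicit;       // carries the attribute
  bool has_in_class_init;     // e.g. `int x = 0;`
  const Record *record_type;  // non-null when the field has class type
};

// Hypothetical stand-in for a class definition.
struct Record {
  bool is_union = false;
  bool is_aggregate = true;
  bool user_provided_default_ctor = false;
  bool flag = false;  // models HasUninitializedExplicitFields

  // Mirrors the accessor added to DeclCXX.h.
  bool has_uninitialized_explicit_fields() const {
    return !is_union && !user_provided_default_ctor && flag;
  }

  // Mirrors the two updates made in CXXRecordDecl::addedMember.
  void add_field(const Field &f) {
    // Direct case: a marked field with no in-class initializer.
    if (f.marked_explicit && !f.has_in_class_init)
      flag = true;
    // Nested case: a field whose aggregate type is itself flagged.
    if (f.record_type && f.record_type->has_uninitialized_explicit_fields() &&
        f.record_type->is_aggregate && !f.has_in_class_init)
      flag = true;
  }
};
```

The model covers only the field cases visible in this revision of the diff; base-class handling, mentioned in the doc comment, is not modeled.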

@cor3ntin (Contributor) commented Aug 5, 2024

@AaronBallman I think this is worthy of an RFC, WDYT?

@AaronBallman (Collaborator) commented

Thank you for this!

> @AaronBallman I think this is worthy of an RFC, WDYT?

Yes, this should definitely get an RFC. Some things worth discussing in the RFC:

  • Is there a larger design here beyond just fields? e.g., what about local variables?
  • Should there be a class-level attribute for cases where you want all fields to be handled the same way?
  • Why is the default to assume all fields don't require initialization? Would it make more sense to have a driver flag to opt in to a mode where it's an error to not initialize a field unless it has an attribute saying it's okay to leave it uninitialized?
  • Does there need to be a way to silence the diagnostic on a case-by-case basis? e.g., the field should always be initialized explicitly except in one special case that needs to opt out of that behavior

(I'm sure there are other questions, but basically, it's good to have a big-picture understanding of why a particular design is the way you think we should go.)

@higher-performance higher-performance force-pushed the required-fields branch 2 times, most recently from f09b427 to 5e03c06 Compare August 8, 2024 18:30
@higher-performance (Contributor, Author) commented

I updated the PR to change the attribute name from explicit to explicit_init and to clarify the error message that initializations must be explicit.

@AaronBallman are we good to move forward?

@ilya-biryukov (Contributor) left a comment

I hope folks are OK with me chiming in as a reviewer for this.
I've left quite a few comments in the RFC; I am also supportive of landing this change and happy to invest in supporting it going forward within our team.

@higher-performance higher-performance force-pushed the required-fields branch 2 times, most recently from 002027b to 1695451 Compare August 30, 2024 18:09
github-actions bot commented Aug 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@higher-performance higher-performance force-pushed the required-fields branch 2 times, most recently from 9833f57 to dc56548 Compare September 4, 2024 18:18
@ilya-biryukov (Contributor) left a comment

I think there are some good parallel discussions happening in the RFC, but regardless of their outcomes, we could probably update the PR to capture the current behavior in those interesting cases.

I left a few comments along those lines, PTAL.

@higher-performance higher-performance force-pushed the required-fields branch 4 times, most recently from 4319585 to 9335d80 Compare September 9, 2024 21:45
@ilya-biryukov (Contributor) left a comment

See my comment about VerifyOnly and duplicate diagnostics.
The rest are small NITs.

C a; // expected-warning {{not explicitly initialized}}
(void)a;
#endif
D b{.f2 = 1}; // expected-warning {{'x' is not explicitly initialized}} expected-warning {{'q' is not explicitly initialized}}
Contributor:

As an idea for future improvements: we could also collect all uninitialized fields and emit a single diagnostic that lists them all (with notes to the locations of the fields).

However, I think this is good enough for the first version, I don't necessarily feel we should do it right away.

Contributor, Author:

Yeah that's more complicated than I can invest in right now, it's something we can do in the future though.
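
The future improvement suggested above could look roughly like the following: gather every marked field that lacks an initializer and report them in one message. This is a standalone illustration with hypothetical names (`FieldInfo`, `summarize_uninitialized`) and a made-up message format, not the approach taken in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one field at an initialization site.
struct FieldInfo {
  std::string name;
  bool requires_explicit_init;
  bool explicitly_initialized;
};

// Returns one combined message, or an empty string when nothing is missing.
inline std::string
summarize_uninitialized(const std::vector<FieldInfo> &fields) {
  std::string names;
  for (const FieldInfo &f : fields) {
    // Skip fields that are unmarked or already initialized explicitly.
    if (!f.requires_explicit_init || f.explicitly_initialized)
      continue;
    if (!names.empty())
      names += ", ";
    names += "'" + f.name + "'";
  }
  if (names.empty())
    return "";
  return "fields " + names + " are not explicitly initialized";
}
```

A real implementation would additionally attach a note per field pointing at its declaration, as the comment suggests.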

@ilya-biryukov (Contributor) commented

@AaronBallman @cor3ntin I believe we are getting close to finalizing this PR.
Would you be okay with this feature landing and myself approving this when it's ready?

There was some discussion here and in the RFC, but I don't think there was explicit approval (or objection) to land this, so I wanted to clarify this.

@cor3ntin (Contributor) left a comment

I don't think the RFC has reached its conclusion yet, and consensus has not been called (for example, I still need to think about whether my questions were addressed), so we should wait for the RFC process before continuing with this PR.

Thanks

@ilya-biryukov (Contributor) commented

> I don't think the RFC has reached its conclusion yet, and consensus has not been called (for example, I still need to think about whether my questions were addressed), so we should wait for the RFC process before continuing with this PR.
>
> Thanks

Thanks for explicitly calling this out. There were no replies from you on the RFC for some time, so it was unclear whether there is anything left. We will be waiting for your feedback on Discourse.

@cor3ntin (Contributor) commented Sep 13, 2024 via email

@higher-performance (Contributor, Author) commented

@AaronBallman Oh no worries at all. I'm actually struggling with something else -- apparently C++20 aggregate initialization with parentheses isn't handled correctly, and I'm struggling to get it to work. I added a bunch of test cases that all fail... any chance you know how these need to be handled?

I tried adding

  if (Args.empty()) {
    if (FieldDecl *FD = dyn_cast_if_present<FieldDecl>(Entity.getDecl())) {
      if (FD->hasAttr<ExplicitInitAttr>()) {
        S.Diag(Kind.getLocation(), diag::warn_field_requires_explicit_init)
            << /* Var-in-Record */ 0 << FD;
        S.Diag(FD->getLocation(), diag::note_entity_declared_at) << FD;
      }
    }
  }

to the beginning of InitializationSequence::Perform, but that also duplicated some of the errors, and still failed to handle some other cases (I think value-initialization?)...
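
For readers unfamiliar with the feature in question: since C++20 an aggregate can be initialized with parentheses, and trailing members that receive no argument are value-initialized, just as with a shorter braced list. Going by the discussion here, that path is handled separately from list initialization and so needs its own check. A small illustration follows; the struct `P` and both functions are hypothetical, and the parenthesized form is guarded by the standard feature-test macro so the example still builds before C++20.

```cpp
#include <cassert>

struct P {
  int x;
  int y;  // imagine this field carrying the attribute
};

// Braced form: y gets no initializer clause and is value-initialized to 0.
inline P make_braced() { return P{1}; }

#if defined(__cpp_aggregate_paren_init) && __cpp_aggregate_paren_init >= 201902L
// C++20 parenthesized form: y is likewise value-initialized to 0, without
// any braced initializer list being involved.
inline P make_paren() {
  P p(1);
  return p;
}
#else
// Fallback for pre-C++20 compilers so the example still builds.
inline P make_paren() { return make_braced(); }
#endif
```

In both forms `y` is silently zeroed, which is the situation the attribute is meant to flag.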

@ilya-biryukov (Contributor) commented Nov 25, 2024

re the C++20 init: the code that handles this is here:

// We've processed all of the args, but there are still members that

You will probably have to update it accordingly.

Ideally this code would be shared with FillInEmptyInitForField, but it is not.
Looking at both places, I suspect we risk missing some functionality for C++20 parenthesized initializers. For example, in the future, when adding code similar to

SemaRef.checkInitializerLifetime(MemberEntity, DIE.get());

it is easy to imagine that we would add that to list initializers as the most commonly used pattern and forget to add it for C++20 parenthesized initialization.

@higher-performance (Contributor, Author) commented

Yup that's basically what I tried, but it seems to miss some cases and duplicate others. Hmm..

@higher-performance (Contributor, Author) commented

Looks like the tests finally all pass with my last fix. Could we merge? :) Happy new year!

@higher-performance (Contributor, Author) commented

Hi @AaronBallman~ would you have a chance to do a final review? :)

@AaronBallman (Collaborator) left a comment

LGTM, thank you for your patience on the review!

@higher-performance (Contributor, Author) commented

Fantastic, thank you! Could we merge it?

@AaronBallman AaronBallman merged commit 1594413 into llvm:main Jan 14, 2025
9 checks passed
@AaronBallman (Collaborator) commented

> Fantastic, thank you! Could we merge it?

heh, thanks for the poke, I didn't remember you didn't have commit access. I've merged now.

@higher-performance higher-performance deleted the required-fields branch January 14, 2025 18:32
@higher-performance (Contributor, Author) commented

Ah haha I see, thanks! Looking forward to getting it sometime, that would definitely be helpful :)

@AaronBallman (Collaborator) commented

> Ah haha I see, thanks! Looking forward to getting it sometime, that would definitely be helpful :)

FWIW, I think you're fine to ask for commit privs now: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

@higher-performance (Contributor, Author) commented Jan 14, 2025

@AaronBallman I... just noticed that the name clang::requires_explicit_initialization is subtly different from clang::require_constant_initialization. (requires vs. require.) Is it worth making it consistent before people start using it? I can submit a PR quickly if so.
(Edit: Opened #122947 for this, sorry for the oversight.)

Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
6 participants