[Clang] Make the result type of sizeof/pointer subtraction/size_t lit… #136542

Closed
wants to merge 11 commits into from

Conversation

YexuanXiao

…erals be typedefs instead of built-in types

This includes the results of `sizeof`, `sizeof...`, `__datasizeof`, `__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and signed `size_t` literals, as well as the results of pointer-pointer subtraction. It does not affect any program output except for debugging information. The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics.
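As a rough illustration of which expressions are affected, the following standalone C++ sketch checks at compile time that several of the listed expressions already have the types `std::size_t` and `std::ptrdiff_t` at the language level. The names `PackSize` and `PointerDiff` are hypothetical helpers introduced only for this example; they are not part of the PR. The literal checks are guarded because the `uz`/`z` suffixes are only available in C++23 mode.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: exposes the type of a sizeof... expression.
template <typename... Ts>
struct PackSize {
    using type = decltype(sizeof...(Ts));
};

// Hypothetical alias: the type of subtracting two pointers (unevaluated).
using PointerDiff =
    decltype(static_cast<int*>(nullptr) - static_cast<int*>(nullptr));

// sizeof, alignof and sizeof... all yield std::size_t.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields std::size_t");
static_assert(std::is_same<decltype(alignof(double)), std::size_t>::value,
              "alignof yields std::size_t");
static_assert(std::is_same<PackSize<int, char>::type, std::size_t>::value,
              "sizeof... yields std::size_t");

// Pointer-pointer subtraction yields std::ptrdiff_t.
static_assert(std::is_same<PointerDiff, std::ptrdiff_t>::value,
              "pointer subtraction yields std::ptrdiff_t");

#if defined(__cpp_size_t_suffix)
// size_t literals and signed size_t literals (C++23 only).
static_assert(std::is_same<decltype(0uz), std::size_t>::value,
              "uz literal is std::size_t");
static_assert(std::is_same<decltype(0z),
                           std::make_signed<std::size_t>::type>::value,
              "z literal is the signed counterpart of std::size_t");
#endif
```

These checks pass with or without the PR, since a typedef and its underlying type are the same type; the PR only concerns which name the AST records for that type.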

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 21, 2025
@llvmbot
Member

llvmbot commented Apr 21, 2025

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-coroutines
@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: YexuanXiao (YexuanXiao)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/136542.diff

5 Files Affected:

  • (modified) clang/include/clang/AST/ASTContext.h (+4)
  • (modified) clang/lib/AST/ASTContext.cpp (+29)
  • (modified) clang/lib/AST/ComparisonCategories.cpp (+5-25)
  • (modified) clang/lib/AST/ExprCXX.cpp (+4-2)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+26-8)
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 3c78833a3f069..0c133d45d3f5e 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -2442,6 +2442,10 @@ class ASTContext : public RefCountedBase<ASTContext> {
   QualType GetBuiltinType(unsigned ID, GetBuiltinTypeError &Error,
                           unsigned *IntegerConstantArgs = nullptr) const;
 
+  QualType getCGlobalCXXStdNSTypedef(const NamespaceDecl *StdNS,
+                                     StringRef DefName,
+                                     QualType FallBack = {}) const;
+
   /// Types and expressions required to build C++2a three-way comparisons
   /// using operator<=>, including the values return by builtin <=> operators.
   ComparisonCategories CompCategories;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 2836d68b05ff6..aa8ce0078d4d3 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -12556,6 +12556,35 @@ QualType ASTContext::GetBuiltinType(unsigned Id,
   return getFunctionType(ResType, ArgTypes, EPI);
 }
 
+QualType ASTContext::getCGlobalCXXStdNSTypedef(const NamespaceDecl *StdNS,
+                                               StringRef DefName,
+                                               QualType FallBack) const {
+  DeclContextLookupResult Lookup;
+  if (getLangOpts().C99) {
+    Lookup = getTranslationUnitDecl()->lookup(&Idents.get(DefName));
+  } else if (getLangOpts().CPlusPlus) {
+    if (StdNS == nullptr) {
+      auto LookupStdNS = getTranslationUnitDecl()->lookup(&Idents.get("std"));
+      if (!LookupStdNS.empty()) {
+        StdNS = dyn_cast<NamespaceDecl>(LookupStdNS.front());
+      }
+    }
+    if (StdNS) {
+      Lookup = StdNS->lookup(&Idents.get(DefName));
+    } else {
+      Lookup = getTranslationUnitDecl()->lookup(&Idents.get(DefName));
+    }
+  }
+  if (!Lookup.empty()) {
+    if (auto *TD = dyn_cast<TypedefNameDecl>(Lookup.front())) {
+      if (auto Result = getTypeDeclType(TD); !Result.isNull()) {
+        return Result;
+      }
+    }
+  }
+  return FallBack;
+}
+
 static GVALinkage basicGVALinkageForFunction(const ASTContext &Context,
                                              const FunctionDecl *FD) {
   if (!FD->isExternallyVisible())
diff --git a/clang/lib/AST/ComparisonCategories.cpp b/clang/lib/AST/ComparisonCategories.cpp
index 28244104d6636..46dcd6ac4261d 100644
--- a/clang/lib/AST/ComparisonCategories.cpp
+++ b/clang/lib/AST/ComparisonCategories.cpp
@@ -87,37 +87,17 @@ ComparisonCategoryInfo::ValueInfo *ComparisonCategoryInfo::lookupValueInfo(
   return &Objects.back();
 }
 
-static const NamespaceDecl *lookupStdNamespace(const ASTContext &Ctx,
-                                               NamespaceDecl *&StdNS) {
-  if (!StdNS) {
-    DeclContextLookupResult Lookup =
-        Ctx.getTranslationUnitDecl()->lookup(&Ctx.Idents.get("std"));
-    if (!Lookup.empty())
-      StdNS = dyn_cast<NamespaceDecl>(Lookup.front());
-  }
-  return StdNS;
-}
-
-static const CXXRecordDecl *lookupCXXRecordDecl(const ASTContext &Ctx,
-                                                const NamespaceDecl *StdNS,
-                                                ComparisonCategoryType Kind) {
-  StringRef Name = ComparisonCategories::getCategoryString(Kind);
-  DeclContextLookupResult Lookup = StdNS->lookup(&Ctx.Idents.get(Name));
-  if (!Lookup.empty())
-    if (const CXXRecordDecl *RD = dyn_cast<CXXRecordDecl>(Lookup.front()))
-      return RD;
-  return nullptr;
-}
-
 const ComparisonCategoryInfo *
 ComparisonCategories::lookupInfo(ComparisonCategoryType Kind) const {
   auto It = Data.find(static_cast<char>(Kind));
   if (It != Data.end())
     return &It->second;
-
-  if (const NamespaceDecl *NS = lookupStdNamespace(Ctx, StdNS))
-    if (const CXXRecordDecl *RD = lookupCXXRecordDecl(Ctx, NS, Kind))
+  if (auto QT = Ctx.getCGlobalCXXStdNSTypedef(
+          nullptr, ComparisonCategories::getCategoryString(Kind));
+      !QT.isNull()) {
+    if (const auto *RD = QT->getAsCXXRecordDecl())
       return &Data.try_emplace((char)Kind, Ctx, RD, Kind).first->second;
+  }
 
   return nullptr;
 }
diff --git a/clang/lib/AST/ExprCXX.cpp b/clang/lib/AST/ExprCXX.cpp
index 169f11b611066..306ddcb9f491a 100644
--- a/clang/lib/AST/ExprCXX.cpp
+++ b/clang/lib/AST/ExprCXX.cpp
@@ -1700,8 +1700,10 @@ SizeOfPackExpr *SizeOfPackExpr::Create(ASTContext &Context,
                                        ArrayRef<TemplateArgument> PartialArgs) {
   void *Storage =
       Context.Allocate(totalSizeToAlloc<TemplateArgument>(PartialArgs.size()));
-  return new (Storage) SizeOfPackExpr(Context.getSizeType(), OperatorLoc, Pack,
-                                      PackLoc, RParenLoc, Length, PartialArgs);
+  return new (Storage) SizeOfPackExpr(
+      Context.getCGlobalCXXStdNSTypedef(nullptr, "size_t",
+                                        Context.getSizeType()),
+      OperatorLoc, Pack, PackLoc, RParenLoc, Length, PartialArgs);
 }
 
 SizeOfPackExpr *SizeOfPackExpr::CreateDeserialized(ASTContext &Context,
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 01a021443c94f..d07be9f117957 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -4026,10 +4026,20 @@ ExprResult Sema::ActOnNumericConstant(const Token &Tok, Scope *UDLScope) {
         // Does it fit in size_t?
         if (ResultVal.isIntN(SizeTSize)) {
           // Does it fit in ssize_t?
-          if (!Literal.isUnsigned && ResultVal[SizeTSize - 1] == 0)
-            Ty = Context.getSignedSizeType();
-          else if (AllowUnsigned)
-            Ty = Context.getSizeType();
+          if (!Literal.isUnsigned && ResultVal[SizeTSize - 1] == 0) {
+            auto SignedSize = Context.getSignedSizeType();
+            if (auto PtrDiff = Context.getCGlobalCXXStdNSTypedef(
+                    getStdNamespace(), "ptrdiff_t");
+                Context.hasSameType(PtrDiff, SignedSize))
+              Ty = PtrDiff;
+            else if (auto SSize = Context.getCGlobalCXXStdNSTypedef(
+                         getStdNamespace(), "ssize_t");
+                     Context.hasSameType(SSize, SignedSize))
+              Ty = SSize;
+          } else if (AllowUnsigned) {
+            Ty = Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "size_t",
+                                                   Context.getSizeType());
+          }
           Width = SizeTSize;
         }
       }
@@ -4702,7 +4712,10 @@ ExprResult Sema::CreateUnaryExprOrTypeTraitExpr(TypeSourceInfo *TInfo,
 
   // C99 6.5.3.4p4: the type (an unsigned integer type) is size_t.
   return new (Context) UnaryExprOrTypeTraitExpr(
-      ExprKind, TInfo, Context.getSizeType(), OpLoc, R.getEnd());
+      ExprKind, TInfo,
+      Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "size_t",
+                                        Context.getSizeType()),
+      OpLoc, R.getEnd());
 }
 
 ExprResult
@@ -4745,7 +4758,10 @@ Sema::CreateUnaryExprOrTypeTraitExpr(Expr *E, SourceLocation OpLoc,
 
   // C99 6.5.3.4p4: the type (an unsigned integer type) is size_t.
   return new (Context) UnaryExprOrTypeTraitExpr(
-      ExprKind, E, Context.getSizeType(), OpLoc, E->getSourceRange().getEnd());
+      ExprKind, E,
+      Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "size_t",
+                                        Context.getSizeType()),
+      OpLoc, E->getSourceRange().getEnd());
 }
 
 ExprResult
@@ -11353,8 +11369,10 @@ QualType Sema::CheckSubtractionOperands(ExprResult &LHS, ExprResult &RHS,
         }
       }
 
-      if (CompLHSTy) *CompLHSTy = LHS.get()->getType();
-      return Context.getPointerDiffType();
+      if (CompLHSTy)
+        *CompLHSTy = LHS.get()->getType();
+      return Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "ptrdiff_t",
+                                               Context.getPointerDiffType());
     }
   }
 

Context.hasSameType(SSize, SignedSize))
Ty = SSize;
} else if (AllowUnsigned) {
Ty = Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "size_t",
Contributor

Can we add a test file to verify that size_t is shown instead of unsigned long/unsigned long long?

Author

I'm looking into it.

Comment on lines 4764 to 4766
Context.getCGlobalCXXStdNSTypedef(getStdNamespace(), "size_t",
Context.getSizeType()),
OpLoc, E->getSourceRange().getEnd());
Contributor

Mentioning getStdNamespace for C is really not great. Couldn't we merge getCGlobalCXXStdNSTypedef into Context.getSizeType?

Author

Is it appropriate to change it to getLangOpts().CPlusPlus ? getStdNamespace() : nullptr? I’m concerned that moving it into getSizeType might introduce unintended effects.

Collaborator

@AaronBallman AaronBallman left a comment

Thank you for this!

I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.

Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.

@YexuanXiao
Author

YexuanXiao commented Apr 21, 2025

> Thank you for this!
>
> I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
>
> Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.

The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.

@AaronBallman
Collaborator

> > Thank you for this!
> >
> > I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
> >
> > Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.
>
> The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.

Yes, but this doesn't exactly accomplish that. In C, you'll still get the underlying integer type unless there happens to be a typedef we can find, right? So you can spot a difference between:

sizeof(int); // returns an unsigned integer type
#include <stddef.h>
sizeof(int); // now returns a typedef to an unsigned integer type

right?
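The concern above can be modelled outside of Clang with a small sketch. Everything here is hypothetical and only illustrative: `TypedefEntry`, `sameText`, `isDeclared`, `displayedSizeType` and `kStddefTypedefs` are invented names, not Clang APIs, and the "visible typedefs" array merely stands in for what name lookup would find at a given point in a translation unit. Under those assumptions, the lookup-with-fallback approach prints a different name for the same type depending on whether a `size_t` typedef has been declared yet.

```cpp
#include <cstddef>

// Hypothetical model of one declared typedef: its name and what it aliases.
struct TypedefEntry {
    const char* name;
    const char* underlying;
};

// Constexpr string equality, so the model can be checked at compile time.
constexpr bool sameText(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// True if `name` is among the `count` typedefs visible so far.
constexpr bool isDeclared(const TypedefEntry* visible, std::size_t count,
                          const char* name) {
    for (std::size_t i = 0; i < count; ++i)
        if (sameText(visible[i].name, name))
            return true;
    return false;
}

// Name a tool would show for the type of sizeof under lookup-with-fallback:
// the typedef if one is visible, otherwise the built-in spelling.
constexpr const char* displayedSizeType(const TypedefEntry* visible,
                                        std::size_t count,
                                        const char* builtinSpelling) {
    return isDeclared(visible, count, "size_t") ? "size_t" : builtinSpelling;
}

// Stand-in for what <stddef.h> declares (the underlying types are assumed).
constexpr TypedefEntry kStddefTypedefs[] = {
    {"size_t", "unsigned long"},
    {"ptrdiff_t", "long"},
};

// Before the include: nothing is visible, so the built-in spelling is shown.
static_assert(sameText(displayedSizeType(nullptr, 0, "unsigned long"),
                       "unsigned long"),
              "falls back to the built-in spelling");
// After the include: the same expression is now displayed as size_t.
static_assert(sameText(displayedSizeType(kStddefTypedefs, 2, "unsigned long"),
                       "size_t"),
              "uses the typedef once it is visible");
```

The model deliberately ignores whether the found typedef actually aliases the right type, which is a separate question in this discussion.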

What it seems like you're really after is making size_t and ptrdiff_t actual identifiable types instead of magic based on the target. e.g., ASTContext::getSizeType() should return a QualType representing size_t which has the expected width and alignment and other type properties as what we currently get based on the target. Then __SIZE_TYPE__ results in this otherwise unutterable type name, but it's distinguishable from the underlying integer type despite being compatible with it.

CC @zygoloid @rjmccall @erichkeane for additional opinions on this.

@erichkeane
Collaborator

> > > Thank you for this!
> > >
> > > I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
> > >
> > > Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.
> >
> > The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.
>
> Yes, but this doesn't exactly accomplish that. In C, you'll still get the underlying integer type unless there happens to be a typedef we can find, right? So you can spot a difference between:
>
> sizeof(int); // returns an unsigned integer type
> #include <stddef.h>
> sizeof(int); // now returns a typedef to an unsigned integer type
>
> right?
>
> What it seems like you're really after is making size_t and ptrdiff_t actual identifiable types instead of magic based on the target. e.g., ASTContext::getSizeType() should return a QualType representing size_t which has the expected width and alignment and other type properties as what we currently get based on the target. Then __SIZE_TYPE__ results in this otherwise unutterable type name, but it's distinguishable from the underlying integer type despite being compatible with it.
>
> CC @zygoloid @rjmccall @erichkeane for additional opinions on this.

I think you're on to something here actually. We should do something like we do with the std namespace: We create an 'implicit' version of it when we need it (materializing it as the 'right' thing), and 'set' it correctly when we 'find' it. Then, we can just 'get' it whenever we need it, like for these.

It does NOT make sense to allow the fallback to have a different textual type based on the existence of the typedef.

@erichkeane
Collaborator

TO ADD: see uses of buildImplicitTypedef for times we do basically the same thing.

@AaronBallman
Collaborator

> > > Thank you for this!
> > >
> > > I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
> > >
> > > Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.
> >
> > The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.
>
> Yes, but this doesn't exactly accomplish that. In C, you'll still get the underlying integer type unless there happens to be a typedef we can find, right? So you can spot a difference between:

Actually, it might be even worse; I think it's valid for a user to define a typedef for size_t themselves so long as C standard library headers are not included, because it's not a reserved identifier in that case. I'm asking on the WG14 reflectors because it matters for a test case like:

typedef float size_t;
static_assert(_Generic(sizeof(int), size_t : 1, default : 0));

where it's unclear whether that static assertion should pass or fail.

@YexuanXiao
Author

> > > > Thank you for this!
> > > >
> > > > I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
> > > >
> > > > Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.
> > >
> > > The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.
> >
> > Yes, but this doesn't exactly accomplish that. In C, you'll still get the underlying integer type unless there happens to be a typedef we can find, right? So you can spot a difference between:
> >
> > sizeof(int); // returns an unsigned integer type
> > #include <stddef.h>
> > sizeof(int); // now returns a typedef to an unsigned integer type
> >
> > right?
> >
> > What it seems like you're really after is making size_t and ptrdiff_t actual identifiable types instead of magic based on the target. e.g., ASTContext::getSizeType() should return a QualType representing size_t which has the expected width and alignment and other type properties as what we currently get based on the target. Then __SIZE_TYPE__ results in this otherwise unutterable type name, but it's distinguishable from the underlying integer type despite being compatible with it.
> >
> > CC @zygoloid @rjmccall @erichkeane for additional opinions on this.
>
> I think you're on to something here actually. We should do something like we do with the std namespace: We create an 'implicit' version of it when we need it (materializing it as the 'right' thing), and 'set' it correctly when we 'find' it. Then, we can just 'get' it whenever we need it, like for these.
>
> It does NOT make sense to allow the fallback to have a different textual type based on the existence of the typedef.

I don't have a problem with this, but this approach is more complex and requires more specialized work.

@AaronBallman
Collaborator

> > > > Thank you for this!
> > > >
> > > > I'd like to better understand the need for the changes because I have a few concerns. One concern is about compile time performance. But also, this means downstream consumers of the AST are going to have to react because they used to be able to look for a size_t node directly and now they have to resolve a qualified type instead. This may be acceptable, but it seems disruptive too.
> > > >
> > > > Also, there should be more test coverage for the changes showing that we actually do get the types correct in all the various circumstances.
> > >
> > > The current inlay hint of clangd is auto a: unsigned long = sizeof(int);, which is misleading. At the same time, it eliminates certain conversions that clang-tidy or other cleanup tools might warn about. The C and C++ standards state that the result type of such expressions is size_t/ptrdiff_t, so while this may disrupt some downstream assumptions about prior implementations, it aligns more closely with the standard. I believe this is worthwhile, maybe there's a faster way to implement it.
> >
> > Yes, but this doesn't exactly accomplish that. In C, you'll still get the underlying integer type unless there happens to be a typedef we can find, right? So you can spot a difference between:
>
> Actually, it might be even worse; I think it's valid for a user to define a typedef for size_t themselves so long as C standard library headers are not included, because it's not a reserved identifier in that case. I'm asking on the WG14 reflectors because it matters for a test case like:
>
> typedef float size_t;
> static_assert(_Generic(sizeof(int), size_t : 1, default : 0));
>
> where it's unclear whether that static assertion should pass or fail.

Okay, the typedef is valid (not using a reserved identifier), and the assertion should fail because sizeof is defined as returning the size type defined in stddef.h explicitly, not just size_t generically.
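A loose C++ analogue of that test case can be sketched as follows. This is not the C program from the comment: `_Generic` is C-only, so the check uses `std::is_same` instead, and the user typedef is placed in a hypothetical namespace `user` so that it cannot clash with a `::size_t` that a library header may already declare. Under those assumptions, it shows that a user typedef that merely shares the name `size_t` has no influence on the type of `sizeof`.

```cpp
#include <cstddef>
#include <type_traits>

namespace user {

// A user-provided typedef that only shares the name; hypothetical.
typedef float size_t;

// The type that sizeof actually produces.
using SizeofResult = decltype(sizeof(int));

// Inside this namespace, unqualified size_t means user::size_t (float),
// which is not the type of a sizeof expression.
static_assert(!std::is_same<SizeofResult, size_t>::value,
              "sizeof does not yield the user's size_t");

// The result is still the library's size type.
static_assert(std::is_same<SizeofResult, std::size_t>::value,
              "sizeof yields std::size_t");

}  // namespace user
```

A name-based lookup that accepted any typedef called `size_t` would get this case wrong, which is why the thread insists on checking that the found typedef really is the size type.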

> TO ADD: see uses of buildImplicitTypedef for times we do basically the same thing.

I don't think we can do that, at least for C, because that means this code would then be accepted when it shouldn't be, right?

void foo() {
  (void)sizeof(0);
  size_t x = 12; // size_t was implicitly defined due to the use of sizeof above
}

@erichkeane
Collaborator

> I don't think we can do that, at least for C, because that means this code would then be accepted when it shouldn't be, right?

Yeah, I am not sure we can do this now then. It seems odd to allow calls to builtins to change their return 'type' (or at least, return sugar) based on when they are included. Even if we did, we would have to make sure it actually WAS SIZE_TYPE and not a different typedef.

So I guess we're back to: are we ok with this?

sizeof(int); // returns an unsigned integer type
#include <stddef.h>
sizeof(int); // now returns a typedef to an unsigned integer type

@YexuanXiao
Author

> > I don't think we can do that, at least for C, because that means this code would then be accepted when it shouldn't be, right?
>
> Yeah, I am not sure we can do this now then. It seems odd to allow calls to builtins to change their return 'type' (or at least, return sugar) based on when they are included. Even if we did, we would have to make sure it actually WAS SIZE_TYPE and not a different typedef.
>
> So I guess we're back to: are we ok with this?
>
> sizeof(int); // returns an unsigned integer type
> #include <stddef.h>
> sizeof(int); // now returns a typedef to an unsigned integer type

The current implementation retains the possibility for users to write typedef typeof(sizeof(int)) size_t, while all other scenarios fall outside the intended design.

@rjmccall
Contributor

Is there a reason to do so? Intermediate sugar types don't generally have much of an effect on diagnostics, which is the motivation for this work.

@YexuanXiao
Author

> Is there a reason to do so? Intermediate sugar types don't generally have much of an effect on diagnostics, which is the motivation for this work.

When __size_t is a built-in type, template instantiation diagnostic messages show __size_t rather than unsigned long.

@rjmccall
Contributor

That is not possible, and this is exactly what I'm worried about with all this discussion about making size_t more "built-in". size_t is specified to be a typedef of a (platform-dependent) standard integer type, and it needs to continue to behave that way; we cannot actually make it a different type, no matter how much cleaner we personally think the language would be if it were. That is not fundamentally changed by the committee adding the %z format specifier or 19z literals or anything like that.

@AaronBallman
Collaborator

> That is not possible, and this is exactly what I'm worried about with all this discussion about making size_t more "built-in". size_t is specified to be a typedef of a (platform-dependent) standard integer type, and it needs to continue to behave that way; we cannot actually make it a different type, no matter how much cleaner we personally think the language would be if it were. That is not fundamentally changed by the committee adding the %z format specifier or 19z literals or anything like that.

Er, I feel like I must be missing something, because I think we can do this. size_t is defined to be an implementation-defined unsigned integer type. It is not required to be defined to unsigned int or unsigned long, etc explicitly, or even a standard integer type at all. So I think we can define it to be __size_t, so long as that type is compatible with the platform-specific underlying type.

@zygoloid
Collaborator

zygoloid commented Apr 23, 2025

> size_t is defined to be an implementation-defined unsigned integer type. It is not required to be defined to unsigned int or unsigned long, etc explicitly, or even a standard integer type at all. So I think we can define it to be __size_t, so long as that type is compatible with the platform-specific underlying type.

The specification of <stddef.h> requires size_t to be an unsigned integer type. The only type that can be compatible with an unsigned integer type in C is an enumeration type, and enumeration types (while being integer types) are not unsigned integer types. So I don't believe the C standard permits using a type that is merely compatible with the desired type.

Moreover, we're not just implementing the C standard, we're also implementing the platform ABI, which specifies exactly which type size_t is. While we could in principle use a type that is merely compatible with that type, given that C essentially makes that difference unobservable, as noted above C does not permit the existence of an unsigned integer type that is compatible with another unsigned integer type, and the platform ABIs all define size_t to be an unsigned integer type.

I really don't think this is a fruitful line of investigation. I'd also note that, while in C there are some allowances for compatible types, in C++ there are not, and given that the problem isn't C-specific, it seems like the solution shouldn't be either. Type sugar is the proper way to model that we want to print a type differently in some circumstances. Let's not invent something else.

> > When adding __size_t, _signed_size_t, __ptrdiff_t, and __unsigned_ptrdiff_t, how can the underlying types of the typedefs provided by the standard library also be set to these?
>
> Is there a reason to do so? Intermediate sugar types don't generally have much of an effect on diagnostics, which is the motivation for this work.

There's some motivation, at least in principle. We do some work to preserve common type sugar when merging types, and this might allow us to preserve the type sugar in a case like:

const auto a = sizeof(0);
#include <stddef.h>
size_t b = 0;
// Type has common sugar of size_t and typeof(sizeof(0)).
cond ? a : b

But I don't think this needs to be a priority. (Also, we could make the type sugar merging logic explicitly handle this case if needed, rather than changing the types that the library typedefs are aliases for.)

@YexuanXiao
Author

> Er, I feel like I must be missing something, because I think we can do this. size_t is defined to be an implementation-defined unsigned integer type. It is not required to be defined to unsigned int or unsigned long, etc explicitly, or even a standard integer type at all. So I think we can define it to be __size_t, so long as that type is compatible with the platform-specific underlying type.

At least, it must ensure that __size_t is equivalent to size_t and generates the same mangled name.

@rjmccall
Contributor

rjmccall commented Apr 23, 2025

Something being unspecified in the standard doesn't mean we're unconstrained. In this case, it is unspecified because it is allowed to vary between platforms, but platforms are required to specify what type size_t is, and it is well-defined to write code based on that assumption on that platform. Unportable code is not invalid code. Almost all C and C++ code makes some non-portable assumptions, and this one in particular is very common.

Moreover, C++ encodes the underlying type of typedefs into the ABI. size_t appears in a lot of type signatures, and it is mangled as its underlying type. (This is actually very annoying for writing portable compiler tests because e.g. operator new(size_t) has a different mangled name on different targets, but we just have to deal with it.) The type produced by sizeof expressions can similarly easily propagate into the ABI through template argument deduction and auto. None of this can be changed without massively breaking the ABI. It is off the table.

Now, C might be different because of how loose the compatible-types rule is. If you want to pursue this just for C, we can talk about it; I don't know that it's a good idea, but we can at least talk about it.
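To illustrate the propagation described above, here is a small C++ sketch; `deduced_type_name` and `sizeof_type_is_size_t` are hypothetical helpers, not anything from Clang. It shows that template argument deduction and `auto` both pick up the underlying type of a `sizeof` expression, which is the type that ends up in mangled names.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical helper: returns the implementation's name for the deduced T.
// On common Itanium-ABI implementations this is the mangled name of the
// underlying type (for example "m" for unsigned long), never "size_t".
template <typename T>
std::string deduced_type_name(T) {
    return typeid(T).name();
}

// `auto` deduces the same underlying type, so it can reach signatures too.
inline bool sizeof_type_is_size_t() {
    auto n = sizeof(int);
    return std::is_same<decltype(n), std::size_t>::value;
}
```

Calling `deduced_type_name(sizeof(int))` and `deduced_type_name(std::size_t{0})` instantiates the same specialization, which is why the result type of `sizeof` cannot change canonically without changing the ABI.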

@YexuanXiao
Author

YexuanXiao commented Apr 23, 2025

At least, it must ensure that __size_t is equivalent to size_t and generates the same mangled name.

Is that possible? @rjmccall

@AaronBallman
Collaborator

size_t is defined to be an implementation-defined unsigned integer type. It is not required to be defined to unsigned int or unsigned long, etc explicitly, or even a standard integer type at all. So I think we can define it to be __size_t, so long as that type is compatible with the platform-specific underlying type.

The specification of <stddef.h> requires size_t to be an unsigned integer type. The only type that can be compatible with an unsigned integer type in C is an enumeration type, and enumeration types (while being integer types) are not unsigned integer types. So I don't believe the C standard permits using a type that is merely compatible with the desired type.

If the type is an extension, we can define its compatibility rules, can't we? We certainly treat _int32 and int as being compatible: https://godbolt.org/z/66eKrWKcs

I was thinking we'd be doing the same here.

Moreover, we're not just implementing the C standard, we're also implementing the platform ABI, which specifies exactly which type size_t is. While we could in principle use a type that is merely compatible with that type, given that C essentially makes that difference unobservable, as noted above C does not permit the existence of an unsigned integer type that is compatible with another unsigned integer type, and the platform ABIs all define size_t to be an unsigned integer type.

I really don't think this is a fruitful line of investigation. I'd also note that, while in C there are some allowances for compatible types, in C++ there are not, and given that the problem isn't C-specific, it seems like the solution shouldn't be either. Type sugar is the proper way to model that we want to print a type differently in some circumstances. Let's not invent something else.

Type sugar may still be a better approach than aliasing the type to the underlying type like I was envisioning.

@mizvekov
Contributor

mizvekov commented Apr 23, 2025

With template specialization resugaring, these being typedefs still help somewhat:
https://compiler-explorer.com/z/qKxbYMEGq
or, more simply: https://compiler-explorer.com/z/z9TsebGvs

You have to make a bit of contortion to expose the intermediate type, but I think that's partly due to a different problem, where in diagnostics we don't try to skip over some unhelpful top level typedefs, like vector's value_type.

@rjmccall
Contributor

rjmccall commented Apr 23, 2025

At least, it must ensure that __size_t is equivalent to size_t and generates the same mangled name.

Is that possible? @rjmccall

If size_t is a different type from __size_t, then void foo(size_t); declares a different entity from void foo(__size_t);. The two functions therefore must be mangled differently in order to allow them to coexist in the program. The same rule applies consistently to template specializations and everywhere else a type can be written.

The result is that, no, this is not possible. Two types that mangle the same way really need to be the same type. This is the relationship between a type and a sugaring of the type, not the relationship between two different types that are compatible (share a common representation).
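The distinction can be sketched in C++ as follows. `SizeAlias` and `DistinctSize` are hypothetical stand-ins, chosen only for illustration: the first models `__size_t` as sugar, the second models `__size_t` as a genuinely new type with the same representation.

```cpp
#include <cstddef>
#include <type_traits>

// An alias is sugar: it names the same type, so the two declarations of
// foo below declare one and the same function.
using SizeAlias = std::size_t;
void foo(std::size_t);
void foo(SizeAlias);  // redeclaration, not an overload

// A distinct type with the same representation is still a different type,
// so this declares a second entity that needs its own mangled name.
enum class DistinctSize : std::size_t {};
void foo(DistinctSize);  // a genuine overload

static_assert(std::is_same<void(std::size_t), void(SizeAlias)>::value,
              "alias: same function type");
static_assert(!std::is_same<void(std::size_t), void(DistinctSize)>::value,
              "distinct type: different function type");
```

Only the alias case lets the two spellings share a mangled name, which matches the "type versus sugaring of the type" relationship described above.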

@AaronBallman
Collaborator

Something being unspecified in the standard doesn't mean we're unconstrained. In this case, it is unspecified because it is allowed to vary between platforms, but platforms are required to specify what type size_t is, and it is well-defined to write code based on that assumption on that platform. Unportable code is not invalid code. Almost all C and C++ code makes some non-portable assumptions, and this one in particular is very common.

Moreover, C++ encodes the underlying type of typedefs into the ABI. size_t appears in a lot of type signatures, and it is mangled as its underlying type. (This is actually very annoying for writing portable compiler tests because e.g. operator new(size_t) has a different mangled name on different targets, but we just have to deal with it.) The type produced by sizeof expressions can similarly easily propagate into the ABI through template argument deduction and auto. None of this can be changed without massively breaking the ABI. It is off the table.

Now, C might be different because of how loose the compatible-types rule is. If you want to pursue this just for C, we can talk about it; I don't know that it's a good idea, but we can at least talk about it.

Okay, I see where the disconnect is now. I was using standards terms like "compatible" when what I really meant was "alias". That is, I wasn't suggesting we introduce a distinct, new type. I was suggesting we take the existing types and give them a spelling of __size_t. The same goes for how we already handle things like _int32 and int; they're the same type, just with different ways of spelling it.

But I'm more and more thinking Richard is correct, this is just a fancy form of sugar.

@mizvekov
Contributor

If the reason to even consider doing this as a new builtin type is due to templates, please don't do that.

The better alternative here is to wait for template specialization resugaring to land.

@YexuanXiao
Author

YexuanXiao commented Apr 23, 2025

If the reason to even consider doing this as a new builtin type is due to templates, please don't do that.

The better alternative here is to wait for template specialization resugaring to land.

The main purpose is for these expressions, with templates being just one possible beneficiary.

Templates should print user-defined typedefs instead of the underlying types, but this is not within the scope of this PR.

@mizvekov
Contributor

The main purpose is for these expressions, with templates being just one possible beneficiary.

Sure, but outside of templates, all forms of type sugar are preserved just fine in the current implementation, so there would be no particular reason to pursue a different builtin type.

@zygoloid
Collaborator

If the type is an extension, we can define its compatibility rules, can't we?

Well, the type size_t is not an extension, and we can't make strictly conforming C programs observably use a compatibility rule that the C standard doesn't permit. I think we could make the claim that there is no way in C to observe the difference between two types being compatible and being the same, and therefore this doesn't change the behavior of any strictly conforming code, so we can do it under the as-if rule. Maybe that's true, but I don't think that's a good idea: if it's ever observable that these types are different but compatible, that would be a bug, and relying on nothing in C (or in C extensions whose specification we don't own) ever exposing that seems risky. And I think the other arguments against still stand.

(Note, for example, that this would presumably result in an ABI break for clang's __attribute__((overloadable)), and who knows how many other extensions that care about type identity, not only type compatibility. Some of those are surely fixable, but are all of them?)

We certainly treat _int32 and int as being compatible: https://godbolt.org/z/66eKrWKcs

It looks to me like we treat the keyword _int32 as just a funny spelling for int (and don't even preserve the spelling as type sugar), not as a distinct type that is merely compatible with int.

@mizvekov
Contributor

To be precise, the problem here is not really creating a new builtin type, as long as that new builtin type is still canonically the same as the existing builtin type.

This could certainly be a strategy to give a new spelling for existing cases like _int32, instead of synthesizing a typedef for it.

@erichkeane
Collaborator

We VERY much shouldn't be creating a new type. If we make a __size_t (or the magic thing Richard suggested), it needs to stay a type alias (which makes mangling et al. irrelevant).

I could see us just automatically generating a __size_type alias in ASTContext, then 'replacing' it if we see a compatible type alias of size_t. That way, any use of sizeof from before the type alias declaration gets the __size_type alias, and anything after gets the size_t alias (again, only if it has the proper underlying type).

@mizvekov
Contributor

It doesn't need to be a type alias, it just needs to be type sugar to an existing type. A type alias is just one of the forms to achieve that.

@mizvekov
Contributor

mizvekov commented Apr 23, 2025

The advantage of a "builtin-type as sugar" approach is that you get the same effect without having to synthesize a TypedefDecl (and the associated pollution that currently causes in AST dumps).

This might expose existing 'bugs' where a builtin-type node is assumed to be a canonical type, so it could take more effort to upstream.

@mizvekov
Contributor

If you make it a canonical node, you also automatically don't get an aka in diagnostics, which might be an advantage (or not).

I.e., you avoid the aka in size_t (aka 'unsigned long') you currently get.

@erichkeane
Collaborator

If you make it a canonical node, you also automatically don't get an aka in diagnostics, which might be an advantage (or not).

I.e., you avoid the aka in size_t (aka 'unsigned long') you currently get.

I don't think you'd WANT to avoid that. I would want a type alias, because what we actually want is something that works exactly LIKE a type alias in every way, except that it doesn't require a declaration. Optimally we'd just generate a size_t alias, but as Aaron mentioned, we cannot use that name.

YES, of course we could come up with a new type-like thing that aliases a specific type without being a type alias (at which point, I'd say it isn't really a type, just a type alias with a fancy representation), but really all that does is save us a node in the AST. IF we were worried about THAT, we could have it lazily constructed, though that buys us little.

@mizvekov
Contributor

You can still do that with a "builtin-type as sugar" approach, you just have to make it a non-canonical node (ie isSugared() returns true).

It's one of the options, and whether we print an aka or not is still a diagnostic policy (we should do it only if it's relevant, but our implementation is currently very simple).
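For readers less familiar with the sugar/canonical distinction, the following is a toy C++ model of the idea. It is not Clang's `Type` hierarchy: `ToyType`, `canonical`, `sameType`, and `describe` are all hypothetical names, and only the `isSugared()` name is borrowed from the discussion. It assumes a sugar node is simply a node that points at the type it stands for.

```cpp
#include <string>

// Toy model only: this is not how Clang represents types.
struct ToyType {
    std::string name;           // how this node is spelled
    const ToyType* underlying;  // nullptr for a canonical node

    bool isSugared() const { return underlying != nullptr; }
};

// Strip every layer of sugar to reach the canonical node.
inline const ToyType& canonical(const ToyType& t) {
    const ToyType* cur = &t;
    while (cur->isSugared()) cur = cur->underlying;
    return *cur;
}

// Two nodes denote the same type when their canonical nodes coincide.
inline bool sameType(const ToyType& a, const ToyType& b) {
    return &canonical(a) == &canonical(b);
}

// One possible diagnostic policy: print the spelling, plus an aka if sugared.
inline std::string describe(const ToyType& t) {
    if (!t.isSugared()) return "'" + t.name + "'";
    return "'" + t.name + "' (aka '" + canonical(t).name + "')";
}
```

In this model, whether `__size_t` is a typedef or a "builtin type as sugar" changes only which kind of node carries the spelling; type identity and the aka policy are decided separately.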

@AaronBallman
Collaborator

Yeah, sorry, I feel I accidentally derailed the conversation by talking about new types. :-) Are we converging on the idea of using type sugar?

@YexuanXiao
Author

The anticipated implementation from the discussion is beyond my capabilities.

@YexuanXiao YexuanXiao closed this Apr 28, 2025
@yuxuanchen1997
Member

The anticipated implementation from the discussion is beyond my capabilities.

Hey, chiming in here since I've been silently following this thread and I think you've been doing great. The conversation might seem intimidating but once we converge on an approach, the code change should be straightforward. Don't question your own capabilities.

@YexuanXiao
Author

Hey, chiming in here since I've been silently following this thread and I think you've been doing great. The conversation might seem intimidating but once we converge on an approach, the code change should be straightforward. Don't question your own capabilities.

I fully understand and agree with the previous discussions. I wasn’t intimidated by them—the issue is simply my lack of familiarity with Clang. I have some other lighter ideas that now take higher priority, and this idea won't be lost by closing the PR. I’ll revisit it when the opportunity arises. Thank you for your concern.

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:static analyzer clang Clang issues not falling into any other category coroutines C++20 coroutines