Skip to content

[Clang] [C23] Implement N2653: u8 strings are char8_t[] #97208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 17, 2024

Conversation

MitalAshok
Copy link
Contributor

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics labels Jun 30, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-clang

Author: Mital Ashok (MitalAshok)

Changes

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

Closes #97202


Full diff: https://github.com/llvm/llvm-project/pull/97208.diff

7 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+6)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+4-1)
  • (modified) clang/lib/Frontend/InitPreprocessor.cpp (+4-2)
  • (modified) clang/lib/Headers/stdatomic.h (+5)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+16-7)
  • (added) clang/test/C/C2x/n2653.c (+29)
  • (modified) clang/www/c_status.html (+1-1)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c720e47dbe35b..e51be81d8b11a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -337,6 +337,12 @@ C23 Feature Support
 - Properly promote bit-fields of bit-precise integer types to the field's type
   rather than to ``int``. #GH87641
 
+- Compiler support for `N2653 char8_t: A type for UTF-8 characters and strings`
+  <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm>`_: ``u8`` string
+  literals are now of type ``char8_t[N]`` in C23 and expose
+  ``__CLANG_ATOMIC_CHAR8_T_LOCK_FREE``/``__GCC_ATOMIC_CHAR8_T_LOCK_FREE`` to
+  implement the corresponding macro in ``<stdatomic.h>``.
+
 Non-comprehensive list of changes in this release
 -------------------------------------------------
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5dc36c594bcb7..6a00b92df1c36 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7252,7 +7252,10 @@ def err_array_init_utf8_string_into_char : Error<
 def warn_cxx20_compat_utf8_string : Warning<
   "type of UTF-8 string literal will change from array of const char to "
   "array of const char8_t in C++20">, InGroup<CXX20Compat>, DefaultIgnore;
-def note_cxx20_compat_utf8_string_remove_u8 : Note<
+def warn_c23_compat_utf8_string : Warning<
+  "type of UTF-8 string literal will change from array of char to "
+  "array of char8_t in C23">, InGroup<C23Compat>, DefaultIgnore;
+def note_cxx20_c23_compat_utf8_string_remove_u8 : Note<
   "remove 'u8' prefix to avoid a change of behavior; "
   "Clang encodes unprefixed narrow string literals as UTF-8">;
 def err_array_init_different_type : Error<
diff --git a/clang/lib/Frontend/InitPreprocessor.cpp b/clang/lib/Frontend/InitPreprocessor.cpp
index 55ec460064830..6270c37342bcf 100644
--- a/clang/lib/Frontend/InitPreprocessor.cpp
+++ b/clang/lib/Frontend/InitPreprocessor.cpp
@@ -1342,8 +1342,10 @@ static void InitializePredefinedMacros(const TargetInfo &TI,
                       getLockFreeValue(TI.get##Type##Width(), TI));
     DEFINE_LOCK_FREE_MACRO(BOOL, Bool);
     DEFINE_LOCK_FREE_MACRO(CHAR, Char);
-    if (LangOpts.Char8)
-      DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char); // Treat char8_t like char.
+    // char8_t has the same representation / width as unsigned
+    // char in C++ and is a typedef for unsigned char in C23
+    if (LangOpts.Char8 || LangOpts.C23)
+      DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char);
     DEFINE_LOCK_FREE_MACRO(CHAR16_T, Char16);
     DEFINE_LOCK_FREE_MACRO(CHAR32_T, Char32);
     DEFINE_LOCK_FREE_MACRO(WCHAR_T, WChar);
diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 9c103d98af8c5..c33cd8083525c 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -35,6 +35,10 @@ extern "C" {
 
 #define ATOMIC_BOOL_LOCK_FREE       __CLANG_ATOMIC_BOOL_LOCK_FREE
 #define ATOMIC_CHAR_LOCK_FREE       __CLANG_ATOMIC_CHAR_LOCK_FREE
+#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) ||              \
+    defined(__cplusplus)
+#define ATOMIC_CHAR8_T_LOCK_FREE    __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
 #define ATOMIC_CHAR16_T_LOCK_FREE   __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
 #define ATOMIC_CHAR32_T_LOCK_FREE   __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
 #define ATOMIC_WCHAR_T_LOCK_FREE    __CLANG_ATOMIC_WCHAR_T_LOCK_FREE
@@ -104,6 +108,7 @@ typedef _Atomic(long)               atomic_long;
 typedef _Atomic(unsigned long)      atomic_ulong;
 typedef _Atomic(long long)          atomic_llong;
 typedef _Atomic(unsigned long long) atomic_ullong;
+typedef _Atomic(unsigned char)      atomic_char8_t;
 typedef _Atomic(uint_least16_t)     atomic_char16_t;
 typedef _Atomic(uint_least32_t)     atomic_char32_t;
 typedef _Atomic(wchar_t)            atomic_wchar_t;
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index db44cfe1288b6..a1b060f7f1510 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -2082,6 +2082,8 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
   } else if (Literal.isUTF8()) {
     if (getLangOpts().Char8)
       CharTy = Context.Char8Ty;
+    else if (getLangOpts().C23)
+      CharTy = Context.UnsignedCharTy;
     Kind = StringLiteralKind::UTF8;
   } else if (Literal.isUTF16()) {
     CharTy = Context.Char16Ty;
@@ -2093,17 +2095,24 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
     CharTy = Context.UnsignedCharTy;
   }
 
-  // Warn on initializing an array of char from a u8 string literal; this
-  // becomes ill-formed in C++2a.
-  if (getLangOpts().CPlusPlus && !getLangOpts().CPlusPlus20 &&
-      !getLangOpts().Char8 && Kind == StringLiteralKind::UTF8) {
-    Diag(StringTokLocs.front(), diag::warn_cxx20_compat_utf8_string);
+  // Warn on u8 string literals before C++20 and C23, whose type
+  // was an array of char before but becomes an array of char8_t.
+  // In C++20, initializing an array of char from a u8 string literal
+  // becomes ill-formed. In C23, it might have an unexpected value if
+  // char was signed.
+  if (Kind == StringLiteralKind::UTF8 &&
+      (getLangOpts().CPlusPlus
+           ? !getLangOpts().CPlusPlus20 && !getLangOpts().Char8
+           : !getLangOpts().C23)) {
+    Diag(StringTokLocs.front(), getLangOpts().CPlusPlus
+                                    ? diag::warn_cxx20_compat_utf8_string
+                                    : diag::warn_c23_compat_utf8_string);
 
     // Create removals for all 'u8' prefixes in the string literal(s). This
-    // ensures C++2a compatibility (but may change the program behavior when
+    // ensures C++20/C23 compatibility (but may change the program behavior when
     // built by non-Clang compilers for which the execution character set is
     // not always UTF-8).
-    auto RemovalDiag = PDiag(diag::note_cxx20_compat_utf8_string_remove_u8);
+    auto RemovalDiag = PDiag(diag::note_cxx20_c23_compat_utf8_string_remove_u8);
     SourceLocation RemovalDiagLoc;
     for (const Token &Tok : StringToks) {
       if (Tok.getKind() == tok::utf8_string_literal) {
diff --git a/clang/test/C/C2x/n2653.c b/clang/test/C/C2x/n2653.c
new file mode 100644
index 0000000000000..1abd61947de7e
--- /dev/null
+++ b/clang/test/C/C2x/n2653.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -verify=c23 -std=c23 %s
+// RUN: %clang_cc1 -verify=c17 -std=c17 %s
+
+// c23-no-diagnostics
+
+#include <stdatomic.h>
+
+#define __enable_constant_folding(x) (__builtin_constant_p(x) ? (x) : (x))
+
+#ifndef ATOMIC_CHAR8_T_LOCK_FREE
+#error missing
+#endif
+// c17-error@-2 {{missing}}
+
+_Static_assert(_Generic(u8"", unsigned char*: 1, char*: 0), "");
+// c17-error@-1 {{static assertion failed}}
+
+// -fsigned-char is the default
+#define M(X) __enable_constant_folding((X) >= 0x80)
+
+_Static_assert(M(u8"\U000000E9"[0]), "");
+// c17-error@-1 {{static assertion failed}}
+#if __STDC_VERSION__ >= 202311L
+_Static_assert(M(u8'\xC3'), "");
+#endif
+
+const          char cu8[]  = u8"text";
+const signed   char scu8[] = u8"text";
+const unsigned char ucu8[] = u8"text";
diff --git a/clang/www/c_status.html b/clang/www/c_status.html
index 84cd8e836006c..81bb51a58e5cb 100644
--- a/clang/www/c_status.html
+++ b/clang/www/c_status.html
@@ -1061,7 +1061,7 @@ <h2 id="c2x">C23 implementation status</h2>
     <tr>
       <td>char8_t: A type for UTF-8 characters and strings</td>
       <td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm">N2653</a></td>
-      <td class="none" align="center">No</td>
+      <td class="unreleased" align="center">Clang 19</td>
     </tr>
     <tr>
       <td>Clarification for max exponent macros-update</td>

@llvmbot
Copy link
Member

llvmbot commented Jun 30, 2024

@llvm/pr-subscribers-backend-x86

Author: Mital Ashok (MitalAshok)

Changes

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

Closes #97202


Full diff: https://github.com/llvm/llvm-project/pull/97208.diff

7 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+6)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+4-1)
  • (modified) clang/lib/Frontend/InitPreprocessor.cpp (+4-2)
  • (modified) clang/lib/Headers/stdatomic.h (+5)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+16-7)
  • (added) clang/test/C/C2x/n2653.c (+29)
  • (modified) clang/www/c_status.html (+1-1)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c720e47dbe35b..e51be81d8b11a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -337,6 +337,12 @@ C23 Feature Support
 - Properly promote bit-fields of bit-precise integer types to the field's type
   rather than to ``int``. #GH87641
 
+- Compiler support for `N2653 char8_t: A type for UTF-8 characters and strings`
+  <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm>`_: ``u8`` string
+  literals are now of type ``char8_t[N]`` in C23 and expose
+  ``__CLANG_ATOMIC_CHAR8_T_LOCK_FREE``/``__GCC_ATOMIC_CHAR8_T_LOCK_FREE`` to
+  implement the corresponding macro in ``<stdatomic.h>``.
+
 Non-comprehensive list of changes in this release
 -------------------------------------------------
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5dc36c594bcb7..6a00b92df1c36 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7252,7 +7252,10 @@ def err_array_init_utf8_string_into_char : Error<
 def warn_cxx20_compat_utf8_string : Warning<
   "type of UTF-8 string literal will change from array of const char to "
   "array of const char8_t in C++20">, InGroup<CXX20Compat>, DefaultIgnore;
-def note_cxx20_compat_utf8_string_remove_u8 : Note<
+def warn_c23_compat_utf8_string : Warning<
+  "type of UTF-8 string literal will change from array of char to "
+  "array of char8_t in C23">, InGroup<C23Compat>, DefaultIgnore;
+def note_cxx20_c23_compat_utf8_string_remove_u8 : Note<
   "remove 'u8' prefix to avoid a change of behavior; "
   "Clang encodes unprefixed narrow string literals as UTF-8">;
 def err_array_init_different_type : Error<
diff --git a/clang/lib/Frontend/InitPreprocessor.cpp b/clang/lib/Frontend/InitPreprocessor.cpp
index 55ec460064830..6270c37342bcf 100644
--- a/clang/lib/Frontend/InitPreprocessor.cpp
+++ b/clang/lib/Frontend/InitPreprocessor.cpp
@@ -1342,8 +1342,10 @@ static void InitializePredefinedMacros(const TargetInfo &TI,
                       getLockFreeValue(TI.get##Type##Width(), TI));
     DEFINE_LOCK_FREE_MACRO(BOOL, Bool);
     DEFINE_LOCK_FREE_MACRO(CHAR, Char);
-    if (LangOpts.Char8)
-      DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char); // Treat char8_t like char.
+    // char8_t has the same representation / width as unsigned
+    // char in C++ and is a typedef for unsigned char in C23
+    if (LangOpts.Char8 || LangOpts.C23)
+      DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char);
     DEFINE_LOCK_FREE_MACRO(CHAR16_T, Char16);
     DEFINE_LOCK_FREE_MACRO(CHAR32_T, Char32);
     DEFINE_LOCK_FREE_MACRO(WCHAR_T, WChar);
diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 9c103d98af8c5..c33cd8083525c 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -35,6 +35,10 @@ extern "C" {
 
 #define ATOMIC_BOOL_LOCK_FREE       __CLANG_ATOMIC_BOOL_LOCK_FREE
 #define ATOMIC_CHAR_LOCK_FREE       __CLANG_ATOMIC_CHAR_LOCK_FREE
+#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) ||              \
+    defined(__cplusplus)
+#define ATOMIC_CHAR8_T_LOCK_FREE    __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
 #define ATOMIC_CHAR16_T_LOCK_FREE   __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
 #define ATOMIC_CHAR32_T_LOCK_FREE   __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
 #define ATOMIC_WCHAR_T_LOCK_FREE    __CLANG_ATOMIC_WCHAR_T_LOCK_FREE
@@ -104,6 +108,7 @@ typedef _Atomic(long)               atomic_long;
 typedef _Atomic(unsigned long)      atomic_ulong;
 typedef _Atomic(long long)          atomic_llong;
 typedef _Atomic(unsigned long long) atomic_ullong;
+typedef _Atomic(unsigned char)      atomic_char8_t;
 typedef _Atomic(uint_least16_t)     atomic_char16_t;
 typedef _Atomic(uint_least32_t)     atomic_char32_t;
 typedef _Atomic(wchar_t)            atomic_wchar_t;
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index db44cfe1288b6..a1b060f7f1510 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -2082,6 +2082,8 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
   } else if (Literal.isUTF8()) {
     if (getLangOpts().Char8)
       CharTy = Context.Char8Ty;
+    else if (getLangOpts().C23)
+      CharTy = Context.UnsignedCharTy;
     Kind = StringLiteralKind::UTF8;
   } else if (Literal.isUTF16()) {
     CharTy = Context.Char16Ty;
@@ -2093,17 +2095,24 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
     CharTy = Context.UnsignedCharTy;
   }
 
-  // Warn on initializing an array of char from a u8 string literal; this
-  // becomes ill-formed in C++2a.
-  if (getLangOpts().CPlusPlus && !getLangOpts().CPlusPlus20 &&
-      !getLangOpts().Char8 && Kind == StringLiteralKind::UTF8) {
-    Diag(StringTokLocs.front(), diag::warn_cxx20_compat_utf8_string);
+  // Warn on u8 string literals before C++20 and C23, whose type
+  // was an array of char before but becomes an array of char8_t.
+  // In C++20, initializing an array of char from a u8 string literal
+  // becomes ill-formed. In C23, it might have an unexpected value if
+  // char was signed.
+  if (Kind == StringLiteralKind::UTF8 &&
+      (getLangOpts().CPlusPlus
+           ? !getLangOpts().CPlusPlus20 && !getLangOpts().Char8
+           : !getLangOpts().C23)) {
+    Diag(StringTokLocs.front(), getLangOpts().CPlusPlus
+                                    ? diag::warn_cxx20_compat_utf8_string
+                                    : diag::warn_c23_compat_utf8_string);
 
     // Create removals for all 'u8' prefixes in the string literal(s). This
-    // ensures C++2a compatibility (but may change the program behavior when
+    // ensures C++20/C23 compatibility (but may change the program behavior when
     // built by non-Clang compilers for which the execution character set is
     // not always UTF-8).
-    auto RemovalDiag = PDiag(diag::note_cxx20_compat_utf8_string_remove_u8);
+    auto RemovalDiag = PDiag(diag::note_cxx20_c23_compat_utf8_string_remove_u8);
     SourceLocation RemovalDiagLoc;
     for (const Token &Tok : StringToks) {
       if (Tok.getKind() == tok::utf8_string_literal) {
diff --git a/clang/test/C/C2x/n2653.c b/clang/test/C/C2x/n2653.c
new file mode 100644
index 0000000000000..1abd61947de7e
--- /dev/null
+++ b/clang/test/C/C2x/n2653.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -verify=c23 -std=c23 %s
+// RUN: %clang_cc1 -verify=c17 -std=c17 %s
+
+// c23-no-diagnostics
+
+#include <stdatomic.h>
+
+#define __enable_constant_folding(x) (__builtin_constant_p(x) ? (x) : (x))
+
+#ifndef ATOMIC_CHAR8_T_LOCK_FREE
+#error missing
+#endif
+// c17-error@-2 {{missing}}
+
+_Static_assert(_Generic(u8"", unsigned char*: 1, char*: 0), "");
+// c17-error@-1 {{static assertion failed}}
+
+// -fsigned-char is the default
+#define M(X) __enable_constant_folding((X) >= 0x80)
+
+_Static_assert(M(u8"\U000000E9"[0]), "");
+// c17-error@-1 {{static assertion failed}}
+#if __STDC_VERSION__ >= 202311L
+_Static_assert(M(u8'\xC3'), "");
+#endif
+
+const          char cu8[]  = u8"text";
+const signed   char scu8[] = u8"text";
+const unsigned char ucu8[] = u8"text";
diff --git a/clang/www/c_status.html b/clang/www/c_status.html
index 84cd8e836006c..81bb51a58e5cb 100644
--- a/clang/www/c_status.html
+++ b/clang/www/c_status.html
@@ -1061,7 +1061,7 @@ <h2 id="c2x">C23 implementation status</h2>
     <tr>
       <td>char8_t: A type for UTF-8 characters and strings</td>
       <td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm">N2653</a></td>
-      <td class="none" align="center">No</td>
+      <td class="unreleased" align="center">Clang 19</td>
     </tr>
     <tr>
       <td>Clarification for max exponent macros-update</td>

Copy link

github-actions bot commented Jun 30, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff fa0e52995929ab67dfb468d71fe793be5e1c7f03 c2528de29ac278f25773042e93ac6c9ba8d4701e --extensions c,cpp,h -- clang/test/C/C23/n2653.c clang/lib/Frontend/InitPreprocessor.cpp clang/lib/Headers/stdatomic.h clang/lib/Sema/SemaExpr.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 2027055f38..ebe92eca64 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -36,7 +36,7 @@ extern "C" {
 #define ATOMIC_BOOL_LOCK_FREE       __CLANG_ATOMIC_BOOL_LOCK_FREE
 #define ATOMIC_CHAR_LOCK_FREE       __CLANG_ATOMIC_CHAR_LOCK_FREE
 #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
-#define ATOMIC_CHAR8_T_LOCK_FREE    __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#define ATOMIC_CHAR8_T_LOCK_FREE __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
 #endif
 #define ATOMIC_CHAR16_T_LOCK_FREE   __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
 #define ATOMIC_CHAR32_T_LOCK_FREE   __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
@@ -108,7 +108,7 @@ typedef _Atomic(unsigned long)      atomic_ulong;
 typedef _Atomic(long long)          atomic_llong;
 typedef _Atomic(unsigned long long) atomic_ullong;
 #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
-typedef _Atomic(unsigned char)      atomic_char8_t;
+typedef _Atomic(unsigned char) atomic_char8_t;
 #endif
 typedef _Atomic(uint_least16_t)     atomic_char16_t;
 typedef _Atomic(uint_least32_t)     atomic_char32_t;

@MitalAshok
Copy link
Contributor Author

CC @AaronBallman

Also the clang-format issues are intentional to fit the style of the surrounding lines

This should be equivalent to:

\#if (defined(__STDC_VERSION__) && __STDC_VERSION__ > 202311L) || \
    (defined(__cpp_char8_t) && __cpp_char8_t >= 201811L)
Comment on lines 110 to 112
typedef _Atomic(unsigned char) atomic_char8_t;
typedef _Atomic(uint_least16_t) atomic_char16_t;
typedef _Atomic(uint_least32_t) atomic_char32_t;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the definition of atomic_char8_t also be gated behind C23/C++20?

Also in C++20, this should be _Atomic(char8_t), but the next two are also wrong in C++11 (_Atomic(char16_t)/_Atomic(char32_t))

Copy link
Collaborator

@AaronBallman AaronBallman Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes atomic_char8_t should be gated behind C23 because the identifier isn't reserved until C23. For C++, it should be gated behind C++20.

As for the definition, technically in C they should be using char8_t and friends as well, but those are defined in uchar.h which we don't vend with the compiler (should we be vending that one in freestanding?). GCC seems to use unsigned char directly rather than use a transitive include, so I think that approach is fine for us to take as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this header doesn't properly implement P0943R6 in C++23 (Where #define _Atomic(T) std::atomic<T> and all these typedefs should just be using std::atomic_*.

So ::atomic_char8_t should be std::atomic<char8_t>, not std::atomic<unsigned char> (nor unsigned char _Atomic), but that is outside the scope of this PR. I'll just make this match the surrounding typedefs for now.

Comment on lines 1168 to 1169
if (LangOpts.Char8 || LangOpts.C23)
DefineType("__CHAR8_TYPE__", TI.UnsignedChar, Builder);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches GCC behaviour: https://godbolt.org/z/6Eax3eqrd

But I'm not sure why it's defined in C++ at all

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm that is surprising, because it expands to unsigned char and not char8_t there: https://godbolt.org/z/Y96bPbj8e and this seems to have changed between GCC 8 and GCC 9: https://godbolt.org/z/EaEPKr59E

It may be worth opening a GCC bug to see if this was intentional or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://gcc.gnu.org/legacy-ml/gcc-patches/2018-11/msg00295.html __CHAR8_TYPE__ is "the underlying type of char8_t" in C++. char8_t in C++ was added in GCC9 which explains the difference between GCC8 and 9

And this made me realise -std=c++11 -fchar8_t sets LangOpts.Char8 in C++11, which I think this patch handles fine (See also: #97601).
Also -std=c23 -fchar8_t is a mode that exists. It makes typeof(u8'x') char8_t and typeof(u8"") char8_t[1], even with this patch, which I guess is fine. Pending #55373 if -std=c23 -fchar8_t even makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm certainly concerned with the impact of that in C++. It should not be defined in C++ (ie, char9_t is not unsigned char in C++)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think #55373 should be fixed so that the option is rejected in C modes, but that's somewhat orthogonal to this patch.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precommit CI has found relevant failures:

�_bk;t=1720277739321�********************

�_bk;t=1720277739321�Failed Tests (1):

�_bk;t=1720277739321�  Clang :: C/C2x/n2653.c

�_bk;t=1720277739321�

�_bk;t=1720277739321�

�_bk;t=1720277739321�Testing Time: 261.44s

Comment on lines 1 to 2
// RUN: %clang_cc1 -verify=c23 -std=c23 %s
// RUN: %clang_cc1 -verify=c17 -std=c17 %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add -ffreestanding here otherwise your include below is using whatever it finds on the system rather than our header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That still seems to fail on the CI:

******************** TEST 'Clang :: C/C23/n2653.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/lib/clang/19/include -nostdsysteminc -ffreestanding -verify=c23 -std=c23 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c
+ /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/lib/clang/19/include -nostdsysteminc -ffreestanding -verify=c23 -std=c23 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c
error: 'c23-error' diagnostics seen but not expected: 
  File /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c Line 11: missing
1 error generated.

--

********************

So I've just removed the test for <stdatomic.h> (-nostdinc seems to disable the clang header too)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should remove the test for stdatomic,h, that's an important property for us to test and may be papering over a bug here. The only situation we should be getting the system header is:

#if __STDC_HOSTED__ &&                                                         \
    __has_include_next(<stdatomic.h>) &&                                       \
    (!defined(_MSC_VER) || (defined(__cplusplus) && __cplusplus >= 202002L))
# include_next <stdatomic.h>
#else
...
so `-ffreestanding` really should only include our header. What do you get from `-E` on that machine? Are we actually including the system header anyway, or did we mess up our header somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I ran the wrong tests then misread some -E output... Should be fixed now (__STDC_VERSION__ > 202311L -> __STDC_VERSION__ >= 202311L)

Still fails with -nostdsysteminc -ffreestanding (seems to be pulling in other system headers still)
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please wait for @cor3ntin to give a final sign-off before landing.

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cor3ntin cor3ntin merged commit 329e7c8 into llvm:main Jul 17, 2024
3 of 6 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 c23 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

u8 prefixed strings have type char instead of char8_t (aka unsigned char)
4 participants