-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[Clang] [C23] Implement N2653: u8 strings are char8_t[] #97208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang Author: Mital Ashok (MitalAshok) Changeshttps://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm Closes #97202 Full diff: https://github.com/llvm/llvm-project/pull/97208.diff 7 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c720e47dbe35b..e51be81d8b11a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -337,6 +337,12 @@ C23 Feature Support
- Properly promote bit-fields of bit-precise integer types to the field's type
rather than to ``int``. #GH87641
+- Compiler support for `N2653 char8_t: A type for UTF-8 characters and strings`
+ <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm>`_: ``u8`` string
+ literals are now of type ``char8_t[N]`` in C23 and expose
+ ``__CLANG_ATOMIC_CHAR8_T_LOCK_FREE``/``__GCC_ATOMIC_CHAR8_T_LOCK_FREE`` to
+ implement the corresponding macro in ``<stdatomic.h>``.
+
Non-comprehensive list of changes in this release
-------------------------------------------------
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5dc36c594bcb7..6a00b92df1c36 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7252,7 +7252,10 @@ def err_array_init_utf8_string_into_char : Error<
def warn_cxx20_compat_utf8_string : Warning<
"type of UTF-8 string literal will change from array of const char to "
"array of const char8_t in C++20">, InGroup<CXX20Compat>, DefaultIgnore;
-def note_cxx20_compat_utf8_string_remove_u8 : Note<
+def warn_c23_compat_utf8_string : Warning<
+ "type of UTF-8 string literal will change from array of char to "
+ "array of char8_t in C23">, InGroup<C23Compat>, DefaultIgnore;
+def note_cxx20_c23_compat_utf8_string_remove_u8 : Note<
"remove 'u8' prefix to avoid a change of behavior; "
"Clang encodes unprefixed narrow string literals as UTF-8">;
def err_array_init_different_type : Error<
diff --git a/clang/lib/Frontend/InitPreprocessor.cpp b/clang/lib/Frontend/InitPreprocessor.cpp
index 55ec460064830..6270c37342bcf 100644
--- a/clang/lib/Frontend/InitPreprocessor.cpp
+++ b/clang/lib/Frontend/InitPreprocessor.cpp
@@ -1342,8 +1342,10 @@ static void InitializePredefinedMacros(const TargetInfo &TI,
getLockFreeValue(TI.get##Type##Width(), TI));
DEFINE_LOCK_FREE_MACRO(BOOL, Bool);
DEFINE_LOCK_FREE_MACRO(CHAR, Char);
- if (LangOpts.Char8)
- DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char); // Treat char8_t like char.
+ // char8_t has the same representation / width as unsigned
+ // char in C++ and is a typedef for unsigned char in C23
+ if (LangOpts.Char8 || LangOpts.C23)
+ DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char);
DEFINE_LOCK_FREE_MACRO(CHAR16_T, Char16);
DEFINE_LOCK_FREE_MACRO(CHAR32_T, Char32);
DEFINE_LOCK_FREE_MACRO(WCHAR_T, WChar);
diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 9c103d98af8c5..c33cd8083525c 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -35,6 +35,10 @@ extern "C" {
#define ATOMIC_BOOL_LOCK_FREE __CLANG_ATOMIC_BOOL_LOCK_FREE
#define ATOMIC_CHAR_LOCK_FREE __CLANG_ATOMIC_CHAR_LOCK_FREE
+#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \
+ defined(__cplusplus)
+#define ATOMIC_CHAR8_T_LOCK_FREE __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
#define ATOMIC_CHAR16_T_LOCK_FREE __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
#define ATOMIC_CHAR32_T_LOCK_FREE __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
#define ATOMIC_WCHAR_T_LOCK_FREE __CLANG_ATOMIC_WCHAR_T_LOCK_FREE
@@ -104,6 +108,7 @@ typedef _Atomic(long) atomic_long;
typedef _Atomic(unsigned long) atomic_ulong;
typedef _Atomic(long long) atomic_llong;
typedef _Atomic(unsigned long long) atomic_ullong;
+typedef _Atomic(unsigned char) atomic_char8_t;
typedef _Atomic(uint_least16_t) atomic_char16_t;
typedef _Atomic(uint_least32_t) atomic_char32_t;
typedef _Atomic(wchar_t) atomic_wchar_t;
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index db44cfe1288b6..a1b060f7f1510 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -2082,6 +2082,8 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
} else if (Literal.isUTF8()) {
if (getLangOpts().Char8)
CharTy = Context.Char8Ty;
+ else if (getLangOpts().C23)
+ CharTy = Context.UnsignedCharTy;
Kind = StringLiteralKind::UTF8;
} else if (Literal.isUTF16()) {
CharTy = Context.Char16Ty;
@@ -2093,17 +2095,24 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
CharTy = Context.UnsignedCharTy;
}
- // Warn on initializing an array of char from a u8 string literal; this
- // becomes ill-formed in C++2a.
- if (getLangOpts().CPlusPlus && !getLangOpts().CPlusPlus20 &&
- !getLangOpts().Char8 && Kind == StringLiteralKind::UTF8) {
- Diag(StringTokLocs.front(), diag::warn_cxx20_compat_utf8_string);
+ // Warn on u8 string literals before C++20 and C23, whose type
+ // was an array of char before but becomes an array of char8_t.
+ // In C++20, initializing an array of char from a u8 string literal
+ // becomes ill-formed. In C23, it might have an unexpected value if
+ // char was signed.
+ if (Kind == StringLiteralKind::UTF8 &&
+ (getLangOpts().CPlusPlus
+ ? !getLangOpts().CPlusPlus20 && !getLangOpts().Char8
+ : !getLangOpts().C23)) {
+ Diag(StringTokLocs.front(), getLangOpts().CPlusPlus
+ ? diag::warn_cxx20_compat_utf8_string
+ : diag::warn_c23_compat_utf8_string);
// Create removals for all 'u8' prefixes in the string literal(s). This
- // ensures C++2a compatibility (but may change the program behavior when
+ // ensures C++20/C23 compatibility (but may change the program behavior when
// built by non-Clang compilers for which the execution character set is
// not always UTF-8).
- auto RemovalDiag = PDiag(diag::note_cxx20_compat_utf8_string_remove_u8);
+ auto RemovalDiag = PDiag(diag::note_cxx20_c23_compat_utf8_string_remove_u8);
SourceLocation RemovalDiagLoc;
for (const Token &Tok : StringToks) {
if (Tok.getKind() == tok::utf8_string_literal) {
diff --git a/clang/test/C/C2x/n2653.c b/clang/test/C/C2x/n2653.c
new file mode 100644
index 0000000000000..1abd61947de7e
--- /dev/null
+++ b/clang/test/C/C2x/n2653.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -verify=c23 -std=c23 %s
+// RUN: %clang_cc1 -verify=c17 -std=c17 %s
+
+// c23-no-diagnostics
+
+#include <stdatomic.h>
+
+#define __enable_constant_folding(x) (__builtin_constant_p(x) ? (x) : (x))
+
+#ifndef ATOMIC_CHAR8_T_LOCK_FREE
+#error missing
+#endif
+// c17-error@-2 {{missing}}
+
+_Static_assert(_Generic(u8"", unsigned char*: 1, char*: 0), "");
+// c17-error@-1 {{static assertion failed}}
+
+// -fsigned-char is the default
+#define M(X) __enable_constant_folding((X) >= 0x80)
+
+_Static_assert(M(u8"\U000000E9"[0]), "");
+// c17-error@-1 {{static assertion failed}}
+#if __STDC_VERSION__ >= 202311L
+_Static_assert(M(u8'\xC3'), "");
+#endif
+
+const char cu8[] = u8"text";
+const signed char scu8[] = u8"text";
+const unsigned char ucu8[] = u8"text";
diff --git a/clang/www/c_status.html b/clang/www/c_status.html
index 84cd8e836006c..81bb51a58e5cb 100644
--- a/clang/www/c_status.html
+++ b/clang/www/c_status.html
@@ -1061,7 +1061,7 @@ <h2 id="c2x">C23 implementation status</h2>
<tr>
<td>char8_t: A type for UTF-8 characters and strings</td>
<td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm">N2653</a></td>
- <td class="none" align="center">No</td>
+ <td class="unreleased" align="center">Clang 19</td>
</tr>
<tr>
<td>Clarification for max exponent macros-update</td>
|
@llvm/pr-subscribers-backend-x86 Author: Mital Ashok (MitalAshok) Changeshttps://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm Closes #97202 Full diff: https://github.com/llvm/llvm-project/pull/97208.diff 7 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c720e47dbe35b..e51be81d8b11a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -337,6 +337,12 @@ C23 Feature Support
- Properly promote bit-fields of bit-precise integer types to the field's type
rather than to ``int``. #GH87641
+- Compiler support for `N2653 char8_t: A type for UTF-8 characters and strings`
+ <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm>`_: ``u8`` string
+ literals are now of type ``char8_t[N]`` in C23 and expose
+ ``__CLANG_ATOMIC_CHAR8_T_LOCK_FREE``/``__GCC_ATOMIC_CHAR8_T_LOCK_FREE`` to
+ implement the corresponding macro in ``<stdatomic.h>``.
+
Non-comprehensive list of changes in this release
-------------------------------------------------
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5dc36c594bcb7..6a00b92df1c36 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7252,7 +7252,10 @@ def err_array_init_utf8_string_into_char : Error<
def warn_cxx20_compat_utf8_string : Warning<
"type of UTF-8 string literal will change from array of const char to "
"array of const char8_t in C++20">, InGroup<CXX20Compat>, DefaultIgnore;
-def note_cxx20_compat_utf8_string_remove_u8 : Note<
+def warn_c23_compat_utf8_string : Warning<
+ "type of UTF-8 string literal will change from array of char to "
+ "array of char8_t in C23">, InGroup<C23Compat>, DefaultIgnore;
+def note_cxx20_c23_compat_utf8_string_remove_u8 : Note<
"remove 'u8' prefix to avoid a change of behavior; "
"Clang encodes unprefixed narrow string literals as UTF-8">;
def err_array_init_different_type : Error<
diff --git a/clang/lib/Frontend/InitPreprocessor.cpp b/clang/lib/Frontend/InitPreprocessor.cpp
index 55ec460064830..6270c37342bcf 100644
--- a/clang/lib/Frontend/InitPreprocessor.cpp
+++ b/clang/lib/Frontend/InitPreprocessor.cpp
@@ -1342,8 +1342,10 @@ static void InitializePredefinedMacros(const TargetInfo &TI,
getLockFreeValue(TI.get##Type##Width(), TI));
DEFINE_LOCK_FREE_MACRO(BOOL, Bool);
DEFINE_LOCK_FREE_MACRO(CHAR, Char);
- if (LangOpts.Char8)
- DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char); // Treat char8_t like char.
+ // char8_t has the same representation / width as unsigned
+ // char in C++ and is a typedef for unsigned char in C23
+ if (LangOpts.Char8 || LangOpts.C23)
+ DEFINE_LOCK_FREE_MACRO(CHAR8_T, Char);
DEFINE_LOCK_FREE_MACRO(CHAR16_T, Char16);
DEFINE_LOCK_FREE_MACRO(CHAR32_T, Char32);
DEFINE_LOCK_FREE_MACRO(WCHAR_T, WChar);
diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 9c103d98af8c5..c33cd8083525c 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -35,6 +35,10 @@ extern "C" {
#define ATOMIC_BOOL_LOCK_FREE __CLANG_ATOMIC_BOOL_LOCK_FREE
#define ATOMIC_CHAR_LOCK_FREE __CLANG_ATOMIC_CHAR_LOCK_FREE
+#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \
+ defined(__cplusplus)
+#define ATOMIC_CHAR8_T_LOCK_FREE __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
#define ATOMIC_CHAR16_T_LOCK_FREE __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
#define ATOMIC_CHAR32_T_LOCK_FREE __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
#define ATOMIC_WCHAR_T_LOCK_FREE __CLANG_ATOMIC_WCHAR_T_LOCK_FREE
@@ -104,6 +108,7 @@ typedef _Atomic(long) atomic_long;
typedef _Atomic(unsigned long) atomic_ulong;
typedef _Atomic(long long) atomic_llong;
typedef _Atomic(unsigned long long) atomic_ullong;
+typedef _Atomic(unsigned char) atomic_char8_t;
typedef _Atomic(uint_least16_t) atomic_char16_t;
typedef _Atomic(uint_least32_t) atomic_char32_t;
typedef _Atomic(wchar_t) atomic_wchar_t;
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index db44cfe1288b6..a1b060f7f1510 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -2082,6 +2082,8 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
} else if (Literal.isUTF8()) {
if (getLangOpts().Char8)
CharTy = Context.Char8Ty;
+ else if (getLangOpts().C23)
+ CharTy = Context.UnsignedCharTy;
Kind = StringLiteralKind::UTF8;
} else if (Literal.isUTF16()) {
CharTy = Context.Char16Ty;
@@ -2093,17 +2095,24 @@ Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {
CharTy = Context.UnsignedCharTy;
}
- // Warn on initializing an array of char from a u8 string literal; this
- // becomes ill-formed in C++2a.
- if (getLangOpts().CPlusPlus && !getLangOpts().CPlusPlus20 &&
- !getLangOpts().Char8 && Kind == StringLiteralKind::UTF8) {
- Diag(StringTokLocs.front(), diag::warn_cxx20_compat_utf8_string);
+ // Warn on u8 string literals before C++20 and C23, whose type
+ // was an array of char before but becomes an array of char8_t.
+ // In C++20, initializing an array of char from a u8 string literal
+ // becomes ill-formed. In C23, it might have an unexpected value if
+ // char was signed.
+ if (Kind == StringLiteralKind::UTF8 &&
+ (getLangOpts().CPlusPlus
+ ? !getLangOpts().CPlusPlus20 && !getLangOpts().Char8
+ : !getLangOpts().C23)) {
+ Diag(StringTokLocs.front(), getLangOpts().CPlusPlus
+ ? diag::warn_cxx20_compat_utf8_string
+ : diag::warn_c23_compat_utf8_string);
// Create removals for all 'u8' prefixes in the string literal(s). This
- // ensures C++2a compatibility (but may change the program behavior when
+ // ensures C++20/C23 compatibility (but may change the program behavior when
// built by non-Clang compilers for which the execution character set is
// not always UTF-8).
- auto RemovalDiag = PDiag(diag::note_cxx20_compat_utf8_string_remove_u8);
+ auto RemovalDiag = PDiag(diag::note_cxx20_c23_compat_utf8_string_remove_u8);
SourceLocation RemovalDiagLoc;
for (const Token &Tok : StringToks) {
if (Tok.getKind() == tok::utf8_string_literal) {
diff --git a/clang/test/C/C2x/n2653.c b/clang/test/C/C2x/n2653.c
new file mode 100644
index 0000000000000..1abd61947de7e
--- /dev/null
+++ b/clang/test/C/C2x/n2653.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -verify=c23 -std=c23 %s
+// RUN: %clang_cc1 -verify=c17 -std=c17 %s
+
+// c23-no-diagnostics
+
+#include <stdatomic.h>
+
+#define __enable_constant_folding(x) (__builtin_constant_p(x) ? (x) : (x))
+
+#ifndef ATOMIC_CHAR8_T_LOCK_FREE
+#error missing
+#endif
+// c17-error@-2 {{missing}}
+
+_Static_assert(_Generic(u8"", unsigned char*: 1, char*: 0), "");
+// c17-error@-1 {{static assertion failed}}
+
+// -fsigned-char is the default
+#define M(X) __enable_constant_folding((X) >= 0x80)
+
+_Static_assert(M(u8"\U000000E9"[0]), "");
+// c17-error@-1 {{static assertion failed}}
+#if __STDC_VERSION__ >= 202311L
+_Static_assert(M(u8'\xC3'), "");
+#endif
+
+const char cu8[] = u8"text";
+const signed char scu8[] = u8"text";
+const unsigned char ucu8[] = u8"text";
diff --git a/clang/www/c_status.html b/clang/www/c_status.html
index 84cd8e836006c..81bb51a58e5cb 100644
--- a/clang/www/c_status.html
+++ b/clang/www/c_status.html
@@ -1061,7 +1061,7 @@ <h2 id="c2x">C23 implementation status</h2>
<tr>
<td>char8_t: A type for UTF-8 characters and strings</td>
<td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm">N2653</a></td>
- <td class="none" align="center">No</td>
+ <td class="unreleased" align="center">Clang 19</td>
</tr>
<tr>
<td>Clarification for max exponent macros-update</td>
|
You can test this locally with the following command:git-clang-format --diff fa0e52995929ab67dfb468d71fe793be5e1c7f03 c2528de29ac278f25773042e93ac6c9ba8d4701e --extensions c,cpp,h -- clang/test/C/C23/n2653.c clang/lib/Frontend/InitPreprocessor.cpp clang/lib/Headers/stdatomic.h clang/lib/Sema/SemaExpr.cpp View the diff from clang-format here.diff --git a/clang/lib/Headers/stdatomic.h b/clang/lib/Headers/stdatomic.h
index 2027055f38..ebe92eca64 100644
--- a/clang/lib/Headers/stdatomic.h
+++ b/clang/lib/Headers/stdatomic.h
@@ -36,7 +36,7 @@ extern "C" {
#define ATOMIC_BOOL_LOCK_FREE __CLANG_ATOMIC_BOOL_LOCK_FREE
#define ATOMIC_CHAR_LOCK_FREE __CLANG_ATOMIC_CHAR_LOCK_FREE
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
-#define ATOMIC_CHAR8_T_LOCK_FREE __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
+#define ATOMIC_CHAR8_T_LOCK_FREE __CLANG_ATOMIC_CHAR8_T_LOCK_FREE
#endif
#define ATOMIC_CHAR16_T_LOCK_FREE __CLANG_ATOMIC_CHAR16_T_LOCK_FREE
#define ATOMIC_CHAR32_T_LOCK_FREE __CLANG_ATOMIC_CHAR32_T_LOCK_FREE
@@ -108,7 +108,7 @@ typedef _Atomic(unsigned long) atomic_ulong;
typedef _Atomic(long long) atomic_llong;
typedef _Atomic(unsigned long long) atomic_ullong;
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
-typedef _Atomic(unsigned char) atomic_char8_t;
+typedef _Atomic(unsigned char) atomic_char8_t;
#endif
typedef _Atomic(uint_least16_t) atomic_char16_t;
typedef _Atomic(uint_least32_t) atomic_char32_t;
|
Also the clang-format issues are intentional to fit the style of the surrounding lines |
This should be equivalent to: \#if (defined(__STDC_VERSION__) && __STDC_VERSION__ > 202311L) || \ (defined(__cpp_char8_t) && __cpp_char8_t >= 201811L)
clang/lib/Headers/stdatomic.h
Outdated
typedef _Atomic(unsigned char) atomic_char8_t; | ||
typedef _Atomic(uint_least16_t) atomic_char16_t; | ||
typedef _Atomic(uint_least32_t) atomic_char32_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the definition of atomic_char8_t
also be gated behind C23/C++20?
Also in C++20, this should be _Atomic(char8_t)
, but the next two are also wrong in C++11 (_Atomic(char16_t)
/_Atomic(char32_t)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes atomic_char8_t
should be gated behind C23 because the identifier isn't reserved until C23. For C++, it should be gated behind C++20.
As for the definition, technically in C they should be using char8_t
and friends as well, but those are defined in uchar.h
which we don't vend with the compiler (should we be vending that one in freestanding?). GCC seems to use unsigned char
directly rather than use a transitive include, so I think that approach is fine for us to take as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this header doesn't properly implement P0943R6 in C++23 (Where #define _Atomic(T) std::atomic<T>
and all these typedefs should just be using std::atomic_*
.
So ::atomic_char8_t
should be std::atomic<char8_t>
, not std::atomic<unsigned char>
(nor unsigned char _Atomic
), but that is outside the scope of this PR. I'll just make this match the surrounding typedefs for now.
if (LangOpts.Char8 || LangOpts.C23) | ||
DefineType("__CHAR8_TYPE__", TI.UnsignedChar, Builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches GCC behaviour: https://godbolt.org/z/6Eax3eqrd
But I'm not sure why it's defined in C++ at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm that is surprising, because it expands to unsigned char
and not char8_t
there: https://godbolt.org/z/Y96bPbj8e and this seems to have changed between GCC 8 and GCC 9: https://godbolt.org/z/EaEPKr59E
It may be worth opening a GCC bug to see if this was intentional or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://gcc.gnu.org/legacy-ml/gcc-patches/2018-11/msg00295.html __CHAR8_TYPE__
is "the underlying type of char8_t
" in C++. char8_t
in C++ was added in GCC9 which explains the difference between GCC8 and 9
And this made me realise -std=c++11 -fchar8_t
sets LangOpts.Char8
in C++11, which I think this patch handles fine (See also: #97601).
Also -std=c23 -fchar8_t
is a mode that exists. It makes typeof(u8'x')
char8_t
and typeof(u8"")
char8_t[1]
, even with this patch, which I guess is fine. Pending #55373 if -std=c23 -fchar8_t
even makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm certainly concerned with the impact of that in C++. It should not be defined in C++ (ie, char9_t is not unsigned char in C++)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think #55373 should be fixed so that the option is rejected in C modes, but that's somewhat orthogonal to this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precommit CI has found relevant failures:
�_bk;t=1720277739321�********************
�_bk;t=1720277739321�Failed Tests (1):
�_bk;t=1720277739321� Clang :: C/C2x/n2653.c
�_bk;t=1720277739321�
�_bk;t=1720277739321�
�_bk;t=1720277739321�Testing Time: 261.44s
clang/test/C/C2x/n2653.c
Outdated
// RUN: %clang_cc1 -verify=c23 -std=c23 %s | ||
// RUN: %clang_cc1 -verify=c17 -std=c17 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add -ffreestanding
here otherwise your include below is using whatever it finds on the system rather than our header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That still seems to fail on the CI:
******************** TEST 'Clang :: C/C23/n2653.c' FAILED ********************
Exit Code: 1
Command Output (stderr):
--
RUN: at line 1: /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/lib/clang/19/include -nostdsysteminc -ffreestanding -verify=c23 -std=c23 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c
+ /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/build/lib/clang/19/include -nostdsysteminc -ffreestanding -verify=c23 -std=c23 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c
error: 'c23-error' diagnostics seen but not expected:
File /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-vhr99-1/llvm-project/github-pull-requests/clang/test/C/C23/n2653.c Line 11: missing
1 error generated.
--
********************
So I've just removed the test for <stdatomic.h> (-nostdinc
seems to disable the clang header too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove the test for stdatomic,h, that's an important property for us to test and may be papering over a bug here. The only situation we should be getting the system header is:
#if __STDC_HOSTED__ && \
__has_include_next(<stdatomic.h>) && \
(!defined(_MSC_VER) || (defined(__cplusplus) && __cplusplus >= 202002L))
# include_next <stdatomic.h>
#else
...
so `-ffreestanding` really should only include our header. What do you get from `-E` on that machine? Are we actually including the system header anyway, or did we mess up our header somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I ran the wrong tests then misread some -E
output... Should be fixed now (__STDC_VERSION__ > 202311L
-> __STDC_VERSION__ >= 202311L
)
Still fails with -nostdsysteminc -ffreestanding (seems to be pulling in other system headers still)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please wait for @cor3ntin to give a final sign-off before landing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm Closes #97202 --------- Co-authored-by: cor3ntin <[email protected]> Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250894
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm
Closes #97202