Commit 41c6e43

Fznamznon, ThePhD, AaronBallman, cor3ntin, and h-vetinari authored
Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (#95802)
This commit implements the entirety of the now-accepted [N3017 - Preprocessor Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and its sister C++ paper [p1967](https://wg21.link/p1967). It implements everything in the specification, and includes an implementation that drastically improves the time it takes to embed data in specific scenarios (the initialization of character type arrays). The mechanisms used to do this are applied under the "as-if" rule. In general, when the system cannot detect that it is initializing an array object in a variable declaration, it either generates an EmbedExpr AST node, which is expanded by AST consumers (CodeGen or constant expression evaluators), or expands the embed directive as a comma expression.

This reverts commit 682d461.

---------

Co-authored-by: The Phantom Derpstorm <[email protected]>
Co-authored-by: Aaron Ballman <[email protected]>
Co-authored-by: cor3ntin <[email protected]>
Co-authored-by: H. Vetinari <[email protected]>
1 parent af82e63 commit 41c6e43

File tree

95 files changed

+3324
-107
lines changed

clang-tools-extra/test/pp-trace/pp-trace-macro.cpp

Lines changed: 9 additions & 0 deletions
@@ -31,6 +31,15 @@ X
 // CHECK: MacroNameTok: __STDC_UTF_32__
 // CHECK-NEXT: MacroDirective: MD_Define
 // CHECK: - Callback: MacroDefined
+// CHECK-NEXT: MacroNameTok: __STDC_EMBED_NOT_FOUND__
+// CHECK-NEXT: MacroDirective: MD_Define
+// CHECK: - Callback: MacroDefined
+// CHECK-NEXT: MacroNameTok: __STDC_EMBED_FOUND__
+// CHECK-NEXT: MacroDirective: MD_Define
+// CHECK: - Callback: MacroDefined
+// CHECK-NEXT: MacroNameTok: __STDC_EMBED_EMPTY__
+// CHECK-NEXT: MacroDirective: MD_Define
+// CHECK: - Callback: MacroDefined
 // CHECK: - Callback: MacroDefined
 // CHECK-NEXT: MacroNameTok: MACRO
 // CHECK-NEXT: MacroDirective: MD_Define
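
The three macros traced above name the results that a C23 `__has_embed` preprocessor expression can produce (not found, found, and found but empty). A minimal sketch of how client code might branch on them is shown below. `EmbedProbe` and `probeThisFile` are hypothetical names used only for illustration, and the `Unsupported` branch is an assumption covering compilers that predate `#embed`.

```cpp
// Hypothetical illustration; none of these names come from the commit.
enum class EmbedProbe { Unsupported, NotFound, Found, Empty };

// Asks the preprocessor whether this very source file could be embedded.
constexpr EmbedProbe probeThisFile() {
#if defined(__has_embed)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
  return EmbedProbe::Found;       // resource exists and has at least one byte
#  elif __has_embed(__FILE__) == __STDC_EMBED_EMPTY__
  return EmbedProbe::Empty;       // resource exists but contributes no bytes
#  else
  return EmbedProbe::NotFound;    // corresponds to __STDC_EMBED_NOT_FOUND__
#  endif
#else
  return EmbedProbe::Unsupported; // compiler offers no __has_embed at all
#endif
}
```

Which branch is taken depends on the compiler and on how it resolves `__FILE__` during the embed search, so the sketch makes no claim about the result on any particular toolchain.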

clang/docs/LanguageExtensions.rst

Lines changed: 24 additions & 0 deletions
@@ -1502,6 +1502,7 @@ Attributes on Structured Bindings __cpp_structured_bindings C+
 Designated initializers (N494)                                                C99           C89
 Array & element qualification (N2607)                                         C23           C89
 Attributes (N2335)                                                            C23           C89
+``#embed`` (N3017)                                                            C23           C89, C++
 ============================================ ================================ ============= =============
 
 Type Trait Primitives
@@ -5664,3 +5665,26 @@ Compiling different TUs depending on these flags (including use of
 ``std::hardware_destructive_interference``) with different compilers, macro
 definitions, or architecture flags will lead to ODR violations and should be
 avoided.
+
+``#embed`` Parameters
+=====================
+
+``clang::offset``
+-----------------
+The ``clang::offset`` embed parameter may appear zero or one time in the
+embed parameter sequence. Its preprocessor argument clause shall be present and
+have the form:
+
+.. code-block:: text
+
+  ( constant-expression )
+
+and shall be an integer constant expression. The integer constant expression
+shall not evaluate to a value less than 0. The token ``defined`` shall not
+appear within the constant expression.
+
+The offset will be used when reading the contents of the embedded resource to
+specify the starting offset to begin embedding from. The resource is treated
+as being empty if the specified offset is larger than the number of bytes in
+the resource. The offset will be applied *before* any ``limit`` parameters are
+applied.
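
The ordering rule in the documentation above (offset first, then `limit`) can be modeled on an in-memory byte buffer. This is only a sketch of the documented semantics, not Clang's implementation; `embeddedBytes` is a hypothetical helper, and the buffer stands in for the resource that a directive such as `#embed "data.bin" clang::offset(2) limit(3)` would read.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not a Clang API): returns the bytes that would be
// embedded for a given offset and optional limit.
inline std::string_view
embeddedBytes(std::string_view resource, std::size_t offset,
              std::optional<std::size_t> limit = std::nullopt) {
  // An offset at or past the end leaves nothing to embed, so the resource
  // behaves as if it were empty.
  if (offset >= resource.size())
    return {};
  // The offset is applied first ...
  std::string_view rest = resource.substr(offset);
  // ... and only then is the limit applied to what remains.
  if (limit && *limit < rest.size())
    rest = rest.substr(0, *limit);
  return rest;
}
```

For an eight-byte resource `ABCDEFGH`, an offset of 2 with a limit of 3 selects `CDE`; applying the limit first would select `C` alone, which is why the order matters.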

clang/include/clang/AST/Expr.h

Lines changed: 158 additions & 0 deletions
@@ -4799,6 +4799,164 @@ class SourceLocExpr final : public Expr {
   friend class ASTStmtReader;
 };
 
+/// Stores data related to a single #embed directive.
+struct EmbedDataStorage {
+  StringLiteral *BinaryData;
+  size_t getDataElementCount() const { return BinaryData->getByteLength(); }
+};
+
+/// Represents a reference to #embed data. By default, this references the whole
+/// range. Otherwise it represents a subrange of data imported by #embed
+/// directive. Needed to handle nested initializer lists with #embed directives.
+/// Example:
+///  struct S {
+///    int x, y;
+///  };
+///
+///  struct T {
+///    int x[2];
+///    struct S s;
+///  };
+///
+///  struct T t[] = {
+///  #embed "data" // data contains 10 elements;
+///  };
+///
+/// The resulting semantic form of initializer list will contain (EE stands
+/// for EmbedExpr):
+///  { {EE(first two data elements), {EE(3rd element), EE(4th element) }},
+///  { {EE(5th and 6th element), {EE(7th element), EE(8th element) }},
+///  { {EE(9th and 10th element), { zeroinitializer }}}
+///
+/// EmbedExpr inside of a semantic initializer list and referencing more than
+/// one element can only appear for arrays of scalars.
+class EmbedExpr final : public Expr {
+  SourceLocation EmbedKeywordLoc;
+  IntegerLiteral *FakeChildNode = nullptr;
+  const ASTContext *Ctx = nullptr;
+  EmbedDataStorage *Data;
+  unsigned Begin = 0;
+  unsigned NumOfElements;
+
+public:
+  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, EmbedDataStorage *Data,
+            unsigned Begin, unsigned NumOfElements);
+  explicit EmbedExpr(EmptyShell Empty) : Expr(SourceLocExprClass, Empty) {}
+
+  SourceLocation getLocation() const { return EmbedKeywordLoc; }
+  SourceLocation getBeginLoc() const { return EmbedKeywordLoc; }
+  SourceLocation getEndLoc() const { return EmbedKeywordLoc; }
+
+  StringLiteral *getDataStringLiteral() const { return Data->BinaryData; }
+  EmbedDataStorage *getData() const { return Data; }
+
+  unsigned getStartingElementPos() const { return Begin; }
+  size_t getDataElementCount() const { return NumOfElements; }
+
+  // Allows accessing every byte of EmbedExpr data and iterating over it.
+  // An Iterator knows the EmbedExpr that it refers to, and an offset value
+  // within the data.
+  // Dereferencing an Iterator results in construction of IntegerLiteral AST
+  // node filled with byte of data of the corresponding EmbedExpr within offset
+  // that the Iterator currently has.
+  template <bool Const>
+  class ChildElementIter
+      : public llvm::iterator_facade_base<
+            ChildElementIter<Const>, std::random_access_iterator_tag,
+            std::conditional_t<Const, const IntegerLiteral *,
+                               IntegerLiteral *>> {
+    friend class EmbedExpr;
+
+    EmbedExpr *EExpr = nullptr;
+    unsigned long long CurOffset = ULLONG_MAX;
+    using BaseTy = typename ChildElementIter::iterator_facade_base;
+
+    ChildElementIter(EmbedExpr *E) : EExpr(E) {
+      if (E)
+        CurOffset = E->getStartingElementPos();
+    }
+
+  public:
+    ChildElementIter() : CurOffset(ULLONG_MAX) {}
+    typename BaseTy::reference operator*() const {
+      assert(EExpr && CurOffset != ULLONG_MAX &&
+             "trying to dereference an invalid iterator");
+      IntegerLiteral *N = EExpr->FakeChildNode;
+      StringRef DataRef = EExpr->Data->BinaryData->getBytes();
+      N->setValue(*EExpr->Ctx,
+                  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+                              N->getType()->isSignedIntegerType()));
+      // We want to return a reference to the fake child node in the
+      // EmbedExpr, not the local variable N.
+      return const_cast<typename BaseTy::reference>(EExpr->FakeChildNode);
+    }
+    typename BaseTy::pointer operator->() const { return **this; }
+    using BaseTy::operator++;
+    ChildElementIter &operator++() {
+      assert(EExpr && "trying to increment an invalid iterator");
+      assert(CurOffset != ULLONG_MAX &&
+             "Already at the end of what we can iterate over");
+      if (++CurOffset >=
+          EExpr->getDataElementCount() + EExpr->getStartingElementPos()) {
+        CurOffset = ULLONG_MAX;
+        EExpr = nullptr;
+      }
+      return *this;
+    }
+    bool operator==(ChildElementIter Other) const {
+      return (EExpr == Other.EExpr && CurOffset == Other.CurOffset);
+    }
+  }; // class ChildElementIter
+
+public:
+  using fake_child_range = llvm::iterator_range<ChildElementIter<false>>;
+  using const_fake_child_range = llvm::iterator_range<ChildElementIter<true>>;
+
+  fake_child_range underlying_data_elements() {
+    return fake_child_range(ChildElementIter<false>(this),
+                            ChildElementIter<false>());
+  }
+
+  const_fake_child_range underlying_data_elements() const {
+    return const_fake_child_range(
+        ChildElementIter<true>(const_cast<EmbedExpr *>(this)),
+        ChildElementIter<true>());
+  }
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == EmbedExprClass;
+  }
+
+  ChildElementIter<false> begin() { return ChildElementIter<false>(this); }
+
+  ChildElementIter<true> begin() const {
+    return ChildElementIter<true>(const_cast<EmbedExpr *>(this));
+  }
+
+  template <typename Call, typename... Targs>
+  bool doForEachDataElement(Call &&C, unsigned &StartingIndexInArray,
+                            Targs &&...Fargs) const {
+    for (auto It : underlying_data_elements()) {
+      if (!std::invoke(std::forward<Call>(C), const_cast<IntegerLiteral *>(It),
+                       StartingIndexInArray, std::forward<Targs>(Fargs)...))
+        return false;
+      StartingIndexInArray++;
+    }
+    return true;
+  }
+
+private:
+  friend class ASTStmtReader;
+};
+
 /// Describes an C or C++ initializer list.
 ///
 /// InitListExpr describes an initializer list, which can be used to
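
The nested-initializer example in the `EmbedExpr` comment relies on ordinary brace elision. The sketch below reproduces the same grouping with a literal list of ten integers standing in for `#embed "data"`, so it needs no `#embed` support from the compiler. The values 1 through 10 and the name `tCount` are assumptions made for illustration.

```cpp
#include <cstddef>

struct S {
  int x, y;
};

struct T {
  int x[2];
  S s;
};

// Stand-in for `#embed "data"` with a ten-element resource holding 1..10.
// Brace elision distributes the flat list over the nested aggregates:
//   t[0] = { {1, 2}, {3, 4} }
//   t[1] = { {5, 6}, {7, 8} }
//   t[2] = { {9, 10}, {0, 0} }   // trailing member is zero-initialized
const T t[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// Four ints fill one T, so ten values need three array elements.
constexpr std::size_t tCount = sizeof(t) / sizeof(t[0]);
```

Some compilers warn about the missing inner braces here; that warning is expected, since the elision is the point of the example.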

clang/include/clang/AST/RecursiveASTVisitor.h

Lines changed: 5 additions & 0 deletions
@@ -2864,6 +2864,11 @@ DEF_TRAVERSE_STMT(ShuffleVectorExpr, {})
 DEF_TRAVERSE_STMT(ConvertVectorExpr, {})
 DEF_TRAVERSE_STMT(StmtExpr, {})
 DEF_TRAVERSE_STMT(SourceLocExpr, {})
+DEF_TRAVERSE_STMT(EmbedExpr, {
+  for (IntegerLiteral *IL : S->underlying_data_elements()) {
+    TRY_TO_TRAVERSE_OR_ENQUEUE_STMT(IL);
+  }
+})
 
 DEF_TRAVERSE_STMT(UnresolvedLookupExpr, {
   TRY_TO(TraverseNestedNameSpecifierLoc(S->getQualifierLoc()));
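
The traversal above visits "fake" children: per the `ChildElementIter` code in `Expr.h`, each dereference rewrites one shared `IntegerLiteral` rather than creating a new node. The standalone sketch below models that pattern without any Clang dependency. `FakeLiteral`, `EmbedWindow`, and `collectValues` are hypothetical simplifications, not Clang types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for clang::IntegerLiteral; holds one byte value.
struct FakeLiteral {
  int value = 0;
};

// Hypothetical, simplified analogue of EmbedExpr: a [begin, begin + count)
// window over a byte string, exposed through a single reusable node.
class EmbedWindow {
  std::string data_;
  std::size_t begin_;
  std::size_t count_;
  mutable FakeLiteral node_; // overwritten on every visit

public:
  EmbedWindow(std::string data, std::size_t begin, std::size_t count)
      : data_(std::move(data)), begin_(begin), count_(count) {
    assert(begin_ <= data_.size() && count_ <= data_.size() - begin_ &&
           "window must lie inside the data");
  }

  // Calls fn(node) once per byte in the window; stops early if fn returns
  // false, mirroring the early exit in doForEachDataElement.
  template <typename Fn> bool forEachElement(Fn &&fn) const {
    for (std::size_t i = begin_; i < begin_ + count_; ++i) {
      node_.value = static_cast<unsigned char>(data_[i]);
      if (!fn(&node_))
        return false;
    }
    return true;
  }
};

// Copies the values out during the visit, because the node pointer handed to
// the callback always refers to the same object.
inline std::vector<int> collectValues(const EmbedWindow &w) {
  std::vector<int> out;
  w.forEachElement([&](const FakeLiteral *n) {
    out.push_back(n->value);
    return true;
  });
  return out;
}
```

A visitor that stores the node pointers instead of the values would end up with several copies of the same pointer, all showing the last byte visited. That is the trade-off of this design: no per-byte allocation, at the cost of nodes that are only valid during the visit.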

clang/include/clang/AST/TextNodeDumper.h

Lines changed: 1 addition & 0 deletions
@@ -409,6 +409,7 @@ class TextNodeDumper
   void VisitHLSLBufferDecl(const HLSLBufferDecl *D);
   void VisitOpenACCConstructStmt(const OpenACCConstructStmt *S);
   void VisitOpenACCLoopConstruct(const OpenACCLoopConstruct *S);
+  void VisitEmbedExpr(const EmbedExpr *S);
 };
 
 } // namespace clang

clang/include/clang/Basic/DiagnosticCommonKinds.td

Lines changed: 3 additions & 0 deletions
@@ -275,6 +275,9 @@ def err_too_large_for_fixed_point : Error<
 def err_unimplemented_conversion_with_fixed_point_type : Error<
   "conversion between fixed point and %0 is not yet supported">;
 
+def err_requires_positive_value : Error<
+  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
+
 // SEH
 def err_seh_expected_handler : Error<
   "expected '__except' or '__finally' block">;

clang/include/clang/Basic/DiagnosticLexKinds.td

Lines changed: 12 additions & 0 deletions
@@ -436,6 +436,14 @@ def warn_cxx23_compat_warning_directive : Warning<
 def warn_c23_compat_warning_directive : Warning<
   "#warning is incompatible with C standards before C23">,
   InGroup<CPre23Compat>, DefaultIgnore;
+def ext_pp_embed_directive : ExtWarn<
+  "#embed is a %select{C23|Clang}0 extension">,
+  InGroup<C23>;
+def warn_compat_pp_embed_directive : Warning<
+  "#embed is incompatible with C standards before C23">,
+  InGroup<CPre23Compat>, DefaultIgnore;
+def err_pp_embed_dup_params : Error<
+  "cannot specify parameter '%0' twice in the same '#embed' directive">;
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
@@ -505,6 +513,8 @@ def err_pp_invalid_directive : Error<
   "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;
 def warn_pp_invalid_directive : Warning<
   err_pp_invalid_directive.Summary>, InGroup<DiagGroup<"unknown-directives">>;
+def err_pp_unknown_parameter : Error<
+  "unknown%select{ | embed}0 preprocessor parameter '%1'">;
 def err_pp_directive_required : Error<
   "%0 must be used within a preprocessing directive">;
 def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;
@@ -719,6 +729,8 @@ def err_pp_module_build_missing_end : Error<
   "no matching '#pragma clang module endbuild' for this '#pragma clang module build'">;
 
 def err_defined_macro_name : Error<"'defined' cannot be used as a macro name">;
+def err_defined_in_pp_embed : Error<
+  "'defined' cannot appear within this context">;
 def err_paste_at_start : Error<
   "'##' cannot appear at start of macro expansion">;
 def err_paste_at_end : Error<"'##' cannot appear at end of macro expansion">;

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 0 additions & 2 deletions
@@ -1097,8 +1097,6 @@ def note_surrounding_namespace_starts_here : Note<
   "surrounding namespace with visibility attribute starts here">;
 def err_pragma_loop_invalid_argument_type : Error<
   "invalid argument of type %0; expected an integer type">;
-def err_pragma_loop_invalid_argument_value : Error<
-  "%select{invalid value '%0'; must be positive|value '%0' is too large}1">;
 def err_pragma_loop_compatibility : Error<
   "%select{incompatible|duplicate}0 directives '%1' and '%2'">;
 def err_pragma_loop_precedes_nonloop : Error<

clang/include/clang/Basic/FileManager.h

Lines changed: 7 additions & 4 deletions
@@ -286,12 +286,15 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true) const {
-    return getBufferForFileImpl(Filename, /*FileSize=*/-1, isVolatile,
-                                RequiresNullTerminator);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt) const {
+    return getBufferForFileImpl(Filename,
+                                /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1),
+                                isVolatile, RequiresNullTerminator);
   }
 
 private:
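
The new `MaybeLimit` parameter is folded into the existing `FileSize` argument, with `-1` kept as the "no limit" sentinel. The sketch below isolates that mapping. `fileSizeArgument` and `readUpTo` are hypothetical names, and `readUpTo` assumes that a non-negative size caps how many bytes are read; the diff itself does not show that behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Same expression as in the diff: an absent limit becomes the sentinel -1.
constexpr std::int64_t
fileSizeArgument(std::optional<std::int64_t> maybeLimit) {
  return maybeLimit ? *maybeLimit : -1;
}

// Hypothetical reader, assuming a negative size means "read everything" and
// a non-negative size caps the number of bytes returned.
inline std::string readUpTo(std::string_view contents, std::int64_t fileSize) {
  if (fileSize < 0)
    return std::string(contents);
  // substr clamps the count to the available length.
  return std::string(contents.substr(0, static_cast<std::size_t>(fileSize)));
}
```

Reusing the sentinel keeps existing callers unchanged, since the default `std::nullopt` produces the same `-1` that was previously hard-coded.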

clang/include/clang/Basic/StmtNodes.td

Lines changed: 1 addition & 0 deletions
@@ -204,6 +204,7 @@ def OpaqueValueExpr : StmtNode<Expr>;
 def TypoExpr : StmtNode<Expr>;
 def RecoveryExpr : StmtNode<Expr>;
 def BuiltinBitCastExpr : StmtNode<ExplicitCastExpr>;
+def EmbedExpr : StmtNode<Expr>;
 
 // Microsoft Extensions.
 def MSPropertyRefExpr : StmtNode<Expr>;

clang/include/clang/Basic/TokenKinds.def

Lines changed: 6 additions & 0 deletions
@@ -126,6 +126,9 @@ PPKEYWORD(error)
 // C99 6.10.6 - Pragma Directive.
 PPKEYWORD(pragma)
 
+// C23 & C++26 #embed
+PPKEYWORD(embed)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -999,6 +1002,9 @@ ANNOTATION(header_unit)
 // Annotation for end of input in clang-repl.
 ANNOTATION(repl_input_end)
 
+// Annotation for #embed
+ANNOTATION(embed)
+
 #undef PRAGMA_ANNOTATION
 #undef ANNOTATION
 #undef TESTING_KEYWORD

clang/include/clang/Driver/Options.td

Lines changed: 6 additions & 0 deletions
@@ -884,6 +884,9 @@ will be ignored}]>;
 def L : JoinedOrSeparate<["-"], "L">, Flags<[RenderJoined]>, Group<Link_Group>,
     Visibility<[ClangOption, FlangOption]>,
     MetaVarName<"<dir>">, HelpText<"Add directory to library search path">;
+def embed_dir_EQ : Joined<["--"], "embed-dir=">, Group<Preprocessor_Group>,
+    Visibility<[ClangOption, CC1Option]>, MetaVarName<"<dir>">,
+    HelpText<"Add directory to embed search path">;
 def MD : Flag<["-"], "MD">, Group<M_Group>,
     HelpText<"Write a depfile containing user and system headers">;
 def MMD : Flag<["-"], "MMD">, Group<M_Group>,
@@ -1477,6 +1480,9 @@ def dD : Flag<["-"], "dD">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>
 def dI : Flag<["-"], "dI">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">,
   MarshallingInfoFlag<PreprocessorOutputOpts<"ShowIncludeDirectives">>;
+def dE : Flag<["-"], "dE">, Group<d_Group>, Visibility<[CC1Option]>,
+  HelpText<"Print embed directives in -E mode in addition to normal output">,
+  MarshallingInfoFlag<PreprocessorOutputOpts<"ShowEmbedDirectives">>;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;

clang/include/clang/Frontend/PreprocessorOutputOptions.h

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,8 @@ class PreprocessorOutputOptions {
   LLVM_PREFERRED_TYPE(bool)
   unsigned ShowIncludeDirectives : 1; ///< Print includes, imports etc. within preprocessed output.
   LLVM_PREFERRED_TYPE(bool)
+  unsigned ShowEmbedDirectives : 1; ///< Print embeds, etc. within preprocessed output.
+  LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteIncludes : 1; ///< Preprocess include directives only.
   LLVM_PREFERRED_TYPE(bool)
   unsigned RewriteImports : 1; ///< Include contents of transitively-imported modules.
@@ -51,6 +53,7 @@ class PreprocessorOutputOptions {
     ShowMacroComments = 0;
     ShowMacros = 0;
     ShowIncludeDirectives = 0;
+    ShowEmbedDirectives = 0;
     RewriteIncludes = 0;
     RewriteImports = 0;
     MinimizeWhitespace = 0;
