Skip to content

[Clang][AArch64] Generalise streaming mode checks for builtins. #93802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -3198,9 +3198,8 @@ def err_attribute_bad_sve_vector_size : Error<
def err_attribute_arm_feature_sve_bits_unsupported : Error<
"%0 is only supported when '-msve-vector-bits=<bits>' is specified with a "
"value of 128, 256, 512, 1024 or 2048">;
def warn_attribute_arm_sm_incompat_builtin : Warning<
"builtin call has undefined behaviour when called from a %0 function">,
InGroup<DiagGroup<"undefined-arm-streaming">>;
def err_attribute_arm_sm_incompat_builtin : Error<
"builtin can only be called from a %0 function">;
def warn_attribute_arm_za_builtin_no_za_state : Warning<
"builtin call is not valid when calling from a function without active ZA state">,
InGroup<DiagGroup<"undefined-arm-za">>;
Expand Down
1,550 changes: 781 additions & 769 deletions clang/include/clang/Basic/arm_sve.td

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion clang/include/clang/Basic/arm_sve_sme_incl.td
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ def IsStreamingCompatible : FlagType<0x4000000000>;
def IsReadZA : FlagType<0x8000000000>;
def IsWriteZA : FlagType<0x10000000000>;
def IsReductionQV : FlagType<0x20000000000>;
def IsStreamingOrSVE2p1 : FlagType<0x40000000000>; // Use for intrinsics that are common between sme/sme2 and sve2p1.
def VerifyRuntimeMode : FlagType<0x40000000000>; // Use for intrinsics that are common between SVE and SME.
def IsInZA : FlagType<0x80000000000>;
def IsOutZA : FlagType<0x100000000000>;
def IsInOutZA : FlagType<0x200000000000>;
Expand Down
11 changes: 7 additions & 4 deletions clang/include/clang/Sema/SemaARM.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,13 @@ class SemaARM : public SemaBase {
SemaARM(Sema &S);

enum ArmStreamingType {
ArmNonStreaming,
ArmStreaming,
ArmStreamingCompatible,
ArmStreamingOrSVE2p1
ArmNonStreaming, /// Intrinsic is only available in normal mode
ArmStreaming, /// Intrinsic is only available in Streaming-SVE mode.
ArmStreamingCompatible, /// Intrinsic is available both in normal and
/// Streaming-SVE mode.
VerifyRuntimeMode /// Intrinsic is available in normal mode with
/// SVE flags, or in Streaming-SVE mode with SME
/// flags. Do Sema checks for the runtime mode.
};

bool CheckARMBuiltinExclusiveCall(unsigned BuiltinID, CallExpr *TheCall,
Expand Down
123 changes: 94 additions & 29 deletions clang/lib/Sema/SemaARM.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -559,31 +559,86 @@ SemaARM::ArmStreamingType getArmStreamingFnType(const FunctionDecl *FD) {
return SemaARM::ArmNonStreaming;
}

static void checkArmStreamingBuiltin(Sema &S, CallExpr *TheCall,
const FunctionDecl *FD,
SemaARM::ArmStreamingType BuiltinType) {
static bool checkArmStreamingBuiltin(Sema &S, CallExpr *TheCall,
FunctionDecl *FD,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can FD's constness be restored? I think you only had to remove it because you previously called getODRHash.

SemaARM::ArmStreamingType BuiltinType,
unsigned BuiltinID) {
SemaARM::ArmStreamingType FnType = getArmStreamingFnType(FD);
if (BuiltinType == SemaARM::ArmStreamingOrSVE2p1) {
// Check intrinsics that are available in [sve2p1 or sme/sme2].
llvm::StringMap<bool> CallerFeatureMap;
S.Context.getFunctionFeatureMap(CallerFeatureMap, FD);
if (Builtin::evaluateRequiredTargetFeatures("sve2p1", CallerFeatureMap))
BuiltinType = SemaARM::ArmStreamingCompatible;
else

// Check if the intrinsic is available in the right mode, i.e.
// * When compiling for SME only, the caller must be in streaming mode.
// * When compiling for SVE only, the caller must be in non-streaming mode.
// * When compiling for both SVE and SME, the caller can be in either mode.
if (BuiltinType == SemaARM::VerifyRuntimeMode) {
static llvm::StringMap<bool> CallerFeatureMapWithoutSVE,
CallerFeatureMapWithoutSME;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I'm wrong but I think the use of static here is almost certainly bad because there's nothing stopping multiple threads from calling checkArmStreamingBuiltin. I looked for other instances but all I could find involved one time static initialisation after which the data is effectively constant.

Random Idea:
Rather than filtering the feature map you could process BuiltinTargetGuards, by which I mean you could split the guard into streaming and non-streaming (perhaps that's what | effectively means) and then you use whichever side is relevant to the function's mode of operation.

Of course you could just remove the cache and leave compile time as a worry for tomorrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not sure what I was thinking here with the 'static' variable :)

I'd rather not go down the route of parsing the BuiltinTargetGuards here or making assumptions on the format if we're not going to ensure this format in the arm_sve.td file. For example, one can write (sve,featureX)|(sme2,featureX) or (sve|sme),featureX. Sure we can assume the former format, but in that case I think we need to have a SVETargetGuard and a SMETargetGuard, that we let TableGen combine into a canonical form for the combined TargetGuard.


// Cache the feature maps, to avoid having to recalculate this for each
// builtin call.
static unsigned CachedODRHash = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (FD->getODRHash() != CachedODRHash) {
auto DisableFeatures = [](llvm::StringMap<bool> &Map, StringRef S) {
for (StringRef K : Map.keys())
if (K.starts_with(S))
Map[K] = false;
};

CallerFeatureMapWithoutSME.clear();
S.Context.getFunctionFeatureMap(CallerFeatureMapWithoutSME, FD);
DisableFeatures(CallerFeatureMapWithoutSME, "sme");

CallerFeatureMapWithoutSVE.clear();
S.Context.getFunctionFeatureMap(CallerFeatureMapWithoutSVE, FD);
DisableFeatures(CallerFeatureMapWithoutSVE, "sve");

CachedODRHash = FD->getODRHash();
}

// Avoid emitting diagnostics for a function that can never compile.
if (FnType == SemaARM::ArmStreaming && !CallerFeatureMapWithoutSVE["sme"])
return false;

// We know the builtin requires either some combination of SVE flags, or
// some combination of SME flags, but we need to figure out which part
// of the required features is satisfied by the target features.
//
// For a builtin with target guard 'sve2p1|sme2', if we compile with
// '+sve2p1,+sme', then we know that it satisfies the 'sve2p1' part if we
// evaluate the features for '+sve2p1,+sme,+nosme'.
//
// Similarly, if we compile with '+sve2,+sme2', then we know it satisfies
// the 'sme2' part if we evaluate the features for '+sve2,+sme2,+nosve'.
StringRef BuiltinTargetGuards(
S.Context.BuiltinInfo.getRequiredFeatures(BuiltinID));
bool SatisfiesSVE = Builtin::evaluateRequiredTargetFeatures(
BuiltinTargetGuards, CallerFeatureMapWithoutSME);
bool SatisfiesSME = Builtin::evaluateRequiredTargetFeatures(
BuiltinTargetGuards, CallerFeatureMapWithoutSVE);

if ((SatisfiesSVE && SatisfiesSME) ||
(SatisfiesSVE && FnType == SemaARM::ArmStreamingCompatible))
return false;
else if (SatisfiesSVE)
BuiltinType = SemaARM::ArmNonStreaming;
else if (SatisfiesSME)
BuiltinType = SemaARM::ArmStreaming;
else
// This should be diagnosed by CodeGen
return false;
}

if (FnType == SemaARM::ArmStreaming &&
if (FnType != SemaARM::ArmNonStreaming &&
BuiltinType == SemaARM::ArmNonStreaming)
S.Diag(TheCall->getBeginLoc(), diag::warn_attribute_arm_sm_incompat_builtin)
<< TheCall->getSourceRange() << "streaming";
else if (FnType == SemaARM::ArmNonStreaming && BuiltinType == SemaARM::ArmStreaming)
S.Diag(TheCall->getBeginLoc(), diag::warn_attribute_arm_sm_incompat_builtin)
S.Diag(TheCall->getBeginLoc(), diag::err_attribute_arm_sm_incompat_builtin)
<< TheCall->getSourceRange() << "non-streaming";
else if (FnType == SemaARM::ArmStreamingCompatible &&
BuiltinType != SemaARM::ArmStreamingCompatible)
S.Diag(TheCall->getBeginLoc(), diag::warn_attribute_arm_sm_incompat_builtin)
<< TheCall->getSourceRange() << "streaming compatible";
else if (FnType != SemaARM::ArmStreaming &&
BuiltinType == SemaARM::ArmStreaming)
S.Diag(TheCall->getBeginLoc(), diag::err_attribute_arm_sm_incompat_builtin)
<< TheCall->getSourceRange() << "streaming";
else
return false;

return true;
}

static bool hasArmZAState(const FunctionDecl *FD) {
Expand Down Expand Up @@ -612,7 +667,9 @@ static ArmSMEState getSMEState(unsigned BuiltinID) {

bool SemaARM::CheckSMEBuiltinFunctionCall(unsigned BuiltinID,
CallExpr *TheCall) {
if (const FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {
bool HasError = false;

if (FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {
std::optional<ArmStreamingType> BuiltinType;

switch (BuiltinID) {
Expand All @@ -622,7 +679,8 @@ bool SemaARM::CheckSMEBuiltinFunctionCall(unsigned BuiltinID,
}

if (BuiltinType)
checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType);
HasError |= checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be wrong to return immediately? I ask because there's

  switch (BuiltinID) {
  default:
    return false;

which should be return HasError;? but if we can return directly then there's less change of other similar issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

BuiltinID);

if ((getSMEState(BuiltinID) & ArmZAMask) && !hasArmZAState(FD))
Diag(TheCall->getBeginLoc(),
Expand All @@ -646,12 +704,14 @@ bool SemaARM::CheckSMEBuiltinFunctionCall(unsigned BuiltinID,
#undef GET_SME_IMMEDIATE_CHECK
}

return ParseSVEImmChecks(TheCall, ImmChecks);
HasError |= ParseSVEImmChecks(TheCall, ImmChecks);
return HasError;
}

bool SemaARM::CheckSVEBuiltinFunctionCall(unsigned BuiltinID,
CallExpr *TheCall) {
if (const FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {
bool HasError = false;
if (FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {
std::optional<ArmStreamingType> BuiltinType;

switch (BuiltinID) {
Expand All @@ -660,7 +720,8 @@ bool SemaARM::CheckSVEBuiltinFunctionCall(unsigned BuiltinID,
#undef GET_SVE_STREAMING_ATTRS
}
if (BuiltinType)
checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType);
HasError |= checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType,
BuiltinID);
}
// Range check SVE intrinsics that take immediate values.
SmallVector<std::tuple<int, int, int>, 3> ImmChecks;
Expand All @@ -673,13 +734,15 @@ bool SemaARM::CheckSVEBuiltinFunctionCall(unsigned BuiltinID,
#undef GET_SVE_IMMEDIATE_CHECK
}

return ParseSVEImmChecks(TheCall, ImmChecks);
HasError |= ParseSVEImmChecks(TheCall, ImmChecks);
return HasError;
}

bool SemaARM::CheckNeonBuiltinFunctionCall(const TargetInfo &TI,
unsigned BuiltinID,
CallExpr *TheCall) {
if (const FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {
bool HasError = false;
if (FunctionDecl *FD = SemaRef.getCurFunctionDecl()) {

switch (BuiltinID) {
default:
Expand All @@ -688,7 +751,8 @@ bool SemaARM::CheckNeonBuiltinFunctionCall(const TargetInfo &TI,
#define TARGET_BUILTIN(id, ...) case NEON::BI##id:
#define BUILTIN(id, ...) case NEON::BI##id:
#include "clang/Basic/arm_neon.inc"
checkArmStreamingBuiltin(SemaRef, TheCall, FD, ArmNonStreaming);
HasError |= checkArmStreamingBuiltin(SemaRef, TheCall, FD,
ArmNonStreaming, BuiltinID);
break;
#undef TARGET_BUILTIN
#undef BUILTIN
Expand Down Expand Up @@ -753,14 +817,15 @@ bool SemaARM::CheckNeonBuiltinFunctionCall(const TargetInfo &TI,
unsigned i = 0, l = 0, u = 0;
switch (BuiltinID) {
default:
return false;
return HasError;
#define GET_NEON_IMMEDIATE_CHECK
#include "clang/Basic/arm_fp16.inc"
#include "clang/Basic/arm_neon.inc"
#undef GET_NEON_IMMEDIATE_CHECK
}

return SemaRef.BuiltinConstantArgRange(TheCall, i, l, u + l);
HasError |= SemaRef.BuiltinConstantArgRange(TheCall, i, l, u + l);
return HasError;
}

bool SemaARM::CheckMVEBuiltinFunctionCall(unsigned BuiltinID,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sve -target-feature +sme2 -O1 -Werror -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -O1 -Werror -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sve -target-feature +sve2 -target-feature +sme -target-feature +sve2p1 -O1 -Werror -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sve -target-feature +sve2 -target-feature +sve2p1 -O1 -Werror -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -O1 -Werror -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK
// RUN: %clang_cc1 -triple aarch64 -target-feature +bf16 -target-feature +sme -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s

Expand Down
12 changes: 9 additions & 3 deletions clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,12 @@

#include <arm_sve.h>

#ifdef __ARM_FEATURE_SME
#define ATTR __arm_streaming
#else
#define ATTR
#endif

#ifdef SVE_OVERLOADED_FORMS
// A simple used,unused... macro, long enough to represent any SVE builtin.
#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED,A5) A1##A3##A5
Expand All @@ -34,7 +40,7 @@
// CPP-CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
// CPP-CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
//
svint16_t test_svqrshrn_s16_s32_x2(svint32x2_t zn) __arm_streaming_compatible {
svint16_t test_svqrshrn_s16_s32_x2(svint32x2_t zn) ATTR {
return SVE_ACLE_FUNC(svqrshrn,_n,_s16,_s32_x2,)(zn, 16);
}

Expand All @@ -54,7 +60,7 @@ svint16_t test_svqrshrn_s16_s32_x2(svint32x2_t zn) __arm_streaming_compatible {
// CPP-CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
// CPP-CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
//
svuint16_t test_svqrshrn_u16_u32_x2(svuint32x2_t zn) __arm_streaming_compatible {
svuint16_t test_svqrshrn_u16_u32_x2(svuint32x2_t zn) ATTR {
return SVE_ACLE_FUNC(svqrshrn,_n,_u16,_u32_x2,)(zn, 16);
}

Expand All @@ -74,6 +80,6 @@ svuint16_t test_svqrshrn_u16_u32_x2(svuint32x2_t zn) __arm_streaming_compatible
// CPP-CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrun.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
// CPP-CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
//
svuint16_t test_svqrshrun_u16_s32_x2(svint32x2_t zn) __arm_streaming_compatible {
svuint16_t test_svqrshrun_u16_s32_x2(svint32x2_t zn) ATTR {
return SVE_ACLE_FUNC(svqrshrun,_n,_u16,_s32_x2,)(zn, 16);
}
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@ void base(int8x16_t n, bfloat16x8_t m) {
svget_neonq_s8(svundef_s8()); // expected-error {{'svget_neonq_s8' needs target feature sve}}
svdup_neonq_s8(n); // expected-error {{'svdup_neonq_s8' needs target feature sve}}

// expected-error@+1 {{'svundef_bf16' needs target feature sve}}
// expected-error@+1 {{'svundef_bf16' needs target feature (sve,bf16)|sme}}
svset_neonq_bf16(svundef_bf16(), m); // expected-error {{'svset_neonq_bf16' needs target feature sve,bf16}}
// expected-error@+1 {{'svundef_bf16' needs target feature sve}}
// expected-error@+1 {{'svundef_bf16' needs target feature (sve,bf16)|sme}}
svget_neonq_bf16(svundef_bf16()); // expected-error {{'svget_neonq_bf16' needs target feature sve,bf16}}
svdup_neonq_bf16(m); // expected-error {{'svdup_neonq_bf16' needs target feature sve,bf16}}
}
Loading
Loading