This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Commit 279af1a

implement profiler ELT callbacks for AMD64 Linux
1 parent 1413d46 commit 279af1a

10 files changed (+521, -23 lines)

Documentation/botr/clr-abi.md

Lines changed: 34 additions & 2 deletions
@@ -585,9 +585,9 @@ The CLR unwinder assumes any non-leaf frame was unwound as a result of a call. T

 If the JIT gets passed `CORJIT_FLG_PROF_ENTERLEAVE`, then the JIT might need to insert native entry/exit/tail call probes. To determine for sure, the JIT must call GetProfilingHandle. This API returns as out parameters, the true dynamic boolean indicating if the JIT should actually insert the probes and a parameter to pass to the callbacks (typed as void*), with an optional indirection (used for NGEN). This parameter is always the first argument to all of the call-outs (thus placed in the usual first argument register `RCX` (AMD64) or `R0` (ARM, ARM64)).

-Outside of the prolog (in a GC interruptible location), the JIT injects a call to `CORINFO_HELP_PROF_FCN_ENTER`. For AMD64, all argument registers will be homed into their caller-allocated stack locations (similar to varargs). For ARM and ARM64, all arguments are prespilled (again similar to varargs).
+Outside of the prolog (in a GC interruptible location), the JIT injects a call to `CORINFO_HELP_PROF_FCN_ENTER`. For AMD64, on Windows all argument registers will be homed into their caller-allocated stack locations (similar to varargs); on Unix all argument registers will be stored in the inner structure. For ARM and ARM64, all arguments are prespilled (again similar to varargs).

-After computing the return value and storing it in the correct register, but before any epilog code (including before a possible GS cookie check), the JIT injects a call to `CORINFO_HELP_PROF_FCN_LEAVE`. For AMD64 this call must preserve the return register: `RAX` or `XMM0`. For ARM, the return value will be moved from `R0` to `R2` (if it was in `R0`), `R1`, `R2`, and `S0/D0` must be preserved by the callee (longs will be `R2`, `R1` - note the unusual ordering of the registers, floats in `S0`, doubles in `D0`, smaller integrals in `R2`).
+After computing the return value and storing it in the correct register, but before any epilog code (including before a possible GS cookie check), the JIT injects a call to `CORINFO_HELP_PROF_FCN_LEAVE`. For AMD64 this call must preserve the return registers: `RAX` or `XMM0` on Windows, and `RAX` and `RDX` or `XMM0` and `XMM1` on Unix. For ARM, the return value will be moved from `R0` to `R2` (if it was in `R0`), `R1`, `R2`, and `S0/D0` must be preserved by the callee (longs will be `R2`, `R1` - note the unusual ordering of the registers, floats in `S0`, doubles in `D0`, smaller integrals in `R2`).

 TODO: describe ARM64 profile leave conventions.

@@ -667,3 +667,35 @@ The general rules outlined in the System V x86_64 ABI (described at http://www.x
 3. The JIT proactively generates frame register frames (with `RBP` as a frame register) in order to aid the native OS tooling for stack unwinding and the like.
 4. All the other internal VM contracts for PInvoke, EH, and generic support remains in place. Please see the relevant sections above for more details. Note, however, that the registers used are different on System V due to the different calling convention. For example, the integer argument registers are, in order, RDI, RSI, RDX, RCX, R8, and R9. Thus, where the first argument (typically, the "this" pointer) on Windows AMD64 goes in RCX, on System V it goes in RDI, and so forth.
 5. Structs with explicit layout are always passed by value on the stack.
+6. The following table describes register usage according to the System V x86_64 ABI:
+
+```
+| Register     | Usage                                   | Preserved across  |
+|              |                                         | function calls    |
+|--------------|-----------------------------------------|-------------------|
+| %rax         | temporary register; with variable argu- | No                |
+|              | ments passes information about the      |                   |
+|              | number of SSE registers used;           |                   |
+|              | 1st return register                     |                   |
+| %rbx         | callee-saved register; optionally used  | Yes               |
+|              | as base pointer                         |                   |
+| %rcx         | used to pass 4th integer argument to    | No                |
+|              | functions                               |                   |
+| %rdx         | used to pass 3rd argument to functions; | No                |
+|              | 2nd return register                     |                   |
+| %rsp         | stack pointer                           | Yes               |
+| %rbp         | callee-saved register; optionally used  | Yes               |
+|              | as frame pointer                        |                   |
+| %rsi         | used to pass 2nd argument to functions  | No                |
+| %rdi         | used to pass 1st argument to functions  | No                |
+| %r8          | used to pass 5th argument to functions  | No                |
+| %r9          | used to pass 6th argument to functions  | No                |
+| %r10         | temporary register, used for passing a  | No                |
+|              | function's static chain pointer         |                   |
+| %r11         | temporary register                      | No                |
+| %r12-%r15    | callee-saved registers                  | Yes               |
+| %xmm0-%xmm1  | used to pass and return floating point  | No                |
+|              | arguments                               |                   |
+| %xmm2-%xmm7  | used to pass floating point arguments   | No                |
+| %xmm8-%xmm15 | temporary registers                     | No                |
+```
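
As a rough illustration of the table above and of the Windows versus System V difference noted in item 4, the following sketch maps an integer argument index to the register that carries it. It is not CoreCLR code: `IntArgRegister` and the two arrays are hypothetical names, and the Windows order (RCX, RDX, R8, R9) is an assumption taken from the Microsoft x64 calling convention rather than from this commit.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// System V x86_64 integer argument registers, in order (see the table above).
constexpr std::array<std::string_view, 6> kSysVIntArgRegs = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Windows x64 integer argument registers, in order.
// Assumption: this order is not stated in the commit itself.
constexpr std::array<std::string_view, 4> kWindowsIntArgRegs = {"rcx", "rdx", "r8", "r9"};

// Hypothetical helper: returns the register that carries integer argument
// `index`, or an empty view when the argument is passed on the stack.
constexpr std::string_view IntArgRegister(std::size_t index, bool unixAmd64Abi)
{
    if (unixAmd64Abi)
    {
        return index < kSysVIntArgRegs.size() ? kSysVIntArgRegs[index] : std::string_view{};
    }
    return index < kWindowsIntArgRegs.size() ? kWindowsIntArgRegs[index] : std::string_view{};
}
```

With these tables, `IntArgRegister(0, true)` gives `rdi` while `IntArgRegister(0, false)` gives `rcx`, which is the "this" pointer difference that item 4 describes.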

src/jit/codegencommon.cpp

Lines changed: 110 additions & 3 deletions
@@ -4412,7 +4412,9 @@ void CodeGen::genFnPrologCalleeRegArgs(regNumber xtraReg, bool* pXtraRegClobbere
 if ((regSet.rsMaskPreSpillRegs(false) & genRegMask(regNum)) == 0)
 #endif // _TARGET_ARM_
 {
-    noway_assert(xtraReg != varDsc->lvArgReg + i);
+#if !defined(UNIX_AMD64_ABI)
+    noway_assert(xtraReg != (varDsc->lvArgReg + i));
+#endif
     noway_assert(regArgMaskLive & genRegMask(regNum));
 }

@@ -7418,7 +7420,9 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
         return;
     }

-#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet.
+#if defined(_TARGET_AMD64_)
+#if !defined(UNIX_AMD64_ABI)
+
     unsigned varNum;
     LclVarDsc* varDsc;

@@ -7547,6 +7551,57 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
         *pInitRegZeroed = false;
     }

+#else // !defined(UNIX_AMD64_ABI)
+
+    // Emit profiler EnterCallback(ProfilerMethHnd, caller's SP)
+    // R14 = ProfilerMethHnd
+    if (compiler->compProfilerMethHndIndirected)
+    {
+        // Profiler hooks enabled during Ngen time.
+        // Profiler handle needs to be accessed through an indirection of a pointer.
+        getEmitter()->emitIns_R_AI(INS_mov, EA_PTR_DSP_RELOC, REG_PROFILER_ENTER_ARG_0,
+                                   (ssize_t)compiler->compProfilerMethHnd);
+    }
+    else
+    {
+        // No need to record relocations, if we are generating ELT hooks under the influence
+        // of COMPlus_JitELTHookEnabled=1
+        if (compiler->opts.compJitELTHookEnabled)
+        {
+            genSetRegToIcon(REG_PROFILER_ENTER_ARG_0, (ssize_t)compiler->compProfilerMethHnd, TYP_I_IMPL);
+        }
+        else
+        {
+            instGen_Set_Reg_To_Imm(EA_8BYTE, REG_PROFILER_ENTER_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
+        }
+    }
+
+    // R15 = caller's SP
+    // Notes
+    //   1) Here we can query caller's SP offset since prolog will be generated after final frame layout.
+    //   2) caller's SP relative offset to FramePointer will be negative. We need to add absolute value
+    //      of that offset to FramePointer to obtain caller's SP value.
+    assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM);
+    int callerSPOffset = compiler->lvaToCallerSPRelativeOffset(0, isFramePointerUsed());
+    getEmitter()->emitIns_R_AR(INS_lea, EA_PTRSIZE, REG_PROFILER_ENTER_ARG_1, genFramePointerReg(), -callerSPOffset);
+
+    // Can't have a call until we have enough padding for rejit
+    genPrologPadForReJit();
+
+    // We can use any callee trash register (other than RAX, RDI, RSI) for call target.
+    // We use R11 here. This will emit either
+    // "call ip-relative 32-bit offset" or
+    // "mov r11, helper addr; call r11"
+    genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_DEFAULT_PROFILER_CALL_TARGET);
+
+    // If initReg is one of RBM_CALLEE_TRASH, then it needs to be zero'ed before using.
+    if ((RBM_CALLEE_TRASH & genRegMask(initReg)) != 0)
+    {
+        *pInitRegZeroed = false;
+    }
+
+#endif // !defined(UNIX_AMD64_ABI)
+
 #elif defined(_TARGET_X86_) || (defined(_TARGET_ARM_) && defined(LEGACY_BACKEND))

     unsigned saveStackLvl2 = genStackLevel;
@@ -7654,6 +7709,7 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
 //
 void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FCN_LEAVE*/)
 {
+
     assert((helper == CORINFO_HELP_PROF_FCN_LEAVE) || (helper == CORINFO_HELP_PROF_FCN_TAILCALL));

     // Only hook if profiler says it's okay.
@@ -7667,7 +7723,8 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FC
     // Need to save on to the stack level, since the helper call will pop the argument
     unsigned saveStackLvl2 = genStackLevel;

-#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet.
+#if defined(_TARGET_AMD64_)
+#if !defined(UNIX_AMD64_ABI)

     // Since the method needs to make a profiler callback, it should have out-going arg space allocated.
     noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM);
@@ -7738,6 +7795,48 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FC
     // "mov r8, helper addr; call r8"
     genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_ARG_2);

+#else // !defined(UNIX_AMD64_ABI)
+
+    // RDI = ProfilerMethHnd
+    if (compiler->compProfilerMethHndIndirected)
+    {
+        getEmitter()->emitIns_R_AI(INS_mov, EA_PTR_DSP_RELOC, REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
+    }
+    else
+    {
+        if (compiler->opts.compJitELTHookEnabled)
+        {
+            genSetRegToIcon(REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd, TYP_I_IMPL);
+        }
+        else
+        {
+            instGen_Set_Reg_To_Imm(EA_8BYTE, REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
+        }
+    }
+
+    // RSI = caller's SP
+    if (compiler->lvaDoneFrameLayout == Compiler::FINAL_FRAME_LAYOUT)
+    {
+        int callerSPOffset = compiler->lvaToCallerSPRelativeOffset(0, isFramePointerUsed());
+        getEmitter()->emitIns_R_AR(INS_lea, EA_PTRSIZE, REG_ARG_1, genFramePointerReg(), -callerSPOffset);
+    }
+    else
+    {
+        LclVarDsc* varDsc = compiler->lvaTable;
+        NYI_IF((varDsc == nullptr) || !varDsc->lvIsParam, "Profiler ELT callback for a method without any params");
+
+        // lea rsi, [FramePointer + Arg0's offset]
+        getEmitter()->emitIns_R_S(INS_lea, EA_PTRSIZE, REG_ARG_1, 0, 0);
+    }
+
+    // We can use any callee trash register (other than RAX, RDI, RSI) for call target.
+    // We use R11 here. This will emit either
+    // "call ip-relative 32-bit offset" or
+    // "mov r11, helper addr; call r11"
+    genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_DEFAULT_PROFILER_CALL_TARGET);
+
+#endif // !defined(UNIX_AMD64_ABI)
+
 #elif defined(_TARGET_X86_)

     //
@@ -8179,6 +8278,14 @@ void CodeGen::genFinalizeFrame()
         regSet.rsSetRegsModified(RBM_INT_CALLEE_SAVED & ~RBM_FPBASE);
     }

+#ifdef UNIX_AMD64_ABI
+    // On Unix x64 we also save R14 and R15 for ELT profiler hook generation.
+    if (compiler->compIsProfilerHookNeeded())
+    {
+        regSet.rsSetRegsModified(RBM_PROFILER_ENTER_ARG_0 | RBM_PROFILER_ENTER_ARG_1);
+    }
+#endif
+
     /* Count how many callee-saved registers will actually be saved (pushed) */

     // EBP cannot be (directly) modified for EBP frame and double-aligned frames
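
The Unix AMD64 enter and leave paths in this file both obtain the caller's SP with `lea reg, [FramePointer + (-callerSPOffset)]`. A minimal sketch of that arithmetic, outside the JIT, might look like the following. `CallerSPFromFramePointer` is a hypothetical name, and the sample values are made up for illustration only.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the address arithmetic of
//   lea reg, [FramePointer - callerSPOffset]
// `callerSPOffset` is the frame pointer's offset relative to the caller's SP,
// which the comments in the diff describe as negative.
constexpr std::uintptr_t CallerSPFromFramePointer(std::uintptr_t framePointer, int callerSPOffset)
{
    // Unsigned arithmetic wraps modulo 2^N, so subtracting a negative offset
    // adds its absolute value to the frame pointer.
    return framePointer - static_cast<std::uintptr_t>(static_cast<std::intptr_t>(callerSPOffset));
}

// Illustrative values only: a frame pointer 16 bytes below the caller's SP.
static_assert(CallerSPFromFramePointer(0x1000, -16) == 0x1010, "negative offset moves up the stack");
```

The sign flip is why the diff passes `-callerSPOffset` as the displacement: the stack grows downward, so the caller's SP sits at a higher address than the callee's frame pointer.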

src/jit/codegenxarch.cpp

Lines changed: 36 additions & 2 deletions
@@ -1219,16 +1219,51 @@ void CodeGen::genReturn(GenTreePtr treeNode)
         // Since we are invalidating the assumption that we would slip into the epilog
         // right after the "return", we need to preserve the return reg's GC state
         // across the call until actual method return.
+        ReturnTypeDesc retTypeDesc;
+        unsigned regCount;
+        if (compiler->compMethodReturnsMultiRegRetType())
+        {
+            if (varTypeIsLong(compiler->info.compRetNativeType))
+            {
+                retTypeDesc.InitializeLongReturnType(compiler);
+            }
+            else // we must have a struct return type
+            {
+                retTypeDesc.InitializeStructReturnType(compiler, compiler->info.compMethodInfo->args.retTypeClass);
+            }
+            regCount = retTypeDesc.GetReturnRegCount();
+        }
+
         if (varTypeIsGC(compiler->info.compRetType))
         {
             gcInfo.gcMarkRegPtrVal(REG_INTRET, compiler->info.compRetType);
         }
+        else if (compiler->compMethodReturnsMultiRegRetType())
+        {
+            for (unsigned i = 0; i < regCount; ++i)
+            {
+                if (varTypeIsGC(retTypeDesc.GetReturnRegType(i)))
+                {
+                    gcInfo.gcMarkRegPtrVal(retTypeDesc.GetABIReturnReg(i), retTypeDesc.GetReturnRegType(i));
+                }
+            }
+        }

         genProfilingLeaveCallback();

         if (varTypeIsGC(compiler->info.compRetType))
         {
-            gcInfo.gcMarkRegSetNpt(REG_INTRET);
+            gcInfo.gcMarkRegSetNpt(genRegMask(REG_INTRET));
+        }
+        else if (compiler->compMethodReturnsMultiRegRetType())
+        {
+            for (unsigned i = 0; i < regCount; ++i)
+            {
+                if (varTypeIsGC(retTypeDesc.GetReturnRegType(i)))
+                {
+                    gcInfo.gcMarkRegSetNpt(genRegMask(retTypeDesc.GetABIReturnReg(i)));
+                }
+            }
         }
     }
 #endif
@@ -8243,7 +8278,6 @@ void CodeGen::genPutStructArgStk(GenTreePutArgStk* putArgStk)
     var_types memType = (gcPtrs[i] == TYPE_GC_REF) ? TYP_REF : TYP_BYREF;
     getEmitter()->emitIns_R_AR(ins_Load(memType), emitTypeSize(memType), REG_RCX, REG_RSI, 0);
     genStoreRegToStackArg(memType, REG_RCX, i * TARGET_POINTER_SIZE);
-
 #ifdef DEBUG
     numGCSlotsCopied++;
 #endif // DEBUG
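
The `genReturn` hunk in this file marks every GC-typed return register as live before the leave callback and clears the same registers afterwards. The sketch below models only that selection step with stand-in types. `RegKind`, `ReturnReg`, and `GcReturnRegMask` are hypothetical and are not the JIT's `ReturnTypeDesc` or `gcInfo` APIs.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the JIT's notion of what a register holds (hypothetical).
enum class RegKind { Integer, Float, GcRef, GcByref };

// One return register: its number and the kind of value it holds (hypothetical).
struct ReturnReg
{
    unsigned regNum; // assumed to be below 64 so it fits in the mask
    RegKind  kind;
};

// Builds a bit mask of the return registers that hold GC pointers, which is
// the set that has to stay reported as live across the profiler leave call.
std::uint64_t GcReturnRegMask(const std::vector<ReturnReg>& returnRegs)
{
    std::uint64_t mask = 0;
    for (const ReturnReg& reg : returnRegs)
    {
        if (reg.kind == RegKind::GcRef || reg.kind == RegKind::GcByref)
        {
            mask |= std::uint64_t{1} << reg.regNum;
        }
    }
    return mask;
}
```

Using one mask for both the "mark live" and the "mark dead" step keeps the two loops in the diff symmetric: whatever is set before the call is exactly what gets cleared after it.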

src/jit/compiler.cpp

Lines changed: 23 additions & 0 deletions
@@ -6839,6 +6839,29 @@ void Compiler::GetStructTypeOffset(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSIN
         *type1 = GetEightByteType(structDesc, 1);
     }
 }
+
+//------------------------------------------------------------------------------------------------------
+// GetStructTypeOffset: Gets the type, size and offset of the eightbytes of a struct for System V systems.
+//
+// Arguments:
+//    'typeHnd' - type handle
+//    'type0'   - out param; returns the type of the first eightbyte.
+//    'type1'   - out param; returns the type of the second eightbyte.
+//    'offset0' - out param; returns the offset of the first eightbyte.
+//    'offset1' - out param; returns the offset of the second eightbyte.
+//
+void Compiler::GetStructTypeOffset(CORINFO_CLASS_HANDLE typeHnd,
+                                   var_types* type0,
+                                   var_types* type1,
+                                   unsigned __int8* offset0,
+                                   unsigned __int8* offset1)
+{
+    SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR structDesc;
+    eeGetSystemVAmd64PassStructInRegisterDescriptor(typeHnd, &structDesc);
+    assert(structDesc.passedInRegisters);
+    GetStructTypeOffset(structDesc, type0, type1, offset0, offset1);
+}
+
 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)

 /*****************************************************************************/

src/jit/compiler.h

Lines changed: 8 additions & 0 deletions
@@ -9253,11 +9253,19 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     static var_types GetTypeFromClassificationAndSizes(SystemVClassificationType classType, int size);
     static var_types GetEightByteType(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR& structDesc,
                                       unsigned slotNum);
+
     static void GetStructTypeOffset(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR& structDesc,
                                     var_types* type0,
                                     var_types* type1,
                                     unsigned __int8* offset0,
                                     unsigned __int8* offset1);
+
+    void GetStructTypeOffset(CORINFO_CLASS_HANDLE typeHnd,
+                             var_types* type0,
+                             var_types* type1,
+                             unsigned __int8* offset0,
+                             unsigned __int8* offset1);
+
     void fgMorphSystemVStructArgs(GenTreeCall* call, bool hasStructArgument);
 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)

src/jit/target.h

Lines changed: 8 additions & 1 deletion
@@ -830,6 +830,13 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
   #define RBM_FLT_CALLEE_SAVED (0)
   #define RBM_FLT_CALLEE_TRASH (RBM_XMM0|RBM_XMM1|RBM_XMM2|RBM_XMM3|RBM_XMM4|RBM_XMM5|RBM_XMM6|RBM_XMM7| \
                                 RBM_XMM8|RBM_XMM9|RBM_XMM10|RBM_XMM11|RBM_XMM12|RBM_XMM13|RBM_XMM14|RBM_XMM15)
+  #define REG_PROFILER_ENTER_ARG_0 REG_R14
+  #define RBM_PROFILER_ENTER_ARG_0 RBM_R14
+  #define REG_PROFILER_ENTER_ARG_1 REG_R15
+  #define RBM_PROFILER_ENTER_ARG_1 RBM_R15
+
+  #define REG_DEFAULT_PROFILER_CALL_TARGET REG_R11
+
 #else // !UNIX_AMD64_ABI
   #define MIN_ARG_AREA_FOR_CALL (4 * REGSIZE_BYTES) // Minimum required outgoing argument space for a call.

@@ -976,7 +983,7 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
   // profiler.
   #define REG_DEFAULT_HELPER_CALL_TARGET REG_RAX

-  // GenericPInvokeCalliHelper VASigCookie Parameter
+  // GenericPInvokeCalliHelper VASigCookie Parameter
   #define REG_PINVOKE_COOKIE_PARAM REG_R11
   #define RBM_PINVOKE_COOKIE_PARAM RBM_R11
   #define PREDICT_REG_PINVOKE_COOKIE_PARAM PREDICT_REG_R11

src/vm/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -363,6 +363,7 @@ else(WIN32)

     if(CLR_CMAKE_TARGET_ARCH_AMD64)
         set(VM_SOURCES_WKS_ARCH_ASM
+            ${ARCH_SOURCES_DIR}/asmhelpers.S
             ${ARCH_SOURCES_DIR}/calldescrworkeramd64.S
             ${ARCH_SOURCES_DIR}/crthelpers.S
             ${ARCH_SOURCES_DIR}/externalmethodfixupthunk.S
