This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Commit 24e48d9

implement profiler ELT callbacks for AMD64 Linux
1 parent 45d0444 commit 24e48d9

File tree

10 files changed

+521
-23
lines changed

Documentation/botr/clr-abi.md

Lines changed: 34 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -585,9 +585,9 @@ The CLR unwinder assumes any non-leaf frame was unwound as a result of a call. T
585585

586586
If the JIT gets passed `CORJIT_FLG_PROF_ENTERLEAVE`, then the JIT might need to insert native entry/exit/tail call probes. To determine for sure, the JIT must call GetProfilingHandle. This API returns as out parameters, the true dynamic boolean indicating if the JIT should actually insert the probes and a parameter to pass to the callbacks (typed as void*), with an optional indirection (used for NGEN). This parameter is always the first argument to all of the call-outs (thus placed in the usual first argument register `RCX` (AMD64) or `R0` (ARM, ARM64)).
587587

588-
Outside of the prolog (in a GC interruptible location), the JIT injects a call to `CORINFO_HELP_PROF_FCN_ENTER`. For AMD64, all argument registers will be homed into their caller-allocated stack locations (similar to varargs). For ARM and ARM64, all arguments are prespilled (again similar to varargs).
588+
Outside of the prolog (in a GC interruptible location), the JIT injects a call to `CORINFO_HELP_PROF_FCN_ENTER`. For AMD64 on Windows, all argument registers will be homed into their caller-allocated stack locations (similar to varargs); on Unix, all argument registers will be stored in the inner structure. For ARM and ARM64, all arguments are prespilled (again similar to varargs).
589589

590-
After computing the return value and storing it in the correct register, but before any epilog code (including before a possible GS cookie check), the JIT injects a call to `CORINFO_HELP_PROF_FCN_LEAVE`. For AMD64 this call must preserve the return register: `RAX` or `XMM0`. For ARM, the return value will be moved from `R0` to `R2` (if it was in `R0`), `R1`, `R2`, and `S0/D0` must be preserved by the callee (longs will be `R2`, `R1` - note the unusual ordering of the registers, floats in `S0`, doubles in `D0`, smaller integrals in `R2`).
590+
After computing the return value and storing it in the correct register, but before any epilog code (including before a possible GS cookie check), the JIT injects a call to `CORINFO_HELP_PROF_FCN_LEAVE`. For AMD64 this call must preserve the return registers: `RAX` or `XMM0` on Windows, and `RAX` and `RDX`, or `XMM0` and `XMM1`, on Unix. For ARM, the return value will be moved from `R0` to `R2` (if it was in `R0`), `R1`, `R2`, and `S0/D0` must be preserved by the callee (longs will be `R2`, `R1` - note the unusual ordering of the registers, floats in `S0`, doubles in `D0`, smaller integrals in `R2`).
591591

592592
TODO: describe ARM64 profile leave conventions.
593593

@@ -667,3 +667,35 @@ The general rules outlined in the System V x86_64 ABI (described at http://www.x
667667
3. The JIT proactively generates frame register frames (with `RBP` as a frame register) in order to aid the native OS tooling for stack unwinding and the like.
668668
4. All the other internal VM contracts for PInvoke, EH, and generic support remain in place. Please see the relevant sections above for more details. Note, however, that the registers used are different on System V due to the different calling convention. For example, the integer argument registers are, in order, RDI, RSI, RDX, RCX, R8, and R9. Thus, where the first argument (typically, the "this" pointer) on Windows AMD64 goes in RCX, on System V it goes in RDI, and so forth.
669669
5. Structs with explicit layout are always passed by value on the stack.
670+
6. The following table describes register usage according to the System V x86_64 ABI:
671+
672+
```
673+
| Register | Usage | Preserved across |
674+
| | | function calls |
675+
|--------------|-----------------------------------------|-------------------|
676+
| %rax | temporary register; with variable argu- | No |
677+
| | ments passes information about the | |
678+
| | number of SSE registers used; | |
679+
| | 1st return argument | |
680+
| %rbx | callee-saved register; optionally used | Yes |
681+
| | as base pointer | |
682+
| %rcx | used to pass 4th integer argument to | No |
683+
| | functions | |
684+
| %rdx | used to pass 3rd argument to functions | No |
685+
| | 2nd return register | |
686+
| %rsp | stack pointer | Yes |
687+
| %rbp | callee-saved register; optionally used | Yes |
688+
| | as frame pointer | |
689+
| %rsi | used to pass 2nd argument to functions | No |
690+
| %rdi | used to pass 1st argument to functions | No |
691+
| %r8 | used to pass 5th argument to functions | No |
692+
| %r9 | used to pass 6th argument to functions | No |
693+
| %r10 | temporary register, used for passing a | No |
694+
| | function's static chain pointer | |
695+
| %r11 | temporary register | No |
696+
| %r12-%r15 | callee-saved registers | Yes |
697+
| %xmm0-%xmm1 | used to pass and return floating point | No |
698+
| | arguments | |
699+
| %xmm2-%xmm7 | used to pass floating point arguments | No |
700+
| %xmm8-%xmm15 | temporary registers | No |
701+
```
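The register table above can be read as two small lookups: which position an integer argument register occupies, and whether a register is preserved across calls. The following is a minimal sketch of that reading, assuming C++17; the names `kIntArgRegs`, `kCalleeSaved`, `intArgPosition`, and `isCalleeSaved` are hypothetical and do not appear in the CoreCLR sources.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical names, for illustration only; none of these exist in the CoreCLR sources.

// Integer argument registers in System V x86_64 order (1st through 6th argument).
constexpr std::array<std::string_view, 6> kIntArgRegs = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Registers the table lists as preserved across function calls.
constexpr std::array<std::string_view, 7> kCalleeSaved = {"rbx", "rsp", "rbp", "r12", "r13", "r14", "r15"};

// Returns the 1-based argument position of `reg`, or 0 if it is not an
// integer argument register.
constexpr std::size_t intArgPosition(std::string_view reg)
{
    for (std::size_t i = 0; i < kIntArgRegs.size(); ++i)
    {
        if (kIntArgRegs[i] == reg)
        {
            return i + 1;
        }
    }
    return 0;
}

// Returns true if the table marks `reg` as preserved across function calls.
constexpr bool isCalleeSaved(std::string_view reg)
{
    for (std::string_view saved : kCalleeSaved)
    {
        if (saved == reg)
        {
            return true;
        }
    }
    return false;
}
```

Read this way, `r14` and `r15` (which this commit uses for the enter-callback arguments) are preserved across calls, while `r11` (used as the call target) is a temporary register.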

src/jit/codegencommon.cpp

Lines changed: 110 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4434,7 +4434,9 @@ void CodeGen::genFnPrologCalleeRegArgs(regNumber xtraReg, bool* pXtraRegClobbere
44344434
if ((regSet.rsMaskPreSpillRegs(false) & genRegMask(regNum)) == 0)
44354435
#endif // _TARGET_ARM_
44364436
{
4437-
noway_assert(xtraReg != varDsc->lvArgReg + i);
4437+
#if !defined(UNIX_AMD64_ABI)
4438+
noway_assert(xtraReg != (varDsc->lvArgReg + i));
4439+
#endif
44384440
noway_assert(regArgMaskLive & genRegMask(regNum));
44394441
}
44404442

@@ -7440,7 +7442,9 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
74407442
return;
74417443
}
74427444

7443-
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet.
7445+
#if defined(_TARGET_AMD64_)
7446+
#if !defined(UNIX_AMD64_ABI)
7447+
74447448
unsigned varNum;
74457449
LclVarDsc* varDsc;
74467450

@@ -7549,6 +7553,57 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
75497553
*pInitRegZeroed = false;
75507554
}
75517555

7556+
#else // !defined(UNIX_AMD64_ABI)
7557+
7558+
// Emit profiler EnterCallback(ProfilerMethHnd, caller's SP)
7559+
// R14 = ProfilerMethHnd
7560+
if (compiler->compProfilerMethHndIndirected)
7561+
{
7562+
// Profiler hooks enabled during Ngen time.
7563+
// Profiler handle needs to be accessed through an indirection of a pointer.
7564+
getEmitter()->emitIns_R_AI(INS_mov, EA_PTR_DSP_RELOC, REG_PROFILER_ENTER_ARG_0,
7565+
(ssize_t)compiler->compProfilerMethHnd);
7566+
}
7567+
else
7568+
{
7569+
// No need to record relocations, if we are generating ELT hooks under the influence
7570+
// of COMPlus_JitELTHookEnabled=1
7571+
if (compiler->opts.compJitELTHookEnabled)
7572+
{
7573+
genSetRegToIcon(REG_PROFILER_ENTER_ARG_0, (ssize_t)compiler->compProfilerMethHnd, TYP_I_IMPL);
7574+
}
7575+
else
7576+
{
7577+
instGen_Set_Reg_To_Imm(EA_8BYTE, REG_PROFILER_ENTER_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
7578+
}
7579+
}
7580+
7581+
// R15 = caller's SP
7582+
// Notes
7583+
// 1) Here we can query caller's SP offset since prolog will be generated after final frame layout.
7584+
// 2) caller's SP relative offset to FramePointer will be negative. We need to add absolute value
7585+
// of that offset to FramePointer to obtain caller's SP value.
7586+
assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM);
7587+
int callerSPOffset = compiler->lvaToCallerSPRelativeOffset(0, isFramePointerUsed());
7588+
getEmitter()->emitIns_R_AR(INS_lea, EA_PTRSIZE, REG_PROFILER_ENTER_ARG_1, genFramePointerReg(), -callerSPOffset);
7589+
7590+
// Can't have a call until we have enough padding for rejit
7591+
genPrologPadForReJit();
7592+
7593+
// We can use any callee trash register (other than RAX, RDI, RSI) for call target.
7594+
// We use R11 here. This will emit either
7595+
// "call ip-relative 32-bit offset" or
7596+
// "mov r11, helper addr; call r11"
7597+
genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_DEFAULT_PROFILER_CALL_TARGET);
7598+
7599+
// If initReg is one of RBM_CALLEE_TRASH, then it needs to be zero'ed before using.
7600+
if ((RBM_CALLEE_TRASH & genRegMask(initReg)) != 0)
7601+
{
7602+
*pInitRegZeroed = false;
7603+
}
7604+
7605+
#endif // !defined(UNIX_AMD64_ABI)
7606+
75527607
#elif defined(_TARGET_X86_) || (defined(_TARGET_ARM_) && defined(LEGACY_BACKEND))
75537608

75547609
unsigned saveStackLvl2 = genStackLevel;
@@ -7656,6 +7711,7 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed)
76567711
//
76577712
void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FCN_LEAVE*/)
76587713
{
7714+
76597715
assert((helper == CORINFO_HELP_PROF_FCN_LEAVE) || (helper == CORINFO_HELP_PROF_FCN_TAILCALL));
76607716

76617717
// Only hook if profiler says it's okay.
@@ -7669,7 +7725,8 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FC
76697725
// Need to save on to the stack level, since the helper call will pop the argument
76707726
unsigned saveStackLvl2 = genStackLevel;
76717727

7672-
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet.
7728+
#if defined(_TARGET_AMD64_)
7729+
#if !defined(UNIX_AMD64_ABI)
76737730

76747731
// Since the method needs to make a profiler callback, it should have out-going arg space allocated.
76757732
noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM);
@@ -7740,6 +7797,48 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FC
77407797
// "mov r8, helper addr; call r8"
77417798
genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_ARG_2);
77427799

7800+
#else // !defined(UNIX_AMD64_ABI)
7801+
7802+
// RDI = ProfilerMethHnd
7803+
if (compiler->compProfilerMethHndIndirected)
7804+
{
7805+
getEmitter()->emitIns_R_AI(INS_mov, EA_PTR_DSP_RELOC, REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
7806+
}
7807+
else
7808+
{
7809+
if (compiler->opts.compJitELTHookEnabled)
7810+
{
7811+
genSetRegToIcon(REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd, TYP_I_IMPL);
7812+
}
7813+
else
7814+
{
7815+
instGen_Set_Reg_To_Imm(EA_8BYTE, REG_ARG_0, (ssize_t)compiler->compProfilerMethHnd);
7816+
}
7817+
}
7818+
7819+
// RSI = caller's SP
7820+
if (compiler->lvaDoneFrameLayout == Compiler::FINAL_FRAME_LAYOUT)
7821+
{
7822+
int callerSPOffset = compiler->lvaToCallerSPRelativeOffset(0, isFramePointerUsed());
7823+
getEmitter()->emitIns_R_AR(INS_lea, EA_PTRSIZE, REG_ARG_1, genFramePointerReg(), -callerSPOffset);
7824+
}
7825+
else
7826+
{
7827+
LclVarDsc* varDsc = compiler->lvaTable;
7828+
NYI_IF((varDsc == nullptr) || !varDsc->lvIsParam, "Profiler ELT callback for a method without any params");
7829+
7830+
// lea rsi, [FramePointer + Arg0's offset]
7831+
getEmitter()->emitIns_R_S(INS_lea, EA_PTRSIZE, REG_ARG_1, 0, 0);
7832+
}
7833+
7834+
// We can use any callee trash register (other than RAX, RDI, RSI) for call target.
7835+
// We use R11 here. This will emit either
7836+
// "call ip-relative 32-bit offset" or
7837+
// "mov r11, helper addr; call r11"
7838+
genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_DEFAULT_PROFILER_CALL_TARGET);
7839+
7840+
#endif // !defined(UNIX_AMD64_ABI)
7841+
77437842
#elif defined(_TARGET_X86_)
77447843

77457844
//
@@ -8181,6 +8280,14 @@ void CodeGen::genFinalizeFrame()
81818280
regSet.rsSetRegsModified(RBM_INT_CALLEE_SAVED & ~RBM_FPBASE);
81828281
}
81838282

8283+
#ifdef UNIX_AMD64_ABI
8284+
// On Unix x64 we also save R14 and R15 for ELT profiler hook generation.
8285+
if (compiler->compIsProfilerHookNeeded())
8286+
{
8287+
regSet.rsSetRegsModified(RBM_PROFILER_ENTER_ARG_0 | RBM_PROFILER_ENTER_ARG_1);
8288+
}
8289+
#endif
8290+
81848291
/* Count how many callee-saved registers will actually be saved (pushed) */
81858292

81868293
// EBP cannot be (directly) modified for EBP frame and double-aligned frames
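
The caller's-SP notes in the Unix AMD64 enter and leave hooks above say that the caller-SP-relative offset is negative and that its absolute value is added to the frame pointer, which is what the `lea` with `-callerSPOffset` does. A minimal sketch of that arithmetic follows; `callerSPFromFramePointer` is a hypothetical helper written only for illustration and is not part of the JIT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only; not part of the JIT.
// framePointer:   current value of the frame pointer register.
// callerSPOffset: frame pointer's offset relative to the caller's SP
//                 (zero or negative, as the notes above describe).
std::uintptr_t callerSPFromFramePointer(std::uintptr_t framePointer, int callerSPOffset)
{
    assert(callerSPOffset <= 0);

    // Add the absolute value of the offset, mirroring
    // "lea reg, [FramePointer + (-callerSPOffset)]".
    const std::uintptr_t magnitude =
        static_cast<std::uintptr_t>(-static_cast<std::int64_t>(callerSPOffset));
    return framePointer + magnitude;
}
```

For example, with a frame pointer of `0x1000` and an offset of `-16`, the helper yields `0x1010`.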

src/jit/codegenxarch.cpp

Lines changed: 36 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1219,16 +1219,51 @@ void CodeGen::genReturn(GenTreePtr treeNode)
12191219
// Since we are invalidating the assumption that we would slip into the epilog
12201220
// right after the "return", we need to preserve the return reg's GC state
12211221
// across the call until actual method return.
1222+
ReturnTypeDesc retTypeDesc;
1223+
unsigned regCount;
1224+
if (compiler->compMethodReturnsMultiRegRetType())
1225+
{
1226+
if (varTypeIsLong(compiler->info.compRetNativeType))
1227+
{
1228+
retTypeDesc.InitializeLongReturnType(compiler);
1229+
}
1230+
else // we must have a struct return type
1231+
{
1232+
retTypeDesc.InitializeStructReturnType(compiler, compiler->info.compMethodInfo->args.retTypeClass);
1233+
}
1234+
regCount = retTypeDesc.GetReturnRegCount();
1235+
}
1236+
12221237
if (varTypeIsGC(compiler->info.compRetType))
12231238
{
12241239
gcInfo.gcMarkRegPtrVal(REG_INTRET, compiler->info.compRetType);
12251240
}
1241+
else if (compiler->compMethodReturnsMultiRegRetType())
1242+
{
1243+
for (unsigned i = 0; i < regCount; ++i)
1244+
{
1245+
if (varTypeIsGC(retTypeDesc.GetReturnRegType(i)))
1246+
{
1247+
gcInfo.gcMarkRegPtrVal(retTypeDesc.GetABIReturnReg(i), retTypeDesc.GetReturnRegType(i));
1248+
}
1249+
}
1250+
}
12261251

12271252
genProfilingLeaveCallback();
12281253

12291254
if (varTypeIsGC(compiler->info.compRetType))
12301255
{
1231-
gcInfo.gcMarkRegSetNpt(REG_INTRET);
1256+
gcInfo.gcMarkRegSetNpt(genRegMask(REG_INTRET));
1257+
}
1258+
else if (compiler->compMethodReturnsMultiRegRetType())
1259+
{
1260+
for (unsigned i = 0; i < regCount; ++i)
1261+
{
1262+
if (varTypeIsGC(retTypeDesc.GetReturnRegType(i)))
1263+
{
1264+
gcInfo.gcMarkRegSetNpt(genRegMask(retTypeDesc.GetABIReturnReg(i)));
1265+
}
1266+
}
12321267
}
12331268
}
12341269
#endif
@@ -8297,7 +8332,6 @@ void CodeGen::genPutStructArgStk(GenTreePutArgStk* putArgStk)
82978332
var_types memType = (gcPtrs[i] == TYPE_GC_REF) ? TYP_REF : TYP_BYREF;
82988333
getEmitter()->emitIns_R_AR(ins_Load(memType), emitTypeSize(memType), REG_RCX, REG_RSI, 0);
82998334
genStoreRegToStackArg(memType, REG_RCX, i * TARGET_POINTER_SIZE);
8300-
83018335
#ifdef DEBUG
83028336
numGCSlotsCopied++;
83038337
#endif // DEBUG

src/jit/compiler.cpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6670,6 +6670,29 @@ void Compiler::GetStructTypeOffset(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSIN
66706670
*type1 = GetEightByteType(structDesc, 1);
66716671
}
66726672
}
6673+
6674+
//------------------------------------------------------------------------------------------------------
6675+
// GetStructTypeOffset: Gets the type, size and offset of the eightbytes of a struct for System V systems.
6676+
//
6677+
// Arguments:
6678+
// 'typeHnd' - type handle
6679+
// 'type0' - out param; returns the type of the first eightbyte.
6680+
// 'type1' - out param; returns the type of the second eightbyte.
6681+
// 'offset0' - out param; returns the offset of the first eightbyte.
6682+
// 'offset1' - out param; returns the offset of the second eightbyte.
6683+
//
6684+
void Compiler::GetStructTypeOffset(CORINFO_CLASS_HANDLE typeHnd,
6685+
var_types* type0,
6686+
var_types* type1,
6687+
unsigned __int8* offset0,
6688+
unsigned __int8* offset1)
6689+
{
6690+
SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR structDesc;
6691+
eeGetSystemVAmd64PassStructInRegisterDescriptor(typeHnd, &structDesc);
6692+
assert(structDesc.passedInRegisters);
6693+
GetStructTypeOffset(structDesc, type0, type1, offset0, offset1);
6694+
}
6695+
66736696
#endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
66746697

66756698
/*****************************************************************************/

src/jit/compiler.h

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9251,11 +9251,19 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
92519251
static var_types GetTypeFromClassificationAndSizes(SystemVClassificationType classType, int size);
92529252
static var_types GetEightByteType(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR& structDesc,
92539253
unsigned slotNum);
9254+
92549255
static void GetStructTypeOffset(const SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR& structDesc,
92559256
var_types* type0,
92569257
var_types* type1,
92579258
unsigned __int8* offset0,
92589259
unsigned __int8* offset1);
9260+
9261+
void GetStructTypeOffset(CORINFO_CLASS_HANDLE typeHnd,
9262+
var_types* type0,
9263+
var_types* type1,
9264+
unsigned __int8* offset0,
9265+
unsigned __int8* offset1);
9266+
92599267
void fgMorphSystemVStructArgs(GenTreeCall* call, bool hasStructArgument);
92609268
#endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
92619269

src/jit/target.h

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -830,6 +830,13 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
830830
#define RBM_FLT_CALLEE_SAVED (0)
831831
#define RBM_FLT_CALLEE_TRASH (RBM_XMM0|RBM_XMM1|RBM_XMM2|RBM_XMM3|RBM_XMM4|RBM_XMM5|RBM_XMM6|RBM_XMM7| \
832832
RBM_XMM8|RBM_XMM9|RBM_XMM10|RBM_XMM11|RBM_XMM12|RBM_XMM13|RBM_XMM14|RBM_XMM15)
833+
#define REG_PROFILER_ENTER_ARG_0 REG_R14
834+
#define RBM_PROFILER_ENTER_ARG_0 RBM_R14
835+
#define REG_PROFILER_ENTER_ARG_1 REG_R15
836+
#define RBM_PROFILER_ENTER_ARG_1 RBM_R15
837+
838+
#define REG_DEFAULT_PROFILER_CALL_TARGET REG_R11
839+
833840
#else // !UNIX_AMD64_ABI
834841
#define MIN_ARG_AREA_FOR_CALL (4 * REGSIZE_BYTES) // Minimum required outgoing argument space for a call.
835842

@@ -976,7 +983,7 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
976983
// profiler.
977984
#define REG_DEFAULT_HELPER_CALL_TARGET REG_RAX
978985

979-
// GenericPInvokeCalliHelper VASigCookie Parameter
986+
// GenericPInvokeCalliHelper VASigCookie Parameter
980987
#define REG_PINVOKE_COOKIE_PARAM REG_R11
981988
#define RBM_PINVOKE_COOKIE_PARAM RBM_R11
982989
#define PREDICT_REG_PINVOKE_COOKIE_PARAM PREDICT_REG_R11

src/vm/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -357,6 +357,7 @@ else(WIN32)
357357

358358
if(CLR_CMAKE_TARGET_ARCH_AMD64)
359359
set(VM_SOURCES_WKS_ARCH_ASM
360+
${ARCH_SOURCES_DIR}/asmhelpers.S
360361
${ARCH_SOURCES_DIR}/calldescrworkeramd64.S
361362
${ARCH_SOURCES_DIR}/crthelpers.S
362363
${ARCH_SOURCES_DIR}/externalmethodfixupthunk.S
