-
Notifications
You must be signed in to change notification settings - Fork 2.6k
implementing profiler ELT callbacks for AMD64 Linux #7719
Conversation
There's a big "TODO" in codegenxarch.cpp, genReturn(), that it looks like you haven't addressed yet:
|
cc @dotnet/jit-contrib |
@BruceForstall thanks for your comment! Is it not enough to add push rax; push rdx; ... ; pop rdx; pop rax; into ProfileLeaveNaked for this TODO? |
The issue isn't the values in registers getting preserved, it is proper reporting of the GC information for those registers. If you look at Compiler::impFixupStructReturnType(), you can see how multi-reg returns of 8-15 byte structs are massaged:
In genReturn(), I believe that under |
@BruceForstall I've placed the following code in the CodeGen::genReturn function
but I got the following assertion
Could you give me a hint to perform it by the right way? Thanks in advance |
@BruceForstall as I see the current version of coreclr generate the following code for a method that returns struct value
Could you give some sample template of generated code for this case ? |
On unix structs of size > 8 and <=16 gets returned in two return registers RAX/XMM0 and RDX/XMM1. Consider a struct with one or two gc-ref type fields being returned. The gc-ref could be in RAX or RDX. Whichever register contains gc-ref that needs to be explicitly reported. In the code that you have posted above you are iterating through each of the 8-bytes of the struct and always marking REG_INTRET (which is RAX) as gcptr or non-ptr. That could be the reason why you are hitting an assert. Here is the right way to do it if method is returning multi-reg return type struct
Apart from the generated code, what you need to verify is that gcinfo is correctly reported for return regs containing gc-refs around the call to profile leave callback. |
@sivarv many thanks for your hint |
src/jit/codegencommon.cpp
Outdated
unsigned __int8 offset1 = 0; | ||
var_types type0 = TYP_UNKNOWN; | ||
var_types type1 = TYP_UNKNOWN; | ||
compiler->GetStructTypeOffset(structDesc, &type0, &type1, &offset0, &offset1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this code got duplicated. How about abstracting this in Compiler::GetStructTypeOffset() overload whose first argument is a CORINFO_CLASS_HANDLE and has out params to return offset and type of eightbytes of the struct?
src/jit/codegencommon.cpp
Outdated
genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_ARG_2); | ||
#else | ||
genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_R11); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason why we want to use REG_ARG2 on amd64 windows and REG_R11 on Unix?
Please note that without specifying a callTargetReg, genEmitHelperCall() would use REG_RAX.
src/jit/codegencommon.cpp
Outdated
#if FEATURE_VARARG | ||
if (compiler->info.compIsVarArgs && varTypeIsFloating(loadType)) | ||
#if defined(UNIX_AMD64_ABI) | ||
#ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FEATURE_UNIX_AMD64_STRUCT_PASSING #ifdef alone is sufficient here.
src/jit/codegencommon.cpp
Outdated
regNumber argReg = varDsc->lvArgReg; | ||
getEmitter()->emitIns_S_R(ins_Store(storeType), emitTypeSize(storeType), argReg, varNum, 0); | ||
#if defined(UNIX_AMD64_ABI) | ||
#ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think UNIX_AMD64_ABI is not required here.
src/jit/codegencommon.cpp
Outdated
genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_ARG_2); | ||
#else | ||
genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_R11); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason why we don't want to use default call target reg (rax)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sivarv I guess that it would be better to use r10 & r11 for intermediate computations on Linux
src/jit/codegenxarch.cpp
Outdated
@@ -1152,6 +1152,39 @@ void CodeGen::genReturn(GenTreePtr treeNode) | |||
// Also, there is not much to be gained by materializing it as an explicit node. | |||
if (compiler->compCurBB == compiler->genReturnBB) | |||
{ | |||
#if defined(_TARGET_AMD64_) | |||
#ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This #ifdef is redundant since FEATURE_UNIX_AMD64_STRUCT_PASSING is defined only on amd64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you mean is that the first #if
is redundant, as it is implied by the #ifdef
In reply to: 87924333 [](ancestors = 87924333)
src/jit/codegenxarch.cpp
Outdated
|
||
unsigned regCount = retTypeDesc.GetReturnRegCount(); | ||
|
||
if (varTypeIsGC(compiler->info.compRetType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (varTypeIsGC(compiler->info.compRetType)) [](start = 1, length = 55)
This condition would always be false if the return type is a struct (i.e. TYP_STRUCT) returned in two regs.
This if-check is not required and should be deleted. This comment also applies to if-check on line 1176.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sivarv I tried it but I have got the assertion in codegenlinear.cpp
System.RuntimeTypeHandle:GetMetadataImport(ref):struct: Regset after BB07 gcr=00000001 {rax}, byr=00000000 {}, regVars=00000000 {}
Assert failure(PID 27642 [0x00006bfa], Thread: 27642 [0x6bfa]): Assertion failed 'nonVarPtrRegs == RBM_NONE' in 'System.RuntimeTypeHandle:GetMetadataImport(ref):struct' (IL size 18)
File: /home/signatov/TEST/historical_debugging/coreclr/src/jit/codegenlinear.cpp Line: 422
Image: /home/signatov/TEST/historical_debugging/coreclr/bin/Product/Linux.x64.Debug/corerun
In my test case this assertion arises only on the forth method where the code with gcMarkRegPtrVal was inserted. Three methods with this code
System.RuntimeTypeHandle:GetIntroducedMethods(ref):struct
IntroducedMethodEnumerator:GetEnumerator():struct:this
System.RuntimeMethodHandle:GetUtf8Name(long):struct
are compiled successfully before this assertion. May be you have some ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sivarv I inserted debug prints and see the following
=== System.RuntimeTypeHandle:GetMetadataImport(ref):struct: regCount 2
00000000 00000000
in GCInfo::gcMarkRegPtrVal type 0xe TYP_LONG 0xa TYP_REF 0xe TYP_BYREF 0xf
in GCInfo::gcMarkRegSetGCref
GC regs: 00000000 {} => 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
00000001 00000000
in GCInfo::gcMarkRegPtrVal type 0xa TYP_LONG 0xa TYP_REF 0xe TYP_BYREF 0xf
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
00000001 00000000
--------->>>>>>>>
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
System.RuntimeTypeHandle:GetMetadataImport(ref):struct: Regset after BB07 gcr=00000001 {rax}, byr=00000000 {}, regVars=00000000 {}
Assert failure(PID 12181 [0x00002f95], Thread: 12181 [0x2f95]): Assertion failed 'nonVarPtrRegs == RBM_NONE' in 'System.RuntimeTypeHandle:GetMetadataImport(ref):struct' (IL size 18)
File: /home/signatov/TEST/historical_debugging/coreclr/src/jit/codegenlinear.cpp Line: 422
Image: /home/signatov/TEST/historical_debugging/coreclr/bin/Product/Linux.x64.Debug/corerun
In the previous successfully compiled methods arg type is TYP_LONG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see the argument of gcMarkRegSetNpt is regMaskTP not var_types, so I should replace
gcInfo.gcMarkRegSetNpt(var_types);
by
gcInfo.gcMarkRegSetNpt(genRegMask(var_types));
Am I right?
src/jit/codegencommon.cpp
Outdated
@@ -7298,13 +7298,15 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed) | |||
return; | |||
} | |||
|
|||
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet. | |||
#if defined(_TARGET_AMD64_) | |||
unsigned varNum; | |||
LclVarDsc* varDsc; | |||
|
|||
// Since the method needs to make a profiler callback, it should have out-going arg space allocated. | |||
noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this first noway_assert
also be under the following #if
? I believe it's only defined under FEATURE_FIXED_OUT_ARGS
which is only true for amd64.
src/jit/codegenxarch.cpp
Outdated
gcInfo.gcMarkRegSetNpt(retTypeDesc.GetABIReturnReg(i)); | ||
} | ||
} | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return; [](start = 1, length = 18)
returns in the middle of the method are very hard to notice and could lead to accidental bugs. Instead I would suggest this to restructure this logic as follows. Note that the below logic requires no #ifdefs.
e.g.
ReturnTypeDesc retTypeDesc;
if (compiler->compMethodReturnsMultiRegRetType())
{
CORINFO_CLASS_HANDLE retClsHnd = compiler->info.compMethodInfo->args.retTypeClass;
retTypeDesc.InitializeStructReturnType(compiler, retClsHnd);
}
if (varTypeIsGC(compiler->info.compRetType))
{
gcInfo.gcMarkRegPtrVal(REG_INTRET, compiler->info.compRetType);
}
else if (compiler->compMethodReturnsMultiRegRetType())
{
for (unsigned i = 0; i < retTypeDesc.GetReturnRegCount(); ++i)
{
gcInfo.gcMarkRegPtrVal(retTypeDesc.GetABIReturnReg(i), retTypeDesc.GetReturnRegType(i));
}
}
genProfilingLeaveCallback();
if (varTypeIsGC(compiler->info.compRetType))
{
gcInfo.gcMarkRegSetNpt(REG_INTRET);
}
else if (compiler->compMethodReturnsMultiRegRetType())
{
for (unsigned i = 0; i < retTypeDesc.GetReturnRegCount(); ++i)
{
gcInfo.gcMarkRegSetNpt(retTypeDesc.GetABIReturnReg(i));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sivarv I think that it is the good idea, thanks
src/jit/codegencommon.cpp
Outdated
@@ -7526,11 +7581,12 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FC | |||
// Need to save on to the stack level, since the helper call will pop the argument | |||
unsigned saveStackLvl2 = genStackLevel; | |||
|
|||
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI) // No profiling for System V systems yet. | |||
|
|||
#if defined(_TARGET_AMD64_) | |||
// Since the method needs to make a profiler callback, it should have out-going arg space allocated. | |||
noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too I think that this needs to be only under !UNIX_AMD64_ABI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt sorry, but I didn't quite catch you. This code is for both Unix && Windows on AMD64.
src/vm/amd64/asmhelpers.S
Outdated
// - integer argument registers (rcx, rdx, r8, r9) | ||
// - floating point argument registers (xmm1-3) | ||
// - volatile integer registers (r10, r11) | ||
// - volatile floating point registers (xmm4-5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add here something like:
- upper halves of ymm registers on AVX (which are volatile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt ok, I'll insert it
src/vm/amd64/asmhelpers.S
Outdated
push_argument_register rcx | ||
push_argument_register rdx | ||
push_nonvol_reg r10 | ||
push_nonvol_reg rax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these pushed registers represented in GC info? Or does the GC "know" about these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ELT profiler helpers are considered NO GC helpers. That is GC won't happen within the helper call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great - thanks for clarifying that.
CC : @noahfalk - to review profiler helpers. |
src/vm/amd64/asmhelpers.S
Outdated
// Upon entry : | ||
// rdi = clientInfo | ||
// rsi = profiledRsp | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On windows these arguments are passed in rcx and rdx. Does Linux use a different 64bit ABI in general or is there a reason for the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linux has different ABI for x64. It has 6 argument registers - RDI, RSI, RDX, RCX, R8 and R9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noahfalk According to the Unix ABI, the first 6 integer or pointer arguments to a function are passed in registers. The first is placed in rdi, the second in rsi, the third in rdx, and then rcx, r8 and r9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, makes sense then
src/vm/amd64/asmhelpers.S
Outdated
push_nonvol_reg r10 | ||
push_nonvol_reg rax | ||
|
||
lea rax, [rsp + 0x10] // caller rsp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rarely deal with assembly, but it doesn't seem like this would calculate the caller's rsp correctly?
At this point in the function we have already pushed r15, r14, r13, r12, rbp, rbx, rsi, rdi, r10, and rax. Did those pushes generate code that doesn't update rsp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is not correct. It was this way in the Windows version of this function when only RAX was pushed on the stack.
@sergign60 what is the reason for pushing all the callee saved registers plus RSI, RDI and R10 when Windows version doesn't do that (I guess that's why the function name is ProfileEnterNaked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/vm/amd64/asmhelpers.S
Outdated
|
||
|
||
// setup ProfilePlatformSpecificData structure | ||
xor r11, r11 // nullify r11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use r8 so that we retain consistency with the windows implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noahfalk I guess that it would be better to use r10 & r11 for intermediate computations on Linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense now that I understand the linux ABI difference
src/vm/amd64/asmhelpers.S
Outdated
// Upon entry : | ||
// rdi = clientInfo | ||
// rsi = profiledRsp | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions as above on ProfileEnter
src/vm/amd64/asmhelpers.S
Outdated
// } PROFILE_PLATFORM_SPECIFIC_DATA, *PPROFILE_PLATFORM_SPECIFIC_DATA; | ||
// | ||
.equ SIZEOF_PROFILE_PLATFORM_SPECIFIC_DATA, 0x8*11 + 0x4*2 // includes fudge to make FP_SPILL right | ||
.equ SIZEOF_OUTGOING_ARGUMENT_HOMES, 0x8*6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not right. AMD64 ABI on Linux doesn't home arguments. On Windows, there are always 4 stack slots reserved for storing the 4 argument registers before the return address. So you can just remove this constant and also the OFFSETOF_PLATFORM_SPECIFIC_DATA below, since they are both zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli thanks for this note.
@janvorli |
95b6b51
to
b4c7322
Compare
@janvorli Now I get the another set of failed tests, but the reason is not this profiling callbacks implementation. For example,
fails (segmentation fault) with
both with the profiler's callback calls and when I commented their calls. |
ad9fa9a
to
65eb9e6
Compare
#10857 |
@janvorli I've got the results on Windows. Please look at [AMD64 Windows ELT] Some CoreCLR tests fail with ELT hooks enabled and GCStress & TailcallStress 10857 |
24e48d9
to
e180fe6
Compare
@dotnet-bot test Ubuntu arm Cross Release Build |
@janvorli please see @sergign60 note above. |
src/jit/codegencommon.cpp
Outdated
@@ -7654,6 +7709,7 @@ void CodeGen::genProfilingEnterCallback(regNumber initReg, bool* pInitRegZeroed) | |||
// | |||
void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FCN_LEAVE*/) | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather remove such new empty lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rartemev fixed
src/vm/amd64/asmhelpers.S
Outdated
// UINT64 flt1; | ||
// UINT64 flt2; | ||
// UINT64 flt3; | ||
// #if defined(UNIX_AMD64_ABI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it source file used only for Unix? If so I think it would be good to remove this #ifdef bracket
src/vm/amd64/asmhelpers.S
Outdated
mov [rsp + 0x88], rdx // -- struct rdx field | ||
mov [rsp + 0x90], rcx // -- struct rcx field | ||
mov [rsp + 0x98], r8 // -- struct r8 field | ||
mov [rsp + 0xa0], r9 // -- struct r9 field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to save those scratch registers (rdi-r9)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rartemev these registers are not scratch registers, they are argument registers.
4d3a75a
to
9200f7a
Compare
This PR implements profiler ELT callbacks for AMD64 Linux. It's tested with historical debugger.
The main differences from appropriate Windows implementation:
@sivarv: I don't think this (regSet.AddMaskVars(REG_R10 | REG_R11 | REG_R12 | REG_R13);) is sufficient to have R10-R13 saved in prolog.
Note then genProfilingEnterCallback() is called after OS prolog and zero init of frame. By the time we are generating Enter callback, we have already generated the method prolog that has saved all required callee saved regs.