Description
I noticed this while doing timings for #9390, where using `not` sometimes gave an unexpected performance degradation. This boiled down to `not` sometimes leading to very unexpected IL.
Repro steps
Take the following code snippet:
let useNotIsNull (str:string) =
    if not(isNull str) then str.Length
    else 0
Because `not` is coded to return a single IL instruction (`ceq`), and `isNull`, while coded with `match`, also boils down to essentially a single `ceq` instruction, I expected that we'd end up with two instructions or, after optimization, a single one. However, it blows up:
IL_0000: ldarg.0
IL_0001: brfalse.s IL_0006
IL_0003: ldc.i4.0
IL_0004: br.s IL_0007
IL_0006: ldc.i4.1
IL_0007: brtrue.s IL_0010
IL_0009: ldarg.0
IL_000a: callvirt instance int32 [System.Private.CoreLib]System.String::get_Length()
IL_000f: ret
IL_0010: ldc.i4.0
IL_0011: ret
Which decompiles to C# as:
public static int useNotIsNull(string str)
{
if (str != null || 1 == 0)
{
return str.Length;
}
return 0;
}
If you were to recreate the `not` function as follows:
let not x = match x with true -> false | _ -> true
The same code above would now be encoded in IL as:
IL_0000: ldarg.0
IL_0001: brfalse.s IL_000a
IL_0003: ldarg.0
IL_0004: callvirt instance int32 [System.Private.CoreLib]System.String::get_Length()
IL_0009: ret
IL_000a: ldc.i4.0
IL_000b: ret
And here is the real killer: if we define `not` in terms of itself, the problem also disappears, regardless of whether the wrapper is marked as `inline` (like the original) or not:
let justLikeNot x = not x
let useJustLikeNot (str:string) =
    if justLikeNot(isNull str) then str.Length
    else 0
Resulting IL:
IL_0000: ldarg.0
IL_0001: brfalse.s IL_000a
IL_0003: ldarg.0
IL_0004: callvirt instance int32 [System.Private.CoreLib]System.String::get_Length()
IL_0009: ret
IL_000a: ldc.i4.0
IL_000b: ret
Strangely, the `not` function itself looks exactly the same as the `justLikeNot` function above:
IL_0000: ldarg.0
IL_0001: ldc.i4.0
IL_0002: ceq
IL_0004: ret
Yet in one case (with `isNull`) it leads to strange opcodes. In most other cases, it leads to the expected folding of the `ceq` into a `brfalse` or `brtrue`, respectively.
More examples of coding this and their surprising translations can be found in this SharpLab.io snippet.
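As a sanity check that the variants above differ only in generated code and not in behavior, here is a minimal sketch that runs all three over a few inputs. The name `myNot` is hypothetical (chosen here to avoid shadowing the built-in `not`), as are `inputs` and `allAgree`; the function bodies are the ones from this report.

```fsharp
// Variant 1: the built-in `not`, the form that produces the unexpected IL.
let useNotIsNull (str: string) =
    if not (isNull str) then str.Length
    else 0

// Variant 2: `not` wrapped in a plain function, as in the report.
let justLikeNot x = not x

let useJustLikeNot (str: string) =
    if justLikeNot (isNull str) then str.Length
    else 0

// Variant 3: the pattern-match re-implementation from the report,
// under the hypothetical name `myNot`.
let myNot x = match x with true -> false | _ -> true

let useMyNot (str: string) =
    if myNot (isNull str) then str.Length
    else 0

// A null, an empty and a non-empty string cover both branches.
let inputs : string list = [ null; ""; "abc" ]

let allAgree =
    inputs
    |> List.forall (fun s ->
        useNotIsNull s = useJustLikeNot s && useJustLikeNot s = useMyNot s)

printfn "all variants agree: %b" allAgree
```

This only confirms that the results match; the IL difference itself still has to be inspected with a decompiler.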
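Expected behavior
I expected `not(isNull str)` to compile to the same folded `brfalse`/`brtrue` IL as the redefined versions shown above.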
Actual behavior
See above for the actual behavior. In terms of performance, the different `not` versions all perform as expected, since they are ultimately folded into optimized x64 assembly, except for the `not(isNull x)` version. The `notIsNull` benchmark below uses `not(isNull x)`; the others all use a different way of coding `not` than the default:
(These timings were made by ensuring the function returns a value and is not optimized away (hence the `str.Length` call), and by repeating it 10_000x in a tight for-loop to smooth out timing noise in micro-benchmarks with BenchmarkDotNet (BDN).)
This is ultimately caused by the final assembly, which looks as follows (note the push/pop and the extra call):
; FSharp.Perf.BenchLength.notIsNull()
push rdi
push rsi
sub rsp,28
mov ecx,[rcx+8]
call FSharp.Perf.Data.get(Int32)
mov rsi,rax
xor edi,edi
M00_L00: ; start of for-loop body
mov rcx,rsi
call FSharp.Perf.StringLength.notIsNull(System.String)
inc edi
cmp edi,2711 ; loop 10_000 times
jl short M00_L00
add rsp,28
pop rsi
pop rdi
ret
; Total bytes of code 44
; FSharp.Perf.StringLength.notIsNull(System.String)
test rcx,rcx
je short M02_L00
mov eax,[rcx+8]
ret
M02_L00:
xor eax,eax
ret
; Total bytes of code 12
Compare that to using one of the `not` redefinitions, which, with the same code, gives:
; FSharp.Perf.BenchLength.newNot()
sub rsp,28
mov ecx,[rcx+8]
call FSharp.Perf.Data.get(Int32)
xor edx,edx
M00_L00: ; start of for-loop body
test rax,rax
je short M00_L01
mov ecx,[rax+8]
M00_L01:
inc edx
cmp edx,2711 ; loop 10_000 times
jl short M00_L00
add rsp,28
ret
; Total bytes of code 37
That is: no push/pop of `rdi` and `rsi` and no `call` inside the loop, because the function body has been inlined and no new stack frame is needed.
Known workarounds
Redefine `not` yourself and the problem seems to disappear.
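A minimal sketch of that workaround, assuming it is acceptable to shadow the built-in `not` within the file or module that needs it. The definition is the pattern-match version from the repro steps; whether it keeps avoiding the problematic IL on other compiler versions has not been verified here.

```fsharp
// Local redefinition that shadows the built-in `not` from this point on.
// Same definition as in the repro steps above.
let not x = match x with true -> false | _ -> true

// Unchanged call site: `not` now resolves to the local definition.
let useNotIsNull (str: string) =
    if not (isNull str) then str.Length
    else 0

printfn "%d %d" (useNotIsNull "abc") (useNotIsNull null)
```

Shadowing keeps the call sites unchanged, so the redefinition can be removed again once the compiler behavior is fixed.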
Related information
I've only tested this on the latest VS + FSC (with optimizations on, of course), but the SharpLab decompilation showed the same results.
I discussed this with @baronfel yesterday and neither of us could come up with a reasonable explanation, all the more so since redefining `not` as itself leads to optimized code, so I'm not sure why the combination `not(isNull x)` leads to such IL. The SharpLab.io link shows that using something other than `isNull` in the brackets does not lead to the same weird IL opcodes.