Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Merge changes from TFS #1

Merged
merged 1 commit into from
Jan 30, 2015
Merged

Merge changes from TFS #1

merged 1 commit into from
Jan 30, 2015

Conversation

dotnet-bot
Copy link

No description provided.

The test will use Roslyn to compile a simple program.

[tfs-changeset: 1408001]
@ellismg
Copy link

ellismg commented Jan 30, 2015

The PR for this change failed, but that's because the Jenkin's system was unable to find the correct ref to build, which I think is a configuration issue on our end. I am going to merge this anyway. We'll get a good read of quality from TFS and I've asked @mmitche to take a look at Jenkins when he has a chance.

ellismg added a commit that referenced this pull request Jan 30, 2015
@ellismg ellismg merged commit 24fc4dd into dotnet:master Jan 30, 2015
@dotnet-bot dotnet-bot deleted the from-tfs branch January 30, 2015 23:54
sergiy-k added a commit that referenced this pull request Mar 2, 2015
Exception handling needs a full context here to restore properly
richlander added a commit to richlander/coreclr that referenced this pull request Apr 8, 2015
mikem8361 pushed a commit that referenced this pull request Aug 4, 2015
…p where the system ids (TIDs) are wrong.

First find the managed thread os id is:

(lldb) sos Threads
                                                                                                                                                           Lock
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   1    1 3787 00000000006547F8    20220 Preemptive  00007FFFCC0145D0:00007FFFCC015FD0 00000000006357F8 0     Ukn
   6    2 3790 0000000000678FB8    21220 Preemptive  0000000000000000:0000000000000000 00000000006357F8 0     Ukn (Finalizer)

(lldb) thread list
Process 0 stopped
* thread #1: tid = 0x0000, 0x00007f01fe64d267 libc.so.6`__GI_raise(sig=6) + 55 at raise.c:55, name = 'corerun', stop reason = signal SIGABRT
  thread #2: tid = 0x0001, 0x00007f01fe7138dd libc.so.6, stop reason = signal SIGABRT
  thread #3: tid = 0x0002, 0x00007f01fd27dda0 libpthread.so.0`__pthread_cond_wait + 192, stop reason = signal SIGABRT
  thread #4: tid = 0x0003, 0x00007f01fd27e149 libpthread.so.0`__pthread_cond_timedwait + 297, stop reason = signal SIGABRT
  thread #5: tid = 0x0004, 0x00007f01fe70f28d libc.so.6, stop reason = signal SIGABRT
  thread #6: tid = 0x0005, 0x00007f01fe70f49d libc.so.6, stop reason = signal SIGABRT

Then use the new command "setsostid" to set the current thread using the "OSID" from above:
(lldb) setsostid 3790 6
Set sos thread os id to 0x3790 which maps to lldb thread index 6

Now ClrStack should dump that managed thread:
(lldb) sos ClrStack

To undo the affect of this command:
(lldb) setsostid

Added setclrpath command that allows the path that sos/dac/dbi are loaded from to be changes instead of using
the coreclr path. This may be needed if loading a core dump and the debugger binaries are in a different directory
that what the dump has for coreclr's path.
BruceForstall referenced this pull request in BruceForstall/coreclr Jun 23, 2016
Fixes #4181 "NYI_X86: Implement PInvoke frame init inlining for x86"

The main work here is to handle the custom calling convention for the
x86 CORINFO_HELP_INIT_PINVOKE_FRAME helper call: it takes EDI as an argument,
trashes only EAX, and returns the TCB in ESI.

The code changes are as follows:
1. Lowering::InsertPInvokeMethodProlog(): don't pass the "secret stub param" for x86.
Also, don't store the InlinedCallFrame.m_pCallSiteSP in the prolog: for x86 this is done
at the call site, due to the floating stack pointer.
2. LinearScan::getKillSetForNode(): for helper calls, call compHelperCallKillSet() to get the killMask,
to account for non-standard kill sets.
3. Morph.cpp::fgMorphArgs(): set non-standard arguments for CORINFO_HELP_INIT_PINVOKE_FRAME.
4. compHelperCallKillSet(): set the correct kill set for CORINFO_HELP_INIT_PINVOKE_FRAME.
5. codegenxarch.cpp::genCallInstruction(): set the ABI return register for CORINFO_HELP_INIT_PINVOKE_FRAME.
6. lowerxarch.cpp::TreeNodeInfoInit(): set the GT_CALL dstCandidates for CORINFO_HELP_INIT_PINVOKE_FRAME.

5 & 6 are both needed to avoid a copy.

With this change, the #1 NYI with 18415 hits over the tests is gone.
The total number of NYI is now 29516.
BruceForstall referenced this pull request in BruceForstall/coreclr Jun 23, 2016
Fixes #4181 "NYI_X86: Implement PInvoke frame init inlining for x86"

The main work here is to handle the custom calling convention for the
x86 CORINFO_HELP_INIT_PINVOKE_FRAME helper call: it takes EDI as an argument,
trashes only EAX, and returns the TCB in ESI.

The code changes are as follows:
1. Lowering::InsertPInvokeMethodProlog(): don't pass the "secret stub param" for x86.
Also, don't store the InlinedCallFrame.m_pCallSiteSP in the prolog: for x86 this is done
at the call site, due to the floating stack pointer.
2. LinearScan::getKillSetForNode(): for helper calls, call compHelperCallKillSet() to get the killMask,
to account for non-standard kill sets.
3. Morph.cpp::fgMorphArgs(): set non-standard arguments for CORINFO_HELP_INIT_PINVOKE_FRAME.
4. compHelperCallKillSet(): set the correct kill set for CORINFO_HELP_INIT_PINVOKE_FRAME.
5. codegenxarch.cpp::genCallInstruction(): set the ABI return register for CORINFO_HELP_INIT_PINVOKE_FRAME.
6. lowerxarch.cpp::TreeNodeInfoInit(): set the GT_CALL dstCandidates for CORINFO_HELP_INIT_PINVOKE_FRAME.

5 & 6 are both needed to avoid a copy.

With this change, the #1 NYI with 18415 hits over the tests is gone.
The total number of NYI is now 29516.
BruceForstall referenced this pull request in BruceForstall/coreclr Jun 23, 2016
Fixes #4181 "NYI_X86: Implement PInvoke frame init inlining for x86"

The main work here is to handle the custom calling convention for the
x86 CORINFO_HELP_INIT_PINVOKE_FRAME helper call: it takes EDI as an argument,
trashes only EAX, and returns the TCB in ESI.

The code changes are as follows:
1. Lowering::InsertPInvokeMethodProlog(): don't pass the "secret stub param" for x86.
Also, don't store the InlinedCallFrame.m_pCallSiteSP in the prolog: for x86 this is done
at the call site, due to the floating stack pointer.
2. LinearScan::getKillSetForNode(): for helper calls, call compHelperCallKillSet() to get the killMask,
to account for non-standard kill sets.
3. Morph.cpp::fgMorphArgs(): set non-standard arguments for CORINFO_HELP_INIT_PINVOKE_FRAME.
4. compHelperCallKillSet(): set the correct kill set for CORINFO_HELP_INIT_PINVOKE_FRAME.
5. codegenxarch.cpp::genCallInstruction(): set the ABI return register for CORINFO_HELP_INIT_PINVOKE_FRAME.
6. lowerxarch.cpp::TreeNodeInfoInit(): set the GT_CALL dstCandidates for CORINFO_HELP_INIT_PINVOKE_FRAME.

5 & 6 are both needed to avoid a copy.

With this change, the #1 NYI with 18415 hits over the tests is gone.
The total number of NYI is now 29516.
BruceForstall referenced this pull request in BruceForstall/coreclr Jul 28, 2016
Remove GT_STMT usage from the code generator.
dotnet-bot pushed a commit to dotnet-bot/coreclr that referenced this pull request Aug 19, 2016
OVERVIEW
========

This directory contains the SuperPMI tool used for testing the .NET
just-in-time (JIT) compiler.

SuperPMI has two uses:
1. Verification that a JIT code change doesn't cause any asserts.
2. Finding test code where two JIT compilers generate different code, or verifying
that the two compilers generate the same code.

Case dotnet#1 is useful for doing quick regression checking when making a source
code change to the JIT compiler. The process is: (a) make a JIT source code
change, (b) run that newly built JIT through a SuperPMI run to verify no
asserts have been introduced.

Case dotnet#2 is useful for generating assembly language diffs, to help analyze the
impact of a JIT code change.

SuperPMI works in two phases: collection and playback. In the collection phase,
the system is configured to collect SuperPMI data. Then, run any set of .NET managed
programs. When these managed programs invoke the JIT compiler, SuperPMI gathers and
captures all information passed between the JIT and its .NET host. In the
playback phase, SuperPMI loads the JIT directly, and causes it to compile all
the functions that it previously compiled, but using the collected data to
provide answers to various questions that the JIT needs to ask. The .NET
execution engine (EE) is not invoked at all.

TOOLS
==========

There are two native executable tools: superpmi and mcs. There is a .NET Core
C# program that is built as part of the coreclr repo tests build called
superpmicollect.exe.

All will show a help screen if passed -?.

COLLECTION
==========

Set the following environment variables:

SuperPMIShimLogPath=<full path to an empty temporary directory>
SuperPMIShimPath=<full path to clrjit.dll, the "standalone" JIT>
COMPlus_AltJit=*
COMPlus_AltJitName=superpmi-shim-collector.dll

(On Linux, use libclrjit.so and libsuperpmi-shim-collector.so. On Mac,
use libclrjit.dylib and libsuperpmi-shim-collector.dylib.)

Then, run some managed programs. When done running programs, un-set these variables.

Now, you will have a large number of .mc files. Merge these using the mcs
tool:

mcs -merge base.mch *.mc

One benefit of SuperPMI is the ability to remove duplicated compilations, so
on replay only unique functions are compiled. Use the following to create a
"unique" set of functions:

mcs -removeDup -thin base.mch unique.mch

Note that -thin is not required. However, it will delete all the compilation
result collected during the collection phase, which makes the resulting MCH
file smaller. Those compilation results are not required for playback.

Use the superpmicollect.exe tool to automate and simplify this process.

PLAYBACK
========

Once you have a merged, de-duplicated MCH collection, you can play it back
using:

superpmi unique.mch clrjit.dll

You can do this much faster by utilizing all the processors on your machine,
and replaying in parallel, using:

superpmi -p unique.mch clrjit.dll

REMAINING WORK
=============

The basic of assembly diffing are there, using the "coredistools"
package. The open source build needs to be altered to use this package
to wire up the correct build steps.

[tfs-changeset: 1623347]
litian2025 pushed a commit to litian2025/coreclr that referenced this pull request Dec 12, 2016
There are two two kinds of transition penalties:
1.Transition from 256-bit AVX code to 128-bit legacy SSE code.
2.Transition from 128-bit legacy SSE code to either 128 or
256-bit AVX code. This only happens if there was a preceding
AVX256->legacy SSE transition penalty.

The primary goal is to remove the dotnet#1 AVX to SSE transition penalty.

Added two emitter flags: contains256bitAVXInstruction indicates that
if the JIT method contains 256-bit AVX code, containsAVXInstruction
indicates that if the method contains 128-bit or 256-bit AVX code.

Issue VZEROUPPER in prolog if the method contains 128-bit or 256-bit
AVX code, to avoid legacy SSE to AVX transition penalty, this could
happen for reverse pinvoke situation. Issue VZEROUPPER in epilog
if the method contains 256-bit AVX code, to avoid AVX to legacy
SSE transition penalty.

To limite code size increase impact, we only issue VZEROUPPER before
PInvoke call on user defined function if the JIT method contains
256-bit AVX code, assuming user defined function contains legacy
SSE code. No need to issue VZEROUPPER after PInvoke call because dotnet#2
SSE to AVX transition penalty won't happen since dotnet#1 AVX to SSE
transition has been taken care of before the PInvoke call.

We measured ~3% to 1% performance gain on TechEmPower plaintext and
verified those VTune AVX/SSE events: OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVE have been reduced to 0.

Fix #7240
litian2025 pushed a commit to litian2025/coreclr that referenced this pull request Jan 8, 2017
There are two two kinds of transition penalties:
1.Transition from 256-bit AVX code to 128-bit legacy SSE code.
2.Transition from 128-bit legacy SSE code to either 128 or
256-bit AVX code. This only happens if there was a preceding
AVX256->legacy SSE transition penalty.

The primary goal is to remove the dotnet#1 AVX to SSE transition penalty.

Added two emitter flags: contains256bitAVXInstruction indicates that
if the JIT method contains 256-bit AVX code, containsAVXInstruction
indicates that if the method contains 128-bit or 256-bit AVX code.

Issue VZEROUPPER in prolog if the method contains 128-bit or 256-bit
AVX code, to avoid legacy SSE to AVX transition penalty, this could
happen for reverse pinvoke situation. Issue VZEROUPPER in epilog
if the method contains 256-bit AVX code, to avoid AVX to legacy
SSE transition penalty.

To limite code size increase impact, we only issue VZEROUPPER before
PInvoke call on user defined function if the JIT method contains
256-bit AVX code, assuming user defined function contains legacy
SSE code. No need to issue VZEROUPPER after PInvoke call because dotnet#2
SSE to AVX transition penalty won't happen since dotnet#1 AVX to SSE
transition has been taken care of before the PInvoke call.

We measured ~3% to 1% performance gain on TechEmPower plaintext and
verified those VTune AVX/SSE events: OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVE have been reduced to 0.

Fix #7240
litian2025 pushed a commit to litian2025/coreclr that referenced this pull request Jan 8, 2017
There are two two kinds of transition penalties:
1.Transition from 256-bit AVX code to 128-bit legacy SSE code.
2.Transition from 128-bit legacy SSE code to either 128 or
256-bit AVX code. This only happens if there was a preceding
AVX256->legacy SSE transition penalty.

The primary goal is to remove the dotnet#1 AVX to SSE transition penalty.

Added two emitter flags: contains256bitAVXInstruction indicates that
if the JIT method contains 256-bit AVX code, containsAVXInstruction
indicates that if the method contains 128-bit or 256-bit AVX code.

Issue VZEROUPPER in prolog if the method contains 128-bit or 256-bit
AVX code, to avoid legacy SSE to AVX transition penalty, this could
happen for reverse pinvoke situation. Issue VZEROUPPER in epilog
if the method contains 256-bit AVX code, to avoid AVX to legacy
SSE transition penalty.

To limite code size increase impact, we only issue VZEROUPPER before
PInvoke call on user defined function if the JIT method contains
256-bit AVX code, assuming user defined function contains legacy
SSE code. No need to issue VZEROUPPER after PInvoke call because dotnet#2
SSE to AVX transition penalty won't happen since dotnet#1 AVX to SSE
transition has been taken care of before the PInvoke call.

We measured ~3% to 1% performance gain on TechEmPower plaintext and
verified those VTune AVX/SSE events: OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVE have been reduced to 0.

Fix #7240

move setContainsAVX flags to lower, refactor to a smaller method

refactor, fix typo in comments

fix format error
BruceForstall referenced this pull request in BruceForstall/coreclr Mar 13, 2017
1. Use the LIR node dumper to display nodes to be generated by
codegen, since we're in LIR form at that point. Add a new
"prefix message" argument to allow "Generating: " to prefix all
such lines.
2. Fix off-by-one error in LIR dump due to `#ifdef` versus `#if`.
3. Remove extra trailing line for each LIR node. This interfered
with #1. But I always thought it was unnecessarily verbose; I don't
believe there is any ambiguity without that extra space.
4. Add dTreeLIR()/cTreeLIR() functions for use in the debugger.
BruceForstall added a commit that referenced this pull request Mar 14, 2017
1. Use the LIR node dumper to display nodes to be generated by
codegen, since we're in LIR form at that point. Add a new
"prefix message" argument to allow "Generating: " to prefix all
such lines.
2. Fix off-by-one error in LIR dump due to `#ifdef` versus `#if`.
3. Remove extra trailing line for each LIR node. This interfered
with #1. But I always thought it was unnecessarily verbose; I don't
believe there is any ambiguity without that extra space.
4. Add dTreeLIR()/cTreeLIR() functions for use in the debugger.
jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017
1. Use the LIR node dumper to display nodes to be generated by
codegen, since we're in LIR form at that point. Add a new
"prefix message" argument to allow "Generating: " to prefix all
such lines.
2. Fix off-by-one error in LIR dump due to `#ifdef` versus `#if`.
3. Remove extra trailing line for each LIR node. This interfered
with dotnet#1. But I always thought it was unnecessarily verbose; I don't
believe there is any ambiguity without that extra space.
4. Add dTreeLIR()/cTreeLIR() functions for use in the debugger.
jkotas pushed a commit that referenced this pull request Apr 24, 2018
When _WIN64 is defined vm relies on the secret arg being
shifted left and orred with #1.

Revert part of changes from #17659 to fix dotnet/corefx#29266

Fix arm64 to match amd64

Simplify dllimport.cpp
stephentoub added a commit to stephentoub/coreclr that referenced this pull request May 31, 2018
1. Computing GC roots is a relatively slow operation, and doing it for every state machine object found in a large heap can be time consuming.  Making it opt-in with -roots command-line flag.

2. Added -waiting command-line flag.  DumpAsync will now retrieve the <>1__state field from the StateMachine, and if -waiting is specified, it'll filter down to state machines that have a state value >= 0, meaning the state machines are waiting at an await point.  For example, given this program:
```C#
using System.Threading.Tasks;
class Program
{
    static async Task Main() { await MethodA(0); await MethodA(int.MaxValue); }
    static async Task MethodA(int delay) => await MethodB(delay);
    static async Task MethodB(int delay) { await Task.Yield(); await Task.Delay(delay); }
}
```
using `!DumpAsync` outputs:
```
         Address               MT     Size Name
#0
0000026848693438 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848693490 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance                0 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848693498 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 00000268486934a0 <>u__2
Continuation: 00000268486934b0 (System.Object)

dotnet#1
0000026848693e68 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848693ec0 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance                0 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848693ec8 <>u__1
Continuation: 00000268486934b0 (System.Object)

dotnet#2
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

dotnet#3
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

dotnet#4
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 5 state machines.
```
while using `!DumpAsync -waiting` outputs only:
```
         Address               MT     Size Name
#0
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

dotnet#1
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

dotnet#2
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 3 state machines.
```
skipping the two state machines that have a `<>1__state` field value of -2 (meaning it's completed).  Note that this change has the somewhat unfortunate impact of taking a dependency on what's effectively an implementation detail of Roslyn, but the value the filtering provides is deemed to be worth it.  This design is unlikely to change in the future, and as with other diagnostic/debugging features that rely on such details, it can be updated if Roslyn ever changes its scheme.  In the meantime, the code will output a warning message if it can't find the state field.

3. If a state machine is found to have 0 roots but also to have a <>1__state value >= 0, that suggests it was dropped without having been completed, which is likely a sign of an application bug.  The command now prints out an information message to highlight that state.  For example, this program:
```C#
using System;
using System.Threading.Tasks;
class Program
{
    static void Main()
    {
        Task.Run(async () => await new TaskCompletionSource<bool>().Task);
        Console.ReadLine();
    }
}
```
when processed with `!DumpAsync -roots` results in:
```
         Address               MT     Size Name
#0
0000020787fb5b30 00007ff88ea1afe8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Boolean, System.Private.CoreLib],[Program+<>c+<<Main>b__0_0>d, test]]
StateMachine: Program+<>c+<<Main>b__0_0>d (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000003        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd0b88  4000004        8 ...Private.CoreLib]]  1 instance 0000020787fb5b88 <>t__builder
00007ff8e9bffd58  4000005       10 ...Private.CoreLib]]  1 instance 0000020787fb5b90 <>u__1
Continuation: 0000020787fb3fc8 (System.Threading.Tasks.UnwrapPromise`1[[System.Boolean, System.Private.CoreLib]])
GC roots:
Incomplete state machine (<>1__state == 0) with 0 roots.

Found 1 state machines.
```
stephentoub added a commit to stephentoub/coreclr that referenced this pull request May 31, 2018
1. Computing GC roots is a relatively slow operation, and doing it for every state machine object found in a large heap can be time consuming.  Making it opt-in with -roots command-line flag.

2. Added -waiting command-line flag.  DumpAsync will now retrieve the <>1__state field from the StateMachine, and if -waiting is specified, it'll filter down to state machines that have a state value >= 0, meaning the state machines are waiting at an await point.  For example, given this program:
```C#
using System.Threading.Tasks;
class Program
{
    static async Task Main() { await MethodA(0); await MethodA(int.MaxValue); }
    static async Task MethodA(int delay) => await MethodB(delay);
    static async Task MethodB(int delay) { await Task.Yield(); await Task.Delay(delay); }
}
```
using `!DumpAsync` outputs:
```
         Address               MT     Size Name
#0
0000026848693438 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848693490 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance                0 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848693498 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 00000268486934a0 <>u__2
Continuation: 00000268486934b0 (System.Object)

dotnet#1
0000026848693e68 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848693ec0 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance                0 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848693ec8 <>u__1
Continuation: 00000268486934b0 (System.Object)

dotnet#2
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

dotnet#3
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

dotnet#4
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 5 state machines.
```
while using `!DumpAsync -waiting` outputs only:
```
         Address               MT     Size Name
#0
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

dotnet#1
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

dotnet#2
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 3 state machines.
```
skipping the two state machines that have a `<>1__state` field value of -2 (meaning it's completed).  Note that this change has the somewhat unfortunate impact of taking a dependency on what's effectively an implementation detail of Roslyn, but the value the filtering provides is deemed to be worth it.  This design is unlikely to change in the future, and as with other diagnostic/debugging features that rely on such details, it can be updated if Roslyn ever changes its scheme.  In the meantime, the code will output a warning message if it can't find the state field.

3. If a state machine is found to have 0 roots but also to have a <>1__state value >= 0, that suggests it was dropped without having been completed, which is likely a sign of an application bug.  The command now prints out an information message to highlight that state.  For example, this program:
```C#
using System;
using System.Threading.Tasks;
class Program
{
    static void Main()
    {
        Task.Run(async () => await new TaskCompletionSource<bool>().Task);
        Console.ReadLine();
    }
}
```
when processed with `!DumpAsync -roots` results in:
```
         Address               MT     Size Name
#0
0000020787fb5b30 00007ff88ea1afe8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Boolean, System.Private.CoreLib],[Program+<>c+<<Main>b__0_0>d, test]]
StateMachine: Program+<>c+<<Main>b__0_0>d (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000003        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd0b88  4000004        8 ...Private.CoreLib]]  1 instance 0000020787fb5b88 <>t__builder
00007ff8e9bffd58  4000005       10 ...Private.CoreLib]]  1 instance 0000020787fb5b90 <>u__1
Continuation: 0000020787fb3fc8 (System.Threading.Tasks.UnwrapPromise`1[[System.Boolean, System.Private.CoreLib]])
GC roots:
Incomplete state machine (<>1__state == 0) with 0 roots.

Found 1 state machines.
```
stephentoub added a commit that referenced this pull request May 31, 2018
1. Computing GC roots is a relatively slow operation, and doing it for every state machine object found in a large heap can be time consuming.  Making it opt-in with -roots command-line flag.

2. Added -waiting command-line flag.  DumpAsync will now retrieve the <>1__state field from the StateMachine, and if -waiting is specified, it'll filter down to state machines that have a state value >= 0, meaning the state machines are waiting at an await point.  For example, given this program:
```C#
using System.Threading.Tasks;
class Program
{
    static async Task Main() { await MethodA(0); await MethodA(int.MaxValue); }
    static async Task MethodA(int delay) => await MethodB(delay);
    static async Task MethodB(int delay) { await Task.Yield(); await Task.Delay(delay); }
}
```
using `!DumpAsync` outputs:
```
         Address               MT     Size Name
#0
0000026848693438 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848693490 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance                0 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848693498 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 00000268486934a0 <>u__2
Continuation: 00000268486934b0 (System.Object)

#1
0000026848693e68 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance               -2 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848693ec0 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance                0 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848693ec8 <>u__1
Continuation: 00000268486934b0 (System.Object)

#2
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

#3
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

#4
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 5 state machines.
```
while using `!DumpAsync -waiting` outputs only:
```
         Address               MT     Size Name
#0
0000026848693ed8 00007ff88ea37188      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
StateMachine: Program+<Main>d__0 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000001        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000002        8 ...TaskMethodBuilder  1 instance 0000026848693f30 <>t__builder
00007ff8e9bcead0  4000003       10 ...vices.TaskAwaiter  1 instance 0000026848693f38 <>u__1
Continuation: 0000026848693f48 (System.Threading.Tasks.Task+SetOnInvokeMres)

#1
0000026848695d30 00007ff88ea35e58      120  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
StateMachine: Program+<MethodB>d__2 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000008        0         System.Int32  1 instance                1 <>1__state
00007ff8e9bd82f8  4000009        8 ...TaskMethodBuilder  1 instance 0000026848695d88 <>t__builder
00007ff8e9bc4bc0  400000a        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bee4d0  400000b       10 ...able+YieldAwaiter  1 instance 0000026848695d90 <>u__1
00007ff8e9bcead0  400000c       18 ...vices.TaskAwaiter  1 instance 0000026848695d98 <>u__2
Continuation: 0000026848695dd0 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]])

#2
0000026848695dd0 00007ff88ea36cc8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
StateMachine: Program+<MethodA>d__1 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000004        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd82f8  4000005        8 ...TaskMethodBuilder  1 instance 0000026848695e28 <>t__builder
00007ff8e9bc4bc0  4000006        4         System.Int32  1 instance       2147483647 delay
00007ff8e9bcead0  4000007       10 ...vices.TaskAwaiter  1 instance 0000026848695e30 <>u__1
Continuation: 0000026848693ed8 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]])

Found 3 state machines.
```
skipping the two state machines that have a `<>1__state` field value of -2 (meaning it's completed).  Note that this change has the somewhat unfortunate impact of taking a dependency on what's effectively an implementation detail of Roslyn, but the value the filtering provides is deemed to be worth it.  This design is unlikely to change in the future, and as with other diagnostic/debugging features that rely on such details, it can be updated if Roslyn ever changes its scheme.  In the meantime, the code will output a warning message if it can't find the state field.

3. If a state machine is found to have 0 roots but also to have a <>1__state value >= 0, that suggests it was dropped without having been completed, which is likely a sign of an application bug.  The command now prints out an information message to highlight that state.  For example, this program:
```C#
using System;
using System.Threading.Tasks;
class Program
{
    static void Main()
    {
        Task.Run(async () => await new TaskCompletionSource<bool>().Task);
        Console.ReadLine();
    }
}
```
when processed with `!DumpAsync -roots` results in:
```
         Address               MT     Size Name
#0
0000020787fb5b30 00007ff88ea1afe8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Boolean, System.Private.CoreLib],[Program+<>c+<<Main>b__0_0>d, test]]
StateMachine: Program+<>c+<<Main>b__0_0>d (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e9bc4bc0  4000003        0         System.Int32  1 instance                0 <>1__state
00007ff8e9bd0b88  4000004        8 ...Private.CoreLib]]  1 instance 0000020787fb5b88 <>t__builder
00007ff8e9bffd58  4000005       10 ...Private.CoreLib]]  1 instance 0000020787fb5b90 <>u__1
Continuation: 0000020787fb3fc8 (System.Threading.Tasks.UnwrapPromise`1[[System.Boolean, System.Private.CoreLib]])
GC roots:
Incomplete state machine (<>1__state == 0) with 0 roots.

Found 1 state machines.
```
dotnet-maestro-bot referenced this pull request in dotnet-maestro-bot/coreclr Sep 5, 2018
* Fix ServiceController name population perf

* Split tests

* Remove dead field

* Remove new use of DangerousGetHandle

* SafeHandle all the things!

* VSB #1

* VSB dotnet#2

* Fix GLE

* Initialize machineName in ctor

* Test for empty name ex

* Null names

* Inadvertent edit

* Unix build

* Move interop into class

* Reverse SafeHandle for HAllocGlobal

* Fix tests

* Disable test for NETFX

* CR feedback

* Pattern matching on VSB

* Direct call

* typo

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit that referenced this pull request Sep 5, 2018
* Fix ServiceController name population perf

* Split tests

* Remove dead field

* Remove new use of DangerousGetHandle

* SafeHandle all the things!

* VSB #1

* VSB #2

* Fix GLE

* Initialize machineName in ctor

* Test for empty name ex

* Null names

* Inadvertent edit

* Unix build

* Move interop into class

* Reverse SafeHandle for HAllocGlobal

* Fix tests

* Disable test for NETFX

* CR feedback

* Pattern matching on VSB

* Direct call

* typo

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to jkotas/coreclr that referenced this pull request Aug 27, 2020
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Fixes dotnet/coreclr#4181 "NYI_X86: Implement PInvoke frame init inlining for x86"

The main work here is to handle the custom calling convention for the
x86 CORINFO_HELP_INIT_PINVOKE_FRAME helper call: it takes EDI as an argument,
trashes only EAX, and returns the TCB in ESI.

The code changes are as follows:
1. Lowering::InsertPInvokeMethodProlog(): don't pass the "secret stub param" for x86.
Also, don't store the InlinedCallFrame.m_pCallSiteSP in the prolog: for x86 this is done
at the call site, due to the floating stack pointer.
2. LinearScan::getKillSetForNode(): for helper calls, call compHelperCallKillSet() to get the killMask,
to account for non-standard kill sets.
3. Morph.cpp::fgMorphArgs(): set non-standard arguments for CORINFO_HELP_INIT_PINVOKE_FRAME.
4. compHelperCallKillSet(): set the correct kill set for CORINFO_HELP_INIT_PINVOKE_FRAME.
5. codegenxarch.cpp::genCallInstruction(): set the ABI return register for CORINFO_HELP_INIT_PINVOKE_FRAME.
6. lowerxarch.cpp::TreeNodeInfoInit(): set the GT_CALL dstCandidates for CORINFO_HELP_INIT_PINVOKE_FRAME.

5 & 6 are both needed to avoid a copy.

With this change, the dotnet/coreclr#1 NYI with 18415 hits over the tests is gone.
The total number of NYI is now 29516.


Commit migrated from dotnet/coreclr@3c7ecfe
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
OVERVIEW
========

This directory contains the SuperPMI tool used for testing the .NET
just-in-time (JIT) compiler.

SuperPMI has two uses:
1. Verification that a JIT code change doesn't cause any asserts.
2. Finding test code where two JIT compilers generate different code, or verifying
that the two compilers generate the same code.

Case dotnet/coreclr#1 is useful for doing quick regression checking when making a source
code change to the JIT compiler. The process is: (a) make a JIT source code
change, (b) run that newly built JIT through a SuperPMI run to verify no
asserts have been introduced.

Case dotnet/coreclr#2 is useful for generating assembly language diffs, to help analyze the
impact of a JIT code change.

SuperPMI works in two phases: collection and playback. In the collection phase,
the system is configured to collect SuperPMI data. Then, run any set of .NET managed
programs. When these managed programs invoke the JIT compiler, SuperPMI gathers and
captures all information passed between the JIT and its .NET host. In the
playback phase, SuperPMI loads the JIT directly, and causes it to compile all
the functions that it previously compiled, but using the collected data to
provide answers to various questions that the JIT needs to ask. The .NET
execution engine (EE) is not invoked at all.

TOOLS
==========

There are two native executable tools: superpmi and mcs. There is a .NET Core
C# program that is built as part of the coreclr repo tests build called
superpmicollect.exe.

All will show a help screen if passed -?.

COLLECTION
==========

Set the following environment variables:

SuperPMIShimLogPath=<full path to an empty temporary directory>
SuperPMIShimPath=<full path to clrjit.dll, the "standalone" JIT>
COMPlus_AltJit=*
COMPlus_AltJitName=superpmi-shim-collector.dll

(On Linux, use libclrjit.so and libsuperpmi-shim-collector.so. On Mac,
use libclrjit.dylib and libsuperpmi-shim-collector.dylib.)

Then, run some managed programs. When done running programs, un-set these variables.

Now, you will have a large number of .mc files. Merge these using the mcs
tool:

mcs -merge base.mch *.mc

One benefit of SuperPMI is the ability to remove duplicated compilations, so
on replay only unique functions are compiled. Use the following to create a
"unique" set of functions:

mcs -removeDup -thin base.mch unique.mch

Note that -thin is not required. However, it will delete all the compilation
result collected during the collection phase, which makes the resulting MCH
file smaller. Those compilation results are not required for playback.

Use the superpmicollect.exe tool to automate and simplify this process.

PLAYBACK
========

Once you have a merged, de-duplicated MCH collection, you can play it back
using:

superpmi unique.mch clrjit.dll

You can do this much faster by utilizing all the processors on your machine,
and replaying in parallel, using:

superpmi -p unique.mch clrjit.dll

REMAINING WORK
=============

The basic of assembly diffing are there, using the "coredistools"
package. The open source build needs to be altered to use this package
to wire up the correct build steps.

[tfs-changeset: 1623347]


Commit migrated from dotnet/coreclr@d85eb92
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
There are two two kinds of transition penalties:
1.Transition from 256-bit AVX code to 128-bit legacy SSE code.
2.Transition from 128-bit legacy SSE code to either 128 or
256-bit AVX code. This only happens if there was a preceding
AVX256->legacy SSE transition penalty.

The primary goal is to remove the dotnet/coreclr#1 AVX to SSE transition penalty.

Added two emitter flags: contains256bitAVXInstruction indicates that
if the JIT method contains 256-bit AVX code, containsAVXInstruction
indicates that if the method contains 128-bit or 256-bit AVX code.

Issue VZEROUPPER in prolog if the method contains 128-bit or 256-bit
AVX code, to avoid legacy SSE to AVX transition penalty, this could
happen for reverse pinvoke situation. Issue VZEROUPPER in epilog
if the method contains 256-bit AVX code, to avoid AVX to legacy
SSE transition penalty.

To limite code size increase impact, we only issue VZEROUPPER before
PInvoke call on user defined function if the JIT method contains
256-bit AVX code, assuming user defined function contains legacy
SSE code. No need to issue VZEROUPPER after PInvoke call because dotnet/coreclr#2
SSE to AVX transition penalty won't happen since dotnet/coreclr#1 AVX to SSE
transition has been taken care of before the PInvoke call.

We measured ~3% to 1% performance gain on TechEmPower plaintext and
verified those VTune AVX/SSE events: OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVE have been reduced to 0.

Fix dotnet/coreclr#7240

move setContainsAVX flags to lower, refactor to a smaller method

refactor, fix typo in comments

fix format error


Commit migrated from dotnet/coreclr@cc169ea
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants