| 1 | +Passing and Returning Structs |
| 2 | +============================= |
| 3 | +Problem Statement |
| 4 | +----------------- |
| 5 | +The current implementation of ABI (Application Binary Interface, aka calling |
| 6 | +convention) support in RyuJIT is problematic in a number of areas, especially |
| 7 | +when it comes to the handling of structs (aka value types). |
| 8 | + |
| 9 | +- RyuJIT currently supports 4 target architectures: x86, x64 (aka x86-64), ARM |
| 10 | + and ARM64, with two different ABIs for x64 (Windows and Linux). |
| 11 | + Each has unique requirements, yet these requirements are expressed in |
| 12 | + the code programmatically, with `#ifdef`s, and even where the requirements |
| 13 | + are shared, they are often handled in different code paths. |
| 14 | + |
| 15 | +- When passing or returning structs, the code generator sometimes requires |
| 16 | + that the struct be copied to or from memory. The morpher (`fgMorphArgs()`) |
| 17 | + attempts to discern these cases, and create copies when necessary, but sometimes it |
| 18 | + makes copies when they aren't needed. |
| 19 | + |
| 20 | +- Even in cases where the code generator currently requires the struct to be |
| 21 | + in memory, it could be enhanced to handle the in-register case: |
| 22 | + - Currently, when we have a register-passed struct that fits in a register, |
| 23 | + but that doesn't have a single field of a matching type, |
| 24 | + `fgMorphArgs()` generates a `GT_LCL_FLD` of the appropriate scalar type |
| 25 | + to reference the value. This forces the struct to be marked `lvDoNotEnregister`. |
| 26 | + However, the backend has support for performing the necessary move in |
| 27 | + some cases (e.g. when a struct with a single field of `TYP_DOUBLE` is passed |
| 28 | + in an integer register as `TYP_LONG`), by generating a `GT_BITCAST` to move |
| 29 | + the value to the appropriate register. |
| 30 | + - In other cases (e.g. a struct with two `TYP_INT` fields in registers), the |
| 31 | + backend should be able to generate the necessary code to place the fields |
| 32 | + in the necessary register(s). |
| 33 | + |
| 34 | +- Even when the requirements are similar, the IR, as well as the |
| 35 | + transformations performed by `fgMorphArgs()`, are not the same. |
| 36 | + |
| 37 | +- Much of the information about each argument is contained in the `fgArgInfo` |
| 38 | + on the `GT_CALL` node. It in turn contains an `argTable` with an entry for |
| 39 | + each argument. However, this information is not complete, especially on |
| 40 | + x64/Linux where repeated calls are made to the VM to obtain the struct |
| 41 | + descriptor. |
| 42 | + |
| 43 | +- The functionality of `fgMorphArgs()` combines the determination of the ABI |
| 44 | + requirements, which sets up the `fgArgInfo` and `argTable`, with the IR |
| 45 | + transformations required to ensure that the arguments of the `GT_CALL` are |
| 46 | + in the appropriate form. |
| 47 | + |
| 48 | +- When `fgCanFastTailCall()` is called, it doesn't yet have the `fgArgInfo`, |
| 49 | + so it must duplicate some of the analysis that is done in `fgMorphArgs()`. |
| 50 | + |
| 51 | +High-Level Proposed Design |
| 52 | +-------------------------- |
| 53 | +This is a preliminary design, and is likely to change as the implementation proceeds. |
| 54 | + |
| 55 | +First, the `fgArgInfo` is extended to contain all the information needed to determine |
| 56 | +how an argument is passed. Ideally, most of the `#ifdef`s relating to ABI differences |
| 57 | +can be eliminated by querying the `fgArgInfo`. Most of the information will be queried |
| 58 | +via properties, such that when a target doesn't support a particular struct passing |
| 59 | +mechanism (e.g. passing structs by reference), the property will unconditionally return false, and the associated code paths will be eliminated. |
| 60 | + |
| 61 | +The initial determination of the number of arguments and how they |
| 62 | +are passed is extracted from `fgMorphArgs()` into a separate method: `gtInitArgInfo()`. It is idempotent - that is, it can be re-invoked and will simply return if it |
| 63 | +has already been called. It can be called by `fgCanFastTailCall()` so that it can query |
| 64 | +the `argTable` to get the information it requires. |
| 65 | + |
| 66 | +This method is responsible for the first part of what is currently `fgMorphArgs()`, plus setting up the `argTable`: |
| 67 | +- Count the number of args. |
| 68 | + - Create any non-standard args (e.g. indirection cells or cookie parameters) that |
| 69 | + are needed, but don't yet create copies. |
| 70 | +- Create the `argTable` for the given number of args. |
| 71 | +- Initialize the `fgArgInfo` for each arg, with all the information about how |
| 72 | + the arg is passed, and whether it requires a temp, but don't yet create any |
| 73 | + temps. |
| 74 | + - On x64/Linux, this is the only method that should need to consult the struct |
| 75 | + descriptor for outgoing arguments. |
| 76 | + - The `isProcessed` flag remains false until `fgMorphArgs()` has handled |
| 77 | + the arg. |
| 78 | + - The `fgArgInfo` contains an array of register numbers (sized according to the |
| 79 | + maximum number of registers used for a single argument). If the first register |
| 80 | + is `REG_STK`, the argument is passed entirely on the stack. For most targets, |
| 81 | + if the first register is an actual register, the argument is passed entirely in |
| 82 | + registers. When arguments can be split (`_TARGET_ARM_`), this will be indicated |
| 83 | + with an `isSplit` property of `true`. |
| 84 | + - Note that the `isSplit` property would evaluate to false on targets where |
| 85 | + it is not supported, reducing the need for `#ifdef`s (we can rely on the compiler |
| 86 | + to eliminate those dead paths). |
| 87 | +- Validate that each struct argument is either a `GT_LCL_VAR`, a `GT_OBJ`, |
| 88 | + or a `GT_MKREFANY`. |
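
The argument-table layout and the idempotent initialization described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not RyuJIT's actual code: `ArgEntrySketch`, `CallSketch`, `initArgInfo`, `kTargetSupportsSplitArgs`, `kMaxRegsPerArg`, `stackSlots` and every register name other than `REG_STK` are hypothetical, and the rule used for `isSplit()` (the first register is a real register and stack slots are also used) is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical register numbering; only REG_STK is named in the design text.
enum RegNum { REG_STK = -1, REG_R0 = 0, REG_R1, REG_R2, REG_R3 };

// Assumption: a compile-time constant standing in for `_TARGET_ARM_`.
constexpr bool kTargetSupportsSplitArgs = false;
// Assumption: the maximum number of registers a single argument may use.
constexpr std::size_t kMaxRegsPerArg = 4;

struct ArgEntrySketch {
    // One slot per register the argument may occupy; REG_STK means "no register".
    std::array<RegNum, kMaxRegsPerArg> regNums{{REG_STK, REG_STK, REG_STK, REG_STK}};
    unsigned stackSlots  = 0;     // hypothetical: outgoing stack slots used by this arg
    bool     needsTmp    = false; // set during initialization, temp created later
    bool     isProcessed = false; // stays false until the morph step has handled the arg

    bool isPassedOnStack() const { return regNums[0] == REG_STK; }

    // Always false on targets without split args, so callers need no #ifdef
    // and the compiler can drop the dead branch.
    bool isSplit() const {
        if (!kTargetSupportsSplitArgs) {
            return false;
        }
        return regNums[0] != REG_STK && stackSlots > 0;
    }

    bool isPassedEntirelyInRegisters() const {
        return regNums[0] != REG_STK && !isSplit();
    }
};

struct CallSketch {
    std::vector<ArgEntrySketch> argTable;
    bool argInfoInitialized = false;
    int  initCount          = 0; // only here to make the idempotence observable

    // Idempotent: a second call returns immediately and leaves the table alone.
    void initArgInfo(std::size_t numArgs) {
        if (argInfoInitialized) {
            return;
        }
        argTable.assign(numArgs, ArgEntrySketch{});
        argInfoInitialized = true;
        ++initCount;
    }
};
```

Because `initArgInfo` returns early once the table exists, an earlier client such as a tail-call check can call it first and the later morph step can call it again without redoing or discarding the analysis.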
| 89 | + |
| 90 | +During the initial `fgMorph` phase, `fgMorphArgs()` does the following: |
| 91 | + |
| 92 | +- Calls `gtInitArgInfo()` to ensure that the `argTable` is set up properly. |
| 93 | + |
| 94 | +- Creates a copy of each argument as necessary. |
| 95 | + - This should only be done if one or more of the following conditions hold: |
| 96 | + - A copy is required to preserve possible ordering dependencies, in which |
| 97 | + case the `needsTmp` field of the `fgArgInfo` was set to true by |
| 98 | + `gtInitArgInfo()`. |
| 99 | + - A struct arg has been promoted, it is passed in register(s) (or split), |
| 100 | + and has not yet been marked `lvDoNotEnregister`. |
| 101 | + |
| 102 | +- Sets up the actual argument for any non-standard args. |
| 103 | + |
| 104 | +- Transforms struct arg nodes from `GT_LCL_VAR`, `GT_OBJ` or `GT_MKREFANY` into: |
| 105 | + - `GT_FIELD_LIST` (i.e. a list of fields) if the lclVar is promoted and |
| 106 | + either 1) passed on the stack, or 2) each register used to pass the struct |
| 107 | + corresponds to exactly one field of the struct. The type of the register |
| 108 | + in which a field is passed need not match the type of the field. |
| 109 | + - The case of a single `GT_FIELD_LIST` node subsumes the current |
| 110 | + `GT_LCL_FLD` representation for a matching single-field struct, |
| 111 | + and does not require a lclVar to be marked `lvDoNotEnregister`. |
| 112 | + Any register type mismatch (e.g. a float field passed in an integer |
| 113 | + register) will be handled by `Lowering` (see below). |
| 114 | + - In future, this should include *any* case of a promoted struct, and the |
| 115 | + backend (`Lowering` and/or `CodeGen`) should be enhanced to correctly |
| 116 | + perform the needed re-assembling of fields into registers. |
| 117 | + - `GT_LCL_VAR` if the argument is a non-promoted struct that is either |
| 118 | + marked `lvDoNotEnregister` or fully enregistered, such as a SIMD type lclVar |
| 119 | + or (in future) a struct that fits entirely into a register. |
| 120 | + - `GT_OBJ` otherwise. In this case, if it is a partial reference to a lclVar, it must be |
| 121 | + marked `lvDoNotEnregister`. (If it is a full reference to a lclVar, it falls into |
| 122 | + the `GT_LCL_VAR` case above.) This representation will be used even for structs |
| 123 | + that are passed as a primitive type (i.e. that currently use the `GT_LCL_FLD` |
| 124 | + representation). |
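
The choice among the three representations listed above can be read as a small decision function. The sketch below is one possible reading of those rules, not the actual morph code: `StructArgSketch`, `ArgNodeKind`, `chooseStructArgNode` and all field names are hypothetical, and requiring a full lclVar reference for the `GT_LCL_VAR` case is an assumption drawn from the parenthetical note in the `GT_OBJ` bullet.

```cpp
// Hypothetical tags for the three node kinds named in the text.
enum class ArgNodeKind { FieldList, LclVar, Obj };

// Hypothetical summary of the facts the decision depends on.
struct StructArgSketch {
    bool isPromoted;          // the lclVar has been promoted into fields
    bool passedOnStack;       // the argument is passed entirely on the stack
    bool oneFieldPerRegister; // each register used holds exactly one field
    bool doNotEnregister;     // the lclVar is marked lvDoNotEnregister
    bool fullyEnregistered;   // e.g. a SIMD-typed lclVar
    bool isFullLclVarRef;     // the node references the whole lclVar
};

ArgNodeKind chooseStructArgNode(const StructArgSketch& arg) {
    // GT_FIELD_LIST: promoted, and either on the stack or one field per register.
    if (arg.isPromoted && (arg.passedOnStack || arg.oneFieldPerRegister)) {
        return ArgNodeKind::FieldList;
    }
    // GT_LCL_VAR: non-promoted, full reference, in memory or fully enregistered.
    if (!arg.isPromoted && arg.isFullLclVarRef &&
        (arg.doNotEnregister || arg.fullyEnregistered)) {
        return ArgNodeKind::LclVar;
    }
    // GT_OBJ: everything else, including partial references to a lclVar.
    return ArgNodeKind::Obj;
}
```

For example, a promoted struct with two fields that are each passed in their own register maps to `FieldList`, while a partial reference to a non-promoted lclVar falls through to `Obj`.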
| 125 | + |
| 126 | +During `Lowering`, any mismatches between the type of an actual register argument (i.e. the |
| 127 | +`GT_OBJ` or the `GT_FIELD_LIST` element) and the type of the register will cause a |
| 128 | +`GT_BITCAST` node to be inserted. The purpose of this node is simply to instruct the |
| 129 | +register allocator to move the value between the register files, without requiring the |
| 130 | +value to necessarily be spilled to memory. |
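
The mismatch check that triggers a `GT_BITCAST` can be illustrated with a small sketch. It assumes a simplified world with four scalar types and two register files; `VarType`, `RegFile`, `regFileOf`, `needsBitcast` and `bitcastTargetType` are hypothetical names, not the JIT's own, and the same-size pairing of types is an assumption generalized from the `TYP_DOUBLE` passed as `TYP_LONG` example earlier in the document.

```cpp
// Hypothetical, simplified stand-ins for the JIT's scalar types and register files.
enum class VarType { Int, Long, Float, Double };
enum class RegFile { Integer, Float };

// The register file a value of the given type naturally lives in.
RegFile regFileOf(VarType type) {
    return (type == VarType::Float || type == VarType::Double) ? RegFile::Float
                                                               : RegFile::Integer;
}

// A bitcast is needed when the value's natural register file differs from
// the register file of the argument register chosen by the ABI.
bool needsBitcast(VarType valueType, RegFile argRegFile) {
    return regFileOf(valueType) != argRegFile;
}

// The same-sized type in the other register file, e.g. Double -> Long.
VarType bitcastTargetType(VarType valueType) {
    switch (valueType) {
        case VarType::Int:    return VarType::Float;
        case VarType::Long:   return VarType::Double;
        case VarType::Float:  return VarType::Int;
        case VarType::Double: return VarType::Long;
    }
    return valueType; // unreachable for valid inputs
}
```

With these definitions, a `Double` field destined for an integer argument register reports `needsBitcast(...) == true` and is retyped as `Long`, so the value can move between register files without a round trip through memory.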
| 131 | + |
| 132 | +Future |
| 133 | +------ |
| 134 | +There are additional improvements for struct parameters for future consideration: |
| 135 | + |
| 136 | +- Support passing promoted structs in registers (as suggested above), where `Lowering` |
| 137 | + would insert the necessary IR to assemble the fields into registers. |
| 138 | +- Instead of generating `GT_FIELD_LIST`, we should consider modeling the passing of a |
| 139 | + promoted struct as separate arguments. This would probably be best implemented by |
| 140 | + modifying the `argTable` during `fgMorphArgs()` such that it reflects the "as-if" |
| 141 | + signature with the exploded struct fields. |
| 142 | + - How this would impact the handling of fields that must be packed into a single |
| 143 | + register remains to be determined (i.e. does `fgMorphArgs()` generate the IR |
| 144 | + to assemble the fields into a single register-sized value, or is that somehow |
| 145 | + deferred?) |
| 146 | +- Support vector calling conventions. This should be somewhat simplified by the |
| 147 | + extraction of the ABI code. |