
Write a pass for Data layout scalarization #107920

Closed
@farzonl


The scalarizer pass does not handle scalarization of data structures. Furthermore, there is no existing LLVM pass that does this.

This godbolt link has seven scenarios covering arrays of vectors and standalone vectors: https://hlsl.godbolt.org/z/9c35aa9zj

uint3 bArr[3];
export uint3 fn0(int index) {
    return bArr[index];
}
export uint3 fn1(int index) {
    uint3 aArr[3];
    for(int i = 0; i < 3; i++)
        aArr[i] = uint3(i,i,i);
    return aArr[index];
}
groupshared uint4 cArr[3];
export uint4 fn2(int index) {
    for(int i = 0; i < 3; i++)
        cArr[i] = uint4(i,i,i,i);
    return cArr[index];
}
groupshared uint4 cVec;
export uint fn3(int i, int index) {
    cVec = uint4(i,i,i,i);
    return cVec[index];
}
static uint4 dArr[3];
export uint4 fn4(int index) {
    for(int i = 0; i < 3; i++)
        dArr[i] = uint4(i,i,i,i);
    return dArr[index];
}
export uint3 fn5(int index) {
    static uint3 eArr[3];
    for(int i = 0; i < 3; i++)
        eArr[i] = uint3(i,i,i);
    return eArr[index];
}

and

static uint4 fVec;
export uint fn6(int i, int index) {
    fVec = uint4(i,i,i,i);
    return fVec[index];
}

The idea behind this is to see the data transformation requirements for vectors defined on the
stack vs those defined globally vs those defined with groupshared or static.

In Clang, the three global array-of-vectors scenarios (and the function-scope static) look roughly the same,
with a few attribute differences.

cArr = local_unnamed_addr addrspace(3) global [3 x <4 x i32>] zeroinitializer, align 16
bArr = local_unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16
dArr = internal unnamed_addr global [3 x <4 x i32>] zeroinitializer, align 16
@"?eArr@?1??fn5@@YAT?$__vector@I$02@__clang@@H@Z@4PAT23@A" = internal unnamed_addr global [3 x <3 x i32>] zeroinitializer, align 16, !dbg !26

DXC, however, converts bArr (the global, non-groupshared case) into a cbuffer.
The groupshared global cArr, by contrast, is represented in DXC as a flattened 12-element array:

@"\01?cArr@@3PAV?$vector@I$03@@A.v.1dim" = addrspace(3) global [12 x i32] undef, align 4
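
The index arithmetic implied by this flattening can be sketched as below. This is a minimal illustration, not DXC's implementation: the helper names `flattenIndex` and `flattenedSize` are hypothetical, and the row-major ordering (all lanes of element 0 first, then element 1, and so on) is an assumption that the IR above does not confirm.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of scalars in the flattened 1-D array
// that replaces [arrayLen x <vecWidth x T>].
constexpr std::size_t flattenedSize(std::size_t arrayLen, std::size_t vecWidth) {
    return arrayLen * vecWidth;
}

// Hypothetical helper: maps (array element, vector lane) to an index in
// the flattened array. Row-major order is assumed here.
constexpr std::size_t flattenIndex(std::size_t elem, std::size_t lane,
                                   std::size_t vecWidth) {
    return elem * vecWidth + lane;
}
```

Under that assumption, `cArr[index]` becomes four scalar loads at `flattenIndex(index, 0..3, 4)`, and `[3 x <4 x i32>]` yields `flattenedSize(3, 4)` scalars, matching the `[12 x i32]` above.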

For dArr, the static case, the array of vec4s is scalarized into four 3-element arrays, one per vector component:

@dArr.0 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.1 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.2 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@dArr.3 = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
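
The per-component split can be modeled as a small struct-of-arrays. This is a sketch for illustration only: `SplitArr`, `store`, and `load` are hypothetical names, and the code models the layout rather than any DXC or LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLen = 3;    // array length, as in dArr[3]
constexpr std::size_t kWidth = 4;  // vector width, as in uint4

using Vec4 = std::array<std::uint32_t, kWidth>;

// Hypothetical model: one scalar array per vector lane, mirroring
// dArr.0 .. dArr.3. Zero-initialized like the globals above.
struct SplitArr {
    std::array<std::array<std::uint32_t, kLen>, kWidth> lanes{};

    // A vector store at index i becomes one scalar store per lane.
    void store(std::size_t i, const Vec4 &v) {
        for (std::size_t lane = 0; lane < kWidth; ++lane)
            lanes[lane][i] = v[lane];
    }

    // A vector load at index i gathers one scalar from each lane array.
    Vec4 load(std::size_t i) const {
        Vec4 v{};
        for (std::size_t lane = 0; lane < kWidth; ++lane)
            v[lane] = lanes[lane][i];
        return v;
    }
};
```

In this layout a dynamic index selects the same position in each of the four arrays, so `dArr[index]` turns into four independent scalar accesses.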

A static in function scope is represented similarly to a static in global scope, with only name-mangling differences:

@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4
@"\01?eArr@?1??fn5@@YA?AV?$vector@I$02@@H@Z@[email protected]" = internal unnamed_addr global [3 x i32] zeroinitializer, align 4

aArr, the array defined locally in the function, is optimized away into a series of extractelement operations.

In the cVec case, DXC converts the vector into an array of 4 elements:

@"\01?cVec@@3V?$vector@I$03@@A.v" = addrspace(3) global [4 x i32] undef, align 4

The working theory is that data layout transformations are needed for data defined globally.
Furthermore, there seem to be three specific behaviors we want:

  1. static scalar layouts
  2. groupshared scalar layouts
  3. cbuffer usage for regular arrays.

As such, the proposal is:

  • Traverse global variables in the module.
  • Identify global variables of vector types.
  • Replace the global vector with a new global array of scalar values.
    • Flatten vectors into arrays
    • Flatten arrays of vectors into one-dimensional arrays
    • The replacement must cover the cases where cbuffers are needed as well as the cases where flattened arrays are
  • Update all uses of the global variable to work with the new scalar array.
  • Remove the old global variable.
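
The first steps of the list above (traverse, identify, compute the replacement type) can be sketched without any LLVM dependency. Everything here is hypothetical and simplified: `GlobalDesc`, `Replacement`, `flattenedLength`, and `planReplacements` are illustrative names, not LLVM APIs, and the sketch covers only the flattened-array case. It does not model cbuffers, the per-component split, or the rewriting of uses.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified description of a global's type: zero or more
// array dimensions wrapped around an optional vector.
// vecWidth == 0 means the global contains no vector.
struct GlobalDesc {
    std::string name;
    std::vector<std::size_t> arrayDims;
    std::size_t vecWidth;
};

// One planned rewrite: the global's name and the length of the
// one-dimensional scalar array that would replace it.
struct Replacement {
    std::string name;
    std::size_t scalarCount;
};

// Returns the length of the replacement scalar array, or nullopt when
// the global has no vector type and needs no rewrite.
std::optional<std::size_t> flattenedLength(const GlobalDesc &g) {
    if (g.vecWidth == 0)
        return std::nullopt;
    std::size_t n = g.vecWidth;
    for (std::size_t d : g.arrayDims)
        n *= d;
    return n;
}

// Traverses all globals and records a replacement for each one that
// contains a vector type.
std::vector<Replacement> planReplacements(const std::vector<GlobalDesc> &globals) {
    std::vector<Replacement> plan;
    for (const GlobalDesc &g : globals)
        if (auto n = flattenedLength(g))
            plan.push_back({g.name, *n});
    return plan;
}
```

For the examples in this issue, a description of cArr (`{3}` dimensions, width 4) plans a 12-element array and cVec (no dimensions, width 4) plans a 4-element array, which agrees with the DXC output shown earlier.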

Globals can be iterated over like so:

for (GlobalVariable &GV : M.globals()) {...}

And we will need to update uses like so:

for (auto *User : GV.users()) {...}
