Conversation

@hujiajie
Contributor

@hujiajie hujiajie commented Jun 21, 2022

PERF

The sparseToDense op takes an optional default value. Unlike scatterNd, the output cannot be initialized with fill(), since the default value is a scalar tensor (which could be the result of a previous op) rather than a scalar number. The (horrible!) workaround here is to broadcast the value with tile().

The other challenge is whether the kernel should discard the original value at an index or accumulate onto it. The magic is performed by splitting the op into two "scatter" steps: 1) replace the default value with 0, and 2) add the input sparse values on top of that. This avoids a bitmap for recording whether the output element at an index has already been updated by another invocation.
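Schematically, the passes look like this (a CPU sketch of the idea for illustration only, not the actual WebGPU kernel code; the function and argument names are made up):

function sparseToDenseSketch(
    sparseIndices: number[], sparseValues: number[], defaultValue: number,
    outputSize: number): Float32Array {
  // Pass 1: broadcast the default value across the output
  // (the stand-in for tile(), since fill() cannot take a scalar tensor).
  const dense = new Float32Array(outputSize).fill(defaultValue);
  // Pass 2: scatter 0 over the default value at every index, so the
  // accumulation below starts from a known base.
  for (const index of sparseIndices) {
    dense[index] = 0;
  }
  // Pass 3: scatter-add the sparse values; duplicate indices accumulate
  // naturally, with no bitmap needed to remember which slots were touched.
  for (let k = 0; k < sparseIndices.length; ++k) {
    dense[sparseIndices[k]] += sparseValues[k];
  }
  return dense;
}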

Closes #6525

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


@hujiajie hujiajie marked this pull request as ready for review June 21, 2022 05:26
}
`;
const atomicRMW = (ptr: string, val: string) => {
const atomicAddSnippet = `
Contributor

Maybe keep the original atomicAdd path? It's faster when the data is int32, since WGSL's built-in atomicAdd handles i32 directly and no compare-exchange loop is needed:

const atomicAddSnippet = this.type === 'int32' ?
  `atomicAdd(${ptr}, i32(${val}));` :
  `
        {
          var oldBits = 0;
          var newBits = bitcast<i32>(${val});
          loop {
            let info = atomicCompareExchangeWeak(${ptr}, oldBits, newBits);
            if (info.exchanged) {
              break;
            }
            oldBits = info.old_value;
            let oldValue =
                bitcast<${mapToWgslTypes(this.type, false)}>(oldBits);
            let newValue = oldValue + (${val});
            newBits = bitcast<i32>(newValue);
          }
        }
      `;

Contributor Author

Let me wait for more input to see if it's worth the complexity.


I also think the i32 path should be kept for performance.

let d0 = index / uniforms.updatesShape[1];
let d1 = index - d0 * uniforms.updatesShape[1];
let d0 = index / uniforms.outShape[1];
let d1 = index - d0 * uniforms.outShape[1];
Contributor

Why change it from uniforms.updatesShape[1] to uniforms.outShape[1]?

Contributor Author

This is a hack. When updates is a scalar tensor, coords can no longer be derived from its shape as the tensor is "compressed", but outShape[1] happens to provide the slice size here.

Contributor

How do you ensure that uniforms.updatesShape[1] is equal to uniforms.outShape[1]?

Contributor Author

It seems tensors are reshaped this way by design:

                 ScatterNd                              SparseToDense
indices shape    [numUpdates, sliceRank]                [numUpdates, sliceRank]
updates shape    [numUpdates, sliceSize]                [numUpdates, sliceSize] (or [ ])
result shape     [outputSize / sliceSize, sliceSize]    [outputSize / sliceSize, sliceSize]
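For illustration, a minimal sketch of how those shapes are derived (a hypothetical helper, not the actual tfjs-core code); both updatesShape[1] and the flattened outShape[1] end up equal to sliceSize:

function scatterShapes(
    numUpdates: number, sliceRank: number, outputShape: number[]) {
  // sliceSize is the number of elements in one scattered slice.
  const sliceSize = outputShape.slice(sliceRank).reduce((a, b) => a * b, 1);
  const outputSize = outputShape.reduce((a, b) => a * b, 1);
  return {
    indicesShape: [numUpdates, sliceRank],
    updatesShape: [numUpdates, sliceSize],  // or [] when updates is a scalar
    flatOutShape: [outputSize / sliceSize, sliceSize],
  };
}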

Contributor

It's weird to use uniforms.outShape[1] instead of uniforms.updatesShape[1] in fn getUpdatesCoordsFromFlatIndex. When updates is a scalar tensor, you probably need to do it like below. In your hack, when updates is a scalar tensor, getUpdatesCoordsFromFlatIndex's result is not even needed.

switch (this.updatesRank) {
  case 0:
    outCoordsString = 'flattenedIndex';
    getUpdatesCoordsFromFlatIndex = '';
    break;
  case 1:
    outCoordsString = 'flattenedIndex';
    getUpdatesCoordsFromFlatIndex = `
      fn getUpdatesCoordsFromFlatIndex(index : i32) -> i32 {
        return index;
      }`;
    break;
  case 2:
    outCoordsString = 'vec2<i32>(flattenedIndex, coords[1])';
    getUpdatesCoordsFromFlatIndex = `
      fn getUpdatesCoordsFromFlatIndex(index : i32) -> vec2<i32> {
        let d0 = index / uniforms.updatesShape[1];
        let d1 = index - d0 * uniforms.updatesShape[1];
        return vec2<i32>(d0, d1);
      }`;
    break;
  default:
    throw new Error(`Unsupported updates rank: ${this.updatesRank}`);
}

Contributor Author

flattenedIndex also depends on coords[0]. Can you clarify how to handle this?

Contributor

Please change it as we discussed offline.

}
}
`;
const atomicStoreSnippet = `atomicStore(${ptr}, bitcast<i32>(${val}));`;
Contributor

Why not just store 0 so that we don't need to upload a zero tensor as input?

Contributor Author

This fits my taste better: the kernel does something (accumulation or overwriting) according to indices and values given by the user.

@qjia7 qjia7 requested a review from gyagp June 21, 2022 07:42
program, [sparseValues, sparseIndices, defaultValue], sparseValues.dtype,
uniformData);
{
// First replace the default value with 0 at indices.
Contributor

In the API doc, it says:
If sparseIndices is a scalar, dense[i] = (i == sparseIndices ? sparseValues : defaultValue).
In that case we don't need three passes to finish sparseToDense. Instead, we can iterate over the dense output, check whether the current index equals sparseIndices, and store the right value directly. Can you add a comment that there is an optimization opportunity for this kind of situation, in case someone wants to do it in the future?

Contributor Author

I would discourage making things too fragmented, but this reminds me of a change like this:

  1. Fill the output tensor with the default value.
  2. Replace the default value with the sparse values at indices.

let d0 = index / uniforms.updatesShape[1];
let d1 = index - d0 * uniforms.updatesShape[1];
let d0 = index / uniforms.outShape[1];
let d1 = index - d0 * uniforms.outShape[1];
Contributor

Please change it as we discussed offline.

hujiajie added a commit to hujiajie/tfjs that referenced this pull request Jun 24, 2022
PERF

The sparseToDense op takes an optional default value. Unlike scatterNd,
the output cannot be initialized with fill(), since the default value is
a scalar tensor (which could be the result of a previous op) rather than
a scalar number. The (horrible!) workaround here is to broadcast the
value with tile().

The other challenge is whether the kernel should discard the original
value at an index or accumulate onto it. The magic is performed by
splitting the op into two "scatter" steps: 1) replace the default value
with 0 or add 0 to 0, and 2) add the input sparse values on top of that.
This avoids a bitmap for recording whether the output element at an
index has already been updated by another invocation.

Closes tensorflow#6552
Contributor
@qjia7 qjia7 left a comment

LGTM with one nit.

case 0:
break;
case 1:
if (true) {
Contributor

nit: Can you remove if (true) here and in the other similar places?

Contributor Author

Auto format looks bad to me without this.

@gyagp gyagp left a comment

LGTM

@gyagp gyagp merged commit bdfb9d7 into tensorflow:master Jun 24, 2022
@hujiajie hujiajie deleted the for-master branch July 27, 2022 10:54