Add IntxUnpackedTensor #2732
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2732
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f6c9d09 with merge base e6b38bb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
        block_size: the block size for quantization, representing the granularity, for example groupwise quantization will have block_size (1, group_size)
        """

    tensor_data_attrs = ["int_data", "scale", "zero_point"]
btw if you update these to tensor_data_names and tensor_attribute_names you'll be able to remove some of the implementations, see docs in https://github.com/pytorch/ao/pull/2710/files#diff-d2a11602a79e83305208472f1abe6a4106f02ce62a7f9524007181813863fcf6R687, example: #2738
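For context, a minimal sketch of what that rename looks like, assuming the TorchAOBaseTensor conventions linked above (attribute lists taken from this PR's diff; import path assumed):

```python
from torchao.utils import TorchAOBaseTensor  # assumed import path

class IntxUnpackedTensor(TorchAOBaseTensor):
    # With these two class attributes defined, TorchAOBaseTensor can supply
    # default implementations (flatten/unflatten, detach, __repr__, _to_copy, ...)
    # so the subclass no longer hand-writes them.
    tensor_data_names = ["int_data", "scale", "zero_point"]
    tensor_attribute_names = ["bit_width", "block_size"]
```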
I can still override the behavior in TorchAOBaseTensor, right?
For example, it looks like aten._to_copy.default gets auto-populated, but I want to define its dtype variant in addition to the device variant.
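For illustration, a hedged sketch of such an override, mirroring the `_to_copy` fragment later in this diff (the `implements` registration decorator and the constructor signature are assumptions based on TorchAOBaseTensor conventions, not verified against the merged code):

```python
import torch

aten = torch.ops.aten
# IntxUnpackedTensor and _FLOAT_TYPES as defined in this PR's module.
implements = IntxUnpackedTensor.implements  # assumed decorator from TorchAOBaseTensor

@implements(aten._to_copy.default)
def _(func, types, args, kwargs):
    # Handle both the device and the dtype variants of _to_copy: the int8
    # payload keeps its integer dtype; only the scale moves to the new float dtype.
    self = args[0]
    device = kwargs.pop("device", self.device)
    dtype = kwargs.pop("dtype", self.dtype)
    assert dtype in _FLOAT_TYPES  # as in the PR: external dtype must stay float
    return IntxUnpackedTensor(
        self.int_data.to(device),
        self.scale.to(device=device, dtype=dtype),
        self.zero_point.to(device),
        bit_width=self.bit_width,      # constructor signature assumed from the diff
        block_size=self.block_size,
    )
```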
This should be working; I haven't actively tested this behavior though. I'll try to add a test for it.
Changed
        )

    @classmethod
    def from_float(
nit: we are standardizing on from_hp now
What does hp stand for?
high precision
torchao/quantization/quant_api.py (Outdated)
    scale_dtype: Optional[torch.dtype] = None
    layout: Layout = QDQLayout()
    packing_format: PackingFormat = PackingFormat.UNPACKED
    VERSION: int = 1
nit: we updated the name to version
Any more concerns here @jerryzh168?
    This format is intended for torch.export use cases.

    Tensor Attributes:
        int_data: int data for quantization.
nit: use qdata to align with other tensors
            block_size=block_size,
        )

    def get_plain(self):
nit: no longer need this I think
    @classmethod
    def from_hp(
        cls,
        float_tensor: torch.Tensor,
nit: use hp_tensor to align with the method name
        cls,
        float_tensor: torch.Tensor,
        block_size: Tuple[int],
        dtype: torch.dtype,
nit: rename to target_dtype for more clarity
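Putting the three naming nits together (from_float → from_hp, float_tensor → hp_tensor, dtype → target_dtype), the signature would read roughly as below; the base class here is a stand-in and the body is elided, since it is just the PR's existing quantize path:

```python
import torch
from typing import Tuple

class IntxUnpackedTensor(torch.Tensor):  # stand-in base, for the sketch only
    @classmethod
    def from_hp(
        cls,
        hp_tensor: torch.Tensor,      # was: float_tensor ("hp" = high precision)
        block_size: Tuple[int],
        target_dtype: torch.dtype,    # was: dtype; e.g. torch.int4
    ):
        ...  # body unchanged from the PR's from_float implementation
```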
class IntxUnpackedTensor(TorchAOBaseTensor):
    """
    intx quantization with unpacked format. Subbyte quantized data is represented as int8.
nit: to make it clearer, I think we can add a bit more description here about the subbyte quantized data. We should mention that the range of the quantized values is restricted to the quant_min and quant_max of the target bit width, e.g. for uint4 the values fall into the range 0 to 15.
Done
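A small standalone sketch of the ranges described above (plain two's-complement arithmetic, not code from the PR):

```python
def intx_range(bit_width: int, signed: bool = True) -> tuple[int, int]:
    """Return (quant_min, quant_max) for a target bit width."""
    if signed:
        return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    return 0, 2**bit_width - 1

assert intx_range(4) == (-8, 7)                # int4: stored in int8, values in [-8, 7]
assert intx_range(4, signed=False) == (0, 15)  # uint4: values in [0, 15]
```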
        block_size: Optional[Tuple[int]] = None,
    ):
        # Check plain data and infer block_size from shapes
        if block_size is None:
Would it be easier to just make block_size required? When is block_size None?
Removed. I did use it in the slice implementation, but I just added logic inside slice to recompute the block size.
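For illustration, a sketch of how the block size can be recomputed from shapes after slicing, assuming one scale per block so that block_size[i] = data.shape[i] // scale.shape[i], as implied by the docstring's groupwise example (the helper name is hypothetical):

```python
def _infer_block_size(data_shape, scale_shape):
    # The block length along dim i is data_shape[i] // scale_shape[i];
    # groupwise quantization with group_size g on a 2-D weight gives (1, g).
    assert len(data_shape) == len(scale_shape)
    assert all(d % s == 0 for d, s in zip(data_shape, scale_shape))
    return tuple(d // s for d, s in zip(data_shape, scale_shape))

assert _infer_block_size((128, 256), (128, 8)) == (1, 32)  # group_size 32
```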
        self.bit_width = bit_width
        self.block_size = block_size

    def __repr__(self):
repr is also implemented by default in TorchAOBaseTensor when you define tensor_data_names and tensor_attribute_names btw
Removed
        device = kwargs.pop("device")
        dtype = kwargs.pop("dtype")
        assert dtype in _FLOAT_TYPES
        return self.__class__(
nit: self.__class__ --> IntxUnpackedTensor to reduce runtime check and align with other code
    scale = aten.slice.Tensor(self.scale, dim, start_scale, end_scale, step)
    zero_point = aten.slice.Tensor(self.zero_point, dim, start_scale, end_scale, step)

    new = self.__class__(
same here
Force-pushed from 93948a4 to 143fe91 (compare)
LG, please add a bit more detail in the PR summary to explain the context for the change, and a Test Plan as well.
* add intx unpacked tensor
* up
* up
* up
* up
* up
This adds IntxUnpackedTensor, where subbyte quantized data is represented as int8. The range of the quantized values is restricted to the quant_min and quant_max of the target_dtype; e.g., if target_dtype=torch.int4, qdata will be an int8 tensor with values in [-8, 7]. Quantization is represented in a decomposed way.
This tensor is intended for export use cases that currently use AQT with QDQLayout.
The test plan is the new unit tests.
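A hedged end-to-end sketch of the flow the summary describes (the import path and exact from_hp call are assumptions based on the review thread, not verified against the merged code):

```python
import torch
from torchao.quantization import IntxUnpackedTensor  # assumed import path

hp_tensor = torch.randn(128, 256)          # high-precision weight
qt = IntxUnpackedTensor.from_hp(
    hp_tensor,
    block_size=(1, 32),                    # groupwise, group_size = 32
    target_dtype=torch.int4,               # values restricted to [-8, 7]
)
assert qt.qdata.dtype == torch.int8        # subbyte data stored unpacked in int8
assert qt.qdata.min() >= -8 and qt.qdata.max() <= 7
```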