Description
Other forms of data, like weights and biases, are serialized separately (either in the data store or at the end of the flatbuffer segment). Scales and zero points (ZP), however, are serialized directly into the flatbuffer. This was fine when we only did per-tensor and per-channel quantization, because the number of scales was small, but with blockwise quantization the number of scales can be large. Realistically, since scales and zero points are a form of data, we should store them in the same place as the weights and biases.
Essentially, we want to move the serialization of scales and zero points from this:
scale=scale.flatten().tolist(),
to something like this:
def get_serialized_buffer_index(
We should only do this for zero points and scales that are tensors or lists. For per-tensor quantization with a single zero point and scale, serializing them separately is overkill, so we should leave those alone.
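A minimal sketch of the split described above, using only the Python standard library. Every name here (`ConstantStore`, `QuantParams`, `serialize_scales`, `load_scales`) is hypothetical and is not taken from the existing serializer or schema. It only illustrates the idea: a single scale stays inline, while a list of scales is packed into bytes, stored next to the other constant data, and referenced by index.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ConstantStore:
    """Hypothetical stand-in for wherever weight/bias bytes are stored."""
    buffers: List[bytes] = field(default_factory=list)

    def add(self, data: bytes) -> int:
        # Append the payload and return its index for later lookup.
        self.buffers.append(data)
        return len(self.buffers) - 1


@dataclass
class QuantParams:
    """Hypothetical schema entry: inline values OR a buffer index."""
    inline_scales: Optional[List[float]] = None
    scale_buffer_idx: Optional[int] = None
    num_scales: int = 0


def serialize_scales(scales: Sequence[float], store: ConstantStore) -> QuantParams:
    scales = list(scales)
    if len(scales) == 1:
        # Per-tensor case: one value is cheap, so keep it inline.
        return QuantParams(inline_scales=scales, num_scales=1)
    # Multi-value case (e.g. blockwise): pack as little-endian float32
    # and store the bytes out of line, keeping only an index.
    payload = struct.pack(f"<{len(scales)}f", *scales)
    return QuantParams(scale_buffer_idx=store.add(payload), num_scales=len(scales))


def load_scales(params: QuantParams, store: ConstantStore) -> List[float]:
    # Read back from whichever location the scales were written to.
    if params.inline_scales is not None:
        return list(params.inline_scales)
    raw = store.buffers[params.scale_buffer_idx]
    return list(struct.unpack(f"<{params.num_scales}f", raw))
```

With this layout, the size of the inline metadata no longer grows with the number of blocks; only the out-of-line buffer does. Zero points could follow the same pattern with an integer format code instead of `f`.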