Don't Serialize Scales/ZP in Flatbuffer

Other forms of data, like weights and biases, are serialized separately (either in the data store or at the end of the flatbuffer segment). Scales and ZP, however, are serialized straight into the Flatbuffer. This was OK when we were only doing per-tensor and per-channel quantization, because the number of scales was small, but with blockwise quantization the number of scales can be large. Realistically, since this is a form of data, we should put it in the same place the weights and biases are stored.

Essentially, we want to move serialization of scales/ZP from this:
https://github.com/pytorch/executorch/blob/0c6a71b5d5ee37de8fa602d643fac4a1b1df2204/backends/xnnpack/operators/node_visitor.py#L278

to something like this:
https://github.com/pytorch/executorch/blob/0c6a71b5d5ee37de8fa602d643fac4a1b1df2204/backends/xnnpack/operators/node_visitor.py#L496

We should only do this for zero points/scales that are tensors or lists. For per_tensor quantization with a single zp/scale, serializing them separately is overkill, so we should leave those alone.
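As a rough illustration of that split, here is a minimal standalone sketch. It does not use the actual XNNPACK backend code: `ConstantStore`, `serialize_scales`, and the returned dict layout are hypothetical names made up for this example, and the 16-byte alignment is an assumption. The only point is the branching: a single scalar stays inline, while a list or tensor of scales is packed into the same kind of out-of-line buffer that holds weights and biases, and only an offset/size reference is kept.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple, Union


@dataclass
class ConstantStore:
    """Hypothetical stand-in for wherever weight/bias bytes are kept."""

    blob: bytearray = field(default_factory=bytearray)
    alignment: int = 16  # assumed alignment, not taken from the real backend

    def append(self, payload: bytes) -> Tuple[int, int]:
        # Pad so that every payload starts on an aligned offset.
        pad = (-len(self.blob)) % self.alignment
        self.blob.extend(b"\x00" * pad)
        offset = len(self.blob)
        self.blob.extend(payload)
        return offset, len(payload)


def serialize_scales(
    scales: Union[float, Sequence[float]], store: ConstantStore
) -> Dict[str, Union[int, float]]:
    """Return what would be recorded in the (hypothetical) flatbuffer entry."""
    if isinstance(scales, (int, float)):
        # Per-tensor case: one scalar, cheap enough to keep inline.
        return {"inline_scale": float(scales)}

    # Per-channel / blockwise case: pack as little-endian float32 and
    # store out of line, keeping only a reference to the bytes.
    values = [float(s) for s in scales]
    payload = struct.pack(f"<{len(values)}f", *values)
    offset, size = store.append(payload)
    return {"offset": offset, "size": size, "count": len(values)}


store = ConstantStore()
per_tensor = serialize_scales(0.5, store)             # stays inline
blockwise = serialize_scales([0.1, 0.2, 0.3], store)  # goes to the store
```

Zero points would follow the same pattern with an integer format code instead of `f`.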

cc @digantdesai @cbilgin


Don't Serialize Scales/ZP in Flatbuffer #9029
