Don't Serialize Scales/ZP in Flatbuffer #9029

@mcr229

Description

Other forms of data, such as weights and biases, are serialized separately (either in a data store or at the end of the flatbuffer segment). Scales and zero points (ZP), however, are serialized straight into the Flatbuffer. This was fine when we were only doing per-tensor and per-channel quantization, because the number of scales was small, but with blockwise quantization the number of scales can be large. Realistically, since this is a form of data, we should store it in the same place as the weights and biases.
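To make the size difference concrete, here is a rough sketch of how the scale count grows under each scheme. The layer shape, the block size of 32, and the helper name `num_scales` are illustrative assumptions, not values taken from the XNNPACK backend.

```python
def num_scales(out_channels: int, in_channels: int, scheme: str, block_size: int = 32) -> int:
    """Count the scales needed for one (out_channels x in_channels) weight.

    Hypothetical helper for illustration only; block_size=32 is an assumed value.
    """
    if scheme == "per_tensor":
        # One scale for the whole tensor.
        return 1
    if scheme == "per_channel":
        # One scale per output channel.
        return out_channels
    if scheme == "blockwise":
        # One scale per block of input channels, for every output channel.
        blocks_per_row = -(-in_channels // block_size)  # ceiling division
        return out_channels * blocks_per_row
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("per_tensor", "per_channel", "blockwise"):
        print(scheme, num_scales(4096, 4096, scheme))
```

For an assumed 4096 x 4096 linear layer, blockwise quantization needs two orders of magnitude more scales than per-channel, which is why inlining them as a list in the Flatbuffer stops being cheap.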

Essentially, we want to move the serialization of scales/ZP from this:

scale=scale.flatten().tolist(),

to something like this:

def get_serialized_buffer_index(

We should only do this for zero points/scales that are tensors or lists. For per_tensor quantization with a single zp/scale, serializing the scales/zp separately is overkill, so we should leave those alone.
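A minimal sketch of the intended split, assuming a simple side store of byte buffers. Every name here (`QuantParams`, `BufferStore`, `serialize_scale`) is hypothetical and does not reflect the actual ExecuTorch serializer or its schema; it only illustrates keeping a single per-tensor scale inline while moving list-valued scales into a separate buffer referenced by index.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Union


@dataclass
class QuantParams:
    # Hypothetical record standing in for the quantization params in the schema.
    scale: Optional[float] = None            # inline value (per-tensor case)
    scale_buffer_idx: Optional[int] = None   # index into the side buffer store


@dataclass
class BufferStore:
    # Hypothetical stand-in for wherever weights/biases are stored.
    buffers: List[bytes] = field(default_factory=list)

    def add(self, values: Sequence[float]) -> int:
        # Pack as little-endian float32 and return the new buffer's index.
        self.buffers.append(struct.pack(f"<{len(values)}f", *values))
        return len(self.buffers) - 1

    def load(self, idx: int) -> List[float]:
        raw = self.buffers[idx]
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def serialize_scale(scale: Union[float, Sequence[float]], store: BufferStore) -> QuantParams:
    if isinstance(scale, (int, float)):
        # Single per-tensor scale: a separate buffer would be overkill.
        return QuantParams(scale=float(scale))
    # Per-channel or blockwise scales: store as data, keep only an index.
    return QuantParams(scale_buffer_idx=store.add([float(s) for s in scale]))


if __name__ == "__main__":
    store = BufferStore()
    print(serialize_scale(0.5, store))
    print(serialize_scale([0.5, 0.25, 0.125], store))
```

Zero points could follow the same pattern with an integer pack format; that is left out here to keep the sketch short.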

cc @digantdesai @cbilgin

Metadata

Labels

good first issue: Good for newcomers
module: xnnpack: Issues related to xnnpack delegation and the code under backends/xnnpack/
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

Status

Done
