From 36f4a845a4f2f881b6a8218bcc5b724ef7ee6433 Mon Sep 17 00:00:00 2001 From: yiliu30 Date: Wed, 3 Jul 2024 16:21:30 +0800 Subject: [PATCH] remove 1x docs Signed-off-by: yiliu30 --- docs/source/dataset.md | 165 ---------------------------------- docs/source/distillation.md | 129 -------------------------- docs/source/pythonic_style.md | 146 ------------------------------ 3 files changed, 440 deletions(-) delete mode 100644 docs/source/dataset.md delete mode 100644 docs/source/distillation.md delete mode 100644 docs/source/pythonic_style.md diff --git a/docs/source/dataset.md b/docs/source/dataset.md deleted file mode 100644 index 0695d78a3ac..00000000000 --- a/docs/source/dataset.md +++ /dev/null @@ -1,165 +0,0 @@ -Dataset -======= - -1. [Introduction](#introduction) - -2. [Supported Framework Dataset Matrix](#supported-framework-dataset-matrix) - -3. [Get start with Dataset API](#get-start-with-dataset-api) - -4. [Examples](#examples) - -## Introduction - -To adapt to its internal dataloader API, Intel® Neural Compressor implements some built-in datasets. - -A dataset is a container which holds all data that can be used by the dataloader, and have the ability to be fetched by index or created as an iterator. One can implement a specific dataset by inheriting from the Dataset class by implementing `__iter__` method or `__getitem__` method, while implementing `__getitem__` method, `__len__` method is recommended. - -Users can use Neural Compressor built-in dataset objects as well as register their own datasets. - -## Supported Framework Dataset Matrix - -#### TensorFlow - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train**(bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageRecord(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/validation-000-of-100
root/validation-001-of-100
...
root/validation-099-of-100
The file name needs to follow this pattern: '* - * -of- *' | **In yaml file:**
dataset:
   ImageRecord:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageRecord'] (root=root, transform=transform, filter=None)
| -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file that records image names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is None, it reads data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORecord(root, num_cores, transform, filter) | **root** (str): Root directory of dataset
**num_cores** (int, default=28): the number of input Datasets to interleave from in parallel
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Root is the full path to the tfrecord file, including the file name.
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORecord:
     root: /path/to/tfrecord
     num_cores: 28
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORecord'] (root, num_cores=28, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset uses the default label map |
**npy_dir** (str, default='val2017'): npy file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory | Please arrange data in this way:
/root/npy_dir/1.jpg.npy
/root/npy_dir/2.jpg.npy
...
/root/npy_dir/n.jpg.npy
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCONpy:
     root: /path/to/root
     npy_dir: /path/to/npy
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCONpy'] (root, npy_dir, anno_dir)
If anno_dir is not set, the dataset uses the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the whole dataset; the first dimension is the sample count. Multiple tensors can be created by passing a list of tuples, producing one tensor per tuple.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): sample shape of a single input, e.g. an image would be (224, 224, 3); a tuple of multiple lists creates multiple input tensors.
**label_shape** (list or tuple): sample shape of a single label, e.g. a scalar label would be (1,); a tuple of multiple lists creates multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, low, high, dtype, transform=None, filter=None) | -| style_transfer(content_folder, style_folder, crop_ratio, resize_shape, image_format, transform, filter) | **content_folder** (str):Root directory of content images
**style_folder** (str):Root directory of style images
**crop_ratio** (float, default=0.1):cropped ratio to each side
**resize_shape** (tuple, default=(256, 256)):target size of image
**image_format** (str, default='jpg'): target image format
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Dataset used for style transfer task. This Dataset is to construct a dataset from two specific image holders representing content image folder and style image folder. | **In yaml file:**
dataset:
   style_transfer:
     content_folder: /path/to/content_folder
     style_folder: /path/to/style_folder
     crop_ratio: 0.1
     resize_shape: [256, 256]
     image_format: 'jpg'
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['style_transfer'] (content_folder, style_folder, crop_ratio, resize_shape, image_format, transform=transform, filter=None) | -| TFRecordDataset(root, transform, filter) | **root** (str): filename of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Root is the full path to the tfrecord file, including the file name. | **In yaml file:**
dataset:
   TFRecordDataset:
     root: /path/to/tfrecord
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['TFRecordDataset'] (root, transform=transform) | -| bert(root, label_file, task, transform, filter) | **root** (str): path of dataset
**label_file** (str): path of label file
**task** (str, default='squad'): task type of model
**model_type** (str, default='bert'): model type, support 'bert'.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset supports tfrecord data, please refer to [Guide](../examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md) to create tfrecord file first. | **In yaml file:**
dataset:
   bert:
     root: /path/to/root
     label_file: /path/to/label_file
     task: squad
     model_type: bert
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (root, label_file, transform=transform) | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): dense sample shape of a single input, e.g. an image would be (224, 224, 3); a tuple of multiple lists creates multiple sparse tensors.
**label_shape** (list or tuple): sample shape of a single label, e.g. a scalar label would be (1,); a tuple of multiple lists creates multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -#### PyTorch - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train**(bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are configured outside the dataset section)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root, transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file that records image names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is None, it reads data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset uses the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of the whole dataset; the first dimension is the sample count. Multiple tensors can be created by passing a list of tuples, producing one tensor per tuple.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): sample shape of a single input, e.g. an image would be (224, 224, 3); a tuple of multiple lists creates multiple input tensors.
**label_shape** (list or tuple): sample shape of a single label, e.g. a scalar label would be (1,); a tuple of multiple lists creates multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, low, high, dtype, transform=None, filter=None) | -| bert(dataset, task, model_type, transform, filter) | **dataset** (list): list of data
**task** (str): the task of the model, support "classifier", "squad"
**model_type** (str, default='bert'): model type, support 'distilbert', 'bert', 'xlnet', 'xlm'
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a Bert TensorDataset and has no yaml implementation. The original repo is: https://github.com/huggingface/transformers. Create this dataset before initializing your DataLoader. | **In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (dataset, task, model_type, transform=transform, filter=None)
Yaml configuration is not supported yet | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): dense sample shape of a single input, e.g. an image would be (224, 224, 3); a tuple of multiple lists creates multiple sparse tensors.
**label_shape** (list or tuple): sample shape of a single label, e.g. a scalar label would be (1,); a tuple of multiple lists creates multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the sparsity ratio, in the range [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low] (or [low, 0] if low < 0); if a float, the same value is applied to all tensors.
**high** (list or float, default=127.): offset added to every tensor element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): dtype of the generated tensors. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64', 'bool'
**transform** (transform object, default=None): the dummy dataset needs no transform; any transform passed is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset constructs samples of a specific shape; values are generated as low * standard_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -#### MXNet - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
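To make the `sparse_dummy_v2` row above more concrete, the sketch below generates one sample of a given `dense_shape` with roughly a `sparse_ratio` fraction of its elements zeroed, using the documented value formula `low * stand_normal(0, 1) + high`. This is an illustrative stand-in written in plain Python, not the Neural Compressor implementation.

```python
import random


def make_sparse_dummy_sample(dense_shape, sparse_ratio=0.5, low=-128.0, high=127.0, seed=0):
    """Illustrative stand-in for one sparse_dummy_v2-style sample (not library code).

    Returns a flat list of prod(dense_shape) values drawn from
    low * N(0, 1) + high, with ~sparse_ratio of the positions zeroed out.
    """
    rng = random.Random(seed)
    size = 1
    for dim in dense_shape:
        size *= dim
    values = [low * rng.gauss(0, 1) + high for _ in range(size)]
    # Zero out a sparse_ratio fraction of positions to mimic sparsity.
    for i in rng.sample(range(size), int(size * sparse_ratio)):
        values[i] = 0.0
    return values


sample = make_sparse_dummy_sample((4, 4), sparse_ratio=0.5)
```

In the real dataset the result would be returned as a sparse tensor of the framework in use; the flat list here only illustrates the documented value range and sparsity.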
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train**(bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is set to None, it reads from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1**| **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of all samples; the first dimension should be the sample count of the dataset. Creating tensors of multiple shapes is supported: pass a list of tuples, and a tensor of each given size will be created.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates single or multiple input tensors; a list represents the sample shape of the dataset, e.g. an image should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors; a list represents the sample shape of the label, e.g. a label should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, low, high, dtype, transform=None, filter=None) | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates single or multiple sparse tensors; a tuple represents the sample shape of the dataset, e.g. an image should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors; a list represents the sample shape of the label, e.g. a label should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the ratio of sparsity; supported range is [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -#### ONNXRT - -| Dataset | Parameters | Comments | Usage | -| :------ | :------ | :------ | :------ | -| MNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
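Several of the raw datasets above (e.g. `ImagenetRaw`) pair an image directory with an `image_list` file that records image names and their labels. As a minimal sketch of how such a `val_map.txt`-style file could be parsed, assuming one `name label` pair per line (an illustration, not the library's actual parser):

```python
def parse_image_list(lines):
    """Parse "name label" pairs from an image_list-style file.

    Each non-empty line is expected to hold an image file name followed
    by an integer class label, e.g. "img1.jpg 7" (assumed format).
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rsplit keeps any spaces inside the file name intact.
        name, label = line.rsplit(" ", 1)
        pairs.append((name, int(label)))
    return pairs


pairs = parse_image_list(["img1.jpg 7", "img2.jpg 0", ""])
```

With `image_list=None` the dataset would apply the same kind of parsing to `data_path/val_map.txt`, as noted in the table.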
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/MNIST/, otherwise user should put mnist.npz under root/MNIST/ manually. | **In yaml file:**
dataset:
   MNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['MNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| FashionMNIST(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train**(bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/FashionMNIST/, otherwise user should put train-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and t10k-images-idx3-ubyte.gz under root/FashionMNIST/ manually.| **In yaml file:**
dataset:
   FashionMNIST:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['FashionMNIST'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR10(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR10:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR10'] (root=root, train=False, transform=transform, filter=None, download=True) | -| CIFAR100(root, train, transform, filter, download) | **root** (str): Root directory of dataset
**train** (bool, default=False): If True, creates dataset from train subset, otherwise from validation subset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions
**download** (bool, default=True): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again. | If download is True, it will download dataset to root/ and extract it automatically, otherwise user can download file from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz manually to root/ and extract it. | **In yaml file:**
dataset:
   CIFAR100:
     root: /path/to/root
     train: False
     download: True
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['CIFAR100'] (root=root, train=False, transform=transform, filter=None, download=True) | -| ImageFolder(root, transform, filter) | **root** (str): Root directory of dataset
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
root/class_1/xxx.png
root/class_1/xxy.png
root/class_1/xxz.png
...
root/class_n/123.png
root/class_n/nsdf3.png
root/class_n/asd932_.png
Please put images of different categories into different folders. | **In yaml file:**
dataset:
   ImageFolder:
     root: /path/to/root
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImageFolder'] (root=root,transform=transform, filter=None) | -| ImagenetRaw(data_path, image_list, transform, filter) | **data_path** (str): Root directory of dataset
**image_list** (str): data file, record image_names and their labels
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
data_path/img1.jpg
data_path/img2.jpg
...
data_path/imgx.jpg
The dataset reads the name and label of each image from the image_list file; if image_list is set to None, it reads from data_path/val_map.txt automatically. | **In yaml file:**
dataset:
   ImagenetRaw:
     data_path: /path/to/image
     image_list: /path/to/label
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['ImagenetRaw'] (data_path, image_list, transform=transform, filter=None) | -| COCORaw(root, img_dir, anno_dir, transform, filter) | **root** (str): Root directory of dataset
**img_dir** (str, default='val2017'): image file directory
**anno_dir** (str, default='annotations/instances_val2017.json'): annotation file directory
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Please arrange data in this way:
/root/img_dir/1.jpg
/root/img_dir/2.jpg
...
/root/img_dir/n.jpg
/root/anno_dir
**Please use Resize transform when batch_size > 1** | **In yaml file:**
dataset:
   COCORaw:
     root: /path/to/root
     img_dir: /path/to/image
     anno_dir: /path/to/annotation
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['COCORaw'] (root, img_dir, anno_dir, transform=transform, filter=None)
If anno_dir is not set, the dataset will use the default label map | -| dummy(shape, low, high, dtype, label, transform, filter) | **shape** (list or tuple): shape of all samples; the first dimension should be the sample count of the dataset. Creating tensors of multiple shapes is supported: pass a list of tuples, and a tensor of each given size will be created.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**label** (bool, default=True): whether to return 0 as the label
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy:
     shape: [3, 224, 224, 3]
     low: 0.0
     high: 127.0
     dtype: float32
     label: True
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy'] (shape, low, high, dtype, label, transform=None, filter=None) | -| dummy_v2(input_shape, label_shape, low, high, dtype, transform, filter) | **input_shape** (list or tuple): creates single or multiple input tensors; a list represents the sample shape of the dataset, e.g. an image should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors; a list represents the sample shape of the label, e.g. a label should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   dummy_v2:
     input_shape: [224, 224, 3]
     label_shape: [1]
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['dummy_v2'] (input_shape, low, high, dtype, transform=None, filter=None) | -| GLUE(data_dir, model_name_or_path, max_seq_length, do_lower_case, task, model_type, dynamic_length, evaluate, transform, filter) | **data_dir** (str): The input data dir
**model_name_or_path** (str): Path to pre-trained student model or shortcut name,
**max_seq_length** (int, default=128): The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
**do_lower_case** (bool, default=True): Whether or not to lowercase the input.
**task** (str): The name of the task to fine-tune. Choices include mrpc, qqp, qnli, rte, sts-b, cola, mnli, wnli.
**model_type** (str, default='bert'): model type, support 'distilbert', 'bert', 'mobilebert', 'roberta'.
**dynamic_length** (bool, default=False): Whether to use fixed sequence length.
**evaluate** (bool, default=True): Whether do evaluation or training.
**transform** (transform object, default=None): transform to process input data
**filter** (Filter objects, default=None): filter out examples according to specific conditions | Refer to [this example](/examples/onnxrt/language_translation/bert) on how to prepare the dataset | **In yaml file:**
dataset:
   bert:
   data_dir: /path/to/data
   model_name_or_path: bert-base-uncased
(transform and filter are not set in the range of dataset)
**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['bert'] (data_dir='/path/to/data/', model_name_or_path='bert-base-uncased', max_seq_length=128, task='mrpc', model_type='bert', dynamic_length=True, transform=None, filter=None) | -| sparse_dummy_v2(dense_shape, label_shape, sparse_ratio, low, high, dtype, transform, filter) | **dense_shape** (list or tuple): creates single or multiple sparse tensors; a tuple represents the sample shape of the dataset, e.g. an image should be represented as (224, 224, 3); a tuple containing multiple lists represents multiple input tensors.
**label_shape** (list or tuple): creates single or multiple label tensors; a list represents the sample shape of the label, e.g. a label should be represented as (1,); a tuple containing multiple lists represents multiple label tensors. In yaml usage, (1,) is the default value.
**sparse_ratio** (float, default=0.5): the ratio of sparsity; supported range is [0, 1].
**low** (list or float, default=-128.): scales the tensor value range from [0, 1] to [0, low], or [low, 0] if low < 0; if a float, the same low value applies to all tensors.
**high** (list or float, default=127.): shifts tensor values by adding high to every element; if a list, its length should match the shape list.
**dtype** (list or str, default='float32'): supports setting multiple tensor dtypes. If a list, its length should match the shape list; if a str, all tensors use the same dtype. Supported dtypes are 'float32', 'float16', 'uint8', 'int8', 'int32', 'int64' and 'bool'.
**transform** (transform object, default=None): the dummy dataset does not need a transform; if transform is not None, it is ignored.
**filter** (Filter objects, default=None): filter out examples according to specific conditions | This dataset is constructed from a specific shape; values are calculated from: low * stand_normal(0, 1) + high. | **In yaml file:**
dataset:
   sparse_dummy_v2:
     dense_shape: [224, 224, 3]
     label_shape: [1]
     sparse_ratio: 0.5
     low: 0.0
     high: 127.0
     dtype: float32

**In user code:**
from neural_compressor.data import Datasets
datasets = Datasets(framework)
dataset = datasets['sparse_dummy_v2'] (dense_shape, label_shape, sparse_ratio, low, high, dtype, transform=None, filter=None) | - -## Get start with Dataset API - -### Config dataloader in a yaml file - -```yaml -quantization: - approach: post_training_static_quant - calibration: - dataloader: - dataset: - COCORaw: - root: /path/to/calibration/dataset - filter: - LabelBalance: - size: 1 - transform: - Resize: - size: 300 - -evaluation: - accuracy: - metric: - ... - dataloader: - batch_size: 16 - dataset: - COCORaw: - root: /path/to/evaluation/dataset - transform: - Resize: - size: 300 - performance: - dataloader: - batch_size: 16 - dataset: - dummy_v2: - input_shape: [224, 224, 3] -``` - -## User-specific dataset - -Users can register their own datasets as follows: - -```python -class Dataset(object): - def __init__(self, args): - # init code here - - def __getitem__(self, idx): - # use idx to get data and label - return data, label - - def __len__(self): - return len - -``` - -After defining the dataset class, pass it to the quantizer: - -```python -from neural_compressor.experimental import Quantization, common - -quantizer = Quantization(yaml_file) -quantizer.calib_dataloader = common.DataLoader( - dataset -) # user can pass more optional args to dataloader such as batch_size and collate_fn -quantizer.model = graph -quantizer.eval_func = eval_func -q_model = quantizer.fit() -``` - -## Examples - -- Refer to this [example](https://github.com/intel/neural-compressor/tree/v1.14.2/examples/onnxrt/object_detection/onnx_model_zoo/DUC/quantization/ptq) to learn how to define a customised dataset. - -- Refer to this [HelloWorld example](/examples/helloworld/tf_example6) to learn how to configure a built-in dataset. diff --git a/docs/source/distillation.md b/docs/source/distillation.md deleted file mode 100644 index 7e2d6b063ff..00000000000 --- a/docs/source/distillation.md +++ /dev/null @@ -1,129 +0,0 @@ -Distillation -============ - -1. 
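The `ImageFolder` entries above expect images arranged as `root/class_x/*.png`, one folder per category. As an illustration of how such a layout maps to (path, label) pairs, the sketch below walks the tree with the standard library, assigning class indices from sorted folder names (an assumed convention for this sketch, not necessarily the library's):

```python
import os


def index_image_folder(root):
    """Build (path, class_index) pairs from an ImageFolder-style layout:
    root/class_a/xxx.png, root/class_b/yyy.png, ...

    Class indices follow sorted folder names (an assumed convention).
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples
```

A real dataset would additionally load and decode each image in `__getitem__`; only the directory-to-label mapping is shown here.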
[Introduction](#introduction) - - 1.1. [Knowledge Distillation](#knowledge-distillation) - - 1.2. [Intermediate Layer Knowledge Distillation](#intermediate-layer-knowledge-distillation) - - 1.3. [Self Distillation](#self-distillation) - -2. [Distillation Support Matrix](#distillation-support-matrix) -3. [Get Started with Distillation API ](#get-started-with-distillation-api) -4. [Examples](#examples) - -## Introduction - -Distillation is one of popular approaches of network compression, which transfers knowledge from a large model to a smaller one without loss of validity. As smaller models are less expensive to evaluate, they can be deployed on less powerful hardware (such as a mobile device). Graph shown below is the workflow of the distillation, the teacher model will take the same input that feed into the student model to produce the output that contains knowledge of the teacher model to instruct the student model. -
- -Architecture - -Intel® Neural Compressor supports Knowledge Distillation, Intermediate Layer Knowledge Distillation and Self Distillation algorithms. - -### Knowledge Distillation -Knowledge distillation is proposed in [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531). It leverages the logits (the input of softmax in the classification tasks) of teacher and student model to minimize the the difference between their predicted class distributions, this can be done by minimizing the below loss function. - -$$L_{KD} = D(z_t, z_s)$$ - -Where $D$ is a distance measurement, e.g. Euclidean distance and Kullback–Leibler divergence, $z_t$ and $z_s$ are the logits of teacher and student model, or predicted distributions from softmax of the logits in case the distance is measured in terms of distribution. - -### Intermediate Layer Knowledge Distillation - -There are more information contained in the teacher model beside its logits, for example, the output features of the teacher model's intermediate layers often been used to guide the student model, as in [Patient Knowledge Distillation for BERT Model Compression](https://arxiv.org/pdf/1908.09355) and [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984). The general loss function for this approach can be summarized as follow. - -$$L_{KD} = \sum\limits_i D(T_t^{n_i}(F_t^{n_i}), T_s^{m_i}(F_s^{m_i}))$$ - -Where $D$ is a distance measurement as before, $F_t^{n_i}$ the output feature of the $n_i$'s layer of the teacher model, $F_s^{m_i}$ the output feature of the $m_i$'s layer of the student model. Since the dimensions of $F_t^{n_i}$ and $F_s^{m_i}$ are usually different, the transformations $T_t^{n_i}$ and $T_s^{m_i}$ are needed to match dimensions of the two features. Specifically, the transformation can take the forms like identity, linear transformation, 1X1 convolution etc. 
### Self Distillation

Self-distillation is a one-stage training method where the teacher and student models are trained together. It attaches several attention modules and shallow classifiers at different depths of the neural network and distills knowledge from the deepest classifier to the shallower classifiers. Different from conventional knowledge distillation methods, where the knowledge of the teacher model is transferred to another student model, self-distillation can be considered as knowledge transfer within the same model, from the deeper layers to the shallower layers.
The additional classifiers in self-distillation allow the neural network to work in a dynamic manner, which leads to a much higher acceleration.
Architecture

Architecture from the paper [Self-Distillation: Towards Efficient and Compact Neural Networks](https://ieeexplore.ieee.org/document/9381661)

## Distillation Support Matrix

|Distillation Algorithm |PyTorch |TensorFlow |
|------------------------------------------------|:--------:|:---------:|
|Knowledge Distillation |✔ |✔ |
|Intermediate Layer Knowledge Distillation |✔ |Will be supported|
|Self Distillation |✔ |✖ |

## Get Started with Distillation API

Users can pass customized training/evaluation functions to `Distillation` for flexible scenarios. In this case, the distillation process is driven by pre-defined hooks in Neural Compressor. Users need to put those hooks inside the training function.

Neural Compressor defines several hooks for users to use:

```
on_train_begin() : Hook executed before training begins
on_after_compute_loss(input, student_output, student_loss) : Hook executed after each batch inference of the student model
on_epoch_end() : Hook executed at the end of each epoch
```

The following section shows how to use these hooks in a user-defined training function:

```python
def training_func_for_nc(model):
    compression_manager.on_train_begin()
    for epoch in range(epochs):
        compression_manager.on_epoch_begin(epoch)
        for i, batch in enumerate(dataloader):
            compression_manager.on_step_begin(i)
            # ... user-defined preprocessing ...
            output = model(batch)
            loss = ...  # user-defined loss computation
            loss = compression_manager.on_after_compute_loss(batch, output, loss)
            loss.backward()
            compression_manager.on_before_optimizer_step()
            optimizer.step()
            compression_manager.on_step_end()
        compression_manager.on_epoch_end()
    compression_manager.on_train_end()
    return model
```

In this case, the launcher code for Knowledge Distillation looks like the following:

```python
import torch
from torch import nn

from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig

distil_loss_conf = KnowledgeDistillationLossConfig()
conf = DistillationConfig(teacher_model=teacher_model, criterion=distil_loss_conf)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
compression_manager = prepare_compression(model, conf)
model = compression_manager.model

model = training_func_for_nc(model)
eval_func(model)
```

For Intermediate Layer Knowledge Distillation or Self Distillation, the only difference from the above launcher code is that `distil_loss_conf` should be set accordingly, as shown below. More detailed settings can be found in this [example](../../examples/pytorch/nlp/huggingface_models/text-classification/optimization_pipeline/distillation_for_quantization/fx/run_glue_no_trainer.py#L510) for Intermediate Layer Knowledge Distillation and this [example](../../examples/pytorch/image_recognition/torchvision_models/self_distillation/eager/main.py#L344) for Self Distillation.

```python
from neural_compressor.config import (
    IntermediateLayersKnowledgeDistillationLossConfig,
    SelfKnowledgeDistillationLossConfig,
)

# for Intermediate Layer Knowledge Distillation
distil_loss_conf = IntermediateLayersKnowledgeDistillationLossConfig(layer_mappings=layer_mappings)

# for Self Distillation
distil_loss_conf = SelfKnowledgeDistillationLossConfig(layer_mappings=layer_mappings)
```
## Examples

[Distillation PyTorch Examples](../../examples/README.md#distillation-1)
[Distillation TensorFlow Examples](../../examples/README.md#distillation)
[Distillation Examples Results](./validated_model_list.md#validated-knowledge-distillation-examples)
diff --git a/docs/source/pythonic_style.md b/docs/source/pythonic_style.md
deleted file mode 100644
index d036e9775d5..00000000000
--- a/docs/source/pythonic_style.md
+++ /dev/null
@@ -1,146 +0,0 @@
Pythonic Style Access for Configurations
====

1. [Introduction](#introduction)
2. [Supported Feature Matrix](#supported-feature-matrix)
3. [Get Started with Pythonic API for Configurations](#get-started-with-pythonic-api-for-configurations)

## Introduction
To meet the variety of needs arising from various circumstances, INC provides a pythonic style of access - the Pythonic API - for both user and framework configurations.

The Pythonic API for Configurations allows users to specify configurations directly in their Python code without referring to a separate YAML file. While both are supported simultaneously, the Pythonic API for Configurations has several advantages over YAML files, as the usages in the context below illustrate. Hence, we recommend users to adopt the Pythonic API for Configurations moving forward.

## Supported Feature Matrix

### Pythonic API for User Configurations
| Optimization Techniques | Pythonic API |
|-------------------------|:------------:|
| Quantization | ✔ |
| Pruning | ✔ |
| Distillation | ✔ |
| NAS | ✔ |
### Pythonic API for Framework Configurations

| Framework | Pythonic API |
|------------|:------------:|
| TensorFlow | ✔ |
| PyTorch | ✔ |
| ONNX | ✔ |
| MXNet | ✔ |

## Get Started with Pythonic API for Configurations

### Pythonic API for User Configurations
Now, let's go through the Pythonic API for Configurations in an order similar to the sections of user YAML files.

#### Quantization

To specify quantization configurations, users can use the following Pythonic API step by step.
* First, load the ***config*** module.
```python
from neural_compressor import config
```
* Next, assign values to the attributes of *config.quantization* to use specific configurations, and pass the config to the *Quantization* API.
```python
config.quantization.inputs = ["image"]  # list of str
config.quantization.outputs = ["out"]  # list of str
config.quantization.backend = "onnxrt_integerops"  # supports tensorflow, tensorflow_itex, pytorch, pytorch_ipex, pytorch_fx, onnxrt_qlinearops, onnxrt_integerops, onnxrt_qdq, onnxrt_qoperator, mxnet
config.quantization.approach = "post_training_dynamic_quant"  # supports post_training_static_quant, post_training_dynamic_quant, quant_aware_training
config.quantization.device = "cpu"  # supports cpu, gpu
config.quantization.op_type_dict = {"Conv": {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}}  # dict
config.quantization.strategy = "mse"  # supports basic, mse, bayesian, random, exhaustive
config.quantization.objective = "accuracy"  # supports performance, accuracy, modelsize, footprint
config.quantization.timeout = 100  # int, default is 0
config.quantization.accuracy_criterion.relative = 0.5  # float, default is 0.01
config.quantization.reduce_range = False  # bool; default depends on hardware: True if the CPU supports the VNNI instruction, otherwise False
config.quantization.use_bf16 = False  # bool
from neural_compressor.experimental import Quantization

quantizer = Quantization(config)
```

#### Distillation
To specify distillation configurations, users can assign values to the corresponding attributes.
```python
from neural_compressor import config

config.distillation.optimizer = {"SGD": {"learning_rate": 0.0001}}

from neural_compressor.experimental import Distillation

distiller = Distillation(config)
```
#### Pruning
To specify pruning configurations, users can assign values to the corresponding attributes.
```python
from neural_compressor import config

config.pruning.weight_compression.initial_sparsity = 0.0
config.pruning.weight_compression.target_sparsity = 0.9
config.pruning.weight_compression.max_sparsity_ratio_per_layer = 0.98
config.pruning.weight_compression.prune_type = "basic_magnitude"
config.pruning.weight_compression.start_epoch = 0
config.pruning.weight_compression.end_epoch = 3
config.pruning.weight_compression.start_step = 0
config.pruning.weight_compression.end_step = 0
config.pruning.weight_compression.update_frequency = 1.0
config.pruning.weight_compression.update_frequency_on_step = 1
config.pruning.weight_compression.prune_domain = "global"
config.pruning.weight_compression.pattern = "tile_pattern_1x1"

from neural_compressor.experimental import Pruning

prune = Pruning(config)
```
#### NAS
To specify NAS configurations, users can assign values to the corresponding attributes.

```python
from neural_compressor import config

config.nas.approach = "dynas"
from neural_compressor.experimental import NAS

nas = NAS(config)
```


#### Benchmark
To specify benchmark configurations, users can assign values to the corresponding attributes.
```python
from neural_compressor import config

config.benchmark.warmup = 10
config.benchmark.iteration = 10
config.benchmark.cores_per_instance = 10
config.benchmark.num_of_instance = 10
config.benchmark.inter_num_of_threads = 10
config.benchmark.intra_num_of_threads = 10

from neural_compressor.experimental import Benchmark

benchmark = Benchmark(config)
```
### Pythonic API for Framework Configurations
Now, let's go through the Pythonic API for Configurations to set up framework capabilities similar to those in YAML files. Users can specify a framework's (e.g., ONNX Runtime) capability by assigning values to the corresponding attributes.
```python
from neural_compressor import config

config.onnxruntime.precisions = ["int8", "uint8"]
config.onnxruntime.graph_optimization_level = "DISABLE_ALL"  # only onnxruntime has the graph_optimization_level attribute
```
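To illustrate the design behind this attribute-style access, here is a minimal, hypothetical sketch of how a pythonic config object can validate values at assignment time. The class names and the allowed-value lists are illustrative only, not INC's actual implementation:

```python
class _Section:
    """Illustrative config section that checks assigned values against an allowed set."""

    def __init__(self, **allowed):
        # allowed maps an attribute name to a tuple of permitted values,
        # or to None when any value is accepted.
        object.__setattr__(self, "_allowed", allowed)
        object.__setattr__(self, "_values", {})

    def __setattr__(self, name, value):
        permitted = self._allowed.get(name)
        if permitted is not None and value not in permitted:
            raise ValueError(f"{name} must be one of {permitted}, got {value!r}")
        self._values[name] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)


class Config:
    """Illustrative top-level config with one section, mirroring the access style above."""

    def __init__(self):
        self.quantization = _Section(
            strategy=("basic", "mse", "bayesian", "random", "exhaustive"),
            device=("cpu", "gpu"),
            timeout=None,  # any value accepted
        )


config = Config()
config.quantization.strategy = "mse"  # accepted: in the allowed set
config.quantization.timeout = 100     # accepted: unconstrained attribute
print(config.quantization.strategy)   # mse
```

Validating at assignment time is what lets such an API report a typo (say, an unsupported strategy name) immediately at the offending line, instead of failing later during tuning as a mistyped YAML field would.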