diff --git a/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/README b/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/README
diff --git a/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/README.md b/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/README.md
diff --git a/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/create_SR_dataset.py b/neural_compressor/torch/algorithms/fp8_quant/internal/diffusion_evaluation/SR_evaluation/create_SR_dataset.py
@@ -0,0 +1,32 @@
+How to calculate FID and CLIP score:
+
+We will use the MS-COCO dataset. We use it for two things:
+- Generating a large number of prompts which we can use to create diffusion images
+- Once we have diffusion images, we need a "ground truth" dataset to calculate the FID.
+
+1) Run a python script which does the following things:
+ - Takes a subset of MS-COCO
+ - Creates a CSV with prompts which can then be inserted into the diffusion model. These prompts are taken from captions of the images in the subset
+ - Creates a new folder with the images from the subset
+ - The subset size is up to you; the standard number of images for this evaluation is 30K or 10K
+
+Run the following:
+
+python create_dataset.py /datasets/coco2014 <path to save CSV and dataset> <size of subset>
+
+Now, create the generated images from the CSV file.
+
+IMPORTANT!! - the script that does the actual evaluation (explained below) expects the file name of each generated image to be the prompt that created it. For example, if the prompt is "a monster playing the guitar" then the name of the file that is created using diffusion should be "<path>/a monster playing the guitar.png" (or jpg or whatever)
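
A minimal sketch of the naming convention above, assuming the images are saved as PNG files. The helper name `image_path_for_prompt` and the `generated` directory are hypothetical and are not part of the evaluation scripts.

```python
import os


def image_path_for_prompt(out_dir, prompt, ext=".png"):
    """Return the path at which the image generated from `prompt` should be saved.

    Hypothetical helper: the file name (without extension) is the prompt itself.
    """
    return os.path.join(out_dir, prompt + ext)


# Example: where to save the image generated from one prompt.
print(image_path_for_prompt("generated", "a monster playing the guitar"))
```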
+
+IMPORTANT!! #2 - from my experience, stable diffusion inference returns an error for prompts that contain the character '/'. There are very few, around one in a thousand. My recommendation: if you want to evaluate N images, create a subset of size N+30 and delete the prompts with '/' in them. After creating the CSV I just deleted these prompts manually (takes 10 seconds to do).
+(Perhaps automating this should be a future commit).
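
The manual clean-up described above could be automated along these lines. This is only a sketch: the function name `keep_prompts_without_slash` is hypothetical, and the CSV layout (a single prompt column with no header row) is an assumption, since the format written by the dataset script is not shown here.

```python
import csv
import io


def keep_prompts_without_slash(prompts, n):
    """Return the first `n` prompts that do not contain the '/' character."""
    kept = [p for p in prompts if "/" not in p]
    return kept[:n]


# Stand-in for the generated CSV; one prompt per row with no header is an assumption.
csv_text = "a cat on a sofa\na bowl of fruit/vegetables\na man riding a horse\n"
prompts = [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]

# Oversample (here 3 rows for N = 2), then drop prompts with '/' and truncate to N.
filtered = keep_prompts_without_slash(prompts, 2)
print(filtered)
```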
+
+2) Now, run the evaluation script. This does the following:
+- Calculates the CLIP score - takes the CLIP embedding of each generated image and the embedding of the caption that created it (in this case each image and its file name). Then, calculates the cosine similarity between them.
+- Calculates the FID - takes the real and generated images, and calculates the Frechet Inception Distance (FID) between the two sets.
+- Takes the number of images to evaluate with - this can be the number of images in the subset created above, or less
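
The CLIP score step can be sketched as follows. This is an illustration under assumptions, not the code in evaluator.py: the embeddings are toy vectors rather than real CLIP outputs, and the scaling by 100 with clipping at zero is one common definition of the CLIP score that the evaluation script may or may not use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_embedding, text_embedding, scale=100.0):
    """Scaled cosine similarity between an image and a text embedding, clipped at zero."""
    return max(scale * cosine_similarity(image_embedding, text_embedding), 0.0)


# Toy embeddings: the first pair points in similar directions, the second is orthogonal.
print(clip_score([0.6, 0.8], [0.8, 0.6]))
print(clip_score([1.0, 0.0], [0.0, 1.0]))
```

The per-image scores would then be averaged over all evaluated images.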
+
+To do this, run:
+
+python evaluator.py --device hpu --real_images_path /datasets/coco2014/val2014 --diff_images_path <generated images path> --num_of_images <Num of images to evaluate with>
+
@@ -0,0 +1,37 @@
+How to calculate PSNR and SSIM for Super Resolution
+We will use the ImageNet validation dataset.
+
+The evaluation is done by the following steps:
+1) Take the ImageNet validation set, which has 50,000 images (a subset can also be used).
+2) Center-crop these images to 256*256 and save them as the "ground truth" dataset. Each saved image is named
+after its label.
+3) Downsample the images to 64*64 (using bicubic interpolation) and then restore them using Super Resolution.
+4) Calculate PSNR and SSIM between each ground truth image and its restored image, and print the mean.
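
As an illustration of step 4, PSNR for one image pair can be sketched like this. It is not the code in super_res_eval.py: images are represented here as flat lists of 8-bit pixel values, and the function names are hypothetical. SSIM is omitted because it is considerably longer.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs):
    """Average PSNR over an iterable of (reference, restored) pairs."""
    values = [psnr(ref, res) for ref, res in pairs]
    return sum(values) / len(values)


# Two toy "images" of four pixels each; every pixel is off by one, so the MSE is 1.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```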
+
+Steps 1, 2 and 4 are included here, while step 3 (downsampling and restoring) should be done separately, using the
+desired Super Resolution method. Keep in mind that the evaluation script assumes that the images are stored with a
+specific file naming format (detailed later). The path of the restored images is then given as an input to step 4.
+
+You can skip steps 1 and 2 and use the images at /datasets/imagenet/val_cropped_labeled
+Alternatively, you can run a python script which does the following to the ImageNet validation dataset:
+ - Crops images to 256*256 (256*256 is the default; this can be changed using the argument --resize)
+ - Saves the images with the convention <path>/<label>_<ID>.png
+ - Needs a text file mapping ImageNet class index to label. It is given here as imagenet1000_clsidx_to_labels.txt, but
+ a different file can be given as an argument with --class_to_labels
+
+To do this, run the following:
+
+python create_SR_dataset.py --images <imagenet validation path> --out_dir <path to save ground truth images>
+
+Now, create the generated images so they match the files created above (step 3)
+
+IMPORTANT!! - the script that does the actual evaluation (explained below) expects each restored image to be saved in the same format,
+<generated images path>/<label>_<ID>.png. This means that the script expects the original and restored images to have the same name.
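
Before running the evaluation, a quick check that both folders contain the same file names can catch naming mismatches early. The sketch below is a hypothetical helper, not part of the provided scripts, and the example file names are made up.

```python
import os
import tempfile


def unmatched_files(real_dir, gen_dir):
    """Return the sorted file names that appear in only one of the two directories."""
    real = set(os.listdir(real_dir))
    gen = set(os.listdir(gen_dir))
    return sorted(real ^ gen)


# Demonstration on temporary folders standing in for the real and restored image paths.
with tempfile.TemporaryDirectory() as real_dir, tempfile.TemporaryDirectory() as gen_dir:
    for name in ("goldfish_0.png", "tench_1.png"):
        open(os.path.join(real_dir, name), "w").close()
    open(os.path.join(gen_dir, "goldfish_0.png"), "w").close()
    missing = unmatched_files(real_dir, gen_dir)
    print(missing)  # names without a counterpart in the other folder
```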
+
+Find an example in /workdisk/ilamprecht/diffusion/stablediffusionv2/scripts/superres_gen_imgs.py
+
+Now, run the evaluation script, which calculates PSNR and SSIM and prints the mean (step 4)
+
+To do this, run:
+
+python super_res_eval.py --num_images <desired number of images up to 50000> --real_images <real images path> --gen_images <generated images path>
@@ -0,0 +1,87 @@
+import argparse
+import ast
+import os
+
+import torch
+import torch.utils.data
+import torchvision.datasets as datasets
+import torchvision.transforms as transforms
+from torchvision.transforms import functional as F
+from torchvision.utils import save_image
+
+
+class CenterCropAndResize(object):
+    """Center-crop an image to a square, then resize it to `size`."""
+
+    def __init__(self, size):
+        self.size = size
+
+    def __call__(self, img):
+        width, height = img.size
+        crop_size = min(width, height)
+        crop = F.center_crop(img, (crop_size, crop_size))
+        resize = F.resize(crop, self.size)
+        return resize
+
+
+def get_data_loader(path, dataset="ImageNet",
+                    workers=4, shuffle=False, pin_memory=True, resize=256):
+    """Data loader for ImageNet data."""
+
+    # defines the desired resize and creates the dataset
+    def get_dataset(path_to_data):
+        transformations = [CenterCropAndResize(resize), transforms.ToTensor()]
+        return datasets.ImageFolder(path_to_data, transforms.Compose(transformations))
+
+    # checks if the given path is valid
+    if isinstance(path, str):
+        curr_path = path
+        if not os.path.exists(curr_path):
+            raise FileNotFoundError(f"Directory {curr_path} doesn't exist")
+        data_dir = curr_path
+    elif isinstance(path, list):
+        for path_ in path:
+            if os.path.exists(path_):
+                curr_path = path_
+                break
+        else:
+            raise FileNotFoundError("None of the default data directories exist in your env,"
+                                    " please manually specify one")
+        data_dir = os.path.join(curr_path, 'val')
+    else:
+        raise ValueError("get_data_loader expects list of paths or single path")
+
+    # create dataloader from dataset; batch_size=1 so each image is saved individually
+    image_dataset = get_dataset(data_dir)
+    data_loader = torch.utils.data.DataLoader(
+        image_dataset,
+        batch_size=1, shuffle=shuffle,
+        num_workers=workers, pin_memory=pin_memory)
+
+    return data_loader
+
+
+parser = argparse.ArgumentParser('Create dataset of real images for SR evaluation', add_help=False)
+
+parser.add_argument('--images', type=str, help='path to imagenet validation set')
+parser.add_argument('--out_dir', type=str, help='path to save images with correct format (cropped + modified file name)')
+parser.add_argument('--resize', type=int, default=256, help='dimensions to resize image')
+parser.add_argument('--class_to_labels', type=str, default='imagenet1000_clsidx_to_labels.txt',
+                    help='path to text file containing mapping between class index and label')
+
+args = parser.parse_args()
+images = args.images
+out_dir = args.out_dir
+resize = args.resize
+class_to_labels = args.class_to_labels
+
+with torch.no_grad():
+    # get dataloader
+    dl = get_data_loader(images, resize=resize)
+
+    # load idx2label, which maps an integer class index to the correct label;
+    # the file is expected to hold a Python dict literal, so it is parsed with
+    # ast.literal_eval instead of eval to avoid executing arbitrary code
+    with open(class_to_labels) as f:
+        idx2label = ast.literal_eval(f.read())
+
+    # make sure the output directory exists before saving
+    os.makedirs(out_dir, exist_ok=True)
+
+    # save images with the correct filename: <out_dir>/<label>_<index>.png
+    for i, image in enumerate(dl):
+        label = idx2label.get(image[1].item())
+        save_image(image[0], f'{out_dir}/{label}_{i}.png')