transforms of auxiliary data #329


Closed
ghost opened this issue Nov 10, 2017 · 12 comments

Comments

@ghost

ghost commented Nov 10, 2017

Problems like object detection/segmentation need transforms on the image to be concomitant with transforms on auxiliary data (bounding boxes, segmentation masks, ...). I have had to implement such functionality by porting functions from
https://github.com/tensorflow/models/blob/master/research/object_detection/core/preprocessor.py

If there's interest in having such functionality reside here, I would be very interested in contributing to this. I'm not entirely sure what design choices I'd have to abide by here.

(I have also been porting some non-vectorizable procedures for NMS and RandomCrop sampling, which likely should also reside here.)

@daavoo
Contributor

daavoo commented Nov 10, 2017

I'm also interested in including transforms for segmentation. I think it should be quite straightforward to extend the existing transforms to support multiple inputs, given the current functional API. Maybe something like:

from torchvision.transforms import RandomRotation
from torchvision.transforms.functional import rotate

class MultiRandomRotation(RandomRotation):
    def __call__(self, img, target):
        # sample a single angle so both inputs get the same rotation
        angle = self.get_params(self.degrees)
        new_img = rotate(img, angle, self.resample, self.expand, self.center)
        new_target = rotate(target, angle, self.resample, self.expand, self.center)
        return new_img, new_target

@alykhantejani
Contributor

@daavoo is correct: the functional API was introduced to solve exactly this problem; see this comment for more info on how this would be possible.

@akssri, as for new transforms for segmentation etc., these would be very welcome. It might be worth syncing with @killeent, who is also working on some of these / has some ideas around this.

@killeent

Hi @akssri - what particular functionality have you ported thus far? I am looking into some of the object detection tooling necessary to implement a Faster R-CNN, but it's very much in the early stages.

@daavoo
Contributor

daavoo commented Nov 10, 2017

Is there a desired naming convention for segmentation/detection transforms?
For example, RandomRotation_Seg or RandomRotation_Det for the respective extensions of RandomRotation?

@ghost
Author

ghost commented Nov 11, 2017

@alykhantejani @daavoo Yes, for the masks there's not much to be done, since they are, in essence, additional channels on the image. The functions for applying these transformations to bounding boxes will, however, need to be of a different flavor.

Tensorflow's object detection code (AFAIR) uses separate classes for boxes, points, and so on, with transforms defined on them individually, but one only needs to support transformations on arrays of normalized coordinates to deal with most of these tasks (including convex masks).

I'd personally prefer to extend the relevant transformations in functional.py by having them take an additional 2D-coordinates parameter that gets co-transformed when given. Having separate functions for feature inputs and for coordinate inputs is not going to be pretty IMO.
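To make the idea concrete, here is a minimal NumPy sketch of co-transforming normalized coordinates alongside the image during a horizontal flip (the helper name `hflip_with_coords` and the HxWxC array convention are assumptions for illustration, not torchvision API):

```python
import numpy as np

def hflip_with_coords(img, coords=None):
    """Horizontally flip an HxWxC image array; if given, co-transform an
    (N, 2) array of normalized (x, y) coordinates in [0, 1]."""
    flipped = img[:, ::-1, :]
    if coords is None:
        return flipped
    new_coords = coords.copy()
    new_coords[:, 0] = 1.0 - new_coords[:, 0]  # mirror x about the vertical axis
    return flipped, new_coords
```

Because the coordinates are normalized, the same co-transform works for bounding-box corners, keypoints, and convex-mask vertices alike.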

@killeent I have a number of bounding-box-aware resize/crop/pad... functions implemented. I'm currently using the cython nms function from
https://github.com/rbgirshick/py-faster-rcnn
and am in the process of porting sample_distorted_bounding_box from Tensorflow:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/sample_distorted_bounding_box_op.cc

These are really the only non-pythonic bits necessary for fast inference/training on single-shot detectors (SSD, YOLO); the addition of these ops in tensorflow coincides with their object detection efforts. Faster R-CNN AFAIK only requires ROI-pooling in addition, which is just strided slicing (or resampling...); all the anchor-box infrastructure is fairly similar (at least from what I remember).

Tensorflow has an (approximate) bipartite matcher using the Hungarian algorithm, but this doesn't seem to be widely used. I'm not sure if there is a C++ kernel for this (probably not). The more common hard-mining functions can be written with vectorizable code and nms, though.

Now that I think about it, I wonder if some of these functions (nms) should go into pytorch.
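For reference, greedy NMS itself is short enough to sketch in pure NumPy (a standard textbook version for illustration, not the cython kernel referenced above):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression on (N, 4) boxes in (x1, y1, x2, y2)
    format; returns indices of kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the top-scoring box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

The cython/CUDA kernels do the same thing; they exist only because this loop is slow in Python for thousands of boxes.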

@fmassa
Member

fmassa commented Nov 12, 2017

@akssri The original point of having the functional API for torchvision was to keep things simple and reuse code whenever possible in those more complex cases.
I think that having separate functions for different data domains is better and keeps the interface simpler. So I think it would be better to implement a flip_bbox function that takes the box plus the image width and performs the flip.
I think that nms is very specific to some vision tasks, and should ideally live outside pytorch (maybe in torchvision?).
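A minimal sketch of the flip_bbox function described above (the (x1, y1, x2, y2) pixel-coordinate convention is an assumption for illustration):

```python
def flip_bbox(box, image_width):
    """Horizontally flip one (x1, y1, x2, y2) box in pixel coordinates.
    x1 and x2 swap roles so the output still satisfies x1 <= x2."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)
```

Keeping the box math in a small domain-specific function like this, rather than threading coordinates through the image transforms, is exactly the simplicity argument being made here.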

@ghost
Author

ghost commented Nov 13, 2017

@fmassa Fair enough. Is it okay to cook up some decorator magic to simulate multiple dispatch (but keep the names the same)? I can imagine this would make it easier further on, while still keeping the functional API.

@ghost
Author

ghost commented Nov 14, 2017

To make things concrete, I have something like the following in mind.

import numpy as np

class GenericFunction(object):
    methods = {}
    initialized = False
    def __new__(cls, *args, **kwargs):
        if not cls.initialized:
            # touching every attribute triggers Method.__get__, which
            # registers each method under its dispatch type
            [getattr(cls, name) for name in dir(cls)]
            cls.initialized = True
        # returning a non-instance from __new__ makes crop(x) act as a call
        return cls.__call__(cls, *args, **kwargs)
    def __call__(cls, x, *args, **kwargs):
        dispatch_method = cls.methods[type(x)]
        return dispatch_method(x, *args, **kwargs)

class Method(object):
    def __init__(self, function, dispatch_type):
        self.function = function
        self.dispatch_type = dispatch_type
    def __get__(self, instance, objtype=None):
        if objtype is not None:
            objtype.methods[self.dispatch_type] = self.function
        return self.function

def defmethod(dispatch_type):
    def wrapper(function):
        return Method(function, dispatch_type)
    return wrapper

class crop(GenericFunction):
    methods = {}
    initialized = False
    @defmethod(int)
    def _int(x):
        return x + 1
    @defmethod(np.ndarray)
    def _ndarray(x):
        return x + np.pi
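For comparison, Python's standard library already provides first-argument type dispatch via functools.singledispatch (Python 3.4+), which achieves the same effect with much less machinery (the crop/int/ndarray example here mirrors the sketch above and is purely illustrative):

```python
from functools import singledispatch

import numpy as np

@singledispatch
def crop(x):
    # fallback when no implementation is registered for type(x)
    raise NotImplementedError("no crop method for %r" % type(x))

@crop.register(int)
def _(x):
    return x + 1

@crop.register(np.ndarray)
def _(x):
    return x + np.pi
```

One caveat relative to the hand-rolled version: singledispatch dispatches on the first argument only, so functions that transform (image, boxes) pairs would still need an explicit convention for the extra inputs.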

@ghost ghost closed this as completed Feb 3, 2018
@tribbloid

Why is it closed if you haven't found a solution to it?

@fmassa
Member

fmassa commented May 24, 2019

@tribbloid in 0.3, we provide reference training scripts for classification, detection and segmentation.
It includes helper functions to perform data transformations on segmentation masks / bounding boxes / keypoints.

They are currently under the references/ folder in the torchvision repo, and once the API is clearer, they will be moved to the torchvision package.

@juanmed

juanmed commented Jun 24, 2019

@fmassa

Thanks for clarifying the location of the reference transformations. I was wondering if there is any reference script we can look at for using them. I looked at the colab notebook posted with the release of 0.3 and the reference training code, but both of them use only ToTensor and RandomHorizontalFlip, which do not handle the target dictionary.

To be more specific, I would like to use RandomResize, RandomCrop, and CenterCrop from references/segmentation/transforms.py, but they do not seem to work with the target dictionary, which contains 'boxes' and 'area' keys that should also be modified after resizing the image.

What is the correct way to use these methods? How should I pass the target_dict or its elements to be modified accordingly?

Thanks for your feedback!

@fmassa
Member

fmassa commented Jun 24, 2019

@juanmed just follow the implementation in https://github.com/pytorch/vision/blob/master/references/detection/transforms.py and adapt it to your needs.

Note that Resize is part of the detection models now, and is currently present in https://github.com/pytorch/vision/blob/master/torchvision/models/detection/transform.py
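The paired-transform pattern used in references/detection/transforms.py - each transform takes (image, target) and returns both - can be sketched with NumPy arrays standing in for tensors (an illustrative sketch of the pattern, not the reference implementation itself):

```python
import random

import numpy as np

class RandomHorizontalFlip:
    """Paired transform: takes (image, target) and returns both, mirroring
    the boxes in the target dict whenever the image is flipped."""
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[:2]
            image = image[:, ::-1]  # flip along the width axis
            boxes = target["boxes"].copy()
            # mirror x-coordinates and swap x1/x2 to keep x1 <= x2
            boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
            target["boxes"] = boxes
        return image, target
```

Transforms written this way compose naturally: a Compose that calls each transform as `image, target = t(image, target)` keeps image and annotations in sync, which is exactly what ToTensor/RandomHorizontalFlip in the classification references do not do.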

rajveerb pushed a commit to rajveerb/vision that referenced this issue Nov 30, 2023
* RNN-Transducer from https://github.com/ryanleary/mlperf-rnnt-ref

* fixes after moving eval function

* Fix spelling of Speech Benchmark directory

* Fix inference script for greedy decode

* use 80 input features

* dropout on each layer and no batch normalization

* fix inference script after preprocessing rewrite

* further fixes to inference.py after preprocessing rewrite