transforms of auxiliary data #329
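I'm also interested in including transforms for segmentation. I think it should be quite straightforward to extend the existing transforms to support multiple inputs, given the current functional API. Maybe something like:

```python
# Written against the torchvision >= 0.9 API, where `interpolation` replaced `resample`.
import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode, RandomRotation


class MultiRandomRotation(RandomRotation):
    """Rotate an image and its segmentation target by the same random angle."""

    def __call__(self, img, target):
        # Sample the angle once so both inputs receive the identical rotation.
        angle = self.get_params(self.degrees)
        new_img = F.rotate(img, angle, interpolation=self.interpolation,
                           expand=self.expand, center=self.center)
        # Nearest-neighbour interpolation keeps the label values in the mask intact.
        new_target = F.rotate(target, angle, interpolation=InterpolationMode.NEAREST,
                              expand=self.expand, center=self.center)
        return new_img, new_target
```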
@daavoo is correct: the functional API was introduced to solve exactly this problem; see this comment for more info on how this would be possible. @akssri, new transforms for segmentation etc. would be very welcome. It might be worth syncing with @killeent, who is also working on some of these and has some ideas around this.
Hi @akssri, what particular functionality have you ported so far? I am looking into some of the object detection tooling needed to implement Faster R-CNN, but it's very much in the early stages.
Is there a desired naming convention for segmentation/detection transforms?
@alykhantejani @daavoo Yes, for the masks there's not much to be done, since they are, in essence, additional channels on the image. The functions that apply these transformations to bounding boxes, however, need to be of a different flavor. TensorFlow's object detection code (AFAIR) uses separate classes for boxes, points and so on, with transforms defined on each individually, but supporting transformations on arrays of normalized coordinates is enough to handle most of these tasks (including convex masks). I'd personally prefer to extend the relevant transformations in …

@killeent I have a number of bounding-box-aware resize/crop/pad... functions implemented. I'm currently using the Cython NMS function from … These are really the only non-Pythonic pieces needed for fast inference/training of single-shot detectors (SSD, YOLO); the addition of these ops to TensorFlow coincided with its object detection efforts. Faster R-CNN, AFAIK, additionally requires only ROI pooling, which is just strided slicing (or resampling); all the anchor-box infrastructure is fairly similar (at least from what I remember). TensorFlow has an (approximate) bipartite matcher using the Hungarian algorithm, but it doesn't seem to be widely used, and I'm not sure there is a C++ kernel for it (probably not). The more common hard-mining functions can be written with vectorized code plus NMS, though. Now that I think about it, I wonder whether some of these functions (NMS) should go into PyTorch.
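The NMS mentioned here is usually the greedy procedure: sort boxes by score, keep the best one, discard the remaining boxes whose IoU with it exceeds a threshold, and repeat. The outer loop is sequential, but each IoU step vectorizes. Below is a minimal NumPy sketch, assuming boxes are absolute [x1, y1, x2, y2] corners; `nms` and `iou_threshold` are illustrative names, not the Cython implementation referred to above. (Later torchvision releases ship `torchvision.ops.nms(boxes, scores, iou_threshold)` for this.)

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression (illustrative sketch).

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns the indices of kept boxes, ordered by decreasing score.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    areas = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current box with every remaining box, vectorized.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)
        # Keep only the boxes that do not overlap the chosen box too much.
        order = rest[iou <= iou_threshold]
    return keep
```

For example, with boxes [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]] and scores [0.9, 0.8, 0.7], the second box overlaps the first with IoU 81/119 ≈ 0.68 and is suppressed at the default threshold of 0.5, so the result is [0, 2].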
@akssri, the original point of having the functional API for torchvision was to keep things simple and reuse code whenever possible in those more complex cases.
@fmassa Fair enough. Is it okay to cook up some decorator magic to simulate multiple dispatch (but keep the names the same)? I can imagine this would make things easier further on, while still keeping the functional API.
To make things concrete, I have something like the following in mind:

```python
import numpy as np


class GenericFunction(object):
    # Registry mapping an argument type to its implementation.
    methods = {}
    initializedp = False

    def __new__(cl, *args, **kwargs):
        # On first use, touch every attribute so that each Method descriptor's
        # __get__ runs and registers its function in cl.methods.
        if not cl.initializedp:
            for _x in dir(cl):
                getattr(cl, _x)
            cl.initializedp = True
        # "Calling" the class dispatches instead of creating an instance.
        return cl.__call__(cl, *args, **kwargs)

    def __call__(cl, x, *args, **kwargs):
        # Dispatch on the exact type of the first argument.
        despatch_method = cl.methods[type(x)]
        return despatch_method(x, *args, **kwargs)


class Method(object):
    def __init__(self, function, despatch_type):
        self.function = function
        self.despatch_type = despatch_type

    def __get__(self, instance, objtype=None):
        # Register the wrapped function on the owning class when accessed.
        if objtype is not None:
            objtype.methods[self.despatch_type] = self.function
        return self.function


def defmethod(type):
    def wrapper(function):
        return Method(function, type)
    return wrapper


class crop(GenericFunction):
    # Each generic function needs its own registry and init flag.
    methods = {}
    initializedp = False

    @defmethod(int)
    def _int(x):
        return x + 1

    @defmethod(np.ndarray)
    def _ndarray(x):
        return x + np.pi
```
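Python's standard library already provides dispatch on the type of the first argument through `functools.singledispatch` (Python 3.4+), so the same behavior can be had without a custom registry or descriptor tricks. A minimal sketch reproducing the toy `crop` example above (the `x + 1` / `x + np.pi` bodies are placeholders, as in the original):

```python
from functools import singledispatch

import numpy as np


@singledispatch
def crop(x):
    # Fallback used when no implementation is registered for type(x).
    raise NotImplementedError(f"crop is not defined for {type(x).__name__}")


@crop.register(int)
def _(x):
    return x + 1


@crop.register(np.ndarray)
def _(x):
    return x + np.pi
```

One design difference: the hand-rolled registry looks up the exact `type(x)`, whereas `singledispatch` follows the class hierarchy, so `crop(True)` falls back to the `int` implementation instead of raising a `KeyError`.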
Why is it closed if you haven't found a solution to it?
@tribbloid In 0.3, we provide reference training scripts for classification, detection and segmentation. They are currently under the …
Thanks for clarifying the location of the reference transformations. I was wondering whether there is any reference script we can look at for using them. I looked at the Colab notebook posted with the 0.3 release and at the reference training code, but both use only ToTensor and RandomHorizontalFlip, which do not handle the target dictionary. To be more specific, I would like to use RandomResize, RandomCrop and CenterCrop from … What is the correct way to use these methods? How should I pass the target dict, or its elements, so that they are modified accordingly? Thanks for your feedback!
@juanmed Just follow the implementation in https://github.com/pytorch/vision/blob/master/references/detection/transforms.py and adapt it to your needs. Note that …
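The reference transforms linked above pass the image and the target together, and each transform returns both, so geometric changes to the image can be mirrored on the boxes and masks in the same call. The sketch below illustrates that convention under stated assumptions and is not a copy of the torchvision reference code: images are NumPy H×W×C arrays, and `target` is a dict whose optional `"boxes"` entry is an (N, 4) array of absolute [x1, y1, x2, y2] pixel coordinates and whose optional `"masks"` entry is an (N, H, W) array. The class names echo common transform names but are hypothetical here.

```python
import random

import numpy as np


class Compose:
    """Chain transforms that each take and return (image, target)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class RandomHorizontalFlip:
    """With probability p, flip an HxWxC image and its boxes/masks together."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, target):
        if random.random() < self.p:
            width = image.shape[1]
            image = image[:, ::-1]
            target = dict(target)  # shallow copy; the caller's dict is untouched
            if "boxes" in target:
                boxes = target["boxes"].copy()
                # x1' = W - x2 and x2' = W - x1, so x1 <= x2 still holds.
                boxes[:, [0, 2]] = width - target["boxes"][:, [2, 0]]
                target["boxes"] = boxes
            if "masks" in target:
                # Masks are (N, H, W): flip along the width axis.
                target["masks"] = target["masks"][:, :, ::-1]
        return image, target


class CenterCrop:
    """Crop the central (h, w) window; shift and clip the boxes to match."""

    def __init__(self, size):
        self.h, self.w = size

    def __call__(self, image, target):
        H, W = image.shape[:2]
        top = (H - self.h) // 2
        left = (W - self.w) // 2
        image = image[top:top + self.h, left:left + self.w]
        target = dict(target)
        if "boxes" in target:
            shift = np.array([left, top, left, top], dtype=target["boxes"].dtype)
            boxes = target["boxes"] - shift
            boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, self.w)
            boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, self.h)
            target["boxes"] = boxes
        if "masks" in target:
            target["masks"] = target["masks"][:, top:top + self.h, left:left + self.w]
        return image, target
```

Calling `Compose([RandomHorizontalFlip(0.5), CenterCrop((224, 224))])(image, target)` returns the transformed pair. Boxes that end up entirely outside the crop collapse to zero area; a real pipeline would usually filter them out afterwards.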
Problems like object detection/segmentation need transforms on the image to be concomitant with transforms on auxiliary data (bounding boxes, segmentation masks...). I have had to implement such functionality by porting functions from:
https://github.com/tensorflow/models/blob/master/research/object_detection/core/preprocessor.py
If there's interest in having such functionality reside here, I would be very interested in contributing to this. I'm not entirely sure what design choices I'd have to abide by here.
(I have also been porting some non-vectorizable procedures for NMS and RandomCrop sampling, which likely should also reside here.)
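Random crop sampling is hard to vectorize because candidate windows are proposed and rejected in a loop until one satisfies a constraint. A minimal sketch, loosely in the spirit of the TensorFlow preprocessor linked above rather than a port of it, assuming normalized [ymin, xmin, ymax, xmax] boxes (the convention used by the TensorFlow object detection code). The acceptance rule (keep boxes whose centers fall inside the window, accept a window only if at least one does) and the names `remap_boxes_to_window` and `random_crop_with_boxes` are illustrative choices.

```python
import numpy as np


def remap_boxes_to_window(boxes, window):
    """Re-express normalized [ymin, xmin, ymax, xmax] boxes relative to a window.

    `boxes` and `window` are both normalized to the original image; the
    result is normalized to the window and clipped to [0, 1].
    """
    wy0, wx0, wy1, wx1 = window
    offset = np.array([wy0, wx0, wy0, wx0])
    scale = np.array([wy1 - wy0, wx1 - wx0, wy1 - wy0, wx1 - wx0])
    return np.clip((np.asarray(boxes, dtype=np.float64) - offset) / scale, 0.0, 1.0)


def random_crop_with_boxes(boxes, rng, min_side=0.3, max_trials=50):
    """Rejection-sample a crop window; return (window, remapped_boxes, keep).

    A window is accepted once at least one box center lies inside it; boxes
    whose centers lie outside are dropped. If no trial succeeds, the full
    image is returned with all boxes kept.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    center_y = (boxes[:, 0] + boxes[:, 2]) / 2
    center_x = (boxes[:, 1] + boxes[:, 3]) / 2
    for _ in range(max_trials):
        # Window height/width as fractions of the image, then a valid offset.
        h = rng.uniform(min_side, 1.0)
        w = rng.uniform(min_side, 1.0)
        y0 = rng.uniform(0.0, 1.0 - h)
        x0 = rng.uniform(0.0, 1.0 - w)
        window = np.array([y0, x0, y0 + h, x0 + w])
        keep = ((center_y > y0) & (center_y < y0 + h)
                & (center_x > x0) & (center_x < x0 + w))
        if keep.any():
            return window, remap_boxes_to_window(boxes[keep], window), keep
    # Fallback: no acceptable window was found, so keep the whole image.
    return np.array([0.0, 0.0, 1.0, 1.0]), boxes, np.ones(len(boxes), dtype=bool)
```

To crop the image itself, scale the returned window by [H, W, H, W] to get pixel bounds. Working in normalized coordinates keeps the box arithmetic independent of image size.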