[feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia? or colorsys python core module?) #4029
Thanks @vadimkantorov. This is useful feedback as we are currently in discussions to improve our transforms.
Maybe this is not even related to transforms per se, since it is useful in non-augmentation contexts as well. For transforms, I really hope there will be more native batched transforms, a clear separation between a function and the random sampling of its parameters, and an easy way to do inverse transforms (where possible). Getting padding masks for direct and inverse transforms would also be useful.
@vadimkantorov The transforms tag covers anything under the transforms module, including augmentations, functional and class transforms. Similar requests have been raised in the past (see #3224 for a summary), so that's something we are looking into.
@vadimkantorov I think adding such a conversion raises a question: do you think this is a problem per se, or is it up to the user to make sure that the input tensors are in the RGB colorspace?
I think it's not a problem per se, as we almost do not encode semantics into the tensor type system in PyTorch, with the goal of being flexible and letting users quickly build on things. Same for NumPy. This problem persists with OpenCV returning BGR images, but it's up to users to remember to convert when needed. In my own practice, I use suffix notation to bring extra attention to semantics, e.g. variable names that carry the colorspace as a suffix. If these functions stay in an extra namespace and are not discussed in every other tutorial, their use will be limited to those looking for them, so it should not confuse the users of the most common usecases.
If I understand the problem correctly: in torchvision it is assumed that the colorspace is RGB. We need to add documentation saying that all the transforms work on the RGB colorspace only and that we do not support any other. This is something we faced earlier with bbox conversions, and I guess a similar solution would work here? Would love to hear from @fmassa and @vadimkantorov 😃
I think many transforms actually don't care about the input color space and support everything. Yes, it would be good to add notes to the docs about supported color spaces. But even just saying somewhere in the docs that the transforms were designed for RGB, with manual color space conversion utils on the side, would be sufficient and not cause confusion, as long as the image loaders load RGB by default.
I'm curious about the use-case for this, @vadimkantorov could you describe a bit more why this would be useful to you? I assume that passing a LAB image into a pretrained model would lead to invalid predictions, so I guess this would only be for new training experiments?
You are right. Of course it would be invalid to feed a Lab image into an RGB-trained model. So yes, this would only be for new models or new functionality. My usecase is this: reimplementing SLIC superpixel extraction in PyTorch, which requires rgb2lab.
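For reference, here is a minimal sketch of what such an rgb2lab could look like in plain PyTorch, assuming a CHW float image in [0, 1]. The function name `rgb_to_lab` is hypothetical (not a torchvision API); the constants are the standard sRGB → XYZ (D65) → CIELAB pipeline:

```python
import torch

def rgb_to_lab(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: CHW float sRGB image in [0, 1] -> CIELAB (D65).

    Sketch only; follows sRGB -> linear RGB -> XYZ -> Lab.
    """
    # undo the sRGB gamma
    lin = torch.where(img > 0.04045, ((img + 0.055) / 1.055) ** 2.4, img / 12.92)
    # linear RGB -> XYZ using the sRGB / D65 matrix
    m = torch.tensor([[0.4124564, 0.3575761, 0.1804375],
                      [0.2126729, 0.7151522, 0.0721750],
                      [0.0193339, 0.1191920, 0.9503041]])
    xyz = torch.einsum('ij,jhw->ihw', m, lin)
    # normalize by the D65 reference white point
    white = torch.tensor([0.95047, 1.0, 1.08883]).view(3, 1, 1)
    t = xyz / white
    f = torch.where(t > 0.008856, t ** (1 / 3), 7.787 * t + 16 / 116)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return torch.stack([L, a, b])
```

Sanity check: pure white maps to L ≈ 100, a ≈ 0, b ≈ 0, and pure black to L ≈ 0. A vectorized version like this runs on GPU tensors and batches with no changes beyond the einsum subscripts.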
Thanks for the feedback! If we consider that interpolation works the same in RGB and in Lab, it's true that most transforms don't make a strong assumption w.r.t. the color space, except maybe stuff like ColorJitter or ToGrayScale. But as noted above, there aren't just transforms: there are also operators and models, and at the moment literally everything in torchvision is built with RGB/Grayscale in mind. On top of that, for consistency and coherence of the library, I think it's best not to introduce things for which we don't have an explicit need within the library. It's still valuable to support new/alternative use-cases though, but IMHO this would be better done in a somewhat separate way: something that we don't officially support but that power-users could still find and use, at their own risk. Typically, such a conversion snippet could be part of an FAQ entry with a strong disclaimer regarding its compatibility/support... but we don't have an FAQ. Maybe it could be part of a gallery example as well? Or even just a snippet here below that people could copy/paste?
Well, one also does not have an explicit need for drawing bounding boxes within the library (and yet it was merged in), so I think this argument is not very valid. At the end of the day, torchvision is supposed to help users with standardized ways of doing very frequent chores in computer vision tasks. A reference implementation of color space conversion is one of these (it has been available in OpenCV since its beginning), and it'd be useful to have one working with GPU tensors in core, like in Kornia (as opposed to OpenCV / skimage). I think putting it into torchvision.color or torchvision.utils.color would make sense and not confuse anyone. The fact that the models don't support all possible color spaces is obvious, but this should not prevent adding these conversion reference implementations. Putting them into an FAQ is also a not-bad option, but I don't see why having these copy-pastes in various codebases would be better than providing a standard, tested implementation. Moreover, conversion APIs have stabilized over dozens of years in various libraries, so a good API can probably just be copied from some other library and would not be subject to frequent change.
That's why it's a visualization util, not part of the core set of transforms / operators. And the drawing of bounding boxes and masks perfectly integrates with the rest of the library's ecosystem in terms of expected inputs and outputs. It's not an outlier, whereas a Lab conversion transform would be.
Unfortunately, we already have the rgb <-> grayscale conversions in the transforms module.
It's obvious now, because we only support RGB. The second we start having other color space conversions, users will assume that the models and transforms will "magically" work with whatever tensor you pass. I'm afraid that this isn't an assumption we can make, and as library authors we try hard to prevent users from doing wrong things.
Lab, YCrCb, HSV conversions are classical computer vision tools, far from an outlier in the broader context
If wanted, this could be refactored into another namespace, like what happened in core torch with the new torch.linalg and torch.fft namespaces (and there the usage was probably much higher). A new "util" namespace can be started even if some older methods exist in "transforms", to enable and help with classical computer vision pipelines.
I think this is false. Users are already well aware that preprocessing is a must (mean subtraction, plus [-1, 1] normalization).
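The [-1, 1] normalization mentioned here is just per-channel affine arithmetic; a tiny illustration (the mean and std of 0.5 are illustrative values, not a prescribed preset):

```python
import torch

# Map values from [0, 1] to [-1, 1] via (x - mean) / std with mean = std = 0.5.
x = torch.tensor([0.0, 0.5, 1.0])
y = (x - 0.5) / 0.5
# y is now [-1.0, 0.0, 1.0]
```

torchvision's `transforms.Normalize` performs exactly this kind of per-channel `(x - mean) / std` computation.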
Well, nothing can prevent users from corrupting the input tensor if they are feeling playful, and the library can't do anything about this in the end besides clear documentation. If these colorspace utils are not advertised in the main tutorials, it should not be a problem. I don't see why anyone would apply colorsys functions out of the blue. cc @fmassa
They'd be outliers in torchvision. That's my main point: the coherence of a library is very important both for its maintenance and for its users. Yes, we want to support as many use-cases as possible, but we can't just say "oh this will be useful to someone, let's put it in". On top of preventing wrong results, the scope of a library is also important. As far as I can tell, Lab conversions are out of scope for the core of torchvision, at least at the moment. But we can think of ways to help the users who need it to implement it easily.
Deprecating the widely used rgb <-> grayscale transforms just so we can introduce lab conversions in a new submodule so that users don't consider Lab images to be fully supported isn't a reasonable strategy, unfortunately. (this sentence is way too long already, which is an indicator that there are too many things going on). And it's very different from what happened to torch.linalg BTW, which is not just a namespace change. The new linalg module comes with tons of new features and improvements: https://60c76d975cb350913c6c73c8--shiftlab-pytorch-github-io.netlify.app/blog/pytorch-1.9-released/#torchlinalg
You're right. The point is that we should still prevent what we can prevent. I'm not sure such an argument really helps moving the discussion forward.
Hereby I disengage :) Please feel free to close the issue as "wontfix"
@vadimkantorov I think it's good that we have your use-case documented thoroughly in this post. There are parallel discussions on how to improve our transforms, so I think having this information is useful and can steer us in the right direction. Concerning moving forward with this proposal, I think more time and discussions are needed and, as Nicolas mentioned, we need to find ways to avoid confusion. At the moment, it's unclear to me how we could add this and similar features without breaking compatibility and bloating the library. It's definitely on our radar to improve the situation, so please keep nudging us and providing your feedback. So I think instead of marking this as "wontfix", it's preferable to leave this open and mark it for additional discussion.
Hi @datumbox @NicolasHug @vadimkantorov, I do have a reasonable solution, though this might be improved upon with discussion: create a simple functional transform for colorspace conversion.
Similarly, we can provide a class transform for the same. The points to weigh are the bloating issue, the compatibility issue, the maintenance issue, and the benefit to the end user.
I would love to hear thoughts from all 😄 I hope that I have proposed something reasonable. Feel free to correct me! |
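As one possible shape for such a proposal, here is a hedged sketch following torchvision's usual class-wraps-functional pattern. The names `rgb_to_gray` and `ConvertToGrayscale` are hypothetical, and the BT.601 luma weights are used only to make the example concrete:

```python
import torch

def rgb_to_gray(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical functional: CHW RGB float image -> 1HW grayscale
    using the classical ITU-R BT.601 luma weights."""
    r, g, b = img.unbind(dim=-3)
    return (0.299 * r + 0.587 * g + 0.114 * b).unsqueeze(-3)

class ConvertToGrayscale(torch.nn.Module):
    """Hypothetical class transform wrapping the functional above,
    so it can be composed in torchvision-style pipelines."""
    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return rgb_to_gray(img)
```

The class version slots into `transforms.Compose`-style pipelines, while the functional stays usable on its own, mirroring the existing transforms/functional split.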
Hi, let me chime in a bit on this discussion. Thanks a ton @vadimkantorov, @NicolasHug, @datumbox and @oke-aditya for the ideas.
Yes, and this is a recurring issue everywhere in the Python ecosystem. How does PIL solve this issue? By attaching a `mode` attribute to the image. This is not the first instance of this type of problem: bounding boxes have the same problem, as they can have different input representations (xyxy, xywh, etc., as pointed out by @oke-aditya in his comment). While it would be easy to add those color conversion functions to torchvision, the problem I want to minimize is users getting silent bugs because of using LAB / HSV colorspaces. Those are all questions that it would be good for us to answer before implementing such functions, so that we have a coherent story. Here is one proposal (which is not good enough, but is one option):

```python
class ImageTensor(torch.Tensor):
    def __init__(self, *args, mode='rgb', **kwargs):
        super().__init__(*args, **kwargs)
        self._img_mode = mode
        # do the other stuff needed so that `._img_mode` gets propagated
        # via `__torch_function__`, for functions where that makes sense
```

In this representation, the colorspace travels with the tensor. A similar approach could be done for bounding boxes as well, and segmentation masks. But what is the gotcha with this approach? torchscript won't work because of `__torch_function__`.
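As an illustration of the propagation step described above, here is one way a mode-carrying tensor subclass could be sketched with `__torch_function__`. The class and attribute names mirror the proposal; this is not torchvision code, and real torchvision tensor subclasses may differ:

```python
import torch

class ImageTensor(torch.Tensor):
    """Sketch of a tensor subclass that carries a colorspace mode."""

    @staticmethod
    def __new__(cls, data, mode='rgb'):
        t = torch.as_tensor(data).as_subclass(cls)
        t._img_mode = mode
        return t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        out = super().__torch_function__(func, types, args, kwargs or {})
        # propagate the mode from the first ImageTensor argument, if any
        mode = next((a._img_mode for a in args
                     if isinstance(a, ImageTensor) and hasattr(a, '_img_mode')),
                    None)
        if isinstance(out, ImageTensor) and mode is not None:
            out._img_mode = mode
        return out

img = ImageTensor(torch.rand(3, 4, 4), mode='lab')
out = img + 1.0
print(out._img_mode)  # -> lab
```

Note that real propagation rules would need per-function decisions (e.g. a hypothetical `lab_to_rgb` should change the mode, not copy it), which is exactly the "other stuff" the proposal leaves open.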
Some great thoughts by @fmassa. I had thought of creating a similar solution for bounding boxes and segmentation masks. One concern: creating classes abstracts away the idea of a plain tensor! All the downstream libraries built on top of torchvision would then need to use these tensor subclasses. Well, this is a trade-off, and a big breaking change too.
Yes, exactly. That is the original motivation. I'm not yet 100% sure if this is the best way to go forward, but it's one possibility. Let's redirect discussions about transforms to the RFC from @pmeier pmeier/torchvision-datasets-rework#1 for now. We will be iterating on the design to get to a state we are all happy with.
Those are good points, and it's one of the reasons why we have been wary of adding those features without careful thought. There is probably a way of making the current functions still work in a backwards-compatible way.
rgb2hsv / hsv2rgb is also useful for interpolating between two colors: https://stackoverflow.com/questions/13488957/interpolate-from-one-color-to-another, this is useful for visualizations
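This interpolation use-case can already be done scalar-wise with the stdlib colorsys module mentioned in the title; a small sketch (the helper name `lerp_hsv` is made up for the example):

```python
import colorsys

def lerp_hsv(c1, c2, t):
    """Interpolate between two RGB colors through HSV space, t in [0, 1]."""
    h1, s1, v1 = colorsys.rgb_to_hsv(*c1)
    h2, s2, v2 = colorsys.rgb_to_hsv(*c2)
    h = h1 + (h2 - h1) * t
    s = s1 + (s2 - s1) * t
    v = v1 + (v2 - v1) * t
    return colorsys.hsv_to_rgb(h, s, v)

# halfway between red and green lands near yellow
r, g, b = lerp_hsv((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5)
```

Caveat: this naive hue lerp does not take the shortest path around the hue wheel, and colorsys works on Python floats only, which is exactly why a tensor-based version would help for visualizations.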
rgb2gray has been implemented as https://pytorch.org/vision/stable/generated/torchvision.transforms.functional.rgb_to_grayscale.html#torchvision.transforms.functional.rgb_to_grayscale. Other transforms existing in OpenCV / Kornia / colorsys / skimage would still be nice to have (rgb<->hsv etc.).
btw, TensorFlow's tf.image package does contain rgb_to_hsv, rgb_to_yuv and a few others: https://www.tensorflow.org/api_docs/python/tf/image/rgb_to_hsv
It might be feasible with the new transforms API (which I haven't explored yet) cc @vfdev-5
Btw related: |
It would be nice to have them available in the standard library (both on CPU/GPU) rather than rolling your own. Kornia seems to have implemented it: https://kornia.readthedocs.io/en/latest/color.html, skimage as well: https://scikit-image.org/docs/dev/api/skimage.color.html, python core as well: https://docs.python.org/3/library/colorsys.html, but I think it's basic enough to merit inclusion of similar functions in core.
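As a taste of what a tensor-based version could look like, here is a vectorized rgb2hsv sketch in plain PyTorch. The function name, CHW layout, and [0, 1] value ranges are assumptions mirroring colorsys, not an existing torchvision API:

```python
import torch

def rgb_to_hsv(img: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical batched conversion: CHW RGB in [0, 1] -> HSV in [0, 1]."""
    r, g, b = img.unbind(dim=-3)
    maxc = img.amax(dim=-3)
    minc = img.amin(dim=-3)
    v = maxc
    delta = maxc - minc
    s = delta / (maxc + eps)
    dd = delta + eps
    # hue sector depends on which channel holds the maximum
    h = torch.where(maxc == r, (g - b) / dd,
        torch.where(maxc == g, 2.0 + (b - r) / dd, 4.0 + (r - g) / dd))
    h = (h / 6.0) % 1.0
    return torch.stack([h, s, v], dim=-3)
```

Because it is pure tensor arithmetic, the same code runs on CPU or GPU and over batches, which is the main advantage over colorsys / skimage here.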
(my usecase: SLIC superpixel extraction which requires rgb2lab and selective search algorithm which requires rgb2hsv and rgb2lab)
cc @vfdev-5