
[feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia? or colorsys python core module?) #4029


Open
vadimkantorov opened this issue Jun 9, 2021 · 25 comments


@vadimkantorov

vadimkantorov commented Jun 9, 2021

It would be nice to have these available in the standard library (on both CPU and GPU) rather than rolling your own. Kornia seems to have implemented them: https://kornia.readthedocs.io/en/latest/color.html, skimage as well: https://scikit-image.org/docs/dev/api/skimage.color.html, and Python core too: https://docs.python.org/3/library/colorsys.html, but I think they're basic enough to merit inclusion of similar functions in core.

(My use case: SLIC superpixel extraction, which requires rgb2lab, and the selective search algorithm, which requires rgb2hsv and rgb2lab.)
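For reference, a rough sketch of what such an rgb2lab could look like on float tensors — assuming sRGB input in [0, 1] with shape (..., 3, H, W) and a D65 white point; the constants follow the usual sRGB -> XYZ -> Lab pipeline, and this is not an official API of any library:

import torch

def rgb_to_lab(img: torch.Tensor) -> torch.Tensor:
    # undo sRGB gamma to get linear RGB
    lin = torch.where(img > 0.04045, ((img + 0.055) / 1.055) ** 2.4, img / 12.92)
    r, g, b = lin.unbind(dim=-3)
    # linear RGB -> XYZ (sRGB primaries, D65 white point)
    x = 0.412453 * r + 0.357580 * g + 0.180423 * b
    y = 0.212671 * r + 0.715160 * g + 0.072169 * b
    z = 0.019334 * r + 0.119193 * g + 0.950227 * b
    # normalize by the D65 reference white, then apply the Lab nonlinearity
    xyz = torch.stack([x / 0.95047, y, z / 1.08883], dim=-3)
    f = torch.where(xyz > 0.008856, xyz ** (1.0 / 3.0), 7.787 * xyz + 16.0 / 116.0)
    fx, fy, fz = f.unbind(dim=-3)
    l = 116.0 * fy - 16.0   # L lies in [0, 100]
    a = 500.0 * (fx - fy)   # a and b are signed
    b_ = 200.0 * (fy - fz)
    return torch.stack([l, a, b_], dim=-3)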

cc @vfdev-5

@datumbox
Contributor

Thanks @vadimkantorov. This is useful feedback as we are currently in discussions to improve our transforms.

@vadimkantorov
Author

Maybe this is not even related to transforms per se, since it is also useful in non-augmentation contexts.

For transforms, I really hope there will be more native batched transforms, a clear separation between a function and the random sampling of its parameters, and an easy way to apply inverse transforms (where possible). Getting padding masks for the direct and inverse transforms would also be useful.

@datumbox
Contributor

@vadimkantorov The transforms tag covers anything under the transforms module, including augmentations and the functional and class transforms. Similar requests have been raised in the past (see #3224 for a summary), so that's something we are looking into.

@fmassa
Member

fmassa commented Jun 21, 2021

@vadimkantorov I think adding an rgb2lab would be nice in principle, but the issue I currently see with it is that there is no way for the library to know whether the input tensor you are passing is indeed in the RGB colorspace. So you can silently get wrong results if you are not careful.
As of now, torchvision assumes that a tensor is either in RGB or grayscale space, and doesn't try to make any further assumptions. This works fine because RGB and grayscale have different numbers of channels, but things break if we start supporting HSV / LAB / etc. colorspaces.

Do you think this is a problem per se, or is it up to the user to make sure that the input tensors are in the RGB colorspace?

@vadimkantorov
Author

I think it's not a problem per se, since in PyTorch we mostly do not encode semantics into the tensor type system, with the goal of being flexible and letting users quickly build on things. The same goes for NumPy.

The same problem exists with OpenCV returning BGR images, but it's up to users to remember to convert when needed.

In my own practice, I use suffix notation to bring extra attention to semantics, e.g. names like mean_rgb and std_rgb in my input normalization functions, especially if I read images with OpenCV.
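For example (the mean/std values below are the usual ImageNet statistics, purely illustrative):

import torch

# the _rgb suffix makes the expected colorspace explicit at the call site
mean_rgb = torch.tensor([0.485, 0.456, 0.406])
std_rgb = torch.tensor([0.229, 0.224, 0.225])

def normalize_rgb(img_rgb):
    # img_rgb: float tensor of shape (3, H, W), channels in RGB order
    return (img_rgb - mean_rgb[:, None, None]) / std_rgb[:, None, None]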

If these functions live in a separate namespace and are not discussed in every other tutorial, their use will be limited to those looking for them, so they should not confuse users of the most common use cases.

@oke-aditya
Contributor

If I understand the problem correctly, torchvision assumes that the colorspace is RGB. I think if we provide conversions from RGB to other colorspaces and vice versa, it would work.

We would need to add documentation saying that all the transforms work on the RGB colorspace only and that we do not support any other.

This is something we faced earlier with bbox conversions: we settled on the xyxy assumption and provided a utility to convert to and from xyxy.

I guess a similar solution would work here?
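For reference, that utility is torchvision.ops.box_convert; a color conversion utility could mirror its API:

import torch
from torchvision.ops import box_convert

boxes_xywh = torch.tensor([[10.0, 20.0, 30.0, 40.0]])
boxes_xyxy = box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy")
# tensor([[10., 20., 40., 60.]])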

Would love to hear from @fmassa and @vadimkantorov 😃

@vadimkantorov
Author

I think many transforms actually don't care about the input color space and support everything.

Yes, it would be good to add notes to the docs about supported color spaces. But even if the docs just say somewhere that the transforms were designed for RGB, and the manual color space conversion utils live off to the side, that would be sufficient and not cause confusion, as long as the image loaders load RGB by default.

@NicolasHug
Member

NicolasHug commented Jun 28, 2021

I'm curious about the use case for this. @vadimkantorov, could you describe a bit more why this would be useful to you?

I assume that passing a LAB image into a pretrained model would lead to invalid predictions, so I guess this would only be for new training experiments?

@vadimkantorov
Author

vadimkantorov commented Jun 28, 2021

You are right. Of course it would be invalid to feed a LAB image into an RGB-trained model. So yes, this would only be for new models or new functionality.

My use case is this: reimplementing SLIC superpixel extraction in PyTorch, which requires rgb2lab.

@NicolasHug
Member

Thanks for the feedback!

If we consider that interpolation works the same in RGB and in Lab, it's true that most transforms don't make a strong assumption w.r.t. the color space, except maybe stuff like ColorJitter or ToGrayScale.

But as noted above, there aren't just transforms: there are also operators and models, and at the moment literally everything in torchvision is built with RGB/Grayscale in mind.

On top of that, for consistency and coherence of the library, I think it's best not to introduce things for which we don't have an explicit need within the library.

It's still valuable to support new/alternative use cases though, but IMHO this would be better done in a somewhat separate way: something that we don't officially support but that power users could still find and use, at their own risk. Typically such a conversion snippet could be part of an FAQ entry with a strong disclaimer regarding its compatibility/support... but we don't have an FAQ. Maybe it could be part of a gallery example as well? Or even just a snippet here below that people could copy/paste?

@vadimkantorov
Author

Well, one also does not have an explicit need for drawing bounding boxes within the library (and yet it was merged in), so I think this argument is not very valid. In the end, torchvision is supposed to help users with standardized ways of doing very frequent chores in computer vision tasks. A reference implementation of color space conversion is one of these (it's been available in OpenCV since its beginning), and it'd be useful to have one working with GPU tensors in core, like in Kornia (as opposed to OpenCV / skimage).

I think putting it into torchvision.color or torchvision.utils.color would make sense and not confuse anyone. The fact that the models don't support all possible color spaces is obvious, but this should not prevent adding these reference conversion implementations. Putting them into an FAQ is also not a bad option, but I don't see why having these copy-pastes in various codebases would be better than providing a standard, tested implementation. Moreover, conversion APIs have stabilized over dozens of years in various libraries, so a good API can probably just be copied from some other library and would not be subject to frequent change. The colorsys module also exists in the Python core standard library, but it doesn't work with arrays, forcing one to write loops.
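To make the colorsys limitation concrete, converting a whole image forces a per-pixel Python loop (a sketch):

import colorsys
import numpy as np

img = np.random.rand(64, 64, 3)  # HWC float RGB in [0, 1]
# colorsys converts one pixel at a time, so a full image needs a loop
hsv = np.array([colorsys.rgb_to_hsv(*px) for px in img.reshape(-1, 3)]).reshape(img.shape)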

@NicolasHug
Member

Well, one also does not have an explicit need for drawing bounding boxes within the library (and yet it was merged in), so I think this argument is not very valid

That's why it's a visualization util, not part of the core set of transforms / operators. And the drawing of bounding boxes and masks integrates perfectly with the rest of the library's ecosystem in terms of expected inputs and outputs. It's not an outlier, which a Lab conversion transform would be.

I think putting it into torchvision.color or torchvision.utils.color would make sense and not confuse anyone

Unfortunately, we already have the rgb <-> grayscale conversions in the transforms module.

The fact that the models don't support all possible color spaces is obvious

It's obvious now, because we only support RGB. The second we start having other color space conversions, users will assume that the models and transforms will "magically" work with whatever tensor you pass. I'm afraid this isn't an assumption we can make, and as library authors we try hard to prevent users from doing wrong things.

@vadimkantorov
Author

It's not an outlier, which a Lab conversion transform would be.

Lab, YCrCb, HSV conversions are classical computer vision tools, far from an outlier in the broader context.

Unfortunately, we already have the rgb <-> grayscale conversions in the transforms module.

If wanted, this could be refactored into another namespace, like what happened in core PyTorch with the new torch.linalg and torch.fft namespaces (and there the usage was probably much higher).

A new "util" namespace can be started even if some older methods exist in "transforms" to enable and help with classical computer vision pipelines.

The second we start having other color space conversions, users will assume that the models and transforms will "magically" work with whatever tensor you pass.

I think this is false. Users are already well aware that preprocessing is a must (mean subtraction, plus [-1, 1] normalization).

we try hard to prevent users from doing wrong things.

Well, nothing can prevent users from corrupting the input tensor if they are feeling playful, and the library can't do anything about this in the end besides clear documentation. If these colorspace utils are not advertised in the main tutorials, it should not be a problem. I don't see why anyone would apply colorsys functions out of the blue.

cc @fmassa

@NicolasHug
Member

NicolasHug commented Jun 28, 2021

Lab, YCrCb, HSV conversions are classical computer vision tools, far from an outlier in the broader context.

They'd be outliers in torchvision. That's my main point: the coherence of a library is very important, both for its maintenance and for its users. Yes, we want to support as many use cases as possible, but we can't just say "oh, this will be useful to someone, let's put it in". On top of preventing wrong results, the scope of a library is also important. As far as I can tell, Lab conversions are out of scope for the core of torchvision, at least at the moment. But we can think of ways to help the users who need them to implement them easily.

If wanted, this could be refactored into another namespace, like what happened in core PyTorch with the new torch.linalg and torch.fft namespaces (and there the usage was probably much higher)

Deprecating the widely used rgb <-> grayscale transforms just so we can introduce Lab conversions in a new submodule, so that users don't consider Lab images to be fully supported, isn't a reasonable strategy, unfortunately. (This sentence is way too long already, which is an indicator that there are too many things going on.)

And it's very different from what happened to torch.linalg BTW, which is not just a namespace change. The new linalg module comes with tons of new features and improvements: https://60c76d975cb350913c6c73c8--shiftlab-pytorch-github-io.netlify.app/blog/pytorch-1.9-released/#torchlinalg

Well, nothing can prevent users from corrupting the input tensor if they are feeling playful, and the library can't do anything about this in the end besides clear documentation.

You're right. Still, the point is that we should prevent what we can prevent. I'm not sure such an argument really helps move the discussion forward.

@vadimkantorov
Author

I'm not sure such an argument really helps move the discussion forward.

Hereby I disengage :) Please feel free to close the issue as "wontfix"

@datumbox
Contributor

@vadimkantorov I think it's good that we have your use case documented thoroughly in this post. There are parallel discussions on how to improve our transforms, so having this information is useful and can steer us in the right direction.

Concerning moving forward with this proposal, I think more time and discussion are needed and, as Nicolas mentioned, we need to find ways to avoid confusion. At the moment, it's unclear to me how we could add this and similar features without breaking compatibility or bloating the library. It's definitely on our radar to improve the situation, so please keep nudging us and providing your feedback. So instead of marking this as "wontfix", I think it's preferable to leave this open and mark it for additional discussion.

@oke-aditya
Contributor

oke-aditya commented Jun 28, 2021

Hi @datumbox @NicolasHug @vadimkantorov

I have a reasonable solution, though it might be improved upon with discussion.
(This might look like I'm rushing to a solution, but otherwise I might forget about this, and the solution too :( )

Create a simple transform in torchvision.transforms called ConvertColorspace().
Note that this would not handle RGB to grayscale, as changing the number of channels makes it inconsistent.

Usage:

# Instead of strings we could use Enums
new_img = F.convert_colorspace(orig_img, in_color="RGB", out_color="LAB")

# Internally we do all these conversions with `_methods` in a `_convert_colorspace.py` file,
# not exposing them to the user.

Similarly, we can provide a class for the same, T.ConvertColorspace.
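A rough sketch of how the dispatch could look, borrowing kornia's kornia.color converters as stand-ins for the hypothetical private _methods (illustrative only, not a concrete proposal for the internals):

import kornia.color as KC  # stand-ins; a real implementation would vendor its own kernels

_CONVERTERS = {
    ("RGB", "HSV"): KC.rgb_to_hsv,
    ("HSV", "RGB"): KC.hsv_to_rgb,
    ("RGB", "LAB"): KC.rgb_to_lab,
    ("LAB", "RGB"): KC.lab_to_rgb,
}

def convert_colorspace(img, in_color="RGB", out_color="LAB"):
    key = (in_color.upper(), out_color.upper())
    if key not in _CONVERTERS:
        raise ValueError(f"unsupported conversion: {in_color} -> {out_color}")
    return _CONVERTERS[key](img)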

Bloating issue
This does not bloat the library, as we add only one function and one class (single responsibility 😄).

Compatibility issue
This does not mean that we support BGR / LAB / HSV / other colorspace images in the transforms. We have written all the transforms with only RGB in mind. Note that models and any other utilities / ops will not support colorspaces other than RGB.

Maintenance issue
I think this does not break existing code, and we don't introduce a new namespace, which keeps the maintenance burden down.

Benefits to the end user:

  1. GPU accelerated!
  2. Works on batches.
  3. Scriptable
  4. Composable
  5. Tested

I would love to hear thoughts from all 😄

I hope that I have proposed something reasonable. Feel free to correct me!

@fmassa
Member

fmassa commented Jun 29, 2021

Hi,

Let me chime in a bit on this discussion. Thanks a ton @vadimkantorov, @NicolasHug, @datumbox and @oke-aditya for the ideas.

The same problem exists with OpenCV returning BGR images, but it's up to users to remember to convert when needed.

Yes, and this is a recurring issue everywhere in the Python ecosystem. How does PIL solve it? By attaching mode metadata to the Image object.

This is not the first instance of this type of problem: bounding boxes have the same issue, as they can have different input representations (xyxy, xywh, etc., as pointed out by @oke-aditya in his comment).

While it would be easy to add those color conversion functions to torchvision, the problem I want to minimize is users getting silent bugs because they're using LAB / HSV colorspaces.
One example: the LAB representation is not on the same scale as RGB (the A and B channels don't have a fixed range, IIRC). How would convert_image_dtype work in this situation? Can we support LAB images in uint8 format? Should resize clamp values to [0, 1] for float tensors (which won't work for LAB)? Should make_grid and other visualization functions work with the LAB colorspace? What happens if we blend a LAB image with an RGB image, should it fail?
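To make the range mismatch concrete, a quick check using kornia's converter (assuming kornia is available; the ranges are a property of the CIE Lab definition, not of kornia):

import torch
import kornia.color as KC

img = torch.rand(1, 3, 8, 8)  # float RGB in [0, 1], NCHW
lab = KC.rgb_to_lab(img)
print(lab[:, 0].min().item(), lab[:, 0].max().item())    # L lives in [0, 100]
print(lab[:, 1:].min().item(), lab[:, 1:].max().item())  # a/b are signed, well outside [0, 1]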

Those are all questions that it would be good for us to answer before implementing such functions, so that we have a coherent story.

Here is one proposal (which is not good enough, but is one option):

class ImageTensor(torch.Tensor):
    def __init__(self, *args, mode='rgb', **kwargs):
        super().__init__(*args, **kwargs)
        self._img_mode = mode

    # do the other stuff needed so that `._img_mode` gets propagated
    # via __torch_function__, for functions that make sense

In this representation, ImageTensor is just like a Tensor, but carries metadata about what is the colorspace for a given tensor.
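A minimal runnable variant of that sketch, using Tensor.as_subclass (again an illustration, not an official torchvision type; propagating the mode through torch ops would still need the __torch_function__ work mentioned above):

import torch

class ImageTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, data, *, mode="rgb"):
        t = torch.as_tensor(data).as_subclass(cls)
        t._img_mode = mode  # colorspace metadata rides along with the tensor
        return t

img = ImageTensor(torch.rand(3, 4, 4), mode="lab")
print(img._img_mode)                  # 'lab'
print(isinstance(img, torch.Tensor))  # True, so Tensor-based code keeps working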

A similar approach could be done for bounding boxes as well, and segmentation masks.

But what is the gotcha with this approach? torchscript won't work because of __torch_function__.

@oke-aditya
Contributor

Some great thoughts by @fmassa. These have broadened the discussion.

I had thought of creating a similar solution for bounding boxes and segmentation masks. Such a solution allows a lot of flexibility and could potentially help with detection transforms as well as rotated boxes, etc.

Similar to ImageTensor, classes such as Boxes and Masks could help (something similar to what Detectron2 does). Another point I would like to add:

-> Creating classes abstracts away the idea of a tensor!

All the downstream libraries built on top of torchvision would now need to use the Boxes and ImageTensor dtypes. These classes are like Tensor but not the same. This introduces an abstraction, sometimes limitations, and a small learning curve for users. Downstream libraries can no longer simply use a Tensor of shape [N, 4] for boxes, etc.

Well, this is a trade-off, and a big breaking change too.

@fmassa
Member

fmassa commented Jun 30, 2021

Such a solution allows a lot of flexibility and could potentially help with detection transforms as well as rotated boxes, etc.

Yes, exactly. That is the original motivation. I'm not yet 100% sure this is the best way forward, but it's one possibility.

Let's redirect discussions about transforms to the RFC from @pmeier, pmeier/torchvision-datasets-rework#1, for now. We will be iterating on the design to get to a state that we are all happy with.

All the downstream libraries built on top of torchvision would now need to use the Boxes and ImageTensor dtypes.
Downstream libraries can no longer simply use a Tensor of shape [N, 4] for boxes, etc.
Well, this is a trade-off, and a big breaking change too.

Those are good points, and they're one of the reasons why we have been careful about not adding those features without careful thought. There is probably a way of making the current functions still work in a BC way (as a Boxes is still a Tensor), but torchscript support might be tricky to get and needs to be evaluated.

@vadimkantorov vadimkantorov changed the title [feature request] rgb2lab and other color space conversions [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions Jan 17, 2022
@vadimkantorov
Author

rgb2hsv / hsv2rgb are also useful for interpolating between two colors: https://stackoverflow.com/questions/13488957/interpolate-from-one-color-to-another; this comes in handy for visualizations.
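A small stdlib-only sketch of that idea (naive HSV lerp; note it ignores hue wraparound across 0/1):

import colorsys

def lerp_color_hsv(rgb1, rgb2, t):
    # rgb1, rgb2: (r, g, b) triples in [0, 1]; t in [0, 1]
    h1 = colorsys.rgb_to_hsv(*rgb1)
    h2 = colorsys.rgb_to_hsv(*rgb2)
    return colorsys.hsv_to_rgb(*(a + (b - a) * t for a, b in zip(h1, h2)))

# blue -> red midpoint
print(lerp_color_hsv((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.5))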

@vadimkantorov vadimkantorov changed the title [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia?) Jan 17, 2022
@vadimkantorov vadimkantorov changed the title [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia?) [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia? or colorsys python module?) Jan 18, 2022
@vadimkantorov vadimkantorov changed the title [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia? or colorsys python module?) [feature request] rgb2lab / rgb2hsv / rgb2gray and other color space conversions (maybe upstream from kornia? or colorsys python core module?) Jan 18, 2022
@vadimkantorov
Author

vadimkantorov commented Jul 8, 2022

rgb2gray has been implemented as https://pytorch.org/vision/stable/generated/torchvision.transforms.functional.rgb_to_grayscale.html#torchvision.transforms.functional.rgb_to_grayscale

The other conversions existing in opencv / kornia / colorsys / skimage would still be nice to have (rgb <-> hsv, etc.).


@vadimkantorov
Author

Btw, TensorFlow's tf.image package does contain rgb_to_hsv, rgb_to_yuv and a few others: https://www.tensorflow.org/api_docs/python/tf/image/rgb_to_hsv
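For example (per the TF docs, these expect float inputs in [0, 1] with channels last):

import tensorflow as tf

img = tf.random.uniform([1, 32, 32, 3])  # NHWC float RGB in [0, 1]
hsv = tf.image.rgb_to_hsv(img)
yuv = tf.image.rgb_to_yuv(img)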

@oke-aditya
Contributor

It might be feasible with the new transforms API (which I haven't explored yet). cc @vfdev-5

@vadimkantorov
Author

vadimkantorov commented Jun 21, 2023

Btw, _rgb2hsv / _hsv2rgb / _rgb_to_grayscale are already implemented in https://github.com/pytorch/vision/blob/main/torchvision/transforms/_functional_tensor.py and https://github.com/pytorch/vision/blob/main/torchvision/transforms/v2/functional/_color.py and used as part of adjust_hue, but of course rgb2hsv / hsv2rgb would be useful as public APIs in other contexts too.
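For the adventurous, they can already be reached today, with the caveat that these are private, so the import path and behavior may change without notice (a sketch):

import torch
from torchvision.transforms._functional_tensor import _rgb2hsv, _hsv2rgb  # private API

img = torch.rand(3, 32, 32)  # float RGB in [0, 1]
hsv = _rgb2hsv(img)          # channels become hue, saturation, value
back = _hsv2rgb(hsv)
print(torch.allclose(img, back, atol=1e-5))  # round-trip is close, not bit-exact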
