
[Feature] Add transforms for randomly converting image to grayscale #299


Closed
sourabhd opened this issue Oct 16, 2017 · 7 comments

Comments

@sourabhd
Contributor

A data augmentation transform that randomly converts an image to grayscale (with probability p) is useful for handling datasets containing a mix of RGB and grayscale images.

Please check my implementation here
Let me know if this looks like a reasonable addition. I could send a pull request for it.

@sourabhd
Contributor Author

sourabhd commented Oct 16, 2017

@sourabhd sourabhd changed the title Add transforms for randomly converting image to grayscale [Feature] Add transforms for randomly converting image to grayscale Oct 18, 2017
@alykhantejani
Contributor

Hi @sourabhd,

I've taken a quick look at the code and it seems like you convert an image to grayscale and then back to RGB (repeating the grayscale image 3 times).

I'm not sure when you would want to do this, i.e. if your dataset is a mix of RGB and grayscale images I would think you would either want all grayscale images (single channel) or to convert the grayscale ones to 3-channel grayscale and mix these with the RGB ones.

A mix of single channel and 3 channel images wouldn't make sense as your network needs to know the number of input channels?

@sourabhd
Contributor Author

@alykhantejani

  1. A 3-channel image can still be grayscale if R == G == B
    Example image from ImageNet
    Grayscale in RGB
  2. Origins
    • The dataset might contain such 3-channel images, like the example shown above.
    • If the dataset has single-channel images, they need to be converted to 3 channels, as pre-trained models are available only for 3-channel input. This is usually done by replicating the image across channels, which leads to the same situation.
  3. Why is it needed?
    Labeling a dataset can be expensive (for example FACS, emotions, etc.), and we want to make use of both the color and grayscale images (instead of throwing away one set). To introduce invariance to grayscaling of an image, we could employ an augmentation where we randomly (with probability p) grayscale an image. The idea is that over multiple epochs, the network sees an image as well as its grayscale counterpart (which has the same label) and learns the invariance during backprop accordingly.
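
A minimal sketch of the augmentation described above, assuming Pillow images as input. The class name `RandomGrayscale3Channel` and its `p` and `rng` parameters are hypothetical and are not taken from the linked implementation. The output stays 3-channel with R == G == B, so it can still feed a network that expects RGB input.

```python
import random

from PIL import Image


class RandomGrayscale3Channel:
    """Hypothetical transform: with probability p, replace an image by its
    grayscale version replicated across three channels."""

    def __init__(self, p=0.1, rng=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.p = p
        # A dedicated RNG keeps the behaviour reproducible when seeded.
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, img):
        if self.rng.random() < self.p:
            gray = img.convert("L")  # single-channel luminance
            return Image.merge("RGB", (gray, gray, gray))  # R == G == B
        return img  # left unchanged


# Usage: a pure red image becomes uniform gray whenever the transform fires.
img = Image.new("RGB", (4, 4), (255, 0, 0))
always = RandomGrayscale3Channel(p=1.0)
never = RandomGrayscale3Channel(p=0.0)
out = always(img)
```

With `p` strictly between 0 and 1, the same image is seen sometimes in color and sometimes in grayscale across epochs, which is the invariance argument made above.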

@alykhantejani
Contributor

So I think that this could go both ways, i.e. the user wants to change 3-channel images to single-channel grayscale, or the user wants to change 1-channel images to 3-channel ones. I think a to_grayscale function can be misleading in this case, as sometimes it returns 1-channel images and sometimes 3-channel.

Additionally, depending on the user's preference, it should be easy for them to encapsulate their desired behavior into a function (using PIL's convert/stack) and chain these together in a Compose?
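
One way such an encapsulation might look, assuming Pillow images. `ensure_three_channels`, `shrink` and the small `Compose` stand-in are illustrative names defined here so the snippet runs without torchvision. In practice, torchvision's own `transforms.Compose` accepts the same kind of list of callables.

```python
from PIL import Image


def ensure_three_channels(img):
    """Hypothetical helper: promote non-RGB images (e.g. mode 'L') to RGB.
    For a mode 'L' image, convert('RGB') copies the gray value into R, G and B."""
    return img if img.mode == "RGB" else img.convert("RGB")


def shrink(img):
    """Example second step: downsample to 2x2."""
    return img.resize((2, 2))


class Compose:
    """Minimal stand-in that applies the given callables in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


pipeline = Compose([ensure_three_channels, shrink])
from_gray = pipeline(Image.new("L", (8, 8), 120))
from_rgb = pipeline(Image.new("RGB", (8, 8), (10, 200, 30)))
```

Both outputs end up in mode `RGB`, so a mixed dataset presents a fixed number of input channels to the network.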

@fmassa wdyt?

@sourabhd
Contributor Author

sourabhd commented Nov 4, 2017

@alykhantejani In that case, we could have two functions: to_grayscale_singlechannel and to_grayscale_threechannel.

@alykhantejani
Contributor

@sourabhd yeah, I'd be happy with a to_grayscale function/transform with a num_output_channels kwarg. Can you send a PR?
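
A rough sketch of what a single function with a `num_output_channels` keyword could look like, assuming a Pillow image as input. It only illustrates the proposal in this thread and is not the code that was eventually merged.

```python
from PIL import Image


def to_grayscale(img, num_output_channels=1):
    """Return a grayscale version of img with either 1 or 3 channels."""
    if num_output_channels == 1:
        return img.convert("L")
    if num_output_channels == 3:
        gray = img.convert("L")
        # Replicate the single channel so that R == G == B.
        return Image.merge("RGB", (gray, gray, gray))
    raise ValueError("num_output_channels should be either 1 or 3")


rgb = Image.new("RGB", (4, 4), (0, 128, 255))
one = to_grayscale(rgb)                          # mode 'L'
three = to_grayscale(rgb, num_output_channels=3)  # mode 'RGB'
```

Making the channel count an explicit argument addresses the earlier concern that a single `to_grayscale` could silently return either 1-channel or 3-channel images.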

@alykhantejani
Contributor

Fixed via #325


2 participants