Refactoring of the datasets #749
Conversation
Hi,
Thanks for the PR!
It looks good for the most part, I only have a couple of comments:
- The inheritance pattern should be Python 2 friendly for new-style classes, so using `super(MyClass, self)` would be preferable to `super()`.
- I'm unsure if I'd want to add the `transform` and `target_transform` to the base class. The reason is that for some tasks, we might want to apply the same random transform to both input and target. This is currently not well supported in torchvision anyway, but I'm planning a few things that would make it better.

Thoughts about 2.?
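The Python 2-compatible call form mentioned above can be sketched as follows; `MyDataset` and the stub base class here are hypothetical placeholders, not classes from torchvision:

```python
# Hypothetical minimal example of the Python 2-friendly super() form.
# On Python 2 the zero-argument super() does not exist, so the explicit
# super(MyDataset, self) spelling works on both interpreters (provided
# the classes are new-style, i.e. ultimately derive from object).
class VisionDataset(object):
    def __init__(self, root):
        self.root = root


class MyDataset(VisionDataset):
    def __init__(self, root):
        # Explicit class/instance arguments instead of bare super()
        super(MyDataset, self).__init__(root)


ds = MyDataset("/tmp/data")
print(ds.root)  # → /tmp/data
```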
I'm not sure if I understand you correctly, but wouldn't the following work in such a case?
Although I've never worked with it, isn't this somewhat similar to the |
Yes, you could handle those cases with something like

```python
class CrazyDataset(VisionDataset):
    def __init__(self, root, transform=None):
        super(CrazyDataset, self).__init__(root, None, None)
```

but then, this means that […]. For example, in #230 (comment) I mention one alternative,

```python
class StandardTransform(object):
    def __init__(self, transform, target_transform):
        self.transform = transform
        self.target_transform = target_transform

    def __call__(self, input, target):
        if self.transform:
            input = self.transform(input)
        if self.target_transform:
            target = self.target_transform(target)
        return input, target
```

and which could replace the […]. Because I'm not 100% sure of such a thing, I'd prefer to have the base class be for now very dummy and basic, and then maybe add support for more things afterwards once we are sure about it. Thoughts? |
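For illustration, here is how the `StandardTransform` above behaves next to a hypothetical joint transform (the paired-random case mentioned earlier); `JointRandomFlip` is an invented example, not a torchvision API:

```python
# StandardTransform is copied from the snippet above; JointRandomFlip is
# a hypothetical joint transform showing the paired input/target case.
class StandardTransform(object):
    def __init__(self, transform, target_transform):
        self.transform = transform
        self.target_transform = target_transform

    def __call__(self, input, target):
        if self.transform:
            input = self.transform(input)
        if self.target_transform:
            target = self.target_transform(target)
        return input, target


class JointRandomFlip(object):
    """Applies the *same* flip decision to input and target -- the case
    that independent transform/target_transform callables cannot express."""

    def __init__(self, flip):
        self.flip = flip  # fixed here for determinism; would be random

    def __call__(self, input, target):
        if self.flip:
            return input[::-1], target[::-1]
        return input, target


independent = StandardTransform(lambda x: [v * 2 for v in x], None)
print(independent([1, 2, 3], [0, 1, 0]))  # → ([2, 4, 6], [0, 1, 0])

joint = JointRandomFlip(True)
print(joint([1, 2, 3], [0, 1, 0]))  # → ([3, 2, 1], [0, 1, 0])
```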
Thanks for the additional information, as I wasn't aware of that discussion. Just to be clear: you want […] |
@pmeier I'm not 100% sure about this either :-) My original idea was to have it as a fallback initially, but I still need to think a bit more about it. This is why I'd rather have […] |
@fmassa Alright. I will move […]

```python
# cityscapes
def extra_repr(self):
    lines = (self._format_transform_repr(self.transform, "Transforms: ") +
             self._format_transform_repr(self.target_transform, "Target transforms: ") +
             ["Split: {split}", "Mode: {mode}", "Type: {target_type}"])
    return '\n'.join(lines).format(**self.__dict__)
```

```python
class VisionDataset(data.Dataset):
    _repr_indent = 4

    def __init__(self, root):
        if isinstance(root, torch._six.string_classes):
            root = os.path.expanduser(root)
        self.root = root
        self.transform = None
        self.target_transform = None

    def __repr__(self):
        ...
        if self.transform is not None:
            body += self._format_transform_repr(self.transform,
                                                "Transforms: ")
        if self.target_transform is not None:
            body += self._format_transform_repr(self.target_transform,
                                                "Target transforms: ")
        ...
```

I would prefer option 2. since it will keep the subclasses uncluttered. |
@pmeier I'd go for option 2 as well, or for some intermediate solution like

```python
if hasattr(self, "transform") and self.transform is not None:
    ...
```

so that we can avoid adding the […] |
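A minimal sketch of how that `hasattr` guard keeps `__repr__` safe for a subclass that never assigns the attribute (`BareDataset` is a simplified, hypothetical stand-in, not the actual implementation):

```python
# Simplified stand-in illustrating the guard: __init__ deliberately does
# NOT assign self.transform, as a subclass skipping the attribute might.
class BareDataset(object):
    def __init__(self, root):
        self.root = root

    def __repr__(self):
        body = ["Root location: {}".format(self.root)]
        # The suggested guard: no AttributeError when the attribute was
        # never assigned, and None still means "don't print it".
        if hasattr(self, "transform") and self.transform is not None:
            body.append("Transforms: {}".format(self.transform))
        return "\n".join(["Dataset"] + body)


print(repr(BareDataset("/tmp/data")))
```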
@fmassa It's good to have someone with experience for guidance ;) |
Thanks for the update!
There are still a few things missing I believe.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   39.64%   41.54%   +1.89%
==========================================
  Files          29       30       +1
  Lines        2742     2684      -58
  Branches      430      438       +8
==========================================
+ Hits         1087     1115      +28
+ Misses       1581     1492      -89
- Partials       74       77       +3
```

Continue to review full report at Codecov.
|
@fmassa Yep, you are right. Sorry for causing more work than this should have been. |
Hi @pmeier, lint is failing with […]

Could you address that? Thanks! |
@fmassa Can I run these checks locally? It seems counter-intuitive to me to first commit the changes and then "hope" that the commit passes all checks. |
@pmeier yes, absolutely. |
Thanks!
I'm not sure if there is a practical reason for almost all datasets to individually define the attributes `root`, `transform`, and `target_transform` as well as the `__repr__` method. Since I think there is no reason, I've introduced a superclass `VisionDataset` to streamline the process. This achieves two things:

1. Since all datasets but two have a `root` and `target_transform` attribute, and all of them have a `transform` attribute, I think it is a good idea to move them to `VisionDataset`. The only two exceptions are `FakeData` without `root` and `PhotoTour` without `target_transform`. Both cases are handled by `VisionDataset` if `None` is passed for these arguments.
2. I've implemented a `__repr__` similar to `torch.nn.Module`. Thus, `__repr__` always returns the name of the dataset as well as the number of samples within it. If the attributes `root`, `transform`, and `target_transform` are not `None`, they are added. Additionally, the subclasses can override the `extra_repr` method to include information specific to that dataset.

In addition to the refactoring, with these changes the datasets `CocoCaptions`, `Flickr*`, `Omniglot`, `SBU`, and `VOC*` now also have a human-readable representation.
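As a rough illustration of the `extra_repr` hook described above (a simplified, hypothetical stand-in for `VisionDataset`; the real class formats more fields):

```python
# Simplified sketch of the torch.nn.Module-style extra_repr mechanism:
# the base class builds the common lines, subclasses append their own.
class VisionDataset(object):
    _repr_indent = 4

    def __init__(self, root):
        self.root = root
        self.transform = None
        self.target_transform = None

    def __repr__(self):
        head = "Dataset " + type(self).__name__
        body = ["Number of datapoints: {}".format(len(self))]
        if self.root is not None:
            body.append("Root location: {}".format(self.root))
        body += self.extra_repr().splitlines()
        return "\n".join([head] + [" " * self._repr_indent + line for line in body])

    def extra_repr(self):
        return ""  # subclasses override this with dataset-specific info


class ToyDataset(VisionDataset):
    def __init__(self, root, split):
        super(ToyDataset, self).__init__(root)
        self.split = split

    def __len__(self):
        return 3

    def extra_repr(self):
        return "Split: {split}".format(**self.__dict__)


print(ToyDataset("/data/toy", "train"))
```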