-
Notifications
You must be signed in to change notification settings - Fork 7.1k
fix eurosat prototype mock data setup #5549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit 3062c71 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
fa803fb
to
ed0e2a6
Compare
ed0e2a6
to
a5b4f2d
Compare
test/builtin_dataset_mocks.py
Outdated
@@ -1329,20 +1329,20 @@ def cub200(info, root, config): | |||
|
|||
@register_mock | |||
def eurosat(info, root, config): | |||
data_folder = pathlib.Path(root, "eurosat", "2750") | |||
data_folder = pathlib.Path(root, "2750") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra eurosat
folder is a remnant from the legacy mock data and is not present in the actual archive.
name=cls, | ||
file_name_fn=lambda idx: f"{cls}_{idx}.jpg", | ||
name=category, | ||
file_name_fn=lambda idx: f"{category}_{idx + 1}.jpg", | ||
num_examples=num_examples_per_class, | ||
) | ||
make_zip(root, "EuroSAT.zip", data_folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using data_folder
here, it is assumed that this folder and all of its contents should be included in the archive. By default, make_zip
removes all files and folders it includes. Since the extra eurosat
folder is not included, it is left as is. Thus, there are now an empty eurosat
folder as well as the EuroSAT.zip
file in the root directory.
Although not correct, that wouldn't be an issue if weren't for a quirk on Windows and macOS: pathlib
treats file and folder names case-insensitive 🤯 This means,
vision/torchvision/prototype/datasets/utils/_resource.py
Lines 92 to 99 in e836b3d
stem = path.name.replace("".join(path.suffixes), "") | |
# In a first step, we check for a folder with the same stem as the raw file. If it exists, we use it since | |
# extracted files give the best I/O performance. Note that OnlineResource._extract() makes sure that an archive | |
# is always extracted in a folder with the corresponding file name. | |
folder_candidate = path.parent / stem | |
if folder_candidate.exists() and folder_candidate.is_dir(): | |
return self._loader(folder_candidate) |
is picking up the empty eurosat
folder as a candidate for EuroSAT.zip
although it should only look for an EuroSAT
folder. Since we prefer extracted files over archives, it now loads the empty folder and thus all tests fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows and macOS: pathlib treats file and folder names case-insensitive
Is this a bug, or a documented "feature" ? Are we likely to hit more bugs in the future because of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a bug nor documented, since this is part of system. Neither macOS nor Windows support case sensitive files.
IMO the only places where we might hit this is in the test suite if the mock data is setup wrong. A scenario like above cannot happen during normal operation unless a user manually manipulates the data folders.
except StopIteration: | ||
raise AssertionError("Unable to draw any sample.") from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how this relates to the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes this case more expressive. Before you got an error message like Drawing a sample resulted in the error above
while "above" is a plain StopIteration
. If you are familiar with datapipes, you should be able to interpret this, but new contributors might not be. I've fixed this here, because we were hitting this exact case.
name=cls, | ||
file_name_fn=lambda idx: f"{cls}_{idx}.jpg", | ||
name=category, | ||
file_name_fn=lambda idx: f"{category}_{idx + 1}.jpg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how the idx + 1
part relates to the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, but it aligns the mock data more with real data. I just found this while looking into the real data to debug this issue and didn't want to have an extra PR for it.
name=cls, | ||
file_name_fn=lambda idx: f"{cls}_{idx}.jpg", | ||
name=category, | ||
file_name_fn=lambda idx: f"{category}_{idx + 1}.jpg", | ||
num_examples=num_examples_per_class, | ||
) | ||
make_zip(root, "EuroSAT.zip", data_folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows and macOS: pathlib treats file and folder names case-insensitive
Is this a bug, or a documented "feature" ? Are we likely to hit more bugs in the future because of this?
@@ -1329,20 +1329,20 @@ def cub200(info, root, config): | |||
|
|||
@register_mock | |||
def eurosat(info, root, config): | |||
data_folder = pathlib.Path(root, "eurosat", "2750") | |||
data_folder = root / "2750" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come the linux tests were passing before with the extra eurosat
folder?
I would assume that if we change the structure in the , this would have to be reflected somehow in the dataset code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the prototype datasets never look at the absolute path, but only relative starting from the right
category = pathlib.Path(path).parent.name |
adding more folders on the left side makes no difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pmeier
Summary: * [DEBUG] eurosat prototype on windows and macos * print paths * fix eurosat mock data setup * revert changes * minor cleanup Reviewed By: vmoens Differential Revision: D34878969 fbshipit-source-id: 6784c6119b65ec719683195e7d1ebbf22a9d1b2d
torchdata
reported CI failures for Windows and macOS for the EuroSAT prototype from #5452 cc @NivekT @Dbhasin1. The mock data is setup slightly wrong, but that shouldn't matter were it not for a quirk on Windows and macOS. Details follow inline.We didn't detect that, because our prototype tests only run on Linux. I've switched to Windows for debugging. Passing Windows CI run with the patch from this PR: https://app.circleci.com/pipelines/github/pytorch/vision/15300/workflows/c9eb674a-a3fa-4bcc-9824-c047beb47579/jobs/1235498