
resolve redirection for HTTP resources #5151


Merged: 7 commits merged into pytorch:main on Feb 1, 2022


@pmeier (Collaborator) commented Dec 31, 2021

This is far from perfect, since the mirrors do not get resolved. So far we have never hit a case where this is a problem, but it is still a known limitation.

In the future, the plan is to switch the resources to iopath. Once that is available, resolving will no longer create a new resource; instead, it will change the underlying path in place.
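A minimal sketch of the current behavior, assuming a simplified, hypothetical `HttpResource` dataclass (not torchvision's actual class): `resolve()` follows redirects for the primary URL only and returns a new resource, while `mirrors` are copied through unchanged, which is the limitation noted above. The `follow_redirects` helper is also an assumption; it issues a `HEAD` request (some servers may not support that) and is injectable so the logic can be exercised offline.

```python
import dataclasses
import urllib.request
from typing import Callable, Tuple


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Return the URL a HEAD request finally lands on after redirects."""
    request = urllib.request.Request(url, method="HEAD")
    # urlopen's default opener follows HTTP redirects automatically;
    # geturl() reports the URL of the final response.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


@dataclasses.dataclass(frozen=True)
class HttpResource:
    # Hypothetical, simplified stand-in for an online dataset resource.
    url: str
    mirrors: Tuple[str, ...] = ()
    resolved: bool = False

    def resolve(
        self, follow: Callable[[str], str] = follow_redirects
    ) -> "HttpResource":
        # Only the primary URL is resolved; mirrors are passed through as-is.
        # A new object is returned instead of mutating this one.
        return dataclasses.replace(self, url=follow(self.url), resolved=True)
```

Because the dataclass is frozen, resolution has to produce a new object; an iopath-based design would instead update the path on the existing resource.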

cc @pmeier @bjuncek

@facebook-github-bot commented Dec 31, 2021

💊 CI failures summary and remediations

As of commit d573acc (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build CodeQL / build (1/1)

Step: "Build TorchVision"

2022-02-01T15:12:44.8307258Z     self.finalize_options()
2022-02-01T15:12:44.8307803Z   File "/home/runner/.local/lib/python3.8/site-packages/setuptools/command/develop.py", line 52, in finalize_options
2022-02-01T15:12:44.8308179Z     easy_install.finalize_options(self)
2022-02-01T15:12:44.8308973Z   File "/home/runner/.local/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 276, in finalize_options
2022-02-01T15:12:44.8309350Z     self._fix_install_dir_for_user_site()
2022-02-01T15:12:44.8309967Z   File "/home/runner/.local/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 382, in _fix_install_dir_for_user_site
2022-02-01T15:12:44.8310383Z     self.create_home_path()
2022-02-01T15:12:44.8310993Z   File "/home/runner/.local/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1338, in create_home_path
2022-02-01T15:12:44.8311452Z     if path.startswith(home) and not os.path.isdir(path):
2022-02-01T15:12:44.8311876Z AttributeError: 'int' object has no attribute 'startswith'
2022-02-01T15:12:44.9644611Z ##[error]Process completed with exit code 1.

1 failure not recognized by patterns:

Job: CircleCI cmake_macos_cpu
Step:
    curl -o conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
    sh conda.sh -b
    source $HOME/miniconda3/bin/activate
    conda install -yq conda-build cmake
    packaging/build_cmake.sh

This comment was automatically generated by Dr. CI.

@NicolasHug (Member) left a comment

Thanks @pmeier,

I mostly have questions, but I'll approve to unblock. For my own understanding: have we faced cases in the past where not handling redirections was a problem?

Comment on lines +168 to +170
gdrive_id = _get_google_drive_file_id(redirect_url)
if gdrive_id:
    return GDriveResource(gdrive_id, **meta)
@NicolasHug (Member)

Do we currently have cases where an HTTP resource redirects to Google Drive?

@pmeier (Collaborator, Author) commented Jan 14, 2022

Yes, one of the datasets that has not yet been ported to the prototype datasets does this. I also hit this while working on #5154.
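For illustration, a sketch of how a resolved URL might be recognized as a Google Drive link, so that a `GDriveResource` can be built instead of a plain HTTP resource. The helper `get_google_drive_file_id` is hypothetical, modeled only on the name `_get_google_drive_file_id` in the excerpt; the URL patterns it accepts (`/file/d/<id>` paths and `?id=<id>` queries) are assumptions, not necessarily what torchvision matches.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def get_google_drive_file_id(url: str) -> Optional[str]:
    """Return the file ID from a Google Drive URL, or None (sketch)."""
    parts = urlparse(url)
    # Only consider Google Drive / Docs hosts.
    if re.fullmatch(r"(drive|docs)\.google\.com", parts.netloc) is None:
        return None
    # Share links look like https://drive.google.com/file/d/<id>/view
    match = re.match(r"/file/d/(?P<id>[^/]+)", parts.path)
    if match:
        return match.group("id")
    # Download links may instead carry the ID as a query parameter: /uc?id=<id>
    ids = parse_qs(parts.query).get("id")
    return ids[0] if ids else None
```

In the resolve step, a non-`None` ID would then be used to construct a `GDriveResource`, as in the reviewed excerpt.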


def _download(self, root: pathlib.Path) -> None:
    if not self.resolved:
        return cast(OnlineResource, self.resolve())._download(root)

    for url in itertools.chain((self.url,), self.mirrors or ()):
        try:
            download_url(url, str(root), filename=self.file_name, md5=None)
@NicolasHug (Member)

In your first comment you wrote:

"This is far from perfect, since the mirrors do not get resolved."

Could we instead do the redirection here and handle the mirrors?

@pmeier (Collaborator, Author) commented Jan 14, 2022

We could, but that would make things more complicated. Given that the plan is to refactor this with iopath anyway, and that we currently have only a single dataset with official mirrors, I wouldn't sweat it at the moment.
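The alternative raised in the review could look roughly like the sketch below: each candidate URL (primary first, then mirrors) is resolved right before it is tried, with fallback to the next candidate on failure. This is a hypothetical sketch, not the merged code; `follow` and `fetch` are injected placeholders (for example, a redirect-following helper and a wrapper around `download_url`), and treating `OSError` as the failure signal is an assumption.

```python
import itertools
from typing import Callable, Iterable, List, Optional


def download_with_mirrors(
    url: str,
    mirrors: Optional[Iterable[str]],
    follow: Callable[[str], str],
    fetch: Callable[[str], None],
) -> str:
    """Try the primary URL, then each mirror, resolving redirects per URL.

    Returns the resolved URL that succeeded; raises RuntimeError if all fail.
    """
    errors: List[str] = []
    for candidate in itertools.chain((url,), mirrors or ()):
        try:
            # Resolve redirects for every candidate, not just the primary URL.
            target = follow(candidate)
            fetch(target)
            return target
        except OSError as error:  # urllib's URLError/HTTPError subclass OSError
            errors.append(f"{candidate}: {error}")
    raise RuntimeError("All URLs failed:\n" + "\n".join(errors))
```

Resolution becomes part of every download attempt rather than a one-off step, which is the extra complexity the reply refers to.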

@pmeier merged commit d5a22a8 into pytorch:main on Feb 1, 2022
@pmeier deleted the datasets/resolve-url branch on Feb 1, 2022, 16:43
facebook-github-bot pushed a commit that referenced this pull request on Feb 3, 2022
Summary:
* resolve redirection for HTTP resources

* appease mypy

* address review

Reviewed By: kazhang

Differential Revision: D33927505

fbshipit-source-id: a6b39b2809fd63419620523f6b84a60e91147818