add regnet_y_128gf factory function #5176

Merged
merged 12 commits into from
Jan 13, 2022

Conversation

kazhang
Contributor

@kazhang kazhang commented Jan 7, 2022

Add a factory function for RegNetY 128GF.

cc @datumbox

@facebook-github-bot

facebook-github-bot commented Jan 7, 2022

💊 CI failures summary and remediations

As of commit 59d56ab (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1642031458636/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmp_byuro1l.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmp_byuro1l.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

1 failure not recognized by patterns:

Job Step Action
CircleCI binary_linux_conda_py3.9_cu111 packaging/build_conda.sh 🔁 rerun

1 job timed out:

  • binary_linux_conda_py3.9_cu111

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@kazhang kazhang force-pushed the regnet_y_128gf_factory_function branch from 0ccc5d5 to 85b2ec1 Compare January 8, 2022 01:12
@yiwen-song
Contributor

yiwen-song commented Jan 9, 2022

Yeah...same for vit_h_14. It's always the tests on windows that fail... I tried bumping up the memory allocation (by upgrading CI instance type) but still it didn't help with my case :)
Probably you can give it a try:

executors:
  windows-cpu:
    machine:
      resource_class: windows.xlarge
      image: windows-server-2019-vs2019:stable
      shell: bash.exe
  windows-gpu:
    machine:
      resource_class: windows.gpu.nvidia.medium
      image: windows-server-2019-nvidia:stable
      shell: bash.exe

The available executor types on CircleCI are listed in its documentation.

@kazhang kazhang force-pushed the regnet_y_128gf_factory_function branch 2 times, most recently from 14d0957 to 755b824 Compare January 11, 2022 04:31
@kazhang
Contributor Author

kazhang commented Jan 11, 2022

The tests on Windows failed when saving the TorchScript model. I suspect it ran out of memory (OOM). I tried using tempfile.NamedTemporaryFile instead of io.BytesIO, hoping to avoid the OOM, but the temporary file can't be opened for writing on Windows.

@kazhang kazhang force-pushed the regnet_y_128gf_factory_function branch from 755b824 to cf27e0e Compare January 11, 2022 07:44
@datumbox
Contributor

@kazhang It seems to work now. The failing test is not related.

@prabhat00155 The test.test_videoapi.TestVideoApi test has been failing periodically for a while now. Can we adjust the thresholds or modify to reduce flakiness?

@prabhat00155
Contributor

@prabhat00155 The test.test_videoapi.TestVideoApi test has been failing periodically for a while now. Can we adjust the thresholds or modify to reduce flakiness?

@bjuncek is looking into this.

@kazhang
Contributor Author

kazhang commented Jan 11, 2022

The reason NamedTemporaryFile doesn't work on Windows is, per the Python documentation:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later)

T.I.L
TemporaryDirectory works.
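
A minimal sketch of the pattern described above: write to a regular file inside a TemporaryDirectory, then read it back by path. To keep it runnable without PyTorch, `pickle` is used here purely as a stand-in for `torch.jit.save` / `torch.jit.load`; the helper name `round_trip_via_temp_dir` and its `save` / `load` parameters are hypothetical and not part of this PR.

```python
import os
import pickle
from tempfile import TemporaryDirectory


def round_trip_via_temp_dir(obj, save=None, load=None):
    """Save obj to a file inside a temporary directory, then load it back."""
    # Assumption: pickle stands in for torch.jit.save / torch.jit.load.
    if save is None:
        def save(o, path):
            with open(path, "wb") as f:
                pickle.dump(o, f)
    if load is None:
        def load(path):
            with open(path, "rb") as f:
                return pickle.load(f)

    # The directory (and the file in it) is removed when the block exits.
    with TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "script.pt")
        save(obj, path)    # the file is opened and closed by the writer
        return load(path)  # then reopened by name, which Windows allows


copy = round_trip_via_temp_dir({"weights": [1.0, 2.0, 3.0]})
```

Because the file is an ordinary path that is closed before it is reopened, the restriction on reopening a still-open named temporary file never comes into play.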

@kazhang kazhang marked this pull request as ready for review January 11, 2022 16:56
@datumbox
Contributor

@kazhang The failures on Windows are related. It's a memory problem, might be hard to fix.

@kazhang
Contributor Author

kazhang commented Jan 11, 2022

Hmm, saving and loading TorchScript through a temporary file caused failures in the detection model tests, which seems odd to me.

@kazhang kazhang marked this pull request as draft January 12, 2022 06:24
@kazhang kazhang changed the title add regnet_y_128gf factory function [WIP]add regnet_y_128gf factory function Jan 12, 2022
@datumbox
Contributor

@kazhang Previously we faced similar issues with ViT and I noticed that the biggest Windows instance available to us is "GPU Windows Medium". I'm not sure if there are larger instances available that we could use but you could try checking with CircleCI. My hypothesis for why Linux doesn't fail is that we either use a different type of instance or it uses the GPU memory more efficiently due to the lack of GUI.

@kazhang kazhang force-pushed the regnet_y_128gf_factory_function branch from 5656811 to a038c85 Compare January 13, 2022 00:45
@kazhang kazhang marked this pull request as ready for review January 13, 2022 01:29
@kazhang kazhang changed the title [WIP]add regnet_y_128gf factory function add regnet_y_128gf factory function Jan 13, 2022
@@ -126,16 +126,16 @@ def assert_export_import_module(m, args):

def get_export_import_copy(m):
    """Save and load a TorchScript model"""
    buffer = io.BytesIO()
    torch.jit.save(m, buffer)
Contributor Author

Crash when saving large model to buffer.

Comment on lines +129 to +130
    with TemporaryDirectory() as dir:
        path = os.path.join(dir, "script.pt")
Contributor Author

NamedTemporaryFile doesn't work on Windows since the file can't be opened a second time while it is still open.
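
For comparison, a hedged sketch of an alternative that this PR did not use: creating the file with `NamedTemporaryFile(delete=False)`, closing it, and only then reopening it by name. The helper `write_then_reopen` is hypothetical and uses raw bytes in place of a serialized model.

```python
import os
from tempfile import NamedTemporaryFile


def write_then_reopen(data: bytes) -> bytes:
    """Write data to a named temporary file, close it, then reopen it by name."""
    # delete=False keeps the file on disk after close(), so cleanup is manual.
    tmp = NamedTemporaryFile(suffix=".pt", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # close first, so the second open is not a concurrent one
        with open(tmp.name, "rb") as f:
            return f.read()
    finally:
        tmp.close()          # harmless if the file is already closed
        os.unlink(tmp.name)  # remove the file ourselves


result = write_then_reopen(b"serialized model bytes")
```

The TemporaryDirectory approach in the diff avoids the manual cleanup step, which is likely why it reads more simply in a test helper.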

Contributor

@datumbox datumbox left a comment

LGTM, thanks @kazhang!

@datumbox datumbox merged commit e3767f8 into pytorch:main Jan 13, 2022
facebook-github-bot pushed a commit that referenced this pull request Jan 17, 2022
Summary:
* add regnet_y_128gf

* fix test

* add expected test file

* update regnet factory function, add to prototype as well

* write torchscript to temp file instead bytesio in model test

* docs

* clear GPU memory

* no_grad

* nit

Reviewed By: NicolasHug

Differential Revision: D33618170

fbshipit-source-id: cb92ce70413a6f1096aef8732d8fe948af41caad

Co-authored-by: Vasilis Vryniotis