add regnet_y_128gf factory function #5176

kazhang · 2022-01-07T21:37:57Z

Add a factory function for RegNetY 128GF.

cc @datumbox

facebook-github-bot · 2022-01-07T21:38:04Z

💊 CI failures summary and remediations

As of commit 59d56ab (more details on the Dr. CI page):

2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests"

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found

test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1642031458636/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmp_byuro1l.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmp_byuro1l.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

1 failure not recognized by patterns:

Job	Step
binary_linux_conda_py3.9_cu111	packaging/build_conda.sh

1 job timed out:

binary_linux_conda_py3.9_cu111

This comment was automatically generated by Dr. CI (expand for details).


yiwen-song · 2022-01-09T18:51:47Z

Yeah, same for vit_h_14. It's always the Windows tests that fail. I tried bumping up the memory allocation (by upgrading the CI instance type), but it didn't help in my case :)
You could give it a try, though:

vision/.circleci/config.yml

Lines 9 to 20 in 49ec677

    executors:
      windows-cpu:
        machine:
          resource_class: windows.xlarge
          image: windows-server-2019-vs2019:stable
          shell: bash.exe
      windows-gpu:
        machine:
          resource_class: windows.gpu.nvidia.medium
          image: windows-server-2019-nvidia:stable
          shell: bash.exe

The available executor types are listed in the CircleCI documentation.

torchvision/models/regnet.py

kazhang · 2022-01-11T07:21:48Z

The tests on Windows failed while saving the model as TorchScript. I suspect it ran out of memory (OOM). I tried tempfile.NamedTemporaryFile instead of io.BytesIO, hoping to avoid the OOM, but the temporary file can't be opened for writing on Windows.

datumbox · 2022-01-11T09:15:16Z

@kazhang It seems to work now. The failing test is not related.

@prabhat00155 The test.test_videoapi.TestVideoApi test has been failing periodically for a while now. Can we adjust the thresholds or modify to reduce flakiness?

prabhat00155 · 2022-01-11T10:32:54Z

> @prabhat00155 The test.test_videoapi.TestVideoApi test has been failing periodically for a while now. Can we adjust the thresholds or modify to reduce flakiness?

@bjuncek is looking into this.

kazhang · 2022-01-11T16:56:06Z

The reason NamedTemporaryFile doesn't work on Windows is this note in the Python docs:

> Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later)

TIL.
TemporaryDirectory works.
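
For reference, a minimal sketch of another way around the restriction quoted above, assuming it is acceptable to delete the file by hand: create the file with delete=False, close the first handle, and only then reopen it by name. The helper name write_then_reopen is hypothetical and is not part of this PR.

```python
import os
import tempfile


def write_then_reopen(data: bytes) -> bytes:
    """Write data to a named temp file, close it, then reopen it by name."""
    # delete=False keeps the file on disk after the first handle is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=".pt", delete=False)
    try:
        tmp.write(data)
        # Close the first handle before opening the file a second time;
        # on Windows the second open fails while the first is still open.
        tmp.close()
        with open(tmp.name, "rb") as f:
            return f.read()
    finally:
        tmp.close()  # no-op if already closed
        os.remove(tmp.name)  # manual cleanup, since delete=False


result = write_then_reopen(b"scripted-model-bytes")
```

The cost of this pattern is the manual cleanup in the finally block, which a TemporaryDirectory context manager handles on its own.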

datumbox · 2022-01-11T20:42:51Z

@kazhang The failures on Windows are related. It's a memory problem, might be hard to fix.

kazhang · 2022-01-11T23:31:32Z

Hmm, saving and loading TorchScript from a temporary file caused failures in the detection model tests, which seems odd to me.

datumbox · 2022-01-12T09:33:58Z

@kazhang Previously we faced similar issues with ViT and I noticed that the biggest Windows instance available to us is "GPU Windows Medium". I'm not sure if there are larger instances available that we could use but you could try checking with CircleCI. My hypothesis for why Linux doesn't fail is that we either use a different type of instance or it uses the GPU memory more efficiently due to the lack of GUI.

kazhang · 2022-01-13T01:36:02Z

test/test_models.py

@@ -126,16 +126,16 @@ def assert_export_import_module(m, args):

        def get_export_import_copy(m):
            """Save and load a TorchScript model"""
-            buffer = io.BytesIO()
-            torch.jit.save(m, buffer)


This crashes when saving a large model to an in-memory buffer.

kazhang · 2022-01-13T01:37:16Z

test/test_models.py

+            with TemporaryDirectory() as dir:
+                path = os.path.join(dir, "script.pt")


NamedTemporaryFile doesn't work on Windows because the file can't be opened a second time while it is still open.
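
A stdlib-only sketch of the temporary-directory round trip described in these two review comments. The names round_trip_via_temp_dir, pickle_save and pickle_load are hypothetical stand-ins, and pickle is used only so the sketch runs without PyTorch. For a scripted model, the save and load callables would presumably be torch.jit.save and torch.jit.load, both of which accept a file path.

```python
import os
import pickle
from tempfile import TemporaryDirectory


def round_trip_via_temp_dir(obj, save_fn, load_fn, filename="script.pt"):
    """Save obj to a file in a temporary directory, then load it back.

    save_fn(obj, path) and load_fn(path) are supplied by the caller.
    """
    with TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        save_fn(obj, path)  # the file is opened and closed inside save_fn
        return load_fn(path)  # loaded before the directory is removed


def pickle_save(obj, path):
    # Stand-in for a real serializer (assumption for illustration only).
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


original = {"weights": [1, 2, 3]}
copy = round_trip_via_temp_dir(original, pickle_save, pickle_load)
```

Because each handle is closed before the next one is opened, this avoids the second-open restriction on Windows, and the serialized bytes live on disk rather than in an in-memory buffer.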

datumbox

LGTM, thanks @kazhang!

Summary:
* add regnet_y_128gf
* fix test
* add expected test file
* update regnet factory function, add to prototype as well
* write torchscript to temp file instead bytesio in model test
* docs
* clear GPU memory
* no_grad
* nit

Reviewed By: NicolasHug
Differential Revision: D33618170
fbshipit-source-id: cb92ce70413a6f1096aef8732d8fe948af41caad
Co-authored-by: Vasilis Vryniotis <[email protected]>

add regnet_y_128gf

7af0699

pytorch-probot bot added the ciflow/default label Jan 7, 2022

facebook-github-bot added the cla signed label Jan 7, 2022

fix test

85b2ec1

kazhang force-pushed the regnet_y_128gf_factory_function branch from 0ccc5d5 to 85b2ec1 Compare January 8, 2022 01:12

add expected test file

83dd16a

datumbox reviewed Jan 10, 2022

torchvision/models/regnet.py (outdated, resolved)

kazhang and others added 2 commits January 10, 2022 12:01

Merge branch 'main' into regnet_y_128gf_factory_function

3a94686

update regnet factory function, add to prototype as well

d1ff8a1

kazhang force-pushed the regnet_y_128gf_factory_function branch 2 times, most recently from 14d0957 to 755b824 Compare January 11, 2022 04:31

write torchscript to temp file instead bytesio in model test

cf27e0e

kazhang force-pushed the regnet_y_128gf_factory_function branch from 755b824 to cf27e0e Compare January 11, 2022 07:44

Merge branch 'main' into regnet_y_128gf_factory_function

f7e56bb

kazhang marked this pull request as ready for review January 11, 2022 16:56

docs

6985776

kazhang marked this pull request as draft January 12, 2022 06:24

kazhang changed the title ~~add regnet_y_128gf factory function~~ [WIP]add regnet_y_128gf factory function Jan 12, 2022

kazhang added 3 commits January 12, 2022 22:20

clear GPU memory

ffcdf09

no_grad

9b608aa

nit

a038c85

kazhang force-pushed the regnet_y_128gf_factory_function branch from 5656811 to a038c85 Compare January 13, 2022 00:45

kazhang marked this pull request as ready for review January 13, 2022 01:29

kazhang changed the title ~~[WIP]add regnet_y_128gf factory function~~ add regnet_y_128gf factory function Jan 13, 2022

kazhang commented Jan 13, 2022

datumbox approved these changes Jan 13, 2022

datumbox added module: models enhancement labels Jan 13, 2022

Merge branch 'main' into regnet_y_128gf_factory_function

59d56ab

datumbox merged commit e3767f8 into pytorch:main Jan 13, 2022

yiwen-song mentioned this pull request Jan 19, 2022

Adding vit_h_14 architecture #5210

Merged
