[From pretrained] Speed-up loading from cache #2515
Conversation
The documentation is not available anymore as the PR was closed or merged.

Still need to add tests, but apart from this everything should be good!
```diff
@@ -129,3 +134,116 @@ def create_model_card(args, model_name):

     card_path = os.path.join(args.output_dir, "README.md")
     model_card.save(card_path)


+def extract_commit_hash(resolved_file: Optional[str], commit_hash: Optional[str] = None):
```
All the added files are mostly copied from transformers, with some small changes.
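For context, the transformers helper being borrowed here works roughly like this (a simplified sketch of the idea, not the exact diffusers code): cached files live under a `snapshots/<commit-hash>/` directory in the Hub cache, so the 40-character hex sha can be read back out of a resolved file path.

```python
import re
from typing import Optional


def extract_commit_hash(resolved_file: Optional[str], commit_hash: Optional[str] = None) -> Optional[str]:
    """Return the commit hash encoded in a cached file path, if any."""
    # If a commit hash was already provided (or there is no file), trust it.
    if resolved_file is None or commit_hash is not None:
        return commit_hash
    # Hub cache layout: .../models--org--repo/snapshots/<40-hex sha>/<file>
    match = re.search(r"snapshots/([^/]+)/", resolved_file.replace("\\", "/"))
    if match is None:
        return None
    candidate = match.group(1)
    # Only accept a full 40-character hex sha as a commit hash.
    return candidate if re.fullmatch(r"[0-9a-f]{40}", candidate) else None
```

The hex check matters: `snapshots/` can also contain symlinked branch names in edge cases, and a non-sha value must not be treated as a pinned revision.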
It's hard to follow, but I think variants are not handled properly. I made a couple of other minor comments.
Another observation: if we load individual components separately (the …
Made a few comments on the implementation. I think it could be simplified (we do not need the same complexity as in transformers), and I also agree with @williamberman that `revision` should be used instead of having a duplicate `_commit_hash`.
src/diffusers/configuration_utils.py (outdated)

```python
commit_hash = extract_commit_hash(config_file)
config_dict["_commit_hash"] = commit_hash
```
Nice! I think in the future it would be nice if we had an option for the hub library to return additional metadata (including the commit hash) when we download a file. It looks like in the source, the commit hash is manually read out of the refs folder or the first metadata request from the hub. Relying on the structure of the local cache is ok, but if we already retrieve the information in the function, it would be nice to just explicitly return it.

Sorry if I already said something similar to this, can't remember if I did :P
Big PR 😅 Looks good, will run some additional tests looking for edge cases.
```python
if return_cached_folder:
    message = f"Passing `return_cached_folder=True` is deprecated and will be removed in `diffusers=0.17.0`. Please do the following instead: \n 1. Load the cached_folder via `cached_folder={cls}.download({pretrained_model_name_or_path})`. \n 2. Load the pipeline by loading from the cached folder: `pipeline={cls}.from_pretrained(cached_folder)`."
```
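For reference, diffusers routes messages like this through an internal `deprecate` utility. A bare-bones stand-in (hypothetical name `deprecate_kwarg`, simplified signature) just emits a `FutureWarning` with migration guidance:

```python
import warnings


def deprecate_kwarg(name: str, removed_in: str, message: str) -> None:
    # Simplified stand-in for a deprecation helper: warn that `name` is
    # deprecated and will be removed in the given version, and tell the
    # caller what to use instead.
    warnings.warn(
        f"Passing `{name}` is deprecated and will be removed in `diffusers=={removed_in}`. {message}",
        FutureWarning,
        stacklevel=2,
    )
```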
Just a question, why is this not desired anymore?
Because if one wants the cached_folder one can just split:

```python
from_pretrained(...)
```

into

```python
cached_folder = download(...)
from_pretrained(cached_folder)
```
<Tip>

Activate the special ["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use
This link does not exist. Should it point to https://huggingface.co/docs/diffusers/installation? There's no offline-mode section there, just a note about disabling telemetry.
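As background, offline behavior across the HF libraries is keyed off the `HF_HUB_OFFLINE` environment variable; a minimal check along those lines (a sketch mirroring, not reproducing, the huggingface_hub logic) looks like:

```python
import os


def is_offline_mode() -> bool:
    # When HF_HUB_OFFLINE is set to a truthy value, loaders should skip
    # all network calls and behave as if local_files_only=True.
    return os.environ.get("HF_HUB_OFFLINE", "").lower() in {"1", "on", "yes", "true"}
```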
```python
    revision=revision,
)
user_agent["pretrained_model_name"] = pretrained_model_name
send_telemetry("pipelines", library_name="diffusers", library_version=__version__, user_agent=user_agent)
```
I get this is async, which is cool. However, wouldn't it be possible to send the required data as part of the `model_info` call, which needs to download info anyway? (I guess it's not possible, I don't know how data collection works in the backend.)
cc @Wauplin
Linking to a comment I made in a conversation I didn't realize was resolved: #2515 (comment) /cc @Wauplin, do you think we should address that?
```python
# load config
config, unused_kwargs = cls.load_config(
    config_path,
    cache_dir=cache_dir,
    return_unused_kwargs=True,
    force_download=force_download,
    resume_download=resume_download,
    proxies=proxies,
    local_files_only=local_files_only,
    use_auth_token=use_auth_token,
    revision=revision,
    subfolder=subfolder,
    device_map=device_map,
    user_agent=user_agent,
    **kwargs,
)
commit_hash = config.pop("_commit_hash", None)
```
I don't think we ever expect the commit hash to be part of the pipeline config, so I think it should be an explicit return value
Suggested change:

```diff
 # load config
-config, unused_kwargs = cls.load_config(
+config, unused_kwargs, commit_hash = cls.load_config(
     config_path,
     cache_dir=cache_dir,
     return_unused_kwargs=True,
     force_download=force_download,
     resume_download=resume_download,
     proxies=proxies,
     local_files_only=local_files_only,
     use_auth_token=use_auth_token,
     revision=revision,
     subfolder=subfolder,
     device_map=device_map,
     user_agent=user_agent,
+    return_commit_hash=True,
     **kwargs,
 )
-commit_hash = config.pop("_commit_hash", None)
```
Fair!
```diff
@@ -829,6 +808,9 @@ def _get_model_file(
     and version.parse(version.parse(__version__).base_version) >= version.parse("0.17.0")
 ):
     try:
+        if commit_hash is not None and revision is None:
+            revision = commit_hash
```
Note that this will cause the warning logs in this function to print the commit hash instead of the revision
Good catch!
```python
# _commit_hash
config.pop("_commit_hash", None)
```
Yeah, if the commit hash becomes a return value instead of being added onto the config, we don't have to do this. The current approach could be a bit confusing for someone reading the code for the first time.
tests/test_modeling_common.py (outdated)

```python
def tearDown(self):
    # clean up the VRAM after each test
    super().tearDown()
    gc.collect()
    torch.cuda.empty_cache()

    import diffusers

    diffusers.utils.import_utils._safetensors_available = True
```
Suggested change:

```python
def tearDown(self):
    # clean up the VRAM after each test
    super().tearDown()
    gc.collect()
    torch.cuda.empty_cache()
    import diffusers
    diffusers.utils.import_utils._safetensors_available = True
```
Tied to https://github.com/huggingface/diffusers/pull/2515/files#r1131591497
I also don't think we need any of the gc/cuda caching code in the utils test?
Still want to make sure `diffusers.utils.import_utils._safetensors_available = True` is done in case the test fails.
tests/test_pipelines.py (outdated)

```python
def test_one_request_upon_cached(self):
    with tempfile.TemporaryDirectory() as tmpdirname:
        with requests_mock.mock(real_http=True) as m:
            DiffusionPipeline.download(
                "hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None, cache_dir=tmpdirname
            )

        download_requests = [r.method for r in m.request_history]
        assert download_requests.count("HEAD") == 16, "15 calls to files + send_telemetry"
        assert download_requests.count("GET") == 17, "15 calls to files + model_info + model_index.json"
        assert (
            len(download_requests) == 33
        ), "2 calls per file (15 files) + send_telemetry, model_info and model_index.json"

        with requests_mock.mock(real_http=True) as m:
            DiffusionPipeline.download(
                "hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None, cache_dir=tmpdirname
            )

        cache_requests = [r.method for r in m.request_history]
        assert cache_requests.count("HEAD") == 1, "send_telemetry is only HEAD"
        assert cache_requests.count("GET") == 1, "model info is only GET"
        assert (
            len(cache_requests) == 2
        ), "We should call only `model_info` to check for _commit hash and `send_telemetry`"
```
beautiful
Few nits, but looks good to go. I wasn't super thorough going through the pipeline loading logic and am relying more on the tests still passing.

My understanding of the high level is that we need to pass a commit hash to `hf_hub_download` because a commit hash plus the expected file already being in the cache just early-returns the file path in the cache.
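That early-return behavior can be sketched as follows (hypothetical helper name; cache layout per the Hub cache convention): when the revision is a full commit hash, the file's on-disk location is fully determined, so a cache hit needs no network request at all.

```python
import os
from typing import Optional


def try_cached_file(cache_dir: str, repo_folder: str, commit_hash: str, filename: str) -> Optional[str]:
    # Cached snapshots live under <cache_dir>/<repo_folder>/snapshots/<commit_hash>/.
    # A hit means the exact file for that revision is already on disk,
    # so the caller can return without touching the network.
    candidate = os.path.join(cache_dir, repo_folder, "snapshots", commit_hash, filename)
    return candidate if os.path.isfile(candidate) else None
```

With a branch name like `"main"` instead of a commit hash, this shortcut is unsound: the branch may have moved, so a metadata request is still required to resolve it.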
Co-authored-by: Pedro Cuenca <[email protected]>
Activate the special ["offline-mode"](https://huggingface.co/diffusers/installation.html#notice-on-telemetry-logging) to use this method in a firewalled environment.
Suggested change:

> Activate the special ["offline-mode"](https://huggingface.co/docs/diffusers/installation#notice-on-telemetry-logging) to use this method in a firewalled environment.
URL was wrong.
But still, unless I'm not understanding this tip correctly the doc is not related to offline mode or being behind a firewall. Maybe replace with something like:
Suggested change:

> Use the `proxies` arg if you are in a firewalled environment, or `local_files_only` for full offline mode, which requires the pipeline to be cached locally. Please refer to [these notes](https://huggingface.co/docs/diffusers/installation#notice-on-telemetry-logging) to disable all telemetry logging.
Yeah, it's probably just a bad copy-paste from transformers that we should delete.
Squashed commit history:

* [From pretrained] Speed-up loading from cache
* up
* Fix more
* fix one more bug
* make style
* bigger refactor
* factor out function
* Improve more
* better
* deprecate return cache folder
* clean up
* improve tests
* up
* upload
* add nice tests
* simplify
* finish
* correct
* fix version
* rename
* Apply suggestions from code review (Co-authored-by: Lucain <[email protected]>)
* rename
* correct doc string
* correct more
* Apply suggestions from code review (Co-authored-by: Pedro Cuenca <[email protected]>)
* apply code suggestions
* finish

Co-authored-by: Lucain <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
As described in #2514 we currently ping the Hub too often when all files are already cached.
This PR makes sure that for every download the Hub is only pinged exactly once. All the functionality is copied from transformers (thanks @sgugger)
UPDATE:
New benchmark of #2514 (comment) reveals the speed-up:
Loading a single cached model:
Loading a single cached pipeline:
As mentioned by @pcuenca, in the previous version it wasn't possible to have certainty that the correct model variants were downloaded by just doing one HEAD call. So we change the HEAD call to a single GET call that retrieves all info about the pipeline and can then load the pipeline.
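The resulting flow can be sketched with toy callables (all names here are hypothetical, for illustration only): one metadata GET yields the commit sha and the file list, and every file is then addressed by that pinned sha, so cached files come back without any further requests.

```python
from typing import Callable, Dict, List, Optional


def download_pipeline_files(
    get_repo_info: Callable[[], Dict],                   # one GET: {"sha": ..., "files": [...]}
    fetch_file: Callable[[str, str], str],               # downloads a file at a pinned commit
    cache_lookup: Callable[[str, str], Optional[str]],   # local path if cached at that commit
) -> List[str]:
    info = get_repo_info()  # the single network round-trip
    commit_hash = info["sha"]
    paths = []
    for filename in info["files"]:
        cached = cache_lookup(commit_hash, filename)
        # Cache hit at this exact commit: no per-file HEAD request needed.
        paths.append(cached if cached is not None else fetch_file(commit_hash, filename))
    return paths
```

The design choice this illustrates: pinning every per-file request to a single resolved sha both guarantees a consistent snapshot and makes cache hits decidable locally.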
Note 1: This PR requires `huggingface_hub >= 0.13.0` as we're using the new telemetry function.

Note 2: `DiffusionPipeline.from_pretrained` is refactored in this PR and should now be much more readable.

🚨🚨 Make sure to update `huggingface_hub` to `0.13.0` 🚨🚨