@Wauplin
Collaborator

@Wauplin Wauplin commented Jun 6, 2024

Related to an internal Slack thread.

HF mirror example: https://huggingface.co/datasets/Wauplin/diffusers-community-pipelines-mirror. Each tag/branch is a subfolder with all the pipelines in it.

This PR adds a new CI workflow to host the community pipelines folder (./examples/community) as an HF dataset. The HF dataset is only a mirror of the content on GitHub and should be automatically updated by this CI. The action is triggered when a new tag is pushed (i.e. on new releases) and on commits to main that update the ./examples/community folder.
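
For illustration, here is a minimal Python sketch of what the upload step could look like. It is not the workflow's actual implementation (which may just as well call the CLI). The helper names `path_in_repo_for_ref` and `mirror_community_folder` are hypothetical, and the ref-to-subfolder mapping is an assumption based on the mirror layout described above (one subfolder per tag/branch). The token variable name is the one listed in the TODOs.

```python
import os


def path_in_repo_for_ref(git_ref: str) -> str:
    """Map a Git ref to the subfolder used in the mirror dataset.

    Assumption: tags and branches are stored under their short name,
    e.g. "refs/tags/v0.28.0" -> "v0.28.0" and "refs/heads/main" -> "main".
    """
    for prefix in ("refs/tags/", "refs/heads/"):
        if git_ref.startswith(prefix):
            return git_ref[len(prefix):]
    return git_ref


def mirror_community_folder(
    git_ref: str,
    folder: str = "./examples/community",
    repo_id: str = "diffusers/community-pipelines-mirror",
):
    """Upload the community folder to the dataset repo (hypothetical helper)."""
    # Imported lazily so the pure helper above can be used without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN_MIRROR_COMMUNITY_PIPELINES"])
    return api.upload_folder(
        repo_id=repo_id,
        repo_type="dataset",
        folder_path=folder,
        path_in_repo=path_in_repo_for_ref(git_ref),
        commit_message=f"Mirror {folder} at {git_ref}",
    )
```

In CI, `git_ref` would typically come from the `GITHUB_REF` environment variable that GitHub Actions sets for each run.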

Once this is done, we will be able to download community pipeline modules directly from the HF Hub instead of relying on GitHub files. This should simplify the logic, as we will be able to use hf_hub_download, which natively handles cache_dir, force_download, local_files_only, etc. Downloading from GitHub currently requires the deprecated cached_download, which will soon be removed from huggingface_hub.
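
A minimal sketch of what the download side could look like with `hf_hub_download`. The `<revision>/<pipeline>.py` file layout is an assumption derived from the mirror structure (one subfolder per tag/branch), and `mirror_filename` / `download_community_pipeline` are hypothetical helper names, not diffusers APIs.

```python
from typing import Optional


def mirror_filename(pipeline_name: str, revision: str = "main") -> str:
    """Build the path of a pipeline file inside the mirror dataset.

    Assumption: files live at "<revision>/<pipeline_name>.py".
    """
    return f"{revision}/{pipeline_name}.py"


def download_community_pipeline(
    pipeline_name: str,
    revision: str = "main",
    cache_dir: Optional[str] = None,
    force_download: bool = False,
    local_files_only: bool = False,
) -> str:
    """Download one community pipeline module and return its local path."""
    # Imported lazily so the path helper above has no third-party dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="diffusers/community-pipelines-mirror",
        repo_type="dataset",
        filename=mirror_filename(pipeline_name, revision),
        # Caching and offline behavior are delegated to huggingface_hub.
        cache_dir=cache_dir,
        force_download=force_download,
        local_files_only=local_files_only,
    )
```

For example, `download_community_pipeline("lpw_stable_diffusion")` would fetch that pipeline from the `main` subfolder, provided the file exists at that path in the mirror.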


TODO before merging: (by a diffusers maintainer)

  • Create the dataset repo diffusers/community-pipelines-mirror in the diffusers organization on the Hub.
  • Create a fine-grained token for https://huggingface.co/diffusers-bot that has write access only to this dataset repo.
  • Set this newly created token as the HF_TOKEN_MIRROR_COMMUNITY_PIPELINES secret on GitHub.

TODO once merged:

@Wauplin Wauplin changed the title from "[test] Mirror community pipeline folder on HF" to "Mirror /examples/community folder on HF" Jun 6, 2024
@Wauplin Wauplin changed the title from "Mirror /examples/community folder on HF" to "Mirror ./examples/community folder on HF" Jun 6, 2024
@Wauplin Wauplin marked this pull request as ready for review June 6, 2024 16:07
@Wauplin Wauplin requested a review from sayakpaul June 6, 2024 16:07
Member

@sayakpaul sayakpaul left a comment

LGTM, thanks for this!

I have taken care of the TODOs (the before-merging ones), too.

@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

Thanks for the review and already handling the TODOs @sayakpaul! ❤️
I've addressed your comments so we're good to merge I think :)

@sayakpaul
Member

No worries. Will merge once the CI is green (although it should not matter since the CI is not supposed to be triggered during PRs).

From the todos after merging:

> upload all previous tags to the dataset mirror

What exactly needs to be done here?

@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

> What exactly needs to be done here?

I actually just pushed a new commit (72783bb) that will allow us to trigger this workflow manually. What we need to do is trigger a deploy for every existing tag + for the main branch to populate the dataset repo. The CI will then take care of updating it when needed. I can take care of that part I think and will let you know once it's done.
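
As a rough sketch, the backfill could be scripted against GitHub's REST endpoint for `workflow_dispatch` events. Existing tags predate the workflow file, so the run is dispatched from `main` and the ref to mirror is passed as a workflow input. The input name `ref` is an assumption, and `build_dispatch_request` / `dispatch_all` are hypothetical helpers.

```python
import json
import urllib.request

# REST endpoint that creates a workflow_dispatch event for this workflow file.
DISPATCH_URL = (
    "https://api.github.com/repos/huggingface/diffusers"
    "/actions/workflows/mirror_community_pipeline.yml/dispatches"
)


def build_dispatch_request(ref_to_mirror: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request that triggers one mirror run."""
    body = {
        "ref": "main",  # branch whose workflow definition is executed
        "inputs": {"ref": ref_to_mirror},  # assumed input name
    }
    return urllib.request.Request(
        DISPATCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dispatch_all(refs, token: str) -> None:
    """Trigger one run per ref; urlopen raises HTTPError if GitHub rejects a call."""
    for ref in refs:
        with urllib.request.urlopen(build_dispatch_request(ref, token)):
            print(f"dispatched mirror run for {ref}")
```

Calling `dispatch_all` with the list of existing tags plus `"main"` would populate the dataset repo in one go, given a token that is allowed to trigger workflows on the repository.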

@Wauplin Wauplin marked this pull request as draft June 7, 2024 08:12
@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

(sorry, just realized the yml is incorrect. Converted to draft to avoid it being merged)

@Wauplin Wauplin marked this pull request as ready for review June 7, 2024 08:54
@Wauplin Wauplin merged commit e0fae6f into huggingface:main Jun 7, 2024
@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

Ok, sorry for the mess. I created and merged #8425, #8426 and #8427 to fix a few things. Files are getting correctly hosted on the Hub: https://huggingface.co/datasets/diffusers/community-pipelines-mirror/tree/main 🎉

The workflow can be manually triggered at https://github.com/huggingface/diffusers/actions/workflows/mirror_community_pipeline.yml.
Next time you do a new release, would be good to check that everything is correctly uploaded as well.

@sayakpaul
Member

Cool, would you mind updating our Slack channel about this? This is a very nice move, indeed.

> Next time you do a new release, would be good to check that everything is correctly uploaded as well.

Perhaps we could set up automated reporting for this?

@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

> would you mind updating our Slack channel about this?

Will do!

> Perhaps we could set up automated reporting for this?

A CI checking if the CI failed?

@sayakpaul
Member

> A CI checking if the CI failed?

If the CI isn't successful, we will report the error to a Slack channel.
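
A minimal sketch of such a report, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field). The helper names and the message wording are hypothetical, and the webhook URL would have to come from a CI secret.

```python
import json
import urllib.request


def build_failure_message(workflow: str, ref: str, run_url: str) -> dict:
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"Workflow `{workflow}` failed for `{ref}`: {run_url}"}


def report_failure(webhook_url: str, workflow: str, ref: str, run_url: str) -> int:
    """POST the failure message to Slack and return the HTTP status code."""
    payload = json.dumps(build_failure_message(workflow, ref, run_url)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The call would only run in a step that executes when an earlier step has failed, so successful mirror runs stay silent.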

@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

Ah yes, makes sense. Do you have an example of how to do that? Happy to help but if you know how to do it/configure it, can I leave it to you? 🙏

@sayakpaul
Member

This is how automated reporting is set up for a release: https://github.com/huggingface/diffusers/blob/main/.github/workflows/notify_slack_about_release.yml. I plan to update the benchmarking workflow and this workflow for automated reporting, but it could take some days given the upcoming releases.

@Wauplin
Collaborator Author

Wauplin commented Jun 7, 2024

Not a priority indeed! Good luck with the upcoming releases!

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* first draft

* secret

* tiktok

* capital matters

* dataset matter

* don't be a prick

* refact

* only on main or tag

* document with an example

* Update destination dataset

* link

* allow manual trigger

* better

* lin

---------

Co-authored-by: Sayak Paul <[email protected]>