Description
This is a follow-up to the discussion we had yesterday in the team meeting.
User2 is defined in @pbontrager's #54
The Problem
I'm user 2. My workflow typically is to copy a recipe from torchtune, edit it to my needs, and run it.
> copy a recipe from torchtune
That's the part that we need to very clearly define.
- Am I copying the recipe from the `main` branch of the torchtune repo?
  - The problem here is that I'm relying on the stable version of torchtune. But the recipe on the `main` branch tracks the dev version of torchtune, and it probably contains some code and utilities that I don't have access to in my stable torchtune version. So I can't run it :(
- Am I copying the recipe from... where `torchtune` was installed? (e.g. some very-hard-to-find place like `/home/nicolashug/.miniconda3/envs/myenv/lib/python3.10/site-packages/torchtune/assets`??)
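To make the second option concrete: the installed version and install directory can be looked up from Python instead of guessed. Below is a minimal sketch using only the standard library. The helper name `describe_install` is made up for illustration, and nothing here assumes that torchtune actually ships its recipes under an `assets` directory.

```python
from importlib import metadata, util
from pathlib import Path
from typing import Optional, Tuple


def describe_install(package: str) -> Tuple[Optional[str], Optional[Path]]:
    """Return (version, install directory) for a top-level package.

    Either element is None when it cannot be determined.
    `describe_install` is a hypothetical helper, not a torchtune API.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # No installed distribution with that name (e.g. a stdlib module).
        version = None

    # find_spec locates a top-level package without importing it.
    spec = util.find_spec(package)
    if spec is None or spec.origin is None:
        return version, None
    # For a regular package, origin is the path to its __init__.py.
    return version, Path(spec.origin).parent


if __name__ == "__main__":
    version, location = describe_install("torchtune")
    print("version:", version)
    print("location:", location)
```

Even with such a helper, the user still has to dig files out of `site-packages` by hand, which is the part that needs a blessed alternative.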
We need a blessed way to copy/paste the training recipes for a given stable version of torchtune
It's important to understand that this problem exists regardless of the repo structure that we have, and regardless of whether we are bundling the recipes as part of the package, or as assets/resources.
BTW, to enable the User1 workflow, having the recipes as assets / resources in the package is probably a good solution, as Philip already suggested in other channels.
Back to User2: I don't have a perfect solution to suggest, I just wanted to flag something we need to think about. Some random thoughts:
- we probably want a visible disclaimer on top of the READMEs in the recipes (and scripts and configs) to tell users to check out the repo in a state corresponding to their stable release.
- Thinking of a blessed way to copy recipes: what about a CLI that would copy-paste the relevant files of a recipe for a given version??
torchtune make_recipe --recipe=<recipe-name> --version=0.2 --output=...
This would make sure to have the proper `finetune_llm.py` file with the appropriate configs, etc.?
(by default, the version would just be the current version of `torchtune`)
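
To show what such a command could look like, here is a rough sketch built on `argparse`. Everything in it is an assumption made for illustration: the `--recipes-root` flag, the `<recipes-root>/<version>/<recipe>/` directory layout, and the function names are hypothetical, and how recipes for a given version would actually be stored or fetched is left open.

```python
import argparse
import shutil
from importlib import metadata
from pathlib import Path
from typing import List, Optional


def copy_recipe(recipes_root: Path, version: str, recipe: str, output: Path) -> List[Path]:
    """Copy every file of one recipe into `output`, preserving subdirectories.

    Assumes a hypothetical layout: <recipes_root>/<version>/<recipe>/...
    """
    source = recipes_root / version / recipe
    if not source.is_dir():
        raise FileNotFoundError(f"no recipe {recipe!r} for version {version!r}")
    output.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            target = output / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(target)
    return copied


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="torchtune")
    commands = parser.add_subparsers(dest="command", required=True)
    make = commands.add_parser("make_recipe", help="copy the files of a recipe")
    make.add_argument("--recipe", required=True)
    make.add_argument("--version", default=None,
                      help="defaults to the installed torchtune version")
    make.add_argument("--output", type=Path, default=Path("."))
    # Hypothetical flag: where the per-version recipe directories live.
    make.add_argument("--recipes-root", type=Path, required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Default to the installed version; raises if torchtune is not installed.
    version = args.version or metadata.version("torchtune")
    for path in copy_recipe(args.recipes_root, version, args.recipe, args.output):
        print(path)
    return 0
```

Calling `main(["make_recipe", "--recipe=<recipe-name>", "--version=0.2", "--output=...", "--recipes-root=..."])` mirrors the command line above. The sketch only copies files that already exist locally for the requested version; it does not address where those per-version files would come from.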