When executing guidellm against a vLLM instance with an arbitrary model name set, guidellm errors out with a Hugging Face error because it cannot access the model's tokenizer_config.json.
Duplicating the issue
Deploy a vLLM instance with any model and set the following argument:
--served-model-name=my-model
Then run a guidellm test against the endpoint.
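For reference, a minimal reproduction looks roughly like the two commands below. The endpoint, port, and local model path are placeholders, and the guidellm flags are illustrative for the release shown in the stack trace (check guidellm --help for your version):

# serve a real local model under an arbitrary name
vllm serve /path/to/actual/model --served-model-name my-model --port 8000

# benchmark it; guidellm tries to resolve "my-model" against the Hugging Face Hub to load a tokenizer
guidellm --target "http://localhost:8000/v1" --model my-model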
Results
guidellm errors out with a 401 on the tokenizer_config.json for my-model, since my-model isn't a valid Hugging Face model name:
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/my-model/resolve/main/tokenizer_config.json
Stack Trace
The following is an example of a full stack trace of the error:
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "/opt/app-root/lib64/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/granite/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 424, in cached_files
hf_hub_download(
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 961, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1068, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1596, in _raise_on_head_call_error
raise head_call_error
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1484, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1401, in get_hf_file_metadata
r = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 285, in _request_wrapper
response = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 309, in _request_wrapper
hf_raise_for_status(response)
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_http.py", line 459, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67eb10fa-305b679009abf7055fe388ff;30339271-9f91-49ff-8324-c347a6b5da16)
Repository Not Found for url: https://huggingface.co/granite/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 239, in generate_benchmark_report
tokenizer_inst = backend_inst.model_tokenizer()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/backend/base.py", line 173, in model_tokenizer
return AutoTokenizer.from_pretrained(self.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 910, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 742, in get_tokenizer_config
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 266, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 456, in cached_files
raise EnvironmentError(
OSError: granite is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/bin/guidellm", line 8, in <module>
sys.exit(generate_benchmark_report_cli())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 171, in generate_benchmark_report_cli
generate_benchmark_report(
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 241, in generate_benchmark_report
raise ValueError(
ValueError: Could not load model's tokenizer, --tokenizer must be provided for request generation
Why is this important
OpenShift AI sets the --served-model-name argument to the name of the ServingRuntime the user provides when deploying a vLLM instance, not to the actual Hugging Face model name. As a result, a model deployed with OpenShift AI cannot be load tested with guidellm unless the user knows to customize the --served-model-name argument and set it to the correct Hugging Face model name.
You need to pass the --tokenizer field as a path to the on-disk model or the name of the model on Hugging Face. guidellm needs access to the model's tokenizer for the "emulated" data mode since it's generating token sequences.
As @sjmonson mentioned, passing the --tokenizer argument (renamed to --processor on main) will let you work around this. The processor/tokenizer is needed for synthetic data generation to ensure the token counts are correct for the prompts being sent.
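For example, something along these lines should work around the failure; the endpoint and paths are placeholders, and on newer builds the flag is --processor rather than --tokenizer:

# point guidellm at the tokenizer on disk (the directory the model was downloaded to)
guidellm --target "http://localhost:8000/v1" --model my-model --tokenizer /path/to/actual/model

# or use the model's real Hugging Face ID instead of the served name
guidellm --target "http://localhost:8000/v1" --model my-model --tokenizer <org>/<actual-model-id>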
With #96 landing, the processor is now only invoked when needed. So another workaround is to pass in a dataset, either as an HF dataset or a txt/csv/jsonl file, which will use the text stored within as the prompts and not require the processor to count tokens.
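A rough sketch of that second workaround, assuming a build where #96 has landed (the exact flag names may differ; check the guidellm docs for your version):

# prompts.txt contains one plain-text prompt per line
guidellm --target "http://localhost:8000/v1" --model my-model --data prompts.txt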
Closing this out, but feel free to re-ping if you hit any issues.