Merged

Changes from all commits (58 commits):

- `264e36a` [1.0] Httpx migration (#3328) (Wauplin, Sep 10, 2025)
- `0f6d2f4` Bump minimal version to Python3.9 (#3343) (Wauplin, Sep 10, 2025)
- `398492c` Remove `HfFolder` and `InferenceAPI` classes (#3344) (Wauplin, Sep 11, 2025)
- `e1cf5b6` [v1.0] Remove more deprecated stuff (#3345) (Wauplin, Sep 11, 2025)
- `6bd2eb0` [v1.0] Remove `Repository` class (#3346) (Wauplin, Sep 12, 2025)
- `1dd1bc5` bump to 1.0.0.dev0 (Wauplin, Sep 12, 2025)
- `7b2b982` Remove _deprecate_positional_args on login methods (#3349) (Wauplin, Sep 12, 2025)
- `c36962a` [v1.0] Remove imports kept only for backward compatibility (#3350) (Wauplin, Sep 12, 2025)
- `0844d9c` [v1.0] Remove keras2 utilities (#3352) (Wauplin, Sep 12, 2025)
- `9b3258e` [v1.0] Remove anything tensorflow-related + deps (#3354) (Wauplin, Sep 12, 2025)
- `1044d37` Release: v1.0.0.rc0 (Wauplin, Sep 15, 2025)
- `5062377` [v1.0] Update "HTTP backend" docs + `git_vs_http` guide (#3357) (Wauplin, Sep 17, 2025)
- `0e021d4` Refactor CLI implementation using Typer (#3372) (hanouticelina, Sep 18, 2025)
- `0008034` Make HfHubHTTPError inherit from OSError (#3387) (Wauplin, Sep 24, 2025)
- `5fa1931` Release: v1.0.0.rc1 (Wauplin, Sep 24, 2025)
- `b8d6092` Add new HF commands (#3384) (hanouticelina, Sep 25, 2025)
- `f3334dd` Release: v1.0.0.rc2 (Wauplin, Sep 25, 2025)
- `ee6c65a` Document new HF commands (#3393) (hanouticelina, Sep 26, 2025)
- `21edca8` Add cross-platform CLI Installers (#3378) (hanouticelina, Sep 29, 2025)
- `a1c1474` update installers paths (#3400) (hanouticelina, Sep 29, 2025)
- `5d81092` Merge branch 'main' into v1.0-release (Wauplin, Oct 1, 2025)
- `470a7ee` [v1.0] feat: add migration guide for v1.0 (#3360) (google-labs-jules[bot], Oct 1, 2025)
- `836dff1` Merge branch 'main' into v1.0-release (Wauplin, Oct 2, 2025)
- `e9fa836` prepare rc3 (Wauplin, Oct 2, 2025)
- `942fe42` Remove contrib test suite (#3403) (Wauplin, Oct 7, 2025)
- `379c06a` Strict typed dict validator (#3408) (Wauplin, Oct 7, 2025)
- `c5c1c5f` Implement dry run mode in download CLI (#3407) (Wauplin, Oct 7, 2025)
- `7505554` Remove `huggingface-cli` entirely in favor of `hf` (#3404) (Wauplin, Oct 7, 2025)
- `6af2baa` Fix proxy environment variables not used in v1.0 (#3412) (Wauplin, Oct 7, 2025)
- `f305cce` reset (Wauplin, Oct 7, 2025)
- `2a6feff` Merge branch 'main' into v1.0-release (Wauplin, Oct 8, 2025)
- `c154d28` Release: v1.0.0.rc3 (Wauplin, Oct 8, 2025)
- `1ddb16f` [hf CLI] check for updates and notify user (#3418) (Wauplin, Oct 8, 2025)
- `1c39425` Fix forward ref validation if total false (#3423) (Wauplin, Oct 8, 2025)
- `59b160c` Release: v1.0.0.rc4 (Wauplin, Oct 8, 2025)
- `888dbda` Disable rich in CLI (#3427) (Wauplin, Oct 9, 2025)
- `9005007` Print version only in CLI (Wauplin, Oct 9, 2025)
- `069ee68` Merge branch 'v1.0-release' of github.com:huggingface/huggingface_hub… (Wauplin, Oct 9, 2025)
- `4faf7e5` add inference endpoints cli (hanouticelina, Oct 9, 2025)
- `30c13d6` fix naming (hanouticelina, Oct 9, 2025)
- `e670188` update docs (hanouticelina, Oct 9, 2025)
- `f387a11` Merge branch 'v1.0-release' of github.com:huggingface/huggingface_hub… (hanouticelina, Oct 9, 2025)
- `b49a70a` wording (hanouticelina, Oct 9, 2025)
- `7b7b122` remove logging (hanouticelina, Oct 9, 2025)
- `0862c4a` don't instantiate logger when not needed (hanouticelina, Oct 9, 2025)
- `d81a59c` refactor (hanouticelina, Oct 9, 2025)
- `6a50b0b` remove unused import (hanouticelina, Oct 9, 2025)
- `c5b0638` nit (hanouticelina, Oct 9, 2025)
- `5b4111d` nit (hanouticelina, Oct 9, 2025)
- `7f30eb3` Apply suggestions from code review (hanouticelina, Oct 13, 2025)
- `52809ad` use docstring (hanouticelina, Oct 13, 2025)
- `5f570c2` rework CLI UX (hanouticelina, Oct 13, 2025)
- `e83f256` Merge branch 'main' of github.com:huggingface/huggingface_hub into in… (hanouticelina, Nov 4, 2025)
- `abf6073` fix merge conflicts (hanouticelina, Nov 4, 2025)
- `2f63862` some fixes (hanouticelina, Nov 4, 2025)
- `7f2dee1` fix (hanouticelina, Nov 4, 2025)
- `8893053` generate cli reference (hanouticelina, Nov 4, 2025)
- `e4bf7ee` Update src/huggingface_hub/cli/inference_endpoints.py (hanouticelina, Nov 4, 2025)

45 changes: 45 additions & 0 deletions docs/source/en/guides/cli.md

@@ -35,6 +35,20 @@ On Windows:

```powershell
>>> powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
```

Alternatively, you can install the `hf` CLI with a single command:

On macOS and Linux:

```bash
>>> curl -LsSf https://hf.co/cli/install.sh | sh
```

On Windows:

```powershell
>>> powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
```

Once installed, you can check that the CLI is correctly set up:

@@ -1016,3 +1030,34 @@

```bash
# Delete a scheduled job
>>> hf jobs scheduled delete <scheduled_job_id>
```

## hf endpoints

Use `hf endpoints` to list, deploy, describe, and manage Inference Endpoints directly from the terminal. The legacy
`hf inference-endpoints` alias remains available for compatibility.

```bash
# List endpoints in your namespace
>>> hf endpoints ls

# Deploy an endpoint from the Model Catalog
>>> hf endpoints catalog deploy --repo openai/gpt-oss-120b --name my-endpoint

# Deploy an endpoint from the Hugging Face Hub
>>> hf endpoints deploy my-endpoint --repo gpt2 --framework pytorch --accelerator cpu --instance-size x2 --instance-type intel-icl

# List catalog entries
>>> hf endpoints catalog ls

# Show status and metadata
>>> hf endpoints describe my-endpoint

# Pause the endpoint
>>> hf endpoints pause my-endpoint

# Delete without confirmation prompt
>>> hf endpoints delete my-endpoint --yes
```

> [!TIP]
> Add `--namespace` to target an organization and `--token` to override the authentication token.
46 changes: 46 additions & 0 deletions docs/source/en/guides/inference_endpoints.md

@@ -33,6 +33,16 @@

```py
... )
```

Or via CLI:

```bash
hf endpoints deploy my-endpoint-name --repo gpt2 --framework pytorch --accelerator cpu --vendor aws --region us-east-1 --instance-size x2 --instance-type intel-icl --task text-generation

# Deploy from the catalog with a single command
hf endpoints catalog deploy my-endpoint-name --repo openai/gpt-oss-120b
```
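
For reference, here is a minimal Python sketch of the corresponding [`create_inference_endpoint`] call. The keyword arguments mirror the CLI flags above; adjust the values to your own setup:

```py
# A minimal sketch mirroring the CLI flags above; values are illustrative.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x2",
    instance_type="intel-icl",
)
```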


In this example, we created a `protected` Inference Endpoint named `"my-endpoint-name"` to serve [gpt2](https://huggingface.co/gpt2) for `text-generation`. A `protected` Inference Endpoint means your token is required to access the API. We also need to provide additional information to configure the hardware requirements, such as vendor, region, accelerator, instance type, and size. You can check out the list of available resources [here](https://api.endpoints.huggingface.cloud/#/v2%3A%3Aprovider/list_vendors). Alternatively, you can create an Inference Endpoint manually using the [Web interface](https://ui.endpoints.huggingface.co/new) for convenience. Refer to this [guide](https://huggingface.co/docs/inference-endpoints/guides/advanced) for details on advanced settings and their usage.

The value returned by [`create_inference_endpoint`] is an [`InferenceEndpoint`] object:
@@ -42,6 +52,12 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2', status='pending', url=None)
```

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
```

It's a dataclass that holds information about the endpoint. You can access important attributes such as `name`, `repository`, `status`, `task`, `created_at`, `updated_at`, etc. If you need it, you can also access the raw response from the server with `endpoint.raw`.
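
For example, a minimal sketch of reading these attributes (reusing the `my-endpoint-name` endpoint from above):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
print(endpoint.name, endpoint.repository, endpoint.status, endpoint.task)
print(endpoint.created_at, endpoint.updated_at)
print(endpoint.raw)  # raw server response, as a dict
```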

Once your Inference Endpoint is created, you can find it on your [personal dashboard](https://ui.endpoints.huggingface.co/).
@@ -101,6 +117,14 @@

```py
[InferenceEndpoint(name='aws-starchat-beta', namespace='huggingface', repository='HuggingFaceH4/starchat-beta', status='paused', url=None), ...]
```

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
hf endpoints ls --namespace huggingface
hf endpoints ls --namespace '*'
```
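
In Python, the equivalent calls look like this (a minimal sketch; namespace values are reused from the CLI examples):

```py
from huggingface_hub import get_inference_endpoint, list_inference_endpoints

endpoint = get_inference_endpoint("my-endpoint-name")
endpoints = list_inference_endpoints(namespace="huggingface")  # a specific organization
all_endpoints = list_inference_endpoints(namespace="*")        # all namespaces you have access to
```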

## Check deployment status

In the rest of this guide, we will assume that we have an [`InferenceEndpoint`] object called `endpoint`. You might have noticed that the endpoint has a `status` attribute of type [`InferenceEndpointStatus`]. When the Inference Endpoint is deployed and accessible, the status should be `"running"` and the `url` attribute is set:
@@ -117,6 +141,12 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2', status='pending', url=None)
```
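
A minimal sketch of checking the status from Python ([`~InferenceEndpoint.fetch`] refreshes the data from the server):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.fetch()  # refresh `status` and `url` from the server
if endpoint.status == "running":
    print(f"Endpoint is ready at {endpoint.url}")
```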

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
```

Instead of fetching the Inference Endpoint status while waiting for it to run, you can directly call [`~InferenceEndpoint.wait`]. This helper takes a `timeout` and a `fetch_every` parameter (both in seconds) and blocks the thread until the Inference Endpoint is deployed. The default values are `None` (no timeout) and `5` seconds, respectively.
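
For example, a minimal sketch (parameter names follow the description above; treat them as an assumption if your version differs):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.wait(timeout=300, fetch_every=10)  # block up to 5 minutes, polling every 10 seconds
print(endpoint.status, endpoint.url)        # "running" and a populated URL once deployed
```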

@@ -189,6 +219,14 @@

```py
# Endpoint is not 'running' but still has a URL and will restart on first call.
```

Or via CLI:

```bash
hf endpoints pause my-endpoint-name
hf endpoints resume my-endpoint-name
hf endpoints scale-to-zero my-endpoint-name
```
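
The Python equivalents are methods on the [`InferenceEndpoint`] object; a minimal sketch:

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.pause()          # releases resources; must be resumed manually
endpoint.resume()
endpoint.scale_to_zero()  # restarts automatically on the next request
```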

### Update model or hardware requirements

In some cases, you might also want to update your Inference Endpoint without creating a new one. You can update either the hosted model or the hardware requirements used to run it with [`~InferenceEndpoint.update`]:
@@ -207,6 +245,14 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2-large', status='pending', url=None)
```
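
The same updates from Python, as a minimal sketch (keyword names mirror the CLI flags above):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.update(repository="gpt2-large")       # change the served model
endpoint.update(min_replica=2, max_replica=6)  # change the replica range
```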

Or via CLI:

```bash
hf endpoints update my-endpoint-name --repo gpt2-large
hf endpoints update my-endpoint-name --min-replica 2 --max-replica 6
hf endpoints update my-endpoint-name --accelerator cpu --instance-size x4 --instance-type intel-icl
```

### Delete the endpoint

Finally, if you won't use the Inference Endpoint anymore, you can simply call [`~InferenceEndpoint.delete()`].
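
A minimal sketch (deletion is permanent and cannot be undone):

```py
from huggingface_hub import get_inference_endpoint

get_inference_endpoint("my-endpoint-name").delete()
```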