Merged

Changes from all commits (58 commits):

- `264e36a` [1.0] Httpx migration (#3328) (Wauplin, Sep 10, 2025)
- `0f6d2f4` Bump minimal version to Python3.9 (#3343) (Wauplin, Sep 10, 2025)
- `398492c` Remove `HfFolder` and `InferenceAPI` classes (#3344) (Wauplin, Sep 11, 2025)
- `e1cf5b6` [v1.0] Remove more deprecated stuff (#3345) (Wauplin, Sep 11, 2025)
- `6bd2eb0` [v1.0] Remove `Repository` class (#3346) (Wauplin, Sep 12, 2025)
- `1dd1bc5` bump to 1.0.0.dev0 (Wauplin, Sep 12, 2025)
- `7b2b982` Remove _deprecate_positional_args on login methods (#3349) (Wauplin, Sep 12, 2025)
- `c36962a` [v1.0] Remove imports kept only for backward compatibility (#3350) (Wauplin, Sep 12, 2025)
- `0844d9c` [v1.0] Remove keras2 utilities (#3352) (Wauplin, Sep 12, 2025)
- `9b3258e` [v1.0] Remove anything tensorflow-related + deps (#3354) (Wauplin, Sep 12, 2025)
- `1044d37` Release: v1.0.0.rc0 (Wauplin, Sep 15, 2025)
- `5062377` [v1.0] Update "HTTP backend" docs + `git_vs_http` guide (#3357) (Wauplin, Sep 17, 2025)
- `0e021d4` Refactor CLI implementation using Typer (#3372) (hanouticelina, Sep 18, 2025)
- `0008034` Make HfHubHTTPError inherit from OSError (#3387) (Wauplin, Sep 24, 2025)
- `5fa1931` Release: v1.0.0.rc1 (Wauplin, Sep 24, 2025)
- `b8d6092` Add new HF commands (#3384) (hanouticelina, Sep 25, 2025)
- `f3334dd` Release: v1.0.0.rc2 (Wauplin, Sep 25, 2025)
- `ee6c65a` Document new HF commands (#3393) (hanouticelina, Sep 26, 2025)
- `21edca8` Add cross-platform CLI Installers (#3378) (hanouticelina, Sep 29, 2025)
- `a1c1474` update installers paths (#3400) (hanouticelina, Sep 29, 2025)
- `5d81092` Merge branch 'main' into v1.0-release (Wauplin, Oct 1, 2025)
- `470a7ee` [v1.0] feat: add migration guide for v1.0 (#3360) (google-labs-jules[bot], Oct 1, 2025)
- `836dff1` Merge branch 'main' into v1.0-release (Wauplin, Oct 2, 2025)
- `e9fa836` prepare rc3 (Wauplin, Oct 2, 2025)
- `942fe42` Remove contrib test suite (#3403) (Wauplin, Oct 7, 2025)
- `379c06a` Strict typed dict validator (#3408) (Wauplin, Oct 7, 2025)
- `c5c1c5f` Implement dry run mode in download CLI (#3407) (Wauplin, Oct 7, 2025)
- `7505554` Remove `huggingface-cli` entirely in favor of `hf` (#3404) (Wauplin, Oct 7, 2025)
- `6af2baa` Fix proxy environment variables not used in v1.0 (#3412) (Wauplin, Oct 7, 2025)
- `f305cce` reset (Wauplin, Oct 7, 2025)
- `2a6feff` Merge branch 'main' into v1.0-release (Wauplin, Oct 8, 2025)
- `c154d28` Release: v1.0.0.rc3 (Wauplin, Oct 8, 2025)
- `1ddb16f` [hf CLI] check for updates and notify user (#3418) (Wauplin, Oct 8, 2025)
- `1c39425` Fix forward ref validation if total false (#3423) (Wauplin, Oct 8, 2025)
- `59b160c` Release: v1.0.0.rc4 (Wauplin, Oct 8, 2025)
- `888dbda` Disable rich in CLI (#3427) (Wauplin, Oct 9, 2025)
- `9005007` Print version only in CLI (Wauplin, Oct 9, 2025)
- `069ee68` Merge branch 'v1.0-release' of github.com:huggingface/huggingface_hub… (Wauplin, Oct 9, 2025)
- `4faf7e5` add inference endpoints cli (hanouticelina, Oct 9, 2025)
- `30c13d6` fix naming (hanouticelina, Oct 9, 2025)
- `e670188` update docs (hanouticelina, Oct 9, 2025)
- `f387a11` Merge branch 'v1.0-release' of github.com:huggingface/huggingface_hub… (hanouticelina, Oct 9, 2025)
- `b49a70a` wording (hanouticelina, Oct 9, 2025)
- `7b7b122` remove logging (hanouticelina, Oct 9, 2025)
- `0862c4a` don't instantiate logger when not needed (hanouticelina, Oct 9, 2025)
- `d81a59c` refactor (hanouticelina, Oct 9, 2025)
- `6a50b0b` remove unused import (hanouticelina, Oct 9, 2025)
- `c5b0638` nit (hanouticelina, Oct 9, 2025)
- `5b4111d` nit (hanouticelina, Oct 9, 2025)
- `7f30eb3` Apply suggestions from code review (hanouticelina, Oct 13, 2025)
- `52809ad` use docstring (hanouticelina, Oct 13, 2025)
- `5f570c2` rework CLI UX (hanouticelina, Oct 13, 2025)
- `e83f256` Merge branch 'main' of github.com:huggingface/huggingface_hub into in… (hanouticelina, Nov 4, 2025)
- `abf6073` fix merge conflicts (hanouticelina, Nov 4, 2025)
- `2f63862` some fixes (hanouticelina, Nov 4, 2025)
- `7f2dee1` fix (hanouticelina, Nov 4, 2025)
- `8893053` generate cli reference (hanouticelina, Nov 4, 2025)
- `e4bf7ee` Update src/huggingface_hub/cli/inference_endpoints.py (hanouticelina, Nov 4, 2025)

45 changes: 45 additions & 0 deletions docs/source/en/guides/cli.md

@@ -35,6 +35,20 @@ On Windows:

```powershell
>>> powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
```

Alternatively, you can install the `hf` CLI with a single command:

On macOS and Linux:

```bash
>>> curl -LsSf https://hf.co/cli/install.sh | sh
```

On Windows:

```powershell
>>> powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
```

Once installed, you can check that the CLI is correctly set up:

@@ -1016,3 +1030,34 @@

```bash
# Delete a scheduled job
>>> hf jobs scheduled delete <scheduled_job_id>
```

## hf endpoints

Use `hf endpoints` to list, deploy, describe, and manage Inference Endpoints directly from the terminal. The legacy
`hf inference-endpoints` alias remains available for compatibility.

```bash
# List endpoints in your namespace
>>> hf endpoints ls

# Deploy an endpoint from the Model Catalog
>>> hf endpoints catalog deploy --repo openai/gpt-oss-120b --name my-endpoint

# Deploy an endpoint from the Hugging Face Hub
>>> hf endpoints deploy my-endpoint --repo gpt2 --framework pytorch --accelerator cpu --instance-size x2 --instance-type intel-icl

# List catalog entries
>>> hf endpoints catalog ls

# Show status and metadata
>>> hf endpoints describe my-endpoint

# Pause the endpoint
>>> hf endpoints pause my-endpoint

# Delete without confirmation prompt
>>> hf endpoints delete my-endpoint --yes
```

> [!TIP]
> Add `--namespace` to target an organization and `--token` to override the authentication token.
46 changes: 46 additions & 0 deletions docs/source/en/guides/inference_endpoints.md

@@ -33,6 +33,16 @@

```py
... )
```

Or via CLI:

```bash
hf endpoints deploy my-endpoint-name --repo gpt2 --framework pytorch --accelerator cpu --vendor aws --region us-east-1 --instance-size x2 --instance-type intel-icl --task text-generation

# Deploy from the catalog with a single command
hf endpoints catalog deploy my-endpoint-name --repo openai/gpt-oss-120b
```
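
For reference, here is a minimal Python sketch of the corresponding [`create_inference_endpoint`] call. The keyword arguments mirror the CLI flags above; adjust the values to your own setup:

```py
# A minimal sketch mirroring the CLI flags above; values are illustrative.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint-name",
    repository="gpt2",
    framework="pytorch",
    task="text-generation",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x2",
    instance_type="intel-icl",
)
```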


In this example, we created a `protected` Inference Endpoint named `"my-endpoint-name"` to serve [gpt2](https://huggingface.co/gpt2) for `text-generation`. A `protected` Inference Endpoint means your token is required to access the API. We also need to provide additional information to configure the hardware requirements, such as vendor, region, accelerator, instance type, and size. You can check out the list of available resources [here](https://api.endpoints.huggingface.cloud/#/v2%3A%3Aprovider/list_vendors). Alternatively, you can create an Inference Endpoint manually using the [Web interface](https://ui.endpoints.huggingface.co/new) for convenience. Refer to this [guide](https://huggingface.co/docs/inference-endpoints/guides/advanced) for details on advanced settings and their usage.

The value returned by [`create_inference_endpoint`] is an [`InferenceEndpoint`] object:
@@ -42,6 +52,12 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2', status='pending', url=None)
```

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
```

It's a dataclass that holds information about the endpoint. You can access important attributes such as `name`, `repository`, `status`, `task`, `created_at`, `updated_at`, etc. If you need it, you can also access the raw response from the server with `endpoint.raw`.
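
For example, a minimal sketch of reading these attributes (reusing the `my-endpoint-name` endpoint from above):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
print(endpoint.name, endpoint.repository, endpoint.status, endpoint.task)
print(endpoint.created_at, endpoint.updated_at)
print(endpoint.raw)  # raw server response, as a dict
```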

Once your Inference Endpoint is created, you can find it on your [personal dashboard](https://ui.endpoints.huggingface.co/).
@@ -101,6 +117,14 @@

```py
[InferenceEndpoint(name='aws-starchat-beta', namespace='huggingface', repository='HuggingFaceH4/starchat-beta', status='paused', url=None), ...]
```

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
hf endpoints ls --namespace huggingface
hf endpoints ls --namespace '*'
```
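
In Python, the equivalent calls look like this (a minimal sketch; namespace values are reused from the CLI examples):

```py
from huggingface_hub import get_inference_endpoint, list_inference_endpoints

endpoint = get_inference_endpoint("my-endpoint-name")
endpoints = list_inference_endpoints(namespace="huggingface")  # a specific organization
all_endpoints = list_inference_endpoints(namespace="*")        # all namespaces you have access to
```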

## Check deployment status

In the rest of this guide, we will assume that we have an [`InferenceEndpoint`] object called `endpoint`. You might have noticed that the endpoint has a `status` attribute of type [`InferenceEndpointStatus`]. When the Inference Endpoint is deployed and accessible, the status should be `"running"` and the `url` attribute is set:
@@ -117,6 +141,12 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2', status='pending', url=None)
```
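
A minimal sketch of checking the status from Python ([`~InferenceEndpoint.fetch`] refreshes the data from the server):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.fetch()  # refresh `status` and `url` from the server
if endpoint.status == "running":
    print(f"Endpoint is ready at {endpoint.url}")
```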

Or via CLI:

```bash
hf endpoints describe my-endpoint-name
```

Instead of fetching the Inference Endpoint status while waiting for it to run, you can directly call [`~InferenceEndpoint.wait`]. This helper takes a `timeout` and a `fetch_every` parameter (both in seconds) and blocks the thread until the Inference Endpoint is deployed. The default values are `None` (no timeout) and `5` seconds, respectively.
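
For example, a minimal sketch (parameter names follow the description above; treat them as an assumption if your version differs):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.wait(timeout=300, fetch_every=10)  # block up to 5 minutes, polling every 10 seconds
print(endpoint.status, endpoint.url)        # "running" and a populated URL once deployed
```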

@@ -189,6 +219,14 @@

```py
# Endpoint is not 'running' but still has a URL and will restart on first call.
```

Or via CLI:

```bash
hf endpoints pause my-endpoint-name
hf endpoints resume my-endpoint-name
hf endpoints scale-to-zero my-endpoint-name
```
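
The Python equivalents are methods on the [`InferenceEndpoint`] object; a minimal sketch:

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.pause()          # releases resources; must be resumed manually
endpoint.resume()
endpoint.scale_to_zero()  # restarts automatically on the next request
```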

### Update model or hardware requirements

In some cases, you might also want to update your Inference Endpoint without creating a new one. You can update either the hosted model or the hardware requirements used to run it with [`~InferenceEndpoint.update`]:
@@ -207,6 +245,14 @@

```py
InferenceEndpoint(name='my-endpoint-name', namespace='Wauplin', repository='gpt2-large', status='pending', url=None)
```
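
The same updates from Python, as a minimal sketch (keyword names mirror the CLI flags above):

```py
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.update(repository="gpt2-large")       # change the served model
endpoint.update(min_replica=2, max_replica=6)  # change the replica range
```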

Or via CLI:

```bash
hf endpoints update my-endpoint-name --repo gpt2-large
hf endpoints update my-endpoint-name --min-replica 2 --max-replica 6
hf endpoints update my-endpoint-name --accelerator cpu --instance-size x4 --instance-type intel-icl
```

### Delete the endpoint

Finally, if you won't use the Inference Endpoint anymore, you can simply call [`~InferenceEndpoint.delete()`].
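
A minimal sketch (deletion is permanent and cannot be undone):

```py
from huggingface_hub import get_inference_endpoint

get_inference_endpoint("my-endpoint-name").delete()
```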