
Commit e0e7f2d

alex3267006 authored and Blair Briggs committed
{AKS}: introduce agent-init subcommand for llm setup (Azure#9268)
1 parent b32d136 commit e0e7f2d

22 files changed: +1350 -268 lines

src/aks-agent/HISTORY.rst

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,11 @@ To release a new version, please select a new version number (usually plus 1 to
 Pending
 +++++++
 
+1.0.0b6
++++++++
+* Introduce the new `az aks agent-init` command for better CLI interaction.
+* Separate LLM configuration from the main agent command for improved clarity and extensibility.
+
 1.0.0b5
 +++++++
 * Bump holmesgpt to 0.15.0 - Enhanced AI debugging experience and bug fixes

src/aks-agent/README.rst

Lines changed: 59 additions & 17 deletions
@@ -4,32 +4,41 @@ Azure CLI AKS Agent Extension
 Introduction
 ============
 
-The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that
-helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language
-Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer
-natural-language questions about your cluster (for example, "Why are my pods not starting?")
-and can investigate issues in both interactive and non-interactive (batch) modes.
+
+The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer natural-language questions about your cluster (for example, "Why are my pods not starting?") and can investigate issues in both interactive and non-interactive (batch) modes.
+
+New in this version: **az aks agent-init** command for easy LLM model configuration!
+
+You can now use `az aks agent-init` to interactively add and configure LLM models before asking questions. This command guides you through the setup process, allowing you to add multiple models as needed. When asking questions with `az aks agent`, you can:
+
+- Use `--config-file` to specify your own model configuration file
+- Use `--model` to select a previously configured model
+- If neither is provided, the last configured LLM will be used by default
+
+This makes it much easier to manage and switch between multiple models for your AKS troubleshooting workflows.
 
 Key capabilities
 ----------------
 
+
 - Interactive and non-interactive modes (use --no-interactive for batch runs).
-- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via environment variables.
-- Configurable via a JSON/YAML config file provided with --config-file.
+- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via interactive configuration.
+- **Easy model setup with `az aks agent-init`**: interactively add and configure LLM models; run it multiple times to add more models.
+- Configurable via a JSON/YAML config file provided with --config-file, or select a model with --model.
+- If no config or model is specified, the last configured LLM is used automatically.
 - Control echo and tool output visibility with --no-echo-request and --show-tool-output.
 - Refresh the available toolsets with --refresh-toolsets.
 - Stay in traditional toolset mode by default, or opt in to aks-mcp integration with ``--aks-mcp`` when you need the enhanced capabilities.
 
 Prerequisites
 -------------
-
-Before using the agent, make sure provider-specific environment variables are set. For
-example, Azure OpenAI typically requires AZURE_API_BASE, AZURE_API_VERSION, and AZURE_API_KEY,
-while OpenAI requires OPENAI_API_KEY. For more details about supported providers and required
+
+No need to manually set environment variables! All model and credential information can be configured interactively using `az aks agent-init`.
+For more details about supported model providers and required
 variables, see: https://docs.litellm.ai/docs/providers
 
+
 Quick start and examples
-========================
+=========================
 
 Install the extension
 ---------------------
@@ -38,25 +47,58 @@ Install the extension
 
     az extension add --name aks-agent
 
-Run the agent (Azure OpenAI example)
+Configure LLM models interactively
+----------------------------------
+
+.. code-block:: bash
+
+    az aks agent-init
+
+This command will guide you through adding a new LLM model. You can run it multiple times to add more models or update existing models. All configured models are saved locally and can be selected when asking questions.
+
+Run the agent (Azure OpenAI example)
 -----------------------------------
 
+**1. Use the last configured model (no extra parameters needed):**
+
 .. code-block:: bash
 
-    export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
-    export AZURE_API_VERSION="2025-01-01-preview"
-    export AZURE_API_KEY="sk-xxx"
+    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup
+
+**2. Specify a particular model you have configured:**
+
+.. code-block:: bash
 
     az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
 
+**3. Use a custom config file:**
+
+.. code-block:: bash
+
+    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml
+
+
 Run the agent (OpenAI example)
 ------------------------------
 
+**1. Use the last configured model (no extra parameters needed):**
+
 .. code-block:: bash
 
-    export OPENAI_API_KEY="sk-xxx"
+    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup
+
+**2. Specify a particular model you have configured:**
+
+.. code-block:: bash
+
     az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
 
+**3. Use a custom config file:**
+
+.. code-block:: bash
+
+    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml
+
 Run in non-interactive batch mode
 ---------------------------------
 
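The model-selection behavior described in the new README introduction (an explicit --config-file wins, then --model, otherwise the most recently configured model) can be sketched roughly as follows. This is only an illustration of the documented precedence, not the extension's actual implementation; the stored entries are made-up examples in the `llms` format that `az aks agent-init` writes.

```python
# Illustrative sketch of the documented precedence; not the extension's code.
def resolve_llm(stored_llms, config_file_settings=None, model=None):
    if config_file_settings is not None:     # --config-file takes precedence
        return config_file_settings
    if model:                                # --model picks a previously configured entry
        return next((m for m in stored_llms if m.get("MODEL_NAME") == model), None)
    return stored_llms[-1] if stored_llms else None   # default: last configured model


stored = [
    {"provider": "openai", "MODEL_NAME": "gpt-4o"},
    {"provider": "azure", "MODEL_NAME": "gpt-4.1"},   # most recently configured
]
print(resolve_llm(stored))                   # -> the azure/gpt-4.1 entry
print(resolve_llm(stored, model="gpt-4o"))   # -> the openai/gpt-4o entry
```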

src/aks-agent/azext_aks_agent/_help.py

Lines changed: 23 additions & 8 deletions
@@ -16,7 +16,7 @@
 short-summary: Run AI assistant to analyze and troubleshoot Kubernetes clusters.
 long-summary: |-
     This command allows you to ask questions about your Azure Kubernetes cluster and get answers using AI models.
-    Environment variables must be set to use the AI model, please refer to https://docs.litellm.ai/docs/providers to learn more about supported AI providers and models and required environment variables.
+    No need to manually set environment variables! All model and credential information can be configured interactively using `az aks agent-init` or via a config file.
 parameters:
   - name: --name -n
     type: string
@@ -36,7 +36,7 @@
       Note: For Azure OpenAI, it is recommended to set the deployment name as the model name until https://github.com/BerriAI/litellm/issues/13950 is resolved.
   - name: --api-key
     type: string
-    short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY).
+    short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY). (Deprecated)
   - name: --config-file
     type: string
     short-summary: Path to configuration file.
@@ -63,23 +63,25 @@
     short-summary: Enable AKS MCP integration for enhanced capabilities. Traditional mode is the default.
 
 examples:
+  - name: Ask about pod issues in the cluster with last configured model
+    text: |-
+      az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup
   - name: Ask about pod issues in the cluster with Azure OpenAI
     text: |-
-      export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
-      export AZURE_API_VERSION="2025-01-01-preview"
-      export AZURE_API_KEY="sk-xxx"
       az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/gpt-4.1
   - name: Ask about pod issues in the cluster with OpenAI
     text: |-
-      export OPENAI_API_KEY="sk-xxx"
       az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
   - name: Run agent with config file
     text: |
       az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --name MyManagedCluster --resource-group MyResourceGroup
       Here is an example of config file:
       ```json
-      model: "azure/gpt-4.1"
-      api_key: "..."
+      llms:
+        - provider: "azure"
+          MODEL_NAME: "gpt-4.1"
+          AZURE_API_BASE: "https://<your-base-url>"
+          AZURE_API_KEY: "<your-api-key>"
       # define a list of mcp servers, mcp server can be defined
       mcp_servers:
         aks_mcp:
@@ -131,3 +133,16 @@
   - name: Refresh toolsets to get the latest available tools
     text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
 """
+
+helps[
+    "aks agent-init"
+] = """
+type: command
+short-summary: Initialize and validate LLM provider/model configuration for AKS agent.
+long-summary: |-
+    This command interactively guides you to select an LLM provider and model, validates the connection, and saves the configuration for later use.
+    You can run this command multiple times to add or update different model configurations.
+examples:
+  - name: Initialize configuration for Azure OpenAI, OpenAI, or other LLMs
+    text: az aks agent-init
+"""

src/aks-agent/azext_aks_agent/agent/agent.py

Lines changed: 2 additions & 0 deletions
@@ -371,6 +371,7 @@ async def _setup_mcp_mode(mcp_manager, config_file: str, model: str, api_key: st
 
     # Generate enhanced MCP config
     mcp_config_dict = ConfigurationGenerator.generate_mcp_config(base_config_dict, server_url)
+    mcp_config_dict.pop("llms", None)  # Remove existing llms to avoid conflicts
 
     # Create temporary config file with MCP settings
     with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:
@@ -739,6 +740,7 @@ def _setup_traditional_mode_sync(config_file: str, model: str, api_key: str,
 
     # Generate traditional config
     traditional_config_dict = ConfigurationGenerator.generate_traditional_config(base_config_dict)
+    traditional_config_dict.pop("llms", None)  # Remove existing llms to avoid conflicts
 
     # Create temporary config and load
     with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+
+import os
+from typing import List, Dict, Optional
+import yaml
+
+from azure.cli.core.api import get_config_dir
+from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME
+
+
+class LLMConfigManager:
+    """Manages loading and saving LLM configuration from/to a YAML file."""
+
+    def __init__(self, config_path=None):
+        if config_path is None:
+            config_path = os.path.join(
+                get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME)
+        self.config_path = os.path.expanduser(config_path)
+
+    def save(self, provider_name: str, params: dict):
+        configs = self.load()
+        if not isinstance(configs, Dict):
+            configs = {}
+
+        models = configs.get("llms", [])
+        model_name = params.get("MODEL_NAME")
+        if not model_name:
+            raise ValueError("MODEL_NAME is required to save configuration.")
+
+        # Check if model already exists, update it and move it to the last;
+        # otherwise, append new
+        models = [
+            cfg for cfg in models if not (
+                cfg.get("provider") == provider_name and cfg.get("MODEL_NAME") == model_name)]
+        models.append({"provider": provider_name, **params})
+
+        configs["llms"] = models
+
+        with open(self.config_path, "w") as f:
+            yaml.safe_dump(configs, f, sort_keys=False)
+
+    def load(self):
+        """Load configurations from the YAML file."""
+        if not os.path.exists(self.config_path):
+            return {}
+        with open(self.config_path, "r") as f:
+            configs = yaml.safe_load(f)
+        return configs if isinstance(configs, Dict) else {}
+
+    def get_list(self) -> List[Dict]:
+        """Get the list of all model configurations"""
+        return self.load()["llms"] if self.load(
+        ) and "llms" in self.load() else []
+
+    def get_latest(self) -> Optional[Dict]:
+        """Get the last model configuration"""
+        model_configs = self.get_list()
+        if model_configs:
+            return model_configs[-1]
+        raise ValueError(
+            "No configurations found. Please run `az aks agent-init`")
+
+    def get_specific(
+            self,
+            provider_name: str,
+            model_name: str) -> Optional[Dict]:
+        """
+        Get specific model configuration by provider and model name during Q&A with --model provider/model
+        """
+        model_configs = self.get_list()
+        for cfg in model_configs:
+            if cfg.get("provider") == provider_name and cfg.get(
+                    "MODEL_NAME") == model_name:
+                return cfg
+        raise ValueError(
+            f"No configuration found for provider '{provider_name}' with model '{model_name}'. "
+            f"Please run `az aks agent-init`")
+
+    def is_config_complete(self, config, provider_schema):
+        """
+        Check if the given config has all required keys and valid values as per the provider schema.
+        """
+        for key, meta in provider_schema.items():
+            if meta.get("validator") and not meta["validator"](
+                    config.get(key)):
+                return False
+        return True
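A minimal usage sketch of the configuration manager added above. The import path and config file location below are placeholders, since this diff view does not show where the new module lives; the calls mirror the methods defined in the file.

```python
# Hypothetical import path; the real module location is not shown in this view.
from llm_config_manager import LLMConfigManager

# Point the manager at an explicit file so the sketch does not touch the
# Azure CLI config directory used by default.
manager = LLMConfigManager(config_path="~/aks_agent_llms.yaml")

# Save or update an entry; the most recently saved entry becomes the default.
manager.save("azure", {
    "MODEL_NAME": "gpt-4.1",
    "AZURE_API_BASE": "https://my-azureopenai-service.openai.azure.com/",
    "AZURE_API_KEY": "<your-api-key>",
})

print(manager.get_latest())                      # last configured model (the default)
print(manager.get_specific("azure", "gpt-4.1"))  # lookup used for --model provider/model
```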
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+from typing import List, Tuple
+from rich.console import Console
+from .base import LLMProvider
+from .azure_provider import AzureProvider
+from .openai_provider import OpenAIProvider
+from .anthropic_provider import AnthropicProvider
+from .gemini_provider import GeminiProvider
+from .openai_compatible_provider import OpenAICompatibleProvider
+
+
+console = Console()
+
+_PROVIDER_CLASSES: List[LLMProvider] = [
+    AzureProvider,
+    OpenAIProvider,
+    AnthropicProvider,
+    GeminiProvider,
+    OpenAICompatibleProvider,
+    # Add new providers here
+]
+
+PROVIDER_REGISTRY = {}
+for cls in _PROVIDER_CLASSES:
+    key = cls.name.lower()
+    if key not in PROVIDER_REGISTRY:
+        PROVIDER_REGISTRY[key] = cls
+
+
+def _available_providers() -> List[str]:
+    """Return a list of registered provider names (lowercase): ["azure", "openai", ...]"""
+    return list(PROVIDER_REGISTRY.keys())
+
+
+def _provider_choices_numbered() -> List[Tuple[int, str]]:
+    """Return numbered choices: [(1, "azure"), (2, "openai"), ...]."""
+    return [(i + 1, name) for i, name in enumerate(_available_providers())]
+
+
+def _get_provider_by_index(idx: int) -> LLMProvider:
+    """
+    Return provider instance by numeric index (1-based).
+    Raises ValueError if index is out of range.
+    """
+    from holmes.utils.colors import HELP_COLOR
+    if 1 <= idx <= len(_PROVIDER_CLASSES):
+        console.print("You selected provider:", _PROVIDER_CLASSES[idx - 1].name, style=f"bold {HELP_COLOR}")
+        return _PROVIDER_CLASSES[idx - 1]()
+    raise ValueError(f"Invalid provider index: {idx}")
+
+
+def prompt_provider_choice() -> LLMProvider:
+    """
+    Show a numbered menu and return the chosen provider instance.
+    Keeps prompting until a valid selection is made.
+    """
+    from holmes.utils.colors import HELP_COLOR, ERROR_COLOR
+    from holmes.interactive import SlashCommands
+    choices = _provider_choices_numbered()
+    if not choices:
+        raise ValueError("No providers are registered.")
+    while True:
+        for idx, name in choices:
+            console.print(f" {idx}. {name}", style=f"bold {HELP_COLOR}")
+        sel_idx = console.input(
+            f"[bold {HELP_COLOR}]Enter the number of your LLM provider: [/bold {HELP_COLOR}]").strip().lower()
+
+        if sel_idx == "/exit":
+            raise SystemExit(0)
+        try:
+            return _get_provider_by_index(int(sel_idx))
+        except ValueError as e:
+            console.print(
+                f"{e}. Please enter a valid number, or type '{SlashCommands.EXIT.command}' to exit.",
+                style=f"{ERROR_COLOR}")
+
+
+__all__ = [
+    "PROVIDER_REGISTRY",
+    "prompt_provider_choice",
+]
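A short sketch of how the registry and interactive prompt above might be driven by the agent-init flow. The package import path is a placeholder (the file's location is not shown here), and the provider classes themselves come from sibling modules outside this diff.

```python
# Hypothetical import path for the providers package shown above.
from providers import PROVIDER_REGISTRY, prompt_provider_choice

# Non-interactive lookup when the provider name is already known:
provider_cls = PROVIDER_REGISTRY.get("azure")
provider = provider_cls() if provider_cls is not None else None

# Interactive selection: prints the numbered menu, re-prompts on invalid
# input, and exits cleanly if the user types /exit.
provider = prompt_provider_choice()
```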
