From 943d702b3c7ff931a9407304f093de810e34b3a5 Mon Sep 17 00:00:00 2001
From: pankit-eng
Date: Mon, 13 Oct 2025 17:27:30 -0400
Subject: [PATCH 01/16] Add OpenEnv 0.1 RFC for execution environment

---
 rfcs/OpenEnv-0.1-RFC.md | 190 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 rfcs/OpenEnv-0.1-RFC.md

diff --git a/rfcs/OpenEnv-0.1-RFC.md b/rfcs/OpenEnv-0.1-RFC.md
new file mode 100644
index 00000000..d046013f
--- /dev/null
+++ b/rfcs/OpenEnv-0.1-RFC.md
@@ -0,0 +1,190 @@
# RFC: EnvTorch Framework for agent execution environments

**Status**: Request for Comments (RFC)
**Created**: October 2025
**Authors**: EnvTorch Contributors

## Summary

An e2e framework for creating, deploying, and using isolated execution environments for agentic RL training, built using Gymnasium-style APIs. It provides a clean client-server architecture where environments run as FastAPI servers in Docker containers, and clients interact with them via type-safe HTTP APIs.

## Motivation

### Problem Statement

Building execution environments for AI agents, code execution, or computational tasks typically involves:
- Complex setup and dependency management
- Security concerns with code execution
- Difficulty in scaling and deploying environments
- Lack of standardized interfaces between environments and their clients

### Goals

1. **Simplicity**: Simple APIs to interact with the environment from RL training code
2. **Type Safety**: Strongly-typed actions, observations, and state
3. **Isolation**: Each environment runs in its own Docker container
4. **Observability**: Leverage the side-car container pattern to observe (action, observation) tuples for an RL training episode.


## Design

### Architecture Overview

```
┌──────────────────────────────────────────────────────────┐
│               RL Code (Client Application)               │
│   ┌────────────────┐          ┌──────────────────┐       │
│   │  Environment   │          │   Environment    │       │
│   │    Client      │          │      Client      │       │
│   │ (HTTPEnvClient)│          │ (HTTPEnvClient)  │       │
│   └────────┬───────┘          └────────┬─────────┘       │
└────────────┼───────────────────────────┼─────────────────┘
             │ HTTP (reset, step, state) │ HTTP
             │                           │
┌────────────▼───────────────────────────▼─────────────────┐
│               Docker Containers (Isolated)               │
│  ┌──────────────────────┐    ┌──────────────────────┐    │
│  │   FastAPI Server     │    │   FastAPI Server     │    │
│  │   Environment        │    │   Environment        │    │
│  │   Logic              │    │   Logic              │    │
│  └──────────────────────┘    └──────────────────────┘    │
└──────────────────────────────────────────────────────────┘
```

### Core Abstractions (already available on master)

#### 1. Environment (Server-Side)

```python
class Environment(ABC):
    """Base class for all environments."""

    @abstractmethod
    def reset(self) -> Observation:
        """Initialize new episode."""

    @abstractmethod
    def step(self, action: Action) -> Observation:
        """Execute action and return observation."""

    @property
    @abstractmethod
    def state(self) -> State:
        """Get current episode state."""
```

**Design Rationale**:
- Familiar interface for RL/environment practitioners
- Clear separation between action execution (step) and state management
- Abstract base class enforces the contract across all environments
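To make the contract concrete, here is a minimal sketch of a server-side environment written against this interface. It is illustrative only: the repository ships its own echo example, and the `EchoAction`/`EchoObservation` field names, the default-constructible `State`, and the optional `transform` argument to `Environment.__init__` are assumptions modeled on the other environments in this series, not guaranteed definitions.

```python
from dataclasses import dataclass

from core.env_server import Action, Environment, Observation, State, Transform


@dataclass
class EchoAction(Action):
    message: str = ""  # illustrative field name


@dataclass(kw_only=True)
class EchoObservation(Observation):
    echoed: str = ""


class EchoEnvironment(Environment):
    """Minimal environment that reflects each action back to the client."""

    def __init__(self, transform: Transform | None = None):
        super().__init__(transform=transform)
        self._state = State()

    def reset(self) -> EchoObservation:
        # Start a fresh episode with empty output.
        self._state = State()
        return EchoObservation(echoed="")

    def step(self, action: EchoAction) -> EchoObservation:  # type: ignore[override]
        # "Execute" the action by echoing its message back.
        self._state.step_count += 1
        return EchoObservation(echoed=action.message)

    @property
    def state(self) -> State:
        return self._state
```

Once defined, such an environment can be exposed over HTTP with the same FastAPI wrapper the built-in environments use (e.g. `create_fastapi_app(env, EchoAction, EchoObservation)`), so the client side stays unchanged.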
#### 2. HTTPEnvClient (Client-Side)

```python
class HTTPEnvClient(Generic[ActT, ObsT]):
    """Base class for HTTP environment clients."""

    def reset(self) -> StepResult[ObsT]:
        """Reset environment."""

    def step(self, action: ActT) -> StepResult[ObsT]:
        """Execute action."""

    def state(self) -> State:
        """Get current state."""

    def close(self) -> None:
        """Cleanup resources by signaling to the provider."""
```

**Design Rationale**:

The HTTPEnvClient serves as the primary interface for users to interact with environments, designed with several key principles:

- This base class handles all HTTP communication (requests and responses) with the environment
- Generic types (`Generic[ActT, ObsT]`) provide compile-time type safety
- Each environment's concrete client class implements parsing of step, observation, and state responses from the server into the corresponding data models
- Example: `CodingEnv(HTTPEnvClient[CodeAction, CodeObservation])`
- The `state()` method provides visibility into episode metadata
- Explicit `close()` ensures proper resource cleanup

#### 3. Container Providers

```python
class ContainerProvider(ABC):
    """Abstract base for container orchestration."""

    @abstractmethod
    def start_container(self, image: str, ...) -> str:
        """Start container and return base URL."""

    @abstractmethod
    def stop_container(self) -> None:
        """Stop and remove container."""

    @abstractmethod
    def wait_for_ready(self, base_url: str, timeout_s: float) -> None:
        """Wait for container to be ready."""
```

**Design Rationale**:
- Pluggable architecture supports multiple platforms (local Docker, K8s, other orchestration providers)
- The provider abstraction decouples the client from deployment details and management, allowing easy integration with existing orchestration solutions
- Consistent interface across all providers
- Higher-level RL frameworks can implement their own container providers to integrate with their existing orchestration solutions, as the sketch below illustrates.
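The following is a sketch of what such a custom provider could look like for local Docker. The class name `LocalDockerProvider`, the `port` argument, and the use of the Docker CLI through `subprocess` are assumptions for illustration, not the framework's shipped implementation; only the three abstract methods come from the interface above. The `/health` endpoint mirrors the health check used by the environment images in this series.

```python
import subprocess
import time
import urllib.request


class LocalDockerProvider(ContainerProvider):
    """Illustrative provider that shells out to the local Docker CLI."""

    def __init__(self) -> None:
        self._container_id: str | None = None

    def start_container(self, image: str, port: int = 8000) -> str:
        # Run the environment image detached, publishing its server port.
        self._container_id = subprocess.check_output(
            ["docker", "run", "-d", "-p", f"{port}:8000", image], text=True
        ).strip()
        return f"http://localhost:{port}"

    def stop_container(self) -> None:
        # Force-remove the container if one is running.
        if self._container_id is not None:
            subprocess.run(["docker", "rm", "-f", self._container_id], check=False)
            self._container_id = None

    def wait_for_ready(self, base_url: str, timeout_s: float) -> None:
        # Poll the health endpoint until the server answers or we time out.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                urllib.request.urlopen(f"{base_url}/health", timeout=1)
                return
            except OSError:
                time.sleep(0.5)
        raise TimeoutError(f"Environment at {base_url} did not become ready")
```

An RL platform with its own scheduler would implement the same three methods against Kubernetes or an internal fleet manager; client code that calls `reset()`/`step()` never changes.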
### Key Design Decisions

#### Decision 1: HTTP-Based Communication

**Chosen Approach**: Use HTTP/REST for client-server communication

**Rationale**:
- HTTP-based RPC is universal and more widely understood than alternatives such as gRPC or Thrift
- Easy to debug with standard tools (curl, Postman)
- Supports language-agnostic clients
- FastAPI provides excellent developer experience

#### Decision 2: Docker-Based Runtime Isolation and Packaging

**Chosen Approach**: Each environment runs in its own Docker container

**Rationale**:
- Stronger isolation boundaries than process-based isolation
- Reproducible environments with packaged dependencies
- Easy dependency management via Dockerfile
- Industry-standard tooling

#### Decision 3: Type-Safe Models

**Chosen Approach**: Use Python dataclasses with explicit types for actions, observations, and state

**Rationale**:
- Native Python support (no extra dependencies)
- Clear contracts between client and server
- IDE support for autocomplete and type checking
- Easy serialization to/from JSON


### Example Environments

**Purpose**: Test infrastructure, demonstrate patterns, verify deployments

#### Coding Environment

Executes Python code in a sandboxed environment:

```python
from envs.coding_env import CodeAction, CodingEnv

client = CodingEnv.from_docker_image("coding-env:latest")
result = client.step(CodeAction(code="print('Hello, World!')"))
print(result.observation.stdout)  # "Hello, World!\n"
print(result.observation.exit_code)  # 0
client.close()
```

From 1cfbf397d7be65d485a94432af4587c5b6ca31c5 Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Tue, 14 Oct 2025 13:21:03 -0700
Subject: [PATCH 02/16] Update OpenEnv-0.1-RFC.md

updated the name to OpenEnv
---
 rfcs/OpenEnv-0.1-RFC.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/OpenEnv-0.1-RFC.md b/rfcs/OpenEnv-0.1-RFC.md
index d046013f..d90fb6e5 100644
--- a/rfcs/OpenEnv-0.1-RFC.md
+++ b/rfcs/OpenEnv-0.1-RFC.md
@@ -1,8 +1,8 @@
-# RFC: EnvTorch Framework for agent execution environments
+# RFC: OpenEnv Framework for agent execution environments
 
 **Status**: Request for Comments (RFC)
 **Created**: October 2025
-**Authors**: EnvTorch Contributors
+**Authors**: OpenEnv Contributors
 
 ## Summary
 

From ba976f4eaf7956534cc620cf885358c93325211d Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Mon, 13 Oct 2025 09:11:07 -0700
Subject: [PATCH 03/16] Update README.md

adding a pytorch logo :)
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index caa494db..fdf73bab 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# EnvTorch: Agentic Execution Environments
+# image EnvTorch: Agentic Execution Environments
 
 An e2e framework for creating, deploying and using isolated execution environments for agentic RL training, built using Gymnasium style simple APIs.

From 2a8dc6f46b7b33b4e366c1c1c230c7ab935e84bc Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Tue, 14 Oct 2025 10:08:06 -0700
Subject: [PATCH 04/16] Update README.md

Adding an experimental warning to the readme.
---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index fdf73bab..ebd3cea4 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,13 @@ EnvTorch provides a standard for interacting with agentic execution environments
 
 In addition to making it easier for researchers and RL framework writers, we also provide tools for environment creators making it easier for them to create richer environments and make them available over familiar protocols like HTTP and packaged using canonical technologies like Docker. Environment creators can use the EnvTorch framework to create environments that are isolated, secure, and easy to deploy and use.
 
+> ⚠️ **Early Development Warning** EnvTorch is currently in an experimental
+> stage. You should expect bugs, incomplete features, and APIs that may change
+> in future versions. The project welcomes bugfixes, but to make sure things are
+> well coordinated you should discuss any significant change before starting the
+> work. It's recommended that you signal your intention to contribute in the
+> issue tracker, either by filing a new issue or by claiming an existing one.
+
 ## Architecture
 
 ### Component Overview

From e7fcfc61340f4676bcad2aff15bf703f9f0b7420 Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Tue, 14 Oct 2025 12:51:49 -0700
Subject: [PATCH 05/16] Update README.md

Creating a PR to update naming on the Readme
---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ebd3cea4..0d12b7b1 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,14 @@
-# image EnvTorch: Agentic Execution Environments
+# image OpenEnv: Agentic Execution Environments
 
 An e2e framework for creating, deploying and using isolated execution environments for agentic RL training, built using Gymnasium style simple APIs.
 
 ## Overview
 
-EnvTorch provides a standard for interacting with agentic execution environments via simple Gymnasium style APIs - step(), reset(), state(). Users of agentic execution environments can interact with the environment during RL training loops using these simple APIs.
+OpenEnv provides a standard for interacting with agentic execution environments via simple Gymnasium style APIs - step(), reset(), state(). Users of agentic execution environments can interact with the environment during RL training loops using these simple APIs.
 
-In addition to making it easier for researchers and RL framework writers, we also provide tools for environment creators making it easier for them to create richer environments and make them available over familiar protocols like HTTP and packaged using canonical technologies like Docker. Environment creators can use the EnvTorch framework to create environments that are isolated, secure, and easy to deploy and use.
+In addition to making it easier for researchers and RL framework writers, we also provide tools for environment creators making it easier for them to create richer environments and make them available over familiar protocols like HTTP and packaged using canonical technologies like Docker. Environment creators can use the OpenEnv framework to create environments that are isolated, secure, and easy to deploy and use.
 
-> ⚠️ **Early Development Warning** EnvTorch is currently in an experimental
+> ⚠️ **Early Development Warning** OpenEnv is currently in an experimental
 > stage. You should expect bugs, incomplete features, and APIs that may change
 > in future versions. 
The project welcomes bugfixes, but to make sure things are > well coordinated you should discuss any significant change before starting the From 8523bf53beb485e5c341d35a1e2d216badb557fa Mon Sep 17 00:00:00 2001 From: Davide Testuggine Date: Mon, 13 Oct 2025 14:16:23 -0700 Subject: [PATCH 06/16] Add basic ChatEnv --- src/core/env_server/__init__.py | 4 +- src/core/env_server/interfaces.py | 56 +++- src/envs/chat_env/README.md | 268 +++++++++++++++ src/envs/chat_env/__init__.py | 12 + src/envs/chat_env/client.py | 182 ++++++++++ src/envs/chat_env/models.py | 67 ++++ src/envs/chat_env/server/Dockerfile | 27 ++ src/envs/chat_env/server/__init__.py | 11 + src/envs/chat_env/server/app.py | 77 +++++ src/envs/chat_env/server/chat_environment.py | 172 ++++++++++ src/envs/chat_env/server/test_chat_env.py | 328 +++++++++++++++++++ 11 files changed, 1202 insertions(+), 2 deletions(-) create mode 100644 src/envs/chat_env/README.md create mode 100644 src/envs/chat_env/__init__.py create mode 100644 src/envs/chat_env/client.py create mode 100644 src/envs/chat_env/models.py create mode 100644 src/envs/chat_env/server/Dockerfile create mode 100644 src/envs/chat_env/server/__init__.py create mode 100644 src/envs/chat_env/server/app.py create mode 100644 src/envs/chat_env/server/chat_environment.py create mode 100644 src/envs/chat_env/server/test_chat_env.py diff --git a/src/core/env_server/__init__.py b/src/core/env_server/__init__.py index 1a8e7df4..33250b31 100644 --- a/src/core/env_server/__init__.py +++ b/src/core/env_server/__init__.py @@ -8,13 +8,15 @@ from .base_transforms import CompositeTransform, NullTransform from .http_server import HTTPEnvServer, create_fastapi_app -from .interfaces import Environment, Transform +from .interfaces import Environment, Message, ModelTokenizer, Transform from .types import Action, Observation, State __all__ = [ # Core interfaces "Environment", "Transform", + "Message", + "ModelTokenizer", # Types "Action", "Observation", diff --git a/src/core/env_server/interfaces.py b/src/core/env_server/interfaces.py index 86241bf6..caa2d76d 100644 --- a/src/core/env_server/interfaces.py +++ b/src/core/env_server/interfaces.py @@ -5,11 +5,65 @@ # LICENSE file in the root directory of this source tree. from abc import ABC, abstractmethod -from typing import Any +from typing import Any, Protocol, TypedDict from .types import Action, Observation, State +class Message(TypedDict): + """A message in a conversation. + + Compatible with Huggingface chat template format. + """ + + role: str + content: str + + +class ModelTokenizer(Protocol): + """Protocol for tokenizers that support chat templates. + + This protocol defines the interface that tokenizers must implement + to work with chat-based environments. It's compatible with + Huggingface transformers tokenizers. + """ + + def apply_chat_template( + self, + conversation: list[Message], + tokenize: bool = True, + return_tensors: str | None = None, + **kwargs: Any, + ) -> Any: + """Apply a chat template to format and optionally tokenize a conversation. + + Args: + conversation: List of message dictionaries with 'role' and 'content' + tokenize: Whether to tokenize the output + return_tensors: Format for returned tensors ('pt' for PyTorch) + **kwargs: Additional arguments + + Returns: + Formatted and optionally tokenized conversation + """ + ... + + def decode( + self, token_ids: Any, skip_special_tokens: bool = False, **kwargs: Any + ) -> str: + """Decode token IDs back to text. 
+ + Args: + token_ids: Token IDs to decode + skip_special_tokens: Whether to skip special tokens in output + **kwargs: Additional arguments + + Returns: + Decoded text string + """ + ... + + class Transform(ABC): """Transform observations to add rewards, metrics, or other modifications. diff --git a/src/envs/chat_env/README.md b/src/envs/chat_env/README.md new file mode 100644 index 00000000..abc873d2 --- /dev/null +++ b/src/envs/chat_env/README.md @@ -0,0 +1,268 @@ +# Chat Environment + +A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training. + +## Overview + +ChatEnvironment is a lightweight environment that: +- Manages conversation history in Huggingface chat format +- Handles tokenization internally using any compatible tokenizer +- Stores both messages and tokens for efficient model interaction +- Provides a clean interface for building chat-based RL agents + +ChatEnvironment can be used in **two ways**: +1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development) +2. **HTTP client**: Use ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments) + +## Quick Start + +### Option 1: Direct Usage (Local) + +```python +from transformers import AutoTokenizer +from envs.chat_env import ChatAction, ChatObservation +from envs.chat_env.server import ChatEnvironment +from core.env_server import Message + +# Initialize with a tokenizer and optional system prompt +tokenizer = AutoTokenizer.from_pretrained("gpt2") +env = ChatEnvironment( + tokenizer=tokenizer, + system_prompt="You are a helpful assistant.", + system_role="system" +) + +# Reset the environment +obs = env.reset() +print(f"Messages: {obs.messages}") +print(f"Tokens shape: {obs.tokens.shape}") + +# Create an action from a message +user_message: Message = {"role": "user", "content": "Hello!"} +action = env.message_to_action(user_message) + +# Step the environment +obs = env.step(action) +print(f"Updated messages: {obs.messages}") +print(f"Updated tokens shape: {obs.tokens.shape}") +``` + +### Option 2: HTTP Client (Distributed) + +```python +from transformers import AutoTokenizer +from envs.chat_env import ChatEnv, ChatAction +import torch + +# Create environment from Docker image +client = ChatEnv.from_docker_image("chat-env:latest") + +# Or connect to existing server +# client = ChatEnv(base_url="http://localhost:8000") + +# Reset +result = client.reset() +print(f"Initial messages: {result.observation.messages}") + +# Send an action with tokens +tokenizer = AutoTokenizer.from_pretrained("gpt2") +message = {"role": "user", "content": "Hello!"} +action = client.message_to_action(message, tokenizer) + +result = client.step(action) +print(f"Messages: {result.observation.messages}") +print(f"Reward: {result.reward}") + +# Cleanup +client.close() +``` + +### Building the Docker Image + +Before using the HTTP client, build the Docker image: + +```bash +# From project root +docker build -t chat-env:latest -f src/envs/chat_env/server/Dockerfile . + +# Optionally specify a different tokenizer +docker build -t chat-env:latest \ + --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \ + -f src/envs/chat_env/server/Dockerfile . 
+``` + +## Architecture + +### Data Models + +#### ChatAction +Actions contain only tokens (PyTorch tensors) that interface directly with models: +```python +@dataclass +class ChatAction(Action): + tokens: torch.Tensor # Required, cannot be empty +``` + +#### ChatObservation +Observations contain both the message history and flattened tokens: +```python +@dataclass +class ChatObservation(Observation): + messages: list[Message] # List of {"role": str, "content": str} + tokens: torch.Tensor # Flattened tensor of all conversation tokens + # Inherited: done, reward, metadata +``` + +#### ChatState +Internal state tracking message and token history: +```python +@dataclass +class ChatState(State): + history_messages: list[Message] + history_tokens: list[torch.Tensor] + # Inherited: episode_id, step_count +``` + +### Key Methods + +#### `reset() -> ChatObservation` +Resets the environment to initial state with optional system prompt. + +#### `step(action: ChatAction) -> ChatObservation` +Takes an action (tokens), decodes to text, adds to history, returns updated observation. + +#### `message_to_action(message: Message) -> ChatAction` +Convenience method to convert a message dict to a tokenized ChatAction. + +## Usage Patterns + +### Basic Conversation + +```python +from transformers import AutoTokenizer +from envs.chat_env.server import ChatEnvironment +from core.env_server import Message + +tokenizer = AutoTokenizer.from_pretrained("gpt2") +env = ChatEnvironment(tokenizer=tokenizer) + +# Reset +obs = env.reset() + +# User turn +user_msg: Message = {"role": "user", "content": "What is 2+2?"} +action = env.message_to_action(user_msg) +obs = env.step(action) + +# Assistant turn +assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."} +action = env.message_to_action(assistant_msg) +obs = env.step(action) + +# Access conversation history +print(f"Full conversation: {obs.messages}") +print(f"All tokens: {obs.tokens}") +``` + +### With Transforms + +You can add transforms to compute rewards or modify observations: + +```python +from core.env_server import Transform, Observation + +class LengthRewardTransform(Transform): + """Reward based on response length.""" + + def __call__(self, observation: Observation) -> Observation: + if hasattr(observation, 'messages') and observation.messages: + last_message = observation.messages[-1] + observation.reward = len(last_message['content']) * 0.1 + return observation + +env = ChatEnvironment( + tokenizer=tokenizer, + transform=LengthRewardTransform() +) +``` + +### Direct Token Usage + +If you're generating tokens from a model, you can create actions directly: + +```python +import torch +from envs.chat_env import ChatAction + +# Assume you have tokens from your model +generated_tokens = torch.tensor([[1, 2, 3, 4, 5]]) + +# Create action directly +action = ChatAction(tokens=generated_tokens) + +# Step environment +obs = env.step(action) +``` + +## Design Philosophy + +ChatEnvironment is intentionally minimal and flexible: + +1. **No HTTP overhead**: Works directly with Python objects and tensors +2. **Tokenizer ownership**: Environment handles tokenization consistently +3. **Dual representation**: Maintains both human-readable messages and model-ready tokens +4. **Transform support**: Extensible reward computation and observation modification +5. 
**Type-safe**: Uses typed Messages compatible with Huggingface format + +## Integration with Models + +ChatEnvironment pairs naturally with language models: + +```python +# Pseudo-code for RL training loop +model = YourLanguageModel() +env = ChatEnvironment(tokenizer=model.tokenizer) + +for episode in range(num_episodes): + obs = env.reset() + + while not obs.done: + # Model generates response tokens + action_tokens = model.generate(obs.tokens) + action = ChatAction(tokens=action_tokens) + + # Step environment + obs = env.step(action) + + # Use obs.reward for RL updates + model.update(obs.reward) +``` + +## Project Structure + +``` +chat_env/ +├── __init__.py # Module exports (ChatEnv, ChatAction, etc.) +├── README.md # This file +├── client.py # ChatEnv HTTP client +├── models.py # ChatAction, ChatObservation, ChatState +└── server/ + ├── __init__.py # Server module exports + ├── chat_environment.py # Core ChatEnvironment implementation + ├── app.py # FastAPI server application + ├── test_chat_env.py # Unit tests + └── Dockerfile # Container image for HTTP server +``` + +## Requirements + +- Python 3.10+ +- PyTorch +- A tokenizer with `apply_chat_template` method (e.g., Huggingface transformers) + +## Notes + +- ChatEnvironment does **not** generate responses - it only manages conversation state +- You need to provide tokens from your model or other source +- The environment is thread-safe for single-threaded use only +- For multi-turn conversations, alternate between user and assistant messages diff --git a/src/envs/chat_env/__init__.py b/src/envs/chat_env/__init__.py new file mode 100644 index 00000000..06977614 --- /dev/null +++ b/src/envs/chat_env/__init__.py @@ -0,0 +1,12 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +"""Chat Environment - A chat-based environment for LLMs with tokenization support.""" + +from .client import ChatEnv +from .models import ChatAction, ChatObservation, ChatState + +__all__ = ["ChatAction", "ChatObservation", "ChatState", "ChatEnv"] diff --git a/src/envs/chat_env/client.py b/src/envs/chat_env/client.py new file mode 100644 index 00000000..87126fce --- /dev/null +++ b/src/envs/chat_env/client.py @@ -0,0 +1,182 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +Chat Environment HTTP Client. + +This module provides the client for connecting to a Chat Environment server +over HTTP. +""" + +from typing import Any, Dict + +import torch + +from core.env_server.interfaces import Message +from core.env_server.types import State +from core.http_env_client import HTTPEnvClient +from core.types import StepResult + +from .models import ChatAction, ChatObservation, ChatState + + +class ChatEnv(HTTPEnvClient[ChatAction, ChatObservation]): + """ + HTTP client for the Chat Environment. + + This client connects to a ChatEnvironment HTTP server and provides + methods to interact with it: reset(), step(), and state access. + + Note: Since ChatEnvironment works with PyTorch tensors, the HTTP layer + serializes tokens as lists for transport and deserializes them back to tensors. 
+ + Example: + >>> # Connect to a running server + >>> client = ChatEnv(base_url="http://localhost:8000") + >>> result = client.reset() + >>> print(result.observation.messages) + >>> + >>> # Send an action with tokens + >>> import torch + >>> tokens = torch.tensor([[1, 2, 3, 4, 5]]) + >>> result = client.step(ChatAction(tokens=tokens)) + >>> print(result.observation.messages) + >>> print(result.reward) + + Example with Docker: + >>> # Automatically start container and connect + >>> client = ChatEnv.from_docker_image("chat-env:latest") + >>> result = client.reset() + >>> result = client.step(ChatAction(tokens=torch.tensor([[1, 2, 3]]))) + """ + + def _step_payload(self, action: ChatAction) -> Dict: + """ + Convert ChatAction to JSON payload for step request. + + Since PyTorch tensors can't be directly serialized to JSON, + we convert them to nested lists. + + Args: + action: ChatAction instance with tokens + + Returns: + Dictionary representation suitable for JSON encoding + """ + # Convert tensor to list for JSON serialization + if isinstance(action.tokens, torch.Tensor): + tokens_list = action.tokens.tolist() + else: + tokens_list = action.tokens + + return { + "tokens": tokens_list, + "metadata": action.metadata, + } + + def _parse_result(self, payload: Dict) -> StepResult[ChatObservation]: + """ + Parse server response into StepResult[ChatObservation]. + + Args: + payload: JSON response from server + + Returns: + StepResult with ChatObservation + """ + obs_data = payload.get("observation", {}) + + # Convert tokens list back to tensor + tokens_data = obs_data.get("tokens", []) + if isinstance(tokens_data, list): + if tokens_data: + tokens = torch.tensor(tokens_data) + else: + tokens = torch.tensor([]) + else: + tokens = torch.tensor([]) + + # Parse messages + messages = obs_data.get("messages", []) + + observation = ChatObservation( + messages=messages, + tokens=tokens, + done=payload.get("done", False), + reward=payload.get("reward"), + metadata=obs_data.get("metadata", {}), + ) + + return StepResult( + observation=observation, + reward=payload.get("reward"), + done=payload.get("done", False), + ) + + def _parse_state(self, payload: Dict) -> ChatState: + """ + Parse server response into ChatState object. + + Args: + payload: JSON response from /state endpoint + + Returns: + ChatState object with conversation history + """ + # Parse history messages + history_messages = payload.get("history_messages", []) + + # Parse history tokens - convert lists back to tensors + history_tokens_data = payload.get("history_tokens", []) + history_tokens = [] + for token_list in history_tokens_data: + if token_list: + history_tokens.append(torch.tensor(token_list)) + else: + history_tokens.append(torch.tensor([])) + + return ChatState( + episode_id=payload.get("episode_id"), + step_count=payload.get("step_count", 0), + history_messages=history_messages, + history_tokens=history_tokens, + ) + + def message_to_action(self, message: Message, tokenizer: Any) -> ChatAction: + """ + Helper method to convert a message to a ChatAction using a tokenizer. + + This is a client-side convenience method for users who have a tokenizer + and want to create actions from messages. 
+ + Args: + message: Message dict with 'role' and 'content' + tokenizer: Tokenizer with apply_chat_template method + + Returns: + ChatAction with tokenized message + + Example: + >>> from transformers import AutoTokenizer + >>> tokenizer = AutoTokenizer.from_pretrained("gpt2") + >>> client = ChatEnv(base_url="http://localhost:8000") + >>> message = {"role": "user", "content": "Hello!"} + >>> action = client.message_to_action(message, tokenizer) + >>> result = client.step(action) + """ + if "role" not in message: + raise ValueError("Message must contain a 'role' key") + if "content" not in message: + raise ValueError("Message must contain a 'content' key") + if message["content"] is None: + raise ValueError("Message content cannot be None") + + # Tokenize the message + tokens = tokenizer.apply_chat_template( + conversation=[message], tokenize=True, return_tensors="pt" + ) + + return ChatAction(tokens=tokens) diff --git a/src/envs/chat_env/models.py b/src/envs/chat_env/models.py new file mode 100644 index 00000000..321565ed --- /dev/null +++ b/src/envs/chat_env/models.py @@ -0,0 +1,67 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +Data models for the Chat Environment. + +The Chat environment provides a chat-based interface for LLMs with support +for tokenization and message history management. +""" + +from dataclasses import dataclass, field + +import torch + +from core.env_server.interfaces import Message +from core.env_server.types import Action, Observation, State + + +@dataclass +class ChatAction(Action): + """Action for chat environments. + + Contains tokens that represent the action to be taken. + This interfaces directly with models. + """ + + tokens: torch.Tensor = field(default_factory=lambda: torch.tensor([])) + + def __post_init__(self): + """Validate required fields after initialization.""" + if self.tokens.numel() == 0: + raise ValueError("tokens is required and cannot be empty") + + +@dataclass +class ChatState(State): + """State of the ChatEnvironment containing message history.""" + + history_messages: list[Message] = field(default_factory=list) + history_tokens: list[torch.Tensor] = field( + default_factory=list + ) # Same len as messages + + +@dataclass(kw_only=True) +class ChatObservation(Observation): + """Observation returned by ChatEnvironment. + + Contains the message history in Huggingface format (list of dicts with role/content) + and the tokenized representation of the entire conversation. + + The environment owns the tokenizer and generates the tokens from the messages. + + Example: + messages = [ + {"role": "system", "content": "You are a helpful assistant"}, + {"role": "user", "content": "How tall is the Eiffel Tower?"}, + ] + tokens = tensor([1, 2, 3, 4, 5, ...]) # tokenized entire conversation + """ + + messages: list[Message] = field(default_factory=list) + tokens: torch.Tensor = field(default_factory=lambda: torch.tensor([])) + # Inherited fields from Observation ABC: reward, done, metadata diff --git a/src/envs/chat_env/server/Dockerfile b/src/envs/chat_env/server/Dockerfile new file mode 100644 index 00000000..eaea312c --- /dev/null +++ b/src/envs/chat_env/server/Dockerfile @@ -0,0 +1,27 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. 
+# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +# Use the standard envtorch base image +# Built from: docker build -t envtorch-base:latest -f src/core/containers/images/Dockerfile . +FROM envtorch-base:latest + +# Install additional dependencies for ChatEnvironment +RUN pip install torch transformers + +# Copy only what's needed for this environment +COPY src/core/ /app/src/core/ +COPY src/envs/chat_env/ /app/src/envs/chat_env/ + +# Environment variables that can be overridden at runtime +ENV TOKENIZER_NAME=gpt2 +ENV SYSTEM_PROMPT="You are a helpful AI assistant." + +# Health check +HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \ + CMD curl -f http://localhost:8000/health || exit 1 + +# Run the FastAPI server +CMD ["uvicorn", "envs.chat_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"] \ No newline at end of file diff --git a/src/envs/chat_env/server/__init__.py b/src/envs/chat_env/server/__init__.py new file mode 100644 index 00000000..534e5827 --- /dev/null +++ b/src/envs/chat_env/server/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +"""Chat environment server components.""" + +from .chat_environment import ChatEnvironment + +__all__ = ["ChatEnvironment"] diff --git a/src/envs/chat_env/server/app.py b/src/envs/chat_env/server/app.py new file mode 100644 index 00000000..b50d9b6e --- /dev/null +++ b/src/envs/chat_env/server/app.py @@ -0,0 +1,77 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +FastAPI application for the Chat Environment. + +This module creates an HTTP server that exposes the ChatEnvironment +over HTTP endpoints, making it compatible with HTTPEnvClient. + +Note: This server requires a tokenizer to be initialized. The tokenizer +must be specified when starting the server. 
+ +Usage: + # Development (with auto-reload): + uvicorn envs.chat_env.server.app:app --reload --host 0.0.0.0 --port 8000 + + # Production: + uvicorn envs.chat_env.server.app:app --host 0.0.0.0 --port 8000 --workers 4 + + # Or run directly: + python -m envs.chat_env.server.app +""" + +import os + +from core.env_server import create_fastapi_app + +from ..models import ChatAction, ChatObservation +from .chat_environment import ChatEnvironment + + +# Initialize tokenizer based on environment variable +def get_tokenizer(): + """Get tokenizer from environment or use a mock for testing.""" + tokenizer_name = os.environ.get("TOKENIZER_NAME", "gpt2") + + try: + from transformers import AutoTokenizer + + tokenizer = AutoTokenizer.from_pretrained(tokenizer_name) + print(f"Loaded tokenizer: {tokenizer_name}") + return tokenizer + except ImportError: + print( + "Warning: transformers not installed, using mock tokenizer for testing only" + ) + # Use mock tokenizer from tests + import sys + from pathlib import Path + + # Add parent directory to path to import test utilities + test_path = Path(__file__).parent + sys.path.insert(0, str(test_path)) + + from test_chat_env import MockTokenizer + + return MockTokenizer() + + +# Get system prompt from environment +system_prompt = os.environ.get("SYSTEM_PROMPT", None) + +# Create the environment instance with tokenizer +tokenizer = get_tokenizer() +env = ChatEnvironment(tokenizer=tokenizer, system_prompt=system_prompt) + +# Create the FastAPI app with routes +app = create_fastapi_app(env, ChatAction, ChatObservation) + + +if __name__ == "__main__": + import uvicorn + + uvicorn.run(app, host="0.0.0.0", port=8000) diff --git a/src/envs/chat_env/server/chat_environment.py b/src/envs/chat_env/server/chat_environment.py new file mode 100644 index 00000000..66eb24c2 --- /dev/null +++ b/src/envs/chat_env/server/chat_environment.py @@ -0,0 +1,172 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +Chat Environment Implementation. + +A chat-based environment for LLMs, designed as a blank canvas for conversation and RL. +""" + +import torch + +from core.env_server.interfaces import Environment, Message, ModelTokenizer, Transform + +from ..models import ChatAction, ChatObservation, ChatState + + +class ChatEnvironment(Environment): + """A chat-based environment for LLMs, designed as a blank canvas for conversation and RL. + + This environment is designed to work with language models. It provides the fundamental structure + for managing conversation state but is intentionally minimal to allow maximum flexibility. + + The environment owns the tokenizer and is responsible for managing both message history and tokens. + Actions contain only tokens that interface directly with models. + + Args: + tokenizer: A tokenizer that will be used to tokenize the conversation + system_prompt: An optional system prompt string to use during reset calls (optional) + system_role: The role of the system (at reset time). 
Defaults to "system" + transform: Optional transform to apply to observations + """ + + def __init__( + self, + tokenizer: ModelTokenizer, + system_prompt: str | None = None, + system_role: str = "system", + transform: Transform | None = None, + ): + super().__init__(transform=transform) + + if not hasattr(tokenizer, "apply_chat_template"): + raise ValueError("Tokenizer must have 'apply_chat_template' method") + self.tokenizer = tokenizer + self.system_prompt = system_prompt + self.system_role = system_role + + self._state = ChatState() + + if system_prompt: + system_message: Message = {"role": system_role, "content": system_prompt} + self._state.history_messages.append(system_message) + # Tokenize the system message + system_tokens = self.tokenizer.apply_chat_template( + conversation=[system_message], tokenize=True, return_tensors="pt" # type: ignore + ) + self._state.history_tokens.append(system_tokens) + + def reset(self) -> ChatObservation: + """Reset the environment to initial state. + + Returns: + ChatObservation: Initial observation with system prompt (if any) + """ + self._state.history_messages = [] + self._state.history_tokens = [] + if self.system_prompt: + system_message: Message = { + "role": self.system_role, + "content": self.system_prompt, + } + self._state.history_messages = [system_message] + # Tokenize the system message + system_tokens = self.tokenizer.apply_chat_template( + conversation=[system_message], tokenize=True, return_tensors="pt" # type: ignore + ) + self._state.history_tokens = [system_tokens] + + return self._create_observation() + + def step(self, action: ChatAction) -> ChatObservation: # type: ignore[override] + """Take a step in the environment by adding tokens to the chat history. + + Args: + action: A ChatAction object containing tokens. + + Returns: + ChatObservation: The updated observation with the new tokens added. + """ + # Store the tokens directly from the action + self._state.history_tokens.append(action.tokens) + + # Decode tokens to text and add as a message to history + decoded_text = self.tokenizer.decode( + action.tokens.squeeze(), skip_special_tokens=True + ) + assistant_message: Message = {"role": "assistant", "content": decoded_text} + self._state.history_messages.append(assistant_message) + + return self._create_observation() + + def _create_observation(self) -> ChatObservation: + """Create a ChatObservation from the current state. + + Returns both the message history and the tokens flattened as a single tensor + ready to be used by models. + + Returns: + ChatObservation: Observation with messages and flattened tokens + """ + if self._state.history_tokens: + # Flatten all tokens into a single 1D tensor + flattened_tokens = torch.cat( + [t.flatten() for t in self._state.history_tokens], dim=0 + ) + else: + flattened_tokens = torch.tensor([]) + + observation = ChatObservation( + messages=self._state.history_messages.copy(), # Copy to prevent external mutation + tokens=flattened_tokens, + ) + + transformed = self._apply_transform(observation) + if isinstance(transformed, ChatObservation): + return transformed + else: + # If transform returns base Observation, convert back to ChatObservation + return ChatObservation( + messages=getattr(transformed, "messages", []), + tokens=getattr(transformed, "tokens", torch.tensor([])), + done=transformed.done, + reward=transformed.reward, + ) + + @property + def state(self) -> ChatState: + """Get the current state of the environment. + + Returns: + ChatState: The current state. 
+ """ + return self._state + + def message_to_action(self, message: Message) -> ChatAction: + """Convert a message dictionary to a ChatAction with tokens. + + Args: + message: Dictionary with 'role' and 'content' keys + + Returns: + ChatAction: A new ChatAction instance with tokenized content + + Raises: + ValueError: If required keys are missing + """ + if "role" not in message: + raise ValueError("Message must contain a 'role' key") + if "content" not in message: + raise ValueError("Message must contain a 'content' key") + if message["content"] is None: + raise ValueError("Message content cannot be None") + + # Tokenize the single message + tokens = self.tokenizer.apply_chat_template( + conversation=[message], tokenize=True, return_tensors="pt" # type: ignore + ) + + return ChatAction(tokens=tokens) diff --git a/src/envs/chat_env/server/test_chat_env.py b/src/envs/chat_env/server/test_chat_env.py new file mode 100644 index 00000000..92a67d0e --- /dev/null +++ b/src/envs/chat_env/server/test_chat_env.py @@ -0,0 +1,328 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +Test suite for ChatEnvironment. + +Proper unit tests with assertions to verify correct behavior. +""" + +import torch + +from core.env_server.interfaces import Message + +from ..models import ChatAction +from .chat_environment import ChatEnvironment + + +class MockTokenizer: + """Mock tokenizer for testing without requiring transformers library.""" + + def apply_chat_template( + self, + conversation: list[Message], + tokenize: bool = True, + return_tensors: str | None = None, + **kwargs, + ): + """Mock implementation that creates deterministic token tensors from text.""" + # Concatenate all message content + text = " ".join([msg["content"] for msg in conversation]) + + # Create deterministic tokens based on text content + # Use character codes modulo 256 to get valid token IDs + tokens = [ord(c) % 256 for c in text] + + if return_tensors == "pt": + return torch.tensor([tokens]) + return tokens + + def decode(self, token_ids, skip_special_tokens: bool = False, **kwargs) -> str: + """Mock decode that reverses the encoding process.""" + if isinstance(token_ids, torch.Tensor): + token_ids = token_ids.tolist() + + # Reverse the encoding: convert tokens back to characters + chars = [chr(t) for t in token_ids] + return "".join(chars) + + +def test_tokenization_consistency(): + """Test that tokenizing the same string produces the same tokens.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer) + + # Create the same message twice + message1: Message = {"role": "user", "content": "Hello, world!"} + message2: Message = {"role": "user", "content": "Hello, world!"} + + # Convert to actions + action1 = env.message_to_action(message1) + action2 = env.message_to_action(message2) + + # Verify tokens are identical + assert torch.equal( + action1.tokens, action2.tokens + ), "Same message should produce identical tokens" + + # Verify tokens are not empty + assert action1.tokens.numel() > 0, "Tokens should not be empty" + + print("✓ test_tokenization_consistency passed") + + +def test_message_content_preservation(): + """Test that message content is preserved in the observation.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer) + + env.reset() + + # Test with user message + user_content = "What is the capital of France?" 
+ user_message: Message = {"role": "user", "content": user_content} + action = env.message_to_action(user_message) + obs = env.step(action) + + # The last message should have the decoded content + assert len(obs.messages) > 0, "Observation should have at least one message" + last_message = obs.messages[-1] + + # Verify the decoded content matches what we sent + # Note: The environment decodes the tokens, so we verify the round-trip + decoded_content = last_message["content"] + assert decoded_content == user_content, ( + f"Message content should be preserved. " + f"Expected: {user_content}, Got: {decoded_content}" + ) + + # Test with assistant message + assistant_content = "The capital of France is Paris." + assistant_message: Message = {"role": "assistant", "content": assistant_content} + action = env.message_to_action(assistant_message) + obs = env.step(action) + + # Verify the last message has the assistant content + assert len(obs.messages) >= 2, "Should have at least 2 messages now" + last_message = obs.messages[-1] + decoded_content = last_message["content"] + assert decoded_content == assistant_content, ( + f"Assistant message content should be preserved. " + f"Expected: {assistant_content}, Got: {decoded_content}" + ) + + print("✓ test_message_content_preservation passed") + + +def test_system_prompt_preserved(): + """Test that system prompt is preserved after reset.""" + tokenizer = MockTokenizer() + system_prompt = "You are a helpful assistant." + + env = ChatEnvironment(tokenizer=tokenizer, system_prompt=system_prompt) + + # Check after initialization + obs = env.reset() + assert len(obs.messages) == 1, "Should have exactly one message (system prompt)" + assert obs.messages[0]["role"] == "system", "First message should have system role" + assert ( + obs.messages[0]["content"] == system_prompt + ), "System prompt content should match" + + # Add some messages + action = env.message_to_action({"role": "user", "content": "Hello"}) + env.step(action) + + # Reset and verify system prompt is still there + obs = env.reset() + assert len(obs.messages) == 1, "After reset, should only have system prompt" + assert ( + obs.messages[0]["content"] == system_prompt + ), "System prompt should be preserved after reset" + + print("✓ test_system_prompt_preserved passed") + + +def test_token_history_accumulation(): + """Test that tokens accumulate correctly in the observation.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer) + + obs = env.reset() + initial_token_count = obs.tokens.numel() + + # Step with first message + message1 = {"role": "user", "content": "Hi"} + action1 = env.message_to_action(message1) + obs1 = env.step(action1) + token_count_1 = obs1.tokens.numel() + + # Tokens should increase + assert token_count_1 > initial_token_count, "Token count should increase after step" + + # Step with second message + message2 = {"role": "assistant", "content": "Hello there"} + action2 = env.message_to_action(message2) + obs2 = env.step(action2) + token_count_2 = obs2.tokens.numel() + + # Tokens should continue to accumulate + assert ( + token_count_2 > token_count_1 + ), "Token count should keep increasing with more messages" + + # Verify tokens are the concatenation of both messages + expected_tokens = torch.cat([action1.tokens.flatten(), action2.tokens.flatten()]) + assert torch.equal( + obs2.tokens, expected_tokens + ), "Tokens should be concatenation of all actions" + + print("✓ test_token_history_accumulation passed") + + +def test_direct_token_action(): + """Test 
creating actions directly from tokens.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer) + + env.reset() + + # Create raw tokens + raw_tokens = torch.tensor([[72, 101, 108, 108, 111]]) # ASCII for "Hello" + action = ChatAction(tokens=raw_tokens) + + # Step with raw tokens + obs = env.step(action) + + # Verify message was added + assert len(obs.messages) == 1, "Should have one message" + assert obs.messages[0]["role"] == "assistant", "Should default to assistant role" + + # Verify tokens match what we sent (flattened) + assert torch.equal( + obs.tokens, raw_tokens.flatten() + ), "Observation tokens should match input tokens" + + print("✓ test_direct_token_action passed") + + +def test_empty_tokens_validation(): + """Test that empty tokens raise a ValueError.""" + try: + action = ChatAction(tokens=torch.tensor([])) + assert False, "Should have raised ValueError for empty tokens" + except ValueError as e: + assert "empty" in str(e).lower(), "Error message should mention empty tokens" + + print("✓ test_empty_tokens_validation passed") + + +def test_message_validation(): + """Test that invalid messages raise appropriate errors.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer) + + # Test missing 'role' key + try: + env.message_to_action({"content": "test"}) # type: ignore + assert False, "Should have raised error for missing 'role' key" + except (ValueError, KeyError): + pass + + # Test missing 'content' key + try: + env.message_to_action({"role": "user"}) # type: ignore + assert False, "Should have raised error for missing 'content' key" + except (ValueError, KeyError): + pass + + # Test None content + try: + env.message_to_action({"role": "user", "content": None}) # type: ignore + assert False, "Should have raised error for None content" + except ValueError: + pass + + print("✓ test_message_validation passed") + + +def test_reset_clears_history(): + """Test that reset properly clears all message and token history.""" + tokenizer = MockTokenizer() + env = ChatEnvironment(tokenizer=tokenizer, system_prompt="System message") + + # Add some messages + obs1 = env.reset() + initial_messages = len(obs1.messages) + + action = env.message_to_action({"role": "user", "content": "Test message"}) + obs2 = env.step(action) + + # Verify message was added + assert ( + len(obs2.messages) > initial_messages + ), "Message should be added after step" + + # Reset + obs3 = env.reset() + + # Verify we're back to just the system prompt + assert ( + len(obs3.messages) == initial_messages + ), "Reset should clear history back to initial state" + assert ( + obs3.messages[0]["content"] == "System message" + ), "System prompt should be preserved" + + print("✓ test_reset_clears_history passed") + + +def main(): + """Run all tests.""" + print("\n" + "=" * 60) + print("ChatEnvironment Test Suite") + print("=" * 60 + "\n") + + tests = [ + test_tokenization_consistency, + test_message_content_preservation, + test_system_prompt_preserved, + test_token_history_accumulation, + test_direct_token_action, + test_empty_tokens_validation, + test_message_validation, + test_reset_clears_history, + ] + + failed = [] + for test in tests: + try: + test() + except AssertionError as e: + print(f"✗ {test.__name__} failed: {e}") + failed.append(test.__name__) + except Exception as e: + print(f"✗ {test.__name__} errored: {e}") + import traceback + + traceback.print_exc() + failed.append(test.__name__) + + print("\n" + "=" * 60) + if not failed: + print(f"✓ All {len(tests)} tests passed!") 
+ print("=" * 60) + return 0 + else: + print(f"✗ {len(failed)}/{len(tests)} tests failed:") + for name in failed: + print(f" - {name}") + print("=" * 60) + return 1 + + +if __name__ == "__main__": + exit(main()) From d4e07e1e7968b1270955ad2d20bd29e0d0324a52 Mon Sep 17 00:00:00 2001 From: Davide Testuggine Date: Mon, 13 Oct 2025 14:22:36 -0700 Subject: [PATCH 07/16] Update src/envs/chat_env/server/chat_environment.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- src/envs/chat_env/server/chat_environment.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/envs/chat_env/server/chat_environment.py b/src/envs/chat_env/server/chat_environment.py index 66eb24c2..80aa5a7c 100644 --- a/src/envs/chat_env/server/chat_environment.py +++ b/src/envs/chat_env/server/chat_environment.py @@ -114,7 +114,7 @@ def _create_observation(self) -> ChatObservation: if self._state.history_tokens: # Flatten all tokens into a single 1D tensor flattened_tokens = torch.cat( - [t.flatten() for t in self._state.history_tokens], dim=0 + (t.flatten() for t in self._state.history_tokens), dim=0 ) else: flattened_tokens = torch.tensor([]) From 911da7a8a5c4394ef2b10e936c4b033969159ee2 Mon Sep 17 00:00:00 2001 From: Joseph Spisak Date: Wed, 15 Oct 2025 08:54:01 -0700 Subject: [PATCH 08/16] Create CODE_OF_CONDUCT.md adding the CoC.. --- CODE_OF_CONDUCT.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 CODE_OF_CONDUCT.md diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..3232ed66 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,80 @@ +# Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to make participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or +advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic +address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a +professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. 
+ +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies within all project spaces, and it also applies when +an individual is representing the project or its community in public spaces. +Examples of representing a project or community include using an official +project e-mail address, posting via an official social media account, or acting +as an appointed representative at an online or offline event. Representation of +a project may be further defined and clarified by project maintainers. + +This Code of Conduct also applies outside the project spaces when there is a +reasonable belief that an individual's behavior may have a negative impact on +the project or its community. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at . All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq From a61e05cd6a1e856958ba5ed5c18f03a099c1aecf Mon Sep 17 00:00:00 2001 From: Joseph Spisak Date: Wed, 15 Oct 2025 08:54:36 -0700 Subject: [PATCH 09/16] Create CONTRIBUTING.md --- CONTRIBUTING.md | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 CONTRIBUTING.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..535c71dd --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,39 @@ +# Contributing to __________ +We want to make contributing to this project as easy and transparent as +possible. + +## Our Development Process +... (in particular how this is synced with internal changes to the project) + +## Pull Requests +We actively welcome your pull requests. + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. If you haven't already, complete the Contributor License Agreement ("CLA"). + +## Contributor License Agreement ("CLA") +In order to accept your pull request, we need you to submit a CLA. You only need +to do this once to work on any of Meta's open source projects. + +Complete your CLA here: + +## Issues +We use GitHub issues to track public bugs. Please ensure your description is +clear and has sufficient instructions to be able to reproduce the issue. 
+
+Meta has a [bounty program](https://bugbounty.meta.com/) for the safe
+disclosure of security bugs. In those cases, please go through the process
+outlined on that page and do not file a public issue.
+
+## Coding Style
+* 2 spaces for indentation rather than tabs
+* 80 character line length
+* ...
+
+## License
+By contributing to __________, you agree that your contributions will be licensed
+under the LICENSE file in the root directory of this source tree.
From a686771ebd8721f7283d3a8b8d78c3fa30ef039c Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Wed, 15 Oct 2025 08:56:02 -0700
Subject: [PATCH 10/16] Create LICENSE

---
 LICENSE | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 LICENSE

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 00000000..6d1df98f
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,28 @@
+BSD 3-Clause License
+
+(c) Meta Platforms, Inc. and affiliates.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list
+of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this
+list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors may
+be used to endorse or promote products derived from this software without specific
+prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
+SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGE.
From b966944f43efb4329df07894c0e025d10b68b966 Mon Sep 17 00:00:00 2001
From: Joseph Spisak
Date: Wed, 15 Oct 2025 09:27:04 -0700
Subject: [PATCH 11/16] Update local_coding_env.py

---
 examples/local_coding_env.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/examples/local_coding_env.py b/examples/local_coding_env.py
index 69f4a2ee..e88dcb35 100644
--- a/examples/local_coding_env.py
+++ b/examples/local_coding_env.py
@@ -1,3 +1,9 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
 #!/usr/bin/env python3
 """
 Simple test showing how users will use CodingEnv.from_docker_image().
From 3998e50bd5ff1126691d9ce1ad7883897f94d5b4 Mon Sep 17 00:00:00 2001
From: pankit-eng
Date: Wed, 15 Oct 2025 14:11:01 -0400
Subject: [PATCH 12/16] add dir structure to README

---
 README.md                  |  30 +++
 rfcs/MCPTools-integ-RFC.md | 440 +++++++++++++++++++++++++++++++++++++
 2 files changed, 470 insertions(+)
 create mode 100644 rfcs/MCPTools-integ-RFC.md

diff --git a/README.md b/README.md
index 0d12b7b1..76d4d5c4 100644
--- a/README.md
+++ b/README.md
@@ -65,6 +65,36 @@ Type-safe data structures:
 - `State`: Episode state tracking
 - `StepResult`: Combines observation, reward, done flag
 
+## Project Structure
+
+### For Environment Creators
+
+When building a new environment, create the following structure:
+
+```
+src/envs/your_env/
+├── __init__.py              # Export YourAction, YourObservation, YourEnv
+├── models.py                # Define Action, Observation, State dataclasses
+├── client.py                # Implement YourEnv(HTTPEnvClient)
+├── README.md                # Document your environment
+└── server/
+    ├── your_environment.py  # Implement YourEnvironment(Environment)
+    ├── app.py               # Create FastAPI app
+    └── Dockerfile           # Define container image
+```
+
+See [`src/envs/README.md`](src/envs/README.md) for a complete guide on building environments.
+
+### For Environment Users
+
+To use an environment:
+1. Import from `envs.your_env` (e.g., `from envs.echo_env import EchoAction, EchoEnv`)
+2. Create client: `client = EchoEnv.from_docker_image("echo-env:latest")`
+3. Interact: `client.reset()`, `client.step(action)`, `client.state()`
+4. Cleanup: `client.close()`
+
+See the example scripts in the `examples/` directory.
+
 ## Design Principles
 
 1. **Separation of Concerns**: Clear client-server boundaries
diff --git a/rfcs/MCPTools-integ-RFC.md b/rfcs/MCPTools-integ-RFC.md
new file mode 100644
index 00000000..27034e31
--- /dev/null
+++ b/rfcs/MCPTools-integ-RFC.md
@@ -0,0 +1,440 @@
+# RFC: MCP Tools Integration for OpenEnv
+
+**Status**: Request for Comments
+**Created**: October 2025
+**Authors**: EnvTorch Contributors
+**Related**: OpenEnv-0.1-RFC.md
+
+## Summary
+
+This RFC proposes integrating Model Context Protocol (MCP) tools into OpenEnv environments, enabling agents and RL frameworks to discover and use external tools (file systems, APIs, databases, etc.) through a standardized interface. MCP tools will be surfaced via environment manifests, discoverable through standard APIs, and routable from within containerized environments.
+
+## Motivation
+
+### Problem Statement
+
+AI agents and RL systems often need to interact with external tools and data sources:
+- File system operations (read, write, search)
+- API calls (web search, weather, databases)
+- Code execution environments with tool access
+- Multi-tool orchestration for complex tasks
+
+Current challenges:
+- **Discovery Problem**: Agents don't know what tools are available
+- **Isolation Issues**: Tools running in containers need secure access to external resources
+- **Configuration Complexity**: Tool setup varies across environments
+- **Type Safety**: No standardized schemas for tool inputs/outputs
+
+### Goals
+
+1. **Standardized Tool Interface**: Use MCP protocol for consistent tool integration
+2. **Easy Discovery**: Agents can query available tools via the environment's Gymnasium-style APIs
+3. **Declarative Configuration**: Tools defined in YAML manifests
+4. **Container-Safe**: Tools work within Docker isolation boundaries
+5. **Type-Safe**: Tool schemas validated at runtime
+6. 
**Extensible**: Easy to add new MCP tool servers + +## Background: Model Context Protocol (MCP) + +MCP is a protocol that standardizes how AI assistants connect to data sources and tools: + +- **MCP Server**: Provides tools/resources via standard protocol +- **MCP Client**: Discovers and invokes tools from servers +- **Tool Schema**: JSON Schema definition of inputs/outputs +- **Transport**: Typically stdio or HTTP + +Example MCP tool: +```json +{ + "name": "read_file", + "description": "Read contents of a file", + "inputSchema": { + "type": "object", + "properties": { + "path": {"type": "string"} + }, + "required": ["path"] + } +} +``` + +### Types of MCP Tools Supported + +OpenEnv will support three categories of MCP tools: + +#### 1. External Backend Tools +Tools that call to another backend service or API. + +**Examples**: +- Web search (Brave, Google, Bing) +- GitHub API (create issues, PRs, search repos) +- Slack/Discord bots +- Weather APIs +- Database queries (Postgres, MongoDB) + +**Characteristics**: +- Require API keys/credentials +- Make external network calls +- May have rate limits + +**Configuration Example**: +```yaml +- name: github + type: mcp + mcp_server: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-github"] + transport: stdio + env: + GITHUB_TOKEN: "${GITHUB_TOKEN}" +``` + +#### 2. Simulated Backend Tools +Tools that simulate backend services for testing/training without real API calls. + +**Examples**: +- Mock Airbnb booking system +- Mock Google Calendar +- Mock payment gateway +- Mock email service +- Simulated REST APIs + +**Characteristics**: +- No external dependencies +- Fast and deterministic +- Perfect for training RL agents +- Can be reset to initial state +- Useful for benchmarking + +**Configuration Example**: +```yaml +- name: airbnb_simulator + type: mcp + mcp_server: + command: "python" + args: ["-m", "mcp_simulators.airbnb"] + transport: stdio + env: + SIMULATOR_SEED: "42" +``` + +#### 3. Self-Contained Service Tools +Tools that provide real services directly within the container. + +**Examples**: +- Filesystem operations (read, write, search) +- SQLite database +- Local git operations +- Image processing +- PDF parsing + +**Characteristics**: +- No external dependencies +- Fast execution +- Work offline +- Data stays within container +- Stateful across environment steps + +**Configuration Example**: +```yaml +- name: filesystem + type: mcp + mcp_server: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"] + transport: stdio +``` + +## Design + +### Architecture Overview + +``` +┌────────────────────────────────────────────────────────────┐ +│ Agent / RL Framework │ +│ │ +│ 1. Query /tools → Get available tools │ +│ 2. 
Execute action with tool_name + tool_args │ +└────────────────┬───────────────────────────────────────────┘ + │ HTTP + │ +┌────────────────▼───────────────────────────────────────────┐ +│ Environment Server (in Container) │ +│ ┌──────────────────────────────────────────────────────┐ │ +│ │ Environment │ │ +│ │ - Reads tools.yaml manifest │ │ +│ │ - Routes tool calls to MCP clients │ │ +│ │ - Returns tool results in observations │ │ +│ └──────────────┬───────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────▼───────────────────────────────────────┐ │ +│ │ MCP Client Manager │ │ +│ │ - Manages connections to MCP servers │ │ +│ │ - Caches tool schemas │ │ +│ │ - Handles tool invocation │ │ +│ └──────────────┬───────────────────────────────────────┘ │ +└─────────────────┼──────────────────────────────────────────┘ + │ + ┌─────────────┼─────────────┐ + │ │ │ +┌───▼────┐ ┌────▼─────┐ ┌───▼────┐ +│ MCP │ │ MCP │ │ MCP │ +│ Server │ │ Server │ │ Server │ +│ (Files)│ │ (GitHub) │ │ (DB) │ +└────────┘ └──────────┘ └────────┘ +``` + +### 1. Surfacing MCP Tools via Manifest + +#### Environment Tools Manifest (`tools.yaml`) + +Each environment declares available tools in a YAML manifest: + +```yaml +# src/envs/my_env/tools.yaml +version: "1.0" +tools: + - name: filesystem + type: mcp + mcp_server: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"] + transport: stdio + enabled: true + + - name: github + type: mcp + mcp_server: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-github"] + transport: stdio + env: + GITHUB_TOKEN: "${GITHUB_TOKEN}" + enabled: true + + - name: brave_search + type: mcp + mcp_server: + url: "http://mcp-brave-search:8080" + transport: http + enabled: false # Disabled by default +``` + +**Manifest Fields**: +- `name`: Unique identifier for the tool set +- `type`: Always "mcp" for MCP tools +- `mcp_server.command`: Executable to run (for stdio transport) +- `mcp_server.args`: Arguments to pass +- `mcp_server.url`: HTTP endpoint (for http transport) +- `mcp_server.env`: Environment variables (with secret interpolation) +- `enabled`: Whether to load this tool set + +#### Loading Manifest in Environment + +```python +# server/my_environment.py +import yaml +from pathlib import Path +from core.tools.mcp import MCPClientManager + +class MyEnvironment(Environment): + def __init__(self): + super().__init__() + self._state = MyState() + + # Load tools manifest + tools_config = self._load_tools_manifest() + self._mcp_manager = MCPClientManager(tools_config) + + def _load_tools_manifest(self) -> dict: + manifest_path = Path(__file__).parent.parent / "tools.yaml" + if manifest_path.exists(): + with open(manifest_path) as f: + return yaml.safe_load(f) + return {"tools": []} +``` + +### 2. Tool Discoverability + +#### New HTTP Endpoint: `/tools` + +Environments expose available tools via a dedicated endpoint: + +```python +# core/env_server/http_server.py + +@app.get("/tools") +async def get_tools() -> Dict[str, Any]: + """Get available tools in this environment.""" + tools = env.get_available_tools() + return { + "tools": [ + { + "name": tool.name, + "description": tool.description, + "inputSchema": tool.input_schema, + } + for tool in tools + ] + } +``` + +**Response Example**: +```json +{ + "tools": [ + { + "name": "read_file", + "description": "Read contents of a file", + "inputSchema": { + "type": "object", + "properties": { + "path": {"type": "string"} + }, + "required": ["path"] + } + } + ] +} +``` + +### 3. 
Routing Tool Calls from Environment + +#### Action Model Extension + +```python +@dataclass +class UnifiedAction(Action): + """Unified action supporting both code and tools.""" + action_type: str # "code" or "tool" + code: Optional[str] = None + tool_name: Optional[str] = None + tool_args: Optional[Dict[str, Any]] = None +``` + +#### Environment Step Logic + +```python +def step(self, action: UnifiedAction) -> MyObservation: + if action.action_type == "code": + result = self._executor.run(action.code) + return MyObservation(stdout=result.stdout, ...) + + elif action.action_type == "tool": + tool_result = self._mcp_manager.call_tool( + tool_name=action.tool_name, + arguments=action.tool_args + ) + return MyObservation(stdout=str(tool_result.content), ...) +``` + +### 4. Environment Logic for Tool Support + +```python +class ToolEnabledEnvironment(Environment): + """Base class for environments that support MCP tools.""" + + def __init__(self, tools_manifest: str | None = None): + super().__init__() + if tools_manifest: + tools_config = self._load_tools_manifest(tools_manifest) + self._mcp_manager = MCPClientManager(tools_config) + + def get_available_tools(self) -> List[Dict[str, Any]]: + if self._mcp_manager: + return self._mcp_manager.get_available_tools() + return [] +``` + +### 5. Docker Runtime and Packaging + +#### Dockerfile with MCP Support + +```dockerfile +FROM envtorch-base:latest + +# Install Node.js for MCP servers +RUN apt-get update && apt-get install -y nodejs npm && \ + rm -rf /var/lib/apt/lists/* + +# Install Python MCP SDK +RUN pip install --no-cache-dir mcp + +# Copy environment code +COPY src/core/ /app/src/core/ +COPY src/envs/my_env/ /app/src/envs/my_env/ + +# Set up workspace for filesystem tools +RUN mkdir -p /workspace && chmod 777 /workspace + +CMD ["uvicorn", "envs.my_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"] +``` + +#### Volume Mounts for Tool Access + +```python +# Starting container with workspace mount +client = CodingEnv.from_docker_image( + "coding-env:latest", + workspace_dir=os.getcwd(), # Mount current directory + env_vars={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")} +) +``` + +## Open Questions & Feedback Requested + +### 1. Tool Discovery Timing +**Question**: When should tools be discovered - at startup or lazy loaded? + +**Feedback Needed**: Which approach fits your use case? + +### 2. Tool Security +**Question**: How should we handle tool permissions and secrets? + +**Options**: +- Allowlist of permitted tools +- Role-based access control +- Audit logging + +**Feedback Needed**: What security measures are essential? + +### 3. Tool Schema Validation +**Question**: Client-side, server-side, or both? + +**Feedback Needed**: Where should validation happen? + +### 4. Multi-Tool Composition +**Question**: Should environments support automatic tool chaining? + +**Feedback Needed**: Is this in scope? 
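+
+## End-to-End Usage Sketch (Non-Normative)
+
+To make the flow above concrete, the sketch below shows how a client might discover tools and then route a tool call through the standard `step` API. The exact wire format (`POST /reset`, `POST /step` with a JSON-serialized `UnifiedAction`) and the local URL are illustrative assumptions, not normative parts of this RFC:
+
+```python
+# Non-normative sketch: discover tools, then invoke one via step().
+import requests
+
+base_url = "http://localhost:8000"  # assumed address of a running env server
+
+# 1. Discover available tools via the /tools endpoint
+tools = requests.get(f"{base_url}/tools").json()["tools"]
+print([t["name"] for t in tools])  # e.g. ["read_file", ...]
+
+# 2. Start a new episode
+requests.post(f"{base_url}/reset")
+
+# 3. Route a tool call through step, using the UnifiedAction shape above
+action = {
+    "action_type": "tool",
+    "tool_name": "read_file",
+    "tool_args": {"path": "/workspace/README.md"},
+}
+observation = requests.post(f"{base_url}/step", json=action).json()
+print(observation)
+```
+
+In practice, a typed client such as `HTTPEnvClient` would wrap these calls; raw HTTP is shown here only to make the endpoints explicit.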
+
+## Implementation Plan
+
+### Phase 1: Core MCP Integration
+- [ ] Add MCP Python SDK dependency
+- [ ] Implement `MCPClientManager`
+- [ ] Add `ToolEnabledEnvironment` base class
+- [ ] Implement `/tools` HTTP endpoint
+- [ ] Create tools.yaml manifest schema
+
+### Phase 2: Container Support
+- [ ] Update Dockerfile to include Node.js
+- [ ] Add volume mount support
+- [ ] Test with filesystem MCP server
+
+### Phase 3: Example Environments
+- [ ] Update `CodingEnv` to support tools
+- [ ] Add example tools.yaml
+- [ ] Create documentation
+
+## Conclusion
+
+MCP tools integration will enable EnvTorch environments to provide standardized tool access to AI agents and RL frameworks. The proposed design uses declarative YAML manifests, provides automatic discovery via `/tools` endpoint, and maintains container isolation through careful volume mounting and environment variable passing.
+
+**Key areas for feedback**:
+1. Tool discovery and initialization timing
+2. Security model for tool access
+3. Multi-tool composition support
+4. Docker runtime requirements
+
+Please share your thoughts via GitHub Issues or Discussions!
From c6f39163f5274c7e01c45dd27cb5583728fe1f91 Mon Sep 17 00:00:00 2001
From: Pankit
Date: Wed, 15 Oct 2025 11:18:51 -0700
Subject: [PATCH 13/16] Delete rfcs/MCPTools-integ-RFC.md

---
 rfcs/MCPTools-integ-RFC.md | 440 ------------------------------------
 1 file changed, 440 deletions(-)
 delete mode 100644 rfcs/MCPTools-integ-RFC.md

diff --git a/rfcs/MCPTools-integ-RFC.md b/rfcs/MCPTools-integ-RFC.md
deleted file mode 100644
index 27034e31..00000000
--- a/rfcs/MCPTools-integ-RFC.md
+++ /dev/null
@@ -1,440 +0,0 @@
-# RFC: MCP Tools Integration for OpenEnv
-
-**Status**: Request for Comments
-**Created**: October 2025
-**Authors**: EnvTorch Contributors
-**Related**: OpenEnv-0.1-RFC.md
-
-## Summary
-
-This RFC proposes integrating Model Context Protocol (MCP) tools into OpenEnv environments, enabling agents and RL frameworks to discover and use external tools (file systems, APIs, databases, etc.) through a standardized interface. MCP tools will be surfaced via environment manifests, discoverable through standard APIs, and routable from within containerized environments.
-
-## Motivation
-
-### Problem Statement
-
-AI agents and RL systems often need to interact with external tools and data sources:
-- File system operations (read, write, search)
-- API calls (web search, weather, databases)
-- Code execution environments with tool access
-- Multi-tool orchestration for complex tasks
-
-Current challenges:
-- **Discovery Problem**: Agents don't know what tools are available
-- **Isolation Issues**: Tools running in containers need secure access to external resources
-- **Configuration Complexity**: Tool setup varies across environments
-- **Type Safety**: No standardized schemas for tool inputs/outputs
-
-### Goals
-
-1. **Standardized Tool Interface**: Use MCP protocol for consistent tool integration
-2. **Easy Discovery**: Agents can query available tools via the environment's Gymnasium-style APIs
-3. **Declarative Configuration**: Tools defined in YAML manifests
-4. **Container-Safe**: Tools work within Docker isolation boundaries
-5. **Type-Safe**: Tool schemas validated at runtime
-6. 
**Extensible**: Easy to add new MCP tool servers - -## Background: Model Context Protocol (MCP) - -MCP is a protocol that standardizes how AI assistants connect to data sources and tools: - -- **MCP Server**: Provides tools/resources via standard protocol -- **MCP Client**: Discovers and invokes tools from servers -- **Tool Schema**: JSON Schema definition of inputs/outputs -- **Transport**: Typically stdio or HTTP - -Example MCP tool: -```json -{ - "name": "read_file", - "description": "Read contents of a file", - "inputSchema": { - "type": "object", - "properties": { - "path": {"type": "string"} - }, - "required": ["path"] - } -} -``` - -### Types of MCP Tools Supported - -OpenEnv will support three categories of MCP tools: - -#### 1. External Backend Tools -Tools that call to another backend service or API. - -**Examples**: -- Web search (Brave, Google, Bing) -- GitHub API (create issues, PRs, search repos) -- Slack/Discord bots -- Weather APIs -- Database queries (Postgres, MongoDB) - -**Characteristics**: -- Require API keys/credentials -- Make external network calls -- May have rate limits - -**Configuration Example**: -```yaml -- name: github - type: mcp - mcp_server: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-github"] - transport: stdio - env: - GITHUB_TOKEN: "${GITHUB_TOKEN}" -``` - -#### 2. Simulated Backend Tools -Tools that simulate backend services for testing/training without real API calls. - -**Examples**: -- Mock Airbnb booking system -- Mock Google Calendar -- Mock payment gateway -- Mock email service -- Simulated REST APIs - -**Characteristics**: -- No external dependencies -- Fast and deterministic -- Perfect for training RL agents -- Can be reset to initial state -- Useful for benchmarking - -**Configuration Example**: -```yaml -- name: airbnb_simulator - type: mcp - mcp_server: - command: "python" - args: ["-m", "mcp_simulators.airbnb"] - transport: stdio - env: - SIMULATOR_SEED: "42" -``` - -#### 3. Self-Contained Service Tools -Tools that provide real services directly within the container. - -**Examples**: -- Filesystem operations (read, write, search) -- SQLite database -- Local git operations -- Image processing -- PDF parsing - -**Characteristics**: -- No external dependencies -- Fast execution -- Work offline -- Data stays within container -- Stateful across environment steps - -**Configuration Example**: -```yaml -- name: filesystem - type: mcp - mcp_server: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"] - transport: stdio -``` - -## Design - -### Architecture Overview - -``` -┌────────────────────────────────────────────────────────────┐ -│ Agent / RL Framework │ -│ │ -│ 1. Query /tools → Get available tools │ -│ 2. 
Execute action with tool_name + tool_args │ -└────────────────┬───────────────────────────────────────────┘ - │ HTTP - │ -┌────────────────▼───────────────────────────────────────────┐ -│ Environment Server (in Container) │ -│ ┌──────────────────────────────────────────────────────┐ │ -│ │ Environment │ │ -│ │ - Reads tools.yaml manifest │ │ -│ │ - Routes tool calls to MCP clients │ │ -│ │ - Returns tool results in observations │ │ -│ └──────────────┬───────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼───────────────────────────────────────┐ │ -│ │ MCP Client Manager │ │ -│ │ - Manages connections to MCP servers │ │ -│ │ - Caches tool schemas │ │ -│ │ - Handles tool invocation │ │ -│ └──────────────┬───────────────────────────────────────┘ │ -└─────────────────┼──────────────────────────────────────────┘ - │ - ┌─────────────┼─────────────┐ - │ │ │ -┌───▼────┐ ┌────▼─────┐ ┌───▼────┐ -│ MCP │ │ MCP │ │ MCP │ -│ Server │ │ Server │ │ Server │ -│ (Files)│ │ (GitHub) │ │ (DB) │ -└────────┘ └──────────┘ └────────┘ -``` - -### 1. Surfacing MCP Tools via Manifest - -#### Environment Tools Manifest (`tools.yaml`) - -Each environment declares available tools in a YAML manifest: - -```yaml -# src/envs/my_env/tools.yaml -version: "1.0" -tools: - - name: filesystem - type: mcp - mcp_server: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"] - transport: stdio - enabled: true - - - name: github - type: mcp - mcp_server: - command: "npx" - args: ["-y", "@modelcontextprotocol/server-github"] - transport: stdio - env: - GITHUB_TOKEN: "${GITHUB_TOKEN}" - enabled: true - - - name: brave_search - type: mcp - mcp_server: - url: "http://mcp-brave-search:8080" - transport: http - enabled: false # Disabled by default -``` - -**Manifest Fields**: -- `name`: Unique identifier for the tool set -- `type`: Always "mcp" for MCP tools -- `mcp_server.command`: Executable to run (for stdio transport) -- `mcp_server.args`: Arguments to pass -- `mcp_server.url`: HTTP endpoint (for http transport) -- `mcp_server.env`: Environment variables (with secret interpolation) -- `enabled`: Whether to load this tool set - -#### Loading Manifest in Environment - -```python -# server/my_environment.py -import yaml -from pathlib import Path -from core.tools.mcp import MCPClientManager - -class MyEnvironment(Environment): - def __init__(self): - super().__init__() - self._state = MyState() - - # Load tools manifest - tools_config = self._load_tools_manifest() - self._mcp_manager = MCPClientManager(tools_config) - - def _load_tools_manifest(self) -> dict: - manifest_path = Path(__file__).parent.parent / "tools.yaml" - if manifest_path.exists(): - with open(manifest_path) as f: - return yaml.safe_load(f) - return {"tools": []} -``` - -### 2. Tool Discoverability - -#### New HTTP Endpoint: `/tools` - -Environments expose available tools via a dedicated endpoint: - -```python -# core/env_server/http_server.py - -@app.get("/tools") -async def get_tools() -> Dict[str, Any]: - """Get available tools in this environment.""" - tools = env.get_available_tools() - return { - "tools": [ - { - "name": tool.name, - "description": tool.description, - "inputSchema": tool.input_schema, - } - for tool in tools - ] - } -``` - -**Response Example**: -```json -{ - "tools": [ - { - "name": "read_file", - "description": "Read contents of a file", - "inputSchema": { - "type": "object", - "properties": { - "path": {"type": "string"} - }, - "required": ["path"] - } - } - ] -} -``` - -### 3. 
Routing Tool Calls from Environment - -#### Action Model Extension - -```python -@dataclass -class UnifiedAction(Action): - """Unified action supporting both code and tools.""" - action_type: str # "code" or "tool" - code: Optional[str] = None - tool_name: Optional[str] = None - tool_args: Optional[Dict[str, Any]] = None -``` - -#### Environment Step Logic - -```python -def step(self, action: UnifiedAction) -> MyObservation: - if action.action_type == "code": - result = self._executor.run(action.code) - return MyObservation(stdout=result.stdout, ...) - - elif action.action_type == "tool": - tool_result = self._mcp_manager.call_tool( - tool_name=action.tool_name, - arguments=action.tool_args - ) - return MyObservation(stdout=str(tool_result.content), ...) -``` - -### 4. Environment Logic for Tool Support - -```python -class ToolEnabledEnvironment(Environment): - """Base class for environments that support MCP tools.""" - - def __init__(self, tools_manifest: str | None = None): - super().__init__() - if tools_manifest: - tools_config = self._load_tools_manifest(tools_manifest) - self._mcp_manager = MCPClientManager(tools_config) - - def get_available_tools(self) -> List[Dict[str, Any]]: - if self._mcp_manager: - return self._mcp_manager.get_available_tools() - return [] -``` - -### 5. Docker Runtime and Packaging - -#### Dockerfile with MCP Support - -```dockerfile -FROM envtorch-base:latest - -# Install Node.js for MCP servers -RUN apt-get update && apt-get install -y nodejs npm && \ - rm -rf /var/lib/apt/lists/* - -# Install Python MCP SDK -RUN pip install --no-cache-dir mcp - -# Copy environment code -COPY src/core/ /app/src/core/ -COPY src/envs/my_env/ /app/src/envs/my_env/ - -# Set up workspace for filesystem tools -RUN mkdir -p /workspace && chmod 777 /workspace - -CMD ["uvicorn", "envs.my_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"] -``` - -#### Volume Mounts for Tool Access - -```python -# Starting container with workspace mount -client = CodingEnv.from_docker_image( - "coding-env:latest", - workspace_dir=os.getcwd(), # Mount current directory - env_vars={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")} -) -``` - -## Open Questions & Feedback Requested - -### 1. Tool Discovery Timing -**Question**: When should tools be discovered - at startup or lazy loaded? - -**Feedback Needed**: Which approach fits your use case? - -### 2. Tool Security -**Question**: How should we handle tool permissions and secrets? - -**Options**: -- Allowlist of permitted tools -- Role-based access control -- Audit logging - -**Feedback Needed**: What security measures are essential? - -### 3. Tool Schema Validation -**Question**: Client-side, server-side, or both? - -**Feedback Needed**: Where should validation happen? - -### 4. Multi-Tool Composition -**Question**: Should environments support automatic tool chaining? - -**Feedback Needed**: Is this in scope? 
- -## Implementation Plan - -### Phase 1: Core MCP Integration -- [ ] Add MCP Python SDK dependency -- [ ] Implement `MCPClientManager` -- [ ] Add `ToolEnabledEnvironment` base class -- [ ] Implement `/tools` HTTP endpoint -- [ ] Create tools.yaml manifest schema - -### Phase 2: Container Support -- [ ] Update Dockerfile to include Node.js -- [ ] Add volume mount support -- [ ] Test with filesystem MCP server - -### Phase 3: Example Environments -- [ ] Update `CodingEnv` to support tools -- [ ] Add example tools.yaml -- [ ] Create documentation - -## Conclusion - -MCP tools integration will enable EnvTorch environments to provide standardized tool access to AI agents and RL frameworks. The proposed design uses declarative YAML manifests, provides automatic discovery via `/tools` endpoint, and maintains container isolation through careful volume mounting and environment variable passing. - -**Key areas for feedback**: -1. Tool discovery and initialization timing -2. Security model for tool access -3. Multi-tool composition support -4. Docker runtime requirements - -Please share your thoughts via GitHub Issues or Discussions! From 76700f8b416bf20437f205c64e97c6270b4fedf1 Mon Sep 17 00:00:00 2001 From: pankit-eng Date: Wed, 15 Oct 2025 16:46:19 -0400 Subject: [PATCH 14/16] rename RFC --- ...OpenEnv-0.1-RFC.md => 001-openenv-spec.md} | 38 +++++++++++-------- 1 file changed, 22 insertions(+), 16 deletions(-) rename rfcs/{OpenEnv-0.1-RFC.md => 001-openenv-spec.md} (85%) diff --git a/rfcs/OpenEnv-0.1-RFC.md b/rfcs/001-openenv-spec.md similarity index 85% rename from rfcs/OpenEnv-0.1-RFC.md rename to rfcs/001-openenv-spec.md index d90fb6e5..34599764 100644 --- a/rfcs/OpenEnv-0.1-RFC.md +++ b/rfcs/001-openenv-spec.md @@ -1,8 +1,9 @@ -# RFC: OpenEnv Framework for agent execution environments +# RFC: OpenEnv Framework Spec for agent execution environments -**Status**: Request for Comments(RFC) -**Created**: October 2025 -**Authors**: OpenEnv Contributors +**Status**: In Review +**Created**: 10/14/2025 +**Authors**: @Darktex, @pankit-eng +**RFC ID:** 001 ## Summary @@ -140,7 +141,22 @@ class ContainerProvider(ABC): ### Key Design Decisions -#### Decision 1: HTTP-Based Communication +In this RFC, we want to align on four decisions that will shape the overall design of the framework. + +#### Decision 1: Baseline API Set + +**Chosen Approach**: Define three core APIs as the baseline interface for this framework: `step`, `reset`, and `state`. + +**Rationale**: +- **`reset()`**: Initializes a new episode and returns initial observation, providing a clean starting point for agent interactions +- **`step(action)`**: Executes an action and returns an observation, forming the core interaction loop +- **`state()`**: Provides visibility into the current episode state and metadata + +These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners. + +**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs. 
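+
+As a minimal, non-normative sketch of the loop these three APIs enable (reusing the `EchoEnv` example from the README; the `message` field and `StepResult` attributes are illustrative assumptions):
+
+```python
+# Baseline interaction loop: reset -> step -> state.
+from envs.echo_env import EchoAction, EchoEnv
+
+client = EchoEnv.from_docker_image("echo-env:latest")
+try:
+    result = client.reset()  # start a new episode
+    for _ in range(3):       # core interaction loop
+        result = client.step(EchoAction(message="hello"))
+        print(result.observation, result.reward, result.done)
+    print(client.state())    # episode metadata
+finally:
+    client.close()  # lifecycle helper; outside the baseline trio
+```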
+
+#### Decision 2: HTTP-Based Communication
 
 **Chosen Approach**: Use HTTP/REST for client-server communication
 
@@ -150,7 +166,7 @@ class ContainerProvider(ABC):
 - Supports language-agnostic clients
 - FastAPI provides excellent developer experience
 
-#### Decision 2: Docker-Based runtime isolation and packaging
+#### Decision 3: Docker-Based runtime isolation and packaging
 
 **Chosen Approach**: Each environment runs in its own Docker container
 
@@ -160,16 +176,6 @@ class ContainerProvider(ABC):
 - Easy dependency management via Dockerfile
 - Industry-standard tooling
 
-#### Decision 3: Type-Safe Models
-
-**Chosen Approach**: Use Python dataclasses with explicit types for actions, observations, and state
-
-**Rationale**:
-- Native Python support (no extra dependencies)
-- Clear contracts between client and server
-- IDE support for autocomplete and type checking
-- Easy serialization to/from JSON
-
 ### Example Environments
 
From f14dba1ff87903c728870f859a336f0794a4673d Mon Sep 17 00:00:00 2001
From: pankit-eng
Date: Wed, 15 Oct 2025 16:56:34 -0400
Subject: [PATCH 15/16] Update co-authors that were missed

---
 rfcs/001-openenv-spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/001-openenv-spec.md b/rfcs/001-openenv-spec.md
index 34599764..7a7146f3 100644
--- a/rfcs/001-openenv-spec.md
+++ b/rfcs/001-openenv-spec.md
@@ -2,7 +2,7 @@
 
 **Status**: In Review
 **Created**: 10/14/2025
-**Authors**: @Darktex, @pankit-eng
+**Authors**: @Darktex, @pankit-eng, @jspisak, @zkwentz
 **RFC ID:** 001
 
 ## Summary
From 4e752b284d4b0ad75f865e535715e68f82ef8d9d Mon Sep 17 00:00:00 2001
From: pankit-eng
Date: Wed, 15 Oct 2025 17:42:05 -0400
Subject: [PATCH 16/16] add a decision to RFC about reward computation

---
 rfcs/001-openenv-spec.md | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/rfcs/001-openenv-spec.md b/rfcs/001-openenv-spec.md
index 7a7146f3..fbe71007 100644
--- a/rfcs/001-openenv-spec.md
+++ b/rfcs/001-openenv-spec.md
@@ -156,7 +156,36 @@ These three APIs establish the minimum viable interface for environment interact
 
 **Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs.
 
-#### Decision 2: HTTP-Based Communication
+#### Decision 2: Environment-Computed Rewards
+
+**Chosen Approach**: Rewards are computed inside the environment and returned as part of the observation. 
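+
+As an illustrative sketch (the environment, models, and reward rule here are assumptions, not part of the spec), a coding environment might grade each step from its executor's exit code:
+
+```python
+# Illustrative only: the reward is computed server-side inside step().
+def step(self, action: CodeAction) -> CodeObservation:
+    result = self._executor.run(action.code)  # run the submitted code
+    return CodeObservation(
+        stdout=result.stdout,
+        done=False,
+        reward=1.0 if result.exit_code == 0 else 0.0,  # example policy
+    )
+```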
+ +**Rationale**: +- **Encapsulation**: Reward logic stays with the environment where domain knowledge resides +- **Consistency**: Ensures reward computation is deterministic and reproducible across different client implementations +- **Flexibility**: Environments can use internal state and context not visible to clients for reward computation +- **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()` + +The `Observation` base class includes a `reward` field that environments populate: + +```python +@dataclass(kw_only=True) +class Observation: + """Base class for all environment observations.""" + done: bool = False + reward: Union[bool, int, float, None] = None + metadata: Dict[str, Any] = field(default_factory=dict) +``` + +This design enables environments to compute rewards based on: +- Action outcomes (e.g., exit codes, success/failure) +- Internal state transitions +- Multi-step trajectories +- Domain-specific metrics + +Clients receive fully-formed observations with rewards already computed, simplifying the client-side RL loop. + +#### Decision 3: HTTP-Based Communication **Chosen Approach**: Use HTTP/REST for client-server communication @@ -166,7 +195,7 @@ These three APIs establish the minimum viable interface for environment interact - Supports language-agnostic clients - FastAPI provides excellent developer experience -#### Decision 3: Docker-Based runtime isolation and packaging +#### Decision 4: Docker-Based runtime isolation and packaging **Chosen Approach**: Each environment runs in its own Docker container