[RFC 001] - Baseline API and Interface Specifications #26
Conversation
updated the name to OpenEnv
will we link this to the top level readme to ensure folks see it?
adding a pytorch logo :)
Adding an experimental warning to the readme.
Creating a PR to update naming on the Readme
Co-authored-by: Copilot <[email protected]>
adding the CoC..
```
┌─────────────────────────────────────────────────────────┐
│ RL code(Client Application) │
│ RL code(Client Application) │
```
double line
```
│ │ (HTTPEnvClient)│ │ (HTTPEnvClient) │ │
│ └────────┬───────┘ └────────┬─────────┘ │
└───────────┼───────────────────────────────┼─────────────┘
│ HTTP (reset, step, state) │ HTTP
```
I'm not sure we should expose state, since the model is that you keep the state private and only return what the agent is allowed to see in the observation. If you are playing chess or having a 1:1 conversation you can see everything, so it doesn't matter. But it matters in many real-life applications that involve imperfect information (e.g. poker, where you don't see other players' hands; or a driving sim, where some cars move out of your view because they are occluded by buildings or other cars).
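A rough sketch of the concern, with illustrative names that are not part of the RFC: the server keeps the full state private and only projects what the acting player is allowed to see into the observation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical poker-style example: the full table state stays on the server,
# and the observation only contains the cards this seat may see.
@dataclass
class PokerState:
    hands: Dict[int, List[str]]                      # every player's hole cards (private)
    community_cards: List[str] = field(default_factory=list)

@dataclass
class PokerObservation:
    my_hand: List[str]                               # only the acting player's cards
    community_cards: List[str]
    done: bool = False

def observe(state: PokerState, player_id: int) -> PokerObservation:
    """Project the private server-side state onto what one player may see."""
    return PokerObservation(
        my_hand=state.hands[player_id],
        community_cards=list(state.community_cards),
    )
```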
```python
@property
@abstractmethod
def state(self) -> State:
```
Nothing in Python is really private, so I'm not sure how to enforce this.
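One way to get practical enforcement despite Python's lack of true privacy is at the transport layer (a sketch with hypothetical route names, not the actual server code): simply never register an endpoint for state, so remote clients can only reach reset and step.

```python
from typing import Any, Callable, Dict

# Sketch: the HTTP wrapper only exposes reset and step, so env.state stays
# server-side even though Python itself cannot hide the attribute.
def build_routes(env: Any) -> Dict[str, Callable[..., Any]]:
    return {
        "/reset": lambda: env.reset(),
        "/step": lambda action: env.step(action),
        # intentionally no "/state" route; full state remains internal
    }
```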
#### 1. Environment (Server-Side)

```python
class Environment(ABC):
```
If we add a way of discovering actions (perhaps the topic of another RFC), it will have to propagate back to this interface.
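For reference, a minimal sketch of the server-side base class as described in this RFC (reset and step plus the state property discussed above); the generic parameters here are for illustration and the exact signatures in the repo may differ.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")
StateT = TypeVar("StateT")

class Environment(ABC, Generic[ActT, ObsT, StateT]):
    """Server-side environment: owns the full state, returns observations."""

    @abstractmethod
    def reset(self) -> ObsT:
        """Start a new episode and return the initial observation."""

    @abstractmethod
    def step(self, action: ActT) -> ObsT:
        """Apply an action and return the resulting observation (reward, done, ...)."""

    @property
    @abstractmethod
    def state(self) -> StateT:
        """Full internal state; kept server-side rather than exposed over HTTP."""
```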
- Generic types (`Generic[ActT, ObsT]`) provide compile-time type safety
- Each environment's concrete client class implements parsing of step, observation, and state responses from the server into the corresponding data models.
- Example: `CodingEnv(HTTPEnvClient[CodeAction, CodeObservation])`
We need a better naming convention here. I know that CodingEnvClient is a bit heavy, but I find the current convention of naming the server CodingEnvironment and the client CodingEnv to be deceptive/confusing
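To make the naming discussion concrete, a hedged sketch of what the suggestion might look like: the HTTP client gets an explicit `*Client` suffix so it cannot be mistaken for the server-side `CodingEnvironment`. The data models and the parsing method name below are illustrative, not the current API.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")

class HTTPEnvClient(Generic[ActT, ObsT]):
    """Stand-in for the RFC's HTTP client base class (details omitted)."""

@dataclass
class CodeAction:
    command: str

@dataclass
class CodeObservation:
    stdout: str
    reward: Optional[float] = None
    done: bool = False

# Named *Client to keep it visually distinct from the server-side CodingEnvironment.
class CodingEnvClient(HTTPEnvClient[CodeAction, CodeObservation]):
    def parse_observation(self, payload: dict) -> CodeObservation:
        return CodeObservation(
            stdout=payload.get("stdout", ""),
            reward=payload.get("reward"),
            done=payload.get("done", False),
        )
```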
| """Abstract base for container orchestration.""" | ||
|
|
||
| @abstractmethod | ||
| def start_container(self, image: str, ...) -> str: |
Since we call .reset() a lot, does it make sense to have a .reset() here too, e.g. to restart from a warmed-up image?
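A sketch of what that suggestion could look like; the class name, stop_container, and especially reset() are hypothetical additions for illustration, not the current interface.

```python
from abc import ABC, abstractmethod

class ContainerProvider(ABC):
    """Abstract base for container orchestration (sketch)."""

    @abstractmethod
    def start_container(self, image: str) -> str:
        """Start a container from an image and return its id."""

    @abstractmethod
    def stop_container(self, container_id: str) -> None:
        """Tear the container down."""

    # Hypothetical addition discussed above: reuse a warmed-up container
    # instead of paying the cold-start cost on every env.reset().
    @abstractmethod
    def reset(self, container_id: str) -> None:
        """Restore a running container to its initial, warmed-up state."""
```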
- **Flexibility**: Environments can use internal state and context not visible to clients for reward computation
- **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()`

The `Observation` base class includes a `reward` field that environments populate:
Optional reward field.
These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners.
**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()`, and environment-specific utilities) will be explored in follow-up RFCs.
I think we should call this out in the very first line, so that the reader isn't going to be like "BUT WHAT ABOUT TOOLS".
```python
    metadata: Dict[str, Any] = field(default_factory=dict)
```

This design enables environments to compute rewards based on:
"Note that the environment is just the place where these are returned, not necessarily where they are computed. For example, we recommend that you RPC to a GPU machine hosting your reward model"
(This brings the next question: what standard should said RPCs follow so that this code is shareable?)
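A sketch of the pattern being suggested, with entirely hypothetical names and endpoint: the reward is returned from the environment's step(), but the number itself comes from an RPC to a remote reward model. Whether plain HTTP/JSON, gRPC, or something else should be the blessed standard is exactly the open question above; this sketch assumes HTTP/JSON.

```python
from dataclasses import dataclass
from typing import Optional

import requests  # assumption: plain HTTP/JSON is used for the reward RPC

@dataclass
class Observation:
    """Local stand-in for the RFC's Observation base class."""
    done: bool = False
    reward: Optional[float] = None

def remote_reward(prompt: str, completion: str,
                  url: str = "http://reward-host:8000/score") -> float:
    """Ask a GPU machine hosting a reward model to score a transition (hypothetical endpoint)."""
    resp = requests.post(url, json={"prompt": prompt, "completion": completion}, timeout=30)
    resp.raise_for_status()
    return float(resp.json()["reward"])

class ChatEnvironment:
    """Toy environment: rewards are returned from step(), computed remotely."""

    def __init__(self, prompt: str) -> None:
        self._prompt = prompt

    def step(self, action: str) -> Observation:
        reward = remote_reward(self._prompt, action)  # RPC, not local compute
        return Observation(done=False, reward=reward)
```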
Merging this to move quicker. Will refactor.
pseudo-rnd-thoughts left a comment:
I realise I'm a bit late to the conversation, but I'll give my two cents, having maintained Gymnasium and thought about what I would change if a Gymnasium v2 were to appear.
- Make it vectorized and multi-agent by default. Single-agent, single-environment setups are just special cases, and making them the default removes the need for another library to add these features later (the way PettingZoo exists as the multi-agent equivalent of Gymnasium). Why vectorized? It further reduces duplication later if you want a server that runs multiple environments at the same time.
Personally, I would plan to bring these into the API from the beginning.
These changes are easy to implement through `num_envs = 1` and `num_agents = 1` attributes (I would make them properties for easier customization by users).
Also, changing the rewards to be an array gives the developer freedom over the shape, i.e., `(1,)` and `(num_envs, num_agents, num_reward_types)` are both valid under a single API.
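A sketch of how the single-agent, single-environment special case could be expressed under this proposal; class and property names follow the comment and are not part of the current RFC.

```python
import numpy as np

class VectorMultiAgentEnv:
    """Sketch: vectorized and multi-agent by default; scalars are the special case."""

    @property
    def num_envs(self) -> int:
        return 1              # a single environment is just num_envs == 1

    @property
    def num_agents(self) -> int:
        return 1              # a single agent is just num_agents == 1

    def step(self, actions: np.ndarray) -> np.ndarray:
        """Return the reward array; observation handling is omitted in this sketch."""
        # The developer chooses the reward shape: (1,), (num_envs,), or
        # (num_envs, num_agents, num_reward_types) are all valid under one API.
        return np.zeros((self.num_envs, self.num_agents), dtype=np.float32)
```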
```python
@dataclass(kw_only=True)
class Observation:
    """Base class for all environment observations."""
    done: bool = False
```
I would rename it to episode_over given the distinction between termination and truncation, i.e., episodes with a defined ending (answer given, etc.) and those where we end computation for an external reason. https://farama.org/Gymnasium-Terminated-Truncated-Step-API
While this differs from the old Gym naming convention, it is clearer naming for new users to understand, in my opinion.
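A tiny sketch of the suggested rename (reviewer's proposal, not the RFC's current field name):

```python
from dataclasses import dataclass

@dataclass(kw_only=True)
class Observation:
    """Base class for all environment observations (suggested naming)."""
    episode_over: bool = False   # replaces `done`; covers both termination and truncation
```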
```python
class Observation:
    """Base class for all environment observations."""
    done: bool = False
    reward: Union[bool, int, float, None] = None
```
I would use a numpy array as your reward: if you have a single reward that's fine, but it allows more than one reward without the API having to change significantly (see #107 and multi-reward RL - https://github.com/Farama-Foundation/MO-Gymnasium).
Further, this allows easier vectorization of environments.
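A sketch of the array-valued reward suggestion (the `np.ndarray` field is the reviewer's proposal, not the current spec; the shape is left to the environment):

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass(kw_only=True)
class Observation:
    """Base class for all environment observations (array reward variant)."""
    done: bool = False
    # A (1,) array reproduces the scalar case; (num_envs, num_agents, num_reward_types)
    # also fits under the same API.
    reward: Optional[np.ndarray] = None
```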
This PR is to discuss the OpenEnv 0.1 RFC, with focus on the baseline API and interface specifications.
What is proposed here is already available on the master branch, to try out and to gather feedback from the current experience.
NOTE: Extensions supporting observability and MCP tools will follow up on this baseline API spec RFC.