Merged
53 changes: 10 additions & 43 deletions examples/OpenEnv_Tutorial.ipynb
@@ -2,7 +2,6 @@
"cells": [
{
"cell_type": "markdown",
"id": "cell-0",
"metadata": {},
"source": [
"<div align=\"center\">\n",
@@ -50,13 +49,13 @@
"\n",
"What do you do beyond Cartpole?\n",
"\n",
"Fast forward to 2025, GRPO is awesome and this time it's not JUST in theory, it works well in practise and is really here! \n",
"Fast-forward to 2025, GRPO is awesome and this time it's not JUST in theory, it works well in practise and is really here!\n",
"\n",
"The problem still remains, how do you take these RL algorithms and take them beyond Cartpole?\n",
"\n",
"A huge part of RL is giving your algorithms environment access to learn. \n",
"\n",
"We are excited to introduce an Environement Spec for adding Open Environments for RL Training. This will allow you to focus on your experiments and allow everyone to bring their environments. \n",
"We are excited to introduce an Environment Spec for adding Open Environments for RL Training. This will allow you to focus on your experiments and allow everyone to bring their environments.\n",
"\n",
"Focus on experiments, use OpenEnvironments, and build agents that go beyond Cartpole on a single spec.\n",
"\n",
@@ -65,7 +64,6 @@
},
{
"cell_type": "markdown",
"id": "cell-1",
"metadata": {},
"source": [
"## 📋 What You'll Learn\n",
@@ -116,7 +114,6 @@
},
{
"cell_type": "markdown",
"id": "cell-2",
"metadata": {},
"source": [
"---\n",
@@ -156,7 +153,6 @@
},
{
"cell_type": "markdown",
"id": "cell-3",
"metadata": {},
"source": [
"---\n",
@@ -187,7 +183,6 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "cell-4",
"metadata": {},
"outputs": [
{
@@ -253,7 +248,6 @@
},
{
"cell_type": "markdown",
"id": "cell-5",
"metadata": {},
"source": [
"---\n",
@@ -327,7 +321,6 @@
},
{
"cell_type": "markdown",
"id": "cell-6",
"metadata": {},
"source": [
"### The Architecture\n",
@@ -375,7 +368,6 @@
},
{
"cell_type": "markdown",
"id": "cell-7",
"metadata": {},
"source": [
"---\n",
@@ -394,7 +386,6 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "cell-8",
"metadata": {},
"outputs": [
{
@@ -443,7 +434,6 @@
},
{
"cell_type": "markdown",
"id": "cell-9",
"metadata": {},
"source": [
"---\n",
@@ -477,7 +467,6 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "cell-10",
"metadata": {},
"outputs": [
{
@@ -576,7 +565,6 @@
},
{
"cell_type": "markdown",
"id": "cell-11",
"metadata": {},
"source": [
"---\n",
@@ -622,7 +610,6 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "cell-12",
"metadata": {},
"outputs": [
{
@@ -722,7 +709,6 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "cell-13",
"metadata": {},
"outputs": [
{
@@ -810,7 +796,6 @@
},
{
"cell_type": "markdown",
"id": "cell-14",
"metadata": {},
"source": [
"### How the Client Works\n",
@@ -830,7 +815,6 @@
},
{
"cell_type": "markdown",
"id": "cell-15",
"metadata": {},
"source": [
"---\n",
@@ -857,7 +841,6 @@
},
{
"cell_type": "markdown",
"id": "cell-16",
"metadata": {},
"source": [
"## The Game: Catch 🔴🏓\n",
@@ -918,7 +901,6 @@
{
"cell_type": "code",
"execution_count": 8,
"id": "cell-17",
"metadata": {},
"outputs": [
{
@@ -990,7 +972,6 @@
{
"cell_type": "code",
"execution_count": 9,
"id": "cell-18",
"metadata": {},
"outputs": [
{
@@ -1009,15 +990,15 @@
"evalue": "Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1.",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[9], line 12\u001b[0m\n\u001b[0;32m 11\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 12\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpyspiel\u001b[39;00m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m✅ OpenSpiel is installed!\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'pyspiel'",
"\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
"\u001B[1;31mModuleNotFoundError\u001B[0m Traceback (most recent call last)",
"Cell \u001B[1;32mIn[9], line 12\u001B[0m\n\u001B[0;32m 11\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m:\n\u001B[1;32m---> 12\u001B[0m \u001B[38;5;28;01mimport\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[38;5;21;01mpyspiel\u001B[39;00m\n\u001B[0;32m 13\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m✅ OpenSpiel is installed!\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n",
"\u001B[1;31mModuleNotFoundError\u001B[0m: No module named 'pyspiel'",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[1;31mCalledProcessError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[9], line 17\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m⚠️ OpenSpiel not found. Installing...\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 16\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01msubprocess\u001b[39;00m\n\u001b[1;32m---> 17\u001b[0m \u001b[43msubprocess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43msys\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecutable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m-m\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpip\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43minstall\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m-q\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mopen_spiel\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 18\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m✅ OpenSpiel installed!\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 20\u001b[0m \u001b[38;5;66;03m# Start the OpenSpiel server in background\u001b[39;00m\n",
"File \u001b[1;32md:\\ANACONDA\\envs\\openenv\\Lib\\subprocess.py:413\u001b[0m, in \u001b[0;36mcheck_call\u001b[1;34m(*popenargs, **kwargs)\u001b[0m\n\u001b[0;32m 411\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m cmd \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m 412\u001b[0m cmd \u001b[38;5;241m=\u001b[39m popenargs[\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m--> 413\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m CalledProcessError(retcode, cmd)\n\u001b[0;32m 414\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;241m0\u001b[39m\n",
"\u001b[1;31mCalledProcessError\u001b[0m: Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1."
"\u001B[1;31mCalledProcessError\u001B[0m Traceback (most recent call last)",
"Cell \u001B[1;32mIn[9], line 17\u001B[0m\n\u001B[0;32m 15\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m⚠️ OpenSpiel not found. Installing...\u001B[39m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m 16\u001B[0m \u001B[38;5;28;01mimport\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[38;5;21;01msubprocess\u001B[39;00m\n\u001B[1;32m---> 17\u001B[0m \u001B[43msubprocess\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mcheck_call\u001B[49m\u001B[43m(\u001B[49m\u001B[43m[\u001B[49m\u001B[43msys\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mexecutable\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43m-m\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mpip\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43minstall\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43m-q\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mopen_spiel\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m]\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m 18\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m✅ OpenSpiel installed!\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m 20\u001B[0m \u001B[38;5;66;03m# Start the OpenSpiel server in background\u001B[39;00m\n",
"File \u001B[1;32md:\\ANACONDA\\envs\\openenv\\Lib\\subprocess.py:413\u001B[0m, in \u001B[0;36mcheck_call\u001B[1;34m(*popenargs, **kwargs)\u001B[0m\n\u001B[0;32m 411\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m cmd \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[0;32m 412\u001B[0m cmd \u001B[38;5;241m=\u001B[39m popenargs[\u001B[38;5;241m0\u001B[39m]\n\u001B[1;32m--> 413\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m CalledProcessError(retcode, cmd)\n\u001B[0;32m 414\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;241m0\u001B[39m\n",
"\u001B[1;31mCalledProcessError\u001B[0m: Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1."
]
}
],
@@ -1102,7 +1083,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cell-19",
"metadata": {},
"outputs": [],
"source": [
@@ -1124,7 +1104,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cell-20",
"metadata": {},
"outputs": [],
"source": [
@@ -1173,7 +1152,6 @@
},
{
"cell_type": "markdown",
"id": "cell-21",
"metadata": {},
"source": [
"---\n",
@@ -1220,7 +1198,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cell-22",
"metadata": {},
"outputs": [],
"source": [
@@ -1319,7 +1296,6 @@
},
{
"cell_type": "markdown",
"id": "cell-23",
"metadata": {},
"source": [
"### Watch a Policy Play!"
@@ -1328,7 +1304,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cell-24",
"metadata": {},
"outputs": [],
"source": [
@@ -1397,7 +1372,6 @@
},
{
"cell_type": "markdown",
"id": "cell-25",
"metadata": {},
"source": [
"---\n",
@@ -1416,7 +1390,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "cell-26",
"metadata": {},
"outputs": [],
"source": [
@@ -1477,7 +1450,6 @@
},
{
"cell_type": "markdown",
"id": "cell-27",
"metadata": {},
"source": [
"---\n",
@@ -1576,7 +1548,6 @@
},
{
"cell_type": "markdown",
"id": "cell-28",
"metadata": {},
"source": [
"---\n",
@@ -1711,7 +1682,6 @@
},
{
"cell_type": "markdown",
"id": "cell-29",
"metadata": {},
"source": [
"---\n",
@@ -1725,7 +1695,6 @@
},
{
"cell_type": "markdown",
"id": "cell-30",
"metadata": {},
"source": [
"## What You Learned\n",
@@ -1778,7 +1747,6 @@
},
{
"cell_type": "markdown",
"id": "cell-31",
"metadata": {},
"source": [
"## OpenEnv vs Traditional RL\n",
@@ -1845,7 +1813,6 @@
},
{
"cell_type": "markdown",
"id": "cell-32",
"metadata": {},
"source": [
"<a id=\"resources\"></a>\n",
2 changes: 1 addition & 1 deletion rfcs/002-env-spec.md
@@ -24,7 +24,7 @@ Building execution environments for AI agents, code execution, or computational
1. **Simplicity**: Simple APIs to interact with the environment from RL training code
2. **Type Safety**: Strongly-typed actions, observations, and state
3. **Isolation**: Each environment runs in its own Docker container
4. **Observability**: Leverage side-car container pattern to observe actions, observation tuples for an RL training eposide.
4. **Observability**: Leverage side-car container pattern to observe actions, observation tuples for an RL training episode.


## Design