
The official implementation of the paper "Constant-Memory Strategies in Stochastic Games: Best Responses and Equilibria"


Fernadoo/Const-Mem


Notebooks and Experiments

This codebase is tested under Python 3.9.18.

1. Ready-to-run notebooks accompanying Theorems 1 and 2.

  • kMemBR.ipynb demonstrates how to compute best responses (BRs) in the Iterated Prisoner's Dilemma (IPD)
  • kMemNE_full.ipynb demonstrates how to compute Nash equilibria (NEs) in the IPD
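The notebooks' internals are not reproduced here; as a minimal illustrative sketch (not the paper's implementation), a best response to a fixed memory-one IPD opponent can be computed by value iteration on the induced MDP, where the state is the previous joint action:

```python
import itertools

# Standard Prisoner's Dilemma payoffs for (my action, opponent action):
# T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def q(s, a, opp, gamma, V):
    """Expected discounted return of playing a in state s against opponent opp."""
    p = opp[s]  # opponent's probability of cooperating given the last joint action
    return sum(prob * (PAYOFF[(a, o)] + gamma * V[(a, o)])
               for o, prob in (('C', p), ('D', 1 - p)))

def best_response(opp, gamma=0.9, iters=500):
    """Value-iterate the MDP induced by a fixed memory-one opponent.

    opp maps the previous joint action (my_move, opp_move) to the
    opponent's cooperation probability; returns the BR action per state.
    """
    states = list(itertools.product('CD', repeat=2))
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(q(s, a, opp, gamma, V) for a in 'CD') for s in states}
    return {s: max('CD', key=lambda a: q(s, a, opp, gamma, V)) for s in states}

# Two classic memory-one opponents:
TFT = {s: (1.0 if s[0] == 'C' else 0.0)  # Tit-for-Tat copies my last move
       for s in itertools.product('CD', repeat=2)}
ALLD = {s: 0.0 for s in itertools.product('CD', repeat=2)}  # Always-Defect
```

With a discount factor of 0.9, this recovers the textbook answers: cooperate everywhere against Tit-for-Tat, defect everywhere against Always-Defect.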

2. Compute BRs to strategies from the Axelrod IPD library

python axl2tab.py
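The exact interface of axl2tab.py is not shown here; as a hedged sketch of the underlying idea (turning a strategy into a finite lookup table), a deterministic strategy can be tabulated over all joint histories of length k:

```python
import itertools

def tit_for_tat(history):
    """Deterministic strategy: copy the opponent's last move, C on round one.
    history is a list of (my_move, opp_move) pairs."""
    return history[-1][1] if history else 'C'

def to_table(strategy, k):
    """Tabulate a deterministic strategy over every joint history of length k.
    Keys are k-tuples of (my_move, opp_move) pairs; values are the next move."""
    return {h: strategy(list(h))
            for h in itertools.product(itertools.product('CD', repeat=2), repeat=k)}

table = to_table(tit_for_tat, 1)
```

Such a table is exactly the constant-memory representation against which a best response can then be computed.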

3. Compute BR in ITD

# Usage: python itd_learn.py OPPO_MEM MY_MEM NUM_TRAIN_EPISODE MAX_ITD_ROUND TRAIN_or_EVAL, e.g.,
python itd_learn.py 2 1 1e6 20 train
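The learning algorithm inside itd_learn.py is not shown here; as a generic sketch under assumptions (not the repository's implementation), learning a best response to a fixed opponent can be done with tabular Q-learning, where the agent's state is the previous joint action:

```python
import random

random.seed(0)
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def q_learn_vs_alld(episodes=2000, rounds=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning against an Always-Defect opponent.
    State = previous joint action (None before the first round)."""
    states = [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D'), None]
    Q = {(s, a): 0.0 for s in states for a in 'CD'}
    for _ in range(episodes):
        s = None  # no history at the start of an episode
        for _ in range(rounds):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice('CD')
            else:
                a = max('CD', key=lambda x: Q[(s, x)])
            o = 'D'  # Always-Defect never cooperates
            r, s2 = PAYOFF[(a, o)], (a, o)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in 'CD') - Q[(s, a)])
            s = s2
    return Q
```

Since the environment is deterministic, the learned Q-values converge to preferring defection in every reachable state, matching the exact best response.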

4. RL in the Pursuit Domain

  1. NE seeking
# Usage: python marl.py --alg "$ALG" --cfg "$CFG" --stack "$STACK" --it "$IT" --device "cuda:$DEVICE", e.g.,
python marl.py --alg DQN --cfg MidCatchMap16 --stack 4 --it 30e6 --device cuda:0
  2. 1-to-7 best response
# Usage: python marl_sbr.py --alg "$ALG" --cfg "$CFG" --stack "$STACK" --it "$IT" --device "cuda:$DEVICE" --load-op "$OP" --op-stack "$OPSTACK", e.g.,
python marl_sbr.py --alg DQN --cfg MidCatchMap16 --stack 4 --it 10e6 --device cuda:0 --load-op DQN_MidCatchMap16_8_60e6_2025-09-09-190423 --op-stack 8
  3. 4-to-4 best response
# Usage: python marl_team_sbr.py --alg "$ALG" --cfg "$CFG" --stack "$STACK" --it "$IT" --device "cuda:$DEVICE" --load-op "$OP" --op-stack "$OPSTACK", e.g.,
python marl_team_sbr.py --alg DQN --cfg MidCatchMap16 --stack 4 --it 10e6 --device cuda:0 --load-op DQN_MidCatchMap16_8_60e6_2025-09-09-190423 --op-stack 8
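How marl.py consumes the --stack flag is not shown here; assuming it denotes the number of recent observations kept (the constant-memory view of the history), the mechanism can be sketched as a small observation wrapper:

```python
from collections import deque

class FrameStack:
    """Keep only the last k observations: a constant-memory (k-stack)
    view of the interaction history."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, obs):
        """Start an episode; pad the stack with the first observation."""
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, obs):
        """Append the newest observation, evicting the oldest."""
        self.frames.append(obs)
        return tuple(self.frames)
```

Under this reading, --stack 4 versus --op-stack 8 pits a 4-memory learner against a pretrained 8-memory opponent.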

Notes

  • Detailed usage for each script is available via the --help flag.
  • Only a few prototype pretrained models are included.
