
Commit 5591257

Cleanup docstring types (#169)
* Cleanup docstring types
* Update style
* Test with js hack
* Revert "Test with js hack"
  This reverts commit d091f43.
* Fix types
* Fix typo
* Update CONTRIBUTING example
1 parent 2c924f5 commit 5591257


43 files changed (+962, -950 lines)

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -50,9 +50,9 @@ def my_function(arg1: type1, arg2: type2) -> returntype:
     """
     Short description of the function.

-    :param arg1: (type1) describe what is arg1
-    :param arg2: (type2) describe what is arg2
-    :return: (returntype) describe what is returned
+    :param arg1: describe what is arg1
+    :param arg2: describe what is arg2
+    :return: describe what is returned
     """
     ...
     return my_variable
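
For illustration, here is a minimal sketch of a function written in the updated convention; the function and its parameters are hypothetical, not taken from the repository. Types live only in the signature annotations, which ``sphinx_autodoc_typehints`` renders next to each ``:param:`` entry, so the docstring no longer repeats them.

from typing import Optional


def clip_value(value: float, limit: Optional[float] = None) -> float:
    """
    Clip a value to ``[-limit, limit]`` when a limit is given.

    :param value: the value to clip
    :param limit: absolute clipping bound (no clipping if None)
    :return: the possibly clipped value
    """
    # The parameter and return types are documented by the annotations above
    if limit is None:
        return value
    return max(-limit, min(limit, value))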

docs/_static/css/baselines_theme.css

Lines changed: 9 additions & 0 deletions
@@ -50,3 +50,12 @@ a.icon.icon-home {
 .codeblock,pre.literal-block,.rst-content .literal-block,.rst-content pre.literal-block,div[class^='highlight'] {
   background: #f8f8f8;;
 }
+
+/* Change style of types in the docstrings .rst-content .field-list */
+.field-list .xref.py.docutils, .field-list code.docutils, .field-list .docutils.literal.notranslate
+{
+  border: None;
+  padding-left: 0;
+  padding-right: 0;
+  color: #404040;
+}

docs/misc/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ Documentation:
 - Added ``StopTrainingOnMaxEpisodes`` details and example (@xicocaio)
 - Updated custom policy section (added custom feature extractor example)
 - Re-enable ``sphinx_autodoc_typehints``
+- Updated doc style for type hints and remove duplicated type hints
stable_baselines3/a2c/a2c.py

Lines changed: 21 additions & 21 deletions
@@ -21,34 +21,34 @@ class A2C(OnPolicyAlgorithm):

     Introduction to A2C: https://hackernoon.com/intuitive-rl-intro-to-advantage-actor-critic-a2c-4ff545978752

-    :param policy: (ActorCriticPolicy or str) The policy model to use (MlpPolicy, CnnPolicy, ...)
-    :param env: (Gym environment or str) The environment to learn from (if registered in Gym, can be str)
-    :param learning_rate: (float or callable) The learning rate, it can be a function
-    :param n_steps: (int) The number of steps to run for each environment per update
+    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
+    :param env: The environment to learn from (if registered in Gym, can be str)
+    :param learning_rate: The learning rate, it can be a function
+    :param n_steps: The number of steps to run for each environment per update
         (i.e. batch size is n_steps * n_env where n_env is number of environment copies running in parallel)
-    :param gamma: (float) Discount factor
-    :param gae_lambda: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
+    :param gamma: Discount factor
+    :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator
         Equivalent to classic advantage when set to 1.
-    :param ent_coef: (float) Entropy coefficient for the loss calculation
-    :param vf_coef: (float) Value function coefficient for the loss calculation
-    :param max_grad_norm: (float) The maximum value for the gradient clipping
-    :param rms_prop_eps: (float) RMSProp epsilon. It stabilizes square root computation in denominator
+    :param ent_coef: Entropy coefficient for the loss calculation
+    :param vf_coef: Value function coefficient for the loss calculation
+    :param max_grad_norm: The maximum value for the gradient clipping
+    :param rms_prop_eps: RMSProp epsilon. It stabilizes square root computation in denominator
         of RMSProp update
-    :param use_rms_prop: (bool) Whether to use RMSprop (default) or Adam as optimizer
-    :param use_sde: (bool) Whether to use generalized State Dependent Exploration (gSDE)
+    :param use_rms_prop: Whether to use RMSprop (default) or Adam as optimizer
+    :param use_sde: Whether to use generalized State Dependent Exploration (gSDE)
         instead of action noise exploration (default: False)
-    :param sde_sample_freq: (int) Sample a new noise matrix every n steps when using gSDE
+    :param sde_sample_freq: Sample a new noise matrix every n steps when using gSDE
         Default: -1 (only sample at the beginning of the rollout)
-    :param normalize_advantage: (bool) Whether to normalize or not the advantage
-    :param tensorboard_log: (str) the log location for tensorboard (if None, no logging)
-    :param create_eval_env: (bool) Whether to create a second environment that will be
+    :param normalize_advantage: Whether to normalize or not the advantage
+    :param tensorboard_log: the log location for tensorboard (if None, no logging)
+    :param create_eval_env: Whether to create a second environment that will be
         used for evaluating the agent periodically. (Only available when passing string for the environment)
-    :param policy_kwargs: (dict) additional arguments to be passed to the policy on creation
-    :param verbose: (int) the verbosity level: 0 no output, 1 info, 2 debug
-    :param seed: (int) Seed for the pseudo random generators
-    :param device: (str or th.device) Device (cpu, cuda, ...) on which the code should be run.
+    :param policy_kwargs: additional arguments to be passed to the policy on creation
+    :param verbose: the verbosity level: 0 no output, 1 info, 2 debug
+    :param seed: Seed for the pseudo random generators
+    :param device: Device (cpu, cuda, ...) on which the code should be run.
         Setting it to auto, the code will be run on the GPU if possible.
-    :param _init_setup_model: (bool) Whether or not to build the network at the creation of the instance
+    :param _init_setup_model: Whether or not to build the network at the creation of the instance
     """

     def __init__(
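
To show the pattern the change relies on, here is a hypothetical sketch (not the real A2C constructor, which is truncated above and has many more parameters): the annotated ``__init__`` signature carries the parameter types, so the class docstring only describes their meaning.

from typing import Callable, Optional, Union


class MyAlgo:
    """
    Hypothetical algorithm illustrating the cleaned-up docstring style.

    :param learning_rate: The learning rate, it can be a function
    :param n_steps: The number of steps to run for each environment per update
    :param tensorboard_log: the log location for tensorboard (if None, no logging)
    """

    def __init__(
        self,
        # Hypothetical subset of parameters; Sphinx reads the types from these annotations
        learning_rate: Union[float, Callable] = 7e-4,
        n_steps: int = 5,
        tensorboard_log: Optional[str] = None,
    ):
        self.learning_rate = learning_rate
        self.n_steps = n_steps
        self.tensorboard_log = tensorboard_log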

stable_baselines3/common/atari_wrappers.py

Lines changed: 23 additions & 23 deletions
@@ -18,8 +18,8 @@ def __init__(self, env: gym.Env, noop_max: int = 30):
         Sample initial states by taking random number of no-ops on reset.
         No-op is assumed to be action 0.

-        :param env: (gym.Env) the environment to wrap
-        :param noop_max: (int) the maximum value of no-ops to run
+        :param env: the environment to wrap
+        :param noop_max: the maximum value of no-ops to run
         """
         gym.Wrapper.__init__(self, env)
         self.noop_max = noop_max
@@ -47,7 +47,7 @@ def __init__(self, env: gym.Env):
         """
         Take action on reset for environments that are fixed until firing.

-        :param env: (gym.Env) the environment to wrap
+        :param env: the environment to wrap
         """
         gym.Wrapper.__init__(self, env)
         assert env.unwrapped.get_action_meanings()[1] == "FIRE"
@@ -70,7 +70,7 @@ def __init__(self, env: gym.Env):
         Make end-of-life == end-of-episode, but only reset on true game over.
         Done by DeepMind for the DQN and co. since it helps value estimation.

-        :param env: (gym.Env) the environment to wrap
+        :param env: the environment to wrap
         """
         gym.Wrapper.__init__(self, env)
         self.lives = 0
@@ -97,7 +97,7 @@ def reset(self, **kwargs) -> np.ndarray:
         and the learner need not know about any of this behind-the-scenes.

         :param kwargs: Extra keywords passed to env.reset() call
-        :return: (np.ndarray) the first observation of the environment
+        :return: the first observation of the environment
         """
         if self.was_real_done:
             obs = self.env.reset(**kwargs)
@@ -113,8 +113,8 @@ def __init__(self, env: gym.Env, skip: int = 4):
         """
         Return only every ``skip``-th frame (frameskipping)

-        :param env: (gym.Env) the environment
-        :param skip: (int) number of ``skip``-th frame
+        :param env: the environment
+        :param skip: number of ``skip``-th frame
         """
         gym.Wrapper.__init__(self, env)
         # most recent raw observations (for max pooling across time steps)
@@ -126,8 +126,8 @@ def step(self, action: int) -> GymStepReturn:
         Step the environment with the given action
         Repeat action, sum reward, and max over last observations.

-        :param action: ([int] or [float]) the action
-        :return: ([int] or [float], [float], [bool], dict) observation, reward, done, information
+        :param action: the action
+        :return: observation, reward, done, information
         """
         total_reward = 0.0
         done = None
@@ -155,16 +155,16 @@ def __init__(self, env: gym.Env):
         """
         Clips the reward to {+1, 0, -1} by its sign.

-        :param env: (gym.Env) the environment
+        :param env: the environment
         """
         gym.RewardWrapper.__init__(self, env)

     def reward(self, reward: float) -> float:
         """
         Bin reward to {+1, 0, -1} by its sign.

-        :param reward: (float)
-        :return: (float)
+        :param reward:
+        :return:
         """
         return np.sign(reward)

@@ -175,9 +175,9 @@ def __init__(self, env: gym.Env, width: int = 84, height: int = 84):
         Convert to grayscale and warp frames to 84x84 (default)
         as done in the Nature paper and later work.

-        :param env: (gym.Env) the environment
-        :param width: (int)
-        :param height: (int)
+        :param env: the environment
+        :param width:
+        :param height:
         """
         gym.ObservationWrapper.__init__(self, env)
         self.width = width
@@ -190,8 +190,8 @@ def observation(self, frame: np.ndarray) -> np.ndarray:
         """
         returns the current observation from a frame

-        :param frame: (np.ndarray) environment frame
-        :return: (np.ndarray) the observation
+        :param frame: environment frame
+        :return: the observation
         """
         frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
         frame = cv2.resize(frame, (self.width, self.height), interpolation=cv2.INTER_AREA)
@@ -212,13 +212,13 @@ class AtariWrapper(gym.Wrapper):
     * Grayscale observation
     * Clip reward to {-1, 0, 1}

-    :param env: (gym.Env) gym environment
-    :param noop_max: (int): max number of no-ops
-    :param frame_skip: (int): the frequency at which the agent experiences the game.
-    :param screen_size: (int): resize Atari frame
-    :param terminal_on_life_loss: (bool): if True, then step() returns done=True whenever a
+    :param env: gym environment
+    :param noop_max:: max number of no-ops
+    :param frame_skip:: the frequency at which the agent experiences the game.
+    :param screen_size:: resize Atari frame
+    :param terminal_on_life_loss:: if True, then step() returns done=True whenever a
         life is lost.
-    :param clip_reward: (bool) If True (default), the reward is clip to {-1, 0, 1} depending on its sign.
+    :param clip_reward: If True (default), the reward is clip to {-1, 0, 1} depending on its sign.
     """

     def __init__(
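
The same idea applies to return types: in the hunks above, the tuple type in ``step()``'s ``:return:`` line is dropped because the ``GymStepReturn`` annotation already encodes it. A hypothetical wrapper in the cleaned-up style might look like this; the alias below is illustrative and stands in for the project's actual type alias.

from typing import Any, Dict, Tuple

import gym
import numpy as np

# Illustrative stand-in for the project's GymStepReturn type alias
GymStepReturn = Tuple[np.ndarray, float, bool, Dict[str, Any]]


class SignRewardWrapper(gym.Wrapper):
    """
    Clip the reward to {+1, 0, -1} by its sign (hypothetical example).

    :param env: the environment to wrap
    """

    def __init__(self, env: gym.Env):
        gym.Wrapper.__init__(self, env)

    def step(self, action: int) -> GymStepReturn:
        """
        :param action: the action
        :return: observation, reward, done, information
        """
        obs, reward, done, info = self.env.step(action)
        # The return type is documented by the annotation, not the docstring
        return obs, float(np.sign(reward)), done, info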
