Commit a10e3ae

Release v0.9.0 (#174)
1 parent 5591257 commit a10e3ae

3 files changed, +8 -2 lines changed

docs/misc/changelog.rst

Lines changed: 4 additions & 1 deletion
@@ -3,9 +3,11 @@
 Changelog
 ==========
 
-Pre-Release 0.9.0a2 (WIP)
+Pre-Release 0.9.0 (2020-10-03)
 ------------------------------
 
+**Bug fixes, get/set parameters and improved docs**
+
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - Removed ``device`` keyword argument of policies; use ``policy.to(device)`` instead. (@qxcv)
@@ -50,6 +52,7 @@ Others:
 - Clarified docstrings on what is saved and loaded to/from files
 - Simplified ``save_to_zip_file`` function by removing duplicate code
 - Store library version along with the saved models
+- DQN loss is now logged
 
 Documentation:
 ^^^^^^^^^^^^^^

stable_baselines3/dqn/dqn.py

Lines changed: 3 additions & 0 deletions
@@ -147,6 +147,7 @@ def train(self, gradient_steps: int, batch_size: int = 100) -> None:
         # Update learning rate according to schedule
         self._update_learning_rate(self.policy.optimizer)
 
+        losses = []
         for gradient_step in range(gradient_steps):
             # Sample replay buffer
             replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
@@ -169,6 +170,7 @@ def train(self, gradient_steps: int, batch_size: int = 100) -> None:
 
             # Compute Huber loss (less sensitive to outliers)
             loss = F.smooth_l1_loss(current_q, target_q)
+            losses.append(loss.item())
 
             # Optimize the policy
             self.policy.optimizer.zero_grad()
@@ -181,6 +183,7 @@ def train(self, gradient_steps: int, batch_size: int = 100) -> None:
         self._n_updates += gradient_steps
 
         logger.record("train/n_updates", self._n_updates, exclude="tensorboard")
+        logger.record("train/loss", np.mean(losses))
 
     def predict(
         self,
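
The pattern this change adds to `train` (collect one scalar loss per gradient step, then log the mean once per call) can be sketched in isolation. The sketch below is not the library's code: `smooth_l1` and `train_sketch` are hypothetical helpers written in plain Python, the loss formula is assumed to match the default behaviour of `F.smooth_l1_loss` (beta of 1, mean reduction), and the optimizer step and the logger call are left out.

```python
from statistics import mean


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 (Huber-style) loss over two equal-length sequences.

    Hypothetical stand-in for F.smooth_l1_loss, assuming its default
    settings; quadratic below `beta`, linear above it.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def train_sketch(batches):
    """Run one 'gradient step' per batch and return the mean loss.

    Mirrors the losses list in the diff: append a plain float per step
    (the diff uses loss.item() for this), then average once at the end.
    """
    losses = []
    for current_q, target_q in batches:
        loss = smooth_l1(current_q, target_q)
        losses.append(loss)
        # The optimizer update would happen here; omitted in this sketch.
    return mean(losses)


batches = [
    ([1.0, 2.0], [1.0, 4.0]),  # per-element losses 0.0 and 1.5, mean 0.75
    ([0.5, 0.5], [0.0, 0.0]),  # per-element losses 0.125 each, mean 0.125
]
print(train_sketch(batches))  # 0.4375
```

Averaging over the steps gives one `train/loss` value per `train` call rather than one per gradient step, which keeps the log the same size regardless of `gradient_steps`.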

stable_baselines3/version.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.9.0a2
+0.9.0
