Commit 680e83a

[doc] Update the order of zero_grad and backward (#6478)
* Fix zero_grad in docs
1 parent 518c7e4 commit 680e83a

2 files changed: +21 -11 lines changed

docs/source/common/lightning_module.rst

Lines changed: 17 additions & 9 deletions
@@ -178,12 +178,14 @@ Under the hood, Lightning does the following (pseudocode):
loss = training_step(batch)
losses.append(loss.detach())

+# clear gradients
+optimizer.zero_grad()
+
# backward
loss.backward()

-# apply and clear grads
+# update parameters
optimizer.step()
-optimizer.zero_grad()


Training epoch-level metrics
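
The reordered pseudocode in the hunk above follows the common plain-PyTorch convention of clearing gradients at the start of each iteration rather than after the step. As a runnable sketch of the same loop (the model, optimizer, and data below are illustrative placeholders, not taken from the docs):

import torch
from torch import nn

# illustrative placeholders
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

losses = []
for x, y in data:
    loss = nn.functional.mse_loss(model(x), y)
    losses.append(loss.detach())

    # clear gradients
    optimizer.zero_grad()

    # backward
    loss.backward()

    # update parameters
    optimizer.step()
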
@@ -212,12 +214,14 @@ Here's the pseudocode of what it does under the hood:
# forward
out = training_step(val_batch)

+# clear gradients
+optimizer.zero_grad()
+
# backward
loss.backward()

-# apply and clear grads
+# update parameters
optimizer.step()
-optimizer.zero_grad()

epoch_metric = torch.mean(torch.stack([x['train_loss'] for x in outs]))

@@ -247,12 +251,14 @@ The matching pseudocode is:
# forward
out = training_step(val_batch)

+# clear gradients
+optimizer.zero_grad()
+
# backward
loss.backward()

-# apply and clear grads
+# update parameters
optimizer.step()
-optimizer.zero_grad()

training_epoch_end(outs)

@@ -946,9 +952,9 @@ When set to ``False``, Lightning does not automate the optimization process. Thi
opt = self.optimizers(use_pl_optimizer=True)

loss = ...
+opt.zero_grad()
self.manual_backward(loss)
opt.step()
-opt.zero_grad()

This is recommended only if using 2+ optimizers AND if you know how to perform the optimization procedure properly. Note that automatic optimization can still be used with multiple optimizers by relying on the ``optimizer_idx`` parameter. Manual optimization is most useful for research topics like reinforcement learning, sparse coding, and GAN research.

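For illustration only, a manual-optimization ``training_step`` written in the updated order might look like the sketch below. The module, loss, and batch handling are assumptions, and the exact mechanism for disabling automatic optimization depends on the Lightning version:

import torch
import pytorch_lightning as pl

class LitManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)
        # assumption: this attribute toggles manual optimization in recent versions
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers(use_pl_optimizer=True)

        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)

        # clear gradients, then backward, then step
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
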
@@ -1048,11 +1054,13 @@ This is the pseudocode to describe how all the hooks are called during a call to

loss = out.loss

+on_before_zero_grad()
+optimizer_zero_grad()
+
backward()
on_after_backward()
+
optimizer_step()
-on_before_zero_grad()
-optimizer_zero_grad()

on_train_batch_end(out)

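As a rough sketch of where the reordered hooks now fire from user code, one could override them in a LightningModule; the hook names mirror the pseudocode above, and the gradient-norm print is an arbitrary example, not something the docs prescribe:

import torch
import pytorch_lightning as pl

class HookOrderDemo(pl.LightningModule):
    def on_before_zero_grad(self, optimizer):
        # called just before optimizer_zero_grad(); with the new order this runs
        # before backward(), so only gradients left over from the previous,
        # already-applied step are still present here
        pass

    def on_after_backward(self):
        # called right after backward() and before optimizer_step(), so the
        # fresh gradients for the current batch are available here
        grads = [p.grad.norm() for p in self.parameters() if p.grad is not None]
        if grads:
            print(f"grad norm: {torch.stack(grads).norm().item():.4f}")
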
docs/source/common/trainer.rst

Lines changed: 4 additions & 2 deletions
@@ -75,12 +75,14 @@ Here's the pseudocode for what the trainer does under the hood (showing the trai
# train step
loss = training_step(batch)

+# clear gradients
+optimizer.zero_grad()
+
# backward
loss.backward()

-# apply and clear grads
+# update parameters
optimizer.step()
-optimizer.zero_grad()

losses.append(loss)

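Under automatic optimization none of these calls appear in user code; Lightning issues zero_grad, backward, and step itself in the order shown above. A minimal end-to-end illustration (model, data, and hyperparameters are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitRegression(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        # only the loss is returned; Lightning runs zero_grad/backward/step
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
trainer = pl.Trainer(max_epochs=1)
trainer.fit(LitRegression(), DataLoader(dataset, batch_size=8))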