units/en/unit3/additional-readings.mdx (3 changes: 2 additions & 1 deletion)
@@ -4,6 +4,7 @@ These are **optional readings** if you want to go deeper.
 
 - [Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel](https://youtu.be/Psrhxy88zww)
 - [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
-- [Double Deep Q-Learning](https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html)
+- [Double Q-Learning](https://papers.nips.cc/paper/3964-double-q-learning)
+- [Double Deep Q-Learning](https://arxiv.org/abs/1509.06461)
 - [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
 - [Dueling Deep Q-Learning](https://arxiv.org/abs/1511.06581)
units/en/unit3/deep-q-algorithm.mdx (2 changes: 1 addition & 1 deletion)
@@ -84,7 +84,7 @@ Instead, what we see in the pseudo-code is that we:
 
 ## Double DQN [[double-dqn]]
 
-Double DQNs, or Double Deep Q-Learning neural networks, were introduced [by Hado van Hasselt](https://papers.nips.cc/paper/3964-double-q-learning). This method **handles the problem of the overestimation of Q-values.**
+Double DQNs, or [Double Deep Q-Learning neural networks](https://arxiv.org/abs/1509.06461), extend the [Double Q-Learning algorithm](https://papers.nips.cc/paper/3964-double-q-learning), introduced by Hado van Hasselt. This method **handles the problem of the overestimation of Q-values.**
 
 To understand this problem, remember how we calculate the TD Target:
 
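As context for the paragraph changed above: the core of Double DQN is that the online network *selects* the next action while the target network *evaluates* it, rather than the target network doing both as in vanilla DQN. Below is a minimal PyTorch sketch of that target computation, assuming `online_net` and `target_net` map a batch of states to per-action Q-values; all names here are illustrative, not taken from the course code.

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Sketch of the Double DQN TD target:
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a)), with 0 bootstrap on terminal s'.

    Assumes: online_net(s) and target_net(s) return tensors of shape [batch, n_actions];
    rewards and dones are float tensors of shape [batch].
    """
    with torch.no_grad():
        # 1) SELECT the greedy next action with the online network.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)  # [batch, 1]
        # 2) EVALUATE that action with the target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # [batch]
        # Terminal transitions bootstrap from zero.
        return rewards + gamma * (1.0 - dones) * next_q
```

Decoupling selection from evaluation matters because taking a max over noisy Q-estimates is biased upward; letting a second network score the chosen action dampens that overestimation.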