diff --git a/units/en/unit3/additional-readings.mdx b/units/en/unit3/additional-readings.mdx
index 2b9da601..d506f886 100644
--- a/units/en/unit3/additional-readings.mdx
+++ b/units/en/unit3/additional-readings.mdx
@@ -4,6 +4,7 @@ These are **optional readings** if you want to go deeper.
 
 - [Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel](https://youtu.be/Psrhxy88zww)
 - [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
-- [Double Deep Q-Learning](https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html)
+- [Double Q-Learning](https://papers.nips.cc/paper/3964-double-q-learning)
+- [Double Deep Q-Learning](https://arxiv.org/abs/1509.06461)
 - [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
 - [Dueling Deep Q-Learning](https://arxiv.org/abs/1511.06581)
diff --git a/units/en/unit3/deep-q-algorithm.mdx b/units/en/unit3/deep-q-algorithm.mdx
index 28e7fd50..0f1c6ae0 100644
--- a/units/en/unit3/deep-q-algorithm.mdx
+++ b/units/en/unit3/deep-q-algorithm.mdx
@@ -84,7 +84,7 @@ Instead, what we see in the pseudo-code is that we:
 
 ## Double DQN [[double-dqn]]
 
-Double DQNs, or Double Deep Q-Learning neural networks, were introduced [by Hado van Hasselt](https://papers.nips.cc/paper/3964-double-q-learning). This method **handles the problem of the overestimation of Q-values.**
+Double DQNs, or [Double Deep Q-Learning neural networks](https://arxiv.org/abs/1509.06461), extend the [Double Q-Learning algorithm](https://papers.nips.cc/paper/3964-double-q-learning), introduced by Hado van Hasselt. This method **handles the problem of the overestimation of Q-values.**
 
 To understand this problem, remember how we calculate the TD Target:
 
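Double Q-Learning reduces the overestimation of Q-values by decoupling action selection from action evaluation; Double DQN applies the same idea using the online network to pick the greedy next action and the target network to evaluate it in the TD Target. Below is a minimal PyTorch-style sketch of that target computation, given as an illustration only; the function and argument names are assumptions and are not taken from the course code.

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN TD targets: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # Action selection: greedy action according to the online network.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation: the target network's Q-value for that action.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions get no bootstrap term.
        return rewards + gamma * (1.0 - dones) * next_q
```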