Normalization #7

@HenriDeh

Hi there,

I'm opening this issue and assigning myself to it. I recently made a PR to RL.jl to add a RewardNormalizer.
On reflection, I don't think it was done the best way: the normalizer still has to be "hard-coded" into every algorithm that uses it. I think I have found a better solution that can be implemented in this repo.

Normalization is a two-phase process:

  1. When generating new experience, the online stats are updated. This is done when pushing experience to the trajectory.
  2. When sampling to update a learner (or for any other reason), the data is normalized with the latest stats. This is done when fetching experience from the trajectory.
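The two phases above can be sketched with a single running statistic. This is a minimal, hypothetical sketch for a scalar trace such as the reward: ScalarNormalizer, update! and normalize_value are illustrative names, and the running mean/variance uses Welford's algorithm only as a stdlib-only stand-in for an OnlineStats.jl statistic.

```julia
# Hypothetical scalar normalizer. Welford's online mean/variance is used here as a
# stdlib-only stand-in for an OnlineStats.jl statistic.
mutable struct ScalarNormalizer
    n::Int          # number of values seen so far
    mean::Float64   # running mean
    m2::Float64     # running sum of squared deviations from the mean
end

ScalarNormalizer() = ScalarNormalizer(0, 0.0, 0.0)

# Phase 1 (push time): fold a new value into the running statistics.
function update!(s::ScalarNormalizer, x::Real)
    s.n += 1
    delta = x - s.mean
    s.mean += delta / s.n
    s.m2 += delta * (x - s.mean)
    return s
end

# Population standard deviation; returns 1.0 while there is no spread to measure yet.
function stddev(s::ScalarNormalizer)
    s.n < 2 && return 1.0
    v = s.m2 / s.n
    return v > 0 ? sqrt(v) : 1.0
end

# Phase 2 (sample time): normalize with the statistics as they are now.
normalize_value(s::ScalarNormalizer, x::Real) = (x - s.mean) / (stddev(s) + 1e-8)

# Usage: update on every push, normalize on every fetch.
rn = ScalarNormalizer()
for r in (1.0, 2.0, 3.0, 4.0)
    update!(rn, r)
end
z = normalize_value(rn, 4.0)   # (4.0 - 2.5) / population std of [1, 2, 3, 4]
```

Because only the statistics are updated at push time, the same stored value can be normalized differently later, once more experience has been seen.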

Put this way, it is clear that normalization belongs to the trajectory. I propose creating one or more trajectory wrappers that add a normalizer field. Roughly, this would work as follows:

  1. push!(a_normalized_trajectory[:trace], data) first updates a_normalized_trajectory.normalizer and then does the usual push to a_normalized_trajectory.trajectory[:trace].
  2. (a_normalized_trajectory.trajectory.sampler)(a_normalized_trajectory) first samples with (sampler)(a_normalized_trajectory.trajectory), then normalizes the sampled traces with a_normalized_trajectory.normalizer before returning them.
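The wrapper idea could look roughly like the following. Everything here is hypothetical and is not the Trajectories.jl API: NormalizedTrajectory, TraceHandle, RunningStats and sample_batch are illustrative names, a Dict of vectors stands in for the wrapped trajectory, and batch indices are passed in explicitly instead of being drawn by a sampler. It only illustrates the ordering: push! updates the stats before storing the raw value, and sampling normalizes with the stats as they are at that moment.

```julia
# Running mean/variance (Welford); stands in for an OnlineStats.jl statistic.
mutable struct RunningStats
    n::Int
    mean::Float64
    m2::Float64
end
RunningStats() = RunningStats(0, 0.0, 0.0)

function fit_stats!(s::RunningStats, x::Real)
    s.n += 1
    d = x - s.mean
    s.mean += d / s.n
    s.m2 += d * (x - s.mean)
    return s
end

# Population std, or 1.0 until there are at least two distinct values.
std_or_one(s::RunningStats) = (s.n > 1 && s.m2 > 0) ? sqrt(s.m2 / s.n) : 1.0

# Hypothetical wrapper: raw storage plus one RunningStats per normalized trace.
struct NormalizedTrajectory
    traces::Dict{Symbol,Vector{Float64}}    # stand-in for the wrapped trajectory
    normalizers::Dict{Symbol,RunningStats}  # only the traces listed here are normalized
end

function NormalizedTrajectory(names::Vector{Symbol}, normalized::Vector{Symbol})
    traces = Dict{Symbol,Vector{Float64}}(n => Float64[] for n in names)
    normalizers = Dict{Symbol,RunningStats}(n => RunningStats() for n in normalized)
    return NormalizedTrajectory(traces, normalizers)
end

# traj[:reward] returns a handle so that push!(traj[:reward], x) can be intercepted.
struct TraceHandle
    traj::NormalizedTrajectory
    name::Symbol
end
Base.getindex(t::NormalizedTrajectory, name::Symbol) = TraceHandle(t, name)

function Base.push!(h::TraceHandle, x::Real)
    s = get(h.traj.normalizers, h.name, nothing)
    s === nothing || fit_stats!(s, x)   # 1. update the stats first
    push!(h.traj.traces[h.name], x)     # 2. then the ordinary push of the raw value
    return h
end

# Sampling: fetch raw values, then normalize with the latest stats.
function sample_batch(t::NormalizedTrajectory, inds::AbstractVector{Int})
    batch = Dict{Symbol,Vector{Float64}}()
    for (name, data) in t.traces
        raw = data[inds]
        s = get(t.normalizers, name, nothing)
        batch[name] = s === nothing ? raw : (raw .- s.mean) ./ (std_or_one(s) + 1e-8)
    end
    return batch
end

traj = NormalizedTrajectory([:state, :reward], [:reward])
for (st, r) in zip((1.0, 2.0, 3.0, 4.0), (1.0, 2.0, 3.0, 4.0))
    push!(traj[:state], st)
    push!(traj[:reward], r)
end
batch = sample_batch(traj, [1, 4])   # indices fixed here; a sampler would draw them
```

Storing raw values and normalizing only at fetch time means the learner always sees the old experience rescaled with the latest stats, and algorithms never have to know that a normalizer exists.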

Note that I used :trace above because this does not have to be restricted to rewards: state normalization is also very common in RL, and I believe it would work the same way.
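For states, the same two phases would simply apply per feature. A hypothetical sketch, assuming vector-valued states with one running mean/variance per dimension (Welford's algorithm again, in place of an OnlineStats.jl statistic); VectorNormalizer and normalize_state are illustrative names:

```julia
# Hypothetical per-feature normalizer for vector-valued states: one running
# mean/variance per dimension (Welford), in place of an OnlineStats.jl statistic.
mutable struct VectorNormalizer
    n::Int
    mean::Vector{Float64}
    m2::Vector{Float64}
end
VectorNormalizer(dim::Int) = VectorNormalizer(0, zeros(dim), zeros(dim))

# Push time: update every dimension's statistics with one state vector.
function update!(s::VectorNormalizer, x::AbstractVector{<:Real})
    s.n += 1
    delta = x .- s.mean
    s.mean .+= delta ./ s.n
    s.m2 .+= delta .* (x .- s.mean)
    return s
end

# Sample time: scale each dimension by its own mean and population std.
# Note: a constant feature gives σ = 0 here, so real code would want a floor on σ.
function normalize_state(s::VectorNormalizer, x::AbstractVector{<:Real})
    σ = s.n > 1 ? sqrt.(s.m2 ./ s.n) : ones(length(x))
    return (x .- s.mean) ./ (σ .+ 1e-8)
end

sn = VectorNormalizer(2)
for st in ([0.0, 10.0], [2.0, 30.0])
    update!(sn, st)
end
z = normalize_state(sn, [2.0, 30.0])   # both features end up on the same scale
```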

Some notes

  • We should use OnlineStats.jl instead of a homemade version.
  • We must be careful with some samplers. I am mainly thinking of NStepSampler, where the sampled reward is a discounted sum of rewards, so the normalization must be applied to each reward individually. This means normalization should happen at the earliest stage of sampling, not the latest (unlike what I described above).
  • We must think about how to deal with async trajectories. I think nothing needs to be done on the workers' side.
  • Return normalization can also be done much more easily with the new trajectory design.
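A small arithmetic sketch of the NStepSampler point, assuming fixed, hypothetical reward statistics μ and σ and a single 3-step window; nstep_return and normalize_reward are illustrative helpers, not library functions. Normalizing each reward and then taking the discounted sum gives a different result from applying the per-reward statistics to the already-summed value:

```julia
# Discounted n-step sum r_1 + γ r_2 + ... + γ^(n-1) r_n, as an NStepSampler-like
# sampler would compute it (hypothetical helper, not the library function).
nstep_return(rs, γ) = sum(γ^(k - 1) * r for (k, r) in enumerate(rs))

# Per-reward normalization with fixed, hypothetical reward statistics μ and σ.
normalize_reward(r, μ, σ) = (r - μ) / σ

μ, σ, γ = 2.5, 1.0, 0.9
rewards = [1.0, 2.0, 3.0]   # one 3-step window of raw rewards

# Earliest stage: normalize each reward, then take the discounted sum.
early = nstep_return([normalize_reward(r, μ, σ) for r in rewards], γ)

# Latest stage: apply the per-reward statistics to the already-summed value.
late = normalize_reward(nstep_return(rewards, γ), μ, σ)

# The early version equals (G - μ * (1 + γ + γ^2)) / σ: the shift depends on n and γ,
# which the per-reward statistics alone do not account for once the sum is taken.
G = nstep_return(rewards, γ)
shifted = (G - μ * sum(γ^k for k in 0:length(rewards)-1)) / σ
```

Here early is about -1.545 while late is about 2.73, which is why the normalization has to happen before the sampler sums the rewards.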
