Hi there,
I'm opening this issue and assigning myself to it. I recently made a PR to RL.jl to add a RewardNormalizer.
Having thought about it, I don't think that was the best approach, because the normalizer still has to be hard-coded into every algorithm that uses it. I think I have figured out a way to implement a better solution via this repo.
Normalization is a two-phase thing:
- When generating new experience, the online stats are updated. This is done when pushing experience to the trajectory.
- When sampling to update a learner (or for any reason), normalize with the latest stats. This is done when fetching experience from the trajectory.
Put that way, it is clear that normalization is a trajectory concern. My proposal is to create one (or several) trajectory wrappers that add a normalizer field. Roughly, this would work as follows:
- `push!(a_normalized_trajectory[:trace], data)` first updates `a_normalized_trajectory.normalizer`, then does the normal push to `a_normalized_trajectory.trajectory[:trace]`.
- `(a_normalized_trajectory.trajectory.sampler)(a_normalized_trajectory)` first samples with `(sampler)(a_normalized_trajectory.trajectory)`, but normalizes the traces with the normalizer before returning.
Note that I used `:trace` above because this does not have to be restricted to rewards: state normalization is also very common in RL, and I believe it could work in just the same way.
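
To make this concrete, here is a minimal sketch of what such a wrapper could look like, assuming OnlineStats.jl for the running statistics. All names here (`NormalizedTraces`, `sample_normalized`, the `NamedTuple`-of-arrays batch format) are hypothetical and only illustrate the two phases described above, not an existing API.

```julia
using OnlineStats  # Moments supports mean/std and is updated incrementally with fit!

# Hypothetical wrapper around a trajectory: one online stat per normalized trace.
struct NormalizedTraces{T}
    trajectory::T
    normalizers::Dict{Symbol,Moments}
end

NormalizedTraces(traj; traces = (:reward,)) =
    NormalizedTraces(traj, Dict(t => Moments() for t in traces))

# Phase 1: pushing experience first updates the online stats, then forwards the push.
# (This sketch assumes scalar observations such as rewards.)
function Base.push!(nt::NormalizedTraces, data::NamedTuple)
    for (trace, stat) in nt.normalizers
        haskey(data, trace) && fit!(stat, data[trace])
    end
    push!(nt.trajectory, data)
end

normalize(stat, x) = (x .- mean(stat)) ./ (std(stat) + eps())

# Phase 2: sampled batches are normalized with the latest stats before being returned.
function sample_normalized(sampler, nt::NormalizedTraces)
    batch = sampler(nt.trajectory)            # assume a NamedTuple of arrays
    out = Dict{Symbol,Any}(pairs(batch))
    for (trace, stat) in nt.normalizers
        haskey(out, trace) && (out[trace] = normalize(stat, out[trace]))
    end
    NamedTuple(out)                           # requires Julia >= 1.6
end
```

State normalization would work the same way structurally, though a vector-valued trace would need a multivariate statistic (e.g. one stat per dimension) rather than a single `Moments`.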
Some notes
- We should use OnlineStats.jl instead of a homemade version.
- We must be careful with some samplers; I am mainly thinking of NStepSampler, where the sampled reward is a discounted sum of rewards, so the normalization must be done per reward. This indicates that normalization should happen at the earliest stage of sampling, not the latest (unlike what I describe above); see the small example after this list.
- We must think about how to deal with async trajectories. I think nothing needs to be done on the workers' side.
- Return normalization can also be done much more easily with the new trajectory design.
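
To spell out the NStepSampler concern, here is a tiny, purely illustrative computation (made-up numbers, no sampler API involved) showing that normalizing each reward before taking the discounted sum is not the same as normalizing the already-summed return with the same per-step statistics:

```julia
γ = 0.99
rewards = [1.0, 2.0, 3.0]   # rewards along one n-step transition (made up)
μ, σ = 2.0, 1.0             # pretend these are the running per-step reward stats

nstep_return = sum(γ^(k - 1) * r for (k, r) in enumerate(rewards))

# Normalizing per reward, before discounting (what NStepSampler would need):
per_reward = sum(γ^(k - 1) * (r - μ) / σ for (k, r) in enumerate(rewards))

# Normalizing the summed return with the same per-step stats:
post_hoc = (nstep_return - μ) / σ

per_reward ≈ post_hoc   # false: ≈ -0.02 vs ≈ 3.92
```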