
Conversation

findmyway
Member

No description provided.

@HenriDeh
Member

Hello! I just wanted to pop in to discuss a certain point.

In Actor-Critic algorithms, we often have to sample from the trajectory once to update the critic and then a second time to update the actor. The same may apply to model-based methods if samples are needed to train the model.
Depending on the algorithm, the critic may need a different kind of batch than the actor/policy. Retrace, for instance, needs batches of N-step (state, action, reward) sequences to update the critic, while the actor only needs single states. That means the samplers should be different, so it would be nice to think about how to support specifying multiple samplers. Maybe `Sampler` could allow for a Dict like so:

:policy => BatchSampler
:qnetwork => NStepBatchSampler (3 steps)
:model => NStepBatchSampler (100 steps)

And then algorithms would call e.g. `inds, batch = traj.sampler[:policy](traj)` when updating the policy, but `inds, batch = traj.sampler[:qnetwork](traj)` when updating a Q-network.

Each algorithm could check that the required sampler is present in the Dict during the check phase.
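To make the idea concrete, here is a rough sketch of what that could look like (the sampler types and constructor arguments below are only placeholders, not the actual package API):

```julia
# Hypothetical stand-ins for the real sampler types, only to illustrate the idea.
struct BatchSampler
    batch_size::Int
end

struct NStepBatchSampler
    n::Int           # number of steps per sampled sequence
    batch_size::Int
end

# One sampler per component of the agent:
samplers = Dict(
    :policy   => BatchSampler(32),           # single states for the actor
    :qnetwork => NStepBatchSampler(3, 32),   # 3-step sequences for the critic
    :model    => NStepBatchSampler(100, 32), # long sequences for a learned model
)

# During the check phase an algorithm could verify that the samplers it needs exist:
required = (:policy, :qnetwork)
@assert all(k -> haskey(samplers, k), required)
```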

@findmyway
Member Author

Thanks for providing early feedback.

Yes, it should be easy to support. We just need a meta sampler 😉
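Something along these lines, for instance (just a sketch; the name `MetaSampler` and the generic `sample(sampler, traj)` entry point are assumptions for illustration, not a committed design):

```julia
# A "meta sampler" is a thin wrapper holding one sampler per named component
# and delegating to them when a batch is requested.
struct MetaSampler{S<:NamedTuple}
    samplers::S
end

# Sampling with a MetaSampler samples once per component and returns a
# NamedTuple of batches, e.g. (policy = ..., qnetwork = ...).
sample(ms::MetaSampler, traj) = map(s -> sample(s, traj), ms.samplers)

# Usage sketch (with the placeholder samplers from the comment above):
# ms = MetaSampler((policy = BatchSampler(32), qnetwork = NStepBatchSampler(3, 32)))
# batches = sample(ms, traj)
# batches.qnetwork  # the critic's batch
```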

findmyway merged commit dc220ed into JuliaReinforcementLearning:main on May 3, 2022