Discussion about this post

Alex Torex:

Model-based RL is essential for efficient learning. Unfortunately, the model-learning network needs to learn incrementally from data under distribution shift. That requires continual learning, growing network capacity, and plasticity, which is one reason current deep learning networks don't handle model-based RL well.

Alex Torex:

Interesting paper: Mastering Atari Games with Limited Data: https://arxiv.org/abs/2111.00210

Though it is still limited to simple environments. From the abstract:

Despite the recent progress in sample-efficient RL, today's RL algorithms are still well behind human performance when the amount of data is limited. Although traditional model-based RL is considered more sample-efficient than model-free approaches, current model-free methods dominate in terms of performance for image-input settings. In this paper, we propose a model-based RL algorithm that, for the first time, achieves super-human performance on Atari games with limited data.

Through our ablations, we confirm the following three issues which pose challenges to algorithms like MuZero in data-limited settings.

- Lack of supervision on the environment model.

- Difficulty dealing with aleatoric uncertainty.

- Off-policy issues with multi-step value targets.

To address the above issues, we propose the following three critical modifications, which can greatly improve performance when samples are limited.

- Self-Supervised Consistency Loss

- End-To-End Prediction of the Value Prefix

- Model-Based Off-Policy Correction
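Of the three modifications, the self-supervised consistency loss is the easiest to sketch: the dynamics model's predicted next latent state is pulled toward the encoder's representation of the actual next observation via a cosine-similarity objective (with a stop-gradient on the target branch in the paper). Below is a minimal pure-Python illustration of the loss term itself; the function name and toy vectors are mine, not from the paper's code:

```python
import math

def cosine_consistency_loss(predicted_latent, target_latent):
    """Negative cosine similarity between the dynamics model's predicted
    next latent state and the embedding of the real next observation.
    Minimizing this (minimum -1.0) aligns the two representations; in the
    paper the target branch would be held fixed (stop-gradient)."""
    dot = sum(p * t for p, t in zip(predicted_latent, target_latent))
    norm_p = math.sqrt(sum(p * p for p in predicted_latent))
    norm_t = math.sqrt(sum(t * t for t in target_latent))
    return -dot / (norm_p * norm_t)

# Toy check: identical latents reach the minimum; orthogonal ones score 0.
print(cosine_consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(cosine_consistency_loss([1.0, 0.0], [0.0, 1.0]))
```

In practice this loss is computed on batched latent tensors and added to MuZero's reward/value/policy losses; the sketch only shows the geometric quantity being optimized.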
