Model-based RL is essential for efficient learning. Unfortunately, the model-learning network needs to learn incrementally from data under distribution shift. That requires continual learning, growing network capacity, and plasticity, which is one reason current deep learning networks do not handle model-based RL well.
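To make the problem concrete, here is a minimal sketch of incremental world-model training under distribution shift. The toy MLP dynamics model, the synthetic drifting transitions, and all names are illustrative assumptions, not taken from any particular paper:

```python
# Sketch: a world model trained only on freshly collected transitions.
# As the data distribution drifts (here, the `phase` offset), accuracy on
# earlier phases degrades unless replay, regularisation, or extra capacity
# is added -- the continual-learning problem described above.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Tiny deterministic dynamics model: (state, action) -> next state."""
    def __init__(self, state_dim=4, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def fresh_batch(phase, batch=32, state_dim=4, action_dim=2):
    # Stand-in for transitions collected by the *current* policy.
    # The shift in `phase` mimics the non-stationary data a world model
    # sees as the behaviour policy improves.
    s = torch.randn(batch, state_dim) + phase
    a = torch.randn(batch, action_dim)
    s_next = s + 0.1 * a.sum(dim=-1, keepdim=True)   # toy dynamics
    return s, a, s_next

model = WorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    phase = step / 250.0                              # slow distribution shift
    s, a, s_next = fresh_batch(phase)
    loss = nn.functional.mse_loss(model(s, a), s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```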
Interesting paper: Mastering Atari Games with Limited Data: https://arxiv.org/abs/2111.00210
Though it is still limited to simple environments.
Despite recent progress in sample-efficient RL, today’s RL algorithms are still well behind human performance when the amount of data is limited. Although traditional model-based RL is considered more sample-efficient than model-free methods, current model-free methods dominate in terms of performance in image-input settings. In this paper, we propose a model-based RL algorithm that, for the first time, achieves super-human performance on Atari games with limited data.
Through our ablations, we confirm the following three issues which pose challenges to algorithms like MuZero in data-limited settings.
- Lack of supervision on environment model.
- Hardness to deal with aleatoric uncertainty.
- Off-policy issues of multi-step value.
To address the above issues, we propose the following three critical modifications, which can greatly improve performance when samples are limited.
- Self-Supervised Consistency Loss
- End-To-End Prediction of the Value Prefix
- Model-Based Off-Policy Correction
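For intuition, here is a minimal sketch of the first of these, a SimSiam-style self-supervised consistency loss on a latent dynamics model. This is not the authors' implementation; the network sizes, projector/predictor heads, and stop-gradient placement are illustrative assumptions:

```python
# Sketch of a self-supervised consistency loss for a latent dynamics model,
# in the spirit of EfficientZero's first modification. Shapes and heads are
# illustrative, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentModel(nn.Module):
    def __init__(self, obs_dim=64, action_dim=4, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
                                      nn.Linear(128, latent_dim))
        # Projection / prediction heads, as in SimSiam-style losses.
        self.projector = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                       nn.Linear(64, 64))
        self.predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                                       nn.Linear(64, 64))

def consistency_loss(model, obs_t, action_t, obs_t1):
    """Pull the predicted next latent towards the encoding of the observed next frame."""
    z_t = model.encoder(obs_t)
    z_pred = model.dynamics(torch.cat([z_t, action_t], dim=-1))  # imagined next latent
    with torch.no_grad():                                         # stop-gradient on the target branch
        target = model.projector(model.encoder(obs_t1))
    online = model.predictor(model.projector(z_pred))
    return -F.cosine_similarity(online, target, dim=-1).mean()   # negative cosine similarity

# Usage sketch on random tensors:
model = LatentModel()
obs_t, action_t, obs_t1 = torch.randn(8, 64), torch.randn(8, 4), torch.randn(8, 64)
loss = consistency_loss(model, obs_t, action_t, obs_t1)
loss.backward()
```

The point of the design is that the imagined next latent state is trained to match the encoder's embedding of the actually observed next frame, which gives the environment model a direct learning signal even when rewards are sparse.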
https://techxplore.com/news/2022-11-large-scale-virtual-visual-cortex-highly.html
https://ai.googleblog.com/2022/11/robots-that-write-their-own-code.html
Well, nature uses populations of organisms to collect data, train on it, and pass innate behaviors to offspring. When life first appeared, exploration was much safer because simple life forms had plenty of nutrients and no predators.
No matter how big a dataset we collect, it will never be enough to survive in a changing world. A big dataset implies a very big neural network, which may not be efficient in the real world without the ability to reform itself. We need a population of robots cheap enough to learn, survive, and multiply in the real world, doing simple tasks for us in the beginning. We need growing neural circuits with plasticity (something like Neural Attentive Circuits: https://arxiv.org/abs/2210.08031).
We need cell-like organisms that can build evolvable bodies. We need continual learning. We may need to first build bacteria-like self-assembling robots that can cooperate, put them on Mars, and wait for them to evolve into what we need. But then we would have to admit they are life forms, and we could not use them for our purposes.
In the end we need advances in biology, materials science, nanotech, genetics, brain prosthetics, neuromorphic chips, etc. Deep learning algorithms will have to wait for all those other fields to advance.
Algorithms need to evolve together with the brains that run them. There will always be a need to tweak the algorithms; true AI should do that itself.