AI Insight
This study investigates whether the brain uses reward prediction errors not only to update behavioral predictions, but also to actively reshape the underlying state representations themselves, a principle borrowed from machine learning. By simultaneously recording neural activity in the striatum (olfactory tubercle) and dopamine neurons in the ventral tegmental area, the researchers found that trial-by-trial changes in striatal activity align more closely with a dopamine-driven representation learning mechanism than with simpler, fixed-representation alternatives. This suggests that the mesolimbic system may implement a form of deep reinforcement learning, where errors propagate back to refine how the brain encodes environmental states.
Why it matters
Understanding how the brain dynamically constructs and updates internal representations could advance our knowledge of learning-related disorders such as addiction and depression, which involve dopaminergic dysfunction. It also provides biological grounding for principles used in modern artificial intelligence, potentially inspiring more brain-like learning algorithms.
⚠️ Preprint: not yet peer-reviewed
This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.
In reinforcement learning, an agent learns to map representations of the environment state to predictions of future reward. Most prior work in neuroscience has assumed a fixed representation and studied how reward prediction errors (thought to be conveyed by phasic dopamine signals) are used to update the mapping from representations to predictions. However, work in machine learning has demonstrated that much more powerful predictive systems can be learned by using the errors to update the representations themselves. We study whether the brain does something similar by leveraging simultaneous recordings of striatal projection neurons in the olfactory tubercle (putatively representing state features) and dopamine neurons in the ventral tegmental area. We show that trial-by-trial changes in striatal activity are more consistent with dopamine-driven representation learning than with a variety of alternative updating schemes. This result suggests a convergence of representation learning principles in biological and artificial systems.
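The contrast between a fixed representation and an error-updated one can be illustrated with a toy sketch. This is not the authors' model or analysis. It assumes single-step trials (so the prediction error is simply reward minus predicted value), a linear value estimate, and hypothetical odors, rewards, and feature vectors chosen only for illustration. In the fixed scheme the error updates only the readout weights `w`; in the representation-learning scheme the same error is also passed back to the feature matrix `U`.

```python
import numpy as np

def simulate(learn_representation, n_trials=3000, alpha=0.05):
    """Toy single-step value learning over three hypothetical odors."""
    rewards = np.array([1.0, 1.0, 0.0])      # assumed reward for odors A, B, C
    # Columns are the feature vectors of A, B, C. Odor B starts out as an
    # equal mixture of A and C, so a fixed linear readout cannot fit all three.
    U = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0]])
    U_initial = U.copy()
    w = np.zeros(2)                          # readout weights: features -> value

    for t in range(n_trials):
        s = t % 3                            # present the odors in a fixed cycle
        x = np.zeros(3)
        x[s] = 1.0                           # one-hot stimulus
        phi = U @ x                          # state features for this trial
        delta = rewards[s] - w @ phi         # reward prediction error
        grad_U = np.outer(w, x)              # dV/dU, taken before w changes
        w += alpha * delta * phi             # update the readout (both schemes)
        if learn_representation:
            U += alpha * delta * grad_U      # error also reshapes the features

    max_abs_error = np.abs(rewards - w @ U).max()
    max_feature_shift = np.abs(U - U_initial).max()
    return max_abs_error, max_feature_shift

fixed_error, fixed_shift = simulate(learn_representation=False)
learned_error, learned_shift = simulate(learn_representation=True)
print("fixed representation:   error", fixed_error, "feature shift", fixed_shift)
print("learned representation: error", learned_error, "feature shift", learned_shift)
```

In this toy setup the fixed scheme is left with a residual prediction error because odor B's features are tied to those of A and C, whereas the error-driven scheme can move B's features apart. The trial-by-trial change in `U` is the kind of quantity that, loosely speaking, could be compared against changes in recorded striatal activity, though the comparison used in the paper may differ.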
Source: Error-driven representation learning in the mesolimbic system