Extended Data Fig. 4: Null models.
From: A distributional code for value in dopamine-based reinforcement learning

a, Classical TD plus noise does not reproduce the pattern of results observed in real dopamine data in the variable-magnitude task. When reversal points were estimated in two independent partitions of the data, there was no correlation between the two estimates (P = 0.32 by linear regression). b, We then estimated the asymmetric scaling of responses and found no correlation between this asymmetry and the reversal point (P = 0.78 by linear regression). c, Model comparison between a ‘same’ model, with a single reversal point shared by all cells, and a ‘diverse’ model, with a separate reversal point for each cell. In both cases, the model is used to predict whether a held-out trial has a positive or negative response. d, Simulated baseline-subtracted RPEs, colour-coded according to the ground-truth value of the bias added to that cell’s RPEs. e, Across all simulated cells, there was a strong positive relationship between pre-stimulus baseline firing and the estimated reversal point. f, Two independent measurements of the reversal point were strongly correlated. g, The proportion of simulated cells with significantly positive (blue) or significantly negative (red) responses: no reward magnitude elicited both positive and negative responses. h, In the simulation, there was a significant negative relationship between the estimated asymmetry of each cell and its estimated reversal point, the opposite of what was observed in the neural data. i, Diagram illustrating a Gaussian-weighted topological mapping between RPEs and value predictors. j, Varying the standard deviation of this Gaussian modulates the degree of coupling. k, In a task with an equal chance of a reward of 1.0 or 0.0, distributional TD is robust to the degree of coupling. l, When there is no coupling, a distributional code is not learned, but asymmetric scaling can cause spurious detection of diverse reversal points. m, Even though every cell has the same reward prediction, the cells appear to have different reversal points.
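The coupling manipulation in panels i–l can be illustrated with a minimal sketch. It is not the paper's simulation code, and it rests on assumptions this legend does not state: predictors are indexed in order of their asymmetry τ, RPE channel j enters the update of predictor i with a Gaussian weight in the index distance |i − j|, a positive RPE is scaled by τ and a negative one by 1 − τ, and "no coupling" is read as uniform weights (`sigma=np.inf`). The names `gaussian_coupling` and `run_distributional_td` are hypothetical. For the task in panel k (reward 1.0 or 0.0 with equal chance), the fixed point of this update is the τ-expectile, which for this distribution equals τ, so tightly coupled predictors should settle near their own τ. With uniform weights every predictor receives the same update and therefore holds the same value.

```python
import numpy as np


def gaussian_coupling(n, sigma):
    """Hypothetical Gaussian mapping from n RPE channels to n value predictors.

    W[i, j] is the weight of RPE channel j in the update of predictor i and
    falls off with the index distance |i - j|. A small sigma gives an almost
    one-to-one mapping; sigma=np.inf gives uniform weights (no coupling).
    """
    idx = np.arange(n)
    dist = idx[:, None] - idx[None, :]
    w = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)  # each predictor's weights sum to 1


def run_distributional_td(taus, sigma, alpha=0.02, steps=5000, seed=0):
    """Learn one value per predictor in a task paying 1.0 or 0.0 with equal chance."""
    rng = np.random.default_rng(seed)
    taus = np.asarray(taus, dtype=float)
    w = gaussian_coupling(len(taus), sigma)
    values = np.zeros(len(taus))
    for _ in range(steps):
        reward = 1.0 if rng.random() < 0.5 else 0.0
        rpe = reward - values  # one RPE channel per value predictor
        # Asymmetric scaling: positive RPEs weighted by tau, negative by 1 - tau.
        scaled = np.where(rpe > 0, taus * rpe, (1.0 - taus) * rpe)
        values = values + alpha * (w @ scaled)  # coupled update
    return values


taus = np.linspace(0.05, 0.95, 10)
coupled = run_distributional_td(taus, sigma=0.1)       # near one-to-one mapping
uncoupled = run_distributional_td(taus, sigma=np.inf)  # every predictor gets the same error
print(np.round(coupled, 2))
print(np.ptp(uncoupled))
```

Row-normalizing the weights keeps each predictor's total learning rate fixed as `sigma` changes; that is a choice made for this sketch, not necessarily the one used in the paper.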
n, With the uncoupled model of panels l and m, some cells may have significantly positive responses, and others significantly negative responses, to the same reward. o, However, this uncoupled model cannot explain a positive correlation between asymmetric scaling and reversal points. p, Simulation of ‘synaptic’ distributional RL, in which learning rates, but not firing rates, are asymmetrically scaled. This model predicts diversity in reversal points between dopamine neurons. q, The model predicts no correlation between asymmetric scaling of firing rates and reversal point.
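The prediction in panels p and q can be contrasted with a variant in which the same asymmetry scales firing rates, using a minimal sketch under assumptions that do not come from the paper: all cells see one hypothetical set of equiprobable reward magnitudes (`REWARDS`), each cell's value is taken to be the fixed point of its asymmetric learning rule (the τ-expectile, computed here by bisection rather than by simulating learning), and asymmetry is estimated as slope_pos / (slope_pos + slope_neg) from least-squares slopes through the origin on each side of the reversal point. The function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical reward magnitudes, each delivered with equal probability.
REWARDS = np.array([1.0, 2.0, 4.0, 8.0, 16.0])


def expectile(rewards, tau, iters=100):
    """tau-expectile of an equiprobable reward set, found by bisection.

    This is the fixed point of TD learning with learning rate tau for positive
    RPEs and 1 - tau for negative RPEs.
    """
    lo, hi = rewards.min(), rewards.max()
    for _ in range(iters):
        e = 0.5 * (lo + hi)
        # Expected asymmetrically scaled RPE at value e; it decreases as e grows.
        g = (tau * np.mean(np.maximum(rewards - e, 0.0))
             - (1.0 - tau) * np.mean(np.maximum(e - rewards, 0.0)))
        if g > 0:
            lo = e
        else:
            hi = e
    return 0.5 * (lo + hi)


def firing_rate(rpe, tau, synaptic):
    """Dopamine response to an RPE under the two hypothetical variants."""
    if synaptic:
        return rpe  # asymmetry lives only in the learning rates
    return np.where(rpe > 0, tau * rpe, (1.0 - tau) * rpe)  # asymmetric firing


def measured_asymmetry(rpe, rate):
    """slope_pos / (slope_pos + slope_neg), each a least-squares fit through 0."""
    pos, neg = rpe > 0, rpe < 0
    slope_pos = np.sum(rpe[pos] * rate[pos]) / np.sum(rpe[pos] ** 2)
    slope_neg = np.sum(rpe[neg] * rate[neg]) / np.sum(rpe[neg] ** 2)
    return slope_pos / (slope_pos + slope_neg)


def simulate(taus, synaptic, rewards=REWARDS):
    """Return each cell's reversal point and measured asymmetry."""
    reversals, asymmetries = [], []
    for tau in taus:
        value = expectile(rewards, tau)  # reversal point: the response crosses zero here
        rpe = rewards - value
        reversals.append(value)
        asymmetries.append(measured_asymmetry(rpe, firing_rate(rpe, tau, synaptic)))
    return np.array(reversals), np.array(asymmetries)


taus = np.linspace(0.1, 0.9, 9)
rev_syn, asym_syn = simulate(taus, synaptic=True)
rev_fr, asym_fr = simulate(taus, synaptic=False)
print(np.round(rev_syn, 2), np.round(asym_syn, 2))
print(np.corrcoef(asym_fr, rev_fr)[0, 1])
```

In both variants the reversal points are identical and increase with τ, so both show diverse reversal points. Only when firing rates are scaled does the measured asymmetry track τ, and hence the reversal point; in the synaptic variant it is 0.5 for every cell and so carries no information about the reversal point, in line with panel q.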