Extended Data Fig. 2: A comprehensive set of behavioral and experimental variables was examined before selecting the reward predictors.
From: Population coding of strategic variables during foraging in freely moving macaques

(a) The probability of reward availability as a function of the scheduled reward rate and the time since the preceding response on the same box. (b) Pearson correlation coefficient between the scheduled reward ratio and the observable reward ratio, where the observable ratio is calculated from a recent sequence of reward outcomes as defined in the main text. Recency was imposed by a causal half-Gaussian filter whose standard deviation is shown on the x-axis. For each monkey, the reward ratio was calculated using the standard deviation that maximized the correlation with the scheduled reward ratio. (c-e) Other observable or controllable variables that the animal might have used to predict rewards. (c) Similar to Fig. 2a, but for the inverse of the time since the last reward, the inverse of the number of unrewarded presses (losses), and the reward per press (the binary sequence of rewarded (1) and unrewarded (0) presses, filtered using the same causal half-Gaussian filter that was used to calculate the reward ratio). (d) Correlation matrix between observable and unobservable reward predictors. Based on this matrix, the waiting time was chosen as one reward predictor because it was maximally correlated with the hidden probability of reward availability, and the reward ratio was selected as the other reward predictor because it was maximally correlated with the scheduled reward rate and minimally correlated with the waiting time. Other predictors were omitted because they were correlated with one of these two variables (|r| > 0.21). (e) Change in the next action (n = 8862) as a function of each candidate reward predictor, discretized into three bins with an equal number of presses in each bin. The x-axis shows the center of the bins. The y-axis shows the excess percentage of switches or waiting time when unrewarded compared to when rewarded. The two chosen reward predictors, waiting time and reward ratio, linearly predict the change in the following action.
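The causal half-Gaussian filtering used for the reward per press in (c) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the expression of the standard deviation in units of presses, and the normalization by the sum of the available weights are all assumptions made for illustration.

```python
import numpy as np

def causal_half_gaussian_filter(outcomes, sigma):
    """Smooth a sequence with a causal half-Gaussian kernel.

    outcomes : binary sequence, 1 = rewarded press, 0 = unrewarded press
    sigma    : kernel standard deviation, assumed here to be in presses
    Only the current and earlier presses contribute to each output value.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    n = outcomes.size
    lags = np.arange(n)                            # 0 = current press
    weights = np.exp(-0.5 * (lags / sigma) ** 2)   # half-Gaussian over past lags
    filtered = np.empty(n)
    for t in range(n):
        w = weights[: t + 1]                       # weights for lags 0..t
        past = outcomes[t::-1]                     # outcomes[t], ..., outcomes[0]
        # Assumption: normalize by the weights actually available at press t.
        filtered[t] = np.dot(w, past) / w.sum()
    return filtered

# Hypothetical example sequence of press outcomes.
reward_per_press = causal_half_gaussian_filter([1, 0, 0, 1, 1, 0], sigma=2.0)
```

A smaller `sigma` weights only the most recent presses, whereas a larger `sigma` averages over a longer history; panel (b) sweeps this parameter.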
Data are presented as mean values ± SEM.
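The equal-count binning in (e), together with the mean and SEM per bin, can be sketched as follows. This is a hedged illustration rather than the authors' analysis: the function name is hypothetical, the bin center is assumed to be the mean predictor value within each bin, and `outcome` stands for any per-press quantity being averaged.

```python
import numpy as np

def equal_count_bins(predictor, outcome, n_bins=3):
    """Split presses into bins with (nearly) equal counts, ordered by predictor.

    Returns the bin centers, the mean outcome per bin, and its SEM.
    """
    predictor = np.asarray(predictor, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    order = np.argsort(predictor, kind="stable")   # press indices sorted by predictor
    groups = np.array_split(order, n_bins)         # equal-count index groups
    # Assumption: the bin center is the mean predictor value inside the bin.
    centers = np.array([predictor[g].mean() for g in groups])
    means = np.array([outcome[g].mean() for g in groups])
    # SEM = sample standard deviation / sqrt(number of presses in the bin)
    sems = np.array([outcome[g].std(ddof=1) / np.sqrt(g.size) for g in groups])
    return centers, means, sems
```

Sorting before splitting gives each bin the same number of presses, as described in the legend, instead of dividing the predictor range into equal-width intervals.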