Main

From remarkable life scenarios to mundane everyday events, episodic memory plays a crucial role in shaping our perception of the world and our sense of self. Our brains not only store snapshots of individual events but also temporally weave together sequential contiguous moments into rich and coherent memories for future use. A fundamental open question in human memory is, how do we encode and retrieve memories of continuous experience?

Theoretical work suggests that the encoding, binding and compressing of sequential events into a coherent memory might rely on a temporal neural code1,2,3,4,5,6. A key mechanism that gives rise to temporal coding is phase precession, whereby neurons spike in progressively earlier phases of ongoing theta oscillations relative to the onset of a memorable event7,8 or the ___location of the animal in the case of place cells9,10,11. The notion that phase precession supports sequential learning extends to non-spatial tasks: phase precession has been observed during encoding of successive stimuli or states, such as images12, sounds8,13, odours8, time within an episode14,15,16,17 and task progression18. On the basis of this literature, we set out to test the hypothesis that phase precession would be present following cognitive boundaries (that is, event transitions) during movie clip watching and that the purpose of phase precession during those periods is to support the encoding of event sequences. As a result of phase precession, the phase of spiking relative to theta is indicative of the elapsed time since the onset of the event. Phase precession was first discovered in rodents9 and has since been investigated extensively in the context of spatial coding of self10 and others19. Phase precession has since also been reported in other species, including marmosets20, bats21 and humans12,22. A key insight of the work in marmosets, bats and humans is that phase precession is robustly present even if the underlying theta activity is present only intermittently in bouts21. Recent work further generalizes the presence of phase precession towards non-spatial domains, demonstrating that phase precession carries information about auditory stimulus identity8,13, odour identity8, time in an event sequence15,16,23 and sleep15.

Despite the prevalence of phase precession during spatial and non-spatial sequential behaviour across species, the precise functional role that phase precession plays in memory remains elusive. Phase precession coordinates hippocampal cell assemblies such that different cells fire in order within a single theta cycle11,24 in a manner that is conducive to the induction of synaptic plasticity2,25. Recent rodent8 and human studies14,22 report that phase precession occurs most strongly following event onsets or event transitions. On the basis of these data, it has been hypothesized that a function of phase precession is to organize neural activity such that a cohesive memory can be formed and later be retrieved. However, so far no direct link between the extent of phase precession and episodic-memory-related behaviour has been established.

We examined whether phase precession strength is related to whether a non-spatial episodic memory is encoded or retrieved in humans during continuous semirealistic experience. Our results make three key contributions. First, we report non-spatial phase precession in humans after event boundaries during naturalistic experience. Second, phase precession also occurred during recognition memory and order memory retrieval. Third, the strength of phase precession reflected participants’ memory encoding and retrieval success. Overall, our findings extend phase precession to non-spatial episodic memory and demonstrate a direct link between phase precession and memory behaviours in humans.

Results

Task and neural recording

Twenty-two patients with drug-resistant epilepsy participated in the experiment (3 new participants in addition to 19 participants for whom we previously published single-neuron but not field potential data26; Supplementary Table 1). The task consisted of three parts: encoding, scene recognition and time discrimination (Methods). During encoding, the participants watched 90 movie clips approximately eight seconds long each. Each clip was novel and consisted of either a single continuous shot (Fig. 1b, referred to as ‘no-boundary clips’) or several different shots separated by event boundaries (Fig. 1a, ‘boundary clips’). Event boundaries are time points (cuts) at which the movie transitions to an unexpected new part of the movie. This new part can be either contextually related or unrelated to the content of the movie seen so far. We referred to these two types of boundaries as ‘soft boundaries’ and ‘hard boundaries’ in our previous study (Zheng et al.26). Here we combine both types and refer to them jointly as ‘cognitive boundaries’. To counterbalance trial numbers, we defined virtual ‘no boundaries’ as the time point four seconds after the onsets of no-boundary clips. After watching all 90 clips once, the participants performed two memory tests. During the scene recognition test (Fig. 1c), the participants indicated whether the presented frame was ‘old’ (from watched clips) or ‘new’ (not watched) via a button press. During the time discrimination test (Fig. 1d), we evaluated order memory by asking the participants to indicate which of two frames shown on the screen appeared earlier in time in the movie clips just watched. The participants performed well in both memory tests, with accuracies of 73% ± 10% (mean ± s.d. here and throughout the manuscript unless stated differently) for the scene recognition test and 70% ± 12% for the time discrimination test (scene recognition: t21 = 22.263; P = 3 × 10−17; 95% confidence interval, (0.723, 0.853); time discrimination: t21 = 20.561; P = 2 × 10−15; 95% confidence interval, (0.692, 0.847); one sample two-tailed t-test with respect to chance level). The participants were implanted with depth electrodes for clinical evaluation while performing the task (see the participant demographics in Supplementary Table 1 and the spike quality metrics in Supplementary Fig. 1).

Fig. 1: Experiment and electrode locations.
figure 1

ad, Schematics of the three task stages. During encoding (a,b), the participants watched a series of silent clips and were instructed to answer yes/no questions that appeared randomly after every four to eight clips26. These clips contained either cuts to scenes from the same or a different movie (a, boundary clips) or no cuts (b, no-boundary clips). The red triangle in a marks the time point of the boundary in an example boundary clip. The grey triangle in b indicates the time point of four seconds in an example no-boundary clip. During scene recognition (c), the participants were presented with a static image and were asked to indicate whether the image was ‘old’ (shown in the watched clips) or ‘new’. During time discrimination (d), the participants were presented with two images side by side and were asked to indicate whether the left or right frame appeared first in the watched clips. See ref. 26 for more detailed information about the task. Due to copyright restrictions, the example images shown here are not the original stimuli used in the experiments. All images were generated by the authors. e, Locations of the 50 microelectrodes in the MTL across 22 participants (see the participants’ demographics in Supplementary Table 1) that had phase precession neurons. Shown is a slice from the template brain CIT168 (Methods), with microelectrodes plotted as individual dots and colour-coded for different brain regions (cyan for the amygdala, yellow for the hippocampus and red for the parahippocampal gyrus). The MNI coordinates for all microelectrodes in the plot are listed in Supplementary Table 2.

We quantified phase precession using simultaneously recorded single-neuron activity and local field potentials (LFPs) from microwire electrodes implanted in the hippocampus, amygdala and parahippocampal gyrus (Fig. 1e and Supplementary Table 2), jointly referred to as the medial temporal lobe (MTL). We also recorded data from three frontal areas, including the orbitofrontal cortex, anterior cingulate and pre-supplementary motor area, which we analysed to compare the prevalence of phase precession between the frontal lobe (in total 41 microwires and 433 neurons) and the MTL (in total 50 microwires and 503 neurons).

MTL theta bouts are prevalent following boundaries

As the internal reference ‘clock’ for neuronal spiking6, we first assessed the properties of theta-band LFPs recorded from the MTL (see the electrode locations in Fig. 1e and the electrode Montreal Neurological Institute (MNI) coordinates in Supplementary Table 2). Recordings from the amygdala, hippocampus and parahippocampal gyrus revealed large non-rhythmic fluctuations in the LFPs (Fig. 2a,c,e). Averaging power across the entire task, the LFPs recorded in these regions had approximately 1/f-shaped power spectra, with no apparent peaks in the conventional theta range of 4–8 Hz (Fig. 2a–f and Extended Data Fig. 1a–c,e–g).

Fig. 2: Characteristics of theta bouts during encoding.
figure 2

af, Example LFPs recorded from microelectrodes located in the amygdala (a), hippocampus (c), and parahippocampal gyrus (e). Panels b,d,f show LFP power spectra from the same example electrodes in a,c,e recorded throughout the entire task. g,i, Examples of detected theta bouts. The raw LFP (grey), 1–40 Hz band-pass-filtered LFP (black) and detected theta bout (red) are shown. t = 0 is the time point when a boundary occurs. h, Proportion of time occupied by theta bouts within different one-second time windows (analysis windows) before (Pre-Boundary) and after boundaries (Post-Boundary) in boundary clips, before (Pre-NoBoundary) and after (Post-NoBoundary) the midpoint in no-boundary clips, and after clip onsets (Post-ClipOnset) and clip offsets (Post-ClipOffset) of all the clips. Each dot represents one microelectrode (n = 50 in total). The asterisk and horizontal line denote the mean and median of the data, respectively. The shaded violin shape represents the data distribution, with the lower end indicating the 1st percentile (minima) and the top end indicating the 99th percentile (maxima). The top edge and bottom edge of the shaded rectangle represent the mean + s.d. and mean − s.d., respectively. The top edge and bottom edge of the shaded hourglass represent the 75th and 25th percentiles, respectively. ***P = 6 × 10−7 (two-tailed ANOVA test across all the analysis windows). j,l, Frequency of all detected theta bouts within the Post-Boundary time windows for the two example microelectrodes shown in g and i, respectively. The variance is also shown. k,m, Power spectra of the LFPs across all the Post-Boundary windows from the microelectrodes in g and i, respectively. n, Variance of the frequency of theta bouts across all recorded microwires in the MTL. The dashed and solid lines mark the variance of the two example microelectrodes shown in g and i, respectively.

We next identified brief ‘theta bouts’ in the time ___domain on a cycle-by-cycle basis27 and quantified each oscillatory cycle by its amplitude, period and waveform symmetry (see Methods and the characteristics of theta bouts in Extended Data Fig. 2). The revealed transient theta bouts (see examples highlighted in Fig. 2g,i; see also Methods) have similar frequencies across multiple cycles (Extended Data Fig. 2f), consistent with previous findings in humans27,28,29 and other far-sensing species20,21. On average, 14% ± 4% of the entire recording time was occupied by theta bouts. The likelihood of theta bout occurrence varied across different task periods. During encoding, considering 1 s periods, theta bouts were more likely to occur following cognitive boundaries in boundary clips (Fig. 2h; F5,244 = 12.4; P = 6 × 10−7; 95% confidence interval, (0.115, 0.423); two-tailed analysis of variance (ANOVA)), with theta bouts occupying 36.4% ± 8.7% of post-boundary time windows. In contrast, theta bouts occupied less time within the 1 s periods before cognitive boundaries in boundary clips (16.2% ± 5.9%), before (18.3% ± 4.7%) and after (16.8% ± 6.7%) no boundaries (that is, 4 s from the clip onsets) in no-boundary clips, and after clip onsets (25.2% ± 8.8%) and offsets (15.6% ± 4.9%) in all clips. The frequency of theta bouts was heterogeneous, spanning 2 Hz to 10 Hz, which is different from the narrow-band theta oscillations (~8 Hz) observed in rodents. This was true even when comparing different bouts detected on the same microelectrode (Fig. 2j,l shows two example microelectrodes). The variance of theta bout frequency changed considerably across electrodes, with some electrodes having theta bouts with relatively consistent frequencies (Fig. 2j, mostly from 6 Hz to 8 Hz). As a result, the microelectrodes with relatively low variance in bout-by-bout frequency changes had power spectra with prominent theta peaks when only considering the 1 s period following cognitive boundaries (Fig. 2k), whereas those with larger variance had at best a modest peak (Fig. 2m). Across all electrodes, low variance was relatively rare, with most electrodes exhibiting relatively large variance in the frequency of theta bouts (Fig. 2n; variance of theta bout frequency for all the electrodes was 4.18 ± 1.14 Hz).

Consistent with the findings during encoding, theta oscillations during memory retrieval were also relatively rare compared with the strong theta oscillations in rodents, resulting in approximately 1/f-shaped power spectra (Extended Data Figs. 1a–c and 2e–g, with balanced trial numbers across different comparison windows). The incidence of theta bouts increased following image onset in both retrieval tasks (Extended Data Fig. 1d,h; percentage of time; scene recognition: 22.2% ± 6.3% versus 12.7% ± 3.8%; F2,97 = 6.88, P = 2 × 10−3; time discrimination: 21.6% ± 4.2% versus 14.2% ± 3.7%; F2,97 = 7.11, P = 2 × 10−3; two-tailed ANOVA) relative to baseline (that is, when the fixation cross was presented; Fig. 1a–c). Theta bouts following boundaries during encoding and theta bouts following image onsets during scene recognition and time discrimination were more common on channels that also exhibited phase resetting (Extended Data Fig. 3). Theta bout prevalence was also relatively low during the later memory retrieval periods (Extended Data Fig. 1d,h). The total time occupied by theta bouts was substantially larger during encoding than following image onsets in both memory retrieval tasks (Extended Data Fig. 1i; encoding versus scene recognition: 36.4% ± 8.7% versus 21.3% ± 4.2%, P = 0.002, Kolmogorov–Smirnov test; encoding versus time discrimination: 36.4% ± 8.7% versus 22.4% ± 6.2%, P = 0.007, Kolmogorov–Smirnov test). Lastly, the frequency of detected theta bouts following image onsets varied more than that of theta bouts following cognitive boundaries during encoding (Extended Data Fig. 1j; scene recognition versus encoding: 5.25 ± 0.98 Hz versus 4.18 ± 1.14 Hz, P = 0.021, Kolmogorov–Smirnov test; time discrimination versus encoding: 5.47 ± 0.93 Hz versus 4.18 ± 1.14 Hz, P = 0.022, Kolmogorov–Smirnov test). In sum, these data show that LFPs in the human MTL contain prominent but relatively rare and short transient bouts of activity in the theta range. Theta bouts were most common following the occurrence of cognitive boundaries during encoding.

Theta phase precession mostly occurs following boundaries

Next, we examined whether the spiking of single neurons exhibited theta phase precession during memory formation and retrieval of continuous experience. We started by assessing phase precession within the post-boundary windows in boundary clips. Given the wide range of theta bout frequencies (Fig. 2n), we quantified phase precession with respect to activity within the 2–10 Hz frequency band. Similar to the methods used in rodents9,11,30, we quantified phase precession by computing the circular–linear correlation31 between the theta phase of spikes and the elapsed time after boundaries within each microwire. We assessed the significance of phase precession using a shuffle-based permutation procedure (Methods). We quantified elapsed time as the total accumulated theta phase rather than absolute time, a metric we refer to as ‘unwrapped phase’ throughout. This is a common method to assess phase precession first introduced by Mizuseki et al.32 and widely used in the context of non-stationary LFPs21,22. For example, 0° on the x axis in Fig. 3a marks the time of boundary onset, and 1,080° marks the time after three cycles of theta have elapsed.

Fig. 3: Prevalence of theta phase precession across different task stages.
figure 3

a, Example hippocampal phase precession neuron during encoding. The spiking phases relative to local theta (y axis) are plotted as a function of time in unwrapped theta phase (Methods). Each dot shows one spike within three theta cycles (x axis) relative to boundaries. The orange line indicates the fitted correlation between neuronal spiking phase and time in unwrapped theta phase. bd, Distribution of correlation coefficients for all MTL neurons demonstrating phase precession during encoding (b, orange), scene recognition (c, blue) or time discrimination (d, green). The distribution of correlation coefficients for neurons without phase precession is plotted in grey. Note that strong phase precession is indicated by negative correlation coefficients. e, Comparison of phase precession strength during encoding for the phase precession neurons in b (n = 68). Plotted are circular–linear correlation coefficients computed using spikes within three theta cycles before (Pre-Boundary) and after (Post-Boundary) boundaries, after clip onsets (Post-ClipOnset) and after clip offsets (Post-ClipOffset) in boundary clips, as well as before (Pre-NoBoundary) and after (Post-NoBoundary) the midpoint in no-boundary clips. ***P = 4 × 10−5 (two-tailed t-test against zero); NS, not significant. f,g, Comparison of phase precession strength during scene recognition (f) or time discrimination (g) for the phase precession neurons in c (n = 51) and d (n = 89), respectively. Plotted are circular–linear correlation coefficients computed using spikes within three theta cycles before (Pre-ImageOnset) and after image onsets (Post-ImageOnset), after image offsets (Post-ImageOffset) and after making a memory choice (Post-ButtonPress). The asterisk and horizontal line denote the mean and median of the data, respectively. The shaded violin shape represents the data distribution, with the lower end indicating the 1st percentile (minima) and the top end indicating the 99th percentile (maxima). The top edge and bottom edge of the shaded rectangle represent the mean ± s.d. and mean − s.d., respectively. The top edge and bottom edge of the shaded hourglass represent the 75th and 25th percentiles, respectively. ***P = 3 × 10−8 in f and ***P = 7 × 10−8 in g, two-tailed t-test against zero.

We found that 68/503 (13.5%; above chance, P < 7 × 10−4, permutation test) neurons in the MTL demonstrated significant phase precession following cognitive boundaries during encoding as assessed using this method (P < 0.05, permutation test; mean preferred phase, 166 ± 32 degrees; see example in Fig. 3a and Methods). Figure 3b shows the distribution of correlation coefficients for the selected 68 neurons. As time elapsed following cognitive boundaries, most phase precession neurons (66/68, 97%) spiked progressively earlier relative to ongoing theta (from 360° to 0°), characterized by a negative spike-phase correlation (Fig. 3b). Phase precession neurons exhibited an average correlation coefficient of −0.38 ± 0.11 (t67 = −12.550; P = 3 × 10−8; 95% confidence interval, (−0.391, 0.227); one-sample two-tailed t-test), which is comparable to the strength of phase precession reported previously8,33,34. Consistent with previous findings21, the majority of these phase precession neurons also exhibited theta-band phase-locking when considering all spikes together (Extended Data Fig. 4), Most of these phase precession neurons do not change their preferred phases before and after boundaries (Extended Data Fig. 5). Around 40% of phase precession neurons did not exhibit significant phase-locking, which is proportionally larger than typically observed in place and grid cells in rodents32. By definition, phase precession neurons had more negative (P = 3 × 10−4; 95% confidence interval, (−0.521, −0.233); two-tailed Kolmogorov–Smirnov test) correlation coefficients than non-phase-precession neurons (−0.07 ± 0.13, mean ± s.d.; one-sample two-tailed t-test: t67 = −1.391; P = 0.262; 95% confidence interval, (−0.085, 0.007)). Note that phase precession was also present in trials without detected theta bouts (Extended Data Fig. 6). To evaluate the robustness of phase precession, we also computed phase precession using two different methods, which demonstrated consistent findings (Extended Data Fig. 7 and Supplementary Table 3; see Methods).

Was phase precession specific to the 1 s period following cognitive boundaries? For the neurons showing phase precession after cognitive boundaries (n = 68; Fig. 3b, orange), their phase precession strength following no boundaries, clip onsets and clip offsets or proceeding cognitive boundaries and no boundaries did not significantly differ from chance levels (P > 0.05, permutation test; Fig. 3e). These findings suggest that the reported phase precession neurons demonstrate phase precession prominently during memory formation of continuous experience following cognitive boundaries.

Phase precession is task dependent and anatomically specific

We next turned to assessing phase precession during memory retrieval. We found that 10.1% (51/503) and 17.7% (89/503) of MTL neurons showed phase precession (P < 0.050, permutation test) when participants were presented with images during scene recognition (Fig. 3c; mean preferred phase, 168 ± 25 degrees) and time discrimination (Fig. 3d; mean preferred phase, 154 ± 31 degrees), respectively. These neurons exhibited an average correlation coefficient of −0.41 ± 0.09 (t51 = −16.622; P = 2 × 10−8; 95% confidence interval, (−0.418, −0.235); one-sample two-tailed t-test) during scene recognition and −0.40 ± 0.12 during time discrimination. By definition, these correlation coefficients were more negative than those of non-phase-precession neurons (scene recognition: P = 2 × 10−4; 95% confidence interval, (−0.172, −0.104); time discrimination: P = 2 × 10−4; 95% confidence interval, (−0.213, −0.126); two-tailed Kolmogorov–Smirnov test). Similar to theta bouts, the observed phase precession was prominent following the onset of the images shown but not during baseline, following image offset or during the button-press period (Fig. 3f,g). What triggered phase precession in the neurons we examined? We examined whether phase precession was related to phase resetting, shifts in preferred phases of phase-locking or the presence of theta bouts. This comparison revealed that phase resetting on channels with phase resetting neurons was the most common relationship (Extended Data Figs. 36 and Supplementary Table 4). The strength of phase resetting was stronger in hard than in soft boundaries (Extended Data Fig. 3b). Given that the majority (49/68, 72%) of phase precession neurons demonstrated phase precession following both soft and hard boundaries (Extended Data Fig. 3c) and more phase precession neurons (n = 22 versus 68, χ2 = 16.6, P = 3 × 10−4, chi-squared test) were identified using time in unwrapped theta phase instead of absolute time, the observed phase precession seemed not strictly contingent on phase resetting. Furthermore, in a substantial proportion of neurons, none of the three (phase resetting, shifts in preferred phase of phase-locking or the presence of theta bouts) were present, leaving it an open question what initiates phase precession in these cases.

We found that phase precession is a dynamic process. Among all the phase precession neurons observed in the MTL across encoding, scene recognition and time discrimination (n = 117), half (61/117, 47.9%; P = 1 × 10−6, binomial tests) demonstrated phase precession exclusively for only one task stage (Fig. 4d; Fig. 4a–c shows an example). Of the neurons that showed phase precession in only one task, the majority did so during time discrimination (40/61, 65.6%) rather than encoding (18/61, 29.5%) or scene recognition (3/61, 4.9%). We also observed neurons that showed phase precession during multiple task stages (Fig. 4d; 56/117, 52.1%, P = 1 × 10−8, binomial tests), with most of these neurons showing phase precession for all three task stages (35/56, 62.5%). The strength of phase precession varied across different task stages (see the example in Fig. 4a–c), with more negative correlations (indicating stronger phase precession) during the two memory retrieval tasks than during encoding (Fig. 4e,f; scene recognition minus encoding: rdiff = −0.19 ± 0.09; t55 = −11.345; P = 3 × 10−4; 95% confidence interval, (−0.199, −0.140); time discrimination minus encoding: rdiff = −0.24 ± 0.11; t55 = −12.334; P = 6 × 10−7; 95% confidence interval, (−0.252, −0.179); one-sample two-tailed t-test). Comparing the two retrieval tasks, phase precession was stronger during time discrimination than during scene recognition (time discrimination minus scene recognition: rdiff = −0.11 ± 0.04; t55 = −6.211; P = 0.031; 95% confidence interval, (−0.114, 0.088); one sample two-tailed t-test). Regardless of the difference in phase precession across the three task stages, their firing rates remained relatively the same (Supplementary Fig. 2a; scene recognition minus encoding: Frdiff = −0.18 ± 0.41 spikes per s; t55 = −1.286; P = 0.334; 95% confidence interval, (−0.224, 0.049); one-sample two-tailed t-test; time discrimination minus encoding: Frdiff = −0.14 ± 0.38 spikes per s; t55 = −0.861; P = 0.243; 95% confidence interval, (−0.181, 0.082); one-sample two-tailed t-test). Lastly, we asked whether phase precession strength varied as a function of different types of trials within a given task. Phase precession during scene recognition was significantly stronger following the onset of images that were old than images that were novel (−0.26 ± 0.14 versus −0.11 ± 0.07; t1,50 = −4.235; P = 4 × 10−3; 95% confidence interval, (−0.165, −0.059); two-tailed t-test; Extended Data Fig. 8), suggesting a role of precession in retrieval. Together, these results indicate that the strength with which a given neuron exhibits phase precession is a dynamic process that varies as a function of both the task and the trial type within the task.

Fig. 4: Comparison of phase precession strength across different task stages.
figure 4

ac, Example hippocampal neuron whose spiking exhibited phase precession during scene recognition and time discrimination, but not encoding. Shown are spike phases as a function of time in unwrapped theta phase, displayed separately for encoding (a; t = 0 is the boundary in boundary clips), scene recognition (b; t = 0 is the image onset) and time discrimination (c; t = 0 is the image onset). The coloured lines indicate the fitted correlation between spike phase and time in unwrapped theta phase. The correlation value and its statistical significance (two-tailed permutation test) are listed above each plot. d, Number of MTL neurons showing significant phase precession during encoding (orange), scene recognition (blue), time discrimination (green) and combinations thereof. e, Difference in phase precession strength between encoding and scene recognition for neurons that show phase precession for encoding and/or scene recognition (that is, the orange plus blue circles in d). f, Difference in phase precession strength between encoding and time discrimination for neurons that showed significant phase precession during encoding and/or time discrimination (that is, the orange plus green circles in d). g, Difference in phase precession strength between scene recognition and time discrimination for neurons that showed significant phase precession during recognition and/or time discrimination (that is, the blue plus green circles in d). Note that in eg, more negative correlations indicate stronger phase precession. The dashed lines indicate the example hippocampal neuron shown in ac.

We then asked how phase precession propensity varied across the brain. There was a statistically significant interaction between the effects of task stages (encoding, scene recognition and time discrimination) and brain regions (F3,6 = 9.37, P = 0.011, two-tailed ANOVA test). In particular, only the hippocampus and amygdala contained a significantly higher proportion of phase precession neurons than expected by chance during all three task stages (Fig. 5a; P < 0.05, permutation test). While substantially rarer, the proportion of phase precession neurons was also larger than expected by chance in the parahippocampal gyrus, orbitofrontal cortex and anterior cingulate cortex, but only during encoding and not during either of the memory retrieval tests (Fig. 5a; P < 0.05, permutation test).

Fig. 5: Anatomical distribution of phase precession neurons and their co-occurrence with firing rate changes.
figure 5

a, Phase precession was most prominent in the MTL. The bars indicate the proportion of neurons that exhibited phase precession in each anatomical area during different task stages (orange for encoding, blue for scene recognition and green for time discrimination). The number of phase precession neurons in each anatomical area is listed at the top of each bar. The dashed horizontal line indicates the chance level. The asterisks mark brain areas under specific task stages with proportions of phase precession neurons larger than expected by chance (P < 0.05, one-tailed binomial test). HPC, hippocampus; AMY, amygdala; PHG, parahippocampal gyrus; OFC, orbitofrontal cortex; ACC, anterior cingulate; SMA, supplementary motor area. b, Most phase precession neurons did not exhibit modulation by firing rate. Shown are the proportions of phase precession neurons in the MTL with and without firing rate modulation (defined as firing rate differences between before and after boundaries during encoding). c,d, Same as b but assessing whether neurons changed their firing rate between before and after image onset during scene recognition (c) or time discrimination (d). ***P < 0.001; **P < 0.01; *P < 0.05.

Phase precession can be independent from firing rate tuning

We previously showed that around 10% of MTL neurons selectively increased their firing rates after cognitive boundaries during encoding (we labelled these cells ‘cognitive boundary cells’ in Zheng et al.26). Were the phase precession neurons we describe here also cognitive boundary cells? On average, phase precession neurons had higher firing rates than non-phase-precession neurons following the onset of cognitive boundaries, indicating that the two groups of neurons might overlap at least to some extent (Supplementary Fig. 2b–d). This was the case, but only weakly so: most of the phase precession neurons (encoding: 45/68, 66.2%) were not cognitive boundary cells (Fig. 5b). Still, the proportion of phase precession neurons that were also cognitive boundary cells (23/68, 33.8%) was significantly larger than that in the population of non-phase-precession neurons (45/455, 9.9%). Similarly, during scene recognition and time discrimination, most phase precession cells did not change their firing rate following stimulus onset relative to baseline (Fig. 5c, scene recognition: 39/51, 76.5%; Fig. 5d, time discrimination: 66/89, 74.2%). These findings align with theoretical models35,36 that suggest that rate and phase coding can occur together as well as separately from each other. We simulated the phase precession model of Mehta et al.37 (Extended Data Fig. 9), which revealed that a parameter regime where phase precession is present without firing rate changes can easily be found. We note that our simulation does not explicitly implement recurrent inhibition38,39; rather, recurrent inhibition is accounted for using a long refractory period.

Phase precession predicts participants’ memory performance

Given the largely separate groups of cells exhibiting phase precession and firing rate changes, we asked whether phase precession and firing rate provided complementary information about memory encoding and/or retrieval success as assessed behaviourally. We tested this idea by computing phase precession strength (correlation coefficient) and average firing rates (normalized to baseline) separately for trials with correct and incorrect memory (Methods). We then quantified how well phase precession strength and/or firing rates explained participants’ memory performance (correct = 0 versus incorrect = 1) using a generalized linear model (GLM). Due to the significant difference in phase precession strength (Extended Data Fig. 8), correct targets (old images identified as old) and correct foils (new images identified as new) are counted as separate variables in the model for predicting scene recognition performance. On the basis of comparisons between models with different combinations of features, we found that the winning model for explaining recognition memory performance is the model that has access to all three variables: (1) phase precession strength during encoding, (2) phase precession strength during retrieval and (3) firing rate during encoding (Fig. 6a, winning model marked in red). This model performed better than one with access to only firing rate. Similarly, for explaining time order retrieval performance, the winning model was also the one with access to all three variables (Fig. 6b, winning model marked in red). Note that the absolute value of phase precession strength was used in the model fitting. An odds ratio bigger than 1 indicates that significant amounts of variance in whether recognition memory or order memory was correct or incorrect was explained by a given variable. Further examining the winning models (marked in red in Fig. 6a,b) revealed that in both instances, when phase precession occurred during encoding (rencoding) and time discrimination (rtimeDiscrim), participants were more likely to correctly recognize targets (Fig. 6c) and correctly identify the order (Fig. 6d), respectively. Consistent with the GLM results, stronger phase precession was observed following boundaries during encoding for trials associated with correct recognition and order memory than for incorrect ones (Extended Data Fig. 10a–f; encoding: F1,49 = 8.24; P = 6 × 10−3; 95% confidence interval, (0.122, 0,347); time discrimination: F1,61 = 4.38; P = 0.041; 95% confidence interval, (0.015, 0.121)). Stronger phase precession was also found following image onsets during time discrimination when participants successfully recalled the temporal order of watched clips than in unsuccessful trials (Extended Data Fig. 10j–l; F1,61 = 12.55; P = 8 × 10−4; 95% confidence interval, (0.176, 0.387)).

Fig. 6: Strength of phase precession is predictive of memory encoding and retrieval success.
figure 6

a, Model comparisons for neural activity during the scene recognition task. Each bar shows a comparison between the full GLM model and reduced GLM models with or without (w/wo) a given predictor. A likelihood ratio bigger than 1 with a significant P value indicates a better model performance in explaining participants’ behaviour outcomes with the added predictor. The predictors considered are firing rate during encoding (Frencoding) and scene recognition (FrsceneRecog), and phase precession strength during encoding (rencoding) and scene recognition (rsceneRecog). b, Same as a, but for the time discrimination task. The predictors considered are firing rate during encoding (Frencoding) and time discrimination (FrtimeDiscrim), and phase precession strength during encoding (rencoding) and time discrimination (rtimeDiscrim). All the models in a and b consider all the recorded neurons in the MTL. c,d, Odds ratios for different predictors when predicting participants’ memory performance during scene recognition (c) and time discrimination (d). The vertical lines indicate the confidence intervals. The horizontal dashed line marks the chance level. The asterisks denote significance. e, For the winning model for scene recognition (indicated by the red box in a), the proportion of variance in the response variable (correct versus incorrect) explained by different groups of neurons is shown. Shown are R2 ratios of models built using all phase precession neurons during encoding (orange line), all phase precession neurons during scene recognition (blue line) and all non-phase-precession neurons (grey bars). The total numbers of neurons used for the GLM model for each group were balanced by random subsampling. To do so, the non-phase-precession neuron group was subsampled 100 times, each time selecting the same number of neurons as the number of phase precession neurons present during scene recognition. The dashed line indicates the chance level. f, Same as e, but for time discrimination. Shown is the proportion of variance in the behaviour explained by the winning model (indicated by the red box in b). Shown are model fit for all phase precession neurons selected during encoding (orange), all phase precession neurons selected during time discrimination (green) and all other neurons (grey, subsampled 100 times and each time with the same number of neurons as those selected during encoding). The dashed line indicates the chance level. *P < 0.05; **P < 0.01; ***P < 0.001 in ad.

Next, we assessed how much variance in the winning models was explained by subsets of the neurons (the results in the above paragraph are for all recorded neurons). Compared with the rest of the MTL neurons, phase precession neurons explained significantly more variance in behavioural accuracy during scene recognition (correctly recognized targets and correctly recognized foils versus incorrect recognition memory decisions), with phase precession neurons during encoding and scene recognition exceeding 96% and 82% of the R2 values achieved for random subsets of the same number of non-phase-precession neurons, respectively (Fig. 6e). Similarly, phase precession neurons also explained the most variance in time discrimination accuracy, with phase precession neurons during encoding and time discrimination exceeding 97% and 100% of the R2 values achieved for random subsets of the same number of non-phase-precession neurons, respectively (Fig. 6f). In sum, the strength of phase precession provided information about memory encoding and retrieval success that was complementary to the information provided by firing rates, thereby revealing a temporal code.

Discussion

Theta phase precession is commonly observed in several species and is thought to be a neural mechanism to temporally link continuous experience and encode and retrieve these experiences into and from memory9. However, how phase precession facilitates memory formation and retrieval remains an open question. Here we tested the hypothesis that phase precession supports memory formation and retrieval of naturalistic experience in humans. We found that phase precession was prominent in the human MTL following cognitive boundaries during movie clip watching and when participants retrieved memories of the content and temporal structure of the previously watched movie clips. The strength of phase precession was predictive of participants’ ability to encode and retrieve the memories.

Theta phase precession was prominent despite the absence of long and continuous stretches of theta activity like that observed in rodents during movement. This observation is consistent with work in bats, where phase precession is also prominent despite theta being present only in short bouts21 similar to those in humans. Consistent with previous work27,28, theta-frequency-band LFP fluctuations in humans tend to be non-rhythmic, except for short, transient theta ‘bouts’ (Fig. 2). The frequencies of theta bouts detected on the same recording electrode can vary across a broad range (2–10 Hz; Fig. 2i), which is in line with previous observations in humans22 and bats21 and which explains the absence of clear ‘peaks’ in the spectra in many instances. Theta bouts and theta phase precession are both prevalent after cognitive boundaries during encoding (Figs. 2h and 3e) and after image onset during retrieval (Fig. 3f,g and Extended Data Fig. 1d,h). However, the two phenomena were largely independent: while in some cases, stronger phase precession was observed in trials with theta bouts detected than in trials without theta bouts, phase precession was prominent in either condition regardless of whether theta bouts were present or absent in a given trial (Extended Data Fig. 6). Further illustrating this point, phase precession is stronger during memory retrieval than during encoding despite theta bouts being rarer and more variable during retrieval than during encoding (Extended Data Fig. 1i,j). Previous studies have also reported phase precession when theta frequency is altered40, when theta-modulated spiking is reduced41,42,43 or even with little periodicity of low-frequency field potentials20,21. Moreover, theoretical work shows that periodicity is not necessary for phase precession37. Indeed, recent work suggests unique advantages of phase coding in the absence of rhythmicity, which enables multiplexing of rich information through a broad range of oscillatory frequencies44. For this reason, we here used a method of detecting phase precession that did not require theta periodicity. This hypothesis might also explain our observation that many phase precession neurons do not also show theta phase-locking (Extended Data Fig. 4). Different from rodents, whose place cells and grid cells that show phase precession tend to be phase-locked to 4–8 Hz, neurons in humans can be phase-locked to oscillations at multiple frequencies with various preferred spiking phases45. Our results, together with previous literature, support the idea that phase precession can serve as a flexible neural coding principle that is effective for both narrow-band and broadband theta.

We observed the strongest phase precession following cognitive boundaries during encoding and following image onset during scene recognition and time discrimination (Fig. 3c,f,g). Both cases contain sharp visual transitions, either across two different movie scenes or from a blank screen (baseline) to a movie scene. However, not all sharp visual transitions trigger phase precession, as phase precession following movie clip onsets, movie clip offsets, image offsets and button presses was not different from that expected by chance (Fig. 3c,f,g). A possible reason for why phase precession occurred following cognitive boundaries but not movie onsets is that cognitive boundaries attracted more attention (which, in turn, can lead to phase resetting46; Supplementary Table 4). We did not control for attention in our study, leaving this an open question for further studies. An alternative explanation could be that phase precession occurs when encoding or retrieving memories across events within a continuous experience. Under this interpretation, phase precession would not be present following movie onset because the onset of the movie is predictable and constitutes the start of a new sequence, without needing to create a link to prior events in the same sequence. This hypothesis aligns with theoretical models suggesting that phase precession supports sequence learning11,37,47. This idea is also consistent with previous findings8,14 that show phase precession at event transitions in discrete event sequences.

The strength of phase precession varied across the different task stages. As stated above, stronger phase precession was observed during scene recognition and time discrimination than during encoding (Fig. 4e,f). Also, during scene recognition, phase precession was stronger in old than in new trials. We posit that these differences in phase precession strength might be the result of memory-based decision-making, during which the precise temporal structure of previous experiences is relevant. Supporting this hypothesis, more phase precession neurons (Fig. 4d) were found during time discrimination than during scene recognition. Also, stronger phase precession (Fig. 4g) was more predictive of participants’ order memory accuracy than of their recognition memory accuracy (Fig. 6c,d). A previous study48 has also reported stronger phase-locking (but not phase precession) when animals were instructed to make a memory-based decision than when they passively traversed the same environment.

A potential explanation of changes in phase precession strength between tasks could be the different strategies participants used when solving the two memory tests. During time discrimination, the participants needed to determine the order of the two frames shown on the screen, which requires recollection of the entire event sequence. A recent study49 that combined neural data and modelling showed that phase coding is critical for encoding temporal order information in humans. In contrast, during scene recognition, the participants only needed to retrieve the memory associated with a single image. Notably, during scene recognition, phase precession was mostly observed during trials in which an old (familiar) image was shown (Extended Data Fig. 8). Identification of novel (foil) images might not require phase precession because it could rely on novelty signals50,51. An important future question is what the relationship is between phase precession and other forms of neural coding, such as novelty50. The dynamic task-dependent phase precession strength observed in our study suggests that phase precession can serve different functional roles as needed.

Models with access to both phase precession strength and firing rate were able to explain variance in the memory accuracy of the participants significantly better than models with access to firing rate alone (Fig. 6). This finding supports the idea that phase coding provides information that trial-averaged firing rate does not provide, making the two codes complementary35,52,53. By combining both rate and phase coding, hippocampal cell assemblies may generate sequential structures across single theta cycles that represent sequences of past, current and future states in both the spatial and episodic domains in a compressed manner10,11,47. While phase precession has largely been reported for neurons with known tuning curves (that is, rate coding, such as place cells and grid cells), most phase precession neurons in our study did not show modulation of firing rate with the variables examined in this study (Fig. 5b–d). This independence of phase precession strength and firing rate modulation is consistent with theoretical models35,36 (see Extended Data Fig. 9 for a simulation of the Mehta et al.37 model) and with previous observations in rodents54 and humans22 that show that phase precession can appear in the absence of concurrent firing rate changes. We posit that the neurons that show phase precession without firing rate modulation might encode the ‘episodic’ position within a given high-level behaviour state by their spike phase, similar to work in rodents showing phase recession in ‘episode’ or ‘time’ cells when rodents run on a treadmill with a goal16 but not without55. An alternative interpretation is that we did not test for the variables that would modulate firing rate in these neurons. Even for the subgroup of neurons that showed both significant firing rate changes and phase precession, our findings differed from those of place cells (which exhibit both rate and phase coding). This is because in our study, all movie clips were shown only once (they were novel). Contrary to place cells9, the rate tuning of phase precession neurons in our study is thus not tied to a specific stimulus. These findings suggest that phase precession in a given neuron can occur during exposure to many different novel stimuli that have the occurrence of cognitive boundaries in common, thereby flexibly capturing the temporal structure of diverse event sequences using a shared neural coding mechanism.

In sum, we demonstrate non-spatial phase precession during memory formation and retrieval of naturalistic experience in humans. The strength of phase precession and the groups of neurons showing phase precession are task specific, and the strength of phase precession was predictive of the success of memory encoding and retrieval above and beyond firing rates alone. Our findings suggest that phase precession in humans is a general neural coding mechanism that can flexibly support different aspects of episodic memory.

Methods

This study complies with all relevant ethical regulations. The study protocol was approved by the Research Ethics Board at Toronto Western Hospital (approval number: 15-5052; approval date: 14 May 2021) and the Institutional Review Board at Cedars-Sinai Medical Center (approval number: Study572; approval date: 9 June 2020).

Task

The task (Fig. 1a–d) consisted of three parts: encoding, scene recognition and time discrimination. During encoding, the participants watched 90 novel silent clips embedded with or without boundaries, defined as movie cuts transitioning to scenes from the same or a different movie. To assess attention, a yes/no question related to the content of the clip appeared randomly after every 4–8 clips. After watching all 90 clips, the participants were instructed to take a short break, roughly five minutes, before proceeding to the memory tests. The participants first performed the scene recognition test and were instructed to identify whether the frames shown were from watched clips (answer: ‘old’) or from clips not shown (answer: ‘new’). The participants then performed the time discrimination test to identify whether the ‘left’ or ‘right’ frame was shown first (earlier in time) during encoding. More detailed information about the task design can be found in our previous publication26.

Participants

Twenty-two patients (13 females; mean age, 39 ± 16 years, mean ± s.d.; see the participants’ demographics in Supplementary Table 1) with refractory epilepsy volunteered for this study and provided their informed consent. Nineteen of these participants come from our previous work26, and three are new participants. The participants performed the task while they stayed at the hospital and were implanted with electrodes for seizure monitoring. The locations of the implanted electrodes were determined solely by clinical needs. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous publications22,26,53. There was no participation compensation.

Electrophysiological recordings

Broadband neural signals (0.1–8,000 Hz filtered) were recorded using Behnke–Fried electrodes56 (Ad-Tech Medical) at 32 kHz using the ATLAS system (Neuralynx Inc.). We recorded bilaterally from the amygdala, hippocampus and parahippocampus, and other regions outside the MTL. The electrode locations were determined by co-registering postoperative CT and preoperative MRI scans using Freesurfer’s mri_robust_register57. Each Behnke–Fried electrode shank had eight microwires on the tip and were marked as one anatomical ___location. To assess potential anatomical specificity across different participants, we aligned each participant’s preoperative MRI scan to the CIT168 template brain in MNI152 coordinates58 using a concatenation of an affine transformation followed by a symmetric image normalization diffeomorphic transform59. The fifty electrodes within the MTL across all the participants are illustrated on the CIT168 template brain in Fig. 1e, and the corresponding MNI coordinates are listed in Supplementary Table 2.

Spike sorting and quality metrics of single units

Spike sorting was performed offline using a semi-automated template-matching algorithm, Osort60. See Zheng et al.26 for more details on the spike-sorting procedures. We identified 1,103 neurons in this dataset across all brain areas considered. As an accurate assessment of phase precession requires a sufficient number of spikes, we excluded 207 neurons with low firing rates (<0.5 Hz) and analysed the remaining 936 neurons (503 neurons in the MTL). The quality of our spike-sorting results was evaluated using our standard set of spike-sorting quality metrics61,62 for all considered 936 putative single neurons (Supplementary Fig. 1).

Characteristics of theta oscillations

Preprocessing

LFPs were recorded simultaneously with single-neuron activity and were used for computing phase precession along with the spike activity from neurons detected from the same microwire. To eliminate potential influences of the spike waveform on the higher-frequency parts of the LFP63, we replaced the LFP in a 3 ms time window centred on the detected spike by linear interpolation. We then downsampled this spike-free version of the LFP from 32 kHz to 250 Hz, followed by further post-processing using automatic artefact rejection64 and manual visual inspection using the function fr_databrowser.m in the Fieldtrip toolbox65 to remove trials with large transient signal changes from further analyses. Trials with large transient signal changes were removed from further analyses. Examples of preprocessed LFPs are shown in Fig. 2a,c,e and as grey lines in Fig. 2g,i.

Spectral analysis

The power spectra (1–20 Hz) of the preprocessed LFPs were computed using Welch’s method (function pwelch.m in MATLAB66) with Hamming windows (50% overlap). In Fig. 2k,n, we show spectra computed from preprocessed LFPs within the 0–1 s time window that follows boundaries in boundary clips.

Theta bout detection

To examine the cycle-to-cycle variability of theta oscillations, we detected transient theta bouts using the method described in previous papers27,29. Briefly, two versions of the LFP were first computed: one low-pass-filtered at 40 Hz (‘low-passed LFP’) for identifying peaks and troughs in a ‘smooth’ version of the LFP without high-frequency components, and one band-pass-filtered within the theta range (2–10 Hz, ‘theta-filtered LFP’) to identify zero-crossings. Theta bouts (2–10 Hz) were identified as time periods in which all detected cycles had similar amplitudes (amp_consistency_threshold = 0.6), similar periods (period_consistency_threshold = 0.6), and relatively symmetric waveforms with similar length of rising and decaying edges within a given theta cycle (monotonicity_threshold = 0.6) for more than three consecutive putative cycles in time. The frequency of theta bouts was computed as the ratio of how many cycles the theta bout had divided by how long in seconds it lasted. For example, a theta bout that lasted for two seconds with four cycles had a frequency of 2 Hz. The time ratio of theta bouts was computed as how much time of the analysis window was occupied by theta bouts divided by the total length of the analysis window. For example, a theta bout that appeared for one second of a two-second time window had a time ratio of 0.5.

Spike phase estimation

On the basis of the peaks, troughs and zero-crossings identified as described above, we estimated theta phase at all time points by linearly interpolating between the peaks (0° or 360°), troughs (180°) and zero-crossings (90° or 270°) cycle by cycle. Compared with a conventional Hilbert transform approach, this phase-interpolation method eliminates potential distortions introduced when estimating the phase of non-sinusoidal LFPs27,67. To reduce noise in the phase estimation, we excluded spikes that occurred during the lowest 25th percentile of theta power distribution of a given channel21. The phase assigned to a given spike was set equal to the phase estimated at the point at which the action potential was at its peak.

Phase precession measurements

Spike-phase circular–linear correlation method (Method 1)

We analysed spikes that occurred within the first three theta cycles following boundaries during encoding or after the onset of image display during scene recognition and time discrimination. Phase precession was quantified using circular statistics. For each neuron, we computed the circular–linear correlation coefficient31 between the spike phase (circular) and time in unwrapped theta phases (linear). Unwrapped theta phase was defined as accumulated cycle-by-cycle theta phase starting at the alignment point (that is, boundary onset or image onset). To assess statistical significance, we generated a null distribution for each circular–linear correlation using surrogate data generated from shuffling the spike timing of neurons 1,000 times. This procedure maintained the firing rate and spike phase distribution of each neuron while scrambling the correspondence between the spike phase and spike time within each trial. A neuron was considered as a significant phase precession neuron if the observed circular–linear correlation was greater than the 97.5th percentile or smaller than 2.5th percentile (P < 0.05) of the surrogate null distribution of the correlation coefficient.

Chance level of neurons showing phase precession

We estimated the number of neurons exhibiting significant phase precession by chance by recomputing the spike-phase circular–linear correlation 1,000 times using the surrogate data generated by shuffling trial numbers between the spike phases and LFP. For each iteration, we obtained the proportion of selected phase precession cells relative to the total number of neurons within each brain region. These 1,000 values formed the empirically estimated null distribution for the proportion of phase precession cells expected by chance. A brain region was considered to have a significant amount of boundary cells or event cells if its actual fraction of significant cells exceeded 95% of the null distribution (Fig. 5a; P < 0.05).

Control analyses for phase precession

We performed the following control analyses to assess whether our findings are sensitive to parameter or method choices. We found that the results were robust.

Phase precession with different theta cycle numbers

We re-assessed phase precession while varying the number of theta cycles (three cycles versus four cycles) from which the spikes were included for the same analyses. While including more theta cycles resulted in more spikes being available for analysis, the extent of phase precession was not significantly different (three cycles versus four cycles comparison; encoding: χ2 = 5.17, P = 0.511; scene recognition: χ2 = 7.32, P = 0.253; time discrimination: χ2 = 6.22, P = 0.314; chi-squared test).

Phase precession using the spike-phase autocorrelation method (Method 2)

The results in the main manuscript are derived using circular–linear correlation (Method 1). To examine the robustness of these findings, we employed a second method (Method 2) to estimate phase precession and compared the results. For this alternative method, we compared the autocorrelation of spiking of each neuron with the frequency of the underlying LFP32. Method 2 is widely used in rodents68,69 with stereotypical 8 Hz theta and also in animals with non-stationary theta, such as bats21, non-human primates46,70,71 and humans22. As in Method 1, for each neuron, we assigned to each spike its unwrapped theta phase by accumulating the cycle-by-cycle phase following boundaries during encoding. We then computed the autocorrelation (‘phase autocorrelation’) of these cumulative spiking phases of the spike-train, using 60° bins with a window length of three cycles. If a neuron exhibits phase precession, its firing phases should occur more and more ahead of theta cycles32. Its spike-phase autocorrelation should therefore have peaks at a higher frequency than the frequency of the underlying theta (Extended Data Fig. 7f,h). To quantify the relationship between the spike-phase autocorrelation and the frequency of theta, we fitted decaying sine wave functions to the spike-phase autocorrelation and then computed the power spectra of the fitted line using the Fourier transform. We then plotted the power of the spike-phase autocorrelation as a function of frequency ratio, which is normalized by an averaged frequency of underlying theta. Lastly, on the basis of this obtained power spectrum density plot (insets in Extended Data Fig. 7e–h), we computed the modulation index22, which is defined as the amplitude of the power spectra peak normalized by the total area under the curve of the normalized power spectra. We then generated a null distribution of modulation indices from spike-phase autocorrelations based on surrogate data generated using the same shuffling procedure discussed for spike-phase circular–linear correlation. A neuron was considered as exhibiting significant phase precession if its observed modulation index exceeded the 95th percentile of the surrogate distribution. To eliminate potential bias from low spike counts, we only analysed neurons with more than 80 spikes within the analysis windows across trials for computing spike-phase autocorrelation. Furthermore, we conducted a Rayleigh test using spike phase information within the phase precession analysis windows, with greater r values indicating a higher likelihood of phase-locking. To overcome the bias against neurons with higher firing rates, we computed a Rayleigh test on surrogate spike data randomly extracted from neurons’ spike phases during the entire recording and repeated the same procedure 100 times to generate a null distribution of r values. Phase-locking neurons are defined as neurons with r values exceeding 95% of the r values in the null distribution (P < 0.05, permutation test). Neurons showing significant phase-locking (P < 0.05, Rayleigh test) and a peak relative frequency around 1 (0.95 to 1.05) were excluded to ensure that we did not mistakenly identify phase-locking neurons as exhibiting phase precession. Note that Method 1 naturally rejects neurons showing phase-locking, whose correlation coefficient values between spike phases and time are around zero. We found no overlap between significant phase-locking neurons and phase precession neurons identified using Method 1. Comparing the neurons selected as phase precession with Methods 1 and 2 shows significant agreement in which neurons are selected. For example, during encoding, 70% of the neurons selected as phase precession with Method 1 were also selected with Method 2 (48/68, 70.5%; Supplementary Table 3).

Phase precession using absolute time versus time in unwrapped theta phases

We recomputed phase precession using time in seconds instead of time in unwrapped theta phases for both Method 1 and Method 2 (Extended Data Fig. 7e,g). To do so, we computed the circular–linear correlation (Method 1) and modulation index (Method 2) on the basis of spikes within a fixed zero- to one-second time window after boundaries during encoding and after the onset of images during scene recognition and time discrimination. This alternative way of assessing spike phase resulted in the identification of a smaller number of phase precession cells (P < 0.01, permutation test; above chance level) than our analyses using time in unwrapped theta phase (spike-time correlation versus spike-phase correlation: n = 22 versus 68, χ2 = 16.6, P = 3 × 10−4, chi-squared test; spike-time autocorrelation versus spike-phase autocorrelation: n = 17 versus 51, χ2 = 14.7, P = 7 × 10−3, chi-squared test). Comparing the differences between the two correlation coefficients and modulation indices obtained for each cell revealed that the larger the difference, the larger the variance of theta bout frequency on the wire from which the neuron was recorded (Extended Data Fig. 7k,l). This shows that in the presence of variable theta, unwrapped phase more reliably estimates phase precession in both Methods 1 and 2. We therefore used unwrapped phase throughout.

Firing rate modulation

For neurons showing significant phase precession during encoding and retrieval, we computed their average firing rates within the same analysis time windows for computing phase precession—three cycles after boundaries during encoding and after image onsets during scene recognition and time discrimination. Note that the average lengths of pre- and post-analysis time are comparable when aligned to boundaries during encoding (pre: 0.86 ± 0.44 s; post: 0.78 ± 0.29 s; t1,67 = −11.222; P = 0.156; 95% confidence interval, (−1.706, −1.396); paired two-tailed t-test) and image onsets during scene recognition (pre: 0.74 ± 0.38 s; post: 0.77 ± 0.36 s; t1,50 = 0.923; P = 0.221; 95% confidence interval, (−0.091, 0.245); paired two-tailed t-test) and time discrimination (pre: 0.75 ± 0.41 s; post: 0.73 ± 0.35 s; t1,55 = 1.523; P = 0.195; 95% confidence interval, (−0.040, 0.309); paired two-tailed t-test). A neuron was considered to have significant firing rate modulation if its average firing rate differed significantly between post- and pre-analysis time windows (P < 0.05, paired two-tailed t-test). We also assessed the overlap between neurons showing firing rate modulation and phase precession at different task stages (Fig. 5b–d).

Relationship between phase precession and memory performance

GLM

We assessed the relationship between neural metrics (firing rate and phase precession) and participants’ memory as assessed behaviourally using a GLM. Using scene recognition as an example, we grouped trials into the two categories ‘correctly recognized’ (true positives and true negatives) and ‘incorrectly recognized’ (false negatives and false positives) depending on the accuracy of their response. To account for the difference in trial numbers, we subsampled correct trials with the same trial number as incorrect trials. We first computed phase precession (spike-phase circular–linear correlation) and average firing rates (z-scored normalized to the baseline) within three theta cycles relative to cognitive boundaries during encoding and relative to image onsets during scene recognition, separately for correct and incorrect trials. For each neuron, this resulted in four values each for correct and incorrect trials: correlation coefficients specifying phase precession strength (rencoding and rsceneRecog) and normalized firing rates (Frencoding and FrsceneRecog). We used a mixed-effect GLM using the function fitglme.m in MATLAB with logistic regression and Bernoulli distribution to predict participants’ binary memory outcomes (that is, correct versus incorrect) during scene recognition. The fixed effects were four values (rencoding, rsceneRecog, Frencoding and FrsceneRecog) from all the neurons in the MTL. The random effects were neuron ID nested within session ID.

Contribution of different fixed effects

To assess the extent to which firing rates and phase precession predicted behaviour, we compared different reduced versions of the above GLM model. We compared model 1 (fixed effect: Frencoding) with model 2 (fixed effect: model 1 + FrsceneRecog), model 3 (fixed effect: model 1 + rencoding) and model 4 (fixed effect: model 1 + rsceneRecog). We also added the effect of phase precession during memory retrieval (rsceneRecog) to model 3 and compared the model performance between the two. Model comparisons were performed on the basis of the Akaike information criterion, expressed as a log likelihood ratio in Fig. 6a. We did this separately also for the time discrimination task, in which all variables were replaced by the equivalent for the time discrimination task (Fig. 6b). For example, we replaced FrsceneRecog with FrtimeDiscrim and updated rsceneRecog with rtimeDiscrim. We quantified the strength of each fixed effect using odds ratios, computed as the exponential of the coefficient of each variable derived from the best GLM model. Odds ratios significantly bigger than 1 indicated that participants were more likely to correctly answer a given question when the given effect was present. We applied the same methods for time discrimination as well (Fig. 6d).

Contribution of different cell groups

We then assessed the ability of different groups of neurons in predicting participants’ memory outcomes (correct versus incorrect). To do so, we fit the same GLM model 4 as described above (fixed effects: rencoding + Frencoding + rsceneRecog or rtimeDiscrim) but including only the neurons that showed phase precession during encoding, only the neurons showing phase precession during scene recognition or time discrimination, and the non-phase-precession neurons. We then computed the R2 ratio as the ratio between the R2 value of GLM model 4 using different subgroups of neurons \({R}_{{\mathrm{subgroups}}}^{2}\) versus all MTL neurons \({R}_{{\mathrm{all}}}^{2}\), as follows:

$$R^2 \; {\mathrm{ratio}}=\frac{{R}_{{\mathrm{subgroups}}}^{2}}{{R}_{{\mathrm{all}}}^{2}}$$

We subsampled the non-phase-precession neurons to balance the number of neurons across different subgroups when building the GLM models.

Statistical methods and software

The participants were not informed of the existence of cognitive boundaries in the clips. All the statistical analyses were conducted in MATLAB, primarily using the Statistics and Machine Learning toolbox. For comparison against specific values, we used the one-sample t-test. For comparison between two groups, we primarily used the Kolmogorov–Smirnov test, while for omnibus testing, we used ANOVA, unless otherwise specified in the text. When the normality of data was not clear, non-parametric permutation tests were used to determine significance levels by comparing the real test statistic to the null distribution estimated from the surrogate dataset.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.