Introduction

Subseasonal-to-seasonal (S2S) climate prediction on the time window of 10–30 days has received increasing attention1,2,3. As a critical component of a seamless climate forecast system, S2S prediction can bridge the gap between medium-range weather forecasts and seasonal-to-decadal climate predictions, thus benefiting the management and decision-making in agriculture, energy, food security, and many sectors vulnerable to natural hazards4,5. Recent advances in S2S prediction include two ongoing multi-institutional projects: the S2S Prediction Project6 launched by the World Weather Research Programme (WWRP) and World Climate Research Programme (WCRP), and the Subseasonal Experiment (SubX) Project7. However, S2S predictability, along with the prediction of extremes, is still in their infancy and remains a grand challenge for the climate community. Evaluating the skill of S2S forecasts and understanding the sources of predictability are needed for developing operational forecasts and early warning systems for climate extremes and hazards.

S2S prediction of atmospheric rivers (ARs) is at the forefront of weather and climate prediction, given the critical role of ARs in modulating midlatitude climate and global hydrological cycle8. ARs are characterized by narrow, elongated bands (up to thousands of kilometers) of intense lower tropospheric moisture residing within the warm conveyor belt of extratropical cyclones9. Considered as the largest “rivers” of water vapor on the Earth’s atmosphere, ARs cover around 10% of the zonal circumference but transport over 90% of the total poleward moisture at midlatitudes10,11. ARs influence the variability of wind and precipitation around the globe12,13, including extreme precipitation and floods in the western U.S.14,15,16. Moreover, the frequency, intensity, and duration of AR events are anticipated to increase within a warming climate, causing more severe flooding events, extreme winds, and storms17,18,19,20. Therefore, it would be crucially beneficial to the management of water resources and AR-related natural catastrophes (e.g., drought, flood, and severe wildfires), if provided with skillful S2S prediction of ARs and accurate forewarning for incoming heightened or suppressed AR activity.

However, current forecast systems have low skill in forecasting AR and associated week-average weather patterns at S2S timescales, particularly for week-3 and beyond. Using the ECMWF S2S forecast system, DeFlorio et al.21 provided a global assessment of the simulation and subseasonal prediction of AR frequencies during the winter and summer seasons. They found that the prediction skill of weekly AR frequencies could be skillfully predicted in the first week but most skill is lost by week 3, except for some specific regions in the midlatitude oceans. Based on the North American Multi-Model Ensemble (NMME) hindcasts, Zhou and Kim22 showed encouraging seasonal predictions of year-to-year wintertime AR variability over the North Pacific associated with the El Niño-Southern Oscillation (ENSO); meanwhile, systematic biases in the midlatitudes between 30°N–40°N for AR frequencies were identified in the state-of-the-art models. It is even more difficult to accurately forecast the intensity and duration of an individual, discrete AR event, as the deterministic predictability limit for a specific synoptic AR event is around 7–10 days due to the inherently chaotic nature of the atmosphere23,24. Note that we aim to predict changes in the statistics of AR frequencies as opposed to predicting individual ARs.

To advance AR forecasts into S2S timescales, previous studies have centered around diagnosing large-scale variability of climate modes as potential sources of S2S AR predictability using reanalysis datasets10,25. Using 30 year of Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis data, Payne and Magnusdottir26 investigated the large-scale AR features and highlighted the influence of Rossby wave breaking on their behavior and intensity over the Pacific basin. Based on the ECMWF Interim (ERA-I) reanalysis, Guan and Waliser10 examined the global footprints of ARs and the modulation of large-scale climate variability on AR activities. Notably, the importance of climate modes, such as the ENSO, the Arctic Oscillation (AO), the Madden–Julian oscillation (MJO), and the quasi-biennial oscillation (QBO), in modulating S2S AR activities and associated rainfall over the North Pacific and western North America has been addressed in several earlier studies10,23,27,28. Strong MJO and QBO activities have been shown to significantly alter the frequency of AR occurrences29. Mundhenk et al.30 further constructed statistical prediction models and improved AR forecast skill with lead times of 3–5 weeks by integrating the combined effects of QBO and MJO.

However, there is little consensus on the underlying mechanisms and how different climate modes influence AR activities in the western U.S. For example, Guan and Waliser10 and Corringham et al.14 found no significant ENSO signals in modulating landfalling AR frequencies over the western U.S., whereas Kim et al.31 reported fewer AR storms along the west coast of the U.S. and an enhanced AR frequency during central Pacific El Niño. Meanwhile, Huang et al.28 detected more frequent AR activities during the La Niña phase than the El Niño and neutral phases based on the Weather Research and Forecasting (WRF) model and acknowledged the large uncertainty due to the short observational record. Limited research on models has specifically examined the effects of different climate patterns on S2S prediction skill of AR frequencies. Which climate mode or source of predictability dominates S2S forecast of wintertime ARs remains unresolved.

Although previous research has explored the impacts of certain climate modes, particularly from observational estimates, there is a gap in linking these modes to the predictability of ARs. To our knowledge, this is the first paper to systematically document the predictability of ARs during the northern winter within a modeling framework using the Average Predictability Time method (see Methods). This study focuses on evaluating the predictability and forecast skill of wintertime (December-January-February) ARs at S2S timescales, identifying primary sources of predictability for wintertime ARs, especially as they are modulated by large-scale climate modes. The specific model we are using the Geophysical Fluid Dynamics Laboratory (GFDL) Seamless System for Prediction and EArth system Research (SPEAR) coupled model32 (see Methods). SPEAR has shown potentially skillful prediction for Arctic Sea Ice33 at S2S timescales; summertime heat extremes34 and ARs35 at seasonal timescales; and Southern Ocean surface temperature36 and sea surface height over the Kuroshio Current extension37 at multi-year to decadal timescales. Meanwhile, a S2S prediction system was developed using one configuration of the SPEAR model that shows a 30 day prediction of wintertime MJO38 and a beyond-weather timescale prediction of the major modes controlling the landfalling tropical cyclones near the U.S. coast39. This study aims to examine S2S prediction of ARs in the northern winter using the SPEAR model, identify fundamental drivers of AR predictability, examine windows of forecast opportunity, and hence provide insights for operational predictions.

Results

Wintertime AR activities and forecast skill in the SPEAR model

We apply AR detection provided by Mundhenk et al.25 to the fifth generation ECMWF atmospheric reanalysis (ERA5) and SPEAR hindcasts (see Methods). The AR occurrence days are counted within a week-long period (from week-1 to week-4) for estimating weekly AR frequency and predictability. For instance, number of AR days in week-1 can be estimated by counting AR days during a week-long window with a lead time of 1–7 days. As shown in Fig. 1, AR climatological pattern over the northern winter in SPEAR generally resembles the reanalysis pattern. Strong wintertime AR activities are particularly detected over eastern Asia, North Pacific, and western North America, consistent with previous studies based on the ECMWF model21, the Interim reanalysis10, and the WRF model28. This dominant AR activity structure over the North Pacific region brings considerable moisture, heavy precipitation, and extreme winds toward the western U.S.25. Meanwhile, there are systematic negative biases (SPEAR minus ERA5; Fig. 1d, e) over eddy-rich regions, such as the Kuroshio Current and Gulf Stream, and positive biases slightly southward. Consistent with the ECMWF S2S forecast system (see Fig. 2 in DeFlorio et al.21), the SPEAR generally shows negative biases relative to reanalysis in the mid-latitudes with the highest AR occurrence frequencies. The biases are most prominent from the western boundary toward the central western North Pacific in the SPEAR, whereas they locate slightly eastward towards North America in the ECMWF. It is notable that there is a southward shift (i.e., negative biases along the western boundary currents and extension regions with respective to positive biases southward) in both SPEAR and ECMWF models. We further analyzed three additional standard control experiments using SPEAR with increased atmospheric model resolution from 1°, 0.5°, to 0.25° grid (see Methods and Supplementary Fig. S1). The result suggests that with increasing atmospheric model resolution, gradual improvement of weekly mean AR frequency is identified especially over North Pacific and western North America during winter seasons. Yet, the southward shift patterns of AR bias remain, which can be related to the biases in mean jets. Held et al.40 clearly showed that the GFDL Coupled Model Generation 4 (CM4) has equatorward shifted biases in the jets, while this is largely absent in the Atmospheric-only Model Generation 4 (AM4) (see their Fig. 22). Note that SPEAR model components are largely the same as CM4 with the major difference of the ocean model resolution (SPEAR uses 1° ocean while CM4 use 0.25° ocean model). It strongly indicates that the equatorward shifted bias of AR is caused by the mean SST biases. In particular, the mid-latitude cold SST biases (see Fig. 2 in Delworth et al.32) tend to enhance the meridional temperature gradient to the south flank of the jet but reduce the meridional gradient to the north flank of the jet, largely responsible for the equatorward shift of the jets.

Fig. 1: Climatology and bias of AR frequency in the northern winter based on the ERA5 reanalysis and SPEAR model hindcasts.
figure 1

a ERA5 AR climatology. b SPEAR AR climatology for week-1. c Comparison of zonal mean AR climatology between ERA5 and SPEAR. d Bias for week-2 hindcasts. e Bias for week-3 hindcasts. f Zonal mean bias for week-1 to week-4. Bias is computed as the difference between SPEAR and ERA5 reanalysis.

Fig. 2: S2S forecast correlation of AR frequency in the northern winter in the SPEAR model.
figure 2

a ACC for week-2 lead time. b ACC for week-3 lead time. The stippling denotes the region with correlation skill significant at 5% level considering the effective sample size. c Zonal mean ACC from week-1 to week-4. d RPC for week-2 lead time. e RPC for week-3 lead time. f Zonal mean RPC values from week-1 to week-4.

Figure 2 shows the forecast skill, measured by the anomaly correlation coefficient (ACC), and an assessment of how well forecast skill matches potential predictability, measured by the ratio of predictable components (RPC), for lead times of 2–4 weeks (see Methods). The spatial patterns of SPEAR S2S AR forecast correlation skill in the northern winter are generally comparable with the results from the ECMWF model, although with slightly different studied periods and seasons21. The zonal mean AR forecast correlation maximizes over the midlatitudes, reaching ~0.8 with week-1 lead time, and minimizes in the tropics and the polar regions. The wintertime AR correlation ranges from 0.3 to 0.7 in week-2 and decreases to 0.1–0.4 in week-3 over the North Pacific. Compared with the ECMWF model21, SPEAR shows a relatively higher correlation over the Gulf Stream extension, parts of Kuroshio current, and the eastern North Pacific regions for week-3 lead time. The RCP patterns with values below 1.0 across most regions indicate that SPEAR tends to produce an under-dispersive ensemble and overconfident AR frequency forecasts. The absence of RPC values above 1 suggest no evidence of the so-called “signal-to-noise paradox”41 in S2S AR prediction. RPC values below 1 also highlight considerable room for improvement in subseasonal AR forecasts, assuming that SPEAR produces a reasonable estimate of potential AR frequency predictability.

To further examine SPEAR’s prediction skill of ARs with different frequencies, we estimate the Brier Skill Score (BSS; see Methods) for three AR occurrence levels: 0 AR day/week (no AR activities; Level-1), 1–2 AR days/week (Level-2), and 3–7 days/week (Level-3). This classification of AR activities defined in this study follows DeFlorio et al.23. This classification is based on counting the number of AR days within a week. We first show in Fig. 3 the occurrence probability for each AR level for the SPEAR hindcasts (3 weeks lead time) and the corresponding ERA5 reanalysis. All the ten-ensemble model hindcasts are used to calculate the forecasted probability, leading to a similar but smoother spatial pattern compared with the reanalysis. Most of the tropics, high-latitudes, and land regions are characterized by no AR activities. The global average AR occurrence probabilities (Level-2 + Level-3) are 24.1% and 25.6% for the SPEAR (week-3 lead time) and the reanalysis, respectively (Fig. 3g). Meanwhile, it is very likely for at least one AR days to be forecasted in a given week over the midlatitude ocean regions, resembling the climatological pattern of wintertime AR activity. Level-2 AR activities with 1–2 AR days/week have a larger occurrence probability of 30%-60% compared to Level-3 AR activities 3–7 AR days/week (10%-40%) over this region. Higher AR occurrence probability (computed as the fraction of 1–7 AR days/week in total winter weeks) is particularly identified over the western US (30°-50°N, 115°-130°W; with the ocean region included to cover more grids for analysis) compared with the global mean results, indicating the importance of ARs in producing extreme weather in this specific region of interest.

Fig. 3: The occurrence probability of different AR occurrence frequencies in the northern winter based on the SPEAR model hindcasts (for week-3 lead time) and the corresponding ERA5 reanalysis.
figure 3

a ERA5 Level-1: 0 AR day/week, (b). ERA5 Level-2: 1–2 AR days/week, (c). ERA5 Level-3: 3–7 AR days/week. d SPEAR Level-1: 0 AR day/week, (e). SPEAR Level-2: 1–2 AR days/week, (f). SPEAR Level-3: 3–7 AR days/week, (g). Comparison of wintertime AR occurrence frequencies (from 0 day/week to 7 days/week) in week-3 over the western U.S. (30°-50°N, 115°-130°W), see box in panel (f) and the globe between the ERA5 reanalysis and week-3 SPEAR model hindcasts.

Figure 4 shows the global BSS values for three AR frequency levels in the SPEAR model for week-2 and week-3 lead times. The BSS values in the SPEAR model generally decrease with increasing lead weeks in each level (insignificant in Level-2). The area average BSS values for Level-1 (Level-3) in the North Pacific Ocean (20°-55°N, 120°E-120°W; with land regions masked) reach 0.10 (0.09) in week-2 lead and decline to 0.04 (0.02) in week-3 lead. In terms of Level-2, the North Pacific area-averaged BSS values are <0.01 for lead times of week-2 and week-3. Relatively larger BSS values are seen over the Eastern U.S. than the Western U.S., particularly for Level-1 and Level-3 in week-2 lead; this is consistent with the correlation pattern. We also find high BSS values (Fig. 4) and forecast correlations (Fig. 2) in week-2 and week-3 lead times over the Southeast Asia. While the occurrence rate is larger for Level-2 AR activities than Level-3 AR activities, higher values of BSS are detected in Level-3 ARs than Level-2 ARs. SPEAR can be better at predicting high-frequency AR activities (3–7 days/week) than low-frequency AR activities (1–2 days/week).

Fig. 4: December–February BSS values for different AR frequency levels in the northern winter in the SPEAR model for week-2 and week-3 lead times.
figure 4

a Level-1, (b). Level-2, (c). Level-3 for week-2 lead time, and (d). Level-1, (e). Level-2, (f). Level-3 for week-3 lead time.

Identification of the most predictable modes

Given the large spatial variation of the prediction skill of AR frequency, a question arises: what are predictability sources for the wintertime AR activities? To address this, we apply the Average Predictability Time (APT) analysis42,43, an efficient technique to characterize predictability of a system and can be used to decompose the most predictable component s or modes (MPMs) (see Methods). Note that the APT is applied to model-predicted AR weekly frequency anomalies for ten ensemble members from week-1 to week-4. By definition, the lower-order MPMs have higher prediction skills than the high-order modes. The forecasted AR frequency can be reconstructed by using the first several MPMs44. In this study, we decompose the hindcasts into a total of fifteen MPMs that explain about 40% of the total variance in frequency of wintertime ARs in the ensemble mean. The results are insensitive to the number of retained MPM modes (determined by testing 10–30 modes). Reducing the dimensionality to a small number of most predictable modes facilitates the identification of the potential sources of predictability45,46.

For week-2 and week-3 lead times, the APT-reconstructed patterns of correlation based on the first three MPMs generally resemble the original results based on the SPEAR hindcasts (Fig. 5a–d). When incorporating more MPMs for reconstruction, the analysis indicates a noteworthy improvement primarily in the southwestern regions of the U.S. and Mexico during week-2, while this improvement is negligible during week-3 (Supplementary Fig. S2). High-order modes only have significant skills for a short range of forecast (i.e., the weather timescale). Fig. 5e shows the correlation between the timeseries of observations (ERA5 reanalysis) and hindcasts (SPEAR model) as a function of lead time. The time series for each MPM of the SPEAR hindcasts are calculated based on the APT technique (see Methods). The time series for observations are obtained by projecting the ERA5-based ARs onto the spatial patterns of these MPMs. The first three MPMs display stronger correlation compared to the other MPMs, with coefficients around 0.7 at week-2 lead and above 0.4 at week-3 lead; although gradually decreasing, they remain significant at week-4 lead at the 5% significance level. For week-3 (Fig. 5c, d), the correlation based on the reconstructed hindcasts using the first three MPMs is sufficient to reproduce most of the prediction correlation in the original SPEAR hindcasts (comparing Fig. 5c with d). Notably, five or more MPMs are needed to reconstruct the forecast correlation over the Southwest U.S. and Mexico for week-2 lead. With a target on S2S prediction beyond the weather timescales, we focus on the first three MPMs and thus likely account for the dominant source of S2S predictability for wintertime AR activities.

Fig. 5: The reconstructed correlation skill of wintertime ARs over the North Pacific sector.
figure 5

The (a) original and (b) reconstructed correlation skill for week-2 lead time. The (c) original and (d) reconstructed correlation skill for week-3 lead time. ac display the original forecast correlations as illustrated in Fig. 2a, b, specifically focusing on the North Pacific sector. The reconstructed correlation skill is calculated using the APT method with the first three MPMs. The stippling denotes the region with correlation significant at 5% level considering the effective sample size. e The correlation between the ERA5-based and the SPEAR-predicted time series of the first five MPMs as a function of lead times from week-1 to week-4. The dashed lines indicate the 5% significance level for each MPM after considering the effective sample size.

The spatial patterns for the first three MPMs are shown in Fig. 6a–c (see Supplementary Fig. S3 for additional twelve spatial patterns of MPM4-15). The MPM1 shows a tripole-like structure with strong positive anomalies along the Kuroshio current extension, negative anomalies over the central to the northeastern part of the North Pacific and weakened positive anomalies around the eastern North Pacific (Fig. 6a). The correlation pattern between MPM1 time series and anomalous fields of sea surface temperature (SST) shows strong positive values in the tropics, implying considerable contribution from tropical climate variability, particularly ENSO47 (Fig. 6d). We further conduct a composite analysis to examine how negative (La Niña) and positive (El Niño) phases of ENSO (see Methods) impact wintertime AR activities for the week-3 lead time in the SPEAR model (Fig. 6g, h). During La Niña, we identify strong negative anomalies of AR occurrence around the Kuroshio Current extension and western North Pacific regions and positive values over the subtropical to northern North Pacific, resembling the structure (despite different signs) as shown in MPM1. The modulation pattern is largely reversed for El Niño conditions, except for some regions in the central North Pacific. Note that the influences of La Niña and El Niño events on wintertime AR activity derived from the model hindcasts are not completely symmetric, similar to observational estimates10. In particular, La Niña considerably enhances wintertime AR activity in the central North Pacific and northwest U.S., but this converse is not seen in El Niño conditions. This discrepancy of how ENSO affects AR activities can lead to the distinct patterns of AR landfall during La Niña and El Niño conditions48. Overall, given the evidence discussed above, the MPM1 can be referred to as the ENSO mode, though the pattern in the western North Pacific deviates from the composite ENSO pattern. The tripole-like pattern highlights the substantial impact of ENSO on ARs over the North Pacific and western North America. The dominant role of ENSO on the wintertime week 3–5 temperature prediction is also emphasized in a previous study46.

Fig. 6: The spatial patterns of the first three MPMs for wintertime AR activity (week-3) over the North Pacific sector.
figure 6

a MPM1, (b). MPM2, and (c). MPM3. Red (blue) colors indicate positive (negative) anomalies of wintertime ARs. The correlation patterns between the anomalous SST field and (d). MPM1, (e). MPM2, and f. MPM3 time series. The composite wintertime AR frequencies (AR days/week) in week-3 for the SPEAR during positive and negative phases of ENSO, PNA, and AO. g, h El Niño and La Niña conditions, (i, j). PNA+ and PNA-, (k, l). AO+ and AO-. The contour in each panel indicates the composite of the 500hPa geopotential height anomalies. Refer to Methods for definitions of ENSO, PNA, and AO phases.

The spatial pattern of MPM2 features a displaced dipole structure with negative values over the western and northern North Pacific and positive values over the central North Pacific and west coast of North America (Fig. 6b). It represents a typical structure of the positive phase of Pacific-North American teleconnection pattern (PNA; see Methods), with remarkable negative anomalies located at the south of the Aleutian Islands and the southeast U.S. and positive anomalies close to Hawaii and the Rocky Mountains to western North America49. Further, coherent annual-mean variations are detected for the MPM1 and MPM2 time series (with the Pearson’s correlation reaching 0.75), leading to similar spatial correlation patterns with respect to the SST field in the tropics. However, the correlation patterns between MPM1 and the 500 hPa geopotential height (or precipitation) significantly differ from those between MPM2 and the 500 hPa geopotential height (or precipitation) (see Supplementary Fig. S4). The PNA-like structure identified in MPM2 is further supported by the composite pattern of ARs modulated by positive and negative PNA (Fig. 6i and j). Additionally, the composite patterns of ARs modulated by ENSO and PNA further underscore the differences between MPM1 and MPM2. The result suggests a relationship between the PNA pattern and ARs that is largely independent of the midlatitude anomalies driven by the ENSO; this arises primarily due to the distinction between the ENSO-forced teleconnection versus the PNA pattern50.

The spatial pattern of the MPM3 is dominated by an elongated, narrow negative signal from the East Asia to western North America (northeastward from 20°N to 40°N over the North Pacific; Fig. 6c), where strong negative correlations are identified in the correlation maps between MPM3 time series and anomalous SST (Fig. 6f). The negative correlation in terms of the H500 anomalies extends to western U.S. and polar regions, surrounded by positive anomalies over the northern North Pacific and the eastern U.S. (see Supplementary Fig. S4). We speculate that this particular pattern mimics the teleconnection patterns of the AO, exhibiting a meridional dipole and a large-scale barotropic structure between the Arctic and mid-latitudes51,52,53. The AO index (see Methods) shows a significantly negative correlation with MPM3 time series and the AO’s signature detected in the MPM3 is largely supported by the composite pattern shown in Fig. 6k, l. Specifically, we find a strong negative (positive) signal in AO’s negative (positive) phase consistent with the dominant structure in Fig. 6c.

We further examine how these three identified large-scale climate modes affect wintertime ARs at S2S timescales based on the SPEAR, with a focus on the Western U.S. to diagnose the potential “forecasts of opportunity.” For ENSO, the modulation of the El Niño phase on wintertime AR occurrence over the western U.S. is relatively weaker compared with that of the La Niña phase (Fig. 6g, h), aligned with previous research10,48,54. During the La Niña phase, there exists a strong positive anomaly in the North Pacific extending to the western U.S. in week-3, which is related to the weakened Aleutian Low and poleward shifted subtropical jet31. In addition, the positive AR anomalies over the western U.S. during the La Niña periods are accompanied by an increased forecast correlation of wintertime ARs particularly in week-3 lead (Figs. 6h and 7a). A decreased forecast correlation, in contrast, is noted during El Niño and neutral phases. For PNA, enhanced wintertime AR activities are detected in California during the PNA’s negative phase (Figs. 6j and 7b). This further supports the distinction of the PNA pattern from ENSO forced teleconnections, as evidenced by the lack of strong correlations with El Niño events. Both the PNA’s positive and negative phases show relatively higher prediction correction than the neutral state, particularly in week-2 (Fig. 7b). For positive and negative AO phases (Fig. 7c), it has slightly higher skill than the neutral phases for week-2, similar to PNA. The impact of ENSO is different with an asymmetric response of ARs between El Niño and La Niña in the western US. For week-3, the correlation skill reaches 0.4 (0.36) during negative (positive) AO phase, greater than neutral AO phase with skill at 0.31. Nevertheless, no significantly consistent modulations are found in week1-4 concerning the AO’s positive phase. As noted in APT reconstruction, additional MPM patterns (see Supplementary Fig. S3) may contribute to AR forecast of opportunity especially over the western U.S. Analysis of the QBO and MJO phases (see Methods) further demonstrates potentially important roles in modulating wintertime AR patterns and correlation over western U.S (Supplementary Figs. S5 and S6). Higher correlation of predicted number of AR days over the western U.S. is shown from week-2 to week-4 during QBO easterly, while the modulation of MJO phases varies with phases and lead times.

Fig. 7: The modulation of climate modes on correlation skill of wintertime AR frequency with lead times from week-1 to week-4 over the western U.S. (30°-50°N, 115°-130°W).
figure 7

The composite prediction correlation coefficients are estimated based on the SPEAR during positive (orange curve), negative (blue curve), and neutral (gray curve) phases of (a). ENSO, (b). PNA, and (c). AO. The black curve indicates raw correlation over the western U.S. considering the whole period during 2000–2019. The correlation skill is estimated between the SPEAR hindcasts and ERA5 reanalysis.

Discussion

This study provides a first evaluation of deterministic and probabilistic forecast skill of ARs in the northern winter at S2S timescales using the SPEAR model. The SPEAR can generally reproduce the climatological AR frequencies compared to that in ERA5 reanalysis, despite systematic bias particularly over the Kuroshio Current and Gulf Stream extension (around 20-40°N) in the midlatitude oceans. The SPEAR shows relatively high AR forecast correlations in the first 2 weeks but becomes substantially weaker in week-3. Although the occurrence rate for Level-2 AR activities (1–2 days/week) is larger than that of Level-3 AR activities (3–7 days/week) particularly over the North Pacific, the BSS metric implies that the SPEAR model is better at predicting high-frequency AR activities than low-frequency AR activities.

The APT method is applied to diagnose the maximum predictable modes as sources of predictability in the SPEAR model-predicted AR anomalies. The reconstructed AR activity based on the first three MPMs largely reproduces the forecast skill from the raw hindcasts at week-3. This implies that the first three MPMs can be the primary contributors to AR forecasts at S2S timescales. The spatial structures for the first three MPMs are identified as the ENSO-like, PNA-like, and AO-like patterns. Relatively, the roles of the MJO and QBO in driving S2S AR predictability are modest. We further test how the three identified climate modes affect the prediction of wintertime AR. These climate modes vary from S2S and interannual timescales, considerably influencing subseasonal AR activities and frequencies10,15,31. We conduct the composite analysis for each climate mode (i.e., positive, negative, and neutral phases), evaluating their modulations on wintertime AR occurrence and S2S forecast correlation over the western U.S. We detect strong positive AR anomalies over the most of western U.S., accompanied by consistently enhanced AR forecast correlation in week 1–4 during the La Niña phase. Our analysis also shows a better forecast correlation during the positive phase of the PNA and a slight increase in the forecast correlation of wintertime ARs during the negative phase of the AO, particularly in week-3. Consistently higher correlation is identified during the QBO easterly condition than the QBO westerly condition over the western U.S., while the modulation by the MJO is much more complicated and dependent on different phases and lead times.

Although the analyses discussed above provide valuable insights into S2S AR predictions, there remains some uncertainty which may be model-dependent. Nonetheless, this study not only sheds light on the SPEAR model’s ability in forecasting weekly AR frequencies, but also helps understand the sources of S2S AR predictability. Our findings suggest that wintertime ARs are driven and modulated by large-scale climate modes. The wintertime ARs can be predicted with correlations reaching 0.5 in week-3 and 0.4 in week-4 over the western U.S. during specific conditions, such as La Niña, PNA + , AO-, QBO easterly, and some phases of MJO. S2S forecast correlation with observational estimates is modulated by phases of large-scale climate modes, highlights the potential window of opportunity for S2S AR forecasting.

Methods

Observations

The ERA5 reanalysis is regarded as the observational reference for validation55. We use the ERA5 daily mean reanalysis from 2000 – 2019 with specific humidity, meridional and zonal wind components retrieved at 300, 500, 700, 850, and 1000hPa vertical levels.

Model hindcasts

Utilizing GFDL’s SPEAR coupled model we perform a ten-member, 20 year hindcast experiment38,39. SPEAR consists of the AM4 atmosphere model (33 vertical levels with 0.5° horizontal resolution), MOM6 ocean model (75 vertical layers with ~1° grid), LM4 land model, and SIS2 sea ice model. A simple nudging scheme is employed toward observations with several years’ integration before prediction, as described in Xiang et al.46, for initializing atmospheric and oceanic conditions. The NASA’s MERRA-256 and the NOAA 1/4° Daily Optimum Interpolation Sea Surface Temperature Version 2 (OISST2)57 are utilized as the nudging fields of the atmosphere and SST, respectively. The ensemble hindcasts, generated with perturbed nudging strength, are carried out with 45 days of integration every 5 days from January 2000 to December 2019.

Atmospheric river detection algorithm

Following the algorithm provided by Mundhenk et al.25, this study defines ARs using the vertically integrated water vapor transport (IVT) between 300hPa and 1000hPa. This algorithm identifies AR-like features satisfying specific intensity and geometry structures (i.e., Length ≥ 1400 km, Area ≥ 300,000 km2, and Aspect Ratio ≥ 1.4) within the fields of IVT anomalies over the globe. The definition of the IVT is shown below,

$${IVT}=\frac{1}{g}\sqrt{{\left({\int }_{\!\!1000}^{300}{qudp}\right)}^{2}+{\left({\int }_{\!\!1000}^{300}{qvdp}\right)}^{2}}$$
(1)

where u and v are zonal and meridional wind, g is the gravitational acceleration, q is the specific humidity, and dp is the pressure difference between adjacent layers from 300 hPa to surface level (technically 1000 hPa). The anomalous IVT in observation is estimated by subtracting the seasonal cycle, which is defined as the first three harmonics of the 2000–2019 calendar day means at each grid. To exclude the bias due to model drift, the IVT anomalies for the SPEAR hindcasts are computed by removing the hindcast climatology as a function of the initial date and lead time. By aggregating the IVT fields over the historical period (2000–2019) in the ERA5 reanalysis and model ensembles, we use the 85th percentile of the anomalous IVT as the threshold for AR detection following Tseng et al.18. We also tested the 94th percentile threshold but found negligible difference in the forecast correlation pattern compared with the 85th percentile. These two threshold criteria cover most of the threshold criterion range that was employed in earlier studies18,20. The algorithm used in this study considers a large-scale percentile threshold and diagnoses each time step of IVT map independently (0 for non-AR and 1 for AR days for a given grid cell), which benefits dynamical analyses in large-scale domains25.

Control experiments

Three additional present-day standard control experiments are performed with SPEAR to examine the impacts of model resolution on the AR climatology and bias, namely, low-resolution experiment (LR: 1° atmospheric grid), medium-resolution experiment (MR: 0.5° atmospheric grid), and high-resolution experiment (HR: 0.25° atmospheric grid). All the three simulations have the same resolution of ocean, sea ice and land models (~1° grid) and we run the SPEAR control experiments for 20 years, respectively.

Deterministic correlation

We use the ACC to evaluate the deterministic forecast skill of wintertime AR activity. The ACC is defined as35:

$${ACC}=\frac{{\sum }_{i=1}^{N}{(M}_{i}-\bar{M}){(O}_{i}-\bar{O})}{\sqrt{{\sum }_{i=1}^{N}{{(M}_{i}-\bar{M})}^{2}}\sqrt{{\sum }_{i=1}^{N}{{(O}_{i}-\bar{O})}^{2}}}$$
(2)

where i is the time index, and N is the total number of hindcast initializations. \({M}_{i}\) and \({O}_{i}\) are the weekly AR anomalies in model ensembles and reanalysis at each grid point. \(\bar{M}\) and \(\bar{O}\) are defined as the average over the period from 2000 – 2019.

Moreover, the significance test for the correlation is performed considering the effective sample size (Ne). The formula for Ne is given by Bretherton et al.58:

$${N}_{e}=\frac{N}{1+2{\sum }_{k=1}^{N-1}(1-\frac{k}{N}{P}_{o}(k){P}_{m}(k))}$$
(3)

where N accounts for the number of sample pairs, Po and Pm are the autocorrelation coefficients of observation and model forecast at time lag k.

Ratio of predictable component

To evaluate whether the AR frequency forecast signal is of the expected amplitude relative to the noise, given the deterministic forecast skill, we use a diagnostic tool called the ratio of predictable components (RPC)59:

$${RPC}=\frac{{r}_{{mo}}}{\sqrt{{\sigma }_{{\rm{signal}}}^{2}/{\sigma }_{{\rm{total}}}^{2}}}$$
(4)

where \({r}_{{mo}}\) is the correlation between ensemble mean forecasts and observations. \({\sigma }_{{\rm{signal}}}^{2}\) and \({\sigma }_{{\rm{total}}}^{2}\) account for the ensemble mean variance and the mean variance of individual ensemble members, respectively. In this study, \({\sigma }_{{\rm{signal}}}^{2}\) is estimated for wintertime AR frequency as the variance of the “signal” – defined as the variance of ensemble mean of the SPEAR hindcasts (ten ensemble members). \({\sigma }_{{\rm{total}}}^{2}\) is the average variance of wintertime AR frequency across all the individual ensemble members. An RPC of 1.0 implies a perfect model forecast system, where the model potential predictability and actual forecast skill are equivalent. RPC exceeding 1.0 suggests a signal-to-noise paradox, where actual correlation exceeds the potential predictability, indicating an underconfident forecast and over-dispersive ensemble. Conversely, RPC below 1.0 indicates actual correlation below potential correlation, indicating overconfident forecasts and an under-dispersive ensemble. For more details of the signal-to-noise paradox, we refer readers to Eade et al.59, Zhang and Kirtman60, and Klanvans et al.61. To the best of our knowledge, this study represents the first RPC evaluation of AR predictions.

Brier skill score

The BSS provides a quantitative estimation of probabilistic skill by calculating the fraction of ensemble members successfully predicting a specific event among all the ensemble forecasts62,63. The BSS measures the improvement of a predictive model over a baseline model (often a climatological forecast) in terms of how well it predicts probabilities. Following Weigel et al.64, we added a correction term C in the BSS formula to avoid its sensitivity to small ensemble sizes. For hindcast systems with a small ensemble size, the term C is given more weight as it is reciprocal to the ensemble size. The corrected BSS is computed as:

$${BSS}=1-\frac{{BS}}{{{BS}}_{{clm}}+C}=1-\frac{\frac{1}{N}{\sum }_{i=1}^{N}{({P}_{i}-{O}_{i})}^{2}}{\frac{1}{N}{\sum }_{i=1}^{N}{{(P}_{{clm}}-{O}_{i})}^{2}+\frac{1}{m}{P}_{{clm}}(1-{P}_{{clm}})}$$
(5)

Where BS and BSclm are the Brier Skill for the SPEAR hindcast and reference climatology, respectively. N is the total number of hindcast initializations. m is the ensemble size. Pi is the forecast probability of a single hindcast falling into a specific AR frequency level. Pclm is the probability of reference climatology for a particular level of AR frequency. For each of the AR occurrence level, the average BSS over all ensembles in the SPEAR hindcast and the corresponding ERA5 reanalysis is computed. The BSS ranges from -\(\infty\) to 1.0, with the value of 1.0 indicating a perfect forecast. The BSS smaller than or equal to 0 suggests that the forecast system provides no improvement from a forecast based on reference climatology. We calculate the BSS for each of the three AR frequency levels separately.

Average Predictability Time (APT) and MPMs

The APT characterizes a system’s overall predictability at all lead times and can be decomposed into a set of independent components (or MPMs), which can be ordered by their fractional contributions to the predictability. This decomposition is analogous to empirical orthogonal function (EOF) analysis but identifies structures or patterns that maximize predictability time rather than variance. The APT is defined at discrete time as:

$${APT}=2\mathop{\sum }\limits_{\tau =1}^{\infty }\left(1-\frac{{\sigma }_{\tau }^{2}}{{\sigma }_{{clim}}^{2}}\right)$$
(6)

where \({{\boldsymbol{\sigma }}}_{{\boldsymbol{\tau }}}^{{\boldsymbol{2}}}\) is the forecast variance at lead time \({\boldsymbol{\tau }}\), averaged over all initialization times. \({{\boldsymbol{\sigma }}}_{{\boldsymbol{clim}}}^{{\boldsymbol{2}}}\) is the time-averaged climatological variance estimated as the variance of all ensemble forecasts with different lead times. \({\boldsymbol{1}}{\boldsymbol{-}}{{\boldsymbol{\sigma }}}_{{\boldsymbol{\tau }}}^{{\boldsymbol{2}}}/{{\boldsymbol{\sigma }}}_{{\boldsymbol{clim}}}^{{\boldsymbol{2}}}\) represents a measure of predictability at lead time \({\boldsymbol{\tau }}\), the value of which is close to 1.0 initially and decreases to near 0 as the forecast variance approaching the climatological variance. The APT is thus defined as twice the sum of this specific predictability measure across all lead times.

After a series of derivations, maximizing APT in the model ensemble forecasts generates an eigenvalue problem:

$$2\mathop{\sum }\limits_{\tau =1}^{\infty }(\mathop{\sum }\limits_{{clim}}-\mathop{\sum }\limits_{\tau })q={\rm{\lambda }}\mathop{\sum }\limits_{{clim}}q$$
(7)

where the eigenvectors q is the projection vector decomposing the time series into ordered, independent modes to optimize APT. λ is the eigenvalue giving corresponding APT values of each uncorrelated component. This uncorrelated set of components is ordered in a descending manner, with the first eigenvector maximizes APT, the second maximizes APT subject to being uncorrelated with the first eigenvector, and so on42,43. The eigenvalues produce the corresponding maximized APT values of each component, which is then defined as the MPMs. We use the SPEAR model ensembles for the estimation. Considering a state variable x at a specific lead time \({\boldsymbol{\tau }}\), start time t and ensemble member e, we can acquire the spatial pattern of MPMs by projecting the component time series \({{\boldsymbol{q}}}^{{\boldsymbol{T}}}{\boldsymbol{x}}\) onto x. The spatial ___domain applied with this analysis (0-60°N, 90°E-90°W) covers eastern Asia, the North Pacific, and western North America. For further details on the APT method and estimations of MPMs, please refer to DelSole and Tippett42,43, Jia et al.44, Yang et al.45, and Xiang et al.46.

Definition of climate modes

The ENSO phases are determined based on the Niño3.4 index defined as area-averaged SST anomalies over the eastern Tropical Pacific (170°-120°W, 5°S-5°N). The El Niño (La Niña) event is identified when the standardized Niño 3.4 index is greater (smaller) than 0.5 (-0.5).

The PNA describes the differences of the 500 hPa pressure level between the polar and equatorial zones of the Pacific and North America. The PNA index is derived from the formula by Wallace and Gutzler49:

$$\begin{array}{l}{PNA}=0.25* \left[\right.Z\left(20^{\circ} N,\,160^{\circ} W\right)-Z\left(45^{\circ} N,165^{\circ} W\right)\\\qquad\quad\;+\,Z\left(55^{\circ} N,115^{\circ} W\right)-Z(30^{\circ} N,85^{\circ} W)\end{array}$$
(8)

where Z is standardized 500 hPa geopotential height.

The AO index is computed by projecting the 1000 mb height anomalies poleward of 20°N onto the empirical orthogonal function loading pattern of the AO. Positive and negative phases of PNA or AO are defined as 0.5 standard deviation above or below their respective indices. We quantify the QBO index utilizing the area-average means of zonal wind anomaly at 50 hPa (u50) averaged between 10°S and 10°N. Cases with the standard deviation of QBO index greater (less) than 0.5 (-0.5) are considered westerly (easterly) periods. The Real-time Multivariate MJO (RMM) index65 is applied to quantify the phases and magnitude of the MJO. Only cases where the magnitude of RMM > 1.0 are considered in composite analysis.