Extended Data Fig. 1: Validation of machine-learning surrogates against observations.
From: Substantial cooling effect from aerosol-induced increase in tropical marine cloud cover

Panels show (a) June and (b) July. The left subpanels show validations in non-eruption years, demonstrating very good agreement between the machine-learning surrogates and observations: the regression lines (pink) lie very close to the black 1:1 lines, and 80% of the data (black dashed circles) fall around the 1:1 lines. The shaded areas indicate the 90% confidence interval for the multi-year regression lines, based on individual years. The middle subpanels show results for the eruption year 2008 and the right subpanels show results for the eruption year 2018, highlighting the differences between the machine-learning surrogates and observations. The following variables are shown from top to bottom: cloud droplet number concentration (Nd), cloud droplet effective radius (reff), in-cloud liquid water path (LWP), cloud fraction (CF, or cloud cover), reflected shortwave radiation (SW) at the top of the atmosphere (TOA), and daily precipitation (Rain). The colour of each pixel indicates the normalized data density function; a brighter colour means more data points in that pixel. A pink regression line steeper than the black dashed line indicates an increase in the given variable compared with the non-eruption-year average. Note that the slopes here can differ from the ratios in Fig. 2, in which area-weighted averaging is applied and Monte Carlo bootstrapping provides a more robust validation.
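
The slope comparison described in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the surrogate prediction is on the horizontal axis and the observation on the vertical axis, that the 90% interval is taken as the 5th to 95th percentile of per-year slopes, and it uses synthetic data with a hypothetical 30% enhancement in the eruption year. The function names `regression_slope` and `slope_interval` are illustrative only.

```python
import numpy as np


def regression_slope(surrogate, observed):
    """Least-squares slope and intercept of observed against surrogate."""
    slope, intercept = np.polyfit(surrogate, observed, deg=1)
    return slope, intercept


def slope_interval(slopes, level=0.90):
    """Percentile interval of per-year slopes (assumed interval definition)."""
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(slopes, [tail, 100.0 - tail])
    return lo, hi


rng = np.random.default_rng(0)

# Synthetic non-eruption years: observations scatter around the 1:1 line.
baseline_slopes = []
for _ in range(10):
    surrogate = rng.uniform(20.0, 200.0, size=500)
    observed = surrogate + rng.normal(0.0, 10.0, size=500)
    baseline_slopes.append(regression_slope(surrogate, observed)[0])
baseline_slopes = np.array(baseline_slopes)
lo, hi = slope_interval(baseline_slopes)

# Synthetic eruption year: observations are 30% above the surrogate
# (a hypothetical value chosen only for illustration).
surrogate = rng.uniform(20.0, 200.0, size=500)
observed = 1.3 * surrogate + rng.normal(0.0, 10.0, size=500)
eruption_slope, _ = regression_slope(surrogate, observed)

print(f"non-eruption slopes: 90% interval [{lo:.3f}, {hi:.3f}]")
print(f"eruption-year slope: {eruption_slope:.3f}")
```

Under these assumptions, an eruption-year slope above the non-eruption interval corresponds to the "steeper pink slope" case, meaning the observed variable exceeds what the surrogate predicts.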