Fig. 2: The comparative performance of the pure ML model (blue boxes) and KGML-ag-Carbon (red boxes) when using different sizes of observed data samples for model training. | Nature Communications

From: Knowledge-guided machine learning can improve carbon cycle quantification in agroecosystems

a, b Yield prediction performance over 210 counties. n = 1, 50, 20, 10, 10, 5, 4, 2, 1, and 1 independent experiments for model ensembles with training sample sizes of 0, 2, 5, 10, 20, 40, 80, 160, 240, and 320, respectively. Each training sample is a 21-year series of annual yield observations in one county. c, d Ecosystem respiration (Reco) and net ecosystem exchange (NEE) prediction performance across 11 eddy covariance (EC) flux towers. n = 48 independent experiments for each training sample size from 1 to 7. n = 7 and 6 independent experiments for the ecosys model and KGML-ag-Carbon model ensembles with a training sample size of 0, respectively. Each training sample consists of the daily observations from one site over its observation period (which varies by site, from 5 to 19 years). Each box plot shows the first and third quartiles (lower and upper box edges), the median (central line), and the minimum and maximum (lower and upper whiskers), with outliers shown as circles. The green stars represent the performance of ecosys in crop yield simulations across the U.S. states of Illinois, Iowa, and Indiana, constrained with remotely sensed gross primary production (GPP) and observed yield. The green boxes represent the performance of ecosys in carbon flux simulations at 7 EC flux tower sites across the U.S. Midwest from Zhou et al. (ref. 13), which is a subset of the dataset used in this study. Only out-of-sample test results from cross-validation ensembles are shown. Details of the experiments can be found in the “Methods” section. Source data are provided as a Source Data file.
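The box-plot elements named in the caption (quartiles, median, whiskers, outliers) can be illustrated with a minimal sketch. It assumes the common Tukey convention, in which whiskers extend to the most extreme points within 1.5 times the interquartile range and anything beyond is drawn as an outlier; the caption does not state which outlier rule the figure uses. The function name `box_stats` and the sample scores are hypothetical and are not taken from the Source Data file.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Summarise one box of a box plot.

    Assumption: Tukey-style fences at `whisker` * IQR beyond the quartiles.
    The figure's actual outlier rule is not given in the caption.
    """
    data = sorted(values)
    # First quartile, median, third quartile (linear interpolation).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the minimum and maximum of the non-outlier points.
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }


# Hypothetical test scores from an ensemble of independent experiments.
scores = [0.61, 0.64, 0.66, 0.67, 0.69, 0.70, 0.72, 0.35]
stats = box_stats(scores)
print(stats)
```

With one independent experiment (n = 1) a box collapses to a single value, so the spread shown in a box is only informative for the larger ensembles.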
