Fig. 3: The contributions of model structure and training strategies to improving KGML-ag-Carbon performance. | Nature Communications

From: Knowledge-guided machine learning can improve carbon cycle quantification in agroecosystems

a The contributions of the different components of KGML-ag-Carbon to improving the accuracy of annual corn yield prediction. Components are added one at a time, in sequence, to the pure ML model. n = 20 and 5 independent experiments for model ensembles with small and large training data, respectively. b The contributions to improving the prediction accuracy of the annual cumulative NEE flux. n = 48 independent experiments for model ensembles with both small and large training data. Data in a and b are presented as mean values ± standard deviation. Values below each bar give the mean performance increase relative to the previous step. ML w/o GPP indicates the pure ML model without GPP input; GPP Data indicates the ML model with GPP inputs; Pretrain indicates the GPP Data model pre-trained on synthetic data generated by the process-based model; Structure indicates the model that has the hierarchical architecture, is pre-trained with synthetic data, and includes GPP inputs; KG Loss indicates the Structure model that also incorporates knowledge-guided loss terms; KGML-ag-Carbon indicates the final model, which incorporates both the knowledge-guided architecture and loss terms, includes GPP inputs, and is pre-trained on synthetic data. c The benefits of the knowledge-guided components (pretrain, structure, and KG Loss) in reducing the residual mass balance (GPP − Reco − NEE). n = 48 independent experiments. Each box plot shows the first and third quartiles (lower and upper box edges), the median (central line), the mean (solid red dot), and the minimum and maximum (lower and upper whiskers). d An example of NEE fluxes predicted by models with different knowledge-guided components at site US-NE1 in 2016. n = 8 independent experiments for model ensembles at this site. Shaded areas span the range between the maximum and minimum of the simulation ensembles, and solid lines show the ensemble means. Data in the inset plots are presented as mean values (solid dots) and minima and maxima (whiskers) of R²/RMSE. ML + GPP indicates the pure ML model with GPP inputs; ML + GPP + pretrain indicates the pure ML model with GPP inputs, pre-trained with synthetic data; KGML-ag-Carbon indicates the final model. Source data are provided as a Source Data file.
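The following is a minimal sketch of the summary quantities the caption refers to. It is not the authors' code. It computes three things: the per-time-step residual GPP − Reco − NEE using the caption's sign convention, the box-plot statistics drawn in panel c, and an R²/RMSE pair like the one in the panel d insets. Several choices here are assumptions. Each experiment's residual is collapsed to a mean absolute value. R² is taken as the coefficient of determination rather than squared Pearson correlation. Quartiles use linear interpolation. The function names are hypothetical, and the numbers in the usage lines are toy values, not data from the study.

```python
import math
import statistics


def residual_mass_balance(gpp, reco, nee):
    """Per-time-step residual GPP - Reco - NEE (sign convention as written in the caption)."""
    return [g - r - n for g, r, n in zip(gpp, reco, nee)]


def mean_abs_residual(gpp, reco, nee):
    """Hypothetical per-experiment summary: mean absolute mass-balance residual."""
    return statistics.fmean(abs(x) for x in residual_mass_balance(gpp, reco, nee))


def box_stats(values):
    """Quartiles, median, mean, and min/max (whiskers) as described for panel c.

    Assumes linear-interpolation quartiles (method="inclusive").
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3,
            "max": max(values), "mean": statistics.fmean(values)}


def r2_rmse(obs, pred):
    """R^2 as the coefficient of determination (an assumption) and RMSE."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(obs))


if __name__ == "__main__":
    # Toy values only; units and magnitudes are illustrative.
    print(residual_mass_balance([10.0, 8.0], [6.0, 5.0], [3.0, 3.5]))
    print(box_stats([1, 2, 3, 4, 5]))
    print(r2_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

A model that closes the carbon balance exactly would give a residual of zero at every step. Collecting one summary value per experiment (48 in panel c) and passing the list to `box_stats` would reproduce the kind of distribution the box plots display.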