Fig. 2: Feature selection of the data layers and features that contribute most to model performance.
From: Nasal DNA methylation at three CpG sites predicts childhood allergic disease

a Based on the validation performance of an Elastic Net model using a ten-times repeated tenfold cross-validation procedure. Layers were added sequentially: age and gender were added first, then perinatal factors were included in the model, and so on. Negative contributions could occur when variables had low predictive power and increased model overfitting. Perinatal: low birth weight and breastfeeding; Environment: pets during pregnancy, maternal smoking, and older siblings at home; Genetics: allergy SNPs and polygenic risk scores (PRS) for the combined allergy phenotype, as well as for asthma, rhinitis, eczema, and IgE sensitization; Blood methylation: 219 CpG sites from blood cells previously associated with allergy; Nasal methylation: 134 CpG sites from nasal cells previously associated with allergy. b Rank product of the top-ten features. c The AUC of models with an incremental number of features; after the first three nasal CpG sites, adding another CpG site did not further increase the model’s performance.
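
As an illustration of the rank product in panel b, the following is a minimal sketch and not the authors' implementation. It assumes that each cross-validation repeat yields one importance value per feature (for example, an absolute Elastic Net coefficient), that rank 1 is the most important feature, and that the rank product is reported as the geometric mean of ranks across repeats. The function name `rank_product` and the feature names and values are hypothetical.

```python
import math


def rank_product(importances_per_run):
    """Geometric mean of per-run ranks for each feature (lower = more important).

    importances_per_run: list of dicts mapping feature name -> importance.
    Assumption: larger absolute importance means a better (smaller) rank.
    """
    features = sorted(importances_per_run[0])
    log_sums = {f: 0.0 for f in features}
    for run in importances_per_run:
        # Rank 1 is the most important feature in this run;
        # ties are broken by feature name so the result is deterministic.
        ordered = sorted(features, key=lambda f: (-abs(run[f]), f))
        for rank, f in enumerate(ordered, start=1):
            log_sums[f] += math.log(rank)
    n_runs = len(importances_per_run)
    # Summing logs avoids overflow when there are many runs.
    return {f: math.exp(log_sums[f] / n_runs) for f in features}


# Hypothetical importances from three cross-validation repeats.
runs = [
    {"cpg_a": 0.9, "cpg_b": 0.5, "age": 0.1},
    {"cpg_a": 0.7, "cpg_b": 0.8, "age": 0.2},
    {"cpg_a": 0.6, "cpg_b": 0.4, "age": 0.0},
]
scores = rank_product(runs)
top = sorted(scores, key=scores.get)  # features ordered from best to worst
```

Under these assumptions, a feature that ranks near the top in every repeat receives a small rank product, so sorting by ascending score gives the kind of ordering from which a top-ten list can be taken.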