Fig. 1: Study design to develop and validate the prediction model and functionally interpret its predictors.
From: Nasal DNA methylation at three CpG sites predicts childhood allergic disease

a We integrated multi-omics data from the PIAMA cohort, including genetics, DNA methylation from blood and nasal brushes, and environmental factors. Using machine learning methods, we aimed to (1) assess the contribution of each data layer to the performance of a prediction model and (2) present a simple prediction model for allergic disease. b To develop a parsimonious prediction model, we first ranked all features from the full dataset and then added features incrementally until no significant increase in performance was seen. The performance of the final parsimonious model was demonstrated in the discovery cohort. c The final model was evaluated by applying it to a similar but independent cohort (EVA-PR) and to two independent cohorts of younger children, COPSAC2010 and MAKI. d To understand the information captured by the three CpG sites used in the model, we linked the methylation level of the CpG sites to gene expression by expression quantitative trait methylation (eQTM) analysis. The eQTM genes were functionally interpreted by gene network and pathway analysis; scRNA-seq data were used to interpret the expression pattern of eQTM genes in different nasal cell types.
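
The incremental step in panel b can be illustrated with a minimal sketch. It is not the authors' implementation: the caption does not say how a "significant increase" was tested, so the sketch assumes a fixed minimum gain (`min_gain`) as a stand-in. The function name, the feature names, and the toy scores are all hypothetical and are not results from the study.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_selection(
    ranked_features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    min_gain: float = 0.01,
) -> Tuple[List[str], float]:
    """Add features in ranked order; stop at the first feature whose
    gain in score is below min_gain (assumed stand-in for a significance test)."""
    selected: List[str] = []
    best_score = float("-inf")  # ensures the top-ranked feature is always accepted
    for feature in ranked_features:
        candidate = selected + [feature]
        score = score_fn(candidate)
        if score - best_score < min_gain:
            break  # no meaningful improvement: keep the smaller model
        selected, best_score = candidate, score
    return selected, best_score


# Hypothetical ranking and made-up scores keyed by model size, for illustration only.
toy_ranked = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]
toy_scores = {1: 0.70, 2: 0.76, 3: 0.80, 4: 0.803, 5: 0.85}


def toy_score(features: List[str]) -> float:
    """Stand-in for a cross-validated performance estimate such as AUC."""
    return toy_scores[len(features)]


selected, score = incremental_selection(toy_ranked, toy_score, min_gain=0.01)
print(selected, score)
```

In this toy run the fourth feature adds only 0.003 to the score, so the loop stops with three features even though a later feature would have raised the score further. A real analysis would replace `toy_score` with a cross-validated estimate and the fixed threshold with a proper statistical comparison.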