Extended Data Fig. 10: Predictive modeling of methylation data. | Nature Medicine

From: Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions

In addition to the predictive modeling based on probe variation shown in Fig. 5, we used differentially methylated positions (DMPs) to build a predictor with the Prediction Analysis for Microarrays (PAM) method. a-c, The model was trained on a training set of 26 progressive, 11 regressive and 23 control samples, shown in red, green and blue, respectively, yielding a predictor based on 141 DMPs. d-f, The predictor was applied to a validation set of 10 progressive, 7 regressive and 10 control samples, predicting outcome with AUC = 0.99. g-i, Application of our predictive model to TCGA methylation data. Samples were correctly classified as TCGA LUSC or TCGA control samples with AUC = 0.99. j-m, ROC and precision-recall curves for the Methylation Heterogeneity Index (MHI) model presented in Fig. 4, for cancer vs control (j-k) and progressive vs regressive (l-m). n, Histogram of AUC values obtained with the MHI model using random samples of 2000 probes, applied to progressive vs regressive data. This demonstrates that a random sample of probes achieves an AUC similar to that obtained with the entire array.
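
The resampling check described for panel n can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `auc` and `subset_aucs`, the toy beta values, and the use of a plain mean as the per-sample score are all assumptions made for illustration, and the mean is only a stand-in for the MHI, whose definition is not given in this legend.

```python
import random
from statistics import mean


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def subset_aucs(pos_samples, neg_samples, score_fn, subset_size, n_repeats, seed=0):
    """Recompute the AUC of score_fn on repeated random probe subsets.

    pos_samples / neg_samples: one list of per-probe values per sample.
    score_fn: hypothetical per-sample score (stand-in for the MHI).
    """
    rng = random.Random(seed)
    n_probes = len(pos_samples[0])
    results = []
    for _ in range(n_repeats):
        # Draw a random subset of probe indices without replacement.
        idx = rng.sample(range(n_probes), subset_size)
        pos = [score_fn([s[i] for i in idx]) for s in pos_samples]
        neg = [score_fn([s[i] for i in idx]) for s in neg_samples]
        results.append(auc(pos, neg))
    return results


# Made-up beta values: 3 "progressive" and 3 "regressive" samples, 6 probes each.
progressive = [
    [0.6, 0.7, 0.5, 0.8, 0.6, 0.7],
    [0.5, 0.6, 0.7, 0.6, 0.8, 0.5],
    [0.7, 0.5, 0.6, 0.7, 0.5, 0.6],
]
regressive = [
    [0.1, 0.2, 0.1, 0.3, 0.2, 0.1],
    [0.2, 0.1, 0.3, 0.1, 0.1, 0.2],
    [0.3, 0.2, 0.2, 0.1, 0.3, 0.1],
]

aucs = subset_aucs(progressive, regressive, mean, subset_size=3, n_repeats=20)
```

Plotting a histogram of `aucs` would give the toy analogue of panel n. With real data, the subset size would be 2000 probes and the score function would be the MHI as defined in the article.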
