Figure 2 | Scientific Reports

Figure 2

From: Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data


ROC AUC scores for the Linear Regression model under different conditions on the dataset and on the penalty terms. Black dots and error bars show the mean values and two-standard-deviation confidence intervals for 10-fold cross-validated models on the training dataset. Red diamonds show the AUC scores obtained on the test dataset with the model trained on the entire training dataset, using the corresponding cross-validated hyper-parameters. The numbers above the error bars give the number of features used by the model and, in parentheses, the number of original features in the dataset. We show the AUC scores for: (A) different values of the upper bound on p-values for the SNP preselection phase, with \({\rm{MAF}} > 0.01\); (B) different values of the lower bound on MAF for the SNP preselection phase, with p-value \(p < {10}^{-4}\); (C) different values of the case/control ratio; (D) different types of regularization. In (C,D), \(p < {10}^{-4}\) and \({\rm{MAF}} > 0.01\).
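As a reading aid for the black dots and error bars, the following minimal sketch shows one way a ROC AUC score and a mean with a two-standard-deviation band over cross-validation folds could be computed. It is not the authors' code: the function names `roc_auc` and `summarize_folds`, the toy data, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # controls
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs, n_sd=2.0):
    """Return (mean, lower, upper), where the band is mean +/- n_sd sample
    standard deviations of the per-fold AUC scores (an assumed convention)."""
    m = mean(fold_aucs)
    half_width = n_sd * stdev(fold_aucs)
    return m, m - half_width, m + half_width


# Hypothetical toy data: 1 = case, 0 = control.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75 (3 of 4 case/control pairs ranked correctly)

# Hypothetical per-fold AUC scores, not values from the figure.
print(summarize_folds([0.70, 0.72, 0.74]))
```

In a setting like the one described in the caption, `summarize_folds` would be applied to the 10 per-fold AUC scores for each condition, while `roc_auc` evaluated on held-out test predictions would correspond to a red diamond.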
