Table 1. Impact of quality control, imputation of missing genotypes, and coding methods on the results of Lasso-penalized logistic regression.
QC / Imputation / Coding | AUC train | AUC test | \(N_{\mathrm{SNP}}^{p}\) | \(N_{\mathrm{SNP}\neq 0}\) | \(I_{\mathrm{SNP}}^{(\ast)}\) | \(I_{\mathrm{Loci}}^{(\ast)}\) | \(I_{\mathrm{topSNP}}^{(\ast)}\) | \(I_{\mathrm{topLoci}}^{(\ast)}\) | \(I_{\mathrm{Loci}}^{(\mathrm{GWAS})}\) | \(I_{\mathrm{topLoci}}^{(\mathrm{GWAS})}\) |
---|---|---|---|---|---|---|---|---|---|---|
NoQC/Unkw/OHE | 0.925 ± 0.003 | 0.922 | 23583 | 2927 | 29% | 55% | 6% | 35% | 88% | 19% |
QC/Unkw/OHE | 0.808 ± 0.008 | 0.802 | 21896 | 3198 | 69% | 87% | 38% | 48% | 89% | 25% |
NoQC/Maj/sum | 0.901 ± 0.003 | 0.897 | 23583 | 3419 | 36% | 69% | 6% | 23% | 90% | 12% |
QC/Maj/sum | 0.805 ± 0.008 | 0.800 | 21896 | 3553 | 91% | 100% | 64% | 63% | 91% | 27% |
NoQC/HWc/sum | 0.812 ± 0.007 | 0.803 | 23583 | 2730 | 38% | 66% | 26% | 45% | 87% | 29% |
QC/HWc/sum | 0.803 ± 0.008 | 0.800 | 21896 | 2575 | — | — | — | — | 89% | 36% |
QC/HWc/OHE | 0.796 ± 0.008 | 0.786 | 21896 | 3242 | 72% | 89% | 49% | 60% | 89% | 29% |
QC/HWc/raw | 0.800 ± 0.008 | 0.792 | 21896 | 2757 | 72% | 88% | 57% | 68% | 89% | 29% |
QC/HWa/sum | 0.803 ± 0.008 | 0.799 | 21896 | 2579 | 94% | 99% | 91% | 96% | 89% | 36% |