Extended Data Fig. 10: Assessment of gkm-SVM model performance and additional high-effect candidate fine-mapped SNPs.

(a, b) Area under the receiver operating characteristic curve (AUROC; a) and area under the precision-recall curve (AUPRC; b) for the gkm-SVM machine learning classifiers for each of the cluster models. Each dot indicates a cross-validation fold (n = 10). Box plots show the median and the 25th and 75th percentiles of the data; whiskers extend to the highest and lowest values within 1.5 times the interquartile range of the box. (c) The overlap of training data (peak sequences) between models. (d) The performance of each cluster model in predicting test sequences from a non-target cluster. (e) Enrichment of high-effect fine-mapped SNPs from eczema relative to random fine-mapped SNPs in cis-regulatory regions. (f) Same as in (e), but for AGA. (g) Normalized chromatin accessibility landscape for cell type–specific pseudobulk tracks around the BNC2 locus. Integrated BNC2 expression levels are shown in the violin plot for each cell type to the right. The positions of ATAC-seq peaks, the GWAS lead SNP, the fine-mapped SNP candidates in LD with the lead SNP, and the candidate functional SNP are shown below the ATAC-seq tracks. Significant peak-to-gene linkages are indicated by loops connecting the BNC2 promoter to the indicated peaks. (h) GkmExplain importance scores for the 50 bp region surrounding rs12350739, a hair color-associated SNP that creates a JUN motif in a CRE linked to BNC2 expression. (i) Same as in (g), but for the ALX4 locus. (j) GkmExplain importance scores for the 50 bp region surrounding rs10769041, an AGA-associated SNP that disrupts an ETS motif in a CRE linked to ALX4 expression.