Extended Data Fig. 1: Comprehensive evaluation of ESM1b and EVE on ClinVar, HGMD/gnomAD and deep mutation scans. | Nature Genetics

From: Genome-wide prediction of disease variant effects with a deep protein language model

(a) ROC curves of ESM1b and EVE as binary classifiers of variant pathogenicity over ClinVar (left) and HGMD/gnomAD (right). The true positive rate at the standard false positive rate of 0.05 is annotated on each of the four curves. (b) Evaluation of EVE (left bar plots) and ESM1b (right bar plots) over ClinVar (top panels) and HGMD/gnomAD (bottom panels), using either the global ROC-AUC (red) or the gene-average ROC-AUC (yellow) metric (see the relevant section in the Methods). For each dataset, results are shown for the full dataset (left panels) and for the subsets of variants in long (middle panels) or short (right panels) proteins, defined by a threshold of 1,022 aa, which is the maximum window length supported by ESM1b (see Methods). Dashed lines: the top score (obtained by ESM1b or EVE) according to each of the two metrics. (c) Evaluation of ESM1b and EVE on deep mutational scanning datasets, shown separately for each of the 28 assays (which were aggregated per gene in Fig. 3b).
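
The three quantities named in panels a and b (true positive rate at a fixed false positive rate, global ROC-AUC and gene-average ROC-AUC) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that a higher score means "more pathogenic", that labels are 1 for pathogenic and 0 for benign, and that the gene-average metric is the unweighted mean of per-gene ROC-AUCs over genes containing both classes. The exact gene inclusion criteria are given in the Methods and are not reproduced here. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels, scores, max_fpr=0.05):
    """Highest TPR over all score thresholds whose FPR does not exceed max_fpr."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        # A variant is called pathogenic when its score is >= t.
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best


def gene_average_auc(genes, labels, scores):
    """Unweighted mean of per-gene ROC-AUCs (assumed definition, see lead-in)."""
    by_gene = defaultdict(lambda: ([], []))
    for g, y, s in zip(genes, labels, scores):
        by_gene[g][0].append(y)
        by_gene[g][1].append(s)
    # Skip genes that lack either class, since their AUC is undefined.
    aucs = [roc_auc(ys, ss) for ys, ss in by_gene.values() if 0 < sum(ys) < len(ys)]
    return sum(aucs) / len(aucs)


# Toy data: two hypothetical genes, four variants each.
genes = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.1, 0.4, 0.3, 0.35, 0.05]

global_auc = roc_auc(labels, scores)                    # 0.9375
per_gene_auc = gene_average_auc(genes, labels, scores)  # 0.875
tpr_005 = tpr_at_fpr(labels, scores, max_fpr=0.05)      # 0.75
```

In the toy data the two metrics differ: the global metric pools variants across genes, so it also compares gene A's pathogenic variants with gene B's benign ones, whereas the gene-average metric only ranks variants within the same gene.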
