Extended Data Fig. 1: CHIEF accurately identified the origins of tumors, with results validated in independent patient cohorts from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
From: A pathology foundation model for cancer diagnosis and prognosis prediction

a. The confusion matrix of CHIEF’s prediction in the held-out test sets. The overall macro-averaged accuracy of CHIEF is 0.895. b. CHIEF achieved high prediction performance and generalizability to independent cohorts in tumor origin prediction (AUROC = 0.9853 ± 0.0245). Micro-averaged one-versus-rest ROC curves for tumor origin classification are shown. We presented the AUROC±s.d. calculated across 18 tumor origins. In comparison, state-of-the-art methods have substantially lower performance in the independent cohorts (two-sided Wilcoxon signed-rank test P-value = 0.000015). c. CHIEF attained higher accuracy than state-of-the-art deep learning methods in tumor origin prediction. Overall accuracies for the held-out (n = 1,895) and independent test sets (n = 3,019) for CHIEF and other deep learning methods are shown. d. CHIEF attained higher AUROC, sensitivity, and specificity for each tumor origin in the held-out test sets (n = 1,895) compared with other methods. The model performance for all 18 tumor origins is shown. e. CHIEF possessed significantly higher AUROC, sensitivity, and specificity for each origin in the independent test sets (n = 3,019, P-value = 0.003906, two-sided Wilcoxon signed-rank test). In contrast, standard machine learning approaches suffer from substantial performance drops when applied to patient cohorts not involved in model development. In c-e, error bars represent 95% confidence intervals computed by the bootstrap method (n = 1,000 replicates), and the centers represent the values of various performance metrics specified in these figure panels. The detailed sample size for each cancer type shown in d-e can be found in Supplementary Table 14.