Extended Data Fig. 7: Performance metrics across models trained for different datasets.

Each row contains a different performance metric, while each column represents a single dataset. The training and validation sets were identical across models, but because mNN correction incorporates the query dataset, the input data differ slightly between columns. Accuracy metrics are derived from the holdout validation set, consisting of the approximately 50% of the original dataset that was not used to train either the SVM or XGB models (104,902 cells). Row 1 presents histograms of XGBoost classification confidence for cells in the validation set, highlighting cells below 70% confidence in yellow and cells below 50% confidence in red (the latter are dropped). Most cells in the validation set are classified with high confidence. Row 2 contains a UMAP visualization of classification confidence, revealing higher confidence for cells at the UMAP periphery and lower confidence for intermediate cells. Row 3 shows confusion matrices for the validation set. Row 4 presents sensitivity and specificity per class, which are comparable across datasets. Row 5 shows boxplots of XGB classification confidence across the 4 classes. Boxplots show the median (center), the 25th percentile (lower hinge), and the 75th percentile (upper hinge). Whiskers extend up to 1.5 times the interquartile range (IQR) from the nearest hinge, with more extreme values shown as circles. Minima and maxima are not explicitly depicted. Classification confidence varies substantially between datasets, with the ROSMAP data being the only dataset in which classification confidence for families 167 and 24 is generally comparable to that for families 3 and 5. Row 6 contains histograms of XGBoost classification confidence for the query cells. Notably, the glioblastoma and xenograft data have classification confidence similar to that of the validation set, whereas the ROSMAP data and, to a lesser extent, the Dräger data diverge noticeably. Finally, row 7 shows marker gene expression across assigned labels in the query datasets. Circle size represents the percentage of cells in each cluster expressing the gene (no circle is plotted if fewer than 10% of cells in a cluster express the gene). Circle color represents z-scored expression of the gene. Despite systematic differences between datasets, label transfer aligns expression profiles effectively.
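
For readers who wish to reproduce the quantities in rows 1 and 4, the following is a minimal Python sketch rather than the code used for this figure. It assumes that classification confidence is the maximum predicted class probability per cell and that confusion matrices have true labels in rows and predicted labels in columns; the function names `flag_confidence` and `sensitivity_specificity` are hypothetical. Only the 70% and 50% thresholds are taken from the legend.

```python
import numpy as np


def flag_confidence(conf, warn=0.70, drop=0.50):
    """Bin per-cell confidence: 'drop' (< drop), 'low' (< warn), else 'high'.

    Assumption: `conf` is the maximum predicted class probability per cell.
    """
    conf = np.asarray(conf, dtype=float)
    return np.where(conf < drop, "drop", np.where(conf < warn, "low", "high"))


def sensitivity_specificity(cm):
    """Per-class sensitivity and specificity from a square confusion matrix.

    Assumption: rows are true labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)               # correctly classified cells per class
    fn = cm.sum(axis=1) - tp       # cells of the class assigned elsewhere
    fp = cm.sum(axis=0) - tp       # cells of other classes assigned here
    tn = cm.sum() - tp - fn - fp   # all remaining cells
    return tp / (tp + fn), tn / (tn + fp)


# Toy two-class example (illustrative numbers, not data from the figure).
flags = flag_confidence([0.95, 0.65, 0.40])
sens, spec = sensitivity_specificity([[8, 2], [1, 9]])
```

In this one-vs-rest formulation each class is treated in turn as the positive class, which is one common way to obtain per-class values from a multi-class confusion matrix.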