Fig. 2: Non-trivial scaling behavior of scTab in cross-tissue cell type prediction. | Nature Communications

From: scTab: Scaling cross-tissue single-cell annotation models

a Distribution of donors with respect to the number of unique cell types (x-axis) and the number of cells (y-axis). The y-axis histogram shows the distribution of donors with respect to the number of cells (log scale). The x-axis histogram shows the distribution of donors with respect to the number of unique cell types.

b Scaling behavior of scTab with respect to the size of the training data for two simulated scenarios, in terms of macro F1-score and cross-entropy loss: i) cell-based subsampling, which corresponds to increasing the number of sequenced cells while keeping the observed biological diversity constant; ii) donor-based subsampling, which corresponds to increasing the observed biological diversity. All cell types from the test set were observed during model training for all subsampled datasets. Data are presented as mean values ± 95% CI. Source data are provided as a Source Data file.

c Scaling of the cross-organ model from Fig. 2b with respect to training data size, grouped by organ system (using donor-based subsampling). Data are presented as mean values ± 95% CI. Source data are provided as a Source Data file.

d Scaling behavior of scTab versus our linear reference model with respect to the training data size. Data are presented as mean values ± 95% CI. Source data are provided as a Source Data file.

e Effect of training only on organ-specific data versus training on cross-organ data on organ-specific classification performance (evaluated on the test data subset to the corresponding organ only) for scTab and the optimized linear model. Data are presented as mean values ± 95% CI. Source data are provided as a Source Data file.

f Scaling behavior with respect to model size. The number of hidden units refers to the size of the fully connected (FC) layers in the architecture (Fig. 1b, Methods). Data are presented as mean values ± 95% CI. Source data are provided as a Source Data file.
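The two subsampling scenarios in panel b, and the macro F1-score used to evaluate them, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `donor_ids` array (one donor label per cell), and the `fraction` parameter are assumptions made for illustration, and the sketch does not enforce the caption's condition that every test-set cell type remains present in each subsample.

```python
import numpy as np


def cell_based_subsample(donor_ids, fraction, rng):
    """Hypothetical helper: keep a random fraction of cells, drawn uniformly
    across all donors, so the set of donors stays roughly the same."""
    n_keep = int(round(fraction * len(donor_ids)))
    return np.sort(rng.choice(len(donor_ids), size=n_keep, replace=False))


def donor_based_subsample(donor_ids, fraction, rng):
    """Hypothetical helper: keep every cell from a random fraction of donors,
    so the number of observed donors grows with the fraction."""
    donors = np.unique(donor_ids)
    n_keep = max(1, int(round(fraction * len(donors))))
    kept = rng.choice(donors, size=n_keep, replace=False)
    return np.flatnonzero(np.isin(donor_ids, kept))


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 over the classes present in y_true."""
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Toy data: 10 cells from 4 donors.
donor_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 3])
rng = np.random.default_rng(0)
cell_idx = cell_based_subsample(donor_ids, 0.5, rng)
donor_idx = donor_based_subsample(donor_ids, 0.5, rng)
```

Because macro F1 averages the per-class scores without weighting by class size, rare cell types count as much as abundant ones, which matters when cell type frequencies are strongly imbalanced.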
