Extended Data Fig. 5: Benchmarking gene expression deconvolution approaches in B-ALL. | Nature Cancer

From: Multipotent lineage potential in B cell acute lymphoblastic leukemia is associated with distinct cellular origins and clinical features

a) Overview of the benchmark used to compare gene expression deconvolution approaches for inferring developmental state abundance in B-ALL. The workflow comprises three stages: (1) feature selection and signature matrix generation, (2) application of a gene expression deconvolution algorithm, and (3) benchmarking of deconvolution accuracy using 85 patient samples with matched scRNA-seq and bulk RNA-seq. In the feature selection step, patient/donor ID was included as a covariate to emphasize intra-donor heterogeneity when identifying transcriptional markers of normal and leukemic cell states along B-cell development.

b,c) Association between predicted abundance (bulk RNA-seq deconvolution) and observed abundance (scRNA-seq) of B-ALL developmental states across the 85 matched patient samples, shown for each deconvolution method and split by signature matrix. The associations shown are the Pearson correlation (b) and the coefficient of determination, denoted R-squared (c). Lines between points in the box plots link the performance for each B-ALL developmental state (HSC/MPP, myeloid progenitor, pre-pDC, early lymphoid, pro-B, pre-B, mature B) across quantification methods. P values are from two-tailed paired t-tests comparing our LASSO regression approach with CIBERSORTx deconvolution. Boxes indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5× the interquartile range.

d) Scatterplots of predicted abundance (bulk RNA-seq) against observed abundance (scRNA-seq) for each quantification method, with each dot representing one of the 85 patient samples with matched scRNA-seq and bulk RNA-seq. For each method, the B-ALL biologically relevant genes signature matrix was used. For each association, the linear regression line with its shaded 95% CI is shown, together with r and P values from Pearson correlation.
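The agreement metrics named in panels b and c can be sketched in a few lines. The code below is an illustrative sketch only, not the authors' pipeline: the function names and the toy abundances are hypothetical. The R-squared definition used here (one minus the residual sum of squares over the total sum of squares, treating the prediction directly as the estimate) is an assumption, since the caption does not state which definition was applied. Only the paired t statistic is computed. A two-tailed P value would additionally require the t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on the differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Toy, made-up abundances for one developmental state across four samples.
observed = [0.10, 0.25, 0.40, 0.55]   # hypothetical scRNA-seq fractions
predicted = [0.12, 0.22, 0.43, 0.50]  # hypothetical bulk deconvolution output
print(pearson_r(observed, predicted), r_squared(observed, predicted))
```

In a benchmark of this shape, `pearson_r` and `r_squared` would be evaluated once per developmental state and method, and `paired_t_statistic` would then compare the resulting per-state scores of two methods, pairing on developmental state.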
