Extended Data Fig. 2: GET model performance and benchmarking.
From: A foundation model of transcription across human cell types

a. Pearson correlation of gene expression between biological replicates and across different culture systems for human astrocytes. b. Scatterplots showing predicted vs. observed log fold change (log FC) in gene expression between different cell type pairs. Top row compares erythroblast subtypes and hepatocytes. Bottom row compares enteric neurons to erythroblasts and astrocytes. Pearson correlations and R² values are provided for each comparison. c. Comparison of Pearson correlation between predicted and observed gene expression in left-out cell types for different models and baselines. Quantitative ATAC: GET model with quantitative ATAC and motif input. Training cell type mean: Mean expression value across all training cell types. Enformer LP: Linear probing of Enformer CAGE header outputs. Binarized ATAC: GET model with only motif input. GET with finetuning shows the highest correlation at 0.94. d. Ablation study of GET pretraining on leave-out astrocytes, showing superior performance of finetuning the pretrained model when compared against random initialization and baselines. e. Comparison of GET to baseline machine learning models (CNN, MLP, CatBoost, SVMRegression, RandomForest, and LinearRegression) on leave-out-chromosome 11 and leave-out-astrocyte prediction performance (R², Pearson correlation, Spearman correlation). f. Radar plot showing leave-one-chromosome-out finetuning performance (R², Pearson correlation, Spearman correlation) of GET in fetal astrocytes.
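
The quantities named in panels b and c can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and toy values are hypothetical, expression is assumed to be on a log scale (so log FC is a simple difference), and R² is computed here as the coefficient of determination, which may differ from the definition used in the figure.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def training_mean_baseline(training_profiles):
    """Per-gene mean across training cell types (rows = cell types)."""
    n_cell_types = len(training_profiles)
    return [sum(gene_values) / n_cell_types for gene_values in zip(*training_profiles)]


def log_fold_change(expr_a, expr_b):
    """Log FC between two cell types, assuming log-scale expression inputs."""
    return [a - b for a, b in zip(expr_a, expr_b)]


# Toy data (hypothetical): two training cell types, three genes.
training = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
held_out_observed = [1.5, 2.5, 5.5]

baseline_prediction = training_mean_baseline(training)
print("baseline prediction:", baseline_prediction)
print("Pearson vs. held-out:", pearson(baseline_prediction, held_out_observed))
print("R^2 vs. held-out:", r_squared(held_out_observed, baseline_prediction))
```

The same `pearson` and `r_squared` helpers could be applied to predicted and observed `log_fold_change` vectors to mirror the comparisons in panel b.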