Extended Data Fig. 7: Regulatory analysis with GET. | Nature

From: A foundation model of transcription across human cell types

a. Predicted top three regulators (motifs) for BCL11A, NFIX, and HBG2. Similar sequence patterns are highlighted with color shades. b. GATA downstream targets inferred by GET (top 10% motif score) show functional enrichment in ‘hemopoiesis’. The scatterplot shows predicted gene expression (x axis) against GATA motif score (y axis) for GATA-targeted genes with predicted expression larger than 1. All transcription factors among these genes are labeled in the plot, and those involved in hemopoiesis are highlighted in red. c. Workflow to collect and visualize the cross-cell-type regulatory embedding, showing a t-SNE visualization of the resulting embedding space colored by Louvain clustering. d. Subsampled first-layer embeddings from fetal astrocytes (blue) and two fetal erythroblast cell types (yellow and brown), visualized with UMAP. e. Louvain clustering of the subsampled embedding in panel d. f. Gene Ontology enrichment of genes in cluster 2, showing astrocyte-relevant terms and astrocyte marker genes (for example, NFIA and GLI3). The x axis shows the -log10 adjusted P value from a one-sided Fisher’s exact test. g. GET motif contribution Z-score for each cluster (red means a higher score compared to other clusters). Note that cluster 2 has elevated NFI/1 and NFI/2 motifs, which correspond to the NFI family transcription factors. h. Top correlated motif pairs have significantly larger functional similarity. The x axis shows cosine similarity computed on a TF-IDF (term frequency-inverse document frequency) matrix, in which terms are motif clusters and documents are Gene Ontology biological processes. i. Example causal neighbor graph showing interactions (edges) between motifs (nodes). Edge weights represent interaction effect size. Edge directions mark causal direction. Blue and red edge colors mark negative and positive causal effect sizes estimated by LiNGAM, respectively. Node colors mark communities detected on the full causal graph. In-community edges are marked by reduced saturation. j. Out-degree distribution across cell-type-specific causal networks.
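
The enrichment statistic in panel f can be illustrated with a minimal sketch of a one-sided (enrichment) Fisher's exact test, computed as the upper tail of the hypergeometric distribution. The function name `fisher_one_sided`, its argument names, and the gene counts are hypothetical and chosen only for illustration. The caption does not state which multiple-testing adjustment was applied, so the sketch stops at the unadjusted P value.

```python
from math import comb, log10


def fisher_one_sided(k, cluster_size, annotated, background):
    """Upper-tail (enrichment) P value of a one-sided Fisher's exact test.

    k            genes in the cluster that carry the annotation
    cluster_size genes in the cluster
    annotated    genes in the background that carry the annotation
    background   all genes considered
    """
    denom = comb(background, cluster_size)
    upper = min(cluster_size, annotated)
    # Sum the hypergeometric probabilities of observing k or more
    # annotated genes in a cluster of this size.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Toy counts, invented for illustration (not taken from the figure).
p = fisher_one_sided(k=3, cluster_size=5, annotated=4, background=20)
print(p, -log10(p))  # unadjusted P value and its -log10 transform
```

In a plot like panel f, each Gene Ontology term would receive one such P value, which would then be adjusted for multiple testing before the -log10 transform.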
