Extended Data Fig. 10: Analyses of lineage-associated transcription factor-programs in breast cancer.

(a) Workflow for analyzing the transcription factor (TF) landscape of n = 1,096 TCGA breast cancer and normal cases using voom-normalized RNA-seq data. The top 500 most variably expressed transcription factors across the cohort were analyzed. (b) K-means clustering of cases; optimal k = 4. (c) UMAP representation and numbers of cases, colored by PAM50 subtype (top) or histological receptor subtype (bottom). N = 173, 304, 486, and 133 tumors by x-axis order. “Normal” refers to normal breast tissue. “Unknown” refers to samples for which data were unavailable. (d) One-way Welch test to identify the most variable transcription factors. F is calculated as the variation of a TF between cluster means divided by the variation within clusters; a larger F score indicates a TF that is more specific to individual clusters. Each point is one TF; n = 1,271 TFs total. (e) Correlation matrix of the top 100 ranked TFs to identify co-occurring transcription factors (for example “programs”); four programs identified. (f) Running score and pre-ranked list plots of GSEA results. Plots are shown for ESR1-mutant cases; NMD cases were omitted since all comparisons were non-significant.
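
The F score in panel (d) can be illustrated with a minimal sketch of the standard Welch one-way statistic for a single TF across clusters. The legend does not give the formula or software used, so the textbook Welch formula here is an assumption, and the function name `welch_f` and the toy expression values are hypothetical.

```python
from statistics import mean, variance


def welch_f(groups):
    """Welch's one-way F statistic for one TF (textbook formula, assumed).

    `groups` holds one list of expression values per cluster. Every cluster
    needs at least two values and a non-zero sample variance.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two clusters")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variance (n - 1)

    # Each cluster is weighted by n / s^2, so noisy clusters count less.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    # Weighted variation of the cluster means around the grand mean.
    between = sum(w * (m - grand_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)

    # Welch's correction term for unequal within-cluster variances.
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    return between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))


# Toy values for four clusters (hypothetical, not TCGA data).
cluster_specific_tf = [[1.0, 1.2, 0.8, 1.1], [5.0, 5.3, 4.9, 5.1],
                       [1.0, 0.9, 1.2, 1.1], [0.9, 1.0, 1.1, 1.3]]
uniform_tf = [[1.0, 3.0, 2.0, 4.0], [2.0, 4.0, 1.0, 3.0],
              [3.0, 1.0, 4.0, 2.0], [4.0, 2.0, 3.0, 1.0]]

scores = {"cluster_specific_tf": welch_f(cluster_specific_tf),
          "uniform_tf": welch_f(uniform_tf)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the TF that is high in only one cluster ranks first, while the TF whose cluster means are identical scores zero, which matches the reading of larger F as more cluster-specific.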