Extended Data Fig. 7: Neighborhood analysis, validation and outcome association.

(a) Schematic of the process for identifying cellular neighborhoods, with representative images for ER+ IDC, ILC and TNBC. (b) Example false-color image of cellular neighborhood composition (left panel) and cell type distribution (right panel) along the X, Y coordinates for cells within an ROI. (c) Types of cellular neighborhoods with a distance threshold of 20 μm instead of 50 μm (Fig. 2c) and relative enrichment above or below the mean across neighborhoods for B cells (CD20+), CD8+ T cells, CD4+ T cells, Tregs (Foxp3+), tumor cells (PanCK+) and macrophages (CD68+). Likelihood of enrichment was calculated as the log odds ratio, normalized between −5 and 5. (d) Frequency distribution of cellular neighborhoods at a distance threshold of 20 μm within ER+ IDC (n = 50) and ER+ ILC (n = 65). A two-tailed, non-parametric Mann-Whitney U test was used for statistical analysis, with p < 0.05 considered significant. (e) Types of cellular neighborhoods with a distance threshold of 70 μm instead of 50 μm (Fig. 2c) and relative enrichment above or below the mean across neighborhoods for B cells (CD20+), CD8+ T cells, CD4+ T cells, Tregs (Foxp3+), tumor cells (PanCK+) and macrophages (CD68+). Likelihood of enrichment was calculated as the log odds ratio, normalized between −5 and 5. (f) Frequency distribution of cellular neighborhoods at a distance threshold of 70 μm within ER+ IDC (n = 50 samples) and ER+ ILC (n = 65 samples). A two-tailed, non-parametric Mann-Whitney U test was used for statistical analysis. (g) Box plots of individual patient cellular neighborhood (CN) frequencies across ROIs, and associated tables listing correlation values for individual ROIs (ranging from 3 to 19) with the median neighborhood frequency for the patient as a measure of intrapatient heterogeneity in ER+ IDC (left panel) and ILC (right panel). Boxes span the first to third quartiles, with the center line representing the median and whiskers representing the minimum and maximum values.
The Kruskal-Wallis test (chi-square-distributed H statistic) was performed for ILC (n = 552 ROIs, 64 degrees of freedom) and IDC (n = 479 ROIs, 49 degrees of freedom). (h) Cox proportional hazards model for overall survival (OS) with the log frequency of individual cellular neighborhoods (CNs) as variables, adjusting for tumor grade, in ER+ IDC (n = 50 samples; top panel) and ER+ ILC (n = 62 samples; bottom panel). Log hazard ratios are shown with 95% confidence intervals; error bars show the 2.5% (lower) and 97.5% (upper) bounds of the confidence interval. A Wald test was used to test whether the hazard ratio (HR) = 1, without adjustment for multiple comparisons; p values are listed for each parameter. The red box highlights a significant association between OS and the given variable.
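
The Kruskal-Wallis comparison reported for panel (g) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that ROIs are grouped by patient, which would be consistent with the reported degrees of freedom (number of groups minus one), and the function names, patient labels and frequency values are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, degrees of freedom) for {group label: list of values}.

    Assumption: each group is one patient and each value is the frequency
    of a given CN in one ROI from that patient.
    """
    pooled = [v for vals in groups.values() for v in vals]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for vals in groups.values():
        rank_sum = sum(ranks[start:start + len(vals)])
        total += rank_sum ** 2 / len(vals)
        start += len(vals)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Standard tie correction; equals 1 when there are no ties.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction

    return h, len(groups) - 1


# Hypothetical CN frequencies for three patients with three ROIs each.
example = {"P1": [1, 2, 3], "P2": [4, 5, 6], "P3": [7, 8, 9]}
h_stat, dof = kruskal_wallis(example)
print(h_stat, dof)
```

The H statistic is then compared against a chi-square distribution with the returned degrees of freedom to obtain a p value; in practice a library routine such as `scipy.stats.kruskal` would normally be used instead of a hand-written version.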