Fig. 4: Analysis of shared morphologies between human breast cancer and Myc;Ptenfl tumor TMAs.

A Pipeline for generating morphological feature representations using a variational autoencoder (VAE). Tiles from both mouse and human TMAs are used to train a VAE, and a latent encoding vector is then computed for each tile. Tiles are compared using UMAP embedding and k-means clustering of the latent features. B Density functions for all human and mouse tumor tiles are calculated within the two-dimensional UMAP space to visually compare their overlap in embedding space (the corresponding histopathological feature image is shown in Supplemental Fig. S5A). C K-means clusters (n = 8) are computed using latent features and projected into UMAP space for visualization. Clusters consisting primarily of edge artifacts were excluded from the analysis. D For every cluster, the nine tiles closest to the cluster center (left) and a single high-resolution tile image from that cluster (right, representative of the nine tiles; scale bar = 22 µm) are shown to illustrate each cluster's dominant morphology. Main histologic features of each cluster: [a] carcinoma with discohesive growth pattern; [b] carcinoma with thin fibrotic septa and hyperchromatic nuclei; [c] stroma or coagulative necrosis; [d] IDC with high-grade nuclear features; [e] inflammatory cell infiltration in the stroma; [f] fibrotic stroma with scattered single tumor cells; [g] sarcomatoid change of tumor cells and inflammatory cell infiltration; and [h] tumor with hyperchromatic, coarse chromatin and frequent atypical mitoses. The figure includes 92 mouse TMA cores, representing 80 mice, and 172 human TMA cores, corresponding to 172 patients. E Relative abundance of human and mouse tumor tiles is calculated for each cluster as the ratio of tiles in that cluster to the total number of tiles from the given TMA source.
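
The relative abundance in panel E can be illustrated with a minimal sketch. It assumes each tile carries a k-means cluster label and a TMA source label; the function name `relative_abundance`, its parameters, and the toy inputs are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def relative_abundance(cluster_labels, sources, n_clusters):
    """Per-source fraction of tiles in each cluster.

    For each TMA source, returns (tiles of that source in cluster k)
    divided by (all tiles of that source), for k = 0..n_clusters-1.
    """
    labels = np.asarray(cluster_labels)
    sources = np.asarray(sources)
    result = {}
    for source in np.unique(sources):
        # Cluster labels of the tiles that belong to this source only.
        source_labels = labels[sources == source]
        # Count tiles per cluster, including clusters with zero tiles.
        counts = np.bincount(source_labels, minlength=n_clusters)
        result[str(source)] = counts / counts.sum()
    return result

# Toy input (hypothetical): five tiles, three clusters, two sources.
example = relative_abundance(
    [0, 0, 1, 2, 1],
    ["human", "human", "human", "mouse", "mouse"],
    n_clusters=3,
)
```

Because each source is normalized by its own tile count, the fractions for a source sum to 1 and remain comparable even when the human and mouse TMAs contribute different numbers of tiles.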