Deep learning identifies heterogeneous subpopulations in breast cancer cell lines

Jost, Tyler A.; Gardner, Andrea L.; Morgan, Daylin; Brock, Amy

doi:10.1038/s44385-025-00023-z

Article
Open access
Published: 21 June 2025

npj Biomedical Innovations volume 2, Article number: 19 (2025)

Abstract

Cells exhibit a wide array of morphological features, enabling computer vision methods to identify and track relevant parameters. Morphological analysis has long been implemented to identify specific cell types and cell responses. Here, we asked whether morphological features might also be used to classify transcriptomic subpopulations within in vitro cancer cell lines. Identifying cell subpopulations furthers our understanding of morphology as a reflection of underlying cell phenotype and could enable a better understanding of how subsets of cells compete and cooperate in disease progression and treatment. We demonstrate that cell morphology can reflect underlying transcriptomic differences in vitro using convolutional neural networks. First, we find that changes induced by chemotherapy treatment are highly identifiable in a breast cancer cell line. We then show that the intra cell line subpopulations that comprise breast cancer cell lines under standard growth conditions are also identifiable using cell morphology. We find that cell morphology is influenced by neighborhood effects beyond the cell boundary, and that including image information surrounding the cell can improve model discrimination ability.

Introduction

Tumors are composed of cells with varying levels of genetic and non-genetic differences^1,2,3,4,5,6,7. Viewing cancer as both an ecological and evolutionary^8,9 process has led to the development of “adaptive therapies”^10,11, which seek to contain the tumor rather than fully eliminate it. This perspective incorporates the inherent heterogeneity within cancer, as it seeks to address the subpopulations that proliferate when chemotherapy is applied. Heterogeneity, even within in vitro cell lines, has been well documented through single-cell sequencing methods^12,13,14,15. These high-dimensional measurements are generally destructive endpoint readouts and therefore provide only snapshots in time, documenting the current state of the population. To better understand how relevant subpopulations respond to various treatments and schedules, non-destructive methods that capture population dynamics in live cell samples are necessary.

Historically, image analysis of cell populations has focused on visually observable qualities such as cell line^16,17 or cell states that are defined primarily by distinctive morphological differences^18,19. Image analysis using machine learning has recently demonstrated that cell-state properties such as metastatic invasion^20,21,22, the induction of an epithelial to mesenchymal transition²³, or the introduction of genetic perturbations²⁴ can be detected through cellular morphology even though these properties are not readily observable through visual analysis. Deep learning has been particularly effective at discriminating between cell-state properties by extracting representations directly from images^25,26,27,28. Many of these results have been demonstrated using only imaging techniques such as phase contrast or brightfield microscopy. To further explore the capabilities of this approach, we asked whether more subtle cell-state differences are identifiable even within cell lines under standard culture conditions. Here, we investigate whether cells that exhibit distinct RNA expression patterns can be discriminated based on cell morphology alone.

To explore the extent to which transcriptomic differences are expressed through cell morphology, we tested the ability of a deep convolutional neural network (CNN) to discriminate cancer cell heterogeneity in three model conditions (Fig. 1a). In the first condition, we used heavily bottlenecked cell populations which had experienced high levels of doxorubicin exposure and compared them to healthy cells. In the second, we isolated and fluorescently labeled two transcriptomically distinct subpopulations within the MDA-MB-231 cell line. In the third, we extended this to transcriptomic subpopulations in the MDA-MB-436 cell line. After identification and isolation of these populations, we developed an image segmentation pipeline that allowed us to perform instance segmentation on patches of images (Fig. 1b). A CNN was then trained to discriminate between the two populations in each condition using only phase contrast imaging (Fig. 1c). We demonstrate that cell morphology is not only reflective of even subtle differences in transcriptomic expression, but that this morphology is heavily influenced by cell-to-cell orientation and interaction.

**Fig. 1: Workflow for data collection and classification.**

Results

To establish a baseline for what is identifiable through cell morphology using our methodology, we explored how chemotherapeutic dosing of MDA-MB-231 cells affected their morphology. Prior research has shown that the morphology of MDA-MB-231 cells changes with doxorubicin dosage²⁹; we therefore treated cells with doxorubicin at a concentration we had previously observed to be highly lethal³⁰. Fifty thousand MDA-MB-231 cells were seeded and treated with 550 nM of doxorubicin after 24 h. After 48 h, the media containing the drug were removed and replaced with growth media. Cells were maintained in standard culture to reach a population size of 6 × 10⁶ cells/sample. We performed single-cell RNA sequencing (scRNA-seq) on both the treated population and an untreated control population. The full dataset was composed of 7502 treated and 3258 untreated cells after post-processing (Methods). Comparing overall RNA expression patterns revealed differences after treatment (Fig. 2a), and the two conditions showed the lowest magnitude of correlation in gene expression that we observed (PCC = −0.03, Supplementary Fig. S1). Next, we separately plated 60 wells of doxorubicin-treated and 60 wells of control cells in 96-well plates and imaged them at 20× magnification every 4 h by phase contrast using an Incucyte S3 (Sartorius) until cultures reached maximum confluency. We collected 77,325 instances of treated and 120,536 instances of untreated cells. Each image was then segmented using a custom-trained Mask R-CNN segmentation network (Methods). To determine cell identity, we used a ResNet-152 CNN pretrained on ImageNet-1k, as this architecture had previously been shown to be effective at image classification^31,32. Using transfer learning³³, we were able to identify cells according to treatment status with an AUC of 0.95 (Fig. 3). This is in line with current knowledge, as it has been previously shown that doxorubicin induces an epithelial-to-mesenchymal transition^34,35 and that this transition causes a distinct morphological shift.
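
The AUC values reported throughout can be read as the probability that a randomly chosen cell from one class receives a higher score than a randomly chosen cell from the other. The following is a minimal sketch of that calculation using the rank-sum identity; the function name `roc_auc` is illustrative and is not taken from the authors' code.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Assign 1-based ranks; tied scores share their average rank.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U statistic of the positive class, scaled to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for comparing the values reported for each condition.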

**Fig. 2: Transcriptional analysis and phenotype classification.**

**Fig. 3: Discrimination between untreated and treated populations.**

To determine whether it is possible to discriminate subsets of cells within a cell line, we examined scRNA-seq data of untreated MDA-MB-231 cells under standard growth conditions. We found that the MDA-MB-231 cell line consists of at least two transcriptomically distinct populations (231-Subpop 1 and 231-Subpop 2) as identified by Leiden clustering³⁶ (Fig. 2b). Relative to the untreated and treated sample comparison, the two intra-cell line populations have more similar gene expression patterns (PCC = 0.79, Fig. S1). We therefore asked whether these two subpopulations could be distinguished by the same approach of phase contrast imaging, segmentation, and phenotype classification.

In previous studies, we identified and experimentally validated ESAM as a surface marker that differentiates the identified clusters³⁷, with low expression in 231-Subpop 1 and high expression in 231-Subpop 2. To establish ground-truth knowledge of subpopulation identity during coculture imaging experiments, ESAM-separated subpopulations of MDA-MB-231 cells were established from parental cells with stable fluorescent nuclear labels on NLS-mCherry (ESAM-low, 231-Subpop 1) and NLS-GFP (ESAM-high, 231-Subpop 2) (Methods).

A 50/50 coculture of 231-Subpop 1 and 231-Subpop 2 cells was plated in 60 wells of a 96-well plate. They were imaged at 20× magnification every 4 h with phase contrast and fluorescence imaging using an Incucyte S3. Because each population is fluorescently labeled based on its transcriptomic identity, we were able to label each cell’s phenotype after segmentation. In total, we gathered 30,418 instances of cocultured 231-Subpop 1 and 63,758 instances of 231-Subpop 2 cells.
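
As an illustration of how a segmented cell might be assigned a ground-truth label from its fluorescence, the sketch below compares the mean red and green intensity inside the cell mask. The decision rule and the `min_ratio` threshold are assumptions made for this example; the text does not specify the exact rule used.

```python
import numpy as np

def label_by_fluorescence(mask, red, green, min_ratio=1.5):
    """Return 'red', 'green', or None for one segmented cell.

    mask  : boolean array marking the cell's pixels
    red   : red-channel fluorescence image, same shape as mask
    green : green-channel fluorescence image, same shape as mask
    min_ratio is a hypothetical margin; ambiguous cells are left unlabeled.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return None
    r = float(np.asarray(red, dtype=float)[mask].mean())
    g = float(np.asarray(green, dtype=float)[mask].mean())
    if r > 0 and r >= min_ratio * g:
        return "red"
    if g > 0 and g >= min_ratio * r:
        return "green"
    return None
```

Under the labeling scheme described above, a "red" call would correspond to 231-Subpop 1 (NLS-mCherry) and a "green" call to 231-Subpop 2 (NLS-GFP).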

As before, we applied a ResNet-152 CNN to identify subpopulation identity, achieving an AUC of 0.74. We hypothesized that including neighborhood information, such as cell–cell interaction, could improve the ability of the network to correctly classify cells as belonging to 231-Subpop 1 or 231-Subpop 2. This environmental context is underutilized when cells are classified by morphology alone. While large bodies of literature have examined how natural density-dependent phenomena such as the Allee effect^38,39 and contact inhibition^40,41,42 influence tumor growth, morphological analysis often focuses on elements such as texture and shape, as opposed to orientation and interaction with neighboring cells. To test this hypothesis, we designed several in-silico experiments to investigate the effect of neighborhood properties. In the first, we incrementally increased the bounding box around the input image to include neighboring cells. We tested several bounding box increases between 0 and 65 pixels (Fig. 4a) and found an optimal range of 25 to 45 pixels, which includes enough information about the orientation and interaction between cells (Fig. 4b) without obscuring which cell is being identified. This resulted in a maximum AUC of 0.8.
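
The bounding box increase can be sketched as padding each cell's box by a fixed number of pixels before cropping. The helper names below are hypothetical, and clamping the padded box to the image borders is an assumption; the authors' pipeline may handle edge cells differently.

```python
import numpy as np

def expand_box(box, pad, image_shape):
    """Grow an (x0, y0, x1, y1) box by `pad` pixels per side, clamped to the image."""
    x0, y0, x1, y1 = box
    height, width = image_shape[:2]
    return (max(0, x0 - pad), max(0, y0 - pad),
            min(width, x1 + pad), min(height, y1 + pad))

def crop_with_context(image, box, pad):
    """Crop the cell together with `pad` pixels of surrounding neighborhood."""
    x0, y0, x1, y1 = expand_box(box, pad, image.shape)
    return image[y0:y1, x0:x1]
```

With `pad=0` the crop contains only the cell's own bounding box; values such as 25 or 45 pull in progressively more of the neighboring cells.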

**Fig. 4: Bounding box optimization for MDA-MB-231 subpopulations.**

To further test this hypothesis, we also designed in-silico experiments to selectively remove information (Fig. 5a) using a 25-pixel bounding box increase. In the first, we completely blacked out the area around the cells. In the second, we whited out the cell itself. Intuitively, these experiments test the ability of the CNN to classify cell phenotype without information about background or without information on texture. We find that the network classification ability diminishes with these limitations (Fig. 5b), giving support to our hypothesis that including the environment around the cell is a contributing factor to phenotype identification in cell morphology. To determine whether these differences could be attributed to changes in culture confluence, we tracked the classification accuracy across each day of culture (Fig. S2) and did not find consistent changes in classification accuracy. Training/testing loss curves for the best models and confusion matrices for these models were also determined (Figs. S3 and S4), and the analysis was repeated independently with each of 46 different hold-out test wells (Fig. S5) to confirm robustness.
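
The two information-removal experiments can be sketched as simple mask operations on each crop. This is an illustrative version only: the function names are hypothetical, and the white value of 255 assumes 8-bit images.

```python
import numpy as np

def black_out_background(crop, mask):
    """Keep the cell's pixels and set everything outside the mask to black."""
    out = crop.copy()
    out[~np.asarray(mask, dtype=bool)] = 0
    return out

def white_out_cell(crop, mask, white=255):
    """Keep the surroundings and overwrite the cell's own pixels with white."""
    out = crop.copy()
    out[np.asarray(mask, dtype=bool)] = white
    return out
```

The first function removes neighborhood information while preserving texture; the second removes texture while preserving the neighborhood. In both cases the cell contour remains visible as a sharp intensity edge.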

**Fig. 5: Feature removal for MDA-MB-231 subpopulation identification.**

After demonstrating that highly perturbed MDA-MB-231 cells as well as unperturbed subpopulations within the parental MDA-MB-231 population could be identified using deep learning, we questioned whether this phenomenon would extend to other cell models. Using unsupervised Leiden clustering, we identified the MDA-MB-436 cell line as another example of an in vitro breast cancer model consisting of at least two subpopulations with unique transcriptional patterns³⁷. These transcriptomic subpopulations had the highest observed level of similarity (PCC = 0.96, Fig. S1). Similar to the gene ESAM in the MDA-MB-231 cells, BST2 is a marker that differentiates each population. MDA-MB-436 subpopulations were sorted directly from a parent population using tetherin, the protein product of BST2. We grew isolated replicates for each subpopulation, then dyed each individually using a red cytoplasmic membrane dye for BST2 high cells (436-Subpop 1) and a green cytoplasmic dye for BST2 low cells (436-Subpop 2) (Methods).

We plated each isolated subpopulation in a 50/50 coculture in 60 wells within a 96-well plate. Each well was imaged at 20× magnification every 4 h. We collected 86,924 instances of 436-Subpop 1 and 80,834 instances of 436-Subpop 2 cells. As before, cell phenotype was identified using the fluorescence of the segmented cell. Bounding box optimization was performed as above, with a maximum AUC of 0.75 at a 65-pixel increase. We found that the discrimination ability of the network increased as the bounding box increased, then dropped off, similar to the MDA-MB-231 subpopulation identification (Fig. 6).

**Fig. 6: Bounding box optimization for MDA-MB-436 subpopulations.**

Discussion

Cell morphology has long been used to identify basic properties of cells, such as cell line identification^16,17,18,19, but has only recently been applied to subtler cell-state properties such as metastatic viability and treatment status^20,22,27. The heterogeneity that can exist within cell morphologies, however, presents significant challenges in correctly segmenting cells and identifying properties of interest. Deep learning has proved to be an effective approach for classifying cells based on morphology^25,26,27,28, as it can absorb a large corpus of data and learn relevant features.

In this study, we extend this cell-state prediction by classifying cell phenotypes that were previously unidentifiable without the use of single-cell sequencing technologies. We show that cell morphology reflects these underlying differences using only phase contrast imaging and demonstrate that this methodology can be used to identify multiple types of in vitro populations. Additionally, we observed that including neighborhood context about cell orientation and interaction with other cells increased the accuracy of the model.

We tested transcriptomic differences in three separate contexts. In the first, we compared MDA-MB-231 cells treated with doxorubicin to control MDA-MB-231 cells. We found that these populations had easily distinguished morphologies, and our model was able to achieve a high classification accuracy. In the second context, we compared two separate subpopulations within the MDA-MB-231 cell line, and in the third, we compared two separate subpopulations within the MDA-MB-436 cell line. In both cases, the model identified subpopulation identity with at least modest discrimination ability.

Additionally, we interrogated the effect of cell orientation and interaction by increasing the area around each cell that was input into the CNN. We found that increasing the bounding box initially improves the CNN’s classification ability, reaching an optimal size, followed by a drop-off in discrimination ability. To further understand this phenomenon, we performed several in-silico experiments which either completely removed the surroundings around each cell or removed textural information about the cell. Even without this information, the CNN was able to correctly classify each subpopulation with moderate discrimination ability in each case. In both experiments, the contour of the cell of interest is emphasized by a sharp contrast between the background and the cell area. This supports recent research which has utilized active shape models for prediction^21,22,43. However, we find that while both texture and the surrounding environment are important, including both enables the best discrimination ability.

One unexplored avenue for cell morphological analysis is the use of unsupervised approaches^44,45 to monitor changes in cell populations. The ability to autonomously observe shifts and correlate them to specific events could enable researchers to observe cell health, view the effects of different drug mechanisms, and observe phenotypic plasticity without the need for techniques such as sequencing or fluorescent labeling. One method that has been used on cell populations is autoencoders^20,25,26,46, which contain a reduced latent space that is representative of the input and have been used for unsupervised learning approaches. Autoencoders have been used primarily to perform latent space predictions and provide explanations for changes in cell morphology. Using this latent space to make inferences about changes in the transcriptome without a priori knowledge about these changes could give insight into these changes without the need for sequencing.

While these studies were performed on cells that also included fluorescent labels, classifying phenotype using only phase contrast images opens multiple possibilities for longitudinal experiments. The first is that it can be extended to many populations. Many automated imaging platforms capture red and green fluorescence, such that a maximum of 4 population labels (red, green, yellow, and non-fluorescing cells) can be monitored simultaneously. Given sufficient differences in morphology, a CNN could be adapted to identify additional subpopulations without fluorescent tags. Furthermore, some populations might not be suited to stable integration of a reporter gene, but a deep learning algorithm does not require this because it learns from morphology and interaction. Finally, not using fluorescent markers to track subpopulations frees the operator to have fluorescent indicators available for other uses. For example, cells could be stably integrated using the FUCCI⁴⁷ system, enabling the ability to monitor the growth of each subpopulation as well as the cell cycle. The models trained within this paper are designed for analyzing phase contrast images to distinguish specific phenotypes. Future work can expand this to surveillance of phenotypes in other contexts, such as non-adherent cells or 3D cell culture.

The use of morphology for the tracking of subpopulations provides an avenue for a better understanding of the ecology and evolution of subpopulations. Recent research has demonstrated that mathematical modeling can be used to provide insights into more optimal treatment schedules^48,49,50,51. However, these models are limited in their ability to consider the heterogeneity that exists within a tumor population. Using cell morphology for cell-state identification can provide insight into the transcriptomic composition of a population when testing and identifying new therapeutic strategies and targets.

Methods

Cell culture

MDA-MB-436 cells were cultured in high-glucose DMEM (Sigma D5796, 4.5 g/L glucose), 10% FBS, 0.01 mg/mL insulin (ThermoFisher, 12585014), 16 µg/mL glutathione (Sigma G013), and penicillin-streptomycin (Thermo Scientific, 15140122). MDA-MB-231 cells were cultured in high-glucose DMEM (Sigma, D5796) containing 10% FBS and 1X penicillin-streptomycin (Thermo Scientific, 15140122). Cells were maintained at 37 °C in a 5% CO₂ atmosphere.

Drug treatment

Doxorubicin hydrochloride (Cayman Chemical 150007) was reconstituted in water. Cell culture media was replaced with new media containing 550 nM doxorubicin. After 48 h, drug-free media replaced the doxorubicin-dosed media.

scRNA-seq library preparation

MDA-MB-231 samples from untreated and treated populations were harvested. Cells were loaded into wells of a Chromium A Chip. Libraries were prepared using the 10X Genomics 3′ single-cell gene expression (v2) protocol. Paired-end sequencing was performed using a NovaSeq6000 with an S4 chip (400 cycles) according to the manufacturer’s instructions.

scRNA-seq analysis

MDA-MB-231 single-cell data were aligned to the GRCh38-2020-A reference and processed using Cell Ranger v7.1. All data were post-processed as described previously³⁷. Briefly, cells were filtered according to scanpy’s best practices⁵² by removing cells that deviated from the population median by more than 5 median absolute deviations in the number of genes, total counts, or percentage of mitochondrial genes. The count matrix was normalized through a shifted logarithm transformation. Potential doublets were removed using scDblFinder⁵³, and a UMAP representation was then calculated. Finally, cell line subpopulation identity was determined with Leiden clustering. To make comparisons across samples and subpopulations, we computed Pearson Correlation Coefficients (PCC) between each pair of samples. To do so, we integrated all samples by re-normalizing data after concatenation. We then calculated the top 50 principal component values and found the PCC of these principal component values between each pair of samples.
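
The filtering and normalization steps can be illustrated with a small NumPy sketch. It is not the authors' pipeline, which uses scanpy: the function names are hypothetical, the MAD is left unscaled, and using the median library size as the scaling target is an assumption.

```python
import numpy as np

def mad_outliers(values, n_mads=5.0):
    """Flag entries more than `n_mads` median absolute deviations from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return np.abs(values - median) > n_mads * mad

def shifted_log(counts):
    """Shifted logarithm: scale each cell to the median library size, then log(1 + x).

    counts is a cells x genes matrix; every cell is assumed to have nonzero
    total counts (true after quality filtering).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * np.median(totals))
```

A cell would be dropped if `mad_outliers` flags it on any of the three quality metrics (number of genes, total counts, or percentage of mitochondrial genes).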

Fluorescent nuclear labeling of MDA-MB-231 cells

MDA-MB-231 cells were stably labeled using a Sleeping Beauty transposase system. MDA-MB-231 cells were grown to 90% confluence in a 10 cm dish, then transfected with Lipofectamine 3000 according to the manufacturer’s protocol with 2.5 µg of a plasmid containing Sleeping Beauty transposase (SB100X, AddGene #34879) and 2.5 µg of a plasmid containing either GFP or mCherry on a nuclear localization signal (NLS) and flanked by inverted repeats. Stably transfected cells were selected after 72 h with G418, then expanded for two passages before antibody staining.

Antibody staining and cell sorting

GFP- and mCherry-labeled MDA-MB-231 cells were stained with 2 µL ESAM-PE-Vio770 (Miltenyi, 130-115-039) per million cells. Cells were gated on single-cell fraction, fluorescent channel, and ESAM staining. ESAM-low cells were sorted from the mCherry-labeled population, and ESAM-high cells were sorted from the GFP-labeled population. Enriched populations were expanded for at least two passages, then further purified by performing another round of staining and FACS.

MDA-MB-436 cells were resuspended in cell staining buffer (PBS + 5 mM EDTA + 1% BSA + 1.6 mM NaOH + 0.01% sodium azide) and incubated in 1:100 diluted Zombie Violet viability dye (Biolegend, 423113) for 5 min on ice, then incubated with 9 µL of APC-conjugated CD317 (BST2) antibody (Miltenyi, 130-101-660) per 10⁶ cells for 20 min. Stained cells were washed 3 times with cell sorting buffer supplemented with 1:1000 diluted Zombie viability dye, then passed through a 40 µm cell strainer before FACS. Collection media for live cell sorting was prepared by supplementing complete media with 25 mM HEPES. Cells were sorted into 15 mL tubes containing 7 mL of collection media with gating on live cells, FSC-A/SSC-A, single cells, and BST2-lo or BST2-hi expression. Collected cells were spun down at 300 × g for 10 min, then plated in 50% conditioned media (CM) for 24 h before transitioning to fresh, complete media. CM was prepared from complete media incubated on a 70% confluent plate of parental cells for 24 h. CM was spun down at 500 × g for 10 min, and the supernatant was passed through a 0.22 µm filter. CM was diluted to 50% with fresh, complete media. Sorted subpopulations were allowed to recover from the sort and expand for a week before analysis.

Image acquisition and processing

All images were acquired on an Incucyte S3 (Sartorius) at 20× magnification with nine images acquired per well. Fluorescent composite images were only used for cell phenotype ground-truth identification. To reduce the total number of cells per image, each 20× image was split into 16 separate image patches.
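
Splitting each field of view into 16 patches can be sketched as follows. The 4 × 4 grid layout is an assumption, since the text states only the number of patches, and `split_into_patches` is a hypothetical helper name.

```python
import numpy as np

def split_into_patches(image, rows=4, cols=4):
    """Split an image into rows * cols equally sized, non-overlapping patches.

    Patches are returned in row-major order. Any remainder pixels on the
    right or bottom edge are dropped when the size is not evenly divisible.
    """
    height, width = image.shape[:2]
    ph, pw = height // rows, width // cols
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(rows) for c in range(cols)]
```

Cells that straddle a patch boundary are cut by this kind of tiling; how the authors handle such cells is not described here.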

Cell segmentation

Image segmentation for cell lines in low to medium confluency images was achieved by training a custom Mask R-CNN instance segmentation algorithm. To obtain ground-truth segmentations, we first used Cellpose, a generalist segmentation algorithm, to obtain initial masks. We then reviewed each segmentation generated by Cellpose and corrected it when necessary to achieve better segmentation. These corrected masks were fed into Mask R-CNN for specialized training. The model achieved an average precision (AP) score of 66% for MDA-MB-231 cells and 55% for MDA-MB-436 cells.
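
Segmentation scores such as AP are built on the overlap between predicted and ground-truth masks, usually measured as intersection over union (IoU). The sketch below shows only that underlying quantity; it is not the authors' evaluation code, and the IoU thresholds behind the reported AP values are not stated in the text.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(pred, truth).sum() / union)
```

A predicted cell is typically counted as a true positive when its IoU with a ground-truth cell exceeds a chosen threshold, and precision is then aggregated over detections.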

Cell phenotype discrimination using a CNN

For each experiment, we used a pretrained network with residual connections (ResNet-152 trained on ImageNet-1k³¹), as such networks had shown promise in image classification. We used a stochastic gradient descent optimizer with an initial learning rate of 0.05 that decayed on each epoch. Any predefined augmentations (such as blacking out the background or increasing the surrounding bounding box) were applied to each image, which was then centered within a 300 × 300 pixel black background for 20× magnification images. To avoid overfitting, images were randomly flipped and rotated at each epoch. Each image was normalized to the dataset mean and standard deviation. Models were trained for 30 epochs or until accuracy on the test set did not increase for 5 epochs. A hold-out well, which was unseen in the training data, was used as a test set for subpopulation identification, and one well from each plate in the untreated and treated MDA-MB-231 experiments was used as a test set for those comparisons. The test set for the untreated versus treated comparison consisted of 353 instances of untreated cells and 647 instances of treated cells. The MDA-MB-231 subpopulation test well contained 3328 instances of 231-Subpop 1 and 8120 instances of 231-Subpop 2. The MDA-MB-436 test well consisted of 715 instances of 436-Subpop 1 and 780 instances of 436-Subpop 2. Training/testing loss curves for the best models and confusion matrices for these models were also determined (Figs. S3 and S4), and the analysis was repeated with 46 different hold-out wells (Fig. S5) to confirm robustness.
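
Two of the preprocessing and training details above can be sketched in a few lines: centering each crop on a fixed black canvas, and stopping once accuracy has not improved for a set number of epochs. The function names are hypothetical, and rejecting crops larger than the canvas is an assumption made for this example.

```python
import numpy as np

def center_on_canvas(crop, size=300):
    """Place a crop at the center of a size x size black (zero) canvas."""
    height, width = crop.shape[:2]
    if height > size or width > size:
        raise ValueError("crop is larger than the canvas")
    canvas = np.zeros((size, size) + crop.shape[2:], dtype=crop.dtype)
    top, left = (size - height) // 2, (size - width) // 2
    canvas[top:top + height, left:left + width] = crop
    return canvas

def normalize(image, mean, std):
    """Standardize pixel values with the dataset-wide mean and standard deviation."""
    return (image.astype(np.float32) - mean) / std

def should_stop(accuracy_history, patience=5):
    """True once the best accuracy so far is at least `patience` epochs old."""
    if len(accuracy_history) == 0:
        return False
    best_epoch = int(np.argmax(accuracy_history))  # first epoch reaching the maximum
    return len(accuracy_history) - 1 - best_epoch >= patience
```

Padding to a fixed canvas, rather than resizing, keeps the pixel scale identical across cells, so differences in apparent cell size remain available to the network.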

Data availability

Images and models used in this paper are available upon request.

Code availability

Scripts for training, testing, and segmentation are available at https://github.com/brocklab/transcriptomicClusterMorph.

References

Groves, S. M. et al. Archetype tasks link intratumoral heterogeneity to plasticity and cancer hallmarks in small cell lung cancer. Cell Syst. 13, 690–710.e17 (2022).
Chang, C. A. et al. Ontogeny and vulnerabilities of drug-tolerant persisters in HER2+ breast cancer. Cancer Discov. 12, 1022–1045 (2022).
Kinker, G. S. et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat. Genet. 52, 1208–1218 (2020).
Su, Y. et al. Phenotypic heterogeneity and evolution of melanoma cells associated with targeted therapy resistance. PLOS Comput. Biol. 15, e1007034 (2019).
Litzenburger, U. M. et al. Single-cell epigenomic variability reveals functional cancer heterogeneity. Genome Biol. 18, 15 (2017).
Dave, A. et al. The Breast Cancer Single-Cell Atlas: defining cellular heterogeneity within model cell lines and primary tumors to inform disease subtype, stemness, and treatment options. Cell Oncol. 46, 603–628 (2023).
Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).
Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006).
Vendramin, R., Litchfield, K. & Swanton, C. Cancer evolution: Darwin and beyond. EMBO J. 40, e108389 (2021).
West, J. et al. Towards multidrug adaptive therapy. Cancer Res. 80, 1578–1589 (2020).
Strobl, M. A. R., Gallaher, J., Robertson-Tessi, M., West, J. & Anderson, A. R. A. Treatment of evolving cancers will require dynamic decision support. Ann. Oncol. 34, 867–884 (2023).
Gutierrez, C. et al. Multifunctional barcoding with ClonMapper enables high-resolution study of clonal dynamics during tumor evolution and treatment. Nat. Cancer 2, 758–772 (2021).
Torre, E. et al. Rare cell detection by single-cell RNA sequencing as guided by single-molecule RNA FISH. Cell Syst. 6, 171–179.e5 (2018).
Turashvili, G. & Brogi, E. Tumor heterogeneity in breast cancer. Front. Med. 4, 227 (2017).
Johnson, K. E. et al. Integrating transcriptomics and bulk time course data into a mathematical framework to describe and predict therapeutic resistance in cancer. Phys. Biol. 18, 016001 (2020).
Olson, A. C., Larson, N. M. & Heckman, C. A. Classification of cultured mammalian cells by shape analysis and pattern recognition. Proc. Natl. Acad. Sci. USA 77, 1516–1520 (1980).
Mousavikhamene, Z., Sykora, D. J., Mrksich, M. & Bagheri, N. Morphological features of single cells enable accurate automated classification of cancer from non-cancer cell lines. Sci. Rep. 11, 24375 (2021).
Theriault, D. H., Walker, M. L., Wong, J. Y. & Betke, M. Cell morphology classification and clutter mitigation in phase-contrast microscopy images using machine learning. Mach. Vis. Appl. 23, 659–673 (2012).
Alizadeh, E. et al. Cellular morphological features are predictive markers of cancer cell state. Comput. Biol. Med. 126, 104044 (2020).
Zaritsky, A. et al. Interpretable deep learning uncovers cellular properties in label-free live cell images that are predictive of highly metastatic melanoma. Cell Syst. 12, 733–747.e6 (2021).
Wu, P.-H. et al. Single-cell morphology encodes metastatic potential. Sci. Adv. 6, eaaw6938 (2020).
Wu, P.-H. et al. Evolution of cellular morpho-phenotypes in cancer metastasis. Sci. Rep. 5, 18437 (2015).
Wang, W. et al. Live-cell imaging and analysis reveal cell phenotypic transition dynamics inherently missing in snapshot data. Sci. Adv. 6, eaba9319 (2020).
Suzuki, G. et al. Machine learning approach for discrimination of genotypes based on bright-field cellular images. npj Syst. Biol. Appl. 7, 1–8 (2021).
Soelistyo, C. J., Vallardi, G., Charras, G. & Lowe, A. R. Learning biophysical determinants of cell fate with deep neural networks. Nat. Mach. Intell. 4, 636–644 (2022).
Wakui, T. et al. Predicting reprogramming-related gene expression from cell morphology in human induced pluripotent stem cells. Mol. Biol. Cell 34, ar45 (2023).
Chow, Y. L., Singh, S., Carpenter, A. E. & Way, G. P. Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic. PLOS Comput. Biol. 18, e1009888 (2022).
Harrison, P. J. et al. Evaluating the utility of brightfield image data for mechanism of action prediction. PLOS Comput. Biol. 19, e1011323 (2023).
Lovitt, C. J., Shelper, T. B. & Avery, V. M. Doxorubicin resistance in breast cancer cells is mediated by extracellular matrix proteins. BMC Cancer 18, 41 (2018).
Howard, G. R., Jost, T. A., Yankeelov, T. E. & Brock, A. Quantification of long-term doxorubicin response dynamics in breast cancer cell lines to direct treatment schedules. PLOS Comput. Biol. 18, e1009104 (2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. Preprint at https://doi.org/10.48550/arXiv.1512.03385 (2015).
resnet152 — Torchvision main documentation. https://pytorch.org/vision/main/models/generated/torchvision.models.resnet152.html#torchvision.models.ResNet152_Weights.
Kensert, A., Harrison, P. J. & Spjuth, O. Transfer learning with deep convolutional neural networks for classifying cellular morphological changes. SLAS Discov. 24, 466–475 (2019).
Paramanantham, A. et al. Doxorubicin-resistant TNBC cells exhibit rapid growth with cancer stem cell-like properties and EMT phenotype, which can be transferred to parental cells through autocrine signaling. Int. J. Mol. Sci. 22, 12438 (2021).
Han, R. et al. Activation of β-catenin signaling is critical for doxorubicin-induced epithelial–mesenchymal transition in BGC-823 gastric cancer cell line. Tumor Biol. 34, 277–284 (2013).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Gardner, A. L., Jost, T. A., Morgan, D. & Brock, A. Computational identification of surface markers for isolating distinct subpopulations from heterogeneous cancer cell populations. npj Syst. Biol. Appl. 10, 1–9 (2024).
Böttger, K. et al. An emerging Allee effect is critical for tumor initiation and persistence. PLOS Comput. Biol. 11, e1004366 (2015).
Johnson, K. E. et al. Cancer cell population growth kinetics at low densities deviate from the exponential growth model and suggest an Allee effect. PLOS Biol. 17, e3000399 (2019).
Batt, D. B. & Roberts, T. M. Cell density modulates protein-tyrosine phosphorylation. J. Biol. Chem. 273, 3408–3414 (1998).
Pham, K. et al. Density-dependent quiescence in glioma invasion: instability in a simple reaction–diffusion model for the migration/proliferation dichotomy. J. Biol. Dyn. 6, 54–71 (2012).
Mendonsa, A. M., Na, T.-Y. & Gumbiner, B. M. E-cadherin in contact inhibition and cancer. Oncogene 37, 4769–4780 (2018).
Phillip, J. M., Han, K.-S., Chen, W.-C., Wirtz, D. & Wu, P.-H. A robust unsupervised machine-learning method to quantify the morphological heterogeneity of cells and nuclei. Nat. Protoc. 16, 754–774 (2021).
Schmarje, L., Santarossa, M., Schroder, S.-M. & Koch, R. A survey on semi-, self- and unsupervised learning for image classification. IEEE Access 9, 82146–82168 (2021).
Dillon, B. M., Plehn, T., Sauer, C. & Sorrenson, P. Better latent spaces for better autoencoders. SciPost Phys. 11, 061 (2021).
Sommer, C., Hoefler, R., Samwer, M. & Gerlich, D. W. A deep learning and novelty detection framework for rapid phenotyping in high-content screening. Mol. Biol. Cell 28, 3428–3436 (2017).
Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell 132, 487–498 (2008).
Maltas, J. et al. Drug dependence in cancer is exploitable by optimally constructed treatment holidays. Nat. Ecol. Evol. 8, 147–162 (2024).
Gatenby, R. A. & Brown, J. S. Integrating evolutionary dynamics into cancer therapy. Nat. Rev. Clin. Oncol. 17, 675–686 (2020).
Miroshnychenko, D. et al. Stroma-mediated breast cancer cell proliferation indirectly drives chemoresistance by accelerating tumor recovery between chemotherapy cycles. Cancer Res. 83, 3681–3692 (2023).
Kazerouni, A. S. et al. Integrating quantitative assays with biologically based mathematical modeling for predictive oncology. iScience 23, 101807 (2020).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Germain, P.-L., Lun, A., Meixide, C. G., Macnair, W. & Robinson, M. D. Doublet identification in single-cell sequencing data using scDblFinder. Preprint at https://doi.org/10.12688/f1000research.73600.2 (2022).

Acknowledgements

This project was supported by grants R01CA226258, R01CA255536, and U01CA253540 (to A.B.). 10X Genomics Single-Cell 3′ Gene Expression and TagSeq preparation were performed by the Genomic Sequencing and Analysis Facility at UT Austin, Center for Biomedical Research Support (RRID: SCR_021713). Flow cytometry and FACS were performed at the Center for Biomedical Research Support Microscopy and Imaging Facility at UT Austin (RRID: SCR_021756). T.J. thanks Clarence Yapp for guidance on instance segmentation, model training, and image augmentation.

Author information

Authors and Affiliations

Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX, USA
Tyler A. Jost, Andrea L. Gardner, Daylin Morgan & Amy Brock

Contributions

T.J. performed computational studies, wet lab experiments, and data analysis. A.G. performed wet lab experiments and data analysis. D.M. contributed to methodology development. T.J. and A.G. drafted the manuscript. A.B. supervised the project. All authors contributed to the interpretation of the results and reviewed and edited the manuscript.

Corresponding author

Correspondence to Amy Brock.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Jost, T.A., Gardner, A.L., Morgan, D. et al. Deep learning identifies heterogeneous subpopulations in breast cancer cell lines. npj Biomed. Innov. 2, 19 (2025). https://doi.org/10.1038/s44385-025-00023-z

Received: 08 October 2024
Accepted: 05 May 2025
Published: 21 June 2025
DOI: https://doi.org/10.1038/s44385-025-00023-z