Abstract
Cells exhibit a wide array of morphological features, enabling computer vision methods to identify and track relevant parameters. Morphological analysis has long been implemented to identify specific cell types and cell responses. Here, we asked whether morphological features might also be used to classify transcriptomic subpopulations within in vitro cancer cell lines. Identifying cell subpopulations furthers our understanding of morphology as a reflection of underlying cell phenotype and could enable a better understanding of how subsets of cells compete and cooperate in disease progression and treatment. We demonstrate that cell morphology can reflect underlying transcriptomic differences in vitro using convolutional neural networks. First, we find that changes induced by chemotherapy treatment are highly identifiable in a breast cancer cell line. We then show that the intra cell line subpopulations that comprise breast cancer cell lines under standard growth conditions are also identifiable using cell morphology. We find that cell morphology is influenced by neighborhood effects beyond the cell boundary, and that including image information surrounding the cell can improve model discrimination ability.
Introduction
Tumors are comprised of cells with varying levels of genetic and non-genetic differences1,2,3,4,5,6,7. Viewing cancer as both an ecological and evolutionary8,9 process has led to the development of “adaptive therapies”10,11, which seek to contain the tumor rather than fully eliminate it. This perspective incorporates the inherent heterogeneity within cancer, as it seeks to address the subpopulations which proliferate when chemotherapy is applied. Heterogeneity, even within in vitro cell lines, has been well documented through single-cell sequencing methods12,13,14,15. These high-dimensional measurements are generally destructive endpoint readouts and therefore can only provide snapshots in time, documenting the current state of the population. To better understand how relevant subpopulations react in response to various treatments and schedules, non-destructive methods that capture population dynamics in live cell samples are necessary.
Historically, image analysis of cell populations has focused on visually observable qualities such as cell line16,17 or cell states that are defined primarily by distinctive morphological differences18,19. Image analysis using machine learning has recently demonstrated that cell-state properties such as metastatic invasion20,21,22, the induction of an epithelial to mesenchymal transition23, or the introduction of genetic perturbations24 can be detected through cellular morphology even though these properties are not readily observable through visual analysis. Deep learning has been particularly effective at discriminating between cell-state properties by extracting representations directly from images25,26,27,28. Many of these results have been demonstrated using only imaging techniques such as phase contrast or brightfield microscopy. To further explore the capabilities of this approach, we asked whether more subtle cell-state differences are identifiable even within cell lines under standard culture conditions. Here, we investigate whether cells that exhibit distinct RNA expression patterns can be discriminated based on cell morphology alone.
To explore the extent to which transcriptomic differences are expressed through cell morphology, we tested the ability of a deep convolutional neural network (CNN) to discriminate cancer cell heterogeneity in three model conditions (Fig. 1a). In the first condition, we used heavily bottlenecked cell populations which had experienced high levels of doxorubicin exposure and compared them to healthy cells. In the second, we isolated and fluorescently labeled two transcriptomically distinct subpopulations within the MDA-MB-231 cell line. In the third, we extended this to transcriptomic subpopulations in the MDA-MB-436 cell line. After identification and isolation of these populations, we developed an image segmentation pipeline that allowed us to perform instance segmentation on patches of images (Fig. 1b). A CNN was then trained to discriminate between the two populations in each condition using only phase contrast imaging (Fig. 1c). We demonstrate that cell morphology is not only reflective of even subtle differences in transcriptomic expression, but that this morphology is heavily influenced by cell-to-cell orientation and interaction.
a Phenotype identification was performed by first imaging treated and untreated populations of MDA-MB-231 cells and cocultured subpopulations in MDA-MB-231 and MDA-MB-436 cell lines at 20× magnification using phase contrast and fluorescent imaging. Images are contrast-enhanced for visualization purposes only. b Cell segmentation was then performed on each image by splitting the image into 16 patches and then segmenting using a Mask R-CNN segmentation network. c Input images were fed into a convolutional neural network classifier that predicted each cell’s phenotype.
Results
To establish a baseline for what is identifiable through cell morphology using our methodology, we explored how chemotherapeutic dosing of MDA-MB-231 cells affected their morphology before and after treatment. Prior research has shown that morphology in MDA-MB-231 cells is altered in a manner dependent on doxorubicin dosage29; we therefore chose to treat cells with doxorubicin at a level that we had previously observed to be highly lethal30. Fifty thousand MDA-MB-231 cells were seeded and treated with 550 nM of doxorubicin after 24 h. After 48 h, the media containing the drug were removed and replaced with growth media. Cells were maintained in standard culture to reach a population size of 6 × 10⁶ cells/sample. We performed single-cell RNA sequencing (scRNA-seq) on both the treated population and an untreated control population. The full dataset was composed of 7502 treated and 3258 untreated cells after post-processing (Methods). Comparing overall RNA expression patterns revealed differences after treatment (Fig. 2a), with the lowest observed magnitude of correlation in gene expression between the two conditions (PCC = −0.03, Supplementary Fig. S1). Next, we separately plated 60 wells of doxorubicin-treated and 60 wells of control cells in 96-well plates and imaged at 20× magnification every 4 h by phase contrast using an Incucyte S3 (Sartorius) until cultures reached maximum confluency. We collected 77,325 instances of treated and 120,536 instances of untreated cells. Each image was then segmented using a custom-trained Mask R-CNN segmentation network (Methods). To determine cell identity, we used a ResNet-152 CNN pretrained on ImageNet-1k, as this had been previously shown to be effective at image classification31,32. Using transfer learning33, we were able to identify cells according to treatment status with an AUC of 0.95 (Fig. 3).
This is in line with current knowledge, as it has been previously shown that doxorubicin induces an epithelial to mesenchymal34,35 transition and that this transition causes a distinct morphological shift.
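For readers less familiar with the metric, the AUC values reported throughout can be read as the probability that a randomly chosen cell from the positive class receives a higher classifier score than a randomly chosen cell from the negative class. The following is a minimal sketch of that rank-based (Mann-Whitney) computation; the function name `roc_auc` and the choice of which class counts as positive are illustrative assumptions, not part of the published pipeline.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: sequence of 0/1 values (1 = positive class, e.g. treated)
    scores: sequence of classifier scores, higher = more likely positive
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires at least one cell from each class")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)
```

Because the statistic depends only on the ordering of scores, it is insensitive to the imbalance between the numbers of treated and untreated instances, which is one reason it is a convenient summary here.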
UMAP representations of a MDA-MB-231 cells treated with doxorubicin (550 nM) and untreated control samples, b subpopulations (Subpop 1 and Subpop 2) within the control MDA-MB-231 and doxorubicin-treated MDA-MB-231, and c MDA-MB-436 subpopulations (Subpop 1 and Subpop 2). Subpopulations were identified by Leiden clustering, as previously described37.
To determine whether it is possible to discriminate subsets of cells within a cell line, we examined scRNA-seq data of untreated MDA-MB-231 cells under standard growth conditions. We found that the MDA-MB-231 cell line consists of at least two transcriptomically distinct populations (231-Subpop 1 and 231-Subpop 2) as identified by Leiden clustering36 (Fig. 2b). Relative to the untreated and treated sample comparison, the two intra-cell line populations have more similar gene expression patterns (PCC = 0.79, Fig. S1). We therefore asked whether these two subpopulations could be distinguished by the same approach of phase contrast imaging, segmentation, and phenotype classification.
In previous studies, we identified and experimentally validated ESAM as a surface marker that differentiates the identified clusters37, with low expression in 231-Subpop 1 and high expression in 231-Subpop 2. To establish ground-truth knowledge of subpopulation identity during coculture imaging experiments, ESAM-separated subpopulations of MDA-MB-231 cells were established from parental cells with stable fluorescent nuclear labels on NLS-mCherry (ESAM-low, 231-Subpop 1) and NLS-GFP (ESAM-high, 231-Subpop 2) (Methods).
A 50/50 coculture of 231-Subpop 1 and 231-Subpop 2 cells was plated in 60 wells of a 96-well plate. They were imaged at 20× magnification every 4 h with phase contrast and fluorescence imaging using an Incucyte S3. Because each population is fluorescently labeled based on its transcriptomic identity, we were able to label each cell’s phenotype after segmentation. In total, we gathered 30,418 instances of cocultured 231-Subpop 1 and 63,758 instances of 231-Subpop 2 cells.
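The ground-truth labeling step described above can be sketched as follows. The text does not state the exact rule used to assign a label from the fluorescence channels, so the comparison of mean red and green intensity inside each segmentation mask is an assumption made for illustration, as are the function name `label_by_fluorescence` and the label strings.

```python
import numpy as np

def label_by_fluorescence(mask, red, green):
    """Assign a phenotype label to one segmented cell.

    mask:  boolean array, True inside the segmented cell
    red:   red (mCherry) fluorescence channel, same shape as mask
    green: green (GFP) fluorescence channel, same shape as mask

    Assumption: the brighter channel inside the mask decides the label.
    """
    if not mask.any():
        raise ValueError("empty segmentation mask")
    red_mean = float(red[mask].mean())
    green_mean = float(green[mask].mean())
    # mCherry marks ESAM-low cells (231-Subpop 1); GFP marks ESAM-high cells.
    return "231-Subpop 1" if red_mean > green_mean else "231-Subpop 2"
```

In practice a background threshold would likely also be needed so that unlabeled debris is discarded instead of being forced into one of the two classes.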
As before, we applied a Resnet-152 CNN to identify subpopulation identity, achieving an AUC of 0.74. We hypothesized that including neighborhood information, such as cell–cell interaction, could improve the ability of the network to correctly classify cells as belonging to 231-Subpop 1 or 231-Subpop 2. This environmental interaction is an underutilized aspect of cell morphology that is not always applied when attempting to classify cells using only morphology. While there are large bodies of literature which have researched how natural density-dependent phenomena such as the Allee effect38,39 and contact inhibition40,41,42 influence tumor growth, cell morphology is often focused on elements such as texture and shape, as opposed to orientation and interaction with neighboring cells. To test this hypothesis, we designed several in-silico experiments to investigate the effect of neighborhood properties. In the first, we incrementally increased the bounding box around the input image to include neighboring cells. We tested several bounding box increases between 0 and 65 pixels (Fig. 4a) and found that there exists an optimal range between pixel increases of 25 and 45 pixels, which includes enough information about the orientation and interaction between cells (Fig. 4b), but which does not obfuscate which cell is being identified. This resulted in a maximum AUC of 0.8.
a Representative images of MDA-MB-231 cells with increasing bounding boxes (1 pixel ≈ 0.62 μm). The segmented cells are centered, and the bounding box around them is increased uniformly. b The AUC of the trained CNN versus the pixel increase around the cell. Increasing the bounding box increases the AUC until it reaches 0.80 around 25 pixels, at which point it plateaus and then decreases.
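The bounding box experiment can be illustrated with a short sketch that grows the box around a segmented cell by a fixed number of pixels on every side and crops the phase contrast image accordingly. The function name `crop_with_margin` and the clamping behavior at the image border are assumptions; the source does not describe how cells near the patch edge were handled.

```python
import numpy as np

def crop_with_margin(image, mask, margin):
    """Crop `image` to the mask's bounding box grown by `margin` pixels per side.

    image:  2D (or H x W x C) array holding the phase contrast patch
    mask:   boolean H x W array, True inside the segmented cell
    margin: number of pixels added uniformly around the tight bounding box
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("empty segmentation mask")
    # Clamp to the image so that cells near the border still yield a crop.
    top = max(int(rows.min()) - margin, 0)
    bottom = min(int(rows.max()) + margin + 1, image.shape[0])
    left = max(int(cols.min()) - margin, 0)
    right = min(int(cols.max()) + margin + 1, image.shape[1])
    return image[top:bottom, left:right]
```

At the stated scale of roughly 0.62 μm per pixel, a 25-pixel margin corresponds to about 15.5 μm of surrounding context on each side and a 45-pixel margin to about 27.9 μm.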
To further test this hypothesis, we also designed in-silico experiments to selectively remove information (Fig. 5a) using a 25-pixel bounding box increase. In the first, we completely blacked out the area around the cells. In the second, we whited out the cell itself. Intuitively, these experiments test the ability of the CNN to classify cell phenotype without information about background or without information on texture. We find that the network classification ability diminishes with these limitations (Fig. 5b), giving support to our hypothesis that including the environment around the cell is a contributing factor to phenotype identification in cell morphology. To determine whether these differences could be attributed to changes in culture confluence, we tracked the classification accuracy across each day of culture (Fig. S2) and did not find consistent changes in classification accuracy. Training/testing loss curves for the best models and confusion matrices for these models were also determined (Figs. S3 and S4), and the analysis was repeated independently with each of 46 different hold-out test wells (Fig. S5) to confirm robustness.
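The two information-removal experiments amount to simple masking operations on each crop. The sketch below shows one way to express them; the function names and the use of 255 as the white value (which assumes 8-bit images) are illustrative assumptions.

```python
import numpy as np

def black_out_background(crop, mask):
    """Keep pixel values inside the cell and set everything else to black."""
    out = np.zeros_like(crop)
    out[mask] = crop[mask]
    return out

def white_out_cell(crop, mask, white=255):
    """Keep the surroundings and replace the cell interior with a flat white.

    `white=255` assumes an 8-bit image; adjust for other bit depths.
    """
    out = crop.copy()
    out[mask] = white
    return out
```

Both operations leave the outline of the cell recoverable from the boundary between the flat region and the retained pixels, so neither experiment removes shape information.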
After demonstrating that highly perturbed MDA-MB-231 cells as well as unperturbed subpopulations within the parental MDA-MB-231 population could be identified using deep learning, we questioned whether this phenomenon would extend to other cell models. Using unsupervised Leiden clustering, we identified the MDA-MB-436 cell line as another example of an in vitro breast cancer model consisting of at least two subpopulations with unique transcriptional patterns37. These transcriptomic subpopulations had the highest observed level of similarity (PCC = 0.96, Fig. S1). Similar to the gene ESAM in the MDA-MB-231 cells, BST2 is a marker that differentiates each population. MDA-MB-436 subpopulations were sorted directly from a parent population using tetherin, the protein product of BST2. We grew isolated replicates for each subpopulation, then dyed each individually using a red cytoplasmic membrane dye for BST2 high cells (436-Subpop 1) and a green cytoplasmic dye for BST2 low cells (436-Subpop 2) (Methods).
We plated each isolated subpopulation in a 50/50 coculture in 60 wells within a 96-well plate. Each well was imaged at 20× magnification every 4 h. We collected 86,924 instances of 436-Subpop 1 and 80,834 instances of 436-Subpop 2 cells. As before, cell phenotype was identified using the fluorescence of the segmented cell. Bounding box optimization was performed as above, with the max AUC = 0.75 at a 55-pixel increase. We found that the discrimination ability of the network increased as the bounding box increased, then dropped off, similar to the MDA-MB-231 subpopulation identification (Fig. 6).
a Representative images of MDA-MB-436 cells with pixel bounding box increases (1 pixel ≈ 0.62 μm). b The AUC of the trained CNN versus the pixel increase around the cell. As with the MDA-MB-231 cell line, the discrimination ability of the model increases with more neighborhood information, this time peaking with an AUC of 0.75 at a 55 pixel increase.
Discussion
Cell morphology has long been used to identify basic properties of cells, such as cell line identification16,17,18,19, but has only recently been applied to more cell-state properties such as metastatic viability and treatment status20,22,27. The heterogeneity that can exist within cell morphologies, however, presents significant challenges in correctly segmenting cells and identifying properties of interest. Deep learning has proved to be an ideal approach for classifying cells based on morphology25,26,27,28, as it has the ability to absorb a large corpus of data and learn relevant features.
In this study, we extend this cell-state prediction by classifying cell phenotypes that would be previously unidentifiable without the use of single-cell sequencing technologies. We show that cell morphology reflects these underlying differences using only phase contrast imaging and demonstrate that this methodology can be used to identify multiple types of in vitro populations. Additionally, we observed that including neighborhood context about cell orientation and interaction with other cells increased the accuracy of the model.
We tested transcriptomic differences in three separate contexts. In the first, we compared MDA-MB-231 cells treated with doxorubicin to control MDA-MB-231 cells. We found that these populations had easily distinguished morphologies, and our model was able to achieve a high classification accuracy. In the second context, we compared two separate subpopulations within the MDA-MB-231 cell line, and in the third, we compared two separate subpopulations within the MDA-MB-436 cell line. In both cases, the model was able to identify subpopulation identity with at least modest discrimination ability.
Additionally, we interrogated the effect of cell orientation and interaction by increasing the area around each cell that was input into the CNN. We found that increasing the bounding box initially improves the CNN’s classification ability, reaching an optimal size, followed by a drop-off in discrimination ability. To further understand this phenomenon, we performed several in-silico experiments which either completely removed the surroundings around each cell or removed textural information about the cell. Even without this information, the CNN was able to correctly classify each subpopulation with moderate discrimination ability in each case. In both experiments, the contour of the cell of interest is emphasized by a sharp contrast between the background and the cell area. This supports recent research which has utilized active shape models for prediction21,22,43. However, we find that while both texture and the surrounding environment are important, including both enables the best discrimination ability.
One unexplored avenue for cell morphological analysis is the use of unsupervised approaches44,45 to monitor changes in cell populations. The ability to autonomously observe shifts and correlate them to specific events could enable researchers to observe cell health, view the effects of different drug mechanisms, and observe phenotypic plasticity without the need for techniques such as sequencing or fluorescent labeling. One method that has been used on cell populations is autoencoders20,25,26,46, which contain a reduced latent space that is representative of the input and have been used for unsupervised learning approaches. Autoencoders have been used primarily to perform latent space predictions and provide explanations for changes in cell morphology. Using this latent space to make inferences about changes in the transcriptome without a priori knowledge about these changes could give insight into these changes without the need for sequencing.
While these studies were performed on cells that also included fluorescent labels, classifying phenotype using only phase contrast images opens multiple possibilities for longitudinal experiments. The first is that it can be extended to many populations. Many automated imaging platforms capture red and green fluorescence, such that a maximum of 4 population labels (red, green, yellow, and non-fluorescing cells) can be monitored simultaneously. Given sufficient differences in morphology, a CNN could be adapted to identify additional subpopulations without fluorescent tags. Furthermore, some populations might not be suited to stable integration of a reporter gene, but a deep learning algorithm does not require this because it learns from morphology and interaction. Finally, not using fluorescent markers to track subpopulations frees the operator to have fluorescent indicators available for other uses. For example, cells could be stably integrated using the FUCCI47 system, enabling the ability to monitor the growth of each subpopulation as well as the cell cycle. The models trained within this paper are designed for analyzing phase contrast images to distinguish specific phenotypes. Future work can expand this to surveillance of phenotypes in other contexts, such as non-adherent cells or 3D cell culture.
The use of morphology for the tracking of subpopulations provides an avenue for a better understanding of the ecology and evolution of subpopulations. Recent research has demonstrated that mathematical modeling can be used to provide insights into more optimal treatment schedules48,49,50,51. However, these models are limited in their ability to consider the heterogeneity that exists within a tumor population. Using cell morphology for cell-state identification can provide insight into the transcriptomic composition of a population when testing and identifying new therapeutic strategies and targets.
Methods
Cell culture
MDA-MB-436 cells were cultured in high-glucose DMEM (Sigma D5796, 4.5 g/L glucose), 10% FBS, 0.01 mg/mL insulin (ThermoFisher, 12585014), 16 µg/mL glutathione (Sigma G013), penicillin-streptomycin (Thermo Scientific, 15140122). MDA-MB-231 cells were cultured in high-glucose DMEM (Sigma, D5796) containing 10% FBS and 1X penicillin-streptomycin (Thermo Scientific, 15140122). Cells were maintained at 37 °C in a 5% CO2 atmosphere.
Drug treatment
Doxorubicin hydrochloride (Cayman Chemical 150007) was reconstituted in water. Cell culture media was replaced with new media containing 550 nM doxorubicin. After 48 h, drug-free media replaced the doxorubicin-dosed media.
scRNA-seq library preparation
MDA-MB-231 samples from untreated and treated populations were harvested. Cells were loaded into wells of a Chromium A Chip. Libraries were prepared using the 10X Genomics 3′ single-cell gene expression (v2) protocol. Paired-end sequencing was performed using a NovaSeq6000 with an S4 chip (400 cycles) according to the manufacturer’s instructions.
scRNA-seq analysis
MDA-MB-231 single-cell data were aligned to the GRCh38-2020-A reference and processed using Cell Ranger v7.1. All data were post-processed in the same way as described previously37. Briefly, cells were filtered according to scanpy’s best practices52 by removing cells that deviated from the population median by more than 5 median absolute deviations in the number of genes, total counts, or percentage of mitochondrial genes. The count matrix was normalized through a shifted logarithm transformation. Potential doublets were removed using scDblFinder53, and a UMAP representation was then calculated. Finally, cell line subpopulation identity was determined with Leiden clustering. To make comparisons across samples and subpopulations, we computed Pearson correlation coefficients (PCC) between each pair of samples. To do so, we integrated all samples by re-normalizing data after concatenation. We then calculated the top 50 principal component values and found the PCC of these principal component values between each pair of samples.
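The filtering and normalization steps above can be sketched with plain NumPy. This is an illustration under stated assumptions and not the analysis code itself: the function names are hypothetical, the outlier rule is applied to whichever per-cell metric is passed in, and the size factor (each cell's total counts divided by the mean total across cells) is one common definition of the shifted logarithm.

```python
import numpy as np

def is_outlier(values, n_mads=5.0):
    """Flag cells whose metric lies more than `n_mads` MADs from the median.

    values: one quality-control metric per cell (e.g. total counts)
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return np.abs(values - median) > n_mads * mad

def shifted_log(counts):
    """Shifted logarithm normalization: log(1 + counts / size_factor).

    counts: cells x genes matrix of raw counts; every cell is assumed to have
    at least one count, which holds after quality-control filtering.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    size_factors = totals / totals.mean()
    return np.log1p(counts / size_factors)
```

A cell would be removed if `is_outlier` flags it for any of the metrics considered, so the masks for number of genes, total counts, and mitochondrial percentage would be combined with a logical OR before filtering.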
Fluorescent nuclear labeling of MDA-MB-231 cells
MDA-MB-231 cells were stably labeled using a sleeping beauty transposase system. MDA-MB-231 cells were grown to 90% confluence in a 10 cm dish, then transfected with Lipofectamine 3000 according to the manufacturer’s protocol with 2.5 µg of a plasmid containing Sleeping Beauty transposase (SB100X, AddGene #34879) and 2.5 µg of a plasmid containing either GFP or mCherry on a nuclear localization signal (NLS) and flanked by inverted repeats. Stably transfected cells were selected after 72 h with G418, then expanded for two passages before antibody staining.
Antibody staining and cell sorting
GFP- and mCherry-labeled MDA-MB-231 cells were stained with 2 µL ESAM-PE-Vio770 (Miltenyi, 130-115-039) per million cells. Cells were gated on single-cell fraction, fluorescent channel, and ESAM staining. ESAM-low cells were sorted from the mCherry-labeled population, and ESAM-high cells were sorted from the GFP-labeled population. Enriched populations were expanded for at least two passages, then further purified by performing another round of staining and FACS.
MDA-MB-436 cells were resuspended in cell staining buffer (PBS + 5 mM EDTA + 1% BSA + 1.6 mM NaOH + 0.01% sodium azide) and incubated in 1:100 diluted Zombie Violet viability dye (Biolegend, 423113) for 5 min on ice, then incubated with 9 µL of APC-conjugated CD317 (BST2) antibody (Miltenyi, 130-101-660) per million cells for 20 min. Stained cells were washed 3 times with cell sorting buffer supplemented with 1:1000 diluted Zombie viability dye, then passed through a 40 µm cell strainer before FACS. Collection media for live cell sorting was prepared by supplementing complete media with 25 mM HEPES. Cells were sorted into 15 mL tubes containing 7 mL of collection media with gating on live cells, FSC-A/SSC-A, single cells, and BST2-lo or BST2-hi expression. Collected cells were spun down at 300 × g for 10 min, then plated in 50% conditioned media (CM) for 24 h before transitioning to fresh, complete media. CM was prepared from complete media incubated on a 70% confluent plate of parental cells for 24 h. CM was spun down at 500 × g for 10 min, and the supernatant was passed through a 0.22 µm filter. CM was diluted to 50% with fresh, complete media. Sorted subpopulations were allowed to recover from the sort and expand for a week before analysis.
Image acquisition and processing
All images were acquired on an Incucyte S3 (Sartorius) at 20× magnification with nine images acquired per well. Fluorescent composite images were only used for cell phenotype ground-truth identification. To reduce the total number of cells per image, each 20× image was split into 16 separate image patches.
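The patch-splitting step can be sketched as below. The source states only that each image was split into 16 patches; the 4 × 4 grid of equal, non-overlapping tiles and the function name `split_into_patches` are assumptions made for illustration.

```python
import numpy as np

def split_into_patches(image, grid=4):
    """Split an image into grid x grid equal, non-overlapping patches.

    Returns the patches in row-major order (left to right, top to bottom).
    Assumes the image height and width are divisible by `grid`.
    """
    height, width = image.shape[:2]
    if height % grid or width % grid:
        raise ValueError("image dimensions must be divisible by the grid size")
    patch_h, patch_w = height // grid, width // grid
    return [
        image[r * patch_h:(r + 1) * patch_h, c * patch_w:(c + 1) * patch_w]
        for r in range(grid)
        for c in range(grid)
    ]
```

Non-overlapping tiles will cut through cells that straddle a patch boundary, so such cells would either need to be discarded or handled with overlapping tiles.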
Cell segmentation
Image segmentation for cell lines in low to medium confluency images was achieved by training a custom Mask R-CNN instance segmentation algorithm. To obtain ground-truth segmentations, we first used Cellpose, a generalist segmentation algorithm, to obtain initial masks. We then reviewed each segmentation generated by Cellpose and corrected it when necessary to achieve better segmentation. These corrected masks were fed into Mask R-CNN for specialized training. MDA-MB-231 cells had an average precision (AP) score of 66%. MDA-MB-436 cells had an AP score of 55%.
Cell phenotype discrimination using a CNN
For each experiment, we chose to use a pretrained network with residual connections (ResNet-152 trained on ImageNet-1k31), as such networks had shown promise in image classification. We used a stochastic gradient descent optimizer with an initial learning rate of 0.05 that decayed on each epoch. Any predefined augmentations (such as blacking out the background or increasing the surrounding bounding box) were applied to each image, which was then centered within a 300 × 300 pixel black background for 20× magnification images. To avoid overfitting, images were randomly flipped and rotated at each epoch. Each image was normalized to the dataset mean and standard deviation. Models were trained for 30 epochs or until accuracy on the test set did not increase for 5 epochs. A hold-out well, which was unseen in the training data, was used as a test set for subpopulation identification, and one well from each plate in the untreated and treated MDA-MB-231 experiments was used as a test set for those comparisons. The test set for the untreated and treated comparison consisted of 353 instances of untreated cells and 647 instances of treated cells. The MDA-MB-231 subpopulation test well contained 3328 instances of 231-Subpop 1 and 8120 instances of 231-Subpop 2. The MDA-MB-436 test well consisted of 715 instances of 436-Subpop 1 and 780 instances of 436-Subpop 2. Training/testing loss curves for the best models and confusion matrices for these models were also determined (Figs. S3 and S4), and the analysis was repeated with 46 different hold-out wells (Fig. S5) to confirm robustness.
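Two pieces of the procedure above lend themselves to a short framework-independent sketch: placing each crop at the center of a 300 × 300 black canvas, and the stopping rule of 30 epochs or 5 epochs without improvement. Several details are assumptions and are not given in the text: the function names, the exponential form of the learning rate decay, and the decay factor `gamma=0.9`. The `run_epoch` callback is a hypothetical stand-in for one pass of training followed by evaluation.

```python
import numpy as np

def center_on_canvas(crop, size=300):
    """Place `crop` at the center of a black size x size canvas."""
    height, width = crop.shape[:2]
    if height > size or width > size:
        raise ValueError("crop is larger than the canvas")
    canvas = np.zeros((size, size) + crop.shape[2:], dtype=crop.dtype)
    top, left = (size - height) // 2, (size - width) // 2
    canvas[top:top + height, left:left + width] = crop
    return canvas

def train_with_early_stopping(run_epoch, max_epochs=30, patience=5,
                              initial_lr=0.05, gamma=0.9):
    """Run up to `max_epochs`, stopping after `patience` epochs without gain.

    run_epoch(lr) is a hypothetical callback that trains for one epoch at
    learning rate `lr` and returns the evaluation accuracy.
    The exponential decay and gamma=0.9 are assumptions, not source values.
    Returns (best_epoch, best_accuracy).
    """
    best_accuracy, best_epoch, stale_epochs = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        lr = initial_lr * gamma ** epoch  # learning rate decays every epoch
        accuracy = run_epoch(lr)
        if accuracy > best_accuracy:
            best_accuracy, best_epoch, stale_epochs = accuracy, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_epoch, best_accuracy
```

Padding to a fixed canvas keeps the physical scale of every cell unchanged, whereas resizing each crop to the network input size would make small and large cells appear the same size.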
Data availability
Images and models used in this paper are available upon request.
Code availability
Scripts for training, testing, and segmentation are available at https://github.com/brocklab/transcriptomicClusterMorph.
References
Groves, S. M. et al. Archetype tasks link intratumoral heterogeneity to plasticity and cancer hallmarks in small cell lung cancer. Cell Syst. 13, 690–710.e17 (2022).
Chang, C. A. et al. Ontogeny and vulnerabilities of drug-tolerant persisters in HER2+ breast cancer. Cancer Discov. 12, 1022–1045 (2022).
Kinker, G. S. et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat. Genet. 52, 1208–1218 (2020).
Su, Y. et al. Phenotypic heterogeneity and evolution of melanoma cells associated with targeted therapy resistance. PLOS Comput. Biol. 15, e1007034 (2019).
Litzenburger, U. M. et al. Single-cell epigenomic variability reveals functional cancer heterogeneity. Genome Biol. 18, 15 (2017).
Dave, A. et al. The Breast Cancer Single-Cell Atlas: defining cellular heterogeneity within model cell lines and primary tumors to inform disease subtype, stemness, and treatment options. Cell Oncol. 46, 603–628 (2023).
Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).
Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006).
Vendramin, R., Litchfield, K. & Swanton, C. Cancer evolution: Darwin and beyond. EMBO J. 40, e108389 (2021).
West, J. et al. Towards multidrug adaptive therapy. Cancer Res. 80, 1578–1589 (2020).
Strobl, M. A. R., Gallaher, J., Robertson-Tessi, M., West, J. & Anderson, A. R. A. Treatment of evolving cancers will require dynamic decision support. Ann. Oncol. 34, 867–884 (2023).
Gutierrez, C. et al. Multifunctional barcoding with ClonMapper enables high-resolution study of clonal dynamics during tumor evolution and treatment. Nat. Cancer 2, 758–772 (2021).
Torre, E. et al. Rare cell detection by single-cell RNA sequencing as guided by single-molecule RNA FISH. Cell Syst. 6, 171–179.e5 (2018).
Turashvili, G. & Brogi, E. Tumor heterogeneity in breast cancer. Front. Med. 4, 227 (2017).
Johnson, K. E. et al. Integrating transcriptomics and bulk time course data into a mathematical framework to describe and predict therapeutic resistance in cancer. Phys. Biol. 18, 016001 (2020).
Olson, A. C., Larson, N. M. & Heckman, C. A. Classification of cultured mammalian cells by shape analysis and pattern recognition. Proc. Natl. Acad. Sci. USA 77, 1516–1520 (1980).
Mousavikhamene, Z., Sykora, D. J., Mrksich, M. & Bagheri, N. Morphological features of single cells enable accurate automated classification of cancer from non-cancer cell lines. Sci. Rep. 11, 24375 (2021).
Theriault, D. H., Walker, M. L., Wong, J. Y. & Betke, M. Cell morphology classification and clutter mitigation in phase-contrast microscopy images using machine learning. Mach. Vis. Appl. 23, 659–673 (2012).
Alizadeh, E. et al. Cellular morphological features are predictive markers of cancer cell state. Comput. Biol. Med. 126, 104044 (2020).
Zaritsky, A. et al. Interpretable deep learning uncovers cellular properties in label-free live cell images that are predictive of highly metastatic melanoma. Cell Syst. 12, 733–747.e6 (2021).
Wu, P.-H. et al. Single-cell morphology encodes metastatic potential. Sci. Adv. 6, eaaw6938 (2020).
Wu, P.-H. et al. Evolution of cellular morpho-phenotypes in cancer metastasis. Sci. Rep. 5, 18437 (2015).
Wang, W. et al. Live-cell imaging and analysis reveal cell phenotypic transition dynamics inherently missing in snapshot data. Sci. Adv. 6, eaba9319 (2020).
Suzuki, G. et al. Machine learning approach for discrimination of genotypes based on bright-field cellular images. npj Syst. Biol. Appl. 7, 1–8 (2021).
Soelistyo, C. J., Vallardi, G., Charras, G. & Lowe, A. R. Learning biophysical determinants of cell fate with deep neural networks. Nat. Mach. Intell. 4, 636–644 (2022).
Wakui, T. et al. Predicting reprogramming-related gene expression from cell morphology in human induced pluripotent stem cells. Mol. Biol. Cell 34, ar45 (2023).
Chow, Y. L., Singh, S., Carpenter, A. E. & Way, G. P. Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic. PLOS Comput. Biol. 18, e1009888 (2022).
Harrison, P. J. et al. Evaluating the utility of brightfield image data for mechanism of action prediction. PLOS Comput. Biol. 19, e1011323 (2023).
Lovitt, C. J., Shelper, T. B. & Avery, V. M. Doxorubicin resistance in breast cancer cells is mediated by extracellular matrix proteins. BMC Cancer 18, 41 (2018).
Howard, G. R., Jost, T. A., Yankeelov, T. E. & Brock, A. Quantification of long-term doxorubicin response dynamics in breast cancer cell lines to direct treatment schedules. PLOS Comput. Biol. 18, e1009104 (2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. Preprint at https://doi.org/10.48550/arXiv.1512.03385 (2015).
resnet152 — Torchvision main documentation. https://pytorch.org/vision/main/models/generated/torchvision.models.resnet152.html#torchvision.models.ResNet152_Weights.
Kensert, A., Harrison, P. J. & Spjuth, O. Transfer learning with deep convolutional neural networks for classifying cellular morphological changes. SLAS Discov. 24, 466–475 (2019).
Paramanantham, A. et al. Doxorubicin-resistant TNBC cells exhibit rapid growth with cancer stem cell-like properties and EMT phenotype, which can be transferred to parental cells through autocrine signaling. Int. J. Mol. Sci. 22, 12438 (2021).
Han, R. et al. Activation of β-catenin signaling is critical for doxorubicin-induced epithelial–mesenchymal transition in BGC-823 gastric cancer cell line. Tumor Biol. 34, 277–284 (2013).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Gardner, A. L., Jost, T. A., Morgan, D. & Brock, A. Computational identification of surface markers for isolating distinct subpopulations from heterogeneous cancer cell populations. npj Syst. Biol. Appl. 10, 1–9 (2024).
Böttger, K. et al. An emerging Allee effect is critical for tumor initiation and persistence. PLOS Comput. Biol. 11, e1004366 (2015).
Johnson, K. E. et al. Cancer cell population growth kinetics at low densities deviate from the exponential growth model and suggest an Allee effect. PLOS Biol. 17, e3000399 (2019).
Batt, D. B. & Roberts, T. M. Cell density modulates protein-tyrosine phosphorylation. J. Biol. Chem. 273, 3408–3414 (1998).
Pham, K. et al. Density-dependent quiescence in glioma invasion: instability in a simple reaction–diffusion model for the migration/proliferation dichotomy. J. Biol. Dyn. 6, 54–71 (2012).
Mendonsa, A. M., Na, T.-Y. & Gumbiner, B. M. E-cadherin in contact inhibition and cancer. Oncogene 37, 4769–4780 (2018).
Phillip, J. M., Han, K.-S., Chen, W.-C., Wirtz, D. & Wu, P.-H. A robust unsupervised machine-learning method to quantify the morphological heterogeneity of cells and nuclei. Nat. Protoc. 16, 754–774 (2021).
Schmarje, L., Santarossa, M., Schroder, S.-M. & Koch, R. A survey on semi-, self- and unsupervised learning for image classification. IEEE Access 9, 82146–82168 (2021).
Dillon, B. M., Plehn, T., Sauer, C. & Sorrenson, P. Better latent spaces for better autoencoders. SciPost Phys. 11, 061 (2021).
Sommer, C., Hoefler, R., Samwer, M. & Gerlich, D. W. A deep learning and novelty detection framework for rapid phenotyping in high-content screening. Mol. Biol. Cell 28, 3428–3436 (2017).
Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell 132, 487–498 (2008).
Maltas, J. et al. Drug dependence in cancer is exploitable by optimally constructed treatment holidays. Nat. Ecol. Evol. 8, 147–162 (2024).
Gatenby, R. A. & Brown, J. S. Integrating evolutionary dynamics into cancer therapy. Nat. Rev. Clin. Oncol. 17, 675–686 (2020).
Miroshnychenko, D. et al. Stroma-mediated breast cancer cell proliferation indirectly drives chemoresistance by accelerating tumor recovery between chemotherapy cycles. Cancer Res. 83, 3681–3692 (2023).
Kazerouni, A. S. et al. Integrating quantitative assays with biologically based mathematical modeling for predictive oncology. iScience 23, 101807 (2020).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Germain, P.-L., Lun, A., Meixide, C. G., Macnair, W. & Robinson, M. D. Doublet identification in single-cell sequencing data using scDblFinder. Preprint at https://doi.org/10.12688/f1000research.73600.2 (2022).
Acknowledgements
This project was supported by grants R01CA226258, R01CA255536, and U01CA253540 (to A.B.). 10X Genomics Single-Cell 3′ Gene Expression and TagSeq preparation were performed by the Genomic Sequencing and Analysis Facility at UT Austin, Center for Biomedical Research Support (RRID: SCR_021713). Flow cytometry and FACS were performed at the Center for Biomedical Research Support Microscopy and Imaging Facility at UT Austin (RRID: SCR_021756). T.J. thanks Clarence Yapp for guidance on instance segmentation, model training, and image augmentation.
Author information
Contributions
T.J. performed computational studies, wet lab experiments, and data analysis. A.G. performed wet lab experiments and data analysis. D.M. contributed to methodology development. T.J. and A.G. drafted the manuscript. A.B. supervised the project. All authors contributed to the interpretation of the results and reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jost, T.A., Gardner, A.L., Morgan, D. et al. Deep learning identifies heterogeneous subpopulations in breast cancer cell lines. npj Biomed. Innov. 2, 19 (2025). https://doi.org/10.1038/s44385-025-00023-z