Abstract
Lung cancer, particularly adenocarcinoma, ranks high in morbidity and mortality worldwide and has a relatively low five-year survival rate. More accurate stratification schemes are urgently needed to support precise prognostic assessment and clinical intervention and thereby improve survival. The TNM staging system is currently the predominant tool for prognostic evaluation in clinical practice, but its accuracy is constrained by its reliance on physician experience. Biomarker discovery based on molecular pathology offers a new perspective for prognostic assessment, but its dependence on expensive gene panel testing limits widespread clinical application. Pathological images contain abundant diagnostic information and provide a new avenue for prognostic evaluation. In this study, we used HoVer-Net to quantify the abundance of epithelial cells, lymphocytes, macrophages, and neutrophils in pathological images, and examined the clinical and biological significance of these cellular abundances. Compared with patients at stage N0, those at stage N1 showed markedly higher infiltration of epithelial cells, lymphocytes, macrophages, and neutrophils. Notably, lymphocyte and neutrophil infiltration was inversely related to the activation status of several key gene pathways, including the HALLMARK_HEME_METABOLISM pathway. Furthermore, our analysis identified FABP7 as a prognostic biomarker with pronounced differential expression between patients with high and low levels of neutrophil infiltration. These results indicate that cellular abundance analysis based on pathological images can provide a more accurate and cost-effective prognostic evaluation, offering new strategies for the clinical management of lung adenocarcinoma.
Introduction
Lung cancer is the second most common cancer and the leading cause of cancer-related death worldwide. In 2020, lung adenocarcinoma (LUAD) was the most common subtype of lung cancer globally, with incidence rates exceeding those of squamous cell carcinoma in most countries1. Despite significant advances in the multidisciplinary treatment of LUAD, including surgery, chemotherapy, radiotherapy, and especially targeted therapy, the survival rate of patients with LUAD remains discouragingly low. Identifying biomarkers that precisely reflect the onset, progression, and prognosis of lung cancer, and developing individualized treatment plans based on them, has become a pressing focus of contemporary lung cancer research.
At present, the Tumor-Node-Metastasis (TNM) staging system holds a pivotal role in predicting the prognosis and formulating treatment strategies for LUAD2,3. However, despite the latest advancements in the TNM staging system, accurately prognosticating the survival outcomes of LUAD patients remains challenging, often yielding an imprecise reflection of their actual survival status4. Advancements in therapeutic technologies, particularly the rise of targeted therapies and immunotherapies, have broadened the spectrum of adjuvant treatment options for patients with advanced LUAD5,6. It is noteworthy that even among patients who are at the same disease stage and receive similar adjuvant treatment regimens, there is significant variability in their survival benefits, highlighting the limitations of the current TNM staging system in predicting the efficacy of adjuvant therapies7,8,9. These observations underscore a compelling notion that the current TNM staging system may fall short in comprehensively encompassing all critical factors influencing the prognosis of LUAD, thereby hindering precise identification of patients who are most likely to derive benefit from specific adjuvant therapies. This reality underscores the pressing necessity to explore novel biomarkers that are intimately tied to both the prognosis of LUAD and the responsiveness to adjuvant treatments.
To better understand the clinically observed heterogeneity in prognosis and in adjuvant therapy response, researchers have developed diverse subtype classification methods based on gene expression data10. These methods have enriched our understanding of the molecular mechanisms driving LUAD and have revealed potential subtype-specific patterns of response to targeted therapies. However, the substantial cost and complexity of transcriptomic analyses, including expression microarrays and RNA-seq, remain major obstacles to their widespread integration into clinical practice11,12,13.
As technological advancements in pathological diagnosis have flourished, particularly with the remarkable progress in slide scanning technology and the drastic reduction in digital storage costs, the full digitization of LUAD tissue sections has become a reality. This advancement has underpinned the emergence of "pathomics," a burgeoning field focused on extracting vast amounts of information from digital pathological images to generate quantitative features that comprehensively represent the diverse phenotypic profiles of tissue samples14,15,16. Concurrently, the application of deep learning algorithms in image recognition has matured significantly, enabling the automated extraction of pertinent feature information from extensive collections of pathological images. The distinguishing attributes of tumor cells, including their morphology, structure, texture, and others, are readily discernible through pathological images. By leveraging deep learning algorithms to extract and classify these features from tumor cell images, we can achieve precise diagnosis of lung cancer17,18,19,20. For instance, Zhang et al. extracted 50 histomorphological phenotypic clusters to assess the therapeutic response in small cell lung cancer17. Haoyang Mi et al. developed cisplatin-based NAC predictive models for MIBC using nuclear morphology, tissue architecture, and IHC staining of salient proteins. They explored the predictive capability of multiscale computational features extracted from histological and immunohistological images for NAC response in MIBC treatment21. Jun Cheng et al. differentiated TFE3-RCC and ccRCC using quantitative histopathological features from H&E-stained whole slide images, demonstrating the potential of routine H&E slides to facilitate TFE3-RCC diagnosis and improve sample management and treatment development for this rare, aggressive cancer subtype22. 
Thus, we postulate that the diverse cellular characteristics discerned from H&E-stained sections can serve as predictors of patient prognosis and survival outcomes.
Inspired by Graham et al.23, who used the horizontal and vertical distances of nuclear pixels to separate clustered cells and a dedicated upsampling branch to classify the nuclear type of each segmented instance, we automatically segmented four types of cells (epithelial cells, macrophages, neutrophils, and lymphocytes) using the HoVer-Net algorithm. Using bioinformatics tools and data analysis methods, we explored the gene networks associated with the progression of LUAD. When comparing the pathological features of N0 and N1 stage patients, we observed a pronounced increase in the infiltration of epithelial cells, lymphocytes, macrophages, and neutrophils in N1 stage patients. Notably, the infiltration levels of lymphocytes and neutrophils were correlated with the suppression of multiple key gene pathways, a prominent example being the downregulation of the HALLMARK_HEME_METABOLISM pathway. Furthermore, by analyzing patient subsets with different degrees of neutrophil infiltration, we identified FABP7 as a prognostic biomarker whose expression level differs significantly in patients with heightened infiltration. We aim to translate these findings into more precise and effective therapeutic interventions that improve the quality of life and prognosis of LUAD patients.
Materials and methods
Data sources
The data included is divided into two parts: public data and private data (Fig. 1a). The public data originates from the Cancer Genome Atlas (TCGA), which contains a large number of primary cancers and their pathological images, providing researchers with a significant amount of searchable, viewable, and downloadable public data sources. We downloaded formalin-fixed paraffin-embedded (FFPE) pathological images and clinical information of LUAD from TCGA database, with follow-up outcomes at a specific time point as the prediction target. We selected a total of 232 clinical patients, whose records contained information on survival and recurrence/metastasis. Additionally, the private data originates from the data of 111 patients hospitalized between March 2013 and January 2018. We have designated this section of data as the “DL cohort”. This retrospective study was approved by the Ethics Committee of the First Affiliated Hospital of Dalian Medical University (Approval Number: PJ-KS-KY-2023-559). The endpoint for survival analysis is invasive disease-free survival (IDFS), defined as the time from tumor resection surgery to the occurrence of one of the following events: recurrence of LUAD, local lymph node metastasis, distant metastasis, second primary non-LUAD cancer, or death due to any cause. In addition, the study also labeled patients according to their recurrence and metastasis status at the time of the last follow-up.
Data Collection and Refined HoVer-Net-Based Cell Segmentation Workflow. (a) Data collection details: We gathered patient data from both public databases and hospitals, comprising 232 cases from the former and 111 cases from the latter. (b) Schematic diagram of image preprocessing. (c) Extraction of cells from the tumor microenvironment: Using the pre-trained HoVer-Net model, we performed concurrent segmentation and classification of cell nuclei in the images, distinguishing four cell types: epithelial cells, lymphocytes, macrophages, and neutrophils.
Nucleus extraction
This study employed the HoVer-Net23 method for concurrent segmentation and classification of nuclei in images. HoVer-Net, a deep learning-based image processing model, recognizes nuclei by integrating three branches: a classification branch, a pixel prediction branch, and a distance map prediction branch. First, pathologists manually delineated the tumor and adjacent non-tumor areas in the pathological images, and we used a patch sampling technique24 to divide both regions into individual 256 × 256-pixel patches. Next, we used threshold-based Otsu binarization25 to segment the blank background and discarded patches containing over 30% white background. In total, 12,822,963 patches (tumor: 4,105,929; normal: 8,717,034) were obtained across all patients. We then applied the Macenko method26 for color normalization to mitigate color variations arising from inconsistencies in dyes and scanners during slide preparation (Fig. 1b). Finally, we used the pre-trained HoVer-Net model to perform nuclei segmentation and classification on the target dataset. In the segmentation phase, HoVer-Net first uses the pixel prediction branch to determine whether each pixel belongs to a nucleus, separating nuclei from the surrounding background. The distance map prediction branch then predicts horizontal and vertical distance maps, which encode the distance of each nuclear pixel from the centroid of its nucleus. This enables adjacent or overlapping nuclei to be identified and separated. Using the classification results from HoVer-Net, we identified four types of cell nuclei: epithelial cells, lymphocytes, macrophages, and neutrophils.
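The background-filtering step described above can be illustrated with a minimal sketch. It assumes 8-bit grayscale intensities, that the Otsu threshold is computed once from pooled pixels (for example a slide thumbnail), and that pixels brighter than the threshold count as white background; none of these details are stated in the text, and the function names `otsu_threshold`, `white_fraction`, and `keep_patch` are hypothetical. Tiny patches stand in for the 256 × 256-pixel patches.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit grayscale values.

    The threshold maximizes the between-class variance of the two groups
    (values <= t and values > t).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels in the dark class yet
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def white_fraction(patch, threshold):
    """Fraction of pixels brighter than the threshold (assumed blank background)."""
    flat = [p for row in patch for p in row]
    return sum(1 for p in flat if p > threshold) / len(flat)


def keep_patch(patch, threshold, max_white=0.30):
    """Keep a patch unless more than `max_white` of it is white background."""
    return white_fraction(patch, threshold) <= max_white


# Toy data: dark values mimic stained tissue, bright values mimic blank glass.
thumbnail_pixels = [50] * 60 + [240] * 40
threshold = otsu_threshold(thumbnail_pixels)

tissue_patch = [[50, 50], [50, 240]]    # 25% white
blank_patch = [[50, 240], [240, 240]]   # 75% white
kept = [p for p in (tissue_patch, blank_patch) if keep_patch(p, threshold)]
```

In this sketch the 30% cutoff is applied as "discard when the white fraction exceeds 0.30", which matches the wording of the text; whether a patch at exactly 30% is kept is a boundary choice made here for illustration.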
We then proceeded to conduct a grouped analysis based on these identified types. This analysis provides valuable insights into the roles and relationships of various cell nuclei types in the context of diseases (Fig. 1c).
The analysis and utilization of gene expression data
In this study, we used RNA sequencing (RNA-seq) to quantify gene expression levels across samples. By analyzing these transcriptome datasets with bioinformatics tools, we identified genes and pathways implicated in disease processes. Specifically, RNA-seq-based pathway scores were computed with the combined z-score method in the R package GSVA, using gene sets from the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb). MSigDB comprises numerous gene set collections; we focused on the hallmark collection, which contains 50 gene sets. These hallmark gene sets are widely used to capture the core biological processes of cell types or specific disease conditions. We performed GSVA scoring on the hallmark gene sets to assess the activity of these biological pathways or processes across samples.
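As a rough illustration of combined z-score pathway scoring, the sketch below standardizes each gene across samples and then, for each sample, sums the z-scores of the genes in a set and divides by the square root of the number of genes. This is a plain-Python approximation of the idea, not the GSVA package itself; the use of the sample standard deviation, the skipping of constant genes, the function names, and the toy gene names are all assumptions made for illustration.

```python
import math
import statistics


def zscore_genes(expr):
    """Standardize each gene across samples: (x - mean) / sample sd.

    `expr` maps gene name -> list of expression values (one per sample).
    Genes with zero variance are skipped because they cannot be standardized.
    """
    z = {}
    for gene, values in expr.items():
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        mu = statistics.mean(values)
        z[gene] = [(v - mu) / sd for v in values]
    return z


def combined_zscore(expr, gene_set):
    """Per-sample pathway score: sum of member z-scores divided by sqrt(k)."""
    z = zscore_genes(expr)
    members = [g for g in gene_set if g in z]
    if not members:
        raise ValueError("no usable genes from the gene set in the expression data")
    n_samples = len(z[members[0]])
    scale = math.sqrt(len(members))
    return [sum(z[g][j] for g in members) / scale for j in range(n_samples)]


# Toy expression matrix with three samples; gene names are placeholders,
# not actual members of any hallmark gene set.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],  # constant, so it is skipped
}
toy_gene_set = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # GENE_D is not measured
scores = combined_zscore(toy_expr, toy_gene_set)
```

Dividing by the square root of the number of genes keeps scores comparable between gene sets of different sizes, which matters when all 50 hallmark sets are scored side by side.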
Statistical analysis
In the analysis of RNA sequencing data, we utilized the powerful tools DESeq2 and edgeR to identify differences in gene expression across various conditions. To delve deeper into the relationship between these differentially expressed genes and disease progression, we leveraged the Cox proportional hazards model for modeling, establishing a significance threshold of P < 0.05.
For the survival analysis aspect, we capitalized on the “survminer” package in R, which offers a robust array of survival analysis functions and visualization tools, thereby facilitating a more intuitive comprehension of patients’ survival status and prognosis.
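For readers unfamiliar with how such survival curves are built, here is a minimal sketch of the Kaplan-Meier product-limit estimator, assuming the plotted curves are Kaplan-Meier estimates (the usual choice, though the text does not name the estimator). The function name `kaplan_meier` and the follow-up times are hypothetical and are not study data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time for each patient (e.g. months to IDFS event or censoring)
    events : 1 if the event was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set afterwards.
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical follow-up data for six patients.
toy_times = [6, 7, 10, 15, 19, 25]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the risk set up to their last follow-up but never cause a drop in the curve, which is why the steps occur only at observed event times.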
Results
Clinicopathological features of the LUAD patients
A total of 111 patients from the DL cohort, hospitalized between March 2013 and January 2018, and 232 patients from TCGA were enrolled in this study (Table 1). The median age of the patients in the two datasets was 61 years (range, 55–68 years) and 65 years (range, 59–73 years), respectively, and most patients were women (74.7% and 56.0%, respectively). Most patients in both datasets were at pathological stage I (70.2% in the DL cohort and 63.3% in TCGA). Among the hospital patients, 5 patients recurred after one year, 10 patients recurred after three years and 7 patients recurred after five years. Among the TCGA patients, 32 patients recurred after one year, 81 patients recurred after three years and 86 patients recurred after five years.
Cellular distribution in the tumor microenvironment: analysis overview
In our comprehensive analysis of the tumor microenvironment, we delved into the cellular composition of both TCGA (The Cancer Genome Atlas) cohort and DL cohort, aiming to uncover potential differences and similarities in their cellular landscapes. The statistical analysis presented in Fig. 2 offers valuable insights into the proportional distribution of four key cell types within these cohorts. Specifically, our findings reveal a remarkably consistent pattern in the relative abundance of epithelial cells, lymphocytes, macrophages, and neutrophils between the two cohorts, suggesting a potential universality in the cellular composition of tumor microenvironments in LUAD (Fig. 2a). Notably, the analysis underscores a clear dichotomy in the distribution of epithelial cells and lymphocytes. Epithelial cells, being the primary cell type of origin in many cancers, are more prevalent in tumor regions compared to adjacent non-cancerous areas, a finding that aligns with the fundamental nature of tumorigenesis. Conversely, lymphocytes, as key players in the immune response, exhibit an inverse trend, with a lower presence in tumor regions, hinting at potential immune evasion mechanisms at play. Furthermore, our in-depth differential analysis has illuminated pronounced variations in the distribution patterns of lymphocytes, macrophages, neutrophils, and their combinations, independent of the data source. As evidenced in Fig. 2b,c, these cell types exhibit remarkable differences, notably with a prevalence of higher abundance in adjacent non-cancerous areas. These discoveries underscore not only the intricate complexity and heterogeneity inherent in the tumor microenvironment but also emphasize the critical need for a meticulous comprehension of its cellular composition to inform and refine targeted therapeutic approaches.
Characterisation of tumor-infiltrating cells in the microenvironment of LUAD tissues. The distribution pattern of cells within the tumor and its adjacent regions. (a) Cell content distribution statistics in the Normal and Tumor regions from the DL cohort and TCGA cohort. (b, c) Distribution differences of different cell types in the DL cohort and TCGA cohort, respectively. In (b) and (c), the text and color of the label below each cell segmentation image indicate the cell type of primary interest in that image. For example, below the segmentation image of epithelial cells, the word “red” appears in red within brackets, indicating that epithelial cells are outlined with red circles in that set of images. The Wilcoxon rank-sum test was used. Statistical significance is indicated as follows: ns P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
The association between cell content and lymph node metastasis staging
To better understand the interplay between the tumor microenvironment and clinical outcomes, we examined the distribution of cellular abundance within tumors and their adjacent non-cancerous regions, with particular emphasis on the clinical significance of this indicator in pathological images. We present an analysis of cellular content in LUAD tissues that combines heatmap visualization with clinical data (Fig. 3). In addition to the standard T, N, and M pathological staging, age, and gender data shared by both the TCGA and DL cohorts, the heatmaps for the DL cohort also incorporate immune protein statistics. For TCGA, immune protein statistics were available for EGFR, Her-2, and p53, but the database did not include protein detection results for TOPIIA, p170, Ki67, and ALK. For these four markers, we therefore used mRNA expression data as a substitute and divided the data into high and low groups based on the median value. In the TCGA heatmaps, the abundance of epithelial cells, lymphocytes, macrophages, and neutrophils is markedly higher in N1 or more advanced-stage patients than in N0 patients, consistent with the presence of lymph node metastasis (Fig. 3a). To further explore the impact of N staging, we conducted differential analysis of cellular content by N stage in both the tumor-adjacent and tumor groups. The results showed that N0 differed significantly from the other stages, further supporting the critical role of N staging in tumor progression (Fig. 3b,c).
While the proportion of N1 and N2 stages in DL cohort’s dataset is comparatively limited, this trend aligns with the overarching observations from TCGA. The close correlation between N staging and patient survival is universally acknowledged. Specifically, patients classified as N0, indicating the absence of lymph node metastasis, often exhibit superior prognosis and higher survival rates. This significant discovery underscores the crucial importance of early detection and treatment of lymph node metastasis, ultimately aiming to provide patients with more effective therapeutic strategies and improved chances of survival.
A Comprehensive Assessment of Clinical Information and Cellular Distribution Patterns. (a) Heatmap of Cellular Content and Clinical Characteristics Distribution. (b) In the peritumoral regions of the TCGA cohort, an analysis was conducted to discern the differences in cellular content relative to N-stage. (c) In the tumor regions of the TCGA cohort, an analysis was conducted to discern the differences in cellular content relative to N-stage.
The cellular content exhibits a correlation with the activity of certain specific gene pathways
To elucidate the biological relevance of the observed abundances of epithelial cells, lymphocytes, macrophages, and neutrophils in pathological images, we first used the ssGSVA methodology to derive scores for canonical cancer hallmark pathways. We then assessed the correlation between these cellular abundances and the pathway scores (Fig. 4a). Both lymphocytes and neutrophils exhibit pronounced negative correlations with five key gene pathways: HALLMARK_HEME_METABOLISM, HALLMARK_PROTEIN_SECRETION, HALLMARK_ANDROGEN_RESPONSE, HALLMARK_UV_RESPONSE_DN, and HALLMARK_NOTCH_SIGNALING. The negative correlation with HALLMARK_HEME_METABOLISM is particularly striking for both cell types. The hallmark gene sets are linked to a range of biological processes, including oxidative phosphorylation, lipid metabolism, and apoptosis, and may be important in the development and progression of various diseases. Multiple studies have suggested that heme and its metabolites influence cellular proliferation, migration, and invasion. Specific enzymes and proteins within the heme metabolism pathway have emerged as potential therapeutic targets, and modulating their expression or function may hinder the growth and dissemination of cancer cells27,28. Dysregulation of protein secretion can impair the normal functioning and regulatory mechanisms of cells29. The expression of androgen receptor (AR) in lung cancer may correlate with disease advancement and the extent of lymph node metastasis.
Notably, in patients diagnosed with stage III lung cancer, the positive expression of AR is conspicuously elevated compared to those with stage I lung cancer30. The integration of Notch blockade with chemotherapy holds promise in efficiently suppressing tumor growth and postponing recurrence, thus hinting at the potential of the Notch signaling pathway as a therapeutic target in lung cancer management31. Lymphocytes and neutrophils are capable of recognizing signals via their surface receptors, initiating intracellular signaling pathways, and subsequently regulating the expression of specific genes32.
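The correlation screen between cellular abundance and pathway scores described in this section can be sketched as follows. The text does not state which correlation coefficient was used, so Pearson's r is assumed here purely for illustration (a rank-based coefficient such as Spearman's would be a drop-in alternative); the function names and the numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(abundance, pathway_scores):
    """Correlate every cell type with every pathway.

    abundance      : cell type -> per-patient abundance values
    pathway_scores : pathway name -> per-patient pathway scores (same patient order)
    Returns a dict keyed by (cell type, pathway).
    """
    return {
        (cell, pathway): pearson_r(values, scores)
        for cell, values in abundance.items()
        for pathway, scores in pathway_scores.items()
    }


# Hypothetical values for four patients, chosen only to show a negative relationship.
toy_abundance = {
    "lymphocytes": [0.10, 0.20, 0.30, 0.40],
    "neutrophils": [0.02, 0.05, 0.04, 0.08],
}
toy_pathways = {"HALLMARK_HEME_METABOLISM": [0.9, 0.7, 0.5, 0.3]}
toy_table = correlation_table(toy_abundance, toy_pathways)
```

A table of this shape is what a correlation heatmap such as Fig. 4a would visualize, with one cell per (cell type, pathway) pair.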
Analysis of the correlation between specific cellular components and gene pathway expressions. (a) The correlation between cellular content and signaling pathways. (b) Analysis of the differences in lymphocyte cellular content and designated signaling pathways. (c) Analysis of the differences in neutrophils cellular content and designated signaling pathways. (d) Linear regression analysis of specific cellular content and designated signaling pathways. (e) Survival curve.
For the five gene pathways, we divided the patients into high and low cell content groups, using the median cell content as the cutoff. The Wilcoxon rank-sum test revealed significant differences in the HALLMARK_HEME_METABOLISM and HALLMARK_PROTEIN_SECRETION pathway scores between the two groups for both lymphocytes and neutrophils (Fig. 4b,c). Specifically, the ssGSVA scores are markedly higher in the group with cell content below the median than in the group above it. Lymphocytes also show notable differences across the three other pathways, with pathway scores consistently higher in the group with lower cell content. Furthermore, linear regression analysis of these five pathways indicated a notable negative correlation between the pathway scores and the cell content (Fig. 4d). The survival data in Fig. 4e show that the HALLMARK_HEME_METABOLISM and HALLMARK_PROTEIN_SECRETION pathways have a noteworthy impact on patient survival outcomes. The HALLMARK_PROTEIN_SECRETION pathway exhibits a different trend in survival time from the HALLMARK_HEME_METABOLISM pathway, likely owing to their distinct functional mechanisms and interactions with gene expression. High expression of the HALLMARK_HEME_METABOLISM pathway enhances cellular survival by promoting heme metabolism and the function of related proteins, thereby extending survival time33. In contrast, high expression of the HALLMARK_PROTEIN_SECRETION pathway influences the tumor microenvironment, promoting tumor cell invasiveness and metastatic ability and thereby reducing survival34,35.
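The median split followed by a Wilcoxon rank-sum comparison can be sketched as below. This is an illustrative implementation using midranks for ties and a normal approximation for the two-sided p-value; statistical packages may instead report exact p-values for small samples or apply a continuity correction, so values can differ slightly. Assigning patients at exactly the median to the low group is an assumption, and the function names and numbers are hypothetical.

```python
import math
import statistics


def split_by_median(cell_content, scores):
    """Split pathway scores into low/high groups by the median cell content."""
    cutoff = statistics.median(cell_content)
    low = [s for c, s in zip(cell_content, scores) if c <= cutoff]
    high = [s for c, s in zip(cell_content, scores) if c > cutoff]
    return low, high


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation.

    Returns (U statistic for x, z score, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Hypothetical neutrophil content and pathway scores for eight patients.
toy_content = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
toy_scores = [0.9, 0.8, 0.85, 0.7, 0.4, 0.5, 0.3, 0.2]
low_scores, high_scores = split_by_median(toy_content, toy_scores)
u_stat, z_score, p_value = rank_sum_test(low_scores, high_scores)
```

In the toy data the low-content group has the higher pathway scores, mirroring the direction reported in the text, but the numbers carry no meaning beyond that.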
Identification of the key roles of neutrophil-related genes in LUAD
We used two widely adopted differential analysis methods, edgeR and DESeq2, to screen for differentially expressed genes in the dataset. By comparing the results of the two methods, we identified a set of differentially expressed genes consistently detected by both (Fig. 5a,b,e,f). To explore the relationship between these genes and cell content, we used the median cell content as a threshold to divide the samples into high and low groups. Using the screened genes, we then constructed heatmaps of gene expression patterns for lymphocyte and neutrophil content (Fig. 5d,h). The heatmaps show a notable trend: genes associated with higher neutrophil content exhibit notably higher expression levels. This finding suggests active cell differentiation and proliferation, offering insight into the biology of diseases including LUAD.
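Taking the genes detected by both methods amounts to a set intersection over the two result tables. The sketch below assumes each method's output has been reduced to a log2 fold change and an adjusted p-value per gene; the cutoffs (|log2FC| ≥ 1, adjusted P < 0.05), the additional same-direction requirement, the function names, and the gene names are illustrative assumptions rather than values taken from the text.

```python
def significant_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Genes passing both the fold-change and adjusted p-value cutoffs.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    """
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    }


def consensus_genes(results_a, results_b, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Genes significant in both result tables and changing in the same direction."""
    shared = (
        significant_genes(results_a, lfc_cutoff, padj_cutoff)
        & significant_genes(results_b, lfc_cutoff, padj_cutoff)
    )
    return sorted(
        gene for gene in shared
        if (results_a[gene][0] > 0) == (results_b[gene][0] > 0)
    )


# Placeholder result tables standing in for DESeq2 and edgeR output.
toy_deseq2 = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (1.5, 0.200),   # not significant here
    "GENE_C": (-1.8, 0.010),
}
toy_edger = {
    "GENE_A": (1.9, 0.004),
    "GENE_B": (1.6, 0.010),
    "GENE_C": (-1.2, 0.030),
    "GENE_D": (3.0, 0.001),   # absent from the other table
}
toy_consensus = consensus_genes(toy_deseq2, toy_edger)
```

Requiring agreement between two methods trades sensitivity for robustness: a gene flagged by only one method is dropped even if its evidence there is strong.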
Unraveling the biological mechanisms of LUAD disease linked to cellular activity via differentially expressed gene analysis. (a) Volcano plot depicting lymphocyte-related differential expression analysis via DESeq2. (b) Volcano plot depicting lymphocyte-related differential expression analysis via edgeR. (c) Common genes identified in lymphocytes through two differential gene expression analysis methods. (d) Heatmap constructed using the median lymphocyte content as a threshold. (e) Volcano plot depicting neutrophil-related differential expression analysis via DESeq2. (f) Volcano plot depicting neutrophil-related differential expression analysis via edgeR. (g) Common genes identified in neutrophils through two differential gene expression analysis methods. (h) Heatmap constructed using the median neutrophil content as a threshold. (i) COX regression analysis. (j) Survival curve based on the risk score calculated from neutrophil-focused COX regression analysis.
Having obtained the differentially expressed genes, we performed both univariate and multivariate regression analyses to assess the strength of their association with outcomes (Fig. 5i). We identified two genes that were significantly associated with neutrophils. These two genes were not only statistically significant in the regression analysis but are also biologically important. For instance, the protein encoded by the FABP7 gene is abundantly expressed in the nervous system, glial cells, and cancer cells36. This finding offers a new perspective on the functional mechanisms of these genes. To further assess the clinical significance of these two genes, we computed risk scores from their expression patterns and plotted survival curves (Fig. 5j). The risk scores were markedly associated with patients’ survival status, supporting the role of these genes in disease progression. To test the reliability and generalizability of our results, we conducted an independent validation using the same genes in the CPTAC dataset. The validation results were consistent with these findings, supporting their broader applicability and clinical relevance.
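The text does not give the formula for the risk score, so the sketch below assumes the common convention for Cox-based signatures: a linear combination of gene expression values weighted by the regression coefficients, followed by a median split into high-risk and low-risk groups. The coefficients, the expression values, the placeholder name `GENE_2` for the second (unnamed) gene, and the function names are all hypothetical.

```python
import statistics


def risk_scores(expression_by_patient, coefficients):
    """Linear risk score per patient: sum of coefficient * expression over genes."""
    return {
        patient: sum(coefficients[gene] * expr[gene] for gene in coefficients)
        for patient, expr in expression_by_patient.items()
    }


def stratify_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = statistics.median(scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in scores.items()
    }


# Hypothetical coefficients and expression values; not estimates from the study.
toy_coefficients = {"FABP7": 0.5, "GENE_2": -0.2}
toy_expression = {
    "P1": {"FABP7": 2.0, "GENE_2": 1.0},
    "P2": {"FABP7": 0.0, "GENE_2": 5.0},
    "P3": {"FABP7": 4.0, "GENE_2": 0.0},
    "P4": {"FABP7": 1.0, "GENE_2": 1.0},
}
toy_risk = risk_scores(toy_expression, toy_coefficients)
toy_groups = stratify_by_median(toy_risk)
```

The two resulting groups are what a survival curve such as Fig. 5j would compare, typically with one curve per risk group.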
Discussion
This study has successfully pinpointed crucial cell types that are intricately linked to the progression of LUAD, encompassing epithelial cells, lymphocytes, macrophages, and neutrophils. These cell types occupy diverse roles within the tumor microenvironment, collectively shaping the course of the disease. For instance, the aberrant proliferation and differentiation of epithelial cells are hallmark characteristics of LUAD37, while lymphocytes and macrophages are likely involved in tumor immune responses and inflammatory reactions38,39. Furthermore, the accumulation of neutrophils in the tumor microenvironment potentially signifies a tumor immune evasion mechanism40. These revelations offer a fresh perspective in our comprehension of the biological mechanisms underlying LUAD.
After a thorough analysis of gene pathways, we have identified numerous pathways, including HALLMARK_HEME_METABOLISM and HALLMARK_PROTEIN_SECRETION, that are intricately linked to the progression of LUAD. These pathways are pivotal in regulating tumor cell proliferation, migration, invasion, and immune evasion. Notably, we observed significant associations between specific gene pathways and cell types, particularly the correlation between the HALLMARK_HEME_METABOLISM pathway and neutrophils, which underscores the collaborative influence of cell types and gene pathways in LUAD progression. Utilizing differential gene screening and functional analysis, we have pinpointed several crucial genes, such as FABP7, that have significant relevance in LUAD’s development. These genes exert profound effects on tumor cell proliferation, apoptosis, migration, and invasion. By delving deeper into the functions and regulatory mechanisms of these genes, we aim to unravel the intricate biological mechanisms underlying LUAD, thereby laying the foundation for the development of novel diagnostic markers and therapeutic targets. As we progress, we will continue to explore the mechanisms of these cell types and gene pathways in LUAD, with the ultimate goal of providing patients with more effective treatment options.
Despite achieving significant findings, this study has its limitations. Firstly, the relatively small sample size may not comprehensively capture the heterogeneity of LUAD patients. Secondly, our focus has primarily been on gene expression levels, overlooking the analysis of protein levels, metabolic processes, and other crucial aspects. In the future, we plan to increase our sample size and incorporate a multi-omics approach to gain a more comprehensive understanding of the biological mechanisms underlying LUAD. Furthermore, we intend to delve deeper into the regulatory mechanisms of key cell types and gene pathways in LUAD, as well as their potential associations with other diseases, to develop more holistic strategies for its prevention and treatment.
Conclusion
In this study, we analyzed cell type distribution and gene expression variation in LUAD samples. Using Hover-Net-based image analysis, we segmented and quantified multiple cell types from pathological images, which provided the basis for the subsequent analyses. Based on the differences observed among cell types, we then explored the gene pathways linked to disease progression. Two key findings emerged. First, we found a close association between neutrophils and the initiation and progression of LUAD. This finding highlights the complex roles of immune cells within the tumor microenvironment, particularly their dual potential to either promote or suppress tumor growth. Second, we observed notable differences in patient survival associated with genes such as FABP7, suggesting their potential as prognostic biomarkers for assessing treatment responsiveness and survival. This result provides a theoretical basis for developing personalized therapeutic strategies based on gene expression profiles. Together, these findings deepen our understanding of the biological mechanisms underlying LUAD and support the search for novel diagnostic markers and therapeutic targets. Targeting the cell types and gene pathways associated with disease progression may enable more effective and tailored treatment strategies, ultimately improving patient prognosis and quality of life.
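To make the survival comparison concrete, the sketch below splits a cohort into high- and low-expression groups and estimates a Kaplan-Meier survival curve for each. It is a minimal illustration under stated assumptions: the median cutoff, the variable names, and the toy values are hypothetical and are not taken from the study, and the direction of the toy result implies nothing about the actual effect of FABP7.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up duration for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical toy cohort, NOT data from the study.
fabp7_expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
followup_months = [30, 24, 36, 10, 14, 20]
death_observed = [0, 1, 0, 1, 1, 0]

# Assumed grouping rule: expression above the cohort median counts as "high".
cutoff = median(fabp7_expression)
rows = list(zip(fabp7_expression, followup_months, death_observed))
high = [(t, e) for x, t, e in rows if x > cutoff]
low = [(t, e) for x, t, e in rows if x <= cutoff]

high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
low_curve = kaplan_meier([t for t, _ in low], [e for _, e in low])
print("high expression:", high_curve)
print("low expression:", low_curve)
```

The curves only describe each group. Deciding whether two groups differ requires a formal test such as the log-rank test, which this sketch does not implement.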
Data and code availability
All data and code used in this article are available on GitHub (https://github.com/lvyaping/LUAD_Cell).
Funding
This work was supported by the Wu Jieping Medical Foundation (No. 320.6750.2022.03.35).
Author information
Contributions
YZ and CG conceived this study. XZ and ZZ performed the study and experiments. YL, SZ, XZ and LZ collected the data. XZ and YZ wrote the manuscript. All authors contributed to the interpretation of data and to the revision of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, X., Zhang, ZH., Liu, YM. et al. Investigating lung cancer microenvironment from cell segmentation of pathological image and its application in prognostic stratification. Sci Rep 15, 1704 (2025). https://doi.org/10.1038/s41598-025-85532-y