Machine-learning and scRNA-Seq-based diagnostic and prognostic models illustrating survival and therapy response of lung adenocarcinoma

Abstract

Lung cancer is a leading cause of cancer-related mortality, and lung adenocarcinoma (LUAD) is its most prevalent subtype. Given the high clinical and cellular heterogeneity of LUAD, accurate diagnosis and prognosis are crucial to avoid overdiagnosis and overtreatment. Taking full advantage of scRNA-Seq data to resolve this tumor heterogeneity, we explored the overall landscape of the LUAD microenvironment. Using stage-specific tumor cell markers, we developed diagnostic and prognostic models with high sensitivity and specificity. The diagnostic model, built with a random forest algorithm on a thirteen-gene signature, achieved an accuracy of 96.4% and an AUC of 0.993; this performance was further supported by benchmarking against available models and scoring systems in independent cohorts. The prognostic model, formulated via Cox regression with a six-gene signature, effectively predicted overall survival, and higher risk scores were associated with increased fractions of cancer-associated fibroblasts and a higher likelihood of immune escape and T-cell exclusion. Subsequently, two nomograms were developed to predict survival and drug responses, facilitating the integration of these models into clinical practice. Overall, this study underscores the potential of our models for efficient, rapid, and cost-effective diagnosis and prognosis of LUAD, adaptable to multiple expression profiling platforms and quantification methods.
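The abstract summarizes the diagnostic model by its accuracy (96.4%) and AUC (0.993). The sketch below is a minimal, hedged illustration of what those two metrics measure, written in plain Python; it is not the authors' pipeline (their final model is in the GitHub repository listed under Data availability). The labels and scores are invented toy values, the 0.5 cut-off is an assumption, and the score stands in for a classifier output such as the fraction of random-forest trees voting "tumor".

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive (tumor) sample
    scores higher than a randomly chosen negative (normal) one; ties count 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = LUAD tumor, 0 = normal lung tissue.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores, e.g. the fraction of trees voting "tumor".
votes = [0.95, 0.80, 0.65, 0.40, 0.55, 0.20, 0.10, 0.05]
# Assumed decision threshold of 0.5.
preds = [1 if v >= 0.5 else 0 for v in votes]

print(accuracy(labels, preds))  # 0.75
print(roc_auc(labels, votes))   # 0.9375
```

With these toy numbers, accuracy is 0.75 while the AUC is 0.9375: accuracy depends on the chosen cut-off, whereas AUC summarizes how well the scores rank tumors above normals across all cut-offs, which is one reason both are usually reported. The pairwise loop is quadratic in sample size; rank-based formulas give the same value more efficiently on large cohorts.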

Fig. 1: The landscape of the LUAD tumor microenvironment.
Fig. 2: Diagnostic model construction and corresponding evaluations.
Fig. 3: Prognostic model construction and corresponding evaluations.
Fig. 4: Significant correlations between the prognostic model and clinical traits, therapy responses, as well as immune infiltration.
Fig. 5: Validations of prognosis-related genes.
Fig. 6: Practical clinical applications of the prognostic model.
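The abstract and Fig. 3 describe a prognostic model built by Cox regression on a six-gene signature, whose risk score predicted overall survival. In a fitted Cox model, a patient's risk score is commonly taken as the linear predictor (the sum of each gene's coefficient multiplied by its expression), and discrimination is often summarized with Harrell's concordance index. The plain-Python sketch below assumes that convention, a median cut-off for high/low risk groups, and invented gene names, coefficients, and patients; the article's actual genes, coefficients, and cut-off are not given in this excerpt.

```python
from statistics import median
from typing import Dict, List, Sequence

# Hypothetical six-gene signature: gene names and coefficients are placeholders,
# NOT the genes or coefficients fitted in the article.
COEFS: Dict[str, float] = {
    "GENE_A": 0.42, "GENE_B": 0.31, "GENE_C": 0.18,
    "GENE_D": -0.25, "GENE_E": -0.12, "GENE_F": 0.09,
}


def risk_score(expr: Dict[str, float], coefs: Dict[str, float] = COEFS) -> float:
    """Cox linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' risk; scores equal to the median count as 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def harrell_c(times: Sequence[float], events: Sequence[int],
              scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) before patient j's follow-up time; pairs with tied times
    are skipped in this simplified version. The pair is concordant when i also
    has the higher risk score; tied scores count as 1/2.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# One invented patient's signature-gene expression values.
patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0,
           "GENE_D": 1.0, "GENE_E": 0.0, "GENE_F": 0.0}
print(round(risk_score(patient), 2))     # 0.9

# Toy cohort (invented): follow-up in months, 1 = death observed, 0 = censored,
# and precomputed risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1]

print(split_by_median(scores))           # ['high', 'low', 'high', 'low']
print(harrell_c(times, events, scores))  # 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by risk. In practice, the R survival package cited in the reference list fits the Cox model and reports concordance directly, so a hand-written loop like this serves only to show what the statistic counts.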

Data availability

The LUAD scRNA-Seq profiles (GSE131907 [16]) were obtained from GEO. Expression profiles and clinical traits of TCGA-LUAD were obtained from the UCSC Xena browser (https://xenabrowser.net/datapages/). Additionally, the microarray datasets GSE7670 [27], GSE102287 [25], GSE30219 [24], GSE19804 [28], GSE10072 [26], GSE31210 [33], GSE13213 [34], and GSE72094 [35] were obtained from GEO. Drug responses and the corresponding clinical data were obtained from a previous study [53]. The final diagnostic model is available from the GitHub repository (https://github.com/univerchen/LUAD).

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71:209–49.

  2. Spella M, Stathopoulos GT. Immune Resistance in Lung Adenocarcinoma. Cancers (Basel). 2021;13:384.

  3. Senosain MF, Massion PP. Intratumor Heterogeneity in Early Lung Adenocarcinoma. Front Oncol. 2020;10:349.

  4. Seguin L, Durandy M, Feral CC. Lung Adenocarcinoma Tumor Origin: A Guide for Personalized Medicine. Cancers (Basel). 2022;14:1759.

  5. Diaz-Cano SJ. Tumor heterogeneity: mechanisms and bases for a reliable application of molecular marker design. Int J Mol Sci. 2012;13:1951–2011.

  6. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66:7–30.

  7. Chatterjee S. Artefacts in histopathology. J Oral Maxillofac Pathol. 2014;18:S111–6.

  8. Hillman H. Limitations of clinical and biological histology. Med Hypotheses. 2000;54:553–64.

  9. Taqi SA, Sami SA, Sami LB, Zaki SA. A review of artifacts in histopathology. J Oral Maxillofac Pathol. 2018;22:279.

  10. D’Ambrosi S, Giannoukakos S, Antunes-Ferreira M, Pedraz-Valdunciel C, Bracht JWP, Potie N, et al. Combinatorial Blood Platelets-Derived circRNA and mRNA Signature for Early-Stage Lung Cancer Detection. Int J Mol Sci. 2023;24:4881.

  11. Ye XD, Zhang N, Jin YX, Xu B, Guo CY, Wang XQ, et al. Dramatically changed immune-related molecules as early diagnostic biomarkers of non-small cell lung cancer. FEBS J. 2020;287:783–99.

  12. Freitas C, Sousa C, Machado F, Serino M, Santos V, Cruz-Martins N, et al. The Role of Liquid Biopsy in Early Diagnosis of Lung Cancer. Front Oncol. 2021;11:634316.

  13. Li Y, Sun N, Lu Z, Sun S, Huang J, Chen Z, et al. Prognostic alternative mRNA splicing signature in non-small cell lung cancer. Cancer Lett. 2017;393:40–51.

  14. Li R, Yang YE, Yin YH, Zhang MY, Li H, Qu YQ. Methylation and transcriptome analysis reveal lung adenocarcinoma-specific diagnostic biomarkers. J Transl Med. 2019;17:324.

  15. Sun L, Zhang Z, Yao Y, Li WY, Gu J. Analysis of expression differences of immune genes in non-small cell lung cancer based on TCGA and ImmPort data sets and the application of a prognostic model. Ann Transl Med. 2020;8:550.

  16. Kim N, Kim HK, Lee K, Hong Y, Cho JH, Choi JW, et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat Commun. 2020;11:2285.

  17. Ma KY, Schonnesen AA, Brock A, Van Den Berg C, Eckhardt SG, Liu Z, et al. Single-cell RNA sequencing of lung adenocarcinoma reveals heterogeneity of immune response-related genes. JCI Insight. 2019;4:121387.

  18. Zavidij O, Haradhvala NJ, Mouhieddine TH, Sklavenitis-Pistofidis R, Cai S, Reidy M, et al. Single-cell RNA sequencing reveals compromised immune microenvironment in precursor stages of multiple myeloma. Nat Cancer. 2020;1:493–506.

  19. Lu T, Yang X, Shi Y, Zhao M, Bi G, Liang J, et al. Single-cell transcriptome atlas of lung adenocarcinoma featured with ground glass nodules. Cell Discov. 2020;6:69.

  20. Olsen TK, Baryawno N. Introduction to Single-Cell RNA Sequencing. Curr Protoc Mol Biol. 2018;122:e57.

  21. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29.

  22. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.

  23. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18–22.

  24. Rousseaux S, Debernardi A, Jacquiau B, Vitte AL, Vesin A, Nagy-Mignotte H, et al. Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers. Sci Transl Med. 2013;5:186ra66.

  25. Mitchell KA, Zingone A, Toulabi L, Boeckelman J, Ryan BM. Comparative Transcriptome Profiling Reveals Coding and Noncoding RNA Differences in NSCLC from African Americans and European Americans. Clin Cancer Res. 2017;23:7412–25.

  26. Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One. 2008;3:e1651.

  27. Su LJ, Chang CW, Wu YC, Chen KC, Lin CJ, Liang SC, et al. Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme. BMC Genomics. 2007;8:140.

  28. Lu TP, Tsai MH, Lee JM, Hsu CP, Chen PC, Lin CW, et al. Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women. Cancer Epidemiol Biomarkers Prev. 2010;19:2590–7.

  29. Sheng M, Xie X, Wang J, Gu W. A Pathway-Based Strategy to Identify Biomarkers for Lung Cancer Diagnosis and Prognosis. Evol Bioinform Online. 2019;15:1176934319838494.

  30. Zhang BZ, Wang YD, Zhou XZ, Zhang Z, Ju HY, Diao XQ, et al. Construction of a Prognostic and Early Diagnosis Model for LUAD Based on Necroptosis Gene Signature and Exploration of Immunotherapy Potential. Cancers (Basel). 2022;14:5153.

  31. Chen Q, Wang XY, Hu J. Systematically integrative analysis identifies diagnostic and prognostic candidates and small-molecule drugs for lung adenocarcinoma. Transl Cancer Res. 2021;10:3619–46.

  32. Cai SH, Guo XT, Huang CJ, Deng YJ, Du LD, Liu WY, et al. Integrative analysis and experiments to explore angiogenesis regulators correlated with poor prognosis, immune infiltration and cancer progression in lung adenocarcinoma. J Transl Med. 2021;19:361.

  33. Okayama H, Kohno T, Ishii Y, Shimada Y, Shiraishi K, Iwakawa R, et al. Identification of genes upregulated in ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas. Cancer Res. 2012;72:100–11.

  34. Tomida S, Takeuchi T, Shimada Y, Arima C, Matsuo K, Mitsudomi T, et al. Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis. J Clin Oncol. 2009;27:2793–9.

  35. Schabath MB, Welsh EA, Fulp WJ, Chen L, Teer JK, Thompson ZJ, et al. Differential association of STK11 and TP53 with KRAS mutation-associated gene expression, proliferation and immune surveillance in lung adenocarcinoma. Oncogene. 2016;35:3209–16.

  36. Therneau T. A package for survival analysis in S. R package version, 2015. 2.

  37. Kassambara A, Kosinski M, Biecek P, Fabian S. survminer: Drawing Survival Curves using 'ggplot2'. R package version 0.4.9; 2021. https://CRAN.R-project.org/package=survminer.

  38. Sturm G, Finotello F, List M. Immunedeconv: An R Package for Unified Access to Computational Methods for Estimating Immune Cell Fractions from Bulk RNA-Sequencing Data. Methods Mol Biol. 2020;2120:223–32.

  39. Yoshihara K, Kim H, Verhaak R. estimate: Estimate of Stromal and Immune Cells in Malignant Tumor Tissues from Expression Data. R package version, 2016. 1: p. r21.

  40. Fu J, Li K, Zhang W, Wan C, Zhang J, Jiang P, et al. Large-scale public data reuse to model immunotherapy response and resistance. Genome Med. 2020;12:21.

  41. Jiang P, Gu S, Pan D, Fu J, Sahu A, Hu X, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24:1550–8.

  42. Wong KY, Cheung AH, Chen B, Chan WN, Yu J, Lo KW, et al. Cancer-associated fibroblasts in nonsmall cell lung cancer: From molecular mechanisms to clinical implications. Int J Cancer. 2022;151:1195–215.

  43. Xiang H, Ramil CP, Hai J, Zhang C, Wang H, Watkins AA, et al. Cancer-Associated Fibroblasts Promote Immunosuppression by Inducing ROS-Generating Monocytic MDSCs in Lung Squamous Cell Carcinoma. Cancer Immunol Res. 2020;8:436–50.

  44. Wang L, Cao L, Wang H, Liu B, Zhang Q, Meng Z, et al. Cancer-associated fibroblasts enhance metastatic potential of lung cancer cells through IL-6/STAT3 signaling pathway. Oncotarget. 2017;8:76116–28.

  45. Scholl C, Frohling S, Dunn IF, Schinzel AC, Barbie DA, Kim SY, et al. Synthetic Lethal Interaction between Oncogenic KRAS Dependency and STK33 Suppression in Human Cancer Cells. Cell. 2009;137:821–34.

  46. The Human Protein Atlas. 2022; Available from: http://www.proteinatlas.org.

  47. Thul PJ, Akesson L, Wiking M, Mahdessian D, Geladaki A, Ait Blal H, et al. A subcellular map of the human proteome. Science. 2017;356:eaal3321.

  48. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347:1260419.

  49. Wang T, Hao D, Yang S, Ma J, Yang W, Zhu Y, et al. miR-211 facilitates platinum chemosensitivity by blocking the DNA damage response (DDR) in ovarian cancer. Cell Death Dis. 2019;10:495.

  50. Zhao DD, Zhao X, Li WT. Identification of differentially expressed metastatic genes and their signatures to predict the overall survival of uveal melanoma patients by bioinformatics analysis. Int J Ophthalmol. 2020;13:1046–53.

  51. Zhang D, Park D, Zhong Y, Lu Y, Rycaj K, Gong S, et al. Stem cell and neurogenic gene-expression profiles link prostate basal cells to aggressive prostate cancer. Nat Commun. 2016;7:10798.

  52. Wang YX, Marino-Enriquez A, Bennett RR, Zhu MJ, Shen YP, Eilers G, et al. Dystrophin is a tumor suppressor in human cancers with myogenic programs. Nat Genet. 2014;46:601–6.

  53. Ding Z, Zu S, Gu J. Evaluating the molecule-based prediction of clinical drug responses in cancer. Bioinformatics. 2016;32:2891–5.

Acknowledgements

This work was funded by the National Key R&D Program of China (2022YFA1303200 to XS), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB29030104 to TJ), the National Natural Science Foundation of China (Grant 82272301 to TJ), and the Fundamental Research Funds for the Central Universities (WK9100000001 to TJ).

Author information

Contributions

Qingyu Cheng: Conceptualization, Data curation, Investigation, Visualization, Writing - original draft. Weidong Zhao: Writing - review & editing. Xiaoyuan Song: Project administration, Writing - review & editing. Tengchuan Jin: Conceptualization, Project administration, Supervision, Writing - review & editing.

Corresponding authors

Correspondence to Xiaoyuan Song or Tengchuan Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cheng, Q., Zhao, W., Song, X. et al. Machine-learning and scRNA-Seq-based diagnostic and prognostic models illustrating survival and therapy response of lung adenocarcinoma. Genes Immun 25, 356–366 (2024). https://doi.org/10.1038/s41435-024-00289-0

  • DOI: https://doi.org/10.1038/s41435-024-00289-0
