Advancement of AI in oncology

Artificial intelligence (AI) refers to computational systems that can perform tasks typically requiring human intelligence. AI has existed for decades, and deep learning is now its dominant approach. Deep learning excels at discerning intricate patterns, especially in big data, and can deliver quantitative assessments automatically1. Supervised learning methods, such as convolutional neural networks (CNNs), have been widely applied in image analysis, although they rely on intensive annotation. Weakly supervised methods, including multiple-instance learning and vision transformers (ViTs), address annotation limitations while enabling large-scale data analysis. Unsupervised methods, like self-supervised learning, further uncover hidden patterns in unlabeled data2. Deep learning has enabled AI to transition from theoretical concepts to practical applications in medicine3,4,5,6. In the 2010s, the remarkable performance of CNNs in dermoscopic diagnosis of skin cancer established them as a benchmark in medical image analysis7. In the 2020s, the transformer, an architecture that revolutionized natural language processing8, has also enhanced performance in medical image analysis when integrated with CNNs9. It also facilitates the integration of multimodal data types, which was previously difficult to achieve with CNNs alone. Recently, multimodal AI has been delivering more comprehensive and accurate insights across various applications. As the backbone of generalist AI, this technology enables the integration of multimodal medical data and supports multitasking capabilities. Generalist medical AI holds significant potential to revolutionize cancer treatment and research10.

AI is progressively being integrated into various facets of oncology. Radiology plays a crucial role in diagnosis and in monitoring tumors throughout the treatment process. The rapidly increasing volume of scans has significantly increased the workload for radiologists, leading to burnout and reduced interpretive accuracy11. However, the vast amount of data creates an ideal environment for AI development. Radiomics refers to the extraction of large numbers of quantitative features from radiological images using advanced computational algorithms. Deep learning methods such as ResNet, U-Net, and YOLO offer significant advantages for feature extraction by automatically learning hierarchical, high-dimensional representations from raw imaging data, enabling more robust and informative characterization of tumor phenotype than handcrafted features12. Radiomics ushers in a new era of “virtual biopsy”, enabling the extraction of clinically relevant pathology and genomic information from radiology scans across different lesion sites and at multiple time points13. With advances in GPU computing speed, the clinical integration of digital pathology platforms has enabled the real-time application of AI to whole-slide images. Pathology AI has tremendous potential to reshape pathology practice by streamlining diagnostic tasks, enhancing diagnostic precision, and improving workflow efficiency, freeing pathologists to focus on intellectually challenging diagnoses14. Large language models (LLMs) are being tested on summarizing and processing medical text, including radiology reports15 and pathology reports16, as well as serving as medical chatbots that provide personalized treatment recommendations17. Although the clinical application of AI remains in its early stages due to regulatory and validation challenges, it offers a promising future for oncology.

AI in lung cancer

Lung cancer continues to be the leading cause of cancer-related deaths worldwide, with an estimated 1.8 million deaths per year18. Advances in understanding biological mechanisms and evolving treatment strategies have significantly improved patient survival rates. However, the vast amount of clinical and research data remains insufficiently integrated and analyzed. For instance, although various non-smoking-related risk factors for lung cancer have been identified, they have not yet been integrated into a reliable model for estimating risk in healthy individuals. This gap makes it difficult to identify the populations that would benefit most from lung cancer screening. Predicting the malignancy risk of intermediate pulmonary nodules on CT scans is also difficult. Data-driven methods are needed to guide treatment selection for each patient based on biomarkers. With high-quality real-world data, these clinical challenges could be addressed by deep learning. AI has the potential to revolutionize lung cancer management across multiple domains, including prevention, screening, diagnosis, prognosis, treatment, and monitoring (Fig. 1). Representative studies are summarized in Table 1 with tasks, number of cases, data modality, algorithm, and performance.

Fig. 1: AI applications in lung cancer care pathway.
Created in BioRender. Zhu, E. (2025) https://BioRender.com/f83y797.

Table 1 Summary of representative AI studies in lung cancer

In this narrative review, we explored the translational potential of AI in lung cancer, focusing on urgent clinical challenges from the perspective of oncologists. Rather than detailing technical methodologies, we emphasized the functional capabilities of AI—what it can currently do and what it may enable in the future—in a manner accessible to clinicians. We further outlined key barriers to clinical translation, specifically for readers who are not data scientists. References for this review were identified through searches of MEDLINE, PubMed, and citations from relevant articles using the term “lung cancer” combined with keywords such as “Artificial Intelligence,” “AI,” “Machine Learning,” “Deep Learning,” “Radiomics,” and “Large Language Model.” The search primarily focused on English-language publications from January 1, 2014, to December 21, 2024.

Prevention

Tobacco remains the primary etiological factor for lung cancer, and smoking cessation or control is the most effective strategy for reducing risk. However, the global population of cigarette smokers is estimated to remain close to 1 billion19. To support smoking cessation, AI can analyze images of a smoker’s daily environment to identify contexts associated with smoking cravings20,21, and can monitor smoking behavior and assess puff topography by analyzing signals from wearable sensors22. However, addressing smoking at the individual level alone is insufficient. Tobacco control is a major public health challenge because smoking prevalence is influenced by various factors, including being male, having a lower socioeconomic status, experiencing significant physical or mental health conditions, identifying as lesbian, gay, bisexual, or transgender, or belonging to certain racial and ethnic groups23. The underlying causes of smoking cravings and effective interventions for these populations are not yet fully understood. To tackle this issue, public health researchers must work closely with AI experts to harness the potential of advanced AI tools in supporting smoking cessation efforts.

Screening

The U.S. Preventive Services Task Force recommends annual low-dose CT (LDCT) screening for people aged 50–80 years with a smoking history of at least 20 pack-years. However, this is an imprecise criterion. To date, few studies have directly compared pack-year smoking history and age with other measures. Some evidence suggests that a 20-year smoking duration is a better criterion than 20 pack-years24. The risk of lung cancer is influenced by a range of factors beyond age and smoking, including ethnicity, genetics, and environmental exposures25,26,27. Traditional linear regression models often struggle to process and interpret such complex, high-dimensional data effectively. To address this limitation, AI tools have been tested for identifying individuals at high risk of developing lung cancer. These tools leverage hidden patterns in routine clinical data28, chest X-rays29, extensive web search histories30, and survey responses31.
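To make the pack-year criterion concrete, the following is a minimal sketch of how such a rule could be coded. The function names are hypothetical. The age range and 20 pack-year threshold come from the text above; the additional condition of currently smoking or having quit within the past 15 years is taken from the 2021 USPSTF recommendation and is an assumption relative to this review, which does not state it.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = average packs smoked per day multiplied by years of smoking."""
    return packs_per_day * years_smoked


def meets_screening_criteria(age: float, packs_per_day: float,
                             years_smoked: float,
                             years_since_quit: float = 0.0) -> bool:
    """Simplified eligibility rule (illustrative only, not clinical software).

    Assumptions: age 50-80 and >= 20 pack-years (from the text);
    current smoker or quit within 15 years (2021 USPSTF statement,
    not stated in this review).
    """
    return (
        50 <= age <= 80
        and pack_years(packs_per_day, years_smoked) >= 20
        and years_since_quit <= 15
    )


# Two people with the same 20-year smoking duration but different intensity.
heavy = meets_screening_criteria(age=55, packs_per_day=1.0, years_smoked=20)
light = meets_screening_criteria(age=55, packs_per_day=0.5, years_smoked=20)
print(heavy, light)
```

The two example smokers share a 20-year smoking duration, yet only the one-pack-per-day smoker reaches 20 pack-years. This is the kind of case in which a duration-based criterion and a pack-year criterion disagree.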

A significant challenge with current screening criteria is the high false-positive rate, as many detected nodules are either benign or of intermediate risk32. This results in unnecessary follow-ups and patient anxiety. While guidelines exist to help radiologists estimate malignancy risk, these assessments often remain subjective and heavily reliant on individual expertise. Deep learning-powered algorithms have been tested for automating cancer diagnosis, demonstrating promising performance in lung cancer screening. For instance, a deep learning algorithm developed by Google can analyze both current and prior CT scans of a patient. The model achieved state-of-the-art performance (94.4% area under the curve) on 6716 National Lung Screening Trial (NLST) cases. When prior CT imaging was not available, it outperformed all six radiologists in the comparison, with absolute reductions of 11% in false positives and 5% in false negatives; when prior CT imaging was available, its performance was on par with that of the same radiologists33. Another deep learning-based AI algorithm, Sybil, has demonstrated robust performance in predicting the future risk of developing lung cancer from a single LDCT. Sybil achieved areas under the receiver operating characteristic curve of 0.92 for lung cancer prediction at 1 year and 0.75 at 6 years on NLST34. Such AI can be used to personalize screening intervals and optimize resource utilization. Moreover, AI has enhanced interobserver agreement among radiologists for both risk stratification and management recommendations35. Patients are also exposed to substantial radiation during follow-up. Deep learning can revolutionize image reconstruction by enabling lower doses of contrast agents, reducing ionizing radiation, and shortening image acquisition times. This advancement facilitates the application of ultra-low-dose CT for large-scale lung cancer screening36.
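Because the area under the receiver operating characteristic curve (AUC) is the main performance measure quoted in this review, a short sketch of what it measures may help. The AUC equals the probability that a randomly chosen cancer case receives a higher model score than a randomly chosen non-cancer case, with ties counted as one half. The scores and labels below are invented for illustration and are not data from any cited study.

```python
def roc_auc(scores, labels):
    """AUC computed as the fraction of (positive, negative) pairs ranked correctly.

    scores: model risk scores; labels: 1 = cancer, 0 = no cancer.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical scores for six screening cases (three cancers, three non-cancers).
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(toy_scores, toy_labels), 3))
```

In this toy example, eight of the nine cancer/non-cancer pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.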

A meta-analysis showed that AI-based LDCT screening tools achieve high sensitivity (94.6%) and somewhat lower specificity (93.6%), translating to false-negative rates of ~5.4% and false-positive rates of ~6.4%37. AI performance can also be skewed by biases in training datasets, such as variations in image quality, scan conditions, and vendor platforms, leading to inconsistent detection rates across institutions38. To mitigate these limitations, models should be developed and validated on large, multi-center, and demographically diverse cohorts, with systematic bias-audit frameworks and prospective external testing prior to clinical deployment.
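The arithmetic behind these rates, and the reason a false-positive rate of about 6% still matters in screening, can be sketched as follows. The sensitivity and specificity are the pooled values quoted above; the 1% cancer prevalence is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
def screening_rates(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Derive error rates and positive predictive value (PPV) from test accuracy.

    prevalence is the assumed fraction of screened people who have cancer.
    """
    false_negative_rate = 1.0 - sensitivity
    false_positive_rate = 1.0 - specificity
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    ppv = true_positives / (true_positives + false_positives)
    return {
        "false_negative_rate": false_negative_rate,
        "false_positive_rate": false_positive_rate,
        "ppv": ppv,
    }


# Pooled values from the text; prevalence of 1% is a hypothetical assumption.
rates = screening_rates(sensitivity=0.946, specificity=0.936, prevalence=0.01)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Under the assumed 1% prevalence, only roughly 13% of positive AI calls would correspond to actual cancers, because the few true positives are outnumbered by false positives drawn from the much larger cancer-free group. The exact figure depends entirely on the prevalence in the screened population.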

Screening with LDCT presents opportunities for AI to simultaneously detect other smoking-related diseases, such as chronic obstructive pulmonary disease39 and cardiovascular disease40. Beyond CT scans, deep learning has shown superior performance compared to radiologists in detecting lung nodules on X-rays41,42. In addition to radiographic imaging, AI can also leverage blood biomarkers, including ctDNA43 and plasma protein markers44, to aid in the early detection of lung cancer.

Diagnosis

Lung cancer is a heterogeneous disease with diverse clinicopathological characteristics. AI could improve diagnosis in three domains by analyzing radiomics, digital pathology, and genomic sequencing data.

Radiomics

Advances in radiomics and deep learning technologies allow clinicians to derive comprehensive pathological insights from routine radiology scans prior to final pathological confirmation. This approach has been shown to help differentiate lung cancer from benign lesions45, primary from metastatic lung lesions46, malignant from benign pleural effusions47, adenocarcinoma from squamous cell carcinoma48, and even subtypes of adenocarcinoma49. It has also been shown to predict driver mutations, such as EGFR 19Del (exon 19) and L858R (exon 21)50, T790M (exon 20)51, and ALK rearrangement52. Radiomics could also predict PD-L1 expression53 and CD8+ T-cell infiltration54 to guide immunotherapy.

Digital pathology

The field of computational pathology, despite being initially hindered by the high costs associated with digitalization, has gained significant momentum due to advances in AI. AI enabled the automatic diagnosis of lung cancer across various specimen types, including H&E slides55,56, cryosection tissue slides57, cytopathology samples58, and lymph node biopsy59. It also demonstrated the ability to predict driver mutations55, PD-L1 expression60 and tumor-infiltrating lymphocytes61 from H&E slides.

Genomic sequencing

Advances in predictive biomarker discovery have paved the way for targeted therapies and immunotherapies in lung cancer treatment. AI enhances somatic mutation identification in next-generation sequencing, outperforming standard genetic analysis approaches62,63. By decoding genomic and transcriptomic data, AI can accurately determine the cell of origin for cancers of unknown primary, aiding in diagnosis and treatment planning64. In the context of immune biomarkers, AI can predict further biomarkers such as tumor mutation burden65, neoantigens66, and T-cell receptor-antigen binding specificity67.

Prognosis

Lung cancer is primarily staged using the tumor, node, and metastasis (TNM) classification system. In some cases, patients may require additional invasive procedures, such as endobronchial ultrasound biopsy, to assess nodal involvement. AI has emerged as a valuable tool for integrating multimodal data, including medical records, radiology, pathology, and molecular data, to enhance staging accuracy and risk stratification. For instance, AI models have been tested in pilot studies on routine radiology scans to predict the invasiveness of adenocarcinoma68,69 and distant metastasis70, as well as to identify novel imaging subtypes71,72. In addition, AI automates the extraction of intricate features from medical images, offering new insights for prognostic stratification. In radiology, AI predictions were significantly associated with overall survival (AUC 0.70–0.71), outperforming predictions based on clinical features (AUC 0.58–0.66)73. In pathology, AI predicted overall survival with AUC 0.64–0.85, outperforming predictions based on clinical features (AUC 0.52–0.84)74.

Treatment

Surgery

Sublobar resection is not inferior to lobectomy in patients with peripheral, node-negative non-small-cell lung cancer (NSCLC) measuring 2 cm or smaller. However, node-negative status can only be definitively confirmed after surgery. To address this limitation, a deep learning-based AI model was developed to predict lymph node metastasis preoperatively. The model demonstrated strong performance and has the potential to help surgeons accurately identify patients who are suitable candidates for sublobar resection75. Sublobar resection is also commonly reserved for patients with compromised pulmonary function. AI working alongside pulmonologists could support the interpretation of pulmonary function tests (PFTs), which is crucial for assessing a patient’s surgical candidacy76. When patients are selected for segmentectomy, the variability and complexity of intrathoracic anatomy present a significant challenge. Virtual reality systems have been developed to reconstruct thoracic anatomy, aiding in preoperative surgical planning and potentially reducing the duration of complex surgeries77,78. During the final stages of surgery, AI can assist by detecting air-leak sites through the analysis of surgical videos, even in deflated lungs. This capability enhances the surgeon’s ability to address potential complications before closing the thoracic cavity79.

Radiotherapy

Radiotherapy is a critical therapeutic approach, especially for locally advanced lung cancer, where it still holds curative potential. Accurate delineation of the gross tumor volume and consistent contouring of organs at risk are essential yet challenging. AI-based algorithms have been tested for auto-contouring and radiotherapy planning, which is especially useful in low- and middle-income countries80. Moreover, radiomic models have been used to predict lung cancer recurrence69,70, cardiotoxicity81, and lung toxicity82 after radiotherapy.

Systemic therapy

NSCLC was long considered poorly immunogenic. However, advances in immunotherapy have identified two key immune checkpoints relevant to NSCLC: CTLA-4 and the PD-1/PD-L1 axis. Clinical studies have consistently demonstrated that anti-PD-1 and anti-PD-L1 antibodies significantly improve patient survival compared to chemotherapy, marking a breakthrough in treatment strategies for this disease83. PD-L1 expression is the primary biomarker for predicting treatment response to immune checkpoint inhibitors. However, responses have also been observed in patients without detectable PD-L1 expression in their tumors. This phenomenon is likely attributable to the heterogeneity of PD-L1 expression both within a single tumor (intratumoral) and among different tumors (intertumoral). Such variability introduces inherent bias when relying on biopsy samples, which may not accurately represent the overall tumor microenvironment. Beyond traditional immune biomarkers, radiomic biomarkers have provided early indicators of survival in patients84,85. These markers can also predict adverse outcomes of immunotherapy, such as hyperprogression86, cachexia87, and immunotherapy-induced pneumonitis53. Deep learning models have captured additional imaging patterns beyond known hand-crafted features, enhancing predictive accuracy88. In addition, blood biomarkers, such as ctDNA89 and cytokines90, have been valuable inputs for AI in predicting responses to immunotherapy. AI’s capacity to integrate multimodal data (including radiomics, pathomics, and genomics) into comprehensive big data analyses holds great promise for identifying immunotherapy responders, ultimately advancing personalized treatment strategies for lung cancer91.

EGFR mutations are the most common targetable driver mutations in lung adenocarcinoma. Third-generation EGFR-TKIs have significantly extended patients’ survival. However, treatment resistance remains a challenge. Combination strategies involving chemotherapy92 or VEGF inhibitors93 have been shown to improve the durability of response to EGFR-TKIs. Despite these benefits, patients receiving combination therapies experience a higher incidence of severe adverse events. Therefore, oncologists must carefully identify and select the high-risk patients who are most likely to benefit from these approaches. Two studies found that AI could predict progression risk and thereby identify such high-risk patients94,95.

Clinical decision support systems

AI holds great potential in clinical decision support by integrating radiology, pathology, genomics, and clinical data96. Clinical decision support systems, when effectively integrated with AI, can provide physicians with personalized treatment information97. Some studies have explored the application of AI tools such as Watson for Oncology (WFO) to decision-making for patients with lung cancer. Preliminary results indicated AI’s potential to adhere to clinical guidelines and assist in decision-making. However, a relatively high proportion of cases are still not supported by WFO, and it needs to learn the regional characteristics of patients98,99.

Monitoring

Currently, the evaluation of treatment response and disease progression in lung cancer primarily relies on lesion size, as outlined in the Response Evaluation Criteria in Solid Tumors (RECIST). Early differentiation between responders and non-responders is critical for timely adjustments to treatment regimens. However, the validity of RECIST has been questioned in the context of targeted therapies and immunotherapies due to phenomena such as pseudoprogression100. Noninvasive radiomic biomarkers can predict pseudoprogression and hyperprogression in patients with lung cancer with AUC of 0.88 (pseudoprogression vs. hyperprogression) and 0.87 (hyperprogression vs. progression)101. Additionally, response assessment is a time-intensive process requiring significant expertise and is subject to high intra- and inter-reader variability. Deep learning has shown promise in automating this process. Applications include automated RECIST evaluations for patients receiving immunotherapy102.
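To illustrate why a purely size-based rule can mislabel pseudoprogression, the sketch below encodes a simplified version of the RECIST target-lesion categories. The thresholds (at least a 30% decrease from baseline for partial response; at least a 20% and 5 mm increase from the smallest recorded sum for progression) are taken from RECIST version 1.1 rather than from this review. The function name is hypothetical, and the sketch ignores new lesions, non-target lesions, and the short-axis rules for lymph nodes.

```python
def recist_target_response(baseline_sum_mm, nadir_sum_mm, current_sum_mm):
    """Classify target-lesion response from sums of lesion diameters (mm).

    Simplified from RECIST 1.1 (thresholds not stated in this review):
      CR: all target lesions have disappeared
      PD: >= 20% and >= 5 mm increase over the smallest sum on study (nadir)
      PR: >= 30% decrease from the baseline sum
      SD: none of the above
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if current_sum_mm == 0:
        return "CR"
    increase_mm = current_sum_mm - nadir_sum_mm
    # Integer-style comparisons avoid floating-point surprises at the thresholds.
    if increase_mm >= 5 and 100 * increase_mm >= 20 * nadir_sum_mm:
        return "PD"
    if 100 * (baseline_sum_mm - current_sum_mm) >= 30 * baseline_sum_mm:
        return "PR"
    return "SD"


# A lesion burden that grows from 50 mm to 62 mm is labeled progression,
# whether the growth reflects tumor cells or immune-cell infiltration.
print(recist_target_response(baseline_sum_mm=50, nadir_sum_mm=50, current_sum_mm=62))
```

Because the rule sees only diameters, transient enlargement caused by immune infiltration is labeled progression in the same way as true tumor growth. This is the gap that radiomic biomarkers aim to close.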

Minimal residual disease (MRD) is strongly associated with disease progression in lung cancer. Monitoring circulating tumor DNA (ctDNA) in plasma has emerged as a valuable method for detecting MRD and predicting patient survival103. Longitudinal ctDNA detection offers insights into treatment response and can guide therapeutic strategies for patients with metastatic non-small-cell lung cancer (NSCLC)89. Additionally, machine learning approaches have shown promise in analyzing ctDNA kinetics, enabling the optimization of personalized therapies for NSCLC104.

Large language models in lung cancer

LLMs can respond to free-text queries without requiring task-specific training. This enables AI to learn and comprehend medical domain knowledge extremely rapidly and accurately. Medical chatbots, for instance, have demonstrated the capability to generate responses to patients’ queries that are comparable to those of clinicians in both quality and empathy105. For lung cancer, LLMs may be used as decision aids106,107. Although LLMs are promising, inaccuracy is their most concerning problem. They can fabricate facts because they learn statistical word associations rather than achieving true understanding. In addition, their training data often come from the internet and are not verified. They function best as assistive tools under human supervision rather than in autonomous roles108. In the context of clinical trials, AI facilitates patient matching by comparing medical records against enrollment criteria. Multiple studies have reported that AI can effectively extract patients’ data and match them to relevant clinical trials109.

Approved AI devices in lung cancer

Before AI algorithms can be implemented in clinical settings, official approval is required. The pace of AI development strains existing regulatory frameworks, and more staff are needed to process submissions efficiently. The approval process involves more stringent clinical trials and validation testing than are usually presented in academic publications. The FDA categorizes AI medical devices according to the level of potential risk posed to individual patients. Many AI devices in oncology fall under Class II (moderate risk), for which randomized controlled clinical trials are not typically required. To bring an AI product into widespread use, well-controlled clinical studies are necessary to show that its benefits outweigh its risks. In addition, most AI products perform well in predefined tasks such as detection, but their generalizability across different patient populations still requires validation. Consequently, only a small proportion of AI algorithms are eligible for deployment in clinical settings. The AI applications approved for lung cancer mainly focus on lung nodule detection, diagnosis, and radiotherapy planning, and all of these algorithms are imaging-based (Table 2). Multi-party collaboration is needed to optimize and adjust regulatory frameworks and processes, improve AI development, validation, and documentation standards, address challenges of advanced and evolving AI, and strengthen full lifecycle management and post-market surveillance110,111.

Table 2 FDA approved AI devices in lung cancer

Challenges and opportunities

AI research in lung cancer offers promising prospects for automated and precise management, but translating these advances into clinical practice faces several hurdles. Key challenges are discussed below.

Data sharing

A continuous supply of data is crucial for the effective training, validation, and refinement of AI algorithms. To develop robust AI tools, large, high-quality datasets from multiple institutions are needed to overcome limitations in statistical power, diversity, and coverage of clinical practice variations. However, sharing data is challenging due to concerns about patient privacy and intellectual property protection112. To address this issue, three primary options are available. The first is centralized learning, in which institutions create a shared legal agreement and security protocol to pool their data. While effective, this approach is costly. The second option is to create deidentified public datasets. Some commonly used lung cancer databases are summarized in Table 3. This option is more affordable, but such datasets may lack certain types of patient information, making it difficult to train AI for specific clinical applications. The third option is federated learning, in which data remain private at each institution while AI models are trained in a distributed manner113. It has been implemented in several cancer-related applications, including breast cancer114, brain cancer115, gastric cancer116, melanoma117, and lung cancer118.
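The idea of training in a distributed manner can be illustrated with federated averaging (FedAvg), one common federated learning scheme; the cited studies may use different algorithms. In this minimal sketch, each hypothetical site fits a one-variable linear model on its own data and shares only the fitted parameters, which a coordinating server averages in proportion to each site's sample count. All names and data are invented for illustration.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent on one site's private data; only weights leave the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average site models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(site_datasets, rounds=50):
    """Alternate local training at every site with server-side averaging."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in site_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in site_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical hospitals whose private data follow y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w, b = train_federated([site_a, site_b])
print(round(w, 3), round(b, 3))
```

The raw `(x, y)` records never leave `site_a` or `site_b`; only the two model parameters are exchanged in each round. Real deployments add safeguards this sketch omits, such as secure aggregation and handling of sites whose data distributions differ.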

Table 3 Publicly available lung cancer datasets and their description and challenges

Bias and fairness

AI models inevitably inherit biases from their training data that favor particular racial, ethnic, or gender groups, resulting in poor performance when applied to diverse populations119. For instance, only 50% of Black women and 63% of Black men diagnosed with lung cancer qualified for screening120. In a report on 75,774 patients from The Society of Thoracic Surgeons General Thoracic Surgery Database, white patients and those with private insurance had a higher incidence of complex operations121. Efforts are underway to generate more diverse datasets and reduce biases in both breast cancer122 and lung cancer123. Additionally, AI algorithms can be specifically designed to ensure fairness, enhancing their effectiveness across varied demographic and socioeconomic groups124.
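One simple way to surface such bias is to report a model's performance separately for each subgroup instead of as a single pooled figure. The sketch below computes sensitivity per group; the group labels and records are hypothetical, and this is an illustrative audit step rather than the method of any cited study.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity (true-positive rate) separately for each subgroup.

    records: iterable of (group, true_label, predicted_label), labels are 0 or 1.
    Groups without any true-positive cases are omitted from the result.
    """
    detected = defaultdict(int)
    missed = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                detected[group] += 1
            else:
                missed[group] += 1
    groups = set(detected) | set(missed)
    return {g: detected[g] / (detected[g] + missed[g]) for g in sorted(groups)}


# Hypothetical audit data: (subgroup, has cancer, model flagged cancer).
audit_records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 1, 1), ("group_b", 1, 0), ("group_b", 1, 0),
    ("group_b", 0, 1),
]
per_group = sensitivity_by_group(audit_records)
print(per_group)
```

In the toy data, pooled sensitivity would be 5/8, which hides the fact that the model detects three of four cancers in one group but only two of four in the other. The same per-group breakdown can be applied to specificity or AUC.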

Interpretability

Interpretability remains a significant challenge for AI, particularly with deep learning approaches that operate as end-to-end systems, mapping inputs directly to outcomes without manually selected features. This black-box nature makes it difficult to understand which factors are driving decisions, potentially leading to misleading conclusions due to spurious confounders in the data125. Such opacity is often deemed unacceptable in healthcare decision-making, posing a significant barrier to clinical utilization126. Consequently, explainable AI has emerged as a highly active research area, aiming to make AI models more transparent and understandable127. Despite these efforts, the optimal form of explainability for clinical use remains unknown, and even FDA-approved AI devices currently offer limited interpretability.

Reproducibility and translation

A robust AI model requires independent review and testing by external groups, which is crucial for assessing potential biases in datasets and ensuring generalizability across diverse clinical settings. Despite this need, most published AI studies still lack reproducibility. Imaging parameters such as CT scanner manufacturer, radiation dose, convolution kernel, iterative reconstruction, and section thickness significantly affect the diagnostic performance of deep learning algorithms. This variability reduces their reliability in clinical practice128. Motion artifacts from patient breathing and image noise further degrade data quality, complicating tasks like nodule detection and segmentation. Additionally, annotation variability among radiologists introduces subjectivity, affecting accuracy. Standardized preprocessing pipelines are therefore essential to address these challenges and ensure robust, generalizable AI models for lung cancer applications. Recently, the Image Biomarker Standardization Initiative (IBSI) has made significant strides in establishing standards129,130. A list of 16 criteria for the optimal development of a radiomic test serves as a guide for the implementation of future radiomic analyses131. Various guidelines have been proposed as frameworks for reporting essential information about AI modeling, including MINIMAR (MINimum Information for Medical AI Reporting)132, SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence)133, CONSORT-AI (Consolidated Standards of Reporting of Trials–Artificial Intelligence)134, and ESMO-GROW (European Society for Medical Oncology Guidance for Reporting Oncology Real-World Evidence)135.

Future directions

AI is evolving rapidly, with the ultimate objective being the development of a comprehensive model, known as generalist AI, capable of analyzing multimodal data and addressing a wide range of tasks. Currently, most AI models in healthcare are uni-modal and uni-task, requiring separate models for different types of medical data (such as medical records, radiology, pathology, and genomic data) to solve even a single task. Novel deep learning architectures can integrate multimodal data, thereby improving model performance2. Recently, PathChat, a chatbot enabling interactive discussions with pathologists, has been introduced, potentially providing expert-level insights related to specific cases136. Extending this concept, generalist deep learning models could integrate comprehensive patient information and interact with physicians similarly to how ChatGPT functions. Such models could allow physicians to define prediction tasks in natural language, with the model explaining its predictions. Generalist AI has the potential to significantly enhance diagnostic and prognostic methods in oncology, shifting from task-specific models to a holistic, integrated approach.

Beyond traditional medical data such as radiologic images and genomic information, which are costly to obtain and not available in real time, technological advances in smartphones and wearable sensors make it possible to collect extensive physiological and environmental data for each patient. AI holds substantial promise in managing these large datasets to identify individuals at high risk for cancers that are influenced by environmental and behavioral factors, such as lung cancer. In the future, real-time AI-assisted lung cancer prevention could offer personalized early intervention and risk management strategies while accumulating valuable data for researchers to identify underlying risk factors. Integrating personal data can also facilitate remote monitoring during the diagnosis and treatment of lung cancer, providing alerts to primary physicians and patients as necessary.

Limitations

The discussed AI applications themselves face significant limitations hindering immediate widespread clinical translation. These include challenges in data sharing and quality, inherent model biases, the “black-box” nature affecting interpretability, and a general lack of reproducibility and external validation in many studies. Most currently approved AI tools are imaging-based and target specific tasks, indicating that the full potential of multi-modal, generalist AI is yet to be realized in routine clinical practice.

Conclusion

AI is significantly advancing lung cancer care across prevention, screening, diagnosis, prognosis, treatment, and monitoring by analyzing complex data to personalize patient management (Fig. 1). Deep learning algorithms show immense potential in improving diagnostic accuracy, predicting treatment responses, and automating tasks. However, challenges such as data sharing, model bias, lack of interpretability, and reproducibility issues hinder widespread clinical adoption. Multi-party collaboration is needed to optimize and adjust regulatory frameworks and processes, improve AI development, validation, and documentation standards, address challenges of advanced and evolving AI, and strengthen full lifecycle management and post-market surveillance. The development of generalist AI, which is capable of integrating multimodal data, will provide holistic and interactive decision support.