Predicting the tumor microenvironment composition and immunotherapy response in non-small cell lung cancer from digital histopathology images

Patkar, Sushant; Chen, Alex; Basnet, Alina; Bixby, Amber; Rajendran, Rahul; Chernet, Rachel; Faso, Susan; Kumar, Prashanth Ashok; Desai, Devashish; El-Zammar, Ola; Curtiss, Christopher; Carello, Saverio J.; Nasr, Michel R.; Choyke, Peter; Harmon, Stephanie; Turkbey, Baris; Jamaspishvili, Tamara

doi:10.1038/s41698-024-00765-w

Download PDF

Article
Open access
Published: 19 December 2024

Predicting the tumor microenvironment composition and immunotherapy response in non-small cell lung cancer from digital histopathology images

Sushant Patkar ORCID: orcid.org/0000-0002-5877-0599¹^na1^na2,
Alex Chen¹^na2,
Alina Basnet ORCID: orcid.org/0000-0002-0062-9007²,
Amber Bixby³,
Rahul Rajendran³,
Rachel Chernet³,
Susan Faso³,
Prashanth Ashok Kumar ORCID: orcid.org/0000-0001-6236-2822²,
Devashish Desai²,
Ola El-Zammar³,
Christopher Curtiss³,
Saverio J. Carello³,
Michel R. Nasr³,
Peter Choyke¹,
Stephanie Harmon ORCID: orcid.org/0000-0002-2507-2399¹,
Baris Turkbey¹ &
…
Tamara Jamaspishvili ORCID: orcid.org/0000-0002-1475-5717³^na1

npj Precision Oncology volume 8, Article number: 280 (2024) Cite this article

7471 Accesses
3 Citations
1 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 09 January 2025

This article has been updated

Abstract

Immune checkpoint inhibitors (ICI) have become integral to treatment of non-small cell lung cancer (NSCLC). However, reliable biomarkers predictive of immunotherapy efficacy are limited. Here, we introduce HistoTME, a novel weakly supervised deep learning approach to infer the tumor microenvironment (TME) composition directly from histopathology images of NSCLC patients. We show that HistoTME accurately predicts the expression of 30 distinct cell type-specific molecular signatures directly from whole slide images, achieving an average Pearson correlation of 0.5 with the ground truth on independent tumor cohorts. Furthermore, we find that HistoTME-predicted microenvironment signatures and their underlying interactions improve prognostication of lung cancer patients receiving immunotherapy, achieving an AUROC of 0.75 [95% CI: 0.61-0.88] for predicting treatment responses following first-line ICI treatment, utilizing an external clinical cohort of 652 patients. Collectively, HistoTME presents an effective approach for interrogating the TME and predicting ICI response, complementing PD-L1 expression, and bringing us closer to personalized immuno-oncology.

Cellular engagement and interaction in the tumor microenvironment predict non-response to PD-1/PD-L1 inhibitors in metastatic non-small cell lung cancer

Article Open access 31 May 2022

Multi-omic profiling highlights factors associated with resistance to immuno-chemotherapy in non-small-cell lung cancer

Article 10 December 2024

Tumour and microenvironment crosstalk in NSCLC progression and response to therapy

Article 16 May 2025

Introduction

Lung cancer is the leading cause of cancer-related mortality globally, of which non-small cell lung cancer (NSCLC) is the most common histological subtype¹. In recent years, immune checkpoint inhibitors (ICI) have radically transformed the prognosis of clinically advanced NSCLC. Although ICI-treatment is approved for the treatment of NSCLC patients in the localized and metastatic setting regardless of PD-L1 status^2,3,4, many patients still fail to achieve clinically meaningful responses. Considering the high costs of treatment and potential immune related toxicities^5,6, identifying biomarkers of anti-PD1/PDL1 treatment response is of critical importance. The use of PD-L1 as a biomarker to determine which patients respond to ICI-treatment is hindered by challenges associated with standardized quantification of PD-L1 expression⁷. Controversies such as responders with low PD-L1 expression levels and non-responders with high PD-L1 levels are very common⁸. For example, after selecting patients based on PD-L1 expression, response rates to treatment still vary widely, ranging from 17 – 49% in patients with Tumor Proportion Score (TPS) > 1%⁹. These discrepancies could be attributed to a high level of intratumor heterogeneity of PDL1 expression and the underlying complexity of tumor microenvironment¹⁰. The clinical utility of other biomarkers such as the Tumor Mutational Burden (TMB) has been explored in several different clinical trials^11,12,13,14. However, both PDL-1 expression and TMB fail to encapsulate various tumor microenvironmental features influencing ICI responses^15,16,17. Hence, there is a clinically unmet need for additional predictive biomarkers capturing both tumor and microenvironmental factors associated with ICI responses.

In recent years, several new multiplex tissue imaging and spatial transcriptomics technologies have been used for profiling the tumor microenvironment (TME) of patients in unprecedented detail^{18,19,20,21,22,23,24,25,26}. However, they are quite expensive, which does not allow them to be implemented for wider use in a clinical setting. Hematoxylin and Eosin (H&E)-stained pathology slides, on the other hand, are relatively cheap and easily accessible in any pathology lab. These slides hold a wealth of TME-related information that can be unlocked with the help of Artificial Intelligence (AI)²⁷. One of the first large-scale attempts to characterize the TME of patients from H&E slides was the work of Saltz et al., who mapped the abundance and spatial distribution of tumor infiltrating lymphocytes (TILs) across 23 different cancer types^28,29. Graham et al. developed Hover-Net³⁰, a pan-cancer nuclei segmentation and classification neural network that enables single-cell quantification of tumor, stroma, and lymphocyte populations from H&E slides. More recently, Diao et al. developed a collection of supervised machine learning (ML) methods to quantify 607 human interpretable TME features from histopathology images³¹. While these approaches are extremely valuable, they are limited by availability of relevant pixel-level annotations from expert pathologists, which are time and resource consuming to generate. To overcome these limitations, several research groups have alternatively proposed the use of weakly-supervised deep learning models, which can be trained to perform various downstream computational pathology tasks such as tumor subtyping and prognosis^{32,33,34,35,36,37} without any region or pixel-level annotations.

Building on these recent AI-based advances, we introduce HistoTME: a weakly supervised multi-task learning approach to infer the TME composition of patients from routinely collected pathology slides. Unlike previous approaches, HistoTME harnesses recently developed digital pathology foundation models to infer the expression of 30 distinct cell type-specific TME signatures, previously shown to be associated with immunotherapy responses³⁸. HistoTME is trained in a weakly supervised fashion utilizing matched whole slide H&E and bulk transcriptomics data of 865 NSCLC patients from The Cancer Genome Atlas (TCGA), validated on matched whole slide H&E and bulk transcriptomic data of 333 NSCLC patients from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and tested on whole slide H&E and IHC data from 82 NSCLC patients that had complete surgical resection at SUNY Upstate Medical University. We further demonstrate the clinical utility of HistoTME predictions by retrospectively analyzing needle biopsy specimens and clinical outcome data from an additional 570 NSCLC patients from the SUNY Upstate cohort treated with either chemotherapy or immune checkpoint inhibitors. Importantly, we show that HistoTME AI scores complement low PD-L1 expression and can identify more patients responding to first line immune checkpoint inhibitor therapy. Taken together, HistoTME presents a versatile and accessible tool for unraveling the complex dynamics of TME, leading to improved risk stratification and management of NSCLC patients.

Results

Overview of HistoTME

HistoTME is a deep learning model trained to predict the average normalized gene expression levels of 30 cell type-specific TME signatures from whole slide H&E images, collectively providing a comprehensive profile of the TME composition^38,39 (Fig. 1A). HistoTME was trained using whole slide images (WSI) of H&E staining and matched bulk transcriptomics data from the TCGA-NSCLC cohort (N = 865 patients) and validated using an external cohort of 333 NSCLC patients from CPTAC using the same data modalities (Supplementary Fig. 1). The HistoTME model consists of two main components: a frozen feature extraction component and a trainable attention-based multiple instance learning (AB-MIL) component⁴⁰. In our efforts to efficiently train HistoTME, we explored three state-of-the-art open-source foundational models—CTransPath, RetCCL, and UNI^32,33,37—as potential feature extractors. Additionally, we conducted experiments with two distinct approaches for AB-MIL: a single-task approach featuring a unique attention and multilayer perceptron (MLP) head for each TME signature, and a multitask approach, which incorporates a shared attention head for functionally related TME signatures but maintains separate MLP heads for each individual TME signature (Fig. 1B).

**Fig. 1: Overview of pipeline and performance of HistoTME.**

HistoTME accurately infers the TME composition of NSCLC patients from histopathology images

Of all versions of HistoTME that were explored, we observed that the version utilizing multi-task AB-MIL + the UNI foundation model produced the most accurate predictions when compared with the ground truth, achieving an average Pearson correlation coefficient of 0.50 (Fig. 1C, Supplementary Fig. 2A). The performance of single-task AB-MIL + UNI was slightly worse than the performance of multi-task AB-MIL + UNI. However, both single-task AB-MIL + UNI and multi-task AB-MIL + UNI significantly outperformed other versions of HistoTME for predicting antitumor and protumor immune signatures while displaying similar performance for other signature prediction tasks (Supplementary Fig. 3). Therefore, we settled on multitask AB-MIL + UNI as the final version of HistoTME. Having validated the accuracy of HistoTME on CPTAC-NSCLC data, we next tested HistoTME on whole slide H&E images of 82 NSCLC patients from SUNY Upstate Medical University, which had serial immunohistochemistry (IHC) performed on surgical resection specimens for immune cell panel: T cells (CD3, CD4, DC8), B cells (CD20) and Macrophage (CD163) markers using adjacent serial sections (Supplementary Fig. 1). Overall, we found that HistoTME predicted expression levels were correlated with the abundance of each cell type derived from IHC, achieving Pearson correlations of 0.60 [95% CI: 0.49-0.69] for T cells, 0.48 [95% CI: 0.21-0.67] for B cells, and 0.41 [95% CI: 0.28-0.53] for macrophages, and Spearman correlations of 0.60 [95% CI: 0.44-0.72] for T cells, 0.54 [95% CI: 0.34-0.71] for B cells, and 0.49 [95% CI: 0.30-0.65] for macrophages (Fig. 1D, Supplementary Fig. 2B). Spearman correlations were included due to the uncertain linear correlation between gene expression and IHC-measured protein abundance⁴¹.

We then trained a simple unsupervised model on the TCGA + CPTAC cohorts to cluster NSCLC patients into distinct subgroups based on their predicted TME signatures (Fig. 2A, see Methods). This model effectively identifies two main clusters resembling the Immune-Inflamed and Immune-Desert phenotypes^42,43,44(Fig. 2B, Supplementary Fig. 4). Subsequently, we applied this two-cluster model to the entire institutional SUNY Upstate cohort, including patients with core needle biopsies (N = 652 patients) (Fig. 2C). We next compared the distribution of various clinical characteristics of patients across model-predicted immune inflamed and immune desert clusters including age, smoking history, tissue type (Primary vs metastatic), specimen source (biopsy vs surgical specimen), disease stage, grade, PDL1 score and histopathologic subtype. Overall, we do not find any significant differences in the distribution of majority of the clinical features across the two subtypes with the exception of tumor stage, tissue type (primary vs metastatic) and PD-L1 scores (Supplementary Fig. 5).

**Fig. 2: Overview of the computational pipeline to classify patients into distinct clusters based on their H&E-predicted TME composition.**

Upon examining the feature importance scores assigned to each TME signature by the trained two-class classification model (Fig. 2D), we found that the following signatures: T cell traffic, Antitumor Cytokines, MDSC(myeloid-derived suppressor cells), Co-activation molecules and Macrophage/Dendritic Cell Traffic were among the top 5 TME signatures driving the distinction between the Immune-Inflamed and Immune-Desert clusters. Interestingly, in line with these results, we noted substantial differences in the abundances of T cells, B cells and, macrophages, when observed through IHC, among patients predicted to be either in the Immune Inflamed or in the Immune Desert cluster (Fig. 2E).

To understand the histopathological features influencing HistoTME predictions, we generated attention maps for all 652 patients within the SUNY Upstate cohort. In general, HistoTME attends to different areas of the TME to estimate the expression of antitumor, pro-tumor, angiogenesis/stroma and malignant cell signatures, with the exception of antitumor and protumor immune signatures (Supplementary Fig. 6). We additionally performed a qualitative review of four randomly chosen resection cases (2 Immune-Inflamed, 2 Immune-Desert) with the help of two board-certified pathologists (Figs. 3 and 4, Supplementary Figs. 7 and 8). Overall, for immune-inflamed cases, which have relatively high predicted expression levels of antitumor and protumor immune signatures, HistoTME assigns great attention to regions abundant with lymphocytes and the formation of lymphocytic aggregates around tumor-stroma boundaries. For, immune desert cases, which have relatively low predicted expression levels of antitumor and protumor immune signatures, HistoTME assigns great attention to regions containing solid areas of more pleomorphic cells, and in addition, dense fibrotic areas within and around the tumor periphery. Collectively, these results suggest that HistoTME effectively captures the TME composition of tumors from H&E slides.

**Fig. 3: Overview of HistoTME predictions for a lung resection case predicted to be immune inflamed.**

**Fig. 4: Overview of HistoTME predictions for a lung resection case predicted to be immune desert.**

Association between the tumor microenvironment status and survival outcomes of NSCLC patients treated with immune checkpoint inhibitors

We next conduct a retrospective analysis to assess how the TME status (Inflamed vs Desert), inferred from H&E slides, relates with survival outcomes of NSCLC patients from the SUNY Upstate cohort treated with immune checkpoint inhibitor therapy. All patients that underwent independent PD-L1 IHC testing were considered for treatment with immune checkpoint inhibitors with a total of 292 patients ultimately receiving either an anti-PD1 or PD-L1 inhibitor as monotherapy or in combination with chemotherapy. Overall, 77% of these 292 patients had PD-L1 ≥ 1% and 50% had metastatic disease (stage IV). A detailed summary of the clinical cohort is available in Supplementary Table 1.

Interestingly, we found that the TME status is particularly predictive of overall survival of patients receiving ICIs as first line of therapy (Fig. 5A, log-rank test p-value = 0.0012, HR = 0.53 [95% CI: 0.36-0.78]), especially when administered in combination with chemotherapy (First line ICI+chemo log-rank test p-value: 0.00067, HR = 0.39 [95% CI: 0.22-0.68], First line ICI monotherapy log-rank test p-value: 0.22, HR = 0.68 [95% CI: 0.37-1.26]; Supplementary Fig. 9). Importantly, TME status remains predictive of overall survival of first line ICI-treated patients even after accounting for confounding effects of tumor stage, tissue type (primary vs metastatic) and PDL1 scores (HR = 0.57 [95% CI: 0.38-0.85], Cox proportional hazards p-value: 0.006; Supplementary Fig. 10) and restricting the analysis to patients with needle biopsy specimens (log-rank test p-value 0.0034, HR = 0.54 [95% CI: 0.36-0.82], Supplementary Fig. 11). The TME status, however, is less effective at predicting overall survival when considering all lines of ICI-treated patients (Supplementary Fig. 12A, log-rank test p-value: 0.02, HR = 0.7 [95% CI: 0.52-0.95]) and non-ICI treated patients (Supplementary Fig. 13, HR = 0.84 [95% CI: 0.6-1.19], log-rank test p-value: 0.029, Cox proportional hazards p-value: 0.34). When looking at PD-L1 expression, patients with PDL1 expression > = 50% and receiving ICI treatment as first line of therapy showed markedly improved survival compared to those with PD-L1 1-49% or <1% (Fig. 5B, log rank test p-value = 0.0059, HR = 0.62 [95% CI: 0.37- 1.04]). PDL1 expression was however less effective at predicting of overall survival when considering all lines of ICI-treated patients (Supplementary Fig. 12B, log rank test p-value = 0.13, HR = 0.9 [95% CI: 0.68–1.18]). When looking at progression-free survival (PFS), we observe that both the TME status and PD-L1 expression are primarily predictive of PFS at first line ICI therapy (TME status: log rank test p-value = 0.0037), HR = 0.59 [95% CI: 0.41-0.85]; PD-L1 expression: log rank test p-value = 0.003, HR = 0.55 [95% CI: 0.35-0.88]; Supplementary Fig. 14.

**Fig. 5: Association between HistoTME-based TME classification and overall survival outcomes of SUNY NSCLC patients treated with first-line anti-PD1/PD-L1 therapy (first-line IO patients).**

When performing additional subgroup analysis of patients receiving first line ICI treatment, we find that the H&E-inferred TME status is primarily predictive of overall survival outcomes of PD-L1 absent (<1%) patients (Fig. 5C; log rank test p-value = 0.08, HR = 0.4 [95% CI: 0.13- 1.18]), and PD-L1 low (1–49%) patients (Fig. 5D; log rank test p-value = 0.009, HR = 0.44 [95% CI: 0.23-0.82]) but not PD-L1 high (≥50%) patients (Fig. 5E; log rank test p-value = 0.85, HR = 0.94 [95% CI: 0.48-1.84]). Subgroup analyses by histologic subtype revealed that HistoTME-derived TME status was particularly predictive of overall survival in adenocarcinoma patients, which were a majority of the patients treated with first line ICI therapy, while less so in patients with squamous cell carcinoma (Supplementary Fig. 15A, B). Interestingly, first-line ICI patients with rare neuroendocrine or pleomorphic histologic subtypes (i.e., defined as “other”) and an immune-inflamed TME had worse overall survival compared to those with an immune-desert TME in first line ICI treatment setting (Supplementary Fig. 15C).

When utilizing the TME status to predict treatment responses (i.e., predict immune inflamed as responder and immune desert as non-responder), we achieve a sensitivity of 0.46 [95% CI: 0.35-0.58], specificity of 0.72 [95% CI: 0.60-0.81] and positive predictive value of 0.64 [95% CI: 0.51-0.76] for first line ICI-treated patients and a sensitivity of 0.44 [95% CI: 0.35-0.54], specificity of 0.73 [95% CI: 0.63-0.81] and positive predictive value of 0.63 [95% CI: 0.52-0.74], when considering all ICI-treated patients. When utilizing PD-L1 expression to predict treatment response (i.e., predict PD-L1 > = 50% as responder and PD-L1 1-49% or <1% as non-responder), we achieve a sensitivity of 0.49 [95% CI: 0.38-0.60], specificity of 0.74 [95% CI: 0.63-0.84] and positive predictive value of 0.67 [95% CI: 0.54-0.79] for first line ICI-treated patients and sensitivity of 0.44 [95% CI: 0.35-0.54], specificity of 0.72 [95% CI: 0.62-0.80] and positive predictive value of 0.63 [95% CI: 0.51-0.73] when considering all ICI-treated patients. When combining TME status and PD-L1 expression into a single predictor, which defines any patient with either an inflamed TME or PD-L1 > = 50% as responders, we achieve a sensitivity of 0.69 [95% CI: 0.58-0.79], specificity of 0.59[0.47-0.70] and positive predictive value of 0.64 [95% CI: 0.54-0.74] for first line ICI-treated patients and a sensitivity of 0.65 [95% CI: 0.56-0.74], specificity of 0.56 [95% CI: 0.47-0.66] and positive predictive value of 0.62 [95% CI: 0.52-0.70] when considering all ICI-treated patients. Collectively, these results highlight the complementary value of HistoTME for prognostication of NSCLC patients, especially for those with PD-L1 < 50% and being considered for first line treatment with ICI+chemotherapy.

Besides PDL1, several groups have also explored the prognostic value of tumor infiltrating lymphocytes (TIL) detected from histopathology images, using automated deep learning tools^27,45,46. Hence, we evaluated the added prognostic value of HistoTME-derived TME signatures relative to percentage TILs detected using a state-of-the-art open source TIL detection tool, CellVIT⁴⁷, with well documented source code for cell segmentation and classification. Immune inflamed patients had significantly higher %TIL counts relative to non-inflamed patients (Supplementary Fig. 16A) highlighting the ability of HistoTME to capture variations in immune cell abundances, despite being trained without any pathologist annotations. We next aimed to assess the prognostic value added by HistoTME-derived immune inflamed and immune desert clusters upon %TILs derived from CellVIT. Hence, we first stratified patients treated with first line ICI into high-TIL and low-TIL subgroups based on a previously established cut-off of >10% TILs in NSCLC⁴⁸ and then compared overall outcomes of each %TIL subgroup. Our findings indicate that HistoTME-derived immune inflamed and immune desert predictions achieve superior stratification of overall survival outcomes of patients receiving first line ICI therapy compared to %TIL subgroups (Supplementary Fig. 16B), and can predict ICI responses in both low and high %TIL subgroups (Supplementary Fig. 16C, D). Moreover, joint Cox proportional hazards modeling of both HistoTME and CellVIT revealed that HistoTME-predicted immune inflamed and immune desert clusters achieve statistically significant hazard ratios, exceeding the magnitude of hazard ratios achieved by %TILs (Supplementary Fig. 16E). These results collectively highlight the added prognostic value of a comprehensive assessment of the TME composition, beyond TILs alone, for accurate prediction of patient responses to ICI therapy.

Interactions between TME signatures improve prediction of immunotherapy response

Interactions among the various components of the TME play a key role in influencing immunotherapy responses⁴⁹. A previous study by Liu et al. highlighted an example of this complexity, revealing in NSCLC patients treated with ICI that only high PD-L1 expression in macrophages was correlated with better overall survival, while high PD-L1 expression in tumor or stromal cells was not¹⁷. Hence, we developed a supervised ML model that incorporates interactions between H&E-inferred TME signatures to predict immunotherapy responders (See Methods).

Specifically, we engineered 1740 interaction features by taking the sum, difference, product, and quotient of each pair of signatures to characterize interactions between TME signatures, then used a random forest model to select the most important interaction features from the training set, and trained XGBoost, a gradient boosted decision tree, using selected interaction features for ICI response prediction (Fig. 6A; Methods). After applying 5-fold cross-validation to the training set to optimize the number of features selected, 18 TME signature interactions were chosen. These interactions maximize the cross-validation receiver operating characteristic curve (AUROC), achieving a CV AUROC of 0.68 (Supplementary Fig. 17). Of the 18 pairwise interactions, coactivation molecules, T cell traffic, and MDSC traffic were incorporated most frequently (Fig. 6B). We then retrained XGBoost on the full training set using these 18 selected interaction features. The trained model utilizing interaction features predicted response on the test set with an AUROC of 0.68 [95% CI: 0.55-0.80], whereas using TME signatures alone, without interactions, achieved an AUROC of 0.55 [95% CI: 0.41-0.69], although the difference was not significant (p = 0.17, paired DeLong’s test) (Fig. 6C). Furthermore, when only considering patients from the test set that received first line ICI-treatment, we found accuracies for predicting treatment response improved to an AUROC of 0.75 [95% CI: 0.61-0.88], while achieving a lower AUROC of 0.51 [95% CI: 0.23-0.78] for non-first-line ICI-treated patients.

**Fig. 6: Overview of supervised machine learning model development and validation for predicting immune checkpoint inhibitor responses based on HistoTME predictions.**

At a probability threshold that maximized the Youden index, the model predicted responders with a sensitivity of 0.80 [95% CI: 0.63-0.92], specificity of 0.50 [95% CI: 0.32-0.68], and positive predictive value of 0.71 [95% CI: 0.49-0.87]. At this threshold, the predicted responders did not have significantly higher overall survival compared to non-responders in the entire test set (OS: HR = 0.35; 95% CI: -0.26 to 0.96; p = 0.26). However, when considering only first-line ICI-treated patients in the test set, the predicted responders had significantly longer survival time compared to non-responders (OS: HR = 0.90; 95% CI: 0.20 to 1.60; p = 0.0095) (Fig. 6D, E). Furthermore, we observed significantly higher progression-free survival (PFS) outcomes for patients that were treated with first line ICI and predicted to be responders (PFS: HR = 0.94; 95% CI: 0.29 to 1.59; p = 0.0035) but not for second-line or subsequent-line ICI-treated patients (Supplementary Fig. 18). Using a Shapley additive explanation (SHAP)⁵⁰ summary plot ordered based on feature importance, we determined TME signature interactions that had the most influence on the response prediction, with the most important interaction consisting of the sum between coactivation molecules and Th2 signatures (CM+Th2) (Fig. 6F). High levels of CM+Th2 corresponded to higher SHAP values, indicating higher predicted probabilities of response.

Discussion

Immune checkpoint inhibitors have emerged as a promising treatment option for lung cancer. Yet only a minority of patients respond to these treatments, sometimes at the cost of severe toxicities and financial implications for those who may not respond to this therapy. Hence it is imperative to identify effective biomarkers capable of differentiating responders from non-responders prior to therapy initiation. The tumor microenvironment plays a fundamental role in shaping the responses of patients to immunotherapy. However, its dynamic nature, with multiple interacting components, makes it challenging to identify robust predictive biomarkers^32,33. The integration of deep learning methods with digital pathology presents a promising and potentially cost-effective approach to interrogate the TME and alleviate some of these issues⁵¹.

In this work, we introduce HistoTME, a novel weekly supervised deep learning method to characterize the TME of patients from H&E slides, leveraging enhanced feature extraction capabilities of recent digital pathology foundation models. In contrast to recent deep learning approaches, which aim to predict spatially resolved gene expression profiles from histopathology images^52,53,54,55, our approach aims to infer the TME composition through estimating the expression of distinct functional TME signatures. A key advantage of this approach is that by learning to predict the expression of gene signatures, we not only avoid overfitting to the expression of individual genes but also increase interpretability by directly relating specific histopathological features to previously established biological concepts³⁸. Due to the limited benchmarking of foundation models for continuous biomarker prediction tasks, we experimented with three popular foundation models as feature extractors. We found out that the UNI foundation model³², when paired with multi-task AB-MIL, achieved the best predictive performance for the various TME signature prediction tasks. This improved performance likely stems from the considerably large histopathology datasets used to pre-train the UNI foundation model, as well as the added regularization induced by multitask learning.

With the help of HistoTME, we next classify patients into two distinct TME subtypes: Immune-Inflamed and Immune-Desert, characterized by distinct expression of anti-tumor and protumor immune signatures. These observations were further corroborated by significant differences in cellular abundances of T cells, B cells and macrophages across patients, based on matched IHC data. Moreover, the two distinct TME subtypes were significantly predictive of OS and PFS in the patients treated with first-line immune checkpoint inhibitors, going beyond TIL estimates and PD-L1 scores. While numerous deep learning studies have emerged for predicting ICI responses in NSCLC from H&E images, they are primarily focused on refining PD-L1 quantification^56,57,58. In contrast to previous studies, our approach aims to offer a more comprehensive overview of the tumor microenvironment by predicting the TME cell type and molecular composition. The rationale and motivation behind our approach come from recent clinical trials showing that PD-L1 expression alone is insufficient to predict ICI treatment outcomes (Checkmate 057^59,60, OAK⁶¹). Our results indicate that HistoTME-generated TME composition predictions can further identify patients associated with ICI response, particularly in the low PD-L1 expression group where additional biomarkers for ICI response are urgently needed. In the future, we anticipate that combining our approach with emerging deep learning-based PD-L1 biomarker assays will further enhance ICI response prediction.

In addition to clustering, we developed a novel supervised model to predict ICI response, which analyzes interactions between distinct TME signatures. Our supervised model achieves an AUROC of 0.68 in the unseen test patient cohort, improving to an impressive AUROC of 0.75 when predicting responses to first-line immunotherapy, highlighting the significance of interactions among various TME components in shaping treatment responses. On performing SHAP analysis, we found that the total expression of coactivation molecules and the Th2 signature were the most predictive of ICI response. This observation likely stems from the importance of coactivation molecules, such as CD28, which is essential for T cell activation after interacting with antigen-presenting cells, in cancer immunotherapy response⁶². Although Th2 signatures have been associated with an immune suppressive TME, their role in responses to ICI treatment remains unclear⁶³. The emphasis laid by the AI model on this interaction could be explained by coactivation molecules promoting production of Th2 cytokines⁶⁴, indicating that high levels of coactivation molecules and Th2 may offer a favorable environment crucial for therapeutic response. It is important to note, however, that interpreting the feature importance of ML models requires caution due to potential variability in feature importance depending on the training set, particularly with limited-size datasets.

Compared to previous studies of predicting response to ICI from H&E slides, we demonstrated a unique ability to identify responders using both unsupervised and supervised ML techniques. Both these approaches can be valuable for deriving ICI response biomarkers depending on the availability of clinically annotated datasets. Hu et al. developed a supervised deep learning model that used extracted hidden features from histopathology images to predict anti-PD-1 response in melanoma and lung cancer. However, a main caveat of their approach was the limited interpretability of their hidden features⁶⁵. In addition, they reported modest accuracies (AUROC: 0.645 [95% CI: 0.495-0.784]) in predicting ICI response for lung cancer patients, which they attribute to the dataset consisting of core-needle biopsy samples rather than surgery samples. In another recent study, Wang et al. achieved an impressive performance by using hand-crafted features derived from spatial interactions of tumor cells and TILs²⁷; however, their approach required a minimum of five large image patches due to the reliance on spatial interactions, which may not generalize well on needle biopsies. Most importantly, to our knowledge, this is the first study showing that predicting molecular features of TME is feasible from scanned H&E images. This makes HistoTME extremely versatile and useful for analysis of both surgical resection and needle biopsy data. In fact, HistoTME signatures are able to effectively predict responses of patients while also maintaining interpretability, despite limited availability of tumor tissue from core needle biopsies, which make up ~85% of the SUNY cohort.

This work has some limitations that should be further considered. First, while HistoTME offers valuable insights into TME composition from H&E slides, its inability to predict the spatial distribution of specific cell types or molecules is a limitation stemming from its weakly supervised training on bulk transcriptomics data. To address this, future advancements in HistoTME could consider harnessing emerging large-scale spatial transcriptomics datasets⁶⁶ for enhanced training and validation. Second, although a large clinical cohort of ICI-treated patients was studied, we lacked additional external validation datasets to further validate the prediction of responses to ICI. Second, this work was conducted in a retrospective manner. We plan to further validate HistoTME in additional external cohorts of patients treated with ICI, both retrospectively and prospectively. Third, the development of HistoTME was limited to patients with NSCLC, implying that further testing and development will be required to extend the approach to different cancer types. Fourth, only three TME signature predictions were validated using IHC due to the lack of other strictly comparable IHC markers for other signatures. Bagaev et al.³⁸, who previously analyzed these TME signatures directly from transcriptomics data identified 4 distinct TME subtypes, which besides capturing differences in the activity of immune signatures also capture notable differences in activity of stromal and angiogenesis-associated signatures. In contrast, our approach results in the identification of two subtypes (Immune-Inflamed and Immune-Desert). This difference could potentially be explained by lower predictive accuracies of HistoTME for stroma and angiogenesis-associated signatures compared to immune signatures. We plan to utilize more complex molecular profiling tools, such as spatial transcriptomics, to further train and validate the accuracy of HistoTME predictions for other signatures. Lastly, while we showed promising results for predicting responses to first-line ICI treatment, the predictive accuracy of HistoTME was limited for the patients who received immune checkpoint blockade as second and subsequent-line treatment. This finding can be attributed to the fact that a majority of patients in this cohort received ICI as first-line treatment. In addition, since the TME is dynamically changing throughout the treatment, reflecting either response to therapy or tumor progression⁶⁷, the H&E WSIs may not accurately represent the TME of patients following first-line treatment. Hence, it is important to consider the time interval between the H&E biopsy and ICI treatment to assess the utility of HistoTME for the prediction of treatment response.

In conclusion, HistoTME is an effective approach to characterize the TME of NSCLC patients and identify patients who will benefit from ICI therapy. Being based on H&E slides alone, HistoTME allows for a broad characterization of the TME without the need for expensive molecular tests or additional tissue stains. Given the routine use and low cost of H&E slides in diagnostic pathology along with the increasing adoption of digital and computation pathology in clinical practices, HistoTME promises to improve clinical management of cancer patients undergoing immunotherapy. Future research should focus on validating HistoTME in diverse patient populations and exploring its applicability to other cancer types, potentially extending its benefits beyond lung cancer. Finally, HistoTME can help advance our understanding of the role of TME in the context of other cancer treatments, opening avenues for the discovery of novel biomarkers and accelerating the adoption of personalized immuno-oncology.

Methods

Description of the SUNY NSCLC cohort

This retrospective institutional cohort of 692 NSCLC patients was assembled at SUNY Upstate university, of which 652 were analyzed in this study based on the following criteria: primary diagnosis of non-small cell lung cancer who were followed at SUNY Upstate for at least a minimum of 2 years, and had PD-L1 testing record along with IHC slide availability including corresponding H&E slides (n = 652 patients, 1329 H&E slides). Neoadjuvant or perioperative cases were not included and analyzed in our study. All H&E slides from the SUNY Upstate cohort were obtained prior to receiving any treatment. The query and chart review were done using electronic medical records (EMR, EPIC) and pathology information system (Co-Path) for abstracting clinico-pathological, treatment and follow-up information. All clinical and pathology imaging data from the SUNY Upstate cohort was approved for use in this study by the local Institutional Review Board (reference numbers: 1857564, 2153970) and considered exempt from further IRB oversight. The requirement for patient consent was waived as all data was collected retrospectively and anonymized. The details of the cohort description are provided in Supplementary Table 1. Briefly, 292 (44.8%) patients were treated with immunotherapy, and of these patients, 230 had treatment response information (partial, complete, stable) as documented radiographically or clinically by the treating physician in the patient’s charts (progress notes, EPIC). More specifically, the responders were defined as patients that exhibited a partial, complete, or stable disease without experiencing any recurrence or death for at least 6 months since the start of ICI treatment. Non-responders were defined as patients who exhibited progressive disease or death within 6 months since the start of ICI treatment. Overall survival (OS) of patients was defined as the time from the date of diagnosis until death from any cause. Patients who were alive at the last follow-up were censored for overall survival analysis. Progression-free survival (PFS) of patients receiving checkpoint inhibitor treatment was defined as the time interval from immune checkpoint inhibitor start until progression or death. Patients who were alive without disease progression at their last follow-up were censored for PFS.

Specimen availability, immunohistochemical analysis and slide scanning

Briefly, from 652 patients, 445 had tumor specimens available from primary disease sites (biopsy = 398, resection = 47) and 207 from metastatic sites (biopsy=169, resection = 35). Serial immunostaining was done on four-micrometer-thick freshly cut serial sections using archived FFPE blocks of surgically resected specimens from 82 cases diagnosed with NSCLC (47 primary sites, 35 metastatic sites). The following biomarker panel was used for immune (CD3, CD20, CD4, CD8, CD163, FOXP3), cancer-specific (TTF-1, P40) and epithelial markers (PAN-CK). The staining was performed in a CLIA-certified clinical pathology lab using an automated immunostainer BenchMark Ultra (Roche Diagnostics, Germany) at SUNY Upstate Medical University. For pretreatment, antibody detection and counterstaining, the following reagents were used: ULTRA CC1 (Cat #950-124), UltraView DAB (Cat. 760–500), UltraView Red (Cat # 760-501) and Hematoxylin (Cat. 760–2021) according to the manufacturer’s instructions (Ventana Medical Systems; Roche Diagnostics, Germany). The details of the primary and secondary antibodies, antigen retrieval conditions, as well as detection methods are listed in Supplementary Table 2. PD-L1-stained slides (clone 22C3 PharmDx^TM, 28-8 pharmDx^TM, Dako Autostainer Link 48 platform), along with negative controls and corresponding H&E of 406 patients, were requested from LabCorp and the rest (n = 246 patients) were obtained from the local pathology archives at SUNY Upstate Medical University. Glass slides were digitized using an Aperio AT2 Dx scanner (Leica Biosystems, CA, USA) at 40x magnification at the Pathology Research Core at SUNY Upstate. PD-L1 manual scoring was performed by expert pathologists using an FDA-approved assay and scoring guidelines at LabCorp. Tumor proportion score (TPS) was calculated as the % of viable positive tumor cells/all tumor cells, where positivity was defined as partial and/or complete membrane staining at any intensity ( > 1%) in tumor cells. PD-L1 quantification was categorized into clinically relevant groups as approved by the FDA: <1% (absent), 1-49% (low), and ≥50% (high).

Description of pre-processing steps for whole slide H&E images

All WSI in the experiments were first preprocessed to mask out the tumor tissue from background using RGB to HSV color transformation, median blurring and Otsu thresholding⁶⁸. Following tissue segmentation, each WSI was split into image tiles of physical size 256 µm × 256 µm (i.e 512 × 512 pixels at 20x magnification). Our choice of tessellation of whole slide images at this resolution was motivated by recent successful applications of foundation models on H&E slides, as demonstrated by studies predicting colorectal cancer MSI⁶⁹ and pan-cancer homologous recombination deficiency⁷⁰. Each whole slide image tile with an overlap >25% with the tumor tissue was stain normalized using the Macenco algorithm⁷¹ and scaled to have 0 mean pixel intensity and standard deviation of 1 prior to being fed as input to an open-source foundation model—CTransPath³³, RetCCL³⁷ or UNI³²—which learns to extract informative histopathologic features from each tile. CTransPath model consists of a convolutional neural network (CNN) and a multi-scale Swin Transformer architecture as its backbone, RetCCL uses a CNN-based architecture, and UNI implements a vision transformer (ViT). All foundation models were pre-trained self-supervised learning; CTransPath and RetCCL were pretrained on ~30,000 WSIs, while UNI was pretrained on ~100,000 WSIs. Feature extraction from each pretrained foundation model results in a feature matrix of shape (n × 768) for CTransPath, (n × 2048) for RetCCL, and (n × 1024) for UNI per patient, where n represents the number of total number image tiles derived from WSI of each patient.

Experimental setup and implementation details of HistoTME

HistoTME was trained using matched patient-level bulk RNA sequencing and whole slide imaging data of 865 patients (955 WSIs) from the TCGA cohort and validated on patient-level bulk RNA sequencing and whole slide imaging data of 333 patients (1501 WSIs) from the CPTAC cohort. Pre-processed bulk RNA sequencing data (gene level TPM counts) from each patient were downloaded from NCI Genomic Data Commons and further analyzed using the bioinformatics pipeline previously published by Bagaev et al. to calculate the average normalized expression of distinct functional gene sets³⁸, referred to as TME signatures, which comprehensively capture TME composition and its various functional characteristics.

For each patient, histopathologic features were extracted from WSI tiles using one of the three foundation models described above. WSI tile-level features were then concatenated together into a single “bag-of-features” representation to facilitate weakly supervised regression, using the attention-based multiple instance learning (AB-MIL) method proposed by Ilse et al.⁴⁰. The AB-MIL model consists of a learnable attention module, which assigns a weight, commonly referred to as attention, to each tile, and a feature aggregation module, which calculates the weighted sum of features across all tiles. This results in a single patient-level representation, summarizing key histopathological characteristics of the TME. The output of the feature aggregation module is then fed to a multilayer perceptron (MLP) module, which learns to predict the expression of a specific TME signature (single task) or multiple functionally related TME signatures (multi-task) as established previously by Ayers et al.³⁹ and Bagaev et al.³⁸. The AB-MIL model was trained with the AdamW optimizer⁷² and Huber loss function with delta set to one, which mitigates overfitting of model predictions to outliers by balancing the mean squared error and mean absolute error together, defined by Eq. (1):

$$L\left(y,\,f\left(x\right)\right)=\left\{\begin{array}{c}{\frac{1}{2}(y-f(x))}^{2}\\ \delta \left|y-f(x)\right|-\frac{1}{2}{\delta }^{2}\end{array}\right.\begin{array}{c}{for}\left|y-f\left(x\right)\right|\le \delta\\ {otherwise}\end{array}$$

(1)

The learning rate was set to 1 × 10⁻⁴ with a weight decay of 1 × 10⁻⁴. Batch size was set to 1 with gradient accumulation for 8 batches. Overall, for each model benchmarked in this study, training was done for 40 epochs with early stopping criteria of 10 consecutive epochs with no improvement in validation loss.

After completion of training, the Pearson correlation metric was utilized to evaluate accuracy of predictions of each model on the independent validation set (CPTAC-NSCLC). 95% confidence intervals (CI) of the Pearson correlation metric were estimated through 1000 bootstrapping iterations using the SciPy package⁷³. The model achieving the highest Pearson correlation coefficients, on average, was eventually selected for external testing and determining ICI efficacy. All AI models were developed using open source PyTorch version 2.1.0⁷⁴.

The external test cohort consisted of serial H&E and multiplex IHC sections of surgically resected tumors from 82 NSCLC patients (47 primary tumors, 35 metastases) enrolled at SUNY Upstate Medical University (See cohort description above). The best model, as determined from the benchmarking experiments on the CPTAC validation set, was applied to this test cohort to estimate the expression of T cell, B cell and Macrophage signatures. These expression predictions were then compared against the actual cellular abundances of respective cell types from corresponding IHC slides, using both Pearson and Spearman correlation metrics given the different scales of quantification. Cellular abundances of T cells (CD4, CD8), B cells (CD20) and Macrophages (CD163) were estimated from corresponding IHC stains using an open-source cell counting software QuPath (v0.5.0)⁷⁵. Specifically, for each whole-slide IHC image, the tumor tissue was manually segmented and separated from the background. Following manual segmentation, a standard pipeline of stain deconvolution and positive cell segmentation was implemented to quantify the cell type abundance, defined as the total number of marker-positive cells per mm² (https://qupath.readthedocs.io/en/stable/docs/tutorials/cell_detection.html). Three patients from this test cohort contained surgical resections from lymph-node metastases with an extremely high number of immune cells and were excluded from correlation calculations, to avoid reporting inflated accuracies.

Predicting TME status using HistoTME

K means clustering algorithm was utilized to cluster patients from the TCGA and CPTAC-NSCLC cohorts into distinct TME subtypes based on HistoTME-predicted microenvironment signatures. Optimal number of clusters for K means clustering was determined using the average silhouette score metric⁷⁶ (Supplementary Fig. 4), which revealed 2 distinct TME subtypes. To ensure the robustness of our findings, we conducted a bootstrap analysis, repeatedly applying the silhouette scoring method to clustering results generated from different randomly sampled subsets of 80% size of the original dataset. This analysis generated an empirical distribution of silhouette scores for each value of K, allowing us to assess the robustness of cluster assignments. The highest silhouette scores consistently occurred for K = 2, providing strong evidence for the existence of two distinct clusters in our data. Finally, a random forest classification model was trained on the clustered data to enable classification of individual patients from the SUNY cohort into distinct TME subtypes. To mitigate issues arising from data distribution shifts across cohorts, all predicted signature expression values were further scaled by their respective dataset-specific means and standard deviations prior to the development of the TME subtype classification model.

Prediction of ICI response utilizing HistoTME-derived microenvironment signatures

Training and test sets were derived from 230 NSCLC patients with matched whole slide imaging data and treatment response labels as described above for supervised machine learning analysis using a stratified random splitting strategy (70% train, 30% test) implemented in scikit-learn package⁷⁷. A total of 161 patients (84 responders, 77 non responders) were assigned to be part of the IO training set and 69 patients (35 responders, 34 non responders) were assigned to be part of the IO test set. A supervised machine learning model was trained to predict immunotherapy response using data from the IO training set. The model takes as input the 30 HistoTME predicted microenvironment signatures. In addition, 1740 interaction features were engineered by taking the sum, difference, product, and quotient of each pair of TME signatures. Since HistoTME signatures include negative and positive values, we take the exponent of each signature prior to computing the product and quotient to maintain monotonic relationships and preserve interpretability. A Random forest feature selection approach was used to select the top k most important features predictive of ICI response from the training set. A gradient-boosted decision tree model, XGBoost⁷⁸, was trained using these selected features to predict ICI response. To identify the best set features for ICI response prediction, we utilized a 5-fold cross-validation strategy on the training data. Here, 80% of the training set was randomly allocated for model training, while the remaining 20% of the training set was reserved for validation purposes. A separate cross-validation iteration was conducted for each set of k features, and the number of boosting rounds was selected to optimize two metrics (minimize logistic loss and maximize AUROC) during cross-validation. Early stopping was set to 100 to select boosting rounds, where additional boosting rounds are not created if the metric does not improve after 100 rounds. The set of features that maximized the cross-validation AUROC was chosen as the final set of features. These features were then utilized along with XGBoost to train the final model on the entire training set. XGBoost learning rate was selected at 0.1, gamma was set to 0.1 to reduce overfitting, and the number of boosting rounds was set to 35 based on cross-validation. SHAP (Shapley Additive exPlanations) was used to interpret the output of the XGBoost model and estimate contribution of each feature⁵⁰.

Additional analyses and statistical tests

The Wilcoxon ranked sum test was used to determine the significance of differences in cellular abundances, as inferred from IHC, between the two predicted TME subtypes. The log-rank test was used to evaluate the prognostic significance of HistoTME in predicting PFS and OS. Response prediction was evaluated by AUROC. 95% Confidence intervals for AUROC were computed by the DeLong approach⁷⁹. The optimal cutpoint of the ROC curve was chosen at the threshold that maximized the Youden index (J)⁸⁰. 95% confidence intervals of precision, recall, and positive predicted value of the response predictions were calculated using exact binomial confidence limits. Kaplan-Meier survival curves were used to visualize the differences in the OS and PFS of AI-predicted responders and non-responders. Hazard ratios between survival groups with 95% CIs were calculated using a univariate Cox proportional hazards regression model. All statistical tests were two-sided, with a p-value less than 0.05 considered statistically significant. All statistical analyses were performed in R version 4.3.1 unless otherwise specified.

Data availability

The TCGA Whole slide Imaging and bulk transcriptomics data is publicly available and can be downloaded from the GDC portal (https://portal.gdc.cancer.gov). The CPTAC lung cancer whole slide imaging data is publicly available at The Cancer Imaging Archive (https://www.cancerimagingarchive.net/collection/cptac-luad, https://www.cancerimagingarchive.net/collection/cptac-lscc), whereas the CPTAC lung cancer bulk transcriptomics data is available from the GDC portal (https://portal.gdc.cancer.gov/projects/CPTAC-3). Whole slide imaging and clinical data from the SUNY Upstate cohort is currently not publicly available owing to restrictions of patient privacy.

Code availability

The codes to implement HistoTME are publicly available at: https://github.com/spatkar94/HistoTME.git.

Change history

09 January 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41698-024-00796-3

References

Gridelli, C. et al. Non-small-cell lung cancer. Nat. Rev. Dis. Prim. 1, 15009 (2015).
Article PubMed Google Scholar
Wakelee, H. et al. Perioperative pembrolizumab for early-stage non-small-cell lung cancer. N. Engl. J. Med 389, 491–503 (2023).
Article CAS PubMed PubMed Central Google Scholar
Spigel, D. R. et al. Five-year survival outcomes from the PACIFIC trial: durvalumab after chemoradiotherapy in stage III non-small-cell lung cancer. J. Clin. Oncol. 40, 1301–1311 (2022).
Article CAS PubMed PubMed Central Google Scholar
Garassino, M. C. et al. Pembrolizumab plus pemetrexed and platinum in nonsquamous non-small-cell lung cancer: 5 Year outcomes from the phase 3 KEYNOTE-189 study. J. Clin. Oncol. 41, 1992–1998 (2023).
Article CAS PubMed PubMed Central Google Scholar
Gunturu, K. S. et al. Immune checkpoint inhibitors: immune-related adverse events, healthcare utilization, and costs among commercial and medicare advantage patients. Support Care Cancer 30, 4019–4026 (2022).
Article PubMed PubMed Central Google Scholar
Martins, F. et al. Adverse effects of immune-checkpoint inhibitors: epidemiology, management and surveillance. Nat. Rev. Clin. Oncol. 16, 563–580 (2019).
Article CAS PubMed Google Scholar
Rimm, D. L. et al. A prospective, multi-institutional, pathologist-based assessment of 4 immunohistochemistry assays for PD-L1 expression in non-small cell lung dancer. JAMA Oncol. 3, 1051–1058 (2017).
Article PubMed PubMed Central Google Scholar
Borghaei, H. et al. Pembrolizumab plus chemotherapy versus chemotherapy alone in patients with advanced non-small cell lung cancer without tumor PD-L1 expression: a pooled analysis of 3 randomized controlled trials. Cancer 126, 4867–4877 (2020).
Article CAS PubMed Google Scholar
Sholl, L. M. et al. Programmed death ligand-1 and tumor mutation burden testing of patients with lung cancer for selection of immune checkpoint inhibitor therapies: guideline from the college of American pathologists, association for molecular pathology, international association for the study of lung cancer, pulmonary pathology society, and LUNGevity foundation. Arch. Pathol. Lab Med. 148, 757–774 (2024).
Article PubMed Google Scholar
Wang, M., Wang, S., Trapani, J. A. & Neeson, P. J. Challenges of PD-L1 testing in non-small cell lung cancer and beyond. J. Thorac. Dis. 12, 4541–4548 (2020).
Article PubMed PubMed Central Google Scholar
Chan, T. A. et al. Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Ann. Oncol. 30, 44–56 (2019).
Article CAS PubMed Google Scholar
Hellmann, M. D. et al. Nivolumab plus Ipilimumab in pung cancer with a high tumor mutational burden. N. Engl. J. Med. 378, 2093–2104 (2018).
Article CAS PubMed PubMed Central Google Scholar
Marcus, L. et al. FDA approval summary: pembrolizumab for the treatment of tumor mutational burden-high solid tumors. Clin. Cancer Res. 27, 4685–4689 (2021).
Article CAS PubMed PubMed Central Google Scholar
Jardim, D. L., Goodman, A., de Melo Gagliato, D. & Kurzrock, R. The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell 39, 154–173 (2021).
Article CAS PubMed Google Scholar
Tumeh, P. C. et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515, 568–571 (2014).
Article CAS PubMed PubMed Central Google Scholar
Taube, J. M. et al. Colocalization of inflammatory response with B7-h1 expression in human melanocytic lesions supports an adaptive resistance mechanism of immune escape. Sci. Transl. Med. 4, 127ra137 (2012).
Article Google Scholar
Liu, Y. T. et al. Immune cell PD-L1 colocalizes with macrophages and is associated with outcome in PD-1 pathway blockade therapy. Clin. Cancer Res. 26, 970–977 (2020).
Article CAS PubMed Google Scholar
Yuan, Y. Spatial heterogeneity in the tumor microenvironment. Cold Spring Harb. Perspect Med. 6, a026583 (2016).
Lundberg, E. & Borner, G. H. H. Spatial proteomics: a powerful discovery tool for cell biology. Nat. Rev. Mol. Cell Biol. 20, 285–302 (2019).
Article CAS PubMed Google Scholar
Sorin, M. et al. Single-cell spatial landscapes of the lung tumour immune microenvironment. Nature 614, 548–554 (2023).
Article CAS PubMed PubMed Central Google Scholar
Cords, L. et al. Cancer-associated fibroblast phenotypes are associated with patient outcome in non-small cell lung cancer. Cancer Cell 42, 396–412.e395 (2024).
Article CAS PubMed PubMed Central Google Scholar
Mohanty, C. et al. SpatialView: an interactive web application for visualization of multiple samples in spatial transcriptomics experiments. Bioinformatics 40, btae117 (2024).
Ren, L. et al. Applications of single‑cell omics and spatial transcriptomics technologies in gastric cancer (Review). Oncol. Lett. 27, 152 (2024).
Article CAS PubMed PubMed Central Google Scholar
Cao, L. et al. Deciphering spatial domains from spatially resolved transcriptomics with Siamese graph autoencoder. Gigascience 13, giae003 (2024).
Yu, Q., Jiang, M. & Wu, L. Spatial transcriptomics technology in cancer research. Front Oncol. 12, 1019111 (2022).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Q. et al. The spatial transcriptomic landscape of non-small cell lung cancer brain metastasis. Nat. Commun. 13, 5983 (2022).
Article PubMed PubMed Central Google Scholar
Wang, X. et al. Spatial interplay patterns of cancer nuclei and tumor-infiltrating lymphocytes (TILs) predict clinical benefit for immune checkpoint inhibitors. Sci. Adv. 8, eabn3966 (2022).
Article CAS PubMed PubMed Central Google Scholar
Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193.e187 (2018).
Article CAS PubMed PubMed Central Google Scholar
Abousamra, S. et al. Deep learning-based mapping of tumor infiltrating lymphocytes in whole slide images of 23 types of cancer. Front Oncol. 11, 806603 (2021).
Article PubMed Google Scholar
Graham, S. et al. Hover-net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med Image Anal. 58, 101563 (2019).
Article PubMed Google Scholar
Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).
Article CAS PubMed PubMed Central Google Scholar
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med 30, 850–862 (2024).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Article PubMed Google Scholar
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Article CAS PubMed PubMed Central Google Scholar
Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).
Article PubMed Google Scholar
Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).
Wang, X. et al. RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).
Article PubMed Google Scholar
Bagaev, A. et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell 39, 845–865.e847 (2021).
Article CAS PubMed Google Scholar
Ayers, M. et al. IFN-gamma-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
Article PubMed PubMed Central Google Scholar
Ilse, M., Tomczak, J. & Welling, M. In International Conference on Machine Learning 2127–2136 (PMLR, 2024).
Pascal, L. E. et al. Correlation of mRNA and protein levels: cell type-specific gene expression of cluster designation antigens in the prostate. BMC Genom. 9, 246 (2008).
Article Google Scholar
Petitprez, F. et al. B cells are associated with survival and immunotherapy response in sarcoma. Nature 577, 556–560 (2020).
Article CAS PubMed Google Scholar
Chen, D. S. & Mellman, I. Elements of cancer immunity and the cancer-immune set point. Nature 541, 321–330 (2017).
Article CAS PubMed Google Scholar
Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362, eaar3593 (2018).
Park, S. et al. Artificial intelligence-powered spatial analysis of tumor-infiltrating lymphocytes as complementary biomarker for immune checkpoint inhibition in non-small-cell lung cancer. J. Clin. Oncol. 40, 1916–1928 (2022).
Article CAS PubMed PubMed Central Google Scholar
Ding, R. et al. Image analysis reveals molecularly distinct patterns of TILs in NSCLC associated with treatment outcome. NPJ Precis Oncol. 6, 33 (2022).
Article CAS PubMed PubMed Central Google Scholar
Horst, F. et al. CellViT: Vision Transformers for precise cell segmentation and classification. Med. Image Anal. 94, 103143 (2024).
Article PubMed Google Scholar
Gataa, I. et al. Tumour-infiltrating lymphocyte density is associated with favourable outcome in patients with advanced non-small cell lung cancer treated with immunotherapy. Eur. J. Cancer 145, 221–229 (2021).
Article CAS PubMed Google Scholar
Sadeghi Rad, H. et al. Understanding the tumor microenvironment for effective immunotherapy. Med. Res. Rev. 41, 1474–1498 (2021).
Article PubMed Google Scholar
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inform. Proc. Syst. https://doi.org/10.48550/arXiv.1705.07874 (2017).
El Nahhas, O. S. M. et al. Regression-based deep-learning predicts molecular biomarkers from pathology slides. Nat. Commun. 15, 1253 (2024).
Schmauch, B. et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).
Article CAS PubMed PubMed Central Google Scholar
Hoang, D. T. et al. Prediction of cancer treatment response from histopathology images through imputed transcriptomics. Res. Sq. https://doi.org/10.21203/rs.3.rs-3193270/v1 (2023).
Article PubMed PubMed Central Google Scholar
He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827–834 (2020).
Article CAS PubMed Google Scholar
Tan, X. et al. STimage:robust, confident and interpretable models for predicting gene markers from cancer histopathological images. bioRxiv https://doi.org/10.1101/2023.05.14.540710 (2023).
van Eekelen, L. et al. Comparing deep learning and pathologist quantification of cell-level PD-L1 expression in non-small cell lung cancer whole-slide images. Sci. Rep. 14, 7136 (2024).
Article PubMed PubMed Central Google Scholar
Sha, L. et al. Multi-field-of-view deep learning model predicts nonsmall cell lung cancer programmed death-ligand 1 status from whole-slide hematoxylin and eosin images. J. Pathol. Inf. 10, 24 (2019).
Article Google Scholar
Ligero, M. et al. Weakly supervised deep learning predicts immunotherapy response in solid tumors based on PD-L1 expression. Cancer Res. Commun. 4, 92–102 (2024).
Article PubMed PubMed Central Google Scholar
Brahmer, J. et al. Nivolumab versus docetaxel in advanced squamous-cell non-small-cell lung cancer. N. Engl. J. Med. 373, 123–135 (2015).
Article CAS PubMed PubMed Central Google Scholar
Borghaei, H. et al. Nivolumab versus docetaxel in advanced nonsquamous non-small-cell lung cancer. N. Engl. J. Med. 373, 1627–1639 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rittmeyer, A. et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Lancet 389, 255–265 (2017).
Article PubMed Google Scholar
Kamphorst, A. O. et al. Rescue of exhausted CD8 T cells by PD-1-targeted therapies is CD28-dependent. Science 355, 1423–1427 (2017).
Article CAS PubMed PubMed Central Google Scholar
Jenkins, R. W., Barbie, D. A. & Flaherty, K. T. Mechanisms of resistance to immune checkpoint inhibitors. Br. J. Cancer 118, 9–16 (2018).
Article CAS PubMed PubMed Central Google Scholar
Skapenko, A., Lipsky, P. E., Kraetsch, H. G., Kalden, J. R. & Schulze-Koops, H. Antigen-independent Th2 cell differentiation by stimulation of CD28: regulation via IL-4 gene expression and mitogen-activated protein kinase activation. J. Immunol. 166, 4283–4292 (2001).
Article CAS PubMed Google Scholar
Hu, J. et al. Using deep learning to predict anti-PD-1 response in melanoma and lung cancer patients from histopathology images. Transl. Oncol. 14, 100921 (2021).
Article CAS PubMed Google Scholar
Jaume, G. et al. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024).
Binnewies, M. et al. Understanding the tumor immune microenvironment (TIME) for effective therapy. Nat. Med. 24, 541–550 (2018).
Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst., Man, Cybern. 9, 62–66 (1979).
Article Google Scholar
Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell 41, 1650–1661.e1654 (2023).
Article CAS PubMed PubMed Central Google Scholar
Loeffler, C. M. L. et al. Direct prediction of homologous recombination deficiency from routine histology in ten different tumor types with attention-based multiple instance learning: a development and validation study. medRxiv https://doi.org/10.1101/2023.03.08.23286975 (2023).
Article PubMed PubMed Central Google Scholar
Macenko, M. et al. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (Boston, MA, 2009).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1711.05101 (2017).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Article CAS PubMed PubMed Central Google Scholar
Paszke, A. et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inform. Proc. Syst. https://doi.org/10.48550/arXiv.1912.01703 (2019).
Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
Article PubMed PubMed Central Google Scholar
Rousseeuw, P. J. Silhouettes - a graphical aid to the interpretation and validation of cluster-analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
Article Google Scholar
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Google Scholar
Chen, T. & Carlos G. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 785–794 (2016).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Article CAS PubMed Google Scholar
Fluss, R., Faraggi, D. & Reiser, B. Estimation of the Youden index and its associated cutoff point. Biom. J. 47, 458–472 (2005).
Article PubMed Google Scholar

Download references

Acknowledgements

This project was supported by an award from Upstate Foundation’s Hendricks Endowment. Data (digital images and clinical meta-data) from the institutional cohort was generated at the Pathology Research Core Lab using institutional resources and support.

Funding

Open access funding provided by the National Institutes of Health.

Author information

These authors jointly supervised this work: Sushant Patkar, Tamara Jamaspishvili.
These authors contributed equally: Sushant Patkar, Alex Chen.

Authors and Affiliations

Artificial Intelligence Resource (AIR), National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Sushant Patkar, Alex Chen, Peter Choyke, Stephanie Harmon & Baris Turkbey
Department of Hematology and Oncology, SUNY Upstate Medical University, Syracuse, NY, USA
Alina Basnet, Prashanth Ashok Kumar & Devashish Desai
Department of Pathology and Laboratory Medicine, SUNY Upstate Medical University, Syracuse, NY, USA
Amber Bixby, Rahul Rajendran, Rachel Chernet, Susan Faso, Ola El-Zammar, Christopher Curtiss, Saverio J. Carello, Michel R. Nasr & Tamara Jamaspishvili

Authors

Sushant Patkar
View author publications
Search author on:PubMed Google Scholar
Alex Chen
View author publications
Search author on:PubMed Google Scholar
Alina Basnet
View author publications
Search author on:PubMed Google Scholar
Amber Bixby
View author publications
Search author on:PubMed Google Scholar
Rahul Rajendran
View author publications
Search author on:PubMed Google Scholar
Rachel Chernet
View author publications
Search author on:PubMed Google Scholar
Susan Faso
View author publications
Search author on:PubMed Google Scholar
Prashanth Ashok Kumar
View author publications
Search author on:PubMed Google Scholar
Devashish Desai
View author publications
Search author on:PubMed Google Scholar
Ola El-Zammar
View author publications
Search author on:PubMed Google Scholar
Christopher Curtiss
View author publications
Search author on:PubMed Google Scholar
Saverio J. Carello
View author publications
Search author on:PubMed Google Scholar
Michel R. Nasr
View author publications
Search author on:PubMed Google Scholar
Peter Choyke
View author publications
Search author on:PubMed Google Scholar
Stephanie Harmon
View author publications
Search author on:PubMed Google Scholar
Baris Turkbey
View author publications
Search author on:PubMed Google Scholar
Tamara Jamaspishvili
View author publications
Search author on:PubMed Google Scholar

Contributions

S.P. and T.J. designed the study. A.B., A.B., R.R., R.C., S.F., P.K., D.D., O.E.Z, C.C., S.C., M.N., and T.J. curated and acquired the clinical and whole slide imaging data of SUNY patients. S.P. and A.C. developed and implemented the AI model, conducted the validation experiments and analyzed the clinical data. O.E.C., C.C., and T.J. reviewed AI model predictions on pathology slides of patients. S.P. and A.C. wrote the first draft of the manuscript. P.C., S.H., B.T. and T.J. supervised the project.

Corresponding authors

Correspondence to Sushant Patkar or Tamara Jamaspishvili.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Patkar, S., Chen, A., Basnet, A. et al. Predicting the tumor microenvironment composition and immunotherapy response in non-small cell lung cancer from digital histopathology images. npj Precis. Onc. 8, 280 (2024). https://doi.org/10.1038/s41698-024-00765-w

Download citation

Received: 15 July 2024
Accepted: 18 November 2024
Published: 19 December 2024
DOI: https://doi.org/10.1038/s41698-024-00765-w