Introduction

Lung cancer is the most common cancer worldwide and one of the leading causes of cancer-related mortality1. Non-small cell lung cancer is the major histological type of lung cancer, and lung adenocarcinoma (LUAD) is its predominant subtype. Over the past three decades, the 5-year overall survival rate of lung cancer has remained below 20%2. Even after surgical resection, patients with early-stage LUAD face a high risk of recurrence or metastasis3.

The phenomenon of spread through air spaces (STAS), a distinct pattern of intrapulmonary metastasis, was first reported in 19804. In 2015, the World Health Organization (WHO) formally incorporated the STAS concept into its classification of lung tumors5. STAS refers to the dissemination of tumor cells within the pulmonary parenchyma beyond the margins of the primary tumor6 and occurs in three forms: single cells, micropapillary clusters, and solid nests5.

Numerous retrospective clinical studies have reported that STAS is associated with poorer prognosis, higher recurrence risk, and more advanced clinicopathological stage7,8,9,10. Although STAS is considered an independent high-risk factor in LUAD, its implications for adjuvant therapy in surgically treated patients with early-stage LUAD remain uncertain: adjuvant therapy may benefit STAS-positive patients with stage IB disease, whereas its benefit for stage IA patients is controversial11,12,13. Moreover, interobserver agreement among pathologists in diagnosing STAS in resected LUAD is only moderate14. Beyond the mere presence of STAS, Hironori Uruga and Tamás Zombori suggested that a greater number of STAS in early-stage LUAD is associated with worse prognosis15,16. In addition, two studies examined the relationship between the distance of STAS from the primary tumor and prognosis and found that STAS located farther from the tumor was associated with poorer outcomes8,17. However, STAS counting and distance assessment are laborious and time-consuming and are therefore not performed routinely.

Accurate diagnosis and quantitative analysis of STAS could help predict recurrence and resolve the current dilemma in treatment decision-making for patients with early-stage LUAD. However, two barriers, namely limited interobserver agreement and laborious, time-consuming STAS quantification, hinder the clinical use of STAS. An automated and reliable method to diagnose and quantify STAS is therefore urgently needed to address these challenges in its clinical application.

Deep learning, a branch of artificial intelligence (AI), has made remarkable inroads across diverse fields, including the analysis of digital pathology slides. Deep learning models have been employed to detect lesions18, classify lesions19, predict genetic mutation types20, and assess patient prognosis21. They also perform well in quantitative image analysis, for example accurately measuring the percentage of each tissue type on whole slide images (WSIs) of benign breast biopsy samples22 and establishing quantitative criteria for lumbar disc degeneration on magnetic resonance imaging23. Our team previously constructed a deep learning model that distinguishes the morphological characteristics of the five pathological subtypes of LUAD using quantitative and spatial parameters24. When implemented, an efficient AI model can provide diagnostic support and improve interobserver consistency among physicians24. AI can also relieve physicians of laborious, challenging, and time-consuming tasks.

To address these problems and assist the clinical assessment of STAS in patients with early-stage LUAD, we developed a robust deep learning model, STASNet, for STAS detection. STASNet diagnoses STAS accurately, enables its quantitative analysis, and improves STAS-based risk stratification. To the best of our knowledge, this is the first study to construct a deep learning model for STAS detection and quantification using digitized H&E-stained WSIs.

Methods

Dataset establishment

This study was approved by the Institutional Review Board of Jiangsu Cancer Hospital and Jiangsu Institute of Cancer Research. The requirement for informed consent was waived because of the retrospective nature of the study. Three pathological image datasets were constructed: the internal dataset from the Affiliated Cancer Hospital of Nanjing Medical University (ACHNJMU), the external dataset from Taizhou Hospital of Traditional Chinese Medicine (THTCM), and the public dataset from The Cancer Genome Atlas (TCGA).

For the ACHNJMU dataset, we collected 349 WSIs from 119 patients who underwent lung surgery between 2016 and 2020. For the external validation dataset (THTCM), we collected 245 WSIs from 90 patients who underwent lung surgery between 2018 and 2020. For the TCGA-LUAD dataset, we downloaded 540 WSIs from 478 patients in the TCGA-LUAD cohort in February 2022. Based on our research objectives, only patients who met all of the following criteria were included: (i) a clear diagnosis of LUAD; (ii) available routine pathological sections; (iii) detailed follow-up information; and (iv) pathological stage IA–IB. WSIs were excluded if they (i) contained multiple foci on a single slide; (ii) contained no adjacent normal lung tissue (the normal tissue on included H&E slides had to lie at least five normal alveoli away from the tumor); (iii) contained no main tumor area; or (iv) were of poor quality, for example bent, wrinkled, blurred, or color-distorted. In total, 268 patients and their corresponding 489 WSIs were included. Supplementary Fig. 1A–C presents the detailed inclusion and exclusion process for each dataset, and Supplementary Fig. 1D presents the numbers of patients and WSIs included in each dataset. The last follow-up date was March 6, 2023; the median follow-up was 41.3 months (95% confidence interval (CI): 38.1–46.8 months) for the ACHNJMU dataset and 34.9 months (95% CI: 33.5–35.8 months) for the THTCM dataset.

Labeling and preprocessing

Slides in the ACHNJMU and THTCM datasets were digitized with a KF-PRO-400 scanner (KFBIO, Yuyao, Zhejiang, P.R.C.) at 40× magnification using a 20× brightfield objective lens, with a resolution of 0.2418 μm/pixel. The scanned slides were exported as single-file pyramidal tiled TIFF images in SVS format and subsequently converted into single-lesion WSIs in OME-TIFF format25,26.

In this study, the gold standard for STAS diagnosis followed the 2015 WHO criteria: tumor cells within air spaces beyond the edge of the main tumor. The same standard was applied to all patients. Two experienced pathologists meticulously annotated the boundaries of the whole tumor lesion and the STAS-containing regions of interest (ROIs) in the WSIs using QuPath v0.326. Disagreements were resolved by consensus through discussion with a third experienced pathologist. LUAD epithelial cells were labeled with CK7 (ab68459) to confirm STAS that was difficult to determine (Supplementary Fig. 3A). Using the Python packages OpenSlide v1.1.2 and Pillow v8.4.0, the annotated WSIs were divided into tiles at different resolutions27. The pathologists then reviewed and cleaned the tile dataset to ensure labeling accuracy and reliability. This yielded a large dataset of STAS and normal tiles at each resolution (Supplementary Table 2).
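The tiling step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: tiles are assumed to be non-overlapping and read at pyramid level 0 (the 40× scan), incomplete edge tiles are assumed to be dropped, and the helper names `tile_grid` and `tile_size_um` are hypothetical. The commented lines show how a tile could be read with OpenSlide's `OpenSlide.read_region`; the file name is a placeholder.

```python
"""Hypothetical sketch: non-overlapping tile grid for a WSI."""

# Pixel size reported for the KF-PRO-400 scans (micrometres per pixel at 40x).
MICRONS_PER_PIXEL = 0.2418


def tile_grid(width, height, tile_size):
    """Return (x, y) top-left corners of all full, non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def tile_size_um(tile_size, mpp=MICRONS_PER_PIXEL):
    """Physical edge length of a square tile in micrometres."""
    return tile_size * mpp


# With OpenSlide, each tile could then be read at level 0, e.g.:
#   import openslide
#   slide = openslide.OpenSlide("example.svs")  # hypothetical path
#   w, h = slide.dimensions
#   for x, y in tile_grid(w, h, 256):
#       tile = slide.read_region((x, y), 0, (256, 256)).convert("RGB")

if __name__ == "__main__":
    for size in (128, 256, 512):
        print(size, round(tile_size_um(size), 1))
```

At the reported 0.2418 μm/pixel, a 128 × 128 tile spans about 31.0 μm per side, a 256 × 256 tile about 61.9 μm, and a 512 × 512 tile about 123.8 μm.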

The tile sizes were set to 128 × 128, 256 × 256, and 512 × 512 pixels at 40× magnification. Figure 1C shows example tiles from the normal, main tumor, and STAS regions. To mitigate overfitting, a data augmentation workflow comprising five histological image augmentation techniques was employed: random Gaussian blur, random affine transformation, random elastic transformation, random HED color space perturbation, and random gamma transformation28,29.
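Of the five augmentations, random gamma transformation is the simplest to write out. The sketch below is illustrative only: the gamma range [0.8, 1.2] and the function names are assumptions, and a production pipeline would normally rely on an image-augmentation library rather than pure Python loops.

```python
"""Hypothetical sketch of a random gamma augmentation for an RGB tile."""
import random


def gamma_transform(pixels, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every channel value.

    `pixels` is a nested list [row][col] of (R, G, B) tuples in 0..255.
    gamma < 1 brightens the tile; gamma > 1 darkens it.
    """
    return [
        [tuple(round(255 * (c / 255) ** gamma) for c in px) for px in row]
        for row in pixels
    ]


def random_gamma(pixels, low=0.8, high=1.2, rng=random):
    """Draw gamma uniformly from [low, high] (range is an assumption)."""
    return gamma_transform(pixels, rng.uniform(low, high))


if __name__ == "__main__":
    tile = [[(200, 120, 180), (255, 255, 255)]]
    print(random_gamma(tile, rng=random.Random(0)))
```

Because pure black (0) and pure white (255) are fixed points of the transform, gamma augmentation changes mid-tone contrast while leaving the background threshold region essentially intact.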

Fig. 1: Dataset characteristics and pre-training preparation for model.

A Comprehensive study flowchart. B Numbers of patients and WSIs in the three cohorts. C Representative normal, tumor, and STAS tiles at 128 × 128, 256 × 256, and 512 × 512 pixels.

Model architecture

In this study, four deep learning models pretrained on ImageNet were fine-tuned for image feature extraction and class prediction: MobileNet V330, Swin Transformer31, DenseNet12132, and ResNet1833. After comparing the accuracy, training time, and number of parameters of each model at each resolution, we selected MobileNetV3 at a 256 × 256 resolution as the main model for further analysis. Figure 2A presents the detailed framework of the MobileNetV3 model. The Bneck block, the key building block of MobileNetV3, factorizes a conventional convolution into a depth-wise convolution and a 1 × 1 point-wise convolution. This factorization considerably reduces the number of parameters and the computational cost while maintaining good performance, thereby improving model efficiency. The Bneck block also incorporates a Squeeze-and-Excitation (SE) attention module, which reweights input feature channels according to their importance and thus improves model performance. To exploit subtle features and better classify complex pathological images, a four-layer neural network was attached to the backbone as the classification head, and the dropout rate of its two hidden layers was set to 0.2 to reduce overfitting.
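The parameter saving from factorizing a convolution can be checked with simple arithmetic. The layer sizes below (3 × 3 kernel, 64 input and 64 output channels) are illustrative assumptions rather than STASNet's actual layers, and biases are ignored.

```python
"""Worked arithmetic: weights in a standard vs depth-wise separable convolution."""


def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k convolution (one filter per input channel)
    followed by a 1 x 1 point-wise convolution (no biases)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 64  # illustrative layer sizes, not STASNet's
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 1))
```

In general, the separable form needs a fraction 1/C_out + 1/k² of the weights of a standard convolution; for the example above (36,864 versus 4,672 weights) that is roughly 1/7.9.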

Fig. 2: Development of deep learning models and WSI workflow.

A Detailed architecture of STASNet. B WSI-level workflow based on STASNet.

Training details and establishment of the WSI prediction workflow

The ACHNJMU tile dataset was randomly split into training and validation sets at a 7:3 ratio. The ACHNJMU training set was used to train the deep learning model, and the ACHNJMU validation set served for internal validation. To verify the generalization capacity of the model, the THTCM tile dataset was used as an external validation set. In addition, a dataset of 10,000 carefully selected tiles (5000 STAS and 5000 non-tumor) was created for model evaluation, designated the ACHNJMU public dataset, and made publicly available in the National Genomics Data Center database (Project Number: PRJCA020924). During training, three-fold cross-validation was used: the training set was randomly divided into three subsets, and in each fold two subsets were used for fine-tuning and the remaining subset for evaluation. The model was optimized with the cross-entropy loss function and the Adam optimizer. Each fold was trained for 100 epochs, and the accuracy, training time, and parameter size of the model were recorded.
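The data partitioning described above can be sketched as follows. This is a hypothetical illustration using the standard-library `random` module; the seed, the function names, and the use of tile identifiers as the items being split are assumptions.

```python
"""Hypothetical sketch: 7:3 random split, then 3-fold cross-validation."""
import random


def train_val_split(items, train_frac=0.7, seed=0):
    """Shuffle a copy of `items` and split it into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def three_fold(items, seed=0):
    """Yield (fit, held_out) pairs: two folds for fine-tuning, one for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::3] for i in range(3)]
    for i in range(3):
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, folds[i]


if __name__ == "__main__":
    train, val = train_val_split(range(10))
    for fit, held_out in three_fold(train):
        print(len(fit), len(held_out))
```

Each of the three (fit, held-out) pairs would then correspond to one fold trained for 100 epochs.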

The model's efficacy at the WSI level was also evaluated on the ACHNJMU, THTCM, and TCGA-LUAD datasets. To determine whether STAS was present in a WSI, a WSI-level STAS detection workflow was created based on handcrafted annotations of the main tumor regions (Fig. 2B). The tiles of each WSI were categorized as background, normal, or main tumor tiles. Specifically, a tile whose R, G, and B values all exceeded 220 was close to white and was treated as background with no tissue coverage. The remaining tiles were separated into normal and main tumor tiles according to the main tumor region labels. Finally, the tile predictions were remapped onto the original WSI and displayed as a heatmap-style point plot, allowing a comprehensive assessment of the presence and distribution of STAS across the WSI. In this plot, each dot represents a tile: main tumor tiles are shown in blue, normal tiles in green, and the top 10 tiles predicted to be STAS in hematein color. Dot size indicates the likelihood of STAS, with the largest dots marking the top 10 tiles, and color intensity reflects the distance to the main tumor border, with darker tiles farther away and lighter tiles closer to the border.
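A minimal sketch of the tile categorization step follows. It assumes that the 220 threshold is applied to each tile's mean R, G, and B values (rather than to every pixel) and that a tile counts as main tumor when its centre falls inside the pathologist's tumor outline, represented here as a polygon of pixel coordinates; all names are hypothetical.

```python
"""Hypothetical sketch: sort tiles into background, main tumor, or normal."""


def is_background(pixels, threshold=220):
    """True if the mean R, G, and B of the tile all exceed `threshold`.

    `pixels` is a non-empty flat list of (R, G, B) tuples. Using channel
    means rather than every single pixel is an assumption.
    """
    n = len(pixels)
    means = [sum(px[c] for px in pixels) / n for c in range(3)]
    return all(m > threshold for m in means)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the closed polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def categorize_tile(pixels, center, tumor_polygon):
    """Return 'background', 'tumor', or 'normal' for one tile."""
    if is_background(pixels):
        return "background"
    if point_in_polygon(center[0], center[1], tumor_polygon):
        return "tumor"
    return "normal"
```

Checking background first means near-white tiles inside the tumor outline (for example, empty alveolar lumens) are also discarded before classification.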

Identification and evaluation of the STAS-related quantitative scores

To predict the presence of STAS in LUAD WSIs, a five-step workflow was created. (1) Input: a digitized hematoxylin and eosin (H&E)-stained WSI was provided as input. (2) Label: a pathologist drew a simple outline delineating the main tumor and normal tissue. (3) Tile cutting: the WSI was segmented into 256 × 256-pixel tiles. (4) STASNet: the STAS deep learning model assigned an STAS score to each tile; tiles with an STAS score > 0.5 were considered STAS, the top 10 tiles with the highest STAS scores were highlighted, and the pixel distance of each tile from the main tumor boundary was calculated. (5) Results output: seven semi-quantitative scores were derived from the STASNet predictions:

$$\text{STAS num} = \text{Number of tiles in the non-tumor area with STAS}$$
$$\text{STAS to Tumor} = \frac{\text{STAS num}}{\text{Tumor num}}$$
$$\text{STAS to Normal} = \frac{\text{STAS num}}{\text{Normal num}}$$
$$\text{STAS to All} = \frac{\text{STAS num}}{\text{Normal num} + \text{Tumor num}}$$
$$\text{All score} = \sum_{\text{STAS tiles}} \text{AI score} \times \text{tile distance}$$
$$\text{Top10 Score (T10S)} = \sum_{\text{Top 10 tiles}} \text{AI score} \times \text{tile distance}$$
$$\text{Max STAS distance} = \max_{\text{STAS tiles}} \text{tile distance}$$

(Tumor num = number of tiles in the main tumor area; Normal num = number of tiles in the non-tumor area without STAS; Tile distance = pixel distance between the tile and the main tumor boundary.)
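The seven scores can be computed from tile-level outputs as in the sketch below. It is a hypothetical illustration, not the STASNet code. It assumes that each non-tumor tissue tile is summarized by its AI score and its pixel distance to the tumor boundary, that T10S ranks all non-tumor tiles by AI score (not only those above 0.5), and that a ratio with a zero denominator, or a maximum over an empty set, is reported as 0.

```python
"""Hypothetical sketch: the seven WSI-level STAS scores from tile predictions."""

STAS_CUTOFF = 0.5  # tiles with an AI score > 0.5 are counted as STAS


def stas_scores(tumor_num, nontumor_tiles, cutoff=STAS_CUTOFF, top_k=10):
    """Compute the seven scores for one WSI.

    tumor_num: number of tiles inside the main tumor outline.
    nontumor_tiles: list of (ai_score, distance) pairs for non-tumor,
        non-background tiles; distance is the pixel distance to the
        main tumor boundary.
    Assumptions: T10S ranks all non-tumor tiles by AI score, and a ratio
    with a zero denominator (or an empty maximum) is reported as 0.
    """
    stas = [(s, d) for s, d in nontumor_tiles if s > cutoff]
    stas_num = len(stas)
    normal_num = len(nontumor_tiles) - stas_num

    def ratio(num, den):
        return num / den if den else 0.0

    top = sorted(nontumor_tiles, key=lambda t: t[0], reverse=True)[:top_k]
    return {
        "STAS num": stas_num,
        "STAS to Tumor": ratio(stas_num, tumor_num),
        "STAS to Normal": ratio(stas_num, normal_num),
        "STAS to All": ratio(stas_num, normal_num + tumor_num),
        "All score": sum(s * d for s, d in stas),
        "T10S": sum(s * d for s, d in top),
        "Max STAS distance": max((d for _, d in stas), default=0),
    }
```

Because each tile's score is multiplied by its distance, T10S grows both with the model's confidence and with how far the suspected STAS lies from the tumor, which is the spatial component the later results emphasize.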

If a patient had multiple WSIs, the maximum value of each score across those WSIs was taken as the patient's final score. A patient's STAS status was determined from the four scores directly related to the STAS number, and all seven scores were used to predict recurrence risk. Receiver operating characteristic (ROC) curve16 analysis was performed to assess prediction of STAS status. To compare the capacity of the scores to predict disease-free survival (DFS), Harrell's concordance index (C-index) and time-dependent area under the ROC curve (AUC)34 analyses were used. Kaplan–Meier survival curves were plotted to present differences in survival, and univariate and multivariate Cox regression analyses were performed to ascertain the significance of the STAS-related quantitative scores in risk assessment.
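Harrell's C-index, used above to compare the scores, measures how often the patient with the higher score recurs first. The study computed it with the R survival package; the pure-Python version below only illustrates the definition, with the simplifying assumption that pairs with tied follow-up times are skipped.

```python
"""Illustrative Harrell's C-index (the study itself used the R survival package)."""


def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs whose risk ordering matches their outcome.

    times: follow-up times; events: 1 = recurrence observed, 0 = censored;
    risk: higher value = higher predicted risk (e.g. a T10S-like score).
    A pair (i, j) is comparable when subject i had an event before time j.
    Ties in risk count as 0.5; ties in time are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported C-index values of roughly 0.56–0.63 in context.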

Statistical analysis

Data processing, statistical analysis, and plotting were all performed using Python (v3.7) and R (v4.0.5) software. The performance of the deep learning model was evaluated by analyzing the confusion matrix and recording relevant metrics. Statistical calculations were performed using the scikit-learn 1.0.1 machine learning module and the PyCM 3.5 library in Python 3.7.1235. The C-index was calculated using the survival (v3.4) package in R. The ROC analysis for predicting binary categorical variables was performed using the pROC (v1.18) package. Optimal cut-off values were determined using the survminer (v0.4) package in R. The time-dependent AUC analysis for survival variables was performed using the timeROC (v0.4) package. Univariate and multivariate Cox regression analyses, Kaplan–Meier analysis, and the log-rank test were performed using the survival (v3.4) package in R. p < 0.05 was considered statistically significant.

Results

Dataset characteristics and pre-training preparation for the model

Figure 1A illustrates the overall study flowchart. Digitized H&E-stained LUAD pathology slides were used to construct an effective deep learning model for detecting and semi-quantitatively analyzing STAS, and the model was then applied to assist LUAD diagnosis and treatment. The study included 249 WSIs of 81 patients from the ACHNJMU dataset, 125 WSIs of 72 patients from the THTCM dataset, and 115 WSIs of 115 patients from the TCGA-LUAD dataset (Fig. 1B). Supplementary Table 1 presents the baseline characteristics of the three cohorts. The ACHNJMU dataset was used to train the deep learning model, and the THTCM and TCGA-LUAD datasets served as two external validation datasets. The pathologists labeled the primary tumor, normal tissue, and STAS regions in the WSIs, and STAS diagnostic concordance between the two pathologists was substantial (kappa = 0.704, Supplementary Fig. 2A). Disagreements were resolved by consensus through discussion with a third experienced pathologist, and ambiguous areas were further clarified by immunohistochemistry (Supplementary Fig. 3A).
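Cohen's kappa, the agreement statistic reported above, corrects raw agreement for the agreement expected by chance. A minimal sketch follows; scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same quantity, and the function name here is illustrative.

```python
"""Cohen's kappa for two raters' binary STAS calls (illustrative)."""


def cohen_kappa(a, b):
    """kappa = (p_o - p_e) / (1 - p_e) for two equal-length label lists.

    p_o is observed agreement; p_e is chance agreement from each rater's
    label frequencies. Undefined (division by zero) if p_e == 1.
    """
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(k) / n) * (b.count(k) / n) for k in labels)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    rater1 = [1, 1, 0, 0]  # 1 = STAS present (toy data)
    rater2 = [1, 0, 0, 0]
    print(cohen_kappa(rater1, rater2))
```

On the widely used Landis–Koch scale, kappa values between 0.61 and 0.80 are conventionally described as substantial agreement.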

Based on the pathologists' recommendations and the model's input requirements, three tile resolutions were chosen: 128 × 128, 256 × 256, and 512 × 512 pixels (Fig. 1C). Manual annotation yielded 172,825, 55,942, and 19,279 STAS tiles and 548,960, 171,905, and 53,021 normal tiles at the three resolutions, respectively (Supplementary Table 2). To improve the model's generalizability and robustness across datasets, data augmentation was applied before training (Supplementary Fig. 4A) to simulate alterations that the lesions may undergo during slide preparation and scanning in real-world settings.

Development of deep learning models and WSI workflow

Four deep learning architectures, namely ResNet18, DenseNet121, MobileNetV3, and Swin Transformer, were evaluated. Each model was trained on the ACHNJMU training tile dataset, and its performance was evaluated on the ACHNJMU validation dataset and the external THTCM tile dataset. We recorded the parameter size, training speed, and validation accuracy on the ACHNJMU training dataset (Supplementary Fig. 5A–C) and compared the efficiency of the models at the three resolutions. Across resolutions, MobileNetV3, DenseNet121, ResNet18, and Swin Transformer achieved accuracies of 0.857–0.933 (Supplementary Fig. 5A). Notably, MobileNetV3 attained the highest accuracy, 0.933, at the 256 × 256 pixel resolution, while also having a smaller parameter size and faster training speed (Supplementary Fig. 5B, C). We further validated the models' accuracy on the ACHNJMU validation tile dataset and the THTCM tile dataset (Supplementary Fig. 5D, E). MobileNetV3 again showed faster validation speed and excellent discriminatory performance, with the highest AUC, sensitivity, and specificity among all subgroups (Supplementary Fig. 5F–I). For discriminating STAS from non-STAS tiles, its AUC was 0.872 on the ACHNJMU validation dataset (cutoff value = 0.5, specificity = 92.6%, sensitivity = 66.7%) and 0.898 on the THTCM dataset (cutoff value = 0.5, specificity = 92.6%, sensitivity = 69.1%). Similar performance was observed when MobileNetV3 was further evaluated on the ACHNJMU public dataset (Supplementary Fig. 5J, K), with an AUC of 0.92 (cutoff value = 0.5, specificity = 99.4%, sensitivity = 86.5%).
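The tile-level metrics quoted above can be computed from labels and scores as in the following sketch. It is illustrative: STAS tiles are labeled 1, a tile is called STAS when its score exceeds the 0.5 cutoff, and the AUC is computed in its rank (Mann–Whitney) form rather than by integrating the ROC curve; `sklearn.metrics.roc_auc_score` returns the same value.

```python
"""Tile-level metrics at a fixed cutoff, plus a rank-based AUC (illustrative)."""


def sens_spec(labels, scores, cutoff=0.5):
    """Sensitivity and specificity when score > cutoff is called STAS (label 1)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Probability that a random STAS tile outscores a random non-STAS tile."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because the AUC is threshold-free while sensitivity and specificity depend on the 0.5 cutoff, a model can have a high AUC yet modest sensitivity at that cutoff, as in the validation results above.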

Because MobileNetV3 performed best, this framework was adopted as STASNet for further research (Fig. 2A). Given its promising capacity, STASNet was deployed at the WSI level for STAS detection through a five-step workflow: Input, Label, Tile Cutting, STASNet, and Results Output (Fig. 2B). Although the MobileNetV3 backbone of STASNet discriminates STAS stably at the tile level, a single WSI is so large that segmentation produces a very large number of tiles, so even a small tile-level misclassification rate generates many false-positive tiles per WSI. Relying solely on a binary threshold of 0.5 for individual tiles to decide whether STAS is present in the entire WSI is therefore impractical. To evaluate STAS at the WSI level, seven STAS-related scores were introduced: four scores directly related to the STAS number (STAS num, STAS to Tumor, STAS to Normal, and STAS to All) and three space-related semi-quantitative scores (All score, T10S, and Max STAS distance), as defined in the Methods section. T10S is the sum, over the 10 tiles with the highest STAS probability scores, of each tile's score multiplied by its distance from the tumor boundary (Supplementary Fig. 6A). These scores were then evaluated for STAS detection and prognostic prediction.
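The false-positive problem described above is a matter of simple arithmetic. Assuming, for illustration only, that tile errors are independent and that a WSI contains a hypothetical 10,000 non-STAS tissue tiles, the reported tile-level specificity of 92.6% would still leave about 740 false-positive tiles:

```python
"""Worked arithmetic: false-positive tiles expected from a per-tile error rate."""


def expected_false_positives(n_non_stas_tiles, specificity):
    """Expected number of non-STAS tiles scored above the cutoff."""
    return n_non_stas_tiles * (1 - specificity)


# Tile-level specificity reported for the ACHNJMU validation set.
SPECIFICITY = 0.926
# Hypothetical WSI with 10,000 non-STAS tissue tiles (illustrative only).
print(round(expected_false_positives(10_000, SPECIFICITY)))
```

This illustrates why WSI-level scores that aggregate and rank many tiles are more practical than treating any single tile above 0.5 as evidence of STAS.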

STASNet could accurately identify STAS

We first evaluated STASNet on the three independent WSI datasets. As shown in Fig. 3A, D, WSIs from the ACHNJMU and THTCM datasets had similar imaging styles because they were scanned under the same conditions. In contrast, the public TCGA-LUAD dataset showed darker staining, indicating an imaging style distinct from that of the other two datasets (Fig. 3G). We then displayed the STASNet predictions at the WSI level for the three datasets and highlighted the tiles with the highest predicted scores (Fig. 3B, E, H, respectively). Tiles closer to the main tumor were more likely to be predicted as STAS, so we analyzed the distances between all STAS tiles and the main tumor across the three datasets. STAS tiles were concentrated predominantly in the vicinity of the main tumor (Supplementary Fig. 7A–C). This is consistent with the hypothesis that STAS originates from the main tumor and shares its morphological characteristics, so that STAS becomes less frequent with increasing distance from the primary tumor. Inspection of the tiles with the highest STAS scores predicted by STASNet (Fig. 3C, F, I) revealed typical STAS structures, further supporting the reliability of our model. However, in the TCGA-LUAD cohort, some histiocytes interfered with identification because of the darker staining style.
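The tile-to-boundary distance used in these analyses can be computed as the shortest Euclidean distance from a tile to the tumor outline. The sketch below is one possible implementation, assuming the outline is a polygon of pixel coordinates and that the tile centre is used as the tile's position; the function names are hypothetical.

```python
"""Hypothetical sketch: pixel distance from a tile centre to the tumor outline."""
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(point, polygon):
    """Minimum distance from `point` to any edge of the closed polygon."""
    x, y = point
    n = len(polygon)
    return min(
        point_segment_distance(x, y, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )
```

Multiplying a pixel distance by 0.2418 converts it to micrometres at the scan resolution.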

Fig. 3: STASNet was able to accurately identify the STAS.

A, D, G Representative digitized H&E-stained WSIs of LUAD from the three cohorts. B, E, H STASNet results for the representative WSIs. C, F, I Representative tiles (256 × 256 pixels) from the top 10 tiles ranked by STAS score.

Based on the WSI workflow described in Fig. 2B, the four semi-quantitative scores directly related to the STAS number were calculated. At the WSI level, ROC analysis yielded AUC values of 0.7184–0.7523 for the ACHNJMU cohort (Supplementary Fig. 8A), 0.7404–0.7707 for the THTCM cohort (Supplementary Fig. 8B), and 0.5939–0.7215 for the TCGA-LUAD cohort (Supplementary Fig. 8C). These semi-quantitative metrics showed higher sensitivity than specificity in interpreting STAS status (Supplementary Fig. 8D–F). This result is consistent with earlier findings for the tile dataset. At the patient level, each score was determined from all of the patient's WSIs. This mirrors clinical practice, in which multiple H&E-stained sections are prepared for each patient and the pathologist combines the findings from all of them to evaluate the patient's STAS status. At the patient level, ROC analysis yielded AUC values of 0.7656–0.7814 and 0.7374–0.7786 for the ACHNJMU (Supplementary Fig. 8G) and THTCM (Supplementary Fig. 8H) cohorts, respectively; overall, the AUC was slightly higher at the patient level than at the WSI level. Again, the sensitivity of the semi-quantitative indicators was considerably higher than their specificity (Supplementary Fig. 8I, J). Because the TCGA-LUAD dataset contained only one WSI per patient, its patient-level results are identical to its WSI-level results. Overall, these results indicate that STASNet successfully identified the STAS status of the patients. The relatively limited performance on the TCGA-LUAD cohort may be attributable to its substantial difference from the training dataset and to the availability of only a single WSI per patient.
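The patient-level aggregation (the maximum of each score across a patient's WSIs, as defined in the Methods) can be sketched as below; the record layout and function name are hypothetical. Each score's maximum is taken independently, so different scores for the same patient may come from different WSIs.

```python
"""Hypothetical sketch: patient-level scores as the maximum over a patient's WSIs."""
from collections import defaultdict


def patient_scores(wsi_records):
    """wsi_records: iterable of (patient_id, {score_name: value}) per WSI.

    Returns {patient_id: {score_name: max value across that patient's WSIs}}.
    """
    result = defaultdict(dict)
    for patient_id, scores in wsi_records:
        merged = result[patient_id]
        for name, value in scores.items():
            merged[name] = max(value, merged.get(name, value))
    return dict(result)


if __name__ == "__main__":
    records = [
        ("P1", {"T10S": 120.0, "STAS num": 3}),
        ("P1", {"T10S": 300.0, "STAS num": 1}),
        ("P2", {"T10S": 50.0, "STAS num": 0}),
    ]
    print(patient_scores(records))
```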

Space-related T10S is an excellent recurrence predictor

Given the substantial differences in prognosis and clinicopathological characteristics associated with STAS status7,8,9,10, we hypothesized that the STAS predictions generated by STASNet could serve as a reliable prognostic indicator, and we evaluated the efficacy of the seven scores in predicting recurrence time in LUAD patients. Univariate Cox regression analysis in all three datasets showed that higher values of each of the seven scores were associated with increased recurrence risk, with T10S having the highest hazard ratio (HR) (Supplementary Fig. 9A). C-index analysis revealed that T10S had the highest predictive value for LUAD recurrence in all three datasets (Supplementary Fig. 9B, D, F). Time-dependent AUCs for 1-, 2-, and 3-year DFS also showed that T10S predicted recurrence well in the ACHNJMU, THTCM, and TCGA-LUAD cohorts (AUC: 0.63–0.80) (Supplementary Fig. 9C, E, G). In T10S-based survival analysis, recurrence times were significantly shorter in patients with higher T10S (p < 0.01) (Supplementary Figs. 10A, 11A, 12A). In the subgroup analysis of STAS-positive patients, T10S stratified recurrence risk effectively (p < 0.01) (Supplementary Figs. 10B, 11B, 12B), whereas, as expected, it did not stratify STAS-negative patients (p > 0.05) (Supplementary Figs. 10C, 11C, 12C). Subgroup analysis by pathological stage showed that stratification was more effective in stage IB patients, likely because the overall recurrence risk was lower in stage IA patients (Supplementary Figs. 10D, E, 11D, E, 12D, E).
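A Kaplan–Meier DFS estimate for a high- versus low-T10S split can be sketched in pure Python as follows. This is illustrative only: the study used the R survival and survminer packages, the optimal cutoff selection performed by survminer is not reproduced here, and the cutoff passed to `split_by_cutoff` is therefore hypothetical. At tied times, censored patients are kept in the risk set for that time, the usual convention.

```python
"""Illustrative Kaplan-Meier estimate of DFS for a high- vs low-score split."""


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = recurrence, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def split_by_cutoff(scores, cutoff):
    """Indices of 'high' (score > cutoff) and 'low' groups; cutoff is hypothetical."""
    high = [i for i, s in enumerate(scores) if s > cutoff]
    low = [i for i, s in enumerate(scores) if s <= cutoff]
    return high, low
```

Each group's times and events would be passed to `kaplan_meier` separately, and the two curves compared with a log-rank test.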

In clinical practice, the pathological diagnosis is the gold standard for determining whether a patient is STAS-positive or STAS-negative. We therefore compared the abilities of T10S and pathologist-diagnosed STAS status to predict recurrence risk in LUAD patients, using survival analysis on the three datasets combined. Both T10S and STAS status effectively differentiated recurrence risk in the overall patient population (p < 0.05) (Fig. 4A, D) and in the stage IA subgroup (p < 0.05) (Fig. 4B, E), with T10S showing a stronger stratification effect. However, STAS status could not predict recurrence risk in stage IB patients (p > 0.05) (Fig. 4C), whereas T10S showed significant predictive power (p < 0.001) (Fig. 4F). C-index analysis and 1-, 2-, and 3-year time-dependent ROC analyses showed that T10S outperformed STAS status in predicting recurrence (Fig. 4G–J). Calibration analyses showed good agreement between the predicted and observed DFS for both T10S and STAS status (Supplementary Fig. 13A, B). Further subgroup univariate regression analysis across all three cohorts identified T10S as a significant risk factor in all clinical subgroups except STAS-negative patients and those with the lepidic predominant adenocarcinoma (LPA) subtype (Table 1). In contrast, STAS status stratified risk only in stage IA, non-smoking, and female patients (Table 2).

Fig. 4: Space-related T10S is an excellent recurrence predictor.

A–C DFS curves for LUAD patients with STAS-positive versus STAS-negative status; p values are from log-rank tests. A All patients (n = 268, p = 0.0042). B Stage IA patients (n = 150, p = 0.0016). C Stage IB patients (n = 118, p = 0.42). D–F DFS curves for LUAD patients with high versus low T10S. D All patients (n = 268, p = 0.00000028). E Stage IA patients (n = 150, p = 0.0010). F Stage IB patients (n = 118, p = 0.0000046). G C-index of T10S and STAS in the total cohort (T10S: C-index = 0.633, 95% CI = 0.586–0.680; STAS: C-index = 0.561, 95% CI = 0.513–0.609). H–J Time-dependent ROC analyses for predicting 1-, 2-, and 3-year DFS in LUAD patients.

Table 1 Subgroup univariate regression analysis of T10S
Table 2 Subgroup univariate regression analysis of STAS

Univariate and multivariate Cox regression analyses showed that T10S was an independent risk factor for recurrence in patients with early-stage LUAD (HR: 3.819, p < 0.001) (Supplementary Fig. 14A). Subgroup univariate regression analyses within the STAS-positive and STAS-negative groups showed that T10S was the only significant risk factor in the STAS-positive group (HR: 6.683, p < 0.001) (Supplementary Fig. 14B) and was a significant risk factor in almost all clinical subgroups within this group (Supplementary Fig. 15A). In the STAS-negative group, older age, smoking history, and stage IB were risk factors, of which stage IB and smoking history were independent risk factors (Supplementary Fig. 14C); T10S showed no risk stratification ability in any clinical subgroup of this group (Supplementary Fig. 15B). Overall, these findings indicate that T10S can effectively predict recurrence risk in patients with stage I LUAD.

AI-assisted STAS detection

Given the small parameter size of STASNet, we deployed the model on a dual-screen workstation to mimic daily pathological diagnostic practice. As shown in Supplementary Movie 1, in single-screenshot mode, the Python script processed a single H&E-stained image, whereas in movie-process mode, the script continuously processed the input images and displayed the predictions. Because STASNet was designed for classification, prediction in movie-process mode was limited to 6–9 s per image.

We also developed an AI-assisted diagnostic workflow to aid physicians in STAS detection and patient risk stratification (Fig. 5A). During slide review, a real-time STAS detection system was integrated to aid STAS diagnosis. Once the report was obtained, the AI model performed a semi-quantitative analysis of STAS on the WSIs of STAS-positive patients, who were then risk-stratified based on T10S. As the results above demonstrate, T10S did not help assess recurrence risk in STAS-negative patients, whereas clinicopathological features (stage IB and smoking history) were independent risk factors in this group; recurrence risk in STAS-negative patients was therefore evaluated primarily on the basis of clinicopathological characteristics. In addition, T10S flagged occult STAS in some STAS-negative patients, and after detailed re-evaluation by pathologists, a small number of these patients were reclassified as STAS-positive. The reclassified patients then underwent semi-quantitative analysis and were assigned to the high-risk or low-risk group. AI-assisted revision changed the STAS status of five patients in the ACHNJMU cohort, two in the THTCM cohort, and three in the TCGA-LUAD cohort (Supplementary Table 3).
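The routing logic of this workflow can be summarized in a short sketch. Both thresholds are hypothetical placeholders (no numeric T10S cutoffs are given here), and the function names are illustrative; a flagged STAS-negative case is re-reviewed by a pathologist and, if confirmed, routed again as STAS-positive.

```python
"""Hypothetical sketch of the AI-assisted STAS triage in Fig. 5A."""


def needs_review(stas_positive, t10s, flag_cutoff):
    """Flag a pathologist-negative case whose T10S suggests occult STAS."""
    return (not stas_positive) and t10s > flag_cutoff


def triage(stas_positive, t10s, risk_cutoff):
    """Route a patient after (re-)review; risk_cutoff is a placeholder value."""
    if not stas_positive:
        # T10S did not stratify STAS-negative patients in this study.
        return "clinicopathological assessment"
    return "high risk" if t10s > risk_cutoff else "low risk"


if __name__ == "__main__":
    # Hypothetical patient: negative on first read, flagged, confirmed on review.
    positive = False
    if needs_review(positive, t10s=850.0, flag_cutoff=400.0):
        positive = True  # pathologist confirms occult STAS (assumed outcome)
    print(triage(positive, t10s=850.0, risk_cutoff=600.0))
```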

Fig. 5: AI-assisted STAS detection.

A Workflow of STAS diagnosis combined with AI. B–D Characteristics of the three types of missed (occult) STAS (0.2418 μm/pixel).

We then categorized these occult STAS into three types. First, a small number of STAS structures were observed among numerous histiocytes and erythrocytes. When the WSI was magnified to approximately 40× (Fig. 5B), the tiles identified by STASNet showed the micropapillary morphology characteristic of STAS. At such high magnification, most pathologists would be able to identify STAS accurately; in routine clinical practice, however, they work under time constraints and rarely examine slides this closely. Second, a few STAS structures were intermingled with numerous tertiary lymphoid structures (Fig. 5C). In these cases, the STAS structures were relatively large, exceeding the average size of solid STAS, but resembled the adjacent tertiary lymphoid structures in color and morphology. Third, a few STAS structures lay within otherwise clean alveolar spaces (Fig. 5D). These spaces appear normal at low-magnification scanning, and the STAS within them is difficult to identify even at 40×. In brief, the AI-assisted workflow can considerably help pathologists detect easily missed STAS in patients with early-stage LUAD.

Discussion

Using deep learning, we constructed STASNet, the first model for STAS detection and semi-quantitative analysis. STASNet identified STAS accurately at both the tile and WSI levels across three datasets, and we used it to construct seven STAS-related semi-quantitative scores. Among them, T10S, combined with a spatial characteristic, accurately predicted recurrence risk in patients with early-stage LUAD, especially those who were STAS-positive. In addition, the model revealed three characteristic patterns of easily missed (occult) STAS.

To ensure model accuracy while enhancing stability and generalizability, we evaluated four widely used deep learning architectures at three tile resolutions (128 × 128, 256 × 256, and 512 × 512 pixels). We also included two external datasets: the THTCM dataset, obtained from another center under the same scanning conditions as the ACHNJMU training dataset, and the well-known public TCGA-LUAD dataset. In addition, we constructed the first publicly accessible dataset with STAS labels (PRJCA020924) to encourage data sharing in STAS research. Deep learning has advanced rapidly across diverse fields because of its capacity to learn rich representations from large datasets. Notably, MobileNetV3 exceeded our expectations, achieving the best performance with fewer parameters and the shortest training time. Its streamlined architecture, whose intermediate expansion layers use lightweight depthwise convolutions to filter features, is designed for efficient mobile models.
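As a rough illustration of why depthwise convolutions keep MobileNetV3 small, the weight counts below compare a standard convolution with a depthwise separable one and with an inverted-residual (expansion) block. The channel sizes used are arbitrary examples, not STASNet's configuration, and the counts ignore biases, normalization, and squeeze-and-excitation layers.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise convolution (one filter per channel) + 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def inverted_residual_params(k: int, c_in: int, c_exp: int, c_out: int) -> int:
    """MobileNetV2/V3-style block: 1 x 1 expansion -> k x k depthwise -> 1 x 1 projection.

    Squeeze-and-excitation, batch norm, and biases are ignored in this count.
    """
    return c_in * c_exp + k * k * c_exp + c_exp * c_out
```

For a 3 × 3 layer mapping 64 to 64 channels, the standard convolution needs 36,864 weights and the depthwise separable version 4,672, roughly 7.9 times fewer; the depthwise step is what lets the expansion layer stay wide without a large parameter cost.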

Across studies, the reported incidence of STAS in LUAD varies widely, from 14.8% to 60.5%36,37; in early-stage LUAD, it ranges from 14.8% to 48.1%. The STAS incidence in our datasets (31.9%–48.1%) is consistent with these reports. This wide range is likely attributable to differences in study populations, disease stages, research methodologies, and interobserver variability. In one study, the mean interobserver concordance rate among five pathologists assessing STAS on routine pathological slides was 82.6% (range, 78.1%–87.1%; Fleiss’ kappa = 0.646 ± 0.032)15. Consistency is difficult to achieve even among observers within the same institution, and harder still across institutions. The deep learning model constructed in the present study accurately identified STAS at both the tile and WSI levels across multiple datasets. At the tile level, the accuracy of the model exceeded 0.85; at the WSI level, the AUCs for predicting patient STAS status in the ACHNJMU, THTCM, and TCGA-LUAD datasets were 0.772, 0.778, and 0.722, respectively. An AI model that identifies STAS precisely could reduce interobserver variability, enable accurate diagnosis, and contribute substantially to precision medicine.
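Fleiss' kappa summarizes chance-corrected agreement among multiple raters. The following is a minimal sketch of the standard computation, assuming each slide is rated STAS-positive or STAS-negative by the same number of pathologists; it illustrates the statistic only and does not reproduce the cited study's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts[i][j] = number of raters assigning subject i (e.g. a slide) to category j
    (e.g. j = 0 for STAS-negative, j = 1 for STAS-positive).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    total = n_subjects * n_raters
    # Proportion of all assignments that fall in each category
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    # Observed pairwise agreement for each subject, then averaged
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

A kappa of 1 means perfect agreement, and values near 0 (or below) mean agreement no better than chance, which is why a raw concordance of 82.6% can correspond to only moderate kappa.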

Using STASNet, we developed seven STAS-associated semi-quantitative scores that help clarify the clinical significance of STAS. Two previous semi-quantitative analyses of STAS, each conducted at a single center, demonstrated that a greater number of STAS and a greater distance of STAS dissemination from the primary tumor are associated with elevated recurrence risk in STAS-positive patients15,16. Our study, which incorporated data from three centers, consistently showed that additional semi-quantitative STAS analysis helps stratify patient recurrence risk. In manual semi-quantitative analysis of STAS in early-stage LUAD, pathologists can analyze only a few selected fields of interest on each slide rather than the entire slide. STASNet, by contrast, enables semi-quantitative analysis of STAS across the full field of view.
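A full-slide count-and-distance measurement can be illustrated as follows. This is a hypothetical sketch, not the definition of T10S or of the seven published scores: it assumes STASNet has already flagged STAS tiles and that the primary tumor margin is available as a set of boundary points, and it reports the STAS count and each focus's distance from the tumor in μm using the 0.2418 μm/pixel resolution stated for Fig. 5.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in WSI pixel coordinates


def stas_distance_summary(stas_centers: Sequence[Point],
                          tumor_boundary: Sequence[Point],
                          um_per_px: float = 0.2418) -> Dict[str, float]:
    """Count STAS foci and measure each one's distance to the primary tumor edge.

    stas_centers   -- centers of tiles classified as STAS (hypothetical input)
    tumor_boundary -- points sampled along the primary tumor margin (hypothetical input)
    um_per_px      -- scanning resolution (0.2418 um/pixel, as in Fig. 5)
    """
    if not tumor_boundary:
        raise ValueError("tumor boundary is empty")
    # Shortest Euclidean distance from each STAS focus to the tumor margin, in um
    dists = [min(math.dist(s, b) for b in tumor_boundary) * um_per_px
             for s in stas_centers]
    return {
        "count": float(len(dists)),
        "max_distance_um": max(dists, default=0.0),
        "mean_distance_um": sum(dists) / len(dists) if dists else 0.0,
    }
```

The brute-force nearest-point search is quadratic; at whole-slide scale a spatial index such as a k-d tree would be the practical choice.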

Of the seven scores, T10S showed the best ability to predict recurrence. T10S enables risk stratification of STAS-positive patients who have undergone surgery for early-stage LUAD without imposing an excessive burden on physicians. In the era of precision medicine, more accurate diagnoses and detailed evaluations are required to support personalized treatment for each patient. Most studies have indicated the value of adjuvant therapy after surgery in patients with stage IB LUAD with STAS11,12,17; however, opinions regarding stage IA patients with STAS differ. Surprisingly, we found that T10S also significantly stratified recurrence risk in stage IA LUAD patients with STAS, which may inform postoperative adjuvant therapy decisions for this subgroup.
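Risk stratification of this kind is commonly checked by comparing recurrence-free survival between the resulting groups with a log-rank test. The following is a minimal stdlib sketch, assuming patients have already been split into high- and low-T10S groups by some cutoff (the cutoff and the function name `logrank_test` here are hypothetical); the source does not state that this exact test underlies every reported result.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; group[i] = 1 for the high-risk stratum, 0 otherwise.

    Returns (chi-square statistic with 1 degree of freedom, p value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, group) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the two groups")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```

With one degree of freedom, the chi-square tail probability equals P(|Z| > √χ²), which is why `math.erfc` suffices and no statistics library is needed.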

We also investigated the application of STASNet in STAS-negative patients. T10S helped physicians identify occult STAS among STAS-negative patients, further improving the STAS detection rate. The occult STAS cases identified in this study shared a common characteristic: a minimal amount of STAS concealed within a deceptively normal background, which made accurate diagnosis highly challenging for pathologists. This finding provides valuable insight into STAS; to our knowledge, no previous study has focused on these missed STAS-positive patients.

The findings of this study have broad implications for clinical practice. First, they support a shift in the diagnostic approach, from sole reliance on pathologists to a hybrid model in which machine interpretation precedes human assessment, providing physicians with real-time assistance and reference when diagnosing STAS. Second, the study sheds light on the characteristics and clinical significance of previously overlooked occult STAS, which STASNet can help identify comprehensively. Third, it advances the current clinical understanding of STAS: our results indicate that the distance of STAS from the main tumor was significantly correlated with patient recurrence, so patients whose STAS lies far from the main tumor warrant closer attention.

This study has some limitations. First, despite the high accuracy of STASNet, the very large number of tiles analyzed meant that an unacceptably large number of tiles were misclassified as STAS. Second, although previous studies have suggested that a wider surgical resection margin in early-stage lung cancer may reduce the recurrence risk in STAS-positive patients38, our study could not address this clinical issue conclusively. Future work will require models based on intraoperative frozen sections to provide early insight into patients’ STAS status.
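The arithmetic behind the first limitation is worth making explicit: when STAS occupies only a small fraction of the tiles in a WSI, even a small false-positive rate yields many false-positive tiles. The sketch below uses purely hypothetical numbers (20,000 tiles per WSI, 1% STAS tiles, 90% sensitivity, 95% specificity), not STASNet's measured performance.

```python
from typing import Dict


def expected_tile_errors(n_tiles: int, stas_fraction: float,
                         sensitivity: float, specificity: float) -> Dict[str, float]:
    """Expected true/false-positive tile counts and positive predictive value for one WSI."""
    positives = n_tiles * stas_fraction       # tiles that truly contain STAS
    negatives = n_tiles - positives           # tiles without STAS
    tp = positives * sensitivity              # correctly flagged STAS tiles
    fp = negatives * (1 - specificity)        # non-STAS tiles flagged as STAS
    return {"true_positive": tp, "false_positive": fp, "ppv": tp / (tp + fp)}


# Hypothetical slide: 20,000 tiles, 1% STAS, 90% sensitivity, 95% specificity
summary = expected_tile_errors(20000, 0.01, 0.90, 0.95)
```

Under these assumed numbers, about 180 true STAS tiles are flagged alongside about 990 false positives, so only roughly 15% of flagged tiles are real STAS, even though per-tile accuracy is high.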

In this study, we developed an AI model that accurately identifies STAS. The model can aid pathologists in the quantitative analysis of STAS across the full field of view of a WSI and enable further risk stratification of patients with early-stage STAS-positive LUAD. STASNet can serve as an objective reference against which pathologists can check and adjust their interpretations, thereby improving diagnostic consistency. To date, only a few studies have quantitatively analyzed STAS, and only within regions of interest (ROIs). Because of the heavy workload, quantifying STAS manually across the full field of view of a WSI is difficult, whereas STASNet can perform this task continuously without fatigue. Clinical knowledge of STAS remains limited, and our AI model for the detection and semi-quantitative analysis of STAS on WSIs is a novel approach in this field. The model effectively differentiated recurrence risk in STAS-positive patients. These findings may focus clinicians’ attention on this high-risk group and support more aggressive treatment planning, which could ultimately reduce recurrence risk.

In conclusion, STASNet performed well in STAS diagnosis, and T10S significantly differentiated recurrence risk in STAS-positive LUAD patients. STASNet can therefore aid pathologists in STAS detection and inform clinical decision-making.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.