Introduction

Rectal cancer poses a significant worldwide burden, with a staggering 70% of patients being diagnosed with locally advanced rectal cancer (LARC)1,2. Neoadjuvant chemoradiotherapy (nCRT) followed by a total mesorectal excision (TME) is one of the primary therapeutic approachs3,4. The efficacy of nCRT is associated with treatment prognosis and influences clinical decision-making. Evaluation methods include pathological tumor regression grade (pTRG) and post-neoadjuvant chemotherapy tumor-node (ypTN) staging5,6. For example, patients expected to achieve pathological complete response (pCR), indicating the absence of residual tumor cells, are typically considered to have higher survival rates and lower rates of recurrence7. Currently, clinical complete response (cCR) is often used as an approximation for pCR8. Patients who achieve cCR can opt for a watch and wait approach to preserve organs and reduce surgical complications9. Research indicates that, for patients who achieve cCR, their recurrence rates are comparable to patients who undergo surgery10,11. There are two significant issues that need to be addressed in this context. Firstly, it is important to recognize that cCR cannot entirely replace pCR. Secondly, the assessment of lymph node status has not been considered in this regard4,11. Hence, it is necessary to predict pathological response and lymph node status before surgery.

Currently, some studies have indicated that the use of MRI tumor regression grade (mrTRG) shows promise in evaluating the effectiveness of neoadjuvant therapy. This includes assessing parameters such as pCR, and neoadjuvant response score (NAR)12,13. mrTRG classifies regression grades into 1–5 based on the proportion of residual tumor (moderate signal intensity on T2WI) and fibrosis (markedly low signal intensity on T2WI). However, some research indicates that there is low consistency between mrTRG and pTRG14,15. Furthermore, the identification of lymph nodes metastases (LNM) on MRI after neoadjuvant treatment poses challenges, particularly in recognizing small metastatic lymph nodes measuring less than 5 mm16,17. While some studies suggest that imaging features of lymph nodes can serve as predictors of LNM after nCRT18, it is important to note that there may not always be a consistent match between MRI data and the corresponding pathological data17.

In comparison to the conventional assessment of MRI, radiomics has the ability to extract high-dimensional features. Prior studies have demonstrated that radiomic features in rectal cancer are associated with treatment response and prognosis, such as pathologic treatment response grade (pTRG), overall survival (OS), and disease-free survival (DFS)19,20,21. Combining radiomic features can enhance the predictive ability for pathological complete response (pCR) compared to using mrTRG alone14. Compared to pre-treatment radiomic features, delta radiomic features can capture the dynamic changes of features over time. However, the study by Wen et al.22 demonstrates that the delta radiomic model derived from the tumor performed suboptimally in predicting pCR in patients with LARC following nCRT, achieving an AUC of only 0.724. This performance is inferior to models utilizing pre-nCRT or post-nCRT radiomic features alone. Notably, this study did not account for the radiomic features of the mesorectum impacted by nCRT. Integrating delta radiomic features of the mesorectum into the model may potentially enhance its predictive performance regarding the efficacy of neoadjuvant treatment.

Our objective is to explore the development of models that utilize different radiomic features, either individually or in combination, to predict various outcomes of nCRT treatment, including pathological response and LNM.

Methods

Patients

This study obtained approval from the Institutional Review Board of Peking University Third Hospital (Ethics approval No.IRB00006761-M2022474). We confirmed that all research was performed in accordance with relevant guidelines and regulations, and due to its retrospective nature, the need for obtaining written informed consent was waived. The study enrolled patients diagnosed with LARC who underwent nCRT at Peking University Third Hospital between January 2013 and June 2022. The inclusion criteria were as follows: (a) Diagnosis of LARC before the initiation of nCRT; (b) Availability of T2WI sequences in MRI scans both before and after nCRT. Exclusion criteria were as follows: (a) Distance from the tumor mass to the anal verge exceeding 10 cm; (b) Use of non-standard neoadjuvant chemoradiotherapy protocols; (c) non-receipt of total mesorectal excision surgery; (d) Occurrence of distant metastasis during the treatment course; (e) Significant image artifacts that impacted tumor boundary delineation; and (f) Missing postoperative pathological results. After patient selection, a total of 209 participants were included in our study (Fig. 1). They were randomly divided into a training set (n = 146) and a test set (n = 63) in a 7:3 ratio.

Fig. 1
figure 1

Study flowchart of the enrolled patients.

Endpoints

Postoperative pathological results for each patient were retrieved from the electronic medical record system, and three independent endpoints were determined: pCR, pathological good response (pGR), and LNM. According to the American Joint Committee on Cancer (AJCC) 8th edition Tumor Regression Grade (TRG) system23, the TRG is classified into four levels, ranging from TRG0 to TRG3, where TRG0 corresponds to pCR, and TRG1-3 correspond to non-pCR. TRG0-1 corresponds to pGR, while TRG2-3 correspond to non-pGR. Based on lymph node restaging after nCRT, ypN0 represents non-LNM, and ypN1-2 represents LNM.

Neoadjuvant chemoradiotherapy

The radiotherapy regimen involved conformal intensity-modulated radiation therapy (IMRT) targeting the rectal tumor and drainage area. Patients received a daily radiation dose of 1.8–2.0 Gy, totaling 45–50 Gy over a 5-week course (5 sessions per week for a total of 25 sessions). Concurrent chemotherapy using capecitabine or the XELOX regimen was administered during radiotherapy. Surgery was conducted 8–12 weeks following the completion of radiotherapy.

MRI protocol

All individuals in our study cohort underwent pre-nCRT and post-nCRT rectal MRI, which included oblique high-resolution T2WI. The majority of patients (n = 193) were scanned using the 3.0-T Discovery MR 750 system (GE Medical Systems, LLC, America), while a smaller subset of patients (n = 16) were scanned using the 3.0-T MAGNETOM Prisma system (Siemens AG Healthcare, Erlangen, Germany). Pre- and post-treatment images were acquired on the same MRI scanner. Specific details about the scan parameters for both systems can be found in Table 1.

Table 1 Parameters of MR scanners used in this study.

MRI tumor regression grade (mrTRG)

The analysis of tumor signal changes before and after nCRT to evaluate mrTRG was conducted on the T2WI and Diffusion-Weighted Imaging (DWI) sequences. mrTRG 1 indicates the absence of any residual tumor signal. mrTRG 2 signifies the presence of minimal remaining tumor signal, primarily dominated by fibrotic signal. mrTRG 3 indicates the coexistence of fibrotic and tumor signals, with tumor signal not being predominant. mrTRG 4 represents predominant tumor signal with minimal fibrotic signal remaining. mrTRG 5 indicates the presence of only tumor signal, with no discernible fibrotic signal. Two radiologists, one with 4 years of experience in radiology (radiologist 1) and the other with 18 years of experience in abdominal radiology (radiologist 2), jointly assessed the mrTRG and reached a consensus.

Image segmentation

In the absence of pathological knowledge, radiologist 1 segmented two ROIs on T2WI sequence before and after nCRT, namely tumor and the mesorectal compartment (MC). Subsequently, the segmentation was reviewed by radiologist 2. The pre-treatment ROI for tumor included the entire tumor while excluding intraluminal gas, feces, and bubbles. The post-treatment ROI for tumor encompassed the residual tumor and post-treatment fibrosis. The MC represented the mesorectal fat within the confines of the rectal mesorectal fascia, external to the rectal contour.

Feature extraction and the computation of Delta Radiomics features

uAI Research Portal software (uRP) (Version:1.1, United Imaging Intelligence, China, https://urp.united-imaging.com ) was used for feature extraction. The radiomic features were generated using Pyradiomics (Version 3.0.1, https://pyradiomics.readthedocs.io/en/latest/features.html ) and comply with the Image Biomarker Standardization Initiative (IBSI) protocol24. To mitigate image heterogeneity, B-spline interpolation was employed to resample anisotropic pixels into isotropic 1.0 × 1.0 × 1.0(mm) voxels. To reduce biases introduced by variations in imaging protocols and equipment, the Z-score method was used to standardize the images. The radiomic features extracted from the original images include 19 first-order statistical features, 16 shape features, and 75 texture features. Additionally, wavelet filters (using the PyWavelets package) and Laplacian of Gaussian (LoG) filters (sigma = 0.5, 1, 2, 4 mm) were applied, and the same features were extracted from the filtered images. In total, 1184 features were extracted for each ROI (the tumor and MC before and after nCRT). The delta radiomic features represent the change rate between features obtained after nCRT and those obtained before nCRT.

Feature selection and model building

Due to the large number of features, a series of feature selection methods were employed for dimensionality reduction. First, the extracted features were standardized using z-score normalization. Next, Spearman correlation analysis was used to eliminate features with a correlation coefficient greater than 0.9. Subsequently, univariate analysis was performed to select features with P-value < 0.05, using the SelectKBest function from the scikit-learn package in Python. Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) was applied to determine the final set of selected features. The regularization parameter λ for LASSO was tuned using 5-fold cross-validation, selecting the value that minimized the mean squared error on the validation folds. The feature selection process was carried out exclusively on the training set. The logistic regression models were trained using the selected features and subsequently evaluated on the test set.

A total of 30 logistic regression models were developed for the three different outcomes in this study. The models were named using the convention “outcome_model_features used”. The features used were named as follows: mrTRG refers to MRI tumor regression grade; pre_tumor and post_tumor represent features from the tumor ROI before and after treatment, respectively; pre_MC and post_MC represent features from the mesorectal compartment ROI before and after treatment, respectively; delta_tumor and delta_MC refer to the change rate between the post-treatment to pre-treatment features; combine_tumor refers to the combination of pre_tumor and delta_tumor features; combine_MC refers to the combination of pre_MC and delta_MC features; and combine_all refers to the combination of pre_tumor, delta_tumor, pre_MC, and delta_MC features. For example, pCR_model_pre_tumor refers to the model that predicts pCR using pre-treatment tumor features.

The workflow of the radiomic study is illustrated in Fig. 2.

Fig. 2
figure 2

The workflow of the radiomic study.

Statistical analysis

R software (Version: 4.2.0, http://www.Rproject.org ) and Python (Version: 3.10.0, https://www.python.org ) were used for statistical analysis. The independent-samples t-test was used for analysis of normally distributed continuous variables and the Mann–Whitney U test for non-normally distributed variables. Model performance was assessed using the average area under the curve (AUC) of the validation sets. The optimal cutoff value was determined using the Youden Index (Sensitivity + Specificity − 1), with statistics on the average sensitivity, specificity, and accuracy obtained when the Youden Index was maximized. The F1 score is calculated as the harmonic mean of precision and recall. DeLong’s test was used to compare two receiver operating characteristic (ROC) curves. Decision curve analysis (DCA) was employed to evaluate the clinical applicability of the model. Two-sided P < 0.05 was considered to indicate statistical significance.

Results

Clinical features

This study enrolled 209 consecutive patients with LARC (71% male, 29% female), with an average age of 60 years (range: 25–82 years). Among them, 44 patients achieved pCR and 118 patients achieved pGR. A total of 40 patients were diagnosed with LNM (ypN1-2) following nCRT. Each patient received a radiation therapy dose of 45–50 Gy. There were no statistically significant differences in the clinical factors between the training set and the test set. Detailed patient characteristics are summarized in Table 2.

Table 2 Patient characteristics.

Performance comparison of different radiomics models and the mrTRG models

We first performed conventional visual assessment using pre- and post-treatment MRI, namely mrTRG. In our study cohort, the number of patients categorized into mrTRG grades 1 to 5 were 26 (12.4%), 41 (19.6%), 90 (43.1%), 46 (22.0%), and 6 (2.9%) respectively. In the test set, the AUCs for predicting pCR, pGR, and LNM using mrTRG alone were 0.800 (95% CI: 0.627–0.944), 0.583 (95% CI: 0.461–0.704), and 0.603 (95% CI: 0.444–0.750), respectively.

We developed a total of 27 radiomic models. For predicting pCR, we established four models based on single ROIs (pCR_model_pre_tumor, pCR_model_pre_MC, pCR_model_post_tumor, and pCR_model_post_MC), two delta radiomic models (pCR_model_delta_tumor and pCR_model_delta_MC), and three combined models (pCR_model_combine_tumor, pCR_model_combine_MC, and pCR_model_combine_all). The same number and types of models were developed for predicting both pGR and LNM.

After feature selection, the number of remaining features ranged from 2 to 46. For each model, the final features selected by LASSO and their corresponding coefficients are shown in FiguresS1 to S27. For each outcome (pCR, pGR, and LNM), we developed 9 models using the features retained after LASSO regression. The sensitivity, specificity, accuracy, F1 score, and AUC for these models in both the training and test sets are displayed in radar charts in Fig. 3. Detailed results can be found in TableS1. For the models predicting pCR, pCR_model_combine_all achieved an AUC of 0.874 in the test set, significantly higher than pCR_model_mrTRG (AUC = 0.800, DeLong test P = 0.001). It also outperformed pCR_model_combine_tumor (AUC = 0.817) and pCR_model_combine_MC (AUC = 0.771), although these differences were not statistically significant according to the DeLong test. Despite the high AUC of 0.800 for pCR_model_mrTRG, its sensitivity and specificity were 0.923 and 0.240, respectively, indicating low specificity. For the models predicting pGR, pGR_model_combine_all had the highest AUC of 0.801, significantly outperforming pGR_model_mrTRG (AUC = 0.715, DeLong test P < 0.001). It also exceeded pGR_model_combine_tumor (AUC = 0.761) and pGR_model_combine_MC (AUC = 0.685), although these differences were not statistically significant. For the models predicting LNM, LNM_model_combine_all had the highest AUC of 0.826, significantly higher than LNM_model_mrTRG (AUC = 0.603, DeLong test P = 0.003). It also outperformed LNM_model_combine_tumor (AUC = 0.805) and LNM_model_combine_MC (AUC = 0.694), although these differences were not statistically significant according to the DeLong test. For the combined models and mrTRG models, detailed DeLong test results are presented in Tables S2 to S4. The ROC curves for these models in the training and test sets are shown in Fig. 4.

Fig. 3
figure 3

Radar charts displaying the sensitivity, specificity, accuracy, F1 score, and ROC AUC for the models predicting pCR (A), pGR (B), and LNM (C). pCR, pathological complete response; pGR, pathological good response; LNM, lymph node metastasis. ROC, receiver operating characteristic; AUC, area under the curve.

Fig. 4
figure 4

The ROC curves for the models predicting pCR (A for the training set, B for the test set), pGR (C for the training set, D for the test set), and LNM (E for the training set, F for the test set). pCR, pathological complete response; pGR, pathological good response; LNM, lymph node metastasis; ROC, receiver operating characteristic; AUC, area under the curve.

The decision curve analysis for these models in the training and test sets is presented in Fig. 5. The analysis reveals that LNM_model_combine_all provides greater clinical benefit than other LNM prediction models when the threshold is between 0 and 0.24. For the models predicting pCR and pGR, pCR_model_combine_all and pGR_model_combine_all show the highest clinical benefit across most thresholds.

Fig. 5
figure 5

Decision curve analysis for the models predicting pCR (A for the training set, B for the test set), pGR (C for the training set, D for the test set), and LNM (E for the training set, F for the test set). pCR, pathological complete response; pGR, pathological good response; LNM, lymph node metastasis.

Analysis of radiomics features

For predicting pCR, pGR, and LNM, the models combining pre_tumor, delta_tumor, pre_MC, and delta_MC features performed the best. We analyzed the feature composition of these models. The pCR_model_combine_all model used 16 pre_tumor features, 17 delta_tumor features, 4 pre_MC features, and 2 delta_MC features. The pGR_model_combine_all model used 7 pre_tumor features, 18 delta_tumor features, 6 pre_MC features, and 13 delta_MC features. The LNM_model_combine_all model used 14 pre_tumor features, 13 delta_tumor features, 10 pre_MC features, and 8 delta_MC features (Fig. 6).

Fig. 6
figure 6

A stacked bar chart displaying the number and sources of features used in the optimal models for predicting pCR, pGR, and LNM. pCR, pathological complete response; pGR, pathological good response; LNM, lymph node metastasis.

From the feature composition, it is evident that pCR_model_combine_all primarily utilized tumor features, whereas pGR_model_combine_all and LNM_model_combine_all relied more on mesorectal compartment features. Notably, the feature “deltatumor_wavelet_gldm_wavelet.LHH.SmallDependenceLowGrayLevelEmphasis” was used in all three models. Figure 7 illustrates the variation of this feature value with increasing AJCC-TRG, showing that higher TRG is associated with lower values of this feature.

Fig. 7
figure 7

A violin plot showing the distribution of the feature “deltatumor_wavelet_gldm_wavelet.LHH.SmallDependenceLowGrayLevelEmphasis” across the four AJCC-TRG groups (0, 1, 2, 3) reveals a decreasing trend in this feature value as the AJCC-TRG grade increases. AJCC-TRG, the American Joint Committee on Cancer 8th edition tumor regression grade.

Discussion

Our findings suggest that leveraging MRI radiomic features extracted from both rectal tumors and the mesorectal fascia holds promise for predicting pathological responses and lymph node metastasis in LARC patients. This innovative approach facilitates the personalized tailoring of treatment strategies for individual patients. Upon comparing various models, it became evident that the integrated model, which incorporates pre-nCRT radiomic features and delta radiomics features from both rectal tumors and the mesorectal fascia, exhibited superior efficacy in predicting pCR, pGR, and LNM. Notably, this integrated model consistently outperformed single-ROI models and surpassed the predictive efficacy of mrTRG.

The mrTRG is a five-tier scoring system based on the proportion of tumor residue and fibrosis, demonstrating commendable inter-observer consistency25,26. However, the agreement between mrTRG and pTRG appears to be less than optimal25. Voogt et al. observed that the concordance between mrTRG and pTRG is higher when the interval between nCRT and surgery is less than 7 weeks (κ = 0.69), compared to intervals longer than 7 weeks (κ = 0.34). Additionally, the positive predictive value for assessing good responders using mrTRG by a lead radiologist (95%) surpasses that of other radiologists (56%), highlighting the limited and fluctuating predictive value of mrTRG for pTRG27. Our results indicate that the AUC for mrTRG in predicting pCR, pGR, and LNM is only 0.800, 0.715, and 0.603, respectively. Although pCR_model_mrTRG demonstrated a high AUC value of 0.800 in the test set, its AUC value was lower in the training set (0.596). This discrepancy may be attributed to the randomness of the grouping. The limited predictive value of the mrTRG model for nCRT treatment outcomes is consistent with previous studies, underscoring the need for more effective supplementary methods.

Our radiomic model employs high-resolution T2WI due to the enhanced ability of radiologists to discern tumors from fibrosis in this sequence. There is a high level of consistency in ROI segmentation among different radiologists for this specific sequence. Although certain functional MRI sequences, such as DWI or Apparent Diffusion Coefficient (ADC) sequences, hold potential value, they were excluded from this study due to their lower resolution, pronounced artifacts 28,29 and challenges in identifying fibrosis in post-nCRT images22. Schurink et al. noted significant variations in ADC radiomic features across different centers in comparison to T2WI30. Furthermore, a previous high-quality radiomic study illustrated that the combined use of T2WI and DWI did not surpass the predictive performance of T2WI alone in predicting pCR20. In the past, numerous studies have utilized radiomic methodologies to anticipate various treatment outcomes in the context of nCRT for rectal cancer. However, the predominant focus in most investigations has been solely on radiomic features derived from pre-treatment tumor images31,32. Delta radiomic features, capturing alterations in radiomic characteristics induced by treatment, offer heightened interpretability. Wen et al.22 established three models for predicting pCR, encompassing pre-, post-, and delta-radiomic models. Their findings indicated that the post-radiomics model exhibited the optimal predictive performance (AUC = 0.805). They postulated that this superiority might stem from post-nCRT MRI containing information more closely resembling surgical pathology. A limitation of this study was the omission of a fusion model incorporating diverse radiomic features.

nCRT not only induces tumor regression and fibrosis but also results in the reduction or disappearance of metastatic lymph nodes in the mesorectum33. Changes in tumor deposits and vascular invasion may also occur34,35, leading to alterations in mesorectal components and MRI signals. Consequently, we posited that radiomic features of the mesorectal compartment harbor information relevant to the prognosis of nCRT treatment. Our results suggest that the pCR_model_combine_all model incorporates fewer features from the mesorectal compartment, which may be because pCR is mainly related to the regression of the tumor itself. Conversely, both the pGR_model_combine_all and LNM_model_combine_all models utilize a larger number of mesorectal compartment features, likely because predicting pathological grade of response (pGR) and lymph node metastasis (LNM) requires more information from the surrounding mesorectal structures. Additionally, we found that the LNM_model_delta_MC had a relatively high AUC of 0.768, indicating that changes in radiomic features of the mesorectal compartment can effectively predict lymph node metastasis. This may be related to signal changes in the lymph nodes within the mesorectal space, thereby highlighting the value of radiomic features from the mesorectal compartment. Chiloiro et al. developed predictive models for pCR, and 2-year disease-free survival (2yDFS) based on delta radiomic features. The results revealed that the pCR prediction model included two Gross Tumor Volume (GTV) features, achieving an AUC of 0.69 on the validation set. The 2yDFS prediction model comprised one GTV feature and two mesorectal features, with a validation set AUC of 0.70. These findings further suggested that pCR is primarily associated with tumor characteristics36. A potential reason for the suboptimal performance of their model might be the exclusive consideration of delta radiomic features in their study. Our model demonstrated strong performance on the test set, likely due to the inclusion of delta radiomic features, which capture changes in both the tumor and the mesorectal compartment during treatment, thereby enhancing the model’s accuracy.

Our study has several limitations. Firstly, it is a single-center study, and the generalizability of the model has not been validated. Therefore, further validation in a larger, more diverse population is warranted. Secondly, the manual segmentation of tumors and mesorectal compartment on pre- and post-nCRT T2WI is a time-consuming process. While some studies have achieved automatic segmentation of rectal tumors37, an automatic segmentation model for mesorectal compartment has not yet been developed. Our next step involves the development of an automatic segmentation model for mesorectal compartment to save time in the segmentation process and enhance reproducibility. Thirdly, our study did not incorporate certain imaging metrics associated with patient prognosis, such as mesorectal fascia involvement (MRF) and extramural vascular invasion (EMVI). We omitted these parameters as we believe their information is already encompassed within the pre- and post-treatment images. Fourthly, our study endpoints are pCR and pGR, rather than AJCC-TRG. This is because existing machine learning models are often better suited for binary classification rather than multi-class tasks. If more suitable algorithms become available in the future, developing a multi-class classification model would be beneficial for more precise patient management.

Conclusion

The models ultilizing MRI radiomics features from both pre-nCRT tumor and MC contributes to the prediction of pathological response and lymph node metastasis in LARC patients following nCRT. Our models demonstrate the additional value of radiomic features from the mesorectum in predicting various endpoints of nCRT in LARC patients. If validated in the future, our study will provide valuable insights into the identification of novel radiomics biomarkers for pre-treatment planning of neoadjuvant therapy.