Introduction

Colorectal cancer (CRC) is among the most common cancers worldwide, with at least 1.8 million new cases diagnosed annually1. Approximately 10% of these patients need to undergo multivisceral resection (MVR) because the tumor is adherent to contiguous organs or structures at the time of initial diagnosis2. However, true malignant invasion of adjacent organs or structures was confirmed histopathologically in only 30–70% of patients who underwent MVR, while only inflammatory adhesion was detected in the remaining patients3. Therefore, MVR performed during curative resection is often overtreatment for patients whose tumor adheres to its surroundings through inflammation alone.

Current imaging modalities for preoperative assessment of bladder invasion (BI) face critical limitations. While contrast-enhanced CT is widely used for staging, retrospective studies demonstrate its sensitivity for macroscopic BI detection is only 64.3%, with pathological confirmation of malignancy in merely 28.6% of resected specimens4. CT cannot reliably differentiate inflammatory adhesions from true tumor infiltration, leading to overestimation of invasion. Cystoscopy, though adjunctively employed, detects mucosal abnormalities (e.g., edema) but fails to predict transmural invasion, resulting in 53% false-positive rates for malignancy4,5. These limitations underscore the unmet need for objective preoperative tools to stratify adhesion severity.

It is well acknowledged that extensive surgery is often associated with a high rate of complications. Complication rates of 20–42% have been reported after MVR6. Nakafusa et al. reported an overall morbidity rate of 49.1% after MVR, compared with 17.8% after standard resection7. As the organ most commonly involved in MVR (53.2%)8, the bladder is partially or totally resected depending on the site and extent of bladder involvement. Urinary tract reconstruction after cystectomy is more prone to complications when performed as part of MVR for CRC than when performed for primary bladder cancer9. Unfortunately, patients with inflammatory adherent CRC who undergo MVR gain no oncological benefit3 and suffer from a greater risk of perioperative complications and poor postoperative quality of life.

However, since imaging examination methods are limited, diagnosing the nature of adhesions preoperatively is still a challenge10. Intraoperative identification of adhesions as inflammatory or malignant is often inaccurate, and frozen sectioning is often ineffective11,12. Intraoperative attempts to release adhesions often result in transection of the tumor, which leads to higher local recurrence rates and poorer overall survival in these CRC patients than in patients undergoing en bloc MVR2,6,13,14,15. Thus, the preoperative evaluation of locally advanced CRC, including those with a suspicion of BI, has become increasingly important and relevant, especially in the era of minimally invasive surgery for CRC and neoadjuvant chemoradiation for rectal cancer.

The term “radiomics”, which refers to the process of converting medical images into high-dimensional data by high-throughput extraction of quantitative features and subsequent data analysis for decision-making, has recently attracted increasing interest16. Currently, radiomics has been widely applied in tumor diagnosis, prognosis assessment and treatment response prediction17. To the best of our knowledge, no studies have focused on applying radiomics models to evaluate BI in patients with CRC.

Therefore, the purpose of this study was to investigate the feasibility of a radiomics model for the detection of BI by CRC on CT images.

Materials and methods

Data enrollment

This retrospective study was approved by the local Institutional Review Board (Peking University First Hospital 2019–170), and all methods were performed in accordance with the relevant guidelines and regulations. Due to the retrospective nature of the study, Peking University First Hospital waived the need for informed consent. CT images of the abdomen acquired between December 2009 and September 2021 were retrospectively collected. The inclusion criteria were as follows: (a) primary CRC resection combined with partial or total cystectomy, (b) available clinical information, and (c) postoperative pathology determining whether the CRC had invaded the bladder. The exclusion criteria were as follows: (a) CT data were obtained two weeks before surgery, (b) CT images did not fulfill the requirements of the study, and (c) the patient had received preoperative therapy (radiotherapy, chemotherapy or chemoradiotherapy).

A total of 96 eligible patients were ultimately included in this study; 35 had bladder invasion [BI(+)], and 61 had no bladder invasion [BI(-)]. The 96 patients were randomly assigned to the training dataset (n = 68) or test dataset (n = 28) at a ratio of 7:3 (Fig. 1).
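The random assignment described above can be illustrated with a minimal sketch. It assumes a plain (non-stratified) shuffle with a fixed seed; the function name `random_split`, the seed, and the use of integer case IDs are illustrative assumptions, not details reported in the study.

```python
import random

def random_split(case_ids, n_test, seed=0):
    """Shuffle case IDs once and cut the list into training and test parts."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    test_ids = sorted(shuffled[:n_test])
    train_ids = sorted(shuffled[n_test:])
    return train_ids, test_ids

# 96 eligible patients with 28 held out for testing, as reported in the study.
train_ids, test_ids = random_split(range(96), n_test=28, seed=0)
print(len(train_ids), len(test_ids))  # 68 28
```

Fixing the seed makes the split reproducible, and checking that the two ID lists are disjoint guards against leakage between the training and test datasets.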

Fig. 1

Flow chart of patient enrollment. BI(-): absence of bladder invasion. BI(+): presence of bladder invasion.

CT acquisition parameters

The CT images were acquired from six CT scanners. The detailed scanning parameters are shown in Table 1. There were no significant differences in the scanning parameters between the training dataset and the test dataset (all P > 0.05).

Table 1 CT image acquisition protocols.

Clinical information

The age, sex, BMI, CA19-9, CEA, tumor location, tumor size, and differentiation of the tumors are shown in Table 2. The most common site of CRC was the sigmoid colon (62.5%), followed by the rectum (34.4%). There were no statistically significant differences in clinical information between the training and test datasets (all P > 0.05), except for CA19-9 (P = 0.008).

Table 2 Clinical characteristics of the training and test datasets.

CT impression by visual assessment of two radiologists

The CT images were reviewed by two experienced radiologists. The positive findings included CRC expansion to the bladder, thickening of the bladder wall, and stranding of the surrounding tissue. Bladder invasion was diagnosed only if direct CRC expansion to the bladder wall was observed. The presence of secondary signs (bladder wall thickening or perivesical stranding) supported the diagnosis but was not independently sufficient. The two radiologists reached the final CT impression of whether the CRC had invaded the bladder by consensus.

Region of interest for radiomics model

The region of interest (ROI) on the CT images was manually labeled by two radiologists (an intern in radiology training and a radiologist with 30 years of experience) with ITK-Snap software (http://www.itksnap.org). A rectangular ROI was annotated at the area where the bladder and the CRC were closest (Fig. 2). The radiologists delineated each ROI by marking key points that formed a minimum bounding cuboid around the tumor region. These cuboids were not constrained to align with the image axes. ROIs were variable in size (mean 25311 mm3, range: 13054–58767 mm3) to accommodate tumor heterogeneity.

Fig. 2

Region of interest of the models. A rectangle-shaped ROI was annotated at the area where the bladder and the CRC were closest. The ROIs are shown in the axial (a), sagittal (b), and coronal (c) planes and were viewed three-dimensionally (d).

Development of the radiomic model

All CT images were resampled to isotropic voxels (1 × 1 × 1 mm3) using B-spline interpolation to eliminate variations in spatial resolution caused by differences in slice thickness or reconstruction kernels. ROIs were converted to binary format and smoothed using morphological operations (e.g., dilation/erosion with a 3 × 3 kernel) to reduce edge artifacts. HU values were truncated to a range of -100 to 400 to exclude irrelevant tissues (e.g., air, bone). A fixed bin number (64 bins) was applied to standardize the gray-level range across all scans, ensuring comparability of texture-based features between datasets. All preprocessing adhered to the Image Biomarker Standardization Initiative (IBSI) recommendations to ensure methodological transparency and cross-study comparability. Feature extraction from these images was carried out using the PyRadiomics package in Python, with further details available in the PyRadiomics documentation. This package enabled the calculation of 14 shape-related features, 18 first-order features, and 70 textural features. To standardize these features, z-score normalization was employed. To study the reproducibility of the radiomic features, the ROIs of the 68 patients in the training cohort were annotated a second time by both radiologists (Reader A and Reader B).
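The intensity handling described above (HU truncation, fixed-bin-number discretization, and z-score normalization) can be sketched with NumPy. This is a minimal illustration and not the PyRadiomics implementation: truncation is interpreted here as clipping to the range limits, the discretization follows the IBSI fixed-bin-number formula, and the function names and toy values are assumptions made for the example.

```python
import numpy as np

def clip_hu(volume, lo=-100.0, hi=400.0):
    """Clip HU values to [lo, hi] (assumed interpretation of 'truncated')."""
    return np.clip(np.asarray(volume, dtype=float), lo, hi)

def discretize_fixed_bin_number(values, n_bins=64):
    """Map intensities to integer gray levels 1..n_bins (IBSI fixed bin number)."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:  # flat region: everything falls into the first bin
        return np.ones(values.shape, dtype=int)
    levels = np.floor(n_bins * (values - v_min) / (v_max - v_min)).astype(int) + 1
    return np.minimum(levels, n_bins)  # the maximum intensity belongs to the last bin

def zscore(features):
    """Column-wise z-score of a (cases x features) matrix."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (features - mean) / std

# Toy ROI intensities in HU (hypothetical values).
roi = np.array([-1000.0, -100.0, 0.0, 150.0, 400.0, 1200.0])
levels = discretize_fixed_bin_number(clip_hu(roi), n_bins=64)
print(levels.tolist())  # [1, 1, 13, 33, 64, 64]

normalized = zscore([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

In a train/test setting, the mean and standard deviation used for the z-score would normally be estimated on the training dataset and then reused unchanged for the test dataset.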

Inter- and intraobserver reliability were assessed using intraclass correlation coefficients (ICCs), derived from a two-way random effects model. Radiomics features demonstrating excellent reliability (ICC greater than 0.85) were deemed robust and subsequently chosen for model construction. The selected variables were then centered to zero mean and scaled to unit variance. Then, principal component analysis (PCA) was performed to reduce the dimensionality of the features. Principal components (PCs) explaining 95% of the cumulative variance were retained to balance information preservation and model simplicity. The top 15 PCs were subsequently selected for downstream analysis, as they collectively captured the majority of variance while minimizing redundancy. ANOVA was used to select the PCA features for training a Categorical Boosting (CatBoost) model. Because the small test set may not fully represent the model’s performance, we performed 5-fold cross-validation on the training set to evaluate the model’s generalization capability more robustly while maximizing the use of the available data. After the CatBoost model was trained on the training dataset, the cases in the test dataset were predicted by the model. The probability predicted by CatBoost was used to evaluate the efficacy of the radiomics model.
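The ICC-based robustness filter can be illustrated as follows. The text specifies a two-way random effects model but not the exact ICC form, so this sketch assumes the single-measurement, absolute-agreement form ICC(2,1); the helper `select_robust` and the synthetic data are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is an (n_subjects x k_raters) array.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def select_robust(first, second, threshold=0.85):
    """Return indices of features whose ICC between two annotation sets exceeds threshold."""
    first, second = np.asarray(first, float), np.asarray(second, float)
    keep = []
    for j in range(first.shape[1]):
        pair = np.column_stack([first[:, j], second[:, j]])
        if icc_2_1(pair) > threshold:
            keep.append(j)
    return keep

# Synthetic example: feature 0 is reproduced exactly, feature 1 is unrelated
# between the two annotation sets, feature 2 differs only by small noise.
rng = np.random.default_rng(0)
set_a = rng.normal(size=(30, 3))
set_b = set_a.copy()
set_b[:, 1] = rng.normal(size=30)
set_b[:, 2] += rng.normal(scale=0.01, size=30)
robust = select_robust(set_a, set_b)
```

The same function can be applied to inter-observer pairs (Reader A versus Reader B) and to intra-observer pairs (first versus repeated annotation), keeping only the features that pass both checks.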

Evaluation of the model

Model evaluation was performed using the R programming language (version 4.1.1). In the test cohort, the predictive efficacy of the radiomics model and the CT impression were assessed using receiver operating characteristic (ROC) curves, generated with the pROC package. Precision-recall (PR) curves and calibration curves (bootstrapped with 1,000 repetitions for bias correction) were plotted using the PRROC and rms packages, respectively. Model performance metrics—including accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR)—were calculated using the modEvA package.
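The listed performance metrics all follow from the 2 × 2 confusion matrix. The study computed them in R with the modEvA package; the following is only an illustrative Python sketch, with hypothetical labels and predictions. For the radiomics model, the predicted probabilities would first need to be turned into binary predictions with a threshold, which is not specified here.

```python
def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = BI(+), 0 = BI(-))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    sen = ratio(tp, tp + fn)
    spe = ratio(tn, tn + fp)
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "SEN": sen,
        "SPE": spe,
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "PLR": ratio(sen, 1 - spe),  # sensitivity / (1 - specificity)
        "NLR": ratio(1 - sen, spe),  # (1 - sensitivity) / specificity
    }

# Hypothetical example: 4 BI(+) and 6 BI(-) cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

In this toy example there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, giving an accuracy of 0.8 and a sensitivity of 0.75.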

Statistical analysis

The statistical analysis was conducted using the R software, version 4.1.3. For continuous variables, those following a normal distribution were presented as mean ± standard deviation, while non-normally distributed variables were depicted as median [1st quartile, 3rd quartile]. Categorical variables were summarized as counts and percentages. Normality of distribution was assessed using the Kolmogorov-Smirnov test. Associations among categorical variables were evaluated using chi-square or Fisher’s exact tests as appropriate. For two-group comparisons involving non-normally distributed data, the Mann-Whitney U test was utilized. In cases where multiple groups were compared and the data did not meet parametric assumptions, the Kruskal-Wallis test was applied. The DeLong test was employed to compare the areas under the Receiver Operating Characteristic (ROC) curves across various models. Differences were considered statistically significant at a P value of less than 0.05.

Results

Clinical characteristics of the cases

The clinical characteristics of the BI(+) and BI(-) patients are compared in Table 3. No significant differences were found between the BI(+) and BI(-) groups for any of the clinical factors (all P > 0.05), except for tumor differentiation in the test dataset (P = 0.018).

Table 3 Clinical characteristics of patients with and without BI.

Results of the CT impression

The results of the CT impression are shown in Table 4. The inter-reader agreement for the CT image features of CRC expansion, bladder wall thickening, and perivesical stranding was 0.876, 0.761, and 0.769, respectively. In the training dataset, the diagnostic accuracy of the CT impression was 64.7%. In the test dataset, the diagnostic accuracy of the CT impression was 53.6%.

Table 4 Results of the CT impression.

Results of the radiomics model

After ICC analysis, 88 features were selected as robust features and passed to PCA. Then, feature reduction was performed by PCA (Fig. 3; Table 5), and five PCA features were selected to train the CatBoost model. During CatBoost training, the hyperparameter combination with the highest accuracy was selected as the optimal model. The final values used for the model were depth = 2, learning_rate = 0.1, iterations = 100, l2_leaf_reg = 1e-06, rsm = 0.9, and border_count = 255. The relative importance of the PCA features was 24.12%, 21.17%, 20.73%, 19.50%, and 14.49% in the overall model. The AUC values and their corresponding 95% confidence intervals from the 5-fold cross-validation were 0.706 (0.551–0.806), 0.797 (0.636–0.891), 0.747 (0.588–0.799), 0.831 (0.774–0.858), and 0.891 (0.768–0.991), respectively.
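The reduction from 88 robust features to five PCA features can be sketched with NumPy. This is an illustration under stated assumptions rather than the authors' code: PCA is done by singular value decomposition of the standardized feature matrix, components are kept up to a cumulative-variance target, and the five components with the largest one-way ANOVA F statistic are selected. The random data, the label counts, and the function names are hypothetical.

```python
import numpy as np

def pca_retain(features, variance_target=0.95):
    """Standardize features, run PCA via SVD, keep PCs up to the variance target."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    x = x / std
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # proportion of variance per PC
    cumulative = np.cumsum(explained)
    n_keep = min(int(np.searchsorted(cumulative, variance_target)) + 1, len(s))
    scores = x @ vt[:n_keep].T  # case coordinates on the retained PCs
    return scores, explained, n_keep

def anova_f(scores, labels):
    """One-way ANOVA F statistic of each column of `scores` across label groups."""
    labels = np.asarray(labels)
    groups = [scores[labels == g] for g in np.unique(labels)]
    grand = scores.mean(axis=0)
    ss_between = sum(len(g) * (g.mean(axis=0) - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Synthetic stand-in for 68 training cases with 88 robust features.
rng = np.random.default_rng(0)
features = rng.normal(size=(68, 88))
labels = np.array([1] * 27 + [0] * 41)  # hypothetical BI(+)/BI(-) labels

scores, explained, n_keep = pca_retain(features, variance_target=0.95)
f_values = anova_f(scores, labels)
top5 = np.argsort(f_values)[::-1][:5]  # indices of the five most discriminative PCs
selected = scores[:, top5]
```

To avoid information leakage, the standardization parameters, the PCA loadings, and the selected component indices would be estimated on the training dataset only and then applied unchanged to the test dataset.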

Fig. 3

Feature reduction by PCA. (a) Variance explained against the number of dimensions in the PCA result. (b) Individual cases plotted in the plane of dimension 1 (Dim1) and dimension 2 (Dim2), which account for 42.1% and 19.1% of the variance, respectively. (c) Individual cases by group in the same plane, with group 0 [BI(-)] in red and group 1 [BI(+)] in green.

Table 5 Standard deviation and proportion of the top 5 PCA features.

Evaluation of the model

The evaluation metrics for the CT impression and the radiomics model are shown in Table 6 and Fig. 4, including the area under the curve (AUC), accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR). In the training dataset, the AUC of the radiomic model [0.864 (95% CI: 0.778, 0.951)] was significantly greater than that of the CT impression [0.678 (95% CI: 0.569, 0.786), P = 0.007]. In the test dataset, the AUC of the radiomic model [0.883 (95% CI: 0.699, 1.000)] was also significantly greater than that of the CT impression [0.570 (95% CI: 0.370, 0.770), P = 0.040].
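For orientation, the AUC itself can be computed as the probability that a randomly chosen BI(+) case receives a higher score than a randomly chosen BI(-) case, with ties counted as one half (the Mann-Whitney formulation). The sketch below is illustrative only: the study used the pROC package in R, and the scores shown here are hypothetical. It does not reproduce the confidence intervals or the DeLong comparison.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Continuous scores, such as predicted probabilities from a model (hypothetical).
auc_model = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_model)  # 0.75

# A binary reading (0/1), such as a yes/no impression, produces many ties.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reading = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
auc_reading = auc_from_scores(y_true, reading)
```

For a binary reading, this AUC equals the mean of sensitivity and specificity, which is one reason a yes/no visual impression tends to yield a lower AUC than a continuous probability output.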

The PR curves of the CT impression and the radiomics model are shown in Fig. 5. The calibration curves are shown in Fig. 6. With 1,000 repetitions of bootstrapping, the calibration curve of the CT impression yielded a mean absolute error of 0.042, a mean squared error of 0.002, and a 0.9 quantile of absolute error of 0.043. The calibration curve of the radiomics model yielded a mean absolute error of 0.029, a mean squared error of 0.001, and a 0.9 quantile of absolute error of 0.071.

Table 6 AUCs and other evaluation metrics for the CT impression and the radiomics model.

Fig. 4

ROC curves of the CT impression and the radiomic model in the training dataset (a) and the test dataset (b). In the training dataset, the AUC of the radiomic model [0.864 (95% CI: 0.778, 0.951)] was significantly greater than that of the CT impression [0.678 (95% CI: 0.569, 0.786), P = 0.007]. In the test dataset, the AUC of the radiomic model [0.883 (95% CI: 0.699, 1.000)] was significantly greater than that of the CT impression [0.570 (95% CI: 0.370, 0.770), P = 0.040].

Fig. 5

Precision‒recall curves of the CT impression and the radiomic model in the training dataset (a) and the test dataset (b).

Fig. 6

Calibration curves of the CT impression (a) and the radiomics model (b). With 1,000 repetitions of bootstrapping, the calibration curve of the CT impression yielded a mean absolute error of 0.042, a mean squared error of 0.002, and a 0.9 quantile of absolute error of 0.043. The calibration curve of the radiomics model yielded a mean absolute error of 0.029, a mean squared error of 0.001, and a 0.9 quantile of absolute error of 0.071.

Discussion

Predicting BI by locally advanced CRC is important because it helps determine the stage of the cancer and guides treatment decisions. Studies have shown that the most relevant preoperative predictors of definite BI in patients with CRC are gross hematuria, a tumor visible during cystoscopy, and abnormal CT findings5. CT can provide detailed images of the bladder and surrounding structures, allowing radiologists to identify abnormal findings that may indicate invasion, such as gross tumor invasion, an enhancing mass at the bladder wall, irregular bladder mucosa, and loss of the perivesical fat plane. However, CT scans are not definitive and are often used in combination with other diagnostic methods, such as cystoscopy and MRI, to confirm the diagnosis. Relatively low concordance rates of approximately 35–55% between preoperative CT scans and final histopathology results have been reported8,18. Such low concordance suggests that preoperative overstaging is common, which increases the proportion of patients receiving neoadjuvant therapy and expands the scope of intraoperative resection. Thus, improving the concordance between imaging and pathology is a key direction for future diagnostic work. Radiomics is a field of medical imaging in which advanced computational methods are used to extract a large number of quantitative features from medical images. In CRC, radiomics has been shown to be useful for evaluating the aggressiveness of the disease, as well as for predicting patient outcomes19.

In comparison to prior radiomics studies on colorectal cancer, our work advances the field in two key aspects. First, while existing CT-based radiomics research in CRC has addressed diverse diagnostic tasks (e.g., tumor staging, lymph node metastasis), our study is the first to specifically evaluate bladder invasion. This addresses a critical unmet need in preoperative assessment, where distinguishing true BI from inflammatory adhesions remains a major clinical challenge. By resolving this ambiguity, our model could reduce unnecessary multivisceral resections and improve surgical planning. Second, we introduce a standardized rectangular ROI spanning the tumor-bladder interface. Unlike conventional approaches that rely on irregular tumor contours, this method ensures consistent, reproducible annotations across radiologists, thereby enhancing the validity and reliability of radiomics feature extraction. Together, these advances strengthen the clinical utility of radiomics for guiding preoperative surgical decisions in CRC.

In this study, we reviewed 96 CRC patients who underwent surgery for suspected bladder invasion. A CT visual assessment model and a radiomics model were developed and compared. We found that there were no significant differences in age, sex, BMI, CA19-9, CEA, tumor location, or tumor diameter between patients with and without BI. However, there was a statistically significant difference in tumor differentiation between the two groups in the test dataset. In the test dataset, the AUC of the radiomics model (0.883) was significantly greater than that of the CT visual assessment (0.570, P = 0.040).

Accurate and consistent annotation of ROIs is crucial for our radiomics study to be robust20. Our PubMed search results indicate that most radiomics studies on rectal cancer use manual segmentation, in which radiologists annotate the location and precise boundary of the tumor21. In this study, we chose the ROI as the area where the bladder and the CRC were closest because we believe that this area contains important information about the biology of the tumor, such as the aggressiveness of the tumor, the extent of invasion, and the potential for metastasis. Using this clear definition of the ROI, we can ensure that the radiologists consistently annotate the ROI, making the results of the study valid and reliable. To evaluate the bias in the ROI, we enlisted the participation of two radiologists with varying levels of experience and assessed the intra- and interrater repeatability of the annotations. The results demonstrated good consistency between them.

At present, comprehensive treatment based on surgical resection remains the standard treatment for locally advanced CRC with bladder involvement. This study revealed that applying radiomics models for the prediction of BI by CRC is feasible and may perform better than human radiologists. Such models are expected to improve the accuracy of clinical staging of locally advanced CRC, guide the selection of appropriate treatment strategies, and decrease unnecessary bladder resection.

We recommend radiomics assessment at the first diagnosis of locally advanced CRC with suspected bladder infiltration. For locally advanced colon cancer, if the radiomics model assesses the bladder as not invaded, surgical exploration may be an option for attempting to preserve the bladder intact or reduce the extent of cystectomy; otherwise, combined partial or total cystectomy or neoadjuvant therapy may be an option. For locally advanced rectal cancer, if the radiomics model assesses the bladder as not invaded, attempts may be made to preserve the bladder intact or reduce the extent of cystectomy after neoadjuvant therapy.

This study has several limitations. First, the small sample size (particularly the limited number of positive invasion cases, n = 8 in the test set) may compromise the statistical power and robustness of the radiomics model, thereby affecting its reproducibility, a critical criterion for radiomics models. Second, the use of heterogeneous CT scanners with varying imaging protocols could introduce variability in radiomic feature extraction, further limiting the reproducibility of results. Future work should incorporate harmonization techniques, such as batch effect correction using empirical Bayes methods (e.g., ComBat) or deep learning-based domain adaptation, to mitigate scanner- and parameter-related variability. Third, the study population was primarily composed of patients who underwent CRC resection and cystectomy, which may introduce selection bias as conservatively managed patients were excluded. Future prospective cohorts including both surgical and non-surgical patients are needed to better reflect real-world clinical diversity. Fourth, the single-center design and limited dataset restrict the generalizability of our findings; external validation with multicenter data is essential to confirm the model’s applicability across diverse clinical settings. Finally, manual ROI annotation, while necessary for this study, is labor-intensive and prone to inter-observer variability. Automated segmentation methods should be developed in future work to improve efficiency and consistency.

Conclusions

In summary, based on the results of this study, it appears that using radiomics models for the prediction of BI by locally advanced CRC is feasible and may perform better than human radiologists. The application of radiomic models can assist in the preoperative staging of locally advanced CRC, guide the selection of treatment strategies, and reduce unnecessary bladder resection.