Abstract
No studies have examined the prognostic value of the log odds of negative lymph nodes/T stage (LONT) in locally advanced rectal cancer (LARC) treated with neoadjuvant chemoradiotherapy (nCRT). We aimed to assess the prognostic value of LONT and develop a machine learning model to predict overall survival (OS) and disease-free survival (DFS) in LARC patients treated with nCRT. The study included 820 LARC patients who received nCRT between September 2010 and October 2017. Univariate and multivariate Cox regression analyses identified prognostic factors, which were then used to develop risk assessment models with nine machine learning algorithms. Model hyperparameters were optimized using random search and 10-fold cross-validation. The models were evaluated using the area under the receiver operating characteristic curve (AUC), decision curve analysis, calibration curves, precision, and accuracy for predicting OS and DFS. Shapley additive explanations (SHAP) were used for model interpretation. LONT was identified as a significant independent prognostic factor for both OS and DFS. The extreme gradient boosting (XGB) model showed the best performance, with a mean AUC of 0.89 for OS and 0.83 for DFS in 10-fold cross-validation, and its predictions were analyzed using SHAP. Finally, we built an online web-based calculator on the XGB model to facilitate wider use of the model and to support physicians in their decision-making.
Introduction
The prevalence of locally advanced rectal cancer has risen steadily in recent years, posing a significant threat to human health1. Neoadjuvant chemoradiotherapy (nCRT) followed by radical surgery is the established treatment for patients with locally advanced rectal cancer (LARC); it reduces local recurrence rates and the likelihood of distant metastasis2,3. Accurate survival prognosis for patients with LARC is therefore of great importance for planning postoperative therapy and follow-up.
In recent years, many studies have examined the influence of lymph node dissection on tumor prognosis4,5,6,7,8. The number of positive lymph nodes, the number of examined lymph nodes (ELN), and the number of negative lymph nodes (NLN) are classical indicators of lymph node status and are associated with survival in gastric9, breast10, and colorectal cancer11,12. İmamoğlu Gİ, Oğuz A et al.12 reported that the positive lymph node ratio (PLNR) is an important prognostic factor for stage III colorectal cancer. In addition, Ogino S et al.13 found that the negative lymph node count is associated with improved survival of colorectal cancer patients, independent of lymphocytic reactions to the tumor and tumoral molecular features. However, a shared limitation of these studies was the lack of data on individual tumor characteristics.
Indeed, the T stage in LARC is a strong and reliable risk factor, as it is determined by the extent of tumor invasion and captures the characteristics of the primary tumor. A growing body of research has indicated a strong correlation between T stage and both prognosis and tumor biological characteristics13,14,15. Both NLN and the T stage are important independent prognostic factors in LARC, as they respectively indicate the extent of lymph node dissection (LND) and the severity of the disease. Recently, several studies have combined NLN and T stage, as log[(NLN + 1)/T stage], into a novel prognostic factor that aims to assess the individualized extent of LND in patients with cancer16,17,18,19. Nevertheless, the applicability and accuracy of these studies are constrained by their reliance on traditional linear models, which offer limited precision and uncertain generalizability.
Machine learning, a rapidly advancing predictive analytics technique, holds significant promise for surpassing traditional modeling approaches in the prediction of clinical outcomes. Unlike linear models, such as logistic regression, machine learning enhances modeling capabilities by effectively capturing nonlinear relationships among multiple variables. Classical statistical methodologies predominantly concentrate on inferential processes, encompassing the estimation of model parameters and the testing of hypotheses. These techniques yield relatively straightforward models, prioritizing interpretability over predictive accuracy, and are less adept at handling datasets characterized by numerous interacting factors. The advent of machine learning presents a promising avenue for addressing many of the limitations associated with traditional approaches. This approach demonstrates superior performance in various applications, including disease diagnosis, prognosis prediction, anti-tumor drug response, and treatment response assessment20,21,22. Machine learning has shown effectiveness in predicting prognostic outcomes utilizing clinical and radiological features23,24,25. However, to date, no study has employed machine learning methods based on LONT to predict the prognosis of patients with LARC following neoadjuvant therapy.
In this study, we combined the number of NLN and the T stage to define the LONT index, with the aim of investigating its relationship with the survival outcomes of patients with LARC. Additionally, we developed nine distinct machine learning prediction models to evaluate the variables influencing OS and DFS in LARC patients. The performance of these models was assessed using a range of metrics, and their interpretability was analyzed in the context of the contributing factors.
Materials and methods
Patients
This retrospective analysis enrolled 820 LARC patients treated with nCRT between September 2010 and October 2017. Eligibility criteria included: (1) histologically confirmed rectal adenocarcinoma; (2) age ≥ 18 years; (3) absence of secondary malignancies; (4) documented follow-up data, including survival duration and cause of death. Exclusion criteria comprised: (1) incomplete lymph node status records; (2) undetermined TNM staging; (3) insufficient surgical documentation; (4) concurrent malignancies; (5) missing clinicopathological data; (6) metastatic disease (Stage IV). Tumor staging was established through multimodal assessments, including digital/endoscopic examinations, cross-sectional imaging (chest/abdominal-pelvic CT), endorectal ultrasound, and pelvic MRI. Surgical resection followed nCRT after a 6–8-week interval, adhering to TME principles for mid/low rectal tumors and partial TME for high rectal lesions (> 5 cm distal margin). Adjuvant chemotherapy commenced 4–8 weeks postoperatively per NCCN26 guidelines. Institutional ethics approval was obtained, with all procedures conforming to ethical standards.
Treatment strategy
All patients received preoperative radiation therapy of 45 Gy in 25 fractions delivered to the pelvis over 5 weeks, followed by a boost of 5.4 Gy in 3 fractions to the primary tumor27. The concurrent chemotherapy regimens used with preoperative radiotherapy were CapeOX (capecitabine plus oxaliplatin), capecitabine, and FOLFOX (5-FU plus oxaliplatin). Surgery was typically performed 6 to 8 weeks after the completion of radiation therapy. Middle and low rectal cancers were managed with TME, while high rectal cancers were managed with partial TME with a distal margin of 5 cm; lateral lymph nodes were not routinely dissected28. The short-axis diameter of lymph nodes is considered the most critical parameter for diagnosing lateral lymph node metastasis. At our center, selective lateral lymph node dissection (LLND) is performed when the lateral lymph nodes show no substantial reduction in size after nCRT (including cases in which nCRT cannot be administered for various reasons). LLND is generally not performed when the lateral lymph nodes shrink markedly after nCRT, that is, when internal iliac lymph nodes measure less than 4 mm, or when obturator lymph nodes have a short-axis diameter of less than 6 mm or have completely regressed29,30,31. Patients then received postoperative adjuvant chemotherapy approximately 4–8 weeks after surgery, irrespective of the surgical pathology findings.
Data collection and definitions
LONT was computed as log[(NLN + 1)/T], where NLN is the number of examined lymph nodes (ELN) minus the number of metastatic nodes. T stages T0–T4 were mapped to the ordinal values 1–5. Survival endpoints were overall survival (OS: time from diagnosis to all-cause mortality) and disease-free survival (DFS: recurrence-free interval).
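The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lont`, the dictionary `T_STAGE_SCORE`, and the example patient are hypothetical, and the base-10 logarithm is an assumption because the text does not state the base of the logarithm.

```python
import math

# Ordinal mapping stated in the text: T0-T4 are scored as 1-5.
T_STAGE_SCORE = {"T0": 1, "T1": 2, "T2": 3, "T3": 4, "T4": 5}


def lont(examined_nodes: int, metastatic_nodes: int, t_stage: str) -> float:
    """Return log[(NLN + 1) / T], with NLN = ELN - metastatic nodes.

    Assumption: base-10 logarithm (the source does not specify the base).
    """
    if not 0 <= metastatic_nodes <= examined_nodes:
        raise ValueError("metastatic_nodes must lie between 0 and examined_nodes")
    nln = examined_nodes - metastatic_nodes
    return math.log10((nln + 1) / T_STAGE_SCORE[t_stage])


# Hypothetical patient: 13 nodes examined, 1 metastatic, stage T3 (score 4).
example_lont = lont(13, 1, "T3")  # log10(13 / 4)
```

For a fixed T stage the index rises with the number of negative nodes, and for a fixed node count it falls as the T stage increases, which matches the interpretation of LONT as NLN adjusted by T stage.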
Model construction and evaluation
Nine machine learning algorithms were implemented for prognostic modeling: extreme gradient boosting (XGB)32, random forest (RF)33, decision tree (DT)34, light gradient boosting (LGBM)35, adaptive boosting (AB)36, K-nearest neighbor (KNN)37, support vector machine (SVM)38, naive Bayes (NBC)39, and multilayer perceptron (MLP)40. The dataset was randomly split into a 70% training set and a 30% test set. The training set was used to build the models, and the test set was used for validation and evaluation. Ten-fold cross-validation with grid search within the training set was used to find the best hyperparameters. Model performance was first evaluated as the average AUC under ten-fold cross-validation. An AUC near 1 indicates strong predictive power, while an AUC near 0.5 suggests limited capability. Decision curves and calibration curves assessed clinical utility and predictive accuracy, respectively. The best model was chosen based on multiple metrics. The nine trained models were then applied to the internal test set to assess generalization, with ROC and calibration curves plotted for analysis. SHAP, a model-agnostic tool based on cooperative game theory, was used to explain and evaluate variable importance in the best machine learning model. Additionally, a web-based calculator was created to facilitate clinical use of the model.
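The workflow described above (70/30 split, hyperparameter search with 10-fold cross-validation on the training set, then evaluation on the held-out set) might look roughly like the following scikit-learn sketch. Several things here are assumptions rather than the study's setup: the data are synthetic, `GradientBoostingClassifier` stands in for XGB, the search space is invented for illustration, and random search is used as named in the abstract (`GridSearchCV` would be the grid-search counterpart).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)

# Synthetic stand-in data (NOT the study cohort): 400 samples, 6 features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)

# 70% training set, 30% internal test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# 10-fold cross-validation used inside the hyperparameter search.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hypothetical search space; the source does not list its hyperparameters.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.05, 0.1],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=cv,
    random_state=0,
)
search.fit(X_train, y_train)

# Mean and SD of the fold-wise AUC for the selected configuration.
mean_cv_auc = search.cv_results_["mean_test_score"][search.best_index_]
sd_cv_auc = search.cv_results_["std_test_score"][search.best_index_]

# AUC of the refitted best model on the held-out 30%.
test_auc = roc_auc_score(
    y_test, search.best_estimator_.predict_proba(X_test)[:, 1]
)
```

Keeping the test set out of the search means `test_auc` is not influenced by hyperparameter selection, whereas `mean_cv_auc` belongs to the configuration that was chosen because it scored best and is therefore somewhat optimistic.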
Statistical analysis
Statistical analysis and model building were performed using R (v4.2.3) and Python (v3.8). T-tests analyzed continuous variables, while Chi-square or Fisher’s exact tests handled categorical ones. Odds ratios (OR) and confidence intervals (CI) were calculated. Univariate and multivariate Cox regression models identified variables related to OS and DFS, with significant risk factors (P < 0.05) used in machine-learning models.
Results
Patient characteristics
A retrospective analysis was performed on 937 patients diagnosed with LARC who underwent radical surgery after nCRT at the First Affiliated Hospital of Fujian Medical University from September 2010 to October 2017. After screening according to the inclusion and exclusion criteria, a total of 820 patients were included. Among all LARC patients receiving nCRT, the median follow-up time was 48 months. The ELN count, NLN count, and LONT (mean ± SD) were 12.71 ± 5.79, 12.00 ± 5.61, and 0.65 ± 0.32, respectively. The distance from the anal verge and the tumor size (mean ± SD) were 6.48 ± 2.44 and 2.70 ± 1.28, respectively. The majority of patients were male (66.59%). The most common ypTNM stage was stage II (28.29%). Most of the LARC patients presented with T3 (43.41%) and N0 (74.63%). No statistically significant differences were observed between the two subgroups (Table 1).
The prognostic impact of LONT on OS and DFS
To investigate the prognostic significance of LONT for OS and DFS in patients with LARC, Cox regression analysis was conducted. In the univariate analysis, tumor size (P < 0.001), pathological T stage (P < 0.001), pathological N stage (P < 0.001), pathological TNM stage (P < 0.001), TRG grade (P < 0.001), abdominoperineal resection (P = 0.047), neural invasion (P = 0.001), lymphovascular invasion (P = 0.038), pre-CEA (P = 0.019), pre-CA199 (P = 0.009), post-CEA (P = 0.001), post-CA199 (P = 0.029), and LONT level (P < 0.001) were significantly associated with OS in patients with LARC who underwent nCRT and TME (Table 2). The multivariate Cox regression analysis identified pathological TNM stage (P < 0.001), tumor size (HR = 1.164, 95% CI: 1.015–1.335, P = 0.029), and LONT level (HR = 0.286, 95% CI: 0.138–0.596, P = 0.001) as significant independent predictors of OS following nCRT (Table 2).
In the univariate analysis, several factors were significantly associated with DFS in patients with LARC who underwent nCRT and TME: tumor size (P < 0.001), pathological T stage (P < 0.001), pathological N stage (P < 0.001), pathological TNM stage (P < 0.001), neural invasion (P = 0.021), lymphovascular invasion (P = 0.035), TRG grade (P < 0.001), distance from the anal verge (P = 0.003), post-CEA (P < 0.001), and LONT level (P < 0.001), as shown in Table 3. The multivariate Cox regression analysis identified pathological TNM stage (P < 0.001), tumor size (HR = 1.202, 95% CI: 1.079–1.340, P = 0.001), LONT level (HR = 0.374, 95% CI: 0.206–0.680, P = 0.001), and distance from the anal verge (HR = 0.899, 95% CI: 0.843–0.958, P = 0.001) as independent predictors of DFS after nCRT (Table 3).
Model performance
To evaluate the predictive performance of the nine models, ten-fold cross-validation was performed for both OS and DFS (Fig. 1A, B). The average AUC values for predicting OS ranged from 0.74 to 0.89, and the XGB algorithm achieved the highest average AUC of 0.89 (SD = 0.011). The average AUC values for predicting DFS ranged from 0.73 to 0.82, and the XGB algorithm again achieved the highest average AUC of 0.82 (SD = 0.014). Figure 2 displays the receiver operating characteristic (ROC) curves, calibration curves, and clinical decision curves for OS and DFS across the nine models. The decision curves (Fig. 2C, F) and calibration curves (Fig. 2B, E) of the XGB model supported its clinical utility and predictive accuracy. A heat map was used to compare the models across multiple performance metrics (Fig. 3). After evaluating model performance on the training and internal validation datasets, we concluded that the XGB model had the best predictive accuracy for OS and DFS in patients with LARC and selected it as the optimal model.
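Decision curve analysis, referred to above, compares the net benefit of acting on a model's predictions with the net benefit of treating everyone or no one across threshold probabilities. The following is a minimal sketch of the standard net-benefit calculation, net benefit = TP/n − (FP/n) × pt/(1 − pt); the function names and the ten-patient toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that intervenes in every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = event occurred, with predicted event probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.7, 0.35, 0.05]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve is obtained by evaluating these functions over a range of thresholds; a model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).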
(A) ROC curves of nine machine learning models in OS. (B) Calibration curves of nine machine learning models in OS. (C) DCA curves of nine machine learning models in OS. (D) ROC curves of nine machine learning models in DFS. (E) Calibration curves of nine machine learning models in DFS. (F) DCA curves of nine machine learning models in DFS. XGB, extreme gradient boosting; RF, random forest; DT, decision tree; LGBM, light gradient boosting; AB, adaptive boosting; KNN, K-nearest neighbor; SVM, support vector machine; NBC, naive Bayes; MLP, multilayer perceptron; DCA, decision curve analysis.
(A) Prediction performance of nine models in OS. (B) Prediction performance of nine models in DFS. AUC area under the curve, XGB extreme gradient boosting, RF random forest, DT decision tree, LGBM light gradient boosting, AB adaptive boosting, KNN K-nearest neighbor, SVM support vector machine, NBC naive Bayes, MLP multilayer perceptron.
SHAP-based model interpretation
To visually elucidate the selected variables, SHAP was utilized to illustrate their predictive capabilities for OS and DFS within the model. Figure 4A and D display the ranking of risk factors in predicting OS and DFS, respectively, based on their average absolute SHAP values. The x-axis represents the importance of each variable within the predictive model. Furthermore, two representative cases were presented to illustrate the interpretability of the model: one depicting LARC patients without OS/DFS (Fig. 4B, E), and the other showcasing LARC patients with OS/DFS (Fig. 4C, F). Figure 4 displays the most significant features within our model. Each feature is represented by a line indicating the attributions of all patients to the results, with red dots denoting high-risk values and blue dots denoting low-risk values. Figure 4B illustrates that elevated levels of LONT, lower clinical TNM stages, and smaller tumor sizes are correlated with enhanced survival outcomes in LARC patients. Conversely, Fig. 4C demonstrates that reduced LONT levels, higher clinical TNM stages, and larger tumor sizes are associated with poorer survival outcomes in LARC patients.
Relative importance of variables based on SHAP for the XGB prediction model in OS and DFS (A,D); SHAP explanations of the XGB prediction model for representative LARC patients without OS/DFS (B,E) and with OS/DFS (C,F). SHAP, Shapley additive explanations; XGB, extreme gradient boosting.
Web calculator
Although the XGB model exhibited the highest performance among the nine machine learning models assessed, its complexity and limited interpretability hinder its direct use in clinical practice. To address this limitation, we developed a web-based calculator built on the XGB model, in which users enter a patient's clinicopathological data on the left-hand side to obtain the predicted OS and DFS outcomes. Figure 5 illustrates the web calculator.
Discussion
In the present study, LONT was utilized to evaluate the comparative extent of lymph node dissection (LND). Furthermore, a prognostic machine learning model was constructed based on LONT. Our findings revealed a positive correlation between elevated LONT levels and improved survival rates among patients with LARC, independent of clinicopathological variables. The developed machine learning model, which integrates LONT, not only encompasses tumor characteristics but also provides insights into the extent of LND and exhibits strong prognostic capabilities. This machine learning model has the potential to assist clinicians in customizing personalized treatment strategies for patients with LARC following nCRT. To the best of our knowledge, this is the inaugural study to quantify the relative degree of LND and to develop a machine learning model incorporating LONT and clinicopathological factors to predict the survival outcomes of LARC patients after nCRT.
Previous studies have established that both ELN and NLN hold independent prognostic value concerning colorectal cancer. Lei P et al.41 demonstrated that a higher count of ELN correlates with more precise nodal staging and improved prognosis in stage II colorectal cancer. Furthermore, Benli S et al.42 identified that the count of NLN is linked to enhanced survival outcomes in colorectal cancer patients, irrespective of lymphocytic reactions to the tumor and the molecular characteristics of the tumor. The lack of essential data regarding the biological characteristics of tumors necessitates further research to elucidate their clinical utility. Notably, the T stage in LARC serves as a significant and reliable risk factor, reflecting the extent of tumor invasion and encapsulating the primary tumor’s attributes. A growing body of scholarly literature has consistently demonstrated a strong correlation between the T stage and both prognosis and tumor biological characteristics. In our study, LONT was operationalized as the logarithm of the ratio between the sum of NLN counts plus one and the T stage. Here, NLN denote the extent of LND, while the T stage signifies the tumor’s characteristics and severity. Unlike the straightforward use of ELN and NLN to represent the degree of LND, LONT integrates individualized tumor characteristics. It can be understood as NLN adjusted by the T stage, thereby reflecting the proportionate quantity of negative lymph nodes extracted in postoperative patients with LARC. A higher LONT value indicates a greater proportion of negative lymph nodes, whereas a lower value denotes a smaller proportion. Importantly, the LONT value can consistently represent the level of risk, irrespective of variations in the patient’s T stage and the number of negative lymph nodes. This attribute enables the comparison of individualized LND degrees among patients with varying TNM stages.
Previous research has demonstrated a strong correlation between LONT and both cancer progression and prognosis16,17,18,19. However, to date, there have been no studies investigating the prognostic significance of LONT in LARC patients undergoing nCRT utilizing machine learning models. Machine learning algorithms have been utilized to forecast treatment responses in patients with various types of cancer. A substantial body of research has validated the effectiveness of machine learning in accurately predicting treatment outcomes across multiple cancer types43,44,45,46. SHAP values were employed to assess the significance of each variable within the machine learning model47. The importance of each variable was quantified and elucidated through a SHAP summary plot. SHAP values, grounded in game theory, provide a method for model interpretability by explaining the global structure of the model through the aggregation of local explanations for individual predictions.
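The game-theoretic idea behind SHAP can be illustrated with an exact Shapley computation on a very small example. The sketch below enumerates all feature coalitions for a hypothetical three-feature risk score; `toy_risk`, its coefficients, and the feature names are invented for illustration, and the SHAP library itself (which uses faster approximations for tree models) is not used here.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values, enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                coalition = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi


def toy_risk(known):
    """Hypothetical risk score given the set of features revealed to the model."""
    score = 0.30  # baseline risk when no feature is known
    if "LONT" in known:
        score -= 0.15
    if "tumor_size" in known:
        score += 0.05
    if "ypTNM" in known:
        score += 0.10
    if "LONT" in known and "ypTNM" in known:
        score -= 0.04  # interaction term, split equally between the two features
    return score


features = ["LONT", "tumor_size", "ypTNM"]
phi = shapley_values(toy_risk, features)
```

The attributions sum to the difference between the full prediction and the baseline, which is the property that lets per-patient explanations be aggregated into a global ranking such as the mean absolute SHAP value per variable.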
In this study, we combined the number of NLN and the T stage to develop and define the LONT index. Our findings indicate that this novel indicator functions as an independent prognostic factor in patients with LARC. The results of the univariate Cox regression analysis reveal that the hazard ratio (HR) for NLN in OS and DFS is 0.960 and 0.968, respectively. Importantly, when the T stage is combined with NLN, the HR significantly decreases to 0.132 and 0.208. These findings are corroborated by the multivariate Cox regression analysis. Additionally, the prognostication of patients with LARC following nCRT can be distinctly stratified by the LONT in both OS and DFS cohorts. Subsequently, we developed a predictive model for OS and DFS in patients with LARC by integrating LONT with conventional clinical and pathological characteristics, employing nine widely recognized machine learning algorithms. The results indicate that the extreme gradient boosting (XGB) algorithm exhibited superior predictive performance for both OS and DFS in LARC. The XGB model, renowned for its efficiency, flexibility, and scalability, is a widely utilized machine learning algorithm classifier across various medical domains, including LARC48,49. A significant advantage of this method lies in its deployment of a large ensemble of decision trees that demonstrate minimal inverse correlation. The XGB model was meticulously chosen to reduce error rates and mitigate the risk of overfitting within the training model50,51.
This study assessed the distribution of SHAP values for a pertinent covariate, along with its significance and directionality. Notably, LONT demonstrated a high proportion of risk scores within the model, indicating a pronounced discriminatory capacity for risk assessment when considering identical pathological TNM stages and other clinicopathological factors. The SHAP dependence diagram demonstrated that the LONT is an effective predictor of OS and DFS in patients with LARC. Our model successfully integrated a range of risk factors influencing OS and DFS in LARC patients, achieving superior predictive performance. The substantial contribution of LONT to the model further validated the impact of the degree of LND on prognosis and highlighted the importance of utilizing LONT for prognostic prediction in LARC cases. Furthermore, this marker can be readily obtained from the postoperative pathological report without incurring additional costs. Considering the widespread use of this model in clinical practice, improving the accuracy of survival outcome estimation will benefit a substantial number of patients.
Some limitations were present in our study. Firstly, it may be deemed imprecise to solely depict the attributes of a tumor based on its T stage, as the significant biological characteristics of a tumor encompass additional factors such as pathological type, grade, genotyping, and so forth. Secondly, since this was a retrospective single-center investigation, we require a prospective study design to evaluate our findings. Lastly, our sample size was limited. Thus, multicenter studies could provide a larger sample size for further investigation.
Conclusions
In conclusion, the application of LONT as an innovative prognostic marker facilitates the evaluation of the heterogeneity in LND across different patient populations. This marker has proven effective in reliably predicting OS and DFS in patients with LARC undergoing nCRT, independent of clinicopathological variables. The creation of a machine learning model based on LONT provides clinicians with a simple, accurate, and user-friendly scoring system to develop personalized treatment strategies.
Data availability
Some or all data used during the study are available from the corresponding author by request.
References
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68(6):394–424. (2018).
Li, Y. et al. A review of Neoadjuvant Chemoradiotherapy for locally advanced rectal Cancer. Int. J. Biol. Sci. 12 (8), 1022–1031. https://doi.org/10.7150/ijbs.15438 (2016).
Gollins, S. & Sebag-Montefiore, D. Neoadjuvant treatment strategies for locally advanced rectal Cancer. Clin. Oncol. (R. Coll. Radiol.) 28(2):146–151. (2016).
Nicholls, R. J., Zinicola, R. & Haboubi, N. Extramural spread of rectal cancer and the AJCC cancer staging manual 8th edition. Ann. Oncol. 30(8):1394–1395. https://doi.org/10.1093/annonc/mdz147. (2017).
Hazen, S. J. A., Sluckin, T. C., Konishi, T. & Kusters, M. Lateral lymph node dissection in rectal cancer: state of the art review. Eur. J. Surg. Oncol. 48 (11), 2315–2322 (2022).
Li, J. & Shiomi, A. Lateral lymph node dissection in advanced low rectal cancer treatment. Int. J. Colorectal Dis. 36(11):2361–2371. https://doi.org/10.1007/s00384-021-03975-x (2021).
Kroon, H. M. et al. Systematic review and meta-analysis of long-term oncological outcomes of lateral lymph node dissection for metastatic nodes after neoadjuvant chemoradiotherapy in rectal cancer. Eur. J. Surg. Oncol. 48 (7), 1475–1482 (2022).
Law, B. Z. Y., Yusuf, Z., Ng, Y. E. & Aly, E. H. Does adding lateral pelvic lymph node dissection to neoadjuvant chemotherapy improve outcomes in low rectal cancer? Int. J. Colorectal Dis. 35(8):1387–1395. https://doi.org/10.1007/s00384-020-03656-1. (2020).
Gulmez, S. et al. Prognostic significance of the metastatic lymph node ratio compared to the TNM classification in stage III gastric cancer. Niger. J. Clin. Pract. 24(11):1602–1608. https://doi.org/10.4103/njcp.njcp_345_20. (2021).
Tausch, C. et al. Prognostic value of number of removed lymph nodes, number of involved lymph nodes, and lymph node ratio in 7502 breast cancer patients enrolled onto trials of the Austrian breast and Colorectal Cancer Study Group (ABCSG). Ann. Surg. Oncol. 19 (6), 1808–1817. https://doi.org/10.1245/s10434-011-2189-y (2012).
Li Destri, G. et al. Predictive Value of the Number of Harvested Lymph Nodes and Cut-Off for Lymph Node Ratio in the Prognosis of Stage II and III Colorectal Cancer Patients. J. Invest. Surg. 32(1):1–7. (2019).
İmamoğlu, G. İ. et al. The impact of lymph node ratio on overall survival in patients with colorectal cancer. J. Cancer Res. Ther. 17(4):1069–1074. https://doi.org/10.4103/jcrt.JCRT_11_19. (2021).
Ogino, S. et al. Negative lymph node count is associated with survival of colorectal cancer patients, independent of tumoral molecular alterations and lymphocytic reaction. Am. J. Gastroenterol. 105 (2), 420–433. https://doi.org/10.1038/ajg.2009.578 (2010).
Sun, Y. et al. Radiomic features of pretreatment MRI could identify T stage in patients with rectal cancer: Preliminary findings. J. Magn. Reson. Imaging. https://doi.org/10.1002/jmri.25969. (2018).
Ao, W. et al. Value of apparent diffusion coefficient for assessing preoperative T staging of low rectal Cancer and whether this is correlated with Ki-67 expression. Can. Assoc. Radiol. J. 71 (1), 5–11 (2020).
Chen, T. et al. Predictive value of the log odds of Negative Lymph Nodes/T Stage as a Novel Prognostic factor in bladder Cancer patients after Radical Cystectomy. Front. Oncol. 12, 895413. https://doi.org/10.3389/fonc.2022.895413 (2022).
Wang, X., Wu, Y., Li, X., Hong, J. & Zhang, M. Log odds of negative lymph nodes/T stage ratio (LONT): a new prognostic tool for differentiated thyroid cancer without metastases in patients aged 55 and older. Front. Endocrinol. (Lausanne). 14, 1132687. https://doi.org/10.3389/fendo.2023.1132687 (2023).
Yang, W., Lu, S., Ge, F., Hua, Y. & Chen, X. Prognostic and predictive model of the log odds of the Negative Lymph Node/T Stage Ratio in Resectable gastric adenocarcinoma patients. J. Gastrointest. Surg. 26 (8), 1743–1756. https://doi.org/10.1007/s11605-022-05408-8 (2022).
Xie, J., Pang, Y., Li, X. & Wu, X. The log odds of negative lymph nodes/T stage: a new prognostic and predictive tool for resected gastric cancer patients. J. Cancer Res. Clin. Oncol. 147 (8), 2259–2269. https://doi.org/10.1007/s00432-021-03654-y (2021).
Ren, Y. et al. Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Predict Postoperative complications and Report on a Mobile platform. JAMA Netw. Open. 5 (5), e2211973 (2022).
Bertsimas, D., Dunn, J., Velmahos, G. C. & Kaafarani, H. M. A. Surgical Risk Is Not Linear: Derivation and Validation of a Novel, User-friendly, and Machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator. Ann Surg.
Liu, Y., Ko, C. Y., Hall, B. L. & Cohen, M. E. American College of Surgeons NSQIP Risk Calculator Accuracy using a machine learning Algorithm compared with regression. J. Am. Coll. Surg. 236 (5), 1024–1030 (2023).
Zhang, X. Y. et al. Predicting rectal Cancer response to Neoadjuvant Chemoradiotherapy using Deep Learning of Diffusion Kurtosis MRI. Radiology 296 (1), 56–64 (2020).
Shaish, H. et al. Radiomics of MRI for pretreatment prediction of pathologic complete response, tumor regression grade, and neoadjuvant rectal score in patients with locally advanced rectal cancer undergoing neoadjuvant chemoradiation: an international multicenter study. Eur. Radiol. 30 (11), 6263–6273. (2020).
Feng, L. et al. Development and validation of a radiopathomics model to predict pathological complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicentre observational study. Lancet Digit. Health. 4 (1), e8–e17 (2022).
Benson, A. B. et al. NCCN guidelines insights: rectal Cancer, Version 6.2020. J. Natl. Compr. Cancer Netw. 18 (7), 806–815. https://doi.org/10.6004/jnccn.2020.0032 (2020).
Li, S., Wang, Y., Cai, H., Pan, Z. & Guan, G. A good preoperative immune prognostic index predicts a better prognosis for locally advanced rectal cancer patients with ypTNM stage II who underwent radical resection after neoadjuvant chemoradiotherapy. Int. J. Colorectal Dis. 38 (1), 184 (2023).
Kroon, H. M. et al. Systematic review and meta-analysis of long-term oncological outcomes of lateral lymph node dissection for metastatic nodes after neoadjuvant chemoradiotherapy in rectal cancer. Eur. J. Surg. Oncol. 48 (7), 1475–1482 (2022).
Oh, H. K. et al. Neoadjuvant chemoradiotherapy affects the indications for lateral pelvic node dissection in mid/low rectal cancer with clinically suspected lateral node involvement: a multicenter retrospective cohort study. Ann. Surg. Oncol. 21 (7), 2280–2287 (2014).
Liang, J. T. Technical feasibility of laparoscopic lateral pelvic lymph node dissection for patients with low rectal cancer after concurrent chemoradiation therapy. Ann. Surg. Oncol. 18 (1), 153–159 (2011).
Sluckin, T. C. et al. Coverage of lateral lymph nodes in rectal cancer patients with routine radiation therapy practice and associated locoregional recurrence rates. Int. J. Radiat. Oncol. Biol. Phys. 117 (2), 422–433 (2023).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), San Francisco, California, USA (ACM Press, 2016).
Wallace, M. L. et al. Use and misuse of random forest variable importance metrics in medicine: demonstrations through incident stroke prediction. BMC Med. Res. Methodol. 23 (1), 144 (2023).
Breiman, L. Classification and Regression Trees 1st edition (Routledge, 1984).
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Shek, A. et al. Machine learning-enabled multitrust audit of stroke comorbidities using natural language processing. Eur. J. Neurol. 28 (12), 4090–4097 (2021).
Zhang, S., Li, X., Zong, M., Zhu, X. & Wang, R. Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 29 (5), 1774–1785 (2018).
Lapin, M., Hein, M. & Schiele, B. Learning using privileged information: SVM+ and weighted SVM. Neural Netw. 53, 95–108 (2014).
Sugahara, S. & Ueno, M. Exact learning augmented naive Bayes classifier. Entropy 23 (12), 1703 (2021).
Mansour, M., Alsulamy, S. & Dawood, S. Prediction of implementing ISO 14031 guidelines using a multilayer perceptron neural network approach. PLoS One 16 (1), e0244029 (2021).
Lei, P. et al. Prognostic impact of the number of examined lymph nodes in stage II colorectal adenocarcinoma: a retrospective study. Gastroenterol. Res. Pract. 2020, 8065972. https://doi.org/10.1155/2020/8065972 (2020).
Benli, S., Tazeoğlu, D. & Çolak, T. Combination of the ratio between negative and harvested lymph nodes and Metastatic Lymph Node Count as a Prognostic Indicator in Stage III Colon cancer: a retrospective cohort study. Am. Surg. 90 (1), 38–45 (2024).
Huang, S., Yang, J., Shen, N., Xu, Q. & Zhao, Q. Artificial intelligence in lung cancer diagnosis and prognosis: current application and future perspective. Semin. Cancer Biol. 89, 30–37 (2023).
Peng, Z. et al. Application of radiomics and machine learning in head and neck cancers. Int. J. Biol. Sci. 17 (2), 475–486 (2021).
Kang, J., Schwartz, R., Flickinger, J. & Beriwal, S. Machine Learning approaches for Predicting Radiation Therapy outcomes: a clinician’s perspective. Int. J. Radiat. Oncol. Biol. Phys. 93 (5), 1127–1135 (2015).
Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186 (8), 1772–1791 (2023).
Cheng, J. et al. Multimodal disentangled variational autoencoder with game theoretic interpretability for glioma grading. IEEE J. Biomed. Health Inf. 26 (2), 673–684. https://doi.org/10.1109/JBHI.2021.3095476 (2022).
Liu, J. et al. Identification of colorectal cancer progression-associated intestinal microbiome and predictive signature construction. J. Transl. Med. 21 (1), 373 (2023).
Huang, W. et al. Utility of Machine Learning algorithms in Predicting Preoperative Lymph Node Metastasis in patients with rectal Cancer based on three-Dimensional Endorectal Ultrasound and Clinical and Laboratory Data. J. Ultrasound Med. 42 (11), 2615–2627 (2023).
Ma, B. et al. Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Comput. Biol. Med. 121, 103761 (2020).
Liu, Y., Wang, Y. & Zhang, J. New machine learning algorithm: random forest. In Information Computing and Applications, 246–252 (Springer, 2012).
Funding
This study was supported by the National Natural Science Foundation of China (No. 82172800), the Special Financial Foundation of Fujian Provincial (No. 2020B019), the Joint Funds for the Innovation of Science and Technology, Fujian Province (No. 2020Y9125), the talent program of The First Affiliated Hospital of Fujian Medical University (YJRC3600), and the Startup Fund for Scientific Research of Fujian Medical University (No. 2020QH2036).
Author information
Contributions
Conception and design were the responsibility of Ye Wang. Acquisition of data was carried out by Shoufeng Li, Xing Liu and Ying Huang. Analysis was performed by Zhen Pan, Jinfu Zhuang and Huajun Cai. Interpretation, drafting, and revision of the manuscript were done by Guoxian Guan. All authors approved the final version of the manuscript. Informed consent for the research use of patient data and images was obtained from all patients.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Pan, Z., Cai, H. et al. Prognostic model for log odds of negative lymph node in locally advanced rectal cancer via interpretable machine learning. Sci Rep 15, 7924 (2025). https://doi.org/10.1038/s41598-025-90191-0
DOI: https://doi.org/10.1038/s41598-025-90191-0