Introduction

Urological stones are one of the most common conditions in urology. In recent years, the incidence of renal stones in China has risen to approximately 10.63%, with a high recurrence rate: around 50% of patients experience recurrence within five years1,2. Retrograde intrarenal surgery (RIRS) and percutaneous nephrolithotomy (PCNL) are the primary endoscopic treatments for kidney stone disease (KSD). Treatment selection is guided by specific indications: PCNL is recommended for kidney stones ≥ 2 cm in diameter or complex stones, while RIRS is preferred for stones < 2 cm in diameter or < 1 cm in the lower pole3,4. However, the choice of procedure often depends on the surgeon’s experience and, occasionally, the patient’s preference5. Although PCNL has a higher stone-free rate (SFR) than RIRS, it also carries a greater risk of complications, such as hemorrhage, infection, tissue damage, and thromboembolism6. With advances in flexible ureteroscopy technology, the global use of RIRS is increasing due to its favorable safety profile7. Staged RIRS may be considered for patients with larger kidney stones. However, RIRS is not without risks, especially when performed by less experienced surgeons. Complications can include ureteral perforation, stricture, renal function loss, and postoperative infections8,9. Among these, infections pose a significant concern, ranging from postoperative fever to potentially life-threatening urosepsis, especially in nosocomial infections10,11.

Infection is the most common complication following endoscopic procedures for kidney stones. The incidence of infectious complications in patients undergoing RIRS ranges from 5.7% to 18.3%12,13. After PCNL, 10–30% of patients may experience fever, 35% may develop systemic inflammatory response syndrome (SIRS), and 0.9–9.3% may develop urosepsis14,15. Urinary sepsis, originating from the urinary tract or male genitalia, is a complex systemic inflammatory response to infection. It can lead to organ dysfunction and potentially life-threatening conditions16. The mortality rate for urinary sepsis ranges from 20% to 40%, while severe sepsis can result in a mortality rate as high as 50%17,18. Urinary sepsis progresses rapidly, making early diagnosis and timely treatment critical. SIRS, the initial stage of urosepsis, typically persists throughout the disease course and is closely related to urosepsis19,20. Failure to adequately address SIRS before it progresses to severe sepsis and associated clinicopathophysiological changes can result in missed opportunities for early intervention. Even with intensified treatment at later stages, achieving a favorable outcome is challenging21. Early identification of high-risk kidney stone patients and effective intervention to prevent SIRS have been prioritized by the World Health Organization (WHO)22.

Infection stones account for 10–15% of urinary stones23. Their formation is linked to urinary tract infections caused by urease-producing bacteria24. Patients with infection stones are at a higher risk of developing SIRS and uroseptic shock after surgery25. Currently, the definitive diagnosis of stone composition can only be made postoperatively utilizing infrared spectroscopy. Clinically, spiral CT is widely employed to diagnose urinary stones, which typically appear as high-density calcifications. However, different stone types exhibit varying densities on CT images, enabling a preliminary assessment of their composition through CT value measurements26. CT values represent tissue density, with the Hounsfield Unit (HU) being the standard measurement unit. Typical infection stones have a CT density reading of 900 HU or lower27. Therefore, we hypothesized that stone CT values could be an important predictor of urinary sepsis after lithotripsy. The aim of this study was to construct an early clinical risk prediction model for predicting postoperative urinary sepsis after endoscopic surgery for renal calculi based on CT values, thus providing support for early intervention, prevention and assessment by urologists.

In addition, certain non-stone-related signs may be associated with systemic infections after kidney stone surgery. Perinephric fluid is a common finding in patients with renal colic and urolithiasis, especially in cases involving hydronephrosis and ureteral stones. It may reflect increased urinary tract pressure, leading us to consider its potential association with systemic infections following kidney stone surgery. However, studies have shown that while perinephric fluid is associated with hydronephrosis and proximal and distal ureteral stones, it is not linked to an increased need for urological intervention or prolonged hospital stays. There are no statistically significant differences in inpatient complications, sepsis rates, antibiotic use, acute kidney injury (AKI), or acute kidney disease (AKD) between patients with and without ultrasound-detected perinephric fluid28. Therefore, further prospective studies are necessary to better understand the impact of perinephric fluid on acute urolithiasis, particularly regarding potential delayed complications and long-term outcomes.

Methods

Experimental model and study participant details

This retrospective modeling study followed a three-step approach for model development, validation, and interpretation. Initially, predictive models were developed utilizing five machine learning (ML) algorithms: Multilayer Perceptron (MLP), Support Vector Machine (SVM), XGBoost, Gaussian Naive Bayes (GNB), and Logistic Regression (LR) to predict systemic infections following endoscopic lithotripsy for kidney stones. Next, model performance was evaluated utilizing an internal validation dataset. Finally, the optimal model was interpreted utilizing the SHapley Additive exPlanations (SHAP) method. SHAP, based on cooperative game theory, introduces Shapley values to build an additive explanation model. It fairly and consistently calculates each feature’s contribution to model predictions, ensuring interpretability across various models and contexts29. Figure 1 provides a comprehensive overview of the study process, including inclusion criteria, feature selection, dataset division, data balancing, model development and validation, model comparison, and the selection and interpretation of the best model.

Fig. 1

Machine learning model building flowchart.

A total of 1,413 patients who underwent urinary stone removal surgery at the Second Affiliated Hospital of Nanchang University between February 2021 and September 2023 were included in the study. All participants were of East Asian descent. Inclusion criteria were: (1) preoperative CT confirmation of renal or ureteropelvic junction stones; (2) treatment via PCNL or RIRS; (3) complete medical records and relevant laboratory results; (4) postoperative stone composition analysis utilizing infrared spectroscopy (SUN-3G intelligent stone analysis system). Exclusion criteria included: (1) absence of CT scan; (2) refusal of stone composition analysis; (3) alternative surgical treatments; (4) sepsis caused by infections unrelated to the urinary system; (5) presence of tumors, hematological disorders, or immunodeficiencies; (6) congenital renal abnormalities such as polycystic kidney disease. Ultimately, 833 eligible patients were included in the modeling. All methods complied with relevant guidelines and regulations. The study followed the Declaration of Helsinki, and informed consent was obtained from all participants. The study protocol was reviewed and approved by the Biomedical Research Ethics Committee of the Second Affiliated Hospital of Nanchang University.

Endoscopic lithotripsy

RIRS

Under general anesthesia, the patient is positioned in lithotomy. A ureteroscope is introduced through the urethra into the bladder, and a guidewire is advanced into the target ureter. The ureteroscope is then used to inspect the renal system for stones or strictures. Following stone removal, a 10–12 Fr ureteral access sheath (UAS) is inserted over the guidewire. A flexible ureteroscope is advanced into the renal collecting system for holmium:YAG laser lithotripsy. A 200-µm laser fiber is employed, with laser energy settings ranging from 0.5 to 1.5 J and a frequency of 20–30 Hz, depending on the stone characteristics and the urologist’s preference. A negative pressure device is used to suction out stones, preventing high intrarenal pressure. Finally, a double-J stent is placed utilizing the guidewire.

PCNL

Under general anesthesia, the patient is initially placed in the lithotomy position, and a ureteral stent is inserted. A ureteroscope is advanced to locate the ureteral orifice, and a 5 F ureteral catheter is placed for drainage. The patient is then repositioned to the prone position. Under ultrasound guidance, a percutaneous tract is created, and dilators are used to expand the tract over a safety guidewire. A semirigid ureteroscope or nephroscope is employed to access the renal collecting system. A pulsed irrigation pump is used to deliver saline, maintaining visibility. Holmium laser is applied to fragment the stones, and the resulting fragments are flushed out utilizing a pulsating stream from the flushing pump. Larger stones are removed with stone forceps. Postoperatively, a ureteral stent and nephrostomy tube are inserted for drainage.

Method details

Data collection and preprocessing

Clinical data were collected, including age, sex, body mass index, hypertension, diabetes, urinalysis results, urine culture, CT values, stone burden, and degree of hydronephrosis, for a total of 32 indicators. Before the surgery, urine samples were collected for analysis, and midstream urine samples were obtained for culture. A positive urine culture was defined as a colony count of a single pathogen exceeding \(10^{5}\)/mL. Surgical time was recorded as the duration from the first step of the operation to the completion of the final procedure.

Two urology experts collected CT imaging features of the urinary system, including stone density in Hounsfield Units (HU), stone burden, stone location, number of stones, and degree of hydronephrosis. Images were processed and stored utilizing a Picture Archiving and Communication System (PACS). For CT value measurement, the plane with the largest cross-sectional area of the stone was selected, and the central 50% was defined as the region of interest (ROI). CT values were measured three times within the ROI, and the average value was recorded. In cases of complex stones, such as multiple renal stones, measurements were repeated at different locations, and an average value was calculated (Fig. S1)30. The formula for calculating urinary stone burden was as follows: \(\sum_{k=1}^{n} \left(0.785 \cdot \text{Width}_{\max} \cdot \text{Length}_{\max}\right)\)31. The postoperative stone composition was analyzed utilizing the LIIR-20 infrared spectroscopy system (Tianjin LAMBDA SCIENTIFIC). The type of stone composition was determined based on its major components (> 50%).
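The stone burden formula and the averaging of repeated CT readings described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the millimetre units, and the example measurements are assumptions. The factor 0.785 approximates π/4, so each term is the area of an ellipse with the given maximal width and length.

```python
from typing import Iterable, Tuple


def stone_burden(stones: Iterable[Tuple[float, float]]) -> float:
    """Sum 0.785 * max width * max length over all stones.

    Each stone is a (max_width, max_length) pair; millimetres are assumed here,
    giving a burden in mm^2.
    """
    return sum(0.785 * width * length for width, length in stones)


def mean_ct_value(readings: Iterable[float]) -> float:
    """Average repeated HU readings taken inside the region of interest."""
    values = list(readings)
    return sum(values) / len(values)


# Hypothetical patient with two stones and three HU readings.
burden = stone_burden([(10.0, 12.0), (5.0, 6.0)])
hu = mean_ct_value([880.0, 905.0, 915.0])
print(f"stone burden = {burden:.2f} mm^2, mean CT value = {hu:.1f} HU")
```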

Sepsis diagnosis was based on the 2016 Sepsis-3 guidelines32. The diagnostic criteria for sepsis included patients with either confirmed or suspected infections and a Sequential Organ Failure Assessment (SOFA) score of 2 or higher. The diagnostic criteria for SIRS included the presence of at least two of the following four conditions: white blood cell count < 4,000 or > 12,000 cells/µL; body temperature > 38 °C or < 36 °C; heart rate > 90 beats/min; respiratory rate > 20 breaths/min or PaCO2 < 32 mmHg.
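The SIRS rule stated above (at least two of four criteria) can be written as a small check. The sketch below is illustrative only: the function and parameter names are assumptions, and PaCO2 is treated as optional because it is not always measured.

```python
from typing import Optional


def count_sirs_criteria(
    wbc_per_ul: float,
    temp_c: float,
    heart_rate: float,
    resp_rate: float,
    paco2_mmhg: Optional[float] = None,
) -> int:
    """Count how many of the four SIRS criteria are met."""
    criteria = [
        wbc_per_ul < 4000 or wbc_per_ul > 12000,        # white blood cell count
        temp_c > 38.0 or temp_c < 36.0,                 # body temperature
        heart_rate > 90,                                # heart rate
        resp_rate > 20
        or (paco2_mmhg is not None and paco2_mmhg < 32),  # respiration
    ]
    return sum(criteria)


def meets_sirs(*args, **kwargs) -> bool:
    """SIRS is present when at least two of the four criteria are met."""
    return count_sirs_criteria(*args, **kwargs) >= 2


# Hypothetical postoperative observations.
print(meets_sirs(13000, 38.5, 80, 16))  # leukocytosis + fever
print(meets_sirs(8000, 37.0, 95, 16))   # tachycardia only
```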

Selection of predictors

All case samples were randomly divided into a training set (70%) and a validation set (30%) to aid model development and evaluation. A total of 32 variables were extracted from clinical, imaging, and laboratory data (Table 1). Univariate analysis of variance was performed solely on the training set. Variables that showed statistical significance (P < 0.05) were selected for further analysis utilizing Lasso regression. Variables were chosen exclusively within the training set to simulate model performance on unseen data, ensuring better generalization. Selecting variables across the entire dataset could inadvertently expose information from the validation set, leading to biased model evaluation.
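A random 70/30 split of the kind described above can be sketched as follows. The helper name, the seed, and the use of integer patient indices are assumptions made for illustration; the point is that any feature screening would then receive only the training identifiers.

```python
import random
from typing import Iterable, List, Tuple


def split_train_validation(
    patient_ids: Iterable[int], train_fraction: float = 0.7, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers and cut them into training and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed (assumed) for reproducibility
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 833 eligible patients, as reported in the study.
train_ids, validation_ids = split_train_validation(range(833))
print(len(train_ids), len(validation_ids))

# Feature screening (univariate tests, Lasso) would be fitted on train_ids only,
# so that no information from validation_ids leaks into variable selection.
```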

Table 1 Univariate analysis between SIRS and non-SIRS groups in the training and validation sets.

Data balancing

The dataset was imbalanced, with fewer patients presenting with SIRS after renal endoscopic lithotripsy compared to those without SIRS. Without adjustment, the model might be biased toward predicting the majority class, reducing accuracy for the minority class33. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data to generate synthetic samples for the minority class. This method helps the model learn data features more effectively and reduces the risk of overfitting34.
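The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified standard-library illustration rather than the implementation used in the study; the function name, the choice of k, the seed, and the toy data are assumptions.

```python
import math
import random
from typing import List, Sequence


def smote(
    minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Generate synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (values are hypothetical).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_samples, n_synthetic=6, k=2)
print(len(new_samples))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class.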

Model development

Variables screened by Lasso regression analysis were incorporated into five ML algorithms: MLP, SVM, XGBoost, GNB, and LR to construct predictive models. Parameters for each model were automatically tuned utilizing Bayesian optimization techniques, which construct a probabilistic model of the objective function to find optimal hyperparameters. To prevent overfitting, performance was evaluated utilizing 10-fold cross-validation. The goal was to identify high-risk patients for SIRS following renal endoscopic lithotripsy and provide guidance for clinicians in delivering timely and effective interventions.
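The 10-fold cross-validation scheme mentioned above can be illustrated by generating the fold indices directly. The sketch below covers only the splitting step (hyperparameter tuning is not shown); the function name and seed are assumptions, and 583 is the size of the training set before balancing as reported in the Results.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


splits = list(k_fold_indices(583, k=10))
for train_idx, held_out_idx in splits[:2]:
    print(len(train_idx), len(held_out_idx))
```

Each sample is held out exactly once, so every candidate hyperparameter setting is scored on data that the corresponding fit did not see.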

Evaluation of model performance

Receiver Operating Characteristic (ROC) curves were generated for each of the five models, and the area under the curve (AUC) was utilized to assess their capacity to detect SIRS. Sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), accuracy, and F1 score (which considers both precision and recall) were also calculated for a comprehensive performance evaluation. The threshold for performance metrics was determined utilizing the Youden Index, with each training session calculating a threshold based on this index. The final threshold was selected based on the model’s optimal performance. Decision curve analysis (DCA) and calibration curves were used to assess the diagnostic capability of the final model, ensuring its robustness and reliability in predicting SIRS.
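The threshold-dependent metrics and the Youden Index (sensitivity + specificity − 1) can be computed as in the sketch below. The labels and scores are toy values chosen for illustration, not study data, and the function names are assumptions. The code assumes both classes are present.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float], threshold: float):
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, accuracy and F1 from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


def youden_threshold(y_true: Sequence[int], y_score: Sequence[float]) -> Tuple[float, float]:
    """Return the threshold maximising J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
best_t, best_j = youden_threshold(y_true, y_score)
result = metrics(*confusion_counts(y_true, y_score, best_t))
print(best_t, best_j, result)
```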

Visualization of predictors

Traditional ML models are often considered “black boxes” due to their lack of transparency in showing how features influence predictions, limiting clinical utility. The SHAP method was used to explain the best-performing model, highlighting feature importance and the relationship between features and outcomes. This approach enhances the transparency and interpretability of the model, mitigating the typical “black box” limitation.
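To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a tiny toy model. It is not the SHAP library and not the study's model: the feature names, the risk increments, and the `toy_risk` function are hypothetical, and practical tools use much faster algorithms than full enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence, Set


def shapley_values(
    value_fn: Callable[[Set[str]], float], features: Sequence[str]
) -> Dict[str, float]:
    """Exact Shapley value of each feature for a set-valued payoff function."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value_fn(coalition | {feature}) - value_fn(coalition))
        phi[feature] = total
    return phi


def toy_risk(present: Set[str]) -> float:
    """Hypothetical risk score: additive effects plus one interaction."""
    score = 0.0
    if "low_hu" in present:
        score += 0.30
    if "high_burden" in present:
        score += 0.20
    if "turbid_urine" in present:
        score += 0.05
    if "low_hu" in present and "high_burden" in present:
        score += 0.10  # interaction shared equally by the two features
    return score


features = ["low_hu", "high_burden", "turbid_urine"]
phi = shapley_values(toy_risk, features)
print(phi)
```

The attributions add up to the difference between the score with all features present and the score with none, which is the additivity property that makes the explanation consistent.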

Quantification and statistical analysis

Statistical analyses were performed utilizing SPSS 26.0. The Shapiro-Wilk test assessed distribution normality. For normally distributed data, t-tests were applied, while the Mann-Whitney U test was used for non-normal data. Categorical data were expressed as percentages, with differences assessed via chi-squared or Fisher’s exact tests. A two-tailed P-value < 0.05 was considered significant. Analyses were also conducted utilizing R version 4.2.2 and Python version 3.11.4.

Results

Study population

The study flowchart is presented in Fig. 1. A total of 833 eligible patients were included, of whom 126 (15.1%) developed postoperative SIRS and 61 (7.3%) developed sepsis. Among these, 101 cases of SIRS (12.1%) occurred after PCNL, while 25 cases (3%) occurred after RIRS. The HU of stones in the SIRS group was 895.88 ± 278.41, compared to 1106.04 ± 361.71 in the non-SIRS group, with a significant difference (P < 0.001) (Fig. 2a). After surgery, the stone composition was analyzed utilizing an infrared spectrometer, and a bar chart illustrating the percentage accumulation of stone components was generated for both the non-SIRS group and the SIRS group (Fig. 2b). There were 43 cases (34.13%) of infection stones in the SIRS group and 154 cases (21.78%) of infection stones in the non-SIRS group (P = 0.003) (Fig. 2c).

Fig. 2

Comparative analysis of stone characteristics between non-SIRS and SIRS groups. (a) The box plot compares the HU values of the non-SIRS group (red box) and the SIRS group (orange box). The median for the non-SIRS group is higher, above 1000 HU, while the SIRS group has a lower median, around 800 HU. The four asterisks (****) above the plot indicate a highly statistically significant difference between the groups (P < 0.001). (b) Percentage stacked histogram showing the composition of stone components in the non-SIRS and SIRS groups. (c) Proportion of infection stones in the non-SIRS and SIRS groups. The proportion of infection stones was significantly higher in the SIRS group (orange columns) than in the non-SIRS group (red columns). Two asterisks (**) in the graph indicate that the difference between the two groups is statistically significant (P = 0.025).

One-way analysis of variance and lasso regression

The 833 patient samples were randomly divided into a training set (70%) and a validation set (30%). The prevalence of SIRS in the overall sample (15.1%) did not differ significantly from that in the training set (14.9%) and the validation set (15.6%). As shown in Table S1, no significant differences (P > 0.05) were observed in any variables between the training and validation sets. Univariate analysis of the training set (Table 1) identified 15 factors associated with postoperative SIRS following renal endoscopic lithotripsy. These factors included diabetes (P = 0.044), staghorn stone (P < 0.001), number of stones (P = 0.002), stone burden (P < 0.001), CT value (HU) (P < 0.001), degree of hydronephrosis (P < 0.001), urinary protein (P < 0.001), urinary leukocyte esterase (ULE) (P = 0.021), urine turbidity (P < 0.001), urine culture (P < 0.001), urease-producing bacteria (P < 0.001), serum uric acid (P = 0.003), BUN (P = 0.002), surgical method (P < 0.001), and operation time (P < 0.001). C-reactive protein and procalcitonin were excluded due to excessive missing data.

Variables with a P-value of < 0.05 from the univariate analysis of the training set were subsequently incorporated into Lasso regression, which led to the selection of 10 key variables for model construction: operation time, urease-producing bacteria, urine turbidity, urine culture, urinary protein, degree of hydronephrosis, HU, BUN, serum uric acid, and stone burden (Fig. 3a,b).

Fig. 3

Use of Lasso regression analysis for variable selection. (a) Vertical lines were drawn over selected values using 10-fold cross-validation, where the best lambda produced 10 non-zero coefficients. (b) Coefficient profiles of the 15 candidate variables plotted against the log(λ) sequence. Vertical dashed lines are drawn at the minimum mean square error (λ = 0.011) and at one standard error from the minimum (λ = 0.054). At the optimal λ value of 0.011, the model selects 10 variables: operation time, urease-producing bacteria, urine turbidity, urine culture, urinary protein, degree of hydronephrosis, HU, BUN, serum uric acid, and stone burden.

Modeling and evaluation

Before balancing, the training set included 87 positive and 496 negative samples. After applying the SMOTE algorithm, the number of samples in the SIRS group increased to 496. The validation set contained 39 positive and 211 negative samples. Utilizing the aforementioned 10 selected variables and the oversampled training set, five models were developed: MLP, SVM, GNB, XGBoost, and LR.

For each model, Bayesian optimization was used to automatically adjust parameters, and 10-fold cross-validation was performed to evaluate performance based on various metrics. The performance evaluation results for each model in the training and validation sets are summarized in Table S2 and Table 2, respectively. These include the AUC, accuracy, specificity, sensitivity, PPV, NPV, and F1 score. The ROC curves for the different models are shown in Fig. 4a,b. The results indicate that XGBoost performed the best on the validation set with the following metrics: AUC: 0.858, sensitivity: 0.877, specificity: 0.981, accuracy: 0.841, PPV: 0.629, NPV: 0.851, and F1 score: 0.830. Overall, XGBoost and GNB outperformed the other three models. Considering both sensitivity and specificity, XGBoost was determined to be the optimal model. As a result, XGBoost was selected for further prediction and analysis.

Table 2 Diagnostic performance of different machine learning models for infected stones in the validation set.
Fig. 4

Comprehensive analysis of machine learning models in training and validation sets. (a) Training cohort ROC and AUC; (b) Validation cohort ROC and AUC. If the ROC curve of one model is completely covered by the ROC curve of another model, it can be concluded that the latter is better than the former, and the higher the AUC value, the better the model performance. The different colors in the image represent the corresponding models.

Feature importance and SHAP values

Since XGBoost was identified as the best model for predicting SIRS, SHAP (Shapley Additive Explanations) was used to interpret the relationship between the model’s features and outputs. Figure 5a shows the average absolute SHAP values for the 10 features, representing their importance in the model’s predictions. HU, urinary protein, stone burden, and serum uric acid have the greatest impact on the model, while blood urea nitrogen (BUN), urine turbidity, degree of hydronephrosis, and urine culture exert a smaller influence. Figure 5b provides further insights into the positive or negative impacts of these features on the XGBoost model. The vertical axis represents the features, while the horizontal axis shows the SHAP values. Each point on the feature lines corresponds to a sample, with red points showing higher feature values and blue points indicating lower values. Factors such as moderate stone HU, increased stone burden, positive urine protein, elevated serum uric acid, urease-producing bacterial infection, prolonged surgical time, higher BUN levels, turbid urine, worsened hydronephrosis, and positive urine culture are all associated with an increased likelihood of SIRS development.

Fig. 5

SHAP summary plot for the 10 clinical features contributing to the XGBoost model. (a) Ranking of feature importance by mean(|SHAP value|). (b) The distribution of the impact of a risk factor value on the model output. The contribution of each feature of each patient to the model corresponds to a dot. The dots are coloured according to the values of features. Red represents a higher feature value, and blue represents a lower feature value. SHAP value = SHapley Additive exPlanations value, BUN = Blood Urea Nitrogen.

Model evaluation

DCA demonstrated that the XGBoost model offers superior net clinical benefit in predicting postoperative SIRS (as shown in Fig. 6a). Utilizing the model for decision-making offers greater advantages than the “treat all” or “treat none” strategies, underscoring its clinical value in guiding decision-making. The Hosmer-Lemeshow test showed a well-fitting calibration curve (Fig. 6b), confirming the XGBoost model’s high accuracy and strong diagnostic performance in the cohort.

Fig. 6

XGBoost model comprehensive analysis. (a) DCA curve of the XGBoost model for the validation set. The black dotted line represents the assumption that all patients have SIRS, while the red dotted line indicates the assumption that no patients have SIRS. The remaining solid line corresponds to the XGBoost model. (b) Calibration curve of the XGBoost model for the validation set. The diagonal dashed line serves as a reference line, while the smooth solid line represents the fitted line of the XGBoost model. The closer the fitted line is to the reference line, the more accurate the model’s predictions are.
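The quantity plotted in a decision curve is the net benefit, which in its standard form is TP/n − (FP/n) × p_t/(1 − p_t) at threshold probability p_t. The sketch below computes it for a model and for the “treat all” reference; “treat none” is zero by definition. The labels and scores are toy values, not study data, and the function names are assumptions.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and not t)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit when every patient is treated as if SIRS will occur."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy cohort: 2 of 10 patients develop the outcome.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
nb_model = net_benefit(y_true, y_score, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is what the curve comparison in Fig. 6a expresses.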

Discussion

In recent years, ML algorithms have gained increasing traction in the medical field, aiding clinicians in diagnosing diseases more swiftly and accurately while enabling personalized treatment plans. This study applied five ML algorithms to develop a predictive model for SIRS following renal endoscopic lithotripsy, focusing on ten key predictors: operation time, urease-producing bacteria, urine turbidity, urine culture, urinary protein, degree of hydronephrosis, HU, BUN, serum uric acid, and stone burden. Among the tested models, XGBoost yielded the best performance for early identification of urinary sepsis, achieving an AUC of 0.858, sensitivity of 0.877, specificity of 0.981, accuracy of 0.841, PPV of 0.629, NPV of 0.851, and an F1 score of 0.830. DCA and calibration curves confirmed the model’s clinical feasibility in predicting SIRS complications after renal stone lithotripsy. Additionally, the SHAP method was employed to improve the model’s interpretability, identifying the importance of each feature in the predictions. SHAP analysis revealed that HU was the most influential feature in the model’s predictions. Interestingly, serum uric acid, though initially considered of lower interest, was found to have significant predictive value for SIRS. This predictive model offers a rapid and clinically relevant method for assessing the risk of postoperative urinary sepsis in patients with kidney stones.

PCNL and RIRS are the two main approaches to the endoscopic treatment of renal stones3. Both approaches offer distinct advantages and are widely adopted in clinical practice. In recent years, advancements in minimally invasive surgical techniques have significantly improved stone clearance rates while reducing complication incidences35. However, postoperative infections remain the most common complication and can progress to SIRS with even fatal consequences13,36,37. Infection stones are a significant independent risk factor for postoperative SIRS in renal stone cases38,39. The morbidity and mortality rates associated with postoperative sepsis are notably higher in patients with infection stones compared to those with stones of other compositions40,41. In the present study, 43 (34.13%) patients with infection stones were in the SIRS group, while 154 (21.78%) were in the non-SIRS group, showing a significant difference (P = 0.003; Fig. 2b,c). Urease-producing bacteria, including Proteus mirabilis, Klebsiella pneumoniae, and Pseudomonas aeruginosa, are closely linked to the formation of infection stones. These bacteria, present in higher concentrations in infection stones, release endotoxins that contribute to the development of infection stones40,42. During lithotripsy, these bacteria and endotoxins may enter the bloodstream due to hydrostatic pressure from renal perfusion or tissue damage, increasing the risk of systemic inflammatory reactions43. Moreover, infection stones can rapidly grow and form staghorn stones within 4–6 weeks, making infection stones the most common cause of staghorn stone formation40.

Moreover, previous studies have shown that patients with larger stone loads, longer operative times, and more severe urinary tract obstruction have a higher risk of postoperative SIRS25,44,45,46, consistent with our findings (Fig. 5b). Larger stone diameters are more likely to cause ureteral epithelial abrasion, leading to minor injuries that increase the risk of subsequent infections. These stones also contribute to urinary tract obstruction, preventing the effective expulsion of infected urine. This obstruction raises pressure in the renal pelvis and calyces, facilitating the entry of endotoxins into the systemic circulation and exacerbating systemic inflammatory responses. Furthermore, larger stones complicate the surgery, extending operative time and increasing the risk of iatrogenic injury and prolonged tissue exposure, both of which heighten the likelihood of infection from stone-associated and surrounding flora47,48.

Peng L et al.49 demonstrated that there is a correlation between stone composition and infection complications, with infectious stones being an important independent risk factor for urinary sepsis. However, since stone composition analysis is typically performed postoperatively, it does not contribute to infection prediction models. Although there is no gold standard for identifying infection stones preoperatively, they can be clinically recognized by the presence of urease-producing bacteria and reduced CT values27. In this study, the average stone HU in the SIRS group was 895.88 ± 278.41, while in the non-SIRS group, it was 1106.04 ± 361.71, with a significant difference (Fig. 2a). This difference correlates with a higher proportion of infectious stones in the SIRS group (Fig. 2b,c). Additionally, the SHAP method applied to the final model revealed that HU was the most influential factor in the model’s predictions (Fig. 5).

In addition to stone composition, several factors related to preoperative urinary tract infections are important predictors of postoperative SIRS49,50. These factors include positive urinary nitrite, positive urinary proteins, turbid urine, and positive mid-stream urine cultures. Preoperative urinary tract infections can significantly increase the risk of postoperative sepsis following lithotripsy51. Although a positive mid-stream urine culture has traditionally been considered a crucial factor in predicting postoperative SIRS and urosepsis52, our study found that only 61 patients (48.41%) in the SIRS group had positive mid-stream urine cultures. SHAP analysis further indicated that mid-stream urine culture positivity had a limited impact on predicting SIRS (Fig. 5). Mid-stream urine cultures may not accurately reflect the microbial state of the upper urinary tract, particularly when large or complex stones cause obstruction, preventing renal urine from reaching the bladder. Recent studies suggest that renal pelvic urine and stone cultures have higher positivity rates and may serve as more reliable predictors of postoperative urosepsis than mid-stream urine cultures40,53. However, due to prolonged culture times, the early predictive value of intraoperative renal pelvic urine and stone cultures is limited, similar to stone composition analysis. As a result, preoperative mid-stream urine culture remains the most effective early predictor.

Our study also revealed that elevated preoperative serum uric acid levels are important predictors of postoperative SIRS (Fig. 5). Although no consensus exists on the correlation between serum uric acid levels and sepsis54, some studies suggest a positive association between elevated serum uric acid levels and the occurrence and severity of sepsis55. Serum uric acid may contribute to inflammatory responses, oxidative stress, and immune regulation. Inflammation triggers increased intracellular purine release, which stimulates uric acid production and release56. Oxidative stress may also disrupt purine metabolism, further elevating uric acid levels57. Additionally, immune dysfunction in patients with gout may increase susceptibility to infections58.

Our study has several strengths. First, we applied SMOTE for oversampling to address bias from unbalanced data and prevent overfitting. Second, we developed five classical ML models and identified XGBoost as the most effective. Third, we utilized the SHAP method to interpret the relationship between input features and the XGBoost model’s output, improving clinicians’ intuitive understanding and supporting the personalization of preventive treatments, thereby improving the model’s clinical applicability.

However, the study has some limitations. As a retrospective analysis conducted at a single institution, it lacks external validation from multiple centers, which limits the generalizability of the model to other populations. Additionally, newer biomarkers such as procalcitonin and C-reactive protein were not included due to medical constraints. Some procedural risk factors may also have been overlooked, potentially influencing the results. The retrospective design introduces potential biases, including selection and information biases, which should be considered in the interpretation of the findings. Although the model demonstrated strong fit and discrimination, external validation is still required. Further large-scale, prospective, multicenter studies are needed to optimize and validate the model.

Conclusions

An ML model was developed to predict postoperative SIRS after endoscopic kidney stone surgery utilizing CT values. The model demonstrates promising predictive performance and may assist in assessing urosepsis risk in postoperative patients. The SHAP method revealed HU and serum uric acid as key predictive features. The link between HU, hyperuricemia, and urosepsis opens new research avenues and potential therapeutic strategies. However, external validation through further studies is necessary to confirm these findings and provide a foundation for developing new therapeutic approaches.