Introduction

Gastrointestinal bleeding (GIB) is a severe and potentially life-threatening condition, responsible for over half a million hospitalizations each year in the United States1. GIB is classified as either upper or lower, based on its anatomical location relative to the ligament of Treitz. Upper GIB accounts for 61 to 78 cases per 100,000 annual hospital admissions and is associated with a mortality risk of 2 to 10%2,3,4,5. Lower GIB accounts for 33 to 87 cases per 100,000 annual hospital admissions and carries a mortality risk of 2 to 4%6,7. Risk stratification is strongly recommended in the management of GIB. It not only assists clinicians in making informed clinical decisions and improving patient prognosis but also supports patient and family counseling, explanations, and resource allocation5. Many risk assessment scores have been constructed for mortality prediction in patients with upper GIB, including the Glasgow Blatchford score (GBS), Rockall score, and AIMS65 score8. However, there is currently no widely utilized scoring system for predicting adverse outcomes in lower GIB9.

Additionally, differentiating between upper and lower GIB solely on the basis of clinical symptoms is not consistently accurate. For example, some patients with lower GIB present with melena (black, tarry stools). Conversely, brisk upper GIB can lead to hematochezia. For patients admitted to the intensive care unit (ICU) with GIB, it can be challenging to determine the location of the bleeding before imaging and endoscopic examination. Consequently, clinicians may encounter situations in which no existing risk stratification score is applicable. Therefore, there is a pressing need for a risk stratification model that applies to all patients with GIB, regardless of whether the bleeding is from the upper or lower gastrointestinal tract. A nomogram is a visual statistical model widely used in medical research for assessing prognosis. The aim of this study was to create and validate a dynamic nomogram for predicting in-hospital mortality in patients admitted to the ICU due to GIB.

Methods

Data source and outcome

This retrospective cohort study retrieved critical care data from two databases: the Electronic Intensive Care Unit Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care-IV database (MIMIC-IV). The eICU-CRD (version 2.0) is a multicenter database encompassing more than 200,000 ICU admissions from 208 hospitals in the United States between 2014 and 201510. The MIMIC-IV (version 2.2) is a comprehensive single-center database that includes data from more than 70,000 ICU patients admitted to the Beth Israel Deaconess Medical Center between 2008 and 201911. Both databases are deidentified patient databases and offer a wealth of patient information, including demographics, bedside monitoring data, diagnoses coded with International Classification of Diseases version 9 and 10 (ICD-9 and ICD-10), laboratory results, treatment information, radiology reports, clinical outcomes, and more. After completing an extensive training course titled 'Protecting Human Research Participants,' the author (Kang Zou) received permission to access the two databases for research purposes (certification number: 11739201). The MIMIC database has received ethical approval from the Institutional Review Boards of both Beth Israel Deaconess Medical Center (Boston, MA, USA) and the Massachusetts Institute of Technology (Cambridge, MA, USA). For the eICU-CRD, the security schema, which was independently evaluated by a privacy expert (Privacert, Cambridge, MA), was certified as meeting the reidentification risk safe harbor standards (Health Insurance Portability and Accountability Act Certification No. 1031219-2). Hence, this study may qualify for exemption from institutional review board approval owing to its retrospective design, absence of direct patient intervention, and utilization of deidentified data. The primary outcome of this study was all-cause in-hospital mortality among patients with GIB admitted to the ICU.

Study patients

All patients diagnosed with “GI bleeding” from the eICU-CRD were included in the study. For the MIMIC-IV, patients were identified through a manual search of ICD-9 (578.9) and ICD-10 (K92.2) codes. Exclusion criteria were as follows: (1) not admitted to the ICU, (2) not the first ICU stay, (3) missing primary outcome data, (4) missing demographic information, (5) lack of comorbidity details, (6) vital sign information being unavailable, (7) missing medication information, and (8) missing elements of the AIMS65 score. The patient selection process is visually depicted in Fig. 1.

Fig. 1

Flow chart of patient selection.

Data collection and variable extraction

For this study, we collected demographics, vital signs, laboratory test results, treatment information, comorbidities, and scoring systems. Data collection followed the methodology of Cosgriff12 with a minor adaptation. Variables were excluded based on two criteria: (1) missing data exceeding 30% and (2) not being available in both databases. On this basis, the following variables were excluded: height, temperature, amylase, lipase, creatine kinase, lactate dehydrogenase, ionized calcium, fibrinogen, lactate, lymphocytes, neutrophils, fraction of inspired oxygen, partial pressure of oxygen, partial pressure of carbon dioxide, and pH. The following demographic information at admission was extracted: age, sex, and race. The mean values of vital signs in the first 24 h were computed for heart rate, respiratory rate (RR), systolic blood pressure (SBP), diastolic blood pressure, and oxyhemoglobin saturation (Spo2). The maximum values recorded within 24 h of admission were used for creatinine, blood urea nitrogen, total bilirubin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), international normalized ratio, prothrombin time, partial thromboplastin time (PTT), and anion gap. The minimum values within the first 24 h were extracted for bicarbonate, albumin, platelets, hemoglobin, and hematocrit. For white blood cell count, both the minimum and maximum values within 24 h after admission were extracted. Medication information encompassed the use of antiarrhythmic drugs, antibiotics, blood products, diuretics, sedatives, and vasopressors within the first 24 h.
Comorbidities were represented as binary variables, indicating the presence or absence of conditions, such as liver cirrhosis, atrial fibrillation, chronic kidney disease, coronary artery disease, chronic obstructive pulmonary disease, hypertension, and diabetes. Furthermore, score systems including the Sequential Organ Failure Assessment (SOFA), Glasgow Coma Scale (GCS), and AIMS65 scores, along with the hospital expire flag indicating in-hospital death, were also included. To mitigate the impact of missing data on the results, multiple imputation was employed to handle missing values.
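The two variable-exclusion criteria described above can be sketched in a few lines of Python. This is only an illustration, not the authors' extraction code: the function names, the toy data, and the choice to apply the 30% missingness limit to each database separately are assumptions, since the text does not state how missingness was pooled.

```python
def missing_fraction(values):
    """Fraction of entries that are None (treated here as missing)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def screen_variables(db_a, db_b, max_missing=0.30):
    """Keep variables present in both databases whose missing fraction
    does not exceed max_missing in either one (an assumed reading of
    the 30% rule).

    db_a, db_b: dict mapping variable name -> list of values (None = missing).
    """
    shared = set(db_a) & set(db_b)
    return [
        name for name in sorted(shared)
        if missing_fraction(db_a[name]) <= max_missing
        and missing_fraction(db_b[name]) <= max_missing
    ]


# Toy data, invented for illustration only.
eicu = {
    "albumin": [25, 31, None, 28, 30],         # 20% missing
    "lactate": [None, None, 2.1, None, 1.4],   # 60% missing
    "eicu_only_var": [1, 2, 3, 4, 5],          # absent from the other database
}
mimic = {
    "albumin": [27, 33, 29, 30, 26],
    "lactate": [1.8, None, None, 2.5, 3.0],
}
selected = screen_variables(eicu, mimic)
print(selected)
```

In this toy example only the variable that is shared and sufficiently complete in both sources survives the screen.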

Statistical analysis

Continuous variables with a normal distribution were presented as the mean ± standard deviation (\(\overline{x} \pm s\)), and groups were compared using Student’s t test. Variables with a nonnormal distribution were presented as the median and interquartile range [M (Q1, Q3)], and groups were compared using the Mann‒Whitney U rank-sum test. Categorical variables were presented as counts and percentages [n (%)], and group comparisons were conducted using either the χ2 test or Fisher’s exact test. All patients from the eICU-CRD were randomly allocated to either the development or validation group at a 7:3 ratio. In the development group, least absolute shrinkage and selection operator (LASSO) regression was employed to select the best predictors. Subsequently, based on the selected variables, multivariate logistic regression was used to establish the nomogram in the development group, followed by validation in the validation group. Additionally, external validation was performed using data from the MIMIC-IV database. The receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), sensitivity, specificity, and the Youden index were used to evaluate the discrimination of the predictive model. The whole cohort was classified into low-risk and high-risk groups according to the cutoff value of the nomogram prediction scores. The DeLong test was used to compare the AUC of the nomogram with those of the scoring systems. Additionally, we evaluated the nomogram’s calibration using calibration curves and assessed the predictive model’s clinical utility through decision curve analysis (DCA). All statistical analyses were performed using R software (version 3.6.3), and statistical significance was defined as p < 0.05.
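As a rough illustration of two of the discrimination measures named above, the following Python sketch computes the AUC in its Mann‒Whitney form and finds the cutoff that maximizes the Youden index (sensitivity + specificity − 1). The study itself used R; the function names and the toy risks and outcomes below are hypothetical and are not taken from the study data.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score,
    predicting 'positive' when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Point on the ROC curve maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda t: t[1] + t[2] - 1)


# Toy predicted risks and outcomes (1 = died), invented for illustration.
risk = [0.05, 0.10, 0.20, 0.35, 0.40, 0.70, 0.80, 0.90]
died = [0, 0, 1, 0, 0, 1, 1, 1]
print(auc(risk, died))
print(youden_cutoff(risk, died))
```

The quadratic pairwise AUC is chosen here for readability; for cohorts of several thousand patients a rank-based computation would be the more practical choice.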

Results

Baseline characteristics from the eICU-CRD

As presented in Table 1, a total of 2929 patients from the eICU-CRD were included in this study. Within this cohort, 191 patients died, while the remaining 2738 patients survived their hospital stay. Consequently, the overall in-hospital mortality rate was approximately 6.5%. The average age of the total cohort was 68.0 years, and 58.5% of patients were male. Among the patients, 43.6% were admitted to the ICU with a primary or major diagnosis of upper GIB, 30.1% were admitted due to lower GIB, and 26.3% had an unspecified bleeding source. The median length of ICU stay was 43.0 h. Under the exclusion criteria above, 1951 patients were excluded from the analysis due to missing components of the AIMS65 score. Because a substantial number of patients were excluded, we also compared the characteristics of the included and excluded patients in Supplementary Table S1.

Table 1 Summary of demographic and clinical features of eICU patients with gastrointestinal bleeding.

Predictive nomogram development

The development group consisted of 2050 patients, while the validation group comprised 879 patients. LASSO regression was employed to identify significant variables from 43 candidate variables. The analysis revealed that the most influential predictors of in-hospital mortality in GIB patients were SBP, albumin, Spo2, RR, vasopressor use, bicarbonate, ALT, AST, ALP, PTT, and GCS score, as illustrated in Supplementary Fig. S1.

Based on the above variables, a predictive model represented by a nomogram was constructed through multivariate logistic regression (Fig. 2a). The results of the multivariate logistic regression analysis identified 8 independent predictors of in-hospital mortality in GIB patients, as outlined in Table 2.

Fig. 2

Nomogram to estimate the risk of in-hospital mortality in patients with GI bleeding. (a) Ordinary nomogram. (b) Schematic outline of the dynamic nomogram. To obtain the probability of in-hospital mortality, select the value of each variable and then click the “Predict” button. The probability and its 95% confidence interval will then be displayed. For the site to function properly, please click the “Quit” button after finishing the prediction. HR, heart rate; RR, respiratory rate; SBP, systolic blood pressure; Spo2, oxyhemoglobin saturation; ALP, alkaline phosphatase; PTT, partial thromboplastin time; GCS, Glasgow coma scale.

Table 2 Risk factors selected after multivariate logistic regression analysis.

To estimate an individual patient’s risk of in-hospital mortality, first locate the patient’s value on the axis for each variable. Then draw a vertical line from that value to the ‘Points’ axis to determine the points for that variable. Sum the points across all eight variables to obtain the total points. Finally, draw a vertical line from the ‘Total Points’ axis to the ‘Probability of in-hospital mortality’ axis to read off the patient’s probability of in-hospital mortality. For precise calculation, the dynamic nomogram is available at https://kangzou.shinyapps.io/DynNomapp_GIB/. For example, a patient with GIB admitted to the ICU had mean values of 91 mmHg for SBP, 24 breaths/min for RR, and 95% for Spo2 in the first 24 h. The maximum values within 24 h were 30 s for PTT, 71 Units/L for ALP, and 14 points for GCS score. The minimum values for albumin and bicarbonate in the first 24 h were 25 g/L and 8 mmol/L, respectively. With these values, the dynamic nomogram gave a probability of in-hospital mortality of 0.399 (95% confidence interval 0.268–0.546), as shown in Fig. 2b.
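The reported interval for the worked example is not symmetric around 0.399 on the probability scale, which is what one would expect if it were built on the log-odds scale and then back-transformed. The sketch below checks this under the assumption, not stated in the text, that the dynamic nomogram reports a Wald-type interval on the logit scale; the implied standard error is derived from the reported numbers and is not a value given by the authors.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Values reported for the worked example in the text.
p_hat, ci_low, ci_high = 0.399, 0.268, 0.546

# If the interval is a Wald interval on the log-odds scale (an assumption),
# its endpoints should be symmetric around logit(p_hat).
z = 1.959964  # 97.5th percentile of the standard normal distribution
midpoint = (logit(ci_low) + logit(ci_high)) / 2.0
implied_se = (logit(ci_high) - logit(ci_low)) / (2.0 * z)

# Rebuild the interval from the point estimate and the implied standard error.
rebuilt_low = expit(logit(p_hat) - z * implied_se)
rebuilt_high = expit(logit(p_hat) + z * implied_se)

print(round(logit(p_hat), 3), round(midpoint, 3), round(implied_se, 3))
print(round(rebuilt_low, 3), round(rebuilt_high, 3))
```

If the assumption holds, the midpoint of the logit-transformed bounds should sit very close to the logit of the point estimate, and the rebuilt bounds should match the reported ones to rounding precision.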

Evaluation and validation of the nomogram

The ROC curves and AUCs for the nomogram and score systems were calculated and compared in both the development and validation groups (Fig. 3a and b). Table 3 displayed the comparison of the nomogram model and score systems in the development and validation sets. In the development group, the AUCs for the nomogram, SOFA, and AIMS65 scores were 0.893, 0.828, and 0.794, respectively. The discrimination ability of the nomogram was significantly superior to both the SOFA score (p < 0.001) and the AIMS65 score (p < 0.001). In the validation group, the AUCs were 0.860, 0.842, and 0.743, respectively. Compared to the AIMS65 score, our nomogram demonstrated better discrimination (p < 0.001). However, there was no significant difference between the SOFA score and the nomogram (p = 0.459). In patients with upper GIB, the AUCs of the nomogram, SOFA, and AIMS65 scores were 0.831, 0.858, and 0.725, respectively. The nomogram outperformed the AIMS65 score (p = 0.045) but did not differ significantly from the SOFA score (p = 0.575). In patients with lower GIB, the AUCs of the nomogram, SOFA, and AIMS65 scores were 0.866, 0.851, and 0.763, respectively. The nomogram again exhibited better discrimination than the AIMS65 score (p = 0.046) but did not differ significantly from the SOFA score (p = 0.741).

Fig. 3

The discrimination, calibration, and clinical value of the nomogram. (a and b) Receiver operating characteristic curves of the nomogram and score systems: (a) development group; (b) validation group. (c and d) Calibration curves of the nomogram and score systems: (c) development group; (d) validation group. (e and f) Decision curve analysis (DCA) of the nomogram and score systems: (e) development group; (f) validation group. Model 1, nomogram; model 2, SOFA score; model 3, AIMS65 score.

Table 3 Comparison of the nomogram model and score systems for predicting the in-hospital mortality of gastrointestinal bleeding in the development, internal validation and external validation sets.

The calibration curves were depicted in Fig. 3c and d. They showed close agreement between the predicted and actual probabilities for the nomogram in both the development and validation groups, surpassing the SOFA and AIMS65 scores.

Figure 3e and f displayed the DCA curve, indicating that the nomogram, SOFA score, and AIMS65 score offered greater benefits compared to the strategies of treating all patients or treating none. Furthermore, in comparison to the SOFA and AIMS65 scores, our nomogram exhibited a significantly higher net benefit, underscoring its superior clinical utility.
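For readers unfamiliar with DCA, the net benefit plotted on a decision curve is, at a threshold probability pt, the true-positive rate minus the false-positive rate weighted by the odds pt/(1 − pt). The Python sketch below shows this standard formula together with the "treat all" reference strategy; "treat none" has a net benefit of zero by definition. The cohort size and number of deaths come from the eICU-CRD cohort reported above, while the threshold and the model's true-positive and false-positive counts are hypothetical.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a decision rule at a given threshold probability.

    tp, fp: true and false positives among n patients when the rule
    flags patients whose predicted risk is at or above the threshold.
    """
    weight = threshold / (1.0 - threshold)  # odds of the threshold probability
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(events, n, threshold):
    """Reference strategy: flag every patient as high risk."""
    return net_benefit(tp=events, fp=n - events, n=n, threshold=threshold)


# Cohort size and deaths as reported for the eICU-CRD cohort; the threshold
# and the model's TP/FP counts are hypothetical illustration values.
n, events, pt = 2929, 191, 0.10
nb_all = net_benefit_treat_all(events, n, pt)
nb_model = net_benefit(tp=120, fp=300, n=n, threshold=pt)
print(nb_all, nb_model)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero.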

To assess the model’s generalizability, we conducted external validation using data from the MIMIC-IV database. This external dataset comprised 718 patients, of whom 182 died and 536 survived during hospitalization. The overall in-hospital mortality rate was approximately 25.3%. The clinical features of the patients from the MIMIC-IV database were displayed in Supplementary Table S2. In Fig. 4a, the ROC curve for the external cohort was displayed. In this cohort, the AUC, sensitivity, and specificity of the nomogram were 0.781, 0.665, and 0.765, respectively (Table 3). Figure 4b presented the calibration curve, demonstrating a strong concordance between the predicted probability of the nomogram and the actual probability in the external cohort.

Fig. 4

The discrimination and calibration in the external validation set based on data from the Medical Information Mart for Intensive Care-IV database. (a) Receiver operating characteristic curve. (b) Calibration curve.

The best cutoff value of nomogram scores was 101.974, calculated from the ROC of the development group. Patients were classified into low-risk (nomogram scores ≤ 101.974) and high-risk groups (nomogram scores > 101.974) according to the nomogram scores (Table 4). In the whole internal validation group, the nomogram classified 82.7% (727/879) of patients as low risk with a mortality of 2.8% (20/727). On the other hand, 17.3% (152/879) of patients were classified as high risk, with a mortality rate of 25.7% (39/152). Furthermore, the nomogram demonstrated its ability to identify patients at very low risk during hospitalization, encompassing 82.9% (340/410) of patients with upper GIB and 85.5% (212/248) of patients with lower GIB, both having a mere 2.4% risk of death. Conversely, the nomogram effectively identified patients at high risk during hospitalization, encompassing 17.1% (70/410) of patients with upper GIB and 14.5% (36/248) of patients with lower GIB, with considerably higher mortality rates of 18.6% and 27.8%, respectively.
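The stratification rule and the percentages quoted above can be reproduced with a short Python sketch. The cutoff and the patient counts are those reported in the text; the function names and the example scores passed to `risk_group` are hypothetical.

```python
CUTOFF = 101.974  # nomogram score cutoff reported in the text


def risk_group(score, cutoff=CUTOFF):
    """Low risk if score <= cutoff, otherwise high risk."""
    return "low" if score <= cutoff else "high"


def summarize(n_group, deaths_group, n_total):
    """Share of the cohort in a group and mortality within it, in percent."""
    share = round(100.0 * n_group / n_total, 1)
    mortality = round(100.0 * deaths_group / n_group, 1)
    return share, mortality


# Counts reported for the internal validation group (n = 879).
low = summarize(n_group=727, deaths_group=20, n_total=879)
high = summarize(n_group=152, deaths_group=39, n_total=879)

# Hypothetical example scores, including one exactly at the cutoff.
print(risk_group(95.0), risk_group(101.974), risk_group(120.5))
print(low, high)
```

Note that a score exactly equal to the cutoff falls in the low-risk group, matching the "≤ 101.974" definition.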

Table 4 Outcome in classified low-risk and high-risk patients according to cutoff value of nomogram scores.

Discussion

GIB resolves spontaneously in as many as 80% of patients, while approximately 15% of patients who experience persistent bleeding will need medical intervention13. Patients with severe GIB need intensive care and advanced life support. Currently, no widely used predictive model exists for assessing the risk of mortality in ICU patients with GIB regardless of the location of the bleeding. Early risk stratification enables timely interventions in high-risk patients and more efficient allocation of resources. This study developed a dynamic nomogram using data from the eICU-CRD. Through multivariate regression analyses, 8 predictors (SBP, Spo2, RR, albumin, bicarbonate, ALP, PTT, and GCS score) were identified as independent prognostic factors for in-hospital mortality in patients with GIB. The nomogram exhibited excellent predictive performance for in-hospital mortality in both upper and lower GIB cases. External validation yielded similarly promising results. Furthermore, when compared to existing scoring systems (SOFA and AIMS65 scores), our nomogram also showed favorable predictive performance.

While most cases of lower GIB tend to resolve on their own and can be treated conservatively, it is essential to note that significant lower GIB can carry an in-hospital mortality rate of up to 8.8%, and approximately 18% of these cases may require surgical intervention14. In contrast, upper GIB results in nearly six times as many hospitalizations as lower GIB and is linked to an increased risk of morbidity and mortality15. Irrespective of the bleeding source, patients with GIB require timely assessment, resuscitation if they are unstable, and prompt intervention if they are at high risk of decompensation.

For patients with upper GIB, various pre-endoscopy and post-endoscopy risk scoring systems, including the GBS, Rockall score, and AIMS65 score, are recommended by most clinical guidelines to predict outcomes such as the need for hospital-based interventions and endoscopic interventions16,17,18. Notably, the GBS is particularly advantageous in identifying low-risk patients with upper GIB, potentially reducing hospital admissions by 15 to 20% and facilitating outpatient management19. However, the efficacy of these scores in predicting mortality risk is still controversial, with several studies20,21 reporting AUC values not exceeding 0.8, thereby indicating limited clinical utility. In contrast to upper GIB, there remains a dearth of risk assessment tools for patients with lower GIB, with few recommendations from clinical guidelines. Scores like the GBS, Rockall, and AIMS65, which are popular in upper GIB research, lack robust evidence supporting their use in lower GIB risk stratification. The Oakland score, developed and validated through a large multicenter study22, has shown promise for safely managing lower GIB patients on an outpatient basis. However, that study also found that the Oakland, Rockall, Blatchford, Strate, BLEED, AIMS65, and NOBLADS scores have poor predictive ability for mortality in lower GIB patients, with all AUC values below 0.8. The updated American College of Gastroenterology guidelines for acute lower GIB only recommend using the Oakland and SHA2PE scores to identify low-risk patients suitable for early discharge and outpatient evaluation9. Nevertheless, these recommendations still require validation through large multicenter studies to ensure their safety in emergency department and ICU settings. Currently, there remains a lack of reliable clinical scoring systems for predicting in-hospital mortality for patients with GIB, particularly for those with lower GIB.

Clinical presentation alone is insufficient for determining the location of GIB, particularly in ICU patients. In the present study, the bleeding site could not be determined in 26.3% of patients with GIB, leaving clinicians without reliable means to assess in-hospital mortality risk for this population. The GBS and Oakland scores appear to be promising options, but few studies have verified their performance in this specific ICU population. Therefore, developing a mortality risk prediction score applicable to patients with GIB of any source would be highly beneficial. The UK National Confidential Enquiry into Patient Outcome and Death report on gastrointestinal bleeding also explicitly recommended the creation of a comprehensive risk score that could be universally utilized in the evaluation and management of all patients presenting with GIB from any source23.

In this study, we compared the nomogram with a general ICU score (SOFA) and an upper GIB-specific score (AIMS65). The AIMS65, a simple risk score containing five predictors, can easily predict the in-hospital mortality of patients with upper GIB. The AUCs of the AIMS65 score were all below 0.80 in both upper and lower GIB in the present study, which is lower than in some studies24,25. However, most of the previous studies had small samples, and we speculate that the AIMS65 score may not be suitable for predicting the mortality of patients admitted to the ICU. The SOFA score is a valuable tool for monitoring the progression of organ dysfunction or failure over time. It proves particularly beneficial in assessing the risk of in-hospital mortality among patients admitted to the ICU. Ferreira et al.26 used the highest SOFA score during ICU stay to predict mortality, which yielded an AUC of 0.90. In the present study, the SOFA score also showed a favorable predictive value in both upper and lower GIB, with no significant difference from our nomogram. This suggests that general ICU scores, such as SOFA, continue to play an indispensable role in the risk stratification of patients with GIB.

Recently, Laursen et al.27 developed an age, blood tests, and comorbidities (ABC) score that can also predict the mortality of patients with GIB from any source. However, we did not calculate the ABC score in this study, as we could not extract some comorbidity information from the databases, such as the American Society of Anesthesiologists Physical Status scoring system. In the low-risk group, our nomogram identified 82.9% of patients with upper GIB and 85.5% with lower GIB, with a mortality of 2.4%. The ABC score constructed by Laursen et al. identified 40% of patients with upper GIB and 71% with lower GIB with a mortality of ≤ 1%. Compared to the ABC score, patients classified as low risk using the nomogram seemed to have a higher mortality. However, the cohort in the present study consisted of patients admitted to the ICU, and the overall mortality was higher than that in the general ward. The mortality was even higher in the external validation group, as the overall mortality of GIB in MIMIC-IV was as high as 25.3%. In addition, the proportion of low-risk patients was also higher than with the ABC score, as we did not define an intermediate-risk group. In the high-risk group, our nomogram identified 17.1% of patients with upper GIB and 14.5% with lower GIB, with mortality rates of 18.6% and 27.8%, respectively. The ABC score identified 15% of patients with upper GIB and 3% with lower GIB with a mortality of 18–25%. The mortality in high-risk patients identified by the nomogram and ABC score seemed similar. However, the proportion of lower GIB patients classified as high risk was higher with the nomogram than with the ABC score. We speculate that although most lower GIB cases are self-limiting, the situation is different in patients admitted to the ICU. Among most scoring systems, GBS is the only one validated by multiple studies to identify low-risk patients with upper GIB. A GBS cut-off value of ≤ 1 or 2 corresponds to a mortality rate of less than 1% in the low-risk group22.
In our study, among the 340 patients identified by nomogram as low-risk upper GIB, 8 patients still died. Hence, we believe that GBS remains indispensable for the outpatient management of very low-risk patients. A major limitation of our study is that we did not compare the GBS score with our nomogram due to the lack of GBS scoring components in the two databases. For example, we were unable to extract from the database whether the patients exhibited clinical manifestations such as melena or syncope. Future research could aim to compare the predictive performance of the nomogram and the GBS score in a prospective study, which would provide valuable insights.

To our knowledge, this is the first nomogram that focuses on predicting in-hospital mortality in ICU patients with both upper and lower GIB. Furthermore, this nomogram serves as a preendoscopy predictive model, utilizing variables that are readily available in the first 24 h after ICU admission. Building upon the foundation of the traditional ordinary nomogram, we developed a dynamic nomogram designed to enhance swift and precise risk stratification. Nevertheless, there were several limitations in this study. First, selection bias was inherent in this retrospective analysis. Prospective studies are warranted to enhance the quality of evidence. Second, the variables of the nomogram were extracted from the two databases, and some features, such as melena, previous gastrointestinal bleeding, and endoscopic examination results, were lacking. Therefore, we cannot further calculate some existing scores, including the GBS, Oakland, and ABC scores. Last, data was obtained from the patient’s medical records, and the nomogram’s performance was related to the accuracy of the documentation.

Conclusions

In conclusion, we developed a dynamic nomogram and conducted both internal and external validation using data from the eICU-CRD and MIMIC-IV. The link to the dynamic nomogram is https://kangzou.shinyapps.io/DynNomapp_GIB/. The use of our nomogram can assist clinicians in the early identification of patients at high risk of mortality, allowing for close monitoring and personalized management. Additionally, it provides valuable prognostic information for both patients and their relatives. However, further prospective studies are still essential to expedite the clinical application of such predictive models.