Fig. 6: Patients are divided into high- and low-risk groups by applying thresholds to the stacking model's output score (green curve: low-risk group; red curve: high-risk group).

The p values for both the training set (a) and the validation set (b) are <0.05, indicating a statistically significant difference between the high- and low-risk groups in each set.
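
The caption does not say which test produced these p values. As a minimal sketch, assuming the curves are Kaplan-Meier survival curves compared with a two-group log-rank test, the stratification and comparison could look like the following. The threshold of 0.5, the function names, and the toy data are all hypothetical and are not taken from the study.

```python
import math


def split_by_threshold(scores, threshold):
    """Flag each patient as high-risk (True) if the model score exceeds the threshold."""
    return [s > threshold for s in scores]


def logrank_test(times, events, high_risk):
    """Two-group log-rank test. Returns (chi-square statistic, p value).

    times     : follow-up time per patient
    events    : 1 if the event was observed, 0 if censored
    high_risk : True for the high-risk group, False for the low-risk group
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0  # observed minus expected events in the high-risk group
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # patients at risk, both groups
        n1 = sum(1 for ti, g in zip(times, high_risk) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(
            1 for ti, e, g in zip(times, events, high_risk) if ti == t and e and g
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data for illustration only (not from the study).
scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10

high_risk = split_by_threshold(scores, threshold=0.5)  # hypothetical cutoff
chi2, p_value = logrank_test(times, events, high_risk)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

In practice the same comparison would be run once on the training set and once on the validation set, using the threshold chosen on the training set.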