Fig. 5: Analysis of the 15 most important features in the development cohort.

a Distributions of the features across the subphenotypes. Bar charts indicate relative frequency for binary/ordinal features, while box plots are shown for continuous features. In all box plots, the central bar in each box represents the median of the respective category, the bounds of each box span the interquartile range (IQR), whiskers extend 1.5 × IQR from each box, and the dots are outliers. The numbers of samples used to derive the statistics are shown in the figure. b Impact of the features on predicting subphenotype membership, using SHapley Additive exPlanations (SHAP). Individual feature values for each sample are colored according to their relative magnitude, with blue representing lower values and red representing higher values. The features are ranked by mean absolute SHAP value. Positive SHAP values (> 0) indicate an increased likelihood of subphenotype membership. ECOG Eastern Cooperative Oncology Group, NLR neutrophil-to-lymphocyte ratio, NMLR neutrophil-and-monocyte-to-lymphocyte ratio, WBC white blood cell. Source data are provided with this paper.
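
The two summary conventions named in the caption can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the toy numbers, and the choice of NumPy are assumptions made for illustration, and the whiskers are assumed to follow the common Tukey convention of stopping at the most extreme observation within 1.5 × IQR of the box.

```python
import numpy as np


def boxplot_summary(values):
    """Median, IQR bounds, whisker ends, and outliers for one group.

    Assumes Tukey-style whiskers: they stop at the most extreme
    observation lying within 1.5 * IQR of the box edges.
    """
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=15):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: array of shape (n_samples, n_features) for one
    subphenotype (hypothetical input; any SHAP explainer could produce it).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected shape (n_samples, n_features)")
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")[:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


if __name__ == "__main__":
    # Toy numbers, invented purely to exercise the functions.
    toy_feature = [1, 2, 3, 4, 100]
    toy_shap = [[0.5, -0.1, 0.0], [-0.3, 0.2, 0.0]]
    toy_names = ["NLR", "WBC", "ECOG"]
    print(boxplot_summary(toy_feature))
    print(rank_features_by_mean_abs_shap(toy_shap, toy_names))
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so a feature that strongly pushes some samples toward a subphenotype and others away from it still ranks as important.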