Extended Data Fig. 4: Additional feature and model analysis.

(a) Prediction performance as a function of the number of SNVs. Average precision of the multivariable regression model for increasing numbers of common (upper) and rare (lower) variants. For each major irAE label, the irAE and control samples were split into training and validation sets at an 8:2 ratio. MSK, Musculoskeletal; GI, Gastrointestinal; Multiple, Multiple (any grade); Multiple2, Multiple (grade ≥ 2) (N = 564 patients). (b) Correlation matrix among the 12 major types of irAE based on regression on the 859 selected features included in the prediction models. The color scale of the cells corresponds to the Pearson correlation coefficient. Statistically significant correlations (nominal two-sided P < 0.05) are indicated in color. (c) Comparison of the performance of the DNN model with that of the support vector machine (upper) and gradient boosting decision trees implemented by XGBoost (lower) in terms of ROC-AUC. (d) ROC curves for external validation of the DNN model, support vector machine, and gradient boosting decision trees implemented by XGBoost (N = 169 patients).