Fig. 1: IRENE.

a, Comparison of the previous non-unified multimodal diagnosis paradigm with IRENE. Traditional non-unified approaches require a tedious text-structuralization process, separate paths for modality-specific feature extraction and a dedicated multimodal feature-fusion module. IRENE eliminates all three and instead performs multimodal diagnosis with a single unified transformer. b, Scheme for splitting the original dataset into training, validation and testing sets for pulmonary disease identification (left) and prediction of adverse clinical outcomes of COVID-19 (right). c,d, Comparison of the experimental results of image-only models, non-unified early-fusion methods, a multimodal transformer (that is, Perceiver) and IRENE on the two tasks in b. The mean performance of IRENE and the multimodal transformer was compared using an independent two-sample t-test (two-sided). Specifically, each experiment was repeated ten times with different random seeds, and P values were then calculated. e, Comparison of IRENE with junior (<7 yr of experience, n = 2) and senior (>7 yr of experience, n = 2) physicians; the average performance of each group is reported. IRENE surpasses the diagnostic performance of junior physicians and performs competitively with senior experts. AUC, area under the curve.
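The seed-level comparison described for c,d can be sketched as follows. This is a minimal, hypothetical sketch: it assumes a pooled-variance (Student's) two-sample t-test over per-seed scores, since the caption does not state whether variances were pooled, and the function names are illustrative rather than taken from the paper's code. The two-sided P value uses the exact closed-form series for integer degrees of freedom, so only the standard library is needed.

```python
import math
from statistics import mean, variance


def student_t_two_sided_p(t: float, df: int) -> float:
    """Two-sided P value P(|T| >= |t|) for Student's t with integer df.

    Uses the closed-form series for integer degrees of freedom
    (Abramowitz & Stegun 26.7.3 and 26.7.4).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df % 2 == 0:
        # Even df: A = sin(theta) * (1 + c^2/2 + (1*3)/(2*4) c^4 + ...)
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        a = s * total
    else:
        # Odd df: A = (2/pi) * (theta + s*c*(1 + (2/3) c^2 + (2*4)/(3*5) c^4 + ...))
        total = 0.0
        if df > 1:
            term, total = 1.0, 1.0
            for k in range(1, (df - 1) // 2):
                term *= c2 * (2 * k) / (2 * k + 1)
                total += term
        a = (2 / math.pi) * (theta + s * c * total)
    # A is P(|T| < |t|); the two-sided P value is its complement.
    return 1.0 - a


def two_sample_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Independent two-sample t-test, two-sided.

    Assumption (not stated in the caption): equal variances, so the
    pooled-variance Student's test is used. Each list holds one score
    per random seed (e.g. ten seeds per model).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df, student_t_two_sided_p(t, df)
```

As a toy illustration (not data from the paper), `two_sample_t_test([1, 2, 3], [4, 5, 6])` gives t ≈ -3.674 with 4 degrees of freedom and a two-sided P ≈ 0.021. With ten seeds per model, the test has 18 degrees of freedom.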