
Extended Data Fig. 6: Ablation studies and cross-validation results.

From: Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging


We conducted three ablation studies to evaluate our main architectural design choices and training strategies: (1) cross-entropy versus contrastive loss for visual representation learning, (2) linear versus transformer-based multi-label classification, and (3) randomly initialized versus pretrained genetic embedding.

a, The first two ablation studies are shown in this panel; details of the cross-validation experiments are given in the Methods (see ‘Ablation Studies’). First, a ResNet50 model was trained using either cross-entropy or patchcon. The patchcon-trained image encoder was then fixed, and a linear classifier and a transformer classifier were each trained on top of this same fixed encoder to evaluate the performance gain from the transformer encoder. This design allows us to evaluate (1) and (2). The columns of the panel correspond to the three levels of prediction for SRH image classification: patch, slide and patient level. Each model was trained three times on randomly sampled validation sets, and the average (± standard deviation) ROC curves are shown for each model. Each row corresponds to one of the three molecular diagnostic mutations that our DeepGlioma model aims to predict. The results show that patchcon outperforms cross-entropy for visual representation learning and that the transformer classifier outperforms the linear classifier for multi-label classification. Note that the performance gain of the transformer classifier over the linear model is due to the deep multi-headed attention mechanism learning conditional dependencies between labels in the context of specific SRH image features, not to improved image feature learning, because the SRH encoder weights are fixed.

b, We then aimed to evaluate (3). A single ResNet50 model was trained using patchcon, and the encoder weights were fixed for the following ablation study to isolate the contribution of random initialization versus pretraining of the genetic embedding layer. Three label-masking training regimes were tested and are presented in the tables: all input labels masked (100%), two labels randomly masked (66%) and one label randomly masked (33%). The first row of the first table (100% masked) corresponds to non-multimodal training, in which no genetic information is provided at any point during training or inference. We found that 66% input label masking, that is, randomly masking two of the three diagnostic mutations, showed the best overall classification performance. We hypothesize that this is because a single unmasked mutation weakly defines the genetic context while supervision from the two masked labels backpropagates through the transformer encoder.

mAcc, mean label accuracy; mAP, mean average precision; mAUC, mean area under the ROC curve; SubAcc, subset accuracy; ebF1, example-based F1 score; micF1, micro-F1 score.
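For a concrete picture of ablation (1), the following is a minimal sketch of a supervised-contrastive objective in the spirit of patchcon. The exact positive-pair definition used by patchcon is specified in the Methods; the grouping variable, function names and temperature here are illustrative assumptions, not taken from the DeepGlioma codebase.

```python
# A SupCon-style contrastive loss, sketched as one plausible form of the
# patchcon objective (illustrative; the true positive-pair definition is
# given in the Methods of the paper).
import torch
import torch.nn.functional as F

def contrastive_loss(feats: torch.Tensor, groups: torch.Tensor, tau: float = 0.07):
    """feats: (N, D) patch embeddings; groups: (N,) ids defining positive pairs."""
    feats = F.normalize(feats, dim=1)                  # cosine-similarity space
    sim = feats @ feats.T / tau                        # pairwise similarities
    self_mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    pos = (groups.unsqueeze(0) == groups.unsqueeze(1)) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # average log-likelihood over each anchor's positives (SupCon-style)
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```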
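Ablation (2) compares two classification heads trained on top of the same frozen encoder. The PyTorch sketch below is one plausible realization, assuming a 2048-dimensional ResNet50 embedding and learned per-label query tokens; it is not the authors' implementation.

```python
# A minimal sketch of the panel-a setup: a frozen ResNet50 image encoder
# with either a linear or a transformer-based multi-label head.
import torch
import torch.nn as nn
from torchvision.models import resnet50

NUM_LABELS = 3  # the three molecular diagnostic mutations

encoder = resnet50(weights=None)
encoder.fc = nn.Identity()           # expose the 2048-d patch embedding
for p in encoder.parameters():       # freeze: the ablation compares heads only
    p.requires_grad = False

linear_head = nn.Linear(2048, NUM_LABELS)  # independent per-label logits

class TransformerHead(nn.Module):
    """Learned label queries attend jointly to the image embedding, so the
    head can model conditional dependencies between labels (the hypothesized
    source of the transformer's performance gain)."""
    def __init__(self, dim=2048, num_labels=NUM_LABELS, heads=8, layers=2):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.out = nn.Linear(dim, 1)

    def forward(self, img_embed):                     # img_embed: (B, 2048)
        B = img_embed.size(0)
        tokens = torch.cat([img_embed.unsqueeze(1),
                            self.label_queries.expand(B, -1, -1)], dim=1)
        tokens = self.encoder(tokens)                 # joint image/label attention
        return self.out(tokens[:, 1:]).squeeze(-1)    # (B, NUM_LABELS) logits
```

Both heads would be trained with a binary cross-entropy objective (for example, nn.BCEWithLogitsLoss) over the three mutation labels, keeping the image features identical across the comparison.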
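The 66% regime from panel b can be illustrated as follows. This is a hedged sketch, assuming the genetic labels are fed to the embedding layer as discrete tokens and that a [MASK] token hides the masked labels; the token ids and function names are hypothetical.

```python
# A sketch of the 66% label-masking regime: two of the three genetic labels
# are replaced by a [MASK] token, so a single unmasked mutation weakly
# defines the genetic context while the masked labels supply supervision.
import torch

MASK_ID = 2   # hypothetical vocabulary: 0 = wild type, 1 = mutant, 2 = [MASK]

def mask_labels(labels: torch.Tensor, n_masked: int = 2):
    """labels: (B, 3) binary mutation labels. Returns masked input tokens and
    a boolean mask marking the positions that contribute to the loss."""
    B, L = labels.shape
    # randomly pick `n_masked` of the L label positions per sample
    masked_pos = torch.rand(B, L).argsort(dim=1)[:, :n_masked]
    loss_mask = torch.zeros(B, L, dtype=torch.bool)
    loss_mask.scatter_(1, masked_pos, True)
    tokens = labels.clone()
    tokens[loss_mask] = MASK_ID                        # hide the masked labels
    return tokens, loss_mask

# usage: supervision backpropagates only through the masked positions
labels = torch.randint(0, 2, (4, 3))
tokens, loss_mask = mask_labels(labels)   # feed `tokens` to the genetic embedding
```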
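The multi-label metrics abbreviated above map directly onto standard scikit-learn calls; the snippet below shows the correspondence on toy data (mAcc is computed manually as the mean per-label accuracy, since scikit-learn's accuracy_score returns subset accuracy for multi-label inputs).

```python
# The multi-label metrics reported in the panel-b tables, on toy data;
# y_true and y_prob are (n_samples, 3) arrays over the three mutations.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             average_precision_score, roc_auc_score)

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.3, 0.1, 0.8], [0.6, 0.7, 0.4]])
y_pred = (y_prob >= 0.5).astype(int)

mAcc   = (y_true == y_pred).mean()                    # mean per-label accuracy
SubAcc = accuracy_score(y_true, y_pred)               # exact-match subset accuracy
ebF1   = f1_score(y_true, y_pred, average="samples")  # example-based F1
micF1  = f1_score(y_true, y_pred, average="micro")    # micro-F1
mAP    = average_precision_score(y_true, y_prob, average="macro")
mAUC   = roc_auc_score(y_true, y_prob, average="macro")
```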
