Fig. 1
From: Leveraging logit uncertainty for better knowledge distillation

Overall framework of Logits Uncertainty Distillation. We use category uncertainty weighting to address uncertainty in the teacher model's predictions: a confidence threshold on the teacher's predictions produces a mask that discounts uncertain classes during distillation. In addition, two Spearman correlation losses align the logits of the teacher and student models by measuring the discrepancy between their outputs at the category level and at the sample level. The arrows indicate the flow of data through the system, including the generation of sample-level and category-level logits, and the roles of the different components.
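
To make the two mechanisms in the figure concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes a pairwise-sigmoid soft-rank surrogate for the (non-differentiable) Spearman correlation and a simple multiplicative mask; the names `soft_rank`, `spearman_loss`, `lud_loss`, and the threshold `tau` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def soft_rank(x: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Differentiable rank surrogate along the last dim via pairwise
    sigmoid comparisons (O(n^2) memory, fine for a sketch)."""
    diff = x.unsqueeze(-1) - x.unsqueeze(-2)          # (..., n, n)
    return torch.sigmoid(diff / temperature).sum(-1)  # (..., n)

def spearman_loss(s_logits, t_logits, dim=-1, eps=1e-8):
    """1 - Pearson correlation of soft ranks, an approximation of
    1 - Spearman correlation, taken along `dim`."""
    rs = soft_rank(s_logits.transpose(dim, -1))
    rt = soft_rank(t_logits.transpose(dim, -1))
    rs = rs - rs.mean(dim=-1, keepdim=True)
    rt = rt - rt.mean(dim=-1, keepdim=True)
    corr = (rs * rt).sum(-1) / (rs.norm(dim=-1) * rt.norm(dim=-1) + eps)
    return (1.0 - corr).mean()

def lud_loss(student_logits, teacher_logits, tau=0.1):
    """Confidence-masked sample-level + category-level Spearman loss.
    `tau` is an assumed confidence threshold on teacher probabilities."""
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=-1)
        mask = (probs >= tau).float()  # discount uncertain classes
    # Zeroing masked logits is a simplification of the paper's weighting.
    s = student_logits * mask
    t = teacher_logits * mask
    sample_loss = spearman_loss(s, t, dim=-1)   # rank classes per sample
    category_loss = spearman_loss(s, t, dim=0)  # rank samples per class
    return sample_loss + category_loss
```

In this sketch the sample-level term ranks classes within each sample and the category-level term ranks samples within each class, mirroring the two correlation branches drawn in the figure; the soft-rank temperature and `tau` would be tuning choices.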