Fig. 1: Schematic of the SEAMLESSM4T-V2 model. | Nature

From: Joint speech and text machine translation for up to 100 languages

The three main blocks of UNITY2 (S2ST fine-tuning), together with its non-autoregressive (NAR) T2U, are shown on the top left. Multitask-UNITY2, with its additional text encoder, is shown on the bottom left. A breakdown of the components of SEAMLESSM4T-V2 (a multitask-UNITY2 model) is shown on the right, with the side panel showing the teacher T2U model used for pseudo-labelling (M4).

Back to article page