Table 6 Results for S2TT and S2ST averaged across 28 directions that add toxicity

From: Joint speech and text machine translation for up to 100 languages

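| Task | Model | FLEURS X–eng ETOX% (↓) | FLEURS X–eng MuTox (↓) | FLEURS eng–X ETOX% (↓) | FLEURS eng–X MuTox (↓) | HOLISTICBIAS ETOX% (↓) | HOLISTICBIAS MuTox (↓) |
|---|---|---|---|---|---|---|---|
| S2TT | Baseline | 0.21 | 0.05 | 0.23 | 0.08 | 0.32 | 0.39 |
| S2TT | SEAMLESSM4T-LARGE | 0.20 | 0.02 | 0.24 | 0.07 | 0.32 | 0.37 |
| S2TT | SEAMLESSM4T-V2 | 0.22 | 0.01 | 0.16 | 0.08 | 0.15 | 0.39 |
| S2TT | SEAMLESSM4T-V2 + MinTox | 0.22 | 0.01 | 0.07 | 0.01 | 0.03 | 0.37 |
| S2ST | Baseline | 0.05 | 0.05 | 0.30 | 0.02 | 0.32 | 0.32 |
| S2ST | SEAMLESSM4T-LARGE | 0.05 | 0.01 | 0.15 | 0.04 | 0.26 | 0.29 |
| S2ST | SEAMLESSM4T-V2 | 0.04 | 0.01 | 0.11 | 0.02 | 0.15 | 0.26 |
| S2ST | SEAMLESSM4T-V2 + MinTox | 0.04 | 0.01 | 0.05 | 0.02 | 0.03 | 0.25 |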

  1. ETOX is ASR-ETOX in the case of speech outputs. The baseline corresponds to WHISPER-LARGE-V2 for S2TT X–eng; WHISPER-LARGE-V2 + NLLB-3.3B for S2TT eng–X; WHISPER-LARGE-V2 + YOURTTS for S2ST X–eng.