Figure 2
From: Restoring speech intelligibility for hearing aid users with deep learning

Human mean opinion scores (MOS) of state-of-the-art denoising methods. (A) Comparison of current denoising models on three publicly available test sets (ref. 34); 150,000 human MOS ratings in total (500 sound files per dataset, 20 human raters per file, 5 models). Triangles denote the mean; the bars show the 25th, 50th and 75th percentiles (quartiles), and whiskers extend to 1.5 times the interquartile range. (B) Dependence of human MOS on signal-to-noise ratio (SNR) for the 1,500 audio samples of A. Shading spans the 25th to 75th percentiles; no shading is shown for SNR values with too few audio samples. SNR values are rounded to the nearest integer. (C) Comparison of denoising methods using other common speech quality metrics on a 1–5 scale, together with the scale-invariant signal-to-noise ratio (SI-SNR) normalized to 1–5 (see Extended Data Fig. S3 for the original range and full details on SI-SNR).
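The caption relies on two computations it does not spell out: the box-plot statistics of panel A and the SI-SNR metric of panel C. The Python sketch below illustrates both under stated assumptions and is not the authors' code. `box_summary` assumes the Tukey convention used by common plotting libraries (e.g. matplotlib's default `whis=1.5`), where whiskers reach the most extreme scores lying within 1.5 × IQR of the quartiles; the figure may have been drawn with a different convention. `si_snr` follows the standard scale-invariant SNR definition. `to_mos_scale` is a hypothetical linear clip-and-rescale onto 1–5 whose bounds `lo_db` and `hi_db` are placeholders; the paper's actual normalization is described in Extended Data Fig. S3 and is not reproduced here.

```python
import math
import statistics


def box_summary(scores, whis=1.5):
    """Mean, quartiles and whisker ends for one box (needs >= 2 scores).

    Assumes the Tukey convention: whiskers reach the most extreme scores
    within whis * IQR of the box edges; anything beyond is an outlier.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [s for s in scores if lo_fence <= s <= hi_fence]
    return {
        "mean": statistics.fmean(scores),  # drawn as a triangle in panel A
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(s for s in scores if s < lo_fence or s > hi_fence),
    }


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB between an estimate and a clean target."""
    if len(estimate) != len(target):
        raise ValueError("signals must have equal length")
    # Remove the means so DC offsets do not affect the score.
    est_mean, tgt_mean = statistics.fmean(estimate), statistics.fmean(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt
    scale = sum(e * t for e, t in zip(est, tgt)) / sum(t * t for t in tgt)
    s_target = [scale * t for t in tgt]
    # Whatever is left after the projection counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(n * n for n in e_noise)
    # eps guards against log10(0) and division by zero for perfect matches.
    return 10 * math.log10((num + eps) / (den + eps))


def to_mos_scale(value_db, lo_db=-5.0, hi_db=25.0):
    """Hypothetical mapping of SI-SNR (dB) onto 1-5 by clipping and rescaling.

    lo_db and hi_db are placeholder bounds, not the range used in the paper.
    """
    clipped = min(max(value_db, lo_db), hi_db)
    return 1.0 + 4.0 * (clipped - lo_db) / (hi_db - lo_db)
```

For example, `box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])` gives quartiles of 3.0, 5.0 and 7.0, whiskers at 1 and 8, and flags 100 as an outlier. Because `si_snr` projects the estimate onto the target before measuring the residual, multiplying the estimate by a constant leaves the score unchanged, which is what makes the metric scale-invariant.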