Fig. 1
From: Speaker-normalized sound representations in the human auditory cortex

Listeners perceive speech sounds relative to their acoustic context. a Target sounds were synthesized to create a six-step continuum ranging from sufu (step 1; low first formant [F1]) to sofo (step 6; high F1). Context sentences were synthesized to sound like two different speakers: a speaker with a long vocal tract (low-F1 range; Speaker A) and a speaker with a short vocal tract (high-F1 range; Speaker B). Context sentences contained only the vowels /e/ and /a/, not the target vowels /u/ and /o/. b On each trial, a context sentence preceded the target (separated by 0.5 s of silence), after which participants pressed a button to indicate whether they heard “sufu” or “sofo”. c All targets were presented after both speaker contexts. d Listeners gave “sofo” responses to target sounds more often when the preceding context was spoken by Speaker A (low F1) than by Speaker B (high F1). Error bars indicate s.e.m.