Fig. 1
From: Quality assessment of large language models’ output in maternal health

Boxplots depicting average evaluator responses to Survey 1 by model: (a) Overall; (b) Clarity; (c) Quality; (d) English; (e) Portuguese; and (f) Urdu. Scores range from 1 to 5, with higher scores indicating better overall performance. LLM, large language model; * indicates statistical significance (p < 0.05).