Extended Data Fig. 5: Trends in vignette and conversational formats in dermatology datasets for cases with a single most likely diagnosis.
From: An evaluation framework for clinical use of large language models in patient interaction tasks

Trends in vignette and conversational formats persist across skin disease datasets (MedQA-USMLE, Derm-Public and Derm-Private) for cases with a single most likely diagnosis. Results are shown for both 4-choice multiple-choice question (MCQ) settings (a–d) and free-response question (FRQ) settings (e–h). Error bars represent 95% confidence intervals.