Figure 4
From: Disparities in medical recommendations from AI-based chatbots across different countries/regions

(A) Results of the multiple linear regression model (adjusted for region, patient, and rater). Bars represent the adjusted mean total scores for the three AI chatbots, and error bars represent 95% confidence intervals. The numbers on the chart indicate the adjusted mean differences. (B)–(D) Distribution of AI chatbot performance across five parameters by region: (B) Bing, (C) Bard, and (D) ChatGPT-3.5. Radar charts summarize the 100-point scale scores for each parameter, and the p-value shown for each parameter is the result of a one-way ANOVA test.
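
As a reading aid for panels (B)–(D), the following is a minimal sketch of how a one-way ANOVA F statistic is computed when one parameter's scores are compared across regions. The function name `one_way_anova_f`, the region labels, and all scores are hypothetical and are not taken from the study. The p-value would then come from the F distribution with the returned degrees of freedom (for example via a statistics library such as SciPy's `scipy.stats.f_oneway`), which is not shown here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 100-point scores for one parameter, NOT data from the study
scores_by_region = {
    "region_1": [60.0, 70.0, 80.0],
    "region_2": [50.0, 60.0, 70.0],
    "region_3": [40.0, 50.0, 60.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores_by_region.values()))
print(f_stat, df_b, df_w)
```

A larger F value means the regional means differ more relative to the spread within each region, which corresponds to a smaller p-value.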