Figure 3
From: Disparities in medical recommendations from AI-based chatbots across different countries/regions

(A) Results of the multiple linear regression model (adjusted for chatbot, patient, and rater). Bars represent the adjusted mean total score for each region, and error bars represent 95% confidence intervals. The numbers on the chart indicate the adjusted mean differences. (B)–(D) Distribution of overall performance by region for each AI chatbot: (B) Bing, (C) Bard, and (D) ChatGPT-3.5. Bars denote the mean total scores for the four regions, and error bars represent standard errors. The p-value in the figure is from the one-way ANOVA test with Scheffé's post hoc analysis.
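
For readers unfamiliar with the quantities plotted in panels (B)–(D), the following minimal sketch shows how a group mean, its standard error, and a one-way ANOVA F statistic can be computed. It is an illustration only, not the authors' analysis code: the function names `mean_and_se` and `one_way_anova_f` and the scores in `toy_scores` are hypothetical, and the sketch covers neither the p-value lookup nor Scheffé's post hoc comparisons.

```python
import math
from statistics import mean, stdev


def mean_and_se(scores):
    """Return (mean, standard error of the mean) for one group of scores."""
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy scores for four regions (NOT data from the study).
toy_scores = {
    "region_1": [1.0, 2.0, 3.0],
    "region_2": [2.0, 3.0, 4.0],
    "region_3": [3.0, 4.0, 5.0],
    "region_4": [4.0, 5.0, 6.0],
}

if __name__ == "__main__":
    for region, scores in toy_scores.items():
        m, se = mean_and_se(scores)
        print(f"{region}: mean={m:.2f}, SE={se:.3f}")
    f_stat, df_b, df_w = one_way_anova_f(list(toy_scores.values()))
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic would then be compared against an F distribution with the reported degrees of freedom to obtain a p-value; a statistics package is normally used for that step and for the post hoc test.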