Abstract
Reception is an essential process for patients seeking medical care and a critical component influencing the healthcare experience. However, current communication systems rely mainly on human effort, which is both labor- and knowledge-intensive. A promising alternative is to leverage the capabilities of large language models (LLMs) to assist communication at medical center reception sites. Here we curated a unique dataset comprising 35,418 cases of real-world conversation audio corpus between outpatients and receptionist nurses from 10 reception sites across two medical centers, to develop a site-specific prompt engineering chatbot (SSPEC). The SSPEC efficiently resolved patient queries, with a higher proportion of queries addressed in fewer rounds of queries and responses (Q&Rs; 68.0% ≤2 rounds) compared with nurse-led sessions (50.5% ≤2 rounds) (P = 0.009) across administrative, triaging and primary care concerns. We then established a nurse–SSPEC collaboration model to oversee the uncertainties encountered during real-world deployment. In a single-center randomized controlled trial involving 2,164 participants, the primary endpoint indicated that the nurse–SSPEC collaboration model received higher satisfaction feedback from patients (3.91 ± 0.90 versus 3.39 ± 1.15 in the nurse group, P < 0.001). Key secondary outcomes indicated a reduced rate of repeated Q&Rs (3.2% versus 14.4% in the nurse group, P < 0.001), reduced negative emotions during visits (2.4% versus 7.8% in the nurse group, P < 0.001) and enhanced response quality in terms of integrity (4.37 ± 0.95 versus 3.42 ± 1.22 in the nurse group, P < 0.001), empathy (4.14 ± 0.98 versus 3.27 ± 1.22 in the nurse group, P < 0.001) and readability (3.86 ± 0.95 versus 3.71 ± 1.07 in the nurse group, P = 0.006). Overall, our study supports the feasibility of integrating LLMs into the daily hospital workflow and introduces a paradigm for improving communication that benefits both patients and nurses.
Chinese Clinical Trial Registry identifier: ChiCTR2300077245.
Data availability
The data supporting the findings of this trial are available within the paper and its Supplementary Information files. All requests for further data sharing will be reviewed by the Ethics Review Committee of Southern University of Science and Technology Yantian Hospital, Shenzhen, China, and Renmin Hospital of Wuhan University, Wuhan, China, to verify whether the request is subject to any intellectual property or confidentiality obligations. Requests for access to de-identified individual-level data from this trial can be submitted via email to E.L. ([email protected]) with detailed proposals for approval and will be responded to within 60 d. Each request complying with the terms of use of the data indicated in the consent form will be granted. A signed data access agreement with the collaborator is required before accessing shared data. The raw conversation data are not publicly available due to privacy restrictions.
Code availability
The source code can be accessed via the following link: https://github.com/ZigengHuang/SSPEC. The SSPEC was developed with OpenAI version 0.28.1 (https://github.com/openai/openai-python), RAGAS version 0.0.18 (https://github.com/explodinggradients/ragas) and LangChain version 0.0.333 (https://github.com/langchain-ai/langchain).
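As a hedged illustration of how a site-specific prompt might be assembled for the openai 0.28.x-style `ChatCompletion` API, consider the sketch below. The function name, site name and knowledge snippets are hypothetical examples for exposition, not taken from the SSPEC repository.

```python
# Hypothetical sketch of site-specific prompt assembly in the style of
# openai==0.28.x (openai.ChatCompletion.create). All names here are
# illustrative, not the authors' actual code.

def build_site_messages(site_name, site_knowledge, patient_query):
    """Assemble a ChatCompletion `messages` list that grounds the model
    in curated, site-specific reception knowledge."""
    system_prompt = (
        f"You are a receptionist assistant at the {site_name} reception site. "
        "Answer only from the curated site knowledge below; if the answer is "
        "not covered, say you are uncertain so a nurse can take over.\n\n"
        "Site knowledge:\n" + "\n".join(f"- {fact}" for fact in site_knowledge)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": patient_query},
    ]

messages = build_site_messages(
    "outpatient registration",
    ["Registration opens at 7:30.", "Bring your ID card and insurance card."],
    "What time can I register?",
)
# Under openai==0.28.x, this list could then be passed to
# openai.ChatCompletion.create(model=..., messages=messages).
```

Keeping prompt assembly separate from the API call makes the site-specific knowledge easy to swap per reception site.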
Acknowledgements
E.L. is supported by the National Natural Science Foundation of China (Excellent Youth Scholars Program, 32300483 and 82090011) and the Chinese Academy of Medical Sciences Innovation Fund (2023-I2M-3-010). We thank Z. Wang and the Long laboratory members for valuable comments. We also thank the Bioinformatics Center of the Institute of Basic Medical Sciences for computing support.
Author information
Authors and Affiliations
Contributions
E.L., P.W. and Q.C. contributed to the conception and design of the study. P.W., Z.H., W.T., Y.N., D.P., S.D. and J.C. contributed to the data acquisition, curation and analysis. Y.Z. and H.D. provided technical assistance. All authors contributed to the drafting and revising of the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Jordan Alpert, Max Rollwage and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Study design overview.
(A) Conversational audio was collected from 10 reception sites across two medical centers. (B) The audio data was transformed into text format with meticulous manual editing, encompassing a spectrum of patients’ queries including administrative, triaging, and primary-care concerns. The cases were used as the training set for knowledge curation at each site. Fine-tuning and prompt strategies were applied for developing the site-specific prompt engineering chatbot (SSPEC). (C) The cases independent from the training set were reserved as the validation set for the comparison testing in terms of factuality, integrity, safety, empathy, readability, and satisfaction. (D) An alert system was implemented to alert the uncertainty in SSPEC responses. Subsequently, a collaboration model between receptionist nurses and SSPEC was established. This model was then tested in a randomized controlled trial to ascertain its practicality in the outpatient reception setting.
Extended Data Fig. 2 Context setting of patient-nurse conversation.
The study involved outpatients, patients about to be admitted to the hospital, and individuals seeking help. Patients who had queries for the nurse and agreed to have their conversations audio-recorded were recruited upon arrival at the hospital entrance. After recruitment, participants were directed to the reception sites for an in-person visit with the nurse. Prior to the commencement of audio recording, informed consent was obtained from both the nurses and the patients.
Extended Data Fig. 3 Nurse-SSPEC collaboration model involving the alert system in mitigating uncertainty.
Upon patient arrival at the reception site, their queries are audio-recorded and automatically transcribed into text. To address uncertain or potentially harmful responses generated by SSPEC, an alert system has been implemented. This system triggers an alert to the nurses if any ‘signals of uncertainty’ are detected through key-phrase matching, independent LLM evaluation, or automatic evaluation. The alert prompts immediate nurse review or modification of the response. Furthermore, a dedicated team reviews all patient-SSPEC conversations to continually refine the prompting.
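A minimal sketch of the key-phrase matching layer of such an alert system is shown below; the phrase list and the flagging rule are hypothetical examples, not the deployed SSPEC rules.

```python
# Hypothetical sketch of key-phrase matching for an uncertainty alert
# system; the phrase list is illustrative, not the deployed SSPEC rules.

UNCERTAINTY_PHRASES = (
    "i am not sure",
    "i don't know",
    "cannot confirm",
    "please consult",
    "uncertain",
)

def needs_nurse_review(response: str) -> bool:
    """Flag a chatbot response for nurse review if it contains any
    signal-of-uncertainty phrase (case-insensitive substring match)."""
    text = response.lower()
    return any(phrase in text for phrase in UNCERTAINTY_PHRASES)

# Example usage: the first response would trigger a nurse alert,
# the second would be released to the patient directly.
flagged = needs_nurse_review("I am not sure, please consult your doctor.")
released = needs_nurse_review("Registration is at window 3 on the first floor.")
```

In practice such a matcher would be only the first layer, complemented by the independent LLM evaluation and automatic evaluation described above.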
Extended Data Fig. 4 The workflow in randomized controlled trial.
Patient participants were randomly assigned to either the nurse-SSPEC group or the nurse group. Patients in the nurse-SSPEC group primarily interacted with SSPEC via audio, with nurses alerted for review if any uncertain responses were detected. Patients in the nurse group were directed to nurses and interacted with them in person. Satisfaction was measured immediately after the encounter.
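The routing logic of this workflow can be sketched as follows; the 1:1 random assignment and the function names are simplified, hypothetical stand-ins for the trial's actual randomization procedure.

```python
# Hypothetical sketch of the trial's routing logic: assign each participant
# to a group, then route nurse-SSPEC interactions through the uncertainty
# check before deciding whether a nurse is alerted. Simplified illustration,
# not the study's randomization code.
import random

def assign_group(rng: random.Random) -> str:
    """Simple 1:1 random allocation between the two trial arms."""
    return rng.choice(["nurse-SSPEC", "nurse"])

def route(group: str, sspec_response_uncertain: bool) -> str:
    """Decide who handles the encounter for a given arm and alert state."""
    if group == "nurse":
        return "in-person nurse"
    return "nurse review" if sspec_response_uncertain else "SSPEC reply"

rng = random.Random(0)  # seeded for reproducibility of this example
groups = [assign_group(rng) for _ in range(10)]
```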
Supplementary information
Supplementary Information
Supplementary Tables 1–25, study protocol and statistical plan.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wan, P., Huang, Z., Tang, W. et al. Outpatient reception via collaboration between nurses and a large language model: a randomized controlled trial. Nat Med 30, 2878–2885 (2024). https://doi.org/10.1038/s41591-024-03148-7