
Outpatient reception via collaboration between nurses and a large language model: a randomized controlled trial

Abstract

Reception is an essential process for patients seeking medical care and a critical component influencing the healthcare experience. However, current communication systems rely mainly on human effort, which is both labor and knowledge intensive. A promising alternative is to leverage the capabilities of large language models (LLMs) to assist communication at medical center reception sites. Here we curated a unique dataset comprising 35,418 real-world audio-recorded conversations between outpatients and receptionist nurses from 10 reception sites across two medical centers to develop a site-specific prompt engineering chatbot (SSPEC). The SSPEC efficiently resolved patient queries across administrative, triaging and primary care concerns, addressing a higher proportion of queries in fewer rounds of queries and responses (Q&Rs; 68.0% within ≤2 rounds) than nurse-led sessions (50.5% within ≤2 rounds) (P = 0.009). We then established a nurse–SSPEC collaboration model to oversee the uncertainties encountered during real-world deployment. In a single-center randomized controlled trial involving 2,164 participants, the primary endpoint showed that the nurse–SSPEC collaboration model received higher satisfaction feedback from patients (3.91 ± 0.90 versus 3.39 ± 1.15 in the nurse group, P < 0.001). Key secondary outcomes indicated a reduced rate of repeated Q&Rs (3.2% versus 14.4% in the nurse group, P < 0.001), reduced negative emotions during visits (2.4% versus 7.8% in the nurse group, P < 0.001) and enhanced response quality in terms of integrity (4.37 ± 0.95 versus 3.42 ± 1.22 in the nurse group, P < 0.001), empathy (4.14 ± 0.98 versus 3.27 ± 1.22 in the nurse group, P < 0.001) and readability (3.86 ± 0.95 versus 3.71 ± 1.07 in the nurse group, P = 0.006). Overall, our study supports the feasibility of integrating LLMs into the daily hospital workflow and introduces a paradigm for improving communication that benefits both patients and nurses. Chinese Clinical Trial Registry identifier: ChiCTR2300077245.
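
To make the scale of these headline comparisons concrete, the short Python sketch below recomputes approximate test statistics from the summary values quoted in the abstract. It assumes a roughly 1:1 split of the 2,164 participants (~1,082 per arm), a Welch's t-test for the satisfaction scores and a chi-square test for the repeated-Q&R rates; these are illustrative assumptions, and the trial's prespecified analysis (see the statistical plan in the Supplementary Information) may differ.

```python
# Illustrative recomputation of the abstract's headline comparisons from
# summary statistics alone. The arm sizes (~1,082 each) and the choice of
# Welch's t-test / chi-square test are assumptions for this sketch.
from scipy import stats

n_collab, n_nurse = 1082, 1082  # assumed ~1:1 split of 2,164 participants

# Primary endpoint: patient satisfaction (mean ± s.d. from the abstract).
t, p = stats.ttest_ind_from_stats(
    mean1=3.91, std1=0.90, nobs1=n_collab,
    mean2=3.39, std2=1.15, nobs2=n_nurse,
    equal_var=False,  # Welch's t-test
)
print(f"satisfaction: t = {t:.2f}, P = {p:.2e}")

# Secondary endpoint: rate of repeated Q&Rs (3.2% versus 14.4%).
repeated = [
    [round(0.032 * n_collab), n_collab - round(0.032 * n_collab)],
    [round(0.144 * n_nurse),  n_nurse - round(0.144 * n_nurse)],
]
chi2, p, _, _ = stats.chi2_contingency(repeated)
print(f"repeated Q&R: chi2 = {chi2:.1f}, P = {p:.2e}")
```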

Fig. 1: Profiling real-world conversation data across 10 reception sites from two medical centers.
Fig. 2: Internal validation of the SSPEC.
Fig. 3: CONSORT diagram.
Fig. 4: Randomized controlled trial testing the feasibility of the nurse–SSPEC collaboration model.

Data availability

The data supporting the findings of this trial are available within the paper and its Supplementary Information files. All requests for further data sharing will be reviewed by the Ethics Review Committee of Southern University of Science and Technology Yantian Hospital, Shenzhen, China, and Renmin Hospital of Wuhan University, Wuhan, China, to verify whether the request is subject to any intellectual property or confidentiality obligations. Requests for access to de-identified individual-level data from this trial can be submitted via email to E.L. ([email protected]) with detailed proposals for approval and will be responded to within 60 d. Each request complying with the terms of use of the data indicated in the consent form will be granted. A signed data access agreement with the collaborator is required before accessing shared data. The raw conversation data are not publicly available due to privacy restrictions.

Code availability

The source code can be accessed via the following link: https://github.com/ZigengHuang/SSPEC. The SSPEC was developed with OpenAI version 0.28.1 (https://github.com/openai/openai-python), RAGAS version 0.0.18 (https://github.com/explodinggradients/ragas) and LangChain version 0.0.333 (https://github.com/langchain-ai/langchain).
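
For readers who want a sense of how a site-specific prompt engineering chatbot can be wired together with the openai-python 0.28.x interface listed above, here is a minimal, self-contained sketch. It is not the authors' implementation (the actual source is in the GitHub repository linked above); the site-knowledge snippets, retrieval logic and model choice are illustrative placeholders.

```python
# Minimal sketch of a site-specific prompt engineering chatbot using the
# openai-python 0.28.x call style. NOT the published SSPEC implementation;
# the site knowledge, model name and retrieval logic are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# Hypothetical curated knowledge for one reception site, distilled from
# that site's conversation corpus (administrative, triaging, primary care).
SITE_KNOWLEDGE = [
    "Registration desk hours: 08:00-17:30, Monday to Saturday.",
    "Patients with fever are triaged to the fever clinic on the first floor.",
    "Outpatient payment can be completed at the self-service kiosks near exit B.",
]

def build_site_prompt(query: str) -> str:
    # Naive retrieval: keep snippets that share words with the query.
    hits = [s for s in SITE_KNOWLEDGE
            if set(query.lower().split()) & set(s.lower().split())]
    context = "\n".join(hits or SITE_KNOWLEDGE)
    return ("You are a receptionist assistant at this hospital site. "
            "Answer briefly, politely and only from the site knowledge below.\n"
            f"Site knowledge:\n{context}")

def sspec_reply(query: str) -> str:
    response = openai.ChatCompletion.create(  # openai==0.28.x interface
        model="gpt-4",  # assumed model choice for this sketch
        messages=[
            {"role": "system", "content": build_site_prompt(query)},
            {"role": "user", "content": query},
        ],
        temperature=0.2,
    )
    return response["choices"][0]["message"]["content"]

print(sspec_reply("Where do I pay for my outpatient visit?"))
```

In the published system, LangChain and RAGAS presumably handle retrieval orchestration and automated response evaluation, respectively; the keyword matching above is only a stand-in for that machinery.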

Acknowledgements

E.L. is supported by the National Natural Science Foundation of China (Excellent Youth Scholars Program, 32300483 and 82090011) and the Chinese Academy of Medical Sciences Innovation Fund (2023-I2M-3-010). We thank Z. Wang and the Long laboratory members for valuable comments. We also thank the Bioinformatics Center of the Institute of Basic Medical Sciences for computing support.

Author information

Contributions

E.L., P.W. and Q.C. contributed to the conception and design of the study. P.W., Z.H., W.T., Y.N., D.P., S.D. and J.C. contributed to the data acquisition, curation and analysis. Y.Z. and H.D. provided technical assistance. All authors contributed to the drafting and revising of the paper.

Corresponding authors

Correspondence to Qingyu Chen or Erping Long.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Jordan Alpert, Max Rollwage and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Study design overview.

(A) Conversational audio was collected from 10 reception sites across two medical centers. (B) The audio data were transcribed into text with meticulous manual editing, encompassing a spectrum of patient queries including administrative, triaging and primary-care concerns. These cases were used as the training set for knowledge curation at each site, and fine-tuning and prompt strategies were applied to develop the site-specific prompt engineering chatbot (SSPEC). (C) Cases independent from the training set were reserved as the validation set for comparison testing in terms of factuality, integrity, safety, empathy, readability and satisfaction. (D) An alert system was implemented to flag uncertainty in SSPEC responses. Subsequently, a collaboration model between receptionist nurses and SSPEC was established. This model was then tested in a randomized controlled trial to ascertain its practicality in the outpatient reception setting.

Extended Data Fig. 2 Context setting of patient-nurse conversation.

The study involved outpatients, patients about to be admitted to the hospital, or individuals seeking help. Patients who had queries for the nurse and agreed to the audio recording of their conversations were recruited upon arrival at the hospital entrance. After recruitment, participants were directed to the reception sites for an in-person visit with the nurse. Informed consent was obtained from both the nurses and the patients before audio recording commenced.

Extended Data Fig. 3 Nurse-SSPEC collaboration model involving the alert system in mitigating uncertainty.

Upon patient arrival at the reception site, queries are recorded as audio and automatically transcribed into text. To address uncertain or potentially harmful responses generated by SSPEC, an alert system has been implemented. This system triggers an alert to the nurses if any ‘signals of uncertainty’ are detected through key-phrase matching, independent LLM evaluation or automatic evaluation. The alert prompts immediate nurse review or modification of the response. Furthermore, a dedicated team reviews all patient–SSPEC conversations to continually refine the prompting.
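
A minimal Python sketch of this routing logic is given below, assuming the three checks named in the caption (key-phrase matching, an independent LLM evaluation and an automatic evaluation score). The specific phrases, judge prompt and threshold are hypothetical stand-ins rather than the values used in the deployed system.

```python
# Illustrative alert routing: a SSPEC reply is escalated to a nurse if any
# of three checks flags uncertainty. Phrases, judge prompt and threshold
# are hypothetical; assumes openai.api_key is configured elsewhere.
import openai

UNCERTAIN_PHRASES = ["i am not sure", "cannot answer", "consult a doctor immediately"]

def keyphrase_flag(reply: str) -> bool:
    # Key-phrase matching against a curated list of uncertainty signals.
    text = reply.lower()
    return any(phrase in text for phrase in UNCERTAIN_PHRASES)

def llm_judge_flag(query: str, reply: str) -> bool:
    # Independent LLM evaluation: a second model rates whether the reply
    # is safe and on-topic; anything other than 'OK' triggers an alert.
    verdict = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": ("Reply with OK or UNCERTAIN only. Is this reception "
                        f"answer safe and on-topic?\nQuestion: {query}\nAnswer: {reply}"),
        }],
        temperature=0,
    )["choices"][0]["message"]["content"].strip().upper()
    return verdict != "OK"

def automatic_eval_flag(score: float, threshold: float = 0.7) -> bool:
    # Automatic evaluation: a grounding/faithfulness score (for example,
    # as computed by tools such as RAGAS) below a threshold raises an alert.
    return score < threshold

def route(query: str, reply: str, auto_score: float) -> str:
    if (keyphrase_flag(reply) or llm_judge_flag(query, reply)
            or automatic_eval_flag(auto_score)):
        return "ALERT: forward to nurse for review and modification"
    return reply
```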

Extended Data Fig. 4 The workflow in randomized controlled trial.

Patient participants were randomly assigned to either the nurse–SSPEC group or the nurse group. Patients in the nurse–SSPEC group primarily interacted with SSPEC via audio, with nurses alerted for review if any uncertain responses were detected. Patients in the nurse group were directed to nurses and interacted with them in person. Satisfaction was measured immediately after the encounter.

Supplementary information

Supplementary Information

Supplementary Tables 1–25, study protocol and statistical plan.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, P., Huang, Z., Tang, W. et al. Outpatient reception via collaboration between nurses and a large language model: a randomized controlled trial. Nat Med 30, 2878–2885 (2024). https://doi.org/10.1038/s41591-024-03148-7

