Introduction

Helicobacter pylori (H. pylori) is a Gram-negative bacterium that colonizes human gastric mucosal epithelial cells1. Infection with H. pylori is closely associated with chronic gastritis, gastric ulcers, and duodenal ulcers, and prolonged infection can lead to gastric cancer2. More than half of the world’s population is infected with H. pylori, making it one of the most prevalent bacterial infections3. China has a high H. pylori infection rate: a large national epidemiologic study in 2023 reported an individual-based prevalence of 40.66% overall, 43.45% in adults, and 20.55% in children and adolescents4. Early diagnosis and treatment of H. pylori infection are therefore crucial. It is widely believed that H. pylori can be transmitted via “human-to-human,” “fecal–oral,” or “endoscopic” routes, and that infection shows clear familial aggregation5. Diagnostic methods for H. pylori can be divided into invasive methods and non-invasive methods (which do not require endoscopy). The former includes pathological examination and serological diagnosis, while the latter mainly involves the 13C- or 14C-urea breath test (UBT)6. In clinical practice, 13C or 14C breath tests are predominantly used. In this test, the patient ingests isotopically labeled urea, which is broken down by urease produced by H. pylori, generating labeled CO2 (13CO2 or 14CO2). The labeled CO2 diffuses through epithelial cells, is absorbed into the bloodstream, and is exhaled through the lungs7. The 14C variant carries a small amount of radioactivity, posing potential safety risks to patients. Consequently, researchers are exploring new diagnostic methods such as surface-enhanced Raman spectroscopy (SERS)8, biosensor technology9, and metagenomic sequencing10. Compared with these novel diagnostic approaches, breath detection is non-invasive, rapid, and convenient.
In summary, the development of a new breath test method for H. pylori that is radiation-free and highly safe holds great promise.

Exhaled breath detection provides a non-invasive approach for identifying disease biomarkers and monitoring patient health. This method relies on the analysis of endogenous volatile organic compounds (VOCs), which are byproducts or end-products of human or microbial metabolism and can be detected in exhaled breath, sweat, blood, tissue samples, urine, and feces11,12. Exhaled VOCs include alkanes, benzene derivatives, acetone, dimethyl sulfide, phenol, and other aromatic compounds13. A growing body of research indicates that certain diseases and infections, such as esophageal cancer14, colorectal cancer15, and inflammatory bowel disease16, can lead to characteristic changes in exhaled VOCs. Apart from the UBT, novel breath analysis technologies, such as a graphene-based high-sensitivity electronic nose17, are being investigated for the diagnosis of H. pylori infection. However, research in this area remains limited.

Fourier-transform infrared (FTIR) spectroscopy is an analytical technique used to analyze the spectral characteristics of various solids, liquids, and gases18. This technology measures the absorption of infrared light by molecular bonds in biological samples, generating a unique “fingerprint” spectrum that reflects the molecular composition of the sample. The sensitivity of FTIR to biochemical changes enables researchers to apply it to the diagnosis of various diseases, including cancer and metabolic disorders. In cancer detection, FTIR has been employed to analyze blood, urine, and tissue samples, with existing studies highlighting its potential in identifying early-stage tumors, making it a promising tool for non-invasive cancer diagnosis19,20,21. The application of FTIR has also extended to metabolic diseases such as diabetes, where it can detect glucose and other metabolic byproducts in blood and saliva samples22,23. In recent years, FTIR spectroscopy has emerged as a promising breath analysis technology for disease detection. It has been applied to detect infectious diseases such as SARS-CoV-224 and respiratory conditions like asthma and COPD25, demonstrating significant potential. However, no studies to date have explored the use of FTIR-based exhaled breath analysis for the diagnosis of H. pylori.

Based on this, our study employs FTIR spectroscopy to analyze the exhaled breath of patients with H. pylori infection before and after the ingestion of unlabeled urea. By comparing the breath spectra with those of a healthy control group, we use the K-means clustering algorithm to classify different sample types and establish a predictive model. This approach provides new insights and methods for using breath analysis in the diagnosis of H. pylori infection.

Materials and methods

Instruments and reagents

Materials and equipment

Gas collection bags (Shenzhen CNNC High-Tech Co., Ltd.); Nexus 870 Fourier Transform Infrared Spectrometer (Thermo Fisher Scientific, USA) with a spectral measurement range of 1.5–25 µm and a maximum resolution of 0.4 cm⁻¹; high-purity nitrogen gas.

Construction of the gas cell

The gas cell, which holds the sample gas during measurement, must cover the required wavelength range while also having a stable mechanical structure, interference resistance, and an appropriate volume. The gas cell volume is constrained by the volume of the gas collection bag, which is 100 ml; the gas cell was therefore designed with a volume of approximately 20 ml to ensure sufficient sample gas for testing. With an optical path length of 10 cm and an inner diameter of 1.6 cm, the calculated volume is approximately 20.1 ml, meeting the design requirement. The windows of the gas cell must transmit over a broad wavelength range, and different window materials transmit over different spectral ranges. Based on project requirements and to obtain more spectral data, potassium bromide was chosen as the window material.
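The volume check for the cell can be reproduced in a few lines. This is a minimal sketch that assumes an ideal cylinder; the function name is illustrative. With the full value of π the result is about 20.106 ml, while rounding π to 3.14 gives 20.096 ml.

```python
import math


def cylinder_volume_ml(path_length_cm: float, inner_diameter_cm: float) -> float:
    """Volume of an ideal cylindrical gas cell in millilitres (1 cm^3 = 1 ml)."""
    radius_cm = inner_diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * path_length_cm


# Dimensions quoted in the text: 10 cm optical path, 1.6 cm inner diameter.
volume_ml = cylinder_volume_ml(path_length_cm=10.0, inner_diameter_cm=1.6)

# How many cell volumes a single 100 ml collection bag can supply.
bag_volume_ml = 100.0
fills_per_bag = bag_volume_ml / volume_ml

print(f"cell volume: {volume_ml:.3f} ml")
print(f"cell fills per bag: {fills_per_bag:.2f}")
```

One bag therefore holds a little under five cell volumes, which leaves room for rinsing the cell with sample gas before the measurement fill.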

Study subjects and grouping

This single-center study was conducted at the Digestive Endoscopy Center of Renmin Hospital of Wuhan University from August 2021 to May 2022, and a total of 330 exhaled breath samples were collected. The study was approved by the Ethics Committee of Renmin Hospital of Wuhan University and was conducted in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all participants. After excluding abnormal samples, 278 samples were included in the study. Among them, 63 patients infected with H. pylori were defined as the disease group. The healthy control group consisted of 215 individuals, including 46 who had been cured of H. pylori infection. All subjects completed an information survey form before sample collection. The inclusion criteria were: (1) H. pylori infection status (positive or negative) definitively determined using the 13C-UBT; (2) informed consent to participate in this study. The exclusion criteria were: (1) use of antibiotics or proton pump inhibitors within 1 month before the test; (2) presence of mental disorders; (3) oral, nasopharyngeal, or respiratory diseases such as dental caries, periodontitis, pharyngitis, or lung cancer; (4) infectious respiratory diseases such as tuberculosis or COVID-19.

Exhaled breath sample collection

  1. Pre-sampling Requirements: Subjects fasted for at least 8 h before sampling.

  2. Rest and Sampling Procedure: Subjects were required to rest in the sampling area for at least 20 min before sampling. The collection of breath samples was conducted immediately before endoscopy or surgery. After taking three deep breaths and holding their breath for 10 s (inhale through the nose and exhale through the mouth), subjects exhaled into the gas collection bag using a disposable mouthpiece.

  3. Sample Collection Timeline: One bag of exhaled breath was collected before urea intake, and another bag was collected half an hour after urea intake.

  4. Sample Handling and Analysis: After the collection of exhaled breath samples, each sample was labeled and stored in a light-proof, temperature-controlled box. Samples were transported back to the laboratory within 2 h and analyzed within 24 h.

Exhaled gas detection

A high-precision gas cell with a broad spectral measurement range was fixed inside the sample compartment of the Fourier Transform Infrared Spectrometer. The gas sample from the collection bag was introduced into the gas cell for measurement through negative pressure suction. The gas cell’s inlet was connected to the collection bag via an electromagnetic valve, and the outlet was connected to a vacuum pump through another electromagnetic valve. The gas extracted by the vacuum pump was discharged outdoors through a gas wash bottle. The vacuum pump was used to achieve a vacuum level of 10 kPa, with 10% of the gas remaining after each vacuum cycle. Before each measurement, the system was evacuated and rinsed: first with nitrogen gas, then twice with the sample gas to ensure that residual sample gas was less than 0.1% and nitrogen gas greater than 0.9%. This procedure was performed to avoid cross-interference between samples and to ensure the purity of the sample gas.
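The effect of the rinse sequence can be estimated with a simple dilution model. The sketch below is an idealization, not the authors' calculation: it assumes perfect mixing, that each evacuation leaves exactly 10% of the cell contents, and that the cell is refilled to full pressure each time. The function name is illustrative.

```python
def purge_residuals(remaining_fraction: float = 0.10):
    """Gas fractions in the cell after the rinse sequence.

    Modeled sequence: evacuate, fill with nitrogen, evacuate, fill with
    sample, evacuate, fill with sample. Ideal mixing is assumed.
    Returns (previous sample, nitrogen, new sample) as fractions of the total.
    """
    old, n2, new = 1.0, 0.0, 0.0  # the cell starts full of the previous sample
    for fill_gas in ("n2", "sample", "sample"):
        # Evacuation leaves the same fraction of every component behind.
        old, n2, new = (x * remaining_fraction for x in (old, n2, new))
        # Refill the evacuated part of the cell with the chosen gas.
        refill = 1.0 - (old + n2 + new)
        if fill_gas == "n2":
            n2 += refill
        else:
            new += refill
    return old, n2, new


old, n2, new = purge_residuals()
print(f"previous sample: {old:.2%}, nitrogen: {n2:.2%}, new sample: {new:.2%}")
```

Under these assumptions the cell ends up with about 0.1% of the previous sample, about 0.9% nitrogen, and about 99% of the new sample, which is the order of magnitude of the residual levels quoted above.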

Data analysis

General clinical data analysis

Clinical data were entered and organized using Excel 2019. Statistical analyses were performed using SPSS 26.0 software. Measurement data conforming to a normal distribution were expressed as mean values, and comparisons between groups were conducted using the t-test. Categorical data were presented as frequencies and percentages, with comparisons between groups performed using the chi-square test. A p-value of less than 0.05 was considered statistically significant.

Spectral analysis

Because many factors influence exhaled breath spectra, the spectrum recorded before urea intake was used as the baseline. The sample data used for analysis were obtained by subtracting this baseline spectrum from the spectrum recorded after intake of unlabeled urea, according to:

$$A_{0\nu } = A_{1\nu } - A_{2\nu }$$

Here A1 and A2 are the spectra measured after and before urea intake, respectively, and ν denotes the wavenumber points, which range from 542.7429 to 3959.744 cm⁻¹.
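As an illustration of the subtraction, the sketch below computes a difference spectrum point by point. It assumes both spectra are sampled on the same wavenumber grid; the function name and the three absorbance values are hypothetical and are not study data.

```python
def difference_spectrum(after, before):
    """Return A0 = A1 - A2 at every wavenumber point.

    `after` (A1) and `before` (A2) are absorbance lists on the same grid.
    """
    if len(after) != len(before):
        raise ValueError("spectra must share the same wavenumber grid")
    return [a1 - a2 for a1, a2 in zip(after, before)]


# Hypothetical absorbances at three wavenumbers (cm^-1), for illustration only.
wavenumbers = [930.0, 950.0, 2350.0]
a1_after_urea = [0.12, 0.15, 0.80]
a2_before_urea = [0.10, 0.10, 0.70]

a0 = difference_spectrum(a1_after_urea, a2_before_urea)
for nu, value in zip(wavenumbers, a0):
    print(f"{nu:7.1f} cm^-1  A0 = {value:.3f}")
```

Using each subject's own pre-intake spectrum as the baseline means that features common to both measurements cancel, so the difference mainly reflects changes that follow urea intake.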

The obtained spectral data were analyzed using Principal Component Analysis (PCA) to reduce complexity and enhance the density of useful information. Subsequently, the K-means clustering algorithm was used to classify different sample types and preliminarily construct various predictive models. These models were then analyzed for their recognition rate, as well as their accuracy, precision, sensitivity, and specificity on the training set. The final predictive model was determined based on these metrics, and its recognition rate was analyzed to further validate the diagnostic performance of FTIR in detecting H. pylori infection.
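The four evaluation metrics named above follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and do not come from the study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: infected subjects classified as positive/negative.
    tn/fp: uninfected subjects classified as negative/positive.
    """
    if min(tp + fp, tp + fn, tn + fp) == 0:
        raise ValueError("each metric needs a non-zero denominator")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct calls
        "precision": tp / (tp + fp),        # positive predictive value
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=18, fp=2, tn=28, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because the infected and control groups differ in size, accuracy alone can look favorable even when sensitivity is modest, which is why all four values are worth reporting together.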

Results

Comparison of general data between two groups

This study included 63 patients infected with H. pylori and 215 healthy controls. Statistical analysis of the clinical data showed no significant differences between the two groups in terms of age, gender, smoking history, alcohol consumption history, or comorbidities (P > 0.05; Supplementary Table S1).

Comparison of spectra between two groups

Figure 1 presents the preprocessed spectra of positive and negative samples. Extensive comparison of positive and negative spectra showed that positive samples differ significantly in the spectral bands at 930–970 cm⁻¹ and 2350 cm⁻¹. By cross-referencing with several infrared standard gas databases, these bands were identified as absorption peaks of ammonia and carbon dioxide, respectively. In individuals infected with H. pylori, ingested urea is hydrolyzed by urease, producing ammonia and carbon dioxide. Consequently, these regions of the infrared spectrum were used as the basis for subsequent qualitative and quantitative analyses.
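For reference, the overall urease-catalyzed hydrolysis of urea can be written as follows. This is the standard textbook stoichiometry rather than a result of this study, and it shows why both ammonia and carbon dioxide bands are expected to change after urea intake:

$$\mathrm{CO(NH_2)_2} + \mathrm{H_2O} \xrightarrow{\text{urease}} 2\,\mathrm{NH_3} + \mathrm{CO_2}$$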

Fig. 1 Infrared difference preprocessed spectra comparison chart for positive and negative samples.

Within the wavenumber range of 1300–2000 cm⁻¹, the “positive” line exhibits more pronounced “positive peaks” compared to the “negative” line, while the “negative” line shows more “negative peaks” in the same region. This difference is instrumental for distinguishing between the two in diagnostic discrimination models. In other wavenumber regions, variations in absorption intensity are also observed between the “positive” and “negative” lines, which further facilitates classification.

Preliminary establishment and evaluation of the predictive model

Based on the spectral data calculated from 141 exhaled breath samples of H. pylori infected patients and healthy individuals, the samples were categorized into a training set (70 samples), a test set (46 samples), and a validation set (25 samples) by applying the K-means clustering algorithm after PCA analysis. Subsequently, three models were constructed using Partial Least Squares (PLS), Support Vector Regression (SVR), and Extremely Randomized Trees (ERT) algorithms. The model with the highest accuracy was selected as the final diagnostic model based on usability criteria. Validation testing was then performed using the accuracy of positive-to-negative conversion samples to evaluate the model’s performance with new samples.
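The clustering step can be illustrated with a plain implementation of Lloyd's K-means algorithm. This is a generic sketch, not the authors' code: the text does not specify how cluster labels were mapped onto the training, test, and validation sets, so only the clustering itself is shown. The function name and the two-dimensional points, which stand in for PCA scores, are hypothetical.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical PCA scores forming two well-separated groups.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

One common way to use such clusters when partitioning data is to draw samples from every cluster into each subset so that all subsets cover the same regions of the score space; whether this was done here is an assumption.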

As shown in Table 1, among the three models, SVR and ERT demonstrated higher recognition rates for the test set, training set, and validation set. Specifically, the recognition rate of SVR reached 100% for the test set, while ERT achieved 100% for the validation set. Furthermore, when evaluating the diagnostic performance of the three machine learning models on the test set, the accuracy, precision, sensitivity, and specificity of all three models were suboptimal, with ERT showing the best diagnostic ability. Its accuracy reached 89.13%, with good specificity, sensitivity, and precision (Table 2).

Table 1 Results of preliminary machine learning models for breath test differential diagnosis.
Table 2 Diagnostic performance of three machine learning models on test set.

Establishment and evaluation of the final predictive model

Based on the spectral data calculated from a total of 278 exhaled breath samples from H. pylori infected patients and healthy individuals, the K-means clustering algorithm was used to divide the samples into a training set (170 samples), a test set (62 samples), and a validation set (46 samples). Considering the diagnostic performance of the three aforementioned machine learning models, the final predictive model was constructed using Partial Least Squares Discriminant Analysis (PLS-DA).

As shown in Table 3, the recognition rates of the predictive model for the training set, test set, and validation set were 94.12%, 98.39%, and 91.30%, respectively (Fig. 2). Therefore, the PLS-DA machine learning model based on pre- and post-medication spectral data demonstrates significant potential in diagnosing H. pylori infection.
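The reported recognition rates can be cross-checked against the set sizes. The sketch below back-calculates the number of correctly recognized samples implied by each percentage; these counts are inferred from the rounded rates and are not stated in the text, and the function name is illustrative.

```python
set_sizes = {"training": 170, "test": 62, "validation": 46}
reported_rates = {"training": 94.12, "test": 98.39, "validation": 91.30}


def implied_correct(n_samples: int, rate_percent: float) -> int:
    """Nearest whole number of correct samples for a reported percentage."""
    return round(n_samples * rate_percent / 100)


implied = {}
for name, n in set_sizes.items():
    k = implied_correct(n, reported_rates[name])
    implied[name] = k
    recomputed = 100 * k / n
    print(f"{name}: {k}/{n} = {recomputed:.2f}% (reported {reported_rates[name]:.2f}%)")
```

Each reported rate matches a whole number of samples to two decimal places, so the percentages are internally consistent with the stated set sizes.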

Table 3 Final prediction model breath test differential diagnosis result.
Fig. 2 Final predictive model breath test differential diagnosis results chart.

Discussion

Several diagnostic methods for H. pylori infection have been developed, including invasive tests and non-invasive tests such as serological tests, stool antigen tests, and the UBT. Among non-invasive methods, the 13C-UBT has been proven to be one of the most accurate diagnostic methods for detecting H. pylori infection in both adults and children26,27. Compared to biopsy, the sensitivity and specificity of the UBT have been demonstrated to range between 90 and 100%, making it the clinical diagnostic gold standard. Currently, the analysis of 13CO2 in breath samples is conducted using a highly sensitive (up to 0.01‰) but expensive and complex system based on isotope ratio mass spectrometry (IRMS)28. Therefore, this diagnostic method has drawbacks such as high cost, time-consuming operation, and potential health risks to the human body. In addition to the non-invasive UBT, laboratory serological methods such as ELISA tests, latex agglutination assays, and immunochromatographic tests, as well as H. pylori stool antigen (HpSA) detection, are widely used and represent advanced diagnostic techniques29. Researchers are also actively developing innovative diagnostic approaches that are convenient, safe, and rapid, including novel breath testing methods. Studies utilizing gas chromatography-mass spectrometry (GC–MS) for detecting H. pylori infection highlight the promising potential of these new breath analysis techniques30. A preliminary clinical study has, for the first time, employed an innovative point-contact nanosensor device based on the organic conductor tetracyanoquinodimethane (TCNQ) to identify differences in breath characteristics between H. pylori-positive and H. pylori-negative adolescents with functional dyspepsia. Furthermore, this study demonstrates encouraging potential in distinguishing pathogenic H. pylori strains from non-pathogenic ones31.

There is limited research on the application of FTIR spectroscopy to breath analysis for H. pylori detection. One study used FTIR spectroscopy in the 13C-labeled UBT, allowing highly specific measurements of 13C enrichment in breath samples32. Another study used infrared spectroscopy to analyze exhaled breath and, by detecting specific biomarkers, effectively tracked the biochemical changes associated with antibiotic treatment in H. pylori patients33, demonstrating the advantages of FTIR in breath analysis. The present study used FTIR to compare exhaled breath before and after non-labeled urea intake in 63 H. pylori infected patients and 215 healthy individuals (including 46 subjects who had been cured of H. pylori infection). The predictive model constructed using PLS-DA achieved recognition rates of 94.12%, 98.39%, and 91.30% for the training set, test set, and validation set, respectively. Currently, research on novel diagnostic methods for H. pylori infection is advancing in two major directions. The first focuses on developing more precise and detailed diagnostic techniques. For instance, one approach employs label-free SERS spectroscopy of serum combined with deep learning to analyze H. pylori infection. This method classifies infections into two types based on the presence of CagA and VacA toxins in serum: carcinogenic Type I infection and non-carcinogenic Type II infection34. Another study integrates endoscopic examination with intraoperative gastric fluid analysis to detect H. pylori infection, followed by antibiotic susceptibility testing for patients with positive results35. This approach aids in tailoring eradication strategies for H. pylori infections. In comparison, the detection method adopted in this study demonstrates significant advantages in non-invasiveness, safety, and convenience, although its diagnostic accuracy requires improvement. 
Future efforts will focus on expanding the sample size and enhancing the diagnostic model’s accuracy. The second direction emphasizes the development of rapid, non-invasive, and user-friendly diagnostic techniques. For example, a recent study employed a portable, handheld ammonia sensor to detect H. pylori infection36. In comparison, this study’s method analyzes a broader range of gas types, showcasing an advantage in multidimensional gas recognition. Moving forward, we aim to develop more portable and efficient detection devices to enhance the applicability and competitiveness of the method presented in this study.

This study establishes a method for diagnosing H. pylori infection without the use of isotopic breath tests, providing new insights and approaches for using breath analysis in the diagnosis of H. pylori infection. In future research, we plan to increase the sample size to enhance the accuracy of breath screening for H. pylori infection. Furthermore, we will continue to optimize and refine the experimental testing protocols and platform to ensure data quality, and directly analyze the compositional differences in exhaled breath between H. pylori-infected patients and healthy controls.