Introduction

Yersinia pestis, the causative agent of plague, a fatal communicable disease, is a Gram-negative, non-spore-forming bacterium that belongs to the family Enterobacteriaceae. Y. pestis has been documented as a potent biological weapon and is classified as a Tier 1 select agent by the Centers for Disease Control and Prevention (CDC) [http://www.bt.cdc.gov/agent/agentlistcategory.asp]. Plague has had a major impact on human history through three devastating pandemics that initially spread from Central Asia to Africa and Europe and subsequently reached every continent1,2. Plague is again drawing concern, as the disease has re-emerged since the 1990s in regions where no human cases had been reported for many decades. These include Zambia (1993), India (1994 and 2002), Algeria (2003), and Libya (2009)3,4,5,6. In 2015, 15 human cases of plague were reported in the US, resulting in 4 deaths7, and in late 2017, the island of Madagascar experienced a large outbreak of plague, in which 202 deaths were reported (case fatality rate 8.6%), inciting regional panic8.

Apart from Y. pestis, the genus Yersinia includes two other human pathogenic species, Y. pseudotuberculosis and Y. enterocolitica, which are gastrointestinal pathogens. All three species share a plasmid, pCD1, which encodes a type III secretion system (T3SS). The T3SS injects virulence proteins into host cells, disrupting immune responses and establishing infection9,10,11. Y. pestis evolved from its less virulent ancestor, Y. pseudotuberculosis, around 5,700-6,000 years ago12,13,14,15 through a series of genetic changes that enhanced its pathogenicity and transmission16. It acquired new genes and modified existing ones, leading to unique virulence factors and adaptation to flea vectors, which is crucial for spreading plague. Key changes include acquisition of the pMT1 plasmid, which carries the murine toxin (Ymt) that allows the bacteria to survive in the flea midgut and the capsular F1 protein that interferes with complement-mediated opsonisation and thereby hinders phagocytosis by macrophages, and of the pPCP1 plasmid, which carries the plasminogen activator (Pla) responsible for dissemination in mammals12,17. Y. pestis also lost a few genes that are functional in Y. pseudotuberculosis, reflecting its shift to a vector-borne pathogen.

Plague has three documented clinical forms: bubonic, septicaemic, and pneumonic plague18. In natural bubonic plague, bacteria are generally transmitted by the bite of an infected flea. The Oriental rat flea (Xenopsylla cheopis) is the classical vector for plague. Plague epizootics result in the deaths of many rodents, causing hungry fleas to search for another host. People or animals visiting the epizootic zone become infected through flea bites2. It was long believed that Y. pestis is unable to survive outside a flea or a mammalian host19. However, Alexander Yersin isolated plague bacteria from the soil of a house whose inhabitants had died of plague20, suggesting persistence of Y. pestis in the environment (soil or water). Researchers later documented the isolation of Y. pestis from a burrow of dead plague-infected rodents21,22 and experimentally showed the persistence of Y. pestis in soil for more than 40 weeks23. Earlier studies have shown that Y. pestis remained culturable for 2–21 days in tap water and ≥ 100 days in bottled drinking water24,25,26. Although long-term survival of Y. pestis in soil remains controversial, it has been suggested that some animals could acquire Y. pestis by burrowing in contaminated soil and thus initiate a new rodent/rodent flea cycle of transmission. Experiments carried out by the Indian Plague Commission in 1906 showed that rodents can acquire plague by burrowing in artificially contaminated soil23. Therefore, exposure to environmental matrices accidentally or intentionally contaminated with Y. pestis poses a real threat27, necessitating credible and efficient detection of Y. pestis from contaminated environmental sources. Direct detection of Y. pestis is performed by nucleic acid-based or immunometric assays such as the plasminogen activator protein enzyme immunoassay (PLA-EIA) and PLA-dipstick28,29,30,31. However, complex environmental matrices such as soil may contain substances, e.g. humic acid, that interfere with nucleic acid-based detection, resulting in false positive or negative results32,33.

New advancements in mass spectrometry (MS) technology offer precise, sensitive, and specific identification of microbes. The MS approach enables both targeted and comprehensive shotgun detection of microbes in complex matrices such as air, water, soil, culture medium, physiological fluids, and food because of its ability to simultaneously analyze multiple analytes (molecules of interest) in a single experiment34. Currently, MS platforms such as MALDI-TOF-MS are routinely used in clinical diagnostic microbiology, where they offer ease of use and very short turnaround times35. The MALDI-TOF-MS approach provides genus- and species-level identification of unknown microbes by matching microbial protein spectra against a mass spectral library of microorganisms with known taxonomic identity. However, the technology does not provide sequence-based identification, and it has reduced discriminatory power due to the relatively low resolution of protein signals over a wide concentration range. This results in overlapping peaks, making it challenging to distinguish between closely related species, as they have very similar mass profiles36. Differentiating high-consequence pathogens from related bacterial species is challenging, posing a significant concern for clinical diagnostic laboratories, especially in a biothreat context.

In contrast to MALDI-TOF MS, nano liquid chromatography coupled to tandem MS (nLC-MS/MS) offers very high resolution and mass accuracy in a single run, which results in the detection and identification of large numbers of signals with greater precision37. The combination of MS with nLC allows the detection of closely related peptides with high mass accuracy, which is useful in the detection of target organisms in complex samples, such as biological fluids or environmental samples, where numerous background compounds are likely to be present such as inorganic or organic compounds, peptides, and proteins of other microbial community along with their lipids and metabolites38. LC-MS/MS utilizes the shotgun or bottom-up proteomic approach for mass determination of proteolytic cleavage products, typically tryptic peptides, rather than intact proteins39, which leads to the generation of a diverse set of peptides that collectively represent the entire proteome of an organism. This results in better signal intensity and increased sensitivity for detection and quantification of peptides, including those present at low abundance levels. Thus, nLC-MS/MS shotgun proteomic workflow involving enzymatic digestion of complex protein mixtures into peptides provides a powerful approach for high-throughput identification of target microorganisms in complex biological and environmental samples, making it a valuable tool in areas such as biomarker discovery, systems biology, and drug development40.

In this study, we developed a peptide-based screen for the specific and credible identification of Y. pestis. The screen was developed by delineating the total proteomes of Y. pestis and related Yersinia species, followed by elucidating the Y. pestis specific proteotypic peptides through exclusion of common peptides and global pBLAST analysis. Peptides from plasmid-encoded and chromosomal virulence marker proteins were shortlisted by a bioinformatics approach and also incorporated into the screen. The developed peptide screen was validated using soil samples spiked with different concentrations of Y. pestis. The detection of a sizable number of these peptides in spiked soil suggests the usefulness of the developed method for identification of Y. pestis.

Results

Total proteome analysis of Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica

The three Yersinia species were grown in BHI broth and total protein was extracted. The cellular proteins were characterized by nLC-MS/MS, and the spectra obtained after MS2 analysis were matched against the respective UniProt databases. This led to the identification of 2,355 ± 33 proteins and 25,735 ± 130 peptides of Y. pestis. The identified proteome included 6 proteins (70 peptides) of plasmid pPCP1, 22 proteins (151 peptides) of plasmid pCD1, and 21 proteins (163 peptides) of plasmid pMT1. The remaining proteins and peptides were encoded by the chromosomal genes of Y. pestis (Fig. 1). Similarly, 2,152 ± 20 proteins and 22,303 ± 888 peptides of Y. pseudotuberculosis, and 2,062 ± 22 proteins and 17,260 ± 668 peptides of Y. enterocolitica, with 0.01% FDR confidence and 0.01 to 0 experimental q values, were identified after nLC-MS/MS. The detailed data on identified proteins and peptides for Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica in replicates (n = 3) are summarized in Supplementary Fig. S1 (A-B). A representative total ion chromatogram (TIC) of the three protein samples is shown in Supplementary Fig. S2.

Fig. 1

Distribution of proteins and peptides from Y. pestis on the three plasmids and chromosome, identified after nLC-MS/MS analysis.

Elucidation of putative Y. pestis specific peptides using wet lab data

To determine the differential peptides for Y. pestis, peptides identified in Y. pseudotuberculosis and Y. enterocolitica were subtracted from the total of 25,735 peptides identified for Y. pestis. A total of 6,580 differential peptides were identified (Fig. 2) and subjected to global pBLAST search. The percentage identity, coverage, and E-value of the nearest homolog of each peptide in species other than Y. pestis were listed. A total of 61 peptides showing ≤ 99% identity at 100% sequence coverage with the closest homolog in a bacterial species other than Y. pestis were identified as species-specific proteotypic peptides (Supplementary Table S1). The percentage identity of these species-specific proteotypic peptides ranged from 84 to 98% (58 peptides ≤ 96%; 26 peptides ≤ 93%; 9 peptides ≤ 90%; and 1 peptide ≤ 85%). A peptide showing less than 100% sequence identity with every other bacterial species differs by at least one amino acid residue and, as a consequence, provides a unique precursor mass for the envisaged MS-based assay for the targeted detection of Y. pestis.
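The link between a single residue difference and the identity values reported above can be illustrated with a minimal Python sketch. It assumes an ungapped alignment over the full peptide length (100% coverage); the function name and both sequences are hypothetical and are not peptides from this study.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent identity for an ungapped, full-length alignment of two peptides."""
    if len(query) != len(subject):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Hypothetical 25-residue peptide and a homolog with one substitution (F -> Y).
query = "AVDLSGNTEQLLGAFTDIPAEWQSK"
homolog = "AVDLSGNTEQLLGAYTDIPAEWQSK"

print(percent_identity(query, homolog))  # 24 of 25 residues match
```

Under these assumptions a single substitution in a 25-residue peptide corresponds to 96% identity, and the same single difference in a shorter peptide gives a lower identity value.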

Fig. 2

Y. pestis specific peptides obtained from the total proteome nLC-MS/MS analysis of Y. pestis, Y. enterocolitica, and Y. pseudotuberculosis.

Elucidation of species-specific proteotypic peptides using in silico analysis

Y. pestis plasmids pMT1, pCD1, and pPCP1 are known to code for virulence-associated proteins. The FASTA sequences of all the plasmid-encoded proteins and chromosomal virulence-associated proteins were subjected to global pBLAST analysis as described in the Materials and Methods. After pBLAST analysis, 45 proteins from pMT1, 23 proteins from pCD1, and 2 chromosomal virulence proteins exhibited < 100% identity with closely related species, as listed in Supplementary Tables S2 (A-B) and S3. The proteins from pPCP1 showed no significant difference from those of other related species. These putative Y. pestis specific proteins were subjected to in silico trypsin digestion, and the resulting peptides (> 1000 Da) were subjected to global pBLAST. A total of 134 peptides from pMT1, 11 from pCD1, and 3 from chromosomal virulence-associated proteins were shortlisted based on the sequence similarity results (≤ 99% identity with the closest species) (Supplementary Table S4 A-C).

Consolidated list of Y. pestis specific proteotypic peptides for detection

The schematic workflow for the development of the putative peptide screen is illustrated in Fig. 3. Out of 6,580 peptides identified by nLC-MS/MS analysis, 61 peptides were found to be Y. pestis specific, and a further 148 peptides were elucidated by in silico analysis. The consolidated list of 209 Y. pestis specific peptides obtained through combined wet lab and in silico analysis was termed the peptide screen for identification of Y. pestis (Supplementary Table S5). Interestingly, we found that 8 of the 61 peptides from wet lab experiments could be traced back to 8 proteins with molecular masses in the 2–20 kDa range. Likewise, 49 of the 148 peptides from in silico work could be traced back to 34 proteins in the same mass range, suggesting that these proteins may be relevant as characteristic peaks for primary identification of Y. pestis by MALDI-TOF-MS (Supplementary Table S6).

Fig. 3

Schematic workflow for the development of the putative peptide screen.

Validation of peptide screen for detection of Y. pestis

To validate the above Y. pestis specific peptide screen, we attempted to detect these peptides in the total proteome of soil samples spiked with Y. pestis. The garden soil used for the spiking experiment was composed of 42% sand, 21.2% silt, and 36.8% clay, resulting in a clay loam texture with pH 8.5. Additionally, the soil exhibited an electrical conductivity of 0.55 dS m⁻¹, an organic carbon content of 0.48%, an available nitrogen content of 192.0 kg/ha, a phosphorus content of 10.1 kg/ha, and a potassium content of 178.0 kg/ha. The bacterial flora of the garden soil sample, determined by 16S rRNA metagenomic analysis, included various bacterial species, e.g. Bacillus thuringiensis, B. licheniformis, B. tropicus, B. paramycoides, B. velezensis, Aneurinibacillus migulanus, Azospirillum sp., Staphylococcus xylosus, and Staphylococcus succinus. Y. pestis bacteria were spiked into the soil, and the samples were incubated overnight at room temperature. Viable counts showed that the garden soil samples (per g) were spiked with 2.6 × 10⁷ to 2.6 × 10⁴ cfu of Y. pestis, and viable Y. pestis could be cultured from all spiked soil samples after overnight incubation. Total protein from spiked and un-spiked control soil samples was extracted and subjected to nLC-MS/MS analysis as described in the Materials and Methods section. After the MS analysis, the total peptides were screened for the presence of the Y. pestis specific peptides elucidated in this study. The numbers of peptides detected were 52 in the very high (1.3 × 10⁸ cfu), 49 in the high (1.3 × 10⁷ cfu), 34 in the medium (1.3 × 10⁶ cfu), and 18 in the low (1.3 × 10⁵ cfu) spiked samples, while no peptide was detected in the blank sample (un-spiked soil). The results are shown in Fig. 4 and Table 1. A representative total ion chromatogram (TIC) of the soil samples is shown in Supplementary Fig. S3.
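The screening step described above, in which the peptides identified in each soil sample are checked against the peptide screen, can be sketched in Python as a set intersection. This is an illustration only: the function name, sample labels, and sequences are hypothetical, and real input would be the 209-peptide screen and the peptide identifications exported for each sample.

```python
def screen_hits(detected_peptides, peptide_screen):
    """Return the screen peptides found among the peptides detected in a sample."""
    return set(detected_peptides) & set(peptide_screen)


# Hypothetical toy data standing in for the screen and per-sample identifications.
peptide_screen = {"LVDEGTWAVNR", "TTGQAELPYK", "SFDNAIGWQR"}
samples = {
    "spiked_high": ["LVDEGTWAVNR", "AAVLK", "SFDNAIGWQR"],
    "spiked_low": ["TTGQAELPYK", "GDTYR"],
    "blank": ["AAVLK", "GDTYR"],
}

# Count how many screen peptides were observed in each sample.
hit_counts = {
    name: len(screen_hits(peptides, peptide_screen))
    for name, peptides in samples.items()
}
print(hit_counts)
```

In this toy input the blank sample yields no hits, mirroring the way the un-spiked soil is used as a negative control.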

Fig. 4

Schematic flowchart representing the validation of peptide screen.

Table 1 Validation of Y. pestis spiked soil samples using the proposed peptide-based screen.

Discussion

Y. pestis primarily resides in flea vectors and rodent hosts. The bacteria are transmitted to humans by the bite of an infected flea; however, humans are accidental hosts, and disease occurs when they enter the epizootic zone. The apparent resurgence of plague may be attributed to the fact that Y. pestis can endure for extended periods within its natural ecological habitat without causing any human infections. There are accumulating data suggesting the presence of Y. pestis in environmental matrices, with a primary focus on rodent burrows, soil, and water21,22,23,24,25,26. Apart from soil and water, persistence of Y. pestis on environmental surfaces, particularly stainless steel, polyethylene, glass, and paper, has also been documented41. In addition, there is a risk of intentional spread of Y. pestis in the environment with the aim of using it as a biological weapon27. Thus, to mitigate this disease, an accurate and credible method for identification of Y. pestis from environmental matrices is required.

In this study, our primary objective was to develop a robust peptide-based screen for the sensitive and specific identification of Y. pestis, primarily from challenging environmental samples. Combining a shotgun MS approach with in silico analysis, our integrated methodology sought to obtain sequence information for a diverse array of peptides specific to Y. pestis. Subsequently, we aimed to create an inclusion list comprising unique species-specific peptides to facilitate the unequivocal identification of Y. pestis in complex environmental matrices. Notably, during the analysis, we rigorously eliminated peptides that were common to other closely related Yersinia species, ensuring that the selected peptides were distinctive to Y. pestis.

In this study, 2,355 proteins and 25,735 peptides of Y. pestis were identified using nLC-MS/MS, which leverages nano-flow chromatography with smaller column diameters and reduced flow rates. This enhances peak sharpness, improving the separation of complex mixtures and increasing signal-to-noise ratios, which is particularly advantageous for low-abundance proteins or trace-level analytes. The fragmentation of selected ions in nLC-MS/MS provides detailed sequence information, aiding discrimination of closely related compounds and reducing false positives. Previous studies employing various mass spectrometry techniques have reported a limited number of identified proteins for Y. pestis. For instance, Jabbour et al. (2010)42 reported identification of 182 proteins in the whole cell proteome of Y. pestis through liquid chromatography-tandem mass spectrometry. Another study, utilizing different strategies such as 2D SDS-PAGE and liquid chromatography with LTQ-FT, identified 2000 proteins43. To our knowledge, the present study reports the deepest coverage of the Y. pestis proteome obtained using LC-MS/MS analysis.

Arguably, Y. pestis diverged from Y. pseudotuberculosis roughly 2,500 years ago and shares approximately 97% nucleotide identity at the whole genome sequence level44,45,46. It is important to discriminate between these closely related species for verification of Y. pestis in environmental or clinical samples suspected to contain the plague-causing pathogen. We analyzed the total proteins and peptides from Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica for a differential elucidation of putative Y. pestis specific peptides. Of the total 25,735 peptides of Y. pestis identified here, 6,580 peptides were differentially present in Y. pestis as compared to the two closely related species Y. enterocolitica and Y. pseudotuberculosis. Out of these 6,580 peptides, 61 were identified as specific to Y. pestis following global pBLAST analysis. To broaden the comprehensiveness of the screen, an additional 148 Y. pestis-specific peptides (134 from pMT1, 11 from pCD1, and 3 from chromosomal virulence factors) were included after in silico analysis.

In the last decade or so, MALDI-TOF-MS has replaced biochemical assay-based identification of bacteria in clinical microbiology settings47,48,49. Initially developed to identify Bacillus anthracis spores and differentiate them from B. cereus in powder samples, MALDI-TOF-MS offers efficient, rapid identification. The technique is well suited for routine diagnostic microbiology because of reduced turnaround times, costs, and overall labour50. In the present work, we found that 57 Y. pestis specific peptides could be traced back to 42 proteins with molecular masses in the 2–20 kDa range, suggesting that these proteins may be relevant as characteristic peaks for primary identification of Y. pestis by MALDI-TOF-MS.

Virulent strains of Y. pestis contain plasmids coding for virulence-associated proteins. For instance, the Y. pestis S1 strain contains three plasmids (pMT1, pCD1, and pPCP1) coding largely for virulence-associated proteins. The expression of virulence-associated proteins coded by the pMT1 plasmid is likely to be higher at 37 °C51. Moreover, adding certain supplements to liquid broth medium enhances the expression of virulence determinants in Y. pestis. For instance, to express the F1 protein from plasmid pMT1, 0.2% xylose is added to the broth medium and the culture is grown at 37 °C instead of the optimal growth temperature of 28 °C52. Similarly, for the expression of V protein derived from plasmid pCD1, the liquid broth medium is supplemented with either 2.5 mM CaCl₂ or 20 mM MgCl₂ and the cells are grown at 37 °C53,54,55. In the present investigation, we deliberately cultivated the Y. pestis S1 strain at 28 °C to mimic the natural conditions the bacterium may encounter in soil. This also gave greater insight into the Y. pestis specific proteome encoded by the chromosome rather than the plasmid-coded virulence-associated proteins, which are already well documented. This choice of growth temperature likely resulted in reduced expression of proteins from pMT1, as evident from the peptide results, providing an opportunity to screen for additional species-specific chromosomal peptides. In this study, only 21 proteins from pMT1 and 22 from pCD1 were observed to be expressed at 28 °C, and none of these were classical virulence factors, e.g. F1 or V antigen. Of these proteins, only 1 from pMT1 could be represented in the peptide screen.

To validate the developed peptide screen, we chose to use artificially contaminated soil. The reason for choosing soil as the matrix was twofold. First, in a scenario of intentional aerosol release of this pathogen, sampling of environmental soil may be required to collect evidence. Second, as Y. pestis has been shown to survive in soil, there is a possibility of transmission of Y. pestis to rodents from contaminated soil, which can initiate a new rodent/rodent flea cycle of transmission56. We found that Y. pestis specific peptides could be detected in all spiked soil samples, suggesting the usefulness of the developed peptide screen.

Several factors may influence the survival of bacteria in soil, e.g. relative humidity (RH), nutritional sources, and temperature57,58. Further, under adverse conditions, bacteria express proteins that help them survive and that are crucial for adaptation to stress, nutrient scarcity, and changing conditions59,60,61. A number of these proteins, e.g. cold shock proteins (CspA family), acyl-CoA thioesterase, peroxiredoxin/glutaredoxin family proteins, putrescine-binding periplasmic protein, HTH-type transcriptional regulator (SgrR), and the vitamin B12 import system (BtuC), have been reported to be expressed in various bacteria during adverse conditions62,63,64. After overnight incubation in soil, Y. pestis could still be cultured the next day. Interestingly, a total of 131 proteins were observed to be uniquely expressed in Y. pestis from spiked soil compared to the pure Y. pestis culture. These 131 proteins included the above-mentioned stress proteins, suggesting that Y. pestis adapted to the new environment. However, none of the peptides originating from these 131 uniquely expressed proteins were specific to Y. pestis, and they therefore did not form part of the developed peptide screen.

The nLC-MS/MS method provides deep coverage of the bacterial proteome, uncovering previously unknown peptide markers that can be used to analyze complex samples for identification and proteomic characterization of Y. pestis. Moreover, this approach offers enhanced sensitivity, enabling the detection of proteins of lower abundance, coupled with a broader dynamic range. The list of Y. pestis specific peptides reported here can be used for developing a robust MS-based targeted assay for quantitative analysis. The proof-of-concept study using spiked soil samples indicated the strength of our peptide screen in identifying Y. pestis in a complex environmental milieu, in a probable biothreat scenario and beyond. The current methodology can also be employed in clinical settings for disease diagnosis, providing deeper analysis with a diverse range of proteins or peptides.

In conclusion, the present study provides the deepest proteome analysis with coverage from chromosomal proteins for elucidation of Y. pestis specific peptides. The differential analysis of the proteome using the closely related species of Y. pestis (Y. enterocolitica and Y. pseudotuberculosis) strengthens the specificity component of the peptide screen alongside the global BLAST analysis of putative markers. The outcomes of the validation study affirm that the peptide screen, developed through thorough wet lab analysis using nLC-MS/MS, literature mining, and bioinformatic analysis of peptides derived from Y. pestis, effectively identified the target bacteria in spiked soil samples at concentrations ranging from 1.3 × 10⁸ cfu to 1.3 × 10⁵ cfu. Hence, the peptide-based screening approach designed for the identification and proteomic characterization of Y. pestis in soil has the potential for application across diverse environmental and clinical matrices. The consolidated list of Y. pestis specific peptides can potentially be used for SWATH MS analysis and for developing targeted assays such as Selected Reaction Monitoring (SRM) and Multiple Reaction Monitoring (MRM) for detecting Y. pestis in diverse and complex matrices. Additionally, several peptides could be traced back to proteins that may be relevant for spectra generation for identification of Y. pestis using MALDI-TOF-MS. We believe a similar approach may be used to generate peptide databases for credible detection of other bacterial species.

Materials and methods

Bacterial strains and growth conditions

Yersinia pestis strain S1, Yersinia pseudotuberculosis NCTC 10275, and Yersinia enterocolitica NCTC 10460 were used in this study. The Y. pestis S1 strain was isolated from a patient during the Shimla outbreak65 and belongs to the Antiqua biovar. It was retrieved from the glycerol stock (−80 °C) of the Defence Research and Development Establishment’s (DRDE) culture repository and grown in Brain Heart Infusion (BHI) broth (Sigma Aldrich, USA) at 28 °C for 48 h with shaking (150 rpm). Y. pseudotuberculosis NCTC 10275 and Y. enterocolitica NCTC 10460 strains were obtained from the National Collection of Type Cultures (NCTC), U.K., and grown at 37 °C in BHI broth for 24 h with shaking (150 rpm). All manipulations with live Y. pestis, e.g. extraction of total cellular protein lysate and preparation of whole cell protein from the spiked soil samples, were carried out in the High Containment Facility, a biosafety level 3 facility.

Extraction of total cellular protein

The total cellular protein of Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica was extracted by the method of Rajoria et al.66 with modifications. Briefly, 5 ml bacterial cultures were harvested by centrifugation at 8000 × g for 30 min at 4 °C and washed with 20 mM Tris-Cl buffer (pH 7.5). The culture pellet was re-suspended in extraction buffer (2% SDS, 100 mM DTT, 20 mM Tris-Cl, pH 8.8) and incubated at 95 °C for 20 min. The supernatant was collected after centrifugation at 20,000 × g for 10 min and precipitated with trichloroacetic acid (TCA, 10% w/v) and β-mercaptoethanol (0.07% v/v) on ice for 2 h. Afterwards, the protein pellet was collected by centrifugation at 10,000 × g for 20 min at 4 °C. Using the method described earlier, the protein pellet was air-dried for 5 min at room temperature and solubilized in 200 µl of SDS-lysis buffer (5% SDS (w/v) in 0.1 M Tris, pH 8.8). The protein samples from each bacterial strain were prepared in triplicate, and their concentrations were determined using the Pierce™ BCA Protein Assay Kit (Thermo Scientific, U.S.) and subjected to nLC-MS/MS analysis using the custom services at m/s Valerian Chem Private Ltd., New Delhi.

In-solution tryptic digestion

To reduce disulfide bonds, 50 µg of protein sample was treated with 5 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP, Sigma Aldrich, USA). After TCEP treatment, free sulfhydryl groups of cysteine residues were alkylated with 50 mM iodoacetamide (IAA, Sigma Aldrich, USA) to form S-carboxyamidomethyl-cysteine and further digested with trypsin (Promega, USA) (1:50 trypsin/lysate ratio) for 16 h at 37 °C. The protein digests were cleaned using a C18 silica cartridge (Pierce™ C18 Spin Tips & Columns, catalogue number 89870) to remove the salt and dried using a speed vac (Eppendorf 5424R, concentrator plus). The dried enzymatically digested protein was diluted in buffer A [2% acetonitrile (J.T. Baker™, HPLC grade) and 0.1% formic acid (Fisher Scientific, HPLC grade)] and injected in triplicate into the nLC-MS/MS system.

Mass spectrometric analysis of tryptic protein digests

The mass spectrometric analysis was performed on an EASY-nLC 1000 system coupled with an Orbitrap Exploris™ 240 mass spectrometer (Thermo Fisher Scientific, USA). The chromatographic separation of the peptide sample was performed on an Acclaim PepMap trap C18 column (75 μm × 70 mm with 3.0 μm particle size and 100 Å porosity) (Thermo Fisher Scientific, USA). Desalted peptides (1 µg) were separated using buffer B (80% acetonitrile, 0.1% formic acid) as the mobile phase at a flow rate of 300 nl/min with a linear gradient of 0–40% over 110 min. The peptides eluting from the column were introduced directly into the electrospray source of the Orbitrap Exploris mass spectrometer for MS1 and MS2 analysis. MS1 spectra were acquired with the maximum injection time (Max IT) set to 25 ms, the automatic gain control (AGC) target at 300%, the radio frequency (RF) lens at 70%, a resolving power (R) of 60,000, and a mass range of 375–1500 m/z. To exclude all charge states of a given precursor from repeated selection, a dynamic exclusion period of 30 s was employed. The 20 most abundant precursor ions were then selected for MS2 analysis, with Max IT set to 60 ms, resolution (R) at 15,000, and the AGC target at 100%. The RAW files generated from all the samples were analyzed with Proteome Discoverer (v2.5) against the respective UniProt Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica databases. The following search parameters and filters were used: (i) for the dual Sequest and Amanda searches, the precursor and fragment mass tolerances were set at 10 ppm and 0.02 Da, respectively; (ii) the enzyme specificity was set to trypsin/P (cleavage at the C terminus of “K/R” unless followed by “P”); (iii) carbamidomethylation of cysteine was designated as a fixed modification, while oxidation of methionine and N-terminal acetylation were considered variable modifications for the database search; and (iv) both the peptide spectrum match and protein false discovery rate were established at 0.01% FDR.
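To make the two mass tolerances concrete, the following minimal Python sketch shows how a relative tolerance in ppm and an absolute tolerance in Da are applied to an observed versus theoretical value. It is not the search engines' implementation; the function names are hypothetical and the numeric values in the usage lines are invented for illustration.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_within_tolerance(observed, theoretical, tol_ppm=10.0):
    """Relative tolerance, as used here for precursor ions (10 ppm)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_within_tolerance(observed, theoretical, tol_da=0.02):
    """Absolute tolerance, as used here for fragment ions (0.02 Da)."""
    return abs(observed - theoretical) <= tol_da


# Invented values: about 5 ppm and 20 ppm away from a theoretical value of 1000.
print(precursor_within_tolerance(1000.005, 1000.0))  # inside the 10 ppm window
print(precursor_within_tolerance(1000.020, 1000.0))  # outside the 10 ppm window
print(fragment_within_tolerance(500.015, 500.0))     # inside the 0.02 Da window
```

A ppm window scales with the mass being measured, whereas the Da window is fixed, which is why the two tolerances are specified in different units.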

Elucidation of Y. pestis specific peptides

The peptides obtained from Y. pestis after mass spectrometric analysis (Supplementary Fig. S1) were compared with those from Y. pseudotuberculosis and Y. enterocolitica. From the three bacterial species proteome dataset, peptides unique to Y. pestis were shortlisted. These Y. pestis specific putative marker peptides were further subjected to in silico bioinformatic analysis.
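The comparison described above amounts to a set subtraction, which can be sketched in Python as follows. This is an illustrative sketch, assuming each species' identifications are available as plain peptide sequence strings; the function name and toy sequences are hypothetical.

```python
def differential_peptides(target, *related):
    """Return peptides seen in `target` but in none of the `related` sets."""
    unique = set(target)
    for other in related:
        unique -= set(other)
    return unique


# Hypothetical toy sequences, not peptides from this study.
y_pestis = {"AAVLK", "GDTYR", "LLNQEK", "MSTPIVR"}
y_pseudotuberculosis = {"AAVLK", "TTGGR"}
y_enterocolitica = {"GDTYR", "WQPLK"}

print(sorted(differential_peptides(y_pestis, y_pseudotuberculosis, y_enterocolitica)))
```

Exact string matching is the simplest possible comparison; peptides that survive it still need the database-wide check against other species described in the next subsection.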

Bioinformatic analysis

In the preliminary step of bioinformatic analysis, the FASTA sequences of the shortlisted Y. pestis specific peptides were subjected to global protein BLAST (pBLAST) against the non-redundant protein database at the National Center for Biotechnology Information (NCBI). The percent identity, query coverage, and E value (≥ 0) were listed for each selected peptide. A peptide with percent identity < 100% to every other species (including the closely related Y. pseudotuberculosis and Y. enterocolitica) was considered unique to Y. pestis, as this indicated a difference of at least one amino acid residue and hence a peptide mass difference in the envisaged downstream targeted MS analysis. Strain coverage of the shortlisted peptides was determined through taxon-specific pBLAST searches and by counting the number of entries for Y. pestis strains in the alignment data generated by ClustalW (https://www.genome.jp/tools-bin/clustalw) analysis.
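The uniqueness criterion can be sketched in Python as a filter over BLAST hits. This is a simplified illustration, assuming the hits have already been parsed into records carrying the peptide, the organism name, the percent identity, and the query coverage; the record layout, function name, and example hits are hypothetical and do not reproduce the actual search output.

```python
from collections import namedtuple

# Hypothetical record layout for one parsed BLAST hit.
Hit = namedtuple("Hit", ["peptide", "organism", "identity", "coverage"])


def species_specific(hits, target="Yersinia pestis"):
    """Peptides with no identical, full-coverage hit outside the target species."""
    all_peptides = {h.peptide for h in hits}
    shared = {
        h.peptide
        for h in hits
        if not h.organism.startswith(target)
        and h.identity >= 100.0
        and h.coverage >= 100.0
    }
    return all_peptides - shared


# Invented example hits for two made-up peptides.
hits = [
    Hit("LVDEGTWAVNR", "Yersinia pestis", 100.0, 100.0),
    Hit("LVDEGTWAVNR", "Yersinia pseudotuberculosis", 90.9, 100.0),
    Hit("AGTLLNEQYK", "Yersinia pestis", 100.0, 100.0),
    Hit("AGTLLNEQYK", "Yersinia enterocolitica", 100.0, 100.0),
]
print(species_specific(hits))
```

Here the second peptide is rejected because an identical, full-length match exists in another species, while the first is retained because its closest non-target homolog differs by one residue.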

In silico mining of Y. pestis specific peptides

The plasmid-encoded proteins and chromosomal virulence-associated proteins of Y. pestis were mined through a literature survey to look for species-specific peptides using an in silico approach. Y. pestis harbors three plasmids, pMT1, pCD1, and pPCP1, which code for 99, 89, and 9 proteins, respectively (https://www.uniprot.org/dataset/identifier). Twenty-four virulence-associated chromosomal proteins of Y. pestis were shortlisted by extensive literature mining in PubMed using a set of appropriate keywords such as protein biomarkers, virulence determinant, immune-protection, and chromosomal virulence factors (Supplementary Table S3). To select putative species-specific proteins, the FASTA sequence of each protein was retrieved from UniProt and subjected to global protein BLAST (pBLAST) against the non-redundant protein database at the NCBI. Proteins were shortlisted for further analysis if they showed ≤ 99% sequence identity (at 100% sequence coverage) with the closest homolog in a bacterial species other than Y. pestis. The sub-cellular localization of the shortlisted proteins was predicted by a deep neural network algorithm using the DeepLoc 2.0 tool (https://services.healthtech.dtu.dk/services/DeepLoc-2.0).

The FASTA sequences of the selected putative marker proteins were subjected to in silico tryptic digestion using the Peptide Mass algorithm at ExPASy Proteomics Tools (http://www.expasy.org). After in silico trypsin digestion, peptides with a molecular mass of ≥ 1000 Da were subjected to global pBLAST against the non-redundant protein database at NCBI (http://www.ncbi.nlm.nih.gov; site last accessed on 11th April, 2024), and the peptides showing < 100% sequence identity with the closest homolog in a bacterial species other than Y. pestis were shortlisted as putative species-specific peptides. The unique peptides were further screened for strain coverage as described earlier.
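The digestion and mass filter can be sketched in Python as follows. This is not the Peptide Mass implementation: it assumes cleavage after K/R except before P, no missed cleavages, unmodified residues, and neutral monoisotopic masses, whereas the settings actually used in the tool are not stated here. The function names and the test sequences are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def candidate_peptides(sequence, min_mass=1000.0):
    """Tryptic peptides whose mass is at least min_mass Da."""
    return [p for p in tryptic_digest(sequence) if monoisotopic_mass(p) >= min_mass]
```

Each peptide returned by `candidate_peptides` would then go through the same identity filter applied to the experimentally observed peptides.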

Validation of the peptide screen in soil sample

To validate the peptide screen identified above, Y. pestis was spiked into garden soil samples. The soil sample was collected from the garden of the Institute, and its physicochemical parameters were analyzed at the Department of Soil Science, RVS Agriculture University, Gwalior, India. The background bacterial load in the soil sample was evaluated through a culture-based method, and the colonies obtained were identified by 16S rRNA gene sequencing using the custom services of M/s Eurofins Genomics India Pvt. Ltd., Bengaluru. A culture of Y. pestis strain S1 (5 ml), grown at 28 °C for 48 h as described above, was washed twice and suspended in 50 mM Tris-HCl (pH 7.5). Ten-fold dilutions of the culture were prepared in 50 mM Tris-HCl, and each dilution was used to inoculate 5 g of garden soil. Un-spiked soil samples were treated as blank control samples. The inoculated soil was incubated overnight at room temperature (25 °C) to simulate natural binding of the bacterial cells to the matrix. The actual count of bacterial cells used for inoculation of the soil was determined by viable plate count.

Total protein from spiked and blank soil samples (5 g) was extracted by the method of Rajoria et al.66 with modifications and quantified, and the whole extracted protein was subjected to nLC-MS/MS analysis as described above. Briefly, spiked soil (5 g) was resuspended in distilled water (2.5 ml) and alkaline-SDS buffer (3.75 ml; 5% SDS, 50 mM Tris-HCl, 0.15 M NaCl, 0.1 mM EDTA, 1 mM MgCl2, 50 mM DTT) and intermittently vortexed for 10 min at room temperature. The mixture was heated at 95 °C for 10 min and allowed to cool to room temperature. After centrifugation at 10,000 × g for 30 min at room temperature, the supernatant was collected and the protein was precipitated with trichloroacetic acid (TCA) at a final concentration of 10% (w/v) in the presence of β-mercaptoethanol (0.07%). After 2 h of incubation on ice, the protein pellet was collected by centrifugation (10,000 × g, 4 °C, 10 min) and washed twice with 500 µl of acetone. The pellet was air-dried for 5 min. As a modification of the original method, the protein pellet was solubilized in 200 µl of SDS-lysis buffer (5% SDS (w/v) in 0.1 M Tris, pH 8.8). The total proteome of each sample was identified by searching against the Y. pestis database. The identified peptides were then matched against the Y. pestis specific peptide screen.

Statistical analysis

Microsoft Excel (Microsoft, USA) was used to perform the descriptive statistical analysis. Protein and peptide values from triplicate measurements are presented as the mean ± standard deviation (SD).
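For readers who wish to reproduce the summary statistics outside a spreadsheet, the Python sketch below computes the mean ± SD of a set of replicate values. It assumes the sample standard deviation (n − 1 denominator), since the text does not state which SD formula was used; the function names and the triplicate values are hypothetical.

```python
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of replicate values."""
    return statistics.mean(values), statistics.stdev(values)


def format_mean_sd(values, digits=2):
    """Format replicate values as 'mean ± SD'."""
    mean, sd = mean_sd(values)
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"


# Hypothetical triplicate values, for illustration only.
summary = format_mean_sd([1.0, 2.0, 3.0])
```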

Data availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE67 partner repository (http://www.ebi.ac.uk/pride) with the data set identifier PXD054825. Data generated or analyzed during this study are provided in the Supporting Information.