Background & Summary

The thyroid, an essential endocrine organ, is frequently affected by thyroid nodules, a prevalent type of endocrine tumor1. Recent statistics indicate a consistent increase in the incidence of thyroid nodules, affecting over half of the population, with an estimated 5–15% being malignant2. Thyroid cancer ranks as the ninth most common cancer in the world3. Approximately 30% of thyroid nodules are indeterminate preoperatively, which is one of the great challenges in clinical practice. Thyroid carcinomas are roughly classified based on histopathology into differentiated thyroid carcinoma (papillary (PTC, ~84%), follicular (FTC, ~4%), and oncocytic thyroid carcinoma (~2%)), poorly differentiated thyroid carcinoma (PDTC, ~5%), and anaplastic thyroid carcinoma (ATC, ~1%) of epithelial cell origin, as well as medullary thyroid carcinoma (MTC, ~4%) of C cell origin4,5. There are two extremes in the prognosis of patients with thyroid cancer. Although approximately 99% of thyroid cancer patients survive beyond 20 years (statistics from The Surveillance, Epidemiology, and End Results program database), once a patient is identified with undifferentiated cancer, their median survival drops to only 6.5 months6, with nearly 100% disease-related mortality7. Therefore, accurate stratification and appropriate management of the thyroid at different risks are crucial. For low-risk thyroid cancer, which is precisely identified preoperatively, conservative treatments with active surveillance can be adopted. Conversely, for high-risk tumors, in addition to early diagnosis, it is crucial to find effective therapeutic targets and methods to prolong patient survival. Understanding the proteomic basis of thyroid nodules can help to refine strategies for disease identification8, stratification9, treatment10, and management11.

Our previous studies have underscored the importance of proteomics in exploring thyroid physiology and pathology8,10,12,13. The bottom-up proteomics approach based on mass spectrometry (MS) provides important clues for biomarker discovery14,15. Reliable biomarker discovery relies on high-throughput, stable, repeatable, and deeply covered detection, and the emergence of data-independent acquisition (DIA) strategy just meets these conditions16,17. The DIA mode combines the high discovery ability of the data-dependent acquisition (DDA) method with the high reproducibility of selection reaction monitoring (SRM) targeted quantification, ensuring that all precursors are fragmented within a predefined mass-to-charge (m/z) ratio isolation window for the identification of peptides and proteins18, theoretically yielding lossless data. Given the complexity of DIA data, utilizing a spectral library for searches is typically necessary, as it offers greater accuracy and efficiency compared to searches conducted without one19. Although pan-human spectral libraries have been published19,20,21, their low tissue specificity may introduce inaccuracies during the process of ion matching22. Therefore, several cell-, tissue- or species- specific spectral libraries are needed and have been published23,24,25,26. Previously, our group27 and other research groups10,28 have generated thyroid-specific peptide spectral libraries, but there is still room for improvement due to their limited coverage of thyroid tissue types and mass spectrometer types. This has hindered the clinical application of thyroid spectral libraries to some extent.

In this study, we upgraded and improved based on our previous research, mainly for two reasons. Firstly, to cover the protein types related to thyroid diseases as comprehensively as possible, we included 245 samples from nine tissue types in this section. Additionally, we incorporated samples from 50 human thyroid cell lines. The second improvement is that we collected data using four instruments and established thyroid spectral libraries suitable for different instruments. The ___domain of thyroid proteomics research has utilized advanced mass spectrometry instruments, including Q Exactive HF, Orbitrap Exploris 480, ZenoTOF 7600, and timsTOF Pro. These instruments offer high-resolution and precision mass analysis, enabling a comprehensive characterization of the thyroid proteome. Finally, we used the spectral library for the analysis of nine types of samples. These libraries present a useful resource for mass spectrometry-based protein measurement of thyroid tissues.

Methods

Tissue sample collection

The study methodologies adhered to the guidelines set forth in the Declaration of Helsinki, and the study protocol and waiver of informed consent were approved by the Ethics Committee of Westlake University (ref. 240826GTN001). This study provides a waiver of written informed consent due to the impracticality of obtaining it from discharged or deceased patients. It meets three key criteria: it cannot be feasibly conducted without the waiver, it offers significant social value by creating a thyroid nodule-specific spectral library, and it poses minimal risk to participants by using existing medical records with strict privacy safeguards. The study included 245 paraffin-embedded thyroid tissue samples from multiple centers, including 50 cases of MTC (M), 40 cases of PTC (P), 30 cases of ATC (T), 30 cases of PDTC (D), 28 cases of follicular thyroid adenoma (FA, A), 25 cases of lymph node (LN) with or without cancer metastasis, 22 cases of FTC (C), 13 cases of normal (N), and 7 cases of multinodular goiter (MNG, G). Each tissue section was verified by a histopathologist, and only the area of interest was analyzed.

Cell culture

The thyroid cancer cell lines used in this study were obtained following the methodology outlined in a previous publication29. The cells were cultured and maintained in the recommended medium at 37°C and 5% CO2 in a humidified atmosphere.

Sample preparation for MS analysis

Tissues

Tissue sample preparation followed a pressure cycling technology (PCT) assisted methodology30,31. Briefly, tissues were dewaxed with 100% heptane (Sigma-Aldrich) and hydrated through a series of ethanol concentrations (100%, 90%, and 75%). Subsequently, basic hydrolysis was performed in 100 mM Tris-HCl buffer (pH 10, Sigma-Aldrich) at 95°C for 30 minutes. Proteins were then extracted using a lysis buffer containing 6 M urea (Sigma-Aldrich), 2 M thiourea (Sigma-Aldrich), 10 mM tris(2-carboxyethyl)phosphine (TCEP, Adamas Reagent), and 40 mM iodoacetamide (IAA, Sigma-Aldrich) in a PCT machine set at 30°C for 90 cycles at 45,000 psi pressure with 30 seconds on-time and 10 seconds off-time. Digestion was carried out with trypsin (enzyme-to-protein ratio of 1:50, Hualishi Scientific) and LysC (enzyme-to-protein ratio of 1:40, Hualishi Scientific), using PCT settings of 20,000 psi pressure for 120 cycles with 50 seconds on-time and 10 seconds off-time. The digestion procedure was halted with 1% trifluoroacetic acid (TFA).

Cell lines

Harvested cells were washed with cold phosphate-buffered saline three times and lysed with an 8 M urea buffer. Extracted proteins were processed with 10 mM TCEP for reduction and 40 mM IAA for alkylation. Subsequently, the denatured proteins were digested with trypsin (Promega) at the enzyme-to-protein ratio of 1:100 for 4 hours and 12 hours. The digestion procedure was stopped with 1% TFA.

Pooled sample combination strategy

The peptides extracted from each type or related type of sample were mixed into pools. Specifically, FA versus FTC, normal thyroid versus MNG, and all cell lines, respectively, combined into three pooled samples. Other pools were created from the same histological tissue type. In total, we obtained eight pooled peptide samples.

High-pH reversed-phase chromatography fractionation of peptides

All pooled peptides (120 μg) were separated using high-pH reversed-phase high-performance liquid chromatography (HPLC) with a Thermo Dionex UltiMate 3000 system (Thermo Fisher Scientific, Germany). The system was equipped with an XBridge Peptide BEH C18 column (4.6 mm × 250 mm, 5 μm particle size) and operated at 45°C. Buffer A (pH 10) consisted of distilled water with 0.6% ammonia, while Buffer B (pH 10) was composed of 98% acetonitrile with 0.6% ammonia. A gradient of Buffer B ranging from 8% to 34% over 120 minutes was applied, with a flow rate of 1 mL/min. One fraction was collected every minute, resulting in a total of 120 fractions. These fractions were then combined based on 30-minute intervals; for instance, fractions #1, #31, #61, and #91 were merged into one sample. This process resulted in a final set of 30 merged fractions. After acquiring 240 merged fractions from eight pools, the next step was to dry and re-dissolve these merged fractions. Additionally, an internal standard iRT (Biognosys, Switzerland) was added to the samples.

DDA

Q Exactive HF

The fractionated peptides were injected into an UltiMateTM 3000 RSLCnano System (Thermo Fisher Scientific, Germany) and further separated using an analytical column (15 cm × 75 µm) packed with 1.9 µm 100 Å C18-Aqua, employing a 60-minute gradient at 300 nL/min. The linear gradient ranged from 8% to 28% buffer B (98% acetonitrile, 0.1% formic acid), while the mobile phase buffer A contained 2% acetonitrile and 0.1% formic acid. The eluted peptides were analyzed using a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Germany). For DDA analysis, full MS scans for precursors were performed over an m/z range of 400 to 1200 Th at a resolution of 60,000 full width at half maximum (FWHM) in the Orbitrap. The automatic gain control (AGC) target value was set to 3E6 charges, and the maximum injection time (MaxIT) was set to 80 ms. The top 20 peptide signals were selected for fragmentation with 27% collision energy in a higher-energy collision dissociation (HCD) cell and subsequently transferred for MS/MS analysis. For MS2 scans, the resolution was 30,000 FWHM and the AGC target was 1E5.

Orbitrap Exploris 480

The peptide separation step used the same settings as previously described for the Q Exactive HF, except the linear gradient was adjusted to 7% to 30% buffer B. The eluted peptides were then ionized and transferred into the mass spectrometer. High-field asymmetric waveform ion mobility spectrometry (FAIMS, Thermo Fisher Scientific, Germany) was employed in conjunction with the Orbitrap Exploris™ 480 (Thermo Fisher Scientific, Germany). Full MS scans were conducted in the orbitrap over an m/z range of 350 to 1200 Th at a resolution of 60,000, with the AGC target set to 300% charges. FAIMS settings were enabled at two compensation voltages (CVs): −44 V and −64 V. The MaxIT was set to 20 ms. Each selected CV was applied to sequential survey scans and MS/MS cycles, each lasting one second per CV. Survey scans were followed by MS2 scans of the most intense precursor ions for one second. Fragmentation of peptide signals was performed in HCD with 30% collision energy. MS2 scans within a m/z window of 1.6 were recorded in the Orbitrap with a resolution of 30,000 FWHM, an AGC target value of 7.5E4 charges, and a MaxIT of 22 ms. The radio frequency ion funnel was set to 40.

timsTOF Pro

Data acquisition using the timsTOF Pro followed the methodology outlined in our previous publication32. Briefly, fractionated peptides were further loaded and separated on a 15 cm × 75 μm silica column with 1.9 μm 100 Å C18 aqua, using the nanoElute® system. The mobile phase included buffer A (100% H2O, 0.1% formic acid) and buffer B (100% acetonitrile, 0.1% formic acid). Buffer B concentration was increased from 5% to 27% over 50 minutes, then to 40% over 10 minutes, and finally to 80%, using various linear gradients. Eluted peptides were analyzed with a timsTOF Pro (Bruker, Germany) using hybrid trapped ion mobility spectrometry (TIMS). DDA in Parallel Accumulation–Serial Fragmentation (PASEF) mode included 10 PASEF scans per topN cycle to enhance the ion mobility spectral library. The dual TIMS analyzer had 100 ms for both accumulation and ramp time, totaling a cycle time of 1.17 seconds. Ion mobility was scanned from 0.6 to 1.6 Vs/cm², and MS1 and MS2 data were collected in the m/z range of 100 to 1700 Th. Precursors reaching 20,000 arbitrary units were dynamically excluded for 0.4 minutes, with singly charged precursors excluded by their m/z–ion mobility position.

ZenoTOF 7600

The peptide separation step used the same settings as previously described for the Orbitrap Exploris 480, except the linear gradient was adjusted to 9.5% to 17% buffer B. In information-dependent acquisition mode, a time-of-flight mass spectrometry survey scan was conducted over an m/z range of 350 to 1500 Th, with an accumulation time of 0.1 seconds. The 40 most intense precursor ions (charge states 2+ to 5+, counts > 300) from the survey scan were consecutively isolated for product ion scans. Dynamic exclusion was implemented for 6 seconds. Product ion spectra were collected for 20 milliseconds within the  m/z range of 100 to 1500 Th, utilizing rolling collision energy parameters: 0.048 × m/z + 4 for z = 2, 0.049 × m/z + 3 for z = 3, 0.050 × m/z + 2 for z = 4, and 0.050 × m/z - 2 for z = 5.

Spectral library construction based on DDA

Peptide spectral libraries were constructed using the FragPipe platform33 (version 19.1) with the MSFragger engine34 (version 3.7), against a human FASTA file from SwissProt dated June 16, 2023. The FASTA file comprises 20,422 protein entries and 11 iRT sequences. The “SpecLib” workflow was utilized, with trypsin KR/P specified as the digestion enzyme. A mass tolerance of 20 ppm for precursors and 0.05 Da for fragments was adopted.

DIA and data processing

Full MS scans were executed in the Orbitrap Exploris 480 over an m/z range of 390 to 1010 Th, with a resolution of 60,000 at m/z of 200 Th. Precursor fragmentation was achieved using HCD (normalized collision energy 32%), and the resulting fragment ions were analyzed across 34 DIA windows at a resolution of 30,000. Specifically, 14 windows with a width of 15 Th were implemented at m/z of 395–600 Th, and 20 windows with a width of 20 Th were applied at m/z of 600–1000 Th, ensuring a 1 Th overlap between consecutive windows. Two CVs of FAIMS were configured for these two window partitions: −62 V for the former and −42 V for the latter. The normalized AGC target for MS1 was set at 300%, and for MS2, it was determined at 2000%, with a maximum ion-injection accumulation time of 50 ms and 54 ms, respectively.

The DIA raw data were analyzed by DIA-NN (version 1.8.1)35 against thyroid tissue-specific library built by Orbitrap Exploris 480 containing 175,086 precursors, 152,353 peptides, 12,020 protein groups. Precursors and proteins were determinate at 0.01 false discovery rate. Other parameters were left to default.

Statistical analysis and protein annotation

R (version 4.4.0) was used for statistical analyses and data visualization. The quality of the spectral library was assessed at various levels - protein, peptide, precursor, and fragment ion - using the modified and re-implemented DIALib-QC tool36 within an R script. The seqinR package was utilized for FASTA processing. Kernel density estimation was performed using the ‘density’ function in the stats package, while ‘stat_density2d’ in the ggplot2 package was used for estimating 2D kernel density.

In quantitative proteomics data analysis, proteins were retained only if their present ratio was not less than 50%. Missing values were imputed with a value corresponding to 80% of the minimum value in the entire data matrix, followed by logarithmic transformation. Correlation was measured using Pearson’s method, and a two-way analysis of variance was utilized to determine the significant differences among multiple groups, following a Benjamini-Hochberg adjustment, unless otherwise specified. Protein annotation for cellular ___location and disease biomarker identification was conducted using the Ingenuity Pathway Analysis (IPA) software37.

Data Records

The mass spectrometry proteomics data generated in this study, including instrument raw files, centroided mzML files, spectral libraries in TSV format, sample-data metadata as an SDRF file, FragPipe and DIA-NN configuration files, and search database files, have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository38,39 with the dataset identifier PXD05105140.

Technical Validation

Overview of thyroid-specific spectral library construction

To generate an in-depth spectral library for thyroid-related proteomic studies, we collected two types of thyroid samples: tissues and cell lines. Tissues directly represent the disease itself, reflecting real-life situations, and are complex and heterogeneous. In contrast, cell lines are purer and advantageous for functional mechanism analysis. To comprehensively cover the diversity of proteins, we acquired various tissue samples (n = 245) (Supplementary Table 1) from nine types, which mainly encompass four categories: 1) normal or tumor-adjacent normal tissues, 2) benign thyroid tumors (MNG, FA), 3) malignant thyroid tumors (PTC, FTC, MTC, PDTC, ATC), and 4) lymph nodes with or without thyroid cancer metastasis. Additionally, we collected cell lines (n = 50) (Supplementary Table 2) covering the above histological types (Fig. 1).

Fig. 1
figure 1

Schematic diagram of the development of thyroid spectral libraries. Thyroid tissues (n = 245) from nine histological types and thyroid cell lines (n = 50) from six histological types were collected and processed for protein extraction and digestion. Peptides from the same sample type were combined into a mixed pool (eight pooled samples in total) and then separated into 30 fractions. Each fraction was analyzed using four types of mass spectrometers in data-dependent acquisition (DDA) mode. DDA raw files (n = 240) from each mass spectrometer were searched and used to construct spectral libraries. The statistical information for the four established libraries is provided in the accompanying sheet.

Samples were further processed according to our previous publications27; details are described in the Materials and Methods section. Based on sample type and histological type, the extracted clean peptides from tissues or cell lines were combined into eight pooled peptide samples. Each pool was then fractionated by HPLC into 120 samples and further combined into 30 fractions. Each fractionated pool was analyzed using four types of mass spectrometers: ZenoTOF 7600, Q Exactive HF, Orbitrap Exploris 480, and timsTOF Pro, operating in DDA mode. The resulting data were then processed with FragPipe to generate four spectral libraries.

The overview of identification results for our spectral libraries is illustrated in Fig. 1. The library generated by the ZenoTOF 7600 contains 118,539 precursors, 97,195 peptides, and 9146 proteins. The library constructed by the Q Exactive HF comprises 136,821 precursors, 111,689 peptides, and 10,547 proteins. The library established from the Orbitrap Exploris 480 includes 175,086 precursors, 152,353 peptides, and 12,020 proteins. The spectral library constructed by the timsTOF Pro has 215,343 precursors, 172,339 peptides, and 12,213 proteins.

Properties of thyroid-specific library and evaluation

To assess the quality of our spectral libraries, we performed an optimized DIALib-QC pipeline36. We conducted assessments for each spectral library from five to six perspectives, fragments, precursors, peptides, proteins, retention time (RT), and additionally ion mobilities for timsTOF Pro. The characteristics and comparison results are shown in Fig. 2. The trends observed at the fragment, precursor, and peptide levels are consistent across all four instruments. In terms of identification counts, timsTOF Pro illustrated the best performance compared to the other instruments. To more clearly describe the data, each panel in Fig. 2 is detailed using the spectral library established by timsTOF Pro as a representative example. The b- and y- ions contribute to 43.76% (2,856,955) and 56.24% (3,671,375), respectively, of the fragments (Fig. 2a). The distribution of ion charges ranges from +1 to +4, with proportions of 62.45% (4,076,922), 32.34% (2,111,292), 4.72% (308,366), and 0.49% (31,750), respectively (Fig. 2b). A minimum of six fragments are identified in 99.34% of the precursors (Fig. 2c). The precursor mass range covers 250–1750 m/z, and approximately 86% of the precursors are between 400–1000 m/z (Fig. 2d). The majority of precursors have charge states of two (67.78%, 169,334) or three (28.55%, 71,325) across all instruments, while singly charged and higher charge states are less frequent. The timsTOF Pro produced more doubly charged precursors, while the Orbitrap Exploris 480 yielded slightly more triply charged precursors than the other instruments. (Fig. 2e). Peptides containing eight to 20 amino acids account for 83%, with a median length of 13 amino acids, consistent with the properties of trypsinized peptides (Fig. 2f). We next focused on peptide modifications. Oxidation on methionine, the most common modification in our library, was detected in 32,786 peptides. Sample preparation generated 21,527 carbamidomethylated peptides at cysteine residues and 2747 N-terminal acetylated ones (Fig. 2g). Eighty-one percent of peptide segments exhibit a missed cleavage rate of zero (Fig. 2h). From Fig. 2i, there are 6847 proteins identified with at least eight peptides or more, accounting for 56.06% of the total proteins in this spectral library. The median protein coverage is 27.09% (Fig. 2j). Pearson correlation coefficients were calculated between the normalized RT (iRT) values of peptide ions with the same peptide sequence but different charge states (+2 and +3) in the library to evaluate the quality of the ion library. The variable ‘n’ represents the number of peptide pairs showing both +2 and +3 charge states. The results suggest a high correlation of between peptide pairs with +2 and +3 charge states, ranging from 0.98 to 1.00 among four libraries (Fig. 2k). Figure 2l depicts a kernel density distribution and scatter plot, with the x-axis indicating RT and y-axis representing ion mobility. Only the timsTOF Pro spectral library is shown because ion mobility data is only reported in this library. The above data indicates the high quality of our established spectral libraries.

Fig. 2
figure 2

Spectral library characteristics and comparison among four instruments: ZenoTOF 7600, Q Exactive HF, Orbitrap Exploris 480, and timsTOF Pro. (ac) The counts of (a) fragment ion types, (b) fragment charge states, and (c) fragment counts per precursor in the spectral library. (d) The distribution of precursor m/z. (e) The counts of precursor charge states. (f) The distribution of identified peptide lengths. (g) The numbers of modified peptide. Modifications: [+16], methionine oxidation; [+42], N-terminal acetylation; [+57], cysteine carbamidomethylation. (h) The numbers of peptides without and with missed cleavage. (i) Identified peptide counts per protein. (j) Distribution of sequence coverage ratios for each protein. (k) Pearson correlations of iRT values for precursors with +2 and +3 charges. (l) Scatter and density plots of precursor ion mobility at different retention time for timsTOF Pro.

Comparative analysis of spectral libraries built by four different MS machines

We conducted a comparative analysis regarding three aspects, (a) four libraries built using different MS instruments in this study, (b) two version libraries generated through Q Exactive HF this time and previously by our group, and (c) our libraries and other published thyroid-specific libraries.

Among the four libraries generated by the four MS instruments, timsTOF Pro and Orbitrap Exploris 480 performed comparatively outstanding in protein identification, each comprising more than 12,000 proteins (Fig. 3a). The four spectral libraries collectively identified 8841 overlapping proteins. We further sought insight into the 8841 overlapped protein by annotating their cellular localizations through IPA. Among them, 1021 proteins are disease biomarkers, with the proportion of proteins located in the extracellular space increasing from 8.04% to 17.92% and those in the plasma membrane increasing from 12.14% to 20.70% (Fig. 3b).

Fig. 3
figure 3

Comparative analysis of spectral libraries. (a) The upset plot showing the overlapping proteins count of the four established spectral libraries in the present study. (b) Protein ___location ratios of the overlapping proteins across the four libraries (8841 proteins) and the subset of annotated biomarkers (1021 proteins). (c) Comparison of our previously published thyroid-specific spectral library (version #1 in blue) with the newly developed library in this study (version #2 in orange) using the Q Exactive HF at five levels.

Next, we compared our last published version with this new version spectral library developed by Q Exactive HF27. As shown in Fig. 3c, firstly, the new version included nine types of tissues, encompassing the five types from the previous version. Furthermore, there were an additional 2,649,147 (74.11%) transition groups in the new version, which is a significant improvement. From the perspective of precursors, peptides, and proteins, the two versions collectively identified 92,196 (42.14%), 77,312 (45.34%), and 9046 (80.54%) overlapping, respectively. The 2nd version newly includes 61,218 (27.98%), 48,545 (28.47%), and 1405 (1.25%) identifications of precursors, peptides, and proteins.

In addition, we compared with published thyroid-specific libraries by other research groups (Supplementary Table 3). The previously published studies only cover one tissue type (MTC) or four tissue types (N, FA, PTC, FTC)10,28, while ours comprehensively included nine common thyroid tissue types (N, MNG, FA, FTC, PTC, PDTC, ATC, MTC, LN). Therefore, we identified 9146 proteins—3.41 times more than the 2682 proteins identified in one prior study26—and 12,213 proteins—1.44 times more than the 8457 proteins identified in another study9. Furthermore, we not only included tissues but also cell lines. More importantly, our libraries can be compatible with multiple MS types, compared to others with only one type. Deeper coverage (sample type, histological type, MS machine, identifications) opens up more possibilities for protein discovery for thyroid research, thus holding greater promise for clinical applications.

Application of a spectral library in thyroid cancer through comprehensive proteomic profiling

To test the newly established spectral library, we processed five samples from each histological type, totaling 45 samples (Supplementary Table 4), and collected data using the Orbitrap Exploris 480 instrument as a representative example. The identification IDs of each validation sample from the nine tissue types were compared individually (Fig. 4a). The top bar chart shows that the highest number of peptides identified was 41,855, while the lowest was 28,647. Similarly, the bottom bar chart indicates that the highest number of proteins identified was 7273, and the lowest was 6024. Subsequently, using the quantification matrix, we identified the top five highly expressed tissue-specific proteins in each tissue type, as shown in Fig. 4b. Notably, certain proteins such as Q5VTB9_RNF220 and Q15836_VAMP3 exhibit distinct expression patterns across different conditions. The heatmap reveals significant variations in protein expression, which could be crucial for understanding the proteome of thyroid nodules and refining strategies for disease identification, stratification, treatment, and management.

Fig. 4
figure 4

Technical validation of representative spectral library of Orbitrap Exploris 480. Five samples from each tissue type were analysed by data-independent acquisition mode and searched with the spectral library established in this study. (a) The bar plot showing the total number of quantified peptides and proteins in each group. Dots indicate the quantification of each sample. (b) The supervised heatmap of the top five proteins specifically highly expressed in each tissue type. The color represents the z-scored protein expression. Proteins are labeled by UniProt ID_symbol.

Usage Notes

Firstly, this study provides valuable resources by offering DDA quantitative data for the same sample across four different specifications of mass spectrometers from three different brands, which is crucial for data alignment between different instruments. Secondly, the spectral library serves as a key resource for proteomics research in thyroid diseases, enhancing the specificity of mass spectrometry spectrum searches. Users can directly select the appropriate spectral library based on their instrument.

To further enhance the accuracy of library searches and peptide specificity, users can consider the following guidelines. In constructing the spectral library, we carefully evaluated various classification methods, including sample type (e.g., bulk tissue or cell line), specific pathological tissue type (e.g., MTC, FTC, PTC, PDTC, ATC, FA, MNG, N, LN), and the types of instruments (e.g. ZenoTOF 7600, Q Exactive HF, Orbitrap Exploris 480, and timsTOF Pro) used. These factors were integrated into our pooling and fractionation strategy. We recommend that users select specific data subsets according to their research needs and utilize the latest version of software available to construct a more targeted spectral library. By doing so, higher accuracy and specificity can be achieved in proteomics analysis.

Despite the contributions of our study, several limitations should be noted. We used only one enzyme digestion method, which might have limited protein coverage and detection comprehensiveness. Additionally, pooling cell lines without considering tissue type may have affected the study’s relevance to specific tissue-related characteristics or responses. Furthermore, using high-pH HPLC as the only fractionation technique may have restricted peptide coverage.