Background & Summary

Paris polyphylla var. yunnanensis, a perennial herb of the genus Paris, is a characteristic medicinal plant of Yunnan. It is mainly distributed in Yunnan, Guizhou, and Sichuan Provinces. The primary chemical constituents of P. polyphylla include steroidal saponins, flavonoids, triterpenes, and polysaccharides. Steroidal saponins are the most abundant of these constituents and serve as the principal active ingredients1; a total of 112 steroidal saponins have been isolated and identified from this species2. Research indicates that P. polyphylla has several pharmacological effects, including heat-clearing and detoxifying, anti-tumor, anti-inflammatory, antibacterial, and hemostatic effects, as well as protection of vascular endothelial cells and promotion of uterine contractions3,4. It serves as the primary raw material for over 100 proprietary Chinese medicines, including Yunnan Baiyao, Gongxuening Capsule, and Jidesheng Snake Tablet5,6, and is widely used clinically to treat hemorrhage, trauma, and chronic bronchitis2,3,6. With the rapid development of the Chinese herbal medicine industry, demand for P. polyphylla as a raw material has surged. Reports indicate that approximately 80% of wild resources of P. polyphylla are exploited annually; demand in Yunnan Province alone exceeds 1,000 tons, while the annual production of wild resources is less than 100 tons. This, coupled with the plant’s slow growth rate, results in a large gap between supply and demand. The irreversible loss of wild P. polyphylla resources could lead to ecological imbalance, adversely affecting the habitat of associated flora and fauna and diminishing the genetic diversity of its wild populations. Therefore, studying the growth characteristics of P. polyphylla, the accumulation patterns of its medicinal components, and their relationship with growth years is of great significance for the rational development and sustainable utilization of its resources7,8.

The accumulation of secondary metabolites in P. polyphylla is influenced by numerous factors. Growing environments differ in altitude, soil composition, light exposure, and temperature, all of which significantly influence the yield and quality of P. polyphylla9. Research has also demonstrated that the number of growth years significantly influences the secondary metabolite profiles of plants10,11,12. Existing studies have compared P. polyphylla from various producing areas13,14,15, harvest periods16,17, tissue parts18,19,20, and varieties14,15. There is also a strong correlation between saponin content and the growth stage of Paris21. However, these studies predominantly rely on single-omics approaches and lack comprehensive, integrated multi-omics analyses. Transcriptome sequencing enables the rapid and comprehensive identification of gene expression differences and the screening of molecular markers in tissues or cells under specific conditions. Metabolomics, in turn, allows the quantitative analysis of metabolite variations within organisms, thus reflecting the outcomes of gene expression22,23,24. Integrated analysis of transcriptomic and metabolomic data can identify key genes involved in the synthesis of secondary metabolites in medicinal plants, thereby providing a foundation for understanding their biosynthetic pathways25. Additionally, investigating the relationship between polyphyllin levels and plant growth years will provide valuable guidance for sustainable harvesting practices.

In this study, we collected 18 samples of P. polyphylla, aged 3 to 8 years and representing six growth stages, for saponin content, transcriptome, and metabolome analyses. There was no significant difference in total saponin content among the 3-, 4-, and 5-year-old samples, all of which had saponin levels below 0.6% (Table 1) and thus failed to meet the standard set by the 2025 edition of the Chinese Pharmacopoeia. The highest levels of total saponins, as well as of Polyphyllin I, II, and VII, were observed in the 8-year-old samples, followed by the 7-year-old samples (Table 1). A total of 270.65 Gb of clean data were obtained, and 1,510 metabolites were detected, including phenolic acids, amino acids and their derivatives, flavonoids, steroids, and others. Among these, phenolic acids were the most numerous class, with 186 compounds identified, while tannins were the least numerous, with only 4 compounds detected. The contents of various compounds in P. polyphylla differed significantly across growth years (Table 2). Diosgenin serves as a precursor in the synthesis of steroidal saponins, and its peak area varied in P. polyphylla across growth years; this variation may contribute to the differences in steroidal saponin levels. In addition, we found that the upstream cholesterol synthesis genes DXS, DXR, GPPS, SMO, DWF1, CPI1, STE1, and C5-SD were significantly highly expressed in the high-saponin P. polyphylla samples (PP8). These data provide a reference for the standardized cultivation, quality control, resource protection, and sustainable utilization of P. polyphylla, which is crucial for advancing the modernization and internationalization of traditional Chinese medicine. This study offers comprehensive transcriptomic and metabolomic data for P. polyphylla at various growth stages, which serve as an essential resource for identifying key genes participating in saponin synthesis in P. polyphylla and for investigating their regulatory roles and functions during growth and development.

Table 1 Contents of polyphyllins in P. polyphylla of different growth years (mg·g−1).
Table 2 Comparison of metabolite content in rhizomes of P. polyphylla of different growth years. Different letters indicate significant differences (p < 0.05, n = 3).

Methods

Sample collection and treatment

Samples of Paris polyphylla var. yunnanensis, aged 3 to 8 years, were collected in November 2023 from Wenshan City (23.27198° N, 104.01723° E), Yunnan Province, China (Fig. 1). All selected samples were grown under identical cultivation and management conditions: they were planted in an open field of sandy loam soil and shaded with a black plastic net that blocks 60% of the sunlight. Three biological replicates were collected for each age group, taken from three individual plants of similar size. The samples are designated PP3, PP4, PP5, PP6, PP7, and PP8, representing the rhizomes of P. polyphylla grown for 3, 4, 5, 6, 7, and 8 years, respectively. After the fibrous roots were removed, the fresh rhizomes were washed immediately under tap water and stored at −80 °C for metabolome and transcriptome sequencing analysis.

Fig. 1

Map of the sample collection site.

Widely targeted metabolic profiling analysis

Metabolites are the basis of biological phenotypes, and profiling them helps to understand biological processes and their mechanisms more intuitively and effectively. The rhizome samples were vacuum freeze-dried in a lyophilizer (Scientz-100F) and then ground into powder at 30 Hz for 1.5 minutes using a grinder (MM 400, Retsch). A total of 50 mg of sample powder was extracted with 1.2 mL of 70% aqueous methanol pre-cooled to −20 °C. During extraction, the mixture was vortexed for 30 seconds every 30 minutes, six times in total. After centrifugation at 12,000 rpm for 3 min, the supernatant was filtered through a microporous membrane (0.22 μm pore size) and stored in an injection vial for analysis on a UPLC-ESI-MS/MS system (ultra-performance liquid chromatography coupled to tandem mass spectrometry). The UPLC conditions were as follows: column: Agilent SB-C18 (1.8 µm, 2.1 × 100 mm); solvent system: pure water (0.1% formic acid) and acetonitrile (0.1% formic acid); gradient program: starting at a water/acetonitrile ratio of 95:5 (v/v) at 0 minutes, changing linearly to 5:95 (v/v) over 9 minutes and holding this ratio for 1 minute, followed by a return to 95:5 (v/v) over 1.1 minutes and a hold of 2.9 minutes; flow rate: 0.35 mL/min; column temperature: 40 °C; injection volume: 2 μL. The effluent was alternately directed to an ESI-triple quadrupole-linear ion trap (QTRAP)-MS. The methanol, acetonitrile, and formic acid used in this experiment were all of chromatographic grade. The ESI source operation parameters were set as follows: source temperature, 500 °C; ion spray voltage (IS), 5500 V (positive ion mode) and −4500 V (negative ion mode); ion source gas I (GSI), gas II (GSII), and curtain gas (CUR), 50, 60, and 25 psi, respectively. Triple quadrupole (QQQ) scans were acquired as multiple reaction monitoring (MRM) experiments with the collision gas (nitrogen) set to medium. Declustering potential (DP) and collision energy (CE) were further optimized for individual MRM transitions. A specific set of MRM transitions was monitored for each period according to the metabolites eluted during that time. Qualitative analysis of metabolites was based on the proprietary Metware Database (MWDB), while quantitative analysis was performed in MRM mode. The procedure is as follows: first, the precursor ions of the target substances are selected and fragmented in the collision cell to generate fragment ions. Next, the required characteristic fragment ions are selected to obtain mass spectrometry data for the various metabolites. Finally, the chromatographic peak areas are integrated, and integration corrections are applied.
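The gradient program described above can be read as a piecewise-linear schedule. The following is a minimal sketch, not the instrument method file: the breakpoints at 10, 11.1, and 14 minutes are obtained by summing the stated segment durations, and the names GRADIENT and percent_b, as well as calling acetonitrile "solvent B", are illustrative assumptions.

```python
from bisect import bisect_right

# Breakpoints as (time in minutes, % acetonitrile phase).
# Assumption: times are cumulative sums of the durations given in the text
# (9 min ramp, 1 min hold, 1.1 min return, 2.9 min re-equilibration).
GRADIENT = [(0.0, 5.0), (9.0, 95.0), (10.0, 95.0), (11.1, 5.0), (14.0, 5.0)]


def percent_b(t_min: float) -> float:
    """Return the % of the acetonitrile phase at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 14 min run")
    times = [t for t, _ in GRADIENT]
    # Index of the breakpoint that closes the segment containing t_min.
    i = min(bisect_right(times, t_min), len(GRADIENT) - 1)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.0, 4.5, 9.0, 10.0, 11.1, 14.0):
        print(f"{t:5.1f} min: {percent_b(t):5.1f}% acetonitrile phase")
```

Under this reading, the total run time per injection is 14 minutes, with the aqueous phase making up the remainder (100 minus the returned value) at every time point.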

RNA extraction, transcriptomic sequencing and library construction

Total RNA of P. polyphylla was extracted using CTAB-PBIOZOL and ethanol precipitation and subsequently dissolved in 50 µL of DEPC-treated water. Following extraction, the total RNA was checked and quantified using a Qubit fluorescence quantifier and a Qsep400 high-throughput biofragment analyzer. Because most eukaryotic mRNAs possess a polyA tail, oligo(dT) magnetic beads were used to enrich polyA-tailed mRNAs. A SMARTer PCR cDNA Synthesis Kit was then used to reverse-transcribe the mRNA into cDNA. Sequencing adapters were ligated, the products were purified with DNA magnetic beads, and fragments were size-selected to yield a library with insert fragments of 250 to 350 bp. The ligated products were then amplified by PCR and purified again using DNA magnetic beads, and the products were dissolved in nuclease-free water. After initial library construction, the library concentration was measured using a Qubit fluorescence quantifier, and fragment size was checked using a Qsep400 high-throughput biofragment analyzer. Finally, the effective concentration of the library was accurately quantified using qRT-PCR. Once the library passed inspection, the full-length transcriptome was sequenced on the PacBio SMRT Iso-Seq platform. Fragments per kilobase of transcript per million fragments mapped (FPKM) values were calculated to determine gene expression levels. To ensure data integrity for subsequent analyses, we used fastp to filter out low-quality sequences and adapter sequences, yielding the clean reads on which all further analyses are based. We then used SMRT Link software to process the sequence data, resulting in high-quality, polished full-length consensus sequences.
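For readers unfamiliar with the FPKM measure mentioned above, the standard definition can be sketched as follows. This is an illustration of the general formula only, not the pipeline used in this study; the function name fpkm and the example numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million fragments mapped.

    FPKM = fragments / (length in kb) / (total mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical numbers: 500 fragments on a 2,000 bp transcript
    # in a library of 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))  # prints 25.0
```

Normalizing by both transcript length and library size is what makes values comparable between long and short transcripts and between samples sequenced to different depths.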

Data analysis

One-way analysis of variance (ANOVA) was conducted using IBM SPSS Statistics version 25.0. Hierarchical cluster analysis (HCA) of samples and metabolites was performed using the R package ComplexHeatmap (version 2.9.4). Unsupervised principal component analysis (PCA) was performed using the prcomp function in base R (www.r-project.org, version 4.1.2). Mass spectrometry data were processed using Analyst software (version 1.6.3).
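The one-way ANOVA used to compare growth years was run in SPSS; the sketch below only illustrates how the F statistic is formed from between-group and within-group variation. The function name one_way_anova_f and the input values are hypothetical and are not data from this study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists, one list of replicate measurements per group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Hypothetical values for three groups with three replicates each.
    f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
    print(f_value, df_b, df_w)  # F is approximately 13 with (2, 6) degrees of freedom
```

With six age groups and three replicates each, as in this design, the degrees of freedom would be (5, 12). The p value is then obtained from the F distribution with those degrees of freedom.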

Data Records

The complete raw metabolomics dataset for rhizome tissues of Paris polyphylla var. yunnanensis of different growth years can be accessed in the MetaboLights database under the accession number MTBLS1068326. The raw transcriptome data have been submitted to the Genome Sequence Archive (GSA) of the National Genomics Data Center (NGDC) under the accession number CRA02232627.

Technical Validation

To ensure the technical quality of the datasets, each data type was validated following an established methodology. In widely targeted metabolomics, metabolite quantification is performed using triple quadrupole mass spectrometry in multiple reaction monitoring (MRM) mode (Fig. 2a). For accurate and reproducible quantification, the precursor (parent) ion of the target substance is first selected by the first quadrupole and then fragmented in the collision cell to form fragment ions. These fragment ions are then filtered by the third quadrupole to isolate the characteristic ions corresponding to the target compound. Quality control (QC) samples, prepared from a mixture of sample extracts, are used to assess the reproducibility of samples processed by the same method. During instrumental analysis, one QC sample is inserted for every ten samples analyzed to monitor the reproducibility of the analytical process and to evaluate the consistency of metabolite extraction and detection. The principal component 1 (PC1) scores of the QC sample solutions collected at different times fall within the positive and negative standard deviation limits of the control chart (Fig. 2b). Overlaid total ion current (TIC) plots of the different QC samples (Fig. 3) show a high degree of overlap, indicating consistent retention times and peak intensities. This suggests that the mass spectrometer signal was stable when the same sample was measured at different times, confirming that the instrument was stable and reliable throughout the experiment and that the analytical method was reproducible. Hierarchical clustering analysis of the metabolites from P. polyphylla revealed clear differences in metabolic profiles among samples of different ages, while samples within the same group exhibited similar metabolic profiles (Fig. 4). This further indicates that the metabolite composition of P. polyphylla differs markedly across growth years.
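The PC1 control-chart check on QC injections described above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not give the number of standard deviations used for the limits, so the multiplier k is a free, hypothetical parameter, and the function name and scores below are invented for the example.

```python
from statistics import mean, stdev


def qc_within_limits(all_pc1_scores, qc_pc1_scores, k=3.0):
    """Flag whether each QC PC1 score lies within mean +/- k * SD.

    all_pc1_scores: PC1 scores of every injection (samples and QCs).
    qc_pc1_scores:  PC1 scores of the QC injections, in run order.
    k:              assumed multiplier for the control limits (not from the source).
    """
    centre = mean(all_pc1_scores)
    sd = stdev(all_pc1_scores)
    lower, upper = centre - k * sd, centre + k * sd
    return [lower <= score <= upper for score in qc_pc1_scores]


if __name__ == "__main__":
    # Hypothetical scores, not data from this study.
    scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(qc_within_limits(scores, [0.5, -3.0, 4.0], k=2.0))  # [True, True, False]
```

A QC injection flagged False would point to drift in extraction or detection at that position in the run order.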

Fig. 2

(a) Schematic diagram of the multiple reaction monitoring (MRM) mode of mass spectrometry. (b) PC1 control chart for all samples.

Fig. 3

Overlay of TIC plots from mass spectrometric detection of the QC samples. (a) Negative ion mode. (b) Positive ion mode.

Fig. 4

Accumulation patterns of metabolites detected in all samples.

During transcriptome sequencing, the raw third-generation (PacBio) data were processed using the official PacBio software package SMRT Link to obtain consensus transcript sequences. The second-generation data were quality-controlled to remove adapters and low-quality sequences; the resulting clean data were then used to error-correct the third-generation transcripts, from which redundant sequences were subsequently removed using CD-HIT. Subsequent analyses were based on these transcripts. Principal component analysis (PCA), an unsupervised pattern analysis method, reveals the most prominent features within a multidimensional data matrix. In a PCA of gene expression across the different growth years of P. polyphylla, PC1 (20.80%) and PC2 (14.52%) effectively separated samples from different years, while samples from the same year clustered together. This indicates high reproducibility among replicates and marked differences in gene expression of P. polyphylla across years, consistent with the results of hierarchical clustering analysis (HCA) (Fig. 5).

Fig. 5

PCA score plot of unigenes detected in all samples.