Introduction

Chen-Xiang, the agarwood indigenous to China, is the resinous wood of Aquilaria sinensis (Lour.) Gilg of the Thymelaeaceae family. It is mainly produced in Guangdong, Guangxi, Hainan, and Fujian provinces1. The genus Aquilaria comprises more than 21 species, and all species of Aquilaria and Gyrinops have been listed in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) since 2004 (Amendments to Appendices I and II of CITES, 2004). Aquilaria malaccensis (A. agallocha Roxb.) is mainly produced in Malaysia, India, Laos, Cambodia, Thailand, and Taiwan island2. In Vietnam, Aquilaria crassna (Kỳ Nam, Trầm Hương, Dó Bầu) is the most important species and is also widely distributed in Cambodia and Thailand3. Agarwood is extensively harvested to obtain aromatic oils by distillation. The oils have traditionally been used in perfumes in the Middle East and are now widely used in high-end perfumes, toiletries, fragrance additives, and other biotechnology products4. The oleoresin occurs only in withered and dying Aquilaria trees. In recent years, rising demand and commercial value have intensified the agarwood trade, leading to the destruction of natural agarwood forests.

Extraction is a crucial aspect of aromatherapy, and the extraction method has a significant impact on the components of an essential oil. According to the definition of the International Organization for Standardization (ISO/DIS 9235.2), an essential oil is a product obtained from natural raw materials by water or steam distillation, or by mechanical processing of citrus peel. In addition, products obtained by supercritical CO2 extraction, subcritical fluid extraction, or organic solvent extraction are also called agarwood extracts or agarwood oils5. The main components are sesquiterpenes, phenylethyl chromone derivatives, and aromatic compounds6,7. Chromones can be used as important indicator components for the quality evaluation and identification of agarwood, and essential oils with a high chromone content are generally believed to be of better quality8,9. Among the methods of extracting agarwood essential oils (AEOs), steam distillation is common. The extraction device is simple and the product is natural and pollution-free, but only low-boiling-point components can be extracted, and characteristic components such as chromones are lost, resulting in a low yield of essential oil10. Differences in extraction method, plant species, and production area lead to differences in the market price of agarwood oils, which encourages adulteration of the essential oils3. Research on AEOs mainly focuses on the identification of chemical components, but a comparison of compounds alone is not enough to support the establishment of a quality evaluation system for agarwood11.

Metabolomic data usually span a wide dynamic range of metabolite concentrations, owing to differences in plant variety, production area, geographical phenology, and processing method. Although GC-MS data from widely sourced samples can be compared to find the common components of AEOs, these common components still cannot serve as statistically verified indicators. Our work focused on AEOs obtained by hydro-distillation from seven regions covering different habitats and species. Differential chemical constituents were identified by GC-MS fingerprinting and multivariate statistical analysis, including partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), and SPSS cluster analysis12,13. These multivariate statistical methods handle the complex data generated by GC-MS and help to identify characteristic chemical markers and to distinguish samples from different regions and species14,15,16. Together, the methods provide a comprehensive approach to analyzing AEOs and support accurate identification and differentiation of their chemical components. The components were identified by NIST general database retrieval and literature review, providing a reference for the overall quality evaluation of AEOs.

Results and discussion

GC-MS analysis of volatile components and common chemical compounds

The volatile components of AEO samples from different habitats and 3 species were analyzed against the GC-MS standard mass spectrometry database NIST2020, as shown in Table 1. The 2-(2-phenylethyl) chromone compounds were determined from the mass spectrometric characteristics and fragmentation patterns summarized in the literature17, combined with the ion fragment characteristics observed in this study. A total of 127 compounds with more than 85% similarity were identified from the essential oils of the 7 regions, accounting for 28.6–74.6% of the total volatile components (Table 1; Supplementary file Fig. S1). According to their chemical structures, these volatile components were classified into sesquiterpenes (0.3–70.4%), aromatic compounds (0.6–24.9%), aliphatic compounds (0–8.4%), 2-(2-phenylethyl) chromones (0–12.0%), and others (0.3–7.6%) (Fig. 1). Sesquiterpenes were the most numerous class, with 73 components, followed by aromatic compounds, with 12. Aliphatic compounds and 2-(2-phenylethyl) chromones were the least numerous, with 10 and 4 components, respectively.

Table 1 Chemical compositions of agarwood essential oils analyzed by GC-MS.
Fig. 1 Volatile constituents of agarwood essential oils from each planting region.

In AEOs extracted by hydro-distillation, the 2-(2-phenylethyl) chromone components often do not appear, owing to the high temperature of the water vapor during extraction or to the influence of the solvent used in the analysis11. In contrast, AEOs obtained by supercritical fluid extraction and microwave-assisted extraction usually contain flindersiachromone, 6-methoxy-2-(2-phenylethyl) chromone, 4H-1-benzopyran-4-one chromone, and a few other semi-volatile 2-(2-phenylethyl) chromones11,17. In this study, pretreating the hydro-distilled AEO samples with 95% ethanol before GC analysis allowed more aromatic components to be extracted, and chromone components could be clearly detected18.

Low-molecular-weight aromatic compounds are important components of AEOs and are frequently regarded as their primary source of aroma. More aromatic compounds were detected in resinous agarwood; they were absent from the non-resinous parts and are therefore considered characteristic of the resinous parts19. AEOs also contained abundant fatty acids, which may affect the complex process of resin accumulation, prolong the accumulation time, and lengthen the formation time required for agarwood oil yield4. Based on the relative aliphatic content (Fig. 1), the samples ranked as S3 Taiwan (8.38%) > S2 Hainan (2.34%) > S7 Cambodia (1.71%) > S1 Guangxi (1.64%) > S6 Vietnam B (0.68%) > S5 Vietnam A (0.15%) > S4 Malaysia (0%). In practice, we also regard a lower fatty acid content as an important consideration when evaluating the quality of agarwood.
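The ranking of samples by relative aliphatic content can be reproduced with a few lines of Python. This is only an illustrative sketch: the percentages are the values quoted in the text, while the dictionary and the function name rank_descending are our own and not part of the original workflow.

```python
# Relative aliphatic content (%) per sample, as quoted in the text (Fig. 1).
aliphatic_pct = {
    "S1 Guangxi": 1.64,
    "S2 Hainan": 2.34,
    "S3 Taiwan": 8.38,
    "S4 Malaysia": 0.0,
    "S5 Vietnam A": 0.15,
    "S6 Vietnam B": 0.68,
    "S7 Cambodia": 1.71,
}


def rank_descending(content):
    """Return sample names ordered from highest to lowest content."""
    return sorted(content, key=content.get, reverse=True)


ranking = rank_descending(aliphatic_pct)
print(" > ".join(f"{name} ({aliphatic_pct[name]:.2f}%)" for name in ranking))
```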

The phytocomplexity of the AEOs reflects the production of a multitude of plant–fungus mediated secondary metabolites that act as chemical signals in natural ecological communication. Table 1 shows an aromatic compound, 4-phenyl-2-butanone, as the only component common to all samples. A similar component, 3-phenyl-2-butanone, has also appeared in hydro-distilled essential oils of A. malaccensis and A. subintegra from Malaysia, Thailand, and Cambodia20. 4-Phenyl-2-butanone has also been reported in A. malaccensis, where it represents an important basis for the chemical analysis of plant–fungal metabolism in wild plants and in vitro plantlets7,21. We suspect that the S2 sample contained little or no resinous agarwood: it may have been obtained mostly by direct hydro-distillation of heartwood, or unknown chemical essences may have been added. When S2 was excluded, six common components could be detected in the other 6 AEOs, including 1,1,4,5,6-pentamethyl-2,3-dihydro-1H-indene (aromatic compound), viridiflorol (sesquiterpenoid compound), bis(2-ethylhexyl) phthalate (aromatic compound, plasticizer), 5-(2-methylpropyl)-nonane, and 2,6,10-trimethyl-dodecane (others). One terpenoid of particular interest is viridiflorol, a known common fragrance molecule of agarwood22. Viridiflorol has shown moderate antibacterial activity against Mycobacterium tuberculosis, the causative agent of tuberculosis, in an in vitro assay. It is also produced by the endophytic root fungus Serendipita indica and exhibits antifungal activity against Colletotrichum truncatum23. It was particularly surprising that bis(2-ethylhexyl) phthalate, diethyl phthalate, and dibutyl phthalate were detected in this study (Table 1). These compounds are commonly used as plasticizers for polyvinyl chloride resins, as well as condensing agents, anti-wear agents, and gas chromatographic stationary liquids. Plasticizers are sometimes blended into edible oils to reduce product costs, and they should not be regarded as active components of AEOs. Studies have shown that excessive intake of these plasticizers can have adverse effects on human reproduction, development, and the cardiovascular system24. Plasticizers accounted for about 23.9% of the S4 sample, indicating that the quality of the S4 essential oil was poor. Plasticizers were also detected at low levels in some other samples, which might reflect accumulation in the plants themselves or contamination during GC analysis25.

Sesquiterpenoids are natural terpenoids containing 15 carbon atoms, built from three isoprene units11. In addition to viridiflorol, comparison of the samples identified the following 6 noteworthy common components: elemol (sesquiterpene), γ-eudesmol (sesquiterpene), (−)-aristolene (sesquiterpene), agarospirol (sesquiterpene), 2(3H)-naphthalenone, 4,4a,5,6,7,8-hexahydro-4a,5-dimethyl-3-(1-methylidene)-, (4ar-cis)- (sesquiterpene), and 2-phenyl-4H-chromen-4-one (chromone derivative) (Table 1). According to modern pharmacology, the sesquiterpene components of agarwood show good biological activity in the central nervous, respiratory, and digestive systems. Elemol is a natural sesquiterpenoid with a role as a fragrance, and it has been associated with the modest antioxidant, anti-inflammatory, and antiproliferative activities of the essential oil of Cymbopogon nardus26,27. Aromatic plants contain multiple chemical components in their essential oils; for example, the main components of Blepharocalyx salicifolius oil are viridiflorol and eudesmane sesquiterpenes28,29. In the fungus-mediated fermentation of resinous agarwood, the most significant finding was the appearance of key agarwood sesquiterpenes such as agarospirol, γ-eudesmol, and (−)-aristolene28. The sesquiterpenoid 2(3H)-naphthalenone, 4,4a,5,6,7,8-hexahydro-4a,5-dimethyl-3-(1-methylidene)-, (4ar-cis)- has not been reported in other agarwood literature. However, it stood out in the comparison of common components in this study, and its relative content was higher than that of the other components.

Chromones are among the main active components of agarwood, and 240 different chromone derivatives have been isolated. They have anti-inflammatory and anti-tumor properties, neuroprotective effects, and inhibitory effects on acetylcholinesterase, tyrosinase, and glucosidase30. It is worth noting that 2-(2-phenylethyl) chromones often do not appear in essential oil extracted by hydro-distillation. However, among the four chromones analyzed in this study, the common component 2-phenyl-4H-chromen-4-one was present in 5 regions, and this chromone has not been reported in other literature. There were significant differences in the relative content of each component, which might explain the distinctive flavors of agarwood from different habitats.

Multivariate statistical analysis of PLS-DA and OPLS-DA

PLS-DA is a variant of PLS used for classification; it is a supervised discriminant analysis method suited to data with small intergroup differences14. It is applied to predictive and descriptive modeling, to selecting discriminative variables, to determining the chemical compositions of different genotypes and production regions, and to automatically generating the more important principal components15,16. The PLS-DA model displayed clear separation among the 7 regions and 3 genotypes of AEOs (Fig. 2a). The software automatically generated R2X (cum) = 0.848, R2Y (cum) = 1, and Q2 (cum) = 0.854 for predictive ability. In a previous agarwood study, HPLC chromatograms were combined with multivariate statistical screening to establish identification methods for wild and cultivated agarwood, and a Fisher linear recognition model and a PLS-DA recognition model were established31. In this study, the PLS-DA model was built on GC-MS data (Fig. 2a), and Q2 > 0.5 indicated a strong predictive ability. The result showed significant differences in the volatile components of AEOs from different habitats. The model also discriminated among the genotypes to some extent. Permutation validation in SIMCA 14.1 software was used to verify the fit of the PLS-DA model (Fig. 2b). After 200 iterations of permutation testing, the Y-axis intercepts were all less than 0, indicating that the PLS-DA model was well fitted and reliable. In addition, Hotelling's T2 analysis verified that all samples were within the 95% confidence interval32,33, which provided a further evaluation of model performance (Supplementary file Fig. S2).
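The Q2 statistic quoted above is produced internally by SIMCA. As a rough illustration of what it measures, the usual definition Q2 = 1 − PRESS/TSS can be sketched in Python as follows. The function name q2 and the dummy-coded class labels and cross-validated predictions are hypothetical and are not data from this study.

```python
def q2(y_true, y_cv_pred):
    """Cross-validated predictive ability: Q2 = 1 - PRESS / TSS.

    y_true    : observed responses (e.g. dummy-coded class membership)
    y_cv_pred : predictions for each sample made while that sample was
                left out of model fitting (cross-validation)
    """
    mean_y = sum(y_true) / len(y_true)
    # PRESS: predictive residual sum of squares
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    # TSS: total sum of squares around the mean
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - press / tss


# Hypothetical example: three samples in class 1, three in class 0.
labels = [1, 1, 1, 0, 0, 0]
cv_predictions = [0.9, 0.8, 1.1, 0.1, 0.2, -0.1]
q2_value = q2(labels, cv_predictions)
print(f"Q2 = {q2_value:.2f}")
```

Under this definition, a model whose cross-validated predictions are no better than the mean gives Q2 near 0, which is why values above 0.5 are read as good predictive ability.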

Fig. 2 PLS-DA score scatter plots: (a) classification of AEOs into three genotypes; (c) classification of CNA and OCA areas; (b, d) validation of the PLS-DA models by 200 permutations. OPLS-DA score scatter plot: (e) classification of CNA and OCA areas; (f) validation of the OPLS-DA model by 200 permutations. S: A. sinensis; M: A. malaccensis; C: A. crassna; CNA: AEOs from China (S1–S3 samples); OCA: AEOs from outside China (S4–S7 samples). R2: coefficient of determination; Q2: cross-validated coefficient of determination. Hotelling's T2 analysis showed that all samples were within the 95% confidence interval, so no significant outliers were found.

Supervised methods offer another approach to classification, enhancing the discrimination between specimens by minimizing variance34. In this study, PLS-DA was used to classify AEOs produced in China (CNA) and outside China (OCA). The model showed that the OCA samples (S4–S7) were dispersed along both the first and second principal components, whereas the CNA samples (S1–S3) clustered within one quadrant (Fig. 2c). The software automatically generated R2X (cum) = 0.84, R2Y (cum) = 1, and Q2 (cum) = 0.981, suggesting that the volatile components varied considerably more among the OCA samples than among the CNA samples. After 200 iterations of permutation testing, the Y-axis intercepts were all less than 0, indicating that the PLS-DA model was well fitted and reliable (Fig. 2d). However, the model was not effective for screening differential volatile markers (Fig. 2c), so we used OPLS-DA to identify these markers.
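The intercept criterion used in the permutation tests can be illustrated with a short sketch. We assume the common layout of a permutation plot, in which each permuted model is plotted by the correlation between the permuted and original class vectors (x) and by its Q2 (y), and a regression line is extrapolated to x = 0. The helper line_intercept and all numbers below are hypothetical; SIMCA performs this calculation internally.

```python
def line_intercept(xs, ys):
    """Least-squares intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Hypothetical permutation results: x = correlation between permuted and
# original labels (1.0 is the unpermuted model), y = Q2 of that model.
correlations = [0.1, 0.3, 0.5, 1.0]
q2_values = [0.9 * c - 0.2 for c in correlations]

intercept = line_intercept(correlations, q2_values)
print(f"Q2 intercept = {intercept:.2f}")
print("no sign of overfitting" if intercept < 0 else "possible overfitting")
```

A negative intercept means that models fitted to randomly reassigned labels lose their predictive ability, which is the behavior expected of a model that is not overfitted.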

Supervised OPLS discriminant analysis (OPLS-DA) was applied to identify the volatile markers of AEOs from different habitats. OPLS has excellent external prediction ability as well as a better visualization effect than PLS14. In the OPLS-DA scatter plot (Fig. 2e), the R2X, R2Y, and Q2 for the comparison of S1–S3 AEO samples from China with S4–S7 samples from outside China were 0.84, 1, and 0.997, respectively. The two groups lay on opposite sides of zero along the first principal component (X-axis), indicating that the AEOs produced in CNA (S1–S3) could be effectively distinguished from the AEOs of other regions (S4–S7, except S5), and that the genotypes and relative contents of the AEOs differed. The OPLS-DA model was verified by 200 permutation tests, and Hotelling's T2 analysis showed that all samples were within the 95% confidence interval.

In the multivariate statistical analysis, S5 behaved as an outlier, with its differential components deviating most from those of the other samples. The differential components of S5 could be used as volatile markers of A. crassna. Although PLS-DA discriminated the genotypes well, the regional characteristics were not obvious in the quadrants, and the SIMCA software could not present the variable influence on projection (VIP) for that model. OPLS-DA maximizes the difference between groups and weakens the difference within groups, which makes it more suitable for separating samples between groups35. Therefore, VIP plots from OPLS-DA were used for further analysis.

OPLS-DA model screening the volatile markers of AEOs

The VIP value and the S-plot were used to identify the key components contributing to the grouping of AEOs. The S-plot is a scatter plot that combines the covariance and correlation loading profiles of an OPLS-DA model36. Variables with a VIP greater than 1 were deemed statistically significant and served as important markers of the model37,38. The VIP values (Fig. 3a) and S-plot (Fig. 3b) generated by the OPLS-DA model revealed 26 components with a VIP value > 1 (Table 2). Components lying far from the origin of the S-plot contributed strongly to the classification and were more reliable than those near the origin as potential markers for distinguishing AEOs from different producing regions. Statistical tests in SPSS were carried out on the significant variables to make the model acceptable. Sesquiterpenes and chromones are the index components of agarwood. The sesquiterpenes α-gurjunene (VIP = 4.86), agarospirol (VIP = 2.86), alloaromadendrene (VIP = 2.49), and (−)-aristolene (VIP = 2.37) and the chromone 2-phenethyl-4H-chromen-4-one (VIP = 2.63) were the components with VIP > 2 between CNA and OCA. α-Gurjunene exhibited the highest VIP value and is especially associated with “Hui-An” agarwoods; OPLS-DA showed that this component did not appear in the CNA (S1–S3) group (Table 2, No. 19). Its VIP value was driven mainly by the 40.82% relative content detected in S5, and such an outstanding VIP value makes the component easy to distinguish among complex plant metabolites. Moreover, most aromatic components carry distinct aromas associated with AEOs39. Bis(2-ethylhexyl) phthalate raises concerns because of its toxicity, and it remains uncertain whether it originated from the AEOs or from pyrolysis during extraction24,25. Because bis(2-ethylhexyl) phthalate accounted for 16.4% of the S4 sample, it also affected the VIP values in the inter-genotype comparison. However, this also clearly highlighted the presence of this component.
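The VIP > 1 criterion can be made concrete with the classical PLS definition of VIP, in which the squared VIP values average to 1 over all variables. The sketch below is an assumption-laden illustration: SIMCA's OPLS-DA implementation may differ in detail, and the weights, explained sums of squares, and compound names are hypothetical rather than taken from Table 2.

```python
import math


def vip_scores(weights, ssy):
    """Classical PLS VIP for each variable.

    weights[a][j] : weight of variable j in component a
    ssy[a]        : sum of squares of Y explained by component a
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # normalise each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ssy))
    return scores


# Hypothetical two-component model with four variables.
names = ["compound_A", "compound_B", "compound_C", "compound_D"]
weights = [[0.8, 0.5, 0.2, 0.1],
           [0.1, 0.3, 0.9, 0.2]]
ssy = [6.0, 2.0]

vip = vip_scores(weights, ssy)
markers = [n for n, v in zip(names, vip) if v > 1.0]  # VIP > 1 threshold
print(markers)
```

Because the squared VIP values sum to the number of variables, a VIP above 1 marks a variable that contributes more than the average variable to the model.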

Table 2 Twenty-six differential volatile compounds of AEOs between CNA and OCA.
Fig. 3 VIP plot (a) and S-plot (b) of AEOs obtained from the OPLS-DA model. The OPLS-DA model showed 26 components (red) with a VIP value > 1. The two positions indicated by arrows are agarospirol (left) and α-gurjunene (right).

AEOs from CNA contained more guaiol, and those from OCA (Malaysia, Vietnam, and Cambodia) contained more α-gurjunene. The total relative content of the differential components was higher in OCA than in CNA. Notably, two sesquiterpenes, α-gurjunene and agarospirol, stood out in the S-plot, lying far from the origin and from the main group of compounds (Fig. 3b). Specifically, α-gurjunene (VIP = 4.86) significantly influenced the grouping of AEO samples and was positively correlated with it. Prior studies have used OPLS-DA models to discriminate between A. sinensis and its subspecies “Chi-Nan” and to identify potential distinguishing components. In those studies, sesquiterpenes, particularly guaiane and eudesmane derivatives, were considered key markers contributing to odoriferous properties37. Similarly, the sesquiterpenes in the AEOs studied here exhibited significant differences, indicating their potential as characteristic components.

OPLS-DA model comparing the pairwise genotypes of AEOs

In this study, OPLS-DA effectively modeled two or more classes. In addition to the CNA and OCA analyses described above, the three agarwood genotypes, A. sinensis, A. malaccensis, and A. crassna, were classified and compared pairwise (Fig. 4). The OPLS-DA models analyzed the differential components among producing regions. For A. sinensis versus A. malaccensis, R2X (cum) = 0.689, R2Y (cum) = 1, and Q2 (cum) = 0.86; for A. sinensis versus A. crassna, R2X (cum) = 1, R2Y (cum) = 1, and Q2 (cum) = 1; and for A. malaccensis versus A. crassna, R2X (cum) = 1, R2Y (cum) = 1, and Q2 (cum) = 1. These values indicate that the models could describe most of the GC-MS data and possessed good predictive ability. The volatile components of AEOs were similar within a genotype but differed between genotypes.

Fig. 4 OPLS-DA models used to distinguish the producing areas of AEOs by pairwise comparison of genotypes: (a) S: A. sinensis versus M: A. malaccensis; (b) S: A. sinensis versus C: A. crassna; (c) M: A. malaccensis versus C: A. crassna.

VIP values and S-plots were used to screen the differential chemical components contributing the most to each pairwise genotype comparison (Fig. 5). Between A. sinensis and A. malaccensis, 25 components had a VIP value > 1, and the sesquiterpene guaiol (VIP = 2.55) was the largest contributor (excluding bis(2-ethylhexyl) phthalate) (Fig. 5a). Between A. sinensis and A. crassna, 25 components had a VIP value > 1, and α-gurjunene (VIP = 5.28) was the largest contributor (Fig. 5c). Between A. malaccensis and A. crassna, 22 components had a VIP value > 1, and α-gurjunene (VIP = 5.03) was again the largest contributor (Fig. 5e).

Fig. 5 VIP plots (left) and S-plots (right) of the OPLS-DA models comparing pairwise genotypes of the AEOs: (a, b) A. sinensis and A. malaccensis; (c, d) A. sinensis and A. crassna; (e, f) A. malaccensis and A. crassna. The arrows mark components that differ significantly in their contribution between the two genotypes. Plasticizers should not be components of AEOs, so they were excluded from the contributing components.

The differential component analysis revealed that AEOs from A. sinensis contained relatively more guaiol and 2-phenethyl-4H-chromen-4-one, whereas those from A. malaccensis contained more of the sesquiterpene 2-(4a,8-dimethyl-2,3,4,5,6,8a-hexahydro-1H-naphthalen-2-yl)propan-2-ol (Fig. 5a and b). In the comparison of A. sinensis and A. crassna, aside from the difference in guaiol, A. sinensis AEOs also contained more agarospirol and 2-phenethyl-4H-chromen-4-one, while A. crassna AEOs contained more of the sesquiterpenes α-gurjunene and alloaromadendrene (Fig. 5c and d). In the comparison of A. malaccensis and A. crassna, α-gurjunene differed significantly in its contribution (Fig. 5e and f); AEOs of A. malaccensis contained more of the sesquiterpene 2-(4a,8-dimethyl-2,3,4,5,6,8a-hexahydro-1H-naphthalen-2-yl)propan-2-ol, while AEOs of A. crassna contained more of the sesquiterpene γ-eudesmol. These pairwise contributions could serve as potential markers for distinguishing AEOs of different species. The results demonstrated that the production regions of AEOs could be better distinguished with chemometrics.

Analysis of such multivariate data requires a methodology capable of handling both the contribution to the OPLS model (concentration variant) and the correlation to the OPLS model (concentration invariant)36. A statistical test in SPSS (P < 0.05) was carried out on the significant variables to make the model acceptable. Based on the above OPLS-DA results, the unique phytochemical characteristics of the various species may be related to the genetic information of the original plant germplasm or of endophytic fungi. The present strategy addresses this complex problem: combining appropriate multivariate modeling with effective visualization yields additional information for identifying specific marker metabolites.

Conclusion

AEOs obtained by hydro-distillation have a significant international market, particularly in Muslim regions. For the first time, we used GC-MS to delineate the chemical fingerprints of AEOs from three primary genotypes, A. sinensis, A. malaccensis, and A. crassna, and analyzed the differences in aroma components across production regions. Metabolomics data typically encompass vast dynamic ranges in metabolite concentration. Here, we reveal distinctive differences in sesquiterpenes, chromone and its derivatives, and low-molecular-weight aromatic compounds. A total of 127 compounds were identified from the AEOs, with sesquiterpenes comprising the majority at 73 components. The aromatic compound 4-phenyl-2-butanone was the sole component common to all seven samples. Additionally, 7 common components, mostly sesquiterpenes and a chromone, occurred frequently: viridiflorol; elemol; γ-eudesmol; (−)-aristolene; agarospirol; 2(3H)-naphthalenone, 4,4a,5,6,7,8-hexahydro-4a,5-dimethyl-3-(1-methylidene)-, (4ar-cis)-; and 2-phenyl-4H-chromen-4-one. It was particularly surprising that the plasticizers bis(2-ethylhexyl) phthalate, diethyl phthalate, and dibutyl phthalate were detected in this study. Plasticizers accounted for about 23.9% of the S4 sample, which points to the poor quality of the S4 essential oil. Other samples showed low levels, likely due to contamination during GC analysis.

PLS-DA and OPLS-DA were employed for multivariate statistical analysis of the differential chemical components between genotypes and habitats. The results demonstrated that AEOs from different habitats could be effectively classified and identified by GC-MS combined with chemometrics. In the OPLS-DA comparison of regional groups (CNA and OCA), the VIP values and S-plot showed a total of 26 potential markers with VIP > 1, including 17 sesquiterpenes, 2 chromones, and 3 aromatics, and the pairwise comparisons of genotypes each yielded up to 25 potential markers. Components such as α-gurjunene, agarospirol, guaiol, γ-eudesmol, and 2-phenethyl-4H-chromen-4-one, which are documented in the agarwood literature, contributed the most to the VIP values. The unique phytochemical characteristics of agarwood may be related to the interaction between the original plant germplasm and invasive microorganisms. The present strategy addresses this complex issue. Multivariate statistical analysis can highlight the indicator components objectively, even when additional chemicals such as plasticizers have been added to reduce product costs. Combining appropriate multivariate modeling with effective visualization therefore provides additional information for identifying specific marker metabolites.

Experimental section

Plant materials

AEOs from seven regions were collected: Guangxi, Hainan, and Taiwan for China, and Vietnam (two sources, A and B), Cambodia, and Malaysia for Southeast Asia. The essential oils had been produced by water or steam distillation and were obtained at the production regions and through purchase from local shops. Six samples were randomly selected from each planting region. See Table 3 for the source information.

Table 3 Source information of 7 regions of AEO samples.

GC–MS analyses of AEOs

Essential oil (30 mg) was accurately weighed into a 5 mL EP tube and dissolved in 2 mL of ethyl acetate (China National Pharmaceutical Group Chemical Reagent Co., Ltd., China). The mixture was shaken well and left to stand for 2 h. A 1 mL aliquot of the essential oil solution was then filtered through a 0.45 μm filter membrane for gas chromatography-mass spectrometry analysis.

The compositions of the essential oils were analyzed on a GCMS-QP2010 Plus (Shimadzu, Tokyo, Japan) equipped with an SH-Rxi-5Sil MS capillary column (30 m × 0.25 mm i.d., 0.25 μm film thickness; Shimadzu, Japan). The temperature program was as follows: initial temperature 90 °C for 2 min, then increased by 2 °C min−1 to 150 °C and held for 5 min, and then increased by 2 °C min−1 to 280 °C and held for 5 min. The other parameters were as follows: injection temperature, 250 °C; ion source temperature, 230 °C; EI, 70 eV; carrier gas, He at 1 mL min−1; injection volume, 1 μL; split ratio, 1:20; solvent delay, 2.5 min; and mass range, m/z 50–550. Quantification was based on percentage peak areas from the gas chromatogram. Individual compounds were identified by searching the NIST2020 (National Institute of Standards and Technology, U.S. Department of Commerce) mass spectral database for matching reference spectra. Chromatographic results, expressed as area percentages, were calculated with a response factor of 1.0.
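As a quick check on the oven program described above, the total run time can be computed from the holds and ramps. The function program_minutes is an illustrative helper of our own; only the temperatures, rates, and hold times come from the text.

```python
def program_minutes(initial_temp, initial_hold, ramps):
    """Total duration (min) of a GC oven temperature program.

    ramps : list of (rate in degC/min, target temperature in degC,
                     hold at target in min)
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# 90 degC for 2 min; 2 degC/min to 150 degC, hold 5 min;
# 2 degC/min to 280 degC, hold 5 min.
total_minutes = program_minutes(90, 2, [(2, 150, 5), (2, 280, 5)])
print(f"Run time: {total_minutes:.0f} min")
```

With these settings each injection takes 107 min, which is worth keeping in mind when planning the replicate injections of the methodological tests.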

Methodological examination

Precision test

One region (for example, S1) was randomly selected from the S1–S7 AEOs of different sources (Table 3). Equal amounts of the six samples from that region were drawn and thoroughly mixed to form one sample. The test solution was prepared according to the preprocessing description above, and GC-MS analysis was conducted under the chromatographic and mass spectrometric conditions described above. The analysis was repeated six times on the mixed S1 sample. The six chromatograms were compared using the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2012) (Chinese Pharmacopoeia Commission, China), and a similarity of no less than 0.99 indicated good precision of the instrument.

Repeatability test

For the repeatability test, AEO samples from the same source (for example, S1) were used. Six test solutions were prepared in parallel according to the steps described above, with each sample weighed precisely. GC-MS analysis was conducted as described above. The six chromatograms were compared using the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine, and a similarity of no less than 0.99 indicated good repeatability of the method.

Stability test

One sample from S1–S7 was randomly selected (for example, S1) and prepared as a single test solution following the preprocessing steps. The solution was analyzed by GC-MS after storage for 2, 4, 6, 8, 12, and 24 h. The six chromatograms were compared using the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine, and a similarity of no less than 0.99 indicated that the test solution was stable within 24 h.
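The 0.99 similarity criterion used in the precision, repeatability, and stability tests can be illustrated with a cosine (vector angle) similarity between peak-area vectors. This is an assumption: the exact algorithm of the Similarity Evaluation System is not described here, and the replicate peak areas and the helper names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two peak-area vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes(replicates, threshold=0.99):
    """True if every replicate matches the mean fingerprint at >= threshold."""
    reference = [sum(col) / len(replicates) for col in zip(*replicates)]
    return all(cosine_similarity(r, reference) >= threshold for r in replicates)


# Hypothetical peak areas (same peaks, three replicate injections).
replicate_areas = [
    [10.0, 5.0, 1.0],
    [10.2, 4.9, 1.1],
    [9.9, 5.1, 0.9],
]
print(passes(replicate_areas))
```

Comparing each replicate with the mean fingerprint is one simple design choice; pairwise comparison between replicates would be an equally reasonable alternative.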

Data processing

Each experiment was repeated three times. The volatile components of the samples were qualitatively analyzed by mass spectrometry against the NIST2020 database. Peak area normalization was used to calculate the relative percentage content. Substances with a similarity greater than 85% were identified as potential chemical components of AEOs. In the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2012), automatic matching was performed with a time window of 0.2 through multi-point correction using the median method. The similarity and common peaks between each sample and the reference map were calculated, and a GC-MS fingerprint map was constructed.
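Peak area normalization and the 85% similarity filter can be sketched as follows. The peak names, areas, and match scores are hypothetical, and the function names are our own; the sketch assumes a response factor of 1.0 for every peak, as stated in the GC-MS method.

```python
def relative_content(peaks):
    """Relative content (%) of each peak by area normalization.

    peaks : list of (name, peak area, library match score in %)
    """
    total_area = sum(area for _, area, _ in peaks)
    return {name: 100.0 * area / total_area for name, area, _ in peaks}


def identified_share(peaks, min_match=85):
    """Share (%) of total area held by peaks with match score > min_match."""
    content = relative_content(peaks)
    return sum(content[name] for name, _, match in peaks if match > min_match)


# Hypothetical integration result for one sample.
peaks = [
    ("peak_1", 5000.0, 93),
    ("peak_2", 3000.0, 88),
    ("peak_3", 2000.0, 71),  # below the 85% threshold, left unidentified
]
content = relative_content(peaks)
share = identified_share(peaks)
print(content, share)
```

The identified share corresponds to the kind of figure reported in the results, where identified compounds accounted for only part of the total volatile components.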

Multivariate analysis

SIMCA 14.1 software (Umetrics Co., Sweden) was used for multivariate data analysis. The compound data were normalized, and multivariate statistical analysis was then performed with the PLS-DA and OPLS-DA modules. PLS-DA and OPLS-DA were used for discrimination and for deriving potential markers (VIP score > 1)36. Univariate statistical analysis was used to confirm the differentially expressed features (p < 0.05). Finally, cluster analysis was carried out in SPSS 27.0 data processing software. The cluster analysis used between-groups linkage with the Euclidean distance as the measure between samples to determine the differences between the producing regions and species of AEOs.
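The clustering step can be illustrated with a minimal pure-Python sketch of agglomerative clustering using Euclidean distance and average linkage, which we take to correspond to SPSS's between-groups linkage. The sample profiles below are hypothetical two-variable vectors, not data from this study, and the function names are our own.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two composition profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with between-groups (average) linkage.

    profiles : dict mapping sample name -> list of relative contents
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def cluster_distance(c1, c2):
        # Mean distance over all pairs of members from the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical profiles: A and B are similar, C and D are similar.
profiles = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0], "D": [5.0, 7.0]}
merges = average_linkage(profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Samples that merge at small distances have similar composition profiles, so the merge order gives the same information as reading a dendrogram from the bottom up.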