Introduction

Camellia oleifera Abel. (Theaceae) is an evergreen shrub or small tree species valued for its high oil content and substantial economic importance1,2,3. Alongside oil palm, olive, and coconut, C. oleifera is among the world's foremost woody plants for edible oil production4. Current cultivars include C. oleifera, C. meiocarpa, C. vietnamensis, C. reticulata, C. chekiangoleosa, C. yuhsienensis, C. semiserrata, C. polyodonta, C. octopetala, C. gigantocarpa, C. semiserrata var. albiflora, and C. nanyonggenesis, totaling 13 varieties5. The species predominantly thrives between 18° 28′ and 34° 34′ N, and 100° 0′ and 122° 0′ E, at elevations ranging from 100 to 2400 m above sea level. Major cultivation regions include the Yangtze River and Pearl River basins in South China, with Hunan, Jiangxi, and Guangxi provinces comprising 76.2% of the country's total cultivation area. Additionally, cultivation occurs on a smaller scale in Vietnam, Burma, Thailand, Malaysia, and Japan6,7. Camellia oil, a traditional vegetable oil in southern China, is prized for its excellent fatty acid profile and stable composition8. The fatty acid composition across C. oleifera varieties includes palmitic (C16:0, 7.68–10.01%), palmitoleic (C16:1, 0.14–0.55%), stearic (C18:0, 1.46–2.97%), oleic (C18:1, 75.78–81.39%), linoleic (C18:2, 4.85–10.79%), alpha-linolenic (C18:3, 0.30–1.11%), eicosenoic (C20:1, 0.68–0.97%), and nervonic acids (C24:1, 0.08–0.36%)9. With over 90% unsaturated fatty acids, Camellia oil boasts a higher oleic acid content than other oil-producing tree species10. It also contains numerous health-promoting components and exhibits pharmacological properties such as anti-tumor, anti-inflammatory, and cholesterol-lowering effects10. Moreover, Camellia oil includes various bioactive compounds like triterpenes, vitamins, squalene, tocopherols, carotenoids, saponins, and glycerides11. 
By-products such as tea seed cake, tea saponin, and tea seed husks hold considerable value for applications in daily-use chemical products, medicine, agriculture, and industry12.

Fruit quality encompasses external characteristics such as size, shape, color, and uniformity, as well as internal attributes like soluble solids, sugar–acid ratio, vitamin C, anthocyanins, flavonoids, and other functional components13. Traditionally, the evaluation of C. oleifera fruits has focused primarily on yield and oil content. However, recent efforts aim to establish a systematic and comprehensive procedure for assessing the fatty acid composition, physicochemical properties, and nutritionally functional components of Camellia oil, which is essential for selecting and breeding high-quality C. oleifera varieties. Standard methods for assessing economic traits include factor analysis14, principal component analysis (PCA)15, grey relational analysis16, and the technique for order preference by similarity to ideal solution17. This experiment employed an integrated approach combining PCA and cluster analysis, which has proven effective in evaluating the quality of various fruits, including plums (Prunus salicina Lindl.)18, peaches (Prunus persica (L.) Batsch)19, apples (Malus pumila Mill.)20, grapes (Vitis vinifera L.)21, and pears (Pyrus spp)22.

C. oleifera is generally unsuitable for high-altitude cultivation due to its dependence on insect cross-pollination23,24. In such regions, early autumn temperature drops can adversely affect pollinating insect activity during the flowering period25, thereby impacting fruit set. Altitude also influences the quality of C. oleifera fruits. A survey of 34 locations in major C. oleifera-producing areas of China indicated a significant negative correlation between latitude and stearic acid content, and a positive correlation with oleic acid and crude fat contents in the seed oil26. In samples of the 'Cenxi' soft branch C. oleifera variety from ten locations in Guangxi, China, longitude did not significantly affect fatty acid composition. However, with increasing latitude and altitude, oleic acid content increased while linoleic acid content decreased27.

C. oleifera, extensively distributed across various climatic zones in China, exhibits significant biodiversity, with many superior varieties predominantly concentrated in low-altitude areas28,29. Limited research on C. oleifera fruit in high-altitude regions impedes its large-scale development in these areas. Identifying varieties suitable for high-altitude cultivation is essential to expanding the cultivation area and increasing Camellia oil production. This study investigated 48 wild C. oleifera germplasms from the high-altitude regions of East Guizhou Province, analyzing their commercially relevant traits to identify high-quality germplasms. The results provide a theoretical and practical foundation for the selection and hybrid breeding of superior C. oleifera varieties for high-altitude areas.

Materials and methods

Overview of experimental site and materials

Experimental site

The experimental site is located in the Tongren region of East Guizhou Province (107° 45′–109° 30′ E, 27° 07′–29° 05′ N), China, characterized by a mid-subtropical, monsoon-influenced, humid climate. The climate displays significant vertical variation. Annual sunshine duration ranges from 1044.7 to 1266.2 h, with an average temperature between 13.5 and 17.6 °C. The onset of average temperatures above 10 °C occurs in late March, and this persists until late November, providing 250 days with a cumulative temperature of 5300 °C. The region receives an average annual precipitation of 1110 to 1410 mm and has a frost-free period ranging from 275 to 317 days, indicating a climate with abundant heat, sunlight, and rainfall.

Experimental materials

The germplasm resources of C. oleifera utilized in this study were mature wild trees from high-altitude regions (above 1000 m) in East Guizhou Province, China. Figures 1 and 2, as well as Table 1, present the fundamental data. In November 2023, during the fruit ripening period, 30 mature fruits with slightly cracked pericarps were randomly collected from the periphery of each designated superior tree. The samples were packaged and labeled with "QD-" followed by a serial number, where "QD" is the code for eastern Guizhou Province.

Figure 1. Mature Fruits of the 48 Germplasms of C. oleifera.

Figure 2. Mature Seeds of the 48 Germplasms of C. oleifera.

Table 1 Descriptive Statistics for the 15 Traits of Camellia oleifera Fruits.

Experimental methods

Measurement of the seed and fruit phenotypic traits

The color of each mature fresh fruit from individual plants was documented and photographed. The transverse and longitudinal diameters, along with the peel thickness of each fruit, were measured using an electronic digital caliper. The averages of these measurements were used to calculate the fruit shape index. An electronic balance was used to measure the single fruit weight of the 30 sample fruits from each of the 48 germplasms. Fresh seeds were peeled to determine the total fresh seed, dry seed, and dry kernel weights with a precision of 0.01 g. Subsequently, the fresh seed, dry seed, and dry kernel yields per fruit were calculated. The formulae used were:

$$\begin{aligned} \text{Fruit shape index} &= \frac{\text{Fruit longitudinal diameter}}{\text{Fruit transverse diameter}} \\ \text{Fresh seed yield } (\%) &= \frac{\text{Weight of fresh seeds}}{\text{Weight of fresh fruit}} \times 100 \\ \text{Dry seed yield } (\%) &= \frac{\text{Weight of dry seeds}}{\text{Weight of fresh seeds}} \times 100 \\ \text{Dry kernel yield } (\%) &= \frac{\text{Weight of dry kernels}}{\text{Weight of dry seeds}} \times 100 \end{aligned}$$
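The four formulae above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample weights are hypothetical, and only the two mean diameters (21.25 mm and 23.29 mm) are taken from the Results.

```python
def fruit_shape_index(longitudinal_mm, transverse_mm):
    """Fruit shape index = longitudinal diameter / transverse diameter."""
    return longitudinal_mm / transverse_mm


def yield_rate(part_weight_g, whole_weight_g):
    """Generic yield rate (%) = (part weight / whole weight) x 100."""
    return part_weight_g / whole_weight_g * 100


# Mean diameters reported in the Results section.
shape = fruit_shape_index(21.25, 23.29)

# Hypothetical weights (g) for one fruit, for illustration only.
fresh_fruit, fresh_seeds, dry_seeds, dry_kernels = 6.0, 3.0, 1.8, 1.2

fresh_seed_yield = yield_rate(fresh_seeds, fresh_fruit)   # fresh seeds / fresh fruit
dry_seed_yield = yield_rate(dry_seeds, fresh_seeds)       # dry seeds / fresh seeds
dry_kernel_yield = yield_rate(dry_kernels, dry_seeds)     # dry kernels / dry seeds

print(round(shape, 3), fresh_seed_yield, round(dry_seed_yield, 2), round(dry_kernel_yield, 2))
```

Note that each yield rate uses the preceding processing stage as its denominator, so the three percentages cannot be compared directly with one another.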

Determination of oil and fatty acid contents

(1) Oil content

The oil content in Camellia seed kernels was quantified via Soxhlet extraction. Initially, the seeds were dried at 80 °C for 24 h and subsequently ground into a fine powder. Approximately 10 g of this sample was wrapped in a dry 12 × 12 cm filter paper whose weight had been recorded beforehand (W0). The sample was dried to a constant weight and, after cooling, the total weight of the filter paper and sample was measured precisely (W1). This package was then placed in a Soxhlet extractor and subjected to petroleum ether extraction for 10 h. Post-extraction, the package was dried at 105 °C and its final weight was documented (W2). The oil content was calculated using the formula:

$$\text{Oil content } (\%) = \frac{\text{W1} - \text{W2}}{\text{W1} - \text{W0}} \times 100$$
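As a worked illustration of the oil content formula, the sketch below assumes that W0 is the weight of the filter paper alone, W1 the paper plus dried sample, and W2 the paper plus defatted sample. The function name and the example weights are hypothetical.

```python
def oil_content_percent(w0, w1, w2):
    """Oil content (%) = (W1 - W2) / (W1 - W0) x 100.

    w0: filter paper alone (g)
    w1: filter paper + dried sample before extraction (g)
    w2: filter paper + sample after extraction and drying (g)
    """
    sample_weight = w1 - w0
    if sample_weight <= 0:
        raise ValueError("W1 must exceed W0 (sample weight must be positive)")
    return (w1 - w2) / sample_weight * 100


# Hypothetical weights: 10 g of sample that loses 5 g of oil during extraction.
print(oil_content_percent(w0=1.5, w1=11.5, w2=6.5))  # 50.0
```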

(2) Fatty acid composition and content

Fatty acid composition and content were analyzed via the alkaline methylation method, with components identified by gas chromatography (GC) through peak area normalization. The protocol involved placing 4 g of the sample in a conical flask, followed by the addition of 40 mL methanol and 2 mL of 0.5 mol/L KOH–methanol. This mixture was refluxed in a 75 °C water bath until it turned clear. After cooling, the mixture was transferred to a separatory funnel. The flask was rinsed with 20 mL n-heptane, and the rinse was also added to the funnel along with 40 mL distilled water. Vigorous shaking ensured thorough mixing, resulting in two distinct layers: an upper lipid layer and a lower aqueous layer. Further extraction with an additional 20 mL n-heptane ensured complete separation of the lipid layer. The lipid-containing n-heptane solution was repeatedly rinsed until neutral, and the ester layer was isolated. The ester layer was then dried with anhydrous sodium sulfate and filtered. The drying agent was rinsed with a small volume of n-heptane, the rinse was combined with the ester solution, and the final volume was adjusted to 100 mL with n-heptane.

GC conditions included the use of a flame ionization detector (FID) and an SP2340 chromatographic column with dimensions of 60 m × 0.25 mm × 0.2 μm. The temperature program commenced at 50 °C for 2 min, then ramped to 170 °C at 10 °C/min, maintained for 10 min, followed by an increase to 180 °C at 2 °C/min, held for 10 min, and finally reached 220 °C at 4 °C/min, sustained for 22 min. Injector and detector temperatures were set at 250 °C and 300 °C, respectively. N2 served as the carrier gas with a split ratio of 1:50 and an injection volume of 1 μL.

Fatty acid components were identified by comparing retention times with standards, and their relative content was quantified via the area normalization method. The experiment was conducted in triplicate, and mean values were calculated for each component.
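The area normalization step described above can be sketched as follows. Each fatty acid's relative content is taken as its peak area divided by the summed area of all identified peaks, and the triplicate runs are then averaged. The function names and peak areas are hypothetical, and no detector response factors are applied, which is an assumption of this sketch.

```python
from statistics import mean


def area_normalize(peak_areas):
    """Convert raw GC peak areas into relative contents (%) that sum to 100."""
    total = sum(peak_areas.values())
    return {name: area / total * 100 for name, area in peak_areas.items()}


def mean_of_replicates(replicates):
    """Average the relative content of each component across replicate runs."""
    names = replicates[0].keys()
    return {name: mean(run[name] for run in replicates) for name in names}


# Hypothetical peak areas (arbitrary units) from three injections of one sample.
runs = [
    {"oleic": 810.0, "linoleic": 80.0, "palmitic": 90.0, "stearic": 20.0},
    {"oleic": 805.0, "linoleic": 82.0, "palmitic": 92.0, "stearic": 21.0},
    {"oleic": 812.0, "linoleic": 79.0, "palmitic": 89.0, "stearic": 20.0},
]

relative = mean_of_replicates([area_normalize(run) for run in runs])
for name, pct in relative.items():
    print(f"{name}: {pct:.2f}%")
```

Normalizing each run before averaging keeps small differences in injected amount from affecting the reported percentages.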

Data analysis and processing and image processing

Data analysis utilized WPS Office 2019 (https://www.wps.com/), while image processing employed Adobe Photoshop 2020 (https://www.adobe.com/in/products/photoshop.html). Pearson correlation analysis and PCA were performed with SPSS Statistics 25 (https://www.ibm.com/).

The weight coefficient (Pi) for each principal component (PC) was determined using the following formula:

$$P_{i} = \frac{C_{i}}{\sum\nolimits_{j = 1}^{n} C_{j}} \quad (i = 1,2,3, \ldots ,n)$$

where Ci denotes the contribution rate of the ith PC and n is the number of extracted PCs.

The comprehensive score (Y) for each germplasm was calculated using the following formula:

$$Y = \sum\nolimits_{i = 1}^{n} P_{i} Z_{i}$$

where Pi represents the weight coefficient of the ith PC, and Zi signifies the score of the ith PC for that germplasm.
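The two formulae above amount to a contribution-weighted average of the PC scores. A minimal sketch, using hypothetical contribution rates and PC scores rather than the values from this study:

```python
def pc_weights(contribution_rates):
    """Weight of each PC = its contribution rate / sum of all contribution rates."""
    total = sum(contribution_rates)
    return [c / total for c in contribution_rates]


def comprehensive_score(weights, pc_scores):
    """Comprehensive score Y = sum of (weight x PC score) over all retained PCs."""
    if len(weights) != len(pc_scores):
        raise ValueError("weights and pc_scores must have the same length")
    return sum(p * z for p, z in zip(weights, pc_scores))


# Hypothetical contribution rates (%) of three retained PCs.
weights = pc_weights([50.0, 30.0, 20.0])      # normalized so the weights sum to 1

# Hypothetical PC scores of one germplasm.
y = comprehensive_score(weights, [2.0, -1.0, 0.5])
print(weights, round(y, 3))
```

Because the weights are normalized to sum to 1, Y stays on the same scale as the individual PC scores, which makes the ranking easy to compare across germplasms.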

Results and analysis

Descriptive statistical results of the main traits of all 48 germplasms

Descriptive statistics were performed on the primary traits of mature fruits from 48 C. oleifera germplasms (Table 2). The germplasms were categorized by pericarp color: 1 with a red pericarp, 10 with red-yellow, 14 with red-cyan, and 13 with green, with 4 germplasms appearing in both the green pericarp category. According to Table 1, the average fruit weight among the 48 germplasms was 6.03 g. The QD-48 germplasm exhibited the heaviest individual fruit weight at 11.37 g, whereas QD-7 had the lightest at 3.01 g. The mean transverse and longitudinal diameters were 23.29 mm and 21.25 mm, respectively, with the maximum observed in QD-38. The average peel thickness was 2.8 mm, with QD-18 having the thinnest peel at 1.83 mm. The average dry yield was 63.04%, with QD-45 achieving the highest dry yield at 78.67% and QD-28 the lowest at 53.06%. The mean oil content of seed kernels was 49.88%, peaking at 53.70% in QD-11; 23 germplasms exhibited oil content above 50%. In summary, optimal trait values were distributed among various germplasms, indicating a need for comprehensive evaluation to assess productivity effectively.

Table 2 Descriptive statistics of fruit color of Camellia oleifera.
Table 3 Correlation among the C. oleifera Traits.

The 48 germplasms exhibited varying degrees of variation across four phenotypic and 11 quality traits, with coefficients of variation (CV) ranging from 1.37 to 44.53% (Table 1). The average CVs for phenotypic and quality traits were 25.36% and 9.86%, respectively. A CV exceeding 15% indicates substantial variability in fruit traits. Notably, four traits—single fruit weight, peel thickness, and fresh and dry seed yield rates—had CVs above 15%. The highest CV was observed in single fruit weight at 44.53%, followed by peel thickness at 26.31%. Conversely, 11 traits had CVs below 15%, with oleic acid content exhibiting the lowest variability. These 11 traits were more stable, indicating a lower potential for genetic improvement.
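For reference, the coefficient of variation used above is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n − 1 denominator) and uses hypothetical trait values; the 15% threshold is the one stated in the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean x 100."""
    return stdev(values) / mean(values) * 100


def is_highly_variable(values, threshold=15.0):
    """Flag a trait whose CV exceeds the 15% threshold used in the text."""
    return coefficient_of_variation(values) > threshold


# Hypothetical measurements for two traits across a few germplasms.
single_fruit_weight_g = [3.0, 4.5, 6.0, 8.2, 11.4]
oleic_acid_pct = [80.9, 81.5, 82.0, 82.3, 81.8]

for name, values in [("single fruit weight", single_fruit_weight_g),
                     ("oleic acid content", oleic_acid_pct)]:
    cv = coefficient_of_variation(values)
    print(f"{name}: CV = {cv:.2f}%, highly variable: {is_highly_variable(values)}")
```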

Correlation analysis of traits

The correlation analysis of the 15 fruit traits (Table 3) identified varying degrees of association. In the high-altitude regions of East Guizhou Province, a significant correlation emerged between single fruit weight and fruit shape. Conversely, fresh seed, dry seed, and dry kernel yields, along with fatty acid components, exhibited weaker associations with phenotypic traits. The correlation between dry seed rate and kernel oil content was highly significant (r = 0.415), and dry seed rate was also significantly correlated with palmitic acid (r = 0.311) and linoleic acid (r = 0.317) contents. Thus, the dry seed rate of C. oleifera in high-altitude areas of East Guizhou was positively correlated with all three traits. Among the fatty acids, the oil content of seed kernels was significantly correlated only with cis-11-eicosenoic acid, and this correlation was negative. Palmitoleic acid exhibited a marked negative correlation with single fruit weight, fruit longitudinal diameter, and peel thickness, and most fatty acids were significantly correlated with one another.
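To make the significance levels above concrete, the sketch below computes a Pearson coefficient and converts a reported r into a t statistic with n − 2 degrees of freedom. The function names are hypothetical. The critical values quoted in the comments (about 2.01 at p = 0.05 and about 2.69 at p = 0.01, two-tailed, 46 degrees of freedom) are approximate standard table values, not figures from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Coefficients reported in the text, with n = 48 germplasms.
for label, r in [("dry seed rate vs oil content", 0.415),
                 ("dry seed rate vs palmitic acid", 0.311),
                 ("dry seed rate vs linoleic acid", 0.317)]:
    t = t_statistic(r, 48)
    # Approximate two-tailed critical values for 46 df: 2.01 (0.05), 2.69 (0.01).
    level = "p < 0.01" if t > 2.69 else "p < 0.05" if t > 2.01 else "n.s."
    print(f"{label}: r = {r}, t = {t:.2f}, {level}")
```

Under these approximate thresholds, r = 0.415 clears the 0.01 level while r = 0.311 and r = 0.317 clear only the 0.05 level, which matches the "highly significant" and "significant" wording in the text.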

Cluster analysis

Hierarchical clustering analysis, utilizing the between-groups linkage method and squared Euclidean distance, was conducted based on the standardized results of 15 traits of C. oleifera fruits. The analysis of 48 C. oleifera germplasms generated a dendrogram (Fig. 3), dividing the germplasms into five categories at a squared Euclidean distance of 14.

Figure 3. A clustering dendrogram of the 48 Camellia oleifera germplasms.
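The clustering procedure described above (standardized traits, squared Euclidean distance, between-groups linkage) corresponds to average-linkage agglomerative clustering. The following pure-Python sketch illustrates the idea on hypothetical data. It is not the SPSS routine used in the study, and because statistical packages may rescale dendrogram distances, the cut value here is purely illustrative.

```python
from statistics import mean, stdev


def standardize(rows):
    """Z-score each column (trait) so all traits share a common scale."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two trait vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage_clusters(rows, cut):
    """Merge the closest pair of clusters until their distance exceeds `cut`.

    The distance between clusters is the mean squared Euclidean distance over
    all cross-cluster pairs of members (between-groups linkage).
    """
    clusters = [[i] for i in range(len(rows))]

    def between_groups(c1, c2):
        total = sum(sq_euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (between_groups(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > cut:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical two-trait data for five samples (already on a common scale).
data = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
print(average_linkage_clusters(data, cut=4))  # [[0, 1], [2, 3], [4]]
```

With real trait tables, `standardize` would be applied first so that traits measured in grams, millimetres, and percentages contribute comparably to the distances.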

Table 4 presents the average performance of the 15 traits across these five groups. The first group, with small fruits, exhibited significantly higher seed kernel oil content, making it ideal for breeding high oil content germplasm. The second group, characterized by small, mostly ovate fruits, showed markedly higher α-linolenic acid content and average levels of other traits, making it suitable for breeding germplasm rich in α-linolenic acid and with high dry seed yield. The third group, distinguished by large, primarily spherical fruits, had notably higher peel thickness, dry seed yield rate, and contents of palmitic acid, palmitoleic acid, linoleic acid, and cis-11-eicosenoic acid, but lower fresh seed and dry kernel yield rates, as well as reduced levels of stearic acid, oleic acid, and α-linolenic acid; it is suitable for breeding germplasm rich in fatty acids. The fourth group had lower peel thickness, dry seed yield rate, and contents of palmitic acid, palmitoleic acid, linoleic acid, and cis-11-eicosenoic acid, but higher dry kernel yield rate and contents of stearic acid and oleic acid, making it suitable for selecting germplasm rich in stearic and oleic acids. The fifth group exhibited a significantly higher fresh seed yield rate but lower seed kernel oil content, making it suitable for breeding germplasm with excellent phenotypic characteristics.

Table 4 Average trait performance by group.

Principal component analysis

The data for the 15 fruit traits were standardized using SPSS 20.0 to minimize the impact of different scales. PCA was then performed, employing eigenvalues greater than 1 as the extraction criterion, resulting in five principal components (PCs). These PCs accounted for 73.81% of the total variance, indicating that most of the information in the original data was retained. The results are detailed in Table 5 and Figs. 4 and 5.
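The extraction rule above (standardize, then keep components whose eigenvalue exceeds 1) can be sketched with NumPy as shown below. This is an illustrative reimplementation on random placeholder data, not the SPSS procedure used in the study, and the function name `pca_kaiser` is hypothetical. Since the eigenvalues of a correlation matrix of 15 traits sum to 15, an eigenvalue of 3.57 corresponds to 3.57 / 15 ≈ 23.8% of the variance, which agrees with the figure reported for PC1.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    Returns (eigenvalues, contribution rates in %, loadings) for the kept PCs.
    """
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)           # equivalent to PCA on z-scores
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    contribution = eigvals / eigvals.sum() * 100  # % of total variance per PC
    keep = eigvals > 1.0                          # eigenvalue-greater-than-1 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], contribution[keep], loadings


# Placeholder data: 48 samples x 15 traits of random numbers (not study data).
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(48, 15))

kept_eigvals, kept_contrib, kept_loadings = pca_kaiser(placeholder)
print(len(kept_eigvals), "components kept;",
      f"cumulative contribution = {kept_contrib.sum():.2f}%")
```

The sign of each eigenvector is arbitrary, so loadings from different software may differ by a factor of −1 per component without changing the interpretation.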

Table 5 Component loadings, eigenvalues, and contribution rates.
Figure 4. 3D Plots of PC1, PC2, and PC3 of the 48 Camellia oleifera Germplasms.

Figure 5. 3D Plots of PC1, PC4, and PC5 of the 48 Camellia oleifera Germplasms.

The first principal component (PC1), including single fruit weight, longitudinal and transverse diameters, peel thickness, and oleic acid content, had an eigenvalue of 3.57 and explained 23.8% of the variance. High positive loadings for these traits signified that PC1 primarily represented phenotypic characteristics, suggesting that fruit characterization is paramount in evaluating high-quality C. oleifera fruits in East Guizhou's high-altitude regions. The second principal component (PC2) comprised dry seed yield rate and the contents of palmitic, linoleic, α-linolenic, and cis-11-eicosenoic acids. PC2 highlighted quality-related traits associated with these acids, with notable positive loadings for linoleic acid and dry seed yield rate, emphasizing their significance as evaluation factors. The third principal component (PC3) was dominated by seed kernel oil content, underscoring the importance of oil content. The fourth principal component (PC4) included fresh seed and dry kernel yield rates, with fresh seed yield rate exhibiting a higher loading, indicating its relevance. The fifth principal component (PC5), consisting of palmitoleic acid and stearic acid, had an eigenvalue of 1.12 and explained 7.45% of the variance. Palmitoleic acid showed the more pronounced positive loading on PC5, underscoring its value as an indicator.

Comprehensive evaluation of fruit quality

Considering the varying variance contribution rates of each PC, a comprehensive evaluation was performed by deriving the corresponding PC score coefficients through the ratio of component eigenvectors to the square root of their eigenvalues. Using these score coefficients and the standardized values of the quantitative traits, comprehensive scores for each germplasm were calculated, with the contribution rates of each PC serving as weights.

This model facilitated the calculation and ranking of comprehensive scores for each variety, as detailed in Tables 6 and 7. The top ten superior germplasms were ranked as follows: QD-33 > QD-34 > QD-48 > QD-38 > QD-27 > QD-15 > QD-35 > QD-5 > QD-14 > QD-36. Among the 48 germplasms, QD-33 achieved the highest score (1.92), followed by QD-34 (1.12) and QD-48 (1.11). The bottom five germplasms were QD-7 < QD-23 < QD-16 < QD-18 < QD-42, with scores of −2.08, −1.54, −1.43, −1.35, and −1.14, respectively.

Table 6 Principal component scores and comprehensive score ranking of the 48 C. oleifera Germplasms.
Table 7 Specific phenotypic traits of the top ten ranked C. oleifera Germplasms.

The top three germplasms in PC1 were QD-48, QD-33, and QD-5, with scores of 3.89, 3.38, and 2.93, respectively, demonstrating exceptional performance in single fruit weight, and transverse and longitudinal diameters. In PC2, the leading germplasms were QD-8, QD-33, and QD-2, with scores of 3.89, 3.33, and 2.86, respectively, indicating superior quality traits in palmitic acid, linoleic acid, α-linolenic acid, and cis-11-eicosenoic acid contents. For PC3, the top germplasms were QD-10, QD-36, and QD-11, with scores of 3.17, 2.46, and 2.18, respectively, distinguishing their fruits by oil content. The leading performers in PC4 were QD-9, QD-45, and QD-48, with scores of 2.73, 2.58, and 2.31, respectively, indicating outstanding fresh seed and dry kernel yield rates. In PC5, the top germplasms were QD-20, QD-38, and QD-25, with scores of 2.55, 2.01, and 1.85, respectively, excelling in palmitoleic acid and stearic acid levels.

Discussion

The economically significant traits and oil quality of fruits are foundational for the commercial breeding of forestry species30. As an oilseed species, C. oleifera is typically evaluated based on yield and oil content. Its fatty acid composition resembles that of other high-quality edible oil-yielding tree species, such as walnuts and olives31. The seed kernels of C. oleifera are rich in oil, predominantly composed of unsaturated fatty acids, including oleic, linoleic, alpha-linolenic, palmitoleic, and cis-11-eicosenoic acids7. Oleic acid, often termed the "safe fatty acid" by nutritionists, lowers cholesterol and prevents cardiovascular diseases32. As a key indicator for assessing edible oil quality and oxidative stability, oleic acid content is paramount in evaluating the economically relevant characteristics of C. oleifera33. Notably, C. oleifera oil contains a higher oleic acid content than olive oil31. Linoleic acid, which the human body cannot synthesize, must be obtained through diet. Although it can lower blood cholesterol, excessive intake may increase blood viscosity and cause vascular spasms. General edible oils, such as soybean, corn, and sesame oils, have relatively high linoleic acid content, ranging from approximately 35–55%34. Previous studies suggest that high oleic or alpha-linolenic acid and low linoleic acid contents are beneficial to human health35. Therefore, reducing linoleic acid content is essential for enhancing oil quality to support human health.

C. oleifera, a cross-pollinated plant with extensive distribution and diverse germplasm resources, has evolved over a long period27. Geographical and climatic conditions across cultivation areas significantly impact seed weight and fruit size, with altitude being a primary factor influencing phenotypic traits36. At higher altitudes, fruits are generally smaller due to lower temperatures, while at lower altitudes, they are larger. The levels of unsaturated fatty acids, such as oleic and linoleic acids, vary significantly with geographical and climatic variations. Significant temperature differences greatly enhance oleic acid accumulation in C. oleifera seed oil26. In this study, the dry seed yield, kernel oil content, and oleic acid content of the 48 C. oleifera samples were 60.75%, 49.88%, and 81.85%, respectively, showing advantages over other reported germplasms, including germplasms from Changsha, Hunan37; the 'Changlin' and 'Xianglin' series from Taihu, Anhui38; 'Gan' series germplasms39; 13 high-quality C. yuhsienensis germplasms40; and 9 high-quality C. weiningensis Li et al.41. Additionally, correlation analysis indicated a negative association between oleic and linoleic acid contents among the fruits of the 48 C. oleifera germplasms from high-altitude areas of East Guizhou Province, suggesting excellent commercial qualities and enhanced potential for development and utilization.

This experiment employed a comprehensive evaluation method integrating PCA and cluster analysis. PCA identified the composition of PCs based on eigenvectors and cumulative contribution rates, utilizing PC values as indices to select superior varieties. This approach offered an accurate assessment of each trait's overall performance, providing significant theoretical and practical insights42. Cluster analysis categorized evaluation factors based on the similarity of fruit traits. Including more traits in cluster analysis enhances the comprehensive assessment of the varieties43. Cluster analysis elucidates genetic relationships between germplasm resources and examines associations between various traits and germplasm resources; it provides a theoretical foundation for exploiting heterosis and is widely used in germplasm resource research. In this study, five PCs were extracted, with a cumulative contribution rate of 73.81%, serving as a comprehensive index for evaluating the fruit traits of 48 C. oleifera germplasm resources. Hierarchical cluster analysis based on squared Euclidean distances classified the 48 germplasms into five categories. Combining PCA and cluster analysis offers a more scientific assessment of fruit quality. Cross-verification of cluster analysis, PCA, and germplasm comprehensive ranking ensures the accuracy of the analysis. In this study, QD-33 ranked first in the comprehensive germplasm ranking, and cluster analysis placed QD-33 in the third group, which had a higher dry seed rate than the other four groups. The PC2 score of QD-33 was higher than that of the other 47 germplasms, and the percentage of dry seed in PC2 was higher than in the other four PCs. These results aligned with actual observations, indicating the suitability of the comprehensive evaluation method for the 48 C. oleifera germplasms in the high-altitude area of East Guizhou Province, China.
Considering the final comprehensive scores and the ranking of elite genotypes, those with high single fruit weight, fresh seed yield rate, and oil, palmitoleic, and linoleic acid levels were identified as superior. However, as the experimental period was only 1 year, evaluating the trait performance of C. oleifera clones over multiple years, with numerous varieties, and across extensive areas is necessary.

This study utilized PCA and cluster analysis to objectively evaluate 15 traits of C. oleifera fruits. The PCA identified the top ten elite varieties based on comprehensive quality: QD-33, QD-34, QD-48, QD-38, QD-27, QD-15, QD-35, QD-5, QD-14, and QD-36, which are recommended for the development of superior germplasm resources. These results provide a theoretical foundation for breeding high-quality C. oleifera varieties and offer a reference for selecting premium germplasm resources, particularly addressing the shortage of varieties suitable for high-altitude regions. However, additional observation and experimentation are necessary to assess their stress resistance and yield stability.