Abstract
Drought is one of the major environmental issues that reduce crop yield. Seed germination is a crucial stage of plant development in all crop plants, including soybean. In soybean breeding, information about genetic mechanism of drought tolerance has great importance. However, at germination stage, there is relatively little knowledge on the genetic basis of soybean drought resistance. The objective of this work was to find the quantitative trait nucleotides (QTNs) linked to drought tolerance related three traits using a genome-wide association study (GWAS), viz., germination rate (GR), root length (RL), and whole seedling length (WSL), using germplasm population of 240 soybean PIs with 34,817 SNPs genotype data having MAF > 0.05. It was observed that heritability (H2) for GR, WSL, and RL across both environments (2020, and 2019) were high in the range of 0.76–0.99, showing that genetic factors play a vital role in drought tolerance as compared to environmental factors. A number of 23 and 27 QTNs were found to be linked to three traits using MLM and mrMLM, respectively. Three significant QTNs, qGR8-1, qWSL13-1, and qRL-8, were identified using both MLM and mrMLM methods among these QTNs. QTN8, located on chromosome 8 was consistently linked to two traits (GR and RL). The area (± 100 Kb) associated with this QTN was screened for drought tolerance based on gene annotation. Fifteen candidate genes were found by this screening. Based on the expression data, four candidate genes i.e. Glyma08g156800, Glyma08g160000, Glyma08g162700, and Glyma13g249600 were found to be linked to drought tolerance regulation in soybean. Hence, the current study provides evidence to understand the genetic constitution of drought tolerance during the germination stage and identified QTNs or genes could be utilized in molecular breeding to enhance the yield under drought stress.
Similar content being viewed by others
Introduction
Soybean (Glycine max (L.) Merr.) is among the world's five most important crops, providing food, feed, protein, and edible oil. However, it is subjected to several abiotic stresses like drought, salinity, and heat stress1. Among these stresses, drought stress is going to be a major threat to crop plants. In the current scenario, increased temperature, extreme variability of seasonal rainfall, and prolonged dry season due to climate change have contributed to the water scarcity challenge2. Water scarcity is becoming a major constraint for crop production throughout the world3. In soybean, drought stress mostly occurs during the whole growing season which leads to considerable yield loss and quality deterioration mainly in arid and semi-arid zones. Some studies have reported that drought stress could cause 24–50% yield losses in soybean4,5 The research studies have also shown that during juvenile and reproductive stages soybean is sensitive to drought stress6,7 In most of the soybean growing regions, irrigation is not a possible option due to lack of water availability and the high cost of irrigation. Therefore, the most economical way of sustainable soybean cultivation is to develop soybean genotypes having the ability to germinate and survived under drought stress condition8.
The most apparent symptoms of drought stress in soybean are reduction in germination percentage, cell elongation and expansion, hypocotyl length, shoot and root fresh weight, increased root length, leaf flipping, leaf clamping and initiation of early flowering5. Drought stress also disturbs carbon assimilation and plant phenology9. The germination and seedling stages are the most sensitive to drought stress in soybean and other crops like rice10, maize11 and wheat12. To combat with the challenge of drought stress, plants have adopted various morphological, physiological and molecular mechanisms including the robust root system, reduced leaf area and stomatal conductance, maintenance of turgor pressure, early maturation, activation of drought responsive genes and transcriptional factors e.g. DREBs, WRKY, ZIP, ZFP, NAC, MYB and ERFs13,14,15,16. It is predicted that in future, plants will face drought stress more frequently due to high temperature extremes17. Therefore, efforts are necessary to explore the genetic base and genetic mechanism regulating the drought tolerance in soybean to combat climate change.
Drought tolerance is a polygenic trait influenced by genotypic and environmental factors18. Due to complex genetic architecture of drought tolerance, less genetic diversity, segregation distortion, linkage drag, and distortion and suppression of recombination conventional breeding fails to meet the requirement for stabilizing yield potential in plants under variable conditions19,20,21,22. Advances in genetic and genomic approaches have assisted in the dissection of genetic control of various complex traits which have facilitated mapping of numerous quantitative trait loci (QTLs) in different crops, including soybean23,24. However, QTL mapping provides inadequate genetic diversity as reported in Oryza sativa25. To far, numerous genetic maps have been created to investigate QTLs that affect drought tolerance in soybeans. But in practical breeding, confirmation and verifications of QTLs are recommended prior to their application in the breeding program26.
Advancement in next-generation sequencing (NGS) has facilitated exploiting marker assisted selection (MAS) to enhance the genetic gains in an economical way. Genome wide association studies (GWAS) by utilizing genome wide single nucleotide polymorphism (SNPs) have the potential to detect variants that could be used directly for crop plant improvement23,27. GWAS could overwhelm the constraints of traditional QTL mapping and can be utilized to investigate the minor gene effects controlling complex traits in crop plants28. GWAS has been successfully used in breeding of various crops like cotton, wheat, barely, rice and corn13,23,29,30.
Association mapping accuracy relies on marker density and mapping populations31. Among all the approaches utilized for GWAS, the mixed linear model (MLM) is applied widely for association mapping32. Furthermore, advanced MLM- based methods have also been established like compressed MLM (cMLM), which saves time for computation and enhances the statistical performance. Bonferroni correction is indeed for such tests since these approaches rely on single-locus genome-wide scanning32. Currently, a more refined method is purposed known as random-SNP impact MLM (mrMLM), which combines the advantages of both multiple locus model and single locus genome scanning. The mrMLM revealed significantly better accuracy and power in the detection of QTN and affected the estimation24.
Several GWAS have been conducted and reported in soybean33,34. Still, literature related to GWAS for drought tolerance at germination is limited35. Germination is a critical step contributing toward final soybean production. Cultivars with a better germination percentage during drought stress produced higher yields36. The information is scarce about molecular genetics controls drought tolerance during the germination stage. As a result, it's critical to identify the QTLs linked to drought tolerance in soybeans during the germination stage.
The present research includes marker-trait association depending upon phenotypic and genotypic information from 240 worldwide germplasm accessions with wide diversity. The 240 accessions were screened for drought tolerance at germination stage with popular method for phenotyping. We used both single and multi-locus models viz., MLM and mrMLM to detect the significant SNPs/QTNs, as well as for the prediction of candidate genes linked to drought tolerance during germination in soybean. Physiological mechanisms associated with drought tolerance at the germination stage may be akin to those that help plants to survive under drought stress during seedling and reproductive stages. The objective of present study is to unravel the genetic constitution of drought tolerance in 240 soybean accessions at the germination stage using GWAS.
Results
Phenotypic characterization of the 240 germplasm
The drought tolerance of 240 soybean accessions was assessed on the base of three germination linked traits i.e. RL, GR and WSL. The values of mean, standard deviation, range, ANOVA (F-value) and estimated h2 for all traits of 240 lines in two different years 2018 and 2019 are shown in Table 2-1. During 2018, the average value of GR, WSL and RL were 0.30, 0.32 and 0.43 and their range varied from 0–1.25, 0.11–1.75 to 0.0028–0.96 respectively. However, the mean comparison revealed that, during 2019, the average values of GR and WSL were higher than the mean values in 2018 whereas RL growth pattern was found contradictory to it (higher in 2018). These findings depicted that, in soybean, drought tolerance is influenced by environment. The results of ANOVA indicated highly significant difference between genotypes (G) and environments (E). Furthermore, significant genotype× environment (G×E) interaction was detected for all three studied traits (Table 1). Furthermore, h2 for GR, WSL and RL across both environments were high in the range of 0.76–0.99, showing that genetic factors play a vital role in drought tolerance as compared to environmental factors.
SNPs distribution, population structure and LD estimation
In this study, out of a total 42,509 SNPs, 17.98% SNPs having MAF ≤ 0.05 were extracted for analyses, while the leftover 34,817 SNPs with MAF > 0.05 were used to investigate the genetic variability in 240 soybean genotypes and other subsequent analyses. These targeted SNPs were randomly located on 20 chromosomes of G. max (Fig. 1A). On an average, each chromosome (Chr) carried 1762 SNPs. Maximum number of SNPs (2922) were mapped on Chr.18 while minimum number of SNPs (1223) were found on Chr. 12. Furthermore, the SNPs distribution density on 20 chromosomes is presented in Fig. 1B that varied from 25.87 SNPs/Mb on Chr.01 to 51.01 SNPs/Mb on Chr.13.
Besides this, three different methods, principal component analysis (PCA), phylogenetic analysis, and bayesian model-based methods were employed to build the population structure of 240 soybean accessions. STRUCTURE software was used to compute Delta K (∆K) value (Fig. 2A; K = 1–10, Fig. 2B), and the result revealed that 240 soybean PIs were grouped into three subpopulations i.e. K = 3. Moreover, PCA analysis and phylogenetic tree showed consistent results which agree with the findings of population structure analysis (Fig. 2C, D).
An examination of the population structure of 240 soybean lines. (A) ΔK plot calculated for K = 1–10 (B) structure software was used to determine the population structure. Three subpopulations are represented by three colors (red, green, and blue) (C) the tested 240 lines’ maximum likelihood neighbor-joining tree (D) principal component analysis plot.
Furthermore, the r2 values were calculated for those SNPs pairs that resided within 10 Mb region and the LD decay rate was established based on the regression of negative natural logarithm. In G. max, LD decay was comparatively higher37 than other crop plants like A. thaliana38, O. sativa39, Z. mays40. In addition, our results showed that r2 values decreased gradually as the distance increases and At 1.76 Mb, the LD decay distance was measured, and r2 dropped to half of its highest value (Fig. 3).
Association mapping for traits via MLM
The MLM model was utilised in GWAS research to identify important QTNs related to GR, WSL, and RL. Results highlighted that a total of 4, 9 and 10 QTNs were found to be associated with GR, WSL, and RL, respectively, at a threshold value of − log10P = 4.5 (Table 2 and Fig. 4, Fig. 5, Fig. 6 respectively). Four QTNs associated with GR were mapped on two chromosomes only: Chr.03 and Chr.08. Among these QTNs, two QTN qGR3-1 and qGR8-1 were distributed on Chr.3 and 8 respectively and covered 8.55 to 10.59% phenotypic variation (PV) and they were identified consistently in all three environments (i.e. 2018, 2019 and combined environment) (Table 2, Fig. 4). These QTNs showed significant association with SNP marker Gm03-4741942, Gm08-12368145 and exhibited largest phenotypic variation effect (10.59%) in combined environment. Besides this, qGR3-2 and qGR8-2 were detected in two environments only and they covered 8.99–9.14% of the PV (Table 2).
For WSL, a total of 9 QTNs were identified on six different chromosomes i.e. Chr.08, 09, 12, 13, 16 and 18 (Table 2, Fig. 5). Among these, qWSL-8 and qWSL13-1 were found in all the three environments and categorized as stable QTNs linked with WSL. These two QTNs accounted for 8.01%–8.63% phenotypic variations, respectively. Two QTNs linked to WSL viz., qWSL13-2 and qWSL13-3 were identified in two environments and explicated 8.84–9.77% of the PV respectively. However, qWSL-16–1 and qWSL-9–1 were detected during 2018 and 2019, respectively (Table 2).
We used the MLM approach to identify 10 QTNs in RL, and all these QTNs were located on 5 distinct chromosomes viz., Chr.06, 08, 15, 14 and 16 (Table 2). Out of these QTNs, only qRL-8 and qRL15-4 were continually detected in all environments (2018, 2019 and Combined), elucidating 8.12–9.83% of the PV and these QTNs were linked SNP marker Gm08-12368145 on Chr.08 and Gm15-4497855 on Chr.15, respectively. Other QTNs, qRL16-1 and qRL-14 were detected in 2018 and 2019 environments, explaining 8.14% and 8.49% of the phenotypic variance, respectively. The three remaining QTNs associated with RL were only found in one environment and accounted for 8.08 percent–8.12 percent of the PV (Table 2 and Fig. 6).
Association mapping for traits via mrMLM
Multi-locus GWAS analysis was performed by a mrMLM approach to ensure the QTNs dependability detected by the MLM model and to identify further QTNs related with GR, WSL and RL. As shown in Table 2-3, a total of 8, 8 and 11 QTNs were linked with GR, WSL and RL respectively (Table 3). For GR, one QTN qGR8-1 linked with the Gm08-12368145 marker was constantly found in all environments, and this QTN showed broad PV effects ranging from 8.40 to 9.95%. The remaining seven GR QTNs were only found in one environment, accounting for 4.04 to 9.95% of the PV.
In case of WSL, 8 QTNs were identified, scattered on 6 different chromosomes. Among them, only single QTN qWSL-13 was mapped in all three environments with larger effect, while qWSL-18 was identified in two environments (2019 and combined). However, the rest of 6 QTNs were identified in single environment only. For RL, a total of eleven QTNs were detected, explaining 3.7–18.6% of the PV. Out of these, qRL-8-1 associated with RL in all environments (2018, 2019 and combined). Two QTNs qRL7-1 and qRL7-2 were observed in two environments (2019 and combined). However, the remaining eight QTNs were environment specific QTNs (Table 3). In short, a total of 23 and 27 QTNs linked with all studied traits of drought tolerance were observed via MLM and mrMLM model, respectively (Tables 2 and 3).
Comparison between MLM and mrMLM results
In comparison with the MLM and mrMLM, For MLM, six QTNs, viz., qGR3-1, qGR8-1, qWSL8, qWSL13-1, qRL-8 and qRL15-4 were detected in all three environments. As predicted, for mrMLM, three major QTN qGR8-1, qWSL13-1 and qRL-8 were linked with GR, WSL and RL were detected. These three critical QTNs were detected by both MLM and mrMLM. Nevertheless, the qGR3-1 and qWSL8-1 detected by MLM could not be confirmed by mrMLM. Furthermore, qRL6 and qRL14 were also detected by both MLM and mrMLM in 2018 and 2019, respectively (Table 3). Based on above mentioned results, two major and stable QTN i.e. qGR8-1(Gm08-12368145), qWSL13-1(Gm13-35394159) and qRL8 (Gm08-12368145) were identified in all studied environments via MLM and mrMLM methods. The QTNs, qRL8 and qGR8-1 were renamed as QTN8. QTN qRL15-4 (Gm15-4497855) was identified in all three environments in MLM but not in mrMLM. Thus, these major QTN on Chr.8 and 13 were potentially employed as the hotspot area for fine mapping and analysis of candidate genes linked with drought tolerance in soybean.
Candidate gene prediction
As the stable QTN found on Chr.08 (QTN8) and Chr.13 (qWSL13-1) were constantly detected in all studied traits i.e. GR, WSL and RL by both methods in all environments. For further confirmation, candidate gene prediction analysis has been conducted in the genomic region adjacent to the QTN8-1 and QTN13-1. Based on LD distance, 500 kb up as well as downstream region carrying peak SNPs markers was explored and within the interval of 1.0 Mb, a total of 100 and 139 genes were mapped on Chr.08 and Chr.13, respectively. The haplotype block analysis was carried out with Haploview 4.2 for all the SNPs located within ± 100 kb candidate region (Fig. 7). Based on gene annotation data derived from Soybase (http://www.soybase.org) and the Phytozome (https://phytozome.jgi.doe.gov), 15 genes were identified as possible candidate genes regulating the drought tolerance in soybean (Table2-4). Out of 15 candidate genes, Glyma08g156800, Glyma08g160000, Glyma08g162700 and Glyma13g249600 only showed higher expression in soybean root tissues.
Discussion
In this study, significant diversity has been observed in 240 soybean PIs regarding the GR, WSL and RL values that are three important drought-linked parameters at germination stage. Results depicted that GWAS analysis using 34,817 SNPs detected QTLs related with drought tolerance (Table 2, Table 3). Furthermore, strong positive correlations of r = 0.79 was observed between GR and RL in the 240 soybean PIs (Table 4). Drought stress impairs germination by limiting water imbibition41. Therefore, for evaluation of soybean genotypes, we screened the population at germination stage by measuring GR, WSL and RL. On the basis of correlation analysis, it was cleared that WSL showed a positive correlation with GR and RL (Table 4).
Two stable QTNs were discovered among the QTNs identified, viz. QTN8, (qGR8-1, qRL-8) and qWSL13-1 were detected in all environments (2018, 2019, and Combined) via MLM, explaining 8.34–10.59% of the PV, whereas these QTNs (QTN8 and qWSL13-1) were also detected by mrMLM. Remarkably, Stable QTN (QTN8) located on Chr.08, was linked with SNP marker Gm08-12368145 for two traits (qGR8-1 and qRL-8), this QTN was discovered in all of the environments examined. Furthermore, mrMLM was used to effectively identify this stable QTN8 on Chr. 8. Another major QTN (qWSL13-1) associated with Gm13-35394159 was detected in all environments (Table 3). Thus, this stable QTN8 detected on Chr.08 in present study is a newly QTN linked with drought-tolerance in soybean.
However, the second stable QTNs viz., qWSL13-1 identified via MLM was detected in all environments, was only linked to a single trait. Therefore, on the base of above mentioned results, it is summarized that in this research all the identified stable QTNs were novel for drought tolerance in soybean and mostly carried larger phenotypic effects. Besides this, QTN8 and qWSL13-1 exerted larger phenotypic effects on drought tolerance. That’s why, we carried out candidate gene prediction within 1.0 Mb region adjacent to the QTN8/Gm08-12368145 and Gm13-35394159. Based on functional annotations, fifteen candidate genes which are directly or indirectly linked with drought tolerance were selected (Table 5).
Up till now, various studies have been carried out in soybean to explore the genetic mechanism that regulates drought tolerance. Several QTLs linked with drought tolerance have been identified across different environments42,43,44. However, most of the research related to QTLs mapping was carried out by employing linkage mapping analysis on the basis of low-density genetic maps. Drought tolerance is a complex character as controlled by several minor loci with significant environmental influence. To overcome the linkage mapping limitations, LD based GWAS analysis was conducted in this research. Several studies have confirmed that utilization of GWAS analysis could be an alternative and powerful approach for precise detection and defining the genomic regions associated with quantitative traits. Utilization of GWAS is an efficient alternative approach for precise detection and marking the genomic region of loci/genes linked with complex traits16,45.
Linkage disequilibrium refers to non-random association of SNP alleles that are located at different locus. Nonetheless, the amount of genome-wide LD is critical in GWAS research. As it is influenced by several forces viz., population stratification, linkage, genetic drift, and relatedness13,46. For the soybean panel, the LD decay was approximately assessed at 1.76 Mb, which was markedly higher as compared to other plant species i.e. rice (75 kb in indica)39, Arabidopsis thaliana (10 kb)39 and 1.1 kb in sunflower47. In the self-pollinated species like soybean comparatively higher LD value has been recorded as compared to cross-pollinated species48,49. Furthermore, resequencing of 31 soybean genomes, Lam, Xu, Liu, Chen, Yang, Wong, Li, He, Qin and Wang37 stated that both cultivated and wild soybean species showed high LD, while the comparison of both species showed that cultivated soybean demonstrated higher LD level than wild species. Therefore, for effective improvement of the genetic variability of GWAS population genotypes should have diverse genetic background (including wild species, cultivars, and landraces) collected from different geographical regions.
From previous functionally verified genes, transgenic lines with GmNAC8 overexpression show noticeably greater drought tolerance than those with GmNAC8 deletion, indicating that GmNAC8 is a positive regulator of drought tolerance50. GmWRKY54 enhanced drought tolerance by activating genes in the abscisic acid and Ca2+ signaling pathways from functionally confirmed genes. In Arabidopsis, overexpression of WRKY20 from a wild soybean greatly increased drought tolerance51,52. Using a constitutive expression promoter, it was possible to overexpress the LOS5/ABA3 gene in soybeans, encoding the molybdenum cofactor sulfurase, therefore improving drought tolerance and increasing seed output by at least 21% during drought stress53. In plants, U-box (PUB) proteins serve as E3 ligases. Under drought stress conditions, the transgenic lines of Arabidopsis showed reduced plant viability when GmPUB6 was overexpressed54.
Manavalan, Prince, Musket, Chaky, Deshmukh, Vuong, Song, Cregan, Nelson and Shannon43 has reported one QTN associated with drought tolerance linked with marker Satt315 on Chr.08 within a confidence interval of 15 cm. Depending upon the genomic sequence information, it was found that this QTL was linked with Satt315 from SoyBase (http://www.soybase.org), this QTL was present between 6751690 and 6751725 Mb of physical interval on Chr.08. In another study, Abdul Haleem et al. (2011) identified one QTL Q_root_Gm08 (flanked with Satt228 and Satt429 on Gm08) via MLM relevant to the fibrous root system. This QTL was located between 46366169 and 46366092 Mb of physical interval of Chr.08. (QTN8)55 also reported a cluster of 4 QTL associated with seed and root related traits (RFW, RDW, SFW, and SDW) on LG A2 (chromosome 8). This QTL is linked with marker Satt177 within a confidence interval of 15 cM. Charlson et al. (2009) also identified QTL (Sat_319) on MLG A2 contributed slow-wilting. This QTL was located between 4717934 and 4717907 Mb of physical interval of Chr.08. Wang et al., (2012) also identified QTL (I-Satt327 on G2-A2) controlling drought tolerant traits during germination stage. This QTL was located between 20511527 and 20511574 Mb of physical interval of Chr.08 but the physical position of Gm08-12368145 was 12.36 Mb of Chr.08.
Materials and methods
Plant materials and field evaluation
An association mapping population comprised 240 worldwide (Table S2-1) soybean plant introductions (PIs). The collections were obtained from 22 countries across Europe (EU), West Asia and North Africa, East Asia, and the Americas. The list of germplasm accessions and their origins are presented in Supplementary Table 1. Seeds of the 240 germplasm were collected from National Center for Soybean Improvement, Nanjing Agricultural University, Nanjing–China. The seeds obtained were sown in Jiangpu Experimental station of Nanjing Agricultural University in 2018 and 2019. The healthy and uniform seeds were selected for drought tolerance screening in the laboratory.
Germination test
To test the germination percentage, twenty seeds per entry were randomly selected for each of three replications. For evaluation of growth performance and phenotypic diversity related to seedling traits, seeds were kept on wetted filter paper in a 9 cm diameter petri dish. Drought stress was created by the addition of Polyethylene glycol 1000 (PEG-1000, 15% w/v) whereas distilled water (without PEG) was employed in control conditions. The Petri dishes were retained in an incubator at 25 ℃ in the darkness. The experiment was carried out in complete randomized design (CRD). The seeds were categorized as germinated when radicle attained the length of 2 mm. Germination was recorded at 24 h’ intervals for five consecutive days. To evaluate germination traits, we measured germination rate (GR, whole seedling length (WSL) and root length (RL) under drought and control conditions. Germination related traits were scored as per International Seed Testing Association rules (ISTA) (ISTA, 2003):
where n and N represent number of seeds germinated and the total number of seeds sown, respectively.
For measurement of whole seedling length (WSL) and root length (RL), three plants of each accession were taken under control conditions and 15% PEG stress. After five days, RL and WSL (in cm) were measured manually by scaled ruler56. The data for these traits was utilised to create indices, which were calculated as the ratio of GR, WSL, and RL, germination traits, at drought conditions to the control conditions of same trait. The germination rate (GR) was scored by using the following equations:
Data analyses
Using SAS PROC UNIVARIATE tools, descriptive statistics such as mean, standard deviation (SD), and range were calculated for 240 soybean PIs in two environments: 2018 and 2019 respectively. The PROC GLM and PROC CORR programmes of SAS were used to calculate analysis of variance (ANOVA) and Pearson’s correlation for drought traits57. The broad sense heritability of studied traits under drought and normal condition was calculated for combined environment using the below formula
For individual environment heritability was calculated by using the following formula:
where σ2g is the genotypic variance, σ2ge is interaction variance between genotypes and environment, σ2e is the error variance, n is the number of environments, r and e represent replications and error respectively58.
Genotyping and quality control
The SNP data for the 240 accessions is from SoySNP50K BeadChip available on SoyBase database (soybase.org/snps/index.php) with 42,449 SNPs59. Briefly, a high-throughput SNP assay, the SoySNP50K iSelect BeadChip containing 42,449 SNPs was utilized for genotyping following procedure reported by60. Chromosomal distributions of SNPs and quality control were estimated following the procedure of Wen et al. (2014). A subset of 34,817 SNPs (MAF > 0.05 and missing rates < 0.25) was selected out of 42,449 SNPs for further analyses.
Principal component, linkage disequilibrium, and population structure analysis
After screening using the PLINK in-depth-pairwise command 7358 SNPs were retrieved for this study61. Using the software STRUCTURE 2.2, a model-based cluster assay was performed to find the genetic structure and define the cluster (gene pools) in the data62. Potential populations (K), the number was from 1 to 10 with four repetitions, and each Q and P-value was estimated. The Delta K method was used to calculate the most likely number of subpopulations63. TASSEL 5.0 software was used to create a neighbor-joining tree (NJT)64. R studio was used to perform principal component analysis (PCA). TASSEL 5.0 was used to estimate Kinship, and further pairwise SNPs linkage disequilibrium (LD) was estimated as a squared correlation coefficient (r2) using RTM-GWASV1.165. The distance of chromosome at which r2 dropped to half of its highest value was used to determine the linkage disequilibrium decay rate.
Marker-trait association and haplotype block analysis (GWAS)
After eliminating SNPs with an MAF < 5%, 34,817 SNPs were remained for the 240 PIs. To reduce false-positives and enhance statistical performance, the principal component (PC) and kinship (K) matrix were calculated. For the SLM (Single locus model) viz., MLM (PCA + K) was carried out in TASSEL5.2 (Bradbury et al., 2007) while for multi-locus model, mrMLM was conducted in R using mrMLMGUI package (version 2.1)66. A threshold level (–log10 (P) value = 4.5; P = 1/34,817) was set to calculate a significant linkage of SNPs with drought tolerance parameters in MLM whereas a critical LOD value of 3 was adopted for mrMLM (Xu et al., 2018; Misra et al., 2018). The SNPs identified by both models and in all environments was regarded as comparatively stable SNPs. The QTNs were named after the nomenclature of McCouch et al. (1997) with minor changes. Furthermore, the abbreviation was started with ‘q’, followed by trait name (GR, germination rate; WSL, whole seedling length; RL, root length) and the chromosome name, succeeded by number of QTN identified on same chromosome. Thus, qGR8-1 designated a locus on chromosome 08 related with GR in all environments. Haplotype block analysis of the comparatively stable SNPs was carried out in Haploview software version 4.2 with default parameters (Barrett et al.37.
Prediction of candidate genes
Possible candidate genes were obtained within major QTN region found on Chr.08 and Chr.13 by either single or multi-locus models in the soybean (William 82) gene models 1.0 (Grant et al., 2010). The functional annotations of reference genes were obtained from SoyBase and phytozome and candidate genes were selected manually based on their expression pattern in roots.
Conclusions
A total of 23 and 27 QTNs linked with GR, WSL and RL were detected using MLM and mrMLM approaches in all three environments (2018, 2019, and Combined). Among these QTNs, six QTNs, viz., qGR3-1, qGR8-1, qWSL8, qWSL13-1, qRL-8 and qRL15-4 were identified in all three environments via MLM and three of them (qGR8-1, qWSL13-1 and qRL-8) were also detected using mrMLM method. On the base of MLM and mrMLM comparative analysis, the most stable QTN8 was linked with RL and GR and it was confirmed through MLM and mrMLM methods in all environments. Eventually, the 1.0 Mb region around QTN8/ Gm08-12368145 and Gm13-35394159 ± 500 kb was utilized for candidate gene prediction. Based on the expression data, Glyma08g156800, Glyma08g16000, Glyma08g162700 and Glyma13g249600 were regarded as most possible candidate genes which regulate drought tolerance in soybean. Although, functional validation is further needed to define its role in drought tolerance. In short, this study broadens our vision to understand the genetic mechanisms regulating drought tolerance in soybean and it will be helpful in future to breed soybean cultivars with increased drought tolerance via gene cloning and marker assisted selection.
Data availability
Manuscript and supplementary material has data.
References
Miransari, M. Abiotic and biotic stresses in soybean production 53–77 (Elsevier, 2016).
Payus, C. et al. Impact of extreme drought climate on water security in North Borneo: Case study of Sabah. Water 12, 1135 (2020).
Wollenweber, B., Porter, J. & Schellberg, J. Lack of interaction between extreme high-temperature events at vegetative and reproductive growth stages in wheat. J. Agron. Crop Sci. 189, 142–150 (2003).
Frederick, J. R., Camp, C. R. & Bauer, P. J. Drought-stress effects on branch and mainstem seed yield and yield components of determinate soybean. Crop Sci. 41, 759–763 (2001).
Sadeghipour, O. & Abbasi, S. Soybean response to drought and seed inoculation. World Appl. Sci. J. 17, 55–60 (2012).
Samarah, N., Mullen, R. & Anderson, I. Soluble sugar contents, germination, and vigor of soybean seeds in response to drought stress. J. New Seeds 10, 63–73 (2009).
Eck, H., Mathers, A. & Musick, J. Plant water stress at various growth stages and growth and yield of soybeans. Field Crops Res. 17, 1–16 (1987).
Pennisi, E. The blue revolution, drop by drop, gene by gene. Science 320, 171–173 (2008).
Zeid, I. & Shedeed, Z. Response of alfalfa to putrescine treatment under drought stress. Biol. Plant. 50, 635 (2006).
Nayyeripasand, L., Garoosi, G. A. & Ahmadikhah, A. Genome-wide association study (GWAS) to identify salt-tolerance QTLs carrying novel candidate genes in rice during early vegetative stage. Rice 14, 1–21 (2021).
Seetharam, K. et al. Genomic regions associated with heat stress tolerance in tropical maize (Zea mays L.). Sci. Rep. 11, 1–11 (2021).
Ahmed, A. A., Mohamed, E. A., Hussein, M. Y. & Sallam, A. Genomic regions associated with leaf wilting traits under drought stress in spring wheat at the seedling stage revealed by GWAS. Environ. Exp. Bot. 184, 104393 (2021).
Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2, 1–10 (2011).
Kavar, T., Maras, M., Kidrič, M., Šuštar-Vozlič, J. & Meglič, V. Identification of genes involved in the response of leaves of Phaseolus vulgaris to drought stress. Mol. Breed. 21, 159–172 (2008).
Lacerda, J. S. et al. Importance of zinc for arabica coffee and its effects on the chemical composition of raw grain and beverage quality. Crop Sci. 58, 1360–1370 (2018).
Lehnert, H., Serfling, A., Friedt, W. & Ordon, F. Genome-wide association studies reveal genomic regions associated with the response of wheat (Triticum aestivum L.) to mycorrhizae under drought stress conditions. Front. Plant Sci. 9, 1728 (2018).
Coumou, D. & Robinson, A. Historic and future increase in the global land area affected by monthly heat extremes. Environ. Res. Lett. 8, 034018 (2013).
Jiang, Y. et al. Quantitative trait loci associated with drought tolerance in Brachypodium distachyon. Front. Plant Sci. 8, 811 (2017).
Tester, M. & Langridge, P. Breeding technologies to increase crop production in a changing world. Science 327, 818–822 (2010).
Oluoch, G. et al. QTL mapping for salt tolerance at seedling stage in the interspecific cross of Gossypium tomentosum with Gossypium hirsutum. Euphytica 209, 223–235 (2016).
Paterson, A. H., DeVerna, J. W., Lanini, B. & Tanksley, S. D. Fine mapping of quantitative trait loci using selected overlapping recombinant chromosomes, in an interspecies cross of tomato. Genetics 124, 735–742 (1990).
Baytar, A. A. et al. Genome-wide association mapping of yield components and drought tolerance-related traits in cotton. Mol. Breed. 38, 1–16 (2018).
Abou-Elwafa, S. F. & Shehzad, T. Genetic diversity, GWAS and prediction for drought and terminal heat stress tolerance in bread wheat (Triticum aestivum L.). Genet. Resour. Crop Evol. 68, 711–728 (2021).
Wang, L. et al. QTL fine-mapping of soybean (Glycine max L.) leaf type associated traits in two RILs populations. BMC Genom. 20, 1–15 (2019).
Cui, Y., Zhang, F. & Zhou, Y. The application of multi-locus GWAS for the detection of salt-tolerance loci in rice. Front. Plant Sci. 9, 1464 (2018).
Boerma, H. R. & MianMR. In Proceedings of the World Soybean Research Conference VI. 4–7.
Yang, W. et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat. Commun. 5, 1–9 (2014).
Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: A review. Plant Methods 9, 1–9 (2013).
Sun, H. et al. Genome-wide association mapping of stress-tolerance traits in cotton. Crop J. 7, 77–88 (2019).
Pham, A.-T. et al. Genome-wide association of barley plant growth under drought stress using a nested association mapping population. BMC Plant Biol. 19, 134 (2019).
Gowda, M. et al. Relatedness severely impacts accuracy of marker-assisted selection for disease resistance in hybrid wheat. Heredity 112, 552–561 (2014).
Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
Zeng, A. et al. Genome-wide association study (GWAS) of salt tolerance in worldwide soybean germplasm lines. Mol. Breed. 37, 1–14 (2017).
Zatybekov, A. et al. GWAS of agronomic traits in soybean collection included in breeding pool in Kazakhstan. BMC Plant Biol. 17, 1–8 (2017).
Liu, Z. et al. Genome-wide association study of soybean seed germination under drought stress. Mol. Genet. Genom. 295, 661–673 (2020).
Baalbaki, R., Zurayk, R., Bleik, M. & Talhouk, S. Germination and seedling development of drought tolerant and susceptible wheat under moisture stress. Seed Sci. Technol. 27, 291–302 (1999).
Schneider, R. et al. Genome-wide association mapping of partial resistance to Phytophthora sojae in soybean plant introductions from the Republic of Korea. BMC Genom. 17, 1–14 (2016).
Lam, H.-M. et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat. Genet. 42, 1053–1059 (2010).
Kim, S. et al. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 39, 1151–1155 (2007).
Mather, K. A. et al. The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics 177, 2223–2232 (2007).
Wang, M., Yang, W.-M. & Du, W.-J. Construction of a molecular marker linkage map and its use for quantitative trait locus (QTLs) underlying drought tolerance at germination stage in soybean. Afr. J. Biotechnol. 11, 12830–12838 (2012).
Carter, T. In World soybean research conference IV; Proceedings. (Asociacion Agrentina de la Soja).
Kaya, M. D., Okçu, G., Atak, M., Cıkılı, Y. & Kolsarıcı, Ö. Seed treatments to overcome salt and drought stress during germination in sunflower (Helianthus annuus L.). Eur. J. Agron. 24, 291–295 (2006).
Manavalan, L. P. et al. Identification of novel QTL governing root architectural traits in an interspecific soybean population. PLoS One 10, e0120490 (2015).
Abdel-Haleem, H., Lee, G.-J. & Boerma, R. H. Identification of QTL for increased fibrous roots in soybean. Theor. Appl. Genet. 122, 935–946 (2011).
Riedelsheimer, C. et al. Genome-wide association mapping of leaf metabolic profiles for dissecting complex traits in maize. Proc. Natl. Acad. Sci. 109, 8872–8877 (2012).
Stich, B. et al. Linkage disequilibrium in European elite maize germplasm investigated with SSRs. Theor. Appl. Genet. 111, 723–730 (2005).
Zhu, C., Gore, M., Buckler, E. S. & Yu, J. Status and prospects of association mapping in plants. Plant Genome 1, 1 (2008).
Liu, A. & Burke, J. M. Patterns of nucleotide diversity in wild and cultivated sunflower. Genetics 173, 321–330 (2006).
Yang, C. et al. GmNAC8 acts as a positive regulator in soybean drought stress. Plant Sci. 293, 110442 (2020).
Wei, W. et al. GmWRKY54 improves drought tolerance through activating genes in abscisic acid and Ca2+ signaling pathways in transgenic soybean. Plant J. 100, 384–398 (2019).
Luo, X. et al. Expression of wild soybean WRKY20 in Arabidopsis enhances drought tolerance and regulates ABA signalling. J. Exp. Bot. 64, 2155–2169 (2013).
Li, Y. et al. Expression of an Arabidopsis molybdenum cofactor sulphurase gene in soybean enhances drought tolerance and increases yield under field conditions. Plant Biotechnol. J. 11, 747–758 (2013).
Wang, N. et al. The soybean U-box gene GmPUB6 regulates drought tolerance in Arabidopsis thaliana. Plant Physiol. Biochem. 155, 284–296 (2020).
Brensha, W. et al. Genetic analysis of root and shoot traits in the ‘Essex’ by ‘Forrest’ recombinant inbred line (RIL) population of soybean [Glycine max (L.) Merr.]. Plant Genet. Genom. Biotechnol. 1, 1–9 (2012).
Hetz, W., Hochholdinger, F., Schwall, M. & Feix, G. Isolation and characterization of rtcs, a maize mutant deficient in the formation of nodal roots. Plant J. 10, 845–857 (1996).
Base, S. 9.3 Procedures Guide: Statistical Procedures (SAS Institute Inc., Cary, NC, 2011).
Nyquist, W. E. & Baker, R. Estimation of heritability and prediction of selection response in plant populations. Crit. Rev. Plant Sci. 10, 235–322 (1991).
Song, Q. et al. Fingerprinting soybean germplasm and its utility in genomic research. Genes Genomes Genet. 5, 1999–2006 (2015).
Song, Q. et al. Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS One 8, e54985 (2013).
Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software structure: A simulation study. Mol. Ecol. 14, 2611–2620 (2005).
Bradbury, P. J. et al. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007).
He, J. et al. An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor. Appl. Genet. 130, 2327–2343 (2017).
Wang, S.-B. et al. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci. Rep. 6, 1–10 (2016).
Acknowledgements
The authors would like to extend their sincere appreciation to the Researchers Supporting Project number (RSP2024R347), King Saud University, Riyadh, Saudi Arabia.
Funding
This research was funded by the National Key Research and Development Program, National Natural Science Foundation of China (Grant Nos. 31571691), the Core Technology Development for Breeding Program of Jiangsu Province (JBGS-2021-014), the Program of Collaborative Innovation Center for Modern Crop Production co-sponsored by Province and Ministry (CIC-MCP), and also by the Researchers Supporting Project number (RSP2024R347), King Saud University, Riyadh, Saudi Arabia.
Author information
Authors and Affiliations
Contributions
Author contributions: Conceptualization, T.Z, Muqadas Aleem, MHS, M.K.R; Data curation, Maida Aleem., M.K.R, and W.Y; Formal analysis, Muqadas Aleem, M.S.I, I.S, S.A; Investigation, Maida Aleem., W.Y., Z.A, and B.K.; Methodology, T.Z, BK, and Muqadas Aleem; Project administration, T.Z..; Supervision, T.Z.; Writing—original draft, Muqadas Aleem, NB and Maida A. Writing—review & editing, T.Z., M.A, Z.A, and N.B.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Aleem, M., Razzaq, M.K., Aleem, M. et al. Genome-wide association study provides new insight into the underlying mechanism of drought tolerance during seed germination stage in soybean. Sci Rep 14, 20765 (2024). https://doi.org/10.1038/s41598-024-71357-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-71357-8