Genome assembly of a diversity panel of Chenopodium quinoa

Rey, Elodie; Abrouk, Michael; Dufau, Isabelle; Rodde, Nathalie; Saber, Noha; Cizkova, Jana; Fiene, Gabriele; Stanschewski, Clara; Jarvis, David E.; Jellen, Eric N.; Maughan, Peter J.; von Baer, Ingrid; Troukhan, Maxim; Kravchuk, Maksym; Hribova, Eva; Cauet, Stephane; Krattinger, Simon G.; Tester, Mark

doi:10.1038/s41597-024-04200-4

Data Descriptor
Open access
Published: 18 December 2024

Scientific Data volume 11, Article number: 1366 (2024)

Abstract

Quinoa (Chenopodium quinoa) is an important crop for the future challenges of food and nutrient security. Deep characterization of quinoa diversity is needed to support the agronomic improvement and adaptation of quinoa as its worldwide cultivation expands. In this study, we report the construction of chromosome-scale genome assemblies of eight quinoa accessions covering the range of phenotypic and genetic diversity of both lowland and highland quinoas. The assemblies were produced from a combination of PacBio HiFi reads and Bionano Saphyr optical maps, with total assembly sizes averaging 1.28 Gb with a mean N50 of 71.1 Mb. Between 43,733 and 48,564 gene models were predicted for the eight new quinoa genomes, and on average, 66% of each quinoa genome was classified as repetitive sequences. Alignment between the eight genome assemblies allowed the identification of structural rearrangements including inversions, translocations, and duplications. These eight novel quinoa genome assemblies provide a resource for association genetics, comparative genomics, and pan-genome analyses for the discovery of genetic components and variations underlying agriculturally important traits.

Background & Summary

Over the past fifty years, quinoa (Chenopodium quinoa Willd.) has been transformed from a subsistence crop grown exclusively by local farmers in the Andean region of South America into a crop cultivated or trialed in more than 110 countries across all continents¹. The rapid global expansion of quinoa beyond the Andes arose from the international recognition of the high nutritional value of its seeds, its resilience to environmental extremes, its tolerance to several abiotic stresses, and the large genetic diversity available for this crop. These qualities make quinoa a promising future crop, specifically for diversifying sustainable cropping systems and food markets, which in turn will address nutrient and food security in marginal agricultural regions of the world^2,3,4. Along with the spread of quinoa cultivation comes the need to develop breeding programs, collect and catalog the available genetic diversity, and identify favorable allelic variation underlying beneficial traits that will allow quinoa to adapt to new environments and agronomic practices^5,6. Underlying these breeding programs and genetic studies is the need to generate genomic resources specific to quinoa to support its accelerated improvement.

Quinoa germplasm banks are composed of several thousand accessions, both landraces and cultivars, from diverse geographical origins^7,8. Diversity studies based on molecular markers classified quinoa into two major groups: (1) highland quinoa, which displays the greatest genetic diversity and which spans the high plains of Bolivia, Peru, and Ecuador near its center of origin by Lake Titicaca (between 3650 and 4200 m above sea level); and (2) lowland quinoa, also named coastal or sea-level quinoa, which defines a unique South-Central Chilean ecotype of lesser diversity and has been used to breed quinoa for lowland regions worldwide^2,8,9. Additional studies grouped quinoa according to morphological traits linked with adaptation to specific agro-ecological conditions in major production areas, further classifying quinoa into four smaller genetic groups also referred to as ecotypes (Valley, Northern and Southern Highland, and Sea Level)^{7,10,11,12,13}. Phenotypic diversity and genetic variation for adaptation to various environments and stresses have been described within each regional geographical distribution. However, very little is known of the natural allelic variation underlying these traits since genomic resources are still restricted to a limited representation of the available genetic diversity.

Several assemblies have been produced for the quinoa genome (2n = 4x = 36) in the past decade and have been important for these initial characterizations of quinoa genetic diversity and genetic studies. Draft assemblies have been produced for four quinoa accessions, including Kd¹⁴, a Japanese inbred accession; PI 614886 (QQ74)¹⁵, a coastal Chilean accession; Real¹⁶, a highland accession representing the most commonly cultivated commercial variety; and the Bolivian accession CHEN125¹⁷. Though fragmented and incomplete, these resources have enabled transcriptomic analysis and targeted gene family approaches for the identification and mapping of genes involved in adaptation and domestication traits including seed saponin synthesis¹⁵, quinoa seed quality¹⁸, flowering time¹⁹, and the response to abiotic^{16,20,21,22,23,24} and biotic²⁵ stresses. Recently, we produced an improved version, both in terms of contiguity and completeness, of the coastal quinoa PI 614886 genome assembly (QQ74-V2²⁶) which has enabled larger-scale genetic studies^{27,28,29,30,31,32} (both in terms of population size and allelic variation representation) through quantitative genetics and genome-wide association studies. This new assembly, together with the resequencing of a set of quinoa accessions from both highland and lowland groups, uncovered large genomic rearrangements which may have significant implications for quinoa breeding and improvement. The results of that study further emphasized the need to consider not only nucleotide but also structural polymorphism to better understand quinoa diversity, especially as structural variants can be important components underlying phenotypic variation.

Here we present the genome assembly and annotation of a panel of eight quinoa accessions of diverse geographical origins selected to represent the diversity of quinoa phenotypes and to support the genetic improvement and adaptation of quinoa to warm and arid environments. All eight genomes are high-quality, chromosome-scale reference sequences assembled from over 30x genome coverage of PacBio HiFi long reads, with validation and scaffolding provided by Bionano Saphyr optical maps. We further provide a map of structural rearrangements (inversions, duplications, and translocations) between the eight genomes. These genomes represent important resources for investigating the genetic components underlying agronomic traits that are important for quinoa improvement and for adaptation to novel environments outside its native cultivation range.

Methods

Plant material

A set of eight quinoa accessions, comprising three lowland genotypes (Regalona and Javi from Chile; D-12282 from Argentina) and five highland genotypes (CHEN-199 and PI-614919 from Bolivia; 03-21-072RM, D-10126 and CHEN-90 from Peru), was selected primarily to represent the diversity of phenotypes (growth habit, panicle architecture, leaf shape, and stem and seed colors; Fig. 1) and, within that diversity, for performing relatively well in hot, dry and short-day environments³³. Each of these eight accessions is highly homogeneous and exhibits low seed shattering³³. Previous analysis of SNPs produced through short-read resequencing indicated that all eight genotypes have a high coefficient of inbreeding (F.coef ≥ 0.85; homozygous) and are well distributed across the spread of genetic diversity of quinoa²⁸.

Reference genome sequencing and assembly

Genome size estimation

We estimated the genome size of each of the eight quinoa genotypes to inform the genome sequencing and assembly strategy. DNA amounts of each Chenopodium quinoa accession were estimated using a CytoFLEX flow cytometer (Beckman Coulter, Brea, USA) equipped with a 488 nm solid-state laser. Samples were prepared following the protocol of Doležel et al.³⁴ using LB01 nuclei isolation buffer and propidium iodide staining. Lycopersicon esculentum cv. Stupicke polni tyckove rane (2C = 1.96 pg) was used as the internal reference standard³⁵. Five individuals were analyzed per accession, and each plant was analyzed three times on three different days. 2C nuclear DNA amounts were calculated according to Doležel et al.³⁴. Genome sizes (1C values) were then determined considering 1 pg DNA equal to 0.978 × 10⁹ bp according to Doležel et al.³⁶. The genome sizes averaged 1.42 Gb with limited variation between accessions, which is in agreement with previous studies^37,38. The smallest genome (1.40 Gb) was found for CHEN-199 and the largest (1.43 Gb) for Javi, representing less than 2% genome size difference (Table 1).
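The conversion from a flow-cytometry measurement to a genome size can be illustrated with a minimal Python sketch. It assumes the usual ratio-based estimate (sample 2C = sample/standard G1 peak ratio × standard 2C), which is our reading of the cited protocol and not a transcription of it. The function names and the example peak ratio are hypothetical; only the two constants come from the text.

```python
PG_TO_BP = 0.978e9       # 1 pg DNA = 0.978 x 10^9 bp (value stated in the text)
STANDARD_2C_PG = 1.96    # 2C of the tomato internal standard (stated in the text)


def sample_2c_pg(peak_ratio, standard_2c_pg=STANDARD_2C_PG):
    """2C DNA amount (pg) from the sample/standard G1 peak ratio (assumed formula)."""
    if peak_ratio <= 0:
        raise ValueError("peak ratio must be positive")
    return peak_ratio * standard_2c_pg


def genome_size_gb(two_c_pg):
    """1C genome size in Gb from a 2C DNA amount in pg."""
    return (two_c_pg / 2.0) * PG_TO_BP / 1e9


# Hypothetical peak ratio, chosen for illustration only (not a measured value).
two_c = sample_2c_pg(1.48)
print(f"2C = {two_c:.3f} pg, 1C = {genome_size_gb(two_c):.2f} Gb")
```

In practice the peak ratio would be averaged over the replicate measurements of each accession before conversion.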

Table 1 Genome size estimation.

DNA extraction and sequencing

The seed stock used for genome sequencing is the second generation of propagation through single-seed descent from the original seeds received from the genebanks. Seeds were sown in pots containing a mixture of 30% sand + 70% soil & rocks (3:1 part), and grown in KAUST greenhouses under short-day conditions (10-hour day length) with 26 °C day / 18 °C night temperatures. Quinoa plants at the bolting stage (50 on the Biologische Bundesanstalt, Bundessortenamt und CHemische Industrie (BBCH) scale^39,40) were dark-treated for 48 hours, after which the youngest leaf material from a single plant of each genotype was collected, immediately frozen in liquid nitrogen and kept at −80 °C until DNA extraction. Samples were ground to powder in liquid nitrogen using a mortar and pestle, and high molecular weight (HMW) genomic DNA was isolated using the LeafGo protocol⁴¹ for long-read sequencing. DNA was quantified using a Qubit dsDNA HS Assay (Q32851, ThermoFisher Scientific), purity was confirmed by checking the 260/280 and 260/230 ratios on a Nanodrop spectrophotometer, and DNA fragment size (>50 kb) was validated using the FemtoPulse system (Agilent, Santa Clara, CA, USA). HiFi libraries were then prepared according to the manual “Procedure & Checklist - Preparing HiFi SMRTbell® Libraries using the SMRTbell Express Template Prep Kit 2.0” (PN 101-853-100, Pacific Biosciences, Menlo Park, CA, USA) with 10 μg DNA sheared using the Megaruptor 2 system (Diagenode, Liège Science Park, Belgium) to obtain an average fragment size of 15–20 kb. Size-selected libraries were sequenced on a PacBio Sequel II system in CCS mode for 30 hours. For each accession, two SMRT cells were sequenced, and an average of 53 Gb of PacBio HiFi reads per sample was obtained, corresponding to ~36-fold coverage (Table 2).

Table 2 Genome sequencing and assembly statistics.

Iso-Seq library preparation and sequencing

Iso-Seq libraries were produced from five to six tissues at different developmental stages for the Regalona and 03-21-072RM genotypes (root, shoot, apical meristem, leaves, and flowers for both genotypes, and developing seeds for Regalona only) to support the gene annotation of one lowland (Regalona) and one highland (03-21-072RM) reference genome assembly. Plants from the same seed stock used for genome sequencing were sown and grown under the same soil, temperature and day-length conditions described in the DNA extraction section. Each sample consisted of tissues collected from three plants. Tissues were washed in pure water, flash-frozen in liquid nitrogen and kept at −80 °C until processing. RNA was extracted using the RNeasy Mini Kit (Qiagen, Cat. No. 74104), followed by DNase treatment using the Invitrogen™ DNA-free™ DNA Removal Kit (Catalog No. AM1906). The concentration and purity of the RNA samples were measured on a Thermo Scientific™ NanoDrop™ 2000 Spectrophotometer, and RNA integrity was assessed on the Agilent 2100 Bioanalyzer system. Only samples with an RNA integrity number (RIN) above 6 were retained for sequencing. Library preparation, following the PacBio Iso-Seq™ Express Template Preparation protocol, and sequencing on a Sequel II System were performed at Novogene Co., Ltd.

Bionano optical map

Genome assembly

For each of the eight quinoa accessions, PacBio HiFi reads were assembled using hifiasm⁴² v16.1 with default parameters to generate primary contig assemblies, and hybrid scaffolds were then generated by combining contig assemblies and optical maps using the hybridScaffold pipeline (https://bionano.com/wp-content/uploads/2023/01/30073-Bionano-Solve-Theory-of-Operation-Hybrid-Scaffold.pdf; Bionano Solve version 3.7). Hybrid scaffolds for each quinoa accession were compared to the quinoa reference QQ74-V2²⁶ using a dotplot produced by chromeister⁴³. For the accession 03-21-072RM, the dotplot comparison (Fig. 2) showed that the 18 longest (with an average length of 71.2 Mb) hybrid scaffolds out of 21 (with 0.2 Mb average length for the three remaining scaffolds) represented the chromosome-scale assembly, and the orientation and ordering of each pseudomolecule were assigned according to the quinoa reference genome QQ74-V2. Thus, the 03-21-072RM assembly was used as reference to guide the scaffolding of the other quinoa accessions to chromosome-scale level using RagTag⁴⁴ v2.1.0 with the following parameters: -C -r -u. LTR Assembly Index (LAI) was determined with LTR_retriever^45,46 v2.8.7, and BUSCO⁴⁷ v5.4.5 scores were computed with the eudicots_odb10 lineage dataset to assess the completeness of each genome (Table 3).
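Contiguity of the assemblies is reported as N50. As a reminder of what that statistic measures, here is a minimal Python sketch: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are illustrative only and are not taken from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy lengths for illustration only.
toy_lengths = [80, 70, 50, 30, 20]
print(n50(toy_lengths))
```

For a real assembly, the lengths would be read from the FASTA index (for example the second column of a `.fai` file).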

Table 3 Genome assembly contiguity and completeness assessment.

Genome analysis

Repeat analysis

We performed the annotation, filtering, and consolidation of the repeat elements across the 8 quinoa genomes using default parameters of panEDTA⁴⁸ with the REPET^49,50 curated library provided for QQ74-V2²⁶ as input. On average, 66% of each quinoa genome is classified as repetitive sequences (Table 4).

Table 4 Repeat element annotation.

Gene model prediction

For Regalona and 03-21-072RM accessions, the Circular Consensus Sequencing (CCS) application v6.4.0 (https://github.com/PacificBiosciences/ccs) using the continuous long reads subread dataset and default parameters was used to produce HiFi reads. Iso-Seq tool v4.0.0 (https://github.com/PacificBiosciences/IsoSeq) was used to trim primers and polyadenylation (polyA) tails and produce de novo Full Length Non Chimeric (FLNC) transcripts for downstream mapping and annotation.

Then, the FLNC transcripts from the six Regalona and five 03-21-072RM samples were mapped to their corresponding reference assembly using minimap2⁵¹ v2.17-r941 (parameters: -ax splice -uf --secondary=no -C5 -O6,24 -B4), and the redundant isoforms were further collapsed into transcript loci using cDNA_Cupcake (http://github.com/Magdoll/cDNA_Cupcake; parameters: --dun-merge-5-shorter -C 0.95). The gff3 files from all samples were merged into a single gtf file for each of the Regalona and 03-21-072RM accessions using StringTie⁵² v2.1.7. The Transdecoder.LongOrfs script (https://github.com/TransDecoder/TransDecoder/blob/master/TransDecoder.LongOrfs) was used to identify open reading frames (ORFs) of at least 100 amino acids from the merged gtf file. The predicted protein sequences were compared to the UniProt (2021_03) and Pfam35 databases using BLASTP⁵³ (parameters: -max_target_seqs 1 -outfmt 6 -evalue 1e-5) and hmmer3⁵⁶ v3.3.2 (parameters: hmmsearch -E 1e-10). The Transdecoder.Predict script (https://github.com/TransDecoder/TransDecoder/blob/master/TransDecoder.Predict) was used with the BLASTP and hmmer results to select the best translation per transcript. Finally, the annotation gff3 file was computed using the perl script “cdna_alignment_orf_to_genome_orf.pl” provided in the Transdecoder package.
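To make the 100-amino-acid ORF criterion concrete, the following Python sketch scans the three forward frames of a transcript for ATG-to-stop ORFs of at least a given length. It is a simplified illustration and not the TransDecoder algorithm, which also scores coding potential and considers the reverse strand; the function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end) pairs of forward-strand ORFs with >= min_aa codons.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in the span but not counted towards the amino-acid length.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs


# Toy transcript: ATG + 120 alanine codons + stop (121 amino acids).
toy = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(toy))
```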

In addition, a lifting approach using liftoff⁵⁴ v1.6.3 was combined with the genome-guided approach to predict gene models for Regalona and 03-21-072RM accessions. For the gene lifting, the annotations of QQ74-V2 and Beta vulgaris cv EL10_v1 were independently transferred using liftoff⁵⁴ (parameters: -a 0.9 -s 0.9 -copies -exclude_partial -polish).

All the output gff files from the lifting and genome-guided approaches were merged into a single file using AGAT (https://github.com/NBISweden/AGAT; perl script “agat_sp_merge_annotations.pl”). The merged file was then post-processed using the gffread tool v0.11.7 (parameters: --keep-genes -N -J) to retain transcripts with a start and stop codon and to discard transcripts containing premature stop codons and/or introns with non-canonical splice sites. In total, 46,730 and 43,733 genes were predicted for Regalona and 03-21-072RM, respectively.
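The filtering criteria described above can be sketched in a few lines of Python. This is an illustration of the logic only, not gffread's implementation: it assumes the spliced CDS and the intron sequences have already been extracted on the transcript strand, and it takes GT-AG, GC-AG and AT-AC as the accepted splice-site pairs, which is our understanding of gffread's -N option. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Splice-site pairs treated as canonical (assumption based on gffread -N).
CANONICAL_SPLICE = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def cds_is_complete(cds):
    """True if the CDS starts with ATG, ends with a stop, and has no in-frame stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


def introns_are_canonical(introns):
    """True if every intron (5'->3' on the transcript strand) has a canonical pair."""
    return all((s[:2].upper(), s[-2:].upper()) in CANONICAL_SPLICE for s in introns)


def keep_transcript(cds, introns):
    """Combine both criteria, mirroring the filtering described in the text."""
    return cds_is_complete(cds) and introns_are_canonical(introns)


print(keep_transcript("ATGGCTGCTTAA", ["GTAAGTTTTTCAG"]))
```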

For the six other genomes (CHEN-90, CHEN-199, D-10126, D-12282, Javi, and PI-614919), a gene lifting approach using the QQ74-V2, Regalona and 03-21-072RM gene models as references was performed, and the results were merged into a comprehensive gff3 file using the same methods described above (using AGAT (https://github.com/NBISweden/AGAT) and gffread⁵⁵ v0.11.7). Between 43,733 and 48,564 gene models were predicted for the eight new quinoa genomes (Table 5), and the BUSCO⁴⁷ score was calculated in protein mode (Table 6). Finally, putative functional annotations were assigned using a protein comparison against the UniProt database (2021_03) using DIAMOND^56,57 (parameters: -f 6 -k1 -e 1e-6). Pfam domain signatures and GO terms were assigned using InterProScan⁵⁸ v5.55-88.039.

Table 5 Gene annotation statistics.

Table 6 Gene annotation BUSCO (embryophyta_odb10) assessment.

Genome visualization

The eight quinoa genomes were uploaded into the Persephone® multi-genome browser (https://web.persephonesoft.com/). The data tracks available are the DNA sequence and gene model predictions. Among other functions provided by the platform, a BLAST⁵³ search and a synteny analysis tool covering all quinoa genomes are also available (Fig. 3).

Collinearity and structural variation detection

The collinearity and major structural variations (inversions, duplications, and translocations) between QQ74 and the eight new quinoa genomes were assessed using minimap2⁵¹ v2.26-r1175 (parameters: -ax asm5 --eqx) and SyRI⁵⁹ v1.6.3 with default parameters. The results were visualized with plotsr⁶⁰ v1.1.1 (Fig. 4).
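Readers who want to tabulate the rearrangements from a SyRI run could start from a sketch like the one below. It assumes the tab-separated `syri.out` layout in which columns 2 and 3 hold the reference start and end and column 11 holds the annotation type (INV, TRANS, INVTR, DUP, INVDP, and so on); this layout and the grouping of types into three classes are assumptions to check against the SyRI documentation for the version used. The toy records are invented.

```python
from collections import Counter

# Assumed mapping from SyRI annotation types to the three classes in the text.
REARRANGEMENTS = {
    "INV": "inversion",
    "TRANS": "translocation",
    "INVTR": "translocation",
    "DUP": "duplication",
    "INVDP": "duplication",
}


def summarize_rearrangements(lines):
    """Count rearrangements and the reference bases they span, per class."""
    counts, bases = Counter(), Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[10] not in REARRANGEMENTS:
            continue  # skip syntenic blocks, alignments and small variants
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # skip records without numeric reference coordinates
        kind = REARRANGEMENTS[fields[10]]
        counts[kind] += 1
        bases[kind] += abs(end - start) + 1
    return counts, bases


# Invented records for illustration only.
toy = [
    "Chr1\t100\t200\t-\t-\tChr1\t100\t200\tINV1\t-\tINV\t-",
    "Chr1\t300\t350\t-\t-\tChr2\t10\t60\tDUP1\t-\tDUP\tcopygain",
    "Chr1\t1\t99\t-\t-\tChr1\t1\t99\tSYN1\t-\tSYN\t-",
]
print(summarize_rearrangements(toy))
```

With a real run, the function can be fed an open file handle, for example `summarize_rearrangements(open("syri.out"))`, where the file name is whatever SyRI wrote for that comparison.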

Data Records

The HiFi and Iso-Seq reads and the final chromosome assemblies were deposited in the Sequence Read Archive at NCBI under BioProject number PRJNA1018548⁶¹. The raw Bionano optical maps were deposited under EMBL project number PRJEB66274^{62,63,64,65,66,67,68,69}. The final chromosome assemblies were deposited at NCBI^{70,71,72,73,74,75,76,77}.

The quinoa assemblies, gene model prediction and the repeat annotations are available on DRYAD Digital Repository (https://doi.org/10.5061/dryad.zkh1893jj)⁷⁸.

Technical Validation

Assessment of genome assembly and annotation

The average BUSCO⁴⁷ v5.4.5 (using embryophyta_odb10) score for the eight quinoa genomes is 98.1% at the genome level, indicating high completeness of the assemblies. In comparison, the BUSCO⁴⁷ score for the previous quinoa genome QQ74-V2 is 97.9%. The quality of the eight genome assemblies was also assessed with Merqury based on the PacBio HiFi reads using 19-mers. The QV (consensus quality value) scores range from 64.7 (CHEN-199) to 69 (PI-614919). The k-mer completeness scores range from 98.25% (Regalona) to 99.03% (D-10126) (Table 3).
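Because QV is a Phred-scaled value (QV = −10 · log10 of the per-base error probability), the reported scores can be translated into approximate error rates. The short Python sketch below shows only this conversion; it does not reproduce how Merqury derives the error probability from k-mer counts, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p):
    """Phred-scaled QV implied by a per-base error probability."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# The two extreme QV values reported in the text.
for qv in (64.7, 69.0):
    print(f"QV {qv}: about one error per {1 / qv_to_error_rate(qv):.2e} bases")
```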

We validated the concordance of the assembly by re-mapping the optical map onto the pseudomolecules of each of the eight quinoa genomes. Each pseudomolecule was visually inspected using Bionano Access software, and no major discrepancies were found.

Telomeric repeats (TTTAGGG)n²⁶ were screened using tidk⁷⁹ v0.2.31 (parameter: find -c Caryophyllales) (https://github.com/tolkit/telomeric-identifier). Telomeric repeats are present at 35 of the 36 pseudomolecule ends in the assemblies of 03-21-072RM, CHEN-90, D-10126 and Regalona. Interestingly, telomeric repeat sequences are absent from the short arm of Cq1B in all genome assemblies. Telomeric repeat sequences are missing from one additional chromosome end in CHEN-199 (Cq2BL), D-12282 (Cq2BS) and PI-614919 (Cq8AS), while Javi lacks telomeric repeats at two additional chromosome ends (Cq2AS and Cq7AS).
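The idea behind the telomere screen can be shown with a small Python sketch that counts the TTTAGGG unit, on either strand, in a window at each end of a sequence. This is a simplified stand-in and not how tidk works internally; the window size, function names and toy sequence are assumptions made for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_terminal_repeats(seq, unit="TTTAGGG", window=10000):
    """Count occurrences of the unit (either strand) in both terminal windows.

    Returns (count_at_start, count_at_end); a count near zero at one end
    suggests that the telomere is missing from that pseudomolecule end.
    """
    seq = seq.upper()
    pattern = re.compile(f"{unit}|{revcomp(unit)}")
    return (len(pattern.findall(seq[:window])),
            len(pattern.findall(seq[-window:])))


# Toy sequence: 5 repeat units at the start, 3 at the end.
toy = "CCCTAAA" * 5 + "ACGT" * 100 + "TTTAGGG" * 3
print(count_terminal_repeats(toy, window=50))
```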

Completeness of the gene model prediction was evaluated using BUSCO⁴⁷ v5.4.5 (using embryophyta_odb10, Table 6) and OMark⁸⁰ v0.3.0 using default settings and the Pentapetalae lineage database (Supplementary Table 1).

Code availability

All software and pipelines were executed according to the manual and protocol of published tools. No custom code was generated for these analyses.

References

Alandia, G., Rodriguez, J., Jacobsen, S.-E., Bazile, D. & Condori, B. Global expansion of quinoa and challenges for the Andean region. Global Food Security 26, 100429 (2020).
Vavilov, N. I. & Dorofeev, V. F. Origin and geography of cultivated plants. Cambridge University Press (1992).
Jacobsen, S.-E. The worldwide potential for quinoa (Chenopodium quinoa Willd.). Food reviews international 19, 167–177 (2003).
Rojas, W., Alandia, G., Irigoyen, J., Blajos, J. & Santivañez, T. Quinoa, an ancient crop to contribute to world food security. Santiago, Chile: FAO, Oficina Regional para America Latina y el Caribe (2011).
Zurita-Silva, A., Fuentes, F., Zamora, P., Jacobsen, S.-E. & Schwember, A. R. Breeding quinoa (Chenopodium quinoa Willd.): potential and perspectives. Molecular Breeding 34, 13–30 (2014).
Murphy, K. M. et al. Quinoa breeding and genomics. Plant breeding reviews 42, 257–320 (2018).
Christensen, S. A. et al. Assessment of genetic diversity in the USDA and CIP-FAO international nursery collections of quinoa (Chenopodium quinoa Willd.) using microsatellite markers. Plant Genetic Resources 5, 82–95 (2007).
Rojas, W. et al. State of the Art Report on Quinoa around the World in 2013. Food and Agriculture Organization of the United Nations, 56–82 (2015).
Gandarillas, H. La quinua (Chenopodium quinoa Willd.): Botánica. La Quinua y la Kañiwa cultivos andinos. Bogota: CIID-IICA, 20–44 (1979).
Tapia, M., Mujica, S. & Canahua, A. A1–A8. Puno, Peru: Proyecto PISCA/UNTA/IBTA/IICA/CIID (1980).
Bertero, H. D., De la Vega, A., Correa, G., Jacobsen, S. & Mujica, A. Genotype and genotype-by-environment interaction effects for grain yield and grain size of quinoa (Chenopodium quinoa Willd.) as revealed by pattern analysis of international multi-environment trials. Field crops research 89, 299–318 (2004).
Curti, R. N. & Bertero, H. D. Botanical context for domestication in South America. The Quinoa Genome, 13–31 (2021).
Wilson, H. D. Quinua biosystematics I: domesticated populations. Economic Botany 42, 461–477 (1988).
Yasui, Y. et al. Draft genome sequence of an inbred line of Chenopodium quinoa, an allotetraploid crop with great environmental adaptability and outstanding nutritional properties. DNA Research 23, 535–546 (2016).
Jarvis, D. E. et al. The genome of Chenopodium quinoa. Nature 542, 307–312 (2017).
Zou, C. et al. A high-quality genome assembly of quinoa provides insights into the molecular basis of salt bladder-based salinity tolerance and the exceptional nutritional value. Cell Research 27, 1327–1340 (2017).
Bodrug-Schepers, A., Stralis-Pavese, N., Buerstmayr, H., Dohm, J. C. & Himmelbauer, H. Quinoa genome assembly employing genomic variation for guided scaffolding. Theoretical and Applied Genetics 134, 3577–3594 (2021).
Grimberg, Å. et al. Transcriptional Regulation of Quinoa Seed Quality: Identification of Novel Candidate Genetic Markers for Increased Protein Content. Frontiers in Plant Science 13 (2022).
Golicz, A. A., Steinfort, U., Arya, H., Singh, M. B. & Bhalla, P. L. Analysis of the quinoa genome reveals conservation and divergence of the flowering pathways. Functional & Integrative Genomics 20, 245–258 (2020).
Mizuno, N. et al. The genotype-dependent phenotypic landscape of quinoa in salt tolerance and key growth traits. DNA Research 27 (2020).
Li, K. et al. Genome-wide identification, phylogenetic analysis, and expression profiles of trihelix transcription factor family genes in quinoa (Chenopodium quinoa Willd.) under abiotic stress conditions. BMC Genomics 23, 499 (2022).
Shi, P. & Gu, M. Transcriptome analysis and differential gene expression profiling of two contrasting quinoa genotypes in response to salt stress. BMC Plant Biology 20, 568 (2020).
Ren, Y. et al. Genome-wide identification and expression analysis of the SPL transcription factor family and its response to abiotic stress in Quinoa (Chenopodium quinoa). BMC Genomics 23, 773 (2022).
Zhu, X., Wang, B., Wang, X. & Wei, X. Genome-wide identification, structural analysis and expression profiles of short internodes related sequence gene family in quinoa. Frontiers in Genetics 13 (2022).
Colque-Little, C. et al. Genetic variation for tolerance to the downy mildew pathogen Peronospora variabilis in genetic resources of quinoa (Chenopodium quinoa). BMC Plant Biology 21, 41 (2021).
Rey, E. et al. A chromosome-scale assembly of the quinoa genome provides insights into the structure and dynamics of its subgenomes. Commun Biol 6 (2023).
Maldonado-Taipe, N., Barbier, F., Schmid, K., Jung, C. & Emrani, N. High-density mapping of quantitative trait loci controlling agronomically important traits in quinoa (Chenopodium quinoa willd.). Frontiers in plant science 13, 916067 (2022).
Patiranage, D. S. et al. Genome-wide association study in quinoa reveals selection pattern typical for crops with a short breeding history. Elife 11, e66873 (2022).
Patiranage, D. S. et al. Haplotype variations of major flowering time genes in quinoa unveil their role in the adaptation to different environmental conditions. Plant, Cell & Environment 44, 2565–2579 (2021).
Emrani, N. et al. An efficient method to produce segregating populations in quinoa (Chenopodium quinoa). Plant Breeding 139, 1190–1200 (2020).
Maldonado‐Taipe, N., Rey, E., Tester, M., Jung, C. & Emrani, N. Leaf and shoot apical meristem transcriptomes of quinoa (Chenopodium quinoa Willd.) in response to photoperiod and plant development. Plant, Cell & Environment (2024).
Rahman, H. et al. Mining genomic regions associated with agronomic and biochemical traits in quinoa through GWAS. Scientific Reports 14, 9205 (2024).
Stanschewski, C. S. Domestication and adaptation of Chenopodium quinoa for marginal environments Doctoral dissertation thesis, King Abdullah University of Science and Technology (2023).
Dolezel, J., Greilhuber, J. & Suda, J. Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc 2, 2233–2244 (2007).
Doležel, J., Sgorbati, S. & Lucretti, S. Comparison of three DNA fluorochromes for flow cytometric estimation of nuclear DNA content in plants. Physiologia plantarum 85, 625–631 (1992).
Dolezel, J., Bartos, J., Voglmayr, H. & Greilhuber, J. Nuclear DNA content and genome size of trout and human. Cytometry A 51, 127–128, https://doi.org/10.1002/cyto.a.10013 (2003).
Kolano, B., Siwinska, D., Gomez Pando, L., Szymanowska-Pulka, J. & Maluszynska, J. Genome size variation in Chenopodium quinoa (Chenopodiaceae). Plant Systematics and Evolution 298, 251–255 (2012).
Palomino, G., Hernández, L. T. & de la Cruz Torres, E. Nuclear genome size and chromosome analysis in Chenopodium quinoa and C. berlandieri subsp. nuttalliae. Euphytica 164, 221–230 (2008).
Sosa‐Zuniga, V., Brito, V., Fuentes, F. & Steinfort, U. Phenological growth stages of quinoa (Chenopodium quinoa) based on the BBCH scale. Annals of Applied Biology 171, 117–124 (2017).
Stanschewski, C. S. et al. Quinoa phenotyping methodologies: An international consensus. Plants 10, 1759 (2021).
Driguez, P. et al. LeafGo: Leaf to Genome, a quick workflow to produce high-quality de novo plant genomes using long-read sequencing technology. Genome Biology 22, 256 (2021).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 40, 1332–1335 (2022).
Pérez-Wohlfeil, E., Diaz-del-Pino, S. & Trelles, O. Ultra-fast genome comparison for large-scale genomic experiments. Scientific Reports 9, 10274 (2019).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biology 23, 258 (2022).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant physiology 176, 1410–1422 (2018).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic acids research 46, e126–e126 (2018).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness. Gene prediction: methods and protocols, 227–245 (2019).
Ou, S. et al. Differences in activity and stability drive transposable element variation in tropical and temperate maize. bioRxiv, 2022.10.09.511471 (2022).
Flutre, T., Duprat, E., Feuillet, C. & Quesneville, H. Considering transposable element diversification in de novo annotation approaches. PloS one 6, e16526 (2011).
Quesneville, H. et al. Combined evidence annotation of transposable elements in genome sequences. PLoS computational biology 1, e22 (2005).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410 (1990).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).
Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Res 9 (2020).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59–60 (2015).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods 18, 366–368 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biology 20, 277 (2019).
Goel, M. & Schneeberger, K. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 38, 2922–2926 (2022).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP461962 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426149 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426156 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426151 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426154 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426155 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426150 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426152 (2024).
European Nucleotide Archive https://identifiers.org/ebi/biosample:SAMEA114426153 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571405.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571485.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571585.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571465.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571445.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571505.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571545.1 (2024).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_040571565.1 (2024).
Rey, E. et al. Data from: Genome assembly of a diversity panel of Chenopodium quinoa. Dryad Digital Repository. https://doi.org/10.5061/dryad.zkh1893jj (2024).
Brown, M., González De la Rosa, P. M. & Mark, B. A Telomere Identification Toolkit. (Zenodo, 2023).
Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nature Biotechnology, 1–10 (2024).

Acknowledgements

We would like to thank Ingrid von Baer for enabling the use of Regalona quinoa in this study, and the public release of all related data to the scientific community. We thank the KAUST Bioscience Core Laboratory for sequencing support, the KAUST Plant Growth Facility Core Lab for greenhouse support, and the KAUST Supercomputing Facilities (https://www.hpc.kaust.edu.sa/) for providing computing resources. This research was supported by the King Abdullah University of Science and Technology (KAUST).

Author information

Authors and Affiliations

Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
Elodie Rey, Michael Abrouk, Noha Saber, Gabriele Fiene, Clara Stanschewski, Simon G. Krattinger & Mark Tester
Center for Desert Agriculture, KAUST, Thuwal, Saudi Arabia
Elodie Rey, Michael Abrouk, Noha Saber, Gabriele Fiene, Clara Stanschewski, Simon G. Krattinger & Mark Tester
INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France
Isabelle Dufau, Nathalie Rodde & Stephane Cauet
Institute of Experimental Botany of the Czech Academy of Sciences, Centre of Plant Structural and Functional Genomics, Šlechtitelů 31, CZ-77900, Olomouc, Czech Republic
Jana Cizkova & Eva Hribova
Brigham Young University, Department of Plant and Wildlife Sciences, College of Life Sciences, Provo, UT, 84602, USA
David E. Jarvis, Eric N. Jellen & Peter J. Maughan
Fdo LA Esperanza sn, Perquenco, Chile
Ingrid von Baer
Persephone Software, LLC, Agoura Hills, CA, 91301, USA
Maxim Troukhan & Maksym Kravchuk

Contributions

E.R. and M. Tester conceived the project and planned the analyses. E.R., G.F., C.S., D.E.J., E.N.J., P.J.M. and M. Tester contributed to the germplasm selection. E.R. performed the gDNA extractions, sequence assemblies and validations. J.C. and E.H. performed the genome size estimation analyses. N.R., I.D. and S.C. produced the optical maps and hybrid scaffold assemblies. N.S. performed the RNA extractions. M.A. performed the annotations and variant identification. M.T. and M.K. managed the visualization platform. E.R., M.A. and M. Tester wrote the initial manuscript with input from all authors. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Elodie Rey or Mark Tester.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Rey, E., Abrouk, M., Dufau, I. et al. Genome assembly of a diversity panel of Chenopodium quinoa. Sci Data 11, 1366 (2024). https://doi.org/10.1038/s41597-024-04200-4

Received: 08 July 2024
Accepted: 29 November 2024
Published: 18 December 2024
DOI: https://doi.org/10.1038/s41597-024-04200-4
