A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation

Abstract

Cotton (Gossypium hirsutum L.) is a key allopolyploid crop with global economic importance. Here we present a telomere-to-telomere assembly of the elite variety Zhongmian 113. Leveraging technologies including PacBio HiFi, Oxford Nanopore Technology (ONT) ultralong-read sequencing and Hi-C, our assembly surpasses previous genomes in contiguity and completeness, resolving 26 centromeric and 52 telomeric regions, 5S rDNA clusters and nucleolar organizer regions. A phylogenetically recent centromere repositioning on chromosome D08 was discovered specific to G. hirsutum, involving deactivation of an ancestral centromere and the formation of a unique, satellite repeat-based centromere. Genomic analyses evaluated favorable allele aggregation for key agronomic traits and uncovered an early-maturing haplotype derived from an 11 Mb pericentric inversion that evolved early during G. hirsutum domestication. Our study sheds light on the genomic origins of short-season adaptation, potentially involving introgression of an inversion from primitively domesticated forms, followed by subsequent haplotype differentiation in modern breeding programs.

Fig. 1: T2T assembly and validation of ZM113 v1.0.
Fig. 2: Characteristics of ZM113 centromeric regions.
Fig. 3: Evolutionary dynamics of the D08 centromere.
Fig. 4: Structural and GCVs in G. hirsutum.
Fig. 5: Identification of an INV-derived haplotype associated with FD on chromosome D03.

Data availability

The ZM113 genome assembly and annotation were deposited in NCBI under BioProject accession number PRJNA1137578 and in the Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw reads from ONT ultralong-read sequencing, PacBio HiFi sequencing, MGI PE150 short-read sequencing, Hi-C, RNA-seq, ChIP-seq and BS-seq are available in NCBI under BioProject PRJNA1041574 and in the Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw Bionano data are available in the Genome Sequence Archive of the China National Genomics Data Center under PRJCA029603 (accession CRA018637; accessible at https://ngdc.cncb.ac.cn/gsa/). Nine G. hirsutum reference genomes and two diploid cotton genomes were downloaded from CottonGen (accessible at https://cottongen.org/). The genetic variant VCF for Hap-D03-1 to Hap-D03-5 was deposited at https://github.com/huguanjing/cottonRef_ZM113/. Source data are provided with this paper.

Code availability

Custom scripts are available in the GitHub repository (https://github.com/huguanjing/cottonRef_ZM113) and on Zenodo (https://doi.org/10.5281/zenodo.14840103; ref. 127).

References

  1. Viot, C. R. & Wendel, J. F. Evolution of the cotton genus, Gossypium, and its domestication in the Americas. CRC Crit. Rev. Plant Sci. 42, 1–33 (2023).

  2. Yang, Z. et al. Recent progression and future perspectives in cotton genomic breeding. J. Integr. Plant Biol. 65, 548–569 (2023).

  3. Wen, X. et al. A comprehensive overview of cotton genomics, biotechnology and molecular biological studies. Sci. China Life Sci. 66, 2214–2256 (2023).

  4. Zhao, H. et al. Recent advances and future perspectives in early-maturing cotton research. New Phytol. 237, 1100–1114 (2023).

  5. Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531–537 (2015).

  6. Li, F. et al. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat. Biotechnol. 33, 524–530 (2015).

  7. Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).

  8. Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2018).

  9. Chang, X. et al. High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the centromeric landscape and evolution. Plant Commun. 5, 100722 (2024).

  10. Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).

  11. Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).

  12. Hu, Y. et al. Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. Nat. Genet. 51, 739–748 (2019).

  13. Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).

  14. He, S. et al. The genomic basis of geographic differentiation and fiber improvement in cultivated cotton. Nat. Genet. 53, 916–924 (2021).

  15. Ma, Z. et al. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat. Genet. 53, 1385–1391 (2021).

  16. Perkin, L. C. et al. Genome assembly of two nematode-resistant cotton lines (Gossypium hirsutum L.). G3 11, jkab276 (2021).

  17. Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).

  18. Cheng, Y. et al. Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton. J. Adv. Res. https://doi.org/10.1016/j.jare.2023.03.006 (2023).

  19. Meng, Q. et al. Comparative analysis of genome sequences of the two cultivated tetraploid cottons, Gossypium hirsutum (L.) and G. barbadense (L.). Ind. Crops Prod. 196, 116471 (2023).

  20. Dai, S. et al. Phenotypic characteristics and cultivation techniques of an early maturing and machine-harvested cotton variety Zhongmian 113 in introduction and demonstration of Xinjiang. China Cotton 49, 34–36 (2022).

  21. Wang, K. et al. High yield and efficiency cultivation techniques of an upland cotton cultivar, Zhongmian 113, with early maturity and excellent fiber quality. China Cotton 48, 32–33 (2021).

  22. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

  23. Hawkins, J. S., Kim, H., Nason, J. D., Wing, R. A. & Wendel, J. F. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 16, 1252–1261 (2006).

  24. Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).

  25. Hanson, R. E. et al. Distribution of 5S and 18S-28S rDNA loci in a tetraploid cotton (Gossypium hirsutum L.) and its putative diploid ancestors. Chromosoma 105, 55–61 (1996).

  26. Ji, Y. et al. New ribosomal RNA gene locations in Gossypium hirsutum mapped by meiotic FISH. Chromosoma 108, 200–207 (1999).

  27. Gan, Y. et al. Chromosomal locations of 5S and 45S rDNA in Gossypium genus and its phylogenetic implications revealed by FISH. PLoS ONE 8, e68207 (2013).

  28. Mower, J. P. Variation in protein gene and intron content among land plant mitogenomes. Mitochondrion 53, 203–213 (2020).

  29. Wu, Z.-Q., Liao, X.-Z., Zhang, X.-N., Tembrock, L. R. & Broz, A. Genomic architectural variation of plant mitochondria—a review of multichromosomal structuring. J. Syst. Evol. 60, 160–168 (2022).

  30. Feng, Y. et al. Assembly and phylogenomic analysis of cotton mitochondrial genomes provide insights into the history of cotton evolution. Crop J. 11, 1782–1792 (2023).

  31. Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).

  32. Luo, S. et al. The cotton centromere contains a Ty3-gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).

  33. Nagaki, K. et al. Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36, 138–145 (2004).

  34. Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).

  35. Zhao, H. et al. Gene expression and chromatin modifications associated with maize centromeres. G3 6, 183–192 (2015).

  36. Wang, K., Wu, Y., Zhang, W., Dawe, R. K. & Jiang, J. Maize centromeres expand and adopt a uniform size in the genetic background of oat. Genome Res. 24, 107–116 (2014).

  37. Gassmann, R. et al. An inverse relationship to germline transcription defines centromeric chromatin in C. elegans. Nature 484, 534–537 (2012).

  38. Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).

  39. Liu, Y. et al. Pan-centromere reveals widespread centromere repositioning of soybean genomes. Proc. Natl Acad. Sci. USA 120, e2310177120 (2023).

  40. Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).

  41. Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).

  42. Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).

  43. Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).

  44. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  45. Lovell, J. T. et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. eLife 11, e78526 (2022).

  46. Ma, Z. et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat. Genet. 50, 803–813 (2018).

  47. Li, Y. et al. Genomic insights into the genetic basis of cotton breeding in China. Mol. Plant 16, 662–677 (2023).

  48. Li, L. et al. Genomic analyses reveal the genetic basis of early maturity and identification of loci and candidate genes in upland cotton (Gossypium hirsutum L.). Plant Biotechnol. J. 19, 109–123 (2021).

  49. Zhang, Y. et al. Uncovering genomic and transcriptional variations facilitates utilization of wild resources in cotton disease resistance improvement. Theor. Appl. Genet. 136, 204 (2023).

  50. Lee, C.-R. et al. Young inversion with multiple linked QTLs under selection in a hybrid zone. Nat. Ecol. Evol. 1, 119 (2017).

  51. Comai, L., Maheshwari, S. & Marimuthu, M. P. A. Plant centromeres. Curr. Opin. Plant Biol. 36, 158–167 (2017).

  52. Henikoff, S., Ahmad, K. & Malik, H. S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001).

  53. Jiang, J., Birchler, J. A., Parrott, W. A. & Dawe, R. K. A molecular view of plant centromeres. Trends Plant Sci. 8, 570–575 (2003).

  54. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

  55. Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).

  56. Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).

  57. Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).

  58. Song, J.-M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).

  59. Wang, T. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023).

  60. Zhang, L. et al. A near-complete genome assembly of Brassica rapa provides new insights into the evolution of centromeres. Plant Biotechnol. J. 21, 1022–1032 (2023).

  61. Zhang, W. et al. Identification of centromeric regions on the linkage map of cotton using centromere-related repeats. Genomics 104, 587–593 (2014).

  62. Melters, D. P. et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 14, R10 (2013).

  63. Huang, G. et al. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat. Genet. https://doi.org/10.1038/s41588-024-01877-6 (2024).

  64. Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).

  65. Presting, G. G. Centromeric retrotransposons and centromere function. Curr. Opin. Genet. Dev. 49, 79–84 (2018).

  66. Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).

  67. Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).

  68. Montefalcone, G., Tempesta, S., Rocchi, M. & Archidiacono, N. Centromere repositioning. Genome Res. 9, 1184–1188 (1999).

  69. Grover, C. E. et al. Molecular confirmation of species status for the allopolyploid cotton species, Gossypium ekmanianum Wittmack. Genet. Resour. Crop Evol. 62, 103–114 (2015).

  70. Gallagher, J. P., Grover, C. E., Rex, K., Moran, M. & Wendel, J. F. A new species of cotton from Wake Atoll, Gossypium stephensii (Malvaceae). Syst. Bot. 42, 115–123 (2017).

  71. Song, H.-R. et al. The RNA binding protein ELF9 directly reduces SUPPRESSOR OF OVEREXPRESSION OF CO1 transcript levels in Arabidopsis, possibly via nonsense-mediated mRNA decay. Plant Cell 21, 1195–1211 (2009).

  72. Jarillo, J. A. & Piñeiro, M. H2A.Z mediates different aspects of chromatin function and modulates flowering responses in Arabidopsis. Plant J. 83, 96–109 (2015).

  73. Hu, H. et al. Unravelling inversions: technological advances, challenges, and potential impact on crop breeding. Plant Biotechnol. J. https://doi.org/10.1111/pbi.14224 (2023).

  74. Stefanova, P., Taseva, M., Georgieva, T., Gotcheva, V. & Angelov, A. A modified CTAB method for DNA extraction from soybean and meat products. Biotechnol. Biotechnol. Equip. 27, 3803–3810 (2013).

  75. Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).

  76. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  77. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).

  78. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  79. Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 1, e323 (2021).

  80. Jo, H. & Koh, G. Faster single-end alignment generation utilizing multi-thread for BWA. Biomed. Mater. Eng. 26, S1791–S1796 (2015).

  81. Danecek, P. & McCarthy, S. A. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 33, 2037–2039 (2017).

  82. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  83. Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front. Plant Sci. 7, 1350 (2016).

  84. Saha, S., Bridges, S., Magbanua, Z. V. & Peterson, D. G. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Res. 36, 2284–2294 (2008).

  85. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).

  86. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).

  87. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  88. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  89. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).

  90. Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43, e78 (2015).

  91. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).

  92. Wang, B. et al. High-quality Arabidopsis thaliana genome assembly with Nanopore and HiFi long reads. Genom. Proteom. Bioinform. 20, 4–13 (2022).

  93. Udall, J. A. et al. De novo genome sequence assemblies of Gossypium raimondii and Gossypium turneri. G3 9, 3079–3085 (2019).

  94. Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).

  95. Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).

  96. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  97. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).

  98. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).

  99. Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).

  100. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

  101. Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962 (2016).

  102. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

  103. Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).

  104. Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).

  105. Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).

  106. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  107. Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).

  108. Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).

  109. Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).

  110. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  111. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).

  112. Zhang, Y., Chu, J., Cheng, H. & Li, H. De novo reconstruction of satellite repeat units from sequence data. Genome Res. 33, 1994–2001 (2023).

  113. Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).

  114. Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).

  115. Cheng, Z., Presting, G. G., Buell, C. R., Wing, R. A. & Jiang, J. High-resolution pachytene chromosome mapping of bacterial artificial chromosomes anchored by genetic markers reveals the centromere location and the distribution of genetic recombination along chromosome 10 of rice. Genetics 157, 1749–1757 (2001).

  116. Liu, Y. et al. Construction and primary application of oligos fluorescence in situ hybridization technology in cotton. Cotton Sci. 29, 213–221 (2017).

  117. Krueger, F. Trim galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files (Babraham Institute, 2015).

  118. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).

  119. Quinlan, A. R. BEDTools: the swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).

  120. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).

  121. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  122. Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).

  123. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  124. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  125. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

  126. He, F., Ding, S., Wang, H. & Qin, F. IntAssoPlot: an R package for integrated visualization of genome-wide association study results with gene structure and linkage disequilibrium matrix. Front. Genet. 11, 260 (2020).

  127. Hu, G. Code for the publication ‘T2T reference genome of G. hirsutum cv. ZM113’. Zenodo https://doi.org/10.5281/zenodo.14840103 (2025).

Acknowledgements

This study was supported by the National Key Research and Development Program of China (2021YFF1000100 to Xiongfeng Ma), the National Natural Science Foundation of China (32072114 to Xiongfeng Ma, 32072111 to G.H. and 32070544 to K.W.), the Key Research and Development project of Xinjiang Uygur Autonomous Region (2022B02052-2 to Xiongfeng Ma), the Tianshan Innovation Team of Xinjiang Uygur Autonomous Region (2023D14016 to G.H.), the Major Science and Technology Program of Changji Hui Autonomous Prefecture (2021Z01-01 to Xiongfeng Ma), the Postdoctoral and High-level Flexible Talents of Xinjiang Uygur Autonomous Region (RSSQ00066509 to X. Zhang), the China Agriculture Research System (CARS-15-07 to Xiongfeng Ma), the Shennong Talents program (SNYCQN002-2022 to Xiongfeng Ma), the Top Talent Project of Henan Province (ZYYCYU202012146 to Xiongfeng Ma), the Natural Science Foundation of Henan Province (202300410550 to Xiongfeng Ma) and the Chinese Academy of Agricultural Sciences (the Innovation Project CAAS-ASTIP-ICR-KP-2021-01 to Xiongfeng Ma, the Youth Innovation Project Y2023QC38 to Z. Wang, the Young Elite Scientists Sponsorship Program by Henan Association for Science and Technology (2024HYTP010 to Z.T.) and the Special Project Cotton Research Institute 1610162023005 to X.W.). We thank X. Du (Cotton Research Institute, Chinese Academy of Agricultural Sciences) and J. Zhang (Guangxi University) for discussion and critical reading of the paper.

Author information

Contributions

Xiongfeng Ma conceived this project and coordinated research activities. G.H. and X.W. designed the experiments. X. Zhang, X.W. and K.F. prepared the plant materials. Z.T., X.W., G.H., X.L., R.N., D.Z., Xueyuan Ma, X.X., Ya Zhang, X. Zang and W.P. assembled, validated and annotated the ZM113 genome. K.F. and Z. Wu assembled and annotated the cytoplasmic genomes. K.W., G.Y. and J.H. performed the ChIP-seq experiment. L.D., Ya Zhang, G.H. and X.X. analyzed the ChIP-seq data. Yuzhi Zhang performed FISH and PCR experiments. Z.T., Ya Zhang and G.H. performed the centromere characterization and evolutionary analysis. X.L. performed the SV analysis. R.N. performed pan-genomic and RNA-seq analyses. Xueyuan Ma analyzed the bisulfite sequencing data. Z. Wang, G.J., Z.Y., J.Z. and X.W. conducted the GWAS and haplotype analysis, with inputs from J.S., S.H. and X. Zang. G.H., Z.T., Z. Wang, G.J. and X.W. wrote the paper, with inputs from D.Z. and X.X. for genome annotation, X.L. and R.N. for pan-genomic analysis, Xueyuan Ma for DNA methylation analysis, and Ya Zhang for ChIP-seq analysis. K.W., C.E.G. and J.F.W. revised the paper. All authors read and approved the paper.

Corresponding author

Correspondence to Xiongfeng Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Yuxian Zhu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Syntenic plot of two different assembly versions of ZM113.

Reference (ref): a previous version assembled using only ONT ultralong reads with NextDenovo (v2.3.1). Query (qry): the current version assembled using both ONT ultralong and PacBio HiFi reads. The current assembly is larger (2,299.07 Mb versus 2,288.35 Mb) and shows notable improvement in the A09 and D09 regions, both of which harbor 5S rDNA clusters.

Extended Data Fig. 2 Potential assembly error (D10) and heterozygous sites (A01, A09, A12, D04 and D08) detected by the VerityMap k-mer distance concordance test.

Based on the k-mer discrepancies between HiFi reads and the assembly, if a significant proportion of reads (20–80%) deviate from the assembly at a specific location, a heterozygous site is suspected; if nearly all reads deviate (>80%), a potential assembly error is indicated.
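The decision rule in this legend can be written as a short function. This is only a minimal sketch of the thresholds stated above, not VerityMap's own code or interface: the function name `classify_site` is hypothetical, and the "concordant" label for positions below 20% is an assumption, since the legend does not name that case.

```python
def classify_site(deviating_fraction: float) -> str:
    """Classify a position by the fraction of HiFi reads that deviate from the assembly.

    Thresholds follow the legend: 20-80% suggests a heterozygous site,
    more than 80% suggests a potential assembly error.
    """
    if not 0.0 <= deviating_fraction <= 1.0:
        raise ValueError("deviating_fraction must be between 0 and 1")
    if deviating_fraction > 0.80:
        return "potential assembly error"
    if deviating_fraction >= 0.20:
        return "suspected heterozygous site"
    # Assumption: the legend does not label positions below 20%.
    return "concordant"


if __name__ == "__main__":
    for fraction in (0.05, 0.50, 0.95):
        print(f"{fraction:.2f} -> {classify_site(fraction)}")
```

Under this reading, a position where exactly 80% of reads deviate still falls in the heterozygous range, because the error category applies only above 80%.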

Extended Data Fig. 3 Whole-genome alignment between ZM113 and five TM-1 reference genome assemblies.

SyRI visualizations of synteny (gray), inversions (orange), translocations (green) and duplications (blue) are shown. Common inversions between ZM113 and the TM-1 references are indicated by red arrows.

Extended Data Fig. 4 Collinearity of A08 chromosome between ZM113 and 11 G. hirsutum genome assemblies.

SyRI visualizations of synteny (gray), inversions (orange), translocations (green) and duplications (blue) are shown.

Extended Data Fig. 5 Collinearity of D03 chromosome in seven G. hirsutum assemblies.

The newly published TM-1 UTX_v3.0 version (Sreedasyam et al. 2024) corrected the ~10 Mb inversion in chromosome D03 found between UTX v2.1 and other G. hirsutum genomes including ZM113.

Extended Data Fig. 6 Integrated analysis of the 4 Mb inversion and centromere repositioning on chromosome D08.

Shown left is the synteny plot of chromosome D08 between ZM113 and other reference genomes. The inversion at 34–38 Mb divides the D08 locus into two haplotypes. GhSat194 arrays colocalize with the boundary of the inversion. Shown right are synteny plots between ZM113 and AD6 (classified as Hap1) and AD7 (classified as Hap2). CENH3 peaks are shown at the bottom.

Extended Data Fig. 7 IGV visualization of the ZM113 D08 chromosome showing log2(ChIP/input) signal tracks from CENH3 ChIP-seq experiments.

Top to bottom tracks are: 1-4. ZM113 ChIP-seq, GhCEN08 location, Ψ-GhD08CEN location, and repetitive sequence annotation; 5-7. ChIP-seq experiments on G. hirsutum (AD1) var TM-1 (ZJU), G. stephensii (AD7), G. ekmanianum (AD6), G. tomentosum (AD3), G. barbadense (AD2), G. darwinii (AD5), G. mustelinum (AD4), and G. raimondii (D5).

Extended Data Fig. 8 Comparative annotation and functional validation of the ZM113 private gene Ghir_D03G00738.

(a) IGV visualization of the TM-1 reference genome (version: CRI v1.0) region containing the unannotated Ghir_D03G00738 sequence. (b) IGV visualization of Ghir_D03G00738 annotation and transcriptome alignment in the ZM113 reference genome. (c) VIGS results showing that Ghir_D03G00738 regulates flowering time in upland cotton, with images of the silenced plant phenotypes. (d) Silencing efficiency of Ghir_D03G00738, as detected by qRT-PCR. (e–g) Statistics of first fruiting node position, flowering time and budding time.

Extended Data Fig. 9 PCR validation of the ZM113 private gene Ghir_D03G00738.

(a) Gene structure and PCR primer design. (b) Agarose gel image of the amplified fragments from both ZM113 and TM-1 materials. (c) Sequence alignment results of the amplified fragments. The unprocessed images of the gels are available in the source data file.

Source data

Extended Data Fig. 10 PCR amplification validation of D03_INV.

(a) Primer design scheme and sequences spanning the inversion breakpoint region. (b) Amplification results with Primer 1. (c) Amplification results with Primer 2. The unprocessed images of the gels are available in the source data file.

Source data

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2 and Figs. 1–31.

Reporting Summary

Peer Review File

Supplementary Tables 1–30

Supplementary Table 1. Comparison of reference genomes of G. hirsutum. Supplementary Table 2. Multi-platform sequencing data generated for genome assembly. Supplementary Table 3. Summary of the five gap regions closed. Supplementary Table 4. Statistics of 26 chromosomes with centromere and telomere locations. Supplementary Table 5. Assembly consensus quality assessment by Merqury. Supplementary Table 6. Assembly and gene annotation quality assessment by BUSCO. Supplementary Table 7. Repetitive elements in the ZM113 genome assembly. Supplementary Table 8. Statistics of functional annotation of gene models in the ZM113 genome assembly. Supplementary Table 9. Statistics of transcriptome support for gene models. Supplementary Table 10. Syntenic orthologous analysis of ZM113 genes relative to parental diploids. Supplementary Table 11. Annotation summary of non-coding RNA. Supplementary Table 12. Genomic distribution of rDNA arrays. Supplementary Table 13. Assembly features of ZM113 organelle genomes. Supplementary Table 14. Gene profile and organization of the ZM113 mitochondrial genome. Supplementary Table 15. Gene profile and organization of the ZM113 chloroplast genome. Supplementary Table 16. Summary of multi-omics data of ChIP-seq, BS-seq and RNA-seq. Supplementary Table 17. Centromere positions in ZM113 identified by mapping centromere-specific repeat sequences. Supplementary Table 18. Gene identification within and flanking centromeric regions. Supplementary Table 19. Characterization of ZM113 centromeric genes by HOG, pan-genomic annotation and differential expression. Supplementary Table 20. Statistics of repetitive sequences in centromeric regions and genome-wide. Supplementary Table 21. Characterization of 2530 CEN-TEs based on percentage content per centromere. Supplementary Table 22. DNA methylation levels, genome-wide and in centromeres. Supplementary Table 23. Summary of structural variants between ZM113 and 9 other G. hirsutum genome assemblies used for genome comparisons. Supplementary Table 24. Summary of non-redundant SVs against ZM113. Supplementary Table 25. Summary of coreDELs relative to ZM113. Supplementary Table 26. Summary of 28 genes located in coreDELs. Supplementary Table 27. Candidate genes containing SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD). Supplementary Table 28. SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD) in 419 accessions. Supplementary Table 29. The cumulative number of elite SNPs associated with LP and FD in 419 accessions. Supplementary Table 30. Haplotype typing results of the D03 chromosome interval.

Source data

Source Data Fig. 4

The unprocessed gels outlined in red correspond to Fig. 4g,h.

Source Data Extended Data Fig. 9

The unprocessed gels outlined in red correspond to Extended Data Fig. 9c.

Source Data Extended Data Fig. 10

The unprocessed gels outlined in red correspond to Extended Data Fig. 10b,c.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet 57, 1031–1043 (2025). https://doi.org/10.1038/s41588-025-02130-4

  • DOI: https://doi.org/10.1038/s41588-025-02130-4
