A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation

Hu, Guanjing; Wang, Zhenyu; Tian, Zunzhe; Wang, Kai; Ji, Gaoxiang; Wang, Xingxing; Zhang, Xianliang; Yang, Zhaoen; Liu, Xuan; Niu, Ruoyu; Zhu, De; Zhang, Yuzhi; Duan, Lian; Ma, Xueyuan; Xiong, Xianpeng; Kong, Jiali; Zhao, Xianjia; Zhang, Ya; Zhao, Junjie; He, Shoupu; Grover, Corrinne E.; Su, Junji; Feng, Keyun; Yu, Guangrun; Han, Jinlei; Zang, Xinshan; Wu, Zhiqiang; Pan, Weihua; Wendel, Jonathan F.; Ma, Xiongfeng

doi:10.1038/s41588-025-02130-4

Article
Published: 17 March 2025

Nature Genetics volume 57, pages 1031–1043 (2025)

Abstract

Cotton (Gossypium hirsutum L.) is a key allopolyploid crop with global economic importance. Here we present a telomere-to-telomere assembly of the elite variety Zhongmian 113. Leveraging technologies including PacBio HiFi, Oxford Nanopore Technology (ONT) ultralong-read sequencing and Hi-C, our assembly surpasses previous genomes in contiguity and completeness, resolving 26 centromeric and 52 telomeric regions, 5S rDNA clusters and nucleolar organizer regions. A phylogenetically recent centromere repositioning on chromosome D08 was discovered specific to G. hirsutum, involving deactivation of an ancestral centromere and the formation of a unique, satellite repeat-based centromere. Genomic analyses evaluated favorable allele aggregation for key agronomic traits and uncovered an early-maturing haplotype derived from an 11 Mb pericentric inversion that evolved early during G. hirsutum domestication. Our study sheds light on the genomic origins of short-season adaptation, potentially involving introgression of an inversion from primitively domesticated forms, followed by subsequent haplotype differentiation in modern breeding programs.
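The count of 52 telomeric regions corresponds to two ends for each of the 26 chromosomes. As a minimal sketch of how such a count might be checked on an assembly, the Python snippet below scans both ends of each sequence for a tandem array of the Arabidopsis-type plant telomere repeat. The motif, the 5 kb window, the ten-copy threshold and the function names are illustrative assumptions, not the validation procedure used in the paper.

```python
import re

# Assumption: Arabidopsis-type plant telomere repeat; the motif and thresholds
# are illustrative and are not taken from the paper.
FORWARD = "CCCTAAA"  # expected on the top strand at the start of a chromosome
REVERSE = "TTTAGGG"  # expected on the top strand at the end of a chromosome


def has_telomere(end_seq: str, motif: str, min_copies: int = 10) -> bool:
    """Return True if end_seq contains a tandem run of at least min_copies motifs."""
    pattern = "(?:%s){%d,}" % (motif, min_copies)
    return re.search(pattern, end_seq.upper()) is not None


def count_telomeric_ends(chromosomes: dict, window: int = 5000) -> int:
    """Count chromosome ends with a telomere repeat array within `window` bp."""
    total = 0
    for seq in chromosomes.values():
        if has_telomere(seq[:window], FORWARD):
            total += 1
        if has_telomere(seq[-window:], REVERSE):
            total += 1
    return total


# Toy example: one complete chromosome and one lacking its right telomere.
toy = {
    "chrA": "CCCTAAA" * 12 + "ACGT" * 50 + "TTTAGGG" * 12,
    "chrB": "CCCTAAA" * 12 + "ACGT" * 50,
}
print(count_telomeric_ends(toy))  # 3
```

Under these assumptions, a fully telomere-to-telomere assembly of 26 chromosomes would return 52.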

**Fig. 1: T2T assembly and validation of ZM113 v1.0.**

**Fig. 2: Characteristics of ZM113 centromeric regions.**

**Fig. 3: Evolutionary dynamics of the D08 centromere.**

**Fig. 4: Structural and GCVs in *G. hirsutum*.**

**Fig. 5: Identification of an INV-derived haplotype associated with FD on chromosome D03.**

Data availability

The ZM113 genome assembly and annotation were deposited in NCBI under BioProject accession number PRJNA1137578 and Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw sequencing reads by ONT ultralong-read sequencing, PacBio HiFi sequencing, MGI PE150 short-read sequencing, Hi-C, RNA-seq, ChIP-seq and BS-seq are available in NCBI under BioProject PRJNA1041574 and Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw Bionano data are available in the Genome Sequence Archive of China National Genomics Data Center under PRJCA029603 accession CRA018637 (accessible at https://ngdc.cncb.ac.cn/gsa/). Nine G. hirsutum reference genomes and two diploid cotton genomes were downloaded from CottonGen (accessible at https://cottongen.org/). Genetic variant VCF for Hap-D03-1 to Hap-D03-5 was deposited in https://github.com/huguanjing/cottonRef_ZM113/. Source data are provided with this paper.

Code availability

Custom scripts are available in the GitHub repository (https://github.com/huguanjing/cottonRef_ZM113) and on Zenodo (https://doi.org/10.5281/zenodo.14840103)¹²⁷.

References

Viot, C. R. & Wendel, J. F. Evolution of the cotton genus, Gossypium, and its domestication in the Americas. CRC Crit. Rev. Plant Sci. 42, 1–33 (2023).
Yang, Z. et al. Recent progression and future perspectives in cotton genomic breeding. J. Integr. Plant Biol. 65, 548–569 (2023).
Wen, X. et al. A comprehensive overview of cotton genomics, biotechnology and molecular biological studies. Sci. China Life Sci 66, 2214–2256 (2023).
Zhao, H. et al. Recent advances and future perspectives in early-maturing cotton research. New Phytol 237, 1100–1114 (2023).
Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531–537 (2015).
Li, F. et al. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat. Biotechnol. 33, 524–530 (2015).
Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).
Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2018).
Chang, X. et al. High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the centromeric landscape and evolution. Plant Commun. 5, 100722 (2024).
Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Hu, Y. et al. Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. Nat. Genet. 51, 739–748 (2019).
Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).
He, S. et al. The genomic basis of geographic differentiation and fiber improvement in cultivated cotton. Nat. Genet. 53, 916–924 (2021).
Ma, Z. et al. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat. Genet. 53, 1385–1391 (2021).
Perkin, L. C. et al. Genome assembly of two nematode-resistant cotton lines (Gossypium hirsutum L.). G3 11, jkab276 (2021).
Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).
Cheng, Y. et al. Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton. J. Adv. Res. https://doi.org/10.1016/j.jare.2023.03.006 (2023).
Meng, Q. et al. Comparative analysis of genome sequences of the two cultivated tetraploid cottons, Gossypium hirsutum (L.) and G. barbadense (L.). Ind. Crops Prod. 196, 116471 (2023).
Dai, S. et al. Phenotypic characteristics and cultivation techniques of an early maturing and machine-harvested cotton variety Zhongmian 113 in introduction and demonstration of Xinjiang. China Cotton 49, 34–36 (2022).
Wang, K. et al. High yield and efficiency cultivation techniques of an upland cotton cultivar, Zhongmian 113, with early maturity and excellent fiber quality. China Cotton 48, 32–33 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245 (2020).
Hawkins, J. S., Kim, H., Nason, J. D., Wing, R. A. & Wendel, J. F. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 16, 1252–1261 (2006).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Hanson, R. E. et al. Distribution of 5S and 18S-28S rDNA loci in a tetraploid cotton (Gossypium hirsutum L.) and its putative diploid ancestors. Chromosoma 105, 55–61 (1996).
Ji, Y. et al. New ribosomal RNA gene locations in Gossypium hirsutum mapped by meiotic FISH. Chromosoma 108, 200–207 (1999).
Gan, Y. et al. Chromosomal locations of 5S and 45S rDNA in Gossypium genus and its phylogenetic implications revealed by FISH. PLoS ONE 8, e68207 (2013).
Mower, J. P. Variation in protein gene and intron content among land plant mitogenomes. Mitochondrion 53, 203–213 (2020).
Wu, Z.-Q., Liao, X.-Z., Zhang, X.-N., Tembrock, L. R. & Broz, A. Genomic architectural variation of plant mitochondria—a review of multichromosomal structuring. J. Syst. Evol. 60, 160–168 (2022).
Feng, Y. et al. Assembly and phylogenomic analysis of cotton mitochondrial genomes provide insights into the history of cotton evolution. Crop J 11, 1782–1792 (2023).
Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J 88, 992–1005 (2016).
Luo, S. et al. The cotton centromere contains a Ty3-gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).
Nagaki, K. et al. Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36, 138–145 (2004).
Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).
Zhao, H. et al. Gene expression and chromatin modifications associated with maize centromeres. G3 6, 183–192 (2015).
Wang, K., Wu, Y., Zhang, W., Dawe, R. K. & Jiang, J. Maize centromeres expand and adopt a uniform size in the genetic background of oat. Genome Res. 24, 107–116 (2014).
Gassmann, R. et al. An inverse relationship to germline transcription defines centromeric chromatin in C. elegans. Nature 484, 534–537 (2012).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Liu, Y. et al. Pan-centromere reveals widespread centromere repositioning of soybean genomes. Proc. Natl Acad. Sci. USA 120, e2310177120 (2023).
Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun 4, 100556 (2023).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 20, 275 (2019).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol 25, 107 (2024).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Lovell, J. T. et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. eLife 11, e78526 (2022).
Ma, Z. et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat. Genet. 50, 803–813 (2018).
Li, Y. et al. Genomic insights into the genetic basis of cotton breeding in China. Mol. Plant 16, 662–677 (2023).
Li, L. et al. Genomic analyses reveal the genetic basis of early maturity and identification of loci and candidate genes in upland cotton (Gossypium hirsutum L.). Plant Biotechnol. J. 19, 109–123 (2021).
Zhang, Y. et al. Uncovering genomic and transcriptional variations facilitates utilization of wild resources in cotton disease resistance improvement. Theor. Appl. Genet. 136, 204 (2023).
Lee, C.-R. et al. Young inversion with multiple linked QTLs under selection in a hybrid zone. Nat. Ecol. Evol. 1, 119 (2017).
Comai, L., Maheshwari, S. & Marimuthu, M. P. A. Plant centromeres. Curr. Opin. Plant Biol. 36, 158–167 (2017).
Henikoff, S., Ahmad, K. & Malik, H. S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001).
Jiang, J., Birchler, J. A., Parrott, W. A. & Dawe, R. K. A molecular view of plant centromeres. Trends Plant Sci 8, 570–575 (2003).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Song, J.-M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Wang, T. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023).
Zhang, L. et al. A near-complete genome assembly of Brassica rapa provides new insights into the evolution of centromeres. Plant Biotechnol. J. 21, 1022–1032 (2023).
Zhang, W. et al. Identification of centromeric regions on the linkage map of cotton using centromere-related repeats. Genomics 104, 587–593 (2014).
Melters, D. P. et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol 14, R10 (2013).
Huang, G. et al. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat. Genet. https://doi.org/10.1038/s41588-024-01877-6 (2024).
Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).
Presting, G. G. Centromeric retrotransposons and centromere function. Curr. Opin. Genet. Dev. 49, 79–84 (2018).
Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).
Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).
Montefalcone, G., Tempesta, S., Rocchi, M. & Archidiacono, N. Centromere repositioning. Genome Res. 9, 1184–1188 (1999).
Grover, C. E. et al. Molecular confirmation of species status for the allopolyploid cotton species, Gossypium ekmanianum Wittmack. Genet. Resour. Crop Evol. 62, 103–114 (2015).
Gallagher, J. P., Grover, C. E., Rex, K., Moran, M. & Wendel, J. F. A new species of cotton from Wake Atoll, Gossypium stephensii (Malvaceae). Syst. Bot. 42, 115–123 (2017).
Song, H.-R. et al. The RNA binding protein ELF9 directly reduces SUPPRESSOR OF OVEREXPRESSION OF CO1 transcript levels in arabidopsis, possibly via nonsense-mediated mRNA decay. Plant Cell 21, 1195–1211 (2009).
Jarillo, J. A. & Piñeiro, M. H2A.Z mediates different aspects of chromatin function and modulates flowering responses in Arabidopsis. Plant J 83, 96–109 (2015).
Hu, H. et al. Unravelling inversions: technological advances, challenges, and potential impact on crop breeding. Plant Biotechnol. J. https://doi.org/10.1111/pbi.14224 (2023).
Stefanova, P., Taseva, M., Georgieva, T., Gotcheva, V. & Angelov, A. A modified CTAB method for DNA extraction from soybean and meat products. Biotechnol. Biotechnol. Equip. 27, 3803–3810 (2013).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20, 129 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 1, e323 (2021).
Jo, H. & Koh, G. Faster single-end alignment generation utilizing multi-thread for BWA. Biomed. Mater. Eng. 26, S1791–S1796 (2015).
Danecek, P. & McCarthy, S. A. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 33, 2037–2039 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front. Plant Sci. 7, 1350 (2016).
Saha, S., Bridges, S., Magbanua, Z. V. & Peterson, D. G. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Res. 36, 2284–2294 (2008).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43, e78 (2015).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Wang, B. et al. High-quality Arabidopsis thaliana genome assembly with Nanopore and HiFi long reads. Genom. Proteom. Bioinform. 20, 4–13 (2022).
Udall, J. A. et al. De novo genome sequence assemblies of Gossypium raimondii and Gossypium turneri. G3 9, 3079–3085 (2019).
Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962 (2016).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Zhang, Y., Chu, J., Cheng, H. & Li, H. De novo reconstruction of satellite repeat units from sequence data. Genome Res. 33, 1994–2001 (2023).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Cheng, Z., Presting, G. G., Buell, C. R., Wing, R. A. & Jiang, J. High-resolution pachytene chromosome mapping of bacterial artificial chromosomes anchored by genetic markers reveals the centromere location and the distribution of genetic recombination along chromosome 10 of rice. Genetics 157, 1749–1757 (2001).
Liu, Y. et al. Construction and primary application of oligos fluorescence in situ hybridization technology in cotton. Cotton Sci. 29, 213–221 (2017).
Krueger, F. Trim galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files (Babraham Institute, 2015).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).
Quinlan, A. R. BEDTools: the swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol 9, R137 (2008).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22, 1813–1831 (2012).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44, W160–W165 (2016).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303 (2010).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
He, F., Ding, S., Wang, H. & Qin, F. IntAssoPlot: an R package for integrated visualization of genome-wide association study results with gene structure and linkage disequilibrium matrix. Front. Genet. 11, 260 (2020).
Hu, G. Code for the publication ‘T2T reference genome of G. hirsutum cv. ZM113’. Zenodo https://doi.org/10.5281/zenodo.14840103 (2025).

Acknowledgements

This study was supported by the National Key Research and Development Program of China (2021YFF1000100 to Xiongfeng Ma), the National Natural Science Foundation of China (32072114 to Xiongfeng Ma, 32072111 to G.H. and 32070544 to K.W.), the Key Research and Development project of Xinjiang Uygur Autonomous Region (2022B02052-2 to Xiongfeng Ma), the Tianshan Innovation Team of Xinjiang Uygur Autonomous Region (2023D14016 to G.H.), the Major Science and Technology Program of Changji Hui Autonomous Prefecture (2021Z01-01 to Xiongfeng Ma), the Postdoctoral and High-level Flexible Talents of Xinjiang Uygur Autonomous Region (RSSQ00066509 to X. Zhang), the China Agriculture Research System (CARS-15-07 to Xiongfeng Ma), the Shennong Talents program (SNYCQN002-2022 to Xiongfeng Ma), the Top Talent Project of Henan Province (ZYYCYU202012146 to Xiongfeng Ma), the Natural Science Foundation of Henan Province (202300410550 to Xiongfeng Ma) and the Chinese Academy of Agricultural Sciences (the Innovation Project CAAS-ASTIP-ICR-KP-2021-01 to Xiongfeng Ma, the Youth Innovation Project Y2023QC38 to Z. Wang, the Young Elite Scientists Sponsorship Program by Henan Association for Science and Technology (2024HYTP010 to Z.T.) and the Special Project Cotton Research Institute 1610162023005 to X.W.). We thank X. Du (Cotton Research Institute, Chinese Academy of Agricultural Sciences) and J. Zhang (Guangxi University) for discussion and critical reading of the paper.

Author information

These authors contributed equally: Guanjing Hu, Zhenyu Wang, Zunzhe Tian, Kai Wang, Gaoxiang Ji, Xingxing Wang, Xianliang Zhang, Zhaoen Yang.

Authors and Affiliations

National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
Guanjing Hu, Zhenyu Wang, Gaoxiang Ji, Xingxing Wang, Xianliang Zhang, Zhaoen Yang, Ruoyu Niu, Yuzhi Zhang, Lian Duan, Junjie Zhao, Shoupu He, Xinshan Zang & Xiongfeng Ma
Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
Guanjing Hu, Zunzhe Tian, Xuan Liu, Ruoyu Niu, De Zhu, Lian Duan, Xueyuan Ma, Xianpeng Xiong, Jiali Kong, Xianjia Zhao, Ya Zhang, Zhiqiang Wu, Weihua Pan & Xiongfeng Ma
Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
Zhenyu Wang, Gaoxiang Ji, Xingxing Wang, Xianliang Zhang, Zhaoen Yang, Xuan Liu, Junjie Zhao, Shoupu He, Xinshan Zang & Xiongfeng Ma
School of Life Sciences, Nantong University, Nantong, China
Kai Wang, Guangrun Yu & Jinlei Han
Western Research Institute, Chinese Academy of Agricultural Sciences, Changji, China
Xianliang Zhang
Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa, USA
Corrinne E. Grover & Jonathan F. Wendel
State Key Laboratory of Aridland Crop Science, College of Life Science and Technology, Gansu Agricultural University, Lanzhou, China
Junji Su
Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, China
Keyun Feng

Contributions

Xiongfeng Ma conceived this project and coordinated research activities. G.H. and X.W. designed the experiments. X. Zhang, X.W. and K.F. prepared the plant materials. Z.T., X.W., G.H., X.L., R.N., D.Z., Xueyuan Ma, X.X., Ya Zhang, X. Zang and W.P. assembled, validated and annotated the ZM113 genome. K.F. and Z. Wu assembled and annotated the cytoplasmic genomes. K.W., G.Y. and J.H. performed the ChIP-seq experiment. L.D., Ya Zhang, G.H. and X.X. analyzed the ChIP-seq data. Yuzhi Zhang performed FISH and PCR experiments. Z.T., Ya Zhang and G.H. performed the centromere characterization and evolutionary analysis. X.L. performed the SV analysis. R.N. performed pan-genomic and RNA-seq analyses. Xueyuan Ma analyzed the bisulfite sequencing data. Z. Wang, G.J., Z.Y., J.Z. and X.W. conducted the GWAS and haplotype analysis, with inputs from J.S., S.H. and X. Zang. G.H., Z.T., Z. Wang, G.J. and X.W. wrote the paper, with inputs from D.Z. and X.X. for genome annotation, X.L. and R.N. for pan-genomic analysis, Xueyuan Ma for DNA methylation analysis, and Ya Zhang for ChIP-seq analysis. K.W., C.E.G. and J.F.W. revised the paper. All authors read and approved the paper.

Corresponding author

Correspondence to Xiongfeng Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Yuxian Zhu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Syntenic plot of two different assembly versions of ZM113.

Reference (ref), a previous version assembled using only ONT ultralong reads with NextDenovo (v2.3.1). Query (qry), the current version assembled using both ONT ultralong and PacBio HiFi reads. The current assembly is larger (2,299.07 Mb versus 2,288.35 Mb) and shows notable improvement in the A09 and D09 regions, both of which harbor 5S rDNA clusters.

Extended Data Fig. 2 Potential assembly error (D10) and heterozygous sites (A01, A09, A12, D04 and D08) detected by the VerityMap k-mer distance concordance test.

Based on the k-mer discrepancies between HiFi reads and the assembly, if a significant proportion of reads (20-80%) deviate from the assembly at a specific ___location, a heterozygous site is suspected; if nearly all reads deviate (>80%), a potential assembly error is indicated.
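
The decision rule in this caption can be sketched as a small function. This is a minimal illustration, not the VerityMap implementation: the function name, its arguments and the "concordant" label for sites below 20% are assumptions made here, and only the 20–80% and >80% thresholds come from the caption.

```python
def classify_site(deviating_reads, total_reads, het_min=0.20, err_min=0.80):
    """Label a site by the fraction of HiFi reads that deviate from the assembly.

    Thresholds follow the caption: 20-80% suggests a heterozygous site,
    more than 80% suggests a potential assembly error. The label for
    sites below 20% is an assumption of this sketch.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = deviating_reads / total_reads
    if fraction > err_min:
        return "potential_assembly_error"
    if fraction >= het_min:
        return "suspected_heterozygous_site"
    return "concordant"  # hypothetical label, not stated in the caption


# Hypothetical read counts, for illustration only.
print(classify_site(50, 100))
print(classify_site(95, 100))
```

Under this reading, a site with exactly 80% deviating reads falls in the heterozygous class, because the caption reserves the error call for values above 80%.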

Extended Data Fig. 3 Whole-genome alignment between ZM113 and five TM-1 reference genome assemblies.

SyRI visualization showing synteny (gray), inversions (orange), translocations (green) and duplications (blue). Common inversions between ZM113 and the TM-1 references are indicated by red arrows.

Extended Data Fig. 4 Collinearity of A08 chromosome between ZM113 and 11 G. hirsutum genome assemblies.

SyRI visualization showing synteny (gray), inversions (orange), translocations (green) and duplications (blue).

Extended Data Fig. 5 Collinearity of D03 chromosome in seven G. hirsutum assemblies.

The newly published TM-1 UTX_v3.0 version (Sreedasyam et al. 2024) corrected the ~10 Mb inversion in chromosome D03 found between UTX v2.1 and other G. hirsutum genomes including ZM113.

Extended Data Fig. 6 Integrated analysis of 4 Mb inversion and centromere repositioning in D08 chromosome.

Shown left is the synteny plot of chromosome D08 between ZM113 and other reference genomes. The inversion at 34–38 Mb divides the D08 locus into two haplotypes. GhSat194 arrays colocalize with the boundary of the inversion. Shown right are synteny plots between ZM113 and AD₆ (classified as Hap1) and AD₇ (classified as Hap2). CENH3 peaks are shown at the bottom.

Extended Data Fig. 7 IGV visualization of the ZM113 D08 chromosome showing log2(ChIP/input) signal tracks from CENH3 ChIP-seq experiments.

Top to bottom tracks are: 1-4. ZM113 ChIP-seq, GhCEN08 ___location, Ψ-GhD08CEN ___location, and repetitive sequence annotation; 5-7. ChIP-seq experiments on G. hirsutum (AD₁) var TM-1 (ZJU), G. stephensii (AD₇), G. ekmanianum (AD₆), G. tomentosum (AD₃), G. barbadense (AD₂), G. darwinii (AD₅), G. mustelinum (AD₄), and G. raimondii (D₅).
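
As a rough illustration of what a log2(ChIP/input) track represents, the sketch below computes the ratio per genomic bin. It is not the authors' pipeline: the function name, the pseudocount of 1 and the example counts are assumptions, and library-size normalization, which a real analysis would normally apply first, is omitted.

```python
import math


def log2_ratio(chip_counts, input_counts, pseudocount=1.0):
    """Return log2((ChIP + p) / (input + p)) for each bin.

    The pseudocount p avoids division by zero in empty bins; its value
    is an assumption of this sketch, not taken from the paper.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input tracks must have the same number of bins")
    return [
        math.log2((chip + pseudocount) / (inp + pseudocount))
        for chip, inp in zip(chip_counts, input_counts)
    ]


# Hypothetical per-bin read counts, for illustration only.
signal = log2_ratio([3, 0, 7], [1, 0, 1])
print(signal)
```

Positive values mark bins enriched in the CENH3 ChIP sample relative to input, which is how centromeric regions appear as peaks in such tracks.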

Extended Data Fig. 8 Comparative annotation and functional validation of the ZM113 private gene Ghir_D03G00738.

(a) IGV visualization of the TM-1 reference genome (version: CRI v1.0) region containing the unannotated Ghir_D03G00738 sequence. (b) IGV visualization of Ghir_D03G00738 annotation and transcriptome alignment in the ZM113 reference genome. (c) Phenotype images of VIGS-silenced plants, showing that Ghir_D03G00738 regulates flowering time in upland cotton. (d) Silencing efficiency of Ghir_D03G00738, detected by qRT-PCR. (e–g) Statistics of first fruiting node position, flowering time and budding time.

Extended Data Fig. 9 PCR validation of the ZM113 private gene Ghir_D03G00738.

(a) Gene structure and PCR primer design. (b) Agarose gel image of the amplified fragments from both ZM113 and TM-1 materials. (c) Sequence alignment results of the amplified fragments. The unprocessed images of the gels are available in the source data file.

Extended Data Fig. 10 PCR amplification validation of D03_INV.

(a) Primer design scheme and sequences spanning the inversion breakpoint region. (b) Amplification results with Primer 1. (c) Amplification results with Primer 2. The unprocessed images of the gels are available in the source data file.

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2 and Figs. 1–31.

Reporting Summary

Peer Review File

Supplementary Tables 1–30

Supplementary Table 1. Comparison of reference genomes of G. hirsutum.
Supplementary Table 2. Multi-platform sequencing data generated for genome assembly.
Supplementary Table 3. Summary of the five gap regions closed.
Supplementary Table 4. Statistics of 26 chromosomes with centromere and telomere locations.
Supplementary Table 5. Assembly consensus quality assessment by Merqury.
Supplementary Table 6. Assembly and gene annotation quality assessment by BUSCO.
Supplementary Table 7. Repetitive elements in the ZM113 genome assembly.
Supplementary Table 8. Statistics of functional annotation of gene models in the ZM113 genome assembly.
Supplementary Table 9. Statistics of transcriptome support for gene models.
Supplementary Table 10. Syntenic orthologous analysis of ZM113 genes relative to parental diploids.
Supplementary Table 11. Annotation summary of non-coding RNA.
Supplementary Table 12. Genomic distribution of rDNA arrays.
Supplementary Table 13. Assembly features of ZM113 organelle genomes.
Supplementary Table 14. Gene profile and organization of the ZM113 mitochondrial genome.
Supplementary Table 15. Gene profile and organization of the ZM113 chloroplast genome.
Supplementary Table 16. Summary of multi-omics data of ChIP-seq, BS-seq and RNA-seq.
Supplementary Table 17. Centromere positions in ZM113 identified by mapping centromere-specific repeat sequences.
Supplementary Table 18. Gene identification within and flanking centromeric regions.
Supplementary Table 19. Characterization of ZM113 centromeric genes by HOG, pan-genomic annotation and differential expression.
Supplementary Table 20. Statistics of repetitive sequences in centromeric regions and genome-wide.
Supplementary Table 21. Characterization of 2,530 CEN-TEs based on percentage content per centromere.
Supplementary Table 22. DNA methylation levels, genome-wide and in centromeres.
Supplementary Table 23. Summary of structural variants between ZM113 and 9 other G. hirsutum genome assemblies used for genome comparisons.
Supplementary Table 24. Summary of non-redundant SVs against ZM113.
Supplementary Table 25. Summary of coreDELs relative to ZM113.
Supplementary Table 26. Summary of 28 genes located in coreDELs.
Supplementary Table 27. Candidate genes containing SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD).
Supplementary Table 28. SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD) in 419 accessions.
Supplementary Table 29. The cumulative number of elite SNPs associated with LP and FD in 419 accessions.
Supplementary Table 30. Haplotype typing results of the D03 chromosome interval.

Source data

Source Data Fig. 4

The unprocessed gels in the red box correspond to Fig. 4g,h.

Source Data Extended Data Fig. 9

The unprocessed gels in the red box correspond to Extended Data Fig. 9c.

Source Data Extended Data Fig. 10

The unprocessed gels in the red box correspond to Extended Data Fig. 10b,c.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet 57, 1031–1043 (2025). https://doi.org/10.1038/s41588-025-02130-4

Received: 23 November 2023
Accepted: 14 February 2025
Published: 17 March 2025
Issue Date: April 2025
DOI: https://doi.org/10.1038/s41588-025-02130-4
