Abstract
Cotton (Gossypium hirsutum L.) is a key allopolyploid crop with global economic importance. Here we present a telomere-to-telomere assembly of the elite variety Zhongmian 113. Leveraging technologies including PacBio HiFi, Oxford Nanopore Technology (ONT) ultralong-read sequencing and Hi-C, our assembly surpasses previous genomes in contiguity and completeness, resolving 26 centromeric and 52 telomeric regions, 5S rDNA clusters and nucleolar organizer regions. We discovered a phylogenetically recent centromere repositioning on chromosome D08 that is specific to G. hirsutum and involves deactivation of an ancestral centromere and the formation of a unique, satellite repeat-based centromere. Genomic analyses evaluated favorable allele aggregation for key agronomic traits and uncovered an early-maturing haplotype derived from an 11 Mb pericentric inversion that evolved early during G. hirsutum domestication. Our study sheds light on the genomic origins of short-season adaptation, potentially involving introgression of an inversion from primitively domesticated forms, followed by subsequent haplotype differentiation in modern breeding programs.
Data availability
The ZM113 genome assembly and annotation were deposited in NCBI under BioProject accession number PRJNA1137578 and in the Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw sequencing reads from ONT ultralong-read sequencing, PacBio HiFi sequencing, MGI PE150 short-read sequencing, Hi-C, RNA-seq, ChIP-seq and BS-seq are available in NCBI under BioProject PRJNA1041574 and in the Cotton Breeding Database (accessible at http://222.88.152.130:1130/). The raw Bionano data are available in the Genome Sequence Archive of China National Genomics Data Center under PRJCA029603 accession CRA018637 (accessible at https://ngdc.cncb.ac.cn/gsa/). Nine G. hirsutum reference genomes and two diploid cotton genomes were downloaded from CottonGen (accessible at https://cottongen.org/). The genetic variant VCF file for Hap-D03-1 to Hap-D03-5 was deposited at https://github.com/huguanjing/cottonRef_ZM113/. Source data are provided with this paper.
Code availability
Custom scripts are available in the GitHub repository (https://github.com/huguanjing/cottonRef_ZM113) and on Zenodo (https://doi.org/10.5281/zenodo.14840103) (Hu 2025).
References
Viot, C. R. & Wendel, J. F. Evolution of the cotton genus, Gossypium, and its domestication in the Americas. CRC Crit. Rev. Plant Sci. 42, 1–33 (2023).
Yang, Z. et al. Recent progression and future perspectives in cotton genomic breeding. J. Integr. Plant Biol. 65, 548–569 (2023).
Wen, X. et al. A comprehensive overview of cotton genomics, biotechnology and molecular biological studies. Sci. China Life Sci. 66, 2214–2256 (2023).
Zhao, H. et al. Recent advances and future perspectives in early-maturing cotton research. New Phytol. 237, 1100–1114 (2023).
Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531–537 (2015).
Li, F. et al. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat. Biotechnol. 33, 524–530 (2015).
Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).
Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2018).
Chang, X. et al. High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the centromeric landscape and evolution. Plant Commun. 5, 100722 (2024).
Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Hu, Y. et al. Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. Nat. Genet. 51, 739–748 (2019).
Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).
He, S. et al. The genomic basis of geographic differentiation and fiber improvement in cultivated cotton. Nat. Genet. 53, 916–924 (2021).
Ma, Z. et al. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat. Genet. 53, 1385–1391 (2021).
Perkin, L. C. et al. Genome assembly of two nematode-resistant cotton lines (Gossypium hirsutum L.). G3 11, jkab276 (2021).
Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).
Cheng, Y. et al. Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton. J. Adv. Res. https://doi.org/10.1016/j.jare.2023.03.006 (2023).
Meng, Q. et al. Comparative analysis of genome sequences of the two cultivated tetraploid cottons, Gossypium hirsutum (L.) and G. barbadense (L.). Ind. Crops Prod. 196, 116471 (2023).
Dai, S. et al. Phenotypic characteristics and cultivation techniques of an early maturing and machine-harvested cotton variety Zhongmian 113 in introduction and demonstration of Xinjiang. China Cotton 49, 34–36 (2022).
Wang, K. et al. High yield and efficiency cultivation techniques of an upland cotton cultivar, Zhongmian 113, with early maturity and excellent fiber quality. China Cotton 48, 32–33 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Hawkins, J. S., Kim, H., Nason, J. D., Wing, R. A. & Wendel, J. F. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 16, 1252–1261 (2006).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Hanson, R. E. et al. Distribution of 5S and 18S-28S rDNA loci in a tetraploid cotton (Gossypium hirsutum L.) and its putative diploid ancestors. Chromosoma 105, 55–61 (1996).
Ji, Y. et al. New ribosomal RNA gene locations in Gossypium hirsutum mapped by meiotic FISH. Chromosoma 108, 200–207 (1999).
Gan, Y. et al. Chromosomal locations of 5S and 45S rDNA in Gossypium genus and its phylogenetic implications revealed by FISH. PLoS ONE 8, e68207 (2013).
Mower, J. P. Variation in protein gene and intron content among land plant mitogenomes. Mitochondrion 53, 203–213 (2020).
Wu, Z.-Q., Liao, X.-Z., Zhang, X.-N., Tembrock, L. R. & Broz, A. Genomic architectural variation of plant mitochondria—a review of multichromosomal structuring. J. Syst. Evol. 60, 160–168 (2022).
Feng, Y. et al. Assembly and phylogenomic analysis of cotton mitochondrial genomes provide insights into the history of cotton evolution. Crop J. 11, 1782–1792 (2023).
Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).
Luo, S. et al. The cotton centromere contains a Ty3-gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).
Nagaki, K. et al. Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36, 138–145 (2004).
Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).
Zhao, H. et al. Gene expression and chromatin modifications associated with maize centromeres. G3 6, 183–192 (2015).
Wang, K., Wu, Y., Zhang, W., Dawe, R. K. & Jiang, J. Maize centromeres expand and adopt a uniform size in the genetic background of oat. Genome Res. 24, 107–116 (2014).
Gassmann, R. et al. An inverse relationship to germline transcription defines centromeric chromatin in C. elegans. Nature 484, 534–537 (2012).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Liu, Y. et al. Pan-centromere reveals widespread centromere repositioning of soybean genomes. Proc. Natl Acad. Sci. USA 120, e2310177120 (2023).
Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Lovell, J. T. et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. eLife 11, e78526 (2022).
Ma, Z. et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat. Genet. 50, 803–813 (2018).
Li, Y. et al. Genomic insights into the genetic basis of cotton breeding in China. Mol. Plant 16, 662–677 (2023).
Li, L. et al. Genomic analyses reveal the genetic basis of early maturity and identification of loci and candidate genes in upland cotton (Gossypium hirsutum L.). Plant Biotechnol. J. 19, 109–123 (2021).
Zhang, Y. et al. Uncovering genomic and transcriptional variations facilitates utilization of wild resources in cotton disease resistance improvement. Theor. Appl. Genet. 136, 204 (2023).
Lee, C.-R. et al. Young inversion with multiple linked QTLs under selection in a hybrid zone. Nat. Ecol. Evol. 1, 119 (2017).
Comai, L., Maheshwari, S. & Marimuthu, M. P. A. Plant centromeres. Curr. Opin. Plant Biol. 36, 158–167 (2017).
Henikoff, S., Ahmad, K. & Malik, H. S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001).
Jiang, J., Birchler, J. A., Parrott, W. A. & Dawe, R. K. A molecular view of plant centromeres. Trends Plant Sci. 8, 570–575 (2003).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Song, J.-M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Wang, T. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023).
Zhang, L. et al. A near-complete genome assembly of Brassica rapa provides new insights into the evolution of centromeres. Plant Biotechnol. J. 21, 1022–1032 (2023).
Zhang, W. et al. Identification of centromeric regions on the linkage map of cotton using centromere-related repeats. Genomics 104, 587–593 (2014).
Melters, D. P. et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 14, R10 (2013).
Huang, G. et al. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat. Genet. https://doi.org/10.1038/s41588-024-01877-6 (2024).
Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).
Presting, G. G. Centromeric retrotransposons and centromere function. Curr. Opin. Genet. Dev. 49, 79–84 (2018).
Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).
Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).
Montefalcone, G., Tempesta, S., Rocchi, M. & Archidiacono, N. Centromere repositioning. Genome Res. 9, 1184–1188 (1999).
Grover, C. E. et al. Molecular confirmation of species status for the allopolyploid cotton species, Gossypium ekmanianum Wittmack. Genet. Resour. Crop Evol. 62, 103–114 (2015).
Gallagher, J. P., Grover, C. E., Rex, K., Moran, M. & Wendel, J. F. A new species of cotton from Wake Atoll, Gossypium stephensii (Malvaceae). Syst. Bot. 42, 115–123 (2017).
Song, H.-R. et al. The RNA binding protein ELF9 directly reduces SUPPRESSOR OF OVEREXPRESSION OF CO1 transcript levels in Arabidopsis, possibly via nonsense-mediated mRNA decay. Plant Cell 21, 1195–1211 (2009).
Jarillo, J. A. & Piñeiro, M. H2A.Z mediates different aspects of chromatin function and modulates flowering responses in Arabidopsis. Plant J. 83, 96–109 (2015).
Hu, H. et al. Unravelling inversions: technological advances, challenges, and potential impact on crop breeding. Plant Biotechnol. J. https://doi.org/10.1111/pbi.14224 (2023).
Stefanova, P., Taseva, M., Georgieva, T., Gotcheva, V. & Angelov, A. A modified CTAB method for DNA extraction from soybean and meat products. Biotechnol. Biotechnol. Equip. 27, 3803–3810 (2013).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 1, e323 (2021).
Jo, H. & Koh, G. Faster single-end alignment generation utilizing multi-thread for BWA. Biomed. Mater. Eng. 26, S1791–S1796 (2015).
Danecek, P. & McCarthy, S. A. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 33, 2037–2039 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front. Plant Sci. 7, 1350 (2016).
Saha, S., Bridges, S., Magbanua, Z. V. & Peterson, D. G. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Res. 36, 2284–2294 (2008).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43, e78 (2015).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Wang, B. et al. High-quality Arabidopsis thaliana genome assembly with Nanopore and HiFi long reads. Genom. Proteom. Bioinform. 20, 4–13 (2022).
Udall, J. A. et al. De novo genome sequence assemblies of Gossypium raimondii and Gossypium turneri. G3 9, 3079–3085 (2019).
Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962 (2016).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Zhang, Y., Chu, J., Cheng, H. & Li, H. De novo reconstruction of satellite repeat units from sequence data. Genome Res. 33, 1994–2001 (2023).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Cheng, Z., Presting, G. G., Buell, C. R., Wing, R. A. & Jiang, J. High-resolution pachytene chromosome mapping of bacterial artificial chromosomes anchored by genetic markers reveals the centromere location and the distribution of genetic recombination along chromosome 10 of rice. Genetics 157, 1749–1757 (2001).
Liu, Y. et al. Construction and primary application of oligos fluorescence in situ hybridization technology in cotton. Cotton Sci. 29, 213–221 (2017).
Krueger, F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files (Babraham Institute, 2015).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).
Quinlan, A. R. BEDTools: the swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
He, F., Ding, S., Wang, H. & Qin, F. IntAssoPlot: an R package for integrated visualization of genome-wide association study results with gene structure and linkage disequilibrium matrix. Front. Genet. 11, 260 (2020).
Hu, G. Code for the publication ‘T2T reference genome of G. hirsutum cv. ZM113’. Zenodo https://doi.org/10.5281/zenodo.14840103 (2025).
Acknowledgements
This study was supported by the National Key Research and Development Program of China (2021YFF1000100 to Xiongfeng Ma), the National Natural Science Foundation of China (32072114 to Xiongfeng Ma, 32072111 to G.H. and 32070544 to K.W.), the Key Research and Development project of Xinjiang Uygur Autonomous Region (2022B02052-2 to Xiongfeng Ma), the Tianshan Innovation Team of Xinjiang Uygur Autonomous Region (2023D14016 to G.H.), the Major Science and Technology Program of Changji Hui Autonomous Prefecture (2021Z01-01 to Xiongfeng Ma), the Postdoctoral and High-level Flexible Talents of Xinjiang Uygur Autonomous Region (RSSQ00066509 to X. Zhang), the China Agriculture Research System (CARS-15-07 to Xiongfeng Ma), the Shennong Talents program (SNYCQN002-2022 to Xiongfeng Ma), the Top Talent Project of Henan Province (ZYYCYU202012146 to Xiongfeng Ma), the Natural Science Foundation of Henan Province (202300410550 to Xiongfeng Ma) and the Chinese Academy of Agricultural Sciences (the Innovation Project CAAS-ASTIP-ICR-KP-2021-01 to Xiongfeng Ma, the Youth Innovation Project Y2023QC38 to Z. Wang, the Young Elite Scientists Sponsorship Program by Henan Association for Science and Technology (2024HYTP010 to Z.T.) and the Special Project Cotton Research Institute 1610162023005 to X.W.). We thank X. Du (Cotton Research Institute, Chinese Academy of Agricultural Sciences) and J. Zhang (Guangxi University) for discussion and critical reading of the paper.
Author information
Contributions
Xiongfeng Ma conceived this project and coordinated research activities. G.H. and X.W. designed the experiments. X. Zhang, X.W. and K.F. prepared the plant materials. Z.T., X.W., G.H., X.L., R.N., D.Z., Xueyuan Ma, X.X., Ya Zhang, X. Zang and W.P. assembled, validated and annotated the ZM113 genome. K.F. and Z. Wu assembled and annotated the cytoplasmic genomes. K.W., G.Y. and J.H. performed the ChIP-seq experiment. L.D., Ya Zhang, G.H. and X.X. analyzed the ChIP-seq data. Yuzhi Zhang performed FISH and PCR experiments. Z.T., Ya Zhang and G.H. performed the centromere characterization and evolutionary analysis. X.L. performed the SV analysis. R.N. performed pan-genomic and RNA-seq analyses. Xueyuan Ma analyzed the bisulfite sequencing data. Z. Wang, G.J., Z.Y., J.Z. and X.W. conducted the GWAS and haplotype analysis, with inputs from J.S., S.H. and X. Zang. G.H., Z.T., Z. Wang, G.J. and X.W. wrote the paper, with inputs from D.Z. and X.X. for genome annotation, X.L. and R.N. for pan-genomic analysis, Xueyuan Ma for DNA methylation analysis, and Ya Zhang for ChIP-seq analysis. K.W., C.E.G. and J.F.W. revised the paper. All authors read and approved the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Yuxian Zhu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Syntenic plot of two different assembly versions of ZM113.
Reference (ref), a previous version assembled using only ONT ultralong reads with NextDenovo (v2.3.1). Query (qry), the current version assembled using both the ONT ultralong and PacBio HiFi reads. The current assembly is larger (2,299.07 Mb versus 2,288.35 Mb) and shows notable improvement in the A09 and D09 regions, both of which harbor 5S rDNA clusters.
Extended Data Fig. 2 Potential assembly error (D10) and heterozygous sites (A01, A09, A12, D04 and D08) detected by the VerityMap k-mer distance concordance test.
Based on the k-mer discrepancies between HiFi reads and the assembly, if a significant proportion of reads (20–80%) deviate from the assembly at a specific location, a heterozygous site is suspected; if nearly all reads deviate (>80%), a potential assembly error is indicated.
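The decision rule in this caption can be sketched in a few lines of Python. This is only an illustration of the two thresholds stated above, not VerityMap's implementation: the function name `classify_site`, its arguments, the returned labels and the "concordant" label for sites below 20% are all assumptions introduced here, and a site at exactly 80% is treated as heterozygous because the caption reserves the error call for values above 80%.

```python
def classify_site(deviating_reads: int, total_reads: int) -> str:
    """Label a locus from the share of HiFi reads that disagree with the assembly.

    Hypothetical helper: thresholds follow the caption (20-80% -> suspected
    heterozygous site, >80% -> potential assembly error); the label for
    sites below 20% is an assumption made for this sketch.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= deviating_reads <= total_reads:
        raise ValueError("deviating_reads must lie between 0 and total_reads")

    # Integer arithmetic avoids floating-point surprises at the 20% and 80% boundaries.
    scaled = 100 * deviating_reads
    if scaled > 80 * total_reads:
        return "potential_assembly_error"
    if scaled >= 20 * total_reads:
        return "suspected_heterozygous_site"
    return "concordant"


if __name__ == "__main__":
    # Illustrative read counts only; these are not values from the paper.
    for deviating, total in [(5, 100), (45, 100), (95, 100)]:
        print(deviating, total, classify_site(deviating, total))
```

Under these assumptions, a locus where 45 of 100 reads disagree with the assembly would be flagged as a suspected heterozygous site, whereas 95 of 100 would be flagged as a potential assembly error.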
Extended Data Fig. 3 Whole-genome alignment between ZM113 and five TM-1 reference genome assemblies.
SyRI visualization of synteny (gray), inversion (orange), translocation (green) and duplication (blue) are shown. Common inversions between ZM113 and TM-1 references are indicated by red arrows.
Extended Data Fig. 4 Collinearity of A08 chromosome between ZM113 and 11 G. hirsutum genome assemblies.
SyRI visualization of synteny (gray), inversion (orange), translocation (green) and duplication (blue) are shown.
Extended Data Fig. 5 Collinearity of D03 chromosome in seven G. hirsutum assemblies.
The newly published TM-1 UTX_v3.0 version (Sreedasyam et al. 2024) corrected the ~10 Mb inversion in chromosome D03 found between UTX v2.1 and other G. hirsutum genomes including ZM113.
Extended Data Fig. 6 Integrated analysis of 4 Mb inversion and centromere repositioning in D08 chromosome.
Left, synteny plot of chromosome D08 between ZM113 and other reference genomes. The inversion at 34–38 Mb divides the D08 locus into two haplotypes. GhSat194 arrays colocalize with the boundary of the inversion. Right, synteny plots between ZM113 and AD6 (classified as Hap1) and AD7 (classified as Hap2). CENH3 peaks are shown at the bottom.
Extended Data Fig. 7 IGV visualization of the ZM113 D08 chromosome showing log2(ChIP/input) signal tracks from CENH3 ChIP-seq experiments.
Top to bottom tracks are: 1-4. ZM113 ChIP-seq, GhCEN08 location, Ψ-GhD08CEN location, and repetitive sequence annotation; 5-7. ChIP-seq experiments on G. hirsutum (AD1) var TM-1 (ZJU), G. stephensii (AD7), G. ekmanianum (AD6), G. tomentosum (AD3), G. barbadense (AD2), G. darwinii (AD5), G. mustelinum (AD4), and G. raimondii (D5).
Extended Data Fig. 8 Comparative annotation and functional validation of the ZM113 private gene Ghir_D03G00738.
(a) IGV visualization of the TM-1 reference genome (version: CRI v1.0) region containing the unannotated Ghir_D03G00738 sequence. (b) IGV visualization of Ghir_D03G00738 annotation and transcriptome alignment in the ZM113 reference genome. (c) The VIGS results show that Ghir_D03G00738 regulates flowering time in upland cotton. Silenced plant phenotype images. (d) Silencing efficiency of Ghir_D03G00738 was detected by qRT-PCR. (e–g) Statistics of first fruiting node position, flowering time, and budding time.
Extended Data Fig. 9 PCR validation of the ZM113 private gene Ghir_D03G00738.
(a) Gene structure and PCR primer design. (b) Agarose gel image of the amplified fragments from both ZM113 and TM-1 materials. (c) Sequence alignment results of the amplified fragments. The unprocessed images of the gels are available in the source data file.
Extended Data Fig. 10 PCR amplification validation of D03_INV.
(a) Primer design scheme and sequences spanning the inversion breakpoint region. (b) Amplification results with Primer 1. (c) Amplification results with Primer 2. The unprocessed images of the gels are available in the source data file.
Supplementary information
Supplementary Information
Supplementary Notes 1 and 2 and Figs. 1–31.
Supplementary Tables 1–30
Supplementary Table 1. Comparison of reference genomes of G. hirsutum.
Supplementary Table 2. Multi-platform sequencing data generated for genome assembly.
Supplementary Table 3. Summary of the five gap regions closed.
Supplementary Table 4. Statistics of 26 chromosomes with centromere and telomere locations.
Supplementary Table 5. Assembly consensus quality assessment by Merqury.
Supplementary Table 6. Assembly and gene annotation quality assessment by BUSCO.
Supplementary Table 7. Repetitive elements in ZM113 genome assembly.
Supplementary Table 8. Statistics of functional annotation of gene models in the ZM113 genome assembly.
Supplementary Table 9. Statistics of transcriptome support for gene models.
Supplementary Table 10. Syntenic orthologous analysis of ZM113 genes relative to parental diploids.
Supplementary Table 11. Annotation summary of non-coding RNA.
Supplementary Table 12. Genomic distribution of rDNA arrays.
Supplementary Table 13. Assembly features of ZM113 organelle genomes.
Supplementary Table 14. Gene profile and organization of the ZM113 mitochondrial genome.
Supplementary Table 15. Gene profile and organization of the ZM113 chloroplast genome.
Supplementary Table 16. Summary of multi-omics data of ChIP-seq, BS-seq, and RNA-seq.
Supplementary Table 17. Centromere positions in ZM113 identified by mapping centromere-specific repeat sequences.
Supplementary Table 18. Gene identification within and flanking centromeric regions.
Supplementary Table 19. Characterization of ZM113 centromeric genes by HOG, pan-genomic annotation, and differential expression.
Supplementary Table 20. Statistics of repetitive sequences in centromeric regions and genome-wide.
Supplementary Table 21. Characterization of 2530 CEN-TEs based on percentage content per centromere.
Supplementary Table 22. DNA methylation levels, genome-wide and in centromeres.
Supplementary Table 23. Summary of structural variants between ZM113 and 9 other G. hirsutum genome assemblies used for genome comparisons.
Supplementary Table 24. Summary of non-redundant SVs against ZM113.
Supplementary Table 25. Summary of coreDELs relative to ZM113.
Supplementary Table 26. Summary of 28 genes located in coreDELs.
Supplementary Table 27. Candidate genes containing SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD).
Supplementary Table 28. SNPs and elite alleles associated with lint percentage (LP) and flowering day (FD) in 419 accessions.
Supplementary Table 29. The cumulative number of elite SNPs associated with LP and FD in 419 accessions.
Supplementary Table 30. Haplotype typing results of the D03 chromosome interval.
Source data
Source Data Fig. 4
The unprocessed gels in the red box correspond to Fig. 4g,h.
Source Data Extended Data Fig. 9
The unprocessed gels in the red box correspond to Extended Data Fig. 9c.
Source Data Extended Data Fig. 10
The unprocessed gels in the red box correspond to Extended Data Fig. 10b,c.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet 57, 1031–1043 (2025). https://doi.org/10.1038/s41588-025-02130-4
DOI: https://doi.org/10.1038/s41588-025-02130-4