Chromosome-level genome assembly of the smallscale yellowfin (Plagiognathops microlepis)

Liang, Yangyang; Liu, Huijuan; Lu, Wenxuan; Li, Jing; Fang, Ting; Gao, Na; Chen, Cheng; Zhao, Xiuxia; Yang, Kun; Liu, Haiyang

doi:10.1038/s41597-024-04105-2

Download PDF

Data Descriptor
Open access
Published: 15 November 2024

Chromosome-level genome assembly of the smallscale yellowfin (Plagiognathops microlepis)

Yangyang Liang¹,
Huijuan Liu^2,3,
Wenxuan Lu¹,
Jing Li¹,
Ting Fang¹,
Na Gao¹,
Cheng Chen¹,
Xiuxia Zhao¹,
Kun Yang¹ &
…
Haiyang Liu ORCID: orcid.org/0000-0001-8301-4595³

Scientific Data volume 11, Article number: 1234 (2024) Cite this article

1630 Accesses
Metrics details

Subjects

Abstract

The small-scale yellowfin (Plagiognathops microlepis) is a highly valued species in East Asian aquaculture due to its adaptability and high yield. However, the lack of genomic data has impeded genetic research and breeding efforts. In this study, we utilize PacBio Hifi long-read sequencing and Hi-C technologies to construct a highly detailed genome of P. microlepis at the chromosomal level. The assembly encompasses 976.41 Mb, with an exceptional 99.84% distribution across 24 chromosomes. Notably, the contig N50 was 34.41 Mb and scaffold N50 was 38.38 Mb. The completeness of the P. microlepis genome assembly is underscored by a BUSCO score of 98.08%. A total of 25,389 protein-coding genes were identified, with a BUSCO score of 96.98%, and 99.85% of these genes were functionally annotated. Synteny relationships at the chromosome level with Danio rerio and Chanodichthys erythropterus genomes uncover small-scale chromosomal rearrangements. This high-fidelity genome assembly serves as a pivotal resource for forthcoming endeavors such as the genome structure, functional elements, comparative genomics, and evolutionary characteristics of P. microlepis and its relative species.

Chromosome-level genome assembly of Plagiognathops microlepis based on PacBio HiFi and Hi-C sequencing

Article Open access 19 July 2024

Chromosome-level genome assembly of Triplophysa scleroptera

Article Open access 01 July 2025

A chromosome-level genome assembly of the redfin culter (Chanodichthys erythropterus)

Article Open access 01 September 2022

Background & Summary

The Plagiognathops microlepis, commonly known as smallscale yellowfin, is a benthic fish belonging to the subfamily Xenocyprinae within the family Cyprinidae, the largest family of freshwater fishes^1,2,3. P. microlepis is the sole species of the genus Plagiognathops and is widely distributed in lakes and rivers across East Asia⁴. Cultivated since the 1970s, it is highly regarded for its delicious taste⁵. P. microlepis possesses distinctive characteristics such as medium body size, strong adaptability, rapid growth, high reproductive ability, and low susceptibility to breeding diseases, making it economically valuable⁶. The successful implementation of artificial breeding techniques has elevated P. microlepis to a crucial source of animal protein for local people in recent years⁷. Its delectable and tender flesh has garnered popularity among consumers, resulting in a high commercial value³. It has been included in the list of key protected economic aquatic animal and plant resources for China (first batch).

In its natural habitat, P. microlepis primarily uses its keratinized jaw to scrape humus, organic debris, fixed algae and higher plant debris from the bottom of water bodies¹. Therefore, P. microlepis can effectively clear organic matter and algae from the water, which helps improve water quality and mitigate eutrophication processes. It can form a population in shallow lakes or reservoirs with abundant humus and plays an important role in maintaining local ecological balance^8,9. Consequently, P. microlepis is recognized as an environmentally beneficial fish and is often used in maintaining aquatic ecological systems¹⁰. Furthermore, its population distribution across distinct and isolated basins makes it suitable for comparative genetic investigations^11,12. Despite its ecological and commercial significance, the population size of P. microlepis has declined in the wild in recent years^13,14. Market demand has driven the growing interest in P. microlepis aquaculture over the past decades, as wild populations are threatened by overfishing and water pollution. However, despite the increasing interest, there is a need for reported genome resources for P. microlepis and other fishes in the subfamily Xenocyprinae, significantly impeding research on their phylogeny and evolution biology.

Decoding a high-quality reference genome could provide valuable insights into conservation biology, genetics, and breeding studies for the species¹⁵. Previous studies on P. microlepis have primarily focused on its biological and physiological characteristics, reproductive behavior, age and growth patterns, dietary habits, muscle composition, population genetics, breeding strategies and nutritional requirements¹⁶. Additionally, mitochondrial cytochrome b and microsatellite markers have been used to assess their genetic structure, while isozymes have been used to identify their germplasm^3,16. Whole-genome sequencing and analysis of P. microlepis genomic data offer a foundational framework for investigating species diversity, population dynamics, and developing strategic conservation measures, aquaculture, and breeding practices.

This study presents the first high-quality chromosome-level genome assembly of P. microlepis, achieved using PacBio HiFi and Hi-C sequencing methodologies. The assembly yielded a 976.41 Mb genome with a contig N50 length of 34.41 Mb and a scaffold N50 of 38.38 Mb. Using Hi-C data, 99.84% of the assembled bases were successfully anchored to 24 chromosomes, resulting in a high-quality reference genome. This achievement provides a solid foundation for future population and conservation genetic studies. Moreover, the chromosome-scale genome of P. microlepis will contribute to comparative genomic studies on the genome evolution of the Xenocyprinae subfamily and beyond.

Methods

Sample collection and DNA extraction

A mature male P. microlepis was obtained from the Fisheries Research Institute, Anhui Academy of Agricultural Sciences. Muscle tissue from this specimen was used for DNA extraction and subsequent whole-genome sequencing, including both short-read and long-read sequencing, as well as Hi-C sequencing. All experiments adhered to the recommendations of the Ethics Committee of the Fisheries Research Institute, Anhui Academy of Agricultural Sciences. Genomic DNA was extracted from the muscle tissues using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, USA) following the manufacturer’s instructions. The quality and concentration of DNA were assessed with a NanoDrop One spectrophotometer (Thermo Scientific, USA) and 1% agarose gel electrophoresis.

Genome sequencing

The extracted DNA was randomly sheared into approximately 350 bp fragments, and a short fragment library was constructed using the MGIEasy PCR-Free DNA library prep set (BGI, China). Sequencing was conducted on the DNBSEQ-T7 platform (MGI, China), yielding a total of 59.19 Gb of paired-end raw reads, each 150 bp in length (Table 1). For PacBio sequencing, a SMRTbell Express Template Prep Kit 2.0 was used for HiFi sequencing based on PacBio’s standard protocol (Pacific Biosciences, USA), and a long-read library was constructed using the PacBio Sequel II system. This process generated 35.65 Gb of raw PacBio continuous long reads (HiFi) with an average length of 15.91 kb (Table 1). For Hi-C sequencing, approximately 1 g of muscle tissue from the male P. microlepis was used. The GrandOmics Hi-C kit (with DpnII as the restriction enzyme; GrandOmics, China) was used according to the manufacturer’s protocol. The Hi-C library was sequenced on the DNBSEQ-T7 platform (MGI, China), producing a total of 114.16 Gb of Hi-C read data for accessory chromosomes (Table 1).

Table 1 Statistics of sequencing data.

Full size table

RNA extraction and transcriptome sequencing

To facilitate genome annotation, total RNA was extracted from a total of 11 tissues, including the spleen, kidney, brain, muscle, testis, liver, heart, intestine, skin, etc. The quality of the RNA was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). A composite RNA sample was then used to construct a cDNA library with the MGIEasy library prep set (BGI, China), following the manufacturer’s instructions. The library was sequenced on the DNBSEQ-T7 platform using a paired-end 150 bp layout, yielding 50.60 Gb of transcriptome data to assist in genome and gene annotations (Table 1).

Genome size and heterozygosity estimation

The genome size of P. microlepis was estimated through a k-mer analysis using MGI clean reads. Initially, Jellyfish (v2.3.0)¹⁷ was used to calculate the frequency of 17-mers and generate a k-mer frequency table. Subsequently, GenomeScope (v2.0)¹⁸ analyzed the 17-mer frequency table, identifying a total of 839,321,169 17-mers, with 0.522% heterozygosity and 46.9% unique sequences (Fig. 1).

Genome assembly

The genome assembly employed the default parameters of Hifiasm (v0.19.5)¹⁹. HiFi long reads were used as input for Hifiasm to construct primary assembly contig graphs. Overlaps were precomputed and subsequently recalculated from the corrected reads. Haploid duplications were removed using Hifiasm, followed by three rounds of error correction. The assembled graph yielded 99 contigs with a total length of 976.41 Mb, a maximum contig size of 57.12 Mb, and an N50 of 34.41 Mb (Table 2).

Table 2 Summary statistics of P. microlepis genome assembly.

Full size table

Juicer (v1.6)²⁰ combined with 3D-DNA (v201008)²¹ was used for scaffolding. Initially, BWA (v0.7.17)²² was used to index the contig-level genome, and Juicer was used to create restriction enzyme cutting sites. Clean Hi-C (paired-end) reads were mapped to the contigs using Juicer, and Hi-C-assisted initial chromosome assembly was conducted using the 3D-DNA algorithm with the standard procedure. Chromosome boundaries were adjusted, and scaffolds were corrected using the Juicerbox (v1.11.08)²³ module, resulting in the resolution of 24 chromosomes (Figs. 2, 3). The results from the modified file output by Juicebox were revised and used as input for 3D-DNA to conduct re-scaffolding on a per-chromosome basis. The “FINAL” assembly consisted of a total of 67 scaffolds, with a maximum scaffold size of 57.81 Mb and an N50 size of 38.38 Mb (Tables 2, 3).

Table 3 Pseudo-chromosome length statistics after Hi-C assisted assembly of P. microlepis.

Full size table

Repeat annotation

Recognizing the importance of tandem repeats, we used the software tools GMATA (v2.2.1)²⁴ and Tandem Repeats Finder (TRF, v4.10.0)²⁵ to conduct a genome-wide search for tandem repeat sequences with default parameters. GMATA specializes in detecting SSRs with shorter repeat units, while TRF identifies tandem repeats encompassing all types of repeat units. Subsequently, we analyzed the dispersed repetitive sequences. Initially, MITE-hunter²⁶ was employed to detect a small transposon known as MITE within the genome, constructing a MITE library file. Following this, a hard mask operation was executed on the genome, marking repeated sequences as “N,” and RepeatModeler²⁷ was used to conduct a de novo search for repeated sequences to form a de novo library file (RepMod.lib). Given the presence of numerous unknown repeated sequences in RepMod.lib, TEclass²⁸ was employed for classification. Lastly, the MITE.lib, RepMod.lib, and Repbase²⁹ libraries were integrated to form a general library file. This comprehensive library file was then used with RepeatMasker³⁰ to search for repeated sequences throughout the entire genome.

The results indicate that dispersed repetitive sequences account for 45.18% of the total genome length (Table 4). Among transposable elements (TEs), DNA elements are the most prevalent type, accounting for 33.13% of the genome, followed by long terminal repeat (LTR) retrotransposons at 6.86%, long interspersed nuclear elements (LINEs) at 3.55%, and short interspersed nuclear elements (SINEs) at 0.26% of the genome. Ultimately, a total of 499,767,501 base pairs of repetitive sequences were identified, comprising 51.18% of the entire genome (Table 4, Fig. 4).

Table 4 Repetitive sequences in the genome of P. microlepis.

Full size table

Gene prediction and function assignment

Gene structure prediction was conducted using three methodologies: homology-based, transcriptome-based, and ab initio annotations. For the homology-based prediction, we employed GEMOMA (v1.6.1)³¹ to compare homologous proteins from five related species (Danio rerio, Carassius auratus, Cyprinus carpio, Megalobrama amblycephala, Sinocyclocheilus grahami) with our assembled genome. Transcriptome sequence annotation was performed with PASA (v2.3.3)³² to obtain gene information, which was then used in semi-supervised self-training of gene models with GeneMark-ST³³ software for prediction. The predicted genes were compared with the SwissProt Database³⁴ using Blastp, and alignment results were filtered based on identity (identity ≥ 95%). The top 3,000 genes with the highest alignment scores from GeneMark-ST were chosen as the training set for AUGUSTUS³⁵ model training. Subsequently, AUGUSTUS was employed to predict genes within the genome using the developed predictive model.

The predicted gene set information files from ab initio prediction, homology-based annotation, and transcriptome prediction were converted into a file format compatible with EVM³². These files were integrated using the EVM software with default parameters, resulting in an initial non-redundant gene set. In total, our predictions identified 25,389 genes in the genome, with an average gene length of 17,691.90 bp, an average exon sequence length of 178.06 bp, and an average of 9.90 exons per gene (Table 5, Fig. 5).

Table 5 Gene structures and function annotation of P. microlepis.

Full size table

Phylogenetic tree

We searched the NCBI database for the following species: Lepisosteus oculatus, Takifugu rubripes, Oryzias latipes, Astyanax mexicanus, Misgurnus anguillicaudatus, Danio rerio, Cyprinus carpio, Puntigrus tetrazona, Gymnocypris eckloni, Onychostoma macrolepis, Cirrhinus molitorella, Ctenopharyngodon idella, Chanodichthys erythropterus, Megalobrama amblycephala, Hypophthalmichthys nobilis. Gene information for P. microlepis and these fifteen species was annotated and filtered. Subsequent gene family identification was carried out using OrthoFinder (v2.5.4)³⁶ with default parameters. Firstly, pairwise comparisons (E-value ≤ 1e-5) of protein sequence datasets from all studied species were performed using BLASTP (v2.2.26)³⁷ software to obtain similarity information between protein sequences. Secondly, genes were clustered to identify direct homologous genes, paralogous genes, and single-copy homologous genes using the MCL algorithm. Based on the single-copy immediate homologous genes identified in the family identification results, MAFFT³⁸ software was utilized to generate a protein multiple sequence alignment dataset (protein-MSA). Then, the PAL2NAL tool was employed to perform protein sequence comparison, converting it into corresponding codon-based sequences (CDS), i.e., CDS-MSA. Finally, a species tree was constructed using the GTRGAMMA model in RAXML³⁹, with Lepisosteus oculatus designated as the out group, and further edited using Figtree. Additionally, the MCMCtree tool in the PAML (v4.10.6)⁴⁰ package was used to estimate divergence times among the fifteen selected fish species. The phylogenetic tree revealed that the closest species to P. microlepis are Chanodichthys erythropterus, the ancestor of P. microlepis separated from the ancestor of C. erythropterus approximately 8.8 million years ago. The ancestor of P. microlepis separated from the ancestor of Cyprinus carpio around 33.4 million years (Fig. 6).

Genome synteny analysis

To compare whole genome synteny, two chromosome-level genomes, Danio rerio⁴¹ and Chanodichthys erythropterus⁴², were aligned with the genome assembly of P. microlepis using MCscan⁴³ with default parameters. Syntenic relationships were plotted using the JCVI⁴⁴ package. The results indicate that chromosome 1 of P. microlepis corresponds to chromosomes 10 and 22 of D. rerio, while the remaining 23 chromosomes exhibit a one-to-one correspondence. P. microlepis and C. erythropterus displayed a high level of conserved synteny, with a nearly perfect one-to-one correspondence between their chromosomes (Fig. 7). Some minor segmental exchanges were observed between P. microlepis and D. rerio, as well as C. erythropterus (Fig. 7). Nevertheless, the chromosome systems and homologies of these species remained largely unchanged, indicating a relatively conserved synteny.

Data Records

All the raw sequencing data utilized in this study, including WGS, Hifi, Hi-C and RNA-Seq have been deposited in the NCBI database under the BioProject accession number PRJNA1115916⁴⁵. The genome assembly is available for public access at the NCBI GenBank under the accession number GCA_040144785.1⁴⁶. Genome annotations, predicted coding sequences and protein sequences can be accessed through the Figshare⁴⁷.

Technical Validation

Assessment of genome assembly

The accuracy of the P. microlepis genome assembly was assessed by evaluating its completeness using the conserved metazoan gene set ‘actinopterygii_odb10’ from BUSCO⁴⁸ (v5.4.3). The analysis demonstrated a high level of completeness, with an overall completeness of 98.08%. Specifically, 96.79% genes were complete and single-copy, 1.3% genes were complete and duplicated, 0.55% genes were fragmented and genes 1.37% were missing. These results indicate that the P. microlepis genome assembly is of high quality (Table 6).

Table 6 BUSCO analysis of the genome assembly and genes.

Full size table

Gene annotation validation

To assess the integrity of the annotated gene set, a BUSCO analysis was conducted using conserved single-copy homologous genes from the actinopterygii_odb10 library. The results showed that approximately 96.98% of the complete gene elements are present in the annotated gene set, indicating a high level of completeness in the conserved gene predictions. Specifically, 95.60% of the genes were complete and single-copy BUSCOs, with only 0.30% fragmented and 2.72% missing from the assembly (Table 7). These findings underscore the remarkable integrity and conservation of gene content in the dace genome assembly, resulting in highly reliable prediction outcomes.

Table 7 BUSCO analysis of the genome annotation and genes.

Full size table

Code availability

No special codes or scripts were used in this work, and data processing was carried out based on the protocols and manuals of the corresponding bioinformatics software.

References

Xiao, W., Zhang, Y. & Liu, H. Molecular systematics of Xenocyprinae (Teleostei: Cyprinidae): taxonomy, biogeography, and coevolution of a special group restricted in East Asia. Mol. Phylogenet. Evol. 18, 163–173 (2001).
Article CAS PubMed Google Scholar
Chen, Y. Y. Fauna Sinica Osteichthyes: Cypriniforme. (Science Press, 1998).
Zeng, C., Luo, W., Liu, X. L., Wang, W. M. & Gao, Z. X. Isolation and characterization of 32 polymorphic microsatellites for Xenocypris microlepis. Conserv. Genet. Resour. 3, 479–481 (2011).
Article Google Scholar
Hu, Y. X., Zhou, Q., Song, Y., Chen, D. Q. & Li, Y. Complete mitochondrial genome of the smallscale yellowfin, Plagiognathops microlepis (Teleostei: Cypriniformes: Cyprinidae). Mitochondrial DNA 26, 463–464 (2015).
Article CAS PubMed Google Scholar
Zhang, T. et al. Electrophoretic analysis for two kinds of isozymes expressed in 5 tissues of Xenocypris microlepis. Chinese Fishery Quality and Standards 9, 10–17 (2019). (in Chinese).
Google Scholar
Peng, X., Zhao, L., Liu, J., Guo, X. & Ding, Y. Comparative transcriptome analyses of the liver between Xenocypris microlepis and Xenocypris davidi under low copper exposure. Aquat. Toxicol. 236, 105850 (2021).
Article CAS PubMed Google Scholar
Du, C., Han, Y. L., Shi, Y. X. & Zhu, J. Q. Cryopreservation of Plagiognathops microlepis sperm. Cryobiology 85, 105–112 (2018).
Article CAS PubMed Google Scholar
Guo, Y. et al. Effect of Xenocypris microlepis on feces and microcystis activity from microcystis-dietary silver carp and bighead carp. China Environ. Sci. 36, 3784–3792 (2016). (in Chinese).
CAS Google Scholar
Yimin, Z. et al. Purification effect on eutrophic water body with combination of Xenocyprinae and Hyriopsis cumingii under different hydrodynamic conditions. Chin. J. Environ. Eng. 9, 1109–1116 (2015). (in Chinese).
Google Scholar
Wu, H., Zhang, Y. M., Zhou, C. & Gao, Y. X. Influence of temperature on physiological characteristics of Plagiognathops microlepis Bleeker. Environ. Eng. 33, 23–27 (2015). (in Chinese).
Google Scholar
Zhang, Z., Li, J., Zhang, X., Lin, B. & Chen, J. Comparative mitogenomes provide new insights into phylogeny and taxonomy of the subfamily Xenocyprinae (Cypriniformes: Cyprinidae). Front. Genet. 13, 966633 (2022).
Article CAS PubMed PubMed Central Google Scholar
Tang, K. L. et al. Limits and phylogenetic relationships of East Asian fishes in the subfamily Oxygastrinae (Teleostei: Cypriniformes: Cyprinidae). Zootaxa 3681, 101–135 (2013).
Article PubMed Google Scholar
Lu, B., Ling, S. G., Wang, X. B. & Yin, Z. D. Conservation of the original genetic resources of Plagiognathops microlepis in the Dragon Nest Lake. Anhui Agri. Sci. Bull. 9, 118–118 (2003). (in Chinese).
Google Scholar
Wang, Y. D., Xion, B. X., Ma, X. F., Hu, Q. S. & Wang, X. B. Age and growth of Xenocypris microlepis in Daoguanhe Reservoir. J. Huazhong Agric. Univ. 23, 241–243 (2004). (in Chinese).
Google Scholar
Chen, L. et al. Chromosome-level assembly and gene annotation of Decapterus maruadsi genome using Nanopore and Hi-C technologies. Sci. Data. 11, 69 (2024).
Article CAS PubMed PubMed Central Google Scholar
Qiao, D. L., He, X. M., Wei, C. B., Yang, L. & Hu, L. Genetic variation of mtDNA Cytb gene in three populations of Plagiognathops microlepis. Freshwater Fisheries 23, 813–820 (2011). (in Chinese).
CAS Google Scholar
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Article PubMed PubMed Central Google Scholar
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Article CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3, 95–98 (2016).
Article CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst 3, 99–101 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. & Wang, L. GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front Plant Sci 7, 215951 (2016).
Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
Article CAS PubMed PubMed Central Google Scholar
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38, 199 (2010).
Article Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci.USA 117, 9451–9457 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass—a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Article PubMed Google Scholar
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Article CAS PubMed Google Scholar
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res 44, 81–89 (2016).
Article Google Scholar
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. In: Kollmar, M. (eds) Gene Prediction. Methods in Molecular Biology. (Springer Press, 2019).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, 1–22 (2008).
Article Google Scholar
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res 43, e78 (2015).
Article PubMed PubMed Central Google Scholar
Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res 33, D154–D159 (2005).
Article CAS PubMed Google Scholar
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439 (2006).
Article CAS PubMed PubMed Central Google Scholar
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 1–14 (2019).
Article Google Scholar
Mount, D. W. Using the basic local alignment search tool (BLAST). Cold Spring Harbor Protocols. 2007, pdb. top17, https://doi.org/10.1101/pdb.top17 (2007).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Article CAS PubMed PubMed Central Google Scholar
Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
Article CAS PubMed Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhao, S. et al. A chromosome-level genome assembly of the redfin culter (Chanodichthys Erythropterus). Sci. Data. 9, 535 (2022).
Article CAS PubMed PubMed Central Google Scholar
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
Article ADS CAS PubMed Google Scholar
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta, e211 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP509987 (2024).
NCBI Genbank https://identifiers.org/ncbi/insdc.gca:GCA_040256225.1 (2024).
Liu, H. Chromosome-level genome assembly of the smallscale yellowfin (Plagiognathops microlepis). Figshare https://doi.org/10.6084/m9.figshare.26061307 (2024).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Article PubMed Google Scholar

Download references

Acknowledgements

We acknowledge financial support from the National Modern Agriculture Industry Technology System Special Project (CARS-46), Special Fund for Anhui Agriculture Research System (2021-711), Central Public-interest Scientific Institution Basal Research Fund, CAFS (2023TD37), National Freshwater Genetic Resource Center (FGRC18537), and Guangdong Rural Revitalization Strategy Special Provincial Organization and Implementation Project Funds (2022-SBH-00-001).

Author information

Authors and Affiliations

Key Laboratory of Freshwater Aquaculture and Enhancement of Anhui Province, Fisheries Research Institute, Anhui Academy of Agricultural Sciences, Hefei, 230001, Anhui, China
Yangyang Liang, Wenxuan Lu, Jing Li, Ting Fang, Na Gao, Cheng Chen, Xiuxia Zhao & Kun Yang
School of Marine Sciences, Ningbo University, Ningbo, 315211, China
Huijuan Liu
Key Laboratory of Tropical and Subtropical Fishery Resources Application and Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 510380, China
Huijuan Liu & Haiyang Liu

Authors

Yangyang Liang
View author publications
Search author on:PubMed Google Scholar
Huijuan Liu
View author publications
Search author on:PubMed Google Scholar
Wenxuan Lu
View author publications
Search author on:PubMed Google Scholar
Jing Li
View author publications
Search author on:PubMed Google Scholar
Ting Fang
View author publications
Search author on:PubMed Google Scholar
Na Gao
View author publications
Search author on:PubMed Google Scholar
Cheng Chen
View author publications
Search author on:PubMed Google Scholar
Xiuxia Zhao
View author publications
Search author on:PubMed Google Scholar
Kun Yang
View author publications
Search author on:PubMed Google Scholar
Haiyang Liu
View author publications
Search author on:PubMed Google Scholar

Contributions

H.L. and Y.L. designed the study and led the research, X.Z. and K.Y. contribute to the materials of this study, J.L., N.G., T.F. and C.C. analyzed and uploaded the data, W.L. and H.L. contribute to the genome assembly and annotation, Y.L., W.L. and H.L. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Haiyang Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Liang, Y., Liu, H., Lu, W. et al. Chromosome-level genome assembly of the smallscale yellowfin (Plagiognathops microlepis). Sci Data 11, 1234 (2024). https://doi.org/10.1038/s41597-024-04105-2

Download citation

Received: 20 June 2024
Accepted: 07 November 2024
Published: 15 November 2024
DOI: https://doi.org/10.1038/s41597-024-04105-2