Abstract
The Iberian Peninsula, in southwestern Europe, is home to a distinctive freshwater fish fauna, predominantly composed of endemic species. This is a consequence of the prolonged isolation from western Europe caused by the Pyrenees, the diverse geological and climatic gradients, and the isolation of river basins. Freshwater and diadromous fishes have diversified in the Iberian Peninsula and include 72 currently recognized native species, 50 of which are endemic to the region. Habitat loss and degradation, the introduction of invasive species, and climate change have placed Iberian freshwater and diadromous fishes among the most threatened groups of vertebrates, with some species on the brink of extinction. Here, we present 60 new complete mitochondrial genome assemblies out of the 109 freshwater and diadromous fish species found in the Iberian Peninsula, including the mitogenomes of 37 endemics. These resources are crucial for characterising the mitochondrial evolution of species, reconstructing phylogeny and paleogeography, advancing species identification, delineation, and monitoring, and ultimately supporting conservation planning.
Background & Summary
Vertebrate mitochondrial genomes (or mitogenomes) represent the maternal evolutionary lineages, and their gene content and order are generally highly conserved across taxa1. They evolve at a relatively constant rate, making whole mitogenome sequences valuable for understanding evolutionary and demographic histories, phylogenetic relationships, and divergence times of non-model species2. Whole mitochondrial genome information can also be employed to inform phylogeography and conservation genetics (e.g.3). The analysis of sequence variation in mitogenomes allows for the distinction of lineages, populations, evolutionarily significant units, cryptic species, and the drivers of speciation. This, in turn, facilitates the identification of priority areas for conservation and the design of strategies to maintain genetic diversity and resilience in natural populations. Furthermore, mitochondrial genomes can provide environmental plasticity, thus allowing for species adaptation and colonization into new habitats4,5.
Mitochondrial sequences are commonly used as molecular markers for species identification, referred to as molecular barcodes, which are especially useful when morphological identification is challenging or ambiguous6. These reference sequences are increasingly important for metagenomics and metabarcoding studies that aim to identify multiple taxa from a mixture of DNA samples. For example, they can be used for assessing biodiversity from DNA present in the environment, such as air, water or soil samples (e.g.7,8,9) or identifying prey items in gut or scat samples10,11,12. To ensure successful and accurate identification, it is crucial to have curated reference sequence databases that cover the diversity of the target taxa for these non-invasive methodologies. Therefore, genomic resources, such as whole mitogenomes, of non-model species are highly important.
The Iberian Peninsula, situated in southwestern Europe, is home to a variety of freshwater ecosystems, including rivers, streams, lakes, and wetlands13. Since the rise of the Pyrenees, approximately 100–150 million years ago, the region has been isolated from the rest of Europe, which has resulted in the evolution of unique species14. The region’s diverse topography and climate, in conjunction with the isolation of river basins, functioned as natural barriers to fish dispersal and gene flow, thereby contributing to further speciation events15. The combined effects of isolation and selective pressures have promoted species diversity and high levels of endemism, resulting in several species being restricted to specific Iberian river ecosystems or basins15,16,17.
The Iberian freshwaters have suffered significant degradation due to various pressures, including alterations caused by dams and other infrastructures (e.g. channels and weirs), pollution, eutrophication, biological invasions, and water over-extraction18. Coupled with the ongoing aridification of the Peninsula, this has led to the decline of most freshwater taxa, including fishes17,19. The genomic resources generated here provide more accurate species identification and assist the use of molecular tools, such as eDNA, for more efficient systematic monitoring. This enables a more comprehensive understanding of population trends and the assessment of conservation status, thereby informing conservation management and policies20.
Although some species already have publicly available mitogenome sequences, these cover only 32% (35 of 109) of the species occurring in Iberia, counting both native and non-native species. Moreover, there was a pronounced bias towards non-native species: only 14% (10 of 72) of the native Iberian species had a public mitogenome assembly.
This study presents new reference mitogenomes for 60 (55%) of all 109 freshwater and diadromous fish species known to occur in the Iberian Peninsula, in addition to the 35 already publicly available. Of the new mitogenomes, 50 are from native Iberian species, which, when combined with the 10 already published, represent 83% of the total native freshwater fish fauna (72 species in total). These mitogenomes represent a fundamental resource for future research in phylogenetics, phylogeography and population genetics. Furthermore, the data will facilitate the development of PCR primers and probes for environmental DNA surveys and species monitoring, as well as molecular identification from predator diets and other metabarcoding studies.
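The coverage figures quoted above follow directly from the species counts. As a quick arithmetic check, the minimal Python sketch below recomputes them. The variable and function names are illustrative only and are not part of the published dataset.

```python
# Species counts as stated in the text (names are illustrative only).
TOTAL_SPECIES = 109       # freshwater and diadromous fishes occurring in Iberia
NATIVE_SPECIES = 72       # native species among them
NEW_MITOGENOMES = 60      # assembled in this study
PUBLIC_MITOGENOMES = 35   # already publicly available
NEW_NATIVE = 50           # new mitogenomes from native species
PUBLIC_NATIVE = 10        # native species with a previously published mitogenome


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


new_share = percent(NEW_MITOGENOMES, TOTAL_SPECIES)
combined_share = percent(NEW_MITOGENOMES + PUBLIC_MITOGENOMES, TOTAL_SPECIES)
native_share = percent(NEW_NATIVE + PUBLIC_NATIVE, NATIVE_SPECIES)

print(new_share, combined_share, native_share)  # 55 87 83
```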
Methods
DNA extraction, library construction, and sequencing
Total genomic DNA was extracted from fin clips with the QIAamp DNA Micro kit (QIAGEN) following the manufacturer’s protocol. Vouchered specimens are available for a subset of samples (Species; ref. 21). DNA quantity was assessed with a Qubit fluorometer and the dsDNA BR Assay Kit (Thermo Fisher Scientific, USA). Illumina libraries were constructed using two different methodologies (Metrics; ref. 21). Samples from subset A were sheared to an average size of 350 bp using a Bioruptor Pico (Diagenode, USA), and libraries were constructed with Illumina’s TruSeq Nano kit. These were quantified using qPCR (Kapa Library Quantification Kits compatible with Illumina platforms) and pooled in equimolar amounts for sequencing, targeting at least 2 Gbp per sample. Libraries were sequenced as 150 bp paired-end (PE) reads on Illumina platforms (NovaSeq and HiSeq X). Samples from subset B were sent for shotgun sequencing at the Norwegian Sequencing Centre, Oslo, Norway. Library preparation followed the Illumina DNA Prep Tagmentation Kit (Illumina, San Diego, California, USA). Samples were sequenced as 150 bp paired-end reads to an expected depth of 20x per sample, obtained from two runs using a quarter of a flow cell of the Illumina NovaSeq S4 platform (expected throughput of 800 Gbp each) and a partial run on the same platform (100 Gbp).
Mitochondrial genome assembly and annotation
Read quality was evaluated with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapters were removed and reads were quality-trimmed with Trimmomatic v0.39 (ref. 22) using the parameters LEADING:3 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:30. Mitochondrial genomes were assembled using NOVOPlasty v4.3.1 (ref. 23); if a circular assembly could not be obtained, GetOrganelle v1.7.6.1 (ref. 24) was used instead. Protein-coding genes and tRNAs were annotated for all mitogenomes using MITOS2 v2.0.8 (ref. 25) and tRNAscan-SE v2.0.9 (ref. 26), respectively (Annotations_Mitogenomes; ref. 21). MITOS2 was run with default parameters except for evalue = 15, fragovl = 0 and finovl = 10, using refseq89m as the reference. Publicly available mitogenomes of Iberian freshwater species were retrieved from NCBI and included in further analyses. To confirm annotations, mitogenomes were aligned at the order level using the MAFFT (ref. 27) version implemented in Geneious Pro v10.2.6 under default settings.
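To make the meaning of the quality-trimming parameters concrete, the following Python sketch applies the four steps (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) to a list of per-base Phred scores. It is a simplified illustration written for this text, not Trimmomatic's own code: the function name `trim_read` is hypothetical, adapter clipping is left out, and the sliding-window step simply cuts at the start of the first window whose mean quality falls below the threshold.

```python
def trim_read(quals, leading=3, trailing=15, window=4, min_avg=15, minlen=30):
    """Return (start, end) of the retained slice of a read, or None if dropped.

    quals is a list of per-base Phred quality scores. The defaults mirror
    LEADING:3 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:30 (simplified logic).
    """
    start, end = 0, len(quals)

    # LEADING: remove bases from the 5' end while quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING: remove bases from the 3' end while quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW: scan 5' to 3' and cut at the first window whose
    # mean quality drops below min_avg.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN: discard reads that end up shorter than the minimum length.
    if end - start < minlen:
        return None
    return start, end


# Two poor leading bases and six poor trailing bases are removed.
print(trim_read([2, 2] + [35] * 40 + [10] * 6))  # (2, 42)
```

In practice the real tool should be used for trimming; the sketch is only meant to show how each parameter affects a read.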
Mitogenome phylogeny
All mitochondrial protein-coding genes (PCGs) and both ribosomal RNA (rRNA) genes from all species were extracted and realigned with MAFFT v7.453 (ref. 27). Alignments for each region were filtered and trimmed using Gblocks v0.91b (ref. 28) and then concatenated into a single dataset. Phylogenetic analyses were performed with IQ-TREE 2 (ref. 29), with the appropriate evolutionary model inferred for each gene and 10,000 bootstrap replicates to assess support for the phylogenetic relationships among species. Individuals from the family Petromyzontidae were used as the outgroup (Petromyzon marinus: PMU11880; Lampetra alavariensis: MT34; Lampetra auremensis: Aur19-OL-20; Lampetra fluviatilis: 9505; Lampetra lusitanica: MT32; Lampetra planeri: Plan19-long9). This analysis was performed using the gene2phylo wrapper (ref. 30).
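The concatenation step described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the code used by the gene2phylo wrapper: the function name `concatenate_alignments` and the input layout (one dictionary of aligned sequences per gene) are hypothetical, and taxa missing from a gene are padded with gaps.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of (gene_name, {taxon: aligned_sequence}) tuples.
    Returns (supermatrix, partitions), where supermatrix maps each taxon to
    its concatenated sequence and partitions lists (gene_name, start, end)
    with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    position = 1

    for gene, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this gene are filled with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, position, position + length - 1))
        position += length

    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


example = [
    ("COI", {"sp_A": "ATG-", "sp_B": "ATGC"}),
    ("rrnS", {"sp_A": "GG"}),
]
matrix, parts = concatenate_alignments(example)
print(matrix)  # {'sp_A': 'ATG-GG', 'sp_B': 'ATGC--'}
print(parts)   # [('COI', 1, 4), ('rrnS', 5, 6)]
```

Recording the partition coordinates alongside the supermatrix is what allows a separate substitution model to be fitted to each gene in the downstream analysis.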
Data Records
The reference data for this collection include the following information: (1) sample code; (2) species; (3) georeferenced data (latitude and longitude in decimal degrees) for each specimen; (4) sampling date; (5) mitogenome for each specimen; (6) existence of a voucher for each specimen; (7) SRR accession code; and (8) NCBI accession code of the assembled mitogenome. The raw sequencing reads were deposited at the NCBI Sequence Read Archive under SRP511741 (2024; ref. 31) and SRP433534 (2023; ref. 32). The assembled mitogenomes and annotations were deposited in NCBI (PP928724-PP928783) under BioProject PRJNA1192057 (ref. 33). Cytochrome oxidase I (COI) gene sequences were deposited in BOLD (Ref: IBFIS). All data associated with this study are hosted at Figshare (ref. 21).
Technical Validation
All specimens were identified by experts and further validated based on COI and/or Cytochrome b (Cyt b) queried against the BOLD and NCBI databases, respectively. An identification was deemed correct if the percentage of identity was higher than 99%. The mean coverage of each mitogenome was 407 reads per base, with the lowest coverage observed in Luciobarbus microcephalus at 15 and the highest in Lampetra fluviatilis at 2622. Except for the lampreys and Alosa fallax, all 13 PCGs, 2 rRNAs, and tRNAs were automatically annotated using the previously mentioned software. The publicly available annotated mitogenomes were used as references for the species in which the annotation failed, and gene positions were compared (Alosa alosa: NC_009575, and Lampetra fluviatilis: Y18683).
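The 99% identity criterion can be illustrated with a small Python sketch. The BOLD and NCBI search tools compute identity internally, so this is only an assumption-laden illustration: identity is defined here as the share of matching columns in a pairwise alignment (columns where both sequences have a gap are skipped), and the function names are hypothetical.

```python
def percent_identity(query, subject):
    """Percentage of matching columns between two aligned sequences.

    Both sequences must already be aligned to the same length. Columns in
    which both sequences carry a gap are ignored; a gap against a base
    counts as a mismatch.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" and s == "-":
            continue
        compared += 1
        if q == s:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identification_accepted(query, subject, threshold=99.0):
    """Apply the criterion from the text: identity strictly higher than 99%."""
    return percent_identity(query, subject) > threshold


reference = "ACGT" * 50              # 200 bp toy reference barcode
one_mismatch = "T" + reference[1:]   # 199 of 200 columns match
print(percent_identity(one_mismatch, reference))         # 99.5
print(identification_accepted(one_mismatch, reference))  # True
```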
Mean mitogenome length varies across families, from 16,077 bp (Pleuronectidae) to 16,798 bp (Mugilidae). The average GC content in our dataset is 44.3%, with variability between families: the lowest average is observed in Petromyzontidae (38.4%) and the highest in Atherinidae (49.4%), values similar to those of other fish species (Table 1; ref. 34). Despite some variation in PCG lengths across the dataset, their sizes are comparable to those of closely related species. The observed differences in mitochondrial genome length are therefore mostly attributable to variation in intergenic regions (Table 2). Most species exhibit the gene order observed in most vertebrates, whereas the Petromyzontidae display their characteristic gene order, with the control region located between ND6 and Cyt b (Fig. 1; ref. 35). The maximum likelihood tree reconstructed with IQ-TREE 2 used the GTR model for the combined gene set (13 PCGs + 2 rRNAs). The family Petromyzontidae was selected as the outgroup because it is recognised as having diverged earlier than the remaining taxa (ref. 36). All genera represented by multiple species form monophyletic groups within each family (Fig. 2), and the same was found for higher taxonomic levels, such as family and order.
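Summary statistics such as the GC content reported above are straightforward to recompute from the deposited sequences. The Python sketch below shows one way to do so; it is not the authors' code, the function names are hypothetical, and ambiguous bases (for example N) are excluded from the denominator as an assumption.

```python
def gc_content(sequence):
    """Return GC content in percent, counting only unambiguous A, C, G, T."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def mean_gc_by_family(records):
    """Average GC content per family.

    records: iterable of (family_name, sequence) pairs.
    """
    values = {}
    for family, sequence in records:
        values.setdefault(family, []).append(gc_content(sequence))
    return {family: sum(v) / len(v) for family, v in values.items()}


toy_records = [
    ("Family_X", "GGCC"),
    ("Family_X", "ATAT"),
    ("Family_Y", "ATGC"),
]
print(gc_content("ATGCNN"))            # 50.0
print(mean_gc_by_family(toy_records))  # {'Family_X': 50.0, 'Family_Y': 50.0}
```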
Fig. 1 Representation of the mitochondrial arrangement found in species belonging to the class Actinopteri (a) and the rearrangement typical of Petromyzontida (b). The purple shaded box highlights the rearrangement region. CR corresponds to Control Region and all tRNA coding genes are represented by the one-letter code for the corresponding amino acid.
Fig. 2 Maximum likelihood tree constructed using IQ-TREE 2 with mitogenomes of 95 (87%) fishes occurring in the freshwaters of the Iberian Peninsula mainland. Genera in green represent native groups, while blue represents non-native groups. Collapsed genera with * include species with both statuses and are coloured according to a majority rule. Bold names represent groups with new mitogenomes. Node bootstrap values are shown as follows: black circles: >99%; dark-grey: 95–99%; blue: 75–95%; white: 60–75%. Nodes below 60% are not shown.
Although some mitochondrial genomes remain to be sequenced for a few species, further research is ongoing to address this knowledge gap. For example, Anaecypris hispanica, which is endemic to the region, is included in the ERGA (European Reference Genome Atlas) project (www.erga-biodiversity.eu). The remaining species still lacking mitochondrial genomes belong to genera for which new data are now available, thus representing a lower fraction of the whole genetic diversity of freshwater and diadromous fish in the Iberian Peninsula.
Code availability
All software used for mitogenome assembly, annotation, and phylogenetic tree reconstruction is listed in the Methods section, together with the respective versions and parameters. Programs without stated parameters were run with default settings. No custom code was used for the curation and/or validation of the dataset.
References
Kolesnikov, A. A. & Gerasimov, E. S. Diversity of Mitochondrial Genome Organization. Biochemistry Moscow 77, 1424–1435 (2012).
Montaña-Lozano, P. et al. Comparative genomic analysis of vertebrate mitochondrial reveals a differential of rearrangements rate between taxonomic class. Sci Rep 12, 5479 (2022).
Hirase, S., Takeshima, H., Nishida, M. & Iwasaki, W. Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography. Genome Biology and Evolution 8, 1267–1278 (2016).
Sebastian, W., Sukumaran, S. & Gopalakrishnan, A. Comparative mitogenomics of Clupeoid fish provides insights into the adaptive evolution of mitochondrial oxidative phosphorylation (OXPHOS) genes and codon usage in the heterogeneous habitats. Heredity 128, 236–249 (2022).
Teacher, A. G., André, C., Merilä, J. & Wheat, C. W. Whole mitochondrial genome scan for population structure and selection in the Atlantic herring. BMC Evolutionary Biology 12, 248 (2012).
Hinsinger, D. et al. Fishing for barcodes in the Torrent: from COI to complete mitogenomes on NGS platforms. DNA Barcodes 3, 170–186 (2015).
Egeter, B. et al. Speeding up the detection of invasive bivalve species using environmental DNA: A Nanopore and Illumina sequencing comparison. Molecular Ecology Resources 22, 2232–2247 (2022).
Lopes-Lima, M. et al. Rapid eDNA survey reveals a unique biodiversity hotspot: The Corubal River, West Africa. BioScience 74, 405–412 (2024).
Ryan, E., Bateman, P., Fernandes, K., van der Heyde, M. & Nevill, P. eDNA metabarcoding of log hollow sediments and soils highlights the importance of substrate type, frequency of sampling and animal size, for vertebrate species detection. Environmental DNA 4, 940–953 (2022).
Mata, V. A. et al. How much is enough? Effects of technical and biological replication on metabarcoding dietary analysis. Molecular Ecology 28, 165–175 (2019).
Mata, V. A. et al. Combining DNA metabarcoding and ecological networks to inform conservation biocontrol by small vertebrate predators. Ecological Applications 31, e02457 (2021).
Nogueira, S. et al. DNA metabarcoding reveals the diet of the invasive fish Oreochromis mossambicus in mangroves of São Tomé Island (Gulf of Guinea). Biol Invasions 26, 17–23 (2024).
Alcaraz, D., Paruelo, J. & Cabello, J. Identification of Current Ecosystem Functional Types in the Iberian Peninsula. Global Ecology and Biogeography 15, 200–212 (2006).
Filipe, A. F., Araújo, M. B., Doadrio, I., Angermeier, P. L. & Collares-Pereira, M. J. Biogeography of Iberian freshwater fishes revisited: the roles of historical versus contemporary constraints. Journal of Biogeography 36, 2096–2110 (2009).
Gante, H. F., Doadrio, I., Alves, M. J. & Dowling, T. E. Semi-permeable species boundaries in Iberian barbels (Barbus and Luciobarbus, Cyprinidae). BMC Evolutionary Biology 15, 111 (2015).
Maceda-Veiga, A. Towards the conservation of freshwater fish: Iberian Rivers as an example of threats and management practices. Rev Fish Biol Fisheries 23, 1–22 (2013).
Magalhães, M. F. et al. Livro Vermelho Dos Peixes Dulciaquícolas e Diádromos de Portugal Continental. (FCiências.ID & ICNF, I.P. Lisboa, 2023).
Lopes-Lima, M. et al. The silent extinction of freshwater mussels in Portugal. Biological Conservation 285, 110244 (2023).
Miqueleiz, I., Miranda, R., Ariño, A. H. & Cancellario, T. Effective reassessments of freshwater fish species: a case study in a Mediterranean peninsula. Hydrobiologia 849, 1339–1349 (2022).
Nielsen, E. S. et al. Molecular ecology meets systematic conservation planning. Trends in Ecology & Evolution 38, 143–155 (2023).
Verissimo, J. et al. The Mitochondrial Genomes of Iberian Freshwater and Diadromous Fishes. figshare https://doi.org/10.6084/m9.figshare.26068195 (2024).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research 45, e18 (2017).
Jin, J.-J. et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21, 241 (2020).
Donath, A. et al. Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes. Nucleic Acids Research 47, 10543–10552 (2019).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49, 9077–9096 (2021).
Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772–780 (2013).
Castresana, J. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Molecular Biology and Evolution 17, 540–552 (2000).
Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530–1534 (2020).
White, O. W., Hall, A., Price, B. W., Williams, S. T. & Clark, M. D. A Snakemake Toolkit for the Batch Assembly, Annotation and Phylogenetic Analysis of Mitochondrial Genomes and Ribosomal Genes From Genome Skims of Museum Collections. Molecular Ecology Resources 25, e14036 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP511741 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP433534 (2023).
NCBI GenBank https://identifiers.org/ncbi/bioproject:PRJNA1192057 (2024).
Satoh, T. P., Miya, M., Mabuchi, K. & Nishida, M. Structure and variation of the mitochondrial genome of fishes. BMC Genomics 17, 719 (2016).
Delarbre, C. et al. The Complete Nucleotide Sequence of the Mitochondrial DNA of the Agnathan Lampetra fluviatilis: Bearings on the Phylogeny of Cyclostomes. Molecular Biology and Evolution 17, 519–529 (2000).
Betancur-R, R. et al. Phylogenetic classification of bony fishes. BMC Evolutionary Biology 17, 162 (2017).
Acknowledgements
FCT - Foundation for Science and Technology funded PRA, CSM, MC, FR, CDS and IO via the projects UIDB/04292/2020 (https://doi.org/10.54499/UIDB/04292/2020) and UIDP/04292/2020 (https://doi.org/10.54499/UIDP/04292/2020) awarded to MARE and project LA/P/0069/2020 (https://doi.org/10.54499/LA/P/0069/2020) granted to the Associate Laboratory ARNET. FCT also funded H.F.G., M.C., F.R., C.D.S., A.V. and M.J.A. through grant number PTDC/BIA-CBI/31644/2017, and CSM, IO, and PRA through PTDC/BIA-EVL/30695/2017. FR is supported by the Researcher Contract CEEC/0482/2020 and MLL by 2020.03608.CEECIND. JV was supported through PhD grants SFRH/BD/133159/2017 and COVID/BD/152600/2022, JGN by PhD grant 2020.04637.BD and IO by PhD grant 2022.10442.BD. LK was supported by project MŠMT 8J24AT008. PV was supported by LIFE21-NAT-IT-PREDATOR project n. 101074458. Work supported by National Funds through FCT - Foundation for Science and Technology in the scope of the project UIDP/50027/2020. FMSM and this work were co-funded by the project NORTE-01-0246-FEDER-000063, supported by Norte Portugal Regional Operational Programme (NORTE2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). Work supported by the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement Number 857251.
Author information
Contributions
J.V., M.C., P.B., M.L.L. and H.F.G. designed and conceived this work. M.J.A., P.R.A., E.A., Mi.C., C.F.D., M.F.M., F.M.S.M., C.S.M., J.G.N., I.O., A.P., P.R., R.R., F.R., Q.P.R., J.Q., C.D.S., A.T., A.V., L.K. and P.V. provided samples and/or performed morphological identification of specimens. J.V., M.C. and C.C. conducted the lab experiments, and S.J. assisted with the sequencing design for samples from subset B. J.V. and M.C. performed the analysis. J.V. and M.L.L. wrote the first version of the manuscript. All authors read, revised, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Veríssimo, J., Curto, M., Alves, M.J. et al. The mitochondrial genomes of Iberian freshwater and diadromous fishes. Sci Data 12, 349 (2025). https://doi.org/10.1038/s41597-024-04297-7
DOI: https://doi.org/10.1038/s41597-024-04297-7