Background & Summary

Vertebrate mitochondrial genomes (or mitogenomes) represent maternal evolutionary lineages, and their gene content and order are generally highly conserved across taxa1. They evolve at a relatively constant rate, making whole mitogenome sequences valuable for understanding evolutionary and demographic histories, phylogenetic relationships, and divergence times of non-model species2. Whole mitochondrial genome information can also be employed to inform phylogeography and conservation genetics (e.g.3). The analysis of sequence variation in mitogenomes allows for the distinction of lineages, populations, evolutionarily significant units, and cryptic species, and for the identification of the drivers of speciation. This, in turn, facilitates the identification of priority areas for conservation and the design of strategies to maintain genetic diversity and resilience in natural populations. Furthermore, mitochondrial genomes can provide environmental plasticity, thus allowing for species adaptation and colonization of new habitats4,5.

Mitochondrial sequences are commonly used as molecular markers for species identification, referred to as molecular barcodes, which are especially useful when morphological identification is challenging or ambiguous6. These reference sequences are increasingly important for metagenomics and metabarcoding studies that aim to identify multiple taxa from a mixture of DNA samples. For example, they can be used for assessing biodiversity from DNA present in the environment, such as air, water or soil samples (e.g.7,8,9) or identifying prey items in gut or scat samples10,11,12. To ensure successful and accurate identification, it is crucial to have curated reference sequence databases that cover the diversity of the target taxa for these non-invasive methodologies. Therefore, genomic resources, such as whole mitogenomes, of non-model species are highly important.

The Iberian Peninsula, situated in southwestern Europe, is home to a variety of freshwater ecosystems, including rivers, streams, lakes, and wetlands13. Since the rise of the Pyrenees, approximately 100–150 million years ago, the region has been isolated from the rest of Europe, which has resulted in the evolution of unique species14. The region’s diverse topography and climate, in conjunction with the isolation of river basins, functioned as natural barriers to fish dispersal and gene flow, thereby contributing to further speciation events15. The combined effects of isolation and selective pressures have promoted species diversity and high levels of endemism, resulting in several species being restricted to specific Iberian river ecosystems or basins15,16,17.

The Iberian freshwaters have suffered significant degradation due to various pressures, including alterations caused by dams and other infrastructures (e.g. channels and weirs), pollution, eutrophication, biological invasions, and water over-extraction18. Coupled with the ongoing aridification of the Peninsula, this has led to the decline of most freshwater taxa, including fishes17,19. The genomic resources generated here allow more accurate species identification and support the use of molecular tools, such as eDNA, for more efficient systematic monitoring. This enables a more comprehensive understanding of population trends and the assessment of conservation status, thereby informing conservation management and policies20.

Although some species already have publicly available mitogenome sequences, these represent only 43% of the species occurring in Iberia, including both native and non-native species. Moreover, there is a pronounced bias towards non-native species, with only 15% of the native Iberian species having a public mitogenome assembly.

This study presents new reference mitogenomes for 60 (55%) of all 109 freshwater and diadromous fish species known to occur in the Iberian Peninsula, in addition to the 35 already publicly available. Of the new mitogenomes, 50 are from native Iberian species, which, when combined with the 10 already published, represent 83% of the total native freshwater fish fauna (72 species in total). These mitogenomes represent a fundamental resource for future research in phylogenetics, phylogeography and population genetics. Furthermore, the data will facilitate the development of PCR primers and probes for environmental DNA surveys and species monitoring, as well as molecular identification from predator diets and other metabarcoding studies.
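The coverage figures quoted above follow directly from the species counts given in the text. A minimal sketch of that arithmetic is shown below; the variable and function names are illustrative only and are not part of the study's pipeline.

```python
# Species counts as stated in the text (names are illustrative).
TOTAL_SPECIES = 109      # freshwater and diadromous fish species in Iberia
NEW_MITOGENOMES = 60     # mitogenomes generated in this study
PUBLIC_MITOGENOMES = 35  # mitogenomes already publicly available
NATIVE_TOTAL = 72        # native freshwater fish species
NATIVE_NEW = 50          # new mitogenomes from native species
NATIVE_PUBLIC = 10       # native species with a previously published mitogenome


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


new_share = percent(NEW_MITOGENOMES, TOTAL_SPECIES)
combined_share = percent(NEW_MITOGENOMES + PUBLIC_MITOGENOMES, TOTAL_SPECIES)
native_share = percent(NATIVE_NEW + NATIVE_PUBLIC, NATIVE_TOTAL)

print(f"New mitogenomes: {new_share}% of all species")
print(f"New + public:    {combined_share}% of all species")
print(f"Native coverage: {native_share}% of native species")
```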

Methods

DNA extraction, library construction, and sequencing

Total genomic DNA was extracted from fin clips with the QIAamp DNA Micro kit (QIAGEN) following the manufacturer’s protocol. Vouchered specimens are available for a subset of samples (Species21). DNA quantity was assessed on a Qubit fluorometer with the dsDNA BR Assay Kit (Thermo Fisher Scientific, USA). Illumina libraries were constructed using two different methodologies (Metrics21). Samples from subset A were sheared to an average size of 350 bp using a Bioruptor Pico (Diagenode, USA), and libraries were constructed with Illumina’s TruSeq Nano kit. These were quantified by qPCR (Kapa Library Quantification Kits compatible with Illumina platforms) and pooled in equimolar amounts for sequencing, targeting at least 2 Gbp per sample. Libraries were sequenced as 150 bp paired-end (PE) reads on Illumina platforms (NovaSeq and HiSeq X). Samples from subset B were sent for shotgun sequencing at the Norwegian Sequencing Centre, Oslo, Norway. Library preparation followed the Illumina DNA Prep Tagmentation Kit (Illumina, San Diego, California, USA). Samples were sequenced as 150 bp paired-end reads with an expected depth of 20x per sample, obtained by two runs using a quarter of a flow cell of the Illumina NovaSeq S4 platform (expected throughput of 800 Gbp each) and a partial run of the same platform (100 Gbp).

Mitochondrial genome assembly and annotation

Read quality was evaluated with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapters were removed and reads were quality-trimmed with Trimmomatic (v0.39)22 using the parameters LEADING:3 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:30. Mitochondrial genomes were assembled using NOVOPlasty (v4.3.1)23; if a circular assembly could not be obtained, GetOrganelle (v1.7.6.1)24 was used instead. Protein-coding genes and tRNAs were annotated for all mitogenomes using MITOS2 (v2.0.8)25 and tRNAscan-SE (v2.0.9)26, respectively (Annotations_Mitogenomes21). MITOS2 was run with default parameters except for evalue = 15, fragovl = 0, finovl = 10, and using refseq89m as reference. Publicly available mitogenomes of Iberian freshwater species were retrieved from NCBI and included for further analysis. To confirm annotations, mitogenomes were aligned at the order level using the MAFFT version implemented in Geneious Pro v.10.2.6 under default settings27.
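The trimming step described above can be sketched as a small helper that assembles a Trimmomatic paired-end command line. This is an illustration under stated assumptions, not the authors' script: the `trimmomatic` executable name assumes a conda-style wrapper on the PATH, the file names are hypothetical, and the adapter-clipping step is left out because the text does not report its settings.

```python
from typing import List


def trimmomatic_pe_command(reads_1: str, reads_2: str, prefix: str) -> List[str]:
    """Build a Trimmomatic paired-end command with the trimming steps given in the text.

    Assumptions (not from the source): the wrapper executable name, the output
    file naming scheme, and the omission of an ILLUMINACLIP adapter step.
    """
    return [
        "trimmomatic", "PE",
        reads_1, reads_2,
        # Trimmomatic PE expects four outputs: paired/unpaired for each mate.
        f"{prefix}_R1.paired.fastq.gz", f"{prefix}_R1.unpaired.fastq.gz",
        f"{prefix}_R2.paired.fastq.gz", f"{prefix}_R2.unpaired.fastq.gz",
        # Trimming steps as reported in the Methods.
        "LEADING:3", "TRAILING:15", "SLIDINGWINDOW:4:15", "MINLEN:30",
    ]


if __name__ == "__main__":
    # Hypothetical sample name; print the command rather than running it.
    command = trimmomatic_pe_command("sample_1.fastq.gz", "sample_2.fastq.gz", "sample")
    print(" ".join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once the input files and the Trimmomatic installation are in place.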

Mitogenome phylogeny

All mitochondrial protein-coding genes (PCGs) and both ribosomal regions from all species were extracted and realigned with MAFFT (v7.453)27. Alignments for each region were filtered and trimmed using Gblocks (v0.91b)28 and then concatenated into a single dataset. Phylogenetic analyses were performed with IQ-TREE2 (ref. 29), with the appropriate evolutionary model inferred for each gene and 10,000 bootstrap replicates to assess support for the phylogenetic relationships among species. Individuals from the family Petromyzontidae were used as the outgroup (Petromyzon marinus: PMU11880; Lampetra alavariensis: MT34; Lampetra auremensis: Aur19-OL-20; Lampetra fluviatilis: 9505; Lampetra lusitanica: MT32; Lampetra planeri: Plan19-long9). This analysis was performed using the gene2phylo wrapper30.
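The concatenation of per-region alignments into a single dataset can be illustrated with a short sketch. It does not reproduce the gene2phylo wrapper; the data layout (a dictionary of taxon-to-sequence mappings per gene) and the gap-filling of taxa missing from a gene are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one region


def concatenate_alignments(
    genes: Dict[str, Alignment],
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one supermatrix.

    Returns the concatenated alignment and a list of (gene, start, end)
    partitions using 1-based, inclusive coordinates. Taxa absent from a gene
    are filled with gaps (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy alignments with hypothetical taxa.
    toy = {"COI": {"sp_A": "ATG", "sp_B": "AT-"}, "CYTB": {"sp_A": "CC"}}
    matrix, partitions = concatenate_alignments(toy)
    print(matrix)
    print(partitions)
```

Keeping the partition coordinates alongside the supermatrix is what allows a separate substitution model to be fitted to each gene in the downstream analysis.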

Data Records

The reference data for this collection include the following information: (1) Sample Code; (2) Species; (3) georeferenced data (latitude and longitude in decimal degrees) for each specimen; (4) sampling date; (5) mitogenome for each specimen; (6) existence of voucher for each specimen; (7) SRR accession code; and (8) assembled mitogenome NCBI accession code. The raw sequencing reads were deposited at the NCBI Sequence Read Archive under SRP511741 (2024)31 and SRP433534 (2023)32. The assembled mitogenomes and annotations were deposited in NCBI (PP928724–PP928783) under BioProject PRJNA119205733. Cytochrome oxidase I (COI) gene sequences were deposited in BOLD (Ref: IBFIS). All data associated with this study are hosted at Figshare21.

Technical Validation

All specimens were identified by experts and further validated based on COI and/or Cytochrome b (Cyt b) sequences queried against the BOLD and NCBI databases, respectively. An identification was deemed correct if the percentage of identity was higher than 99%. The mean coverage across mitogenomes was 407 reads per base, with the lowest coverage observed in Luciobarbus microcephalus (15) and the highest in Lampetra fluviatilis (2,622). Except for the lampreys and Alosa fallax, all 13 PCGs, 2 rRNAs, and tRNAs were automatically annotated using the previously mentioned software. For the species in which automatic annotation failed, publicly available annotated mitogenomes were used as references and gene positions were compared (Alosa alosa: NC_009575, and Lampetra fluviatilis: Y18683).
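The identity criterion above can be made concrete with a small sketch. BOLD and NCBI report identity from their own search algorithms; the pairwise calculation below is only an illustrative stand-in that assumes two pre-aligned sequences and ignores gapped columns, and the function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching bases between two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumption of this
    sketch; database search tools may treat gaps differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identification_accepted(identity_pct: float, threshold: float = 99.0) -> bool:
    """Apply the criterion from the text: identity strictly higher than 99%."""
    return identity_pct > threshold


if __name__ == "__main__":
    # Toy sequences, not real barcodes.
    identity = percent_identity("ACGTACGT", "ACGTACGA")
    print(identity, identification_accepted(identity))
```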

The mean mitogenome sequence length varies across families, from 16,077 bp (Pleuronectidae) to 16,798 bp (Mugilidae). The average GC content in our dataset is 44.3%, with variability between families. The lowest average GC content is observed in Petromyzontidae (38.4%) and the highest in Atherinidae (49.4%), values similar to those of other fish species (Table 1)34. Despite some variance in PCG lengths across the dataset, their sizes are comparable to those of closely related species. Thus, the observed differences in mitochondrial genome length are mostly attributed to variation in intergenic regions (Table 2). The majority of species exhibit a gene order analogous to that observed in most vertebrates, whereas the Petromyzontidae display their characteristic gene order, with the control region located between ND6 and Cyt b (Fig. 1)35. The maximum likelihood tree reconstructed with IQ-TREE used the GTR model for the combined gene set (13 PCGs + 2 rRNAs). The Petromyzontidae family was selected as the outgroup, as it is recognised to be a more basal clade36. All genera represented by multiple species form monophyletic groups within each family (Fig. 2), and the same was found for higher taxonomic levels, such as family and order.
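GC content, as summarised above, is the share of G and C bases in a sequence. The sketch below shows one way to compute it; excluding ambiguous bases such as N from the denominator is an assumption, since the text does not state how ambiguity codes were handled.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; this handling of
    ambiguity codes is an assumption of the sketch.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


if __name__ == "__main__":
    # Toy sequence, not a real mitogenome.
    print(f"{gc_content('ATGCGCNNAT'):.1f}%")
```

Family-level values such as those reported in Table 1 would then be the mean of this quantity over the mitogenomes of each family.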

Table 1 Summary information of the new mitogenomes presented in this study. For each family, we provide the number of species sequenced, the mean and standard deviation (mean ± sd) of the mitogenome size (Size), the percentage of GC content (GC), and the mean and standard deviation (mean ± sd) of the sequence coverage (Cov). Families represented by only one individual are shown with the corresponding value.
Table 2 Summary information on the minimum (Min) and maximum (Max) length of annotated mitochondrial regions across the sequenced mitogenomes.
Fig. 1 Representation of the mitochondrial arrangement found in species belonging to the class Actinopteri (a) and the rearrangement typical of Petromyzontida (b). The purple shaded box highlights the rearrangement region. CR corresponds to Control Region and all tRNA coding genes are represented by the one-letter code for the corresponding amino acid.

Fig. 2 Maximum likelihood tree constructed using IQ-TREE2 with mitogenomes of 95 (87%) fishes occurring in the freshwaters of the Iberian Peninsula mainland. Genera in green represent native groups, while blue represents non-native groups. Collapsed genera with * include species with both statuses and are coloured according to a majority rule. Bold names represent groups with new mitogenomes. Node bootstrap values are shown as follows: black circles: >99%; dark-grey: 95–99%; blue: 75–95%; white: 60–75%. Nodes below 60% are not shown.

Although mitochondrial genomes remain to be sequenced for a few species, research is ongoing to address this knowledge gap. For example, Anaecypris hispanica, which is endemic to the region, is included in the ERGA (European Reference Genome Atlas) project (www.erga-biodiversity.eu). The remaining species still lacking mitochondrial genomes belong to genera for which new data are now available, thus representing a lower fraction of the whole genetic diversity of freshwater and diadromous fish in the Iberian Peninsula.