Background & Summary

Copepods are a primary component of zooplankton assemblages in marine ecosystems. They occupy a critical trophic position in the marine food web since they graze on the phytoplankton, consume detritus, and serve as a food reserve for fish and invertebrates1. Inside this group, Acartia tonsa Dana (1849) is a marine, euryhaline calanoid copepod with a cosmopolitan neritic distribution, often the dominant zooplankton species in several ecosystems2. Due to its worldwide distribution, easy culturing, short generation times and ecological relevance, A. tonsa is a useful bioindicator organism for climate-driven ecosystem variability3 and for assessing the toxicity of several compounds4,5,6. This organism has been successfully used in previous studies to assess the effect of neonicotinoid pesticides due to its high sensitivity caused by its nervous system’s similarity with insects7,8.

Neonicotinoids (NEOs) are neurotoxic pesticides that disrupt synaptic transmission by acting as nicotinic acetylcholine receptors (nAChRs) agonists9. In the current study, we employed the first-generation NEOs acetamiprid, imidacloprid, and thiacloprid and the second-generation NEOs clothianidin and thiamethoxam.

Despite the ecological importance of A. tonsa, only one genomic and a few transcriptomic assemblies of this organism are available in the NCBI database2,10,11,12,13. The largest transcriptome of A. tonsa until the time of writing (April 2025) that includes the adult stage of this copepod was previously obtained by Jørgensen et al.2. These authors performed a de novo assembly from Illumina reads covering all life stages (embryos, nauplii/copepodites and adults), obtaining a 118,709,440-base transcriptome with 61,149 representative transcripts and 56,257 additional isoforms. However, a de novo transcriptome assembly of eukaryotic organisms from short RNA-Seq data is severely limited, as this type of reads does not allow accurate reconstruction of full-length transcripts and transcriptional isoforms14. For this reason, long-read sequencing technologies are increasingly being used to unravel the complex nature of transcriptomes, especially when no high-quality genome sequence is available15,16.

PacBio and Nanopore sequencing technologies achieve read lengths of around 15 kb and over 30 kb, respectively. These technologies enable the study of full-length sequences, alternative splicing, structural variation and alternative polyadenylation sites, facilitating genome annotation and gene function studies17,18.

In this study, we present a de novo transcriptome of A. tonsa adult copepods exposed to different NEOs obtained from Nanopore sequences. To our knowledge, this is the first report of a de novo transcriptome of this organism produced using a long-read sequencing strategy. Our assembly exhibits an estimated completeness of 88.3% and includes 261,560 transcripts with a mean length of 2,034 bases. It provides a solid baseline for future studies on the response of A. tonsa adults to different environmental stressors. Leveraging this dataset, we will study the differential expression of genes in A. tonsa adults exposed to different NEOs at different concentrations to identify biomarkers of stress and to support the development of early warning tools for marine ecosystem health.

Methods

Acartia tonsa culturing

The biological material used for the experiments was obtained from in-house laboratory cultures kept at the Ca’ Foscari University of Venice5,7,8. Stock cultures of A. tonsa (n = 600–800 individuals) were maintained at 20 ± 2 °C in climatic rooms under a 16 h: 8 h light:dark photoperiod and continuous aeration using a 20‰ salinity medium prepared according to the ISO 16778 standard method19. The copepods were fed ad libitum with a mixture of three microalgae (Pavlova lutheri, Tetraselmis suecica, and Tisochrysis lutea) dosed four times per day using a peristaltic pump controlled by a timer. The A. tonsa cultures were cleaned three times a week by siphoning off the water from the bottom of the culture flasks (2 L glass bottles) and filtering the siphoned medium using two sieves with mesh sizes of 170 mm and 50 mm, respectively. This procedure allowed the separation of the adult copepods and copepodites (retained by the 170 mm mesh size sieve), from the eggs and nauplii (passing through the 170 μm mesh but retained by the 50 μm mesh size sieve) and wastes (faecal pellets, aged algae, detritus). Adult copepods were reintroduced into the culture, while eggs were collected and stored separately at 4 °C for later testing. Each stock culture was maintained for 6 - 7 weeks before starting new cultures.

Experimental design

The exposure experiments were conducted using eggs produced by in-house cultures within 24 hours before the test. The eggs were carefully counted under a dissecting microscope, and only healthy, blue-coloured eggs were selected. Approximately 400–500 eggs were then placed in 1 L Erlenmeyer glass flasks filled with the test solutions: a negative control (20‰ salinity medium) and two different concentrations of each NEO (acetamiprid, 10 and 81 ng L−1; clothianidin, 8 and 62 ng L−1; thiacloprid, 9 and 84 ng L−1, thiamethoxam, 8 and 84 ng L−1; imidacloprid, 1 and 10 ng L−1) showing no effects on the larval development of A. tonsa7,8 were tested.

The exposure was performed in triplicate, under the same condition as the copepod culturing (T = 20 ± 2 °C, 16 h: 8 h light:dark photoperiod). The test solution was renewed three times a week by siphoning off about half of the culture medium and replacing it with a freshly prepared medium at the same concentration. The siphoned-off medium was filtered through the 170 μm and 50 μm mesh size screens, and the recovered nauplii and copepodites were reintroduced into their test flask. The food was provisioned during medium renewal by using the algal mix as part of the dilution medium to prepare test concentrations. Peristaltic pumps were not used during the exposure to avoid the dilution of the test concentrations.

Experiments ended on day 14, when all exposed individuals reached adulthood and sexual maturity. For the remainder of the study, we analyzed RNA extracted from these adults. The content of each Erlenmeyer flask was filtered through the 170 μm screen, and the retained copepods were poured into a 10 mL graduated cylinder filled with 20‰ salinity medium and kept under fluorescent light to concentrate the copepods on the surface and allow the residual food, exuviae, and faecal pellets to settle on the bottom. Adult copepods were then recovered using a 1000 µL micropipette and placed into a second 10 mL graduated cylinder for further separation. Finally, recovered copepods were pipetted into a pre-chilled 1.5 mL RNase/DNase-free Eppendorf tube. The remaining 20‰ salinity medium was removed using a 200 µL micropipette and 200 µl of cold RNAProtect (Qiagen, Hilden, Germany) was added. Samples were stored at 4 °C overnight.

RNA extraction

After removing RNAProtect, the samples were ground using sterilized Bel-Art® Disposable Pestles (BAF199230000, Sigma-Aldrich, MO, USA). The Eppendorf tube was partially immersed in liquid nitrogen to cool the tissue when performing the grinding. A figure representing this procedure was created with BioRender.com and is available at Figshare20. The following steps were performed at room temperature unless otherwise specified. For the lysis step, samples were resuspended in 400 µl of RLT buffer present in the RNeasy Protect Mini Kit (cat. no. 74124, Qiagen) containing 1% of 14.3 M β-mercaptoethanol (Sigma-Aldrich). The micropestle was also washed with the same buffer to remove all residues. An isovolume of phenol-chloroform-IAA (Sigma-Aldrich) was then added and the samples were vortexed for 30 s and centrifuged for 10 min at max speed (16.100 g). The top aqueous phase was collected without disturbing the other layers and moved to a new RNase/DNase-free Eppendorf tube. Following this step, RNA extraction was continued by using the RNeasy Protect Mini Kit (Qiagen) according to the manufacturer’s protocol for the purification of total RNA from animal cells, resumed at step 4. All buffers were prepared accordingly. Briefly: 400 µl of freshly prepared ethanol 70% (Sigma-Aldrich) solution were added to the samples and mixed by pipetting. 700 µl of each sample were loaded onto an RNeasy Mini spin column, followed by 15 s centrifugation at 10.000 g. Flowthrough was discarded and this step was repeated with the leftover volume. Optional DNA removal step was performed using RNase-Free DNase Set (Qiagen) following manufacturer’s instructions. The protocol was resumed at step 7. Washing steps were performed, as well as the additional centrifugation step to remove leftover buffer. Elution was performed by adding 40 µl of RNase free H2O directly on the membrane and centrifuging for 1 min at max speed. The purity of RNA was checked with NanoDrop1000. Quantity was evaluated with Qubit™ fluorometer (ThermoFisher Scientific, MA, USA) using the Qubit™ RNA High Sensitivity (HS) kit following manufacturer’s instructions. RIN values were determined using an Agilent Bioanalyzer 2100 (Agilent, CA, USA). All RIN values were higher than 7 (mean ± SD for all samples: 8.52 ± 0.68).

Nanopore library synthesis

All samples from the same NEO exposure experiment (control and two exposure concentrations) were pooled for library construction (five pools in total). Input RNA was obtained by pooling together equal amounts of total RNA from treated and control samples for each NEO exposure experiment, reaching the final concentration of 60 ng µl−1. When available, samples from parental specimens were included. The cDNA libraries were obtained with the cDNA-PCR Sequencing (SQK-PCS109) kit following the protocol version PCS_9085_v109_revK14Aug2019 last updated on 09/12/2020, available at the Oxford Nanopore Technologies website. One µl (60 ng) of pooled total RNA for each NEO experiment was used as starting material for reverse transcription and strand-switching. The 20 µl of reverse-transcribed sample were used to make 4 × 50 µl PCR reactions to select full-length transcripts. The PCR conditions were set as follows: initial denaturation, 95 °C 30 s, followed by 12 cycles of denaturation 95 °C 15 s, annealing 62 °C 15 s, extension 65 °C 6 min, and a final extension 65 °C 6 min. AMPure XP beads (Beckman Coulter, CA, USA) were used for amplified-DNA purification according to the protocol. The libraries were tested for DNA size and quality, using the Bioanalyzer 2100 (Agilent), and quantity, using the Qubit™ DNA High Sensitivity kit (ThermoFisher Scientific). Libraries ranging between 35 fmol and 45 fmol were used for adapter addition. Quality checking, priming and loading of the SpotON flow cell (FLO-MIN106) were performed following manufacturer’s instructions. Sequencing was performed using the MinION Mk1C technology (Oxford Nanopore Technologies, Oxford, UK) with an initial bias voltage of −180 mV. Samples were sequenced for 72 h.

Basecalling

Raw Nanopore signals were converted into nucleotide sequences using the dorado basecaller (https://github.com/nanoporetech/dorado; version 0.5.3 + d9af343) with the Super Accurate model “[email protected]”. Full-length reads were selected using pychopper (https://github.com/epi2me-labs/pychopper; version 2.7.10).

Transcriptome assembly

The de novo assembly was performed as detailed in Fig. 1. Reads were assembled using the software RNA-Bloom21 (version 2.0.1), a method recently extended to perform reference-free reconstruction of transcriptomes starting from long reads. The transcriptome reconstruction was performed independently on each pool of long reads (control and exposure to two NEO concentrations) coming from each NEO experiment. Individual assemblies (five in total) were then joined in a unique transcriptome using the TR2AACDS pipeline from the EvidentialGene software (http://arthropods.eugenes.org/EvidentialGene/evigene/) to remove redundant sequences. A final filtering step was performed using DIAMOND alignments (see “Functional annotation” section), removing transcripts whose 3 top hits matched non-eukaryotic organisms in the NR database (294 representative transcripts).

Fig. 1
figure 1

Bioinformatic workflow used in this work for the transcriptome assembly and annotation.

Transcriptome completeness assessment

The completeness of the assembly was determined using BUSCO (version 5.8.0)22 against the arthropoda_odb10 database. Only one representative transcript for each gene was considered for the analysis. General metrics for the assembly were obtained by using TransRate (version 1.0.3)23.

Indel rates in individual Nanopore reads and in the assembled transcriptome were also assessed using the BUSCO gene set as a reference. Reads were aligned to BUSCO genes using the minimap2 software24 with the “splice” preset, and CIGAR operations corresponding to insertions and deletions were extracted using a custom Python script. In parallel, we evaluated the presence of indels in assembled transcripts corresponding to BUSCO single-copy orthologs using BLASTN.

Functional annotation

Assembled transcripts were compared with the NCBI-NR (non-redundant) protein database. Sequence alignments were computed using DIAMOND25 with a sensitive preset. Results with an e-value greater than 10−6 were discarded. Only the best hit for each transcript was considered for further analyses. We also ran InterproScan26 (version 5.67–99.0) to search known functional domains and predict protein family membership.

Data Records

All reads generated in this work were deposited in the NCBI SRA repository27 under BioProject PRJNA1104241. The de novo transcriptome assembly was deposited at GenBank under the accession prefix GKXQ28. Tables with the complete data of the functional annotation and amino acid sequences derived from the transcriptome are made accessible on Figshare20.

Technical Validation

Sequencing statistics

The main statistics of Nanopore sequencing runs for each NEO are summarized in Table 1. More than 8 M reads passed the quality requirements in all cases, showing an average N50 of 1.14 ± 0.15 kb.

Table 1 Main statistics of Nanopore sequencing runs.

Transcriptome assembly characteristics

The main properties of the de novo transcriptome assembly (obtained with the TransRate software) are detailed in Table 2. A total of 261,560 transcripts were obtained, where 31,291 sequences were considered as representative transcripts. The mean sequence length was 2,034 bases, over two times the length obtained in the Jørgensen et al.2 transcriptome (994 bases). The N50 was also higher (2,580 compared to 1,052 bases), whereas the GC content was similar between the two transcriptomes (37 vs. 39%). The distribution of all transcripts according to their length (Fig. 2) also showed longer sequences in the transcriptome obtained in this study, with 77.26% of transcripts being longer than 1000 bases.

Table 2 Summary statistics of the transcriptomic assembly reported in this work obtained with TransRate.
Fig. 2
figure 2

Length distribution of assembled transcripts from this work and from Jørgensen et al.2.

The completeness of the transcriptome was assessed and compared with a previous transcriptome of A. tonsa2 using BUSCO with representative transcripts of each dataset as input (Fig. 3). The single-copy BUSCO markers are broadly used for completeness measurement, since they are expected to be found once in a genome21. Results showed 88.3% of completeness for our transcriptome (894 out of 1,013 arthropod lineage marker genes), of which 81.8% corresponded to single-copy genes and 6.4% to duplicated ones. In contrast, the reference transcriptome2 exhibited a lower completeness (73.9%) with a low percentage of single-copy markers (51.9%) and a higher percentage of duplicated BUSCOs (22.0%). The high percentage of BUSCO genes that were present in one copy in the representative transcriptome of this work indicates a nearly complete transcriptome with low redundancy. Furthermore, the amino acid sequence identity of BUSCO genes identified in our de novo transcriptome assembly against those in the Arthropoda_odb10 database was 66.28% ± 14.07, being slightly higher than the identity between BUSCO genes of the reference transcriptome and the same database (61.64% ± 15.10).

Fig. 3
figure 3

BUSCO completeness analysis. The representative transcripts from this work and from Jørgensen et al.2 (reference) were analyzed using the Arthropoda dataset odb10 (total of marker genes: 1013). The number of complete genes (C) (single copy: S; duplicated: D), fragmented genes (F) and missing genes (M) found for each transcriptome are detailed in the figure.

Assessment of indels in the dataset

While the length of Nanopore reads provide significant advantages for transcriptome assembly, they are known to have a higher rate of insertions and deletions compared to alternative technologies29. We investigated this tradeoff in our data. In the absence of a reliable genome assembly for comparison, we reasoned that BUSCO genes could serve as a reference for assessing indel frequency. These genes were selected for their presence as near-universal single-copy orthologs, making them highly conserved across species within our taxonomic group. While BUSCO genes were primarily selected for assessing genome/transcriptome completeness, we leverage their highly conserved nature to provide an estimate of indel frequency, acknowledging that this might only provide an approximate estimate of sequence accuracy.

Indel rates in single Nanopore reads were first assessed. Overall, we observed that all sequencing runs were affected by similar indel rates, with a median value of 2.5%. This is consistent with previously reported indel rates for Nanopore sequencing29.

To assess the quality of our transcriptome, we compared assembled transcripts corresponding to BUSCO single-copy orthologs using the BLASTN software. Out of the 1,395 matched transcripts, 1,182 (85%) showed no detectable indels. The mean indel rate across all matched transcripts was 0.1%, representing a substantial reduction from the read-level rate. This suggests that our assembly pipeline, incorporating both RNA-Bloom and EvidentialGene, effectively reduced indel rates.

Functional annotation

Assembled transcripts were annotated using the different resources included in the InterProScan database (such as Metacyc, Reactome and the Gene Ontology) and searching for sequence homology with proteins stored in the NCBI-NR database. According to annotation statistics (Table 3), 17,005 out of 31,291 representative transcripts were annotated in at least one database, whereas 8,381 were annotated by all databases. The frequency of GO subsets (also known as GO slims) classified in Biological Processes (BP), Cellular Component (CC) or Molecular Function (MF) is shown in Fig. 4. From BP categories, the most frequent GO slim was “regulation of DNA-templated transcription” (734 representative transcripts). Inside the CC ontology, more than 2000 transcripts were classified in the “nucleus” category, whereas in the case of the MF group, the top 3 GO slims were protein binding (1238 transcripts), DNA binding (666 transcripts) and RNA binding (575 transcripts). Finally, a homology search of predicted amino acid sequences against the NR database revealed significant matches, primarily with the copepods Eurytemora carolleeae (12,447 representative transcripts), Tigriopus californicus (550 transcripts) and Acartia pacifica (416 transcripts) (Fig. 5).

Table 3 Percentages of representative transcripts annotated in different databases.
Fig. 4
figure 4

Annotation of representative transcripts against the Gene Ontology (GO) database. Bars represent the distribution of representative transcripts in GO slims classified into three functional categories: biological process, cellular component, and molecular function. Only GO slims with more than 100 counts are shown.

Fig. 5
figure 5

Taxonomic distribution of the 5 most frequent species among NR hits. The width of the branches represents the frequency of each taxon identified by the DIAMOND search.