Background & Summary

Mosquitoes are hematophagous arthropods transmitting various pathogens1. Aedes aegypti is a mosquito species that transmits arboviruses, including dengue, Zika and Chikungunya, posing severe threats to public health globally2,3. The female mosquito’s midgut is the primary digestive and absorption organ, it is also involved in many physiological processes like oogenesis, signal molecule secretion, gut microbiota shaping and immune reactions4,5. Anatomically, the female mosquito’s midgut is spindle-shaped and a single inner layer of epithelial cells on a basal lamina surrounded by visceral muscles, tracheoles and fibroblasts embedded in the outer extracellular matrix4,5. Successful arbovirus infection necessitates overcoming two barriers in the midgut: the midgut infection barrier and the midgut escape barrier, both of which influence vector competence6. In arbovirus infection, the midgut is the initial infection site and plays a dual role in pathogen transmission and defense against pathogen invasion7. The cells from different midgut regions exhibit different proviral or antiviral effects, indicating arboviruses have cell type-specific preference in host-pathogen interactions8,9.

Decades of advancements in bulk RNA sequencing technologies (bulk RNA-seq) have significantly contributed to the field of transcriptomics. Studies have shown that transcriptomes may differ among cells or between tissues from the same organ. However, bulk RNA-seq has limitations, as it only provides an overall average of gene expression, thereby hiding unique transcriptional differences at the cellular level10,11. In contrast, single-cell RNA sequencing (scRNA-seq) technologies allow the study of transcriptome heterogeneity in a cell-to-cell resolution12. The cells with similar function or morphology could be separated based on transcriptional heterogeneity10,11. Therefore, scRNA-seq has been widely used to reveal cellular composition, discover novel or rare cell types, and characterize gene expression changes during cell differentiation or other stages. For example, a single-cell study of mosquito hemocytes revealed the diversity of hemocyte functions, identified different cell subtypes such as hemocytes (HC) and fat body cells (FBC) separately, and illuminated the developmental trajectories from granulocytes to antimicrobial granulocytes13.

Previous studies on the midgut of Drosophila, Ae. aegypti and other Diptera insects could serve as references for annotating mosquito midgut cells5,14,15. The midgut cells of adult dipteran insects consist of four epithelial cell types: enterocytes (EC), enteroendocrine cells (EE), enteroblasts (EB), and intestinal stem cells (ISC), and two other cell types called cardia cells (Cardia) and visceral muscles (VM)14,15. Unlike mammalian ECs, insect ECs fulfil roles in both nutrient transport and digestive enzyme secretion5. EEs produce and secret bioactive peptides and other signal molecules, playing vital regulatory roles in midgut functionality and insect homeostasis16. EBs act as progenitors of ECs and EEs, originating from ISCs via asymmetric division. ISCs can undergo symmetric division to generate either two stem cells or two differentiated cells, thereby maintaining a self-renewing epithelium under homeostatic conditions or in response to stressors in the life cycle17. Cardia cells are located at the junction between the foregut and midgut, while VMs facilitate food movement from the midgut to the hindgut5. But it should be noted that there is a major difference between Drosophila and mosquitoes, as hematophagy has distinct metabolic requirements that will contribute to the transcriptome14,15,18,19.

Cellular heterogeneity affects vector competence and host-pathogen interaction. scRNA-seq study of Anopheles gambiae hemocytes revealed a novel cell type called “megacyte” that plays a crucial role in response to the invasion of Plasmodium parasite13. Sindbis virus (SINV) prefers infecting EEs in the midgut of Ae. aegypti and may utilize the neurosecretory nature of EEs to disseminate to neighboring midgut cells and other target tissues9. Therefore, establishing a cell atlas of Ae. aegypti midgut at single-cell level can provide a reference for future studies on arbovirus-midgut interaction.

An earlier study utilizing single-nucleus RNA sequencing technology (snRNA-seq) identified and annotated 20 distinguishable cell clusters including ISC/EB, EE, EC, cardia, and VM from the midgut of Ae. aegypti15. Analyzing single nucleus rather than single cells is an important strategy that addresses tissues that cannot be readily dissociated into a single-cell suspension and frozen samples20. But snRNA-seq cannot detect mRNAs in the cytoplasm. Since some transcripts are not mainly located in the nucleus and the higher proportion of mature mRNA with polyA tails exist in the cytoplasm, the scRNA-seq has great potential in the identification of cell subtypes and the annotation of the pathogen-infected mosquito cells21,22. Here, we performed scRNA-seq experiments on mosquito midguts and successfully identified the major cell types. Our study will aid in advancing further scRNA-seq studies of the arbovirus-infected mosquito midgut.

Methods

Mosquitoes

The eggs of Ae. aegypti (Rockefeller strain) were hatched in a transparent box containing distilled water, and the larvae were fed with fish feed (Bessn, China) at 28 °C with 12 h:12 h light/dark cycles. After eclosion, adult mosquitoes were maintained in ventilated cages at 28 °C and 75% relative humidity (RH) under 12 h:12 h light/dark cycles. These mosquitoes were only fed with 8% glucose solution.

Single-cell preparation from midguts

For each experimental replicate, 40 midguts were dissected from female mosquitoes at 7 days after eclosion, then washed and temporarily stored in Schneider’s Drosophila Medium (Gibco, USA) on ice. After dissection, the medium was discarded, and the midguts were digested in 400 μl of dissociation buffer (1 mg/ml collagenase/dispase in 1 × PBS) in a shaking incubator for 20–30 min (mixed samples every five minutes) at room temperature. Then, 400 μl of Schneider’s Drosophila Medium was added to the microtube and mixed well. The resulting 800 μl of the cell suspension was filtered with a 40 μm Flowmi Cell Strainer (SP Scienceware, USA) to remove larger cell debris. To remove impurities and smaller cell debris, the filter was centrifuged over a 1.12 g/ml density solution made with OptiPrep (Serumwerk Bernburg, German) at 400 × g, 4 °C for 20 min. A viable cell band was collected and subjected to an additional centrifugation under the same conditions. The cell precipitation was washed with 1 × PBS for three times and then resuspended in 200 μl of Schneider’s Drosophila Medium. For quality control (QC), cell concentration, viability and aggregation rate were measured by staining with Calcein/PI Cell Viability/Cytotoxicity Assay Kit (Beyotime, China) and single cells were counted with a hemocytometer under a fluorescence microscope. Cell suspension samples with viability ≥80% and aggregation rate <5% were used for subsequent scRNA-seq.

Library construction and sequencing

According to the manufacturer’s guidelines, the single cell concentration was adjusted to 1000–1400 cells/μl prior to loading onto a Chromium microfluidic chip (10 × Genomics). The cDNA synthesis of captured cells and sequencing library construction were performed with Single Cell 3’ v3 Chemistry. The constructed libraries were sequenced with NovaSeq 6000 system in accordance with the manufacturer’s instructions.

Single-cell RNA sequencing analysis

For each replicate, expression matrix calculation, data filtering, normalization, dimensionality reduction by PCA (principal component analysis), UMAP clustering, and doublet screening were performed separately. The valid cell barcodes, unique molecular identifiers (UMI) and gene features were checked, raw reads were then qualified, demultiplexed and mapped to the genome (Genome assembly AaegL5.3) of Ae. aegypti from VectorBase (https://vectorbase.org/common/downloads/release-65/AaegyptiLVP_AGWG/) using Cell Ranger (v7.2.0)23. Then, the feature expression matrix was constructed by Cell Ranger, and scRNA-seq analysis was performed with Scanpy (v1.9.5)24. The low-quality cells with unique gene numbers >2500 or <100 and a proportion of mitochondrial transcripts >30% were filtered out13. The filtered data were log normalized and the top 2000 highly variable genes were used for PCA dimension reduction. The silhouette_score function of scikit-learn (v1.3.0)25 was used to calculate the silhouette coefficients of the clustering results for different numbers of principal components and resolutions to select the optimal parameters for clustering. The optimal parameters of rep1 (first 21 principal components, resolution = 0.1) and rep2 (first 14 principal components, resolution = 0.1) were used for clustering, respectively. DoubletFinder (v2.0.3)26 was used to identify doublets. After filtering out the doublet data, rep1 and rep2 were integrated as merged data using Harmony (v0.0.9)27. The merged data was subjected to data scaling, PCA, and clustering using the optimal parameters (first 18 principal components, resolution = 0.1). The filter_rank_genes_groups function in Scanpy was used to calculate the specifically expressed genes for each cell cluster. Cell types were annotated by referring to the published single-cell (nucleus) RNA-seq studies of Ae. aegypti13,15. Data visualization was performed by Scanpy, Matplotlib (v3.8.0)28 and Seaborn (v0.13.0)29 (Fig. 1).

Fig. 1
figure 1

Flow diagram of the scRNA-seq and data analysis.

Data Records

The sequencing raw data and scRNA-seq data are available on Gene Expression Omnibus (GEO). GSE accession: GSE246612 with sample GSM7872696 and GSM7872697 for rep1 and rep230. The GEO samples contain the files as follows:

raw sequencing data files:

SRR26591148 (for rep1), SRR26591149 (for rep2): the raw sequencing files with FASTQ format available in SRA run selector

10x Genomics scRNA-seq data files:

GSM7872696_sub_rep1_barcodes.tsv.gz, GSM7872697_sub_rep2_barcodes.tsv.gz: cell barcodes identified by the downstream analysis

GSM7872696_sub_rep1_features.tsv.gz, GSM7872697_sub_rep2_features.tsv.gz: gene features identified by the downstream analysis

GSM7872696_sub_rep1_matrix.mtx.gz, GSM7872697_sub_rep2_matrix.mtx.gz: a matrix that describes gene expression in each cell

Technical Validation

Since the transcriptomes of mosquito cells are different from those of mammals, we established single-cell QC standards for mosquito transcriptomes by referring to a scRNA-seq study of hemocytes of An. gambiae and Ae. aegypti13. At the stage of calculating the expression matrix, we used the–force-cells parameter of Cell Ranger for rep2 and set the value of forced cells at 3064 to ensure as many cells detected as possible while maintaining the cell quality (median value of genes around 270). Before QC, 383,640,091 and 270,662,393 reads were obtained in the raw data for rep1 and rep2, respectively (Table 1). The mean reads per cell for rep1 and rep2 are 31,840 and 88,336, respectively, with median genes per cell of 212 and 263, respectively (Table 2). After QC and filtering of the raw data, the median genes per cell in rep1 and rep2 are 359 and 251 (Fig. 2a), and the median numbers of UMIs of cells are 1,637 and 1,009, respectively (Fig. 2b). The percentage of mitochondrial genes in both replicates was less than 30% (Fig. 2c). Detection of doublets from the data showed that the number of genes in doublet samples was greater than in singlet samples (Fig. 2d), which was characteristic of doublet samples. In total, 298 doublets and 7,147 singlets were identified. Among these singlets, 4,350 cells belonged to rep1 and 2,797 to rep2. The UMAP results showed that most of the cells in rep2 were overlapped with cells in rep1, some cells in rep1 were not overlapped with cells in rep2 because there are much more cells from rep1 (Fig. 2e).

Table 1 Sequencing summary of raw scRNA-seq data.
Table 2 Quality summary of cell-based scRNA-seq data before filtering low-quality cells.
Fig. 2
figure 2

Summary of scRNA-seq data of Ae. aegypti midgut after filtering low-quality cells. (a) Distribution of the number of genes in each replicate after data filtering. (b) Distribution of the number of UMIs in each replicate after data filtering. (c) Distribution of mitochondrial gene percentage in each replicate after data filtering. (d) Distribution of gene counts in doublets and singlets. (e) Distribution of individual replicates in UMAP. The violin plot shows the range of distribution of single-cell QC data values. The widest area (bulb shape) of the violin plot shows the range of the main distribution of QC data values.

We successfully annotated 8 cell types, ISC/EB, Cardia, EC-like, EC, EE, VM, FBC, and HC (Fig. 3a, Table 3, Table S1). EC-like cell cluster is the biggest one in both replicates (Fig. 3b). ISC/EB, EE, FBC, and HC contain a greater proportion of cells in rep1, while EC contains a greater proportion of cells in rep2 (Fig. 3b). Cardia was only detected in rep1 (Fig. 3b). These results indicate that the two replicates were mostly consistent in terms of cell type. We identified marker genes for each cell type13,15, which can be used to clearly distinguish each cell type (Fig. 3c). To confirm the cell specificity of these marker genes in each replicate, we calculated the expression of marker genes between the replicates (Fig. 4a-k). The results showed that the expression of most of the marker genes for rep2 was lower than that for rep1, but the expression trends across cell types were consistent across replicates. As EC-like cells contain the highest number of cells in both replicates, it is necessary to further clarify its midgut cell characteristics. We compared the expression of the midgut marker gene carboxypeptidase (AAEL001863)31 among cell types (Fig. 4l). The results showed that the expression of carboxypeptidase was much higher in EC-like cells than in other cell types, suggesting that most of the detected cells have distinct midgut cell characteristics. These results indicate that most of the cell types identified in the two replicates are consistent, and the vast majority of the cells have distinct midgut cell characteristics.

Fig. 3
figure 3

Cell type annotation for cell clusters. (a) Cell type annotation results. The number of cells is labeled after the cluster name. (b) Percentage of cells with replicate 1 and replicate 2 in each cell type. The number of cells is labeled in parentheses. (c) Expression of cell type marker genes in each cell cluster. Intestinal stem cells/enteroblasts (ISC/EB), cardia cells (Cardia), enterocytes (EC), enterocytes-like (EC-like), enteroendocrine cells (EE), visceral muscle (VM), fat body cells (FBC) and hemocytes (HC).

Table 3 Cell type marker genes reported in previous studies. The information for marker gene description was obtained from the NCBI database.
Fig. 4
figure 4

Marker gene analysis by cell type. (al) Each subplot corresponds to a marker gene. The expression of marker genes in all cells and their cell specificity in each replicate are shown in the subplots. Dark and light red in the dot plots indicate highly and lowly expressed genes, respectively.

Usage Notes

Variabilities in the experimental replicates were observed in both previous scRNA-seq studies on insects and this study14,32. The variances may be influenced by factors such as mosquitoes from different batches, different dissection techniques performed by different persons and the time for storing midguts on ice before dissociation. The mosquito midgut scRNA-seq data in this study were uploaded to the GEO database, which includes the raw FASTQ files and the expression matrix files processed by Cell Ranger software. Our scripts have been uploaded to Github (see Code Availability section for details) to allow other researchers to reproduce the results or combine them with other data for further analysis.