Abstract
Antibiotic resistance poses a major global threat to health and ecosystems. Genes that confer resistance and can move between microbes, known as mobile antibiotic resistance genes, are especially concerning, as they can spread into disease-causing bacteria. While many pathways for gene dispersal are known, airborne transport remains poorly understood due to the difficulty of collecting and analyzing low-biomass air samples. Here we sampled air during clear weather and Middle Eastern dust storms and applied a genomic method called co-assembly, which merges multiple datasets to improve gene detection. This approach enhanced gene recovery and revealed resistance genes against clinically important antibiotics, including aminoglycosides, beta-lactams, fosfomycin, glycopeptides, quinolones, and tetracyclines. However, most resistance genes were not clearly linked to mobile elements, reflecting the complexity of airborne microbial communities. Our findings suggest the potential for long-range airborne spread of antibiotic resistance and underscore the need for continued atmospheric monitoring and strategies to mitigate environmental dissemination.
Similar content being viewed by others
Introduction
Antibiotic resistance (AR) is an evolutionary process in which microorganisms acquire the ability to survive and proliferate in the presence of antibiotics. This resistance emerges through random mutations in microbial DNA and horizontal gene transfer1,2. While AR is a natural survival mechanism, human activities have significantly amplified its scale and impact, transforming it into a significant global threat to public health and ecosystem stability. The widespread misuse of antibiotics in healthcare, agriculture, and industry have accelerated the emergence and dissemination of resistance3. By 2050, drug-resistant infections are projected to cause nearly 2 million deaths annually, resulting in severe social and economic consequences4. A key mechanism driving the spread of AR is the horizontal transfer of antibiotic resistance genes (ARGs) between diverse microbial species5,6,7. ARGs frequently associate with mobile genetic elements (MGEs), such as plasmids, transposons, and integrative conjugative elements, enabling their movement across species and ecological boundaries. This genetic mobility significantly accelerates the evolution and spread of resistance8,9. Climate change further aggravates this crisis by influencing the environmental dissemination of ARGs since extreme weather events, such as floods, high winds, dust storms, and rising temperatures, can enhance the transport and persistence of resistant microbes in the environment10,11,12,13,14,15,16,17.
Airborne ARGs are an emerging concern, yet their spread remains poorly understood. These ARGs originate from diverse sources, including agricultural fields, wastewater treatment plants, industrial sites, and urban environments18,19,20. Microorganisms carrying ARGs can attach to aerosols, traveling long distances and crossing geographical and ecological barriers21,22,23. While ARG dynamics in soil, water, and terrestrial ecosystems have been studied extensively, the role of atmospheric processes, such as air mass transport and dust storms, in ARG dispersal remains underexplored. This knowledge gap is critical, as atmospheric transport could facilitate the intercontinental spread of ARGs, potentially influencing global AR patterns. Addressing this gap is essential for developing effective mitigation strategies.
Studying the atmospheric microbiome, particularly airborne ARGs and MGEs, presents significant challenges due to the inherently low biomass of atmospheric microbial communities24,25,26,27. Comprehensive microbiome analysis requires tens of millions of high-quality sequencing reads, which are difficult to obtain from atmospheric samples. These limitations hinder our ability to accurately characterize the composition, functional potential, and mobility of airborne microbes. The challenge is further exacerbated during transient events such as dust storms, where the sampling window to capture peak microbial loads is typically restricted to a few hours28. Extending sampling durations can increase biomass collection for metagenomic analyses, but it may also dilute event-specific microbial signatures, making it difficult to differentiate dust storm-associated microbiota from background air microbiomes. While optimized sampling and DNA extraction protocols can enhance taxonomic resolution, their effectiveness in linking functional microbiome traits, such as ARG mobility, remains uncertain24,29,30,31. Higher DNA recovery does not necessarily translate to improved resolution in downstream bioinformatic analyses, as sequencing depth alone does not guarantee accurate reconstruction of functional and phylogenetic relationships. A major challenge in characterizing airborne ARGs and MGEs is the need to assemble sufficiently long genomic fragments to reliably identify genetic mobility and resistance determinants. For example, plasmids, which are key vectors of ARGs6,32,33, are difficult to detect in metagenomic datasets due to their high sequence diversity and frequent genetic recombination34. Additionally, homologous regions shared between plasmids and chromosomal DNA, a consequence of co-evolution and horizontal gene transfer35, complicate accurate classification. Short DNA fragments, which may encode only partial genes or non-coding sequences, present significant difficulties for plasmid prediction tools, reducing the reliability of ARG mobility assessments36.
To overcome these challenges, improved methodologies are needed. Strategies should focus on maximizing DNA recovery, improving the assembly of longer genomic fragments, and integrating advanced sequencing technologies, such as the combination of short- and long-read sequencing27. One promising approach is the co-assembly of air samples from similar environmental origins, which could enhance the detection of low-abundance genes that are otherwise undetectable in individual samples. This strategy may also yield longer contigs, enabling a more detailed analysis of microbial traits37,38,39. However, the inherently low biomass and high diversity of airborne microbial communities could introduce potential biases during co-assembly, such as misassemblies or the loss of rare ARGs and plasmids. Developing robust approaches to balance these trade-offs is essential for advancing our understanding of airborne ARGs and their global implications.
Thus, our study aims to (i) evaluate the effectiveness of co-assembling atmospheric genetic data in addressing these key technical limitations, and (ii) by analyzing longer DNA fragments, determine whether ARGs are mobile within the atmospheric microbiome, particularly during dust storms. Understanding these processes can provide critical insights into the airborne dissemination of AR and its broader implications for microbial ecology and public health.
Results
Individual versus co-assembly
To compare the performance of co-assembly against individual assembly for the same set of samples, we grouped 45 air samples into six distinct subgroups based on their taxonomic and functional characteristics, as detailed in the Materials and methods section. Each subgroup was then co-assembled. In co-assembly, all sequencing reads from different samples were pooled and assembled, generating a non-redundant set of contigs and genes. In contrast, individual assembly processes each sample separately, potentially leading to redundant contigs and genes across samples. To ensure a fair comparison, we reduced redundancy in the individual assemblies by clustering similar sequences into families and organizing them according to the corresponding co-assembled samples.
Assembly quality
We assessed assembly quality using four key metrics: genome fraction, duplication ratio, mismatches per 100 kbp, and the number of misassemblies. Across the 49 reference genomes representing a subset of the air microbiome, co-assembly consistently outperformed individual assembly. Specifically, co-assembly achieved a slightly higher genome fraction (4.94 ± 2.64%, mean ± SD) compared to individual assembly (4.83 ± 2.71%). More notably, co-assembly exhibited a lower duplication ratio (1.09 ± 0.06 vs. 1.23 ± 0.20), fewer mismatches per 100 kbp (4379.82 ± 339.23 vs. 4491.1 ± 344.46), and fewer misassemblies (277.67 ± 107.15 vs. 410.67 ± 257.66).
To further understand these differences, we performed a comparative analysis of individual reference genomes across the 49 species. Co-assembly significantly reduced both the duplication ratio and the number of misassemblies (paired one-sided Wilcoxon signed-rank tests, p < 0.05, except for two samples in misassemblies), with a large effect size (Wilcoxon, r ≥ 0.5). This large effect indicates that the improvements observed are not only statistically significant but also notably important for the dataset. Although the differences in genome fraction and mismatches per 100 kbp did not reach statistical significance, likely due to the small magnitude of change and limited genome coverage in our reference genomes, they could still hold biological relevance, as discussed in the Discussion section. These results are presented in Fig. 1 and Supplementary Data 1.
Differences in a genome fraction (%), b duplication ratio, c mismatches per 100 kbp, and d number of misassemblies were assessed using paired one-sided Wilcoxon signed-rank tests. Violin plots illustrate the distribution of each assembly metric across the 49 reference genomes, with overlaid boxplots indicating medians (center line), interquartile ranges (25th and 75th percentiles), and whiskers extending to 1.5× the interquartile range.
Effect of sequencing depth
To evaluate how sequencing depth influences assembly quality, we analyzed whether pooling air samples, each initially sequenced at an average depth of 4.29 ± 1.45 million (mean ± SD) paired-end reads after quality control, improved assembly metrics and increased genome fraction. We observed variability in assembly performance across atmospheric metagenomes representing different conditions. For instance, some samples, such as d_East, exhibited consistently higher values across multiple metrics, while others, such as c_Northwest_1, showed lower values. Despite these differences, we identified meaningful trends across datasets. As expected, genome fraction (%) increased with sequencing depth, as indicated by our regression model. Other assembly metrics followed more complex trajectories. For instance, the effect of sequencing depth on mismatches per 100 kbp was modest, with only about 0.1% of the variance in this metric explained by the fixed effects of the model (marginal R2 < 0.01). In contrast, when considering both the fixed and random effects, such as variability between samples, nearly all the variance was accounted for (conditional R2 = 0.99). This large variance between samples likely reflects differences in microbial composition and the varying degrees of environmental stressors to which the samples were exposed during transport, such as low humidity, UV radiation and pollutants. These factors could affect airborne DNA during atmospheric transport, potentially leading to sequencing errors or modifications. Other assembly metrics exhibited non-linear trends. Duplication ratio and misassembled contig length initially increased with sequencing depth but plateaued once sequencing reached ~30 million reads. This saturation point suggests that genome coverage and assembly accuracy reach a threshold beyond which additional sequencing results in diminishing returns for these metrics. The initial increase in assembly errors likely resulted from the overrepresentation of highly abundant genomes, challenges in assembling repetitive regions, and the complexity of merging reads from divergent genomes. However, as sequencing depth reached sufficient levels, these errors stabilized, reflecting an overall improvement in assembly quality. In summary, these results indicate that co-assembling air samples enhances genome coverage while minimizing assembly errors, particularly once sequencing depth reaches the point of saturation. These results are presented in Fig. 2.
The relationship between a genome fraction (%), b duplication ratio, c mismatches per 100 kbp, and d misassembled contig length and sequencing depth using regression models. The red line represents the predicted assembly metrics based on the fitted regression model, with shaded areas indicating 95% confidence intervals for the predictions. R2 (marginal) values indicate the proportion of variance explained by the fixed effects of the model, while the p-values from the ANOVA tests assess the statistical significance of the non-linear relationship (polynomial terms) between sequencing depth and assembly metrics.
Contig length and gene prediction
Next, we compared the contig length distributions generated by co-assembly and individual assembly. Co-assembly produced a higher number of longer contigs (762,369 contigs ≥500 bp) and a greater total contig length (555.79 million bp in contigs ≥500 bp) compared to individual assembly (455,333 contigs and 334.31 million bp, respectively, Supplementary Data 2). Statistical analysis confirmed that co-assembly resulted in significantly more contigs and a longer total contig length (≥500 bp) than individual assembly (paired one-sided Wilcoxon signed-rank test, p < 0.05), with a large effect size (Wilcoxon, r ≥ 0.5, Supplementary Data 3). However, as expected, the differences in the number and total length of all contigs (≥0 bp) and very long contigs (≥5000 bp) were not significant across all samples (Fig. 3a, b). This likely reflects the inherent challenges of assembling longer contigs in low-biomass samples, where fragmented reads and low coverage limit the ability to reconstruct very long sequences.
Differences in a total contig number, b total contig length, c total gene number, and d total gene length between the two assembly approaches were evaluated using paired one-sided Wilcoxon signed-rank tests. Boxplots display median values as center lines, with lower and upper hinges representing the 25th and 75th percentiles. To ensure comprehensive data representation, whiskers extend to the full range of observed values, including outliers. e–g depict the cumulative distribution of gene lengths for complete, partial, and incomplete genes, respectively, across the two assembly approaches. For visual clarity, genes in each sample were divided into 50 bins, each bin containing an equal number of genes sorted in descending order of length. Within each bin, total gene length and the number of genes were calculated.
We also compared gene prediction between co-assembly and individual assembly across all contigs. Co-assembly generated significantly more complete genes and a longer total gene length than individual assembly (paired one-sided Wilcoxon signed-rank test, p < 0.05), with a large effect size (Wilcoxon, r ≥ 0.5, Supplementary Data 3). In contrast, individual assembly yielded more incomplete genes, but the total gene length was unaffected, and the differences in incomplete gene counts were not statistically significant. These findings suggest that co-assembly enhances the recovery of complete genes, likely by assembling fragmented sequences into more complete forms, thereby improving overall gene prediction quality (Fig. 3c, d). The advantages of co-assembly were even more pronounced when comparing cumulative gene length distributions (Fig. 3e-g).
Furthermore, we found that co-assembly generally produced longer contigs annotated with at least one AR and virulence gene, and these contigs exhibited higher coverage (one-sided Mann–Whitney U test, p < 0.05). Similarly, co-assembly generated longer ORFs associated with AR and virulence genes, with these genes also displaying higher coverage (one-sided Mann–Whitney U test, p < 0.05). However, the observed differences were relatively small (Wilcoxon, r < 0.3), likely because only a fraction of functionally relevant fragments from individual assemblies were successfully reconstructed into longer sequences during co-assembly. These findings suggest that co-assembly enhances the reconstruction of functionally relevant genomic elements, providing a more accurate and comprehensive representation of genes associated with AR and virulence. These results are presented in Supplementary Fig. 1 and Supplementary Data 4.
Microbial composition, diversity, and richness
We found that the composition, diversity, and richness of AR and virulence genes in the assembled contigs were generally comparable between individual and co-assembly approaches (Supplementary Fig. 2). However, richness was slightly higher in individual assembly (paired two-sided Wilcoxon signed-rank test, p = 0.035). This difference was primarily due to low-abundance ORFs, which are more prone to fragmentation during assembly. These fragmented ORFs were often annotated with distinct AR and virulence genes in individual and co-assembled contigs, contributing to the observed richness differences (Supplementary Fig. 3). A more detailed discussion of these findings is provided in the Discussion section.
To assess whether the increased total contig number and length in co-assembly influenced taxonomic annotation, we used Kraken2 to compare co-assembled and individually assembled datasets. Our results showed that composition, diversity, and richness were comparable between the two approaches (Supplementary Fig. 4). To further validate these results, we employed the DIAMOND-MEGAN-LR protocol for taxonomic classification. Unlike Kraken2, which assigns taxonomy through exact k-mer matching, DIAMOND-MEGAN-LR relies on translated sequence alignment against protein databases. This allows for more sensitive detection of distant homologs, which is particularly useful in analyzing metagenomes with highly fragmented or poorly annotated reference sequences. For instance, many airborne pathogens occur at low abundance, and their genomes may be highly fragmented or lack well-characterized reference sequences in nucleotide databases. This can lead to false negatives if taxonomic classification relies solely on k-mer-based approaches. Therefore, combining DIAMOND-MEGAN-LR with Kraken2 provides a complementary strategy for analyzing environmental and low-biomass microbiomes. Our results from DIAMOND-MEGAN-LR corroborated those from Kraken2, further confirming that co-assembly and individual assembly yielded similar taxonomic compositions, diversity, and richness (Supplementary Fig. 5). However, we observed notable differences between the Kraken2 and Diamond-Megan-LR in the total number of mapped reads, and estimated diversity and richness across taxonomic levels. For instance, Kraken2 generally reported higher diversity and richness at the species, genus, and phylum levels, although the difference in phylum-level diversity was not statistically significant. Additionally, the total number of reads mapped to taxonomic classifications varied between methods and was sample-dependent (Supplementary Fig. 6).
These methodological differences were particularly evident in pathogen detection. Both Kraken2 and DIAMOND-MEGAN-LR identified key pathogens, including Klebsiella pneumoniae, Escherichia coli, Salmonella enterica, Staphylococcus aureus, and Pantoea agglomerans. However, their abundance estimates varied between methods. Notably, DIAMOND-MEGAN-LR identified fewer taxa overall compared to Kraken2 but detected some taxa that Kraken2 missed, including Vibrio cholerae, Klebsiella oxytoca, and Alternaria species (Supplementary Data 5). Additional details on the potential hosts and diseases associated with these species are provided in Supplementary Data 6.
Plasmid detection
Plasmids are key vectors in the global dissemination of ARGs33,40. To evaluate plasmid prediction performance, we compared individual and co-assembly approaches. Since plasmid prediction tools achieve higher sensitivity and precision with longer contigs36,41, we restricted our analysis to contigs ≥500 bp. We found that co-assembly detected significantly more plasmids-associated contigs than individual assembly (2422 ± 2246 RPKM vs. 1678 ± 2056 RPKM, mean ± SD; paired one-sided Wilcoxon signed-rank test, p = 0.031), with a large effect size (Wilcoxon, r = 0.813). Similarly, co-assembly yielded a higher abundance of plasmid contigs carrying AR and virulence genes compared to individual assembly (233 ± 288 RPKM vs. 190 ± 328 RPKM). However, this difference was not statistically significant (paired one-sided Wilcoxon signed-rank test, p = 0.156). These findings suggest that co-assembly enhances plasmid detection, providing a more accurate and comprehensive representation of airborne mobile AR and virulence genes. This improvement is likely due to increased genome coverage and the enhanced resolution of fragmented sequences compared to individual assemblies.
Overall, these improvements in genome reconstruction and gene detection are particularly relevant for studying mobile ARGs in low biomass environments. By enhancing the recovery of complete and high-confidence ARG sequences, including those carried on plasmids, co-assembly provides a more accurate and comprehensive representation of ARG dynamics in airborne microbial communities.
Mobile antibiotic resistance genes and virulence factors
We identified a subset of the air metagenomes containing AR and virulence genes, which we quantified relative to the total ORF abundance across all samples. Specifically, AR and virulence genes accounted for 0.89 ± 1.24% and 0.07 ± 0.09% (mean ± SD) in contigs ≥500 bp, respectively, and 0.85 ± 0.85% and 0.06 ± 0.06% when considering all contigs.
To assess potential biases in functional annotation, we examined whether read mapping and contig length differed between functionally annotated contigs and the entire dataset. We found that contigs carrying at least one AR or virulence gene exhibited trends in read mapping and length distribution like those of total contigs in each sample. Notably, many of these genes were annotated on contigs longer than ~250 bp (Supplementary Fig. 7). Short contigs with extremely low read coverage are often associated with spurious functional annotations. Therefore, our analysis suggests that the observed functional annotations are largely reliable. However, we identified a small subset of contigs, up to approximately 400 bp in length, that lacked any matched reads during read mapping (zero-coverage contigs in Supplementary Fig. 7, x-axis). This finding suggests that annotations on very short contigs may be prone to bias, possibly due to the formation of chimeric contigs. To focus specifically on mobile AR and virulence genes, and to ensure accurate functional annotation and reliable abundance estimates, we analyzed only those present on contigs ≥500 bp. However, for completeness, we also report the abundance of all AR and virulence genes (without size filtering) in Supplementary Data 7.
Our analysis revealed a diverse array of AR and virulence genes in air samples, with their composition and abundance varying across environmental conditions. Specifically, samples collected during elevated temperatures and Easterly dust storms exhibited higher total gene abundance (RPKM) compared to samples collected under lower temperatures, including those affected by dust storms from the southwest (Figs. 4a, b). This pattern likely arises because Easterly dust storms transport anthropogenically influenced microbial communities. These air masses pass through the Fertile Crescent, a region spanning northern Israel, Syria, and Iraq, where human activity is more prevalent. In contrast, southwesterly and northwesterly air masses predominantly traverse the Mediterranean Sea and Sahara Desert, regions characterized by vast sand dunes and low population densities. However, southwesterly air masses occasionally pass over Cairo, a highly populated area, which may contribute to variations in microbial composition. The observed differences between c_Northwest_1 and c_Northwest_2, as well as between c_Southwest_1 and c_Southwest_2, may be attributable to rising temperatures or the influence of a major Easterly dust storm occurring between these sampling periods42.
Composition of a antibiotic resistance and b virulence genes in co-assembled air samples, color-coded by antibiotic resistance mechanism and virulence category. Composition of mobile c antibiotic resistance and d virulence genes. Total gene abundances (RPKM) in each sample are indicated by a white four-pointed star on the right y-axis.
We found that most AR and virulence genes were not associated with mobile genetic elements or plasmids and were therefore classified as chromosomal (90.80 ± 1.36% and 90.30 ± 4.15%, respectively, mean ± SD). The remaining genes were associated with plasmids, ICEs, IS elements, and integrons as follows: plasmids (7.13 ± 1.10% and 8.03 ± 4.05%), conjugation elements (1.84 ± 0.33% and 1.86 ± 1.02%), insertion sequences (0.29 ± 0.10% and 0.29 ± 0.11%), and integrons (0.03 ± 0.01% for AR contigs only).
The composition of mobile AR and virulence genes varied widely across different samples (Fig. 4c, d). ABC Transporter genes, resistance-modulating genes, and acetyltransferase-based resistance mechanisms were predominant. Mobile virulence factors were less diverse, mainly dominated by functions associated with stress survival. Interestingly, acetyltransferases, which catalyze N-acetyltransferase reactions43, are often found in clinical isolates where they modify antibiotics through acetylation, rendering them ineffective44. Despite their overall high abundance, genes conferring antibiotic resistance through acetyltransferases were rare among mobile antibiotic resistance genes. This may be due to the prevalence of other resistance mechanisms, such as efflux pumps, which might be favored by environmental selective pressures, resulting in less frequent mobilization of acetyltransferase genes. Virulence-related genes identified in air samples were primarily associated with virulence factors that enhance pathogenicity in specific bacterial species. However, many of these genes are also present in non-pathogenic microbes42. The dual presence makes it challenging to assess their direct impact on human health.
Importantly, we detected mobile AR genes conferring resistance to a wide range of medically significant antibiotics, including aminoglycosides, beta-lactams, fosfomycin, glycopeptides, quinolones, and tetracyclines45. The abundance of mobile AR genes varied considerably between samples, reflecting similar trends observed for total ARGs. Notably, during a typical Easterly dust storm, when PM10 concentrations exceeded 900 µg/m3, dust particles transported up to 42 times more total mobile ARGs (measured by RPKM abundance) compared to clear-air conditions (Fig. 5). This highlights the potential role of atmospheric transport in disseminating mobile antibiotic resistance genes across large geographic regions.
The RPKM values of genes linked to both mobile genetic elements and antibiotic resistance are labeled outside the circle. Each chord diagram represents the corresponding sample name. ARGs are grouped and displayed based on the classes of antibiotics they confer resistance to. At the bottom of each chord diagram, bar plots represent total gene (ORF) abundance, total antibiotic resistance gene abundance, and total mobile antibiotic resistance gene abundance. All values are expressed as log10 RPKM, with exact RPKM values displayed above each bar.
Metagenome-assembled genomes
To further investigate the functional potential of airborne microbial communities and their implications for human and ecosystem health, we attempted to reconstruct MAGs from air samples. Although co-assembly improved genome reconstruction, the resulting datasets remained highly fragmented, as shown in previous sections. To address this challenge, we tested whether metagenomic binning could provide deeper insights into the association between ARGs and potential pathogens.
Using co-assembled reads, we recovered 39 MAGs, ranging from low-quality to medium quality with an average completeness of 13.57 ± 10.88% (mean ± SD), based on established guidelines (Supplementary Data 8)46. Despite testing various co-assembly strategies, including combining all dusty-day samples to improve MAG completeness, we did not achieve higher recovery rates. Among all generated bins, only 13 (22.84 ± 14.44%, mean ± SD completeness) could be phylogenetically classified (Supplementary Figs. 8–13).
The taxonomic composition of MAGs closely reflected the origins of air masses. Clear-day samples, influenced by northwesterly and southwesterly air masses, primarily contained MAGs classified within Cyanobacteriota, specifically the Aliterella genus, a taxon commonly found in aquatic environments47. In contrast, dusty-day samples, driven by easterly air masses, predominantly harbored MAGs of soil-dwelling bacteria, including members of Actinomycetota and Bacillota, with genera such as Geodermatophilus, Kocuria, Rubrobacter and Arthrobacter48. These findings further support the air mass origins identified in our study, aligning with our previous short-read taxonomic analysis42 and complementing the long contig-based taxonomic identifications obtained through other approaches in this study. Notably, none of the identified MAGs belonged to known pathogenic species. We limited our functional analysis to the MAGs that could be taxonomically classified and had relatively higher completeness (22.84 ± 14.44% vs. 8.93 ± 3.56%, mean ± SD). The functional characteristics of these MAGs varied widely, reflecting typical microbial functions (Supplementary Fig. 14). However, given the high fragmentation of reconstructed genomes, fully resolving their ecological roles and environmental potential remains challenging.
Discussion
We evaluated the effectiveness of co-assembly versus individual assembly in reconstructing airborne microbial genomes and genes, particularly in the context of low-biomass environments. Our results demonstrate that co-assembly improves genome reconstruction, gene prediction, and plasmid detection in airborne metagenomes compared to individual assembly. By pooling samples with similar ecological and microbiome characteristics, we reduced redundancy, improved assembly accuracy, and recovered longer contigs and more complete genes. These improvements are particularly valuable for studying airborne microbial communities, where DNA is often fragmented, sequencing coverage is low, and assembly errors can obscure meaningful biological patterns.
Advantages of co-assembly in air metagenomics
Across key assembly metrics, including genome fraction, duplication ratio, and misassemblies, co-assembly outperformed individual assembly. Although the differences in genome fraction and mismatches per 100 kbp were not statistically significant, they are biologically relevant, as even modest improvements in genome recovery enhance downstream analyses, such as strain-level resolution and functional annotation. Co-assembly also yielded longer contigs and a greater total contig length, indicating its capacity to recover more complete genomic fragments. These results suggest that co-assembly helps mitigate some, though not all, of the challenges commonly associated with low-biomass samples, such as fragmented reads and uneven sequencing depth, by maximizing the shared genetic signal across samples.
As expected, genome fraction increased with sequencing depth, confirming that additional sequencing enhances genome recovery. However, we identified a saturation point beyond which further sequencing provided diminishing returns. While the duplication ratio and misassembly rates initially increased with sequencing depth, they stabilized beyond ~30 million reads, indicating that coverage continues to improve without significantly increasing assembly errors. This plateau likely arises once the assembly process has effectively saturated the complexities introduced by repeated regions, strain-level variation, and environmental stressors affecting airborne DNA integrity, limiting further improvements in assembly accuracy. These findings underscore the importance of optimizing sequencing strategies to balance depth, cost, and assembly efficiency, particularly in studies of airborne microbiomes. The results also suggest that increasing sequencing efforts can be highly beneficial in these contexts.
Despite these improvements, most contigs in our study remained shorter than 1000 bp, which is considerably smaller than those found in higher-biomass environments such as soil and water. This limitation is likely due to the fragmented nature of airborne DNA and the difficulties in assembling low-abundance genomes. Unlike water and soil samples, where DNA integrity is generally better preserved, airborne DNA is continuously exposed to environmental stressors such as UV radiation, oxidative damage, and temperature fluctuations, all of which contribute to DNA fragmentation. Additionally, mechanical and hydrodynamic stress during sample collection, such as the centrifugal forces generated by the Coriolis air sampler, along with handling and multiple freeze-thaw cycles, may further enhance DNA degradation. These factors collectively result in shorter DNA fragments, posing additional challenges for metagenomic assembly and gene recovery in airborne microbiomes.
We also tested different sample grouping strategies to further increase sample sizes, such as pooling all samples collected on clear or dusty days. However, when we attempted to assemble this larger dataset, even a high-memory server with 750 GB of RAM could not process the reads. This failure suggests that co-assembly becomes increasingly difficult when samples originate from highly diverse microbial sources with limited genetic similarity. In contrast, co-assembly was more successful when samples were derived from a single dominant habitat, such as dust storms from specific origins. These findings emphasize the need for careful sample selection in co-assembly approaches, as excessive diversity introduces sequence heterogeneity, which complicates assembly.
Interestingly, while co-assembly improved overall gene prediction accuracy, individual assembly yielded slightly higher gene richness. This difference likely arises because individual assemblies fragmented low-abundance genes into shorter contigs, preventing complete gene reconstruction. Such fragmentation can lead to misannotations and artificially inflated gene diversity, particularly for ARGs, where accurate reconstruction is crucial for distinguishing functional sequences from fragmented, non-functional elements. Our results indicate that while an individual assembly captures some sample-specific genes, co-assembly provides a more biologically meaningful and reliable representation of the airborne resistome.
Beyond airborne microbiomes, these findings have broader implications for other low-biomass environments, such as deep-sea, polar, and extraterrestrial microbiomes, where similar assembly challenges exist. By demonstrating that co-assembly improves genome and gene recovery, we provide a framework that can be applied to other extreme environments.
Mobile airborne ARGs and environmental dissemination
By improving genome reconstruction and functional analysis, co-assembly allowed a more comprehensive assessment of airborne microbial communities and their role in public health and environmental processes. A notable finding was that a subset of airborne ARGs was associated with mobile genetic elements, suggesting their potential for horizontal gene transfer across diverse microbial species. Dust storms substantially increased the abundance of mobile ARGs, supporting our hypothesis that atmospheric transport of dust could facilitate the global dissemination of antibiotic resistance. These findings provide new evidence linking ARG mobility to extreme weather events, highlighting environmental factors that influence the spread of AR on a global scale. Our findings underscore the need for advanced metagenomic approaches to accurately track the movement of ARGs and other clinically relevant genes in atmospheric microbiomes.
Limitations and future directions
Despite the advantages of co-assembly, this approach has inherent limitations. By averaging genetic signals across pooled samples, co-assembly may obscure sample-specific variations and lead to the loss of low-abundance, unique genes. Additionally, while co-assembly improves genome recovery, resolving complex microbial genomes remains challenging, particularly for species with high intra-species variation or repetitive regions.
We also identified methodological issues that require further consideration. Taxonomic annotation results varied between classification tools, similar to findings in previous studies that used diverse taxonomic profilers49,50. Kraken2 detected a higher species diversity, whereas Diamond-Megan-LR identified additional pathogens missed by Kraken2. These differences are likely due to their distinct classification strategies: Kraken2’s k-mer-based approach offers broad taxonomic coverage but may include false positives, whereas Diamond-Megan-LR’s protein-based alignment is more conservative but improves specificity. Although our previous study validated Kraken2 as a reliable tool for analyzing atmospheric communities42, these findings highlight the importance of using multiple annotation tools to achieve a comprehensive assessment of airborne microbial diversity.
Functional gene annotation also posed challenges due to database limitations. Some protein domains associated with antibiotic resistance, such as PF12847 (a methyltransferase ___domain)51 and PF02467 (a transcription factor, WhiB)52, also play essential housekeeping roles. This functional overlap complicates the differentiation between routine cellular processes and true resistance mechanisms. Additionally, we identified ARG-related protein domains in fungal contigs (e.g., PF00903, typically associated with glyoxalase enzymes, and fosfomycin resistance proteins like FosB53, FosA54, and FosX55). This raises the possibility that fungi might possess intrinsic genetic potential for antibiotic resistance, or that horizontal gene transfer may have occurred between bacterial and fungal communities. Given the bacterial focus of current resistance databases, further research is needed to determine the biological significance of these findings. We also observed AR and virulence genes co-occurrence with signatures of IS elements and ICEs within the same ORF. This may reflect multi-___domain proteins with distinct resistance and mobility functions56 or chimeric genes formed through genetic fusion events. Alternatively, annotation artifacts in low-biomass datasets may complicate accurate identification. These scenarios highlight the need for careful genetic context interpretation and detailed functional analysis in metagenomic studies57. Experimental validation will be essential to confirm these associations and assess their ecological and clinical implications.
Despite our efforts, reconstructing high-quality MAGs remained challenging. We recovered only 39 MAGs, most of which were classified as low to medium quality. The low completeness of these MAGs suggests that assembling complete genomes from airborne microbial communities remains difficult due to read fragmentation and the diverse nature of airborne metagenomes. None of the recovered MAGs were classified as belonging to well-known pathogenic species. This is likely because many pathogenic bacteria occur at low abundances in environmental samples, making it difficult to recover their genomes through metagenomic binning. Therefore, airborne pathogen detection should be interpreted with caution in the context of ecosystem dynamics and public health risks. These findings highlight the need for further methodological advancements, such as improved assembly algorithms, deeper sequencing, and optimized binning strategies, to enhance MAG recovery from low-biomass environments.
Future studies should integrate hybrid assembly approaches that combine short- and long-read sequencing to further enhance genome reconstruction in airborne metagenomes27. This strategy could improve contiguity, facilitate the recovery of mobile genetic elements, and enable the identification of previously undetected pathogens and ARGs. Additionally, increasing DNA recovery from air samples, such as using multiple high-volume air samplers, could enhance sequencing depth and assembly quality. Complementing metagenomics with cultivation-based microbiology, metatranscriptomics, and single-cell genomics could provide deeper insights into the functional potential and ecological dynamics of airborne microbial communities.
Conclusion
Our study demonstrates that co-assembly offers distinct advantages over individual assembly in reconstructing airborne microbial genomes and genes. Co-assembly provides a more reliable characterization of airborne microbial communities by reducing redundancy, enhancing genome completeness, and improving gene prediction accuracy. Most importantly, our findings highlight the role of dust storms in the global dispersion of ARGs. Notably, some of these genes confer resistance to medically important antibiotics, and appear to be mobile, potentially facilitating the wider spread of antibiotic resistance. Given the increasing concern over environmental antibiotic resistance, this research offers critical insights into the factors shaping ARG distribution in the atmosphere. By refining genomic and computational methods, and deepening our understanding of airborne microbial dynamics, we can improve risk assessment and develop more effective surveillance and mitigation strategies.
Materials and methods
Sample collection and atmospheric conditions
We conducted air sampling from the rooftop of a four-story building at the Weizmann Institute of Science in Rehovot, Israel (31.9070°N, 34.8102°E; 80 m above mean sea level) over a 64-day period, from the end of March to the late May 2022. This site is frequently exposed to dust storms originating from prominent Middle Eastern dust sources, including the Saharan Desert, Iraq, Iran, and Saudi Arabia28. For air sampling, we used the Coriolis µ microbial air sampler (Bertin Technologies). Samples were collected for 2 hours at a flow rate of 200 L/min. The impinger efficiently captures particles larger than 0.5 µm, with a collection efficiency (D50 value) surpassing this threshold. To maintain sample integrity, we stored the collected samples in 15 ml of a nucleic acid preservation solution containing 280 g L−1 ammonium sulfate, 25 mM sodium sulfate, and 10 mM EDTA (pH 5.2)58. To minimize evaporation, we supplemented the samples with a top-off flow rate of 0.5–1.5 ml/min of molecular-grade water. Before each sampling event, we thoroughly decontaminated the Coriolis sampler as previously described42. Following collection, we concentrated the sample onto autoclaved and sterilized polyethersulfone (PES) filters (diameter of 25 mm and a pore size of 0.1 μm; Sartorius Stedim Biotech) using a sterile single-use syringe and reusable syringe filter holders (Sartorius Stedim Biotech). The filters were then transferred into Falcon tubes and stored at −20 °C until further processing. DNA extraction was performed within 2–3 days post-collection to ensure optimal sample quality. To account for potential contamination, we generated operating blanks by running the sampler for 5 seconds without collecting air samples. This process was repeated on randomly selected dates, yielding a total of seven operating blanks throughout the study.
We also monitored particulate matter concentration (PM, μg m−3) and temperature (T, °C) using data from the Rehovot Air Monitoring station, located ~1 km from the sampling site. To estimate the origin of the dust and air masses, we used time-averaged maps of dust column mass density59 and the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model60,61. Additional details on the sampling campaign and the analysis methods can be found in a recent study42. Throughout the campaign, we collected a total of 45 air samples (Supplementary fig. 15 and Supplementary Data 9).
DNA Extraction, library preparation, and metagenomic sequencing
We extracted DNA from the air filters using the PowerWater DNA Isolation Kit (Qiagen, Dresden, Germany) following the manufacturer’s protocol. DNA quantification was performed using a Qubit 2.0 fluorometer with the Qubit dsDNA HS (High Sensitivity) Assay Kit (Invitrogen) and dsDNA HS buffer. The quantification results are provided in Supplementary Data 9. For shotgun metagenomic library preparation, we followed the Illumina DNA Prep Reference Guide. Briefly, we fragmented 1 ng of input DNA, performed a post-fragmentation cleanup, and amplified the fragments through 12 cycles of PCR while incorporating unique dual indexes (Nextera DNA UD Indexes) into each sample. To ensure balanced sequencing representation, we initially pooled equal volumes of each library to create a preliminary pool. This pool was purified using SPB beads and eluted in molecular-grade water. We quantified the eluted library using Qubit and assessed fragment distribution with TapeStation (D1000 ScreenTape). The quantified library was then loaded into a MiniSeq Mid-Output reagent cartridge (300 cycles) for preliminary sequencing. Based on these results, we generated a final equimolar pool, ensuring uniform representation of each sample. The final pooled library underwent another round of SPB bead cleanup and was eluted in water. We determined the concentration using Qubit and validated fragment size distribution with TapeStation. The library was then loaded onto a NovaSeq X lane for high-throughput sequencing with paired-end 2 × 150 bp reads. Library preparation and sequencing were conducted at the Rush Genomics and Microbiome Core Facility (GMCF) located at Rush University. Additional details on sequencing depth for each sample are provided in Supplementary Data 10.
Data processing and metagenomic analysis
We performed a comprehensive quality assessment of the metagenomic datasets using FastQC software (version 0.11.9, www.bioinformatics.babraham.ac.uk/projects/fastqc). Quality trimming and Illumina adapter removal were executed using Trimmomatic (version 0.75) with the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20, TRAILING:20, SLIDINGWINDOW:4:15, and MINLEN:35. After applying these quality control steps, we obtained an average of 4.3 million high-quality reads per sample, with read counts ranging from 3 to 11 million paired-end reads (Supplementary Data 11). Basic sequencing statistics for these samples were generated using SeqKit (v2.3.1)62. To examine microbial composition in relation to atmospheric conditions, we classified the metagenomic libraries into six distinct groups based on dominant air mass trajectories and microbial community similarities. These groupings were determined using permutational MANOVA (PERMANOVA), Principal Coordinates Analysis (PCoA) with Bray–Curtis dissimilarity, and hierarchical clustering based on taxonomic and functional characteristics, following our previous study42.
To improve metagenomic reconstruction, we merged samples within each group for co-assembly. Sample identifiers (t) correspond to time points spanning the entire sampling campaign, from t1 (beginning) to t64 (end) (Supporting Fig. 1): c_Northwest_1 (t12, t15, t16, t17, t23, and t24) and c_Southwest_1 (t5, t13, t14, t20, t22, t30, t31, t34, and t35) represent northwesterly and southwesterly air masses, respectively. These were observed under clear conditions before a period of consecutive dust storms and low daily mean temperatures. c_Northwest_2 (t50, t51, t56, t57, and t58) and c_Southwest_2 (t37, t38, t41, t42, t45, t49, t52, t55, t59, and t64) represent northwesterly and southwesterly air masses, respectively. These were observed under clear conditions following a period of consecutive dust storms and increasing mean daily temperatures. d_East (t1, t27, t28, t29, t32, and t62) and d_Southwest (t6, t7, t8, t9, t10, t11, t21, t33, and t36) represent dusty conditions of easterly and southwesterly air masses.
Metagenomic binning
We merged the high-quality reads and subsequently assembled and binned them using the metaWRAP (v.1.3.2) pipeline42. Briefly, for each sample, we co-assembled reads using the metaWRAP assembly module, which conducts the assembly with metaSPAdes (--metaspades)63. Contigs shorter than 500 bp were removed to improve assembly quality. To enhance binning, we reassembled the reads that did not map back to the metaSPAdes assembly using MegaHit64. We then performed metagenomic binning using three complementary algorithms: MaxBin265, CONCOCT66, and metaBAT267, utilizing the Binning module (-maxbin2 -concoct -metabat2). To refine the bins, we applied the bin_refinement module. Given the relatively low sequencing depth of our samples, we set a minimum completion threshold of 5% and a maximum contamination threshold of 10% during the refinement process. After refinement, we further improved the bins by using the reassemble_bins module. We assessed bin completeness and contamination using CheckM68 within this pipeline. The binning process grouped contigs into distinct genome bins, and after refinement and quality assessment, bins that met the thresholds were consolidated into metagenome-assembled genomes (MAGs), classified as follows: low-quality draft (<50% completeness, <10% contamination), medium quality draft (≥50% completeness, <10% contamination) and high-quality draft (>90% completeness, <5% contamination) based on established guidelines46.
We taxonomically annotated the MAGs using GTDB-Tk (v2.1.1)69 with the classify workflow (classify_wf). A phylogenomic tree of these MAGs was created using the Interactive Tree of Life (v6)70. Finally, we functionally annotated the MAGs using eggNOG-mapper (v2.1.12)71 against the EggNOG 6.0 database72 with the following parameters: --itype genome and -m diamond.
AR and virulence gene annotation
We predicted open reading frames (ORFs) for protein-coding genes using Prodigal (version 2.6.3)73 on contigs assembled with metaSPAdes, applying the ‘-p meta’ option. To ensure a comprehensive analysis, we examined all contigs when comparing individual and co-assembly approaches. However, given the highly fragmented and low-biomass nature of air samples, we excluded contigs shorter than 500 bp when detecting ARGs and mobility elements (e.g., plasmids) to enhance accuracy. To annotate AR and virulence genes in ORFs, we used hmmscan (HMMER, v3.3.2-gompic-2020b)74 with a stringent e-value threshold of 1 × 10−5. In instances where the same ORFs produced multiple hits to the same Pfam entry, we selected only the top matches with the highest alignment reliability (acc value), scores, and the lowest E-values. For antibiotic resistance, we utilized the Resfams HMM Database (Core, v1.2)75. To identify virulence-related genes, we initially searched for Pfam modules related to ‘virulence’ and ‘pathogenicity’ and downloaded the corresponding Pfam profiles76. We manually checked the virulence role of these Pfam modules through a literature search and removed spurious modules from our list. Subsequently, we integrated all Pfam models, including the Resfams HMM Database, and performed hmmscan as described above.
Mobile genetic element annotation
We used PLASMe36 to identify plasmid contigs from metagenome assemblies, setting the minimum coverage and identity thresholds for BLASTN at 90% and 70%, respectively (-c 0.9, -i 0.7, -p 0.5). We chose PLASMe due to its effectiveness with the short contig lengths. PLASMe’s hybrid approach, which combines sequence alignment and deep learning, outperforms other methods for short contig lengths. In contrast, many widely used tools require contig lengths longer than 1000 bp for accurate prediction.
To identify putatively mobile elements from metagenome assemblies, we used several tools. We employed hmmscan to predict integrative and conjugative elements (ICEs) on ORFs using the CONJScan HMM database (v2.0.1)77. In the hmmscan results, we retained only the best hits as previously described. Additionally, we used ISEScan78 to predict insertion sequence (IS) elements with default parameters and IntegronFinder (version 2.0)79 with the (--local-max) option to predict complete integrons or integron-associated elements, such as attC sites on assembled contigs.
We identified contigs linked to AR and virulence genes as putatively mobile if they were also linked to one of the IS elements, ICEs, and integrons. This classification is informed by the sizes of known composite transposons, which can range up to 65 Kbp (mean: 11.74 Kbp)80, ICEs that range from 20 Kbp to 500 Kbp56,81, and bacterial integrons, most of which are shorter than 10 Kbp82. Given that most AR and virulence gene associated contigs in our dataset did not exceed 10 Kbp, we infer that if these contigs contain both AR or virulence genes and mobile genetic elements, they are likely associated. This approach is consistent with previous studies83,84. Contigs not linked to any mobile elements or plasmids are considered chromosomal.
Sequence mapping
To estimate the abundance, we first mapped reads from each sample to each MAG, contig, or ORF using Bowtie2 (v2.5.1)85 with the (--very-sensitive-local) option. We then calculated coverage information using pileup.sh (BBMap, v39.01, sourceforge.net/projects/bbmap/) and converted it into reads per kilobase per million mapped reads (RPKM) to estimate relative abundance. This normalization procedure adjusts for both library size, accounting for the total number of mapped reads, and MAG, contig, or gene length, ensuring accurate rescaling. For ORFs predicted as partial or incomplete genes, we estimated the gene abundance accordingly.
Taxonomic identification
We performed taxonomic annotation on contigs using Kraken2 (v2.1.2)86 with default parameters and cross-check these results with Diamond-Megan-LR protocol87, which is specifically designed for long-read sequences. Additional details of these analyses are provided in the Supporting Information.
Assembly evaluation
We assessed both individual and co-assembled contigs without applying length-based filtering. To evaluate ORF completeness, we used Prodigal, categorizing genes into three categories: complete genes contain a true boundary (a start or a stop codon); partial genes lack either a start or stop codon; incomplete genes have fragmented or missing edges.
To reduce redundancy in individual assembly results, we clustered contigs (nucleotide level) and ORFs (amino acid level) using MMseqs2 (v13-45111)88 with the following parameters: --cov-mode 1, --cluster-mode 2, -c 0.95, and --min-seq-id 0.95. Further details of these analyses are provided in the Supporting Information. For protein abundance estimation, we mapped sequencing reads to the clustered protein-coding sequences and calculated RPKM as previously described. We then performed taxonomic and functional annotations on these non-redundant representative contigs, as well as genes derived from individual assemblies, estimating their abundance using the same approach.
To assess the quality of both individual and co-assemblies, we applied a reference-based evaluation using MetaQUAST (v5.2.0)89 with the --unique-mapping and --reuse-combined-alignments flags. These settings ensure that each contig is mapped to a single reference genome position. For this analysis, we used a curated set of reference genomes representing a subset of the airborne microbiome community, comprising 49 species (Supplementary Data 12). These species were selected based on their high overall abundance and prevalence, as determined in our previous study through read-based taxonomic annotation of the same metagenomic dataset42.
We focused on widely used assembly metrics39, including genome fraction, mismatches per 100 kbp, duplication ratio, the number of misassemblies and misassembled contigs length. The genome fraction represents the percentage of reference bases covered by assembled contigs after similarity-based mapping. Mismatches per 100 kbp indicate the number of mismatched bases in the contig-to-reference alignment. The duplication ratio is calculated as the total number of aligned bases in the assembly divided by the total number of aligned bases in the reference genome. The number of misassemblies quantifies contigs that contain gaps exceeding 1 kbp, insertions larger than 1 kbp, or alignments to multiple distinct genomes. Although NGA50 is a standard metric for assessing assembly contiguity, we were unable to compute it due to the low coverage of reference genomes, an inherent limitation of air samples’ low-biomass nature.
Impact of sequencing depth
To investigate how sequencing depth influences metagenomic assembly, we subsampled paired-end reads pooled based on microbiome similarity, as described previously. We generated subsamples at various depths (2 M, 5 M, 7.5 M, 10 M, 15 M, 20 M, 25 M, 30 M, and 35 M reads) using Seqtk (v1.3-GCC-10.2.0, https://github.com/lh3/seqtk) and assembled them with MetaSPAdes with the --metaspades option63. To evaluate the relationship between sequencing depth and assembly metrics, we applied a linear mixed-effects regression model, treating sequencing depth and assembly metrics as fixed effects while considering the sample type as a random effect. We tested three candidate models: linear regression, second- and third-order polynomial regression. Model selection was performed using the Akaike Information Criterion (AIC), with the model yielding the lowest AIC considered the most optimal fit.
Statistical analysis and hypothesis testing
We evaluated the normality of the data using statistical tests and visual assessments. We performed the Shapiro-Wilk test to formally assess normality and generated Q-Q plots to visually inspect the data distribution. Additionally, we calculated skewness and kurtosis to assess the symmetry and shape of the data. When the normality assumption was violated, we chose to use non-parametric methods. In certain instances, we observed mild deviations from normality but given the small sample size (n = 6), we employed non-parametric methods in these cases as well. To compare the two groups (co-assembly vs. individual assembly), we performed the Wilcoxon signed-rank test and Wilcoxon rank-sum test (Mann–Whitney U test). Statistical significance was defined as p < 0.05. We estimated the effect size of difference between two groups using the Wilcoxon effect size (r) with a 95% confidence interval (CI). The effect size was interpreted as follows: <0.3 (small effect), 0.30 to <0.5 (moderate effect), and ≥0.50 (large effect). For these analyses, we used the following R packages: stats, moments, rstatix and ggplot2.
List of pathogens
To assess the presence of microbial species with known pathogenic potential in the atmospheric microbiome, relevant to human, animal, and plant health, we utilized the comprehensive list provided by the Global Catalogue of Pathogens (gcPathogen) (accessed on 4 June 2024)90. This list consists of 1317 taxa, including 510 bacterial, 407 fungal, 226 viral, and 174 parasitic taxa. We compared the species names from this list with the outputs of Kraken2 and Diamond-Megan-LR analysis to identify their presence in air samples. While our comparison was conducted at the species level, it is important to note that pathogenicity is often determined at the strain level. Therefore, our results indicate the presence of pathogenic lineages rather than confirming the presence of specific pathogenic strains.
Assessment of contamination
To evaluate potential contamination and establish a baseline for accurate assembly, we collected blank (negative control) samples as described earlier. DNA yields from these blanks were either at or below the detection limit (0.005 ng/μL, measured using a Qubit 2.0 fluorometer). Shotgun sequencing of these blanks resulted in negligible sequencing reads for only three out of seven samples after quality assessment, and Kraken2 taxonomic classification confirmed substantially lower read counts in controls compared to air samples (Supplementary Data 10, 11 and 13). These findings suggest minimal background contamination and provide a reference for assessing assembly quality.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The metagenomic sequences utilized in this study have been deposited in the National Center for Biotechnology Information (NCBI) database under the accession number PRJNA1045528.
References
Holmes, A. H. et al. Understanding the mechanisms and drivers of antimicrobial resistance. Lancet 387, 176–187 (2016).
Munita, J. M. & Arias, C. A. Mechanisms of antibiotic resistance. Microbiol. Spectr. 4, https://doi.org/10.1128/microbiolspec.VMBF-0016-2015 (2016).
Kirchhelle, C. The Antibiocene—towards an eco-social analysis of humanity’s antimicrobial footprint. Hum. Soc. Sci. Commun. 10, 619 (2023).
Collaborators, G. B. D. A. R. Global burden of bacterial antimicrobial resistance 1990–2021: a systematic analysis with forecasts to 2050. Lancet 404, 1199–1226. https://doi.org/10.1016/S0140-6736(24)01867-1 (2024).
Groussin, M. et al. Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184, 2053–2067.e2018 (2021).
Ellabaan, M. M. H., Munck, C., Porse, A., Imamovic, L. & Sommer, M. O. A. Forecasting the dissemination of antibiotic resistance genes across bacterial genomes. Nat. Commun. 12, 2435 (2021).
Jiang, X. et al. Dissemination of antibiotic resistance genes from antibiotic producers to pathogens. Nat. Commun. 8, 15784 (2017).
Khedkar, S. et al. Landscape of mobile genetic elements and their antibiotic resistance cargo in prokaryotic genomes. Nucleic Acids Res. 50, 3155–3168 (2022).
Pehrsson, E. C. et al. Interconnected microbiomes and resistomes in low-income human habitats. Nature 533, 212–216 (2016).
Brumfield, K. D. et al. Genomic diversity of Vibrio spp. and metagenomic analysis of pathogens in Florida Gulf coastal waters following Hurricane Ian. mBio 14, e0147623 (2023).
Li, W. et al. Association between antibiotic resistance and increasing ambient temperature in China: an ecological study with nationwide panel data. Lancet Reg. Health West Pac. 30, 100628 (2023).
MacFadden, D. R., McGough, S. F., Fisman, D., Santillana, M. & Brownstein, J. S. Antibiotic resistance increases with local temperature. Nat. Clim. Chang. 8, 510–514 (2018).
Cruz-Loya, M. et al. Stressor interaction networks suggest antibiotic resistance co-opted from stress responses to temperature. ISME J. 13, 12–23 (2019).
Usmani, M. et al. Combating cholera by building predictive capabilities for pathogenic Vibrio cholerae in Yemen. Sci. Rep. 13, 2255 (2023).
Mao, G. et al. Monsoon affects the distribution of antibiotic resistome in Tibetan glaciers. Environ. Pollut. 317, 120809 (2023).
Lee, S. et al. Residential urban stormwater runoff: a comprehensive profile of microbiome and antibiotic resistance. Sci. Total Environ. 723, 138033 (2020).
van Bavel, B. et al. Intersections between climate change and antimicrobial resistance: a systematic scoping review. Lancet Planet Health 8, e1118–e1128 (2024).
Zhang, Z. et al. Assessment of global health risk of antibiotic resistance genes. Nat. Commun. 13, 1553 (2022).
Bai, H. et al. Airborne antibiotic resistome and microbiome in pharmaceutical factories. Environ. Int. 186, 108639 (2024).
Bai, H. et al. Spread of airborne antibiotic resistance from animal farms to the environment: dispersal pattern and exposure risk. Environ. Int. 158, 106927 (2022).
Rodo, X. et al. Microbial richness and air chemistry in aerosols above the PBL confirm 2,000-km long-distance transport of potential human pathogens. Proc. Natl. Acad. Sci. USA 121, e2404191121 (2024).
Mazar, Y., Cytryn, E., Erel, Y. & Rudich, Y. Effect of dust storms on the atmospheric microbiome in the Eastern Mediterranean. Environ. Sci. Technol. 50, 4194–4202 (2016).
Gat, D., Mazar, Y., Cytryn, E. & Rudich, Y. Origin-dependent variations in the atmospheric microbiome community in Eastern Mediterranean Dust Storms. Environ. Sci. Technol. 51, 6709–6718 (2017).
Luhung, I. et al. Experimental parameters defining ultra-low biomass bioaerosol analysis. NPJ Biofilms Microbiomes 7, 37 (2021).
Drautz-Moses, D. I. et al. Vertical stratification of the air microbiome in the lower troposphere. Proc. Natl. Acad. Sci. USA 119, e2117293119 (2022).
Gusareva, E. S. et al. Microbial communities in the tropical air ecosystem follow a precise diel cycle. Proc. Natl. Acad. Sci. USA 116, 23299–23308 (2019).
Reska, T. et al. Air monitoring by nanopore sequencing. ISME Commun. 4, ycae099 (2024).
Bodenheimer, S., Lensky, I. M. & Dayan, U. Characterization of Eastern Mediterranean dust storms by area of origin; North Africa vs. Arabian Peninsula. Atmos. Environ. 198, 158–165 (2019).
Luhung, I. et al. Protocol improvements for low concentration DNA-based bioaerosol sampling and analysis. PLoS One 10, e0141158 (2015).
Lever, M. A. et al. A modular method for the extraction of DNA and RNA, and the separation of DNA pools from diverse environmental sample types. Front. Microbiol. 6, 476 (2015).
Boifot, K. O., Skogan, G. & Dybwad, M. Sampling efficiency and nucleic acid stability during long-term sampling with different bioaerosol samplers. Environ. Monit. Assess. 196, 577 (2024).
Botelho, J. & Schulenburg, H. The role of integrative and conjugative elements in antibiotic resistance evolution. Trends Microbiol. 29, 8–18 (2021).
Castaneda-Barba, S., Top, E. M. & Stalder, T. Plasmids, a molecular cornerstone of antimicrobial resistance in the One Health era. Nat. Rev. Microbiol. 22, 18–32 (2024).
Hall, J. P. J., Harrison, E. & Baltrus, D. A. Introduction: the secret lives of microbial mobile genetic elements. Philos. Trans. R. Soc. Lond. B Biol. Sci. 377, 20200460 (2022).
Harrison, E. & Brockhurst, M. A. Plasmid-mediated horizontal gene transfer is a coevolutionary process. Trends Microbiol. 20, 262–267 (2012).
Tang, X., Shang, J., Ji, Y. & Sun, Y. PLASMe: a tool to identify PLASMid contigs from short-read assemblies using transformer. Nucleic Acids Res. 51, e83 (2023).
Delgado, L. F. & Andersson, A. F. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 10, 72 (2022).
Vosloo, S. et al. Evaluating de novo assembly and binning strategies for time series drinking water metagenomes. Microbiol. Spectr. 9, e0143421 (2021).
Meyer, F. et al. Critical assessment of metagenome interpretation: the second round of challenges. Nat. Methods 19, 429–440 (2022).
Larsson, D. G. J. & Flach, C. F. Antibiotic resistance in the environment. Nat. Rev. Microbiol. 20, 257–269 (2022).
Krawczyk, P. S., Lipinski, L. & Dziembowski, A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res. 46, e35 (2018).
Erkorkmaz, B. A., Zeevi, D. & Rudich, Y. Dust storm-driven dispersal of potential pathogens and antibiotic resistance genes in the Eastern Mediterranean. Sci. Total Environ. 958, 178021 (2025).
Burk, D. L., Ghuman, N., Wybenga-Groot, L. E. & Berghuis, A. M. X-ray structure of the AAC(6’)-Ii antibiotic resistance enzyme at 1.8 A resolution; examination of oligomeric arrangements in GNAT superfamily members. Protein Sci. 12, 426–437 (2003).
Miller, G. H. et al. The most frequent aminoglycoside resistance mechanisms–changes with time and geographic area: a reflection of aminoglycoside usage patterns? Aminoglycoside resistance study groups. Clin. Infect. Dis. 24, S46–S62 (1997).
Jesudason, T. WHO publishes list of medically important antimicrobials. Lancet Infect. Dis. 24, e284 (2024).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Rigonato, J. et al. Aliterella atlantica gen. nov., sp. nov., and Aliterella antarctica sp. nov., novel members of coccoid Cyanobacteria. Int. J. Syst. Evol. Microbiol 66, 2853–2861 (2016).
Battistuzzi, F. U. & Hedges, S. B. A major clade of prokaryotes with ancient adaptations to life on land. Mol. Biol. Evol. 26, 335–343 (2009).
Poussin, C. et al. Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge. BMC Genom. 23, 624 (2022).
Miossec, M. J. et al. Evaluation of computational methods for human microbiome analysis using simulated data. PeerJ 8, e9688 (2020).
Chow, C. S., Lamichhane, T. N. & Mahto, S. K. Expanding the nucleotide repertoire of the ribosome with post-transcriptional modifications. ACS Chem. Biol. 2, 610–619 (2007).
Kormanec, J. & Homerova, D. Streptomyces aureofaciens whiB gene encoding putative transcription factor essential for differentiation. Nucleic Acids Res. 21, 2512 (1993).
Cao, M., Bernat, B. A., Wang, Z., Armstrong, R. N. & Helmann, J. D. FosB, a cysteine-dependent fosfomycin resistance protein under the control of sigma(W), an extracytoplasmic-function sigma factor in Bacillus subtilis. J. Bacteriol. 183, 2380–2383 (2001).
Beharry, Z. & Palzkill, T. Functional analysis of active site residues of the fosfomycin resistance enzyme FosA from Pseudomonas aeruginosa. J. Biol. Chem. 280, 17786–17791 (2005).
Fillgrove, K. L., Pakhomova, S., Schaab, M. R., Newcomer, M. E. & Armstrong, R. N. Structure and mechanism of the genomically encoded fosfomycin resistance protein, FosX, from Listeria monocytogenes. Biochemistry 46, 8110–8120 (2007).
Wozniak, R. A. & Waldor, M. K. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nat. Rev. Microbiol. 8, 552–563 (2010).
Kerkvliet, J. J. et al. Metagenomic assembly is the main bottleneck in the identification of mobile genetic elements. PeerJ 12, e16695 (2024).
Erkorkmaz, B. A., Gat, D. & Rudich, Y. Aerial transport of bacteria by dust plumes in the Eastern Mediterranean revealed by complementary rRNA/rRNA-gene sequencing. Commun. Earth Environ. 4, 24 (2023).
Acker, J. G. & Leptoukh, G. Online analysis enhances use of NASA Earth science data. EoS Trans. Am. Geophys. Union 88, 14–17 (2007).
Rolph, G., Stein, A. & Stunder, B. Real-time Environmental Applications and Display sYstem: READY. Environ. Model. Softw. 95, 210–228 (2017).
Cohen, M. D. et al. NOAA’s HYSPLIT atmospheric transport and dispersion modeling system. Bull. Am. Meteorol. Soc. 96, 2059–2077 (2015).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11, e0163962 (2016).
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res 27, 824–834 (2017).
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 52, W78–W82 (2024).
Cantalapiedra, C. P., Hernandez-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and ___domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Hernandez-Plaza, A. et al. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res. 51, D389–D394 (2023).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).
Eddy, S. R. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).
Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 9, 207–216 (2015).
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
Cury, J., Abby, S. S., Doppelt-Azeroual, O., Néron, B. & Rocha, E. P. C. in Horizontal Gene Transfer: Methods and Protocols (ed Fernando de la Cruz) 265-283 (Springer, 2020).
Xie, Z. & Tang, H. ISEScan: automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 33, 3340–3347 (2017).
Neron, B. et al. IntegronFinder 2.0: identification and analysis of integrons across bacteria, with a focus on antibiotic resistance in Klebsiella. Microorganisms 10, 700 (2022).
Nielsen, T. K., Browne, P. D. & Hansen, L. H. Antibiotic resistance genes are differentially mobilized according to resistance mechanism. Gigascience 11, https://doi.org/10.1093/gigascience/giac072 (2022).
Johnson, C. M. & Grossman, A. D. Integrative and conjugative elements (ICEs): what they do and how they work. Annu. Rev. Genet. 49, 577–601 (2015).
Zhang, A. N. et al. Conserved phylogenetic distribution and limited antibiotic resistance of class 1 integrons revealed by assessing the bacterial genome and plasmid collection. Microbiome 6, 130 (2018).
Kent, A. G., Vill, A. C., Shi, Q., Satlin, M. J. & Brito, I. L. Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C. Nat. Commun. 11, 4379 (2020).
Lee, K. et al. Population-level impacts of antibiotic usage on the human gut microbiome. Nat. Commun. 14, 1191 (2023).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Bagci, C., Patz, S. & Huson, D. H. DIAMOND+MEGAN: fast and easy taxonomic and functional analysis of short and long microbiome sequences. Curr. Protoc. 1, e59 (2021).
Steinegger, M. & Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. & Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150 (2018).
Guo, C. et al. gcPathogen: a comprehensive genomic resource of human pathogens for public health. Nucleic Acids Res. 52, D714–D723 (2024).
Acknowledgements
We acknowledge the Weizmann Institute of Science Sustainability and Energy Research Initiative (SAERI) and the de Botton Center for Marine Science for partially supporting this study through a research grant.
Author information
Authors and Affiliations
Contributions
B.A.E. conceptualized and designed the study, conducted bioinformatics and data analysis, interpreted, and visualized the results, and drafted the manuscript. Y.R. and D.Z. contributed to reviewing and editing the manuscript. All authors actively participated in result discussions and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
Y.R. is an Editorial Board Member for Communications Earth & Environment, but was not involved in the editorial review of, nor the decision to publish this article. All other authors have no competing interests to declare.
Peer review
Peer review information
Communications Earth & Environment thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Alice Drinkwater. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Erkorkmaz, B.A., Zeevi, D. & Rudich, Y. Metagenomic co-assembly uncovers mobile antibiotic resistance genes in airborne microbiomes. Commun Earth Environ 6, 397 (2025). https://doi.org/10.1038/s43247-025-02381-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s43247-025-02381-3