Metagenomic co-assembly uncovers mobile antibiotic resistance genes in airborne microbiomes

Erkorkmaz, Burak Adnan; Zeevi, David; Rudich, Yinon

doi:10.1038/s43247-025-02381-3

Download PDF

Article
Open access
Published: 22 May 2025

Metagenomic co-assembly uncovers mobile antibiotic resistance genes in airborne microbiomes

Communications Earth & Environment volume 6, Article number: 397 (2025) Cite this article

1502 Accesses
Metrics details

Subjects

Abstract

Antibiotic resistance poses a major global threat to health and ecosystems. Genes that confer resistance and can move between microbes, known as mobile antibiotic resistance genes, are especially concerning, as they can spread into disease-causing bacteria. While many pathways for gene dispersal are known, airborne transport remains poorly understood due to the difficulty of collecting and analyzing low-biomass air samples. Here we sampled air during clear weather and Middle Eastern dust storms and applied a genomic method called co-assembly, which merges multiple datasets to improve gene detection. This approach enhanced gene recovery and revealed resistance genes against clinically important antibiotics, including aminoglycosides, beta-lactams, fosfomycin, glycopeptides, quinolones, and tetracyclines. However, most resistance genes were not clearly linked to mobile elements, reflecting the complexity of airborne microbial communities. Our findings suggest the potential for long-range airborne spread of antibiotic resistance and underscore the need for continued atmospheric monitoring and strategies to mitigate environmental dissemination.

Genomic attributes of airway commensal bacteria and mucosa

Article Open access 12 February 2024

Sierra Nevada sweep: metagenomic measurements of bioaerosols vertically distributed across the troposphere

Article Open access 24 July 2020

Detection and genetic characterization of multidrug-resistant staphylococci isolated from public areas in an international airport

Article Open access 12 November 2024

Introduction

Antibiotic resistance (AR) is an evolutionary process in which microorganisms acquire the ability to survive and proliferate in the presence of antibiotics. This resistance emerges through random mutations in microbial DNA and horizontal gene transfer^1,2. While AR is a natural survival mechanism, human activities have significantly amplified its scale and impact, transforming it into a significant global threat to public health and ecosystem stability. The widespread misuse of antibiotics in healthcare, agriculture, and industry have accelerated the emergence and dissemination of resistance³. By 2050, drug-resistant infections are projected to cause nearly 2 million deaths annually, resulting in severe social and economic consequences⁴. A key mechanism driving the spread of AR is the horizontal transfer of antibiotic resistance genes (ARGs) between diverse microbial species^5,6,7. ARGs frequently associate with mobile genetic elements (MGEs), such as plasmids, transposons, and integrative conjugative elements, enabling their movement across species and ecological boundaries. This genetic mobility significantly accelerates the evolution and spread of resistance^8,9. Climate change further aggravates this crisis by influencing the environmental dissemination of ARGs since extreme weather events, such as floods, high winds, dust storms, and rising temperatures, can enhance the transport and persistence of resistant microbes in the environment^{10,11,12,13,14,15,16,17}.

Airborne ARGs are an emerging concern, yet their spread remains poorly understood. These ARGs originate from diverse sources, including agricultural fields, wastewater treatment plants, industrial sites, and urban environments^18,19,20. Microorganisms carrying ARGs can attach to aerosols, traveling long distances and crossing geographical and ecological barriers^21,22,23. While ARG dynamics in soil, water, and terrestrial ecosystems have been studied extensively, the role of atmospheric processes, such as air mass transport and dust storms, in ARG dispersal remains underexplored. This knowledge gap is critical, as atmospheric transport could facilitate the intercontinental spread of ARGs, potentially influencing global AR patterns. Addressing this gap is essential for developing effective mitigation strategies.

Studying the atmospheric microbiome, particularly airborne ARGs and MGEs, presents significant challenges due to the inherently low biomass of atmospheric microbial communities^24,25,26,27. Comprehensive microbiome analysis requires tens of millions of high-quality sequencing reads, which are difficult to obtain from atmospheric samples. These limitations hinder our ability to accurately characterize the composition, functional potential, and mobility of airborne microbes. The challenge is further exacerbated during transient events such as dust storms, where the sampling window to capture peak microbial loads is typically restricted to a few hours²⁸. Extending sampling durations can increase biomass collection for metagenomic analyses, but it may also dilute event-specific microbial signatures, making it difficult to differentiate dust storm-associated microbiota from background air microbiomes. While optimized sampling and DNA extraction protocols can enhance taxonomic resolution, their effectiveness in linking functional microbiome traits, such as ARG mobility, remains uncertain^24,29,30,31. Higher DNA recovery does not necessarily translate to improved resolution in downstream bioinformatic analyses, as sequencing depth alone does not guarantee accurate reconstruction of functional and phylogenetic relationships. A major challenge in characterizing airborne ARGs and MGEs is the need to assemble sufficiently long genomic fragments to reliably identify genetic mobility and resistance determinants. For example, plasmids, which are key vectors of ARGs^6,32,33, are difficult to detect in metagenomic datasets due to their high sequence diversity and frequent genetic recombination³⁴. Additionally, homologous regions shared between plasmids and chromosomal DNA, a consequence of co-evolution and horizontal gene transfer³⁵, complicate accurate classification. Short DNA fragments, which may encode only partial genes or non-coding sequences, present significant difficulties for plasmid prediction tools, reducing the reliability of ARG mobility assessments³⁶.

To overcome these challenges, improved methodologies are needed. Strategies should focus on maximizing DNA recovery, improving the assembly of longer genomic fragments, and integrating advanced sequencing technologies, such as the combination of short- and long-read sequencing²⁷. One promising approach is the co-assembly of air samples from similar environmental origins, which could enhance the detection of low-abundance genes that are otherwise undetectable in individual samples. This strategy may also yield longer contigs, enabling a more detailed analysis of microbial traits^37,38,39. However, the inherently low biomass and high diversity of airborne microbial communities could introduce potential biases during co-assembly, such as misassemblies or the loss of rare ARGs and plasmids. Developing robust approaches to balance these trade-offs is essential for advancing our understanding of airborne ARGs and their global implications.

Thus, our study aims to (i) evaluate the effectiveness of co-assembling atmospheric genetic data in addressing these key technical limitations, and (ii) by analyzing longer DNA fragments, determine whether ARGs are mobile within the atmospheric microbiome, particularly during dust storms. Understanding these processes can provide critical insights into the airborne dissemination of AR and its broader implications for microbial ecology and public health.

Results

Individual versus co-assembly

To compare the performance of co-assembly against individual assembly for the same set of samples, we grouped 45 air samples into six distinct subgroups based on their taxonomic and functional characteristics, as detailed in the Materials and methods section. Each subgroup was then co-assembled. In co-assembly, all sequencing reads from different samples were pooled and assembled, generating a non-redundant set of contigs and genes. In contrast, individual assembly processes each sample separately, potentially leading to redundant contigs and genes across samples. To ensure a fair comparison, we reduced redundancy in the individual assemblies by clustering similar sequences into families and organizing them according to the corresponding co-assembled samples.

Assembly quality

We assessed assembly quality using four key metrics: genome fraction, duplication ratio, mismatches per 100 kbp, and the number of misassemblies. Across the 49 reference genomes representing a subset of the air microbiome, co-assembly consistently outperformed individual assembly. Specifically, co-assembly achieved a slightly higher genome fraction (4.94 ± 2.64%, mean ± SD) compared to individual assembly (4.83 ± 2.71%). More notably, co-assembly exhibited a lower duplication ratio (1.09 ± 0.06 vs. 1.23 ± 0.20), fewer mismatches per 100 kbp (4379.82 ± 339.23 vs. 4491.1 ± 344.46), and fewer misassemblies (277.67 ± 107.15 vs. 410.67 ± 257.66).

To further understand these differences, we performed a comparative analysis of individual reference genomes across the 49 species. Co-assembly significantly reduced both the duplication ratio and the number of misassemblies (paired one-sided Wilcoxon signed-rank tests, p < 0.05, except for two samples in misassemblies), with a large effect size (Wilcoxon, r ≥ 0.5). This large effect indicates that the improvements observed are not only statistically significant but also notably important for the dataset. Although the differences in genome fraction and mismatches per 100 kbp did not reach statistical significance, likely due to the small magnitude of change and limited genome coverage in our reference genomes, they could still hold biological relevance, as discussed in the Discussion section. These results are presented in Fig. 1 and Supplementary Data 1.

**Fig. 1: Comparison of assembly metrics between individual and co-assembly.**

Effect of sequencing depth

To evaluate how sequencing depth influences assembly quality, we analyzed whether pooling air samples, each initially sequenced at an average depth of 4.29 ± 1.45 million (mean ± SD) paired-end reads after quality control, improved assembly metrics and increased genome fraction. We observed variability in assembly performance across atmospheric metagenomes representing different conditions. For instance, some samples, such as d_East, exhibited consistently higher values across multiple metrics, while others, such as c_Northwest_1, showed lower values. Despite these differences, we identified meaningful trends across datasets. As expected, genome fraction (%) increased with sequencing depth, as indicated by our regression model. Other assembly metrics followed more complex trajectories. For instance, the effect of sequencing depth on mismatches per 100 kbp was modest, with only about 0.1% of the variance in this metric explained by the fixed effects of the model (marginal R² < 0.01). In contrast, when considering both the fixed and random effects, such as variability between samples, nearly all the variance was accounted for (conditional R² = 0.99). This large variance between samples likely reflects differences in microbial composition and the varying degrees of environmental stressors to which the samples were exposed during transport, such as low humidity, UV radiation and pollutants. These factors could affect airborne DNA during atmospheric transport, potentially leading to sequencing errors or modifications. Other assembly metrics exhibited non-linear trends. Duplication ratio and misassembled contig length initially increased with sequencing depth but plateaued once sequencing reached ~30 million reads. This saturation point suggests that genome coverage and assembly accuracy reach a threshold beyond which additional sequencing results in diminishing returns for these metrics. The initial increase in assembly errors likely resulted from the overrepresentation of highly abundant genomes, challenges in assembling repetitive regions, and the complexity of merging reads from divergent genomes. However, as sequencing depth reached sufficient levels, these errors stabilized, reflecting an overall improvement in assembly quality. In summary, these results indicate that co-assembling air samples enhances genome coverage while minimizing assembly errors, particularly once sequencing depth reaches the point of saturation. These results are presented in Fig. 2.

Contig length and gene prediction

Next, we compared the contig length distributions generated by co-assembly and individual assembly. Co-assembly produced a higher number of longer contigs (762,369 contigs ≥500 bp) and a greater total contig length (555.79 million bp in contigs ≥500 bp) compared to individual assembly (455,333 contigs and 334.31 million bp, respectively, Supplementary Data 2). Statistical analysis confirmed that co-assembly resulted in significantly more contigs and a longer total contig length (≥500 bp) than individual assembly (paired one-sided Wilcoxon signed-rank test, p < 0.05), with a large effect size (Wilcoxon, r ≥ 0.5, Supplementary Data 3). However, as expected, the differences in the number and total length of all contigs (≥0 bp) and very long contigs (≥5000 bp) were not significant across all samples (Fig. 3a, b). This likely reflects the inherent challenges of assembling longer contigs in low-biomass samples, where fragmented reads and low coverage limit the ability to reconstruct very long sequences.

**Fig. 3: Comparison of contig and gene prediction metrics between co-assembly and individual assembly.**

We also compared gene prediction between co-assembly and individual assembly across all contigs. Co-assembly generated significantly more complete genes and a longer total gene length than individual assembly (paired one-sided Wilcoxon signed-rank test, p < 0.05), with a large effect size (Wilcoxon, r ≥ 0.5, Supplementary Data 3). In contrast, individual assembly yielded more incomplete genes, but the total gene length was unaffected, and the differences in incomplete gene counts were not statistically significant. These findings suggest that co-assembly enhances the recovery of complete genes, likely by assembling fragmented sequences into more complete forms, thereby improving overall gene prediction quality (Fig. 3c, d). The advantages of co-assembly were even more pronounced when comparing cumulative gene length distributions (Fig. 3e-g).

Furthermore, we found that co-assembly generally produced longer contigs annotated with at least one AR and virulence gene, and these contigs exhibited higher coverage (one-sided Mann–Whitney U test, p < 0.05). Similarly, co-assembly generated longer ORFs associated with AR and virulence genes, with these genes also displaying higher coverage (one-sided Mann–Whitney U test, p < 0.05). However, the observed differences were relatively small (Wilcoxon, r < 0.3), likely because only a fraction of functionally relevant fragments from individual assemblies were successfully reconstructed into longer sequences during co-assembly. These findings suggest that co-assembly enhances the reconstruction of functionally relevant genomic elements, providing a more accurate and comprehensive representation of genes associated with AR and virulence. These results are presented in Supplementary Fig. 1 and Supplementary Data 4.

Microbial composition, diversity, and richness

We found that the composition, diversity, and richness of AR and virulence genes in the assembled contigs were generally comparable between individual and co-assembly approaches (Supplementary Fig. 2). However, richness was slightly higher in individual assembly (paired two-sided Wilcoxon signed-rank test, p = 0.035). This difference was primarily due to low-abundance ORFs, which are more prone to fragmentation during assembly. These fragmented ORFs were often annotated with distinct AR and virulence genes in individual and co-assembled contigs, contributing to the observed richness differences (Supplementary Fig. 3). A more detailed discussion of these findings is provided in the Discussion section.

To assess whether the increased total contig number and length in co-assembly influenced taxonomic annotation, we used Kraken2 to compare co-assembled and individually assembled datasets. Our results showed that composition, diversity, and richness were comparable between the two approaches (Supplementary Fig. 4). To further validate these results, we employed the DIAMOND-MEGAN-LR protocol for taxonomic classification. Unlike Kraken2, which assigns taxonomy through exact k-mer matching, DIAMOND-MEGAN-LR relies on translated sequence alignment against protein databases. This allows for more sensitive detection of distant homologs, which is particularly useful in analyzing metagenomes with highly fragmented or poorly annotated reference sequences. For instance, many airborne pathogens occur at low abundance, and their genomes may be highly fragmented or lack well-characterized reference sequences in nucleotide databases. This can lead to false negatives if taxonomic classification relies solely on k-mer-based approaches. Therefore, combining DIAMOND-MEGAN-LR with Kraken2 provides a complementary strategy for analyzing environmental and low-biomass microbiomes. Our results from DIAMOND-MEGAN-LR corroborated those from Kraken2, further confirming that co-assembly and individual assembly yielded similar taxonomic compositions, diversity, and richness (Supplementary Fig. 5). However, we observed notable differences between the Kraken2 and Diamond-Megan-LR in the total number of mapped reads, and estimated diversity and richness across taxonomic levels. For instance, Kraken2 generally reported higher diversity and richness at the species, genus, and phylum levels, although the difference in phylum-level diversity was not statistically significant. Additionally, the total number of reads mapped to taxonomic classifications varied between methods and was sample-dependent (Supplementary Fig. 6).

These methodological differences were particularly evident in pathogen detection. Both Kraken2 and DIAMOND-MEGAN-LR identified key pathogens, including Klebsiella pneumoniae, Escherichia coli, Salmonella enterica, Staphylococcus aureus, and Pantoea agglomerans. However, their abundance estimates varied between methods. Notably, DIAMOND-MEGAN-LR identified fewer taxa overall compared to Kraken2 but detected some taxa that Kraken2 missed, including Vibrio cholerae, Klebsiella oxytoca, and Alternaria species (Supplementary Data 5). Additional details on the potential hosts and diseases associated with these species are provided in Supplementary Data 6.

Plasmid detection

Plasmids are key vectors in the global dissemination of ARGs^33,40. To evaluate plasmid prediction performance, we compared individual and co-assembly approaches. Since plasmid prediction tools achieve higher sensitivity and precision with longer contigs^36,41, we restricted our analysis to contigs ≥500 bp. We found that co-assembly detected significantly more plasmids-associated contigs than individual assembly (2422 ± 2246 RPKM vs. 1678 ± 2056 RPKM, mean ± SD; paired one-sided Wilcoxon signed-rank test, p = 0.031), with a large effect size (Wilcoxon, r = 0.813). Similarly, co-assembly yielded a higher abundance of plasmid contigs carrying AR and virulence genes compared to individual assembly (233 ± 288 RPKM vs. 190 ± 328 RPKM). However, this difference was not statistically significant (paired one-sided Wilcoxon signed-rank test, p = 0.156). These findings suggest that co-assembly enhances plasmid detection, providing a more accurate and comprehensive representation of airborne mobile AR and virulence genes. This improvement is likely due to increased genome coverage and the enhanced resolution of fragmented sequences compared to individual assemblies.

Overall, these improvements in genome reconstruction and gene detection are particularly relevant for studying mobile ARGs in low biomass environments. By enhancing the recovery of complete and high-confidence ARG sequences, including those carried on plasmids, co-assembly provides a more accurate and comprehensive representation of ARG dynamics in airborne microbial communities.

Mobile antibiotic resistance genes and virulence factors

We identified a subset of the air metagenomes containing AR and virulence genes, which we quantified relative to the total ORF abundance across all samples. Specifically, AR and virulence genes accounted for 0.89 ± 1.24% and 0.07 ± 0.09% (mean ± SD) in contigs ≥500 bp, respectively, and 0.85 ± 0.85% and 0.06 ± 0.06% when considering all contigs.

To assess potential biases in functional annotation, we examined whether read mapping and contig length differed between functionally annotated contigs and the entire dataset. We found that contigs carrying at least one AR or virulence gene exhibited trends in read mapping and length distribution like those of total contigs in each sample. Notably, many of these genes were annotated on contigs longer than ~250 bp (Supplementary Fig. 7). Short contigs with extremely low read coverage are often associated with spurious functional annotations. Therefore, our analysis suggests that the observed functional annotations are largely reliable. However, we identified a small subset of contigs, up to approximately 400 bp in length, that lacked any matched reads during read mapping (zero-coverage contigs in Supplementary Fig. 7, x-axis). This finding suggests that annotations on very short contigs may be prone to bias, possibly due to the formation of chimeric contigs. To focus specifically on mobile AR and virulence genes, and to ensure accurate functional annotation and reliable abundance estimates, we analyzed only those present on contigs ≥500 bp. However, for completeness, we also report the abundance of all AR and virulence genes (without size filtering) in Supplementary Data 7.

Our analysis revealed a diverse array of AR and virulence genes in air samples, with their composition and abundance varying across environmental conditions. Specifically, samples collected during elevated temperatures and Easterly dust storms exhibited higher total gene abundance (RPKM) compared to samples collected under lower temperatures, including those affected by dust storms from the southwest (Figs. 4a, b). This pattern likely arises because Easterly dust storms transport anthropogenically influenced microbial communities. These air masses pass through the Fertile Crescent, a region spanning northern Israel, Syria, and Iraq, where human activity is more prevalent. In contrast, southwesterly and northwesterly air masses predominantly traverse the Mediterranean Sea and Sahara Desert, regions characterized by vast sand dunes and low population densities. However, southwesterly air masses occasionally pass over Cairo, a highly populated area, which may contribute to variations in microbial composition. The observed differences between c_Northwest_1 and c_Northwest_2, as well as between c_Southwest_1 and c_Southwest_2, may be attributable to rising temperatures or the influence of a major Easterly dust storm occurring between these sampling periods⁴².

**Fig. 4: Composition of antibiotic resistance and virulence genes, and their predicted mobility in air samples.**

We found that most AR and virulence genes were not associated with mobile genetic elements or plasmids and were therefore classified as chromosomal (90.80 ± 1.36% and 90.30 ± 4.15%, respectively, mean ± SD). The remaining genes were associated with plasmids, ICEs, IS elements, and integrons as follows: plasmids (7.13 ± 1.10% and 8.03 ± 4.05%), conjugation elements (1.84 ± 0.33% and 1.86 ± 1.02%), insertion sequences (0.29 ± 0.10% and 0.29 ± 0.11%), and integrons (0.03 ± 0.01% for AR contigs only).

The composition of mobile AR and virulence genes varied widely across different samples (Fig. 4c, d). ABC Transporter genes, resistance-modulating genes, and acetyltransferase-based resistance mechanisms were predominant. Mobile virulence factors were less diverse, mainly dominated by functions associated with stress survival. Interestingly, acetyltransferases, which catalyze N-acetyltransferase reactions⁴³, are often found in clinical isolates where they modify antibiotics through acetylation, rendering them ineffective⁴⁴. Despite their overall high abundance, genes conferring antibiotic resistance through acetyltransferases were rare among mobile antibiotic resistance genes. This may be due to the prevalence of other resistance mechanisms, such as efflux pumps, which might be favored by environmental selective pressures, resulting in less frequent mobilization of acetyltransferase genes. Virulence-related genes identified in air samples were primarily associated with virulence factors that enhance pathogenicity in specific bacterial species. However, many of these genes are also present in non-pathogenic microbes⁴². The dual presence makes it challenging to assess their direct impact on human health.

Importantly, we detected mobile AR genes conferring resistance to a wide range of medically significant antibiotics, including aminoglycosides, beta-lactams, fosfomycin, glycopeptides, quinolones, and tetracyclines⁴⁵. The abundance of mobile AR genes varied considerably between samples, reflecting similar trends observed for total ARGs. Notably, during a typical Easterly dust storm, when PM₁₀ concentrations exceeded 900 µg/m³, dust particles transported up to 42 times more total mobile ARGs (measured by RPKM abundance) compared to clear-air conditions (Fig. 5). This highlights the potential role of atmospheric transport in disseminating mobile antibiotic resistance genes across large geographic regions.

**Fig. 5: Antibiotic resistance genes are associated with mobile genetic elements.**

Metagenome-assembled genomes

To further investigate the functional potential of airborne microbial communities and their implications for human and ecosystem health, we attempted to reconstruct MAGs from air samples. Although co-assembly improved genome reconstruction, the resulting datasets remained highly fragmented, as shown in previous sections. To address this challenge, we tested whether metagenomic binning could provide deeper insights into the association between ARGs and potential pathogens.

Using co-assembled reads, we recovered 39 MAGs, ranging from low-quality to medium quality with an average completeness of 13.57 ± 10.88% (mean ± SD), based on established guidelines (Supplementary Data 8)⁴⁶. Despite testing various co-assembly strategies, including combining all dusty-day samples to improve MAG completeness, we did not achieve higher recovery rates. Among all generated bins, only 13 (22.84 ± 14.44%, mean ± SD completeness) could be phylogenetically classified (Supplementary Figs. 8–13).

The taxonomic composition of MAGs closely reflected the origins of air masses. Clear-day samples, influenced by northwesterly and southwesterly air masses, primarily contained MAGs classified within Cyanobacteriota, specifically the Aliterella genus, a taxon commonly found in aquatic environments⁴⁷. In contrast, dusty-day samples, driven by easterly air masses, predominantly harbored MAGs of soil-dwelling bacteria, including members of Actinomycetota and Bacillota, with genera such as Geodermatophilus, Kocuria, Rubrobacter and Arthrobacter⁴⁸. These findings further support the air mass origins identified in our study, aligning with our previous short-read taxonomic analysis⁴² and complementing the long contig-based taxonomic identifications obtained through other approaches in this study. Notably, none of the identified MAGs belonged to known pathogenic species. We limited our functional analysis to the MAGs that could be taxonomically classified and had relatively higher completeness (22.84 ± 14.44% vs. 8.93 ± 3.56%, mean ± SD). The functional characteristics of these MAGs varied widely, reflecting typical microbial functions (Supplementary Fig. 14). However, given the high fragmentation of reconstructed genomes, fully resolving their ecological roles and environmental potential remains challenging.

Discussion

We evaluated the effectiveness of co-assembly versus individual assembly in reconstructing airborne microbial genomes and genes, particularly in the context of low-biomass environments. Our results demonstrate that co-assembly improves genome reconstruction, gene prediction, and plasmid detection in airborne metagenomes compared to individual assembly. By pooling samples with similar ecological and microbiome characteristics, we reduced redundancy, improved assembly accuracy, and recovered longer contigs and more complete genes. These improvements are particularly valuable for studying airborne microbial communities, where DNA is often fragmented, sequencing coverage is low, and assembly errors can obscure meaningful biological patterns.

Advantages of co-assembly in air metagenomics

Across key assembly metrics, including genome fraction, duplication ratio, and misassemblies, co-assembly outperformed individual assembly. Although the differences in genome fraction and mismatches per 100 kbp were not statistically significant, they are biologically relevant, as even modest improvements in genome recovery enhance downstream analyses, such as strain-level resolution and functional annotation. Co-assembly also yielded longer contigs and a greater total contig length, indicating its capacity to recover more complete genomic fragments. These results suggest that co-assembly helps mitigate some, though not all, of the challenges commonly associated with low-biomass samples, such as fragmented reads and uneven sequencing depth, by maximizing the shared genetic signal across samples.

As expected, genome fraction increased with sequencing depth, confirming that additional sequencing enhances genome recovery. However, we identified a saturation point beyond which further sequencing provided diminishing returns. While the duplication ratio and misassembly rates initially increased with sequencing depth, they stabilized beyond ~30 million reads, indicating that coverage continues to improve without significantly increasing assembly errors. This plateau likely arises once the assembly process has effectively saturated the complexities introduced by repeated regions, strain-level variation, and environmental stressors affecting airborne DNA integrity, limiting further improvements in assembly accuracy. These findings underscore the importance of optimizing sequencing strategies to balance depth, cost, and assembly efficiency, particularly in studies of airborne microbiomes. The results also suggest that increasing sequencing efforts can be highly beneficial in these contexts.

Despite these improvements, most contigs in our study remained shorter than 1000 bp, which is considerably smaller than those found in higher-biomass environments such as soil and water. This limitation is likely due to the fragmented nature of airborne DNA and the difficulties in assembling low-abundance genomes. Unlike water and soil samples, where DNA integrity is generally better preserved, airborne DNA is continuously exposed to environmental stressors such as UV radiation, oxidative damage, and temperature fluctuations, all of which contribute to DNA fragmentation. Additionally, mechanical and hydrodynamic stress during sample collection, such as the centrifugal forces generated by the Coriolis air sampler, along with handling and multiple freeze-thaw cycles, may further enhance DNA degradation. These factors collectively result in shorter DNA fragments, posing additional challenges for metagenomic assembly and gene recovery in airborne microbiomes.

We also tested different sample grouping strategies to further increase sample sizes, such as pooling all samples collected on clear or dusty days. However, when we attempted to assemble this larger dataset, even a high-memory server with 750 GB of RAM could not process the reads. This failure suggests that co-assembly becomes increasingly difficult when samples originate from highly diverse microbial sources with limited genetic similarity. In contrast, co-assembly was more successful when samples were derived from a single dominant habitat, such as dust storms from specific origins. These findings emphasize the need for careful sample selection in co-assembly approaches, as excessive diversity introduces sequence heterogeneity, which complicates assembly.

Interestingly, while co-assembly improved overall gene prediction accuracy, individual assembly yielded slightly higher gene richness. This difference likely arises because individual assemblies fragmented low-abundance genes into shorter contigs, preventing complete gene reconstruction. Such fragmentation can lead to misannotations and artificially inflated gene diversity, particularly for ARGs, where accurate reconstruction is crucial for distinguishing functional sequences from fragmented, non-functional elements. Our results indicate that while an individual assembly captures some sample-specific genes, co-assembly provides a more biologically meaningful and reliable representation of the airborne resistome.

Beyond airborne microbiomes, these findings have broader implications for other low-biomass environments, such as deep-sea, polar, and extraterrestrial microbiomes, where similar assembly challenges exist. By demonstrating that co-assembly improves genome and gene recovery, we provide a framework that can be applied to other extreme environments.

Mobile airborne ARGs and environmental dissemination

By improving genome reconstruction and functional analysis, co-assembly allowed a more comprehensive assessment of airborne microbial communities and their role in public health and environmental processes. A notable finding was that a subset of airborne ARGs was associated with mobile genetic elements, suggesting their potential for horizontal gene transfer across diverse microbial species. Dust storms substantially increased the abundance of mobile ARGs, supporting our hypothesis that atmospheric transport of dust could facilitate the global dissemination of antibiotic resistance. These findings provide new evidence linking ARG mobility to extreme weather events, highlighting environmental factors that influence the spread of AR on a global scale. Our findings underscore the need for advanced metagenomic approaches to accurately track the movement of ARGs and other clinically relevant genes in atmospheric microbiomes.

Limitations and future directions

Despite the advantages of co-assembly, this approach has inherent limitations. By averaging genetic signals across pooled samples, co-assembly may obscure sample-specific variations and lead to the loss of low-abundance, unique genes. Additionally, while co-assembly improves genome recovery, resolving complex microbial genomes remains challenging, particularly for species with high intra-species variation or repetitive regions.

We also identified methodological issues that require further consideration. Taxonomic annotation results varied between classification tools, similar to findings in previous studies that used diverse taxonomic profilers^49,50. Kraken2 detected a higher species diversity, whereas Diamond-Megan-LR identified additional pathogens missed by Kraken2. These differences are likely due to their distinct classification strategies: Kraken2’s k-mer-based approach offers broad taxonomic coverage but may include false positives, whereas Diamond-Megan-LR’s protein-based alignment is more conservative but improves specificity. Although our previous study validated Kraken2 as a reliable tool for analyzing atmospheric communities⁴², these findings highlight the importance of using multiple annotation tools to achieve a comprehensive assessment of airborne microbial diversity.

Functional gene annotation also posed challenges due to database limitations. Some protein domains associated with antibiotic resistance, such as PF12847 (a methyltransferase ___domain)⁵¹ and PF02467 (a transcription factor, WhiB)⁵², also play essential housekeeping roles. This functional overlap complicates the differentiation between routine cellular processes and true resistance mechanisms. Additionally, we identified ARG-related protein domains in fungal contigs (e.g., PF00903, typically associated with glyoxalase enzymes, and fosfomycin resistance proteins like FosB⁵³, FosA⁵⁴, and FosX⁵⁵). This raises the possibility that fungi might possess intrinsic genetic potential for antibiotic resistance, or that horizontal gene transfer may have occurred between bacterial and fungal communities. Given the bacterial focus of current resistance databases, further research is needed to determine the biological significance of these findings. We also observed AR and virulence genes co-occurrence with signatures of IS elements and ICEs within the same ORF. This may reflect multi-___domain proteins with distinct resistance and mobility functions⁵⁶ or chimeric genes formed through genetic fusion events. Alternatively, annotation artifacts in low-biomass datasets may complicate accurate identification. These scenarios highlight the need for careful genetic context interpretation and detailed functional analysis in metagenomic studies⁵⁷. Experimental validation will be essential to confirm these associations and assess their ecological and clinical implications.

Despite our efforts, reconstructing high-quality MAGs remained challenging. We recovered only 39 MAGs, most of which were classified as low to medium quality. The low completeness of these MAGs suggests that assembling complete genomes from airborne microbial communities remains difficult due to read fragmentation and the diverse nature of airborne metagenomes. None of the recovered MAGs were classified as belonging to well-known pathogenic species. This is likely because many pathogenic bacteria occur at low abundances in environmental samples, making it difficult to recover their genomes through metagenomic binning. Therefore, airborne pathogen detection should be interpreted with caution in the context of ecosystem dynamics and public health risks. These findings highlight the need for further methodological advancements, such as improved assembly algorithms, deeper sequencing, and optimized binning strategies, to enhance MAG recovery from low-biomass environments.

Future studies should integrate hybrid assembly approaches that combine short- and long-read sequencing to further enhance genome reconstruction in airborne metagenomes²⁷. This strategy could improve contiguity, facilitate the recovery of mobile genetic elements, and enable the identification of previously undetected pathogens and ARGs. Additionally, increasing DNA recovery from air samples, such as using multiple high-volume air samplers, could enhance sequencing depth and assembly quality. Complementing metagenomics with cultivation-based microbiology, metatranscriptomics, and single-cell genomics could provide deeper insights into the functional potential and ecological dynamics of airborne microbial communities.

Conclusion

Our study demonstrates that co-assembly offers distinct advantages over individual assembly in reconstructing airborne microbial genomes and genes. Co-assembly provides a more reliable characterization of airborne microbial communities by reducing redundancy, enhancing genome completeness, and improving gene prediction accuracy. Most importantly, our findings highlight the role of dust storms in the global dispersion of ARGs. Notably, some of these genes confer resistance to medically important antibiotics, and appear to be mobile, potentially facilitating the wider spread of antibiotic resistance. Given the increasing concern over environmental antibiotic resistance, this research offers critical insights into the factors shaping ARG distribution in the atmosphere. By refining genomic and computational methods, and deepening our understanding of airborne microbial dynamics, we can improve risk assessment and develop more effective surveillance and mitigation strategies.

Materials and methods

Sample collection and atmospheric conditions

We conducted air sampling from the rooftop of a four-story building at the Weizmann Institute of Science in Rehovot, Israel (31.9070°N, 34.8102°E; 80 m above mean sea level) over a 64-day period, from the end of March to the late May 2022. This site is frequently exposed to dust storms originating from prominent Middle Eastern dust sources, including the Saharan Desert, Iraq, Iran, and Saudi Arabia²⁸. For air sampling, we used the Coriolis µ microbial air sampler (Bertin Technologies). Samples were collected for 2 hours at a flow rate of 200 L/min. The impinger efficiently captures particles larger than 0.5 µm, with a collection efficiency (D₅₀ value) surpassing this threshold. To maintain sample integrity, we stored the collected samples in 15 ml of a nucleic acid preservation solution containing 280 g L⁻¹ ammonium sulfate, 25 mM sodium sulfate, and 10 mM EDTA (pH 5.2)⁵⁸. To minimize evaporation, we supplemented the samples with a top-off flow rate of 0.5–1.5 ml/min of molecular-grade water. Before each sampling event, we thoroughly decontaminated the Coriolis sampler as previously described⁴². Following collection, we concentrated the sample onto autoclaved and sterilized polyethersulfone (PES) filters (diameter of 25 mm and a pore size of 0.1 μm; Sartorius Stedim Biotech) using a sterile single-use syringe and reusable syringe filter holders (Sartorius Stedim Biotech). The filters were then transferred into Falcon tubes and stored at −20 °C until further processing. DNA extraction was performed within 2–3 days post-collection to ensure optimal sample quality. To account for potential contamination, we generated operating blanks by running the sampler for 5 seconds without collecting air samples. This process was repeated on randomly selected dates, yielding a total of seven operating blanks throughout the study.

We also monitored particulate matter concentration (PM, μg m⁻³) and temperature (T, °C) using data from the Rehovot Air Monitoring station, located ~1 km from the sampling site. To estimate the origin of the dust and air masses, we used time-averaged maps of dust column mass density⁵⁹ and the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model^60,61. Additional details on the sampling campaign and the analysis methods can be found in a recent study⁴². Throughout the campaign, we collected a total of 45 air samples (Supplementary fig. 15 and Supplementary Data 9).

DNA Extraction, library preparation, and metagenomic sequencing

We extracted DNA from the air filters using the PowerWater DNA Isolation Kit (Qiagen, Dresden, Germany) following the manufacturer’s protocol. DNA quantification was performed using a Qubit 2.0 fluorometer with the Qubit dsDNA HS (High Sensitivity) Assay Kit (Invitrogen) and dsDNA HS buffer. The quantification results are provided in Supplementary Data 9. For shotgun metagenomic library preparation, we followed the Illumina DNA Prep Reference Guide. Briefly, we fragmented 1 ng of input DNA, performed a post-fragmentation cleanup, and amplified the fragments through 12 cycles of PCR while incorporating unique dual indexes (Nextera DNA UD Indexes) into each sample. To ensure balanced sequencing representation, we initially pooled equal volumes of each library to create a preliminary pool. This pool was purified using SPB beads and eluted in molecular-grade water. We quantified the eluted library using Qubit and assessed fragment distribution with TapeStation (D1000 ScreenTape). The quantified library was then loaded into a MiniSeq Mid-Output reagent cartridge (300 cycles) for preliminary sequencing. Based on these results, we generated a final equimolar pool, ensuring uniform representation of each sample. The final pooled library underwent another round of SPB bead cleanup and was eluted in water. We determined the concentration using Qubit and validated fragment size distribution with TapeStation. The library was then loaded onto a NovaSeq X lane for high-throughput sequencing with paired-end 2 × 150 bp reads. Library preparation and sequencing were conducted at the Rush Genomics and Microbiome Core Facility (GMCF) located at Rush University. Additional details on sequencing depth for each sample are provided in Supplementary Data 10.

Data processing and metagenomic analysis

We performed a comprehensive quality assessment of the metagenomic datasets using FastQC software (version 0.11.9, www.bioinformatics.babraham.ac.uk/projects/fastqc). Quality trimming and Illumina adapter removal were executed using Trimmomatic (version 0.75) with the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20, TRAILING:20, SLIDINGWINDOW:4:15, and MINLEN:35. After applying these quality control steps, we obtained an average of 4.3 million high-quality reads per sample, with read counts ranging from 3 to 11 million paired-end reads (Supplementary Data 11). Basic sequencing statistics for these samples were generated using SeqKit (v2.3.1)⁶². To examine microbial composition in relation to atmospheric conditions, we classified the metagenomic libraries into six distinct groups based on dominant air mass trajectories and microbial community similarities. These groupings were determined using permutational MANOVA (PERMANOVA), Principal Coordinates Analysis (PCoA) with Bray–Curtis dissimilarity, and hierarchical clustering based on taxonomic and functional characteristics, following our previous study⁴².

To improve metagenomic reconstruction, we merged samples within each group for co-assembly. Sample identifiers (t) correspond to time points spanning the entire sampling campaign, from t1 (beginning) to t64 (end) (Supporting Fig. 1): c_Northwest_1 (t12, t15, t16, t17, t23, and t24) and c_Southwest_1 (t5, t13, t14, t20, t22, t30, t31, t34, and t35) represent northwesterly and southwesterly air masses, respectively. These were observed under clear conditions before a period of consecutive dust storms and low daily mean temperatures. c_Northwest_2 (t50, t51, t56, t57, and t58) and c_Southwest_2 (t37, t38, t41, t42, t45, t49, t52, t55, t59, and t64) represent northwesterly and southwesterly air masses, respectively. These were observed under clear conditions following a period of consecutive dust storms and increasing mean daily temperatures. d_East (t1, t27, t28, t29, t32, and t62) and d_Southwest (t6, t7, t8, t9, t10, t11, t21, t33, and t36) represent dusty conditions of easterly and southwesterly air masses.

Metagenomic binning

We merged the high-quality reads and subsequently assembled and binned them using the metaWRAP (v.1.3.2) pipeline⁴². Briefly, for each sample, we co-assembled reads using the metaWRAP assembly module, which conducts the assembly with metaSPAdes (--metaspades)⁶³. Contigs shorter than 500 bp were removed to improve assembly quality. To enhance binning, we reassembled the reads that did not map back to the metaSPAdes assembly using MegaHit⁶⁴. We then performed metagenomic binning using three complementary algorithms: MaxBin2⁶⁵, CONCOCT⁶⁶, and metaBAT2⁶⁷, utilizing the Binning module (-maxbin2 -concoct -metabat2). To refine the bins, we applied the bin_refinement module. Given the relatively low sequencing depth of our samples, we set a minimum completion threshold of 5% and a maximum contamination threshold of 10% during the refinement process. After refinement, we further improved the bins by using the reassemble_bins module. We assessed bin completeness and contamination using CheckM⁶⁸ within this pipeline. The binning process grouped contigs into distinct genome bins, and after refinement and quality assessment, bins that met the thresholds were consolidated into metagenome-assembled genomes (MAGs), classified as follows: low-quality draft (<50% completeness, <10% contamination), medium quality draft (≥50% completeness, <10% contamination) and high-quality draft (>90% completeness, <5% contamination) based on established guidelines⁴⁶.

We taxonomically annotated the MAGs using GTDB-Tk (v2.1.1)⁶⁹ with the classify workflow (classify_wf). A phylogenomic tree of these MAGs was created using the Interactive Tree of Life (v6)⁷⁰. Finally, we functionally annotated the MAGs using eggNOG-mapper (v2.1.12)⁷¹ against the EggNOG 6.0 database⁷² with the following parameters: --itype genome and -m diamond.

AR and virulence gene annotation

We predicted open reading frames (ORFs) for protein-coding genes using Prodigal (version 2.6.3)⁷³ on contigs assembled with metaSPAdes, applying the ‘-p meta’ option. To ensure a comprehensive analysis, we examined all contigs when comparing individual and co-assembly approaches. However, given the highly fragmented and low-biomass nature of air samples, we excluded contigs shorter than 500 bp when detecting ARGs and mobility elements (e.g., plasmids) to enhance accuracy. To annotate AR and virulence genes in ORFs, we used hmmscan (HMMER, v3.3.2-gompic-2020b)⁷⁴ with a stringent e-value threshold of 1 × 10⁻⁵. In instances where the same ORFs produced multiple hits to the same Pfam entry, we selected only the top matches with the highest alignment reliability (acc value), scores, and the lowest E-values. For antibiotic resistance, we utilized the Resfams HMM Database (Core, v1.2)⁷⁵. To identify virulence-related genes, we initially searched for Pfam modules related to ‘virulence’ and ‘pathogenicity’ and downloaded the corresponding Pfam profiles⁷⁶. We manually checked the virulence role of these Pfam modules through a literature search and removed spurious modules from our list. Subsequently, we integrated all Pfam models, including the Resfams HMM Database, and performed hmmscan as described above.

Mobile genetic element annotation

We used PLASMe³⁶ to identify plasmid contigs from metagenome assemblies, setting the minimum coverage and identity thresholds for BLASTN at 90% and 70%, respectively (-c 0.9, -i 0.7, -p 0.5). We chose PLASMe due to its effectiveness with the short contig lengths. PLASMe’s hybrid approach, which combines sequence alignment and deep learning, outperforms other methods for short contig lengths. In contrast, many widely used tools require contig lengths longer than 1000 bp for accurate prediction.

To identify putatively mobile elements from metagenome assemblies, we used several tools. We employed hmmscan to predict integrative and conjugative elements (ICEs) on ORFs using the CONJScan HMM database (v2.0.1)⁷⁷. In the hmmscan results, we retained only the best hits as previously described. Additionally, we used ISEScan⁷⁸ to predict insertion sequence (IS) elements with default parameters and IntegronFinder (version 2.0)⁷⁹ with the (--local-max) option to predict complete integrons or integron-associated elements, such as attC sites on assembled contigs.

We identified contigs linked to AR and virulence genes as putatively mobile if they were also linked to one of the IS elements, ICEs, and integrons. This classification is informed by the sizes of known composite transposons, which can range up to 65 Kbp (mean: 11.74 Kbp)⁸⁰, ICEs that range from 20 Kbp to 500 Kbp^56,81, and bacterial integrons, most of which are shorter than 10 Kbp⁸². Given that most AR and virulence gene associated contigs in our dataset did not exceed 10 Kbp, we infer that if these contigs contain both AR or virulence genes and mobile genetic elements, they are likely associated. This approach is consistent with previous studies^83,84. Contigs not linked to any mobile elements or plasmids are considered chromosomal.

Sequence mapping

To estimate the abundance, we first mapped reads from each sample to each MAG, contig, or ORF using Bowtie2 (v2.5.1)⁸⁵ with the (--very-sensitive-local) option. We then calculated coverage information using pileup.sh (BBMap, v39.01, sourceforge.net/projects/bbmap/) and converted it into reads per kilobase per million mapped reads (RPKM) to estimate relative abundance. This normalization procedure adjusts for both library size, accounting for the total number of mapped reads, and MAG, contig, or gene length, ensuring accurate rescaling. For ORFs predicted as partial or incomplete genes, we estimated the gene abundance accordingly.

Taxonomic identification

We performed taxonomic annotation on contigs using Kraken2 (v2.1.2)⁸⁶ with default parameters and cross-check these results with Diamond-Megan-LR protocol⁸⁷, which is specifically designed for long-read sequences. Additional details of these analyses are provided in the Supporting Information.

Assembly evaluation

We assessed both individual and co-assembled contigs without applying length-based filtering. To evaluate ORF completeness, we used Prodigal, categorizing genes into three categories: complete genes contain a true boundary (a start or a stop codon); partial genes lack either a start or stop codon; incomplete genes have fragmented or missing edges.

To reduce redundancy in individual assembly results, we clustered contigs (nucleotide level) and ORFs (amino acid level) using MMseqs2 (v13-45111)⁸⁸ with the following parameters: --cov-mode 1, --cluster-mode 2, -c 0.95, and --min-seq-id 0.95. Further details of these analyses are provided in the Supporting Information. For protein abundance estimation, we mapped sequencing reads to the clustered protein-coding sequences and calculated RPKM as previously described. We then performed taxonomic and functional annotations on these non-redundant representative contigs, as well as genes derived from individual assemblies, estimating their abundance using the same approach.

To assess the quality of both individual and co-assemblies, we applied a reference-based evaluation using MetaQUAST (v5.2.0)⁸⁹ with the --unique-mapping and --reuse-combined-alignments flags. These settings ensure that each contig is mapped to a single reference genome position. For this analysis, we used a curated set of reference genomes representing a subset of the airborne microbiome community, comprising 49 species (Supplementary Data 12). These species were selected based on their high overall abundance and prevalence, as determined in our previous study through read-based taxonomic annotation of the same metagenomic dataset⁴².

We focused on widely used assembly metrics³⁹, including genome fraction, mismatches per 100 kbp, duplication ratio, the number of misassemblies and misassembled contigs length. The genome fraction represents the percentage of reference bases covered by assembled contigs after similarity-based mapping. Mismatches per 100 kbp indicate the number of mismatched bases in the contig-to-reference alignment. The duplication ratio is calculated as the total number of aligned bases in the assembly divided by the total number of aligned bases in the reference genome. The number of misassemblies quantifies contigs that contain gaps exceeding 1 kbp, insertions larger than 1 kbp, or alignments to multiple distinct genomes. Although NGA50 is a standard metric for assessing assembly contiguity, we were unable to compute it due to the low coverage of reference genomes, an inherent limitation of air samples’ low-biomass nature.

Impact of sequencing depth

To investigate how sequencing depth influences metagenomic assembly, we subsampled paired-end reads pooled based on microbiome similarity, as described previously. We generated subsamples at various depths (2 M, 5 M, 7.5 M, 10 M, 15 M, 20 M, 25 M, 30 M, and 35 M reads) using Seqtk (v1.3-GCC-10.2.0, https://github.com/lh3/seqtk) and assembled them with MetaSPAdes with the --metaspades option⁶³. To evaluate the relationship between sequencing depth and assembly metrics, we applied a linear mixed-effects regression model, treating sequencing depth and assembly metrics as fixed effects while considering the sample type as a random effect. We tested three candidate models: linear regression, second- and third-order polynomial regression. Model selection was performed using the Akaike Information Criterion (AIC), with the model yielding the lowest AIC considered the most optimal fit.

Statistical analysis and hypothesis testing

We evaluated the normality of the data using statistical tests and visual assessments. We performed the Shapiro-Wilk test to formally assess normality and generated Q-Q plots to visually inspect the data distribution. Additionally, we calculated skewness and kurtosis to assess the symmetry and shape of the data. When the normality assumption was violated, we chose to use non-parametric methods. In certain instances, we observed mild deviations from normality but given the small sample size (n = 6), we employed non-parametric methods in these cases as well. To compare the two groups (co-assembly vs. individual assembly), we performed the Wilcoxon signed-rank test and Wilcoxon rank-sum test (Mann–Whitney U test). Statistical significance was defined as p < 0.05. We estimated the effect size of difference between two groups using the Wilcoxon effect size (r) with a 95% confidence interval (CI). The effect size was interpreted as follows: <0.3 (small effect), 0.30 to <0.5 (moderate effect), and ≥0.50 (large effect). For these analyses, we used the following R packages: stats, moments, rstatix and ggplot2.

List of pathogens

To assess the presence of microbial species with known pathogenic potential in the atmospheric microbiome, relevant to human, animal, and plant health, we utilized the comprehensive list provided by the Global Catalogue of Pathogens (gcPathogen) (accessed on 4 June 2024)⁹⁰. This list consists of 1317 taxa, including 510 bacterial, 407 fungal, 226 viral, and 174 parasitic taxa. We compared the species names from this list with the outputs of Kraken2 and Diamond-Megan-LR analysis to identify their presence in air samples. While our comparison was conducted at the species level, it is important to note that pathogenicity is often determined at the strain level. Therefore, our results indicate the presence of pathogenic lineages rather than confirming the presence of specific pathogenic strains.

Assessment of contamination

To evaluate potential contamination and establish a baseline for accurate assembly, we collected blank (negative control) samples as described earlier. DNA yields from these blanks were either at or below the detection limit (0.005 ng/μL, measured using a Qubit 2.0 fluorometer). Shotgun sequencing of these blanks resulted in negligible sequencing reads for only three out of seven samples after quality assessment, and Kraken2 taxonomic classification confirmed substantially lower read counts in controls compared to air samples (Supplementary Data 10, 11 and 13). These findings suggest minimal background contamination and provide a reference for assessing assembly quality.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The metagenomic sequences utilized in this study have been deposited in the National Center for Biotechnology Information (NCBI) database under the accession number PRJNA1045528.

References

Holmes, A. H. et al. Understanding the mechanisms and drivers of antimicrobial resistance. Lancet 387, 176–187 (2016).
Article CAS Google Scholar
Munita, J. M. & Arias, C. A. Mechanisms of antibiotic resistance. Microbiol. Spectr. 4, https://doi.org/10.1128/microbiolspec.VMBF-0016-2015 (2016).
Kirchhelle, C. The Antibiocene—towards an eco-social analysis of humanity’s antimicrobial footprint. Hum. Soc. Sci. Commun. 10, 619 (2023).
Collaborators, G. B. D. A. R. Global burden of bacterial antimicrobial resistance 1990–2021: a systematic analysis with forecasts to 2050. Lancet 404, 1199–1226. https://doi.org/10.1016/S0140-6736(24)01867-1 (2024).
Groussin, M. et al. Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184, 2053–2067.e2018 (2021).
Article CAS Google Scholar
Ellabaan, M. M. H., Munck, C., Porse, A., Imamovic, L. & Sommer, M. O. A. Forecasting the dissemination of antibiotic resistance genes across bacterial genomes. Nat. Commun. 12, 2435 (2021).
Article CAS Google Scholar
Jiang, X. et al. Dissemination of antibiotic resistance genes from antibiotic producers to pathogens. Nat. Commun. 8, 15784 (2017).
Article CAS Google Scholar
Khedkar, S. et al. Landscape of mobile genetic elements and their antibiotic resistance cargo in prokaryotic genomes. Nucleic Acids Res. 50, 3155–3168 (2022).
Article CAS Google Scholar
Pehrsson, E. C. et al. Interconnected microbiomes and resistomes in low-income human habitats. Nature 533, 212–216 (2016).
Article CAS Google Scholar
Brumfield, K. D. et al. Genomic diversity of Vibrio spp. and metagenomic analysis of pathogens in Florida Gulf coastal waters following Hurricane Ian. mBio 14, e0147623 (2023).
Article Google Scholar
Li, W. et al. Association between antibiotic resistance and increasing ambient temperature in China: an ecological study with nationwide panel data. Lancet Reg. Health West Pac. 30, 100628 (2023).
Google Scholar
MacFadden, D. R., McGough, S. F., Fisman, D., Santillana, M. & Brownstein, J. S. Antibiotic resistance increases with local temperature. Nat. Clim. Chang. 8, 510–514 (2018).
Article CAS Google Scholar
Cruz-Loya, M. et al. Stressor interaction networks suggest antibiotic resistance co-opted from stress responses to temperature. ISME J. 13, 12–23 (2019).
Article Google Scholar
Usmani, M. et al. Combating cholera by building predictive capabilities for pathogenic Vibrio cholerae in Yemen. Sci. Rep. 13, 2255 (2023).
Article CAS Google Scholar
Mao, G. et al. Monsoon affects the distribution of antibiotic resistome in Tibetan glaciers. Environ. Pollut. 317, 120809 (2023).
Article CAS Google Scholar
Lee, S. et al. Residential urban stormwater runoff: a comprehensive profile of microbiome and antibiotic resistance. Sci. Total Environ. 723, 138033 (2020).
Article CAS Google Scholar
van Bavel, B. et al. Intersections between climate change and antimicrobial resistance: a systematic scoping review. Lancet Planet Health 8, e1118–e1128 (2024).
Article Google Scholar
Zhang, Z. et al. Assessment of global health risk of antibiotic resistance genes. Nat. Commun. 13, 1553 (2022).
Article CAS Google Scholar
Bai, H. et al. Airborne antibiotic resistome and microbiome in pharmaceutical factories. Environ. Int. 186, 108639 (2024).
Article CAS Google Scholar
Bai, H. et al. Spread of airborne antibiotic resistance from animal farms to the environment: dispersal pattern and exposure risk. Environ. Int. 158, 106927 (2022).
Article CAS Google Scholar
Rodo, X. et al. Microbial richness and air chemistry in aerosols above the PBL confirm 2,000-km long-distance transport of potential human pathogens. Proc. Natl. Acad. Sci. USA 121, e2404191121 (2024).
Article CAS Google Scholar
Mazar, Y., Cytryn, E., Erel, Y. & Rudich, Y. Effect of dust storms on the atmospheric microbiome in the Eastern Mediterranean. Environ. Sci. Technol. 50, 4194–4202 (2016).
Article CAS Google Scholar
Gat, D., Mazar, Y., Cytryn, E. & Rudich, Y. Origin-dependent variations in the atmospheric microbiome community in Eastern Mediterranean Dust Storms. Environ. Sci. Technol. 51, 6709–6718 (2017).
Article CAS Google Scholar
Luhung, I. et al. Experimental parameters defining ultra-low biomass bioaerosol analysis. NPJ Biofilms Microbiomes 7, 37 (2021).
Article CAS Google Scholar
Drautz-Moses, D. I. et al. Vertical stratification of the air microbiome in the lower troposphere. Proc. Natl. Acad. Sci. USA 119, e2117293119 (2022).
Article CAS Google Scholar
Gusareva, E. S. et al. Microbial communities in the tropical air ecosystem follow a precise diel cycle. Proc. Natl. Acad. Sci. USA 116, 23299–23308 (2019).
Article CAS Google Scholar
Reska, T. et al. Air monitoring by nanopore sequencing. ISME Commun. 4, ycae099 (2024).
Article Google Scholar
Bodenheimer, S., Lensky, I. M. & Dayan, U. Characterization of Eastern Mediterranean dust storms by area of origin; North Africa vs. Arabian Peninsula. Atmos. Environ. 198, 158–165 (2019).
Article CAS Google Scholar
Luhung, I. et al. Protocol improvements for low concentration DNA-based bioaerosol sampling and analysis. PLoS One 10, e0141158 (2015).
Article Google Scholar
Lever, M. A. et al. A modular method for the extraction of DNA and RNA, and the separation of DNA pools from diverse environmental sample types. Front. Microbiol. 6, 476 (2015).
Article Google Scholar
Boifot, K. O., Skogan, G. & Dybwad, M. Sampling efficiency and nucleic acid stability during long-term sampling with different bioaerosol samplers. Environ. Monit. Assess. 196, 577 (2024).
Article Google Scholar
Botelho, J. & Schulenburg, H. The role of integrative and conjugative elements in antibiotic resistance evolution. Trends Microbiol. 29, 8–18 (2021).
Article CAS Google Scholar
Castaneda-Barba, S., Top, E. M. & Stalder, T. Plasmids, a molecular cornerstone of antimicrobial resistance in the One Health era. Nat. Rev. Microbiol. 22, 18–32 (2024).
Article CAS Google Scholar
Hall, J. P. J., Harrison, E. & Baltrus, D. A. Introduction: the secret lives of microbial mobile genetic elements. Philos. Trans. R. Soc. Lond. B Biol. Sci. 377, 20200460 (2022).
Article CAS Google Scholar
Harrison, E. & Brockhurst, M. A. Plasmid-mediated horizontal gene transfer is a coevolutionary process. Trends Microbiol. 20, 262–267 (2012).
Article CAS Google Scholar
Tang, X., Shang, J., Ji, Y. & Sun, Y. PLASMe: a tool to identify PLASMid contigs from short-read assemblies using transformer. Nucleic Acids Res. 51, e83 (2023).
Article CAS Google Scholar
Delgado, L. F. & Andersson, A. F. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 10, 72 (2022).
Article CAS Google Scholar
Vosloo, S. et al. Evaluating de novo assembly and binning strategies for time series drinking water metagenomes. Microbiol. Spectr. 9, e0143421 (2021).
Article Google Scholar
Meyer, F. et al. Critical assessment of metagenome interpretation: the second round of challenges. Nat. Methods 19, 429–440 (2022).
Article CAS Google Scholar
Larsson, D. G. J. & Flach, C. F. Antibiotic resistance in the environment. Nat. Rev. Microbiol. 20, 257–269 (2022).
Article CAS Google Scholar
Krawczyk, P. S., Lipinski, L. & Dziembowski, A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Res. 46, e35 (2018).
Article Google Scholar
Erkorkmaz, B. A., Zeevi, D. & Rudich, Y. Dust storm-driven dispersal of potential pathogens and antibiotic resistance genes in the Eastern Mediterranean. Sci. Total Environ. 958, 178021 (2025).
Article CAS Google Scholar
Burk, D. L., Ghuman, N., Wybenga-Groot, L. E. & Berghuis, A. M. X-ray structure of the AAC(6’)-Ii antibiotic resistance enzyme at 1.8 A resolution; examination of oligomeric arrangements in GNAT superfamily members. Protein Sci. 12, 426–437 (2003).
Article CAS Google Scholar
Miller, G. H. et al. The most frequent aminoglycoside resistance mechanisms–changes with time and geographic area: a reflection of aminoglycoside usage patterns? Aminoglycoside resistance study groups. Clin. Infect. Dis. 24, S46–S62 (1997).
Article CAS Google Scholar
Jesudason, T. WHO publishes list of medically important antimicrobials. Lancet Infect. Dis. 24, e284 (2024).
Article Google Scholar
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Article CAS Google Scholar
Rigonato, J. et al. Aliterella atlantica gen. nov., sp. nov., and Aliterella antarctica sp. nov., novel members of coccoid Cyanobacteria. Int. J. Syst. Evol. Microbiol 66, 2853–2861 (2016).
Article CAS Google Scholar
Battistuzzi, F. U. & Hedges, S. B. A major clade of prokaryotes with ancient adaptations to life on land. Mol. Biol. Evol. 26, 335–343 (2009).
Article CAS Google Scholar
Poussin, C. et al. Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge. BMC Genom. 23, 624 (2022).
Article Google Scholar
Miossec, M. J. et al. Evaluation of computational methods for human microbiome analysis using simulated data. PeerJ 8, e9688 (2020).
Article Google Scholar
Chow, C. S., Lamichhane, T. N. & Mahto, S. K. Expanding the nucleotide repertoire of the ribosome with post-transcriptional modifications. ACS Chem. Biol. 2, 610–619 (2007).
Article CAS Google Scholar
Kormanec, J. & Homerova, D. Streptomyces aureofaciens whiB gene encoding putative transcription factor essential for differentiation. Nucleic Acids Res. 21, 2512 (1993).
Article CAS Google Scholar
Cao, M., Bernat, B. A., Wang, Z., Armstrong, R. N. & Helmann, J. D. FosB, a cysteine-dependent fosfomycin resistance protein under the control of sigma(W), an extracytoplasmic-function sigma factor in Bacillus subtilis. J. Bacteriol. 183, 2380–2383 (2001).
Article CAS Google Scholar
Beharry, Z. & Palzkill, T. Functional analysis of active site residues of the fosfomycin resistance enzyme FosA from Pseudomonas aeruginosa. J. Biol. Chem. 280, 17786–17791 (2005).
Article CAS Google Scholar
Fillgrove, K. L., Pakhomova, S., Schaab, M. R., Newcomer, M. E. & Armstrong, R. N. Structure and mechanism of the genomically encoded fosfomycin resistance protein, FosX, from Listeria monocytogenes. Biochemistry 46, 8110–8120 (2007).
Article CAS Google Scholar
Wozniak, R. A. & Waldor, M. K. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nat. Rev. Microbiol. 8, 552–563 (2010).
Article CAS Google Scholar
Kerkvliet, J. J. et al. Metagenomic assembly is the main bottleneck in the identification of mobile genetic elements. PeerJ 12, e16695 (2024).
Article Google Scholar
Erkorkmaz, B. A., Gat, D. & Rudich, Y. Aerial transport of bacteria by dust plumes in the Eastern Mediterranean revealed by complementary rRNA/rRNA-gene sequencing. Commun. Earth Environ. 4, 24 (2023).
Article Google Scholar
Acker, J. G. & Leptoukh, G. Online analysis enhances use of NASA Earth science data. EoS Trans. Am. Geophys. Union 88, 14–17 (2007).
Article Google Scholar
Rolph, G., Stein, A. & Stunder, B. Real-time Environmental Applications and Display sYstem: READY. Environ. Model. Softw. 95, 210–228 (2017).
Article Google Scholar
Cohen, M. D. et al. NOAA’s HYSPLIT atmospheric transport and dispersion modeling system. Bull. Am. Meteorol. Soc. 96, 2059–2077 (2015).
Article Google Scholar
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11, e0163962 (2016).
Article Google Scholar
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res 27, 824–834 (2017).
Article CAS Google Scholar
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Article CAS Google Scholar
Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
Article CAS Google Scholar
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Article CAS Google Scholar
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Article Google Scholar
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Article CAS Google Scholar
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
Article CAS Google Scholar
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 52, W78–W82 (2024).
Article Google Scholar
Cantalapiedra, C. P., Hernandez-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and ___domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Article CAS Google Scholar
Hernandez-Plaza, A. et al. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res. 51, D389–D394 (2023).
Article CAS Google Scholar
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).
Article Google Scholar
Eddy, S. R. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).
Article CAS Google Scholar
Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 9, 207–216 (2015).
Article CAS Google Scholar
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
Article CAS Google Scholar
Cury, J., Abby, S. S., Doppelt-Azeroual, O., Néron, B. & Rocha, E. P. C. in Horizontal Gene Transfer: Methods and Protocols (ed Fernando de la Cruz) 265-283 (Springer, 2020).
Xie, Z. & Tang, H. ISEScan: automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 33, 3340–3347 (2017).
Article CAS Google Scholar
Neron, B. et al. IntegronFinder 2.0: identification and analysis of integrons across bacteria, with a focus on antibiotic resistance in Klebsiella. Microorganisms 10, 700 (2022).
Article CAS Google Scholar
Nielsen, T. K., Browne, P. D. & Hansen, L. H. Antibiotic resistance genes are differentially mobilized according to resistance mechanism. Gigascience 11, https://doi.org/10.1093/gigascience/giac072 (2022).
Johnson, C. M. & Grossman, A. D. Integrative and conjugative elements (ICEs): what they do and how they work. Annu. Rev. Genet. 49, 577–601 (2015).
Article CAS Google Scholar
Zhang, A. N. et al. Conserved phylogenetic distribution and limited antibiotic resistance of class 1 integrons revealed by assessing the bacterial genome and plasmid collection. Microbiome 6, 130 (2018).
Article Google Scholar
Kent, A. G., Vill, A. C., Shi, Q., Satlin, M. J. & Brito, I. L. Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C. Nat. Commun. 11, 4379 (2020).
Article CAS Google Scholar
Lee, K. et al. Population-level impacts of antibiotic usage on the human gut microbiome. Nat. Commun. 14, 1191 (2023).
Article CAS Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Article CAS Google Scholar
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Article CAS Google Scholar
Bagci, C., Patz, S. & Huson, D. H. DIAMOND+MEGAN: fast and easy taxonomic and functional analysis of short and long microbiome sequences. Curr. Protoc. 1, e59 (2021).
Article CAS Google Scholar
Steinegger, M. & Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Article CAS Google Scholar
Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. & Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150 (2018).
Article CAS Google Scholar
Guo, C. et al. gcPathogen: a comprehensive genomic resource of human pathogens for public health. Nucleic Acids Res. 52, D714–D723 (2024).
Article CAS Google Scholar

Download references

Acknowledgements

We acknowledge the Weizmann Institute of Science Sustainability and Energy Research Initiative (SAERI) and the de Botton Center for Marine Science for partially supporting this study through a research grant.

Author information

Authors and Affiliations

Department of Earth and Planetary Sciences, Weizmann Institute of Science, Rehovot, Israel
Burak Adnan Erkorkmaz & Yinon Rudich
Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
David Zeevi

Authors

Burak Adnan Erkorkmaz
View author publications
Search author on:PubMed Google Scholar
David Zeevi
View author publications
Search author on:PubMed Google Scholar
Yinon Rudich
View author publications
Search author on:PubMed Google Scholar

Contributions

B.A.E. conceptualized and designed the study, conducted bioinformatics and data analysis, interpreted, and visualized the results, and drafted the manuscript. Y.R. and D.Z. contributed to reviewing and editing the manuscript. All authors actively participated in result discussions and approved the final manuscript.

Corresponding author

Correspondence to Yinon Rudich.

Ethics declarations

Competing interests

Y.R. is an Editorial Board Member for Communications Earth & Environment, but was not involved in the editorial review of, nor the decision to publish this article. All other authors have no competing interests to declare.

Peer review

Peer review information

Communications Earth & Environment thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Alice Drinkwater. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Supplementary Data 10

Supplementary Data 11

Supplementary Data 12

Supplementary Data 13

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Erkorkmaz, B.A., Zeevi, D. & Rudich, Y. Metagenomic co-assembly uncovers mobile antibiotic resistance genes in airborne microbiomes. Commun Earth Environ 6, 397 (2025). https://doi.org/10.1038/s43247-025-02381-3

Download citation

Received: 06 October 2024
Accepted: 13 May 2025
Published: 22 May 2025
DOI: https://doi.org/10.1038/s43247-025-02381-3