Fig. 1: Workflow of Pangaea.
From: Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity

a Pangaea can assemble reads carrying either physical barcodes from linked-read sequencing or virtual barcodes obtained by aligning short-reads to long-reads. stLFR and TELL-Seq linked-reads have high barcode specificity. b Pangaea extracts features, including k-mer histograms and tetranucleotide frequencies (TNFs), from co-barcoded reads. The features are concatenated and used to embed the reads in a low-dimensional latent space with a variational autoencoder. The embeddings of co-barcoded reads are then clustered by RPH-kmeans. Pangaea assembles the reads from each bin independently and adopts a multi-thresholding reassembly strategy to improve the assemblies of low-abundance microbes. Ensemble assembly integrates the contigs from the different strategies using an overlap-layout-consensus (OLC) algorithm.
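
The feature-extraction step in panel b can be illustrated with a minimal sketch that groups reads by barcode and computes one TNF vector per barcode. This is not Pangaea's code: the function names `group_by_barcode` and `tnf_vector` are hypothetical, and the choice to keep all 256 tetranucleotides without collapsing reverse complements is an assumption made for simplicity, since the caption does not specify it.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order (assumption: no reverse-complement collapsing).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def group_by_barcode(records):
    """Group (barcode, sequence) pairs into {barcode: [sequences]}."""
    groups = {}
    for barcode, seq in records:
        groups.setdefault(barcode, []).append(seq)
    return groups


def tnf_vector(reads):
    """Return normalized tetranucleotide frequencies for a set of co-barcoded reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - 3):
            counts[seq[i:i + 4]] += 1
    # Only 4-mers made of A/C/G/T contribute; windows containing N are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


if __name__ == "__main__":
    records = [("BC1", "ACGTACGT"), ("BC1", "TTTTGGGG"), ("BC2", "ACGTA")]
    for barcode, reads in group_by_barcode(records).items():
        vec = tnf_vector(reads)
        print(barcode, len(vec), round(sum(vec), 6))
```

In a pipeline like the one described, such a vector would be concatenated with the k-mer histogram before being passed to the variational autoencoder.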