Fig. 1: An overview of Monopogen workflow. | Nature Biotechnology

From: Single-nucleotide variant calling in single-cell sequencing data with Monopogen

Monopogen includes germline and putative somatic SNV calling modules. a, Monopogen starts from individual BAM files produced by single-cell sequencing technologies, including scRNA-seq, snRNA-seq, snATAC-seq and scDNA-seq. Sequencing reads with multiple alignment mismatches (default, four) are removed. Putative SNVs are identified sensitively from the pooled pileup at sites with at least one nonreference read. b, For SNVs present in an external reference panel (such as 1KG3), genotype likelihoods are further refined based on LD in the reference panel. Loci showing persistent discordance are used to estimate a sequencing error model. c, Among the remaining loci, putative somatic SNVs are identified as those with sufficient sequencing depth and alternative allele frequency (calibrated by the sequencing error model). The SVM module is designed to remove low-quality SNVs. The variant calling metrics include QS for calling; VDB for filtering splice-site artifacts; Mann–Whitney U tests of RPB, of BQB and of the ratio of MQSB; SGB; and BAF. Germline SNVs are used as the positive training set, while continuous de novo SNV chunks (>2 SNVs) that do not include any germline SNV are used as the negative set. The remaining de novo SNVs form the test set. d, The alleles observed at a de novo SNV site are statistically phased together with adjacent germline alleles to calculate an LD refinement score that estimates the percentage of cells in which the alleles do not cosegregate with neighboring germline alleles. De novo SNVs with high LD refinement scores are classified as putative somatic SNVs, and their genotypes at the single-cell or cluster level are inferred using Monovar. e, Projection of study samples onto the HGDP enables genetic ancestry inference. f, Genome-wide association studies of cellular quantitative traits can be performed when the sample size is sufficient. g, Lineage tracing at the single-cell or clonal level. QS, quality score; VDB, variant distance bias; RPB, read position bias; BQB, base quality bias; MQSB, mapping quality and strand bias; SGB, segregation-based metric; HGDP, Human Genome Diversity Project.
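
The training-set construction described in panel c can be illustrated with a minimal sketch. This is not Monopogen's code: the function name `split_training_sets`, the `(position, is_germline)` input format and the `min_chunk` parameter are hypothetical. It also assumes that a "continuous chunk" means a run of consecutive de novo SNVs, in position order on one chromosome, that is not interrupted by a germline SNV.

```python
from itertools import groupby


def split_training_sets(snvs, min_chunk=3):
    """Assign SNVs to positive, negative and test sets.

    snvs: list of (position, is_germline) tuples for one chromosome,
          sorted by position (hypothetical input format).
    min_chunk: smallest run of consecutive de novo SNVs treated as a
          negative chunk; 3 corresponds to ">2 SNVs" in the caption.
    """
    positive, negative, test = [], [], []
    # groupby splits the sorted list into runs sharing the same germline flag,
    # so each de novo run is, by construction, free of germline SNVs.
    for is_germline, run in groupby(snvs, key=lambda snv: snv[1]):
        positions = [pos for pos, _ in run]
        if is_germline:
            positive.extend(positions)   # germline SNVs: positive set
        elif len(positions) >= min_chunk:
            negative.extend(positions)   # long de novo run: negative set
        else:
            test.extend(positions)       # remaining de novo SNVs: test set
    return positive, negative, test


# Toy input: True marks a germline SNV, False a de novo SNV.
example = [
    (100, True), (150, False), (210, True),
    (300, False), (320, False), (345, False),
    (400, True), (450, False), (470, False),
]
positive, negative, test = split_training_sets(example)
```

In this toy input, the three de novo SNVs between the germline sites at 210 and 400 form a negative chunk, while the shorter de novo runs are left for the test set. A real implementation would also need the per-SNV metrics (QS, VDB and so on) as classifier features, which this sketch omits.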