Extended Data Fig. 2: Computational pipeline and data quality controls for condense-seq. | Nature

From: Native nucleosomes intrinsically encode genome organization principles

a,b, The Condense-seq analysis pipeline consists of (i) read alignment with Bowtie2, (ii) coverage calculation, (iii) mono-nucleosome peak calling at each local maximum of the input coverage, (iv) estimation of absolute nucleosome counts from the coverage area and the soluble-fraction changes measured by UV-VIS spectrometry across the titration, and (v) computation of the condensability score for each nucleosome as the negative log of the soluble fraction after condensation. c, For quality control, we checked that the length distribution of DNA from nucleosomes remaining in the supernatant is mostly around 150 bp at all spermine concentrations used. d, Nucleosome number fluctuation vs genomic position in Chr 1. The input ([sp] = 0 mM, red curve) is mostly flat, indicating that there is no strong bias in the input. NCPs remaining in the supernatant show progressively stronger bias at higher [sp]. e, The periodicity of AT-rich versus GC-rich dinucleotides, the hallmark indicator of nucleosome peaks, supports the nucleosomal source of the DNA analyzed. f, Condensability is more strongly correlated with the changes in supernatant nucleosome number than with the input (Spearman correlation coefficient −0.79 vs 0.14). g, Estimated NCP number for various ChromHMM chromatin states for input vs supernatant ([sp] = 0.79 mM). Analyses in d, f and g collectively show that the condensability score is determined mostly by the degree to which nucleosomes are condensed, not by variations in the input NCPs. h, Condensability determined via nucleosome peak calling and via regular sliding windows gave almost identical results for various ChromHMM chromatin states (p-value > 0.05 and Cohen’s d < 0.1 for every comparison). All boxplot centers represent the median, and the lower/upper bounds are the 1st/3rd quartiles of the data.
i, Statistical significance (p-value from t-test) and effect size (Cohen’s d) were computed for the condensability difference between each pair of ChromHMM states (data in Fig. 1c and Extended Data Fig. 2h). Numeric values are shown in each cell for Cohen’s d (top right triangle) and −log10 p-value (bottom left triangle). j, Correlations of condensability values between replicates. All statistics were computed via two-sided Welch’s t-test over more than 7000 nucleosomes (g-i) or 40000 genomic bins (h) of each state from two biological replicates.
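
The score in step (v) and the statistics named in panels h and i can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the natural logarithm is an assumption (the caption does not state the base), the soluble fraction is taken here as supernatant count divided by input count, and Cohen's d is computed with a pooled standard deviation, which is one common convention.

```python
import math
from statistics import mean, variance


def condensability(input_count, supernatant_count):
    """Negative log of the soluble fraction for one nucleosome.

    Assumption: natural log; soluble fraction = supernatant / input.
    """
    if input_count <= 0 or supernatant_count <= 0:
        raise ValueError("counts must be positive")
    return -math.log(supernatant_count / input_count)


def cohens_d(a, b):
    """Effect size between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    The p-value is not computed here; it requires the t-distribution CDF,
    which is outside the standard library.
    """
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Under these assumptions, a nucleosome with half of its copies left in the supernatant scores ln 2 (about 0.69), and a nucleosome that is fully soluble scores 0; higher scores mean stronger condensation.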
