Extended Data Fig. 2: A uniform approach to identify sites bound by the KMT2A fusion protein using KMT2A N- and C-terminal profiles.
From: Automated CUT&Tag profiling of chromatin heterogeneity in mixed-lineage leukemia

a, Line plot comparing the Bayesian Information Criterion (BIC) for a range of Gaussian mixture models containing 1 to 9 components, using either equal (E) or unequal (V) variance, to model the distribution of KMT2A peak widths in CD34+ HSPCs. A two-component Gaussian mixture model provides the highest BIC. b, Histogram of KMT2A peak widths in CD34+ HSPCs showing the two Gaussian models fit to the data. The dotted line indicates the threshold separating peaks called as ‘wide’ versus ‘narrow’. c, Same as (a) but modeling the distribution of the relative enrichment of the KMT2A N- versus C-terminus (KMT2A N/C scores) over KMT2A peaks in CD34+ HSPCs. The highest BIC is achieved by a single Gaussian distribution. d, A two-component model fails to partition the KMT2A peaks by N/C scores in CD34+ HSPCs. e, Scatter plot comparing the KMT2A peak width and N/C scores in control CD34+ HSPCs; wide peaks are indicated in red. f, Same as (a) but for the KMT2Ar SEM cell line. g, Same as (b) but for SEM cells. h, Same as (c) but for SEM cells. Here, a two-component Gaussian mixture model achieves the highest BIC. i, Two Gaussian models fit to the SEM KMT2A N/C scores. The dotted line indicates the threshold separating peaks with ‘high’ versus ‘low’ KMT2A N/C scores. j, Same as (e) but for SEM cells; oncoprotein targets are indicated in red. k, Venn diagram of KMT2A-AF4 oncoprotein target genes in SEM cells called using either this two-dimensional Gaussian modeling approach or ChIP-seq (ref. 23). l, Same as (e) but for the KMT2Ar ML-2 cell line, which lacks the wild-type KMT2A allele. m, Same as (e) but for the control H1 sample.
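The model-selection step described in panels a, c, f and h can be illustrated with a minimal sketch. This is not the authors' code: the legend does not name the software used, and everything below (the function names, the synthetic "log10 peak width" values, the quantile-based initialisation, and the restriction to unequal-variance models with 1 to 4 components) is an assumption made for illustration. The BIC is written in the higher-is-better form, 2·logL − p·log(n), which matches the legend's "highest BIC" wording and the convention of tools such as the R package mclust.

```python
import math
import random

def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def fit_gmm_1d(xs, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture (unequal variances) by EM."""
    n = len(xs)
    srt = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = component_densities(x, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6 * grand_var)  # guard against collapse
    loglik = sum(
        math.log(max(sum(component_densities(x, weights, means, variances)), 1e-300))
        for x in xs)
    return weights, means, variances, loglik

def bic_higher_is_better(loglik, k, n):
    """BIC as 2*logL - p*log(n); p = 3k - 1 free parameters for unequal variances."""
    return 2 * loglik - (3 * k - 1) * math.log(n)

# Synthetic stand-in for log10 peak widths: a 'narrow' and a 'wide' population.
rng = random.Random(0)
widths = ([rng.gauss(2.5, 0.2) for _ in range(700)] +
          [rng.gauss(4.0, 0.2) for _ in range(300)])

fits = {k: fit_gmm_1d(widths, k) for k in range(1, 5)}
bics = {k: bic_higher_is_better(fits[k][3], k, len(widths)) for k in fits}
best_k = max(bics, key=bics.get)
two_component_means = sorted(fits[2][1])

print("BIC by number of components:", bics)
print("Selected number of components:", best_k)
```

With real data, the input would be the observed peak widths (or N/C scores) rather than simulated values, and an equal-variance family would be fitted alongside the unequal-variance one before comparing BIC across all candidates.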