Extended Data Fig. 4: Pervasive L1s harbor enhancer-like features. | Nature Genetics

From: LINE-1 transcription activates long-range gene expression

a,b, Dot plot/box plot of STARR-seq read counts (left) and enrichment (right) at the L1 subfamily (a) and individual L1 (b) level using K562 STARR-seq datasets (2 PE100 and 1 PE150). The STARR-seq reads were uniquely mapped to individual L1 loci. Center lines represent the median value, box limits represent the 25th and 75th percentiles and whiskers denote minima and maxima (1.5× the interquartile range). n = 2 biological replicates. CPM, read counts per million mapped reads. c, Scatter plot showing counts of STARR-seq peaks and −log10(P value) for all repeats at the subfamily level in the indicated cell lines. Dots, repeat subfamilies; large dots, L1 subfamilies; color, repeat classes. P, hypergeometric test. d, Aggregated line plots showing STARR-seq signals over full-length L1s at the subfamily level in the indicated cell lines. e, Heatmaps of STARR-seq signals over full-length L1s in the indicated cell lines, sorted and aligned as in Fig. 2d. f, Heatmap showing the proportion of full-length and non-full-length L1s that exhibit STARR-seq signals in the six indicated human cell lines. The L1 subfamilies are classified into two groups according to their length: longer than 6 kb (full-length) or shorter (non-full-length). While nearly all full-length L1s contain the L1 5′ UTR and overlap with STARR-seq signals, some non-full-length L1s (such as L1P2 and L1PBa1) also contain the L1 5′ UTR and overlap with STARR-seq peaks. g, Enrichment of DNA-binding motifs at the full-length L1s with STARR-seq signals in K562, with the full-length L1s that do not overlap STARR-seq peaks as control. Plotted are the log2-transformed fold enrichment and log10-transformed P value of each protein-binding motif (hypergeometric test). The enriched motifs for transcription factors are specified. Zf, zinc finger. h, Analysis of the ChIP-Atlas data showing that transcription factors bind full-length L1s. Colored circles represent log2 fold enrichment, with areas proportional to Q value (hypergeometric test). For a single protein, log2 fold enrichment is calculated using the median value from different cells or experiments. Q values are P values adjusted with the Benjamini–Hochberg method.
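
The enrichment statistics named in panels c, g and h (log2 fold enrichment, hypergeometric P value, Benjamini–Hochberg adjustment) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of background set and all counts are hypothetical, and the one-sided upper-tail test is an assumption, since the legend does not state the tail.

```python
from math import comb, log2

def hypergeom_sf(k, N, K, n):
    """Upper-tail P(X >= k): draw n items without replacement from N,
    of which K are 'hits' (e.g. elements carrying a motif)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def log2_fold_enrichment(k, N, K, n):
    """log2 of (hit fraction in the test set) / (hit fraction in the background)."""
    return log2((k / n) / (K / N))

def benjamini_hochberg(pvals):
    """Return BH-adjusted values (Q values) in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical counts: 100 L1s in the background, 10 with the motif;
# 20 L1s overlap STARR-seq peaks, 4 of them with the motif.
p = hypergeom_sf(4, 100, 10, 20)
lfe = log2_fold_enrichment(4, 100, 10, 20)
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For panel h, where one protein has several experiments, the per-protein value would be the median of the individual log2 fold enrichments (for example with `statistics.median`), as the legend describes.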
