Extended Data Fig. 2: Additional analysis using the fine-tuned CNN model and Z-DNABERT, related to Fig. 1. | Nature

From: AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization

a, Contribution score profiles for AIRE-induced genes whose largest-positive-gradient regions contained (CA)n repeats (left) or NFE2–MAF-binding motifs (right). b, Motifs enriched in the regions (50 bp in length) with the largest ISM scores. c, Motifs that are more enriched in the extended-promoter sequences of AIRE-induced genes than in those of AIRE-neutral genes (E-value < 0.05). The MEME suite was used to identify the enriched motifs in panels b and c. d, Example ISM score heatmaps for (CA)n repeats in AIRE-induced gene promoters. Each of the three rows shows the results for one possible substitution, in the order A > C > G > T from top to bottom. Red (positive ISM score) indicates that substituting the original nucleotide decreases the average Z-DNA score across the stretch of the (CA)n repeat; blue (negative ISM score) indicates that the substitution increases it. e, Boxplot showing the distribution of ISM scores at various positions near the boundaries of (CA)n repeats in AIRE-induced gene promoters. For example, position 2 indicates the second nucleotide from each end of a (CA)n repeat. P values for panel e were calculated using the one-sample Wilcoxon signed-rank test (one-tailed). AIG, AIRE-induced gene.
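
The ISM (in silico mutagenesis) scoring described for panel d can be illustrated with a minimal sketch. This is not the authors' pipeline: `toy_z_score` is a hypothetical stand-in for a Z-DNA predictor such as Z-DNABERT (it simply counts alternating purine/pyrimidine steps), and `ism_scores` is an illustrative helper name. The sign convention and row order follow the legend: a positive score means the substitution lowers the average score, and the three rows hold the alternative nucleotides in the order A > C > G > T.

```python
from typing import Callable, List

NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}


def toy_z_score(seq: str) -> float:
    """Hypothetical stand-in for a Z-DNA predictor (assumption, not Z-DNABERT):
    the fraction of adjacent nucleotide pairs that alternate purine/pyrimidine."""
    if len(seq) < 2:
        return 0.0
    alternating = sum((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))
    return alternating / (len(seq) - 1)


def ism_scores(seq: str, score_fn: Callable[[str], float] = toy_z_score) -> List[List[float]]:
    """Return a 3 x len(seq) matrix of ISM scores.

    Column i holds the three possible substitutions at position i, listed in
    the order A > C > G > T with the original nucleotide skipped.
    Score = original score - mutated score, so a positive value means the
    substitution decreases the predicted score.
    """
    base = score_fn(seq)
    rows: List[List[float]] = [[], [], []]
    for i, ref in enumerate(seq):
        alternatives = [n for n in NUCLEOTIDES if n != ref]
        for row, alt in zip(rows, alternatives):
            mutated = seq[:i] + alt + seq[i + 1:]
            row.append(base - score_fn(mutated))
    return rows


if __name__ == "__main__":
    # A short (CA)n repeat: every step alternates pyrimidine/purine.
    for row in ism_scores("CACACACA"):
        print(" ".join(f"{value:+.2f}" for value in row))
```

With this toy scorer, substitutions that keep the purine/pyrimidine alternation (for example C to T) score zero, whereas those that break it score positive, which mirrors the red cells described in the legend. A real analysis would replace `toy_z_score` with the model's averaged Z-DNA prediction over the repeat.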
