Fig. 2: Characteristic patterns in the pan-tandem repeat (TR) dataset.

a Distribution of TR types. The inner pie chart shows the proportions of short TRs (STRs, red) and variable number TRs (VNTRs, blue) in the pan-TR dataset. The outer pie chart shows the proportions of TRs present in (dark green) and absent from (light green) the Nipponbare reference genome. b Statistics summarizing TR copy number differences between the major allele and the reference allele. Red and blue dots indicate STRs and VNTRs, respectively. c Distribution of repeat motif lengths across TR loci. d Distribution of the number of alleles per TR locus. e Distribution of major allele frequencies across TR loci. The dashed line indicates a major allele frequency of 0.5. f Distribution of distances from genetic variants to the nearest transcription start site (TSS). Each color indicates a variant type; overlapping curves indicate that the variant types have similar distributions. g Distribution of genomic variants along each chromosome. p-values test for differences in distribution between TRs and other bi-allelic variants (Wilcoxon rank-sum test). h Distribution of linkage disequilibrium (LD) values between TRs and bi-allelic variants within 100 kb. LD was calculated as the absolute value of the pairwise Pearson correlation coefficient (|R|). For each TR, the maximum |R| with adjacent variants on either side is recorded. The dashed lines indicate |R| = 0.30 and |R| = 0.70. Source data are provided as a Source Data file.
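The per-TR LD summary in panel h can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that each TR genotype is encoded as one numeric value per sample (for example, a copy-number dosage), that bi-allelic variants are encoded as 0/1/2 allele dosages over the same samples, and that positions are given in base pairs. The function names `pearson_r` and `max_abs_ld` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if either vector has zero variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # monomorphic site: correlation undefined
    return sxy / math.sqrt(sxx * syy)

def max_abs_ld(
    tr_pos: int,
    tr_genotypes: Sequence[float],
    variants: List[Tuple[int, Sequence[float]]],
    window: int = 100_000,
) -> Optional[float]:
    """Maximum |R| between one TR and the bi-allelic variants within +/- window bp.

    Assumption: genotypes are numeric per-sample values aligned across all sites.
    Returns None if no informative variant lies within the window.
    """
    best: Optional[float] = None
    for pos, dosages in variants:
        if abs(pos - tr_pos) > window:
            continue  # outside the window on either side of the TR
        r = pearson_r(tr_genotypes, dosages)
        if r is None:
            continue
        if best is None or abs(r) > best:
            best = abs(r)
    return best

if __name__ == "__main__":
    tr = [2, 3, 3, 4]  # hypothetical TR copy numbers in four samples
    variants = [
        (11_000, [0, 1, 1, 2]),   # within window, perfectly correlated
        (210_000, [2, 1, 1, 0]),  # > 100 kb away, ignored
        (9_500, [0, 0, 2, 2]),    # within window, |R| ~ 0.71
    ]
    print(max_abs_ld(10_000, tr, variants))
```

With these toy inputs, the variant 200 kb away is skipped and the function reports the strongest correlation among the remaining variants. Thresholds such as 0.30 and 0.70 could then be applied to the returned values to bin TRs, as the dashed lines in panel h suggest.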