A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation

Kamitaki, Nolan; Hujoel, Margaux L. A.; Mukamel, Ronen E.; Gebara, Edward; McCarroll, Steven A.; Loh, Po-Ru

doi:10.1038/s41588-024-01841-4

Letter
Open access
Published: 24 July 2024

Nature Genetics volume 56, pages 1583–1591 (2024)

Abstract

Retrotransposons comprise about 45% of the human genome¹, but their contributions to human trait variation and evolution are only beginning to be explored^2,3. Here, we find that a sequence of SVA retrotransposon insertions in an early intron of the ASIP (agouti signaling protein) gene has probably shaped human pigmentation several times. In the UK Biobank (n = 169,641), a recent 3.3-kb SVA insertion polymorphism associated strongly with lighter skin pigmentation (0.22 [0.21–0.23] s.d.; P = 2.8 × 10⁻³⁵¹) and increased skin cancer risk (odds ratio = 1.23 [1.18–1.27]; P = 1.3 × 10⁻²⁸), appearing to underlie one of the strongest common genetic influences on these phenotypes within European populations^4,5,6. ASIP expression in skin displayed the same association pattern, with the SVA insertion allele exhibiting 2.2-fold (1.9–2.6) increased expression. This effect had an unusual apparent mechanism: an earlier, nonpolymorphic, human-specific SVA retrotransposon 3.9 kb upstream appeared to have caused ASIP hypofunction by nonproductive splicing, which the new (polymorphic) SVA insertion largely eliminated. Extended haplotype homozygosity indicated that the insertion allele has risen to allele frequencies up to 11% in European populations over the past several thousand years. These results indicate that a sequence of retrotransposon insertions contributed to a species-wide increase, then a local decrease, of human pigmentation.

Main

Variation in skin pigmentation has profoundly influenced human evolution and social history, enabling Homo sapiens to adapt to environments with diverse levels of solar radiation. Agouti signaling protein (ASIP) is a secreted protein that plays a key role in skin and hair pigmentation by binding to a receptor (melanocortin 1 receptor (MC1R)) on the surface of melanocytes, causing them to shift melanin pigment production from darker, brown eumelanin to lighter, red pheomelanin⁷. Across vertebrates, regulated increases in expression of ASIP decrease pigmentation temporally and in different parts of the body⁸. In humans, the ASIP-MC1R pathway is affected by several of the largest influences of common genetic variation on skin and hair pigmentation, including several common missense variants in MC1R^9,10.

Genome-wide association studies (GWAS) in European genetic-ancestry cohorts for pigmentation-related traits, including skin cancers such as melanoma, have long observed a particularly strong association near the ASIP gene^4,5,6 that colocalizes with an expression quantitative trait locus (eQTL) for ASIP¹¹. However, a plausible functional variant for this common, large effect (>0.2 s.d. change in pigmentation phenotypes) has not been identified, despite the considerable statistical resolution afforded by large biobank cohorts¹². GWAS of African cohorts (from Ethiopia, Tanzania, Botswana or KhoeSan populations)^13,14 or East Asian cohorts (from Japan or Korea)^15,16 have not found an association at this locus, suggesting that the functional variant emerged recently and on a European-specific haplotype, consistent with a genome-wide scan of recent positive selection in a British cohort that identified the ASIP locus among other pigmentation-associated genes¹⁷.

Across mammals, structural mutation at ASIP is a recurring mechanism underlying variation in coat color. Changes in coat color have occurred by large rearrangements at ASIP in lethal yellow agouti mice (A^y)¹⁸, sheep¹⁹ and gibbons²⁰, and by retrotransposon insertions in viable yellow agouti mice (A^vy)²¹ and dogs²². However, such polymorphisms have not been reported for human ASIP.

To identify structural variation at ASIP that could underlie genetic associations with pigmentation, we examined long-read sequence assemblies generated by the Human Genome Structural Variation Consortium²³ (HGSVC2; n = 32). The single individual that was heterozygous for the light-pigmentation/cancer-risk haplotype (NA12329) was also heterozygous for a large, 3.3-kb structural variant in an intron of ASIP (Fig. 1a,b) previously inferred from short-read sequencing analyses^24,25. Optical mapping data (Bionano) confirmed this insertion as the only large structural variant within 500 kb of ASIP carried by NA12329. The variant overlaps a SINE-VNTR-Alu (SVA) element annotated by RepeatMasker²⁶ (in antisense orientation relative to ASIP transcription) at chr20:34228123–34231419 in the GRCh38 reference, suggesting that the human genome (GRCh38) reference sequence has an allele with a recent, polymorphic SVA insertion that is in fact absent from most ASIP haplotypes. SVA elements are an active, recent family of retrotransposons unique to great apes, with the E, F and F₁ subfamilies specific to humans²⁷. Sequence evidence suggests that this SVA is in the youngest SVA F₁ subfamily^28,29, as it lacks a key 5′ (CCCTCT)n hexameric repeat and Alu-like elements and also contains 92 bp matching the MAST2 exon 1 that was 5′-transduced into the subfamily’s source element. Notably, this polymorphic SVA is 3.9 kb downstream of (and in the opposite orientation to) another, nonpolymorphic 1.6-kb SVA F retrotransposon within the same intron of ASIP (Fig. 1b).

**Fig. 1: Characterization of a polymorphic SVA F₁ retrotransposon insertion within an intron of *ASIP*.**

To facilitate deeper analysis of this variant in larger, phenotyped cohorts, we devised a strategy for ascertaining individual-level genetic states (genotypes) from short-read whole-genome sequencing (WGS) alignment patterns specific to each allele (Fig. 1c; Methods). Applying this approach to high-coverage WGS of 1000 Genomes Project (1KGP) samples³⁰ demonstrated good separation of genotype clusters (Fig. 1d). Across 1KGP population samples, the SVA F₁ insertion exhibited the highest allele frequencies (7–8%) in the northwest European (GBR and CEU) population samples and was not detected in (nonadmixed) samples of a variety of populations from Africa and Asia (Fig. 1e). In CEU and GBR population samples, the SVA F₁ insertion was in strong linkage disequilibrium (LD) (r² = 0.93) with the pigmentation-associated index single nucleotide polymorphism (SNP) rs6059655, and LD with other SNPs spanned a ~5-Mb extended haplotype (Fig. 1f).

We applied the same SVA genotyping approach to WGS data available for 169,641 unrelated White individuals of European genetic ancestry in the deeply phenotyped UK Biobank (UKB) cohort^31,32 (Fig. 2a), finding the allele frequency of the insertion to be 11%. We estimated the accuracy of genotyping to be r² ≈ 0.997 based on the concordance of genotype calls across sibling pairs sharing both ASIP alleles identically by descent (IBD2) (Fig. 2b). Across pigmentation traits including skin color, hair color and tanning response, the SVA F₁ insertion associated more strongly with lighter pigmentation (0.22 [95% confidence interval (CI):0.21–0.23], 0.24 [0.23–0.25] and 0.27 [0.26–0.28] s.d.; P = 2.8 × 10⁻³⁵¹, 2.0 × 10⁻³⁹⁶ and 1.5 × 10⁻⁵²³, respectively) than did all other SNP and indel variants in the region, explaining 0.9–1.4% of trait variance (Fig. 2c,d and Extended Data Figs. 1a,b and 2a). Likewise, the SVA F₁ insertion associated more strongly with increased skin cancer risk (odds ratio (OR) = 1.23 [1.18–1.27]; P = 1.3 × 10⁻²⁸) than did any nearby variant (Fig. 2e,f and Extended Data Figs. 1c,d and 2b). In a joint analysis (of both the SVA F₁ insertion and the lead SNP rs6059655) for tanning response (the pigmentation trait with the strongest association at the locus), the SVA F₁ insertion remained significantly associated (P = 6.9 × 10⁻³¹), whereas the lead SNP did not (P = 0.56). Fine-mapping analysis using SuSiE³³ also selected the SVA as the only member of a single credible set. Conditional association analyses including the SVA as a covariate further suggested that the SVA F₁ insertion almost completely accounted for pigmentation associations at the locus: residual signal was only 1–2% as strong (Extended Data Fig. 1e–j).

**Fig. 2: Association of SVA F₁ insertion with pigmentation phenotypes in UKB.**

As SNPs on this haplotype have been observed to associate with ASIP expression levels in skin¹¹, we next asked whether this insertion is the likely cause of this effect on ASIP expression. Genotyping the SVA F₁ insertion in WGS data available for tissue donors of the Genotype-Tissue Expression (GTEx) Project³⁴ (Extended Data Fig. 3) showed that, as with the pigmentation associations, the insertion associated strongly with ASIP expression in both sun-exposed (SE) and not sun-exposed (NSE) skin (P = 3.5 × 10⁻¹⁷ and 1.3 × 10⁻²¹, respectively), and that the insertion appeared to account for most of the eQTL signal at the locus (Fig. 3a–d and Extended Data Fig. 4). The SVA insertion associated with a 2.2-fold (1.9–2.6) increase in ASIP expression in NSE skin. Closer examination of RNA sequencing (RNA-seq) read alignments at ASIP showed substantial RNA-seq coverage at several alternative first exons as well as within introns (Fig. 3e). Whereas the SVA F₁ insertion associated with broadly increased expression across all exons, it associated with decreased abundance of unspliced transcripts containing intronic sequence upstream of the SVA (Fig. 3f).

**Fig. 3: Association of SVA F₁ insertion with expression of *ASIP* exons and introns in skin.**

We therefore hypothesized that the SVA F₁ insertion increases ASIP expression by improving splicing fidelity (and thus reducing the ascertainment of unspliced transcripts). To test this idea, we analyzed all the ASIP splice junctions observed in GTEx skin samples (reported by LeafCutter³⁵). One of the more frequent anomalous splice events involved splicing from an ASIP 5′ untranslated region (UTR) exon to a computationally predicted splice acceptor (SpliceAI³⁶ acceptor probability of 0.51) within the nonpolymorphic SVA F element that resides in the same intron (in opposite orientation) as the polymorphic SVA F₁ insertion (Fig. 4a,b). In heterozygotes for the SVA F₁ insertion, the relative frequency of splicing into the SVA F element, rather than the downstream coding exon, decreased from 14.1% to 3.1% in SE skin and 15.6% to 5.6% in NSE skin (P = 1.6 × 10⁻²¹ and 1.0 × 10⁻¹¹, respectively; Fig. 4c,d). In three GTEx donors homozygous for the SVA F₁ insertion, no evidence of splicing into the SVA F element was observed. As with the expression QTL, the SVA F₁ insertion seemed to explain this splicing QTL signal (Extended Data Fig. 5). Scanning downstream to determine the fate of these aberrant transcripts revealed a termination point of the intronic read alignments, at which several reads ended in poly(A) sequences (Fig. 4a,b), concordant with a polyadenylation site predicted by APARENT³⁷ (Fig. 4b). These observations together indicate that the aberrant transcripts spliced into the upstream SVA F element are terminated to yield a noncoding transcript, and that the presence of the polymorphic SVA F₁ insertion inhibits the production of such transcripts while increasing the production of ASIP-coding transcripts. Because translation stabilizes transcripts, this analysis may underestimate the relative amount of noncoding transcript being produced, as suggested by the fold increase in expression. 

We propose that the polymorphic SVA F₁ element, inserted in inverse orientation relative to the upstream SVA F element, forms (together with the upstream SVA F) a hairpin structure that blocks the function of splice enhancers and/or splice acceptor sequences in the SVA F element, ensuring productive splicing to ASIP-coding exons downstream of the SVA insertions (Fig. 4e). This model resembles recent reports of inverted pairs of Alu elements modulating splicing via formation of an RNA hairpin^3,38.

**Fig. 4: Aberrant splicing and early polyadenylation of *ASIP* transcripts from haplotypes without SVA F₁ insertion.**

These results led us to hypothesize that ancient ASIP alleles predating both SVA insertions were spliced much like the present-day derived haplotype that contains both SVAs. Human genetic data do not enable assessment of this question because the upstream SVA F element, which is human-specific (Extended Data Fig. 6a), has reached fixation in present-day human populations (Extended Data Fig. 6b). We therefore used an in vitro construct to verify that the SVA F element can function as a splicing acceptor when inserted in the hybrid intron of the CAG promoter³⁹ (Extended Data Fig. 7a; Methods), similar to SVA splicing constructs from previous work²⁶. Insertion of the SVA F element caused approximately 11% of transcripts to splice into the SVA at the same aberrant acceptor site as in ASIP (Extended Data Fig. 7b–d), consistent with the idea that the ancient human (no longer polymorphic) SVA insertion had reduced productive splicing of ASIP.

In contrast to the nonpolymorphic SVA F element, the polymorphic SVA F₁ insertion allele (the reference, minor allele) was detected only in population samples with European genetic ancestry. Furthermore, this ASIP-expression-increasing, pigmentation-decreasing SVA F₁ insertion allele exhibited long-range (>3 Mb) LD—generally a property of recent mutations—on European haplotypes (Fig. 1f and Fig. 5a,b), whereas such long-range LD was not observed in non-European population samples (Fig. 5c). Analysis of haplotype genealogies in 1KGP CEU and GBR population samples (Methods) dated the insertion at 16,400–21,800 years ago and 14,300–25,400 years ago, respectively (Extended Data Fig. 8). These lines of evidence suggest that this SVA insertion has increased quickly in allele frequency relative to other European haplotypes at the ASIP locus, reaching up to 11% allele frequency in some European populations in a short period of time.

**Fig. 5: Recent selection for the SVA F₁ insertion haplotype in ancestral European populations.**

The timing and effects of these two retrotransposon insertions at ASIP seem broadly consistent with early and recent changes in human pigmentation. The ancient SVA F insertion—which, despite being relatively young within the SVA F family (Methods), is present in Neanderthal genomes^40,41,42,43 (Supplementary Figs. 1 and 2)—probably contributed to increases in pigmentation early in human evolution, potentially helping to enable humans’ concomitant loss of body hair. As modern humans later migrated around the world, pigmentation-lightening alleles appear to have emerged in several settings with reduced exposure to ultraviolet light^44,45. The much more recent SVA F₁ insertion seems to have contributed to decreased pigmentation in a subset of individuals within ancestral European populations, while also leading to an increase in sunburn frequency (OR = 1.34 [1.30–1.37]; P = 1.4 × 10⁻¹⁰⁵) and skin cancer risk (Fig. 5d).

ASIP thus appears to provide an example of how a sequence of retrotransposon insertions at a single locus can modulate phenotype several times in a species’ recent history. The fact that the effect of such a common, large polymorphism (3.3 kb) could remain unnoticed for 15 years (even after recent advances in retrotransposon association analysis²) speaks to the importance of fully integrating structural variants into genetic association analyses. On evolutionary timescales, expression of ASIP and its homologs seems to have been modulated primarily by structural mutations that caused pigmentation changes in diverse species^18,19,20,21,22. This raises the possibility that some loci are prone, across tens of millions of years, to evolve via structural variation.

The causal mechanism we have proposed—in which the recent SVA F₁ insertion acts by mitigating a splicing hypofunction introduced by an ancient SVA F insertion—should be testable by future experiments that insert the SVA F₁ element into cell types that express ASIP. Increasingly available single-cell RNA-seq data may help identify the responsible cell type, which previous work has suggested could be fibroblasts or melanocyte precursors in the dermis^11,46. While a recent study found that ectopic expression of ASIP (due to a rare ITCH–ASIP gene fusion) yields a monogenic phenotype including obesity, overgrowth and light pigmentation⁴⁷, the common SVA insertion polymorphism had little-to-no association with anthropometric traits in UKB (Extended Data Fig. 9) despite ample power to find associations, and its effect on ASIP expression (in GTEx) was observed only in skin and tibial nerve tissues (Fig. 3 and Extended Data Figs. 4 and 5) (possibly reflecting shared descent of melanocytes and Schwann cells from a common neural crest-derived progenitor⁴⁸). More broadly, these results highlight the potential for integrative analyses of newly available genetic data resources to yield new insights, even in loci that have been well studied.

Methods

HGSVC2 genetic data

PacBio long-read assembly contigs and Bionano structural variant calls for individuals in HGSVC2 were downloaded from the 1KGP FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/release/v1.0/assemblies/ and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/working/20200219_Bionano_optical_map_SVs/). Assembled sequences around ASIP were extracted by building each set of haploid contigs (h1 and h2) into a BLAST database and using a region from ASIP lacking repetitive sequences (chr20:34224248–34225005 in GRCh38) as a search query for BLASTN (v.2.12.0). Dotplots were generated with FlexiDot (v.1.06)⁵⁰.

1KGP genetic data

High-coverage data for 2,504 unrelated 1KGP individuals³⁰ was sliced to obtain paired reads mapping to the genomic interval chr20:34221626–34232418. To genotype the SVA F₁ element present in the reference genome at chr20:34228123–34231419, we counted (1) discordant read pairs aligned within a window extending 1 kb in each direction from the SVA F₁ with template length (TLEN) exceeding 2.5 kb (indicative of the presence of the major allele not containing the SVA F₁); and (2) reads spanning the right breakpoint (chr20:34231418) (indicative of the minor allele with the SVA F₁). Individuals were genotyped as homozygous for having the SVA F₁ insertion (Hom-INS) if the number of reads overlapping the right breakpoint (n_INS) was greater than three times the number of reads with TLEN >2.5 kb (n_ANC), that is, n_INS > 3n_ANC. Individuals were genotyped as homozygous for not having the insertion (Hom-ANC) if the number of reads overlapping the right breakpoint was <0.25× the number of reads with TLEN >2.5 kb, that is, n_INS < 0.25n_ANC. The remaining individuals were genotyped as heterozygous for the insertion. This strategy was necessitated by low mappability throughout the SVA F₁ element, which precluded read-depth analysis in the region. Similarly, to search for individuals that could in theory lack the upstream SVA F element at chr20:34222626–34224238, we counted (1) discordant read pairs aligned within a window extending 1 kb in each direction from the SVA F with TLEN exceeding 1.25 kb (which would suggest the presence of an allele not containing the SVA F); and (2) reads spanning the right breakpoint (chr20:34224238) (indicative of the expected SVA F sequence).
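The 1KGP thresholding rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `call_sva_genotype` is hypothetical, and the two read counts are assumed to have been tallied from the alignments beforehand.

```python
def call_sva_genotype(n_ins: int, n_anc: int) -> str:
    """Call the SVA F1 genotype from two read counts (1KGP thresholds).

    n_ins: reads spanning the right breakpoint (support the insertion allele).
    n_anc: discordant read pairs with TLEN > 2.5 kb (support the allele
           without the insertion).
    """
    if n_ins > 3 * n_anc:       # n_INS > 3 n_ANC
        return "Hom-INS"
    if n_ins < 0.25 * n_anc:    # n_INS < 0.25 n_ANC
        return "Hom-ANC"
    # Everything else is heterozygous. Note that under the literal rule a
    # sample with no informative reads (0, 0) also lands here; how such
    # samples were handled is not stated in the text.
    return "Het"


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    for counts in [(20, 0), (0, 12), (10, 9)]:
        print(counts, call_sva_genotype(*counts))
```

The comparisons are strict, following the inequalities as written, so a sample exactly on a boundary (for example n_INS = 3 and n_ANC = 1) is called heterozygous.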

Pairwise LD plots (Fig. 5c) were generated for each of the European, East Asian and African genetic ancestry superpopulations by first extracting variants in the region chr20:32.5–37.5 Mb (GRCh38 coordinates) with population minor allele frequency (MAF) >1% with bcftools (v.1.14)⁴⁸. ASW and ACB populations were excluded from the African genetic ancestry set to avoid selecting variants that would have excessively long linkage due to recent admixture. For each superpopulation, 13,000 variants were then sampled randomly to yield even density in plotting, and pairwise r² matrices were computed with plink1.9 (v.1.90b6.26)⁵¹.

UKB genetic and phenotype data

Genotyping of the SVA F₁ insertion polymorphism was performed on read alignments at ASIP extracted from WGS data available for 199,956 UKB participants³². The same overall genotyping approach as above was used for UKB, with the slight modification that individuals were genotyped as Hom-INS if the read categories from before satisfied n_INS > 10 + 3n_ANC and as Hom-ANC if n_INS < 2 + 0.2n_ANC. Different linear separators were used here based on observed differences in the relative presence of n_INS and n_ANC for each genotype, presumably due to slight differences in sequencing and alignment parameters (for example, average coverage, fragment length, bwa-mem options). In this much larger sample, a third genotype cluster with few discordant read pairs and many reads overlapping the right breakpoint (indicating homozygosity for the SVA F₁ insertion allele) became clearly defined. Because the UKB data set contains several hundred sibling pairs that share both haplotypes IBD2 in this region (based on at most three mismatching SNP-array genotypes within a 2-Mb window centered at ASIP, computed using plink1.9 --genome), we could correlate genotype calls made between these sibling pairs to estimate genotyping accuracy.
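As a rough illustration of the sibling-based accuracy check, the sketch below computes the Pearson correlation between genotype dosages (0, 1 or 2 insertion alleles) called in the two members of each IBD2 sibling pair. The function name and the toy dosages are hypothetical. The text does not state whether the reported accuracy is this correlation or its square, so the sketch returns the plain correlation and leaves that interpretation open.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of dosages."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical dosages for six IBD2 sibling pairs (sibling A, sibling B).
    sib_a = [0, 0, 1, 2, 1, 0]
    sib_b = [0, 0, 1, 2, 0, 0]  # one discordant call in the fifth pair
    print(round(pearson_r(sib_a, sib_b), 3))
```

Because IBD2 siblings carry identical genotypes at the locus, any departure of the correlation from 1 reflects genotyping error rather than true genetic difference.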

From these 199,956 individuals, we first removed 11,953 who did not report White ethnicity. A further 18,027 individuals were then filtered to remove outliers (>6 s.d.) across the first ten genetic ancestry principal components (PCs) and to select one individual from each first- or second-degree related pair, as described⁵². An additional 335 individuals who had WGS but were not present in the imp_v3 imputed genotype dataset³¹ were then excluded. This set of 169,641 individuals was then used for both genome-wide association analyses with BOLT-LMM (v.2.4.1)⁵³ and local association analyses with plink2 (v.2.00a3.7; Intel AVX2)⁵¹ (see below). Phenotypes of self-reported pigmentation traits (tanning ability, skin color, hair color, childhood sunburn frequency) were obtained from UKB touchscreen questionnaire datafields, adjusting the coding of hair color to assign red hair a value of 1 and blonde hair a value of 2 to better follow the order corresponding to increasing eumelanin:pheomelanin ratio⁵⁴ and binarizing the sunburn phenotype to distinguish individuals with no instances of severe childhood sunburns from those with at least one instance. Phenotypes of anthropometric traits (height, body mass index (BMI), waist-hip ratio adjusted for BMI) were processed as previously described⁵⁵. Phenotypes of skin cancer diagnoses (derived from cancer registry data, accessed 26 October 2022) were taken from the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) codes C43 (melanoma), C44 (nonmelanoma) and the union of the two for all skin cancers (C43 + C44).

UKB: local association analyses at ASIP

VCF files containing genotype calls from UKB WGS were processed by first splitting multiallelic variants into separate biallelic variants with bcftools. Association analyses were performed on variants with MAF > 0.001 with plink2 (v.2.00a3.7; Intel AVX2), using linear regression for quantitative traits (tanning ability, skin color, hair color, height, BMI and waist-hip ratio adjusted for BMI) and logistic regression for binary skin cancer traits (C43, C44 and C43 + C44) with standard covariates (age, age squared, sex, 20 genetic PCs and assessment center). LD with the SVA F₁ (r²) was computed with plink1.9 (v.1.90b6.26).

UKB: genome-wide association analyses

Genome-wide association analyses of skin color, tanning ability and any skin cancer (C43 + C44) were performed on imputed variants (imp_v3) for the same 169,641 individuals using linear regression with BOLT-LMM (v.2.4.1). The covariates listed above, together with genotyping array, were included, and variants were filtered to MAF > 0.001 and INFO > 0.3. Manhattan plots were generated from variants with P < 0.01 using the qqman (v.0.1.8) package⁵⁶.

UKB: fine-mapping of phenotype associations

The 5,000 top associating variants from the region chr20:32.5–37.5 Mb (GRCh38) were extracted from VCF files containing genotype calls from UKB WGS (again splitting multiallelic variants into separate biallelic variants with bcftools). Standard covariates (age, age squared, sex, 20 genetic PCs and assessment center) were regressed out from both phenotype and genotypes and provided as input to the susieR (v.0.12.35) package³³ for fine-mapping (allowing up to L = 10 causal variables).

UKB: measuring extended haplotype homozygosity

SVA F₁ diploid genotypes were phased onto a prephased scaffold of SNP-array genotyped variants for the same 169,641 individuals using the phase_common tool from SHAPEIT5 (v.5.1.1)⁵⁷. The SNP-array genotyped variants were then matched with variants present in the low-coverage 1KGP3 (ref. ⁵⁸) VCF, for which ancestral and derived alleles had been annotated previously. These phased haplotypes were then used to measure extended haplotype homozygosity (EHH) and generate bifurcation plots comparing haplotypes with and without the SVA F₁ insertion with the rehh (v.3.2.2) package⁵⁹.
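For readers unfamiliar with EHH, the sketch below implements the textbook definition: among haplotypes carrying a given core allele, EHH at a marker is the fraction of haplotype pairs that are identical across every site from the core to that marker. It is a toy illustration and not the rehh implementation; the function name and the haplotype encoding (strings of '0'/'1') are assumptions made for the example.

```python
from collections import Counter


def ehh(haplotypes, core_index, core_allele, marker_index):
    """EHH of the haplotypes carrying `core_allele`, from core to marker.

    haplotypes: sequences of alleles (e.g. strings of '0'/'1'), one per
                phased chromosome, all covering the same ordered sites.
    """
    lo, hi = sorted((core_index, marker_index))
    # Keep only carriers of the core allele, restricted to the interval.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes
                if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    # Count pairs of carriers that are identical over the whole interval.
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(carriers).values())
    return identical_pairs / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Toy haplotypes; site 0 is the core (1 = insertion allele).
    haps = ["1000", "1000", "1010", "1011", "0111"]
    for marker in range(4):
        print(marker, ehh(haps, 0, "1", marker))
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the shared haplotype; a slow decay on insertion-carrying haplotypes relative to noncarriers is the pattern interpreted as evidence of a recent rise in frequency.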

1KGP: estimation of SVA F₁ insertion age

Variants present in the low-coverage 1KGP3 VCF were first recoded with REF allele as the ancestral allele. SVA F₁ diploid genotypes were then phased onto these scaffolds using the phase_common tool from SHAPEIT5. These phased haplotypes were then used as input to Relate (v.1.2.1)⁶⁰ with 1KGP3 genomic mask to filter low mappability regions (20140520.chr20.pilot_mask.fasta.gz), modified to include the SVA F₁ position itself. The coalescence rates provided by Relate for each 1KGP subpopulation were used.

Estimation of SVA F insertion age

We compared the sequence of the ASIP SVA F with the sequences of other SVA F elements in the GRCh38 reference, which could in theory allow estimation of insertion time based on the numbers of observed base pair differences within the Alu-like and SINE-R SVA sequence elements (chr20:34222672–34223023 and chr20:34223717–34224222 at ASIP) and the de novo mutation rate. We ultimately concluded that these sequence elements were insufficiently long to accumulate enough mutations to allow a precise date estimate. However, we discovered that the most closely related SVA F (chr2:169785008–169787441, matching 349 of 352 bp in the Alu-like region and 505 of 506 bp in the SINE-R region) is commonly polymorphic⁶¹. This suggests that, despite being fixed in modern humans and archaic hominins, the ASIP SVA F has probably been active relatively recently and may be a relatively younger SVA in the SVA F family (which is estimated to be ~3 million years old⁶²).

GTEx genetic and expression data

Genotyping of the SVA F₁ insertion polymorphism was performed as above on read alignments at ASIP from 838 donors with WGS available in the GTEx v.8 release. Because the separation of genotype clusters was less visually clear in GTEx WGS using the criteria above, the read groups informative of the two alleles were redefined more strictly as (1) discordant read pairs in which one read aligns before the SVA F₁ left breakpoint (POS < 34228123), the other aligns after the right breakpoint (PNEXT > 34231419), and the TLEN exceeds 2.5 kb (indicative of the presence of the major allele not containing the SVA F₁) and (2) reads aligning at least 5 bp on each side of the right breakpoint and lacking any soft-clipping (indicative of the minor allele with the SVA F₁). Using these criteria, samples with 0 reads with TLEN >2.5 kb were called as homozygous for the SVA F₁ insertion, and samples with fewer than three reads overlapping the right edge were called as homozygous for no SVA F₁ insertion.
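The stricter GTEx read criteria can be sketched as follows. The field names follow the SAM columns mentioned above (POS, PNEXT, TLEN), but the function names are hypothetical. Two details are assumptions, because the text does not specify them: how the 5 bp of flanking alignment is counted relative to the breakpoint coordinate, and what to do when a sample satisfies both homozygous rules at once (flagged here as ambiguous).

```python
SVA_F1_LEFT = 34228123    # left breakpoint (GRCh38, chr20)
SVA_F1_RIGHT = 34231419   # right breakpoint (GRCh38, chr20)


def supports_ancestral(pos: int, pnext: int, tlen: int) -> bool:
    """Read pair flanking the SVA F1 with template length > 2.5 kb."""
    return pos < SVA_F1_LEFT and pnext > SVA_F1_RIGHT and tlen > 2500


def supports_insertion(ref_start: int, ref_end: int, cigar: str,
                       min_flank: int = 5) -> bool:
    """Read aligned across the right breakpoint without soft-clipping.

    ref_start/ref_end are 1-based inclusive alignment coordinates.
    Assumption: bases at or left of SVA_F1_RIGHT count as the left flank.
    """
    if "S" in cigar:
        return False
    left_flank = SVA_F1_RIGHT - ref_start + 1
    right_flank = ref_end - SVA_F1_RIGHT
    return left_flank >= min_flank and right_flank >= min_flank


def call_gtex_genotype(n_ins: int, n_anc: int) -> str:
    """Apply the GTEx rules to the counts of the two read groups."""
    hom_ins = n_anc == 0   # no reads with TLEN > 2.5 kb
    hom_anc = n_ins < 3    # fewer than three breakpoint-spanning reads
    if hom_ins and hom_anc:
        return "ambiguous"  # assumption: precedence is not stated in the text
    if hom_ins:
        return "Hom-INS"
    if hom_anc:
        return "Hom-ANC"
    return "Het"
```

Requiring the mate to align beyond the right breakpoint and rejecting soft-clipped reads trades some read depth for cleaner separation of the two alleles, which fits the stated motivation that clusters were less clearly separated in GTEx.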

GTEx: eQTL associations

For gene expression association analyses, the ASIP transcripts per million (TPM) values for a given tissue were taken from the GTEx v.8 release (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct). As some of the alternate first exons of ASIP were not included in the GENCODE v.26 definitions used by GTEx for expression quantification, TPM values for each exon and intronic region were computed from RNA-seq read counts. Specifically, the read counts for each region were first determined by using RNA-seq reads filtered with samtools (v.1.15.1)⁶³ view for being in proper pair (-f 0x2), not failing platform/vendor quality checks (-F 0x200), having an alignment distance ≤6, and mapping quality of 255 (following GTEx; https://gtexportal.org/home/methods) as input for bedtools (v.2.27.1)⁶⁴ coverage with -split flag. Separately, the TPM sample-level normalization factor previously computed by GTEx and applied to all genes from a given biosample was derived from read counts (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct) and TPM values (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct) for GAPDH (computing its length as the sum of its exon lengths; gencode.v26.GRCh38.genes.gtf), as:

$$\text{TPM scaling factor}=\,\frac{\text{read counts}}{(\text{effective gene length})(\text{TPM})}$$

We used GAPDH to recover this biosample-level scaling factor since GAPDH is highly expressed across all tissues, but the choice of gene used here has a negligible effect on TPM scaling factor estimation. TPM values for each exon or intron region were then calculated by first normalizing read counts from above by the region’s size before dividing by the derived sample scaling factor.
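The two-step calculation (recover the per-sample scaling factor from a reference gene, then apply it to an arbitrary exon or intron region) can be written out as a short sketch. The function names and all numbers below are hypothetical and serve only to show the arithmetic.

```python
def tpm_scaling_factor(read_count: float, gene_length_bp: float, tpm: float) -> float:
    """Scaling factor = read counts / (effective gene length * TPM)."""
    return read_count / (gene_length_bp * tpm)


def region_tpm(read_count: float, region_length_bp: float, scaling_factor: float) -> float:
    """Length-normalize a region's read count, then divide by the sample factor."""
    return (read_count / region_length_bp) / scaling_factor


if __name__ == "__main__":
    # Hypothetical reference-gene values for one biosample.
    factor = tpm_scaling_factor(read_count=50_000, gene_length_bp=1_000, tpm=2_000)
    # Hypothetical intronic region: 120 reads over 600 bp.
    print(region_tpm(read_count=120, region_length_bp=600, scaling_factor=factor))
```

Applying `region_tpm` to the reference gene's own counts and length returns its published TPM, which is a quick consistency check on the recovered factor.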

For both gene-level and exon/intron-level eQTL analyses, the TPM values were analyzed for association with WGS-derived biallelic SNP and indel variants (GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.SHAPEIT2_phased.MAF01.vcf.gz) as well as the SVA F₁ insertion. Analyses were performed using linear models including all GTEx v8 covariates, and conditional analyses that additionally included SVA F₁ insertion genotype as a covariate were also performed.

Allelic fold change (aFC) was estimated as described⁶⁵. First, in the linear regression

$$y={\beta }_{0}+{\beta }_{\rm{g}}g+{{\boldsymbol{\beta }}}_{\mathrm{cov}}{{\bf{X}}}_{\mathrm{cov}}+\varepsilon$$

the intercept, β₀, estimates the expression level of two reference alleles, and the genotype effect size, β_g, estimates the difference of expression between alleles (alternate − reference), where g are genotypes across donors, X_cov are covariates across donors, β_cov are the coefficients for each covariate, and ε is noise. A point estimate of aFC can then be found as

$$\text{aFC}=\frac{2{\beta }_{\rm{g}}}{{\beta }_{0}}+1$$

where the estimated expression levels from the reference and alternate alleles were each constrained to be positive. CIs were estimated with the adjusted bootstrap percentile (BCa) method with 10,000 replicates, as implemented in the R boot (v.1.3-28) library.
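The aFC point estimate follows directly from the regression terms: if β₀ is the expression of two reference alleles, one reference allele contributes β₀/2 and one alternate allele contributes β₀/2 + β_g, and their ratio is 2β_g/β₀ + 1. The sketch below spells this out. The function name and the coefficients are hypothetical, and raising an error on non-positive allelic expression is only a stand-in for the positivity constraint, whose exact form is not given here.

```python
def allelic_fold_change(beta_0: float, beta_g: float) -> float:
    """aFC = expression of one alternate allele / one reference allele."""
    e_ref = beta_0 / 2.0      # one reference allele
    e_alt = e_ref + beta_g    # one alternate allele
    if e_ref <= 0 or e_alt <= 0:
        # Stand-in for the constraint that both estimates be positive.
        raise ValueError("allelic expression estimates must be positive")
    return e_alt / e_ref      # equals 2 * beta_g / beta_0 + 1


if __name__ == "__main__":
    # Hypothetical coefficients: intercept 10 TPM, genotype effect 6 TPM.
    print(allelic_fold_change(10.0, 6.0))
```

With these illustrative coefficients the alternate allele is expressed at 11 TPM against 5 TPM for the reference allele, a 2.2-fold difference.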

GTEx: splicing QTL associations

Because the splicing phenotypes computed by GTEx consider a subset of splicing events at each locus³⁴, we quantified splice events across the region defined by the longest ASIP isoform. RNA-seq reads were first filtered with samtools as above, after which identification and quantification of splice junctions was performed with regtools (v.0.5.2)⁶⁶ junctions extract -a 8 -m 50 -M 500000 -s XS. This was first run on a merged bam from all GTEx biosamples corresponding to a given tissue to identify a set of nonspurious junctions (n > 10 observations total). The same regtools command was then run on each individual bam file, and the abundance of each junction seen in the merged set was recorded as individual-level quantification.
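The two-pass junction strategy (define a junction set from the merged data, then quantify that fixed set in every sample) can be sketched with plain dictionaries. This assumes the regtools output has already been parsed into per-junction counts; the function names and the toy coordinates are hypothetical.

```python
def nonspurious_junctions(merged_counts: dict, min_total: int = 10) -> set:
    """Junctions seen more than `min_total` times in the merged BAM."""
    return {junction for junction, n in merged_counts.items() if n > min_total}


def quantify_sample(sample_counts: dict, keep: set) -> dict:
    """Per-sample counts for every retained junction (0 if unobserved)."""
    return {junction: sample_counts.get(junction, 0) for junction in sorted(keep)}


if __name__ == "__main__":
    # Toy junctions keyed by (chrom, start, end); not real ASIP coordinates.
    merged = {("chr20", 100, 500): 240, ("chr20", 100, 900): 37, ("chr20", 100, 950): 4}
    keep = nonspurious_junctions(merged)
    print(quantify_sample({("chr20", 100, 500): 12}, keep))
```

Fixing the junction set first means that every sample reports the same junctions, with explicit zeros, which keeps the downstream per-sample fractions comparable.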

The fraction of reads aberrantly splicing into the SVA F splice acceptor was measured as the number of split reads supporting the junction between exon 2 and the splice acceptor within the SVA F, divided by the sum of that number and the number of split reads supporting the junction between exons 2 and 3. For sQTL association analyses, the log-fraction of observed splice junctions (adding a pseudocount of 1 to each junction count) for samples with at least one read spliced from exon 2 was analyzed for association using the same approach as for the eQTL analyses.
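As a rough illustration of these two quantities, the sketch below computes the raw aberrant fraction and a pseudocounted log-fraction from a pair of junction counts. The natural logarithm, the reading of "at least one read spliced from exon 2" as at least one read across the two junctions, and the function names are assumptions rather than details taken from the study's code.

```python
import math


def aberrant_fraction(sva_reads, exon3_reads):
    """Fraction of exon 2 junction reads that splice into the SVA F acceptor."""
    total = sva_reads + exon3_reads
    if total == 0:
        return None  # no informative reads in this sample
    return sva_reads / total


def log_aberrant_fraction(sva_reads, exon3_reads, pseudocount=1):
    """Log-fraction phenotype with a pseudocount added to each junction count.

    Samples with no reads spliced from exon 2 are skipped (returns None).
    """
    if sva_reads + exon3_reads < 1:
        return None
    a = sva_reads + pseudocount
    c = exon3_reads + pseudocount
    return math.log(a / (a + c))


frac = aberrant_fraction(3, 1)
log_frac = log_aberrant_fraction(3, 1)
```

The pseudocount keeps the logarithm finite for samples in which one of the two junctions has no supporting reads.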

Splice acceptor prediction

SpliceAI³⁶ (v.1.3.1) was run on the GRCh38 sequence in the region of the ASIP intron centered roughly on the nonpolymorphic SVA F element, chr20:34221336–34224758, such that plotted SpliceAI predictions (Fig. 4b) considered at least 1 kb of sequence context on each side.

Polyadenylation site prediction

APARENT³⁷ (v.0.1) was run using the aparent_large_lessdropout_all_libs_no_sampleweights model on a region of the ASIP intron centered roughly on the fall-off in RNA-seq read coverage between the nonpolymorphic SVA F element and the SVA F₁ insertion, chr20:34224584–34228084, such that plotted predictions (Fig. 4b) considered at least 1 kb of sequence context on each side.

Cloning of CAG SVA splicing construct

pCAGEN³⁹ was a gift from Connie Cepko (Addgene plasmid no. 11160; http://n2t.net/addgene:11160; RRID:Addgene_11160). mGreenLantern was synthesized as a gBlock from Integrated DNA Technologies. This was cloned into pCAGEN at the EcoRI and NotI cut sites using standard methods to yield pCAG-mGL. The human fixed SVA F was amplified from genomic DNA from 1KGP individual NA12878 using primers SVA_F and SVA_R. This amplicon then had pCAGEN-derived sequences added for Gibson assembly (New England Biolabs, cat. no. E2621S) by nested PCR with primers SVA_CAG_F and SVA_CAG_R to yield pCAG-mGL_SVA. All primer sequences can be found in Supplementary Table 1.

Expression of CAG SVA construct, identification of splice site and reverse transcription digital droplet PCR

Plasmid pCAG-mGL_SVA was transfected into HEK293T cells (Takara Bio, cat. no. 632180) with Lipofectamine 3000 (Thermo Fisher Scientific, cat. no. L3000001) in six wells of two separate plates for a total of 12 replicates. Cells were given 24 h to express the construct before RNA was collected using Qiagen RNeasy columns (Qiagen, cat. no. 74104). To determine the exact location of the introduced splice junction, RNA was first converted to cDNA with oligo dT primers before amplifying the region spanning the expected junction between chicken beta-actin exon and SVA with CAG_bactin_fwd and CAG_SVA_rev primers. The amplicon matched the expected size (88 bp) and was Sanger sequenced with the CAG_SVA_rev primer.

To measure the relative amount of splicing into the inserted SVA F sequence, two ddPCR assays were designed: The first assay measures normal splicing between chicken beta-actin and rabbit beta-globin exons using HEX-labeled probe CAG_mGL_HEX with primers CAG_bactin_fwd and CAG_mGL_rev. The second assay measures splicing from chicken beta-actin exon to acceptor in SVA F using FAM-labeled probe CAG_SVA_FAM with primers CAG_bactin_fwd and CAG_SVA_rev. RNA was used as input with Bio-Rad One-Step RT-ddPCR Advanced Kit for Probes (Bio-Rad, cat. no. 1864022), with the optimal concentration of RNA input first identified by dilution series. Estimated Poisson-corrected concentrations of splicing into the SVA F were normalized by the sum of concentrations seen for both assays to yield an estimate of fraction spliced into the SVA F using QuantaSoft software (v.1.7). All primer and probe sequences can be found in Supplementary Table 1.
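The normalization described above can be sketched with the standard Poisson correction for droplet counts. This is an illustration under stated assumptions, not QuantaSoft's implementation: both assays are assumed to share the same droplet volume and total droplet count, so the volume cancels in the ratio, and the function names and droplet numbers are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean template copies per droplet.

    If a fraction p of droplets is positive, the mean copies per droplet
    is -ln(1 - p), which accounts for droplets holding more than one copy.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def fraction_spliced_into_sva(fam_positive, hex_positive, total):
    """FAM (SVA F junction) signal as a share of the FAM + HEX (normal junction) signal."""
    sva = copies_per_droplet(fam_positive, total)
    normal = copies_per_droplet(hex_positive, total)
    if sva + normal == 0:
        return None  # neither junction detected
    return sva / (sva + normal)


# Hypothetical droplet counts for one replicate.
frac = fraction_spliced_into_sva(fam_positive=1000, hex_positive=5000, total=10000)
```

Working in copies per droplet rather than copies per microlitre avoids assuming a particular droplet volume, since only the ratio of the two assays is reported.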

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The following data resources are available by application: UKB (http://www.ukbiobank.ac.uk/) and GTEx (https://gtexportal.org/ under dbGaP accession no. phs000424.v9.p2). The following data resources are publicly available: 1KGP 30× coverage (https://www.internationalgenome.org/data-portal/data-collection/30x-grch38) and HGSVC2 (https://www.internationalgenome.org/data-portal/data-collection/hgsvc2).

Code availability

The following publicly available software resources were used: BLAST (v.2.12.0, https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.12.0/), FlexiDot (v.1.06, https://github.com/molbio-dresden/flexidot), bcftools (v.1.14, http://www.htslib.org/), samtools (v.1.15.1, http://www.htslib.org/), plink (v.1.90b6.26 and v.2.00a3.7, https://www.cog-genomics.org/plink/), BOLT-LMM (v.2.4.1, https://alkesgroup.broadinstitute.org/BOLT-LMM/), susieR (v.0.12.35, https://stephenslab.github.io/susieR/), qqman (v.0.1.8, https://cran.r-project.org/web/packages/qqman/index.html), SHAPEIT5 (v.5.1.1, https://odelaneau.github.io/shapeit5/), rehh (v.3.2.2, https://cran.r-project.org/web/packages/rehh/index.html), regtools (v.0.5.2, https://regtools.readthedocs.io/en/latest/), bedtools (v.2.27.1, https://bedtools.readthedocs.io/en/latest/), SpliceAI (v.1.3.1, https://github.com/Illumina/SpliceAI), APARENT (v.0.1, https://apa.cs.washington.edu/) and Relate (v.1.2.1, https://myersgroup.github.io/relate/index.html). The following proprietary software resources were used: QuantaSoft (v.1.7, https://www.bio-rad.com/en-us/life-science/digital-pcr/qx200-droplet-digital-pcr-system/quantasoft-software-regulatory-edition). Custom code used to generate results in this study is available via Zenodo at https://doi.org/10.5281/zenodo.10407629 (ref. ⁶⁷).

References

Cordaux, R. & Batzer, M. A. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 10, 691–703 (2009).
Kojima, S. et al. Mobile element variation contributes to population-specific genome diversification, gene regulation and disease risk. Nat. Genet. 55, 939–951 (2023).
Xia, B. et al. On the genetic basis of tail-loss evolution in humans and apes. Nature 626, 1042–1048 (2024).
Sulem, P. et al. Two newly identified genetic determinants of pigmentation in Europeans. Nat. Genet. 40, 835–837 (2008).
Brown, K. M. et al. Common sequence variants on 20q11.22 confer melanoma susceptibility. Nat. Genet. 40, 838–840 (2008).
Gudbjartsson, D. F. et al. ASIP and TYR pigmentation variants associate with cutaneous melanoma and basal cell carcinoma. Nat. Genet. 40, 886–891 (2008).
Suzuki, I. et al. Agouti signaling protein inhibits melanogenesis and the response of human melanocytes to alpha-melanotropin. J. Invest. Dermatol. 108, 838–842 (1997).
Hoekstra, H. E. Genetics, development and evolution of adaptive pigmentation in vertebrates. Heredity 97, 222–234 (2006).
Valverde, P., Healy, E., Jackson, I., Rees, J. L. & Thody, A. J. Variants of the melanocyte-stimulating hormone receptor gene are associated with red hair and fair skin in humans. Nat. Genet. 11, 328–330 (1995).
Valverde, P. et al. The Asp84Glu variant of the melanocortin 1 receptor (MC1R) is associated with melanoma. Hum. Mol. Genet. 5, 1663–1666 (1996).
Liu, F. et al. Genetics of skin color variation in Europeans: genome-wide association studies with functional follow-up. Hum. Genet. 134, 823–835 (2015).
Visconti, A. et al. Genome-wide association study in 176,678 Europeans reveals genetic loci for tanning response to sun exposure. Nat. Commun. 9, 1684 (2018).
Crawford, N. G. et al. Loci associated with skin pigmentation identified in African populations. Science 358, eaan8433 (2017).
Martin, A. R. et al. An unexpectedly complex architecture for skin pigmentation in Africans. Cell 171, 1340–1353.e14 (2017).
Shido, K. et al. Susceptibility loci for tanning ability in the Japanese population identified by a genome-wide association study from the Tohoku Medical Megabank Project Cohort Study. J. Invest. Dermatol. 139, 1605–1608.e13 (2019).
Seo, J. Y. et al. GWAS identifies multiple genetic loci for skin color in Korean women. J. Invest. Dermatol. 142, 1077–1084 (2022).
Field, Y. et al. Detection of human adaptation during the past 2000 years. Science 354, 760–764 (2016).
Duhl, D. M. et al. Pleiotropic effects of the mouse lethal yellow (A^y) mutation explained by deletion of a maternally expressed gene and the simultaneous production of agouti fusion RNAs. Development 120, 1695–1708 (1994).
Norris, B. J. & Whan, V. A. A gene duplication affecting expression of the ovine ASIP gene is responsible for white and black sheep. Genome Res. 18, 1282–1293 (2008).
Nakayama, K. & Ishida, T. Alu-mediated 100-kb deletion in the primate genome: the loss of the agouti signaling protein gene in the lesser apes. Genome Res. 16, 485–490 (2006).
Duhl, D. M., Vrieling, H., Miller, K. A., Wolff, G. L. & Barsh, G. S. Neomorphic agouti mutations in obese yellow mice. Nat. Genet. 8, 59–65 (1994).
Bannasch, D. L. et al. Dog colour patterns explained by modular promoters of ancient canid origin. Nat. Ecol. Evol. 5, 1415–1423 (2021).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
Korbel, J. O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).
Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
Hancks, D. C. & Kazazian, H. H. SVA retrotransposons: evolution and genetic instability. Semin. Cancer Biol. 20, 234–245 (2010).
Damert, A. et al. 5′-Transducing SVA retrotransposon groups spread efficiently throughout the human genome. Genome Res. 19, 1992–2008 (2009).
Bantysh, O. B. & Buzdin, A. A. Novel family of human transposable elements formed due to fusion of the first exon of gene MAST2 with retrotransposon SVA. Biochemistry (Mosc). 74, 1393–1399 (2009).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol. 82, 1273–1300 (2020).
GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell 178, 91–106.e23 (2019).
Lee, K., Ku, J., Ku, D. & Kim, Y. Inverted Alu repeats: friends or foes in the human transcriptome. Exp. Mol. Med. https://doi.org/10.1038/s12276-024-01177-3 (2024).
Matsuda, T. & Cepko, C. L. Electroporation and RNA interference in the rodent retina in vivo and in vitro. Proc. Natl Acad. Sci. USA 101, 16–22 (2004).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Prüfer, K. et al. The complete genome sequence of a Neandertal from the Altai Mountains. Nature 505, 43–49 (2014).
Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).
Mafessoni, F. et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl Acad. Sci. USA 117, 15132–15136 (2020).
Jablonski, N. G. & Chaplin, G. The evolution of human skin coloration. J. Hum. Evol. 39, 57–106 (2000).
Jablonski, N. G. The evolution of human skin pigmentation involved the interactions of genetic, environmental, and cultural variables. Pigment Cell Melanoma Res. 34, 707–729 (2021).
Inaba, M. et al. Instructive role of melanocytes during pigment pattern formation of the avian skin. Proc. Natl Acad. Sci. USA 116, 6884–6890 (2019).
Kempf, E. et al. Aberrant expression of agouti signaling protein (ASIP) as a cause of monogenic severe childhood obesity. Nat. Metab. 4, 1697–1712 (2022).
Adameyko, I. et al. Schwann cell precursors from nerve innervation are a cellular origin of melanocytes in skin. Cell 139, 366–379 (2009).
Sabeti, P. C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837 (2002).
Seibt, K. M., Schmidt, T. & Heitkam, T. FlexiDot: highly customizable, ambiguity-aware dotplots for visual sequence analyses. Bioinformatics 34, 3575–3577 (2018).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Mukamel, R. E. et al. Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer. Cell 186, 3659–3673.e23 (2023).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Ito, S. & Wakamatsu, K. Diversity of human hair pigmentation as studied by chemical analysis of eumelanin and pheomelanin. J. Eur. Acad. Dermatol. Venereol. 25, 1369–1380 (2011).
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Turner, S. D. qqman: an R package for visualizing GWAS results using Q-Q and Manhattan plots. J. Open Source Softw. 3, 731 (2018).
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Gautier, M., Klassmann, A. & Vitalis, R. rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure. Mol. Ecol. Resour. 17, 78–90 (2017).
Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Wang, H. et al. SVA elements: a hominid-specific retroposon family. J. Mol. Biol. 354, 994–1007 (2005).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Mohammadi, P., Castel, S. E., Brown, A. A. & Lappalainen, T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res. 27, 1872–1884 (2017).
Cotto, K. C. et al. Integrated analysis of genomic and transcriptomic data for the discovery of splice-associated variants in cancer. Nat. Commun. 14, 1589 (2023).
Kamitaki, N. et al. Code for "A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation". Zenodo https://doi.org/10.5281/zenodo.10407629 (2023).

Acknowledgements

We thank A. Akbari, A. Barton, G. Genovese and M. Florio for helpful discussions; C. Usher for edits to text and figures and A. Arguello, A. Lewis and S. Hyman for helpful comments on the manuscript. This research was conducted using the UKB Resource under application number 40709. N.K. was supported by a US National Institutes of Health (NIH) training grant T32 HG002295. M.L.A.H. was supported by US NIH Fellowship F32 HL160061. R.E.M. was supported by US NIH grant K25 HL150334. S.A.M. was supported by US NIH grant R01 HG006855. P.-R.L. was supported by US NIH grants DP2 ES030554, R56 HG012698 and R01 HG013110 and a Burroughs Wellcome Fund Career Award at the Scientific Interfaces. The funders had no role in study design, data collection and analysis, the decision to publish or the preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Computational analyses were performed on the O2 High Performance Compute Cluster supported by the Research Computing Group at Harvard Medical School (http://rc.hms.harvard.edu) and on the UKB Research Analysis Platform.

Author information

Authors and Affiliations

Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Nolan Kamitaki, Margaux L. A. Hujoel, Ronen E. Mukamel & Po-Ru Loh
Center for Data Sciences, Brigham and Women’s Hospital, Boston, MA, USA
Nolan Kamitaki, Margaux L. A. Hujoel, Ronen E. Mukamel & Po-Ru Loh
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nolan Kamitaki, Margaux L. A. Hujoel, Ronen E. Mukamel, Edward Gebara, Steven A. McCarroll & Po-Ru Loh
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nolan Kamitaki, Edward Gebara & Steven A. McCarroll
Department of Genetics, Harvard Medical School, Boston, MA, USA
Nolan Kamitaki, Edward Gebara & Steven A. McCarroll
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Nolan Kamitaki

Authors

Nolan Kamitaki
Margaux L. A. Hujoel
Ronen E. Mukamel
Edward Gebara
Steven A. McCarroll
Po-Ru Loh

Contributions

N.K., S.A.M. and P.-R.L. conceived the study design. N.K., M.L.A.H., R.E.M. and P.-R.L. did the computational analyses. N.K. and E.G. did the in vitro experiments. N.K., S.A.M. and P.-R.L. wrote the manuscript with contributions from all authors.

Corresponding authors

Correspondence to Nolan Kamitaki, Steven A. McCarroll or Po-Ru Loh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Associations of SVA F₁ insertion and nearby variants to pigmentation phenotypes in UK Biobank.

a-d, Local association plots in a 5-Mb window surrounding ASIP for (a) self-reported tanning ability (n = 166,404), (b) self-reported hair color (n = 167,310), (c) melanoma (C43 ICD-10 code, n = 169,635), and (d) other skin cancers including basal and squamous cell carcinomas (C44 ICD-10 code, n = 169,635). Association strengths track with linkage disequilibrium with the SVA F₁ insertion (yellow-to-purple shading), indicated by the large purple dot. e-j, Conditional association plots for SNPs and indels after including SVA F₁ genotype as a covariate for (e) self-reported skin color (n = 167,568), (f) any skin cancer (C43 or C44 ICD-10 codes, n = 169,635), (g) self-reported tanning ability (n = 166,404), (h) self-reported hair color (n = 167,310), (i) melanoma (C43 ICD-10 code, n = 169,635), and (j) other skin cancers including basal and squamous cell carcinomas (C44 ICD-10 code, n = 169,635).

Extended Data Fig. 2 Genome-wide associations with tanning ability and skin cancer risk in UK Biobank.

a, Associations from linear regression across all imputed variants with self-reported tanning ability (n = 166,404). b, Associations from linear regression across all imputed variants with any type of skin cancer (union of C43 and C44 ICD-10 codes, n = 169,635).

Extended Data Fig. 3 Genotyping of SVA F₁ insertion in GTEx cohort.

a, Genotyping of 838 GTEx donors with whole-genome sequencing. b, Linkage disequilibrium (r²) of the SVA F₁ insertion with variants between 32.5 Mb and 37.5 Mb on chromosome 20 across GTEx donors.

Extended Data Fig. 4 Associations of SVA F₁ insertion and nearby variants to expression of ASIP in skin and tibial nerve.

a, ASIP gene expression in GTEx tibial nerve samples (n = 532), stratified by SVA F₁ insertion genotype. Tibial nerve was the only other tissue that appeared to have evidence of the same eQTL. TPM, transcripts per million. b, Local association plot for ASIP gene expression in tibial nerve samples (n = 532) in the region 32.5 Mb to 37.5 Mb on chromosome 20. Association strengths track with linkage disequilibrium with the SVA F₁ insertion (yellow-to-purple shading), indicated by the large purple dot. c, Conditional association plot for ASIP gene expression in GTEx skin (not sun-exposed, NSE) samples (n = 517) after including SVA F₁ genotype as a covariate. d, As in c, but for skin (sun-exposed, SE) samples (n = 605). e, As in c, but for tibial nerve samples (n = 532).

Extended Data Fig. 5 Associations of SVA F₁ insertion and nearby variants to aberrant ASIP splice junction usage in skin and tibial nerve.

a, Fraction of splice junctions from exon 2 that aberrantly splice into the acceptor site in the nearby SVA F element (versus splicing to exon 3), stratified by SVA F₁ insertion genotype in GTEx tibial nerve samples. Only samples with greater than 10 total reads supporting either splice junction are included in the violin plot (n = 33) to reduce noise from less informative points. Centers: combined fraction of aberrant splicing across all samples with each SVA F₁ insertion genotype (total n = 532); error bars: 95% CIs from bias-corrected and accelerated bootstrap. b, Local association plot for aberrant splice junction usage in skin (not sun-exposed, NSE) samples (n = 433) in the region 32.5 Mb to 37.5 Mb on chromosome 20. Association strengths track with linkage disequilibrium with the SVA F₁ insertion (yellow-to-purple shading), indicated by the large purple dot. c, As in b, but after including SVA F₁ genotype as a covariate. d, As in b, but for skin (sun-exposed, SE) samples (n = 497). e, As in d, but after including SVA F₁ genotype as a covariate. f, As in b, but for tibial nerve samples (n = 364). g, As in f, but after including SVA F₁ genotype as a covariate.

Extended Data Fig. 6 Characterization of non-polymorphic SVA F element upstream of SVA F₁ insertion.

a, Pairwise sequence alignment dot plot of GRCh38 human reference vs. panTro6 chimpanzee reference at ASIP. The human reference contains an SVA F element upstream of the polymorphic SVA F₁ insertion, neither of which is present in chimpanzee. b, Assessment of SVA F presence in 1KGP individuals. Similar to the genotyping approach we used for the polymorphic SVA F₁ insertion, we counted reads overlapping the right edge of the SVA F element (indicating the presence of at least one allele containing the SVA F) and discordant reads with a fragment size approaching the length of the SVA F element (1.6 kb), which would indicate the presence of an allele lacking the SVA F. All individuals in 1KGP appear to carry the SVA F on both ASIP alleles.

Extended Data Fig. 7 Construct to measure splicing into upstream SVA F element in vitro.

a, Design of base construct, pCAG-mGL, and relative position of introduced SVA F sequence in the hybrid intron of the CAG promoter at XbaI restriction site. b, Sanger sequencing results from transcripts produced by pCAG-mGL_SVA construct and match to the expected sequence that would arise from splicing from the upstream chicken beta-actin exon (in blue) to the aberrant splice acceptor within the SVA F element (in red) observed in GTEx RNA-seq at ASIP. Note that the sequence is antisense to the transcript. c, Design of RT-ddPCR assays to measure relative splicing from the chicken beta-actin exon to introduced splice acceptor in SVA F versus downstream rabbit beta-globin exon. The arrows represent forward and reverse primers and the rectangles with circles represent quenched fluorescently labeled probes, where the blue circle is a FAM fluorophore, the green circle is a HEX fluorophore, and the gray circles are quenchers that are cleaved during polymerase extension. d, Fraction of splicing into the introduced SVA F element in pCAG-mGL_SVA construct (n = 12 replicates). Each point indicates the measured value in a replicate, with the bar indicating the mean fraction.

Extended Data Fig. 8 Genealogy of haplotypes at the ASIP SVA F₁ insertion.

a,b, Coalescent trees estimated by Relate for (a) CEU (n = 202) and (b) GBR (n = 186) haplotypes at the SVA F₁ insertion site. The purple branch contains all haplotypes carrying the SVA F₁ insertion. Age (in years) on the y-axis assumes a generation time of 28 years.

Extended Data Fig. 9 Associations of SVA F₁ insertion and nearby variants to anthropometric phenotypes in UK Biobank.

a-c, Local association plots in a 5-Mb window surrounding ASIP for (a) BMI (n = 169,052), (b) height (n = 169,239), and (c) waist-hip ratio adjusted for BMI (n = 169,285). Only the associations with height reach genome-wide significance, but the association pattern does not appear to colocalize with linkage disequilibrium with the SVA F₁ insertion (yellow-to-purple shading).

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2 and Table 1.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kamitaki, N., Hujoel, M.L.A., Mukamel, R.E. et al. A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation. Nat Genet 56, 1583–1591 (2024). https://doi.org/10.1038/s41588-024-01841-4

Received: 08 August 2023
Accepted: 21 June 2024
Published: 24 July 2024
Issue Date: August 2024
DOI: https://doi.org/10.1038/s41588-024-01841-4
