Introduction

A mix of heritable and environmental influences has been characterised for many complex human traits and behaviours. For transgender and other gender diverse identities (for a description of relevant terminology used in this paper, please see Table 1 [1]), evidence from multiple studies suggests genetic involvement [2,3]. As is the case for other complex traits [4], environmental influences are also undoubtedly important in shaping gender identity. However, current theories that propose a causal role for specific psychosocial exposures – such as early childhood trauma and poor parenting practices [5,6], social contagion [7,8], mental illness [9], and pornography [10] – lack clear empirical support [11]. Nevertheless, such theories have shaped the public discourse on gender identity development, influencing policy making, societal attitudes, and clinical approaches. Most notably, the recent Cass Review placed strong emphasis on such psychosocial theories and gave little credence to earlier twin studies suggesting genetic involvement [12, p. 114–116].

Table 1 Glossary of relevant terms.

In classic twin study designs, estimates of heritability are determined by comparing the rates of trait concordance among monozygotic (MZ, identical) twins, who are largely genetically identical, and dizygotic (DZ, fraternal) twins, who, like singleton siblings, share an average of half of their genetic material. Evidence of underlying genetic contributory influences, or heritability, comes from observations of higher rates of phenotype concordance among both DZ and MZ twins compared to the background rate in the general population, as well as an elevation in the rate of concordance among MZ compared to DZ twins.

To date, heritability estimates for gender identity have come from eight published twin studies. Seven provided evidence suggesting a genetic contribution, with heritability estimates ranging between 0.1 and 0.8 [13,14,15,16,17,18,19]. All seven studies, however, have limitations. Coolidge et al. (2002) assigned gender identity from parental reports of gendered behaviours. Bailey et al. (2000), Buhrich (1990), Burri et al. (2011) and Sasaki et al. (2016) used an indirect measure (Likert scale) to infer gender identity. Heylens et al. (2012) partially relied upon data from prior case study literature, which likely introduced the risk of ascertainment bias [20]. Similarly, Diamond (2013) supplemented a local twin survey with data available from earlier case study reports. While all seven yielded results suggesting genetic involvement, their limitations cannot be overlooked.

The eighth and most recent report on the subject, Karamanis et al. (2022), differs from the others by using twin data sourced from Sweden’s national health data linkage system [21]. Linked health data records were examined to detect individuals with a Gender Dysphoria diagnosis, which can serve as a proxy for transgender and other gender diverse identities. Among 67 twin pairs identified with at least one twin with a Gender Dysphoria diagnosis, concordance was documented only among opposite-sex pairs and not same-sex pairs. Karamanis et al. proposed that this observation could be explained by intrauterine hormonal influences, whereby sex hormones produced by one twin could influence the gender identity development of the other. Such a theory is potentially consistent with existing evidence of intrauterine hormonal influences on gender role behaviour in individuals with congenital adrenal hyperplasia [22] and in non-human mammal species [23,24]. Although Karamanis et al. did not assess zygosity, they nonetheless inferred that their finding of zero concordance among 40 same-sex twin pairs – some of whom could be assumed to be MZ – suggested the absence of genetic influences on the development of gender dysphoria. However, Karamanis et al. did draw attention to potential confounding influences, such as their stringent diagnostic requirement for Gender Dysphoria, which introduced a risk of misclassifying undetected concordant twin pairs as discordant. They also noted that, in the absence of zygosity data, it was possible that their observation could have occurred because of random error and highlighted the importance of future studies to verify their observations. Ignoring these caveats and previous twin literature, the Cass Review relied solely on Karamanis et al. to support the claim “that environmental influences during pregnancy are a more likely explanation for the development of gender dysphoria than genetics” [12, p. 116].

To address the uncertainty in this field, here we first sought to test the null hypothesis that genetics do not contribute to transgender and other gender diverse identities using a systematic twin recruitment strategy to reduce bias. In the event that the null hypothesis was rejected, we next aimed to provide relative risk estimates for the occurrence of transgender and other gender diverse identities in MZ and DZ twins compared with background population prevalence estimates. Finally, we sought to examine the question of intrauterine hormonal influences by comparing concordance rates in same-sex and opposite-sex twin pairs.

Materials and methods

Participants and recruitment

Participants were transgender, non-binary and other gender diverse individuals who were also twins, systematically recruited from two sources. The first source was the Royal Children’s Hospital Gender Service (RCHGS) in Victoria, Australia, which provides multidisciplinary specialist services for trans young people under the age of 18 years and systematically collects information on whether clients are twins. The second source was the clinical register of a member of our authorship team (FH), a psychiatrist with years of experience in the care of trans adults who has also systematically collected details about whether his trans patients are twins.

Recruitment from RCHGS involved contacting all twins (n = 59) who had sought care from the service since February 1, 2017, when information on twins was first systematically collected, and inviting them by email to participate in the study. Non-responders were then sent two follow-up messages, at 4 weeks and at 8–12 weeks after the initial contact. A similar method was used for twins (n = 17) on FH’s register, with a single follow-up phone call at 8–12 weeks.

Following recruitment, data were obtained from 27 twin pairs, including 13 MZ and 14 DZ pairs. Grouped by birth-registered sex, there were 6 male-male, 15 female-female and 6 opposite-sex pairs. The mean age of participants was 19.9 years (SD = 12.0 years). Among the 54 individual participants, 33 were gender diverse (demographic summary in Supplementary Table 1).

Ethics, recruitment and consent

The Royal Children’s Hospital Human Research Ethics Committee granted approval for the study (#91711). Informed consent was obtained for all participants, including parental consent for legal minors (younger than 18 years). All methods were carried out in accordance with relevant guidelines and regulations.

Measures

Gender identity

A standardised two-step process, which did not require a detailed gender history, was used to determine gender identity [25]. This involved each individual recording both their gender identity and sex assigned at birth, as considered best practice by the Australian Bureau of Statistics [26]. Participants entered these data both for themselves and on behalf of their twin. Gender identity was classified using the approach previously described by Blacklock et al. [27].

Zygosity

Zygosity analysis was not required for opposite-sex pairs (n = 6), which are by necessity DZ. Same-sex twins (n = 21) were asked if prior zygosity testing had been arranged and, if so, past results were obtained (n = 2). Otherwise, they were offered DNA-based zygosity testing, which some pairs accepted (n = 5). These participants self-collected a DNA sample, which was analysed by 16-locus microsatellite genotyping at the NATA-accredited clinical genetics laboratory at The Children’s Hospital at Westmead, in New South Wales, Australia. These pairs were classified as MZ if alleles at all loci were concordant and as DZ if allele size differences occurred across multiple loci. For pairs where DNA-based zygosity testing was not arranged (n = 14), zygosity was assigned from ‘Peas-In-A-Pod’ questionnaire results [28].
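The genotype-based classification rule can be illustrated with a minimal sketch. The function name, locus names and allele sizes below are hypothetical, and the "indeterminate" outcome for a single discordant locus is an assumption of this sketch, since the text only defines the all-concordant (MZ) and multiple-locus-difference (DZ) cases.

```python
from typing import Dict, Tuple

# Two allele sizes (in base pairs) observed at one microsatellite locus.
Genotype = Tuple[int, int]


def classify_zygosity(twin_a: Dict[str, Genotype],
                      twin_b: Dict[str, Genotype]) -> str:
    """Classify a same-sex pair from microsatellite genotypes.

    Returns "MZ" when every shared locus matches, "DZ" when two or more
    loci differ, and "indeterminate" when exactly one locus differs
    (the last category is an assumption, not part of the stated rule).
    """
    shared = sorted(set(twin_a) & set(twin_b))
    if not shared:
        raise ValueError("no loci genotyped in both twins")
    # Allele order within a locus is arbitrary, so compare sorted pairs.
    discordant = [locus for locus in shared
                  if sorted(twin_a[locus]) != sorted(twin_b[locus])]
    if not discordant:
        return "MZ"
    if len(discordant) >= 2:
        return "DZ"
    return "indeterminate"


# Hypothetical three-locus example (the study used 16 loci).
a = {"locus_1": (152, 160), "locus_2": (201, 201), "locus_3": (98, 104)}
b = {"locus_1": (160, 152), "locus_2": (201, 201), "locus_3": (98, 104)}
print(classify_zygosity(a, b))  # MZ
```

In practice a laboratory would also account for genotyping error and allele dropout, which this sketch ignores.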

Search for comparable twin data in the literature

A recent systematic search of the relevant twin literature [29-manuscript under review] was used to identify studies using methods comparable to our own, which could be combined with our data for meta-analysis. Inclusion criteria were pre-determined to ensure that studies provided data suitable for pooling, while the exclusion criterion aimed to minimise potential biases.

The inclusion criteria were: (1) a twin study design to ensure methodological comparability; (2) reporting of concordance/discordance status of pairs for data pooling; (3) reporting of birth-assigned sex and zygosity for group comparison analyses; and (4) investigation of gender identity, rather than concepts such as gendered behaviour or sexuality which are likely to have different heritable contributions. The sole exclusion criterion was the case study literature, due to concerns that these data may inflate genetic estimates. Additionally, any subgroup(s) reported in papers (e.g. survey data, clinically recruited samples) that met the eligibility criteria were included in the pooled analysis.

Of the eight published twin studies, three met these criteria: Diamond (2013) (survey data only) [17], Heylens et al. (2012) [18] (subgroup reported by Zucker, 2011 [30]), and all data from Sasaki et al. (2016) [19].

Statistical analyses

Concordance rates were calculated for MZ (RMZ), DZ (RDZ), same-sex (SS; male-male/female-female) (RSS) and opposite-sex (OS) (ROS) groups using simple descriptive statistics. Two-tailed z-tests for two proportions were conducted to investigate statistical differences between groups. Relative risk ratios were calculated to quantify the difference in the likelihood of a co-twin being categorised as transgender, comparing either between groups (e.g., MZ vs. DZ) or against the general population. For calculations against the general population, a range of prevalence rates (0.5–3%) was used [1,31], with a central conservative estimate of 1% used for the final reported results.
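As an illustration of the two-proportion comparison, the following is a minimal sketch of a pooled-variance, two-tailed z-test using only the Python standard library. The analyses in this paper were run in STATA, so the function name and implementation here are assumptions for illustration, and output may differ slightly from the software's depending on its variance estimate and rounding. The example counts are the pooled MZ and DZ concordance counts given in the Results.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-tailed z-test for the difference between two proportions.

    x1/n1 and x2/n2 are concordant pairs over total pairs in each group.
    Uses the pooled proportion for the standard error under the null.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Pooled MZ (47/222) versus DZ (21/241) concordance.
z, p = two_proportion_ztest(47, 222, 21, 241)
print(f"z = {z:.2f}, two-tailed p = {p:.5f}")
```

With small groups such as the Australian cohort alone, the normal approximation is rough, and an exact test might be preferred.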

The prenatal hormone exposure theory was tested by comparing RSS and ROS for (1) all twin pairs and (2) only DZ pairs. The former approach aligns with comparisons commonly employed in the existing literature, while the latter specifically examines twins with a comparable average genetic similarity. Two-tailed z-tests for two proportions and relative risk ratios (as described above) were also used for these calculations. All calculations were performed using STATA [32].

Results

Table 2 presents concordance rates categorised by zygosity and birth-assigned sex. The rate of concordance among MZ twins (4/13, 30.8%) was four-fold higher than among DZ twins (1/14, 7.1%), but this difference did not reach statistical significance given the small sample size (z-test p = 0.113). Concordance among SS and OS pairs was similar (SS pairs 4/21 (19.0%); OS pairs 1/6 (16.7%); z-test p = 0.898). Of the four concordant pairs recruited, three pairs were assigned female at birth, and one was an opposite-sex pair.

Table 2 Concordance rates for a transgender or gender diverse identity among MZ, DZ, SS and OS twins (Australian clinical data, n = 27), with MZ pairs analysed by assigned sex and combined.

Next, we identified three previous twin studies that met our eligibility criteria for the pooled analysis: survey data from Diamond (2013) (n = 75), a subset of participants from Heylens et al. (2012) (n = 25), and Sasaki et al. (2016) (n = 336). We then combined the four cohorts, which together provided 463 twin pairs for meta-analysis (Table 3). The rate of concordance among MZ twins (47/222, 21.2%) was 2.6-fold higher compared with DZ twins (21/241, 8.7%) and this difference was statistically significant (z-test, p < 0.001). In contrast, concordance rates among SS and OS pairs were similar, regardless of whether the analyses were performed using all SS pairs or only those who were DZ. For example, concordance among SS and OS DZ pairs was 12/131 (9.2%) and 9/110 (8.2%) respectively. We were also interested to compare concordance rates in MZ twins based on birth-assigned sex, and we observed that these rates were not statistically different (MM 7/56 (12.5%), 95% CI 5.2–24.1%; FF 40/166 (24.1%), 95% CI 17.8–31.3%).

Table 3 Meta-analysis of concordance rates for a transgender or gender diverse identity among MZ, DZ, SS and OS twins, with MZ pairs analysed by assigned sex and combined.
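The SS versus OS comparison among DZ pairs can also be expressed as a between-group risk ratio. The sketch below uses the standard log-scale (Katz) interval, which is an assumption here because the text does not state which interval STATA produced; the function name is likewise illustrative. The counts are the DZ same-sex (12/131) and opposite-sex (9/110) concordance counts reported above.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio(x1: int, n1: int, x2: int, n2: int, level: float = 0.95):
    """Risk ratio of group 1 versus group 2 with a log-scale (Katz) CI."""
    if x1 == 0 or x2 == 0:
        raise ValueError("log-scale interval is undefined with zero events")
    rr = (x1 / n1) / (x2 / n2)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# DZ same-sex (12/131) versus DZ opposite-sex (9/110) concordance.
rr, lo, hi = risk_ratio(12, 131, 9, 110)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that spans 1 would be in keeping with the absence of a detectable SS versus OS difference described in the text.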

Finally, to contextualise our findings with population prevalence data on gender diversity, the rates of gender diversity among these different twin groups were compared against population prevalence rates (Table 4). Notably, the relative risk was 21.2 for MZ pairs (95% CI 16.4–27.3), and 8.7 for DZ pairs (95% CI 5.8–13.1) compared to the current generally accepted population prevalence rate of ~1% [1]. Given the range of previous population prevalence estimates [1], relative risks were also calculated for alternative estimates (Table 4), and all demonstrated a similar trend.

Table 4 Relative risk ratios for MZ and DZ groups vs. general population (GP) at prevalence estimates of 0.5%, 1%, 2% and 3%.
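The comparison against population prevalence can be sketched as follows. This minimal example treats the prevalence as a fixed constant and places a log-scale interval on the twin concordance rate; that choice of interval is an assumption (the text does not describe the method), although at a prevalence of 1% it gives values matching those reported above for MZ and DZ pairs. The function name is illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist


def rr_vs_population(concordant: int, pairs: int, prevalence: float,
                     level: float = 0.95):
    """Relative risk of a twin group versus a fixed population prevalence.

    The prevalence is treated as known without error, so only the
    sampling variability of the concordance rate enters the interval.
    """
    if concordant == 0:
        raise ValueError("log-scale interval is undefined with zero events")
    rr = (concordant / pairs) / prevalence
    se_log = sqrt(1 / concordant - 1 / pairs)
    factor = exp(NormalDist().inv_cdf(0.5 + level / 2) * se_log)
    return rr, rr / factor, rr * factor


# Pooled concordance counts, evaluated across the prevalence range 0.5-3%.
groups = {"MZ": (47, 222), "DZ": (21, 241)}
for label, (x, n) in groups.items():
    for prev in (0.005, 0.01, 0.02, 0.03):
        rr, lo, hi = rr_vs_population(x, n, prev)
        print(f"{label}, prevalence {prev:.1%}: "
              f"RR {rr:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

Because the prevalence enters only as a divisor, halving or doubling it scales the relative risk and both interval bounds by the same factor, which is why the trend is similar across the prevalence range.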

Discussion

In this study, our systematically ascertained clinical cohort of MZ and DZ twin pairs yielded a greater than four-fold higher rate of concordance for gender diversity among MZ twins compared with DZ pairs. However, the statistical significance of this observation was limited by our relatively small sample size. To address this issue, we identified three previous studies (Diamond, 2013; Heylens et al., 2012; and Sasaki et al., 2016) that applied a methodological approach similar to ours, and merged the four data sets. This combined dataset revealed a 2.6-fold higher concordance rate among MZ twin pairs compared with DZ pairs, with a clear separation between their respective 95% confidence intervals (MZ 21.2% (16.4–27.3%); DZ 8.7% (5.8–13.1%)).

Our findings are broadly consistent with earlier twin studies that suggested a genetic contribution to gender diversity [13,14,15,16,17,18,19]. Although the difference between our MZ and DZ concordance estimates was not as striking as that of Heylens et al. (2012), who reported concordance rates of 39.1% (MZ; n = 23) and 0.0% (DZ; n = 21), their results also relied on case reports, which are likely subject to significant publication bias. Superficially, our findings appear at odds with those of Karamanis et al. (2022), who found no evidence to support a genetic contribution to gender diversity. This was based on their observation of 0% (0/40) concordance for gender dysphoria among SS pairs and 37% (10/27) concordance among OS pairs. However, as Karamanis et al. acknowledged, they were limited by an absence of zygosity details. Accepting these authors’ suggestion of a 1:1 DZ:MZ mix among their 40 SS pairs, the MZ concordance value of 0% (0/20) would have a 95% confidence interval of 0.0–16.8%, which actually overlaps with that of our combined group of MZ pairs.
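The 0.0–16.8% interval for 0/20 is what an exact (Clopper–Pearson) binomial interval gives. The text does not name the interval method, so treating it as Clopper–Pearson is an assumption, and the function names below are illustrative. The sketch solves for the bounds by bisection on the binomial distribution function, using only the standard library.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, level: float = 0.95):
    """Exact two-sided binomial confidence interval for x events in n."""
    alpha = 1 - level
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Hypothesised 20 MZ pairs with no concordant pairs (0/20).
lower, upper = clopper_pearson(0, 20)
print(f"0/20: {lower:.1%} to {upper:.1%}")  # 0.0% to 16.8%

# Overlap check against the lower bound of the pooled MZ interval (16.4%).
print("overlaps pooled MZ interval:", upper > 0.164)  # True
```

For zero events the upper bound also has a closed form, 1 − (α/2)^(1/n), which gives the same value.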

The significance of this overlap should not be underestimated, particularly given the socio-political influence of the Cass Review, which relied on the report by Karamanis et al. to support a claim of there being no “definitive evidence about biological causes of gender incongruence” and instead advanced a counter-narrative that speculated on the role of psychosocial influences, such as social stressors, peer influence, mental health problems, neurodiversity, social media, access to pornography, and adverse childhood experiences [12, Chaps. 7 and 8]. Cass also used the findings of Karamanis et al. to promote the prenatal sex hormone exposure theory while dismissing earlier evidence indicating the likely role of genetic factors [12, p. 115–117]. However, the cautionary remarks by Karamanis et al. about the importance of trying to replicate their findings must be heeded. To this end, we took the opportunity to compare concordance among SS pairs and OS pairs in our combined dataset, and assessed for evidence of a potential influence of prenatal hormone exposure on gender diversity. Unlike Karamanis et al., our analysis failed to identify any significant difference between SS and OS groups, despite being much better powered to do so (SS pairs: n = 353 in our pooled analysis, n = 40 in Karamanis et al.; OS pairs: n = 110 in our pooled analysis, n = 27 in Karamanis et al.) [21]. In this way, our results do not provide any evidence to support their proposed hypothesis that prenatal sex hormone exposure from a co-twin influences gender identity formation.

In some ways, this lack of evidence is perhaps unsurprising since – in the context of twins – it is difficult to identify plausible mechanisms by which such hormone exposure might occur. After all, DZ twins almost universally have separate fetal circulations. Thus, transfer of sex hormones would presumably need to occur via the maternal circulation, which would significantly dilute their effects. Consistent with this, a recent systematic review examining in utero sex hormone transfer between twins found only inconsistent evidence regarding any effect on gender-related behaviour [33]. Finally, it is worth noting that previous human studies which support a role for prenatal hormone exposure in the development of gender identity have centred around singleton cases of congenital adrenal hyperplasia in which there is increased fetal testosterone production in XX individuals [22], a scenario that is clearly very different from one involving twins with normal levels of sex hormone production.

In terms of strengths and limitations, our methodological approach hopefully reduced the risk of bias present in previous studies. For instance, we did not utilise previously published case report data in deriving our estimate – including a recent report of identical triplets concordant for a transgender male identity [34] – and our clinical samples were recruited using systematic identification of twins. We also operationalised gender identity by allowing individuals to self-identify, rather than relying on: parental report (e.g. Coolidge et al., 2002), questions related to clinical symptoms of gender dysphoria (e.g. Coolidge et al., 2002; Sasaki et al., 2016) (which may or may not be present in transgender individuals depending on the extent of existing gender affirmation), formal diagnosis of gender dysphoria (e.g. Karamanis et al., 2022), or prior use of gender-affirming medical interventions (e.g. Karamanis et al., 2022), which are not always accessible or desired by transgender individuals, especially those who identify as non-binary [35].

However, it should be noted that our Australian sample was derived from clinical databases and thus might not be representative of transgender individuals from the general community. Moreover, given that these participants were specifically invited to participate in a study whose stated aim was to estimate the heritability of gender diversity, it is possible that concordant twins may have been more inclined to participate. Our higher-than-expected proportion of MZ pairs is also an indicator of likely recruitment bias, but this is extremely common in twin studies [20] and reflects what seems to be an inherently greater interest among MZ twins in being involved in research compared to DZ twins.

It is important also to consider the strengths and weaknesses of our meta-analysis. One strength was the decision to exclude case study-based data that may have over-represented concordance rates in MZ and DZ pairs, thus enhancing the reliability of the findings. Nevertheless, participants from some of the additional studies included in our meta-analysis were not systematically recruited, which could still potentially inflate concordance estimates. For example, participants from Diamond (2013) were ascertained through colleagues and transgender community groups, which may have introduced ascertainment bias, particularly for concordant pairs. Meanwhile, the Zucker et al. (2011) pairs were recruited from a clinical sample similar to ours. In contrast, the Sasaki et al. (2016) data were derived from a larger, comprehensive twin study encompassing multiple research aspects, which effectively minimised the potential for ascertainment bias for concordant pairs. Another limitation of the meta-analysis was that all four studies captured information about gender identity cross-sectionally and thus provided only a particular snapshot in time. Since gender diversity can develop over the lifespan and transgender individuals may not “come out” until later in life, some of our apparently discordant twin pairs – especially those involving younger participants – may in time become concordant, which would alter heritability estimates. Finally, another limitation of our meta-analysis is the relatively small number of subjects in specific sub-groups. For example, it is currently unknown whether genetic and environmental factors differentially influence the development of gender diversity in assigned males and assigned females, but subdivision of our DZ and MZ pairs by birth-assigned sex resulted in numbers too small to permit a well-powered statistical analysis. Looking to the future, larger studies will therefore be required to better address this question.

It is also important to note that, regardless of the degree of heritability observed, the results of twin studies such as ours remain consistent with a wide variety of causal explanations, because twin studies cannot reveal specific mechanisms through which any genes (or environments) operate.

Conclusion

The relative risk ratios of 21.2 for MZ pairs and 8.7 for DZ pairs, when compared to the population prevalence of 1%, provide compelling evidence that familial factors play a significant role in gender diversity. Furthermore, the relative risk of 2.6 between MZ and DZ pairs strongly suggests that these observations are due to genetic involvement. In contrast to a prominent recent study, we did not find any evidence to support prenatal hormone transfer between twins playing a contributory role in gender identity development. Looking ahead, future research should continue to investigate the aetiological contribution of genetics to gender diversity, including via the recruitment of larger numbers of twin pairs.