Abstract
The genetic landscape of human Mendelian diseases is shaped by mutation and selection. Although selection on heterozygotes is well-established in autosomal-dominant disorders, convincing evidence for selection in carriers of pathogenic variants associated with recessive conditions is limited. Here, we studied heterozygous pathogenic variants in 1,929 genes, which cause recessive diseases when bi-allelic, in n = 378,751 unrelated European individuals from the UK Biobank. We find evidence suggesting fitness effects in heterozygous carriers for recessive genes, especially for variants in constrained genes across a broad range of diseases. Our data suggest reproductive effects at the population level, and hence natural selection, for autosomal-recessive disease variants. Further, variants in genes that underlie intellectual disability are associated with lower educational attainment in carriers, and we observe an altered genetic landscape, characterized by a threefold reduction in the calculated frequency of bi-allelic intellectual disability in the population relative to other recessive disorders.
Data availability
The raw data used in this study are available as part of the UK Biobank dataset.
Code availability
The code used to generate the data for this project is available via GitHub at https://github.com/Genome-Bioinformatics-RadboudUMC/ukbb_recessive_public.
References
Acuna-Hidalgo, R., Veltman, J. A. & Hoischen, A. New insights into the generation and role of de novo mutations in health and disease. Genome Biol. 17, 241 (2016).
Goldmann, J. M., Veltman, J. A. & Gilissen, C. De novo mutations reflect development and aging of the human germline. Trends Genet. 35, 828–839 (2019).
Shadrina, M. et al. Automated identification of germline de novo mutations in family trios: a consensus-based informatic approach. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2024.03.08.584100v1 (2024).
Kaplanis, J. et al. Genetic and chemotherapeutic influences on germline hypermutation. Nature 605, 503–508 (2022).
Goldmann, J. M. et al. Differences in the number of de novo mutations between individuals are due to small family-specific effects and stochasticity. Genome Res. 31, 1513–1518 (2021).
Gilissen, C. et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344–347 (2014).
Kaplanis, J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
Girirajan, S. & Eichler, E. E. Phenotypic variability and genetic susceptibility to genomic disorders. Hum. Mol. Genet. 19, R176–R187 (2010).
Taylor, S. M., Parobek, C. M. & Fairhurst, R. M. Haemoglobinopathies and the clinical epidemiology of malaria: a systematic review and meta-analysis. Lancet Infect. Dis. 12, 457–468 (2012).
Weatherall, D. J. Genetic variation and susceptibility to infection: the red cell and malaria. Br. J. Haematol. 141, 276–286 (2008).
Aidoo, M. et al. Protective effects of the sickle cell gene against malaria morbidity and mortality. Lancet 359, 1311–1312 (2002).
Hogenauer, C. et al. Active intestinal chloride secretion in human carriers of cystic fibrosis mutations: an evaluation of the hypothesis that heterozygotes have subnormal active intestinal chloride secretion. Am. J. Hum. Genet. 67, 1422–1427 (2000).
Oussalah, A. et al. Population and evolutionary genetics of the PAH locus to uncover overdominance and adaptive mechanisms in phenylketonuria: results from a multiethnic study. EBioMedicine 51, 102623 (2020).
Barton, A. R., Hujoel, M. L. A., Mukamel, R. E., Sherman, M. A. & Loh, P. R. A spectrum of recessiveness among Mendelian disease variants in UK Biobank. Am. J. Hum. Genet. 109, 1298–1307 (2022).
Heyne, H. O. et al. Mono- and biallelic variant effects on disease at biobank scale. Nature 613, 519–525 (2023).
Agarwal, I., Fuller, Z. L., Myers, S. R. & Przeworski, M. Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs. eLife 12, e83172 (2023).
Amorim, C. E. G. et al. The population genetics of human disease: the case of recessive, lethal mutations. PLoS Genet. 13, e1006915 (2017).
Fridman, H. et al. The landscape of autosomal-recessive pathogenic variants in European populations reveals phenotype-specific effects. Am. J. Hum. Genet. 108, 608–619 (2021).
Liu, A. et al. Evidence from Finland and Sweden on the relationship between early-life diseases and lifetime childlessness in men and women. Nat. Hum. Behav. 8, 276–287 (2024).
Mathieson, I. et al. Genome-wide analysis identifies genetic effects on reproductive success and ongoing natural selection at the FADS locus. Nat. Hum. Behav. 7, 790–801 (2023).
Gardner, E. J. et al. Reduced reproductive success is associated with selective constraint on human genes. Nature 603, 858–863 (2022).
Koko, M. et al. Exome sequencing of UK birth cohorts. Wellcome Open Res. 9, 390 (2024).
Chundru, V. K. et al. Federated analysis of autosomal recessive coding variants in 29,745 developmental disorder patients from diverse populations. Nat. Genet. 56, 2046–2053 (2024).
de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N. Engl. J. Med. 367, 1921–1929 (2012).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Seplyarskiy, V. et al. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat. Genet. 55, 2235–2242 (2023).
Weghorn, D. et al. Applicability of the mutation-selection balance model to population genetics of heterozygous protein-truncating variants in humans. Mol. Biol. Evol. 36, 1701–1710 (2019).
Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Demange, P. A. et al. Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021).
Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 48, 1462–1472 (2016).
Verweij, R. M. et al. Sexual dimorphism in the genetic influence on human childlessness. Eur. J. Hum. Genet. 25, 1067–1074 (2017).
Nisen, J., Martikainen, P., Kaprio, J. & Silventoinen, K. Educational differences in completed fertility: a behavioral genetic study of Finnish male and female twins. Demography 50, 1399–1420 (2013).
Zschocke, J., Byers, P. H. & Wilkie, A. O. M. Gregor Mendel and the concepts of dominance and recessiveness. Nat. Rev. Genet. 23, 387–388 (2022).
Goker-Alpan, O. et al. Parkinsonism among Gaucher disease carriers. J. Med. Genet. 41, 937–940 (2004).
Chen, C. Y. et al. The impact of rare protein coding genetic variation on adult cognitive function. Nat. Genet. 55, 927–938 (2023).
Rolland, T. et al. Phenotypic effects of genetic variants associated with autism. Nat. Med. 29, 1671 (2023).
Verweij, R. M. et al. Using polygenic scores in social science research: unraveling childlessness. Front. Sociol. 4, 74 (2019).
Wright, C. F. et al. Assessing the pathogenicity, penetrance, and expressivity of putative disease-causing variants in a population setting. Am. J. Hum. Genet. 104, 275–286 (2019).
Schuurs-Hoeijmakers, J. H. et al. Identification of pathogenic gene variants in small families with intellectually disabled siblings by exome sequencing. J. Med. Genet. 50, 802–811 (2013).
Yuen, R. K. et al. Whole-genome sequencing of quartet families with autism spectrum disorder. Nat. Med. 21, 185–191 (2015).
Kirk, E. P. et al. Gene selection for the Australian reproductive genetic carrier screening project (‘Mackenzie’s Mission’). Eur. J. Hum. Genet. 29, 79–87 (2021).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Li, Q. & Wang, K. InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines. Am. J. Hum. Genet. 100, 267–280 (2017).
van der Velde, K. J. et al. MOLGENIS research: advanced bioinformatics data software for non-bioinformaticians. Bioinformatics 35, 1076–1078 (2019).
Carter, A. R. et al. Educational attainment as a modifier for the effect of polygenic scores for cardiovascular risk factors: cross-sectional and prospective analysis of UK Biobank. Int. J. Epidemiol. 51, 885–897 (2022).
Acknowledgements
We thank E. Gardner, J. Hampstead, H. Martin and M. Hurles for fruitful discussions and S. Carmi for advising about population genetics matters. This research has been conducted using the UK Biobank Resource under application number 66493. This project was financially supported by a VIDI grant from the Dutch Research Council (grant no. 917-17-353 to C.G.) and an AI for Health PhD grant from Radboudumc. This project was supported by a gift from the Koum Foundation (to E.L.-L.). E.L.-L. is Robin Chemers Neustein Director of Medical Genetics.
Author information
Contributions
H.G.B., C.G. and E.L.-L. supervised the study; H.F. and G.K. developed a data collection pipeline and statistical methods and analysed data. H.F., G.K., H.G.B., C.G. and E.L.-L. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Terry Vrijenhoek, Lily Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Association of genetic burden for recessive disease with childlessness for different constraint scores.
Associations of genetic burden with childlessness and hair color (as a control phenotype) for heterozygous PLPs in recessive genes (purple), PLPs in recessive genes excluding carriers of LoF in non-recessive genes (orange), and singleton LoFs in highly constrained (s-het > 0.15) non-recessive genes (green). Colored lines indicate odds ratios for the phenotypes with 99% confidence intervals. The dashed gray line indicates OR = 1, which serves as the reference point. Deviations from OR = 1 are tested for statistical significance using a two-sided Wald test. The corresponding p-values are displayed in the figure and adjusted for multiple comparisons using the Bonferroni correction. Statistically significant associations are marked with an asterisk. Results are shown for three different constraint scores: s-het from Weghorn et al. (a), s-het from Cassa et al. (b) and pLI (c).
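The statistics named in this caption (odds ratio, 99% confidence interval, two-sided Wald test, Bonferroni adjustment) can be illustrated with a minimal Python sketch. It assumes that a logistic-regression coefficient `beta` (on the log-odds scale) and its standard error `se` are already available; the function name `wald_summary` and the parameter `n_tests` are hypothetical and are not taken from the authors' code.

```python
import math
from statistics import NormalDist


def wald_summary(beta, se, n_tests=1, ci_level=0.99):
    """Summarise one log-odds coefficient (illustrative helper, not the authors' code).

    beta    : estimated coefficient on the log-odds scale
    se      : standard error of beta
    n_tests : number of comparisons used for the Bonferroni adjustment (assumed)
    """
    norm = NormalDist()  # standard normal distribution
    z = beta / se  # Wald statistic
    p_raw = 2.0 * (1.0 - norm.cdf(abs(z)))  # two-sided p-value
    p_adj = min(1.0, p_raw * n_tests)  # Bonferroni: multiply, cap at 1
    z_crit = norm.inv_cdf(0.5 + ci_level / 2.0)  # about 2.576 for a 99% CI
    return {
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "p_raw": p_raw,
        "p_bonferroni": p_adj,
    }
```

An association would be marked as significant in such a scheme when `p_bonferroni` falls below the chosen threshold, which is equivalent to the adjusted interval excluding OR = 1 only if the interval level is matched to the same correction.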
Extended Data Fig. 2 Simulations of the effect of sample size on detection of the association with childlessness.
Simulated estimates of the odds ratios (a) and corresponding p-values (b, c), depicted with a spread (95% percentile interval for n = 20 simulations per cohort size), showing the effect of cohort size on the association of childlessness with the genetic burden of PLPs. Shown are simulation results for heterozygous carriers of PLPs in recessive genes (purple) and for carriers of singleton LoFs in non-recessive highly constrained genes (green). The dotted gray line marks the significance level of P = 0.05.
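The idea of such a simulation can be sketched in Python as follows. This is a deliberately simplified stand-in, not the authors' procedure: it uses a binary carrier status and a 2×2 table instead of a burden regression with covariates, and the carrier frequency, baseline childlessness rate and true odds ratio are all hypothetical values chosen for illustration.

```python
import math
import random
from statistics import NormalDist, median, quantiles


def simulate_once(n, carrier_freq, base_rate, true_or, rng):
    """Draw one cohort of size n; return (estimated OR, two-sided Wald p-value)."""
    base_odds = base_rate / (1.0 - base_rate)
    carrier_odds = base_odds * true_or
    carrier_rate = carrier_odds / (1.0 + carrier_odds)
    # 2x2 table; 0.5 is added to every cell so that no cell is ever zero
    a = b = c = d = 0.5
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        childless = rng.random() < (carrier_rate if carrier else base_rate)
        if carrier and childless:
            a += 1
        elif carrier:
            b += 1
        elif childless:
            c += 1
        else:
            d += 1
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(log_or / se)))
    return math.exp(log_or), p


def power_curve(cohort_sizes, n_rep=20, carrier_freq=0.3, base_rate=0.2,
                true_or=1.2, alpha=0.05, seed=0):
    """Repeat the simulation n_rep times per cohort size and summarise the spread."""
    rng = random.Random(seed)
    summary = {}
    for n in cohort_sizes:
        runs = [simulate_once(n, carrier_freq, base_rate, true_or, rng)
                for _ in range(n_rep)]
        ors = [r[0] for r in runs]
        cuts = quantiles(ors, n=40, method="inclusive")  # 2.5% steps
        summary[n] = {
            "median_or": median(ors),
            "or_2.5%": cuts[0],
            "or_97.5%": cuts[-1],
            "frac_significant": sum(r[1] < alpha for r in runs) / n_rep,
        }
    return summary
```

Calling `power_curve([1000, 10000, 50000])` would give, for each cohort size, the median estimated odds ratio, a 95% percentile interval across replicates and the fraction of replicates with P < 0.05, which is the kind of summary the panels describe.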
Extended Data Fig. 3 Association of genetic burden for recessive disease with educational attainment, fluid intelligence score, and log-transformed number of ICD-10 diagnoses for different constraint scores.
Associations of genetic burden with educational attainment (measured in years of education), fluid intelligence score and log-transformed number of ICD-10 diagnoses for PLPs in all recessive genes (purple) and singleton LoFs in non-recessive highly constrained genes (green). Colored lines indicate effect sizes (ES; estimated regression coefficients) with 99% confidence intervals. The dashed gray line indicates ES = 0, which serves as the reference point. Deviations from ES = 0 are tested for statistical significance using a two-sided Wald test. The corresponding p-values are displayed in the figure and adjusted for multiple comparisons using the Bonferroni correction. Statistically significant associations are marked with an asterisk. Results are shown for three different constraint scores: s-het from Weghorn et al. (a), s-het from Cassa et al. (b) and pLI (c).
Extended Data Fig. 4 Association of genetic burden with various phenotypes for PLPs in recessive ID genes and all other recessive genes, using different constraint scores.
Comparison between PLPs in recessive ID genes (red) and all other recessive genes (blue) for the effects of genetic burden on childlessness, educational attainment (measured in years of education), log-transformed number of ICD-10 diagnoses, and hair color (as a control phenotype). Results are shown for three different constraint scores: s-het from Weghorn et al. (a), s-het from Cassa et al. (b) and pLI (c). Colored lines indicate odds ratios (OR; for childlessness and hair color) or effect sizes (ES; estimated regression coefficients) with 99% confidence intervals. The dashed gray line indicates the reference point: OR = 1 for childlessness and hair color, ES = 0 otherwise. Deviations from these reference points are tested for statistical significance using a two-sided Wald test. The corresponding p-values are displayed in the figure and adjusted for multiple comparisons using the Bonferroni correction. Statistically significant associations are marked with an asterisk.
Extended Data Fig. 5 Consanguinity ratio scores (CR) for different disorder groups in three European populations.
Consanguinity ratio scores (CRs) for 13 disorder groups calculated for 3 European populations: UK Biobank (blue), Dutch (magenta) and Estonian (orange) cohorts. Scores for the Dutch and Estonian cohorts are taken from Fridman et al. Scores marked with an asterisk are significantly higher than those seen in a random set of recessive genes with the same coding length (Methods). We note that CRs for some of the other disorder groups (Skeletal, Neuromuscular, Hematologic) are similar to or higher than the CR obtained for ID genes, yet these did not reach significance.
Extended Data Fig. 6 Correlation of allele frequencies in disease categories between a Dutch cohort and the UK Biobank.
Correlation of average PLP allele frequencies (AF) per disorder group between the UK Biobank and Dutch cohorts, with a 95% confidence interval for the regression estimate (derived from n = 1,000 bootstrap samples). The x-axis shows the AF in the UK Biobank cohort; the y-axis shows the AF in the Dutch cohort. Pearson correlation ρ = 0.791, 95% CI = [0.425, 0.935], two-sided non-adjusted P = 0.001. AFs for the Dutch cohort are taken from Fridman et al.
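A bootstrap interval of the kind mentioned in this caption can be sketched in Python as below. The sketch assumes a simple percentile bootstrap over (UK Biobank AF, Dutch AF) pairs and an ordinary least-squares slope; the function names are hypothetical, and the caption does not state which bootstrap variant or plotting library the authors used.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the regression slope (resampling pairs)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope undefined, draw again
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lo = slopes[int((1.0 - level) / 2.0 * n_boot)]
    hi = slopes[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi
```

With one average AF per disorder group in each cohort passed as `x` and `y`, `pearson_r(x, y)` gives the point estimate of the correlation and `bootstrap_slope_ci(x, y)` gives an interval for the fitted line's slope.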
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fridman, H., Khazeeva, G., Levy-Lahad, E. et al. Reproductive and cognitive phenotypes in carriers of recessive pathogenic variants. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02204-7
DOI: https://doi.org/10.1038/s41562-025-02204-7