Introduction

Vulvovaginal candidiasis (VVC) affects 75% of women worldwide at least once in their lifetime1. Recurrent vulvovaginal candidiasis (RVVC), defined as at least three distinct episodes of VVC within 12 months2,3,4, affects up to 9% of women worldwide5,6. However, studies of symptomatic women in Africa report higher RVVC rates of almost 25%7,8. Symptomatic vaginal Candida infection is multifactorial: host predisposing factors, perturbed microbiota composition, and fungal virulence traits all play a role9. Commonly reported predisposing host factors include pregnancy, diabetes, contraceptive use, and broad-spectrum antibiotic treatment10. However, only some women experience recurrent episodes of VVC, and in most cases the condition is idiopathic. This suggests that variation in genes involved in the antifungal host immune response may influence susceptibility to the infection11.

An individual’s genetic makeup plays an important role in disease susceptibility12,13 and response to treatment14. Understanding the role of genetics in different populations for specific diseases has become increasingly important for improving diagnostics, developing targeted therapies, and enhancing disease prevention strategies. Notably, most human genomics research has focused on populations of non-African ancestry. African populations remain underrepresented in research on genetic susceptibility to diseases15,16,17,18, despite having the greatest genetic diversity19,20,21,22,23, the highest infectious disease burden24, and an increasing load of non-communicable diseases25. This situation is improving, however, through initiatives such as the Human Heredity and Health in Africa (H3Africa) multi-country consortium, which empowers African researchers and fosters collaboration in genomic research on the interplay between human genetics, the environment, and disease susceptibility26. Other efforts include the African Genome Variation Project22, the African BioGenome Project27, the African pan-genome assembly21, and the African Genomics Program by Roche28.

Candidate gene approach studies have identified genetic variations in immune-related genes as potential contributors to increased susceptibility in RVVC. For example, mutations in receptors involved in recognizing Candida albicans cell wall components—such as Dectin-129, its adaptor molecule CARD930,31, TLR232,33, or MBL34—can impair fungal detection and the host immune response, and have been linked to a higher risk for RVVC. Polymorphisms in NLRP3 have also been associated with RVVC4,35, and higher concentrations of IL-1β were detected in vaginal secretions from women carrying the RVVC-associated genotype4, suggesting that RVVC may be characterized by an enhanced pro-inflammatory immune response36,37.

The advent of the ‘omics’ technologies and cohort-based studies has enabled genome-wide association studies (GWAS)38 to identify candidate loci associated with increased susceptibility to various diseases including (R)VVC. Through the integration of genome-wide genetic analysis and immunological data, sialic acid-binding immunoglobulin-like lectin 15 (SIGLEC-15) has been identified as a potential susceptibility gene for RVVC39. Another GWAS conducted on a large population-based biobank in Estonia highlighted the role of the vaginal epithelium in recurrent vaginitis, suggesting that epithelial factors may influence host susceptibility to disease40. Despite these efforts, there is a scarcity of RVVC-related genomic studies due to challenges related to sample size and the difficulty of validating these findings across diverse populations, including Africans.

Genetic differences and heterogeneity in gene-driven cytokine responses to antigen stimulation have been observed between populations of African and European ancestry41,42,43. Hence, the findings on genetic susceptibility to RVVC from European cohorts39,40 need to be complemented by studies in larger and more diverse populations, both to better understand RVVC pathogenesis worldwide and to inform therapeutic and preventive strategies. Therefore, we conducted a GWAS in an African cohort of women with RVVC. To better understand the genetic factors that predispose these women to recurrent episodes of Candida infection, we compared the genetic landscape of women with RVVC with that of women with VVC; we also compared women with RVVC with a control group consisting of ‘asymptomatic/healthy women’ plus ‘symptomatic but uninfected women’. We included both categories of women in the control group to account for potential confounding factors, such as Candida commensal colonization, viral infections (e.g. genital herpes), and non-infectious causes of genital symptoms (e.g. atrophic vaginitis, allergic reactions, hormonal effects). The VVC comparator group served to assess genetic differences between women with RVVC and those able to limit their VVC episodes.

Methods

Ethical statement

Ethics approval was granted by the Kenyatta National Hospital-University of Nairobi Ethics and Research Committee, P980/12/2016. A research license (permit) was granted by the National Commission for Science Technology and Innovation, and authorization to conduct this study at the health facilities was given by Nairobi City County, Kenya. All methods were performed in accordance with the Declaration of Helsinki, and relevant guidelines and regulations. Informed consent was obtained from all individual participants included in the study.

Study design and populations

We performed a case–control study to determine risk factors for RVVC. This study was part of a larger longitudinal observational study on female lower genital tract infections and RVVC, conducted between October 2018 and March 2023 at seven health facilities in Nairobi, Kenya. The parent study was designed to establish a cohort of 250 women with RVVC; however, owing to restrictions imposed during the COVID-19 pandemic, participant recruitment was terminated prematurely in March 2020.

In this case–control study we performed two comparisons: (1) RVVC (n = 174) versus Controls (n = 347); and (2) RVVC versus VVC (n = 157). The Controls group consisted of symptomatic but uninfected women (n = 246) plus asymptomatic (healthy) women (n = 101). Symptomatic but uninfected women were those with one or more of the following genital tract symptoms (GTS): vaginal discharge with or without a foul smell, vulvar or vaginal itching/pruritus, vulvovaginal soreness or burning sensation, lower abdominal pain, dysuria, and dyspareunia; and whose vaginal smear sample tested negative for Candida species, bacterial vaginosis (BV), and sexually transmitted infections (STI), namely Neisseria gonorrhoeae (NG), Trichomonas vaginalis (TV), Chlamydia trachomatis (CT), and Mycoplasma genitalium (MG). Asymptomatic women were those with no GTS. RVVC was defined as having at least three symptomatic episodes of VVC within a 12-month period, with laboratory-confirmed Candida species identified in a vaginal smear for at least one of the episodes. Participants with fewer than three VVC episodes in 12 months were classified as having acute VVC (VVC group). Participants were consenting, non-pregnant women aged 18–50 years who tested negative for both HIV and glycosuria and were recruited at outpatient health facilities in Nairobi, Kenya. The study procedures have been previously described in detail8. Briefly, sociodemographic and clinical data, vaginal smear samples for detection of genital infections, and blood (buffy coat) for DNA isolation were obtained from all participants. Genital infections were detected as follows: candidiasis by microscopic examination and culture on Sabouraud dextrose agar; BV by the Nugent score; and CT, NG, MG, and TV by a multiplex real-time polymerase chain reaction test (Sacace Biotechnologies, Como, Italy).

Genotyping, quality control, and imputation

DNA isolated from whole blood samples of 678 participants was genotyped using the Illumina GSA beadchip GSA MD (Illumina GSA Arrays “Infinium iSelect 24 × 1 HTS Custom Beadchip Kit”). Quality control was performed before and after genotype imputation. Briefly, samples with a low call rate (≤ 99%) and variants with a Hardy–Weinberg equilibrium P value ≤ 0.00001, call rate < 0.99 (missingness test, GENO > 0.01), or minor allele frequency (MAF) < 0.001 were excluded from further analyses. Duplicates and first- and second-degree relatives were identified through multidimensional scaling (MDS) analyses and excluded from further analysis. A total of 265,479 variants passed quality control and were sent for imputation. Imputation was performed using the Haplotype Reference Consortium (HRC) panel44. Because genome reference data from African populations are incompletely available, ethnic outliers in our study population were identified for quality control by MDS analyses comparing the genomes of different populations from the 1000G Project44,45 with our study population. We performed MDS analyses separately for each comparison, so the number of outliers differs between the comparisons (RVVC versus Controls and RVVC versus VVC).
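The variant-level filters described above can be sketched in a few lines. The following is a minimal illustration and not the pipeline used in this study: the function name `variant_qc`, the genotype coding (0, 1, or 2 copies of an allele, `None` for a missing call), and the chi-square approximation for the Hardy–Weinberg test (many GWAS toolkits use an exact test instead) are all assumptions made for the example.

```python
import math


def variant_qc(genotypes, min_call_rate=0.99, min_maf=0.001, hwe_p_min=1e-5):
    """Apply variant-level QC thresholds to one variant (illustrative sketch).

    genotypes: list holding 0, 1 or 2 copies of the B allele per sample,
               or None where the genotype call is missing (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes) if genotypes else 0.0
    if n == 0:
        return {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0, "passed": False}

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    maf = min(freq_b, 1 - freq_b)

    if maf == 0:
        hwe_p = 1.0  # monomorphic variant: the HWE test is not informative
    else:
        expected = [
            n * (1 - freq_b) ** 2,
            2 * n * freq_b * (1 - freq_b),
            n * freq_b ** 2,
        ]
        observed = [n_aa, n_ab, n_bb]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of a chi-square distribution with 1 degree of freedom.
        hwe_p = math.erfc(math.sqrt(chi2 / 2))

    # Thresholds mirror the text: exclude if HWE P <= 1e-5, call rate < 0.99, MAF < 0.001.
    passed = call_rate >= min_call_rate and maf >= min_maf and hwe_p > hwe_p_min
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passed": passed}
```

A variant with genotype counts in perfect Hardy–Weinberg proportions passes, whereas one with no heterozygotes at an allele frequency of 0.5, or with 2% missing calls, is excluded.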

Association analysis

The GWAS was performed on single nucleotide polymorphisms (SNPs) with MAF > 0.05, and genetic susceptibility to RVVC was tested using logistic regression with adjustment for five principal components to account for genetic heterogeneity. P value distributions were assessed using a quantile–quantile (Q–Q) plot to estimate inflation of the association results. The impact of identified SNPs on gene expression was then explored using the Genotype-Tissue Expression (GTEx) database46 to reveal expression quantitative trait loci (eQTL); we next annotated the identified loci with their potential associations with other quantitative traits using the Open Targets database47.
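To make the per-SNP test concrete, the sketch below fits a logistic regression of case status on an additive genotype term plus covariates (for example, principal components) by Newton–Raphson and reports a Wald test for the genotype coefficient. It is an illustration only: the software used in this study is not specified here, the function names `solve_linear` and `snp_logistic_test` are hypothetical, and a dedicated GWAS package would normally be used in practice.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def snp_logistic_test(genotypes, covariates, status, max_iter=50, tol=1e-10):
    """Logistic regression of case status on genotype dosage plus covariates.

    genotypes:  allele dosage per participant (0, 1 or 2)
    covariates: one list per participant (e.g. principal components); may be empty
    status:     1 for cases, 0 for controls
    Returns (odds_ratio, standard_error, p_value) for the genotype term.
    """
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(genotypes, covariates)]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, status):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Variance of the genotype coefficient: entry [1][1] of the inverse of the
    # information matrix evaluated at the last Newton step.
    unit = [0.0] * k
    unit[1] = 1.0
    se = math.sqrt(solve_linear(hess, unit)[1])
    z = beta[1] / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald test
    return math.exp(beta[1]), se, p_value
```

In the setting described above, each participant's covariate list would hold that participant's five principal components, and the function would be called once per SNP; an odds ratio below 1 indicates that the counted allele is less frequent among cases.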

Results

Participant characteristics

Of the 678 participants, 174 had RVVC, 347 were Controls (246 symptomatic women without infection (candidiasis, BV, STI) and 101 asymptomatic (healthy) women), and 157 had VVC. The participants were mainly from five ethnic groups (Kamba, Kikuyu, Kisii, Luhya, and Luo), and their median age (IQR) was 27.7 (24.3–33.9) years for RVVC, 29 (24–35) years for the symptomatic uninfected and the asymptomatic women, and 29 (24.1–33.0) years for VVC (Table 1).

Table 1 Demographic characteristics of participants by group (RVVC, Controls, and VVC).

MDS analysis confirms the African ancestry of our cohort, but emphasizes the lack of sufficient African reference data

Following MDS analysis, 35 duplicates and 19 first- and second-degree relatives were excluded, leaving 624 participants for further analysis. Because genome reference data from African populations are incompletely available, we investigated the ancestry of individuals in our study population. To do so, we first conducted MDS analysis using genome reference data available from the 1000G Project45. Genetic data from our cohorts showed a complete overlap with the genomes of the African population (Supplementary Fig. 1A). Next, we compared the genetic data from our cohorts with the African-only reference data available in the 1000G Project. This showed that a subset of individuals from our cohorts overlapped with the Luhya (LWK) population of Kenya, but some samples did not overlap with any of the reference data available in the 1000G Project (Supplementary Fig. 1B). Lastly, we annotated our study population samples with the participants’ ethnic groups (Supplementary Fig. 1C) and found distinct clusters. This observation points to the lack of sufficient reference genomes to fully capture the genetic heterogeneity of the African population. Nevertheless, MDS analysis comparing the genomes of the cohorts in our study population (RVVC, VVC, symptomatic without infection, and asymptomatic) indicated no clustering of individuals based on disease status (Supplementary Fig. 1D).

GWAS of RVVC versus controls and annotation of susceptibility SNPs

To identify genetic variants associated with susceptibility to RVVC, a GWAS was performed in a cohort of 160 Cases (RVVC) and 309 Controls (symptomatic uninfected women plus asymptomatic women). After imputation and quality-control filtering, 7.18 million variants were analysed using logistic regression. Although no SNPs passed the genome-wide significance threshold of P < 5 × 10⁻⁸, 14 independent loci with P < 10⁻⁵ were suggestive of an association with RVVC (Table 2). Among these loci, a SNP on chromosome 11, rs8181503, located close to the MS4A12 gene, showed the strongest association with susceptibility to RVVC (P = 9.28 × 10⁻⁷, odds ratio (OR) = 0.46). Next, we explored the impact of these candidate SNPs on gene expression using the GTEx database and identified expression quantitative trait loci (eQTL) for five loci. We also annotated the 14 loci with their potential associations with other quantitative traits using the Open Targets database. This confirmed the impact of one locus, rs6731176, on both RNA and protein levels of the gene PROC (Table 2).

Table 2 RVVC versus controls group: SNPs associated with susceptibility to RVVC (P < 10⁻⁵).

Given the role of multiple pathways in fungal infection, we hypothesized that the 14 genetic loci showing suggestive association with RVVC (P < 9.99 × 10⁻⁵) are enriched for RVVC susceptibility pathways. To test this, we ran a pathway enrichment analysis using FUMA48. We found a significant enrichment of the identified RVVC susceptibility genes in metabolic pathways and cell adhesion pathways (Fig. 1, panels A–C).

Fig. 1

RVVC-associated genes are enriched in metabolic and adhesion pathways based on the (A) KEGG, (B) Reactome, and (C) GO databases. −log10(P) values are shown on the x-axis and the significant pathways are named on the y-axis. RVVC, recurrent vulvovaginal candidiasis; SNPs, single nucleotide polymorphisms; rsID, reference SNP identification; Chr, chromosome; eQTL, expression quantitative trait loci.

GWAS of VVC and annotation of susceptibility SNPs

Prior to comparing the RVVC and VVC groups, we performed a GWAS to identify genetic variants associated with VVC. After quality control, the cohort comprised 444 women (137 VVC and 307 Controls) and 7.18 million variants. A region on chromosome 14 was significantly associated with VVC (rs76123164, P = 2.94 × 10⁻⁸, OR = 4.61). While this SNP is closest to the ADAM20 gene, we also found it to be an eQTL for TTC9 and SLC8A3. Additionally, 18 other regions showed suggestive associations (P < 10⁻⁵) (Table 3).
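The two P value thresholds used in these analyses can be made explicit with a small helper. This is an illustrative sketch only; the function name `classify_association` and the label strings are hypothetical, and the P values are those quoted in the text for the lead SNPs of the first two analyses.

```python
GENOME_WIDE = 5e-8  # conventional threshold, roughly 0.05 / 1,000,000 independent tests
SUGGESTIVE = 1e-5   # suggestive threshold used in this study


def classify_association(p_value):
    """Label a P value against the two thresholds (hypothetical helper)."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not suggestive"


# Lead SNPs reported for RVVC versus Controls and for VVC versus Controls.
lead_snps = {"rs8181503": 9.28e-7, "rs76123164": 2.94e-8}
labels = {snp: classify_association(p) for snp, p in lead_snps.items()}
print(labels)
# {'rs8181503': 'suggestive', 'rs76123164': 'genome-wide significant'}
```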

Table 3 VVC versus controls: SNPs associated with susceptibility to VVC (P < 10⁻⁵).

GWAS of RVVC versus VVC and annotation of susceptibility SNPs

To test whether genetic variants play a role in conferring recurrent infections, we performed a GWAS comparing 158 RVVC patients with 142 VVC patients. Although we found no genome-wide significant loci, we identified 12 independent loci with P < 10⁻⁵ that were suggestive of an association with RVVC, the strongest SNP being on chromosome 21 (P = 1.22 × 10⁻⁶, OR = 0.33), located next to the H2BC12L gene. We also found several of these SNPs to be eQTLs. For example, the expression of the transmembrane protein 39A gene (TMEM39A) was associated with the rs58936172 locus (P = 8.96 × 10⁻⁶, OR = 2.42), making TMEM39A a candidate gene for RVVC susceptibility (Table 4).

Table 4 RVVC versus VVC: SNPs associated with susceptibility to RVVC (P < 10⁻⁵).

Pathway enrichment analysis of the genes at these loci showed a significant enrichment in metabolic pathways and disease signalling pathways, the main ones being linoleic acid metabolism and fibroblast growth factor receptor (FGFR) signalling (Fig. 2, panels A–C).

Fig. 2

RVVC versus VVC: pathway enrichment analysis based on the (A) KEGG, (B) Reactome, and (C) GO databases. −log10(P) values are shown on the x-axis and the significant pathways are named on the y-axis.

Discussion

In this study we performed the first genome-wide association analysis in an African population of women reporting recurrent episodes of VVC. We identified several genomic variants associated with RVVC, with the strongest being the SNP rs8181503, located close to the MS4A12 gene on chromosome 11; as well as the SNP rs58936172 next to the TMEM39A gene on chromosome 3. We found no overlap in the susceptibility SNPs for RVVC and VVC. The polymorphisms associated with RVVC were linked to dysregulated metabolic, cell adhesion and disease signalling pathways relevant for RVVC susceptibility, with the top pathways being gluconeogenesis, fatty acid metabolism, linoleic acid metabolism, pentose phosphate, chemotaxis, and FGFR signalling pathways.

During encounters with pathogens, the metabolic processes of immune cells are reprogrammed to provide the energy and substrates necessary for phagocytosis and for mounting a successful immune response49,50. In line with this, monocytes exposed to Candida exhibit an upregulation of glycolysis, and the biosynthesis and catabolism of glycogen are important for Candida albicans metabolism and virulence51,52,53,54; polymorphisms affecting gluconeogenesis may therefore compromise the immune response. Linoleic acid inhibits hypha formation by Candida, thus interfering with the morphogenic processes that are key to the fungus’s virulence55. Therefore, women with polymorphisms that affect the metabolism of linoleic acid may exhibit an insufficient immune response against Candida. It is thus interesting to observe that genes related to these metabolic processes are among the most enriched in genetic variants associated with susceptibility to RVVC.

The TMEM39A gene influences immune pathways for cytokine production and hence may affect the IL-17 and IL-22 cytokines, which are important components of the host’s immune response and mucosal barrier against Candida infection, respectively56,57. Indeed, use of IL-17 blockers (in comparison with use of anti-TNFα) is associated with increased vulnerability to VVC in patients with psoriasis58. In women bearing the rs58936172 SNP, we speculate that the Candida-host commensal association59 is more easily tilted in favour of the fungus than in other women, who despite exposure to Candida are able to maintain the commensal association always or most of the time.

Studies from non-African populations have revealed genes associated with RVVC susceptibility via effects on host immune processes4,60. In a European population, Jaeger et al.39 identified SIGLEC15 as associated with RVVC. We speculate that the upregulation of the MS4A12 and PROC genes in African women may be associated with a hyperinflammatory response to Candida in RVVC. An upregulated MS4A12 gene likely promotes hyperinflammation via an exaggerated IgE-mediated inflammatory response61, while PROC likely does so via protein C62. Since TMEM39A polymorphisms are associated with autoimmune diseases63,64, we suggest that, for women with this SNP, RVVC may be akin to an autoimmune condition in which the immunogen is the Candida fungus during its commensal existence in the host. Our finding of no upregulation of hyperinflammation genes in VVC, despite the VVC-associated inflammation, aligns with this view. To elucidate the exact mechanisms involved, future functional studies are warranted, including assessment of the influence of Africa’s unique environmental factors and endemic infections on the polymorphisms identified for RVVC.

In previous RVVC genetic studies, patients with RVVC were mainly compared with healthy women without vaginal symptoms. In our study, we expanded the comparator groups to include women with symptoms but without an identified etiology, as well as those with sporadic VVC. One may argue that these control groups are ideal, since susceptibility to recurrent Candida infections is thereby also explored in contrast to occasional Candida infection or to symptoms unrelated to the fungus. To explore this conjecture further, we recommend that future RVVC genetic studies use the expanded comparator groups and also integrate them in RVVC-related genital mucosal immunological studies.

A key limitation of our study is the sample size, which limits the statistical power to detect genome-wide significant associations. Based on standard GWAS power calculations, our cohort of 174 cases and 347 controls provides limited power (< 30%) to detect an odds ratio of 1.5 at a MAF of 5%, using a genome-wide significance threshold of 5 × 10⁻⁸. Given these constraints, our study is primarily exploratory and aimed at identifying suggestive genetic associations that warrant further validation. Larger sample sizes, replication in independent cohorts, and meta-analyses will be necessary to confirm the observed associations and achieve robust genome-wide significance. A second limitation is that, in this study’s setting, diagnosis of VVC is conventionally symptom-based, without microbiological testing. Our definition of RVVC thus relied in part on participants’ recall of previous VVC episodes, which may have introduced recall bias. Future studies should aim to obtain longitudinal microbiological confirmation of Candida infection for all VVC episodes. Third, our cohorts were not free of co-infections (NG, TV, CT, MG, and BV)8; however, because these are acquired conditions, we believe that they did not affect the SNPs identified, and they are in any case a true reflection of these characteristics in the RVVC group. Finally, we lacked a comparative African RVVC cohort because studies and data on RVVC are scarce, as the diagnosis of RVVC is not recognized in the syndromic approach widely used in African settings. This nevertheless means that the data we present here are a valuable resource for future studies.
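The power statement above can be illustrated with a rough calculation. The sketch below uses a normal approximation to a two-sided Wald test on the allelic log odds ratio; it is one common approximation and not necessarily the calculator used for the figure quoted in the text. The function name `allelic_power` is hypothetical, and the function assumes that the allele frequency in controls equals the stated MAF and that alleles are independent.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided Wald test on the allelic log odds ratio."""
    norm = NormalDist()
    p0 = maf  # assumed allele frequency in controls
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    a1, a0 = 2 * n_cases, 2 * n_controls  # allele counts (two per participant)
    # Standard error of the log odds ratio from the expected 2 x 2 allele table.
    se = math.sqrt(
        1 / (a1 * p1) + 1 / (a1 * (1 - p1)) + 1 / (a0 * p0) + 1 / (a0 * (1 - p0))
    )
    ncp = abs(math.log(odds_ratio)) / se  # expected test statistic under the alternative
    z_crit = -norm.inv_cdf(alpha / 2)     # two-sided critical value
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


power = allelic_power(n_cases=174, n_controls=347, maf=0.05, odds_ratio=1.5)
print(f"approximate power: {power:.2e}")
```

Under these assumptions the estimated power for 174 cases and 347 controls falls well under 30%, in line with the exploratory framing of the study, whereas cohorts in the tens of thousands would approach full power for the same effect size.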

Despite these limitations, our study is notable as the first to associate relevant SNPs with RVVC in an African population; it affords valuable insights and can serve as a foundation for future research and clinical application. The RVVC-associated polymorphisms and pathways identified in the present work could be advanced for the development of novel personalized therapies, biomarkers, and tailored preventive strategies to improve the management of patients with RVVC.

Conclusion

RVVC susceptibility is due to multiple factors, genetic predisposition being among the newer revelations. Our findings are a valuable addition for better understanding of the pathogenesis of RVVC in an African population, and an important resource for the future search for novel therapies and preventive strategies.