Introduction

Neurodevelopmental disorders have surged in prevalence over recent decades1,2,3. For instance, the number of diagnosed cases of Autism spectrum disorder (ASD) has risen by 176% in the last 20 years4,5,6. The etiology of delayed and aberrant neurodevelopment is complex and multifactorial7,8,9,10,11,12,13. Studies on heritability have identified several genetic risk factors including rare and common genetic variants, chromosomal anomalies, and gene defects2,3,14,15,16. Meta-analyses have also identified a series of environmental risk factors, including pre-natal, peri-natal, and post-natal elements8,9,17,18,19. Among these factors, vitamin D has been increasingly implicated in various brain disorders, including schizophrenia, multiple sclerosis, and ASD20,21,22,23,24,25,26,27,28.

Low levels of vitamin D are hypothesized to play a causative role in ASD risk during early childhood20,23,24,25,27,28. Children with ASD have lower vitamin D levels in the brain and peripheral blood compared to neurotypical children29,30,31, and children living in areas with low solar UVB doses exhibit three times the ASD prevalence of their peers in sunny areas32. During pregnancy, gestational vitamin D deficiency is associated with increased risk of ASD development33,34. Conversely, maternal vitamin D supplementation has been demonstrated to significantly associate with reduced incidence of ASD35,36. Vitamin D supplementation has also shown remarkable therapeutic effects, improving core behavioral and social symptoms of ASD37,38,39.

Vitamin D is a fat-soluble vitamin that is produced mainly from skin exposure to UVB radiation40. The human body metabolizes vitamin D into a series of steroid-like hormones41. These hormones regulate a variety of physiological processes, from bone growth to immune response to brain development and function41. Multiple modes of action have been proposed to explain the link between vitamin D and neurodevelopment23,25,27,42,43. Evidence suggests that vitamin D might be involved in the synthesis of neurotransmitters such as serotonin and dopamine44,45, the modulation of autoimmune responses46,47, and the production of pro-inflammatory cytokines and antioxidants48,49. However, a causal mechanism has yet to be established.

In this study, we employ an innovative and integrative network approach to explore the neurodevelopmental impact of vitamin D. We analyze the metabolomic profiles of 381 children within the Vitamin D Antenatal Asthma Reduction Trial (VDAART)50. We investigate how vitamin D levels act through individual metabolic differences to influence early life neurodevelopment, particularly communication skills development. Our study is focused on child communication status as measured by the Ages and Stages Questionnaire (ASQ)51,52. Poor performance in the ASQ communication ___domain has been shown to strongly correlate with future ASD risk53,54.

The novelty of our work is twofold. Firstly, we present a dataset that uniquely captures information on the metabolomic profiles, vitamin D levels, clinical records, and neurobehavioral statuses of hundreds of children; to our knowledge, no other resources with this combination of information exist. Additionally, we apply network analysis to these data in a novel way. Using LIONESS55, a network algorithm developed in our group, we reconstruct each child’s individual metabolic network and integrate these networks with their phenotypic traits. Collectively, our findings shed light on the functional impact of vitamin D in early childhood communication skills development. By leveraging a metabolome-wide, systems biology approach, this work offers new insights into the potential of vitamin D as a treatment for ASD and other neurodevelopmental disorders.

Results

The study population

The study population of VDAART has been described previously50. Briefly, this study recruited pregnant women at 10–18 weeks gestation and randomized them to a daily supplement of 4000 IU/day of vitamin D3 or placebo. All women also received a daily multivitamin containing 400 IU of vitamin D3. Supplementation took place until delivery. Offspring were evaluated quarterly through questionnaires and yearly through in-person visits.

This study is centered on the clinical and omics profiles of the VDAART children at age three (see Supplementary Table 1). Clinical and phenotypic traits of the children included: demographic information, including race and gender; pathological conditions, including asthma and recurrent wheeze; and developmental predictors collected through the Ages and Stages Questionnaire (ASQ). The ASQ is a parent-completed, developmental screening test evaluating the communication, personal-social, problem-solving, fine motor, and gross motor skills of a child51,52 (see “Materials and methods”). For each of these five domains, a child is given a score and categorized as “On Schedule for developing normally” (ASQ = 0); “Requires Monitoring” (ASQ = 1), or “Needs further evaluation” (ASQ = 2). ASQ results of VDAART children were recorded during the annual follow-up visit at age three.

We focused our analysis on the ASQ scores in the communication skills ___domain (referred to as the ASQ-comm score). This score reflects child neurodevelopmental status and progress. Indeed, the ASQ-comm score has been reported to correctly identify 95% of children defined at risk of ASD based on the Modified Checklist for Autism in Toddlers (M-CHAT)53. Additionally, we have previously shown that plasma metabolites associated with the ASQ-comm score at age 3 represent accurate predictors of ASD diagnosis later in life37. For these reasons, here we use this score as a proxy for ASD risk. No similar associations have been observed for the other four ASQ domains.

During the same year-three visit, children’s blood samples were also collected (see Supplementary Table 1). From these samples, plasma metabolomic profiling was performed by Metabolon Inc. (NC, USA) and serum levels of vitamin D were measured. Serum levels of vitamin D were also measured in mothers before (32–38 weeks gestation) and after (1 year) delivery.

Reconstructing patient-specific metabolic networks through LIONESS

Prior studies have linked neurodevelopment, and in particular ASD diagnosis, to Vitamin D deficiency during pregnancy and early childhood20,21,24,25,56. The underlying molecular mechanisms remain poorly understood, but a key role for neurotransmitter metabolism has been recently suggested25,27,28,56. Building upon these findings, we sought to investigate the metabolic processes associated with the neurodevelopmental actions of vitamin D. To do this, we used LIONESS55 (Linear Interpolation to Obtain Network Estimates for Single Samples), a method developed in our group to reverse-engineer sample-specific networks. As opposed to other commonly used network approaches, such as correlation networks, the single-sample networks estimated by LIONESS can model the intrinsic variability across the samples in a data set. This allows us to analyze individual network relationships in connection with individual phenotypes and disease status.

The input to LIONESS is an omics matrix (Fig. 1a). LIONESS estimates the individual network \(e^{\left( q \right)}\) of a sample \(q\) through the following equation55 (see also “Materials and methods”):

$$e^{\left( q \right)} = N\left( {e^{\left( \alpha \right)} - e^{{\left( {\alpha - q} \right)}} } \right) + e^{{\left( {\alpha - q} \right)}} ,$$
(1)

where \(N\) is the number of samples in the data set \(\left( \alpha \right)\), and \(e^{\left( \alpha \right)}\) and \(e^{{\left( {\alpha - q} \right)}}\) are two aggregate networks, built using all samples in the data set (\(e^{\left( \alpha \right)}\)) and built excluding sample \(q\) (\(e^{{\left( {\alpha - q} \right)}}\)) respectively. In our application, we used the metabolomic profiles of the VDAART children at age 3 to build the omics matrix (Fig. 1a). Metabolite levels were filtered for missingness, normalized, corrected for batch effects, and adjusted for covariates (see “Materials and methods”). The resulting metabolomic matrix contained the levels of 833 metabolites for 381 samples. We computed the Pearson correlation coefficient for each pair of metabolites to build the two aggregate networks, \(e^{\left( \alpha \right)}\) and \(e^{{\left( {\alpha - q} \right)}}\) (Fig. 1b). Using Eq. (1), we then iteratively applied LIONESS to reconstruct the metabolic network \(e^{\left( q \right)}\) of each VDAART child (Fig. 1c). The edge \(e_{ij}^{\left( q \right)}\) in the individual network \(e^{\left( q \right)}\) is a weighted edge connecting two metabolites \(i\) and \(j\). The edge’s weight represents how much sample \(q\) contributes to the co-expression between the metabolites \(i\) and \(j.\) The more positive (negative) \(e_{ij}^{\left( q \right)}\), the more the pair \(\left( {i,j} \right)\) has coordinated (anti-coordinated) metabolic expression in the sample \(q\). In total, LIONESS generated 381 individual networks, each composed of 833 nodes and 346,528 edges. These networks allowed us to investigate the relationship between the children’s metabolic profiles and their phenotypes.

Figure 1
figure 1

Overall scheme of the analysis. (a) The metabolomics matrix containing the metabolite levels of 381 VDAART children was used as the input to the LIONESS algorithm. (b) Pearson correlation coefficients between each pair of metabolites were calculated to build the aggregate correlation network associated with the metabolomics matrix. Only edges with high correlation coefficients are shown for visualization purposes. (c) LIONESS reconstructed the sample-specific metabolic networks based on Eq. (1). (d) The LIONESS edge weights for each pair of metabolites were regressed against the children’s phenotypic traits. (e) We selected edges that are statistically associated with the interaction term between children’s ASQ-comm scores and either maternal or offspring vitamin D levels. These edges constitute the VDI network.

Tryptophan, linoleic, and fatty acid metabolism are associated with the neurodevelopmental role of vitamin D in early childhood

Starting from the LIONESS networks, we examined the metabolic interactions associated with the role of vitamin D in children’s neurodevelopment. To do so, we ran a linear regression analysis on the children’s metabolic networks against their phenotypic traits (Fig. 1d). For each metabolite pair, the weight of the associated LIONESS edge across the individual networks was regressed against multiple clinical variables (Fig. 1d, see “Materials and methods”). We selected metabolite edges that were significantly associated with (1) the interaction term between children’s ASQ-comm score and their Vitamin D levels, and/or (2) the interaction term between children’s ASQ-comm score and their mother’s Vitamin D levels at 32–38 gestation weeks (p-value < 0.01). We hypothesize that these edges represent potential metabolic reactions affected by vitamin D in relation to communication skills at age 3. Therefore, they might highlight biochemical processes mediating the interaction between Vitamin D, communication impairment, and future ASD risk. We refer to the subnetwork composed of these significant edges as the Vitamin D Interaction (VDI) network (Fig. 1e, Supplementary Table 2).

To determine the metabolic pathways underlying the VDI network, we performed a pre-ranked Metabolic Set Enrichment Analysis (MSEA)57,58 using the Kyoto Encyclopedia of Genes and Genome (KEGG)58 (Fig. 2a). We ranked all the metabolites based on their number of connections, or “degree”, in the VDI network and ran MSEA based on this pre-ranked list (see “Materials and methods”). In this way, the MSEA was built upon the topology of the VDI network: enriched pathways represent molecular processes where the annotated metabolites act predominantly as hubs in the VDI network (nodes with a large number of edges). Hence, they highlight biochemical mechanisms underlying the interaction between vitamin D and child’s communication development. Among the most enriched metabolic pathways (Supplementary Table 3), we identified Tryptophan metabolism (hsa00380, FDR = 0.08), Linoleic Acid Metabolism (hsa00591, FDR = 0.002), and Biosynthesis of Unsaturated Fatty Acids (hsa01040, FDR = 10–6).

Figure 2
figure 2

Enrichment analysis of the VDI network. (a) Normalized Enrichment Score of KEGG pathways enriched in the VDI network based on pre-ranked MSEA. Pathways are ranked based on their p-values. (b) VDI network degree of the MSEA leading metabolites of the top three enriched KEGG metabolic pathways, which include Tryptophan metabolism, Linoleic acid metabolism, and Biosynthesis of unsaturated fatty acids. (c) First neighbors’ network of the leading metabolites for 1) Tryptophan metabolism (top), 2) Linoleic acid metabolism (center), and 3) Steroid hormone biosynthesis (bottom). Node sizes are proportional to the node’s degree. Only edges included in the VDI network are shown. For visualization purposes, we highlighted nodes that were mentioned in the main text.

Within the Tryptophan pathway, MSEA leading edge analysis59 indicated serotonin, quinolinate, xanthurenate, tryptophan, and kynurenine as leading metabolites (Fig. 2b and c). Altered regulation of these neuroactive metabolites has been previously linked to disruption of various neurodevelopmental and neuropsychological conditions, including ASD, Attention-deficit/hyperactivity disorder (ADHD), depression, and schizophrenia60,61,62,63,64,65,66,67,68,69.

The VDI network was also enriched for pathways related to fatty acid production, including the Linoleic Acid Metabolism and the Biosynthesis of Unsaturated Fatty Acids pathways. MSEA leading metabolites in the Linoleic Acid Metabolism comprised linoleate, dihomolinolenate, and arachidonate (Fig. 2b). These metabolites represented large hubs in the VDI network, with over 70 metabolite edges significantly associated with the interaction between vitamin D and the ASQ-comm score (Fig. 2c). Notably, multiple studies have reported deficiency of these metabolites in individuals with ASD70,71,72 and other learning disabilities, such as ADHD73,74 and dyslexia75,76. Leading metabolites annotated to the Biosynthesis of Unsaturated Fatty Acids included docosahexaenoate, stearate, and palmitate (Fig. 2b), with docosahexaenoate a major hub in our VDI network (Fig. 2b). Several of these metabolites show reduced levels in individuals with ASD72. They also serve as precursors in both pro-inflammatory and pro-resolvin pathways, supporting the anti-inflammatory effects of vitamin D in relation to neurodevelopment77. Additionally, vitamin D and docosahexaenoic acid have been proposed to modulate serotonin synthesis78, regulating brain activities such as mood, behavior, memory, and sleep79. These results support growing evidence that delayed communication skills and ASD risk are linked to disordered lipid and fatty acid metabolism72,80,81, which are both known to affect brain growth and cognitive development70,71.

In conjunction with aberrant fatty acid metabolism, a deficit in cholesterol biosynthesis may be also involved in the pathogenesis of ASD81. Cholesterol was a leading metabolite in the enrichment of the Steroid Hormone Biosynthesis pathway (Fig. 2c). It follows that vitamin D deficiency might affect neurodevelopment by disrupting cholesterol homeostasis.

Metabolic interactions of Vitamin D with children’s neurodevelopment correlate with different disease outcomes

Our network analysis suggested that Vitamin D may influence communication skills development through multiple metabolic pathways. We interrogated whether differences within these pathways underlie epidemiological differences among the VDAART children. To answer this question, we performed hierarchical clustering of the VDAART children in the context of the VDI network. More specifically, we selected the edges in the VDI network and then clustered children based on their individual LIONESS weights associated with these edges. We used Spearman correlation as similarity distance and the complete-linkage metric (see “Materials and methods”). Our analysis identified 5 main clusters. Notably, each cluster exhibited different clinical, physiological, and metabolic characteristics (Supplementary Table 4).

Children’s vitamin D levels were overall higher in cluster 3 and 5, with cluster 5 having significantly higher vitamin D levels at age 3 compared to all other clusters (Student’s T-test, P = 0.039; see “Materials and methods” for details). In contrast, children in cluster 4 exhibited significantly lower vitamin D (Student’s T-test, P = 0.062; Fig. 3a). Interestingly, maternal vitamin D levels collected during and after pregnancy exhibited a behavior similar to the offspring (Fig. 3a). Vitamin D levels of children in clusters 3 and 5 at age 3 were significantly correlated (p-value < 0.1) with maternal vitamin D levels in both cord blood and at 1 year post-pregnancy (Supplementary Fig. 1). Mothers of the children in cluster 3 had significantly higher vitamin D both during late pregnancy (32–38 gestation weeks, Student’s T-test, P = 0.09) and at one year after delivery (Student’s T-test, P = 0.1), while children in cluster 5 had significantly higher cord blood vitamin D (Student’s T-test, P = 0.085). Similarly, vitamin D levels of children in cluster 4 at age 3 correlated significantly with cord blood vitamin D levels (Supplementary Fig. 1). Notably, this interdependence was not evident when looking purely at the correlation between the mothers’ and children’s vitamin D measurements (Supplementary Fig. 2). It follows that imbalances in maternal vitamin D may relate to imbalances in vitamin D during offspring's early childhood. This effect could be due to biological factors or reflect breastfeeding habits and shared environmental exposures82,83.

Figure 3
figure 3

Children’s phenotypic traits in the clusters extracted from the VDI network. (a) Vitamin D levels (ng/mol) of children in each cluster at year 3, their mothers at 32 to 38 gestation weeks and one year after delivery, and in the cord blood. Colored boxes represent clusters with lower or higher Vitamin D levels compared to the individuals in all other clusters (p-value < 0.15, see “Materials and methods”). (b) Stacked histogram of the communication scores for children in each cluster. Cluster 4 (referred as the low-communication cluster) exhibited the lowest distribution of communication scores. (c) Stacked histogram of asthma incidence in children and mothers within each cluster. Cluster 2 (referred as the asthma cluster) exhibited the highest proportion of children and mothers with asthma.

We next analyzed the clusters in the context of the communication scores obtained from the ASQ questionnaire (see “Materials and methods”). These scores reflect a child's proficiency in different aspects of communication; the higher the score, the higher the child’s ability to perform communication tasks. The distribution of communication scores across clusters revealed a pattern opposite to that of the vitamin D levels (Fig. 3b). Children in clusters 3 and 5 exhibited higher communication scores, with cluster 3 reaching statistical significance (Mann–Whitney U test, P = 0.04). Children in cluster 4 showed significantly lower scores (Mann–Whitney U test, P = 0.015). As such, cluster 4 represents children with delayed communication development compared to the rest of the population. We refer to this cluster as the low-communication cluster. The lower levels of vitamin D in this cluster support a potential neuroprotective role of vitamin D in early childhood communication development.

The VDAART cohort was initially established to examine the impact of vitamin D on asthma and allergies50, and therefore includes information regarding children’s current and family history of asthma and respiratory symptoms. Considering the known association between asthma and ASD84,85, we investigated potential connections between our VDI network and the incidence of asthma. Figure 3c displays the VDAART children’s symptomatology together with their parental history. Comparison between clusters showed a significantly higher incidence of asthma and maternal asthma in cluster 2 (Chi-Squared Test, P = 0.03, 0.05 for child’s and maternal asthma respectively). Together with the low-communication cluster, this cluster also exhibited the lowest maternal vitamin D levels during late pregnancy (Student’s T-test, P = 0.148). At age three, 70% of children in cluster 2 had Vitamin D concentrations below the deficiency threshold of 20 ng/ml86. As such, cluster 2 (referred to as the asthma cluster) reflects a different phenotypic outcome associated with vitamin D deficiency. This result can be interpreted within the context of the immunomodulatory and anti-inflammatory effects of vitamin D. Vitamin D might alter the metabolic regulation of the immune response, leading to the respiratory abnormalities often encountered in individuals with ASD. In support of this, children in cluster 3 had a lower incidence of asthma (Chi-Squared Test, P = 0.14).

We further examined if the different disease phenotypes of the low-communication and asthma clusters coincided with different metabolic endotypes. To do so, we compared the clusters’ individual metabolic networks and identified their marker metabolic edges (see “Materials and methods”). For each cluster, a marker edge is defined as an edge whose LIONESS weights significantly differ in that cluster compared to the rest of the population. In agreement with our enrichment analysis, marker edges of the low-communication cluster involved interactions with serotonin and its breakdown product 5-hydroxyindoleacetate (5HIAA) (Fig. 4a). Top significant edges connected 5HIAA to ester derivatives of L-carnitine, including 3-Hydroxybutyrylcarnitine, acetyl-L-carnitine, and palmitoleoylcarnitine. This hints at a role of vitamin D in the crosstalk between serotonin synthesis and the carnitine shuttle system87. Of note, there is increasing interest in the neuroprotective potential of L-carnitine across the lifecourse88. Marker edges of the low-communication cluster also involved interactions between 5HIAA and members of the linoleate metabolism pathway, suggesting perturbations in fatty acid production. Additionally, this cluster exhibited altered connection with the amino acid proline (Fig. 4a). In support of this result, abnormal proline levels have been associated with the catechol-O-methyltransferase genotype, a gene variant affecting the brain dopaminergic system in individuals with ASD89. Finally, marker edges involved interactions with palmitic and stearic acids, which have both been suggested as ASD biomarkers71,72.

Figure 4
figure 4

Marker metabolic edges identified for the low-communication and asthma clusters. Top marker metabolic edges identified for the (a) low-communication and (b) asthma clusters. Node sizes are proportional to the node’s degree in the marker edges’ subnetwork. Edges are colored based on the average LIONESS weight across the networks in the cluster. For visualization purposes, we highlighted nodes and edges that were mentioned in the main text.

Analysis of marker edges of the asthma cluster revealed a different pattern. Several edges involved the amino-acids isoleucine and valine (Fig. 4b). This result is noteworthy; previous reports have listed these metabolites as differentially expressed in the plasma metabolome of asthma and asthma-COPD overlap patients90,91. The metabolic networks of the asthma cluster also showed perturbations in the palmitic and stearic acid metabolism pathways.

These findings indicate that vitamin D affects children’s neurodevelopment through multiple mechanisms of action, leading to the development of distinct metabolic endotypes and phenotypes.

The VDI network highlights a role of the kynurenine and serotonin sub-pathways in the interaction between vitamin D and neurodevelopment

Our MSEA analysis revealed tryptophan metabolism as one of the most enriched pathways within the VDI network (Fig. 2a and Supplementary Table 3). Our previous studies54 have also identified strong associations between tryptophan metabolites and the ASQ-comm score in both the plasma and stool metabolome of VDAART children. Given this evidence, we investigated the biochemical reactions underlying these associations. We selected metabolic edges that (1) were statistically associated with the interaction between children’s and maternal vitamin D and ASQ-comm score (p-value < 0.05) and (2) connected pairs of metabolites that are both annotated to the KEGG tryptophan metabolism pathway. We identified only one edge connecting 5HIAA and L-kynurenine. This edge was also significantly associated with children’s vitamin D levels in our regression model (p-value = 6.4 × 10–3). By definition, the metabolites of this edge exhibited co-expression changes in association with the interaction between vitamin D and the ASQ-comm score. To identify the potential molecular mechanisms mediating these changes, we mapped 5HIAA and L-kynurenine onto the KEGG tryptophan reaction network (see “Materials and methods”). The nodes of this network represent all chemical compounds listed in the KEGG tryptophan metabolism pathway. The edges represent chemical reactions between two metabolites, a substrate and a product.

To gain insights on the cascade of biochemical reactions affecting the relative proportion of 5HIAA and L-kynurenine, we computed all the shortest paths connecting these two metabolites within the tryptophan reaction network. We identified one path (Fig. 5), involving two main subprocesses: (1) the biosynthesis of serotonin from L-tryptophan and (2) the degradation of L-tryptophan via the kynurenine pathway. Biochemical reactions mediating serotonin biosynthesis included hydroxylation of L-tryptophan into L-5-hydroxytryptophan (5-HTP) (KEGG reaction R01814), subsequent decarboxylation of 5-HTP to produce serotonin (KEGG R02701), serotonin degradation into 5-hydroxyindoleacetaldehyde (5-HIAL) (KEGG R02908), and the final oxidation of 5-HIAL to 5HIAA (KEGG R04903) (Fig. 5). Our shortest path also contained metabolic reactions involved in the kynurenine pathway, including the oxidation of L-tryptophan to N-formylkynurenine (NFK) (KEGG R00678) and subsequent hydrolysis of NFK to L-kynurenine (KEGG R01959). For each of these biochemical reactions, we analyzed their substrate-to-product expression ratio in association with the children’s ASQ-comm score (see “Materials and methods”). The ratio between serotonin and 5HIAA levels increased significantly with increasing ASQ-comm score (P-value = 0.038), suggesting dysregulated activation of serotonin biosynthesis in children with impaired communication skills. This result was consistent with previous observations of blood hyper-serotoninemia in children with ASD44,92.

Figure 5
figure 5

Shortest path connecting 5HIAA and L-kynurenine in the KEGG reaction network. The edge connecting 5HIAA and L-kynurenine was statistically associated with the interaction term between vitamin D and the ASQ communication score. One shortest path connected these two metabolites in the KEGG reaction network of tryptophan metabolism. This path involved two main subprocesses: (1) the biosynthesis of serotonin from l-tryptophan (orange nodes) and (2) the degradation of l-tryptophan via the kynurenine pathway (blue nodes). The nodes in this network represent chemical compounds listed in the KEGG tryptophan metabolism. The edges represent the chemical reactions between a substrate and a product. Only the largest connected component is shown.

Discussion

This work investigates the impact of vitamin D on the human metabolic network in association with children’s communication development. Language and communication deficits represent a characterizing feature of various neurodevelopmental disorders, especially ASD7,8,9,27,42. By using early childhood communication skills as a proxy for delayed neurodevelopment, our analysis sheds light on the potential neuroprotective effect of vitamin D against ASD and other cognitive disabilities.

We integrated the metabolomic profiles, clinical traits, and communication screening data of hundreds of children through a unique network framework. This is the first time our LIONESS method has been applied in the context of paired metabolomics and neurodevelopmental data, illustrating the novelty of this work. Our results suggest that the neurodevelopmental action of vitamin D operates through several metabolic pathways, altering interactions within the tryptophan metabolism, lipid metabolism, and unsaturated fatty acid metabolism pathways. Closer investigation of the tryptophan pathway highlights a key role of the kynurenine and serotonin sub-pathways.

Notably, our network clustering analysis shows statistically significant associations between the vitamin D levels of children and their mothers during and after pregnancy. This result is consistent with previous literature supporting a causative role of maternal vitamin D deficiency in offspring vitamin D imbalances93,94,95. Indeed, despite recommendations for vitamin D supplementation during pregnancy and the widespread use of prenatal vitamins, current estimates indicate that 20–85% of pregnant women are still deficient in vitamin D82,83,96,97. This might be due to individual habits, inadequate supplement intake, or a molecular breakdown in vitamin D absorption82,83. As such, vitamin D deficiency remains a worldwide epidemic. In the face of this public health issue, our findings provide a potential mechanistic interpretation of known statistical associations between low vitamin D levels, aberrant communication development, and future risk of ASD.

A substantial body of literature has linked ASD to disrupted levels of neuroactive metabolites44,92,98,99,100,101: blood hyper-serotoninemia is one of the most consistent biomarkers of ASD44,92; overproduction of quinolinic, xanthurenic, and kynurenic acid has been observed in individuals with ASD and has been linked to enhanced kynurenine and glutamatergic activity100,101; finally, vitamin D has been proposed to differentially regulate the transcription of the serotonin-synthesizing genes TPH1 and TPH244. In support of these findings, our network analysis suggests that vitamin D might play a role by altering metabolic interactions between serotonin and other members of the tryptophan metabolism pathway, including xanthurenate, L-tryptophan, N-formylanthranilic acid, and quinolinate. Additionally, ASD has been associated with fatty acid deficiencies72,80,81,102. We show that differences in vitamin D levels correspond to perturbations in fatty acid metabolism, including the metabolic interactions between docosahexaenoate, stearate, and palmitate. Interestingly, these metabolites are major constituents of membrane phospholipids. This supports recent theories about dysregulation of phospholipid metabolism in patients affected by neurodevelopmental disorders, including ASD and attention-deficit/hyperactivity disorder103,104.

Our findings align with several postulated mechanisms linking vitamin D, impaired communication skills, and future ASD risk. Indeed, vitamin D might be critically involved in serotonin metabolism44,78. Low vitamin D levels could result in reduced brain serotonin levels through either insufficient intake or impaired vitamin D absorption82,83. This effect might disrupt the serotonin-dependent regulation of brain development and function, impacting neural plasticity, sensory processing, and emotional control23,25,44,98,105. Further, vitamin D could influence cognitive development via its anti-inflammatory action. Vitamin D may activate tryptophan hydroxylase 2, which preferentially converts tryptophan to serotonin over kynurenine78, or reduce neurotoxin levels in the brain23,25,106. Finally, low vitamin D levels combined with excess androgens may interact with sex hormones to influence neurovelopment23,107,108,109,110. These theories are not mutually exclusive, and it is possible that all these mechanisms are at play.

ASD manifestations are often accompanied by other comorbid pathologies84,85. These include, but are not limited to, psychiatric disorders such as anxiety, depression, obsessive compulsive disorder, and sleep dysfunction111,112,113,114,115,116, gastrointestinal problems such as chronic constipation and diarrhea, food allergies, and digestive issues85,117,118,119,120, and allergic and autoimmune diseases such as asthma and atopic dermatitis121,122. Given the clinical respiratory data available in the VDAART cohort, we asked if the neurodevelopmental action of vitamin D on children’s metabolome influences asthma manifestation and other respiratory symptoms. Our findings suggest that a breakdown in vitamin D metabolism can impact different components of children’s metabolic networks. These components are associated with different disease outcomes, including asthma. Interestingly, several pathways highlighted by our network analysis, such as serotonin and fatty acid metabolism, have been previously implicated in asthma and asthma phenotypes. Serotonin receptors are functionally expressed in asthmatic lung epithelial and smooth muscle cells, and eosinophil migration has been shown to be dependent on these receptors in active asthma123. Serotonin’s release from mast cells has been shown to contribute to asthmatic airway hyperresponsiveness124. Dysregulation of fatty acid metabolism, which is critical for maintaining the balance between the pro-inflammatory and pro-resolvin mediators of lung inflammation, is also well characterized in asthma. These common mechanisms may explain why asthma and ASD co-occur frequently85.

Our work has some limitations. We recognize that the ASQ communication score is only one measure of neurodevelopment and does not represent a clinical diagnosis of ASD or any other condition. Indeed, there are intrinsic biases when estimating children’s developmental status from indirect caregiver reports53,125. For instance, ASQ assessments can be influenced by parental education level and socio-economic background126,127, social stigmatization127,128, and even a caregiver’s psychological distress129. Despite this limitation, our dataset represents one of the first instances where paired information on the metabolic profile, vitamin D levels, and neurobehavioral status of hundreds of children has been captured. To our knowledge, no other resources combine this breadth of information with a medical diagnosis of ASD. As such, VDAART offers a unique opportunity to investigate the link between vitamin D, metabolism, and ASD risk, unattainable with any other dataset. However, it is important to emphasize that this study is only the first step towards determining the role of vitamin D on early life neurodevelopment. Further clinical and translational assessments will be crucial for establishing a definite causative link between vitamin D and ASD.

Interventional studies using in vitro cultures, brain organoids, and mouse models represent an exciting opportunity to demonstrate this causal relationship. Interestingly, maternal vitamin D deficiency has shown to induce brain malformations in newborn mice130. Additional studies on calcitriol treatment of neuronal cultures and embryonic vitamin D-deficient brains have provided compelling evidence on vitamin D’s functional role in cell proliferation, apoptosis, and differentiation of developing brain cells130,131,132,133,134. Animal behavioral tests have also shown learning deficits in vitamin D-deficient mice135. These findings highlight the potential of experimental models to elucidate the causative mechanisms linking vitamin D with brain function. Additional complementary strategies include integration of multiple omics sources. Given the interdependence between tryptophan metabolism, the gut brain-axis, and the neurodevelopmental action of vitamin D136,137,138, combining microbiome and metabolic data represents a prospective avenue for future research. Integrating metabolomics data with genome-wide association studies may also be valuable to construct a putative causal pathway. Indeed, ASD has been reported to be highly heritable, with up to 40% of cases attributed to genetic causes139,140,141. At least half of the genetic risk is predicted by common genetic variations, while another 15–20% is due to spontaneous mutations or predictable inheritance patterns139,140,141. These genetic variants may interact with vitamin D's role in neurodevelopment, as recent research supports a new vitamin D-related genetic theory of autism142,143,144,145. Finally, questions remain regarding the reproducibility of our biological findings. Further studies are needed to address these questions. Our work provides promising early insights, laying the groundwork for the implementation of clinical trials on vitamin D supplementation as a treatment for ASD and other neurodevelopmental disorders.

Materials and methods

Calculation of the ASQ-comm score

Primary caregivers of VDAART children submitted the Ages and Stages Questionnaire52 (3rd Edition, https://agesandstages.com/; Paul H. Brookes Publishing Co., Inc.) during the children’s annual three year visit. In the ASQ, caregivers were asked a series of questions about the ability of their children to perform a particular task. Answers to each question were scored with 10 points if the answer was “Yes”, 5 points if the answer was “Sometimes”, and 0 points if the answer was “Not yet”. Questions pertained to five main developmental domains: gross motor skills, fine motor skills, problem solving ability, personal/social skills, and communication. Six questions per ___domain were assigned, resulting in a ___domain specific score. In this work, we focused on the scores from the communication skills ___domain due to the demonstrated association between a poor performance in this ___domain at age three and subsequent ASD risk53,54. We referred to this score as the “communication score”.

The communication score was compared to the expected mean score of a reference distribution within age-groups. Scores were then categorized as follows: “On Schedule for developing normally”; “Requires Monitoring” (1–2 standard deviations below the mean); and “Needs further evaluation” (> 2 standard deviations below the mean). We referred to this categorization as the ASQ-comm score.

Metabolomic profiling and preprocessing

We analyzed previously generated and preprocessed metabolomic data54. Briefly, blood samples were obtained from participating children in VDAART at age three years. Metabolites were analyzed as measured LC–MS peak areas and identified by their mass-to-charge ratio, retention time, and through a comparison to library entries of purified known standards. The blood samples were processed in two batches sent six months apart (batch one n = 245; batch two n = 688) then merged and scaled together based on equivalence of the control groups. If a metabolite was missing in 50% or more of the samples from either dataset, it was excluded from further analysis. All remaining missing values were imputed with half the minimum peak intensity for that metabolite across the whole population. Data were pareto scaled to account for the differences in the scales of measurements across the metabolome. Metabolites were log-transformed to create approximately Gaussian distributions and to stabilize variance. See our previous paper54 for extensive description of data extraction and processing.

LIONESS

Correlation networks have been successful in aggregating and summarizing multidimensional biological data146,147, but they do not account for the intrinsic variability across the samples of a population. To fill this gap, our group developed LIONESS55, a method to reverse-engineer sample-specific networks starting from the aggregate network of an entire population.

The founding principle of LIONESS is that an aggregate network, summarizing information from a set \(\left( \alpha \right)\) of \(N\) samples, can be conceptualized as the average of individual component networks, representing the contributions from each member within the input sample set. This principle can be formalized mathematically through the following equation:

$$e_{ij}^{\left( \alpha \right)} = \mathop \sum \limits_{q = 1}^{N} w_{q}^{\left( \alpha \right)} e_{ij}^{\left( q \right)} ,$$

where \(e_{ij}^{\left( \alpha \right)}\) and \(e_{ij}^{\left( q \right)}\) represent a weighted edge between two nodes \(i\) and \(j\) in the set-wide aggregate network and in the individual network of sample \(q\), respectively. \(w_{q}^{\left( \alpha \right)}\) represents the contributing weight of sample \(q\) to the set \(\left( \alpha \right)\). If all samples have equal weight, \(w_{q}^{\left( \alpha \right)} = 1/N\). Starting from the above equation, the individual network \(e^{\left( q \right)}\) of sample \(q\) can be reconstructed as (see original paper55 for derivations):

$$e^{\left( q \right)} = N\left( {e^{\left( \alpha \right)} - e^{{\left( {\alpha - q} \right)}} } \right) + e^{{\left( {\alpha - q} \right)}} ,$$

where \(e^{\left( \alpha \right)}\) is the aggregate network estimated using all \(N\) samples in the set \(\left( \alpha \right)\) and \(e^{{\left( {\alpha - q} \right)}}\) is the aggregate network estimated using all samples except for sample q \(.\) In these aggregate networks, nodes may represent individual omics entities (e.g. metabolites) and edges may represent relationships summarizing biological information from the population-based omics matrix (e.g. the correlation between two metabolites). Intuitively, this equation expresses an individual network as the sum of two contributions: one that accounts for biological patterns that are specific to the individual \(q\), and a second that accounts for biological patterns that are shared across all the samples in the population.

In our application, we employed Pearson correlation to build the two aggregate metabolic networks \(e^{\left( \alpha \right)}\) and \(e^{{\left( {\alpha - q} \right)}}\) for the VDAART study sample \(\left( \alpha \right)\). In these correlation networks, each metabolite pair (\(i,j)\) is connected through a weighted edge. The edge’s weight represents the co-expression coefficient between \(i\) and \(j\) calculated across the VDAART study sample before (\(e^{\left( \alpha \right)}\)) and after (\(e^{{\left( {\alpha - q} \right)}}\)) removing individual \(q\). Using Eq. (1), we reconstructed the individual weighted network of each VDAART child.

Linear regression model to generate the VDI network

LIONESS generated 381 fully connected, individual, weighted networks. For each edge in those networks, we regressed the distribution of edge weights across the VDAART children against their phenotypic traits and family history. We used the following multivariable regression model:

$$weight_{LION} \sim ASQ + vitD + vitD_{mat} + vitD:ASQ + vitD_{mat} :ASQ$$

The terms \(ASQ\) and \(vitD\) represent children’s ASQ-comm score and vitamin D levels at age three years, respectively; \(vitD_{mat}\) represents maternal Vitamin D levels during late pregnancy (within 32–38 weeks); and \(vitD:ASQ\) and \(vitD_{mat} :ASQ\) represent interaction terms between the ASQ-comm score and children’s and maternal vitamin D levels, respectively. This model allows us to identify the metabolic edges associated with the interaction between maternal and early childhood vitamin D levels and child’s communication development. The model was adjusted for the children’s race and sex, maternal education level and marital status, treatment group, and clinical site of plasma collection (see Supplementary Table 1). Linear regression was performed using the function lmfit in the R package LIMMA148. P-value statistics were obtained from the LIMMA function eBayes. This analysis provided a list of linear regression coefficients and p-values for each edge (metabolite pairs).

Pre-ranked MSEA Enrichment analysis

MSEA enrichment analysis was performed using the python package GSEApy149. First, we ranked all metabolites in the VDI network based on their degree. If two nodes had the same degree, the one having the highest “Interaction Coefficient” (IC) was ranked higher. We defined the IC of a node as the mean absolute interaction coefficient \(mean\left( {vitD:ASQ,vitD_{mat} :ASQ} \right)\) averaged across all the VDI edges connected to the node, as calculated from the linear regression analysis. Nodes with high IC exhibit connections highly associated with the interaction terms between maternal and children’s vitamin D and the ASQ-comm score. As such, they likely represent key regulators of these interactions. Nodes that were not included in the VDI network were assigned degree zero. To break ties between zero-degree nodes, we ranked them based on their IC. We downloaded all the human metabolic pathways within the KEGG58 database using the functions process_kegg and process_gmt of the python package ssPA57. Based on these KEGG pathways, we performed pre-ranked Metabolic Set Enrichment Analysis (MSEA) on our pre-ranked list of VDI metabolites using the function prerank of GSEApy. P-values were adjusted using the Benjamini-Hochberg (BH)-FDR correction, and an FDR < 0.15 was used to identify significantly enriched pathways.

Hierarchical clustering analysis of the individual metabolic networks

To identify similarities in the metabolic networks of the VDAART children, we selected the subgraph of VDI edges in each LIONESS network. We then z-score normalized the weights of each edge across the individual VDI networks and performed hierarchical clustering on the z-scores using Spearman correlation as similarity distance and the complete-linkage metrics. Based on the hierarchical structure of the cluster dendrogram, the optimal number of clusters was obtained by cutting the tree such that all the descendent links in each cluster are shorter than a cut-off distance of 1.15. We obtained five clusters (see Supplementary Fig. 3). We compared the phenotypic and molecular characteristics of these clusters.

In each cluster, we analyzed (1) children’s communication scores and vitamin D levels at age three, (2) maternal vitamin D levels during pregnancy (within 32–38 weeks), in the cord blood, and 1 year after delivery, and (3) children’s history of asthma/wheeze, and (4) maternal asthma.

For each of these variables, we compared its distribution in each cluster versus all other clusters. P-value statistics were obtained using the Mann–Whitney u test for the communication score, T-test for the other numerical variables, and chi-squared for binary variables. A cluster was considered significantly different from the others for a given variable if its p-value was less than 0.1. For reference, in the main text we reported p-values trends lower than 0.15.

Additionally, we identified marker metabolic edges for each cluster. For each edge connecting a pair of metabolites, we used a T-test to compare the distribution of that edge’s LIONESS weight in the cluster versus the rest of the population. An edge was considered a marker edge in the cluster if its p-value was less than 0.1.

KEGG tryptophan reaction network

To build the tryptophan reaction network, we used the Biopython150 package Bio.Kegg. We downloaded the KEGG Markup Language58 (KGML) associated with the tryptophan metabolism pathway (hsa00380). We then selected the chemical network from the KGML graph object. In this network, nodes represent chemical compounds and edges represent chemical reactions between a substrate and a product (as defined in the KEGG documentation58). The resulting reaction network was composed of 41 nodes and 39 edges. We mapped all the metabolites of our VDI networks onto the KEGG reaction network and computed the shortest paths between pairs of metabolites connected by a VDI edge. We identified one shortest path connecting L-kynurenine with 5HIAA. For each metabolite pair \(\left( {A,B} \right)\) in this path, we investigated if its expression ratio was statistically associated with the children’s ASQ-comm scores. Specifically, we used the linear regression model \(ASQ\sim expr_{A} + \frac{{expr_{A} }}{{expr_{B} }}\). The model was adjusted for the children’s race and sex, maternal education level and marital status, treatment group, and clinical site of plasma collection.

Ethics approval and consent to participate

This study is a secondary analysis from VDAART. VDAART was originally conducted as a clinical trial registered on “ClinicalTrials.gov” with the registration identification number NCT00920621 (10/08/2018) (https://clinicaltrials.gov/ct2/show/NCT00920621). All portions of this study were conducted in accordance with the Declaration of Helsinki. Local institutional review boards approved the protocol. Written informed consent was obtained from all individuals, and, if participants were under 18, from a parent and/or legal guardian. The institutional review boards at each participating VDAART Clinical Center and the Data Coordinating Center at Brigham and Women’s Hospital approved protocols of the trial. This project was additionally approved by Brigham and Women’s Hospital IRB protocol 2018P000478.