Introduction

Bipolar disorder (BD) is a chronic, recurrent condition primarily characterized by fluctuations in mood and energy levels1,2. Depression often presents as the initial symptom, accounting for approximately 75% of symptomatic episodes3. The severity and nature of bipolar depression often resemble those of unipolar depression (UD)3,4,5. Notably, around 69% of individuals with BD are initially misdiagnosed with UD. On average, patients consult four different doctors before receiving an accurate diagnosis of BD, which poses significant challenges for both patients and their families6. In the first five years after a UD diagnosis, the conversion rate to BD is approximately 2.5% annually, dropping to 0.5% per year thereafter7. Consequently, this study focuses on UD patients with a history of recurrent depressive episodes lasting more than five years. The annual suicide rate among BD patients is approximately 0.9%, compared to just 0.014% in the general population, and an estimated 15–20% of individuals with BD die by suicide. This underscores the need for early diagnosis and treatment to improve patient outcomes3.

Distinct pathophysiological mechanisms differentiate BD from UD, including variations in brain structure, inflammation levels, and brain function8. Current diagnoses for these disorders rely on the DSM-5 and ICD-10 classifications9,10. However, identifying objective biomarkers that reflect the inherent characteristics of these disorders is crucial for accurate differentiation between UD and BD11.

Previous research has revealed that BD patients exhibit mitochondrial dysfunction and metabolic decline in the prefrontal cortex (PFC)12. In contrast, patients with major depressive disorder (MDD) demonstrate dysregulation in prefrontal circuits, with structural and functional changes in the PFC playing a key role in mood disorders13. Redlich et al. found that UD patients exhibited reduced activation in both the nucleus accumbens (NAC) and the PFC compared to BD patients14. While functional magnetic resonance imaging (fMRI) is the predominant tool used in cognitive studies of mood disorders, it is costly, requires complex operational maintenance, and is highly sensitive to head movements, potentially compromising data quality. These limitations hinder the broader adoption of fMRI. In light of these challenges, functional near-infrared spectroscopy (fNIRS) offers a promising alternative for examining cognitive and functional brain changes in BD and UD patients15.

fNIRS is an optical neuroimaging technique that uses near-infrared light to track changes in the concentrations of oxygenated hemoglobin (Oxy-Hb) and deoxygenated hemoglobin (Deoxy-Hb) in order to investigate brain function16. Compared to electroencephalography (EEG) or fMRI, fNIRS offers unique advantages. Its relatively portable equipment allows for use in various environments, and it is safe, non-invasive, unrestrictive, quiet, motion-tolerant, and cost-effective17. fNIRS has been shown to be effective in detecting brain function and serves as a viable alternative for patients unable to undergo fMRI. During brain function assessments, individuals with BD and UD may move because of emotional instability or restlessness, so the high tolerance of fNIRS to motion artifacts is an advantage. Additionally, given the psychologically sensitive and vulnerable nature of BD and UD patients, fNIRS only requires probe placement on the scalp surface, avoiding discomfort or potential risks. fNIRS is particularly effective in measuring oxygenation changes during cognitive tasks in the prefrontal cortex18, a region critical for cognitive control19. The prefrontal cortex is closely associated with stress perception, emotional regulation, and executive function20. Executive function refers to the ability to plan, make decisions, and self-regulate in the face of complex tasks21. Cognitive impairments, especially those affecting executive function, are commonly observed in BD and MDD22,23.

Among emotional and cognitive tasks, the verbal fluency task (VFT) is widely considered one of the most effective measures for assessing the control, planning, activation, and monitoring processes involved in emotional regulation24. During the VFT, participants are asked to generate as many unique words as possible within given categories (phonemic or semantic) under time constraints25. This task requires the initiation of verbal behaviors, strategic retrieval of verbal information, and self-monitoring to avoid intrusions and repetitions26. The VFT has been extensively used in fNIRS research to explore functional cognitive impairments.

Despite the growing body of research on BD and UD, few studies have examined brain activation patterns in these patients during fNIRS-VFT tasks. One study observed that BD patients exhibited decreased activation in the bilateral ventrolateral prefrontal cortex and anterior temporal cortex compared to healthy controls27. Additionally, Feng et al. found that BD patients showed lower activation in the left inferior frontal gyrus during the VFT, while UD patients demonstrated reduced activation in the left dorsolateral prefrontal cortex, bilateral ventrolateral prefrontal cortex, and bilateral orbitofrontal cortex15. These studies relied heavily on manual feature extraction, whereas recent advancements in deep learning offer more efficient and accurate approaches. To improve the differentiation between BD and UD, this study applies a supervised learning approach using a one-dimensional convolutional neural network (1D-CNN), which captures temporal fluctuations in emotional states28. By extracting key features associated with emotional disorders, our model aims to enhance the accuracy of distinguishing BD from UD29, thereby providing a more robust foundation for clinical diagnosis.

We therefore hypothesized that abnormalities in prefrontal functioning may lead to enhanced negative emotions in BD and UD, and that abnormal activation in patients with BD may be specific to emotional state, with activation showing a biphasic dynamic imbalance. This study utilizes fNIRS to investigate prefrontal cortex activation in UD and BD patients, as well as healthy controls, during a verbal fluency task. By integrating deep learning techniques with 1D-CNN analysis, we aim to identify specific brain regions or activation patterns that can differentiate between UD and BD, providing a more objective foundation for clinical diagnosis.

Materials and methods

Participants

This study included 172 patients treated at Huzhou Third People’s Hospital between November 2022 and August 2024. The inclusion criteria were as follows: (1) meeting diagnostic criteria for bipolar disorder with depressive episodes or recurrent depressive disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), confirmed by two psychiatrists; (2) age range of 18 to 60; (3) a Hamilton Depression Rating Scale (HAMD-17) score above 17; (4) a Young Mania Rating Scale (YMRS) score below 7; (5) right-handedness; and (6) at least a primary education level. Exclusion criteria included: (1) presence of other psychiatric disorders, such as schizophrenia or dementia; (2) a significant medical history; (3) receipt of modified electroconvulsive therapy (MECT) or repetitive transcranial magnetic stimulation (rTMS) within the past month; (4) history of substance or alcohol abuse; (5) diagnosis of a first depressive episode; and (6) pregnancy or breastfeeding. Additionally, 40 healthy controls (HC) matched for gender, age, and education level were recruited, with HAMD-17 scores below 7.

This study was approved by the ethical review committee of Huzhou Third People’s Hospital, and informed consent was obtained from all participants prior to inclusion.

Verbal fluency task

The verbal fluency task (VFT) is designed to assess vocabulary knowledge, linguistic creativity, and executive function30. The test consists of three phases: a 30-second preparation rest period, a 60-second formal task period, and a 70-second recovery rest period. During the preparation and recovery phases, participants are instructed to verbally repeat the numbers one through five. In the formal task phase, participants are prompted to generate as many words as possible using the Chinese characters “hua,” “he,” and “jiang,” with a 20-second time limit per character. Participants remain seated and minimize movements throughout the test to ensure accuracy of the results (Fig. 1).

Fig. 1 Verbal fluency task.

fNIRS measurement

In this study, we used the ETG-ONE functional near-infrared spectroscopy (fNIRS) imaging device from Hitachi, Japan, to measure brain activity. This device emits near-infrared light at wavelengths of 695 nm and 830 nm. We configured 22 channels, formed by 8 emitter probes and 7 receiver probes arranged in a 3 × 5 layout over the frontal lobe region. The distance between each emitter and receiver probe was set to 3 cm, with measurement points located 2–3 cm beneath the scalp, corresponding to the cortical surface31. Data were sampled at 10 Hz. Probe placement followed the EEG 10–20 system, initially positioning the central probe of the bottom row at the brow ridge and extending the remaining channels from the brow ridge to the pinna. Based on Brodmann area distributions, channels were assigned to specific prefrontal regions: the dorsolateral prefrontal cortex (DLPFC) included channels 1, 2, 3, 4, 5, 9, 14, and 18; the frontopolar prefrontal cortex (FPC) included channels 6, 8, 10, 11, 12, and 13; the orbitofrontal cortex (OFC) included channels 15, 17, 20, and 21; and the ventrolateral prefrontal cortex (VLPFC) included channels 19 and 22 (Fig. 2). Cortical activation was recorded during the verbal fluency task, and the concentrations of oxygenated and deoxygenated hemoglobin in the cortex, along with their differential concentrations, were calculated indirectly based on the Beer-Lambert law.

Fig. 2 Distribution of fNIRS channels and corresponding brain regions.

Data analysis

Data analysis was conducted using SPSS 26.0 and MATLAB. Quality control of the fNIRS data was performed using the NIRS-KIT toolbox (MATLAB 2021a)32 to check for physiological noise and for time-domain, frequency-domain, or spatial features of head motion artifacts. Preprocessing was then initiated. The initial preprocessing step involved detrending the raw data. Motion artifacts were then corrected using the temporal derivative distribution repair (TDDR) method, followed by the application of a third-order IIR Butterworth bandpass filter (0.001 to 0.1 Hz) to remove irrelevant low- and high-frequency components. After filtering, optical data signals detected by the photodetectors were converted into oxygenated hemoglobin (Oxy-Hb) and deoxygenated hemoglobin (Deoxy-Hb) concentrations using the modified Beer-Lambert law. Only Oxy-Hb was analyzed, as prior studies suggest that Oxy-Hb reflects changes in regional cerebral blood flow more accurately than Deoxy-Hb33. The Oxy-Hb signal is also a more direct reflection of cognitive activation than the Deoxy-Hb signal34, and Oxy-Hb levels have therefore been used to compute cortical activity when analyzing fNIRS data35.

Cortical activation was indicated by increases in Oxy-Hb, calculated by subtracting the mean Oxy-Hb values during the resting periods before and after the task from the average Oxy-Hb during the task.
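The activation measure described above can be illustrated with a minimal Python sketch. It assumes the 30 s / 60 s / 70 s task timing and the 10 Hz sampling rate reported in this paper, and it further assumes that the rest baseline pools all pre- and post-task samples (the text does not specify how the two rest periods are weighted). The function name and the synthetic signal are hypothetical.

```python
from statistics import mean

SAMPLING_HZ = 10                                # sampling rate reported for the device
PRE_REST_S, TASK_S, POST_REST_S = 30, 60, 70    # VFT phase durations in seconds


def activation(oxy_hb, hz=SAMPLING_HZ):
    """Task-period mean Oxy-Hb minus the mean over both rest periods.

    Assumption: the baseline pools every pre- and post-task sample; the
    text does not say whether the two rest means are weighted equally.
    """
    pre_end = PRE_REST_S * hz
    task_end = pre_end + TASK_S * hz
    expected = task_end + POST_REST_S * hz
    if len(oxy_hb) != expected:
        raise ValueError(f"expected {expected} samples, got {len(oxy_hb)}")
    rest = oxy_hb[:pre_end] + oxy_hb[task_end:]
    task = oxy_hb[pre_end:task_end]
    return mean(task) - mean(rest)


# Synthetic single-channel trace: flat rest at 0.0, task plateau at 0.5.
signal = [0.0] * 300 + [0.5] * 600 + [0.0] * 700
print(activation(signal))
```

Averaging the two rest means separately instead of pooling them would give a slightly different baseline whenever the pre- and post-task periods differ in length, as they do here.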

Count data are reported as frequencies or percentages (%), with chi-square tests used for intergroup comparisons. The Shapiro-Wilk test was used to assess the normality of quantitative data. Normally distributed data are presented as mean ± standard deviation (x̄ ± s) and analyzed using one-way ANOVA for intergroup comparisons. Non-normally distributed data are presented as the median and interquartile range [M (P25, P75)], with the Kruskal-Wallis rank-sum test applied for intergroup comparisons. Bonferroni correction was used for multiple comparisons. Statistical significance was set at P < 0.05, with 95% confidence intervals (CI) reported for all analyses.

Deep learning approaches

During data preprocessing, we performed data cleaning and standardization on the collected dataset. To enhance dataset diversity and robustness, we applied data augmentation that combined Gaussian smoothing (σ = 0.1) with additive Gaussian noise (µ = 0, σ = 0.02); the degree of smoothing and the noise intensity were controlled by adjusting the respective sigma parameters. During data loading, each time series was truncated or padded so that all samples had the same preset sequence length (the specific value is a hyperparameter). The preprocessing pipeline was encapsulated in the fNIRSDataset class, and batch loading was implemented through DataLoader, which supports the reproducibility of the experiments and a standardized implementation. For training, the AdamW optimizer (initial learning rate 0.0002, weight decay 0.001) was used to balance gradient updates and regularization. The validation loss was monitored through the ReduceLROnPlateau scheduler, which reduced the learning rate by 50% when the loss stagnated for 5 epochs, and an early stopping mechanism with a patience of 40 epochs was set to prevent unproductive training. Five measures were used to limit overfitting: (1) online data augmentation with randomly perturbed input signals; (2) dropout with a probability of 0.6 at the fully connected layer; (3) the weight decay built into the optimizer; (4) dynamic partitioning of the validation set by five-fold cross-validation; and (5) early stopping to terminate training once a tendency to overfit appeared.
Validation used a two-level framework: the outer level divided the data into a training set and a test set at a ratio of 4:1, and the inner level split the training set into training and validation subsets by 5-fold cross-validation. Training ran for up to 250 epochs to ensure parameter convergence, and the final model demonstrated stable classification performance on the independent test set. All experiments were conducted with PyTorch.
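As a rough illustration of the length normalization and noise augmentation described above, the following standard-library Python sketch truncates or pads a series and adds Gaussian noise with the reported parameters (µ = 0, σ = 0.02). It is not the authors' fNIRSDataset implementation: the function names, the end-padding with zeros, and the target lengths are assumptions, and the Gaussian smoothing step is omitted for brevity.

```python
import random

NOISE_MU, NOISE_SIGMA = 0.0, 0.02   # noise parameters reported in the text


def fit_length(series, target_len, pad_value=0.0):
    """Truncate or right-pad a 1-D series to exactly target_len samples.

    Assumption: padding is appended at the end with a constant value.
    """
    if len(series) >= target_len:
        return list(series[:target_len])
    return list(series) + [pad_value] * (target_len - len(series))


def add_gaussian_noise(series, rng, mu=NOISE_MU, sigma=NOISE_SIGMA):
    """Return a copy of the series with independent Gaussian noise added."""
    return [x + rng.gauss(mu, sigma) for x in series]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
raw = [0.1, 0.2, 0.3, 0.4, 0.5]
short = fit_length(raw, 8)          # padded with three trailing zeros
long_ = fit_length(raw, 3)          # truncated to the first three samples
noisy = add_gaussian_noise(short, rng)
print(short, long_, len(noisy))
```

Drawing fresh noise each time a sample is loaded, rather than once up front, is what makes the augmentation "online".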

The model uses a three-layer one-dimensional convolutional neural network (1D-CNN) architecture. The first convolutional layer has a kernel size of 9, with 22 input channels and 64 output channels. The second layer employs a kernel size of 5, with 64 input channels and 128 output channels, and the third layer features a kernel size of 3, with 128 input channels and 256 output channels. Following each convolutional layer, we applied the ReLU activation function and batch normalization to enhance training stability. Each convolutional layer is also followed by a max pooling layer with a kernel size of 2 and a stride of 2, which halves the temporal length of the feature maps.

After the 1D-CNN processes the input, we concatenated the extracted features with integral values and passed them into a fully connected layer. The first fully connected layer has an input dimension of 39,958 and an output dimension of 256, while the second layer has an input dimension of 256 and an output of 128. Each fully connected layer includes the ReLU activation function, batch normalization, and dropout to stabilize training and prevent overfitting. Finally, a Softmax function converts the output into a probability distribution corresponding to the three classes: healthy, unipolar depression, and bipolar depression.
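The stated input dimension of 39,958 can be checked against the architecture with a short arithmetic sketch. It rests on assumptions the text does not confirm: that one integral value per fNIRS channel (22 in total) is concatenated to the flattened CNN output, that the convolutions use a stride of 1 with "same" padding, and a hypothetical input length of 1250 samples. Under these assumptions the numbers are mutually consistent, but other combinations of padding and sequence length could produce the same total.

```python
def conv_pool_length(length, kernel, padding, pool=2):
    """Length after a stride-1 1-D convolution followed by max pooling (stride = pool)."""
    conv_out = length + 2 * padding - kernel + 1
    return conv_out // pool


def flattened_features(seq_len, kernels=(9, 5, 3), out_channels=256, same_padding=True):
    """Return (flattened feature count, final temporal length) for the three-layer stack."""
    length = seq_len
    for k in kernels:
        pad = k // 2 if same_padding else 0
        length = conv_pool_length(length, k, pad)
    return out_channels * length, length


FC_INPUT = 39_958          # first fully connected layer input size (from the text)
N_CHANNELS = 22            # number of fNIRS channels (from the text)

# Assumption: one integral value per channel is appended to the CNN features.
cnn_features = FC_INPUT - N_CHANNELS
print(cnn_features, cnn_features % 256, cnn_features // 256)

# Assumption: "same" padding and a hypothetical input length of 1250 samples.
print(flattened_features(1250))
```

The remainder of zero after removing 22 values shows that 39,958 decomposes cleanly into 256 output channels times 156 time steps plus 22 extra features, which is one plausible reading of "integral values".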

The loss function of the model is cross-entropy, optimized using the AdamW algorithm with a learning rate of 0.0002 and a weight decay of 0.001. Training used a batch size of 32 and a maximum of 250 epochs, with the early stopping patience set to 40 epochs. We used the KFold cross-validation method to divide the dataset into five folds. For each fold, the training set was randomly divided into a training subset and a validation subset. Models were trained on the training subset, with losses computed via forward propagation and parameters updated via backpropagation. Key performance metrics (e.g., accuracy on the validation subset) were monitored; the learning rate was adjusted by the ReduceLROnPlateau scheduler, and early stopping was triggered when no further improvement was observed.
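The interaction between learning-rate reduction and early stopping can be sketched in plain Python. This is a simplified re-implementation of the plateau logic for illustration, not PyTorch's ReduceLROnPlateau itself and not the authors' training loop; it uses the reported settings (initial learning rate 0.0002, reduction by 50% after 5 stagnant training rounds, stopping patience of 40 rounds) and a synthetic loss curve.

```python
def simulate_schedule(val_losses, lr=2e-4, factor=0.5, lr_patience=5, stop_patience=40):
    """Replay validation losses; return the learning rate per epoch and the stop epoch.

    Simplified plateau rule (an assumption of this sketch): an epoch is 'bad'
    when its loss does not fall below the best loss so far. The learning rate
    is multiplied by `factor` once more than `lr_patience` consecutive bad
    epochs accumulate, and training stops after `stop_patience` bad epochs.
    """
    best = float("inf")
    bad_for_lr = 0
    bad_for_stop = 0
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_for_lr = 0
            bad_for_stop = 0
        else:
            bad_for_lr += 1
            bad_for_stop += 1
        if bad_for_lr > lr_patience:
            lr *= factor
            bad_for_lr = 0
        lrs.append(lr)
        if bad_for_stop >= stop_patience:
            return lrs, epoch
    return lrs, None


# Synthetic curve: three improving epochs, then a long plateau.
losses = [1.0, 0.8, 0.7] + [0.7] * 50
lrs, stop_epoch = simulate_schedule(losses)
print(stop_epoch, lrs[7], lrs[8])
```

Because the stopping patience is much longer than the scheduler patience, the learning rate is reduced several times before training is abandoned, which gives the optimizer repeated chances to escape a plateau.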

For model validation, we ran experiments with 30 random seeds. For each seed, 20% of the data was held out as the test set, and the remaining 80% was divided into training and validation sets by the leave-one-out method, so that the test set never overlapped with the training or validation sets. We calculated performance metrics (e.g., accuracy, recall, and F1 score) on the training, validation, and test sets and averaged them across runs, in order to evaluate model performance more comprehensively and stably and to avoid bias arising from the randomness of any single data split.
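A minimal sketch of the repeated, seed-controlled hold-out split is given below. It assumes that all 212 participants (172 patients plus 40 healthy controls) enter the split, that the seeds are simply 0 to 29, and that splitting is not stratified by group; none of these details are stated in the text, and the function name is hypothetical.

```python
import random


def split_indices(n_samples, seed, test_fraction=0.2):
    """Shuffle sample indices with a seed and hold out a disjoint test set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]   # (train + validation, test)


N_SAMPLES = 212        # assumption: 172 patients + 40 healthy controls all included
SEEDS = range(30)      # hypothetical seed values; the text only gives the count

test_sizes = []
for seed in SEEDS:
    train_val, test = split_indices(N_SAMPLES, seed)
    assert not set(train_val) & set(test)       # no overlap between the two sets
    test_sizes.append(len(test))

print(len(test_sizes), test_sizes[0], N_SAMPLES - test_sizes[0])
```

Metrics computed on each seed's test set would then be averaged across the 30 runs; a stratified split would additionally keep the three group proportions equal in every partition.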

Through multiple sets of experiments, we explored how different convolutional kernel sizes (e.g., 3, 5, 7, 9, 11), strides (1, 2), numbers of convolutional layers (1 to 5), and numbers of neurons in the fully connected layer (32, 64, 128, 256) affected feature extraction and the model’s ability to learn complex patterns. These experiments helped us determine the combination of parameters that gave the best overall model performance.

Results

Demographic and clinical characteristics

No significant gender differences were observed among the UD patients, BD patients, and healthy participants (χ² = 0.021, p = 0.989), nor were there significant differences in age (Z = 0.257, p = 0.880), BMI (Z = 1.742, p = 0.418), duration of illness (Z = 0.570, p = 0.752), and years of education (Z = 1.853, p = 0.173). However, HAMD scores showed a significant difference across the three groups (Z = 94.088, p < 0.001) (Table 1).

Table 1 Comparison among three groups concerning demographic data and clinical scales.

Differences in prefrontal cortex activation among groups during the VFT

A one-way ANOVA was conducted on the activation values across the three groups. The results revealed significant differences in oxyhemoglobin concentration changes among the three groups across eight brain regions during the VFT (Table 2). Figure 3 presents the post-hoc comparisons across these regions, indicating a notable reduction in prefrontal cortex activation in both UD and BD patients compared to healthy participants (Fig. 3A-H). Additionally, UD patients showed significantly lower activation values than BD patients in the DLPFC, FPC, lOFC, and lVLPFC regions (Fig. 3A, B, C, D, F, H).

Table 2 Analysis of variance of changes in oxygenated hemoglobin concentration in eight brain regions in three groups.

Fig. 3 During the verbal fluency task, significant differences emerged in the activation of the prefrontal cortex among the three groups. (A) Box plots of Oxy-Hb variations in the right dorsolateral prefrontal cortex for the three groups. (B) Box plots of Oxy-Hb variations in the left dorsolateral prefrontal cortex for the three groups. (C) Box plots of Oxy-Hb variations in the right anterior prefrontal cortex for the three groups. (D) Box plots of Oxy-Hb variations in the left anterior prefrontal cortex for the three groups. (E) Box plots of Oxy-Hb variations in the right orbitofrontal cortex for the three groups. (F) Box plots of Oxy-Hb variations in the left orbitofrontal cortex for the three groups. (G) Box plots of Oxy-Hb variations in the right ventrolateral cortex for the three groups. (H) Box plots of Oxy-Hb variations in the left ventrolateral cortex for the three groups. (**p < 0.01, ***p < 0.001).

Classification (Deep Learning)

We conducted 30 randomized experiments on each brain region to assess model performance on the test set. The results, summarized in Table 3, show that the highest classification accuracy reached 79.57% in the left orbitofrontal cortex (lOFC), while the lowest accuracy was 61.64% in the left ventrolateral prefrontal cortex (lVLPFC).

Table 3 The three-class classification accuracy for each brain region.

Using the fNIRS classification results, we plotted the ROC curve for diagnosing UD across eight brain regions (Fig. 4). During the VFT task, the left orbitofrontal cortex (lOFC) exhibited the highest area under the curve (AUC) for UD patients, with an AUC of 0.99. The optimal cutoff value was 0.89, resulting in a sensitivity of 100% and a specificity of 89% (Table 4).

Fig. 4 ROC curve for Class UD.

Table 4 During the VFT period, the AUC for distinguishing UD patients from the other two groups in each prefrontal area, along with the corresponding sensitivity, specificity, and Oxy-Hb cut-off points.

The fNIRS classification results were used to plot the ROC curve for diagnosing BD across eight brain regions (Fig. 5). During the VFT task, the right dorsolateral prefrontal cortex (rDLPFC) showed the highest area under the curve (AUC) for BD patients, with an AUC of 0.91. The optimal cutoff value was 0.66, achieving a sensitivity of 78% and a specificity of 89% (Table 5).

Fig. 5 ROC curve for Class BD.

Table 5 During the VFT period, the AUC for distinguishing BD patients from the other two groups in each prefrontal area, along with the corresponding sensitivity, specificity, and Oxy-Hb cut-off points.

ROC curves were generated for diagnosing healthy controls (HC) across eight cortical regions using fNIRS classification results (Fig. 6). During the VFT task, the right dorsolateral prefrontal cortex (rDLPFC) exhibited the highest area under the curve (AUC) for HC, with an AUC of 0.98. The threshold value was set at 0.83, achieving a sensitivity of 83% and a specificity of 100% (Table 6).

Fig. 6 ROC curve for Class HC.

Table 6 During the VFT period, the AUC for distinguishing the HC group from the other two groups in each prefrontal area, along with the corresponding sensitivity, specificity, and Oxy-Hb cut-off points.

Discussion

This study aimed to explore the potential of fNIRS in distinguishing patients with unipolar depression (UD) from those with bipolar disorder (BD). By combining fNIRS and a verbal fluency task (VFT), we assessed differences in prefrontal cortex (PFC) activation patterns across three groups—UD, BD, and healthy controls—using a one-dimensional convolutional neural network (1D-CNN) deep learning approach. Our findings demonstrate distinct PFC activation patterns across the groups during the fNIRS-VFT task, supporting the use of fNIRS as an auxiliary diagnostic tool to improve diagnostic accuracy for UD and BD in clinical settings.

Our study presents three main findings:

First, both UD and BD patients showed significantly reduced PFC activation compared to healthy participants, consistent with prior research. For example, previous fNIRS studies reported that BD patients exhibit lower average changes in Oxy-Hb during VFTs compared to healthy controls, with reduced bilateral DLPFC activity positively correlated with depression severity36. Similar findings have shown that UD patients demonstrate significantly lower oxyhemoglobin levels in both the prefrontal cortex and temporal lobes during VFT tasks relative to healthy participants15,37,38. This may be because the VFT assesses vocabulary production, executive ability, and cognitive flexibility39, while affective disorders affect both mood and cognition40, so that patients with affective disorders show different fNIRS activation from healthy individuals. Before fNIRS became widespread, magnetic resonance imaging studies also demonstrated limited activation of brain regions during the Stroop task in patients with bipolar disorder compared to healthy subjects41, and reduced dynamic functional connectivity between the posterior cingulate cortex and the medial prefrontal cortex in patients with bipolar disorder during the VFT42. Magnetic resonance imaging also revealed an imbalance between left and right prefrontal activation in UD patients during emotion judgments43 and a decrease in prefrontal activation during the VFT44. One possible explanation is that glutamate, the main excitatory neurotransmitter in the cerebral cortex, coordinates the vascular and metabolic responses to neuronal activity that underlie changes in functional imaging signals45. Patients with mood regulation disorders (bipolar disorder and depression) have elevated levels of glutamate in the frontal cortex, and this disturbance of the frontal glutamatergic system affects brain function46. When prefrontal activation is deficient, prefrontal regulation is diminished and the limbic system is hypothesized to become dysregulated, leading to extreme mood states such as mania, depression, and mixed states47.

Second, UD patients showed notably lower activation values in the DLPFC, FPC, lOFC, and lVLPFC regions compared to BD patients. Both the VLPFC and DLPFC play key roles in emotional regulation48,49,50, with the VLPFC particularly involved in the downregulation of negative emotions51. Compared to the right VLPFC, feedback from the left VLPFC is associated with enhanced self-regulation capabilities and improved emotional control across the frontal lobe52. Increased DLPFC activation has been linked to reduced subjective social distress53. The orbitofrontal cortex is functionally connected to brain regions such as the amygdala, hippocampus, and dorsolateral prefrontal cortex, and receives sensory and emotional input signals related to decision-making behaviors54, reward expectancy55, impulsivity, and other psychological activities. Reduced OFC function impairs decision-making and emotional processing, and may affect impulsivity in depressed and suicidal individuals56. Magnetic resonance imaging studies of adolescent bipolar depressed patients with suicidal behavior found reduced volume in the left orbitofrontal cortex, and this reduced volume was negatively correlated with suicide-to-mortality rates57. The MDD and BD groups had reduced resting-state functional connectivity in the left orbitofrontal cortex and the left anterior cingulate cortex58. An EEG study noted that orbitofrontal cortex-hippocampus (OFC-HPC) circuits mediate the remission of depression59. The FPC, among the densest cortical regions, is integral to complex social, emotional, and cognitive processes and coordinates DLPFC and VLPFC integration60,61. Imaging studies have noted a reduction in grey matter volume in the FPC of UD patients62. Kawano found a negative correlation between cerebral blood flow in the FPC and depressive symptom severity, suggesting the FPC as a potential biomarker region for depression severity63.
Prior studies indicated lower hemodynamic activation in the VLPFC of UD patients compared to BD patients15, though these studies often did not distinguish between first-episode and recurrent depression. Other research has noted differences in activation between BD and UD in the left FPC and Broca’s area64. For instance, Tran’s study found that BD patients displayed greater activation in the VLPFC and right FPC than UD patients65. Using a VFT paradigm, Liu and colleagues reported activation differences in lateral and orbitofrontal PFC regions between UD and BD patients66. These studies are partially consistent with our findings and suggest that patients with bipolar depression show relatively greater prefrontal activation. The differences between UD and BD patients in brain activation under cognitive tasks may be due to the heterogeneity of mood disorders and the impact of current emotional states on cognitive performance47. Although the patients selected for this study were in depressive episodes, patients with BD had more fluctuating mood states throughout the illness, and this heterogeneity may be reflected in the activation patterns15.

Third, studies combining MRI and pattern recognition techniques to explore biomarkers of psychiatric disorders have grown substantially in recent years. A study that applied multivariate pattern analysis (MVPA) to resting-state fMRI data from 46 patients with MDD and 57 HCs reported a correct classification rate of 86.4%67, and pattern classification analysis using the support vector machine (SVM) method was 86% accurate in distinguishing between BD and UD68. Multivariate pattern analysis of an emotional stimulus task distinguished BD from UD with a significant classification rate of 72%69. However, these approaches do not involve training network structures as complex as those used in deep learning. We therefore applied a deep learning approach using 1D-CNNs to classify fNIRS data; by automatically extracting complex features, this approach is intended to improve classification accuracy. Deep neural networks capture nonlinear relationships among brain regions, which may provide greater robustness and generalizability in multi-class classification tasks. In this study, the three-class classification task achieved an accuracy of 79.57% in the lOFC, a critical region in cognitive processes such as behavioral inhibition, decision-making, and emotional regulation15,70. Further analysis revealed that, during the VFT task, the lOFC was the most effective region in diagnosing UD patients, with an AUC of 0.99. Similarly, the rDLPFC demonstrated high diagnostic efficacy for BD patients (AUC = 0.91) and for identifying healthy controls (AUC = 0.98).
These findings indicate that our deep learning models classified patients with these disorders with high accuracy, underscoring the value of fNIRS technology as a clinical aid in the diagnosis of psychiatric disorders. By providing real-time, noninvasive data on neural activity, fNIRS can complement the subjective limitations of the DSM-5 and structured interviews, both in the quantification of biomarkers and in the dynamic monitoring of symptoms.

Limitations and future research

Some limitations of this study must be considered. First, the relatively small sample size may have increased the risk of confounding and selection bias, so future research needs to test whether these patterns replicate in larger samples. Subgroup analyses by gender and age would also have been informative, although one fNIRS study found no hemodynamic differences between genders during a verbal working memory task71. The patients selected for this study were 18 to 60 years old, which avoided the possible influence of minors and elderly people on the results; nevertheless, samples closely matched for age and gender remain desirable in subsequent studies to explore and validate whether these characteristics affect hemodynamics. Second, patients receiving medication were not excluded from the study. Although previous studies have revealed no correlation between antidepressant dosages and Oxy-Hb activation in either the BD or the UD group72, the confounding effect of multiple medications on changes in brain function needs further exploration. Third, this was a cross-sectional study. Future longitudinal studies are needed to monitor changes in blood oxygen levels over the course of the disease and the effects of different drugs on hemodynamic activation during the VFT task, and to assess the stability of fNIRS-derived biomarkers over time. Fourth, this study did not collect behavioral data from the participants; however, previous studies found that task performance did not differ significantly between many patient groups and healthy groups73,74. Fifth, fNIRS can only measure cortical areas and cannot measure the hemodynamic responses of subcortical structures such as the amygdala. Sixth, the NIRS-KIT toolkit used in this study offers a comprehensive analysis workflow and flexible data processing capabilities, but the study did not utilize the vector phase analysis approach75. Omitting this method may hinder the effective separation and analysis of signals related to neural activity, thus affecting the accurate assessment of brain function. Future research could explore and adopt more advanced signal processing techniques to enhance the reliability of the results.

Conclusions

The findings of this study indicate that while both bipolar depression and unipolar depression patients demonstrate significant impairments in emotional regulation compared to healthy controls, patients with bipolar depression retain relatively better emotional regulation abilities than those with unipolar depression. This pattern of differential fNIRS prefrontal-based activation during the processing of cognitive tasks may contribute to elucidating the pathophysiological mechanisms of mood disorders and may serve as a potential marker to help differentiate between bipolar and unipolar depression.