Introduction

Trigeminal neuralgia is a disorder characterized by recurrent, unilateral, brief episodes of electric shock-like pain affecting one or more branches of the trigeminal nerve1. This condition has been recognized since ancient times, with countless individuals suffering its debilitating effects over the centuries2. To date, its diagnosis relies heavily on patients’ personal descriptions of pain and associated symptoms3,4. This reliance on subjective symptom reporting poses significant diagnostic challenges, including risks of both overdiagnosis and misdiagnosis. Over 60% of patients with trigeminal neuralgia are misdiagnosed, which often delays accurate assessment5. Many of these patients undergo unnecessary treatments, such as dental procedures6. Conversely, trigeminal neuralgia is also frequently overdiagnosed, with only 21% of suspected cases ultimately confirmed based on international diagnostic criteria7. These diagnostic challenges highlight the limitations of subjective symptom assessment and underscore the need for objective diagnostic tools that can support accurate trigeminal neuralgia diagnosis in complex clinical scenarios.

Magnetic resonance imaging (MRI) serves as a valuable tool for excluding differential diagnoses and identifying structural causes, such as neurovascular compression (NVC). However, MRI has limitations related to cost and efficiency, particularly in screening patients with facial pain. Additionally, MRI struggles to differentiate idiopathic trigeminal neuralgia from cases caused by factors other than NVC.

NVC has long been considered a primary etiological factor in trigeminal neuralgia8,9. However, recent studies have challenged this assumption, suggesting that NVC may not be a necessary or sufficient condition for the disorder’s development10,11. This has prompted the exploration of other contributing factors, including structural variations in the skull, which may play a role in the pathogenesis of trigeminal neuralgia. Although structural differences in the skull have been considered potential mechanical contributors for over 50 years12,13, their analysis has remained challenging. The practical utility of detecting subtle variations in plain skull X-ray images through visual inspection has diminished over time.

Recent advancements in image analysis through deep learning have created new opportunities to revisit these claims. This technology enables the re-evaluation of structural variations in trigeminal neuralgia, potentially uncovering valuable insights that were previously unattainable. In medical imaging, deep learning, particularly convolutional neural networks (CNNs), has shown remarkable capability in identifying features that may elude human observers14. Therefore, in this study, we aimed to evaluate whether a CNN can distinguish patients with trigeminal neuralgia from controls on plain skull X-ray images and to assess its potential as a screening tool.

Materials and methods

Study enrollment

This study was reviewed and approved by the Institutional Review Boards (IRBs) of Seoul National University Bundang Hospital (approval No. B-1910-572-101) and Ajou University Medical Center (approval No. AJIRB-MED-MDB-22-234). It adhered to the Declaration of Helsinki guidelines for biomedical research involving human subjects and complied with all relevant ethical regulations established by the IRBs. Informed consent requirements were waived by the committees due to the retrospective nature of the study and the deidentified character of the data. Lateral skull radiographs were collected using the following inclusion criteria: (1) patients aged over 16 years and (2) lateral skull radiographs without evidence of prior craniofacial surgeries or truncation of the image. Clinically diagnosed patients with trigeminal neuralgia were included if they had a history of paroxysmal unilateral orofacial pain distributed along the trigeminal nerve territory, triggered by typical maneuvers15. Eligible patients were identified through a trigeminal neuralgia registry and medical records. Between January 2013 and June 2020, 277 patients were diagnosed with trigeminal neuralgia. For the control group, patients with unruptured intracranial aneurysms who underwent clipping surgery were selected, as they commonly had skull radiographs taken for preoperative planning. A clinical data warehouse query identified 3045 such patients, of whom 2706 had available skull radiographs. After a thorough review of medical records and radiographs, 166 patients with trigeminal neuralgia and 1702 control patients, all with appropriate skull X-ray images, were retained as eligible study subjects. To minimize confounding effects, a three-fold matching process by age and sex was conducted16. This resulted in 664 total patients (166 with trigeminal neuralgia and 498 controls) being enrolled in the study. Control group subjects were confirmed to have no history of facial pain-related surgeries, procedures, or prescriptions.

The datasets were split into training, validation, and test sets in a 6:2:2 ratio using random permutation. An external test dataset consisting of 100 patients (50 with trigeminal neuralgia and 50 controls) from other institutions, selected following the same procedures described above, was used to evaluate model generalizability. The patient selection process is illustrated in Fig. 1.
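
The 6:2:2 partition by random permutation can be illustrated with a minimal sketch. The function name `split_indices`, the seed, and the rounding rule are assumptions made for illustration; the text does not state how fractional set sizes were handled or whether the split was stratified by class.

```python
import random

def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random permutation; the seed is illustrative
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

# 664 enrolled patients (166 with trigeminal neuralgia and 498 controls)
train_idx, val_idx, test_idx = split_indices(664)
print(len(train_idx), len(val_idx), len(test_idx))  # 398 133 133
```

Under this rounding assumption, the three parts are disjoint and together cover all 664 patients.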

Fig. 1

Flowchart shows the process of enrollment of patients and partitioning.

Convolutional neural network classifier

We used TensorFlow (version 2.0.0, https://www.tensorflow.org/) as a framework to train and evaluate the neural network model for skull radiograph classification. A pre-trained ‘Densely Connected Convolutional Network’ (DenseNet-121)17 model was imported with initial weights configured using the ImageNet18 and CheXNet datasets19. The original DenseNet-121 architecture was designed to classify 1000 ImageNet labels and 14 CheXNet labels using 224 × 224-pixel images. For this study, the model was adapted to a binary classification task. The top layers, consisting of fully connected nodes for 1000 and 14 labels, respectively, were replaced with a new output layer containing two nodes. The original softmax activation function was replaced by a sigmoid activation function (\(f\left(x\right)=\frac{1}{1+{e}^{-x}}\)), resulting in 7,039,554 trainable parameters. The loss function was defined as binary cross-entropy (\(-{y}_{true}\log f\left(x\right)-\left(1-{y}_{true}\right)\log\left(1-f\left(x\right)\right)\)). Various optimizers, including SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, and Nadam, were tested with initial learning rates ranging from \(10^{-6}\) to \(10^{-3}\) to optimize model performance.
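
The sigmoid activation and binary cross-entropy loss defined above can be written out as a short, framework-free sketch. The clipping constant `eps` is an assumption added for numerical safety. The parameter arithmetic at the end assumes that the two-node layer sits directly on the 1024 globally pooled DenseNet-121 features, which the text does not state explicitly.

```python
import math

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """-y*log(p) - (1 - y)*log(1 - p) for a single prediction p = f(x)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is illustrative
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Loss for a positive case predicted with logit 0 (probability 0.5)
loss = binary_cross_entropy(1, sigmoid(0.0))

# Size of the new output layer under the pooling assumption stated above
FEATURES = 1024      # channels produced by the last DenseNet-121 block
OUTPUT_NODES = 2     # two-node output layer described in the text
head_params = FEATURES * OUTPUT_NODES + OUTPUT_NODES  # weights plus biases
backbone_params = 7_039_554 - head_params             # remainder of the reported total
print(round(loss, 4), head_params, backbone_params)
```

Under that assumption the new layer accounts for 2,050 of the reported 7,039,554 parameters.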

Preprocessing, model training, and evaluation

Given the modest size of the training dataset, image augmentation was employed to enhance data diversity. Two augmentation techniques were applied: (1) contrast-limited adaptive histogram equalization to improve image contrast and highlight specific features20, and (2) random combinations of rotation (within 15°), translation (within 10%), transposition (within 10%), and brightness alterations (within 20%) (Fig. 2). To address class imbalance (1:3 ratio), the control group was augmented by a factor of 14, resulting in 4172 images, while the trigeminal neuralgia group was augmented by a factor of 20, resulting in 2000 images. For the validation dataset, only contrast-limited adaptive histogram equalization was used, producing two iterations for each label. The test dataset remained unaltered.
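
As a rough illustration of the random augmentation step, the sketch below draws one set of transformation parameters within the stated limits and checks the reported image counts. Uniform sampling, symmetric ranges, and the parameter names are assumptions. The transposition step is omitted because its exact definition is not given in the text.

```python
import random

def sample_augmentation(rng):
    """Draw one random combination of augmentation parameters."""
    return {
        "rotation_deg": rng.uniform(-15.0, 15.0),      # rotation within 15 degrees
        "shift_x_frac": rng.uniform(-0.10, 0.10),      # translation within 10% of width
        "shift_y_frac": rng.uniform(-0.10, 0.10),      # translation within 10% of height
        "brightness_scale": rng.uniform(0.80, 1.20),   # brightness change within 20%
    }

rng = random.Random(42)  # illustrative seed
params = sample_augmentation(rng)

# Reported augmented counts and factors imply the original training-set sizes
control_aug, neuralgia_aug = 4172, 2000
control_orig = control_aug // 14      # 298 control images before augmentation
neuralgia_orig = neuralgia_aug // 20  # 100 trigeminal neuralgia images before augmentation
ratio_before = control_orig / neuralgia_orig  # about 3:1
ratio_after = control_aug / neuralgia_aug     # about 2:1
print(params, control_orig, neuralgia_orig, round(ratio_before, 2), round(ratio_after, 2))
```

The implied 298 + 100 = 398 training images correspond to roughly 60% of the 664 enrolled patients.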

Fig. 2

Flowchart shows each augmentation step and related example images. The images are slightly blurred with Gaussian filtering to enhance anonymity. In the actual investigation, blurring was not applied to the images. CLAHE, contrast-limited adaptive histogram equalization.

Model performance was assessed using the following metrics derived from the confusion matrix:

  • Accuracy: \(\frac{TP+TN}{TP+FP+TN+FN}\)

  • Precision: \(\frac{TP}{TP+FP}\)

  • Recall: \(\frac{TP}{TP+FN}\)

  • F1-score: the harmonic mean of precision and recall.

  • Area under the receiver-operating characteristic curve (AUROC): calculated using the probability produced by the sigmoid output for each label

where:

  • True Positives (TP): Correctly predicted cases of trigeminal neuralgia.

  • True Negatives (TN): Correctly predicted control cases.

  • False Positives (FP): Control cases incorrectly classified as trigeminal neuralgia.

  • False Negatives (FN): Trigeminal neuralgia cases incorrectly classified as control.
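
The metrics listed above can be computed from the four confusion-matrix counts as in the following sketch. The counts and scores used here are hypothetical values chosen only to exercise the formulas. AUROC is computed with the pairwise-ranking definition, which is one of several equivalent ways to obtain it and is not necessarily the routine used in the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and sigmoid outputs, for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=130, fn=20)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(metrics, auroc(labels, scores))
```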

During training, classification accuracy and AUROC were continuously monitored. Training was terminated early if AUROC did not improve over 20 consecutive epochs. After training, model performance was assessed on a separate test dataset. Gradient-weighted class activation mapping (Grad-CAM) was utilized to identify important features and regions of interest influencing the classifier’s decision between the input and final convolution layers21. Then, gradient heatmaps were overlaid on the original images to visualize the spatial importance of predictions correlated with anatomical structures. For external validation, the model’s performance was evaluated on an independent test dataset collected from another institution, adhering to the Personal Information Protection Act.
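
The early-termination rule described above (stop when AUROC has not improved for 20 consecutive epochs) can be sketched independently of any framework. The function name and the AUROC curve below are hypothetical, and the study's actual implementation is not given in the text.

```python
def early_stopping_epoch(val_auroc_by_epoch, patience=20):
    """Return the 1-based epoch at which training stops, or None if it never stops."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(val_auroc_by_epoch, start=1):
        if value > best:
            best = value
            epochs_without_improvement = 0  # improvement resets the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical curve: AUROC rises for 46 epochs, then stays below its best value
history = [0.5 + 0.005 * e for e in range(1, 47)] + [0.70] * 30
print(early_stopping_epoch(history))  # 66
```

In TensorFlow, the `tf.keras.callbacks.EarlyStopping` callback with `patience=20` offers comparable behavior; whether it was used here is not stated.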

Statistical analysis

All statistical analyses were performed using R software (version 4.0.2; https://www.r-project.org). Patient age was presented as the median with interquartile range (IQR) and analyzed using the Mann–Whitney U test, as normality was rejected based on the Shapiro–Wilk test. Patient sex was presented as counts (percentages) and analyzed using the chi-square test. Case–control matching was conducted using the MatchIt package in R.

Results

The demographic information of the 166 patients with trigeminal neuralgia enrolled in this study is summarized in Table 1. The median age in both groups was identical at 60.5 years (range: 24–87 years). Of the 166 patients, 108 (65.1%) were female. None of the patients in the control group reported facial pain. Among the 166 patients with trigeminal neuralgia, 108 experienced pain on the right side and 58 on the left side. Pain involved a combination of two divisions (V1 and V2, V1 and V3, or V2 and V3) in 58 patients. Isolated involvement of the V1 division occurred in four patients, while isolated V2 and isolated V3 involvement each occurred in 52 patients. The duration of pain prior to the first outpatient visit averaged 4 years, ranging from less than 1 year to 30 years. Notably, 32 patients reported experiencing pain for more than 10 years. At the time of the outpatient visit, 78 patients reported a Barrow Neurological Institute Pain Intensity (BNI) score of 3, indicating that their pain was controlled with medication. In contrast, 88 patients reported a BNI score of 4 or higher, reflecting inadequate pain control despite medication. Of the 166 patients, 118 underwent microvascular decompression surgery for pain treatment, 18 underwent gamma knife surgery, and 30 were managed with medication alone.

Table 1 Baseline clinical characteristics of study population (n = 166).

Among various combinations of pretrained models, optimizers, and learning rates, the DenseNet-121 model with ImageNet initial weights, optimized using Adam with an initial learning rate of \(10^{-5}\), exhibited the best performance based on AUROC. Training under this configuration met the early termination condition and halted at the 66th epoch, achieving a validation AUROC of 0.8095.

On the test dataset comprising 133 images, the model achieved an overall accuracy of 0.8722. The AUROC values for predicting trigeminal neuralgia and control were 0.9006 and 0.8858, respectively. For trigeminal neuralgia prediction, precision and recall were 0.7500 and 0.7273, respectively, resulting in an F1-score of 0.7385. For control prediction, precision and recall were 0.9109 and 0.9200, respectively, with an F1-score of 0.9154. The weighted averages of precision, recall, and F1-score were 0.8710, 0.8722, and 0.8715, respectively.
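
The weighted averages reported above are support-weighted means of the per-class scores. The sketch below reproduces them from a confusion matrix inferred from the reported precision and recall values (24 true positives, 8 false positives, 9 false negatives, and 92 true negatives across 33 trigeminal neuralgia and 100 control images). These counts are a reconstruction and are not stated in the text.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall, and F1-score with the given class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

# Inferred counts (an assumption consistent with the reported metrics)
neuralgia = per_class_scores(tp=24, fp=8, fn=9)  # trigeminal neuralgia as positive class
control = per_class_scores(tp=92, fp=9, fn=8)    # control as positive class
supports = (33, 100)                             # images per class in the test set

# Support-weighted average of each metric across the two classes
weighted = tuple(
    (supports[0] * a + supports[1] * b) / sum(supports)
    for a, b in zip(neuralgia, control)
)
print([round(v, 4) for v in weighted])  # [0.871, 0.8722, 0.8715]
```

Because recall weighted by class support equals the fraction of all images classified correctly, the weighted recall coincides with the overall accuracy of 0.8722.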

During external validation, the model exhibited some performance degradation. The AUROC values for predicting trigeminal neuralgia and control were 0.8160 and 0.8272, respectively. Trigeminal neuralgia prediction yielded a precision of 0.6567 and a recall of 0.8800, resulting in an F1-score of 0.7521. For control prediction, precision and recall were 0.8182 and 0.5400, respectively, with an F1-score of 0.6506. The weighted averages of precision, recall, and F1-score during external validation were 0.7374, 0.7100, and 0.7014, respectively. A detailed summary of performance metrics is presented in Table 2 and Fig. 3.

Grad-CAM analysis revealed that for predicting trigeminal neuralgia, the principal attention areas were predominantly located around the sphenoid body and clivus in most TP cases. However, in some cases, heatmaps exhibited broader spatial distributions (Fig. 4a). Among FP cases, the sphenoid body demonstrated a higher gradient distribution with a wider attention area (Fig. 4b). In contrast, predictions for control cases primarily focused on the calvarium or cervical spine in both TN and FN cases. Notably, in these instances the trained model did not attend to the sphenoid body, the region that was critical for trigeminal neuralgia prediction (Fig. 4c,d).
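
For readers unfamiliar with how Grad-CAM heatmaps are formed, the sketch below shows the core combination step on toy values: each feature map of the final convolution layer is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped. The activations and gradients here are hypothetical and hand-written; in practice they would be taken from the trained network.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a heatmap; inputs are indexed [channel][row][col]."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of the class-score gradient for that channel
    weights = [sum(sum(row) for row in g) / (n_rows * n_cols) for g in gradients]
    # Weighted sum over channels followed by ReLU
    heatmap = [
        [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    peak = max(max(row) for row in heatmap)
    if peak > 0:  # scale to [0, 1] so the map can be overlaid on the image
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap

# Toy example: two 2x2 feature maps with constant gradients
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

The resulting low-resolution map is then upsampled to the input size and overlaid on the radiograph.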

Table 2 Model performance indicators.
Fig. 3

Plots show the receiver operating characteristic curves of the trained model for the test set (A) and the external test set (B). Classes 1 and 0 refer to trigeminal neuralgia and control, respectively. ROC, receiver operating characteristic.

Fig. 4

Image matrix shows the Grad-CAM heatmaps underlying the decisions of the trained model. When the prediction is trigeminal neuralgia, higher gradient heatmaps are concentrated around the region from the sphenoid body to the clivus. However, in false positive cases the area is more spread out (A,B). Heatmaps for control predictions are concentrated at the calvarium or the cervical spine (C,D). The skull images are slightly blurred with Gaussian filtering to enhance anonymity. In the actual investigation, blurring was not applied to the images.

Discussion

We determined whether deep learning could distinguish trigeminal neuralgia from control cases on plain skull radiographs. Our findings indicate that patients with trigeminal neuralgia have distinct structural skull features compared to a control group, and that deep learning models can distinguish trigeminal neuralgia with an accuracy of approximately 80%.

The deep learning model focused predominantly on the region from the clivus to the sphenoid bone, a critical pathway where the trigeminal nerve originates at the brainstem, traverses Meckel’s cave, and passes through the gasserian ganglion. The model’s focus on this region reflects its ability to detect subtle anatomical variations associated with trigeminal neuralgia, suggesting that structural factors in the trigeminal nerve pathway may contribute to trigeminal neuralgia etiology.

The etiology of trigeminal neuralgia has been extensively studied. In particular, traction on the trigeminal nerve pathway has been considered an important factor in its pathogenesis11,22. This traction could induce an abnormal stretching force on the nerve root, which might promote hyperexcitability of the nerve and contribute to the development of trigeminal neuralgia. With advancements in imaging technologies, various causes of traction have been identified. Vascular compression at the root exit zone is the most widely recognized cause of such traction9,23. In addition, thickened arachnoid membranes and adjacent bony structures have been implicated as contributing factors because of the traction they exert on the trigeminal nerve11,22. Our findings support the hypothesis that the anatomical configuration of patients with trigeminal neuralgia, as distinguished from that of normal individuals, may contribute to pathogenesis by providing an environment that predisposes the trigeminal nerve to traction. Interestingly, structural differences in the skull have long been considered a potential mechanical factor in trigeminal neuralgia24,25. Historical assertions dating back to the early 1900s suggested differences in the plane of the middle fossa and the angle of the petrous process among individuals12, and recent advanced neuroimaging techniques have reaffirmed these distinctions, particularly in patients with trigeminal neuralgia26. Parise et al. analyzed the morphology of the posterior fossa using MRI and found that patients with trigeminal neuralgia tended to have smaller cerebellopontine angle cisterns than normal subjects26,27. These differences can bring the nerve closer to surrounding structures, thereby increasing the likelihood of neurovascular compression.

The higher prevalence of trigeminal neuralgia in females is presumed to be related to sex-dependent anatomical differences in the posterior fossa28,29. Additionally, Andrei et al. reported that the petrous bone angle, particularly at the petrous ridge where the nerve exits Meckel’s cave, is more acute in patients with trigeminal neuralgia than in controls30. These studies suggested that changes in anatomical configuration could affect or be affected by trigeminal neuralgia. However, most of these studies had sample sizes too small to allow for significant conclusions. We used deep learning to analyze structural differences in a larger sample and evaluated the results on held-out and external test sets. To the best of our knowledge, this is the first study to use deep learning to identify skull differences that distinguish patients with trigeminal neuralgia from normal individuals. Our findings suggest that anatomical configuration may play a role in the development of trigeminal neuralgia and could contribute to a comprehensive understanding of the disorder by bridging historical perspectives with contemporary anatomical findings, with potential implications for diagnosis and management.

Integration of deep learning into the diagnostic landscape highlights a potential alternative use for plain skull X-ray imaging. Our model demonstrated a level of discriminatory power that could make it a viable screening tool for trigeminal neuralgia. The deep learning model differentiated trigeminal neuralgia from control cases on plain skull radiographs with an overall accuracy of 87.2% on the internal test set, which is higher than chance. To estimate accuracy on radiographs that the classifier had never seen, we evaluated it on a separate held-out test set rather than using a cross-validation approach in which all images contribute to training. Additional validation was performed using a new dataset from other institutions to assess generalizability. Taken together, the model distinguished trigeminal neuralgia with good accuracy and efficiency. Our study established an identification model that could be used to screen patients, making the diagnostic protocol for trigeminal neuralgia more efficient and systematic.

Despite these promising results, this study had several limitations. First, the use of two-dimensional skull images limited the ability to precisely identify the specific regions and mechanisms underlying the observed differences between patients with trigeminal neuralgia and controls. A deeper understanding of these intricacies would require three-dimensional imaging techniques, such as computed tomography (CT) or MRI. Second, the composition of the control group posed a potential source of bias. In this study, patients with unruptured intracranial aneurysms were used as controls, given the difficulty of obtaining plain skull X-ray images from healthy individuals. While this approach helped match the control group to the trigeminal neuralgia cohort in terms of sex ratio and age distribution, it may have influenced the findings. Finally, the retrospective nature of this study and the exclusive inclusion of patients from an Asian population introduce limitations regarding the generalizability of the findings. Skull morphology, including features such as dolichocephaly and brachycephaly, varies across populations, potentially affecting characteristics like the petrous bone angle in relation to the middle fossa plane31. Further research is needed to investigate these variations in diverse populations and to explore their implications for trigeminal neuralgia pathogenesis and diagnosis.

The findings of this study highlight the potential of deep learning-based models to distinguish between plain skull X-ray images of patients with trigeminal neuralgia and controls. Our preliminary results suggest that plain X-ray imaging could potentially be used as an adjunct to conventional MRI, ideally with constructive interference in steady state (CISS) sequences, to aid in the clinical diagnosis of trigeminal neuralgia. However, additional research is necessary to explore the influence of three-dimensional skull structural variations on the brain parenchyma, the trigeminal nerve’s course, and surrounding vascular structures.