Introduction

SAPHO syndrome is a rare disease with a diverse range of clinical features, including chronic inflammatory osteoarticular lesions and dermatological disorders1,2. More than 50 different terms have been used to describe the association between osteoarticular and dermatological lesions, resulting in high heterogeneity and posing challenges to diagnosis, management, and research3,4. Osteoarticular involvement is a central feature of SAPHO syndrome, and its imaging characteristics vary with the stage of the disease. Patients may experience pain, stiffness, limited movement, or soft tissue swelling at the affected site5.

Whole-body bone scintigraphy (WBBS) is a highly effective initial method for the systematic evaluation of osteoarticular lesions in SAPHO syndrome. The bullhead sign observed on WBBS is considered a characteristic feature of SAPHO syndrome6. WBBS can detect both symptomatic and asymptomatic lesions and is more sensitive in detecting underlying lesions than computed tomography7. However, the scintigraphic features of SAPHO syndrome are often not specific, making it challenging to differentiate it from other conditions. In particular, bone metastases, i.e., secondary bone tumors (SBT), can present similarly on bone scans with multifocal abnormal radiotracer uptake8. In our experience, many patients with SAPHO syndrome were misdiagnosed with metastatic disease before arriving at the correct diagnosis. This is especially concerning when the providers or radiologists are not familiar with the clinical and imaging manifestations of SAPHO syndrome9. It is of high clinical significance to identify imaging characteristics to assist in the differential diagnosis of SAPHO syndrome and SBT10.

Systematic research on the imaging differential diagnosis between SAPHO syndrome and SBT remains limited, largely due to the low incidence of SAPHO and insufficient clinical awareness. Conventional imaging techniques such as X-ray and CT are valuable in diagnosing osteolytic SBT, as they can effectively demonstrate features like worm‐like bone destruction and soft tissue infiltration. Nonetheless, the sclerotic lesions seen in osteoblastic or mixed SBT are frequently confounded with the bone hypertrophy characteristic of SAPHO syndrome, with both conditions presenting as cortical thickening and medullary cavity stenosis. By contrast, MRI detects bone marrow edema with a sensitivity exceeding 90%, although its specificity is comparatively limited. In SAPHO syndrome, the bone marrow edema pattern—marked by symmetrical involvement of multiple vertebrae—can overlap with the “skip” distribution observed in metastatic disease, where soft tissue masses are also common, rendering approximately 25% of cases challenging to differentiate. Moreover, positron emission tomography-computed tomography (PET-CT) distinguishes metastases from SAPHO by revealing high FDG uptake in metastatic lesions versus mild or focal uptake in SAPHO; however, PET-CT may yield false negatives in osteoblastic metastases (such as those from prostate cancer), with an overall sensitivity of roughly 70%.

The objective of this research was to identify distinguishing patterns of osteoarticular involvement in SAPHO syndrome and SBT on whole-body bone scintigraphy using machine learning modeling. The analysis was conducted on a large multicenter cohort with a clearly defined population.

Methods and materials

Participants

This study retrospectively included 600 patients diagnosed with secondary bone tumors without coexisting rheumatic immune diseases and 593 patients who met the criteria for SAPHO syndrome proposed by Kahn and Khan11. The patients with SAPHO syndrome were drawn from our dynamic cohort of SAPHO syndrome12,13. Patients came from three hospitals and were divided into two groups. Group 1 (G1), consisting of 521 SAPHO and 500 SBT patients from Peking Union Medical College Hospital and Beijing University of Chinese Medicine Fangshan Hospital, was used for model training. Group 2 (G2), consisting of 72 SAPHO and 100 SBT patients from Zhejiang Provincial People’s Hospital, was used to verify the model's performance. Only patients who had undergone WBBS using 99mTc-MDP (technetium-99m methylene diphosphonate) after the onset of the disease were included. The Ethics Committee of the Hospitals approved this study (Number of Ethics documents: ZS-944). All patients gave their written informed consent, following the principles of the Declaration of Helsinki.

Whole body bone scintigraphy

We gathered WBBS data obtained with 99mTc-MDP for all patients; imaging was conducted after disease onset and performed following relevant guidelines and regulations. Imaging was limited to the delayed phase, and no blood flow or blood pool images were collected. Data were collected from WBBS reports, and professional radiologists reviewed the text and image information to ensure that physiological uptake was excluded. The sites of lesions were classified into five major regions: the anterior chest wall (including sternoclavicular joints, costosternal joints, clavicles, and sternum), ribs (anterior and posterior), axial skeleton (including cervical, thoracic, lumbar, and sacral spine, and sacroiliac joints), peripheral joints and bones, and others (pelvis and skull). The bull's-head sign, widely considered typical for SAPHO syndrome14, was defined as a symmetrical increase in uptake in the sternoclavicular region, including the manubrium, the sternoclavicular joints, and the adjacent clavicles. To optimize the impact of individual features on the machine learning model, we grouped the 87 anatomic sites identified on the WBBS reports into 30 anatomic regions, as shown in Table 1.

Table 1 Frequency of various sites with abnormal radiotracer uptake on whole-body bone scintigraphy in patients with SAPHO syndrome and secondary bone tumors.

Feature selection and model construction

To minimize the number of independent variables, we first removed four variables that showed no significant difference between groups in the chi-square test for R × C contingency tables. During our statistical analysis, we found that lesions of the sternoclavicular joint, sternocostal joint, and jaw, as well as the typical bull's-head sign, were present exclusively in SAPHO syndrome cases. To avoid overfitting, we excluded these four variables from the model.
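The chi-square screening of a single binary site variable can be sketched as follows. This is a minimal illustration and not the authors' SPSS procedure: the function name and the counts are hypothetical, and no continuity correction is applied because the text does not say whether one was used. For a 2 × 2 table (one degree of freedom) the p-value has the closed form erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],   # SAPHO: lesion present, lesion absent
         [c, d]]   # SBT:   lesion present, lesion absent
    Returns (chi2, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration only (not taken from Table 1).
chi2, p = chi_square_2x2(60, 440, 58, 442)
keep_variable = p < 0.05  # variables with p >= 0.05 would be screened out
```

With these made-up counts the two groups are almost identical, so the variable would be dropped at the 0.05 level.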

To further reduce the number of variables, we used three machine learning methods to identify factors with predictive capacity: logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and random forest. Logistic and LASSO regression are two classical techniques used for variable selection in machine learning. Both were run with default parameters during model training to minimize the number of features used for prediction. The LASSO regression used tenfold cross-validation to identify the optimal lambda value. The random forest algorithm selects random subsets of objects and variables to create multiple decision trees, which are then used to classify objects and evaluate the importance of each feature. In our analysis, we used 1000 random trees, a variable reduction of 1.5, and tenfold cross-validation to identify the optimal number of features15,16. Feature selection was performed with SPSSAU (Statistical Product and Service Software Automatically) 2023 (https://www.spssau.com). The training set ratio for machine learning was 0.7.
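The data partitioning behind a 0.7 training ratio and tenfold cross-validation can be sketched as below. This is an assumption-laden illustration: the partitioning in this study was done inside SPSSAU, and its rounding, shuffling, and stratification rules are not described, so the helper names, the seed, and the floor rounding are all hypothetical.

```python
import random

def train_test_split_indices(n, train_ratio=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_ratio)  # floor rounding is an assumption
    return idx[:n_train], idx[n_train:]

def k_fold(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        validate = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validate

# G1 holds 521 SAPHO + 500 SBT patients = 1021 cases.
train_idx, test_idx = train_test_split_indices(521 + 500)
splits = list(k_fold(train_idx, k=10))
```

Each case in the training portion is held out exactly once across the ten folds, which is what allows a tuning parameter such as the LASSO lambda to be chosen without touching the test portion.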

The feature variables selected by the three methods partially overlapped. After ranking the variables by the weight of their impact on each model, we manually eliminated the four variables that had the smallest impact in the LASSO regression and were not selected by the other two methods. The variables selected by manual management, random forest, and logistic regression were each used to train a model on the G1 dataset. The models were evaluated using accuracy, precision, recall, F1-score, and AUC. The G2 dataset served as an external validation set.
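For reference, the threshold-based metrics named above follow directly from a confusion matrix, as in the sketch below. The function name and the counts are hypothetical. The text does not state which diagnosis was treated as the positive class or whether scores were averaged across classes, so this shows the plain single-class definitions only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical confusion-matrix counts, for illustration only.
m = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```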

Statistical analysis

Statistical analysis was performed using SPSS 26.0 and R (4.1.3). P < 0.05 indicated a statistically significant difference. Clinical data were analyzed using SPSS. The chi-square test was used for categorical variables, the t-test for normally distributed continuous variables, and the Mann–Whitney U test for variables with a non-normal or unknown distribution. The R packages “car”, “rms”, and “pROC” were used to analyze the receiver operating characteristic (ROC) curve.
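The quantity summarized by the ROC analysis, the area under the curve, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way in Python; it illustrates the definition and is not the pROC interface used in the study. The function name and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formulation gives the same value faster.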

Results

Patient characteristics

Six hundred patients with secondary bone tumors (237 females and 363 males) and 593 patients with SAPHO syndrome (392 females and 201 males) were included in this study. The mean (standard deviation) ages of the two groups were 57.8 (12.3) and 42.9 (10.2) years, respectively. For patients with secondary bone tumors, the primary malignancies included lung (n = 262), breast (n = 52), thyroid (n = 26), prostate (n = 224), and bladder cancers (n = 36).

Consistent with our previous cohort studies6 and the literature, the anterior chest wall (85.5%) was the most commonly affected area in SAPHO syndrome, including the sternoclavicular joint (61.6%), sternum (35.6%), clavicle (10.3%), and sternocostal joint (10.3%). Rib lesions were more prevalent in patients with secondary bone tumors (57.7%) than in those with SAPHO syndrome (46.5%). In SAPHO patients with rib lesions, the first to fifth anterior ribs were the primary area affected (41.8%), mainly at the sternocostal joints. In patients with secondary bone tumors, by contrast, rib lesions were uniformly distributed.

Secondary bone tumors were more likely to affect the axial skeleton (58.3%) than SAPHO syndrome (35.9%). In secondary bone tumors, the spinal vertebrae were evenly affected, whereas SAPHO syndrome primarily affected the lower lumbar spine (22.8%) and sacroiliac joints (30.7%). Among peripheral joints and bones, secondary bone tumors most frequently affected the shoulder joints (28%) and long bones such as the femur (28%) and humerus (15.7%), whereas SAPHO syndrome usually affected the knee joint (12.3%) and foot (14.7%). Pelvic lesions (48%) were a typical characteristic of secondary bone tumors. Mandibular lesions (11.6%) were observed only in patients with SAPHO syndrome in our study, while other areas of the skull (22.3%) (e.g., frontal, parietal, temporal bones) were commonly affected by secondary bone tumors.

Term selection and model establishment

The G1 dataset was used to select the most relevant features and to build the model. The chi-square test screened out four non-significant terms: 1st–3rd lumbar vertebrae, elbow, wrist, and hand. Four terms found only in SAPHO (the typical bull's-head sign, mandible, sternoclavicular joint, and sternocostal joint) were eliminated to prevent the model from overfitting. The remaining 22 terms, including 2 terms of the anterior chest wall, 4 terms of the ribs, 7 terms of the axial bones, 4 terms of the peripheral joints and bones, the pelvis, and the skull except for the jaw, were each screened by the LASSO, RF, and LR machine learning methods.

The LASSO regression selected 19 variables for the model, as shown in Fig. 1A. The feature influence in random forest screening was sorted by MeanDecreaseAccuracy, and we selected the top 14 features based on the multiple correction error rate, as shown in Fig. 1B. Stepwise logistic regression selected the 13 most appropriate features for building the model, using 12 forward and backward stepwise iterations. The features filtered by the three machine learning methods were sorted by the confidence of their respective models after numerical normalization, as shown in Fig. 1C. Four features from the LASSO regression (knee, ankle, 4th–5th lumbar vertebrae, and cervical vertebrae) had low confidence and were not selected by the other two methods. Our clinical experience also indicated that these features have no significant correlation with SAPHO or SBT. We therefore removed them manually and proceeded with the remaining 15 features to build a final model based on logistic regression.

Fig. 1

(A) The LASSO regression uses tenfold cross-validation to identify the optimal lambda value and selects 19 variables for the model. (B) The feature influence of random forest screening is sorted according to MeanDecreaseAccuracy, and we selected the top 14 features based on the multiple correction error rate. (C) The filtered features of the three machine learning methods are sorted according to the confidence of their respective models after numerical normalization.

To compare the performance of the three groups of models in the G1 dataset, we evaluated the models in both the training and test sets. We used radar charts (Fig. 2B) to display the normalized evaluation parameters. Although the ROC AUC of the LR model (training AUC 0.934, 95% CI 0.917–0.951; testing AUC 0.929, 95% CI 0.887–0.971) was lower than that of the RF model (training AUC 0.939, 95% CI 0.919–0.959; testing AUC 0.938, 95% CI 0.913–0.963), as shown in Fig. 2A, the comprehensive evaluation indices of the LR model were better, including accuracy (84.691%), precision (85.934%), recall (84.691%), and F1-score (0.845). Our analysis showed that the clavicle, 5th–8th thoracic vertebrae, and sacrococcygeal vertebrae can improve the diagnostic performance of the model. After adding these three terms, the comprehensive evaluation indicators of the manually managed model, including accuracy (88.274%), precision (88.675%), recall (88.274%), and F1-score (0.882), improved further. The G2 dataset was used to validate the models externally, and all three models demonstrated consistent results during validation. In the validation set, with ROC AUCs compared using the DeLong test, the AUC of the manually managed model was 0.957 (95% CI 0.924–0.989). The sensitivity was 0.820 (95% CI 0.755–0.885) and the specificity was 0.986 (95% CI 0.956–1.000), corresponding to a Youden index of 0.806.
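The Youden index quoted above is sensitivity + specificity − 1, and a common way to pick an operating point is to take the cutoff that maximizes it. The sketch below checks the reported arithmetic and shows a cutoff search on toy data. The helper names and toy scores are hypothetical, and the text does not say how the cutoff was actually chosen or which diagnosis was coded as positive.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic."""
    return sensitivity + specificity - 1.0

def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold means positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(labels, scores):
    """Return (threshold, J, sensitivity, specificity) with the largest J,
    searching over the observed scores; the lowest such threshold wins ties."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = youden_index(sens, spec)
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best

# Sensitivity and specificity reported for the external validation set.
reported_j = youden_index(0.820, 0.986)

# Toy data, for illustration only.
toy = best_cutoff([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
```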

Fig. 2

(A) In the training and test sets of the G1 dataset, the ROC AUCs of the model built from manually managed features and of the RF model (0.938 and 0.939) are similar, and both are higher than those of the LR model (0.934 and 0.929). (B) Radar plots display the normalized evaluation parameters. The comprehensive evaluation indicators of the manual management group, such as accuracy (88.274%), precision (88.675%), recall (88.274%), and F1-score (0.882), improved further.

Model validation and term ranking

To explore which features have the greatest impact on the diagnostic model, we ranked features by their weights in an LR model built from the 15 manually curated features. In the LR forest plot (Fig. 3), pelvis (OR 38.29, 95% CI 18.11–89.10), fumerus (OR 8.69, 95% CI 3.67–21.84), posterior rib 1st–5th (OR 6.19, 95% CI 2.24–19.16), skull without jaws (OR 5.26, 95% CI 2.66–10.80), posterior rib 6th–12th (OR 4.48, 95% CI 2.16–9.92), anterior rib 6th–12th (OR 4.17, 95% CI 2.21–8.03), and shoulder (OR 3.10, 95% CI 1.69–5.78) had a significant positive impact on the SBT diagnosis. Conversely, sacroiliac joint (OR 0.04, 95% CI 0.01–0.08), sternum (OR 0.10, 95% CI 0.05–0.18), foot (OR 0.21, 95% CI 0.09–0.44), and anterior rib 1st–5th (OR 0.29, 95% CI 0.18–0.46) had a significant positive impact on the SAPHO diagnosis.

Fig. 3

To explore which features have the greatest impact on the diagnostic model, we ranked features by their weights in an LR model built from the 15 manually curated features. We found that pelvis (OR 38.29, 95% CI 18.11–89.10), fumerus (OR 8.69, 95% CI 3.67–21.84), posterior rib 1st–5th (OR 6.19, 95% CI 2.24–19.16), skull without jaws (OR 5.26, 95% CI 2.66–10.80), posterior rib 6th–12th (OR 4.48, 95% CI 2.16–9.92), anterior rib 6th–12th (OR 4.17, 95% CI 2.21–8.03), and shoulder (OR 3.10, 95% CI 1.69–5.78) had a significant positive impact on the SBT diagnosis. Conversely, sacroiliac joint (OR 0.04, 95% CI 0.01–0.08), sternum (OR 0.10, 95% CI 0.05–0.18), foot (OR 0.21, 95% CI 0.09–0.44), and anterior rib 1st–5th (OR 0.29, 95% CI 0.18–0.46) had a significant positive impact on the SAPHO diagnosis.
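In a logistic regression, each odds ratio is the exponential of the fitted coefficient, so an OR above 1 raises the odds of the modeled outcome and an OR below 1 lowers it. The sketch below assumes, from the direction of the results above, that the modeled outcome is SBT, and it re-expresses a protective OR as an OR in favor of SAPHO by taking reciprocals. The helper names are hypothetical, and because the published values are rounded, the reciprocals are only approximate.

```python
import math

def coefficient_from_or(odds_ratio):
    """Logistic regression coefficient (log-odds) for a given odds ratio."""
    return math.log(odds_ratio)

def invert_odds_ratio(odds_ratio, ci_low, ci_high):
    """Re-express an OR for the opposite outcome.
    The point estimate is inverted and the confidence bounds swap places."""
    return 1 / odds_ratio, 1 / ci_high, 1 / ci_low

# Pelvis: OR 38.29 for SBT corresponds to a log-odds coefficient near 3.65.
pelvis_beta = coefficient_from_or(38.29)

# Sacroiliac joint: OR 0.04 (0.01-0.08) for SBT, read as odds for SAPHO.
sij_for_sapho = invert_odds_ratio(0.04, 0.01, 0.08)
```

Read this way, sacroiliac joint involvement favors SAPHO with an odds ratio of roughly 25, which is comparable in magnitude to the pelvis term favoring SBT.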

Discussion

SAPHO syndrome, first proposed in 1987, is a complex condition characterized by a combination of bone/joint and skin lesions17. It poses a significant challenge to clinicians due to its highly heterogeneous nature, especially when secondary bone tumors need to be ruled out18,19. In this study, we present the largest multicenter cohort used to investigate the distributional characteristics of SAPHO and SBT on WBBS. We employed machine learning techniques to screen a large set of WBBS terms and develop an effective differential diagnosis model for SAPHO and SBT. The terms selected by RF, LR, and manual management align with prior research findings, and the model exhibits excellent diagnostic performance.

The findings of this research align with prior studies, indicating that SAPHO syndrome predominantly affects the anterior chest wall6. Enthesitis originates from the costoclavicular ligament and primarily affects the sternoclavicular joint, clavicle, and sternocostal joint. The first anterior ribs are also commonly involved due to their location in the chest wall. In our study, terms such as sternocostal joints, sternoclavicular joints, and the typical bull's-head sign in the anterior chest wall were exclusively associated with SAPHO. To prevent overfitting the model, we decided to remove terms related to these three regions. Metabolic concentration in sternoclavicular joints and sternocostal joints may be observed in non-SAPHO patients as well, but it is primarily attributed to trauma, infectious agents, or degenerative disorders20. While our collection of patients has shown physiological metabolic uptake, no definite pathological changes have been observed. Although there have been some case reports of SBT involving the sternoclavicular or sternocostal joints21,22, it is not reliable to diagnose SBT based solely on the involvement of these parts. The bull's-head sign has long been considered a reliable diagnostic criterion of SAPHO syndrome on whole-body bone scintigraphy23. However, this study, with its larger sample size, found the sign in only approximately 10% of patients with the condition. Involvement of the sternum is a significant predictor of SAPHO syndrome and is often caused by inflammation of the nearby joints24. SBT, on the other hand, typically presents as isolated sternal involvement and is more commonly associated with breast or thyroid cancer25. However, in our research, approximately 40% of prostate cancer cases exhibited sternal metastasis, which may be related to the multiple systemic metastases in the advanced cases we studied.

SAPHO and SBT exhibit distinct distribution patterns in the ribs, axial bones, and other peripheral joints and bones. In the skull, SAPHO mainly affects the mandible, which is another characteristic manifestation besides anterior chest wall involvement. The small number of cases with temporomandibular joint or maxillary involvement in the study are considered extensions of mandibular osteitis26. Jaw involvement may be observed in patients with periodontal disease who do not have SAPHO; importantly, SBT predominantly affects skull regions other than the jaw27. In the model we established, the sacroiliac joint is the strongest influencing factor of SAPHO, similar to other types of seronegative spondyloarthritis. SBT, on the other hand, involves all vertebrae of the spine, especially the upper to mid-thoracic spine. The pelvic lesion appeared to be the strongest predictive term for SBT in the model, while the sacroiliac joints were rarely affected, which aligns with our clinical observations. The archive of the Rizzoli Institute indicates that the pelvis is the most probable site for bone metastasis, apart from the spine28. This trend may be attributable to the blood flow distribution and the bone marrow microenvironment characteristics in this region29. Rib lesions in patients with SAPHO predominantly affect the anterior ribs, particularly the first anterior rib, and the primary source is often traced to the sternocostal joints. In patients with SBT, rib lesions are evenly distributed throughout all regions of the ribs30. In terms of peripheral bones, SAPHO primarily affects the metatarsal joint31, while SBT tends to target the shoulder and femur.

It should be noted that our study has some limitations. Although cross-validation and external validation using the G2 dataset have been implemented, the potential for model overfitting remains. Enhancing the current binary data by expanding it into ordered categorical or continuous variables could improve the expressiveness of the features. Economic considerations and radioprotection measures, along with the low incidence of SAPHO and the widely dispersed population, complicate data collection efforts. Nevertheless, we are committed to gathering data in future studies to increase the volume of analyzable image data. While a multicenter design enhances the applicability of our findings, it is crucial to validate the results in non-Chinese populations.

Conclusion

This study assesses the effectiveness of WBBS terms in distinguishing SAPHO syndrome from SBT and uses machine learning to screen the features. Results obtained from two datasets demonstrate the dependability of the model, which provides a valuable tool for accurate and timely diagnosis.