Introduction

Rheumatoid arthritis (RA) and renal fibrosis (RF) are two complex and debilitating conditions that have garnered significant attention in the medical and scientific communities. Despite their seemingly distinct anatomical locations and initial manifestations, an increasing body of evidence suggests a potential connection between RA and RF1.

RA is an autoimmune disease primarily characterized by erosive arthritis and can afflict individuals at any age. One of the common extra-articular manifestations in RA patients is renal disorders2. Until a few years ago, glomerulonephritis remained the most prevalent renal disease in RA3. The prevalence rate of renal diseases among RA patients ranges from 5 − 50%4. Over time, the cumulative incidence of renal function decline in the RA population is significantly higher than that in the non-RA population5. Taking chronic kidney disease (CKD) as an example, compared with the non-RA population, RA patients have an increased risk of developing CKD6; moreover, the comorbidity of CKD is a risk factor for RA patients failing to achieve clinical remission, which can augment the risks of hospitalization due to infection and mortality7.

In RA, recent studies have leveraged transcriptomic and proteomic data to identify novel diagnostic biomarkers. For example, machine learning analysis of autophagy-related genes in peripheral blood identified EEF2, HSP90AB1, and TNFSF10 as promising diagnostic markers for RA, demonstrating excellent diagnostic accuracy8. Similarly, in RF, integrative bioinformatic analyses of kidney transcriptomes from unilateral ureteral obstruction (UUO) models revealed hub genes such as Bmp1 and CD74, which may serve as biomarkers for chronic kidney disease progression9. These studies highlight the power of ML-driven approaches in dissecting disease heterogeneity and prioritizing candidate biomarkers from complex datasets.

Previous studies have primarily focused on identifying biomarkers for RA or RF in isolation. Existing diagnostic paradigms rely primarily on clinical symptoms, serological markers, and histological assessments, which may fail to detect comorbid conditions at their earliest stages.

In the present study, we utilize bioinformatics and machine learning techniques to investigate potential common pathogenic genes underlying the comorbidity of RA and RF, thereby addressing this critical research gap. By analyzing transcriptomic datasets from patients with RA and RF, we aim to identify common pathogenic genes driving their pathogenesis and elucidate the intricate molecular networks interconnecting these two diseases. Our hypothesis posits that a subset of genes are differentially regulated in both RA and RF, serving as pivotal regulators within the shared pathogenic pathways. The identification of shared biomarkers could enable preemptive interventions, thereby improving outcomes for patients at risk of dual organ damage. The discovery of these genes will not only deepen our fundamental understanding of the relationship between these diseases but also provide valuable insights for potential biomarkers and therapeutic targets.

Methods

Dataset screening and processing

Figure 1 presents the analysis process undertaken in this study.Gene expression profiling data were retrieved from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). In this study, five disparate datasets were employed. For rheumatoid arthritis (RA), two independent gene expression profiles were used for gene screening, namely GSE12021 and GSE55457. In GSE12021, there were 21 synovial membrane samples, with 9 originating from normal controls and 12 from RA patients10. Correspondingly, GSE55457 consisted of 23 synovial membrane samples, where 10 were from normal controls and 13 were from RA patients11. GSE89408 was used as the external validation set for RA, among which 12 were from normal controls and 152 were from RA patients12. Regarding renal fibrosis, two datasets were also incorporated. GSE76882 encompassed 99 control samples and 175 samples of renal fibrosis13 while GSE22459 comprised 25 control samples and 40 renal fibrosis samples14. Subsequently, the gene expression matrix was extracted, and the probes were annotated in accordance with the annotation file of the platform.Detailed information of the dataset samples can be found in the supplementary material table S1.

Fig. 1
figure 1

The flowchart of this study.

Identification of differentially expressed genes (DEGs)

Within the R software environment, the “limma” package was employed to perform an analysis aimed at identifying differentially expressed genes (DEGs)15. For the comparison between RA patients and healthy controls, the datasets GSE55457 and GSE12021 were utilized. Additionally, for the comparison between RF patients and healthy controls, the dataset GSE76882 was used. The significance of the DEGs was determined based on the criteria of a |log2FC)| >1 and an adjusted p-value < 0.05.

Weighted gene co‑expression network analysis (WGCNA) for identifying trait‑related genes

In the present study, we initially employed the R package “WGCNA” to establish a weighted co-expression network on the datasets16 which served as the fundamental infrastructure for subsequent genetic analyses. Subsequently, our attention was directed towards the genes exhibiting the top 30% of expression variance across all profiles. By doing so, we were able to identify specific gene sets and further explore their associations with diseases. In particular, we determined the co-expression modules related to RA and RF. Subsequently, the weighted adjacency matrix was transformed into a topological overlap matrix (TOM), enabling a different perspective on the relationships among genes and providing a more suitable data format for subsequent clustering analyses. Based on the dissimilarity measure (dissTOM = 1 - TOM) derived from the topological overlap, the genes were hierarchically clustered. This clustering approach grouped the genes according to their similarities, facilitating the discovery of potential functional modules and interactions among them. Subsequently, the hierarchical clustering tree method was utilized to select modules containing more than 100 genes. This step further refined the clustering results, focusing on larger modules that likely possess more significant biological implications for in-depth investigation. Finally, from these selected modules, genes that exhibited strong correlations with clinical features were extracted by computing the gene significance (GS) and module membership (MM).

Protein–protein interaction analysis

In this study, the STRING database (version 12.0) was utilized to establish a protein - protein interaction (PPI) network for the RA - RF related DEGs that had been previously identified17. To enhance the reliability and biological relevance of the network, a confidence score threshold of greater than 0.4 was applied during the construction process.

Functional enrichment analysis

In the research, the R package “clusterProfiler” was utilized to conduct KEGG pathway and Gene Ontology (GO) enrichment analyses on the RA-RF related DEGs18. To ensure the reliability and significance of the results, a P - value threshold of less than 0.05 was adopted, following the application of the Benjamini - Hochberg correction for multiple testing. This correction method was crucial in minimizing the false discovery rate and identifying only those results that were truly biologically relevant. Subsequently, for a more intuitive and comprehensive understanding of the significant findings, the R packages “ggplot2” and “Goplot” were employed to visually represent the data.

Identification of disease-related feature genes

For the further identification of hub RA-RF DEGs, two machine - learning methodologies, namely the least absolute shrinkage and selection operator (LASSO) and the random forest algorithm, were adopted. The LASSO algorithm was applied with the aim of minimizing regression coefficients, thereby eliminating redundant and uncorrelated genes from the analytical process19. This step was crucial in mitigating the risk of overfitting, ensuring that the selected genes were more likely to be truly associated with the RA - RF phenotype and enhancing the overall robustness and accuracy of the gene selection model.

The random forest algorithm, on the other hand, is a highly potent tool for gene selection in microarray analyses20. It is renowned for its robustness, which enables it to effectively handle noisy and high - dimensional datasets. Additionally, it provides accurate measures of variable importance, facilitating the identification of genes that play significant roles in the context of RA - RF. This characteristic is particularly valuable in complex genomic studies where the presence of confounding factors and high data dimensionality pose challenges to traditional analytical methods.

The implementation of the LASSO and random forest algorithms was carried out using the “glmnet” and “randomForest” packages in R, respectively. The hub RA - RF DEGs for both RA and RF were ultimately defined as those genes that were consistently identified by all utilized algorithms.

Construction and evaluation of binary logistic regression (LR) model

The binary LR model was used to find the matching coefficients of hub RA-RF-DEGs and compute the risk score for each sample. The nomogram was constructed using the “rms” program.To comprehensively assess the discriminatory power of the scores derived from the nomogram, a series of statistical approaches were implemented. Decision Curve Analysis (DCA) was conducted to evaluate the net benefit of the model across different threshold probabilities, providing insights into its practicality in clinical decision-making. Calibration curve analysis was carried out to examine the agreement between the predicted and observed values, thereby gauging the calibration accuracy of the model. Moreover, the Area Under the Curve (AUC) values were calculated from the Receiver Operating Characteristic (ROC) curves to quantitatively measure the discriminatory ability of the model, with a higher AUC value indicating a better performance.

Assessment of immune cell infiltration in RF

The single-sample gene set enrichment analysis (ssGSEA) was employed to evaluate the infiltration of immune cells in both RF and control samples within the GSE76882 dataset.

Gene set enrichment analysis

The functional roles of the hub RA-RF-DEGs were examined through Gene Set Enrichment Analysis (GSEA). This analysis was performed using the “clusterProfiler” package, with the “C2.cp.kegg.v7.0.symbols.gmt” gene set serving as the reference.

Construction of TF-miRNA regulatory networks for hub PRG-DEGs

For the prediction of microRNAs (miRNAs) that target the hub RA-RF-DEGs, the miRTarBase database was employed21. Subsequently, the Enrichr database was utilized to identify the transcription factors (TFs) associated with these hub RA-RF-DEGs, with a significance threshold set at p < 0.05. Following the identification of miRNAs and TFs, miRNA-TF-mRNA networks were constructed. These networks were then visualized using Cytoscape software, facilitating a more intuitive understanding and in-depth analysis of the complex regulatory relationships among these genetic elements.

Results

Identification of DEGs in RA and RF

In this study, we initially conducted a differential expression gene analysis (DEGs). Volcano plots and heatmaps of the differentially expressed genes were constructed for visualization purposes. In the two RA datasets, namely GSE55457 (Fig. 2A, B) and GSE12021 (Fig. 2C, D), we obtained 341 and 253 DEGs, respectively. In the RF dataset GSE76882 (Fig. 2E, F), 315 DEGs were identified.

Fig. 2
figure 2

Identification of the DEGs in RA and RF. (A) A volcano map illustrating the DEGs in RA patients compared to healthy controls (GSE55457). (B) heatmap of the top 25 upregulated and downregulated DEGs (GSE55457). (C) A volcano map illustrating the DEGs in RA patients compared to healthy controls (GSE12021). (B) heatmap of the top 25 upregulated and downregulated DEGs (GSE12021). (E) A volcano map illustrating the DEGs in RF patients compared to healthy controls (GSE76882). (B) heatmap of the top 25 upregulated and downregulated DEGs (GSE76882).

WGCNA uncovers trait-related genes

For the identification of hub module genes associated with RA and RF, the WGCNA was employed. In the RA datasets, GSE55457 and GSE12021, the analysis revealed that the optimal soft threshold was determined to be 5 (Fig. 3A, D). Concurrently, as the average connectivity approached zero, an increase in the scale-free topology model fit index was observed, with the signed R2 value = 0.9 threshold. Moreover, by setting minModuleSize to 100, a total of 7 co-expression modules were generated (Fig. 3B, E). Notably, the turquoise module exhibited a significant correlation with RA. Subsequently, through the application of stringent cut-off criteria (|MM| > 0.8 and | GS| > 0.5) (Fig. 3C, F), 126 and 117 genes with high connectivity within the clinically relevant RA-related module were respectively identified as hub module genes for further investigation.

Fig. 3
figure 3

WGCNA. (A, D) Determine the best soft threshold. The soft threshold value of 5 was determined as the optimal choice for constructing a scale-free network based on the position of the redline (R²=0.9).(B, E) Correlations between different modules and clinical traits(RA). The turquoise module has the highest correlation with RA. (C, F) Scatterplot of correlations between gene significance (GS) > 0.5 and module membership (MM) > 0.8 in the blue module. ABC corresponds to the RA dataset GSE55457, while DEF refers to the RA dataset GSE12021.(G) Determine the best soft threshold. The soft threshold value of 15 was determined as the optimal choice for constructing a scale-free network based on the position of the redline (R²=0.9).(H)Correlations between different modules and clinical traits(RF), the blue module has the highest correlation with RF.(I) Scatterplot of correlations between GS > 0.4 and MM > 0.8 in the blue module.

Similarly, in the RF dataset GSE76882, the optimal soft threshold was established at 15 (Fig. 3G), ultimately leading to the generation of 6 co-expression modules. The blue module demonstrated a significant correlation with RF (Fig. 3H). By applying the cut-off criteria (|MM| > 0.8 and |GS| > 0.4) (Fig. 3I), 250 genes with high connectivity within the clinically relevant RF-related module were identified as hub module genes.

Identification of RA-RF-DEGs and functional analysis

The intersection operation was performed on the four gene sets of previously obtained DEGs and WGCNA significant module genes. As a result, a total of 10 RA-RF-DEGs were obtained, namely: PSMB90, BIRC3, GZMA, TAP1, CD8A, CD2, CCL5, CD27, GZMK, and LCK (Fig. 4A). Subsequently, a PPI network was constructed with the parameter of medium confidence set to be greater than 0.4(Fig. 4B). KEGG pathway analysis22 demonstrated that the RA-RF-DEGs were predominantly implicated in pathways such as Primary immunodeficiency, NF-kappa B signaling pathway, and TNF signaling pathway. The top 10 pathways are illustrated in Fig. 4C. GO enrichment analysis further disclosed that these RA-RF-DEGs were remarkably enriched in various aspects, namely biological processes (Fig. 4D), cellular components Fig. 4E), and molecular functions (Fig. 4F). In the figures, the top 10 enriched terms were presented, which mainly encompassed T cell activation, membrane raft, MHC class I protein binding, among others.

Fig. 4
figure 4

PPI and fnctional enrichment analysis of RA-RF-DEGs. (A) A Venn diagram of the DEGs and WGCNA module genes identifying 11 RA-RF-DEGs. (B) PPI network of RA-RF-DEGs. (C) Bubble plots showing the top 10 results of KEGG enrichment analysis22 and GO enrichment analysis with (D) BP, (E) CC, and (F) MF.

Identification of hub RA-RF-DEGs through machine learning

Hub RA-RF-DEGs were screened using machine learning (ML) algorithms. LASSO and Random Forest analyses were respectively conducted on the RA dataset and the RF dataset. Through the LASSO analysis, three characteristic genes were obtained from the GSE55457 dataset, as depicted in Fig. 5A, with an Area Under the Curve (AUC) value of 0.985. Additionally, seven characteristic genes were derived from the GSE76882 dataset, shown in Fig. 5C, with an AUC value of 0.861. In the random forest analysis, four characteristic genes were identified for GSE55457, illustrated in Fig. 5B, with an AUC value of 0.908, and ten characteristic genes were determined for GSE76882, as presented in Fig. 5D, with an AUC value of 0.797.

Fig. 5
figure 5

Machine Learning Screening hub RA-RF-DEGs. (A) LASSO coefficient analysis identifies the optimal lambda marked by vertical dashed lines. This determination is based on five cross-validations of adjustment parameters, and the ROC curve is also provided for evaluation(GSE55457). (B) The relationship between the number of random forest trees and the error rate is explored, along with the ranking of RA-RF-DEGs based on their relative importance(importance > 1), and the ROC curve is also provided for evaluation(RA: GSE55457). (C) LASSO coefficient analysis identifies the optimal lambda marked by vertical dashed lines. This determination is based on five cross-validations of adjustment parameters, and the ROC curve is also provided for evaluation(RF: GSE76882). (D) The relationship between the number of random forest trees and the error rate is explored, along with the ranking of RA-RF-DEGs based on their relative importance(importance > 5), and the ROC curve is also provided for evaluation(RF: GSE76882).

Construction and evaluation of the LR model

Firstly, a venn diagram was employed to identify the intersection of genes derived from the ML methods, leading to the discovery of two hub RA-RF-DEGs, namely BIRC3 and PSMB9 (Fig. 6A). Subsequently, further in-depth analyses were carried out to investigate the differential expression patterns of these two hub RA-RF-DEGs. Across the RF training set (GSE76882), it was observed that the expression levels of the two hub RA-RF-DEGs were significantly elevated in comparison to the control group((Fig. 6B).

Fig. 6
figure 6

A nomogram designed for RF. (A) A Venn diagram of the LASSO and RF features identifying two hub RA-RF-DEGs. Violin plots show hub PRG-DEGs expression in control and RF tissues in (B) training set GSE76882.(C) Risk distribution in the training set between RF patients and healthy controls. (D) A nomogram presents the risk distribution within the training set. (E) shows a calibration curve, which evaluates the predictive accuracy of the nomogram for the training set. (F) employs DCA to assess the clinical utility of the nomogram for the training set. Additionally, ROC curves are utilized to evaluate the diagnostic performance of the LR model in the (G) training set. ***P < 0.001;****P < 0.0001.

Thereafter, the LR model was constructed using the hub RA-RF-DEGs, with the final risk score calculated as (1.3095 × BIRC3)+ (0.1785 × PSMB9). In the training set, the risk scores of RF patients were substantially greater than those of the healthy controls(Fig. 6C). A risk nomogram for RF was devised based on the expression levels of these hub RA-RF-DEGs, aiming to formulate a more clinically relevant risk model(Fig. 6D). By constructing decision curves(Fig. 6E) and clinical calibration curves(Fig. 6F), it became evident that our proposed model possessed high predictive capabilities for RF. The LR model achieved an AUC of 0.829 through 10-fold cross-validation (Fig. 6G).

Additionally, the risk scores of RF patients were significantly higher than those of the healthy controls in the validation set GSE22459(Fig. 7A). The calibration and DCA curves exhibited strong diagnostic prediction ability (Fig. 7B and C), with an AUC value of 0.743 (Fig. 7D). External validation using the GSE89048 dataset revealed that the risk score of RA patients was significantly higher (Fig. 7E). The calibration curves, and DCA further exhibited the robust diagnostic prediction ability of the LR model (Fig. 7F and G).The LR model obtained an AUC value of 0.952 for the validation set (Fig. 7H).

Fig. 7
figure 7

Verification of diagnostic efficacy of model in the test set. (A) Risk distribution of RF and healthy control group in the RF test set. (B) Use calibration curves to evaluate the predictive ability of the model in the RF test set. (C) DCA evaluates the clinical benefits of the model for the RF test set. (D) ROC curve evaluates the diagnostic efficacy of the model in the RF test set. (E) Risk distribution of RA and healthy control group in the RA test set. (F) Use calibration curves to evaluate the predictive ability of the model in the RA test set. (G) DCA evaluates the clinical benefits of the model for the RA test set. (H) ROC curve evaluates the diagnostic efficacy of the model in the RA test set. ***P < 0.001;****P < 0.0001.

The role of hub RA-RF-DEGs in RF immune microenvironment

To explore the disparities in the immune microenvironment between RF patients and healthy controls, we evaluated the correlations among 28 immune cell types, as depicted in Fig. 8A. Our findings demonstrated that RF patients exhibited a higher degree of infiltration of several immune cell subsets, including activated B cells, activated CD4 T cells, activated CD8 T cells, activated dendritic cells, central memory CD4 T cells, effector memory CD8 T cells, and myeloid-derived suppressor cells (MDSC), etc. Subsequently, we further probed into the associations between the hub RA-RF-DEGs and the 28 immune cell types. As illustrated in Fig. 8B, BIRC3 and PSMB9 were significantly correlated with a multitude of immune cells.

Fig. 8
figure 8

The immune characteristics in the RF. (A) Boxplot showing the variations in immune cell distribution in the RF patients compared to controls. (B) Correlation analysis between BIRC3 ,PSMB9 and infiltrating immune cells in RF.*p < 0.05; **p < 0.01; ***p < 0.001.

Single-gene GSEA of the hub RA-RF-DEGs

To explore the underlying mechanisms governing the significance of the expression patterns of RA-RF-DEGs in RF, we implemented a single-gene GSEA. The resultant analysis unveiled six pathways that demonstrated statistically significant positive correlations with the expression of two hub RA-RF-DEGs. More specifically, the expression of BIRC3 was positively associated with pathways including chemokine signaling pathway, cytokine cytokine receptor interaction, hematopoietic ccell lineage, natural killer cell mediated cytotoxicity, and systemic lupus erythematosus (Fig. 9A). Likewise, the expression of PSMB9 exhibited positive correlations with chemokine signaling pathway, hematopoietic cell lineage, systemic lupus erythematosus, and toll like receptor signaling pathway (Fig. 9B). These findings not only provide crucial insights into the molecular underpinnings of RA-RF-DEGs in RF but also offer potential targets for future therapeutic interventions.

Fig. 9
figure 9

GSEA and regulatory networks of hub RA-RF-DEGs. The correlation of (A) BIRC3, (B) PSMB9 with the top six significantly enriched pathways. (C) A diagram representing the miRNA-TF-RNA regulatory network. Pink diamond signifies genes, blue V-shapes represent miRNAs, and green squares denote TFs.

Regulatory networks of hub RA-RF-DEGs

In an endeavor to acquire a more profound understanding of the regulatory mechanisms governing the expression of hub RA-RF-DEGs, we carried out a predictive analysis with a specific focus on miRNAs and TFs associated with these genes. After this analysis, Cytoscape was utilized to construct a visual representation of the intricate regulatory network that emerged. This network was composed of 42 miRNAs, 4 TFs, and 2 genes, affording a comprehensive panorama of the interacting elements in play(Fig. 9C).

Discussion

Studies have indicated that the primary cause of renal insufficiency in patients with rheumatoid arthritis (RA) is the nephrotoxicity associated with RA treatment, such as the use of nonsteroidal anti-inflammatory drugs (NSAIDs) and disease-modifying antirheumatic drugs (DMARDs). Additionally, inadequate control of systemic inflammation may lead to mesangial proliferative glomerulonephritis (GN) or secondary amyloidosis. Given the high prevalence and frequent co-occurrence of RA and renal diseases23 which severely impact patients’ quality of life, early diagnosis of these comorbidities is particularly imperative.

In the present study, we aimed to uncover the potential pathogenic genes underlying the comorbidity of RA and RF-related conditions, with the goal of identifying early diagnostic markers. Utilizing transcriptomic data from the GEO database and employing a combination of bioinformatics analysis and machine learning algorithms, we identified BIRC3 and PSMB9 as hub RA-RF-DEGs. Functional enrichment analysis revealed that these genes are primarily associated with immune responses. Given the role of the immune system in the pathogenesis of both RA and RF, this finding is not entirely unexpected. In rheumatoid arthritis, chronic inflammation driven by aberrant immune responses leads to joint damage, while in RF-related conditions, immune-mediated processes are increasingly recognized as key drivers of fibrosis progression. A Logistic Regression model constructed based on the expression profiles of these genes in RF-related contexts further highlights their significance, demonstrating robust predictive performance for RF risk assessment.

Baculoviral IAP Repeat Containing 3 (BIRC3), which belongs to the conserved IAP protein family, is characterized by the presence of one-to-three N-terminal tandem baculovirus IAP repeats24. Previous investigations have revealed that BIRC3 constitutes a part of a signature gene cluster associated with hypoxia-induced inflammation in glioblastoma multiforme25. Additionally, it has been documented that BIRC3 functions as an inhibitor of non-traditional NF - κB signaling in chronic lymphocytic leukemia26. BIRC3 has been identified as a crucial regulator that suppresses cell death and serves as a significant mediator in inflammatory signaling and immunity27. Numerous studies have underlined the essential role of BIRC3 in a variety of diseases28. For instance, Jiang et al. reported that BIRC3 promoted the growth and dissemination of cancer cells in liver cancer via the NF - κB signaling pathway29. Conversely, tissue samples from deceased patients infected with H7N9 demonstrated a reduction in BIRC3 accompanied by an increase in necroptosis30.

Fibroblast-like synoviocytes (FLS) are recognized as pivotal contributors to the pathogenesis of RA31. These cells are capable of producing substantial quantities of inflammatory mediators, encompassing TNF-α and IL-6, which serve to intensify inflammatory processes within the joint environment32. The elevated expression of the BIRC3 gene and its corresponding protein product, cIAP2, within the context of RA, orchestrates a diverse array of cellular functions, including the modulation of apoptosis, inflammatory signaling pathways, immune response dynamics, mitogen-activated protein kinase (MAPK) signaling cascades, and cell proliferation rates33. These regulatory activities collectively foster the survival of FLS and amplify inflammatory responses. Targeted suppression of BIRC3 expression has been demonstrated to curtail the secretion of inflammatory cytokines by RA-FLS under both resting and inflammatory conditions, while also impeding their proliferative capacity.Despite the promising therapeutic potential of BIRC3 inhibitors in the management of RA, careful consideration must be given to their potential adverse effects. Continued investigation into the intricate molecular mechanisms underlying BIRC3 function, particularly its involvement in cell signaling networks, apoptotic regulation, and immune evasion strategies, is imperative for the identification of novel therapeutic targets and the development of innovative treatment modalities34. However, the role of BIRC3 in kidney diseases, particularly in renal fibrosis, remains poorly understood. In kidney disease, a significant upregulation of BIRC3 expression was observed, which was associated with renal ischemia–reperfusion injury35. Recent research by Chen S et al.36 indicated that Endothelial BIRC3 promotes renal fibrosis through modulating Drp1-mediated mitochondrial fission via the MAPK/PI3K/Akt pathway, suggesting that targeting BIRC3 could offer a promising therapeutic strategy to enhance endothelial cell survival and alleviate the progression of CKD.

PSMB9, a subunit of the proteasome, is intimately associated with antigen processing and immune activation37. A research investigation has indicated that a genetic variation at the rs17587 locus within the PSMB9 gene exhibits an association with RA in the Han Chinese population38. Despite the utilization of diverse bioinformatics methodologies to uncover potential biomarkers for RA39,40 limited literature exists regarding the involvement of PSMB9 in the pathological mechanisms underlying RA. However, recent studies have shed light on the possibility that PSMB9 may serve as a critical factor in the development of various immune-mediated disorders, such as systemic lupus erythematosus, and Parkinson’s disease41,42. Research findings have demonstrated that the deletion of Psmb9 in Apoe− / − mice leads to a reduction in atherosclerotic lesion size, plaque vulnerability, and vascular inflammation. Significantly, the endothelial overexpression-induced atherosclerosis and vascular inflammation, which are augmented by PSMB8-AS1, can be attenuated through knocking out Psmb9. Although currently, there is relatively scant research on PSMB9 in the kidney, its established roles in endothelial cells and vascular inflammation may pave the way for further investigations in the future.

Finally, the construction of the miRNA-TF-mRNA regulatory network centered around the hub RA-RF-DEGs added another layer of complexity and understanding. miRNAs and TFs are known to play pivotal roles in post-transcriptional and transcriptional regulation, respectively. By mapping out this network, we were able to identify potential upstream regulators (miRNAs and TFs) that could control the expression of BIRC3 and PSMB9. This knowledge could be exploited to develop innovative therapeutic strategies, such as using miRNA mimics or inhibitors to modulate the expression of the hub genes and ultimately influence the course of RF in RA patients.

However, several limitations of our study should be acknowledged. First, the transcriptomic data obtained from the GEO database, although a valuable resource, may have inherent biases due to differences in sample collection, processing, and experimental conditions across different studies. Second, while our in silico analyses provided strong hypotheses, further experimental validation, including in vitro and in vivo studies, is essential to confirm the functional roles of BIRC3 and PSMB9 and the regulatory mechanisms we proposed. Third, the complexity of the comorbidity between RA and RF likely involves additional factors and genes that were not captured in our current analysis. Future studies should consider integrating multi-omics data and larger patient cohorts to achieve a more comprehensive understanding.

Conclusion

In conclusion, our study has identified BIRC3 and PSMB9 as hub genes in the comorbidity of RA and RF and provided a framework for understanding their roles in the context of immune responses and fibrotic progression. The insights gained from this research offer promising avenues for future investigations and the development of targeted therapies to improve the management of patients with this challenging comorbidity. Continued efforts to bridge the gap between computational predictions and experimental validations will be crucial in translating these findings into clinical practice.