Machine learning analysis of gene expression profiles of pyroptosis-related differentially expressed genes in ischemic stroke revealed potential targets for drug repurposing

Hei, Changchun; Li, Xiaowen; Wang, Ruochen; Peng, Jiahui; Liu, Ping; Dong, Xialan; Li, P. Andy; Zheng, Weifan; Niu, Jianguo; Yang, Xiao

doi:10.1038/s41598-024-83555-5


Article
Open access
Published: 27 February 2025


Scientific Reports volume 15, Article number: 7035 (2025)


Abstract

The relationship between ischemic stroke (IS) and pyroptosis centers on the inflammatory response elicited by cerebral tissue damage during an ischemic stroke event. However, an in-depth mechanistic understanding of their connection remains limited. This study aims to comprehensively analyze the gene expression patterns of pyroptosis-related differentially expressed genes (PRDEGs) by employing integrated IS datasets and machine learning techniques. The primary objective was to develop classification models to identify crucial PRDEGs integral to the ischemic stroke process. Leveraging three distinct machine learning algorithms (LASSO, Random Forest, and Support Vector Machine), models were developed to differentiate between the Control and the IS patient samples. Through this approach, a core set of 10 PRDEGs consistently emerged as significant across all three machine learning models. Subsequent analysis of these genes yielded significant insights into their functional relevance and potential therapeutic approaches. In conclusion, this investigation underscores the pivotal role of pyroptosis pathways in ischemic stroke and identifies pertinent targets for therapeutic development and drug repurposing.


Introduction

Cerebral ischemic stroke (IS) is a significant global health concern, affecting millions of people worldwide each year. Survivors have varying degrees of sequelae that seriously affect the quality of life of patients and their families^1,2,3. Following occlusion of intracranial blood vessels during ischemic stroke, brain cells rapidly die by apoptosis, necrosis and/or pyroptosis. Other pathophysiological alterations elicited by IS include impairment of the neurovascular unit, free radical accumulation, secondary inflammatory responses, ion channel alterations, mitochondrial damage, autophagy, and others, all of which contribute to the final cell death^4,5,6,7. Currently effective treatments for IS are thrombolysis and thrombectomy, which are applicable to less than 5% of patients because of their short intervention window. Thus, it has become critically important to understand the detailed molecular mechanisms involved in ischemic stroke and to identify potential therapeutic targets for drug development.

Pyroptosis is one of the major types of programmed cell death initiated by inflammation⁸. Formation of an inflammasome, composed of Nod-like receptor protein-3 (NLRP3), pro-caspase-1, and apoptosis-associated speck-like protein containing a CARD (ASC), is a critical step for pyroptosis. The inflammasome activates caspase-1, which cleaves IL-1β, IL-18, and gasdermin D (GSDMD)^9,10,11,12,13. Cleaved IL-1β and IL-18 further exacerbate inflammatory responses, while the N-terminal fragment of GSDMD translocates to the plasma membrane to induce pore formation and pyroptotic cell death. Alternatively, pyroptosis can also be induced by the non-canonical caspase-4-, caspase-5- and caspase-11-dependent pathway¹⁴ or by granzyme-mediated and caspase-3-mediated pathways¹⁵. Pyroptosis plays an essential role in mediating cell death and the clinical prognosis of various neurological disorders such as central nervous system infections, autoimmune disorders, neurodegenerative diseases, traumatic brain and spinal cord injuries, and cerebral hemorrhagic and ischemic stroke^16,17,18,19. Recently, an increasing number of studies have searched for pyroptosis-related biomarkers as potential indicators for predicting disease development and as therapeutic targets for drug discovery²⁰.

Several gene expression profiling datasets have been published in the Gene Expression Omnibus (GEO) (www.ncbi.nlm.nih.gov/geo/). These include GSE22255, which consists of 40 samples of gene expression profiles of peripheral blood mononuclear cells from IS patients and age-matched healthy adults; GSE140275, which includes gene expression profiles from blood samples of 3 acute ischemic stroke (AIS) patients and 3 healthy controls; and GSE16561, which consists of 63 whole blood samples from 39 patients with IS and 24 healthy control subjects. In addition, information about pyroptosis-related genes (PRGs) is available in the GeneCards database (www.genecards.org/). These datasets lay a foundation for detailed machine learning analysis to gain insight into genes that might be involved in IS and pyroptosis.

Using machine learning to search for differentially expressed genes in IS patient datasets has drawn great interest. Qin et al.²¹ investigated the role of anoikis-related genes (ARGs) in IS, employing machine learning to classify IS samples, identify key ARGs, and construct diagnostic models. Ren et al.²² discovered inflammation-related biomarkers for IS using machine learning and identified differentially expressed genes (DEGs) between healthy controls and IS patients. Wang et al.²³ identified immunosuppression-related genes in IS, including crosstalk genes linking immunosuppression and IS. Wei et al.²⁴ studied the microRNA expression profile in ischemic stroke and constructed an IS diagnostic signature using logistic regression, identifying 14 differentially expressed miRNAs. To date, there has been no report of a search for pyroptosis-related differentially expressed genes (PRDEGs) in IS datasets.

Martha et al.²⁵ used machine learning to predict stroke outcomes in acute ischemic stroke patients by analyzing inflammatory gene expression. They identified key genes and subject factors associated with infarct and edema volumes. These genes indicated chemoattraction and proliferation of immune cells, influencing stroke outcomes. The findings suggested that machine learning can help develop prognostic biomarkers for stroke outcomes and clarify acute gene expression changes. O’Connell et al.²⁶ used machine learning to validate a stroke-associated gene expression pattern in an independent patient population. They confirmed the robustness and temporal stability of this gene expression pattern as a potential diagnostic marker for stroke.

The aim of this study is to analyze the gene expression pattern of PRDEGs based on an integrated IS dataset using machine-learning methods in order to build predictive models of ischemic stroke samples and identify critically important PRDEGs involved in ischemic stroke. Classification models have been built to distinguish the healthy control samples and the IS patient samples using three machine learning methods (namely, LASSO, Random Forest and Support Vector Machine). As a result, a set of 10 PRDEGs was identified by all three machine learning models. Further analysis of the ten critical genes identified above has revealed the importance of these genes in IS development as well as potential therapeutic targets for IS.

Results

Overall workflow

In this study, we conducted a bioinformatics analysis to explore the expression characteristics of pyroptosis-related genes between normal and ischemic stroke (IS) tissues. Initially, we collected pyroptosis-related genes from literature and databases and combined them with RNA expression data of IS patients from specific GEO datasets. Within these datasets, we identified pyroptosis-related differentially expressed genes (Pyroptosis-Related DEGs).

Next, utilizing a LASSO regression model, we pinpointed a set of significant gene markers. These markers facilitated consensus clustering analysis to determine pyroptosis-associated IS subtypes. A substantial number of DEGs were identified between the different IS subtypes, from which we extracted core hub genes.

Throughout this process, we also conducted analyses of gene expression levels, correlation analysis, and gene location on chromosomes. These analyses helped us to understand the biological background of the DEGs and their potential roles in disease progression. Our findings were further checked through additional validation on the GEO dataset and experimental verification. Moreover, we applied Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis tools to investigate the specific functions of these genes in the pyroptosis pathways and the biological processes involved. Protein–Protein Interaction (PPI) analysis and immune infiltration analysis were also performed to deepen our understanding of the role of DEGs in IS.

Finally, based on the identified hub genes, we screened the Drug Gene Interaction Database (DGIdb) for potential drugs or compounds for the treatment of ischemic stroke. This series of analytical steps provided not only new candidate biomarkers for the diagnosis of ischemic stroke but also potential drug targets for its treatment, which may help advance the development of therapeutic strategies for IS.

The technical flowchart is shown in Fig. 1.

Ischemic stroke related DEGs

The integrated IS dataset (Fig. 2A,B) was obtained by first removing batch effects from the GSE22255, GSE140275 and GSE16561 datasets using the R package sva and then normalizing the combined data using the limma package, so that the expression matrices were consistent between samples. Principal Component Analysis (PCA) was performed on the expression matrix of the dataset to verify the batch correction (Fig. 2C,D), and the results showed that the batch effect in the IS dataset was largely eliminated.
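The idea behind batch correction can be illustrated with a deliberately simplified sketch. The code below is not the sva method used in the study; it only re-centers one gene's values so that every batch shares the same mean, and the expression values and batch labels are hypothetical.

```python
from statistics import mean

def center_by_batch(values, batches):
    """Shift each batch so its mean equals the overall mean.

    A simplified stand-in for batch correction (assumption: location-only
    adjustment, no variance scaling, no protection of biological covariates).
    """
    overall = mean(values)
    batch_means = {
        b: mean([v for v, vb in zip(values, batches) if vb == b])
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]

# Hypothetical expression of one gene in two batches with different baselines.
expr = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_by_batch(expr, batch)
print(corrected)
```

After centering, both batches have the same mean, which is the kind of effect a PCA plot before and after correction is meant to reveal. A real analysis should also account for the IS/Control design so that biological signal is not removed together with the batch effect.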

Differentially expressed genes (DEGs) between the IS group and the Control group were obtained from the IS dataset using the limma package. A total of 12,733 genes were analyzed in the IS dataset, of which 1422 met the threshold of |logFC| > 0 and P.adj < 0.05. Under this threshold, 376 genes were highly expressed in the IS group (low in the Control group, positive logFC) and 1046 genes had low expression in the IS group (high in the Control group, negative logFC). The results of the differential analysis of the IS dataset were plotted on a volcano plot (Fig. 3A).
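The thresholding step can be sketched as follows. This is a minimal illustration that assumes the adjusted P values are Benjamini-Hochberg corrected (the default in limma, though the text does not state the method); the P values and logFC values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def classify(log_fc, p_adj, fc_cut=0.0, p_cut=0.05):
    """Label a gene using the |logFC| and adjusted-P thresholds from the text."""
    if p_adj >= p_cut or abs(log_fc) <= fc_cut:
        return "not significant"
    return "up in IS" if log_fc > 0 else "down in IS"

# Hypothetical values for four genes.
raw_p = [0.001, 0.02, 0.04, 0.5]
log_fc = [0.8, -0.3, 0.5, 1.2]
labels = [classify(fc, p) for fc, p in zip(log_fc, bh_adjust(raw_p))]
print(labels)
```

Note that the third gene has a raw P value below 0.05 but is no longer significant after adjustment, which is why the adjusted value is the one that is thresholded.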

The DEGs obtained from the IS dataset were intersected with the 372 PRGs, yielding a total of 32 PRDEGs, which are shown in a Venn diagram (Fig. 3B), namely: AHSA1, APAF1, ATF6, ATG7, BIRC2, BNIP3L, CEBPB, CTSG, DNMT1, ELAVL1, GSK3B, IKBKE, IL18BP, IL32, IRAK3, JUN, LRPPRC, NDUFA13, NFE2L2, NFS1, NLRP3, OSM, PTGS2, SERPINB1, STAT3, TCEA3, TLR2, TNFSF13B, TRAF3, TREM1, UBE2D3, and VIM. The differential expression of these 32 PRDEGs between the IS and Control groups of the IS dataset was then analyzed and displayed in a heat map (Fig. 3C). The expression of the 32 PRDEGs differed significantly between the two groups (IS/Control).

The genomic position annotation of the 32 PRDEGs was used to analyze their location on human chromosomes (Fig. 3D). They are mainly distributed on chromosomes 1, 2, 6, 11, 14 and 19, with the largest number on chromosome 1 (6 PRDEGs in total). This suggests that PRDEGs located in close proximity to each other at the genomic level may be more closely linked.

Differential expression analysis of PRDEGs

The Wilcoxon rank sum test was used to compare the expression levels of the 32 PRDEGs between the IS and Control groups in the IS dataset, and the results are presented in a group comparison plot (Fig. 4A). The differences in expression between the IS group and the Control group were highly statistically significant (P < 0.01) for all PRDEGs except JUN and AHSA1. Ten genes (DNMT1, ELAVL1, IKBKE, IL18BP, IL32, LRPPRC, NDUFA13, NFS1, TCEA3 and TRAF3) were expressed at higher levels in the Control group than in the IS group, while 20 genes (APAF1, ATF6, ATG7, BIRC2, BNIP3L, CEBPB, CTSG, GSK3B, IRAK3, NFE2L2, NLRP3, OSM, PTGS2, SERPINB1, STAT3, TLR2, TNFSF13B, TREM1, UBE2D3, VIM) were expressed at higher levels in the IS group than in the Control group.
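For readers unfamiliar with the rank sum test, the following is a minimal sketch of the calculation using the normal approximation. It is an illustration only: the function names and expression values are hypothetical, and the variance omits the tie correction that statistical packages normally apply.

```python
import math

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (U statistic of x, two-sided p-value) via normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical expression of one gene in IS and Control samples.
is_group = [6.0, 7.0, 8.0, 9.0, 10.0]
control = [1.0, 2.0, 3.0, 4.0, 5.0]
u_stat, p_value = rank_sum_test(is_group, control)
print(u_stat, p_value)
```

Because the test works on ranks rather than raw values, it makes no normality assumption, which suits expression data pooled from several platforms.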

ROC curves were then plotted for the 32 PRDEGs in the IS and Control groups (Fig. 4B–I). The expression of 20 PRDEGs (AHSA1, APAF1, ATF6, ATG7, CEBPB, CTSG, DNMT1, GSK3B, IL18BP, JUN, NFS1, NLRP3, OSM, PTGS2, SERPINB1, TCEA3, TLR2, TRAF3, TREM1, UBE2D3) in the IS dataset had low accuracy for the diagnosis of IS (0.500 < AUC < 0.700, Fig. 4B–I), whereas the expression of the other 12 PRDEGs (BIRC2, BNIP3L, ELAVL1, IKBKE, IL32, IRAK3, LRPPRC, NDUFA13, NFE2L2, STAT3, TNFSF13B, and VIM) had moderate accuracy (0.700 < AUC < 0.900, Fig. 4B–I).
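The AUC bands used above can be made concrete with a short sketch. The AUC is computed here as the probability that a randomly chosen IS sample scores higher than a randomly chosen Control sample; the scores are hypothetical, and the "high" label for AUC of 0.9 or more and the handling of exact boundary values are assumptions not stated in the text.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def accuracy_band(value):
    """Map an AUC to the qualitative bands used in the text."""
    if value >= 0.9:
        return "high"  # assumption: band not defined in the text
    if value >= 0.7:
        return "moderate"
    if value > 0.5:
        return "low"
    return "no discrimination"

# Hypothetical expression of one gene, used directly as the diagnostic score.
is_scores = [0.9, 0.8, 0.4]
control_scores = [0.5, 0.3, 0.2]
gene_auc = auc(is_scores, control_scores)
print(round(gene_auc, 3), accuracy_band(gene_auc))
```

For a gene that is lower in IS than in Control, this pairwise AUC falls below 0.5; ROC software typically flips the direction in that case so the reported value is at least 0.5.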

Machine learning models based on PRDEGs

To determine the diagnostic value of the 32 PRDEGs in the IS dataset, we first constructed SVM (Support Vector Machine) models based on the 32 PRDEGs, seeking the number of genes that gave the lowest error rate (Fig. 5A) and the highest accuracy (Fig. 5B). The accuracy of the SVM model was highest when the number of genes was 29; these genes were BIRC2, ELAVL1, UBE2D3, ATF6, JUN, DNMT1, GSK3B, TCEA3, IRAK3, SERPINB1, APAF1, AHSA1, STAT3, BNIP3L, NDUFA13, LRPPRC, CTSG, VIM, OSM, TNFSF13B, ATG7, IL18BP, TRAF3, TREM1, TLR2, NFS1, IKBKE, NFE2L2, and CEBPB.

The expression data of the 32 PRDEGs in the IS dataset were also analyzed using the Random Forest (RF) algorithm (Fig. 5C) to determine their predictive value. IncNodePurity (Increase in Node Purity) measures how much splitting on a gene increases node purity; the higher the node purity, the fewer impurities a node contains. Genes were screened for IncNodePurity > 0.5, and 21 diagnostic markers for IS were obtained from the 32 PRDEGs (Fig. 5D), namely: BIRC2, NDUFA13, ELAVL1, IL32, TNFSF13B, BNIP3L, NFE2L2, IRAK3, DNMT1, VIM, JUN, CTSG, STAT3, CEBPB, LRPPRC, IKBKE, PTGS2, NLRP3, GSK3B, ATF6, and TRAF3.

Finally, we performed a logistic regression analysis based on the 32 PRDEGs. All 32 PRDEGs were included in the logistic regression model, and their expression profiles were visualized in a forest plot (Fig. 5E). The PRDEGs-based predictive model was then constructed by LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis, and the results were visualized by plotting the LASSO regression model (Fig. 5F) and the LASSO variable trajectories (Fig. 5G). The resulting model contains a total of 12 PRDEGs, namely: APAF1, ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, OSM, STAT3, and VIM. The nomogram shows the contribution of the 12 PRDEGs to the diagnostic model (Fig. 5H).
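How LASSO reduces 32 candidate genes to a smaller set can be seen in a small sketch. The code below is a simplified illustration, not the study's model: it fits a least-squares LASSO by coordinate descent (the study's classifier would use a logistic loss), and the data, the penalty value and the function names are hypothetical. The key step is soft-thresholding, which sets weak coefficients exactly to zero.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from every feature except j (partial residual).
                other = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - other)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta

# Hypothetical data: feature 0 drives y, feature 1 is nearly irrelevant.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
coefs = lasso(X, y, lam=0.5)
print(coefs)
```

The weak feature receives a coefficient of exactly zero and drops out of the model, while the informative feature is kept with a shrunken coefficient. Increasing the penalty removes more features, which is what the variable trajectory plot traces.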

To determine the accuracy and discriminatory power of the PRDEGs-based model, a calibration curve was plotted to assess how well the probabilities predicted by the model (dashed line) agree with the ideal theoretical probabilities (solid line) (Fig. 5I). In addition, decision curve analysis (DCA) was used to assess the clinical utility of the model (Fig. 5J). The wider the range over which the model's curve stays above the All positive and All negative lines, the greater the net benefit and the more useful the model. The results (Fig. 5I,J) indicate that the model constructed in this study has high accuracy in distinguishing IS from Control.
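The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard net benefit formula; the predicted probabilities and labels are hypothetical and are not taken from the study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of calling a sample positive when prob >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_all_positive(labels, threshold):
    """Reference strategy: call every sample positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted IS probabilities and true labels (1 = IS, 0 = Control).
probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(probs, labels, threshold=0.5)
nb_all = net_benefit_all_positive(labels, threshold=0.5)
nb_none = 0.0  # the All negative strategy has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

A model is considered useful at a threshold when its net benefit exceeds both reference strategies, and a decision curve repeats this comparison across a range of thresholds.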

The PRDEGs from the LASSO regression model, the PRDEGs from the SVM model and those from the random forest model were intersected to obtain a total of 10 PRDEGs and plotted in a Venn diagram (Fig. 6A), which were: ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, STAT3, and VIM. A chord plot was then drawn to show the correlation between the 10 Common PRDEGs based on their specific expression in the IS dataset (Fig. 6B), which showed that the 10 Common PRDEGs were mostly positively correlated with each other, i.e. the PRDEGs have mostly positive interactions with each other.
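The three-way intersection can be reproduced directly from the gene lists reported above. The snippet uses only the lists given in the text; the variable names are illustrative.

```python
# Gene lists as reported in the text for each model.
lasso_genes = {
    "APAF1", "ATF6", "BIRC2", "BNIP3L", "CTSG", "ELAVL1",
    "GSK3B", "IKBKE", "NDUFA13", "OSM", "STAT3", "VIM",
}
svm_genes = {
    "BIRC2", "ELAVL1", "UBE2D3", "ATF6", "JUN", "DNMT1", "GSK3B", "TCEA3",
    "IRAK3", "SERPINB1", "APAF1", "AHSA1", "STAT3", "BNIP3L", "NDUFA13",
    "LRPPRC", "CTSG", "VIM", "OSM", "TNFSF13B", "ATG7", "IL18BP", "TRAF3",
    "TREM1", "TLR2", "NFS1", "IKBKE", "NFE2L2", "CEBPB",
}
rf_genes = {
    "BIRC2", "NDUFA13", "ELAVL1", "IL32", "TNFSF13B", "BNIP3L", "NFE2L2",
    "IRAK3", "DNMT1", "VIM", "JUN", "CTSG", "STAT3", "CEBPB", "LRPPRC",
    "IKBKE", "PTGS2", "NLRP3", "GSK3B", "ATF6", "TRAF3",
}

# Genes selected by all three models.
common = sorted(lasso_genes & svm_genes & rf_genes)
print(len(common), common)
```

All 12 LASSO genes are also in the SVM list, so the random forest list is what removes APAF1 and OSM and leaves the 10 common PRDEGs.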

Further correlation analysis was performed on the expression of the 10 Common PRDEGs in the IS dataset, and the results are presented as a correlation heat map (Fig. 6C), which showed that most of the correlations between the 10 Common PRDEGs were statistically significant (P < 0.05). Correlation scatter plots were then drawn for the three pairs with the highest positive correlations and the three pairs with the strongest negative correlations (Fig. 6D–I). The three most positively correlated pairs were GSK3B and STAT3 (Fig. 6D, r = 0.606, P < 0.001), ATF6 and GSK3B (Fig. 6E, r = 0.564, P < 0.001), and IKBKE and NDUFA13 (Fig. 6F, r = 0.500, P < 0.001). The three most negatively correlated pairs were BNIP3L and IKBKE (Fig. 6G, r = −0.566, P < 0.001), ELAVL1 and STAT3 (Fig. 6H, r = −0.403, P < 0.001), and BIRC2 and NDUFA13 (Fig. 6I, r = −0.396, P < 0.001).
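The r values above are correlation coefficients between the expression of two genes across samples. The sketch below assumes Pearson correlation (the text does not say whether Pearson or Spearman was used), and the expression values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical expression of two genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(gene_a, gene_b)
print(round(r, 3))
```

A coefficient near +1 or −1 indicates a strong linear relationship, while values around ±0.4 to ±0.6, as reported for the gene pairs above, indicate moderate association.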

Analysis of differences between high and low-risk groups and functional enrichment analysis (GO)

To analyze the differences in gene expression between IS samples in the high and low-risk groups of the PRDEGs-based diagnostic model, we used the limma package to obtain DEGs between the two subgroups (High/Low-Riskscore). A total of 12,733 genes were analyzed in the IS dataset, of which 12 met the threshold of |logFC| > 0.5 and P.adj < 0.05. Under this threshold, nine genes were highly expressed in the High-Riskscore group (low expression in the Low-Riskscore group, positive logFC): PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, and XK. Three genes had low expression in the High-Riskscore group (high expression in the Low-Riskscore group, negative logFC): GZMK, VPREB3, and HLA-DQA1. The results of the differential analysis of the IS dataset were plotted on a volcano plot (Fig. 7A).

The expression of the 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) in the two subgroups (High/Low-Riskscore) of the IS dataset was then compared and displayed in a heat map (Fig. 7B), from which it can be seen that the 12 DEGs had significantly different expression profiles between the subgroups.

To analyze the relationship between the 12 DEGs and IS in terms of biological processes, molecular functions, cellular components and biological pathways, a GO (Gene Ontology) functional enrichment analysis (Table 1) was performed on the 12 DEGs. Enrichment entries with P < 0.05 and FDR (q.value) < 0.05 were considered statistically significant. The results showed that the 12 DEGs were mainly enriched in biological processes (BP) such as humoral immune response, leukocyte migration, blood coagulation, chemokine-mediated signaling pathway, response to chemokine, and neutrophil chemotaxis, as well as in cellular components (CC) such as platelet alpha granule lumen and platelet alpha granule. These 12 DEGs were also enriched in molecular functions (MF) such as cytokine activity, cytokine receptor binding, receptor ligand activity, chemokine activity, chemokine receptor binding, and G-protein-coupled receptor binding. The results of the GO enrichment analysis are presented as bubble plots (Fig. 7C) and as a network diagram (Fig. 7D). A joint logFC/GO enrichment analysis of these 12 DEGs was then performed, in which the logFC values of the DEGs are supplied to the enrichment analysis and the corresponding z-scores are calculated. The results are presented as bubble plots (Fig. 7E), from which it can be seen that the GO enrichment results of the 12 DEGs were mainly concentrated in the BP category.
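Enrichment P values of this kind are commonly computed with a hypergeometric (over-representation) test. The sketch below assumes that this is the test behind the reported values, which the text does not state, and all counts are hypothetical.

```python
from math import comb

def enrichment_p(hits, list_size, term_size, background):
    """P(X >= hits) when list_size genes are drawn from a background in which
    term_size genes are annotated to the GO term (hypergeometric upper tail)."""
    total = comb(background, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 3 of 4 listed genes fall in a term that covers
# 5 of 20 background genes.
p_value = enrichment_p(hits=3, list_size=4, term_size=5, background=20)
print(round(p_value, 4))
```

In practice the P value for every term is then corrected for multiple testing, which is where the q.value threshold in the text comes in.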

Table 1 GO enrichment analysis results of IS dataset pyroptosis model high and low risk group DEGs.


Pathway enrichment (KEGG) analysis of DEGs in high and low-risk groups

The 12 DEGs were analyzed for KEGG enrichment (Table 2), with P < 0.05 and FDR (q.value) < 0.05 considered statistically significant. The results showed that the 12 DEGs were significantly enriched in a total of five KEGG pathways: chemokine signaling pathway, cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, Toll-like receptor signaling pathway, and toxoplasmosis (Fig. 8A). In addition, the KEGG pathway enrichment results were presented as a chord diagram (Fig. 8B), and pathway maps drawn with the R package Pathview show the gene expression in each of the five enriched pathways: Chemokine signaling pathway (Fig. 8C), Cytokine-cytokine receptor interaction (Fig. 8D), Viral protein interaction with cytokine and cytokine receptor (Fig. 8E), Toll-like receptor signaling pathway (Fig. 8F), and Toxoplasmosis (Fig. 8G).

Table 2 KEGG enrichment analysis results of IS dataset Pyroptosis Model high and low risk group DEGs.


GSEA and GSVA enrichment analysis of the high and low-risk groups of the IS dataset

To determine the effect of the expression levels of metabolism-related DEGs in ischemic stroke (IS) on the occurrence of IS, the expression of all genes in the IS dataset was analyzed by GSEA between the high and low-risk groups of the PRDEGs-based diagnostic model. The association of gene expression with the biological processes involved, the cellular components affected, and the molecular functions performed was analyzed, with P < 0.05 and FDR (q.value) < 0.05 as the criteria for significant enrichment.

The results showed that genes differing between the high and low-risk groups of the PRDEGs-based diagnostic model were significantly enriched in Brain Hcp With H3k27me3 (Fig. 9B), Neuroactive Ligand Receptor Interaction (Fig. 9C), Regulation Of Autophagy (Fig. 9D), SARS Coronavirus and Innate Immunity (Fig. 9E), Interferon Alpha Beta Signaling (Fig. 9F), Il2 Stat5 Pathway (Fig. 9G) and other pathways (Fig. 9A–G, Table 3).

Table 3 GSEA enrichment analysis results of IS dataset Pyroptosis Model high and low risk group genes.


GSVA (Gene Set Variation Analysis) was performed on all genes in the IS dataset to explore the variability of the c2.cp.all.v2022.1.Hs.symbols.gmt gene sets between the high and low-risk groups, with P < 0.05 and |logFC| > 0.35 as the criteria for significant enrichment (Fig. 10, Table 4). A total of 19 gene set pathways differed between the two groups: Gdp Fucose Biosynthesis, Synthesis Of Dolichyl Phosphate, Cytosine Methylation, Formation Of Xylulose 5 Phosphate, Beta Oxidation Of Butanoyl Coa To Acetyl Coa, Aurka Targets, Wax And Plasmalogen Biosynthesis, Cerebral Organic Acidurias Including Diseases, Chylomicron Clearance, Diseases Of Base Excision Repair, Mirna Biogenesis, Synthesis Of Ketone Bodies, Runx1 Regulates Transcription Of Genes Involved In Wnt Signaling, Melanin Biosynthesis, Aml Methylation Cluster 7 Dn, Cohesin Loading Onto Chromatin, Covid19 Thrombosis And Anticoagulation, Regulation Of Gene Expression By Hypoxia Inducible Factor, and Signaling To P38 Via Rit And Rin. Six pathways (Melanin Biosynthesis, Aml Methylation Cluster 7 Dn, Cohesin Loading Onto Chromatin, Covid19 Thrombosis And Anticoagulation, Regulation Of Gene Expression By Hypoxia Inducible Factor, and Signaling To P38 Via Rit And Rin) had significantly higher enrichment scores in the high-risk group than in the low-risk group, while the other 13 pathways had significantly higher enrichment scores in the low-risk group than in the high-risk group. The enrichment scores of the 19 pathways in the two subgroups (High/Low-Riskscore) were also presented in a heat map (Fig. 10A), which showed that the 19 pathways differed significantly between the subgroups. The Mann–Whitney U test likewise showed statistically significant differences in enrichment scores between the subgroups (P < 0.05) for all pathways except Wax And Plasmalogen Biosynthesis and Covid19 Thrombosis And Anticoagulation, for which the differences were not statistically significant (P > 0.05).

Table 4 GSVA enrichment analysis results of IS dataset Pyroptosis model high and low risk group genes.


Construction of protein–protein interaction (PPI) networks and mRNA-miRNA, mRNA-RBP and mRNA-TF interaction networks

The STRING database was used to perform protein–protein interaction analysis [minimum required interaction score: low confidence (0.150)] on the 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) between the high and low risk groups of the PRDEGs diagnostic model in the IS dataset, and a protein–protein interaction (PPI) network of the 12 DEGs was constructed (Fig. 11A). The DEGs in the PPI network with links to other genes were taken as key genes (hub genes) for IS, giving 11 hub genes, namely: PF4V1, LY96, PROS1, GNG11, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, and HLA-DQA1. These 11 hub genes were then analysed for functional similarity, with the semantic similarity among GO terms, sets of GO terms, gene products and gene clusters calculated using the R package GOSemSim. The results showed that among the 11 hub genes, VPREB3 had no functional similarity value with the other hub genes, and LY96 had the highest functional similarity values with the other hub genes (Fig. 11B).
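The hub gene rule described above (keep every gene that has at least one link in the network) amounts to a degree filter on the interaction graph. The sketch below illustrates it with placeholder gene names and hypothetical edges; it does not reproduce the STRING network from the study.

```python
from collections import Counter

def connected_genes(nodes, edges):
    """Return the nodes that take part in at least one interaction."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [gene for gene in nodes if degree[gene] > 0]

# Placeholder gene names and hypothetical interactions.
nodes = ["G1", "G2", "G3", "G4"]
edges = [("G1", "G2"), ("G2", "G3")]
hubs = connected_genes(nodes, edges)
print(hubs)
```

The isolated node is dropped, which mirrors how 11 of the 12 DEGs were retained as hub genes in the text. Stricter definitions of a hub, such as a minimum degree above one, are also common and would give a smaller set.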

The mRNA-miRNA data in the miRDB database were used to predict the miRNAs that interact with the 11 key genes, and the mRNA-miRNA interaction network was drawn for visualization (Fig. 11C). In this network, sky-blue oval blocks are mRNAs and orange hexagonal blocks are miRNAs. The mRNA-miRNA interaction network was composed of 9 hub genes and 30 miRNA molecules, constituting a total of 32 pairs of mRNA-miRNA interactions.

The RNA binding proteins (RBPs) interacting with the 11 key genes were predicted using the ENCORI database, and the mRNA-RBP interaction network was drawn for visualization (Fig. 11D). In this network, sky-blue circular blocks are mRNAs and lavender diamond-shaped blocks are RBPs. The mRNA-RBP interaction network consisted of 7 hub genes and 68 RBP molecules, constituting a total of 120 pairs of mRNA-RBP interactions, of which the hub gene PROS1 interacted with 54 RBP molecules.

Finally, the CHIPBase database (version 3.0) and the hTFtarget database were used to find transcription factors (TFs) that bind to the hub genes. The interaction relationships were intersected with the 11 hub genes, and the interaction data of 7 hub genes and 70 transcription factors were obtained and visualized (Fig. 11E). In the mRNA-TF interaction network, sky-blue oval blocks are mRNAs and light green triangular blocks are transcription factors. The key gene VPREB3 interacted with 56 transcription factors.

Analysis of differences in ssGSEA immune characteristics between high and low risk groups of the PRDEGs diagnostic model in the IS dataset

To investigate the differences in immune infiltration between the high and low risk groups of the PRDEGs diagnostic model in the IS dataset, the ssGSEA algorithm was used to calculate the infiltration abundance of 28 immune cell types in the samples of each group. The differences in infiltration of the 28 immune cell types between the two groups were then analysed with the Mann–Whitney U test, and the results are presented in grouped comparison plots (Fig. 12A). The results showed statistically significant differences (P < 0.05) in the infiltration abundance of 10 immune cell types between the high and low risk groups, namely Activated B cell, Activated CD8 T cell, CD56dim natural killer cell, Central memory CD4 T cell, Effector memory CD4 T cell, Effector memory CD8 T cell, Immature B cell, MDSC, Memory B cell, and T follicular helper cell. The correlations between the infiltration abundances of these 10 immune cell types were further calculated (Fig. 12B,C), showing that in both subgroups of the IS dataset the infiltration abundances of the 10 immune cell types were mostly positively correlated with each other, and that Effector memory CD4 T cells and Activated CD8 T cells were positively correlated with each other (Fig. 12B,C).

The correlations between the infiltration abundance of the 10 immune cell types and the expression of the 11 key genes (hub genes: PF4V1, LY96, PROS1, GNG11, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) were also calculated separately for the low-risk (Fig. 12D) and high-risk (Fig. 12E) patient samples. Correlations were screened at P < 0.05 and presented as correlation heat maps (Fig. 12D,E), which showed a number of significant correlations between immune cell content and the expression of the 11 hub genes in both subgroups. In the low-risk group of the IS dataset (Fig. 12D), the hub gene IFNB1 was negatively correlated with the infiltration abundance of most immune cells, while the hub gene VPREB3 was positively correlated. In the high-risk group of the IS dataset (Fig. 12E), the hub genes IFNB1 and LY96 were negatively correlated with the infiltration abundance of most immune cells, while the hub genes VPREB3, HLA-DQA1, and GZMK showed relatively more significant positive correlations.

Discussion

Ischemic stroke (IS) is a major global health problem. Despite rapid advances in medical care worldwide, stroke remains a leading cause of death and disability²⁷. The disease is not confined to elderly people: its incidence in younger people is increasing every year^28,29,30, and common risk factors include smoking, hypertension, hypercholesterolemia, and hyperglycemia. Stroke also threatens the life and health of children and can lead to serious neurological deficits³¹. To date, no reliable molecular biomarkers are available in clinical practice to accurately predict IS prognosis.

Pyroptosis, an inflammatory form of cell death, is thought to be one of the major types of brain cell death after IS and a factor that affects the progression of IS³². Suppression of pyroptosis could promote neuronal survival and ameliorate neurological damage, thereby improving the prognosis and reducing mortality in IS patients^33,34,35. Because brain tissue biopsy is not well accepted by patients and their relatives, studying the expression pattern of PRDEGs in blood samples of IS patients deposited in public IS datasets may provide a tool for predicting IS prognosis and may reveal targets for drug development.

Our study encompassed a comprehensive machine learning analysis of both the Control and the ischemic stroke (IS) samples. We first collected and combined three IS expression profile datasets into one integrated IS dataset consisting of 62 IS patients and 47 controls, covering the expression of 12,733 genes in blood samples. Among them, 1422 genes met the threshold of |logFC| > 0 and P.adj < 0.05, of which 376 were highly expressed in the IS group. Intersecting the PRGs with the DEGs narrowed the number of genes down to 32. Logistic/LASSO regression analysis, random forest, and SVM algorithms were applied to the 32 PRDEGs to construct predictive models and determine the predictive value of the PRDEGs in the dataset. Through a detailed evaluation involving differential gene expression pattern analysis and ROC analysis (Fig. 4), we identified a group of 10 critical PRDEGs (ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, STAT3, and VIM). These genes participate in diverse cellular processes, and their interconnection with IS is a subject of ongoing investigation. Below, we examine some of the implications of these gene associations and discuss potential therapeutic interventions for IS.

The differential gene expression analysis of IS samples indicates an up-regulation of ATF6 (Activating Transcription Factor 6) (Fig. 4A). Furthermore, ATF6 demonstrates a moderate level of predictive power for IS against the Control, as evidenced by an AUROC value of 0.67 (Fig. 4B). This observation can be attributed to ATF6’s role as a cellular response mechanism that aids cells in managing stress resulting from the accumulation of misfolded proteins in the endoplasmic reticulum (ER), induced by various cellular stressors including ischemia³⁶. This alignment with existing literature underscores ATF6’s significance in the context of ischemic stroke. The practical implication of this discovery lies in the potential utility of ATF6-activating compounds to mitigate ischemic stroke-related cellular stress. Notably, compounds such as BiX (1-(3,4-dihydroxyphenyl)-2-thiocyanate-ethanone) and compound 147, as mentioned in reference³⁶, have demonstrated the ability to activate ATF6, offering a foundation for the development of novel pharmacological therapies for ischemic stroke.

The analysis also revealed an up-regulation of BIRC2 (Baculoviral IAP Repeat Containing 2) in the IS group relative to the Control group, supported by a P-value of less than 0.001 (Fig. 4A). BIRC2 is a member of the inhibitor of apoptosis (IAP) family³⁷. It plays a role in regulating the caspase (cysteine-aspartic protease) pathway, inhibiting apoptosis and promoting cell cycle progression^38,39,40. In cerebral ischemia, suppression of BIRC2 leads to exacerbated neurological functional deficits and increased apoptosis and infarct volume³⁷. In contrast, upregulation of BIRC2 provides protection against type 2 diabetes and hypercholesterolemia, both of which are risk factors for inducing and aggravating ischemic stroke^41,42,43. Remarkably, BIRC2 emerges as a potent diagnostic discriminator, as evidenced by an area under the ROC curve (AUROC) value of 0.79 (Fig. 4C), effectively distinguishing IS samples from the Controls. This finding is consistent with a recent study by Zhang et al.⁴⁴ and provides valuable insights for exploring the underlying mechanisms of ischemic stroke, as well as for developing innovative diagnostic and therapeutic avenues.

BNIP3L (BCL2 Interacting Protein 3 Like), which is a mitochondrial autophagy receptor^45,46, plays a neuroprotective role in IS^47,48. The up-regulation of BNIP3L identified in our analysis (Fig. 4A) holds particular significance due to its involvement in activating mitophagy, a process that confers protective effects against IS damage. The AUROC value of 0.701 (Fig. 4C) underscores BNIP3L’s efficacy as a diagnostic marker, illuminating its role as a response mechanism to IS, aimed at safeguarding the ischemic brain region. Corroborating this, research by Li et al.⁴⁹ indicates that BNIP3L knockout mice exhibit larger brain infarct areas and exacerbated neurological deficits, which can be ameliorated through BNIP3L overexpression. Consequently, compounds capable of stimulating BNIP3L expression offer promise as therapeutic agents against IS. For instance, hydroxamic acid-based histone deacetylase (HDAC) inhibitors have been identified as inducers of BNIP3L expression⁵⁰, warranting exploration as potential small molecule drug candidates for this purpose.

Cathepsin G (CTSG) is a serine protease of the chymotrypsin family⁵¹ that is released directly at sites of inflammation, where it causes platelet secretion and aggregation. It plays an important role in cell signaling, extracellular matrix degradation, and cytokine processing, and can be pro-inflammatory or promote thrombosis when platelets interact with neutrophils^52,53,54. Our analysis indicates an up-regulation of CTSG in IS compared to the Control group (Fig. 4A), and this up-regulation is statistically significant (P = 0.01). Moreover, the AUROC value of 0.66 (Fig. 4C) indicates some diagnostic potential for distinguishing IS samples from Controls. This observation is significant given CTSG’s implication as an inflammation-related gene product. Prior research has demonstrated that knockdown of CTSG inhibits apoptosis^55,56 and that CTSG inhibition reduces myocyte death after cardiac ischemia and reperfusion⁵⁷, so CTSG inhibitors emerge as a promising avenue for therapeutic intervention against IS. A pertinent example is the Cathepsin G inhibitor Bis-Napthyl Beta-Ketophosphonic Acid (DrugBank ID: DB02360), which holds potential as a small molecule drug candidate for ischemic stroke.

Within our analysis, ELAVL1 exhibits down-regulation in the IS group relative to the Control group (Figs. 4A and D). This modulation is hypothesized to be a response mechanism to mitigate neuronal impairment in cerebral ischemia/reperfusion events. Support for this notion is derived from a study by Du et al.⁵⁸, which underscores the role of ELAVL1 in ferroptosis-induced neuronal protection. Although explicit ELAVL1-downregulating compounds are not yet established, ongoing research aims to identify small molecules capable of disrupting the interaction between ELAVL1 and its target mRNAs. Such molecules have the potential to modulate the stabilizing influence of ELAVL1 on mRNA molecules, thereby influencing changes in gene expression⁵⁹.

GSK3B (glycogen synthase kinase 3 beta) exhibits up-regulation in the IS samples based on our analysis (Fig. 4A). The AUROC value of 0.65 (Fig. 4D) indicates that GSK3B has modest discriminatory power in separating the IS samples from the Control samples. Research by Li et al.⁶⁰ underscores the potential of GSK3 inhibition in reducing cerebral ischemia/reperfusion injury. In light of this, GSK3B inhibitors emerge as a promising avenue for the treatment of ischemic stroke. For instance, Tideglusib, a chemical inhibitor of GSK3B, has demonstrated a capacity to attenuate hypoxic-ischemic brain injury in neonatal mice⁶¹. This supports the potential utility of GSK3B inhibitors, such as Tideglusib, as candidates for therapeutic intervention in IS patients.

IKBKE (inhibitor of nuclear factor kappa B kinase subunit epsilon) is a gene involved in pyroptosis during ischemic stroke (IS). Its expression is down-regulated in the IS group compared to the Control group (Fig. 4A). The area under the ROC curve (AUROC) is 0.72 (Fig. 4D), indicating a moderate level of discriminatory power in distinguishing IS samples from Control samples. Given the context-dependent nature of IKBKE’s function⁶², our analysis suggests that further in vitro and in vivo tests should be conducted with various modulators of this target to identify potential drug candidates for treating IS.

STAT3 (Signal transducer and activator of transcription 3) (Fig. 4A) is another factor differentially expressed, with an AUROC of 0.705, suggesting a moderate level of power in distinguishing the IS samples from the Control samples. Studies have highlighted the essential role of STAT3 in maintaining cerebrovascular integrity and its significance in providing protection against cerebral ischemia. Notably, endothelial STAT3 could serve as a potential therapeutic target for mitigating endothelial dysfunction post-stroke⁶³.

NDUFA13 (NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 13) is down-regulated (Fig. 4A), with an AUROC of 0.79 (Fig. 4F). Notably, studies have shown that a moderate down-regulation of NDUFA13 is associated with suppressed superoxide burst and reduced infarct size during the ischemia–reperfusion process⁶⁴, highlighting the significant role that NDUFA13 plays in ischemic stroke.

Our analysis demonstrates an upregulation of VIM (Vimentin) (Fig. 4A), with an AUROC of 0.704 (Fig. 4I). VIM is a cytoskeletal intermediate filament protein^65,66 that maintains the capacity for tissue repair and tissue integrity^67,68. Vimentin recruits inflammatory cells from the bloodstream and enhances inflammatory responses. It impairs vascular function by regulating vascular tone and affects blood vessel remodeling⁶⁷, which could exacerbate stroke progression and hinder recovery. Interestingly, circulating VIM was positively associated with future IS incidence in a population-based cohort study⁶⁹. Furthermore, VIM is associated with stroke risk factors: it is highly expressed in diabetic pancreatic islet cells^70,71 and mediates hyperglycemia-induced dispersal of the Golgi apparatus⁷². VIM deficiency prevents high-fat diet-induced obesity and insulin resistance in mice⁷³. These results suggest that VIM is closely associated with IS risk factors and with IS itself. In addition, Fasipe et al.⁷⁴ found that extracellular VIM significantly promotes the formation of von Willebrand factor (VWF) strings through A2 domain binding, and that pharmacologically targeting VIM/VWF interactions improves reperfusion after IS. These studies demonstrate the critical role of VIM in stroke pathology, and further investigation is necessary to confirm whether inhibiting VIM could contribute to the recovery of IS patients from ischemic brain injury.

Using |logFC| > 0.5 and P.adj < 0.05 as screening criteria, a total of 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) with significant differences between the high- and low-risk groups of the IS dataset were identified. To understand the potential roles, pathways, and biological properties of these 12 DEGs, KEGG, GO, GSEA, and GSVA enrichment analyses were performed in this study. The KEGG results showed that these 12 genes were mainly enriched in five KEGG pathways: Chemokine signaling pathway, Cytokine-cytokine receptor interaction, Viral protein interaction with cytokine and cytokine receptor, Toll-like receptor signaling pathway, and Toxoplasmosis.

GO enrichment analysis showed that these genes were mainly enriched in biological processes (BP) such as humoral immune response, leukocyte migration, and blood coagulation; molecular functions (MF) such as cytokine activity, cytokine receptor binding, and receptor ligand activity; and cellular components (CC) such as platelet alpha granule lumen and platelet alpha granule. Taken together, the results of these enrichment analyses allow an assessment of the potential value of PRDEGs in disease diagnosis. Numerous studies have shown that pyroptosis is also involved in immunity, which is consistent with the GO and KEGG results in our study. Pyroptosis leads to a range of inflammatory and immune responses and involves cell swelling, plasma membrane lysis, and chromatin fragmentation^75,76. Evidence suggests that ischemic stroke patients frequently exhibit pronounced immune-inflammatory responses^77,78. The immune system has been shown to be involved in all phases of ischemic stroke, from the early damage events that follow arterial occlusion to later tissue repair, and the damaged brain in turn exerts immunosuppressive effects that promote potentially fatal complications and threaten the survival of post-stroke patients^79,80.

Materials and methods

Data sources and pre-processing

We downloaded the expression profile datasets GSE22255⁸¹, GSE140275⁸² and GSE16561⁸³ for patients with IS from the GEO database⁸⁴ using the R package GEOquery⁸⁵. All datasets were obtained from Homo sapiens.

The GSE22255 dataset consists of 40 gene expression profiles of peripheral blood mononuclear cells (PBMCs): 20 from IS patients and 20 from age-matched healthy adults. The GSE140275 dataset includes gene expression profiles from blood samples of 3 acute ischemic stroke (AIS) patients and 3 healthy controls. The GSE16561 dataset consists of 63 gene expression profiles from whole blood samples of 39 patients with IS and 24 healthy control subjects.

The data platforms for GSE22255, GSE140275, and GSE16561 are GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array, GPL16791 Illumina HiSeq 2500 (Homo sapiens), and GPL6883 Illumina HumanRef-8 v3.0 expression beadchip, respectively. Probes in these datasets were annotated using the corresponding GPL platform files. All samples from these datasets were included for further analysis. Please refer to Tables 5 and 6 for specific dataset information.

Table 5 Ischemic stroke data set information list.

Table 6 The clinical characteristics of datasets.

We collected Pyroptosis-Related Genes (PRGs) from the GeneCards database⁸⁶ (https://www.genecards.org/), which provides comprehensive information on human genes. A total of 372 PRGs at the mRNA level were obtained using “pyroptosis” as the search term and “protein coding” as the filtering criterion. In addition, we searched MSigDB (Molecular Signatures Database)⁸⁷ with “pyroptosis” as the search term and found 27 genes in the REACTOME_PYROPTOSIS.v2022.1.Hs reference gene set. After merging and de-duplication, we obtained a total of 372 PRGs, as listed in Table S1.

Differentially expressed genes associated with ischemic stroke

To identify the potential mechanisms, biological features, and pathways of differential genes in IS, the datasets GSE22255, GSE140275, and GSE16561 were merged, the batch effect was removed, and the merged data were standardized to obtain the integrated IS dataset. Principal component analysis (PCA)⁸⁸ was performed on the expression matrix before and after removal of the batch effect to verify that the correction was effective.

Differential analysis was then performed on the IS dataset using the LIMMA package, and DEGs (differentially expressed genes) were selected for further study using the following criteria: |logFC| > 0 and P.adj < 0.05. Among these DEGs, genes with logFC > 0 and P.adj < 0.05 were considered upregulated DEGs, while genes with logFC < 0 and P.adj < 0.05 were considered downregulated DEGs.
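The thresholding rule described above can be illustrated with a short sketch. It is written in Python for illustration only and is not the authors' R/LIMMA code; the helper name `classify_deg`, the gene names, and the numerical values are hypothetical placeholders.

```python
def classify_deg(log_fc, p_adj, lfc_cutoff=0.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    # A gene must pass both the adjusted P-value and the |logFC| cutoffs.
    if p_adj >= p_cutoff or abs(log_fc) <= lfc_cutoff:
        return "ns"
    return "up" if log_fc > 0 else "down"


# Placeholder (logFC, adjusted P) pairs, not values from the study.
toy_results = {
    "GENE_A": (0.42, 0.003),
    "GENE_B": (-0.31, 0.020),
    "GENE_C": (0.10, 0.300),
}
labels = {gene: classify_deg(lfc, p) for gene, (lfc, p) in toy_results.items()}
print(labels)
```

Passing `lfc_cutoff=0.5` would apply the stricter fold-change cutoff that the text uses for the high-risk versus low-risk comparison.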

The DEGs were intersected with the 372 Pyroptosis-Related Genes (PRGs) and a Venn diagram was used to obtain the IS disease-related Pyroptosis-Related Differentially Expressed Genes (PRDEGs) (Table 7). The results of the analysis are presented in a volcano map and a heat map.

Table 7 IS dataset pyroptosis related differentially expressed genes list.

Constructing predictive models for ischemic stroke based on PRDEGs

Logistic regression analysis of the integrated IS dataset was performed to obtain a predictive model based on the PRDEGs. Logistic regression models were constructed by screening PRDEGs with a P-value < 0.10, and the expression of the PRDEGs included in the logistic regression models was presented in a Forest Plot.

Next, we applied LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis to the PRDEGs included in the above logistic regression model using the R package glmnet⁸⁹. The parameters set.seed(500) and family = “binomial” were used, and the analysis was run for 200 cycles. LASSO regression reduces model overfitting and improves model generalization by adding a penalty term (lambda × the sum of the absolute values of the regression coefficients) to the regression objective. The results of the LASSO regression analysis were plotted as a nomogram using the R package rms⁹⁰.
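To make the penalty term concrete, the following Python sketch writes out an L1-penalized binomial objective and the soft-thresholding operation through which an L1 penalty can set coefficients exactly to zero. It is an illustration under stated assumptions, not the glmnet implementation: the function names are hypothetical, and the objective shown (mean negative log-likelihood plus lambda times the L1 norm) is one common way of writing the LASSO criterion for a binary outcome.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic_loss(coefs, intercept, rows, labels, lam):
    """Mean binomial negative log-likelihood plus lam * sum(|coef|)."""
    nll = 0.0
    for x, y in zip(rows, labels):
        z = intercept + sum(b * xi for b, xi in zip(coefs, x))
        # log(1 + exp(z)) - y * z is the negative log-likelihood of one sample
        # when y is coded 0 (Control) or 1 (IS).
        nll += math.log1p(math.exp(z)) - y * z
    return nll / len(rows) + lam * sum(abs(b) for b in coefs)


# Toy data: one gene, two samples (placeholder values).
print(lasso_logistic_loss([0.0], 0.0, [[1.0], [2.0]], [0, 1], lam=0.1))
print(soft_threshold(2.0, 0.5), soft_threshold(-0.3, 0.5))
```

Larger values of `lam` push more coefficients to zero, which is how the LASSO step narrows a candidate gene list.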

Random forest (RF)⁹¹ is an algorithm that integrates multiple decision trees through ensemble learning and belongs to the bagging (bootstrap aggregation) category of ensemble algorithms. It constructs multiple decision trees and predicts a sample by aggregating the predictions from each tree and selecting the final result through voting. We used the randomForest package⁹² to construct the model from the expression matrix of PRDEGs in the IS dataset, with the parameters set.seed(234) and ntree = 1000.
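The two ingredients of bagging named above, bootstrap resampling and voting, can be sketched as follows. This is a minimal Python illustration with hypothetical helper names and placeholder votes; it does not train any trees and is not the randomForest package.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(234)  # seed used only to make this toy example repeatable
resample = bootstrap_indices(6, rng)  # rows a single tree would be grown on
votes = ["IS", "Control", "IS", "IS", "Control"]  # placeholder per-tree outputs
print(resample, majority_vote(votes))
```

In a full random forest each tree is fitted on its own resample, and the vote is taken over all trees (1000 in the configuration described above).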

Furthermore, we constructed an SVM (Support Vector Machine)⁹³ model based on the Pyroptosis-Related DEGs (PRDEGs) and selected those PRDEGs based on the highest accuracy and lowest error rate. The PRDEGs included in the LASSO regression model, SVM model, and Random Forest model were intersected, and a Venn diagram was plotted to obtain the common PRDEGs.
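The intersection step behind the Venn diagram amounts to a set operation. The sketch below uses Python sets with placeholder gene labels; the selections shown are hypothetical and do not reproduce the study's gene lists.

```python
def common_genes(*gene_sets):
    """Genes present in every selection, sorted for stable output."""
    return sorted(set.intersection(*(set(genes) for genes in gene_sets)))


# Placeholder selections from three models (not the study's results).
lasso_selected = {"G1", "G2", "G3", "G4"}
rf_selected = {"G2", "G3", "G4", "G5"}
svm_selected = {"G3", "G4", "G6"}
shared = common_genes(lasso_selected, rf_selected, svm_selected)
print(shared)
```

Requiring agreement across all three models is a conservative choice: a gene is kept only if every algorithm selects it.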

Finally, we evaluated the accuracy and discriminative power of the PRDEGs diagnostic model using calibration analysis and plotted a calibration curve based on the pyroptosis-related hub genes. Decision Curve Analysis (DCA)⁹⁴, a simple method for evaluating clinical prediction models, diagnostic tests, and molecular markers, was also used to assess the PRDEGs diagnostic model. The DCA plot was generated using the R package ggDCA based on the pyroptosis-related DEGs (PRDEGs).

Differentially expressed genes associated with high and low-risk groups in a diagnostic model of PRDEGs

To identify the potential mechanisms, biological features, and pathways associated with differentially expressed genes (DEGs) in the high-risk and low-risk groups of the IS dataset PRDEGs diagnostic model, we performed differential analysis using the LIMMA package on the IS dataset. DEGs were selected based on the criteria of |logFC| > 0.5 and P < 0.05 for the comparison between groups (High vs. Low), and the DEGs meeting these criteria were investigated further. Among them, genes with logFC > 0.5 and P < 0.05 were considered up-regulated DEGs, while genes with logFC < -0.5 and P < 0.05 were considered down-regulated DEGs.

Gene function enrichment analysis (GO) and pathway enrichment (KEGG) analysis

Gene Ontology (GO) analysis⁹⁵ is a commonly used method for large-scale functional enrichment studies, covering biological processes (BP), molecular functions (MF), and cellular components (CC). Kyoto Encyclopedia of Genes and Genomes (KEGG)⁹⁶ is a widely used database that stores information on genomes, biological pathways, diseases, drugs, and more. We used the clusterProfiler package⁹⁷ in R to perform GO annotation analysis on the DEGs. Enriched terms were considered statistically significant at P.adj < 0.05 and false discovery rate (FDR) value (q-value) < 0.05. The Benjamini-Hochberg (BH) method was used for P-value correction.
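For readers unfamiliar with the BH correction, the following Python sketch computes BH-adjusted P-values from a list of raw P-values. The function name `bh_adjust` and the input values are illustrative assumptions; the study itself relied on the correction built into its R tooling.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # placeholder P-values
print(bh_adjust(raw))
```

Each raw P-value is scaled by n/rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.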

Gene set enrichment analysis and gene set variation analysis

GSEA (Gene Set Enrichment Analysis)⁹⁸ is a computational method used to determine whether a pre-defined set of genes shows statistically significant differences between two biological states, and it is commonly used to estimate changes in pathway and biological process activity in samples from expression datasets. In this study, we first ranked the genes by phenotypic correlation between the high- and low-risk groups of the PRDEGs diagnostic model in the IS dataset, dividing them into groups of high and low phenotypic correlation, and then used the clusterProfiler package to perform an enrichment analysis of all genes in the two groups. The parameters used in this GSEA were as follows: the seed was 2022, the number of permutations was 10,000, each gene set contained at least 10 and at most 500 genes, and the P-value correction method was Benjamini-Hochberg (BH). We obtained the "c2.all.v7.5.1.symbols.gmt" gene set from the Molecular Signatures Database (MSigDB). The screening criteria for significant enrichment were P.adj < 0.05 and FDR value (q-value) < 0.05.
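The statistic at the core of GSEA is a weighted running sum over the ranked gene list. The Python sketch below computes that enrichment score for a toy ranking; it is a simplified illustration with hypothetical names and values, it omits the permutation step used to assess significance, and it is not the clusterProfiler implementation.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for genes ranked from high to low score."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        # Step up (weighted) at a gene-set member, step down otherwise.
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["G1", "G2", "G3", "G4"]  # placeholder genes, highest score first
scores = [2.0, 1.0, -1.0, -2.0]    # placeholder ranking metric
es_top = enrichment_score(ranked, scores, {"G1", "G2"})
es_bottom = enrichment_score(ranked, scores, {"G3", "G4"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score; significance is then judged against scores from permuted data.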

GSVA (Gene Set Variation Analysis)⁹⁹ is a non-parametric, unsupervised analysis method that evaluates gene set enrichment in microarray transcriptomes by converting a gene-by-sample expression matrix into a gene set-by-sample expression matrix. This is used to assess whether different pathways are enriched across samples. We obtained the "c2.cp.all.v2022.1.Hs.symbols.gmt" gene set from the MSigDB database and performed GSVA on all genes in the high- and low-risk groups of the PRDEGs diagnostic model in the IS dataset to calculate functional enrichment in each group. The functional enrichment differences between the risk groups were then calculated, and P < 0.05 and |logFC| > 0.35 were used as the screening criteria for significant enrichment.

Protein–protein interaction networks

The STRING database¹⁰⁰ was used to construct a protein–protein interaction (PPI) network from the DEGs between the high- and low-risk groups of the PRDEGs diagnostic model in the IS dataset. The PPI network was visualised using Cytoscape¹⁰¹, and DEGs in the PPI network that were linked to other nodes were taken as key genes (hub genes) for IS.
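The selection rule described above, keeping DEGs that are linked to at least one other node, can be sketched as a degree filter on an edge list. The Python snippet below uses placeholder gene labels and edges; the helper name `node_degrees` is hypothetical, and the edges are not taken from STRING.

```python
def node_degrees(edges):
    """Count interaction partners per gene, ignoring self-loops."""
    degree = {}
    for a, b in edges:
        if a == b:
            continue
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


candidates = ["G1", "G2", "G3", "G4"]   # placeholder DEGs
edges = [("G1", "G2"), ("G2", "G3")]     # placeholder interactions
degree = node_degrees(edges)
hub_genes = [g for g in candidates if degree.get(g, 0) > 0]
print(hub_genes)
```

In this toy example G4 has no interaction partner and is therefore dropped from the hub gene list.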

Construction of mRNA-miRNA, mRNA-RBP, and mRNA-TF interactions networks

The miRDB database¹⁰² was used to predict miRNAs interacting with the key genes (hub genes), and mRNA-miRNA interactions with a Target Score ≥ 90 were retained to map the mRNA-miRNA interaction network. The ENCORI database¹⁰³ was used to predict RNA binding proteins (RBPs) interacting with the key genes; mRNA-RBP pairs were screened using clusterNum > 1 and clipExpNum > 1 as criteria, and the mRNA-RBP interaction network was mapped. Transcription factors (TFs) that bind to the key genes (hub genes) were identified through the CHIPBase database¹⁰⁴ (version 3.0) and the hTFtarget database¹⁰⁵, and the TFs common to both databases were retained. Cytoscape software was then used to visualize the mRNA-miRNA, mRNA-RBP, and mRNA-TF interaction networks.

Statistical analysis

All data processing and analysis in this article was performed in R software (Version 4.2.2), and continuous variables are presented as mean ± standard deviation. Comparisons between two groups were made using the Wilcoxon rank sum test (i.e., the Mann–Whitney U test), while comparisons between three or more groups were made using the Kruskal–Wallis test.
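The rank-based two-group comparison is closely related to the AUROC values reported for individual genes: the U statistic divided by the number of case-control pairs equals the area under the ROC curve. The Python sketch below illustrates this relationship with placeholder values and hypothetical function names; it does not compute a P-value.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auroc(cases, controls):
    """Probability that a random case scores higher than a random control."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


cases = [3.0, 4.0, 5.0]      # placeholder expression values, IS group
controls = [1.0, 2.0, 3.5]   # placeholder expression values, Control group
print(mann_whitney_u(cases, controls), auroc(cases, controls))
```

An AUROC of 0.5 corresponds to no separation between the groups, and values approaching 1 (or 0, for a down-regulated gene) indicate stronger separation.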

Limitation

In this study, we described the gene expression profiles related to pyroptosis in ischemic stroke using machine learning to identify potential targets for drug repurposing. However, owing to the limitations of single-cell sequencing, such as the inability to accurately describe low-expression genes, large sample sizes are required for reliable analysis. Additionally, because corresponding clinical specimens were not studied, the results could not be analyzed in combination with clinical information. Future research in this direction may make use of multi-omics and spatial transcriptomics technologies.

Conclusion

Our comprehensive machine learning analysis of Control and ischemic stroke (IS) samples has yielded valuable insights into the complexity of PRDEGs and their relevance to IS. Through detailed examination of differential gene expression patterns and ROC analysis, we have identified a critical set of 10 PRDEGs that contribute to diverse cellular processes. Our findings underscore the potential interplay between these genes and IS, shedding light on their diagnostic and therapeutic significance.

ATF6’s up-regulation in IS samples reflects its crucial role as a cellular stress-response mechanism. This discovery opens avenues for ATF6-targeted interventions to alleviate IS-induced cellular stress. BIRC2 emerges as a potent diagnostic discriminator with clinical implications, aligning with recent research, and offering insights into IS mechanisms. BNIP3L’s up-regulation and its involvement in mitophagy indicate its protective role against IS damage. This observation emphasizes the prospect of stimulating BNIP3L expression for IS therapeutics. CTSG’s up-regulation and its potential as a diagnostic marker underscore its significance as an inflammation-related gene product. ELAVL1’s down-regulation reflects its potential role in mitigating neuronal impairment, guiding future research into its regulatory mechanisms. GSK3B’s up-regulation holds promise for IS treatment, with Tideglusib as a potential candidate for drug repurposing. IKBKE’s down-regulation and STAT3’s differential expression underscore their roles, warranting further investigations. NDUFA13’s down-regulation implicates its contribution to IS pathophysiology, and VIM’s upregulation opens prospects for potential therapeutic interventions.

In summary, our study provides a comprehensive view of the 10 PRDEGs’ roles in IS, offering insights into potential targets for diagnostic and therapeutic approaches. By unraveling the intricate network of gene expressions associated with IS, this work contributes to the advancement of knowledge in this field, potentially paving the way for novel strategies in managing and treating ischemic stroke.

Data availability

Data and materials will be made available on request. The datasets analysed are available at the following URLs:
GSE22255: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22255
GSE140275: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140275
GSE16561: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16561

References

Rodgers, M. L. et al. Care of the patient with acute ischemic stroke (Endovascular/Intensive Care Unit-Postinterventional Therapy): Update to 2009 comprehensive nursing care scientific statement: A scientific statement from the American Heart Association. Stroke 52, e198–e210. https://doi.org/10.1161/str.0000000000000358 (2021).
Haupt, M., Gerner, S. T., Bähr, M. & Doeppner, T. R. Neuroprotective strategies for ischemic stroke-future perspectives. Int. J. Mol. Sci. https://doi.org/10.3390/ijms24054334 (2023).
Gu, C. et al. The PI3K/AKT pathway-the potential key mechanisms of traditional chinese medicine for stroke. Front. Med. (Lausanne) 9, 900809. https://doi.org/10.3389/fmed.2022.900809 (2022).
Pal, S., Hartnett, K. A., Nerbonne, J. M., Levitan, E. S. & Aizenman, E. Mediation of neuronal apoptosis by Kv2.1-encoded potassium channels. J. Neurosci. 23, 4798–4802. https://doi.org/10.1523/jneurosci.23-12-04798.2003 (2003).
Besancon, E., Guo, S., Lok, J., Tymianski, M. & Lo, E. H. Beyond NMDA and AMPA glutamate receptors: emerging mechanisms for ionic imbalance and cell death in stroke. Trends Pharmacol. Sci. 29, 268–275. https://doi.org/10.1016/j.tips.2008.02.003 (2008).
Shen, L. et al. Mitophagy in cerebral ischemia and ischemia/reperfusion injury. Front. Aging Neurosci. 13, 687246. https://doi.org/10.3389/fnagi.2021.687246 (2021).
Ward, R. et al. NLRP3 inflammasome inhibition with MCC950 improves diabetes-mediated cognitive impairment and vasoneuronal remodeling after ischemia. Pharmacol. Res. 142, 237–250. https://doi.org/10.1016/j.phrs.2019.01.035 (2019).
Tsuchiya, K. Inflammasome-associated cell death: Pyroptosis, apoptosis, and physiological implications. Microbiol. Immunol. 64, 252–269. https://doi.org/10.1111/1348-0421.12771 (2020).
Wen, H., Miao, E. A. & Ting, J. P. Mechanisms of NOD-like receptor-associated inflammasome activation. Immunity 39, 432–441. https://doi.org/10.1016/j.immuni.2013.08.037 (2013).
Vanaja, S. K., Rathinam, V. A. & Fitzgerald, K. A. Mechanisms of inflammasome activation: recent advances and novel insights. Trends Cell Biol. 25, 308–315. https://doi.org/10.1016/j.tcb.2014.12.009 (2015).
Yang, J. et al. Hippocampal changes in inflammasomes, apoptosis, and MEMRI after radiation-induced brain injury in juvenile rats. Radiat. Oncol. 15, 78. https://doi.org/10.1186/s13014-020-01525-3 (2020).
Jin, C. & Flavell, R. A. Molecular mechanism of NLRP3 inflammasome activation. J. Clin. Immunol. 30, 628–631. https://doi.org/10.1007/s10875-010-9440-3 (2010).
Shi, J., Gao, W. & Shao, F. Pyroptosis: Gasdermin-mediated programmed necrotic cell death. Trends Biochem. Sci. 42, 245–254. https://doi.org/10.1016/j.tibs.2016.10.004 (2017).
Yu, P. et al. Pyroptosis: mechanisms and diseases. Signal Transduct. Target Ther. 6, 128. https://doi.org/10.1038/s41392-021-00507-5 (2021).
Jiang, M., Qi, L., Li, L. & Li, Y. The caspase-3/GSDME signal pathway as a switch between apoptosis and pyroptosis in cancer. Cell Death Discov. 6, 112. https://doi.org/10.1038/s41420-020-00349-0 (2020).
Sepehrinezhad, A., Gorji, A. & Sahab Negah, S. SARS-CoV-2 may trigger inflammasome and pyroptosis in the central nervous system: a mechanistic view of neurotropism. Inflammopharmacology 29, 1049–1059. https://doi.org/10.1007/s10787-021-00845-4 (2021).
Zhang, D. et al. Evidence of pyroptosis and ferroptosis extensively involved in autoimmune diseases at the single-cell transcriptome level. J. Transl. Med. 20, 363. https://doi.org/10.1186/s12967-022-03566-6 (2022).
Voet, S., Srinivasan, S., Lamkanfi, M. & van Loo, G. Inflammasomes in neuroinflammatory and neurodegenerative diseases. EMBO Mol. Med. https://doi.org/10.15252/emmm.201810248 (2019).
Hu, X. et al. Emerging role of STING signalling in CNS injury: inflammation, autophagy, necroptosis, ferroptosis and pyroptosis. J. Neuroinflamm. 19, 242. https://doi.org/10.1186/s12974-022-02602-y (2022).
Article CAS Google Scholar
Shi, S. et al. Identification of pyroptosis-related immune signature and drugs for ischemic stroke. Front. Genet. 13, 909482. https://doi.org/10.3389/fgene.2022.909482 (2022).
Article CAS PubMed PubMed Central Google Scholar
Qin, X. et al. Identification of anoikis-related genes classification patterns and immune infiltration characterization in ischemic stroke based on machine learning. Front. Aging Neurosci. 15, 1142163. https://doi.org/10.3389/fnagi.2023.1142163 (2023).
Article CAS PubMed PubMed Central Google Scholar
Ren, P. et al. Diagnostic model constructed by nine inflammation-related genes for diagnosing ischemic stroke and reflecting the condition of immune-related cells. Front. Immunol. 13, 1046966. https://doi.org/10.3389/fimmu.2022.1046966 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. et al. Is immune suppression involved in the ischemic stroke? A study based on computational biology. Front. Aging Neurosci. 14, 830494. https://doi.org/10.3389/fnagi.2022.830494 (2022).
Article CAS PubMed PubMed Central MATH Google Scholar
Wei, Z. et al. Bioinformatics method combined with logistic regression analysis reveal potentially important miRNAs in ischemic stroke. Biosci. Rep. https://doi.org/10.1042/bsr20201154 (2020).
Martha, S. R. et al. Expression of cytokines and chemokines as predictors of stroke outcomes in acute ischemic stroke. Front. Neurol. 10, 1391. https://doi.org/10.3389/fneur.2019.01391 (2019).
Article PubMed Google Scholar
O’Connell, G. C., Chantler, P. D. & Barr, T. L. Stroke-associated pattern of gene expression previously identified by machine-learning is diagnostically robust in an independent patient population. Genom. Data 14, 47–52. https://doi.org/10.1016/j.gdata.2017.08.006 (2017).
Article PubMed PubMed Central Google Scholar
Feigin, V. L. Global, regional, and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 20, 795–820. https://doi.org/10.1016/s1474-4422(21)00252-0 (2021).
Article CAS MATH Google Scholar
Perera, K. S. et al. Evaluating rates of recurrent ischemic stroke among young adults with embolic stroke of undetermined source: The young ESUS longitudinal cohort study. JAMA Neurol. 79, 450–458. https://doi.org/10.1001/jamaneurol.2022.0048 (2022).
Article PubMed PubMed Central Google Scholar
Béjot, Y., Delpont, B. & Giroud, M. Rising stroke incidence in young adults: More epidemiological evidence, more questions to be answered. J. Am. Heart Assoc. https://doi.org/10.1161/jaha.116.003661 (2016).
Article PubMed PubMed Central MATH Google Scholar
Wu, S. et al. Stroke in China: advances and challenges in epidemiology, prevention, and management. Lancet Neurol. 18, 394–405. https://doi.org/10.1016/s1474-4422(18)30500-3 (2019).
Article PubMed MATH Google Scholar
Sporns, P. B. et al. Childhood stroke. Nat. Rev. Dis. Primers 8, 12. https://doi.org/10.1038/s41572-022-00337-x (2022).
Article PubMed Google Scholar
Yang, K. et al. Research progress on pyroptosis-mediated immune-inflammatory response in ischemic stroke and the role of natural plant components as regulator of pyroptosis: A review. Biomed. Pharmacother. 157, 113999. https://doi.org/10.1016/j.biopha.2022.113999 (2023).
Article CAS PubMed Google Scholar
Tang, H. et al. Vagus nerve stimulation alleviated cerebral ischemia and reperfusion injury in rats by inhibiting pyroptosis via α7 nicotinic acetylcholine receptor. Cell Death Discov. 8, 54. https://doi.org/10.1038/s41420-022-00852-6 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wang, M. et al. Decoction ameliorates ischemic stroke injury via suppressing pyroptosis. Front. Pharmacol. 11, 590453. https://doi.org/10.3389/fphar.2020.590453 (2020).
Article CAS PubMed PubMed Central Google Scholar
Ran, Y. et al. Curcumin ameliorates white matter injury after ischemic stroke by inhibiting microglia/macrophage pyroptosis through NF-κB suppression and NLRP3 inflammasome inhibition. Oxid. Med. Cell. Longev. 2021, 1552127. https://doi.org/10.1155/2021/1552127 (2021).
Article CAS PubMed PubMed Central Google Scholar
Glembotski, C. C., Rosarda, J. D. & Wiseman, R. L. Proteostasis and beyond: ATF6 in ischemic disease. Trends Mol. Med. 25, 538–550. https://doi.org/10.1016/j.molmed.2019.03.005 (2019).
Article CAS PubMed PubMed Central Google Scholar
Huang, L. G. et al. MicroRNA-29c correlates with neuroprotection induced by FNS by targeting both Birc2 and Bak1 in rat brain after stroke. CNS Neurosci. Ther. 21, 496–503. https://doi.org/10.1111/cns.12383 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rumble, J. M. & Duckett, C. S. Diverse functions within the IAP family. J. Cell Sci. 121, 3505–3507. https://doi.org/10.1242/jcs.040303 (2008).
Article CAS PubMed MATH Google Scholar
Li, M. et al. BRD7 inhibits enhancer activity and expression of BIRC2 to suppress tumor growth and metastasis in nasopharyngeal carcinoma. Cell Death Dis. 14, 121. https://doi.org/10.1038/s41419-023-05632-3 (2023).
Article CAS PubMed PubMed Central Google Scholar
Beug, S. T., Cheung, H. H., LaCasse, E. C. & Korneluk, R. G. Modulation of immune signalling by inhibitors of apoptosis. Trends Immunol. 33, 535–545. https://doi.org/10.1016/j.it.2012.06.004 (2012).
Article CAS PubMed Google Scholar
Xiong, Y. et al. Enhanced external counterpulsation inhibits endothelial apoptosis via modulation of BIRC2 and Apaf-1 genes in porcine hypercholesterolemia. Int. J. Cardiol. 171, 161–168. https://doi.org/10.1016/j.ijcard.2013.11.033 (2014).
Article PubMed Google Scholar
Jamil, K., Jayaraman, A., Ahmad, J., Joshi, S. & Yerra, S. K. TNF-alpha -308G/A and -238G/A polymorphisms and its protein network associated with type 2 diabetes mellitus. Saudi J. Biol. Sci. 24, 1195–1203. https://doi.org/10.1016/j.sjbs.2016.05.012 (2017).
Article CAS PubMed Google Scholar
Zhao, L. et al. In silico analysis of novel dipeptidyl peptidase-IV inhibitory peptides released from Macadamia integrifolia antimicrobial protein 2 (MiAMP2) and the possible pathways involved in diabetes protection. Curr. Res. Food Sci. 4, 603–611. https://doi.org/10.1016/j.crfs.2021.08.008 (2021).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Q. et al. Identification of key genes and upstream regulators in ischemic stroke. Brain Behav. 9, e01319. https://doi.org/10.1002/brb3.1319 (2019).
Article PubMed PubMed Central MATH Google Scholar
Yazdankhah, M. et al. BNIP3L-mediated mitophagy is required for mitochondrial remodeling during the differentiation of optic nerve oligodendrocytes. Autophagy 17, 3140–3159. https://doi.org/10.1080/15548627.2020.1871204 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yuan, Y. et al. BNIP3L/NIX-mediated mitophagy protects against ischemic brain injury independent of PARK2. Autophagy 13, 1754–1766. https://doi.org/10.1080/15548627.2017.1357792 (2017).
Article CAS PubMed PubMed Central Google Scholar
Wu, X. et al. BNIP3L/NIX degradation leads to mitophagy deficiency in ischemic brains. Autophagy 17, 1934–1946. https://doi.org/10.1080/15548627.2020.1802089 (2021).
Article CAS PubMed Google Scholar
Li, Y. et al. BNIP3L/NIX-mediated mitophagy: molecular mechanisms and implications for human disease. Cell Death Dis. 13, 14. https://doi.org/10.1038/s41419-021-04469-y (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, J. et al. Targeting neuronal mitophagy in ischemic stroke: an update. Burns Trauma 11, tkad018. https://doi.org/10.1093/burnst/tkad018 (2023).
Article PubMed PubMed Central Google Scholar
Knutson, A. K. et al. Comparative effects of histone deacetylase inhibitors on p53 target gene expression, cell cycle and apoptosis in MCF-7 breast cancer cells. Oncol. Rep. 27, 849–853. https://doi.org/10.3892/or.2011.1590 (2012).
Article CAS PubMed Google Scholar
Aghdassi, A. A. et al. Absence of the neutrophil serine protease cathepsin G decreases neutrophil granulocyte infiltration but does not change the severity of acute pancreatitis. Sci. Rep. 9, 16774. https://doi.org/10.1038/s41598-019-53293-0 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Bank, U. & Ansorge, S. More than destructive: neutrophil-derived serine proteases in cytokine bioactivity control. J. Leukoc. Biol. 69, 197–206 (2001).
Article CAS PubMed Google Scholar
Wiedow, O. & Meyer-Hoffert, U. Neutrophil serine proteases: potential key regulators of cell signalling during inflammation. J. Intern. Med. 257, 319–328. https://doi.org/10.1111/j.1365-2796.2005.01476.x (2005).
Article CAS PubMed Google Scholar
Sambrano, G. R. et al. Cathepsin G activates protease-activated receptor-4 in human platelets. J. Biol. Chem. 275, 6819–6823. https://doi.org/10.1074/jbc.275.10.6819 (2000).
Article CAS PubMed Google Scholar
Chan, S. et al. CTSG suppresses colorectal cancer progression through negative regulation of Akt/mTOR/Bcl2 signaling pathway. Int. J. Biol. Sci. 19, 2220–2233. https://doi.org/10.7150/ijbs.82000 (2023).
Article CAS PubMed PubMed Central Google Scholar
Huang, G. Z. et al. Bioinformatics analyses indicate that cathepsin G (CTSG) is a potential immune-related biomarker in oral squamous cell carcinoma (OSCC). Onco Targets Ther. 14, 1275–1289. https://doi.org/10.2147/ott.S293148 (2021).
Article PubMed PubMed Central Google Scholar
Hooshdaran, B. et al. Dual inhibition of cathepsin G and chymase reduces myocyte death and improves cardiac remodeling after myocardial ischemia reperfusion injury. Basic Res. Cardiol. 112, 62. https://doi.org/10.1007/s00395-017-0652-z (2017).
Article CAS PubMed PubMed Central Google Scholar
Du, Y. et al. Downregulation of ELAVL1 attenuates ferroptosis-induced neuronal impairment in rats with cerebral ischemia/reperfusion via reducing DNMT3B-dependent PINK1 methylation. Metab. Brain Dis. 37, 2763–2775. https://doi.org/10.1007/s11011-022-01080-8 (2022).
Article CAS PubMed Google Scholar
Wu, X. et al. Identification and validation of novel small molecule disruptors of HuR-mRNA interaction. ACS Chem. Biol. 10, 1476–1484. https://doi.org/10.1021/cb500851u (2015).
Article CAS PubMed PubMed Central Google Scholar
Li, Y. et al. Glycogen synthase kinase 3β influences injury following cerebral ischemia/reperfusion in rats. Int. J. Biol. Sci. 12, 518–531. https://doi.org/10.7150/ijbs.13918 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wang, H. et al. Tideglusib, a chemical inhibitor of GSK3β, attenuates hypoxic-ischemic brain injury in neonatal mice. Biochim. Biophys. Acta 1860, 2076–2085. https://doi.org/10.1016/j.bbagen.2016.06.027 (2016).
Article CAS PubMed Google Scholar
Patel, M. N. et al. Hematopoietic IKBKE limits the chronicity of inflammasome priming and metaflammation. Proc. Natl. Acad. Sci. U. S. A. 112, 506–511. https://doi.org/10.1073/pnas.1414536112 (2015).
Article ADS CAS PubMed Google Scholar
Davis, C. M., Lyon-Scott, K., Varlamov, E. V., Zhang, W. H. & Alkayed, N. J. Role of endothelial STAT3 in cerebrovascular function and protection from ischemic brain injury. Int. J. Mol. Sci. https://doi.org/10.3390/ijms232012167 (2022).
Article PubMed PubMed Central Google Scholar
Hu, H. et al. Electron leak from NDUFA13 within mitochondrial complex I attenuates ischemia-reperfusion injury via dimerized STAT3. Proc. Natl. Acad. Sci. U. S. A. 114, 11908–11913. https://doi.org/10.1073/pnas.1704723114 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Paulin, D., Lilienbaum, A., Kardjian, S., Agbulut, O. & Li, Z. Vimentin: Regulation and pathogenesis. Biochimie 197, 96–112. https://doi.org/10.1016/j.biochi.2022.02.003 (2022).
Article CAS PubMed Google Scholar
Thakur, G. K. et al. High resolution based quantitative determination of methylation status of CDH1 and VIM gene in epithelial ovarian cancer. Asian Pac. J. Cancer Prev. 20, 2923–2928. https://doi.org/10.31557/apjcp.2019.20.10.2923 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ridge, K. M., Eriksson, J. E., Pekny, M. & Goldman, R. D. Roles of vimentin in health and disease. Genes Dev. 36, 391–407. https://doi.org/10.1101/gad.349358.122 (2022).
Article CAS PubMed PubMed Central MATH Google Scholar
Ivaska, J., Pallari, H. M., Nevo, J. & Eriksson, J. E. Novel functions of vimentin in cell adhesion, migration, and signaling. Exp. Cell Res. 313, 2050–2062. https://doi.org/10.1016/j.yexcr.2007.03.040 (2007).
Article CAS PubMed Google Scholar
Xiao, J. et al. Circulating vimentin is associated with future incidence of stroke in a population-based cohort study. Stroke 52, 937–944. https://doi.org/10.1161/strokeaha.120.032111 (2021).
Article CAS PubMed MATH Google Scholar
Roefs, M. M. et al. Increased vimentin in human α- and β-cells in type 2 diabetes. J. Endocrinol. 233, 217–227. https://doi.org/10.1530/joe-16-0588 (2017).
Article CAS PubMed Google Scholar
Vasir, B. et al. Effects of diabetes and hypoxia on gene markers of angiogenesis (HGF, cMET, uPA and uPAR, TGF-alpha, TGF-beta, bFGF and Vimentin) in cultured and transplanted rat islets. Diabetologia 43, 763–772. https://doi.org/10.1007/s001250051374 (2000).
Article CAS PubMed Google Scholar
Li, Y. et al. Inhibition of NLRP3 and Golph3 ameliorates diabetes-induced neuroinflammation in vitro and in vivo. Aging (Albany NY) 14, 8745–8762. https://doi.org/10.18632/aging.204363 (2022).
Article CAS PubMed Google Scholar
Kim, S., Kim, I., Cho, W., Oh, G. T. & Park, Y. M. Vimentin Deficiency prevents high-fat diet-induced obesity and insulin resistance in mice. Diabetes Metab. J. 45, 97–108. https://doi.org/10.4093/dmj.2019.0198 (2021).
Article PubMed MATH Google Scholar
Fasipe, T. A. et al. Extracellular Vimentin/VWF (von Willebrand Factor) interaction contributes to VWF string formation and stroke pathology. Stroke 49, 2536–2540. https://doi.org/10.1161/strokeaha.118.022888 (2018).
Article CAS PubMed PubMed Central Google Scholar
Li, L. et al. Pyroptosis, a new bridge to tumor immunity. Cancer Sci. 112, 3979–3994. https://doi.org/10.1111/cas.15059 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Liu, J. et al. Pyroptosis-related lncRNAs are potential biomarkers for predicting prognoses and immune responses in patients with UCEC. Mol. Ther. Nucleic Acids 27, 1036–1055. https://doi.org/10.1016/j.omtn.2022.01.018 (2022).
Article CAS PubMed PubMed Central MATH Google Scholar
Davì, G. et al. CD40 ligand and MCP-1 as predictors of cardiovascular events in diabetic patients with stroke. J. Atheroscler. Thromb. 16(6), 707–713. https://doi.org/10.5551/jat.1537 (2009).
Article PubMed Google Scholar
Tuttolomondo, A. et al. HLA and killer cell immunoglobulin-like receptor (KIRs) genotyping in patients with acute ischemic stroke. J. Neuroinflamm. 16(1), 88. https://doi.org/10.1186/s12974-019-1469-5 (2019).
Article MATH Google Scholar
Iadecola, C. & Anrather, J. The immunology of stroke: from mechanisms to translation. Nat. Med. 17, 796–808. https://doi.org/10.1038/nm.2399 (2011).
Article CAS PubMed PubMed Central MATH Google Scholar
Moskowitz, M. A., Lo, E. H. & Iadecola, C. The science of stroke: mechanisms in search of treatments. Neuron 67, 181–198. https://doi.org/10.1016/j.neuron.2010.07.002 (2010).
Article CAS PubMed PubMed Central MATH Google Scholar
Krug, T. et al. TTC7B emerges as a novel risk factor for ischemic stroke through the convergence of several genome-wide approaches. J. Cereb. Blood Flow Metab. 32, 1061–1072. https://doi.org/10.1038/jcbfm.2012.24 (2012).
Article CAS PubMed PubMed Central Google Scholar
Li, S. et al. Expression profile and bioinformatics analysis of circular RNAs in acute ischemic stroke in a South Chinese Han population. Sci. Rep. 10, 10138. https://doi.org/10.1038/s41598-020-66990-y (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
O’Connell, G. C. et al. Peripheral blood AKAP7 expression as an early marker for lymphocyte-mediated post-stroke blood brain barrier disruption. Sci. Rep. 7, 1172. https://doi.org/10.1038/s41598-017-01178-5 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Barrett, T. et al. NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res. 35, D760-765. https://doi.org/10.1093/nar/gkl887 (2007).
Article CAS PubMed Google Scholar
Davis, S. & Meltzer, P. S. GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847. https://doi.org/10.1093/bioinformatics/btm254 (2007).
Article CAS PubMed MATH Google Scholar
Stelzer, G. et al. The GeneCards suite: From gene data mining to disease genome sequence analyses. Curr. Protoc Bioinform. 54, 13031–313033. https://doi.org/10.1002/cpbi.5 (2016).
Article Google Scholar
Liberzon, A. et al. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425. https://doi.org/10.1016/j.cels.2015.12.004 (2015).
Article CAS PubMed PubMed Central Google Scholar
Ben Salem, K. & Ben Abdelaziz, A. Principal component analysis (PCA). Tunis Med. 99, 383–389 (2021).
PubMed PubMed Central MATH Google Scholar
Engebretsen, S. & Bohlin, J. Statistical predictions with glmnet. Clin. Epigenet. 11, 123. https://doi.org/10.1186/s13148-019-0730-1 (2019).
Article MATH Google Scholar
Park, S. Y. Nomogram: An analogue tool to deliver digital knowledge. J. Thorac. Cardiovasc. Surg. 155, 1793. https://doi.org/10.1016/j.jtcvs.2017.12.107 (2018).
Article PubMed MATH Google Scholar
Gruber, H. E., Hoelscher, G. L., Ingram, J. A. & Hanley, E. N. Jr. Genome-wide analysis of pain-, nerve- and neurotrophin -related gene expression in the degenerating human annulus. Mol. Pain 8, 63. https://doi.org/10.1186/1744-8069-8-63 (2012).
Article CAS PubMed PubMed Central Google Scholar
Liu, Y. & Zhao, H. Variable importance-weighted random forests. Quant. Biol. 5, 338–351 (2017).
Article PubMed PubMed Central MATH Google Scholar
Cai, W. & van der Laan, M. Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator. Int. J. Biostat. https://doi.org/10.1515/ijb-2017-0070 (2020).
Article PubMed MATH Google Scholar
Van Calster, B. et al. Reporting and interpreting decision curve analysis: A guide for investigators. Eur. Urol. 74, 796–804. https://doi.org/10.1016/j.eururo.2018.08.038 (2018).
Article PubMed PubMed Central Google Scholar
Yu, G. Gene ontology semantic similarity analysis using GOSemSim. Methods Mol. Biol. 2117, 207–215. https://doi.org/10.1007/978-1-0716-0301-7_11 (2020).
Article CAS PubMed MATH Google Scholar
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. https://doi.org/10.1093/nar/28.1.27 (2000).
Article CAS PubMed PubMed Central MATH Google Scholar
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287. https://doi.org/10.1089/omi.2011.0118 (2012).
Article CAS PubMed PubMed Central MATH Google Scholar
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545–15550. https://doi.org/10.1073/pnas.0506580102 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Wang, H. Q. et al. Deregulated miR-155 promotes Fas-mediated apoptosis in human intervertebral disc degeneration by targeting FADD and caspase-3. J. Pathol. 225, 232–242. https://doi.org/10.1002/path.2931 (2011).
Article CAS PubMed Google Scholar
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607-d613. https://doi.org/10.1093/nar/gky1131 (2019).
Article CAS PubMed Google Scholar
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504. https://doi.org/10.1101/gr.1239303 (2003).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. & Wang, X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 48, D127-d131. https://doi.org/10.1093/nar/gkz757 (2020).
Article CAS PubMed Google Scholar
Li, J. H. et al. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 42, D92-97. https://doi.org/10.1093/nar/gkt1248 (2014).
Article CAS PubMed Google Scholar
Zhou, K. R. et al. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res. 45, D43-d50. https://doi.org/10.1093/nar/gkw965 (2017).
Article CAS PubMed Google Scholar
Zhang, Q. et al. hTFtarget: A comprehensive database for regulations of human transcription factors and their targets. Genom. Proteom. Bioinform. 18, 120–128. https://doi.org/10.1016/j.gpb.2019.09.006 (2020).
Article MATH Google Scholar

Download references

Acknowledgements

We gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (Grant numbers 82060237 to Changchun Hei, 82260255 to Xiao Yang, and 82060156 to Ping Liu), the Natural Science Foundation of Ningxia Hui Autonomous Region (Grant number 2023AAC03217 to Changchun Hei), and the Research Fund of Ningxia Medical University (Grant number XZ2022002 to Changchun Hei). We also thank the GEO database for providing its platform and the contributors for uploading their datasets.

Funding

This study was funded by the National Natural Science Foundation of China (No. 82060237, 82260255, 82060156), the Natural Science Foundation of Ningxia Hui Autonomous Region (No. 2023AAC03217, 2024AAC02078), and the Research Fund of Ningxia Medical University (No. XZ2022002).

Author information

Changchun Hei and Xiaowen Li have contributed equally to this work.

Authors and Affiliations

Key Laboratory for Craniocerebral Diseases of Ningxia Hui Autonomous Region, Department of Human Anatomy, Histology and Embryology, Ningxia Medical University, Yinchuan, China
Changchun Hei, Xiaowen Li & Jianguo Niu
Department of Neurology, General Hospital of Ningxia Medical University, Yinchuan, China
Ruochen Wang, Jiahui Peng & Xiao Yang
Department of Endocrinology, General Hospital of Ningxia Medical University, Yinchuan, China
Ping Liu
Department of Pharmaceutical Sciences, Biomanufacturing Research Institute Technology Enterprise (BRITE), College of Health and Sciences, North Carolina Central University, Durham, NC, USA
Xialan Dong, P. Andy Li & Weifan Zheng

Contributions

XY, JN, and CH conceived the study concept and design. CH, XL, RW, PL, and JP conducted the data analysis and wrote the manuscript. XD and WZ contributed to the discussion of potential biological targets and drug repurposing hypotheses. XY and PAL revised and finalized the manuscript. All coauthors have reviewed and approved the submission.

Corresponding authors

Correspondence to Jianguo Niu or Xiao Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

GEO is an open-source public database. Users are permitted to freely download relevant data for research and publication. This study is therefore exempt from Institutional Review Board (IRB) approval.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table S1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Hei, C., Li, X., Wang, R. et al. Machine learning analysis of gene expression profiles of pyroptosis-related differentially expressed genes in ischemic stroke revealed potential targets for drug repurposing. Sci Rep 15, 7035 (2025). https://doi.org/10.1038/s41598-024-83555-5

Received: 11 April 2024
Accepted: 16 December 2024
Published: 27 February 2025
DOI: https://doi.org/10.1038/s41598-024-83555-5
