Abstract
The relationship between ischemic stroke (IS) and pyroptosis centers on the inflammatory response elicited by cerebral tissue damage during an ischemic stroke event. However, an in-depth mechanistic understanding of their connection remains limited. This study aims to comprehensively analyze the gene expression patterns of pyroptosis-related differentially expressed genes (PRDEGs) by employing integrated IS datasets and machine learning techniques. The primary objective was to develop classification models to identify crucial PRDEGs integral to the ischemic stroke process. Leveraging three distinct machine learning algorithms (LASSO, Random Forest, and Support Vector Machine), models were developed to differentiate between the Control and the IS patient samples. Through this approach, a core set of 10 PRDEGs consistently emerged as significant across all three machine learning models. Subsequent analysis of these genes yielded significant insights into their functional relevance and potential therapeutic approaches. In conclusion, this investigation underscores the pivotal role of pyroptosis pathways in ischemic stroke and identifies pertinent targets for therapeutic development and drug repurposing.
Similar content being viewed by others
Introduction
Cerebral ischemic stroke (IS) is a significant global health concern, and millions of people suffer from its worldwide each year. The survivors have varying degrees of sequelae, seriously affecting the quality of life of patients and their families1,2,3. Due to occlusion of intracranial blood vessels during ischemic stroke, brain cells have soon died of apoptosis, necrosis and/or pyroptosis. Other pathophysiological alterations elicited by IS include impairment of neurovascular unit, free radical accumulation, secondary inflammatory responses, ion channel alterations, mitochondrial damage, autophagy, and others, all of which contribute the final cell death4,5,6,7. Current effective treatment for IS includes thrombolysis or thrombectomy, which only applicable to less than 5% of patients due to their short intervention window. Thus, it has become critically important to understand detailed molecular mechanisms involved in ischemic stroke and identify potential therapeutic targets for drug development.
Pyroptosis is one of the major types of programmed cell death initiated by inflammation8. Formation of an inflammasome, composed of Nod-like receptor protein-3 (NLRP3), pro-caspase-1, and apoptosis-related speckle protein (ASC), is a critical step for pyroptosis. The inflammasome activates caspase -1, which cleaves IL-1β, IL-18, and GasderminD (GSDMD)9,10,11,12,13. Cleaved IL-1β and IL-18 further exacerbate inflammatory responses, while the N-terminal fragment of GSDMD translocates to the plasma membrane to induce pore formation and pyroptotic cell death. Alternatively, pyroptosis could also be induced by the non-classical cysteine proteinase-4, 5 and 11-dependent pathway14 or granzyme-mediated pathways and caspase-3-mediated pathways15. Pyroptosis plays an essential role in mediating cell death and clinical prognosis of various neurological disorders such as central nervous system infections, autoimmune disorders, neurodegenerative diseases, traumatic brain and spinal cord injuries, cerebral hemorrhagic and ischemic stroke16,17,18,19. Recently, an increasing number of studies have been conducted to search for pyroptosis-related biomarkers as potential indicators for predicting the development of disease and therapeutic target of drug discovery20.
Several gene expression profiling datasets have been published in the gene expression omnibus (GEO) (www.ncbi.nlm.nih.gov/geo/). These include GSE22255, which consists of 40 samples of gene expression profiles of peripheral blood mononuclear cells from IS patients and age-matched healthy adults; GSE140275 that includes gene expression profiles from blood samples of 3 acute ischemic stroke (AIS) patients and 3 healthy controls; and GSE16561 dataset that consists of 63 whole blood samples from 39 patients with IS and 24 healthy control subjects. In addition, information about pyroptosis-related genes (PRGs) has been made available in the GeneCards database (www.genecards.org/). These datasets have laid a foundation for detailed machine learning analysis to gain insight into genes that might be involved in IS and pyroptosis.
Using machine learning to search for differentially expressed genes in IS patient data set has drawn great interest. Qin et al.21 investigated the role of anoikis-related genes (ARGs) in ischemic stroke (IS), which employed machine learning to classify IS samples, identified key ARGs, and constructed diagnostic models. Ren et al.22 discovered inflammation-related biomarkers for ischemic stroke (IS) using machine learning. They identified differentially expressed genes (DEGs) between healthy controls and IS patients. According to Wang et al.23, immunosuppression-related genes have been identified in ischemic stroke (IS). It identified crosstalk genes linking immunosuppression and IS. Wei et al. has studied microRNA Expression Profile in Ischemic Stroke24. It constructed an IS diagnostic signature using logistic regression, identifying 14 differentially expressed miRNAs. Up to date, there has no report to search for pyroptosis-related differentially expressed genes (PRDEGs) in IS dataset.
Martha et al.25 used machine learning to predict stroke outcomes in acute ischemic stroke patients by analyzing inflammatory gene expression. It identified key genes and subject factors associated with infarct and edema volumes. These genes indicated chemoattraction and proliferation of immune cells, influencing stroke outcomes. The findings suggested that machine learning can help develop prognostic biomarkers for stroke outcomes and understand acute gene expression changes. O’Connell et al.26 used machine learning to validate a stroke-associated gene expression pattern using an independent patient population. It confirmed the robustness and temporal stability of this gene expression pattern as a potential diagnostic marker for stroke.
The aim of this study is to analyze the gene expression pattern of PRDEGs based on an integrated IS dataset using machine-learning methods in order to build predictive models of ischemic stroke samples and identify critically important PRDEGs involved in ischemic stroke. Classification models have been built to distinguish the healthy control samples and the IS patient samples using three machine learning methods (namely, LASSO, Random Forest and Support Vector Machine). As a result, a set of 10 PRDEGs was identified by all three machine learning models. Further analysis of the ten critical genes identified above has revealed the importance of these genes in IS development as well as potential therapeutic targets for IS.
Results
Overall workflow
In this study, we conducted a bioinformatics analysis to explore the expression characteristics of pyroptosis-related genes between normal and ischemic stroke (IS) tissues. Initially, we collected pyroptosis-related genes from literature and databases and combined them with RNA expression data of IS patients from specific GEO datasets. Within these datasets, we identified pyroptosis-related differentially expressed genes (Pyroptosis-Related DEGs).
Next, utilizing a LASSO regression model, we pinpointed a set of significant gene markers. These markers facilitated consensus clustering analysis to determine pyroptosis-associated IS subtypes. A substantial number of DEGs were identified between the different IS subtypes, from which we extracted core hub genes.
Throughout this process, we also conducted analyses of gene expression levels, correlation analysis, and gene ___location on chromosomes. These analyses helped us to understand the biological backdrop of the DEGs and their potential roles in disease progression. Further validation of our findings was ensured through additional validation of the GEO dataset and experimental verification. Moreover, we applied Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis tools to investigate the specific functions of these genes in the pyroptosis pathways and the biological processes involved. Protein–Protein Interaction (PPI) analysis and immune infiltration analysis were also performed to deepen our understanding of the role of DEGs in IS.
Finally, based on the identified hub genes, we screened for potential drugs or compounds for the treatment of ischemic stroke within the Drug Gene Interaction Database (DGIdb). This series of exhaustive analytical steps provided not only new biomarkers for the diagnosis of ischemic stroke but also potential drug targets for the disease’s treatment, thereby hoping to advance the development of therapeutic strategies for IS. The detailed schematic of this flowchart is illustrated in Fig. 1 of the article.
The technical flowchart is shown in Fig. 1.
Ischemic stroke related DEGs
The IS dataset (Fig. 2A,B) was obtained by first removing batch effects from the GSE2225, GSE140275 and GSE16561 IS datasets using the R package sva and then normalizing the combined datasets using the limma package: the expression data in the IS dataset were normalized to remove the batch effect, and the expression matrices were standardized to be consistent between samples. Principal Component Analysis (PCA) was performed on the expression matrix of the dataset to verify the effect of removing the batch effect (Fig. 2C,D), and the results showed that the batch effect in the IS dataset was largely eliminated after batch removal.
Differentially expressed genes (DEGs) were obtained from the IS dataset using the limma package to analyze the differences in gene expression between different subgroups of the IS dataset compared to the control group. The results are as follows: a total of 12,733 DEGs were obtained from the IS dataset, of which 1422 met the threshold of |logFC|> 0 and P.adj < 0.05. Under this threshold, the number of genes that were highly expressed in the IS group (low in the Control group with positive logFC) was 376; the number of genes with low expression in the IS group (high expression in the Control group with negative logFC) was 1046, and the results of the differential analysis of the IS dataset were plotted on a volcano plot (Fig. 3A).
Analysis of the IS dataset for Ischemic Stroke Related DEGs. (A) Volcano plots for analysis of DEGs in the IS group and control group. (B) Venn plots for DEGs and PRGs in the IS dataset. (C) Heat map of complex values for PRDEGs. (D) Heat map of chromosomal localization for PRDEGs. Chromosomal localization map of PRDEGs. IS ischemic stroke, DEGs differentially expressed genes, PRGs pyroptosis-related genes, PRDEGs pyroptosis-related differentially expressed genes.
The intersection of all the DEGs and the 372 PRGs obtained from the IS dataset was taken and a total of 32 PRDEGs were obtained and plotted in a Venn diagram (Fig. 3B), namely: AHSA1, APAF1, ATF6, ATG7, BIRC2, BNIP3L, CEBPB, CTSG, DNMT1, ELAVL1, GSK3B, IKBKE, IL18BP, IL32, IRAK3, JUN, LRPPRC, NDUFA13, NFE2L2, NFS1, NLRP3, OSM, PTGS2, SERPINB1, STAT3, TCEA3, TLR2, TNFSF13B, TRAF3, TREM1, UBE2D3, and VIM. The results obtained from Venn diagrams were used to analyze the differential expression of the 32 PRDEGs in different subgroups of the IS dataset and show the specific differential analysis results in a heat map (Fig. 3C). The expression of the 32 PRDEGs in the different subgroups of IS dataset (IS/Control) was significantly different.
The position annotation of 32 PRDEGs was used to analyze the ___location of PRDEGs on human chromosomes (Fig. 3D), which shows that they are mainly distributed on chromosomes 1, 2, 6, 11, 14 and 19, with the most distributed on chromosome 1, with a total of 6 PRDEGs, indicating that these PRDEGs, which are located close to each other on the chromosomes. This indicates that these PRDEGs in close proximity to each other at the genomic level are more closely linked.
Differential expression analysis of PRDEGs
The Wilcoxon rank sum test was used to analyze the differences in expression levels of the 32 PRDEGs between the different subgroups (IS/Control) in the IS dataset, and the results of the expression difference analysis are presented in a subgroup comparison plot (Fig. 4A). As can be seen from the figure, the expression level of the 32 PRDEGs between the IS group (IS) and the control group (Control) was highly statistically significant (P < 0.01), except for JUN and AHSA1, 10 genes (DNMT1, ELAVL1, IKBKE, IL18BP, IL32, LRPPRC, NDUFA13, NFS1, TCEA3 and TRAF3) were all expressed at higher levels in the control group (Control) than in the IS group (IS); while 20 genes (APAF1, ATF6, ATG7, BIRC2, BNIP3L, CEBPB, CTSG GSK3B, IRAK3, NFE2L2, NLRP3, OSM, PTGS2, SERPINB1, STAT3, TLR2, TNFSF13B, TREM1, UBE2D3, VIM) level were expressed at higher levels in the IS group (IS) than in the control group (Control).
Analysis of differential expression of PRDEGs in the IS dataset. (A) Plot of PRDEGs grouped in the IS dataset comparing the IS group and the control group (Control). (B–I) ROC curves based on PRDEGs: AHSA1, APAF1, ATF6, ATG7 (B); BIRC2, BNIP3L, CEBPB, CTSG (C); DNMT1, ELAVL1, GSK3B, IKBKE (D); IL18BP, IL32, IRAK3, JUN (E); LRPPRC, NDUFA13, NFE2L2, NFS1 (F); NLRP3, OSM, PTGS2, SERPINB1 (G); STAT3, TCEA3, TLR2, TNFSF13B (H); TRAF3, TREM1, UBE2D3, VIM (I). Symbol ns equals P ≥ 0.05, which is not statistically significant; symbol * equals P.value < 0.05, which is statistically significant; symbol ** equals P.value < 0.01, which is highly statistically significant; and symbol *** equals P.value < 0.001, which is highly statistically significant. The closer the AUC in the ROC curve is to 1.0, the better the distinguishing power. ROC receiver operating characteristic curve, AUC area under curve.
The ROC curves of the 32 PRDEGs in the IS and control groups (Control) were then plotted and the results presented (Fig. 4B–I). As shown in Fig. 4, 20 PRDEGs (AHSA1, APAF1, ATF6, ATG7, CEBPB, CTSG, DNMT1, GSK3B, IL18BP, JUN, NFS1, NLRP3, OSM, PTGS2, SERPINB1, TCEA3, TLR2, TRAF3, TREM1, UBE2D3) in the IS expression in the dataset has low accurate for the diagnosis of IS occurrence (0.500 < AUC < 0.700, Fig. 4B–I); whereas the other 12 PRDEGs (BIRC2, BNIP3L, ELAVL1, IKBKE, IL32, IRAK3, LRPPRC, NDUFA13 NFE2L2, STAT3, TNFSF13B, and VIM) expression in the IS dataset had moderate accuracy in the detection of IS occurrence (0.700 < AUC < 0.900, Fig. 4B–I).
Machine learning models based on PRDEGs
To determine the diagnostic value of the 32 PRDEGs in the IS dataset, we first constructed SVM models based on the 32 PRDEGs. The SVM (Support Vector Machine) algorithm should obtain the lowest error rate (Fig. 5A) and the highest accuracy (Fig. 5B). The results showed that the accuracy of the SVM model was highest when the number of genes was 29; and the genes include BIRC2, ELAVL1, UBE2D3, ATF6, JUN, DNMT1, GSK3B, TCEA3, IRAK3, SERPINB1, APAF1, AHSA1, STAT3, BNIP3L, NDUFA13, LRPPRC, CTSG, VIM, OSM, TNFSF13B, ATG7, IL18BP, TRAF3, TREM1, TLR2, NFS1, IKBKE, NFE2L2, CEBPB.
Construction of a PRDEGs-based model for IS vs Control. (A) Number of genes with the lowest error rate obtained by the SVM modeling. (B) Number of genes with the highest accuracy rate obtained by the SVM modeling. (C) Plot of model training error for the random forest algorithm. (D) Random forest model showing top 30 PRDEGs (in descending order of IncNodePurity). (E) Forest plot of the logistic (logistic) regression model. (F) Diagnostic plot for the LASSO regression model. (G) Variable trajectory plot for the LASSO regression model. (H) Column line plot (Nomogram) of the 12-PRDEGs in the diagnostic model. (I,J) Calibration curve plot (I), DCA plot (J) for the PRDEGs-based model. SVM support vector machine, PRDEGs pyroptosis-related differentially expressed genes, LASSO least absolute shrinkage and selection operator, DCA decision curve analysis.
The expression data of the 32 PRDEGs in the IS dataset was also analyzed using the Random Forest algorithm (RandomForest, RF; rf) (Fig. 5C) to determine the predictive value of the 32 PRDEGs in the IS dataset. IncNodePurity (Increase in Node Purity) indicates an increase in node purity. The higher the node purity, the lower the number of impurities contained. The results of the specific analysis were screened for IncNodePurity > 0.5, and 21 diagnostic markers for IS disease were obtained from the 32 PRDEGs (Fig. 5D), namely: BIRC2, NDUFA13, ELAVL1, IL32, TNFSF13B, BNIP3L, NFE2L2, IRAK3, DNMT1, VIM, JUN, CTSG, STAT3, CEBPB, LRPPRC, IKBKE, PTGS2, NLRP3, GSK3B, ATF6, and TRAF3.
Finally, we performed a logistic regression analysis and constructed a logistic regression model based on the 32 PRDEGs. A total of 32 PRDEGs were included in the logistic regression model, and the expression profiles of the 32 PRDEGs were visualized in a Forest Plot (Fig. 5E). The PRDEGs-based predictive model was then constructed by LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis, and the results were visualized by plotting the LASSO regression model (Fig. 5F) and the LASSO variable trajectories (Fig. 5G). The results show that the model contains a total of 12 PRDEGs, namely: APAF1, ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, OSM, STAT3, and VIM. The Nomogram demonstrates the contribution of the 12 PRDEGs to the diagnostic model (Fig. 5H).
To determine the accuracy and discriminatory power of the PRDEGs-based model, calibration curves (Calibration Curve) were plotted using Calibration analysis to assess the predictive effectiveness of the model for actual outcomes based on the fit of the optimal theoretical probability (solid line) to the probability predicted by the model (dashed line) under different scenarios (Fig. 5I). In addition, decision curve analysis (DCA) was used to assess the usefulness of the model in terms of the clinical utility and to present the results (Fig. 5J). When the model’s line was stable over a range above All positive and All negative, the larger the range, the greater the net gain and the more effective the model is. It is clear from the results (Fig. 5I,J) that the model constructed in this experiment has a high accuracy in predicting the occurrence of IS over Control.
The PRDEGs from the LASSO regression model, the PRDEGs from the SVM model and those from the random forest model were intersected to obtain a total of 10 PRDEGs and plotted in a Venn diagram (Fig. 6A), which were: ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, STAT3, and VIM. A chord plot was then drawn to show the correlation between the 10 Common PRDEGs based on their specific expression in the IS dataset (Fig. 6B), which showed that the 10 Common PRDEGs were mostly positively correlated with each other, i.e. the PRDEGs have mostly positive interactions with each other.
Correlation analysis of Common PRDEGs. (A) LASSO regression models for PRDEGs and SVM models, Venn plots for PRDEGs in random forest models. (B,C) Chord plots for the 10 Common PRDEGs (B), correlation heat map (C). (D–I) Correlation scatter plots for Common PRDEGs: GSK3B and STAT3 (D), ATF6 and GSK3B (E), IKBKE and NDUFA13 (F), BNIP3L and IKBKE (G), ELAVL1 and STAT3 (H), and BIRC2 and NDUFA13 (I) correlation scatterplot results are shown. P ≥ 0.05, not statistically significant; P < 0.05, statistically significant; P < 0.01, highly statistically significant; P < 0.001, highly statistically significant. Correlation coefficients (r) in correlation scatter plots with absolute values above 0.8 are strongly correlated; absolute values between 0.5 and 0.8 are moderately correlated; absolute values between 0.3 and 0.5 are weakly correlated; absolute values below 0.3 are weakly or not correlated. LASSO least absolute shrinkage and selection operator, SVM support vector machine, PRDEGs pyroptosis-related differentially expressed genes.
Further correlation analysis was performed on the expression of the 10 Common PRDEGs in the IS dataset and the results were presented by correlation pheatmap (Fig. 6C), which showed that most of the correlations between the 10 Common PRDEGs in the IS dataset were statistically significant (P < 0.05), and then the results were presented by plotting correlation scatter plots separately for each of the three pairs with the highest and lowest correlations (Fig. 6D–I), with the three pairs with the highest positive correlations GSK3B and STAT3 (Fig. 6D, r = 0.606, P < 0.001); ATF6 and GSK3B (Fig. 6E, r = 0.564, P < 0.001); IKBKE and NDUFA13 (Fig. 6F, r = 0.500, P < 0.001) and the three most negatively correlated gene pairs: BNIP3L and IKBKE (Fig. 6G, r = − 0.566, P < 0.001); ELAVL1 and STAT3 (Fig. 6H, r = − 0.403, P < 0.001); BIRC2 and NDUFA13 (Fig. 6I, r = − 0.396, P < 0.001).
Analysis of differences between high and low-risk groups and functional enrichment analysis (GO)
To analyze the differences in gene expression in the IS samples in the high and low-risk groups of the PRDEGs-based diagnostic model, we used the limma package for differential analysis of the IS dataset to obtain DEGs between the different subgroups (High/Low-Riskscore). A total of 12,733 DEGs were obtained from the IS dataset, of which 12 genes met the threshold of |logFC|> 0.5 and P.adj < 0.05. Under this threshold, 9 genes were highly expressed in the High-Riskscore group (low expression in the Low-Riskscore group with positive logFC and up-regulated genes), and they are PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, and XK; and three genes have low expression in the High-Riskscore group (high expression in the Low-Riskscore group with logFC is negative), and they are GZMK, VPREB3, and HLA-DQA1. The results of the differential analysis of the IS dataset were plotted on a volcano map (Fig. 7A).
Analysis of the differences between high and low-risk groups and functional enrichment analysis (GO). (A) Volcano plot of DEGs analysis between different subgroups (High/ Low-Riskscore) in IS dataset. (B) Complex numerical heat map of DEGs in the IS dataset. (C) Bubble plot display of GO functional enrichment analysis results of DEGs (A). (D) Network plot display of GO functional enrichment analysis results of DEGs. (E) Bubble plot display of GO functional enrichment analysis of PRDEGs combined with logFC. Bubble plot display of GO functional enrichment analysis for PRDEGs combined with logFC. The vertical coordinate in the bubble plot (C) is the GO terms, and the bubble color indicates the number of molecules contained in the GO terms. The orange dots in the network diagram (D) represent specific genes and the cyan circles represent specific pathways. The cyan dots in the bubble diagram (E) represent the BP pathway, the orange circles represent the CC pathway; the dark blue circles represent the MF pathway. The screening criteria for GO enrichment entries are P < 0.05 and FDR value (q.value) < 0.05. IS ischemic stroke, DEGs differentially expressed genes, GO gene ontology, BP biological process.
The expression of the 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) in different subgroups (High/Low-Riskscore) of the IS dataset was analyzed and the results of the differential analysis were plotted according to the results obtained from the volcano plot (Fig. 7B). The differential expression profiles were analyzed, and heat maps were plotted to show the results of the differential analysis (Fig. 7B), from which it can be seen that the 12 DEGs had significantly different expression profiles in different subgroups (High/Low-Riskscore) (Fig. 7B).
To analyze the relationship between the biological processes, molecular functions, cellular components and biological pathways of the 12 DEGs and IS disease, a GO (Gene Ontology) gene function enrichment analysis (Table 1) was performed on the 12 DEGs. The screening criteria for enrichment entries were P < 0.05 and FDR values (q.value) < 0.05, which were considered statistically significant. The results showed that the 12 DEGs in IS disease were mainly enriched in humoral immune response, leukocyte migration, blood coagulation, chemokine-mediated signaling pathway, response to chemokine, neutrophil chemotaxis, and other biological processes, as well as cellular components such as platelet alpha granule lumen and platelet alpha granule. These 12 DEGs were also enriched in cytokine activity, cytokine receptor binding, receptor ligand activity, chemokine activity, chemokine receptor binding, G-protein-coupled receptor binding and other molecular functions (MF). The results of GO function enrichment analysis are presented as bubble plots (Fig. 7C). In addition, the results were also presented as a network diagram (Fig. 7D). A joint logFC/GO enrichment analysis of these 12 DEGs was then performed, which is based on the enrichment analysis by providing the logFC values of the DEGs and calculating the corresponding z-score for each gene. The results were presented as bubble plots (Fig. 7E), from which it can be seen that the GO enrichment analysis results of the 12 DEGs were mainly concentrated in the BP pathway.
Pathway enrichment (KEGG) analysis of DEGs in high and low-risk groups
The 12 DEGs were analyzed for KEGG enrichment (Table 2) and the screening criteria of P < 0.05 and FDR value (q.value) < 0.05 were considered statistically significant. The results showed that the 12 DEGs were significantly enriched in chemokine signaling pathway, cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, Toll-like receptor signaling pathway, and toxoplasmosis in a total of five KEGG pathways (Fig. 8A). In addition, the pathway enrichment (KEGG) analysis was presented in the form of a chord diagram (Fig. 9B), and the pathway maps (Fig. 8C–G) were plotted and drawn by the R package Pathview to show KEGG enrichment in the Chemokine signaling pathway (Fig. 8C), Cytokine-cytokine receptor interaction (Fig. 8D), Viral protein interaction with cytokine and cytokine receptor (Fig. 8E), Toll-like receptor signaling pathway (Fig. 8F), and Toxoplasmosis (Fig. 8G) gene expression in the five KEGG pathways.
KEGG pathway enrichment analysis of the DEGs. (A,B) KEGG enrichment analysis of the IS dataset DEGs are shown in histogram (A), chord diagram (B). (C–G) KEGG pathway enrichment analysis of DEGs: Chemokine signaling pathway (C), Cytokine-cytokine receptor interaction (D), Viral protein interaction with cytokine and cytokine receptor (E), Toll-like receptor signaling pathway (F), Toxoplasmosis (G) KEGG. IS ischemic stroke, DEGs differentially expressed genes, KEGG Kyoto encyclopedia of genes and genomes. Screening criteria for KEGG-enriched entries were P < 0.05 and FDR value (q.value) < 0.05.
GSEA enrichment analysis of the high and low-risk groups of the IS diagnostic model. (A) Analysis of the main 6 biological features of GSEA enrichment between high and low-risk groups in the diagnostic model of the IS dataset. B-G. Significant enrichment of genes between high and low-risk groups in the PRDEGs-based diagnostic model: Brain Hcp With H3k27me3 (B), Neuroactive Ligand Receptor Interaction (C), Regulation Of Autophagy (D), Sars Coronavirus And Innate Immunity (E), Interferon Alpha Beta Signaling (F), IL2 Stat5 Pathway (G) pathway. IS ischemic stroke, PRDEGs pyroptosis-related differentially expressed genes, DEGs differentially expressed genes, GSEA gene set enrichment analysis. The significant enrichment screening criteria for GSEA enrichment analysis were P.adj < 0.05 and FDR value (q.value) < 0.05.
GSEA and GSVA enrichment analysis of the high and low-risk groups of the IS dataset
To determine the effect of the expression levels of metabolism-related DEGs in ischaemic stroke (IS) on the occurrence of IS, the entire gene expression and involvement of DEGs in the IS dataset was analyzed by GSEA between high and low-risk groups of the PRDEGs-based diagnostic model. The association between gene expression and biologic processes, cellular components affected, and molecular functions performed was analyzed by GSEA, with P < 0.05 and FDR value (q.value) < 0.05 as significant enrichment screening criteria.
The results show significant enrichment of genes between high and low-risk groups in the PRDEGs-based diagnostic model include Brain Hcp With H3k27me3 (Fig. 9B), Neuroactive Ligand Receptor Interaction (Fig. 9C), Regulation Of Autophagy (Fig. 9D), SARS Coronavirus and Innate Immunity (Fig. 9E), Interferon Alpha Beta Signaling (Fig. 10F), Il2 Stat5 Pathway (Fig. 9G) and other pathways (Fig. 9A–G, Table 3).
GSVA enrichment analysis between high and low-risk groups of the IS dataset. Complex numerical heat map display of GSVA enrichment analysis data between high and low-risk groups of the IS dataset (A), group comparison plot (B). IS ischemic stroke, PRDEGs pyroptosis-related differentially expressed genes, GSVA gene set variation analysis, GSVA gene set variation analysis. GSVA enrichment analysis results were based on P < 0.05 and |logFC|> 0.35 as significant enrichment screening criteria.
GSVA (Gene Set Variation Analysis) analysis was performed on all genes in the IS dataset between the high and low-risk groups to explore the variability of the c2.cp.all.v2022.1.Hs.symbols.gmt gene set between the high and low-risk groups of the IS dataset, with P < 0.05 and |logFC|> 0.35 as significant enrichment screening criteria (Fig. 10, Table 4). GSVA of all genes between high and low-risk groups in the IS dataset showed Gdp Fucose Biosynthesis, Synthesis Of Dolichyl Phosphate, Cytosine Methylation, Formation Of Xylulose 5 Phosphate, Beta Oxidation Of Butanoyl Coa To Acetyl Coa, Aurka Targets, Wax And Plasmalogen Biosynthesis, Cerebral Organic Acidurias Including Diseases, Chylomicron Clearance, Diseases Of Base Excision Repair, Mirna Biogenesis, Synthesis Of Ketone Bodies, Runx1 Regulates Transcription Of Genes Involved In Wnt Signaling, Melanin Biosynthesis, Aml Methylation Cluster 7 Dn, Cohesin Loading Onto Chromatin, Covid19 Thrombosis And Anticoagulation, Regulation Of Gene Expression By Hypoxia Inducible Factor, Signaling To p38 Via Rit And Rin a total of 19 gene set pathways showing differences between different subgroups in the IS dataset, with Melanin Biosynthesis, Aml Methylation Cluster 7 Dn, Cohesin Loading Onto Chromatin, Covid19 Thrombosis And Anticoagulation, Regulation Of Gene Expression By Hypoxia Inducible Factor, Signaling To P38 Via Rit And Rin, six pathways had significantly higher enrichment scores in the high-risk group of the IS dataset than in the low-risk group, while the other 13 pathways had significantly higher enrichment scores in the low-risk group of the IS dataset than in the high-risk group. The results of the differential expression of the 19 pathways in the different subgroups of the IS dataset were also analyzed and the results of the specific differential analysis were presented in a heat map (Fig. 10A). The results showed that the 19 pathways were significantly differentially expressed in the different subgroups (High/Low-Riskscore) of the IS dataset. The Mann–Whitney U-test analysis also showed that the expression of the 19 pathways in the IS dataset differed significantly in different subgroups (High/Low-Riskscore), except for the Wax And Plasmalogen Biosynthesis and Covid19 Thrombosis And Anticoagulation did not have statistically significant differences in enrichment scores (P > 0.05); and all other pathways had at least statistically significant differences (P < 0.05).
8.Construction of protein–protein interaction networks (PPI networks), mRNA-miRNA, mRNA-RBP and mRNA-TF interaction networks.
The STRING database was used to perform protein–protein interaction analysis [minimum required interaction score: low confidence (0.150)] on 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) between high and low risk groups of the PRDEGs diagnostic model in the IS dataset. protein–protein interaction analysis [minimum required interaction score: low confidence (0.150)], and a protein–protein interaction network (PPI network; Protein–protein interaction network) consisting of 12 DEGs was constructed (Fig. 11A). network) (Fig. 11A), and then the PRDEGs in the PPI network with links to other genes were used as key genes (hub genes) for IS disease, and 11 key genes (hub genes) were obtained, namely: PF4V1, LY96, PROS1, GNG11, PPBP, RGS18 These 11 hub genes were then analysed for functional similarity, and the GO terms, sets of GO terms, gene products and gene clusters were calculated using the R package GOSemSim. The results showed that among the 11 hub genes, VPREB3 had no functional similarity value with other hub genes, and LY96 had the highest functional similarity value with other hub genes. genes had the highest functional similarity values (Fig. 11B).
Construction of protein–protein interaction network (PPI network), mRNA-miRNA, mRNA-RBP and mRNA-TF interaction networks. (A) IS dataset PRDEGs diagnostic model protein interaction network (PPI network) between high and low risk groups of DEGs. (B) Functional similarity analysis results of key genes are presented. (C–E) mRNA-miRNA (C), mRNA-RBP (D), mRNA-TF (E) interaction networks of key genes. mRNA-miRNA (C), mRNA-RBP (D), mRNA-TF (E) interaction networks. -miRNA (C) Interacting network with sky-blue circular blocks as mRNA; orange hexagonal blocks as miRNA. mRNA-RBP (D) Interacting network with sky-blue oval blocks as mRNA; lavender diamond-shaped blocks as RBP. mRNA-TF (E) Interacting network with sky-blue oval blocks as mRNA; light green triangular blocks as transcription factor (TF). IS ischemic stroke, PRDEGs pyroptosis-related differentially expressed genes, DEGs differentially expressed genes, PPI network protein–protein interaction network, RBP RNA binding protein, TF transcription factors.
The mRNA-miRNA data in the miRDB database were used to predict the miRNAs that interacted with 11 key genes, and the mRNA-miRNA interaction network was drawn for visualization (Fig. 11C). mRNA-miRNA interaction network in the sky-blue oval blocks are mRNAs; orange hexagonal blocks are miRNAs. by mRNA-miRNA interaction The mRNA-miRNA interaction network was composed of 9 hub genes and 30 miRNA molecules, constituting a total of 32 pairs of mRNA-miRNA interactions.
The RNA binding protein (RBP) interacting with 11 key genes was predicted using the ENCORI database, and the mRNA-RBP interaction network was mapped for visualization (Fig. 11D). mRNA-RBP interaction network has sky blue circular blocks for mRNA and lavender diamond-shaped dot blocks for RBP. The mRNA-RBP interaction network consisted of 7 hub genes and 68 RBP molecules, constituting a total of 120 pairs of mRNA-RBP interactions, of which the hub genes PROS1 had interactions with 54 RBP molecules.
Finally, the CHIPBase database (version 3.0) and hTFtarget database were used to find transcription factors (TF) that bind to hub genes. The interaction relationships were intersected with 11 hub genes, and the interaction data of 7 hub genes and 70 transcription factors (TF) were obtained and visualized. mRNA-TF interaction network with sky blue oval blocks for mRNA; light green triangular blocks for transcription factors (TF) (Fig. 11E). mRNA-TF interaction network with key The key gene VPREB3 interacted with 56 transcription factors (TF).
9.Analysis of differences in ssgsea immune characteristics between high and low risk groups in the IS dataset dataset PRDEGs diagnostic model.
To investigate the differences in immune infiltration between the high and low risk groups of the IS dataset PRDEGs diagnostic model, the ssGSEA algorithm was used to calculate the abundance of infiltration of 28 immune cells in the samples between the high and low risk groups of the IS dataset, and then the Mann–Whitney U test was used to analyse the differences in infiltration of 28 immune cells between different IS disease subtypes. The degree of difference in the infiltration of 28 immune cells between different IS disease subtypes was then analysed by the Mann–Whitney U test and the results were presented by grouped comparison plots (Fig. 12A). The results showed statistically significant differences (p < 0.05) in the infiltration abundance of 10 immune cells between the high and low risk groups of the PRDEGs diagnostic model of the IS dataset, namely Activated B cell, Activated CD8 T cell, CD56dim natural killer cell Central memory CD4 T cell, Effector memory CD4 T cell, Effector memory CD8 T cell, Immature B cell, MDSC, Memory B cell, T follicular helper cell. The correlation between the infiltration abundance of the 10 immune cell types with statistical differences between the high-risk and low-risk groups was further calculated and the results were presented (Fig. 12B,C), showing that in both subgroups of the IS dataset (Fig. 12C), the infiltration abundance of the 10 immune cell types were mostly positively correlated with each other, and that Effector memory CD4 T cells and Activated CD8 T cells were positively correlated with each other. cells and Activated CD8 T cells (Fig. 12B,C).
Analysis of differences in ssgsea immune characteristics between high and low risk groups in the IS dataset dataset PRDEGs diagnostic model. (A) IS dataset dataset of the PRDEGs diagnostic model of ssGSEA immune infiltration analysis results grouped between high and low risk groups is presented. (B,C) IS dataset dataset of the low risk group (B), high risk group (C) correlation analysis results of immune cell infiltration abundance are presented. (D,E) IS dataset dataset of the low risk group (D), high risk group (E), and immune cell infiltration analysis results are presented. Heat map of correlation between immune cells and hub genes in the high-risk group (E). Symbol ns equals P ≥ 0.05, not statistically significant; symbols * equals P < 0.05, statistically significant; symbols ** equals P < 0.01, highly statistically significant; symbols *** equals P < 0.001, highly statistically significant. IS ischemic stroke, PRDEGs pyroptosis related differentially expressed genes, ssGSEA single-sample gene-set enrichment analysis.
The correlation between the abundance of 10 immune cell infiltrates and the expression of 11 key genes (hub genes: PF4V1, LY96, PROS1, GNG11, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) were also calculated for the low-risk (Fig. 12D) and high-risk (Fig. 12E) patient samples, respectively. The correlations between the amounts, screened at P < 0.05 and the results presented by correlation heat map (Fig. 12D,E), showed some amount of significant correlation between immune cell content and the expression of 11 hub genes in both subgroups, with the hub genes in the low risk group of the IS dataset (Fig. 12D) There was a negative correlation between IFNB1 and the infiltration abundance of most immune cells, while there was a positive correlation for the hub genes VPREB3. In the high-risk group of the IS dataset (Fig. 12E) there was a negative correlation between hub genes IFNB1, LY96 and the infiltration abundance of most immune cells, while there were relatively more significant positive correlations for hub genes VPREB3, HLA-DQA1, and GZMK.
Discussion
Ischemic stroke (IS) is a major global health problem. Despite rapid advances in medical care worldwide, stroke continues to be the leading cause of death and disability27. The disease does not only occur in elderly people, but the chances of developing it in younger people are increasing every year28,29,30, and the common risk factors are smoking, hypertension, hypercholesterolemia hyperglycemia. Similarly, stroke is a dangerous disease in the life and health of children and can even lead to serious neurological deficits31. Up to date, there is no reliable molecular biomarkers could be used in clinical practice to accurately predict IS prognosis.
Pyroptosis, an inflammation-caused type of cell death, is thought to be one of the major types of brain cell death after IS and factor that affects the progression of IS32. Suppression of pyroptosis could promote neuronal survival and ameliorate neurological damage, thereby improving the prognosis and reducing mortality in IS patients33,34,35. As brain tissue biopsy has not been well accepted by the patients and their relatives, studying the expression pattern of PRDEGs in the blood samples of IS patients deposited to IS dataset may provide a tool to predict the IS prognosis and targets for drug development.
Our study encompassed a comprehensive machine learning analysis of both the Control and the ischemic stroke (IS) samples. We first collected and combined three IS expression profile datasets into one integrated IS dataset consisting of 62 IS patients and 47 controls, with a total of 12,733 gene expression data in the blood samples. Among them, 1422 genes met the threshold of |logFC|> 0 and P. adj < 0.05 and of which 376 were highly expressed in IS group. Intersection of PRGs and DEGs further narrowed the number of genes down to 32. Logistic/LASSO regression analysis, random forest and SVM algorithms were performed on each of the 32 PRDEGs to construct accurate predicting models and determine the predicting value of PRDEGs in the dataset. Through a detailed evaluation involving differential gene expression pattern analysis and ROC analysis (Fig. 4), we successfully identified a group of 10 critical PRDEGs (ATF6, BIRC2, BNIP3L, CTSG, ELAVL1, GSK3B, IKBKE, NDUFA13, STAT3, and VIM). These genes participate in diverse cellular processes, and their interconnection with IS is a subject of ongoing investigation. We have delved into some of the implications of these gene associations below and discuss some potential therapeutic interventions for IS.
The differential gene expression analysis of IS samples indicates an up-regulation of ATF6 (Activating Transcription Factor 6) (Fig. 4A). Furthermore, ATF6 demonstrates a moderate level of predictive power for IS against the Control, as evidenced by an AUROC value of 0.67 (Fig. 4B). This observation can be attributed to ATF6’s role as a cellular response mechanism that aids cells in managing stress resulting from the accumulation of misfolded proteins in the endoplasmic reticulum (ER), induced by various cellular stressors including ischemia36. This alignment with existing literature underscores ATF6’s significance in the context of ischemic stroke. The practical implication of this discovery lies in the potential utility of ATF6-activating compounds to mitigate ischemic stroke-related cellular stress. Notably, compounds such as BiX (1-(3,4-dihydroxyphenyl)-2-thiocyanate-ethanone) and compound 147, as mentioned in reference36, have demonstrated the ability to activate ATF6, offering a foundation for the development of novel pharmacological therapies for ischemic stroke.
The analysis also revealed an up-regulation of BIRC2 (Baculoviral IAP Repeat Containing 2) in the IS group relative to the Control group, supported by a P-value of less than 0.001 (Fig. 4A). BIRC2 is a member of the inhibitor of apoptosis (IAP) family37. It plays a role in regulating cysteine aspartic enzyme pathway, inhibiting apoptosis and promoting cell cycle progression38,39,40. In cerebral ischemia, suppression of BIRC2 leads to exacerbated neurological functional deficits and increased apoptosis and infarct volume37. In contrast, upregulation of BIRC2 provides protection against type 2 diabetes and hypercholesterolemia, both of which are risk factors for inducing and aggravating ischemic stroke41,42,43. Remarkably, BIRC2 emerges as a potent diagnostic discriminator, as evidenced by an area under the ROC curve (AUROC) value of 0.79 (Fig. 4C), effectively distinguishing IS samples from the Controls. This finding mirrors a recent study by Zhang et al.44, aligning with our investigation and providing valuable insights for exploring the underlying mechanisms of ischemic stroke, as well as fostering the development of innovative diagnostic and therapeutic avenues.
BNIP3L (BCL2 Interacting Protein 3 Like), which is a mitochondrial autophagy receptor45,46, plays a neuroprotective role in IS47,48. The up-regulation of BNIP3L identified in our analysis (Fig. 4A) holds particular significance due to its involvement in activating mitophagy, a process that confers protective effects against IS damage. The AUROC value of 0.701 (Fig. 4C) underscores BNIP3L’s efficacy as a diagnostic marker, illuminating its role as a response mechanism to IS, aimed at safeguarding the ischemic brain region. Corroborating this, research by Li et al.49 indicates that BNIP3L knockout mice exhibit larger brain infarct areas and exacerbated neurological deficits, which can be ameliorated through BNIP3L overexpression. Consequently, compounds capable of stimulating BNIP3L expression offer promise as therapeutic agents against IS. For instance, hydroxamic acid-based histone deacetylase (HDAC) inhibitors have been identified as inducers of BNIP3L expression50, warranting exploration as potential small molecule drug candidates for this purpose.
Cathepsin G (CTSG) is a serine protease of the trypsin C family51 that is released directly at the site of inflammation, causing platelet secretion and aggregation, and plays an important role in cell signaling, extracellular matrix degradation and cytokine processing, and could be pro-inflammatory or promote thrombosis when platelets interact with neutrophils52,53,54. Our analysis further indicates an up-regulation of CTSG (Cathepsin G) in the context of ischemic stroke (IS) compared to the Control group (Fig. 4A). Notably, this up-regulation is statistically significant, denoted by a P-value of 0.01. Moreover, the area under the ROC curve (AUROC) value of 0.66 (Fig. 4C) underscores CTSG’s diagnostic potential for distinguishing IS samples from Controls. This observation assumes significance given CTSG’s implication as an inflammation-related gene product. Building upon prior research, which demonstrated that knockdown of CTSG inhibits apoptosis55,56 and that the reduction of myocyte death post cardiac ischemia and reperfusion upon CTSG inhibition57, the prospect of CTSG inhibitors emerges as a promising avenue for potential therapeutic intervention against IS. A pertinent example is the Cathepsin G inhibitor Bis-Napthyl Beta-Ketophosphonic Acid (DrugBank ID: DB02360), which holds potential as a small molecule drug candidate for ischemic stroke.
Within our analysis, ELAVL1 exhibits down-regulation in the IS group relative to the Control group (Figs. 4A and D). This modulation is hypothesized to be a response mechanism to mitigate neuronal impairment in cerebral ischemia/reperfusion events. Support for this notion is derived from a study by Du et al.58, which underscores the role of ELAVL1 in ferroptosis-induced neuronal protection. Although explicit ELAVL1-downregulating compounds are not yet established, ongoing research aims to identify small molecules capable of disrupting the interaction between ELAVL1 and its target mRNAs. Such molecules have the potential to modulate the stabilizing influence of ELAVL1 on mRNA molecules, thereby influencing changes in gene expression59.
GSK3B (glycogen synthase kinase 3 beta) exhibits up-regulation in the IS samples based on our analysis (Fig. 4A). The AUROC value of 0.65 (Fig. 4D) highlights GSK3B’s robust discriminatory power in separating the IS samples from the Control samples. Research by Li et al.60 underscores the potential of GSK3 inhibition in reducing cerebral ischemia/reperfusion injury. In light of this, GSK3B inhibitors emerge as a promising avenue for the treatment of ischemic stroke. For instance, Tideglusib, a chemical inhibitor of GSK3B, has demonstrated a capacity to attenuate hypoxic-ischemic brain injury in neonatal mice61. This supports the potential utility of GSK3B inhibitors, such as Tideglusib, as candidates for consideration in the context of therapeutic interventions for IS patients.
IKBKE (inhibitor of nuclear factor kappa B kinase subunit epsilon) is a gene involved in pyroptosis during the ischemic stroke (IS). Its expression is down-regulated compared to the Control group (Fig. 4A). The area under the ROC curve (AUROC) is 0.72 (Fig. 4D), indicating a moderate level of statistical power in distinguishing IS samples from Control samples. Given the context-dependent nature of IKBKE’s function62, it is suggested, based on our analysis, that further in vitro and in vivo tests should be conducted with various modulators of this target to identify potential drug candidates relevant for treating IS.
STAT3 (Signal transducer and activator of transcription 3) (Fig. 4A) is another factor differentially expressed, with an AUROC of 0.705, suggesting a moderate level of power in distinguishing the IS samples from the Control samples. Studies have highlighted the essential role of STAT3 in maintaining cerebrovascular integrity and its significance in providing protection against cerebral ischemia. Notably, endothelial STAT3 could serve as a potential therapeutic target for mitigating endothelial dysfunction post-stroke63.
NDUFA13 (NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 13) is down-regulated (Fig. 4A), with an AUROC of 0.79 (Fig. 4F). Notably, studies have shown that a moderate down-regulation of NDUFA13 is associated with suppressed superoxide burst and reduced infarct size during the ischemia–reperfusion process64, highlighting the significant role that NDUFA13 plays in ischemic stroke.
Our analysis demonstrates an upregulation of VIM (Vimentin) (Fig. 4A), with an AUROC of 0.704 (Fig. 4I). VIM is a cytoskeletal intermediate filament protein65,66 that maintains the capacity for tissue repair and tissue integrity67,68. VIM, a protein coding gene, recruits inflammatory cells from the bloodstream and enhances inflammatory responses. It impairs vascular function by regulating the vascular tone and affects blood vessel remodeling67, which could exacerbate stroke progression and hinder the recovery. Interestingly, circulating VIM is positively associated with prediction of future IS incidence in a population–based cohort study69. Furthermore, VIM is associated with stroke risk factors with a high expression in diabetic pancreas islet cells70,71 and mediates hyperglycemia caused disperse of Golgi apparatus72. VIM deficiency prevents high-fat diet-induced obesity and insulin resistance in mice73. These results suggest that VIM is closely associated with IS risk factors and IS itself. And furthermore, Fasipe et al.74 found that extracellular VIM significantly promotes the formation of von Willebrand factor (VWF) strings through A2 ___domain binding and pharmacologically targeting VIM/VWF interactions promotes improvements in reperfusion after IS. These studies demonstrate the critical role of VIM in stroke pathology and further investigation is necessary to confirm whether inhibiting VIM could contribute to the recovery of IS patients from ischemic brain injury.
By using |logFC|> 0.5 and P.adj < 0.05 as screening criteria, a total of 12 DEGs (PF4V1, LY96, PROS1, GNG11, SH3BGRL2, PPBP, RGS18, IFNB1, XK, GZMK, VPREB3, HLA-DQA1) with significant differences between high and low risk groups of the IS dataset were screened. To understand the potential roles, pathways and biological properties of the 12 DEGs in the high and low-risk groups in the IS dataset, KEGG, GO, GSEA and GSVA enrichment analyses were performed in this experiment. KEGG results showed that these 12 genes were mainly enriched in the five KEGG pathways of Chemokine signaling pathway, Cytokine-cytokine receptor interaction, Viral protein interaction with cytokine and cytokine receptor, Toll-like receptor signaling pathway, Toxoplasmosis.
GO enrichment analysis showed that these genes were mainly enriched in BP pathway such as humoral immune response, leukocyte migration, blood coagulation, MF such as cytokine activity, cytokine receptor binding, receptor ligand activity and CC such as platelet alpha granule lumen, platelet alpha granule. Taken together, the results of these enrichment analyses allow an assessment of the potential value of PRDEGs in disease diagnosis. Numerous studies have shown that pyroptosis is also involved in immunity, this and the result of the analysis of the GO in our study and KEGG is consistent. Scorching leads to a range of inflammatory and immune responses including cell swelling, plasma membrane lysis, and chromatin rupture75,76. Evidence suggests that ischemic stroke patients frequently exhibit pronounced immune-inflammatory responses77,78. The immune system has been shown to be involved in all phases of ischemic stroke, from the early damage events that have occurred with arterial occlusion to later tissue repair, and the damaged brain in turn exerts immunosuppressive effects that promote fatal disturbances and dangerous post-stroke patient survival79,80.
Materials and methods
Data sources and pre-processing
We downloaded the expression profile datasets GSE2225581, GSE14027582 and GSE1656183 for patients with IS from the GEO database84 using the R package GEOquery85. All datasets were obtained from Homo sapiens.
The GSE22255 dataset consists of 40 samples of gene expression profiles of peripheral blood mononuclear cells (PBMCs) from IS patients and age-matched healthy adults: 20 IS patients and 20 age-matched healthy adults. The GSE140275 dataset includes gene expression profiles from blood samples of 3 acute ischemic stroke (AIS) patients and 3 healthy controls. The GSE16561 dataset consists of 63 samples of GEP data from whole blood samples from 39 patients with IS and 24 healthy control subjects.
The data platforms for GSE22255, GSE140275, and GSE16561 are GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array, GPL16791 Illumina HiSeq 2500 (Homo sapiens), and GPL6883 Illumina HumanRef-8 v3.0 expression beadchip, respectively. The probe annotation for these datasets was done using the corresponding GPL platform files. All samples from these datasets were included for further analysis. Please refer to Tables 5,6 for specific dataset information.
We collected Pyroptosis-Related Genes (PRGs) from the GeneCards database86 (https://www.genecards.org/), which provides comprehensive information on human genes. A total of 372 PRGs at the mRNA level were obtained using the term “pyroptosis” as a search term and “protein coding” as a filtering criterion. In addition, we used MSigDB (Molecular Signatures Database)87, with "pyroptosis " as the search term, to find 27 genes from the REACTOME_PYROPTOSIS.v2022.1.Hs reference gene set. After merging and de-duplication, we obtained a total of 372 PRGs as described in Table S1.
Differentially expressed genes associated with ischemic stroke
To identify the potential mechanisms, biological features, and pathways of differential genes in IS, the datasets GSE22255, GSE140275 and GSE16561 were combined to remove the batch effect, and then the three datasets were standardized to obtain the IS dataset. The expression matrix of the dataset before and after removal of the batch effect was subjected to principal component analysis (PCA)88. The effect of removing the batch effect was validated.
Differential analysis was then performed on the IS dataset using the LIMMA package, and DEGs (differentially expressed genes) were selected for further study using the following criteria: |logFC|> 0 and P.adj < 0.05. Among these DEGs, genes with logFC > 0 and P.adj < 0.05 were considered upregulated DEGs, while genes with logFC < 0 and P.adj < 0.05 were considered downregulated DEGs.
The DEGs were intersected with the 372 Pyroptosis-Related Genes (PRGs) and a Venn diagram was used to obtain the IS disease-related Pyroptosis-Related Differentially Expressed Genes (PRDEGs) (Table 7). The results of the analysis are presented in a volcano map and a heat map.
Constructing predictive models for ischemic stroke based on PRDEGs
Logistic regression analysis of the integrated IS dataset was performed to obtain a predictive model based on the PRDEGs. Logistic regression models were constructed by screening PRDEGs with a P-value < 0.10, and the expression of the PRDEGs included in the logistic regression models was presented in a Forest Plot.
Next, we applied LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis to the PRDEGs included in the above logistic regression model using the R package glmnet89. The parameters set. seed (500) and family = “binomial” were used, and the analysis was run for 200 cycles. LASSO regression analysis, based on linear regression, reduces model overfitting and improves model generalization by adding a penalty term (lambda × absolute value of slope). The results of the LASSO regression analysis were plotted on a Nomogram by R package rms90.
RandomForest (RF)91 is an algorithm that integrates multiple decision trees through ensemble learning and belongs to the bagging (bootstrap aggregation) category of ensemble algorithms. By constructing multiple decision trees, it predicts a sample by aggregating the predictions from each tree and selecting the final result through voting. We used the RandomForest package92 to perform the model construction with the expression matrix of PRDEGs in the IS dataset. Parameters set. seed (234) and ntree = 1000.
Furthermore, we constructed an SVM (Support Vector Machine)93 model based on the Pyroptosis-Related DEGs (PRDEGs) and selected those PRDEGs based on the highest accuracy and lowest error rate. The PRDEGs included in the LASSO regression model, SVM model, and Random Forest model were intersected, and a Venn diagram was plotted to obtain the common PRDEGs.
Finally, we evaluated the accuracy and discriminative power of the PRDEGs diagnostic model using calibration analysis and plotted a calibration curve based on the pyroptosis-related hub genes. Decision Curve Analysis (DCA)94 (a simple method for evaluating clinical prediction models, diagnostic tests, and molecular markers) was used to assess the accuracy and discriminative power of the PRDEGs diagnostic model. The DCA plot evaluating the PRDEGs diagnostic model was generated using the R package ggDCA based on the pyroptosis-related DEGs (PRDEGs).
Differentially expressed genes associated with high and low-risk groups in a diagnostic model of PRDEGs
To identify the potential mechanisms, biological features, and pathways associated with differentially expressed genes (DEGs) in the high-risk and low-risk groups of the IS dataset PRDEGs diagnostic model, we performed differential analysis using the LIMMA package on the IS dataset. DEGs were selected based on the criteria of |logFC|> 0.5 and P < 0.05 for comparisons between different groups (High vs. Low). These DEGs, meeting the criteria, were further investigated in our study. Among them, genes with logFC > 0.5 and P < 0.05 were considered up-regulated DEGs, while genes with logFC < -0.5 and P < 0.05 were considered down-regulated DEGs.
Gene function enrichment analysis (GO) and pathway enrichment (KEGG) analysis
Gene Ontology (GO) analysis95 is a commonly used method for large-scale functional enrichment studies, including biological processes (BP), molecular functions (MF), and cellular components (CC). Kyoto Encyclopedia of Genes and Genomes (KEGG)96 is a widely used database that stores information on genomes, biological pathways, diseases, drugs, and more. We used the clusterProfiler package97 in R to perform GO annotation analysis on the DEGs. The criteria for selecting enriched terms were P.adj < 0.05 and false discovery rate (FDR) value (q-value) < 0.05, indicating statistical significance. The Benjamini-Hochberg (BH) method was used for P correction.
Gene set enrichment analysis and gene set variation analysis
GSEA (Gene Set Enrichment Analysis)98 is a computational method used to determine whether a set of pre-defined genes show statistical differences between two biological states and is commonly used to estimate changes in pathway and biological process activity in samples from expression datasets. In this study, we first divided the genes in the high and low-risk groups of the PRDEGs diagnostic model in the IS dataset into two groups of high and low phenotypic correlation based on phenotypic correlation ranking, and then we used the clusterProfiler package to perform an enrichment analysis of all genes in the two groups of high and low phenotypic correlation. The parameters used in this GSEA were as follows: seed of 2022, calculated number of times was 10,000, each gene set contained at least 10 genes and at most 500 genes, and the P correction method was Benjamini-Hochberg (BH). We obtained the "c2.all.v7.5.1.symbols.gmt" gene set from the Molecular Signatures Database (MSigDB). The screening criteria for significant enrichment were P.adj < 0.05 and FDR value (q. value) < 0.05.
GSVA (Gene Set Variation Analysis)99, known as Gene Set Variation Analysis, is a non-parametric, unsupervised analysis method that evaluates gene set enrichment in microarray transcriptomes by converting gene expression matrices between samples into gene set expression matrices between samples. This is used to assess whether different pathways are enriched across samples. We obtained the "c2.cp.all.v2022.1.Hs.symbols.gmt" gene set from the MSigDB database and performed GSVA on all genes in the PRDEGs diagnostic model high and low-risk groups in the IS dataset to calculate the functional enrichment of genes in the PRDEGs diagnostic model high and low-risk groups. The functional enrichment differences between the risk-groups were calculated, and P < 0.05 and |logFC|> 0.35 were used as the significant enrichment screening criteria.
Protein–protein interaction networks
The STRING database100 was used to construct DEGs-related protein–protein interaction networks from DEGs in the high and low risk groups of the filtered IS dataset PRDEGs diagnostic model, visualise the PPI network using Cytoscape101 and include DEGs in the PPI network with links to other nodes and as key genes (hub genes) for IS diseases.
Construction of mRNA-miRNA, mRNA-RBP, and mRNA-TF interactions networks
Using the miRDB database102 predicted miRNAs interacting with hub genes and found mRNA-miRNA interactions for key genes (hub genes) with a Target Score ≥ 90 on the data section to map mRNA-miRNA interaction networks. The ENCORI database was also used103 RNA binding proteins (RBPs) interacting with key genes were predicted and mRNA-RBP pairs were screened using clusterNum > 1 and clipExpNum > 1 as screening criteria and mRNA-RBP interaction networks were mapped. Through the CHIPBase database104 (version 3.0) and the hTFtarget database105. Transcription factors (TFs) that bind to key genes (hub genes) were identified and the common parts of both databases were retained. Cytoscape software was then used to visualize mRNA-miRNA, mRNA-RBP, and mRNA-TF interactions networks.
Statistical analysis
All data processing and analysis in this article was based on R software (Version 4.2.2), and continuous variables are presented as mean ± standard deviation. Comparisons between two groups were made using the Wilcoxon rank sum test (i.e., Wilcoxon rank sum test), while comparisons between three or more groups were made using the Kruskal–Wallis test.
Limitation
In this study, we described the gene expression profiles related to pyroptosis in ischemic stroke using machine learning to identify potential targets for drug repurposing. However, owing to the limitations of single-cell sequencing, such as the inability to accurately describe low-expression genes, large sample sizes are required for reliable analysis. Additionally, owing to the lack of corresponding clinical specimen research, it cannot be analyzed in combination with clinical information. We may conduct future research in this direction using multigroup science and space transcriptome technology.
Conclusion
Our comprehensive machine learning analysis of Control and ischemic stroke (IS) samples has yielded valuable insights into the complexity of PRDEGs and their relevance to IS. Through detailed examination of differential gene expression patterns and ROC analysis, we have identified a critical set of 10 PRDEGs that contribute to diverse cellular processes. Our findings underscore the potential interplay between these genes and IS, shedding light on their diagnostic and therapeutic significance.
ATF6’s up-regulation in IS samples reflects its crucial role as a cellular stress-response mechanism. This discovery opens avenues for ATF6-targeted interventions to alleviate IS-induced cellular stress. BIRC2 emerges as a potent diagnostic discriminator with clinical implications, aligning with recent research, and offering insights into IS mechanisms. BNIP3L’s up-regulation and its involvement in mitophagy indicate its protective role against IS damage. This observation emphasizes the prospect of stimulating BNIP3L expression for IS therapeutics. CTSG’s up-regulation and its potential as a diagnostic marker underscore its significance as an inflammation-related gene product. ELAVL1’s down-regulation reflects its potential role in mitigating neuronal impairment, guiding future research into its regulatory mechanisms. GSK3B’s up-regulation holds promise for IS treatment, with Tideglusib as a potential candidate for drug repurposing. IKBKE’s down-regulation and STAT3’s differential expression underscore their roles, warranting further investigations. NDUFA13’s down-regulation implicates its contribution to IS pathophysiology, and VIM’s upregulation opens prospects for potential therapeutic interventions.
In summary, our study provides a comprehensive view of the 10 PRDEGs’ roles in IS, offering insights into potential targets for diagnostic and therapeutic approaches. By unraveling the intricate network of gene expressions associated with IS, this work contributes to the advancement of knowledge in this field, potentially paving the way for novel strategies in managing and treating ischemic stroke.
Data availability
Data and materials will be made available on request. Dataset URL Reference https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22255https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140275https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16561.
References
Rodgers, M. L. et al. Care of the patient with acute ischemic stroke (Endovascular/Intensive Care Unit-Postinterventional Therapy): Update to 2009 comprehensive nursing care scientific statement: A scientific statement from the American Heart Association. Stroke 52, e198–e210. https://doi.org/10.1161/str.0000000000000358 (2021).
Haupt, M., Gerner, S. T., Bähr, M. & Doeppner, T. R. Neuroprotective strategies for ischemic stroke-future perspectives. Int. J. Mol. Sci. https://doi.org/10.3390/ijms24054334 (2023).
Gu, C. et al. The PI3K/AKT pathway-the potential key mechanisms of traditional chinese medicine for stroke. Front. Med. (Lausanne) 9, 900809. https://doi.org/10.3389/fmed.2022.900809 (2022).
Pal, S., Hartnett, K. A., Nerbonne, J. M., Levitan, E. S. & Aizenman, E. Mediation of neuronal apoptosis by Kv2.1-encoded potassium channels. J. Neurosci. 23, 4798–4802. https://doi.org/10.1523/jneurosci.23-12-04798.2003 (2003).
Besancon, E., Guo, S., Lok, J., Tymianski, M. & Lo, E. H. Beyond NMDA and AMPA glutamate receptors: emerging mechanisms for ionic imbalance and cell death in stroke. Trends Pharmacol. Sci. 29, 268–275. https://doi.org/10.1016/j.tips.2008.02.003 (2008).
Shen, L. et al. Mitophagy in cerebral ischemia and ischemia/reperfusion injury. Front. Aging Neurosci. 13, 687246. https://doi.org/10.3389/fnagi.2021.687246 (2021).
Ward, R. et al. NLRP3 inflammasome inhibition with MCC950 improves diabetes-mediated cognitive impairment and vasoneuronal remodeling after ischemia. Pharmacol. Res. 142, 237–250. https://doi.org/10.1016/j.phrs.2019.01.035 (2019).
Tsuchiya, K. Inflammasome-associated cell death: Pyroptosis, apoptosis, and physiological implications. Microbiol. Immunol. 64, 252–269. https://doi.org/10.1111/1348-0421.12771 (2020).
Wen, H., Miao, E. A. & Ting, J. P. Mechanisms of NOD-like receptor-associated inflammasome activation. Immunity 39, 432–441. https://doi.org/10.1016/j.immuni.2013.08.037 (2013).
Vanaja, S. K., Rathinam, V. A. & Fitzgerald, K. A. Mechanisms of inflammasome activation: recent advances and novel insights. Trends Cell Biol. 25, 308–315. https://doi.org/10.1016/j.tcb.2014.12.009 (2015).
Yang, J. et al. Hippocampal changes in inflammasomes, apoptosis, and MEMRI after radiation-induced brain injury in juvenile rats. Radiat. Oncol. 15, 78. https://doi.org/10.1186/s13014-020-01525-3 (2020).
Jin, C. & Flavell, R. A. Molecular mechanism of NLRP3 inflammasome activation. J. Clin. Immunol. 30, 628–631. https://doi.org/10.1007/s10875-010-9440-3 (2010).
Shi, J., Gao, W. & Shao, F. Pyroptosis: Gasdermin-mediated programmed necrotic cell death. Trends Biochem. Sci. 42, 245–254. https://doi.org/10.1016/j.tibs.2016.10.004 (2017).
Yu, P. et al. Pyroptosis: mechanisms and diseases. Signal Transduct. Target Ther. 6, 128. https://doi.org/10.1038/s41392-021-00507-5 (2021).
Jiang, M., Qi, L., Li, L. & Li, Y. The caspase-3/GSDME signal pathway as a switch between apoptosis and pyroptosis in cancer. Cell Death Discov. 6, 112. https://doi.org/10.1038/s41420-020-00349-0 (2020).
Sepehrinezhad, A., Gorji, A. & Sahab Negah, S. SARS-CoV-2 may trigger inflammasome and pyroptosis in the central nervous system: a mechanistic view of neurotropism. Inflammopharmacology 29, 1049–1059. https://doi.org/10.1007/s10787-021-00845-4 (2021).
Zhang, D. et al. Evidence of pyroptosis and ferroptosis extensively involved in autoimmune diseases at the single-cell transcriptome level. J. Transl. Med. 20, 363. https://doi.org/10.1186/s12967-022-03566-6 (2022).
Voet, S., Srinivasan, S., Lamkanfi, M. & van Loo, G. Inflammasomes in neuroinflammatory and neurodegenerative diseases. EMBO Mol. Med. https://doi.org/10.15252/emmm.201810248 (2019).
Hu, X. et al. Emerging role of STING signalling in CNS injury: inflammation, autophagy, necroptosis, ferroptosis and pyroptosis. J. Neuroinflamm. 19, 242. https://doi.org/10.1186/s12974-022-02602-y (2022).
Shi, S. et al. Identification of pyroptosis-related immune signature and drugs for ischemic stroke. Front. Genet. 13, 909482. https://doi.org/10.3389/fgene.2022.909482 (2022).
Qin, X. et al. Identification of anoikis-related genes classification patterns and immune infiltration characterization in ischemic stroke based on machine learning. Front. Aging Neurosci. 15, 1142163. https://doi.org/10.3389/fnagi.2023.1142163 (2023).
Ren, P. et al. Diagnostic model constructed by nine inflammation-related genes for diagnosing ischemic stroke and reflecting the condition of immune-related cells. Front. Immunol. 13, 1046966. https://doi.org/10.3389/fimmu.2022.1046966 (2022).
Wang, X. et al. Is immune suppression involved in the ischemic stroke? A study based on computational biology. Front. Aging Neurosci. 14, 830494. https://doi.org/10.3389/fnagi.2022.830494 (2022).
Wei, Z. et al. Bioinformatics method combined with logistic regression analysis reveal potentially important miRNAs in ischemic stroke. Biosci. Rep. https://doi.org/10.1042/bsr20201154 (2020).
Martha, S. R. et al. Expression of cytokines and chemokines as predictors of stroke outcomes in acute ischemic stroke. Front. Neurol. 10, 1391. https://doi.org/10.3389/fneur.2019.01391 (2019).
O’Connell, G. C., Chantler, P. D. & Barr, T. L. Stroke-associated pattern of gene expression previously identified by machine-learning is diagnostically robust in an independent patient population. Genom. Data 14, 47–52. https://doi.org/10.1016/j.gdata.2017.08.006 (2017).
Feigin, V. L. Global, regional, and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 20, 795–820. https://doi.org/10.1016/s1474-4422(21)00252-0 (2021).
Perera, K. S. et al. Evaluating rates of recurrent ischemic stroke among young adults with embolic stroke of undetermined source: The young ESUS longitudinal cohort study. JAMA Neurol. 79, 450–458. https://doi.org/10.1001/jamaneurol.2022.0048 (2022).
Béjot, Y., Delpont, B. & Giroud, M. Rising stroke incidence in young adults: More epidemiological evidence, more questions to be answered. J. Am. Heart Assoc. https://doi.org/10.1161/jaha.116.003661 (2016).
Wu, S. et al. Stroke in China: advances and challenges in epidemiology, prevention, and management. Lancet Neurol. 18, 394–405. https://doi.org/10.1016/s1474-4422(18)30500-3 (2019).
Sporns, P. B. et al. Childhood stroke. Nat. Rev. Dis. Primers 8, 12. https://doi.org/10.1038/s41572-022-00337-x (2022).
Yang, K. et al. Research progress on pyroptosis-mediated immune-inflammatory response in ischemic stroke and the role of natural plant components as regulator of pyroptosis: A review. Biomed. Pharmacother. 157, 113999. https://doi.org/10.1016/j.biopha.2022.113999 (2023).
Tang, H. et al. Vagus nerve stimulation alleviated cerebral ischemia and reperfusion injury in rats by inhibiting pyroptosis via α7 nicotinic acetylcholine receptor. Cell Death Discov. 8, 54. https://doi.org/10.1038/s41420-022-00852-6 (2022).
Wang, M. et al. Decoction ameliorates ischemic stroke injury via suppressing pyroptosis. Front. Pharmacol. 11, 590453. https://doi.org/10.3389/fphar.2020.590453 (2020).
Ran, Y. et al. Curcumin ameliorates white matter injury after ischemic stroke by inhibiting microglia/macrophage pyroptosis through NF-κB suppression and NLRP3 inflammasome inhibition. Oxid. Med. Cell. Longev. 2021, 1552127. https://doi.org/10.1155/2021/1552127 (2021).
Glembotski, C. C., Rosarda, J. D. & Wiseman, R. L. Proteostasis and beyond: ATF6 in ischemic disease. Trends Mol. Med. 25, 538–550. https://doi.org/10.1016/j.molmed.2019.03.005 (2019).
Huang, L. G. et al. MicroRNA-29c correlates with neuroprotection induced by FNS by targeting both Birc2 and Bak1 in rat brain after stroke. CNS Neurosci. Ther. 21, 496–503. https://doi.org/10.1111/cns.12383 (2015).
Rumble, J. M. & Duckett, C. S. Diverse functions within the IAP family. J. Cell Sci. 121, 3505–3507. https://doi.org/10.1242/jcs.040303 (2008).
Li, M. et al. BRD7 inhibits enhancer activity and expression of BIRC2 to suppress tumor growth and metastasis in nasopharyngeal carcinoma. Cell Death Dis. 14, 121. https://doi.org/10.1038/s41419-023-05632-3 (2023).
Beug, S. T., Cheung, H. H., LaCasse, E. C. & Korneluk, R. G. Modulation of immune signalling by inhibitors of apoptosis. Trends Immunol. 33, 535–545. https://doi.org/10.1016/j.it.2012.06.004 (2012).
Xiong, Y. et al. Enhanced external counterpulsation inhibits endothelial apoptosis via modulation of BIRC2 and Apaf-1 genes in porcine hypercholesterolemia. Int. J. Cardiol. 171, 161–168. https://doi.org/10.1016/j.ijcard.2013.11.033 (2014).
Jamil, K., Jayaraman, A., Ahmad, J., Joshi, S. & Yerra, S. K. TNF-alpha -308G/A and -238G/A polymorphisms and its protein network associated with type 2 diabetes mellitus. Saudi J. Biol. Sci. 24, 1195–1203. https://doi.org/10.1016/j.sjbs.2016.05.012 (2017).
Zhao, L. et al. In silico analysis of novel dipeptidyl peptidase-IV inhibitory peptides released from Macadamia integrifolia antimicrobial protein 2 (MiAMP2) and the possible pathways involved in diabetes protection. Curr. Res. Food Sci. 4, 603–611. https://doi.org/10.1016/j.crfs.2021.08.008 (2021).
Zhang, Q. et al. Identification of key genes and upstream regulators in ischemic stroke. Brain Behav. 9, e01319. https://doi.org/10.1002/brb3.1319 (2019).
Yazdankhah, M. et al. BNIP3L-mediated mitophagy is required for mitochondrial remodeling during the differentiation of optic nerve oligodendrocytes. Autophagy 17, 3140–3159. https://doi.org/10.1080/15548627.2020.1871204 (2021).
Yuan, Y. et al. BNIP3L/NIX-mediated mitophagy protects against ischemic brain injury independent of PARK2. Autophagy 13, 1754–1766. https://doi.org/10.1080/15548627.2017.1357792 (2017).
Wu, X. et al. BNIP3L/NIX degradation leads to mitophagy deficiency in ischemic brains. Autophagy 17, 1934–1946. https://doi.org/10.1080/15548627.2020.1802089 (2021).
Li, Y. et al. BNIP3L/NIX-mediated mitophagy: molecular mechanisms and implications for human disease. Cell Death Dis. 13, 14. https://doi.org/10.1038/s41419-021-04469-y (2021).
Li, J. et al. Targeting neuronal mitophagy in ischemic stroke: an update. Burns Trauma 11, tkad018. https://doi.org/10.1093/burnst/tkad018 (2023).
Knutson, A. K. et al. Comparative effects of histone deacetylase inhibitors on p53 target gene expression, cell cycle and apoptosis in MCF-7 breast cancer cells. Oncol. Rep. 27, 849–853. https://doi.org/10.3892/or.2011.1590 (2012).
Aghdassi, A. A. et al. Absence of the neutrophil serine protease cathepsin G decreases neutrophil granulocyte infiltration but does not change the severity of acute pancreatitis. Sci. Rep. 9, 16774. https://doi.org/10.1038/s41598-019-53293-0 (2019).
Bank, U. & Ansorge, S. More than destructive: neutrophil-derived serine proteases in cytokine bioactivity control. J. Leukoc. Biol. 69, 197–206 (2001).
Wiedow, O. & Meyer-Hoffert, U. Neutrophil serine proteases: potential key regulators of cell signalling during inflammation. J. Intern. Med. 257, 319–328. https://doi.org/10.1111/j.1365-2796.2005.01476.x (2005).
Sambrano, G. R. et al. Cathepsin G activates protease-activated receptor-4 in human platelets. J. Biol. Chem. 275, 6819–6823. https://doi.org/10.1074/jbc.275.10.6819 (2000).
Chan, S. et al. CTSG suppresses colorectal cancer progression through negative regulation of Akt/mTOR/Bcl2 signaling pathway. Int. J. Biol. Sci. 19, 2220–2233. https://doi.org/10.7150/ijbs.82000 (2023).
Huang, G. Z. et al. Bioinformatics analyses indicate that cathepsin G (CTSG) is a potential immune-related biomarker in oral squamous cell carcinoma (OSCC). Onco Targets Ther. 14, 1275–1289. https://doi.org/10.2147/ott.S293148 (2021).
Hooshdaran, B. et al. Dual inhibition of cathepsin G and chymase reduces myocyte death and improves cardiac remodeling after myocardial ischemia reperfusion injury. Basic Res. Cardiol. 112, 62. https://doi.org/10.1007/s00395-017-0652-z (2017).
Du, Y. et al. Downregulation of ELAVL1 attenuates ferroptosis-induced neuronal impairment in rats with cerebral ischemia/reperfusion via reducing DNMT3B-dependent PINK1 methylation. Metab. Brain Dis. 37, 2763–2775. https://doi.org/10.1007/s11011-022-01080-8 (2022).
Wu, X. et al. Identification and validation of novel small molecule disruptors of HuR-mRNA interaction. ACS Chem. Biol. 10, 1476–1484. https://doi.org/10.1021/cb500851u (2015).
Li, Y. et al. Glycogen synthase kinase 3β influences injury following cerebral ischemia/reperfusion in rats. Int. J. Biol. Sci. 12, 518–531. https://doi.org/10.7150/ijbs.13918 (2016).
Wang, H. et al. Tideglusib, a chemical inhibitor of GSK3β, attenuates hypoxic-ischemic brain injury in neonatal mice. Biochim. Biophys. Acta 1860, 2076–2085. https://doi.org/10.1016/j.bbagen.2016.06.027 (2016).
Patel, M. N. et al. Hematopoietic IKBKE limits the chronicity of inflammasome priming and metaflammation. Proc. Natl. Acad. Sci. U. S. A. 112, 506–511. https://doi.org/10.1073/pnas.1414536112 (2015).
Davis, C. M., Lyon-Scott, K., Varlamov, E. V., Zhang, W. H. & Alkayed, N. J. Role of endothelial STAT3 in cerebrovascular function and protection from ischemic brain injury. Int. J. Mol. Sci. https://doi.org/10.3390/ijms232012167 (2022).
Hu, H. et al. Electron leak from NDUFA13 within mitochondrial complex I attenuates ischemia-reperfusion injury via dimerized STAT3. Proc. Natl. Acad. Sci. U. S. A. 114, 11908–11913. https://doi.org/10.1073/pnas.1704723114 (2017).
Paulin, D., Lilienbaum, A., Kardjian, S., Agbulut, O. & Li, Z. Vimentin: Regulation and pathogenesis. Biochimie 197, 96–112. https://doi.org/10.1016/j.biochi.2022.02.003 (2022).
Thakur, G. K. et al. High resolution based quantitative determination of methylation status of CDH1 and VIM gene in epithelial ovarian cancer. Asian Pac. J. Cancer Prev. 20, 2923–2928. https://doi.org/10.31557/apjcp.2019.20.10.2923 (2019).
Ridge, K. M., Eriksson, J. E., Pekny, M. & Goldman, R. D. Roles of vimentin in health and disease. Genes Dev. 36, 391–407. https://doi.org/10.1101/gad.349358.122 (2022).
Ivaska, J., Pallari, H. M., Nevo, J. & Eriksson, J. E. Novel functions of vimentin in cell adhesion, migration, and signaling. Exp. Cell Res. 313, 2050–2062. https://doi.org/10.1016/j.yexcr.2007.03.040 (2007).
Xiao, J. et al. Circulating vimentin is associated with future incidence of stroke in a population-based cohort study. Stroke 52, 937–944. https://doi.org/10.1161/strokeaha.120.032111 (2021).
Roefs, M. M. et al. Increased vimentin in human α- and β-cells in type 2 diabetes. J. Endocrinol. 233, 217–227. https://doi.org/10.1530/joe-16-0588 (2017).
Vasir, B. et al. Effects of diabetes and hypoxia on gene markers of angiogenesis (HGF, cMET, uPA and uPAR, TGF-alpha, TGF-beta, bFGF and Vimentin) in cultured and transplanted rat islets. Diabetologia 43, 763–772. https://doi.org/10.1007/s001250051374 (2000).
Li, Y. et al. Inhibition of NLRP3 and Golph3 ameliorates diabetes-induced neuroinflammation in vitro and in vivo. Aging (Albany NY) 14, 8745–8762. https://doi.org/10.18632/aging.204363 (2022).
Kim, S., Kim, I., Cho, W., Oh, G. T. & Park, Y. M. Vimentin Deficiency prevents high-fat diet-induced obesity and insulin resistance in mice. Diabetes Metab. J. 45, 97–108. https://doi.org/10.4093/dmj.2019.0198 (2021).
Fasipe, T. A. et al. Extracellular Vimentin/VWF (von Willebrand Factor) interaction contributes to VWF string formation and stroke pathology. Stroke 49, 2536–2540. https://doi.org/10.1161/strokeaha.118.022888 (2018).
Li, L. et al. Pyroptosis, a new bridge to tumor immunity. Cancer Sci. 112, 3979–3994. https://doi.org/10.1111/cas.15059 (2021).
Liu, J. et al. Pyroptosis-related lncRNAs are potential biomarkers for predicting prognoses and immune responses in patients with UCEC. Mol. Ther. Nucleic Acids 27, 1036–1055. https://doi.org/10.1016/j.omtn.2022.01.018 (2022).
Davì, G. et al. CD40 ligand and MCP-1 as predictors of cardiovascular events in diabetic patients with stroke. J. Atheroscler. Thromb. 16(6), 707–713. https://doi.org/10.5551/jat.1537 (2009).
Tuttolomondo, A. et al. HLA and killer cell immunoglobulin-like receptor (KIRs) genotyping in patients with acute ischemic stroke. J. Neuroinflamm. 16(1), 88. https://doi.org/10.1186/s12974-019-1469-5 (2019).
Iadecola, C. & Anrather, J. The immunology of stroke: from mechanisms to translation. Nat. Med. 17, 796–808. https://doi.org/10.1038/nm.2399 (2011).
Moskowitz, M. A., Lo, E. H. & Iadecola, C. The science of stroke: mechanisms in search of treatments. Neuron 67, 181–198. https://doi.org/10.1016/j.neuron.2010.07.002 (2010).
Krug, T. et al. TTC7B emerges as a novel risk factor for ischemic stroke through the convergence of several genome-wide approaches. J. Cereb. Blood Flow Metab. 32, 1061–1072. https://doi.org/10.1038/jcbfm.2012.24 (2012).
Li, S. et al. Expression profile and bioinformatics analysis of circular RNAs in acute ischemic stroke in a South Chinese Han population. Sci. Rep. 10, 10138. https://doi.org/10.1038/s41598-020-66990-y (2020).
O’Connell, G. C. et al. Peripheral blood AKAP7 expression as an early marker for lymphocyte-mediated post-stroke blood brain barrier disruption. Sci. Rep. 7, 1172. https://doi.org/10.1038/s41598-017-01178-5 (2017).
Barrett, T. et al. NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res. 35, D760-765. https://doi.org/10.1093/nar/gkl887 (2007).
Davis, S. & Meltzer, P. S. GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847. https://doi.org/10.1093/bioinformatics/btm254 (2007).
Stelzer, G. et al. The GeneCards suite: From gene data mining to disease genome sequence analyses. Curr. Protoc Bioinform. 54, 13031–313033. https://doi.org/10.1002/cpbi.5 (2016).
Liberzon, A. et al. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425. https://doi.org/10.1016/j.cels.2015.12.004 (2015).
Ben Salem, K. & Ben Abdelaziz, A. Principal component analysis (PCA). Tunis Med. 99, 383–389 (2021).
Engebretsen, S. & Bohlin, J. Statistical predictions with glmnet. Clin. Epigenet. 11, 123. https://doi.org/10.1186/s13148-019-0730-1 (2019).
Park, S. Y. Nomogram: An analogue tool to deliver digital knowledge. J. Thorac. Cardiovasc. Surg. 155, 1793. https://doi.org/10.1016/j.jtcvs.2017.12.107 (2018).
Gruber, H. E., Hoelscher, G. L., Ingram, J. A. & Hanley, E. N. Jr. Genome-wide analysis of pain-, nerve- and neurotrophin -related gene expression in the degenerating human annulus. Mol. Pain 8, 63. https://doi.org/10.1186/1744-8069-8-63 (2012).
Liu, Y. & Zhao, H. Variable importance-weighted random forests. Quant. Biol. 5, 338–351 (2017).
Cai, W. & van der Laan, M. Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator. Int. J. Biostat. https://doi.org/10.1515/ijb-2017-0070 (2020).
Van Calster, B. et al. Reporting and interpreting decision curve analysis: A guide for investigators. Eur. Urol. 74, 796–804. https://doi.org/10.1016/j.eururo.2018.08.038 (2018).
Yu, G. Gene ontology semantic similarity analysis using GOSemSim. Methods Mol. Biol. 2117, 207–215. https://doi.org/10.1007/978-1-0716-0301-7_11 (2020).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. https://doi.org/10.1093/nar/28.1.27 (2000).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287. https://doi.org/10.1089/omi.2011.0118 (2012).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545–15550. https://doi.org/10.1073/pnas.0506580102 (2005).
Wang, H. Q. et al. Deregulated miR-155 promotes Fas-mediated apoptosis in human intervertebral disc degeneration by targeting FADD and caspase-3. J. Pathol. 225, 232–242. https://doi.org/10.1002/path.2931 (2011).
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607-d613. https://doi.org/10.1093/nar/gky1131 (2019).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504. https://doi.org/10.1101/gr.1239303 (2003).
Chen, Y. & Wang, X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 48, D127-d131. https://doi.org/10.1093/nar/gkz757 (2020).
Li, J. H. et al. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 42, D92-97. https://doi.org/10.1093/nar/gkt1248 (2014).
Zhou, K. R. et al. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res. 45, D43-d50. https://doi.org/10.1093/nar/gkw965 (2017).
Zhang, Q. et al. hTFtarget: A comprehensive database for regulations of human transcription factors and their targets. Genom. Proteom. Bioinform. 18, 120–128. https://doi.org/10.1016/j.gpb.2019.09.006 (2020).
Acknowledgements
We gratefully appreciate the financial support provided by the National Natural Science Foundation of China (Grant numbers 82060237 to Changchun Hei, 82260255 to Xiao Yang, and 82060156 to Ping Liu), the Natural Science Foundation of Ningxia Hui Autonomous Region (Grant number 2023AAC03217 to Changchun Hei), and the Research Fund of Ningxia Medical University (Grant number XZ2022002 to Changchun Hei) for facilitating this research. We also thank GEO database for providing their platforms and contributors for uploading their meaningful datasets.
Funding
This study was funded by the National Natural Science Foundation of China (No. 82060237, 82260255, 82060156), the Natural Science Foundation of Ningxia Hui, Autonomous Region (No. 2023AAC03217, 2024AAC02078), and the Research Fund of Ningxia Medical University (No. XZ2022002) for facilitating this research.
Author information
Authors and Affiliations
Contributions
XY, JN, CH conceived study concept and design. CH, XL, RW, PL, JP conducted data analysis and wrote the manuscript. XD, WZ contributed to the discussion of potential biological targets and drug repurposing hypotheses. XY, PAL revised and finalized the manuscript. All coauthors have reviewed and approved the submission.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
GEO is an open source public database. Users are permited to freely download relevant data for research and publication. This study is exempt from the approval of Institutional Review Board (IRB).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hei, C., Li, X., Wang, R. et al. Machine learning analysis of gene expression profiles of pyroptosis-related differentially expressed genes in ischemic stroke revealed potential targets for drug repurposing. Sci Rep 15, 7035 (2025). https://doi.org/10.1038/s41598-024-83555-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-83555-5
Keywords
This article is cited by
-
Identification and Validation of Glycosylation‑Related Genes in Ischemic Stroke Based on Bioinformatics and Machine Learning
Journal of Molecular Neuroscience (2025)