Machine learning-assisted multi-dimensional transcriptomic analysis of cytoskeleton-related molecules and their relationship with prognosis in hepatocellular carcinoma

Li, Yuxuan; Cao, Mingbo; Su, Xiaorui; Yang, Gaoyuan; Ren, Yupeng; He, Zhiwei; Shi, Zheng; Hu, Ziyi; Liang, Guirong; Zhang, Qi; Yao, Zhicheng; Deng, Meihai

doi:10.1038/s41598-025-10056-4

Download PDF

Article
Open access
Published: 03 July 2025

Machine learning-assisted multi-dimensional transcriptomic analysis of cytoskeleton-related molecules and their relationship with prognosis in hepatocellular carcinoma

Yuxuan Li¹^na1,
Mingbo Cao¹^na1,
Xiaorui Su²^na1,
Gaoyuan Yang¹,
Yupeng Ren¹,
Zhiwei He¹,
Zheng Shi¹,
Ziyi Hu²,
Guirong Liang²,
Qi Zhang³,
Zhicheng Yao² &
…
Meihai Deng¹

Scientific Reports volume 15, Article number: 23832 (2025) Cite this article

501 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide, with a poor prognosis due to its aggressive nature and limited treatment options. Cytoskeletal dynamics play a critical role in tumor progression, but the prognostic and therapeutic potential of cytoskeleton-related genes in HCC remains underexplored. In this study, transcriptomic data from the TCGA-LIHC dataset were used to identify differentially expressed cytoskeleton-related genes associated with overall survival (OS). Prognostic models were constructed using LASSO regression and random forest algorithms, and validated in two independent cohorts (ICGC LIRI-JP and CHCC-HBV). Single-cell sequencing (scRNA-seq) and spatial transcriptomics analyses explored the expression and functional roles of key genes, while drug screening and molecular docking identified potential therapeutic agents, followed by in vitro and in vivo validation. The analysis identified 110 cytoskeleton-related DEGs, with 13 significantly associated with OS. A robust five-gene prognostic model (ARPC1A, CCNB2, CKAP5, DCTN2, TTK) was developed using LASSO regression and validated across cohorts. The model was integrated into a clinical nomogram, demonstrating good calibration and utility. Single-cell and spatial transcriptomics revealed high expression of the five genes in malignant tissues and their association with immunosuppressive microenvironments. High-risk scores correlated with TP53 mutations. Drug screening identified irinotecan and sorafenib as potential agents targeting TTK, with combined treatment significantly inhibiting tumor growth in vitro and in vivo. This study highlights the prognostic and therapeutic significance of cytoskeleton-related genes in HCC. The five-gene model provides a reliable tool for risk stratification, and the irinotecan-sorafenib combination shows promise as a therapeutic strategy.

Multi-omics analysis identifies SNP-associated immune-related signatures by integrating Mendelian randomization and machine learning in hepatocellular carcinoma

Article Open access 04 July 2025

Seven chromatin regulators as immune cell infiltration characteristics, potential diagnostic biomarkers and drugs prediction in hepatocellular carcinoma

Article Open access 30 October 2023

Multi-omics analysis of macrophage-associated receptor and ligand reveals a strong prognostic signature and subtypes in hepatocellular carcinoma

Article Open access 28 May 2024

Introduction

Hepatocellular carcinoma (HCC) is one of the most prevalent primary cancers and a leading cause of cancer-related mortality worldwide. It is estimated that HCC accounts for 80% of liver cancer (with an estimated incidence exceeding 900,000 cases and approximately 830,000 deaths annually) diagnoses worldwide^1,2. This aggressive disease accounts for nearly 90% of liver cancer deaths, underscoring its significant impact on public health^3,4. Although surgical treatment remains the most effective option, many patients miss this opportunity because of the insidious and rapidly progressing nature of the disease, frequently resulting in delayed diagnosis. Furthermore, HCC is characterized by high rates of chemotherapy resistance, metastasis, and recurrence, which contribute to its poor prognosis^4,5. Therefore, understanding the molecular mechanisms underlying HCC is essential for identifying new diagnostic and therapeutic targets.

The cytoskeleton, composed of actin filaments, microtubules, and intermediate filaments, plays a critical role in maintaining cellular integrity and facilitating essential processes such as migration, invasion, and cell division⁶. This dynamic framework not only provides structural support but also orchestrates intracellular transport and signal transduction pathways, which are vital for various cellular functions⁷. Dysregulation of cytoskeletal components has been implicated in enhancing tumor invasion and metastasis^8,9. Cytoskeletal dynamics are crucial in cancer progression, with proteins such as tubulin significantly influencing microtubule stability and dynamics¹⁰. Drugs that directly target tubulin and affect its polymerization and depolymerization, such as paclitaxel, have been widely utilized in various cancers, including breast and pancreatic cancer¹¹. Additionally, emerging concepts, such as post-translational acetylation modifications and their corresponding therapeutic approaches (microtubule acetylation inhibitors), have been experimentally validated to significantly inhibit tumor cytoskeletal dynamics, demonstrating the potential to suppress tumor metastasis and progression^12,13. Furthermore, cytoskeletal dynamics rely on streamlined molecular pathways, including RhoA, PI3K/AKT, and TGF-β¹⁰.

In this study, we compared two machine learning methods: Least Absolute Shrinkage and Selection Operator (LASSO)-stepwise Cox regression and Random Forest (RF). Utilizing one training set and two testing sets, we ultimately identified a robust 5-gene OS prediction model demonstrating satisfactory predictive performance. We further conducted a comprehensive analysis of this model and its 5 core genes across multiple aspects, including expression levels, immune infiltration, transcriptional regulation, mutation status, and drug response, as well as various transcriptomic dimensions such as bulk RNA sequencing, scRNA-seq, and spatial transcriptomics. Our research provides new theoretical references and potential targets for advancing the current study of the cytoskeleton in HCC.

Materials and methods

All animal experiments were conducted under the guidelines and the institutional animal care protocol approved by the laboratory animal care and ethics committee of Guangdong Laidi Biomedical Research Institute Co.,LTD(LDSW-SOR-AAP-05–023(02)). The study was conducted in accordance with the Declaration of Helsinki, and ethical review and approval were waived for the human data analysis portion due to exclusive use of publicly available databases that had previously received ethical approval at their original institutions.

Data collection and preprocessing

Transcriptomic data and clinical information of HCC patients and normal tissue donors were obtained from the TCGA-LIHC dataset (https://portal.gdc.cancer.gov/), the International Cancer Genome Consortium (ICGC) (LIRI-JP) dataset (https://xenabrowser.net/datapages/), and the Chinese HCC patients with HBV infection (CHCC-HBV) dataset¹⁴. All data were standardized and converted to FPKM for subsequent analyses.

Identification of differentially expressed genes (DEGs) and functional clustering

A total of 367 cytoskeleton-related genes were obtained from the MSigDB database. DEGs between high and low OS groups (divided by median OS) in the training set (TCGA-LIHC) were identified using the “limma” R package¹⁵, with p < 0.05 as the cutoff. R package “heatmap” was used to generate Heatmaps showing differences in gene expression between high and low OS groups. After performing ID conversion on the input molecular list, functional enrichment analyses using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)^16,17,18 were performed using the “clusterProfiler” R package.

Gene correlation analysis and heatmap visualization

Heatmaps illustrating the correlations between genes were created using“corrplot”,“heatmap”, and“ggpubr”R packages. Correlation p-values were displayed in the figures as follows: * for < 0.05, ** for < 0.01, and *** for < 0.001.

Protein–protein interaction (PPI) network analysis

Online PPI analysis and clustering were performed on the STRING website (https://string-db.org/).

Prognostic heatmap visualization

Prognostic heatmaps were generated using “pheatmap” and “ggplot2” R packages. Scatter plots of OS-Status distribution and bar graphs of risk factor distribution were overlaid above the gene expression heatmap.

Construction of prognostic prediction model using COX regression and machine learning methods

A prognostic prediction model for HCC was constructed using COX regression and machine learning techniques. COX regression analysis was performed using the“survival”R package to identify key genes associated with OS. LASSO stepwise COX regression analysis was performed using the“glmnet”R package, with tenfold cross-validation to obtain the optimal model. The“gg_minimal_vimp”function in the“randomForestSRC”package was used to evaluate variable importance in the model, which was visualized alongside minimum depth information to identify the most critical features for model prediction. Subsequently, the“obliqueRSF”package was employed to calculate error rates (1—C-index) from predictions, recording error rates for tree numbers from 10 to 1000 in increments of 10. Based on the lowest error rate, a random forest (RF) model was established with 110 trees. Risk scores for each sample were calculated based on predictions and incorporated into the training and testing datasets for further analysis.

Evaluation of predictive performance

ROC curves were generated, and the area under the curve (AUC) was calculated using the“timeROC”and“pROC”R packages to evaluate predictive performance. The difference in predictive performance between the two models was assessed using the DeLong test, with a p-value of less than 0.05 considered statistically significant. To compare survival outcomes between different risk groups, the R package“maxstat”was used to determine the optimal cutoff value for risk score, and the“survival”R package was used to plot Kaplan–Meier (K-M) curves. (Subsequent survival analysis was conducted using the same method.) C-index values at corresponding time points were calculated using the “survival” package, and the results were visualized using the “ggplot2”.

Multivariate COX regression and nomogram construction

Multivariate COX regression was performed using the“survival”R package, integrating survival time, survival status, and other data to evaluate the prognostic significance of risk score and clinical features. The“forestploter”R package was used to create a forest plot that displays the outcomes of the multivariate COX regression analysis. A nomogram-related model was constructed and visualized using the“rms”R package. To validate the multivariate COX analysis and nomogram model, calibration curve analysis and visualization were performed using the“rms”R package. Additionally, to provide a more intuitive representation of the predictive value of independent risk factors, the prognostic model was fitted using the“survival”package, and DCA analysis was conducted using the“stdca”R package.

Consensus clustering analysis and sankey diagram visualization

Cluster analysis was conducted with ConsensusClusterPlus¹⁹, utilizing agglomerative PAM clustering based on 1-Pearson correlation distances, and resampling 80% of the samples over 10 repetitions. The optimal cluster count was identified using the empirical cumulative distribution function (ECDF) plot. The R package “networkD3” was used to analyze the flow relationships among risk score, Consensus Clusters, Status, and OS, and a Sankey diagram was created.

t-SNE analysis of gene expression

t-SNE analyses were conducted using“Rtsne”R packages. Expression profiles were first z-score normalized, and two groups, defined by high and low risk scores (with its median as the cutoff threshold), were displayed to show group distribution.

Differential expression and immune infiltration of 5 core genes

We analyzed the expression difference of 5 core genes between HCC tissues and normal liver tissues from HCC patients in the TIMER2.0 website (http://timer.cistrome.org/). The relationship between the expression level of 5 genes and the infiltration of four immune cell types: CD4 + T cells, Regulatory T cells (Tregs), myeloid-derived suppressor cells (MDSCs), and cancer-associated fibroblasts (CAFs), was examined via Spearman correlation analysis.

Mutation gene analysis

Genomic data for TCGA-LIHC, encompassing somatic mutations and DNA copy number variations, were retrieved from the Genomic Data Commons (GDC) website (https://portal.gdc.cancer.gov/). Differential mutation gene analysis was performed using R packages “maftools”, “base”, and “graphics” to identify mutation genes associated with risk scores and explore their expression correlations.

Transcription factor analysis and network venn diagram visualization

Transcription factors associated with the 5 core genes in the liver were retrieved from the “hTFtarget” database. R packages “igraph”, “ggraph”, and “ggplot2” were used to create a network Venn diagram. Common transcription factors shared by the 5 genes were annotated in the diagram.

Methylation analysis of HCC patients

Methylation levels and differential analysis data for the 5 core genes were obtained from the UALCAN website (https://ualcan.path.uab.edu/).

Single-cell RNA sequencing analysis

Single-cell RNA sequencing (scRNA-seq) analysis was performed using the GSE149614 dataset, which includes 21 samples (8 non-tumor liver tissues (N), 10 primary HCC tumor tissues (T), 1 metastatic lymph node tissue (L), and 2 portal vein tumor thrombus tissues (P)). The analysis utilized the“Seurat”R package (version 4.0) for data processing and analysis. Cells were filtered based on criteria of 200 < nFeature_RNA < 8000 genes per cell and mitochondrial gene expression proportions < 10%. For samples with ≥ 10 cells, batch correction was implemented via Seurat’s integration pipeline, involving“SplitObject”,“FindIntegrationAnchors”, and“IntegrateData”based on canonical correlation analysis. Data normalization and standardization were performed using the"NormalizeData + FindVariableFeatures + ScaleData"workflow, focusing on 2000 highly variable genes. Principal component analysis (PCA) was conducted with“RunPCA”, selecting principal components explaining 90% of variance (default 20 components). Clustering was performed using“FindNeighbors”and“FindClusters”, with cell types annotated via the“SingleR”function against the HumanPrimaryCellAtlasData reference. Dimensionality reduction and visualization were achieved using UMAP. Key analyses included expression profiling of 5 risk genes (ARPC1A, CCNB2, CKAP5, DCTN2, TTK), calculation of risk scores (weighted by gene-specific coefficients), and investigation of immune checkpoint ligand-receptor pairs (PD-L1/PD-1, CD80/CD86/CTLA4, NECTIN2/PVR/TIGIT, LGALS9/HAVCR2, CD47/SIRPA). Pearson correlation analysis assessed relationships between risk genes, immune cell infiltration ratios, and immune escape genes at the sample level. Visualizations, including UMAP plots, violin plots, and correlation heatmaps, were generated using“ggplot2”, with statistical reliability ensured by requiring ≥ 3 cells per group and significance denoted by * (p < 0.05).

Spatial transcriptomics analysis

Spatial transcriptomics analysis was performed using the HCCDB website (http://lifeome.net:809/#/home)²⁰, specifically using four slice samples from the “HCC2” group. The spatial distribution of the 5 core genes (ARPC1A, CCNB2, CKAP5, DCTN2, TTK) was visualized.

Drug prediction and drug sensitivity analysis

Drug prediction was carried out using the DSigDB database from Enrichr (https://maayanlab.cloud/Enrichr/). Drug sensitivity analysis was performed using the Liver Cancer Organoid Biobank (LICOB) dataset²¹, which contains expression profiles and half-maximal inhibitory concentration (IC50) data for HCC organoids. Additionally, two datasets,“Expression_Public_24Q2_subsetted”and"PRISM_Repurposing_Public_24Q2_subsetted (IC50)", were downloaded from the DepMap database (https://depmap.org/portal/) and used as raw data for drug resistance analysis in cell lines.

Molecular docking

Protein structures were retrieved from the PDB database (https://www.pdb.org/), and drug structures were obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/). After removing water molecules and non-essential compounds from the molecules using MOE (Chemical Computing Group, Montreal, QC, Canada), molecular docking was performed following the principles of maximum binding affinity, maximum interaction sites, and minimum binding energy to select the most suitable docking scheme. Additionally, two-dimensional and three-dimensional docking diagrams were generated using MOE.

Cell culture

PLC/PRF/5 and Hep3B cell lines were purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA). HCCLM3, Huh-7, and SNU449 cell lines were purchased from Cellcook Biotech (Guangzhou, China). Cells were cultured at 37 °C with 5% CO₂ in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1 × penicillin–streptomycin (PS). Cells were passaged when they reached 80% confluence. Sorafenib (5 μM) and/or Irinotecan (10 μM) were used as indicated.

Reverse transcription quantitative polymerase chain reaction (RT-qPCR)

Vazyme TRIzol reagent (Nanjing, China) was used to extract total RNA from HCC cells and then Vazyme HisScript® III RT SuperMix kit (Nanjing, China) was used to reverse-transcribe RNA into cDNA. RT-qPCR was performed using the Vazyme ChamQ Universal SYBR qPCR Kit (Nanjing, China) on a Roche LightCycler® 96 instrument. GAPDH was used as an internal reference to normalize the expression of target genes. Each sample was analyzed in triplicate. The reverse transcription and RT-qPCR procedures followed the manufacturer’s instructions for reagents. Primer sequences used in this study include:

GAPDH: Forward Sequence: GTCTCCTCTGACTTCAACAGCG; Reverse Sequence: ACCACCCTGTTGCTGTAGCCAA.

ARPC1A: Forward Sequence: AGAGTTCCTGCCGCTCCTAAGT; Reverse Sequence: ACTTGGAGACGAAGGTCAGGCA.

CCNB2: Forward Sequence: CAACCAGAGCAGCACAAGTAGC; Reverse Sequence: GGAGCCAACTTTTCCATCTGTAC.

CKAP5: Forward Sequence: TCATCACAGTCCTTGGAGACAGC; Reverse Sequence: CAGCCATTCCTTCATGCCAGTC.

DCTN2: Forward Sequence: CCCAGATAGCAGCCTTGTCACT; Reverse Sequence: TCCAGCTCTGTCAGGCGCTTTT.

TTK: Forward Sequence: CCGAGATTTGGTTGTGCCTGGA; Reverse Sequence: CCGAGATTTGGTTGTGCCTGGA.

Cell proliferation assay

PLC/PRF/5 cells were seeded into 96-well plates at a density of 1000 cells per well. Cell proliferation was measured at 24, 48, 72, 96, and 120 h using a Cell Counting Kit-8 (CCK8, RARbio, China). After 24 h, media were refreshed, and cells were treated with the corresponding drug concentrations according to the experimental groups. At each time point, 100 μL of complete medium containing 10 μL of CCK-8 reagent was added to each well, mixed thoroughly, and incubated for 2 h. Absorbance was measured at 450 nm.

Colony formation assay

1000 cells per well were plated in 6-well plates and incubated at 37 °C with 5% CO₂ for 14 days. Cells were treated with the corresponding drug concentrations during the culture period. Colonies were fixed with methanol and stained with 0.05% crystal violet for 30 min. Colonies containing more than 50 cells were counted under a microscope.

Xenograft formation in nude mice

All animal experiments were conducted under the guidelines and the institutional animal care protocol approved by the laboratory animal care and ethics committee of Guangdong Laidi Biomedical Research Institute Co.,LTD(LDSW-SOR-AAP-05–023(02)).

Female BALB/c nude mice (4–6 weeks old) purchased from GemPharmatech LLC (Jiangsu, China), were utilized to establish subcutaneous xenograft tumor models. Transfected PLC/PRF/5 cells were collected, washed, and resuspended in PBS. Subsequently, 2 × 10^6 cells (suspended in 100 μL PBS) were injected subcutaneously into the axilla of each mouse. Once the tumor volume in the nude mice reached 200 mm³, the mice were grouped for different treatments. The tumor volume and body weight of each mouse were recorded every three days. Mice were considered to have reached the observation endpoint if the total treatment duration reached 28 days, tumor volume reached 1500 mm³, or significant weight loss occurred due to tumor growth or treatment. According to the protocol, mice were euthanized upon exhibiting any of these criteria: (1) signs of uncontrollable pain or decreased mobility; (2) weight loss exceeding 20% of initial body mass; (3) lethargy, diminished response to external stimuli, dull or ruffled fur, hunched posture, reduced food and water intake, decreased urine and fecal output; (4) behavioral and clinical manifestations including palpable abdominal masses, persistent somnolence, sustained temperature abnormalities, respiratory distress, or other indicators of severe health deterioration; (5) reaching the defined observation endpoint. For conditions (1) through (4), specimens were collected through euthanasia prior to symptom onset, based on anticipated tumor development timing or experimental design requirements. Such mice were euthanized by cervical dislocation and disposed of humanely.

Statistical analysis

All statistical analyses were performed using R software (version 4.2.0) and GraphPad Prism (version 9.0). Data are presented as mean ± standard deviation (SD). Comparisons between two groups were performed using the Student’s t-test, while comparisons among multiple groups were conducted using one-way ANOVA followed by Tukey’s post hoc test. Survival analysis was performed with the Kaplan–Meier method, and the log-rank test was used to evaluate differences between the survival curves. Correlation analysis was performed using Spearman’s or Pearson’s correlation coefficients as appropriate. A p-value < 0.05 was considered statistically significant. We define statistical significance as usual: **** P ≤ 0.0001; *** P ≤ 0.001; ** P ≤ 0.01; * P ≤ 0.05; n.s., not significant.

The flowchart of this study is shown in Fig. 1.

Results

Identification of core cytoskeleton-related DEGs in hepatocellular carcinoma

First, we classified 365 HCC samples from the TCGA-LIHC database into two groups based on overall survival (OS). Differential expression analysis of 367 cytoskeleton-related genes between these two groups revealed 110 differentially expressed genes (DEGs) (Fig. 2A). Subsequently, we conducted interaction analysis (Fig. 2B) and clustering analysis (Fig. 2C) of the DEGs using STRING, identifying 2 major clusters containing 80 cytoskeleton-related DEGs (green and red gene clusters in Fig. 2C). Correlation analysis among the DEGs indicated that the relationships among the 80 genes within these two major clusters were more consistent than those among all 110 DEGs, with most genes exhibiting positive correlations with each other in expression levels (Figs. 2D-E). Finally, GO and KEGG analyses of the clustered 80 DEGs revealed associations primarily with cytoskeletal structure, intercellular junctions, spindle formation, and cellular invasion activities (Fig. 2F).

Construction of prognostic models for hepatocellular carcinoma associated with cytoskeleton

To establish a prognostic predictive model for HCC, we performed univariate COX analysis on the 80 identified genes, screening out 13 key genes (Fig. 3A). To develop a more robust model, we utilized the TCGA-LIHC dataset as the training cohort. We first applied LASSO regression analysis, determining an optimal λ value of 0.01, which led to the selection of 9 candidate genes (Figs. 3B and 3 C). Subsequently, we constructed a five-gene model comprising ARPC1A, CCNB2, CKAP5, DCTN2, and TTK through stepwise COX regression from the 9 genes identified by LASSO (Fig. 3D). The heatmap of gene expression associated with OS indicated a consistent distribution, with higher expression levels observed in individuals with lower OS (Fig. 3E). The prognostic ROC curve demonstrated strong predictive capability for the model, with AUC values of 0.77 (95% CI: 0.70–0.83), 0.74 (95% CI: 0.67–0.81), and 0.73 (95% CI: 0.65–0.81) for 1-year, 3-year, and 5-year survival, respectively (Fig. 3F). Based on the risk score from the five-gene model, the K-M curve indicated that HCC patients in the high-risk score group experienced significantly shorter survival than those in the low-risk score group (Fig. 3G). The C-index was calculated to be 0.695 (95% CI: 0.671–0.719), indicating good predictive performance of the model (Fig. 3H).

Furthermore, within the TCGA-LIHC training cohort, we constructed an alternative prognostic risk model using the RF method. Employing the variable importance measure (VIMP) combined with minimal depth, we identified 5 core genes (ARPC5, CAPZA1, CKAP5, WASF2, and TTK) (Fig. 3I). Analysis of the optimal five-gene model from the random forest revealed the lowest error rate with 110 trees (Fig. 3J). The prognostic ROC curve for the random forest model indicated AUC values of 0.91 (95% CI: 0.88–0.95), 0.89 (95% CI: 0.84–0.93), and 0.90 (95% CI: 0.86–0.95) for 1-year, 3-year, and 5-year survival, respectively (Fig. 3K), demonstrating stronger prognostic capability compared with LASSO-stepwise COX model (DeLong test, p < 0.05). K-M analysis based on the risk score from the five-gene model similarly indicated that HCC patients in the high-risk score group had considerably shorter survival than their counterparts (Fig. 3L). The C-index for the RF model was calculated at 0.836 (95% CI: 0.809–0.872), suggesting that the model’s predictive capability was superior to that of the LASSO-stepwise COX regression model in the TCGA-LIHC dataset (Fig. 3M).

Validation of the prognostic models

After establishing the two models, we validated them using 2 test datasets: ICGC (LIRI-JP) and CHCC-HBV. We depicted prognostic ROC curves, K-M curves for OS, and C-index values based on 2 models derived from the LASSO-stepwise COX method and the RF method.

The results indicated that AUC for the prognostic ROC curves of the LASSO-stepwise COX model consistently remained at 0.7 or above in both test datasets. In contrast, the RF model exhibited poor predictive capability in both test datasets, with AUC values below 0.7. Notably, in the CHCC-HBV dataset, the AUC for the prognostic ROC curve was particularly low, dropping to 0.51 (95% CI: 0.41–0.62) at the 3-year mark. DeLong test indicated that the AUC values of ROC curves for corresponding years of the LASSO-stepwise COX model in each dataset were significantly greater than those of the corresponding RF model (p < 0.05). (Figs. 4A-D).

Moreover, the K-M curves indicated that in the ICGC (LIRI-JP) (n = 231) and CHCC-HBV test datasets (n = 159), the p-values generated by the RF model were significantly higher than those produced by the corresponding LASSO-stepwise COX model, further suggesting that the predictive capability of the former may not be as robust as that of the latter (Figs. 4E-H).

The C-index values for the LASSO-stepwise COX model were 0.748 (95% CI: 0.703–0.794) in the ICGC (LIRI-JP) dataset and 0.677 (95% CI: 0.643–0.710) in the CHCC-HBV dataset, while the RF model yielded C-index values of 0.583 (95% CI: 0.52–0.65) and 0.584 (95% CI: 0.534–0.637), respectively (Figs. 4I-L). Overall, combining the results from the two testing methods, we found that the RF model had limited generalizability to other datasets, whereas the LASSO-stepwise COX model maintained robust predictive performance. Consequently, we proceeded with the five-gene model derived from the LASSO-stepwise COX method for further analyses.

For the model constructed by the LASSO-stepwise COX method, we incorporated conventional clinical parameters and conducted multivariate COX regression analysis in the TCGA-LIHC dataset (n = 353). This analysis revealed that both risk score and age were independent risk factors (Fig. 4M). We subsequently constructed a prognostic nomogram based on this multivariate model, which exhibited a C-index of 0.721 (95% CI: 0.687–0.754) (Fig. 4N). The nomogram was further validated using calibration curves (Fig. 4O) and decision curve analysis (DCA) (Figure S1A), both indicating good predictive value. In the two test datasets, ICGC (LIRI-JP) (n = 231) and CHCC-HBV (n = 159), the heatmap of gene expression related to OS showed a similar distribution of the 5 genes, with higher expression levels observed in individuals with low OS, consistent with the training dataset TCGA-LIHC (Figures S1B and S1D). Similarly, multivariate COX regression analysis incorporating clinical factors indicated that the risk score continued to be an independent risk factor in both test datasets (Figures S1C and S1E). To evaluate the performance of our risk model, we compared it with existing models constructed around TCGA-LIHC from other studies. Based on a previous study that summarized C-index values of 10 different aspects of hepatocellular carcinoma machine learning models, including HCC prognosis-related gene signatures (HPRGs), anoikis-related genes signatures (ARGs), lactic acid metabolism-related gene signatures (LMRGs), cuproptosis-related gene signatures (CRGs), epithelial-mesenchymal transition-related gene signatures (EMTRGs), fatty acid metabolism gene signatures (FMRGs), cancer-associated fibroblast signatures (CFGs), necroptosis-related gene signatures (NRGs), inflammatory response-related gene signatures (RRGs), and immunotherapy-related gene signatures (IRGs), the C-index ranges were 0.496–0.709 in the training set TCGA-LIHC (with HPRGs being optimal) and 0.502–0.703 in the testing set LIRI-JP (with HPRGs being optimal)²². Our current risk model demonstrated C-index values of 0.695 (0.671–0.719) in TCGA-LIHC and 0.748 (0.703–0.794) in LIRI-JP, which were non-inferior to these studies and even more robust than these existing models. These findings further support the clinical value of our 5-gene model in hepatocellular carcinoma prognosis prediction.

Additionally, we identified the intersection of genes involved in the LASSO-stepwise COX and RF analyses, yielding two common genes: CKAP5 and TTK (Figure S2A). Due to the limited number of genes, we further analyzed CKAP5 and TTK using COX regression to construct a new model. However, the COX regression results indicated that only TTK exhibited statistically significant differences (Figure S2B). This finding suggests that TTK may serve as a core gene significantly influencing the prognosis of HCC, providing an important lead for subsequent drug screening analyses and experiments. Therefore, we further assessed the predictive capability of TTK. The prognostic ROC curves derived from TCGA-LIHC, ICGC (LIRI-JP), and CHCC-HBV datasets demonstrated that TTK also had good prognostic judgment capability (Figure S2C). K-M analysis revealed that elevated TTK expression was linked to worse prognosis (Figure S2D).

Representational role of the five-gene model for two clusters based on the expression of eighty core DEGs in hepatocellular carcinoma

To assess whether the 5 genes derived from the LASSO-stepwise COX method accurately represent the 80 DEGs obtained from clustering, we performed dimensionality reduction analyses using the risk score derived from the LASSO-stepwise COX model as the grouping criterion (Low/High), with its median as the cutoff threshold, and the expression of the 80 DEGs as the expression profile. The t-SNE analysis (Fig. 5A), preliminarily indicated that the risk score effectively segregated the expression profiles of all 80 genes into two subgroups. Additionally, we performed consensus clustering on the expression levels of the 80 DEGs using ConsensusClusterPlus, determining the optimal cluster number to be k = 2 (Fig. 5B). The ECDF plot illustrated the consensus clustering for each k (Fig. 5C). We used the diagnostic ROC curve to evaluate the ability of the risk score from the LASSO-stepwise COX model to predict the clustering results of the 80 DEGs, yielding an AUC of 0.76 (95% CI: 0.71–0.81) (Fig. 5D). Finally, the Sankey diagram illustrated the relationships among clustering, the five-gene model risk score, survival status, and OS, suggesting a strong consistency between the risk score and the clustering results of the 80 DEGs. Most samples in cluster C1 exhibited high-risk score, while cluster C2 displayed the opposite trend (Fig. 5E). In summary, we believe that the 5 core genes and the model they constitute can represent the 80 genes.

The expression of 5 genes in hepatocellular carcinoma and their relationships with prognosis, immune suppressor cells, and genetic mutations

Analysis of the TCGA-LIHC dataset revealed remarkable differences in the expression level of the 5 genes between HCC and normal liver tissues, with all 5 genes showing markedly elevated expression in HCC (Fig. 6A). Furthermore, K-M curves indicated the prognostic value of these genes, as high expression levels of all 5 genes were related to poor OS in HCC patients (Fig. 6B). After further analysis with the TIMER2.0 database, the expression levels of 5 genes exhibited significant positive correlations with the infiltration levels of CD4 + T cells, Tregs, MDSCs, and CAFs (Fig. 6C). This suggests that high expression of these genes may indicate an immunosuppressive or immune evasion state in HCC^23,24,25. Finally, we analyzed the mutation status of the 5 genes in high/low-risk score groups from the TCGA-LIHC dataset. No statistically significant differences in mutation rates were observed between the two groups. However, further analysis of differential mutations revealed that the mutations (mainly missense mutations) of TP53 exhibited the most significant differences, notably increasing in the high-risk score group. The waterfall plot illustrates the top 20 mutated genes distinguished by the risk score of the five-gene model (Fig. 6D). Moreover, differential mutation analysis of the 5 core genes based on high and low OS grouping criteria revealed limited mutated individuals for each gene, with no mutations found for CCNB2 in the TCGA-LIHC dataset. Consequently, the analysis did not yield statistically significant differences, leading us to conclude that mutations in these 5 genes have no significant impact on the prognosis of HCC (Figure S2E).

Transcriptional regulation analysis of 5 genes

To investigate the factors influencing the transcriptional differences of the five identified genes, an analysis of their transcriptional regulation was performed. Utilizing the hTFtarget database, we identified transcription factors associated with each gene in liver tissue and determined the intersection of these transcription factors, constructing a Venn diagram. The Venn diagram revealed three common transcription factors shared among the 5 genes: CREB1, SIN3A, and YY1 (Fig. 7A), suggesting their potential key roles in regulating 5 genes’ expression. Additionally, we assessed the methylation levels of the 5 genes using the TCGA-LIHC dataset. Our analysis indicated that the methylation levels of CCNB2, CKAP5, and DCTN2 were significantly lower in HCC compared to normal liver tissues. Although ARPC1A and TTK exhibited no statistically significant differences in methylation levels between HCC and normal liver tissues, both genes showed reduced methylation levels in HCC relative to normal tissues (Fig. 7B). These findings reflect the transcriptional activation status of the 5 genes in hepatocellular carcinoma.

Single-cell sequencing analysis of 5 genes and risk score

We analyzed the scRNA-seq data from GSE149614 and utilized UMAP plots to illustrate the distribution of quality control-passed cells according to sample (Fig. 8A) and cell type (Fig. 8B), along with the counts of various cell types. Among these, (benign/malignant) hepatocytes constituted the highest proportion (Fig. 8C). To further investigate the impact of the 5 risk score-related genes on immune infiltration, we performed a Pearson correlation analysis between their expression levels and the infiltration levels of immune cells. The results indicated that the tumor infiltration levels of major cytotoxic immune cells, including T cells and Natural Killer (NK) cells, correlated negatively with the expression levels of the 5 genes, particularly for NK cells (p < 0.05) (Fig. 8D).

Moreover, we analyzed the expression correlation between risk scores and immune-related molecules (Fig. 8E). The findings revealed that the expression of PVR was significantly positively correlated with the expression levels of the 5 genes (p < 0.05), suggesting that PVR may play an active role in tumor immune evasion. Furthermore, several genes predominantly expressed by immune cells, such as PDCD1, TIGIT, and LAG3, exhibited negative correlations with the expression levels of the 5 genes. We considered that this observation might be influenced by the negative correlation between risk score-related genes and the infiltration levels of immune cells like T cells. Nonetheless, CTLA4 still displayed a positive correlation trend, reflecting its potential role as a key target of the 5 genes affecting immune cells.

To better define the risk score situation in (benign/malignant) hepatocytes and accurately understand the relationship between risk scores and HCC occurrence and development, we categorized the samples based on their tissue types. The analysis demonstrated that risk scores were elevated in metastatic lymph nodes (L), hepatocellular carcinoma tissues (T), and portal vein tumor thrombus (P) samples, with the highest scores observed in metastatic lymph nodes, while the lowest scores were recorded in normal liver tissues (N) (Fig. 8F and S3A). Additionally, the expression levels of the 5 genes and their corresponding risk scores in hepatocytes were depicted using UMAP plots, as shown in Fig. 8G.

Spatial transcriptomics analysis of 5 genes’ expression

To further validate the predictive role of the 5 genes, we conducted spatial transcriptomics analysis using the HCCDB database (http://lifeome.net:809/#/home/). The tissue specimens from the HCC2 dataset were found to be consistent with the tissue types identified in the GSE149614 scRNA-seq, including normal liver tissue (N), hepatocellular carcinoma tissue (T), lymph node metastases (L), and portal vein tumor thrombus (P). Consequently, we selected the HCC2 samples for this exploratory analysis.

Initially, hematoxylin and eosin (H&E) staining of the tissue sections revealed the overall morphology of the samples. Based on this morphological assessment, we delineated the distribution of different tissue and cell types (Fig. 9A). Subsequently, UMAP plots were employed to further illustrate the distribution of (Fig. 9B) tissue types and (Fig. 9C) cell types, along with their interrelationships through clustering and dimensionality reduction.

Finally, we mapped the distribution of the 5 genes across the overall morphology of the tissue sections. All 5 genes showed significantly higher expression levels in malignant tissues, specifically hepatocellular carcinoma (T), lymph node metastases (L), and portal vein tumor thrombus (P) (Fig. 9D). Additionally, incorporating the previous UMAP analysis of cell types, through the UMAP plots of the 5 genes, we also found their expression levels were elevated in both tumor and stromal cell types (Fig. 9E). These findings corroborate the results obtained from the previous scRNA-seq analysis.

Drug screening and molecular docking of 5 genes

We utilized the DSigDB database (https://maayanlab.cloud/Enrichr/) to predict protein interactions corresponding to the 5 core genes, yielding 168 results. Among these, the top ten candidate compounds with the lowest p-values are presented (Fig. 10A). Among the 87 statistically significant results (p < 0.05), which included 83 compounds, irinotecan exhibited the lowest p-value, and both irinotecan and sorafenib were identified with the highest frequency (three occurrences each).

Subsequently, we performed Pearson correlation analysis on risk scores and drug IC50 values for all HCC cell lines in the DepMap PRISM dataset and all HCC organoids in the LICOB dataset, respectively, to obtain all drugs with significant correlations (p < 0.05) (Fig. 10B and S3B show the top 20 positive and negative correlations for each dataset). Venn diagrams illustrated the intersection of compounds predicted by DSigDB with drugs significantly correlated with risk scores in both the DepMap PRISM and LICOB datasets (Fig. 10C). However, the two compounds identified in the intersection of DSigDB and the DepMap PRISM dataset, chlorpromazine and ciclopirox, had combined scores (158.8 and 309.2) and p-values (0.028 and 0.017) in the DSigDB ranking that were lower than those for sorafenib obtained from the intersection with the LICOB dataset (highest combined score of 387.4, minimum p-value of 0.014). Notably, chlorpromazine and ciclopirox are not approved for tumor treatment, whereas sorafenib is a first-line therapy for HCC. Furthermore, the LICOB dataset results stem from organoid analyses derived from tumor specimens of liver cancer patients, providing greater reliability than the cell line data from the DepMap PRISM dataset. Additionally, the frequent occurrence of sorafenib in the DSigDB analysis suggests stronger evidence supporting its relevance.

Finally, chemical interaction network analysis conducted via the STITCH database (http://stitch.embl.de/) further indicated sorafenib’s central role among HCC treatment compounds in the LICOB dataset (Fig. 10D). In contrast, chlorpromazine and ciclopirox did not occupy central positions in the chemical interaction networks of compounds in the DepMap PRISM dataset (Figures S3C and S3D).

In summary, we ultimately proceeded with further analysis of the two drugs, irinotecan and sorafenib. To elucidate the interaction relationships between the drugs and their targets, we performed molecular docking of compounds with key drug targets (sorafenib with TTK; irinotecan with CCNB2, DCTN2, and TTK) according to the drug targets indicated by DSigDB, selecting the optimal scheme as the docking approach (maximum binding affinity, maximum interaction sites, and minimum binding energy), and used 2D and 3D diagrams to display the best docking results (Fig. 10E).

Validation of drug-target docking results by in vitro and in vivo experiments

Irinotecan is a commonly used chemotherapeutic agent for malignant tumors, including colorectal cancer, and can be included in combination therapy regimens for HCC. Previous studies have reported that in vitro, irinotecan and its metabolites enhance the inhibitory effect of sorafenib on HCC cell lines^26,27. Furthermore, our earlier investigations identified TTK as a core gene influencing the prognosis of HCC among cytoskeleton-related genes, with both irinotecan and sorafenib targeting TTK. This suggests that the combination of these two drugs may enhance therapeutic efficacy.

To further validate these predictions, we first employed RT-qPCR to assess the relative expression levels of the 5 genes in HCC cell lines and calculated the risk score. The PLC/PRF/5 cell line, which exhibited the highest risk score, was selected for subsequent experiments (Fig. 11A). The results of the CCK-8 assay following drug treatment indicated that the proliferation rate of the PLC/PRF/5 cell line was significantly reduced following combined treatment with irinotecan and sorafenib compared to either drug alone or the control group (Fig. 11B). Additionally, the results of the corresponding colony formation assays corroborated these findings (Fig. 11C).

Finally, in the subcutaneous tumor formation experiments in nude mice, the tumor volume in the combination group (849.03 ± 75.62 mm³ at day 28) was significantly reduced compared to the irinotecan group (1114.39 ± 135.31 mm³), sorafenib group (939.16 ± 5.60 mm³), and control group (1527.63 ± 92.57 mm³), as well as the tumor growth rate (Figs. 11D-E). Moreover, combination therapy did not significantly affect the body weight of the nude mice, indicating that this regimen is safe and feasible (Fig. 11F). These results suggest that the clinical combination of these two drugs may improve the prognosis of patients with HCC.

Discussion

The prognosis of HCC remains poor, particularly for patients with intermediate to advanced stages. The cytoskeleton not only provides structural support for cells but also plays crucial roles in cell division, migration, and intracellular signal transduction^28,29. Dysregulation of cytoskeleton-related genes has been confirmed to participate in cancer progression¹⁰, but comprehensive studies integrating these genes into prognostic models remain limited. Advances in high-throughput sequencing technology have provided new opportunities to explore the functions of cytoskeleton-related genes in HCC^30,31.

In this study, we identified 110 cytoskeleton-related DEGs in HCC. Through clustering analysis and univariate COX regression, we reduced the DEGs to 13 genes. Subsequently, we developed a robust five-gene prognostic model (ARPC1A, CCNB2, CKAP5, DCTN2, and TTK) using LASSO stepwise COX regression. Previous studies have confirmed that these 5 genes play important roles in various tumors. ARPC1A, as a key subunit of the Arp2/3 complex, regulates actin filament branched polymerization and drives the formation of invasive membrane protrusions in cancer cells. It was recently discovered to regulate cytoskeletal dynamics through interaction with EML4, enhancing lamellipodia formation, cell motility, local invasion, and metastasis³². In prostate cancer, ARPC1A is associated with poor prognosis and promotes tumor cell migration, invasion, and cytoskeletal alterations³³. CCNB2, as a cell cycle regulator, controls cell division through the G2/M checkpoint. Its overexpression leads to cell cycle dysregulation and malignant tumor proliferation, and has been identified as an independent predictive biomarker for poor prognosis in HCC, breast cancer, and other malignancies^34,35. CKAP5 ensures proper chromosome segregation by stabilizing microtubule-kinetochore connections, and its abnormal expression affects spindle assembly and genomic stability. CKAP5 has been found to stabilize CENP-E at kinetochores in various cancer cell lines by regulating microtubule-chromosome connections, ensuring smooth cell cycle progression in cancer cells³⁶. DCTN2, as a component of the dynein complex, mediates intracellular material transport and signal transduction. DCTN2 exerts oncogenic functions through binding with ROCK1 (in HCC) or as a FER substrate (in triple-negative breast cancer/TNBC)^37,38. TTK, as a core regulator of the spindle assembly checkpoint, not only controls cell division processes but also regulates cytoskeletal reorganization and EMT processes through phosphorylation of multiple substrate proteins. As a multifunctional protein kinase, TTK is regarded as an oncogene and therapeutic target in various malignancies including HCC, ovarian cancer, and TNBC^39,40,41. Notably, our study is among the first to integrate multiple cytoskeleton-related genes into a prognostic model for HCC, enhancing our understanding of cytoskeletal dynamics in cancer progression.

In this study, through analysis of prognostic ROC curves, K-M curves, and C-index, we found that the LASSO-stepwise COX model demonstrated superior stability and generalization capability compared to the RF model. Although the RF model showed significantly better predictive performance in the training set, it performed poorly in the test set, suggesting potential overfitting and emphasizing the necessity of robust validation methods. Combined with previous studies, we found that the poor external generalization ability of RF models is mainly attributed to their tendency to capture noise features in training sets when dealing with high-dimensional genomic data, leading to overfitting phenomena (especially when optimal features are not effectively selected). Additionally, differences in sample feature distributions and clinical heterogeneity between different datasets can also lead to RF model overfitting⁴². This highlights the importance of conducting multi-data and multi-dimensional comparisons when selecting prognostic models. Compared with 10 previous HCC machine learning models from different aspects (including HPRGs, ARGs, etc.), our model demonstrated non-inferior or even superior performance in both TCGA-LIHC and LIRI-JP datasets, highlighting the value of cytoskeleton-related genes in HCC prognosis prediction.

Furthermore, immune infiltration analysis revealed increased levels of immunosuppressive cells (such as regulatory T cells) in high-risk score patients, potentially contributing to poorer prognosis. Multiple cytoskeleton-related molecules have been identified as biomarkers for immune infiltration/escape through bioinformatics analysis and experimental validation, such as TAGLN2⁴³. Additionally, treatment regimens combining cytoskeleton-targeting drugs, represented by microtubule-targeting agents, with immune checkpoint inhibitors (ICIs) have been applied and promoted in various malignancies including TNBC and non-small cell lung cancer (NSCLC), with particularly favorable results observed in“hot”tumors⁴⁴. This also provides new insights for HCC immunotherapy. For example, previous studies have shown that targeting TTK can inhibit various CTNNB1 (β-catenin) mutated cancers⁴⁵. TTK inhibitors such as CFI-402257 and BAY1217389 have been proven to be potential therapeutic drugs for HCC or ovarian cancer^39,41. Moreover, TTK inhibition promotes anti-PD1 immunotherapy in TNBC⁴⁰, highlighting the potential impact of cytoskeletal regulation in cancer treatment.

In vitro and in vivo experiments confirmed that the combination treatment of irinotecan and sorafenib was more effective than monotherapy. Irinotecan, as a topoisomerase I inhibitor, and sorafenib, as a multi-kinase inhibitor, have both been FDA-approved for cancer treatment with good clinical safety profiles and substantial clinical evidence supporting their use. Current studies on this combination regimen (including a study based on pediatric hepatocellular carcinoma) have shown acceptable toxicity profiles^26,27. We can preliminarily determine that this is a clinically feasible combination regimen, and our in vivo experiments also suggest that this combination is safe. However, in our study and previous studies, the application time of the combination regimen was relatively short, and potential off-target effects and long-term safety still require further evaluation. Additionally, clinical applications indicate that this regimen has a strict therapeutic window, requiring further research to determine optimal dosing and efficacy.

Despite these advances, several limitations warrant consideration. The retrospective nature of our study requires prospective validation to confirm clinical applicability. The biological mechanisms underlying the prognostic significance of the identified genes require further investigation. Furthermore, genetic and environmental heterogeneity of HCC among different populations may limit the generalizability of our findings, highlighting the necessity of validation in diverse cohorts.

In conclusion, we have proposed a novel five-gene prognostic model for HCC and identified potential therapeutic targets, laying the foundation for future research to improve patient outcomes through precision medicine and targeted therapy.

Data availability

The datasets used and analysed during the current study available from the corresponding author on reasonable request.

Abbreviations

HCC:: Hepatocellular carcinoma
TCGA:: The cancer genome atlas
ICGC:: International cancer genome consortium
LIHC:: Liver hepatocellular carcinoma (TCGA dataset)
LIRI-JP:: International cancer genome consortium (ICGC) dataset
CHCC-HBV:: Chinese HCC patients with HBV infection
DEGs:: Differentially expressed genes
OS:: Overall survival
LASSO:: Least absolute shrinkage and selection operator
RF:: Random forest
ROC:: Receiver operating characteristic
AUC:: Area under the curve
PPI:: Protein–protein interaction
UMAP:: Uniform manifold approximation and projection
t-SNE:: T-Distributed stochastic neighbor embedding
PCA:: Principal component analysis
CAF:: Cancer-associated fibroblasts
MDSC:: Myeloid-derived suppressor cells
DCA:: Decision curve analysis
RT-qPCR:: Reverse transcription quantitative polymerase chain reaction
CCK-8:: Cell counting kit-8
GDC:: Genomic data commons
GO:: Gene ontology
KEGG:: Kyoto encyclopedia of genes and genomes
LICOB:: Liver cancer organoid biobank
STITCH:: Search tool for interactions of chemicals
TTK:: TTK Protein kinase / monopolar spindle 1 (MPS1)

References

Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249. https://doi.org/10.3322/caac.21660 (2021).
Article CAS PubMed Google Scholar
Cronin, K. A. et al. Annual report to the nation on the status of cancer, part 1: National cancer statistics. Cancer 128, 4251–4284. https://doi.org/10.1002/cncr.34479 (2022).
Article PubMed Google Scholar
Llovet, J. M. et al. Hepatocellular carcinoma. Nat. Rev. Dis. Primers 7, 6. https://doi.org/10.1038/s41572-020-00240-3 (2021).
Article PubMed Google Scholar
Forner, A., Reig, M. & Bruix, J. Hepatocellular carcinoma. Lancet 391, 1301–1314. https://doi.org/10.1016/S0140-6736(18)30010-2 (2018).
Article PubMed Google Scholar
Yang, X. et al. Precision treatment in advanced hepatocellular carcinoma. Cancer Cell 42, 180–197. https://doi.org/10.1016/j.ccell.2024.01.007 (2024).
Article CAS PubMed Google Scholar
Hall, A. The cytoskeleton and cancer. Cancer Mets. Rev. 28, 5–14. https://doi.org/10.1007/s10555-008-9166-3 (2009).
Article Google Scholar
Hohmann, T. & Dehghani, F. The cytoskeleton-a complex interacting meshwork. Cells https://doi.org/10.3390/cells8040362 (2019).
Article PubMed PubMed Central Google Scholar
Seetharaman, S. & Etienne-Manneville, S. Cytoskeletal crosstalk in cell migration. Trends Cell Biol. 30, 720–735. https://doi.org/10.1016/j.tcb.2020.06.004 (2020).
Article CAS PubMed Google Scholar
Giannelli, G., Koudelkova, P., Dituri, F. & Mikulits, W. Role of epithelial to mesenchymal transition in hepatocellular carcinoma. J. Hepatol. 65, 798–808. https://doi.org/10.1016/j.jhep.2016.05.007 (2016).
Article CAS PubMed Google Scholar
Mangaonkar, S., Nath, S. & Chatterji, B. P. Microtubule dynamics in cancer metastasis: Harnessing the underappreciated potential for therapeutic interventions. Pharmacol. Ther. 263, 108726. https://doi.org/10.1016/j.pharmthera.2024.108726 (2024).
Article CAS PubMed Google Scholar
Kamath, K., Smiyun, G., Wilson, L. & Jordan, M. A. Mechanisms of inhibition of endothelial cell migration by taxanes. Cytoskeleton (Hoboken) 71, 46–60. https://doi.org/10.1002/cm.21153 (2014).
Article CAS PubMed Google Scholar
Steinmetz, M. O. & Prota, A. E. Microtubule-Targeting Agents: Strategies To Hijack the Cytoskeleton. Trends Cell Biol. 28, 776–792. https://doi.org/10.1016/j.tcb.2018.05.001 (2018).
Article CAS PubMed Google Scholar
Yoshimoto, S., Morita, H., Okamura, K., Hiraki, A. & Hashimoto, S. alphaTAT1-induced tubulin acetylation promotes ameloblastoma migration and invasion. Lab. Invest. 102, 80–89. https://doi.org/10.1038/s41374-021-00671-w (2022).
Article CAS PubMed Google Scholar
Gao, Q. et al. Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell https://doi.org/10.1016/j.cell.2019.08.052 (2019).
Article PubMed PubMed Central Google Scholar
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47. https://doi.org/10.1093/nar/gkv007 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951. https://doi.org/10.1002/pro.3715 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: Biological systems database as a model of the real world. Nucleic Acids Res. 53, D672–D677. https://doi.org/10.1093/nar/gkae909 (2025).
Article PubMed Google Scholar
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. https://doi.org/10.1093/nar/28.1.27 (2000).
Article CAS PubMed PubMed Central Google Scholar
Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: A class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573. https://doi.org/10.1093/bioinformatics/btq170 (2010).
Article CAS PubMed PubMed Central Google Scholar
Jiang, Z. et al. HCCDB v2.0: Decompose expression variations by single-cell RNA-seq and spatial transcriptomics in HCC. Genom. Proteom. Bioinform. https://doi.org/10.1093/gpbjnl/qzae011 (2024).
Article Google Scholar
Jin, H., Sun, Z. & Bernards, R. LICOB: a powerful organoid platform for drug discovery. Cell Res 34, 11–12. https://doi.org/10.1038/s41422-023-00878-0 (2024).
Article CAS PubMed Google Scholar
Zheng, S. et al. Novel prognostic signature for hepatocellular carcinoma using a comprehensive machine learning framework to predict prognosis and guide treatment. Front. Immunol. 15, 1454977. https://doi.org/10.3389/fimmu.2024.1454977 (2024).
Article CAS PubMed PubMed Central Google Scholar
Savage, P. A., Klawon, D. E. J. & Miller, C. H. Regulatory T Cell Development. Annu Rev Immunol 38, 421–453. https://doi.org/10.1146/annurev-immunol-100219-020937 (2020).
Article CAS PubMed Google Scholar
Hegde, S., Leader, A. M. & Merad, M. MDSC: Markers, development, states, and unaddressed complexity. Immunity 54, 875–884. https://doi.org/10.1016/j.immuni.2021.04.004 (2021).
Article CAS PubMed PubMed Central Google Scholar
Mao, X. et al. Crosstalk between cancer-associated fibroblasts and immune cells in the tumor microenvironment: New findings and future perspectives. Mol. Cancer 20, 131. https://doi.org/10.1186/s12943-021-01428-1 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wang, Z., Zhao, Z., Wu, T., Song, L. & Zhang, Y. Sorafenib-irinotecan sequential therapy augmented the anti-tumor efficacy of monotherapy in hepatocellular carcinoma cells HepG2. Neoplasma 62, 172–179. https://doi.org/10.4149/neo_2015_022 (2015).
Article CAS PubMed Google Scholar
Keino, D. et al. Pilot study of the combination of sorafenib and fractionated irinotecan in pediatric relapse/refractory hepatic cancer (FINEX pilot study). Pediatr. Blood Cancer 67, e28655. https://doi.org/10.1002/pbc.28655 (2020).
Article CAS PubMed Google Scholar
Fife, C. M., McCarroll, J. A. & Kavallaris, M. Movers and shakers: Cell cytoskeleton in cancer metastasis. Br. J. Pharmacol. 171, 5507–5523. https://doi.org/10.1111/bph.12704 (2014).
Article CAS PubMed PubMed Central Google Scholar
Legatova, A., Pelantova, M., Rosel, D., Brabek, J. & Skarkova, A. The emerging role of microtubules in invasion plasticity. Front. Oncol. 13, 1118171. https://doi.org/10.3389/fonc.2023.1118171 (2023).
Article CAS PubMed PubMed Central Google Scholar
Sun, Y. et al. Single-cell landscape of the ecosystem in early-relapse hepatocellular carcinoma. Cell https://doi.org/10.1016/j.cell.2020.11.041 (2021).
Article PubMed PubMed Central Google Scholar
Zhang, N. et al. Biomarkers and prognostic factors of PD-1/PD-L1 inhibitor-based therapy in patients with advanced hepatocellular carcinoma. Biomark. Res. 12, 26. https://doi.org/10.1186/s40364-023-00535-z (2024).
Article PubMed PubMed Central Google Scholar
Wang, S. et al. The N6-methyladenosine Epitranscriptomic Landscape of Lung Adenocarcinoma. Cancer Discov. 14, 2279–2299. https://doi.org/10.1158/2159-8290.CD-23-1212 (2024).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. H. et al. ARPC1A correlates with poor prognosis in prostate cancer and is up-regulated by glutamine metabolism to promote tumor cell migration, invasion and cytoskeletal changes. Cell Biosci. 13, 38. https://doi.org/10.1186/s13578-023-00985-w (2023).
Article CAS PubMed PubMed Central Google Scholar
Li, R. et al. Cyclin B2 overexpression in human hepatocellular carcinoma is associated with poor prognosis. Arch. Med. Res. 50, 10–17. https://doi.org/10.1016/j.arcmed.2019.03.003 (2019).
Article CAS PubMed Google Scholar
Lu, Z., Wang, Z. & Li, G. High expression of CCNB2 is an independent predictive poor prognostic biomarker and correlates with immune infiltrates in breast carcinoma. Heliyon 10, e31586. https://doi.org/10.1016/j.heliyon.2024.e31586 (2024).
Article CAS PubMed PubMed Central Google Scholar
Lakshmi, R. B. et al. CKAP5 stabilizes CENP-E at kinetochores by regulating microtubule-chromosome attachments. EMBO Rep. 25, 1909–1935. https://doi.org/10.1038/s44319-024-00106-9 (2024).
Article CAS PubMed PubMed Central Google Scholar
Li, Y. F. et al. Binding between ROCK1 and DCTN2 triggers diabetes-associated centrosome amplification in colon cancer cells. Oncol. Rep. https://doi.org/10.3892/or.2021.8102 (2021).
Article PubMed PubMed Central Google Scholar
Tavares, S. et al. FER regulates endosomal recycling and is a predictor for adjuvant taxane benefit in breast cancer. Cell Rep. 39, 110584. https://doi.org/10.1016/j.celrep.2022.110584 (2022).
Article CAS PubMed Google Scholar
Chan, C. Y. et al. CFI-402257, a TTK inhibitor, effectively suppresses hepatocellular carcinoma. Proc. Natl. Acad. Sci. U S A 119, e2119514119. https://doi.org/10.1073/pnas.2119514119 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Qi, G. et al. TTK inhibition increases cisplatin sensitivity in high-grade serous ovarian carcinoma through the mTOR/autophagy pathway. Cell Death Dis. 12, 1135. https://doi.org/10.1038/s41419-021-04429-6 (2021).
Article CAS PubMed PubMed Central Google Scholar
Hu, X. et al. TTK inhibition activates STING signal and promotes anti-PD1 immunotherapy in breast cancer. Biochem. Biophys. Res. Commun. 694, 149388. https://doi.org/10.1016/j.bbrc.2023.149388 (2024).
Article CAS PubMed Google Scholar
Barrenada, L., Dhiman, P., Timmerman, D., Boulesteix, A. L. & Van Calster, B. Understanding overfitting in random forest for probability estimation: A visualization and simulation study. Diagn. Progn. Res. 8, 14. https://doi.org/10.1186/s41512-024-00177-1 (2024).
Article PubMed PubMed Central Google Scholar
Pan, T., Wang, S. & Wang, Z. An integrated analysis identified TAGLN2 As an oncogene indicator related to prognosis and immunity in pan-cancer. J. Cancer 14, 1809–1836. https://doi.org/10.7150/jca.84454 (2023).
Article CAS PubMed PubMed Central Google Scholar
Liang, T., Lu, L., Song, X., Qi, J. & Wang, J. Combination of microtubule targeting agents with other antineoplastics for cancer treatment. Biochim. Biophys. Acta Rev. Cancer 1877, 188777. https://doi.org/10.1016/j.bbcan.2022.188777 (2022).
Article CAS PubMed Google Scholar
Zaman, G. J. R. et al. TTK inhibitors as a targeted therapy for CTNNB1 (beta-catenin) mutant cancers. Mol. Cancer Ther. 16, 2609–2617. https://doi.org/10.1158/1535-7163.MCT-17-0342 (2017).
Article CAS PubMed Google Scholar

Download references

Funding

Science and Technology Planning Project of Guangzhou city, 202102010171, 2023A03J0211, National Natural Science Foundation Cultivation Project of the Third Affiliated Hospital of Sun Yat-sen University, 2020GZRPYMS11, Guangdong Basic and Applied Basic Research Foundation, 2023A1515220090, Beijing Xisike Clinical Oncology Research Foundation, Y-Roche2019/2-0041.

Author information

Yuxuan Li, Mingbo Cao and Xiaorui Su contributed equally to this work.

Authors and Affiliations

Department of Hepatobiliary Surgery, The Third Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510630, China
Yuxuan Li, Mingbo Cao, Gaoyuan Yang, Yupeng Ren, Zhiwei He, Zheng Shi & Meihai Deng
Department of Hepatobiliary & Pancreatic Surgery, The Third Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510630, China
Xiaorui Su, Ziyi Hu, Guirong Liang & Zhicheng Yao
Biotherapy Centre, The Third Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510630, China
Qi Zhang

Authors

Yuxuan Li
View author publications
Search author on:PubMed Google Scholar
Mingbo Cao
View author publications
Search author on:PubMed Google Scholar
Xiaorui Su
View author publications
Search author on:PubMed Google Scholar
Gaoyuan Yang
View author publications
Search author on:PubMed Google Scholar
Yupeng Ren
View author publications
Search author on:PubMed Google Scholar
Zhiwei He
View author publications
Search author on:PubMed Google Scholar
Zheng Shi
View author publications
Search author on:PubMed Google Scholar
Ziyi Hu
View author publications
Search author on:PubMed Google Scholar
Guirong Liang
View author publications
Search author on:PubMed Google Scholar
Qi Zhang
View author publications
Search author on:PubMed Google Scholar
Zhicheng Yao
View author publications
Search author on:PubMed Google Scholar
Meihai Deng
View author publications
Search author on:PubMed Google Scholar

Contributions

Y.L. and M.C.: Conducted partial data cleaning, drafted the"Materials and Methods"section, constructed and visualized the main prognostic models in Figs. 1,2,3,4,5,6 (including corresponding supplementary figures), and wrote the original draft. X.S.: Collected raw data, performed background investigation, conducted drug sensitivity analysis and molecular docking in Fig. 9 (including corresponding supplementary figures), and contributed to the original draft. G.Y.: Conducted single-cell sequencing analysis and spatial transcriptomics analysis (Figs. 7 and 8) and supervised coding throughout the study. Y.R.: Participated in experimental design and managed the animal ethics review. ZW.H. and Z.S.: Conducted cellular experiments included in the manuscript. ZY.H. and G.L.: Oversaw the animal experiments described in the paper. Q.Z.: Reviewed the experimental design and the entire manuscript. Z.Y. and M.D.: Provided funding, contributed to the development of methodologies, and reviewed the manuscript. Additionally, M.D. provided the conceptual framework for the study. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Qi Zhang, Zhicheng Yao or Meihai Deng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

This study was conducted in accordance with the ARRIVE guidelines.

All animal experiments were conducted under the guidelines and the institutional animal care protocol approved by the laboratory animal care and ethics committee of Guangdong Laidi Biomedical Research Institute Co.,LTD(LDSW-SOR-AAP-05–023(02)).

The study was conducted in accordance with the Declaration of Helsinki, and ethical review and approval were waived for the human data analysis portion due to exclusive use of publicly available databases that had previously received ethical approval at their original institutions.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Li, Y., Cao, M., Su, X. et al. Machine learning-assisted multi-dimensional transcriptomic analysis of cytoskeleton-related molecules and their relationship with prognosis in hepatocellular carcinoma. Sci Rep 15, 23832 (2025). https://doi.org/10.1038/s41598-025-10056-4

Download citation

Received: 11 February 2025
Accepted: 01 July 2025
Published: 03 July 2025
DOI: https://doi.org/10.1038/s41598-025-10056-4

Subjects

Abstract

Similar content being viewed by others

Multi-omics analysis identifies SNP-associated immune-related signatures by integrating Mendelian randomization and machine learning in hepatocellular carcinoma

Seven chromatin regulators as immune cell infiltration characteristics, potential diagnostic biomarkers and drugs prediction in hepatocellular carcinoma

Multi-omics analysis of macrophage-associated receptor and ligand reveals a strong prognostic signature and subtypes in hepatocellular carcinoma

Introduction

Materials and methods

Data collection and preprocessing

Identification of differentially expressed genes (DEGs) and functional clustering

Gene correlation analysis and heatmap visualization

Protein–protein interaction (PPI) network analysis

Prognostic heatmap visualization

Construction of prognostic prediction model using COX regression and machine learning methods

Evaluation of predictive performance

Multivariate COX regression and nomogram construction

Consensus clustering analysis and sankey diagram visualization

t-SNE analysis of gene expression

Differential expression and immune infiltration of 5 core genes

Mutation gene analysis

Transcription factor analysis and network venn diagram visualization

Methylation analysis of HCC patients

Single-cell RNA sequencing analysis

Spatial transcriptomics analysis

Drug prediction and drug sensitivity analysis

Molecular docking

Cell culture

Reverse transcription quantitative polymerase chain reaction (RT-qPCR)

Cell proliferation assay

Colony formation assay

Xenograft formation in nude mice

Statistical analysis

Results

Identification of core cytoskeleton-related DEGs in hepatocellular carcinoma

Construction of prognostic models for hepatocellular carcinoma associated with cytoskeleton

Validation of the prognostic models

Representational role of the five-gene model for two clusters based on the expression of eighty core DEGs in hepatocellular carcinoma

The expression of 5 genes in hepatocellular carcinoma and their relationships with prognosis, immune suppressor cells, and genetic mutations

Transcriptional regulation analysis of 5 genes

Single-cell sequencing analysis of 5 genes and risk score

Spatial transcriptomics analysis of 5 genes’ expression

Drug screening and molecular docking of 5 genes

Validation of drug-target docking results by in vitro and in vivo experiments

Discussion

Data availability

Abbreviations

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Ethics approval

Additional information

Publisher’s note

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links