Fig. 1: Multi-omic Overview of the Colorectal Cancer PDX Cohort and Cetuximab Response Modelling Approach. | Nature Communications

From: Integrative ensemble modelling of cetuximab sensitivity in colorectal cancer patient-derived xenografts

a The left panel presents the IRCC patient-derived xenograft (PDX) collection, derived from 231 unique colorectal cancer (CRC) liver metastasis (LMX) resections. This collection was characterised at the multi-omic level and assessed for cetuximab response. A schematic of the omic-specific feature engineering is also provided. The right panel outlines the CeSta classifier pipeline. Input features, selected from the training set (Methods) using univariate tests (Fisher’s exact test, Mann-Whitney U-test) and multivariate linear models, feed into three independent level-1 classifier pipelines: forward feature selection plus elastic net, ANOVA feature selection plus extra trees, and ANOVA feature selection plus support vector classifiers. A fourth classifier, a CatBoost model, is pre-trained on pan-cancer data from the Cell Model Passports repository and fine-tuned using IRCC-PDX data. The predictions from these level-1 classifiers are stacked and fed into a meta-classifier, which produces the final binary classification (cetuximab responder/non-responder) using argmax-based soft voting.
b CeSta nested cross-validation approach: 50 train/test splits are generated via stratified sampling of the IRCC-PDX collection. CeSta is trained and tuned independently on each of these 50 splits. In each iteration, the training set is divided into three folds. Over three rounds, two folds serve as the ‘training fold’ and the remaining fold serves as the ‘validation fold’. Predictions from the level-1 classifiers for the validation fold are stacked and fed into the meta-classifier. After validation, the level-1 classifiers are fitted to the entire training set, and CeSta’s performance is evaluated on the test set (pink rectangle, N = 81). CeSta is then trained on the entire IRCC-PDX dataset and tested on an independent CR-PDX dataset (grey rectangle, N = 50) for external validation.
c The most frequently mutated genes in the IRCC-PDX cohort.
d Selection of multi-omic and clinical features across the IRCC-PDX collection, including CRIS expression cluster labels, methylation NMF cluster labels, primary sample anatomical location, and treatment backbone. Source data are provided as a Source Data file. Fig. 1a, b was created in BioRender [Iorio, F. (2024) BioRender.com/q01w468] and is released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).
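
The stacking and resampling scheme described in panels a and b can be illustrated with a minimal scikit-learn sketch. This is not the authors' CeSta implementation: the function names `build_stack` and `evaluate`, the synthetic data, all hyperparameters, the logistic-regression meta-classifier, and the test fraction (chosen to roughly mirror 81 test models out of 231) are assumptions made for illustration. The CatBoost level-1 model and its pan-cancer pre-training are omitted. `StackingClassifier` with `cv=3` feeds out-of-fold level-1 predictions to the meta-classifier and then refits the level-1 pipelines on the whole training set, which corresponds to the three-fold scheme in panel b.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_selection import (
    SelectKBest,
    SequentialFeatureSelector,
    f_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_enet():
    # Elastic-net-penalised logistic regression (hyperparameters are placeholders).
    return LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0, max_iter=5000
    )


def build_stack(seed=0):
    """Three level-1 pipelines stacked under a meta-classifier (illustrative only)."""
    enet = make_pipeline(
        StandardScaler(),
        # Forward feature selection wrapped around an elastic net model.
        SequentialFeatureSelector(
            make_enet(), n_features_to_select=3, direction="forward", cv=3
        ),
        make_enet(),
    )
    extra_trees = make_pipeline(
        SelectKBest(f_classif, k=5),  # ANOVA F-test feature selection
        ExtraTreesClassifier(n_estimators=100, random_state=seed),
    )
    svc = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=5),
        SVC(probability=True, random_state=seed),
    )
    return StackingClassifier(
        estimators=[("enet", enet), ("extra_trees", extra_trees), ("svc", svc)],
        final_estimator=LogisticRegression(),  # assumed meta-classifier
        cv=3,  # out-of-fold level-1 predictions train the meta-classifier
        stack_method="predict_proba",
    )


def evaluate(X, y, n_splits=50, test_fraction=81 / 231, seed=0):
    """Repeated stratified train/test splits; returns one test AUC per split."""
    splitter = StratifiedShuffleSplit(
        n_splits=n_splits, test_size=test_fraction, random_state=seed
    )
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        model = build_stack(seed)
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], proba))
    return np.array(scores)


if __name__ == "__main__":
    # Synthetic stand-in for a responder/non-responder feature matrix.
    X, y = make_classification(
        n_samples=150, n_features=12, n_informative=4, random_state=0
    )
    # Three splits keep the demo quick; the caption describes 50.
    print(evaluate(X, y, n_splits=3))
```

In this sketch the final class is the argmax of the meta-classifier's predicted probabilities, which is what `predict` returns for a `StackingClassifier`; whether this matches the argmax-based soft voting used in CeSta is not something the caption settles.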
