Extended Data Fig. 5: Prediction performance for BioMe and Genomic Psychiatry Cohort (GPC) cohorts restricted to individuals of European (EUR) ancestry.
From: Prognostic value of polygenic risk scores for adults with psychosis

Training and validation F2 estimates for varying values of the regularization parameter (C) are displayed within the ‘Cross-validated grid-search’ frame for each outcome and feature configuration of interest [that is, clinical, clinical and genetic (all), and clinical and binarized genetic (all binary)]. Data are presented as mean values for training and mean values ± SD for validation. Averages are computed on n = 300 independent scores derived from subsets of n = 23 (BioMe) and n = 850 (GPC) observations for validation and n = 46 (BioMe) and n = 1,698 (GPC) observations for training. The dot marks the highest F2 score during validation. The best model, with its associated parameter C, is then trained on the entire training set and evaluated on the test set. The resulting F2 validation score distributions are enclosed in the ‘Performance evaluation’ frame. Boxplot center: median and mean; bounds of box: 25th (Q1) and 75th (Q3) percentiles; minimum: Q1 − 1.5 × (Q3 − Q1); maximum: Q3 + 1.5 × (Q3 − Q1). Exact p-values from two-sided pairwise t-tests with Benjamini-Hochberg correction are displayed for significant comparisons. Precision-recall curves obtained from the models evaluated on the test sets are reported on the right.
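
For readers unfamiliar with the two quantities named in this legend, the sketch below shows how an F2 score and the boxplot minimum and maximum bounds can be computed. It is illustrative only and is not the authors' code: the function names `f_beta` and `tukey_fences` are hypothetical, F2 is taken to be the standard F-beta score with beta = 2 (recall weighted more heavily than precision), and the quartile interpolation method is an assumption, since the legend does not state one.

```python
from statistics import quantiles


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts; beta=2 gives F2.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R),
    where P is precision and R is recall.
    """
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def tukey_fences(scores):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for a list of scores.

    Assumption: quartiles use the 'inclusive' interpolation method;
    the legend does not specify which method was used.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


if __name__ == "__main__":
    # Made-up counts: perfect recall with precision 0.6.
    print(f_beta(tp=6, fp=4, fn=0))
    # Made-up scores, only to show the shape of the output.
    print(tukey_fences([1, 2, 3, 4, 5]))
```

Because beta = 2 weights recall above precision, a model with perfect recall and modest precision scores higher on F2 than it would on the balanced F1 score.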