Fig. 1: Compounds employed to train machine learning models of senolytic action.

a We assembled training data from multiple sources. We mined 58 known senolytics (positives) from academic papers and a commercial patent, and integrated them with diverse compounds from the LOPAC-1280 and Prestwick FDA-approved-1280 chemical libraries (negatives). Chemical structures were featurised with 200 physicochemical descriptors computed with RDKit [57] and assigned binary labels according to their senolytic action. These labelled data were employed to train binary classifiers predictive of senolytic activity. b Sources of the 58 senolytics employed for training, including the number of compounds per source and the cell lines where senolysis was identified. c Cluster structure of the senolytics employed for training, using the RDKit descriptors as features. Plot shows the k-means clustering score and silhouette coefficient [58] averaged across compounds for an increasing number of clusters (k). Error bars denote one standard deviation over 100 repeats with different initial seeds. The lack of a clear “elbow” in the k-means score and the low silhouette coefficients suggest poor clustering among the senolytics employed for training. d Tanimoto distance graph of senolytics employed for training; nodes are compounds and edges connect compounds that are sufficiently close in the physicochemical feature space. Node colour indicates the data source as in panel b. To emphasise the overall dissimilarity between compounds, we set the edge thickness to the Tanimoto similarity (1 − distance). Inset shows the distribution of Tanimoto distances across the 269 graph edges (median distance of 0.77). e Clustering of the Tanimoto distance graph using the Louvain algorithm for community detection [60]. Plot shows the average number of clusters with respect to the resolution parameter (γ) across 100 runs (error bars denote one standard deviation); increasing values of γ produce a larger number of clusters. We observe pronounced plateaus at 5 and 6 clusters, suggesting some degree of clustering in the data. We computed the adjusted Rand index [61] (ARI) averaged across all compounds to quantify the similarity between cluster labels and compound source labels (15 labels; panel b). Low ARI values indicate that Louvain clusters are substantially different from the literature source labels.
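To illustrate how the adjusted Rand index compares two labelings of the same compounds, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `adjusted_rand_index` and the toy labels are hypothetical, and the formula is the standard pair-counting definition of the ARI, corrected for chance agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two labelings of the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings agree trivially

    # Pairs of items that share a label in both labelings (contingency cells).
    same_in_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a label within each labeling separately.
    same_in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = same_in_a * same_in_b / total_pairs  # agreement expected by chance
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (same_in_both - expected) / (maximum - expected)


# Toy data only (assumed for illustration): cluster labels vs. source labels.
cluster_labels = [0, 0, 1, 2]
source_labels = ["A", "A", "B", "B"]
print(round(adjusted_rand_index(cluster_labels, source_labels), 4))
```

An ARI of 1 means the two labelings induce the same partition (label names do not matter), and values near 0 indicate agreement no better than chance. Low values therefore correspond to clusters that do not track the source labels.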