Fig. 3: Substitution effect analysis of the trifluoromethyl group.
From: Sequence-based drug design as a concept in computational drug design

a Left, data distribution of the trifluoromethyl substitution dataset. Only 15.73% of substitutions of −CH3 by −CF3 increased or decreased the biological activity by at least an order of magnitude. We conducted the substitution effect analysis on this part of the data. Right, an example in which replacing −CH3 with −CF3 leads to a significant increase in biological activity. b The overall accuracy of TransformerCPI2.0 and baseline models on the whole dataset. Error bars represent mean ± SEM of three independent experiments. P values were evaluated using a two-tailed unpaired t-test, ***P < 0.001. (TransformerCPI2.0 vs. TransformerCPI, P = 0.0007; TransformerCPI2.0 vs. GraphDTA, P = 0.0003; TransformerCPI2.0 vs. GCN, P = 0.0009; TransformerCPI2.0 vs. Random Guess, P < 0.0001.) c The overall accuracy of TransformerCPI2.0 and baseline models on the subset where the substitution of −CH3 by −CF3 increased or decreased the biological activity by at least three orders of magnitude. Error bars represent mean ± SEM of three independent experiments. P values were evaluated using a two-tailed unpaired t-test, **P < 0.01, ***P < 0.001. (TransformerCPI2.0 vs. TransformerCPI, P = 0.0003; TransformerCPI2.0 vs. GraphDTA, P = 0.0012; TransformerCPI2.0 vs. GCN, P < 0.0001; TransformerCPI2.0 vs. Random Guess, P < 0.0001.) d Two additional activity-decrease examples and activity-increase examples are shown. The predictions of TransformerCPI2.0 are consistent with the ground truth for proteins and compounds not present in the training set.
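
The "order of magnitude" criterion used in panels a and c can be illustrated with a minimal sketch. It assumes that activity is reported as an IC50-like concentration, where a lower value means a more active compound; the caption does not state the measurement type, and the function name `substitution_effect` and its parameters are hypothetical, not taken from the paper or its code.

```python
import math


def substitution_effect(activity_ch3, activity_cf3, min_orders=1.0):
    """Label a -CH3 -> -CF3 pair by the change in potency.

    Assumption: both inputs are IC50-like concentrations in the same
    unit, so a LOWER value means a MORE active compound.
    min_orders is the threshold in orders of magnitude (1.0 for the
    criterion of panel a, 3.0 for the subset of panel c).
    """
    if activity_ch3 <= 0 or activity_cf3 <= 0:
        raise ValueError("concentrations must be positive")
    # Positive delta: the -CF3 analogue is more potent than the -CH3 one.
    delta = math.log10(activity_ch3 / activity_cf3)
    if delta >= min_orders:
        return "increase"
    if delta <= -min_orders:
        return "decrease"
    return "no significant change"


def fraction_significant(pairs, min_orders=1.0):
    """Fraction of (activity_ch3, activity_cf3) pairs that pass the threshold."""
    labels = [substitution_effect(a, b, min_orders) for a, b in pairs]
    kept = [label for label in labels if label != "no significant change"]
    return len(kept) / len(labels)
```

With this convention, a pair whose concentration falls 100-fold after substitution is labelled an increase at the one-order threshold but is excluded at the three-order threshold, which is why the subset in panel c is smaller than the data analysed in panel b.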