Fig. 5: ESM-DBP performs well on DBPs with few homologous proteins. | Nature Communications

From: Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein

a Statistics on the number of homologous proteins of the 381 DBPs in the independent test set against UniDBP40 and UniRef50, respectively, found using the Blast program (ref. 35) with an e-value of 1e-06; b Head-to-head comparison between the prediction probabilities of ESM-DBP and ESM2 on the 381 DBPs in the independent test set. Pentagrams highlighted in grey shading indicate the 18 DBPs for which the difference between the predicted probabilities of ESM-DBP and ESM2 is greater than 0.4; c The relationship between the prediction probability of a DBP and its homology against the two pretraining datasets; d Prediction results of ESM2 and other methods on DBPs in the independent test set. High-homology DBPs (n = 226) are those that have similar sequences in UniDBP40 (number > 0), and low-homology DBPs (n = 155) have no similar sequences. Centre line indicates the median, bounds of the box indicate the 25th and 75th percentiles, and whiskers indicate the minima and maxima; e ESM-DBP learns discriminatory knowledge from three different types of low-homology proteins, i.e., bZIP proteins, disordered proteins, and proteins without a domain. N_UniDBP40 and N_UniRef50 denote the number of similar sequences against UniDBP40 and UniRef50, respectively. “Probability” indicates the probability of being predicted as a DBP by ESM-DBP. Source data are provided as a Source Data file.
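
The grouping rules stated in the caption can be sketched in a few lines of Python. This is only an illustration of the stated conventions, not the authors' analysis code: the `DBPRecord` type, its field names, and the helper functions are hypothetical, the hit counts are assumed to have been computed already by a Blast search at the stated e-value, and the "difference greater than 0.4" in panel b is assumed here to mean the ESM-DBP probability minus the ESM2 probability.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class DBPRecord:
    """Hypothetical per-protein record; field names are illustrative only."""
    name: str
    n_unidbp40: int    # number of similar sequences found in UniDBP40
    p_esm_dbp: float   # probability of being a DBP according to ESM-DBP
    p_esm2: float      # probability of being a DBP according to ESM2


def split_by_homology(records):
    """Panel d convention: high homology means at least one hit in UniDBP40."""
    high = [r for r in records if r.n_unidbp40 > 0]
    low = [r for r in records if r.n_unidbp40 == 0]
    return high, low


def large_gain(records, threshold=0.4):
    """Panel b convention, assuming a signed difference (ESM-DBP minus ESM2)."""
    return [r for r in records if r.p_esm_dbp - r.p_esm2 > threshold]


def box_summary(values):
    """Box-plot statistics as described in the caption:
    (minimum, 25th percentile, median, 75th percentile, maximum).
    Needs at least two values; uses the inclusive quantile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)
```

Applied to the source data, `split_by_homology` would be expected to reproduce the 226/155 split reported in the caption only if the hit counts were produced with the same Blast settings; the percentile method used for the published box plots is not stated, so `box_summary` may differ slightly from the figure.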
