Fig. 4: ESM-DBP explores the DNA-binding domains. | Nature Communications

From: Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein

a Visualization of the ESM-DBP sequence embeddings of different DBD families in UniDBP40. b Saliency maps of six human TFs. The solid black line indicates the DBD region recorded in UniProt. The horizontal and vertical axes give the index of the residue in the protein sequence and the index of the transformer block in ESM-DBP, respectively. Each value at coordinates \((x,y)\) is the saliency score, calculated by the integrated gradients algorithm, of the \(x\)th residue in the \(y\)th transformer block for the prediction result. The integrated gradients algorithm is implemented in the Captum library (ref. 83). c Comparison of prediction results using the full feature (diamonds), the DBD feature (circles), and the non-DBD feature (triangles) on 91 test human TFs. The DBD (or non-DBD) feature is obtained by replacing all residues that are not in (or are in) the DBD region with the <mask> token before the protein sequence is input into ESM-DBP. Source data are provided as a Source Data file.
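The masking step described for panel c can be illustrated with a minimal sketch. This is not the authors' code: the function name `mask_by_region`, the toy sequence, and the region coordinates are hypothetical, and the sketch assumes that DBD regions are given as 1-based, inclusive (start, end) pairs and that each residue corresponds to one token. Only the `<mask>` token itself comes from the caption.

```python
from typing import List, Sequence, Tuple

MASK_TOKEN = "<mask>"  # mask token named in the caption


def mask_by_region(
    sequence: str,
    regions: Sequence[Tuple[int, int]],
    keep_inside: bool,
) -> List[str]:
    """Return one token per residue, masking residues by region membership.

    regions: 1-based, inclusive (start, end) pairs (assumed coordinate style).
    keep_inside=True  -> "DBD feature": residues outside the regions are masked.
    keep_inside=False -> "non-DBD feature": residues inside the regions are masked.
    """
    in_region = [False] * len(sequence)
    for start, end in regions:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"region ({start}, {end}) is outside the sequence")
        for i in range(start - 1, end):  # convert to 0-based indices
            in_region[i] = True
    # Keep a residue only when its membership matches the requested side.
    return [
        residue if inside == keep_inside else MASK_TOKEN
        for residue, inside in zip(sequence, in_region)
    ]


# Hypothetical toy input: a 9-residue sequence with a "DBD" at positions 3-5.
toy_sequence = "MKRARNAYL"
toy_regions = [(3, 5)]

dbd_input = "".join(mask_by_region(toy_sequence, toy_regions, keep_inside=True))
non_dbd_input = "".join(mask_by_region(toy_sequence, toy_regions, keep_inside=False))
print(dbd_input)      # <mask><mask>RAR<mask><mask><mask><mask>
print(non_dbd_input)  # MK<mask><mask><mask>NAYL
```

Both masked inputs keep the original sequence length in tokens, so residue positions stay aligned with the unmasked sequence. How the masked string is tokenized before being passed to the model depends on the tokenizer in use and is not shown here.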
