Scientific Reports

Table 7 Properties of the three datasets used for evaluating the performance of the models.

From: A new strategy for Cas protein recognition based on graph neural networks and SMILES encoding

Dataset for prediction	Number of positive samples	Number of negative samples
Dataset 1: sequence length ≤ 400 aa	Cas1: 10374	Uniref50: 8839
Dataset 2: sequence length > 400 aa	Cas1: 1221	Uniref50: 1198
Dataset 3: sequence length ≤ 1300 aa	Cas1: 11595	Cas2-Cas14: 11276
