Table 4 Cas1 proteins with lengths < 250 aa predicted by our models.

From: A new strategy for Cas protein recognition based on graph neural networks and SMILES encoding


| No. | Candidate Cas1 ID | Sequence length (aa, < 250) | Predicted score | Number of Cas1 BLAST hits | Region name in NCBI | Cas1 domain matched in SMART (start–end; E-value) |
|---|---|---|---|---|---|---|
| 1 | KPU62407 | 61 | 1.08 | 331 | Cas1_I–II–III | |
| 2 | KKG18251 | 153 | 1.05 | 66 | Cas1_I–II–III | 8–70; 3.80e-19 |
| 3 | SNY00500 | 100 | 1.37 | 33 | – | – |
| 4 | OBZ34426 | 89 | 0.96 | 32 | – | – |
| 5 | KGK98611 | 106 | 0.96 | 31 | – | – |
| 6 | ATU08116 | 73 | 0.65 | 30 | – | – |
| 7 | APH39473 | 91 | 1.01 | 30 | – | – |
| 8 | ATU08599 | 60 | 0.97 | 28 | – | – |
| 9 | EFC93806 | 56 | 1.06 | 28 | Cas1_I–II–III | – |
| 10 | SNY20592 | 58 | 0.99 | 27 | – | – |
| 11 | AFV22607 | 57 | 1 | 13 | – | – |
| 12 | OYT29111 | 237 | 1.32 | 1 | – | – |
| 13 | EQB71572 | 30 | 1.11 | 1 | – | – |
| 14 | AIC16667 | 84 | 1.21 | 1 | – | – |
| 15 | KKF97957 | 172 | 1.8 | 1 | – | – |