Table 1 Data and datasets.
Step | Dataset | Data group | Number of subsequences used/Number of RNAs in raw data | Number of subsequences (percentage)^a |
---|---|---|---|---|
Pre-training | Training set (T1) | Group 3 | 17,927/105,370 | 14,342 (80%) |
Pre-training | Validation set (V1) | Group 3 | 17,927/105,370 | 3,585 (20%) |
Trans-training | Training set (T2) | Group 2 | 3,635/24,863 | 2,545 (70%) |
Trans-training | Validation set (V2) | Group 2 | 3,635/24,863 | 545 (15%) |
Test | Test set (TS) | Group 2 | 3,635/24,863 | 545 (15%) |
Test | Test set (TS’) | Group 1 | 128/1,003 | 128 (100%) |
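
The split arithmetic in Table 1 can be checked with a short sketch. The counts are copied from the table; the assignment of V1 to Group 3 and of V2 and TS to Group 2 is inferred from the fact that the subset sizes sum to the group totals, and the names `splits` and `split_percentages` are hypothetical, introduced only for this illustration.

```python
# Subsequence counts copied from Table 1. Group membership of V1, V2 and TS
# is an inference from the totals, not something stated explicitly.
splits = {
    "Group 3": {"total": 17927, "parts": {"T1": 14342, "V1": 3585}},
    "Group 2": {"total": 3635, "parts": {"T2": 2545, "V2": 545, "TS": 545}},
    "Group 1": {"total": 128, "parts": {"TS'": 128}},
}


def split_percentages(total, parts):
    """Return each subset's share of the group total, rounded to a whole percent."""
    if sum(parts.values()) != total:
        raise ValueError("subset sizes do not add up to the group total")
    return {name: round(100 * count / total) for name, count in parts.items()}


percentages = {
    group: split_percentages(info["total"], info["parts"])
    for group, info in splits.items()
}

for group, shares in percentages.items():
    print(group, shares)
```

Each group's subsets add up to its number of subsequences used, and the rounded shares match the percentages given in the last column of the table.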