Extended Data Fig. 2: Features and application of the SGREELI algorithm. | Nature Chemical Biology

From: Single-step discovery of high-affinity RNA ligands by UltraSelex

a, Number of HTS reads per RNA species in the UltraSelex input library. Three independent replicates are shown. b, Decreasing sequence diversity with increasing partitioning group number (e1–11) in the SiR-B UltraSelex library. The x axis indicates the log2 of the number of HTS reads per species, and the y axis denotes the number of RNA species with the specified abundance. The partitioning eluate (e) number is indicated individually. c, Percentage of RNA species with ≥2 HTS reads in the different partitioning groups (e1–11) of UltraSelex library SiR-B. The dashed gray line marks 1% as a reference. d, Formula and simulation of γ functions, which assign γi values to the partitioning groups gi for i ≥ 2. c = 0.5; eluate_nums, number of partitioning eluates (here, 11). Colors in the line plot represent different exponents θ from 0.1 to 10. e, Illustration of the SGREELI calculation protocol, comprising seven steps. After sequencing and quality control, fold-change (FC) values are calculated for each eluate (FC2–FC11 for e2–11) by normalizing read counts to their abundance in eluate e1. In each eluate, the RNA species ranking exactly at the top 1% by FC value is defined as the reference RNA (FCref); this step is performed for every eluate group (ref2–ref11). The FC values of these reference RNAs are log2-transformed (log2 FCref). Each eluate group is then assigned a specific γ value, here from a linear γ function (exponent θ = 1), reflecting the expectation that strong binders elute in later fractions. The log2 FCref value is divided by γ to derive a normalization factor for each eluate group. The FC values of all RNA species in a given eluate group are log2-transformed and adjusted by this normalization factor, yielding intra- and inter-group adjusted fold-change (aFC) values. Finally, the area under the curve (AUC) is computed for each species by integrating its aFC values, using the corresponding γ values as the x-axis coordinates.
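
The calculation described in panel e can be sketched in a few lines of Python. This is only an illustrative reading of the legend, not the authors' implementation. The function name `sgreeli_like_scores`, the pseudocount, the use of relative read frequencies, and the interpretation of "adjusted by this normalization factor" as a division are all assumptions. The γ formula from panel d is not reproduced here; γ values are passed in by the caller, and the usage lines substitute an evenly spaced placeholder.

```python
import numpy as np

def sgreeli_like_scores(counts, gamma, top_fraction=0.01, pseudocount=1.0):
    """Hypothetical sketch of the steps in panel e.

    counts: array (n_species, n_eluates); column 0 is eluate e1.
    gamma:  array (n_eluates - 1,), one value per eluate e2..eN.
    Returns (aFC matrix, AUC per species).
    """
    gamma = np.asarray(gamma, dtype=float)
    # Assumption: a pseudocount avoids division by zero for unseen species.
    counts = np.asarray(counts, dtype=float) + pseudocount
    # Assumption: "abundance" is taken as the read fraction within each eluate.
    freq = counts / counts.sum(axis=0, keepdims=True)

    # Fold change of each later eluate relative to e1, then log2.
    log_fc = np.log2(freq[:, 1:] / freq[:, [0]])

    # Reference species per eluate: the one ranked at the top 1% by FC.
    n_species = log_fc.shape[0]
    k = max(int(np.ceil(top_fraction * n_species)) - 1, 0)
    log_fc_ref = -np.sort(-log_fc, axis=0)[k]  # descending sort, k-th row

    # Normalization factor per eluate: log2 FCref divided by gamma.
    # This sketch assumes log2 FCref > 0 in every eluate.
    norm = log_fc_ref / gamma

    # Adjusted fold change; the reference species lands exactly on gamma.
    afc = log_fc / norm

    # Trapezoidal area under the aFC curve, with gamma as x coordinates.
    auc = np.sum((afc[:, 1:] + afc[:, :-1]) / 2.0 * np.diff(gamma), axis=1)
    return afc, auc

# Usage with toy data: 200 species, 4 eluates, two enriched species.
toy = np.full((200, 4), 10.0)
toy[0] = [10, 40, 80, 160]   # strongly enriched in later eluates
toy[1] = [10, 20, 40, 80]    # moderately enriched
placeholder_gamma = np.linspace(0.5, 1.0, 3)  # placeholder, not panel d's formula
afc, auc = sgreeli_like_scores(toy, placeholder_gamma)
```

Under these assumptions, dividing by the normalization factor places the reference species of each eluate exactly at that eluate's γ value, so species enriched more strongly than the reference trace a curve above the γ line and accumulate a larger AUC.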
