Fig. 1: MiMSI model design and performance metrics.

a Schematic representation of converting sequencing reads in a genomic region into a vector representation. The reference sequence, along with the mapping quality and CIGAR string of each read, is used in the vectorization after downsampling. The set of vectors for a given sample is passed through the model (see eFigure 1). b Study cohort used for both training the model and testing its performance. c Distribution of MSISensor scores for samples with orthogonal testing performed. d Area under the receiver operating characteristic curve (auROC) analysis of the test cohort analyzed with MSISensor and MiMSI at four downsampled coverage levels (100X, 200X, 300X, and 400X). e MSISensor scores and MiMSI probabilities for the test cohort. Colors indicate the orthogonal test status. Source data are provided as a Source Data file.
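
The read-to-vector conversion in panel a can be illustrated with a minimal sketch. It is not the MiMSI implementation: the per-position codes, the function names `encode_read` and `vectorize`, and the `(position, CIGAR, mapping quality)` read tuples are all assumptions made for illustration, and the reference-sequence channel mentioned in the caption is omitted for brevity. The sketch downsamples the reads in a window to a fixed count, then expands each CIGAR string into one code per reference position, paired with the read's mapping quality.

```python
import random
import re

# Splits a CIGAR string such as "3M2D2M" into (length, operation) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Hypothetical per-position codes, chosen for this sketch only.
UNCOVERED, ALIGNED, DELETED, INSERTION_AFTER = 0, 1, 2, 3


def encode_read(pos, cigar, mapq, win_start, win_len):
    """Return one (code, mapq) pair per reference position in the window.

    pos and win_start are 0-based reference coordinates.
    """
    row = [(UNCOVERED, 0)] * win_len
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XDN":
            # These operations consume reference bases.
            # N (skipped region) is treated like a deletion in this sketch.
            code = ALIGNED if op in "M=X" else DELETED
            for r in range(ref, ref + length):
                i = r - win_start
                if 0 <= i < win_len:
                    row[i] = (code, mapq)
            ref += length
        elif op == "I":
            # An insertion consumes no reference bases; flag the covered
            # reference position immediately before it.
            i = ref - 1 - win_start
            if 0 <= i < win_len and row[i][0] != UNCOVERED:
                row[i] = (INSERTION_AFTER, mapq)
        # S, H and P consume no reference bases and are ignored here.
    return row


def vectorize(reads, win_start, win_len, max_reads, seed=0):
    """Downsample reads to at most max_reads, then encode each one."""
    rng = random.Random(seed)
    if len(reads) > max_reads:
        reads = rng.sample(reads, max_reads)
    return [encode_read(p, c, q, win_start, win_len) for p, c, q in reads]


if __name__ == "__main__":
    reads = [(100, "3M2D2M", 60), (101, "2M1I2M", 30), (98, "6M", 20)]
    for row in vectorize(reads, win_start=100, win_len=8, max_reads=2):
        print(row)
```

Fixing the number of reads per window, as `max_reads` does here, gives every sample a matrix of bounded size, which is one plausible reason to downsample before vectorization.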