Fig. 2: SVM probability scores with stepwise re-training of classifiers.

SVM-classifier training started with cases (dark green) carrying clearly pathogenic, i.e., non-mosaic loss-of-function, variants (step 0). This initial classifier was then applied to cases (gray) carrying variants of varying pathogenic significance. Cases that received an SVM probability score > 0.5 were added to the training set (light green), and the classifier was re-trained. New cases meeting this criterion were identified up to the third re-training step of the KMT2B classifier and up to the second re-training step of the KMT2D classifier. The same 19 controls (black) were used in all training steps. Dashed blue lines indicate the 50th (median) and 95th percentiles of the classifiers’ SVM probability scores in a set of 194 independent samples, with or without variants in genes of the epigenetic machinery other than the gene under examination. The upper dotted line indicates the classifiers’ specificities as determined in these independent control samples.
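The stepwise re-training described in this caption is a form of self-training. It fits a classifier on seed cases and fixed controls, scores the remaining candidates, adds those scoring above 0.5, and repeats until no new case crosses the threshold. The sketch below is illustrative only and is not the authors' code. So that it runs with the Python standard library alone, it replaces the probability-calibrated SVM with a hypothetical toy centroid scorer (`fit_centroid_scorer`). The function names, the data layout (sample ID mapped to a feature vector), and the definition of specificity as the fraction of independent samples scoring at or below 0.5 are all assumptions.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def fit_centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical stand-in for a probability-calibrated SVM.

    Scores a sample with a logistic function of how much closer it lies to
    the positive (case) centroid than to the negative (control) centroid.
    """
    dim = len(pos[0])
    c_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    c_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]

    def score(x: Vector) -> float:
        # Positive z means x is closer to the control centroid.
        z = math.dist(x, c_pos) - math.dist(x, c_neg)
        z = max(min(z, 50.0), -50.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(z))  # 0.5 when equidistant

    return score


def stepwise_retrain(
    seed_cases: Dict[str, Vector],
    candidates: Dict[str, Vector],
    controls: List[Vector],
    threshold: float = 0.5,
    fit: Callable[[List[Vector], List[Vector]], Scorer] = fit_centroid_scorer,
) -> Tuple[Scorer, List[List[str]]]:
    """Re-train until no candidate scores above `threshold`.

    Returns the final scorer and the case IDs added at each re-training step.
    The controls stay the same in every step.
    """
    training = dict(seed_cases)   # step 0: clearly pathogenic cases
    pending = dict(candidates)    # cases with variants of varying significance
    added_per_step: List[List[str]] = []
    while True:
        scorer = fit(list(training.values()), list(controls))
        new = [cid for cid, x in pending.items() if scorer(x) > threshold]
        if not new:
            return scorer, added_per_step
        for cid in new:
            training[cid] = pending.pop(cid)
        added_per_step.append(sorted(new))


def summarize_independent(
    scorer: Scorer, samples: List[Vector], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Median, 95th percentile and (assumed) specificity on independent samples."""
    scores = [scorer(x) for x in samples]
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    p95 = cuts[94]  # 95th of the 99 percentile cut points
    specificity = sum(s <= threshold for s in scores) / len(scores)
    return statistics.median(scores), p95, specificity
```

For example, starting from two seed cases and three candidates, the loop stops once a re-training step adds no new case. The toy data below are invented for illustration.

```python
seed = {"s1": (1.0, 1.0), "s2": (1.2, 0.9)}
candidates = {"c1": (0.8, 0.8), "c2": (0.5, 0.5), "c3": (-1.0, -1.0)}
controls = [(-1.0, -1.2), (-0.9, -1.0), (-1.1, -0.8)]
scorer, added = stepwise_retrain(seed, candidates, controls)
# added == [["c1", "c2"]]: one re-training step, then no further cases qualify
```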