Introduction

Measurable residual disease (MRD), defined as the proportion of residual leukemic blasts after (chemo)therapy, is a critical biomarker for relapse-free and overall survival in acute myeloid leukemia (AML)1. A recommended technique for MRD assessment in AML is multiparameter flow cytometry, which uses fluorescently labeled monoclonal antibodies to determine intra- and extracellular protein expression of individual cells2. Flow cytometry allows for the identification and monitoring of immature myeloid cells (blasts) with characteristic leukemia-associated immunophenotypes (LAIPs), protein expression patterns that are absent or present at very low frequencies in healthy or regenerating bone marrow3. Currently, experts manually identify these leukemic blasts through hierarchical bivariate visualizations in a process called manual gating. However, this method requires extensive background knowledge of protein expression patterns in healthy and malignant hematopoiesis and is time-consuming. Furthermore, the subjectivity of gating can introduce undesired variability in MRD percentages, which are derived from the proportion of LAIP cells relative to the total white blood cell (WBC) count. The threshold for MRD positivity is currently 0.1%2, so small differences in manual gating can affect the reliability of MRD positivity calls. Therefore, an objective and standardized MRD assessment is a clinical imperative for AML management.

A promising new approach for objective MRD assessment in AML is computational MRD (cMRD) assessment. Previous applications in other hematological malignancies, such as chronic lymphocytic leukemia4,5, acute lymphoblastic leukemia6,7, and multiple myeloma8, have demonstrated varying degrees of success. For AML, several studies have used (semi-)supervised machine learning and artificial intelligence to identify leukemic features for automated AML-MRD assessment6,9,10,11,12,13,14,15. However, the heterogeneity of marker expression in AML poses a significant challenge for supervised approaches: more than 100 distinct LAIPs have been described in AML, some observed in fewer than 1% of patients16. Because LAIPs manifest as distinct clusters in high-dimensional data, supervised approaches require training models on the full spectrum of AML phenotypes. This demands unrealistically large amounts of training data, which is infeasible for smaller diagnostic centers. Additionally, the development and implementation of such models require consistent use of identical flow cytometry panels and platforms over time, and the models would become unusable if new test samples deviated from the training data.

Several algorithms that limit the challenges of supervised approaches have been proposed. One approach clusters a single AML sample together with a set of (disease-free) control or “reference” samples, assuming that cells with characteristics absent from the reference coincide with leukemic populations of interest. Leukemic clusters can then be delineated in a “computer-assisted” setting15,17,18,19, in which clusters are manually evaluated, or determined using a heuristic. For example, Jacqmin et al.20 and Weijler et al.21 used the ratio of patient to control cells within clusters to define potential leukemic clusters. However, the degree to which leukemic blasts form a separate cluster depends on the mixture of leukemic to control cells. Additionally, these clustering algorithms depend heavily on density patterns in feature space, which may fail for rare cell populations and in high dimensions. Weijler et al.21 attempted to circumvent this by reducing the data to three dimensions using uniform manifold approximation and projection (UMAP). However, such transformations may extensively distort expression data22,23, potentially limiting their applicability in clinical settings.

Another promising approach to identify leukemic blasts may come from novelty detection algorithms, which evaluate whether a new sample contains anomalous observations that are absent from (or occur at very low frequency in) training data. Consequently, the need for ad-hoc clustering together with a reference is alleviated, and the same model can be validated once and reused across AML samples. Additionally, as novelty detection evaluates each cell individually, it is not affected by the rarity of leukemic blasts in test samples.

In this study, we developed and evaluated a novel pipeline for fully automated MRD assessment in AML using Gaussian mixture models (GMMs). Briefly, these statistical models fit a combination of Gaussian distributions to the data, representing clusters (“components”) of cells with similar marker expression. Using these models, we identified the cell subset containing healthy immature and leukemic blasts. Next, we identified leukemic blasts within this subset and predicted the MRD status. We applied the pipeline to a large flow cytometry dataset acquired according to the standardization criteria of the European LeukemiaNet guidelines2. This dataset was extensively annotated and labeled, and made publicly available to enable benchmarking of computational AML-MRD strategies.

Methods

Datasets

Involvement of human participants was approved by the institutional review boards of Amsterdam UMC (#METC-2023.0883) for in-house participants and Erasmus MC (#MEC-2013-539) for patients in the HOVON-SAKK-132 trial24. The study followed the Declaration of Helsinki with written informed consent from all patients.

Bone marrow samples were obtained via aspiration from the posterior superior iliac spine in pulls of 1–2 mL25 from AML patients, or from the sternum during cardiothoracic surgery for normal bone marrow. Three anonymized cohorts of patient samples were collected for model training and optimization. The first cohort, BLAST110, includes in-house samples from 90 AML patients and 20 normal bone marrow donors measured between 2015 and 2022. For AML patients, a mixture of diagnosis (n = 30) and follow-up (n = 60) samples with different frequencies of leukemic blasts at different time points was selected to capture the phenotypic variation of AML progression. For follow-up samples, an even mixture (20, 20, and 20 samples) of MRDlow (0–0.1%), MRDintermediate (0.1–1%), and MRDhigh (≥1%) samples was randomly selected from the total patient cohort between 2015 and 2022. The second cohort, LAIP29, comprised another 29 in-house AML patients with samples from diagnostic (n = 28) and follow-up (n = 20) time points. Lastly, from patients included in the HOVON-SAKK 132 trial24, a subset of 18 patients with an event-free survival of 3 years, without a LAIP at diagnosis and with MRD negativity at follow-up, was set aside for the construction of a disease-free reference dataset, which we refer to as regenerating bone marrow (RBM18).

Flow cytometry

Flow cytometry measurements were performed on FACS CANTO II (BD Biosciences, San Jose, CA, USA) equipment using a four-tube (P1-P4), eight-color antibody panel26 (Supplementary Table 1). To explore potential batch effects, we visualized each marker’s percentiles (interpercentile distance of 10%, excluding minimum and maximum) for each file across the three cohorts using principal component analysis, which did not show a clear technical confounder (Supplementary Fig. 1).
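The percentile-based batch-effect check described above can be sketched as follows. The data here are synthetic stand-ins for per-file marker intensities; the 10%-spaced percentiles of every marker are concatenated into one feature vector per file before principal component analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

def percentile_features(files, percentiles=range(10, 100, 10)):
    """Summarize each file by per-marker intensity percentiles.

    `files` is a list of (n_cells, n_markers) arrays; the 10th-90th
    percentiles (10% spacing, excluding minimum and maximum) of every
    marker are concatenated into one feature vector per file.
    """
    feats = [np.percentile(f, list(percentiles), axis=0).ravel() for f in files]
    return np.vstack(feats)

rng = np.random.default_rng(0)
# Hypothetical stand-in for FCS files from three cohorts (8 markers each)
files = [rng.normal(size=(500, 8)) for _ in range(30)]
X = percentile_features(files)                    # (30 files, 9 x 8 features)
coords = PCA(n_components=2).fit_transform(X)     # one point per file
```

In practice, cohort labels would be overlaid on the first two principal components to check visually for clustering driven by technical confounders.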

In the BLAST110 dataset, blasts were manually gated in FlowJo (v10.8.1) by identifying CD45dim/CD34+ and CD45dim/CD117+ cells (Supplementary Fig. 2). Cell labeling was extracted from FlowJo workspace files in R (4.3.1) using CytoML (v2.12.0)27. In the LAIP29 and RBM18 datasets, every event was given a unique identifier before manual AML-MRD analysis26 in Infinicyt (v2.0.5b). Each manually gated population was exported as a unique FCS file, after which an R script was used to obtain cell labeling similar to that obtained through FlowJo. For the RBM18 dataset, any LAIP+ cells were removed.

Flow cytometry pre-processing

Events exceeding the instrument’s capacity were removed using PeacoQC (v1.10.0)28, after which fluorescence parameters were compensated. Data were transformed using the logicle transformation29. The w parameter of the transform was set to the median of the per-sample values determined with the estimateLogicle function in flowCore (v2.12.0)30 on the BLAST110 cohort. An approximated min-max scaling based on the 0.01 and 0.99 quantiles was applied to scatter parameters. Doublets were removed by excluding all cells identified by PeacoQC’s RemoveDoublets or flowStats’ (v4.12.0) singletGate.
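The quantile-based scaling of scatter parameters might look like the following minimal sketch (synthetic data; the 0.01 and 0.99 quantiles stand in for approximate minimum and maximum, making the scaling robust to outlier events):

```python
import numpy as np

def approx_minmax_scale(x, q_low=0.01, q_high=0.99):
    """Approximate min-max scaling of a scatter parameter using the
    0.01 and 0.99 quantiles instead of the (outlier-prone) extremes."""
    lo, hi = np.quantile(x, [q_low, q_high])
    return (x - lo) / (hi - lo)

rng = np.random.default_rng(1)
ssc = rng.lognormal(mean=7, sigma=0.5, size=10_000)  # hypothetical SSC-A values
scaled = approx_minmax_scale(ssc)
# The bulk of the distribution now lies in [0, 1]; extreme tails may exceed it
```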

Supervised blast classification

Supervised models and their hyperparameters (Supplementary Table 2) were compared using nested group 10-fold cross-validation to classify blasts in the BLAST110 dataset. In this benchmark, we also included a supervised variant of the FlowSOM clustering algorithm (FlowSOMclf) in Python (v3.9.10)31,32. In this setting, clusters of the default 10 × 10 grid in training data were assigned as blast clusters if the ratio of blast to non-blast cells exceeded a threshold. We treated this threshold as a tunable hyperparameter during model training.

To develop a model with applicability across the four-tube panel, models were trained exclusively on SSC-A, CD45, CD34, and CD117. Patient identifiers were used as groupings to prevent data leakage between training and validation folds. Optimal hyperparameters were chosen as the best-performing (highest F-score) set from either an exhaustive search (logistic regression, support vector machine, FlowSOMclf) or 20 randomized search iterations (random forests, LightGBM, GMMclf). For computational efficiency, models were trained on downsampled data by randomly sampling 2000 total cells from each FCS file. After model selection, a final GMM classifier was fine-tuned and fitted on a more extensive dataset (5000 total cells/sample), selecting the final hyperparameters through an exhaustive search in group 10-fold cross-validation. In all performance evaluation settings, classification performance was calculated for full samples without downsampling.
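As an illustration of the nested group cross-validation scheme, the sketch below tunes a logistic regression on synthetic cell-level data with scikit-learn; three folds replace the ten used in the paper for brevity, and the features, labels, and patient groups are hypothetical stand-ins for the gated FCS data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(2)
# Synthetic stand-in: 4 features (SSC-A, CD45, CD34, CD117) per cell,
# with patient identifiers as groups so one patient's cells never span folds
X = rng.normal(size=(2000, 4))
y = (X[:, 2] + X[:, 3] + rng.normal(scale=0.5, size=2000) > 1).astype(int)
groups = rng.integers(0, 20, size=2000)  # 20 pseudo-patients

outer, inner = GroupKFold(n_splits=3), GroupKFold(n_splits=3)
outer_scores = []
for train_idx, test_idx in outer.split(X, y, groups):
    # Inner loop: tune hyperparameters on the training folds only
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          {"C": [0.1, 1.0, 10.0]}, cv=inner, scoring="f1")
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Outer loop: unbiased performance estimate of the tuned model
    outer_scores.append(f1_score(y[test_idx], search.predict(X[test_idx])))
```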

Supervised WBC count identification

The cell count corresponding to each non-blast component in the GMM classifier was obtained to predict the WBC count. For modeling, we used scikit-learn’s (v1.4.0) implementation of linear regression. In the BLAST110 cohort, we first assessed training performance with leave-one-out cross-validation, using patient identifiers as groups. Next, the model was trained on the features from the BLAST110 cohort before evaluation in the LAIP29 cohort.
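The WBC regression step might be sketched as follows, with simulated per-sample component cell counts standing in for the non-blast GMM features (the CD45-related structure of the simulation is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
# Hypothetical per-sample features: cell counts assigned to each of the
# three non-blast GMM components
n_samples = 100
comp_counts = rng.integers(1_000, 200_000, size=(n_samples, 3)).astype(float)
# Simulated "manual" WBC count: two components contribute, plus noise
wbc = comp_counts[:, 0] + comp_counts[:, 1] + rng.normal(0, 5_000, n_samples)

model = LinearRegression().fit(comp_counts, wbc)
pred = model.predict(comp_counts)
rmse = float(np.sqrt(np.mean((pred - wbc) ** 2)))  # error on the noise scale
```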

Reference modeling and novelty detection

An aggregated dataset was created from the RBM18 samples by sampling 4000 cells from every FCS file, ensuring an equal contribution of samples. A unique GMM was trained on the fluorescence parameters in the aggregated data for each tube. The number of components (range: 1–20) was optimized using the Bayesian information criterion (BIC) score in leave-one-out cross-validation, using individual patients as held-out test sets. Cells were marked as novelties if their log-likelihood fell below a threshold defined by a percentile (e.g., the 10th) of the log-likelihood distribution in the aggregated RBM18 data. For nuSVM- and LOF-based novelty detection, scikit-learn implementations were used, with nu = 0.05 for the nuSVM and contamination = 0.05 for the LOF.
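A minimal sketch of the reference-model training and percentile-threshold novelty detection, using scikit-learn's GaussianMixture on synthetic data (a plain BIC sweep on the pooled data replaces the leave-one-out scheme, and the component range is shortened for brevity):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Hypothetical reference blasts (aggregated RBM cells, 8 fluorescence params)
ref = rng.normal(size=(4000, 8))

# Select the number of components with the BIC score
models = [GaussianMixture(n_components=k, random_state=0).fit(ref)
          for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(ref))

# Novelty threshold: a low percentile of the reference log-likelihoods
threshold = np.percentile(best.score_samples(ref), 10)

# Cells in a test sample scoring below the threshold are flagged as novelties
test = np.vstack([rng.normal(size=(950, 8)),          # "normal" blasts
                  rng.normal(loc=5, size=(50, 8))])   # shifted, aberrant cells
is_novel = best.score_samples(test) < threshold
```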

For the UMAP-HDBSCAN pipeline21, we used previously described parameters for UMAP (min_dist = 0, n_components = 3) and HDBSCAN (min_cluster_size = 50). Clusters were marked as leukemic if a cluster contained less than 5% control events. Although the authors originally used 80% of the cell count of a test sample to define the reference cell count, this was not always feasible for test samples with large blast counts due to the size of our reference dataset. We used downsampled test data for these samples to identify the aberrant cluster, after which all test data was used for evaluation. Although this could theoretically have lowered sensitivity, samples with such high blast counts were typically diagnosis samples for which sensitivity was unlikely to be the limiting factor.

Results

To perform automated MRD assessment of AML samples, we used a two-stage approach (Fig. 1). In the first stage, leukemic blasts and their healthy myeloid counterparts are identified manually or automatically using a classifier based on GMMs. Next, control samples without leukemic blasts are used to train a GMM reference model. Subsequently, we use this model to identify blasts absent in the control data (i.e., “novelties”). These predicted leukemic blasts can be visually inspected during expert evaluation and, combined with a manually determined or predicted WBC count, yield a computational MRD percentage (cMRD%). We explain these steps in more detail in the following sections.

Fig. 1: Proposed approach for computational AML-MRD assessment.
figure 1

A cMRD percentage is calculated for test samples in two stages. First, immature blasts are identified either manually or through a GMM classifier. From the GMM classifier, the WBC count can also be inferred, although it can alternatively be determined using a manual gate. Next, a reference model trained on control blasts identifies leukemic blasts through novelty detection. These predicted blasts can be validated by experts and, together with the WBC count as denominator, are used to calculate a cMRD%. MRD, measurable residual disease; cMRD, computational measurable residual disease; GMM, Gaussian mixture model; WBC, white blood cell.

Interpretable identification of myeloid blasts using Gaussian mixture models

Because leukemic (LAIP+) blasts are present in very low proportions (below 0.1% of WBCs) in bone marrow samples after chemotherapy, we applied a strategy to enrich for these cells, facilitating their detection. Leukemic blasts are blocked in differentiation and can be found in the compartment of hematopoietic immature blasts, characterized by the following flow cytometry parameters: low side scatter (SSC-A), moderate expression of CD45, and high expression of CD34 or CD11726. To enrich for these cells without human intervention, we trained GMMs on SSC-A intensity and CD45, CD34, and CD117 expression patterns. However, instead of using GMMs in the usual unsupervised clustering setting, we used them for classification (Supplementary Text 1). By fitting separate GMMs on blast and non-blast populations in labeled training data, we could calculate two scores for every cell: the (log-)likelihood of the cell under the blast model and the (log-)likelihood of the cell under the non-blast model. These scores were used to classify cells by assigning them to the most likely population.
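The two-model classification rule can be sketched as below; the blast and non-blast clouds are synthetic stand-ins in (SSC-A, CD45, CD34, CD117) space, and class priors are omitted for simplicity:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Hypothetical labeled training data in (SSC-A, CD45, CD34, CD117) space
blasts = rng.normal(loc=[0.2, 0.5, 0.8, 0.7], scale=0.1, size=(3000, 4))
non_blasts = rng.normal(loc=[0.6, 0.9, 0.1, 0.1], scale=0.15, size=(7000, 4))

# Fit one GMM per class (the paper's final model: K=2 blast, K=3 non-blast)
gmm_blast = GaussianMixture(n_components=2, random_state=0).fit(blasts)
gmm_other = GaussianMixture(n_components=3, random_state=0).fit(non_blasts)

def classify(cells):
    """Assign each cell to the class whose GMM gives it the higher
    log-likelihood (class priors omitted for simplicity)."""
    return gmm_blast.score_samples(cells) > gmm_other.score_samples(cells)

test = np.vstack([blasts[:100], non_blasts[:100]])
pred_blast = classify(test)  # True where the blast model is more likely
```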

We benchmarked the GMM classifier (GMMclf) for blast identification against other supervised algorithms: logistic regression (LR), support vector machines (SVM), random forests (RF), gradient boosting (LightGBM), and a supervised classifier based on FlowSOM clustering (FlowSOMclf). Performance was first evaluated on a training set (BLAST110) of 90 AML and 20 normal bone marrow samples comprising different manually gated phenotypes (Fig. 2a). To obtain an unbiased performance assessment with optimal hyperparameters (Supplementary Table 2), models were compared using nested cross-validation, with a model selection step before further evaluation and validation33. Cross-validation in the training set revealed GMMclf as the best-performing method in terms of accuracy (Median = 0.987) and recall (Median = 0.997) (Fig. 2b). Moreover, predicted blast counts showed high concordance (Spearman’s rho = 0.940) with manual analysis (Fig. 2c). While the FlowSOM-based classifier showed the highest F-score and specificity, the superior sensitivity and interpretability of the GMM classifier outweigh this modest difference, especially in a pre-filtering setting.

Fig. 2: Classification of myeloid blasts using machine learning.
figure 2

a Overview of nested cross-validation for model optimization and comparison. b Model performance in BLAST110 training set (n = 110 individuals, n = 403 measurements). c Concordance between manual and predicted blast counts in BLAST110 training set (n = 110 individuals, n = 403 measurements). d GMMclf performance in LAIP29 test set (n = 29 individuals, n = 82 measurements). e Concordance between manual and predicted blast counts in LAIP29 test set (n = 29 individuals, n = 82 measurements). f Concordance of leukemic blast count in manual and GMMclf predicted blast compartment in LAIP29 test set (n = 29 individuals, n = 82 measurements). In annotated points (<50% LAIP+ cells conserved), LAIPs were expressed on blast phenotypes not accounted for during model training. g Positions of GMMclf components for blasts (K = 2, red) and non-blasts (K = 3, blue). h Predicted blasts (red) and non-blasts (blue) for a single sample. BLAST110 training cohort, LAIP29 test cohort, LR logistic regression, SVM support vector machine, RF random forest, LightGBM light gradient-boosting machine, FlowSOMclf FlowSOM classifier, GMMclf Gaussian mixture model classifier, LAIP leukemia-associated immunophenotype.

After initial benchmarking, we further refined the GMM classifier on the full BLAST110 cohort with an exhaustive parameter search. In this setting, we observed optimal performance when using two components for blasts and three for non-blasts (Supplementary Fig. 3). We validated this final model in a test cohort of 29 AML patients (LAIP29) in which blasts were gated by a different expert to prevent human bias. Compared to performance in the BLAST110 training set, the final model achieved a higher F-score (Median = 0.861; Fig. 2d) and concordance (Spearman’s rho = 0.974; Fig. 2e) with manual analysis in the LAIP29 test set. This may reflect the model refinement or differences in gating between experts.

Lastly, we evaluated whether the leukemic blasts we aimed to identify in the second stage of the pipeline (Fig. 1) were lost as false negatives as a result of automated blast identification. High recall for leukemic blasts was confirmed (Fig. 2f). In isolated instances of notable (>50%) loss of leukemic populations, the observed variation was explained by LAIP expression on CD45- (pt7) or CD34-CD117- (pt9) cells in the blast region, cell types not accounted for during model training.

Because the components in each model approximate known cell populations (Fig. 2g), model decisions (Fig. 2h) remain interpretable in 2D visualizations. This, combined with the observed performance, yields an automated blast identification that is both accurate and highly interpretable.

Reference models describe myeloid maturation patterns in regenerating bone marrow

Leukemic blasts in AML are commonly defined by aberrant protein expression patterns that are absent on blasts in normal or regenerating bone marrow. We first evaluated whether a GMM could model patterns of hematopoiesis in regenerating bone marrow, which could serve as a negative control for detecting leukemic blasts in AML samples. In 18 apparently disease-free AML samples after induction chemotherapy (the regenerating bone marrow set; RBM18), we used cross-validation to pick an optimal number of components (K) in each tube for the fluorescence intensities of manually gated and predicted blasts, ensuring that components generalize across the cohort without overfitting (Fig. 3a).

Fig. 3: Reference modeling using Gaussian mixture models.
figure 3

a Overview of cross-validation for model training and evaluation. b Model components in CD34 vs. CD117. c Model components in CD13 vs HLA-DR. d Model components in CD13 vs. CD36. e Model components in CD34 vs. CD19. For all shown components, the GMM for manual blast input was used. RBM18 regenerating bone marrow training cohort, LAIP29 test cohort, Dx diagnosis, FU follow-up, GMM Gaussian mixture model, LOOCV leave-one-out cross-validation.

Depending on the input data and tube, we identified a different optimal number of components, ranging from 5 to 11 (Supplementary Fig. 4), which were subsequently annotated by experts (Supplementary Tables 3 and 4). This showed that the reference models provided a detailed representation of known healthy progenitor counterparts to leukemic populations, such as promyelocytes, early monocytes, and erythroid precursors. Bivariate visualizations of the model components captured known maturation patterns (Fig. 3b, c). Additionally, certain cell types, such as CD36+ erythroid precursors (component 3 in Fig. 3d) and CD19+ pre-B cells (component 2 in Fig. 3e), stood out as single components. Several components still contained residual debris, cell types without a malignant counterpart in AML (e.g., pre-B cells), and cells with high background levels of LAIP markers (CD22, CD15, CD56) potentially induced by chemotherapy (Supplementary Tables 3 and 4). However, as these cell types are also present in test cases, their inclusion in the model prevents false positives during novelty detection, and we therefore decided to retain them.

Novelty detection facilitates automated AML MRD assessment

Using the reference models, we evaluated whether GMM-based novelty detection (Supplementary Text 1) can identify leukemic blasts in the LAIP29 cohort. We benchmarked performance against the UMAP-HDBSCAN pipeline21, the only comparable published method capable of semi-supervised detection of leukemic populations in a fully automated setting, and against novelty detection algorithms not yet evaluated in the context of AML: the one-class support vector machine (nuSVM) and the local outlier factor (LOF).

To perform GMM-based novelty detection, we first calculated the log-likelihood of all blasts under the reference models. To identify aberrant blasts, we evaluated a range of potential thresholds based on percentiles of the log-likelihood distribution in regenerating bone marrow, below which we label cells leukemic (Fig. 4a). For example, a cell with a log-likelihood below the 10th percentile can be considered more anomalous than 90% of cells in regenerating bone marrow. As a proof-of-principle, we evaluated six intuitive thresholds: 25%, 10%, 5%, 2.5%, 1%, and 0.1%. Using the GMM with manually gated blast input, we found that the 25% log-likelihood percentile threshold recovered nearly 100% of leukemic blasts in a semi-automated setting (Fig. 4b). Lowering the threshold reduced the number of recovered cells but increased the precision. At a percentile threshold of around 1%, a favorable trade-off between precision and recall was reached. For predicted blast input in a fully automated setting, this cut-off was around 5%, suggesting a lower discriminatory potential. Using these thresholds (1% for manually gated blast input in a semi-automated setting, 5% for predicted blast input in a fully automated setting), we retrieved a leukemic blast count for every sample.
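The percentile-threshold sweep can be illustrated on synthetic data, where a small shifted population stands in for leukemic blasts; lowering the percentile trades recall for precision, as described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Synthetic reference blasts and a labeled test sample with 2% aberrant cells
ref = rng.normal(size=(4000, 4))
cells = np.vstack([rng.normal(size=(980, 4)),
                   rng.normal(loc=4, size=(20, 4))])  # shifted "LAIP+" cells
truth = np.r_[np.zeros(980, dtype=bool), np.ones(20, dtype=bool)]

gmm = GaussianMixture(n_components=2, random_state=0).fit(ref)
ref_ll, test_ll = gmm.score_samples(ref), gmm.score_samples(cells)

results = {}
for pct in [25, 10, 5, 2.5, 1, 0.1]:
    pred = test_ll < np.percentile(ref_ll, pct)  # below pct-th ref percentile
    tp = int((pred & truth).sum())
    results[pct] = {"recall": tp / int(truth.sum()),
                    "precision": tp / max(int(pred.sum()), 1)}
```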

Fig. 4: Automated AML-MRD assessment using GMM-based novelty detection.
figure 4

a Theoretical example of how log-likelihood values extracted from a reference model separate leukemic (LAIP+) and non-leukemic (LAIP-) cells. b Trade-off between mean precision and recall across different GMM percentile thresholds in the semi- and fully automated settings in 47 Dx and 35 FU measurements from the LAIP29 cohort. c Correlation between MRD% and cMRD% for GMM-based novelty detection in semi- and fully automated settings (47 Dx, 35 FU measurements from LAIP29 cohort). d Correlation between MRD% and cMRD% for UMAP-HDBSCAN in semi- and fully automated settings (47 Dx, 35 FU measurements from LAIP29 cohort). e Spearman correlation between MRD% and cMRD% for different methods in semi- and fully automated settings in 47 Dx and 35 FU measurements from the LAIP29 cohort. f ROC curve for GMM-based cMRD% and manual MRD positivity (>0.1% of WBCs) in 35 FU measurements from the LAIP29 cohort. g Concordance between cMRD and MRD positivity (>0.1% of WBCs) in the semi- and fully automated settings in LAIP29 FU measurements (n = 35). Optimal thresholds (dotted lines) are defined by ROC-AUC statistics. MRD measurable residual disease, cMRD computational measurable residual disease, GMM Gaussian mixture model, LOF local outlier factor, nuSVM one-class support vector machine, LAIP leukemia-associated immunophenotype, Dx diagnosis, FU follow-up, LAIP29 test cohort, ROC receiver operating characteristic, AUC area under the curve, WBC white blood cell.

The prognostic relevance of MRD after initial chemotherapy is established and clinically validated using the CD45-expressing cells as a denominator2. To translate the predicted number of leukemic blasts in our pipeline to a similar, clinically useful MRD status, we hypothesized that the WBC count could be inferred from the mixture components of the GMM classifier, which contains both CD45-positive and CD45-negative components, in parallel with the first blast classification (Fig. 2g). We obtained the cell counts associated with each component of the non-blast GMM in the BLAST110 cohort and used these as input for a linear regression model to predict the manual WBC count. This model could accurately determine the WBC count in the BLAST110 and LAIP29 cohorts with a root mean square error of around 21,000 (Supplementary Fig. 5). Like predicted blast counts, WBC counts were highly correlated with manual analysis in the BLAST110 training cohort (Spearman’s rho = 0.975) and the LAIP29 test cohort (Spearman’s rho = 0.973). Consequently, we used the predicted WBC count in the fully automated setting to calculate the cMRD%.
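Putting the two predicted quantities together, the cMRD% and MRD status follow directly. The counts below are invented for illustration; 0.1% is the positivity threshold cited in the Introduction:

```python
def cmrd_percent(leukemic_blast_count, wbc_count):
    """cMRD% = predicted leukemic blast count as a percentage of the WBC count."""
    return 100.0 * leukemic_blast_count / wbc_count

# Hypothetical follow-up sample: 500 predicted leukemic blasts, 600,000 WBCs
cmrd = cmrd_percent(500, 600_000)   # ~0.083%
mrd_positive = cmrd >= 0.1          # threshold for MRD positivity
```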

Combining the predicted leukemic blast count with the WBC count yielded a cMRD% for each of the four methods in the semi-automated and fully automated settings. The GMM-based cMRD% in the LAIP29 cohort showed high concordance with manual gating in both the semi-automated (Spearman’s rho = 0.766) and fully automated (Spearman’s rho = 0.823) settings (Fig. 4c). Performance was superior to the UMAP-HDBSCAN approach, which did not recover any leukemic blasts in 34% (semi-automated) or 44% (fully automated) of samples, as leukemic populations failed to form distinct clusters in the UMAP embedding (Fig. 4d). In general, all novelty detection methods (GMM, LOF, nuSVM) outperformed the clustering-based approach of UMAP-HDBSCAN (Fig. 4e).

Next, as a proof-of-principle, we evaluated whether the cMRD% calculated by the GMM-based pipeline could distinguish between MRD+ and MRD- samples in the AML follow-up measurements of the LAIP29 cohort (n = 35, mean 937,164 total cells per sample). The pipeline distinguished between MRD-positive and MRD-negative samples (ROC-AUC > 0.97) in the semi- and fully automated settings (Fig. 4f) in under three seconds (Supplementary Table 5). Visualization of the optimal cMRD% thresholds defined by ROC-AUC statistics revealed that 34 out of 35 (97%) measurements were correctly classified. One measurement (tube P1 of pt11) was misclassified as a false negative using these cut-offs. Interestingly, this patient became MRD-negative shortly after and did not die of AML-related causes.

Predictions made by the pipeline can be interpreted in conventional bivariate plots. We illustrate this with one of the follow-up samples from the LAIP29 cohort (pt13). Manual gating of this sample identified two LAIPs (CD13+/CD15+, CD13+/HLA-DR-) expressed on CD34+ blasts (Fig. 5a). Interestingly, both LAIPs were co-expressed on the majority of cells, but a minority expressed either LAIP independently. In Fig. 5b, we show how the log-likelihood percentile highlights the bulk of anomalous cells, corresponding to the aberrant region identified by gating in Fig. 5a. Using the optimal log-likelihood percentile threshold identified in Fig. 4b, the pipeline correctly identified 5870 out of 6129 leukemic cells (Fig. 5c).

Fig. 5: Comparison of manual and computational MRD assessment of an AML follow-up sample (pt13 in LAIP29).
figure 5

a Manual gating identified two LAIPs (CD13+/CD15+, CD13+/HLA-DR-), both expressed on CD34+ blasts. The LAIPs occur both co-expressed and independently. b Computational analysis quantifies the aberrancy of individual cells. Cells with low log-likelihood values are less likely to be observed in regenerating bone marrow. c Final output of the pipeline with predicted normal (blue) and leukemic (red) cells according to the optimal log-likelihood percentile threshold. MRD, measurable residual disease; LAIP, leukemia-associated immunophenotype; LAIP29, test cohort.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Discussion

This study assessed the feasibility of using mixture models for automated AML-MRD assessment. We found that classification using two GMMs is a suitable approach for automated blast detection, which is highly accurate (Median F-score = 0.86; Fig. 2d) and outperforms more complex supervised learning algorithms. Moreover, this approach is highly flexible, as model components could be repurposed for additional tasks such as inference of the WBC count. To identify leukemic blasts in the heterogeneous context of AML, we used GMMs for novelty detection rather than previous supervised or “cluster-with-normal” approaches. This approach outperformed the UMAP-HDBSCAN pipeline and correctly classified MRD status in 34 out of 35 AML follow-up measurements (Fig. 4g) in under three seconds. Beyond this set, further retrospective and prospective studies in independent cohorts are required to validate the model and its hyperparameters. Importantly, all model decisions throughout the pipeline remain interpretable in conventional 2D visualizations, demonstrating a promising route to powerful automated AML-MRD assessment.

A key advantage of our approach lies in its minimal dependence on training data. With just 18 regenerating bone marrow samples, we could effectively discriminate between MRD-positive and MRD-negative samples. This facilitates scalability and transfer of MRD assessment across clinical centers and mitigates issues related to cross-center batch effects. Additionally, as the field moves towards higher dimensional cytometry platforms34, our approach offers a smoother transition than supervised models, which become invalid upon adopting new platforms. While our use of clinical data for control sample selection may not always be feasible, alternative sources, such as samples from ALL11,12, stage 1 lymphoma13, and immune/idiopathic thrombocytopenic purpura patients20, have been successfully employed in previous studies. Although normal bone marrow may be the most accessible type of control, induction chemotherapy may induce aberrancies35, which should ideally be represented in reference models. Nevertheless, the optimal size and suitability of different control samples have not been explored extensively and should be evaluated in future research.

Despite the model reaching a high level of concordance with manual MRD assessment, performance remained suboptimal at the cell level. In particular, we found a high false positive rate, as many blasts gated as non-leukemic in AML samples were classified as novelties. These anomalous events may be attributed to different sources, such as poor sample quality, varying antibody titrations or instrument settings, and nonspecific antibody staining, causing cells to occupy otherwise empty regions of feature space. Preemptive sample-level quality control could potentially identify such cases beforehand by evaluating how well a sample aligns with the reference data. Alternatively, discordant cases could contain leukemic populations missed by manual gating, but validating this requires an alternative ground truth at the single-cell level.

Another possibility for refining a GMM-based approach is improved modeling. For example, control data are currently modeled in a “pooled” setting that does not take sample-level information into account. Inclusion of such information has already proven useful in unsupervised36 and supervised settings37 and could prove particularly beneficial in heterogeneous AML cohorts. Alternatively, other mixture model architectures can improve the modeling of non-Gaussian clusters38. By better approximating the distribution of cells in feature space, such models could aid in delineating healthy progenitors, improving the detection of anomalous leukemic blasts as novelties. This aspect is also relevant for novel AML-MRD panels39, as estimating density patterns in high-dimensional cytometry data may be challenged by the curse of dimensionality40.

Several studies have also evaluated the possibility of using unsupervised clustering algorithms such as FlowSOM31 to objectify and augment manual gating in a computer-assisted setting15,17,18,19. Although our pipeline output could serve as the input for further refinement in gating software, we recently demonstrated limited inter-operator variability for our assay41. Nevertheless, our pipeline could potentially identify leukemic populations missed by manual gating, thereby augmenting the gating process. This is of particular interest for centers with less experience in AML-MRD gating, where this data-driven approach could aid harmonization efforts. However, human examination of model output will remain crucial for models used in clinical settings. In our opinion, the interpretability of GMMs in bivariate visualizations can aid in this effort by allowing for a detailed inspection of the models “under the hood”.

Finally, by making the standardized and labeled datasets used in this study publicly available, we hope to stimulate other researchers to advance flow cytometry bioinformatics for AML. Consequently, future studies can build upon these findings, improving the modeling of rare and heterogeneous cell populations in high-dimensional data to advance the management of AML.