Introduction

Allogeneic hematopoietic stem cell transplantation (HSCT) can be a curative treatment for hematological malignancies1. Although transplantation outcomes have improved in recent years2, post-transplant relapse remains common in patients with acute lymphoblastic leukemia (ALL)3. To reduce post-transplant relapse, intensified myeloablative conditioning (MAC) regimens have long been investigated4.

Previous studies based on conventional statistical methods suggested that intensified MAC regimens can reduce relapse and thereby improve prognosis in certain patient subgroups, whereas in other subgroups they can increase non-relapse mortality (NRM) after transplantation5,6,7,8. Conventional statistical methods rely on numerous simplifying assumptions and have difficulty accounting for complex interactions among multiple variables. Because the potential benefit or harm of intensified MAC with respect to relapse or NRM depends on various factors, such as age, disease, disease status at transplantation, graft source, and human leukocyte antigen (HLA) match, the effectiveness of intensified MAC is highly heterogeneous among patients. Owing to this heterogeneity, deciding whether to use intensified MAC in an individual patient using a conventional statistical approach is challenging. Thus, in clinical practice, intensified MAC is often given to patients considered to be at high risk of relapse (‘high-risk approach’). However, this approach may apply intensified MAC to patients who are unlikely to benefit from it while overlooking patients who might benefit even though their relapse risk is relatively low. Consequently, the net efficacy of the high-risk approach has not been established.

To overcome this limitation, machine learning-based approaches, which allow investigation of how associations vary according to multidimensional individual characteristics, have progressed rapidly9. The Bayesian causal forest (BCF) model is one such method; it uses a modified sum-of-trees structure optimized to detect heterogeneity in the association between an intervention or exposure and its effect at the individual level, i.e., the individualized treatment effect (ITE)10. In theory, applying intensified MAC regimens to patients with high estimated ITEs (‘high-benefit approach’) could maximize the effectiveness of the treatment and improve population outcomes11,12.

Therefore, using a Japanese nationwide registry, we examined heterogeneity in the association between intensified MAC regimens and survival after HSCT among adult patients with ALL. We then identified patient subgroups most likely to benefit from intensified MAC regimens. Lastly, we compared the performance of targeting patients with high estimated ITEs (high-benefit approach) with that of traditional approaches, such as targeting patients at high risk (high-risk approach). Our findings indicate that the high-benefit approach, applying intensified MAC to patients with high ITEs, achieves a significant reduction in 1-year mortality. This approach has the potential to optimize transplantation outcomes for patients with ALL and provides valuable information for treatment decision-making.

Patients and methods

Patients

Data on adult patients (age ≥ 16 years) with ALL who underwent their first allogeneic HSCT with MAC between 2000 and 2021 were obtained through the Transplant Registry Unified Management Program (TRUMP) sponsored by the Japanese Society for Transplantation and Cellular Therapy (JSTCT)13,14. Patients whose survival status up to 1 year was unavailable were excluded. Patients lacking HLA matching data and those who underwent HLA-haploidentical transplantation using post-transplant cyclophosphamide were also excluded. Written informed consent for inclusion of clinical data in the registry was obtained from all patients. The study was planned by the Adult ALL Working Group of the JSTCT, approved by the data management committees of TRUMP and by the Institutional Review Board of Kyoto University Hospital, and was conducted in accordance with the Declaration of Helsinki.

Exposure ascertainment

Among MAC regimens defined according to the operational definitions of the National Marrow Donor Program/CIBMTR15, we compared intensified MAC regimens with non-intensified (standard) MAC regimens16,17. Intensified MAC included VP16/CY/TBI (VP16 [etoposide], total 30–40 mg/kg; cyclophosphamide, total 120 mg/kg; total body irradiation [TBI], 10–12 Gy)4,8 and high-dose Ara-C/CY/TBI (Ara-C [cytarabine], total 8–12 g/m²; cyclophosphamide, total 120 mg/kg; TBI, 10–12 Gy)7,18.

Outcome ascertainment

The outcome of this study was 1-year overall mortality after HSCT (a binary outcome); death from any cause was counted as an event.

Other covariates

Variables collected included patient age (years), patient sex (male, female), Eastern Cooperative Oncology Group performance status (ECOG PS; 0, 1, 2, 3, or 4)19, cytomegalovirus (CMV) antibody status (positive, negative), active bacterial or fungal infection at HSCT (yes, no), disease phenotype (B-cell, T-cell), Philadelphia (Ph) chromosome (yes, no), refined Disease Risk Index (rDRI) (intermediate, high, or very high)20, time from diagnosis to HSCT (days), donor sex (male, female), sex mismatch between donor and patient (matched, male to female, or female to male), ABO blood type mismatch (match, any mismatch), HLA mismatch (match, mismatch), graft source (related bone marrow [BM], related peripheral blood stem cells [PBSC], unrelated BM, unrelated PBSC, or cord blood [CB]), graft-versus-host disease (GVHD) prophylaxis (cyclosporine A [CyA]-based, tacrolimus [Tac]-based), and transplantation year (2000–2010, 2011–2021) (Table 1). HLA matching was assessed using allele-level data for the HLA-A, -B, -C, and -DRB1 loci for BM and PBSC grafts, and serological data for the HLA-A, -B, and -DR loci for CB grafts21,22. HLA mismatch was defined in the graft-versus-host direction when recipient alleles or antigens were not shared by the donor, and in the host-versus-graft direction when donor alleles were not shared by the recipient. Missing covariates were imputed using a random forest algorithm23.

Table 1 Baseline characteristics according to the treatment assignment after matching

Propensity score matching

We performed 1:1 propensity score matching without replacement to match patients who underwent intensified MAC regimens with those who underwent standard-intensity MAC regimens at baseline, in order to adjust for potential confounding factors (Table 1). A logistic regression model was used to estimate the propensity score for receiving intensified MAC, and matching used a caliper of 0.1 standard deviations of the logit of the propensity score. An absolute standardized mean difference < 0.1 was considered to indicate adequate balance24.
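The matching and balance check can be illustrated with a minimal Python sketch. It assumes the propensity scores have already been estimated (for example, by the logistic regression described above), uses greedy nearest-neighbour matching in the order the treated patients are listed, and takes the caliper as 0.1 SD of the logit propensity score pooled over both groups. The source does not state the matching algorithm, the software, or which SD defines the caliper, so these choices and the helper names `caliper_match` and `smd` are hypothetical.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(ps_treated, ps_control, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement on the
    logit of the propensity score (hypothetical helper).

    Returns (treated_index, control_index) pairs. The caliper width is
    caliper_sd times the SD of the logit PS pooled over both groups
    (an assumption; the study does not specify which SD was used).
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    width = caliper_sd * statistics.stdev(lt + lc)
    available = set(range(len(lc)))
    pairs = []
    for i, x in enumerate(lt):
        # closest control that has not been used yet
        best = min(available, key=lambda j: abs(lc[j] - x), default=None)
        if best is not None and abs(lc[best] - x) <= width:
            pairs.append((i, best))
            available.remove(best)  # matching without replacement
    return pairs


def smd(x_treated, x_control):
    """Absolute standardized mean difference for a continuous or 0/1 covariate."""
    m1, m0 = statistics.mean(x_treated), statistics.mean(x_control)
    v1, v0 = statistics.variance(x_treated), statistics.variance(x_control)
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2.0)
```

For example, `caliper_match([0.30, 0.55, 0.80], [0.29, 0.52, 0.10, 0.90])` pairs the first two treated patients and leaves the third unmatched because no remaining control lies within the caliper; `smd` would then be computed for each covariate in the matched sample and compared with the 0.1 threshold.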

Statistical analysis

Categorical and continuous variables were compared between groups using Fisher’s exact test and the two-tailed, unpaired Student’s t-test, respectively. First, after describing baseline characteristics before and after propensity score matching, we applied a machine learning BCF algorithm (bcf package in R)10 to the matched sample to build a model assessing the reduction in 1-year overall mortality with intensified MAC regimens (compared with standard regimens) at the individual level. Briefly, BCF employs the following framework:

\[ Y_i = \mu(Z_i) + \tau(Z_i)\,X_i + \varepsilon_i \qquad (1) \]
where \(X_i\) is the treatment indicator (1 for intensified MAC, 0 for standard MAC), \(\mu(Z_i)\) denotes the expected outcome among untreated patients as a function of covariates \(Z\) (the prognostic function), \(\tau(Z_i)\) denotes the ITE of \(X\) on \(Y\) as a function of covariates \(Z\), and \(\varepsilon_i\) denotes an error term. Both \(\mu(Z_i)\) and \(\tau(Z_i)\) are sums of regression trees of the form

\[ f(z) = \sum_{n=1}^{N} f_n(z) \qquad (2) \]
where \(f_n(z)\) denotes the \(n\)-th regression tree and \(N\) denotes the number of trees. To avoid overfitting, BCF builds on Bayesian Additive Regression Trees (BART), whose priors keep each tree small so that every tree contributes only a modest part of the fit25. In our analysis, we built the BCF model with 200 regression trees for the prognostic function and 50 for the treatment-effect function, each including all covariates listed above. The model was fitted with 2500 burn-in iterations followed by 2500 Markov chain Monte Carlo (MCMC) iterations. To reduce bias (so-called regularization-induced confounding), the estimated propensity score was included as an additional covariate in the model. Convergence of the MCMC algorithm was assessed by calculating the potential scale reduction factor (PSRF) of Gelman and Rubin’s convergence diagnostic26. We then assessed model fit and calibration by dividing the sample into quartiles of predicted ITE and estimating the observed association separately within each quartile using multivariable linear regression (i.e., subgroup analysis)9.
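The PSRF compares the variance between parallel chains with the variance within them. The Python sketch below implements the basic Gelman–Rubin formula for a single scalar quantity (for example, one patient's \(\tau\)), assuming m equal-length chains of post-burn-in draws; it omits the degrees-of-freedom correction that some implementations add, and `psrf` is a hypothetical helper rather than the routine used in the study. Values close to 1 (a common rule of thumb is below about 1.1) suggest the chains have mixed.

```python
import math
import statistics


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor (hypothetical helper).

    chains: list of m >= 2 equal-length lists of post-burn-in MCMC draws
    for one scalar parameter.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    grand_mean = statistics.mean(chain_means)
    # B: between-chain variance (scaled by chain length n)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # W: average of the within-chain sample variances
    w = statistics.mean([statistics.variance(c) for c in chains])
    # pooled estimate of the marginal posterior variance
    v_hat = (n - 1) / n * w + b / n
    return math.sqrt(v_hat / w)
```

Two chains stuck in different regions, such as `[[1, 2, 3], [4, 5, 6]]`, give a PSRF of about 2.27, far above 1, signalling non-convergence.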

Second, based on the BCF model, we predicted the ITE (i.e., the reduction in 1-year mortality by intensified MAC compared to standard MAC) for each patient.
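Each patient's ITE can be summarized from the posterior draws of \(\tau(Z_i)\). The Python sketch below assumes the draws are stored as a list of MCMC iterations, each holding one value per patient, and that \(Y\) codes death within 1 year; under that coding a negative \(\tau\) means lower mortality, so the reduction is \(-\tau\). The helper names are hypothetical, and the 95% credible interval is one common way to summarize the draws, not necessarily the one used in the study.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_ite(tau_draws, as_reduction=True):
    """Posterior mean and 95% credible interval of each patient's ITE.

    tau_draws[s][i] is MCMC draw s of tau for patient i (hypothetical layout).
    With as_reduction=True the draws are negated, so positive values mean a
    reduction in 1-year mortality (assumes Y = 1 for death).
    """
    sign = -1.0 if as_reduction else 1.0
    summaries = []
    for i in range(len(tau_draws[0])):
        vals = sorted(sign * draw[i] for draw in tau_draws)
        summaries.append(
            (statistics.mean(vals), percentile(vals, 0.025), percentile(vals, 0.975))
        )
    return summaries
```

The posterior mean (first element of each tuple) is the point estimate that would be carried forward as the patient's predicted ITE.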

Third, to explore sources of heterogeneity, we compared characteristics of the high-benefit group (defined as patients with ITEs greater than the median) and those of the low-benefit group (defined as those with ITEs equal to or lower than the median).
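The grouping rule can be stated directly as code. This Python sketch uses a hypothetical helper, `split_by_median`, that takes per-patient ITEs (expressed as reductions in mortality) and applies the definitions above. With an even number of patients and no tied ITEs, the two groups have equal size, consistent with the 1220/1220 split reported in the Results.

```python
import statistics


def split_by_median(ites):
    """Split patient indices into high-benefit (ITE > median) and
    low-benefit (ITE <= median) groups (hypothetical helper)."""
    cut = statistics.median(ites)
    high = [i for i, v in enumerate(ites) if v > cut]
    low = [i for i, v in enumerate(ites) if v <= cut]
    return cut, high, low


cut, high, low = split_by_median([0.01, 0.05, 0.03, 0.02])
print(cut, high, low)  # median 0.025; high-benefit [1, 2]; low-benefit [0, 3]
```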

Fourth, we compared the magnitude of the association between intensified MAC regimens and 1-year overall mortality across the following four approaches: i) targeting all patients (population approach), ii) targeting younger patients aged < 34 years (the median age; modified population approach), iii) targeting patients with high or very high rDRI (high-risk approach), and iv) targeting patients in the high-benefit group (high-benefit approach). Differences in estimates and their 95% confidence intervals (CIs) were obtained by repeating the analysis on 1000 bootstrap samples.
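A percentile bootstrap for one targeting approach can be sketched as follows. This is a simplified illustration, not the study's procedure: it assumes the estimate is the crude difference in 1-year mortality (standard minus intensified, so positive values mean a reduction) among the targeted patients, resamples those patients with replacement, and does not refit the BCF model or redefine the high-benefit group within each replicate. The records are hypothetical `(treated, died)` pairs coded 0/1, and the helper names are illustrative.

```python
import random


def risk_reduction(records):
    """1-year mortality in the standard arm minus that in the intensified arm."""
    std = [died for treated, died in records if treated == 0]
    intens = [died for treated, died in records if treated == 1]
    return sum(std) / len(std) - sum(intens) / len(intens)


def bootstrap_ci(records, n_boot=1000, alpha=0.05, seed=1):
    """Point estimate and percentile bootstrap CI for risk_reduction."""
    rng = random.Random(seed)
    estimate = risk_reduction(records)
    reps = []
    while len(reps) < n_boot:
        sample = rng.choices(records, k=len(records))  # resample with replacement
        if {t for t, _ in sample} == {0, 1}:  # skip replicates missing an arm
            reps.append(risk_reduction(sample))
    reps.sort()
    lower = reps[round(alpha / 2 * n_boot)]
    upper = reps[round((1 - alpha / 2) * n_boot) - 1]
    return estimate, lower, upper


# Hypothetical data: 20/100 deaths with intensified MAC, 30/100 with standard MAC
records = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 30 + [(0, 0)] * 70
print(bootstrap_ci(records))
```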

Lastly, to assess whether the developed model could identify patient populations at high or low risk of adverse post-transplant outcomes, we compared overall survival (OS), disease-free survival (DFS), relapse, and NRM after HSCT among the subgroups defined above, i.e., younger/older, high/low-risk, and high/low-benefit groups. Specifically, the probabilities of OS and DFS were estimated with the Kaplan–Meier method and compared among subgroups with the Cox proportional-hazards model. The probabilities of relapse and NRM were estimated using cumulative incidence methods and compared among subgroups with the Fine–Gray proportional-hazards model27, treating death without relapse as a competing event for relapse and relapse as a competing event for NRM.
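To make the two estimators concrete, the Python sketch below computes the Kaplan–Meier probability of remaining free of any event and the nonparametric (Aalen–Johansen) cumulative incidence of one cause in the presence of a competing event. It assumes event codes 0 = censored, 1 = event of interest, 2 = competing event, and treats censorings tied with events as still at risk at that time. It does not fit the Cox or Fine–Gray regression models used for the comparisons; `km_and_cif` is a hypothetical helper. With relapse coded 1 and death without relapse coded 2, \(S(t)\) corresponds to DFS and the two cumulative incidences (relapse and NRM) sum to \(1 - S(t)\).

```python
def km_and_cif(times, events, cause=1):
    """Kaplan-Meier survival (any event) and cumulative incidence of `cause`.

    times: follow-up times; events: 0 = censored, 1, 2, ... = event type.
    Returns a list of (time, S(t), CIF_cause(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, cif = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_any = d_cause = 0
        # count all records tied at time t
        while j < len(data) and data[j][0] == t:
            if data[j][1] != 0:
                d_any += 1
            if data[j][1] == cause:
                d_cause += 1
            j += 1
        if d_any > 0:
            # Aalen-Johansen increment uses survival just before t
            cif += surv * d_cause / n_at_risk
            # Kaplan-Meier product-limit step
            surv *= 1.0 - d_any / n_at_risk
            out.append((t, surv, cif))
        # everyone observed at t (events and censorings) leaves the risk set
        n_at_risk -= j - i
        i = j
    return out


print(km_and_cif([1, 2, 3, 4], [1, 2, 0, 1]))
```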

All statistical analyses were conducted using R version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria) and Stata version 17 (StataCorp, College Station, TX).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Patient characteristics

Before propensity score matching, the cohort comprised 4652 patients (intensified, 1240; standard, 3412), with a median age at transplantation of 37 years (range, 16–72); 44.2% were female. The median follow-up period was 32.2 months (range, 0.0–252.8). Compared with patients transplanted with standard-intensity MAC regimens, those who received intensified MAC regimens were more likely to be younger and to have better ECOG PS and higher (worse) rDRI (Supplemental Table 1). The proportion of patients transplanted from 2011 onward was also larger in the intensified group than in the standard group. After propensity score matching, the two groups (intensified, 1220; standard, 1220) were well balanced for all baseline covariates (Table 1 and Supplemental Fig. 1).

Bayesian causal forest model to predict ITE of intensified conditioning

In the matched cohort, 683 patients (intensified, 329/1220 [27.0%]; standard, 354/1220 [29.0%]) died within 1 year after transplantation. When the two matched groups were compared as a whole, 2-year OS (64.5% vs. 61.6%; hazard ratio [HR], 0.912; 95% CI, 0.800–1.040; p = 0.168), DFS (58.1% vs. 54.3%; HR, 0.887; 95% CI, 0.784–1.002; p = 0.053), and NRM (18.1% vs. 18.3%; HR, 0.994; 95% CI, 0.822–1.201; p = 0.949) were similar, whereas relapse was reduced in the intensified group (23.8% vs. 27.4%; HR, 0.841; 95% CI, 0.716–0.986; p = 0.033) (Supplemental Fig. 2).
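As a quick arithmetic check of the 1-year mortality figures above, the crude proportions and their difference can be computed directly. The difference (standard minus intensified) equals the point estimate reported for the population approach in Table 3 (+2.05 percentage points); this crude calculation gives no confidence interval.

```python
deaths_int, n_int = 329, 1220  # intensified MAC, matched cohort
deaths_std, n_std = 354, 1220  # standard MAC, matched cohort

risk_int = deaths_int / n_int
risk_std = deaths_std / n_std
diff_pp = 100 * (risk_std - risk_int)  # reduction in percentage points

print(f"{100 * risk_int:.1f}% vs {100 * risk_std:.1f}%; "
      f"difference {diff_pp:.2f} percentage points")
# 27.0% vs 29.0%; difference 2.05 percentage points
```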

Our BCF model was well calibrated and indicated heterogeneity in the association between intensified MAC regimens and 1-year overall mortality (Supplemental Fig. 3): subgroups with larger predicted ITEs showed larger observed associations. The PSRF was 1.08, indicating convergence of the MCMC algorithm.

Next, we calculated ITEs for individual patients and compared patient backgrounds between the high-benefit group (n = 1220) and the low-benefit group (n = 1220). Compared with the low-benefit group, patients in the high-benefit group were more likely to be younger, male, and CMV antibody-negative, to have T-cell phenotype, and to have higher rDRI (Table 2). Moreover, the proportion of patients transplanted with related BM or related PBSCs was significantly higher in the high-benefit group, whereas unrelated BM and unrelated PBSCs were more common graft sources in the low-benefit group. In the high-benefit group, donors were more likely to be female, and female-to-male transplantation was more common. CyA-based GVHD prophylaxis was also used more frequently in the high-benefit group.

Table 2 Baseline characteristics according to the estimated ITE

Advantage of the high-benefit approach versus the conventional approach

Our BCF model allows heterogeneity to be evaluated across high-dimensional characteristics, including continuous variables. Indeed, for age and rDRI, two clinically important risk factors, ITEs tended to be lower in older patients and higher in the high or very high rDRI group (Fig. 1a, b), although we found no evidence of heterogeneity in subgroup analyses by each characteristic (Supplemental Figs. 4–6) or by combinations of several characteristics (Supplemental Fig. 7).

Fig. 1: Distribution of individualized treatment effects (ITEs) according to age and rDRI groups.

a According to age (n = 2440). b According to rDRI group (n = 1523 for intermediate; n = 917 for high/very high). p values were calculated using Pearson’s correlation (a), or two-tailed unpaired Student’s t test (b). * indicates p < 0.05.

When post-transplant outcomes were compared between intensified and standard conditioning within groups defined by the ITE predicted by our BCF model, DFS was significantly better with intensified conditioning in the high-benefit group (HR, 0.839; 95% CI, 0.717–0.983; p = 0.030) but not in the low-benefit group (Fig. 2). In the high-benefit group, we found similar trends for OS (HR, 0.877; 95% CI, 0.741–1.037; p = 0.124) and for a reduction in relapse (HR, 0.847; 95% CI, 0.692–1.037; p = 0.107), although the 95% CIs included 1. Intensified conditioning did not increase NRM in the high-benefit group. In the low-benefit group, intensified conditioning was associated with no improvement in OS, DFS, or relapse and no increase in NRM. These results suggest that the effects of intensified regimens are determined by a combination of multiple patient background factors rather than by a single variable or a small set of variables.

Fig. 2: Comparison of outcomes according to intensity of conditioning in high- and low-benefit groups.

a Overall survival (OS). b Disease-free survival (DFS). c Cumulative incidence of non-relapse mortality (NRM). d Cumulative incidence of relapse. Percentages in the figure represent values at 24 months after transplantation. Shading represents 95% confidence intervals. Hazard ratios (HRs) and p values were calculated using the Cox proportional hazards model (a, b) and the Fine–Gray model (c, d). * indicates p < 0.05.

When we compared the high-benefit approach with conventional approaches that target all individuals eligible for MAC regimens (population approach), younger individuals (modified population approach), or individuals with high or very high rDRI (high-risk approach), the high-benefit approach showed the largest reduction in 1-year overall mortality (+5.93 percentage points [95% CI, 0.88 to 10.51]). In contrast, the reductions observed with the high-risk approach (+3.84 percentage points [95% CI, −1.11 to 7.90]), the modified population approach (+2.87 percentage points [95% CI, −3.16 to 9.03]), and the population approach (+2.05 percentage points [95% CI, −1.14 to 5.26]) were not significant (Table 3). These results suggest that using the high-benefit approach to guide the use of intensified MAC regimens reduces 1-year overall mortality after transplantation in adult ALL patients more efficiently than conventional approaches.

Table 3 High-benefit approach vs. conventional approaches for the reduction in 1-year mortality by intensified MAC

Discussion

In this study, by applying a machine learning BCF model to data from a Japanese nationwide transplant registry, we found heterogeneity in the effect of intensified conditioning regimens on post-transplant prognosis in adult ALL patients. Furthermore, we identified a subgroup that benefited significantly from intensified conditioning regimens (high-benefit group) and showed that targeting this subgroup for these regimens (the high-benefit approach) can improve post-transplant outcomes in patients with ALL. Our findings highlight the usefulness of the high-benefit approach in allogeneic hematopoietic stem cell transplantation.

We found that the association of intensified conditioning regimens with 1-year overall mortality varies across high-dimensional characteristics of adult ALL patients. While previous reports using conventional statistical methods suggested that the treatment effects of intensified regimens vary across subgroups defined by a limited number of variables5,6,8, it has been difficult to assess treatment effects comprehensively on a case-by-case basis while considering various factors simultaneously. The heterogeneous ITEs observed in this study highlight the importance of comprehensively evaluating clinical factors related to patient and disease characteristics, as well as transplantation procedures, when determining which patients should receive intensified MAC.

The high-benefit group tended to be younger than the low-benefit group. While older age can be associated with higher NRM after HSCT28, previous studies of intensified MAC did not demonstrate the usefulness of selecting intensified conditioning based on patient age5,6,8. Likewise, in our subgroup analysis by age, no significant difference in OS was observed, although relapse was reduced in the intensified group among younger patients. Moreover, NRM did not differ between the intensified and standard regimens in either the younger or the older subgroup (Supplemental Fig. 5A–D). Therefore, although age is an important factor in the ITE of intensified conditioning, the use of intensified conditioning should be determined by a combination of factors rather than by age alone. In addition, because the average age of patients undergoing HSCT is increasing29,30, future analyses focusing on older patients will be necessary.

Regarding rDRI, our previous study reported that patients transplanted beyond second complete remission (CR2) benefit from intensified MAC5. However, another study suggested that Ph chromosome-positive patients transplanted in first CR benefit from intensified myeloablative regimens, whereas those transplanted beyond CR2 do not8. These conflicting results regarding the effects of disease status may stem not only from differences in patient characteristics and statistical methodology but also from recent advances in targeted therapies, including novel tyrosine kinase inhibitors, bispecific T-cell engagers, and antibody–drug conjugates31,32,33. These advances may not only reduce pre-transplant complications from cytotoxic chemotherapy but also expand treatment options for post-transplant relapse.

In addition to age and rDRI, patients with high ITEs, i.e., those expected to benefit from intensified conditioning regimens, exhibited several other characteristics. In particular, the proportion of patients transplanted with related BM or related PBSCs was significantly higher in the high-benefit group than in the low-benefit group, whereas unrelated BM and unrelated PBSCs were more common graft sources in the low-benefit group. These results suggest that the benefit of intensified MAC may differ depending on graft source. Several previous studies reported that the impact of intensified conditioning varies by graft source6,7,18,34. Favorable effects of intensified conditioning have been reported in umbilical CB transplantation for ALL and acute myeloid leukemia (AML), but its effects in related or unrelated BM or PBSC transplantation have not been established. However, because the subgroup analysis in this study did not show superiority of intensified MAC in the related BM or PBSC subgroup, the potential benefit of intensified MAC according to graft source should be evaluated further. In Japan, the number of transplants using unrelated PBSCs has gradually increased since 2010, when the Japan Marrow Donor Program began facilitating unrelated PBSC donation35; nevertheless, the number of unrelated PBSC cases in the current analysis remained small. Further analysis is needed to evaluate the impact of selecting unrelated PBSCs on the benefit of intensified MAC. Differences in graft source were also reflected in donor sex and type of GVHD prophylaxis. Previous studies on intensified MAC for ALL patients reported benefits of intensified MAC in the Ph chromosome-positive subgroup5,8 or in patients who underwent CB transplantation18. In our study, the Ph-positive subgroup tended to derive more benefit from intensified MAC than the Ph-negative subgroup, although this difference was not statistically significant.
While the Ph chromosome has traditionally been considered a poor prognostic factor, recent advances in tyrosine kinase inhibitors have led to notable progress in pre- and post-transplant treatment, which may have affected outcomes3. As for CB transplantation, our study found no significant improvement in OS with intensified MAC in the CB subgroup. The present analysis included patients who were not in remission at transplantation, whereas previous reports were limited to patients in CR at transplantation, suggesting that differences in patient backgrounds may have influenced the results. As with age and rDRI, although these individual factors were associated with the effects of intensified MAC, neither a single variable nor a combination of a few factors could identify patients who benefit from intensified conditioning (Supplemental Figs. 4 and 7).

Notably, the high-benefit approach identified a subgroup of patients who showed improved DFS without increased NRM (Fig. 2). Indeed, the high-benefit approach was associated with a reduction in 1-year overall mortality of approximately 6 percentage points, whereas conventional approaches that select intensified conditioning regimens based on age or rDRI alone did not achieve significant survival improvement in ALL patients eligible for transplantation with MAC (Table 3). These results suggest an advantage of a machine learning-based high-benefit approach for maximizing the effectiveness of conditioning regimens and improving post-transplant outcomes in ALL patients.

This study has several limitations. First, we cannot rule out the possibility that unmeasured patient characteristics modified the treatment effect, because the BCF model can only detect treatment-effect heterogeneity across measured variables. Second, we focused on the reduction in 1-year overall mortality, as the effects of conditioning intensity are likely to be more apparent within this timeframe. Additionally, setting the endpoint at 1 year helped minimize the exclusion of cases with shorter observation periods. However, further analysis of longer-term prognosis is still necessary. Third, several variables, such as the Hematopoietic Cell Transplantation Comorbidity Index and pre-transplant measurable residual disease, could not be included because of frequent missing values. Because these variables could influence both the exposure and the outcome, confounding bias cannot be ruled out. Some of the included factors also contained missing values; although we imputed them with a random forest imputation approach, we cannot rule out bias due to missingness. Fourth, although the subgroup analysis in this study did not show a difference in the benefit of intensified MAC across transplantation periods, the high-benefit group had a higher proportion of transplants performed in the earlier period. The potential impact of ongoing trends, such as the increase in older patients and advances in transplant procedures, will require further evaluation. Fifth, because ethnicity may affect the treatment effects of conditioning intensity36, our conclusions based on a Japanese cohort should be validated in other ethnic groups.

In conclusion, using data from a Japanese nationwide registry, we found heterogeneity in the association between intensified MAC regimens and post-transplant outcomes across multidimensional characteristics of ALL patients. Moreover, in selecting the intensity of the conditioning regimen, our high-benefit approach outperformed conventional approaches, with a significantly reduced 1-year overall mortality. Our findings suggest that the machine learning-based high-benefit approach can identify patients who are most likely to benefit from intensified MAC regimens, thereby supporting decision-making in clinical practice and improving post-transplant outcomes.