Introduction

Head and neck cancer (HNC) ranks as the seventh most common cancer globally and its incidence is on the rise [1]. Treatment of HNC is multimodal involving surgery, radiotherapy, systemic therapy (i.e. chemotherapy or immunotherapy), or a combination thereof. Unfortunately, these treatment regimens are frequently associated with complications and subsequently decreased treatment tolerability [2]. Individuals with HNC are at an elevated risk of developing sarcopenia before start of treatment and during treatment due to increased susceptibility to malnutrition, associated comorbidities and treatment-related toxicities exacerbating these factors [3,4,5,6].

Sarcopenia is defined as the loss of skeletal muscle mass (SMM) and function and is associated with physical and functional impairment [7, 8]. Several methods exist to estimate whole-body SMM. One such method is to measure SMM using CT or MRI on a single cross-sectional slice. These measurements have been shown to be a strong predictor of whole-body SMM [9]. Radiologically defined sarcopenia is a deficit of SMM, often presented as the skeletal muscle index (SMI) after adjustment for the individual patient’s height [10]. It is an easily measured biomarker and is often retrospectively available. For this reason, most research omits the ‘function’ part of the definition and focuses on low SMM, which in itself has proven to be a significant predictive and prognostic factor. Henceforth, when referring to sarcopenia, we specifically denote radiologically defined sarcopenia for the sake of coherence in our study.

While recent systematic reviews and meta-analyses showed that sarcopenia is a negative prognostic factor for overall survival (OS), disease free survival (DFS), short-term treatment-related toxicity and postoperative complications, these studies often exhibit limitations in scope due to their focus on specific subdomains of HNC, measurement methods, or outcome factors. To provide a comprehensive overview of the available research, our review adopts a multi-level approach to meta-analysis, facilitating the extraction of multiple effect sizes from the same study cohort. This methodological approach allows us to examine the impact of sarcopenia on all outcome factors researched to date in HNC patients, thereby advancing our understanding of its predictive and prognostic significance [11].

This understanding of the impact of sarcopenia is crucial as it presents several potential benefits, particularly in enabling personalized patient care. To this end, multiple randomized controlled trials that modify treatment for sarcopenic patients are ongoing, including CISLOW and PECTORALIS [12, 13]. Additionally, sarcopenia can be regarded as a disease in its own right, warranting treatment. Randomized controlled trials have demonstrated significantly improved outcomes in patients who undergo prehabilitation prior to cancer surgery [14, 15]. Understanding the effects of sarcopenia on multiple patient outcomes is a critical first step.

Methods

The systematic literature review and meta-analysis was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines with the checklist available in Supplementary Data S1 [16].

Search strategy

A search strategy was developed in collaboration with a clinical librarian to search PubMed, Embase and Cochrane and was performed in November 2021, with a supplementary search performed in June 2023. The search string included studies in patients with HNC where SMM was measured using CT and/or MRI imaging. As we aimed to include all available outcome data, no specific outcome was defined in the search strategy. The complete search strategy is available in Supplementary Data S2.

Study selection

Study selection was conducted by four authors (HvH, MvB, AS and JS). The initial screening was based on the title and abstract of each article, followed by a full-text screening. After the removal of duplicates, all articles were screened by two authors independently. Any conflicts in the study selection process were resolved through consensus discussion. Studies reporting SMM measurements taken on CT or MRI and used as a predictive or prognostic factor in HNC patients were included. When multiple articles described the same population or had significant overlap, the article with the larger sample size was retained and the smaller one excluded. Conference abstracts and review articles were excluded. Only studies that stratified patients into sarcopenic vs. non-sarcopenic groups could be included (i.e. studies that used SMM or SMI as a continuous variable were excluded). Studies were eligible for inclusion if they included a time-to-event analysis (i.e. survival analysis, including reporting of a HR) or a comparison of sarcopenic and non-sarcopenic patients (i.e. ORs or 2 × 2 tables), or when they provided enough information for such values to be calculated independently.

Critical appraisal

Risk of bias of included studies was assessed using the Quality In Prognosis Studies (QUIPS) tool [17]. Articles were scored across six domains: study participation, study attrition, prognostic factor and outcome measurement, study confounding and statistical analysis and reporting. Studies were scored on three to five subdomains resulting in a “low”, “moderate” or “high” risk assessment. Each of the six domains within the Quality In Prognosis Studies (QUIPS) tool was evaluated for risk of bias, categorized as low (0), moderate (1), high (2), or unknown (NR). Points were allocated based on the fulfillment of two to four criteria per domain. Articles were generally classified as low risk of bias if they enrolled a sufficient number of patients and provided adequate baseline characteristics encompassing various factors such as sex, age, tumor localization, and stage, as well as treatment details. Additionally, studies were required to report the imaging modality used and the chosen level of measurement. Studies disclosing loss to follow-up were deemed to have the lowest risk of bias in the study attrition domain. All articles were screened by two authors independently. Disagreements in domain scoring were resolved through consensus discussions, with input from a third author if necessary. Results were displayed in a “traffic light” plot generated using robvis, a web-based application [18].

Data synthesis and analysis

One author (HvH) extracted the data from the included articles. Extracted data included: first author’s name, year of publication, continent and country of population studied, sample size, age, sex, BMI, tumor staging, number of sarcopenic patients, level of measurement, imaging modality used, the origin of the cut-off value that was used in the study, whether the cut-off was sex-specific, treatment modality, outcome type and the associated Odds Ratio (OR) for risk-of-event outcomes or Hazard Ratio (HR) for time-to-event outcomes and the variance. When an OR was not provided in a study but other data such as a 2 × 2 table were available, the OR and associated variance were calculated.
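The calculation of an OR and its variance from a 2 × 2 table can be sketched as follows. This is a minimal illustration using the standard log OR and its large-sample variance, not the exact routine used in this review; the function name, the example counts and the 0.5 continuity correction for zero cells are assumptions made for illustration only.

```python
import math


def log_odds_ratio(a, b, c, d, continuity=0.5):
    """Return (log OR, sampling variance) from a 2 x 2 table.

    a: sarcopenic patients with the event
    b: sarcopenic patients without the event
    c: non-sarcopenic patients with the event
    d: non-sarcopenic patients without the event
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    # Assumption: add a continuity correction to every cell when any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + continuity for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


# Hypothetical counts, not taken from any included study.
log_or, variance = log_odds_ratio(30, 70, 15, 85)
print(f"log OR = {log_or:.3f}, variance = {variance:.3f}")
```

Working on the log scale keeps the sampling distribution approximately symmetric, which is why log ORs rather than ORs are pooled.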

Multilevel approach

A majority of the included studies reported on multiple outcome measures. Each individual outcome measure was extracted separately. This violates the assumption of independent effect sizes in a meta-analysis. To overcome this, and at the same time provide an overall view of the effect of sarcopenia on HNC patients, a multilevel approach was used. Multilevel approaches allow multiple effect sizes within a study to be included while controlling for their dependency [19, 20]. A three-level approach was used. Level 1 represents the sampling variance for each effect size. Level 2 represents the variance within studies and level 3 represents the variance between studies. A random effects model was applied using R.
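For orientation, a three-level random effects model of this kind is commonly written as below. The notation is an assumption made for illustration (the symbols do not appear in the original text): y_ij is the i-th effect size in study j, and v_ij is its known sampling variance.

```latex
% Assumed notation: i indexes effect sizes, j indexes studies.
\begin{align}
  y_{ij} &= \mu + u_{(3)j} + u_{(2)ij} + e_{ij}, \\
  e_{ij} &\sim \mathcal{N}(0, v_{ij}), \quad
  u_{(2)ij} \sim \mathcal{N}\bigl(0, \sigma^2_{(2)}\bigr), \quad
  u_{(3)j} \sim \mathcal{N}\bigl(0, \sigma^2_{(3)}\bigr).
\end{align}
```

Here μ is the pooled effect, e_ij carries the level 1 sampling variance, σ²₍₂₎ is the level 2 (within-study) variance and σ²₍₃₎ is the level 3 (between-study) variance. Effect sizes from the same study share the term u₍₃₎j, which is how their dependency is modelled.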

Results

Literature search results and study characteristics

Literature was searched until June 2023 and 63 articles were included for analysis. Figure 1 shows a flowchart of the search results. Table 1 shows an overview of the analyzed studies.

Fig. 1: Flowchart of search strategy.

This flow diagram illustrates the systematic review search process, detailing the number of records identified, screened, included, and excluded.

Table 1 Overview of studies.

Quality assessment

Figure 2 shows a traffic light table of all analyzed studies. Fourteen (22.2%) articles were deemed as moderate risk of bias, with three (4.8%) articles being designated as incurring a high risk of bias. Studies scored a moderate risk of bias on the first domain due to a low number of participants or due to missing information describing the study cohort. Studies generally accounted for confounding by including covariates that are known to be associated with the study outcomes.

Fig. 2: Stoplight table of QUIPS.

D1: bias due to participation, D2: bias due to attrition, D3: bias due to prognostic factor measurement, D4: bias due to outcome measurement, D5: bias due to confounding, D6: bias due to statistical analysis and reporting. Green denotes low risk of bias, yellow moderate risk of bias and red high risk of bias.

Patient population

A total of 173 effect sizes from 63 studies were extracted [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82]. The following outcome measures were encountered: overall survival (OS), disease free survival (DFS), progression free survival (PFS), disease specific survival (DSS), locoregional control (LRC), distant metastasis free survival (DMFS), recurrence free survival (RFS), surgical-treatment-related complications, non-surgical treatment-related complications, chemotherapy dose-limiting toxicity (CDLT), feeding tube dependency, hospital readmission and intraoperative blood transfusions.

As described above, risk-of-event outcomes and time-to-event data were analyzed separately. We included 70 HRs from 39 studies and 103 ORs from 34 studies. In total, these studies provide data from 14,804 HNC patients with sample sizes ranging from 41 to 1767 patients. Studies were performed in Asia (n = 25), Europe (n = 25), North America (n = 10) and Australia (n = 3). No studies from South America or Africa were retrieved. Seven (11.1%) of these studies were prospective while the rest were retrospective in design. The patients had a mean age of 61.3 years (sd = 6.9) and a mean BMI of 24.4 kg/m² (sd = 1.50). Most studies included patients with advanced stage HNC: the mean percentage of patients with stage III/IV disease was 78.6% (sd = 21.0%). Some studies included only patients treated with chemoradiotherapy (n = 13), surgery only (n = 5), or surgery with adjuvant radiotherapy (n = 4). There were 39 studies that included all their patients in a single cohort regardless of the therapy. Two studies described patients treated with immunotherapy with or without other treatment. A variety of cut-off values were employed across the studies. Twenty-eight (46.6%) studies determined the cut-off value within their cohort, while the remaining 35 utilized literature-sourced values, with an LSMI of <43.2 cm²/m² being the most commonly cited. For studies using lumbar measurements, almost all quantified all skeletal muscles at L3, with the exception of one study that focused solely on the psoas muscle area. In studies measuring SMM at C3, most adopted the method pioneered by Swartz et al., which involves measuring the paravertebral and sternocleidomastoid (SCM) muscles [83]. If one SCM is affected by prior surgery or disease, the unaffected SCM area is doubled. Two studies using the C3 landmark excluded the SCM area, and one study quantified cervical muscle volume rather than area.
Additionally, two studies included measurements at the second cervical vertebra (C2), focusing on the masticatory muscles. One study assessed muscle area at the hyoid level, quantifying the infrahyoid skeletal muscle area while excluding the pharyngeal constrictor and trapezius muscles.
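The measurement rules described above can be sketched in a few lines. This is an illustrative sketch only: the function names and example values are hypothetical, the SMI is taken as muscle area divided by height squared, and the 43.2 cm²/m² threshold is the most commonly cited lumbar cut-off mentioned in the text. The formula that converts a C3 area to an L3 estimate is deliberately not reproduced here, so the cut-off is applied to a lumbar area only.

```python
def c3_muscle_area(paravertebral_cm2, left_scm_cm2, right_scm_cm2):
    """Sum paravertebral and SCM areas at C3.

    Pass None for an SCM affected by prior surgery or disease;
    the unaffected SCM area is then counted twice.
    """
    if left_scm_cm2 is None and right_scm_cm2 is None:
        raise ValueError("at least one unaffected SCM is required")
    if left_scm_cm2 is None:
        left_scm_cm2 = right_scm_cm2
    if right_scm_cm2 is None:
        right_scm_cm2 = left_scm_cm2
    return paravertebral_cm2 + left_scm_cm2 + right_scm_cm2


def skeletal_muscle_index(muscle_area_cm2, height_m):
    """SMI in cm^2/m^2: cross-sectional muscle area divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return muscle_area_cm2 / height_m ** 2


def is_sarcopenic(lumbar_smi, cutoff=43.2):
    """Apply a single literature-based lumbar cut-off (an assumption; studies varied)."""
    return lumbar_smi < cutoff


# Hypothetical patient values.
area_c3 = c3_muscle_area(30.0, None, 4.5)   # right SCM doubled
smi_l3 = skeletal_muscle_index(130.0, 1.75)  # lumbar area measured at L3
print(area_c3, round(smi_l3, 1), is_sarcopenic(smi_l3))
```

Because many studies derived cohort-specific or sex-specific cut-offs, the single default threshold here should be read as a placeholder rather than a recommendation.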

Sarcopenia was determined on CT imaging in 55 studies. In the remaining eight studies CT and MRI were both used, depending on which scan was available. None of the included studies utilized MRI exclusively to evaluate skeletal muscle mass. The level of measurement was cervical in 36 studies, lumbar in 24 studies, thoracic in one study, cervical or lumbar (whichever was available) in one study and cranial in one study. Twenty-two studies used a transformation formula to estimate the SMM at L3 from a cervical scan, as described in Swartz et al. [83]. On average, 44.4% of patients were sarcopenic (sd = 15.6).

Risk of event (OR) analysis

Full model

A total of k = 103 effect sizes were included in this analysis. There was significant heterogeneity between all effect sizes (Q (102) = 419.418, p < 0.001). The total variance was 0.305, of which 0.041 was sampling error variance and 0.264 could not be attributed to sampling error. The distribution of the variance was 13.5% at level one (sampling variance), 32.2% at level two (within-study variance) and 54.4% at level three (between-study variance). The Akaike’s Information Criterion (AIC, a measure of the model fit, with lower values indicating better fit) of the three-level model was 231.8. Constraining the variance at level two to 0 increased the AIC to 241.4 (p < 0.001). Constraining the variance at level three to 0 significantly increased the AIC to 241.4 (p < 0.001). These findings confirm that a three-level model is the optimal model for this dataset.
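The distribution of variance over the three levels is simply each component divided by the total. The sketch below illustrates the arithmetic; the function name is hypothetical. The sampling variance (0.041) is reported above, whereas the level two and level three components (0.098 and 0.166) are approximate values back-calculated here from the reported percentages and the 0.264 remainder, so they are assumptions rather than reported results.

```python
def variance_shares(sampling_var, within_var, between_var):
    """Return the percentage of total variance at levels 1, 2 and 3."""
    components = (sampling_var, within_var, between_var)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance must be positive")
    return tuple(100 * v / total for v in components)


# 0.041 is reported; 0.098 and 0.166 are back-calculated approximations.
level1, level2, level3 = variance_shares(0.041, 0.098, 0.166)
print(f"{level1:.1f}% / {level2:.1f}% / {level3:.1f}%")
```

Small differences from the reported percentages are expected because the inputs are rounded to three decimals.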

Analysis per predictive and prognostic outcome

Please note that the described outcomes are log ORs, where values above 0 imply worse outcomes for sarcopenic patients and values below 0 imply better outcomes for sarcopenic patients. The estimated log OR of the whole dataset was 0.644 (95% CI = 0.505–0.783, p < 0.001), suggesting that patients with sarcopenia have a higher risk of worse outcomes (Table 2). There were sixteen effect sizes on chemotherapy dose limiting toxicity (CDLT), showing that sarcopenia is a risk factor for dose limiting toxicity (log OR = 0.760, 95% CI = 0.277–1.243, p < 0.001). There were 69 effect sizes on complications, showing that sarcopenia is a risk factor for complications (log OR = 0.669, 95% CI = 0.441–0.897, p < 0.001). There were 8 effect sizes for disease control, showing that sarcopenia was a risk factor for worse disease control (log OR = 0.638, 95% CI = 0.161–1.114, p = 0.02). There were two effect sizes for hospital readmission. There was no significant effect of sarcopenia on hospital readmission, with a log OR of 0.683 (95% CI = −3.051–4.417, p = 0.26). There were seven effect sizes for survival outcomes. All seven studies described overall survival as a 1-, 2- or 5-year surviving fraction and were therefore included in this risk-of-event analysis. There was a significant effect of sarcopenia on survival, with a log OR of 0.808 (95% CI = 0.509–1.107, p < 0.001).
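Readers who prefer the ratio scale can exponentiate a log OR and its confidence limits. The snippet below is a small illustrative helper (the function name is hypothetical) applied to the pooled estimate reported above.

```python
import math


def to_ratio_scale(log_estimate, ci_low, ci_high):
    """Back-transform a log OR (or log HR) and its 95% CI to the ratio scale."""
    if not ci_low <= log_estimate <= ci_high:
        raise ValueError("estimate must lie within its confidence interval")
    return math.exp(log_estimate), math.exp(ci_low), math.exp(ci_high)


# Pooled log OR and 95% CI for the whole dataset, as reported in the text.
odds_ratio, low, high = to_ratio_scale(0.644, 0.505, 0.783)
print(f"OR = {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

For the pooled estimate this gives an OR of approximately 1.90 (95% CI approximately 1.66 to 2.19). A log OR of 0 corresponds to an OR of 1, that is, no difference between sarcopenic and non-sarcopenic patients.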

Table 2 Effect sizes of sarcopenic vs. non-sarcopenic patients per outcome.

Moderator analysis

Several variables were investigated as moderators, including the imaging modality. The only modalities used in the studies were CT imaging alone and CT or MRI, whichever was available. This variable was not a significant moderator (Test of Moderators F (1,101) = 0.108, p = 0.743), suggesting that the effect of sarcopenia on outcome is not influenced by the type of imaging that was used. The level of measurement (i.e. cervical or lumbar measurements) was not a significant moderator (F (2,100) = 0.627, p = 0.536). The use of a prediction rule to estimate the skeletal muscle index (SMI) at the L3 level, versus using another level, was also not a significant moderator (F (1,101) = 2.938, p = 0.090). The type of therapy that patients received was not a significant moderator (F (3,99) = 0.457, p = 0.713). The following variables were also not significant moderators: continent of study (F (3,99) = 0.953, p = 0.418), BMI (F (1,52) = 0.088, p = 0.768), age (F (1,72) = 0.874, p = 0.353), gender (F (1,101) = 0.961, p = 0.329) and stage (F (1,95) = 0.095, p = 0.758).

Time-to-event (HR) analysis

Full model

A total of k = 70 effect sizes were included in these analyses. There was significant heterogeneity between effect sizes (Q (69) = 612.164, p < 0.001). The variance at level two (within-study variance) was 0.267 and the variance at level three (between-study variance) was 0.019. The share of the total variance was 8.42% at level 1, 6.09% at level 2 and 85.48% at level 3. The AIC of the three-level model was 102.95. Constraining the variance at level three to zero significantly deteriorated the model fit (AIC = 126.6, p-value for change <0.001). Constraining the variance at level two to zero deteriorated the model fit (increased the AIC) to 103.7, but this was not a significant increase (p = 0.10). In this analysis, the three-level model does not offer a statistically significant improvement of the model fit. However, it is undesirable in meta-analysis to treat related outcomes (i.e. OS or DFS outcomes obtained from the same patient sample) as independent. Therefore, the three-level structure was retained.

The pooled log HR for the effect of sarcopenia on time-to-event outcomes was 0.606 (95% CI = 0.422–0.791, p < 0.001), suggesting that patients with sarcopenia had a shorter time to event. The only available time-to-event outcome categories were disease control (locoregional control, local disease control) and survival outcomes (overall survival, progression free survival, recurrence free survival, disease free survival, distant metastasis free survival). There were 27 effect sizes on disease control outcomes. Sarcopenic patients were at significantly higher risk of worse disease control, with a log HR of 0.544 (95% CI = 0.296–0.792, p < 0.001). There were 43 effect sizes on survival, with a log HR of 0.674 (95% CI = 0.482–0.866, p < 0.001).

Moderator analysis

Moderator analysis showed that imaging modality (CT or CT/MRI) was not a significant moderator (F (1,8) = 0.3612, p = 0.062), nor was the use of a prediction rule (F (1,67) = 0.493, p = 0.485). Interestingly, the level of measurement was a significant moderator (F (2,67) = 4.369, p = 0.016). This was mostly influenced by the single study that used cranial level imaging to determine sarcopenia [59]. There was no significant difference between cervical and lumbar levels of measurement (p = 0.289). Continent of study was not a significant moderator (F (3,66) = 2.123, p = 0.106), nor was BMI (F (1,26) = 0.061, p = 0.807), age (F (1,34) = 0.642, p = 0.429), sex (F (1,66) = 1.750, p = 0.190), stage (F (1,46) = 0.323, p = 0.323) or treatment (F (4,65) = 0.912, p = 0.462).

Funnel plot

To investigate publication bias, funnel plots were constructed (Figs. 3 and 4). These plots suggested that no relevant publication bias was present.

Fig. 3: Funnel plot for publication bias in articles reporting OR in head and neck cancer patients.

Fig. 4: Funnel plot for publication bias in articles reporting HR in head and neck cancer patients.

Discussion

Our systematic review and meta-analysis describes data from 14,804 patients across 63 studies and shows a significant association between sarcopenia and several adverse clinical outcomes in patients with HNC. Specifically, sarcopenia was correlated with diminished survival rates and reduced disease control. Moreover, there were higher rates of complications and CDLT. It should be noted that the included studies exhibited considerable variability in disease stage, cancer site, and treatment modalities. Moreover, there was notable heterogeneity in the definition of sarcopenia across studies, including variations in measurement level, cut-off thresholds, and imaging modalities utilized. Regardless, our multilevel analysis demonstrated that sarcopenia is a consistently significant predictive and prognostic factor across all patient groups. This study highlights the potential role of sarcopenia in pre-treatment risk stratification for HNC patients, irrespective of treatment type, patient-specific factors, or tumor characteristics.

Sarcopenia is defined as the lack of SMM and function and was previously thought to be a condition exclusively related to aging [84]. However, sarcopenia can develop in cancer patients as the result of chronic systemic inflammatory processes as a reaction to the tumor [7]. Skeletal muscle tissues act as an endocrine organ secreting specific cytokines, referred to as myokines [85]. These myokines impact tissue repair and immune regulation and surveillance. A lack of SMM results in fewer myokines and therefore decreased immune function and tissue repair [85]. In HNC patients, tumors arise in the upper aerodigestive tract. This may result in dysphagia or odynophagia, causing malnutrition and a catabolic state that compounds the systemic negative effects of the tumor [86,87,88]. These adverse effects are not confined to late disease stages, as research indicates that a significant proportion of patients present with some degree of malnutrition [89]. Within the realm of HNC, the prevalence of sarcopenia is notably elevated, with our analysis showing a sarcopenia prevalence of 44.4% among included patients, surpassing rates observed in other cancer types and corroborating earlier findings [90]. Based on its definition, sarcopenia is diagnosed by measuring SMM and its function [7]. Muscle function, which can be determined by e.g. hand grip strength or gait speed, was rarely measured in the included studies, probably as it cannot be determined retrospectively. There is significant heterogeneity in the literature regarding the usage of the terms “sarcopenia” and “low muscle mass”, and these terms have been used interchangeably although this is technically incorrect. Nevertheless, low SMM by itself has been shown to be a significant predictor in earlier research and most clinical research prioritizes the amount of SMM [91, 92].
There is no current gold standard for the analysis of body composition, although the reference standard for determining the amount of SMM is dual-energy x-ray absorptiometry, also referred to as DEXA, which is seldom available in a routine clinical setting [93]. Alternatives have been proposed in the form of MRI, CT, bioelectrical impedance analysis and ultrasound. SMM measured on a cross-sectional image at the level of the third lumbar vertebra was first shown to correlate closely with whole-body muscle mass and has been used as a clinical standard since [9]. Additionally, research has identified surrogate measurement methods at cervical and thoracic levels, facilitating broader applicability in muscle mass assessment [83, 94]. Decreased SMM measured on cross-sectional imaging is categorized as radiologically defined sarcopenia, serving as a distinct classification from sarcopenia. However, due to ease of access most clinical research has focused on radiologically defined sarcopenia, even to such a degree that the terms are used interchangeably. There are variations with regard to every step in the methodology of measuring SMM. For instance, there are several different software applications that allow for detailed muscle quantification. Previous research by van Vugt et al. showed that there was excellent inter- and intra-observer agreement in the measurement of SMM at an abdominal level using four common software applications [95]. With regard to the chosen modality, research has shown that there is excellent agreement between muscle measurements taken on CT and MRI, which is corroborated by our findings [96, 97]. Our analysis shows that CT emerged as the most commonly used modality for analyzing SMM, with the vast majority of studies analyzing CT imaging. This preference is likely driven by the widespread availability of CT imaging, enabling its straightforward integration into research protocols.
Additionally, the delineation of muscle areas based on Hounsfield Unit pixel values allows for convenient and accurate measurements. Regarding the level of measurement, C3 was the most frequently used, closely followed by L3, with generally consistent measurement methods applied at each respective level. One study used a different method for lumbar measurements, and five studies employed alternative methods for cervical measurements. Measurements taken at different levels were not significant moderators of effect sizes. There is notable heterogeneity in the cut-off values used across studies, with nearly half (46.6%) employing a cut-off value established within their specific cohort. While studies using literature-based cut-off values also exhibited some variation, it was considerably less pronounced. Additionally, 54.0% of the included studies applied sex-specific cut-off values. The optimal method for defining sarcopenia in HNC remains unclear, as universal, ethnicity-specific, sex-specific, and BMI-dependent criteria have all demonstrated clinically significant results [77, 98, 99]. Without access to the raw data from each individual study, it is not possible to draw definitive conclusions about the impact of the chosen cut-off values. Additional differences between studies included whether SMM was used directly, adjusted for body height, or converted to muscle mass at L3, which is currently the clinical gold standard [91]. Nevertheless, our moderator analysis revealed no statistically significant changes in the results, reinforcing the robust prognostic significance of sarcopenia. These differences do highlight the lack of consensus with regard to the methods for diagnosing sarcopenia in HNC patients. Considering sarcopenia is used as a determinant in randomized controlled trials in HNC patients, it is vital to the reproducibility of these trials and other research that a “gold standard” method is agreed upon and applied [12, 13].
Furthermore, all included studies rely on human-driven methods to quantify SMM, although recent research has begun exploring the use of machine learning models [100, 101]. These artificial intelligence-driven approaches eliminate the need for time-consuming measurements. Although such methods greatly facilitate further research, a standardized model has yet to be established. Consensus with regard to method and cut-off value could greatly enhance the applicability of these AI-driven methods, and prevent redundant research in which different AI models utilize different methods and hamper comparability.

Our study has several strengths stemming from its methodological approach. Previous systematic reviews have explored various outcomes related to sarcopenia in HNC patients such as OS, DFS, DSS, and PFS [102,103,104,105,106,107,108,109]. These previous meta-analyses focus on specific patient groups and treatment combinations, which limits their generalizability. In contrast, our study provides a comprehensive global overview by incorporating a broader range of data and analyzing effect sizes that were previously unreported. A standard meta-analysis averages reported effect sizes from included studies under the assumption of independence, which is often not the case. Dependency can arise externally, between studies, or internally, within a single study. External factors may include studies conducted by the same research groups or in similar geographical locations. Internally, dependency can manifest when certain study participants are predisposed to particular outcomes, potentially leading to associations with other closely related outcome factors. For instance, a patient with a short OS may also exhibit a shortened DFS and increased risk of complications. In such cases, using both OS and DFS outcomes for a single patient in a meta-analysis effectively duplicates the data. Moreover, studies reporting multiple effect sizes can disproportionately influence the meta-analysis. Traditionally, dependency issues are addressed by either ignoring them, extracting only one effect size per study or study sample, or utilizing statistical methods to model the dependence. Our approach tackles dependency concerns by employing the three-level or multilevel method. This approach allows us to correct for the heterogeneities and dependencies present in the gathered data, thereby offering a comprehensive analysis.

We acknowledge several limitations inherent to our study. Firstly, the predominance of retrospective cohort studies (88.9%) included in our analysis introduces potential biases inherent to this study design. The exclusion of certain studies and effect sizes due to inadequate reporting of outcomes, such as odds ratios, hazard ratios, or surrogate measures, coupled with the selective reporting of statistically significant values or singular p-values in some articles, marginally reduced our statistical power. Finally, a small subset of three studies included for analysis exhibited a high risk of bias.

Conclusion

The results of this study show that sarcopenia is a robust biomarker for various predictive and prognostic outcomes in HNC patients. This relation was found regardless of factors such as age, BMI, tumor size or treatment. Our findings underscore the importance of integrating sarcopenia assessment into the development of personalized treatment strategies for all patients with head and neck cancer. Furthermore, our research provides a comprehensive overview of all the methods used for diagnosing sarcopenia in head and neck cancer. This will assist future research in determining the optimal method going forward.