Introduction

Metabolic dysfunction-associated steatotic liver disease (MASLD), formerly known as non-alcoholic fatty liver disease (NAFLD)1, is a common disease: in 2022, it affected 32% of individuals worldwide2. In 2016, its prevalence was estimated to be 25%3, so it has increased in recent years. Approximately 76% of patients with type 2 diabetes and 50–90% of individuals with obesity have MASLD3. Moreover, hepatic steatosis correlates with the severity of obesity: 65% of individuals with grade I or II obesity, and 85% of individuals with grade III obesity, have liver steatosis4. Over 90% of individuals with severe or morbid obesity undergoing bariatric surgery have MASLD3. In 2017–2018, the prevalence of severe obesity among Americans was estimated to be 9.2%5. The prevalence of severe obesity is higher in women than in men, and between 1975 and 2016 the number of individuals with obesity tripled6.

MASLD ranges from steatosis with mild inflammation to metabolic dysfunction-associated steatohepatitis (MASH), formerly known as non-alcoholic steatohepatitis (NASH)1. It is essential for the clinician to confirm whether MASH is present, because fibrosis progresses more rapidly in patients with MASH, which is linked to necroinflammation7, and these patients are at increased risk of developing liver cirrhosis and, ultimately, hepatocellular carcinoma (HCC). It is therefore not surprising that MASH has become the second leading cause of HCC requiring liver transplantation in the United States8.

Bariatric surgery is currently indicated in patients with a BMI ≥ 35 kg/m² and MASH, as there is strong evidence that bariatric surgery is effective in dramatically reducing liver steatosis, liver inflammation and fibrosis9.

In recent years, a number of noninvasive tests have been developed to assess MASH and/or liver fibrosis (serum biomarkers, transient elastography, magnetic resonance elastography, etc.), but none of them is sufficiently effective and/or they are not externally validated to specifically assess MASH10. In the presence of severe or morbid obesity, noninvasive tests to assess liver fibrosis are less accurate. In particular, transient elastography (FibroScan®), recommended by the European Association for the Study of the Liver to exclude advanced fibrosis11, fails in up to 20% of individuals with severe or morbid obesity, although advances have been made since the introduction of the XL probe12.

In the field of noninvasive testing, Fibre Evanescent Wave Spectroscopy (FEWS) in the mid-infrared (MIR) spectral range has been developed. The principle of this technology is to record the mid-infrared absorption spectrum of a drop of serum placed on a fibre-optic sensor13. The spectrum obtained indicates the molecular composition of the sample and is referred to as the patient’s metabolic fingerprint14. A recent pilot study using MIR spectroscopy showed good sensitivity and negative predictive value (NPV) for the detection of MASH in individuals with severe or morbid obesity15.

The aim of this study was to conduct a European multicenter prospective study to assess the diagnostic performance of MIR spectroscopy in the diagnosis of MASH in individuals with severe or morbid obesity.

Results

A total of 382 patients were included, of whom 345 underwent liver biopsy. Four patients were excluded due to non-compliance with inclusion criteria (medical treatment or previous history of bariatric surgery). Among the liver biopsies, 35 were excluded due to insufficient size or significant fragmentation of the biopsy. Eleven patients were excluded because they had no infrared spectra (n = 6) or a spectrum that did not meet optimal quality criteria (n = 5). A total of 295 patients were included in the final analysis (Fig. 1).

Fig. 1 Participant flow chart specifying the number of individuals excluded from the clinical study.

Patient characteristics

Among the 295 patients (217 women, 78 men) whose data could be used to develop a diagnostic model for MASH, the median age was 42 (31.5–50) years, and the median BMI was 42 (39–47) kg/m². MASH was diagnosed in 19% of the individuals. Patients without MASH were predominantly women (78%), but the sex distribution was more homogeneous in patients with MASH (46% male and 54% female). MASH patients were significantly older (47 vs 40 years, p < 0.01) and had significantly higher liver test results (Table 1, and Supplementary Table 1). Forty-seven per cent of individuals with MASH had type 2 diabetes versus 15% of individuals without MASH, and those with MASH had significantly higher serum glucose (5.8 vs 5.2 mmol/L, p < 0.01) and HbA1c levels (6.15% vs 5.6%, p < 0.01). Fifty-one per cent of individuals with MASH had dyslipidemia compared to 24% of those without MASH (p < 0.01), with significantly higher triglyceride serum levels (1.7 vs 1.39 mmol/L, p < 0.01). Ninety-one per cent of individuals with MASH fulfilled the criteria for metabolic syndrome compared to 75% of those without MASH (p = 0.02).

Table 1 Characteristics of the patients according to MASH status.

Most individuals without MASH (63%) had grade S1 or S2 steatosis, while most MASH patients (63%) had grade S3 steatosis. Nineteen per cent of individuals with MASH had a ballooning grade B2 and 26% had inflammation grade I2, while none of the individuals without MASH presented ballooning or inflammation grades > 1. Sixteen per cent of individuals with MASH had F3 fibrosis and two individuals (3.6%) had cirrhosis (F4 fibrosis), while none of the patients without MASH had fibrosis > F2 (Table 1).

Spectral model performance

The spectral model, based on 12 spectral variables selected to separate the metabolic profiles of MASH and non-MASH patients, showed a calibration AUC of 0.80, a sensitivity of 76%, a specificity of 77%, a positive predictive value (PPV) of 45%, a negative predictive value (NPV) of 93% and an accuracy rate (ACC) of 77% for a threshold of 0.1817. Performance in the validation group was lower, with an AUC of 0.68, a sensitivity of 53%, a specificity of 72%, a PPV of 31%, an NPV of 86% and an accuracy rate of 68% (Table 2).
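The link between these figures can be illustrated with Bayes' rule: for a given sensitivity and specificity, PPV and NPV are determined by the prevalence of MASH. The sketch below is only an illustration; the function name is hypothetical, and the prevalence is assumed to be the overall 19% reported for the cohort, whereas the prevalence in the calibration group may differ slightly.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) derived from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Calibration sensitivity and specificity from the text.
# Assumption: prevalence set to the overall cohort value (19%).
ppv, npv = predictive_values(sensitivity=0.76, specificity=0.77, prevalence=0.19)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.44, NPV = 0.93
```

Under this assumption the computed values are close to the reported PPV of 45% and NPV of 93%, which shows why a test with moderate sensitivity and specificity can still have a high NPV when MASH prevalence is low.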

Table 2 Performance of the spectral model to diagnose MASH in the whole population, in men and in women.

Subgroup analyses showed that separating the cohort according to sex improved the performance of the spectral model in women. The performance of the spectral model applied only to men was poor, with an AUC of 0.62. For women, the AUC was 0.83 and 0.82 in the calibration and validation groups, respectively. For a threshold of 0.1817, sensitivities were 86% and 70%, specificities were 81% and 75%, PPVs were 43% and 32%, NPVs were 97% and 94% and accuracy rates (ACC) were 82% and 74% for the calibration and validation groups, respectively (Table 2).

Analysis of clinical and biological data according to sex

The analysis showed that women with MASH had significantly higher age, systolic blood pressure, liver enzymes (AST, ALT, GGT), and rates of type 2 diabetes, dyslipidemia and metabolic syndrome than women without MASH. In men, only liver enzymes (AST, ALT, GGT) differed significantly between patients with and without MASH, being higher in those with MASH (Table 3). The prevalence of type 2 diabetes and HDL cholesterol levels were higher in women with MASH than in other patients (Table 3).

Table 3 Characteristics of the patients according to MASH status and sex.

Diagnosis model of MASH in women using spectral, clinical and biological data

To improve the performance of the spectral model for the identification of women with MASH, a mixed model was built combining spectral data with clinical and biological information. The most effective combination of clinical and biological parameters for the diagnosis of MASH was age and ALT (clinical model). This clinical model identified MASH with a sensitivity of 80% and a specificity of 75% in the validation group for a threshold of 0.1689 (Table 4).

Table 4 Performance of the spectral, clinical and composite models to diagnose MASH in women.

The clinical model "age + ALT" was then used to improve the diagnostic performance of the spectral model.

The composite model (spectral score + age + ALT) showed a significant increase (p < 0.05) in diagnostic performance compared to the spectral model alone or the clinical model alone for both the calibration and validation groups. The performances of the composite model were AUC 0.87, sensitivity 80%, specificity 85%, PPV 47%, NPV 96% and ACC 84% for a threshold of 0.2162 in the validation group (Table 4).

Discussion

In this study we were able to show that, with mid-infrared spectroscopy using the SPID™ platform, a multivariate model associating spectral analysis data with age and ALT value can rule out the presence of MASH with a good NPV in women with severe or morbid obesity.

The diagnosis of MASH can guide the management of individuals with obesity and MASLD by prompting reinforced lifestyle and dietary measures16,17 and the use of new weight loss drugs such as GLP-1 analogues18,19,20,21,22, bariatric surgery23 or, in the future, bariatric endoscopy24,25.

Liver biopsy is the gold standard for determining the severity of MASLD. Several noninvasive tests have been proposed for the diagnosis of MASH. These tests can be based on clinical and/or biological parameters such as ALT levels, the presence of a metabolic syndrome, serum biomarkers or CK18 fragments26. Other studies have evaluated the performance of imaging techniques such as magnetic resonance imaging27. However, no test is currently available for the routine diagnosis of MASH10.

Mid-infrared spectroscopy could represent a new diagnostic tool. A pilot single-center prospective study using this technology screened for MASH in individuals with severe or morbid obesity using frozen sera15. Comparing MIR spectroscopy at a threshold of 0.15 with the histological diagnosis of MASH, the study obtained sensitivities of 75% and 69%, specificities of 72% and 76%, PPV of 35% and 36%, NPV of 94% and 93%, and correctly classified patient rates of 72% and 75% for the calibration and validation groups, respectively.

Mass spectrometry studies have shown the effects of freezing on proteins28 and lipids29. A study published by Lovergne et al.30 highlighted the influence of freeze–thaw cycles on infrared spectral profiles. Analysis of MIR spectral profiles recorded with SPID™ also highlighted the influence of freezing. Thus, models built from spectral variables affected by freezing (especially those attributed to proteins) may produce diagnostic scores that are significantly influenced by how the sample was handled before analysis rather than by the disease alone.

The current study provides a multicenter prospective validation of this diagnostic approach on fresh sera and confirms the performance obtained in the pilot study. The spectral model, defined using patient spectra in the present study, does not perform better than a clinical model associating age and ALT. However, a multivariate model with “spectral score + age + ALT” outperforms each of the other two models individually. In women, the multivariate model was characterized by sensitivities of 90% and 80%, specificities of 88% and 85%, PPV of 56% and 47%, NPV of 98% and 96%, and correctly classified patient rates of 88% and 84% for the calibration and validation groups, respectively.

This study has several strengths, including the multicenter prospective study design, and immediate analysis of fresh serum. It was conducted under real-life conditions in tertiary referral centers for the management of individuals with severe or morbid obesity. Liver biopsies, the reference examination for diagnosing MASH, were obtained in optimal conditions. The biopsy was taken at the beginning of the surgery, using a needle rather than a wedge biopsy, to avoid biases associated with coagulated or subcapsular samples. Examination of biopsies was performed centrally by a world-renowned expert (PB).

However, this study has several limitations. The diagnostic potential of mid-infrared spectroscopy on the SPID™ platform in male individuals with obesity should be tested in a larger cohort, to develop a reliable model specifically for this category of individuals.

We wanted to validate a universal model (model 2019) developed from frozen sera collected in a pilot study carried out at the Nice University Hospital Center in France and previously published15. We were surprised to find that model 2019 performed poorly in this new multicenter cohort. The reasons for this inadequate performance are not completely understood, but could be related to differences between the cohort used to develop model 2019 and the current multicenter validation cohort: slightly different patient phenotypes; the sex distribution; spectral signal acquisition on fresh versus frozen serum (as noted above, studies have shown that freezing can modify the chemical structure of biomolecules and therefore their spectral signature); the operative timing and modalities of the liver biopsy; and the centralized analysis of the liver biopsy.

We therefore built a new universal model from this multicenter cohort, but found that the overall performance of this universal model was also poor. In an attempt to understand the reasons for this inadequate performance, we observed that there were quite significant phenotypic differences between the men and women included in this study. There was a trend towards an increasingly pronounced gradient of clinical and biological abnormalities between women without MASH, women with MASH, men without MASH and men with MASH (Table 3). In particular, it was remarkable how close the clinical and biological phenotypes were between men without MASH and women with MASH. We hypothesised that this proximity of clinical and biological characteristics could reflect a similarity in biochemical elements measured in serum by MIR spectral analysis. This could explain the poor diagnostic performance of the new universal spectral model in differentiating patients with and without MASH. Phenotypic differences between men and women during MASLD are observed in everyday practice and are well documented in the literature31.

Sex differences have also been demonstrated in murine models. MASLD is more frequent in men than in women before menopause and tends to become more frequent in women after menopause32. Estrogen may play a role, particularly through formyl peptide receptor 2 (FPR2). The suppression of this receptor in mice fed a high-fat diet exacerbated liver damage33. Also, FPR2 expression is induced by estradiol, which helps to protect the liver from damage associated with a high-fat diet. Male mice with farnesoid X receptor (FXR) deficiency fed a Western diet develop more severe steatosis34. Moreover, after menopause, the abdominal distribution of fat and the overexposure of women to metabolic syndrome and type 2 diabetes increase the risk of MASLD31. Gut microbiota show differences according to BMI, sex and menopausal or non-menopausal status, which may explain differences in metabolism in MASLD patients31. It has also been shown in animal models that immune cells from male mice promote inflammation and liver fibrosis. In addition, macrophages from female mice produce more anti-inflammatory prostanoids31,35. In MASLD mice, sexual dimorphism is observed via the expression of certain genes36. Further studies are needed to clarify these differences.

We therefore decided to repeat the analyses, separating men and women. This avoided the risk of confounding due to the phenotypic proximity of women with MASH and men without MASH. The idea was to develop a model that used two different equations depending on the sex of the patient, to enable a clinician to rapidly determine the state of the liver by indicating the sex of the patient in the MIR spectroscopy software. The new model built for women performed well and is presented in this paper. Unfortunately, the sample of men was too small to develop an effective test for them. The reasons for this low number of men were: men were a minority (26.5%) in our cohort (as systematically found in publications studying cohorts of morbidly or severely obese subjects undergoing bariatric surgery); there was a relatively low prevalence of MASH in the men (33.3%); and the effective inclusion period was shortened by the COVID-19 pandemic.

We are aware of the limitations of this study. A new prospective study should be conducted to obtain a new cohort of women and men to complete the development of this diagnostic innovation (MIR spectroscopy). In particular, developing a spectral model on a larger sample of men would be useful to establish a model with two equations, to be used according to the sex of the patient. Depending on the diagnostic performance of the spectral model in men, it could also be useful to improve it by creating a mixed model using clinical and biological data. These clinical and biological parameters may or may not be the same as in the mixed model developed for women.

Nevertheless, the relevance of the model in women is well established. Women represent 70–80% of the cohort of patients undergoing bariatric surgery and this tool could be helpful in the management of individuals with severe or morbid obesity.

We were not able to develop a model to predict the presence of significant, advanced fibrosis, cirrhosis, or "at-risk MASH", mainly because the degree of liver fibrosis in our cohort was low, compared with other cohorts of individuals with severe or morbid obesity38,39.

In conclusion, mid-infrared spectroscopy using the SPID™ platform may help clinicians to exclude a diagnosis of MASH in women with severe and morbid obesity and assist in their personalized management. Further studies are needed to determine in more detail the role of this tool in the care of MASLD patients.

Methods

Study design

This European multicenter prospective study was conducted from 21 June 2019 to 20 October 2020 in four French university hospitals (Amiens, Angers, Nancy, Nice), two French private clinics (Clinique de la Sauvegarde Lyon and Centre Hospitalier Privé Saint-Grégoire) and one Belgian university hospital (Liège). Individuals aged 18 years or older scheduled for primary bariatric surgery (sleeve gastrectomy or Roux-en-Y gastric bypass) were eligible for inclusion. Excessive alcohol consumption (> 210 g/week in men, > 140 g/week in women), HBV, HCV or HIV infection and any chronic liver disease were exclusion criteria.

Prior to bariatric surgery, all patients underwent a clinical evaluation and a fasting biological test to assess liver enzymes (alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyltransferase (GGT), alkaline phosphatase) as well as bilirubin, blood glucose level, glycated hemoglobin (HbA1c) and lipid parameters (triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL)).

The study was approved by local ethics committees: “Comité de Protection des Personnes Sud Méditerranée IV” and “Comité d’Ethique Hospitalo-Facultaire Universitaire de Liège”. All patients gave informed consent. The French National Agency for the Safety of Medicines and Health Products (ANSM) approved the trial under registration number 2018-A03240-55. The clinical trial protocol was registered and is available on ClinicalTrials.gov under the identification number NCT03978247 (date of first registration of the trial was 04/06/2019). The trial was conducted in accordance with the Declaration of Helsinki and in strict compliance with good clinical practice, in particular ISO 14155:2011. Sex and gender equity in research was followed according to the SAGER guidelines.

Liver biopsy analysis

Liver specimens were collected at the time of bariatric surgery with a Tru-Cut (16G) needle in the left hepatic lobe (segments II or III). Biopsies were prepared (paraffin embedding, sectioning and preparation of unstained slides) at each investigating center, then anonymized and sent for central examination by an expert pathologist (P. Bedossa). Biopsies were characterized according to the length and number of fragments and then classified as adequate or inadequate. Biopsies were scored according to the SAF score, and the diagnosis of MASH was retained in patients with a grade of at least 1 for each of steatosis, lobular inflammation and ballooning40. Liver fibrosis was assessed by Sirius Red staining and was classified according to the NASH Clinical Research Network system41.

Principles of mid-infrared spectroscopy

Infrared (IR) spectroscopy is a vibrational absorption spectroscopy: when a molecule is excited by IR radiation, it can partially and selectively absorb this radiation depending on the chemical bonds it contains. These bonds are called vibrators because the interaction with the IR radiation increases the amplitude of their vibrations. Thus, in an infrared spectrum, the position of a band (= frequency of vibration) depends on the nature of the vibrator, and the intensity of the band (= absorption) depends on the concentration, but also on the nature and polarity of the bond. These effects can be used to interpret spectra and identify chemical functions (C=O, N–H, C–C, etc.). The mid-infrared (MIR) is a specific infrared range extending from 4000 to 400 cm⁻¹. The energy of this range corresponds to the energetic transitions of fundamental vibrations of molecular bonds, making it easier to interpret the spectra. When the sample being analysed is a complex matrix made up of a set of molecules, such as tissue or a biofluid, the spectrum obtained results from the absorption of all the molecules in their environment. In this way, MIR spectroscopy provides global information on a sample, such as blood serum, in a single analysis and can be considered as metabolic fingerprinting.
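For readers more used to wavelengths than wavenumbers, the limits of the MIR range can be converted with the standard relation λ (µm) = 10 000 / ν̃ (cm⁻¹). The short sketch below only illustrates this unit conversion; the function name is hypothetical and not part of any instrument software.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1e4 / wavenumber_cm1


# Limits of the mid-infrared range quoted in the text.
for nu in (4000, 400):
    print(f"{nu} cm-1 -> {wavenumber_to_wavelength_um(nu)} um")
```

The 4000 to 400 cm⁻¹ range therefore corresponds to wavelengths of 2.5 to 25 µm.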

SPID™ technology uses an optical fibre, with a transparency domain from 4000 to 800 cm⁻¹, to carry mid-infrared radiation. As infrared radiation propagates through an optical fibre, some of it penetrates slightly outside the fibre, creating an evanescent wave that can be absorbed by a sample in contact with the fibre (Fig. 2). This technology is called Fibre Evanescent Wave Spectroscopy (FEWS). A recent pilot study using SPID™ technology showed good sensitivity and negative predictive value (NPV) for the detection of MASH in individuals with severe or morbid obesity using metabolic fingerprints (i.e., MIR spectra).

Fig. 2 Overview of the main steps of the study leading to the construction of the MASH diagnostic models from mid-infrared spectra and biological data.

Mid-infrared spectroscopy of serum blood samples

Prior to bariatric surgery, a blood sample was collected with a serum separator tube (SST) from the fasting and awake subject. Spectra were acquired at each investigating center using a DIAFIR SPID™ FT-IR spectrometer (Rennes, France). Serum preparation was performed according to the manufacturer’s recommendations and spectra were acquired within 5 h of blood collection. For spectral acquisition, a disposable FEWS infrared sensor was placed in the spectrometer, the background signal was recorded in air, and 7 μl of fresh (unfrozen) serum was deposited on the sensor. The serum spectrum was acquired 8 min after deposition to obtain an accurate signal and to remove excess water, which can hide some infrared bands of interest. The FTIR absorption spectra were acquired in the 4000 to 800 cm⁻¹ frequency range. The nominal spectral resolution was set to 4 cm⁻¹ and a zero-filling factor of 2 was used, resulting in a discrete spectral point spacing of 2 cm⁻¹. A Blackman–Harris 3-term apodization function was used for the Fourier transform. For each spectrum, 64 scans were co-added.

Statistics

The number of individuals recruited was based on a MASH prevalence estimate of 20% (precision of 5% and confidence level of 95%), and an unusable data fraction of 25%. It was planned to separate the population into a calibration group (model estimation; 2/3 of patients) and a validation group (model validation; 1/3 of patients). Thus, the number of patients needed was 308 (calibration) and 154 (validation), i.e., a minimum of 462 patients. The COVID-19 crisis halted enrolment for several months, so recruitment ceased once sufficient patients for a calibration group had been recruited.
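The reported numbers can be reproduced with the usual sample size formula for estimating a proportion, n = z² p (1 − p) / d². The sketch below is a reconstruction under stated assumptions, not the authors' documented calculation: it assumes that the 25% unusable fraction was applied as a multiplication by 1.25 and that the validation group was sized at half the calibration group (a 2/3 to 1/3 split). The function and variable names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, precision, confidence=0.95):
    """Patients needed to estimate a proportion p within +/- precision."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


# Prevalence 20%, precision 5%, confidence level 95% (values from the text).
n_usable = sample_size_for_proportion(p=0.20, precision=0.05)

# Assumption: inflate by 25% to allow for unusable data.
n_calibration = math.ceil(n_usable * 1.25)

# Assumption: validation group is half the calibration group (2/3 vs 1/3 split).
n_validation = n_calibration // 2

print(n_usable, n_calibration, n_validation, n_calibration + n_validation)
# 246 308 154 462
```

Under these assumptions the calculation returns the 308, 154 and 462 patients stated above.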

Continuous variables were summarized by the median and interquartile range. All variables were tested for normal distribution using the Shapiro–Wilk test. Comparisons were made using Student’s t-test for normally distributed data or the Mann–Whitney U test otherwise. Nominal data were tested using Fisher’s exact test. The Kruskal–Wallis test was used to compare data between centers.

Spectral data analysis

Preprocessed mid-infrared spectra15 were analysed to discriminate between the metabolic fingerprints of MASH and non-MASH patients. For this purpose, spectra were separated into a calibration population (2/3) and a validation population (1/3), respecting the prevalence of MASH and the representativeness of each investigating center. Once the most discriminative set of variables was identified42, a logistic regression model was fitted and applied to the validation group. The result was a score between 0 (No-MASH) and 1 (MASH) expressing the probability that a given patient had MASH. To evaluate the performance of the score, particularly in terms of sensitivity, specificity, PPV, NPV and well-classified rate, the threshold was chosen according to the "criterion closest to (0,1)", i.e., the point of the ROC curve nearest to the top-left corner. The stability of the prediction (CIs) of models was estimated by 1000 bootstrap replicates (random sampling with replacement).
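The threshold selection step can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study (the analyses were run in R): the function names are hypothetical, the toy scores and labels are invented, and a score greater than or equal to the threshold is assumed to be classified as MASH.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        # Assumption: score >= t is called positive (MASH).
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def closest_to_01(scores, labels):
    """Pick the threshold whose ROC point is nearest to (FPR, TPR) = (0, 1)."""
    return min(
        roc_points(scores, labels),
        key=lambda p: math.hypot(1 - p[2], 1 - p[1]),  # (FPR, 1 - TPR)
    )


# Invented toy data: 1 = MASH, 0 = no MASH.
toy_scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold, sensitivity, specificity = closest_to_01(toy_scores, toy_labels)
print(threshold, sensitivity, round(specificity, 3))  # 0.2 1.0 0.833
```

This criterion weights sensitivity and specificity equally; other rules (for example, fixing a minimum sensitivity for a rule-out test) would generally give a different threshold.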

Clinical variables statistically associated with MASH (p < 0.05) and the spectral score were used in a multivariate analysis (logistic regression with backward selection based on p-values) to define a “multivariate model”. The clinical variables used in the multivariate model were also used to define a so-called “clinical” model. Model accuracies were assessed using the area under the curve, sensitivity and specificity, and comparisons were made using bootstrap sampling (1000 replicates, random sampling with replacement). All statistical analyses were performed in R43.
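One common way to compare two models by bootstrap is to resample patients with replacement, recompute both AUCs on each replicate, and examine the distribution of their difference. The sketch below illustrates that general idea in plain Python; it is an assumption about the form of the comparison, not the authors' R code, and all names and toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b) over paired bootstrap replicates."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # both classes are needed to compute an AUC
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(auc(a, y) - auc(b, y))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Invented toy data: two scores for the same eight patients (1 = MASH).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
composite = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60, 0.70, 0.90]
spectral = [0.10, 0.40, 0.20, 0.50, 0.30, 0.45, 0.70, 0.60]

print(auc(composite, labels), auc(spectral, labels))
low, high = bootstrap_auc_difference(composite, spectral, labels)
print(low, high)
```

If the resulting interval excludes zero, the difference in AUC between the two models would be considered significant at the 5% level under this approach.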