Introduction

Epicardial adipose tissue (EAT) is a unique visceral adipose tissue depot located between the myocardium and the visceral layer of the epicardium, which includes the adipose tissue surrounding the coronary artery tree and the entirety of the heart1. Multiple clinical studies have highlighted the pivotal role of EAT volume in the advancement of coronary atherosclerosis, atrial fibrillation, heart failure, metabolic syndrome, insulin resistance, and other cardiovascular diseases (CVDs)2,3. Previous research has indicated that the volume of EAT can predict the risk of coronary artery disease (CAD) independently of the coronary artery calcium (CAC) score, despite calcification being a crucial component of atherosclerotic plaques4,5. Moreover, recent studies have revealed that EAT is responsive to pharmacological treatments and can be modified, making it a potential therapeutic target. Drugs with pleiotropic effects, such as sodium–glucose cotransporter-2 (SGLT2) inhibitors in the context of CVDs, have demonstrated the capability to influence EAT6,7. Therefore, the accurate segmentation and quantification of EAT hold significant importance in predicting cardiovascular event risks.

Coronary computed tomography angiography (CCTA) serves as a non-invasive and efficient diagnostic approach for the evaluation of CAD8. With its convenience, speed, high spatial resolution, and ability to encompass the entire heart volume, CCTA has been established as an ideal tool for detecting and quantifying EAT9. The conventional procedure for precise volumetric measurement of EAT begins with segmenting EAT through the delineation of the pericardial sac. Voxel thresholding within the sac is then applied, utilizing the specific attenuation values of EAT ranging between [-190HU, -30HU]10,11. This procedure in fact involves two tasks: the classification of CT slices as containing the pericardium or not, and the segmentation of the pericardium itself. The cumulative 3D EAT volume is then calculated from the reconstructed pericardial volumes. However, discerning the thin pericardial tissue layer from surrounding tissues remains challenging12. Manual measurement of EAT is time-consuming and prone to significant inter-observer and intra-observer variability13. Hence, automated and accurate measurement of EAT is crucial in advancing the use of this indicator in clinical practice.
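As a toy illustration of the thresholding step described above (the function name and the tiny arrays are hypothetical, not part of the clinical pipeline), EAT voxels are those lying inside the pericardial sac whose attenuation falls within the stated window:

```python
# Sketch: select EAT voxels by HU thresholding inside a given pericardial mask.
# A real CT slice is 512 x 512; the nested lists here are illustrative only.

HU_MIN, HU_MAX = -190, -30  # EAT attenuation window

def eat_mask(hu_slice, pericardium_mask):
    """Binary mask of EAT voxels: inside the sac and within [HU_MIN, HU_MAX]."""
    rows, cols = len(hu_slice), len(hu_slice[0])
    return [[1 if pericardium_mask[r][c] and HU_MIN <= hu_slice[r][c] <= HU_MAX else 0
             for c in range(cols)] for r in range(rows)]
```

Summing such masks over all slices and scaling by the voxel size yields the cumulative EAT volume.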

In recent years, deep learning methods have played an important role in driving advancements and innovation across a multitude of fields, including the ___domain of medical image segmentation14,15,16,17,18. Since 2015, when Ronneberger et al. introduced the U-Net convolutional neural network (CNN) architecture specifically for biomedical image segmentation, its symmetrical structure and skip connections have become a backbone in the field of image segmentation19. Isensee et al. further proposed the well-known nnU-Net, which is an adaptive framework for medical image segmentation. By removing the need for manual tuning, nnU-Net simplifies the development process and achieves state-of-the-art performance across a wide range of medical image segmentation challenges20. Motivated by successes of deep learning in medical image segmentation, researchers have begun to apply the CNN for automated EAT quantification. In the study by Commandeur et al.21, the authors proposed a deep learning framework for the quantification of EAT and thoracic adipose tissue (TAT, comprising EAT and paracardial adipose tissue) using non-contrast CT slices as input. The deep learning framework employs an initial CNN to identify heart limits and thoracic mask. Subsequently, another CNN, in conjunction with a statistical shape model, is utilized specifically for pericardium detection. EAT segmentations are derived from the outputs generated by both CNNs, with median Dice score coefficients (DSC) of 0.823 (inter-quartile range (IQR): 0.779–0.860). Hoori et al. developed a deep learning method for the automatic assessment of EAT from non-contrast low-dose CT calcium score images, which segments the tissue enclosed by the pericardial sac on axial slices using two pre-processing steps, achieving an average DSC of 0.885 22. However, it is worth noting that they manually screened the CT slices containing the pericardium before inputting them into the neural network. 
Therefore, strictly speaking, this approach can only be considered semi-automated. He et al. proposed a 3D deep attention U-Net method to help the network focus on the target structure, achieving an average DSC of 0.887 23. Similarly, Kuo et al. used a 3D U-Net for automatic EAT segmentation, achieving a DSC of 0.870 24. While these deep learning-based EAT quantification methods can generate promising results, many of them rely solely on data-driven approaches and may have limited interpretability. As a result, their outputs might not always align with established anatomical knowledge, which could pose challenges in meeting clinical requirements in practical applications.

Inspired by the success of previous works, we propose an improved deep learning algorithm for EAT quantification based on the continuity of the pericardium in CCTA images. Initially, we employ a CNN to predict the pericardium in each axial contrast CT slice, leveraging a purely data-driven approach. As mentioned above, such predicted pericardium structures may deviate from the true anatomical representation and fail to meet our expectations. To address this concern, we carefully design a post-processing method that regularizes the predicted pericardium based on its anatomical characteristics, thereby preserving integrity and continuity across adjacent axial CT slices. This post-processing method aims to enhance the reliability and accuracy of pericardium prediction. The final predicted pericardium is then combined with the original contrast CT slices to quantify EAT. We refer to this deep learning approach, which incorporates medical insights into post-processing, as MIDL. Extensive numerical experiments confirm the satisfactory capabilities of our algorithm in EAT quantification and highlight its robust generalization ability.

Materials and methods

Study population

In this study, 108 patients aged between 48 and 70 years (mean age ± standard deviation: 60.1 years ± 6.0; 62 of 108 [57.4%] were men) undergoing routine CCTA scans at the Second Xiangya Hospital of Central South University from April 2022 to September 2022 were included, and their CCTA data were analyzed for EAT volume quantification. The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Second Xiangya Hospital of Central South University (Approval number: LYEC2024-0042). Due to the retrospective nature of the study, the need for informed consent was waived by the Ethics Committee of the Second Xiangya Hospital of Central South University.

CCTA imaging

All CCTA examinations were conducted using a (192 × 2)-slice dual-source CT scanner (Somatom Definition Force, Siemens Healthineers, Germany) at the Second Xiangya Hospital of Central South University. The scan range extended from the apex of the heart to the tracheal bifurcation. Scan parameters were as follows: prospective "FLASH" ECG gating along with tube-current modulation in the angular and longitudinal directions (CareDose 4D, Siemens proprietary technology) was employed. The pitch was set at 3.2 with a collimation of (192 × 0.6) mm in the craniocaudal direction. The tube voltage ranged from 80 to 120 kV, depending on the body size of the patient. Patients typically received approximately 50-70 ml of contrast medium injected at a rate between 4 and 6 ml per second, followed by 50 ml of saline, unless solely arterial contrast was needed. The initiation of CT scanning depended on the injection site, injection velocity, duration of injection, and expected circulation time. Imaging commenced at 70% of the R–R interval. All CCTA images were reconstructed with a slice thickness of 3 mm at intervals of 3 mm using a vascular kernel. All scans were performed during free breathing. Each contrast CT slice was reconstructed at a resolution of N = 512 (512 × 512 pixels).

Creation of manual segmentation dataset

To establish and validate our deep learning method, the EAT in CCTA images was manually segmented for all 108 patients. All manual segmentations of the EAT were performed by X.L (radiologist with 5 years of experience) and SQ.H (radiologist with 3 years of experience). These segmentations were also cross-checked by XB.L (with 20 years of cardiac surgery experience) and J.L (with 21 years of radiology experience) using 3D Slicer software (version 4.10.2) to reach a consensus. For each patient, images were extracted from the picture archiving and communication system and imported in DICOM format into the validated post-processing software, 3D Slicer (version 4.10.2)25. Manual EAT segmentation and quantification proceeded in two phases using the mediastinal kernel. Initially, two labels were established: one to encapsulate the pericardium and its internal area, and another to store the EAT. The pericardial line, visible in CCTA images, was delineated using slice-by-slice drawing in the axial view, with the interior filled automatically. Annotators switched among different views during annotation to ensure accurate pericardial labeling. EAT was defined as fatty-like tissue located between the myocardium and the visceral pericardium, with intensity limits between [-190HU, -30HU]. All patient information was anonymized before manual segmentation. The manual segmentation schematic diagram is presented in Fig. 1.

Figure 1
figure 1

Labeled images during the labeling process. The labeling process is implemented in the order from (A) to (D). Part (A) is the original contrast CT slice, (B) is the drawing of the pericardium, (C) is the label of the pericardium and its inner region in axial, sagittal, coronal and 3D views, and (D) shows the EAT fat pixels in axial and 3D views.

CNN architecture

We first establish and train a deep CNN, denoted \(\:{P}_{\theta\:}\), to predict the pericardium in each contrast CT slice for the segmentation task under consideration. Here, \(\:\theta\:\) represents the parameters determined during the training process. The deep neural network \(\:{P}_{\theta\:}\) is expected to solve an image segmentation task, distinguishing the pericardium from the background. For this purpose, we parameterize \(\:{P}_{\theta\:}\) using a CNN known as U-Net, renowned for its U-shaped architecture. The original version of U-Net was first proposed by Ronneberger et al. for biomedical image segmentation26. In this work, we adopt a modified version of U-Net as shown in Fig. 2, which is similar to the neural network in27. Each green and orange element represents a multi-channel feature map. The left and right segments of \(\:{P}_{\theta\:}\) correspond to the contracting path and the expansive path, respectively. In the expansive path, each up-sampled output is concatenated with the corresponding multi-channel feature map from the contracting path. Both paths employ a series of convolutional layers using (3 × 3) convolutions with zero-padding and a stride of 1, each followed by batch normalization (BN) and a rectified linear unit (ReLU). Additionally, the contracting path integrates (2 × 2) max pooling layers for down-sampling, while the expansive path utilizes (2 × 2) transposed convolutions for up-sampling. It is worth noting that during training, we exclusively employ CT slices containing the pericardium. However, during testing, we utilize complete series of CT slices from actual patients, which include slices both with and without the pericardium. In the training process, \(\:{P}_{\theta\:}\) is trained over 200 epochs by minimizing the \(\:{L}^{2}\) loss function.
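As a small illustration of the size bookkeeping implied by this design (the number of resolution levels below is an assumption for the sketch, not read from Fig. 2): zero-padded (3 × 3) convolutions with stride 1 preserve the spatial size, so only the (2 × 2) poolings change it, halving it at each level.

```python
# Sketch: spatial sizes along the contracting path of a U-Net-style network.
# Zero-padded 3x3 convolutions with stride 1 keep the size; each 2x2 max
# pooling (stride 2) halves it. The number of levels is illustrative.

def contracting_sizes(n, levels=4):
    """Spatial size of the feature maps after each pooling, starting from n."""
    sizes = [n]
    for _ in range(levels):
        n //= 2
        sizes.append(n)
    return sizes
```

For the N = 512 slices used here, four poolings would give sizes 512, 256, 128, 64, and 32; the expansive path mirrors this, with each (2 × 2) transposed convolution doubling the size back up to 512.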

Figure 2
figure 2

The architecture of \(\:{P}_{\theta\:}\). Each green and orange item represents a multi-channel feature map. The number of channels is shown at the top of the volume, and the length and width are provided at the lower-left edge of the volume. The arrows denote the different operations, which are explained at the lower-right corner of the figure.

Post-processing method

Inspired by the a priori knowledge of the integrity and continuity of the pericardium, the proposed post-processing method is mainly based on 1D and 2D connected component analysis28,29. In order to state our method clearly, it is necessary to introduce the notion of connectivity of a binary image (e.g., segmentation results).

Connectivity in a binary image can be defined in terms of adjacency relations among pixels. The pixel with coordinates \(\:(i,j)\) in the image will be denoted by \(\:{p}_{i,j}\). For binary images, the value of a pixel is either 1 (white) or 0 (black). Two pixels \(\:{p}_{{i}_{0},{j}_{0}}\) and \(\:{p}_{{i}_{1},{j}_{1}}\) are said to be neighbors if they share one edge, i.e., \(\:\left|{i}_{0}-{i}_{1}\right|+\left|{j}_{0}-{j}_{1}\right|=1\). Two white pixels \(\:{p}_{{i}_{0},{j}_{0}}\) and \(\:{p}_{{i}_{k},{j}_{k}}\) are said to be path-connected if there exists a sequence of white pixels \(\:{p}_{{i}_{h},{j}_{h}}\:(1\le\:h\le\:k)\), such that \(\:{p}_{{i}_{h-1},{j}_{h-1}}\) and \(\:{p}_{{i}_{h},{j}_{h}}\) are neighbors.

Let \(\:R\) represent the connectivity relation defined on a binary image \(\:S\) as follows: for any pair of pixels \(\:p,q\in\:S\), we have \(\:(p,q)\in\:R\) if and only if \(\:p\) and \(\:q\) are both white and path-connected in \(\:S\). It is easy to verify that \(\:R\) is an equivalence relation, and the corresponding equivalence classes are called connected components. The notion of a 1D connected component can be defined similarly. To make the concept of connected components more intuitive, we provide a visualization for both the 2D and 1D cases in Fig. 3.

Figure 3
figure 3

An illustration of connected components in the 2D and 1D cases. The white areas A, B, C, and D in the 2D case denote different connected components in the binary image. Similarly, the white areas a, b, c, and d in the 1D case denote different connected components in the binary sequence.
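The adjacency and path-connectivity definitions above translate directly into a breadth-first labeling routine. The following is a minimal stdlib sketch (an illustrative re-implementation, not the released MIDL code) that extracts the 4-connected components of a binary image stored as nested lists:

```python
from collections import deque

def connected_components(img):
    """Return the 4-connected components of white pixels (value 1) in a binary
    image given as nested lists; each component is a set of (i, j) coordinates."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for i in range(rows):
        for j in range(cols):
            if img[i][j] == 1 and not seen[i][j]:
                comp, queue = set(), deque([(i, j)])
                seen[i][j] = True
                while queue:
                    a, b = queue.popleft()
                    comp.add((a, b))
                    # Edge-sharing (4-connected) neighbors, as in the definition above.
                    for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if 0 <= na < rows and 0 <= nb < cols and img[na][nb] == 1 and not seen[na][nb]:
                            seen[na][nb] = True
                            queue.append((na, nb))
                comps.append(comp)
    return comps
```

Running this on a binary image returns one pixel set per white region; the 1D case is the same idea applied to a sequence.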

We denote the segmentation results for a patient given by the well-trained \(\:{P}_{\theta\:}\) as \(\:\left\{{S}_{i}\right\}\). In some cases, \(\:{S}_{i}\) is chaotic and displays multiple connected branches, which violates the a priori knowledge of the integrity and continuity of the pericardium. However, it is difficult to explicitly encode such a priori information in \(\:{P}_{\theta\:}\) during the training stage. Based on the above observation, we propose a post-processing method to improve the segmentations given by \(\:{P}_{\theta\:}\). For a binary segmentation \(\:S\in\:\left\{{S}_{i}\right\}\), we write it as the sum of its connected components \(\:\left\{{C}_{j}\right\}\), i.e., \(\:S=\sum\:{C}_{j}\). Our post-processing method can be described by the following three steps.

  • Step 1. The area where the true pericardium intersects a slice should in general be large, and thus small connected components are removed. Mathematically, we set

$$\:S^{\left(1\right)}:={\sum\:}_{\left|C_j\right|>\epsilon\:}C_j,$$

where \(\:|\bullet\:|\) denotes the total number of pixels and the parameter \(\:\epsilon\:\) will be chosen carefully in the numerical experiments. For simplicity, we again write \(\:{S}^{\left(1\right)}\) as the sum of its connected components, still denoted \(\:\left\{{C}_{j}\right\}\), i.e., \(\:{S}^{\left(1\right)}=\sum\:{C}_{j}\).

  • Step 2. Since small noise components have been removed, we expect only one connected component to remain in \(S^{\left(1\right)}\); at the least, the largest connected component \(C_\ast\) should dominate \(S^{\left(1\right)}\). Otherwise, it is reasonable to deduce that there is no pericardium in \(S^{\left(1\right)}\). To be more specific, we set

$$\:S^{\left(2\right)}:=\left\{\begin{array}{c}C_\ast,\:\:\:if\:m\leq\:\delta\:\:and\:\left|C_\ast\right|>\gamma\:\left(\left|S^{\left(1\right)}\right|-\left|C_\ast\right|\right),\\\:0,\:\:\:otherwise.\end{array}\right.$$

Here, m represents the number of connected components. The parameters \(\:\delta\:\) and \(\:\gamma\:\) will be chosen carefully in the numerical experiment.

  • Step 3. Apply Steps 1 and 2 to each slice \(S\in\left\{S_i\right\}\) and obtain the segmentations \(\left\{S_i^{(2)}\right\}\). Then we define the sequence \(\left\{\alpha_i\right\}\;(\alpha_i\in\left\{0,1\right\})\) to indicate the presence of predicted pericardium in \(\left\{S_i^{(2)}\right\}\), where \(\alpha_i=0\) signifies its absence and \(\alpha_i=1\) indicates its presence. Since slices without the pericardium are distributed at the upper and lower ends of a series of CT slices, whereas slices with the pericardium are continuous, we further extract the largest connected component \(\left\{\alpha_{i,\ast}\right\}\) from \(\left\{\alpha_i\right\}\) and define

$$\:S_i^{\left(3\right)}:={\alpha\:}_{i,\ast}S_i^{\left(2\right)}.$$

Finally, \(\:\left\{{S}_{i}^{\left(3\right)}\right\}\) is set to be the improved segmentation results as the output of our post-processing method.

The above post-processing method is illustrated in Fig. 4, and its code is available at https://github.com/kxtang/MIDL-for-EAT/. In this way, the post-processing not only improves the pericardium segmentation results but also accomplishes the classification of slices with and without the pericardium. Finally, we compute the 3D EAT volume of a patient based on the improved segmentation results.
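For illustration, the three steps above can be condensed into a short stdlib sketch (an illustrative re-implementation, not the released code; masks are nested 0/1 lists and `eps`, `delta`, `gamma` play the roles of \(\:\epsilon\:\), \(\:\delta\:\), and \(\:\gamma\:\)):

```python
from collections import deque

def _components(img):
    """4-connected components of white pixels, each as a list of (i, j) coordinates."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for i in range(rows):
        for j in range(cols):
            if img[i][j] and not seen[i][j]:
                comp, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:
                    a, b = queue.popleft()
                    comp.append((a, b))
                    for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if 0 <= na < rows and 0 <= nb < cols and img[na][nb] and not seen[na][nb]:
                            seen[na][nb] = True
                            queue.append((na, nb))
                comps.append(comp)
    return comps

def postprocess(slices, eps, delta, gamma):
    cleaned = []
    for s in slices:
        # Step 1: remove small connected components (|C_j| <= eps).
        comps = [c for c in _components(s) if len(c) > eps]
        keep = set()
        if comps:
            # Step 2: keep the largest component only if it dominates the slice.
            comps.sort(key=len, reverse=True)
            largest = comps[0]
            rest = sum(len(c) for c in comps[1:])
            if len(comps) <= delta and len(largest) > gamma * rest:
                keep = set(largest)
        cleaned.append(keep)
    # Step 3: keep only the largest 1D run of consecutive pericardium-bearing slices.
    present = [1 if k else 0 for k in cleaned]
    best_len, cur_len, best_range, cur_start = 0, 0, (0, 0), 0
    for i, a in enumerate(present + [0]):  # sentinel 0 terminates the final run
        if a:
            if cur_len == 0:
                cur_start = i
            cur_len += 1
            if cur_len > best_len:
                best_len, best_range = cur_len, (cur_start, i + 1)
        else:
            cur_len = 0
    lo, hi = best_range
    return [k if lo <= i < hi else set() for i, k in enumerate(cleaned)]
```

Each output element is the retained pericardium pixel set for that slice (empty if the slice is classified as containing no pericardium).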

Figure 4
figure 4

The three steps of our post-processing method.

Numerical experiments

We present numerical experiments to demonstrate the effectiveness of our proposed algorithm for EAT quantification. The training process is performed on COLAB (Tesla P100 GPU, Linux operating system) and is implemented in PyTorch, while the proposed MIDL is implemented in Python 3.7 on a desktop computer (Intel Core i7-10700 CPU (2.90 GHz), 32 GB of RAM). For the selected 108 patients, we construct the whole dataset by pairing their original contrast CT matrices with the pericardium matrices labeled by the experts. Then, we divide the whole dataset into a training dataset (60 subjects, 2205 contrast CT slices), a validation dataset (8 subjects, 361 contrast CT slices) and a testing dataset (40 subjects, 1862 contrast CT slices). The training dataset is utilized for training the modified U-Net. Throughout this training stage, we employ the Adam optimizer30 alongside Xavier initialization31, utilizing a batch size of 5 and a learning rate of 0.001. To enhance the efficiency of data utilization, we also perform data augmentation on the original CCTA images by applying random rotations, flips, and translations. Using a grid search on the validation dataset that maximizes the Dice score, we choose the hyperparameters \(\:\epsilon\:=500\), \(\:\delta\:=5\), and \(\:\gamma\:=3\) for the proposed MIDL. Finally, the testing dataset is utilized to evaluate the overall performance of the proposed MIDL. Ablation experiments without the post-processing steps, as well as training and testing using nnU-Net with CCTA slices containing the pericardium, were conducted for comparison with MIDL.
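The hyperparameter selection can be sketched as an exhaustive grid search; here `validation_dice` is a hypothetical stand-in for running the full MIDL post-processing on the validation patients and returning the mean Dice score:

```python
import itertools

# Sketch: exhaustive grid search over (eps, delta, gamma).
# validation_dice(eps, delta, gamma) is assumed to return the mean validation
# Dice score for that parameter triple; the grids below are illustrative.

def grid_search(validation_dice, eps_grid, delta_grid, gamma_grid):
    best_params, best_score = None, float("-inf")
    for eps, delta, gamma in itertools.product(eps_grid, delta_grid, gamma_grid):
        score = validation_dice(eps, delta, gamma)
        if score > best_score:
            best_params, best_score = (eps, delta, gamma), score
    return best_params, best_score
```

With a candidate grid containing the reported values, the triple maximizing the validation Dice score would be returned, e.g. (500, 5, 3).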

Performance evaluation

In order to quantitatively evaluate the performance of our algorithms in EAT quantification, we introduce the Dice score coefficient (DSC) to quantify the overlap ratio between expert segmentation and automatic output in 2D CT slices and 3D volume. The calculation for DSC is defined as follows:

$$\:DSC\left({EAT}_{ex},\:{EAT}_{dl}\right)=\frac{2\left|{EAT}_{ex}\cap\:{EAT}_{dl}\right|}{\left|{EAT}_{ex}\right|+\left|{EAT}_{dl}\right|}$$

where \(\:{EAT}_{ex}\) represents the fat tissue within the pericardium manually delineated by the expert on each CT slice, and \(\:{EAT}_{dl}\) represents the EAT portion segmented automatically by the algorithm. A DSC value approaching one indicates better algorithm performance. Additionally, we compute the volume of an EAT segmentation using the following formula:

$$\:Vol\left(EAT\right)=\left|{EAT}_{seg}\right|\times\:{Spacing}_{x}\times\:{Spacing}_{y}\times\:{Spacing}_{z}$$

where \(\:{EAT}_{seg}\) is an EAT segmentation for a patient. We further define the Relative Error (RE) to evaluate the accuracy of volume quantification:

$$\:RE\left({EAT}_{ex},\:{EAT}_{dl}\right)=\frac{|Vol\left({EAT}_{ex}\right)-Vol\left({EAT}_{dl}\right)|}{Vol\left({EAT}_{ex}\right)}$$
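The three metrics above admit a direct sketch (an illustrative stand-alone implementation; segmentations are represented here as sets of voxel coordinates, and the spacing values in the test are hypothetical example numbers in mm):

```python
# Sketch of the evaluation metrics. EAT_ex / EAT_dl are sets of voxel
# coordinates; spacing_x/y/z are the voxel dimensions (mm).

def dsc(eat_ex, eat_dl):
    """Dice score coefficient between expert and automatic segmentations."""
    if not eat_ex and not eat_dl:
        return 1.0
    return 2 * len(eat_ex & eat_dl) / (len(eat_ex) + len(eat_dl))

def volume(eat_seg, spacing_x, spacing_y, spacing_z):
    """EAT volume: voxel count times voxel size."""
    return len(eat_seg) * spacing_x * spacing_y * spacing_z

def relative_error(eat_ex, eat_dl, spacing):
    """Relative error of the automatic volume against the expert volume."""
    v_ex = volume(eat_ex, *spacing)
    v_dl = volume(eat_dl, *spacing)
    return abs(v_ex - v_dl) / v_ex
```

The same DSC function applies to a single 2D slice or to the stacked 3D volume, depending on which coordinate sets are passed in.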

Moreover, the comparison between the volumes obtained from the MIDL measurements and the expert measurements is conducted using the Pearson correlation coefficient, Bland-Altman analysis and a paired t-test. Intraclass correlation coefficient (ICC) analysis is employed to evaluate the consistency of the EAT measurements performed by the initial two experts.
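For concreteness, the Pearson correlation and Bland-Altman statistics can be computed as follows (a stdlib sketch on toy paired volumes, not the study data; the 1.96 factor gives the usual 95% limits of agreement):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def bland_altman(xs, ys):
    """Bias (mean difference) and 95% limits of agreement of paired measurements."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```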

Results

Manual segmentation results

The initial measurements conducted by the two experts for the average EAT volume of the 108 patients included in this study were 96.17 ± 36.2 cm³ and 96.93 ± 36.8 cm³, respectively. The agreement between the two experts is excellent, as evidenced by an ICC of 0.993, with p < 0.001. The scatterplot in Fig. 5 illustrates the results of the manually measured EAT volumes by the two experts.

Figure 5
figure 5

Expert 1 vs. expert 2 EAT quantifications. A high ICC is obtained (ICC = 0.993, p < 0.001).

Performance of the proposed algorithm

We evaluate the performance of our algorithms using the testing dataset. To illustrate the generalization ability of our algorithm, we present representative EAT segmentations produced by the proposed MIDL on testing contrast CT slices in Fig. 6. Each row in the figure shows the segmentation outcomes of MIDL and the expert for three contrast CT slices, corresponding to the superior, median, and inferior regions of the heart. The red area denotes the EAT. The segmentation results depicted in Fig. 6 demonstrate the capability of the proposed algorithm to produce satisfactory outcomes across diverse segments of the heart.

We then quantitatively assess the performance of the proposed deep learning-based algorithm. Initially, we compute the DSC between the EAT within the expert-delineated pericardium and the approximate pericardium generated by the proposed algorithm for each contrast CT slice. The median DSC for the 2D slice obtained by MIDL is 0.916 (IQR: 0.846–0.948). Meanwhile, the median DSC for the 3D volume obtained is 0.896 (IQR: 0.874–0.908) between these two measures, with an average DSC of 0.890. We also compare the EAT volume quantification estimated by the proposed MIDL algorithm and the expert quantification for each patient in the test set. The median EAT volume measures 96.41 cm³ (IQR: 69.99-121.25) and 101.51 cm³ (IQR: 69.85-123.16) for expert and automatic quantifications, respectively. A strong correlation of 0.980 (p < 0.001) is observed (Fig. 7). The Bland-Altman analysis reveals a low bias of -2.39 cm³ (95% limits of agreement: [-19.39, 14.62]) (Fig. 8). No significant differences are noted between the two distributions (p = 0.089). These quantitative findings substantiate that the proposed algorithm yields satisfactory results for EAT quantification.

Figure 6
figure 6

Segmentation results of our algorithm. Each row presents the segmentation results of MIDL and the expert for three contrast CT slices. Top, middle, and bottom rows correspond to superior, median, and inferior region of the heart, respectively. The red region indicates the EAT.

Figure 7
figure 7

MIDL vs. expert EAT quantifications. An excellent agreement is obtained with high correlation (R = 0.980, p < 0.001).

Figure 8
figure 8

Bland-Altman analysis of MIDL and expert EAT quantifications. The analysis demonstrates a non-significant bias between the two measures (-2.39 cm³, p = 0.089).

Performance comparison

In this subsection, we compare the proposed method with existing deep learning-based segmentation methods. Table 1 compares the DSC and correlation coefficient of MIDL with those of other deep learning-based EAT quantification methods, where the results are cited from the original articles. The results presented in Table 1 demonstrate that MIDL is highly competitive among advanced EAT quantification methods.

In the field of medical image segmentation, the nnU-Net framework has become a powerful baseline. To demonstrate the capabilities of our algorithm, we compare MIDL with U-Net and the nnU-Net framework, with both comparisons conducted on the testing dataset. Figure 9 illustrates the 2D segmentation results for several contrast CT slices. Each row shows the EAT segmentation outcomes of the three algorithms, along with the expert segmentation result, for a given contrast CT slice. Figure 10 presents examples of 3D EAT segmentation results using U-Net, nnU-Net, and MIDL, highlighting the comparative performance of these methods in 3D visualization. To compare their performance quantitatively, we report the median DSC relative to the expert for these three methods, the mean of relative errors (MRE), and their correlation coefficients in Table 2. These results demonstrate that the proposed MIDL effectively preserves the anatomical characteristics of the pericardium and outperforms both U-Net and nnU-Net in our numerical experiments.

Table 1 Comparison of MIDL to previously reported methods.
Figure 9
figure 9

2D segmentation results of different methods. Each row presents the EAT segmentation results of three algorithms and the expert segmentation for one contrast CT slice. The red part indicates the EAT.

Figure 10
figure 10

3D segmentation results of different methods. Each row presents the 3D EAT segmentation results from three algorithms along with the expert segmentation for the CCTA series. Each column displays two different views of the 3D EAT from two patients.

Table 2 The median DSC, IQR, MRE, and correlation coefficient of EAT volume for these three algorithms in the testing set.

Discussion

In this paper, we propose and evaluate a novel MIDL method that combines data-driven techniques and professional medical knowledge for fully automated quantification of EAT. The complete architecture of our proposed deep learning method is shown in Fig. 11. To our knowledge, the application of a deep learning approach enriched with comprehensive medical insights for EAT quantification in CCTA data has not been previously explored. In our algorithm, given a series of contrast axial CT slices, we first employ a well-trained modified U-Net \(\:{P}_{\theta\:}\) to predict the pericardium of each CT slice, and then each predicted pericardium is regularized by a post-processing method that encodes existing medical knowledge. The detailed description and code for the post-processing method are available at https://github.com/kxtang/MIDL-for-EAT/. Various numerical experiments demonstrate that the EAT quantification results obtained by our proposed method exhibit strong consistency with expert manual measurements.

The significance of EAT in cardiovascular disease risk assessment is increasingly validated32,33. However, the manual measurement of EAT often necessitates assistance from skilled professionals and lacks precision due to its variable distribution and the complexity of surrounding structures34. Compared to expert manual segmentation, our proposed MIDL presents a time-saving solution for fully automated EAT quantification in clinical settings, without subjecting patients to additional radiation exposure or requiring physician interaction. Specifically, the runtime of MIDL for a patient's series of contrast CT slices is typically under 5 s on a standard computer, compared to approximately 20 min required for expert-performed EAT quantification. These advantages could facilitate the clinical application of EAT quantification in routine practice, thereby enhancing cardiovascular risk assessment.

We use a modified U-Net in combination with a post-processing method that accounts for the anatomical and surgical integrity and continuity of the pericardium for EAT quantification. In a previous study, researchers introduced an enhanced multi-task approach by simultaneously training different architectures to develop a deep learning model capable of cardiac CT slice classification and tissue segmentation21. In our proposed MIDL, however, the modified U-Net is dedicated solely to EAT segmentation within each contrast CT slice. It is important to note that this modified U-Net is trained exclusively on contrast CT slices containing the pericardium. This training strategy focuses the neural network solely on the segmentation task, potentially yielding more accurate segmentation outcomes when presented with contrast CT slices containing the pericardium. Additionally, such a strategy is anticipated to alleviate the training burden of the neural network, thereby simplifying the training process. Ideally, the corresponding prediction \(\:{S}_{i}\) should exhibit connectivity due to the anatomical and surgical integrity of the pericardium. However, near the upper and lower ends of the heart and in CT slices without the pericardium, the output generated by \(\:{P}_{\theta\:}\) tends to be chaotic, resulting in amplified noise that could affect the final quantification outcome. This aligns with common knowledge, as these areas are generally challenging even for manual identification, resulting in poorer predictions from neural networks in these regions. Given the general lack of interpretability in deep learning methods, ensuring the reliability of the segmentation provided by \(\:{P}_{\theta\:}\) is challenging. We therefore employ a concept based on connected components to enhance the segmentation results of the pericardium.
In practical scenarios, a series of CT slices from a patient is obtained, consisting of slices with pericardium and slices at the upper and lower ends without pericardium. To achieve fully automated EAT quantification, we also incorporate the classification of slices with and without pericardium into the post-processing stage. This combined approach not only enhances the segmentation results but also effectively addresses the classification of the presence or absence of the pericardium.

The comparison with other deep learning-based methods shows the advantages of the proposed MIDL. Some of these methods employ 3D neural networks to perform EAT segmentation23,24. Since the number of slices in a series varies among patients, it is hard to unify the input size of a 3D neural network. A standard procedure is to resample the input slice series; however, the interpolation involved may destroy valuable structural information in the original slices. Based on this observation, the proposed MIDL employs a 2D modified U-Net for EAT segmentation. Some quantification methods are semi-automatic22, whereas the proposed MIDL is fully automatic. Additionally, whereas some methods first classify each CT slice for the existence of the pericardium21, we simply use the post-processing method to achieve a high classification accuracy of 94.48%, which reduces the computational resources required. The comparisons in Table 1 show that our method generates satisfactory segmentation and quantification results. General medical image segmentation methods, such as nnU-Net, typically focus on exploring novel network structures to enhance segmentation performance. In contrast, the proposed MIDL uses the modified U-Net for initial segmentation and then employs a post-processing method for further improvement. Note that our post-processing method utilizes a priori knowledge of the integrity and continuity of the pericardium, and the modified U-Net can also be replaced by other powerful backbones. The numerical experiments show that the proposed MIDL performs better in terms of 2D and 3D Dice scores than the well-known nnU-Net framework.

Our study also has several limitations. We relied on single-center CT datasets and a limited number of expert readers for supervision during training, potentially impacting the generalizability of our framework. Future work will assess the performance of our model on larger, more diverse multi-center datasets. Notably, our proposed MIDL exhibited lower accuracy in quantifying EAT for certain patients. These outliers mostly corresponded to contrast CT slices from the superior and inferior regions of the heart, where even expert identification of the pericardial lines proved challenging, contributing to the reduced accuracy of MIDL in EAT quantification. To address these challenges, our future research will involve augmenting our dataset with similar CT slices and incorporating different data augmentation methods for more robust neural network training. In addition, using a single CT slice as input for the neural network might not fully leverage the structural similarity among adjacent slices; employing neighboring slices as input could potentially enhance the pericardium prediction for the target slice.

Figure 11
figure 11

The framework of our proposed deep learning method.

Conclusion

In this study, we propose and evaluate a novel MIDL method that merges data-driven techniques with specialized medical knowledge to achieve fully automated and precise EAT quantification. The results demonstrate the potential for improved EAT quantification when leveraging deep learning methods alongside the crucial medical expertise utilized by our manual labeling experts.