Introduction

Breast cancer ranks as one of the most prevalent malignant diseases affecting women globally. The International Agency for Research on Cancer (IARC)1 reported 19.29 million new cancer cases worldwide in 2020. The most striking change was the rapid increase in new breast cancer cases to 2.26 million, making breast cancer the world's leading cancer for the first time, overtaking lung cancer (2.2 million cases) and accounting for 11.7% of all new cancer cases.

Breast cancer assessment involves a three-pronged approach: appropriate imaging, clinical evaluation, and needle aspiration biopsy. Mammography is the preferred imaging technique for breast cancer screening, serving as a crucial tool for detecting lesions and identifying early alterations in breast tissue, particularly before the onset of physical symptoms. Early detection via mammography can significantly enhance the likelihood of recovery2.

Gene expression profiling has emerged as a reliable method for categorizing breast cancers into distinct molecular subtypes based on genetic differences between tumors. Four molecular subtypes are recognized, each with distinct gene expression patterns and prognostic implications: luminal A, luminal B (both HER2-negative and HER2-positive), HER2-positive, and triple-negative3,4,5. These subtypes vary considerably in phenotypic presentation, treatment response, and survival outcomes6. The amplification of the HER2 gene has proven crucial in breast cancer treatment, leading to targeted therapies such as trastuzumab, which have shown significant success. Historically, survival rates for patients with HER2-positive breast cancer were low, but the introduction of HER2-targeted systemic therapies has markedly improved survival outcomes7,8. The identification of HER2 gene amplification through fluorescence in situ hybridization (FISH) or HER2 protein overexpression via immunohistochemistry (IHC) is therefore key to recognizing patients who may benefit from these targeted treatments. Additionally, the identification of estrogen receptors (ER) and progesterone receptors (PR) represents another significant breakthrough in breast cancer therapy. The majority of breast cancers overexpress ER and PR, and hormone receptor-targeting medications such as tamoxifen have considerably enhanced survival among patients with hormone receptor-positive breast cancer9,10.

Computer-aided diagnosis (CAD) systems are designed to provide objective insights to assist radiologists in interpreting and diagnosing medical images. These systems are particularly effective in distinguishing between benign and malignant lesions. In recent years, deep learning11,12,13, especially convolutional neural networks (CNNs), has proven highly effective in vision tasks. Several recently proposed CAD systems have used well-known deep learning techniques and obtained promising results14,15,16. Within the ___domain of breast cancer, many researchers have explored the application of deep learning to mammography to predict the nature of breast lumps17,18,19,20. However, more intensive research is needed, particularly on using deep learning to predict HER2, ER, and PR status from mammography.

Previous studies have demonstrated a relationship between molecular features and mammographic features. For instance, luminal subtype tumors typically appear oval on mammography21, while triple-negative tumors often manifest as poorly defined masses or with acanthotic calcifications22. The HER2-positive subtype is characterized by calcification and architectural distortion21,23. Previous studies have also demonstrated the ability of deep learning models to predict the molecular features of breast masses.

Nonetheless, previous investigations18,24,25,26,27 used manually segmented masses as input, which may not work well for breast cancers without prominent masses and may introduce bias. We therefore sought to use deep learning to predict the HER2, ER, and PR status of breast cancer from whole mammogram images. In this study, aiming to spare patients the pain of needle biopsy and the risk of tumor dissemination along the puncture tract, a CNN model was established to predict the molecular features of breast cancer using whole mammogram images. Moreover, the model we constructed does not require manual segmentation of masses, which saves considerable labor and reduces cost.

Materials

Study design

The study was a retrospective, multicentre investigation approved by the institutional review boards of all participating hospitals (Huazhong University of Science and Technology Union Shenzhen Hospital, Shenzhen, China, and Second Hospital of Shandong University, Jinan, China). The patients were informed that their de-identified images could be used for research purposes and provided their consent. The patients were sourced from the two local hospitals mentioned above, and the enrolled patients comprised two datasets: (1) a training dataset from Huazhong University of Science and Technology Union Shenzhen Hospital; (2) an independent test dataset from Second Hospital of Shandong University. A deep learning-based model was trained and validated using the training dataset to classify HER2-positive or negative subtypes, ER positive or negative, and PR positive or negative. Subsequently, the developed model was tested using the independent test dataset.

Study population

As shown in Fig. 1, for the training dataset, Digital Imaging and Communications in Medicine (DICOM) mammography data were collected from 237 patients diagnosed with breast cancer between January 2018 and November 2021; for the test dataset, DICOM mammography data were collected from 155 patients diagnosed with breast cancer between January 2020 and May 2022. Neither dataset contains overlapping data. The inclusion criteria were as follows: (1) breast malignancy confirmed by surgical or puncture pathology; (2) preoperative mammography; and (3) pathological results for ER and PR status by IHC and for HER2 status by IHC and FISH. The exclusion criteria were as follows: (1) lack of clinical information; (2) lack of pathological findings; (3) lack of IHC results; (4) lack of FISH results when HER2 was 2+ on IHC; and (5) mammography images acquired after chemotherapy, targeted therapy, or endocrine therapy. In the training dataset, 184, 203, and 202 patients were finally included for training the HER2, ER, and PR status predictions, respectively. In the test dataset, 146, 148, and 149 patients were finally included for testing the HER2, ER, and PR status predictions, respectively.

Fig. 1 Flowcharts for the training dataset and test dataset.

Methodology

Histopathology

HER2, ER, and PR analysis of surgical pathology specimens: two or more pathologists, each with more than 10 years of professional experience in breast pathology, re-examined all breast cancer samples to confirm HER2, ER, and PR status. According to the current American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) guidelines, HER2 positivity was defined as an IHC result of 3+, or a result of 2+ with amplification on FISH (HER2/CEP17 ratio > 2.0); otherwise, it was considered negative28. According to the current guidelines in China29,30, hormone receptor (ER and PR) status was considered positive when the value was ≥ 1%; otherwise, it was considered negative. These labeling rules are summarized in the sketch below.
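To make the labeling rules concrete, the sketch below encodes them as simple predicates. The thresholds follow the cited ASCO/CAP and Chinese guidelines; the function names and interfaces are our own illustration, not code from the study.

```python
# Hedged encoding of the receptor-status rules described above; the thresholds
# come from the cited guidelines, the functions themselves are illustrative.

def her2_positive(ihc_score: int, fish_ratio: float | None = None) -> bool:
    """HER2 is positive for IHC 3+, or IHC 2+ with FISH amplification."""
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        # IHC 2+ is equivocal; FISH decides (HER2/CEP17 ratio > 2.0 = amplified).
        return fish_ratio is not None and fish_ratio > 2.0
    return False

def hormone_receptor_positive(stained_cell_percent: float) -> bool:
    """ER/PR is positive when at least 1% of tumor cells stain positive."""
    return stained_cell_percent >= 1.0
```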

Image acquisition

Mammographic images were acquired on a Selenia Dimensions system (Hologic, Inc.). Its image acquisition system includes a digital image receptor covering an area of 24 cm × 29 cm: a direct-capture detector using an amorphous selenium photoconductor. Standard 2D mammograms were acquired in craniocaudal (CC) and mediolateral oblique (MLO) views with the Combo protocol, which controls exposure automatically. Images were stored as 12-bit grayscale in DICOM format with a resolution of 4096 × 3328 (rows × columns) and a pixel spacing of 0.07 mm × 0.07 mm.

Image pre-processing

Interfering information in the original DICOM images, such as machine edges, orientation labels, and empty padding, was removed. Images of the right breast were horizontally flipped so that all images share the same orientation. The images were then resized to 512 × 256 (height × width) by nearest-neighbor interpolation to obtain standardized images, and the two views were concatenated horizontally. A minimal sketch of this pipeline is shown below.
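The following is a hedged sketch of the pre-processing described above, assuming pydicom and OpenCV. The file names are placeholders, and orientation-label removal, which requires more involved masking in practice, is simplified here to a crop of empty background padding.

```python
# Hedged pre-processing sketch (not the study's published code).
import numpy as np
import pydicom
import cv2

def preprocess_view(dicom_path: str, is_right_breast: bool) -> np.ndarray:
    ds = pydicom.dcmread(dicom_path)
    img = ds.pixel_array.astype(np.float32)

    # Remove empty padding: keep the bounding box of non-background pixels.
    mask = img > img.min()
    img = img[np.ix_(mask.any(axis=1), mask.any(axis=0))]

    # Flip right-breast images so all breasts share the same orientation.
    if is_right_breast:
        img = np.fliplr(img)

    # Resize to 512 x 256 (height x width) with nearest-neighbor interpolation;
    # note that cv2.resize takes (width, height).
    img = cv2.resize(img, (256, 512), interpolation=cv2.INTER_NEAREST)

    # Scale 12-bit intensities to [0, 1].
    return img / 4095.0

# Concatenate the CC and MLO views side by side into one 512 x 512 input.
cc = preprocess_view("example_CC.dcm", is_right_breast=True)
mlo = preprocess_view("example_MLO.dcm", is_right_breast=True)
model_input = np.concatenate([cc, mlo], axis=1)
```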

Deep learning analysis

As shown in Fig. 2, the input image is a pre-processed normalized image without tumor segmentation.

Fig. 2 Architecture of the deep learning model.

The proposed model uses ResNet-18 as its backbone. ResNet-18 starts with a 7 × 7 convolutional layer followed by a 3 × 3 max-pooling layer. It includes four residual stages, each containing four convolutional layers, with skip connections applied every two layers. The features extracted by these convolutional layers are reduced in dimensionality by a global average pooling layer. Finally, a linear layer with a sigmoid activation function maps the output to a single floating-point number in the range [0, 1], representing the prediction of the target variable. To improve performance on unsegmented images, a Convolutional Block Attention Module (CBAM)31 is inserted between adjacent residual stages. CBAM is a lightweight attention mechanism that can be integrated into existing convolutional neural networks. It consists of two sequential sub-modules: the channel attention module (CAM) and the spatial attention module (SAM). First, the CAM computes an attention map by applying average pooling and max pooling along the spatial dimensions (height and width), followed by two fully connected layers and a sigmoid function; this map recalibrates the channel-wise features. Next, the SAM generates an attention map by applying average pooling and max pooling along the channel dimension, followed by a 7 × 7 convolutional layer with a sigmoid function; this map highlights important spatial regions in the feature maps. Together, the two modules enhance the representational power of the model by focusing on both informative channels and informative spatial regions. A hedged sketch of the module and its insertion into the backbone is given below.

During training, data augmentation such as random cropping and random salt-and-pepper noise was applied, and an early stopping strategy32 was used to prevent overfitting. We employed the binary cross-entropy loss function and the Adam optimizer, with the learning rate controlled by a cosine annealing schedule. To fine-tune the hyperparameters, we used 5-fold cross-validation; the hyperparameter search space is detailed in Table 1. Once the hyperparameters were determined, the model was trained on the whole training dataset and tested on our independent test dataset.
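As a concrete illustration, the sketch below implements CBAM and inserts it between the residual stages of a torchvision ResNet-18. It is a minimal reconstruction from the description above, not the study's code; the reduction ratio, the single-channel input stem, and the optimizer settings are assumptions.

```python
# Minimal CBAM-ResNet-18 sketch in PyTorch, reconstructed from the text above.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Two fully connected layers shared by the average- and max-pooled paths.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Pool along the spatial dimensions (height and width).
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        scale = torch.sigmoid(avg + mx)[:, :, None, None]
        return x * scale  # recalibrate channel-wise features

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool along the channel dimension, then a 7x7 convolution + sigmoid.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))  # highlight spatial regions

class CBAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()

    def forward(self, x):
        return self.sam(self.cam(x))

# ResNet-18 backbone with CBAM between adjacent residual stages and a single
# sigmoid output; the 1-channel stem matches the grayscale input assumed above.
model = resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.layer1 = nn.Sequential(model.layer1, CBAM(64))
model.layer2 = nn.Sequential(model.layer2, CBAM(128))
model.layer3 = nn.Sequential(model.layer3, CBAM(256))
model.fc = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid())

# Training pieces named in the text (learning rate and T_max are placeholders,
# not the tuned hyperparameters from Table 1).
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
```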

Table 1 The hyperparameter search space details.

Models comparison

For comparison, we also developed the original ResNet-18 without the attention module, as well as VGG-19 with and without CBAM. The Visual Geometry Group (VGG) convolutional neural network is a well-known architecture in the field of image recognition, initially proposed by the Visual Geometry Group at the University of Oxford. VGG-19 comprises 19 weight layers in total: 16 convolutional layers followed by 3 fully connected layers. Prediction is again performed through a sigmoid function. A study by Praharsha et al.33 demonstrated the strong classification performance of their proposed CBAM VGG architecture. For a fair comparison, all models were trained and tested with the same datasets, hyperparameters, and random seeds. A hedged sketch of these baselines is given below.
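The following sketch builds the VGG-19 baselines, reusing the CBAM module from the previous sketch. Placing CBAM after each pooling stage is our assumption, loosely following Praharsha et al.33; the exact placement is not specified here.

```python
# Baseline sketch: VGG-19 with and without CBAM (CBAM as defined in the
# previous sketch). CBAM placement after each pooling stage is an assumption.
import torch.nn as nn
from torchvision.models import vgg19

def build_vgg19(with_cbam: bool) -> nn.Module:
    base = vgg19(weights=None)
    # Single-channel input stem, matching the pre-processed grayscale images.
    base.features[0] = nn.Conv2d(1, 64, kernel_size=3, padding=1)
    if with_cbam:
        layers, channels = [], 1
        for layer in base.features:
            layers.append(layer)
            if isinstance(layer, nn.Conv2d):
                channels = layer.out_channels
            if isinstance(layer, nn.MaxPool2d):
                layers.append(CBAM(channels))  # attention after each stage
        base.features = nn.Sequential(*layers)
    # Replace the 1000-way classifier head with a single sigmoid output.
    base.classifier[-1] = nn.Sequential(nn.Linear(4096, 1), nn.Sigmoid())
    return base

vgg_plain = build_vgg19(with_cbam=False)
vgg_cbam = build_vgg19(with_cbam=True)
```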

Statistical analysis

The area under the receiver operating characteristic curve (AUC) was used as the primary metric to assess the model's ability to predict HER2 status (HER2-positive vs. HER2-negative), ER status (ER-positive vs. ER-negative), and PR status (PR-positive vs. PR-negative); 95% CIs were calculated for each task. In addition, we calculated specificity, accuracy, precision, recall, and F1-score. To visualize the contribution of the attention modules, class activation maps before and after each attention block are shown in Fig. 2, where red indicates regions more important for prediction (HER2-, ER-, or PR-positive regions); these maps suggest that the attention blocks learn to focus on regions related to HER2, ER, and PR status, including tumor and peritumoral regions. All deep learning development was conducted in Python with PyTorch. Statistical analysis was performed with SPSS (version 24; Chicago, IL, USA) and R (v. 4.2.0; R Foundation for Statistical Computing, Vienna, Austria). The DeLong test was used to compare the performance of the different approaches. All statistical tests were two-sided, and a P-value less than 0.05 was considered to indicate a statistically significant difference. An illustrative computation of these metrics is sketched below.
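As an illustration, the reported metrics could be computed as in the following scikit-learn sketch. A percentile bootstrap stands in for the study's exact CI method, and the DeLong comparison itself was performed in R, so it is not reproduced here.

```python
# Illustrative evaluation sketch; the bootstrap CI is an assumed stand-in.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_prob, threshold=0.5, n_boot=2000, seed=0):
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    # Percentile bootstrap for a 95% CI on the AUC.
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) == 2:  # AUC needs both classes present
            aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    ci_low, ci_high = np.percentile(aucs, [2.5, 97.5])

    return {
        "auc": roc_auc_score(y_true, y_prob),
        "auc_95ci": (ci_low, ci_high),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```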

Results

Patient population and lesion characteristics

The flowcharts for the datasets are presented in Fig. 1. The fundamental demographic and clinicopathological characteristics of the patients included in the training dataset and test dataset are presented in Table 2.

Table 2 Baseline characteristics of the patients.

Deep learning analysis

Figure 3 and Table 3 show the outcomes of model training for HER2, ER, and PR. The mean AUCs over 5-fold cross-validation were 0.752 for HER2, 0.733 for ER, and 0.756 for PR.

Table 3 Results of 5-fold cross-validation.
Fig. 3 ROC curves for the training dataset (5-fold cross-validation). (a) HER2 training dataset, mean AUC = 0.752; (b) ER training dataset, mean AUC = 0.733; (c) PR training dataset, mean AUC = 0.756.

The results of the model tests for HER2, ER, and PR are presented in Fig. 4 and Table 4. The AUCs with 95% confidence intervals were 0.708 (0.609, 0.808) for the HER2 test dataset, 0.785 (0.673, 0.897) for the ER test dataset, and 0.706 (0.603, 0.809) for the PR test dataset. Table 4 also presents the specificity, accuracy, precision, recall, and F1-score results, and Table 5 shows the AUCs of the original ResNet-18 without the attention module and of VGG-19 with and without the attention module.

Table 4 Performance of the CBAM ResNet-18 in test dataset.
Table 5 Performance of the original ResNet-18 without attention module and VGG-19 with and without attention module.
Fig. 4 ROC curves for the test dataset. (a) HER2 test dataset; (b) ER test dataset; (c) PR test dataset.

Discussion

HER2 positivity in breast cancer patients, i.e., overexpression and/or amplification of HER2, is associated with a more aggressive tumor phenotype, malignant transformation of cells, poorer clinical outcomes34, and increased resistance to endocrine therapy35. Historically, survival rates for patients with HER2-positive breast cancer were low. However, the advent of systemic therapy with HER2-targeted agents has led to a significant improvement in survival outcomes8. Similarly, the discovery of ER and PR, and the development of medicines targeting these hormone receptors, such as tamoxifen, has resulted in a notable enhancement in the survival of patients with hormone receptor-positive breast cancer9,10. It is therefore imperative that HER2, ER, and PR status be determined in order to ensure an optimal treatment plan for breast cancer patients.

The clinicopathological characteristics of a tumor are critical in determining the best treatment. Deep learning analysis allows for rapid, non-invasive study of the entire tumor and each detected lesion. The purpose of this retrospective study was to determine whether deep learning models can characterize the pathological features of malignant breast lesions on mammography.

Previous research on mammography and deep learning has focused on developing CAD systems for classifying breast masses as benign or malignant, and has achieved good diagnostic efficacy. The study by Moraru et al.36 showed that DeepLabV3+ achieves significantly better segmentation performance than the U-Net network. Shen et al.27 used four models, ResNet-ResNet, ResNet-VGG, VGG-VGG, and VGG-ResNet; on an independent test set of digitized film mammograms from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM), the best single model had a per-image AUC of 0.88, and averaging the four models improved the AUC to 0.91 (sensitivity: 86.1%, specificity: 80.1%); on an independent test set of full-field digital mammography (FFDM) images from the INbreast database, the best single model had a per-image AUC of 0.95, and averaging the four models improved the AUC to 0.98 (sensitivity: 86.7%, specificity: 96.1%). Al-Antari et al.18 classified detected and segmented masses as benign or malignant with You-Only-Look-Once (YOLO) on the INbreast database, with an AUC of 94.78%. Ribli et al.26 classified detected and segmented masses as benign or malignant with Faster R-CNN on the INbreast database and obtained an AUC of 0.95. Another CAD system using three deep learning models (VGG16, ResNet50, and Inception v3)20 was tested on an independent database (MIAS) and obtained an accuracy of 98.23% and an AUC of 0.99.

Furthermore, studies on mammography and deep learning for HER2, ER, and PR status prediction are scarce, or their results are unsatisfactory. Ueda et al.25 used a deep learning model to classify breast cancer receptor expression from mammography, with test-set AUCs of 0.67 for ER status, 0.61 for PR status, and 0.75 for HER2 status. According to Zhou et al.37, the AUCs of a logistic regression model assessing HER2 status from quantitative radiomic features were 0.846 in the training set and 0.787 in the test set; however, that study determined HER2 status by IHC alone and was based on a small sample (n = 91), so its results may be biased. Dominique, C. et al. used deep learning analysis of contrast-enhanced spectral mammography (CESM) to identify histoprognostic factors of malignant breast tumors, with good results for identifying ER status (AUC = 0.83) and triple-negative breast cancers (AUC = 0.90); however, it was not effective in detecting HER2 status (AUC = 0.59), and the model relies on CESM, which limits its generalizability.

It should be noted that in the above studies, suspicious masses on the mammography images needed to be manually segmented before being input to the models, whereas the model we constructed did not require manual segmentation of masses and obtained more desirable AUCs, especially for ER status (for HER2, ER, and PR, the AUCs were 0.708, 0.785, and 0.706, respectively). This means that our model is more objective and offers greater savings in time and cost. Furthermore, our results show that breast cancer samples can be classified into molecular subtypes from mammograms.

The study by Praharsha et al.33 illustrates the distinctive classification capability of their proposed CBAM VGG architecture. We therefore also developed the original ResNet-18 without the attention module, and VGG-19 with and without the attention module, for comparison. The results show that the proposed model outperforms the original ResNet-18 without the attention module as well as VGG-19 with and without the attention module.

There are several limitations to our study. First, it is theoretically possible for a patient to present with multiple lesions in the same breast, each expressing a different molecular subtype. Our study lacks an analysis of this circumstance, as no enrolled patient presented with such a condition. Nevertheless, even in such cases, Grad-CAM could effectively visualize the model's predictions, as illustrated in Fig. 2, enabling visualization of the region associated with each subtype. Moreover, in accordance with established guidelines, a patient diagnosed with breast cancer lesions of different molecular subtypes should follow the prescribed treatment plan for each lesion, so this circumstance does not affect the selection of appropriate therapeutic options. Furthermore, the sample size of the study is relatively modest, although data augmentation was employed to mitigate this limitation.

Conclusion

Our study demonstrates that convolutional neural network (CNN) models based on mammography can be used to estimate the expression of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) in breast cancer. This approach holds promise as a noninvasive and cost-effective method for assessing tumor biology before the initiation of initial treatment. It is important to note that this technique should not be used to guide clinical treatment until further high-level evidence demonstrates its safety; core needle biopsy (or vacuum-assisted biopsy) remains the gold standard for pre-treatment molecular subtype classification. Further prospective studies are required to more fully elucidate the correlation between imaging characteristics and cancer biosignatures. In the future, we intend to incorporate additional clinical data on breast cancer, including breast MRI and ultrasound, into our CNN models to enhance their accuracy in estimating HER2, ER, and PR expression.

In conclusion, our results suggest that deep learning analysis of mammography has the potential to determine HER2, PR, and especially ER status and serve as an adjunctive diagnostic tool for breast cancer.