Background

Invasive lobular carcinoma (ILC) is the most common special subtype of breast cancer, accounting for 10–15% of all breast cancers1,2,3, and arises from the lobules and terminal ducts4. Loss of E-cadherin, a key feature of this tumor type, is due to CDH1 mutation, which in the majority of cases occurs together with heterozygous deletion of chromosome 16q5,6,7. ILC has historically been characterized as an indolent, slowly progressive disease8. In recent years, the incidence of this “special” subtype has increased, due in part to greater public awareness of cancer prevention and to advances in diagnostic modalities9,10.

There is increasing recognition that ILC has distinct clinical, histologic, molecular, and biological characteristics compared with invasive carcinoma of no special type (NST), formerly known as invasive ductal carcinoma (IDC)6,11,12,13,14,15,16. Clinically, ILC tends to be bilateral and multifocal and to present at higher tumor (T) or node (N) stages; pathologically, it tends to be estrogen receptor (ER) positive and/or progesterone receptor (PR) positive and human epidermal growth factor receptor 2 (HER2/neu) negative. Although these tumors have a better clinical prognosis initially, they have an unfavorable long-term outcome. They tend to recur late, and metastases can be found at uncommon sites, including the gastrointestinal tract, cerebrospinal fluid, peritoneal sites, pelvic organs, and leptomeninges16,17,18,19,20,21. ILC has several histological subtypes. Up to 55% of cases belong to the classical subtype, which is ER and PR positive and HER2 negative22, whereas higher HER2 positivity is seen in the pleomorphic variant (30–80%)9,23,24.

As noted above, ILC is a subset in which “clinicopathologic features” can conflict with “long-term outcome”. Although ILC is a unique subtype of breast cancer, its treatment strategies and clinical risk assessment are derived from randomized clinical trials dominated by NST, which may explain why the St Gallen International Expert Consensus guidelines and the National Comprehensive Cancer Network (NCCN) still recommend that ILC be treated with the same paradigms as NST25. While our ability to tailor treatment has increased for many patients with breast cancer, the optimal treatment strategy for those with discordant risk remains unclear. Given the active investigation into the best treatment strategy for patients with discordant risk profiles, further investigation into ILC-specific prognostic and predictive tools is warranted in an era of individualized therapy.

The American Joint Committee on Cancer (AJCC) TNM staging system reflects cancer status mainly through anatomical factors and does not consider patient-specific factors. In other words, even patients with the same TNM classification who receive similar therapeutic regimens may have markedly different survival26. It is therefore important to seek a more individualized prognostic model. In recent years, nomograms have been used extensively to predict the prognosis of a variety of cancers, largely because they simplify a prediction model into a single estimate of event probability that is customized for each individual patient27. However, nomograms have been underused in ILC prognosis, owing to small patient numbers and limited appreciation of this distinct type of cancer.

The Surveillance, Epidemiology, and End Results (SEER) program maintains an authoritative database of cancer statistics with long follow-up data that allows more in-depth study of prognosis and suitable treatment options. Prognostic nomograms constructed from the SEER database have been widely employed to forecast the prognosis of other “special” histological types of breast cancer28,29,30. The staging and survival data in SEER have also been validated in breast and urothelial cancers, with high concordance with external data31,32,33. To date, only two population-based studies, published in 2020–2021, have reported the application of nomograms to predict the mortality of patients with ILC in SEER29,34. However, these studies included confounding factors such as male patients and metastatic disease, and most were predictive models of short-term survival (only one extended to 10 years). In addition, the recent update of the SEER database means that more ILC data may be available for further analysis.

The current study aimed to develop a long-term model for predicting 5-, 10-, and 15-year overall survival (OS) and cancer-specific survival (CSS) of patients with non-metastatic ILC in order to assist clinicians in formulating appropriate treatments.

Materials and methods

Data source and patient selection

To reduce differences in treatment and to ensure adequate follow-up, we restricted our search to the period between 2004 and 2020. Baseline and follow-up data of patients with non-metastatic ILC were identified in the SEER-17 database (November 2022 submission) using SEER*Stat (v8.4.3). This database includes population-based data from 17 cancer registries covering approximately 26.5% of the United States (U.S.) cancer population between 2000 and 2020.

Inclusion criteria (male and metastatic patients were excluded because of their different characteristics): (1) diagnosis of pure ILC (ICD-O-3 morphology code 8520/3); (2) female sex; (3) ILC as the first primary tumor and non-metastatic disease. Exclusion criteria: (1) no confirmed histological diagnosis; (2) survival time < 1 month; (3) incomplete data (unknown or blank) on race, marital status, tumor size, regional nodes positive (RNP), regional nodes examined (RNE), combined summary (CS) stage, ER status, PR status, primary tumor site, laterality, or cause of death (COD); (4) follow-up data consisting of autopsy or death certificate only; (5) patients who received simple radioactive implantation, including brachytherapy and radioisotopes (Supplementary Table S1). The present study adhered to the Declaration of Helsinki. Informed consent was not required because the extracted data contained no personal information of patients.

AJCC stage is a basic tool for predicting patient prognosis in clinical practice. In our study, T stage, N stage, and TNM stage were not included because of inconsistent definitions, but tumor size (the main reference for T stage), lymph node ratio (LNR = RNP divided by RNE; the main reference for N stage), and CS stage (the main reference for combined TNM stage) were included.
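The LNR definition above can be illustrated with a minimal Python sketch. The function name and the handling of patients with no examined nodes are assumptions made for illustration; the study itself excluded records with unknown RNP or RNE.

```python
from typing import Optional


def lymph_node_ratio(nodes_positive: int, nodes_examined: int) -> Optional[float]:
    """LNR = regional nodes positive (RNP) / regional nodes examined (RNE)."""
    if nodes_examined <= 0:
        # Assumption: LNR is treated as undefined when no nodes were examined.
        return None
    if nodes_positive < 0 or nodes_positive > nodes_examined:
        raise ValueError("RNP must lie between 0 and RNE")
    return nodes_positive / nodes_examined
```

For example, a patient with 3 positive nodes out of 12 examined has an LNR of 0.25.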

Due to the deidentified nature of the public-access user files, the study did not require institutional review board approval.

Variables and endpoints

Fourteen demographic and clinicopathological variables were extracted from the SEER database: patient ID, age at diagnosis, race, marital status, laterality, primary tumor site, tumor size, grade, RNP, RNE, CS stage, ER status, PR status, and HER-2 status. The treatment information extracted comprised chemotherapy, radiotherapy, and surgery. Outcome measures were cause of death (COD) and survival time.

Continuous variables were stratified according to the optimal cutoffs determined by the Kaplan–Meier method in X-tile (v3.6.1). Age was stratified into three groups (22–65, 66–78, and ≥ 79 years), LNR into two stages (I: 0–0.53; II: 0.54–1), and tumor size into three groups (≤ 20 mm, 20–50 mm, and ≥ 50 mm). For categorical variables, race was classified into 3 subgroups (white, black, and other), marital status into 3 subgroups (married, unmarried, and separated), laterality into 2 subgroups (left and right), and grade into 2 subgroups (I/II and III/IV). Primary site was divided into 9 subgroups [upper-outer quadrant of breast (UO), central portion of breast (CP), upper-inner quadrant of breast (UI), lower-inner quadrant of breast (LI), nipple (NP), lower-outer quadrant of breast (LO), axillary tail of breast (AT), overlapping lesion of breast (OL), and not otherwise specified (NOS)]. CS stage was stratified as localized or regional. ER and PR statuses were classified into 2 subgroups (positive and negative), and HER-2 status into 3 subgroups (positive, negative, and unknown). Radiation and chemotherapy statuses were stratified as yes or no/unknown, and surgical management as mastectomy, breast conservation, NOS, or no/unknown. COD was categorized as alive, OS, and CSS.
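The age and LNR cutoffs reported above can be applied with a short Python sketch. The function names are hypothetical, and two choices are assumptions rather than details from the study: ages below 22 are rejected, and LNR is rounded to two decimals before comparison so that values between 0.53 and 0.54 are assigned unambiguously.

```python
def age_group(age_years: int) -> str:
    """Assign the age stratum using the reported cutoffs (22-65, 66-78, >=79)."""
    if age_years < 22:
        # Assumption: 22 was the youngest age in the cohort, so younger ages are rejected.
        raise ValueError("age below the cohort minimum of 22 years")
    if age_years <= 65:
        return "22-65"
    if age_years <= 78:
        return "66-78"
    return ">=79"


def lnr_stage(lnr: float) -> str:
    """Assign the LNR stage using the reported cutoffs (I: 0-0.53, II: 0.54-1)."""
    if not 0.0 <= lnr <= 1.0:
        raise ValueError("LNR must lie between 0 and 1")
    # Assumption: round to two decimals, matching the precision of the cutoffs.
    return "I" if round(lnr, 2) <= 0.53 else "II"
```

Tumor size is left out of the sketch because the reported groups share their boundary values (20 mm and 50 mm), so the assignment at those exact sizes cannot be inferred from the text.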

OS is the primary endpoint of this study and is defined as the time from initial diagnosis to death from any cause. CSS is the secondary endpoint and is defined as the time from initial diagnosis to death from ILC. An important consideration in interpreting our findings is the use of OS and CSS rather than disease-free survival (DFS) as endpoints, because these are the only outcomes available from the SEER data files. All variables are presented as frequency and percentage except for survival, which is expressed as median, range, first quartile, and third quartile.
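The two endpoint definitions can be expressed as event indicators, as in the Python sketch below. The status labels and the function name are hypothetical. Treating deaths from other causes as censored for CSS is an assumption that reflects the usual cause-specific convention; a competing-risk analysis would instead model those deaths as a separate event type.

```python
from typing import Tuple


def event_indicators(vital_status: str, died_of_ilc: bool) -> Tuple[int, int]:
    """Return (os_event, css_event) for one patient; 1 = event, 0 = censored."""
    if vital_status not in ("alive", "dead"):  # hypothetical labels
        raise ValueError("vital_status must be 'alive' or 'dead'")
    dead = vital_status == "dead"
    os_event = 1 if dead else 0  # OS: death from any cause
    # CSS: death from ILC; other deaths are censored here (assumption).
    css_event = 1 if dead and died_of_ilc else 0
    return os_event, css_event
```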

Statistical analysis

Patients were randomly divided into a training cohort and a validation cohort at a ratio of 7:3, and baseline data were compared by Pearson's chi-square test. Subsequently, variables were screened by univariate Cox regression, and those that were statistically significant (P < 0.05) were incorporated into multivariate Cox regression to determine the independent prognostic factors for OS and CSS. Nomograms of the competing risk model were constructed from these variables to predict 5-year, 10-year, and 15-year survival. Discrimination was evaluated using bootstrapping with 2000 resamples. The concordance index (C-index) and the area under the time-dependent receiver operating characteristic (td-ROC) curve (AUC) were chosen to quantify the discriminatory ability of the nomograms. Consistency between actual prognosis and nomogram-predicted survival was investigated by plotting calibration curves, and clinical utility was assessed by decision curve analysis (DCA). Patients were classified as low- or high-risk based on the optimal risk score cutoff determined by td-ROC curve analysis of the nomogram. OS and CSS curves were generated using the Kaplan–Meier estimator and compared between groups by the log-rank test. All statistical analyses were completed using R (version 4.2.1) in RStudio. The DynNom package35, which generates dynamic nomograms as R/Shiny applications for a variety of statistical models, was used to allow readers to interact with the models in a user-friendly manner. A two-tailed P < 0.05 was regarded as statistically significant.
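To make the discrimination measure concrete, the following Python sketch computes Harrell's C-index for right-censored data. It is an illustrative re-implementation, not the R code used in the study; the function name is hypothetical, tied survival times are simply skipped, and the quadratic loop is only suitable for small examples.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs in which the higher risk score
    belongs to the patient who had the event earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            if times[i] < times[j]:  # patient i failed while j was still at risk
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.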

Results

Baseline characteristics

A total of 31,451 eligible patients diagnosed between 2004 and 2015 were included in this study and randomly divided into the training cohort (n = 22,017) and the validation cohort (n = 9,434) at a 7:3 ratio. The last follow-up was December 31, 2020, and the median follow-up period was 101, 101, and 99 months in the training set, validation set, and whole set, respectively. Baseline characteristics of patients were similar between the training and validation cohorts (chi-square test, all P > 0.05). Most ILC patients were aged 22–65 years at diagnosis (59.9%), White (86.5%), and married (60.7%), and most had grade I/II (90.8%), ER-positive (97.6%), PR-positive (83.5%), LNR 0.00–0.53 (88.4%), and localized CS stage (61.8%) disease. The most common primary tumor site was UO (36.2%), and 36.8% of patients received chemotherapy. Because HER-2 status has been documented in SEER only since 2010, HER-2 data were lacking for many patients and were hence classified as unknown. At the end of follow-up, 7,954 (25.3%) patients had died, of whom 3,417 (10.9% of all patients) died from ILC. The clinicopathological characteristics of enrolled patients are shown in Table 1.

Table 1 Characteristics of Eligible Patients.

Independent predictors for OS and CSS in training cohort

Univariate Cox regression (Supplementary Table S2) showed that all variables except laterality were significantly correlated with OS and CSS (P < 0.05). Subsequent multivariate Cox regression (Supplementary Table S3) revealed that age, marriage, grade, ER status, PR status, tumor size, LNR, CS stage, radiation, and chemotherapy were independent prognostic factors for OS. In addition, these variables excluding radiation and chemotherapy were independent prognostic factors for CSS.

Nomogram construction

Next, we constructed two nomograms based on the above independent prognostic factors to predict the 5-, 10-, and 15-year OS as well as CSS (Fig. 1). The score of a variable is determined by the intercept on the “point axis”, and the total score is the sum of scores of all variables. The probability of survival can subsequently be estimated by sketching a vertical line from the “total score axis” to the “survival axis”.
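Reading a nomogram as described above amounts to a table lookup, a sum, and an interpolation along the survival axis. The Python sketch below illustrates this with entirely hypothetical point values and axis readings; they are placeholders, not the values of the published OS or CSS nomograms.

```python
from bisect import bisect_right
from typing import Dict

# Hypothetical points per variable level; NOT the published nomogram values.
POINTS: Dict[str, Dict[str, float]] = {
    "age": {"22-65": 0, "66-78": 40, ">=79": 100},
    "lnr": {"I": 0, "II": 35},
    "cs_stage": {"localized": 0, "regional": 20},
}

# Hypothetical (total points, survival probability) pairs read off the axes.
SURVIVAL_AXIS = [(0, 0.95), (50, 0.85), (100, 0.65), (155, 0.30)]


def total_points(patient: Dict[str, str]) -> float:
    """Sum the points of each variable level (the 'total score')."""
    return sum(POINTS[variable][level] for variable, level in patient.items())


def survival_from_points(total: float) -> float:
    """Linearly interpolate survival probability along the total score axis."""
    xs = [x for x, _ in SURVIVAL_AXIS]
    if total <= xs[0]:
        return SURVIVAL_AXIS[0][1]
    if total >= xs[-1]:
        return SURVIVAL_AXIS[-1][1]
    k = bisect_right(xs, total)
    (x0, y0), (x1, y1) = SURVIVAL_AXIS[k - 1], SURVIVAL_AXIS[k]
    return y0 + (y1 - y0) * (total - x0) / (x1 - x0)
```

With these placeholder values, a patient aged 66–78 with LNR stage II and regional disease scores 95 points in total.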

Figure 1

Nomograms for predicting 5-, 10-, and 15-year survival of ILC in the training cohort. (A) OS nomogram. (B) CSS nomogram.

The OS and CSS nomograms revealed that age, LNR, tumor size, and CS stage markedly affected ILC prognosis. The effects of tumor size, LNR, and CS stage were markedly greater in the CSS nomogram.

To allow readers to interact with the models in a user-friendly manner, we provide OS and CSS R/Shiny apps (https://ilc-survival2024.shinyapps.io/osnomogram/; https://ilc-survival2024.shinyapps.io/cssnomogram/). The dynamic nomograms display a time slider (bounded by the follow-up range) and drop-down boxes for the covariates. The predict function applies the appropriate inverse link function from the generalized linear model object to generate predicted values (on the scale of the response variable) and the corresponding 95% confidence intervals. The predicted value and its confidence interval are plotted and presented in the tooltip labels and the ‘Numerical Summary’ tab. Further, a formatted model output summary is displayed in the ‘Model Summary’ tab (Supplementary Fig. S1).

Nomogram validation

The predictive performance of the nomograms was assessed by several methods. First, we evaluated the discriminatory ability of the nomograms using the C-index and AUC. For the training and validation cohorts, the C-index of the OS nomogram was 0.765 (95% CI 0.762–0.768) and 0.757 (95% CI 0.747–0.767), and the C-index of the CSS nomogram was 0.812 (95% CI 0.804–0.820) and 0.813 (95% CI 0.799–0.827), respectively. The AUCs for predicting 5-, 10-, and 15-year OS were 0.821, 0.789, and 0.783 in the training cohort, and 0.786, 0.773, and 0.785 in the validation cohort, respectively. Similarly, the AUCs for predicting 5-, 10-, and 15-year CSS were 0.852, 0.831, and 0.792 in the training cohort, and 0.831, 0.832, and 0.787 in the validation cohort, respectively (Fig. 2). Second, we plotted calibration curves (Figs. 3 and 4) to evaluate the agreement between predicted and actual survival outcomes of ILC patients. Our results demonstrated good agreement between the predicted and actual OS and CSS in both the training and validation cohorts, indicating that these nomograms were satisfactorily calibrated. Last, we conducted DCA to compare the predictive performance of the nomograms with that of the traditional AJCC TNM stage. Our findings indicated that both nomograms had significantly higher clinical benefits than AJCC TNM stage (Figs. 5 and 6).
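The quantity plotted in a decision curve is the net benefit at a given threshold probability. The Python sketch below shows the standard formula for a binary outcome at a fixed time horizon; it is a simplification that ignores censoring, so it is not the exact survival-based DCA used in the study, and the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(
    events: Sequence[int],
    predicted_risk: Sequence[float],
    threshold: float,
) -> float:
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt), treating risk >= pt as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(events) != len(predicted_risk) or len(events) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(events)
    pairs = list(zip(events, predicted_risk))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(events: Sequence[int], threshold: float) -> float:
    """Reference strategy in which every patient is classified as high risk."""
    return net_benefit(events, [1.0] * len(events), threshold)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).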

Figure 2

ROC curves for the OS and CSS nomograms. ROC curves for 5-, 10-, and 15-year OS in the training cohort (A) and validation cohort (B). ROC curves for 5-, 10-, and 15-year CSS in the training cohort (C) and validation cohort (D).

Figure 3

Calibration curves for 5-, 10-, and 15-year OS in the training cohort (A–C) and validation cohort (D–F).

Figure 4

Calibration curves for 5-, 10-, and 15-year CSS in the training cohort (G–I) and validation cohort (J–L).

Figure 5

DCA curves of the nomogram, AJCC TNM stage, and summary stage for predicting 5-, 10-, and 15-year OS in the training cohort (A–C) and validation cohort (D–F).

Figure 6

DCA curves of the nomogram, AJCC TNM stage, and summary stage for predicting 5-, 10-, and 15-year CSS in the training cohort (G–I) and validation cohort (J–L).

Risk stratification

The risk score of each patient was calculated according to the nomograms, and the optimal cutoffs were 72 points (training cohort) and 64 points (validation cohort) for 15-year OS, and 122 points (training cohort) and 138 points (validation cohort) for 15-year CSS. Patients were then classified as low- and high-risk according to these cutoffs. According to the OS nomogram, 9452 (42.93%) patients in the training cohort and 4599 (48.75%) patients in the validation cohort had high-risk ILC. According to the CSS nomogram, 4909 (20.77%) patients in the training cohort and 2292 (28.35%) patients in the validation cohort had high-risk ILC. Both nomograms revealed that high-risk ILC patients in the training and validation cohorts had worse prognoses than low-risk patients (Fig. 7).
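The classification step can be sketched in Python as follows. The cutoff of 72 points is the reported training-cohort value for 15-year OS; the patient scores are hypothetical, and assigning a score exactly equal to the cutoff to the low-risk group is an assumption, since the text does not state how ties were handled.

```python
from typing import Dict, List


def risk_group(total_points: float, cutoff: float) -> str:
    """Classify one patient; scores above the cutoff are high risk (assumption)."""
    return "high" if total_points > cutoff else "low"


def summarize_risk(scores: List[float], cutoff: float) -> Dict[str, float]:
    """Count high-risk patients and report their share of the cohort in percent."""
    if not scores:
        raise ValueError("scores must not be empty")
    n_high = sum(1 for s in scores if risk_group(s, cutoff) == "high")
    return {"n_high": n_high, "pct_high": 100.0 * n_high / len(scores)}


OS_CUTOFF_TRAINING = 72  # reported cutoff for 15-year OS in the training cohort
example_scores = [30, 72, 73, 150]  # hypothetical nomogram totals
summary = summarize_risk(example_scores, OS_CUTOFF_TRAINING)
```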

Figure 7

OS and CSS curves after risk stratification based on nomograms. (A,B) Training cohort. (C,D) Validation cohort.

Discussion

Considerable evidence in the literature suggests that the number of patients with ILC is rising9,10. Although these tumors have a better clinical prognosis initially, they tend to recur late. Thus, a more effective scoring system would be highly valuable for long-term prognostic assessment in this population36. To the best of our knowledge, our nomograms are the first to predict the 5-year, 10-year, and 15-year survival of individual patients with ILC in clinical practice. In this retrospective study, age, marital status, ER status, PR status, grade, tumor size, LNR, and CS stage were prognostic factors for both OS and CSS of ILC, whereas chemotherapy and radiation were independent protective factors for OS. These findings are consistent with those of previous studies19,29,30,37. The information needed for the nomograms is easy to collect, which should help clinicians predict prognosis and devise individualized follow-up strategies.

Patients with ILC are more prone to lymph node metastasis than patients with other types of breast cancer. Dayan D et al.38 reported a differential effect of nodal stage on survival, with better survival for ILC patients with pN0/pN1 tumors and worse survival for ILC patients with pN2/pN3 tumors compared with NST patients. Many factors may affect the number of examined lymph nodes, such as varying levels of surgical expertise and differences in how pathologists handle the surgical specimen. Tumor stage may be underestimated when too few lymph nodes are resected and assessed, which might lead to inadequate treatment and incorrect prognostic judgment39. To address this problem, LNR has been introduced to assess prognosis in breast cancer40,41,42,43,44,45. However, the evidence for the prognostic value of LNR has come from other types of breast cancer or from mixed populations rather than from ILC specifically36,46,47. Our findings highlight the prognostic value of LNR in ILC: a higher LNR was correlated with a worse prognosis. Beyond clinical and demographic variables, the novel nomograms incorporate the prognostic value of LNR into clinical practice and can better guide prognosis and management strategies.

The predictive value of chemotherapy in ILC is controversial. It has been reported that ILC is less responsive to neoadjuvant/adjuvant chemotherapy than NST48,49,50,51. Similar to Xu et al.29 and Fu R et al.34, our study found that chemotherapy was a predictor for OS but not for CSS. Chemotherapy may eventually translate into an OS benefit because adjuvant cytotoxic therapy directly inhibits or kills cancer cells and eradicates micrometastatic disease, thereby reducing the risk of related diseases and prolonging patient survival. In contrast, its effect on CSS may vary with confounding factors, as chemotherapy is considered only for high-risk patients30. The results of our univariate analysis appeared to support these explanations, showing that patients on chemotherapy had a significantly higher risk of cancer-specific death, whereas the difference was not significant after adjustment for confounding variables in the multivariate analysis.

ILC differs from NST in clinical appearance, imaging, histopathological findings, treatment options, and survival. This histologic tumor type needs clarification in many respects, and the underlying causes of its high hormone sensitivity and chemoresistance remain to be explored. In the present study, the nomograms exhibited excellent performance in predicting long-term survival, and DCA revealed that our model provides significantly stronger risk stratification for patients with ILC than AJCC TNM stage. Nomograms can be used to counsel patients on survival and to guide clinical decision-making and treatment allocation. Patients defined as high risk by the nomogram are expected to have a dismal prognosis, so we recommend that these patients receive additional treatment, such as endocrine therapy for at least 5 years and preferably 7–10 years according to international guidelines52, together with intensive follow-up. Furthermore, multigene tests, such as the 21-gene recurrence score (21-RS) and the 70-gene signature (70-GS), are currently used in clinical practice to predict recurrence and survival and to identify candidates for adjuvant chemotherapy53,54. We suggest that combining the nomogram with genomics might better guide clinical decision-making for this subset of patients.

There are several limitations in this study. First, the retrospective and non-randomized nature of the data may introduce bias into the results. Hence, prospective and/or randomized studies are needed to validate our findings. Second, a small fraction of cases with missing information were excluded from our analysis, which may in turn lead to selection and information bias. Third, pathologists' interpretation may not have been standardized, and there was no unified center for pathology review. Selecting for pure ILC likely reduces this ambiguity, because the corresponding terminology has not changed across versions of the ICD-O-3 classification. We lack information on ILC variants, which will be critical to consider in future analyses. In addition, data on the utility of multigene prognostic tests in ILC are limited. Some assays have been tested in ILC with mixed results, and there are efforts to develop assays based specifically on information from ILC. These studies are necessary given the large number of ILC variants, which may harbor unique mutations that change clinical phenotype55,56,57,58,59,60,61,62,63,64,65. Fourth, many potentially important prognostic variables are not documented in SEER, such as BMI, reproductive status, Ki-67 level, histological subtype, and types of radiotherapy and chemotherapy. Fifth, systemic treatments continue to be developed, and an increasing number of targeted drugs are being administered in clinics, both of which have a great impact on patient recovery. However, the inclusion of ER/PR/HER2 status, tumor size, LNR, and stage categories in our multivariate analyses likely reduces these undesired effects and biases. Sixth, although endocrine therapy has become common practice in patients with hormone receptor–positive breast cancer, our work lacks data on endocrine therapy. Regardless, the lack of data on endocrine therapy did not affect the judgment of the overall results. Finally, the effect of comorbidities on prognosis was not considered in this study. Given that only a selected number of prognostic factors were incorporated in our nomograms, the predicted results should be interpreted with caution and used as a reference only. It is promising to see overall progress in the understanding of ILC; nevertheless, many open questions remain.

Conclusion

The novel LNR-based nomograms are reliable tools for predicting long-term survival and may assist clinicians in identifying high-risk patients and devising individualized treatments for patients with ILC.