Introduction

Lung cancer presents a significant challenge due to its high incidence and low survival rates, with a mere 16.6% 5-year survival rate1,2. This poor prognosis is primarily attributed to late-stage diagnoses and inadequate treatment responses3. Addressing this issue necessitates a focus on early detection and treatment, considering that the 5-year survival rate for stage IA (early-stage) can reach 75%4. Unfortunately, early-stage lung cancer typically lacks noticeable symptoms, and current diagnostic methods are insufficient, leading to only 10-15% of patients being diagnosed in the early stages of the disease5.

While low-dose computed tomography (LDCT) is widely utilized for the early detection of lung cancer, the increasing diagnoses of lung nodules have not corresponded to a rise in lung cancer incidence6,7. This trend suggests a high false-positive rate in LDCT screening, potentially subjecting patients to unnecessary invasive investigations or treatments such as needle biopsies and surgery8,9.

Moreover, traditional serum biomarkers like CEA, NSE, and CYFRA21-1 demonstrate poor efficacy in diagnosing early-stage lung cancer10. Although F-18 fluorodeoxyglucose positron emission tomography/computed tomography (PET/CT) exhibits high diagnostic value, it may yield false negative results for non-solid pulmonary nodules and those with a diameter of less than 8 millimeters11. Notably, early-stage lung cancer often manifests as small ground-glass or mixed-density nodules, further complicating accurate diagnosis.

Liquid biopsies have gained increasing popularity in cancer diagnosis in recent years, serving the purpose of tumor diagnosis through the identification of tumor cells and metabolites in blood or other bodily fluids. Peripheral blood samples are commonly employed for testing, with circulating tumor cells (CTCs), circulating genetically abnormal cells (CACs), circulating tumor DNA (ctDNA), exosomes, microRNAs (miRNAs), circular RNAs (circRNAs), and tumor-educated platelets (TEPs) being the typical detection indicators12,13.

CTCs and CACs stand out as the most frequently utilized liquid biopsy markers at present. CTCs typically denote cancer cells that have disseminated from the primary tumor or metastatic site into the bloodstream, and their detection relies on the presence of specific epithelial markers14. However, metastatic cancer cells often undergo epithelial-to-mesenchymal transition (EMT), significantly reducing the sensitivity of CTCs detection15. On the other hand, CACs are peripheral blood mononuclear cells harboring mutations on chromosome 3 (3p22.1, 3q29) and chromosome 10 (10q22.3, CEP10), a common anomaly in lung cancer16. CACs exhibit genetic abnormalities similar to those in the primary tumor and are strongly associated with tumorigenesis in non-small cell lung cancer (NSCLC)17, making them a promising diagnostic marker for cancer. Various studies have reported the diagnostic sensitivity of CACs for early lung cancer ranging from 67.2 to 90%, with a specificity of 64.7–76.9%, and an AUC between 0.76139 and 0.83716,18,19,20. These findings highlight the variability in CAC diagnostic efficacy across different populations, sometimes falling short of ideal diagnostic outcomes. Therefore, our study sought to address the question: which pulmonary nodule patients exhibit the highest diagnostic efficacy for CACs? To achieve this, we utilized demographic and imaging data from patients with pulmonary nodules to conduct consensus clustering analysis. This approach allowed us to cluster patients and assess the diagnostic efficacy of CACs within each cluster, identifying the optimal candidates for CACs detection while summarizing the clinical and imaging characteristics of these individuals.

Results

Patient characteristics

Initially, 263 patients were included in this study for screening. However, 22 patients were subsequently excluded for various reasons: 7 had a history of other malignant tumors, 2 had non-lung primary malignant tumors, 12 did not have thin-slice CT scans, and 1 had received anti-tumor treatment before CACs detection. Therefore, 241 patients were ultimately included in the subsequent analysis (Fig. 1). Among the included patients, 75 (31.1%) were female, and 166 (68.9%) were male. The median age of the patients was 52 years with an interquartile range of 42 to 60 years. Additionally, 34 patients (14.1%) had a smoking history, while 22 patients (9.1%) had a drinking history. Furthermore, there were 54 patients (22.4%) with hypertension and 15 patients (6.2%) with diabetes. The median maximum nodule size was 8 mm with an interquartile range of 6 to 12 mm. Regarding the distribution of pulmonary nodules, 59 patients (24.5%) had nodules in the left upper lobe, 40 patients (16.6%) in the left lower lobe, 68 patients (28.2%) in the right upper lobe, 18 patients (7.5%) in the right middle lobe, 34 patients (14.1%) in the right lower lobe, and 22 patients (9.1%) had nodules in two or more lobes. Analysis of CT images revealed that 136 cases (56.4%) exhibited pure ground glass nodules, 71 cases (29.5%) were solid nodules, and 34 cases (14.1%) were mixed ground glass nodules. These characteristics are summarized in Table 1.

Fig. 1

Flowchart of patients enrolled.

Table 1 Patient and CT characteristics.

IQR interquartile range, CT computed tomography.

Consensus clustering analysis

The consensus clustering analysis incorporated the patients’ baseline information and imaging characteristics: age, sex, smoking history, drinking history, hypertension history, diabetes history, maximum nodule size, maximum nodule type, and CT location. Figure 2 displays consensus matrix heat maps for k = 2–10. When k = 3 (Fig. 2B), the boundaries of the heat map are clearly defined, indicating a high degree of clustering stability across resampling iterations at this value. Figure 3 illustrates the relative change in area under the cumulative distribution function (CDF) curve, revealing a significant decrease when k = 3. Moreover, for k = 3, the mean consensus score for Clusters 1, 2, and 3 was 0.96, 0.92, and 1.00, respectively (Fig. S1). A higher mean consensus score indicates greater stability of cluster membership. Subsequently, all patients were classified into 3 clusters through consensus clustering analysis. Notably, patients in Cluster 1 were older than those in Cluster 2 and Cluster 3 (P < 0.001). Additionally, Cluster 1 exhibited the highest prevalence of diabetes and hypertension among the 3 clusters (P = 0.03, P < 0.001, respectively). Furthermore, the maximum nodule size of Cluster 1 exceeded that of Cluster 2 and Cluster 3 (P < 0.001). The baseline information of the 3 clusters is detailed in Table 2.

Fig. 2

Consensus matrix heat maps for different k values. (A) k = 2. (B) k = 3. (C) k = 4. (D) k = 5. (E) k = 6. (F) k = 7. (G) k = 8. (H) k = 9. (I) k = 10. The color changes from dark blue to white, indicating a gradual decrease in consensus. The deepest blue signifies absolute consensus, where two individuals always cluster together. Conversely, white indicates that the two individuals never cluster together. k is the number of clusters. When k = 3, clearly defined boundaries in the heat map indicate high clustering stability.

Fig. 3

The relative change in area under the cumulative distribution function (CDF) curve of the consensus clustering analysis when the cluster number changes from k to k + 1. The marked decrease in the curve at k = 3 suggests that k = 3 is optimal.

Table 2 Demographic, morphologic, and histologic characteristics by clusters.

Diagnostic efficiency of CACs

We evaluated the diagnostic efficiency of CACs in all patients and different clusters. The AUC for all patients was 0.689 (P<0.001, 95%CI: 0.583–0.796, Fig. 4A). Notably, CACs exhibited the best diagnostic performance in Cluster 1, with an AUC value of 0.855 (P<0.001, 95%CI: 0.730–0.979, Fig. 4B). Conversely, no significant AUC values were observed for Cluster 2 and Cluster 3 (P = 0.675, P = 0.139, respectively, Fig. 4C, D). Table S1 illustrates CACs’ diagnostic efficacy in overall patients and Cluster 1, indicating that the AUC of Cluster 1 outperformed that of the overall patient population (P = 0.044). At the maximum Youden index (with cut-off values of 2.5 and 1.5 for CACs in the overall patient cohort and Cluster 1, respectively), Cluster 1 demonstrated superior sensitivity (0.714 vs. 0.667), specificity (0.955 vs. 0.722), accuracy (0.922 vs. 0.714), positive predictive value (PPV) (0.714 vs. 0.296), and negative predictive value (NPV) (0.955 vs. 0.925) compared to the overall patient cohort (Table S1).
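
The cut-off selection at the maximum Youden index described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study; the function name, the label coding (1 = malignant, 0 = benign), and the rule "positive when the count exceeds the cut-off" are assumptions made for illustration, and the toy data in the usage note are invented.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) maximising J = sens + spec - 1.

    scores: marker values such as CAC counts (assumed input).
    labels: 1 = malignant, 0 = benign (assumed coding).
    A case is called positive when its score is greater than the cut-off.
    Requires at least one case of each class and two distinct score values.
    """
    values = sorted(set(scores))
    # Candidate cut-offs are midpoints between adjacent observed values,
    # which is why integer counts lead to cut-offs such as 1.5 or 2.5.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        # On ties the lowest cut-off is kept (strict comparison).
        if best is None or j > best[3]:
            best = (c, sens, spec, j)
    return best
```

For example, with invented counts `[0, 1, 1, 2, 2, 3, 4, 5]` and labels `[0, 0, 0, 0, 1, 1, 1, 1]`, the function selects a cut-off of 1.5, where sensitivity is 1.0 and specificity is 0.75.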

Fig. 4

Receiver operating characteristic (ROC) curves of circulating genetically abnormal cells (CACs). (A) ROC of CACs in overall patients. (B) ROC of CACs in Cluster 1. (C) ROC of CACs in Cluster 2. (D) ROC of CACs in Cluster 3.

Combination of multiple variables to identify malignant nodules of cluster 1

In Cluster 1, the lung cancer group exhibited notable distinctions compared to the benign group. These differences included an increased count of CACs (P < 0.001), a higher percentage of females (78.7% vs 35.7%, P = 0.002), a higher prevalence of pure ground-glass nodules (57.3% vs 28.6%, P = 0.045), and larger nodule sizes (P = 0.004). Conversely, there were no significant differences in age, drinking and smoking history, diabetes, hypertension, CT location, or serum tumor markers (TAP, CEA, CYFRA21-1, NSE, AFP, CA125, CA19-9, Ft) within Cluster 1 (Table 3). Moreover, the diagnostic performance of traditional serum tumor markers for pulmonary nodules in Cluster 1 was not statistically significant (all P > 0.05, Table S2 and Fig. S2). Subsequently, a multiple logistic regression analysis was conducted to devise a diagnostic model specific to malignant pulmonary nodules in Cluster 1. The model incorporated CACs, sex, maximum nodule type, and maximum nodule size as predictive variables. The analysis revealed that CACs (OR = 1.528, 95% CI: 1.119–2.088, P = 0.008), female sex (OR = 6.070, 95% CI: 1.153–31.948, P = 0.033), maximum nodule type (OR = 7.903, 95% CI: 1.429–43.697, P = 0.018), and maximum nodule size (OR = 1.212, 95% CI: 1.011–1.452, P = 0.037) were independent risk factors associated with malignant lung nodules (Table S3). Based on these findings, a diagnostic model for lung nodules was formulated utilizing the four aforementioned predictors, with the probability of malignancy calculated as 1/(1 + e^(−Y)), where Y = −4.123 + (0.424 × CACs) + (1.803 × Sex) + (2.067 × Maximum nodule type) + (0.192 × Maximum nodule size). Here, e is the base of the natural logarithm, CACs denotes the CAC count, sex is coded as female = 1 and male = 2, maximum nodule type is coded as pure ground-glass nodule = 1 and other nodules = 2, and the maximum nodule size is measured in millimeters. The AUC of this model was 0.925 (P < 0.001, 95% CI: 0.846–1.000, Fig. 5). At the maximum Youden index (0.752), the model yielded a sensitivity of 0.786, specificity of 0.966, accuracy of 0.942, PPV of 0.786, and NPV of 0.966.
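
As a worked illustration of the formula above, the following minimal sketch evaluates the model for a single patient. The coefficients and variable codings are copied from the text as stated; the function and parameter names are hypothetical, and the example patient is invented. It is a sketch for checking the arithmetic, not a validated clinical tool.

```python
import math

# Coefficients as reported in the text for the Cluster 1 model.
INTERCEPT = -4.123
B_CACS = 0.424
B_SEX = 1.803
B_NODULE_TYPE = 2.067
B_NODULE_SIZE = 0.192


def linear_predictor(cacs, sex, nodule_type, nodule_size_mm):
    """Compute Y from the text's formula.

    cacs: CAC count.
    sex: coded as stated in the text (female = 1, male = 2).
    nodule_type: coded as stated in the text (pure ground-glass = 1, other = 2).
    nodule_size_mm: maximum nodule size in millimeters.
    """
    return (INTERCEPT
            + B_CACS * cacs
            + B_SEX * sex
            + B_NODULE_TYPE * nodule_type
            + B_NODULE_SIZE * nodule_size_mm)


def malignancy_probability(cacs, sex, nodule_type, nodule_size_mm):
    """Logistic transform 1 / (1 + e^(-Y)) of the linear predictor."""
    y = linear_predictor(cacs, sex, nodule_type, nodule_size_mm)
    return 1.0 / (1.0 + math.exp(-y))
```

For an invented patient with 3 CACs, sex code 1, nodule type code 1, and a 10 mm nodule, Y = −4.123 + 1.272 + 1.803 + 2.067 + 1.920 = 2.939, which corresponds to a probability of roughly 0.95.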

Table 3 Patient characteristics of cluster 1.
Fig. 5

ROC of the joint diagnostic model (including CACs, sex, maximum nodule type, and maximum nodule size).

Discussion

Early-stage lung cancer often manifests as pulmonary nodules, and CT examination is highly effective for their detection. While LDCT screening in high-risk populations has shown a 26% reduction in lung cancer mortality21,22, its increasing application has also led to a higher diagnostic rate of pulmonary nodules. Nonetheless, CT imaging cannot differentiate between benign and malignant nodules, potentially leading to unnecessary invasive procedures or surgeries23. Consequently, non-invasive identification of malignant nodules has become a key focus of current research.

Emerging as a promising liquid biopsy marker, CACs have demonstrated improved diagnostic performance compared to other markers24. However, CACs exhibit limitations in distinguishing certain pulmonary nodules. Therefore, identifying the most suitable patient population for CACs detection is crucial. To address this, we employed consensus clustering to categorize enrolled patients into 3 clusters and evaluated the diagnostic performance of CACs within each cluster. It should be noted that consensus clustering involves multiple rounds of resampling and clustering to obtain varied clustering outcomes. Subsequently, the assessment of the probability of each data point being assigned to each cluster involves calculating the similarity or consistency between clustering results, rather than simply categorizing patients into entirely distinct groups. The relationship between different clusters in consensus clustering is determined by factors such as their similarity, stability, potential overlap, and hierarchical structure.

It is well-established that nodule size is a critical factor in differentiating between benign and malignant pulmonary nodules. Interestingly, in our study cohort, the median diameter of pulmonary nodules was 8 mm, indicating that many patients had nodules smaller than 8 mm. For these patients, a surveillance strategy would typically be appropriate. However, despite this, some patients still underwent surgical intervention. This is because, in our study, many patients did not undergo surgery upon their initial consultation but were instead monitored through one or more follow-up visits with thin-section chest CT to assess the nodule. Although nodule size is indeed a crucial consideration in clinical decision-making, we also evaluate various other factors concerning pulmonary nodules, such as morphology, density on CT, and changes in morphology, density, and size during follow-up, in conjunction with patient risk factors. If there are changes such as enlargement, irregular margins, heterogeneous density, cavitation, or vascular penetration during follow-up, we tend to recommend surgical intervention. Regarding the management of patients with pulmonary nodules smaller than 8 mm under clinical surveillance, we will follow relevant guidelines25,26,27 to determine whether patients require follow-up and the appropriate follow-up schedule. Depending on factors such as nodule size, morphological characteristics, and family history, we recommend that patients undergo thin-section chest CT follow-up at our outpatient clinic every 3–6 months. For some low-risk individuals, follow-up intervals may be extended to 12 months, while those with nodules smaller than 5 mm and without additional risk factors may be advised that follow-up visits are unnecessary. If nodules show changes during follow-up, we adhere to guidelines for assessment and determine subsequent management, sometimes requiring multidisciplinary consultation.

In Cluster 1, the AUC of CACs surpassed that of the overall patient cohort (0.855 vs. 0.689, P = 0.044), with improved sensitivity, specificity, accuracy, positive predictive value, and negative predictive value compared to the entire patient population. Conversely, the AUC of CACs in Clusters 2 and 3 was not statistically significant. This is intriguing, because research on factors influencing the diagnostic efficiency of CACs remains limited. Previous studies have predominantly focused on differences between CACs-positive and CACs-negative patient cohorts. Some studies suggest no correlation between CACs detection outcomes and patient age, gender, smoking history, nodule size, or nodule type18, whereas others indicate an association between CACs positivity and nodule type16,20. However, these studies have primarily analyzed factors influencing CACs detection outcomes rather than diagnostic performance. Our findings reveal that patients in Cluster 2 and Cluster 3 were younger, with smaller nodule size and a lower proportion of comorbidities such as hypertension and diabetes, compared to Cluster 1. Prior research has shown an increase in CACs count with advancing stages of lung cancer, reflecting tumor burden17. Yet little has been reported on the relationship between CACs diagnostic efficiency and age, diabetes, or hypertension. Therefore, the impact of these factors on CACs diagnostic performance remains unclear, and other unknown factors not addressed in our study may also be involved. Consequently, we cannot definitively ascertain the reasons for poorer CACs diagnostic efficiency in Clusters 2 and 3. Further research that matches relevant confounding factors is needed to explore this issue.

Consequently, CACs detection proved more suitable for distinguishing lung nodules in Cluster 1. Cluster 1 was characterized by older age, larger nodule size, and a higher prevalence of hypertension and diabetes. Notably, there were no significant differences in sex, smoking, drinking habits, nodule location, or maximal nodule type between the three clusters. It is well known that older age and nodule size are strongly correlated with malignant pulmonary nodules. However, within Cluster 1, there were 14 patients with benign lung nodules who underwent unnecessary surgical procedures. The inclination of doctors to choose surgical intervention for these patients may be associated with the presence of lung cancer risk factors in this cluster. This highlights the necessity for additional indicators in high-risk populations such as Cluster 1 to aid in making more precise clinical decisions. Introducing CACs detection as a clinical decision-making reference in patients resembling Cluster 1 may lead some patients with benign nodules to adopt a strategy of regular follow-up. Determining thresholds for age and nodule size and even establishing a recommended scoring system for CACs detection would aid doctors in making more intuitive and concise decisions about whether patients should undergo CACs testing. However, such thresholds and scoring systems require larger sample sizes, further validation, and prospective studies. Our results indicate that CACs cannot discriminate between benign and malignant nodules in Cluster 2 and Cluster 3. Therefore, the adjunctive diagnostic value of CACs may be limited in these patients. In practical clinical settings, we will encounter patients who do not meet or only partially meet the criteria of Cluster 1.
Considering the scarcity of research on the diagnostic utility of CACs in this patient population and their non-decisive role in lung cancer diagnosis, CACs continue to be utilized as an adjunctive diagnostic tool. Consequently, our recommendation is to exercise caution when ordering CACs examinations for patients who do not meet all/some criteria. Even if a CACs examination is ordered, careful interpretation of the results is advised.

Subsequently, we conducted a detailed analysis of Cluster 1, dividing patients into lung cancer and benign groups based on postoperative pathological results. Consistent with previous research, we found significantly higher levels of CACs in the lung cancer group than in the benign nodule group (P < 0.001)19,24,28. Moreover, female patients were more prevalent in the lung cancer group, aligning with existing research18,20,28. This observation may be attributed to the predominance of lung adenocarcinoma, a common pathological type of malignant lung nodules, among females29,30.

The pathological types of pure ground-glass pulmonary nodules tend to be minimally invasive adenocarcinoma, adenocarcinoma in situ, and atypical adenomatous hyperplasia, with a postoperative 5-year survival rate of nearly 100%31,32. Additionally, the presence of solid components in nodules may indicate increased invasion, leading to a decrease in patients’ survival rates33,34. Furthermore, within our study, the cohort comprising solid and mixed ground-glass nodules in Cluster 1 was relatively small compared to pure ground-glass nodules. To mitigate potential biases, we amalgamated solid and mixed ground-glass nodules into a unified group for analysis. Intriguingly, all mixed ground-glass nodules were malignant, suggesting a distinctive feature of patients in Cluster 1. Ground-glass nodules are well-known for their higher malignancy rate35, a conclusion supported by our study’s findings that the malignant group was predominantly composed of ground-glass nodules. Previous research has also indicated a higher probability of malignancy associated with larger nodules36, which aligns with our study’s observation that the lung cancer group had larger nodule sizes compared to the benign group (P = 0.004).

In the current landscape, commonly used serum markers for lung cancer offer limited diagnostic value for early-stage lung cancer37,38. Within Cluster 1, no significant differences were observed in the serum tumor markers between the two groups (all P > 0.05), and none of the commonly utilized serum tumor markers exhibited statistically significant AUC values (all P > 0.05). Given the promising diagnostic performance of CACs in Cluster 1, we explored the potential enhancement of diagnostic ability by combining CACs with other relevant variables. Our multivariate logistic regression analysis included variables that demonstrated statistical differences between the two groups. The results revealed the inclusion of CACs, sex, maximum nodule type, and maximum nodule size in a joint diagnostic model, leading to a further improvement in the AUC (0.925, 95% CI: 0.846-1.000), sensitivity (0.786), and specificity (0.966). Importantly, all variables in this enhanced model can be easily obtained through non-invasive methods, offering valuable guidance for clinical doctors in selecting appropriate follow-up diagnosis and treatment plans for patients with pulmonary nodules.

Although this study has yielded significant results, there are several limitations that need to be addressed. Firstly, to avoid potential selection bias during the patient inclusion and exclusion process, we did not query the pathological results of the patients’ lung nodules. However, the lung cancers in our cohort were all lung adenocarcinoma, which, although a predominant histological type among lung nodule patients, may limit the generalizability of our findings, particularly to populations with other histological types of lung cancer. This may be related to the relatively small sample size in our study. To address this limitation, future research should aim to expand the sample size to include a broader spectrum of pathological types, thereby enhancing the applicability of the study results in practical settings. Secondly, our study exhibited a predominance of female patients in both the overall patient cohort and Cluster 1. Additionally, all patients included in our study had a pathological diagnosis of lung adenocarcinoma. Lung adenocarcinoma is the most common histological type of lung cancer globally, and its prevalence is notably higher among female lung cancer patients in East Asia39. These findings suggest potential limitations in the generalizability of our results beyond the local demographic context. Thus, future studies encompassing broader geographic diversity, diverse pathological types, and more patients are warranted to validate and refine our findings. It is also crucial to conduct prospective studies that enroll patients newly diagnosed with pulmonary nodules within a specified timeframe, follow them long term, and perform regular CACs testing at each reassessment. This would make the patient information included in the study more comprehensive. Additionally, it would allow correlations between changes in CACs and changes in lung nodules to be observed, thereby mitigating selection bias and increasing the credibility of the research findings.
Thirdly, in the analysis of Cluster 1, the division of nodule types into pure ground glass and others was necessitated by the relatively small number of solid and mixed nodules. As a result, a separate analysis to determine differences between the three types of nodules in the lung cancer group and benign group was not conducted. This represents a potential area for further investigation to achieve a more comprehensive understanding of different nodule characteristics. Fourthly, our study aimed to mitigate false-negative results associated with non-surgical biopsies by confirming all pathological diagnoses through surgical procedures. Pathological diagnosis based on surgical specimens was employed as a key inclusion criterion during patient recruitment, which may introduce potential selection bias. To address this potential bias, future research should incorporate a prospective study design that includes all patients requiring lung nodule surgery within a defined period and performs CACs testing preoperatively to minimize bias. Moreover, while adhering to this standard ensured the reliability of nodule pathological diagnoses, it may have led to the exclusion of certain true-negative and false-negative patients who underwent non-surgical biopsies, potentially introducing selection bias. Future studies could mitigate this limitation by incorporating long-term follow-up and observation of nodule changes to accurately determine their nature.

In conclusion, our study demonstrates that CACs detection shows better diagnostic performance in aiding the differentiation between benign and malignant nodules in older patients with larger pulmonary nodules and comorbidities such as diabetes and hypertension. Further research and validation are needed to explore how to better integrate CACs detection into clinical practice.

Materials and methods

Study design and patients

This retrospective study included patients with pulmonary nodules who were diagnosed by CT at the First Affiliated Hospital of Zhengzhou University between May 2020 and December 2022. To ensure accuracy, only patients with surgically confirmed pathological results were included to avoid false-negative outcomes from non-surgical biopsies. The inclusion criteria were: (1) aged 18 years and above, (2) pulmonary nodules removed by surgery, (3) pathological diagnosis based on surgical specimens, and (4) CACs detection and CT scans performed before surgery. The exclusion criteria were: (1) history of other malignant tumors, (2) malignant nodules that were not primary lung malignancies, (3) CT scans without thin sections (resolution and slice thickness ≤ 1.0 mm), and (4) any anti-tumor treatment received before enrollment. Finally, 241 patients were enrolled. The eighth edition of the tumor, node, metastasis (TNM) classification4 was applied for lung cancer staging. The specific process of determining the study subjects was as follows. It is important to note that we did not include all patients who underwent pulmonary nodule surgery at the First Affiliated Hospital of Zhengzhou University between May 2020 and December 2022. Instead, we first identified 1563 patients with CACs results during this period through the electronic medical record system. Subsequently, we further screened these patients to identify those who underwent pulmonary nodule surgery, specifically by applying the inclusion criterion “(3) pathological diagnosis based on surgical specimens”, resulting in a cohort of 473 patients. We collected data on the age, CT and CACs testing dates, surgical method, and surgery date for these 473 patients. After applying the other 3 inclusion criteria, 13 patients under the age of 18 years, 168 patients with only post-operative CACs results, and 29 patients with only post-operative CT data and no pre-operative CT data were removed. Ultimately, 263 patients met the inclusion criteria. Following this, 22 patients were excluded based on the exclusion criteria: 7 had a history of other malignant tumors, 2 had non-lung primary malignant tumors, 12 did not have thin-slice CT scans, and 1 had received anti-tumor treatment before CACs detection, leaving a final cohort of 241 patients.
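
The patient flow described above can be checked with simple arithmetic. The short sketch below uses only the counts reported in this section; the variable names are illustrative.

```python
# Counts reported in the text.
with_cac_results = 1563      # patients with CACs results in the study period
surgical_pathology = 473     # of these, with pathology from surgical specimens

# Patients failing the remaining inclusion criteria.
failed_inclusion = {
    "under 18 years": 13,
    "only post-operative CACs results": 168,
    "no pre-operative CT data": 29,
}
met_inclusion = surgical_pathology - sum(failed_inclusion.values())

# Patients removed by the exclusion criteria.
excluded = {
    "history of other malignant tumors": 7,
    "non-lung primary malignant tumors": 2,
    "no thin-slice CT": 12,
    "anti-tumor treatment before CACs detection": 1,
}
final_cohort = met_inclusion - sum(excluded.values())

print(met_inclusion, final_cohort)
```

The two printed values match the 263 patients meeting the inclusion criteria and the final cohort of 241 patients.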

CT assessment and pathological diagnosis

Two senior radiologists independently assessed the characteristics of pulmonary nodules (maximum nodule type, nodule location, maximum nodule size) in thin-section CT scans for all enrolled patients. When opinions differed, a consensus was reached through discussion with a third senior radiologist. Preoperatively, a mixture of methylene blue and surgical glue was injected around pulmonary nodules guided by CT for localization purposes. Excised surgical specimens were fixed in 4% neutral formaldehyde, dehydrated using standard procedures, embedded in paraffin, sectioned at 5 μm thickness, stained with hematoxylin and eosin (HE), and examined under a light microscope. Histological types were assessed independently by two senior pathologists. When opinions differed, a consensus was reached through discussion with a third senior pathologist.

Tumor biomarkers measurement

Peripheral venous blood samples of 3 ml and 1 ml were collected from patients in a fasting state and transferred into anticoagulant tubes for analysis. The 3 ml peripheral venous blood samples were analyzed using a Roche Elecsys 2010 analyzer (Roche Diagnostics) employing chemiluminescent immunoassay technology to measure the tumor markers CEA (normal range: 0 ~ 5 ng/ml), CYFRA21-1 (normal range: 0 ~ 3.3 ng/ml), NSE (normal range: 0 ~ 25 ng/ml), AFP (normal range: 0 ~ 10 ng/ml), CA125 (normal range: 0 ~ 35 U/ml), CA19-9 (normal range: 0 ~ 35 U/ml), and Ft (normal range: 5 ~ 223.5 ng/ml). The detection of TAP was performed using a comprehensive diagnostic instrument from Zhejiang Ruisheng Medical Ltd. and TAP coagulant through coagulin affinity method. The specific procedure involved preparing a blood smear from 1 ml peripheral venous blood, allowing it to air dry naturally, and subsequently using a coagulant and the TAP detection system. Based on the specific images generated by the interaction between TAP and the coagulant, TAP coagulum area < 121 μm² was considered as normal TAP expression, while > 121 μm² indicated elevated TAP expression.

CACs detection

CACs detection was conducted by the pathology department of the First Affiliated Hospital of Zhengzhou University. Peripheral venous blood samples (10 ml) were collected in EDTA anticoagulant tubes. Within 2 h, blood samples were mixed with a cell preservation solution. The sample could be stored at room temperature for 4 days. Ficoll density gradient centrifugation was performed to isolate peripheral blood mononuclear cells, which were then used for cell smears and fixed for further processing. The fixed cell samples on glass slides underwent processing steps according to the mononuclear cell chromosome abnormality detection kit (Zhuhai SanMed Biotech Inc.), which included digestion with proteinase, ethanol gradient dehydration, and fluorescence in situ hybridization targeting the chromosomal aberrations on 3p22.1, 3q29, 10q22.3, and CEP10. Analysis was done using the Duet System.

Data collection

Clinical data, including age, sex, smoking history, drinking history, and hypertension and diabetes history, were extracted from the electronic medical record system of the First Affiliated Hospital of Zhengzhou University.

Consensus clustering

Consensus clustering analysis combines the results of repeated clustering runs to obtain a more stable clustering result. The approach involves resampling subsets from the original dataset and applying a specified clustering algorithm to partition each subset into k groups. The consistency of the clustering results across the resampling iterations is then evaluated, which helps determine the optimal value of k and divides all study subjects into clusters with similar characteristics. In our study, age, sex, smoking history, drinking history, hypertension history, diabetes history, maximum nodule size, maximum nodule type, and nodule location were included in the consensus clustering analysis. Consensus clustering analysis was performed using the ConsensusClusterPlus package of R (version 4.2.1) with the following parameter settings: pItem = 0.8, pFeature = 1, maxK = 10, reps = 100, clusterAlg = "PAM", distance = "euclidean", innerLinkage = "complete", finalLinkage = "complete".
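
To make the resampling idea concrete, the following is a minimal NumPy sketch of how a consensus matrix can be built for a single value of k. It is not the ConsensusClusterPlus implementation used in this study: the `k_medoids` function is a simplified stand-in for PAM with a deterministic start, all function names are hypothetical, and the input is assumed to be an already numeric, suitably scaled feature matrix. Only `reps` and `p_item` mirror the reps and pItem settings named above.

```python
import numpy as np


def k_medoids(dist, k, n_iter=100):
    """Simplified k-medoids on a precomputed distance matrix (stand-in for PAM)."""
    # Deterministic start: the most central point, then repeatedly the point
    # farthest from the medoids chosen so far.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[np.argmin(within)]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return np.argmin(dist[:, medoids], axis=1)


def consensus_matrix(X, k, reps=100, p_item=0.8, seed=0):
    """Fraction of resampling runs in which each pair of items shares a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # runs in which i and j fell in the same cluster
    sampled = np.zeros((n, n))   # runs in which i and j were both drawn
    for _ in range(reps):
        idx = rng.choice(n, size=int(round(p_item * n)), replace=False)
        sub = X[idx]
        # Pairwise Euclidean distances within the subsample.
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        labels = k_medoids(dist, k)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    with np.errstate(invalid="ignore", divide="ignore"):
        # Pairs never drawn together get a consensus value of 0.
        return np.where(sampled > 0, together / sampled, 0.0)
```

Entries close to 1 or 0 correspond to pairs that almost always or almost never cluster together, which is what produces sharply bounded blocks in a consensus heat map. Repeating the computation over a range of k values and comparing the distributions of consensus values is the basis for choosing k.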

Statistical analysis

Statistical analysis was performed using SPSS 26.0. Median (interquartile range) was used to describe continuous data, and n (%) was used to describe categorical data. The Mann-Whitney U test, Chi-square test, or Fisher’s exact test was applied to compare differences between groups. Receiver operating characteristic (ROC) curves were used to evaluate diagnostic performance. Multivariable logistic regression analyses were performed to identify the significant variables associated with malignant pulmonary nodules. Two-sided P values < 0.05 were considered statistically significant.
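
For readers less familiar with ROC analysis, the area under the curve can be understood as the probability that a randomly chosen positive case has a higher marker value than a randomly chosen negative case, with ties counted as one half. The sketch below computes the AUC directly from that definition. It is an illustration only, not the SPSS procedure used here; the function name and the label coding (1 = malignant, 0 = benign) are assumptions.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    scores: marker values; labels: 1 = malignant, 0 = benign (assumed coding).
    Ties between a positive and a negative case count as half a correct pair.
    Requires at least one case of each class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))
```

With invented scores `[0, 1, 1, 2, 2, 3, 4, 5]` and labels `[0, 0, 0, 0, 1, 1, 1, 1]`, 15.5 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.96875.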

Ethics approval

All methods performed in this study were carried out in accordance with the Declaration of Helsinki. This study was approved by the ethics committee of the First Affiliated Hospital of Zhengzhou University (Ethics approval number: 2021-KY-0302). Due to the retrospective study design, informed consent was waived by the ethics committee of the First Affiliated Hospital of Zhengzhou University.