Introduction

Recent studies indicate that colorectal cancer (CRC) will continue to burden many lives into this year and possibly 2024, with more than 2.2 million new cases, roughly 1.75 million of which occur beyond US borders, accompanied by a substantial death toll1. Colorectal cancer is both common and severe, so these findings underscore the need to make diagnosis and treatment approaches more efficient. Despite many advances in diagnostics and therapeutics, CRC remains a health care issue of global proportions. Early, pre-symptomatic diagnosis is still a problem around the globe because attention is centred mainly on the provision of treatment services. Worldwide, colorectal cancer is among the most prevalent causes of cancer-related mortality. Although early and accurate detection is crucial to enhancing survival rates, current diagnostic instruments frequently fail to provide the sensitivity and specificity required for comprehensive screening.

The worldwide incidence of colorectal cancer also shows marked geographical variation. Incidence rates are among the highest in high-income countries such as the United States of America, Japan, and parts of Europe; Japan in particular reports high age-standardized incidence and mortality rates2. By comparison, fewer cases are reported in low- and middle-income countries such as India, largely because of under-reporting and limited access to screening and diagnosis. Colorectal cancer is the sixth most prevalent cancer in India, with an estimated 70,000 new cases and about 41,000 deaths annually3,4. The disparity in healthcare infrastructure and disease patterns is reflected in the incidence rate of 4.9 per 100,000 in India compared with 36.6 per 100,000 in high-income countries such as Japan5.

The development of colorectal cancer is multifactorial, with both hereditary and environmental components playing important roles. Known risk factors include a high intake of red and processed meat, low fibre intake, obesity, physical inactivity, smoking, and excessive alcohol consumption. Colorectal cancer also has a strong genetic component: individuals with affected family members carry a markedly higher risk, reflecting inherited susceptibility to the disease.

Although CRC incidence continues to rise, healthcare infrastructure and the distribution of specialists remain uneven across regions. People in high-income countries are more likely to be diagnosed early and to survive, owing to greater awareness and better screening methods such as colonoscopy and the faecal occult blood test. In contrast, low- and middle-income countries (LMICs) such as India face challenges including low health literacy, diagnosis at later stages of cancer, and a shortage of trained human resources in cancer care.

When detected early, colorectal cancer is highly treatable, and early detection substantially improves overall survival: the 5-year survival rate of CRC can exceed 90% when the disease is diagnosed at an early stage6. This rate drops sharply when cancer is discovered at a later stage, when treatments are less effective. Even though efficient screening technologies such as colonoscopy and faecal occult blood tests are available, invasiveness, patient discomfort, and poor compliance hinder early detection, particularly in low- and middle-income countries7. Given this situation, accessible and easily integrable non-invasive early detection methods are clearly required.

Artificial intelligence (AI) has the potential to revolutionise medical imaging, enabling more precise identification and diagnosis of colorectal cancer. Deep learning, a branch of AI, has shown enormous promise in evaluating massive amounts of imaging data, detecting early indicators of colorectal cancer (CRC), and assisting physicians with decision-making. AI-driven models, especially those built on deep learning, can analyse medical images with remarkable precision, often matching or even outperforming human experts. In colorectal cancer diagnosis, deep learning has yielded remarkable results, particularly in the identification and classification of polyps and adenomas, which are critical for early intervention.

However, applying AI to CRC detection brings its own problems. A model trained on a small and/or skewed dataset drawn from one population or imaging modality is likely to fail when exposed to another population or type of imaging data. Combining inputs from different categories of data, for example clinical notes and images, raises further issues. To address these challenges, this study examines how the accuracy of colorectal cancer assessment can be improved by combining deep learning models such as XGBoost and Vision Transformers with CNNs. Conventional deep learning techniques, including convolutional neural networks (CNNs), have made substantial breakthroughs in medical image analysis. Yet these models frequently struggle with feature extraction, particularly on intricate datasets such as colonoscopy images, and their poor generalisation, often resulting from class imbalance in medical datasets, further restricts their diagnostic utility.

This investigation uses the CKHK-22 dataset, a large compilation drawn from three well-known datasets: Hyper Kvasir, CVC Clinic DB, and Kvasir2. After narrowing the dataset from the original 24 classes to 14 acceptable classes, we reduced the number of images from 19,621 to 16,942. The main aim of this procedure was to enhance the quality and usability of the dataset so that the evaluated models would be more precise.

The presented research integrates models through an ensemble learning framework that combines Vision Transformers and XGBoost with pre-trained deep integrated CNN models, namely ADaRDEV2I-22, DaRD-22, and ADaDR-22. XGBoost processes the structured data, while Vision Transformers extract spatial features from the images. Performance measures were computed on both the original dataset and the dataset modified as described above. Comparing the AUC values and accuracies of all ensemble configurations shows that the size and quality of the dataset positively affect model performance. The highest performance, with an AUC of 98.80%, was achieved by the ADaDR-22 + Vision Transformer ensemble, whose testing accuracy on the modified dataset was 93.40%. This paper addresses the obstacles outlined above by integrating a variety of sophisticated deep learning techniques: we introduce a novel ensemble learning framework that combines CNNs, Vision Transformers, and XGBoost to improve both feature extraction and classification accuracy in the detection of colorectal carcinoma.

This study presents a new ensemble learning framework that uses Vision Transformers, XGBoost, and pre-trained deep CNN models (ADaRDEV2I-22, DaRD-22, and ADaDR-22) to improve the detection of colorectal cancer. Our method tackles some of the biggest problems in medical imaging, such as difficult feature extraction and class imbalance, by using Vision Transformers to extract spatial features from images and XGBoost to handle structured data. The integration of these models has led to significant improvements in classification accuracy, enabling the ADaDR-22 + Vision Transformer ensemble to achieve state-of-the-art performance. The results also show that the size and quality of the dataset positively influence model performance, making the ensemble a valuable instrument for carcinoma identification.

The study thus demonstrates that deep learning ensembles are applicable to medical image analysis and that high-quality data are needed to enhance model performance. The findings show that combining ADaDR-22, a Vision Transformer, and a carefully curated dataset could greatly improve diagnostic processes and the early detection of colorectal cancer. By supporting earlier and more accurate diagnosis, these innovations can significantly decrease the global burden of the disease and improve patient outcomes.

Our study strategically integrates deep learning ensemble approaches to improve colorectal cancer diagnosis. The project progresses via five separate phases:

  1. The first step was to refine the dataset by narrowing the CKHK-22 collection, which originally contained 19,621 images across 24 classes, down to 16,942 high-quality images spanning 14 of the original classes. This made the dataset suitable for training the cancer-detection algorithms.

  2. We then build ensembles around three pre-trained integrated CNNs, namely ADaRDEV2I-22, DaRD-22, and ADaDR-22, combining these CNN models with advanced models such as XGBoost and Vision Transformers to improve feature extraction and classification accuracy by drawing on the strengths of both families of models.

  3. We train and test the ensemble models on both the original and the revised datasets. Two evaluation metrics, testing accuracy and the area under the curve (AUC), are used to fine-tune the models and to ensure that they correctly identify probable malignant lesions.

  4. We then analyse the results of the CNN + Vision Transformer and CNN + XGBoost combinations, considering the performance of each ensemble on the two datasets after training. This investigation helps identify the right configuration for the diagnosis of colorectal cancer.

  5. The final step is fine-tuning the top-performing model to make it ready for clinical application. We aim to improve patient care and advance medical practice by identifying and refining the best-performing ensemble for the early detection of colorectal cancer.

We structure the remaining portion of this paper as follows: Section 2 provides a detailed overview of related work, focusing on the most recent developments in deep learning-based medical image classification. Section 3 defines the methodology, covering the data preprocessing stages, CNN integration, Vision Transformer adoption, and XGBoost implementation. Section 4 presents the experimental setup and results, along with a comparison with existing models. Section 5 discusses the results, highlighting the strengths and weaknesses of the proposed method. Finally, Section 6 concludes the paper by delineating potential clinical applications and future research directions.

Literature survey

Detailed literature review

Screening and diagnosis of CRC have greatly improved in the current generation due to the integration of AI and medical imaging. Colonoscopy and other conventional procedures remain the most reliable means of diagnosing CRC because of their high accuracy in identifying precancerous polyps and adenomas. However, these procedures often involve invasiveness, patient discomfort, and the need for trained personnel. This has prompted increasing research into improving the precision, timeliness, and availability of CRC screening using AI-based diagnostic tools. This literature review examines the current state of CRC detection, focusing on deep learning models, data augmentation approaches, and ensemble methods for improving diagnostic results.

New studies reveal that AI, and deep learning models such as CNNs and ViTs in particular, can transform diagnosis in the medical field. Their strength in feature extraction and classification can lead to better identification of CRC from medical images. Ensemble methods, which combine the strengths of several models into a new one, can help boost diagnostic performance. For instance, using XGBoost with CNNs yields better prediction performance than using either model separately. To overcome the lack of comprehensive and comparable data, researchers have applied GANs and other data augmentation methods. This summary reviews these developments, including the major works, approaches, and findings in the field, and discusses gaps, limitations, and possible future studies concerning the diagnosis of colorectal cancer. Table 1 contains a comprehensive literature review.

Table 1 Comprehensive Literature review.

Recent advancements in deep learning-based skin cancer detection methods have shown promising results. For instance, SNC_Net25 integrates handcrafted features with deep learning-based features from dermoscopy images for skin cancer detection, demonstrating the power of combining different feature extraction methods [Ahmad Naeem et al., 2024]. Similarly, DVFNet26 uses a deep feature fusion-based model for multiclass skin cancer classification, highlighting the importance of combining features from multiple models for accurate diagnosis [Ahmad Naeem et al., 2024]. Other studies, such as SCDNet27 and the Multi-model Fusion Technique, have successfully applied deep learning frameworks to detect and classify skin cancer, further supporting the effectiveness of CNNs and fusion techniques in medical image analysis [Ahmad Naeem et al., 2024].

Research deficits

AI has greatly advanced the diagnosis of colorectal cancer (CRC), but gaps remain in the current literature. Existing models, especially CNN-based ones, rely heavily on individual datasets such as Kvasir and Hyper Kvasir, which prevents them from generalizing across diverse populations and makes them suitable mainly for large clinical settings. Most models also lack adequate feature extraction, and small polyps, which are critical for early diagnosis, are among the hardest findings to identify. Architectures such as GANs and ViTs cannot easily be deployed in these settings because they are computationally expensive. Methods that combine CNNs with XGBoost have shown promising results but tend to over-fit, especially on small or less diverse datasets. There is therefore scope to develop models that capture both the imaging data and its distribution.

Most previous work on AI-based carcinoma detection focused on a single type of model, such as CNNs or gradient boosting algorithms represented by XGBoost, and could not effectively fuse global and local features. Although a few recent works have explored ViTs in medical imaging, their integration with CNNs and structured-data classifiers such as XGBoost remains relatively under-explored. Moreover, feature extraction techniques specifically dedicated to the detection of colorectal carcinoma lag noticeably behind. This work presents an ensemble framework that integrates pre-trained CNNs such as ResNet-50, DenseNet-201, and VGG-16 with ViTs and XGBoost, so that the strengths of the individual models are combined for improved diagnostic accuracy. The approach is evaluated on the large CKHK-22 dataset and yields better accuracy with more interpretability than state-of-the-art methods.

Resolving research gaps: the ideal approach

This research lays out a clear plan that employs several sophisticated methods to close the gaps in validity, external validity, and efficiency of current CRC diagnostic models. To train the model on a large number of diverse images of different origins, spanning different ages and genders, we prepare a correspondingly diverse dataset. We include Vision Transformers in our ensemble models to improve the identification of tiny polyps by benefiting from their stronger feature extraction. We also compare several ensemble methods, CNNs with Vision Transformers and CNNs with XGBoost, to determine which gives the best trade-off between diagnostic speed and precision. To avoid cases where the model fits the training data too closely, we employ regularisation techniques and careful cross-validation so that the model remains stable and performs well on other datasets. These strategies form the foundation of the proposed method, which aims to transform CRC detection by addressing the existing drawbacks.

Materials and methods

Ensemble CADx for the diagnosis of colorectal carcinoma

An increase in colorectal cancer (CRC) cases has prompted research into developing better diagnostic techniques, especially for use in CADx systems28. These methods have greatly improved the detection and diagnosis of CRC in medical imaging using deep learning. Feature extraction from medical images is where convolutional neural networks (CNNs) have shone. Integrating more powerful models, such as Vision Transformers (ViTs) and XGBoost, is crucial for improving diagnostic accuracy and resilience.

We introduce two distinct CADx systems that combine integrated CNNs with either Vision Transformers or XGBoost to enhance colorectal carcinoma recognition. In the first system, convolutional neural networks (CNNs) learn basic spatial hierarchies and Vision Transformers add further feature extraction capability through self-attention. The second system combines CNNs with XGBoost, exploiting its gradient-boosting capabilities to improve prediction performance. This two-pronged strategy aims to deliver high-accuracy diagnostic tools for clinical settings where qualified diagnosticians are scarce. From dataset compilation to model assessment and prediction results, the following sections describe both systems' processes.

Introduction to ensemble CADx systems

The study presents two modern CADx systems designed for colorectal carcinoma diagnosis. These systems combine CNN spatial hierarchy learning with either the Vision Transformer self-attention mechanism or XGBoost's gradient boosting; the architecture of the proposed GIEnsemformerCADx and GIEnsemXGBOOSTCADx is depicted in Fig. 1. Each system aims to provide a viable diagnostic tool in the clinical ___domain, with emphasis on timely and accurate identification of colorectal cancer.

Fig. 1

Block diagram of GIEnsemformerCADx and GIEnsemXGBOOSTCADx.

Preparation of datasets

Medical motion image collection utilising colonoscopy

Colonoscopy is an essential tool for diagnosing polyps, lesions, and cancerous changes in the colon. In a colonoscopy, a long, flexible tube with a light and camera at its tip, known as an endoscope, is inserted through the rectum to examine the entire colon29. The camera captures high-resolution images of the inside of the colon that are diagnostically useful. Our methodology involves collecting and classifying these images according to criteria such as the presence of lesions, inflammation, or other abnormalities.

We obtained the medical colonoscopy motion image datasets from three open-source image collections: Hyper Kvasir, Kvasir2, and CVC Clinic DB. The diverse array of labelled images in these datasets is essential for developing robust deep learning models. Each dataset contains images of the gastrointestinal system grouped into numerous classes that illustrate different conditions and regions.

In this study, the CKHK-22 dataset, which combines parts of three publicly available databases, namely CVC Clinic DB30, Kvasir231, and Hyper Kvasir32, was used. It was designed to increase the variety of colorectal images and to provide a strong base for training the deep learning models.

The CVC Clinic DB30 dataset contains 1,640 images labeled as polyps or non-polyps. Kvasir231 has 8,000 images in 8 groups, covering medical conditions such as ulcerative colitis, oesophagitis, dyed-lifted polyps, and more severe diseases. The Hyper Kvasir32 dataset offers even greater variety, with 10,682 images of the upper and lower gastrointestinal tract belonging to 23 classes.

The first reason for establishing CKHK-22 as a combined dataset of various colorectal diseases was the need for a large amount of data33. The resulting mixed dataset is diverse and can support training deep learning models that perform reasonably across different patients and imaging conditions. However, the source mixed dataset comprised 24 classes, which proved challenging, particularly with regard to the class imbalance that affects model performance. The solution was therefore to limit the data to 14 clinically relevant classes with a more even distribution. The improved class balance enhanced the model's ability to discriminate between the classes of interest, which increased diagnostic precision.

We then fine-tuned the models on the new dataset, which consisted of high-quality and sufficient data; merging similar classes and eliminating classes with insufficient data enhanced the models' credibility and applicability in clinical practice. Limiting the classification to 14 classes accelerated and improved the training process without sacrificing the diagnostic value of the model's output.

The detailed descriptions of the classes in the mixed dataset CKHK-22 and their modifications are shown in Table 2. Also, Fig. 2 shows sample images from the CKHK-22 dataset used in this research to provide an idea of the type and the amount of data used in this work.

Table 2 Detailed class representations of the original and modified mixed dataset CKHK-22.
Fig. 2

The Sample Mixed CKHK-22 Dataset.

Augmentation of CKHK-22 Dataset

A notable problem with the CKHK-22 dataset is that some classes contain more samples than others, which can lead to overfitting; to overcome this, the dataset was augmented and the numbers of training images were balanced. The transformations applied were geometric: zooming, rotation, and flipping34. These changes improved the models' flexibility and reliability by replicating real-life imaging conditions:

  • Images were magnified by up to 20%, since in actual colonoscopies different levels of magnification can be chosen.

  • To handle the different camera positions in the medical images, we rotated the images by up to 35 degrees.

  • Horizontal flips were applied so that the model learns features that are invariant to direction.

The augmentation step increased the size of the dataset significantly. The original CKHK-22 dataset, with 19,621 images across 24 categories, grew to 30,000 images, while the modified 14-class dataset grew from 16,942 to 22,000 images. This enhancement balanced the database, provided the model with more image samples, and made feature learning more accurate.
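As a minimal sketch, the classical augmentation settings described above could be expressed with Keras preprocessing layers roughly as follows; the layer choices and factors are our own rendering of the stated 20% zoom, 35-degree rotation, and horizontal flips, not the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Classical augmentation roughly matching the settings described above.
augmenter = tf.keras.Sequential([
    layers.RandomZoom(0.2),                # zoom in/out by up to 20%
    layers.RandomRotation(35.0 / 360.0),   # factor is a fraction of a full turn (about +/-35 degrees)
    layers.RandomFlip("horizontal"),       # horizontal reflections only
])

# Applied on the fly to a batch of colonoscopy images shaped (batch, 224, 224, 3):
# augmented = augmenter(images, training=True)
```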

DCGAN-Assisted augmentation of CKHK-22 images

To mitigate the class imbalance that is prevalent in medical imaging datasets, this investigation implements DCGAN-assisted augmentation. This method enhances the diversity of the dataset by generating synthetic images, ensuring that the model generalizes effectively to unobserved data. Improving model performance, particularly in under-represented classes, is contingent upon this augmentation step, which renders our method more resilient than conventional augmentation methods.

Data augmentation with Deep Convolutional Generative Adversarial Networks (DCGAN) enhanced the variety and resilience of the CKHK-22 dataset35. By learning the distribution of the original dataset, a DCGAN can generate synthetic images from that same distribution, which improves the model's generalisability, reduces overfitting, and helps address the class imbalance. A DCGAN consists of two major parts, a generator and a discriminator.

  • Generator Network: The generator takes a random vector as input and decodes it into an image through transposed convolutional layers with batch normalisation and ReLU activations. Its objective is to produce images that are as similar to the original images as possible.

  • Discriminator Network: The discriminator is a classifier that determines whether an image is real or generated. It uses convolutional layers with batch normalisation and Leaky ReLU activations. The discriminator helps train the generator to produce better images by providing an increasingly demanding classification signal.

The discriminator's task is to distinguish real images from fake ones, while the generator attempts to produce images that cannot easily be told apart. This adversarial procedure continues until the generator can produce images that resemble real-world ones. In addition, we used three distinct classical augmentation methods on the CKHK-22 dataset: a maximum 20% zoom, a maximum 35-degree rotation, and horizontal flipping only.
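A minimal Keras sketch of the generator and discriminator described above is shown below. The layer widths, 64 × 64 output resolution, and latent dimension are illustrative assumptions, not the configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # size of the random input vector (assumed)

def build_generator():
    """Generator: random vector -> synthetic 64x64 colonoscopy-like image."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(8 * 8 * 256),
        layers.Reshape((8, 8, 256)),
        # Transposed convolutions with batch normalisation and ReLU, as described above.
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ], name="generator")

def build_discriminator():
    """Discriminator: image -> probability that the image is real."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),
        # Convolutional layers with batch normalisation and Leaky ReLU activations.
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ], name="discriminator")
```

The two networks would then be trained adversarially, with the discriminator's loss guiding the generator toward more realistic synthetic frames.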

The architecture of the DCGAN is illustrated in Fig. 3. The diagram shows the different augmentation steps; the final augmented dataset is the data that has passed through the generator and discriminator networks.

Fig. 3

The comprehensive architecture of DCGAN-assisted CKHK-22 augmentation.

Tables 3 and 4 show the augmentation strategy for the original and modified CKHK-22 dataset respectively. These tables show the particular augmentation method used for each class and the increase in the number of images.

Table 3 Whole CKHK-22 Dataset Numeric Augmentation.
Table 4 Modified CKHK-22 Dataset Augmentation with Updated Totals.

We chose 14 out of 24 categories for this investigation to achieve a good balance among the classes and boost model performance. The initial dataset contained classes with a high degree of variability in image counts, resulting in a substantial class imbalance. We reduced the dataset to 14 well-represented categories in an effort to mitigate the noise from under-represented classes, which typically impairs model accuracy. This selection process enhanced the overall distribution of data across classes, leading to more stable training and improved generalisation, particularly for a critical medical imaging task like the detection of colorectal carcinoma.

We implemented DCGAN-assisted augmentation to further mitigate any remaining class imbalances by producing synthetic, realistic images that improved the dataset’s diversity. In addition to augmenting the training data’s size, this approach enhanced the model’s capacity to learn from under-represented patterns, which led to more robust performance and improved accuracy in the selected categories.

We did not implement SMOTE (Synthetic Minority Oversampling Technique), as its intended use with tabular data may not adequately capture the intricate spatial relationships inherent in medical images, such as colonoscopy data. Artefacts introduced by SMOTE may compromise the integrity of the images, while DCGAN produces more genuine, high-quality data, making it a superior option for image-based tasks.

Application of the Google Cloud environment for data storage

The present research uses Google Cloud to store and archive all datasets, including the CKHK-22 dataset. It provides an efficient way to manage data securely, keep it organized, and retrieve it easily36. After the datasets are uploaded to Google Cloud, they can be arranged into folders in the account's cloud storage. Google Cloud also connects to Google Colab Pro+, an upgraded version of Google Colab, for data analysis and machine learning37. Using Google Cloud for data storage has two advantages: it helps protect the datasets and it provides a way for the data to be shared and used by many users and research groups.

Once the datasets have been uploaded to the cloud, Google Colab Pro+ can draw them directly from Google Cloud. In the cloud environment, the datasets are prepared using the TensorFlow and Keras libraries. The data undergoes pre-processing, during which it is cleaned of noise, normalized, colour corrected, and resized before being fed into the models. The cloud-based approach supports real-time collaboration and taps Google's computational capability for fast and proper model training, allowing researchers to work on the datasets without worrying about the storage capacity of their local systems. This configuration is most effective for large, cumbersome datasets such as CKHK-22, where every inch of space and every cycle of processing power counts.
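A minimal sketch of pulling the dataset from Google Cloud Storage into a Colab Pro+ session is shown below. The bucket name and folder layout are hypothetical illustrations, not the authors' actual configuration.

```python
# Run inside a Google Colab notebook.
from google.colab import auth

auth.authenticate_user()  # grant this Colab session access to the user's GCS buckets

# Copy the (hypothetical) modified 14-class dataset from the bucket into local Colab storage.
!gsutil -m cp -r gs://ckhk22-dataset/modified_14_classes /content/data/
```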

Image preprocessing for the CKHK-22 Medical Motion colonoscopy

Pre-processing is the first step of the proposed method: before the CKHK-22 colonoscopy image dataset can be fed into the deep learning models, it needs to be prepared38. The frames obtained from medical motion sources may present several imperfections, such as noise, improper colour balance, and varying dimensions.

  • Noise Reduction: Medical images, particularly colonoscopy frames, can be contaminated with various forms of noise caused by motion, the endoscopic equipment, or poor lighting. Noise reduction operations remove these artefacts while preserving the main features of the images. The quality of the images is thereby enhanced, making the characteristics of colorectal cancer easier to detect.

  • Normalization: Another important transformation is normalisation, in which pixel intensities are rescaled to a standard range, typically 0 to 1. This safeguards the quality of the input data, especially for deep learning models, diminishes the effects of lighting and contrast variations, and thus increases the models' reliability while speeding up training.

  • Color Correction: The colour in colonoscopy images can be altered by differences in lighting or the settings of the endoscopic equipment. We apply colour corrections to neutralise these differences and increase the uniformity of the dataset. Because colour disparities can mislead models, this adjustment helps them identify the real characteristics associated with carcinomas.

  • Resizing: Since the dimensions of colonoscopy images vary, it is crucial to standardize the size of the images fed to the neural networks. All images in this work were resized to 224 × 224 pixels, a size that retains adequate detail while using minimal computational power. This scaling ensures that the models are trained with data of the same dimensions, simplifying training and making the model suitable for images of any original size.

These vital steps could not be ignored before training or testing the deep learning models on the CKHK-22 dataset. With less variable, cleaner, and more structurally sound images, model learning for the identification and classification of colorectal carcinoma becomes more efficient, and the diagnostics become more reliable.
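A minimal sketch of such a preprocessing function is given below. The specific denoising filter and colour-correction rule are assumptions (the paper does not name them); only the 224 × 224 target size and the [0, 1] intensity range come from the description above.

```python
import tensorflow as tf

def preprocess_frame(image_bytes):
    """Decode, denoise, colour-normalise, resize to 224x224, and scale to [0, 1]."""
    img = tf.io.decode_jpeg(image_bytes, channels=3)
    img = tf.cast(img, tf.float32)

    # Light denoising via a small average-pooling pass (a stand-in for the
    # paper's unspecified noise-reduction filter).
    img = tf.nn.avg_pool2d(img[tf.newaxis, ...], ksize=3, strides=1, padding="SAME")[0]

    # Simple grey-world style per-channel colour correction (an assumption,
    # since the exact correction method is not specified).
    channel_means = tf.reduce_mean(img, axis=[0, 1], keepdims=True)
    img = img * (tf.reduce_mean(channel_means) / (channel_means + 1e-6))

    # Resize to the 224 x 224 input expected by the CNN backbones and normalise.
    img = tf.image.resize(img, (224, 224))
    return tf.clip_by_value(img / 255.0, 0.0, 1.0)
```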

Train-test splitting of the image datasets

For both the 24-class CKHK-22 dataset and its 14-class version, we split the data into training and testing sets in an 80:20 ratio. This split is widely used in machine learning because it trains the model on a sufficient quantity of data while setting aside a substantial portion for validation. The model is given 80 percent of the dataset, the training set, containing varied examples of the patterns to learn from. The testing set, the remaining 20% of the data, checks the model's performance on new data, helping to avoid overfitting and ensuring the model predicts well on data other than the training data.
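A minimal sketch of the 80:20 split is shown below; the stratification over class labels is our assumption to preserve class proportions, and `image_paths` and `labels` are assumed lists built from the CKHK-22 folders.

```python
from sklearn.model_selection import train_test_split

# Stratified 80:20 split so that each of the 14 (or 24) classes keeps the same
# proportion in the training and testing subsets.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.20, stratify=labels, random_state=42)
```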

This split reflects the amount of data required to train the model and the amount needed to validate it, which is even more important in healthcare imaging applications where class imbalance is significant and effective evaluation is essential. Table 5 and Figs. 4 and 5 show the number of images in the training and testing sets for the original and modified data. This information supports training and validation of the model on balanced data and describes the structure of the data.

Table 5 Train-test split for both CKHK-22 dataset.
Fig. 4

Train-test split for original dataset (CKHK-22)−24 classes.

Fig. 5

Train-test split for modified dataset (CKHK-22)−14 classes.

Model preparation for ensemble CADx systems

This method is unique in that it combines multiple deep learning models. Pre-trained CNNs capture local spatial features from colonoscopy images, while Vision Transformers derive global relationships using self-attention mechanisms. Additionally, we implement XGBoost, renowned for its efficient management of structured data, to enhance classification performance. Medical imaging has not previously employed this unique combination for the detection of colorectal carcinoma, leading to a more comprehensive feature extraction and increased accuracy.

Integrated CNNs

All of our CADx systems are built around three integrated CNN models, each constructed from several pre-trained CNN architectures. These integrated models were named ADaRDEV2I-22, ADaDR-22, and DaRD-22, reflecting their establishment in the year 202239; the details of each integrated CNN model are presented in Table 6.

  • ADaRDEV2I-22: This model combines eight pre-trained CNNs (CNN-1 to CNN-8). The architectures are AlexNet40, DarkNet-1941, InceptionResNetV242, EfficientNet-B743, DenseNet-20144, ResNet-50v245, VGG-1646, and VGG-1947. To enhance overall classification performance, ADaRDEV2I-22 composes the functions of these models to extract a large number of features from the colonoscopy images.

  • ADaDR-22: The ADaDR-22 model unites four convolutional neural network (CNN) architectures: AlexNet, DarkNet-19, DenseNet-201, and ResNet-50V2. This configuration balances model complexity and computational efficiency for the detection of colorectal cancer.

  • DaRD-22: The DaRD-22 model comprises three convolutional neural network architectures: DarkNet-19, ResNet-50V2, and DenseNet-201. Although lighter in weight, this model remains efficient in feature extraction and classification.

Table 6 The integrated CNNs details.

The integrated CNNs used in this work, namely ADaRDEV2I-22, ADaDR-22, and DaRD-22, represent a diverse combination of pre-trained architectures for the optimal detection of colorectal carcinoma. The adoption of eight significant pre-trained CNNs—AlexNet, DarkNet-19, InceptionResNetV2, EfficientNet-B7, DenseNet-201, ResNet-50v2, VGG-16, and VGG-19—renders the ADaRDEV2I-22 model powerful by obtaining a wide range of features from colonoscopy images, thus increasing the general performance of classification. This huge portfolio of models effectively captures fine-grained and high-level features and thus achieves outstanding performance in challenging medical imaging tasks. On the contrary, the ADaDR-22 model combines the strengths of AlexNet, DarkNet-19, DenseNet-201, and ResNet-50v2 into one model; it balances model computational efficiency with the overall model complexity to obtain robust performance with a smaller resource demand. The DaRD-22 model is lightweight but a very powerful model for feature extraction and classification. Collectively, these integrated CNNs offer complementary strengths on combining feature diversity with computational efficiency and scalability; hence, they are ideal to constitute the proposed ensemble framework for colorectal carcinoma detection.

The architecture of the ADaDR-22 model is described to explain the class-label prediction method, which stacks several CNN backbones and feeds their outputs through dense neural network layers, as illustrated in Fig. 6. The same process applies to the other integrated CNN models; only the constituent architectures differ.

Fig. 6

An Integrated CNN Architecture of ADaDR-22 model.

To make the method more interesting and get the best results, it was necessary to use a number of pre-trained models, such as AlexNet, DarkNet-19, ResNet-50v2, DenseNet-201, EfficientNet-B7, VGG-16, VGG-19, InceptionResNetV2, and others in ADaDR-22, ADaRDEV2I-22, and DaRD-22. Each of these models captures different types of features, with some, like VGG and ResNet, more adept at identifying fine-grained details, and others, like DenseNet and InceptionResNet, better suited to learning intricate spatial hierarchies. The combined system is able to surpass individual models by utilising the strengths of each architecture through the integration of these models into our ensemble. The deliberate decision to introduce complexity into the model by incorporating a diverse array of pre-trained models was to ensure that the model encompasses a broad variety of patterns and representations, resulting in more accurate and robust predictions, particularly in a challenging ___domain such as medical image classification. Despite the increased complexity of the model, the substantial improvements in accuracy and the ability to generalize well across various datasets justify the trade-off.

These ensemble models are built on the foundations of the integrated CNN phase; the performance of the models is then boosted using either Vision Transformers (ViTs) or XGBoost. Using the combined CNNs facilitates the identification of spatial features in the medical images, which is vital for proper diagnosis.
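A minimal sketch of the integration pattern, assuming two ImageNet-pretrained backbones that are available in keras.applications, is given below. DarkNet-19 and AlexNet are not shipped with Keras, so they are omitted here; the fusion of pooled backbone features into one classifier is the point being illustrated, not the exact composition of ADaDR-22.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50V2, DenseNet201

# Two pretrained backbones share one 224x224 input; their pooled features are
# concatenated and a small dense head predicts the 14 classes.
inputs = layers.Input(shape=(224, 224, 3))
resnet = ResNet50V2(include_top=False, weights="imagenet", pooling="avg")
densenet = DenseNet201(include_top=False, weights="imagenet", pooling="avg")
resnet.trainable = False      # freeze backbones for the initial training stage (assumption)
densenet.trainable = False

features = layers.Concatenate()([resnet(inputs), densenet(inputs)])
x = layers.Dense(512, activation="relu")(features)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(14, activation="softmax")(x)

integrated_cnn = Model(inputs, outputs, name="integrated_cnn_sketch")
```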

Vision transformers architecture

The Transformer architecture from NLP paved the way for the Vision Transformer (ViT), the recent state of the art in image classification. Ordinary CNNs rely mainly on local pixel connections, whereas ViTs dissect a picture into a number of equally sized patches, treated like a sequence of words in a sentence48. Each flattened patch is fed into a linear layer to produce patch embeddings, and positional embeddings are added to preserve the spatial relationships in the image. The embeddings then pass through a number of Transformer blocks, each containing multi-head self-attention and feed-forward sub-layers. Because this architecture models long-range dependencies across the whole image, it captures the big picture and the relations within it. Figure 7 illustrates the architecture of the Vision Transformer.

Fig. 7

An illustrated architecture of vision transformer.

To improve the performance of each architecture, the proposed CADx systems combine CNNs and ViTs. The CNNs start the process by learning features from different regions of the colonoscopy images to create a good spatial feature map. These attributes are then used to extend the ViT's understanding of context and interconnection. Applying CNNs and ViTs together therefore gives a better understanding of the image: the former recognizes the smaller details while the latter analyses the overall structure. The aim of this model is to improve the reliability of colorectal cancer detection as a clinical diagnostic tool for images that are often rather intricate.

XGBoost's architecture

XGBoost, or Extreme Gradient Boosting, is an efficient machine learning method well suited to structured data49,50. XGBoost builds an ensemble of multiple decision trees that is more effective than standard decision trees, which often overfit: each tree is built upon the previous ones and seeks to correct their mistakes. Because XGBoost fits to the gradient of the residuals, it can gradually devote more capacity to the cases that are hard to classify, learning which features most affect classification performance. Its ability to accommodate different feature sets, together with its tunable parameters, makes it very useful for improving diagnostic accuracy in medical images such as those of colorectal cancer. The structure of XGBoost is depicted in Fig. 8.

Fig. 8

An illustrative architecture of XGBoost.

Using XGBoost's gradient boosting, the predictions based on the integrated CNNs' features for colorectal cancer classification are further refined. The final forecast is the culmination of many decision trees: the outputs of all the trees are combined as a weighted sum, as shown in Fig. 8. With this ensemble technique, the resulting model is very accurate and hence suitable for any medical image classification problem that requires high accuracy and involves sophisticated patterns in the data.

Ensemble models

Integrated CNN + ViTs

In the Integrated CNN + Vision Transformer (ViT) ensemble model, our major focus is to leverage both transformers and convolutional neural networks. The integrated CNN's role is to understand and extract minute spatial characteristics from the colonoscopy images. Each CNN in the group (for instance, ADaRDEV2I-22, ADaDR-22, DaRD-22) produces high-dimensional feature vectors for the input images.

These features are then provided to the Vision Transformer, which forms image patches that are passed through a series of transformer blocks. Each transformer block consists of multi-head self-attention and feed-forward networks that further enhance the extracted features. Because carcinoma identification depends heavily on the ability to learn long-distance relationships and complex structures within the input image data, the transformer is very effective here. Finally, the outputs of the transformer blocks are taken to a fully connected layer with softmax activation to obtain the final class probabilities. The proposed model thus combines the local information learning capability of the CNNs with the global information understanding of the transformers. Figure 9 depicts the ensemble of the Integrated CNN (ADaDR-22) + Vision Transformer (ViT), demonstrating how the feature extraction51,52,53 of the ADaDR-22 model is combined with the self-attention mechanism of the ViT for the recognition of colorectal carcinoma. The same structure has been designed for the other two integrated CNNs (ADaRDEV2I-22 and DaRD-22); all three models work in a similar manner, applying the CNN-extracted features with the ViT's enhanced feature representation. These models function in a similar fashion to enhance the ensemble's ability to recognize local and global characteristics in the given dataset.

Fig. 9

An Illustrative architecture of Integrated CNN(ADaDR) + ViT Ensemble Model.
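Complementing Fig. 9, the sketch below shows one way the CNN + ViT coupling described above could be assembled in Keras: the CNN's spatial feature map is treated as a sequence of patch tokens, enriched with a learned positional embedding, refined by transformer blocks, and classified with softmax. The backbone (DenseNet-201 standing in for ADaDR-22), embedding size, head count, and block count are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet201

NUM_CLASSES, EMBED_DIM, NUM_HEADS, NUM_BLOCKS = 14, 256, 4, 2

class AddPositionEmbedding(layers.Layer):
    """Adds a learned positional embedding to the sequence of patch tokens."""
    def build(self, input_shape):
        self.pos = self.add_weight(
            name="pos", shape=(1, input_shape[1], input_shape[2]),
            initializer="zeros", trainable=True)
    def call(self, x):
        return x + self.pos

inputs = layers.Input(shape=(224, 224, 3))
backbone = DenseNet201(include_top=False, weights="imagenet")   # stand-in for the integrated CNN
fmap = backbone(inputs)                                          # (None, 7, 7, 1920) feature map
tokens = layers.Reshape((7 * 7, fmap.shape[-1]))(fmap)           # 49 spatial positions as "patches"
tokens = layers.Dense(EMBED_DIM)(tokens)                         # linear patch embedding
tokens = AddPositionEmbedding()(tokens)                          # keep spatial relationships

for _ in range(NUM_BLOCKS):
    # Multi-head self-attention sub-layer with a residual connection.
    attn = layers.MultiHeadAttention(num_heads=NUM_HEADS,
                                     key_dim=EMBED_DIM // NUM_HEADS)(tokens, tokens)
    tokens = layers.LayerNormalization()(tokens + attn)
    # Position-wise feed-forward sub-layer with a residual connection.
    ffn = layers.Dense(EMBED_DIM * 2, activation="gelu")(tokens)
    ffn = layers.Dense(EMBED_DIM)(ffn)
    tokens = layers.LayerNormalization()(tokens + ffn)

pooled = layers.GlobalAveragePooling1D()(tokens)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(pooled)
cnn_vit_model = Model(inputs, outputs, name="integrated_cnn_vit_sketch")
```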

Expected benefits:

  • Improved Feature Extraction: Thus, the proposed ensemble model has the advantages of CNNs in the capture of spatial hierarchies in the medical images and ViTs that can learn long-range dependencies in the images.

  • Enhanced Accuracy: The integration enables both local and global features to be learnt by the model; thus; the number of false positives and/or negatives may be reduced as well as improving on the base rate.

  • Robustness to Variability: The ViTs’ self-attention also allows the model to make analyses that are more general on the datasets, increasing the model’s resistance to image quality and content variations.

Integrated CNN + XGBoost

In this configuration, a hybrid of CNN and XGBoost is used, where the CNN performs feature extraction and XGBoost performs classification. The integrated CNNs (for instance, ADaRDEV2I-22, ADaDR-22, DaRD-22) start with feature extraction on the colonoscopy images, as in the previous ensemble.

The features are then reshaped into the correct format and fed into the XGBoost classifier, a distributed gradient boosting library that builds the model as a sequence of decision trees. Each decision tree learns from the mistakes of the previous trees, and the outputs of all the trees are combined with weights to produce the final output. By integrating CNNs with XGBoost, the model can use high-level features from the CNNs together with XGBoost's accurate prediction. The final step applies softmax to the obtained scores to get the class probabilities and the final class. The introduced method thus benefits from the feature extraction of CNNs and the classification strength of XGBoost. Figure 10 illustrates how the architecture of the presented model with the Integrated CNN (ADaDR-22) and XGBoost is formed, showing how the feature maps produced by the CNN are fed to XGBoost for boosting of decision trees. The same architecture has been used for the other two integrated CNNs (ADaRDEV2I-22 and DaRD-22), combining the strength of deep CNN feature extraction with XGBoost's prediction capability. These two ensemble models are developed so that they complement each other, improving classification accuracy and reducing over-fitting.

Fig. 10

An Illustrative architecture of Integrated CNN(ADaDR) + XGBoost Ensemble Model.
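As a minimal sketch of the pipeline in Fig. 10, the snippet below assumes a trained integrated CNN (for example, the earlier `integrated_cnn` sketch) whose penultimate dense layer serves as the feature extractor; `train_images`, `test_images`, and `train_labels` are assumed pre-processed arrays with integer-encoded labels, and the XGBoost hyperparameters are illustrative.

```python
import xgboost as xgb
from tensorflow.keras import Model

# Truncate the trained integrated CNN at its penultimate dense layer to obtain features.
feature_extractor = Model(integrated_cnn.input,
                          integrated_cnn.layers[-3].output)   # 512-d features before the softmax head

X_train = feature_extractor.predict(train_images, batch_size=16)
X_test = feature_extractor.predict(test_images, batch_size=16)

clf = xgb.XGBClassifier(
    objective="multi:softprob",   # per-class probabilities for the 14 classes
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
)
clf.fit(X_train, train_labels)
class_probabilities = clf.predict_proba(X_test)   # final prediction of the ensemble
```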

Expected benefits

  • High Predictive Power: The gradient boosting algorithm works iteratively; the model becomes aware of the mistakes it has made at every stage of learning, which yields better outcomes.

  • Flexibility and Adaptability: XGBoost remains effective when the data distribution is uneven, unlike many machine learning algorithms that tend to overfit, which is why combining it with a CNN for feature extraction works well.

  • Efficiency in Training: XGBoost is among the fastest algorithms in terms of training time, without harming the model's performance.

Flow diagram of ensemble CADx

Figure 11 shows the flow diagram for the ensemble CADx and how it identifies colorectal carcinoma. The chosen CKHK-22 colonoscopy images undergo several operations: denoising, normalization, colour correction, and resizing to moderate dimensions. DCGAN is then applied to increase both the size and the quality of the training dataset. The dataset is split into training and testing sets, and several integrated CNN models are designed to extract features from the images.

Fig. 11

The detailed flow diagram for ensemble CADx.

These features are then fed into two different ensemble methods, one using Vision Transformers (ViTs) and the other XGBoost. Both models are trained and tested separately, and performance evaluation measures are calculated for each of them. The performances of the two models are compared, and the model with the highest accuracy and the best predictive capability is used for the final classification of colorectal images. The process ends with the identification of the best-performing model for diagnosis. This structured workflow significantly improves the detection of carcinoma by combining the advantages of deep learning and ensemble learning.

Following the flow diagram, the specific working process and the corresponding algorithm are described comprehensively in Table 7. From data collection and pre-processing through model training and testing to the final prediction, this table presents a clean and straightforward breakdown of each step, serving as a reference for the techniques adopted in the ensemble models for the detection of colorectal carcinoma.

Table 7 Algorithm steps with Pseudo-code.

Experimental setup

Hardware and software environment

In this work, the experiments were performed on a powerful computing cluster to provide fast and efficient training and evaluation of the models. The hardware environment comprised an NVIDIA A100 Tensor Core GPU for training the deep learning models, together with an Intel Xeon CPU and 64 GB of RAM for data pre-processing and model training.

For the software environment, the experiments were conducted in Python 3.8 with fundamental libraries including TensorFlow 2, NumPy, Keras, and Scikit-learn for developing and testing the machine learning algorithms. Google Colab Pro+ was used to leverage the cloud computing environment, the available GPU resources, and the large datasets stored on Google Cloud.

Hyperparameter tuning

Cross-validation is an essential part of the process, particularly when fine-tuning the ensemble models' hyperparameters. The tuning process involved adjusting hyperparameters including the learning rate, batch size, and number of epochs for the integrated CNNs, Vision Transformers, and XGBoost models. For the integrated CNNs, the learning rate was varied between 0.001 and 0.0001 and the batch size was set to 16, 32, or 64. The Vision Transformer was fine-tuned over the number of transformer layers and the number of attention heads to produce the best feature extraction. XGBoost parameters such as the learning rate, maximum depth, and number of estimators were tuned to get the best out of the algorithm. The detailed hyperparameters used for the three types of models, integrated CNNs, Vision Transformers, and XGBoost, are presented in Table 8.

Table 8 Tuning hyper parameters for ensemble CADx.

We trained the models with a batch size of 16 and an initial learning rate of 0.001, optimised with the Adam optimiser. Training was conducted for up to 100 epochs with early stopping to prevent overfitting. When the validation accuracy plateaued for more than five epochs, the learning rate was reduced by a factor of 0.1. Dropout with a rate of 0.3 was used to further prevent overfitting, and data augmentation techniques, including rotation, magnification, and flipping, were applied to enhance the model's robustness. Training was performed on Google Colab Pro+ with GPU support, which enabled efficient model training.
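A minimal sketch of this training configuration in Keras is shown below; `model`, `X_train`, `y_train`, `X_val`, and `y_val` are assumed placeholders, and the early-stopping patience is illustrative since the exact value is not stated.

```python
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Stop training once validation accuracy stops improving (patience is illustrative).
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                                     restore_best_weights=True),
    # Reduce the learning rate by a factor of 0.1 after a plateau of five epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.1,
                                         patience=5),
]

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=16,
                    callbacks=callbacks)
```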

We meticulously adjusted the batch size and image size to guarantee the consistency and efficacy of the combined Vision Transformer and CNN model with XGBoost. We selected a smaller batch size of 16 to enhance generalisation, and a larger image size of 224 × 224 pixels to enable the extraction of more detailed features, which is particularly critical for medical imaging tasks. These settings improved the model's accuracy and its ability to capture fine details in the images, while also stabilising the training process.

Training and testing procedure

The dataset was divided into training and testing sets with 80% and 20% of the data, respectively. The training set was used to develop the ensemble models, while the test set was used to determine their accuracy. Each model was trained with cross-validation to avoid overfitting and to obtain good accuracy on different fragments of the data. Early stopping was used to halt training before over-fitting occurred, and the performance of the models was measured at every epoch.

Evaluation criteria

We used five key criteria to evaluate the models' performance: accuracy, area under the curve (AUC), precision, recall, and F1 score. While accuracy measures the global performance of a model, AUC indicates the model's capacity to rank instances. The F1 score combines precision and recall and is useful when assessing a model's effectiveness in predicting the positive class, especially under class imbalance. Collectively, these measures provide a comprehensive assessment of the ensemble models' performance in detecting colorectal cancer.
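A minimal sketch of computing these five criteria on the held-out test set is shown below; `y_true` (integer labels) and `y_prob` (per-class probabilities from the model) are assumed, and the macro/one-vs-rest averaging for the multi-class AUC is our assumption since the paper does not state the averaging scheme.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = y_prob.argmax(axis=1)   # predicted class = most probable class
metrics = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred, average="macro"),
    "recall":    recall_score(y_true, y_pred, average="macro"),
    "f1":        f1_score(y_true, y_pred, average="macro"),
    # One-vs-rest, macro-averaged AUC for the multi-class setting.
    "auc":       roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
}
```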

Results

Results: A review

This section presents the overall outcomes of the experiments on colorectal carcinoma detection using the proposed ensemble CADx systems. The findings show the effectiveness of the individual model architectures, namely the integrated CNNs, Vision Transformers, and XGBoost, and of their combinations. The studies on the CKHK-22 dataset describe how the different ensemble models compare and evaluate the effectiveness of the approaches used to enlarge the data samples, train the models, and check the results.

We compare the diagnostic performance of the models using measures such as precision, recall, AUC, and F1-score. The results are discussed systematically, starting with the evaluation of individual models. The next section considers the ensemble models and compares their effectiveness on the original and modified CKHK-22 sets. The objective is to determine which model configuration yields the best diagnostic results for colorectal carcinoma.

We used important measures including accuracy, precision, recall, and F1-score to assess the suggested ensemble model’s performance. We chose these metrics to evaluate the model’s efficacy in detecting and classifying colorectal cancer. The focus was on the model’s capacity to accurately classify cancerous and non-cancerous images, as well as its resilience in the face of imbalanced data. The AUC was of particular significance because it offered a snapshot of the model’s capacity to differentiate between classes across a variety of thresholds. On the other hand, precision and recall assessed the model’s sensitivity and specificity in identifying true positives and reducing false negatives, which are critical for medical applications.

Performance of integrated CNN models

This subsection presents in detail the results of the integrated CNNs for the three models (ADaDR-22, ADaRDEV2I-22, DaRD-22). The results are reported for the original dataset with 24 classes and the reduced dataset with 14 classes. The evaluation covers the training and test accuracies as well as the F1 score, recall, precision and AUC. These metrics are listed in Tables 9 and 10, and Fig. 12 shows the performance of the models during the training and testing phases on the different datasets and hyperparameter settings.

Table 9 Training and cross validation of individual CNN models in original dataset with 24 classes.
Table 10 Training and evaluation of individual CNN models on refined dataset (14 classes).
Fig. 12 Evaluation metrics of integrated CNNs for the original (24 classes) and modified (14 classes) datasets.

The evaluation metrics of the three integrated CNN models, ADaDR-22, ADaRDEV2I-22 and DaRD-22, on the original (24-class) and modified (14-class) CKHK-22 datasets are shown in Tables 9 and 10, respectively. The ADaDR-22 model achieves the highest training accuracy on the modified dataset at 95.3%, and also the highest testing accuracy on the modified dataset at 91.5%. On the original dataset, the ADaDR-22 model again leads with a training accuracy of 92.5% and a testing accuracy of 87.3%. All models achieve high AUC values on both datasets, with ADaDR-22 again scoring best: 97.8% on the modified dataset and 95.5% on the original dataset, which confirms the effectiveness of the proposed method in classifying the data.

To compare these performance metrics and show the improvement of the models on the modified dataset, the grouped bar chart in Fig. 14 is used. In particular, the ADaDR-22 model outperforms the other models on all measures of precision, recall, F1-score and AUC, indicating that its features are effective at identifying relevant cases and discarding irrelevant ones. The improvement of the metrics on the modified dataset shows that the pre-processing step, which reduced the number of classes and cleaned the data, was beneficial. This allowed the models to achieve higher testing accuracy and AUC, both of which are crucial for the precise identification of colorectal carcinoma.
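For readers interested in how such an integrated CNN can be assembled, the sketch below concatenates pooled features from two ImageNet-pretrained backbones before a 14-class softmax head; the specific backbones and layer sizes are assumptions for illustration, not the published composition of ADaDR-22, ADaRDEV2I-22 or DaRD-22.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))

# Two pretrained backbones used as frozen-or-finetuned feature extractors.
densenet = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                              pooling="avg")
resnet = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                        pooling="avg")

# Each backbone applies its own input preprocessing before feature extraction.
f1 = densenet(tf.keras.applications.densenet.preprocess_input(inputs))
f2 = resnet(tf.keras.applications.resnet.preprocess_input(inputs))

# Concatenate the pooled feature vectors and classify into the 14 refined classes.
merged = tf.keras.layers.Concatenate()([f1, f2])
merged = tf.keras.layers.Dropout(0.3)(merged)
outputs = tf.keras.layers.Dense(14, activation="softmax")(merged)

integrated_cnn = tf.keras.Model(inputs, outputs)
integrated_cnn.compile(optimizer="adam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
```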

Performance of integrated CNN + Vision Transformers (ViTs)

The results of the integrated CNNs + Vision Transformers (ViTs) for the three models, ADaDR-22, ADaRDEV2I-22 and DaRD-22, are discussed in detail here. Results are reported for the models trained on the original data with 24 classes and on the modified data with 14 classes. The evaluation covers the training and test accuracies together with the F1 measure, recall, precision and AUC. These metrics are listed in Tables 11 and 12, and Figs. 13 and 14 provide further insight into the performance of the models when trained and tested on the two datasets with different hyperparameters.

Table 11 Training and evaluation of integrated CNN + vision transformer on original dataset (24 classes).
Table 12 Training and evaluation of integrated CNN + Vision transformer on modified dataset (14 classes).
Fig. 13 Performance metrics on original dataset (24 classes) for integrated CNNs + ViTs.

Fig. 14 Performance metrics on modified dataset (14 classes) for integrated CNNs + ViTs.

All the integrated CNN + Vision Transformer models were tested on the original dataset of 24 classes and the modified dataset of 14 classes, and their training accuracy, testing accuracy, precision, recall, F1-score and AUC are reported in Tables 11 and 12. The results show that adding Vision Transformers to the CNN models improves diagnostic performance on both datasets. The ADaDR-22 + Vision Transformer ensemble achieves the highest testing accuracy and AUC on both datasets: 89.1% testing accuracy with a 97% AUC on the original dataset, and 93.4% testing accuracy with a 98.8% AUC on the modified dataset. These improvements suggest that the Vision Transformer successfully learns spatial relationships within the colonoscopy images, thereby enhancing the feature extraction of the CNN models.

The radar charts in Fig. 15 show that all metrics improve when the modified dataset is used. The gains in precision, recall and F1-score in particular are evidence that dataset refinement and augmentation enhanced the models' performance. The higher AUC scores on the modified dataset further support the view that reducing class imbalance and removing noise improved the models' ability to classify instances across all classes. In conclusion, the integrated CNNs with Vision Transformers show strong potential to improve the discriminative power of colorectal carcinoma diagnosis, provided the training set is of high quality and properly pre-processed.
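The following sketch illustrates one plausible CNN + Vision Transformer hybrid, in which the CNN feature map is reshaped into spatial tokens and passed through a small self-attention block before classification; the backbone choice, token dimensions and block sizes are assumptions, not the exact ADaDR-22 + ViT architecture.

```python
import tensorflow as tf

def transformer_block(x, num_heads=4, key_dim=64, mlp_dim=256):
    # Self-attention over the token sequence with a residual connection.
    attn = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = tf.keras.layers.LayerNormalization()(x + attn)
    # Position-wise feed-forward network with a second residual connection.
    mlp = tf.keras.layers.Dense(mlp_dim, activation="gelu")(x)
    mlp = tf.keras.layers.Dense(x.shape[-1])(mlp)
    return tf.keras.layers.LayerNormalization()(x + mlp)

inputs = tf.keras.Input(shape=(224, 224, 3))
backbone = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet")
feature_map = backbone(inputs)                                  # (7, 7, 1024) feature map
tokens = tf.keras.layers.Reshape((7 * 7, 1024))(feature_map)    # 49 spatial tokens

x = transformer_block(tokens)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(14, activation="softmax")(x)

cnn_vit = tf.keras.Model(inputs, outputs)
cnn_vit.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
```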

Fig. 15 Performance metrics on original dataset (24 classes) for integrated CNNs + XGBoost.

Performance of integrated CNN + XGBoost

The performance of the integrated CNNs + XGBoost for the three models ADaDR-22, ADaRDEV2I-22 and DaRD-22 is discussed in detail here. Results are compared for the models trained on the 24-class and the 14-class data. The evaluation covers the training and test accuracies together with the F1 measure, recall, precision and AUC. These metrics are listed in Tables 13 and 14, and Figs. 15 and 16 depict the performance of the models when trained and tested on the two datasets with different hyperparameters.

Table 13 Training and evaluation of integrated CNN + XGBoost on original dataset (24 classes).
Table 14 Training and evaluation of integrated CNN + XGBoost on refined dataset (14 classes).
Fig. 16 Performance metrics on modified dataset (14 classes) for integrated CNNs + XGBoost.

The performance metrics of the CNN + XGBoost ensemble models on the original 24-class and the modified 14-class datasets are presented in Tables 13 and 14. Three versions of the model, ADaDR-22 + XGBoost, ADaRDEV2I-22 + XGBoost and DaRD-22 + XGBoost, were tested using training accuracy, testing accuracy, precision, recall, F1-score and AUC. All models improved on the modified dataset, as reflected in their higher testing accuracy, precision, recall and AUC. On the modified dataset, the ADaDR-22 + XGBoost model reached an AUC of 97.8% and a testing accuracy of 92.2%. This shows that model-enhancement measures such as class reduction and dataset refinement are particularly useful for generalisability and for recognising colorectal cancer images.
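A minimal sketch of this kind of CNN + XGBoost pipeline is given below: deep features are extracted with a frozen backbone and then classified by XGBoost; the backbone, data arrays and hyperparameters are placeholders rather than the settings used in this study.

```python
import numpy as np
import tensorflow as tf
from xgboost import XGBClassifier

# Frozen backbone used purely as a feature extractor (global average pooled).
extractor = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                              pooling="avg")

def extract_features(images):
    images = tf.keras.applications.densenet.preprocess_input(images)
    return extractor.predict(images, batch_size=16, verbose=0)

# Dummy stand-in data; in practice these are the CKHK-22 training/testing images.
X_train = np.random.rand(64, 224, 224, 3).astype("float32") * 255
X_test = np.random.rand(16, 224, 224, 3).astype("float32") * 255
y_train = np.arange(64) % 14            # ensures all 14 classes appear in training
y_test = np.random.randint(0, 14, size=16)

F_train = extract_features(X_train)
F_test = extract_features(X_test)

# XGBoost classifies the extracted deep-feature vectors.
clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    objective="multi:softprob", eval_metric="mlogloss")
clf.fit(F_train, y_train)
print("Test accuracy:", clf.score(F_test, y_test))
```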

The performance metrics of the CNN + XGBoost models on the two datasets are compared with line charts in Figs. 17 and 18. The charts show a clear improvement in all indicators when the modified dataset is used: training accuracy, testing accuracy, precision, recall, F1-score and AUC are higher for every model. The ADaDR-22 + XGBoost model performs best on both the original and the modified datasets, with wider margins on the modified dataset. These charts illustrate the overall trend and stability of the models and show how refined datasets can improve classification accuracy, confirming the results from the tables and underlining the importance of clean data in deep learning for medical diagnosis.

Fig. 17 Final comparison of ensemble model performance.

Fig. 18 Training and testing comparison of the best models.

Exploring ensemble model comparisons

In this study, we assessed a large number of ensembles to identify the most effective combinations for colorectal cancer identification on the CKHK-22 dataset. The models compared are ADaDR-22 alone, ADaDR-22 + Vision Transformers and ADaDR-22 + XGBoost. These models were tested on the original dataset and on the 14-class dataset, and their performance was compared using testing accuracy and AUC. The results in Tables 15, 16, 17 and 18 summarise the performance of each model combination, with a focus on the effects of architectural design and dataset refinement on the diagnostic outcome.

Table 15 Best model performance on original dataset (24 classes).
Table 16 Best model performance on refined dataset (14 Classes).
Table 17 Comparison of the best models from the two datasets.
Table 18 Final comparison of the most effective models from both datasets.

Table 15 compares the baseline ADaDR-22 model with its combinations with Vision Transformers and XGBoost on the original data. The combination with Vision Transformers delivers the highest testing accuracy of 89.1% and an AUC of 97%, showing that more complex architectures such as Vision Transformers can improve performance even on the more complicated 24-class dataset.

We then trained all the models on the dataset reduced to 14 classes, and the performance of every model increased. The ADaDR-22 + Vision Transformer model achieves the highest testing accuracy of 93.4% and the highest AUC of 98.8% on the refined dataset, outperforming both the baseline and the XGBoost combination, as presented in Table 16. This improvement shows that dataset refinement is crucial for improving the models and their predictive efficiency.

Tables 17 and 18 present the evaluation of the selected models on the original and modified datasets, with the modified dataset used for both training and testing. The results support the research hypothesis that the refined dataset enhances the efficiency of all model combinations, and they confirm that the proposed method addresses the classification problem effectively, with the ADaDR-22 + Vision Transformer model performing best under the updated conditions.

On the original dataset, the ADaDR-22 + Vision Transformer model had an AUC of 97% and a testing accuracy of 89.1%. These metrics suggest that the model is capable of providing a high level of confidence in its ability to distinguish between normal and carcinoma images. In comparison to XGBoost, the AUC of this model is higher, which implies that Vision Transformers are more effective at capturing the spatial patterns present in the medical imagery.

Figure 17 presents a grouped bar chart comparing the testing accuracy and AUC of all the models on both datasets. The chart shows the compared performance at a glance, making it easier to assess the effect of dataset refinement and model architecture on the diagnostic result.

When comparing the models, the ADaDR-22 + Vision Transformer model consistently outperformed the ADaDR-22 + XGBoost model across both datasets. For example, its AUC was 1% higher than that of the XGBoost ensemble, and its testing accuracy on the modified dataset was 4.3% higher than on the original dataset. This indicates that the Vision Transformer's advantage over XGBoost stems from its capacity to capture spatial relationships.

Effect of training and testing timings of best models

When comparing ADaDR-22 + XGBoost and ADaDR-22 + Vision Transformer on the 14-class refined dataset, both models perform very well and their training and testing times remain within an acceptable range, with the Vision Transformer model performing better than the XGBoost model in all cases. The training time is somewhat longer for ADaDR-22 + Vision Transformer, at 140 min compared with 125 min for ADaDR-22 + XGBoost. Similarly, the testing time of ADaDR-22 + Vision Transformer is 17 min compared with 16 min for ADaDR-22 + XGBoost.

These differences in training and testing time can be attributed to the greater complexity of the Vision Transformer architecture, which enhances the model's ability to learn from the features of the dataset but requires more time to do so. The slightly longer training and testing times are reflected in a higher AUC score, which measures the model's capacity to differentiate between classes. Although the ADaDR-22 + Vision Transformer model executes more slowly, it offers a good balance between runtime and the level of accuracy realistically needed for the intended applications. Table 19 provides the timing details of the two most effective models.

Table 19 Training and testing timings.

The ADaDR-22 + Vision Transformer model is recommended as the best model despite its longer training and testing times, because it captures more features and patterns in the data. The Vision Transformer component enables better feature extraction given the variability observed in medical image datasets, which increases the predictive power and reliability of the model; in clinical applications, mistakes are costly. The marginal increase in computation time is justified by the improvement in model accuracy, making ADaDR-22 + Vision Transformer the preferred model for the refined dataset analysis. Figure 18 compares the training and testing times of the most effective models.

The ADaDR-22 + Vision Transformer ensemble outperforms the other deep learning models, as the CKHK-22 dataset results show, with an AUC of 98.8% and a testing accuracy of 93.4%. This emphasises the effectiveness of integrating global contextual information with local feature extraction, surpassing the performance of conventional CNN models. The DCGAN-assisted augmentation further enhanced the model's capacity to classify imbalanced data, confirming the novelty and robustness of the proposed method.

Results of the statistical significance test

A paired t-test is used to test for a difference in mean values between two related groups. It is particularly helpful for comparing the same models run under different settings, such as a changed dataset. The test determines whether shifts in performance measures such as testing accuracy and AUC reflect genuine changes due to the dataset or merely random noise. Table 20 displays the results of the t-test.

Table 20 Paired T-Test results for model performance metrics.
  • Paired t-test for Testing Accuracy: With a p-value of 0.031 and a T-statistic of 20.50, the paired t-test shows a statistically significant difference in testing accuracy between the original and modified datasets. This indicates that the improvements in testing accuracy observed with the modified dataset are more likely the result of the changes made to the dataset than of random variation.

  • Paired t-test for AUC: A T-statistic of 6.20 and a p-value of 0.102 are obtained. This p-value exceeds the 0.05 threshold, suggesting that the difference in AUC between the original and modified datasets is not statistically significant. Although the AUC increased marginally with the modified dataset, the enhancement is not substantial enough to be classified as statistically significant.

As a result, the alterations to the dataset enhanced the testing accuracy of the models, and although the AUC differences did not reach statistical significance, the trend favoured the models trained on the refined data. This could suggest that the models are simply more accurate on average, while AUC, which measures the model's ability to distinguish between classes, has not improved to the same extent.
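For reference, the snippet below shows how such a paired t-test can be computed with SciPy; the metric values are placeholders standing in for the paired per-model results on the original and modified datasets.

```python
from scipy.stats import ttest_rel

# Placeholder paired metric values (e.g. per-model testing accuracy in %).
acc_original = [87.3, 85.1, 84.6]   # the three models on the original dataset
acc_modified = [91.5, 89.8, 88.9]   # the same models on the modified dataset

t_stat, p_value = ttest_rel(acc_modified, acc_original)
print(f"T-statistic = {t_stat:.2f}, p-value = {p_value:.3f}")
# p < 0.05 would indicate a statistically significant change in the metric.
```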

Effect of augmentation with DCGAN for CKHK-22 dataset

The quality and number of training images in the CKHK-22 dataset were enhanced by applying DCGAN (Deep Convolutional Generative Adversarial Networks) for data augmentation. This approach is useful in medical image analysis, for instance in diagnosing colorectal cancer. DCGAN helps to address two major problems that can occur with the given dataset, overfitting and class imbalance, as it generates images closely resembling the original data and thereby improves model performance. The increased number of samples improved the generalisation of the models, and testing accuracy, precision, recall, F1-score and AUC were all higher after augmentation than before it. Statistical evaluation therefore confirms that DCGAN enhances the performance of deep learning models for medical diagnosis, which matters for clinical use, where reliability and accuracy are essential. The augmented image distributions for the original and modified CKHK-22 datasets are presented as bar charts in Figs. 19 and 20.
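A compact sketch of a DCGAN of this kind is shown below, with a transposed-convolution generator and a convolutional discriminator; the 64 × 64 output size, filter counts and the omitted adversarial training loop are simplifications and assumptions, not the exact configuration used for CKHK-22.

```python
import tensorflow as tf

LATENT_DIM = 100

def build_generator():
    # Upsamples a latent vector to a 64x64 RGB image via transposed convolutions.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(LATENT_DIM,)),
        tf.keras.layers.Dense(8 * 8 * 256, use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Reshape((8, 8, 256)),
        tf.keras.layers.Conv2DTranspose(128, 4, strides=2, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Conv2DTranspose(64, 4, strides=2, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Convolutional binary classifier distinguishing real from generated images.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(0.2),
        tf.keras.layers.Conv2D(128, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

generator = build_generator()
discriminator = build_discriminator()

# After adversarial training, synthetic minority-class images are sampled
# from random latent vectors to rebalance the dataset.
noise = tf.random.normal([16, LATENT_DIM])
fake_images = generator(noise)          # shape (16, 64, 64, 3), values in [-1, 1]
print(fake_images.shape)
```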

Fig. 19 The augmented original CKHK-22 dataset with DCGAN.

Fig. 20 The augmented modified CKHK-22 dataset with DCGAN.

The DCGAN-based augmentation and the refinement of the dataset from 24 to 14 classes significantly enhanced the model's accuracy. On the modified dataset, the ADaDR-22 + Vision Transformer exceeded its result on the original dataset by 4.3%, achieving an accuracy of 93.4%. This suggests that dataset refinement reduced noise and enhanced the model's generalisability across various types of carcinoma.

The best model performance metrics

The proposed ADaDR-22 + Vision Transformer model performs better than the other models on the classification tasks for the modified CKHK-22 dataset because its architecture combines the feature-extraction ability of convolutional neural networks with the attention mechanism of Vision Transformers. The model therefore extracts local features effectively while also maintaining a more global view, which is very helpful in medical image analysis. The Vision Transformer block helps the model learn the most important information in each image, improving performance and the ability to handle the more diverse and complex classes of the dataset. The detailed results of ADaDR-22 + Vision Transformer are presented in Table 21 and Fig. 21.

Table 21 In-depth analysis of ADaDR-22 + Vision Transformer (modified dataset).
Fig. 21 Performance metrics and support for ADaDR-22 + Vision Transformer.

The detailed metrics on the modified dataset, presented in Table 21, highlight the ADaDR-22 + Vision Transformer model's strong performance across a diverse array of classes, with consistently high accuracy, F1 score, recall and precision. The model attains an accuracy of over 93% in all classes, with particularly impressive performance in classes such as "Ulcerative-Colitis-Grade-0–1" and "Z-Line", where the F1 scores reach 94.3% and 94.2%, respectively. These high values suggest that the model is well balanced and manages both true positives and false positives effectively, thereby reducing the risk of misclassification. The recall values, all of which exceed 93%, further underscore the model's ability to identify positive cases accurately, which is essential for early detection in medical diagnostics.

The model’s robustness is further emphasised by its performance on more complex and variable classes, such as "Esophagitis-A" and "Dyed-Lifted-Polyps," where it maintains Precision and Recall values above 93%. This implies that the model is capable of reliably distinguishing minute variations in medical images. The model’s consistent support across these classes, with sample sizes ranging from 300 to 800, indicates that it is not only accurate but also reliable across distinct class sizes. This renders the ADaDR-22 + Vision Transformer model a highly viable option for real-world medical applications, where the achievement of high consistency and reliability is essential for the enhancement of patient outcomes and the formulation of clinical decisions.

The substantial enhancement in accuracy and AUC in our study, particularly through the integration of Vision Transformers, addresses a critical challenge in medical image classification: capturing spatial relationships in intricate anatomical structures. The superior performance of the ADaDR-22 + Vision Transformer ensemble illustrates its potential as a dependable instrument for the early detection of colorectal carcinoma, which could result in more precise and expeditious diagnoses in clinical practice.

Confusion matrices and ROC curves

The confusion matrix of the ADaDR-22 + Vision Transformer model on the modified dataset, shown in Fig. 22, summarises the model's performance in classifying the 14 classes. The large values on the diagonal indicate that, for each class, most instances were classified correctly, which is evidence of the model's discriminative ability. The off-diagonal elements are small, meaning the model misclassified samples only rarely, which is important in a clinical context where accuracy is essential. These results confirm that the model assigns the correct class labels with only slight variation in efficacy across the classes. The confusion matrix also shows that false positives and false negatives have been reduced, which is beneficial for the accurate classification of medical images.

Fig. 22 Confusion matrix for ADaDR-22 + Vision Transformer (14 classes).

The ROC curves of the 14 classes given in Fig. 23 also support the model's discriminative ability. Each curve plots the true positive rate on the y-axis against the false positive rate on the x-axis for one class, and the area under each curve summarises the model's performance for that class. The ROC curves show reasonably good performance for all classes, with per-class AUC values ranging from 0.61 to 0.85. This means the ADaDR-22 + Vision Transformer model can distinguish the different classes across a range of scenarios. Since the AUC values of most classes are relatively high, the model is useful for distinguishing true positive cases from false positives and can therefore support disease diagnosis. ROC analysis complements the confusion matrix by giving a more detailed view of model performance, in particular its sensitivity and specificity at various thresholds.
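The confusion matrix and per-class one-vs-rest ROC curves can be reproduced as sketched below with scikit-learn; the labels and probability scores are synthetic placeholders standing in for the ensemble's 14-class test outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

n_classes = 14
rng = np.random.default_rng(0)
y_true = rng.integers(0, n_classes, size=500)
y_score = rng.random((500, n_classes))
y_score /= y_score.sum(axis=1, keepdims=True)    # pseudo-probabilities per class
y_pred = y_score.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred)             # rows: true class, cols: predicted
print(cm.shape)

# One-vs-rest ROC curve and AUC for each of the 14 classes.
y_true_bin = label_binarize(y_true, classes=range(n_classes))
for c in range(n_classes):
    fpr, tpr, _ = roc_curve(y_true_bin[:, c], y_score[:, c])
    print(f"class {c}: AUC = {auc(fpr, tpr):.2f}")
```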

Fig. 23 ROC curve for ADaDR-22 + Vision Transformer (14 classes).

Ablation study

An ablation study measures the effectiveness of each component and strategy in a machine learning model. By deleting or changing individual elements of the model, it is possible to determine which of them has the greatest impact on performance. In this work, an ablation study is performed on the ADaDR-22 + Vision Transformer model to identify the contribution of each part and of the training procedure to its high accuracy and stability. The details of the ablation study for the ADaDR-22 + Vision Transformer model are presented in Table 22.

Table 22 Ablation study of ADaDR-22 + Vision transformer model.

The ablation study of the ADaDR-22 + Vision Transformer model shows that each component and training strategy is crucial for the final result. The Vision Transformer enhances the model's ability to identify medical images, while the ADaDR-22 architecture is responsible for feature extraction. Data augmentation and regularisation are very helpful in ensuring that the model does not overfit and performs well on new data. The results also highlight the importance of developing a model with high accuracy and reliability for the classification of medical images, and the ADaDR-22 + Vision Transformer model can be recommended for practical use in clinical settings. Table 23 presents the comprehensive ablation study, including the Vision Transformer, XGBoost, the integrated CNN (ADaDR-22) and the ensemble (ADaDR-22 + ViT).

Table 23 The ablation study.

Model robust testing

Additional data augmentation techniques, including SMOTE, were applied to the training set to further evaluate the robustness of the proposed models. This led to a modest increase in recall for under-represented classes. The models were also evaluated under a variety of image conditions, including changes in luminance and contrast, in order to replicate the real-world variability found in medical images. The ADaDR-22 + Vision Transformer maintained stable performance with only a 1.2% decrease in accuracy, demonstrating the model's robustness to noise.
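As an illustration of the SMOTE step, the sketch below oversamples an imbalanced training set with imbalanced-learn; the feature arrays and class counts are synthetic stand-ins, and in this study SMOTE was applied only to the training data.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X_train = rng.random((300, 1024))                       # e.g. flattened feature vectors
y_train = np.concatenate([np.zeros(250), np.ones(50)])  # imbalanced binary labels

# SMOTE synthesises new minority-class samples by interpolating between neighbours.
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)

print("before:", Counter(y_train))   # {0.0: 250, 1.0: 50}
print("after :", Counter(y_res))     # classes balanced after oversampling
```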

Discussion

Combining Vision Transformers and XGBoost with conventional CNNs has recently been shown to greatly enhance the performance of deep learning models, especially in the medical field. This work compares the current state-of-the-art methods, namely ADaDR-22, ADaRDEV2I-22 and DaRD-22 combined with Vision Transformers and XGBoost, with the earlier works of Akella S. Narasimha Raju. The enhanced methods are applied in this analysis to make the improvements in accuracy, AUC and computational efficiency apparent.

In the recent work of Akella S. Narasimha Raju, the authors proposed models that combine Vision Transformers and XGBoost, and these have shown better results than conventional CNN models. The present results indicate that the proposed ADaDR-22 + Vision Transformer model outperforms the prior setups by a significant margin. The incorporation of Vision Transformers has enhanced feature extraction, leading to higher testing accuracy and AUC scores, especially on the enhanced dataset.

The testing accuracy on the original dataset was considerably lower than that on the refined dataset, which shows that reducing the number of classes in the refined dataset allowed the model to learn and perform well. The training and testing times have also improved, since the models that incorporate Vision Transformers and XGBoost remain computationally efficient despite their complexity.

In the comparison Table 24 and Fig. 24, the efficacy of a variety of state-of-the-art methods for the detection of colorectal cancer is demonstrated, including the recently introduced models by Akella S Narasimha Raju. The ADaDR-22 + Vision Transformer model that was the subject of the current study obtained an impressive accuracy of 93.40% on the CKHK-22 Modified Dataset. This accuracy is still remarkable, even though it is lower than some other models in the table, given the complexity of the dataset and the robustness necessary for real-world medical diagnostics.

Table 24 Comparison of state-of-the-art methods with CADxPolyDetect.
Fig. 24 Comparison of state-of-the-art methods with the present study.

In terms of durations, the current model, ADaDR-22 + Vision Transformer, has a training time of 140 min and a testing time of 17 min. This is comparable to other high-performing models, such as the Ensemble CADx CKHK-22 mixed dataset model, which has a similar training duration but slightly higher accuracy. The extended training time is likely a result of the additional computations required by the attention mechanisms introduced with the Vision Transformers in the current model.

Overall, the ADaDR-22 + Vision Transformer model is emphasised for its capacity to maintain robust accuracy while managing intricate and diverse datasets, rendering it a valuable addition to the current corpus of research in the field of colorectal cancer detection. This model is adept at striking a balance between computational efficiency and accuracy, rendering it appropriate for use in clinical settings that prioritise precision and reliability.

Figure 24 shows how this research compares to others that have looked at colorectal cancer detection in the past.

The primary contribution of this work is the integration of CNNs with Vision Transformers and XGBoost, a method not used in previous studies for the detection of colorectal carcinoma, to the best of our knowledge. This innovative architecture effectively combines the self-attention mechanisms of Vision Transformers with the strengths of convolutional neural networks in feature extraction, leading to enhanced local and global feature recognition. XGBoost further refines the classification process, making our approach more computationally efficient and accurate compared to currently available deep learning methods. We illustrate the efficacy of this ensemble method with its enhancements in accuracy (93.4%) and AUC (98.8%) compared to conventional models.

Numerous studies have approached the application of deep learning models to the detection of colorectal carcinoma, with varying degrees of success. Using a CNN-based CADx system for CRC detection, Attallah et al. obtained an accuracy of 89.6%, whereas Nogueira-Rodriguez et al. reported an accuracy of 90.4% with CNN architectures for polyp detection. The much higher accuracy of 93.4% on the modified dataset showed that our ADaDR-22 + Vision Transformer ensemble was better at extracting and interpreting spatial features from colonoscopy images.

While our proposed method focuses on colorectal cancer detection, recent studies in skin cancer detection, such as SNC_Net25 and DVFNet26, have successfully demonstrated the potential of combining handcrafted and deep learning-based features [Author, Year]. Our approach, integrating CNNs with Vision Transformers and XGBoost, builds on these ideas by utilising advanced feature extraction and ensemble methods, achieving higher accuracy in colorectal carcinoma detection. Additionally, similar to the multi-model fusion technique60,61,62 applied to skin cancer, our method leverages multiple models to improve classification performance [Ahmad Naeem et al., 2023; Radhika V et al.]. Table 25 provides a comprehensive examination of the model's limitations and the mitigation strategies employed in this paper.

Table 25 Summary of model limitations and mitigation strategies.

The future directions for research

Future studies should therefore be directed at establishing the generalisability of the proposed ensemble model across diverse patient datasets, an important step toward clinical adoption. This involves validating the model's performance on external multi-centre datasets that represent various demographic, geographical and imaging conditions, so as to ensure robustness. In addition, integrating multi-modal data, such as patient history, biomarkers or genetic profiles, with colonoscopy images could further enhance diagnostic accuracy and clinical relevance. Optimising the model for better computational efficiency is another promising avenue, enabling real-time deployment in resource-constrained environments such as mobile and edge devices. Finally, research on advanced augmentation techniques such as StyleGAN and on semi-supervised learning could improve model performance with very limited labelled data, while advanced visualisation tools for the Vision Transformers will improve interpretability and yield clinically actionable insights. Together, these directions will help to deliver more accurate, scalable and interpretable diagnostic solutions.

Clinical relevance

It is essential to have the capacity to accurately detect colorectal carcinoma in clinical settings, as early detection has a substantial impact on the prognosis of patients. In comparison to earlier models, the ADaDR-22 + Vision Transformer ensemble is a more reliable diagnostic tool, with an AUC of 98.8%. This could lead to fewer false negatives and more timely interventions.

The results of this study not only illustrate the effectiveness of integrating Vision Transformers with CNNs for the detection of colorectal carcinoma, but they also offer potential avenues for future research. Despite the challenging dataset, the model’s high accuracy suggests that Vision Transformers could improve diagnostic tools for various medical image analysis tasks.

The proposed method, which integrates CNNs, Vision Transformers and XGBoost, demonstrates significant strengths in the detection of colorectal carcinoma. The model achieves high accuracy and AUC on the modified dataset by using CNNs to extract features, Vision Transformers to capture long-range dependencies and XGBoost for efficient classification. This hybrid methodology capitalises on the advantages of each model, rendering it exceedingly adept at medical imaging tasks. However, the method is subject to certain constraints, including the high computational cost of its intricate architecture, which requires adequate training resources. To ensure that the model works in real-life clinical settings, it needs to be tested on larger and more varied datasets, since its usefulness may depend on the quality and size of the dataset.

Conclusion

In this work, we reviewed the most up-to-date approaches used in the detection of colorectal cancer, especially the integration of CNNs with other models including Vision Transformers and XGBoost. The main aim of this study was to assess the effectiveness of these models in terms of accuracy, AUC and computation time when applied to a complex dataset of 14 classes extracted from the CKHK-22 dataset.

The results show that the ADaDR-22 + Vision Transformer model outperforms the other configurations, such as the ADaDR-22 + XGBoost model, with a testing accuracy of 93.4% and an AUC of 98.8%. This improvement is due to the Vision Transformer's capacity to use global contextual information, which improves the classification of medical images. Although it takes more time to train and test, with a training time of 140 min and a testing time of 17 min, the ADaDR-22 + Vision Transformer model offers improved predictive power, which is important for clinical practice.

The work also demonstrates the usefulness of the DCGAN-based data augmentation approach in enhancing the stability of the models. The expanded dataset made the models perform better and prevented them from fitting only the training data and performing poorly on new data. In addition, combining two different model architectures, for example CNNs with Vision Transformers or CNNs with XGBoost, was shown to be beneficial for achieving better accuracy and interpretability.

Although some models, such as GastroCADx Kvasir2, report slightly higher accuracy rates, the trade-off between computational time and model accuracy of the ADaDR-22 + Vision Transformer model makes it well suited for further enhancement. This paper demonstrates the model's ability to maintain a high level of accuracy and to work with complex and diverse datasets, and it therefore offers the model as a reliable instrument for the early diagnosis of colorectal cancer, which is crucial for better patient prognosis.

This work thus adds to the literature on medical image analysis and, specifically, on the use of deep learning models for disease diagnosis. It shows that state-of-the-art architectures such as Vision Transformers can outperform traditional CNN models.

To conclude, this investigation introduces a novel deep learning ensemble for the detection of colorectal carcinoma that incorporates XGBoost, Vision Transformers, and CNNs. This groundbreaking methodology effectively resolves critical obstacles in medical imaging, including the complexity of feature extraction and class imbalance, and establishes a new standard for colorectal cancer diagnosis. We will further refine the model and investigate its applicability to other medical imaging domains in the future.

Future work

Future work will extend this study by combining Vision Transformers with other lightweight deep learning architectures such as EfficientNet or MobileNet to find the best balance between performance and complexity. It is also possible to enhance the general performance of the model on different medical datasets with the help of modern generative models such as StyleGAN for realistic data augmentation. A major disadvantage of the current approach is its high training and testing time; to mitigate this, knowledge distillation techniques can be used, in which a smaller and faster model is trained to mimic the larger and more complex one. Cloud-based data storage and the use of Bayesian methods to fine-tune the hyperparameters will also help to cut down computation time. These approaches aim to establish a model that is not only accurate but also fast, and thus functional in real-world clinical environments.