Table 25 Summary of model limitations and mitigation strategies.

From: Exploring vision transformers and XGBoost as deep learning ensembles for transforming carcinoma recognition

| Limitation | Description | Proposed Mitigation |
|---|---|---|
| Computational Resources | Requires high-performance GPUs with significant memory (16 GB+) for training and inference | Employ model compression techniques or explore distributed training to optimize hardware utilization |
| Training Time | Longer training times due to the complexity of the ensemble model integrating multiple components | Optimize training pipelines and consider reducing the number of ensemble components without sacrificing accuracy |
| Overfitting | Higher risk of overfitting, especially on smaller datasets, due to the complexity of the model | Utilize regularization techniques (e.g., dropout, L2), extensive data augmentation, and cross-validation |
| Interpretability | Reduced interpretability compared with simpler models, even with techniques like Grad-CAM | Extend interpretability methods to cover all components, such as visualization tools for ViT and XGBoost |
| Scalability | Limited scalability to resource-constrained environments like mobile or edge devices | Develop lightweight versions of the ensemble using knowledge distillation or pruning methods |
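To make the Overfitting row concrete, below is a minimal NumPy sketch of two of the regularization techniques it names: inverted dropout and an L2 weight penalty. The function names, shapes, and hyperparameter values are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale the survivors by 1/(1-p), so the expected
    activation is unchanged and no scaling is needed at inference."""
    if not training or p == 0.0:
        return x  # identity at inference time
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def l2_penalty(weights, lam=1e-4):
    """L2 regularization term added to the loss: lam * sum of squared
    weights, which discourages large parameter values."""
    return lam * sum(np.sum(w ** 2) for w in weights)
```

In a training loop, the penalty would simply be added to the task loss (e.g., `loss = cross_entropy + l2_penalty(params)`), while `dropout` would wrap intermediate activations with `training=True` only during fitting.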