Figure 4

Grad-CAM visualization of the features extracted from seven polyp images by ResNet50 with different weights. Regions with higher transparency indicate where the backbone network pays more attention. The orange bounding boxes mark the ground-truth polyp locations in each image. The green and red bounding boxes denote accurate and erroneous predictions, respectively, generated by the different methods for each image. FPSiam activates the polyp regions more accurately in the attention maps and therefore makes more precise bounding box predictions. (a–c,e,g) FPSiam localizes bounding boxes more precisely than the TL method. The TL method fails to locate polyp regions in some cases: (d) presence of yellow intestinal fluid, a large bubble, and reflections; (e) presence of folded intestinal walls; (f) presence of folded intestinal walls and dense foam.