Table 3 Comparison of different encoder combinations on the TCE-S dataset.
From: Combined query embroidery image retrieval based on enhanced CNN and blend transformer
Image encoder | Text encoder | EParams (M) | R@1 | R@10 | R@50 | mAP |
---|---|---|---|---|---|---|
ResNet-18 + ViT | BERT | 207.2 | 46.25 | 56.76 | 81.54 | 53.45 |
ResNet-34 + ViT | BERT | 218.7 | 46.37 | 56.87 | 81.71 | 53.51 |
ResNet-50 + ViT | BERT | 221.6 | 46.42 | 56.90 | 81.75 | 53.58 |
ConvNeXt + ViT | BERT | 232.2 | 46.50 | 57.02 | 81.78 | 53.65 |
ResNet-18 + ViT | RoBERTa | 432.2 | 46.45 | 56.85 | 81.77 | 53.53 |
ResNet-18 + Swin-T | RoBERTa | 374.2 | 44.52 | 54.64 | 79.73 | 51.38 |
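
The trade-off between encoder size and retrieval quality in the table can be checked with a short script. The sketch below is illustrative only: it copies the rows from Table 3 and reports each configuration's change in parameter count, R@1, and mAP relative to the first row (ResNet-18 + ViT with BERT). It assumes the EParams column gives the encoder parameter count in millions, which the table does not spell out; the names `ROWS` and `deltas_vs_baseline` are hypothetical helpers, not part of the paper.

```python
# Rows copied from Table 3:
# (image encoder, text encoder, EParams in M (assumed), R@1, R@10, R@50, mAP)
ROWS = [
    ("ResNet-18 + ViT", "BERT", 207.2, 46.25, 56.76, 81.54, 53.45),
    ("ResNet-34 + ViT", "BERT", 218.7, 46.37, 56.87, 81.71, 53.51),
    ("ResNet-50 + ViT", "BERT", 221.6, 46.42, 56.90, 81.75, 53.58),
    ("ConvNeXt + ViT", "BERT", 232.2, 46.50, 57.02, 81.78, 53.65),
    ("ResNet-18 + ViT", "RoBERTa", 432.2, 46.45, 56.85, 81.77, 53.53),
    ("ResNet-18 + Swin-T", "RoBERTa", 374.2, 44.52, 54.64, 79.73, 51.38),
]


def deltas_vs_baseline(rows, baseline_index=0):
    """Return per-row differences in parameters, R@1 and mAP from one baseline row."""
    base = rows[baseline_index]
    result = []
    for image_enc, text_enc, params, r1, _r10, _r50, m_ap in rows:
        result.append({
            "encoders": f"{image_enc} / {text_enc}",
            "d_params": round(params - base[2], 2),  # extra parameters (M)
            "d_r1": round(r1 - base[3], 2),          # change in R@1 (points)
            "d_map": round(m_ap - base[6], 2),       # change in mAP (points)
        })
    return result


if __name__ == "__main__":
    for row in deltas_vs_baseline(ROWS):
        print(f'{row["encoders"]:<30} '
              f'params {row["d_params"]:+7.1f} M  '
              f'R@1 {row["d_r1"]:+.2f}  mAP {row["d_map"]:+.2f}')
```

Read this way, the differences among the CNN backbones paired with ViT and BERT are small (a few tenths of a point), while the Swin-T row is the only configuration that falls below the baseline on every metric.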