Table 3 Comparison of different encoders on the TCE-S dataset.

From: Combined query embroidery image retrieval based on enhanced CNN and blend transformer

| Image encoder | Text encoder | EParams/M | R@1 | R@10 | R@50 | mAP |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 + ViT | BERT | 207.2 | 46.25 | 56.76 | 81.54 | 53.45 |
| ResNet-34 + ViT | BERT | 218.7 | 46.37 | 56.87 | 81.71 | 53.51 |
| ResNet-50 + ViT | BERT | 221.6 | 46.42 | 56.90 | 81.75 | 53.58 |
| ConvNeXt + ViT | BERT | 232.2 | 46.50 | 57.02 | 81.78 | 53.65 |
| ResNet-18 + ViT | RoBERTa | 432.2 | 46.45 | 56.85 | 81.77 | 53.53 |
| ResNet-18 + Swin-T | RoBERTa | 374.2 | 44.52 | 54.64 | 79.73 | 51.38 |
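
For readers unfamiliar with the R@K and mAP columns, the following is a minimal sketch of how such retrieval metrics are commonly computed. It is not the authors' evaluation code: the function names (`recall_at_k`, `average_precision`, `evaluate`) and the toy data are hypothetical, and it assumes the common convention that a query counts as a hit for R@K when at least one relevant item appears in its top K results. The paper may define the metrics differently.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Return 1.0 if any relevant item is in the top-k results, else 0.0 (assumed convention)."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision(ranked: Sequence[int], relevant: Set[int]) -> float:
    """Average the precision values at each rank where a relevant item is retrieved."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def evaluate(
    rankings: List[Sequence[int]],
    relevants: List[Set[int]],
    ks: Sequence[int] = (1, 10, 50),
) -> Dict[str, float]:
    """Average the per-query scores over all queries and report them as percentages."""
    n = len(rankings)
    scores = {
        f"R@{k}": 100.0 * sum(recall_at_k(r, rel, k) for r, rel in zip(rankings, relevants)) / n
        for k in ks
    }
    scores["mAP"] = 100.0 * sum(average_precision(r, rel) for r, rel in zip(rankings, relevants)) / n
    return scores


# Toy example (hypothetical data): two queries, each with one relevant gallery item.
toy_rankings = [[3, 1, 2], [2, 3, 1]]
toy_relevants = [{1}, {2}]
print(evaluate(toy_rankings, toy_relevants))
```

In the toy example the first query finds its relevant item at rank 2 and the second at rank 1, so only one of the two queries scores a hit at K = 1, while both do for larger K.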