Table 3 Results of ablation studies with different metrics on public and private datasets.

From: Automatic detecting multiple bone metastases in breast cancer using deep learning based on low-resolution bone scan images

| Model | Value | AP | Precision | Recall |
|---|---|---|---|---|
| Public dataset | | | | |
| Backbone | Mean ± Std | 60.5 ± 3.9% | 64.2 ± 7.8% | 40.5 ± 8.7% |
| | Median | 59.2% | 64.1% | 38.2% |
| | 95% CI | 57.4%, 63.6% | 60.7%, 68.2% | 33.5%, 47.4% |
| +DH module | Mean ± Std | 66.9 ± 3.2% | 67.6 ± 1.6% | 59.0 ± 4.3% |
| | Median | 67.2% | 69.8% | 59.7% |
| | 95% CI | 64.3%, 69.5% | 65.3%, 70.9% | 55.6%, 62.4% |
| | Effect sizes | 1.791 | 1.267 | 2.171 |
| | p value | *\(p<0.025\) | *\(p<0.025\) | *\(p<0.025\) |
| +ST_Encoder | Mean ± Std | 68.7 ± 3.2% | 69.9 ± 2.9% | 61.9 ± 4% |
| | Median | 68.5% | 69.4% | 63% |
| | 95% CI | 66.2%, 71.3% | 67.6%, 72.2% | 58.7%, 65.1% |
| | Effect sizes | 2.79 | 1.528 | 2.758 |
| | p value | *\(p<0.025\) | *\(p<0.025\) | *\(p<0.025\) |
| Private dataset | | | | |
| Backbone | Mean ± Std | 33.3 ± 2.0% | 53.6 ± 3.5% | 13.0 ± 5.3% |
| | Median | 33.5% | 54.0% | 11.6% |
| | 95% CI | 31.7%, 34.9% | 50.8%, 56.3% | 8.8%, 17.3% |
| +DH module | Mean ± Std | 47.6 ± 4.4% | 59.2 ± 4.5% | 49.6 ± 4.4% |
| | Median | 48.3% | 59.1% | 50.2% |
| | 95% CI | 44.1%, 51.2% | 55.6%, 62.9% | 46.0%, 53.1% |
| | Effect sizes | 1.176 | 2.347 | 8.594 |
| | p value | *\(p<0.0125\) | *\(p<0.0125\) | *\(p<0.0125\) |
| +ST_Encoder | Mean ± Std | 49.5 ± 5.3% | 56.7 ± 7.0% | 52.1 ± 6.6% |
| | Median | 49.7% | 54.9% | 52.3% |
| | 95% CI | 45.2%, 53.7% | 51.1%, 62.3% | 47.8%, 55.3% |
| | Effect sizes | 3.428 | 1.987 | 4.571 |
| | p value | *\(p<0.0125\) | *\(p<0.0125\) | *\(p<0.0125\) |
| +PAE module | Mean ± Std | 52.6 ± 6.1% | 61.3 ± 6.3% | 52.8 ± 4.5% |
| | Median | 52.6% | 61.9% | 52.9% |
| | 95% CI | 46.7%, 56.5% | 56.3%, 66.3% | 48.2%, 56.4% |
| | Effect sizes | 3.253 | 1.411 | 6.303 |
| | p value | *\(p<0.0125\) | *\(p<0.0125\) | *\(p<0.0125\) |
| +T_Decoder | Mean ± Std | 55.0 ± 6.4% | 62.0 ± 6.4% | 54.3 ± 4.2% |
| | Median | 54.8% | 62.9% | 54.2% |
| | 95% CI | 49.9%, 60.1% | 56.9%, 67.1% | 50.9%, 58.6% |
| | Effect sizes | 3.961 | 1.313 | 6.854 |
| | p value | *\(p<0.0125\) | *\(p<0.0125\) | *\(p<0.0125\) |

  1. The metrics shown are AP, precision and recall. Significance is tested between the Backbone and each modified model. For the public dataset, *\(p<0.025\), Bonferroni-adjusted Wilcoxon signed rank test. For the private dataset, *\(p<0.0125\), Bonferroni-adjusted Wilcoxon signed rank test.
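
The two significance thresholds marked in the table (0.025 and 0.0125) are consistent with a Bonferroni correction applied per dataset. The sketch below reproduces that arithmetic. It assumes a family-wise significance level of 0.05, which the table does not state, and it counts one comparison against the Backbone for each modified model; the function name `bonferroni_threshold` is illustrative and not taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumption: family-wise alpha of 0.05 (not stated in the table).
ALPHA = 0.05

# Modified models compared against the Backbone, as listed in the table.
modifications = {
    "public": ["+DH module", "+ST_Encoder"],
    "private": ["+DH module", "+ST_Encoder", "+PAE module", "+T_Decoder"],
}

thresholds = {
    dataset: bonferroni_threshold(ALPHA, len(models))
    for dataset, models in modifications.items()
}

for dataset, threshold in thresholds.items():
    print(f"{dataset}: p < {threshold:g}")
```

Under these assumptions, two comparisons on the public dataset give a threshold of 0.05 / 2 and four comparisons on the private dataset give 0.05 / 4, matching the values marked with an asterisk in the table.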