Fig. 2: Performance evaluation on publicly available patch datasets.

a Comparison of average patient-level accuracy on the BreakHis dataset. Values are the mean and standard deviation of classification accuracy across all magnifications. b As in (a), but for average image-level accuracy across all magnifications. Training and testing sets were split independently for each method; “*” indicates significance (p < 0.05) and “ns” indicates non-significance, calculated using a two-sided, two-sample independent t-test. c, d Performance of different methods at the patient level and image level, respectively, at magnifications of 40\(\times\), 100\(\times\), 200\(\times\), and 400\(\times\). Bar charts show results from five repeated experiments using the same training and testing data (n = 5). e Performance of different algorithms on the LC25000 dataset. The number below each model name indicates its parameter count; “Unknown” denotes models for which the parameter count is unavailable. f Confusion matrix on the validation set of the LC25000 dataset. All data are presented as mean values ± SD. Source data are provided with this paper.
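
For readers who wish to reproduce the “*” / “ns” labels from the source data, the following is a minimal sketch of a two-sided, two-sample independent t-test using only the Python standard library. It assumes the equal-variance (pooled) Student form, which the caption does not specify; the function names and the accuracy values in the usage note are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def independent_t_test(x, y):
    """Pooled-variance t-test (assumption: equal variances). Returns (t, df, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df, two_sided_p(t, df)


def significance_label(p, alpha=0.05):
    """Map a p-value to the caption's notation: "*" if p < alpha, else "ns"."""
    return "*" if p < alpha else "ns"
```

As a usage example, calling `independent_t_test(acc_a, acc_b)` on two lists of five per-run accuracies (one list per method, matching n = 5) returns the t statistic, the degrees of freedom (8 for two groups of five), and the p-value, which `significance_label` converts to the notation used in the figure. If the variances of the two methods differ substantially, Welch's unequal-variance form may be the more appropriate choice.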