Table 2 Summary of results for all experiments, in which the models are trained on either batch and tested on the same batch or on the other batch (cross-batch). We perform hypothesis testing in which the null hypothesis states that the model testing accuracy with color normalization is the same as with the original H&E images, and the alternative hypothesis states that the model testing accuracy with color normalization is better than with the original H&E images. p-values are given in parentheses.

From: Impact of stain variation and color normalization for prognostic predictions in pathology

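| Training set and method | Testing set: Batch A | Testing set: Batch B |
| --- | --- | --- |
| **Train on Batch A** | | |
| Original H&E | 0.81 | 0.53 |
| Traditional method | 0.96 (\(p = 0.010\)) | 0.60 |
| Generative method | 0.93 (\(p = 0.069\)) | 0.61 |
| **Train on Batch B** | | |
| Original H&E | 0.52 | 0.74 |
| Traditional method | 0.58 | 0.88 (\(p = 0.033\)) |
| Generative method | 0.52 | 0.87 (\(p = 0.033\)) |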

  1. \(H_{0}\): The model testing accuracy with color normalization is the same as with the original H&E data.
  2. \(H_{1}\): The model testing accuracy with color normalization is better than with the original H&E data.
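
The table does not state which statistical test produced the p-values. As one possible illustration of a one-sided comparison of this kind, the sketch below implements an exact paired sign-flip permutation test. It assumes that paired per-run accuracies are available for the normalized and the original images; the function name `one_sided_paired_permutation_test` and the example accuracies are hypothetical and are not taken from the paper.

```python
from itertools import product


def one_sided_paired_permutation_test(normalized, original):
    """Exact sign-flip permutation test on paired accuracies.

    H0: the mean paired difference (normalized - original) is zero.
    H1: the mean paired difference is greater than zero.
    Returns the one-sided p-value.
    """
    if len(normalized) != len(original):
        raise ValueError("Both inputs must contain the same number of runs.")
    if not normalized:
        raise ValueError("At least one paired run is required.")

    diffs = [n - o for n, o in zip(normalized, original)]
    observed = sum(diffs)

    # Under H0 each paired difference is equally likely to be positive or
    # negative, so enumerate every possible assignment of signs.
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = sum(s * d for s, d in zip(signs, diffs))
        # Small tolerance guards against floating-point rounding.
        if statistic >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


if __name__ == "__main__":
    # Hypothetical per-run accuracies, for illustration only.
    acc_normalized = [0.90, 0.95, 0.92, 0.97, 0.94]
    acc_original = [0.80, 0.82, 0.79, 0.83, 0.81]
    p_value = one_sided_paired_permutation_test(acc_normalized, acc_original)
    print(f"one-sided p-value: {p_value:.4f}")
```

With \(n\) paired runs the smallest attainable p-value is \(1/2^{n}\), so the number of runs limits how much evidence against \(H_{0}\) such a test can express. The enumeration grows as \(2^{n}\) and is practical only for a small number of runs.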