Fig. 2: Simulation-supervised deep neural network for the integrated microscope.
From: Large depth-of-field ultra-compact microscope by progressive optimization and deep learning

a Illustration of the simulation-supervision strategy for generating training pairs. To create training pairs between an all-in-focus image and a depth-coded image from the integrated microscope, a commercial microscope combined with a piezo objective scanner was used to capture focal stacks of 3D samples (Supplementary Fig. 13). Two regions of a tilted sample, separated axially by approximately 300 µm, are delineated by blue and green boxes. The region in the green box, located in the top-right corner of the image, gradually comes into focus as the imaging focal plane approaches the sample; conversely, the region in the blue box, located in the bottom-left corner, gradually becomes clearer as the focal plane recedes from the sample. The green and blue arrows indicate the direction in which the focal plane must move to capture each region with optimal clarity. The Σ symbol denotes the summation of the collection of images enclosed within the large bracket. In the “Depth fusion” row, each slice of the captured focal stack is first processed to extract the region where the sample is sharply captured (i.e., the in-focus region), illustrated by the gray patches inside the bracket; these in-focus regions are then summed to form an all-in-focus image. In the “Physical propagation image” row, each slice of the captured focal stack is first convolved with the depth-specific PSFs and then summed, reassembling the capture of the integrated microscope (both steps are sketched in code below).
b Structure of the proposed simulation-supervised network for retrieving clear images from the coded captures of the integrated microscope.
c Comparison of the raw coded image (Raw; top left) and the network-retrieved image (Network; bottom right). The zoomed-in regions on the right compare raw coded images (Raw; top), images restored by shift-variant deconvolution (Deconv; middle), and network-retrieved images (Network; bottom). Representative data from 122 samples.
d Statistical comparison between shift-variant deconvolution and the proposed network on 19 test samples in terms of peak signal-to-noise ratio (PSNR; left), perceptual loss (Learned Perceptual Image Patch Similarity, LPIPS70; middle), and structural similarity index (SSIM; right). Central line inside the box: median. Box: interquartile range. Whiskers: maximum and minimum. Outliers: individual data points.
e Statistical comparison between shift-variant deconvolution (blue) and the proposed network (red) for 19 samples placed at different axial depths, in terms of SSIM. Error bars represent the standard deviation; the center of each error bar marks the mean score.
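
For concreteness, the two rows of panel a can be mimicked in a few lines of NumPy/SciPy. This is a minimal sketch, not the authors' code: the focus measure (local Laplacian energy), the function names, and the array shapes (`stack` of shape (Z, H, W), depth-specific `psfs` of shape (Z, h, w)) are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter
from scipy.signal import fftconvolve

def depth_fusion(stack, window=15):
    """'Depth fusion' row: keep each pixel from the slice where it is sharpest.

    Local Laplacian energy serves as the focus measure (an assumed choice);
    the all-in-focus image takes, per pixel, the value from the sharpest slice.
    """
    sharpness = np.stack([uniform_filter(laplace(s) ** 2, window) for s in stack])
    best = np.argmax(sharpness, axis=0)  # (H, W) index of the in-focus slice
    return np.take_along_axis(stack, best[None], axis=0)[0]

def physical_propagation(stack, psfs):
    """'Physical propagation image' row: convolve each slice with its
    depth-specific PSF and sum, reassembling the coded capture."""
    coded = np.zeros_like(stack[0])
    for slc, psf in zip(stack, psfs):
        coded += fftconvolve(slc, psf / psf.sum(), mode="same")
    return coded
```

The output of `depth_fusion` plays the role of the training target, while `physical_propagation` stands in for the coded capture of the integrated microscope, yielding one simulation-supervised training pair per focal stack.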
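Panels c-e use shift-variant deconvolution as the baseline. The caption does not specify the solver, so the following is a hedged sketch of one common approximation: tile the field of view, deconvolve each tile with its local PSF via Richardson-Lucy iterations, and reassemble the tiles (boundary blending omitted). The tiling scheme and all names are assumptions, not necessarily the authors' exact method.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(img, psf, n_iter=30, eps=1e-8):
    """Plain Richardson-Lucy deconvolution of a float image with one PSF."""
    est = np.full_like(img, img.mean())
    psf_flip = psf[::-1, ::-1]
    for _ in range(n_iter):
        blur = fftconvolve(est, psf, mode="same")
        est *= fftconvolve(img / (blur + eps), psf_flip, mode="same")
    return est

def shift_variant_deconv(img, psf_grid, tile):
    """psf_grid[i][j] is the local PSF of tile (i, j); image dimensions are
    assumed divisible by `tile`, and tile blending is omitted for brevity."""
    out = np.zeros_like(img)
    for i, row in enumerate(psf_grid):
        for j, psf in enumerate(row):
            ys, xs = slice(i * tile, (i + 1) * tile), slice(j * tile, (j + 1) * tile)
            out[ys, xs] = richardson_lucy(img[ys, xs], psf)
    return out
```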
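The panel-d metrics can be computed with standard packages; a sketch follows, assuming grayscale images normalized to [0, 1]. The skimage functions and the `lpips` package (the reference implementation of LPIPS) are real libraries, but the grayscale-to-RGB tiling and range handling here are assumptions rather than details from the paper.

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

loss_fn = lpips.LPIPS(net="alex")  # downloads AlexNet weights on first use

def score(pred, gt):
    """pred, gt: float NumPy arrays in [0, 1], shape (H, W)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0)
    # LPIPS expects NCHW RGB tensors in [-1, 1]; tile grayscale to 3 channels.
    to_t = lambda x: torch.from_numpy(x).float().mul(2).sub(1).expand(1, 3, *x.shape)
    lp = loss_fn(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```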