
Training and refining deep learning based denoisers without ground truth data





Deep learning-based denoisers are typically trained to minimize the mean squared error (MSE) between the output image of a deep neural network (DNN) and a ground truth image, so high-quality noise-free ground truth data is important for achieving high denoising performance. However, in applications such as medical imaging and hyperspectral remote sensing, it is expensive or even infeasible to obtain noise-free ground truth images.

Our fast refinement method outperformed conventional BM3D, the deep image prior, and often even networks trained with ground truth.

Image denoising with deep neural networks

Training DNN denoisers without ground truth images

Contribution of This Thesis

Organization of This Thesis

Unfortunately, it is challenging to calculate the last divergence term of (II.2) analytically for general denoising methods h(y). The supervised training objective is the risk
$$\mathbb{E}_{x\sim p(x),\; n\sim\mathcal{N}(0,\,\sigma^{2}I)}\,\bigl\lVert x - h(y;\theta)\bigr\rVert^{2},\qquad\text{(II.5)}$$
where h(y;θ) is a deep learning-based denoiser parametrized by a large-scale vector θ ∈ ℝ^P. Equation (II.8) is still an unbiased estimator of (II.5), provided the training data is randomly permuted at each epoch.
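For reference, here is a sketch of the standard SURE identity that (II.2) is based on, written for the Gaussian model y = x + n with n ∼ N(0, σ²I_K) and a weakly differentiable denoiser h; the per-pixel normalization by K follows the convention used above, though the thesis's exact form may differ:
$$\mathrm{SURE}\bigl(h(y)\bigr) \;=\; \frac{1}{K}\,\lVert y - h(y)\rVert^{2} \;-\; \sigma^{2} \;+\; \frac{2\sigma^{2}}{K}\sum_{k=1}^{K}\frac{\partial h_{k}(y)}{\partial y_{k}},
\qquad
\mathbb{E}_{n}\bigl[\mathrm{SURE}(h(y))\bigr] \;=\; \frac{1}{K}\,\mathbb{E}_{n}\,\lVert x - h(y)\rVert^{2}.$$
The last sum is the divergence term that is difficult to evaluate analytically for a general h.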

Finally, the last divergence term in (II.11) can be approximated using MC-SURE, which yields the final estimator for (II.5). Note that the last term of (II.11) contains MK derivatives of the denoiser output with respect to the image intensity at each pixel. Deep learning-based image denoising with the cost function (II.12) can be implemented in a deep learning development framework such as TensorFlow [46].
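As a minimal sketch of how such an MC-SURE cost could be written in TensorFlow, the snippet below follows the standard MC-SURE form; the function and variable names (mc_sure_loss, denoiser, eps) are illustrative assumptions, not the thesis implementation.

```python
import tensorflow as tf

def mc_sure_loss(y, denoiser, sigma, eps=1e-4):
    """Monte-Carlo SURE loss for Gaussian noise (no ground truth needed).

    y        : noisy input batch, shape (B, H, W, C)
    denoiser : callable h(y; theta), e.g. a tf.keras.Model
    sigma    : known or estimated noise standard deviation
    eps      : small perturbation for the MC divergence estimate
    """
    k = tf.cast(tf.reduce_prod(tf.shape(y)[1:]), tf.float32)  # pixels per image
    h_y = denoiser(y)

    # Data-fidelity term ||y - h(y)||^2 / K, averaged over the mini-batch.
    fidelity = tf.reduce_mean(
        tf.reduce_sum(tf.square(y - h_y), axis=[1, 2, 3])) / k

    # Monte-Carlo divergence: (1 / (K * eps)) * n~^T (h(y + eps * n~) - h(y)).
    n_tilde = tf.random.normal(tf.shape(y))
    h_pert = denoiser(y + eps * n_tilde)
    div = tf.reduce_mean(
        tf.reduce_sum(n_tilde * (h_pert - h_y), axis=[1, 2, 3])) / (k * eps)

    # SURE estimate of the per-pixel MSE risk.
    return fidelity - sigma ** 2 + 2.0 * sigma ** 2 * div
```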

All networks presented in this section (denoted NET, which can be SDA or DnCNN) are trained with one of the following two optimization objectives: (MSE) the minimum MSE between a denoised image and the ground truth image in (II.8), and (SURE) our proposed minimum MC-SURE without ground truth in (II.12). Note that, unlike for (II.5), the existence of a minimizer of (II.12) must be taken into consideration. Finally, we approximate the last term in (III.5) using (III.4), so that the resulting unbiased risk estimator of (II.5) can be used for Poisson noise reduction.

Training deep learning based denoisers without ground truth data

Background

  • Reviewing Stein’s unbiased risk estimator (SURE)
  • Reviewing Monte-Carlo Stein’s unbiased risk estimator
  • Reviewing supervised training of DNN based denoisers

In practice, σ² can be estimated [40], and ‖y − h(y)‖² only requires the output of the estimator (or denoiser). Based on Theorem 2, the divergence term in (II.2) can be approximated using one realization ñ ∼ N(0, I) and a fixed small positive value ε. A gradient-based optimization algorithm such as stochastic gradient descent (SGD) [43], momentum, Nesterov momentum [44], or the Adam optimization algorithm [45] is then used to train the deep learning network h(y;θ) with respect to θ.
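Concretely, the single-realization approximation of Theorem 2 can be sketched as
$$\frac{1}{K}\sum_{k=1}^{K}\frac{\partial h_{k}(y)}{\partial y_{k}}
\;\approx\;
\frac{1}{K\,\epsilon}\,\tilde{n}^{\mathsf{T}}\bigl(h(y+\epsilon\,\tilde{n})-h(y)\bigr),
\qquad \tilde{n}\sim\mathcal{N}(0,\,I_{K}),$$
where ε is the fixed small positive value mentioned above; the normalization shown here is one common convention and may differ slightly from the thesis's (II.4).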

In practice, calculating the gradient of (II.6) for large N is inefficient, since a small amount of well-shuffled training data can often approximate the gradient of (II.6) well. In deep learning-based image restoration tasks such as image denoising or single-image super-resolution, it is often more efficient to use image patches instead of whole images for training. For example, x^(j) and y^(j) may be image patches of a ground truth image and a noisy image, respectively.
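As a small illustration of such patch-based training data, the NumPy sketch below cuts random patches from a single image; the function name and the 40×40 default patch size are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def extract_random_patches(image, num_patches, patch_size=40, rng=None):
    """Cut num_patches random square patches from a 2D or 3D image array."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    patches = []
    for _ in range(num_patches):
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)
```

To form corresponding pairs x^(j) and y^(j), the same patch coordinates would be applied to a ground truth image and its noisy counterpart.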

Methods

  • Unsupervised training of DNN based denoisers

Note that no noise-free ground truth data x is required for the risk function on the right-hand side of (II.10), and there is no approximation in (II.10). To keep the estimator (II.12) unbiased, the order of the y^(j) should be randomly permuted and a new set of ñ^(j) should be generated at each epoch. Moreover, the derivative with respect to θ must be re-calculated at every training iteration, so that a total of KP derivatives would have to be evaluated per mini-batch iteration (e.g., K in one of our simulations).
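A minimal training-loop sketch of this procedure is given below; it reuses the hypothetical mc_sure_loss helper from the earlier sketch. The learning rate and epoch count follow the experimental settings reported later, while the batch size and all names are placeholders.

```python
import numpy as np
import tensorflow as tf

def train_sure(denoiser, noisy_patches, sigma, epochs=100, batch_size=128,
               learning_rate=1e-3):
    """Train a denoiser on noisy patches only, using the MC-SURE objective."""
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    n_train = noisy_patches.shape[0]
    for epoch in range(epochs):
        # Randomly permute the noisy patches each epoch so that the MC-SURE
        # cost remains an unbiased estimator of the risk.
        perm = np.random.permutation(n_train)
        for start in range(0, n_train, batch_size):
            batch = tf.constant(noisy_patches[perm[start:start + batch_size]],
                                dtype=tf.float32)
            with tf.GradientTape() as tape:
                # A fresh perturbation n~ is drawn inside mc_sure_loss for
                # every mini-batch (and hence every epoch).
                loss = mc_sure_loss(batch, denoiser, sigma)
            grads = tape.gradient(loss, denoiser.trainable_variables)
            optimizer.apply_gradients(zip(grads, denoiser.trainable_variables))
    return denoiser
```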

Finally, almost any DNN denoiser can use our MC-SURE-based training simply by changing the cost function from (II.8) to (II.12), as long as it satisfies the condition of Theorem 2. Many deep learning-based denoisers with differentiable activation functions (e.g., sigmoid) satisfy this condition. Therefore, we expect our proposed method to work for most deep learning-based image denoisers [13–17].

Simulation results

  • Results: MNIST dataset
  • Regularization effect of DNN denoisers
  • Accuracy of MC-SURE approximation
  • Results: high resolution natural images dataset

One of the potential advantages of our SURE-based training method is that we can use all available data without noise-free ground truth images. The performance of the model was tested on 100 randomly selected images from the default test set of 10,000 images. In all cases, SDA was trained with the Adam optimization algorithm [45] with a learning rate of 10⁻³ for 100 epochs.

In the case of the noise level σ = 50, an early stopping rule was applied, since the network started to overfit the noisy data set after the first few epochs. To achieve stable training with good performance, the perturbation size ε had to be tuned for each of the selected noise levels σ. SURE-based methods took more training time than MSE-based methods due to the additional divergence calculations required to optimize the MC-SURE cost function.

In the case of the BSD68 dataset in Table 2.4, the SURE-based method outperformed BM3D at all noise levels. One advantage of DnCNN-SURE over BM3D is that it does not suffer from rare-patch effects. BM3D struggled to preserve important details in the image, such as the outline of the fish's eye, whereas DnCNN-SURE produced a much sharper image with higher PSNR.

Table 2.1: Summary of denoising methods. NET can be either SDA or DnCNN.

Discussion

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] R. Girshick, J. Donahue, and T. Darrell, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[14] Harold C. Burger, Christian J. Schuler, and Stefan Harmeling, “Image Denoising: Can Plain Neural Networks Compete with BM3D?,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[17] Stamatios Lefkimmiatis, “Non-local Color Image Denoising with Convolutional Neural Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[20] Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, and Minh N. Do, “Semantic Image Inpainting with Deep Generative Models,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[22] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee, “Enhanced Deep Residual Networks for Single Image Super-Resolution,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.

[23] Dongwon Park, Kwanyoung Kim, and Se Young Chun, “Efficient module based single image super resolution for multiple problems,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.
[25] Ruohan Gao and Kristen Grauman, “On-demand learning for deep image restoration,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[32] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky, “Deep Image Prior,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Refining DNNs with SURE and Poisson denoising

Unsupervised training of blind image denoisers

DNN-based denoisers are often trained as blind image denoisers for a range of noise levels instead of a single noise level. Thus, any blind denoiser trained for a finite range of noise levels should satisfy this sufficient condition.

Unsupervised refining of DNN based denoisers

Unsupervised training for Poisson noise

$$\frac{1}{K\,\dot{\epsilon}}\,\bigl(\dot{n}\odot y\bigr)^{\mathsf{T}}\bigl(h(y+\dot{\epsilon}\,\dot{n})-h(y)\bigr),\qquad\text{(III.4)}$$
where ṅ ∈ ℝ^K is a random vector whose entries take the values −1 and 1 each with probability 1/2, ε̇ is a small positive number, similar to (II.4), and ⊙ denotes elementwise multiplication.
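A sketch of how this term might be evaluated in TensorFlow, reusing the denoiser interface assumed in the earlier snippets (function and variable names are illustrative):

```python
import tensorflow as tf

def mc_poisson_divergence(y, denoiser, eps=1e-3):
    """Monte-Carlo estimate of the (n_dot ⊙ y)-weighted divergence term (III.4)."""
    # n_dot takes the values -1 and +1, each with probability 1/2.
    n_dot = 2.0 * tf.cast(tf.random.uniform(tf.shape(y)) < 0.5, tf.float32) - 1.0
    k = tf.cast(tf.reduce_prod(tf.shape(y)[1:]), tf.float32)
    diff = denoiser(y + eps * n_dot) - denoiser(y)
    # Elementwise product with y implements the (n_dot ⊙ y)^T (...) term.
    return tf.reduce_mean(
        tf.reduce_sum(n_dot * y * diff, axis=[1, 2, 3])) / (k * eps)
```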

Simulation results

  • Extension to blind color image denoising
  • Unsupervised refining (fine-tuning) with SURE
  • Extension to Poisson noise denoising

Even though blind color image denoising is a more difficult task than non-blind gray-scale denoising at a single noise level, CDnCNN-SURE showed superior denoising performance compared to CBM3D. To demonstrate the effectiveness of this method, we took CDnCNN-SURE from Section 3.2.1 as a baseline denoiser network and refined it on each individual noisy image (CDnCNN-SURE-FT) for all test images. The methods were evaluated on 9 widely used color images [51], and CDnCNN-SURE-FT was run for each image separately.
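A minimal sketch of this per-image refinement step, under the same assumptions as the earlier snippets (the step count and learning rate are placeholders, not values from the thesis):

```python
import tensorflow as tf

def refine_on_single_image(denoiser, noisy_image, sigma, steps=200, lr=1e-4):
    """Fine-tune a pretrained SURE-trained denoiser on one noisy test image."""
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    y = tf.constant(noisy_image[None, ...], dtype=tf.float32)  # add batch dim
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Minimize the same MC-SURE loss, but on this single image only.
            loss = mc_sure_loss(y, denoiser, sigma)
        grads = tape.gradient(loss, denoiser.trainable_variables)
        optimizer.apply_gradients(zip(grads, denoiser.trainable_variables))
    return denoiser(y)[0]
```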

Unlike on the CBSD68 dataset, CBM3D outperformed both CDnCNN-SURE and CDnCNN-MSE-GT on most of these color images. Nevertheless, CDnCNN-SURE-FT outperformed all the other methods at both noise levels, including CBM3D and CDnCNN-MSE-GT, on many images and on average. The DIP method had the worst performance, lagging behind the CDnCNN-SURE method by almost 2 dB.

Our proposed CDnCNN-SURE-FT provided a sharper image that was visually and quantitatively closer to the ground truth image. We observed that BM3D+VST images were significantly blurrier than the results of our SDA-based methods, especially for ζ = 0.2. At higher values ζ > 0.1 the approximation became very inaccurate, and for ζ > 0.2 the model could not converge, so training was not feasible.

Figure 3.1: Denoising results of an image from the CBSD68 dataset for σ = 50.

Discussion

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), 2012.
[3] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” in AAAI Conference on Artificial Intelligence (AAAI), 2017.
[7] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” in International Conference on Learning Representations (ICLR), 2015.

[8] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[11] Diederik P. Kingma and Max Welling, “Auto-Encoding Variational Bayes,” in International Conference on Learning Representations (ICLR), 2014.
[12] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in IEEE International Conference on Computer Vision (ICCV), 2017.

[31] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila, “Noise2Noise: Learning Image Restoration without Clean Data,” in International Conference on Machine Learning (ICML), 2018.
[45] Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations (ICLR), 2015.
[46] Martín Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI), 2016.

Denoising results of SDA with various training methods for MNIST dataset at the noise level of σ = 50

Denoising results of an image from the BSD68 dataset for σ = 75

Denoising results of an image from the CBSD68 dataset for σ = 50

Denoising results of the “Baboon” image for σ = 25

Denoising results of the “Kodak 2” image for σ = 50

Denoising results of SDA with various methods for MNIST dataset for ζ = 0.1

Denoising results of SDA with various methods for MNIST dataset for ζ = 0.2

Figures

Figure 2.1: Denoising results of SDA with various training methods for MNIST dataset at the noise level of σ = 50.
Figure 2.2: Performance of SDA-SURE for different ε values at σ = 25 (left) and performance plots of denoising methods at different σ values (right).
