3.2.1 Proposed Work
3.2.1.3 Training
The training of the DCNN and the classifier is carried out on the BOSSBase v1.0 [85] and BOWS2 Ep.3 [86] datasets. The BOSSBase and BOWS2 datasets each contain 10,000 gray-scale images of size 512×512. To increase the dataset size, each image is split about its center into four sub-images, each of size 256×256. The creation of sub-images is shown in Figure 3.10 and sketched in the code below. Each of the resulting datasets therefore contains 40,000 images; we refer to these new datasets as the cropped-BOSSBase and cropped-BOWS2 datasets, respectively. Both networks are trained and tested in three scenarios.
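For illustration, a minimal sketch of this cropping is given below, assuming the standard 512×512 gray-scale sources; the helper name is ours and not part of the original pipeline.

```python
# Sketch of the sub-image creation of Figure 3.10: each 512x512 image is cut
# into four non-overlapping 256x256 quadrants about its center.
import numpy as np
from PIL import Image

def crop_to_subimages(path):
    """Return the four 256x256 quadrants (Images I-IV) of a 512x512 image."""
    img = np.array(Image.open(path).convert("L"))  # gray-scale array
    assert img.shape == (512, 512), "expects 512x512 BOSSBase/BOWS2 images"
    return [img[:256, :256],   # Image I   (top-left)
            img[:256, 256:],   # Image II  (top-right)
            img[256:, 256:],   # Image III (bottom-right)
            img[256:, :256]]   # Image IV  (bottom-left)
```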
Scenario-1: Under this scenario, 15,000 pairs (15,000 cover and 15,000 stego images) from the cropped-BOSSBase dataset are used to train the DCNN, with 20% of the training pairs used for validation. The steganalytic classifier is trained and validated (again with 20% of the training data) using another 15,000 pairs. A set of 10,000 pairs is kept aside for testing the complete network.
Figure 3.10: Division of images (original size 512×512) into four sub-images (Images I–IV), each of size 256×256.

Scenario-2: The DCNN under this scenario is trained and validated (with 20% of the training pairs) using 30,000 pairs (30,000 cover and 30,000 stego images) from cropped-BOSSBase v1.0 [85]. The steganalytic classifier is trained on 30,000 pairs from cropped-BOWS2 [86] and validated (with 20% of the training pairs); testing is done using 10,000 pairs of images from cropped-BOSSBase v1.0 and 10,000 pairs of images from the cropped-BOWS2 dataset.
Scenario-3: Under this scenario, mixed datasets are formed with 2,500 pairs per combination of embedding scheme and payload, using S-UNIWARD [15], WOW [5], and HUGO [6] with payloads 0.2, 0.3, 0.4, and 0.5 bpp, for a total of 3×4×2,500 = 30,000 pairs, built separately for the cropped-BOSSBase and cropped-BOWS2 datasets, while leaving 10,000 pairs untouched for testing (see the sketch after this list). Both CNNs are trained and tested in two ways:
(i) The DCNN is trained and validated on 30,000 pairs (20% validation) from mixed-cropped-BOSSBase and the steganalytic classifier on 30,000 pairs (20% validation) from mixed-cropped-BOWS2.
(ii) The DCNN is trained and validated on 30,000 pairs (20% validation) from mixed-cropped-BOWS2 and the classifier on 30,000 pairs (20% validation) from mixed-cropped-BOSSBase.
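A minimal sketch of assembling such a mixed dataset follows. The directory layout and helper name are hypothetical; only the counts (2,500 pairs per scheme/payload, 3×4×2,500 = 30,000 pairs) follow the text.

```python
# Assemble Scenario-3's mixed dataset as (cover, stego) path pairs.
import os
import random

SCHEMES  = ["S-UNIWARD", "WOW", "HUGO"]
PAYLOADS = [0.2, 0.3, 0.4, 0.5]            # bpp

def build_mixed_dataset(root, pairs_per_cell=2500, seed=0):
    rng = random.Random(seed)
    pairs = []                              # (cover_path, stego_path) tuples
    for scheme in SCHEMES:
        for payload in PAYLOADS:
            stego_dir = os.path.join(root, scheme, f"{payload:.1f}bpp")
            names = sorted(os.listdir(stego_dir))
            for name in rng.sample(names, pairs_per_cell):
                pairs.append((os.path.join(root, "cover", name),
                              os.path.join(stego_dir, name)))
    rng.shuffle(pairs)
    return pairs                            # 30,000 cover/stego pairs
```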
Training parameters of DCNN: All kernels of the DCNN are initialized from a normal distribution with µ = 0 and σ = 0.02. During training, the Adagrad optimization algorithm [94] is used with a learning rate of 0.001. Each batch consists of 150 images (75 cover and 75 stego images). The mean-squared loss is used as the cost function. The network is trained until it converges (≈80 epochs), and the model with the best validation accuracy is chosen for evaluation.
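These hyper-parameters can be collected as below. The text does not name a framework, so the Keras API and the two-layer placeholder architecture are assumptions; only the initialization, optimizer, learning rate, loss, batch size, and model-selection rule come from the text.

```python
# Sketch of the stated DCNN training configuration (framework assumed: Keras).
import tensorflow as tf

init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)  # all kernels

# Placeholder stand-in for the DCNN architecture described earlier; it only
# carries the training hyper-parameters from the text.
dcnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 5, padding="same", kernel_initializer=init,
                           input_shape=(256, 256, 1)),
    tf.keras.layers.Conv2D(1, 5, padding="same", kernel_initializer=init),
])
dcnn.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.001),
             loss="mse")                   # mean-squared cost function

# Keep the model with the best validation score over ~80 epochs.
ckpt = tf.keras.callbacks.ModelCheckpoint("dcnn_best.h5", monitor="val_loss",
                                          save_best_only=True)
# x: input (cover and stego) images, y: corresponding cover images
# dcnn.fit(x, y, batch_size=150, epochs=80, validation_split=0.2,
#          callbacks=[ckpt])
```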
Evaluation of DCNN: The kernel weights learned by the DCNN are similar to various kernels of the SRM. Figure 3.11 depicts visualizations of the 16 filters of the trained DCNN. It is clear that there is no dead filter [3] (a filter whose visualization is almost uniformly white is referred to as a dead filter), and all filters have learned to extract slightly different noise characteristics.
Figure 3.11: Visualization of the 16 learned kernel weights of size 5×5 of the DCNN.
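One possible operational check for dead filters is sketched below; it treats an almost-white visualization as a kernel with near-zero weight variance, which is our interpretation of [3], and the tolerance is an assumption.

```python
# Flag "dead" filters: kernels whose weights are nearly constant render as an
# almost uniformly white patch after the usual min-max scaling for display.
import numpy as np

def dead_filters(kernels, tol=1e-4):
    """kernels: array of shape (5, 5, in_ch, n_filters), as stored by Keras."""
    flat = kernels.reshape(-1, kernels.shape[-1])   # weights per filter
    return [i for i in range(flat.shape[1]) if flat[:, i].var() < tol]

# e.g. dead_filters(dcnn.layers[0].get_weights()[0]) -> [] for the trained DCNN
```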
To show the effectiveness of the DCNN, we evaluated the noise residuals produced by the high-pass filter and by the DCNN on the 10,000 test images of the cropped-BOSSBase dataset, using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM).
(i) Cover image prediction using high-pass filters: First, the stego image is convolved with the high-pass filter (KV filter), eq. (1.5), which yields the noise residual. This noise residual is subtracted from the stego image, giving the cover image predicted by the KV filter.
(ii) Cover image prediction using DCNN: The DCNN is trained to predict the cover image from a given input image.
The PSNR and the SSIM between the original cover and the cover predicted using the high-pass filter (HPF) are calculated; likewise, the PSNR and the SSIM between the original cover and the cover predicted using the DCNN are calculated.
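A minimal sketch of step (i) and the metric computation is given below, assuming eq. (1.5) is the standard 5×5 KV kernel of the SRM; the function names are illustrative.

```python
# Predict the cover via the KV high-pass filter and score it against the
# original cover with PSNR and SSIM.
import numpy as np
from scipy.signal import convolve2d
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0

def predict_cover_hpf(stego):
    stego = stego.astype(np.float64)
    residual = convolve2d(stego, KV, mode="same", boundary="symm")
    return stego - residual              # predicted cover, step (i)

def scores(cover, predicted):
    cover = cover.astype(np.float64)
    return (peak_signal_noise_ratio(cover, predicted, data_range=255),
            structural_similarity(cover, predicted, data_range=255))
```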
Table 3.4: Quantitative evaluation of the DCNN in terms of the mean (µ) and standard deviation (σ) of the PSNR (dB) and the SSIM. Tabulated scores are obtained between the original cover and the cover image predicted by the HPF and by the DCNN under different embedding configurations. P† refers to the considered set of payloads {0.2, 0.3, 0.4, 0.5} in bpp.
Method   Embedding scheme         Payload (bpp)   µ(PSNR)   σ(PSNR)   µ(SSIM)   σ(SSIM)
HPF      S-UNIWARD / HUGO / WOW   P†              30.719    10.56     0.908     0.13
DCNN     S-UNIWARD                0.5             44.878     8.12     0.996     0.01
                                  0.4             41.791     6.88     0.994     0.01
                                  0.3             40.536     7.34     0.998     0.00
                                  0.2             39.103     6.38     0.995     0.01
         HUGO                     0.5             42.267     6.45     0.994     0.01
                                  0.4             42.023     6.13     0.996     0.01
                                  0.3             41.557     6.81     0.994     0.01
                                  0.2             40.383     7.05     0.997     0.00
         WOW                      0.5             43.612     5.56     0.996     0.00
                                  0.4             42.516     6.29     0.995     0.01
                                  0.3             41.592     8.16     0.995     0.01
                                  0.2             40.215     8.95     0.991     0.02
The average PSNR (µ(PSNR)), standard deviation of the PSNR (σ(PSNR)), average SSIM (µ(SSIM)), and standard deviation of the SSIM (σ(SSIM)) between the original cover and the predicted cover over the 10,000 test images, for the different steganographic embeddings (S-UNIWARD, HUGO, WOW) and payloads (0.2, 0.3, 0.4, 0.5 bpp), are tabulated in Table 3.4 for both the high-pass filter and the proposed DCNN as the denoising tool. The PSNR and SSIM for the proposed DCNN are consistently higher than those of the HPF, which demonstrates the efficacy of the proposed DCNN over the conventional HPF.
Training parameters of Steganalytic Classifier: This CNN is trained on the stego noise produced by the DCNN. All the network layers are initialized from a normal distribution with µ = 0 and σ = 0.02. The Adagrad optimizer [94] is used, with the learning rate fixed to 0.005. Each mini-batch consists of 150 images (75 cover and 75 stego images). The categorical cross-entropy loss [95] is used for error calculation during training. Intuitively, while training, the network converges faster for images with high embedding rates than for images with low embedding rates.
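The classifier's setup can be sketched as follows, under the same framework assumption as before; the placeholder architecture is ours, and only the initialization, optimizer, learning rate, loss, and batch size are taken from the text.

```python
# Sketch of the steganalytic classifier's training configuration (Keras assumed).
import tensorflow as tf

init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

# Placeholder stand-in for the classifier architecture described earlier.
clf = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", kernel_initializer=init,
                           input_shape=(256, 256, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax", kernel_initializer=init),
])
clf.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.005),
            loss="categorical_crossentropy",   # categorical cross-entropy [95]
            metrics=["accuracy"])

# The classifier is fed the stego noise from the trained DCNN, i.e. the
# difference between the input image and the DCNN's predicted cover:
#   stego_noise = images - dcnn.predict(images)
#   clf.fit(stego_noise, labels, batch_size=150, validation_split=0.2)
```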