We create image variations and further train our network model on top of them to increase the generalization power of the CNN. It is crucial to train our CNN with as many image variations as possible so as to increase the accuracy. We basically obtain more samples of the same image by flipping or rotating them. In this recipe, we will transform and create samples of images using a concrete implementation of ImageTransform in DL4J.
How to do it...
1. Use FlipImageTransform to flip the images horizontally or vertically (randomly or deterministically):
ImageTransform flipTransform = new FlipImageTransform(new Random(seed));
2. Use WarpImageTransform to warp the perspective of images deterministically or randomly:
ImageTransform warpTransform = new WarpImageTransform(new Random(seed), delta);
3. Use RotateImageTransform to rotate the images deterministically or randomly:
ImageTransform rotateTransform = new RotateImageTransform(new Random(seed), angle);
4. Use PipelineImageTransform to add image transformations to the pipeline:
List<Pair<ImageTransform,Double>> pipeline = Arrays.asList(
    new Pair<>(flipTransform, flipImageTransformRatio),
    new Pair<>(warpTransform, warpImageTransformRatio));
ImageTransform transform = new PipelineImageTransform(pipeline);
How it works...
In step 1, if we need a specific (deterministic) mode of flip rather than a random one, we can do the following:
int flipMode = 0;
ImageTransform flipTransform = new FlipImageTransform(flipMode);
flipMode is the deterministic flip mode:
flipMode = 0: Flips around the x axis
flipMode > 0: Flips around the y axis
flipMode < 0: Flips around both axes
In step 2, we passed in two attributes: Random(seed) and delta. delta is the magnitude by which an image is warped. Check the following image sample for a demonstration of image warping:
(Image source: https://commons.wikimedia.org/wiki/File:Image_warping_example.jpg License: CC BY-SA 3.0)
WarpImageTransform(new Random(seed), delta) internally calls the following constructor:

public WarpImageTransform(java.util.Random random, float dx1, float dy1,
                          float dx2, float dy2, float dx3, float dy3,
                          float dx4, float dy4)

It will assume dx1 = dy1 = dx2 = dy2 = dx3 = dy3 = dx4 = dy4 = delta. Here are the parameter descriptions:
dx1: Maximum warping in x for the top-left corner (pixels)
dy1: Maximum warping in y for the top-left corner (pixels)
dx2: Maximum warping in x for the top-right corner (pixels)
dy2: Maximum warping in y for the top-right corner (pixels)
dx3: Maximum warping in x for the bottom-right corner (pixels)
dy3: Maximum warping in y for the bottom-right corner (pixels)
dx4: Maximum warping in x for the bottom-left corner (pixels)
dy4: Maximum warping in y for the bottom-left corner (pixels)
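For example, the full constructor lets us warp each corner by a different amount. The following is a minimal sketch; the seed and corner offsets are illustrative values:

// Illustrative values: warp each corner by a different maximum amount (pixels)
ImageTransform cornerWarp = new WarpImageTransform(new Random(12345),
    1f, 2f,  // top-left corner (dx1, dy1)
    3f, 4f,  // top-right corner (dx2, dy2)
    5f, 6f,  // bottom-right corner (dx3, dy3)
    7f, 8f); // bottom-left corner (dx4, dy4)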
The value of delta is auto-adjusted relative to the normalized width/height specified while creating ImageRecordReader. So, suppose we request 10 pixels of warping along the x/y axes in an image with a size of 100 x 100: if the image is normalized to a size of 30 x 30, then 3 pixels of warping will be applied along the x/y axes. You need to experiment with different values for delta, since there's no constant/min/max delta value that suits all types of image classification problems.
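Here is a rough sketch of the scaling effect described previously; the seed, the label generator, and trainData (a FileSplit over the image directory) are illustrative assumptions:

int normalizedHeight = 30;
int normalizedWidth = 30;
int channels = 3;
// Request 10 pixels of warping relative to the original 100 x 100 images
ImageTransform warpTransform = new WarpImageTransform(new Random(12345), 10);
// The reader resizes every image to 30 x 30, so the effective warp is ~3 pixels
ImageRecordReader recordReader = new ImageRecordReader(normalizedHeight,
    normalizedWidth, channels, new ParentPathLabelGenerator());
recordReader.initialize(trainData, warpTransform);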
In step 3, we used RotateImageTransform to perform rotational image transformations, rotating the image samples by the specified angle.
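If a deterministic rotation is needed instead, we can pass a fixed angle rather than a Random instance. This is a minimal sketch, assuming the single-angle constructor; the angle is an illustrative value:

// Rotate every image by a fixed 30 degrees (no randomness)
ImageTransform rotateTransform = new RotateImageTransform(30.0f);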
In step 4, we added multiple image transformations to a pipeline with the help of PipelineImageTransform, to be applied sequentially or randomly for training purposes. We have created a pipeline of the List<Pair<ImageTransform,Double>> type. The Double value in each Pair is the probability that the paired transformation will be applied, as the sketch after the next paragraph shows.
Image transformations will help the CNN learn image patterns better. Training on transformed images also further reduces the chances of overfitting.
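For example, the following sketch applies the flip to roughly 50% of the images and the warp to roughly 90% of them; the probabilities are illustrative, and the boolean flag assumes the overload that controls shuffling of the pipeline order:

List<Pair<ImageTransform,Double>> pipeline = Arrays.asList(
    new Pair<>(flipTransform, 0.5),  // apply the flip with 50% probability
    new Pair<>(warpTransform, 0.9)); // apply the warp with 90% probability
ImageTransform transform = new PipelineImageTransform(pipeline, false); // false: keep the listed order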
There's more...
Under the hood, WarpImageTransform makes an internal call to the JavaCPP method warpPerspective() with the given properties: interMode, borderMode, and borderValue. JavaCPP is an API that parses native C/C++ files and generates Java interfaces to act as wrappers. We added the JavaCPP dependency for OpenCV in pom.xml earlier. This enables us to exploit the OpenCV library for image transformations.
Image preprocessing and the design of input layers
Normalization is a crucial preprocessing step for a CNN, just as it is for any feedforward network. Image data is complex: each image holds many pixels, and every pixel is a source of information. We need to normalize these pixel values so that the neural network will not overfit/underfit while training. Convolution/subsampling layers also need to be specified while designing the input layers for the CNN. In this recipe, we will normalize the data and then design the input layers for the CNN.
How to do it...
1. Create ImagePreProcessingScaler for image normalization:
DataNormalization scaler = new ImagePreProcessingScaler(0,1);
2. Create a neural network configuration and add default hyperparameters:
MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.DISTRIBUTION)
    .dist(new NormalDistribution(0.0, 0.01))
    .activation(Activation.RELU)
    .updater(new Nesterovs(new StepSchedule(ScheduleType.ITERATION, 1e-2, 0.1, 100000), 0.9))
    .biasUpdater(new Nesterovs(new StepSchedule(ScheduleType.ITERATION, 2e-2, 0.1, 100000), 0.9))
    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) // normalize to prevent vanishing or exploding gradients
    .l2(l2RegularizationParam)
    .list();
3. Create convolution layers for a CNN using ConvolutionLayer:
builder.layer(new ConvolutionLayer.Builder(11, 11)
    .nIn(channels)
    .nOut(96)
    .stride(1, 1)
    .activation(Activation.RELU)
    .build());
4. Configure subsampling layers using SubsamplingLayer:
builder.layer(new SubsamplingLayer.Builder(PoolingType.MAX)
    .kernelSize(kernelSize, kernelSize)
    .build());
5. Normalize activations between layers using LocalResponseNormalization:
builder.layer(1, new LocalResponseNormalization.Builder()
    .name("lrn1")
    .build());
How it works...
In step 1, ImagePreProcessingScaler normalizes the pixel values to a specified range, (0, 1). We will use this normalizer once we create iterators for the data.
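As a minimal sketch of that usage (assuming a RecordReaderDataSetIterator built from the image record reader; batchSize and numLabels are illustrative placeholders):

DataSetIterator trainIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, numLabels);
DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
scaler.fit(trainIter);             // prepare the normalizer over the training data
trainIter.setPreProcessor(scaler); // every batch is now scaled to the (0, 1) range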
In step 2, we have added hyperparameters such as an L2 regularization coefficient, a gradient normalization strategy, a gradient update algorithm, and an activation function globally (applicable for all layers).
In step 3, ConvolutionLayer requires you to mention the kernel dimensions (11 x 11 in the preceding code). A kernel acts as a feature detector in the context of a CNN:
stride: Specifies the step size between successive applications of the kernel on the pixel grid.
channels: The number of input neurons. We mention the number of color channels here (RGB: 3).
nOut: The number of output neurons (the outgoing connection count; 96 in the preceding code).
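The kernel size and stride together determine the spatial size of the layer's output. The following helper is not a DL4J API, just the standard convolution output-size arithmetic for reference:

// Standard convolution arithmetic (illustrative helper, not part of DL4J)
static int convOutputSize(int inSize, int kernelSize, int padding, int stride) {
    return (inSize - kernelSize + 2 * padding) / stride + 1;
}
// For example: a 224 x 224 input with an 11 x 11 kernel, stride 1, and no
// padding yields a (224 - 11 + 0) / 1 + 1 = 214 x 214 output.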
In step 4, SubsamplingLayer is a downsampling layer that reduces the amount of data to be transmitted or stored while keeping the significant features intact. Max pooling is the most commonly used pooling method. A ConvolutionLayer is typically followed by a SubsamplingLayer.
Training a CNN efficiently is a challenging task; it requires a lot of images, along with transformations, to train well. In step 5, LocalResponseNormalization improves the generalization power of the CNN. It performs a normalization operation right before the ReLU activation. We add it as a separate layer, placed between a convolution layer and a subsampling layer:
ConvolutionLayer is similar to a feedforward layer, but it performs two-dimensional convolution on images.
SubsamplingLayer is required for pooling/downsampling in CNNs.
ConvolutionLayer and SubsamplingLayer together form the input layers for a CNN and extract abstract features from images and pass them to the hidden layers for further processing.
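To complete the input layer design, we can also tell DL4J the shape of the incoming images so that it can infer the nIn values of subsequent layers. This is a minimal sketch; height, width, and channels are assumed to match the dimensions used by the image record reader:

builder.setInputType(InputType.convolutional(height, width, channels));
MultiLayerConfiguration config = builder.build();
MultiLayerNetwork network = new MultiLayerNetwork(config);
network.init(); // the network is now ready for training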