Learn Keras for Deep Neural Networks

He is an active data science teacher and blogs at http://blog.jojomoolayil.com. You can find out more about his various other activities at his website http://www.mswamynathan.com.

Introduction

What will you learn from this guide?

Who is this book for?

What is the approach to learning in the book?

How is the book structured?

Finally, Chapter 6 - the conclusion - discusses the way forward for the reader to further improve his/her deep learning skills and covers some areas of active development and research in deep learning. By the end of this crash course, the reader will have gained a thorough understanding of deep learning principles in the shortest possible time frame, along with hands-on experience in developing enterprise-grade deep learning solutions in Keras.

An Introduction to Deep Learning

Introduction to DL

Maybe that was too high level or too hard to digest, so let's break it down step by step.

Demystifying the Buzzwords

We can simplify the three fields using a simple Venn diagram, as shown below. Finally, DL is a field within ML where intelligence is induced in systems without explicit programming using algorithms inspired by the biological functioning of the human brain.

What Are Some Classic Problems Solved by DL in Today’s Market?

Putting it all together, we can say that AI is the field of inducing intelligence in a machine or system artificially, with or without explicit programming.

Decomposing a DL Model

As you can see in the previous figure, the input data is consumed by the neurons in the first hidden layer; each neuron computes an activation function over its inputs and passes the result to the next layer, and so on, until the final output is produced.
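
To make this concrete, here is a minimal sketch (using numpy, with made-up weights and inputs) of what a single neuron in that first hidden layer computes: a weighted sum of its inputs plus a bias, passed through an activation function.

import numpy as np

x = np.array([0.5, 0.8, 0.2])   # one input sample with three features (hypothetical values)
w = np.array([0.4, -0.6, 0.9])  # the neuron's weights
b = 0.1                         # the neuron's bias

z = np.dot(w, x) + b            # weighted sum of inputs plus bias
output = max(0.0, z)            # ReLU activation applied to the weighted sum
print(output)                   # this value is passed on to the next layer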

Exploring the Popular DL Frameworks

Several aspects of DL have been developed as reusable code that today can be used directly from frameworks that do an excellent job of abstracting the idea. That being said, let's take a look at the popular choices of DL frameworks used in the industry today.

Low-Level DL Frameworks

Similarly, in DL, there is a set of code blocks that can be reused for different types of use cases. The same algorithm with different parameter values can serve a different use case, so why not package the algorithm as a simple function or a class?

Theano

You can indeed develop a DNN from scratch, say in C++, Java or Python, with ~1000 lines of code, or probably use a framework and reuse available tools with maybe 10–15 lines of code.

Torch

PyTorch

MxNet

TensorFlow

High-Level DL Frameworks

The most popular high-level DL framework that provides a second-level abstraction for DL model development is Keras.

Note: While Gluon works on top of MXNet, and Lasagne on top of Theano, Keras can work on top of TensorFlow, Theano, MXNet, and other back ends.

Other frameworks like Gluon, Lasagne, and so on are also available, but Keras is the most commonly used framework.

A Sneak Peek into the Keras Framework

Let's say we have, as input data for each student, the age, the number of hours studied, and the average score (out of 100) across all previous tests the student appeared in.

Getting the Data Ready

Defining the Model Structure

Training the Model and Making Predictions
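
As a minimal sketch covering these three steps (getting the data ready, defining the model structure, and training and predicting), the snippet below builds a tiny binary classifier for the student pass/fail example; the data values, layer sizes, and epoch count are illustrative assumptions, not the book's exact listing.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Getting the data ready: [age, hours studied, average previous score] per student (toy values)
x_train = np.array([[18, 2, 45], [19, 6, 70], [20, 1, 30], [18, 8, 85]])
y_train = np.array([0, 1, 0, 1])   # 0 = fail, 1 = pass

# Defining the model structure
model = Sequential()
model.add(Dense(4, input_dim=3, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the model and making predictions
model.fit(x_train, y_train, epochs=10, verbose=0)
print(model.predict(np.array([[19, 5, 60]])))   # probability that a new student passes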

Summary

Keras in Action

Setting Up the Environment

Selecting the Python Version

However, for our use case, I highly recommend starting with Python 3.x, as it is the future. Some may be reluctant to start with Python 3, assuming there will be problems with many packages in 3.x, but for almost all practical use cases the major DL, ML, and other useful packages have already been updated for 3.x.

Installing Python for Windows, Linux, or macOS

The Anaconda distribution of Python eases the process for DL and ML by installing all the major Python packages required for DL.

Installing Keras and TensorFlow Back End

If you have installed one or more virtual environments, they will all appear in the drop-down menu; select the Python environment of your choice. If you are new to Jupyter, I recommend exploring the available options in the navigation menu.

Getting Started with DL in Keras

Input Data

You can also change the layout of the training samples (i.e., each row can be a training sample), so in the context of the student pass/fail example, one row would hold all the attributes of a single student (grades, age, and so on). Value normalization brings all values in the input tensor into the range 0-1, and standardization brings values into a form where the mean is 0 and the standard deviation is 1.
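
A minimal sketch of the two transformations on a small numpy array of raw feature values (the numbers are illustrative):

import numpy as np

values = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical raw feature values

# Normalization: rescale values into the 0-1 range
normalized = (values - values.min()) / (values.max() - values.min())

# Standardization: rescale values so the mean is 0 and the standard deviation is 1
standardized = (values - values.mean()) / values.std()

print(normalized)     # [0.  0.33  0.67  1. ] approximately
print(standardized)   # mean ~0, standard deviation ~1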

Neuron

An image is stored in the data as a three-dimensional tensor, where two dimensions define the pixel values on a 2D plane and a third dimension defines the values for the RGB color channels. So essentially one image is a three-dimensional tensor, and n images form a four-dimensional tensor, where the fourth dimension stacks the 3D image tensors as training examples.
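
A quick sketch of those shapes with numpy (the 32x32 image size and the count of 100 images are arbitrary assumptions):

import numpy as np

one_image = np.zeros((32, 32, 3))       # height x width x RGB channels: a 3D tensor
n_images = np.zeros((100, 32, 32, 3))   # 100 such images stacked: a 4D tensor

print(one_image.ndim, n_images.ndim)    # 3 4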

Activation Function

To simplify the story: if your activation function is a linear function (in principle, no activation at all), its derivative is a constant that carries no information about the input. This becomes a major problem, because training with the backpropagation algorithm provides feedback to the network about misclassifications through the derivative of the activation function, which each neuron uses to adjust its weights. To keep things simple, we almost always need a non-linear activation function (at least in all hidden layers) for the network to learn properly.

Sigmoid Activation Function

To put it another way, there is really no point in having a DNN with such activations, since the output of n stacked layers would be equivalent to a single layer.

ReLU Activation Function

But because the function outputs a flat 0 for all negative inputs, we can sometimes face serious problems (neurons that stop updating). We can use the leaky variant of the activation function directly by setting its alpha value to a small constant.
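
A small sketch of both options in Keras, assuming a simple Dense stack; the layer sizes and the alpha value of 0.01 are illustrative:

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))   # standard ReLU activation

# Leaky variant: a small slope (alpha) on the negative side instead of a flat 0
model.add(Dense(32))
model.add(LeakyReLU(alpha=0.01))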

Model

Layers

To simplify the model development process, Keras offers several types of layers and various tools to connect them. Here we will take a closer look at the most commonly used layers and also glance at some important layers for advanced use cases that you can explore later.

Core Layers

Dense Layer

We can use the input_dim attribute to define how many dimensions the input has. For example, if we have a table with 10 features and 1000 samples, we need to provide input_dim as 10 for the layer to understand the shape of the input data.
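
For instance, a minimal sketch of such a layer (the 32 output neurons are an arbitrary choice for illustration):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# 10 input features (input_dim=10), 32 neurons in this fully connected layer
model.add(Dense(32, input_dim=10, activation='relu'))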

Dropout Layer

These additional parameters become important when working in specialized use cases where it is important to use specific types of constraints and initializers for a given layer.

Other Important Layers

You can also write your own layers in Keras for a different type of use case.

The Loss Function

Binary cross-entropy: defines the loss when the categorical outcome is a binary variable, i.e., has two possible outcomes (pass/fail or yes/no). Categorical cross-entropy: defines the loss when the categorical outcome is non-binary, i.e., has more than two possible outcomes (yes/no/maybe, or type 1/type 2/... type n).

Optimizers

This is called an iteration (i.e., a successful pass of all samples in a batch followed by a weight update in the network). Computing all the training samples provided in the input data, with batch-by-batch weight updates, is called an epoch.

Stochastic Gradient Descent (SGD)

Adam

It determines the momentum and variance of the loss gradient and exploits the combined effect to update the weight parameters. Together, momentum and variance help smooth the learning curve and effectively improve the learning process.

Other Important Optimizers

The default values work quite effectively and do not need to be changed in most use cases.

Metrics

Model Configuration

Once you've designed your network, Keras provides an easy one-step model configuration process with the 'compile' command. We compile the model with the Adam optimizer, define binary cross-entropy as the loss function, and use "accuracy" as the metric for validation.
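
A minimal sketch of that configuration step, assuming a model object has already been defined as described above:

# One-step configuration: optimizer, loss function, and validation metric
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])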

Model Training

Keras provides a fit function for training a model object with the given training data. At this point, it is assumed that you have the model architecture defined and configured (compiled) as described previously.
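
A small sketch of the training call, assuming x_train, y_train, x_val, and y_val are already prepared; the batch size and epoch count are illustrative:

history = model.fit(x_train, y_train,
                    batch_size=64,                    # samples per weight update
                    epochs=10,                        # full passes over the training data
                    validation_data=(x_val, y_val))   # evaluated at the end of each epoch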

Model Evaluation

We can see that after each epoch the model prints the average training loss and accuracy as well as the validation loss and accuracy. With the evaluate method, the model returns the loss value and all metrics defined in the model configuration.
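
A short sketch of the evaluation step under the same assumptions (x_test and y_test already prepared):

results = model.evaluate(x_test, y_test, verbose=0)
print(results)   # [loss, accuracy] - the loss plus every metric defined in compile()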

Putting All the Building Blocks Together

MEDV: median value of owner-occupied homes in $1000s. To look at the contents of the training dataset, we can use Python's index-slicing option on the numpy n-dimensional arrays. If we take a closer look at the loss and MAPE on the validation dataset, we can see a significant improvement.
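
A small sketch of that slicing, assuming the Boston housing data is loaded through the Keras datasets module:

from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print(x_train.shape)   # (404, 13): 404 training samples, 13 features
print(x_train[:3])     # index slicing: the first 3 training samples
print(y_train[:3])     # the corresponding MEDV target values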

Deep Neural Networks for Supervised Learning: Regression

Getting Started

It provides some large datasets from NASA NEX and OpenStreetMap, the Deutsche Bank public dataset, and so on. We will use the Kaggle public data repository to get datasets for our DL use case.

Problem Statement

First, we need to reframe the problem statement in a slightly more strategic way, so that it can be represented as a design for the solution. We will use one of the aforementioned frameworks to represent our problem statement in a more effective and concise manner.

Why Is Representing a Problem Statement with a Design Principle Important?

There are several problem-solving frameworks recognized in the market that help define and represent a problem statement in a standard way so that the problem can be solved more effectively. Although it may not be possible to build the entire solution at the outset, given that the whole process is iterative and exploratory, we can still do a better job of representing the problem and presenting the solution approach at a high level.

Designing an SCQ

The team has hit a roadblock, as they don't have the means to estimate future sales for a particular store. Therefore, to solve the problem, we ask the following question: "How can we estimate the future sales of the store?" Once this question is answered, the roadblock is removed: the marketing team has the means to study and estimate future sales for the store and can therefore plan more effectively.

Designing the Solution

In our use case, we need to consider how the data can be represented for the model. The process will become clearer as we explore the data and get closer to model development.

Exploring the Data

Once we have loaded the data, the first thing we need to examine is its length, width, and data types. The head method displays the first 5 rows of the data frame, and we can inspect the contents by looking through the self-explanatory column names.
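
A minimal sketch of those first checks, assuming the data has been loaded into a pandas dataframe called df (the file name is hypothetical):

import pandas as pd

df = pd.read_csv('train.csv')   # hypothetical file name

print(df.shape)    # (rows, columns): the length and width of the data
print(df.dtypes)   # the data type of each column
print(df.head())   # the first 5 rows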

Looking at the Data Dictionary

The shape tells us that we have all the columns from the two data frames in one unified data frame. We'll start by finding the number of unique stores in the data, the number of unique days for which we have data, and the average sales for all stores.

Finding Data Types

The pandas dataframe unique method returns the list of unique elements for the selected column, and the len function returns the total number of elements in that list. The dataframe mean method returns the average for the selected column, in our case the sales revenue.
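
A quick sketch of those calls, assuming the merged dataframe is called df and has 'Store', 'Date', and 'Sales' columns (the column names are assumptions for illustration):

print(len(df['Store'].unique()))   # number of unique stores
print(len(df['Date'].unique()))    # number of unique days with data
print(df['Sales'].mean())          # average sales across all stores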

Working with Time

As we would expect, we can see seven distinct values, with a similar number of records in each, for the "day of the week" property. Since we already have the date as a feature, we could have directly used the date column to create the day of the week and also create some other features.

Predicting Sales

This suggests that most stores have sales in the range of 0-20,000, and only a few stores have sales above 20,000.

Exploring Numeric Columns

Bold rows indicate a high number of missing data points in the corresponding columns. For now, in this use case, we'll use the mode to fill in the blanks where values are missing.
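
A short sketch of that imputation step with pandas, assuming df is the dataframe and 'CompetitionDistance' is one of the columns with missing values (the column name is an assumption):

# Fill missing values in a column with that column's most frequent value (the mode)
df['CompetitionDistance'] = df['CompetitionDistance'].fillna(df['CompetitionDistance'].mode()[0])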

Understanding the Categorical Features

To cement our understanding of the observation, we can simply check the number of data points in each category using the same barplot function with an additional parameter setting. We can observe that the distribution of data points across the different classes within a category is skewed.

Data Engineering

As you can see, the shape looks fine for the newly encoded data frame, but there is at least one column that still has object as its data type. Let's take a look at that column's content before converting it to numeric or one-hot encoded form.
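
A minimal sketch of that conversion step with pandas, assuming df is the dataframe and 'StateHoliday' is the object-typed categorical column (the column name is an assumption):

import pandas as pd

# Inspect the object-typed column, then one-hot encode it
print(df['StateHoliday'].unique())
df = pd.get_dummies(df, columns=['StateHoliday'])
print(df.shape)   # the new width reflects the added one-hot columns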

Defining Model Baseline Performance

For a regression model, if we take the average value of sales in the training dataset as the prediction for all samples in the test dataset, we get a basic benchmark score. The DL model must score at least better than this to be considered useful.
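
A small sketch of that baseline calculation, assuming y_train and y_test are numpy arrays of sales values:

import numpy as np
from sklearn.metrics import mean_absolute_error

# Predict the training mean for every test sample and measure the error
baseline_prediction = np.full(len(y_test), y_train.mean())
print(mean_absolute_error(y_test, baseline_prediction))   # the score the DL model must beat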

Designing the DNN

At the end of each epoch, the model uses the validation dataset to evaluate and report the metrics we configured. The performance of the model on the validation dataset was 697, which is much better than our baseline score.

Testing the Model Performance

Improving the Model

We can see that as we created a deeper model, the performance on the test dataset further improved. Let's try a few more experiments to see if we can expect further improved performance.

Increasing the Number of Neurons

We can use the history, after training, to visualize and understand the learning curve of the model. As you may have noticed, we have seen a further improvement in overall test performance for the model with deeper architectures.

Plotting the Loss Metric Across Epochs

We can probably increase the number of epochs to test whether model performance improves further. Of course, this comes with a significant amount of additional computation time for training, but once you've finalized the architecture of your model, you can increase the number of epochs and see if there is any further improvement.

Testing the Model Manually

We can see that after a point the net loss reduction was quite low, but still relatively good. In the next chapter, we will look at another business problem that we can solve using a DNN for classification in supervised learning.

Deep Neural Networks for Supervised Learning: Classification

Once you accept the contest rules, you can download the dataset from the "Data" tab or www.kaggle. We'll use the same environment for the use case, but before we begin, let's review the problem statement and define the SCQ and solution approach, just as we did in Chapter 3.

Designing the SCQ

How Can We Identify a Potential Customer?

And we see that none of the columns in the customer dataset have missing values. In the activity data, we need to drop columns that have 90% missing values, as they cannot be corrected.
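
A short sketch of dropping such columns with pandas, assuming the activity data is in a dataframe called df_activity (a hypothetical name):

# Drop any column where more than 90% of the values are missing
df_activity = df_activity.loc[:, df_activity.isnull().mean() <= 0.9]
print(df_activity.shape)   # the remaining columns after the drop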

Defining Model Baseline Accuracy

Right now, we have the training data in the desired form for building DL models for classification. If we build a model that gives us an overall accuracy below this baseline, it would be of virtually no use.

Designing the DNN for Classification

The training and validation accuracy from the deeper network is nowhere near what you would expect. The training and validation accuracies from a medium-sized network are nowhere near what you would expect.

Revisiting the Data

Standardize, Normalize, or Scale the Data

In standardization, we transform the data into a form where the mean is 0 and the standard deviation is 1. Data in this form is a good input candidate for our neurons' activation functions and therefore improves the network's ability to learn properly.
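
A minimal sketch of that transformation with scikit-learn, assuming x_train and x_test numpy arrays; fitting the scaler only on the training data avoids leaking test information:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)   # mean 0, standard deviation 1 per feature
x_test_scaled = scaler.transform(x_test)         # reuse the training statistics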

Transforming the Input Data

Now that we have the datasets scaled, we can provide this newly transformed data for training.

DNNs for Classification with Improved Data

Let's use this model to evaluate the model's performance on the test data sets we created earlier. Accuracy metrics for the training and validation datasets are also stored in the model history.

Tuning and Deploying Deep Neural Networks

The Problem of Overfitting

The five-year-old continues to watch every day and slowly learns the rule "if it is Sunday, then mother will bake a cake." One fine Sunday, his mother had to go out on an errand and had no time to bake a cake.

So, What Is Regularization?

Similarly, when a DL model learns from the noise and adapts by adjusting the weights to fit the noise, it overfits the data. Depending on how the weights are added to the loss function, we have two different types of regularization techniques: L1 and L2.

L1 Regularization

L2 Regularization
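
A minimal sketch of how both penalties are attached to Dense layers in Keras; the 0.01 regularization factors and the layer sizes are illustrative assumptions:

from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

model = Sequential()
# L1 penalty: adds the sum of absolute weight values to the loss
model.add(Dense(32, input_dim=10, activation='relu',
                kernel_regularizer=regularizers.l1(0.01)))
# L2 penalty: adds the sum of squared weight values to the loss
model.add(Dense(32, activation='relu',
                kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(1, activation='sigmoid'))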

Dropout Regularization

A parameter value of 0.25 indicates the dropout rate (i.e., the percentage of neurons that will be dropped).
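
A short sketch of a Dropout layer with that rate, placed between two Dense layers (the layer sizes and input_dim are illustrative):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, input_dim=10, activation='relu'))
model.add(Dropout(0.25))   # randomly drop 25% of the previous layer's outputs during training
model.add(Dense(1, activation='sigmoid'))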

Hyperparameter Tuning

When we design a DNN, the architecture of the model is defined by some high-level artifacts. These parameters cannot be trained; in fact, they should be chosen with experience and judgment, just like the rules we learned in Chapter 3 for deciding on the size of the architecture to start with.

Hyperparameters in DL

These artifacts can be the number of neurons in a layer, the number of hidden layers, the activation function, the optimizer, the learning rate of the architecture, the number of epochs, the batch size, etc. All these parameters together are used to design the network and have a great impact on the learning process of the model and its final performance.

Number of Neurons in a Layer

It is good practice to set the number of neurons in a layer to a power of 2, as this helps the network's computation run faster. Based on the number of input dimensions, start with the power of 2 closest to twice that size.

Number of Layers

Number of Epochs

Weight Initialization

Batch Size

Learning Rate

However, in some specific cases, you may come across a use case where it might be better to use a lower learning rate, or perhaps a slightly higher one. You can almost always proceed with ReLU as the activation for any use case and get favorable results.

Optimization

Approaches for Hyperparameter Tuning

Manual Search

Grid Search

For example, for a learning rate of 0.1, the vertical column shows the different models that will be developed with different values for the optimizer and the batch size. The advantage of this approach is that it returns the best model for the defined grid of hyperparameters.
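
A minimal sketch of a grid search over batch size and optimizer using the scikit-learn wrapper for Keras; the build function, grid values, and data names (x_train, y_train) are assumptions for illustration:

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(optimizer='adam'):
    # Small binary classifier; rebuilt for every hyperparameter combination
    model = Sequential()
    model.add(Dense(32, input_dim=10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = {'batch_size': [32, 64], 'epochs': [10], 'optimizer': ['adam', 'sgd']}
grid = GridSearchCV(estimator=estimator, param_grid=param_grid, cv=3)
grid_result = grid.fit(x_train, y_train)   # x_train, y_train assumed to be prepared
print(grid_result.best_params_)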

Random Search

Further Reading

Model Deployment

Tailoring the Test Data

During model development, you make the effort to get the data from these different sources and bring it into a unified form. This assumption eventually breaks down in production, as some serious alignment needs to be established to get this part of the puzzle in place.

Saving Models to Memory

We define when to save the model, which metric to monitor, and where to save it. The file path uses a naming convention in which the model weights are stored in a file whose name contains the epoch number and the corresponding accuracy metric.
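
A sketch of that callback in standalone Keras; the file path pattern, monitored metric name ('val_acc'), and training call are illustrative assumptions:

from keras.callbacks import ModelCheckpoint

# Save weights whenever the validation accuracy improves; the file name
# records the epoch number and the accuracy value
checkpoint = ModelCheckpoint('weights.{epoch:02d}-{val_acc:.2f}.hdf5',
                             monitor='val_acc', save_best_only=True,
                             mode='max', verbose=1)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, callbacks=[checkpoint])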

Retraining the Models with New Data

All you need to do is take the weights of the already trained model and provide the additional data, with a few epochs to pass over and iterate through the new samples. The weights already learned need not be discarded; you can simply use the pause-and-resume approach and continue training with the incremental data.

Online Models

Delivering Your Model As an API

You can also explore how to deploy your model as an API with AWS Sagemaker in a few steps here: https://docs.aws.amazon.com/.
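As a bare-bones sketch of serving a saved Keras model behind an HTTP endpoint with Flask (the file name, route, and input format are assumptions, not the book's exact listing):

import numpy as np
from flask import Flask, request, jsonify
from keras.models import load_model

app = Flask(__name__)
model = load_model('model.h5')   # hypothetical path to the saved model

@app.route('/predict', methods=['POST'])
def predict():
    features = request.get_json()['features']           # e.g. {"features": [[...]]}
    prediction = model.predict(np.array(features))
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)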

Putting All the Pieces of the Puzzle Together

In this chapter, we discussed the methods and strategies to turn to when model performance does not meet your expectations. Finally, we also looked at the options available for deploying the model and walked through a minimal architecture for deploying it using Flask.

The Path Ahead

What’s Next for DL Expertise?

# Importing CNN-related layers as described in Chapter 2
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.utils import np_utils

# Import necessary packages for the LSTM example
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, LSTM

CNN + RNN

To experiment more and study some really cool examples, you can check out some popular git repositories for LSTM-related use cases. Today, we have a smaller version of that implemented in some apps that can be installed on a phone and work with the phone's camera.

Why Do We Need GPU for DL?

So, to display something on the screen for just a second, the computer internally calculates the values for the array at least 30 times. The massively parallel processing on matrices used to render smooth graphical content on the screen can instead be used to handle the computation in the DL model training process.

Other Hot Areas in DL (GAN)

It trains the GAN with a bunch of t-shirt designs, and the model generates new designs. Imagine you saw a criminal on the road and the police need your help to sketch his face for further investigation; with future GAN systems, we can imagine describing the details of the criminal's face and having the system produce the sketch for you.

Concluding Thoughts

Index
