4.3 Data preprocessing, feature engineering, and feature learning

Additionally, the following stricter normalization practice is common and can help, although it isn’t always necessary (for example, you didn’t do this in the digit-classification example):

• Normalize each feature independently to have a mean of 0.

• Normalize each feature independently to have a standard deviation of 1.

This is easy to do with NumPy arrays:

# Assuming x is a 2D data matrix of shape (samples, features)
x -= x.mean(axis=0)
x /= x.std(axis=0)

HANDLING MISSING VALUES

You may sometimes have missing values in your data. For instance, in the house-price example, the first feature (the column of index 0 in the data) was the per capita crime rate. What if this feature wasn’t available for all samples? You’d then have missing values in the training or test data.

In general, with neural networks, it’s safe to input missing values as 0, with the condition that 0 isn’t already a meaningful value. The network will learn from exposure to the data that the value 0 means missing data and will start ignoring the value.
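In NumPy, for instance, if the missing entries are stored as NaN (an assumption; your data may encode them differently), this is a one-line sketch:

import numpy as np

# Assuming missing entries are stored as NaN in a float array x;
# replace them with 0, the value the network will learn to treat as "missing".
x[np.isnan(x)] = 0.0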

Note that if you’re expecting missing values in the test data, but the network was trained on data without any missing values, the network won’t have learned to ignore missing values! In this situation, you should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the fea- tures that you expect are likely to be missing in the test data.
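A minimal NumPy sketch of this augmentation; the helper name augment_with_missing, the drop probability, and the set of affected columns are all hypothetical:

import numpy as np

def augment_with_missing(x, missing_columns, copies=2, drop_prob=0.5, seed=0):
    # Hypothetical helper: appends `copies` corrupted duplicates of x in which
    # each feature listed in `missing_columns` is independently zeroed out
    # with probability `drop_prob` (0 standing for "missing", as above).
    # The corresponding labels must be duplicated the same number of times.
    rng = np.random.default_rng(seed)
    augmented = [x]
    for _ in range(copies):
        x_copy = x.copy()
        for col in missing_columns:
            mask = rng.random(len(x)) < drop_prob
            x_copy[mask, col] = 0.0
        augmented.append(x_copy)
    return np.concatenate(augmented, axis=0)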

4.3.2 Feature engineering

Feature engineering is the process of using your own knowledge about the data and about the machine-learning algorithm at hand (in this case, a neural network) to make the algorithm work better by applying hardcoded (nonlearned) transformations to the data before it goes into the model. In many cases, it isn’t reasonable to expect a machine-learning model to be able to learn from completely arbitrary data. The data needs to be presented to the model in a way that will make the model’s job easier.

Let’s look at an intuitive example.

Suppose you’re trying to develop a model that can take as input an image of a clock and can output the time of day (see figure 4.3).

[Figure 4.3 Feature engineering for reading the time on a clock. Raw data: the pixel grid. Better features: the clock hands’ coordinates, e.g. {x1: 0.7, y1: 0.7}, {x2: 0.5, y2: 0.0}. Even better features: the angles of the clock hands, e.g. theta1: 45, theta2: 0.]


If you choose to use the raw pixels of the image as input data, then you have a difficult machine-learning problem on your hands. You’ll need a convolutional neural network to solve it, and you’ll have to expend quite a bit of computational resources to train the network.

But if you already understand the problem at a high level (you understand how humans read time on a clock face), then you can come up with much better input features for a machine-learning algorithm: for instance, it’s easy to write a five-line Python script to follow the black pixels of the clock hands and output the (x, y) coordinates of the tip of each hand. Then a simple machine-learning algorithm can learn to associate these coordinates with the appropriate time of day.
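Here is one sketch of such a script, assuming a 2D grayscale image with dark hands on a light face; the threshold and the "farthest dark pixel" heuristic are illustrative, and this version locates only one hand:

import numpy as np

def hand_tip(img, threshold=0.5):
    # Illustrative only: img is assumed to be a 2D grayscale array in [0, 1]
    # with dark hands on a light face. The tip of the longer hand is taken to
    # be the dark pixel farthest from the center of the image.
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.nonzero(img < threshold)            # dark "hand" pixels
    farthest = np.argmax(np.hypot(xs - cx, ys - cy))
    # Return (x, y) in coordinates centered on the clock face, y pointing up
    return (xs[farthest] - cx) / cx, (cy - ys[farthest]) / cy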

You can go even further: do a coordinate change, and express the (x, y) coordinates as polar coordinates with regard to the center of the image. Your input will become the angle theta of each clock hand. At this point, your features are making the problem so easy that no machine learning is required; a simple rounding operation and dictionary lookup are enough to recover the approximate time of day.
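A sketch of that final step, under the assumption that angles are measured in degrees clockwise from 12 o’clock (matching the theta values in figure 4.3); the function name and the hour/minute roles of the two hands are assumptions:

import numpy as np

def hands_to_time(x_hour, y_hour, x_minute, y_minute):
    # Assumes hand-tip coordinates centered on the clock face with y pointing
    # up, so that theta = 0 means 12 o'clock and theta = 90 means 3 o'clock.
    def theta(x, y):
        return np.degrees(np.arctan2(x, y)) % 360
    hour = int(round(theta(x_hour, y_hour) / 30)) % 12       # 30 degrees per hour mark
    minute = int(round(theta(x_minute, y_minute) / 6)) % 60  # 6 degrees per minute mark
    return "%d:%02d" % (hour or 12, minute)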

That’s the essence of feature engineering: making a problem easier by expressing it in a simpler way. It usually requires understanding the problem in depth.

Before deep learning, feature engineering used to be critical, because classical shallow algorithms didn’t have hypothesis spaces rich enough to learn useful features by themselves. The way you presented the data to the algorithm was essential to its success. For instance, before convolutional neural networks became successful on the MNIST digit-classification problem, solutions were typically based on hardcoded features such as the number of loops in a digit image, the height of each digit in an image, a histogram of pixel values, and so on.

Fortunately, modern deep learning removes the need for most feature engineering, because neural networks are capable of automatically extracting useful features from raw data. Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks? No, for two reasons:

• Good features still allow you to solve problems more elegantly while using fewer resources. For instance, it would be ridiculous to solve the problem of reading a clock face using a convolutional neural network.

• Good features let you solve a problem with far less data. The ability of deep-learning models to learn features on their own relies on having lots of training data available; if you have only a few samples, then the information value in their features becomes critical.
