Data preprocessing - ENGINEERINGFOOD for AUTOMATION

Before data analysis, data preprocessing is necessary to remove “noise” from the data to let analysis and modeling tools work on “clean” data covering similar ranges.

In general, data preprocessing involves scaling all input and output data from a process to a certain range and the reduction of dimensionality. Many tools of analysis and modeling work better with data within similar ranges, so it is generally useful to scale raw input and output data to a common mean and range. The scaling methods are usually used to scale the input and output data to a mean of zero and a standard deviation of one or to a mean of zero with a range of plus or minus one.
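The two scalings mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names and the sample values are hypothetical, the standard deviation is assumed to be the sample (N − 1) form, and dividing by the largest absolute deviation is only one possible way to obtain a zero mean with values inside plus or minus one.

```python
import statistics


def standardize(values):
    """Scale data to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample form, divides by N - 1
    if std == 0:
        raise ValueError("constant data cannot be standardized")
    return [(v - mean) / std for v in values]


def center_and_bound(values):
    """Scale data to zero mean with all values inside [-1, 1].

    Assumption: the centered data are divided by their largest
    absolute deviation from the mean.
    """
    mean = statistics.fmean(values)
    largest = max(abs(v - mean) for v in values)
    if largest == 0:
        raise ValueError("constant data cannot be scaled")
    return [(v - mean) / largest for v in values]


# Hypothetical raw process input, e.g. temperatures in degrees Celsius.
raw_input = [20.0, 22.5, 25.0, 27.5, 30.0]
z_scored = standardize(raw_input)
bounded = center_and_bound(raw_input)
```

Either result has a mean of zero, so inputs and outputs measured in very different units end up on comparable scales before modeling.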

Assume that the process input and output observations are $u_k$ and $y_k$. The data scaled to zero mean and unit standard deviation are then

$$\tilde{u}_k = \frac{u_k - \bar{u}}{s_u}, \qquad \tilde{y}_k = \frac{y_k - \bar{y}}{s_y} \qquad (k = 1, \ldots, N) \tag{3.1}$$

where $\bar{u}$ and $\bar{y}$ are the means of the raw input and output data, respectively; $s_u$ and $s_y$ are the standard deviations of the raw input and output data, respectively; k is the sequential number of a data sample; and N is the sample size, or the number of samples.

A good example of data preprocessing arises in artificial neural network modeling (discussed further in later chapters), where all input and output data should be scaled to the range of the processing units of the networks.

The most popular transfer function of the network processing units is the sigmoidal function

$$S(x) = \frac{1}{1 + e^{-x}}$$

where x is the input of one of the processing units in the network. The function takes values between 0 and 1, so the input and output data are often scaled as follows to fall into the range [0, 1]:

$$\tilde{u}(k) = \frac{u(k) - u_{\min}}{u_{\max} - u_{\min}}, \qquad \tilde{y}(k) = \frac{y(k) - y_{\min}}{y_{\max} - y_{\min}} \tag{3.2}$$

where $u_{\max}$ and $u_{\min}$ are the maximum and minimum values of the raw input data, respectively, and $y_{\max}$ and $y_{\min}$ are the maximum and minimum values of the raw output data, respectively. This scaling method scales the data into the range [0, 1]. If any other range is required, only simple arithmetic is needed on the scaled data $\tilde{u}(k)$ and $\tilde{y}(k)$. For example, if a range of [−1, 1] is required, then $2\tilde{u}(k) - 1$ and $2\tilde{y}(k) - 1$ produce data in this range, which suits networks whose processing units use a different nonlinear transfer function:

$$S(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$$

For dynamic systems, input and output data often contain constant or low frequency components. No model identification method can remove the negative impact of these components on modeling accuracy. Also, in many cases, high frequency components in the data may not be beneficial for model identification. In general, for data analysis and modeling of dynamic systems, the input and output data need to be preprocessed to zero mean and to eliminate high frequency components. This may significantly improve the accuracy of model identification.
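The preprocessing of dynamic data described above can be illustrated with a short Python sketch. The text does not name a particular filter, so a centered moving average stands in here for the low-pass step; that choice, the window length, the function names, and the sample record are all assumptions made for illustration.

```python
import statistics


def remove_mean(signal):
    """Subtract the mean so that the constant component is removed."""
    mean = statistics.fmean(signal)
    return [s - mean for s in signal]


def moving_average(signal, window=3):
    """Attenuate high frequency content with a centered moving average.

    Assumption: the window is odd and simply shrinks at the two ends
    of the record instead of padding the data.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for k in range(len(signal)):
        lo = max(0, k - half)
        hi = min(len(signal), k + half + 1)
        smoothed.append(statistics.fmean(signal[lo:hi]))
    return smoothed


# Hypothetical record: a constant offset of 5.0 plus a fast alternation.
raw = [5.0 + (1.0 if k % 2 == 0 else -1.0) for k in range(8)]
centered = remove_mean(raw)
smoothed = moving_average(centered, window=3)
```

In this toy record the offset disappears after mean removal, and the fast alternation is reduced to at most one third of its original amplitude by the three-point average.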

For many years, transform theory has played an important role in data preprocessing and analysis. There are various types of transforms, but our emphasis is on the Fourier and wavelet transforms because of their wide range of applications in data preprocessing and analysis problems.

The Fourier transform is a classic tool in signal processing and analysis. Given a signal f(x), its one-dimensional discrete Fourier transform can be computed by

$$F(k) = \frac{1}{N} \sum_{i=0}^{N-1} f(i)\, e^{-j 2\pi k i / N} \tag{3.3}$$

for k = 0, 1, …, N − 1 and $j = \sqrt{-1}$. The signal f(x) can be reconstructed by

$$f(x) = \sum_{i=0}^{N-1} F(i)\, e^{j 2\pi x i / N} \tag{3.4}$$

for x = 0, 1, …, N − 1.

If the Fourier transform is expressed as

$$\Im(f(x)) = F(k)$$

then

$$\Im(f(x - x_0)) = F(k)\, e^{-j 2\pi k x_0 / N} \tag{3.5}$$

This is the translation property of the Fourier transform. In spectral analysis, the magnitude of the Fourier transform of f(x) is displayed as

$$|\Im(f(x))| = |F(k)| \tag{3.6}$$

Note from Eq. (3.5) that a shift in the signal f(x) does not affect the magnitude of its Fourier transform:

$$\left| F(k)\, e^{-j 2\pi k x_0 / N} \right| = |F(k)|$$
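The transform pair and the translation property can be checked numerically with a direct, unoptimized sketch in Python. It assumes the convention with the 1/N factor on the forward transform and a positive exponent on the inverse, and it treats the shift as circular; the function names and the sample signal are hypothetical.

```python
import cmath


def dft(f):
    """Forward discrete Fourier transform with a 1/N factor."""
    n = len(f)
    return [
        sum(f[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n)) / n
        for k in range(n)
    ]


def idft(F):
    """Inverse transform that reconstructs the signal from its spectrum."""
    n = len(F)
    return [
        sum(F[i] * cmath.exp(2j * cmath.pi * x * i / n) for i in range(n))
        for x in range(n)
    ]


# Hypothetical signal of N = 8 samples and a circular shift by x0 = 2,
# so that shifted[x] equals signal[(x - x0) mod N].
signal = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.25, 0.75]
x0 = 2
shifted = signal[-x0:] + signal[:-x0]

spectrum = dft(signal)
spectrum_shifted = dft(shifted)
reconstructed = idft(spectrum)

# The shift changes the phase of each coefficient but not its magnitude.
magnitudes = [abs(c) for c in spectrum]
magnitudes_shifted = [abs(c) for c in spectrum_shifted]
```

This direct summation costs on the order of N² operations; it is written for clarity, and a fast Fourier transform routine would normally be used on real data.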

In food quality inspection, frequency analysis techniques are useful to reveal characteristics of materials. Frequency analysis is based on the discrete Fourier transform of signal data from experiments, such as ultrasonic A-mode experiments. Spectral analysis then indicates which frequencies are most significant. Let us study frequency analysis for ultrasonic A-mode experiments in a little more detail. Figure 3.1 shows an ideal spectrum curve for a homogeneous ultrasonic signal. This curve has a Gaussian, or normal, shape. The spectrum is symmetric about the peak frequency, $f_p$; thus, there is no skewness. The frequency half-power points, $f_a$ and $f_b$, in this case occur equidistant to either side of the peak. The central frequency can be calculated as

$$f_c = \frac{1}{2}(f_a + f_b) \tag{3.7}$$

For a symmetric spectrum, the peak-power (resonant) frequency, $f_p$, is equal to the central frequency, $f_c$. The percentage bandwidth, $B^*$, expresses the broadness of the curve as

$$B^* = \frac{f_b - f_a}{f_c} \times 100 \tag{3.8}$$

The skewness, $f_{sk}$, can be represented as

$$f_{sk} = \frac{f_p - f_a}{f_b - f_p} \tag{3.9}$$

Figure 3.1 Spectrum power curve for ultrasonic transducer, marking the peak frequency $f_p$ and the half-power frequencies $f_a$ and $f_b$. (Adapted from Park, 1991. With permission.)

Local maxima are another useful feature in spectral analysis. They describe the multiple peaks of the Fourier spectrum of an ultrasonic signal.
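The spectral parameters discussed here can be computed from a sampled power spectrum as in the following Python sketch. Several choices are assumptions rather than part of the text: the half-power points are located by linear interpolation between samples, the percentage bandwidth is normalized by the central frequency, and the function names and the triangular test spectrum are hypothetical.

```python
def half_power_crossing(freqs, power, start, step, level):
    """Walk from the peak in direction `step` until power falls to `level`,
    then interpolate linearly between the two neighboring samples."""
    i = start
    while 0 <= i + step < len(power) and power[i + step] > level:
        i += step
    j = i + step
    if not 0 <= j < len(power):
        raise ValueError("spectrum never falls to half power on this side")
    t = (power[i] - level) / (power[i] - power[j])
    return freqs[i] + t * (freqs[j] - freqs[i])


def spectral_parameters(freqs, power):
    """Return peak, half-power and central frequencies, bandwidth, skewness."""
    peak = max(range(len(power)), key=power.__getitem__)
    if power[peak] <= 0:
        raise ValueError("spectrum has no positive peak")
    half = power[peak] / 2.0
    fa = half_power_crossing(freqs, power, peak, -1, half)
    fb = half_power_crossing(freqs, power, peak, +1, half)
    fp = freqs[peak]
    fc = 0.5 * (fa + fb)
    return {
        "fp": fp,
        "fa": fa,
        "fb": fb,
        "fc": fc,
        "bandwidth_percent": (fb - fa) / fc * 100.0,  # assumption: relative to fc
        "skewness": (fp - fa) / (fb - fp),
    }


def local_maxima(freqs, power):
    """Frequencies of interior samples that exceed both neighbors."""
    return [
        freqs[i]
        for i in range(1, len(power) - 1)
        if power[i - 1] < power[i] > power[i + 1]
    ]


# Hypothetical symmetric spectrum sampled at 1 MHz steps.
freqs = [1.0, 2.0, 3.0, 4.0, 5.0]
power = [0.0, 0.5, 1.0, 0.5, 0.0]
params = spectral_parameters(freqs, power)
peaks = local_maxima(freqs, power)
```

For this symmetric test spectrum the central frequency coincides with the peak frequency and the skewness ratio equals one, in line with the ideal case described in the text.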

The wavelet transform represents a breakthrough in signal processing and analysis. In comparison to the Fourier transform, wavelet basis functions are local in both frequency and time, while Fourier basis functions are local only in frequency. Wavelets are “small” waves that should integrate to zero around the x axis. They localize functions well, and a “mother” wavelet can be translated and dilated into a series of wavelet basis functions. The basic idea of wavelets can be traced back to very early in the twentieth century. However, the construction of compactly supported orthonormal wavelets (Daubechies, 1988) and the development of wavelet-based multiresolution analysis (Mallat, 1989) have resulted in extensive research and applications of wavelets in recent years.
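As a concrete illustration of a wavelet decomposition, the Python sketch below uses the Haar wavelet, the simplest compactly supported orthonormal wavelet. The text does not single out Haar; it is chosen here only for brevity. Each level filters the signal with a low-pass and a high-pass pair and keeps every second sample, giving a half-length smoothed signal and a half-length detail signal. The function names and the sample signal are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
H = (1.0 / SQRT2, 1.0 / SQRT2)   # Haar low-pass (scaling) filter
G = (1.0 / SQRT2, -1.0 / SQRT2)  # Haar high-pass (wavelet) filter, sums to zero


def analyze(signal):
    """One decomposition level: filter, then keep every second sample."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(len(signal) // 2)
    smooth = [H[0] * signal[2 * k] + H[1] * signal[2 * k + 1] for k in pairs]
    detail = [G[0] * signal[2 * k] + G[1] * signal[2 * k + 1] for k in pairs]
    return smooth, detail


def synthesize(smooth, detail):
    """Invert one level, recovering the finer signal exactly."""
    signal = []
    for s, d in zip(smooth, detail):
        signal.append(H[0] * s + G[0] * d)
        signal.append(H[1] * s + G[1] * d)
    return signal


def decompose(signal):
    """Repeat the analysis step until a single smoothed sample remains.

    Assumption: the signal length is a power of two.
    """
    smooth, details = list(signal), []
    while len(smooth) > 1:
        smooth, detail = analyze(smooth)
        details.append(detail)
    return smooth, details


# Hypothetical signal with N = 2**3 samples.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
smooth1, detail1 = analyze(signal)
rebuilt = synthesize(smooth1, detail1)
coarsest, all_details = decompose(signal)
```

Because the Haar filters are orthonormal, the smoothed and detail signals together carry the same energy as the original signal, and the synthesis step recovers it without loss.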

In practice, wavelet analysis can be used to transform the raw signals, process the transformed signals, and inversely transform the processed results to obtain the processed, reconstructed signals. Given a signal f(x), a scaling function Φ(x), and a wavelet function Ψ(x), the one-dimensional discrete orthogonal wavelet transform of f(x) along the dyadic scales can be computed by Mallat’s recursive algorithm

$$f^{(m)}(k) = \sum_{i} h(i - 2k)\, f^{(m+1)}(i) \tag{3.10}$$

$$d^{(m)}(k) = \sum_{i} g(i - 2k)\, f^{(m+1)}(i) \tag{3.11}$$

where $f^{(m)}(k)$ is the smoothed signal of f(x) at the resolution m and the sampling point k, $d^{(m)}(k)$ is the detail signal of f(x) at the resolution m and the sampling point k, h(i) is the impulse response, at the sampling point i, of a unique low-pass FIR (finite impulse response) filter associated with Φ(x), and g(i) is the impulse response, at the sampling point i, of a unique FIR filter associated with Φ(x) and Ψ(x).

This computation is a convolution followed by subsampling at one-half rate. Here m + 1 = M, M − 1, …, 1 and $N = 2^M$, where M is the highest resolution level of the signal and N is the number of signal samples. In the preceding equations, $f^{(m)}$ is the smoothed signal at scale $2^m$, while $d^{(m)}$ is the detail signal that is present in $f^{(m+1)}$ at scale $2^{m+1}$ but lost in $f^{(m)}$.

An image is a two-dimensional data array. The concepts and methods described previously are all extendable to the two-dimensional case. In particular, we will see such an extension of the wavelet transform to two dimensions in Section 3.3.2. These two-dimensional tools are very useful in image processing and analysis.

Image preprocessing is important for human perception and subsequent analysis. A poorly preprocessed image will be less well understood by a human or computer analyzer. It is critical to remove the noise adhering to the image and to enhance the region of interest in order to ensure the performance of an imaging system.

Images are subject to various types of noise. Noise may degrade the quality of an image to the point that the image no longer provides enough information. To improve the quality of an image, operations need to be performed on it to remove or reduce the degradations it suffered during acquisition, storage, or display. Through such preprocessing, the appearance of the image should be improved for human perception or subsequent analysis.

Image enhancement techniques are important in image preprocessing. The purpose of image enhancement is to process an image to create a more suitable one than the original for a specific application. Gonzalez and Woods (1992) explained that the word specific is important because it establishes at the outset that the enhancement techniques are very much problem-oriented. Thus, for example, a method that is quite useful for enhancing x-ray images may not necessarily be the best approach for enhancing pictures of apples taken by a digital camera.

Image enhancement techniques can be divided into two categories: spatial domain methods and frequency domain methods. Spatial domain refers to the image plane itself. Approaches in this category are based on direct manipulation of the pixels in an image and include point processing and spatial filtering (smoothing and sharpening filtering) (Gonzalez and Woods, 1992). Frequency domain processing techniques are based on modifying the Fourier transform of an image and involve the design of low-pass, high-pass, and homomorphic filters (Gonzalez and Woods, 1992).
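A spatial-domain smoothing filter can be illustrated with a small Python sketch that replaces each interior pixel by the average of its 3 × 3 neighborhood. The kernel size, the handling of border pixels (copied unchanged), the function name, and the test image are assumptions made for illustration, not details given in the text.

```python
def mean_filter_3x3(image):
    """Smooth an image (a list of rows) with a 3 x 3 averaging window.

    Assumption: border pixels are copied unchanged.
    """
    rows, cols = len(image), len(image[0])
    smoothed = [list(row) for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            smoothed[r][c] = sum(window) / 9.0
    return smoothed


# Hypothetical 5 x 5 gray-level image: a flat background of 10
# with a single noisy pixel of 100 in the center.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 100
smoothed = mean_filter_3x3(image)
```

The isolated spike is spread over its neighborhood and reduced in height, which is the basic trade-off of smoothing filters: noise is suppressed at the cost of some blurring.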
