A mixture of the previous representations (hybrid paradigm)

6. A mixture of the previous representations (hybrid paradigm)

The types of knowledge representation listed above largely determine the frameworks for forecast performers. These frameworks are presented below [Langley, Simon, 1995]:

1. Neural networks use weights on the links to compute the activation level passed on for a given input case through the network. The activation of output nodes is transformed into numeric predictions or discrete decisions about the class of the input.

2. Instance-based learning includes one common scheme: it uses the target value of the nearest stored case (according to some distance metric) as the classification or predicted value for the current case.

3. Genetic algorithms share the approach of neural networks and other paradigms, because genetic algorithms are often used to speed up the learning process for other paradigms.

4. Rule induction. The performer sorts cases down the branches of the decision tree or finds the rule whose conditions match the case. The values stored in the then-part of the rules or in the leaves of the tree are used as target values (classes or numeric predictions).

5. Analytical learning. The forecast is produced through the use of background knowledge to construct a specific combination of rules for a current case. This combination of rules produces a forecast similar to that in rule induction. The process of constructing the combination of rules is called a proof or "explanation" of experience for that case.
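To make the notion of a forecast performer concrete, the following minimal sketch shows three of the performers described above: a single-output neural network, a nearest-neighbor performer, and a rule-matching performer. The function names, the sigmoid activation, the Euclidean distance metric, and the first-match rule strategy are illustrative assumptions, not part of the cited framework.

```python
import math

def neural_performer(weights, bias, case):
    """Hypothetical single-output network: weighted sum passed through a sigmoid."""
    activation = sum(w * x for w, x in zip(weights, case)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

def nearest_neighbor_performer(stored_cases, case):
    """Return the target value of the stored case nearest to `case`.

    `stored_cases` is a list of (features, target) pairs; Euclidean
    distance is an assumed choice of metric.
    """
    features, target = min(stored_cases, key=lambda sc: math.dist(sc[0], case))
    return target

def rule_performer(rules, case, default=None):
    """Return the value in the then-part of the first rule whose condition matches.

    `rules` is a list of (condition, value) pairs, where `condition`
    is a function from a case to True/False.
    """
    for condition, value in rules:
        if condition(case):
            return value
    return default

# Usage with made-up data
stored = [((1.0, 1.0), "up"), ((5.0, 5.0), "down")]
rules = [(lambda c: c[0] > 3, "down"), (lambda c: True, "up")]
print(nearest_neighbor_performer(stored, (1.5, 0.5)))
print(rule_performer(rules, (4.0, 0.0)))
```

In all three cases the performer is only the component that maps a new case to a forecast; how the weights, stored cases, or rules are obtained is the task of the learning mechanism.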

The next important component of each of these paradigms is a learning mechanism. These mechanisms differ considerably from one paradigm to another. However, search methods such as gradient descent search and parallel hill climbing play an essential role in many of these mechanisms.
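The two search methods mentioned above can be sketched as follows. This is an illustration under stated assumptions: gradient descent is shown fitting a one-parameter model y ≈ w·x by mean squared error, and "parallel" hill climbing is approximated by several climbers started from different points and run one after another. All function names, the learning rate, and the step size are hypothetical choices.

```python
def gradient_descent(xs, ys, lr=0.01, steps=500):
    """Fit y ≈ w * x by repeatedly stepping against the gradient of the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def hill_climb(score, start, step=0.1, max_iters=1000):
    """Move to the better neighbor while it improves the score; stop at a local maximum."""
    current = start
    for _ in range(max_iters):
        best = max((current - step, current + step), key=score)
        if score(best) <= score(current):
            break
        current = best
    return current

def multi_start_hill_climb(score, starts, **kwargs):
    """Run one climber per starting point (sequentially here) and keep the best result."""
    return max((hill_climb(score, s, **kwargs) for s in starts), key=score)

# Usage with made-up data: the true slope is 2, the score peaks at x = 1
w = gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
peak = multi_start_hill_climb(lambda x: -(x - 1.0) ** 2, [-1.0, 0.0, 3.0])
print(round(w, 3), round(peak, 1))
```

Gradient descent needs a differentiable error function, which is why it is typical for neural networks. Hill climbing only needs a score that can be evaluated, which makes it applicable to discrete structures such as rule sets.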

Figure 1.5 shows the interaction of the components of a learning paradigm. The training data and other available knowledge are embedded into some form of knowledge representation. The learning mechanism (method, algorithm) then uses them to produce a forecast performer and possibly a separate entity, learned knowledge, which can be communicated to human experts.

Figure 1.5. Learning paradigm

Neural network learning identifies the forecast performer, but does not produce knowledge in a form understandable by humans, such as IF-THEN rules.

The rule induction paradigm produces learned knowledge in the form of understandable IF-THEN rules, and the forecast performer is derived from this form of knowledge.
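The distinction between learned knowledge and the forecast performer can be illustrated with a deliberately small rule learner. The sketch below is a toy example, not a method from the cited literature: it assumes a single numeric attribute x, searches for one threshold rule with the fewest training errors, and returns both a readable IF-THEN statement and a performer derived from it. The function name and the data are hypothetical.

```python
def induce_threshold_rule(cases):
    """Learn one rule 'IF x > t THEN a ELSE b' from (x, label) pairs.

    Candidate thresholds are midpoints between adjacent distinct x values;
    the rule with the fewest training errors is kept.
    """
    xs = sorted({x for x, _ in cases})
    labels = sorted({label for _, label in cases})
    if len(xs) < 2:
        raise ValueError("need at least two distinct attribute values")
    best = None
    for lo_x, hi_x in zip(xs, xs[1:]):
        t = (lo_x + hi_x) / 2
        for above in labels:
            for below in labels:
                errors = sum((above if x > t else below) != label
                             for x, label in cases)
                if best is None or errors < best[0]:
                    best = (errors, t, above, below)
    _, t, above, below = best
    knowledge = f"IF x > {t} THEN {above} ELSE {below}"  # readable by an expert
    performer = lambda x: above if x > t else below      # derived from the rule
    return knowledge, performer

# Usage with made-up training cases
cases = [(1.0, "sell"), (2.0, "sell"), (4.0, "buy"), (5.0, "buy")]
knowledge, performer = induce_threshold_rule(cases)
print(knowledge)
print(performer(4.5))
```

A neural network trained on the same cases would also yield a performer, but its weights would not directly provide a statement comparable to the `knowledge` string.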

Steps for learning. Langley and Simon [1995] pointed out the general steps of machine learning presented in Figure 1.6. In general, data mining follows these steps in the learning process.

These steps are challenging for many reasons. Collecting training examples has been a bottleneck for many years. Merging database and data mining technologies clearly speeds up the collection of training examples. Currently, the least formalized steps are reformulating the actual problem as a learning problem and identifying an effective knowledge representation.

Figure 1.6. Data mining steps

1.8 Intellectual challenges in data mining

The importance of identifying an effective knowledge representation has been hidden by the data-collecting problem. It has now become increasingly evident that effective knowledge representation is an important problem for the success of data mining. Close inspection of successful projects suggests that much of the power comes not from the specific induction method, but from proper formulation of the problems and from crafting the representation to make learning tractable [Langley, Simon, 1995]. Thus, the conceptual challenges in data mining are:

– Proper formulation of the problems and

– Crafting the knowledge representation to make learning meaningful and tractable.

In this book, we specifically address the conceptual challenges of knowledge representation as they relate to relational data mining and data types in Chapters 4-7.

Available data mining packages implement well-known procedures from the fields of machine learning, pattern recognition, neural networks, and data visualization. These packages emphasize look and feel (GUI) and the existence of functionality. Most academic research in this area so far has focused on incremental modifications to current machine learning methods and on the speed-up of existing algorithms [Friedman, 1997].

The current trend shows three new technological challenges in data mining [Friedman, 1997]:

– Implementation of data mining tools using parallel computation of online queries.

– Direct interface of DBMS to data mining algorithms.

– Parallel implementations of basic data mining algorithms.

Some of our advances in parallel data mining are presented in Section 2.8.

Munakata [1999] and Mitchell [1999] point out four especially promising and challenging areas:

– incorporation of background and associated knowledge,

– incorporation of more comprehensible, non-oversimplified, real-world types of data,

– human-computer interaction for extracting background knowledge and guiding data mining, and

– hybrid systems for taking advantage of different methods of data mining.

Fu [1999] noted “Lack of comprehension causes concern about the credibility of the result when neural networks are applied to risky domains, such as patient care and financial investment”. Therefore, the development of a special neural network whose knowledge can be decoded faithfully is considered a promising direction [Fu, 1999].

It is important for the future of data mining that the current growth of this technology is stimulated by requests from the database management area. The database management area is neutral with respect to the learning methods that will be used. This has already produced increased interest in hybrid learning methods and in cooperation among the different professional groups developing and implementing learning methods.

Based upon new demands for data mining and recent achievements in information technology, the significant intellectual and commercial future of the data mining methodology has been pointed out in many recent publications (e.g., [Friedman, 1997; Ramakrishnan, Grama, 1999]). R. Groth [1997] cited “Bank systems and technology” [Jan., 1996], which states that data mining is the most important application in financial services.
