
2.2 Educational data mining

2.2.2 Web mining

analysis came obvious. Another proposed model aimed to detect atypical behaviour in the grouping structure of the students of a virtual campus. The authors used clustering with a generative topographic mapping model to conduct experiments on the extracted data. The results indicated that the proposed model neutralizes the negative impact of outliers on the data clustering process (Castro et al., 2005). More recently, two clustering methods, hierarchical clustering (Ward’s clustering) and non-hierarchical clustering (k-means clustering), were presented in the literature. The researchers built profiles of student behaviour from learners’ activity in an online learning environment, drawing on click-stream server data, and proposed a new approach for implementing data mining services in SCORM-compliant LMSs (Psaromiligkos et al., 2011). Subsequently, several clustering techniques provided in the WEKA software, such as Expectation Maximization, hierarchical clustering, Simple k-Means and X-Means, were applied to predict students’ performance, in particular which students could fail during online courses in a Learning Management System (LMS), and so to fill the research gap in system integration. The experiments were conducted using real data obtained from various online courses. The authors compared several classic clustering algorithms on several groups of students using their defined features and analysed the meaning of the clusters produced (Bovo et al., 2013).

Although many methods have been proposed to classify data in order to analyse and predict facts and relationships, classification remains the most important technique in data mining. One study used classification techniques in data mining to predict the employability of students (Syeda Farha Shazmeen, 2013).
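As a rough illustration of the non-hierarchical (k-means) clustering used in several of the studies above, the following minimal sketch groups students by two activity features. The feature names, the sample values and the choice of initial centroids are assumptions made for this example only; none of them come from the cited studies, which worked with their own data and, in some cases, with WEKA's implementations.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignment, centroids

# Hypothetical features per student: (logins per week, forum posts per week).
students = [(1, 0), (2, 1), (1, 1), (9, 7), (10, 8), (8, 9)]
labels, centres = kmeans(students, centroids=[students[0], students[3]])
print(labels)
```

In this toy data the low-activity and high-activity students end up in separate clusters. With real click-stream data the features would normally be scaled first, and the number of clusters would have to be chosen or estimated.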
Many researchers consider that the idea of a classification technique is to categorize an attribute value into one of a group of possible classes and to answer certain research questions that convey information to readers, such as how classification is defined (Wedyan, 2014). Classification techniques are therefore presented as supervised learning methods that classify data items into predefined class labels (Syeda Farha Shazmeen, 2013). Several techniques and algorithms have been proposed to perform classification tasks, some of which are listed below:

1. Decision Trees

2. Artificial Neural Networks

3. Support Vector Machine

4. K-Nearest Neighbor

5. Naïve-Bayes

These techniques enhance the efficiency with which an algorithm classifies the data properly (Syeda Farha Shazmeen, 2013). Because of their high prediction accuracy, classification techniques have been widely used in EDM.
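To make the idea of supervised classification into predefined class labels concrete, here is a minimal sketch of one of the listed techniques, K-Nearest Neighbor. The training records, the two features and the "pass"/"fail" labels are hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training examples nearest to query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (quiz average, assignments submitted) -> outcome.
train = [
    ((35, 2), "fail"), ((40, 3), "fail"), ((45, 2), "fail"),
    ((70, 8), "pass"), ((80, 9), "pass"), ((75, 7), "pass"),
]

weak_student = knn_predict(train, (42, 3))
strong_student = knn_predict(train, (78, 8))
print(weak_student, strong_student)
```

The class labels are fixed in advance by the training data, which is what distinguishes this supervised setting from the clustering approaches discussed earlier.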

In this regard, Chen et al. (2000) used a decision tree technique, applying the C5.0 algorithm together with data cube information processing methodologies, to monitor students’ behaviours and find pedagogical rules on students’ learning performance from web logs. The induction analysis found potential student groups that share characteristics and reactions to a certain pedagogical strategy (Chen et al., 2000). A few years later, another study used four classifiers to categorize students based on features extracted from logged web data in order to predict their final grades. By using a genetic algorithm to weight the features, the authors optimized a combination of classifiers (Minaei-Bidgoli and Punch, 2003). Other researchers applied a Bayesian classification technique to a student database to predict students’ performance, with emphasis on identifying the difference between fast learners and slow learners. According to the authors, the results help to identify students who need special attention in order to reduce failure (Ahmed and Elaraby, 2014). Data mining is widely used in the educational field for different purposes. One of the most important is the use of several classifiers to study students’ performance, focusing on factors that may affect performance in higher education. In one study, qualitative predictive models that could effectively predict students’ grades from a training dataset were presented; four decision tree algorithms were applied in addition to the Naïve Bayes algorithm. The study found that students’ performance depends not solely on academic effort but also on many other factors (Abu, 2016). In addition, Ahmed et al. (2014) implemented the decision tree (ID3) method as a classification technique in data mining to predict students’ final grades, using different factors collected from the students’ database. According to the authors, the study helps to enhance students’ performance and reduce the failure rate (Badr, Din and Elaraby, 2014). Unlike classification and clustering techniques, outlier detection methods handle unusual data and can detect students with learning problems (Kou and Lu, 2016).
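The ID3 decision tree method mentioned above chooses, at each node, the attribute with the highest information gain. The following minimal sketch computes that quantity for a toy table. The attribute names ("attendance", "midterm") and the four records are assumptions for illustration and do not reproduce the factors used in the cited studies.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction in target obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical student records.
rows = [
    {"attendance": "high", "midterm": "good", "grade": "pass"},
    {"attendance": "high", "midterm": "poor", "grade": "pass"},
    {"attendance": "low", "midterm": "good", "grade": "fail"},
    {"attendance": "low", "midterm": "poor", "grade": "fail"},
]

# ID3 would split first on the attribute with the largest gain.
best = max(["attendance", "midterm"],
           key=lambda a: information_gain(rows, a, "grade"))
print(best)
```

In this toy table, attendance separates the two grades perfectly while the midterm result carries no information, so ID3 would place attendance at the root. A full ID3 implementation repeats this selection recursively on each subset.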

Another study applied two classification methods, Rule Induction and the Naïve Bayesian classifier, to graduate students’ data collected from the College of Science and Technology – Khanyounis. The students were clustered into groups using K-Means clustering, and a distance-based approach and a density-based approach were used to detect all outliers in the data (Tair and El-Halees, 2012). In one more study, a model for the automatic analysis of student interactions with a web-based learning system was proposed. The results of the automatic analysis provide useful information, including decision trees that were used for predictions in later experiments; the generated decision trees were also used in analysing data with machine learning techniques (Muehlenbrock, 2005). Ueno (2004) proposed a method for the online detection of outliers in learners’ irregular learning processes using a Bayesian predictive distribution. The method uses students’ response time data for the e-learning contents. It can be applied to short samples, and it supports two-way instruction by using the results of the mining process (Ueno, 2004).
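One common way to formulate the distance-based approach to outlier detection mentioned above is to flag a record when too few other records lie within a given radius of it. The sketch below follows that formulation. It is not necessarily the exact procedure of the cited study, and the features, the radius and the neighbour threshold are hypothetical.

```python
import math

def distance_based_outliers(points, radius, min_neighbours):
    """Flag points that have fewer than min_neighbours other points within radius."""
    outliers = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbours < min_neighbours:
            outliers.append(p)
    return outliers

# Hypothetical records: (grade point average, number of absences).
# Features are left unscaled here for brevity; real data would usually be normalized.
records = [(3.1, 14), (3.3, 15), (3.0, 13), (3.2, 14), (1.2, 40)]
flagged = distance_based_outliers(records, radius=2.5, min_neighbours=2)
print(flagged)
```

Here the last record is flagged because no other record lies within the chosen radius. Density-based approaches refine this idea by comparing each record's local density with that of its neighbours rather than using one global threshold.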