The historical relationship between Education, prediction models, and the evolution of Data Mining:

Graduates’ employability is now a major concern for institutions offering higher education, and a technique for early prediction of graduates’ employability is usually needed to support appropriate decisions (Bridgstock, 2009).

A number of research studies apply several data mining classification techniques, such as Bayesian methods, Multilayer Perceptrons, Sequential Minimal Optimization (SMO), ensemble techniques and decision trees, to predict the employability of Master of Computer Applications (MCA) learners and to find the algorithm most suitable for this problem (Mishra, Kumar and Gupta, 2016a).

In the same vein, another study affirmed the importance of data mining techniques in the educational field in relation to graduates’ employment history (Berland, Baker and Blikstein, 2014). Further, data mining has been adopted in diverse fields because of its capacity to analyse massive volumes of data quickly.

The cited study in this regard aimed to construct a graduate employment model as a classification task in data mining, and to compare two families of data mining techniques: the Bayesian method and the Tree method.

The Bayesian technique comprises five algorithms: AODE, BayesNet, HNB, Naive Bayes and WAODE.

The Tree technique comprises five algorithms: BFTree, NBTree, REPTree, ID3 and C4.5. The study carried out the classification task in WEKA, and the authors compared the findings of each algorithm, as different classification models were produced. To validate the resulting model, the experiments used real data gathered from graduate profiles at Maejo University in Thailand.

The resulting model is used to predict whether a graduate is employed, unemployed, or in an undetermined situation (Pääkkönen et al., 2013).

Moreover, a number of studies have focused on the qualifications required to improve future employability (Thijssen, Van Der Heijden and Rocco, 2008). With the current strong focus on academic principles, features and graduate employability outcomes, Australian higher education organizations have a growing need to develop feedback instruments that support graduate employability results. Another paper describes the development of the Graduate Employability Indicators (GEI), a set of surveys that ask graduates, employers and instructors about the significance of 14 employment competences for graduates’ workplace success and their assessment by graduates up to five years out. These surveys were commissioned through an ALTC grant, Building course team ability for graduate employability, a collaborative project between Curtin University, RMIT University, University of Southern Queensland and Victoria University. The paper sets out the relationships and differences between the GEI and other indicators, such as the Australian Graduate Pathways Survey (GPS), the Australasian Survey of Student Engagement (AUSSE) and the National Survey of Student Engagement (NSSE), and shows its possible use in local and international benchmarking activities. Summary visual data on the viewpoints of graduates from one of the preliminary surveys is also presented to show the kind of data that the surveys can collect (Oliver and Jorre de St Jorre, 2018).

Also, in Educational Data Mining (EDM) the input data are often imprecise, ambiguous or vague, so many problems can occur during the classification process (Hernández-Blanco et al., 2019). Fuzzy logic is a mathematical model proposed by Lotfi Zadeh in 1965 to address this problem, which conventional computer logic faces when manipulating imprecise and vague data (Zadeh and Aliev, 2018).

Fuzzy logic is an approach to computing based on degrees of truth (degrees of membership in a specific class) rather than on the crisp true-or-false logic on which the modern computer is based (Anderson et al., 2009).

Fuzzy logic is a computational paradigm modelled on human thinking; it deals with problems in the same manner as the human brain, taking imprecise input and then producing a precise output (e.g. grade is excellent, income is high).

Fuzzy logic can make development and implementation much simpler, and the techniques can give higher accuracy than other conventional techniques (Lingala et al., 2014).
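The idea of degrees of membership described above can be illustrated with a minimal sketch. The membership shapes, the breakpoints and the labels "average", "good" and "excellent" are assumptions chosen only for illustration; they are not taken from the cited studies.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """Membership that rises linearly from 0 at a to 1 at b and stays at 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def grade_memberships(score):
    """Hypothetical fuzzy sets for an exam score on a 0-100 scale."""
    return {
        "average": triangular(score, 40.0, 60.0, 80.0),
        "good": triangular(score, 60.0, 75.0, 90.0),
        "excellent": right_shoulder(score, 80.0, 95.0),
    }


if __name__ == "__main__":
    # A score of 85 is partly "good" and partly "excellent" at the same time.
    print(grade_memberships(85))
```

Unlike a crisp rule such as "excellent if the score is at least 90", a single score can belong to several classes at once, each to a different degree.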

Neuro-fuzzy algorithms combine neural networks with fuzzy logic, where the fuzzy logic system is built according to the structure of the neural network.

There are two neuro-fuzzy systems:

1. Mamdani approach. The main characteristics of this approach are:

a. There is an output membership function in this technique.

b. The output of this approach is a crisp value, which is produced through the defuzzification process.

c. This approach can take multiple inputs and produce single or multiple outputs.

d. High interpretability.

e. The system design in this approach is inelastic.

2. Sugeno’s approach. The main characteristics of this approach are:

a. There is no output membership function in this technique.

b. There is no need for defuzzification, because the output of the last layer is a crisp value.

c. It takes multiple inputs and produces a single output; it cannot produce multiple outputs.

d. There is a problem with interpretability.

e. The system design can be elastic.
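To make the Sugeno characteristics listed above concrete, the following is a minimal sketch of first-order Sugeno inference: each rule yields a crisp value, and the final output is the average of those values weighted by the rules' firing strengths, so no defuzzification step is needed. The two rules, the membership functions `low` and `high`, the 0-10 input scale and all coefficients are hypothetical, and the product is assumed as the AND operator.

```python
def sugeno_output(rules, inputs):
    """Weighted average of crisp rule outputs (first-order Sugeno inference).

    rules: list of (membership_functions, coefficients, bias), one entry per rule.
    inputs: list of crisp input values.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for memberships, coefficients, bias in rules:
        # Firing strength: product of the input membership degrees (AND as product).
        strength = 1.0
        for mf, x in zip(memberships, inputs):
            strength *= mf(x)
        # The rule consequent is a crisp linear function of the inputs, not a fuzzy set.
        consequent = bias + sum(c * x for c, x in zip(coefficients, inputs))
        weighted_sum += strength * consequent
        total_strength += strength
    if total_strength == 0.0:
        raise ValueError("no rule fired for these inputs")
    return weighted_sum / total_strength


def low(x):
    """Hypothetical membership: 1 at 0, falling linearly to 0 at 10."""
    return max(0.0, min(1.0, (10.0 - x) / 10.0))


def high(x):
    """Hypothetical membership: 0 at 0, rising linearly to 1 at 10."""
    return max(0.0, min(1.0, x / 10.0))


# Two hypothetical rules over two inputs on a 0-10 scale.
RULES = [
    ([low, low], [0.0, 0.0], 0.2),      # IF x1 is low AND x2 is low THEN z = 0.2
    ([high, high], [0.05, 0.03], 0.1),  # IF both are high THEN z = 0.1 + 0.05*x1 + 0.03*x2
]

if __name__ == "__main__":
    print(sugeno_output(RULES, [5.0, 5.0]))
```

A Mamdani system would instead attach an output fuzzy set to each rule, aggregate those sets and defuzzify the result (for example by taking a centroid) to obtain the crisp value.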

The two techniques, neural networks and fuzzy logic, can be integrated in two different ways:

1- Neuro-Fuzzy System (NFS)

A fuzzy logic system is represented using the structure of a neural network (NN) and trained using either a backpropagation (BP) algorithm or a genetic algorithm (GA). The purpose is to enhance the performance of the fuzzy reasoning tool by representing the fuzzy system with an NN structure and training the network using BP or GA.

2- Fuzzy Neural Network (FNN)

The neurons of the neural networks are built using fuzzy set theory (‘Proceedings of the International Conference on Soft Computing for Problem Solving, SocProS 2011’, 2012); this method can be developed in three different ways:

a. Fuzzy input with real weight

b. Real input with fuzzy weight

c. Fuzzy input and fuzzy weight

The fuzzy neural network has not become popular, whereas the neuro-fuzzy system is very popular and has many real-life applications.

The current research study uses the neuro-fuzzy algorithm called the Adaptive Network-Based Fuzzy Inference System (ANFIS), which uses Sugeno’s approach in the fuzzy system.

Additionally, an interesting topic raised by researchers in higher education is the importance of data mining for studying data in order to improve the quality of provision and address the demands of graduates. Educational data mining therefore appears to be one of the most appropriate instruments for investigating academic data, identifying patterns and assisting decision making in the educational field (Han et al., 2007). The cited article predicts the employability of IT graduates using nine variables. First, a number of data mining classification algorithms were examined, and logistic regression, which achieved an accuracy of 78.4%, was applied.

According to the logistic regression analysis, three variables, IT_core, IT_professional and gender, emerged as significant predictors of employability. The data were gathered from the five-year profiles of 515 graduates randomly selected from the employment office tracer study (Piad et al., 2016).

In addition to all of the above, it is essential to look at the other wing of employability, namely the designers and constructors of careers (Larson and Lockee, 2009). Many research studies around the world have been conducted in this area; one study set out to develop a career competency model by exploring the relationships between career competency and career success from a career development viewpoint. The authors affirmed that the career competencies described in the cited study are an essential reference for designing basic or general education programs in the hospitality area. Hospitality courses can offer a “hospitality career and employability” program that introduces courses such as career identification, career designing, self-management, job-seeking and success approaches, problem-solving skills, workplace ethics and security, manners, collaboration, and coordination and networking skills. The study used a questionnaire survey to gather data from a group of 277 participants at 36 international places and applied the AMOS statistical software package to perform structural equation modelling (SEM). The findings showed that the career competency model is a multifaceted concept comprising four competency dimensions that affect the career fulfilment of employees in some departments of international tourist hotels. In particular, the competencies in the “career modification and management” dimension were the most influential for career success. The career competencies in this study can serve as a reference for people designing basic or general education programs in the hospitality field (Wang and Tsai, 2014).

2.7 Data Mining Techniques Used for Employability:

This section provides a detailed review of the literature on the employability problem and on how widely data mining techniques are applied to it, in support of the study’s claim. The main sections of the literature review highlight and discuss the factors that most affect future employment.

Some researchers focus on data mining methods and algorithms to predict graduates’ employability; others study the attributes that impact employability. The following literature studied attributes of employability.

Al-Janabi (2010) proposed an approach based on features (knowledge areas) obtained from the logged data of employment and university graduates. He presented a model for analysing the data of IT graduates according to the employability knowledge areas in order to produce feedback recommendations for enhancing the teaching and learning resources and processes of IT programs and thereby improving the programs’ learning outcomes.

Furthermore, artificial neural networks came from attempts to simulate the biological neural system, in which the human brain consists of neurons linked together via axons. Each neuron connects to the axons of other neurons via dendrites. The point where an axon meets a dendrite is called a synapse. Scientists discovered that the learning process of the brain is carried out by changes to the strength of the synaptic connections between neurons. An artificial neural network (ANN) corresponds to a set of nodes linked as in a brain. An ANN processes one record at a time: it classifies the record and then compares the classification with the actual class of the record. If the classification is wrong, the error is fed back to the network to adjust it for the next iteration, and so on. ANN is a popular technique, usually used when high accuracy and superior learning capability are desired, even if the available training data is not large. There are various ANN methods, but the most popular one is the Multilayer Perceptron Backpropagation Network (MLPBPN) algorithm, which is used in most ANN research studies (Albawi, Mohammed and Al-Zawi, 2018).

Bezuidenhout (2011) developed the employability attributes framework (EAF). This framework describes a group of eight employability attributes that are considered essential for increasing the probability of securing and sustaining employment opportunities (Bezuidenhout and Jeppesen, 2011). The EAF has the following eight measures: career self-management, cultural competence, career resilience, proactivity, entrepreneurial orientation, sociability, self-efficacy, and emotional literacy.

Sriram, Srinivas and Thammi (2014) presented a model to predict the attributes that play the main role in the employability of students. They used the Maximally Specific Hypothesis in order to reduce the representation of rules. These hypotheses can be used to identify the key attributes needed for employability among graduates.

Isljamovic and Suknovic (2014) used different artificial neural network algorithms in order to find the technique best suited to predicting students’ performance. They also studied which factors had a crucial influence on overall performance. The authors developed a multilayer neural network method that can predict the success of students at the end of their studies. The main idea of their research was to predict students’ performance early.
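The error-feedback learning loop described earlier for artificial neural networks can be sketched with a single threshold neuron trained by the perceptron rule. This is a deliberate simplification, not the multilayer backpropagation network used in the cited studies; the two binary attributes, the labels and the learning rate are hypothetical.

```python
def predict(weights, bias, record):
    """Classify one record with a single threshold neuron (outputs 0 or 1)."""
    activation = bias + sum(w * x for w, x in zip(weights, record))
    return 1 if activation > 0 else 0


def train(records, labels, learning_rate=0.5, epochs=20):
    """Process one record at a time and feed each classification error back."""
    weights = [0.0] * len(records[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for record, label in zip(records, labels):
            error = label - predict(weights, bias, record)  # 0 when correct
            if error != 0:
                mistakes += 1
                # Strengthen or weaken each connection in proportion to its input.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, record)]
                bias += learning_rate * error
        if mistakes == 0:  # every record classified correctly, so stop early
            break
    return weights, bias


# Hypothetical toy data: two binary attributes per student; the label is 1
# only when both attributes are present (a linearly separable pattern).
RECORDS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = [0, 0, 0, 1]

if __name__ == "__main__":
    w, b = train(RECORDS, LABELS)
    print([predict(w, b, r) for r in RECORDS])
```

A multilayer network follows the same cycle of classifying, comparing and adjusting, but propagates the error backwards through the hidden layers using gradients rather than this single-layer update.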

Thakar and Mehta (2017) studied the role of secondary attributes in enhancing the prediction accuracy of students’ employability using data mining. They showed that prediction accuracy for students’ employability can be enhanced by including secondary attributes, such as personal, social, psychological and other environmental variables, in the dataset.

The following studies focused on the data mining methods or algorithms used to predict graduate employability.

Piad (2018) proposed a technique to predict the employability of IT graduates. His study identifies the influential attributes for supervised learning using data mining methods. He compared several data mining classification algorithms: Naive Bayes, J48, SimpleCart, Logistic Regression and CHAID. The author found that Logistic Regression achieved the highest accuracy, and that three predictors with a direct effect on IT employability are the IT_core subjects, IT_professional subjects and gender (Piad, 2018).

Jantawan et al. (2013) used real data on graduates of Maejo University in Thailand over three academic years. They conducted several experiments using Bayesian Network and Decision Tree algorithms to predict whether a graduate has been employed, remains unemployed, or is in an undetermined situation after graduation (Jantawan and Tsai, 2013).

Sapaat et al. (2011) built a graduate employability model using a classification method in data mining. To perform the classification, they used data extracted from the web-based survey system of the Ministry of Higher Education, Malaysia (MOHE) for the year 2009. Bayes algorithms were used to perform the classification. In addition, they compared the performance of the Bayes algorithms against a number of tree-based algorithms. The comparison shows the superiority of the Decision Tree classification model over the Bayes Network classification models (Sapaat et al., 2011).

Mishra et al. (2016) applied several classifiers to predict the employability of students and to build an employability model based on suitable classifiers. The authors used different data mining classification methods, such as Bayesian methods, Multilayer Perceptrons, Sequential Minimal Optimization (SMO), ensemble methods and decision trees, and compared the classifiers to find the best one. The comparative study shows that J48 (a pruned C4.5 decision tree) is the most suitable for employability prediction (Mishra, Kumar and Gupta, 2016b).

Khadilkar and Joshi (2017) proposed a predictive technique for employability using machine learning. To screen resumes, they used text mining and appropriate weighting. They used several classifiers, such as decision tree, K-NN, Naïve Bayes and Random Forest, for employability prediction. Naïve Bayes achieved the highest accuracy for the prediction of employability.

Rahman, Tan and Lim (2017) used supervised and unsupervised learning in data mining to predict the employment of fresh graduates. These techniques were applied to feature selection and to determining the best model for predicting the employment status of fresh graduates, either employed or unemployed. The algorithms in supervised and unsupervised learning, K-Nearest Neighbor, Naive Bayes, Decision Tree, Neural Network, Logistic Regression and Support Vector Machines, were compared to find which one achieved the best accuracy. They found that K-Nearest Neighbor achieved the highest accuracy (Rahman, Tan and Lim, 2017).
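Comparisons of this kind usually score every classifier on the same held-out records and rank the classifiers by accuracy. The sketch below illustrates that procedure with a small K-Nearest Neighbor classifier and a majority-class baseline. The data are invented for illustration (grade average and months of internship per graduate) and do not come from any cited study; the function names are likewise hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training records closest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def majority_predict(train_x, train_y, query):
    """Baseline that ignores the query and returns the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def accuracy(classifier, train_x, train_y, test_x, test_y):
    """Fraction of held-out records whose predicted label matches the actual label."""
    hits = sum(classifier(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def compare(classifiers, train_x, train_y, test_x, test_y):
    """Score each named classifier on the same train/test split."""
    return {name: accuracy(fn, train_x, train_y, test_x, test_y)
            for name, fn in classifiers.items()}


# Invented records: (grade average, months of internship).
TRAIN_X = [(3.6, 6), (3.4, 4), (3.8, 5), (2.1, 0), (2.4, 1), (2.0, 0), (3.5, 5)]
TRAIN_Y = ["employed", "employed", "employed", "unemployed",
           "unemployed", "unemployed", "employed"]
TEST_X = [(3.7, 6), (2.2, 1), (2.3, 0), (3.3, 4)]
TEST_Y = ["employed", "unemployed", "unemployed", "employed"]

CLASSIFIERS = {"k-NN (k=3)": knn_predict, "majority baseline": majority_predict}

if __name__ == "__main__":
    print(compare(CLASSIFIERS, TRAIN_X, TRAIN_Y, TEST_X, TEST_Y))
```

The features are left unscaled here for brevity. On real data, distance-based classifiers such as K-Nearest Neighbor generally need normalised features, and a single hold-out split is commonly replaced by cross-validation.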

Othman, Shan, Yusoff and Kee (2018) proposed a model that uses data mining techniques to discover the most important features that affect graduates’ employability. They collected seven years of data (from 2011 to 2017) through the tracer study of Malaysia’s Ministry of Education. The authors applied three classification algorithms, Decision Tree, Support Vector Machines and Artificial Neural Networks, to develop the classification model, and then compared them to identify the best-performing one. According to the authors, the J48 decision tree algorithm achieved higher accuracy than the other algorithms, with a classification accuracy of 66.0651%, which rose to 66.1824% after parameter tuning. In their work, they discovered seven variables affecting graduate employability: age, faculty, field of study, co-curriculum, marital status, industrial internship and English skill. Of these variables, the attributes age, industrial internship and faculty hold the information that influences the employability status (Othman et al., 2018).

Kumar and Babu (2019) applied supervised Machine Learning algorithms to analyse data collected from educational institutions to predict the employability of current students (not graduates). They collected the data from 500 students of several Engineering colleges in Hyderabad and used Supervised Machine Learning algorithms such as Decision Tree, Support Vector Machine, Gaussian Naïve Bayes and K-Nearest Neighbor to build an employability prediction model of students and to determine the factors affecting their employability. They found that Decision Tree and Support Vector Machine outperformed the Gaussian Naïve Bayes and K-Nearest Neighbor, by predicting the employability of the students with 98% accuracy. Also, they found that factors such as communication skills, aptitude and reasoning skills, mentor, family income status, and the quality of teaching in college affect the employability of students (Kumar and Babu, 2019).
