• Development and training of the machine learning tool, and testing of the model; and

• Design of preventive strategies to help students in the learning process, with the aim of improving their performance.

2. Methodology and development

Action research is carried out, and the problem may be cast as a binary classification task, where the two prediction labels are students who are likely to fail the course and students who are likely to pass (Gerritsen, 2017). The final purpose is to help the students whose predictions are negative: educational materials will be employed to reinforce the learning of the contents, so that these students obtain more tools to learn the appropriate physics models and pass the course. The variables studied in this analysis are the marks of three self-tests, class attendance and the marks of face-to-face exams. The techniques used for analysis in this research process are network analysis and exam analysis. Two samples are considered. A primary sample, the training data, is required as input for model training; it consists of 200 students from previous iterations of Physics I courses. The secondary sample, used to predict the result of the second exam, consists of 46 students. Both are convenience samples.
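As an illustration, a single student record containing the variables described above could look as follows; the field names and values are hypothetical and are not taken from the actual data set.

    # One hypothetical student record with the variables used as features and label.
    student_record = {
        "self_test_1": 7.0,
        "self_test_2": 5.5,
        "self_test_3": 8.0,
        "attendance": 0.85,        # fraction of face-to-face classes attended
        "first_exam": 6.0,
        "second_exam_passed": 1,   # label: 1 = pass, 0 = fail
    }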

2.1 Learning analytics

There are two common terms associated with large educational data sets: learning analytics (LA) and educational data mining (EDM). LA refers to the measurement, collection and analysis of information about students and their learning context. EDM, on the other hand, is concerned with developing methods for exploring educational data sets and using them to understand students and their learning environment. Both share the goal of improving education, but they have some differences. EDM focuses on automated discovery, whereas LA focuses on informing human judgement. Moreover, EDM research more commonly reduces phenomena to components and the relationships between them, while LA researchers attempt to understand systems as wholes (Siemens & Baker, 2012). Therefore, this work is conducted in the field of learning analytics, because it tries to predict students’ achievements, based on the large data sets of students’ experiences and logs available in the database of the School’s Learning Management System (LMS), in order to change students’ performance (Morabito, 2015).

A model consisting of the above-mentioned tensor of variables (features) and a label with two categories is chosen. In order to train the model, data from the previous three iterations of Physics I courses are loaded. The training data is stored in a .csv file and is transformed into a feature tensor and a label tensor using the Keras API. Finally, to make the forecast, a tensor with the features of the current course, without labels, is entered. The data set is divided into two subsets: the test set and the training set, the latter being used to train the Deep Neural Network (DNN) that forms the model.
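A minimal sketch of this loading and splitting step is shown below. The file name, column names and the 80/20 split proportion are assumptions, since the paper does not state them; pandas and scikit-learn are used here for brevity, although the same can be achieved with the TensorFlow/Keras data utilities.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical file and column names; the real .csv schema is not given in the paper.
    data = pd.read_csv("physics1_previous_courses.csv")
    feature_cols = ["self_test_1", "self_test_2", "self_test_3", "attendance", "first_exam"]

    features = data[feature_cols].values
    labels = data["passed_second_exam"].values   # 1 = pass, 0 = fail

    # Random, stratified split into training and test subsets (80/20 is an assumption).
    train_features, test_features, train_labels, test_labels = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=0)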

There are two numbers that characterize the performance of the model: loss and accuracy. The loss value is not a percentage; it measures how well the model fits the training set and the test set (the lower the loss, the better the fit). Accuracy, on the other hand, is the proportion of correct predictions. In order to minimize loss and maximize accuracy, the best number of epochs, i.e. the number of times the training loop passes over the data set, is determined. Counter-intuitively, training a model for longer does not guarantee a better result: two problems may appear, overfitting and underfitting. The sample is a subset of the data used to estimate the parameters that determine the model’s accuracy and loss (Gerritsen, 2017). The problem is therefore addressed in the following five steps:

• Import and analysis of the dataset.

• Selection of the type of model.

• Model training.

• Evaluation (test) of the effectiveness of the model.

• Use of the trained model to make predictions.

2.2 Keras

Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow. TensorFlow is used to develop the students’ classification problem. Its main function is to use neural networks to detect and decipher patterns and correlations, in a way analogous to human learning and reasoning. Keras supports the creation of easy and fast prototypes (TensorFlow code) and of convolutional and recurrent networks, as well as combinations of both. The implementation of Keras and its different stages are shown in the figure extracted from the TensorFlow page (https://www.tensorflow.org/get_started/get_started_for_beginners).

Figure 1: Keras’ different stages

2.3 Code implementation

Machine learning is the process of training a model constituted of features and labels to make it able to estimate the values of future labels based on current features. The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data in order to train effectively.

For the implementation of the model, the Keras API is used, so Python and all the dependencies necessary for TensorFlow to run with Keras must be installed. The implementation requires information about students from previous years, which defines the features. When developing the code, the first thing to do is to activate eager execution, so that TensorFlow evaluates operations immediately. During the process, an analysis of the students’ different marks and of their attendance index in face-to-face classes is carried out. The objective is to use the marks of the self-tests and of the first exam, in conjunction with attendance, to estimate the mark of the second exam. As the model does not use continuous variables for labels, two categories are established: 1 = pass, 0 = fail.
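A sketch of these first steps is shown below. With TensorFlow 1.x, eager execution must be enabled explicitly (it is the default behaviour from TensorFlow 2.0 onwards), and the passing threshold used to derive the binary label is an assumption made for illustration.

    import tensorflow as tf

    # Enable eager execution (needed in TensorFlow 1.x; it is the default in 2.x).
    tf.enable_eager_execution()

    # Derive the categorical label from the second-exam mark.
    # A passing threshold of 4 out of 10 is assumed here for illustration.
    data["passed_second_exam"] = (data["second_exam"] >= 4).astype(int)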

The Bucketize function is applied to the marks of the first partial exam to form a tensor that divides the marks into three categories. Data binning or bucketing is a data pre-processing technique used to reduce the effects of minor observation errors: the original data values which fall within a given interval, a bin, are replaced by a value representative of that interval. This is a form of quantization.
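The paper does not show the exact call used, but an equivalent binning can be sketched as follows; the bin boundaries of 4 and 7 (out of 10) are assumptions made for illustration.

    import numpy as np

    # Hypothetical boundaries splitting the first-exam marks into three bins
    # (for example: below 4, between 4 and 7, and above 7).
    boundaries = [4, 7]
    first_exam_bucket = np.digitize(data["first_exam"], bins=boundaries)
    # np.digitize returns 0, 1 or 2 depending on the interval each mark falls into.

Within TensorFlow itself, a similar result can be obtained with tf.feature_column.bucketized_column.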

In the pre-processing step, the data are handled as Python dictionaries. Another measure taken to avoid overfitting is to split students who pass and students who fail the exam randomly and in equal proportion between the subsets. The data set formed with labels and features is then used to train a sequential model with a hidden network of ten nodes and two levels. The input has the shape of five features, which are continuous variables, and the output has two levels of categorical labels.
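A minimal sketch of such a network with the tf.keras Sequential API is given below, interpreting the architecture as two hidden layers of ten nodes each; the activation functions, optimizer and loss are assumptions, since the paper does not specify them.

    import tensorflow as tf

    model = tf.keras.Sequential([
        # Two hidden layers of ten nodes each; ReLU is an assumed activation.
        tf.keras.layers.Dense(10, activation="relu", input_shape=(5,)),
        tf.keras.layers.Dense(10, activation="relu"),
        # Two output classes: 0 = fail, 1 = pass.
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])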

The number of epochs is a hyperparameter that can be tuned. Choosing the right number of epochs usually requires both experience and experimentation; if the number is not chosen correctly, overfitting or underfitting occurs. Overfitting occurs when a model learns the training data too well and cannot generalize. Underfitting, the opposite of overfitting, can also happen with supervised learning: in that case, the model is unable to make accurate predictions on either the training data or new data.
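One practical way to search for a suitable number of epochs, not necessarily the one used by the authors, is to monitor the loss on the test (validation) set and stop training when it no longer improves, for example with the standard Keras EarlyStopping callback.

    # Stop training once the validation loss has not improved for 20 consecutive
    # epochs and restore the weights from the best epoch seen so far.
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                                  restore_best_weights=True)

    history = model.fit(train_features, train_labels,
                        validation_data=(test_features, test_labels),
                        epochs=1000, callbacks=[early_stop])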

The test values are introduced in the same way as the training values, with the same two categories (0 = fail, 1 = pass). Once the model has been trained with the training and test sets, the five features are introduced in an unlabelled tensor, formed by the marks of the students of the current course. The stack of programs used to perform the analysis is summarized in the following figure, extracted directly from the TensorFlow page (https://www.tensorflow.org/get_started/premade_estimators).

Figure 2: TensorFlow table
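The prediction step described above, assuming current_features holds the five features of the current cohort in the same order used during training, can be sketched as follows.

    # current_features: array of shape (n_students, 5) with the marks and attendance
    # of the students of the current course (no labels).
    probabilities = model.predict(current_features)   # shape (n_students, 2)
    predicted_class = probabilities.argmax(axis=1)    # 0 = fail, 1 = pass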

3. Results

Students’ data from previous course periods were taken and used to build a dataset consisting of two subsets: the training set and the test set. The pre-process function was first used to build the tensor that was fed to the DNN model. Actions were taken to avoid overfitting and underfitting and to minimize the loss.

Finally, an unlabelled tensor was introduced with the current students’ marks. The model detects the unlabelled data and assigns to it a label based on what was learned during the training process. In this way, the pass/fail forecasts for the students in the second exam were determined. With this input, extra activities will be prepared for the students predicted to fail, such as self-tests with explanations of the errors, videos explaining important concepts, reading materials, or additional video-streamed practice classes.

Once the training and test sets have been selected, the accuracy and the number of epochs are determined. The prediction is made with a percentage of error lower than 10%, because the accuracy is about 90%; that is, the probability of being right when predicting whether a student passes or fails the exam set as the label. The best number of epochs is 752. Both parameters are shown in Graphic 1. Moreover, as already mentioned, it can be seen that training a model for longer does not guarantee a better result.

Graphic 1: Accuracy as a function of the epochs
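A curve such as Graphic 1 can be produced directly from the Keras training history. A minimal sketch is shown below; the history key may be “acc” or “accuracy” depending on the TensorFlow version.

    import matplotlib.pyplot as plt

    # Plot training and test accuracy per epoch from the history returned by model.fit().
    plt.plot(history.history["acc"], label="training accuracy")
    plt.plot(history.history["val_acc"], label="test accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()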

4. Conclusions

A model able to predict students’ future marks has been defined. Machine learning resembles human learning: first, there must be a question to be answered and some background knowledge about the correlation between the variables that are going to be defined as features and labels. The second step involves the pre-processing necessary to arrange the data in the way the API needs to treat it; the type of result, which here is a categorical label, must also be provided in order to shape the data accordingly. Finally, the chosen model analyses the data, splitting it into training and test datasets, which are formed taking different parameters into account, and draws conclusions in the form of results for the unlabelled set.

The analysis of the results of the second exams was not as expected. Taking into account the results obtained during the training process, an accuracy of about 93% was expected, but a value of 72% was actually obtained. It is therefore necessary to incorporate more variables into the model (previous training, country of origin, start date of studies, whether students work and for how many hours). In order to obtain those values, data from sources beyond the e-learning platform need to be incorporated. Thus, a system based on the Python Bottle framework and MongoDB is under construction, in order to gather the information and create a .csv file to be used, ultimately, with TensorFlow. The reason underlying the choice of MongoDB is the possibility of performing a specific kind of update called an upsert. This operation is useful when importing data from external sources: it updates existing documents when they match the query, and otherwise inserts new documents into the collection.
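A minimal sketch of such an upsert with pymongo is shown below; the connection string, the database and collection names, and the structure of the record are all assumptions made for illustration.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local MongoDB instance
    students = client["forecasting"]["students"]        # hypothetical database and collection

    record = {"student_id": 1234, "works": True, "hours_per_week": 20}  # hypothetical document

    # Upsert: update the document for this student if it already exists, otherwise insert it.
    students.update_one({"student_id": record["student_id"]},
                        {"$set": record},
                        upsert=True)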

The model is limited in the sense that it only evaluates students within the subject itself. Using a global model (Hartman et al., 2016) would be interesting, because expected results could then be evaluated across subjects, using students’ results in preceding subjects.

References

Barberà, E. and Badia, A. (2004) Educar con aulas virtuales, Antonio Machado libros S.A., Madrid.

Cabero, J. and Barroso, J. (2015) Nuevos retos en tecnología educativa, Síntesis, Madrid.

Forokhmehr, M. and Fatemi, S. (2016) “Implementing Machine Learning on a Big Data Engine for e-Learning”, Paper read at the 15th European Conference on e-Learning ECEL 2016, Prague, Czech Republic, October.

Gerritsen, L. (2017) Predicting Student Performance with Neural Networks (Master’s thesis), Tilburg University, The Netherlands.

Hartman, D., Petkovová, L., Hybsová, A., Cadi, J. and Nový, J. (2016) “Prediction Model for Success of Students at University Level”, Paper read at the 15th European Conference on e-Learning ECEL 2016, Prague, Czech Republic, October.

Kotsiantis, S., Pierrakeas, C., Zaharakis, I. and Pintelas, P. (2003) “Efficiency of Machine Learning Techniques in Predicting Students’ Performance in Distance Learning Systems”, Recent Advances in Mechanics and Related Fields, University of Patras, Greece.

Morabito, V. (2015) Big Data Analysis: Strategic and Organizational Impacts, Springer International Publishing, Milan, Italy.

Siemens, G. and Baker, R. (2012) “Learning analytics and educational data mining: towards communication and collaboration”, Paper read at the 2nd International Conference on Learning Analytics and Knowledge, Vancouver, Canada, April.

Xu, J., Han, Y., Marcu, D. and Schaar, M. (2017) “Progressive Prediction of Student Performance in College Programs”, Paper read at the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, California, February.
