Prediction on Digital Addiction among Undergraduate Students during Pandemic using Machine Learning

The aim of this final year project is to conduct detailed research on how digital addiction affects undergraduate students during the pandemic, and to produce an analysis using technical skills. The research showed that digital addiction is not limited to playing online video games; it can involve any kind of online activity. The study also showed that digital addiction grew throughout the pandemic and affected students' personal health.

Recent cases have shown that university students have experienced various types of mental health issues due to movement restrictions and long hours spent on online activities. The project can also help government bodies take early action on students' welfare.

First of all, I would like to express my gratitude to my university, Universiti Teknologi Petronas, which gave me the opportunity to complete my Final Year Project and provided sufficient resources for future research on the project.

Aziz, who fully supported me throughout my project and guided me very patiently in all assigned tasks. Finally, I would like to thank my parents, who have always supported me in completing the project, including financially.

Introduction
Background

Digital Addiction During Pandemic
Machine Learning Methods and Modelling
Undergraduate Students in Malaysia

Problem Statement
Objectives
Scope of Study

According to Rahayu et al. (2020), digital addiction can have negative impacts on a person's life. The purpose of the project is to analyse digital addiction among undergraduate students in Malaysia, at both private and public universities, during the pandemic using machine learning. The project will produce a dashboard showing whether digital addiction became worse or better during the pandemic.

The analytics can be used by any party involved in healthcare to develop approaches that help address digital addiction. One objective is to predict the number of university students aged 17 to 25 who have a digital addiction. The focus of this project is to study how digital addiction has increased among university students, which can lead to mental and physical health problems.

The scope of this project is to create an analysis of digital addiction by: a) conducting research based on a digital addiction model provided by experts. The target users of the project are bachelor's students aged 17-25, whether or not they have been exposed to digital addiction.

INTRODUCTION
Digital addiction
Digital addiction parameters
Digital addiction effects on undergraduate students
Machine learning applications in digital addictions
K-Means modelling for predicting addicted students

The addiction components, defined by medical experts, include tolerance, salience, mood swings, harm, relapse, conflict, withdrawal, physical health, and loss of control (Aziz et al., 2021). Based on these components, conclusions are drawn from how severely undergraduate students experience each component while performing their online activities. Digital addicts tend to exhibit at least 5 of these components, as they spend hours on screen doing their online activities.

According to Aziz et al. (2021), the experience of streaming, satisfaction and engagement in computer games influences player attitudes and actual use of computer games, but when it comes to evaluating actual gameplay, enjoyment and social influence become compelling evidence. Moreover, these effects are easier to identify because they appear in the physical condition of undergraduate students, where they can be clearly seen without a diagnosis by doctors or experts. Digital addiction mainly affects a person's health, including mental health, psychology, physical health and anxiety (Aziz et al., 2021).

Mental health issues can be wide ranging, such as depression, personality disorders, psychotic disorders and more, and different online activities can have different types of negative impact on a person. These negative effects can affect them in three areas, namely time management, social life and emotions (Latif et al., 2017). An individual who is addicted to online activities usually spends hours on them regardless of the time, whether during study, meals or breaks, as their main purpose is to have fun.

This group of people had no choice but to limit their online activities, or else the worst case could happen. It has been noted that machine learning can be applied to a wide range of problems, including more complex ones once the problem has been narrowed down. Additionally, there are various methods and models that can be implemented in machine learning and that are suitable for data-related projects.

I use machine learning in this project because Mak et al. (2019) mention the use of machine learning in addiction studies. According to Klochko (n.d.), the basic concept is to use different clustering approaches to conduct an empirical comparative study and discover which methods provide the best clustering of data for a given problem. According to Shi et al. (2010), clustering is a technique for logically classifying raw data and finding hidden patterns in data sets. Therefore, K-means modelling is suitable for the case study, as it groups the students by level of addiction according to the given parameters.

Figure 1 Digital addiction relationship diagram

Methodology
Project activities
GANTT CHART
Tools and Software

Planning: Identify the problem statement and project goals by narrowing the scope. At this stage, the project plan and the work to be done should be clear, and the purpose of the project should be defined. Design: Plan how to build a product that achieves the goals and solves the project's problem.

The project will require several types of scoring methods to ensure the model is accurate. Release: The roll-out stage after initial development, where the project is put into production and actually runs. CRISP-DM is used because the project requires machine learning methods and modelling, and it is an appropriate methodology for training on the dataset.

In this process, it is necessary to define project objectives related to the primary goal, which is to create a Power BI dashboard on digital addiction during the pandemic. In addition, creating and distributing a survey to collect a data set is part of the business understanding process. The activities for this phase are to collect all the data from the survey responses and ensure that they follow the data formats.

In addition, the quality of the survey data should be assessed to determine whether it can be used or should be removed. Criteria to consider are data mining objectives, data quality, and technical limitations of the data. As shown in Figure 6 below, removing columns with null values is part of data preparation.
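
The null-value clean-up described above could look roughly like the following sketch. It assumes the survey responses are loaded into a pandas DataFrame; the column names and values are hypothetical and are not taken from the project's survey. It also assumes that only columns that are entirely empty are dropped, and that rows with any remaining missing answer are discarded.

```python
import pandas as pd

# Hypothetical survey responses; column names and values are illustrative only.
responses = pd.DataFrame({
    "age": [19, 21, 23, 20],
    "salience": [4, 2, 5, 3],
    "tolerance": [3, None, 4, 2],          # one missing answer
    "comments": [None, None, None, None],  # column nobody filled in
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are entirely null, then rows with any remaining null."""
    cleaned = df.dropna(axis="columns", how="all")
    cleaned = cleaned.dropna(axis="index", how="any")
    return cleaned.reset_index(drop=True)


prepared = prepare(responses)
print(prepared)
```

Whether a partially empty column should be dropped or kept is a judgment call that depends on how many responses it loses; the thresholds here are an assumption.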

Other than that, it will require planning the monitoring and maintenance of the important issues of data mining. The figure shows the complete schedule for FYP 1 of the project from week 1 to week 24. The tools and software used in the project development are Anaconda, Jupyter Notebook and Power BI.

Dataset
K-Means Modelling
Clustering Method

Silhouette Score
Calinski-Harabasz Score
Elbow plot graph
Cluster each of the students based on the cluster score

Data Visualization
Prediction on Digital Addiction

Low cluster of addiction (Cluster = 0)
Intermediate cluster of addiction (Cluster = 1)
High cluster of addiction (Cluster = 2)

The next step is to implement the k-means algorithm using scikit-learn with an initial set size of 12. The range of the silhouette score is [-1, 1]. When dealing with higher dimensions, the silhouette result comes in handy for validating the clustering algorithm's operation, as no other type of visualization can be used to validate clustering when dimensions exceed three. The Calinski-Harabasz index, also known as the variance ratio criterion, is the ratio of the sum of the between-cluster dispersion to the within-cluster dispersion for all clusters; the higher the score, the better the performance.
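
As a minimal sketch of how the two scores can be compared across cluster sizes, the following uses scikit-learn on synthetic data. The data, the nine feature columns and the range of k from 2 to 12 are assumptions made for illustration; in particular, reading the "set size of 12" as an upper bound on k is a guess, and none of the numbers below are the project's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Synthetic stand-in for the survey features: three groups, nine columns.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 9))
               for c in (-10.0, 0.0, 10.0)])

scores = {}
for k in range(2, 13):  # both scores need at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = (
        silhouette_score(X, labels),         # in [-1, 1], higher is better
        calinski_harabasz_score(X, labels),  # variance ratio, higher is better
    )

# Pick the k with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k][0])

for k, (sil, ch) in scores.items():
    print(f"k={k:2d}  silhouette={sil:.3f}  calinski_harabasz={ch:.1f}")
print("best k by silhouette:", best_k)
```

On real survey data the two scores do not always agree, so it is common to read them together with an elbow plot before fixing k.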

From the table above, it can be concluded that cluster size 3 has the highest score, 8.1214806. An elbow plot indicates the value of k at which the distance between a cluster's mean and the data points in that cluster stops decreasing sharply. Distortion is calculated as the mean of the squared Euclidean distances from each point to the centroid of its cluster.
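
The distortion behind an elbow plot can be sketched as follows. The data is synthetic and the helper name `distortion` is hypothetical; the code only illustrates the definition given above (mean squared Euclidean distance to the assigned centroid) and does not reproduce the project's figure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the survey features: three groups, nine columns.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 9))
               for c in (-10.0, 0.0, 10.0)])


def distortion(X: np.ndarray, k: int) -> float:
    """Mean squared Euclidean distance from each point to its nearest centroid."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    nearest = km.transform(X).min(axis=1)  # distance to the assigned centroid
    return float(np.mean(nearest ** 2))


distortions = {k: distortion(X, k) for k in range(1, 9)}
for k, d in distortions.items():
    print(f"k={k}  distortion={d:.2f}")
```

Plotting `distortions` against k gives the elbow curve; the elbow is the k after which the drop between consecutive values becomes small.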

The figure above shows that the decline in the squared distance starts to slow down at k = 3. Based on the silhouette result and the Calinski-Harabasz index, a cluster size of 3 is appropriate as it fits the data set. Therefore, a group size of 3 is used along with K-means modelling to cluster the student population.

If the points are pixels in an image, the center of the group will be a pixel from that image. The next step is to get the total number of students in each group. The tools can filter by preference to show the total number of students who have a low, intermediate or high level of addiction.
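
Counting the students in each group could be sketched as below. The feature names, the synthetic scores and the rule for naming clusters are all assumptions: k-means label numbers are arbitrary, so this sketch ranks the clusters by their mean centroid score and names them low, intermediate and high in that order. The project may have assigned the names differently.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical component scores for 90 students; not the survey data.
rng = np.random.default_rng(0)
features = ["salience", "tolerance", "withdrawal"]
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(n, 3))
               for c, n in [(1.5, 40), (3.0, 30), (4.5, 20)]])
students = pd.DataFrame(X, columns=features)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
students["cluster"] = km.fit_predict(students[features].to_numpy())

# Rank clusters from lowest to highest mean centroid score, then name them.
order = np.argsort(km.cluster_centers_.mean(axis=1))
level_of = {int(cluster): name
            for cluster, name in zip(order, ["low", "intermediate", "high"])}
students["level"] = students["cluster"].map(level_of)

# Total number of students per addiction level.
counts = students["level"].value_counts()
print(counts)
```

The resulting `level` column is the kind of field a dashboard filter can use to show the number of students at each level.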

An Empirical Comparison of Machine Learning Clustering Methods in the Study of Internet Addiction Between.