Identification of various user groups in university counseling settings using machine learning algorithms

However, the quantitative and qualitative capacity of university counseling services is insufficient to handle growing demand and increasingly difficult cases. To address these problems by improving the efficiency of university counseling center services, this study identified different user groups with machine learning algorithms, based on initial data from the counseling service. First, service group (counseling or clinical treatment) and clinical group (suicide risk and potential dropout) classification was performed with supervised learning algorithms, and the features important for classifying each group were identified.

Using data from the early stages of service, this study established meaningful reference points for university counseling services. After the user groups were classified, ensemble modeling with a stacking classifier improved performance on type 2 errors (false negatives) in the classification results.

INTRODUCTION

As a result, users' symptoms and problems are sometimes misdiagnosed, and some problems that could have been resolved earlier go untreated or even worsen (이영은, 차영은, & 민경화, 2013). However, many counselors report that it is difficult to expand the capacity of counseling centers to meet the increasing demand, because the university administration does not fully recognize the importance of the counseling center (노윤신 & 이성성, 2019). To address these problems, this study identified different user groups with a machine learning algorithm in order to improve the service efficiency of university counseling centers. Specifically, it classifies user groups from counseling service application data using machine learning and aims to provide a reference for clinical decision-making and future service planning.

We classified users into counseling or clinical treatment groups based on the service request forms they submitted, and we identified other latent groups.

RELATED WORKS

This study has the advantage of addressing specific groups of college students, as well as the various groups and characteristics that future services will need to address. However, it relied on self-reported symptom data rather than an actual screening method. In the present study, data collected at the initial stage of actual service delivery were used.

METHOD

Data Introduction

Structure of service application form and feature characteristics

There was no direct service record for classifying the suicide risk group. After service group classification, the same procedure was applied to the suicidal risk group and the dropout risk group. As with suicide risk classification, dropout risk classification performed worse than service group classification.

All models except KNN and LDA classified every dropout-risk user as non-risk. For the dropout risk group, the feature selection method that selected a particularly large number of features barely identified dropout risk correctly.
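A model that labels every dropout-risk user as non-risk can still score well on accuracy, because the risk group is a small minority. The sketch below (plain Python; the toy labels are illustrative, not study data) computes accuracy, recall, and the number of type 2 errors (false negatives) for such an all-majority predictor, assuming the risk group is coded as the positive class 1. The function names are hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the risk class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            counts["tp"] += 1
        elif t != positive and p == positive:
            counts["fp"] += 1
        elif t == positive and p != positive:
            counts["fn"] += 1  # type 2 error: a risk user labeled non-risk
        else:
            counts["tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]

def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # share of risk users caught
    return {"accuracy": accuracy, "recall": recall, "type2_errors": fn}

# Illustrative toy labels: 10 risk users among 100 (not study data).
y_true = [1] * 10 + [0] * 90
all_non_risk = [0] * 100  # a model that predicts the majority class for everyone
print(summarize(y_true, all_non_risk))
# {'accuracy': 0.9, 'recall': 0.0, 'type2_errors': 10}
```

Accuracy alone reports 90% here while every risk user is missed, which is the failure mode that confusion matrix validation exposes.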

Research methods

Data pre-processing
User group classification
Latent user group identification

RESULTS

User service group classification

Service group classification result using all data
Service group classification result after counselor feedback

Based on the results, clinical feedback on the project was discussed, and after several adjustment steps, service group classification was repeated with the same process on the adjusted data. Feature selection methods were applied after classification to select the features important for classifying each service group. As a result, 'visit_purpose_3', which indicates that the purpose of visiting the counseling center was clinical treatment, and 'additional_survey', which indicates that 8-week and 12-week follow-up surveys were recorded, were selected as important features for classifying the counseling and clinical treatment groups.

After the initial classification, a counselor feedback session was held at the UNIST Health Center to evaluate the results of the study and gain insight into its health services. The classification results obtained from the preprocessed raw data were presented to the health center clinicians for feedback. The session revealed that the stated purpose of the visit ('visit_purpose') often changes after the intake interview, and that the earlier classification results were biased by users' explicitly stated preference for counseling or clinical treatment.

In Table 1, 'Variable' is the name of each variable, and a yellow-highlighted cell in the SGL, Lasso, or ElasticNet column indicates that the method selected that variable. Feature selection retained several features from the general satisfaction, suicidal problem, depression and anxiety, sleep problem, various symptoms, and problem area categories. This is consistent with the expectation that severity and user needs vary within each service group.
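Lasso and ElasticNet select a variable when its penalized coefficient is nonzero. The following is a minimal NumPy sketch of that mechanism, assuming a Gaussian (least-squares) elastic net fitted by coordinate descent on standardized features. The study's actual loss (for example, logistic), its penalty strengths, and SGL's group structure are not reproduced here; `elastic_net_select`, `lam`, and `alpha` are hypothetical names.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; exactly zero when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net_select(X, y, lam=0.5, alpha=0.5, n_iter=100):
    """Coordinate-descent elastic net (Gaussian loss) on standardized X.

    Minimizes (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2*||b||^2).
    alpha=1 gives the Lasso. Assumes no constant columns in X.
    Returns the coefficients and the indices of selected (nonzero) features.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    y = y - y.mean()                          # center the response
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()                       # invariant: residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ (residual + X[:, j] * beta[j]) / n
            new_bj = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta, np.flatnonzero(beta)

# Toy data: y depends strongly on feature 0, weakly on feature 1, not on feature 2.
X = np.array([[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]], dtype=float)
y = 2.0 * X[:, 0] + 0.1 * X[:, 1]
beta, selected = elastic_net_select(X, y, lam=0.5, alpha=0.5)
print(selected)  # [0]
```

With this penalty the weakly related feature 1 is dropped; a smaller `lam` keeps it, so the penalty strength controls how many variables each method retains.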

After hyperparameter tuning (Table 2), the model using the features selected by ElasticNet performed best. SVM, MLP, and ExT achieved the best results, with particularly large gains from hyperparameter tuning. In the confusion matrix comparison, ensemble modeling with the features selected by SGL showed the lowest type 2 error when classifying clinical treatment against counseling, that is, the fewest clinical treatment users misclassified as counseling.
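A stacking classifier combines base models through a meta-learner trained on their out-of-fold predictions. The sketch below is a minimal NumPy version under stated assumptions: each base learner is the same hypothetical logistic regression fitted on a different feature subset (standing in for the study's SVM, MLP, ExT, and other models, or for the feature sets chosen by SGL, Lasso, and ElasticNet), and the meta-learner is also logistic. Function names, fold count, and learning rates are illustrative, not the study's configuration.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient-descent logistic regression; w[0] is the bias."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def stack_fit(X, y, feature_subsets, k=5, seed=0):
    """Fit base learners (one per feature subset) and a logistic meta-learner.

    The meta-learner is trained on out-of-fold base predictions, so it never
    sees a base model's prediction for a row that model was trained on.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    meta_X = np.zeros((len(y), len(feature_subsets)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for m, cols in enumerate(feature_subsets):
            w = fit_logistic(X[train_idx][:, cols], y[train_idx])
            meta_X[test_idx, m] = predict_proba(w, X[test_idx][:, cols])
    # Refit each base learner on all rows for use at prediction time.
    base_ws = [fit_logistic(X[:, cols], y) for cols in feature_subsets]
    meta_w = fit_logistic(meta_X, y)
    return base_ws, meta_w

def stack_predict(model, X, feature_subsets, threshold=0.5):
    """Predict 0/1 labels from the meta-learner's probability."""
    base_ws, meta_w = model
    meta_X = np.column_stack(
        [predict_proba(w, X[:, cols]) for w, cols in zip(base_ws, feature_subsets)]
    )
    return (predict_proba(meta_w, meta_X) >= threshold).astype(int)

# Toy data (not study data): the label depends on features 0 and 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
subsets = [[0, 1], [2, 3], [0, 2]]  # e.g. features chosen by different selectors
model = stack_fit(X, y, subsets)
pred = stack_predict(model, X, subsets)
```

Lowering `threshold` below 0.5 labels more users as positive (for example, clinical treatment), which reduces type 2 errors at the cost of more type 1 errors.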

Additional clinical group classification

Suicidal risk group
Dropout risk group

Before hyperparameter tuning, cross-validation performance was low compared with user service group classification. In Figure 9, the features with the highest ExT importance for classifying the suicidal risk group were 'Poor appetite or overeating (phq_5)', 'Total sum of urge symptoms (sum_urge)', 'Suicidal/self-harming thoughts due to depression (phq_9)', 'Past experience of professional treatment (q9_prof_help)', 'Symptom of anger (anger)', and 'Suicidal thoughts (q13_suid_thou_a)'. After hyperparameter tuning (Table 6), the model using the features selected by ElasticNet achieved the best performance.

However, such drastic improvements may reflect overfitting to the target. Although the performance of each model increased substantially after ensemble modeling, confusion matrix validation showed that some models classified all users into the non-suicidal-risk group, which made up the majority of users. In addition, SGL selected many more features than the other methods, and these features were associated with ability problems and relationship conflict.

Features related to sleep and relationship problems were important for classifying the dropout group. The key AdaBoost features for classifying the dropout risk group were 'Total sum of physiological symptoms (sum_physio)', 'Daytime functional disorder (fd_days_lost)', 'Poor appetite or overeating due to depression (phq_5)', 'Problems in personal sleep quality (psqi_c1_personal_slq)', 'Relational satisfaction (q2_relation_sat)', 'Taking more than 30 minutes to fall asleep (psqi_10a)', and 'Problems fitting in with club activities (club_activity)'. After hyperparameter tuning (Table 10), the model using the Lasso-selected features performed best.
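Hyperparameter tuning of this kind can be pictured as a cross-validated grid search. A minimal NumPy sketch follows, using KNN (one of the models named in this study) with its neighbor count `k` as the tuned hyperparameter. The grid, the fold count, and the choice of balanced accuracy as the score are assumptions for illustration, not the study's settings; `knn_predict` and `cv_grid_search` are hypothetical names.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Label each test row by majority vote of its k nearest training rows."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def cv_grid_search(X, y, ks, n_folds=5, seed=0):
    """Return the k with the best mean held-out balanced accuracy, plus all scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = {}
    for k in ks:
        fold_scores = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
            pos, neg = y[test_idx] == 1, y[test_idx] == 0
            if pos.any() and neg.any():
                # Balanced accuracy: mean of per-class recall, so predicting
                # everyone as the majority class scores only 0.5.
                fold_scores.append((pred[pos].mean() + (1 - pred[neg]).mean()) / 2)
        scores[k] = float(np.mean(fold_scores))
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Toy data (not study data): two overlapping clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
best_k, scores = cv_grid_search(X, y, ks=[1, 3, 5])
```

Scoring folds with a class-balanced metric rather than plain accuracy keeps the search from favoring settings that place every user in the majority group.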

The overall performance of all models increased substantially, as it did with ensemble modeling. A comparison of the feature selection methods suggests that selecting an appropriate number of variables is important for dropout group classification. As shown in Table 12, the confusion matrices indicate that some models classified every dropout-risk user as non-risk.

Improvements in ensemble modeling and confusion matrix

User service group
Suicidal risk group
Dropout group

User latent group identification

Latent Class Analysis
Features validation after latent class analysis
Analysis of individual user latent group characteristics

The first group examined was Group 3, which showed higher levels of problems than the other groups and was labeled the high-risk group. Group 4 was also suspected to be a potential dropout group because of the problems and symptoms its members reported. The low/moderate-risk group and the high-risk group had similar numbers of users, and the sleep problem group had only 4 users in total.
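Latent class analysis assumes that each user belongs to one unobserved class and that, within a class, the binary survey items are independent. A minimal NumPy sketch of fitting such a model by expectation-maximization follows, assuming 0/1-coded items. The thesis does not state its LCA software, estimation settings, or item coding here, so `lca_em` and its parameters are hypothetical. In this sketch each user is assigned to the class with the highest posterior probability, one common way to obtain group sizes like those reported above.

```python
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0, eps=1e-6):
    """Latent class analysis for 0/1 items, fitted by expectation-maximization.

    Returns class weights `pi`, per-class item probabilities `theta`
    (theta[c, j] = P(item j = 1 | class c)), and posteriors `resp`.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, n_items))
    for _ in range(n_iter):
        # E-step: log P(class) + log P(responses | class), items independent within a class.
        log_post = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)  # avoid underflow in exp
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: class sizes and within-class endorsement rates.
        nk = resp.sum(axis=0) + 1e-12  # guard against an empty class
        pi = nk / n_users
        theta = np.clip((resp.T @ X) / nk[:, None], eps, 1 - eps)
    return pi, theta, resp

# Toy 0/1 responses: two response patterns, 30 users each (not study data).
X = np.array([[1, 1, 1, 0, 0, 0]] * 30 + [[0, 0, 0, 1, 1, 1]] * 30, dtype=float)
pi, theta, resp = lca_em(X, n_classes=2)
groups = resp.argmax(axis=1)              # modal class assignment per user
sizes = np.bincount(groups, minlength=2)  # number of users per latent group
```

Each row of `theta` is a class profile (the estimated share of that class endorsing each item), which is the kind of within-group percentage used to characterize the latent groups.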

The proportions of users at moderate/high severity of depression and anxiety were 46% and about 34%, respectively. Thirty percent of users had received professional help before, and 37% reported difficulty focusing.

2) Moderate risk group (Group 4)

About 59% of users were in the clinical treatment group, and relatively high proportions fell in the high-risk categories for depression (about 75%) and anxiety (about 53%).

Difficulty focusing (53.7%) was similar to the high-risk group, and memory impairment (36.14%) was more common than in the high-risk group. Various symptoms, such as difficulty in daily life (17.7%), alcohol abuse (6.72%), and feelings of unreality (14.29%), were also reported more often than in other groups. Problems in senior/junior relationships (21%) were reported at the highest rate of all groups.

3) Lower/medium risk group (Group 1)

Cognitive and behavioral problems, including difficulty concentrating, impaired memory, and problems in daily life, were more common than in many other groups. Forty-five percent of users suffered from insomnia and reported taking sleep-related medications, a pattern otherwise found only in the high-risk group and the sleep problem group.

DISCUSSION

Further research

The first direction is to develop a refined data management system that better reflects user characteristics, and to use it to improve the performance of machine learning models. In particular, a model that can test the effect of a given feature on user group classification would make it possible to demonstrate the efficiency of the data collection system and to build the most accurate machine learning model. Second, with the improved data collection system and model, service effectiveness could be analyzed by user group using the entire service record.

Service effectiveness analysis can reveal not only the benefits of a service but also the side effects of inappropriate service delivery. Service plans that take user group classification into account will help increase the effectiveness of university counseling center services. In addition, testing refined machine learning models and classification methods at other university counseling centers would reveal the unique characteristics of each school.

Such trials would enable counseling centers to readily identify the characteristics of their visitors and to provide services tailored to different user needs. Applying these methods to actual services is expected to help university counseling centers manage their users more effectively, especially smaller counseling centers without affiliated psychiatric clinics.

A survey of the current state of suicide prevention and intervention at university counseling centers.

A study on improving the intake interview system at university counseling centers.

Having Professor 정두영 as my advisor was one of the best things I did after entering UNIST. I would also like to thank everyone at the Healthcare Analytics and Interface Laboratory.

Second, I would like to thank my friends, Joontae Ki, Changjin Kim, Sangsuk Lee, Sungwon Lee, Chanwoo Jung, Younghwan Jun, who always stood by me and gave me the strength to continue. Finally, I would like to thank my parents and sister for their unconditional support and love that allowed me to continue with my degree.