TAGR: Telehealth Automatically Generated Recommendations

The Special Problem entitled "TAGR: Telehealth Automatically Generated Recommendations" prepared and submitted by Kryle Marxel E. Molina in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science has been examined and is recommended for acceptance. Accepted and approved as partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science.

Mathematical and Computer Sciences Unit, Department of Physical Sciences and Mathematics.

Telehealth Automatically Generated Recommendations (TAGR) is a module that automatically classifies SMS and email messages with the appropriate labels required by the National Telehealth Center for its National Telehealth Service Program (NTSP). It uses support vector machines for text classification, and these are configurable through a web-based interface.

Background of the Study

Rural doctors may not have all the facilities and resources they need during their deployment. This can be attributed to the low cost of sending text messages: 1 Philippine peso (about $0.02) per message. The NTSP allows rural physicians to send consultations to the system via email or text message [11].

When a doctor sends a message, a nurse receives and acknowledges it by replying to the referring doctor. The message is then forwarded by the nurse to the relevant clinical specialist or domain expert. The domain expert sends a response to the system, which is received by the nurse.

Without telehealth nurses, the system will not be able to accommodate rural doctors.

Statement of the Problem

Due to the nature of the NTSP system and its reliance on email and SMS, it is prone to spam on both platforms. Although the occasional spam message may seem harmless, consistent spam can slow down the telemedicine process by taking up valuable time of the telehealth nurse assigned to the system. With the expansion and rollout of the system to more areas, the volume of incoming messages is expected to increase.

Currently, the system requires referring physicians to tag their messages according to keywords, to help the telehealth nurse classify the request and forward it to the appropriate domain expert. At this point, the remaining task of the telehealth nurse is accurate verification of the messages. However, there are times when messages are mislabeled or not labeled at all, which adds to the telehealth nurse's duties.

In addition, the development of the NTSP aims to eventually do away with the labeling requirement altogether.

Objectives of the Study

Given this, the telehealth nurse should focus as much as possible on relevant messages only and not be burdened by unwanted ones. In email, the label is placed in the subject field, while in SMS it follows a specific format. To ensure accuracy, there must be a way to help the telehealth nurse sort the messages.

View Filtered Messages (i) Mark Message as Spam 2. a) Act as Telehealth Nurse (b) Configure Spam Blocker.

Significance of the Project

Scope and Limitations

Assumptions

Review of Related Literature

Among others, the National Telehealth Center of the University of the Philippines Manila produced the Community Health Information Tracking System (CHITS), the e-learning project for health, and the SMS telemedicine project [19]. One of the most popular approaches to solving multi-label problems is binary relevance (BR) [24]. In BR, the classification problem is decomposed into multiple binary classification problems, one per label, and the final result is the union of the labels assigned by these binary classifiers.
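The binary relevance idea can be sketched in a few lines of Python. This is only an illustration: the label names and the keyword-based binary classifiers below are hypothetical stand-ins, and an actual system would train one model per label instead.

```python
from typing import Callable, Dict, Set


def make_keyword_classifier(keywords: Set[str]) -> Callable[[str], bool]:
    """Build a toy binary classifier that fires when any keyword is present.

    Hypothetical stand-in for a trained per-label model.
    """
    def classify(message: str) -> bool:
        tokens = set(message.lower().split())
        return bool(tokens & keywords)
    return classify


# One binary classifier per label (labels and keywords are illustrative only).
BINARY_CLASSIFIERS: Dict[str, Callable[[str], bool]] = {
    "pediatrics": make_keyword_classifier({"child", "infant"}),
    "dermatology": make_keyword_classifier({"rash", "skin"}),
    "cardiology": make_keyword_classifier({"chest", "heart"}),
}


def binary_relevance_predict(message: str) -> Set[str]:
    """Return the union of all labels whose binary classifier says yes."""
    return {label for label, clf in BINARY_CLASSIFIERS.items() if clf(message)}


labels = binary_relevance_predict("Infant with skin rash on both arms")
print(sorted(labels))
```

Because each label is decided independently, a message can receive several labels, one label, or none at all.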

It was able to achieve as much as 22% higher than previous state-of-the-art systems [31]. In the same study, it was found that classification accuracy is a function of the number of tests and the frequent terms in them. Zhenfei used SVM to recognize the name of the specified type from the collection of biomedical texts, with a precision of 0.84 and recall of 0.80 [42].

The National Telehealth Center of the University of the Philippines Manila (NThC) was established in 1998 and is the main research unit at the University of the Philippines with a focus on making ICT innovations to improve healthcare [46].

Telemedicine

In line with its vision, the NThC is committed to empowering people to use available technologies in cost-effective ways to improve healthcare, despite geographic barriers [7]. Some of its projects include the RxBox, a multi-component device that provides access to life-saving healthcare services for isolated and underserved communities, and the National Telehealth Service Program, which enables conversations between general practitioners in geographically isolated and underserved areas and specialists from the Philippine General Hospital using mobile and Internet-based technologies [47].

Doctors to the Barrios

Referrals

Machine Learning

In a classification task where labeled data is available, it is appropriate to use a supervised learning algorithm. The trained model is then applied to new inputs to predict their respective outputs.

Natural Language Processing

N-gram

Term Frequency-Inverse Document Frequency (TF-IDF)

Chi-Square (χ²) Feature Selection

Feature selection is the process of selecting a subset of features and using only this subset in building a machine learning model. In text classification, feature selection is mainly used to increase the efficiency of a classifier by reducing the size of the effective dictionary. It is also used to improve classification accuracy by eliminating noise features, or features that increase classification error in new data [56].

In feature selection, the χ² test is used to check whether the occurrence of a certain term and the occurrence of a certain class are independent [57]. Features found to be most likely class-independent are considered irrelevant for classification.
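As an illustration of the independence check, the following minimal sketch computes the χ² statistic from the 2×2 table of term presence versus class membership, using the standard closed form for such a table. The toy documents and the function names are assumptions made for this example and are not taken from TAGR.

```python
from typing import List, Set, Tuple


def term_class_counts(docs: List[Tuple[Set[str], str]], term: str,
                      target: str) -> Tuple[int, int, int, int]:
    """Count documents by (term present?, in target class?)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return numerator / denominator if denominator else 0.0


# Hypothetical toy corpus: (set of tokens, class label).
DOCS = [
    ({"free", "prize", "please"}, "spam"),
    ({"free", "load"}, "spam"),
    ({"patient", "fever", "please"}, "referral"),
    ({"patient", "cough"}, "referral"),
]

score_free = chi_square(*term_class_counts(DOCS, "free", "spam"))
score_please = chi_square(*term_class_counts(DOCS, "please", "spam"))
print(score_free, score_please)
```

A higher score means the term and the class are less likely to be independent, so the term is kept; a score near zero, as for the term that occurs equally often in both classes here, marks a candidate for removal.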

Support Vector Machine (SVM)

The support vectors are the points of each class that lie closest to the separating hyperplane (they lie on the dashed lines). New unlabeled data are labeled according to which side of the optimal hyperplane they fall on. If the data are not linearly separable, SVMs can still perform non-linear classification using the kernel trick. If the number of features is large, the data may not need to be mapped to a higher-dimensional space.

SVMs can be applied to a multi-class classification problem by combining multiple binary classifiers in a one-vs-all (one-vs-rest) scheme.
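A minimal sketch of this scheme for linear classifiers follows. Each class has its own hyperplane, the signed score w·x + b tells which side of that hyperplane a point falls on, and the class with the highest score wins. The weights, biases, and class names are hand-picked, hypothetical values rather than a trained model.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def decision_value(weights: Vector, bias: float, x: Vector) -> float:
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical hyperplanes (weights, bias), one binary classifier per class.
MODELS: Dict[str, Tuple[Vector, float]] = {
    "spam": ([1.0, -1.0], 0.0),
    "referral": ([-1.0, 1.0], 0.0),
    "inquiry": ([0.5, 0.5], -2.0),
}


def one_vs_rest_predict(x: Vector) -> str:
    """Pick the class whose binary classifier gives the highest score."""
    return max(MODELS, key=lambda label: decision_value(*MODELS[label], x))


print(one_vs_rest_predict([3.0, 1.0]))
print(one_vs_rest_predict([0.0, 2.0]))
```

Taking the maximum score, rather than the first positive one, resolves cases where several binary classifiers claim the same point or none does.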

Figure 3: Support vector machine; generated optimal hyperplane

Design and Implementation

Data Flow Diagram

If a message is determined to be spam, it will be filtered out but still stored in the database. If the message is not filtered out, it is passed to the message tagger. The next three figures (Figures 12, 13, and 14) demonstrate the manual verification process that can be performed by either the telehealth nurse or the administrator.

Note that only messages that are not marked as spam can be flagged and checked. Classifier configuration is divided into blocker and flagger configuration as shown in Figure 15. Configuring a blocker is as simple as adding and removing entries from the block list.
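The filtering behaviour described in this section can be sketched as follows. The sketch assumes that block-list entries are sender addresses or phone numbers, which the text does not state; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Message:
    sender: str
    body: str
    is_spam: bool = False


@dataclass
class SpamBlocker:
    # Assumption: entries are sender addresses or phone numbers.
    block_list: Set[str] = field(default_factory=set)

    def add(self, sender: str) -> None:
        self.block_list.add(sender)

    def remove(self, sender: str) -> None:
        self.block_list.discard(sender)

    def is_blocked(self, message: Message) -> bool:
        return message.sender in self.block_list


def route(messages: List[Message],
          blocker: SpamBlocker) -> Tuple[List[Message], List[Message]]:
    """Store every message; forward only non-spam messages to the tagger."""
    stored: List[Message] = []
    to_tagger: List[Message] = []
    for message in messages:
        message.is_spam = blocker.is_blocked(message)
        stored.append(message)  # spam is filtered out but still kept
        if not message.is_spam:
            to_tagger.append(message)
    return stored, to_tagger


blocker = SpamBlocker()
blocker.add("promo@example.com")
inbox = [
    Message("promo@example.com", "Win a free prize"),
    Message("doctor@example.com", "Patient with persistent fever"),
]
stored, to_tagger = route(inbox, blocker)
print(len(stored), [m.sender for m in to_tagger])
```

Keeping filtered messages in storage is what allows them to be reviewed later instead of being lost.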

Training a new model involves the same tokenization, vectorization, TF-IDF weighting, and feature selection steps as process 1.2 (Figure 13). The difference is that instead of applying them to a single message instance, the steps are applied to the existing database of messages. After all the messages from the database are processed, cross-validation is performed using an instance of the classifier.
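The first three preprocessing steps can be illustrated with a small, library-free sketch. It uses the textbook weighting tf × log(N / df); libraries such as scikit-learn apply smoothing and normalization on top of this, so their values will differ. The sample messages and helper names are assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(corpus: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by term frequency times log(N / document frequency)."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weighted = []
    for doc in corpus:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical messages; features are unigrams plus bigrams.
messages = ["Patient has fever", "Patient has cough", "Win free load"]
features = []
for text in messages:
    tokens = tokenize(text)
    features.append(tokens + ngrams(tokens, 2))

weights = tfidf(features)
print(sorted(weights[0].items(), key=lambda item: -item[1]))
```

Terms that occur in fewer messages receive higher weights, and a term that occurs in every message receives a weight of zero.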

For each iteration, one fold is held out for evaluation, while the classifier is trained on the remaining 9 folds. The accumulated results of the 10 iterations are displayed as an evaluation report containing precision, recall, F-score, and a confusion matrix.
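The evaluation loop can be sketched as below. The majority-class trainer is a hypothetical stand-in for the actual classifier, the fold assignment by index is one simple choice among several, and the data are synthetic.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[str], str]
Trainer = Callable[[List[str], List[str]], Predictor]


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Assign sample i to fold i mod k, giving k folds of near-equal size."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validate(samples: List[str], labels: List[str], train: Trainer,
                   k: int = 10) -> List[Tuple[str, str]]:
    """Return (true, predicted) pairs accumulated over all k iterations."""
    results: List[Tuple[str, str]] = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train(train_x, train_y)  # fit on the other k - 1 folds
        results.extend((labels[i], predict(samples[i])) for i in held_out)
    return results


def evaluation_report(results: List[Tuple[str, str]],
                      positive: str) -> Dict[str, object]:
    """Compute precision, recall, F-score and a confusion matrix."""
    confusion = Counter(results)  # keys are (true label, predicted label)
    tp = confusion[(positive, positive)]
    fp = sum(c for (t, p), c in confusion.items() if p == positive and t != positive)
    fn = sum(c for (t, p), c in confusion.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "confusion": dict(confusion)}


def train_majority(train_x: List[str], train_y: List[str]) -> Predictor:
    """Hypothetical stand-in classifier: always predict the majority label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda sample: majority


samples = [f"message {i}" for i in range(20)]
labels = ["spam" if i % 4 == 0 else "referral" for i in range(20)]
report = evaluation_report(cross_validate(samples, labels, train_majority),
                           positive="referral")
print(report)
```

Every message is predicted exactly once, by a model that never saw it during training, which is why the accumulated results give a fairer estimate than testing on the training data.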

Figure 9 shows the three main functions of the system: automated tagging, manual verification, and classifier configuration.

Database Design

Data Dictionary

System Architecture

Technical Architecture

Results

For the purpose of discussion, the web-based component, the National Telehealth System, will be addressed as part of TAGR itself. The web-based component was built on the web2py framework, while the message tagger used the scikit-learn library. The web-based component contains a login form that allows users with the right credentials to access the rest of the system.

Telehealth Nurse View

As shown in Figure 25, the following details are displayed: the date and time the message was received, who sent the message, the message content, the message label, and the available actions. The Filtered Messages page is nearly identical to the Messages page, except that the messages displayed are either automatically filtered messages or messages marked as spam. Figure 29 shows the Filtered Messages page, while Figure 30 shows the confirmation page displayed when a message is removed as spam.

Figure 23: Edit profile details, TAGR

Administrator View

Discussions

For demonstration purposes, a minimal web-based interface was built around the TAGR module that simulates the real processes of the NTSP. This opens the possibility of improving the accuracy of the classifier even if the module is deployed in a production environment. The nature of the dataset also led to the planned use of Conditional Random Fields (CRF) as a machine learning method for the classifier being dropped in favor of Support Vector Machines.

This caused a problem, as TAGR configuration required a persistent instance of the classifier that could be accessed across multiple pages.

Nadkarni PM, Ohno-Machado L, “Natural language processing: an introduction,” Journal of the American Medical Informatics Association, vol.

Binagwaho, “Design and implementation of an innovative SMS-based alert system (RapidSMS-MCH) to monitor pregnancy and reduce maternal and child mortality in Rwanda,” The Pan African Medical Journal, vol.

Assefa, “Amplifying the Voice of Africa's Youth through Text Analysis,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.

Denny, “Paste: patient-centric SMS text tagging in medication management system,” Journal of the American Medical Informatics Association, vol.

Kwok, “Efficient multi-label classification with many labels,” in Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp.

Gabaglio, “An Ensemble of Bayesian Networks for Multilabel Classification,” in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, p.

Hirschman, “The Miter Clinical Claim Status Classification System,” Journal of the American Medical Informatics Association, vol.