4.6 Results and Discussions
4.6.3 Exploration of Fine Motor Patterns
1.99, p = .046, r = -.58; ASD: RC = 58.2%, Z = -2.20, p = .028, r = -.64) in both groups. Notably, the ASD group achieved statistically significant improvements in all three tasks, with large effect sizes (“THE” task: RC = 35.7%, Z = -2.20, p = .028, r = -.64; “LAZY” task: RC = 77.3%, Z = -2.20, p = .028, r = -.64; “DOG” task: RC = 56.3%, Z = -1.99, p = .046, r = -.58). Raw task scores (TS) also improved in both groups, but the differences between pre- and post-test did not reach statistical significance in either the TD or the ASD group (TD: RC = 9.59%, Z = -0.94, p = .345, r = -.27; ASD: RC = 39.6%, Z = -0.94, p = .345, r = -.27).
The performance improvements in both the VMI test and the virtual test suggested that the virtual practice tasks (Path Tasks) had a positive impact on finger and hand motor control. Moderate positive correlations between the virtual task scores (TS) and the VMI scores (TD: ρ = 0.368, p = .239; ASD: ρ = 0.332, p = .292), although not statistically significant, further suggested that the tasks in our system could be used to evaluate users’ fine motor abilities.
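The Spearman correlations reported above can be computed with a few lines of code. Below is a minimal pure-Python sketch; the TS and VMI scores are made-up illustrative values (not the study’s data), and the rank-difference formula assumes no tied values.

```python
# Minimal Spearman's rho between two score lists (no ties assumed).
def rank(values):
    """1-based ranks of values (ascending), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

ts_scores  = [62, 70, 55, 80, 75, 68, 90, 58, 73, 66, 85, 77]        # hypothetical TS
vmi_scores = [95, 102, 90, 110, 99, 97, 115, 92, 104, 96, 112, 100]  # hypothetical VMI
print(round(spearman_rho(ts_scores, vmi_scores), 3))  # 0.944
```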
corresponding features extracted from these metrics. For Task Duration, Hit Number, and RMSE, in addition to the original data, we extracted a “difference” feature, defined as the difference between the BACK-process data and the GO-process data. For Grip Force and Speed, we extracted additional statistical features, such as the mean, median, and standard deviation.
Table 4-6. Performance metrics and features.

  Types                  Metrics              Features
  Task Duration (D)      D, D_GO, D_BACK      Original metric, difference
  Hit Number (H)         H, H_GO, H_BACK      Original metric, difference
  RMSE (E)               E, E_GO, E_BACK      Original metric, difference
  Grip Force (F)         F, F_GO, F_BACK      Difference, mean, median, standard deviation,
                                              coefficient of variation (COV), interquartile
                                              range, skewness, kurtosis, mean/median absolute
                                              deviation (MAD), maximum
  Speed (V)              V, V_GO, V_BACK      Same features as Grip Force
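The extraction of a few of these features can be sketched as follows. This is a minimal pure-Python illustration: the signal values and helper names are hypothetical (not from the study), only a subset of the Table 4-6 statistics is shown, and the “difference” feature is taken as the BACK-process statistic minus the GO-process statistic, as defined above.

```python
import statistics as st

def summary_features(signal, prefix):
    """A subset of the Table 4-6 statistics for one Grip Force or Speed signal."""
    mean = st.mean(signal)
    sd = st.pstdev(signal)
    return {
        f"{prefix}mean": mean,
        f"{prefix}median": st.median(signal),
        f"{prefix}std": sd,
        f"{prefix}cov": sd / mean,  # coefficient of variation
        f"{prefix}meanmad": st.mean(abs(v - mean) for v in signal),  # mean abs. deviation
        f"{prefix}max": max(signal),
    }

go_force   = [1.2, 1.4, 1.3, 1.5, 1.6]  # hypothetical GO-process grip force (N)
back_force = [1.8, 2.0, 1.9, 2.1, 2.2]  # hypothetical BACK-process grip force (N)

features = {**summary_features(go_force, "GO_F"), **summary_features(back_force, "BACK_F")}
# "difference" feature: BACK-process statistic minus GO-process statistic
features["DIFF_Fmean"] = features["BACK_Fmean"] - features["GO_Fmean"]
print(round(features["DIFF_Fmean"], 3))  # 0.6
```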
4.6.3.2 Feature Selection and Classification
The total number of extracted features was 59. We first ranked all features by their F-values from a one-way analysis of variance (ANOVA) test [75], producing a feature list in descending order of F-value. Since redundant features do not improve the accuracy of a classification model, we reduced each set of highly correlated features (correlation > 0.9) to the single feature with the highest F-value, leaving 35 features. To select the most discriminative features, we trained and evaluated models on subsets of the feature list, with subset sizes increasing from 1 to 35; each subset was constructed by iteratively adding the next feature from the top of the feature list.
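The ranking-and-filtering step above can be sketched as follows. This is an illustrative pure-Python version (the feature values and names are made up): the F-statistic is specialized to two groups (TD vs. ASD), and a greedy pass drops any feature whose correlation with an already-kept, higher-ranked feature exceeds 0.9.

```python
import statistics as st

def f_value(group_a, group_b):
    """One-way ANOVA F-statistic for two groups (k = 2, so k - 1 = 1)."""
    grand = st.mean(group_a + group_b)
    msb = sum(len(g) * (st.mean(g) - grand) ** 2 for g in (group_a, group_b))
    msw = sum(
        sum((x - st.mean(g)) ** 2 for x in g) for g in (group_a, group_b)
    ) / (len(group_a) + len(group_b) - 2)
    return msb / msw

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = st.mean(x), st.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def rank_and_filter(features, labels, threshold=0.9):
    """features: {name: values per sample}; labels: 0 = TD, 1 = ASD."""
    ranked = sorted(
        features,
        key=lambda n: f_value(
            [v for v, y in zip(features[n], labels) if y == 0],
            [v for v, y in zip(features[n], labels) if y == 1]),
        reverse=True)
    kept = []  # greedy: keep a feature only if not redundant with a kept one
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

feats = {  # hypothetical feature columns over 8 samples
    "a": [1, 2, 1, 2, 9, 8, 9, 8],
    "b": [2, 4, 2, 4, 18, 16, 18, 17],
    "c": [5, 1, 4, 2, 3, 5, 1, 4],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(rank_and_filter(feats, labels))  # ['b', 'c']: 'a' dropped (corr with 'b' > 0.9)
```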
We trained six classifiers (Table 4-7) on our dataset (96 samples) and used leave-one-out cross-validation (LOOCV) for model evaluation. LOOCV is widely used when only a small amount of data is available and is known to be almost unbiased [76]. LOOCV works by repeatedly holding out one sample from the dataset as the test set and using the remaining samples as the training set. In this study, classification accuracy was evaluated using the F1 score, which considers both the precision (P) and the recall (R) of the classification results. The F1 score for each model was computed as:
F1 = (2 × P × R) / (P + R),    (4-7)

P = TP / (TP + FP),    R = TP / (TP + FN),    (4-8)
where TP was the number of ASD samples correctly classified as ASD, FP was the number of TD samples incorrectly classified as ASD, and FN was the number of ASD samples incorrectly classified as TD. The whole procedure of model training and evaluation is illustrated in Figure 4-8.
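The LOOCV procedure with Eqs. (4-7) and (4-8) can be sketched as follows. The simple 1-NN stand-in classifier and the scalar data are illustrative only; they are not the study’s models or dataset.

```python
def loocv_f1(samples, labels, classify):
    """Leave-one-out CV: hold out each sample once, then compute F1 (Eq. 4-7)."""
    tp = fp = fn = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        pred = classify(train_x, train_y, samples[i])
        if pred == 1 and labels[i] == 1:
            tp += 1  # ASD correctly classified as ASD
        elif pred == 1 and labels[i] == 0:
            fp += 1  # TD incorrectly classified as ASD
        elif pred == 0 and labels[i] == 1:
            fn += 1  # ASD incorrectly classified as TD
    p = tp / (tp + fp) if tp + fp else 0.0  # precision (Eq. 4-8)
    r = tp / (tp + fn) if tp + fn else 0.0  # recall (Eq. 4-8)
    return 2 * p * r / (p + r) if p + r else 0.0

def nearest_neighbor(train_x, train_y, query):
    """1-NN on scalar features (illustrative stand-in classifier)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]

samples = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]  # hypothetical scalar features
labels  = [0, 0, 0, 0, 1, 1, 1, 1]                   # 0 = TD, 1 = ASD
print(loocv_f1(samples, labels, nearest_neighbor))   # 1.0 (perfectly separable)
```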
Figure 4-8. The procedure of model training and evaluation.
4.6.3.3 Classification Results
Figure 4-9 shows the classification results of the six classifiers with respect to the number of features, and Table 4-7 lists the maximum F1 score for each classifier. The results indicated that all classifiers could achieve maximum accuracies of 67-80% given appropriate feature subsets. The k-NN and ANN classifiers performed best, both reaching a maximum accuracy of 80%: the k-NN classifier used only the top feature (mean Grip Force during the BACK process), while the ANN classifier used the top six features. The Naïve Bayes and Random Forest classifiers ranked second with a maximum accuracy of 75%, using the top 5 and top 11 features, respectively. The SVM and Decision Trees classifiers had lower accuracies of 71% and 67%, respectively.
Table 4-7. Classification methods and results.

  Classifiers                        Parameters                      # features   Max F1
  Decision Trees                     CART algorithm                  5            0.67
  Random Forest                      50 random trees                 11           0.75
  Naïve Bayes                        Gaussian                        5            0.75
  k-Nearest Neighbor (k-NN)          k = 7, Euclidean distance       1            0.80
  Artificial Neural Network (ANN)    1 hidden layer (4 neurons)      6            0.80
  Support Vector Machine (SVM)       Radial basis function kernel    18           0.71
Figure 4-9. Classification Results.
According to the results of the ANOVA test, the top 10 features (all with p < .05) were related to the Grip Force data (six features) and the Speed data (four features), suggesting that Grip Force and Speed carried the most information for identifying participants with ASD. The boxplots of the top 10 features (Figure 4-10) indicated that the ASD group applied significantly smaller Grip Force than the TD group (BACK_Fmean, Fmedian, GO_Fmedian). During the BACK process, the TD group increased its Grip Force much more than the ASD group (DIFF_Fmean, DIFF_Fmedian) and reduced the variability of its Grip Force much more (DIFF_Fmeanmad). During the GO process, the ASD group moved with greater Speed than the TD group (GO_Vmean) and showed a lower kurtosis of Speed (GO_Vkurtosis). During the BACK process, the TD group reduced the COV of Speed much more than the ASD group (DIFF_Vcov). Over the whole process, the ASD group showed greater variability of Speed than the TD group (Vmeanmad).
Figure 4-10. Boxplots of the top ten features ranked by F-values using the ANOVA test.