3.1 METHODS
3.1.7 BrainCoder System Performance Evaluation
The BrainCoder system comprised both decoding and neurofeedback training; therefore, metrics assessing both aspects were needed for the system evaluation. Accuracy was calculated as the measure of decoding performance, and the false direction score (FDS) was derived to measure the neurofeedback effect, that is, whether the user could exert the effort to move the cursor in the correct direction. Furthermore, as the experiment was conducted 5 times per participant (4 times for one subject), the trend of accuracy and FDS across days was analyzed by first-order polynomial fitting, and the steepness of the trend, given by the first-order coefficient, was also included in the analysis. Consequently, there were 3 evaluation metrics, and these were compared with a random model we generated to verify the validity of the model. After that, a comparison between MT and NT subjects was conducted to observe a possible effect of musical ability.
3.1.7.1 Accuracy
Accuracy was defined as the ratio of correct runs to the total of 63 runs. A run was counted as correct regardless of how many trials were needed to achieve correctness; this evaluation therefore examines whether the user was able to reach the target at all within the 7 allowed trials. The equation is shown below.
Accuracy (%) = (Correct Runs / 63) × 100
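In code, the metric reduces to a single ratio. A minimal sketch, in Python for illustration (the thesis analysis itself was done in MATLAB):

```python
def accuracy(correct_runs, total_runs=63):
    # Percentage of the 63 runs in which the target was reached
    # within the 7 allowed trials, regardless of how many trials it took.
    return correct_runs / total_runs * 100
```
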
3.1.7.2 False Direction Score (FDS)
The false direction score (FDS) was derived to measure the user's effort, i.e., whether they controlled the cursor in the correct direction. To derive the FDS, the cursor movement was examined in detail, as illustrated in Figure 3.3. A negative score was assigned whenever the cursor moved in the direction opposite to the target pitch or did not move at all. The FDS for each run was then the sum of these negative scores, and the overall FDS was the average of the per-run FDSs over the 63 runs.
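The per-run scoring can be sketched as follows; the exact penalty values come from Figure 3.3, so the −1 per false step used here is an assumption for illustration:

```python
def run_fds(positions, target):
    # FDS for one run: add a negative score (assumed -1 here; the actual
    # values are defined in Figure 3.3) whenever the cursor moves away
    # from the target pitch or does not move at all.
    score = 0
    for prev, cur in zip(positions, positions[1:]):
        moved_toward = abs(target - cur) < abs(target - prev)
        if not moved_toward:  # opposite direction, or no movement
            score -= 1
    return score

def overall_fds(run_scores):
    # Overall FDS: the average of the per-run FDSs over the 63 runs.
    return sum(run_scores) / len(run_scores)
```
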
Figure 3.3. Example of False Direction Score (FDS) of the cursor generation
3.1.7.3 Trend Line Fitting
A statistical test was not reliable because the sample size of 5 participants was too small. Although a total of 24 sessions were collected, day-to-day variability was included as well, so the statistical power was insufficient and a random distribution could not be assumed. Thus, in this study, we approach the results qualitatively rather than through quantitative hypothesis testing.
To quantify this tendency nonetheless, we observed the day-by-day change of the evaluation metrics, whether they increased or decreased across days, and digitized the tendency by fitting a trend line. The trend line was fitted using the polyfit() function in MATLAB with order 1, applying each subject's per-day data to fit a first-order polynomial equation [71]. Therefore, one trend line was generated per subject from 5 days of data (4 days for one subject). Furthermore, to compare the steepness of the increasing or decreasing trends, we used the slope of the trend line, which is the first of the two fitted coefficients.
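The fit described above corresponds to MATLAB's polyfit(days, metric, 1); an equivalent sketch with NumPy (day indices starting at 1 are an assumption, any evenly spaced axis yields the same slope):

```python
import numpy as np

def trend_slope(metric_by_day):
    # Fit a first-order polynomial (trend line) to one subject's metric
    # across days and return its slope -- the first of the two fitted
    # coefficients, used as the steepness of the trend.
    days = np.arange(1, len(metric_by_day) + 1)
    slope, _intercept = np.polyfit(days, metric_by_day, 1)
    return slope
```
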
3.1.7.4 Random Model (RM) Generation
As BrainCoder used the MLR model and the position was assigned by a 10 Hz threshold rather than by class probabilities as in classification, comparing randomly generated labels with the estimated position labels was not appropriate, because we could not guarantee that the output of our estimation followed the same random distribution. Instead of comparing the result with random labels, we therefore generated a random model (RM) and compared its estimation with that of BrainCoder (BC). When the number of trials was lacking, which happened when the user reached the correct cursor position in fewer than 7 trials in some runs, the RM estimation was made with randomly generated features for the missing trials, although some of these generated trials might go unused when the RM reached the correct position faster than BC.
The RM had the same intercept as BC and weights within the same range; the weights were randomly drawn from the range of the BC weights. That is, we found the minimum and maximum of WBC, then drew p random numbers (p is the feature dimension size; see 3.1.5) within [min(WBC), max(WBC)]. The random feature vector for each vacant trial was generated as a normally distributed signal whose mean and standard deviation were computed from all preceding trials. For example, if the fourth trial was missing, a normally distributed signal with the mean and standard deviation of trials 1–3 was generated; for the fifth, the mean and standard deviation of trials 1–4 were used, even though the fourth was itself generated. With this method, every run possessed all 7 trials, regardless of trial vacancies in the online experiment. For each run, the RM made estimations and stopped estimating once its estimate was assigned to the target position. Accuracy and FDS were computed over the 63 runs in each iteration, and the metrics were then averaged over 10 iterations.
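The two generation steps above (random weights within the BC weight range, and normally distributed features for vacant trials) can be sketched as follows; all function names and the fixed seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility of the sketch

def random_model_weights(w_bc):
    # RM weights: p random numbers drawn uniformly within
    # [min(W_BC), max(W_BC)]; the BC intercept is reused unchanged.
    w_bc = np.asarray(w_bc, dtype=float)
    return rng.uniform(w_bc.min(), w_bc.max(), size=w_bc.shape)

def fill_missing_trials(trials, n_trials=7):
    # Pad a run that ended early up to the full 7 trials: each vacant
    # trial is a normally distributed feature vector whose mean and
    # standard deviation are taken over all preceding trials, generated
    # trials included (e.g. trial 5 uses trials 1-4 even when trial 4
    # was itself generated).
    trials = [np.asarray(t, dtype=float) for t in trials]
    while len(trials) < n_trials:
        stacked = np.stack(trials)
        trials.append(rng.normal(stacked.mean(axis=0), stacked.std(axis=0)))
    return np.stack(trials)
```
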