
5.2 Scalar Evaluation Measures

5.2.3 Comparison

5.2 Scalar Evaluation Measures 97

Ensemble Methods

The scalar evaluation measures for the RAkEL and ECC ensemble methods on dataset 2 are given in table 5.12. The highest value for each measure is shown in bold. As with dataset 1, the RAkEL method produces no blanket positive predictions, which means that it can provide predictive information for every label. The highest values for TNR, MCC and DP produced by the RAkEL technique are all for tenofovir. This means that RAkEL performs best at predicting this label, which is confirmed by the fact that the AUC value is also high (0.70).

The ECC technique trained and validated using dataset 2 shows the best results out of all the experiments performed. It has the highest mean values for the TPR, MCC, DP and AUC evaluation measures across all experiments on both datasets. ECC has a particularly high value (and low standard deviation) for the AUC measure for the efavirens label (0.83) and has a TPR value of 0.97 for the same label. This means that ECC correctly detects resistance to efavirens in 97% of cases for dataset 2. The ECC technique also has a TPR value above 0.95 for six different labels (efavirens, emtricitabine, delavirdine, nevirapine, etravirine and lamivudine). For the same six labels, ECC has an AUC of over 0.71 in each case.


Table 5.13. Problem transformation methods evaluation measure differences. The values are generated by subtracting the values generated for dataset 1 from the corresponding values for dataset 2. The absolute values are shown and bold values indicate improved performance on dataset 2, while grey values imply that performance was better on dataset 1.

               BR-SVM                    BR-NB                           HOMER
Label          TPR   TNR   MCC   DP      TPR   TNR   MCC   DP    AUC     TPR   TNR   MCC   DP    AUC
efavirens      0.00  0.00  -     -       0.39  0.59  0.11  0.45  0.21    0.28  0.60  0.21  0.72  0.16
didanosine     0.16  0.04  0.22  0.56    0.16  0.10  0.07  0.17  0.03    0.14  0.16  0.03  0.10  0.04
emtricitabine  0.00  0.00  -     -       0.01  0.08  0.07  0.16  0.11    0.24  0.50  0.14  0.19  0.16
delavirdine    0.00  0.00  -     -       0.40  0.57  0.11  0.39  0.16    0.33  0.52  0.12  0.39  0.06
stavudine      0.08  0.06  0.00  0.05    0.07  0.07  0.01  0.04  0.01    0.01  0.09  0.10  0.26  0.04
nevirapine     0.00  0.00  -     -       0.39  0.59  0.11  0.45  0.21    0.28  0.60  0.21  0.72  0.16
tenofovir      0.16  0.00  0.17  0.43    0.20  0.10  0.05  0.07  0.08    0.02  0.01  0.04  0.12  0.06
etravirine     0.01  0.00  0.05  0.72    0.24  0.47  0.18  0.43  0.10    0.09  0.27  0.16  0.49  0.07
abacavir       0.08  0.10  0.18  0.43    0.03  0.05  0.02  0.06  0.01    0.13  0.11  0.01  0.02  0.06
lamivudine     0.00  0.00  -     -       0.01  0.08  0.07  0.16  0.11    0.24  0.50  0.14  0.19  0.16
zidovudine     0.02  0.02  0.01  0.01    0.06  0.04  0.02  0.01  0.02    0.01  0.05  0.05  0.14  0.01
mean           0.03  0.02  0.10  0.37    0.12  0.21  0.07  0.22  0.09    0.15  0.28  0.07  0.20  0.06
std dev        0.02  0.02  0.08  0.17    0.02  0.06  0.00  0.01  0.01    0.11  0.25  0.04  0.06  0.00

BR-SVM - Binary relevance with support vector machine base classifier, BR-NB - Binary relevance with naive Bayes base classifier, HOMER - Hierarchy of multi-label classifiers, TPR - True positive rate, TNR - True negative rate, MCC - Matthews correlation coefficient, DP - Discriminative power, AUC - Area under the receiver operating characteristic curve
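The "-" entries in table 5.13 appear to correspond to cases where MCC and DP cannot be computed. As a minimal sketch of why this can happen, the Python function below computes the four confusion-matrix measures under their commonly used definitions. These definitions are assumed here rather than taken from this section: in particular, DP is taken as (sqrt(3)/pi)(ln(TPR/(1 - TPR)) + ln(TNR/(1 - TNR))) with natural logarithms. The function name and the example counts are hypothetical.

```python
import math
from typing import Dict, Optional


def scalar_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Return TPR, TNR, MCC and DP for one label; None marks an undefined value."""
    tpr = tp / (tp + fn) if (tp + fn) > 0 else None
    tnr = tn / (tn + fp) if (tn + fp) > 0 else None

    # MCC is undefined when any marginal total of the confusion matrix is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else None

    # DP (assumed definition) uses the log-odds of TPR and TNR,
    # so rates of exactly 0 or 1 leave it undefined.
    if tpr is None or tnr is None or not (0 < tpr < 1) or not (0 < tnr < 1):
        dp = None
    else:
        dp = (math.sqrt(3) / math.pi) * (
            math.log(tpr / (1 - tpr)) + math.log(tnr / (1 - tnr))
        )
    return {"TPR": tpr, "TNR": tnr, "MCC": mcc, "DP": dp}


# Hypothetical counts: a blanket positive prediction labels every sample positive.
blanket = scalar_measures(tp=80, fp=20, tn=0, fn=0)
# Hypothetical counts for a classifier that separates the two classes.
balanced = scalar_measures(tp=40, fp=10, tn=40, fn=10)
print(blanket)
print(balanced)
```

Under these assumptions a blanket positive prediction gives a TNR of 0 and leaves both MCC and DP undefined, which is consistent with the rows that show no MCC or DP difference.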


and subtracting the corresponding value produced for dataset 1 (0.26 shown in table 5.7). Only the absolute values of the differences are shown in the tables and a positive difference (i.e. the technique performs better on dataset 2) is shown in bold, while negative differences (the technique performs better on dataset 1) are shown in grey. Note that a reduced standard deviation is considered an improvement, so the standard deviation values shown in bold indicate that the technique had a lower standard deviation on dataset 2.
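The difference calculation described above can be sketched as follows. This is a minimal illustration, not the code used to produce the tables: the function name `measure_difference` is hypothetical, and apart from the dataset 1 value of 0.26 quoted above, the input values are made up for the example.

```python
from typing import Tuple


def measure_difference(ds1: float, ds2: float, lower_is_better: bool = False) -> Tuple[float, bool]:
    """Return (absolute difference, improved on dataset 2?), rounded to 2 decimals."""
    diff = round(ds2 - ds1, 2)
    # For evaluation measures a rise is an improvement; for standard
    # deviations (lower_is_better=True) a fall is an improvement.
    improved = diff < 0 if lower_is_better else diff > 0
    return abs(diff), improved


# 0.26 is the dataset 1 value quoted in the text; 0.40 is a hypothetical dataset 2 value.
measure_result = measure_difference(0.26, 0.40)
# Hypothetical standard deviations: the lower value on dataset 2 counts as an improvement.
std_result = measure_difference(0.10, 0.08, lower_is_better=True)
print(measure_result, std_result)
```

In the tables, a result flagged as improved would be printed in bold and any other non-zero difference in grey.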

Problem Transformation Methods

Table 5.13 shows the evaluation measure differences for the problem transformation techniques. There is no difference for the labels for which BR-SVM produces blanket positive predictions (the rows in table 5.13 with a difference of 0.00 for TPR and TNR and no MCC or DP value). Most of the differences indicate that BR-SVM performs better on dataset 2. This is confirmed by the fact that the mean value for every evaluation measure has improved for dataset 2. The most notable increase for the BR-SVM technique is the DP value for etravirine, which increased from 0.31 to 1.04. The dataset 2 value is more than three times the dataset 1 value, a relative increase of roughly 235%. The values of all evaluation measures increased for both didanosine and abacavir, which implies that BR-SVM performs better on dataset 2 by every measure for these drugs.
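The size of the etravirine change can be checked from the two DP values quoted above. The short sketch below (the helper name `relative_change` is hypothetical) separates the ratio between the two values from the relative increase, since the two are easily conflated.

```python
def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# DP for etravirine with BR-SVM on dataset 1 and dataset 2, as quoted in the text.
dp_ds1, dp_ds2 = 0.31, 1.04

ratio = dp_ds2 / dp_ds1                      # how many times larger the new value is
increase = relative_change(dp_ds1, dp_ds2)   # growth relative to the old value

print(f"ratio: {ratio:.2f}x, relative increase: {increase:.0%}")
```

The dataset 2 value is about 3.35 times the dataset 1 value, which corresponds to a relative increase of about 235%.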

The evaluation measure differences for the BR-NB method shown in table 5.13 indicate improved performance overall on dataset 2. This is evident from the fact that the mean values for TNR, MCC, DP and AUC have all increased. The only mean value that did not increase was TPR, which decreased by 0.12. This implies that the BR-NB method is on average 12 percentage points worse at detecting cases of resistance. The standard deviations improved or stayed the same for TPR, TNR and MCC. The standard deviation increased by 0.01 for both DP and AUC, meaning that the values obtained for these measures were spread more widely for dataset 2 than for dataset 1.

As is the case with BR-NB, TPR for the HOMER technique has decreased for dataset 2: the mean TPR value has decreased by 0.15. The only two instances where the TPR value increased were for stavudine and zidovudine (both drugs in dataset 2 to which patients are overwhelmingly more susceptible than resistant). The evaluation measure that has increased most for the HOMER method is TNR. The TNR value has increased by 0.60 for efavirens and nevirapine and by 0.50 or more for emtricitabine, delavirdine and lamivudine. In the case of efavirens, this means that HOMER is five times better at predicting susceptibility to efavirens for dataset 2 than for dataset 1, which is a considerable improvement. The improvement is similar for nevirapine, emtricitabine, delavirdine and lamivudine.

Algorithm Adaptation Methods

The scalar evaluation measure differences for the algorithm adaptation techniques are shown in table 5.14.

It can be seen that the TNR values for MLkNN have all either increased or stayed the same, meaning that MLkNN is better at detecting susceptibility to drugs on dataset 2 than on dataset 1. While the mean value increased for four out of the five measures, the standard deviation also increased (i.e. worsened) for four out of the five. This implies that overall predictive power increased, but the values for the evaluation measures were spread more widely. Although not shown in table 5.14, it is important to note that the MLkNN method has only two blanket positive predictions on dataset 2 (emtricitabine and lamivudine) compared to six on dataset 1 (efavirens, emtricitabine, delavirdine, nevirapine, etravirine and lamivudine). This is a large improvement and implies that MLkNN can provide meaningful predictive information for four more labels on dataset 2 than on dataset 1.

The values for all four evaluation measures decreased for the PCT technique for efavirens, emtricitabine, delavirdine, nevirapine and lamivudine. These are all drugs to which many more patients are resistant than susceptible (in both datasets). For all the other labels, the evaluation measure values have increased, with the exception of the TNR value for didanosine, which has decreased by a small amount (0.03). The most significant evaluation measure improvement for dataset 2 for the PCT technique is the DP value for tenofovir, which has increased by 0.54. This represents a more than 100% improvement for that measure. Overall, comparing the sum of all the improvements (2.49) with the sum of all the decreases in performance (3.16), it appears that the PCT method performs slightly better on dataset 1 than on dataset 2.
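The comparison of total improvements and total decreases can be reproduced approximately from the PCT columns of table 5.14. The sketch below is an illustration only. Because the bold and grey formatting of the table is not available here, the direction of each difference is assumed from the description above: all four measures decrease for the five drugs to which most patients are resistant, and TNR decreases for didanosine. The variable names are hypothetical.

```python
# Absolute PCT differences (TPR, TNR, MCC, DP) per label, from table 5.14.
MEASURES = ("TPR", "TNR", "MCC", "DP")
pct_diffs = {
    "efavirens":     (0.04, 0.06, 0.14, 0.61),
    "didanosine":    (0.08, 0.03, 0.04, 0.07),
    "emtricitabine": (0.02, 0.01, 0.05, 0.22),
    "delavirdine":   (0.04, 0.05, 0.13, 0.59),
    "stavudine":     (0.07, 0.01, 0.08, 0.21),
    "nevirapine":    (0.04, 0.06, 0.14, 0.61),
    "tenofovir":     (0.17, 0.02, 0.20, 0.54),
    "etravirine":    (0.00, 0.01, 0.01, 0.03),
    "abacavir":      (0.06, 0.02, 0.07, 0.17),
    "lamivudine":    (0.02, 0.01, 0.05, 0.22),
    "zidovudine":    (0.14, 0.02, 0.13, 0.34),
}

# Assumed directions, taken from the prose rather than from the table formatting.
all_decreased = {"efavirens", "emtricitabine", "delavirdine", "nevirapine", "lamivudine"}
single_decreases = {("didanosine", "TNR")}

improvements = 0.0
decreases = 0.0
for label, values in pct_diffs.items():
    for measure, value in zip(MEASURES, values):
        if label in all_decreased or (label, measure) in single_decreases:
            decreases += value
        else:
            improvements += value

print(f"improvements: {improvements:.2f}, decreases: {decreases:.2f}")
```

With the rounded table values this gives 2.49 for the improvements and about 3.14 for the decreases. The small gap from the quoted 3.16 is presumably due to rounding in the table; either way, the decreases outweigh the improvements.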


Table 5.14. Algorithm adaptation methods evaluation measure differences. The values are generated by subtracting the values generated for dataset 1 from the corresponding values for dataset 2. The absolute values are shown and bold values indicate improved performance on dataset 2, while grey values imply that performance was better on dataset 1.

               MLkNN                           PCT
Label          TPR   TNR   MCC   DP    AUC     TPR   TNR   MCC   DP
efavirens      0.00  0.02  -     -     0.14    0.04  0.06  0.14  0.61
didanosine     0.01  0.11  0.13  0.48  0.04    0.08  0.03  0.04  0.07
emtricitabine  0.00  0.00  -     -     0.13    0.02  0.01  0.05  0.22
delavirdine    0.00  0.02  0.09  -     0.06    0.04  0.05  0.13  0.59
stavudine      0.03  0.03  0.00  0.16  0.02    0.07  0.01  0.08  0.21
nevirapine     0.00  0.02  -     -     0.14    0.04  0.06  0.14  0.61
tenofovir      0.01  0.01  0.10  -     0.02    0.17  0.02  0.20  0.54
etravirine     0.01  0.02  -     -     0.11    0.00  0.01  0.01  0.03
abacavir       0.12  0.04  0.08  0.19  0.02    0.06  0.02  0.07  0.17
lamivudine     0.00  0.00  -     -     0.13    0.02  0.01  0.05  0.22
zidovudine     0.01  0.00  0.04  0.53  0.07    0.14  0.02  0.13  0.34
mean           0.01  0.02  0.05  0.42  0.06    0.04  0.02  0.00  0.08
std dev        0.01  0.01  0.03  0.12  0.02    0.07  0.01  0.05  0.31

MLkNN - Multi-label k-nearest neighbours, PCT - Predictive clustering trees, TPR - True positive rate, TNR - True negative rate, MCC - Matthews correlation coefficient, DP - Discriminative power, AUC - Area under the receiver operating characteristic curve


Table 5.15. Ensemble methods evaluation measure differences. The values are generated by subtracting the values generated for dataset 1 from the corresponding values for dataset 2. The absolute values are shown and bold values indicate improved performance on dataset 2, while grey values imply that performance was better on dataset 1.

               RAkEL                           ECC
Label          TPR   TNR   MCC   DP    AUC     TPR   TNR   MCC   DP    AUC
efavirens      0.24  0.45  0.10  0.18  0.13    0.03  0.28  0.36  -     0.15
didanosine     0.12  0.26  0.18  0.49  0.07    0.11  0.05  0.17  0.40  0.07
emtricitabine  0.13  0.26  0.07  0.03  0.05    0.02  0.21  0.24  0.38  0.15
delavirdine    0.22  0.59  0.27  0.83  0.10    0.04  0.28  0.18  -     0.11
stavudine      0.12  0.03  0.08  0.20  0.08    0.09  0.01  0.11  0.27  0.12
nevirapine     0.24  0.47  0.12  0.25  0.12    0.03  0.27  0.33  -     0.21
tenofovir      0.04  0.04  0.11  0.33  0.04    0.11  0.02  0.06  0.10  0.09
etravirine     0.20  0.38  0.18  0.54  0.09    0.03  0.23  0.29  1.15  0.15
abacavir       0.09  0.10  0.03  0.12  0.00    0.00  0.04  0.05  0.13  0.03
lamivudine     0.08  0.19  0.03  0.30  0.06    0.03  0.25  0.25  0.30  0.16
zidovudine     0.01  0.01  0.01  0.01  0.02    0.02  0.02  0.00  0.01  0.05
mean           0.10  0.25  0.11  0.24  0.07    0.02  0.14  0.18  0.45  0.12
std dev        0.07  0.13  0.01  0.10  0.01    0.05  0.13  0.06  0.11  0.00

RAkEL - Random k-labelsets, ECC - Ensemble of classifier chains, TPR - True positive rate, TNR - True negative rate, MCC - Matthews correlation coefficient, DP - Discriminative power, AUC - Area under the receiver operating characteristic curve


Ensemble Methods

The evaluation measure differences for the ensemble methods are given in table 5.15. As with BR-NB and HOMER, most of the TPR values have decreased on dataset 2 for RAkEL. The only labels for which the TPR values have increased are stavudine and tenofovir. Both of these labels correspond to drugs to which many more patients are susceptible than resistant. The only other drug for which this is the case is zidovudine, and the TPR value for zidovudine has decreased by only a very small amount (0.01). This means that overall, for the TPR measure, RAkEL has improved for the drugs to which patients are more susceptible than resistant, and worsened for drugs to which patients are more resistant than susceptible. The values for TNR, MCC, DP and AUC have all increased on average.

The AUC value has increased for all labels for dataset 2 for the ECC technique, with the value for nevirapine increasing by the largest amount (0.21). The MCC values have also all either increased or remained the same and all but one of the DP values (zidovudine, which has only decreased by 0.01) have increased.

The mean value for every evaluation measure has increased and the standard deviations have all improved (i.e. decreased) or remained the same. This means that ECC has improved by all measures on dataset 2, with an average increase of 0.14 (14 percentage points) for the TNR measure and 0.12 (12 percentage points) for AUC. Overall, the ECC technique has shown the most improvement on dataset 2, with 42 of the 55 per-label evaluation measure values increasing and only 9 decreasing.
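The AUC figures for ECC can be checked against the ECC AUC column of table 5.15. Because every AUC value increased, the absolute differences in that column are also the signed differences, so their mean should agree with the mean row. A minimal sketch (the variable names are hypothetical):

```python
from statistics import mean

# ECC AUC differences (dataset 2 minus dataset 1) per label, from table 5.15.
ecc_auc_diff = {
    "efavirens": 0.15, "didanosine": 0.07, "emtricitabine": 0.15,
    "delavirdine": 0.11, "stavudine": 0.12, "nevirapine": 0.21,
    "tenofovir": 0.09, "etravirine": 0.15, "abacavir": 0.03,
    "lamivudine": 0.16, "zidovudine": 0.05,
}

mean_increase = mean(list(ecc_auc_diff.values()))
largest_label = max(ecc_auc_diff, key=ecc_auc_diff.get)

print(f"mean AUC increase: {mean_increase:.2f}")
print(f"largest increase: {largest_label} ({ecc_auc_diff[largest_label]:.2f})")
```

The per-label values average to 0.12 after rounding, matching the mean row, and the largest increase is the 0.21 for nevirapine.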

From the document: An investigation of multi-label classification techniques for predicting HIV drug resistance in resource-limited settings (pages 110-116)