5.2 Scalar Evaluation Measures
5.2.4 Scalar Evaluation Measure Summary
Ensemble Methods
The evaluation measure differences for the ensemble methods are given in table 5.15. As with BR-NB and HOMER, most of the TPR values have decreased on dataset 2 for RAkEL. The only labels for which the TPR values have increased are stavudine and tenofovir. Both of these labels correspond to drugs to which many more patients are susceptible than resistant. The only other drug for which this is the case is zidovudine, and the TPR value for zidovudine has decreased by only a very small amount (0.01). This means that overall, for the TPR measure, RAkEL has improved for the drugs to which patients are more susceptible than resistant, and worsened for drugs to which patients are more resistant than susceptible. The values for TNR, MCC, DP and AUC have all increased on average.
The AUC value has increased for all labels on dataset 2 for the ECC technique, with the value for nevirapine increasing by the largest amount (0.21). The MCC values have also all either increased or remained the same, and all but one of the DP values (zidovudine, which has decreased by only 0.01) have increased.
The mean value for every evaluation measure has increased and the standard deviations have all improved (i.e. decreased) or remained the same. This means that ECC has improved by all measures on dataset 2, with average increases of 14% for the TNR measure and 12% for AUC. Overall, the ECC technique has shown the most improvement on dataset 2, with 42 of the 55 evaluation measures increasing and only 9 decreasing.
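The tally reported above (42 of 55 measures increasing and 9 decreasing, which leaves 4 unchanged) can be reproduced from two sets of per-label measures. The following is a minimal sketch only: the function name `tally_changes`, the dictionary layout and all of the numbers are hypothetical and are not taken from table 5.15. It assumes that every (label, measure) pair is defined for both datasets.

```python
from collections import Counter


def tally_changes(dataset1, dataset2):
    """Count how many (label, measure) pairs increased, decreased or stayed the same."""
    tally = Counter()
    for key, before in dataset1.items():
        # Measures are reported to two decimal places, so round the
        # difference to avoid floating-point noise around zero.
        diff = round(dataset2[key] - before, 2)
        if diff > 0:
            tally["increased"] += 1
        elif diff < 0:
            tally["decreased"] += 1
        else:
            tally["unchanged"] += 1
    return tally


# Hypothetical values for two labels and two measures (illustrative only).
d1 = {("drug_a", "TPR"): 0.70, ("drug_a", "AUC"): 0.54,
      ("drug_b", "TPR"): 0.62, ("drug_b", "AUC"): 0.60}
d2 = {("drug_a", "TPR"): 0.69, ("drug_a", "AUC"): 0.75,
      ("drug_b", "TPR"): 0.62, ("drug_b", "AUC"): 0.66}

changes = tally_changes(d1, d2)
print(dict(changes))
```

In a real comparison, measures that could not be computed for a label (for example DP under a blanket positive prediction) would need to be skipped or counted separately before tallying.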
These values represent the differences between the evaluation measures generated for dataset 1 and those generated for dataset 2.
The scalar evaluation measures for dataset 1 have been presented in tables 5.7 – 5.9. These measures include the TPR, TNR, MCC, DP and AUC values (where possible) for each label for each of the seven multi-label classification models trained and evaluated using dataset 1 (BR-SVM, BR-NB, HOMER, MLkNN, PCT, RAkEL and ECC). In general, the BR-SVM and MLkNN techniques did not perform particularly well on dataset 1. They both had numerous cases of blanket positive predictions and the mean TPR and TNR values were all below 0.66 for both of these methods. MLkNN had a mean TNR of 0.40, which means that on average it could only correctly predict susceptibility to a drug in 40% of cases.
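To make the phrase "where possible" concrete, the sketch below computes TPR, TNR, MCC and DP for a single label from confusion matrix counts and returns `None` wherever a measure is undefined. It rests on stated assumptions: "positive" is taken to mean resistant, DP is taken to be the commonly used discriminant power formula, DP = (√3/π)(ln(TPR/(1 − TPR)) + ln(TNR/(1 − TNR))), which is not defined in this section, and the function name and counts are hypothetical.

```python
import math
from typing import Optional


def scalar_measures(tp: int, fn: int, tn: int, fp: int) -> dict[str, Optional[float]]:
    """Return TPR, TNR, MCC and DP for one label; None marks an undefined value."""
    pos, neg = tp + fn, tn + fp
    tpr = tp / pos if pos else None  # share of resistant cases predicted resistant
    tnr = tn / neg if neg else None  # share of susceptible cases predicted susceptible

    # MCC is undefined when any marginal total of the confusion matrix is zero,
    # which is what a blanket positive prediction causes (tn + fn == 0).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None

    # DP (assumed definition): (sqrt(3) / pi) * (logit(TPR) + logit(TNR)).
    # The logit is undefined at 0 and 1, so DP cannot be computed there.
    dp = None
    if tpr is not None and tnr is not None and 0 < tpr < 1 and 0 < tnr < 1:
        dp = (math.sqrt(3) / math.pi) * (
            math.log(tpr / (1 - tpr)) + math.log(tnr / (1 - tnr))
        )
    return {"TPR": tpr, "TNR": tnr, "MCC": mcc, "DP": dp}


# Hypothetical counts for one label (not taken from the study).
balanced = scalar_measures(tp=40, fn=10, tn=30, fp=20)
# A blanket positive prediction: every instance is predicted resistant.
blanket = scalar_measures(tp=50, fn=0, tn=0, fp=50)
```

Under these assumptions, the blanket positive case yields a TPR of 1.0 and a TNR of 0.0, while MCC and DP are left undefined.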
The BR-NB, HOMER and PCT techniques performed slightly better than BR-SVM and MLkNN on dataset 1. These techniques produced no blanket positive predictions and have mean TPR values of 0.69, 0.67 and 0.68 respectively. This means that in more than two thirds of cases, these three methods are able to correctly predict resistance to a drug. For BR-NB and HOMER, this is confirmed by the AUC values (0.60 and 0.57 respectively), which indicate that these methods are moderately more effective than a random classifier. The same can be said about the PCT method because it has a DP value of 0.86, the highest of all methods for dataset 1. PCT has a mean TNR of 0.53, which makes it the best at predicting susceptibility to drugs on dataset 1.
The RAkEL method performs about as well as PCT on dataset 1 in terms of the TPR measure. The mean TPR values for these methods are 0.69 and 0.68 respectively, meaning that on average they can correctly predict resistance to a drug in 69% and 68% of cases. Both methods perform particularly well for efavirenz, emtricitabine, delavirdine, nevirapine and lamivudine (the drugs to which patients are more resistant than susceptible), with TPR values as high as 0.97 in some cases. ECC performs best on dataset 1 at predicting resistance, with a mean TPR of 0.71, although this may be slightly inflated because the ECC technique produces blanket positive predictions for two labels (efavirenz and nevirapine).
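The inflation effect mentioned above can be illustrated with a small sketch. A blanket positive prediction labels every instance as resistant, so its TPR is 1.0 and its TNR is 0.0 by construction, and including such labels raises the mean TPR. The per-label values and the helper `is_blanket_positive` below are hypothetical and are not the study's figures.

```python
from statistics import mean

# Hypothetical per-label results for one method (illustrative only).
per_label = {
    "label_a": {"TPR": 1.00, "TNR": 0.00},  # blanket positive prediction
    "label_b": {"TPR": 1.00, "TNR": 0.00},  # blanket positive prediction
    "label_c": {"TPR": 0.60, "TNR": 0.55},
    "label_d": {"TPR": 0.50, "TNR": 0.65},
}


def is_blanket_positive(measures):
    """A label predicted positive for every instance has TPR 1.0 and TNR 0.0."""
    return measures["TPR"] == 1.0 and measures["TNR"] == 0.0


# Mean TPR over all labels, blanket positive labels included.
mean_tpr_all = mean(m["TPR"] for m in per_label.values())

# Mean TPR over only the labels where the model actually discriminates.
mean_tpr_filtered = mean(
    m["TPR"] for m in per_label.values() if not is_blanket_positive(m)
)

print(f"mean TPR (all labels): {mean_tpr_all:.3f}")
print(f"mean TPR (blanket labels excluded): {mean_tpr_filtered:.3f}")
```

Reporting both means side by side is one way to show how much of a method's apparent sensitivity comes from labels on which it never predicts susceptibility.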
For dataset 1 some techniques produce mean TPR values as high as 0.71 (in the case of ECC), but TNR values are generally low. This means that while all of the methods seem to be sensitive enough to detect resistance better than a random classifier, most are not specific enough to also be able to detect susceptibility.
Only PCT has a TNR value above 0.5 (0.53), meaning that this method is the only one that is able to predict susceptibility to a drug better than a random classifier. This, along with the fact that PCT has the highest values for MCC (0.25) and DP (0.86), makes this technique the best performing on dataset 1 based on the scalar evaluation measures.
The BR-SVM and MLkNN methods again do not perform well on dataset 2. Both still produce blanket positive predictions, although MLkNN produces blanket positive predictions for only two labels on dataset 2, compared to five on dataset 1. The TPR and TNR values for these two techniques are much the same for dataset 2 as for dataset 1, with at most a 0.03 improvement (in the case of TPR for BR-SVM). BR-SVM and MLkNN are both still only moderately better than a random classifier at predicting resistance and worse than a random classifier at predicting susceptibility on dataset 2.
The performance of the BR-NB, HOMER and PCT techniques has increased on average on dataset 2. Most significantly, the mean TNR values for all three of these methods are above 0.5 for dataset 2. This means that BR-NB, HOMER and PCT are all able to predict resistance and susceptibility to drugs better than a random classifier on dataset 2. On dataset 1, only the PCT technique was able to do this. The HOMER method has a mean TNR of 0.73 on dataset 2, making it the best method at predicting susceptibility to drugs across all the experiments conducted on both datasets.
The RAkEL and ECC methods both perform better than a random classifier based on all five evaluation metrics on dataset 2. RAkEL has a mean TNR value of 0.66, making it the second best method (tied with BR-NB) at predicting susceptibility to drugs across all experiments on both datasets. The mean TPR value for the ECC method has increased to 0.73, meaning that on dataset 2 this method is able to correctly predict resistance to drugs in 73% of cases on average. ECC also has the only mean DP value above 1.0 and the highest mean AUC (0.75).
Most methods perform better than a random classifier at predicting both resistance and susceptibility when trained and evaluated using dataset 2, with only BR-SVM and MLkNN still performing worse than a