3. THE THEORY OF IMAGE PROCESSING AND ANALYSIS
3.6 ASSESSING THE ACCURACY OF CLASSIFIED IMAGES
Assessing the accuracy of an image classification is an important step in the processing of satellite images (Shao et al., 2001), for three reasons. First, knowing the accuracy of a classification allows some measure of confidence to be attached to the final product, which influences its interpretation and subsequent use. Second, by assessing classification accuracy, analysts can iteratively modify their classification procedure in order to achieve an optimum level of accuracy. Finally, an accuracy assessment allows different classification techniques to be compared, so that the analyst can choose the most accurate and appropriate technique (Congalton & Green, 1999).
While accuracy assessment is considered an essential step in image processing, it was not until the mid-1970s that attempts were made to evaluate how accurate image classifications were. Initially a single measure of accuracy was calculated by comparing estimates of the area of ground cover with those produced by the image classification. This was later expanded to include site-specific assessments where actual site visits (known as ground truthing) were undertaken to verify classification accuracy (Congalton & Green, 1999).
Ground truth data did not necessarily need to be collected from site visits but could also be obtained from other sources such as aerial photographs and map data (e.g. Wang et al., 1998; Lunetta & Balogh, 1999). The ground truthing approach is still in use but has been developed into a methodology that produces an error matrix and numerous measures of accuracy.
The error matrix is most often applied to hard classifications where each pixel is assigned to only one land cover class. The matrix is used to compare reference (or ground truth) data to a sample of classified data in a way that allows the overall accuracy to be calculated while at the same time allowing errors of commission and omission to be determined (Congalton, 1991). This is illustrated in Table 3.3, which shows reference data (in columns) compared against the corresponding classified data (in rows). The values in the major diagonal represent all those pixels that have been correctly classified. Adding these and dividing by the total number of samples gives the overall classification accuracy. In this example, 68 water pixels, 45 forest pixels and 72 sand pixels have been correctly classified, giving an overall accuracy of (68 + 45 + 72)/202 = 185/202, or approximately 92%.
Table 3.3: Error matrix relating reference data to classified data.
                             Reference Data
Classified Data    Water    Forest    Sand    Row Total
Water                 68         6       4           78
Forest                 3        45       1           49
Sand                   1         2      72           75
Column Total          72        53      77          202
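The overall accuracy calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original study: the names ERROR_MATRIX and overall_accuracy are hypothetical, and the cell values are those of Table 3.3 arranged so that they reproduce the row and column totals given there (rows are classified data, columns are reference data).

```python
# Counts from Table 3.3. Rows are classified data, columns are reference data,
# both in the order Water, Forest, Sand. Names here are illustrative only.
ERROR_MATRIX = [
    [68, 6, 4],   # classified as Water
    [3, 45, 1],   # classified as Forest
    [1, 2, 72],   # classified as Sand
]


def overall_accuracy(matrix):
    """Return the sum of the major diagonal divided by the total sample count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


row_totals = [sum(row) for row in ERROR_MATRIX]            # classified totals
column_totals = [sum(col) for col in zip(*ERROR_MATRIX)]   # reference totals

print(row_totals)      # [78, 49, 75]
print(column_totals)   # [72, 53, 77]
print(f"{overall_accuracy(ERROR_MATRIX):.1%}")  # 91.6%
```

The printed value of 91.6% rounds to the 92% quoted in the text.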
Errors of commission occur when a pixel is classified in the wrong category while errors of omission occur when a pixel is excluded from the category to which it actually belongs.
These can be quantified by calculating the User's and Producer's Accuracy as shown in Table 3.4. From this table it can be seen that 68 out of the 72 water reference sites have been correctly classified as water, giving a producer's accuracy of 94%. For the producer of the classification this means that 94% of the areas that are water have been correctly classified. However, when the user's accuracy for water pixels is calculated, it can be seen that the value is only 87%, meaning that 13% of the pixels that have been classified as water are actually not water. This can be problematic for the user of the classified data who might find forest or sand where the classification indicates water.
Table 3.4: Calculation of User's and Producer's Accuracy using data from Table 3.3.
Producer's Accuracy (omission errors)    User's Accuracy (commission errors)
Water  = 68/72 = 94%                     Water  = 68/78 = 87%
Forest = 45/53 = 85%                     Forest = 45/49 = 92%
Sand   = 72/77 = 94%                     Sand   = 72/75 = 96%
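The per-class figures in Table 3.4 can be reproduced with the short sketch below. It is an illustration only: the function names are hypothetical, and the matrix cells are taken from Table 3.3 with classified data in rows and reference data in columns. Producer's accuracy divides each diagonal value by its column (reference) total, while user's accuracy divides it by its row (classified) total.

```python
# Counts from Table 3.3. Rows are classified data, columns are reference data.
# CLASSES, ERROR_MATRIX and the function names are illustrative only.
CLASSES = ["Water", "Forest", "Sand"]
ERROR_MATRIX = [
    [68, 6, 4],   # classified as Water
    [3, 45, 1],   # classified as Forest
    [1, 2, 72],   # classified as Sand
]


def producers_accuracy(matrix):
    """Diagonal / column total; the shortfall from 1 is the omission error."""
    column_totals = [sum(col) for col in zip(*matrix)]
    return [matrix[i][i] / column_totals[i] for i in range(len(matrix))]


def users_accuracy(matrix):
    """Diagonal / row total; the shortfall from 1 is the commission error."""
    row_totals = [sum(row) for row in matrix]
    return [matrix[i][i] / row_totals[i] for i in range(len(matrix))]


for name, pa, ua in zip(CLASSES,
                        producers_accuracy(ERROR_MATRIX),
                        users_accuracy(ERROR_MATRIX)):
    print(f"{name}: producer's {pa:.0%}, user's {ua:.0%}")
# Water: producer's 94%, user's 87%
# Forest: producer's 85%, user's 92%
# Sand: producer's 94%, user's 96%
```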
If the error matrix is being used to evaluate a supervised classification it is important that the sites used for ground truthing are not the same as those that were used as training sites.
Using the same sites both for training and accuracy assessment can result in unrealistically high accuracy values. At the same time, it should be remembered that data collected by ground truthing are often far from perfect, their quality being influenced by factors such as sampling technique and representativeness (Malthus & Mumby, 2003).
The error matrix is an easily implementable and commonly used method for determining the accuracy of hard classifications. However, accuracy assessment becomes rather more difficult for studies involving soft classifications and/or historical satellite images.
Assessing the accuracy of classifications performed on historical images is difficult as it is impossible to go back in time to perform traditional ground truthing (Jensen et al., 1995).
This is especially problematic for change detection studies that rely on differences between multidate images to identify changes in land cover (Congalton & Green, 1999). Current ground truth data would not necessarily be an accurate reflection of land cover on the date the image was captured. It is also very difficult to evaluate the accuracy of soft classifications as the traditional application of the error matrix is only applicable to hard classifications (Binaghi et al., 1999; Ricotta, 2004). There is thus a need to develop accuracy assessment procedures for both change detection and soft (or fuzzy) classifications. Congalton & Green (1999), Binaghi et al. (1999) and Ricotta (2004) discussed methods for doing this but in all cases relied on the availability of accurate and contemporary ground truth data to test their classifications. In addition, Congalton & Green (1999) observed that even though a few procedures for assessing change detection accuracies had been proposed, there was still no standard technique available and very little work had been done on comparing the relative merits of those techniques.
An alternative method of accuracy assessment was utilised by Vicente-Serrano et al. (2004) in their recent study using Landsat ETM+ and NOAA AVHRR images to map soil moisture in the Ebro River valley in Spain. They used meteorological records from the area to calculate a Standardised Precipitation Index (SPI) which was then used to verify their soil moisture map. They called this method an 'indirect climate approach' and found a clear relationship between the SPI and the soil moisture maps produced by their image classification. In calculating the SPI they did not use meteorological data taken at the time of image acquisition but used an aggregation of data from the preceding 15 days.
In another alternative to the error matrix approach, studies based on linear mixture analysis are able to use the band residuals and root-mean-square (RMS) error values to give some indication of classification accuracy (e.g. Adams et al., 1995). Low values of band residuals and/or RMS error indicate that the overall fit of the mixture model is good, while high values indicate the possible presence of additional land cover classes that have not been included as endmembers.
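The idea of band residuals and RMS error can be illustrated with the sketch below. It is not drawn from Adams et al. (1995) or any other cited study: the endmember spectra, pixel values and fractions are invented for illustration, the function names are hypothetical, and the RMS error is assumed to be the square root of the mean squared band residual, which is one common definition. The endmember fractions are taken as already estimated; the sketch only evaluates how well they reproduce the observed pixel.

```python
import math

# Hypothetical endmember reflectances in four bands (illustrative values only).
ENDMEMBERS = {
    "water":  [0.05, 0.04, 0.03, 0.01],
    "forest": [0.04, 0.06, 0.05, 0.40],
    "sand":   [0.20, 0.25, 0.30, 0.35],
}


def band_residuals(observed, fractions, endmembers):
    """Observed reflectance minus the linear mixture model, band by band."""
    modelled = [
        sum(fractions[name] * spectrum[band]
            for name, spectrum in endmembers.items())
        for band in range(len(observed))
    ]
    return [obs - mod for obs, mod in zip(observed, modelled)]


def rms_error(residuals):
    """Square root of the mean squared band residual (assumed definition)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Assumed (already estimated) endmember fractions for both example pixels.
fractions = {"water": 0.2, "forest": 0.5, "sand": 0.3}

# A pixel that the three endmembers reproduce closely.
good_pixel = [0.09, 0.113, 0.121, 0.307]
# A pixel the model fits poorly, e.g. because a land cover class is missing.
poor_pixel = [0.15, 0.20, 0.10, 0.25]

good_rms = rms_error(band_residuals(good_pixel, fractions, ENDMEMBERS))
poor_rms = rms_error(band_residuals(poor_pixel, fractions, ENDMEMBERS))
print(f"good fit RMS: {good_rms:.4f}, poor fit RMS: {poor_rms:.4f}")
```

In this invented example the first pixel yields an RMS error close to zero, while the second yields a noticeably larger value, which is the kind of contrast that would prompt an analyst to look for an unmodelled endmember.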
In conclusion, it can be seen that although classification accuracy assessment is fairly straightforward for studies involving single images and hard classifications, problems occur when soft classifications are performed and/or no ground truth data are available.
While some methods have been proposed for assessing accuracies in these situations, much development is still needed in this area.