
CHAPTER 3: ENHANCED URBAN CLASSIFICATION USING MULTI-SPECTRAL


thickness of water (Sharma, et al., 2012). Evidently, the four indices contain valuable information that, if integrated with the traditional Landsat 8 reflective and thermal bands, can be used to discriminate different land cover types with reasonable accuracy, particularly in complex and heterogeneous urban areas.

This work aimed to assess the potential of integrating the Landsat 8 thermal bands with the sensor’s traditional reflective bands and computed vegetation indices for discriminating complex and heterogeneous urban landscapes. It was hypothesized that including the Landsat 8 thermal bands together with the sensor’s traditional reflective bands and the computed vegetation indices could greatly improve the image classification of such landscapes.

3.2 Materials and methods

3.2.2. Field data collection and processing

Supervised image classification requires prior knowledge of the LULC classes in the study area. It also requires the coordinates of representative samples for each LULC type, which are used to train the classifier to assign classes and to assess accuracy afterwards. To identify the LULC classes in Harare and obtain the coordinates of representative points per class, field data were collected between 1 April and 30 April 2015. During data collection, 120 GPS points were collected for each land cover class using a hand-held Garmin eTrex30 GPS with ±3 m accuracy. Field data collection followed a stratified random sampling approach to obtain samples from several locations across Harare. For each LULC type, sub-classes were also identified and the coordinates of samples were collected. This was done to incorporate intra-class variability. For example, different types of vegetation were identified during the field survey, and data collection ensured that samples were taken from all possible sub-classes of vegetation (e.g. trees, shrubs, grassland). Although the classes obtained in the field could be further disaggregated into several small sub-classes, this study generalized them into seven major LULC types (Table 3.3). Seven land cover classes were used based on the recommendation that generating a large number of classes is inappropriate when using moderate to coarse resolution satellite data, such as Landsat series data (Yu, et al., 2013).

Figure 3.1: Location of the study area


3.2.3. Remote sensing data acquisition and pre-processing

A cloud-free 30 m Landsat 8 image covering the entire study area (path 170, row 72) was downloaded free of charge from the EarthExplorer website, courtesy of the USGS-EROS Centre archive (www.earthexplorer.usgs.gov). The image was acquired on 31 October 2014. The Landsat 8 Thermal Infrared Sensor (TIRS) bands are acquired at 100 m resolution but were provided already resampled to 30 m spatial resolution (Table 3.1). The acquired image was corrected for geometric and radiometric errors. It was rectified to UTM Zone 36S using 20 ground control points collected in the field at the intersections of major roads. In addition, to ensure accurate retrieval of spectral information, the image was atmospherically corrected using the FLAASH module in ENVI 4.5, with parameters downloaded from the AERONET website (Dube, et al., 2014).

Table 3.1: Properties of Landsat 8 data used in the study (Genc et al., 2014)

Band  Name          Bandwidth (µm)  GSD (m)
1     Coastal blue  0.435–0.451     30
2     Blue          0.452–0.512     30
3     Green         0.533–0.590     30
4     Red           0.636–0.673     30
5     NIR           0.851–0.879     30
6     SWIR 1        1.566–1.651     30
7     SWIR 2        2.107–2.294     30
8     Pan           0.503–0.676     15
9     Cirrus        1.363–1.384     30
10    TIRS 1        10.60–11.19     100 (30)*
11    TIRS 2        11.50–12.51     100 (30)*

*TIRS = Landsat 8 thermal infrared bands (acquired at 100 m, provided resampled to 30 m)

3.2.4 Landsat 8 spectral bands and vegetation indices retrieval

Six spectral reflectance bands from the visible, near-infrared and short-wave infrared regions (i.e. blue, green, red, NIR, SWIR I and SWIR II) and two thermal bands (i.e. TIRS I and TIRS II) were extracted from the Landsat 8 OLI and TIRS images. In addition, four spectral vegetation indices were computed using Landsat 8 OLI spectral bands. The choice of these indices was based on previous studies that demonstrated their reliable application in land cover mapping (Chen, et al., 2006; Sharma, et al., 2012). The computed vegetation indices are summarized in Table 3.2. The Landsat 8 OLI and TIRS spectral bands and the computed vegetation indices selected for this study were extracted at each location based on the points obtained during field data collection. The field-collected GPS points (120 per class) were first projected to the coordinate system of the Landsat 8 OLI and TIRS image for easy overlay and spectral extraction. Since a point represents a single pixel, whereas a land cover type may occupy a pixel and its neighbours, polygons were created by digitizing around each point over pixels falling within the same class. These regions of interest were created using the Region of Interest (ROI) tool in ENVI 4.5. This was done for both the training and validation datasets, so that polygons rather than points served as ground truth regions for classification and validation.
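The indices used here are all normalized differences of two bands. The sketch below shows that pattern for a single pixel. The band pairs given for NDVI and NDBI are the definitions commonly used in the literature and are an assumption, since the text does not state the formulas; NDWI and NDBaI would follow the same pattern with whichever band pairs the study adopted. The function names and pixel values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """NDVI, assuming the common definition (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """NDBI, assuming the common definition (SWIR I - NIR) / (SWIR I + NIR)."""
    return normalized_difference(swir1, nir)


# Hypothetical surface reflectance values for one vegetated pixel
# (illustrative only, not data from this study).
pixel = {"red": 0.05, "nir": 0.40, "swir1": 0.20}

pixel_ndvi = ndvi(pixel["nir"], pixel["red"])    # positive for vegetation
pixel_ndbi = ndbi(pixel["swir1"], pixel["nir"])  # negative for vegetation
print(f"NDVI = {pixel_ndvi:.3f}, NDBI = {pixel_ndbi:.3f}")
```

Applied to whole rasters, the same arithmetic would be carried out band-wise over arrays rather than on single values.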

3.2.5 Image classification

The extracted Landsat 8 OLI and TIRS bands and the computed vegetation indices were used to classify the complex and heterogeneous urban landscape. The analysis was done using seven different combinations of spectral bands and vegetation indices, summarised in Table 3.2, and the Support Vector Machine (SVM) classifier. The SVM is regarded as one of the most powerful and robust non-parametric machine learning algorithms in image classification studies when compared to commonly used classification algorithms such as Maximum Likelihood, Random Forest, Artificial Neural Networks and Mahalanobis classifiers (Adelabu et al., 2013; Jia, et al., 2014). One of the major advantages of the SVM algorithm is that it requires comparatively little training data (Forkuor & Cofie, 2011; Yu, et al., 2013). The algorithm treats classification as a two-class problem, namely the presence or absence of a class among the training samples, and fits an optimal separating hyperplane in a multi-dimensional feature space in which each vector component is the grey level of one image layer. In doing so, the algorithm attempts to maximize the distance between the hyperplane and the closest training samples, known as support vectors.
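For reference, the margin maximization described above is usually written as the soft-margin optimization problem below. This is the standard textbook formulation, offered as a sketch rather than as the exact implementation in the software used. The symbols are not defined in the text and are assumptions: x_i is the feature vector of training sample i, y_i is its label (+1 or -1), w and b define the hyperplane, ξ_i are slack variables, and C is assumed to correspond to the classifier's penalty parameter.

```latex
\min_{w,\,b,\,\xi} \ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

Minimizing the norm of w widens the margin, which equals 2/‖w‖, while a larger C penalizes training samples that fall on the wrong side of it more heavily.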

The ground truth data were used to classify each of the layer combinations shown in Table 3.2 using the SVM classifier (gamma in the kernel function was set at 0.091, the penalty parameter at 100, the pyramid level at 0 and the classification probability threshold at 0). The same settings were used for all combinations to eliminate any contribution of the SVM parameters to differences in accuracy, since the input band combination was the only variable in this study. For this work, the dataset was randomly split into 70% (85) training and 30% (36) testing samples (Adelabu, et al., 2013). The major land cover classes considered in this study are summarised in Table 3.3.
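A random 70/30 split of field points can be sketched as follows. This is an illustration only: it assumes the split is made separately within each class, and the function name, seed and point identifiers are hypothetical rather than taken from the study.

```python
import random


def split_points(points_by_class, train_fraction=0.7, seed=0):
    """Randomly split each class's points into training and testing lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = {}, {}
    for label, points in points_by_class.items():
        shuffled = list(points)  # copy, so the input is left unchanged
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


# Hypothetical point identifiers: 10 points for two of the seven classes.
points = {
    "Water": [f"Wt-{i}" for i in range(10)],
    "Grasslands": [f"Gr-{i}" for i in range(10)],
}
train, test = split_points(points)
for label in points:
    print(label, len(train[label]), "training,", len(test[label]), "testing")
```

Splitting within each class keeps every land cover type represented in both the training and the testing sets.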

Table 3.2: OLI, TIRS spectral bands and computed vegetation indices

Analysis  Data type      Data source  Variables applied
I         SB             OLI          1-6: blue, green, red, NIR, SWIR I & II
II        SB             TIRS         1-2: TIRS I & II
III       SB             OLI & TIRS   1-8: blue, green, red, NIR, SWIR I, SWIR II, TIRS I & TIRS II
IV        VIs            OLI          1-4: NDBaI, NDVI, NDBI & NDWI
V         SB & VIs       OLI & TIRS   1-6: TIRS I, TIRS II, NDBaI, NDVI, NDBI & NDWI
VI        SB & VIs       OLI          1-10: blue, green, red, NIR, SWIR I, SWIR II, NDBaI, NDVI, NDBI & NDWI
VII       All variables  OLI & TIRS   blue, green, red, NIR, SWIR I, SWIR II, TIRS I, TIRS II, NDBaI, NDVI, NDBI & NDWI

*SB = spectral bands; VIs = vegetation indices; TIRS = Thermal Infrared Sensor; OLI = Operational Land Imager; NDBaI = Normalized Difference Bareness Index; NDVI = Normalized Difference Vegetation Index; NDWI = Normalized Difference Water Index; NDBI = Normalized Difference Built-up Index
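The seven layer combinations in Table 3.2 can be written as a small lookup, which makes it easy to loop over the analyses with identical classifier settings. The variable names below are hypothetical; the layer lists follow the table.

```python
# Layer groups as listed in Table 3.2.
BANDS = ["blue", "green", "red", "NIR", "SWIR I", "SWIR II"]
THERMAL = ["TIRS I", "TIRS II"]
INDICES = ["NDBaI", "NDVI", "NDBI", "NDWI"]

# Analysis number -> input layers used for that classification run.
ANALYSES = {
    "I": BANDS,
    "II": THERMAL,
    "III": BANDS + THERMAL,
    "IV": INDICES,
    "V": THERMAL + INDICES,
    "VI": BANDS + INDICES,
    "VII": BANDS + THERMAL + INDICES,
}

for name, layers in ANALYSES.items():
    print(f"Analysis {name}: {len(layers)} layers")
```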

Table 3.3: Description of the major land cover classes considered for this study

Class                                 Description
Densely built (DB)                    Very high built density (CBD and industrial areas)
Low-medium density residential (LMR)  Low and medium density residential areas with higher vegetation fraction than high density residential
High density residential (HDR)        Built-up with higher density of buildings and lower vegetation cover than low-medium residential
Forested areas (Fr)                   Moderate to dense forest cover
Development (Dv)                      High density residential under development; mixture of bare and building with very low vegetation cover
Grasslands (Gr)                       Grass-covered areas with few or no trees
Water (Wt)                            Water bodies

3.2.6 Accuracy assessment

To evaluate the reliability of the results obtained in this study, accuracy assessment was performed for each land cover class. An independent test dataset consisting of 36 points per LULC type was used in the process. For each method, the classes obtained were cross-tabulated against the ground truth classes for the corresponding pixels in a confusion matrix in order to determine classification accuracies (Yu, et al., 2013).

The agreement between classification results and ground truth was measured using the producer’s accuracy, user’s accuracy, overall accuracy and Kappa index generated from the confusion matrices (Jia, et al., 2014). Producer’s accuracy is a measure of how correct the classification is, while user’s accuracy is a measure of the reliability of the map for each class (Namdar, et al., 2014). The performance of the different classification methods was compared primarily with that of the traditional method, which uses reflective bands only, based on the coverage per class (area), producer’s accuracy, user’s accuracy, overall accuracy and McNemar’s tests.
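The accuracy measures named above can be derived from a confusion matrix as sketched below. The row and column convention (rows are classified classes, columns are ground truth classes) is an assumption, the function name is hypothetical, and the two-class matrix is made up for illustration rather than taken from this study.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square confusion matrix.

    Assumed layout: matrix[i][j] counts samples classified as class i (rows)
    whose ground truth class is j (columns). Every class is assumed to have
    at least one classified and one ground truth sample.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = diagonal / total
    # User's accuracy: correct samples as a share of each classified (row) total.
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Producer's accuracy: correct samples as a share of each ground truth (column) total.
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    # Kappa: agreement beyond what the row and column totals would give by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "users": users, "producers": producers, "kappa": kappa}


# Hypothetical two-class matrix with 36 ground truth samples per class.
example = accuracy_metrics([[30, 4], [6, 32]])
print(example)
```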

3.2.7 Significance of the differences in accuracy between the classification methods

The significance of the differences in accuracy between the methods was tested based on the confusion matrices, using the McNemar’s test. The McNemar’s test was used to compare each of the methods with the traditional method, which uses only the reflective bands for classification, to assess whether the other methods differed significantly in accuracy.

The McNemar’s test is a better statistic for comparing accuracies of classification methods than the Kappa index and it is simple to compute (Petropoulos et al., 2012; Adelabu, et al., 2013).

The Kappa chi-squared test requires that independent data are used to assess accuracies. In this study the same points were used for all methods, so the McNemar’s test was more appropriate; it is also more precise and sensitive (Manandhar et al., 2009).

Table 3.4: Comparison of two methods using the McNemar’s test

                                Method 2
                                Correctly classified  Wrongly classified
Method 1  Correctly classified  f11                   f12
          Wrongly classified    f21                   f22

McNemar’s Chi squared statistic was computed using Equation 3.1 as:

\[ Z^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}} \qquad \text{(Equation 3.1)} \]

where f12 denotes the number of cases that are correctly classified by classifier 1 but wrongly classified by classifier 2 (Table 3.4), and f21 denotes the number of cases that are wrongly classified by classifier 1 but correctly classified by classifier 2 (Petropoulos, et al., 2012). The differences in accuracy were tested at the 95% confidence level and deemed significant if |Z| > 1.96.

By comparing the error matrix of each analysis with that of Analysis I, we obtained the total number of cases correctly classified by the analysis and wrongly classified by Analysis I (f12), and vice versa (f21). The values of f12 and f21 thus obtained were used in Equation 3.1 to test whether the accuracy of each analysis was significantly different from that of Analysis I at the 95% confidence level.
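The test in Equation 3.1 reduces to a few lines of arithmetic, sketched below. The function names and the counts in the usage example are hypothetical and are not results from this study. Because the statistic is Z squared, the comparison is made against 1.96 squared (about 3.84), which is equivalent to |Z| > 1.96.

```python
def mcnemar_z_squared(f12, f21):
    """Return McNemar's statistic (f12 - f21)^2 / (f12 + f21) from Equation 3.1."""
    if f12 + f21 == 0:
        raise ValueError("f12 + f21 must be greater than zero")
    return (f12 - f21) ** 2 / (f12 + f21)


def is_significant(f12, f21, critical_z=1.96):
    """Compare the statistic with the squared critical value (1.96^2, about 3.84)."""
    return mcnemar_z_squared(f12, f21) > critical_z ** 2


# Hypothetical discordant counts for one analysis compared with Analysis I.
z2 = mcnemar_z_squared(f12=25, f21=10)
print(f"Z^2 = {z2:.3f}, significant: {is_significant(25, 10)}")
```

Only the discordant cells f12 and f21 enter the statistic; cases that both methods classify correctly or both classify wrongly do not affect it.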