2.2 Atmospheric Classification
2.2.1 Unsupervised Classification Techniques
Unsupervised classification uses self-learning techniques to derive classes that represent different states in the variable domain. The classes are linked to a variable of interest only after they have been derived. For example, in the atmospheric sciences the classes consist of gridded pressure fields and are linked to surface variables such as precipitation or temperature (e.g. Bárdossy et al., 1995; Hewitson & Crane, 2002; Huth et al., 2008). The following sections review common unsupervised classification techniques (Huth et al., 2008).
Self Organizing Maps
Self-organizing maps (SOMs) are a class of artificial neural network models (Kohonen, 1989, 1990, 1991, 1995). Kohonen (1990) describes the technique as a process in which a network of connected nodes evolves so as to best describe an input signal. The signal is presented to the network and the nodes update their positions through a learning rule, producing a simplified yet detailed description of the input signal (Hewitson & Crane, 2002; Kohonen, 1990).
The learning rule uses a Euclidean distance measure to quantify how closely each node matches the input (Hewitson & Crane, 2002).
In many ways SOMs are similar to traditional cluster analysis techniques in that the SOM places its nodes so that they represent the distribution of the data points (Hewitson & Crane, 2002). For example, nodes are densely spaced where data points are densely spaced and vice versa. However, Hewitson & Crane (2002) argue that the two differ in two fundamental ways: first, in the way the groups are defined, and second, in that the SOM, viewed collectively, describes the structure of the data set. For example, if two states commonly occur together in the data set, their associated nodes will be located near each other. The primary aim of the SOM technique is therefore not to identify individual groups or clusters within the data set. Rather, the aim is to describe the data set as well as possible; if groups or clusters exist within the data set, they are then reflected in the SOM (Hewitson & Crane, 2002).
For example, the SOM technique as applied to atmospheric classification can be described as follows (Hewitson & Crane, 2002; Skivic & Francis, 2012):
Consider a data set consisting of a time series of pressure on a 5 × 10 grid. The SOM has n nodes, each associated with a reference vector of 50 coefficients (one per grid point). The size of the SOM (the number of nodes) is user defined; a larger SOM is expected to represent the data in more detail than a smaller, more general SOM. For each time realization the pressure data are presented to the SOM, and the similarity between the data and the reference vector of each node is calculated. The reference vector of the best-matching node is then moved towards the data by an amount controlled by a user-defined learning rate. The advantage of the SOM technique over cluster analysis is that the nodes surrounding the best-matching node are also adjusted. The updating scheme is given as (Kohonen, 1990)
$$m_i(t+1) = m_i(t) + h_{ci}(t)\,\left[x(t) - m_i(t)\right] \tag{2.17}$$
where $m_i(t)$ is the reference vector for node $i$ at time $t$, $x(t)$ is the data at time $t$, and $h_{ci}(t)$ is a neighborhood function, relating to the best-matching node $c$, applied to the $i$th node. It is given as
$$h_{ci}(t) = \alpha(t)\,\exp\left(-\frac{\lVert r_c - r_i \rVert^{2}}{2\sigma^{2}(t)}\right) \tag{2.18}$$
where $\alpha(t)$ is a user-defined learning rate that decreases with time, $\lVert r_c - r_i \rVert$ is the distance between the positions $r_c$ and $r_i$ of the best-matching node $c$ and node $i$ on the map, and $\sigma(t)$ is referred to as the radius of training and defines which nodes are updated.
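The training loop described above, with the update of Eq. (2.17) and the Gaussian neighborhood of Eq. (2.18), can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by Kohonen (1990) or Hewitson & Crane (2002): the function names are hypothetical, the nodes are assumed to sit on a rectangular map grid, the reference vectors are assumed to be initialised from randomly chosen inputs, and both $\alpha(t)$ and $\sigma(t)$ are assumed to decay linearly (with $\sigma$ floored at 0.5).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def best_matching_node(weights, x):
    """Index c of the node whose reference vector is closest to x."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], x))


def neighborhood(r_c, r_i, alpha, sigma):
    """Gaussian neighborhood h_ci(t) of Eq. (2.18); r_c, r_i are map-grid positions."""
    d2 = sum((a - b) ** 2 for a, b in zip(r_c, r_i))
    return alpha * math.exp(-d2 / (2.0 * sigma ** 2))


def som_step(weights, positions, x, alpha, sigma):
    """Apply Eq. (2.17) to every node for one input x(t); updates weights in place."""
    c = best_matching_node(weights, x)
    for i, m_i in enumerate(weights):
        h = neighborhood(positions[c], positions[i], alpha, sigma)
        weights[i] = [m + h * (xv - m) for m, xv in zip(m_i, x)]
    return c


def train_som(data, rows, cols, n_iter=1000, alpha0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM (hypothetical helper; linear decay is an assumption)."""
    rng = random.Random(seed)
    positions = [(r, q) for r in range(rows) for q in range(cols)]
    # Assumption: initialise each reference vector with a randomly chosen input.
    weights = [list(rng.choice(data)) for _ in positions]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)            # learning rate decreases with time
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius of training shrinks, floored
        x = rng.choice(data)                     # present one time realization
        som_step(weights, positions, x, alpha, sigma)
    return weights, positions
```

For the 5 × 10 pressure example, each input would be a flattened list of 50 pressure values, and a call such as `train_som(pressure_fields, rows=4, cols=5)` would fit a 20-node map, where `pressure_fields` is a hypothetical list of such vectors. Each time realization can afterwards be assigned to a node with `best_matching_node(weights, x)`; surface variables are linked to the nodes only at that later stage.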
The use of SOMs in atmospheric science has received significant attention (Hewitson & Crane, 2002; Hong et al., 2005; Skivic & Francis, 2012). The advantage of SOMs in synoptic climatology is that the data are treated as a continuum (Hewitson & Crane, 2002). This provides a physically meaningful framework in which to evaluate a complex environment: atmospheric circulation is not a set of clearly defined states but forms a continuum in which systems transition smoothly between states (Huth et al., 2008). However, a drawback of the method is that links to surface variables are only made after classification. In other words, the SOM technique is a top-down approach: atmospheric states are derived first and only then matched to the occurrence of a particular surface variable.
Cluster Analysis
Cluster analysis methods are commonly used in classification and are arguably the most natural approach (Huth et al., 2008). The k-means clustering algorithm is a popular approach, although using simulated annealing as the optimization tool in cluster analysis has been shown to improve on it (Huth et al., 2008; Phillip et al., 2007). A drawback of the k-means technique is that it is sensitive to the initial choice of cluster centroids, which makes it unstable (Huth et al., 2008).
The aim of the k-means method is to maximise the similarity between data that belong to the same cluster, or equivalently to minimise the within-type sum of squares (Huth et al., 2008). Two algorithms are commonly used to achieve this: the first was developed by Lloyd (1982) and the second by Hartigan & Wong (1979). In the Lloyd (1982) algorithm the data are randomly divided into k clusters and a locally optimal solution is found by alternating between two steps (Slonim et al., 2013): each datum is assigned to the nearest cluster centroid, after which the cluster centroids are updated (Lloyd, 1982; Slonim et al., 2013). The Hartigan & Wong (1979) algorithm searches for a partition into k clusters with a locally optimal within-type sum of squares. It does so by moving points between clusters, evaluating the resulting within-type sum of squares, and accepting the moves that lower it.
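The two k-means strategies described above can be contrasted in a short plain-Python sketch. `lloyd_kmeans` follows the Lloyd (1982) scheme of a random initial partition followed by alternating assignment and centroid-update steps. `hartigan_refine` is a deliberately simplified, brute-force version of the Hartigan & Wong (1979) idea: it moves single points between clusters and keeps a move only if it lowers the within-type sum of squares, without the efficient bookkeeping of the published algorithm. All function names are hypothetical, and reseeding an empty cluster with a random data point is an assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def members_of(data, labels, j):
    """All points currently assigned to cluster j."""
    return [x for x, lab in zip(data, labels) if lab == j]


def within_ss(data, labels, k):
    """Total within-type (within-cluster) sum of squares of a partition."""
    total = 0.0
    for j in range(k):
        pts = members_of(data, labels, j)
        if pts:
            c = centroid(pts)
            total += sum(sq_dist(x, c) for x in pts)
    return total


def lloyd_kmeans(data, k, max_iter=100, seed=0):
    """Lloyd-style k-means starting from a random partition of the data."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    for _ in range(max_iter):
        # Update step: recompute each centroid (assumption: reseed empty clusters).
        cents = []
        for j in range(k):
            pts = members_of(data, labels, j)
            cents.append(centroid(pts) if pts else list(rng.choice(data)))
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, cents[j])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def hartigan_refine(data, labels, k, max_sweeps=50):
    """Simplified Hartigan-style refinement: move single points between clusters,
    keeping a move only if it lowers the total within-type sum of squares."""
    labels = list(labels)
    best = within_ss(data, labels, k)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(data)):
            for j in range(k):
                if j == labels[i]:
                    continue
                old = labels[i]
                labels[i] = j
                trial = within_ss(data, labels, k)
                if trial < best - 1e-12:
                    best, improved = trial, True  # accept the move
                else:
                    labels[i] = old               # reject and restore
        if not improved:
            break
    return labels
```

Because `lloyd_kmeans` starts from a random partition, different seeds can converge to different local optima, which illustrates the instability noted by Huth et al. (2008). Passing its labels to `hartigan_refine` can only keep or lower the within-type sum of squares.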
Principal Component Analysis
Principal component analysis (PCA) is an eigentechnique that has two uses in classification (Huth et al., 2008; Richman, 1986). First, it can be used as a pre-processing tool before classification, both to reduce collinearity between variables and to compress the data. Second, the technique can itself serve as a classification method (Richman, 1986). For PCA to be used as a classification tool, the data matrix must be organized so that the gridded values are in rows and the time realizations are in columns (referred to as T-mode) (Compagnucci & Richmond, 2008; Huth et al., 2008). For example, in atmospheric classification the gridded pressures form the rows while the time realizations of the CP form the columns. If instead the data matrix is configured so that the gridded values are the columns and the time realizations are the rows (referred to as S-mode), PCA detects only modes of variability that are not representative of individual circulation patterns (e.g. Huth et al., 2008).
A description of T-mode and S-mode PCA is given in Compagnucci & Richmond (2008). For T-mode PCA the data matrix $X_t$ is of order $(t \times n)$, where $t$ is the length of the record and $n$ is the size of the grid. PCA solves the decomposition $X_t = F A^{T}$, where $F$ is the matrix of principal components and $A$ is the matrix of loadings relating the components in $F$ to the input variables in $X_t$; for example, $A$ could contain the correlations between the components and the input variables.
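The T-mode arrangement can be illustrated with a small plain-Python sketch that places grid points in rows and time realizations in columns, computes the correlation matrix between time realizations, and extracts the leading component by power iteration. This is an assumption-laden toy, not the procedure of Compagnucci & Richmond (2008): it retains only the first component, applies no rotation, uses power iteration in place of a full eigendecomposition, and requires every field to have non-zero spatial variance. The function names are hypothetical.

```python
import math
import random


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def standardized_columns(m):
    """Each column of m, centred to zero mean and scaled to unit sample variance."""
    out = []
    for col in zip(*m):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def correlation_matrix(m):
    """Correlation matrix between the columns (variables) of m."""
    z = standardized_columns(m)
    dof = len(m) - 1
    return [[sum(a * b for a, b in zip(zi, zj)) / dof for zj in z] for zi in z]


def mat_vec(r, v):
    return [sum(rij * vj for rij, vj in zip(row, v)) for row in r]


def leading_eigenpair(r, n_iter=200, seed=0):
    """Power iteration for the dominant eigenvalue/eigenvector of a symmetric PSD r."""
    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in r]
    for _ in range(n_iter):
        w = mat_vec(r, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(r, v)))  # Rayleigh quotient
    return lam, v


def t_mode_first_component(fields):
    """fields: t time realizations, each a flattened grid of n values."""
    x = transpose(fields)      # n x t: grid points in rows, time realizations in columns
    r = correlation_matrix(x)  # t x t correlations between time realizations
    lam, v = leading_eigenpair(r)
    loadings = [math.sqrt(lam) * vi for vi in v]  # correlation of each time with the PC
    explained = lam / len(r)                      # trace of r equals t
    return loadings, explained
```

The returned loadings are the correlations between each time realization and the component (the eigenvector scaled by the square root of its eigenvalue), and their sign is arbitrary. In a classification, several components would normally be retained; one common convention is to assign each time realization to the component on which it loads most strongly. Passing `fields` to `correlation_matrix` without transposing (S-mode) would instead give correlations between grid points.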