Air quality impact analysis and modelling
3.5. DETERMINATION OF IMPACT
Gaussian models
Gaussian plume dispersion models are among the most commonly used air quality models based on the ’deterministic principle’. At the most basic level, they assume a constant emission rate from the source, a wind field that is uniform in time and with height, and no loss of material through gravitational settling. The ground surface is assumed not to absorb the pollutant, and turbulent diffusion in the x direction is neglected relative to advection in the direction of transport. These assumptions imply that the model should only be applied at wind speeds of more than 1 m/s.
The model requires the co-ordinate system to be oriented with the x-axis in the direction of the flow, so that the v (lateral) and w (vertical) components of the time-averaged wind vector are zero. The terrain underlying the plume must be flat, and all variables are ensemble-averaged, which implies long-term averaging under stationary conditions [Cheremisinoff, 1989] [Nieuwstadt, 1980] [Arya et al., 1999].
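$$
C(x, y, z) = \frac{Q}{2\pi u \sigma_y \sigma_z} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \left[\exp\left(-\frac{(z + H_e)^2}{2\sigma_z^2}\right) + \exp\left(-\frac{(z - H_e)^2}{2\sigma_z^2}\right)\right] \tag{3.11}
$$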
where $C$ is the concentration of the emission (g/m³) at a receptor located at $x$ (downwind distance from the source), $y$ (crosswind distance) and $z$ (height above ground); $Q$ is the source emission rate (g/s); $u$ is the horizontal wind velocity; $H_e$ is the plume centre line height above ground; $\sigma_z$ is the vertical standard deviation of the emission distribution; and $\sigma_y$ is the horizontal standard deviation of the emission distribution.
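As an illustration, equation (3.11) can be evaluated directly once the dispersion parameters are known. The sketch below is a minimal example rather than a validated dispersion model: the function name `gaussian_plume` and the input values are hypothetical, and the text does not specify how $\sigma_y$ and $\sigma_z$ vary with downwind distance, so they are assumed to be supplied by the caller for the receptor's value of $x$.

```python
import math


def gaussian_plume(q, u, y, z, h_e, sigma_y, sigma_z):
    """Concentration (g/m^3) at a receptor, following equation (3.11).

    q: emission rate (g/s); u: wind speed (m/s); y, z: crosswind and
    vertical receptor coordinates (m); h_e: plume centre line height (m);
    sigma_y, sigma_z: dispersion parameters (m) at the receptor's downwind
    distance, assumed to be supplied by the caller.
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    crosswind = math.exp(-y**2 / (2 * sigma_y**2))
    # Direct plume plus its mirror image, representing a non-absorbing ground.
    vertical = (math.exp(-(z - h_e)**2 / (2 * sigma_z**2))
                + math.exp(-(z + h_e)**2 / (2 * sigma_z**2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


# Hypothetical inputs, for illustration only.
c_ground = gaussian_plume(q=100.0, u=5.0, y=0.0, z=0.0,
                          h_e=50.0, sigma_y=80.0, sigma_z=40.0)
```

At ground level on the plume axis the two exponential terms in the bracket are equal, which is the doubling produced by the non-absorbing ground assumption.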
Figure 3.6: Gaussian plume
Numerical models
Cluster analysis- Cluster analysis is a technique for solving data classification problems. It was used by [Lee et al., 2007] in research on identifying pollution characteristics and how they correlated with public health issues in Taiwan. The procedure involves measuring either the distance or the similarity between the objects to be clustered, which reveals the natural clusters that exist in the data. Grouping each object with other similar objects ensures strong associations between ’like’ objects and weak associations between ’different’ ones. The [Lee et al., 2007] study used hierarchical clustering with complete linkage as the amalgamation rule and squared Euclidean distance as the metric. The squared Euclidean distance between two objects (regional air pollution characteristics), i and j, can be expressed as
$$
d_{ij}^2 = \sum_{k=1}^{m} (Z_{ik} - Z_{jk})^2 \tag{3.12}
$$
where $d_{ij}$ is the Euclidean distance, $Z_{ik}$ is the standardised value of $X_{ik}$, $Z_{jk}$ is the standardised value of $X_{jk}$ and $m$ is the number of pollutant types.
To establish pollution levels prior to cluster analysis, [Lee et al., 2007] standardised the descriptor variables (concentrations of PM10, SO2, NO2, CO and O3) by means of z-scores, to avoid any effect of the scale of the units on the distance measurements, applying the equation described by [Kowalkowski et al., 2006]:
$$
Z_{ik} = \frac{X_{ik} - \mu_k}{\sigma_k} \tag{3.13}
$$
where $Z_{ik}$ is the standardised value of $X_{ik}$, $X_{ik}$ is the original value of the measured parameter (the concentration of pollutant $k$ in region $i$), $\mu_k$ is the average value of pollutant $k$ and $\sigma_k$ is the standard deviation of pollutant $k$.
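The standardisation in (3.13), the distance in (3.12) and the complete-linkage rule can be sketched together as follows. This is an illustrative example only, not the procedure or software used by [Lee et al., 2007]: the function names and the four-region data set are hypothetical, and the population standard deviation is assumed because the text does not say which form was used.

```python
from statistics import mean, pstdev


def standardise(data):
    """Column-wise z-scores, equation (3.13). Rows: regions; columns: pollutants."""
    columns = list(zip(*data))
    mus = [mean(col) for col in columns]
    # Assumption: population standard deviation; a constant column would divide by zero.
    sigmas = [pstdev(col) for col in columns]
    return [[(x - mu) / s for x, mu, s in zip(row, mus, sigmas)] for row in data]


def squared_distance(z_i, z_j):
    """Squared Euclidean distance between two standardised rows, equation (3.12)."""
    return sum((a - b) ** 2 for a, b in zip(z_i, z_j))


def complete_linkage(z, n_clusters):
    """Merge clusters until n_clusters remain, using complete linkage."""
    clusters = [[i] for i in range(len(z))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(squared_distance(z[i], z[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


# Hypothetical data: four regions, two pollutants in different units.
regions = [[10.0, 1.0], [11.0, 1.1], [50.0, 5.0], [52.0, 5.2]]
z = standardise(regions)
clusters = complete_linkage(z, n_clusters=2)
```

Because the two columns are standardised first, the pollutant reported in the larger units does not dominate the distance between regions.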
Principal component analysis- Principal component analysis is a data reduction technique used to find linear combinations of the original variables (known as the principal components) which account for as much of the original total variance as possible [Statheropoulos et al., 1998]. Successive components are extracted so that each accounts for a progressively smaller share of the total variance.
Principal components can be expressed by
$$
PC_i = a_{1i} V_1 + a_{2i} V_2 + \dots + a_{ni} V_n \tag{3.14}
$$
where $PC_i$ is principal component $i$ and $a_{ji}$ is the loading (correlation coefficient) of the original variable $V_j$ [Jolliffe, 2011] [Wold et al., 1987] [Malinowski, 1987].
[Singh et al., 2013] describe the process of using Principal component analysis in their research on identifying pollutant sources: "PCA extracts the eigenvalues and eigenvectors from the covariance matrix of original variables. The principal components (PCs) are the uncorrelated (orthogonal) variables, obtained by multiplying the original correlated variables with the eigenvector (loadings). The eigenvalues of the PCs are the measure of their associated variance, the loadings give the participation of the original variables in the PCs, and the individual transformed observations are called scores.
PCA was performed on the complete data set standardised through z-scale transformations. Standardisation tends to minimise the influence of variance difference of variables and eliminates the effect of different units of measurement and renders the data dimensionless. PCA was performed to identify the pollution sources in the study area and to understand the influences of meteorological parameters on their levels."
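The steps in the quoted description (standardise, take the covariance matrix, extract eigenvalues and eigenvectors, project the observations) can be sketched with NumPy. This is a minimal example under stated assumptions, not the implementation used by [Singh et al., 2013]: the function name `pca` and the data are hypothetical, and the sample standard deviation (`ddof=1`) is assumed so that the covariance matrix of the z-scores is the correlation matrix.

```python
import numpy as np


def pca(data):
    """PCA of z-scored data via eigendecomposition of its covariance matrix."""
    x = np.asarray(data, dtype=float)
    # z-scale transformation: zero mean and unit variance in every column.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)
    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                  # transformed observations
    explained = eigvals / eigvals.sum()   # share of total variance per component
    return eigvals, eigvecs, scores, explained


# Hypothetical observations (rows) of three pollutants (columns).
data = [[40, 12, 0.8], [55, 15, 1.1], [38, 11, 0.7],
        [70, 21, 1.6], [62, 18, 1.2], [45, 14, 0.9]]
eigvals, loadings, scores, explained = pca(data)
```

The columns of `loadings` are the eigenvectors, and each column of `scores` has a variance equal to the corresponding eigenvalue.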
Pearson’s correlation coefficient- [Vardoulakis and Kassomenos, 2008] used Pearson correlation analysis, as well as principal component analysis (PCA), in their study on the sources and factors affecting PM10 in two European cities. Pearson correlation is the covariance of two variables divided by the product of their standard deviations. The formula can take many forms depending on its application, but when applied to a sample, where it is referred to as r, it is generally:
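$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{3.15}
$$
where $n$ is the sample size, $x_i$, $y_i$ are the individual sample points indexed with $i$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean), and analogously for $\bar{y}$.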
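A direct transcription of (3.15) is sketched below for illustration. The function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient, following equation (3.15)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = (math.sqrt(sum((x - x_bar) ** 2 for x in xs))
                   * math.sqrt(sum((y - y_bar) ** 2 for y in ys)))
    if denominator == 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / denominator


# Hypothetical paired measurements, for illustration only.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```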
[Shaddick et al., 2018] utilise a model to correlate air-quality measurements with satellite-based aerosol optical depth and chemical transport model output.
Canonical Correlation Analysis is a mathematical technique which can be used to investigate whether a correlation exists between, for instance, meteorological data and air-quality data. Mathematically, the aim is to establish the maximum correlation between sets of variables. With two sets of variables, for instance, the approach is to run principal component analysis on each set to remove multicollinearity. Each resulting component is then given a weighting chosen to maximise the correlation between the datasets. This results in canonical variables, which are uncorrelated with one another [Jendoubi and Strimmer, 2018].
This approach saw good results in the [Statheropoulos et al., 1998] study, which applied it to five years’ worth of data on CO, NO, NO2, smoke and SO2 collected in Athens. The results showed principal component analysis, used along with Canonical Correlation Analysis, to be an effective tool for correlating air-quality and meteorological data.
PCA with a varimax rotation was applied to the air-quality data from each monitoring station separately in [Vardoulakis and Kassomenos, 2008], followed by linear regression (least squares) analysis and some further standard statistical calculations.
The HYSPLIT (version 4.7) model [Stein et al., 2015] was used with meteorological data in order to calculate kinematic 3D back trajectories of air parcels arriving at different points.
[Diapouli et al., 2017] used positive matrix factorisation, in the form of a two-way PMF (2PMF) [Paatero, 2000], for source apportionment in order to determine the impact of a measured factor on air quality. Running it on both older and more recent datasets harmonised the data so that comparable results could be achieved.
Positive Matrix Factorisation - Positive matrix factorisation (PMF) was used in other air-quality research by [Lee et al., 1999] to identify suspended particulates over a two-year data period in Hong Kong. PMF differs from principal component analysis (PCA) in that it constrains the factors (loadings and scores) to be non-negative and uses a point-by-point least-squares minimisation scheme. This has the benefit of allowing the profiles produced to be compared to the input matrix without further calculations.
In PMF, any data matrix $X$ of dimension $n$ rows and $m$ columns (where $n$ and $m$ are the numbers of samples and species, respectively) can be factorised into two matrices, namely $Z$ ($n \times p$) and $C$ ($p \times m$), and the unexplained part of $X$, $E$, where $p$ represents the number of factors to be extracted:
$$
X = ZC + E \tag{3.16}
$$
The product of $Z$ and $C$ can explain the systematic variation in $X$. The model makes use of information from all samples by weighting the squares of the residuals with the reciprocals of the squares of the standard deviations of the data values. Ordinarily, the emissions data involved will have large standard deviations, so their weight coefficients are smaller than in unweighted models such as principal component analysis. The aim of positive matrix factorisation is to minimise $Q$, as shown by the following equation:
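$$
Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{e_{ij}^2}{s_{ij}^2} \tag{3.17}
$$
subject to $z_{ik} \geq 0$ and $c_{kj} \geq 0$, where $z_{ik}$ and $c_{kj}$ are elements of $Z$ and $C$, respectively. The residuals, $e_{ij}$, are defined by:
$$
e_{ij} = x_{ij} - \sum_{k=1}^{p} z_{ik} c_{kj} \tag{3.18}
$$
and $s_{ij}$ is the standard deviation of $x_{ij}$.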
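To make the objective concrete, the sketch below evaluates $Q$ from (3.17) and (3.18) for a candidate pair of factor matrices. It only computes the quantity that PMF minimises; it does not perform the constrained minimisation itself. The function name `pmf_objective` and all numerical values are hypothetical.

```python
def pmf_objective(x, z, c, s):
    """Weighted sum of squared residuals Q, equations (3.17) and (3.18).

    x: n-by-m data matrix; z: n-by-p scores; c: p-by-m loadings;
    s: n-by-m standard deviations of the data values (all as nested lists).
    """
    if any(v < 0 for row in z for v in row) or any(v < 0 for row in c for v in row):
        raise ValueError("PMF requires non-negative elements in Z and C")
    n, m, p = len(x), len(x[0]), len(c)
    q = 0.0
    for i in range(n):
        for j in range(m):
            # Residual e_ij: the part of x_ij not explained by the p factors.
            e_ij = x[i][j] - sum(z[i][k] * c[k][j] for k in range(p))
            q += (e_ij / s[i][j]) ** 2
    return q


# Hypothetical example: two samples, two species, one factor.
x = [[3.0, 4.0], [3.0, 6.0]]
z = [[1.0], [1.5]]
c = [[2.0, 4.0]]
s = [[0.5, 0.5], [0.5, 0.5]]
q = pmf_objective(x, z, c, s)
```

Here only $x_{11}$ deviates from the product $ZC$, by 1.0, so with $s_{11} = 0.5$ the single non-zero term gives $Q = 4$.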
Grey Correlation Model- The grey correlation model, used by [Pan et al., 2011], analyses which known factors have the most impact on air quality. Air-quality forecasting can pose difficulties because the data are obtained over limited time and space and are often incomplete and/or may contain inaccuracies [Vallero, 2014]. [Pan et al., 2011] regard the air environment system as a grey system and use grey system theory as an analysis method in order to weaken the effect of unknown information through available information, thus giving a more accurate picture of the air quality system as a whole.
Their impact factor analysis uses a grey relational model to analyse a number of impacts in order to determine the annual concentrations of PM10, SO2 and NO2 and the air quality standard rate. Calculated this way, a correlation is obtained between each impacting factor and each air quality concentration, with a high degree of accuracy as shown by application to historical data. The [Pan et al., 2011] model uses index values as a reference series:
$$
X_0 = (x_0(1), x_0(2), \dots, x_0(9)) \tag{3.19}
$$
and the factors affecting air quality as a series:
$$
X_i = (x_i(1), x_i(2), \dots, x_i(9)) \tag{3.20}
$$
To discover which factors in these series have a negative correlation with the reference series, the series is turned into its reciprocal form:
$$
X_i^D = (1/x_i(1), 1/x_i(2), \dots, 1/x_i(9)) \tag{3.21}
$$
Thus, the relational coefficient between the reference and factor series at time $k$ is:
$$
\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|} \tag{3.22}
$$
The grey correlation is:
$$
\upsilon_i = \frac{1}{m} \sum_{k=1}^{m} \xi_i(k) \tag{3.23}
$$
where $m$ is the number of points in each series.
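The relational coefficient and the grey correlation can be sketched as below. Several points are assumptions rather than details taken from [Pan et al., 2011]: the coefficient $\rho$ is not given a value in the text, so the commonly used 0.5 is assumed; the grey correlation of each factor is taken as the average of its coefficients over the time index $k$, which is the usual grey relational grade; and the function name and series values are hypothetical.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Grey correlation of each factor series with the reference series.

    reference: the series X_0; factors: a list of series X_i of equal length;
    rho: coefficient in (3.22), assumed here to be 0.5.
    """
    # Absolute differences |x_0(k) - x_i(k)| for every factor i and time k.
    deltas = [[abs(x0 - xi) for x0, xi in zip(reference, series)]
              for series in factors]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every factor series coincides with the reference series.
        return [1.0 for _ in factors]
    grades = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Average the relational coefficients over the time index k.
        grades.append(sum(coefficients) / len(coefficients))
    return grades


# Hypothetical series, for illustration only.
reference = [1.0, 2.0, 3.0]
factors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
grades = grey_relational_grades(reference, factors)
```

A factor series identical to the reference series receives a grey correlation of 1, and larger departures from the reference give smaller values.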