DATA PREPARATION AND MEASUREMENT MODEL
5.4 Data Preparation and Screening
5.4.1 Missing Data
The screening of the returned questionnaires revealed a minimal amount of missing data in this study: only 18 questionnaires had any missing values.
A number of approaches exist for handling missing data, although experts differ in their recommendations concerning which technique to use under varying degrees of randomness in the missing data. These include listwise deletion, pairwise deletion, and imputation techniques. Listwise deletion removes from the statistical analysis all cases that have any missing data, while pairwise deletion excludes a case only from those analyses for which its data are missing. Imputation techniques substitute for a missing value some other value that is deemed a reasonable estimate (Hair et al., 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014).
This study used the listwise approach to deal with the missing data problem.
The listwise approach is simple and is perfectly appropriate if the number of deleted incomplete cases is small. Moreover, listwise deletion leads to unbiased parameter estimates if the data are missing completely at random (Hair et al., 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014). Therefore, the 18 questionnaires with missing data were discarded entirely. This reduced the sample size but had no significant influence on the richness of the data, and the number of remaining cases was adequate for the analysis. As a result, a total of 232 valid questionnaires were used for the analysis in this study.
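To make the procedure concrete, the following is a minimal sketch of listwise deletion in Python using pandas. The file name, and therefore the case counts it prints, are illustrative assumptions rather than the actual study data, which were processed in SPSS.

```python
import pandas as pd

# Load the survey responses (the file name is illustrative, not the study's).
df = pd.read_csv("survey_responses.csv")

# Listwise deletion: drop every case (row) with at least one missing value.
complete = df.dropna(axis=0, how="any")

print(f"Cases before deletion: {len(df)}")
print(f"Cases after listwise deletion: {len(complete)}")  # 232 in this study
```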
5.4.2 Outliers
Outliers are cases with extreme or unusual values, and they can be univariate or multivariate (Hair et al., 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014). Univariate outliers are data points with extreme values on a single variable, whereas multivariate outliers are data points with extreme values on a combination of variables. Outliers can result from several causes, such as data entry errors or improper attribute coding, extraordinary events or unusual circumstances, sampling errors, intentional or motivated misreporting by respondents, or legitimate observations from the correct population being sampled (Hair et al., 2010; Tabachnick and Fidell, 2014). In this study, a response at either end of the Likert scale, from "strongly disagree" to "strongly agree", could appear as an outlier simply because it sits at an extreme point of the scale. Hence, evaluating univariate outliers was not meaningful here, as respondents may legitimately have held different or extreme opinions on a particular issue.
With regard to multivariate outliers, this study used the Mahalanobis D² statistic. Mahalanobis D² evaluates the position of each observation relative to the centre of all observations on a set of variables (Hair et al., 2010). An observation is considered a potential outlier when its D² value has a probability of 0.001 or less; in SPSS, the D² values are compared against a Chi-square (χ²) distribution with degrees of freedom equal to the number of variables, using a table of critical values at p < 0.001 (Hair et al., 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014). However, Stoimenova, Mateev and Dobreva (2006) are of the view that observations with Mahalanobis D² probabilities of 0.001 are not necessarily outliers and could still belong to the data distribution. Using AMOS, Mahalanobis D² was computed and eight (8) observations were found to be outliers (see table 5.3).
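As a rough illustration of the decision rule described above, the following Python sketch computes Mahalanobis D² for each case and flags those whose Chi-square upper-tail probability falls below 0.001. The data here are randomly generated stand-ins, not the study data; AMOS was the tool actually used.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_d2(X: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row from the variable means."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Quadratic form diff' * inv_cov * diff, computed row by row.
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Stand-in data: 232 cases on 10 variables (random, not the study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(232, 10))

d2 = mahalanobis_d2(X)
p = chi2.sf(d2, df=X.shape[1])  # df = number of variables, as in the text

# Flag observations whose upper-tail probability falls below 0.001.
print("Potential multivariate outliers:", np.where(p < 0.001)[0])
```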
While some researchers believe that the best way to deal with outliers is to delete them (Osborne and Overbay, 2004), Hair et al. (2010) argued that deleting outliers might improve the multivariate analysis but at the risk of limiting generalisability. Therefore, in order to ensure generalisability to the entire population, outliers should be retained if they depict a representative segment of the population (Hair et al., 2010). The presence of a few outliers within a large sample is not a big concern (Kline, 2005). The decision can also be to retain these values as they can explain the uniqueness of the observation (Pallant, 2010). Accordingly, I decided to retain them.
Table 5.3 Mahalanobis D² for Outliers

Observation Number   Mahalanobis D²-Distance   p1   p2
93                   127.533                   0    0
116                  106.897                   0    0
98                   106.341                   0    0
43                   104.625                   0    0
64                   99.626                    0    0
172                  98.802                    0    0
182                  94.564                    0    0
117                  93.622                    0    0
5.4.3 Normality
Normality is the most fundamental assumption underlying multivariate data analysis. Normality is the extent to which a variable or data distribution corresponds to the shape of a normal distribution, the benchmark for statistical methods (Hair et al., 2010). Although perfect normality is difficult to achieve, assessing it is important because sufficiently large departures from the normal distribution can render statistical results and tests invalid (Hair et al., 2010). Checking for univariate normality is the common approach, since multivariate normality requires that the individual variables are univariately normal and that their combinations are also normal (Hair et al., 2010). Univariate normality can be assessed with statistical or graphical approaches. The statistical approach uses kurtosis and skewness. Kurtosis refers to the "peakedness" or "flatness" of a distribution compared with the normal distribution: a positive kurtosis indicates that the distribution is more peaked than the normal distribution, whereas a negative kurtosis indicates that it is flatter.
Skewness, on the other hand, refers to the degree of symmetry of a distribution around its mean: in a positively-skewed distribution the long tail is to the right, while a negatively-skewed distribution has its long tail on the left.
A normally-distributed variable will generate skewness and kurtosis values that hover around ±1.0 (Meyers, Gamst and Guarino, 2013). In addition, the Kolmogorov-Smirnov and Shapiro-Wilk tests can be used as statistical tests; statistical significance at an alpha level of p < 0.001 indicates a possible violation of univariate normality (Pallant, 2007; Hair et al., 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014). Graphical approaches to assessing univariate normality use histograms and stem-and-leaf plots for each variable.
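The following Python sketch illustrates these statistical checks for a set of items; the item scores here are randomly generated placeholders (the actual analysis used SPSS, and the real item names appear in table 5.4).

```python
import numpy as np
from scipy import stats

# Stand-in item scores; real column names (Qua1, TM1, ...) are in table 5.4.
rng = np.random.default_rng(42)
items = {"Item1": rng.normal(size=232), "Item2": rng.normal(size=232)}

for name, x in items.items():
    skew = stats.skew(x)
    kurt = stats.kurtosis(x)               # excess kurtosis; normal = 0
    _, p_sw = stats.shapiro(x)             # Shapiro-Wilk test
    _, p_ks = stats.kstest(stats.zscore(x), "norm")  # Kolmogorov-Smirnov test
    print(f"{name}: skewness={skew:.2f}, kurtosis={kurt:.2f}, "
          f"Shapiro-Wilk p={p_sw:.3f}, K-S p={p_ks:.3f}")
```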
Table 5.4 presents the skewness and kurtosis of the items in this study. The results show that most of the skewness values (except those for Cost1, Cost2, Cost3, Quit1, Quit2, Quit3, Quit4, Count1, Count2, Count3, and Count4) were positive and close to zero (mostly between 0.05 and 0.99), indicating a very slight positive skew with the long tail to the right, while the skewness values for Cost1, Cost2, Cost3, Quit1, Quit2, Quit3, Quit4, Count1, Count3, and Count4 were negative and reasonably close to zero (between -0.22 and -0.97), indicating a very slight negative skew with the long tail to the left. However, Count2 had a skewness value of -1.26, while Qua1 had 1.38. Furthermore, most of the kurtosis values (except those for Qua2, Inno1, Cost1, Cost2, Cost3, TM4, Task1, Task2, Task3, Task4, Adapt1, Adapt2, Adapt3, Adapt4, Satis1, Satis2, Satis3, Cont1, Cont2, Cont3, Cont4, Count1, Count3, and Count4) were negative and roughly close to zero (between -0.01 and -0.88), indicating a slightly flat distribution with few cases at the extremes.
On the other hand, the kurtosis values for Qua2, Inno1, Cost1, Cost2, Cost3, TM4, Task1, Task2, Task3, Task4, Adapt1, Adapt2, Adapt3, Adapt4, Satis1, Satis2, Satis3, Cont1, Cont2, Cont3, Cont4, Count1, Count3, and Count4 were positive and roughly close to zero (mostly between 0.08 and 0.97), indicating a slightly peaked distribution with cases clustered near the centre. In addition, Qua1, Comit1, Comit2, Comit3, and Count2 had kurtosis values of between ±1.02 and ±2.08, indicating a somewhat stronger departure from normality. Overall, none of the items showed extreme skewness or kurtosis, as none of the values exceeded ±3.0. Based on this, and the fact that the values were not extreme, multivariate normality could be assumed. Indeed, Hair et al. (2010) argued that significant departures from normality may be negligible and have no severe impact on the results when sample sizes exceed 200. Therefore, none of these variables was transformed.
Table 5.4 Assessment of Normality

Construct                          Item     Skewness   Kurtosis
Quality                            Qua1     1.38       2.08
                                   Qua2     0.93       0.30
Innovation                         Inno1    0.95       0.34
                                   Inno2    0.66       -0.28
                                   Inno3    0.87       0.20
Cost                               Cost1    -0.83      0.66
                                   Cost2    -0.93      0.71
                                   Cost3    -0.97      0.81
Perceived Organisational Support   POS1     0.54       -0.42
                                   POS2     0.58       -0.25
                                   POS3     0.48       -0.32
                                   POS4     0.62       -0.01
                                   POS5     0.529      -0.77
                                   POS6     0.45       -0.54
Adaptive Performance               Adapt1   0.90       0.62
                                   Adapt2   0.84       0.45
                                   Adapt3   0.92       0.54
                                   Adapt4   0.90       0.51
                                   Adapt5   0.36       0.25
                                   Adapt6   0.24       0.97
Counterproductive Performance      Count1   -0.87      0.30
                                   Count2   -1.26      1.28
                                   Count3   -0.90      0.43
                                   Count4   -0.90      0.34
Commitment                         Comit1   0.34       -1.02
                                   Comit2   0.41       -1.02
                                   Comit3   0.05       -1.03
Satisfaction                       Satis1   0.73       0.40
                                   Satis2   0.99       0.56
                                   Satis3   0.73       0.08
Talent Management                  TM1      0.72       -0.11
                                   TM2      0.89       -0.01
                                   TM3      0.91       -0.03
                                   TM4      0.93       0.74
                                   TM5      0.63       -0.35
Task Performance                   Task1    0.66       0.74
                                   Task2    0.77       0.67
                                   Task3    0.91       0.66
                                   Task4    0.93       0.47
                                   Task5    0.889      0.199
Contextual Performance             Cont1    0.97       0.66
                                   Cont2    0.97       0.59
                                   Cont3    0.98       0.68
                                   Cont4    0.91       0.43
                                   Cont5    0.36       2.529
                                   Cont6    0.54       0.10
                                   Cont7    0.91       0.05
                                   Cont8    1.085      1.133
Intention to Quit                  Quit1    -0.48      -0.68
                                   Quit2    -0.22      -0.88
                                   Quit3    -0.59      -0.38
                                   Quit4    -0.50      -0.70
5.4.4 Multicollinearity, Linearity, and Homoscedasticity
Multicollinearity is a condition that exists when two or more predictors correlate very strongly. A number of ways are available to identify multicollinearity.
They include the correlation matrix, the tolerance value, and the variance inflation factor [VIF] (Hair et al., 2010; Pallant, 2010; Meyers, Gamst and Guarino, 2013; Tabachnick and Fidell, 2014). In the correlation matrix, multicollinearity exists when the correlations between the variables in the analysis are very strong.
Even though there is disagreement concerning how strong a relationship is problematic, correlations of 0.80 and above raise a red flag. With respect to tolerance and VIF, tolerance values below 0.10 and VIF values above 10 are commonly taken to indicate problematic multicollinearity. In this study, I employed the correlation matrix to assess multicollinearity because of the difficulty of selecting one variable in this study as the dependent variable. The results, as displayed in table 5.15, show that the correlations between the variables did not exceed 0.80, indicating that the multicollinearity assumption was not violated.
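By way of illustration, the following Python sketch shows both diagnostics described above, the correlation matrix and tolerance/VIF values; the construct names and the randomly generated scores are placeholders, not the study data, and VIF was not the criterion actually applied here.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Stand-in construct scores; names are placeholders for the study's variables.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(232, 4)),
                  columns=["TalentMgmt", "POS", "Commitment", "Satisfaction"])

# Correlation matrix: correlations of 0.80 or above raise a red flag.
corr = df.corr()
print(corr.round(2))

# Tolerance and VIF: tolerance < 0.10 or VIF > 10 signals multicollinearity.
X = add_constant(df)  # VIF is computed against a model with an intercept
for i, col in enumerate(df.columns, start=1):
    vif = variance_inflation_factor(X.values, i)
    print(f"{col}: VIF={vif:.2f}, tolerance={1 / vif:.2f}")
```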
Figure 5.1 Scatter Plots for Some Variables
Linearity was checked using scatter plots, and the results were satisfactory because all of the examined variable relationships were positive (an upward straight line can be drawn through the points). This was done with randomly-selected pairs of variables, because it is impractical to inspect every pair with scatter plots when numerous variables are involved.
With respect to homoscedasticity, none of the individual relationships between the independent and dependent variables showed the cone or diamond shapes that would indicate a violation; all of the relationships showed a rough cigar shape, as displayed in figure 5.1, indicating linear relationships between the variables. Based on this, all of the assumptions (missing data, outliers, normality, linearity, multicollinearity and homoscedasticity) were met; therefore, I was able to proceed to test the measurement model.
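For completeness, the following is a minimal sketch of the kind of scatter-plot inspection described above, using matplotlib; the two variables and their values are simulated stand-ins rather than the study data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in scores for one randomly chosen pair of variables (not study data).
rng = np.random.default_rng(42)
x = rng.normal(size=232)
y = 0.6 * x + rng.normal(scale=0.8, size=232)  # positive linear relationship

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)
ax.set_xlabel("Variable A")
ax.set_ylabel("Variable B")
# A roughly cigar-shaped cloud suggests linearity and homoscedasticity;
# a cone or diamond shape would suggest a violation.
ax.set_title("Scatter plot check for linearity and homoscedasticity")
plt.show()
```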