
Chapter 6: Findings on the Associations of Organisation Size and Employee Roles with the BSC

6.6 Data processing analysis techniques

6.7.3 Analysis of outliers

Kline (2011) defined outliers as scores that are very different from the rest. Hair et al. (2010, p. 64) gave a more detailed definition, describing outliers as observations with a unique combination of characteristics identifiable as distinctly different from the other observations. Hair et al. (2010) further explained that a unique characteristic is either an unusually high or low value on a variable or a unique combination of values across several variables that makes an observation stand out from the others. They classified outliers into four possible groups:

1) outliers arising from a procedural error, such as a data entry error or a coding mistake;
2) outliers resulting from an extraordinary event, which accounts for the uniqueness of the observation. In such a case, the researcher must determine whether the extraordinary event fits the objective of the research: if it does, the outlier should be retained; if not, it should be removed;
3) outliers resulting from extraordinary observations for which the researcher has no explanation;
4) outliers consisting of observations that fall within the ordinary range of values on each of the variables but are unique in their combination of values. In such a case, unless specific evidence is available to discount them, the researcher should retain the outliers.

In this research, a frequency analysis was first conducted to examine the distribution of participants’ answers to the survey questionnaire. The data assessed were all 24 observed variables, grouped under five latent variables: leadership involvement (li), strategy translation (st), strategy alignment (sa), strategy as everyone’s everyday job (sj) and strategy as a continuous process (sp). Any ‘unidentified number’ (a value not conforming to the 7-point Likert scale) appearing in the survey data was treated as an outlier. Across all 24 variables, only one outlier was found, in Item li4. Table 6.12 shows the complete distribution of answers to Item li4. Following the classification of Hair et al. (2010), this outlier may have resulted from a procedural error, such as a data entry or coding mistake.
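The screening rule described above (tabulate each item, then flag any code outside 1 to 7) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually run in this research, which used STATA: the function names are hypothetical, and the responses are rebuilt from the li4 counts in Table 6.12 rather than read from the survey file.

```python
from collections import Counter

LIKERT_MIN, LIKERT_MAX = 1, 7  # valid codes on a 7-point Likert scale


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows.

    Missing responses (None) are excluded from the percentages.
    """
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / total
        cumulative += pct
        rows.append((value, counts[value], round(pct, 2), round(cumulative, 2)))
    return rows


def out_of_range(values, low=LIKERT_MIN, high=LIKERT_MAX):
    """Return the distinct codes that do not belong to the scale."""
    return sorted({v for v in values if v is not None and not low <= v <= high})


# Hypothetical reconstruction: li4 responses rebuilt from the counts in Table 6.12
counts_li4 = {1: 3, 2: 13, 3: 28, 4: 96, 5: 238, 6: 902, 7: 384, 8: 1}
li4 = [v for v, n in counts_li4.items() for _ in range(n)]

for row in frequency_table(li4):
    print(*row)
print("out-of-range codes:", out_of_range(li4))  # out-of-range codes: [8]
```

Because the cumulative column is accumulated before rounding, the rows reproduce the percentages reported for li4.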

Table 6.12 Frequency of Item li4

li4   Frequency   Percent (%)   Cumulative (%)
1     3           0.18          0.18
2     13          0.78          0.96
3     28          1.68          2.64
4     96          5.77          8.41
5     238         14.29         22.70
6     902         54.17         76.88
7     384         23.06         99.94
8     1           0.06          100.00

As Table 6.12 shows, one answer (the value 8) is not part of the 7-point Likert scale. Having found the outlier, the next step was to locate its position (case number). Nick Cox’s user-written extremes command in STATA was used to identify which case contained the score 8.

Table 6.13 Extreme values of Item li4

Case   Value of li4
438    1
750    1
1425   1
414    2
532    2
1671   7
1673   7
1674   7
1675   7
54     8

Table 6.13 shows that the value 8 appears in case number 54. This case was therefore deleted to avoid biasing the subsequent analysis.

The final sample for the subsequent analysis therefore comprises 1,674 cases.
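Locating and removing the offending case can be sketched as follows. The thesis used Nick Cox’s extremes command in STATA; the Python functions below are hypothetical stand-ins that list the lowest and highest values together with their case numbers and then drop cases whose response lies outside the scale. The toy data are invented for illustration.

```python
def extreme_values(case_values, k=5):
    """Return the k lowest and k highest (case, value) pairs.

    Missing responses (None) are skipped; ties are ordered by case number.
    """
    observed = [(case, v) for case, v in case_values.items() if v is not None]
    ordered = sorted(observed, key=lambda pair: (pair[1], pair[0]))
    return ordered[:k], ordered[-k:]


def cases_out_of_range(case_values, low=1, high=7):
    """Case numbers whose response lies outside the low..high scale."""
    return sorted(case for case, v in case_values.items()
                  if v is not None and not low <= v <= high)


def drop_cases(case_values, cases):
    """Return a copy of the data without the listed cases."""
    to_drop = set(cases)
    return {case: v for case, v in case_values.items() if case not in to_drop}


# Hypothetical toy data: case number -> li4 response (None = missing)
li4 = {1: 6, 2: 7, 3: 5, 4: 8, 5: 6, 6: 1, 7: None, 8: 7}

lowest, highest = extreme_values(li4, k=2)
print("lowest:", lowest)    # lowest: [(6, 1), (3, 5)]
print("highest:", highest)  # highest: [(8, 7), (4, 8)]

flagged = cases_out_of_range(li4)             # [4]
cleaned = drop_cases(li4, flagged)
print(len(li4), "->", len(cleaned), "cases")  # 8 -> 7 cases
```

Cases with a missing answer are kept, since a missing value is not an out-of-range code.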

6.7.4 Analysis of data distribution normality

After completing the outlier analysis, the researcher tested the normality of the data distribution. As in the outlier analysis, the data tested were all 24 observed variables grouped under the five latent variables: leadership involvement (li), strategy translation (st), strategy alignment (sa), strategy as everyone’s everyday job (sj) and strategy as a continuous process (sp).

The normality of observed variables can be assessed with either a statistical or a graphical approach (Tabachnick and Fidell 2007). Hair et al. (2010) noted that the shape of any data distribution can be described by two statistical parameters: kurtosis and skewness. Kurtosis refers to the ‘peakedness or flatness of the distribution compared with the normal distribution’, whereas skewness describes the balance (symmetry) of the distribution; an unbalanced distribution is called skewed. For both measures, values above or below zero denote departures from normality (Hair et al. 2010). A positive skewness value indicates a distribution shifted to the left, with most observations at the low end and a long tail to the right, whereas a negative value indicates a shift to the right. A negative kurtosis value indicates a flatter distribution than the normal, and a positive value a more peaked one.
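The two shape parameters can be computed directly from central moments. This sketch uses the plain moment formulas (skewness g1 and excess kurtosis g2, so that a normal distribution scores 0 on both); the helper names are hypothetical, and statistical packages may report bias-adjusted versions instead. Stata’s summarize, detail, for instance, reports kurtosis on the scale where a normal distribution scores 3 rather than 0.

```python
def central_moments(values):
    """Return n and the 2nd, 3rd and 4th central moments (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def skewness(values):
    """Moment skewness g1: 0 for a symmetric distribution."""
    _, m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment kurtosis minus 3, so that a normal distribution scores 0."""
    _, m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3.0


flat = [1, 2, 3, 4, 5]           # evenly spread and symmetric
right_tailed = [1, 1, 1, 2, 10]  # bulk on the left, long tail to the right
print(skewness(flat), round(excess_kurtosis(flat), 2))  # 0.0 -1.3
print(skewness(right_tailed) > 0)                        # True

# li4 answers rebuilt from Table 6.12, excluding the out-of-range 8
counts = {1: 3, 2: 13, 3: 28, 4: 96, 5: 238, 6: 902, 7: 384}
li4 = [v for v, n in counts.items() for _ in range(n)]
print(skewness(li4) < 0, excess_kurtosis(li4) > 0)       # True True
```

The flat sample has negative excess kurtosis, as described above, while the li4 answers, bunched at 6 with a tail towards 1, show negative skewness and positive kurtosis.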

To assess the normality of the variables, this research applied the Doornik-Hansen omnibus test, the Shapiro-Wilk test and the skewness-kurtosis test using the STATA 14 statistical software. If a test is non-significant (p > 0.05), the data distribution does not differ significantly from a normal distribution, and normality can be assumed. If the test is significant (p < 0.05), the data distribution differs significantly from a normal distribution; that is, the data do not meet the assumption of normality.
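To show how a skewness-kurtosis test turns the two shape parameters into a p-value, the sketch below implements the D’Agostino-Pearson K² omnibus test (the test behind scipy.stats.normaltest) in plain Python. It is a stand-in chosen for illustration: it is neither the Doornik-Hansen nor the Shapiro-Wilk test, and Stata’s sktest applies its own adjustment, so its p-values should not be expected to match the appendices exactly. The function names are hypothetical.

```python
import math


def _moments(values):
    """Return n and the 2nd, 3rd and 4th central moments (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def skewness_z(values):
    """D'Agostino's normal approximation (z-score) for the sample skewness."""
    n, m2, m3, _ = _moments(values)
    if n < 8:
        raise ValueError("the skewness test needs at least 8 observations")
    g1 = m3 / m2 ** 1.5
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / alpha)


def kurtosis_z(values):
    """Anscombe-Glynn normal approximation (z-score) for the sample kurtosis."""
    n, m2, _, m4 = _moments(values)
    if n < 20:
        raise ValueError("the kurtosis approximation needs at least 20 observations")
    b2 = m4 / m2 ** 2                        # raw kurtosis, 3 for a normal
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - mean_b2) / math.sqrt(var_b2)   # standardised kurtosis
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1
                                  + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0.0:
        raise ValueError("kurtosis transformation is undefined for this sample")
    term1 = 1.0 - 2.0 / (9.0 * a)
    term2 = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    return (term1 - term2) / math.sqrt(2.0 / (9.0 * a))


def k2_test(values):
    """Omnibus K2 statistic and its p-value from a chi-square with 2 df."""
    k2 = skewness_z(values) ** 2 + kurtosis_z(values) ** 2
    return k2, math.exp(-k2 / 2.0)  # upper tail of chi-square(2)


# li4 answers rebuilt from Table 6.12, excluding the out-of-range 8
counts = {1: 3, 2: 13, 3: 28, 4: 96, 5: 238, 6: 902, 7: 384}
li4 = [v for v, n in counts.items() for _ in range(n)]
k2, p = k2_test(li4)
print(f"K2 = {k2:.1f}, p = {p:.3g}")  # p falls far below 0.05
```

The decision rule is the one stated above: p < 0.05 leads to rejecting normality for that variable.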


The Doornik-Hansen test yielded p-values of less than 0.05, indicating a significant departure from normality. The Shapiro-Wilk test likewise yielded p-values of less than 0.05 for all observed variables (Appendix 6). Finally, the skewness-kurtosis test yielded p-values of less than 0.05 for all observed variables except st2 and st4 (Appendix 7).

In sum, the three normality tests (Doornik-Hansen omnibus, Shapiro-Wilk, and skewness-kurtosis) yielded p-values of less than 0.05 for all observed variables, with the exception of st2 and st4 on the skewness-kurtosis test. The data distributions therefore depart from normality. Statistically significant non-normality is common in large samples, because normality tests become sensitive to even small departures as the sample size grows. Hair et al. (2010, p. 72) noted that in most instances with large sample sizes, the researcher can be less concerned about non-normal variables. A similar view was espoused by Tabachnick and Fidell (2007, p. 80): ‘In a large sample, a variable with statistically significant skewness often does not deviate enough from normality to make a substantive difference in analysis and the impact of departure from zero kurtosis also diminishes’.
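The large-sample argument can be made concrete with the common approximations for the standard errors of skewness (S) and kurtosis, each statistic being divided by its standard error to form a z-score. The skewness value of 0.12 used here is purely illustrative and is not a result of this study.

```latex
\begin{aligned}
SE_{\text{skew}} &\approx \sqrt{6/N} = \sqrt{6/1674} \approx 0.060 \\
SE_{\text{kurt}} &\approx \sqrt{24/N} = \sqrt{24/1674} \approx 0.120 \\
z_{\text{skew}} &= S / SE_{\text{skew}} \approx 0.12 / 0.060 \approx 2.0 > 1.96
\end{aligned}
```

With N = 1,674, a skewness as small as 0.12, which barely changes the shape of the distribution, is already significant at the 0.05 level. This is why significant test results in a sample of this size need not signal a substantive problem.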