

CHAPTER THREE: METHODOLOGY

3.6 Statistical Problems and Tests

The sample size is 48 with a total of 12 explanatory variables, leaving 36 degrees of freedom. The sample is not particularly large and is therefore vulnerable to small-sample biases and distortions. However, the data only go back as far as 1960 and, although small, the sample is not tiny; it should be large enough to permit meaningful analysis. Gujarati (2003: 356) defines a large sample as 40-50 observations, and Moolman et al. (2006) and Fedderke et al. (2004) both run similar regressions with even smaller samples. The relatively small sample therefore needs to be borne in mind, but it should not be a major problem.

There are a number of potential statistical problems with any econometric regression. These include perfect multicollinearity, autocorrelation and heteroskedasticity. Should any of these be present in a given regression, they may render it spurious and therefore unusable. Hence it is important to understand each problem and then correct for it, should it be present in the regression.

3.6.1 Multicollinearity

Multicollinearity occurs when there is a linear correlation between two or more of the explanatory variables in a regression model (Gujarati, 2003: 341). This results in the estimators having large variances, making precise estimation difficult. The t-statistics of some of the coefficients tend to be incorrectly reported as statistically insignificant, while the R², the overall goodness of fit, can be incorrectly reported as very high. Thus the presence of multicollinearity can distort the regression results.

There are a number of different methods to test for multicollinearity. The first piece of evidence that multicollinearity is present is a very high R² combined with very few statistically significant t-statistics (Gujarati, 2003: 359).

A second test is the variance inflation factor (VIF), defined for regressor Xj as VIFj = 1/(1 − R²j), where R²j is the R² from regressing Xj on the remaining regressors. The VIF therefore increases as the collinearity of Xj with the remaining regressors increases. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if R²j exceeds 0.9, that variable is said to be highly collinear (Gujarati, 2003: 360).
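As an illustration only, the VIFs could be computed along the following lines in Python using statsmodels, assuming the explanatory variables are held in a pandas DataFrame named X (a hypothetical name, not part of the original estimation):

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    exog = sm.add_constant(X)  # add an intercept column to the regressor matrix
    vifs = {col: variance_inflation_factor(exog.values, i)
            for i, col in enumerate(exog.columns) if col != "const"}
    flagged = {col: v for col, v in vifs.items() if v > 10}  # rule-of-thumb cutoff
    print(vifs)
    print(flagged)

Any variable appearing in flagged would, by the rule of thumb above, be regarded as highly collinear.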

A third test for multicollinearity is to examine the pair-wise correlations among the regressors. If the pair-wise or zero-order correlation coefficient between two regressors is high, above 0.8, then multicollinearity is a serious problem (Gujarati, 2003: 362).
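A sketch of this check, again assuming the regressors sit in the hypothetical DataFrame X, might flag any pair whose absolute correlation exceeds the 0.8 threshold:

    corr = X.corr()  # pairwise (zero-order) correlation matrix
    high_pairs = [(a, b, corr.loc[a, b])
                  for i, a in enumerate(corr.columns)
                  for b in corr.columns[i + 1:]
                  if abs(corr.loc[a, b]) > 0.8]
    print(high_pairs)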

There are a number of potential solutions to the problem of multicollinearity. The first is to drop collinear variables from the model. However, one has to be very careful when dropping variables: the model is specified on the basis of economic theory, and dropping variables merely to remove multicollinearity has no theoretical justification. Furthermore, it can cause specification bias, which is a more serious problem than multicollinearity (Gujarati, 2003: 364).

Another potential solution is to transform the data by first-differencing the variables and then running the regression on the differenced series. However, the economic theory is based on the variables being in level form, so using first-differenced variables would not provide an easily interpreted result (Gujarati, 2003: 364).
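For completeness, first-differencing is a one-line transformation if the series are held in a pandas DataFrame, here called data (a hypothetical name):

    data_diff = data.diff().dropna()  # x_t - x_(t-1); the first observation is lost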

The final solution, and the best one, would be to obtain and use new data. Unfortunately, this is the only data available for the period in South Africa. Better proxies, particularly for skills, might alleviate the problem, but, as explained previously, such data cannot be found either (Gujarati, 2003: 364).

Blanchard suggests that econometricians might do nothing when multicollinearity is present, since every remedy for multicollinearity has a drawback of some sort and the cure may be worse than the disease (Gujarati, 2003: 363).

3.6.2 Autocorrelation

Autocorrelation occurs when the error terms are correlated, i.e. there is a systematic pattern among the residuals. Autocorrelation results in inefficient coefficient estimates; the usual t and F tests are no longer valid and the R² will be incorrect (Verbeek, 2004: 97).

The first test for autocorrelation is the Durbin-Watson d test. If the reported d statistic is equal to 2, then no autocorrelation is present in the model. The d test, however, has several drawbacks. If the statistic falls in the 'indecisive zone', one cannot conclude whether or not autocorrelation is present. Secondly, if the model contains lagged values of the dependent variable, the d statistic is often close to 2 even when autocorrelation is present. Furthermore, the d statistic only identifies first-order, AR(1), autocorrelation (Gujarati, 2003: 467; Verbeek, 2004: 102).
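As a sketch, the d statistic can be read off a fitted statsmodels OLS results object, assuming hypothetical names y for the dependent variable and X for the regressors:

    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    results = sm.OLS(y, sm.add_constant(X)).fit()
    d = durbin_watson(results.resid)  # values near 2 suggest no AR(1) autocorrelation
    print(d)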

The Breusch-Godfrey (BG) test is another test for autocorrelation and is more general than the Durbin-Watson d test, as it allows for nonstochastic regressors such as lagged values of the dependent variable, higher-order autoregressive schemes, and simple or higher-order moving averages of white noise error terms (Gujarati, 2003: 472).
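A minimal sketch of the BG test using statsmodels, applied to the same hypothetical results object and testing, say, up to four lags:

    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    # Null hypothesis: no serial correlation up to the chosen lag order
    lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=4)
    print(lm_stat, lm_pval)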

3.6.3 Heteroskedasticity

Heteroskedasticity is another violation of the classical linear regression model and means that the disturbances uᵢ do not all have the same variance. In the presence of heteroskedasticity the coefficient estimates are inefficient, making precise estimation difficult. Hence persisting with the usual testing procedures despite the presence of heteroskedasticity may lead to misleading results and inferences (Gujarati, 2003: 394; Verbeek, 2004: 82).

White's test is a widely used method for establishing whether heteroskedasticity is present in the regression model. The test entails first estimating the residuals ûᵢ. An auxiliary regression is then run in which the squared residuals from the original regression are regressed on the original regressors, their squares and their cross-products. Under the null hypothesis that there is no heteroskedasticity, the sample size (n) multiplied by the R² from the auxiliary regression follows the chi-square distribution (Gujarati, 2003: 413; Verbeek, 2004: 92). If the calculated test statistic exceeds the critical value, the null hypothesis of constant error variance is rejected and heteroskedasticity is concluded to be present. If not, the null hypothesis is not rejected and there is no evidence of heteroskedasticity.
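A sketch of the test in statsmodels, again on the hypothetical results object, where het_white constructs the auxiliary regression (including squares and cross-products) internally:

    from statsmodels.stats.diagnostic import het_white

    # Null hypothesis: homoskedastic (constant-variance) errors
    lm_stat, lm_pval, f_stat, f_pval = het_white(results.resid, results.model.exog)
    print(lm_stat, lm_pval)  # a small p-value points to heteroskedasticity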

White also provides a remedial measure, heteroskedasticity-consistent (robust) standard errors, to correct for heteroskedasticity. This procedure allows valid statistical inferences to be made about the true coefficient values (Gujarati, 2003: 417).
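In statsmodels, White's heteroskedasticity-consistent covariance can be requested through the cov_type argument; a minimal sketch with the same hypothetical y and X:

    import statsmodels.api as sm

    # Refit the model with White (HC0) robust standard errors
    robust_results = sm.OLS(y, sm.add_constant(X)).fit(cov_type="HC0")
    print(robust_results.summary())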