4. Regression Analysis
7.3 DETERMINANTS OF INCOME
7.3.2 Multiple Regression
those with external locus of control (Gh ₵ 18). Hence, operators who attribute life outcomes to their personal actions (those with an internal locus of control) earn more than operators who attribute life outcomes to external factors such as luck, fate and circumstances (those with an external locus of control). This is consistent with Cobb-Clark, Kassenboehmer and Sinning’s (2013) finding in Australia that persons with an internal locus of control earn more than persons with an external locus of control.
a weaker correlation with networks (r = 0.23). Network has a significant but weak negative correlation with gender (r = -0.14), while its correlation with sector is negative but not significant (r = -0.04, p-value = 0.48).
Table 7.8 Correlation analysis of regression variables (Pearson’s correlation; N = 344 for every pair)
Variable          Statistic  Ln Income  Gender   Education  Sector   Labour   Network  Locus of control  Age of business
Ln Income         Corr       1
Gender            Corr       0.06       1
                  p-value    0.28
Education         Corr       0.31**     0.19**   1
                  p-value    0.00       0.00
Sector            Corr       0.07       0.39**   0.26**     1
                  p-value    0.16       0.00     0.00
Labour            Corr       0.7**      0.21**   0.41**     0.18**   1
                  p-value    0.00       0.00     0.00       0.00
Network           Corr       0.27**     -0.14**  0.16**     -0.04    0.28**   1
                  p-value    0.00       0.01     0.00       0.48     0.00
Locus of control  Corr       0.53**     0.21**   0.35**     -0.00    0.56**   0.23**   1
                  p-value    0.00       0.00     0.00       0.9      0.00     0.00
Age of business   Corr       0.58**     -0.12*   -0.03      -0.17*   0.46**   0.36*    0.34**            1
                  p-value    0.00       0.02     0.63       0.03     0.00     0.00     0.00
** significant at 1%; * significant at 5%
Source: Own computation, results obtained from STATA
Locus of control has a significant positive correlation with gender (r = 0.21) and with age of business (r = 0.34). Gender has a significant, moderate positive correlation with sector (r = 0.39) and a weak but significant negative correlation with age of business (r = -0.12, p-value = 0.024).
Finally, there is a weak but significant negative correlation between sector and age of business (r = -0.17, p-value = 0.03).
After the correlation analysis, a multiple regression analysis is run to estimate the joint effect of gender, networks, locus of control, sector, education, age of business and size of labour on the log of income.
Table 7.9 Determinants of income in slum activities (Regression results, Model 1)
Dependent variable: Ln Income
Independent variable  β         St. Error  t      p-value  Comment  VIF   1/VIF
Gender                -.0582    .0371      -1.58  0.115             1.35  0.74
Education             .0097**   .0043      2.24   0.026    Sig @5%  1.42  0.71
Sector                .033      .0367      0.92   0.359             1.30  0.77
Labour size           .084**    .0098      8.59   0.000    Sig @5%  2.03  0.49
Network               -.017     .0227      -0.77  0.444             1.23  0.81
Locus of control      .024**    .0063      3.94   0.000    Sig @5%  1.62  0.61
Age of business       .147**    .0192      7.67   0.000    Sig @5%  1.60  0.62
Constant              2.235     .1237      18.07  0.000
Mean VIF                                                            1.51
R-squared = 0.60; Adjusted R-squared = 0.592; F (7, 336) = 72.2; Prob > F = 0.0000; Root MSE = .29104
Breusch-Pagan test for heteroscedasticity (Ho: Constant variance): Chi2 (1) = 5.79; Prob > Chi2 = 0.0161
Source: Own computation, results obtained from STATA
Table 7.9 shows the initial OLS regression results of the relationship between Ln Income and gender, networks, locus of control, sector, education, age of business and size of labour. The regression results (Model 1) are summarised below:
\[
\text{Ln Income} = 2.235 - 0.0582\,\text{Gender} + 0.0097\,\text{Educ} + 0.033\,\text{Sector} + 0.084\,\text{Labour} - 0.017\,\text{Network} + 0.0241\,\text{Locus} + 0.147\,\text{BusAge}
\]
t = (18.07) (−1.58) (2.24) (0.92) (8.59) (−0.77) (3.94) (7.67)
Model 1 shows a positive relationship between Ln Income and education, sector, labour size, locus of control and age of business. On the other hand, gender and network are negatively related to Ln Income.
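The t-ratios reported for Model 1 can be reproduced, up to rounding, as each coefficient divided by its standard error. The sketch below is illustrative only: it uses the rounded values from Table 7.9, the names `model1` and `t_ratio` are hypothetical, and the cut-off of 1.96 is an assumed large-sample two-sided 5% critical value rather than anything taken from the STATA output.

```python
# Coefficients and standard errors copied from Table 7.9 (rounded as reported).
model1 = {
    "Gender": (-0.0582, 0.0371),
    "Education": (0.0097, 0.0043),
    "Sector": (0.033, 0.0367),
    "Labour size": (0.084, 0.0098),
    "Network": (-0.017, 0.0227),
    "Locus of control": (0.024, 0.0063),
    "Age of business": (0.147, 0.0192),
}

CRITICAL_5PCT = 1.96  # assumption: large-sample two-sided 5% critical value


def t_ratio(beta, std_error):
    """Return the t-ratio of a coefficient: estimate divided by standard error."""
    return beta / std_error


t_ratios = {name: t_ratio(b, se) for name, (b, se) in model1.items()}
significant = [name for name, t in t_ratios.items() if abs(t) > CRITICAL_5PCT]

for name, t in t_ratios.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:<18} t = {t:6.2f}  ({flag} at 5%)")
```

Small differences from the tabulated t-values are to be expected, because the inputs here are already rounded.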
The VIF test, as explained under section 5.6.2.4 (1) in Chapter Five, was undertaken to test for multicollinearity, since the correlation analysis showed that some of the explanatory variables are correlated. The test shows tolerance levels (1/VIF) higher than 0.1 and VIF values below 10 for every variable (Table 7.9). It can therefore be concluded that multicollinearity is not a problem among the explanatory variables (Wooldridge, 2013).
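The rule of thumb applied here can be checked directly against the reported figures. The following is a minimal sketch, assuming only the VIF values printed in Table 7.9; the variable names and the two threshold constants are illustrative labels for the cut-offs quoted in the text, not output from STATA.

```python
# VIF values copied from Table 7.9 (Model 1).
vif = {
    "Gender": 1.35, "Education": 1.42, "Sector": 1.30, "Labour size": 2.03,
    "Network": 1.23, "Locus of control": 1.62, "Age of business": 1.60,
}

VIF_LIMIT = 10.0        # rule-of-thumb ceiling quoted in the text
TOLERANCE_FLOOR = 0.1   # corresponding floor for tolerance (1 / VIF)

# Tolerance is simply the reciprocal of the VIF.
tolerance = {name: 1.0 / v for name, v in vif.items()}
mean_vif = sum(vif.values()) / len(vif)

# Collect any variable that breaches either threshold.
flagged = [n for n, v in vif.items()
           if v >= VIF_LIMIT or tolerance[n] <= TOLERANCE_FLOOR]

print(f"Mean VIF = {mean_vif:.2f}")
print("Variables breaching the thresholds:", flagged or "none")
```

Because tolerance is the reciprocal of the VIF, the two thresholds are equivalent; recomputed tolerances may differ from the tabulated ones in the second decimal place since the VIF inputs are rounded.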
In regression analysis, one has to check whether the variance of the residuals is the same at every level of the predictor variables. If the spread of the residuals is unequal across levels of a predictor, the regression is said to suffer from heteroscedasticity. To check for heteroscedasticity, the Breusch-Pagan / Cook-Weisberg test is used. The test yields a p-value of 0.0161 (Table 7.9), which is less than the 0.05 significance level; hence one rejects the null hypothesis of homoscedasticity and concludes that heteroscedasticity is present in the data.
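The decision rule can be illustrated by recovering the p-value from the reported test statistic. This is a sketch under stated assumptions: it relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal, so its upper-tail probability equals erfc(sqrt(x / 2)); the function name `chi2_sf_1df` is hypothetical and the approach is not how STATA computes the test internally.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


bp_statistic = 5.79   # Chi2 (1) reported in Table 7.9
alpha = 0.05          # significance level used in the text

p_value = chi2_sf_1df(bp_statistic)
reject_constant_variance = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0 of constant variance: {reject_constant_variance}")
```

The computed p-value should agree closely with the Prob > Chi2 figure in Table 7.9, which leads to the same rejection of constant variance.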
As discussed in section 5.6.2.4 (2), the robust standard errors method is used to correct for heteroscedasticity. Hence, a second regression is run, this time as a Robust Backward Elimination Stepwise regression in STATA. Backward elimination removes insignificant variables, leaving only the significant ones in the results. Through this stepwise procedure, gender, sector and network have been eliminated from the second analysis.
Table 7.10 presents results from the Robust Backward Elimination Stepwise regression. Model 2 is represented as:
\[
\text{Ln Income} = 2.1863 + 0.0098\,\text{Educ} + 0.0826\,\text{Labour} + 0.0225\,\text{Locus} + 0.148\,\text{BusAge}
\]
t = (20.02) (2.59) (6.08) (3.37) (7.44)
Model 2 is adjusted for heteroscedasticity with the robust standard errors method. The F test for the overall significance of the model gives Prob > F = 0.0000, signifying that the explanatory variables (education, labour, locus of control and age of business) are jointly significant in explaining the dependent variable (Ln Income). The R2 is about 60% (Table 7.10), which implies that the significant explanatory variables together explain about 60% of the variation in Ln Income.
Table 7.10 Determinants of income in slum activities (Regression results, Model 2) Robust
Dependent variable: Ln Income
Independent variable  β       St. Error  t      p-value  VIF   1/VIF
Labour size           .0826   .0136      6.08   0.000    1.91  0.525
Education             .0098   .0038      2.59   0.010    1.35  0.74
Locus of control      .0225   .0067      3.37   0.001    1.53  0.654
Age of business       .148    .02        7.44   0.000    1.41  0.709
Constant              2.1863  .109       20.02  0.000
Mean VIF                                                 1.55
R-squared = 0.597; F (4, 339) = 82.25; Prob > F = 0.0000; Root MSE = .2910
Source: Own computation, results obtained from STATA
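The new model shows that Ln Income has a positive relationship with labour size, education, locus of control and age of business. Since the dependent variable is the natural log of income while the explanatory variables enter in levels, each coefficient is interpreted as a semi-elasticity: a one-unit increase in an explanatory variable changes income by approximately 100 × β per cent.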
In terms of labour size (β = 0.0826), the results indicate an 8.3% increase in income if one additional person is hired, holding all other variables constant. An extra year of education (β = 0.0098) increases an operator’s income by almost 1%. If locus of control (β = 0.0225), which Rotter (1966) describes as the degree to which one attributes rewards to one’s own efforts rather than to factors independent of one’s personal actions, increases by 1 point, income is likely to increase by 2.25%. The age of the operator’s business (β = 0.148) has the largest effect on income: an additional year in business increases income by about 15%, other variables held constant.
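Reading a coefficient as a percentage change (100 × β) is an approximation that works well for small β; the exact change implied by a log-linear model is 100 × (exp(β) − 1). The sketch below compares the two for the Model 2 coefficients in Table 7.10. The function and dictionary names are hypothetical, and the exact figures are derived here for illustration rather than reported in the source.

```python
import math

# Model 2 coefficients copied from Table 7.10.
model2 = {
    "Labour size": 0.0826,
    "Education": 0.0098,
    "Locus of control": 0.0225,
    "Age of business": 0.148,
}


def approx_pct_change(beta):
    """Approximate percentage change in income per unit increase: 100 * beta."""
    return 100.0 * beta


def exact_pct_change(beta):
    """Exact percentage change in income per unit increase: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


effects = {name: (approx_pct_change(b), exact_pct_change(b))
           for name, b in model2.items()}

for name, (approx, exact) in effects.items():
    print(f"{name:<18} approx = {approx:5.2f}%  exact = {exact:5.2f}%")
```

The gap between the two readings grows with the size of the coefficient, so it matters most for age of business and least for education.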
7.3.2.1 Residual Analysis
Residual analysis was performed to ascertain the appropriateness of linear regression for the data set. The residuals of Model 2 were plotted against the fitted values and against each of the independent variables to examine their dispersion.
Firstly, the residuals were plotted against the fitted values of Model 2. The scatter plot (Appendix 4, page 207) shows points fairly dispersed around the zero line. The shape of the plot, however, suggests that one of the variables might have a parabolic effect on the model (Gujarati and Porter, 2009).
Next, the residuals were plotted against the explanatory variables. The plot of residuals against labour size (Appendix 5, page 208) is biased but homoscedastic. The plot is also parabolic in shape, an indication that labour might have a quadratic effect on the model and needs to be squared. Hence a new regression is run with labour squared to see whether the parabolic shape is corrected. Results of the model are shown in the next section.
The residual plot for education shows a fair distribution around the zero line (Appendix 6, page 208). The residual plot for locus of control (Appendix 7, page 209), however, is slightly skewed to the right because of the high number of respondents who have an internal locus of control. That aside, the plot seems fairly scattered around the zero axis. The residual plot for age of business also shows a fair distribution around the zero line (Appendix 8, page 209).
7.3.2.1.1 New regression with labour squared
As observed in the section above, the scatter plot of residuals against size of labour looks parabolic, suggesting that labour might have a quadratic effect on income earned in the slum. Firstly, a bar chart of income and labour size (Appendix 9, page 210) is drawn to see whether the parabolic shape appears. The regression is then re-run with labour squared and the residual plot redrawn. This helps to ascertain whether labour does have a quadratic effect on income.
The bar chart of labour size against Ln Income (Appendix 9, page 210) shows that, as the size of labour increases, income also increases up to a point, reaches a maximum and then starts decreasing. This supports the assertion of a quadratic relationship between income and labour.
One can therefore go ahead and run a new regression, with labour squared, as the third model. The new regression model is presented as:
Table 7.11 Determinants of income in slum activities (Regression model 3) Robust
Dependent variable: Ln Income
Independent variable  β       St. Error  t      p-value  VIF   1/VIF
Locus of control      .0404   .0066      6.08   0.000    1.39  0.719
Education             .0184   .0044      4.17   0.000    1.37  0.727
Sector                .0622   .034       1.83   0.068    1.12  0.891
Age of business       .207    .0210      9.84   0.000    1.26  0.791
Labour²               0.002   .0001      1.89   0.060    1.43  0.7
Constant              1.775   0.109      16.25  0.000
Mean VIF                                                 1.32
R-squared = 0.512; F (5, 338) = 71.21; Prob > F = 0.0000; Root MSE = .3189
Source: Own computation, results obtained from STATA
Model 3 is stated as:
\[
\text{Ln Income} = 1.775 + 0.0404\,\text{Locus} + 0.0184\,\text{Educ} + 0.0622\,\text{Sector} + 0.207\,\text{BusAge} + 0.002\,\text{LabourSquared}
\]
t = (16.25) (6.08) (4.17) (1.83) (9.84) (1.89)
Model 3 (Table 7.11) is adjusted for heteroscedasticity with the robust standard errors method.
The VIF results for the model show no signs of multicollinearity: the tolerance levels are higher than 0.1 and the VIF values, which range from 1.12 to 1.43, are all below 10 (Table 7.11). The residual plots in Appendix 10 (page 210) show that the scatter plots of residuals against the fitted values of Model 3 and of residuals against labour squared are no longer parabolic.
Model 3 shows a positive relationship between Ln Income and locus of control (β = 0.0404), education (β = 0.0184), sector (β = 0.0622), age of business (β = 0.207) and labour squared (β = 0.002). Locus of control, education and age of business are significant at the 1% level, whereas sector (p = 0.068) and labour squared (p = 0.060) are significant only at the 10% level (Table 7.11). A one-point increase in locus of control is likely to increase income by 4%. Income increases by about 2% if an operator achieves an extra year of education. Operators in the industrial sector earn about 6% more than their counterparts in the services sector. An extra year in a business’s age contributes about a 21% increase in income, while a one-unit increase in labour squared increases an operator’s income by only 0.2%.
The R2 of this model is 51.2%, which is lower than the R2 of Model 2 (60%), showing that Model 2 explains more of the variation in Ln Income than Model 3. However, since the diagnostic plots (pages 207 – 211) show signs of a quadratic effect of labour on Ln Income in Model 2 and Model 3 corrects that effect, one can conclude that Model 3 is the better specified model.
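Because the three models contain different numbers of regressors, a comparison on adjusted R2 can complement the raw R2 figures. The sketch below applies the standard formula, 1 − (1 − R2)(n − 1)/(n − k − 1), to the values reported in Tables 7.9 to 7.11. Only the adjusted R2 for Model 1 (0.592) is reported in the source; the figures for Models 2 and 3 are derived here for illustration, and the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 344  # sample size reported in the tables

# (R-squared, number of regressors) as reported in Tables 7.9 to 7.11.
models = {"Model 1": (0.60, 7), "Model 2": (0.597, 4), "Model 3": (0.512, 5)}

adjusted = {name: adjusted_r_squared(r2, N_OBS, k)
            for name, (r2, k) in models.items()}

for name, value in adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.3f}")
```

This comparison speaks only to explained variation; it does not address the functional-form concern raised by the residual plots.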