CHAPTER 6 THE EFFECTS OF GRAIN STORAGE TECHNOLOGIES ON THE HUNGER GAP
6.2 Research methodology
6.2.3 Model choice and specification
and hunger gap intensity. Shamva district was chosen as the base category because smallholder maize farming is more prevalent there than in Makoni district. Sinyolo (2016), in a study carried out in South Africa, used a district dummy variable to capture the effect of location-specific factors. Because maize farming is more prevalent in the base district, the district dummy is expected to have a positive effect on the hunger gap and its intensity.
In addition, the study recognizes the heterogeneous nature of land tenure among the smallholder farming households in the different wards. Smallholder farming households in Zimbabwe comprise old resettlement farmers, communal farmers, model A1 farmers (farmers newly resettled under the land reform programme) and small-scale commercial farmers.
The different farming sectors reflect the diversity in agricultural production and resource endowments among smallholder farmers (Ndakaza et al., 2016). Land tenure (land_tenure) is therefore captured as a dummy variable, with small-scale commercial farming as the reference category. Land tenure is expected to have a negative effect on the hunger gap and hunger gap intensity.
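The reference-category coding described for the district and land tenure variables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the category labels and the helper name `dummy_code` are hypothetical. The text also does not say whether land_tenure enters the model as a single indicator or as one indicator per non-reference sector, so the sketch assumes the latter. Each non-reference category gets its own 0/1 indicator. A household in the reference category (Shamva for district, small-scale commercial for land tenure) therefore has all indicators equal to zero.

```python
# Hypothetical labels; the study's actual variable coding is not given in the text.
DISTRICTS = ["Shamva", "Makoni"]
TENURE_SECTORS = ["old_resettlement", "communal", "model_A1", "small_scale_commercial"]


def dummy_code(value, categories, reference, prefix):
    """Return one 0/1 indicator per non-reference category (hypothetical helper)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not among categories")
    # The reference category gets no column, so its households are all zeros.
    return {f"{prefix}_{c}": int(value == c) for c in categories if c != reference}


# Usage: a communal farmer in Makoni district.
print(dummy_code("Makoni", DISTRICTS, "Shamva", "district"))
# {'district_Makoni': 1}
print(dummy_code("communal", TENURE_SECTORS, "small_scale_commercial", "tenure"))
# {'tenure_old_resettlement': 0, 'tenure_communal': 1, 'tenure_model_A1': 0}
```

Each estimated coefficient on these indicators is read relative to the omitted reference category.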
equation. In this case, the majority of households reported a positive number of months without grain in storage, which makes the Heckman approach less appropriate. Furthermore, Heckman regression is designed for incidental truncation, where the zeros are unobserved values (Ricker-Gilbert et al., 2011). A corner solution model is therefore more appropriate in this context because, given market and agronomic conditions, the zeros in the data reflect farmers' optimal choices rather than missing values (as assumed in the Heckman model). Two other alternatives to least squares, both of them corner solution models, are the Tobit estimator proposed by Tobin (1958) and the double hurdle (DH) model proposed by Cragg (1971). The Tobit model could be used to model households' hunger gap occurrence. Its major drawback, however, is that it requires hunger gap occurrence and hunger gap intensity to be determined by the same process, that is, by the same variables, which makes it fairly restrictive (Wooldridge, 2003; Ricker-Gilbert et al., 2011). Moreover, in a Tobit model, a given explanatory variable always has partial effects of the same sign on two quantities: the probability that a household incurs a hunger gap, and the expected number of hunger gap months conditional on hunger gap occurrence (Wooldridge, 2008).
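The same-sign restriction can be made concrete with the standard Tobit partial effects. The following uses the usual textbook set-up, which is assumed here rather than taken from the study: $y^* = x\beta + u$ with $u \sim N(0, \sigma^2)$ and $y = \max(0, y^*)$, where $y$ is the number of hunger gap months.

```latex
\frac{\partial P(y > 0 \mid x)}{\partial x_j} = \frac{\beta_j}{\sigma}\,\phi\!\left(\frac{x\beta}{\sigma}\right),
\qquad
\frac{\partial E(y \mid y > 0, x)}{\partial x_j} = \beta_j \left\{ 1 - \lambda(c)\left[c + \lambda(c)\right] \right\},
\qquad c = \frac{x\beta}{\sigma}, \quad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Because $\phi(\cdot) > 0$ and the term in braces lies strictly between 0 and 1, both partial effects take the sign of the single coefficient $\beta_j$. In the DH model, by contrast, the two hurdles have separate coefficient vectors, so a variable can raise the probability of a hunger gap while lowering its intensity.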
The DH model is a more flexible alternative to the Tobit because it allows the factors influencing hunger gap occurrence to differ from those affecting hunger gap intensity (Burke, 2009; Ricker-Gilbert et al., 2011). Hence, the DH model proposed by Cragg (1971) was used in this study. The DH model is designed to analyse events that may or may not occur and that take on continuous positive values if they do occur (Tura et al., 2016). The first hurdle estimates the probability of incurring a hunger gap. Conditional on hunger gap occurrence, the second hurdle estimates the number of months without grain in storage (hunger gap intensity). The binary hunger gap variable is used to obtain the maximum likelihood estimates (MLE) of the first hurdle, which is assumed to follow a logit model. The choice between the logit and probit models depends on whether the stochastic error term is assumed to follow a logistic distribution or a standard normal distribution, respectively (Wooldridge, 2002). According to Gujarati (1988), it does not matter much which function is used, since the logistic and probit formulations are quite comparable and the two models may give similar results. In this study, a logit model is chosen over a probit model because it is simpler and more flexible to work with. The functional form of the logit model is specified as follows (Gujarati, 1995; Greene, 2003):
$$ P(Y_i = 1) = \frac{1}{1 + e^{-Z_i}} \tag{1} $$
Equation (1) can be rewritten as
$$ P(Y_i = 1) = \frac{e^{Z_i}}{1 + e^{Z_i}} \tag{2} $$
where $P(Y_i = 1)$ is the probability that household $i$ experiences a hunger gap and $Z_i$ is a function of a vector of $n$ explanatory variables. Equation (2) is the cumulative distribution function of the logistic distribution. It follows that if $P(Y_i = 1)$ is the probability of experiencing a hunger gap, then $1 - P(Y_i = 1)$ is the probability of experiencing no hunger gap, expressed as
$$ 1 - P(Y_i = 1) = \frac{1}{1 + e^{Z_i}} \tag{3} $$
Thus, we can write
$$ \frac{P(Y_i = 1)}{1 - P(Y_i = 1)} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i} \tag{4} $$
Equation (4) is simply the odds ratio: the ratio of the probability that a household experiences a hunger gap to the probability that it experiences none. Taking the natural log of equation (4), we obtain
$$ L_i = \ln\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = Z_i \tag{5} $$
where $L_i$, the natural logarithm of the odds ratio (the logit), is linear not only in the explanatory variables but also in the parameters. Introducing the stochastic error term $u_i$, the logit model can be written as
$$ L_i = \ln\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = Z_i \tag{6} $$
$$ Z_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_n X_{ni} + u_i \tag{7} $$
where $\beta_0$ is the intercept, $\beta_1, \beta_2, \dots, \beta_n$ are the slope coefficients and $X$ is a vector of relevant household characteristics hypothesized in the study. Hunger gap intensity, a continuous variable, is assumed to follow a truncated normal distribution. The MLE of the second hurdle is therefore obtained by fitting a truncated normal regression model to the number of months without grain in storage (hunger gap intensity) (Cragg, 1971; Burke, 2009). The logit and truncated regression models differ in how much of the outcome distribution they use. The truncated regression model considers only the part of the distribution where hunger gap intensity is positive, whereas the logit model considers all observations. That is, the logit model analyses both the households that incurred a hunger gap and those that did not, giving the full sample of 413 households. The truncated regression, in contrast, considers only the 281 households that incurred a hunger gap. A total of sixteen explanatory variables are used to
model hunger gap occurrence, while fourteen explanatory variables are used in the hunger gap intensity model, as outlined in Table 6.1.
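The two hurdles described above can be combined into one likelihood. The following Python sketch writes out the log-likelihood of Cragg's (1971) double hurdle with a logit first hurdle and a truncated normal second hurdle. Several details are assumptions of this illustration, not details reported in the study: the function names, the independence of the two hurdles' errors, and the data layout. Each household is stored as a tuple of first-hurdle covariates `z`, second-hurdle covariates `x` and hunger gap months `y`. The sketch only evaluates the likelihood. In practice the parameters would be found with a numerical optimiser or a statistical package.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def norm_logpdf(v):
    """Log density of the standard normal distribution."""
    return -0.5 * v * v - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))


def dh_loglik(data, gamma, beta, sigma):
    """Log-likelihood of an independent double hurdle model (hypothetical helper).

    data  : iterable of (z, x, y); z feeds the logit hurdle, x the truncated
            normal hurdle, y is the number of hunger gap months (0 = none).
    gamma : logit coefficients (first hurdle), aligned with z.
    beta  : truncated normal coefficients (second hurdle), aligned with x.
    sigma : standard deviation of the second-hurdle error, must be > 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for z, x, y in data:
        zg = dot(gamma, z)
        if y <= 0:
            # No hunger gap: log(1 - Lambda(z'gamma)) = -log(1 + exp(z'gamma)).
            total += -softplus(zg)
        else:
            xb = dot(beta, x)
            # Hunger gap: log Lambda(z'gamma) = -log(1 + exp(-z'gamma)) ...
            total += -softplus(-zg)
            # ... plus the log of the truncated normal density of y given y > 0:
            # phi((y - x'beta) / sigma) / (sigma * Phi(x'beta / sigma)).
            total += norm_logpdf((y - xb) / sigma) - math.log(sigma)
            total -= math.log(norm_cdf(xb / sigma))
    return total


# Made-up illustration data: (z with intercept, x with intercept, months).
sample = [([1.0, 0.0], [1.0, 2.0], 0), ([1.0, 1.0], [1.0, 3.0], 4.0)]
print(dh_loglik(sample, gamma=[0.2, 0.5], beta=[1.0, 1.0], sigma=2.0))
```

Under independence, the log-likelihood splits into a logit part over all households and a truncated normal part over households with $y > 0$. The two hurdles can therefore be estimated separately. This mirrors the use of the full sample of 413 households for the logit and the 281 hunger gap households for the truncated regression.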
Before running the models, all the hypothesized explanatory variables were checked for multicollinearity. The variance inflation factor (VIF) for continuous variables and the contingency coefficient for dummy variables are the two measures commonly used to detect multicollinearity, and both were used in this study (Appendix H and Appendix I). According to Maddala (1992), the VIF is defined as:
$$ \mathrm{VIF}(X_j) = \frac{1}{1 - R_j^2} \tag{8} $$
where $R_j^2$ is the squared multiple correlation coefficient obtained when $X_j$ is regressed on the other explanatory variables. The larger the VIF, the more severe the collinearity. When $R_j^2$ exceeds 0.95 (equivalently, when the VIF exceeds 20), the variable is said to be highly collinear (Gujarati, 1995). Similarly, the contingency coefficients for dummy variables are calculated as:
$$ CC = \sqrt{\frac{\chi^2}{n + \chi^2}} \tag{9} $$
where $CC$ is the contingency coefficient, $\chi^2$ is the chi-square value and $n$ is the total sample size.
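Equations (8) and (9) can be computed directly from the data. The following is a minimal pure-Python sketch, assuming the variables are held as plain lists of equal length. The helper names are hypothetical, and a statistical package would normally be used in practice. The VIF of $X_j$ comes from the $R_j^2$ of an auxiliary OLS regression of $X_j$, with an intercept, on the remaining explanatory variables. The contingency coefficient comes from the Pearson chi-square statistic of the cross-tabulation of two dummy variables.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            raise ValueError("singular system: perfectly collinear regressors")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, regressors):
    """R^2 of an OLS regression of y on an intercept plus the given regressors."""
    rows = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve_linear(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0.0:
        raise ValueError("dependent variable is constant")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sst


def vif(columns, j):
    """Equation (8): VIF(X_j) = 1 / (1 - R_j^2), R_j^2 from regressing X_j on the rest."""
    others = [col for i, col in enumerate(columns) if i != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


def contingency_coefficient(a, b):
    """Equation (9): CC = sqrt(chi2 / (n + chi2)) for two categorical (e.g. dummy) variables."""
    n = len(a)
    levels_a, levels_b = sorted(set(a)), sorted(set(b))
    observed = {(u, v): 0 for u in levels_a for v in levels_b}
    for u, v in zip(a, b):
        observed[(u, v)] += 1
    row_tot = {u: sum(observed[(u, v)] for v in levels_b) for u in levels_a}
    col_tot = {v: sum(observed[(u, v)] for u in levels_a) for v in levels_b}
    chi2 = 0.0  # Pearson chi-square: sum of (observed - expected)^2 / expected
    for u in levels_a:
        for v in levels_b:
            expected = row_tot[u] * col_tot[v] / n
            chi2 += (observed[(u, v)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n + chi2))


# Usage with made-up data.
print(round(vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 6]], 0), 4))  # 3.0833
print(round(contingency_coefficient([0, 0, 1, 1], [0, 0, 1, 1]), 4))  # 0.7071
```

For a 2×2 table the contingency coefficient cannot exceed $\sqrt{1/2} \approx 0.707$, even when two dummies are perfectly associated. This upper bound is worth keeping in mind when interpreting the values reported for the dummy variables.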