Multiple Regression Results - Geospatial monitoring and modeling system

Regression Equation:

Y1PRICE = 87.0431 - 0.0992*X1DIST - 0.0253*X2RAIN + 0.2424*X3OXEN

Regression Statistics:

R = 0.630161               R square = 0.397102
Adjusted R = 0.600469      Adjusted R square = 0.360563
F (3, 32) = 7.02566

ANOVA Regression Table

Source        Degrees of freedom    Sum of squares    Mean square
Regression           3.00               7744.41          2581.47
Residual            32.00              11757.90           367.43
Total               35.00              19502.31

5 You may create these values files yourself with the data provided. Note that the values files will have a record for the non-market areas as well as the market points. This value will be 0 in the left column of the values file. Delete this line from the values files before running the regression.

EXERCISE 2-16 MULTIPLE REGRESSION AND GIS

Individual Regression Coefficient

              Coefficient    t_test (32)
Intercept        87.04          6.01
x1dist           -0.10         -2.54
x2rain           -0.03         -3.04
x3oxen            0.24          1.41

Notes on the Results

Regression Equation: The regression equation outputs the regression coefficients for each of the independent variables and the intercept.

The intercept can be thought of as the value of the dependent variable when each of the independent variables takes on a value of zero. The coefficients indicate the effect of each independent variable on the dependent variable, holding the other variables constant. For example, if the cost-distance value from an area to its central market decreases by 100 units because of the construction of a new road, then the market integration percentage increases by 9.92 percentage points (i.e., -100 multiplied by -0.0992 = 9.92).
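The arithmetic in this example can be checked with a short script. This is only a sketch: the coefficients are copied from the regression equation above, while the function name and the input values for distance, rainfall and oxen ownership are hypothetical and chosen purely for illustration.

```python
# Coefficients copied from the regression equation reported above.
INTERCEPT = 87.0431
B_DIST = -0.0992   # X1DIST: cost-distance to the central market
B_RAIN = -0.0253   # X2RAIN: rainfall
B_OXEN = 0.2424    # X3OXEN: oxen ownership


def predict_price(dist, rain, oxen):
    """Return the predicted Y1PRICE for one area (hypothetical helper)."""
    return INTERCEPT + B_DIST * dist + B_RAIN * rain + B_OXEN * oxen


# Hypothetical input values, not taken from the exercise data.
before = predict_price(dist=300.0, rain=900.0, oxen=40.0)
# A new road lowers the cost-distance by 100 units; the rest is unchanged.
after = predict_price(dist=200.0, rain=900.0, oxen=40.0)

change = after - before
print(round(change, 2))
```

Because the other two variables are held fixed, the change depends only on the distance coefficient, whatever starting values are chosen.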

R, Adjusted R, R square, Adjusted R square: R represents the multiple correlation coefficient between the independent variables and the dependent variable. R squared represents the proportion of the variability in the dependent variable that is explained by all of the independent variables together. In our case, about 40% of the variance in the price transmission is explained by our independent variables. The adjusted R and R squared are the R and R squared after adjusting for the number of independent variables.⁶

F Value: The F value indicates the overall significance of the regression (i.e., whether or not the independent variables, taken jointly, contribute significantly to the prediction of the dependent variable). In our case, the critical F value, F(3, 32) at the 99% confidence level, is 4.46. The F value in this regression (7.03) is greater than the critical value given in the table and hence the overall regression is significant. If our F value were lower than the critical value, we would need to rethink our selection of the independent variables.
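As a rough check, the F value can be recomputed from the ANOVA table as the regression mean square divided by the residual mean square. The sketch below uses the sums of squares and degrees of freedom from the table and the critical value of 4.46 quoted in the text; the variable names are our own.

```python
# Values copied from the ANOVA table and the text above.
ss_regression, df_regression = 7744.41, 3
ss_residual, df_residual = 11757.90, 32
f_critical = 4.46  # F(3, 32) at the 99% level, as quoted in the text

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of explained to unexplained mean squares.
f_value = ms_regression / ms_residual
overall_significant = f_value > f_critical

print(round(f_value, 2), overall_significant)
```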

ANOVA Table (Analysis Of Variance): A simple two-variable regression can be thought of as fitting a best-fit line through the two variables plotted on an XY graph. The difference between the actual value for a point and the value predicted for that point (on the line of best fit) is the residual for that point, or the unexplained variation. Each residual is squared so that negative and positive deviations do not cancel each other out. The sum of the squared residuals subtracted from the total sum of squares gives us the explained part of the regression (or what is called the regression sum of squares). You could also calculate the regression sum of squares and then subtract it from the total to get the residual sum of squares. The explained part divided by the total sum of squares yields the R-squared. Multiple regression just extends the same idea to a multi-variable scenario (a line of best fit through a multidimensional space).

6 Refer to any text on introductory statistics for a detailed explanation of R-square, F-test and t-test.
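The relationship between the sums of squares and R-squared can be verified against the figures reported in the ANOVA table. The following is a minimal sketch using only the total and residual sums of squares from that table; the variable names are our own.

```python
import math

# Sums of squares copied from the ANOVA table above.
ss_total = 19502.31
ss_residual = 11757.90

# Explained part = total minus unexplained (residual) part.
ss_regression = ss_total - ss_residual

# R-squared is the explained share of the total variation;
# R is its square root.
r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)

print(round(ss_regression, 2), round(r_squared, 4), round(r, 4))
```

The results should agree, up to rounding, with the regression sum of squares, R square and R values listed in the regression output.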

Individual Regression Coefficient: As mentioned in the regression equation paragraph above, the coefficients express the individual contribution of each independent variable to the dependent variable. The significance of the coefficient is expressed in the form of a t-statistic.

The t-statistic verifies the significance of the variables' departure from zero (i.e., no effect). In our case, the t-statistic has to exceed the following critical values⁷ in order for the independent variable to be significant:

at a 99% confidence level with 32 degrees of freedom = 2.45
at an 85% confidence level with 32 degrees of freedom = 1.055

The distance coefficient has a t-statistic of -2.54, the rainfall coefficient -3.04 and the oxen ownership coefficient 1.41. Comparing their absolute values with the critical values above indicates that the distance and rainfall variables are highly significant (99%), while oxen ownership is relatively less significant (85%). The t-statistic and the F statistic together are the most common tests used in judging the relative success of the model and in deciding whether to add or delete independent variables from a regression model.
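The comparison against the critical values can be written out as a small script. This is a sketch only: the t-statistics and critical values are copied from the output and text above, and the helper function is hypothetical. The sign of a t-statistic shows the direction of the effect, so the test uses its absolute value.

```python
# t-statistics from the Individual Regression Coefficient table.
t_stats = {"x1dist": -2.54, "x2rain": -3.04, "x3oxen": 1.41}

# Critical values for 32 degrees of freedom, as quoted in the text.
critical = {"99%": 2.45, "85%": 1.055}


def significance_level(t):
    """Return the highest confidence level that |t| exceeds, or None."""
    for label in ("99%", "85%"):  # check the stricter level first
        if abs(t) > critical[label]:
            return label
    return None


levels = {name: significance_level(t) for name, t in t_stats.items()}
for name, level in levels.items():
    print(name, level)
```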

The output also produced two values files called PREDICTION and RESIDUAL. These contain the price transmission values predicted by the regression model and the corresponding residual values. We will assign the residuals back to the market point file and briefly analyze them.

E. Display the vector file AWRAJAS with the Outline Black symbol file. Add the vector file RESIDUAL using the same symbol file.

Highlight RESIDUAL in Composer then use the Identify tool to explore the residual values for the market centers.

Analysis of the residuals can direct us to problems with the model in specific areas. High positive residuals indicate that the model is under-predicting the price transmission values for these areas. Conversely, a high negative residual indicates that the actual price transmission value is less than the predicted value. By geographically linking these values to specific provinces or market areas, we can begin to formulate more specific questions that could lead to a better understanding of price transmission performance throughout Ethiopia.
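The sign convention described here (residual = actual value minus predicted value) can be illustrated with a few made-up numbers. In the sketch below, the market names, the actual and predicted values, and the cut-off used to call a residual "high" are all hypothetical; they are not taken from the RESIDUAL file.

```python
# Hypothetical (actual, predicted) price transmission values.
markets = {
    "market_a": (72.0, 60.5),
    "market_b": (41.0, 55.0),
    "market_c": (58.0, 57.2),
}
THRESHOLD = 10.0  # illustrative cut-off for a "high" residual (assumption)


def classify(actual, predicted, threshold=THRESHOLD):
    """Return the residual and a label describing the model's error."""
    residual = actual - predicted
    if residual > threshold:
        return residual, "under-predicted"  # actual exceeds prediction
    if residual < -threshold:
        return residual, "over-predicted"   # actual falls below prediction
    return residual, "close"


results = {name: classify(a, p) for name, (a, p) in markets.items()}
for name, (residual, label) in results.items():
    print(name, round(residual, 1), label)
```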

7 F statistic and t-statistic look-up tables are available in the back of most elementary statistics texts.
