Model Transformation - Investigating the spatial distribution of diabetes in Africa using both

Figure 4.31. Diagnostic plots of the linear regression

This section compares the results of the different transformations implemented, which are:

• Log taken on the dependent variable only
• Log taken on the dependent variable and all the independent variables
• Fractional probit regression

4.6.1. Results of Log Linear Regression Model

Table 4.16: Log-linear regression with the dependent variable transformed (Model 2)

Coefficients         Estimates   Standard error   t-value   Pr(>|t|)
Intercept            -0.797      0.877            -0.948    0.368
Health expenditure   -0.033      0.031            -1.081    0.285
GDP                   0.00004    0.00002           1.568    0.124
Population age        0.041      0.019             2.219    0.031*
Urbanization         -0.006      0.005            -1.416    0.164
Physician density     0.368      0.155             2.369    0.022*
MYSC                  0.036      0.037             0.970    0.337

Residuals:
Min        1Q         Median     3Q        Max
-1.43218   -0.27361   -0.07971   0.22681   0.96483

Residual standard error: 0.4876 on 47 degrees of freedom
Multiple R-squared: 0.5143, Adjusted R-squared: 0.4523
F-statistic: 8.295 on 6 and 47 DF, p-value: 0.00000380

*significant variables

The result of the log-linear regression with the six variables (Model 2) shows that population age and physician density are significant, with p-values of 0.031 and 0.022 respectively.

R² = 0.5143, which shows that 51.43% of the variance in the log of diabetes prevalence is explained by the independent variables in the data set. The overall model fit is F(6, 47) = 8.295 with p-value = 0.0000038 (< 0.001), leading us to conclude that the independent variables, taken together, reliably predict the dependent variable.
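The fit statistics reported in Table 4.16 can be cross-checked against each other. The following is a minimal sketch, assuming the model includes an intercept so that the number of observations equals the residual degrees of freedom plus the six predictors plus one; the variable names are illustrative only.

```python
# Figures taken from Table 4.16
r_squared = 0.5143
n_predictors = 6
resid_df = 47  # residual degrees of freedom

# Assumption: an intercept is fitted, so n = resid_df + predictors + 1
n_obs = resid_df + n_predictors + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / resid_df

# Overall F-statistic expressed through R-squared
f_statistic = (r_squared / n_predictors) / ((1 - r_squared) / resid_df)

print(round(adj_r_squared, 4))  # 0.4523
print(round(f_statistic, 3))    # 8.295
```

Both values agree with the adjusted R-squared and the F-statistic on 6 and 47 degrees of freedom shown in the table.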

From the parameter estimates in Table 4.16, it can be seen that for every one-unit increase in population age, we expect the log of diabetes prevalence to increase by 0.041 on average, holding all other variables constant. Lastly, for every one-unit increase in physician density, we expect the log of diabetes prevalence to increase by 0.368 on average, holding all other variables constant.
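Because the dependent variable in Model 2 is log-transformed, a coefficient can also be read as an approximate percentage change in prevalence. The sketch below assumes the natural logarithm was used for the transformation (the text does not state the base), and the helper name `percent_change` is hypothetical.

```python
import math

# Significant coefficients from Table 4.16
coefficients = {"population_age": 0.041, "physician_density": 0.368}


def percent_change(beta):
    """Percentage change in the outcome for a one-unit increase in a
    predictor, assuming the outcome was natural-log transformed."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in coefficients.items():
    print(f"{name}: {percent_change(beta):.1f}%")
# population_age: 4.2%
# physician_density: 44.5%
```

Under that assumption, a one-unit increase in population age corresponds to roughly a 4.2% higher prevalence, and a one-unit increase in physician density to roughly a 44.5% higher prevalence.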

Figure 4.32. Diagnostic plots of the log-linear regression

The diagnostic, residual, and Q-Q plots show that the residuals are approximately normally distributed.

4.6.2. Result of Fractional Probit Model

Table 4.17: Parameter estimates for the fractional probit regression model

                                                                          95% confidence limits
Variables            Parameter estimate   Standard error   z-value   Pr > |z|   Low        High
cons                 -3.030               0.360            -8.41     0.000      -3.735     -2.324
Health expenditure   -0.22                0.015            -1.44     0.149      -0.510      0.008
GDP                   0.00001             0.000006          2.26     0.024*      0.000001   0.00003
Population age        0.026               0.008             3.45     0.001*      0.011      0.414
Urban Population     -0.004               0.002            -1.61     0.108      -0.008      0.0008
Physician density     0.182               0.039             4.66     0.000*      0.012      0.259
MYSC                  0.125               0.012             0.99     0.324      -0.012      0.037

Wald Chi (6) = 169.35, Prob > chi2 = 0.0000, Pseudo R2 = 0.0417
Pseudo likelihood = -10.819

*significant variables

The fractional probit (also known as the fractional response estimator) uses the proportion of diabetes prevalence, which was first calculated as the percentage of diabetes prevalence divided by the total population for each country. The fractional response estimator fits models to continuous data between zero and one using a probit, logit, or heteroskedastic probit specification. The result of the fractional probit with the six variables (Model 3) shows that GDP, population age, and physician density are significant, with p-values of 0.024, 0.001, and 0.000 respectively.

The parameter estimates show that for every one-unit increase in GDP, we expect the probit index of the diabetes proportion to increase by 0.00001 on average, holding all other variables constant. Also, for every one-unit increase in physician density, we expect the probit index to increase by 0.182 on average, holding all other variables constant.
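In a fractional probit model the coefficients act on the probit index, and the expected proportion is obtained by passing that index through the standard normal CDF. The sketch below illustrates the mapping with the intercept and the physician density coefficient from Table 4.17. Setting all other covariates to zero is a purely hypothetical scenario chosen for illustration, and `normal_cdf` is a helper written here, not part of the original analysis.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


intercept = -3.030          # "cons" in Table 4.17
physician_density = 0.182   # coefficient in Table 4.17

# Hypothetical scenario: every covariate equal to zero
baseline_index = intercept
baseline_proportion = normal_cdf(baseline_index)

# Same scenario with physician density raised by one unit
shifted_proportion = normal_cdf(baseline_index + physician_density)

print(f"baseline proportion: {baseline_proportion:.5f}")
print(f"after +1 physician density: {shifted_proportion:.5f}")
```

The change in the expected proportion depends on where the index starts, which is why the coefficient itself is not a constant change in prevalence.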

4.6.3. Poisson Regression Results

In this section, instead of working with the prevalence of diabetes, the dependent variable is transformed into a count and a Poisson regression is utilised. The Poisson regression model was estimated in SAS using PROC GENMOD, and the resulting parameter estimates are shown in Table 4.18.

Table 4.18: Parameter estimates of Poisson regression

                                                      Wald 95% Confidence Limits
Parameter            DF   Estimate   Standard error   Lower      Upper      Wald Chi-Square   Pr > Chi-square
Intercept            1    15.2881    0.0020           15.2842    15.2921    5.712E7           <.0001
Health expenditure   1    -0.0789    0.0001           -0.0791    -0.0787    984347            <.0001
GDP                  1    -0.0001    0.0000           -0.0001    -0.0001    2098808           <.0001
Population age       1    -0.0367    0.0000           -0.0369    -0.0367    739373            <.0001
Urbanization         1    -0.0021    0.0000           -0.0021    -0.0002    39139.5           <.0001
Physician density    1     1.0227    0.0001            1.0924     1.0929    5.937E7           <.0001
MYSC                 1     0.1975    0.0001            0.1973     0.1977    3606594           <.0001
Scale                0     1.0000    0.0000            1.0000     1.0000

AIC = 55551557.29, AICC = 55551560.80, BIC = 555515
Deviance = 1293835.13, Scaled Deviance = 1293839.13, Pearson Chi-Square = 1670118.01, with 43 degrees of freedom

*significant variables

Table 4.18 shows the parameter estimates obtained by maximum likelihood. All the predictors were significant, with p-values < 0.05. For a one-unit increase in physician density, the log of the expected count is expected to increase by 1.0227 units, holding the other variables in the model constant.

In contrast, for a one-unit increase in urbanization, the log of the expected count is expected to decrease by 0.0021 units, holding the other variables in the model constant.
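Since the Poisson model uses a log link, exponentiating a coefficient turns the change in the log of the expected count into a multiplicative rate ratio. The following sketch applies this to the two coefficients discussed above; the dictionary and variable names are illustrative.

```python
import math

# Coefficients from Table 4.18 (log link)
poisson_coefficients = {
    "physician_density": 1.0227,
    "urbanization": -0.0021,
}

# exp(beta) is the factor by which the expected count is multiplied
# for a one-unit increase in the predictor, other variables held constant
rate_ratios = {
    name: math.exp(beta) for name, beta in poisson_coefficients.items()
}

for name, ratio in rate_ratios.items():
    print(f"{name}: rate ratio = {ratio:.3f}")
# physician_density: rate ratio = 2.781
# urbanization: rate ratio = 0.998
```

Read this way, a one-unit increase in physician density multiplies the expected count by about 2.78, while a one-unit increase in urbanization multiplies it by about 0.998.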

4.6.4. Negative binomial model result

Table 4.19. Parameter estimates of Negative binomial model

                                                      Wald 95% Confidence Limits
Parameter            DF   Estimate   Standard error   Lower      Upper      Wald Chi-Square   Pr > Chi-square
Intercept            1    13.9574    2.774             9.8857    18.0290    45.14             <.0001*
Health expenditure   1    -0.1388    0.0934           -0.3236     0.0459    2.17              0.1407
GDP                  1    -0.0001    0.0001           -0.0003     0.0000    3.79              0.0517*
Population age       1     0.0052    0.0478           -0.0885     0.0989    0.01              0.9130
Urbanization         1    -0.0166    0.0176           -0.0511     0.0179    0.89              0.3463
Physician density    1     1.2683    0.5245            0.2403     2.2962    5.85              0.0156*
MYSC                 1     0.1940    0.1128           -0.0270     0.4150    2.96              0.0854
Dispersion           0     1.3729    0.2345            0.9823     1.9189

AIC = 1478.18, AICC = 1481.69, BIC = 1493.47
Deviance = 1.3999, Pearson Chi-Square = 1.4128, with 43 degrees of freedom

*significant variables

The output summarised in Table 4.19 includes the model information and the criteria for assessing goodness of fit. The number of observations is 50, and the link function is log. The Pearson chi-square ratio is 1.4128 and the deviance ratio is 1.3999, each with 43 degrees of freedom. A dispersion ratio greater than one is evidence of over-dispersion, so both ratios point to some over-dispersion in the data, while the non-significant Pearson statistic suggests that the model fits the data adequately. The AIC, BIC, and AICC are 1478.18, 1493.47, and 1481.69, respectively. Finally, the algorithm for the parameter estimates in Table 4.19 converged, implying that a solution was found.

GDP and physician density were the significant variables in this model, with p-values of 0.0517 and 0.0156, respectively, although GDP is only marginally significant at the 5% level. The coefficient estimate for physician density is 1.2683 (an increase compared with the Poisson regression result), with SE = 0.5245, a 95% CI of (0.2403, 2.2962), which is wider than in the Poisson regression, a p-value of 0.0156, and a Wald chi-square statistic of 5.85. This means that for a one-unit increase in physician density, the log of the expected diabetes count increases by 1.2683. For GDP (p-value = 0.0517), each one-unit increase is associated with a decrease of 0.0001 in the log of the expected diabetes count. Thus, under the negative binomial model, physician density per 1000 persons has a significant positive relationship with the diabetes count on the continent, GDP has a marginally significant negative relationship, and population age is not significant (p-value = 0.9130).
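The Wald statistics quoted for physician density can be reproduced from the estimate and its standard error alone. This is a minimal sketch, assuming the usual large-sample normal approximation behind Wald tests; the variable names are illustrative.

```python
import math

# Physician density row of Table 4.19
estimate = 1.2683
std_error = 0.5245

# Wald chi-square (1 df) is the squared z-ratio
z_ratio = estimate / std_error
wald_chi_square = z_ratio ** 2

# Upper-tail chi-square(1) probability equals the two-sided normal p-value
p_value = math.erfc(abs(z_ratio) / math.sqrt(2.0))

# 95% Wald confidence limits (1.959964 is the 97.5% normal quantile)
z_crit = 1.959964
ci_low = estimate - z_crit * std_error
ci_high = estimate + z_crit * std_error

print(f"Wald chi-square = {wald_chi_square:.2f}")  # 5.85
print(f"p-value = {p_value:.4f}")                  # 0.0156
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

The computed chi-square and p-value match the table, and the confidence limits agree with the reported (0.2403, 2.2962) up to rounding of the inputs.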

In addition, there is an estimate of the dispersion coefficient (often called alpha) of 1.3729. A Poisson model is one in which this alpha value is constrained to zero. In this study, the estimated alpha has a 95% CI of (0.9823, 1.9189), which does not include zero, suggesting that the negative binomial model is more appropriate than the Poisson model. An estimate greater than zero suggests over-dispersion (variance greater than the mean), and an estimate less than zero suggests under-dispersion, which is rare.
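The role of alpha can be illustrated through the variance function. The sketch below assumes the common NB2 parameterisation, in which the variance is the mean plus alpha times the squared mean, so that alpha = 0 recovers the Poisson case. The mean values are hypothetical and chosen only to show the pattern.

```python
alpha = 1.3729  # dispersion estimate from Table 4.19


def nb_variance(mean, dispersion):
    """Variance under the NB2 parameterisation: mean + dispersion * mean^2.
    With dispersion = 0 this reduces to the Poisson variance (the mean)."""
    return mean + dispersion * mean ** 2


# Hypothetical expected counts, for illustration only
for mean in (1.0, 10.0, 100.0):
    poisson_var = nb_variance(mean, 0.0)
    negbin_var = nb_variance(mean, alpha)
    print(f"mean={mean:>6.1f}  Poisson var={poisson_var:>8.1f}  "
          f"NB var={negbin_var:>10.2f}")
```

The gap between the two variances grows with the square of the mean, which is why a Poisson model fitted to large counts can badly understate the standard errors.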

Because the Poisson regression shows signs of over-dispersion, the negative binomial model is used to relax the assumptions stated earlier; it uses a different probability distribution that allows for more variability in the data.

From the document: Investigating the spatial distribution of diabetes in Africa using both classical and Bayesian approaches (pages 161-167)