Figure 4.31. Diagnostic plots of the linear regression
This section compares the results of the different transformations implemented, which are:
Log taken on dependent variable only
Log taken on dependent variable and all the independent variables
Fractional probit regression.
4.6.1. Results of Log Linear Regression Model
Table 4.16: Log-linear regression, dependent variable transformed (Model 2)
Coefficients         Estimate   Standard error   t-value   Pr(>|t|)
Intercept            -0.797     0.877            -0.948    0.368
Health expenditure   -0.033     0.031            -1.081    0.285
GDP                  0.00004    0.00002          1.568     0.124
Population age       0.041      0.019            2.219     0.031*
Urbanization         -0.006     0.005            -1.416    0.164
Physician density    0.368      0.155            2.369     0.022*
MYSC                 0.036      0.037            0.970     0.337
Residuals:
Min        1Q         Median     3Q        Max
-1.43218   -0.27361   -0.07971   0.22681   0.96483
Residual standard error: 0.4876 on 47 degrees of freedom
Multiple R-squared: 0.5143, Adjusted R-squared: 0.4523
F-statistic: 8.295 on 6 and 47 DF, p-value: 0.00000380
*significant variables
The log-linear regression with the six variables (Model 2) shows that population age and physician density are significant, with p-values of 0.031 and 0.022, respectively.
R² = 0.5143, which shows that 51.43% of the variance in the log of diabetes prevalence is explained by the independent variables in the data set. The overall model fit, F(6, 47) = 8.295 with p-value < 0.001, leads us to conclude that the independent variables jointly predict the dependent variable.
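As a quick consistency check, the F-statistic and adjusted R-squared can be recomputed from the R-squared and the degrees of freedom (6 and 47) reported in Table 4.16. The sketch below uses the standard OLS identities and assumes a model with an intercept; the function names are illustrative and not part of the original analysis.

```python
def f_statistic(r_squared: float, df_model: int, df_resid: int) -> float:
    """Overall F-statistic of an OLS fit, computed from R-squared."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


def adjusted_r_squared(r_squared: float, df_model: int, df_resid: int) -> float:
    """Adjusted R-squared for a model that includes an intercept."""
    n_minus_1 = df_model + df_resid  # equals n - 1 when an intercept is fitted
    return 1.0 - (1.0 - r_squared) * n_minus_1 / df_resid


# Values reported in Table 4.16 (Model 2).
R2, DF_MODEL, DF_RESID = 0.5143, 6, 47

f_value = f_statistic(R2, DF_MODEL, DF_RESID)
adj_r2 = adjusted_r_squared(R2, DF_MODEL, DF_RESID)
print(f"F({DF_MODEL}, {DF_RESID}) = {f_value:.3f}")
print(f"Adjusted R-squared = {adj_r2:.4f}")
```

Both recomputed values agree with the table to the reported precision, which supports reading the test as F on 6 and 47 degrees of freedom.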
From the parameter estimates in Table 4.16, for every one-unit increase in population age, we expect the log of diabetes prevalence to increase by 0.041 on average, holding all other variables constant. Similarly, for every one-unit increase in physician density, we expect the log of diabetes prevalence to increase by 0.368 on average, holding all other variables constant.
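Because the dependent variable in Model 2 is log-transformed, its coefficients are often easier to read as percentage changes in prevalence. The following is a minimal sketch of that conversion, assuming the natural logarithm was used (the text does not state the base); `percent_change` is an illustrative helper, not part of the original analysis.

```python
import math


def percent_change(coefficient: float, delta: float = 1.0) -> float:
    """Percent change in the untransformed outcome for a `delta`-unit
    increase in a predictor, when the outcome was modelled on the
    natural-log scale (assumption: natural log)."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


# Significant coefficients from Table 4.16.
coefficients = {"Population age": 0.041, "Physician density": 0.368}

for name, beta in coefficients.items():
    print(f"{name}: {percent_change(beta):.1f}% change per one-unit increase")
```

For small coefficients the percentage change is close to 100 times the coefficient, but for larger ones, such as physician density, the exponentiated form differs noticeably.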
Figure 4.32. Diagnostic plots of the log-linear regression
The diagnostic, residual, and Q-Q plots show that the residuals are approximately normally distributed.
4.6.2. Result of Fractional Probit Model
Table 4.17: Parameter estimates for fractional probit regression model
Variables            Parameter   Standard   z-value   Pr > |z|   95% confidence limits
                     estimate    error                           Low        High
cons                 -3.030      0.360      -8.41     0.000      -3.735     -2.324
Health expenditure   -0.22       0.015      -1.44     0.149      -0.510     0.008
GDP                  0.00001     0.000006   2.26      0.024*     0.000001   0.00003
Population age       0.026       0.008      3.45      0.001*     0.011      0.414
Urban Population     -0.004      0.002      -1.61     0.108      -0.008     0.0008
Physician density    0.182       0.039      4.66      0.000*     0.012      0.259
MYSC                 0.125       0.012      0.99      0.324      -0.012     0.037
Wald chi2(6) = 169.35, Prob > chi2 = 0.0000, Pseudo R2 = 0.0417
Pseudo likelihood = -10.819
*significant variables
The fractional probit (also known as the fractional response estimator) used diabetes prevalence expressed as a proportion, which was first calculated as the percentage of diabetes prevalence divided by the total population of each country. The fractional response estimator fits models for continuous outcomes between zero and one using probit, logit, or heteroskedastic probit specifications. The fractional probit with the six variables (Model 3) shows that GDP, population age, and physician density are significant, with p-values of 0.024, 0.001, and 0.000, respectively.
The parameter estimates show that for every one-unit increase in GDP, the probit index of the diabetes prevalence proportion is expected to increase by 0.00001 on average, holding all other variables constant. Also, for every one-unit increase in physician density, the probit index is expected to increase by 0.182 on average, holding all other variables constant.
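Probit coefficients act on a linear index that is mapped to a proportion through the standard normal CDF, so the effect on the proportion itself depends on where the index sits. The sketch below illustrates this mapping with the physician density estimate from Table 4.17; the index value of -1.5 is a hypothetical choice for illustration only and does not come from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def predicted_proportion(index: float) -> float:
    """Fitted proportion implied by a probit linear index x'b."""
    return normal_cdf(index)


def marginal_effect(index: float, beta: float) -> float:
    """Change in the fitted proportion per unit change in a predictor,
    evaluated at the given linear index."""
    return normal_pdf(index) * beta


BETA_PHYSICIAN = 0.182      # physician density estimate, Table 4.17
HYPOTHETICAL_INDEX = -1.5   # illustrative x'b value, NOT from the study

print(f"Fitted proportion: {predicted_proportion(HYPOTHETICAL_INDEX):.4f}")
print(f"Marginal effect:   {marginal_effect(HYPOTHETICAL_INDEX, BETA_PHYSICIAN):.4f}")
```

The marginal effect is the coefficient scaled by the normal density at the index, which is why it is always smaller in magnitude than the coefficient itself.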
4.6.3. Poisson Regression Results
In this section, instead of working with the prevalence of diabetes, the dependent variable is transformed into a count and a Poisson regression is used. The Poisson regression model was estimated in SAS using PROC GENMOD, and the parameter estimates are shown in Table 4.18.
Table 4.18: Parameter estimates of Poisson regression
Parameter            DF   Estimate   Standard   Wald 95% Confidence   Wald         Pr > Chi-Square
                                     error      Limits                Chi-Square
Intercept            1    15.2881    0.0020     15.2842    15.2921    5.712E7      <.0001
Health expenditure   1    -0.0789    0.0001     -0.0791    -0.0787    984347       <.0001
GDP                  1    -0.0001    0.0000     -0.0001    -0.0001    2098808      <.0001
Population age       1    -0.0367    0.0000     -0.0369    -0.0367    739373       <.0001
Urbanization         1    -0.0021    0.0000     -0.0021    -0.0002    39139.5      <.0001
Physician density    1    1.0227     0.0001     1.0924     1.0929     5.937E7      <.0001
MYSC                 1    0.1975     0.0001     0.1973     0.1977     3606594      <.0001
Scale                0    1.0000     0.0000     1.0000     1.0000
AIC = 55551557.29, AICC = 55551560.80, BIC = 555515
Deviance = 1293835.13, Scaled Deviance = 1293839.13, Pearson Chi-Square = 1670118.01, with 43 degrees of freedom
*significant variables
Table 4.18 shows the parameter estimates, obtained by maximum likelihood. All the predictors were significant, with p-values < 0.05. For a one-unit increase in physician density, the log of the expected count increases by 1.0227, holding the other variables in the model constant.
Conversely, for a one-unit increase in urbanization, the log of the expected count decreases by 0.0021, holding the other variables in the model constant.
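Since the Poisson model uses a log link, exponentiating a coefficient gives the multiplicative change in the expected count (the rate ratio), which is often easier to interpret than a change in the log count. A minimal sketch using two estimates from Table 4.18 follows; `rate_ratio` is an illustrative helper name.

```python
import math


def rate_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Multiplicative change in the expected count for a `delta`-unit
    increase in a predictor under a log link."""
    return math.exp(coefficient * delta)


# Estimates from Table 4.18.
poisson_estimates = {"Physician density": 1.0227, "Urbanization": -0.0021}

for name, beta in poisson_estimates.items():
    print(f"{name}: rate ratio = {rate_ratio(beta):.4f}")
```

A rate ratio above one corresponds to a positive coefficient and an increase in the expected count; a ratio below one corresponds to a decrease.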
4.6.4. Negative binomial model result
Table 4.19. Parameter estimates of Negative binomial model
Parameter            DF   Estimate   Standard   Wald 95% Confidence   Wald         Pr > Chi-Square
                                     error      Limits                Chi-Square
Intercept            1    13.9574    2.774      9.8857     18.0290    45.14        <.0001*
Health expenditure   1    -0.1388    0.0934     -0.3236    0.0459     2.17         0.1407
GDP                  1    -0.0001    0.0001     -0.0003    0.0000     3.79         0.0517*
Population age       1    0.0052     0.0478     -0.0885    0.0989     0.01         0.9130
Urbanization         1    -0.0166    0.0176     -0.0511    0.0179     0.89         0.3463
Physician density    1    1.2683     0.5245     0.2403     2.2962     5.85         0.0156*
MYSC                 1    0.1940     0.1128     -0.0270    0.4150     2.96         0.0854
Dispersion           0    1.3729     0.2345     0.9823     1.9189
AIC = 1478.18, AICC = 1481.69, BIC = 1493.47
Deviance = 1.3999, Pearson Chi-Square = 1.4128, with 43 degrees of freedom
*significant variables
The output in Table 4.19 begins with the model information and the criteria for assessing goodness of fit. The number of observations is 50, and the link function is log. The Pearson chi-square is 1.4128 and the deviance statistic is 1.3999, both with 43 degrees of freedom, and the dispersion ratio is as given. A dispersion ratio above one is evidence of over-dispersion, whereas a value close to one indicates that the variance is adequately modelled. Both ratios here are above one, so the deviance statistic still points to some over-dispersion, while the non-significant Pearson statistic suggests that the model adequately fits the data. The AIC, BIC, and AICC are 1478.18, 1493.47, and 1481.69, respectively. Finally, the algorithm for the parameter estimates in Table 4.19 has converged, implying that a solution was found.
GDP and physician density were the significant variables in this model, with p-values of 0.0517 and 0.0156, respectively; GDP is only marginally significant at the 5% level. The coefficient estimate for physician density is 1.2683 (an increase compared with the Poisson regression result), with SE = 0.5245, a 95% CI of (0.2403, 2.2962), which is wider than in the Poisson regression, a p-value of 0.0156, and a Wald chi-square statistic of 5.85. This means that for a one-unit increase in physician density, the log of the expected diabetes count increases by 1.2683. For GDP (p-value = 0.0517), each one-unit increase is associated with a decrease of 0.0001 in the log of the expected diabetes count. With the negative binomial model, therefore, physician density per 1000 persons has a significant positive relationship with diabetes prevalence on the continent, GDP has a marginally significant negative relationship, and population age is not significant (p-value = 0.9130).
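The Wald statistics for physician density in Table 4.19 can be reproduced from the estimate and its standard error alone. The sketch below assumes the usual normal-approximation Wald test with one degree of freedom; `wald_test` is an illustrative helper and not the routine used in the original SAS analysis.

```python
import math


def wald_test(estimate: float, std_error: float, z_crit: float = 1.959964):
    """Return the Wald chi-square (1 df), its p-value, and the 95% interval."""
    z = estimate / std_error
    chi_square = z * z
    # For 1 degree of freedom, P(chi2 > z^2) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = z_crit * std_error
    return chi_square, p_value, (estimate - half_width, estimate + half_width)


# Physician density row of Table 4.19.
chi_square, p_value, (lower, upper) = wald_test(1.2683, 0.5245)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

The recomputed chi-square, p-value, and interval match the table row (5.85, 0.0156, and 0.2403 to 2.2962) up to rounding of the inputs.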
In addition, the estimated dispersion coefficient (often called alpha) is 1.3729. A Poisson model is one in which alpha is constrained to zero. In this study, the 95% CI for the estimated alpha does not include zero, which suggests that the negative binomial model is more appropriate than the Poisson model. An estimate greater than zero suggests over-dispersion (variance greater than the mean), and an estimate less than zero suggests under-dispersion, which is rare.
Given the sign of over-dispersion in the Poisson regression, the negative binomial model is used to relax the assumptions stated earlier and to use a different probability distribution that allows for more variability in the data.
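The extent of over-dispersion in the Poisson fit can be gauged by dividing the goodness-of-fit statistics in Table 4.18 by their degrees of freedom. The sketch below does this and also shows how the dispersion estimate inflates the variance, assuming the common NB2 form (variance = mean + alpha × mean²); the mean count of 100 is a hypothetical value chosen only for illustration.

```python
def dispersion_ratio(statistic: float, df: int) -> float:
    """Goodness-of-fit statistic divided by its degrees of freedom."""
    return statistic / df


def negative_binomial_variance(mean: float, alpha: float) -> float:
    """NB2 variance function (assumed form); alpha = 0 gives the Poisson variance."""
    return mean + alpha * mean ** 2


# Poisson fit statistics from Table 4.18 (43 degrees of freedom).
pearson_ratio = dispersion_ratio(1670118.01, 43)
deviance_ratio = dispersion_ratio(1293835.13, 43)
print(f"Poisson Pearson/df  = {pearson_ratio:.1f}")
print(f"Poisson deviance/df = {deviance_ratio:.1f}")

# Variance implied by the dispersion estimate in Table 4.19 at a
# hypothetical mean count of 100 (illustrative value, not from the study).
ALPHA = 1.3729
HYPOTHETICAL_MEAN = 100.0
print(f"Poisson variance: {negative_binomial_variance(HYPOTHETICAL_MEAN, 0.0):.1f}")
print(f"NB2 variance:     {negative_binomial_variance(HYPOTHETICAL_MEAN, ALPHA):.1f}")
```

Ratios in the tens of thousands for the Poisson model, against values near 1.4 for the negative binomial model, are consistent with the large drop in AIC between Tables 4.18 and 4.19.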