Investigating the spatial distribution of diabetes in Africa using both classical and Bayesian approaches.

Many studies find that the magnitude and geographic distribution of diabetes prevalence are heterogeneous (Barker, 2011; Tuomi, 2014; Schwitzgebel, 2014). According to IDF (2015), the North America and Caribbean (NAC) region has the highest prevalence of diabetes compared to other IDF regions.

Spatial Clustering of diabetes

However, we are aware of no previous studies of the spatial clustering of diabetes and its correlates across Africa as a continent using meta-data. This chapter reviewed previous studies on the risk factors and prevalence of diabetes.

Exploratory data analysis

Shapiro–Wilk Normality Test
Spearman Rho Correlation Test
Ordinary Least Squares Regression (OLS)
Geographically Weighted Regression (GWR)

If the relationship between two variables appears to be linear, a straight line can be fitted to the data to model the relationship. One aim of this study is to determine the relationship between diabetes prevalence (the response variable) and the risk factors (the predictors).
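The straight-line fit described above can be sketched as follows. This is a minimal illustration assuming a single predictor; the function name `fit_line` and the toy values are hypothetical and are not taken from the study's data.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy values: one predictor and one response.
predictor = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(predictor, response)
```

With several predictors, as in the study, the same least-squares idea applies, but the coefficients are obtained by solving the normal equations rather than by this two-variable shortcut.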

Cluster Analysis

Definition

GWR allows visualization of stimulus-response relationships and of whether these relationships vary in space, and it also accounts for spatial autocorrelation of variables (Mitchell, 2012; Fotheringham et al., 2002). However, GWR has some limitations, which include multicollinearity and the approaches to calculating goodness-of-fit statistics (Charlton and Fotheringham, 2009; Leung et al., 2000; Wheeler and Tiefelsdorf, 2005).

Cluster Criteria for Continuous Data

Clustering Techniques

K-Means Clustering

This study uses K-means, which is one of the oldest and most widely used clustering algorithms (Tango, 2010). The K-means algorithm attempts to minimize the sum of squared distances, or sum of squared errors (SSE), between objects and their cluster centroids.
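The minimization of the sum of squared errors (SSE) can be illustrated with a short sketch of the standard iterative (Lloyd-type) K-means procedure. The function names, the starting centroids and the toy points below are hypothetical; this is not the software or data used in the study.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def kmeans(points, centroids, max_iter=100):
    """Alternate the assignment step and the centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # The mean of the members minimizes that cluster's SSE.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid unchanged
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, labels, sse = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, so in practice the procedure is usually repeated from several starting points and the solution with the smallest SSE is kept.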

K-Means Algorithm

Specifically, we investigate how best to update a cluster centroid to minimize the cluster's SSE. The goal in Ward's method is to aggregate cases into groups so that the variance within a group is minimized.

Conclusion

In this method, the average similarity of the cluster is measured at each step, and the difference between each instance within a cluster and the average similarity is calculated and squared. The method is used to predict and judge the number of clusters that should be retained or considered.

Spatial Analysis

Introduction
Classification of Disease Clustering
Spatial Neighbours and Weights
Spatial Autocorrelation Tests
General Test for Detecting Spatial Clustering (Moran I Indices and Local Indexes)

The correlograms of the Moran's I statistic are used to determine the appropriate number of neighbors or distance. The purpose of the LISA test is generally to detect local areas with similar values.
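As an illustration of the global statistic underlying these tests, the sketch below computes Moran's I from a vector of values and a spatial weights matrix. It assumes a simple binary contiguity matrix; the function name `morans_i` and the four-region example are hypothetical and do not reproduce the study's weights or data.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all spatial weights.
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of regions.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical example: four regions in a row, each adjacent to its neighbours.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(values, weights)
```

A positive value indicates that similar values tend to be neighbours; significance is usually judged against the normality or randomization assumption rather than from the raw value alone.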

Distribution for Count data

Modelling Count Data
The Poisson distribution
The Negative Binomial Distribution
The Fractional Probit Model
Regression Model for Count Data

The Poisson assumption that the variance equals the mean implies that the distribution does not allow the variance to adjust independently of the mean. When the data are overdispersed, one distribution that can provide a better fit is the negative binomial distribution.
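A crude way to see whether count data depart from the Poisson mean-variance relationship is to compare the sample variance with the sample mean. The sketch below is only a rough screening device, not a formal test, and the counts are hypothetical values chosen for illustration.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 under a Poisson model)."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts; a ratio well above 1 suggests overdispersion.
counts = [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0]
ratio = dispersion_index(counts)
```

In a regression setting the comparison should be made conditional on the covariates, for example through the residual deviance, so this marginal ratio is only a first indication.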

Principles of Bayesian Methods

The Likelihood
Choice of Prior Distribution
The Prior Distribution
Non-Informative Priors
Conjugate Prior
Bayesian Linear Regression model
Bayesian Poisson Regression
Bayesian Negative-Binomial

Here 𝑝(𝜃|𝑦) is called the posterior probability function of 𝜃 given the observed data, and 𝑔(𝜃) is the prior probability function of 𝜃. By examining the kernel of the product of the likelihood and the prior, the conjugate prior can be identified. In the Bayesian approach, the posterior distribution contains all the relevant information because it combines the prior information with the likelihood of the data.

From equation 3.56, the right-hand side is the product of the likelihood and the prior distribution.
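For orientation, a generic statement of Bayes' theorem in the notation used here is given below. The likelihood symbol f(y | θ) is an assumption, since the excerpt does not show the symbol used in the original equation, and the numbering of the original equation is not reproduced.

```latex
p(\theta \mid y)
  \;=\; \frac{f(y \mid \theta)\, g(\theta)}{\int f(y \mid \theta)\, g(\theta)\, d\theta}
  \;\propto\; f(y \mid \theta)\, g(\theta)
```

The denominator does not depend on θ, which is why the posterior is proportional to the product of the likelihood and the prior.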

Bayesian Markov Chain Monte Carlo (MCMC) Methods: Parameter Estimation

Gibbs Sampler versus Metropolis-Hastings

There are advantages and disadvantages to using the Gibbs sampler and Metropolis-Hastings methods. Lawson (2013) and Lesaffre and Lawson (2012) stated that the Gibbs sampler provides one new value for each estimate at each iteration. On the other hand, each Metropolis-Hastings iteration is usually faster, but the proposed value is not guaranteed to be accepted. Using a more global proposal in the Metropolis-Hastings algorithm can provide higher efficiency, but the disadvantage lies in the choice of proposal distribution.

M-H is also easy to understand and can be implemented in other software such as SAS, Stata and SPSS.
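The accept-or-reject behaviour of Metropolis-Hastings can be sketched with a random-walk sampler. The example below assumes a standard normal target and a symmetric normal proposal purely for illustration; the function names, step size and seed are hypothetical and are not the models or settings used in the study.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis(n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis sampler; returns the draws and acceptance rate."""
    rng = random.Random(seed)
    x = start
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal around x
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current value
    return samples, accepted / n_samples


samples, acceptance_rate = metropolis(20000)
sample_mean = sum(samples) / len(samples)
```

The `step` argument plays the role of the proposal choice discussed above: a very small step gives high acceptance but slow exploration, while a very large step explores globally but is rejected more often.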

Monte Carlo Error

Assessing MCMC Convergence Diagnostic

The second method, known as the window estimator, is based on the expression for the variance of autocorrelated samples given by Roberts (1996). However, a multi-chain convergence diagnostic provides evidence of consistency across different subspaces (Lawson et al., 2003). Here the focus is on the MC error, which measures the variation in the mean of the parameter of interest due to simulation.
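One common way to approximate the MC error of a posterior mean is the batch-means approach, sketched below. This is offered as an assumption about a typical estimator, since the excerpt names only the window estimator explicitly; the function name and the short chain are hypothetical.

```python
import math
import statistics


def batch_means_mcse(chain, n_batches):
    """Monte Carlo standard error of the chain mean using batch means."""
    batch_size = len(chain) // n_batches
    means = [
        statistics.mean(chain[b * batch_size:(b + 1) * batch_size])
        for b in range(n_batches)
    ]
    # Variance of the batch means, scaled by the number of batches.
    return math.sqrt(statistics.variance(means) / n_batches)


# Hypothetical, deliberately tiny chain split into four batches of two.
chain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mcse = batch_means_mcse(chain, 4)
```

In real use the chain would contain thousands of draws, and the batches need to be long enough that their means are approximately uncorrelated.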

The MCMC diagnostics of the model output explained above are discussed below:

Comparison of Bayesian and Frequentist Approaches

This graph is an indication of the stability in the mentioned quantiles, which also implies convergence. Frequentist measures such as the p-value and confidence interval continue to dominate research, especially in the life sciences. However, in the current era of powerful computers and big data, Bayesian methods have undergone a tremendous renaissance in research.

In the long run, an important theoretical result is that for every decision rule there is a Bayesian decision rule that performs at least as well.

Conclusion

The main criticism of Bayesian methods centers on the subjective prior, which can often lead to different posteriors and conclusions, as opposed to the straightforward, objective frequentist approach. However, the decision rule combines both the expected and the observed data into a statistical framework for making decisions. In a decision rule, the frequentist will consider the expected variable given a hypothesis, whereas the Bayesian combines this expectation with a posterior distribution.

Introduction

Data Exploratory Analysis Results
Summary Statistics of Diabetes and predictors
Spatial Pattern of Diabetes and Predictors
Spatial distribution of socio economic variables
Normality Tests

Figure 4.1 shows that the prevalence of diabetes is not uniformly distributed across the continent. Figure 4.4, which maps the prevalence of obesity (covariate 2), shows that obesity is most prevalent in the North and South Africa regions. In general, the highest dispersion of urban population growth is seen in the North Africa and Central Africa regions, while the East Africa region has the lowest dispersion.

The overview shows that the North, South and some parts of the Central African regions have the highest HDI values, while the West African region has the lowest on the continent.

Table 4.1: Summary statistics of variables in the data set

Clustering of Diabetes prevalence in Africa

K-Means Results
Ward Clustering Result
Centroid Linkage Result
Conclusion

Cluster 1 contains Algeria, Comoros, Djibouti, Equatorial Guinea, Gabon, Libya, Morocco, South Africa, South Sudan, Sudan and Tunisia. The Ward dendrogram shows five clusters, with cluster 1 comprising eleven countries: Algeria, Comoros, Djibouti, Equatorial Guinea, Gabon, Libya, Morocco, South Africa, South Sudan, Sudan and Tunisia. Five clusters were created, with cluster 1 again comprising these eleven countries.

High urbanization: Gabon, Libya, Djibouti, Algeria, Tunisia, Cabo Verde, Republic of Congo, Sao Tome and Principe, South Africa, Morocco, Mauritania and Gambia.

Table 4.4: Summary of clustering of diabetes in Africa using K-means clustering techniques (columns: Countries in clusters, n, Mean, Minimum, 1st)

Spatial Analysis Results

Global Moran Indices (GMI) and Moran Indices (MI)
Methodological steps in model building

However, the value of Moran's I in Table 4.9 (under randomization) agrees with the p-values in Table 4.10. The Moran's I statistics in Tables 4.8 and 4.9 show that diabetes is clustered across the continent. Because of the limitations of OLS, the global GWR model in Table 4.12 produced, for each variable, the estimated coefficient with its minimum, maximum and mean, the standard error, the t-value and the confidence intervals.

The final model in Table 4.13 is the improved model, which shows a perfect fit. Variables not in the table became fixed terms, leaving obesity, urban population growth, age dependency ratio, physician density, HDI and GNI as the final parameters in the model. The output gives the mean, standard deviation and confidence interval.

Table 4.8: Global Moran’s Indices and spatial patterns of diabetes prevalence in Africa for 2015 (under assumption of normality)

Local Indicator of Spatial Association (LISA)

Conclusion

Similarly, the countries with high diabetes prevalence whose neighboring countries have low prevalence (HL) are Ivory Coast, Liberia, Senegal, Chad, Eritrea, Tanzania, Lesotho, Libya and Swaziland. Countries with a low prevalence of diabetes whose neighboring countries have a high prevalence (the LH category) are indicated in light blue, and are South Africa, Mali, Sierra Leone, the Republic of the Congo, the Congo DRC, Ethiopia, Uganda and Madagascar. By region, the countries with high diabetes prevalence surrounded by countries with low prevalence are South Africa and Madagascar (Southern Africa), Eritrea and Tanzania (East Africa), Libya (North Africa), Chad (Central Africa), and Senegal, Ivory Coast and Liberia (West Africa).

Likewise, the countries with low diabetes prevalence surrounded by countries with high prevalence are Lesotho (Southern Africa), Ethiopia and Uganda (East Africa), Morocco (North Africa), Congo DRC (Central Africa) and Mali (West Africa).

Figure 4.24. LISA Cluster map of diabetes prevalence in West Africa region

Results of Classical Statistics

Similarly, we expect that for every one percent increase in health care spending, diabetes prevalence will decrease by 0.049 on average, holding all other variables constant. For every one percent increase in population age, we also expect diabetes prevalence to increase by 0.101 on average, holding all other variables constant. For each one percent increase in urban population, we also expect diabetes prevalence to decrease by 0.056 on average, holding all other variables constant.

Finally, for every one percent increase in physician density, we expect the prevalence of diabetes to increase by an average of 2.947, holding all other variables constant.

Figure 4.29. Diagnostic plot of linear regression

Model Transformation

Results of Log Linear Regression Model
Result of Fractional Probit Model
Poisson Regression Results
Negative binomial model result

Finally, for every one-unit increase in physician density, we expect the prevalence of diabetes to increase by an average of 0.368, holding all other variables constant. The parameter estimates indicate that for each increase in the proportion of GDP, we expect the prevalence of diabetes to increase by an average of 0.00001, holding all other variables constant. Also, for each increase in the proportion of physician density, we expect the prevalence of diabetes to increase by an average of 0.183, holding all other variables constant.

This means that for a one unit increase in physician density, the expected log count of diabetes prevalence will increase by 1.2683.

Table 4.16: Log-linear regression, dependent variable transformed (Model 2) (columns: Coefficients, Estimates, Standard error, t-value, Pr(>|t|))

Results of Bayesian Analysis of Factors Affecting Diabetes

Results of Bayesian Linear Model
MCMC Diagnosis Plots of Diabetes and Risk Factors for Bayesian Linear Model
Results of Bayesian Poisson Model of Diabetes and Predictors
Convergence Diagnosis Results of Bayesian Poisson Model of Diabetes and Covariates
Results of Bayesian Negative Binomial Model of Diabetes and Predictors

The upper frame shows that MCMC (Gamerman sampling algorithm) convergence gradually reaches the appropriate mean value of the posterior for all the variables. The lower left frame displays the autocorrelation and indicates the approximate independence of the sampling from the posterior distribution. The posterior mean is at the top of the distribution, showing where most of the mass rests.

In the Bayesian paradigm, a 95% credible interval is interpreted as follows: given the posterior distribution, there is a 95% probability that the true parameter lies within the given interval.
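An equal-tailed credible interval can be read directly from posterior draws by taking the 2.5% and 97.5% quantiles. The sketch below assumes that such draws are available from an MCMC run; the function names and the evenly spaced stand-in draws are hypothetical.

```python
def quantile(sorted_draws, prob):
    """Quantile of already sorted draws using linear interpolation."""
    position = prob * (len(sorted_draws) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    fraction = position - lower
    return sorted_draws[lower] * (1 - fraction) + sorted_draws[upper] * fraction


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Hypothetical stand-in for posterior draws: the values 0, 1, ..., 100.
draws = [float(i) for i in range(101)]
low, high = credible_interval(draws)
```

Unlike a frequentist confidence interval, the resulting bounds are a direct probability statement about the parameter, conditional on the model and the prior.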

Table 4.20: Results of Bayesian Linear Regression

Conclusion

Likewise, the significance of physician density means that access to the health care system, health skills and the number of doctors serving the population play an important role in the prevalence of diabetes. Finally, urbanization was found to be significant, which implies that as countries continue to urbanize, people's changing living standards have a positive effect on the prevalence of diabetes; that is, urbanization increases the prevalence of diabetes in Africa.

Introduction

Discussion

In general, the northern and southern regions have a high prevalence of diabetes and of risk factors for diabetes. Countries with a high prevalence of diabetes whose neighboring countries have a low prevalence are Ivory Coast, Liberia, Senegal, Chad, Eritrea, Tanzania, Lesotho, Libya and Swaziland. Similarly, countries with a low prevalence of diabetes whose neighboring countries have a high prevalence (the low-high category) are South Africa, Mali, Sierra Leone, Republic of the Congo, Congo, Ethiopia, Uganda and Madagascar.

We show that countries with high GDP, high obesity, high MYSC, high physician density, high population age and high urbanization also have high diabetes prevalence rates.

Conclusion

This can be attributed to their relatively low levels of the risk factors GDP, urban population growth and physician density. First, the diabetes prevalence estimates were modeled from survey data, and we did not account for the survey sampling uncertainty or the biases and limitations of the survey. Finally, the use of cluster analysis and the LISA map helps to strengthen the results. Given the heterogeneity of the population and of diabetes in Africa, the cluster analysis groups homogeneous countries into the same clusters; that is, countries with similar patterns in their levels of incidence and risk factors can come together to form joint policies.

The primary strength of this study is the use of GWR in the analysis of the spatial distribution and correlates of diabetes incidence.

Implications and Recommendations

Further Study

Finally, further research can be done on the contribution of ethnicity to the spread of diabetes in Africa.

Prevalence and risk factors of diabetes mellitus in the rural region of Mali (West Africa): a practical approach.

Effects of urbanization on lifestyle and diabetes prevalence in indigenous Asian Indian population.

Impact of poverty on the prevalence of diabetes and its complications in urban south India.