Exploratory analysis based on food consumption score as a continuous variable

5.1 Exploratory analysis

5.1.1 Exploratory analysis based on food consumption score as a continuous variable

This sub-section inspects the distribution of the food consumption score (FCS) observations from a sample of 9 220 households. The sample includes 395 (4.3%) missing values. Although the missing values are not of significant importance to this study, some readers may be interested in their distribution by state. Most (81%) of the missing values are found in five states: Western Bahr el-Ghazal (21%), Jonglei (19%), Warrap (15%), Northern Bahr el-Ghazal (15%) and Unity (11%).

The first step is to examine common descriptive statistics from the sample, as shown in Table 5.2 below. Although the range, minimum and maximum of the food consumption score take awkward values, this has not affected the mean significantly. The extreme values at the lower end of the distribution, i.e., food consumption scores of 0.5 to 3.5, arise from only 86 cases (about 1 per cent of the data). The values at the upper end, i.e., between 100 and 105, amount to only 0.4 per cent (37 cases). Because the sample is reasonably large, these extreme cases are not a cause for concern. The dataset therefore appears promising for fitting a model, although this will have to be confirmed later, after fitting a model with all the selected predictors.

Table 5.2 Measures of central tendency for the food consumption score variable

Statistic                   Value
N                           8 825^a
Range                       104.5
Minimum                     0.5
Maximum                     105.0
Mean                        40.9
Standard error of mean      0.23
Standard deviation          21.66

a Valid cases only (i.e. sample size less missing cases).
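The figures in Table 5.2 and the surrounding text can be cross-checked with a little arithmetic. The sketch below is only an illustration using the reported values; the variable names are hypothetical and it does not reproduce the original SPSS run.

```python
import math

# Values reported in the text and in Table 5.2
n_sample = 9220            # households in the sample
n_missing = 395            # missing FCS values
n_valid = n_sample - n_missing
minimum, maximum = 0.5, 105.0
std_dev = 21.66

# Range is the maximum minus the minimum
value_range = maximum - minimum

# Standard error of the mean is the standard deviation over sqrt(N)
std_error = std_dev / math.sqrt(n_valid)

# Shares of missing values and of the low and high extreme values
share_missing = n_missing / n_sample
share_low = 86 / n_valid
share_high = 37 / n_valid

print(n_valid, value_range, round(std_error, 2))
print(f"{share_missing:.1%} {share_low:.1%} {share_high:.1%}")
```

The results agree with the reported N, range and standard error, and with the quoted shares of roughly 4.3, 1 and 0.4 per cent.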

The second analysis plots the observations as a frequency distribution histogram. The plot (see Figure 5.2) shows that the distribution of food consumption scores tends to normality, with a slight skew to the lower end of the scale. This is encouraging for the rest of the analysis, as it indicates a fair distribution of observations of the response variable.

Figure 5.2: Frequency distribution histogram of food consumption score (FCS)

Another promising indication of normality comes from the P-P plot, otherwise known as a probability plot. A P-P plot plots the cumulative proportions (observed cumulative probabilities) of a variable against the cumulative proportions (expected cumulative probabilities) of any of a number of test distributions. P-P plots are generally used to determine whether the distribution of a variable matches a given distribution: clustering of the points around the straight line indicates that the variable matches the specified test distribution (SPSS, 2006). Figure 5.3 confirms that the distribution of food consumption scores tends to normality.

Figure 5.3: Normal P-P Plot of Food Consumption Score
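The construction of a normal P-P plot can be sketched in a few lines. This is a minimal illustration rather than the SPSS procedure itself: the function name `pp_points` and the sample scores are hypothetical, the test distribution is a normal fitted with the sample mean and standard deviation, and Blom's plotting position is assumed for the observed cumulative proportions.

```python
from statistics import NormalDist, mean, stdev

def pp_points(values):
    """Return observed and expected cumulative probabilities for a normal P-P plot."""
    data = sorted(values)
    n = len(data)
    # Test distribution: normal with the sample mean and standard deviation
    fitted = NormalDist(mean(data), stdev(data))
    # Observed cumulative proportions (assumed: Blom's plotting position)
    observed = [(rank - 0.375) / (n + 0.25) for rank in range(1, n + 1)]
    # Expected cumulative probabilities under the fitted normal
    expected = [fitted.cdf(x) for x in data]
    return observed, expected

# Hypothetical food consumption scores, for illustration only
scores = [21.0, 28.5, 35.0, 38.5, 42.0, 47.5, 55.0, 63.0, 71.5]
observed, expected = pp_points(scores)
for o, e in zip(observed, expected):
    print(f"{o:.3f}  {e:.3f}")
```

Plotting `expected` against `observed` gives the P-P plot; the closer the points lie to the diagonal, the closer the variable is to the test distribution.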

The Detrended Plot option of SPSS plots the observed cumulative values against their deviations from the expected values. Deviations are calculated by subtracting the expected value from the observed value. As can be observed in Figure 5.4, the deviations of the observed from the expected cumulative probabilities behave well: the points follow a tight linear pattern, and negative and positive deviations appear balanced. The points are also densely clustered, indicating very little variability. Neither of the two probability plots shows visible outliers. In addition, the deviations lie between -0.04 and 0.04, a very narrow interval.
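The deviations shown in a detrended P-P plot follow directly from the definition above (observed minus expected). The sketch below is an illustration under stated assumptions: `detrended_pp` and the sample scores are hypothetical, the expected values come from a normal distribution fitted to the sample, and Blom's plotting position is assumed for the observed proportions.

```python
from statistics import NormalDist, mean, stdev

def detrended_pp(values):
    """Return (observed, deviation) pairs for a detrended normal P-P plot."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    points = []
    for rank, x in enumerate(data, start=1):
        observed = (rank - 0.375) / (n + 0.25)  # assumed plotting position (Blom)
        expected = fitted.cdf(x)                # expected cumulative probability
        points.append((observed, observed - expected))
    return points

# Hypothetical food consumption scores, for illustration only
scores = [21.0, 28.5, 35.0, 38.5, 42.0, 47.5, 55.0, 63.0, 71.5]
points = detrended_pp(scores)
largest = max(abs(dev) for _, dev in points)
print(f"largest absolute deviation: {largest:.3f}")
```

A narrow band of deviations around zero, with no trend and a balance of positive and negative values, is what supports the reading of approximate normality.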

The two illustrations provide evidence that the data are close to a normal distribution. It is nevertheless worthwhile examining an equally popular method for exploring the distribution of an interval-scale variable such as the FCS. The simple boxplot summarises a single numeric variable within categories of another variable (SPSS, 2006).

Figure 5.4: Detrended Normal P-P Plot of Food Consumption Score

Boxplots are used in descriptive exploratory analysis to show the median and quartiles, as well as outliers and extreme values, for a scale variable. The method uses the interquartile range (the difference between the 75th and 25th percentiles), which corresponds to the length of the box. In the boxplot of Figure 5.5, each box shows the median, quartiles and extreme cases of the food consumption scores within a state. Values between 1.5 and 3 box lengths from the upper or lower edge of the box are classified as outliers. Values more than 3 box lengths from the upper or lower edge of the box are classified as extreme. The boxplot of Figure 5.5 shows that four states (Unity, Western Bahr el-Ghazal, Central Equatoria and Eastern Equatoria) are free of outliers or extreme cases. Three states have one extreme case each and three have between 4 and 6 extreme cases.

Figure 5.5: Box plots of Food Consumption Score (FCS) by state
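The outlier and extreme-value rule described above can be written as a short classification routine. This is a sketch under assumptions: the function name and the sample values are hypothetical, and the quartiles are computed with Python's "inclusive" method, which may differ slightly from the hinges SPSS uses for its boxplots.

```python
from statistics import quantiles

def classify_boxplot_values(values):
    """Split values into outliers (1.5 to 3 box lengths) and extremes (over 3)."""
    # Quartiles; the "inclusive" method is an assumption, not the SPSS definition
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    outliers, extremes = [], []
    for x in values:
        # Distance beyond the nearest box edge (zero or negative inside the box)
        distance = max(q1 - x, x - q3)
        if distance > 3 * box_length:
            extremes.append(x)
        elif distance > 1.5 * box_length:
            outliers.append(x)
    return outliers, extremes

# Hypothetical scores for one state, for illustration only
state_scores = [30, 32, 34, 36, 38, 40, 42, 44, 46, 70, 100]
outliers, extremes = classify_boxplot_values(state_scores)
print(outliers, extremes)  # [70] [100]
```

Applying the function separately to the scores of each state would reproduce the kind of per-state counts of outliers and extreme cases read off Figure 5.5.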

A boxplot of food consumption scores by food consumption group, shown in Figure 5.6, reveals 6 extreme values in the ‘Good Food Consumption’ group. These extreme cases are numbered in the plot and will be removed if later examination of the model reveals evidence of a lack of fit.

Figure 5.6: Box plot of food consumption score by food consumption group

In general, the four types of exploratory analysis of the distribution of food consumption scores as a numeric variable suggest that a good model can be fitted, although the boxplots clearly reveal extreme cases. The results from the frequency distribution histogram, the P-P plots and the point statistics (mean, median and range) indicate that the dataset is of good quality. The large sample size is likely to have played a vital role in limiting the influence of the relatively few (21) extreme cases on the model.
