
5.7 Techniques for data analysis and interpretation

5.7.2 Statistical approach

Neuman (2000:313) notes that statistics is a tool to collect, organize and analyse numerical facts or observations. During data analysis, it is vital for the researcher to choose an appropriate statistical approach which is relevant to the nature of the survey conducted. In this study, two types of statistical methods will be used: descriptive statistics and inferential statistics.

5.7.2.1 Descriptive statistics

Descriptive statistics involve the organizing and summarizing of quantitative data (Lind, Marchal and Mason, 2004:6). This relates to the description and/or summary of the data obtained from a group of individuals (Huysamen, 1998:4). Neuman (2000:313) observes that descriptive statistics present information in a convenient, usable, and understandable form; in this study information is presented in graphic forms such as bar and pie charts.

Bar charts (Willemse, 2009:29-34):

· Bars can be horizontal or vertical;

· Various levels of complexity are possible; and

· Generally, all bars are the same width, with the length corresponding to the frequency.

According to Willemse (2009:34-35), pie charts are commonly used to depict differences between people/groups/spending; various levels of complexity are also possible.

According to Neuman (2000:317), descriptive statistics describe numerical data and are categorized by the number of variables involved: univariate, bivariate and multivariate, for one, two, and three or more variables respectively. Univariate and bivariate analyses are most appropriate for descriptive statistics (Lind et al., 2001). Univariate analysis is concerned with measures of central tendency and measures of dispersion. The most appropriate measure of central tendency for interval data is the mean, and the most appropriate measure of dispersion for interval data is the standard deviation. Bivariate analysis concerns the measurement of two variables at a time (Lind et al., 2004:6). Descriptive statistics are a useful tool as they summarize the results of an experiment, thereby also allowing for more constructive research after more detailed analysis. Descriptive data analysis aims to describe the data by investigating the distribution of scores on each variable and by determining whether the scores on different variables are related to each other (Lind et al., 2001:6).

Lind et al. (2001:457-460) further clarify that linear correlation measures the degree of association between two interval variables. The level and direction of any relationship between the perception and expectation variables are therefore described by the correlation coefficient calculated by correlating the two variables.

a) Frequencies

The simplest way of summarizing data for individual variables so that specific values can be read is to use a table (frequency distribution). For descriptive data the table summarizes the number of cases; this represents the frequency (Saunders et al., 2000:338). In SPSS, the statistical programme employed for this study, a frequency distribution is “obtained by selecting and analysing descriptive frequencies which usually include a percentage for each value” (Fielding and Gilbert, 2002:49).
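The study uses SPSS for this tabulation, but the same frequency-and-percentage summary can be sketched in a few lines of Python (the response values below are invented purely for illustration):

```python
from collections import Counter

# Hypothetical survey responses for a single categorical variable
responses = ["agree", "agree", "neutral", "disagree", "agree"]

counts = Counter(responses)
n = len(responses)

# Frequency distribution: each value, its frequency, and its percentage
for value, freq in counts.most_common():
    print(f"{value:10s} {freq:3d} {100 * freq / n:6.1f}%")
```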

b) Central tendency

There are three types of averages, which are collectively known as “measures of central tendency”. These are the mean, the median and the mode (Tredoux and Durkheim, 2002:40). According to Denscombe (1998:193), the choice of a measure of central tendency may be limited by the nature of the measurements involved. If nominal-scale data are involved, the mode is the only measure of central tendency which can be sensibly used. With ordinal data, the median is usually preferred since it takes into account not only the frequencies of the various categories but also their rank. The mean is usually preferred in the case of numerical data. In the case of skewed distributions the median may be preferred to the mean. These approaches are briefly discussed in (i) to (iii) below.

i) Mean

The mean (also known as the arithmetic mean or average) of a collection of scores is the sum of the scores divided by the number of scores (Huysamen, 1998:44).

According to Nichols (1995:124), the mean is a kind of average for interval variables (the total of the sample values divided by the number of values in the sample), which one can use as a guessed mean (a round number, close to the true mean) to simplify calculation of the standard deviation. The mean is what most people have in mind when, in common parlance, they think of “the average”. It is a measure of central tendency in the sense that it describes what would result if there were a totally equal distribution of values, that is, if the total amount or frequencies were spread evenly (Denscombe, 1998:193).
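As a minimal sketch of this definition (the scores below are invented):

```python
# Hypothetical interval-scale scores
scores = [4, 5, 3, 5, 4, 3, 5]

# The mean: sum of the scores divided by the number of scores
mean = sum(scores) / len(scores)
```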

ii) Median

The median of a collection of scores is the middlemost score when the scores have been arranged in ascending or descending order (Huysamen, 1998:43). Nichols (1995:124) holds that the median is a kind of average for interval variables: it is the middle value when the data are arranged in order of size. Where the set of data has an even number of values, the median is the mean of the two middle values.

The median is the mid-point of a range. Calculation of the median is straightforward in that values in the data are placed in either ascending or descending rank order and the point which lies in the middle of the range is the median (Denscombe, 1998:194).
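A minimal sketch of this calculation, covering both the odd and even cases described above (the helper name `median` and the data are illustrative):

```python
def median(values):
    """Middlemost score once the values are placed in rank order."""
    ordered = sorted(values)            # ascending rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # odd count: the middle value
    # Even count: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2
```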

The advantages of using the median as a measure of central tendency include:

· It can be used with ordinal data as well as interval and ratio data;

· It is an ordinal operation; the median is not affected by extreme values, i.e. “outliers”;

· The median works well with a low number of values; and

· It is possible to establish that exactly half the values are above the median and half the values are below the median (Denscombe, 1998:194-195).

iii) Mode

According to Huysamen (1998:42), the mode of a collection of scores is the score value which has the highest frequency of occurrence. In an ungrouped frequency distribution the mode is the score value with the highest frequency. When social researchers use the mode as a measure of central tendency they have in mind the most fashionable or popular figure: the mode is the value which is most common, and identifying it simply consists of seeing which value among a set occurs most frequently (Denscombe, 1998:194-195).
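A sketch of identifying the modal value by frequency of occurrence (the `mode` helper and data are illustrative):

```python
from collections import Counter

def mode(values):
    """Score value with the highest frequency of occurrence."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]
```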

In this study the data will be presented in tables, bar charts, pie charts, line graphs, box plots etc., using frequencies and percentages.

5.7.2.2 Inferential statistics

The process of generalizing from findings based on a sample of the population is called statistical inference (Bless and Higson-Smith, 1995:86). Inferential statistics are used to make inferences regarding the properties (e.g. the mean) of the population on the basis of the results obtained from appropriately-selected samples of the population (Huysamen, 1998:4). Inferential statistical analysis is concerned with the testing of hypotheses (Bless and Higson-Smith, 1995:86). The independent t-test is the most appropriate parametric test for a comparison of means; it tests for any significant difference between the means of two groups. In this study, primary data were collated and analyzed, and comments and conclusions are based on the results (Lind et al., 2001:348-351).

Inferential statistical analysis allows a researcher to draw conclusions about populations from sample data. The services of a qualified statistician were used during the analysis and presentation of data; however, the researcher retains ownership of the overall research study and its findings.

a) Analysis of the t-test

Kerr (2004:61) notes that the t-test is a parametric test that makes the following assumptions:

1. The level of measurement of the dependent variable must be at least interval.

2. The dependent variable is normally distributed in the population.

3. The variances of the samples are not significantly different.
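Assuming the pooled-variance (Student) form of the independent t-test, which matches assumption 3 above, the test statistic can be sketched as follows. The function name and sample data are invented for illustration; in practice a package such as SPSS also supplies the p-value:

```python
from math import sqrt

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for two independent samples.

    Assumes interval data, normality, and roughly equal variances,
    i.e. the three assumptions listed above.
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    # Sample variances (dividing by n - 1)
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    # Pooled variance: both samples combined, weighted by degrees of freedom
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / sqrt(pooled * (1 / na + 1 / nb))
```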

Measurement

According to Steyn, Smit, Du Toit and Strasheim (1994:7), measurements include the items reflected in Table 5.6.

Table 5.6: Types of measurements

Measurement              Description
Nominal measurement      A classification of responses (e.g. gender).
Ordinal measurement      Achieved by ranking (e.g. the use of a 1 to 5 rating scale from ‘strongly agree’ to ‘strongly disagree’).
Interval measurement     Achieved if the differences between values are meaningful (e.g. temperature).
Ratio measurement        The highest level, where both the differences and the absence of a characteristic (zero) are meaningful (e.g. distance).

Nominal and ordinal measurements were analyzed in this study to reach conclusions and formulate recommendations.

b) Chi-square test

Willemse (2009:209-214) notes that a chi-square test is any statistical hypothesis test in which the test statistic has a chi-square distribution when the null hypothesis is true, or where the probability distribution of the test statistic (assuming the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough. More specifically, a chi-square test for independence evaluates statistically significant differences between proportions for two or more groups in a data set.
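A sketch of the test-of-independence statistic for a contingency table, with expected counts derived from the row and column totals. The function name and table are illustrative; the p-value would then come from the chi-square distribution with the appropriate degrees of freedom:

```python
def chi_square_statistic(table):
    """Chi-square statistic for a two-dimensional contingency table.

    Expected count for each cell = row total * column total / grand total;
    the statistic sums (observed - expected)^2 / expected over all cells.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```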

c) Factor analysis

SPSS Statistics 17.0 (2008) stipulates that factor analysis seeks to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables. Factor analysis can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis, for example, to identify collinearity prior to performing a linear regression analysis.
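The study itself performs this in SPSS; purely as an illustration, and assuming scikit-learn as a substitute tool, factor extraction on simulated data (six observed variables driven by two latent factors) might look like this:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulated data: two latent factors generate six observed variables
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
observed = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

# Extract two factors; components_ holds one row of loadings per factor
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(observed)
```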

d) Cross tabulations

Data generated from observations of two related categorical variables (bivariate data) can be summarized using a table known as a two-way frequency table or contingency table. Such a table is used to determine whether or not there is an association between the variables (Willemse, 2009:28).
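A minimal sketch of building such a contingency table from paired categorical observations (the data are invented for illustration):

```python
from collections import defaultdict

# Hypothetical paired observations of two categorical variables
observations = [("male", "agree"), ("male", "disagree"),
                ("female", "agree"), ("female", "agree")]

# Two-way frequency table: rows by the first variable, columns by the second
table = defaultdict(lambda: defaultdict(int))
for gender, response in observations:
    table[gender][response] += 1
```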

e) Linear regression

Linear correlation measures the degree of association between two interval variables. The level and direction of any relationship between the perception and expectation variables are therefore described by the correlation coefficient calculated by correlating the two variables. Pearson’s r-value gives an indication of the strength of the relationship between the variables: the closer the value is to ±1, the stronger the relationship, positive or negative, and the closer the value is to 0, the weaker the relationship (Lind et al., 2004:457-460).
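A sketch of the Pearson correlation coefficient from its definition, the covariance of the two variables divided by the product of their standard deviations (the function name and data are illustrative):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```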

f) Testing reliability

According to SPSS Statistics 17.0 (2008), reliability refers to the property of a measurement instrument that causes it to give similar results for similar inputs.

Cronbach's alpha is a measure of reliability. Alpha is a lower bound for the true reliability of the survey. Mathematically, reliability is defined as the proportion of the variability in the responses to the survey that is the result of differences among the respondents; that is, answers to a reliable survey will differ because respondents have different opinions, not because the survey is confusing or has multiple interpretations. The computation of Cronbach's alpha is based on the number of items in the survey (k) and the ratio of the average inter-item covariance to the average item variance.

α = k(cov/var) / (1 + (k − 1)(cov/var))

Under the assumption that the item variances are all equal, the ratio simplifies to the average inter-item correlation, and the result is known as the standardized item alpha (or Spearman-Brown stepped-up reliability coefficient).

α = kr / (1 + (k − 1)r)

It is important to note that the standardized item alpha is computed only if inter-item statistics are specified (Willemse, 2009).
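The covariance form of the alpha formula can be sketched directly from a respondents-by-items score matrix; the function name and toy data below are illustrative:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha from rows of respondents' item scores.

    Uses the ratio of average inter-item covariance to average item
    variance: alpha = k*ratio / (1 + (k - 1)*ratio).
    """
    k = len(scores[0])              # number of items in the survey
    n = len(scores)                 # number of respondents
    items = list(zip(*scores))      # one column of scores per item
    means = [sum(col) / n for col in items]
    # Average item variance
    variances = [sum((x - m) ** 2 for x in col) / (n - 1)
                 for col, m in zip(items, means)]
    avg_var = sum(variances) / k
    # Average inter-item covariance over all distinct item pairs
    covs = []
    for i in range(k):
        for j in range(i + 1, k):
            cov = sum((a - means[i]) * (b - means[j])
                      for a, b in zip(items[i], items[j])) / (n - 1)
            covs.append(cov)
    avg_cov = sum(covs) / len(covs)
    ratio = avg_cov / avg_var
    return k * ratio / (1 + (k - 1) * ratio)
```

For perfectly consistent items the ratio equals 1 and alpha reaches its maximum of 1.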

g) Hypotheses tests: P-values and statistical significance

Jupp (2006:137) asserts that a hypothesis is “an untested assertion about the relationship between two or more variables. The validity of such an assertion is assessed by examining the extent to which it is, or is not, supported by data generated by empirical inquiry”. Bless and Higson-Smith (2000:154) state that a hypothesis is a tentative, concrete and testable explanation or solution to a research question.

Lind et al. (2004:348-351) affirm that inferential statistical analysis is concerned with the testing of hypotheses. The independent t-test is the most appropriate parametric test for a comparison of means; it tests for any significant difference between the means of two groups. Primary data were collated and analyzed, and comments and discussion are thereafter based on the results obtained. Inferential statistical analysis allows a researcher to draw conclusions about populations from sample data. The most important application of statistical theory on sampling distributions in the social sciences has been significance testing, or statistical hypothesis testing.

The researcher is interested in the outcome of a study of the management of the EPWP and its impact in terms of service delivery.

The traditional approach to reporting a result requires a statement of statistical significance. A p-value is generated from a test statistic. A significant result is indicated by "p < 0.05" (Lind et al., 2004:347).
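Assuming a standard-normal test statistic (the text does not specify which statistic is used), the two-tailed p-value and the conventional 0.05 threshold can be sketched with the standard library's complementary error function:

```python
from math import erfc, sqrt

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal test statistic z."""
    return erfc(abs(z) / sqrt(2))

# A result is conventionally reported as significant when p < 0.05,
# which for a normal statistic corresponds roughly to |z| > 1.96
def is_significant(z, alpha=0.05):
    return two_tailed_p(z) < alpha
```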