CHAPTER 4: RESEARCH METHODOLOGY
4.7 STATISTICAL ANALYSIS
Risk tolerance scale | E23 | Section E: Item E23
External events | F1-F16 | Section F: Items F1-F16
Political-legal | F1-F3 | Section F: Items F1-F3
Market fluctuations | F4-F9 | Section F: Items F4-F9
Crime | F10-F13 | Section F: Items F10-F13
Unemployment | F14-F16 | Section F: Items F14-F16
Source: Author compilation
The data were captured and coded electronically to minimise errors, as shown in Table 4.5. The codebook is presented in Annexure B. The next section discusses the statistical analysis techniques used to analyse and present the data obtained from the questionnaire.
[Figure: statistical analysis techniques employed in the study]
Source: Ghoodjani (2018)
The statistical methods depicted above are described in the subsequent sections.
4.7.1 Reliability analysis
According to Quinlan (2011:306), a reliability test is conducted to determine whether the variables are free from random errors. Hair et al. (2010) define reliability as the extent to which repeated trials of a measure produce the same or similar results. A measure is therefore considered reliable if it produces the same outcome when tested repeatedly (Schiffman et al., 2010:58). Similarly, reliability is characterised as a hallmark of high-quality measurement that renders a research study trustworthy (Cant et al., 2005). Cant et al. (2005) further state that should any random errors be found in a study, its measures can be concluded to be unreliable.
There are three approaches to assessing reliability, namely test-retest reliability, alternative-forms reliability and internal consistency reliability. According to Iacobucci and Churchill (2010), test-retest reliability involves administering the same scale to the same participants on two separate occasions. The aim is to determine whether differences exist between the scores from the two observations; the presence of such differences indicates random error. Alternative-forms reliability requires that two equivalent forms of the instrument be administered to the same participants at two different times (Malhotra, 2010). This method measures how strongly the two sets of observations are correlated, even though they are measured with different forms and at different times. Because test-retest and alternative-forms reliability are time-consuming and expensive, they were not used in this study. Internal consistency reliability measures the consistency of a set of items when they are summated to form the complete scale (Quinlan et al., 2015:113). The most common measure of internal consistency reliability is the Cronbach alpha, which measures the average correlation between the summated items that make up the complete scale. The scale is judged by the value of alpha, which ranges from zero to one, where zero represents complete random error and one represents no random error.
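To make the computation concrete, the following minimal Python sketch implements the standard Cronbach alpha formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the summated scale); the respondent scores are hypothetical and serve only to illustrate the calculation.

```python
# A minimal sketch of the Cronbach alpha; the scores below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of the summated scale)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summated scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Rows are respondents, columns are items of one scale (hypothetical Likert scores).
scores = np.array([[4, 5, 4],
                   [3, 3, 2],
                   [5, 5, 5],
                   [2, 2, 2],
                   [4, 4, 3]])
print(round(cronbach_alpha(scores), 3))  # values near one indicate little random error
```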
4.7.2 Validity analysis
According to Hair et al. (2013:166) and Pallant (2008:7), a measure is considered valid if it measures what it is intended to measure. In addition, validity determines the extent to which differences in the observed data reveal actual differences in the characteristics under consideration, rather than random error. According to Maree (2011), there are three key guidelines to consider when assessing validity. Firstly, a measuring instrument cannot be valid if it is not reliable. Secondly, participants may answer the questionnaire in the way they think they are supposed to, giving responses they consider socially acceptable. Thirdly, participants may introduce bias by answering yes to all the questions.
There are three main types of methods used by researchers to validate their studies (Goundar, 2012). These methods include content validity, construct validity and criterion validity.
Content validity is assessed before data collection to ensure that the items included accurately represent the topic in question (Hair et al., 2013:167). In addition, Malhotra (2010) suggests that the scale items must be examined to determine whether they cover the entire domain of the construct. The questionnaire utilised in this study was checked and validated by the researcher and two study leaders.
Construct validity involves linking the measure to the theoretical foundations of a study (McDaniel & Gates, 2012). This suggests that the results provided by the measure should have a strong linkage with the theory underlying the study (Malhotra, 2010). In other words, there should be a close correlation between the results and the theoretical grounds (Quinlan, 2011).
Zikmund and Babin (2013) highlight that construct validity consists of convergent, discriminant and nomological validity. Convergent validity is the degree to which the measured variable correlates with other variables measuring the same theoretical variable (Iacobucci & Churchill, 2010). Discriminant validity refers to the degree to which the measured variable does not correlate with measures of different theoretical variables. Lastly, nomological validity is the degree to which the measure correlates, as theory predicts, with measures of different but related constructs (Remler & Van Ryzin, 2011).
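As an illustration, the sketch below checks convergent and discriminant validity through inter-item correlations, assuming pandas is available; the item names and scores are hypothetical.

```python
# A hypothetical check of convergent and discriminant validity via inter-item correlations.
import pandas as pd

items = pd.DataFrame({
    "tolerance_1": [4, 3, 5, 2, 4, 5],  # two items intended to measure risk tolerance
    "tolerance_2": [5, 3, 5, 2, 4, 4],
    "crime_1":     [3, 4, 4, 2, 3, 2],  # an item intended to measure a different construct
})

# Convergent validity: tolerance_1 and tolerance_2 should correlate highly.
# Discriminant validity: the tolerance items should correlate noticeably less with crime_1.
print(items.corr().round(2))
```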
Criterion validity reflects whether an instrument performs as expected in relation to a meaningful external criterion (Maree et al., 2011). Additionally, Hair et al. (2013) define criterion validity as the level of correlation between self-report measures and behavioural variables in the study. Criterion validity constitutes predictive and concurrent validity. Predictive validity indicates how well a measure predicts future performance, whereas concurrent validity determines the degree to which a self-report measure correlates with a certain behaviour measured at the same time (Jackson, 2008).
For this study, construct validity was assessed using the Cronbach alpha as the index. Content validity was also addressed through the inclusion of open-ended questions in the questionnaire. The following section discusses the descriptive analysis followed in the study.
4.7.3 Descriptive statistics
According to Churchill and Brown (2004), descriptive analysis is the simplifying, organising and interpreting of data to provide a meaningful analysis. Quinlan (2011) further highlights that data such as age, gender, race and income can be expressed as ratios, averages, summations and standard deviations using descriptive statistics. This study utilised measures of central tendency, dispersion and shape as the three classifications of descriptive statistics.
4.7.3.1 Measures of central tendency
The measures of central tendency consist of the mean, mode and median. The mean, also referred to as the average, is the sum of all values divided by the number of values (Kolb, 2008). The median is also known as the central point: it is identified as the middle value when the data is arranged in ascending or descending order. According to Maree (2011), the median is the 50th percentile in a data set and is suitable for ordinal data.
The mode is the value that occurs most frequently within the data set (Struwig & Stead, 2010). Maree (2011) warns that the mode can be an insufficient summary measure because a data set can have more than one value that appears most frequently.
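A minimal sketch of these three measures follows, using Python's standard library on a hypothetical set of policyholder ages; multimode is used because, as noted above, a data set can hold more than one mode.

```python
# Measures of central tendency on hypothetical policyholder ages.
import statistics

ages = [23, 31, 31, 36, 42, 42, 42, 55]

print(statistics.mean(ages))       # mean: sum of the values divided by their count
print(statistics.median(ages))     # median: the midpoint of the ordered data
print(statistics.multimode(ages))  # mode(s): the most frequent value(s); returned as a
                                   # list, since a data set can be multimodal
```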
4.7.3.2 Measures of dispersion
The measures of dispersion include the range, variance and standard deviation. Unlike measures of location, which focus on where the values are located, measures of dispersion examine the spread of the data (Manikandan, 2011). The range is computed by subtracting the smallest value from the largest value in the data set (Quinlan, 2011). The variance refers to the spread of the data, that is, the degree to which the values differ from the arithmetic mean. The standard deviation is the square root of the variance and likewise describes the spread in the data set: it captures the average amount by which the values deviate from the arithmetic mean.
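The sketch below computes the three measures of dispersion on the same hypothetical ages, again using only Python's standard library.

```python
# Measures of dispersion on the same hypothetical ages.
import statistics

ages = [23, 31, 31, 36, 42, 42, 42, 55]

data_range = max(ages) - min(ages)    # range: largest value minus smallest value
variance = statistics.variance(ages)  # sample variance: spread around the arithmetic mean
std_dev = statistics.stdev(ages)      # standard deviation: square root of the variance

print(data_range, round(variance, 2), round(std_dev, 2))
```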
4.7.3.3 Measures of shape
The measures of shape include kurtosis and skewness, which are useful when analysing the nature of a distribution (Petar & Sanja, 2010). The distribution of the data can also be assessed through a graphic presentation (Pallant, 2007). According to Quinlan (2011), kurtosis measures the flatness or peakedness of the data distribution. When the distribution is flatter than normal, kurtosis is negative, and a kurtosis of zero signifies a normal distribution (Malhotra, 2010). Skewness is a measure of asymmetry in a distribution; in other words, it reflects an unequal distribution in which the mean, mode and median do not coincide (Diedericks, 2015). According to Swanepoel (2006), a distribution is skewed to the right when it is positively skewed, while a distribution is skewed to the left when it is negatively skewed.
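As an illustration, the sketch below computes skewness and kurtosis for simulated data, assuming SciPy is available; SciPy's kurtosis defaults to the excess (Fisher) definition, under which a normal distribution scores close to zero.

```python
# Measures of shape on simulated normally distributed data.
import numpy as np
from scipy.stats import kurtosis, skew

data = np.random.default_rng(seed=1).normal(loc=50, scale=10, size=1000)

print(round(skew(data), 3))      # near zero for a symmetric distribution
print(round(kurtosis(data), 3))  # excess kurtosis: near zero for a normal distribution,
                                 # negative for a flatter-than-normal distribution
```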
This study employed descriptive statistics to determine the overall pattern and distribution of the data and to describe the participants' characteristics. Overall, the descriptive statistics used in this study consist of the mean, mode, median, standard deviation, skewness and kurtosis. The subsequent section provides an overview of the correlation analysis followed in this study.
4.7.4 Correlation analysis
Correlation analysis measures how two or more variables are associated with each other (Iacobucci & Churchill, 2010). Quinlan (2011:400) adds to this definition by stating that a correlation is a single number that describes the strength and direction of the relationship between two or more variables. The most commonly used measure of correlation in SPSS is the Pearson correlation coefficient (r), which measures the strength and direction of the relationship between two variables (Kumar et al., 2002). Pallant (2013) further highlights that the Pearson correlation coefficient can be used with two continuous variables, or with one continuous and one dichotomous variable.
The Spearman rank-order correlation, on the other hand, is a non-parametric measure of the degree of association between two variables. Both coefficients range from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation and 0 indicating no correlation between the variables (McDaniel & Gates, 2007).
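The minimal sketch below computes both coefficients, assuming SciPy is available; the two hypothetical score lists stand in for summated scale scores from the questionnaire.

```python
# Pearson and Spearman correlations on hypothetical summated scale scores.
from scipy.stats import pearsonr, spearmanr

risk_tolerance = [12, 15, 18, 22, 25, 28, 30, 34]
risk_perception = [30, 28, 27, 24, 22, 20, 19, 15]

r, p_pearson = pearsonr(risk_tolerance, risk_perception)      # linear association
rho, p_spearman = spearmanr(risk_tolerance, risk_perception)  # rank-order association

print(round(r, 3), round(rho, 3))  # both near -1 here: a strong negative correlation
```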
This study utilised both the Pearson and Spearman correlations to determine the relationships between the factors influencing the risk perception of insurance policyholders in Gauteng. The test of significance used in the study will be discussed in the following section.
Figure 4.3 below illustrates correlation coefficients using scatter plots.
Figure 4.3: Correlation coefficients using scatter plots (panels showing r = -1, r = 0 and r = 1)
Source: Gujarati (2010)
4.7.5 Test of significance
The t-test is used to determine whether there is a significant difference between the arithmetic means of two groups (Hair et al., 2008). In other words, it tests whether the arithmetic means differ from each other (Quinlan, 2015). According to Berndt and Petzer (2011), the significance level of the test is the total risk the researcher is willing to accept for the hypothesis under testing. When performing a test of significance, there must be a null and an alternative hypothesis. The null hypothesis (H0) represents no difference between the groups, while the alternative hypothesis (Ha) represents a difference between the groups. The p-value obtained from the t-test indicates whether or not the null hypothesis can be rejected. The t-test statistics were used to determine the differences in the perception of risk of insurance policyholders between males and females.
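A minimal sketch of this comparison follows, assuming SciPy is available; the two hypothetical groups mimic risk-perception scores for male and female policyholders.

```python
# Independent-samples t-test on hypothetical risk-perception scores.
from scipy.stats import ttest_ind

male_scores = [3.2, 3.8, 4.1, 3.5, 3.9, 4.0, 3.6]
female_scores = [3.9, 4.2, 4.5, 4.1, 4.4, 4.0, 4.3]

t_stat, p_value = ttest_ind(male_scores, female_scores)

# A p-value below the chosen significance level (commonly 0.05) leads to
# rejecting the null hypothesis of equal group means.
print(round(t_stat, 3), round(p_value, 3))
```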
4.7.6 Analysis of variance (ANOVA)
McDaniel and Gates (2013) indicate that the analysis of variance (ANOVA) determines the differences in arithmetic means across more than two independent groups. It utilises the F-value and the p-value to explore whether the means are significantly different: a sufficiently large F-value with a small p-value indicates that at least one group mean differs significantly from the others. The ANOVA was utilised in this study to determine whether there were any statistically significant differences in the participants' perceptions of risk across demographics, risk tolerance, endogenous factors, financial well-being and behavioural finance, and exogenous factors.
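The sketch below runs a one-way ANOVA across three hypothetical age groups, assuming SciPy is available.

```python
# One-way ANOVA on hypothetical risk-perception scores for three age groups.
from scipy.stats import f_oneway

under_30 = [3.1, 3.4, 3.0, 3.6, 3.3]
age_30_to_49 = [3.8, 4.0, 3.7, 4.1, 3.9]
age_50_plus = [4.2, 4.5, 4.3, 4.1, 4.4]

f_stat, p_value = f_oneway(under_30, age_30_to_49, age_50_plus)

# A small p-value indicates that at least one group mean differs significantly.
print(round(f_stat, 3), round(p_value, 3))
```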
4.7.7 Multiple linear regression
According to Quinlan (2011), a regression measures the degree to which one or more independent variables predict a dependent variable. This means that it tests whether there is a positive or negative relationship between the independent variables and the dependent variable. A multiple linear regression was utilised in this study to determine whether there were any statistically significant relationships between the participants' perceptions of risk and the determinant factors, namely demographics, risk tolerance, endogenous factors, financial well-being and behavioural finance, and exogenous factors.
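A minimal sketch of such a regression follows, assuming the statsmodels library is available; the simulated predictors stand in for two of the determinant factors and are purely illustrative.

```python
# Multiple linear regression on simulated stand-ins for the determinant factors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=42)
risk_tolerance = rng.normal(3.5, 0.5, size=100)
financial_wellbeing = rng.normal(3.0, 0.6, size=100)
risk_perception = (1.0 + 0.4 * risk_tolerance - 0.3 * financial_wellbeing
                   + rng.normal(0, 0.2, size=100))

X = sm.add_constant(np.column_stack([risk_tolerance, financial_wellbeing]))
model = sm.OLS(risk_perception, X).fit()

# The sign of each coefficient shows a positive or negative relationship with the
# dependent variable; its p-value shows whether the relationship is significant.
print(model.params.round(3))
print(model.pvalues.round(3))
```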
Before turning to the ethical considerations that were followed in this study, the next section outlines the statistical techniques utilised.
4.7.8 Outline of statistical techniques utilised
Table 4.6 below provides a summary of the statistical methods utilised to achieve the empirical objectives of the study.
Table 4.6: Summary of descriptive and inferential statistical methods utilised
Objective | Methods
Determine how demographic factors influence the risk identification and risk perceptions of insurance policyholders | Descriptive statistics; correlation; t-test; ANOVA
Determine the influence of risk tolerance on the risk perception of insurance policyholders | Descriptive statistics; correlation
Identify the endogenous factors that influence the risk perception of insurance policyholders in Gauteng | Correlation; regression
Identify the exogenous factors that influence the risk perception of insurance policyholders in Gauteng | Correlation; regression
Source: Author compilation