CHAPTER 4 RESEARCH DESIGN AND METHODOLOGY

4.10 STRUCTURAL EQUATION MODELLING

4.10.3 Assess measurement model reliability, validity and fit

Reliability refers to the degree to which findings remain consistent when tests are repeatedly carried out (Kumar, 2011:181; Malhotra, 2010:319). In addition, reliability demonstrates the degree to which a measurement is free from random error (Babin & Zikmund, 2016:280). Three approaches are used to assess reliability: test-retest reliability, alternative-forms reliability and internal consistency reliability (Malhotra, 2010:318). Test-retest reliability tests stability by measuring the same scale items on the same sample at two different points in time, each time under identical conditions (Zikmund & Babin, 2013:257). With alternative-forms reliability, two equivalent forms of the test are administered to the same sample at different time intervals (Malhotra, 2010:319; Shukla, 2008:84). Internal consistency tests whether all items within a scale measure the same underlying element (Aaker et al., 2011:270).

Furthermore, internal consistency can be measured by one of two methods, namely split-half reliability and the Cronbach alpha coefficient (Iacobucci & Churchill, 2010:259). The primary focus of split-half reliability is to measure attitudes towards a phenomenon (Kumar, 2011:184). The method involves splitting the items of the scale into two halves, each half producing a score that is then correlated with the other. A strong correlation between the two halves indicates high internal consistency (Rovai et al., 2014:578; Malhotra, 2010:319). However, the most common measure of internal consistency is the Cronbach alpha coefficient (Zikmund & Babin, 2013:257; Malhotra, 2010:319). The coefficient is the average of all possible split-half coefficients resulting from the different ways of splitting the scale items (Malhotra, 2010:319; McDaniel & Gates, 2010:253; Zikmund & Babin, 2013:257). This approach is typically used to assess the inter-item reliability of Likert-scale questions (Gliner et al., 2011:159). A Cronbach alpha value of 0.6 and above is appropriate, according to Zikmund and Babin (2013:257), but a value of 0.7 and above is preferable; a higher value indicates higher internal-consistency reliability. Cronbach's alpha will be used to assess the reliability of the data for this study.
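As a brief illustration, Cronbach's alpha can be computed directly from an item-score matrix using the standard k/(k-1) variance formulation; the Likert-scale responses below are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents, 3 Likert-scale items
scores = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
])
alpha = cronbach_alpha(scores)  # ≈ 0.918, above the preferred 0.7 cut-off
```

A value this high would indicate strong internal-consistency reliability under the thresholds cited above.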

In addition, composite reliability (CR) should be used if reliability is calculated for structural equation modelling (Afari, 2013:101). CR is the ratio of true-score variance to total score variance (Malhotra, 2010:733). When the measured CR value is 0.70 or above, the model is considered reliable; however, CR values between 0.60 and 0.70 are also considered acceptable, provided they are coupled with acceptable average variance extracted (AVE) values (Hair et al., 2014:619). The next section discusses the validity of the study.
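A minimal sketch of the CR calculation, assuming standardised factor loadings (so that each item's error variance is 1 - λ²); the loading values below are hypothetical:

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    assuming standardised loadings with error variance 1 - loading^2."""
    lam = np.asarray(loadings, dtype=float)
    errors = 1 - lam**2
    return lam.sum()**2 / (lam.sum()**2 + errors.sum())

# Hypothetical standardised loadings for one construct
cr = composite_reliability([0.82, 0.75, 0.69, 0.71])  # ≈ 0.83, above 0.70
```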

Validity demonstrates the accuracy of a measurement by determining whether a scale measures what it is supposed to measure (Burns & Bush, 2014:214; Hair et al., 2013:151). Validity guarantees that the scales are free of random and systematic errors (Feinberg et al., 2013:128). A scale is characterised as having perfect validity when it has zero measurement error (Shukla, 2008:82). However, if a scale lacks validity, incorrect inferences about the measure are likely to be made (Zikmund & Babin, 2013:258). There are three approaches for determining validity, namely content validity, criterion validity and construct validity (Malhotra, 2010:317).

Content validity is the degree to which the scale's content logically appears to represent what was intended to be measured (Zikmund & Babin, 2010:250). Content validity is evaluated by a subject expert or the researcher (Malhotra, 2010:320). Criterion validity assesses whether the measurement scale performs as expected in relation to comparable measuring instruments or standard measures (Shukla, 2008:82). Construct validity assesses whether the measurement tool logically reflects the underlying theory and connects it with the scales (McDaniel & Gates, 2010:256). Construct validity consists of three measures, namely convergent, discriminant and nomological validity (Sarstedt & Mooi, 2014:57).

Convergent validity calculates the degree of association amongst different measures developed to measure identical or similar constructs (Clow & James, 2014:271). Two methods can be used to measure convergent validity: the size of the factor loadings and the average variance extracted (AVE) (Malhotra, 2010:734). AVE values of 0.50 or higher are considered acceptable, whereas factor loadings above 0.50 are acceptable but should preferably be above 0.70 (Hair et al., 2014:618-619).
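Under the same standardised-loadings assumption as above, AVE is simply the mean of the squared loadings; a small sketch with hypothetical loadings:

```python
import numpy as np

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardised loadings for one construct."""
    lam = np.asarray(loadings, dtype=float)
    return (lam**2).mean()

# Hypothetical standardised loadings for one construct
ave = average_variance_extracted([0.82, 0.75, 0.69, 0.71])  # ≈ 0.55
convergent_ok = ave >= 0.50  # meets the acceptability threshold
```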

Additionally, Afari (2013:101) recommends that AVE be used to assess validity when structural equation modelling is employed. Discriminant validity, on the other hand, identifies the uniqueness of a measure; it therefore evaluates the lack of correlation between constructs or measures not developed to measure the same concept (Zikmund & Babin, 2010:251). With structural equation modelling, discriminant validity is established by comparing the correlation coefficients of the measurement model with the square root of the constructs' AVE values (Byrne, 2010:290-291). Discriminant validity occurs if the square root of the AVE value is greater than the associated correlation coefficients (Hair et al., 2014:620). Nomological validity is the degree to which distinct yet related constructs correlate in a theoretically expected manner (Remler & Van Ryzin, 2011:113; Malhotra, 2010:321). A Pearson's Product-Moment correlation analysis is generally used in structural equation modelling to assess the nomological validity of the measurement model (Malhotra, 2010:562; Hair et al., 2010:710). The Pearson's Product-Moment correlation coefficient (r) is most commonly used to assess the strength of the relationship between two metric variables (Malhotra, 2010:562). It ranges from -1 to +1; a perfect negative relationship between two variables is represented by -1 and a perfect positive relationship by +1 (Hair et al., 2008:286). A correlation coefficient of zero indicates that there is no relationship between two variables (Berndt & Petzer, 2011:239). Pallant (2013:139) explains that the strength of the relationship between the two variables depends on the magnitude of the correlation value.
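The discriminant-validity comparison described above can be sketched as a Fornell-Larcker-style check; the correlation matrix and AVE values below are hypothetical:

```python
import numpy as np

# Hypothetical AVE values and inter-construct correlation matrix
ave = np.array([0.61, 0.55, 0.58])
corr = np.array([
    [1.00, 0.42, 0.37],
    [0.42, 1.00, 0.51],
    [0.37, 0.51, 1.00],
])

sqrt_ave = np.sqrt(ave)  # values compared against the correlations

def discriminant_validity(corr, sqrt_ave):
    """True if each construct's sqrt(AVE) exceeds its correlation
    with every other construct."""
    k = len(sqrt_ave)
    return all(
        sqrt_ave[i] > abs(corr[i, j])
        for i in range(k) for j in range(k) if i != j
    )

ok = discriminant_validity(corr, sqrt_ave)  # True for these values
```

Here every square-rooted AVE (about 0.74-0.78) exceeds the largest inter-construct correlation (0.51), so discriminant validity would be supported.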

In this study, content and construct validity were used. First, the questionnaire was reviewed by three experienced researchers to ensure content validity. Secondly, construct validity was assessed by examining the convergent, discriminant and nomological validity of the measurement model.

Structural equation modelling is used to determine the relationships between a set of variables and to determine the overall fit of the model (Pallant, 2013:109). Consequently, once the reliability and validity of the measurement model have been established, the model fit indices should be assessed. Model fit is analysed by comparing the degree of similarity between the predicted covariance matrix and the observed covariance matrix (Malhotra, 2010:731). Measurement models should be evaluated for acceptable levels of the goodness-of-fit indices. There are three types of goodness-of-fit measures: absolute fit indices, incremental fit indices and parsimony fit indices (Hair et al., 2014:576).

Absolute fit indices evaluate how well a measurement model reproduces the observed data. These include goodness-of-fit measures, which consist of the goodness-of-fit index (GFI) and the adjusted goodness-of-fit index (AGFI), and badness-of-fit measures, which consist of the chi-square test (X2), the standardised root mean residual (SRMR) and the root mean square error of approximation (RMSEA) (Kline, 2011:195). The absolute fit indices and the suggested threshold values needed during evaluation are presented in Table 4-3.

Table 4-3: Absolute fit indices and recommended values

Measure   Description                                Recommended value

Absolute fit indices (Goodness-of-fit)
GFI       Goodness-of-fit index                      ≥ 0.90
AGFI      Adjusted goodness-of-fit index             ≥ 0.90

Absolute fit indices (Badness-of-fit)
X2        Chi-square                                 p ≥ 0.05
SRMR      Standardised root mean residual            ≤ 0.08
RMSEA     Root mean square error of approximation    ≤ 0.08
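As a minimal sketch, the thresholds in Table 4-3 can be checked programmatically; the fit statistics below are hypothetical values of the kind a SEM package would report:

```python
# Hypothetical fit statistics for a measurement model
fit = {"GFI": 0.93, "AGFI": 0.91, "chi_square_p": 0.08,
       "SRMR": 0.046, "RMSEA": 0.061}

# Recommended thresholds from Table 4-3
thresholds = {
    "GFI":          lambda v: v >= 0.90,
    "AGFI":         lambda v: v >= 0.90,
    "chi_square_p": lambda v: v >= 0.05,
    "SRMR":         lambda v: v <= 0.08,
    "RMSEA":        lambda v: v <= 0.08,
}

results = {name: check(fit[name]) for name, check in thresholds.items()}
acceptable = all(results.values())  # True only if every index passes
```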

Incremental fit indices assess how much better the estimated model fits relative to a baseline (null) model (Kline, 2011:196; Malhotra, 2010:731). The incremental fit index (IFI), the comparative fit index (CFI), the normed fit index (NFI) and the Tucker-Lewis index (TLI) all form part of the incremental fit indices (Malhotra, 2010:733). The incremental fit indices and the suggested threshold values needed during evaluation are presented in Table 4-4.

Table 4-4: Incremental fit indices and recommended values

Measure   Description                Recommended value

Incremental fit indices (Goodness-of-fit)
NFI       Normed fit index           ≥ 0.90
CFI       Comparative fit index      ≥ 0.90
TLI       Tucker-Lewis index         ≥ 0.90
IFI       Incremental fit index      ≥ 0.90

Lastly, parsimony fit indices are used to compare models of different complexity (Teo et al., 2013:14). Akaike's information criterion (AIC) and Bozdogan's consistent version of the AIC (CAIC) are taken into account when a researcher compares two or more models; the model with the smaller AIC and CAIC values demonstrates the better fit (Kline, 2011:220).
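The model comparison can be sketched with one common formulation of the AIC in SEM, chi-square plus twice the number of free parameters (exact formulations vary by software); all values below are hypothetical:

```python
def aic(chi_square, n_free_params):
    """One common SEM formulation: AIC = chi-square + 2 * free parameters."""
    return chi_square + 2 * n_free_params

# Hypothetical competing structural models
model_a = aic(chi_square=152.4, n_free_params=28)
model_b = aic(chi_square=180.9, n_free_params=22)
preferred = "A" if model_a < model_b else "B"  # smaller AIC is preferred
```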

In order to determine the model fit, this study applied absolute fit indices by means of the chi-square and SRMR, as well as incremental fit indices by means of the IFI, CFI and TLI. In addition, parsimony fit indices were used to compare two structural models by applying the AIC and CAIC to determine which model demonstrated the better fit.

The structural model can be specified as soon as the reliability and validity of the measurement model have been determined and the model fit has been established. However, if the measurement model does not show adequate validity, the tests should be refined and a new study planned, which means that the process is repeated from the beginning (Malhotra, 2010:729).