

Stage Five: Specifying the Structural Model

This stage involves specifying the structural model by assigning relationships from one construct to another according to the proposed theoretical model. Although the emphasis in this step is on the structural model, the measurement specifications must also be included for the estimation of the SEM model. The path diagram thus represents the measurement part together with the structural part of SEM in one model (Hair et al., 2010). By the end of this step, the model is ready for estimation. This is the test of the overall theory, comprising both the measurement relationships of the indicators to constructs and the hypothesized structural relationships between constructs (Hair et al., 2010).
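The notation below is not taken from the thesis itself; it is the standard LISREL-style formulation of the combined model that this stage produces, with the measurement and structural parts written side by side:

```latex
% Measurement model: observed indicators x and y load on the exogenous
% constructs (xi) and endogenous constructs (eta), with errors delta, epsilon.
\begin{aligned}
  \mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta}, \\
  \mathbf{y} &= \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}, \\
  % Structural model: hypothesized paths among the latent constructs.
  \boldsymbol{\eta} &= \mathrm{B}\boldsymbol{\eta} + \Gamma\boldsymbol{\xi} + \boldsymbol{\zeta}.
\end{aligned}
```

The first two equations carry the measurement part of the path diagram; the third carries the structural part, which encodes the hypothesized construct-to-construct relationships.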

Stage Six: Assessing Structural Model Validity

The final stage of SEM is to test the validity of the complete structural model together with its corresponding hypothesized relationships. If an acceptable fit is not achieved for the measurement model, model fit will not improve when the structural relationships are specified (Hair et al., 2010). The general guidelines outlined in stage four are also followed in establishing the validity of the structural model. However, good model fit alone is not enough to support a proposed structural theory.

Individual parameter estimates representing each hypothesis should also be examined. The structural model is only considered acceptable when it shows acceptable model fit and when the path estimates representing the hypotheses are statistically significant and in the predicted direction (Hair et al., 2010). Table 4.6 above provides a summary and description of the fit indices (Byrne, 2009; Hair et al., 2010; Meyers, Gamst, and Guarino, 2013).
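Table 4.6 itself is not reproduced in this excerpt. For orientation only, two of the most widely reported indices in that literature are, in their standard forms (M denotes the hypothesized model, B the baseline or null model, and N the sample size):

```latex
% RMSEA: badness of fit per degree of freedom, adjusted for sample size.
\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^2_M - df_M,\, 0\right)}{df_M\,(N - 1)}}
\qquad
% CFI: relative improvement of the model over the baseline model.
\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_M - df_M,\, 0\right)}{\max\left(\chi^2_B - df_B,\; \chi^2_M - df_M,\; 0\right)}
```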

4.13 Validity and Reliability of the Instrument

In this study, validity and reliability tests were used because most constructs are measured with multiple items. The section below discusses these two concepts and how they were used in this study.

4.13.1 Validity of Instruments

In research, we want to measure concepts as accurately as possible. Validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration (Babbie, 2013: 191).

Validity reflects how well a given measurement “measures what it purports to measure” (Nunnally and Bernstein, 1994: 83). In other words, validity is how well we measure what we intend to measure. There are several criteria for assessing the validity of a concept: face, content, construct (convergent and divergent), and criterion validity (Babbie, 2013). Face validity is the quality of an indicator that makes it seem a reasonable measure of a variable (Babbie, 2013). On their face, all of the variables in this study appear to measure what they were intended to measure.

Content validity, on the other hand, is the extent to which a measure covers the range of meanings included within the concept (Babbie, 2013). There is no statistical tool for checking the content validity of a concept; therefore, in this study a thorough review of the literature was conducted, and the instrument was pre-tested with experts, professionals, academicians, and selected respondents for its adequacy and relevance. Construct validity is the extent to which a set of measured items actually reflects the theoretical latent construct those items are designed to measure (Hair et al., 2010). It also refers to the degree to which a measure relates to other variables as expected within a system of theoretical relationships (Babbie, 2013). Construct validity should therefore reflect how accurately the measurement obtained from the sample represents what exists in the population. Construct validity is made up of convergent and divergent validity.

Convergent validity is the degree to which a construct’s items are correlated with each other; high convergent validity occurs when the scale’s items are highly correlated (Meyers, Gamst, and Guarino, 2013). In this study, convergent validity was established by examining statistically significant factor loadings on each construct. Standardized loading estimates of 0.50 or higher indicate convergent validity (Hair et al., 2010). Convergent validity was also assessed by examining the average variance extracted (AVE) from the measures; an AVE of 0.50 or more indicates adequate convergent validity (Hair et al., 2010). The convergent validity results are presented in Chapter 5.
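As a concrete illustration (the loadings below are hypothetical; the study’s actual estimates appear in Chapter 5, and no particular software package is implied), AVE can be computed directly from a construct’s standardized loadings:

```python
import numpy as np

def average_variance_extracted(loadings):
    """AVE: the mean squared standardized loading of a construct's items.

    With standardized loadings, each squared loading is the share of an
    indicator's variance explained by the construct, so AVE is their mean.
    """
    loadings = np.asarray(loadings, dtype=float)
    return float(np.mean(loadings ** 2))

# Hypothetical standardized loadings for one construct's four indicators.
loadings = [0.72, 0.68, 0.81, 0.64]
print(f"AVE = {average_variance_extracted(loadings):.3f}")  # ~0.512, above the 0.50 cutoff
print(all(l >= 0.50 for l in loadings))                     # each loading also >= 0.50
```

Both checks from the text appear here: each standardized loading is at least 0.50, and the AVE itself exceeds 0.50.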

Divergent validity (also referred to as discriminant validity) is the degree to which a construct is truly different from other constructs (Hair et al., 2010). This type of validity involves demonstrating an absence of, or low, correlations between different constructs (Meyers, Gamst, and Guarino, 2013; Tabachnick and Fidell, 2014). High divergent validity provides evidence that a construct captures some phenomenon that other measures do not (Hair et al., 2010). In this study, divergent validity was assessed by comparing the square root of the AVE values with the correlation estimates between constructs. Evidence of divergent validity is provided if the square root of the AVE for a construct is higher than the correlation estimates between that construct and all other constructs; equivalently, divergent validity is achieved if the AVE of a construct is higher than the squared correlation between that construct and any other construct (Hair et al., 2010). The results regarding divergent validity are presented in Chapter 5.
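A minimal sketch of this comparison (the Fornell-Larcker criterion), again with hypothetical values rather than the study’s estimates:

```python
import numpy as np

def fornell_larcker_ok(ave, corr):
    """Divergent validity check: the square root of each construct's AVE
    must exceed its correlation with every other construct."""
    root_ave = np.sqrt(np.asarray(ave, dtype=float))
    corr = np.asarray(corr, dtype=float)
    n = len(root_ave)
    return all(root_ave[i] > abs(corr[i, j])
               for i in range(n) for j in range(n) if i != j)

# Hypothetical AVE values and inter-construct correlations for three constructs.
ave = [0.55, 0.61, 0.52]
corr = np.array([[1.00, 0.43, 0.38],
                 [0.43, 1.00, 0.47],
                 [0.38, 0.47, 1.00]])
print(fornell_larcker_ok(ave, corr))  # True -> divergent validity supported
```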

4.13.2 Reliability of the Instruments

Reliability is the extent to which a measurement is consistent and yields the same results repeatedly; it is the extent to which a particular technique, applied repeatedly to the same object, yields the same result each time (Babbie, 2013: 188). Reliability of a measurement thus refers to its consistency (Hair et al., 2010). There are two main ways of checking the reliability of a measurement: external and internal. External reliability concerns the consistency of the measurement over time and can be examined through the test-retest method (Babbie, 2013), in which an instrument is administered twice to the same respondents. It is assumed that respondents who scored high on the first test should also score high on the second. However, Bryman and Cramer (2001) argued that a low test-retest correlation does not always mean that reliability is low; it may instead reflect change in the underlying concept itself.
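As a minimal sketch with hypothetical scores (the thesis does not report raw test-retest data here), the method reduces to correlating the two administrations:

```python
import numpy as np

# Hypothetical scale scores for the same eight respondents on two
# administrations of the same instrument (test-retest method).
time1 = np.array([4.2, 3.8, 4.5, 2.9, 3.3, 4.0, 3.6, 4.4])
time2 = np.array([4.0, 3.9, 4.6, 3.1, 3.0, 4.1, 3.8, 4.3])

# Pearson correlation between the two administrations; values near 1
# suggest the measurement is stable over time (external reliability).
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.3f}")
```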

Alternatively, two different forms of a measurement can be constructed and administered to the same respondents at different times to check for external reliability; this is known as the alternative-forms method (Malhotra, 2004). Even though it is useful, it is expensive and time consuming (Malhotra, 2004). Established measures can also be used to ensure external reliability: Babbie (2013) argued that a good way to ensure reliability is to use measures that have proved their reliability in previous research. Therefore, this study used established measures whose external reliability had been demonstrated in several studies; all of the measures were adapted from previous studies.

The use of internal reliability is popular with multi-item scales. According to Bryman and Cramer (2001), internal reliability refers to whether the items that make up a particular scale measure a single concept, that is, whether they are internally consistent. If the items are highly correlated with one another, internal consistency is high. Cronbach’s alpha coefficient is the most common measure of internal reliability. Hatcher (1994) stated that if all of the items are drawn from the domain of a single construct, the responses to the items composing the measurement should be highly correlated. Coefficient alpha values between 0.70 and 0.80 are usually acceptable; however, when dealing with psychological constructs, values less than 0.70 (but more than 0.60) are acceptable because of the diversity of the measured constructs (Kline, 1999).
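For concreteness, a minimal sketch of the coefficient alpha computation on a hypothetical item-score matrix (not the study’s data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of six people to a four-item Likert (1-5) scale.
scores = np.array([[4, 4, 5, 4],
                   [3, 3, 3, 2],
                   [5, 4, 5, 5],
                   [2, 2, 3, 2],
                   [4, 5, 4, 4],
                   [3, 3, 2, 3]])
print(f"alpha = {cronbach_alpha(scores):.3f}")   # >= 0.70 is usually acceptable
```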

A major deficiency of coefficient alpha is its positive relationship with the number of scale items: increasing the number of scale items increases the value of coefficient alpha, so Cronbach’s alpha may be inappropriately inflated by including several redundant items (Hair et al., 2010). To overcome this problem, reliability measures derived from CFA have been suggested (Hair et al., 2010). These measures include composite reliability and the AVE, both of which provide more rigorous results (Hair et al., 2010). Therefore, the reliability of the constructs in this study was assessed using Cronbach’s alpha, composite reliability, and the AVE.

Composite reliability refers to the extent to which a set of indicators shares in its measurement of a construct (Meyers, Gamst, and Guarino, 2013); it is a measure of the homogeneity and internal consistency of the items that comprise a scale. Highly reliable constructs are those whose indicators are highly intercorrelated, indicating that they all measure the same latent construct. Composite reliability values of 0.60 or more are generally considered acceptable (Tabachnick and Fidell, 2014); however, values of 0.80 or more are preferable.
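A minimal sketch of the computation, under the standard assumption that each indicator’s error variance is one minus its squared standardized loading; the loadings are the same hypothetical values as in the AVE sketch above:

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from standardized CFA loadings:
    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each indicator's error variance is 1 - loading^2 under standardization."""
    loadings = np.asarray(loadings, dtype=float)
    squared_sum = loadings.sum() ** 2
    error_var = (1.0 - loadings ** 2).sum()
    return squared_sum / (squared_sum + error_var)

# The same hypothetical loadings used in the AVE sketch above.
print(f"CR = {composite_reliability([0.72, 0.68, 0.81, 0.64]):.3f}")  # ~0.81, above 0.80
```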
