CHAPTER 3: REVIEW OF LATENT VARIABLE ANALYSIS AND SEM TECHNIQUES
3.2 Latent variables
Before reviewing latent variable analysis and the methodological frameworks that may be applied, it is prudent to first establish a clear definition of the concept of a latent variable. Many definitions of the concept exist, and the most appropriate one depends on the context (Skrondal & Rabe-Hesketh, 2004).
Schumacker & Lomax (1996) define latent variables as variables that are not directly observable or measurable; rather, they must be observed or measured indirectly and are hence inferred.
Skrondal & Rabe-Hesketh (2004) define a latent variable as a random variable whose realizations are hidden from us. Bowen & Guo (2011) define latent variables as measures of hidden or unobserved phenomena and theoretical constructs. Apart from minor differences, all of these definitions highlight the unobservable, or not directly observable, nature central to the concept of latent variables.
Since latent variables cannot be measured directly, they must be measured indirectly through observable indicator variables (Schumacker & Lomax, 1996). These observed variables are modelled as functions of model-specific latent constructs and latent measurement errors (Bowen & Guo, 2011). The estimation of latent constructs using observed variables is the basis of structural equation modelling, which is discussed in the following section.
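As an illustration of how an observed indicator can be written as a function of a latent construct and a measurement error, a single-construct linear measurement model is sketched below. The notation is an assumption adopted here for illustration and is not taken from the sources cited above: x_i denotes the i-th observed indicator, \xi the latent construct, \lambda_i the loading of indicator i on the construct, and \delta_i its measurement error.

```latex
% Illustrative single-construct measurement model (assumed notation)
x_i = \lambda_i \xi + \delta_i, \qquad i = 1, \dots, p
```

Only the x_i are observed; \xi, \lambda_i and \delta_i must be inferred from the covariation among the indicators.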
The conceptual framework behind latent variable analysis originates from the work of Spearman (1904), who developed factor analytic models for continuous variables in the context of intelligence testing (Borsboom et al., 2003). The basic statistical idea of latent variable analysis is that if a latent variable underlies a number of observed variables, then conditioning on that latent variable will render the observed variables statistically independent. This is known as the principle of local independence. The primary challenge, however, is to find a set of latent variables that satisfies this condition for a given set of observed variables (Borsboom et al., 2003).
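The principle of local independence can be illustrated with a small simulation. The sketch below is not drawn from the cited studies: the loadings (0.8 and 0.7), the error standard deviation (0.6), the sample size and the helper names are all hypothetical. It also assumes a linear, normally distributed one-factor model, so that conditioning on the latent variable can be approximated by correlating the residuals of each indicator after regressing it on the latent variable. In a simulation the latent values are known; in applied work they are not, which is precisely the challenge noted above.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(y)
    mean_y, mean_x = sum(y) / n, sum(x) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
             / sum((u - mean_x) ** 2 for u in x))
    return [v - mean_y - slope * (u - mean_x) for u, v in zip(x, y)]


def simulate(n=20000, seed=0):
    """Simulate one latent factor with two indicators (hypothetical values)."""
    rng = random.Random(seed)
    xi = [rng.gauss(0, 1) for _ in range(n)]          # latent variable
    x1 = [0.8 * f + rng.gauss(0, 0.6) for f in xi]    # indicator 1
    x2 = [0.7 * f + rng.gauss(0, 0.6) for f in xi]    # indicator 2
    marginal = pearson(x1, x2)
    # Correlation that remains once the latent variable is partialled out.
    conditional = pearson(residuals(x1, xi), residuals(x2, xi))
    return marginal, conditional


marginal_r, conditional_r = simulate()
print(f"marginal correlation:    {marginal_r:.3f}")
print(f"conditional correlation: {conditional_r:.3f}")
```

Under these assumptions the two indicators are clearly correlated marginally, while the correlation that remains after conditioning on the latent variable is close to zero.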
Although the theoretical concepts behind latent variables are often rich, available indicators often fail to fully capture the substantive content behind these latent constructs (Treier & Jackman, 2008). This introduces the importance of content validity. Content validity exists when the scope of the latent construct is adequately represented by the indicators adopted for its measurement (Dunn et al., 1994). If content validity does not exist, it can be argued that proceeding with further analysis is pointless, since the latent construct is not sufficiently represented by the indicators considered (Dunn et al., 1994).
The standard approach to the problem of imperfect indicators is to use statistical procedures to combine the information from multiple indicators of the latent concept (Treier & Jackman, 2008). This information can be combined in several ways, including a linear additive scale that simply sums the indicators, or a scale in which each item is weighted or re-scaled so that all items contribute equally (Treier & Jackman, 2008).
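The two ways of combining indicators mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure of any cited study: the data and function names are hypothetical, and z-scoring is assumed as one possible way of re-scaling items so that they contribute equally.

```python
from statistics import mean, pstdev


def additive_scale(rows):
    """Unweighted linear additive scale: sum the indicators per respondent."""
    return [sum(row) for row in rows]


def standardized_scale(rows):
    """Sum of z-scored indicators, so every item enters with unit variance.

    Assumes no indicator is constant (a zero standard deviation would
    cause a division by zero).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        sum((value - m) / s for value, m, s in zip(row, means, sds))
        for row in rows
    ]


# Hypothetical data: three respondents, two indicators on very different ranges.
responses = [
    (1, 100),
    (2, 300),
    (3, 200),
]

print(additive_scale(responses))      # ordering is driven by the second indicator
print(standardized_scale(responses))  # both indicators carry equal weight
```

With the raw sum, the indicator measured on the larger range dominates the scale; after re-scaling, both indicators influence the scale to the same degree.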
Another important consideration in the modelling of latent variables is that of substantive validity.
Substantive validity refers to whether the items included to measure a construct are conceptually or theoretically linked to that construct. It differs from content validity in that it concerns each individual item (indicator) of a construct, whereas content validity concerns the set of items as a whole. For a set of measurement items (scale) to have content validity, each item must possess substantive validity (Dunn et al., 1994). For a description of various other types of validity considered in latent variable analysis, refer to Dunn et al. (1994).
There are several studies that recognize the latency of variables for which proxy variables have traditionally been used. Gao et al. (1997) specified “consumer taste” as a latent variable in an analysis of the effect of consumer taste on the demand for beef in the US. Patterson & Richards (2000) adopted a latent variable model to determine the effect of newspaper advertisement characteristics on consumer preferences for apples and on the demand for different apple varieties, specifying “consumer preferences” as a latent variable. Winklhofer & Diamantopoulos (2002) investigated the effect of various forecast performance criteria, such as bias, accuracy and cost, on sales forecasting effectiveness, which they defined as a latent variable with a number of imperfect indicators and causes. Shehzad (2006) adopted a latent variable approach to the problem of health unobservability, specifying child health as a latent variable.
The application of latent variable analysis is not limited to any particular field of study, and several studies have recognized the latency of variables in agriculture. Ford & Shonkwiler (1994) acknowledged the unobservable nature of management ability, relating a measure of farm financial success to three latent measures of “managerial ability”. These included financial, dairy and crop managerial ability. For each of these aspects of managerial ability, four observable indicators were specified in an attempt to ensure model identification. Kalaitzandonakes & Dunn (1995) adopted a similar approach in a study concerning technical efficiency, managerial ability and farmer education in Guatemalan corn production. Managerial ability was regarded as a latent variable, with education, farming experience, and relevant personal attributes and talents specified as imperfect indicators.
Ivaldi et al. (1994) and Ivaldi et al. (1995) investigated productive efficiency using samples of French grain producers and fruit growers, respectively. Both studies consider variations of the traditional production function approach in which individual levels of productive efficiency are proposed to be latent variables. Both studies apply covariance structure analysis to the estimation of the stochastic production function and, in the case of Ivaldi et al. (1994), to the measurement of technical efficiency. These latent variable approaches are credited with solving the problem of correlations between input quantities and individual effects.
Esposti & Pierani (2000) proposed an alternative approach to the measurement of technical change, specifying the “level of technology” as a latent variable. Their analysis aimed to investigate the sources of growth of output and the rate of technical change in Italian agriculture through the inclusion of a latent technology level into an input demand system. Since the latent level of technology cannot be directly estimated from the input demand system, the authors adopted a MIMIC model framework.
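For orientation, the general structure of a MIMIC (multiple indicators, multiple causes) model with a single latent variable can be sketched as below. This is the generic textbook form under assumed notation, not the specification estimated by the authors: \eta is the latent variable, x a vector of observed causes with coefficients \gamma, y a vector of observed indicators with loadings \lambda, and \zeta and \varepsilon are disturbance terms.

```latex
% Structural part: the latent variable is driven by observed causes
\eta = \gamma^{\top} x + \zeta
% Measurement part: the latent variable is reflected in observed indicators
y = \lambda \eta + \varepsilon
```

Substituting the first equation into the second gives a reduced form that contains only observed variables, y = \lambda \gamma^{\top} x + (\lambda \zeta + \varepsilon), which is what allows the parameters to be estimated even though \eta itself is never observed.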