          1         2         3         4         5      Communality
PGI     0.78936   0.27486   0.53311   0.00484  −0.131       1.0000
IWF     0.75343  −0.53523   0.1163    0.07618   0.35573     1.0000
AID     0.83563   0.07931  −0.19336  −0.50797   0.00153     1.0000
FCWSD   0.8153   −0.34541  −0.20346   0.20534  −0.3639      1.0000
IGC     0.72256   0.55585  −0.2504    0.27104   0.18093     1.0000
In this model, we assume that ε for a given indicator variable is independent of the ε for all other indicators, and independent of η. In addition, each indicator has a loading (i.e., parameter estimate) associated with each factor. These loadings reflect the relationships between the factors and the indicators, with larger values indicating a closer association between a latent and an observed variable. In general, loadings range between −1 and 1, and we interpret them much as we would correlation coefficients: larger (absolute) values indicate a stronger relationship between an indicator and a factor. This factor model can be used to predict the correlation (or covariance) matrix of the indicator variables, as expressed in equation (7.2).
Σ = ΛΨΛ′ + Θ (Equation 7.2)
Where
Σ = Model-predicted covariance (correlation) matrix of the indicators
Λ = Matrix of factor loadings
Ψ = Correlation matrix for the factors
Θ = Diagonal matrix of unique error variances.
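As a concrete illustration of equation (7.2), the following SAS PROC IML sketch builds the model-implied correlation matrix from a hypothetical two-factor solution for five indicators; the loading and factor correlation values are invented for illustration and are not estimates from the college adjustment data.

proc iml;
  /* hypothetical 5 x 2 loading matrix (Lambda), for illustration only */
  Lambda = {0.80 0.10,
            0.75 0.05,
            0.70 0.15,
            0.10 0.85,
            0.05 0.80};
  /* hypothetical factor correlation matrix (Psi) */
  Psi = {1.00 0.40,
         0.40 1.00};
  /* uniquenesses (Theta): 1 minus each indicator's communality */
  common = vecdiag(Lambda * Psi * t(Lambda));
  Theta  = diag(1 - common);
  /* model-implied correlation matrix: Sigma = Lambda * Psi * Lambda' + Theta */
  Sigma = Lambda * Psi * t(Lambda) + Theta;
  print Sigma[format=6.3];
quit;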
Figure 7.1 provides a matrix representation of equation (7.2) for our college adjustment model, to aid understanding of the equation. It can be seen that each observed variable (X) has two factor loadings (one for each factor), each factor has a variance, the factors are correlated, and each measured variable has a uniqueness term, where these uniqueness terms are assumed to be uncorrelated with one another, as indicated by the 0s on the off-diagonal of Θ.
\[
\Sigma =
\begin{bmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52}
\end{bmatrix}
\begin{bmatrix}
\phi_{11} & \phi_{21} \\
\phi_{21} & \phi_{22}
\end{bmatrix}
\begin{bmatrix}
\lambda_{11} & \lambda_{21} & \lambda_{31} & \lambda_{41} & \lambda_{51} \\
\lambda_{12} & \lambda_{22} & \lambda_{32} & \lambda_{42} & \lambda_{52}
\end{bmatrix}
+
\begin{bmatrix}
\theta_{11} & 0 & 0 & 0 & 0 \\
0 & \theta_{22} & 0 & 0 & 0 \\
0 & 0 & \theta_{33} & 0 & 0 \\
0 & 0 & 0 & \theta_{44} & 0 \\
0 & 0 & 0 & 0 & \theta_{55}
\end{bmatrix}
\]

where the rows and columns of Σ correspond to the observed variables X1 through X5.

Figure 7.1 Matrix Notation of the Factor Model

Before we move deeper into the steps for conducting EFA, let us first clarify what we have been referring to as factor loadings. The term loadings is generally discouraged because there are both pattern and structure coefficients. The pattern coefficients (Pv×f) are analogous to beta weights in multiple regression and are arranged in a variables (v) by factors (f) matrix. In fact, some computer programs will label these as standardized regression coefficients. We have seen how these values are used to estimate the amount of variance that is explained in the measured variable by the factors. We have also seen how these coefficients can be used to reproduce the original correlation matrix of observed variables. The structure coefficients are also weights, and they represent the correlation between the measured variable and the factor. They are defined as the pattern coefficients × the factor correlations, which is why pattern coefficients are equal to structure coefficients when the factor correlations are 0.0. Proper interpretation of a correlated factor solution requires examining both the pattern and structure coefficients, much as one examines both the unstandardized slopes and the beta weights in regression (e.g., Thompson, 2004).
As in regression, where beta weights equal the predictors' correlations with the outcome when the predictors are perfectly uncorrelated, pattern and structure coefficients are the same when factors are perfectly uncorrelated and differ in correlated factor models. Understanding the pattern and structure coefficients allows us to more precisely and accurately interpret our factor analysis results.
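The relationship between the two types of coefficients can be shown in a few lines of SAS PROC IML; the pattern and factor correlation values below are again hypothetical.

proc iml;
  /* hypothetical pattern coefficients (variables by factors) */
  P   = {0.80 0.10,
         0.75 0.05,
         0.70 0.15,
         0.10 0.85,
         0.05 0.80};
  /* hypothetical factor correlation matrix */
  Phi = {1.00 0.40,
         0.40 1.00};
  /* structure coefficients = pattern coefficients x factor correlations;
     when Phi is an identity matrix, S equals P */
  S = P * Phi;
  print P[label="Pattern"] S[label="Structure"];
quit;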
Conducting an EFA consists of two primary steps for identifying the factors: (1) factor extraction and (2) factor rotation. These steps can be carried out by the software simultaneously, but the researcher must make decisions regarding the method to use for each step. Factor extraction involves the initial estimation of the model parameters, in particular the loadings, given the data at hand. There are potentially as many factors as there are observed indicators in the data, as we have shown. However, given that the goal of EFA is to identify the latent structure present in the data by assessing which latent variables drive responses to each observed indicator, in practice a small number of factors relative to the number of variables is desired. We will discuss the issue of determining the number of factors to retain after we have first reviewed some of the most common extraction methods.
Factor Extraction
A number of factor extraction methods are available, with the most popular probably being maximum likelihood (ML) and principal axis factoring (PAF), sometimes referred to as common factor analysis. Other EFA extraction methods that are available, though used less often than ML and PAF, are generalized least squares, unweighted least squares, weighted least squares, alpha factoring, and image factoring, to name a few. Be aware that the default in most programs is principal components analysis (PCA). PCA is not considered here because it carries the strong assumption that the measured variables are measured without error, or are perfectly reliable (i.e., 1.0 on the diagonal of the original correlation matrix), an assumption not likely to be met in the social and behavioral sciences.
Note that PCA was used in the simple communalities example earlier merely to introduce the idea of the analysis. Whichever extraction method we use, the algorithm seeks estimates of the factor loadings that will yield a predicted correlation matrix among the indicators, Σ, as close as possible to the observed correlation matrix, S. ML extraction uses the proximity of Σ and S to form a test statistic for evaluating the quality of a factor solution. Though ML has the advantage of providing a direct assessment of model fit, it also rests on an assumption of multivariate normality of the observed indicators. When this assumption is violated, model parameter estimates may not be accurate, and in some cases the algorithm will not be able to find a solution (Brown, 2015; Fabrigar & Wegener, 2012). PAF does not rely on distributional assumptions about the indicators, and thus may be particularly attractive when the data are not normally distributed. However, it does not provide a statistical test of model fit. PAF also moves away from the strong assumption of PCA by replacing the 1.0 values on the diagonal with communality coefficients, a conservative estimate of reliability. All methods involve an iterative estimation process, which searches for a solution until the results, generally the parameter estimates, stabilize; stabilization is typically judged using a convergence criterion on the estimates. For example, it is common, but not necessary, for PAF to begin with a PCA. From the PCA, communality estimates for each indicator variable are used in the PAF to replace the 1.0 values on the diagonal of the analyzed correlation matrix. For details on estimation, specifically ML, see Eliason (1993).
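In SAS, the extraction method is selected with the METHOD= option of PROC FACTOR. The sketch below shows hypothetical ML and PAF (iterated principal factor) calls; the dataset name COLLEGE is an assumption for illustration, and this is not the eResources code, although the variable names follow the five scales used in this chapter.

/* Maximum likelihood extraction (assumes multivariate normality) */
proc factor data=college method=ml priors=smc nfactors=2;
  var PGI IWF AID FCWSD IGC;
run;

/* Principal axis factoring: iterated principal factors with SMCs
   replacing the 1.0 values on the diagonal of the correlation matrix */
proc factor data=college method=prinit priors=smc nfactors=2;
  var PGI IWF AID FCWSD IGC;
run;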
Factor Rotation
An important aspect of EFA is that when more than one factor is obtained in the extraction step, the model is indeterminate in nature. This means that there are an infinite number of factor loading combinations that will yield the same mathematical fit to the data; i.e., there is more than one solution of weights (e.g., pattern coefficients) that will yield the same Σ. Thus, we need a way to determine which of these possible solutions is optimal for our dataset. This determination is made using factor rotation, which refers to the transformation of the factor loadings so as to simplify interpretation of the results by seeking a simple structure solution. Thurstone (1947) defined simple structure as occurring when two conditions are met. First, each factor has a subset of the indicator variables that are highly associated with it (i.e., have large coefficients). Second, each indicator is highly associated with only one factor and has coefficients near 0 on the other factors. Rotation adjusts all of the loadings with the goal of approximating simple structure. Rotation does not alter the underlying fit of the model, so that the value of Σ for the unrotated and rotated solutions is exactly the same as what we obtained from the initial extraction. In fact, if you compare communalities before and after rotation, you will see that they do not change. Thus, the variance accounted for by the factor model does not change due to factor rotation, though the interpretation of the factors themselves might change quite a bit, as we will see below.
Factor rotation methods are generally described as belonging to one of two broad families, orthogonal and oblique. Orthogonal rotations constrain the correlations among factors to be 0, whereas oblique rotations allow the factors to be correlated. Within both broad rotational families there exist many varieties, differing based upon the criterion used to transform the data. As with methods of estimation, no one rotation approach is always optimal, but perhaps the most popular orthogonal rotation method is Varimax, while among the oblique rotations Promax and Oblimin are popular. The decision as to whether to use an orthogonal or oblique rotation should be based on both theoretical and empirical grounds. If the researcher anticipates that the factors will be correlated, then she should begin the analysis using an oblique rotation such as Promax. If the resulting correlations are small (e.g., close to 0), then the model can be estimated again using an orthogonal rotation. On the other hand, if the researcher thinks that the correlations among the factors should be constrained to 0 for some theoretical reason, then she may use only the orthogonal rotation from the beginning. However, it should be noted that if the factors are in fact correlated but an orthogonal rotation is used, the resulting factor loadings may be adversely affected and the accuracy of the solution compromised. In the social and behavioral sciences it is likely that factors are correlated, and thus beginning with a correlated (oblique) solution is usually the most sensible choice (Fabrigar & Wegener, 2012).
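Rotation is requested in PROC FACTOR with the ROTATE= option. The hypothetical calls below (same assumed COLLEGE dataset) illustrate the strategy described above: fit an oblique Promax solution first, inspect the printed inter-factor correlations, and only fall back to an orthogonal Varimax solution if those correlations are near 0.

/* Oblique rotation: inspect the printed inter-factor correlation matrix */
proc factor data=college method=prinit priors=smc nfactors=2 rotate=promax;
  var PGI IWF AID FCWSD IGC;
run;

/* Orthogonal rotation, defensible only if the factors are essentially uncorrelated */
proc factor data=college method=prinit priors=smc nfactors=2 rotate=varimax;
  var PGI IWF AID FCWSD IGC;
run;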
Determining the Number of Factors
The determination of the number of factors to retain is perhaps the greatest methodological challenge facing researchers who use EFA. The question itself is deceptively simple: How many factors should you keep? The complexity in answering it comes from the fact that a number of approaches can be used, some of which work better than others, with none having been shown to be universally optimal. Given this fact, we will discuss several approaches for determining the number of factors to retain, with the recommendation that in practice you make use of several of these and consider their results in combination.
One of the earliest and most popular approaches for determining the number of factors is the eigenvalue-greater-than-1 criterion, sometimes referred to as Kaiser's Little Jiffy or the Kaiser criterion, in honor of its progenitor (Fabrigar & Wegener, 2012; Kaiser, 1960; Pett, Lackey, & Sullivan, 2003). Recall that an eigenvalue for a given factor is the amount of the variance in all the variables that is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variance in the variables and may be ignored as redundant with more important factors. However, the eigenvalue is not the percentage of variance explained but rather a measure of amount, used for comparison with other eigenvalues. You may recall computing these values in your multivariate statistics course. This method uses the eigenvalues associated with each of the factors and retains those factors with eigenvalues greater than 1. The rationale behind this method is based upon the fact that an eigenvalue reflects the variance associated with a factor. When the observed indicator variables are standardized, they each have a variance of 1. Thus, factors with eigenvalues larger than 1 account for more variance in the data than any one observed variable. However, while simple and somewhat intuitive, this method for determining the number of factors to retain can yield solutions that are too complex and thus retain too many factors (Pett, Lackey, & Sullivan, 2003). Moreover, it has no theoretical basis for use with common factor models (Gorsuch, 1980). Therefore, although it is mentioned often in the literature, it may not be the best rule to use and should be avoided in light of other, better-performing methods.
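Although we have just cautioned against relying on this rule, it can be applied in PROC FACTOR through the MINEIGEN= option, which under the procedure's defaults is evaluated against the eigenvalues of the unreduced correlation matrix; the call below is a hypothetical sketch using the assumed COLLEGE dataset.

/* MINEIGEN=1 retains only factors whose eigenvalues exceed 1 (Kaiser rule) */
proc factor data=college mineigen=1;
  var PGI IWF AID FCWSD IGC;
run;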
Perhaps second in popularity to the Kaiser criterion is the use of the scree plot, which is simply a scatterplot with the eigenvalues on the y-axis and the factor number on the x-axis. The researcher using this approach examines the plot, looking for the point where a line connecting the eigenvalues begins to flatten out in its rate of decline. The area below the major drops in values is the scree, a term borrowed from the physical world, where scree is the mass of small loose stones that forms or covers a slope on a mountain. We will examine the scree plot in action in our examples below.
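The eigenvalue table that PROC FACTOR prints by default, together with the SCREE option, provides what is needed for this judgment; the following call is again a hypothetical sketch rather than the chapter's eResources code.

/* Eigenvalues are printed by default; the SCREE option adds a scree plot
   (PLOTS=SCREE produces an ODS graphics version in newer releases) */
proc factor data=college method=prinit priors=smc scree;
  var PGI IWF AID FCWSD IGC;
run;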
A third descriptive approach for determining the number of factors that relies on the eigenvalues involves an examination of the proportion of variation in the entire set of observed indicators that is accounted for by each factor (e.g., < 10%), and by the set of factors as a whole (< 75%). There are no exact rules as to what constitutes a sufficient amount of variance accounted for in the solution. However, as with the scree plot, the researcher would examine the proportions and attempt to ascertain where including more factors in the solution does not yield an appreciable increase in the overall proportion of explained variance. An additional descriptive tool for determining the number of factors involves an examination of the residual correlation matrix for the observed indicators.
Recall that the factor analysis algorithm seeks to identify model parameters (i.e., loadings) that will yield a predicted covariance (or correlation) matrix as similar to the actual matrix as possible.
Thus, one tool that might prove useful in ascertaining whether an EFA solution is a good fit to the data is the matrix of residual correlations (the differences between the observed and model-predicted correlations) for the indicators. By convention, residuals larger than 0.05 in absolute value are considered too large, so a good solution is one that produces few such residuals.
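In PROC FACTOR, the residual correlations can be requested with the RESIDUALS option, as in the hypothetical call below, and then screened against the 0.05 convention.

/* RESIDUALS prints the residual (observed minus model-implied) correlations */
proc factor data=college method=prinit priors=smc nfactors=2 rotate=promax residuals;
  var PGI IWF AID FCWSD IGC;
run;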
Beyond these descriptive and subjective methods, inferential methods for determining the number of factors to retain exist as well. In particular, when ML is used for factor extraction, a Chi-square test can be constructed comparing the relative fit of differing factor solutions. As noted above, results from ML extraction can be used to compare the relative proximity of Σ and S in the form of a Chi-square goodness-of-fit test. The null hypothesis of this test is that Σ = S, such that a statistically significant result would lead to the rejection of the null that the model fits the data perfectly. Unfortunately, such perfect fit is rarely achieved, even for sample models that are reasonably close to the population generating model. Therefore, this test is not particularly useful for assessing the fit of an individual model, and when the sample size is large, too many factors can be retained due to the sensitivity of the test statistic (Kim & Mueller, 1978).
How It Works 7.3
Parallel Analysis
Parallel analysis (Horn, 1965) is a robust method for determining the number of factors in EFA, yet it is likely an under-utilized procedure, probably because it is not a component of the standard routines in the software packages most people use for factor analysis. The following steps are used in PA (a minimal code sketch follows the steps):
1. Fit an EFA to the original dataset and retain the eigenvalues for each factor.
2. Generate or simulate observed data with marginal characteristics identical to the observed data (i.e., same means and standard deviations), but with uncorrelated indicators.
3. Fit an EFA to the generated data and retain the eigenvalues for each factor.
4. Repeat steps 2 and 3 many times (e.g., 1,000) in order to develop distributions for each eigenvalue under the case where the indicators are not related to one another.
5. Compare the observed eigenvalue for the first factor with the 95th percentile of the distribution of first-factor eigenvalues from the generated data. If the observed value is greater than or equal to the 95th percentile value, conclude that at minimum one factor should be retained, and continue to step 6. If the observed value is less than the 95th percentile, then stop and conclude that there exist no common factors.
6. Compare the observed eigenvalue for each factor with the 95th percentile for the corresponding eigenvalue distribution of the generated data. If the observed value is greater than or equal to the 95th percentile, retain that factor (e.g., the second factor, the third factor, etc.), and move to the next factor in line. This process stops when the observed eigenvalue is less than the 95th percentile of the generated data.
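A minimal SAS PROC IML sketch of these steps appears below. For simplicity it uses eigenvalues of the full (unreduced) correlation matrix, i.e., a PCA-based parallel analysis rather than the common-factor version that produced Table 7.3; the dataset name COLLEGE, the seed, and the 1,000 replications are assumptions for illustration.

proc iml;
  /* step 1: eigenvalues from the observed data (PCA-based for simplicity) */
  use college;
  read all var {PGI IWF AID FCWSD IGC} into x;
  close college;
  n = nrow(x);  p = ncol(x);
  obsEig = eigval(corr(x));

  /* steps 2-4: eigenvalues from many sets of uncorrelated normal data
     with the same number of rows and columns as the observed data */
  nrep = 1000;
  simEig = j(nrep, p, .);
  call randseed(12345);
  do r = 1 to nrep;
    z = j(n, p, .);
    call randgen(z, "Normal");
    simEig[r, ] = t(eigval(corr(z)));
  end;

  /* steps 5-6: compare each observed eigenvalue with the 95th percentile
     of the corresponding simulated eigenvalue distribution */
  call qntl(p95, simEig, 0.95);
  print (t(obsEig) // p95)[rowname={"Observed" "Pctl95"}];
quit;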
Table 7.3 contains the eigenvalues from the dataset referenced above alongside the randomly generated eigenvalues produced by the SAS program referenced below. The real-data eigenvalues were obtained under a common factor model with error (i.e., not 1.0 on the diagonal). In such cases, PA tends to indicate more factors than are actually present and meaningful (e.g., Buja & Eyuboglu, 1992). Given the other guidelines, especially interpretability, this is likely what has occurred here. It is the combination of evidence that leads us to a 2-factor model.
A second inferential approach for factor retention decisions is parallel analysis (PA), first described by Horn (1965). PA is a robust method for determining the number of factors to retain in EFA, yet it is likely an under-utilized procedure. This under-use is likely due to the fact that it is not a component of the standard routines in the software packages most people use for factor analysis. The basic idea underlying PA is that one would not expect to identify more meaningful factors in randomly generated data of the same rank as the observed data than are identified in the actual observed data itself. Parallel analysis consists of several steps (see How It Works 7.3), using routines that have been developed for many statistical packages, including R, SAS, SPSS, and MATLAB. PA is frequently more accurate than other approaches that are commonly used (Fabrigar & Wegener, 2012).
Another option, possible with large datasets, is to examine the stability of factor solutions. If you are fortunate enough to have a large dataset that can be divided in half (or into more than two sets) at random, an EFA can be performed on each subset of the original data and the results (e.g., the pattern of parameter estimates) compared across solutions. The factor model judged to be most stable (i.e., the one selected most commonly across the subsets) is retained as the correct one. The last criterion, and perhaps the most important, is the interpretability of the factors (Gorsuch, 1980). If the resulting factors cannot all be interpreted in a way that is meaningful for practice, then it is likely not the correct number of factors. This criterion can help guard against over-extracting meaningless factors, or scree!
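A minimal sketch of this split-half check in SAS is shown below; the dataset name COLLEGE, the seed, and the 50/50 split are illustrative assumptions, and the two rotated patterns would then be compared for consistency.

/* Randomly split the data into two halves */
data half1 half2;
  set college;
  if ranuni(20240101) < 0.5 then output half1;
  else output half2;
run;

/* Fit the same EFA to each half and compare the rotated patterns */
proc factor data=half1 method=prinit priors=smc nfactors=2 rotate=promax;
  var PGI IWF AID FCWSD IGC;
run;
proc factor data=half2 method=prinit priors=smc nfactors=2 rotate=promax;
  var PGI IWF AID FCWSD IGC;
run;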
As we stated at the beginning of this discussion, no single approach, including PA, is optimal in every situation. Therefore, we recommend that in practice the researcher use several approaches for determining the number of factors to retain, in an attempt to build a case for the best solution given the current set of data. The criteria to be used in this effort should be identified prior to beginning the analysis, just as one would specify the α level used to judge statistical significance before running a statistical test. After all of the evidence is collected, the researcher should then consider which solution was identified as optimal by the largest number of high-quality methods, and also whether this solution is conceptually meaningful. In the final analysis, this criterion of theoretical meaningfulness is the most important. It is not worthwhile to retain factors if they do not carry a meaningful interpretation that can be defended in the literature.
Sample programs for MATLAB, R, SPSS, and SAS can be found here: https://people.ok.ubc.ca/brioconn/nfactors/nfactors.html.
Table 7.3 Eigenvalues From Real Data and Randomly Generated Data

Number   Real Data Eigenvalues   Random Eigenvalues (95th Percentile)
1               2.562                 0.064 (0.095)
2               0.263                 0.028 (0.049)
3              −0.088                 0.0008 (0.017)
4              −0.119                −0.025 (−0.008)
5              −0.204                −0.058 (−0.035)
Psychometrics in the Real World: EFA
To see how EFA works in practice, we will fit an EFA model to the dataset described above. We use SAS for this example; the dataset and code can be found in the eResources. Recall that our example contains five scales designed to measure student integration into the university. The scale developers thought there might be two factors (Faculty and Student Integration), but it was not certain whether the instrument was instead a unidimensional measure of student integration into the general university environment.
An EFA using the PAF extraction technique and Promax rotation was applied to the correlation matrix.
Promax was employed given that the factors are expected to be correlated. Several criteria were