
Assumptions in Factor Analysis

In the document Multivariate Data Analysis, 8th edition, 2019 (pages 185-200)

An illustrative example

Stage 3: Assumptions in Factor Analysis

The underlying statistical assumptions influence exploratory factor analysis to the extent that they affect the derived correlations. Departures from normality, homoscedasticity, and linearity can diminish correlations between variables.

These assumptions are examined in Chapter 2, and the reader is encouraged to review the findings. The researcher must also assess the factorability of the correlation matrix.

Visual Examination of the Correlations. Table 3.4 shows the correlation matrix for the 13 perceptions of HBAT.

Inspection of the correlation matrix shows that 29 of the 78 correlations (37%) are significant at the .01 level. This is an adequate basis for the next step, an empirical examination of whether the matrix is suitable for factor analysis, both overall and for each variable. When the significant correlations are tabulated by variable, the counts range from 0 (X15) to 9 (X17). No firm limits define too many or too few. Still, a variable with no significant correlations may not belong to any factor, and a variable with many significant correlations may belong to several. We can note these patterns and see how they are reflected as the analysis proceeds.
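The tabulation above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure used to build Table 3.4. It assumes the correlation matrix is already available as an array. It also assumes significance is judged by comparing |r| with a critical value derived from a two-tailed t critical value, which the user must take from a t table because the HBAT sample size is not given in this section. The function names `critical_r` and `significant_counts` are hypothetical.

```python
import math

import numpy as np


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| reaching significance for a given t critical value and sample size n.

    Solves t = r * sqrt((n - 2) / (1 - r**2)) for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


def significant_counts(R: np.ndarray, r_crit: float):
    """Return (# significant pairs, # unique pairs, per-variable counts)."""
    p = R.shape[0]
    off_diagonal = ~np.eye(p, dtype=bool)
    significant = (np.abs(R) >= r_crit) & off_diagonal
    n_pairs = p * (p - 1) // 2                     # 13 variables -> 78 unique correlations
    n_significant = int(significant.sum()) // 2    # each pair appears twice in R
    per_variable = significant.sum(axis=1)         # significant partners of each variable
    return n_significant, n_pairs, per_variable


# Hypothetical 3-variable matrix, used only to show the mechanics.
R = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
print(significant_counts(R, r_crit=0.25))
```

With 13 variables there are 13 × 12 / 2 = 78 unique correlations. The 29 significant correlations are therefore 29 / 78 ≈ 37 percent, which matches the figure quoted in the text.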

Table 3.4 Assessing the Appropriateness of Exploratory Factor Analysis: Correlations, Measures of Sampling Adequacy, and Partial Correlations Among Variables

Correlations Among Variables

Variable                      X6     X7     X8     X9    X10    X11    X12    X13    X14    X15    X16    X17    X18
X6 Product Quality         1.000
X7 E-Commerce              -.137  1.000
X8 Technical Support        .096   .001  1.000
X9 Complaint Resolution     .106   .140   .097  1.000
X10 Advertising            -.053   .430  -.063   .197  1.000
X11 Product Line            .477  -.053   .193   .561  -.012  1.000
X12 Salesforce Image       -.152   .792   .017   .230   .542  -.061  1.000
X13 Competitive Pricing    -.401   .229  -.271  -.128   .134  -.495   .265  1.000
X14 Warranty & Claims       .088   .052   .797   .140   .011   .273   .107  -.245  1.000
X15 Packaging               .027  -.027  -.074   .059   .084   .046   .032   .023   .035  1.000
X16 Order & Billing         .104   .156   .080   .757   .184   .424   .195  -.115   .197   .069  1.000
X17 Price Flexibility      -.493   .271  -.186   .395   .334  -.378   .352   .471  -.170   .094   .407  1.000
X18 Delivery Speed          .028   .192   .025   .865   .276   .602   .272  -.073   .109   .106   .751   .497  1.000

Note: In the original table, bolded values indicate correlations significant at the .01 significance level.
Overall Measure of Sampling Adequacy: .609
Bartlett Test of Sphericity: 948.9
Significance: .000


Measures of Sampling Adequacy and Partial Correlations

Variable                      X6     X7     X8     X9    X10    X11    X12    X13    X14    X15    X16    X17    X18
X6 Product Quality          .873
X7 E-Commerce               .038   .620
X8 Technical Support       -.049  -.060   .527
X9 Complaint Resolution    -.082   .117  -.150   .890
X10 Advertising            -.122   .002   .049   .092   .807
X11 Product Line           -.023  -.157   .067  -.152  -.101   .448
X12 Salesforce Image       -.006  -.729   .077  -.154  -.333   .273   .586
X13 Competitive Pricing     .054  -.018   .125   .049   .090  -.088  -.138   .879
X14 Warranty & Claims       .124   .091  -.792   .123  -.020  -.103  -.172  -.019   .529
X15 Packaging              -.076   .091   .143   .061  -.026  -.118  -.054   .015  -.138   .314
X16 Order & Billing        -.189  -.105   .160  -.312   .044   .044   .100   .106  -.250   .031   .859
X17 Price Flexibility       .135  -.134   .031  -.143  -.151   .953   .241  -.212  -.029  -.137  -.037   .442
X18 Delivery Speed          .013   .136  -.028  -.081   .064  -.941  -.254   .126   .070   .090  -.109  -.922

Note: Measures of sampling adequacy (MSA) are on the diagonal, partial correlations in the off-diagonal.


Bartlett's Test and MSA Values. The researcher can assess the overall significance of the correlation matrix with the Bartlett test and the factorability of the overall set of variables and individual variables using the measure of sampling adequacy (MSA). Because exploratory factor analysis will always derive factors, the objective is to ensure a base level of statistical correlation within the set of variables, such that the resulting factor structure has some objective basis.

In this example, Bartlett's test finds that the correlations, when taken collectively, are significant at the .0001 level (see Table 3.4). However, this test indicates only the presence of nonzero correlations, not the pattern of these correlations. More specific measures are required that address the patterns among variables and even individual variables.
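Bartlett's statistic can be computed directly from the determinant of the correlation matrix. The sketch below uses the standard chi-square approximation, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom. The function name and the two-variable example are illustrative assumptions. The p-value would come from the upper tail of the chi-square distribution, for example `scipy.stats.chi2.sf(statistic, df)` if SciPy is available.

```python
import numpy as np


def bartlett_sphericity(R: np.ndarray, n: int):
    """Bartlett's test of sphericity for a p x p correlation matrix estimated from n cases.

    H0: the population correlation matrix is an identity matrix.
    Returns the chi-square statistic and its degrees of freedom.
    """
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)      # log|R|, numerically safer than log(det(R))
    if sign <= 0:
        raise ValueError("R must be positive definite")
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df


# Hypothetical two-variable example with r = .5 and n = 100.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
stat, df = bartlett_sphericity(R, n=100)
print(round(float(stat), 2), df)
```

An identity matrix has a determinant of 1, so the statistic is 0. The statistic grows as the correlations move the determinant toward 0.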

The measure of sampling adequacy (MSA) looks not only at the correlations, but also at patterns between variables. In this situation the overall MSA value falls in the acceptable range (above .50) with a value of .609.
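The MSA compares the squared correlations with the squared partial (anti-image) correlations. The smaller the partials are relative to the correlations, the closer the MSA is to 1. Below is a minimal NumPy sketch of the usual Kaiser-Meyer-Olkin formulation. Statistical packages may differ in small computational details, and the example matrix is hypothetical.

```python
import numpy as np


def anti_image_partials(R: np.ndarray) -> np.ndarray:
    """Partial correlation of each pair, controlling for all other variables."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partials = -inv / np.outer(scale, scale)
    np.fill_diagonal(partials, 1.0)
    return partials


def sampling_adequacy(R: np.ndarray):
    """Overall and per-variable measures of sampling adequacy (KMO)."""
    Q = anti_image_partials(R)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off, R ** 2, 0.0)   # squared correlations, diagonal excluded
    q2 = np.where(off, Q ** 2, 0.0)   # squared partial correlations, diagonal excluded
    per_variable = r2.sum(axis=1) / (r2.sum(axis=1) + q2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return overall, per_variable


# Hypothetical example: four variables that all correlate .5.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
overall, per_variable = sampling_adequacy(R)
print(round(float(overall), 3), np.round(per_variable, 3))
```

In this example each partial correlation equals .25, so every MSA is .25 / (.25 + .0625) = .80.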

Examination of the values for each variable, however, identifies three variables (X11, X15, and X17) with MSA values under .50. Because X15 has the lowest MSA value, it will be omitted in the attempt to obtain a set of variables that can exceed the minimum acceptable MSA levels. Recalculating the MSA values after excluding X15 finds that X17 still has an individual MSA value below .50, so it is also deleted from the analysis. We should note at this point that X15 and X17 were the two variables with the lowest and highest number of significant correlations, respectively.
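The one-at-a-time deletion described above can be written as a loop. Compute the per-variable MSA values, drop the lowest if it falls below .50, and recompute on the remaining variables. This sketch assumes the correlation matrix is available as a NumPy array. The names `variable_msa` and `prune_by_msa` are hypothetical.

```python
import numpy as np


def variable_msa(R: np.ndarray) -> np.ndarray:
    """Per-variable MSA from correlations and anti-image partial correlations."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partials = -inv / np.outer(scale, scale)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off, R ** 2, 0.0).sum(axis=1)
    q2 = np.where(off, partials ** 2, 0.0).sum(axis=1)
    return r2 / (r2 + q2)


def prune_by_msa(R: np.ndarray, names, threshold: float = 0.50):
    """Drop the lowest-MSA variable, recompute, and repeat until all pass."""
    keep = list(range(len(names)))
    dropped = []
    while len(keep) > 2:                      # with two variables the MSA is always .50
        values = variable_msa(R[np.ix_(keep, keep)])
        worst = int(np.argmin(values))
        if values[worst] >= threshold:
            break                             # every remaining variable is adequate
        dropped.append(names[keep[worst]])
        del keep[worst]
    return [names[i] for i in keep], dropped
```

The MSA values are recomputed after every deletion because removing one variable changes the partial correlations of all the others. This is why X17 is evaluated again after X15 is dropped.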

Table 3.5 contains the correlation matrix for the revised set of variables (X15 and X17 deleted), along with the measures of sampling adequacy and the Bartlett test value. In the reduced correlation matrix, 20 of the 55 correlations are statistically significant. As with the full set of variables, the Bartlett test shows that nonzero correlations exist at the significance level of .0001. The reduced set of variables collectively meets the necessary threshold of sampling adequacy, with an MSA value of .653. Each variable also exceeds the threshold value, indicating that the reduced set meets the fundamental requirements for factor analysis. Finally, only five partial correlations have absolute values greater than .50 (X6–X11, X7–X12, X8–X14, X9–X18, and X11–X18). A small partial correlation means that the relationship between two variables is largely explained by the other variables in the set. The scarcity of large partials is therefore another indicator of the strength of the interrelationships among the variables in the reduced set. Note that X11 and X18 are each involved in two of the high partial correlations. Collectively, these measures all indicate that the reduced set of variables is appropriate for factor analysis, and the analysis can proceed to the next stages.
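Screening the partial correlations for large absolute values is simple to automate. The sketch below applies the .50 cutoff used in the text to three partial correlations copied from Table 3.5 (X6, X11, and X18). The helper name `large_partials` is hypothetical.

```python
import numpy as np


def large_partials(P: np.ndarray, names, cutoff: float = 0.50):
    """Pairs whose partial correlation exceeds the cutoff in absolute value."""
    p = P.shape[0]
    return [(names[i], names[j], float(P[i, j]))
            for i in range(p) for j in range(i + 1, p)
            if abs(P[i, j]) > cutoff]


# Partial correlations among X6, X11, and X18 from Table 3.5 (diagonal unused here).
names = ["X6", "X11", "X18"]
P = np.array([[0.0, -0.503, 0.355],
              [-0.503, 0.0, -0.529],
              [0.355, -0.529, 0.0]])
print(large_partials(P, names))
```

Only the upper triangle is scanned, so each pair is reported once.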

Principal Component Factor Analysis: Stages 4–7

As noted earlier, factor analysis procedures are based on the initial computation of a complete table of intercorrelations among the variables (correlation matrix). The correlation matrix is then transformed through estimation of a factor model to obtain a factor matrix that contains factor loadings for each variable on each derived factor. The loadings of each variable on the factors are then interpreted to identify the underlying structure of the variables, in this example the perceptions of HBAT. These steps of factor analysis, contained in stages 4–7, are examined first for principal component analysis. Then, a common factor analysis is performed and comparisons made between the two factor models.
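In principal component analysis, the step from a correlation matrix to a factor matrix can be illustrated with an eigendecomposition. Each column of loadings is an eigenvector scaled by the square root of its eigenvalue. The sketch below is a minimal NumPy version of that standard definition, not the exact routine of any statistical package. The two-variable matrix is hypothetical, and the sign of each loading column is arbitrary.

```python
import numpy as np


def principal_component_loadings(R: np.ndarray, n_components: int):
    """Eigenvalues and component loadings for a correlation matrix R.

    Each loading column is an eigenvector scaled by the square root of its
    eigenvalue, so its entries are correlations between the variables and
    that component.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(R)     # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest component first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    loadings = eigenvectors[:, :n_components] * np.sqrt(eigenvalues[:n_components])
    return eigenvalues, loadings


R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
eigenvalues, loadings = principal_component_loadings(R, n_components=2)
print(np.round(eigenvalues, 3))
print(np.round(loadings, 3))
```

With r = .6 the eigenvalues are 1.6 and .4. Retaining both components reproduces R exactly from the loadings, whereas retaining fewer gives an approximation.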

Stage 4: Deriving Factors and Assessing Overall Fit. Given that the principal components method of extraction will be used first, the next decision is to select the number of components to be retained for further analysis. As discussed earlier, the researcher should employ a number of different criteria in determining the number of factors to be retained for interpretation, ranging from the more subjective (e.g., selecting a number of factors a priori or specifying the percentage of variance extracted) to the more objective (latent root criterion, scree test, or parallel analysis) criteria.

Stopping Rules. Table 3.6 contains the information regarding the 11 possible factors and their relative explanatory power as expressed by their eigenvalues. In addition to assessing the importance of each component, we can also use the eigenvalues to assist in selecting the number of factors.

A priori criterion. The researcher is not bound by preconceptions as to the number of factors that should be retained, but practical reasons of desiring multiple measures per factor (at least 2 and preferably 3) dictate that between three and five factors would be best given the 11 variables to be analyzed.

Table 3.5 Assessing the Appropriateness of Factor Analysis for the Revised Set of Variables (X15 and X17 Deleted): Correlations, Measures of Sampling Adequacy, and Partial Correlations Among Variables

Correlations Among Variables

Variable                      X6     X7     X8     X9    X10    X11    X12    X13    X14    X16    X18
X6 Product Quality         1.000
X7 E-Commerce              -.137  1.000
X8 Technical Support        .096   .001  1.000
X9 Complaint Resolution     .106   .140   .097  1.000
X10 Advertising            -.053   .430  -.063   .197  1.000
X11 Product Line            .477  -.053   .193   .561  -.012  1.000
X12 Salesforce Image       -.152   .792   .017   .230   .542  -.061  1.000
X13 Competitive Pricing    -.401   .229  -.271  -.128   .134  -.495   .265  1.000
X14 Warranty & Claims       .088   .052   .797   .140   .011   .273   .107  -.245  1.000
X16 Order & Billing         .104   .156   .080   .757   .184   .424   .195  -.115   .197  1.000
X18 Delivery Speed          .028   .192   .025   .865   .276   .602   .272  -.073   .109   .751  1.000

Correlations significant at .01 level (per variable): X6 2, X7 2, X8 2, X9 4, X10 3, X11 6, X12 5, X13 5, X14 3, X16 3, X18 5
Note: In the original table, bolded values indicate correlations significant at the .01 significance level.
Overall Measure of Sampling Adequacy: .653
Bartlett's Test of Sphericity: 619.3
Significance: .000

Measures of Sampling Adequacy and Partial Correlations

Variable                      X6     X7     X8     X9    X10    X11    X12    X13    X14    X16    X18
X6 Product Quality          .509
X7 E-Commerce               .061   .626
X8 Technical Support       -.045  -.068   .519
X9 Complaint Resolution    -.062   .097  -.156   .787
X10 Advertising            -.107  -.015   .062   .074   .779
X11 Product Line           -.503  -.101   .117  -.054   .143   .622
X12 Salesforce Image       -.042  -.725   .076  -.124  -.311   .148   .622
X13 Competitive Pricing     .085  -.047   .139   .020   .060   .386  -.092   .753
X14 Warranty & Claims       .122   .100  -.787   .127  -.032  -.246  -.175  -.028   .511
X16 Order & Billing        -.184  -.113   .160  -.322   .040   .261   .113   .101  -.250   .760
X18 Delivery Speed          .355   .040   .017  -.555  -.202  -.529  -.087  -.184   .100  -.369   .666

Note: Measures of sampling adequacy (MSA) are on the diagonal, partial correlations in the off-diagonal.

Latent root criterion. If we retain factors with eigenvalues greater than 1.0, four factors will be retained.

Percentage of variance criterion. The four factors retained represent 79.6 percent of the variance of the 11 variables, deemed sufficient in terms of total variance explained.

Scree test. As shown in Figure 3.11, the scree test indicates that four or perhaps five factors may be appropriate when considering the changes in eigenvalues (i.e., identifying the “elbow” in the eigenvalues at the fifth factor).

In viewing the eigenvalue for the fifth factor, its low value (.61) relative to the latent root criterion value of 1.0 precludes its inclusion. Had the eigenvalue been quite close to 1, the fifth factor might have been considered for inclusion as well.
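The latent root and percentage-of-variance criteria can be checked against the eigenvalues in Table 3.6, along with the successive drops inspected in a scree plot. This is a small illustrative sketch. Because it uses the rounded table values, some of its cumulative percentages can differ from the table in the last digit.

```python
import numpy as np

# Eigenvalues of the 11 components, as reported in Table 3.6.
eigenvalues = np.array([3.43, 2.55, 1.69, 1.09, 0.61, 0.55, 0.40, 0.25, 0.20, 0.13, 0.10])


def latent_root_count(eigs: np.ndarray, cutoff: float = 1.0) -> int:
    """Latent root criterion: number of components with an eigenvalue above the cutoff."""
    return int(np.sum(eigs > cutoff))


def cumulative_percent(eigs: np.ndarray) -> np.ndarray:
    """Cumulative percentage of total variance (the trace, i.e., the sum of eigenvalues)."""
    return 100.0 * np.cumsum(eigs) / eigs.sum()


k = latent_root_count(eigenvalues)
cumulative = cumulative_percent(eigenvalues)
scree_drops = -np.diff(eigenvalues)              # decline from each eigenvalue to the next
print(k, round(float(cumulative[k - 1]), 1))     # 4 79.6
print(np.round(scree_drops, 2))
```

The drop from the fourth to the fifth eigenvalue (.48) is much larger than the drop from the fifth to the sixth (.06). This is the flattening that a scree plot makes visible.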

Parallel analysis. Table 3.7 contains the parallel analysis for the full set of variables as well as for the final, reduced set. For the parallel analysis, the mean of the eigenvalues of the random datasets is given, along with the 95th percentile, which may be used as an even more conservative threshold. For our purposes, we will use the mean values for comparison.

For the full set of 13 variables, we would select five factors based on the latent root criterion, while noting that the fifth factor had an eigenvalue (1.01) barely exceeding 1. The parallel analysis would indicate four factors, as the mean of the eigenvalues in parallel analysis (1.14) exceeds the actual eigenvalue (1.01). For the reduced set of 11 variables, we would retain four components per the latent root criterion, but parallel analysis retains only three. In that case, the mean random eigenvalue for the fourth component exceeds the actual fourth eigenvalue (1.17 versus 1.09). So in both cases parallel analysis is more conservative in retaining components. In each instance, it provides evidence that the final factor considered for retention, although it passes the latent root criterion, may not be suitable. This is as expected when the final factors have eigenvalues very close to 1.

Table 3.6 Results for the Extraction of Component Factors

                   Eigenvalues
Component   Total   % of Variance   Cumulative %
1            3.43            31.2           31.2
2            2.55            23.2           54.3
3            1.69            15.4           69.7
4            1.09             9.9           79.6
5             .61             5.5           85.1
6             .55             5.0           90.2
7             .40             3.7           93.8
8             .25             2.2           96.0
9             .20             1.9           97.9
10            .13             1.2           99.1
11            .10              .9          100.0

Figure 3.11 Scree Test for Component Analysis (eigenvalue plotted against factor number, factors 1 to 11)

Table 3.7 Parallel Analysis as a Stopping Rule for Principal Components Analysis

Analysis of Full Variable Set (13 Variables)

            PCA Results   Parallel Analysis
Component   Eigenvalue    Mean   95th Percentile
1                 3.57    1.64              1.79
2                 3.00    1.47              1.57
3                 1.74    1.35              1.44
4                 1.29    1.24              1.30
5                 1.01    1.14              1.22
6                 0.62    1.05              1.11
7                 0.55    0.96              1.02
8                 0.45    0.88              0.93
9                 0.28    0.81              0.88
10                0.20    0.74              0.80
11                0.17    0.66              0.74
12                0.13    0.57              0.63
13                0.01    0.49              0.55

Analysis of Final Variable Set (11 Variables)

            PCA Results   Parallel Analysis
Component   Eigenvalue    Mean   95th Percentile
1                 3.43    1.57              1.72
2                 2.55    1.40              1.51
3                 1.69    1.27              1.36
4                 1.09    1.17              1.24
5                 0.61    1.06              1.12
6                 0.55    0.97              1.04
7                 0.40    0.88              0.95
8                 0.25    0.80              0.87
9                 0.20    0.72              0.79
10                0.13    0.63              0.70
11                0.10    0.53              0.60
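Parallel analysis can be sketched in three steps. First, simulate uncorrelated normal data with the same number of cases and variables. Second, record the eigenvalues of each simulated correlation matrix. Third, retain components only while the observed eigenvalue exceeds the simulated mean or, more conservatively, the 95th percentile. The function names, the number of replications, and the seed below are assumptions for illustration. The comparison at the end uses the values printed in Table 3.7.

```python
import numpy as np


def random_eigenvalues(n, p, reps=100, seed=0):
    """Mean and 95th-percentile eigenvalues of correlation matrices of random normal data."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((reps, p))
    for r in range(reps):
        X = rng.standard_normal((n, p))                 # n cases, p uncorrelated variables
        R = np.corrcoef(X, rowvar=False)
        eigs[r] = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first
    return eigs.mean(axis=0), np.percentile(eigs, 95, axis=0)


def parallel_retain(observed, reference):
    """Count leading components whose observed eigenvalue exceeds the reference value."""
    k = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        k += 1
    return k


# Observed eigenvalues and parallel-analysis means as printed in Table 3.7.
full_observed = [3.57, 3.00, 1.74, 1.29, 1.01, 0.62]
full_mean = [1.64, 1.47, 1.35, 1.24, 1.14, 1.05]
final_observed = [3.43, 2.55, 1.69, 1.09, 0.61]
final_mean = [1.57, 1.40, 1.27, 1.17, 1.06]
print(parallel_retain(full_observed, full_mean))    # 4
print(parallel_retain(final_observed, final_mean))  # 3
```

Passing the 95th-percentile values instead of the means to `parallel_retain` gives the more conservative variant mentioned in the text.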

Combining all these criteria is essential, given that there is no single best method for determining the number of factors. In this case, it leads to the conclusion to retain four factors for further analysis. When questions arise as to the appropriate number of factors, researchers are encouraged to evaluate the alternative solutions. As will be shown in a later section, the three- and five-factor solutions are somewhat less interpretable or useful, giving additional support to a four-factor solution. More importantly, these results illustrate the need for multiple decision criteria in deciding the number of components to be retained.

Stage 5: Interpreting the Factors. With four factors to be analyzed, the researcher now turns to interpreting the factors. Once the factor matrix of loadings has been calculated, the interpretation process proceeds by examining the unrotated and then rotated factor matrices for significant factor loadings and adequate communalities.

Table 3.8 Unrotated Component Analysis Factor Matrix

                                         Factor
Variables                         1      2      3      4   Communality
X6 Product Quality             .248  -.501  -.081   .670          .768
X7 E-Commerce                  .307   .713   .306   .284          .777
X8 Technical Support           .292  -.369   .794  -.202          .893
X9 Complaint Resolution        .871   .031  -.274  -.215          .881
X10 Advertising                .340   .581   .115   .331          .576
X11 Product Line               .716  -.455  -.151   .212          .787
X12 Salesforce Image           .377   .752   .314   .232          .859
X13 Competitive Pricing       -.281   .660  -.069  -.348          .641
X14 Warranty & Claims          .394  -.306   .778  -.193          .892
X16 Order & Billing            .809   .042  -.220  -.247          .766
X18 Delivery Speed             .876   .117  -.302  -.206          .914

Total
Sum of Squares (eigenvalue)   3.427  2.551  1.691  1.087         8.756
Percentage of trace (a)       31.15  23.19  15.37   9.88         79.59

(a) Trace = 11.0 (sum of eigenvalues)

If deficiencies are found (i.e., cross-loadings or factors with only a single variable), respecification of the factors is considered. Once the factors are finalized, they can be described based on the significant factor loadings characterizing each factor.

Step 1: Examine the Factor Matrix of Loadings for the Unrotated Factor Matrix. Factor loadings, in either the unrotated or rotated factor matrices, represent the degree of association (correlation) of each variable with each factor. The loadings play a key role in interpreting the factors. This role is especially important when the factors are used in ways that require characterizing their substantive meaning (e.g., as predictor variables in a dependence relationship). The objective of factor analysis in these instances is to maximize the association of each variable with a single factor, often through rotation of the factor matrix. At this stage, the researcher must judge whether the solution adequately represents the structure of the variables and can meet the goals of the research. We will first examine the unrotated factor solution and determine whether the use of the rotated solution is necessary.

Table 3.8 presents the unrotated principal component analysis factor matrix. To begin the analysis, let us explain the numbers included in the table. Five columns of numbers are shown. The first four are the results for the four factors that are extracted (i.e., factor loadings of each variable on each of the factors). The fifth column provides summary statistics detailing how well each variable is explained by the four components (communality), which are discussed in the next section. The first row of numbers at the bottom of each column is the column sum of squared factor loadings (eigenvalues) and indicates the relative importance of each factor in accounting for the variance associated with the set of variables. Note that the sums of squares for the four factors are 3.427, 2.551, 1.691, and 1.087, respectively.

As expected, the factor solution extracts the factors in the order of their importance, with factor 1 accounting for the most variance, factor 2 slightly less, and so on through all 11 factors. At the far right-hand side of the row is the number 8.756, which represents the total of the four eigenvalues (3.427 + 2.551 + 1.691 + 1.087). The total of eigenvalues represents the total amount of variance extracted by the factor solution.
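The column sums of squared loadings can be verified from the entries of Table 3.8. The minimal sketch below recomputes them. Because the loadings are printed to three decimals, the results agree with the reported eigenvalues only to within a few thousandths.

```python
import numpy as np

# Unrotated component loadings from Table 3.8 (columns: components 1-4).
loadings = np.array([
    [ 0.248, -0.501, -0.081,  0.670],   # X6 Product Quality
    [ 0.307,  0.713,  0.306,  0.284],   # X7 E-Commerce
    [ 0.292, -0.369,  0.794, -0.202],   # X8 Technical Support
    [ 0.871,  0.031, -0.274, -0.215],   # X9 Complaint Resolution
    [ 0.340,  0.581,  0.115,  0.331],   # X10 Advertising
    [ 0.716, -0.455, -0.151,  0.212],   # X11 Product Line
    [ 0.377,  0.752,  0.314,  0.232],   # X12 Salesforce Image
    [-0.281,  0.660, -0.069, -0.348],   # X13 Competitive Pricing
    [ 0.394, -0.306,  0.778, -0.193],   # X14 Warranty & Claims
    [ 0.809,  0.042, -0.220, -0.247],   # X16 Order & Billing
    [ 0.876,  0.117, -0.302, -0.206],   # X18 Delivery Speed
])

# Column sums of squared loadings give the eigenvalue of each retained component.
eigenvalues = (loadings ** 2).sum(axis=0)
total_extracted = eigenvalues.sum()     # variance extracted by the four components
print(np.round(eigenvalues, 3), round(float(total_extracted), 3))
```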

The total amount of variance explained by either a single factor or the overall factor solution can be compared to the total variation in the set of variables, as represented by the trace of the correlation matrix. The trace is the total variance to be explained and is equal to the sum of the eigenvalues of the variable set. In principal components analysis, the trace is equal to the number of variables. Each standardized variable contributes a variance of 1.0, which is why the diagonal of the correlation matrix consists of 1s. Adding the percentages of trace for each of the factors, or dividing the total eigenvalues of the factors by the trace, gives the total percentage of trace extracted for the factor solution. This total is used as an index to determine how well a particular factor solution accounts for what all the variables together represent. If the variables are all very different from one another, this index will be low. If the variables fall into one or more highly redundant or related groups, and if the extracted factors account for all the groups, the index will approach 100 percent.

The percentages of trace explained by each of the four factors (31.15%, 23.19%, 15.37%, and 9.88%, respectively) are shown as the last row of values of Table 3.8. The percentage of trace is obtained by dividing each factor's sum of squares (eigenvalue) by the trace for the set of variables being analyzed. For example, dividing the sum of squares of 3.427 for factor 1 by the trace of 11.0 results in a percentage of trace of 31.15 percent for factor 1. The index for the overall solution shows that 79.59 percent of the total variance (8.756 ÷ 11.0) is represented by the information contained in the factor matrix of the four-factor solution. Therefore, the index for this solution is high, and the variables are in fact highly related to one another.
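Written out with the symbols used only for this illustration, λ_j for the eigenvalues and R for the correlation matrix, the percentage-of-trace arithmetic is shown below. Dividing 8.756 by 11.0 gives 79.60 percent. The 79.59 reported in Table 3.8 is the sum of the four individually rounded percentages (31.15 + 23.19 + 15.37 + 9.88).

```latex
\operatorname{tr}(\mathbf{R}) = \sum_{j=1}^{11} \lambda_j = 11.0, \qquad
\%\,\text{trace}_1 = \frac{\lambda_1}{\operatorname{tr}(\mathbf{R})} \times 100
  = \frac{3.427}{11.0} \times 100 \approx 31.15\%,
\qquad
\%\,\text{trace}_{\text{4 factors}} = \frac{\sum_{j=1}^{4} \lambda_j}{\operatorname{tr}(\mathbf{R})} \times 100
  = \frac{8.756}{11.0} \times 100 \approx 79.6\%
```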

Step 2: Identify the Significant Loadings in the Unrotated Factor Matrix. Having defined the various elements of the unrotated factor matrix, let us examine the factor-loading patterns. As discussed earlier, the factor loadings allow for the description of each factor and the structure in the set of variables.

As anticipated, the first factor accounts for the largest amount of variance in Table 3.8. The second factor is somewhat of a general factor, with half of the variables having a high loading (a high loading is defined here as an absolute value greater than .40).

The third factor has two high loadings, whereas the fourth factor has only one. This loading pattern puts a relatively large number of high loadings on factor 2 and only one high loading on factor 4. Interpretation would therefore be difficult and theoretically less meaningful. For this reason, the researcher should proceed to rotate the factor matrix to redistribute the variance from the earlier factors to the later factors. Rotation should result in a simpler and theoretically more meaningful factor pattern. However, before proceeding with the rotation process, we must examine the communalities to see whether any variables have communalities so low that they should be eliminated.

Step 3: Assess the Communalities of the Variables in the Unrotated Factor Matrix. The row sums of squared factor loadings are referred to as communalities. The communalities show the amount of variance in a variable that is accounted for by all of the retained factors taken together. The size of the communality is a useful index for assessing how much variance in a particular variable is accounted for by the factor solution. Higher communality values indicate that a large amount of the variance in a variable has been extracted by the factor solution. Small communalities show that a substantial portion of the variable's variance is not accounted for by the factors. Although no statistical guidelines indicate exactly what is “large” or “small,” practical considerations are consistent with a lower level of .50 for communalities in this analysis.

The communalities in Table 3.8 are shown at the far right side of the table. For instance, the communality value of .576 for variable X10 indicates that it has less in common with the other variables included in the analysis than does variable X8, which has a communality of .893. Both variables, however, still share more than one-half of their variance with the four factors. All of the communalities are sufficiently high to proceed with the rotation of the factor matrix.
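Communalities are row sums of squared loadings. The two values compared above can therefore be recomputed from the Table 3.8 rows for X8 and X10. This is a minimal sketch, and the .50 floor is the practical guideline stated in the text.

```python
import numpy as np

# Rows of the unrotated loading matrix in Table 3.8 (components 1-4).
rows = {
    "X8 Technical Support": np.array([0.292, -0.369, 0.794, -0.202]),
    "X10 Advertising": np.array([0.340, 0.581, 0.115, 0.331]),
}


def communality(loading_row: np.ndarray) -> float:
    """Row sum of squared loadings: variance explained by all retained factors."""
    return float(np.sum(loading_row ** 2))


for name, row in rows.items():
    h2 = communality(row)
    print(f"{name}: communality = {h2:.3f}, meets .50 floor: {h2 >= 0.50}")
```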

Steps 2 and 3: Assess the Significant Factor Loading(s) and Communalities of the Rotated Factor Matrix. The unrotated factor matrix did not have a completely clean set of factor loadings: it had substantial cross-loadings and did not maximize the loadings of each variable on one factor. A rotation technique can therefore be applied to improve the interpretation. In this case, the VARIMAX rotation is used, and its impact on the overall factor solution and the factor loadings is described next.

Applying the Orthogonal (VARIMAX) Rotation. The VARIMAX-rotated principal component analysis factor matrix is shown in Table 3.9. Note that the total amount of variance extracted is the same in the rotated solution as it was in the unrotated solution, 79.6 percent. Also, the communalities for each variable do not change when a rotation technique is applied. Still, two differences do emerge. First, the variance is redistributed so that the factor-loading pattern and the percentage of variance for each of the factors are slightly different. Specifically, in the VARIMAX-rotated factor solution, the first factor accounts for 26.3 percent of the variance, compared to 31.2 percent in the unrotated solution.

Likewise, the other factors also change. The largest change is in the fourth factor, which increases from 9.9 percent in the unrotated solution to 16.1 percent in the rotated solution. Thus, the rotation shifted the explanatory power slightly toward a more even distribution. Second, the interpretation of the factor matrix is simplified. As will be discussed in the next section, each variable's loadings are maximized on a single factor, except in any instances of cross-loadings.
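A compact VARIMAX implementation can confirm that rotation leaves the communalities unchanged. The sketch below uses a widely used SVD-based iteration of Kaiser's varimax criterion, with optional Kaiser row normalization (a common default in statistical packages). It is not the routine that produced Table 3.9. Its output may differ from that table in column order, signs, and the last decimal place.

```python
import numpy as np


def varimax(loadings, normalize=True, max_iter=100, tol=1e-8):
    """Orthogonal VARIMAX rotation of a p x k loading matrix.

    Returns the rotated loadings and the k x k orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Kaiser normalization: rotate rows scaled to unit length, then rescale.
    h = np.sqrt((L ** 2).sum(axis=1)) if normalize else np.ones(p)
    A = L / h[:, None]
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion, mapped to the nearest orthogonal matrix via SVD.
        G = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                      # no meaningful improvement
        criterion = new_criterion
    return (A @ T) * h[:, None], T


# Unrotated loadings from Table 3.8 (rows X6 to X14, then X16 and X18; columns: components 1-4).
unrotated = np.array([
    [ 0.248, -0.501, -0.081,  0.670],
    [ 0.307,  0.713,  0.306,  0.284],
    [ 0.292, -0.369,  0.794, -0.202],
    [ 0.871,  0.031, -0.274, -0.215],
    [ 0.340,  0.581,  0.115,  0.331],
    [ 0.716, -0.455, -0.151,  0.212],
    [ 0.377,  0.752,  0.314,  0.232],
    [-0.281,  0.660, -0.069, -0.348],
    [ 0.394, -0.306,  0.778, -0.193],
    [ 0.809,  0.042, -0.220, -0.247],
    [ 0.876,  0.117, -0.302, -0.206],
])
rotated, T = varimax(unrotated)

# Row sums (communalities) are unchanged; column sums (variance per factor) shift.
print(np.round((unrotated ** 2).sum(axis=1), 3))
print(np.round((rotated ** 2).sum(axis=1), 3))
print(np.round((rotated ** 2).sum(axis=0), 3))
```

Because T is orthogonal, each row of loadings keeps its length, so every communality and the total variance extracted are preserved. Only the split of that variance across the columns changes.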

With the rotation complete, the researcher now examines the rotated factor matrix for the patterns of significant factor loadings hoping to find a simplified structure. If any problems remain (i.e., nonsignificant loadings for one or more variables, cross-loadings, or unacceptable communalities), the researcher must consider respecification of the factor analysis through the set of options discussed earlier.

Our first examination is to see if “simple structure” is found in the rotated factor solution. As described earlier, the solution is evaluated from the perspectives of each variable, each factor and each pair of factors. Examination

Table 3.9 VARIMAX-Rotated Component Analysis Factor Matrices: Full and Reduced Sets of Variables

Full Set of Variables

VARIMAX-Rotated Loadings (a)
                                         Factor
Variables                         1      2      3      4   Communality
X18 Delivery Speed             .938   .177  -.005   .052          .914
X9 Complaint Resolution        .926   .116   .048   .091          .881
X16 Order & Billing            .864   .107   .084  -.039          .766
X12 Salesforce Image           .133   .900   .076  -.159          .859
X7 E-Commerce                  .057   .871   .047  -.117          .777
X10 Advertising                .139   .742  -.082   .015          .576
X8 Technical Support           .018  -.024   .939   .101          .893
X14 Warranty & Claims          .110   .055   .931   .102          .892
X6 Product Quality             .002  -.013  -.033   .876          .768
X13 Competitive Pricing       -.085   .226  -.246  -.723          .641
X11 Product Line               .591  -.064   .146   .642          .787

Total
Sum of Squares (eigenvalue)   2.893  2.234  1.855  1.774         8.756
Percentage of trace           26.30  20.31  16.87  16.12         79.59

(a) Factor loadings greater than .40 are in bold in the original table, and variables have been sorted by loadings on each factor.

Reduced Set of Variables (X11 deleted)

VARIMAX-Rotated Loadings (a)
                                         Factor
Variables                         1      2      3      4   Communality
X9 Complaint Resolution        .933                                 .890
X18 Delivery Speed             .931                                 .894
X16 Order & Billing            .886                                 .806
X12 Salesforce Image                  .898                          .860
X7 E-Commerce                         .868                          .780
X10 Advertising                       .743                          .585
X8 Technical Support                         .940                   .894
X14 Warranty & Claims                        .933                   .891
X6 Product Quality                                  .892            .798
X13 Competitive Pricing                            -.730            .661

Total
Sum of Squares (eigenvalue)   2.589  2.216  1.846  1.406         8.057
Percentage of trace           25.89  22.16  18.46  14.06         80.57

(a) Factor loadings less than .40 have not been printed, and variables have been sorted by loadings.
