Rules of Thumb 2-5: Transforming Data - Multivariate Data Analysis, 8th edition, 2019

● To judge the potential impact of a transformation, calculate the ratio of the variable’s mean to its standard deviation:

● Noticeable effects should occur when the ratio is less than 4.

● When the transformation can be performed on either of two variables, select the variable with the smallest ratio.

● Transformations should be applied to the independent variables except in the case of heteroscedasticity.

● Heteroscedasticity can be remedied only by the transformation of the dependent variable in a dependence relationship; if a heteroscedastic relationship is also nonlinear, the dependent variable, and perhaps the independent variables, must be transformed.

● Transformations may change the interpretation of the variables; for example, transforming variables by taking their logarithm translates the relationship into a measure of proportional change (elasticity); always be sure to explore thoroughly the possible interpretations of the transformed variables.

● Use variables in their original (untransformed) format when profiling or interpreting results.
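As a minimal sketch of the first rule of thumb, the following Python fragment computes the ratio of a variable's mean to its standard deviation and applies the cutoff of 4. The function names and the sample data are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the rule does not say which form to use.

```python
from statistics import mean, stdev


def mean_sd_ratio(values):
    """Return the ratio of the mean to the sample standard deviation."""
    return mean(values) / stdev(values)


def transformation_likely_noticeable(values, threshold=4.0):
    """Rule of thumb: a ratio below 4 suggests a transformation will have a noticeable effect."""
    return mean_sd_ratio(values) < threshold


# Hypothetical data: one variable spread widely relative to its mean, one tightly.
wide = [1.0, 2.0, 4.0, 8.0, 16.0]
tight = [100.0, 101.0, 99.0, 102.0, 98.0]

ratio_wide = mean_sd_ratio(wide)
ratio_tight = mean_sd_ratio(tight)

# When either of two variables could be transformed, pick the one with the smaller ratio.
candidates = {"wide": ratio_wide, "tight": ratio_tight}
preferred = min(candidates, key=candidates.get)
```

Here `preferred` names the variable with the smaller ratio, which is the one the rule of thumb would select for transformation.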


Table 2.11 Distributional Characteristics, Testing for Normality, and Possible Remedies

          Skewness        Kurtosis       Tests of normality                                    Applicable remedies
Variable   Stat.       z    Stat.       z  Stat.    Sig.   Description of the distribution     Transformation  Sig. after remedy
Firm characteristics
X6         -.245   -1.01   -1.132   -2.37   .109    .005   Almost uniform distribution         Squared term    .015
X7          .660    2.74     .735    1.54   .122    .001   Peaked with positive skew           Logarithm       .037
X8         -.203    -.84    -.548   -1.15   .060    .200a  Normal distribution
X9         -.136    -.56    -.586   -1.23   .051    .200a  Normal distribution
X10         .044     .18    -.888   -1.86   .065    .200a  Normal distribution
X11        -.092    -.38    -.522   -1.09   .060    .200a  Normal distribution
X12         .377    1.56     .410     .86   .111    .004   Slight positive skew and peakedness
X13        -.240   -1.00    -.903   -1.89   .106    .007   Peaked                              Cubed term      -
X14         .008     .03    -.445    -.93   .064    .200a  Normal distribution
X15         .299    1.24     .016     .03   .074    .200a  Normal distribution
X16        -.334   -1.39     .244     .51   .129    .000   Negative skewness                   Squared term    .066
X17         .323    1.34    -.816   -1.71   .101    .013   Peaked, positive skewness           Inverse         .187
X18        -.463   -1.92     .218     .46   .084    .082   Normal distribution
Performance Measures
X19         .078     .32    -.791   -1.65   .078    .137   Normal distribution
X20         .044     .18    -.089    -.19   .077    .147   Normal distribution
X21        -.093    -.39    -.090    -.19   .073    .200a  Normal distribution
X22        -.132    -.55    -.684   -1.43   .075    .180   Normal distribution

a Lower bound of true significance.
Note: The z values are derived by dividing the statistics by the appropriate standard errors of .241 (skewness) and .478 (kurtosis). The equations for calculating the standard errors are given in the text.
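The note to Table 2.11 states that each z value is the statistic divided by its standard error. A minimal Python sketch of that arithmetic follows, using three values from the table. Two things here are assumptions rather than statements from this excerpt: the critical value of 1.96 (the usual two-tailed value at the .05 level), and the standard error formulas, which are the ones commonly reported by statistical packages and which reproduce .241 and .478 only if the sample size is taken to be 100.

```python
import math

# Standard errors quoted in the note to Table 2.11.
SE_SKEWNESS = 0.241
SE_KURTOSIS = 0.478
# Assumed two-tailed critical value for the .05 level; not stated in this excerpt.
CRITICAL_Z = 1.96


def z_value(statistic, standard_error):
    """z value as described in the table note: statistic divided by its standard error."""
    return statistic / standard_error


def is_significant(z, critical=CRITICAL_Z):
    """True when the z value exceeds the critical value in absolute terms."""
    return abs(z) > critical


def se_skewness(n):
    """Commonly used standard error of sample skewness (an assumption, see lead-in)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Commonly used standard error of sample excess kurtosis (an assumption, see lead-in)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


z_skew_x7 = z_value(0.660, SE_SKEWNESS)    # skewness of X7 from the table
z_kurt_x6 = z_value(-1.132, SE_KURTOSIS)   # kurtosis of X6 from the table
z_skew_x18 = z_value(-0.463, SE_SKEWNESS)  # skewness of X18 from the table
```

Under the assumed critical value, the skewness of X7 and the kurtosis of X6 are flagged, while the skewness of X18 falls just short.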

normality tests. When viewing the shape characteristics, significant deviations were found for skewness (X₇) and kurtosis (X₆). One should note that only two variables were found with shape characteristics significantly different from the normal curve, while six variables were identified with the overall tests. The overall test provides no insight as to the transformations that might be best, whereas the shape characteristics provide guidelines for possible transformations. The researcher can also use the normal probability plots to identify the shape of the distribution. Figure 2.14 contains the normal probability plots for the six variables found to have non-normal distributions. By combining information from the empirical and graphical methods, the researcher can characterize the non-normal distribution in anticipation of selecting a transformation (see Table 2.11 for a description of each non-normal distribution).

Figure 2.14 Normal Probability Plots (NPP) of Non-Normal Metric Variables (X₆, X₇, X₁₂, X₁₃, X₁₆, and X₁₇). Panels: X₆ Product Quality, X₇ E-Commerce Activities, X₁₂ Salesforce Image, X₁₃ Competitive Pricing, X₁₆ Order & Billing, X₁₇ Price Flexibility. Both axes of each panel run from 0.00 to 1.00.
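To illustrate how the points of a normal probability plot on a 0-to-1 scale can be obtained, the sketch below pairs each observation's empirical cumulative probability with the cumulative probability expected under a normal distribution fitted to the data. The plotting position (rank - 0.5) / n, the function names, and both data sets are assumptions made for illustration; statistical packages may use a different plotting convention.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(values):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    points = []
    for rank, x in enumerate(sorted(values), start=1):
        observed = (rank - 0.5) / n   # assumed plotting position
        expected = fitted.cdf(x)      # cumulative probability under the fitted normal
        points.append((observed, expected))
    return points


def max_departure(points):
    """Largest vertical distance from the diagonal, a rough index of non-normality."""
    return max(abs(observed - expected) for observed, expected in points)


# Hypothetical data: a symmetric variable and a positively skewed one.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
skewed = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 20.0]

departure_symmetric = max_departure(pp_plot_points(symmetric))
departure_skewed = max_departure(pp_plot_points(skewed))
```

Points that hug the diagonal indicate a distribution close to normal; the skewed sample strays much further from it than the symmetric one.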

Table 2.11 also suggests the appropriate remedy for each of the variables. Two variables (X₆ and X₁₆) were transformed by taking the square root. X₇ was transformed by logarithm, whereas X₁₇ was squared and X₁₃ was cubed.

Only X₁₂ could not be transformed to improve on its distributional characteristics. For the other five variables, their tests of normality were now either nonsignificant (X₁₆ and X₁₇) or markedly improved to more acceptable levels (X₆, X₇, and X₁₃). Figure 2.15 demonstrates the effect of the transformation on X₁₇ in achieving normality. The transformed X₁₇ appears markedly more normal in the graphical portrayals, and the statistical descriptors are also improved. The researcher should always examine the transformed variables as rigorously as the original variables in terms of their normality and distribution shape.
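The advice to re-examine a variable after transforming it can be sketched as follows. The fragment applies several candidate transformations to a positively skewed variable and recomputes skewness for each. The data are hypothetical, the moment-based skewness formula is an assumption (packages often apply a small-sample adjustment), and the candidate list simply mirrors the transformations mentioned in this section.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Candidate transformations; logarithm, square root, and inverse require positive values.
TRANSFORMATIONS = {
    "logarithm": math.log,
    "square root": math.sqrt,
    "inverse": lambda x: 1.0 / x,
    "squared": lambda x: x ** 2,
    "cubed": lambda x: x ** 3,
}


def skewness_after(values):
    """Skewness of the variable under each candidate transformation."""
    return {name: skewness([f(x) for x in values]) for name, f in TRANSFORMATIONS.items()}


# Hypothetical positively skewed variable.
positively_skewed = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0]

before = skewness(positively_skewed)
after = skewness_after(positively_skewed)
```

For these data the logarithm pulls the long right tail in and reduces the skewness, whereas squaring makes it worse; a full check would repeat the normality test and the graphical examination as well.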

In the case of the remaining variable (X₁₂), none of the transformations could improve the normality. This variable will have to be used in its original form. In situations where the normality of the variables is critical, the transformed variables can be used with the assurance that they meet the assumptions of normality. But the departures from normality are not so extreme in any of the original variables that they should never be used in any analysis in their original form. If the technique is robust to departures from normality, then the original variables may be preferred for comparability in the interpretation phase.

HOMOSCEDASTICITY

All statistical packages have tests to assess homoscedasticity on a univariate basis (e.g., the Levene test in SPSS) where the variance of a metric variable is compared across levels of a nonmetric variable. For our purposes, we examine each of the metric variables across the five nonmetric variables in the dataset. These analyses are appropriate in preparation for analysis of variance or multivariate analysis of variance, in which the nonmetric variables are the independent variables, or for discriminant analysis, in which the nonmetric variables are the dependent measures.
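A minimal sketch of the comparison just described is given below: the variance of a metric variable is compared across the levels of a nonmetric variable with a Levene-type statistic. This version centers on group means, which is an assumption, since packages may center on the median or a trimmed mean instead. The significance level would additionally require the F distribution and is omitted here. The function name and the data are hypothetical.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene statistic based on absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand_mean) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical metric variable split by a two-level nonmetric variable.
similar_spread = [[1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0]]
different_spread = [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 6.0, 11.0, 16.0, 21.0]]

w_similar = levene_statistic(similar_spread)
w_different = levene_statistic(different_spread)
```

Groups that differ only in location give a statistic of zero, while groups with unequal dispersion give a larger value, which is what signals heteroscedasticity.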

Table 2.12 contains the results of the Levene test for each of the nonmetric variables. Among the performance measures, notable heteroscedasticity appears only across the levels of X₄ (Region). For the 13 firm characteristic variables, only X₆ and X₁₇ show patterns of heteroscedasticity on more than one of the nonmetric variables. Moreover, in no instance do any of the nonmetric variables have more than two problematic metric variables. The actual implications of these instances of heteroscedasticity must be examined whenever group differences are examined using these nonmetric variables as independent variables and these metric variables as dependent variables. The relative lack of either numerous problems or any consistent patterns across one of the nonmetric variables suggests that heteroscedasticity problems will be minimal. If assumption violations are found, variable transformations are available to help remedy the variance dispersion.

The ability of transformations to address the problem of heteroscedasticity for X₁₇, if desired, is also shown in Figure 2.15. Before a logarithmic transformation was applied, heteroscedastic conditions were found on three of the five nonmetric variables. The transformation not only corrected the nonnormality problem, but also eliminated the problems with heteroscedasticity. It should be noted, however, that several transformations “fixed” the normality problem, but only the logarithmic transformation also addressed the heteroscedasticity, which demonstrates the relationship between normality and heteroscedasticity and the role of transformations in addressing each issue.

The tests for homoscedasticity of two metric variables, encountered in methods such as multiple regression, are best accomplished through graphical analysis, particularly an analysis of the residuals. The interested reader is referred to Chapter 5 for a complete discussion of residual analysis and the patterns of residuals indicative of heteroscedasticity.
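As a rough numeric stand-in for inspecting a residual plot, the sketch below fits a simple regression line and compares the spread of the residuals in the lower and upper halves of the predicted values. A markedly larger spread in one half is the fan-shaped pattern associated with heteroscedasticity. The split-in-half comparison, the function names, and the data are all assumptions made for illustration and do not replace the graphical analysis.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_spread_by_half(x, y):
    """Mean squared residual for the lower and upper halves of the predicted values."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    pairs = sorted(zip(fitted, residuals))  # order observations by predicted value
    half = len(pairs) // 2
    low = [r * r for _, r in pairs[:half]]
    high = [r * r for _, r in pairs[half:]]
    return mean(low), mean(high)


# Hypothetical data in which the scatter around the line grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 5.0, 9.0, 6.0]

intercept, slope = fit_line(x, y)
spread_low, spread_high = residual_spread_by_half(x, y)
```

In this example the residuals at high predicted values are far more dispersed than those at low predicted values.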

LINEARITY

The final assumption to be examined is the linearity of the relationships. In the case of individual variables, this linearity relates to the patterns of association between each pair of variables and the ability of the correlation coefficient to adequately represent the relationship. If nonlinear relationships are indicated, then the researcher can either transform one or both of the variables to achieve linearity or create additional variables to represent the nonlinear components. For our purposes, we rely on the visual inspection of the relationships to determine whether nonlinear

Table 2.12 Testing for Homoscedasticity

Nonmetric/categorical variable
           X1 Customer Type  X2 Industry Type  X3 Firm Size      X4 Region         X5 Distribution System
Variable     Levene    Sig.    Levene    Sig.    Levene    Sig.    Levene    Sig.    Levene
Firm characteristics
X6            17.47     .00       .01     .94       .02     .89     17.86     .00       .48
X7              .58     .56       .09     .76       .09     .76       .05     .83      2.87
X8              .37     .69       .48     .49      1.40     .24       .72     .40       .11
X9              .43     .65       .02     .88       .17     .68       .58     .45      1.20
X10             .74     .48       .00     .99       .74     .39      1.19     .28       .69
X11             .05     .95       .15     .70       .09     .76      3.44     .07      1.72
X12            2.46     .09       .36     .55       .06     .80      1.55     .22      1.55
X13             .84     .43      4.43     .04      1.71     .19       .24     .63      2.09
X14            2.39     .10      2.53     .11      4.55     .04       .25     .62       .16
X15            1.13     .33       .47     .49      1.05     .31       .01     .94       .59
X16            1.65     .20       .83     .37       .31     .58      2.49     .12      4.60
X17            5.56     .01      2.84     .10      4.19     .04     16.21     .00       .62
X18             .87     .43       .30     .59       .18     .67      2.25     .14      4.27
Performance Measures
X19            3.40     .04       .00     .96       .73     .39      8.57     .00       .18
X20            1.64     .20       .03     .86       .03     .86      7.07     .01       .46
X21            1.05     .35       .11     .74       .68     .41     11.54     .00      2.67
X22             .15     .86       .30     .59       .74     .39       .00     .99      1.47

Notes: Values represent the Levene statistic value and the statistical significance in assessing the variance dispersion of each metric variable across levels of the nonmetric/categorical variables. Values with a significance of .05 or less are statistically significant.


Figure 2.15 Transformation of X17 (Price Flexibility) to Achieve Normality and Homoscedasticity

Panels, each shown for the original variable and the transformed variable: normal probability plot of X17 (observed value against expected normal value); distribution of X17; X17 by X3 Firm Size (Small (0 to 499), Large (500+)).

Distribution Characteristics Before and After Transformation

                    Skewness              Kurtosis              Test of normality
Variable form       Statistic  z value a  Statistic  z value a  Statistic  Significance
Original X17             .323       1.34      -.816      -1.71       .101
Transformed X17 b       -.121        .50      -.803      -1.68       .080

a The z values are derived by dividing the statistics by the appropriate standard errors of .241 (skewness) and .478 (kurtosis). The equations for calculating the standard errors are given in the text.
b Logarithmic transformation.

Levene Test Statistic

Variable form     X1 Customer Type  X2 Industry Type  X3 Firm Size  X4 Region  X5 Distribution System
Original X17      5.56**            2.84              4.19*         16.21**    .62
Transformed X17   2.76              2.23              1.20          3.11**     .01

* Significant at .05 significance level.
** Significant at .01 significance level.

relationships are present. The reader can also refer to Figure 2.2, the scatterplot matrix containing the scatterplot for selected metric variables in the dataset. Examination of the scatterplots does not reveal any apparent nonlinear relationships. Review of the scatterplots not shown in Figure 2.2 also did not reveal any apparent nonlinear patterns.

Thus, transformations are not deemed necessary. The assumption of linearity will also be checked for the entire multivariate model, as is done in the examination of residuals in multiple regression.
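Had a nonlinear pattern appeared, one of the remedies named earlier is to transform one of the variables. The sketch below shows the idea on hypothetical data in which y grows with the square of x: the correlation coefficient understates the relationship until y is transformed by its square root. The data, the function name, and the choice of the square root are assumptions made purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curvilinear relationship: y is exactly the square of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [v ** 2 for v in x]

r_raw = pearson_r(x, y)                                    # linear fit to a curved pattern
r_transformed = pearson_r(x, [math.sqrt(v) for v in y])    # after transforming y
```

The correlation rises after the transformation because the transformed relationship is linear, which is the condition the correlation coefficient needs in order to represent the association adequately.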

SUMMARY

The series of graphical and statistical tests directed toward assessing the assumptions underlying the multivariate techniques revealed relatively little in terms of violations of the assumptions. Where violations were indicated, they were relatively minor and should not present any serious problems in the course of the data analysis. The researcher is encouraged always to perform these simple, yet revealing, examinations of the data to ensure that potential problems can be identified and resolved before the analysis begins.
