IV. STATISTICAL METHODOLOGIES
Parametric statistics were used to compare data sets for gametophyte lengths and areas. Random size samplings from brown algal gametophyte cultures often yielded size data that were not normally distributed. One tail of the size distribution was often skewed to larger sizes in cultures aged one week and older (Figure 2.3).
Variances usually increased proportionally with the means as cultures aged. If this was detected, raw data were transformed to their natural logarithms to stabilize variances. Transformations permitted comparisons of control and treated cultures by rendering variances homogeneous. Means computed from log-transformed data ("geometric" means) were usually smaller than means calculated from the original data. Geometric means were back-transformed prior to graphic display.
Influences from log-transformations on means, standard deviations and 95% confidence limits usually did not alter determinations of significance as computed from untransformed data. Occasionally transformations did not render variances homogeneous. Significance tests relying on pooled data were avoided if this was found to be the case.
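The log transformation and the back-transformed ("geometric") mean described above can be illustrated with a minimal Python sketch. The area values are hypothetical, chosen so that the treated culture is a scaled-down copy of the control; the function names are illustrative only and are not taken from the software used in this work.

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Average the natural logs, then back-transform to original units."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.mean(logs))

def log_variance(values):
    """Sample variance of the natural-log-transformed data."""
    return statistics.variance([math.log(v) for v in values])

# Hypothetical gametophyte areas; treated values are 1/4 of control values,
# so the standard deviation scales with the mean.
control = [100.0, 200.0, 400.0, 800.0]
treated = [25.0, 50.0, 100.0, 200.0]

# Ratio of variances before and after the log transformation.
raw_var_ratio = statistics.variance(control) / statistics.variance(treated)
log_var_ratio = log_variance(control) / log_variance(treated)

arithmetic_mean = statistics.mean(control)
geometric_mean = geometric_mean_via_logs(control)

print(raw_var_ratio, log_var_ratio)
print(arithmetic_mean, geometric_mean)
```

In this constructed case the raw variances differ by the square of the scale factor, while the variances of the logged data agree, and the geometric mean falls below the arithmetic mean.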
Significance of results was initially assessed by computing 95 percent confidence limits from the sample size (n) and standard deviation:
$$ 95\%\ \text{conf.} = \frac{\text{std.dev.} \cdot t_{95}}{\sqrt{n-1}} \qquad (2.2) $$
where t95 is the two-tailed tabulated Student's t-value for n-1 degrees of freedom as found in standard statistical tables (Rohlf and Sokal, 1981). The assumed null hypothesis when comparing means from two samples is that the samples were drawn from populations with identical means. The null hypothesis was rejected at the 5 percent significance level if the confidence limits from the two samples did not overlap.
Figure 2.3: Example of broadening of gametophyte size distribution with time. Data shown for Pterygophora californica Series C control culture. [Two histogram panels: "Gametophyte size distribution, P. californica, Day 3, n=40" and "Gametophyte size distribution, P. californica, Day 9, n=19"; horizontal axis: Area (µm²).]
An alternative method of computing t-statistics for comparison of data from two samples employs a pooled standard deviation, which is computed using data from both samples (Alder and Roessler, 1972). A t-value is computed directly from the data and compared to tabulated values of t for n1 + n2 - 2 degrees of freedom, where n1 and n2 are the sizes of the two samples. This method assumes that samples are drawn from populations with identical standard deviations even if the means are found to be different. The pooled standard deviation method was avoided because this assumption was not valid for our experimental systems. Measured standard deviations increased proportionately with means as cultures grew with time and occasionally did not stabilize when transformations were applied. Standard deviations in control cultures became large with time because of a distribution of growth rates in tested cultures. Fertile gametophytes usually grew more slowly than non-fertile organisms, causing the size distribution of randomly sampled organisms to spread with time. Cultures that had been inhibited by toxicants and did not grow had much smaller standard deviations than control cultures.
Student's t-statistics proved to be conservative estimates of significance. Student's t-tests occasionally indicated no significant difference between means that proved significantly different when tested by a combination of ANOVA and a priori multiple-range tests (Sokal and Rohlf, 1981). ANOVA and multiple-range tests were used to make the final determination of significance if data satisfied conditions of independence and homogeneity.
Two types of nonparametric statistics were employed for analyzing counts of viable organisms (spores, sperm, sporophytes) obtained from random scans of slide surfaces with a cross-hatched reticule:
1) Wilcoxon rank-sum tests (Alder and Roessler, 1972; Sokal and Rohlf, 1981),
2) Complete and sampled randomization tests (Bray, 1988; Sokal and Rohlf, 1981).
Nonparametric statistical tests were used because experimental organism counts were seldom normally distributed. Normal or near-normal distributions are required for valid outcomes of parametric tests such as Student's t-test or ANOVA (Alder and Roessler, 1972).
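As an illustration of the rank-based approach, the following Python sketch computes the Wilcoxon rank-sum statistic for one of two samples, assigning average ranks (midranks) to tied counts. The counts are hypothetical and the function name is illustrative; the sketch stops at the statistic itself, which would then be compared with tabulated critical values.

```python
def rank_sum(sample_a, sample_b):
    """Sum of the pooled-data ranks belonging to sample_a (ties get midranks)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past every value tied with pooled[i].
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(midrank[v] for v in sample_a)

# Hypothetical organism counts per scan.
treated_counts = [3, 5, 8]
control_counts = [5, 9, 12, 14]

w_treated = rank_sum(treated_counts, control_counts)
w_control = rank_sum(control_counts, treated_counts)
print(w_treated, w_control)
```

A useful check on any such routine is that the two rank sums add up to N(N+1)/2, where N is the pooled sample size.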
Randomization tests employ the following strategy (Sokal and Rohlf, 1981):
"(1) Consider an observed sample of variates or frequencies as one of many possible but equally likely different outcomes that could have arisen by chance.
(2) Enumerate the possible outcomes that could be obtained by randomly rearranging the variates or frequencies.
(3) On the basis of the resulting distribution of outcomes, decide whether the single outcome observed is deviant (improbable) enough to warrant rejection of the null hypothesis.
In other words, is the probability of obtaining an observation as deviant as or more deviant than the single outcome less than the desired significance level?"
Exact randomization tests can be performed when sample sizes are small enough to keep the total number of combinations of random rearrangements of the data small. Total combinations rise rapidly with increasing sample size.
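For small samples the complete enumeration can be sketched in a few lines of Python. This is a minimal illustration rather than the routine used in this work: it assumes the test statistic is the absolute difference between sample means (a two-tailed comparison), and the data values are hypothetical.

```python
from itertools import combinations

def exact_randomization_p(sample_a, sample_b):
    """Fraction of all regroupings whose |mean difference| is at least the observed one."""
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(sample_a))
    as_extreme = 0
    n_arrangements = 0
    # Every way of choosing which pooled values form the first group.
    for idx in combinations(range(len(pooled)), n_a):
        n_arrangements += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_arrangements, n_arrangements

p_value, n_arrangements = exact_randomization_p([1, 2, 3], [7, 8, 9])
print(p_value, n_arrangements)
```

With three values per group there are only 20 rearrangements, so the smallest attainable two-tailed probability is 2/20; this is why very small samples cannot reach the 5 percent level with this test.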
For comparison of two samples of unequal size,
$$ \#\,\text{combinations} = \frac{(n_1 + n_2)!}{n_1!\, n_2!} \qquad (2.3) $$
For comparison of two samples of equal size,
$$ \#\,\text{combinations} = \frac{(2n)!}{(n!)^2} \qquad (2.4) $$
The "!" indicates a factorial. A sample table illustrates growth in number of combinations for equal sample sizes:
n in each sample    # of combinations
5                   252
10                  184,756
15                  155,117,520
20                  137,846,528,820
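The growth in the number of rearrangements can be checked directly. The sketch below uses the fact that the number of ways to split n1 + n2 pooled values into groups of sizes n1 and n2 is a binomial coefficient; the function name is illustrative.

```python
import math

def n_combinations(n1, n2):
    """Number of distinct ways to divide n1 + n2 pooled values into groups of n1 and n2."""
    return math.comb(n1 + n2, n1)

# Equal sample sizes, as in the table of combinations.
for n in (5, 10, 15, 20):
    print(n, f"{n_combinations(n, n):,}")
```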
Computer time and memory limitations make complete randomization trials for large samples infeasible, so sampled randomization trials are performed using only a fraction of the possible combinations. The difference between means (or another test statistic) is computed from 500 or 1000 randomly selected recombinations of the pooled original data and compared to the value that was actually obtained in the experiment. If only a small percentage of the random recombinations exceeds the value obtained in the original experiment, then it is concluded that the observed experimental outcome had a small probability of occurring by chance. Confidence limits must be applied to percentages calculated from sampled randomizations (Sokal and Rohlf, 1981; Rohlf and Sokal, 1981). Computer routines (Bray, 1988) were made available to the author to facilitate analysis of data sets by complete and sampled randomizations. These methods were used extensively to test the significance of toxicant effects on sporophyte recruitment (Chapter 4).
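A sampled randomization test of this kind can be sketched as follows. This is not the Bray (1988) program, whose internals are not described here; it is a minimal Python illustration that assumes the test statistic is the absolute difference between means and that approximate 95% limits on the sampled proportion come from the normal approximation to the binomial. The counts, the function name, and the `trials` and `seed` parameters are all hypothetical.

```python
import math
import random
import statistics

def sampled_randomization(sample_a, sample_b, trials=1000, seed=0):
    """Estimate the probability of a mean difference at least as large as observed."""
    rng = random.Random(seed)
    n_a = len(sample_a)
    observed = abs(statistics.mean(sample_a) - statistics.mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # one random recombination of the pooled data
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    p = hits / trials
    # Assumed: normal-approximation 95% half-width for a sampled proportion.
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / trials)
    return p, half_width

# Hypothetical sporophyte counts per scan.
control = [14, 17, 12, 19, 16, 15, 18, 13]
treated = [6, 9, 4, 8, 7, 5, 10, 6]
p_est, p_half_width = sampled_randomization(control, treated, trials=1000)
print(p_est, p_half_width)
```

Fixing the seed makes a run repeatable; a different seed gives a slightly different estimate, which is why limits on the sampled percentage are reported alongside it.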
Errors arising from computation of powers, products or quotients of two uncertain quantities were treated by propagation-of-error methods described in Bevington (1969).
Propagation-of-error theory was employed in estimation of magnitude of machine error in the digitization process and when computing ratios of treated means to control means. The essence of the theory is that for a quantity x which is a function of several variables:
$$ x = f(u, v, \ldots) \qquad (2.5) $$
each of which has a known mean and variance, the uncertainty of x is given by:
$$ \sigma_x^2 \approx \sigma_u^2 \left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2 \left(\frac{\partial x}{\partial v}\right)^2 + 2\sigma_{uv}^2 \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial x}{\partial v}\right) + \cdots \qquad (2.6) $$
Assuming that the fluctuations of u and v are uncorrelated, the expression simplifies to:
$$ \sigma_x^2 \approx \sigma_u^2 \left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2 \left(\frac{\partial x}{\partial v}\right)^2 + \cdots \qquad (2.7) $$
An example would be the computation of a quotient:
$$ x = \pm \frac{a u}{v} \qquad (2.8) $$
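Substitution of this expression for x into (2.6) gives
$$ \frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} - 2\frac{\sigma_{uv}^2}{uv} \qquad (2.9) $$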
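For a ratio of two means, the quotient case can be worked numerically. The Python sketch below assumes the fluctuations of the numerator and denominator are uncorrelated, so that the relative uncertainties add in quadrature and any covariance term is dropped; the input values and the function name are hypothetical.

```python
import math

def ratio_with_uncertainty(u, sigma_u, v, sigma_v):
    """Return x = u / v and its uncertainty, assuming u and v are uncorrelated."""
    x = u / v
    # Relative uncertainties add in quadrature for a quotient.
    relative = math.sqrt((sigma_u / u) ** 2 + (sigma_v / v) ** 2)
    return x, abs(x) * relative

# Hypothetical treated and control mean areas, each with a 10% uncertainty.
ratio, sigma_ratio = ratio_with_uncertainty(60.0, 6.0, 100.0, 10.0)
print(ratio, sigma_ratio)
```

Because the relative uncertainties combine, the ratio always carries a larger relative uncertainty than either mean alone, which is consistent with this treatment tending to obscure differences that are visible in the original dimensional data.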
For computations of ratios of treated means to control means, 95% confidence limits were substituted for sample variances in (2.9). This was done to facilitate comparison of 95% confidence limits of ratios computed from differently-sized samples. (Variations in sample size arose from variable gametophyte counts obtained in a fixed number of scans with the digital image analysis system.) Treatment of data by this method resulted in typical 95% confidence limits on ratios of ± 20%; e.g., the ratio of a toxicant mean area to control mean area would be 0.60 ± 0.20. This data treatment tended to obscure significant differences in the original dimensional data and was only employed when comparison of data of unequal magnitudes from different experiments was necessary (i.e., the mean control size in one experiment was twice the size of the control mean in another experiment and comparisons of degree of inhibition of a certain concentration of toxicant were desired).
REFERENCES
Alder, H.L. and E.B. Roessler 1972. Introduction to Probability and Statistics. 5th edition. W.H. Freeman, Inc.: San Francisco, Calif. 373pp.
Anderson, E.K. and W.J. North 1972. Chapter 8 in Kelp Habitat Improvement Project Annual Report: 1971-72. W.M. Keck Engineering Laboratories, California Institute of Technology: Pasadena, California. 84pp.
Bevington, P.R. 1969. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill Book Company: New York. 336pp.
Bray, R. 1988. Random-T: Non-parametric sampled randomization program for comparison of means of two samples. California State University: Long Beach. Magnetic media.
Kuwahara, J. 1982. Evidence for cobalt and manganese deficiency in southern California offshore seawater. Science 216:1221-1225.
Rohlf, F.J. and R.R. Sokal 1981. Statistical Tables. 2nd edition. W.H. Freeman Press: San Francisco, Calif. 219pp.
Sokal, R.R. and F.J. Rohlf 1981. Biometry. 2nd edition. W.H. Freeman Press: San Francisco, Calif. 859pp.
STSC, Inc. 1987. Statgraphics: version 2.6. STSC Inc.: Rockville, Maryland. Magnetic media.