
Background

From the document by Thomas W. MacFarland and Jan M. Yates (pages 89-92)

Keywords: Barplot, Boxplot (box-and-whiskers plot), Boxplot statistics, Data exploration, Density plot, Descriptive statistics, Dotchart, Histogram, Interquartile range (IQR), Length, Maximum, Maximum location, Mean, Measures of central tendency, Median, Minimum, Minimum location, Mode, Quantile-quantile (Q-Q, QQ) plot, Quartiles, Range, Scatter plot, Sort, Standard deviation, Stripchart, Sum, Summary, Trimmed mean, Tukey’s five number summary, Variance, Violin plot, Winsor mean

[…] the school nurse is naturally concerned with overall trends as well as with individual measures. What was the average weight? What was the lowest weight? What was the highest weight? What was the variance in weight? Were there any trends that need attention, either for immediate purposes or in the future? With proper analysis, this information could be used, in part, as the basis for informed decision-making on wellness, food selections in the cafeteria, recommendations for physical education classes and individual exercise programs, etc.

Returning to the purpose of this lesson, follow along with a small sample of only 61 subjects to see how data exploration, descriptive statistics, and measures of central tendency have value on their own and also serve as indicators for the selection of other statistical tests. When examining data, whether within a single group or between and among groups, it is often useful to start with a general view of the data. With this in mind, consider the data associated with this lesson.¹,² It would be useful to know:

• How many students were enrolled in the class and are eligible to have their weights measured?

• How many students had their weights measured?

• How many students were not weighed, either because they were unavailable at the time of data collection or because they or their parents declined participation in the study?

• Are the data representative of the overall population? Are the data representative of all Grade 12 students at this school, of all Grade 12 students in the general community, etc.?

• What is the average weight and are there multiple definitions of the term average? If there are multiple definitions for the term average, when is it appropriate to use one view of the term average (e.g., mean, median, or mode) but not the other(s)?

• Did most weights cluster around the average weight or was there a wide degree of variance (e.g., spread, dispersion, etc.) in weights?

• Were there any weights that seem exceptionally out of range (i.e., outliers), demanding specific attention?

• Were there any weights that seem illogical, perhaps due to accidental data entry of alphabetical characters, perhaps due to wildly unexpected values, or perhaps due to similar errors in the construction of an object variable that has otherwise been declared as a vector of numeric values?

• What was the range of weights, from the lowest (i.e., minimum) weight to the highest (i.e., maximum) weight?

• Do the weights display a normal distribution, approximating a bell-shaped curve, or is the distribution skewed and, if so, how? Are weights skewed to the left or to the right?

¹ When reviewing these measures of central tendency, know that with a perfect (i.e., theoretical) distribution of values, showing what is commonly called a bell-shaped curve, all three measures of average (mode, median, and mean) would be equivalent, but this level of theoretical perfection is rarely, if ever, achieved.

² In advance, note an oddity of R: the base mode() function has nothing to do with measures of central tendency (it reports the type or storage mode of an object, such as "numeric"), but functions in external packages do provide the mode as a measure of average.
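Several of the questions above (how many values were recorded, whether any entries are illogical, the range, possible outliers, and the direction of skew) can be checked with base R before any formal analysis. The sketch below is illustrative only: the values in raw_weights are invented and are not the lesson's 61 observations, and the object names are hypothetical. Outliers are flagged with boxplot.stats(), which applies the same 1.5 × IQR rule that boxplot() uses; comparing mean and median is only a rough hint about skew, not a test.

```r
# Hypothetical weights entered as text, as a hand-keyed column might arrive.
# These values are invented for illustration and are NOT the lesson's data.
raw_weights <- c("142", "155", "1x8", "163", "", "171", "149", "310", "158")

# Coerce to numeric; entries that are not valid numbers become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
weights <- suppressWarnings(as.numeric(raw_weights))

n_recorded <- length(weights)          # entries in the vector
n_missing  <- sum(is.na(weights))      # blank or non-numeric entries
n_valid    <- n_recorded - n_missing   # subjects with a usable weight

# Which raw entries failed the numeric check?
bad_entries <- raw_weights[is.na(weights)]

# Lowest and highest valid weights, ignoring NA
weight_range <- range(weights, na.rm = TRUE)

# Candidate outliers: values beyond 1.5 * IQR from the hinges
# (boxplot.stats() drops NA values on its own)
outliers <- boxplot.stats(weights)$out

# Rough skew hint: a mean well above the median suggests right skew
right_skew_hint <- mean(weights, na.rm = TRUE) > median(weights, na.rm = TRUE)

cat("Recorded:", n_recorded, " Missing/invalid:", n_missing, " Valid:", n_valid, "\n")
cat("Invalid entries:", bad_entries, "\n")
cat("Range:", weight_range, "\n")
cat("Possible outliers:", outliers, "\n")
cat("Mean > median (possible right skew):", right_skew_hint, "\n")
```

With these invented values, the entries "1x8" and "" are reported as invalid, and 310 is flagged as a possible outlier that deserves a second look before it is either corrected or retained.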

The following listing identifies a general series of descriptive statistics that are often the first focus for data exploration:

• Measures of central tendency, or representation of the average:

– Mode: most frequent measure.

– Median: mid-point of an array of measures sorted from lowest to highest.

– Mean: arithmetic average (Sum/N).

• Measures of dispersion, spread, or values away from the average:

– Variance: the sum of squared deviations from the mean, divided by the number of measures minus one (N − 1) for a sample.

– SD or sd: the standard deviation or the square root of the variance.

– Range: the spread from the lowest measure to the highest measure.
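Each statistic in the list above has a direct counterpart in base R, except the mode, because base R's mode() reports an object's storage mode rather than the most frequent value. The sketch below assumes a small, already-cleaned numeric vector of invented weights (not the lesson's data) and computes the mode from a frequency table with table(); the stat_* names are hypothetical. Note that var() and sd() use the N − 1 (sample) denominator.

```r
# Hypothetical, already-cleaned weights (invented; not the lesson's data)
weights <- c(142, 149, 155, 158, 158, 163, 171, 149, 158, 166)

# Base R's mode() reports the storage mode, not the most frequent value
mode(weights)

# Mode: most frequent value(s), taken from a frequency table
freq <- table(weights)
stat_mode <- as.numeric(names(freq)[freq == max(freq)])

# Median and mean
stat_median <- median(weights)
stat_mean   <- mean(weights)           # same as sum(weights) / length(weights)

# Dispersion
stat_var    <- var(weights)            # sum of squared deviations / (N - 1)
stat_sd     <- sd(weights)             # square root of the variance
stat_range  <- range(weights)          # c(minimum, maximum)
stat_spread <- diff(stat_range)        # maximum - minimum as one number

# Quick overviews
summary(weights)                       # min, quartiles, median, mean, max
fivenum(weights)                       # Tukey's five-number summary
```

If two or more values tie for the highest frequency, stat_mode returns all of them, which makes a multimodal distribution visible instead of silently picking one value.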

It is common to present these statistics early in the research process to give the reader a general view of the data. It is also highly desirable to provide graphical figures of these statistics that visually represent trends.

This lesson has been designed to demonstrate how R can be used to explore data and, from this initial inquiry, to provide descriptive statistics and measures of central tendency. The emphasis is on functions in the packages included with a standard R installation, with a brief demonstration of functions from external R packages.

Complementary graphical representations are also provided for further insight into the data.
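Several graphics named in this lesson's keywords are available in base R. The sketch below is a minimal illustration, assuming an invented vector of weights (not the lesson's data), and draws a histogram, boxplot, density plot, and normal Q-Q plot in one 2 × 2 panel. Writing to a temporary PDF file through pdf() is an assumption made so the sketch also runs without a screen; interactively, the pdf() and dev.off() calls can be dropped.

```r
# Hypothetical weights (invented; not the lesson's data), with one extreme value
weights <- c(142, 149, 149, 155, 158, 158, 158, 163, 166, 171, 310)

# Send the figures to a temporary PDF so the sketch runs non-interactively
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

op <- par(mfrow = c(2, 2))             # 2 x 2 grid of panels

hist(weights, main = "Histogram", xlab = "Weight")
boxplot(weights, main = "Boxplot", ylab = "Weight")   # 310 is drawn past the whisker
plot(density(weights), main = "Density plot", xlab = "Weight")
qqnorm(weights, main = "Normal Q-Q plot")
qqline(weights)                        # reference line through the quartiles

par(op)                                # restore the previous panel layout
invisible(dev.off())                   # close the PDF device
```

Points that bend away from the qqline() reference line in the upper right, together with a long right tail in the histogram and density plot, are the visual counterpart of the right-skew question raised earlier.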

Accordingly, this lesson should provide a fairly detailed introduction to the practice of data exploration using R and, from this practice, generate a full set of descriptive statistics and measures of central tendency. This topic is of special importance because nearly every statistical analysis, whether for parametric data (e.g., the use of interval or ratio data for tests such as Student’s t-Test for Independent Samples, Oneway Analysis of Variance, etc.) or for nonparametric data (e.g., the use of nominal and ordinal data for tests such as the Mann–Whitney U Test, Kruskal–Wallis Analysis of Variance, etc.), begins with data exploration, descriptive statistics, and measures of central tendency.

2.1.2 Null Hypothesis

There are no inferential analyses associated with this lesson. Therefore, a Null Hypothesis is not provided. All analyses are descriptive (e.g., describe the data), not inferential (e.g., allow an inference or judgment about differences between groups, association between object variables, etc.).

2.2 Import Data in Comma-Separated Values (.csv) File
