
Scatter Plots

From the document Statistics for Business and Economics (pages 48-52)

In Section 1.3 we discussed graphs (bar chart, pie chart, Pareto diagram) to describe a single categorical variable, and we also discussed graphs (component bar chart and cluster bar chart) to describe the relationship between two categorical variables. In this section we presented histograms, ogives, and stem-and-leaf displays as graphs to describe a single numerical variable. We now extend graphical measures to include a scatter plot, which is a graph used to investigate possible relationships between two numerical variables.

Business and economic analyses are often concerned about relationships between variables. What is the effect of advertising on total profits? What is the change in quantity sold as the result of a change in price? How are total sales influenced by total disposable income in a geographic region? What is the change in infant mortality in developing countries as per capita income increases? How does one asset perform in relation to another asset? Do higher SAT mathematics scores predict higher college GPAs?

In these examples we notice that one variable may depend to a certain extent on the other variable. For example, the quantity of an item sold may depend on the price of the commodity. We then call the quantity sold the dependent variable and label it Y. We call the price of the commodity the independent variable and label it X.

To answer these questions, we gather and analyze random samples of data collected from relevant populations. A picture often provides insight as to the relationship that may exist between two variables. Our analysis begins with constructing a graph called a scatter plot (or scatter diagram). A more extensive study of possible relationships between numerical variables is considered in Chapters 11–13.

Example 1.11 Grades on an Accounting Final Exam (Stem-and-Leaf Display)

Describe the following random sample of 10 final exam grades for an introductory accounting class with a stem-and-leaf display.

88 51 63 85 79 65 79 70 73 77

Solution In constructing a stem-and-leaf display, each final exam grade is separated into two parts. For example, the grade of 63 is separated as 6|3, where 6 is called a stem; it appears on the left side of the straight line. The number 3 is called a leaf and appears on the right side of the straight line. From Figure 1.17 we see that the lowest grade was 51, the highest grade was 88, and most of the students in the sample earned a grade of C on the accounting final exam.

Figure 1.17 Accounting Final-exam Grades (Stem-and-Leaf Display)

Stem-and-Leaf Display
n = 10

Stem | Leaves
5 | 1
6 | 3 5
7 | 0 3 7 9 9
8 | 5 8
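As a minimal sketch of the construction described in the solution, the Python snippet below splits each grade into a stem (tens digit) and a leaf (units digit) and prints one row per stem. The function names `stem_and_leaf` and `format_display` are illustrative assumptions, not part of the textbook, and the code assumes nonnegative whole-number grades.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 63 -> stem 6, leaf 3
        groups[stem].append(leaf)
    return dict(groups)


def format_display(groups):
    """Render one 'stem | leaves' row per stem, including empty stems in between."""
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)


# The 10 final exam grades from Example 1.11.
grades = [88, 51, 63, 85, 79, 65, 79, 70, 73, 77]
display = stem_and_leaf(grades)
print(format_display(display))
```

Run on the grades from Example 1.11, the printed rows should match the stems and leaves shown in Figure 1.17, with the longest row at stem 7.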

48 Chapter 1 Using Graphs to Describe Data

Scatter Plot

We can prepare a scatter plot by locating one point for each pair of two variables that represent an observation in the data set. The scatter plot provides a picture of the data, including the following:

1. The range of each variable

2. The pattern of values over the range

3. A suggestion as to a possible relationship between the two variables

4. An indication of outliers (extreme points)

We could prepare scatter plots by plotting individual points on graph paper. However, all modern statistical packages contain routines for preparing scatter plots directly from an electronic data file. Construction of such a plot is a common task in any initial data analysis that occurs at the beginning of an economic or business study. In Example 1.12 we illustrate a scatter plot of two numerical variables.

Example 1.12 Entrance Scores and College GPA (Scatter Plots)

Are SAT mathematics scores a good indicator of college success? All of us have taken one or more academic aptitude tests as part of a college admission procedure. The admissions staff at your college used the results of these tests to determine your admission status.

Table 1.9 gives the SAT math scores from a test given before admission to college and the GPAs at college graduation for a random sample of 11 students at one small private university in the Midwest. Construct a scatter plot and determine what information it provides.

Table 1.9 SAT Math Versus GPA

SAT MATH GPA

450 3.25

480 2.60

500 2.88

520 2.85

560 3.30

580 3.10

590 3.35

600 3.20

620 3.50

650 3.59

700 3.95

Solution Using Excel, we obtain Figure 1.18, a scatter plot of the dependent variable, college GPA, and the independent variable, SAT math score.

We can make several observations from examining the scatter plot in Figure 1.18.

GPAs range from around 2.5 to 4, and SAT math scores range from 450 to 700. An interesting pattern is the positive upward trend: GPA scores tend to increase directly with increases in SAT math scores. Note also that the relationship does not provide an exact prediction. Some students with low SAT math scores have higher GPA scores than do students with higher SAT math scores. We see that the basic pattern appears to indicate that higher entrance scores predict higher grade point averages, but the results are not perfect.
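These observations can also be checked numerically. The following Python sketch, which is not part of the textbook's Excel solution, computes the range of each variable in Table 1.9 and lists the pairs of students in which the lower SAT math score goes with the higher GPA. The helper names (`value_range`, `reversed_pairs`, `show_scatter`) are hypothetical, and `show_scatter` assumes that the third-party matplotlib package is installed.

```python
# Data from Table 1.9: SAT math score (independent, X) and college GPA (dependent, Y).
sat_math = [450, 480, 500, 520, 560, 580, 590, 600, 620, 650, 700]
gpa = [3.25, 2.60, 2.88, 2.85, 3.30, 3.10, 3.35, 3.20, 3.50, 3.59, 3.95]


def value_range(values):
    """Return the (smallest, largest) value of one variable."""
    return min(values), max(values)


def reversed_pairs(x, y):
    """Return pairs of points where the lower x value has the higher y value."""
    points = sorted(zip(x, y))
    return [
        (low, high)
        for i, low in enumerate(points)
        for high in points[i + 1:]
        if low[0] < high[0] and low[1] > high[1]
    ]


def show_scatter(x, y):
    """Draw the scatter plot; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("SAT Math")  # independent variable on the horizontal axis
    ax.set_ylabel("GPA")       # dependent variable on the vertical axis
    ax.set_title("GPA vs. SAT Math Scores")
    plt.show()


print("SAT math range:", value_range(sat_math))
print("GPA range:", value_range(gpa))
print("Pairs that run against the upward trend:", len(reversed_pairs(sat_math, gpa)))
```

Calling `show_scatter(sat_math, gpa)` should draw a plot comparable to Figure 1.18. The reversed pairs are the numerical counterpart of the remark that the relationship does not provide an exact prediction.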

EXERCISES

Visit www.mymathlab.com/global or www.pearsonglobaleditions.com/newbold to access the data files.

Basic Exercises

1.30 Use the Quick Guide to find an approximate number of classes for a frequency distribution for each sample size.

a. n = 47 b. n = 80 c. n = 150 d. n = 400 e. n = 650

1.31 Determine an appropriate interval width for a random sample of 110 observations that fall between and include each of the following:

a. 20 to 85 b. 30 to 190 c. 40 to 230 d. 140 to 500

1.32 Consider the following data:

17 62 15 65

28 51 24 65

39 41 35 15

39 32 36 37

40 21 44 37

59 13 44 56

12 54 64 59

a. Construct a frequency distribution.

b. Construct a histogram.

c. Construct an ogive.

d. Construct a stem-and-leaf display.

1.33 Construct a stem-and-leaf display for the hours that 20 students spent studying for a marketing test.

3.5 2.8 4.5 6.2 4.8 2.3 2.6 3.9 4.4 5.5 5.2 6.7 3.0 2.4 5.0 3.6 2.9 1.0 2.8 3.6

1.34 Consider the following frequency distribution:

Class Frequency

0 < 10 8

10 < 20 10

20 < 30 13

30 < 40 12

40 < 50 6

a. Construct a relative frequency distribution.

b. Construct a cumulative frequency distribution.

c. Construct a cumulative relative frequency distribution.

1.35 Prepare a scatter plot of the following data:

(5, 53) (21, 65) (14, 48) (11, 66) (9, 46) (4, 56) (7, 53) (21, 57) (17, 49) (14, 66) (9, 54) (7, 56) (9, 53) (21, 52) (13, 49) (14, 56) (9, 59) (4, 56)

Application Exercises

1.36 The following table shows the ages of competitors in a charity tennis event in Rome:

Age Percent

18–24 18.26

25–34 16.25

35–44 25.88

45–54 19.26

55+ 20.35

a. Construct a relative cumulative frequency distribution.

b. What percent of competitors were under the age of 35?

c. What percent of competitors were 45 or older?

Figure 1.18 GPA vs. SAT Math Scores (Scatter Plot) [vertical axis: GPA, 2.50 to 4.00; horizontal axis: SAT Math, 450 to 700]


1.37 The demand for bottled water increases during the hurricane season in Florida. The manager at a plant that bottles drinking water wants to be sure that the process to fill 1-gallon bottles (approximately 3.785 liters) is operating properly. Currently, the company is testing the volumes of 1-gallon bottles. A random sample of 75 bottles is tested. Study the filling process for this product and submit a report of your findings to the operations manager. Construct a frequency distribution, cumulative frequency distribution, histogram, and a stem-and-leaf display. Incorporate these graphs into a well-written summary. How could we apply statistical thinking in this situation? The data are stored in the data file Water.

1.38 Percentage returns for the 25 largest U.S. common stock mutual funds for a particular day are stored in the data file Returns.

a. Construct a histogram to describe the data.

b. Draw a stem-and-leaf display to describe the data.

1.39 Ann Thorne, the operations manager at a suntan lotion manufacturing plant, wants to be sure that the filling process for 8-oz (237 mL) bottles of SunProtector is operating properly. Suppose that a random sample of 100 bottles of this lotion is selected, the contents are measured, and the volumes (in mL) are stored in the data file Sun. Describe the data graphically.

1.40 A company sets different prices for a particular DVD system in eight different regions of the country. The accompanying table shows the numbers of units sold and the corresponding prices (in dollars).

Plot the data using a scatter plot with sales as the dependent variable and price as the independent variable.

Sales 420 380 350 400 440 380 450 420

Price 104 195 148 204 96 256 141 109

1.41 A corporation administers an aptitude test to all new sales representatives. Management is interested in the possible relationship between test scores and the sales representatives’ eventual success. The accompanying table records average weekly sales (in thousands of dollars) and aptitude test scores for a random sample of eight representatives. Construct a scatter plot with weekly sales as the dependent variable and test scores as the independent variable.

Weekly sales 10 12 28 24 18 16 15 12

Test score 55 60 85 75 80 85 65 60

1.42 Doctors are interested in the possible relationship between the dosage of a medicine and the time required for a patient’s recovery. The following table shows, for a sample of 10 patients, dosage levels (in grams) and recovery times (in hours). These patients have similar characteristics except for medicine dosages. Describe the data graphically with a scatter plot.

Dosage level 1.2 1.3 1.0 1.4 1.5 1.8 1.2 1.3 1.4 1.3

Recovery time 25 28 40 38 10 9 27 30 16 18

1.43 Bishop’s supermarket records the actual price for consumer food products and the weekly quantities sold. Use the data file Bishop to obtain the scatter plot for the actual price of a gallon of orange juice and the weekly quantities sold at that price. Does the scatter plot follow the pattern from economic theory?

1.44 A Hong Kong snack-food vendor offers 3 types of boxed “lunches to go,” priced at $3, $5, and $10, respectively. The vendor would like to establish whether there is a relationship between the price of the boxed lunch and the number of sales achieved per hour. Consequently, over a 15-day period the vendor records the number of sales made for each of the 3 types of boxed lunches. The following data show the boxed-lunch price (x) and the number sold (y) during each of the 15 lunch hours.

(3, 7) (5, 5) (10, 2) (3, 9) (5, 6) (10, 5) (3, 6) (5, 6) (10, 1) (3, 10) (5, 7) (10, 4) (3, 5) (5, 6) (10, 4)

Prepare a scatter plot of the points and comment on the relationship between the price of the boxed lunches and the numbers sold each lunchtime.

1.45 Sales revenue totals (in dollars) by day of the week are contained in the data file Stordata.

Prepare a cross table that contains the days of the week as rows and the four sales quartile intervals as columns.

a. Compute the row percentages.

b. What are the major differences in sales level by day of the week as indicated by the row percentages?

c. Describe the expected sales volume patterns over the week based on this table.

1.46 Many small cities make significant efforts to attract commercial operations such as shopping centers and large retail stores. One of the arguments is that these facilities will contribute to the property that can be taxed and thus provide additional funds for local government needs. The data stored in the data file Citydatr come from a study of municipal revenue-generation capability.

Prepare a scatter plot of “taxbase” (the assessed value of all city property in millions of dollars) versus “comper” (the percent of assessed property value that is commercial property). What information does this scatter plot provide about the assessable tax base and percent of commercial property in the city?

1.6 Data Presentation Errors

Poorly designed graphs can easily distort the truth. Used sensibly and carefully, graphs can be excellent tools for extracting the essential information from what would otherwise be a mere mass of numbers. Unfortunately, it is not invariably the case that an attempt at data summarization is carried out either sensibly or carefully. In such circumstances one can easily be misled by the manner in which the summary is presented. We must draw from data as clear and accurate a picture as possible. Improper graphs can produce a distorted picture, yielding a false impression. It is possible to convey the wrong message without being deliberately dishonest.

Accurate graphic design is essential in today’s global markets. Cultural biases may influence the way people view charts. For example, in Western cultures people read from left to right and will automatically do so when reading bar charts or time-series plots. In this situation, you should aim to place your most important information on the right-hand side of the chart. Charts and graphs must be persuasive, clear, and truthful.

In this section we present some examples of misleading graphs, the intent being not to encourage their use but to caution against their dangers. Example 1.13 shows that distortions in histograms can lead to incorrect conclusions. Example 1.14 illustrates that different choices for the vertical axis in time-series plots can lead to different conclusions.
