CHAPTER 2
2.1 Frequency Distributions and Their Graphs
2.2 More Graphs and Displays
2.3 Measures of Central Tendency
2.4 Measures of Variation
Case Study
2.5 Measures of Position
Uses and Abuses
Real Statistics–Real Decisions
Technology
Akhiok is a small fishing village on Kodiak Island. Akhiok has a population of 80 residents. Photographs © Roy Corral
Descriptive Statistics
Where You’ve Been
In Chapter 1, you learned that there are many ways to collect data. Usually, researchers must work with sample data in order to analyze populations, but occasionally it is possible to collect all the data for a given population. For instance, the following represents the ages of the entire population of the 80 residents of Akhiok, Alaska, from the 2000 census.

25, 5, 18, 12, 60, 44, 24, 22, 2, 7, 15, 39, 58, 53, 36, 42, 16, 20, 1, 5,
39, 51, 44, 23, 3, 13, 37, 56, 58, 13, 47, 23, 1, 17, 39, 13, 24, 0, 39, 10,
41, 1, 48, 17, 18, 3, 72, 20, 3, 9, 0, 12, 33, 21, 40, 68, 25, 40, 59, 4,
67, 29, 13, 18, 19, 13, 16, 41, 19, 26, 68, 49, 5, 26, 49, 26, 45, 41, 19, 49
Where You’re Going
In Chapter 2, you will learn ways to organize and describe data sets. The goal is to make the data easier to understand by describing trends, averages, and variations. For instance, in the raw data showing the ages of the residents of Akhiok, it is not easy to see any patterns or special characteristics. Here are some ways you can organize and describe the data.

Make a frequency distribution table.

Class     Frequency, f
0–9       15
10–19     19
20–29     14
30–39      7
40–49     14
50–59      6
60–69      4
70–79      1

Draw a histogram.

[Figure: frequency histogram of the Akhiok ages. Horizontal axis: Age, marked at the class midpoints 4.5 to 74.5; vertical axis: Frequency, 2 to 20.]

Find an average.

$$\text{Mean} = \frac{0 + 0 + 1 + 1 + 1 + \cdots + 67 + 68 + 68 + 72}{80} = \frac{2226}{80} \approx 27.8 \text{ years}$$

Find how the data vary.

$$\text{Range} = 72 - 0 = 72 \text{ years}$$
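As a quick check on the average and the range quoted in the chapter opener, the following Python sketch recomputes both from the 80 ages. The variable names (`ages`, `mean_age`, `age_range`) are illustrative and are not part of the text.

```python
# Ages of the 80 residents of Akhiok, Alaska (2000 census), as listed in the chapter opener.
ages = [
    25, 5, 18, 12, 60, 44, 24, 22, 2, 7, 15, 39, 58, 53, 36, 42, 16, 20, 1, 5,
    39, 51, 44, 23, 3, 13, 37, 56, 58, 13, 47, 23, 1, 17, 39, 13, 24, 0, 39, 10,
    41, 1, 48, 17, 18, 3, 72, 20, 3, 9, 0, 12, 33, 21, 40, 68, 25, 40, 59, 4,
    67, 29, 13, 18, 19, 13, 16, 41, 19, 26, 68, 49, 5, 26, 49, 26, 45, 41, 19, 49,
]

n = len(ages)                       # number of residents
mean_age = sum(ages) / n            # sum of all ages divided by the count
age_range = max(ages) - min(ages)   # maximum entry minus minimum entry

print(n, sum(ages), round(mean_age, 1), age_range)  # prints: 80 2226 27.8 72
```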
2.1 Frequency Distributions and Their Graphs
What You Should Learn
• How to construct a frequency distribution including limits, boundaries, midpoints, relative frequencies, and cumulative frequencies
• How to construct frequency histograms, frequency polygons, relative frequency histograms, and ogives
Example of a Frequency Distribution

Class     Frequency, f
1–5       5
6–10      8
11–15     6
16–20     8
21–25     5
26–30     4
Frequency Distributions • Graphs of Frequency Distributions
Frequency Distributions
When a data set has many entries, it can be difficult to see patterns. In this section, you will learn how to organize data sets by grouping the data into intervals called classes and forming a frequency distribution. You will also learn how to use frequency distributions to construct graphs.
DEFINITION
A frequency distribution is a table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

In the frequency distribution shown there are six classes. The frequencies for each of the six classes are 5, 8, 6, 8, 5, and 4. Each class has a lower class limit, which is the least number that can belong to the class, and an upper class limit, which is the greatest number that can belong to the class. In the frequency distribution shown, the lower class limits are 1, 6, 11, 16, 21, and 26, and the upper class limits are 5, 10, 15, 20, 25, and 30. The class width is the distance between lower (or upper) limits of consecutive classes. For instance, the class width in the frequency distribution shown is 6 - 1 = 5.

The difference between the maximum and minimum data entries is called the range. For instance, if the maximum data entry is 29, and the minimum data entry is 1, the range is 29 - 1 = 28. You will learn more about the range in Section 2.4.

Guidelines for constructing a frequency distribution from a data set are as follows.
GUIDELINES
Constructing a Frequency Distribution from a Data Set
1. Decide on the number of classes to include in the frequency distribution. The number of classes should be between 5 and 20; otherwise, it may be difficult to detect any patterns.
2. Find the class width as follows. Determine the range of the data, divide the range by the number of classes, and round up to the next convenient number.
3. Find the class limits. You can use the minimum data entry as the lower limit of the first class. To find the remaining lower limits, add the class width to the lower limit of the preceding class. Then find the upper limit of the first class. Remember that classes cannot overlap. Find the remaining upper class limits.
4. Make a tally mark for each data entry in the row of the appropriate class.
5. Count the tally marks to find the total frequency f for each class.

Study Tip
In a frequency distribution, it is best if each class has the same width. Answers shown will use the minimum data value for the lower limit of the first class. Sometimes it may be more convenient to choose a value that is slightly lower than the minimum value. The frequency distribution produced will vary slightly.
Note to Instructor
Let students know that there are many correct versions for a frequency distribution. To make it easy to check answers, however, they should follow the conventions shown in the text.
Insight
If you obtain a whole number when calculating the class width of a frequency distribution, use the next whole number as the class width. Doing this ensures that you have enough space in your frequency distribution for all the data values.
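The class-width rule in step 2, together with the Insight about whole-number quotients, can be sketched in Python as follows. This is a minimal sketch that assumes integer data and takes "the next convenient number" to mean the next whole number; the function name `class_width` is hypothetical.

```python
def class_width(data, num_classes):
    """Class width for integer data: range / number of classes, rounded up.

    Floor division plus one rounds a fractional quotient up, and it also moves
    an exact whole-number quotient to the next whole number, as the Insight
    recommends.
    """
    data_range = max(data) - min(data)
    return data_range // num_classes + 1


# Only the minimum and maximum matter for the width.
print(class_width([7, 88], 7))   # range 81, 81/7 is about 11.57, so the width is 12
print(class_width([0, 72], 8))   # range 72, 72/8 = 9 exactly, so the width is 10
```

The second call uses the extremes of the Akhiok ages; a width of 10 agrees with the classes 0–9, 10–19, and so on shown in the chapter opener.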
EXAMPLE 1
Constructing a Frequency Distribution from a Data Set
The following sample data set lists the number of minutes 50 Internet subscribers spent on the Internet during their most recent session. Construct a frequency distribution that has seven classes.

50 40 41 17 11  7 22 44 28 21
19 23 37 51 54 42 88 41 78 56
72 56 17  7 69 30 80 56 29 33
46 31 39 20 18 29 34 59 73 77
36 39 30 62 54 67 39 31 53 44

SOLUTION
1. The number of classes (7) is stated in the problem.
2. The minimum data entry is 7 and the maximum data entry is 88, so the range is 81. Divide the range by the number of classes and round up to find that the class width is 12.

$$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{\text{Maximum entry} - \text{Minimum entry}}{\text{Number of classes}} = \frac{88 - 7}{7} = \frac{81}{7} \approx 11.57$$

Round up to 12.

3. The minimum data entry is a convenient lower limit for the first class. To find the lower limits of the remaining six classes, add the class width of 12 to the lower limit of each previous class. The upper limit of the first class is 18, which is one less than the lower limit of the second class. The upper limits of the other classes are 18 + 12 = 30, 30 + 12 = 42, and so on. The lower and upper limits for all seven classes are shown.

Lower limit   Upper limit
 7            18
19            30
31            42
43            54
55            66
67            78
79            90

4. Make a tally mark for each data entry in the appropriate class.
5. The number of tally marks for a class is the frequency for that class.

The frequency distribution is shown in the following table. The first class, 7–18, has six tally marks. So, the frequency for this class is 6. Notice that the sum of the frequencies is 50, which is the number of entries in the sample data set. The sum is denoted by Σf, where Σ is the uppercase Greek letter sigma.

Study Tip
The uppercase Greek letter sigma (Σ) is used throughout statistics to indicate a summation of values.

Note to Instructor
Be sure that students interpret the class width correctly as the distance between lower (or upper) limits of consecutive classes. A common error is to use a class width of 11 for the class 7–18. Students should be shown that this class actually has a width of 12.
Frequency Distribution for Internet Usage (in minutes)

Class              Tally               Frequency, f
(minutes online)                       (number of subscribers)
7–18               ||||| |              6
19–30              ||||| |||||         10
31–42              ||||| ||||| |||     13
43–54              ||||| |||            8
55–66              |||||                5
67–78              ||||| |              6
79–90              ||                   2
                                       Σf = 50

Check that the sum of the frequencies equals the number in the sample.
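The five steps of Example 1 can also be traced in a few lines of Python. This is a sketch under the same assumptions as the example (integer data, the minimum entry as the first lower limit); the variable names are illustrative only.

```python
# Minutes online for the 50 Internet subscribers in Example 1.
minutes = [
    50, 40, 41, 17, 11, 7, 22, 44, 28, 21,
    19, 23, 37, 51, 54, 42, 88, 41, 78, 56,
    72, 56, 17, 7, 69, 30, 80, 56, 29, 33,
    46, 31, 39, 20, 18, 29, 34, 59, 73, 77,
    36, 39, 30, 62, 54, 67, 39, 31, 53, 44,
]

num_classes = 7                                             # step 1
width = (max(minutes) - min(minutes)) // num_classes + 1    # step 2: 81 / 7 rounded up to 12

# Step 3: lower limits start at the minimum entry; each upper limit is one
# less than the next lower limit, so the classes do not overlap.
lower_limits = [min(minutes) + i * width for i in range(num_classes)]
upper_limits = [low + width - 1 for low in lower_limits]

# Steps 4 and 5: count the entries that fall in each class.
frequencies = [
    sum(low <= x <= high for x in minutes)
    for low, high in zip(lower_limits, upper_limits)
]

for low, high, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{low}-{high}: {f}")
print("sum of f =", sum(frequencies))
```

The final line is the check described in the table: the frequencies should add up to the number of entries in the sample.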
Try It Yourself 1
Construct a frequency distribution using the Akhiok population data set listed in the Chapter Opener on page 33. Use eight classes.
a. State the number of classes.
b. Find the minimum and maximum values and the class width.
c. Find the class limits.
d. Tally the data entries.
e. Write the frequency f for each class.
Answer: Page A29
After constructing a standard frequency distribution such as the one in Example 1, you can include several additional features that will help provide a better understanding of the data. These features, the midpoint, relative frequency, and cumulative frequency of each class, can be included as additional columns in your table.
DEFINITION
The midpoint of a class is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called the class mark.

$$\text{Midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$$

The relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$$

The cumulative frequency of a class is the sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.

After finding the first midpoint, you can find the remaining midpoints by adding the class width to the previous midpoint. For instance, if the first midpoint is 12.5 and the class width is 12, then the remaining midpoints are

12.5 + 12 = 24.5
24.5 + 12 = 36.5
36.5 + 12 = 48.5
48.5 + 12 = 60.5

and so on. You can write the relative frequency as a fraction, decimal, or percent. The sum of the relative frequencies of all the classes must equal 1 or 100%.
EXAMPLE 2
Midpoints, Relative and Cumulative Frequencies
Using the frequency distribution constructed in Example 1, find the midpoint, relative frequency, and cumulative frequency for each class. Identify any patterns.

SOLUTION
The midpoint, relative frequency, and cumulative frequency for the first three classes are calculated as follows.

Class    f    Midpoint              Relative frequency   Cumulative frequency
7–18      6   (7 + 18)/2 = 12.5     6/50 = 0.12          6
19–30    10   (19 + 30)/2 = 24.5    10/50 = 0.2          6 + 10 = 16
31–42    13   (31 + 42)/2 = 36.5    13/50 = 0.26         16 + 13 = 29

The remaining midpoints, relative frequencies, and cumulative frequencies are shown in the following expanded frequency distribution.
Frequency Distribution for Internet Usage (in minutes)

Class              Frequency, f              Midpoint   Relative frequency         Cumulative
(minutes online)   (number of subscribers)              (portion of subscribers)   frequency
7–18                6                        12.5       0.12                        6
19–30              10                        24.5       0.2                        16
31–42              13                        36.5       0.26                       29
43–54               8                        48.5       0.16                       37
55–66               5                        60.5       0.1                        42
67–78               6                        72.5       0.12                       48
79–90               2                        84.5       0.04                       50
                   Σf = 50                              Σ(f/n) = 1
Interpretation There are several patterns in the data set. For instance, the most common time span that users spent online was 31 to 42 minutes.
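The three extra columns of Example 2 follow directly from the class limits and frequencies. The sketch below recomputes them in Python; the limits and frequencies are copied from Example 1, and the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from Example 1.
lower_limits = [7, 19, 31, 43, 55, 67, 79]
upper_limits = [18, 30, 42, 54, 66, 78, 90]
frequencies = [6, 10, 13, 8, 5, 6, 2]
n = sum(frequencies)  # sample size, 50

# Midpoint = (lower class limit + upper class limit) / 2
midpoints = [(low + high) / 2 for low, high in zip(lower_limits, upper_limits)]

# Relative frequency = f / n
relative = [f / n for f in frequencies]

# Cumulative frequency = running total of the frequencies
cumulative = list(accumulate(frequencies))

for row in zip(lower_limits, upper_limits, frequencies, midpoints, relative, cumulative):
    low, high, f, mid, rel, cum = row
    print(f"{low}-{high}: f={f}, midpoint={mid}, relative={rel:.2f}, cumulative={cum}")
```

Two quick checks mirror the text: the last cumulative frequency equals the sample size, and the relative frequencies sum to 1 (up to floating-point rounding).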
Try It Yourself 2
Using the frequency distribution constructed in Try It Yourself 1, find the midpoint, relative frequency, and cumulative frequency for each class. Identify any patterns.
a. Use the formulas to find each midpoint, relative frequency, and cumulative frequency.
b. Organize your results in a frequency distribution.
c. Identify patterns that emerge from the data.
Answer: Page A29
Graphs of Frequency Distributions
Sometimes it is easier to identify patterns of a data set by looking at a graph of the frequency distribution. One such graph is a frequency histogram.
DEFINITION
A frequency histogram is a bar graph that represents the frequency distribution of a data set. A histogram has the following properties.
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars must touch.

Because consecutive bars of a histogram must touch, bars must begin and end at class boundaries instead of class limits. Class boundaries are the numbers that separate classes without forming gaps between them. You can mark the horizontal scale either at the midpoints or at the class boundaries, as shown in Example 3.

Study Tip
If data entries are integers, subtract 0.5 from each lower limit to find the lower class boundaries. To find the upper class boundaries, add 0.5 to each upper limit. The upper boundary of a class will equal the lower boundary of the next higher class.
EXAMPLE 3
Constructing a Frequency Histogram
Draw a frequency histogram for the frequency distribution in Example 2. Describe any patterns.

SOLUTION
First, find the class boundaries. The distance from the upper limit of the first class to the lower limit of the second class is 19 - 18 = 1. Half this distance is 0.5. So, the lower and upper boundaries of the first class are as follows:

First class lower boundary = 7 - 0.5 = 6.5
First class upper boundary = 18 + 0.5 = 18.5

The boundaries of the remaining classes are shown in the table.

Class    Class boundaries   Frequency, f
7–18     6.5–18.5            6
19–30    18.5–30.5          10
31–42    30.5–42.5          13
43–54    42.5–54.5           8
55–66    54.5–66.5           5
67–78    66.5–78.5           6
79–90    78.5–90.5           2

Using the class midpoints or class boundaries for the horizontal scale and choosing possible frequency values for the vertical scale, you can construct the histogram.

[Figure: two frequency histograms titled "Internet Usage", one labeled with class boundaries (6.5 to 90.5) and one labeled with class midpoints (12.5 to 84.5, with a broken axis). Horizontal axis: Time online (in minutes); vertical axis: Frequency (number of subscribers). Bar heights: 6, 10, 13, 8, 5, 6, 2.]
Interpretation From either histogram, you can see that more than half of the subscribers spent between 19 and 54 minutes on the Internet during their most recent session.
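The boundary calculation in Example 3 generalizes to every class: move each limit outward by half the gap between consecutive classes. The sketch below applies that rule and also checks the claim in the interpretation; the variable names are illustrative.

```python
# Class limits and frequencies from Example 2.
lower_limits = [7, 19, 31, 43, 55, 67, 79]
upper_limits = [18, 30, 42, 54, 66, 78, 90]
frequencies = [6, 10, 13, 8, 5, 6, 2]

# Gap between the upper limit of one class and the lower limit of the next.
gap = lower_limits[1] - upper_limits[0]   # 19 - 18 = 1
half_gap = gap / 2                        # 0.5

# Each class boundary pair is (lower limit - 0.5, upper limit + 0.5).
boundaries = [(low - half_gap, high + half_gap)
              for low, high in zip(lower_limits, upper_limits)]
print(boundaries)

# Share of subscribers in the classes 19-30, 31-42, and 43-54.
share_19_to_54 = sum(frequencies[1:4]) / sum(frequencies)
print(share_19_to_54)  # 31 of 50 subscribers
```

Because each upper boundary equals the next lower boundary, bars drawn between these values touch, which is the property the definition requires.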
Try It Yourself 3
Use the frequency distribution from Try It Yourself 1 to construct a frequency histogram that represents the ages of the residents of Akhiok. Describe any patterns.
a. Find the class boundaries.
b. Choose appropriate horizontal and vertical scales.
c. Use the frequency distribution to find the height of each bar.
d. Describe any patterns for the data.
Answer: Page A30
Another way to graph a frequency distribution is to use a frequency polygon. A frequency polygon is a line graph that emphasizes the continuous change in frequencies.
EXAMPLE 4
Constructing a Frequency Polygon
Draw a frequency polygon for the frequency distribution in Example 2.

SOLUTION
To construct the frequency polygon, use the same horizontal and vertical scales that were used in the histogram labeled with class midpoints in Example 3. Then plot points that represent the midpoint and frequency of each class and connect the points in order from left to right. Because the graph should begin and end on the horizontal axis, extend the left side to one class width before the first class midpoint and extend the right side to one class width after the last class midpoint.

[Figure: frequency polygon titled "Internet Usage". Horizontal axis: Time online (in minutes), marked 0.5, 12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5, 96.5; vertical axis: Frequency (number of subscribers), 2 to 14.]

Study Tip
A histogram and its corresponding frequency polygon are often drawn together. If you have not already constructed the histogram, begin constructing the frequency polygon by choosing appropriate horizontal and vertical scales. The horizontal scale should consist of the class midpoints, and the vertical scale should consist of appropriate frequency values.
Interpretation You can see that the frequency of subscribers increases up to 36.5 minutes and then decreases.
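The points of the frequency polygon in Example 4 can be listed explicitly. This sketch builds them from the midpoints and frequencies of Example 2 and adds the two zero-frequency end points one class width beyond the first and last midpoints; the names `points` and `peak` are illustrative.

```python
# Midpoints and frequencies from Example 2; the class width is 12.
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]
width = 12

# The polygon starts and ends on the horizontal axis.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)
print(points)

# Highest point of the polygon: frequencies rise up to it and then decrease.
peak = max(points, key=lambda point: point[1])
print(peak)  # (36.5, 13)
```

Connecting these points in order from left to right with a plotting tool of your choice reproduces the polygon.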
Try It Yourself 4
Use the frequency distribution from Try It Yourself 1 to construct a frequency polygon that represents the ages of the residents of Akhiok. Describe any patterns.
a. Choose appropriate horizontal and vertical scales.
b. Plot points that represent the midpoint and frequency for each class.
c. Connect the points and extend the sides as necessary.
d. Describe any patterns for the data.
Answer: Page A30
A relative frequency histogram has the same shape and the same horizontal scale as the corresponding frequency histogram. The difference is that the vertical scale measures the relative frequencies, not frequencies.
Picturing the World
Old Faithful, a geyser at Yellowstone National Park, erupts on a regular basis. The time spans of a sample of eruptions are given in the relative frequency histogram. (Source: Yellowstone National Park)

[Figure: relative frequency histogram titled "Old Faithful Eruptions". Horizontal axis: Duration of eruption (in minutes), marked 2.0, 2.6, 3.2, 3.8, 4.4; vertical axis: Relative frequency, 0.10 to 0.40.]

Fifty percent of the eruptions last less than how many minutes?

EXAMPLE 5
Constructing a Relative Frequency Histogram
Draw a relative frequency histogram for the frequency distribution in Example 2.

SOLUTION
The relative frequency histogram is shown. Notice that the shape of the histogram is the same as the shape of the frequency histogram constructed in Example 3. The only difference is that the vertical scale measures the relative frequencies.

[Figure: relative frequency histogram titled "Internet Usage". Horizontal axis: Time online (in minutes), marked at the class boundaries 6.5 to 90.5; vertical axis: Relative frequency (portion of subscribers), 0.04 to 0.28.]
Interpretation From this graph, you can quickly see that 0.20 or 20% of the Internet subscribers spent between 18.5 minutes and 30.5 minutes online, which is not as immediately obvious from the frequency histogram.
Try It Yourself 5
Use the frequency distribution from Try It Yourself 1 to construct a relative frequency histogram that represents the ages of the residents of Akhiok.
a. Use the same horizontal scale as used in the frequency histogram.
b. Revise the vertical scale to reflect relative frequencies.
c. Use the relative frequencies to find the height of each bar.
Answer: Page A30
If you want to describe the number of data entries that are equal to or below a certain value, you can easily do so by constructing a cumulative frequency graph.
DEFINITION
A cumulative frequency graph, or ogive (pronounced ō′jīve), is a line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis.
GUIDELINES
Constructing an Ogive (Cumulative Frequency Graph)
1. Construct a frequency distribution that includes cumulative frequencies as one of the columns.
2. Specify the horizontal and vertical scales. The horizontal scale consists of upper class boundaries, and the vertical scale measures cumulative frequencies.
3. Plot points that represent the upper class boundaries and their corresponding cumulative frequencies.
4. Connect the points in order from left to right.
5. The graph should start at the lower boundary of the first class (cumulative frequency is zero) and should end at the upper boundary of the last class (cumulative frequency is equal to the sample size).
EXAMPLE 6
Constructing an Ogive
Draw an ogive for the frequency distribution in Example 2. Estimate how many subscribers spent 60 minutes or less online during their last session. Also, use the graph to estimate when the greatest increase in usage occurs.

SOLUTION
Using the frequency distribution, you can construct the ogive shown. The upper class boundaries, frequencies, and cumulative frequencies are shown in the table. Notice that the graph starts at 6.5, where the cumulative frequency is 0, and the graph ends at 90.5, where the cumulative frequency is 50.

Upper class boundary   f    Cumulative frequency
18.5                    6    6
30.5                   10   16
42.5                   13   29
54.5                    8   37
66.5                    5   42
78.5                    6   48
90.5                    2   50

[Figure: ogive titled "Internet Usage". Horizontal axis: Time online (in minutes), marked at the class boundaries 6.5 to 90.5; vertical axis: Cumulative frequency (number of subscribers), 10 to 50.]
Interpretation From the ogive, you can see that about 40 subscribers spent 60 minutes or less online during their last session. The greatest increase in usage occurs between 30.5 minutes and 42.5 minutes because the line segment is steepest between these two class boundaries.

Another type of ogive uses percent as the vertical axis instead of frequency (see Example 5 in Section 2.5).
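Reading a value off an ogive amounts to linear interpolation between the plotted points. The sketch below, with the hypothetical helper `ogive_value`, estimates the cumulative frequency at 60 minutes and finds the steepest segment. It assumes straight line segments between class boundaries, which is how the ogive is drawn, not how the raw data are necessarily distributed.

```python
# Ogive points from Example 6: class boundaries and cumulative frequencies.
boundaries = [6.5, 18.5, 30.5, 42.5, 54.5, 66.5, 78.5, 90.5]
cumulative = [0, 6, 16, 29, 37, 42, 48, 50]


def ogive_value(x):
    """Cumulative frequency at x, read from the ogive by linear interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        if x0 <= x <= x1:
            y0, y1 = cumulative[i], cumulative[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


estimate = ogive_value(60)
print(round(estimate, 1))  # 39.3, consistent with "about 40" read from the graph

# With equal class widths, the steepest segment is the one with the largest
# increase in cumulative frequency.
increases = [b - a for a, b in zip(cumulative, cumulative[1:])]
i = increases.index(max(increases))
steepest = (boundaries[i], boundaries[i + 1])
print(steepest)  # (30.5, 42.5)
```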
Try It Yourself 6
Use the frequency distribution from Try It Yourself 1 to construct an ogive that represents the ages of the residents of Akhiok. Estimate the number of residents who are 49 years old or younger.
a. Specify the horizontal and vertical scales.
b. Plot the points given by the upper class boundaries and the cumulative frequencies.
c. Construct the graph.
d. Estimate the number of residents who are 49 years old or younger.
Answer: Page A30
EXAMPLE 7
Using Technology to Construct Histograms
Use a calculator or a computer to construct a histogram for the frequency distribution in Example 2.

SOLUTION
MINITAB, Excel, and the TI-83 each have features for graphing histograms. Try using this technology to draw the histograms as shown.

[Figure: two technology-generated frequency histograms of the Internet usage data. Horizontal axis: Minutes, marked at the class midpoints 12.5 to 84.5; vertical axis: Frequency.]

Study Tip
Detailed instructions for using MINITAB, Excel, and the TI-83 are shown in the Technology Guide that accompanies this text. For instance, here are instructions for creating a histogram on a TI-83.

STAT ENTER
Enter midpoints in L1.
Enter frequencies in L2.
2nd STAT PLOT
Turn on Plot 1.
Highlight Histogram.
Xlist: L1
Freq: L2
ZOOM 9
WINDOW
Xscl=12
GRAPH
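The text names MINITAB, Excel, and the TI-83; as an alternative that is not part of the original, the following Python sketch prints a simple text histogram of the same frequency distribution using only the standard library. Each `#` stands for one subscriber, and the layout is an illustrative choice.

```python
# Midpoints and frequencies from Example 2.
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

# One row per class: the midpoint, a bar of '#' characters, and the frequency.
lines = [f"{m:>5.1f} | {'#' * f} {f}" for m, f in zip(midpoints, frequencies)]

print("Minutes | Frequency")
print("\n".join(lines))
```

The bars run horizontally instead of vertically, but their lengths carry the same information as the bar heights in the histograms of Example 3.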
Try It Yourself 7
Use a calculator or a computer to construct a frequency histogram that represents the ages of the residents of Akhiok listed in the Chapter Opener on page 33. Use eight classes.
a. Enter the data.
b. Construct the histogram.
Answer: Page A30
2.1 Exercises
Building Basic Skills and Vocabulary
1. What are some benefits of representing data sets using frequency distributions?
2. What are some benefits of representing data sets using graphs of frequency distributions?
3. What is the difference between class limits and class boundaries?
4. What is the difference between frequency and relative frequency?

True or False? In Exercises 5–8, determine whether the statement is true or false. If it is false, rewrite it as a true statement.
5. The midpoint of a class is the sum of its lower and upper limits.
6. The relative frequency of a class is the sample size divided by the frequency of the class.
7. An ogive is a graph that displays cumulative frequency.
8. Class limits are used to ensure that consecutive bars of a histogram do not touch.

Answers
1. Organizing the data into a frequency distribution may make patterns within the data more evident.
2. Sometimes it is easier to identify patterns of a data set by looking at a graph of the frequency distribution.
3. Class limits determine which numbers can belong to that class. Class boundaries are the numbers that separate classes without forming gaps between them.
4. Frequency for a class is the number of data entries in each class. Relative frequency of a class is the percent of the data that fall in each class.
5. False. The midpoint of a class is the sum of the lower and upper limits of the class divided by two.
6. False. The relative frequency of a class is the frequency of the class divided by the sample size.
7. True
8. False. Class boundaries are used to ensure that consecutive bars of a histogram touch.
9. See Odd Answers, page A##
10. See Selected Answers, page A##
Reading a Frequency Distribution In Exercises 9 and 10, use the given frequency distribution to find the
(a) class width.
(b) class midpoints.
(c) class boundaries.

9. Employee Age

Class     Frequency, f
20–29      10
30–39     132
40–49     284
50–59     300
60–69     175
70–79      65
80–89      25

10. Tree Height

Class     Frequency, f
16–20     100
21–25     122
26–30     900
31–35     207
36–40     795
41–45     568
46–50     322

11. Use the frequency distribution in Exercise 9 to construct an expanded frequency distribution, as shown in Example 2.
12. Use the frequency distribution in Exercise 10 to construct an expanded frequency distribution, as shown in Example 2.

Answers
11. See Odd Answers, page A##
12. See Selected Answers, page A##
Graphical Analysis In Exercises 13 and 14, use the frequency histogram to
(a) determine the number of classes.
(b) estimate the frequency of the class with the least frequency.
(c) estimate the frequency of the class with the greatest frequency.
(d) determine the class width.

13. [Figure: frequency histogram titled "Employee Age". Horizontal axis: Age (in years), marked 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5; vertical axis: Frequency, 50 to 300.]
14. [Figure: frequency histogram titled "Tree Height". Horizontal axis: Height (in inches), marked 18, 23, 28, 33, 38, 43, 48; vertical axis: Frequency, 150 to 900.]

Answers
13. (a) Number of classes = 7 (b) Least frequency ≈ 10 (c) Greatest frequency ≈ 300 (d) Class width = 10
14. (a) Number of classes = 7 (b) Least frequency ≈ 100 (c) Greatest frequency ≈ 900 (d) Class width = 5
15. (a) 50 (b) 12.5–13.5 pounds
16. (a) 50 (b) 68–70 inches
17. (a) 24 (b) 19.5 pounds
18. (a) 44 (b) 70 inches
Graphical Analysis In Exercises 15 and 16, use the ogive to approximate
(a) the number in the sample.
(b) the location of the greatest increase in frequency.

15. [Figure: ogive titled "Adult Male Rhesus Monkeys". Horizontal axis: Weight (in pounds), marked 8.5 to 22.5 in steps of 2; vertical axis: Cumulative frequency, 5 to 55.]
16. [Figure: ogive titled "Adult Male Ages 20–29". Horizontal axis: Height (in inches), marked 62 to 78 in steps of 2; vertical axis: Cumulative frequency, 5 to 55.]

17. Use the ogive in Exercise 15 to approximate
(a) the cumulative frequency for a weight of 14.5 pounds.
(b) the weight for which the cumulative frequency is 45.
18. Use the ogive in Exercise 16 to approximate
(a) the cumulative frequency for a height of 74 inches.
(b) the height for which the cumulative frequency is 25.
Graphical Analysis In Exercises 19 and 20, use the relative frequency histogram to
(a) identify the class with the greatest and the least relative frequency.
(b) approximate the greatest and least relative frequency.
(c) approximate the relative frequency of the second class.

19. [Figure: relative frequency histogram titled "Atlantic Croaker Fish". Horizontal axis: Length (in inches), marked 5.5 to 17.5 in steps of 2; vertical axis: Relative frequency, 0.04 to 0.20.]
20. [Figure: relative frequency histogram titled "Emergency Response Time". Horizontal axis: Time (in minutes), marked 17.5, 18.5, 19.5, 20.5, 21.5; vertical axis: Relative frequency, 10% to 40%.]

Answers
19. (a) Class with greatest relative frequency: 8–9 inches. Class with least relative frequency: 17–18 inches (b) Greatest relative frequency ≈ 0.195. Least relative frequency ≈ 0.005 (c) Approximately 0.015
20. (a) Class with greatest relative frequency: 19–20 minutes. Class with least relative frequency: 21–22 minutes (b) Greatest relative frequency ≈ 40%. Least relative frequency ≈ 2% (c) Approximately 33%
Graphical Analysis In Exercises 21 and 22, use the frequency polygon to identify the class with the greatest and the least frequency.

21. [Figure: frequency polygon titled "SAT Scores for 50 Students". Horizontal axis: Score, marked 225 to 775 in steps of 50; vertical axis: Frequency, 3 to 12.]
22. [Figure: frequency polygon titled "Shoe Sizes for 50 Females". Horizontal axis: Size, marked 6.0 to 10.0; vertical axis: Frequency, 5 to 20.]

Answers
21. Class with greatest frequency: 500–550. Classes with least frequency: 250–300 and 700–750
22. Class with greatest frequency: 7.75–8.25. Class with least frequency: 6.25–6.75
23. See Odd Answers, page A##
24. See Selected Answers, page A##
Using and Interpreting Concepts
Constructing a Frequency Distribution In Exercises 23 and 24, construct a frequency distribution for the data set using the indicated number of classes. In the table, include the midpoints, relative frequencies, and cumulative frequencies. Which class has the greatest frequency and which has the least frequency?
23. Newspaper Reading Times DATA
Number of classes: 5
Data set: Time (in minutes) spent reading the newspaper in a day
7 39 13 9 25 8 22 0 2 18 2 30 7 35 12 15 8 6 5 29 0 11 39 16 15
24. Book Spending DATA
Number of classes: 6
Data set: Amount (in dollars) spent on books for a semester
91 472 279 249 530 376 188 341 266 199 142 273 189 130 489 266 248 101 375 486 190 398 188 269 43 30 127 354 84
DATA indicates that the data set for this exercise is available electronically.
Constructing a Frequency Distribution and a Frequency Histogram In Exercises 25–28, construct a frequency distribution and a frequency histogram for the data set using the indicated number of classes. Describe any patterns.
25. Sales DATA
Number of classes: 6
Data set: July sales (in dollars) for all sales representatives at a company
2114 2468 7119 1876 4105 3183 1932 1355 4278 1030 2000 1077 5835 1512 1697 2478 3981 1643 1858 1500 4608 1000
26. Pepper Pungencies DATA
Number of classes: 5
Data set: Pungencies (in 1000s of Scoville units) of 24 tabasco peppers
35 51 44 42 37 38 36 39 44 43 40 40 32 39 41 38 42 39 40 46 37 35 41 39
27. Reaction Times DATA
Number of classes: 8
Data set: Reaction times (in milliseconds) of a sample of 30 adult females to an auditory stimulus
507 389 305 291 336 310 514 442 307 337 373 428 387 454 323 441 388 426 469 351 411 382 320 450 309 416 359 388 422 413
28. Fracture Times DATA
Number of classes: 5
Data set: Amount of pressure (in pounds per square inch) at fracture time for 25 samples of brick mortar
2750 2862 2885 2490 2512 2456 2554 2532 2885 2872 2601 2877 2721 2692 2888 2755 2853 2517 2867 2718 2641 2834 2466 2596 2519
Answers
25. See Odd Answers, page A##
26. See Selected Answers, page A##
27. See Odd Answers, page A##
28. See Selected Answers, page A##
29. See Odd Answers, page A##
30. See Selected Answers, page A##
Constructing a Frequency Distribution and a Relative Frequency Histogram In Exercises 29–32, construct a frequency distribution and a relative frequency histogram for the data set using five classes. Which class has the greatest relative frequency and which has the least relative frequency?
29. Bowling Scores DATA
Data set: Bowling scores of a sample of league members
154 257 195 220 182 240 177 228 235 146 174 192 165 207 185 180 264 169 225 239 148 190 182 205 148 188
30. ATM Withdrawals DATA
Data set: A sample of ATM withdrawals (in dollars)
35 10 30 25 75 10 30 20 20 10 40 50 40 30 60 70 25 40 10 60 20 80 40 25 20 10 20 25 30 50 80 20
31. Tree Heights DATA
Data set: Heights (in feet) of a sample of Douglas-fir trees
40 44 35 49 35 43 35 36 39 37 41 41 48 52 37 45 40 36 35 50 42 51 33 34 51 39
32. Farm Acreage DATA
Data set: Number of acres on a sample of small farms
12 7 9 8 9 8 12 10 9 10 6 8 13 12 10 11 7 14 12 9 8 10 9 11 13 8
Answers
31. See Odd Answers, page A##
32. See Selected Answers, page A##
33. See Odd Answers, page A##
34. See Selected Answers, page A##
35. See Odd Answers, page A##
36. See Selected Answers, page A##
37. See Odd Answers, page A##
Constructing a Cumulative Frequency Distribution and an Ogive In Exercises 33–36, construct a cumulative frequency distribution and an ogive for the data set using six classes. Then describe the location of the greatest increase in frequency.
33. Retirement Ages DATA
Data set: Retirement ages for a sample of engineers
60 65 68 63 66 67 69 67 58 65 67 61 63 65 62 64 73 50 61 71 62 69 72 63
34. Saturated Fat Intakes DATA
Data set: Daily saturated fat intakes (in grams) of a sample of people
38 32 34 39 40 54 32 17 29 33 57 40 25 36 33 24 42 16 31 33
35. Gasoline Purchases DATA
Data set: Gasoline (in gallons) purchased by a sample of drivers during one fill-up
7 4 18 4 9 8 8 7 6 2 9 5 9 12 4 14 15 7 10 2 3 11 4 4 9 1 2 5 3
36. Long-Distance Phone Calls DATA
Data set: Lengths (in minutes) of a sample of long-distance phone calls
1 2 0 10 2 0 13 23 3 7 18 7 4 5 15 7 29 10 18 10 10 23 4 12 8 6
Constructing a Frequency Distribution and a Frequency Polygon In Exercises 37 and 38, construct a frequency distribution and a frequency polygon for the data set. Describe any patterns.
37. Exam Scores DATA
Number of classes: 5
Data set: Exam scores for all students in a statistics class
83 92 94 82 73 98 78 85 72 90 89 92 96 89 75 85 63 47 75 82
38. Children of the President DATA
Number of classes: 6
Data set: Number of children of the U.S. presidents (Source: infoplease.com)
0 5 6 0 2 4 0 4 10 15 0 6 2 3 0 4 5 4 8 7 3 5 3 2 6 3 3 0 2 2 6 1 2 3 2 2 4 4 4 6 1 2
Extending Concepts
39. What Would You Do? DATA You work at a bank and are asked to recommend the amount of cash to put in an ATM each day. You don’t want to put in too much (security) or too little (customer irritation). Here are the daily withdrawals (in 100s of dollars) for a period of 30 days.
72 84 61 76 104 76 86 92 80 88 98 76 97 82 84 67 70 81 82 89 74 73 86 81 85 78 82 80 91 83
(a) Construct a relative frequency histogram for the data, using eight classes.
(b) If you put $9000 in the ATM each day, what percent of the days in a month should you expect to run out of cash? Explain your reasoning.
(c) If you are willing to run out of cash for 10% of the days, how much cash, in hundreds of dollars, should you put in the ATM each day? Explain your reasoning.
40. What Would You Do? DATA You work in the admissions department for a college and are asked to recommend the minimum SAT scores that the college will accept for a position as a full-time student. Here are the SAT scores for a sample of 50 applicants.
1325 885 1052 1051 1211 1072 982 996 872 849
785 706 1367 935 980 1188 869 1006 1127 1165
1359 667 1264 727 808 955 1173 410 1148 1195
1141 1193 768 1266 830 672 917 988 791 1035
669 1049 979 1034 544 1202 812 887 688 700
(a) Construct a relative frequency histogram for the data using 10 classes.
(b) If you set the minimum score at 986, what percent of the applicants will you be accepting? Explain your reasoning.
(c) If you want to accept the top 88% of the applicants, what should the minimum score be? Explain your reasoning.
41. Writing DATA What happens when the number of classes is increased for a frequency histogram? Use the data set listed and a technology tool to create frequency histograms with 5, 10, and 20 classes. Which graph displays the data best?
2 7 3 2 11 3 15 8 4 9 10 13 9 7 11 10 1 2 12 5 6 4 2 9 15
Answers
38. See Selected Answers, page A##
39. See Odd Answers, page A##
40. See Selected Answers, page A##
41. [Figures: three frequency histograms of the data, labeled "Histogram (5 Classes)", "Histogram (10 Classes)", and "Histogram (20 Classes)"; horizontal axis: Data; vertical axis: Frequency]
In general, a greater number of classes better preserves the actual values of the data set but is not as helpful for observing general trends and making conclusions. In choosing the number of classes, an important consideration is the size of the data set. For instance, you would not want to use 20 classes if your data set contained 20 entries. In this particular example, as the number of classes increases, the histogram shows more fluctuation. The histograms with 10 and 20 classes have classes with zero frequencies. Not much is gained by using more than five classes. Therefore, it appears that five classes would be best.
2.2
More Graphs and Displays
What You Should Learn
• How to graph and interpret quantitative data sets using stem-and-leaf plots and dot plots
• How to graph and interpret qualitative data sets using pie charts and Pareto charts
• How to graph and interpret paired data sets using scatter plots and time series charts
Graphing Quantitative Data Sets • Graphing Qualitative Data Sets • Graphing Paired Data Sets
Graphing Quantitative Data Sets
In Section 2.1, you learned several traditional ways to display quantitative data graphically. In this section, you will learn a newer way to display quantitative data, called a stem-and-leaf plot. Stem-and-leaf plots are examples of exploratory data analysis (EDA), which was developed by John Tukey in 1977. In a stem-and-leaf plot, each number is separated into a stem (for instance, the entry’s leftmost digits) and a leaf (for instance, the rightmost digit). A stem-and-leaf plot is similar to a histogram but has the advantage that the graph still contains the original data values. Another advantage of a stem-and-leaf plot is that it provides an easy way to sort data.
EXAMPLE 1
Constructing a Stem-and-Leaf Plot
The following are the numbers of league-leading runs batted in (RBIs) for baseball’s American League during a recent 50-year period. Display the data in a stem-and-leaf plot. What can you conclude? (Source: Major League Baseball)
155 159 144 129 105 145 126 116 130
114 122 112 112 142 126 118 118 108
122 121 109 140 126 119 113 117 118
109 109 119 139 139 122  78 133 126
123 145 121 134 124 119 132 133 124
129 112 126 148 147
SOLUTION
Because the data entries go from a low of 78 to a high of 159, you should use stem values from 7 to 15. To construct the plot, list these stems to the left of a vertical line. For each data entry, list a leaf to the right of its stem. For instance, the entry 155 has a stem of 15 and a leaf of 5. The resulting stem-and-leaf plot will be unordered. To obtain an ordered stem-and-leaf plot, rewrite the plot with the leaves in increasing order from left to right. It is important to include a key for the display to identify the values of the data.
Study Tip In a stem-and-leaf plot, you should have as many leaves as there are entries in the original data set.
RBIs for American League Leaders
Unordered Stem-and-Leaf Plot
Key: 15 | 5 = 155
 7 | 8
 8 |
 9 |
10 | 5 8 9 9 9
11 | 6 4 2 2 8 8 9 3 7 8 9 9 2
12 | 9 6 2 6 2 1 6 2 6 3 1 4 4 9 6
13 | 0 9 9 3 4 2 3
14 | 4 5 2 0 5 8 7
15 | 5 9
RBIs for American League Leaders
Ordered Stem-and-Leaf Plot
Key: 15 | 5 = 155
 7 | 8
 8 |
 9 |
10 | 5 8 9 9 9
11 | 2 2 2 3 4 6 7 8 8 8 9 9 9
12 | 1 1 2 2 2 3 4 4 6 6 6 6 6 9 9
13 | 0 2 3 3 4 9 9
14 | 0 2 4 5 5 7 8
15 | 5 9
Insight You can use stem-and-leaf plots to identify unusual data values called outliers. In Example 1, the data value 78 is an outlier. You will learn more about outliers in Section 2.3.
Interpretation From the ordered stem-and-leaf plot, you can conclude that more than 50% of the RBI leaders had between 110 and 130 RBIs.
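The construction in Example 1 can also be sketched in code. The following Python snippet is only an illustration and is not part of the original example: the function name `stem_and_leaf` is hypothetical, and the sketch assumes nonnegative whole-number entries whose last digit serves as the leaf.

```python
from collections import defaultdict

# RBI data from Example 1 (50 entries).
rbis = [
    155, 159, 144, 129, 105, 145, 126, 116, 130,
    114, 122, 112, 112, 142, 126, 118, 118, 108,
    122, 121, 109, 140, 126, 119, 113, 117, 118,
    109, 109, 119, 139, 139, 122, 78, 133, 126,
    123, 145, 121, 134, 124, 119, 132, 133, 124,
    129, 112, 126, 148, 147,
]


def stem_and_leaf(data):
    """Map each stem to its sorted leaves, using the last digit as the leaf."""
    leaves_by_stem = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    # Keep every stem from the lowest to the highest, including empty ones.
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    return {stem: sorted(leaves_by_stem.get(stem, [])) for stem in stems}


plot = stem_and_leaf(rbis)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 15 | 5 = 155")

# Share of entries whose stem is 11 or 12 (values from 110 through 129).
share = (len(plot[11]) + len(plot[12])) / len(rbis)
print(f"Share with stem 11 or 12: {share:.0%}")
```

Sorting the leaves inside the function produces the ordered plot directly. The two rows with stems 11 and 12 hold 28 of the 50 entries, which is consistent with the interpretation that more than 50% of the leaders had between 110 and 130 RBIs.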
Try It Yourself 1 Use a stem-and-leaf plot to organize the Akhiok population data set listed in the Chapter Opener on page 33. What can you conclude?
a. List all possible stems.
b. List the leaf of each data entry to the right of its stem and include a key.
c. Rewrite the stem-and-leaf plot so that the leaves are ordered.
d. Use the plot to make a conclusion.
Answer: Page A30
EXAMPLE 2
Constructing Variations of Stem-and-Leaf Plots
Organize the data given in Example 1 using a stem-and-leaf plot that has two lines for each stem. What can you conclude?
SOLUTION
Construct the stem-and-leaf plot as described in Example 1, except now list each stem twice. Use the leaves 0, 1, 2, 3, and 4 in the first stem row and the leaves 5, 6, 7, 8, and 9 in the second stem row. The revised stem-and-leaf plot is shown.
Note to Instructor If you are using MINITAB or Excel, ask students to use this technology to construct a stem-and-leaf plot.
RBIs for American League Leaders
Unordered Stem-and-Leaf Plot
Key: 15 | 5 = 155
 7 |
 7 | 8
 8 |
 8 |
 9 |
 9 |
10 |
10 | 5 8 9 9 9
11 | 4 2 2 3 2
11 | 6 8 8 9 7 8 9 9
12 | 2 2 1 2 3 1 4 4
12 | 9 6 6 6 6 9 6
13 | 0 3 4 2 3
13 | 9 9
14 | 4 2 0
14 | 5 5 8 7
15 |
15 | 5 9
RBIs for American League Leaders
Ordered Stem-and-Leaf Plot
Key: 15 | 5 = 155
 7 |
 7 | 8
 8 |
 8 |
 9 |
 9 |
10 |
10 | 5 8 9 9 9
11 | 2 2 2 3 4
11 | 6 7 8 8 8 9 9 9
12 | 1 1 2 2 2 3 4 4
12 | 6 6 6 6 6 9 9
13 | 0 2 3 3 4
13 | 9 9
14 | 0 2 4
14 | 5 5 7 8
15 |
15 | 5 9
Insight Compare Examples 1 and 2. Notice that by using two lines per stem, you obtain a more detailed picture of the data.
Interpretation From the display, you can conclude that most of the RBI leaders had between 105 and 135 RBIs.
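The two-lines-per-stem variation in Example 2 can be sketched the same way. This Python snippet is an illustration under stated assumptions, not the book's procedure: the function name `split_stem_and_leaf` and the `(stem, half)` row keys are hypothetical, with half 0 holding leaves 0 through 4 and half 1 holding leaves 5 through 9.

```python
# RBI data from Example 1 (50 entries).
rbis = [
    155, 159, 144, 129, 105, 145, 126, 116, 130,
    114, 122, 112, 112, 142, 126, 118, 118, 108,
    122, 121, 109, 140, 126, 119, 113, 117, 118,
    109, 109, 119, 139, 139, 122, 78, 133, 126,
    123, 145, 121, 134, 124, 119, 132, 133, 124,
    129, 112, 126, 148, 147,
]


def split_stem_and_leaf(data):
    """Return ordered rows keyed by (stem, half), two rows per stem."""
    low_stem, high_stem = min(data) // 10, max(data) // 10
    rows = {
        (stem, half): []
        for stem in range(low_stem, high_stem + 1)
        for half in (0, 1)
    }
    # Walking the sorted data keeps the leaves in each row in increasing order.
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // 5)].append(leaf)  # leaf // 5 is 0 for 0-4, 1 for 5-9
    return rows


rows = split_stem_and_leaf(rbis)
for (stem, _half), leaves in rows.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 15 | 5 = 155")

# Number of entries from 105 through 135, the range named in the interpretation.
in_range = sum(1 for value in rbis if 105 <= value <= 135)
print(f"{in_range} of {len(rbis)} entries lie between 105 and 135")
```

Counting the entries from 105 through 135 gives 38 of the 50 values, which supports the statement that most of the RBI leaders fall in that range.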
Try It Yourself 2 Using two rows for each stem, revise the stem-and-leaf plot you constructed in Try It Yourself 1.
a. List each stem twice.
b. List all leaves using the appropriate stem row.
Answer: Page A30
You can also use a dot plot to graph quantitative data. In a dot plot, each data entry is plotted, using a point, above a horizontal axis. Like a stem-and-leaf plot, a dot plot allows you to see how data are distributed, determine specific data entries, and identify unusual data values.
EXAMPLE 3
Constructing a Dot Plot
Use a dot plot to organize the RBI data given in Example 1.
155 159 144 129 105 145 126 116 130
114 122 112 112 142 126 118 118 108
122 121 109 140 126 119 113 117 118
109 109 119 139 139 122  78 133 126
123 145 121 134 124 119 132 133 124
129 112 126 148 147
SOLUTION
So that each data entry is included in the dot plot, the horizontal axis should include numbers between 70 and 160. To represent a data entry, plot a point above the entry’s position on the axis. If an entry is repeated, plot another point above the previous point.
[Figure: dot plot, "RBIs for American League Leaders"; horizontal axis from 70 to 160 in steps of 5]
Interpretation From the dot plot, you can see that most values cluster between 105 and 148 and the value that occurs the most is 126. You can also see that 78 is an unusual data value.
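A dot plot amounts to counting how often each value occurs and stacking one point per occurrence. The Python sketch below is an illustration only; it prints a text version turned on its side (one row per value, one asterisk per entry), which is an assumption made for plain-text output rather than the layout used in the example.

```python
from collections import Counter

# RBI data from Example 1 (50 entries).
rbis = [
    155, 159, 144, 129, 105, 145, 126, 116, 130,
    114, 122, 112, 112, 142, 126, 118, 118, 108,
    122, 121, 109, 140, 126, 119, 113, 117, 118,
    109, 109, 119, 139, 139, 122, 78, 133, 126,
    123, 145, 121, 134, 124, 119, 132, 133, 124,
    129, 112, 126, 148, 147,
]

# Count the number of points to stack above each position on the axis.
counts = Counter(rbis)

# One row per distinct value; each asterisk stands for one data entry.
for value in sorted(counts):
    print(f"{value:>3} {'*' * counts[value]}")

# The value with the tallest stack of points.
mode, mode_count = counts.most_common(1)[0]
print(f"Most frequent value: {mode} ({mode_count} times)")
```

The tallest stack belongs to 126, and the single point at 78 sits far from the rest of the data, matching the interpretation above.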
Try It Yourself 3 Use a dot plot to organize the Akhiok population data set listed in the Chapter Opener on page 33. What can you conclude from the graph?
a. Choose an appropriate scale for the horizontal axis.
b. Represent each data entry by plotting a point.
c. Describe any patterns for the data.
Answer: Page A30
Technology can be used to construct stem-and-leaf plots and dot plots. For instance, a MINITAB dot plot for the RBI data is shown.
[Figure: MINITAB dot plot, "RBIs for American League Leaders"; horizontal axis from 80 to 160]
Graphing Qualitative Data Sets
Pie charts provide a convenient way to present qualitative data graphically. A pie chart is a circle that is divided into sectors that represent categories. The area of each sector is proportional to the frequency of each category.
EXAMPLE 4
Constructing a Pie Chart
The numbers of motor vehicle occupants killed in crashes in 2001 are shown in the table. Use a pie chart to organize the data. What can you conclude? (Source: U.S. Department of Transportation, National Highway Traffic Safety Administration)
Motor Vehicle Occupants Killed in 2001
Vehicle type    Killed
Cars            20,269
Trucks          12,260
Motorcycles      3,067
Other              612
SOLUTION
Begin by finding the relative frequency, or percent, of each category. Then construct the pie chart using the central angle that corresponds to each category. To find the central angle, multiply 360° by the category’s relative frequency. For example, the central angle for cars is 360°(0.56) ≈ 202°. From the pie chart, you can see that most fatalities in motor vehicle crashes were those involving the occupants of cars.
                f         Relative frequency    Angle
Cars            20,269    0.56                  202°
Trucks          12,260    0.34                  122°
Motorcycles      3,067    0.08                   29°
Other              610    0.02                    7°
[Figure: pie chart, "Motor Vehicle Occupants Killed in 2001"; Cars 56%, Trucks 34%, Motorcycles 8%, Other 2%]
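The relative frequencies and central angles in Example 4 follow from two short calculations per category. The Python sketch below is an illustration, not part of the original solution. It uses the frequencies from the table in the problem statement and assumes, as the worked value 360°(0.56) ≈ 202° suggests, that each angle is computed from the relative frequency after rounding to two decimal places.

```python
# Frequencies from the table in Example 4.
killed = {"Cars": 20269, "Trucks": 12260, "Motorcycles": 3067, "Other": 612}

total = sum(killed.values())

sectors = {}
for category, frequency in killed.items():
    # Relative frequency, rounded to two decimal places.
    relative = round(frequency / total, 2)
    # Central angle: 360 degrees times the (rounded) relative frequency.
    angle = round(360 * relative)
    sectors[category] = (relative, angle)

for category, (relative, angle) in sectors.items():
    print(f"{category:<12} {relative:.2f} {angle:>4} degrees")
```

The four angles add up to 360°, which is a quick check that the sectors fill the circle.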
Try It Yourself 4 The numbers of motor vehicle occupants killed in crashes in 1991 are shown in the table. Use a pie chart to organize the data. Compare the 1991 data with the 2001 data. (Source: U.S. Department of Transportation, National Highway Traffic Safety Administration)
Motor Vehicle Occupants Killed in 1991
Vehicle type    Killed
Cars            22,385
Trucks           8,457
Motorcycles      2,806
Other              497
a. Find the relative frequency of each category.
b. Use the central angle to find the portion that corresponds to each category.
c. Compare the 1991 data with the 2001 data.
Answer: Page A31
Technology can be used to construct pie charts. For instance, an Excel pie chart for the data in Example 4 is shown.
[Figure: Excel pie chart, "Motor Vehicle Occupants Killed in 2001"; cars 56%, trucks 34%, motorcycles 8%, other 2%]
Another way to graph qualitative data is to use a Pareto chart. A Pareto chart is a vertical bar graph in which the height of each bar represents frequency or relative frequency. The bars are positioned in order of decreasing height, with the tallest bar positioned at the left. Such positioning helps highlight important data and is used frequently in business.
Picturing the World
The five top-selling vehicles in the United States for January of 2004 are shown in the following Pareto chart. One of the top five vehicles was a car. The other four vehicles were trucks. (Source: Associated Press)
[Figure: Pareto chart, "Five Top-Selling Vehicles for January of 2004"; horizontal axis: Vehicle (Ford F-Series, Chevrolet Silverado, Dodge Ram, Toyota Camry, Ford Explorer); vertical axis: Number sold (in thousands); bar heights 62, 41, 31, 28, and 26]
How many vehicles from the top five did Ford sell in January of 2004?
EXAMPLE 5
Constructing a Pareto Chart
In a recent year, the retail industry lost $41.0 million in inventory shrinkage. Inventory shrinkage is the loss of inventory through breakage, pilferage, shoplifting, and so on. The causes of the inventory shrinkage are administrative error ($7.8 million), employee theft ($15.6 million), shoplifting ($14.7 million), and vendor fraud ($2.9 million). If you were a retailer, which causes of inventory shrinkage would you address first? (Source: National Retail Federation and Center for Retailing Education, University of Florida)
SOLUTION
Using frequencies for the vertical axis, you can construct the Pareto chart as shown.
[Figure: Pareto chart, "Causes of Inventory Shrinkage"; horizontal axis: Cause (Employee theft, Shoplifting, Administrative error, Vendor fraud); vertical axis: Millions of dollars, 2 to 16]
Interpretation From the graph, it is easy to see that the causes of inventory shrinkage that should be addressed first are employee theft and shoplifting.
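The ordering step that defines a Pareto chart is a sort by decreasing frequency. The Python sketch below illustrates it with the four causes from Example 5; the text bars (one `#` per million dollars, rounded) are an assumption made for plain-text output and do not reproduce the chart in the book.

```python
# Inventory shrinkage by cause, in millions of dollars (Example 5).
losses = {
    "Administrative error": 7.8,
    "Employee theft": 15.6,
    "Shoplifting": 14.7,
    "Vendor fraud": 2.9,
}

# Pareto ordering: tallest bar first.
ordered = sorted(losses.items(), key=lambda item: item[1], reverse=True)

for cause, amount in ordered:
    print(f"{cause:<22}{'#' * round(amount)} {amount}")

# Portion of the total loss explained by the two largest causes.
top_two_share = sum(amount for _, amount in ordered[:2]) / sum(losses.values())
print(f"Top two causes: {top_two_share:.0%} of the total")
```

After sorting, employee theft and shoplifting come first, and together they account for roughly three quarters of the $41.0 million, which is why they are the causes to address first.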
Try It Yourself 5 Every year, the Better Business Bureau (BBB) receives complaints from customers. In a recent year, the BBB received the following complaints.
7792 complaints about home furnishing stores
5733 complaints about computer sales and service stores
14,668 complaints about auto dealers
9728 complaints about auto repair shops
4649 complaints about dry cleaning companies
Use a Pareto chart to organize the data. What source is the greatest cause of complaints? (Source: Council of Better Business Bureaus)
a. Find the frequency or relative frequency for each data entry.
b. Position the bars in decreasing order according to frequency or relative frequency.
c. Interpret the results in the context of the data.
Answer: Page A31
Graphing Paired Data Sets
When each entry in one data set corresponds to one entry in a second data set, the sets are called paired data sets. For instance, suppose a data set contains the costs of an item and a second data set contains sales amounts for the item at each cost. Because each cost corresponds to a sales amount, the data sets are paired. One way to graph paired data sets is to use a scatter plot, where the ordered pairs are graphed as points in a coordinate plane. A scatter plot is used to show the relationship between two quantitative variables.
EXAMPLE 6
Interpreting a Scatter Plot
The British statistician Ronald Fisher (see page 29) introduced a famous data set called Fisher’s Iris data set. This data set describes various physical characteristics, such as petal length and petal width (in millimeters), for three species of iris. In the scatter plot shown, the petal lengths form the first data set and the petal widths form the second data set. As the petal length increases, what tends to happen to the petal width? (Source: Fisher, R. A., 1936)
Note to Instructor A complete discussion of types of correlation occurs in Chapter 9. You may want, however, to discuss positive correlation, negative correlation, and no correlation at this point. Be sure that students do not confuse correlation with causation.
[Figure: scatter plot, "Fisher’s Iris Data Set"; horizontal axis: Petal length (in millimeters), 10 to 70; vertical axis: Petal width (in millimeters), 5 to 25]
SOLUTION
The horizontal axis represents the petal length, and the vertical axis represents the petal width. Each point in the scatter plot represents the petal length and petal width of one flower.
Interpretation From the scatter plot, you can see that as the petal length increases, the petal width also tends to increase.
Try It Yourself 6 The lengths of employment and the salaries of 10 employees are listed in the table. Graph the data using a scatter plot. What can you conclude?
Length of employment (in years)    Salary (in dollars)
 5                                 32,000
 4                                 32,500
 8                                 40,000
 4                                 27,350
 2                                 25,000
10                                 43,000
 7                                 41,650
 6                                 39,225
 9                                 45,100
 3                                 28,000
a. Label the horizontal and vertical axes.
b. Plot the paired data.
c. Describe any trends.
Answer: Page A31
You will learn more about scatter plots and how to analyze them in Chapter 9.
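Before plotting, paired data are simply ordered pairs, one pair per subject. The Python sketch below is an illustration using the employment table from Try It Yourself 6; it forms the pairs and makes a crude comparison of average salaries. The cutoff of 5 years is an arbitrary assumption chosen for the illustration, and the comparison is not a substitute for the scatter plot or for the correlation methods of Chapter 9.

```python
from statistics import mean

# Paired data from the table in Try It Yourself 6.
years = [5, 4, 8, 4, 2, 10, 7, 6, 9, 3]
salaries = [32000, 32500, 40000, 27350, 25000, 43000, 41650, 39225, 45100, 28000]

# Each employee contributes one ordered pair (length of employment, salary).
pairs = sorted(zip(years, salaries))
for length, salary in pairs:
    print(f"({length}, {salary})")

# Crude trend check: compare average salaries on either side of 5 years.
shorter = [salary for length, salary in pairs if length <= 5]
longer = [salary for length, salary in pairs if length > 5]
print(f"Average salary, 5 years or less: {mean(shorter)}")
print(f"Average salary, more than 5 years: {mean(longer)}")
```

Each printed pair corresponds to one point in the coordinate plane, with the first coordinate on the horizontal axis and the second on the vertical axis.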
A data set that is composed of quantitative entries taken at regular intervals over a period of time is a time series. For instance, the amount of precipitation measured each day for one month is an example of a time series. You can use a time series chart to graph a time series.
See MINITAB and TI-83 steps on pages 114 and 115.
EXAMPLE 7
Constructing a Time Series Chart
The table lists the number of cellular telephone subscribers (in millions) and a subscriber’s average local monthly bill for service (in dollars) for the years 1991 through 2001. Construct a time series chart for the number of cellular subscribers. What can you conclude? (Source: Cellular Telecommunications & Internet Association)
Note to Instructor Consider asking students to find a time series plot in a magazine or newspaper and bring it to class for discussion.
Year    Subscribers (in millions)    Average bill (in dollars)
1991      7.6                        72.74
1992     11.0                        68.68
1993     16.0                        61.48
1994     24.1                        56.21
1995     33.8                        51.00
1996     44.0                        47.70
1997     55.3                        42.78
1998     69.2                        39.43
1999     86.0                        41.24
2000    109.5                        45.27
2001    128.4                        47.37
SOLUTION
Let the horizontal axis represent the years and the vertical axis represent the number of subscribers (in millions). Then plot the paired data and connect them with line segments.
[Figure: time series chart, "Cellular Telephone Subscribers"; horizontal axis: Year, 1991 to 2001; vertical axis: Subscribers (in millions), 10 to 130]
Interpretation The graph shows that the number of subscribers has been increasing since 1991, with greater increases recently.
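The interpretation in Example 7 can be checked numerically by looking at the change from each year to the next. The Python sketch below is an illustration using the subscriber column of the table; splitting the ten yearly changes into an earlier half and a later half is an assumption made only to compare "early" with "recent" increases.

```python
from statistics import mean

# Subscribers (in millions) for 1991 through 2001, from the table in Example 7.
years = list(range(1991, 2002))
subscribers = [7.6, 11.0, 16.0, 24.1, 33.8, 44.0, 55.3, 69.2, 86.0, 109.5, 128.4]

# Pair each year with its value, as the points of a time series chart.
series = list(zip(years, subscribers))

# Change from the previous year, keyed by the later year.
changes = {
    year: round(current - previous, 1)
    for (_, previous), (year, current) in zip(series, series[1:])
}
for year, change in changes.items():
    print(f"{year}: +{change}")

# Compare the average yearly increase in the earlier and later halves.
values = list(changes.values())
early_average = mean(values[:5])
late_average = mean(values[5:])
print(f"Average increase 1992-1996: {early_average:.2f}")
print(f"Average increase 1997-2001: {late_average:.2f}")
```

Every yearly change is positive, and the later increases are larger on average than the earlier ones, which agrees with the rising and steepening line in the chart.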
Try It Yourself 7 Use the table in Example 7 to construct a time series chart for a subscriber’s average local monthly cellular telephone bill for the years 1991 through 2001. What can you conclude?
a. Label the horizontal and vertical axes.
b. Plot the paired data and connect them with line segments.
c. Describe any patterns you see.
Answer: Page A31
2.2 Exercises
Building Basic Skills and Vocabulary
1. Name some ways to display quantitative data graphically. Name some ways to display qualitative data graphically.
2. What is an advantage of using a stem-and-leaf plot instead of a histogram? What is a disadvantage?
Putting Graphs in Context In Exercises 3–6, match the plot with the description of the sample.
3. Key: 2 | 8 = 28
2 | 8 9
3 | 2 2 2 3 4 5 7 7 8 9
4 | 0 2 4 5
5 | 1
6 | 5 6
7 | 2
4. Key: 6 | 7 = 67
6 | 7 8
7 | 4 5 5 8 8 8
8 | 1 3 5 5 8 8 9
9 | 0 0 0 2 4
5. [Figure: dot plot; horizontal axis from 50 to 66]
6. [Figure: dot plot; horizontal axis from 160 to 176]
(a) Prices (in dollars) of a sample of 20 brands of jeans
(b) Weights (in pounds) of a sample of 20 first grade students
(c) Volumes (in cubic centimeters) of a sample of 20 oranges
(d) Ages (in years) of a sample of 20 residents of a retirement home
Graphical Analysis In Exercises 7–10, use the stem-and-leaf plot or dot plot to list the actual data entries. What is the maximum data entry? What is the minimum data entry?
7. Key: 2 | 7 = 27
2 | 7
3 | 2
4 | 1 3 3 4 7 7 8
5 | 0 1 1 2 3 3 3 4 4 4 4 5 6 6 8 9
6 | 8 8 8
7 | 3 8 8
8 | 5
8. Key: 12 | 9 = 12.9
12 |
12 | 9
13 | 3
13 | 6 7 7
14 | 1 1 1 1 3 4 4
14 | 6 9 9
15 | 0 0 0 1 2 4
15 | 6 7 8 8 8 9
16 | 1
16 | 6 7
9. [Figure: dot plot; horizontal axis from 13 to 19]
10. [Figure: dot plot; horizontal axis from 215 to 235]
Answers
1. Quantitative: stem-and-leaf plot, dot plot, histogram, scatter plot, time series chart
Qualitative: pie chart, Pareto chart
2. Unlike the histogram, the stem-and-leaf plot still contains the original data values. However, some data are difficult to organize in a stem-and-leaf plot.
3. a
4. d
5. b
6. c
7. 27, 32, 41, 43, 43, 44, 47, 47, 48, 50, 51, 51, 52, 53, 53, 53, 54, 54, 54, 54, 55, 56, 56, 58, 59, 68, 68, 68, 73, 78, 78, 85
Max: 85; Min: 27
8. 129, 133, 136, 137, 137, 141, 141, 141, 141, 143, 144, 144, 146, 149, 149, 150, 150, 150, 151, 152, 154, 156, 157, 158, 158, 158, 159, 161, 166, 167
Max: 167; Min: 129
9. 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 17, 17, 18, 19
Max: 19; Min: 13
10. 214, 214, 214, 216, 216, 217, 218, 218, 220, 221, 223, 224, 225, 225, 227, 228, 228, 228, 228, 230, 230, 231, 235, 237, 239
Max: 239; Min: 214
11. Anheuser-Busch spends the most on advertising and Honda spends the least. (Answers will vary.)
12. Value increased the most between 2000 and 2003. (Answers will vary.)
13. Tailgaters irk drivers the most, and too-cautious drivers irk drivers the least. (Answers will vary.)
Graphical Analysis In Exercises 11–14, what can you conclude from the graph?
11. [Figure: Pareto chart, "Top Five Sports Advertisers"; horizontal axis: Company; vertical axis: Advertising (in millions of dollars), 50 to 200] (Source: Nielsen Media Research)
12. [Figure: time series chart, "Stock Portfolio"; horizontal axis: Year, 2000 to 2004; vertical axis: Value (in dollars), 10,000 to 30,000]
13. [Figure: pie chart, "How Other Drivers Irk Us"; Tailgating 23%, Using cell phone 21%, Driving slow 13%, No signals 13%, Other 10%, Speeding 7%, Using two parking spots 4%, Bright lights 4%, Ignoring signals 3%, Too cautious 2%] (Adapted from Reuters/Zogby)
14. [Figure: bar graph, "Driving and Cell Phone Use"; horizontal axis: Incident (Swerved, Sped up, Cut off a car, Almost hit a car); vertical axis: Number of incidents, 10 to 50] (Adapted from USA TODAY)
Using and Interpreting Concepts
Graphing Data Sets In Exercises 15–28, organize the data using the indicated type of graph. What can you conclude about the data?
15. Elephants: Water Consumed DATA Use a stem-and-leaf plot to display the data. The data represent the amount of water (in gallons) consumed by 24 elephants in one day.
33 45 34 47 43 48 35 69 45 60 46 51 41 60 66 41 32 40 44 39 46 33 53 53
16. Elephants: Hay Eaten DATA Use a stem-and-leaf plot to display the data. The data represent the amount of hay (in pounds) eaten daily by 24 elephants.
449 450 419 448 479 410 446 465 415 455 345 305 491 479 390 393 403 298 503 327 460 351 409 319
17. Apple Prices DATA Use a stem-and-leaf plot to display the data. The data represent the price (in cents per pound) paid to 28 farmers for apples.
19.2 19.6 16.4 17.1 19.0 17.4 17.3 20.1 19.0 17.5
17.6 18.6 18.4 17.7 19.5 18.4 18.9 17.5 19.3 20.8
19.3 18.6 18.6 18.3 17.1 18.1 16.8 17.9
18. Advertisements DATA Use a dot plot to display the data. The data represent the number of advertisements seen or heard in one week by a sample of 30 people from the United States.
598 494 441 595 728 690 684 486 735 808 734 590 673 545 702 481 298 135 846 764 317 649 732 582 637 588 540 727 486 703
Answers
14. Twice as many people “sped up” as “cut off a car.” (Answers will vary.)
15. Key: 3 | 3 = 33
3 | 2 3 3 4 5 9
4 | 0 1 1 3 4 5 5 6 6 7 8
5 | 1 3 3
6 | 0 0 6 9
It appears that most elephants tend to drink less than 55 gallons of water per day. (Answers will vary.)
16. Key: 31 | 9 = 319
29 | 8
30 | 5
31 | 9
32 | 7
33 |
34 | 5
35 | 1
36 |
37 |
38 |
39 | 0 3
40 | 3 9
41 | 0 5 9
42 |
43 |
44 | 6 8 9
45 | 0 5
46 | 0 5
47 | 9 9
48 |
49 | 1
50 | 3
It appears that the majority of the elephants eat between 390 and 480 pounds of hay each day. (Answers will vary.)
17. Key: 17 | 5 = 17.5
16 | 4 8
17 | 1 1 3 4 5 5 6 7 9
18 | 1 3 4 4 6 6 6 9
19 | 0 0 2 3 3 5 6
20 | 1 8
It appears that most farmers charge 17 to 19 cents per pound of apples. (Answers will vary.)
18. See Selected Answers, page A##
58
19.
CHAPTER 2 Descriptive Statistics
19. Life Spans of House Flies
Housefly Life Spans DATA
4 5 6 7 8 9 10 11 12 13 14
Life span (in days)
It appears that the life span of a housefly tends to be between 4 and 14 days. days. (Answ (Answers ers will will vary.) vary.) 20. Nobel Prize Laureates United Kingdom 15%
United States 40%
France 7% Sweden 4%
Other 23%
Germany 11%
The United States had the greatest number of Nobel Prize laureates during the years 1901–2002. 21.
Use a dot plot to display the data. The data represent the life span (in days) of 40 house flies. 9 9 4 4 8 1111 10 10 5 8 1133 9 6 7 11 11 13 11 6 9 8 14 10 6 10 10 8 7 14 11 7 8 6 11 1 3 10 14 14 8 13 1 4 10
20. Nobel Prize Use a pie chart to display the data. The data represent the number of Nobel Prize laureates by country during the years 1901–2002.
United States    270
United Kingdom   100
France            49
Sweden            30
Germany           77
Other            157
21. NASA Budget Use a pie chart to display the data. The data represent the 2004 NASA budget (in millions of dollars) divided among three categories. (Source: NASA)
Science, aeronautics, and exploration   7661
Space flight capabilities               7782
Inspector General                         26
22. NASA Expenditures Use a Pareto chart to display the data. The data represent the estimated 2003 NASA space shuttle operations expenditures (in millions of dollars). (Source: NASA)
External tank                          265.4
Main engine                            249.0
Reusable solid rocket motor            374.9
Solid rocket booster                   156.3
Vehicle and extravehicular activity    636.1
Flight hardware upgrades               162.6
Answer 21: [Figure: pie chart "2004 NASA Budget". Science, aeronautics, and exploration 49.5%; Space flight capabilities 50.3%; Inspector General 0.2%] It appears that 50.3% of NASA's budget went to space flight capabilities. (Answers will vary.)
23. UV Index Use a Pareto chart to display the data. The data represent the ultraviolet index for five cities at noon on a recent date. (Source: National Oceanic and Atmospheric Administration)
Atlanta, GA    9
Boise, ID      7
Concord, NH    8
Denver, CO     7
Miami, FL     10
Answer 22: See Selected Answers, page A##
Answer 23: [Figure: Pareto chart "Ultraviolet Index". Vertical axis: UV index. Bars, left to right: Miami, FL; Atlanta, GA; Concord, NH; Boise, ID; Denver, CO] It appears that Boise, ID, and Denver, CO, have the same UV index. (Answers will vary.)
24. Hourly Wages Use a scatter plot to display the data in the table. The data represent the number of hours worked and the hourly wage (in dollars) for a sample of 12 production workers. Describe any trends shown.
Hours   Hourly wage
33      12.16
37       9.98
34      10.79
40      11.71
35      11.80
33      11.51
40      13.65
33      12.05
28      10.54
45      10.33
37      11.57
28      10.17
Answer 24: [Figure: scatter plot "Hourly Wages". Horizontal axis: Hours (25 to 50). Vertical axis: Hourly wage (in dollars), 9.00 to 14.00] It appears that hourly wage increases as the number of hours worked increases. (Answers will vary.)
25. Salaries Use a scatter plot to display the data shown in the table. The data represent the number of students per teacher and the average teacher salary (in thousands of dollars) for a sample of 10 school districts. Describe any trends shown.
Table for Exercise 25
Number of students per teacher   Average teacher's salary
17.1                             28.7
17.5                             47.5
18.9                             31.8
17.1                             28.1
20.0                             40.3
18.6                             33.8
14.4                             49.8
16.5                             37.5
13.3                             42.5
18.4                             31.9
26. UV Index Use a time series chart to display the data. The data represent the ultraviolet index for Memphis, TN, on June 14–23 during a recent year. (Source: Weather Services International)
June 14   9     June 19   10
June 15   4     June 20   10
June 16   10    June 21   10
June 17   10    June 22   9
June 18   10    June 23   9
27. Egg Prices Use a time series chart to display the data. The data represent the prices of Grade A eggs (in dollars per dozen) for the indicated years. (Source: U.S. Bureau of Labor Statistics)
1990   1.00    1996   1.31
1991   1.01    1997   1.17
1992   0.93    1998   1.09
1993   0.87    1999   0.92
1994   0.87    2000   0.96
1995   1.16    2001   0.93
28. T-Bone Steak Prices Use a time series chart to display the data. The data represent the prices of T-bone steak (in dollars per pound) for the indicated years. (Source: U.S. Bureau of Labor Statistics)
1990   5.45    1996   5.87
1991   5.21    1997   6.07
1992   5.39    1998   6.40
1993   5.77    1999   6.71
1994   5.86    2000   6.82
1995   5.92    2001   7.31
Answer 25: [Figure: scatter plot "Teachers' Salaries". Horizontal axis: Students per teacher (13 to 21). Vertical axis: Avg. teacher's salary (25 to 55)] It appears that a teacher's average salary decreases as the number of students per teacher increases. (Answers will vary.)
Answer 26: See Selected Answers, page A##
Answer 27: [Figure: time series chart "Price of Grade A Eggs". Horizontal axis: Year (1990 to 2001). Vertical axis: Price of Grade A eggs (in dollars per dozen), 0.85 to 1.35] It appears the price of eggs peaked in 1996. (Answers will vary.)
Answer 28: See Selected Answers, page A##
Extending Concepts
A Misleading Graph? In Exercises 29 and 30, (a) explain why the graph is misleading. (b) redraw the graph so that it is not misleading.
29. [Figure: bar graph "Sales for Company A". Vertical axis: Sales (in thousands of dollars), 90 to 120. Horizontal axis: Quarter, in the order 3rd, 2nd, 1st, 4th]
30. [Figure: pie chart "Sales for Company B". 1st quarter 20%, 2nd quarter 15%, 3rd quarter 45%, 4th quarter 20%]
Answer 29: See Odd Answers, page A##
Answer 30: See Selected Answers, page A##
2.3 Measures of Central Tendency
Mean, Median, and Mode • Weighted Mean and Mean of Grouped Data • The Shape of Distributions
What You Should Learn
• How to find the mean, median, and mode of a population and a sample
• How to find a weighted mean of a data set and the mean of a frequency distribution
• How to describe the shape of a distribution as symmetric, uniform, or skewed and how to compare the mean and median for each
Mean, Median, and Mode
A measure of central tendency is a value that represents a typical, or central, entry of a data set. The three most commonly used measures of central tendency are the mean, the median, and the mode.
DEFINITION The mean of a data set is the sum of the data entries divided by the number of entries. To find the mean of a data set, use one of the following formulas.
Population Mean: \(\mu = \dfrac{\sum x}{N}\)
Sample Mean: \(\bar{x} = \dfrac{\sum x}{n}\)
Note that N represents the number of entries in a population and n represents the number of entries in a sample.
Study Tip Notice that the mean in Example 1 has one more decimal place than the original set of data values. This round-off rule will be used throughout the text. Another important round-off rule is that rounding should not be done until the final answer of a calculation.
EXAMPLE 1 Finding a Sample Mean
The prices (in dollars) for a sample of room air conditioners (10,000 Btus per hour) are listed. What is the mean price of the air conditioners? 500 840 470 480 420 440 440
SOLUTION The sum of the air conditioner prices is
\[ \sum x = 500 + 840 + 470 + 480 + 420 + 440 + 440 = 3590. \]
To find the mean price, divide the sum of the prices by the number of prices in the sample.
\[ \bar{x} = \frac{\sum x}{n} = \frac{3590}{7} \approx 512.9 \]
So, the mean price of the air conditioners is about $512.90.
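The arithmetic in Example 1 can be checked with a few lines of Python. This is only a sketch: the variable names (`prices`, `total`, `sample_mean`) are illustrative choices and do not come from the text. Rounding is left to the final step, following the round-off rule in the Study Tip.

```python
# Air conditioner prices (in dollars) from Example 1
prices = [500, 840, 470, 480, 420, 440, 440]

total = sum(prices)        # sum of the data entries
n = len(prices)            # number of entries in the sample
sample_mean = total / n    # x-bar = (sum of x) / n

# Round only at the end, to one more decimal place than the data
print(total, n, round(sample_mean, 1))
```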
Try It Yourself 1 The ages of employees in a department are listed. What is the mean age?
34 27 50 45 41 37 24 57 40 38 62 44 39 40
a. Find the sum of the data entries.
b. Divide the sum by the number of data entries.
c. Interpret the results in the context of the data.
Answer: Page A31
DEFINITION The median of a data set is the value that lies in the middle of the data when the data set is ordered. If the data set has an odd number of entries, the median is the middle data entry. If the data set has an even number of entries, the median is the mean of the two middle data entries.
Study Tip In a data set, there are the same number of data values above the median as there are below the median. For instance, in Example 2, three of the prices are below $470 and three are above $470.
EXAMPLE 2 Finding the Median
Find the median of the air conditioner prices given in Example 1.
SOLUTION To find the median price, first order the data.
420 440 440 470 480 500 840
Because there are seven entries (an odd number), the median is the middle, or fourth, data entry. So, the median air conditioner price is $470.
Try It Yourself 2 One of the families of Akhiok is planning to relocate to another city. The ages of the family members are 33, 37, 3, 7, and 59. What will be the median age of the remaining residents of Akhiok after this family relocates?
a. Order the data entries.
b. Find the middle data entry.
Akhiok, Alaska is a fishing village on Kodiak Island. (Photograph © Roy Corral.)
Answer: Page A31
EXAMPLE 3 Finding the Median
The air conditioner priced at $480 is discontinued. What is the median price of the remaining air conditioners?
SOLUTION The remaining prices, in order, are 420, 440, 440, 470, 500, and 840. Because there are six entries (an even number), the median is the mean of the two middle entries.
\[ \text{Median} = \frac{440 + 470}{2} = 455 \]
So, the median price of the remaining air conditioners is $455.
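Examples 2 and 3 follow the two branches of the median definition (odd and even number of entries). A minimal Python sketch of that rule is given here; the function name `median_of` is a hypothetical helper introduced for illustration, not something defined in the text.

```python
def median_of(values):
    """Return the median: the middle entry (odd count) or the mean of the
    two middle entries (even count) of the ordered data."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of entries
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of entries

prices = [500, 840, 470, 480, 420, 440, 440]       # Example 1 data
remaining = [p for p in prices if p != 480]        # Example 3: $480 model discontinued

print(median_of(prices))      # 470
print(median_of(remaining))   # 455.0
```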
Try It Yourself 3 Find the median age of the residents of Akhiok using the population data set listed in the Chapter Opener on page 33.
a. Order the data entries.
b. Find the mean of the two middle data entries.
c. Interpret the results in the context of the data.
Answer: Page A31
DEFINITION The mode of a data set is the data entry that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. If two entries occur with the same greatest frequency, each entry is a mode and the data set is called bimodal.
Insight The mode is the only measure of central tendency that can be used to describe data at the nominal level of measurement.
EXAMPLE 4 Finding the Mode
Find the mode of the air conditioner prices given in Example 1.
SOLUTION Ordering the data helps to find the mode.
420 440 440 470 480 500 840
From the ordered data, you can see that the entry of 440 occurs twice, whereas the other data entries occur only once. So, the mode of the air conditioner prices is $440.
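The mode definition above (no mode when nothing repeats, more than one mode when entries tie) can be sketched in Python with the standard library's `collections.Counter`. The helper name `modes_of` is an assumption made for this illustration.

```python
from collections import Counter

def modes_of(values):
    """Return a sorted list of modes; an empty list means 'no mode'
    (no entry is repeated), following the definition in the text."""
    counts = Counter(values)              # frequency of each entry
    top = max(counts.values())            # greatest frequency
    if top == 1:
        return []                         # no entry is repeated
    return sorted(v for v, c in counts.items() if c == top)

prices = [500, 840, 470, 480, 420, 440, 440]   # Example 1 data
print(modes_of(prices))                         # [440]
```

Because `Counter` works with any hashable entries, the same helper also applies to nominal data such as a list of survey responses.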
Try It Yourself 4 Find the mode of the ages of the Akhiok residents. The data are given below.
25, 5, 18, 12, 60, 44, 24, 22, 2, 7, 15, 39, 58, 53, 36, 42, 16, 20, 1, 5, 39, 51, 44, 23, 3, 13, 37, 56, 58, 13, 47, 23, 1, 17, 39, 13, 24, 0, 39, 10, 41, 1, 48, 17, 18, 3, 72, 20, 3, 9, 0, 12, 33, 21, 40, 68, 25, 40, 59, 4, 67, 29, 13, 18, 19, 13, 16, 41, 19, 26, 68, 49, 5, 26, 49, 26, 45, 41, 19, 49
a. Write the data in order.
b. Identify the entry, or entries, that occur with the greatest frequency.
c. Interpret the results in the context of the data.
Answer: Page A31
EXAMPLE 5 Finding the Mode
At a political debate a sample of audience members was asked to name the political party to which they belong. Their responses are shown in the table. What is the mode of the responses?
Political party    Frequency, f
Democrat           34
Republican         56
Other              21
Did not respond     9
SOLUTION The response occurring with the greatest frequency is Republican. So, the mode is Republican.
Interpretation In this sample, there were more Republicans than people of any other single affiliation.
Try It Yourself 5 In a survey, 250 baseball fans were asked if Barry Bonds's home run record would ever be broken. One hundred sixty-nine of the fans responded "yes," 54 responded "no," and 27 "didn't know." What is the mode of the responses?
a. Identify the entry that occurs with the greatest frequency.
b. Interpret the results in the context of the data.
Answer: Page A31
Although the mean, the median, and the mode each describe a typical entry of a data set, there are advantages and disadvantages of using each, especially when the data set contains outliers.
DEFINITION An outlier is a data entry that is far removed from the other entries in the data set.
EXAMPLE 6 Comparing the Mean, the Median, and the Mode
Find the mean, the median, and the mode of the sample ages of a class shown in the table. Which measure of central tendency best describes a typical entry of this data set? Are there any outliers?
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65 (outlier)
Picturing the World The National Association of Realtors keeps a databank of existing-home sales. One list uses the median price of existing homes sold and another uses the mean price of existing homes sold. The sales for the first quarter of 2003 are shown in the graph. (Source: National Association of Realtors)
[Figure: "2003 U.S. Existing-Home Sales". Median price and mean price by month (Jan., Feb., Mar.); price axis from 140 to 240]
Notice in the graph that each month the mean price is about $40,000 more than the median price. What factors would cause the mean price to be greater than the median price?
SOLUTION
Mean: \[ \bar{x} = \frac{\sum x}{n} = \frac{475}{20} \approx 23.8 \text{ years} \]
Median: \[ \text{Median} = \frac{21 + 22}{2} = 21.5 \text{ years} \]
Mode: The entry occurring with the greatest frequency is 20 years.
Interpretation The mean takes every entry into account but is influenced by the outlier of 65. The median also takes every entry into account, and it is not affected by the outlier. In this case the mode exists, but it doesn't appear to represent a typical entry. Sometimes a graphical comparison can help you decide which measure of central tendency best represents a data set. The histogram shows the distribution of the data and the location of the mean, the median, and the mode. In this case, it appears that the median best describes the data set.
[Figure: histogram "Ages of Students in a Class". Horizontal axis: Age (20 to 65). Vertical axis: Frequency (1 to 6). The mode, median, mean, and the outlier are marked]
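The three measures in Example 6 can be reproduced with Python's standard `statistics` module. This is a sketch rather than part of the text's method: `multimode` requires Python 3.8 or later, and the list `ages` is rebuilt here from the counts in the "Ages in a class" table.

```python
from statistics import mean, median, multimode

# Ages from Example 6: six 20s, four 21s, three 22s, four 23s, two 24s, one 65
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

print(len(ages), sum(ages))     # 20 475
print(mean(ages))               # 23.75, reported in the text as about 23.8
print(median(ages))             # 21.5
print(multimode(ages))          # [20]
```

The single entry of 65 pulls the mean above every age from 20 to 23, while the median stays between 21 and 22, which is the comparison the Interpretation paragraph describes.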
Try It Yourself 6 Remove the data entry of 65 from the preceding data set. Then rework the example. How does the absence of this outlier change each of the measures?
a. Find the mean, the median, and the mode.
b. Compare these measures of central tendency with those found in Example 6.
Answer: Page A31
Weighted Mean and Mean of Grouped Data
Sometimes data sets contain entries that have a greater effect on the mean than do other entries. To find the mean of such data sets, you must find the weighted mean.
DEFINITION A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by
\[ \bar{x} = \frac{\sum (x \cdot w)}{\sum w} \]
where w is the weight of each entry x.
EXAMPLE 7 Finding a Weighted Mean
You are taking a class in which your grade is determined from five sources: 50% from your test mean, 15% from your midterm, 20% from your final exam, 10% from your computer lab work, and 5% from your homework. Your scores are 86 (test mean), 96 (midterm), 82 (final exam), 98 (computer lab), and 100 (homework). What is the weighted mean of your scores?
SOLUTION Begin by organizing the scores and the weights in a table.
Source          Score, x    Weight, w    xw
Test Mean       86          0.50         43.0
Midterm         96          0.15         14.4
Final Exam      82          0.20         16.4
Computer Lab    98          0.10          9.8
Homework        100         0.05          5.0
                            \(\sum w = 1\)    \(\sum (x \cdot w) = 88.6\)
\[ \bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{88.6}{1} = 88.6 \]
So, your weighted mean for the course is 88.6.
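The table in Example 7 translates directly into code. The following Python sketch assumes the scores and weights are held in two parallel lists; `weighted_mean` is a hypothetical helper name, not something from the text.

```python
def weighted_mean(scores, weights):
    """Return sum(x * w) / sum(w) for paired entries and weights."""
    if len(scores) != len(weights):
        raise ValueError("each score needs exactly one weight")
    sum_xw = sum(x * w for x, w in zip(scores, weights))   # sum of x*w
    return sum_xw / sum(weights)                            # divide by sum of w

# Example 7: test mean, midterm, final exam, computer lab, homework
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

result = weighted_mean(scores, weights)
print(round(result, 1))   # 88.6
```

Dividing by the sum of the weights, rather than assuming they total 1, keeps the helper usable when weights are given as counts or credit hours.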
Try It Yourself 7 An error was made in grading your final exam. Instead of getting 82, you scored 98. What is your new weighted mean?
a. Multiply each score by its weight and find the sum of these products.
b. Find the sum of the weights.
c. Find the weighted mean.
d. Interpret the results in the context of the data.
Answer: Page A31
If data are presented in a frequency distribution, you can approximate the mean as follows.
DEFINITION The mean of a frequency distribution for a sample is approximated by
\[ \bar{x} = \frac{\sum (x \cdot f)}{n} \qquad \text{Note that } n = \sum f \]
where x and f are the midpoints and frequencies of a class, respectively.
Study Tip If the frequency distribution represents a population, then the mean of the frequency distribution is approximated by
\[ \mu = \frac{\sum (x \cdot f)}{N} \]
where \(N = \sum f\).
GUIDELINES Finding the Mean of a Frequency Distribution
In Words / In Symbols
1. Find the midpoint of each class. \( x = \dfrac{(\text{Lower limit}) + (\text{Upper limit})}{2} \)
2. Find the sum of the products of the midpoints and the frequencies. \( \sum (x \cdot f) \)
3. Find the sum of the frequencies. \( n = \sum f \)
4. Find the mean of the frequency distribution. \( \bar{x} = \dfrac{\sum (x \cdot f)}{n} \)
EXAMPLE 8 Finding the Mean of a Frequency Distribution
Use the frequency distribution in the table to approximate the mean number of minutes that a sample of Internet subscribers spent online during their most recent session.
Class midpoint, x    Frequency, f    (x · f)
12.5                 6               75.0
24.5                 10              245.0
36.5                 13              474.5
48.5                 8               388.0
60.5                 5               302.5
72.5                 6               435.0
84.5                 2               169.0
                     n = 50          \(\sum = 2089.0\)
SOLUTION
\[ \bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{2089}{50} \approx 41.8 \]
So, the mean time spent online was approximately 41.8 minutes.
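Steps 2 through 4 of the guidelines can be sketched in Python using the midpoints and frequencies from Example 8. The variable names are illustrative assumptions. Step 1 (computing midpoints from class limits) is skipped because the example's table already lists the midpoints.

```python
# Class midpoints and frequencies from the Example 8 table
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

# Step 2: sum of the products of midpoints and frequencies
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))

# Step 3: sum of the frequencies
n = sum(frequencies)

# Step 4: approximate mean of the frequency distribution
grouped_mean = sum_xf / n

print(sum_xf, n, round(grouped_mean, 1))   # 2089.0 50 41.8
```

The result is an approximation because every entry in a class is treated as if it equaled the class midpoint.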
Try It Yourself 8 Use a frequency distribution to approximate the mean age of the residents of Akhiok. (See Try It Yourself 2 on page 37.)
a. Find the midpoint of each class.
b. Find the sum of the products of each midpoint and corresponding frequency.
c. Find the sum of the frequencies.
d. Find the mean of the frequency distribution.
Answer: Page A32
The Shape of Distributions
A graph reveals several characteristics of a frequency distribution. One such characteristic is the shape of the distribution.
DEFINITION A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.
A frequency distribution is uniform (or rectangular) when all entries, or classes, in the distribution have equal frequencies. A uniform distribution is also symmetric.
A frequency distribution is skewed if the "tail" of the graph elongates more to one side than to the other. A distribution is skewed left (negatively skewed) if its tail extends to the left. A distribution is skewed right (positively skewed) if its tail extends to the right.
Insight The mean will always fall in the direction the distribution is skewed. For instance, when a distribution is skewed left, the mean is to the left of the median.
When a distribution is symmetric and unimodal, the mean, median, and mode are equal. If a distribution is skewed left, the mean is less than the median and the median is usually less than the mode. If a distribution is skewed right, the mean is greater than the median and the median is usually greater than the mode. Examples of these commonly occurring distributions are shown.
[Figure: four histograms with the positions of the mean, median, and mode marked. Panels: Symmetric Distribution; Uniform Distribution; Skewed-Left Distribution; Skewed-Right Distribution]
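The relationship between the mean and the median described above suggests a rough numerical check to use alongside a graph. The Python sketch below is a heuristic only, not a definition from the text: comparing the mean with the median hints at the direction of skew but cannot replace looking at the histogram. The function name `skew_hint` is hypothetical.

```python
from statistics import mean, median

def skew_hint(data):
    """Heuristic: compare mean and median to suggest a skew direction.
    This does not prove the shape; a graph is still needed."""
    m, med = mean(data), median(data)
    if m > med:
        return "skewed right"        # mean pulled toward a right tail
    if m < med:
        return "skewed left"         # mean pulled toward a left tail
    return "roughly symmetric"       # mean and median agree

# Air conditioner prices from Example 1: mean about 512.9, median 470
prices = [500, 840, 470, 480, 420, 440, 440]
print(skew_hint(prices))             # skewed right
```

For the Example 1 prices the $840 entry stretches the right tail, so the mean (about 512.9) exceeds the median (470), which in turn exceeds the mode (440).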
2.3 Exercises
Building Basic Skills and Vocabulary
True or False? In Exercises 1–4, determine whether the statement is true or false. If it is false, rewrite it so it is a true statement.
1. The median is the measure of central tendency most likely to be affected by an extreme value (an outlier).
2. Every data set must have a mode.
3. Some quantitative data sets do not have a median.
4. The mean is the only measure of central tendency that can be used for data at the nominal level of measurement.
5. Give an example in which the mean of a data set is not representative of a typical number in the data set.
6. Give an example in which the median and the mode of a data set are the same.
Graphical Analysis In Exercises 7–10, determine whether the approximate shape of the distribution in the histogram is symmetric, uniform, skewed left, skewed right, or none of these. Justify your answer.
Answers
1. False. The mean is the measure of central tendency most likely to be affected by an extreme value (or outlier).
2. False. Not all data sets must have a mode.
3. False. All quantitative data sets have a median.
4. False. The mode is the only measure of central tendency that can be used for data at the nominal level of measurement.
5. A data set with an outlier within it would be an example. (Answers will vary.)
6. Any data set that is symmetric has the same median and mode.
7. The shape of the distribution is skewed right because the bars have a "tail" to the right.
8. Symmetric. If a vertical line is drawn down the middle, the two halves look approximately the same.
9. The shape of the distribution is uniform because the bars are approximately the same height.
10. See Selected Answers, page A##
11. (9), because the distribution of values ranges from 1 to 12 and has (approximately) equal frequencies.
[Figures for Exercises 7–10: four histograms. Horizontal-axis values recovered from the extraction: 25,000 to 85,000; 85 to 155; 1 to 12; 52.5 to 82.5]
Matching In Exercises 11–14, match the distribution with one of the graphs in Exercises 7–10. Justify your decision.
11. The frequency distribution of 180 rolls of a dodecagon (a 12-sided die)
12. The frequency distribution of salaries at a company where a few executives make much higher salaries than the majority of employees
13. The frequency distribution of scores on a 90-point test where a few students scored much lower than the majority of students
14. The frequency distribution of weights for a sample of seventh grade boys
Answers
12. See Selected Answers, page A##
13. (10), because the distribution has a maximum value of 90 and is skewed left owing to a few students' scoring much lower than the majority of the students.
14. See Selected Answers, page A##
Using and Interpreting Concepts
Finding and Discussing the Mean, Median, and Mode In Exercises 15–32,
(a) find the mean, median, and mode of the data, if possible. If it is not possible, explain why the measure of central tendency cannot be found.
(b) determine which measure of central tendency best represents the data. Explain your reasoning.
15. SUVs The maximum number of seats in a sample of 13 sport utility vehicles
6 6 9 9 6 5 5 5 7 5 5 5 8
16. Education The education cost per student (in thousands of dollars) from a sample of 10 liberal arts colleges
22 26 19 20 20 18 21 17 19 14
17. Sports Cars The time (in seconds) for a sample of seven sports cars to go from 0 to 60 miles per hour
3.7 4.0 4.8 4.8 4.8 4.8 5.1
18. Cholesterol The cholesterol level of a sample of 10 female employees
154 216 171 188 229 203 184 173 181 147
19. NBA The average points per game scored by each NBA team during the 2003–2004 regular season (Source: NBA)
89.8 90.3 92.9 90.1 91.8
88.0 95.3 90.3 92.0 91.8 92.8 89.7 103.5 85.4 105.2
97.2 94.5 96.7 88.7 93.3 98.2 94.8 90.7 102.8 97.1
94.0 98.0 91.5 94.2
20. Power Failures The duration (in minutes) of every power failure at a residence in the last 10 years
18 26 45 75 125 80 33 40 44 49
89 80 96 125 12 61 31 63 103 28
21. Air Quality The responses of a sample of 1040 people who were asked if the air quality in their community is better or worse than it was 10 years ago
Better: 346 Worse: 450 Same: 244
22. Crime The responses of a sample of 1019 people who were asked how they felt when they thought about crime
Unconcerned: 34 Watchful: 672 Nervous: 125 Afraid: 188
23. Top Speeds The top speed (in miles per hour) for a sample of seven sports cars
187.3 181.8 180.0 169.3 162.2 158.1 155.7
24. Purchase Preference The responses of a sample of 1001 people who were asked if their next vehicle purchase will be foreign or domestic
Domestic: 704 Foreign: 253 Don't know: 44
25. Stocks The recommended prices (in dollars) for several stocks that analysts predict should produce at least 10% annual returns (Source: Money)
41 20 22 14 15 25 18 40 17 14
Answers
15. (a) \(\bar{x} \approx 6.2\), median = 6, mode = 5 (b) Median, because the distribution is skewed.
16. (a) \(\bar{x} = 19.6\), median = 19.5, mode = 19, 20 (b) Mean, because there are no outliers.
17. (a) \(\bar{x} \approx 4.57\), median = 4.8, mode = 4.8 (b) Median, because there are no outliers.
18. (a) \(\bar{x} = 184.6\), median = 182.5, mode = none (b) Mean, because there are no outliers.
19. (a) \(\bar{x} \approx 93.81\), median = 92.9, mode = 90.3, 91.8 (b) Median, because the distribution is skewed.
20. (a) \(\bar{x} = 61.2\), median = 55, mode = 80, 125 (b) Median, because the distribution is skewed.
21. (a) \(\bar{x}\) = not possible, median = not possible, mode = "Worse" (b) Mode, because the data are at the nominal level of measurement.
22. (a) \(\bar{x}\) = not possible, median = not possible, mode = "Watchful" (b) Mode, because the data are at the nominal level of measurement.
23. (a) \(\bar{x} \approx 170.63\), median = 169.3, mode = none (b) Mean, because there are no outliers.
SECT SE CTIO ION N 2. 2.3 3
24. (a)
x =
not possible
median mode
=
x =
The number of weeks it took to reach a target weight for a sample of five patients with eating disorders treated by psychodynamic Journal of Consulting and Clinical Psychology) psychotherapy (Source: The Journal
not po possible
(b) Mode, Mode, becau because se the the data data are are at at the nominal level of measurement. 25. (a)
26. Eating Disorders
“Domestic”
=
15.0 31.5 10.0 25.5 1.0 27. Eating Disorders
The number of weeks it took to reach a target weight for a sample of 14 patients with eating disorders treated by psychodynamic (Source: ce: The Journal Journal of psychotherapy and cognitive behavior techniques (Sour
22.6
median mode
=
19
Consulting and Clinical Psychology)
14
=
2.5 20.0 11.0 10.5 17.5 16.5 13.0 15.5 26.5 2.5 27.0 28.5 1.5 5.0
(b) Median Median,, bec becaus ause e the the distribution is skewed. 26. (a)
x =
28. Aircraft The number of aircraft 11 airlines have in their fleets (Source:
16.6
median mode
=
(b) Mean,because ther there e are are no no outliers. 27. (a)
x L
14.11
median mode
Airline Transport Transport Association) Association)
15
none
=
=
14.25
819 366 573 280 375 567 444 145 102 26 37 29. Weights (in pounds) of Dogs at a Kennel
1 2 3 4 5 6 7 8 9 10
2.5
=
(b) Mean,because ther there e are are no no outliers. 28. (a)
x L
339.5
median mode
=
36 6
none
=
(b) Median Median,, bec becaus ause e the the distribution is skewed. 29. (a)
x =
41.3
median mode
=
39.5
45
=
31.
(b) Median Median,, bec becaus ause e the the distribution is skewed. 30. (a)
x L
30. Grade Poi Point nt Averages of Students in a Class
Key: 1 ƒ 0
02 147 78 155 07 5
=
10
0 1 2 3 4
mode
Key: 0 ƒ 8
=
0.8
6
Time (in minutes) it Takes Employees to Drive to Work
32. Top Speeds (in miles per hour) of High-Performance Sports Cars
=
2.35
4.0
=
5
10
15
20
25
30
35
40 200
(b) Mean,because ther there e are are no no outliers. x L
8 568 1345 09 00
2.5
median
31. (a)
69
Meas Me asur ures es of Ce Cent ntra rall Ten Tende denc ncyy
mode
=
210
215
220
Graphical Analysis In Exercises 33 and 34, the letters A,B, and C are marked on the
19.5
median
205
=
20
15
(b) Median Median,, bec becaus ause e the the distribution is skewed. Answers, page A## 32. See Selected Answers, mode de,, be beca caus use e it it’’s the the da data ta 33. A = mo entry that occurred most often. B = median,be ,beccause the distribution is skewed right. C = mean, b ec ecause the distribution is skewed right. Answers, page A## 34. See Selected Answers,
horizontal axis.Determine which is the mean,which is the median, and which is the the mode. Justify your answers. 33.
[Histogram: Sick Days Used by Employees; horizontal axis: Days; vertical axis: Frequency; points A, B, and C marked on the horizontal axis]
34.
[Histogram: Hourly Wages of Employees; horizontal axis: Hourly wage; vertical axis: Frequency; points A, B, and C marked on the horizontal axis]
In Exercises 35–38, determine which measure of central tendency best represents the graphed data without performing any calculations. Explain your reasoning.

35. [Bar graph: Are You Getting Enough Sleep?; horizontal axis: Response (Need more, Need less, Get the correct amount); vertical axis: Frequency]
36. [Histogram: Heights of Players on a Hockey Team; horizontal axis: Height (in inches), 69 to 76; vertical axis: Frequency]
37. [Histogram: Heart Rate of a Sample of Adults; horizontal axis: Heart rate (beats per minute), 55 to 85; vertical axis: Frequency]
38. [Histogram: Body Mass Index (BMI) of People in a Gym; horizontal axis: BMI, 18 to 30; vertical axis: Frequency]

35. Mode, because the data are at the nominal level of measurement.
36. Median, because the distribution is skewed.
37. Mean, because there are no outliers.
38. Median, because the distribution is skewed.
39. 89.3 40. $32,640
41. 2.8
Finding the Weighted Mean In Exercises 39–42, find the weighted mean of the data.

39. Final Grade The scores and their percent of the final grade for a statistics student are given. What is the student's mean score?

               Score   Percent of final grade
Homework        85      15%
Quiz            80      10%
Quiz            92      10%
Quiz            76      10%
Project        100      15%
Speech          90      15%
Final Exam      93      25%
40. Salaries The average starting salaries (by degree attained) for 25 employees at a company are given. What is the mean starting salary for these employees?
8 with MBAs: $42,500
17 with BAs in business: $28,000

41. Grades A student receives the following grades, with an A worth 4 points, a B worth 3 points, a C worth 2 points, and a D worth 1 point. What is the student's mean grade point score?
B in 2 three-credit classes
A in 1 four-credit class
D in 1 two-credit class
C in 1 three-credit class
42. 82
42. Scores
The mean scores for a statistics course (by major) are given. What is the mean score for the class?
43. 65.5 44. 70.1
8 engineering majors: 83
5 math majors: 87
11 business majors: 79
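The weighted-mean calculation used in Exercises 39–42 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `weighted_mean` and the sample grades and credits are hypothetical and are not taken from the exercises.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical data: grade points earned in three classes,
# weighted by the number of credits for each class.
grade_points = [4, 3, 2]
credits = [3, 3, 4]
print(weighted_mean(grade_points, credits))  # (12 + 9 + 8) / 10
```

Dividing by the sum of the weights means the weights do not need to add up to 1 or to 100%.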
45. 35.0 46. 15.3 47.
Class     Frequency, f   Midpoint
3–4       3              3.5
5–6       8              5.5
7–8       4              7.5
9–10      2              9.5
11–12     2              11.5
13–14     1              13.5
          $\Sigma f = 20$

[Histogram: Hospitalization; horizontal axis: Days hospitalized (class midpoints 3.5 to 13.5); vertical axis: Frequency]
Positively skewed

Finding the Mean of Grouped Data In Exercises 43–46, approximate the mean of the grouped data.

43. Heights of Females The heights (in inches) of 16 female students in a physical education class

Height (in inches)   Frequency
60–62                3
63–65                4
66–68                7
69–71                2

44. Heights of Males The heights (in inches) of 21 male students in a physical education class

Height (in inches)   Frequency
63–65                2
66–68                4
69–71                8
72–74                5
75–77                2

45. Ages The ages of residents of a town

Age       Frequency
0–9       57
10–19     68
20–29     36
30–39     55
40–49     71
50–59     44
60–69     36
70–79     14
80–89     8

46. Phone Calls The lengths of long-distance calls (in minutes) made by one person in one year

Length of call   Number of calls
1–5              12
6–10             26
11–15            20
16–20            7
21–25            11
26–30            7
31–35            4
36–40            4
41–45            1
Identifying the Shape of a Distribution In Exercises 47–50, construct a frequency distribution and a frequency histogram of the data using the indicated number of classes. Describe the shape of the histogram as symmetric, uniform, negatively skewed, positively skewed, or none of these.

47. Hospitalization
Number of classes: 6
Data set: The number of days 20 patients remained hospitalized
6 9 7 14 4 5 6 8 4 11 10 6 8 6 5 7 6 6 3 11
48. Hospital Beds
Number of classes: 5
Data set: The number of beds in a sample of 24 hospitals
149 167 162 127 130 180 160 167 221 145 137 194 207 150 254 262 244 297 137 204 166 174 180 151

49. Height of Males
Number of classes: 5
Data set: The heights (to the nearest inch) of 30 males
67 76 69 68 72 68 65 63 75 69 66 72 67 66 69 73 64 62 71 73 68 72 71 65 69 66 74 72 68 69

50. Six-Sided Die
Number of classes: 6
Data set: The results of rolling a six-sided die 30 times
1 4 6 1 5 3 2 5 4 6 1 2 4 3 5 6 3 2 1 1 5 6 2 4 4 3 1 6 2 4

51. Coffee Content During a quality assurance check, the actual coffee content (in ounces) of six jars of instant coffee was recorded as 6.03, 5.59, 6.40, 6.00, 5.99, and 6.02.
(a) Find the mean and the median of the coffee content.
(b) The third value was incorrectly measured and is actually 6.04. Find the mean and median of the coffee content again.
(c) Which measure of central tendency, the mean or the median, was affected more by the data entry error?

52. U.S. Exports The following data are the U.S. exports (in billions of dollars) to 19 countries for a recent year. (Source: U.S. Department of Commerce)

Canada           160.8     Japan             51.4
Mexico            97.5     United Kingdom    33.3
Germany           26.6     South Korea       22.6
Taiwan            18.4     Singapore         16.2
Netherlands       18.3     France            19.0
China             22.1     Brazil            12.4
Australia         13.1     Belgium           13.3
Malaysia          10.3     Italy             10.1
Switzerland        7.8     Thailand           4.9
Saudi Arabia       4.8

(a) Find the mean and median.
(b) Find the mean and median without the U.S. exports to Canada.
(c) Which measure of central tendency, the mean or the median, was affected more by the elimination of the Canadian export data?

48.
Class      Frequency, f   Midpoint
127–161    9              144
162–196    8              179
197–231    3              214
232–266    3              249
267–301    1              284
           $\Sigma f = 24$

[Histogram: Hospital Beds; horizontal axis: Number of beds (class midpoints 144, 179, 214, 249, 284); vertical axis: Frequency]
Positively skewed

49.
Class     Frequency, f   Midpoint
62–64     3              63
65–67     7              66
68–70     9              69
71–73     8              72
74–76     3              75
          $\Sigma f = 30$

[Histogram: Heights of Males; horizontal axis: Heights (to the nearest inch) (class midpoints 63, 66, 69, 72, 75); vertical axis: Frequency]
Symmetric

50. See Selected Answers, page A##
51. (a) $\bar{x} = 6.005$; median = 6.01
(b) $\bar{x} = 5.945$; median = 6.01
(c) Mean
52. (a) $\bar{x} \approx 29.63$; median = 18.3
(b) $\bar{x} \approx 22.34$; median = 17.25
(c) Mean
53. Data Analysis A consumer testing service obtained the following miles per gallon in five test runs performed with three types of compact cars.

         Run 1   Run 2   Run 3   Run 4   Run 5
Car A:   28      32      28      30      34
Car B:   31      29      31      29      31
Car C:   29      32      28      32      30

(a) The manufacturer of Car A wants to advertise that their car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for their claim? Explain your reasoning.
(b) The manufacturer of Car B wants to advertise that their car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for their claim? Explain your reasoning.
(c) The manufacturer of Car C wants to advertise that their car performed best in this test. Which measure of central tendency (mean, median, or mode) should be used for their claim? Explain your reasoning.

Extending Concepts

54. Midrange The midrange is

$$\frac{(\text{Maximum data entry}) + (\text{Minimum data entry})}{2}.$$

Which of the manufacturers in Exercise 53 would prefer to use the midrange statistic in their ads? Explain your reasoning.

55. Data Analysis Students in an experimental psychology class did research on depression as a sign of stress. A test was administered to a sample of 30 students. The scores are given.
44 51 11 90 76 36 64 37 43 72 53 62 36 74 51 72 37 28 38 61 47 63 36 41 22 37 51 46 85 13
(a) Find the mean of the data.
(b) Find the median of the data.
(c) Draw a stem-and-leaf plot for the data using one line per stem. Locate the mean and median on the display.
(d) Describe the shape of the distribution.

56. Trimmed Mean To find the 10% trimmed mean of a data set, order the data, delete the lowest 10% of the entries and the highest 10% of the entries, and find the mean of the remaining entries.
(a) Find the 10% trimmed mean for the data in Exercise 55.
(b) Compare the four measures of central tendency.
(c) What is the benefit of using a trimmed mean versus using a mean found using all data entries? Explain your reasoning.

57. Writing The population mean $\mu$ and the sample mean $\bar{x}$ have essentially the same formulas. Explain why it is necessary to have two different symbols.

58. Writing Describe in words the shape of a distribution that is symmetric but whose mean, median, and mode are not all equal. Then sketch this distribution.

53. (a) Mean, because Car A has the highest mean of the three.
(b) Median, because Car B has the highest median of the three.
(c) Mode, because Car C has the highest mode of the three.
54. Car A, because its midrange is the largest.
55. (a) $\bar{x} \approx 49.2$
(b) median = 46.5
(c) Key: 3 | 6 = 36
1 | 1 3
2 | 2 8
3 | 6 6 6 7 7 7 8
4 | 1 3 4 6 7   (median)
5 | 1 1 1 3
6 | 1 2 3 4
7 | 2 2 4 6
8 | 5
9 | 0
The mean (49.2) also falls on the 4 stem.
(d) Positively skewed
56. (a) 49.2
(b) $\bar{x} = 49.2$; median = 46.5; mode = 36, 37, 51
(c) Using a trimmed mean eliminates potential outliers that may affect the mean of all the entries.
57. Two different symbols are needed because they describe a measure of central tendency for two different sets of data (sample is a subset of the population).
58. A distribution with one data entry in each class would be an example of a rectangular (uniform) distribution whose mean and median are equal and whose mode does not exist.
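The midrange of Exercise 54 and the trimmed mean of Exercise 56 can be sketched as follows. This is an illustrative sketch only: the function names and the ten-entry data set are hypothetical, and the choice to truncate the number of deleted entries when 10% of the count is not a whole number is an assumption that the text does not address.

```python
def midrange(data):
    """Average of the maximum and minimum data entries."""
    return (max(data) + min(data)) / 2


def trimmed_mean(data, percent):
    """Mean after deleting the lowest and highest `percent` of the entries."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # entries dropped from each end (assumed: truncate)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme entry (50).
data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 50]
print(sum(data) / len(data))   # ordinary mean, pulled upward by 50
print(trimmed_mean(data, 10))  # drops 2 and 50 before averaging
print(midrange(data))          # depends only on 2 and 50
```

The comparison shows why the trimmed mean resists extreme entries while the midrange depends on them entirely.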
2.4 Measures of Variation
What You Should Learn
• How to find the range of a data set
• How to find the variance and standard deviation of a population and of a sample
• How to use the Empirical Rule and Chebychev's Theorem to interpret standard deviation
• How to approximate the sample standard deviation for grouped data

Range • Deviation, Variance, and Standard Deviation • Interpreting Standard Deviation • Standard Deviation for Grouped Data

Range
In this section, you will learn different ways to measure the variation of a data set. The simplest measure is the range of the set.
DEFINITION
The range of a data set is the difference between the maximum and minimum data entries in the set.

$$\text{Range} = (\text{Maximum data entry}) - (\text{Minimum data entry})$$

EXAMPLE 1
Finding the Range of a Data Set Two corporations each hired 10 graduates. The starting salaries for each are shown. Find the range of the starting salaries for Corporation A.
Starting Salaries for Corporation A (1000s of dollars)
Salary   41   38   39   45   47   41   44   41   37   42

Starting Salaries for Corporation B (1000s of dollars)
Salary   40   23   41   50   49   32   41   29   52   58

Insight
Both data sets in Example 1 have a mean of 41.5, a median of 41, and a mode of 41. And yet the two sets differ significantly. The difference is that the entries in the second set have greater variation. Your goal in this section is to learn how to measure the variation of a data set.

SOLUTION
Ordering the data helps to find the least and greatest salaries.

37 38 39 41 41 41 42 44 45 47
(Minimum: 37; Maximum: 47)

$$\text{Range} = (\text{Maximum salary}) - (\text{Minimum salary}) = 47 - 37 = 10$$
So, the range of the starting salaries for Corporation A is 10, or $10,000.

Try It Yourself 1
Find the range of the starting salaries for Corporation B.
a. Identify the minimum and maximum salaries.
b. Find the range.
c. Compare your answer with that for Example 1.
Answer: Page A32
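The calculation in Example 1 can also be checked with a short Python sketch. The function name `data_range` is a hypothetical choice; the salaries are the Corporation A values from Example 1.

```python
def data_range(entries):
    """Range = (maximum data entry) - (minimum data entry)."""
    return max(entries) - min(entries)


# Starting salaries for Corporation A (1000s of dollars), from Example 1.
corporation_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
print(data_range(corporation_a))  # 47 - 37 = 10
```

Note that only two entries, the largest and the smallest, influence the result.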
Deviation, Variance, and Standard Deviation
As a measure of variation, the range has the advantage of being easy to compute. Its disadvantage, however, is that it uses only two entries from the data set. Two measures of variation that use all the entries in a data set are the variance and the standard deviation. However, before you learn about these measures of variation, you need to know what is meant by the deviation of an entry in a data set.
DEFINITION
The deviation of an entry $x$ in a population data set is the difference between the entry and the mean $\mu$ of the data set.

$$\text{Deviation of } x = x - \mu$$

Note to Instructor
Remind students of the reason for the difference between the symbols $\mu$ and $\bar{x}$.

EXAMPLE 2
Finding the Deviations of a Data Set
Find the deviation of each starting salary for Corporation A given in Example 1.

SOLUTION
The mean starting salary is $\mu = 415/10 = 41.5$. To find out how much each salary deviates from the mean, subtract 41.5 from the salary. For instance, the deviation of 41 (or $41,000) is

$$41 - 41.5 = -0.5 \quad (\text{or } -\$500).$$

Deviations of Starting Salaries for Corporation A

Salary (1000s of dollars), $x$   Deviation (1000s of dollars), $x - \mu$
41                               -0.5
38                               -3.5
39                               -2.5
45                                3.5
47                                5.5
41                               -0.5
44                                2.5
41                               -0.5
37                               -4.5
42                                0.5
$\Sigma x = 415$                 $\Sigma (x - \mu) = 0$
The table lists the deviations of each of the 10 starting salaries.

Study Tip
When you add the squares of the deviations, you compute a quantity called the sum of squares, denoted $SS_x$.

Try It Yourself 2
Find the deviation of each starting salary for Corporation B given in Example 1.
a. Find the mean of the data set.
b. Subtract the mean from each salary.
Answer: Page A32
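As a quick check of Example 2, the deviations can be computed directly. This is a minimal sketch; the variable names are illustrative, and the data are the Corporation A salaries from Example 1.

```python
# Starting salaries for Corporation A (1000s of dollars), from Example 1.
corporation_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

mu = sum(corporation_a) / len(corporation_a)  # population mean, 415 / 10
deviations = [x - mu for x in corporation_a]  # x - mu for each entry

print(mu)               # 41.5
print(deviations)       # [-0.5, -3.5, -2.5, 3.5, 5.5, -0.5, 2.5, -0.5, -4.5, 0.5]
print(sum(deviations))  # 0.0
```

The final line illustrates why the deviations themselves cannot be averaged to measure variation: the positive and negative deviations cancel.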
In Example 2, notice that the sum of the deviations is zero. Because this is true for any data set, it doesn't make sense to find the average of the deviations. To overcome this problem, you can square each deviation. In a population data set, the mean of the squares of the deviations is called the population variance.
DEFINITION
The population variance of a population data set of $N$ entries is

$$\text{Population variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}.$$

The symbol $\sigma$ is the lowercase Greek letter sigma.
DEFINITION
The population standard deviation of a population data set of $N$ entries is the square root of the population variance.

$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Note to Instructor
We have used the formulas here that are derived from the definition of the population variance and standard deviation because we feel they are easier to remember than the shortcut formula. If you prefer to use the shortcut formula, we have included it on page 91.

GUIDELINES
Finding the Population Variance and Standard Deviation

In Words                                                                    In Symbols
1. Find the mean of the population data set.                                $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry.                                        $x - \mu$
3. Square each deviation.                                                   $(x - \mu)^2$
4. Add to get the sum of squares.                                           $SS_x = \sum (x - \mu)^2$
5. Divide by $N$ to get the population variance.                            $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get the population standard deviation.   $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

EXAMPLE 3
Finding the Population Standard Deviation
Find the population variance and standard deviation of the starting salaries for Corporation A given in Example 1.

SOLUTION
The table summarizes the steps used to find $SS_x$.

Sum of Squares of Starting Salaries for Corporation A

Salary, $x$   Deviation, $x - \mu$   Squares, $(x - \mu)^2$
41            -0.5                   0.25
38            -3.5                   12.25
39            -2.5                   6.25
45             3.5                   12.25
47             5.5                   30.25
41            -0.5                   0.25
44             2.5                   6.25
41            -0.5                   0.25
37            -4.5                   20.25
42             0.5                   0.25
              $\Sigma = 0$           $SS_x = 88.5$

$$SS_x = 88.5, \quad N = 10, \quad \sigma^2 = \frac{88.5}{10} \approx 8.9, \quad \sigma = \sqrt{8.85} \approx 3.0$$

Study Tip
Notice that the variance and standard deviation in Example 3 have one more decimal place than the original set of data values. This is the same round-off rule that was used to calculate the mean.
So, the population variance is about 8.9, and the population standard deviation is about 3.0, or $3000.
Try It Yourself 3 Find the population standard deviation of the starting salaries for Corporation B given in Example 1. a. Find the mean and each deviation, as you did in Try It Yourself 2. b. Square each deviation and add to get the sum of squares. c. Divide by N to get the population variance. d. Find the square root of the population variance. e. Interpret the results by giving the population standard deviation in dollars. Answer: Page A32
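The six steps in the guidelines for the population variance and standard deviation translate almost line for line into code. The sketch below follows those steps for the Corporation A salaries from Example 1; the function name `population_variance` is a hypothetical choice.

```python
import math


def population_variance(data):
    """Follow the guidelines: mean, deviations, squares, sum of squares, divide by N."""
    n = len(data)
    mu = sum(data) / n                       # step 1: population mean
    ss_x = sum((x - mu) ** 2 for x in data)  # steps 2-4: sum of squared deviations
    return ss_x / n                          # step 5: divide by N


# Starting salaries for Corporation A (1000s of dollars), from Example 1.
corporation_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

variance = population_variance(corporation_a)
std_dev = math.sqrt(variance)                # step 6: square root of the variance

print(variance)           # 8.85, which the text rounds to 8.9
print(round(std_dev, 1))  # 3.0
```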
DEFINITION
The sample variance and sample standard deviation of a sample data set of $n$ entries are listed below.

$$\text{Sample variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Study Tip
Note that when you find the population variance, you divide by $N$, the number of entries, but when you find the sample variance, you divide by $n - 1$, one less than the number of entries.

GUIDELINES
Finding the Sample Variance and Standard Deviation

In Words                                                                    In Symbols
1. Find the mean of the sample data set.                                    $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry.                                        $x - \bar{x}$
3. Square each deviation.                                                   $(x - \bar{x})^2$
4. Add to get the sum of squares.                                           $SS_x = \sum (x - \bar{x})^2$
5. Divide by $n - 1$ to get the sample variance.                            $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to get the sample standard deviation.   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Symbols in Variance and Standard Deviation Formulas

                      Population              Sample
Variance              $\sigma^2$              $s^2$
Standard deviation    $\sigma$                $s$
Mean                  $\mu$                   $\bar{x}$
Number of entries     $N$                     $n$
Deviation             $x - \mu$               $x - \bar{x}$
Sum of squares        $\sum (x - \mu)^2$      $\sum (x - \bar{x})^2$

EXAMPLE 4
Finding the Sample Standard Deviation
See MINITAB and TI-83 steps on pages 114 and 115.

The starting salaries given in Example 1 are for the Chicago branches of Corporations A and B. Each corporation has several other branches, and you plan to use the starting salaries of the Chicago branches to estimate the starting salaries for the larger populations. Find the sample standard deviation of the starting salaries for the Chicago branch of Corporation A.
SOLUTION

$$SS_x = 88.5, \quad n = 10, \quad s^2 = \frac{88.5}{9} \approx 9.8, \quad s = \sqrt{\frac{88.5}{9}} \approx 3.1$$
So, the sample variance is about 9.8, and the sample standard deviation is about 3.1, or $3100.

Try It Yourself 4
Find the sample standard deviation of the starting salaries for the Chicago branch of Corporation B.
a. Find the sum of squares, as you did in Try It Yourself 3.
b. Divide by $n - 1$ to get the sample variance.
c. Find the square root of the sample variance.
Answer: Page A32
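The only change from the population formulas is the divisor, $n - 1$ instead of $N$. The sketch below applies the sample formulas to the Corporation A salaries from Example 4 and cross-checks the result against `statistics.stdev` from the Python standard library, which also divides by $n - 1$. The function name `sample_variance` is a hypothetical choice.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


# Starting salaries for the Chicago branch of Corporation A (1000s of dollars).
corporation_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

s_squared = sample_variance(corporation_a)
s = math.sqrt(s_squared)

print(round(s_squared, 1))  # 9.8
print(round(s, 1))          # 3.1
print(math.isclose(s, statistics.stdev(corporation_a)))  # True
```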
EXAMPLE 5
Using Technology to Find the Standard Deviation
Sample office rental rates (in dollars per square foot per year) for Miami's central business district are shown in the table. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from Cushman & Wakefield Inc.)

Office Rental Rates
35.00   33.50   37.00
23.75   26.50   31.25
36.50   40.00   32.00
39.25   37.50   34.75
37.75   37.25   36.75
27.00   35.75   26.00
37.00   29.00   40.50
24.50   33.00   38.00
SOLUTION
MINITAB, Excel, and the TI-83 each have features that automatically calculate the mean and the standard deviation of data sets. Try using this technology to find the mean and the standard deviation of the office rental rates. From the displays, you can see that $\bar{x} \approx 33.73$ and $s \approx 5.09$.
MINITAB display:

Descriptive Statistics
Variable        N     Mean     Median   TrMean   StDev
Rental Rates    24    33.73    35.38    33.88    5.09

Variable        SE Mean   Minimum   Maximum   Q1      Q3
Rental Rates    1.04      23.75     40.50     29.56   37.44

Excel display:

Mean                  33.72917
Standard Error        1.038864
Median                35.375
Mode                  37
Standard Deviation    5.089373
Sample Variance       25.90172
Kurtosis              -0.74282
Skewness              -0.70345
Range                 16.75
Minimum               23.75
Maximum               40.5
Sum                   809.5
Count                 24

TI-83 display:

1-Var Stats
x̄=33.72916667     (sample mean)
Σx=809.5
Σx²=27899.5
Sx=5.089373342    (sample standard deviation)
σx=4.982216639
n=24

Note to Instructor
The standard deviations reported by MINITAB and Excel represent sample standard deviations. The TI-83 also reports $\sigma$, the population standard deviation. Ask students to compare the values of $s$ and $\sigma$ shown from the same data.
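The same statistics can be obtained in Python, which is not one of the tools shown in the text. The sketch below uses the standard library's `statistics` module on the Miami rental rates from Example 5: `statistics.stdev` divides by $n - 1$ (the sample standard deviation), and `statistics.pstdev` divides by $N$ (the population standard deviation).

```python
import statistics

# Office rental rates (dollars per square foot per year), from Example 5.
rental_rates = [
    35.00, 23.75, 36.50, 39.25, 37.75, 27.00, 37.00, 24.50,
    33.50, 26.50, 40.00, 37.50, 37.25, 35.75, 29.00, 33.00,
    37.00, 31.25, 32.00, 34.75, 36.75, 26.00, 40.50, 38.00,
]

mean = statistics.mean(rental_rates)
s = statistics.stdev(rental_rates)       # sample standard deviation (n - 1)
sigma = statistics.pstdev(rental_rates)  # population standard deviation (N)

print(round(mean, 2))   # 33.73
print(round(s, 2))      # 5.09
print(round(sigma, 2))  # 4.98
```

The three printed values agree with the mean, the sample standard deviation, and the population standard deviation shown in the displays.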
Try It Yourself 5
Sample office rental rates (in dollars per square foot per year) for Seattle's central business district are listed. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from Cushman & Wakefield Inc.)

40.00   43.00   46.00   40.50   35.75   39.75   32.75
36.75   35.75   38.75   38.75   36.75   38.75   39.00
29.00   35.00   42.75   32.75   40.75   35.25
a. Enter the data. b. Calculate the sample mean and the sample standard deviation. Answer: Page A32
Insight
When all data values are equal, the standard deviation is 0. Otherwise, the standard deviation must be positive.
Interpreting Standard Deviation
When interpreting the standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.
[Figure: three frequency histograms of data values 1 through 9 (horizontal axis: Data value; vertical axis: Frequency), with $\bar{x} = 5$, $s = 0$; $\bar{x} = 5$, $s \approx 1.2$; and $\bar{x} = 5$, $s \approx 3.0$]

EXAMPLE 6
Estimating Standard Deviation
Without calculating, estimate the population standard deviation of each data set.

[Figures 1–3: three frequency histograms (horizontal axis: Data value, 0 to 7; vertical axis: Frequency, 1 to 8), each with $N = 8$ and $\mu = 4$]
SOLUTION
1. Each of the eight entries is 4. So, each deviation is 0, which implies that $\sigma = 0$.
2. Each of the eight entries has a deviation of $\pm 1$. So, the population standard deviation should be 1. By calculating, you can see that $\sigma = 1$.
3. Each of the eight entries has a deviation of $\pm 1$ or $\pm 3$. So, the population standard deviation should be about 2. By calculating, you can see that $\sigma \approx 2.24$.
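The estimates in the solution can be checked numerically. Because the histograms themselves are not reproduced here, the three data sets below are assumptions reconstructed from the solution's description: eight entries with mean 4, and deviations of 0, of $\pm 1$, and of $\pm 1$ or $\pm 3$. In particular, the third set assumes four entries at $\pm 1$ and four at $\pm 3$, which is consistent with $\sigma \approx 2.24$.

```python
import statistics

# Assumed data sets, reconstructed from the solution text (N = 8, mean = 4).
data_1 = [4, 4, 4, 4, 4, 4, 4, 4]  # every deviation is 0
data_2 = [3, 3, 3, 3, 5, 5, 5, 5]  # every deviation is +1 or -1
data_3 = [1, 1, 3, 3, 5, 5, 7, 7]  # deviations of +/-1 and +/-3 (assumed split)

for data in (data_1, data_2, data_3):
    # pstdev divides by N, matching the population standard deviation.
    print(statistics.mean(data), round(statistics.pstdev(data), 2))
```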
Try It Yourself 6
Write a data set that has 10 entries, a mean of 10, and a population standard deviation that is approximately 3. (There are many correct answers.)
a. Write a data set that has five entries that are three units less than 10 and five entries that are three units more than 10.
b. Calculate the population standard deviation to check that $\sigma$ is approximately 3.
Answer: Page A32
Picturing the World
A survey was conducted by the National Center for Health Statistics to find the mean height of males in the U.S. The histogram shows the distribution of heights for the 2485 respondents in the 20–29 age group. In this group, the mean was 69.2 inches and the standard deviation was 2.9 inches.

[Histogram: Heights of Men in the U.S., Ages 20–29; horizontal axis: Height (in inches), 62 to 78; vertical axis: Relative frequency (in percent)]

About what percent of the heights lie within two standard deviations of the mean?

Many real-life data sets have distributions that are approximately symmetric and bell shaped. Later in the text, you will study this type of distribution in detail. For now, however, the following Empirical Rule can help you see how valuable the standard deviation can be as a measure of variation.

Empirical Rule (or 68-95-99.7 Rule)
For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics.
1. About 68% of the data lie within one standard deviation of the mean.
2. About 95% of the data lie within two standard deviations of the mean.
3. About 99.7% of the data lie within three standard deviations of the mean.

[Figure: Bell-Shaped Distribution; horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. On each side of the mean, 34% of the data lie within one standard deviation, 13.5% lie between one and two standard deviations, and 2.35% lie between two and three standard deviations. In total, 68% lie within 1 standard deviation, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.]
Insight
Data values that lie more than two standard deviations from the mean are considered unusual. Data values that lie more than three standard deviations from the mean are very unusual.

EXAMPLE 7
Using the Empirical Rule
In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20–29) was 64 inches, with a sample standard deviation of 2.75 inches. Estimate the percent of the women whose heights are between 64 inches and 69.5 inches.
SOLUTION
The distribution of the women's heights is shown. Because the distribution is bell shaped, you can use the Empirical Rule. The mean height is 64, so when you add two standard deviations to the mean height, you get

$$\bar{x} + 2s = 64 + 2(2.75) = 69.5.$$

[Figure: bell-shaped curve titled Heights of Women in the U.S., Ages 20–29]

Because 69.5 is two standard deviations above the mean height, the percent of the heights between 64 inches and 69.5 inches is 34% + 13.5% = 47.5%.

Interpretation So, 47.5% of women are between 64 and 69.5 inches tall.
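The arithmetic in Example 7 can be sketched as follows. The one-sided percentages (34%, 13.5%, 2.35%) are taken from the bell-shaped distribution figure; the names `EMPIRICAL_BANDS` and `percent_between_mean_and` are hypothetical.

```python
# Percent of data in each one-sided band of a bell-shaped distribution:
# 0 to 1, 1 to 2, and 2 to 3 standard deviations from the mean.
EMPIRICAL_BANDS = [34.0, 13.5, 2.35]


def percent_between_mean_and(k):
    """Approximate percent of data between the mean and k standard deviations
    on one side of the mean, for k = 1, 2, or 3."""
    if k not in (1, 2, 3):
        raise ValueError("the Empirical Rule is stated for k = 1, 2, or 3")
    return sum(EMPIRICAL_BANDS[:k])


x_bar, s = 64, 2.75    # sample mean and standard deviation from Example 7
upper = x_bar + 2 * s  # two standard deviations above the mean

print(upper)                        # 69.5
print(percent_between_mean_and(2))  # 34 + 13.5 = 47.5
```

Keep in mind that these percentages are approximations that apply only to (symmetric) bell-shaped distributions.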
Try It Yourself 7
Estimate the percent of the heights that are between 61.25 and 64 inches.
a. How many standard deviations is 61.25 to the left of 64?
b. Use the Empirical Rule to estimate the percent of the data between $\bar{x} - s$ and $\bar{x}$.
c. Interpret the result in the context of the data.
Answer: Page A32

[Figure axis marks: 55.75 ($\bar{x} - 3s$), 58.5 ($\bar{x} - 2s$), 61.25 ($\bar{x} - s$), 64 ($\bar{x}$), 66.75 ($\bar{x} + s$), 69.5 ($\bar{x} + 2s$), 72.25 ($\bar{x} + 3s$); regions labeled 34% and 13.5%]
The Empirical Rule applies only to (symmetric) bell-shaped distributions. What if the distribution is not bell-shaped, or what if the shape of the distribution is not known? The following theorem applies to all distributions. It is named after the Russian statistician Pafnuti Chebychev (1821–1894).

Chebychev's Theorem
The portion of any data set lying within $k$ standard deviations ($k > 1$) of the mean is at least

$$1 - \frac{1}{k^2}.$$

• $k = 2$: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.
• $k = 3$: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%, of the data lie within 3 standard deviations of the mean.

Note to Instructor
Explain that $k$ represents the number of standard deviations from the mean. Ask students to calculate the percents for $k = 4$ and $k = 5$. Then ask them what happens as $k$ increases. Point out that it is helpful to draw a number line and mark it in units of standard deviations.

Insight
In Example 8, Chebychev's Theorem tells you that at least 75% of the population of Florida is under the age of 88.8. This is a true statement, but it is not nearly as strong a statement as could be made from reading the histogram. In general, Chebychev's Theorem gives cautious estimates of the percent lying within $k$ standard deviations of the mean. Remember, the theorem applies to all distributions.
EXAMPLE 8
Using Chebychev's Theorem
The age distributions for Alaska and Florida are shown in the histograms. Decide which is which. Apply Chebychev's Theorem to the data for Florida using $k = 2$. What can you conclude?

[Figure: two histograms of population (in thousands) by age (in years), with class midpoints 5 through 85. Left histogram: $\mu = 31.6$, $\sigma = 19.5$, vertical scale to 120. Right histogram: $\mu = 39.2$, $\sigma = 24.8$, vertical scale to 2500.]
SOLUTION
The histogram on the right shows Florida's age distribution. You can tell because the population is greater and older. Moving two standard deviations to the left of the mean puts you below 0, because $\mu - 2\sigma = 39.2 - 2(24.8) = -10.4$. Moving two standard deviations to the right of the mean puts you at $\mu + 2\sigma = 39.2 + 2(24.8) = 88.8$. By Chebychev's Theorem, you can say that at least 75% of the population of Florida is between 0 and 88.8 years old.
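Chebychev's bound and the interval from Example 8 can be sketched in a few lines. The function name `chebychev_bound` is a hypothetical choice; the mean and standard deviation are the Florida values used in the solution.

```python
def chebychev_bound(k):
    """Minimum portion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


mu, sigma = 39.2, 24.8  # Florida values from Example 8
k = 2

low = mu - k * sigma
high = mu + k * sigma

print(round(low, 1), round(high, 1))  # -10.4 88.8
print(chebychev_bound(k))             # 0.75
print(round(chebychev_bound(3), 3))   # 0.889
```

Because an age cannot be negative, the lower end of the interval is read as 0 when the result is interpreted.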
Try It Yourself 8
Apply Chebychev's Theorem to the data for Alaska using $k = 2$.
a. Subtract two standard deviations from the mean.
b. Add two standard deviations to the mean.
c. Apply Chebychev's Theorem for $k = 2$ and interpret the results.
Answer: Page A32
Standard Deviation for Grouped Data
In Section 2.1, you learned that large data sets are usually best represented by a frequency distribution. The formula for the sample standard deviation for a frequency distribution is

$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$

where $n = \sum f$ is the number of entries in the data set.

Study Tip
Remember that formulas for grouped data require you to multiply by the frequencies.

EXAMPLE 9
Finding the Standard Deviation for Grouped Data
You collect a random sample of the number of children per household in a region. The results are shown in the table. Find the sample mean and the sample standard deviation of the data set.

Number of Children in 50 Households
1 1 1 1 3 1 3 2 4 0
3 2 1 5 0 1 6 3 1 3
1 2 0 0 3 6 6 0 1 0
1 1 0 3 1 0 1 1 2 2
1 0 0 6 1 1 2 1 2 4

SOLUTION
These data could be treated as 50 individual entries, and you could use the formulas for mean and standard deviation. Because there are so many repeated numbers, however, it is easier to use a frequency distribution.

$x$    $f$    $xf$
0      10     0
1      19     19
2      7      14
3      7      21
4      2      8
5      1      5
6      4      24
       $\Sigma = 50$   $\Sigma = 91$

$$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8 \quad \text{(Sample mean)}$$

$x - \bar{x}$    $(x - \bar{x})^2$    $(x - \bar{x})^2 f$
-1.8             3.24                 32.40
-0.8             0.64                 12.16
0.2              0.04                 0.28
1.2              1.44                 10.08
2.2              4.84                 9.68
3.2              10.24                10.24
4.2              17.64                70.56
                                      $\Sigma = 145.40$

Use the sum of squares to find the sample standard deviation.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.4}{49}} \approx 1.7 \quad \text{(Sample standard deviation)}$$
So, the sample mean is 1.8 children, and the standard deviation deviation is 1.7 children. children.
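As a cross-check on the table work above, the grouped-data formulas can be sketched in Python. The function name `grouped_mean_and_std` is an illustrative choice, not something from the text. The sketch uses the unrounded mean 1.82 instead of the rounded 1.8, so its sum of squares comes out near 145.38 instead of 145.40; both versions round to the same one-decimal answers.

```python
from math import sqrt


def grouped_mean_and_std(values, freqs):
    """Sample mean and sample standard deviation from a frequency distribution."""
    n = sum(freqs)  # n is the sum of the frequencies
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    # Each squared deviation is weighted by its frequency.
    sum_sq = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    return mean, sqrt(sum_sq / (n - 1))


children = [0, 1, 2, 3, 4, 5, 6]        # x: number of children
households = [10, 19, 7, 7, 2, 1, 4]    # f: number of households

mean, std = grouped_mean_and_std(children, households)
print(round(mean, 1), round(std, 1))
```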
Try It Yourself 9 Change three of the 6s in the data set to 4s. How does this change affect the sample mean and sample standard deviation?
a. Write the first three columns of a frequency distribution.
b. Find the sample mean.
c. Complete the last three columns of the frequency distribution.
d. Find the sample standard deviation.
Answer: Page A32
When a frequency distribution has classes, you can estimate the sample mean and standard deviation by using the midpoint of each class.

EXAMPLE 10 Using Midpoints of Classes
The circle graph at the right shows the results of a survey in which 1000 adults were asked how much they spend in preparation for personal travel each year. Make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set. (Adapted from Travel Industry Association of America)

SOLUTION
Begin by using a frequency distribution to organize the data.

Class       x        f           xf
0–99        49.5     380         18,810
100–199     149.5    230         34,385
200–299     249.5    210         52,395
300–399     349.5    50          17,475
400–499     449.5    60          26,970
500+        599.5    70          41,965
                     Σ = 1,000   Σ = 192,000

Study Tip: When a class is open, as in the last class, you must assign a single value to represent the midpoint. For this example, we selected 599.5.

\[ \bar{x} = \frac{\sum xf}{n} = \frac{192{,}000}{1{,}000} = 192 \]    Sample mean

x − x̄      (x − x̄)²       (x − x̄)² f
−142.5     20,306.25      7,716,375.0
−42.5      1,806.25       415,437.5
57.5       3,306.25       694,312.5
157.5      24,806.25      1,240,312.5
257.5      66,306.25      3,978,375.0
407.5      166,056.25     11,623,937.5
                          Σ = 25,668,750.0

Use the sum of squares to find the sample standard deviation.

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{25{,}668{,}750}{999}} \approx 160.3 \]    Sample standard deviation

So, the sample mean is $192 per year, and the sample standard deviation is about $160.3 per year.
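The midpoint step can also be sketched in Python. The class limits and frequencies below are taken from the example; the variable names, the use of `None` for the open upper limit, and the constant `OPEN_CLASS_VALUE` are assumptions made for this sketch only.

```python
from math import sqrt

# (lower limit, upper limit, frequency); None marks the open upper limit.
classes = [
    (0, 99, 380),
    (100, 199, 230),
    (200, 299, 210),
    (300, 399, 50),
    (400, 499, 60),
    (500, None, 70),
]
OPEN_CLASS_VALUE = 599.5  # value the example assigns to the "500+" class


def midpoint(lower, upper):
    """Class midpoint, or the chosen stand-in value for an open class."""
    if upper is None:
        return OPEN_CLASS_VALUE
    return (lower + upper) / 2


midpoints = [midpoint(lo, hi) for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
sum_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
std = sqrt(sum_sq / (n - 1))

print(n, mean, round(std, 1))
```

Changing `OPEN_CLASS_VALUE` is a quick way to see how sensitive the estimates are to the value chosen for the open class.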
Try It Yourself 10 In the frequency distribution, 599.5 was chosen to represent the class of $500 or more. How would the sample mean and standard deviation change if you used 650 to represent this class?
a. Write the first four columns of a frequency distribution.
b. Find the sample mean.
c. Complete the last three columns of the frequency distribution.
d. Find the sample standard deviation.
Answer: Page A32
2.4 Exercises

Building Basic Skills and Vocabulary
In Exercises 1 and 2, find the range, mean, variance, and standard deviation of the population data set.

1. 11 10 8 4 6 7 11 6 11 7
2. 13 23 15 13 18 13 15 14 20 20 18 17 20 13

In Exercises 3 and 4, find the range, mean, variance, and standard deviation of the sample data set.

3. 15 8 12 5 19 14 8 6 13
4. 24 26 27 23 9 14 8 8 26 15 15 27 11

Graphical Reasoning In Exercises 5 and 6, find the range of the data set represented by the display or graph.

5. Key: 2 | 3 = 23
2 | 3 9
3 | 0 0 2 3 6 7
4 | 0 1 2 3 3 8
5 | 0 1 1 9
6 | 1 2 9 9
7 | 5 9
8 | 4 8
9 | 0 2 5 6

6. [Histogram: Bride’s Age at First Marriage; horizontal axis: Age (in years), 24 to 34; vertical axis: Frequency]

7. Explain how to find the range of a data set. What is an advantage of using the range as a measure of variation? What is a disadvantage?
8. Explain how to find the deviation of an entry in a data set. What is the sum of all the deviations in any data set?
9. Why is the standard deviation used more frequently than the variance? (Hint: Consider the units of the variance.)
10. Explain the relationship between variance and standard deviation. Can either of these measures be negative? Explain. Find a data set for which n = 5, x̄ = 7, and s = 0.

Answers to Exercises 1–10
1. Range = 7, mean = 8.1, variance ≈ 5.7, standard deviation ≈ 2.4
2. Range = 10, mean ≈ 16.6, variance ≈ 10.2, standard deviation ≈ 3.2
3. Range = 14, mean ≈ 11.1, variance ≈ 21.6, standard deviation ≈ 4.6
4. Range = 19, mean ≈ 17.9, variance ≈ 59.6, standard deviation ≈ 7.7
5. 73
6. 10
7. The range is the difference between the maximum and minimum values of a data set. The advantage of the range is that it is easy to calculate. The disadvantage is that it uses only two entries from the data set.
8. A deviation (x − μ) is the difference between an observation x and the mean of the data μ. The sum of the deviations is always zero.
9. The units of variance are squared. Its units are meaningless. (Example: dollars²)
10. The standard deviation is the positive square root of the variance. The standard deviation and variance can never be negative. Squared deviations can never be negative. {7, 7, 7, 7, 7}
11. Marriage Ages The ages of 10 grooms at their first marriage are given below.
24.3 46.6 41.6 32.9 26.8 39.8 21.5 45.7 33.9 35.1
(a) Find the range of the data set.
(b) Change 46.6 to 66.6 and find the range of the new data set.
(c) Compare your answer to part (a) with your answer to part (b).
12. Find a population data set that contains six entries, has a mean of 5, and has a standard deviation of 2.

Using and Interpreting Concepts
13. Graphical Reasoning Both data sets have a mean of 165. One has a standard deviation of 16, and the other has a standard deviation of 24. Which is which? Explain your reasoning.
(a) Key: 12 | 8 = 128
12 | 8 9
13 | 5 5 8
14 | 1 2
15 | 0 0 6 7
16 | 4 5 9
17 | 1 3 6 8
18 | 0 8 9
19 | 6
20 | 3 5 7
(b)
12 |
13 | 1
14 | 2 3 5
15 | 0 4 5 6 8
16 | 1 1 2 3 3 3
17 | 1 5 8 8
18 | 2 3 4 5
19 | 0 2
20 |

14. Graphical Reasoning Both data sets represented below have a mean of 50. One has a standard deviation of 2.4, and the other has a standard deviation of 5. Which is which? Explain your reasoning.
[Histograms (a) and (b); horizontal axis: Data value, 42 to 60; vertical axis: Frequency]

15. Writing Describe the difference between the calculation of population standard deviation and sample standard deviation.
16. Writing Given a data set, how do you know whether to calculate σ or s?
17. Salary Offers You are applying for a job at two companies. Company A offers starting salaries with μ = $31,000 and σ = $1000. Company B offers starting salaries with μ = $31,000 and σ = $5000. From which company are you more likely to get an offer of $33,000 or more?
18. Golf Strokes An Internet site compares the strokes per round of two professional golfers. Which golfer is more consistent: Player A with μ = 71.5 strokes and σ = 2.3 strokes, or Player B with μ = 70.1 strokes and σ = 1.2 strokes?

Answers to Exercises 11–18
11. (a) Range = 25.1 (b) Range = 45.1 (c) Changing the maximum value of the data set greatly affects the range.
12. {3, 3, 3, 7, 7, 7}
13. (a) has a standard deviation of 24 and (b) has a standard deviation of 16, because the data in (a) have more variability.
14. (a) has a standard deviation of 2.4 and (b) has a standard deviation of 5 because the data in (b) have more variability.
15. When calculating the population standard deviation, you divide the sum of the squared deviations by n, then take the square root of that value. When calculating the sample standard deviation, you divide the sum of the squared deviations by n − 1, then take the square root of that value.
16. When given a data set, one would have to determine if it represented the population or was a sample taken from the population. If the data are a population, then σ is calculated. If the data are a sample, then s is calculated.
17. Company B
18. Player B
Comparing Two Data Sets In Exercises 19–22, you are asked to compare two data sets and interpret the results.

19. Annual Salaries Sample annual salaries (in thousands of dollars) for municipal employees in Los Angeles and Long Beach are listed.
Los Angeles: 20.2 26.1 20.9 32.1 35.9 23.0 28.2 31.6 18.3
Long Beach: 20.9 18.2 20.8 21.1 26.5 26.9 24.2 25.1 22.2
(a) Find the range, variance, and standard deviation of each data set.
(b) Interpret the results in the context of the real-life setting.

20. Annual Salaries Sample annual salaries (in thousands of dollars) for municipal employees in Dallas and Houston are listed.
Dallas: 34.9 25.7 17.3 16.8 26.8 24.7 29.4 32.7 25.5
Houston: 25.6 23.2 26.7 27.7 25.4 26.4 18.3 26.1 31.3
(a) Find the range, variance, and standard deviation of each data set.
(b) Interpret the results in the context of the real-life setting.

21. SAT Scores Sample SAT scores for eight males and eight females are listed.
Male SAT scores: 1059 1328 1175 1123 923 1017 1214 1042
Female SAT scores: 1226 965 841 1053 1056 1393 1312 1222
(a) Find the range, variance, and standard deviation of each data set.
(b) Interpret the results in the context of the real-life setting.

22. Annual Salaries Sample annual salaries (in thousands of dollars) for public and private elementary school teachers are listed.
Public teachers: 38.6 38.1 38.7 36.8 34.8 35.9 39.9 36.2
Private teachers: 21.8 18.4 20.3 17.6 19.7 18.3 19.4 20.8
(a) Find the range, variance, and standard deviation of each data set.
(b) Interpret the results in the context of the real-life setting.

Reasoning with Graphs In Exercises 23–26, you are asked to compare three data sets.

23. (a) Without calculating, which data set has the greatest sample standard deviation? Which has the least sample standard deviation? Explain your reasoning.
[Histograms (i), (ii), and (iii); horizontal axis: Data value, 4 to 10; vertical axis: Frequency]
(b) How are the data sets the same? How do they differ?

Answers to Exercises 19–23
19. (a) Los Angeles: 17.6, 37.35, 6.11; Long Beach: 8.7, 8.71, 2.95 (b) It appears from the data that the annual salaries in Los Angeles are more variable than the salaries in Long Beach.
20. (a) Dallas: 18.1, 37.33, 6.11; Houston: 13, 12.26, 3.50 (b) It appears from the data that the annual salaries in Dallas are more variable than the salaries in Houston.
21. (a) Males: 405; 16,225.3; 127.4; Females: 552; 34,575.1; 185.9 (b) It appears from the data that the SAT scores for females are more variable than the SAT scores for males.
22. (a) Public teachers: 5.1, 2.95, 1.72; Private teachers: 4.2, 1.99, 1.41 (b) It appears from the data that the annual salaries for public teachers are more variable than the salaries for private teachers.
23. (a) Greatest sample standard deviation: (ii). Data set (ii) has more entries that are farther away from the mean. Least sample standard deviation: (iii). Data set (iii) has more entries that are close to the mean. (b) The three data sets have the same mean but have different standard deviations.
24. (a) Without calculating, which data set has the greatest sample standard deviation? Which has the least sample standard deviation? Explain your reasoning.
(i) Key: 4 | 1 = 41
0 | 9
1 | 5 8
2 | 3 3 7 7
3 | 2 5
4 | 1
(ii) Key: 4 | 1 = 41
0 | 9
1 | 5
2 | 3 3 3 7 7 7
3 | 5
4 | 1
(iii) Key: 4 | 1 = 41
0 |
1 | 5
2 | 3 3 3 3 7 7 7 7
3 | 5
4 |
(b) How are the data sets the same? How do they differ?

25. (a) Without calculating, which data set has the greatest sample standard deviation? Which has the least sample standard deviation? Explain your reasoning.
[Plots (i), (ii), and (iii), each on a number line from 10 to 14]
(b) How are the data sets the same? How do they differ?

26. (a) Without calculating, which data set has the greatest sample standard deviation? Which has the least sample standard deviation? Explain your reasoning.
[Plots (i), (ii), and (iii), each on a number line from 1 to 8]
(b) How are the data sets the same? How do they differ?

27. Writing Discuss the similarities and the differences between the Empirical Rule and Chebychev’s Theorem.
28. Writing What must you know about a data set before you can use the Empirical Rule?

Using the Empirical Rule In Exercises 29–34, you are asked to use the Empirical Rule.

29. The mean value of land and buildings per acre from a sample of farms is $1000, with a standard deviation of $200. The data set has a bell-shaped distribution. Estimate the percent of farms whose land and building values per acre are between $800 and $1200.

Answers to Exercises 24–29
24. (a) Greatest sample standard deviation: (i). Data set (i) has more entries that are farther away from the mean. Least sample standard deviation: (iii). Data set (iii) has more entries that are close to the mean. (b) The three data sets have the same mean, median, and mode but have different standard deviations.
25. (a) Greatest sample standard deviation: (ii). Data set (ii) has more entries that are farther away from the mean. Least sample standard deviation: (iii). Data set (iii) has more entries that are close to the mean. (b) The three data sets have the same mean, median, and mode but have different standard deviations.
26. (a) Greatest sample standard deviation: (iii). Data set (iii) has more entries that are farther away from the mean. Least sample standard deviation: (i). Data set (i) has more entries that are close to the mean. (b) The three data sets have the same mean and median but have different modes and standard deviations.
27. Similarity: Both estimate proportions of the data contained within k standard deviations of the mean. Difference: The Empirical Rule assumes the distribution is bell shaped; Chebychev’s Theorem makes no such assumption.
28. You must know that the distribution is bell shaped.
29. 68%
30. The mean value of land and buildings per acre from a sample of farms is $1200, with a standard deviation of $350. Between what two values do about 95% of the data lie? (Assume the data set has a bell-shaped distribution.)
31. Using the sample statistics from Exercise 29, do the following. (Assume the number of farms in the sample is 75.)
(a) Use the Empirical Rule to estimate the number of farms whose land and building values per acre are between $800 and $1200.
(b) If 25 additional farms were sampled, about how many of these farms would you expect to have land and building values between $800 per acre and $1200 per acre?
32. Using the sample statistics from Exercise 30, do the following. (Assume the number of farms in the sample is 40.)
(a) Use the Empirical Rule to estimate the number of farms whose land and building values per acre are between $500 and $1900.
(b) If 20 additional farms were sampled, about how many of these farms would you expect to have land and building values between $500 per acre and $1900 per acre?
33. Using the sample statistics from Exercise 29 and the Empirical Rule, determine which of the following farms, whose land and building values per acre are given, are outliers (more than two standard deviations from the mean).
$1250, $1375, $1125, $1450, $550, $800
34. Using the sample statistics from Exercise 30 and the Empirical Rule, determine which of the following farms, whose land and building values per acre are given, are outliers (more than two standard deviations from the mean).
$1875, $1950, $475, $600, $2050, $1600
35. Chebychev’s Theorem Old Faithful is a famous geyser at Yellowstone National Park. From a sample with n = 32, the mean duration of Old Faithful’s eruptions is 3.32 minutes and the standard deviation is 1.09 minutes. Using Chebychev’s Theorem, determine at least how many of the eruptions lasted between 1.14 minutes and 5.5 minutes. (Source: Yellowstone National Park)
36. Chebychev’s Theorem The mean time in a women’s 400-meter dash is 52.37 seconds, with a standard deviation of 2.15. Apply Chebychev’s Theorem to the data using k = 2. Interpret the results.

Calculating Using Grouped Data In Exercises 37–44, use the grouped data formulas to find the indicated mean and standard deviation.

37. Pets per Household The results of a random sample of the number of pets per household in a region are shown in the histogram. Estimate the sample mean and the sample standard deviation of the data set.
[Histogram: horizontal axis: Number of pets, 0 to 4; vertical axis: Number of households]

Answers to Exercises 30–37
30. Between $500 and $1900
31. (a) 51 (b) 17
32. (a) 38 (b) 19
33. $1250, $1375, $1450, $550
34. $1950, $475, $2050
35. 24
36. (48.07, 56.67); so, at least 75% of the 400-meter dash times lie between 48.07 and 56.67 seconds.
37. Sample mean ≈ 2.1; sample standard deviation ≈ 1.3
38. Cars per Household A random sample of households in a region and the number of cars per household are shown in the histogram. Estimate the sample mean and the sample standard deviation of the data set.
[Histogram: horizontal axis: Number of cars, 0 to 3; vertical axis: Number of households]

39. Football Wins The number of wins for each National Football League team in 2003 are listed. Make a frequency distribution (using five classes) for the data set. Then approximate the population mean and the population standard deviation of the data set. (Source: National Football League)
14 10 6 6 10 8 6 5 12 12 5 5 13 10 4 4
12 10 5 4 10 9 7 5 11 8 7 5 12 10 7 4

40. Water Consumption The number of gallons of water consumed per day by a small village are listed. Make a frequency distribution (using five classes) for the data set. Then approximate the population mean and the population standard deviation of the data set.
167 180 192 173 145 151 174 175 178 160
195 224 244 146 162 146 177 163 149 188

41. Amount of Caffeine The amount of caffeine in a sample of five-ounce servings of brewed coffee is shown in the histogram. Make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set.
[Figure for Exercise 41: histogram; horizontal axis: Caffeine (in milligrams), class midpoints 70.5, 92.5, 114.5, 136.5, 158.5; vertical axis: Number of five-ounce servings]

42. Supermarket Trips Thirty people were randomly selected and asked how many trips to the supermarket they made in the past week. The responses are shown in the histogram. Make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set.
[Figure for Exercise 42: histogram; horizontal axis: Number of supermarket trips, 0 to 4; vertical axis: Number responding]

Answers to Exercises 38–42
38. Sample mean ≈ 1.7; sample standard deviation ≈ 0.8
39. See Odd Answers, page A##
40. See Selected Answers, page A##
41. See Odd Answers, page A##
42. See Selected Answers, page A##
43. U.S. Population The estimated distribution (in millions) of the U.S. population by age for the year 2009 is shown in the circle graph. Make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set. Use 70 as the midpoint for “65 years and over.” (Source: U.S. Census Bureau)
[Figure for Exercise 43: circle graph; categories: Under 5 years, 5–13 years, 14–17 years, 18–24 years, 25–34 years, 35–44 years, 45–64 years, 65 years and over]

44. Japan’s Population Japan’s estimated population for the year 2010 is shown in the bar graph. Make a frequency distribution for the data. Then use the table to estimate the sample mean and the sample standard deviation of the data set. (Source: U.S. Census Bureau, International Data Base)
[Figure for Exercise 44: bar graph; horizontal axis: Age (in years); vertical axis: Population (in millions)]

Answers to Exercises 43–45
43. See Odd Answers, page A##
44. See Selected Answers, page A##
45. CV heights = (3.44 / 72.75) · 100 ≈ 4.73; CV weights = (18.47 / 187.83) · 100 ≈ 9.83. It appears that weight is more variable than height.
Extending Concepts
45. Coefficient of Variation The coefficient of variation CV describes the standard deviation as a percent of the mean. Because it has no units, you can use the coefficient of variation to compare data with different units.

\[ CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\% \]

The following table shows the heights (in inches) and weights (in pounds) of the members of a basketball team. Find the coefficient of variation for each data set. What can you conclude?

Heights: 72 74 68 76 74 69 72 79 70 69 77 73
Weights: 180 168 225 201 189 192 197 162 174 171 185 210
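The coefficient of variation defined in Exercise 45 can be sketched with Python's standard `statistics` module. This sketch assumes the team data are treated as a sample, so it uses `stdev` (the n − 1 version); the helper name `coefficient_of_variation` is illustrative and not from the text.

```python
from statistics import mean, stdev

heights = [72, 74, 68, 76, 74, 69, 72, 79, 70, 69, 77, 73]              # inches
weights = [180, 168, 225, 201, 189, 192, 197, 162, 174, 171, 185, 210]  # pounds


def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percent of the mean."""
    return stdev(data) / mean(data) * 100


cv_heights = coefficient_of_variation(heights)
cv_weights = coefficient_of_variation(weights)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Because both results are percents with no units, they can be compared directly even though one data set is in inches and the other in pounds.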
46. Shortcut Formula You used SSx = Σ(x − x̄)² when calculating variance and standard deviation. An alternative formula that is sometimes more convenient for hand calculations is

\[ SS_x = \sum x^2 - \frac{\left(\sum x\right)^2}{n}. \]

You can find the sample variance by dividing the sum of squares by n − 1 and the sample standard deviation by finding the square root of the sample variance.
(a) Use the shortcut formula to calculate the sample standard deviation for the data set given in Exercise 21.
(b) Compare your results with those obtained in Exercise 21.

47. Team Project: Scaling Data Consider the following sample data set.
100 200 300 400 500 600 700 800 900 1000
(a) Find x̄ and s.
(b) Multiply each entry by 10. Find x̄ and s for the revised data.
(c) Divide the original data by 10. Find x̄ and s for the revised data.
(d) What can you conclude from the results of (a), (b), and (c)?

48. Team Project: Shifting Data Consider the following sample data set.
100 200 300 400 500 600 700 800 900 1000
(a) Find x̄ and s.
(b) Add 10 to each entry. Find x̄ and s for the revised data.
(c) Subtract 10 from the original data. Find x̄ and s for the revised data.
(d) What can you conclude from the results of (a), (b), and (c)?

49. Chebychev’s Theorem At least 99% of the data in any data set lie within how many standard deviations of the mean? Explain how you obtained your answer.

50. Pearson’s Index of Skewness The English statistician Karl Pearson (1857–1936) introduced a formula for the skewness of a distribution.

\[ P = \frac{3(\bar{x} - \text{median})}{s} \]    Pearson’s index of skewness

Most distributions have an index of skewness between −3 and 3. When P > 0, the data are skewed right. When P < 0, the data are skewed left. When P = 0, the data are symmetric. Calculate the coefficient of skewness for each distribution. Describe the shape of each.
(a) x̄ = 17, s = 2.3, median = 19
(b) x̄ = 32, s = 5.1, median = 25

Answers to Exercises 46–50
46. (a) Male: 127.4; Female: 185.9
47. (a) x̄ = 550, s ≈ 302.8 (b) x̄ = 5500, s ≈ 3028 (c) x̄ = 55, s ≈ 30.28 (d) When each entry is multiplied by a constant k, the new sample mean is k · x̄, and the new sample standard deviation is k · s.
48. (a) x̄ = 550, s ≈ 302.8 (b) x̄ = 560, s ≈ 302.8 (c) x̄ = 540, s ≈ 302.8 (d) Adding or subtracting a constant k to each entry makes the new sample mean x̄ + k with the sample standard deviation being unaffected.
49. 10. Set 1 − 1/k² = 0.99 and solve for k.
50. (a) P ≈ −2.61. The data are skewed left. (b) P ≈ 4.12. The data are skewed right.
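The shortcut formula for the sum of squares in Exercise 46 can be compared numerically with the defining formula. The sketch below does this for the data set listed in Exercise 47; the two function names are made up for illustration.

```python
from math import sqrt

data = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]


def sum_of_squares_definition(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_of_squares_shortcut(xs):
    """Sum of x squared, minus (sum of x) squared divided by n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


ss_def = sum_of_squares_definition(data)
ss_short = sum_of_squares_shortcut(data)

# Sample standard deviation: divide by n - 1, then take the square root.
s = sqrt(ss_short / (len(data) - 1))
print(ss_def, ss_short, round(s, 1))
```

The shortcut needs only the running totals of x and x², which is why it suits hand calculation; with floating-point data of very different magnitudes the defining formula is usually the numerically safer choice.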
Case Study
Sunglass Sales in the United States
WWW.SUNGLASSASSOCIATION.COM

The Sunglass Association of America is a not-for-profit association of manufacturers and distributors of sunglasses. Part of the association’s mission is to gather and distribute marketing information about the sale of sunglasses. The data presented here are based on surveys administered by Jobson Optical Research International.

Outlet type             Number of locations
Optical Store           34,043
Sunglass Specialty      2,060
Dept. Store             6,866
Discount Dept. Store    10,376
Catalog Showroom        887
General Merchandise     11,868
Supermarket             21,613
Convenience Store       83,613
Chain Drug Store        31,127
Indep. Drug Store       7,034
Chain Apparel Store     26,831
Chain Sports Store      5,760
Indep. Sports Store     14,683

Number (in 1000s) of Pairs of Sunglasses Sold

Price                   $0–$10    $11–$30   $31–$50   $51–$75   $76–$100   $101–$150   $151+
Optical Store           0         290       3,164     1,240     3,654      842         478
Sunglass Specialty      192       708       2,515     1,697     1,145      805         378
Dept. Store             1,224     1,464     1,527     488       38         16          5
Discount Dept. Store    8,793     5,284     147       67        16         8           0
Catalog Showroom        153       100       65        35        29         9           0
General Merchandise     6,147     495       0         0         0          0           0
Supermarket             14,108    316       0         0         0          0           0
Convenience Store       19,726    2,985     0         0         0          0           0
Chain Drug Store        17,883    3,432     50        0         0          0           0
Indep. Drug Store       1,352     1,110     12        0         0          0           0
Chain Apparel Store     3,464     1,804     186       112       40         17          7
Chain Sports Store      672       526       430       72        45         18          4
Indep. Sports Store     875       1,997     1,320     528       206        85          11

Exercises
1. Mean Price Estimate the mean price of a pair of sunglasses sold at (a) an optical store, (b) a sunglass specialty store, and (c) a department store. Use $200 as the midpoint for $151+.
2. Revenue Which type of outlet had the greatest total revenue? Explain your reasoning.
3. Revenue Which type of outlet had the greatest revenue per location? Explain your reasoning.
4. Standard Deviation Estimate the standard deviation for the number of pairs of sunglasses sold at (a) optical stores, (b) sunglass specialty stores, and (c) department stores.
5. Standard Deviation Of the 13 distributions, which has the greatest standard deviation? Explain your reasoning.
6. Bell-Shaped Distribution Of the 13 distributions, which is more bell shaped? Explain.
2.5 Measures of Position

What You Should Learn
• How to find the first, second, and third quartiles of a data set
• How to find the interquartile range of a data set
• How to represent a data set graphically using a box-and-whisker plot
• How to interpret other fractiles such as percentiles
• How to find and interpret the standard score (z-score)

Quartiles • Percentiles and Other Fractiles • The Standard Score
Quartiles
In this section, you will learn how to use fractiles to specify the position of a data entry within a data set. Fractiles are numbers that partition, or divide, an ordered data set into equal parts. For instance, the median is a fractile because it divides an ordered data set into two equal parts.

DEFINITION The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set into four equal parts. About one quarter of the data fall on or below the first quartile Q1. About one half the data fall on or below the second quartile Q2 (the second quartile is the same as the median of the data set). About three quarters of the data fall on or below the third quartile Q3.
EXAMPLE 1 Finding the Quartiles of a Data Set
The test scores of 15 employees enrolled in a CPR training course are listed. Find the first, second, and third quartiles of the test scores.
13 9 18 15 14 21 7 10 11 20 5 18 37 16 17

SOLUTION
First, order the data set and find the median Q2. Once you find Q2, divide the data set into two halves. The first and third quartiles are the medians of the lower and upper halves of the data set.

Lower half: 5 7 9 10 11 13 14 (Q1 = 10)
Median: Q2 = 15
Upper half: 16 17 18 18 20 21 37 (Q3 = 18)

Interpretation About one fourth of the employees scored 10 or less; about one half scored 15 or less; and about three fourths scored 18 or less.
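The median-of-halves procedure in this example can be sketched in Python as follows. The function name `quartiles_by_halves` is an illustrative choice; the sketch assumes, as in the example, that when the number of entries is odd the median itself is left out of both halves.

```python
from statistics import median


def quartiles_by_halves(data):
    """Q1, Q2, Q3 using the median-of-halves method.

    With an odd number of entries, the middle entry (the median)
    belongs to neither half.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # entries below the median position
    upper = ordered[(n + 1) // 2 :]    # entries above the median position
    return median(lower), median(ordered), median(upper)


scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q2, q3 = quartiles_by_halves(scores)
print(q1, q2, q3)
```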
Try It Yourself 1 Find the first, second, and third quartiles for the ages of the Akhiok residents using the population data set listed in the Chapter Opener on page 33.
a. Order the data set.
b. Find the median Q2.
c. Find the first and third quartiles Q1 and Q3.
Answer: Page A33
2
EXAMPLE
Using Technology to Find Quartiles The tuition costs (in thousands of dollars) for 25 liberal arts colleges are listed. Use a calculator or a computer to find the first, first, second, and third quartiles. quartiles. 23 25 30 23 20 22 21 15 25 24 30 25 30 20 23 29 20 19 22 23 29 23 28 22 28
SOLUTION
MINITAB, MINIT AB, Exc Excel, el, and the the TITI-83 83 each each have have feat features ures tha thatt automatically calculate calculate quartiles. quartiles. Try using this technology to find the first, second, secon d, and third third quartiles quartiles of the tuition tuition data. Fr From om the displays displays,, you can see that Q1 = 21.5, Q2 = 23, and Q3 = 28.
S tudy Tip
ind o f in to ys t ra l wa ys e v e s e r a e T her t.. taa se t les o f a da t t he quar t i le ind ou f in yo f ho w y o s s e le l d r a g Re ts are l ts les, t he resu t he quar t i le han one f b y more t y o f f rare l y n taance, i in ns t y.. For i in trr y taa en t da t le, t r a u irs t q i le e 2, t he f ir E xamp l le s is i e l l,, 2 2 ned b y E xc erm i in te as de t 1.5. 21 ead o f 2 te ns t i in
MINITAB
Descriptive Statistics
Variable   N    Mean     Median   TrMean   StDev
Tuition    25   23.960   23.000   24.087   3.942
Variable   SE Mean   Minimum   Maximum   Q1       Q3
Tuition    0.788     15.000    30.000    21.500   28.000

Excel
Column A, rows 1 through 25, holds the tuition data:
23 25 30 23 20 22 21 15 25 24 30 25 30 20 23 29 20 19 22 23 29 23 28 22 28
Quartile(A1:A25,1)   22
Quartile(A1:A25,2)   23
Quartile(A1:A25,3)   28

TI-83
1-Var Stats
n=25
minX=15
Q1=21.5
Med=23
Q3=28
maxX=30

Note to Instructor For MINITAB and the TI-83, quartiles are found with the following ranks.
Q1: $\frac{1(n + 1)}{4}$   Q2: $\frac{2(n + 1)}{4}$   Q3: $\frac{3(n + 1)}{4}$
Interpretation About one quarter of these colleges charge tuition of $21,500 or less; one half charge $23,000 or less; and about three quarters charge $28,000 or less.
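The difference between the displayed values of Q1 (21.5 versus 22) can be reproduced with Python's standard library. This is a sketch under the assumption that `statistics.quantiles` with `method="exclusive"` (ranks based on n + 1) and `method="inclusive"` (ranks based on n - 1) stand in for the two conventions; it is not a statement about how each product is implemented internally.

```python
from statistics import quantiles

tuition = [23, 25, 30, 23, 20, 22, 21, 15, 25, 24, 30, 25, 30,
           20, 23, 29, 20, 19, 22, 23, 29, 23, 28, 22, 28]

# "exclusive" interpolates at ranks k(n + 1)/4, as in the Note to Instructor
exclusive = quantiles(tuition, n=4, method="exclusive")
# "inclusive" interpolates at ranks k(n - 1)/4 + 1
inclusive = quantiles(tuition, n=4, method="inclusive")

print(exclusive)  # [21.5, 23.0, 28.0]
print(inclusive)  # [22.0, 23.0, 28.0]
```

For this data set the first list agrees with the MINITAB and TI-83 displays and the second agrees with the Excel display, which illustrates why results from different tools can differ slightly.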
Try It Yourself 2 The tuition costs (in thousands of dollars) for 25 universities are listed. Use a calculator or a computer to find the first, second, and third quartiles.
20 26 28 25 31 14 23 15 12 26 29 24 31 19 31 17 15 17 20 31 32 16 21 22 28
a. Enter the data.
b. Calculate the first, second, and third quartiles.
c. What can you conclude?
Insight The IQR is a measure of variation that gives you an idea of how much the middle 50% of the data varies. It can also be used to identify outliers. Any data value that lies more than 1.5 IQRs to the left of Q1 or to the right of Q3 is an outlier. For instance, 37 is an outlier of the 15 test scores in Example 1.
Answer: Page A33
After finding the quartiles of a data set, you can find the interquartile range.
DEFINITION The interquartile range (IQR) of a data set is the difference between the third and first quartiles.
Interquartile range (IQR) = Q3 − Q1
EXAMPLE 3
Finding the Interquartile Range Find the interquartile range of the 15 test scores given in Example 1. What can you conclude from the result?
SOLUTION
From Example 1, you know that Q1 = 10 and Q3 = 18. So, the interquartile range is
IQR = Q3 − Q1 = 18 − 10 = 8.
Interpretation The test scores in the middle portion of the data set vary by at most 8 points.
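The interquartile range and the 1.5 IQR outlier rule from the Insight note can be checked with a few lines of Python. This is an illustrative sketch; the variable names are assumptions, and the quartiles are taken from Example 1 rather than recomputed.

```python
scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q3 = 10, 18                  # quartiles found in Example 1

iqr = q3 - q1                    # interquartile range
low_fence = q1 - 1.5 * iqr       # values below this lie more than 1.5 IQRs left of Q1
high_fence = q3 + 1.5 * iqr      # values above this lie more than 1.5 IQRs right of Q3
outliers = [x for x in scores if x < low_fence or x > high_fence]

print(iqr, low_fence, high_fence)  # 8 -2.0 30.0
print(outliers)                    # [37]
```

Only the score of 37 falls outside the interval from −2 to 30, which matches the outlier named in the Insight.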
Try It Yourself 3 Find the interquartile range for the ages of the Akhiok residents listed in the Chapter Opener on page 33.
a. Find the first and third quartiles, Q1 and Q3.
b. Subtract Q1 from Q3.
c. Interpret the result in the context of the data.
Answer: Page A33
Another important application of quartiles is to represent data sets using box-and-whisker plots. A box-and-whisker plot is an exploratory data analysis tool that highlights the important features of a data set. To graph a box-and-whisker plot, you must know the following values.
1. The minimum entry
2. The first quartile Q1
3. The median Q2
4. The third quartile Q3
5. The maximum entry
These five numbers are called the five-number summary of the data set.

GUIDELINES Drawing a Box-and-Whisker Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and maximum entries.
[Figure: a box-and-whisker plot with the box, the whiskers, the minimum entry, Q1, the median Q2, Q3, and the maximum entry labeled]

Picturing the World Of the first 43 U.S. presidents, Theodore Roosevelt was the youngest at the time of inauguration, at the age of 42. Ronald Reagan was the oldest president, inaugurated at the age of 69. The box-and-whisker plot summarizes the ages of the first 43 U.S. presidents at inauguration. (Source: infoplease.com)
[Box-and-whisker plot: Ages of U.S. Presidents at Inauguration; labeled values 42, 51, 55, 58, 69; scale from 40 to 70]
How many U.S. presidents' ages are represented by the box?
EXAMPLE 4
Drawing a Box-and-Whisker Plot
See MINITAB and TI-83 steps on pages 114 and 115.
Draw a box-and-whisker plot that represents the 15 test scores given in Example 1. What can you conclude from the display?
SOLUTION The five-number summary of the test scores is below. Using these five numbers, you can construct the box-and-whisker plot shown.
Insight You can use a box-and-whisker plot to determine the shape of a distribution. Notice that the box-and-whisker plot in Example 4 represents a distribution that is skewed right.
Min = 5   Q1 = 10   Q2 = 15   Q3 = 18   Max = 37
[Box-and-whisker plot: Test Scores in CPR Class; labeled values 5, 10, 15, 18, 37; scale from 5 to 37]
Interpretation You can make several conclusions from the display. One is that about half the scores are between 10 and 18.
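The five-number summary behind the plot in Example 4 can be computed as follows. This is a minimal sketch that assumes the median-of-halves quartiles from Example 1; `five_number_summary` is a hypothetical helper name, not a library function.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # entries before the median position
    upper = data[n // 2 + n % 2:]     # entries after it (middle entry skipped if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
low, q1, q2, q3, high = five_number_summary(scores)
print(low, q1, q2, q3, high)   # 5 10 15 18 37
print(q1 - low, high - q3)     # whisker lengths: 5 19
```

The right whisker (19) is much longer than the left whisker (5), which is one way to see the right skew mentioned in the Insight.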
Try It Yourself 4 Draw a box-and-whisker plot that represents the ages of the residents of Akhiok listed in the chapter opener on page 33.
a. Find the five-number summary of the data set.
b. Construct a horizontal scale and plot the five numbers above it.
c. Draw the box, the vertical line, and the whiskers.
d. Make some conclusions.
Answer: Page A33
Insight Notice that the 25th percentile is the same as Q1; the 50th percentile is the same as Q2, or the median; the 75th percentile is the same as Q3.
Study Tip It is important that you understand what a percentile means. For instance, if the weight of a six-month-old infant is at the 78th percentile, the infant weighs more than 78% of all six-month-old infants. It does not mean that the infant weighs 78% of some ideal weight.
Percentiles and Other Fractiles In addition to using quartiles to specify a measure of position, you can also use percentiles and deciles. These common fractiles are summarized as follows.

Fractiles     Summary                                  Symbols
Quartiles     Divide a data set into 4 equal parts.    Q1, Q2, Q3
Deciles       Divide a data set into 10 equal parts.   D1, D2, D3, ..., D9
Percentiles   Divide a data set into 100 equal parts.  P1, P2, P3, ..., P99
Percentiles are often used in education and health-related fields to indicate how one individual compares with others in a group. They can also be used to identify unusually high or unusually low values. For instance, test scores and children's growth measurements are often expressed in percentiles. Scores or measurements in the 95th percentile and above are unusually high, while those in the 5th percentile and below are unusually low.
EXAMPLE 5
Interpreting Percentiles
The ogive represents the cumulative frequency distribution for SAT test scores of college-bound students in a recent year. What test score represents the 64th percentile? How should you interpret this? (Source: College Board Online)
[Ogive: SAT Scores; horizontal axis Score, 200 to 1600; vertical axis Percentile, 10 to 100]
SOLUTION
From the ogive, you can see that the 64th percentile corresponds to a test score of 1100.
[Ogive: SAT Scores, with the 64th percentile located at a score of 1100; horizontal axis Score, 200 to 1600; vertical axis Percentile, 10 to 100]
Interpretation This means that 64% of the students had an SAT score of 1100 or less.

Try It Yourself 5
The ages of the residents of Akhiok are represented in the cumulative frequency graph at the left. At what percentile is a resident whose age is 45?
[Cumulative frequency graph: Ages of Residents of Akhiok; horizontal axis Ages, 5 to 70; vertical axis Percentile, 5 to 95]
a. Use the graph to find the percentile that corresponds to the given age.
b. Interpret the results in the context of the data.
Answer: Page A33
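When the raw data are available instead of an ogive, the percentile of a value can be estimated directly. The sketch below assumes the "percent of entries at or below the value" reading used in the interpretation of Example 5; other percentile conventions exist, and `percentile_rank` is a hypothetical helper name. The data are the test scores from Example 1.

```python
def percentile_rank(values, x):
    """Percent of entries in values that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
print(percentile_rank(scores, 18))  # 80.0
```

Here 12 of the 15 scores are 18 or less, so under this convention a score of 18 is at about the 80th percentile.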
The Standard Score When you know the mean and standard deviation of a data set, you can measure a data value's position in the data set with a standard score, or z-score.
DEFINITION The standard score, or z-score, represents the number of standard deviations a given value x falls from the mean μ. To find the z-score for a given value, use the following formula.
$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
A z-score can be negative, positive, or zero. If z is negative, the corresponding x-value is below the mean. If z is positive, the corresponding x-value is above the mean. And if z = 0, the corresponding x-value is equal to the mean.
EXAMPLE 6
Finding z-Scores The mean speed of vehicles along a stretch of highway is 56 miles per hour with a standard deviation of 4 miles per hour. You measure the speed of three cars traveling along this stretch of highway as 62 miles per hour, 47 miles per hour, and 56 miles per hour. Find the z-score that corresponds to each speed. What can you conclude?
SOLUTION The z-score that corresponds to each speed is calculated below.
x = 62 mph: $z = \frac{62 - 56}{4} = 1.5$
x = 47 mph: $z = \frac{47 - 56}{4} = -2.25$
x = 56 mph: $z = \frac{56 - 56}{4} = 0$
Interpretation From the z-scores, you can conclude that a speed of 62 miles per hour is 1.5 standard deviations above the mean; a speed of 47 miles per hour is 2.25 standard deviations below the mean; and a speed of 56 miles per hour is equal to the mean.
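The calculation in Example 6 translates directly into code. This is a minimal sketch; `z_score` is an illustrative helper name, and the mean and standard deviation are the values given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

for speed in (62, 47, 56):
    print(speed, z_score(speed, mean=56, std_dev=4))
# 62 1.5
# 47 -2.25
# 56 0.0
```

The sign of each result shows whether the speed is above or below the mean, and the magnitude shows by how many standard deviations.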
Insight Notice that if the distribution of the speeds in Example 6 is approximately bell shaped, the car going 47 miles per hour is traveling at an unusually slow speed because the speed corresponds to a z-score of −2.25.

Try It Yourself 6
The monthly utility bills in a city have a mean of $70 and a standard deviation of $8. Find the z-scores that correspond to utility bills of $60, $71, and $92. What can you conclude?
a. Identify μ and σ of the nonstandard normal distribution.
b. Transform each value to a z-score.
c. Interpret the results.
Answer: Page A33
When a distribution is approximately bell shaped, you know from the Empirical Rule that about 95% of the data lie within 2 standard deviations of the mean. So, when this distribution's values are transformed to z-scores, about 95% of the z-scores should fall between −2 and 2. A z-score outside of this range will occur about 5% of the time and would be considered unusual. Similarly, according to the Empirical Rule, a z-score less than −3 or greater than 3 would be very unusual, with such a score occurring about 0.3% of the time.
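The cutoffs in this paragraph can be written as a small classification rule. The sketch below assumes an approximately bell-shaped distribution; the function name `describe` and the label strings are illustrative choices, not terms defined by the text beyond "unusual" and "very unusual".

```python
def describe(z):
    """Label a z-score using the Empirical Rule cutoffs of 2 and 3."""
    if abs(z) > 3:
        return "very unusual"   # occurs about 0.3% of the time
    if abs(z) > 2:
        return "unusual"        # outside the middle 95% or so
    return "usual"

for z in (1.5, -2.25, 0):
    print(z, describe(z))
# 1.5 usual
# -2.25 unusual
# 0 usual
```

Applied to the z-scores from Example 6, only the car traveling 47 miles per hour is flagged as unusual.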
In Example 6, you used z-scores to compare data values within the same data set. You can also use z-scores to compare data values from different data sets.
EXAMPLE 7
Comparing z-Scores from Different Data Sets
[Figure: excerpt of the 2003 NFL standings table, with columns W, L, T, Pct., PF, and PA]
During the 2003 regular season the Kansas City Chiefs, one of 32 teams in the National Football League (NFL), scored 63 touchdowns. During the 2003 regular season the Tampa Bay Storm, one of 16 teams in the Arena Football League (AFL), scored 119 touchdowns. The mean number of touchdowns in the NFL is 37.4, with a standard deviation of 9.3. The mean number of touchdowns in the AFL is 111.7, with a standard deviation of 17.3. Find the z-score that corresponds to the number of touchdowns for each team. Then compare your results. (Source: The National Football League and the Arena Football League)
SOLUTION The z-score that corresponds to the number of touchdowns for each team is calculated below.
[Figure: excerpt of the AFL standings table, with columns Won, Lost, Tie, Pct, PF, and PA; y = clinched playoff berth, x = clinched division title]
Kansas City Chiefs: $z = \frac{x - \mu}{\sigma} = \frac{63 - 37.4}{9.3} \approx 2.8$
Tampa Bay Storm: $z = \frac{x - \mu}{\sigma} = \frac{119 - 111.7}{17.3} \approx 0.4$
The number of touchdowns scored by the Chiefs is 2.8 standard deviations above the mean, and the number of touchdowns scored by the Storm is 0.4 standard deviations above the mean.
Interpretation The z-score corresponding to the number of touchdowns for the Chiefs is more than two standard deviations from the mean, so it is considered unusual. The Chiefs scored an unusually high number of touchdowns in the NFL, whereas the number of touchdowns scored by the Storm was only slightly higher than the AFL average.
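The comparison in Example 7 works because each value is standardized against its own league. A minimal sketch, using the means and standard deviations stated in the example (the variable names are illustrative):

```python
# (touchdowns, league mean, league standard deviation) from Example 7
chiefs = (63, 37.4, 9.3)      # NFL
storm = (119, 111.7, 17.3)    # AFL

def z_score(x, mean, std_dev):
    return (x - mean) / std_dev

z_chiefs = z_score(*chiefs)
z_storm = z_score(*storm)
print(round(z_chiefs, 1), round(z_storm, 1))  # 2.8 0.4
```

Although the Storm scored more touchdowns in absolute terms, the Chiefs' total is far more exceptional relative to their league.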
Try It Yourself 7 During the 2003 regular season the Kansas City Chiefs scored 16 field goals. During the 2003 regular season the Tampa Bay Storm scored 12 field goals. The mean number of field goals in the NFL is 23.6, with a standard deviation of 6.0. The mean number of field goals in the AFL is 11.7, with a standard deviation of 4.6. Find the z-score that corresponds to the number of field goals for each team. Then compare your results. (Source: The National Football League and the Arena Football League)
a. Identify μ and σ of each nonstandard normal distribution.
b. Transform each value to a z-score.
c. Compare your results.
Answer: Page A33
2.5 Exercises

Building Basic Skills and Vocabulary
In Exercises 1 and 2, (a) find the three quartiles and (b) draw a box-and-whisker plot of the data.
1. 4 7 7 5 2 9 7 6 8 5 8 4 1 5 2 8 7 6 6 9
2. 2 7 1 3 1 2 8 9 9 2 5 4 7 3 7 5 4 7 2 3 5 9 5 6 3 9 3 4 9 8 8 2 3 9 5
3. The points scored per game by a basketball team represent the third quartile for all teams in a league. What can you conclude about the team's points scored per game?
4. A salesperson at a company sold $6,903,435 of hardware equipment last year, a figure that represented the eighth decile of sales performance at the company. What can you conclude about the salesperson's performance?
5. A student's score on the ACT placement test for college algebra is in the 63rd percentile. What can you conclude about the student's test score?
6. A doctor tells a child's parents that their child's height is in the 87th percentile for the child's age group. What can you conclude about the child's height?

True or False? In Exercises 7–10, determine whether the statement is true or false. If it is false, rewrite it as a true statement.
7. The second quartile is the median of an ordered data set.
8. The five numbers you need to graph a box-and-whisker plot are the minimum, the maximum, Q1, Q3, and the mean.
9. The 50th percentile is equivalent to Q1.
10. It is impossible to have a negative z-score.

Using and Interpreting Concepts
Graphical Analysis In Exercises 11–16, use the box-and-whisker plot to identify
(a) the minimum entry.
(b) the maximum entry.
(c) the first quartile.
(d) the second quartile.
(e) the third quartile.
(f) the interquartile range.
11. [Box-and-whisker plot with labeled values 10, 13, 15, 17, 20; scale from 10 to 21]
12. [Box-and-whisker plot with labeled values 100, 130, 205, 270, 320]
13. [Box-and-whisker plot with labeled values 900, 1250, 1500, 1950, 2100]
14. [Box-and-whisker plot with labeled values 25, 50, 65, 70, 85; scale from 25 to 85]

Margin answers:
1. (a) Q1 = 4.5, Q2 = 6, Q3 = 7.5
   (b) [Box-and-whisker plot with labeled values 1, 4.5, 6, 7.5, 9; scale from 0 to 9]
2. (a) Q1 = 3, Q2 = 5, Q3 = 8
   (b) [Box-and-whisker plot with labeled values 1, 3, 5, 8, 9; scale from 0 to 9]
3. The basketball team scored more points per game than 75% of the teams in the league.
4. The salesperson sold more hardware equipment than 80% of the other salespeople.
5. The student scored above 63% of the students who took the ACT placement test.
6. The child is taller than 87% of the other children in the same age group.
7. True
8. False. The five numbers you need to graph a box-and-whisker plot are the minimum, the maximum, Q1, Q3, and the median.
9. False. The 50th percentile is equivalent to Q2.
10. False. The only way to have a negative z-score is if the value is less than the mean.
11. (a) Min = 10 (b) Max = 20 (c) Q1 = 13 (d) Q2 = 15 (e) Q3 = 17 (f) IQR = 4
15. [Box-and-whisker plot with labeled values −1.9, −0.5, 0.1, 0.7, 2.1]
16. [Box-and-whisker plot with labeled values −1.3, −0.3, 0.2, 0.4, 2.1]
17. Graphical Analysis The letters A, B, and C are marked on the histogram. Match them to Q1, Q2 (the median), and Q3. Justify your answer.
[Histogram: horizontal scale from 15 to 22, with the letters A, B, and C marked]
18. Graphical Analysis The letters R, S, and T are marked on the histogram. Match them to P10, P50, and P80. Justify your answer.
[Histogram: horizontal scale from 15 to 24, with the letters R, S, and T marked]

Using Technology to Find Quartiles and Draw Graphs In Exercises 19–22, use a calculator or a computer to (a) find the data set's first, second, and third quartiles, and (b) draw a box-and-whisker plot that represents the data set.
19. TV Viewing The number of hours of television watched per day by a sample of 28 people
2 4 1 5 7 2 5 4 4 2 3 6 4 3 5 2 0 3 5 9 4 5 2 1 3 6 7 2
20. Vacation Days The number of vacation days used by a sample of 20 employees in a recent year
3 9 2 1 7 5 3 2 2 6 4 0 10 0 3 5 7 8 6 5
21. Butterfly Wingspans The lengths (in inches) of a sample of 22 butterfly wingspans
3.2 3.1 2.9 4.6 3.7 3.8 4.0 3.0 2.8 3.3 3.6 3.9 3.7 3.9 4.1 2.9 3.2 3.8 3.9 3.5 3.7 3.3
22. Hourly Earnings The hourly earnings (in dollars) of a sample of 25 railroad equipment manufacturers
15.60 18.75 14.60 15.80 14.35 13.90 17.50 17.55 13.80 14.20 19.05 15.35 15.20 19.45 15.95 16.50 16.30 15.25 15.05 19.10 15.20 16.22 17.75 18.40 15.25

Margin answers:
12. (a) Min = 100 (b) Max = 320 (c) Q1 = 130 (d) Q2 = 205 (e) Q3 = 270 (f) IQR = 140
13. (a) Min = 900 (b) Max = 2100 (c) Q1 = 1250 (d) Q2 = 1500 (e) Q3 = 1950 (f) IQR = 700
14. (a) Min = 25 (b) Max = 85 (c) Q1 = 50 (d) Q2 = 65 (e) Q3 = 70 (f) IQR = 20
15. (a) Min = −1.9 (b) Max = 2.1 (c) Q1 = −0.5 (d) Q2 = 0.1 (e) Q3 = 0.7 (f) IQR = 1.2
16. (a) Min = −1.3 (b) Max = 2.1 (c) Q1 = −0.3 (d) Q2 = 0.2 (e) Q3 = 0.4 (f) IQR = 0.7
17. Q1 = B, Q2 = A, Q3 = C, because about one quarter of the data fall on or below 17, 18.5 is the median of the entire data set, and about three quarters of the data fall on or below 20.
18. P10 = T, P50 = R, P80 = S, because 10% of the values are below T, 50% of the values are below R, and 80% of the values are below S.
19. (a) Q1 = 2, Q2 = 4, Q3 = 5
    (b) [Box-and-whisker plot: Watching Television; labeled values 0, 2, 4, 5, 9; horizontal axis Hours, 0 to 9]
20. (a) Q1 = 2, Q2 = 4.5, Q3 = 6.5
    (b) [Box-and-whisker plot: Vacation Days; labeled values 0, 2, 4.5, 6.5, 10; horizontal axis Number of days, 0 to 10]
21. (a) Q1 = 3.2, Q2 = 3.65, Q3 = 3.9
    (b) [Box-and-whisker plot: Butterfly Wingspans; labeled values 2.8, 3.2, 3.65, 3.9, 4.6; horizontal axis Wingspan (in inches), 2 to 5]
22. See Selected Answers, page A##
23. TV Viewing Refer to the data set given in Exercise 19 and the box-and-whisker plot you drew that represents the data set.
(a) About 75% of the people watched no more than how many hours of television per day?
(b) What percent of the people watched more than 4 hours of television per day?
(c) If you randomly selected one person from the sample, what is the likelihood that the person watched less than 2 hours of television per day? Write your answer as a percent.
24. Manufacturer Earnings Refer to the data set given in Exercise 22 and the box-and-whisker plot you drew that represents the data set.
(a) About 75% of the manufacturers made less than what amount per hour?
(b) What percent of the manufacturers made more than $15.80 per hour?
(c) If you randomly selected one manufacturer from the sample, what is the likelihood that the manufacturer made less than $15.80 per hour? Write your answer as a percent.

Graphical Analysis In Exercises 25 and 26, the midpoints A, B, and C are marked on the histogram. Match them to the indicated z-scores. Which z-scores, if any, would be considered unusual?
25. z = 0, z = 2.14, z = −1.43
[Histogram: Statistics Test Scores; horizontal axis Scores (out of 80) with midpoints 48, 53, 58, 63, 68, 73, 78; vertical axis Number; midpoints A, B, and C marked]
26. z = 0.77, z = 1.54, z = −1.54
[Histogram: Biology Test Scores; horizontal axis Scores (out of 30) with midpoints 17, 20, 23, 26, 29; vertical axis Number; midpoints A, B, and C marked]

Comparing Test Scores For the statistics test scores in Exercise 25, the mean is 63 and the standard deviation is 7.0, and for the biology test scores in Exercise 26 the mean is 23 and the standard deviation is 3.9. In Exercises 27–30, you are given the test scores of a student who took both tests. (a) Transform each test score to a z-score. (b) Determine on which test the student had a better score.
27. A student gets a 73 on the statistics test and a 26 on the biology test.
28. A student gets a 60 on the statistics test and a 20 on the biology test.
29. A student gets a 78 on the statistics test and a 29 on the biology test.
30. A student gets a 63 on the statistics test and a 23 on the biology test.

Margin answers:
23. (a) 5 (b) 50% (c) 25%
24. (a) $17.65 (b) 50% (c) 50%
25. A: z = −1.43, B: z = 0, C: z = 2.14. A z-score of 2.14 would be unusual.
26. A: z = −1.54, B: z = 0.77, C: z = 1.54. None of the z-scores are unusual.
27. (a) Statistics: $z = \frac{73 - 63}{7} \approx 1.43$; Biology: $z = \frac{26 - 23}{3.9} \approx 0.77$
    (b) The student did better on the statistics test.
28. (a) Statistics: $z = \frac{60 - 63}{7} \approx -0.43$; Biology: $z = \frac{20 - 23}{3.9} \approx -0.77$
    (b) The student did better on the statistics test.
29. (a) Statistics: $z = \frac{78 - 63}{7} \approx 2.14$; Biology: $z = \frac{29 - 23}{3.9} \approx 1.54$
    (b) The student did better on the statistics test.
30. (a) Statistics: $z = \frac{63 - 63}{7} = 0$; Biology: $z = \frac{23 - 23}{3.9} = 0$
    (b) The student performed equally on both tests.
31. Life Span of Tires A certain brand of automobile tire has a mean life span of 35,000 miles and a standard deviation of 2250 miles. (Assume the life spans of the tires have a bell-shaped distribution.)
(a) The life spans of three randomly selected tires are 34,000 miles, 37,000 miles, and 31,000 miles. Find the z-score that corresponds to each life span. According to the z-scores, would the life spans of any of these tires be considered unusual?
(b) The life spans of three randomly selected tires are 30,500 miles, 37,250 miles, and 35,000 miles. Using the Empirical Rule, find the percentile that corresponds to each life span.
32. Life Span of Fruit Flies The life spans of a species of fruit fly have a bell-shaped distribution, with a mean of 33 days and a standard deviation of 4 days.
(a) The life spans of three randomly selected fruit flies are 34 days, 30 days, and 42 days. Find the z-score that corresponds to each life span and determine if any of these life spans are unusual.
(b) The life spans of three randomly selected fruit flies are 29 days, 41 days, and 25 days. Using the Empirical Rule, find the percentile that corresponds to each life span.

Interpreting Percentiles In Exercises 33–38, use the cumulative frequency distribution to answer the questions. The cumulative frequency distribution represents the heights of males in the United States in the 20–29 age group. The heights have a bell-shaped distribution (see Picturing the World, page 80) with a mean of 69.2 inches and a standard deviation of 2.9 inches. (Source: National Center for Health Statistics)
[Ogive: Adult Males Ages 20–29; horizontal axis Height (in inches), 62 to 78; vertical axis Percentile, 10 to 100]
33. What height represents the 20th percentile? How should you interpret this?
34. What percentile is a height of 76 inches? How should you interpret this?
35. Three adult males in the 20–29 age group are randomly selected. Their heights are 74 inches, 62 inches, and 80 inches. Use z-scores to determine which heights, if any, are unusual.
36. Three adult males in the 20–29 age group are randomly selected. Their heights are 70 inches, 66 inches, and 68 inches. Use z-scores to determine which heights, if any, are unusual.

Margin answers:
31. (a) $z_1 = \frac{34{,}000 - 35{,}000}{2250} \approx -0.44$, $z_2 = \frac{37{,}000 - 35{,}000}{2250} \approx 0.89$, $z_3 = \frac{31{,}000 - 35{,}000}{2250} \approx -1.78$
    None of the selected tires have unusual life spans.
    (b) For 30,500, 2.5th percentile. For 37,250, 84th percentile. For 35,000, 50th percentile.
32. (a) $z_1 = \frac{34 - 33}{4} = 0.25$, $z_2 = \frac{30 - 33}{4} = -0.75$, $z_3 = \frac{42 - 33}{4} = 2.25$
    The life span of 42 days is unusual.
    (b) For 29, 16th percentile. For 41, 97.5th percentile. For 25, 2.5th percentile.
33. About 67 inches; 20% of the heights are below 67 inches.
34. 99th percentile
35. $z_1 = \frac{74 - 69.2}{2.9} \approx 1.66$, $z_2 = \frac{62 - 69.2}{2.9} \approx -2.48$, $z_3 = \frac{80 - 69.2}{2.9} \approx 3.72$
    The heights that are 62 and 80 inches are unusual.
36. $z_1 = \frac{70 - 69.2}{2.9} \approx 0.28$, $z_2 = \frac{66 - 69.2}{2.9} \approx -1.10$, $z_3 = \frac{68 - 69.2}{2.9} \approx -0.41$
    None of the heights are unusual.
37. Find the z-score for a male in the 20–29 age group whose height is 71.1 inches. What percentile is this?
38. Find the z-score for a male in the 20–29 age group whose height is 66.3 inches. What percentile is this?

Extending Concepts
39. Ages of Executives The ages of a sample of 100 executives are listed.
31 50 60 49 61 62 54 42 47 56
51 61 50 51 57 44 41 48 28 32
61 48 42 54 38 47 49 42 36 48
49 51 36 36 64 45 54 57 41 51
40 39 42 60 45 52 54 48 55 46
60 47 56 42 62 51 52 51 59 63
67 36 54 35 59 47 53 42 65 63
63 74 27 48 32 54 33 43 56 47
59 53 43 82 40 43 68 41 39 37
63 44 54 54 49 52 40 49 49 57
[Figure: "Over the hill or on top?" Number of 100 top executives in the following age groups; bar chart titled Top Executives; horizontal axis Age with marks 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
(a) Order the data and find the first, second, and third quartiles.
(b) Draw a box-and-whisker plot that represents the data set.
(c) Interpret the results in the context of the data.
(d) On the basis of this sample, at what age would you expect to be an executive? Explain your reasoning.
(e) Which age groups, if any, can be considered unusual? Explain your reasoning.

Margin answers:
37. $z = \frac{71.1 - 69.2}{2.9} \approx 0.66$; about the 70th percentile
38. $z = \frac{66.3 - 69.2}{2.9} = -1$; about the 11th percentile
39. (a) Q1 = 42, Q2 = 49, Q3 = 56
    (b) [Box-and-whisker plot: Ages of Executives; labeled values 27, 42, 49, 56, 82; horizontal axis Ages, 25 to 85]
    (c) Half of the ages are between 42 and 56 years.
    (d) 49, because half of the executives are older and half are younger.
40. 5
41. 33.75
42. 10.975
43. 19.8
Midquartile Another measure of position is called the midquartile. You can find the midquartile of a data set by using the following formula.
$$\text{Midquartile} = \frac{Q_1 + Q_3}{2}$$
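As a quick illustration of the formula, the midquartile can be computed from the quartiles of the Example 1 test scores. This sketch assumes the median-of-halves method for Q1 and Q3; `midquartile` is a hypothetical helper name.

```python
from statistics import median

def midquartile(values):
    """Average of Q1 and Q3, with quartiles found by the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])           # median of the lower half
    q3 = median(data[n // 2 + n % 2:])    # median of the upper half
    return (q1 + q3) / 2

scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
print(midquartile(scores))  # 14.0
```

With Q1 = 10 and Q3 = 18, the midquartile is (10 + 18)/2 = 14.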
In Exercises 40–43, find the midquartile of the given data set.
40. 5 7 1 2 3 10 8 7 5 3
41. 23 36 47 33 34 40 39 24 32 22 38 41
42. 12.3 9.7 8.0 15.4 16.1 11.8 12.7 13.4 12.2 8.1 7.9 10.3 11.2
43. 21.4 20.8 19.7 15.2 31.9 18.7 15.6 16.7 19.8 13.4 22.9 28.7 19.8 17.2 30.1
Uses and Abuses Statistics in the Real World
Uses It can be difficult to see trends or patterns from a set of raw data. Descriptive statistics helps you do so. A good description of a data set consists of three features: (1) the shape of the data, (2) a measure of the center of the data, and (3) a measure of how much variability there is in the data. When you read reports, news items, or advertisements prepared by other people, you are seldom given raw data sets. Instead, you are given graphs, measures of central tendency, and measures of variation. To be a discerning reader, you need to understand the terms and techniques of descriptive statistics.

Abuses Cropped Vertical Axis Misleading statistical graphs are common in newspapers and magazines. Compare the two time series charts below. The data are the same for each. However, the first graph has a cropped vertical axis, which makes it appear that the stock price has increased greatly over the 10-year period. In the second graph, the scale on the vertical axis begins at zero. This graph correctly shows that stock prices increased only modestly during the 10-year period.
[Time series chart: Stock Price; vertical axis Stock price (in dollars), cropped to 46 through 64; horizontal axis Year, 1996 to 2004]
[Time series chart: Stock Price; vertical axis Stock price (in dollars), 0 to 90; horizontal axis Year, 1996 to 2004]

Effect of Outliers on the Mean Outliers, or extreme values, can have significant effects on the mean. Suppose, for example, that in recruiting information, a company stated that the average commission earned by the five people in its salesforce was $60,000 last year. This statement would be misleading if four of the five earned $25,000 and the fifth person earned $200,000.
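The commission example can be verified numerically. The sketch below uses the five commissions described in the paragraph and compares the mean with the median, which is an added comparison for illustration.

```python
from statistics import mean, median

# Four salespeople earned $25,000 and one earned $200,000
commissions = [25_000, 25_000, 25_000, 25_000, 200_000]

average = mean(commissions)      # 60,000 dollars, pulled up by the single outlier
typical = median(commissions)    # 25,000 dollars, what most of the salesforce earned
print(average, typical)
```

The stated average of $60,000 is arithmetically correct, yet no one in the group earned an amount close to it.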
Exercises
1. Cropped Vertical Axis
In a newspaper or magazine, find an example of a graph that has a cropped vertical axis. Is the graph misleading? Do you think this graph was intended to be misleading? Redraw the graph so that it is not misleading.
2. Effect of Outliers on the Mean
Describe a situation in which an outlier can make the mean misleading. Is the median also affected significantly by outliers? Explain your reasoning.
Chapter Summary
2
What did you learn?
Review Exercises
Section 2.1 ◆
How to construct a frequency distribution including limits, boundaries, midpoints, relative frequencies, and cumulative frequencies
1
◆
How to construct frequency histograms, frequency polygons, relative frequency histograms, and ogives
2–6
Section 2.2 ◆
How to graph quantitative data sets using the exploratory data analysis tools of stem-and-leaf plots and dot plots
7, 8
◆
How to graph and interpret paired data sets using scatter plots and time series charts
9, 10
◆
How to graph qualitative data sets using pie charts and Pareto charts
11, 12
Section 2.3 ◆
How to find the mean, median, and mode of a population and a sample
$\mu = \dfrac{\sum x}{N}$, $\bar{x} = \dfrac{\sum x}{n}$
13, 14
◆
How to find a weighted mean of a data set and the mean of a frequency distribution
$\bar{x} = \dfrac{\sum (x \cdot w)}{\sum w}$, $\bar{x} = \dfrac{\sum (x \cdot f)}{n}$
15–18
◆
How to describe the shape of a distribution as symmetric, uniform, or skewed and how to compare the mean and median for each
19–24
Section 2.4 ◆
How to find the range of a data set
25, 26
◆
How to find the variance and standard deviation of a population and a sample
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$, $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
27–30
◆
How to use the Empirical Rule and Chebychev's Theorem to interpret standard deviation
31–34
◆
How to approximate the sample standard deviation for grouped data
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$
35, 36
Section 2.5 ◆
How to find the quartiles and interquartile range of a data set
37–39, 41
◆
How to draw a box-and-whisker plot
40, 42
◆
How to interpret other fractiles such as percentiles
43, 44
◆
How to find and interpret the standard score (z-score)
$z = \dfrac{x - \mu}{\sigma}$
45–48
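The grouped-data formulas in the summary (the mean of a frequency distribution and the sample standard deviation for grouped data) can be sketched in Python as follows. The function name `grouped_mean_and_stdev` is hypothetical, and the sample values are the television counts from Review Exercise 35; treat this as one possible way to organize the arithmetic, not a prescribed method.

```python
from math import sqrt

def grouped_mean_and_stdev(values, freqs):
    """Mean sum(x*f)/n and sample standard deviation
    sqrt(sum((x - mean)^2 * f) / (n - 1)) for grouped data."""
    n = sum(freqs)
    x_bar = sum(x * f for x, f in zip(values, freqs)) / n
    squared_devs = sum((x - x_bar) ** 2 * f for x, f in zip(values, freqs))
    return x_bar, sqrt(squared_devs / (n - 1))

# Review Exercise 35: number of televisions and number of households.
televisions = [0, 1, 2, 3, 4, 5]
households = [1, 8, 13, 10, 5, 3]

x_bar, s = grouped_mean_and_stdev(televisions, households)
print(round(x_bar, 1), round(s, 1))  # 2.5 1.2
```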
Review Exercises
Section 2.1
In Exercises 1 and 2, use the following data set. The data set represents the income (in thousands of dollars) of 20 employees at a small business.
30 28 26 39 34 33 20 39 28 33
26 39 32 28 31 39 33 31 33 32
1. Make a frequency distribution of the data set using five classes. Include the class midpoints, limits, boundaries, frequencies, relative frequencies, and cumulative frequencies.
2. Make a relative frequency histogram using the frequency distribution in Exercise 1. Then determine which class has the greatest relative frequency and which has the least relative frequency.
In Exercises 3 and 4, use the following data set. The data represent the actual liquid volume (in ounces) in 24 twelve-ounce cans.
11.95 11.91 11.86 11.94 12.00 11.93 12.00 11.94 12.10 11.95 11.99 11.94
11.89 12.01 11.99 11.94 11.92 11.98 11.88 11.94 11.98 11.92 11.95 11.93
3. Make a frequency histogram using seven classes.
4. Make a relative frequency histogram of the data set using seven classes.
In Exercises 5 and 6, use the following data set. The data represent the number of meals purchased during one night's business at a sample of restaurants.
153 104 118 166 89 104 100 79 93 96 116 94 140 84 81 96
108 111 87 126 101 111 122 108 126 93 108 87 103 95 129 93
5. Make a frequency distribution with six classes and draw a frequency polygon.
6. Make an ogive of the data set using six classes.
Section 2.2
In Exercises 7 and 8, use the following data set. The data represent the average daily high temperature (in degrees Fahrenheit) during the month of January for Chicago, Illinois. (Source: National Oceanic and Atmospheric Administration)
33 31 25 22 38 51 32 23 23 34 44 43 47 37 29 25
28 35 21 24 20 19 23 27 24 13 18 28 17 25 31
7. Make a stem-and-leaf plot of the data set. Use one line per stem.
8. Make a dot plot of the data set.
9. The following are the heights (in feet) and the number of stories of nine notable buildings in Miami. Use the data to construct a scatter plot. What type of pattern is shown in the scatter plot? (Source: Skyscrapers.com)
Height (in feet)     764  625  520  510  484  480  450  430  410
Number of stories     55   47   51   28   35   40   33   31   40
Answers
1. See Odd Answers, page A##.
2. See Selected Answers, page A##.
3. [Histogram: Liquid Volume 12-oz Cans; horizontal axis: Actual volume (in ounces); vertical axis: Frequency]
4. See Selected Answers, page A##.
5.
Class      Midpoint   Frequency, f
79–93         86           9
94–108       101          12
109–123      116           5
124–138      131           3
139–153      146           2
154–168      161           1
                      Σf = 32
[Frequency polygon: Meals Purchased; horizontal axis: Number of meals; vertical axis: Frequency]
6. See Selected Answers, page A##.
7.
1 | 3 7 8 9
2 | 0 1 2 3 3 3 4 4 5 5 5 7 8 8 9
3 | 1 1 2 3 4 5 7 8
4 | 3 4 7
5 | 1
8. See Selected Answers, page A##.
9. [Scatter plot: Height of Buildings; horizontal axis: Height (in feet); vertical axis: Number of stories] The number of stories appears to increase with height.
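As a rough illustration of how a stem-and-leaf plot with one line per stem (Exercise 7) is assembled, the sketch below splits each two-digit temperature into a tens stem and a ones leaf. The helper name `stem_and_leaf` is hypothetical, and the approach assumes whole-number data below 100.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: ordered leaves} for whole numbers, one line per stem."""
    rows = defaultdict(list)
    for value in sorted(data):
        rows[value // 10].append(value % 10)  # tens digit is the stem
    return dict(rows)

# January daily high temperatures for Chicago (Exercises 7 and 8).
temps = [33, 31, 25, 22, 38, 51, 32, 23, 23, 34, 44, 43, 47, 37, 29, 25,
         28, 35, 21, 24, 20, 19, 23, 27, 24, 13, 18, 28, 17, 25, 31]

for stem, leaves in stem_and_leaf(temps).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the data first means the leaves come out already ordered, so no separate "ordered plot" step is needed.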
10. The U.S. unemployment rate over a 12-year period is given. Use the data to construct a time series chart. (Source: U.S. Bureau of Labor Statistics)
Year                 1992  1993  1994  1995  1996  1997
Unemployment rate     7.5   6.9   6.1   5.6   5.4   4.9
Year                 1998  1999  2000  2001  2002  2003
Unemployment rate     4.5   4.2   4.0   4.7   5.8   6.0
In Exercises 11 and 12, use the following data set. The data set represents the top seven American Kennel Club registrations (in thousands) in 2003. (Source: American Kennel Club)
Breed                 Number registered (in thousands)
Labrador Retriever    145
Golden Retriever       53
Beagle                 45
German Shepherd        44
Dachshund              39
Yorkshire Terrier      38
Boxer                  34
11. Make a Pareto chart of the data set.
12. Make a pie chart of the data set.
Section 2.3
13. Find the mean, median, and mode of the data set.
9 7 8 6 9 12 11 5 9 10
14. Find the mean, median, and mode of the data set.
28 35 29 29 33 32 29 33 31 29
15. Estimate the mean of the frequency distribution you made in Exercise 1.
16. The following frequency distribution shows the number of magazine subscriptions per household for a sample of 60 households. Find the mean number of subscriptions per household.
Number of magazines    0   1   2   3   4   5   6
Frequency             13   9  19   8   5   2   4
17. Six test scores are given. The first five test scores are 15% of the final grade, and the last test score is 25% of the final grade. Find the weighted mean of the test scores.
65 72 84 89 70 90
18. Four test scores are given. The first three test scores are 20% of the final grade, and the last test score is 40% of the final grade. Find the weighted mean of the test scores.
81 95 89 87
19. Describe the shape of the distribution in the histogram you made in Exercise 3. Is the distribution symmetric, uniform, or skewed?
20. Describe the shape of the distribution in the histogram you made in Exercise 4. Is the distribution symmetric, uniform, or skewed?
Answers
10. [Time series chart: Unemployment Rate; horizontal axis: Year (1992 to 2003); vertical axis: Unemployment rate]
11. [Pareto chart: American Kennel Club; horizontal axis: Breed; vertical axis: Number registered (in thousands)]
12. [Pie chart: American Kennel Club; Labrador retriever 36%, Golden retriever 13%, Beagle 11%, German shepherd 11%, Dachshund 10%, Yorkshire terrier 10%, Boxer 9%]
13. Mean = 8.6; Median = 9; Mode = 9
14. Mean = 30.8; Median = 30; Mode = 29
15. 31.7
16. 2.1
17. 79.5
18. 87.8
19. Skewed
20. Skewed
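The weighted-mean exercises above (16 through 18) all use the same computation, the sum of each value times its weight divided by the sum of the weights. A minimal sketch, with a hypothetical helper named `weighted_mean`, looks like this:

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Exercise 16: frequencies act as the weights.
ex16 = weighted_mean([0, 1, 2, 3, 4, 5, 6], [13, 9, 19, 8, 5, 2, 4])

# Exercise 17: five scores at 15% each and one score at 25%.
ex17 = weighted_mean([65, 72, 84, 89, 70, 90], [0.15] * 5 + [0.25])

# Exercise 18: three scores at 20% each and one score at 40%.
ex18 = weighted_mean([81, 95, 89, 87], [0.20, 0.20, 0.20, 0.40])

print(round(ex16, 1), round(ex17, 1), round(ex18, 1))  # 2.1 79.5 87.8
```

Using frequencies as weights shows that the mean of a frequency distribution is just a special case of the weighted mean.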
In Exercises 21 and 22, determine whether the approximate shape of the distribution in the histogram is skewed right, skewed left, or symmetric.
21. [Histogram; horizontal axis marked 2, 6, 10, 14, 18, 22, 26, 30, 34; vertical axis marked 2 to 12]
22. [Histogram; horizontal axis marked 2, 6, 10, 14, 18, 22, 26, 30, 34; vertical axis marked 2 to 12]
23. For the histogram in Exercise 21, which is greater, the mean or the median?
24. For the histogram in Exercise 22, which is greater, the mean or the median?
Section 2.4
25. The data set represents the mean price of a movie ticket (in U.S. dollars) for a sample of 12 U.S. cities. Find the range of the data set.
7.82 7.38 6.42 6.76 6.34 7.44 6.15 5.46 7.92 6.58 8.26 7.17
26. The data set represents the mean price of a movie ticket (in U.S. dollars) for a sample of 12 Japanese cities. Find the range of the data set.
19.73 16.48 19.10 18.56 17.68 17.19 16.63 15.99 16.66 19.59 15.89 16.49
27. The mileage (in thousands) for a rental car company's fleet is listed. Find the population mean and standard deviation of the data.
6 14 3 7 11 13 8 5 10 9 12 10
28. The age of each Supreme Court justice as of August 20, 2003 is listed. Find the population mean and standard deviation of the data. (Source: Supreme Court of the United States)
78 83 73 67 67 63 55 70 65
29. Dormitory room prices (in dollars for one school year) for a sample of four-year universities are listed. Find the sample mean and the sample standard deviation of the data.
2445 2940 2399 1960 2421 2940 2657 2153 2430 2278 1947 2383 2710 2761 2377
30. Sample salaries (in dollars) of public school teachers are listed. Find the sample mean and standard deviation of the data.
46,098 36,259 35,084 38,617 42,690 26,202 47,169 37,109
31. The mean rate for cable television from a sample of households was $29.00 per month, with a standard deviation of $2.50 per month. Between what two values do 99.7% of the data lie? (Assume a bell-shaped distribution.)
32. The mean rate for cable television from a sample of households was $29.50 per month, with a standard deviation of $2.75 per month. Estimate the percent of cable television rates between $26.75 and $32.25. (Assume that the data set has a bell-shaped distribution.)
Answers
21. Skewed left
22. Skewed right
23. Median
24. Mean
25. 2.8
26. 3.84
27. Population mean = 9; Standard deviation ≈ 3.2
28. Population mean = 69; Standard deviation ≈ 7.8
29. Sample mean = 2453.4; Standard deviation ≈ 306.1
30. Sample mean = 38,653.5; Standard deviation ≈ 6762.6
31. Between $21.50 and $36.50
32. 68%
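The difference between the population and sample standard deviations in Exercises 27 through 30 comes down to dividing by N or by n − 1. Python's standard `statistics` module exposes both as `pstdev` and `stdev`; the sketch below applies them to the data of Exercises 27 and 30 (the variable names are illustrative).

```python
from statistics import mean, pstdev, stdev

# Exercise 27: treated as a population, so divide by N.
mileage = [6, 14, 3, 7, 11, 13, 8, 5, 10, 9, 12, 10]
print(mean(mileage), round(pstdev(mileage), 1))  # 9 3.2

# Exercise 30: treated as a sample, so divide by n - 1.
teacher_salaries = [46098, 36259, 35084, 38617, 42690, 26202, 47169, 37109]
print(mean(teacher_salaries), round(stdev(teacher_salaries), 1))  # 38653.5 6762.6
```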
33. The mean sale per customer for 40 customers at a grocery store is $23.00, with a standard deviation of $6.00. On the basis of Chebychev's Theorem, at least how many of the customers spent between $11.00 and $35.00?
34. The mean length of the first 20 space shuttle flights was about 7 days, and the standard deviation was about 2 days. On the basis of Chebychev's Theorem, at least how many of the flights lasted between 3 days and 11 days? (Source: NASA)
Answers
33. 30
34. 15
35. Sample mean ≈ 2.5; Standard deviation ≈ 1.2
36. Sample mean ≈ 2.4; Standard deviation ≈ 1.7
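Exercises 33 and 34 apply Chebychev's Theorem: at least 1 − 1/k² of the data lie within k standard deviations of the mean (for k > 1). A small sketch of that calculation follows; the function name `chebychev_min_count` and its argument order are made up for illustration.

```python
def chebychev_min_count(n, mean, sd, low, high):
    """Smallest number of the n entries that Chebychev's Theorem
    guarantees between low and high (requires k > 1)."""
    k = min(mean - low, high - mean) / sd  # standard deviations to the nearer bound
    if k <= 1:
        raise ValueError("Chebychev's Theorem needs k > 1")
    return (1 - 1 / k**2) * n

print(chebychev_min_count(40, 23.00, 6.00, 11.00, 35.00))  # 30.0
print(chebychev_min_count(20, 7, 2, 3, 11))                # 15.0
```

In both exercises the bounds sit exactly two standard deviations from the mean, so the guaranteed proportion is 1 − 1/4 = 75%.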
35. From a random sample of households, the number of television sets are listed. Find the sample mean and standard deviation of the data.
Number of televisions    0   1   2   3   4   5
Number of households     1   8  13  10   5   3
36. From a random sample of airplanes, the number of defects found in their fuselages are listed. Find the sample mean and standard deviation of the data.
Number of defects      0   1   2   3   4   5   6
Number of airplanes    4   5   2   9   1   3   1
Section 2.5
In Exercises 37–40, use the following data set. The data represent the heights (in inches) of students in a statistics class.
50 51 54 54 56 59 60 61 61 63
64 65 68 69 70 70 71 71 75
37. Find the height that corresponds to the first quartile.
38. Find the height that corresponds to the third quartile.
39. Find the interquartile range.
40. Make a box-and-whisker plot of the data.
Answers
37. 56
38. 70
39. 14
40. [Box-and-whisker plot: Height of Students; horizontal axis: Heights (50 to 75); marked values 50, 56, 63, 70, 75]
41. 4
42. [Box-and-whisker plot: Weight of Football Players; horizontal axis: Weights (140 to 240); marked values 145, 173, 190, 208, 240]
43. 23% scored higher than 68.
44. 88th percentile
45. z = 2.33, unusual
46. z = −1.5, not unusual
47. z = 1.25, not unusual
48. z = −2.125, unusual
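The quartiles asked for in Exercises 37 through 39 can be found by taking the median of the whole data set and then the medians of the lower and upper halves. The sketch below follows that median-of-halves convention and leaves the middle entry out of both halves when the number of entries is odd; this convention is an assumption here, and library routines such as `statistics.quantiles` may use a different rule and give slightly different values. The name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # With an odd count, skip the middle entry when forming the upper half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

# Heights (in inches) from Exercises 37-40.
heights = [50, 51, 54, 54, 56, 59, 60, 61, 61, 63,
           64, 65, 68, 69, 70, 70, 71, 71, 75]

q1, q2, q3 = quartiles(heights)
print(q1, q2, q3, q3 - q1)  # 56 63 70 14
```

Together with the minimum (50) and maximum (75), these three values give the five numbers needed for the box-and-whisker plot in Exercise 40.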
41. Find the interquartile range of the data from Exercise 14.
42. The weights (in pounds) of the defensive players on a high school football team are given. Make a box-and-whisker plot of the data.
173 145 205 192 197 227 156 240 172 185
208 185 190 167 212 228 190 184 195
43. A student's test grade of 68 represents the 77th percentile of the grades. What percent of students scored higher than 68?
44. In 2004 there were 728 "oldies" radio stations in the United States. If one station finds that 84 stations have a larger daily audience than it does, what percentile does this station come closest to in the daily audience rankings? (Source: Radioinfo.com)
In Exercises 45–48, use the following information. The weights of 19 high school football players have a bell-shaped distribution, with a mean of 192 pounds and a standard deviation of 24 pounds. Use z-scores to determine if the weights of the following randomly selected football players are unusual.
45. 248 pounds
46. 156 pounds
47. 222 pounds
48. 141 pounds
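For Exercises 45 through 48, each weight is converted to a standard score with z = (x − μ)/σ. The sketch below flags a weight as unusual when its z-score is more than 2 standard deviations from the mean; that cutoff is the usual rule of thumb for bell-shaped data and is stated here as an assumption. The helper name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

MEAN_WEIGHT, SD_WEIGHT = 192, 24  # pounds, from the exercise statement

for weight in (248, 156, 222, 141):
    z = z_score(weight, MEAN_WEIGHT, SD_WEIGHT)
    label = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{weight} pounds: z = {z:.3f}, {label}")
```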
Chapter Quiz
Take this quiz as you would take a quiz in class. After you are done, check your work against the answers given in the back of the book.
1. The data set is the number of minutes a sample of 25 people exercise each week.
108 139 120 123 120 132 123 131 131 157 150 124 111
101 135 119 116 117 127 128 139 119 118 114 127
(a) Make a frequency distribution of the data set using five classes. Include class limits, midpoints, frequencies, boundaries, relative frequencies, and cumulative frequencies.
(b) Display the data using a frequency histogram and a frequency polygon on the same axes.
(c) Display the data using a relative frequency histogram.
(d) Describe the distribution's shape as symmetric, uniform, or skewed.
(e) Display the data using a box-and-whisker plot.
(f) Display the data using an ogive.
2. Use frequency distribution formulas to approximate the sample mean and standard deviation of the data set in Exercise 1.
3. U.S. sporting goods sales (in billions of dollars) can be classified in four areas: clothing (10.0), footwear (14.1), equipment (21.7), and recreational transport (32.1). Display the data using (a) a pie chart and (b) a Pareto chart. (Source: National Sporting Goods Association)
4. Weekly salaries (in dollars) for a sample of registered nurses are listed.
774 446 1019 795 908 667 444 960
(a) Find the mean, the median, and the mode of the salaries. Which best describes a typical salary?
(b) Find the range, variance, and standard deviation of the data set. Interpret the results in the context of the real-life setting.
5. The mean price of new homes from a sample of houses is $155,000 with a standard deviation of $15,000. The data set has a bell-shaped distribution. Between what two prices do 95% of the houses fall?
6. Refer to the sample statistics from Exercise 5 and use z-scores to determine which, if any, of the following house prices is unusual.
(a) $200,000
(b) $55,000
(c) $175,000
(d) $122,000
7. The number of wins for each Major League Baseball team in 2003 are listed. (Source: Major League Baseball)
101 95 86 71 63 90 86 83 68 43 96 93 77 71 101
91 86 83 66 88 87 85 75 69 68 100 85 84 74 64
(a) Find the quartiles of the data set.
(b) Find the interquartile range.
(c) Draw a box-and-whisker plot.
Answers
1. See Odd Answers, page A##.
2. 125.2, 13.0
3. (a) [Pie chart: U.S. Sporting Goods; Recreational transport 34%, Equipment 31%, Clothing 22%, Footwear 13%]
(b) [Pareto chart: U.S. Sporting Goods; horizontal axis: Sales area; vertical axis: Sales (in billions of dollars)]
4. (a) 751.6, 784.5, none. The mean best describes a typical salary because there are no outliers.
(b) 575; 48,135.1; 219.4
5. Between $125,000 and $185,000
6. (a) z = 3.0, unusual
(b) z ≈ −6.67, very unusual
(c) z ≈ 1.33
(d) z = −2.2, unusual
7. (a) 71, 84.5, 90
(b) 19
(c) [Box-and-whisker plot: Wins for Each Team; horizontal axis: Number of wins (40 to 100); marked values 43, 71, 84.5, 90, 101]
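The measures requested in Quiz Exercise 4 can be cross-checked with Python's standard `statistics` module. This is a sketch for checking hand calculations, not a substitute for showing the work; the variable names are illustrative, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode, stdev, variance

# Weekly salaries (in dollars) of the sampled registered nurses.
salaries = [774, 446, 1019, 795, 908, 667, 444, 960]

print(mean(salaries), median(salaries))  # 751.625 784.5

# multimode returns every value tied for most frequent; when each value
# appears only once it returns all of them, which means there is no mode.
has_mode = len(multimode(salaries)) < len(salaries)
print(has_mode)  # False

print(max(salaries) - min(salaries))  # 575
print(round(variance(salaries), 1), round(stdev(salaries), 1))  # 48135.1 219.4
```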
PUTTING IT ALL TOGETHER
Real Statistics–Real Decisions
You are a consumer journalist for a newspaper. You have received several letters and emails from readers who are concerned about the cost of their automobile insurance premiums. One of the readers wrote the following: "I think, on the average, a driver in our city pays a higher automobile insurance premium than drivers in other cities like ours in this state."
Your editor asks you to investigate the costs of insurance premiums and write an article about it. You have gathered the data shown at the right (your city is City A). The data represent the automobile insurance premiums paid annually (in dollars) by a random sample of drivers in your city and three other cities of similar size in your state. (The prices of the premiums from the sample include comprehensive, collision, bodily injury, property damage, and uninsured motorist coverage.)
The Prices, in Dollars, of Automobile Insurance Premiums Paid by 10 Randomly Selected Drivers in Four Cities
City A   City B   City C   City D
2465     2514     2030     2345
1984     1600     1450     2152
2545     1545     2715     1570
1640     2716     2145     1850
1983     1987     1600     1450
2302     2200     1430     1745
2542     2005     1545     1590
1875     1945     1792     1800
1920     1380     1645     2575
2655     2400     1368     2016
Exercises
1. How Would You Do It?
(a) How would you investigate the statement about the price of automobile insurance premiums?
(b) What statistical measures in this chapter would you use?
2. Displaying the Data
(a) What type of graph would you choose to display the data? Why?
(b) Construct the graph from part (a).
(c) On the basis of what you did in part (b), does it appear that the average automobile insurance premium in your city, City A, is higher than in any of the other cities? Explain.
3. Measuring the Data
(a) What statistical measures discussed in this chapter would you use to analyze the automobile insurance premium data?
(b) Calculate the measures from part (a).
(c) Compare the measures from part (b) with the graphs you made in Exercise 2. Do the measurements support your conclusion in Exercise 2? Explain.
4. Discussing the Data
(a) What would you tell your readers? Is the average automobile insurance premium in your city more than in the other cities?
(b) What reasons might you give to your readers as to why the prices of automobile insurance premiums vary from city to city?
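One way to begin Exercise 3 is to compute a measure of center and a measure of variation for each city. The sketch below does this with the standard `statistics` module; the dictionary name `premiums` is hypothetical, and the values are the premiums from the table, read on the assumption that each run of ten values in the source is one city's column. Which measures to report, and how to interpret them, is still the point of the exercise.

```python
from statistics import mean, median, stdev

# Annual premiums (in dollars) for 10 sampled drivers per city.
premiums = {
    "City A": [2465, 1984, 2545, 1640, 1983, 2302, 2542, 1875, 1920, 2655],
    "City B": [2514, 1600, 1545, 2716, 1987, 2200, 2005, 1945, 1380, 2400],
    "City C": [2030, 1450, 2715, 2145, 1600, 1430, 1545, 1792, 1645, 1368],
    "City D": [2345, 2152, 1570, 1850, 1450, 1745, 1590, 1800, 2575, 2016],
}

for city, values in premiums.items():
    print(city, mean(values), median(values), round(stdev(values), 1))
```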
(Adapted from Runzheimer International)
Lowest auto insurance premiums (average per city)
Nashville         $978
Boise             $990
Richmond, VA      $1038
Burlington, VT    $1039
(Source: Runzheimer International)
Technology
Monthly Milk Production
www.dfamilk.com
Dairy Farmers of America is an association that provides help to dairy farmers. Part of this help is gathering and distributing statistics on milk production.
The following data set was supplied by a dairy farmer. It lists the monthly milk production (in pounds) for 50 Holstein dairy cows. (Source: Matlink Dairy, Clymer, NY)
2825 4285 1258 2597 1884 3109 2207 3223 2711 2281
2072 2862 2982 3512 2359 2804 2882 2383 1874 1230
2733 3353 2045 2444 2046 1658 1647 1732 1979 1665
2069 1449 1677 1773 2364 2207 2051 2230 1319 1294
2484 2029 1619 2284 2669 2159 2202 1147 2923 2936
[Figure: Milk Cows, 1994–2003; horizontal axis: Year; vertical axis: Number of cows (in 1000s), 9,000 to 9,800; 4% decrease over a 10-year period] (Source: National Agricultural Statistics Service)
[Figure: Rate per Cow, 1994–2003; horizontal axis: Year; vertical axis: Pounds of milk, 16,000 to 19,000; 15% increase over a 10-year period] (Source: National Agricultural Statistics Service)
From 1994 to 2003, the number of dairy cows in the United States decreased and the yearly milk production increased.
Exercises
In Exercises 1–4, use a computer or calculator. If possible, print your results.
1. Find the sample mean of the data.
2. Find the sample standard deviation of the data.
3. Make a frequency distribution for the data. Use a class width of 500.
4. Draw a histogram for the data. Does the distribution appear to be bell shaped?
5. What percent of the distribution lies within one standard deviation of the mean? Within two standard deviations of the mean? How do these results agree with the Empirical Rule?
In Exercises 6–8, use the frequency distribution found in Exercise 3.
6. Use the frequency distribution to estimate the sample mean of the data. Compare your results with Exercise 1.
7. Use the frequency distribution to find the sample standard deviation for the data. Compare your results with Exercise 2.
8. Writing  Use the results of Exercises 6 and 7 to write a general statement about the mean and standard deviation for grouped data. Do the formulas for grouped data give results that are as accurate as the individual entry formulas?
Extended solutions are given in the Technology Supplement. Technical instruction is provided for MINITAB, Excel, and the TI-83.
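Exercise 5 asks what percent of the entries lie within one and two standard deviations of the mean. If you are working in Python instead of MINITAB, Excel, or the TI-83, the counting step might look like the sketch below. The function name `percent_within` is hypothetical, and the 50 values are the milk production entries as read from the data table above (some digits were run together in the source, so check them against your copy).

```python
from statistics import mean, stdev

milk = [
    2825, 4285, 1258, 2597, 1884, 3109, 2207, 3223, 2711, 2281,
    2072, 2862, 2982, 3512, 2359, 2804, 2882, 2383, 1874, 1230,
    2733, 3353, 2045, 2444, 2046, 1658, 1647, 1732, 1979, 1665,
    2069, 1449, 1677, 1773, 2364, 2207, 2051, 2230, 1319, 1294,
    2484, 2029, 1619, 2284, 2669, 2159, 2202, 1147, 2923, 2936,
]

def percent_within(data, k):
    """Percent of entries within k sample standard deviations of the mean."""
    center, spread = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - center) <= k * spread)
    return 100 * inside / len(data)

print(percent_within(milk, 1), percent_within(milk, 2))
```

The two percents can then be compared with the 68% and 95% that the Empirical Rule predicts for a bell-shaped distribution.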
Using Technology to Determine Descriptive Statistics
Here are some MINITAB and TI-83 printouts for three examples in this chapter.
MINITAB
(See Example 7, page 55.)
[MINITAB Graph menu: Plot..., Time Series Plot..., Chart..., Histogram..., Boxplot..., Matrix Plot..., Draftsman Plot..., Contour Plot...]
[Time series plot; horizontal axis: Year (1991 to 2001); vertical axis: Subscribers (in millions), 0 to 130]
(See Example 4, page 77.)
[MINITAB Stat menu: Display Descriptive Statistics..., Store Descriptive Statistics..., 1-Sample Z..., 1-Sample t..., 2-Sample t..., Paired t..., 1 Proportion..., 2 Proportions..., 2 Variances..., Correlation..., Covariance..., Normality Test...]
Descriptive Statistics
Variable    N     Mean     Median   TrMean   StDev   SE Mean
Salaries    10    41.500   41.000   41.375   3.136   0.992
Variable    Minimum   Maximum   Q1       Q3
Salaries    37.000    47.000    38.750   44.250
(See Example 4, page 96.)
[MINITAB Graph menu: Plot..., Time Series Plot..., Chart..., Histogram..., Boxplot..., Matrix Plot..., Draftsman Plot..., Contour Plot...]
[Boxplot; axis: Test Score, marked 5, 15, 25, 35]
TI-83
(See Example 7, page 55.)
[STAT PLOTS menu: 1: Plot1...Off L1 L2, 2: Plot2...Off L1 L2, 3: Plot3...Off L1 L2, 4↓ PlotsOff]
[Plot1 settings: On, Type, Xlist: L1, Ylist: L2, Mark]
[ZOOM menu: 4↑ ZDecimal, 5: ZSquare, 6: ZStandard, 7: ZTrig, 8: ZInteger, 9: ZoomStat, 0: ZoomFit]
(See Example 4, page 77.)
[STAT CALC menu: 1: 1-Var Stats, 2: 2-Var Stats, 3: Med-Med, 4: LinReg(ax+b), 5: QuadReg, 6: CubicReg, 7↓ QuartReg]
1-Var Stats L1
1-Var Stats
x̄ = 41.5
Σx = 415
Σx² = 17311
Sx = 3.13581462
σx = 2.974894956
↓n = 10
(See Example 4, page 96.)
[STAT PLOTS menu: 1: Plot1...Off L1 L2, 2: Plot2...Off L1 L2, 3: Plot3...Off L1 L2, 4↓ PlotsOff]
[Plot1 settings: On, Type, Xlist: L1, Freq: 1]
[ZOOM menu: 4↑ ZDecimal, 5: ZSquare, 6: ZStandard, 7: ZTrig, 8: ZInteger, 9: ZoomStat, 0: ZoomFit]
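The 1-Var Stats and MINITAB printouts for the salary data can be tied together with the shortcut formula Σ(x − x̄)² = Σx² − (Σx)²/n, which needs only the three sums the calculator reports. The sketch below is a check on the printed values, with variable names chosen for illustration; it uses only n = 10, Σx = 415, and Σx² = 17311 from the printout.

```python
from math import sqrt

n, sum_x, sum_x2 = 10, 415, 17311  # values shown in the 1-Var Stats printout

x_bar = sum_x / n                        # mean
sum_sq_dev = sum_x2 - sum_x ** 2 / n     # shortcut for the sum of squared deviations
s = sqrt(sum_sq_dev / (n - 1))           # sample standard deviation (Sx, StDev)
sigma = sqrt(sum_sq_dev / n)             # population standard deviation (σx)
se_mean = s / sqrt(n)                    # standard error of the mean (SE Mean)

print(x_bar, round(s, 3), round(sigma, 3), round(se_mean, 3))  # 41.5 3.136 2.975 0.992
```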
Try It Yourself Answers
CHAPTER 1
Section 1.1
1a. The population consists of the prices per gallon of regular gasoline at all gasoline stations in the United States.
b. The sample consists of the prices per gallon of regular gasoline at the 900 surveyed stations.
c. The data set consists of the 900 prices.
2a. Population
b. Parameter
3a. Descriptive statistics involve the statement "76% of women and 60% of men had a physical examination within the previous year."
b. An inference drawn from the study is that a higher percentage of women had a physical examination within the previous year.
Section 1.2
1a. City names and city population
b. City name: Nonnumerical
City population: Numerical
c. City name: Qualitative
City population: Quantitative
2. (1a) The final standings represent a ranking of hockey teams.
(1b) Ordinal, because the data can be put in order.
(2a) The collection of phone numbers represents labels. No mathematical computations can be made.
(2b) Nominal, because you cannot make calculations on the data.
3. (1a) The collection of body temperatures represents data that can be ordered but makes no sense written as a ratio.
(1b) Interval, because meaningful differences can be calculated.
(2a) The collection of heart rates represents data that can be ordered and written as a ratio that makes sense.
(2b) Ratio, because the data are a ratio of heartbeats and minutes.
Section 1.3
1. (1a) Focus: Effect of exercise on senior citizens.
(1b) Population: Collection of all senior citizens.
(1c) Experiment
(2a) Focus: Effect of radiation fallout on senior citizens.
(2b) Population: Collection of all senior citizens.
(2c) Sampling
2a. Example: start with the first digits 92630782 …
b. 92 | 63 | 07 | 82 | 40 | 19 | 26
c. 63, 7, 40, 19, 26
3. (1a) The sample was selected by using only available students.
(1b) Convenience sampling
(2a) The sample was selected by numbering each student in the school, randomly choosing a starting number, and selecting students at regular intervals from the starting number.
(2b) Systematic sampling
CHAPTER 2
Section 2.1
1a. 8 classes
b. Min = 0; Max = 72; Class width = 10
c.
Lower limit   Upper limit
0             9
10            19
20            29
30            39
40            49
50            59
60            69
70            79
d. See part (e).
e.
Class     Frequency, f
0–9       15
10–19     19
20–29     14
30–39     7
40–49     14
50–59     6
60–69     4
70–79     1
2a. See part (b).
b.
Class     Frequency, f   Midpoint   Relative frequency   Cumulative frequency
0–9       15             4.5        0.1875               15
10–19     19             14.5       0.2375               34
20–29     14             24.5       0.1750               48
30–39     7              34.5       0.0875               55
40–49     14             44.5       0.1750               69
50–59     6              54.5       0.0750               75
60–69     4              64.5       0.0500               79
70–79     1              74.5       0.0125               80
          Σf = 80                   Σ(f/n) = 1
c. 42.5% of the population is under 20 years old. 6.25% of the population is over 59 years old.
3a. Class boundaries
−0.5–9.5
9.5–19.5
19.5–29.5
29.5–39.5
39.5–49.5
49.5–59.5
59.5–69.5
69.5–79.5
b. Use class midpoints for the horizontal scale and frequency for the vertical scale.
c. [Frequency histogram: Ages of Akhiok Residents; horizontal axis: Age (class midpoints 4.5 to 74.5); vertical axis: Frequency]
d. Same as 2c.
4a. Same as 3b.
b. See part (c).
c. [Frequency polygon: Ages of Akhiok Residents; horizontal axis: Age (class midpoints); vertical axis: Frequency]
d. The population increases up to the age of 14.5 and then decreases. Population increases again between the ages of 34.5 and 44.5, but then after 44.5, the population decreases.
5abc. [Relative frequency histogram: Ages of Akhiok Residents; horizontal axis: Age (class midpoints 4.5 to 74.5); vertical axis: Relative frequency]
6a. Use upper class boundaries for the horizontal scale and cumulative frequency for the vertical scale.
b. See part (c).
c. [Ogive: Ages of Akhiok Residents; horizontal axis: Age (class boundaries −0.5 to 79.5); vertical axis: Cumulative frequency (0 to 80)]
d. Approximately 69 residents are 49 years old or younger.
7a. Enter data.
b. [Figure: graph with Age on the horizontal axis]
Section 2.2
1a.
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
b. Key: 3 | 3 = 33
0 | 5 2 7 1 5 3 1 0 1 3 3 9 0 4 5
1 | 8 2 5 6 3 3 7 3 0 7 8 2 3 8 9 3 6 9 9
2 | 5 4 2 0 3 3 4 0 1 5 9 6 6 6
3 | 9 6 9 7 9 9 3
4 | 4 2 4 7 1 8 0 0 1 9 9 5 1 9
5 | 8 3 1 6 8 9
6 | 0 8 7 8
7 | 2
A32
TRY IT YOURSELF ANSWERS
c. Key: 3 ƒ 3
0 1 2 3 4 5 6
b. Motor Vehicle Occupants
33
=
0 01 11 23 33 0 2 23 33 3 35 0 01 23 34 45 3679999 0 01 11 24 45 136889 0788
Killed in 1991
45 55 79 66 7 78 88 9 99 56 66 9
Trucks 25%
=
Other 1%
Cars 66%
78 99 9
c. As a percentage of total motor vehicle deaths, deaths, car deaths decreased decreased by by 10%, truck deaths increased increased by by 9%, and motorcycle deaths stayed about the same.
7 2 d. It seems that most of the residents are under 40. 2ab. Key: 3 ƒ 3
Motorcycle 8%
5a.
33
Cause
Frequency, f
0 0011123334
Auto Dealers
14,668
0 55579
Auto Repair
9,728
1 02233333
Home Home Furn urnishi ishin ng
7,79 7,792 2
1 5 66 7 78 88 9 99
Computer Sales
5,733
2 00123344
Dry Cleaning
4,649
2 556669
b.
3 3
Causes of BBB Complaints 16,000 14,000 y c 12,000 n e 10,000 u 8,000 q e r 6,000 F 4,000
3 679999 4 00111244 4 578999
2,000 r s y g o s o s r t r e g t e l e r n n t e u i u l i a m u a o i h n p s D A a A p s a e r e H i e d m n l r o c u f C
5 13 5 6889 6 0
Cause
6 788
c. It appears that the auto industry (dealers and repair shops) account for the largest portion of complaints filed at the BBB.
7 2 7 3a. Use ages for the horizontal axis. b.
6ab. ) s r a l l o d n i ( y r a l a S
Ages of Akhiok Residents
0
10
20
30
40
50
60
70
80
50,000 45,000 40,000 35,000 30,000 25,000 20,000
Age (in years)
2
Vehicle type    Killed (frequency)    Relative frequency    Central angle
Cars                 22,385                0.6556                236°
Trucks                8,457                0.2477                 89°
Motorcycles           2,806                0.0822                 30°
Other                   497                0.0146                  5°
                 Σf = 34,145           Σ(f/n) ≈ 1
a
=
4
6
8
10 10
Length of employment (in years)
c. A large percentage of the residents are under 40 years old. 4a.
c. It appears that the longer an employee is with the company, the larger his or her salary will be.
Salaries
7ab.
) s r a 80 l l 70 o d 60 n i ( 50 l l i 40 b 30 e g 20 a r e 10 v A
360°
Cellular Phone Bills
c. From 1991 to 1998, the average bill decreased significantly. From 1998 until 2001, the average bill increased slightly.
1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
Year
Section 2.3 1a. 578
b. 41.3
c. The typical age of an employee in a department store is 41.3 years old.
2a. 0, 0, 1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 9, 10, 12, 12, 13, 13, 13, 13, 13, 15, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 26, 29, 36, 39, 39, 39, 39, 40, 40, 41, 41, 41, 42, 44, 44, 45, 47, 48, 49, 49, 49, 51, 53, 56, 58, 58, 60, 67, 68, 68, 72
b. 23
3a. 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 7, 9, 10, 12, 12, 13, 13, 13, 13, 13, 15, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 26, 29, 33, 36, 37, 39, 39, 39, 39, 40, 40, 41, 41, 41, 42, 44, 44, 45, 47, 48, 49, 49, 49, 51, 53, 56, 58, 58, 59, 60, 67, 68, 68, 72
Section 2.4 1a. Min = 23, or $23,000; Max = 58, or $58,000
b. 35, or $35,000 c. The range of the starting salaries for Corporation B is 35, or $35,000 (much larger than the range of Corporation A). 2a. 41.5, or $41,500 b.
Salary, x (1000s of dollars)
Deviation, x − μ (1000s of dollars)
23
-
18.5
b. 23.5
29
-
12.5
c. Half of the residents of Akhiok are younger than 23.5 years old and half are older than 23.5 years old.
32
-
9.5
40
-
1.5
4a. 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 7, 9, 10, 12, 12, 13, 13, 13, 13, 13, 15, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 26, 29, 33, 36, 37, 39, 39, 39, 39, 40, 40, 41, 41, 41, 42, 44, 44, 45, 47, 48, 49, 49, 49, 51, 53, 56, 58, 58, 59, 60, 67, 68, 68, 72
41
-
0.5
41
-
0.5
b. 13
c. The mode of the ages is 13 years old.
5a. Yes
b. The mode of the responses to the survey is “Yes.”
6a. 21.6; 21; 20 b. The mean in Example 6 (x̄ ≈ 23.8) was heavily influenced by the age 65. Neither the median nor the mode was affected as much by the age 65.
Source          Score, x    Weight, w    x · w
Test Mean          86         0.50       43.0
Midterm            96         0.15       14.4
Final              98         0.20       19.6
Computer Lab       98         0.10        9.8
Homework          100         0.05        5.0
                           Σw = 1.00   Σ(x · w) = 91.8
c. 91.8
7.5
50
8.5
52
10.5
58
16.5
g x = 3ab.
m
Salary, x
91.8
Frequency,
x
342.25
29
-
12.5
156.25
32
-
9.5
90.25
40
-
1.5
2.25
41
-
0.5
0.25
41
-
0.5
0.25
49
7.5
56.25
50
8.5
72.25
52
10.5
110.25
58
16.5
272.25
415
g 1 x -
m2
=
x # f
15
67.50
10 –19
14.5
19
275.50
5a. Enter data.
20 – 29
24.5
14
343.00
6a. 7, 7, 7, 7, 7, 13, 13, 13, 13, 13
30 – 39
34.5
7
241.50
7a. 1 standard deviation
40 – 49
44.5
14
623.00
50 – 59
54.5
6
327.00
60 –69
64.5
4
258.00
70 –79
74.5
1
74.50
d. 27.6
g ( x # f 2
=
2 m2
=
1102.5
e. The population variance is 110.3 and the population standard deviation is 10.5, or $10,500.
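The variance and standard deviation reported in these answers can be checked numerically. The following is a minimal Python sketch, assuming the ten starting salaries (in thousands of dollars) are the x-values listed in the deviation table; the variable names are illustrative and do not come from the text.

```python
import math

# Starting salaries in 1000s of dollars
# (assumed to be the x-values of the deviation table above).
salaries = [23, 29, 32, 40, 41, 41, 49, 50, 52, 58]

N = len(salaries)
mu = sum(salaries) / N                               # population mean
sum_sq_dev = sum((x - mu) ** 2 for x in salaries)    # sum of squared deviations

pop_variance = sum_sq_dev / N                        # divide by N for a population
pop_sd = math.sqrt(pop_variance)

sample_variance = sum_sq_dev / (N - 1)               # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print("mean:", mu, "sum of squares:", sum_sq_dev)
print("population variance and sd:", pop_variance, pop_sd)
print("sample variance and sd:", sample_variance, round(sample_sd, 1))
```

The population variance comes out as 110.25, which the answer reports as 110.3 after rounding to one decimal place.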
4.5
80
g 1 x -
0
d. 10.5, or $10,500
0 –9
=
2 M2
18.5
Midp Midpoi oint nt,, x
N
0
1 x
M
Clas Classs
f
=
-
c. 110.3
8abc.
m2
23
g x =
d. The weighted mean for the course is 91.8.
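As a check on the weighted mean of 91.8, here is a short Python sketch. It assumes the scores and weights are those in the weighted-mean table for this exercise (Test Mean, Midterm, Final, Computer Lab, Homework); the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(scores, weights):
    """Return sum(x * w) / sum(w) for paired scores and weights."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(scores, weights)) / total_weight

# Scores and weights as read from the table (assumed pairing).
scores = [86, 96, 98, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

course_mean = weighted_mean(scores, weights)
print(round(course_mean, 1))
```

Dividing by the sum of the weights keeps the function correct even when the weights do not add up to exactly 1.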
g 1 x -
415
41.5, or $41,500
=
7ab. Scor Score, e,
49
2210
4a. See 3ab.
b. 122.5
c. 11.1, or $11,100
b. 37.89; 3.98 b. 3
b. 34%
c. The estimated percent of the heights that are between 61.25 and 64 inches is 34%. 8a. 0
b. 70.6
c. At least 75% of the data lie within 2 standard deviations of the mean. At least 75% of the population of Alaska is between 0 and 70.6 years old.
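The 75% figure follows from Chebychev's Theorem, which guarantees that at least 1 − 1/k² of the data lie within k standard deviations of the mean for any k > 1. A minimal Python sketch of that bound follows; the function names are illustrative only.

```python
import math

def chebychev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

def k_for_proportion(p):
    """Solve 1 - 1/k^2 = p for k, given a proportion 0 < p < 1."""
    return math.sqrt(1 / (1 - p))

print(chebychev_lower_bound(2))   # bound for k = 2
print(chebychev_lower_bound(3))   # bound for k = 3
print(k_for_proportion(0.99))     # k needed to cover at least 99% of the data
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution.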
9a.
x     f     xf
0    10      0
1    19     19
2     7     14
3     7     21
4     5     20
5     1      5
6     1      6
n
c.
x
=
x
b. 1.7
xf
3a. 13, 41.5
4a. 0, 13, 23.5, 41.5, 72 bc.
x − x̄      (x − x̄)²     (x − x̄)² · f
−1.70       2.8900        28.90
−0.70       0.4900         9.31
 0.30       0.0900         0.63
 1.30       1.6900        11.83
 2.30       5.2900        26.45
 3.30      10.8900        10.89
 4.30      18.4900        18.49
x 2 f
0 13 23.5 41.5
d. It appears that half of the ages are between 13 and 41.5 years. 5a. 80th percentile b. 80% of the ages are 45 years or younger. 6a.
=
b.
106.5
f
xf
=
z1 =
Class       Midpoint, x      f         xf
0–99           49.5         380      18,810
100–199       149.5         230      34,385
200–299       249.5         210      52,395
300–399       349.5          50      17,475
400–499       449.5          60      26,970
500+          650.0          70      45,500
n
=
1000
z3 =
x -
g xf = 195,535
s
=
60
-
70
x
2
2
1 x x 2
1 x x 2
f
x − x̄       (x − x̄)²       (x − x̄)² f
−146.04     21,327.68      8,104,518.4
 −46.04      2,119.68        487,526.4
  53.96      2,911.68        611,452.8
 153.96     23,703.68      1,185,184.0
 253.96     64,495.68      3,869,740.8
 454.46    206,533.89     14,457,372.3
g 1 x -
2
x 2 f
=
28,715,794.7
d. 169.5
Section 2.5 1a. 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 7, 9, 10, 12, 12, 13, 13, 13, 13, 13, 15, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 26, 29, 33, 36, 37, 39, 39, 39, 39, 40, 40, 41, 41, 41, 42, 44, 44, 45, 47, 48, 49, 49, 49, 51, 53, 56, 58, 58, 59, 60, 67, 68, 68, 72
c. 13, 41.5
8
8 71
-
70
8 92
-
70
8
1.25
=
-
=
0.125
=
2.75
7a. NFL: m
=
23.6, s
=
6.0
AFL: m
=
11.7, s
=
4.6
b. Kansas City: z
Tampa Bay: z
146.04
-
70 ,
c. From the z-scores, $60 is 1.25 standard deviations below the mean, $71 is 0.125 standard deviation above the mean, and $92 is 2.75 standard deviations above the mean.
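The three z-scores can be reproduced with the standard formula z = (x − μ)/σ. The sketch below assumes a mean of 70 and a standard deviation of 8, as read from the fragments of the worked fractions in this answer; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

MEAN = 70   # assumed mean, in dollars
SD = 8      # assumed standard deviation, in dollars

for amount in (60, 71, 92):
    z = z_score(amount, MEAN, SD)
    side = "below" if z < 0 else "above"
    print(f"${amount}: z = {z} ({abs(z)} standard deviations {side} the mean)")
```

A negative z-score marks a value below the mean, and a positive one marks a value above it.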
b. 195.5
b. 23.5
m
z2 =
x
72
0 10 20 30 30 40 50 60 70 80 80
d. 1.5
c.
Ages of Akhiok Residents
85
-
Class
b. 28.5
c. The ages in the middle half of the data set vary by 28.5 years.
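The quartiles and interquartile range for the Akhiok ages can be verified with a short Python sketch. It assumes the textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data; the helper names are hypothetical, and other software may use slightly different quartile conventions.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # entries below the median position
    upper = data[(n + 1) // 2 :]    # entries above the median position
    return median(lower), median(data), median(upper)

# Ages of the 80 residents of Akhiok, as listed at the start of the chapter.
ages = [25, 5, 18, 12, 60, 44, 24, 22, 2, 7, 15, 39, 58, 53, 36, 42, 16, 20, 1, 5,
        39, 51, 44, 23, 3, 13, 37, 56, 58, 13, 47, 23, 1, 17, 39, 13, 24, 0, 39, 10,
        41, 1, 48, 17, 18, 3, 72, 20, 3, 9, 0, 12, 33, 21, 40, 68, 25, 40, 59, 4,
        67, 29, 13, 18, 19, 13, 16, 41, 19, 26, 68, 49, 5, 26, 49, 26, 45, 41, 19, 49]

q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```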
g 1 x 10a.
b. 17, 23, 28.5
c. One quarter of the tuition costs is $17,000 or less, one half is $23,000 or less, and three quarters is $28,500 or less.
g xf =
50
2a. Enter data.
= =
-
1.27
0.07
c. The number of field goals scored by Kansas City is 1.27 standard deviations below the mean and the number of field goals scored by Tampa Bay is 0.07 standard deviations above the mean. Comparing the two measures of position indicates that Tampa Bay has a higher position within the AFL than Kansas City has in the NFL.
A3
ODD ANSWERS
21. Class with greatest frequency: 500–550
CHAPTER 2
Classes with least frequency: 250–300 and 700–750
Section 2.1
(page 43)
23.
Frequ requen ency cy,,
1. Organizing the data into a frequency distribution may make patterns within the data more evident. 3. Class limits determine which numbers can belong to that class.
Class boundaries are the numbers that separate classes without forming gaps between them. 5. False. The midpoint of a class is the sum of the lower and upper limits of the class divided by two.
Class     Frequency, f    Midpoint    Relative frequency    Cumulative frequency
0–7            8             3.5            0.32                   8
8–15           8            11.5            0.32                  16
16–23          3            19.5            0.12                  19
24–31          3            27.5            0.12                  22
32–39          3            35.5            0.12                  25
           Σf = 25
7. True 9. (a) 10
=
1
(b) and (c)
Class     Midpoint    Class boundaries
20–29       24.5        19.5–29.5
30–39       34.5        29.5–39.5
40–49       44.5        39.5–49.5
50–59       54.5        49.5–59.5
60–69       64.5        59.5–69.5
70–79       74.5        69.5–79.5
80–89       84.5        79.5–89.5
25.
Class        Frequency, f    Midpoint    Relative frequency    Cumulative frequency
1000–2019        12           1509.5          0.5455                  12
2020–3039         3           2529.5          0.1364                  15
3040–4059         2           3549.5          0.0909                  17
4060–5079         3           4569.5          0.1364                  20
5080–6099         1           5589.5          0.0455                  21
6100–7119         1           6609.5          0.0455                  22
             Σf = 22
11.
Class     Frequency, f    Midpoint    Relative frequency    Cumulative frequency
20–29          10           24.5            0.01                  10
30–39         132           34.5            0.13                 142
40–49         284           44.5            0.29                 426
50–59         300           54.5            0.30                 726
60–69         175           64.5            0.18                 901
70–79          65           74.5            0.07                 966
80–89          25           84.5            0.03                 991
          Σf = 991
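The midpoint, relative frequency, and cumulative frequency columns of a table like the one for Exercise 11 can be generated from the class limits and frequencies alone. The following Python sketch assumes the classes and frequencies shown in that table; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies as read from the Exercise 11 table (assumed).
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
frequencies = [10, 132, 284, 300, 175, 65, 25]

n = sum(frequencies)                                   # total number of entries
midpoints = [(lo + hi) / 2 for lo, hi in classes]      # (lower + upper limit) / 2
relative = [f / n for f in frequencies]                # f / n for each class
cumulative = list(accumulate(frequencies))             # running total of f

for (lo, hi), f, m, r, c in zip(classes, frequencies, midpoints, relative, cumulative):
    print(f"{lo}-{hi}  {f:>4}  {m:>5.1f}  {r:.2f}  {c:>4}")
print("n =", n)
```

The last cumulative frequency always equals n, which is a quick consistency check on a hand-built table.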
Σ(f/n) = 1
13. (a) Number of classes = 7
(b) Least frequency ≈ 10
(c) Greatest frequency ≈ 300
(d) Class width = 10
15. (a) 50
(b) 12.5–13.5 pounds
17. (a) 24
(b) 19.5 pounds
19. (a) Class with greatest relative frequency: 8–9 inches
Class with least relative frequency: 17–18 inches
(b) Greatest relative frequency ≈ 0.195
Least relative frequency ≈ 0.005
(c) Approximately 0.015
=
22
Rela Relati tive ve Cum umul ulat ativ ive e freque frequency ncy freq freque uency ncy
g
f N
L
1
[Frequency histogram: July Sales for Representatives; horizontal axis: Sales (in dollars); vertical axis: Frequency]
Class with greatest frequency: 1000–2019
Classes with least frequency: 5080–6099 and 6100–7119
27.
31. Class
f
Mididpoin pointt
5
33–36
8
34.5
0.3077
8
0.1333
9
37–40
6
38.5
0.2308
14
360.5
0.1000
12
41–44
5
42.5
0.1923
19
5
388.5
0.1667
17
45–48
2
46.5
0.0769
21
403 – 430
6
416.5
0.2000
23
49–52
5
50.5
0.1923
26
431– 458
4
444.5
0.1333
27
459 – 486
1
472.5
0.0333
28
487–514
2
500.5
0.0667
30
Frequ requen ency cy,, Class
f
MidMidpoin pointt
291–318
5
304.5
0.1667
319 –346
4
332.5
347–374
3
375–402
g f
=
Rela Relati tive ve freq freque uenc ncy y
g
30
f n
Cum umul ulat ativ ive e freq freque uenc ncy y
Frequency, cy,
g f
=
Relat elativ ive e freq freque uenc ncy y
Cumu mula lati tiv ve freq freque uenc ncy y
g
26
f n
L
1
Heights of Douglas-Fir Trees =
1
y 0.35 c n e 0.30 u 0.25 q e r 0.20 f e 0.15 v i t a 0.10 l e R0.05
Reaction Times for Females 6
y c n 4 e u q e r 2 F
5 . 5 . 5 . 5 . 5 . 4 8 2 6 0 3 3 4 4 5
Heights (in feet) 5 . 5 . 5 . 5 . 5 . 5 . 5 . 5 . 4 2 0 8 6 4 2 0 0 3 6 8 1 4 7 0 3 3 3 3 4 4 4 5
Class with greatest relative frequency: 33–36 Class with least relative frequency: 45–48
Reaction times (in milliseconds)
33. Class
f
Rela Relativ tive e freq freque uency ncy
Cumu Cumula lativ tive e freq freque uency ncy
Class with least frequency: 459–486
50 – 53
1
0.0417
1
29.
54 – 57
0
0.0000
1
Class with greatest frequency: 403–430
Frequ requen ency cy,,
Frequ requen ency cy,,
Rela Relati tive ve freq freque uenc ncy y
Cumu Cumula lati tive ve freq freque uenc ncy y
58 – 61
4
0.1667
5
62 – 65
9
0.3750
14
Class
f
MidMidpoin pointt
146 –169
6
157.5
0. 0 .2308
6
66 – 69
7
0.2917
21
170 –193
9
181.5
0.3462
15
70 –73
3
0.1250
24
194 –217
3
205.5
0.1154
18
218 –241
6
229.5
0.2308
24
242 –265
2
253.5
0.0769
26
g f
=
26
g
f n
g f
=
g
24
f n
L
1
Retirement Ages L
1
Bowling Scores y 0.40 c n 0.35 e 0.30 u q 0.25 e r f e 0.20 v 0.15 i t a 0.10 l e R0.05
y25 c n e u20 q e r f 15 e v i t 10 a l u m 5 u C 49.5
57.5
65.5
73.5
Ages
5 . 7 5 1
5 . 1 8 1
5 . 5 0 2
5 . 9 2 2
Scores
5 . 3 5 2
Class with greatest relative frequency: 170–193 Class with least relative frequency: 242–265
Location of the greatest increase in frequency: 62–65
35.
Freq Freque uenc ncy y, Class
f
Rela Relativ tive e freq freque uency ncy
2–4
9
0.3214
9
5–7
6
0.2143
15
8 –10
7
0.2500
22
11 –13
3
0.1071
25
14 –16
2
0.0714
27
17 –19
1
0.0357
28
g f
=
g
28
f n
L
Cumu Cumula lativ tive e freq freque uency ncy
39. (a)
5 . 5 . 5 . 5 . 5 . 5 . 5 . 5 . 3 9 5 1 7 3 9 5 6 6 7 8 8 9 9 0 1
Dollars (in hundreds)
(b) 16.7%, because the sum of the relative frequencies for the last three classes is 0.167.
1
(c) $9600, because the sum of the relative frequencies for the last two classes is 0.10. 41.
y30 c n e 25 u q e r 20 f e 15 v i t a 10 l u m 5 u C 7.5
13.5
Daily Withdrawals y c 0.35 n e 0.30 u q0.25 e r 0.20 f e 0.15 v i t 0.10 a l e R0.05
Gallons of Gasoline Purchased
1.5
Histogram (5 Classes) 8 7 y 6 c n 5 e u 4 q e r 3 F 2 1
19.5
Histogram (10 Classes) 6 5
y c 4 n e u 3 q e r F2 1
Gasoline (in gallons) 2
5
8
11
14 14
1.5 5.5
Data
Location of the greatest increase in frequency: 2–4 37.
9.5 13.5 13.5 17 17.5 .5
Data
Histogram (20 Classes)
Frequ requen ency cy,, Class
f
MidMidpoin pointt
47 – 57
1
52
0.05
1
58 – 68
1
63
0.05
2
69 –79
5
74
0.25
7
80 – 90
8
85
0.40
15
91 –101
5
96
0.25
20
g f
=
20
Rela Relati tive ve freq freque uenc ncy y
Cum umul ulat ativ ive e freq freque uenc ncy y
g
f N
=
1
Exam Scores 10
y 8 c n e u 6 q e r F 4 2 41 52 63 74 85 96 107
5
y 4 c n e 3 u q e r 2 F 1 1 3 5 7 9 11 13 15 1719
Data
In general, a greater number of classes better preserves the actual values of the data set but is not as helpful for observing general trends and making conclusions. In choosing the number of classes, an important consideration is the size of the data set. For instance, you would not want to use 20 classes if your data set contained 20 entries. In this particular example, as the number of classes increases, the histogram shows more fluctuation. The histograms with 10 and 20 classes have classes with zero frequencies. Not much is gained by using more than five classes. Therefore, it appears that five classes would be best.
Scores
Class with greatest frequency: 80–90 Classes with least frequency: 47–57 and 58–68
Section 2.2
(page 56)
1. Quantitative: stem-and-leaf plot, dot plot, histogram, scatter plot, time series chart
Qualitative: pie chart, Pareto chart 3. a
4. d
5. b
6. c
7. 27, 32, 41, 43, 43, 44, 47, 47, 48, 50, 51, 51, 52, 53, 53, 53, 54, 54, 54, 54, 55, 56, 56, 58, 59, 68, 68, 68, 73, 78, 78, 85
Max: 85; Min: 27
9. 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 17, 17, 18, 19
25. y r 55 a l 50 a s s 45 ’ r e 40 h c 35 a e t . 30 g v 25 A
Max: 19; Min: 13 11. Anheuser-Busch spends the most on advertising and Honda spends the least. (Answers will vary.) 13. Tailgaters irk drivers the most, and too-cautious drivers irk drivers the least. (Answers will vary.) 15. Key: 3 | 3 = 33
13
3 233459 4 0 1 1 34 55 6 67 8 5 133 6
0069
17. Key: 17 ƒ 5
=
It appears that most farmers charge 17 to 19 cents per pound of apples. (Answers will vary.)
18 1 3 4 4 6 6 6 9 19 0 0 2 3 3 5 6 19.
18
Life span (in days)
It appears that the life span of a housefly tends to be between 4 and 14 days. (Answers will vary.) 21.
s ) g n 1.35 g e e z o A d 1.25 e r d e 1.15 a r p s 1.05 G r a f l l o o 0.95 e d c n i r i 0.85 P (
Housefly Life Spans
4 5 6 7 8 9 10 11 12 13 14
Science, aeronautics, and exploration 49.5% Space flight capabilities 50.3%
Inspector General 0.2%
19
21
Price of Grade A Eggs
0 1 2 3 4 5 6 7 8 9 0 1 9 9 9 9 9 9 9 9 9 9 0 0 9 9 9 9 9 9 9 9 9 9 0 0 1 1 1 1 1 1 1 1 1 1 2 2
Year
It appears the price of eggs peaked in 1996. (Answers will vary.) 29. (a) When data are taken at regular intervals over a period of time, a time series chart should be used. (Answers will vary.)
(b)
2004 NASA Budget
17
It appears that a teacher’s average salary decreases as the number of students per teacher increases. (Answers will vary.) 27.
17 1 1 3 4 5 5 6 7 9
15
Students per teacher
It appears that most elephants tend to drink less than 55 gallons of water per day. (Answers will vary.)
17.5
16 4 8
20
Teachers’ Salaries
Sales for Company A ) 130 s r a l l 120 o d f s e 110 l o a s S d n100 a s u o 90 h t ( 1st
2nd 2n d
3rd
4th
Quarter
It appears that 50.3% of NASA’s budget went to space flight capabilities. (Answers will vary.) 23.
Ultraviolet Index
Section 2.3
(page 67)
1. False. The mean is the measure of central tendency most likely to be affected by an extreme value (or outlier).
10
x e d 8 n 6 i V4 U
3. False. All quantitative data sets have a median.
Miami, FL   Atlanta, GA   Concord, NH   Boise, ID   Denver, CO
It appears that Boise, ID, and Denver, CO, have the same UV index. (Answers will vary.)
5. A data set with an outlier within it would be an example. (Answers will vary.) 7. The shape of the distribution is skewed right because the bars have a “tail” to the right. 9. The shape of the distribution is uniform because the bars are approximately the same height. 11. (9), because the distribution of values ranges from 1 to 12 and has (approximately) equal frequencies. 13. (10), because the distribution has a maximum value of 90 and is skewed left owing to a few students’ scoring much lower than the majority of the students.
15. (a) x
L
6.2
median mode
47. =
Clas Classs
6
3
3.5
5– 6
8
5.5
7– 8
4
7.5
9 –10
2
9.5
11–12
2
11.5
13 –14
1
13.5
(b) Median, because the distribution is skewed. 17. (a) x
L
4.57
median mode
=
4.8
4.8
=
Freq Freque uency ncy,, f Midpoint
3– 4
5
=
(b) Median, because there are no outliers. 19. (a) x
L
g f
mode
=
90.3, 9 1. 1.8
=
mode
=
not po possible
“Worse”
=
5 . 5 . 5 . 5 . 5 . 5 . 3 5 7 9 1 3 1 1
(b) Mode, Mode, becau because se the data data are at at the nominal nominal level of of measurement. L
mode
=
169.3
none
=
(b) Mean, because there are no outliers. 25. (a) x
=
22.6
median mode
=
Days hospitalized
49.
170.63
median
19
Clas Classs
Freq Freque uenc ncy y, f Midpoint
62 – 64
3
63
65 – 67
7
66
68 –70
9
69
71–73
8
72
74 –76
3
75
14
=
Positively skewed
8 7 y 6 c 5 n e 4 u q 3 e r F2 1
not possible
median
23. (a) x
20
Hospitalization
92.9
(b) Median Median,, becau because se the distri distributio bution n is skewed. skewed. =
=
93.81
median
21. (a) x
g f
=
30
(b) Median, because the distribution is skewed. 27. (a) x
L
Heights of Males
14.11
median mode
=
14.25
2.5
=
(b) Mean, because there are no outliers. 29. (a) x
=
41.3
median mode
=
y c n e u q e r F
9 8 7 6 5 4 3 2 1
39.5
63
66
69
72
75
Heights (to the nearest inch)
45
=
Symmetric
(b) Median, because the distribution is skewed. 31. (a) x
L
19.5
median mode
=
51. (a) x
=
20
15
=
(b) x 6.01
=
5.945
median
=
6.01
(c) Mean
33. A = mode, because it’s the data entry that occurred most often.
B = median, because the distribution is skewed right.
C = mean, because the distribution is skewed right.
35. Mode, because the data are at the nominal level of measurement. 37. Mean, because there are no outliers. 41. 2.8
6.005
median
(b) Median, because the distribution is skewed.
39. 89.3
=
43. 65.5
45. 35.0
53. (a) Mean, because Car A has the highest mean of the three.
(b) Median, because Car B has the highest median of the three. (c) Mode, because Car C has the highest mode of the three.
55. (a) x
L
49.2
(c) Key: 3 ƒ 6
(b) median
=
46.5
23. (a) Greatest sample standard deviation: (ii)
1 13
Data set (ii) has more entries that are farther away from the mean.
2 28
Least sample standard deviation: (iii)
3 6667778
Data set (iii) has more entries that are close to the mean.
=
36
(d) Positively skewed
4 13467
mean
(b) The three three data sets have have the same mean but but have different standard deviations.
5 1113 6 1234
median
25. (a) Greatest sample standard deviation: (ii)
7 2246 8 5
Data set (ii) has more entries that are farther away from the mean.
9
Least sample standard deviation: (iii)
0
57. Two different symbols are needed because they describe a measure of central tendency for two different sets of data (sample is a subset of the population).
Section 2.4
(page 84)
1. Range = 7, mean = 8.1, variance ≈ 5.7, standard deviation ≈ 2.4 3. Range = 14, mean ≈ 11.1, variance ≈ 21.6, standard deviation ≈ 4.6
Data set (iii) has more entries that are close to the mean. (b) The three data sets have the same mean, median, and mode but have different standard deviations. 27. Similarity: Both estimate proportions of the data contained within k standard deviations of the mean.
Difference: The Empirical Rule assumes the distribution is bell shaped; Chebychev’s Theorem makes no such assumption.
5. 73
7. The range is the difference between the maximum and minimum values of a data set. The advantage of the range is that it is easy to calculate. The disadvantage is that it uses only two entries from the data set.
9. The units of variance are squared. Its units are meaningless. (Example: dollars²)
11. (a) Range = 25.1
(b) Range = 45.1
(c) Changing the maximum value of the data set greatly affects the range.
13. (a) has a standard deviation of 24 and (b) has a standard deviation of 16, because the data in (a) have more variability.
15. When calculating the population standard deviation, you divide the sum of the squared deviations by N, then take the square root of that value. When calculating the sample standard deviation, you divide the sum of the squared deviations by n − 1, then take the square root of that value.
17. Company B
19. (a) Los Angeles: 17.6, 37.35, 6.11
Long Beach: 8.7, 8.71, 2.95
(b) It appears from the data that the annual salaries in Los Angeles are more variable than the salaries in Long Beach.
21. (a) Males: 405; 16,225.3; 127.4
Females: 552; 34,575.1; 185.9
(b) It appears from the data that the SAT scores for females are more variable than the SAT scores for males.
29. 68%
31. (a) 51 (b) 17
33. $1250, $1375, $1450, $550
35. 24
37. Sample mean ≈ 2.1
Sample standard deviation ≈ 1.3
39. Class width = (Max − Min)/5 = (14 − 4)/5 = 2
Class     f     Midpoint, x     xf      x − μ     (x − μ)²     (x − μ)² f
4–5      10        4.5         45.0     −3.7       13.69        136.90
6–7       6        6.5         39.0     −1.7        2.89         17.34
8–9       3        8.5         25.5      0.3        0.09          0.27
10–11     7       10.5         73.5      2.3        5.29         37.03
12–14     6       13.0         78.0      4.8       23.04        138.24
       N = 32               Σxf = 261                      Σ(x − μ)² f = 329.78
μ = Σxf / N = 261/32 ≈ 8.2
σ = √(Σ(x − μ)² f / N) = √(329.78/32) ≈ 3.2
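The grouped-data mean and population standard deviation for Exercise 39 can be approximated from the class midpoints and frequencies. The Python sketch below assumes the midpoints 4.5, 6.5, 8.5, 10.5, 13.0 and frequencies 10, 6, 3, 7, 6 from that table; it keeps the unrounded mean, so intermediate sums differ slightly from a hand calculation that rounds μ to 8.2 first.

```python
import math

# Class midpoints and frequencies as read from the Exercise 39 table (assumed).
midpoints = [4.5, 6.5, 8.5, 10.5, 13.0]
frequencies = [10, 6, 3, 7, 6]

N = sum(frequencies)                                          # number of entries
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))   # sum of x * f
mu = sum_xf / N                                               # approximate mean

# Sum of squared deviations, each weighted by its class frequency.
sum_sq = sum((x - mu) ** 2 * f for x, f in zip(midpoints, frequencies))
sigma = math.sqrt(sum_sq / N)                                 # population standard deviation

print(N, sum_xf, round(mu, 1), round(sigma, 1))
```

Because every entry in a class is treated as if it sat at the midpoint, the result is an approximation of the true mean and standard deviation of the raw data.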
41. Midpoint, f
n
x
xf
x
1 x x 22 1 x x 22f
x
1
70.5
70.5
-
44
1936
1936
12
92.5
11 1110.0
-
22
484
5808
25
114.5
2862.5
0
0
0
10
136.5
13 1 365.0
22
4 84
4840
2
158.5
317.0
44
1936
3872
=
x s
43.
g xf = 5725
50
g xf =
n
C
g 1x
=
5725 50
=
-
n
1
(b) x
=
5500, s
L
(c) x
=
55 , s
30.28
Set 1
1
-
k2
Section 2.5 =
(page 100)
4.5, Q2
L
18.33
1
Midpoint, x
0
xf
1
2
3
16.9
15.5
261.95
18 – 24
29.8
21.0
625.80
25 – 34
38.3
29.5
1129.85
35 – 44
40.0
39.5
1580.00
45 – 64
78.3
54.5
4267.35
11. (a) Min
65+
39.0
70.0
2730.00
(b) Max
=
= 10,951.55
(c) Q1
=
(d) Q2 1 x x 22f
(e) Q3
24,127.36 27,243.04
(f ) IQR
- 27.82
1212.43 773.95
15. (a) Min
- 21.32
454.54
7,681.73
(b) Max
- 15.82
250.27
7,458.05
53.58
2,052.11
(c) Q1 (d) Q2
= -
- 7.32
=
0.1
2.68
7.18
287.20
0.7
312.58
24,475.01
(e) Q3
=
17.68 33.18
1100.91
42,935.49
- 34.82
g xf =
=
n
C
g 1x
=
g xf
297.4
g 1 x
n
-
2
x 2 f
10,951.55 297.4 -
2
x2 f
1
=
=
3.44 # 100 72.75
CVweights
=
18.47 # 100 187.83
L
L
21.44
4.73 L
5
6
7.5
7
9
8
9
7. True 9. False. The 50th percentile is equivalent to Q2.
19. (a) Q1
A
4
6
10
13. (a) Min
20
(b) Max
=
13
(c) Q1
=
1250
=
15
(d) Q2
=
1500
=
17
(e) Q3
=
1950
=
= = =
=
4 -
(f ) IQR
=
=
900 2100
700
1.9
2.1 0.5
1.2
17. Q1 = B, Q2 = A, Q3 = C, because about one quarter of the data fall on or below 17, 18.5 is the median of the entire data set, and about three quarters of the data fall on or below 20.
36.82 136,259.99 296.4
45. CVheights
7.5
5. The student scored above 63% of the students who took the ACT placement test.
(f ) IQR
= 136,259.99
L
=
3. The basketball team scored more points per game than 75% of the teams in the league.
14 –17
1 x x 22
6, Q3
=
4.5
39.80 316.80
x
0.99 and solve for k.
=
2.0 9.0
L
3028
49. 10
19.9 35.2
=
302.8
L
(d) When each entry is multiplied by a constant k, the new sample mean is k · x̄, and the new sample standard deviation is k · s.
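This scaling property is easy to confirm numerically. The sketch below uses a small hypothetical sample (not data from the exercise) and a positive constant k = 10; for a negative k the standard deviation would scale by |k|.

```python
import math
import statistics

data = [4, 7, 9, 12, 18]    # hypothetical sample, for illustration only
k = 10                      # positive scaling constant

scaled = [k * x for x in data]

mean_original = statistics.mean(data)
sd_original = statistics.stdev(data)      # sample standard deviation (n - 1)
mean_scaled = statistics.mean(scaled)
sd_scaled = statistics.stdev(scaled)

print(mean_original, mean_scaled)
print(sd_original, sd_scaled)
```

Both printed pairs differ by exactly the factor k, up to floating-point rounding.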
0 –4 5 –13
x
s
= 16,456
550, s
(b)
16,456 49
=
f
n
x
2
x 2 f
=
1. (a) Q1
A
x2 f
Class
-
114.5
=
2
-
g 1 x
47. (a) x
9.83
It appears that weight is more variable than height.
(b)
=
2, Q2
=
4, Q3
=
5
Watching Television
0 0
2 1
2
4 5 3
4
5
Hours
9 6
7
8
9
21. (a) Q1
(b)
=
3.2, Q2
3.65, Q3
=
=
3.9
39. (a) Q1
(b)
Butterfly Wingspans
3
4
42, Q2
27
5
(b) 50%
25. A : z
=
-
25
:
z
=
0
C
:
z
=
2.14
35
45
55
82 65
75
85
(d) 49, because half of the executives are older and half are younger. 41. 33.75
27. (a) Statistics: z
73
=
7 26
=
63
-
-
23
3.9
L
L
1.43 0.77
(b) The student did better on the statistics test. 29. (a) Statistics: z
Biology: z
78
=
=
63
-
7 29
-
23
3.9
L
L
43. 19.8
Uses and Abuses for Chapter 2
2. The salaries of employees at a business could contain an outlier.
The median is not affected by an outlier because the median does not take into account the outlier’s numerical value.
1.54
None of the selected tires have unusual life spans. (b) For 30,500, 2.5th percentile
Review Answers for Chapter 2
Class     Midpoint    Boundaries    Frequency, f    Rel freq    Cum freq
20–23       21.5      19.5–23.5          1            0.05          1
24–27       25.5      23.5–27.5          2            0.10          3
28–31       29.5      27.5–31.5          6            0.30          9
32–35       33.5      31.5–35.5          7            0.35         16
36–39       37.5      35.5–39.5          4            0.20         20
                                      Σf = 20
For 35,000, 50th percentile 33. About 67 inches; 20% of the heights are below 67 inches. 35. z1
=
z2
=
z3
=
69.2 2.9
L
69.2 2.9
L -
62
-
80
-
69.2 2.9
1.66 2.48
3.72
The heights that are 62 and 80 inches are unusual. 37. z
=
71.1
-
2.9
69.2
3.
Liquid Volume 12-oz Cans
y10 c n 8 e u 6 q e r F 4 2
L
L
0.66
About the 70th percentile
(page 107)
1.
For 37,250, 84th percentile
-
(page 105)
1. Answers will vary.
2.14
(b) The student did better on the statistics test.
31. (a) z1 = (34,000 − 35,000)/2250 ≈ −0.44
z2 = (37,000 − 35,000)/2250 ≈ 0.89
z3 = (31,000 − 35,000)/2250 ≈ −1.78
74
56
(c) Half of the ages are between 42 and 56 years.
(c) 25%
A z-score of 2.14 would be unusual.
Biology: z
=
Ages
1.43
B
49, Q3
42 49 56
Wingspan (in inches)
23. (a) 5
=
Ages of Executives
2.8 3.2 3.65 3.9 4.6 2
=
5 7 8 . 1 1
5 1 9 . 1 1
5 5 5 9 9 . 9 . 1 1 1 1
5 5 3 7 0 . 0 . 2 2 1 1
5 1 1 . 2 1
Actual volume (in ounces)
=
20
g
f n
=
Cum freq
1
5.
Class lass
Midp idpoin oint
79 –93
31. Between $21.50 and $36.50
Frequ equency ency,, f
33. 30
86
9
94 –108
101
12
109 –123
11 1 16
5
124 –138
13 1 31
3
37. 56
139 –153
14 1 46
2
43. 23% scored higher than 68.
154 –168
16 1 61
1
45. z
g f
=
35. Sample mean
32
39. 14
=
41. 4 47. z
2 1 6 1 6 1 6 1 6 7 8 0 1 3 4 6 7 1 1 1 1 1 1
Number of meals
3789
Class      Midpoint    Class boundaries    Frequency, f    Rel freq    Cum freq
101–112     106.5       100.5–112.5             3            0.12          3
113–124     118.5       112.5–124.5            11            0.44         14
125–136     130.5       124.5–136.5             7            0.28         21
137–148     142.5       136.5–148.5             2            0.08         23
149–160     154.5       148.5–160.5             2            0.08         25
3 11234578
Frequency, f
10
y c 8 n e 6 u q e r 4 F
1 Height of Buildings
2
60
s55 e i r50 o t s45 f o40 r35 e b 30 m u25 N
5 . 5 . 4 6 9 0 1
5 . 5 . 5 . 5 . 8 0 2 4 1 3 4 5 1 1 1 1
Minutes
5 . 6 6 1
Rel freq freq
Cum freq freq
(c) Relative frequency histogram
Weekly Exercise
4 347
Weekly Exercise y c n0.40 e u0.32 q e r 0.24 f e v i t 0.16 a0.08 l e R
5 . 6 0 1
5 . 8 1 1
5 . 0 3 1
5 . 2 4 1
Minutes
5 . 4 5 1
(d) Skewed
20
(e)
400 40 0 50 500 0 60 600 0 70 700 0 80 800 0
(f ) Weekly Exercise
Weekly Exercise
Height (in feet)
The number of stories appears to increase with height. d e ) 160 r s e d t s n 140 i 120 g a s e r u 100 o 80 r h e b t 60 n 40 m i u ( 20 N
101 117.5 123 131.5
American Kennel Club
=
Median =
19. Skewed
157
100 110 120 130 140 150 160
Minutes
r r n r o e e e v d v d e l e a i r r o i r b t t a r e G r e L
e l g a e B
n r a d e m h r e p e G h s
r d e r i e n i r u h r s h e s k t r h o c a Y D
8.6 =
15. 31.7
3. (a) Footwear 18%
9
27. Population mean
=
Standard deviation 29. Sample mean
=
3.22 3.
2453.4
Standard deviation
25. 2.8
Clothing 13%
9 L
L
306. 30 6.11
5 . 2 4 1
5 . 4 5 1
(b) U.S. Sporting Goods
17. 79.5
23. Median
5 . 5 . 5 . 5 . 4 6 8 0 9 0 1 3 1 1 1
2. 125. 125.2, 2, 13. 13.00
9 21. Skewed left
y c25 n e u q20 e r f e15 v i t 10 a l u m 5 u C
Minutes
r e x o B
Breed
Mode
(page 111)
Midpoin pointt
(b) Frequency histogram and polygon
2 0 12 33 34 45 55 78 89
13. Mean
1.25, not unusual
1. (a)
12
11.
=
Chapter Quiz for Chapter 2
y10 c n e 8 u q 6 e r F 4
9.
1. 2
L
2.33, unusual
14
5
2.5
Standard deviation
Meals Purchased
7. 1
L
Equipment 28%
Recreational transport 41%
U.S. Sporting Goods ) s r a l l 32 o d 30 s f e o 24 l a s 18 S n o 12 i l l 6 i b n l r i r t a t ( n g a n o e o p i s m t n i p a a e t r u r q c e E R
n i h t o l C
Sales area
e w t o o F
4. (a) 751.6, 784.5, none
The mean best describes a typical salary because there are no outliers. (b) 575; 48,135.1; 219.4 5. Between $125,000 and $185,000 6. (a) z
=
3.0, un unusual
(b) z
L
-
(c) z
L
(d) z
=
1.33 2.2 , unusual
7. (a) 71, 84.5, 90
(b) 19 (c)
Wins for Each Team
71 84.5 90 101
43 40
50
60
70
80
90 10 100
Number of wins
Real Statistics–Real Decisions for Chapter 2 (page 112)
1. (a) Find the average price of automobile insurance for each city and do a comparison.
(b) Find the mean, range, and population standard deviation for each city. 2. (a) Construct a Pareto chart because the data in use are quantitative and a Pareto chart positions data in order of decreasing height, with the tallest bar positioned at the left.
(b) e c n ) 2200 a r s r u a 2000 s l n l i o 1800 d f o n 1600 i e ( c i 1400 r P
Price of Insurance per City
A y t i C
B y t i C
D y t i C
C y t i C
City
(c) Yes. Fr From om the Pareto chart chart you can can see that City A has the highest average automobile insurance premium followed by City City B, B, City D, D, and City C. C. 3. (a (a)) Fin ind d the the mean, mean, ran range ge,, an and d popu popula lati tion on sta standa ndard rd deviation for each city city..
(b)
City A
City B
x
=
$2191.00
x
s
L
$351.86
s
range
=
$1015.00
=
$1772.00
x
s
L
$418.52
s
=
$1347.00
L
$2029.20 $437.54 =
$1336.00
City D
x
range
=
range
City C
4. (a) Tell your your readers readers that on on average, average, the price price of of automobile insurance premiums is higher in this city than in other cities.
(b) Loc Locati ation, on, wea weather ther,, pop popula ulatio tion n
6.67, very unusual
-
(c) Yes. City A has the highest highest mean mean and lowest lowest range range and standard deviation.
= L
range
$1909.30 $361.14 =
$1125.00
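The comparisons in answers 2(c) and 3(c) can be checked directly from the summary statistics in 3(b). The following is only a sketch: the dictionary layout and variable names are illustrative choices, and the figures are the city summaries listed in the answer, not raw premium data.

```python
# Summary statistics from answer 3(b), in dollars.
# Each tuple is (mean, standard deviation, range); the layout is an assumption
# made for this illustration.
stats = {
    "City A": (2191.00, 351.86, 1015.00),
    "City B": (2029.20, 437.54, 1336.00),
    "City C": (1772.00, 418.52, 1347.00),
    "City D": (1909.30, 361.14, 1125.00),
}

# A Pareto chart orders the bars from tallest to shortest, so sort by mean.
by_mean = sorted(stats, key=lambda city: stats[city][0], reverse=True)

# City with the least variation, measured two ways.
lowest_sd = min(stats, key=lambda city: stats[city][1])
lowest_range = min(stats, key=lambda city: stats[city][2])

print("Pareto order:", by_mean)
print("Lowest standard deviation:", lowest_sd)
print("Lowest range:", lowest_range)
```

Sorting by the mean gives the bar order described for the Pareto chart, and the two `min` calls identify the city with the smallest spread.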
SELECTED ANSWERS
Selected Answers
CHAPTER 1
Section 1.1
28. Parameter. 12% is a numerical description of all new magazines.
36. (a) An inference drawn from the sample is that the number of people who have strokes has increased every year for the past 15 years.
(b) This inference implies the same trend will continue for the next 15 years.
Section 1.3
2. False. A census is a count of an entire population.
6. Use sampling because it would be impossible to ask every consumer whether he or she would still buy a product with a warning label.
8. Take a census because the U.S. Congress keeps records on the ages of its members.
10. Stratified sampling is used because the persons are divided into strata and a sample is selected from each stratum.
12. Cluster sampling is used because the disaster area was divided into grids and 30 grids were then entirely selected. Certain grids may have been much more severely damaged than others, so this is a possible source of bias.
14. Systematic sampling is used because every twentieth engine part is sampled. It is possible for bias to enter into the sample if, for some reason, the assembly line performs differently on a consistent basis.
18. Simple random sampling is used because each telephone has an equal chance of being dialed and all samples of 1012 phone numbers have an equal chance of being selected. The sample may be biased because only homes with telephones have a chance of being sampled.
20. Sampling. The population of cars is too large to easily record their color. Cluster sampling is advised because it would be easy to randomly select car dealerships then record the color for every car sold at the selected dealerships.
26. Stratified sampling ensures that each segment of the population is represented.
28. (a) Advantage: Usually results in a savings in the survey cost.
(b) Disadvantage: There tends to be a lower response rate and this can introduce a bias into the sample. Sampling technique: Convenience sampling
Review Answers for Chapter 1
28. Convenience sampling is used because of the convenience of surveying people leaving one restaurant.
30. Because of the convenience sample taken, the study may be biased toward the opinions of the student’s friends.
32. In heavy interstate traffic, it may be difficult to identify every tenth car that passed the law enforcement official.
CHAPTER 2
Section 2.1
10. (a) 5
(b) and (c)
Class     Midpoint   Class boundaries
16–20     18         15.5–20.5
21–25     23         20.5–25.5
26–30     28         25.5–30.5
31–35     33         30.5–35.5
36–40     38         35.5–40.5
41–45     43         40.5–45.5
46–50     48         45.5–50.5
12.
Class     Frequency, f   Midpoint   Relative frequency   Cumulative frequency
16–20     100            18         0.03                 100
21–25     122            23         0.04                 222
26–30     900            28         0.30                 1122
31–35     207            33         0.07                 1329
36–40     795            38         0.26                 2124
41–45     568            43         0.19                 2692
46–50     322            48         0.11                 3014
          Σf = 3014                 Σ(f/n) = 1
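The midpoint, relative frequency, and cumulative frequency columns in answer 12 all follow from the class limits and frequencies. A minimal sketch of that calculation is shown here; the function name `frequency_table` and the dictionary keys are illustrative, not taken from the text.

```python
# Class limits and frequencies from answer 12.
classes = [(16, 20), (21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
freqs = [100, 122, 900, 207, 795, 568, 322]


def frequency_table(classes, freqs):
    """Return one row per class with midpoint, relative and cumulative frequency."""
    n = sum(freqs)  # sample size, the sum of the frequencies
    rows = []
    running_total = 0
    for (lower, upper), f in zip(classes, freqs):
        running_total += f
        rows.append({
            "class": f"{lower}-{upper}",
            "f": f,
            "midpoint": (lower + upper) / 2,   # (lower limit + upper limit) / 2
            "relative": f / n,                 # class frequency / sample size
            "cumulative": running_total,       # sum of frequencies up to this class
        })
    return rows


table = frequency_table(classes, freqs)
for row in table:
    print(f'{row["class"]:>6} {row["f"]:>5} {row["midpoint"]:>5.0f} '
          f'{row["relative"]:>5.2f} {row["cumulative"]:>5}')
```

The last cumulative frequency equals the sample size, which is a quick check that no class was skipped.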
24.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
30–113     5              71.5       0.1724               5
114–197    7              155.5      0.2414               12
198–281    8              239.5      0.2759               20
282–365    2              323.5      0.0690               22
366–449    3              407.5      0.1034               25
450–533    4              491.5      0.1379               29
           Σf = 29                   Σ(f/n) ≈ 1

26.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
32–35      3              33.5       0.1250               3
36–39      9              37.5       0.3750               12
40–43      8              41.5       0.3333               20
44–47      3              45.5       0.1250               23
48–51      1              49.5       0.0417               24
           Σf = 24                   Σ(f/n) ≈ 1
[Frequency histogram: Pungencies of Peppers; horizontal axis: Pungencies (in 1000s of Scoville units), midpoints 33.5 to 49.5; vertical axis: Frequency]
Class with greatest frequency: 36–39
Class with least frequency: 48–51

28.
Class        Frequency, f   Midpoint   Relative frequency   Cumulative frequency
2456–2542    7              2499       0.28                 7
2543–2629    3              2586       0.12                 10
2630–2716    2              2673       0.08                 12
2717–2803    4              2760       0.16                 16
2804–2890    9              2847       0.36                 25
             Σf = 25                   Σ(f/n) = 1
[Frequency histogram: Pressure at Fracture Time; horizontal axis: Pressure (in pounds per square inch), midpoints 2499 to 2847; vertical axis: Frequency]
Class with greatest frequency: 2804–2890
Class with least frequency: 2630–2716

30.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
10–23      11             16.5       0.3438               11
24–37      9              30.5       0.2813               20
38–51      6              44.5       0.1875               26
52–65      2              58.5       0.0625               28
66–80      4              72.5       0.1250               32
           Σf = 32                   Σ(f/n) = 1
[Relative frequency histogram: ATM Withdrawals; horizontal axis: Dollars, midpoints 16.5 to 72.5; vertical axis: Relative frequency]
Class with greatest relative frequency: 10–23
Class with least relative frequency: 52–65

32.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
7–8        7              7.5        0.28                 7
9–10       8              9.5        0.32                 15
11–12      6              11.5       0.24                 21
13–14      3              13.5       0.12                 24
15–16      1              15.5       0.04                 25
           Σf = 25                   Σ(f/n) = 1
[Relative frequency histogram: Acres on Small Farms; horizontal axis: Acres, midpoints 7.5 to 15.5; vertical axis: Relative frequency]
Class with greatest relative frequency: 9–10
Class with least relative frequency: 15–16

34.
Class      Frequency, f   Relative frequency   Cumulative frequency
16–22      2              0.10                 2
23–29      3              0.15                 5
30–36      8              0.40                 13
37–43      5              0.25                 18
44–50      0              0.00                 18
51–57      2              0.10                 20
           Σf = 20        Σ(f/n) = 1
[Ogive: Daily Saturated Fat Intake; horizontal axis: Daily saturated fat intake (in grams), 15.5 to 57.5; vertical axis: Cumulative frequency]
Location of the greatest increase in frequency: 30–36

36.
Class      Frequency, f   Relative frequency   Cumulative frequency
1–5        5              0.2083               5
6–10       9              0.3750               14
11–15      3              0.1250               17
16–20      4              0.1667               21
21–25      2              0.0833               23
26–30      1              0.0417               24
           Σf = 24        Σ(f/n) ≈ 1
[Ogive: Length of Long-Distance Phone Calls; horizontal axis: Length of call (in minutes), 0.5 to 30.5; vertical axis: Cumulative frequency]
Location of the greatest increase in frequency: 6–10

38.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
0–2        17             1          0.4048               17
3–5        16             4          0.3810               33
6–8        7              7          0.1667               40
9–11       1              10         0.0238               41
12–14      0              13         0.0000               41
15–17      1              16         0.0238               42
           Σf = 42                   Σ(f/n) ≈ 1
[Frequency histogram: Number of Children of First 42 Presidents; horizontal axis: Number of children, midpoints 1 to 16; vertical axis: Frequency]
Class with greatest frequency: 0–2
Class with least frequency: 12–14

40. (a) [Relative frequency histogram: SAT Scores; horizontal axis: SAT scores; vertical axis: Relative frequency]
(b) 48%, because the sum of the relative frequencies for the last four classes is 0.48.
(c) 698, because the sum of the relative frequencies for the last seven classes is 0.88.

Section 2.2
18. [Graph: Advertisements; horizontal axis: Number of ads, 150 to 850]
It appears that most of the 30 people from the United States see or hear between 450 and 750 advertisements per week. (Answers will vary.)
22. [Chart: 2003 NASA Space Shuttle Expenditures; vertical axis: Dollars (in millions), 100 to 700; horizontal axis: Operations; categories include Vehicle and extravehicular activity, Main engine, Flight hardware upgrades, Solid rocket booster]
The greatest NASA space shuttle operations expenditures in 2003 were for vehicle and extravehicular activity; the least were for solid rocket booster. (Answers will vary.)
26. [Time series chart: Ultraviolet Index; vertical axis: UV index, 2 to 10; horizontal axis: Date in June, 14 to 23]
Of the period from June 14 to 23, the ultraviolet index was highest from June 16 to 21 in Memphis, TN. (Answers will vary.)
28. [Time series chart: Price of T-Bone Steak; vertical axis: Price of steak (in dollars per pound), 5.00 to 7.50; horizontal axis: Year, 1990 to 2001]
It appears that the price of a T-bone steak steadily increased from 1991 to 2001.
30. (a) The pie chart should be displaying all four quarters, not just the first three.
(b) [Pie chart: Sales for Company B; 1st quarter 20%, 2nd quarter 15%, 3rd quarter 45%, 4th quarter 20%]

Section 2.3
10. The shape of the distribution is skewed left because the bars have a “tail” to the left.
12. (7), the distribution of values ranges from 20,000 to 100,000 and the distribution is skewed right owing to a few executives’ having much higher salaries.
14. (8), the distribution of values ranges from 80 to 160 and the distribution is basically symmetric.
32. (a) x̄ ≈ 213.4, median = 214, mode = 217
(b) Median, because the distribution is skewed.
34. A = mean, because the distribution is skewed left.
B = median, because the distribution is skewed left.
C = mode, because it’s the data entry that occurred most often.
50.
Class     Frequency, f
1         6
2         5
3         4
4         6
5         4
6         5
          Σf = 30
[Frequency histogram: Results of Rolling Six-Sided Die; horizontal axis: Number rolled, 1 to 6; vertical axis: Frequency]
Uniform

Section 2.4
40.
Class      f    Midpoint, x   xf       x − μ   (x − μ)²   (x − μ)²f
145–164    8    154.5         1236.0   -20     400        3200
165–184    7    174.5         1221.5   0       0          0
185–204    3    194.5         583.5    20      400        1200
205–224    1    214.5         214.5    40      1600       1600
225–244    1    234.5         234.5    60      3600       3600
           N = 20             Σxf = 3490.0                Σ(x − μ)²f = 9600
μ = Σxf / N = 3490 / 20 = 174.5
σ = √(Σ(x − μ)²f / N) = √(9600 / 20) ≈ 21.9

42.
x     f     xf    x − x̄    (x − x̄)²   (x − x̄)²f
0     1     0     -1.93    3.72       3.72
1     9     9     -0.93    0.86       7.74
2     13    26    0.07     0.00       0.00
3     5     15    1.07     1.14       5.70
4     2     8     2.07     4.28       8.56
      n = 30      Σxf = 58            Σ(x − x̄)²f = 25.72
x̄ = Σxf / n = 58 / 30 ≈ 1.9
s = √(Σ(x − x̄)²f / (n − 1)) = √(25.72 / 29) ≈ 0.9
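The grouped-data calculations in answers 40 and 42 of Section 2.4 differ only in the divisor: N for a population, n − 1 for a sample. The sketch below shows both under that assumption; the helper name `grouped_mean_sd` and its `sample` flag are illustrative and do not come from the text.

```python
import math


def grouped_mean_sd(values, freqs, sample=False):
    """Mean and standard deviation of data given as values (or class midpoints)
    with frequencies. Divides by n - 1 when sample=True, otherwise by N."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    sum_sq = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    divisor = n - 1 if sample else n
    return mean, math.sqrt(sum_sq / divisor)


# Answer 40: class midpoints and frequencies, treated as a population.
mu, sigma = grouped_mean_sd([154.5, 174.5, 194.5, 214.5, 234.5], [8, 7, 3, 1, 1])

# Answer 42: values 0 through 4 with their frequencies, treated as a sample.
x_bar, s = grouped_mean_sd([0, 1, 2, 3, 4], [1, 9, 13, 5, 2], sample=True)

print(round(mu, 1), round(sigma, 1))
print(round(x_bar, 1), round(s, 1))
```

The code carries the unrounded mean through the calculation, so its intermediate sum of squares for answer 42 differs slightly from the tabled 25.72, which is built from deviations rounded to two decimals; the rounded results agree.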
44.
Class        f      Midpoint, x   xf       x − x̄    (x − x̄)²    (x − x̄)²f
0.5–9.5      11.9   5             59.5     -39.25   1540.5625   18,332.69
10.5–19.5    12.1   15            181.5    -29.25   855.5625    10,352.31
20.5–29.5    14.0   25            350.0    -19.25   370.5625    5,187.88
30.5–39.5    18.5   35            647.5    -9.25    85.5625     1,582.91
40.5–49.5    16.6   45            747.0    0.75     0.5625      9.34
50.5–59.5    16.3   55            896.5    10.75    115.5625    1,883.67
60.5–69.5    17.8   65            1157.0   20.75    430.5625    7,664.01
70.5–79.5    12.4   75            930.0    30.75    945.5625    11,724.98
80.5–89.5    6.3    85            535.5    40.75    1660.5625   10,461.54
90.5–99.5    1.3    95            123.5    50.75    2575.5625   3,348.23
             n = 127.2            Σxf = 5628                    Σ(x − x̄)²f = 70,547.56
x̄ = Σxf / n = 5628 / 127.2 ≈ 44.25
s = √(Σ(x − x̄)²f / (n − 1)) = √(70,547.56 / 126.2) ≈ 23.64

Section 2.5
22. (a) Q1 = 15.125, Q2 = 15.8, Q3 = 17.65
(b) [Box-and-whisker plot: Railroad Equipment Manufacturers; five-number summary 13.8, 15.125, 15.8, 17.65, 19.45; horizontal axis: Hourly earnings (in dollars), 13.5 to 19.5]

Review Answers for Chapter 2
2. [Relative frequency histogram: Income of Employees; horizontal axis: Income (in thousands of dollars), midpoints 21.5 to 37.5; vertical axis: Relative frequency]
The class with the greatest relative frequency is 32–35 and that with the least is 20–23.
4. [Relative frequency histogram: Liquid Volume 12-oz Cans; horizontal axis: Actual volume (in ounces); vertical axis: Relative frequency]
6. [Ogive: Meals Purchased; horizontal axis: Number of meals; vertical axis: Cumulative frequency]
8. [Graph: Average Daily Highs; horizontal axis: Temperature (in °F), 12 to 52]
CHAPTER 3