Additional Terminology:
Lower Class Limit – The least value that can belong to a class.
Upper Class Limit – The greatest value that can belong to a class.
Class Width – The difference between the upper (or lower) class limits of consecutive classes. All classes should have the same class width.
Class Midpoint – The middle value of each data class. To find the class midpoint, average the upper and lower class limits:
$$\text{class midpoint} = \frac{\text{upper} + \text{lower}}{2}$$
Class Boundaries – The numbers that separate classes without forming gaps between them.
Range (of data) – The highest value minus the lowest value.
Ex: From the frequency table of statistics grades on p. 1:
The upper class limits are:
The lower class limits are:
The class midpoints are:
The class boundaries are:
The range is:
The width of each class is:
Creating a Frequency Table:
1. Decide on the number of data classes you wish to use.
2. Divide the range of the data by the number of classes to get an estimate of class width. (Round up.)
3. Decide on class upper and lower limits. (Start with the lowest data value if ascending, highest if descending. NOTE: if the lower limit is 5 and the class width is 2, the class will be 5 – 6.)
4. Construct the frequency table by counting the number of data values in each class. (Useful to make tally marks.)
Class Exercise:
Construct a frequency table (in ascending order) with 6 data classes from the following data set. (Leave space, we will be adding to this.)
Amount of gasoline purchased by 28 drivers: 7, 4, 18, 4, 9, 8, 8, 7, 6, 2, 9, 5, 9, 12, 4, 14, 15, 7, 10, 2, 3, 11, 4, 4, 9, 12, 5, 3
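The four steps for creating a frequency table can be sketched in Python. This is only one way to do it: the function name `frequency_table` is made up for illustration, and the widening rule for a range that divides evenly is an assumption that suits whole-number data like the gasoline amounts.

```python
import math

# Gasoline purchases for the 28 drivers in the class exercise.
purchases = [7, 4, 18, 4, 9, 8, 8, 7, 6, 2, 9, 5, 9, 12,
             4, 14, 15, 7, 10, 2, 3, 11, 4, 4, 9, 12, 5, 3]

def frequency_table(data, num_classes):
    """Return ((lower, upper), frequency) rows in ascending order."""
    data_range = max(data) - min(data)
    # Step 2: estimate the class width and round up.
    width = math.ceil(data_range / num_classes)
    # Assumption for whole-number data: if the highest value would
    # fall past the last class, widen every class by one.
    if min(data) + width * num_classes <= max(data):
        width += 1
    rows = []
    lower = min(data)  # Step 3: start with the lowest data value.
    for _ in range(num_classes):
        upper = lower + width - 1  # lower limit 5, width 2 gives 5-6
        count = sum(lower <= x <= upper for x in data)  # Step 4
        rows.append(((lower, upper), count))
        lower += width
    return rows

table = frequency_table(purchases, 6)
for (lower, upper), f in table:
    print(f"{lower:>2}-{upper:<2}  {f}")
```

With 6 classes the estimated width is 16 / 6, rounded up to 3, so the first class is 2-4.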
Mathematical Notation:
In this course, the following symbols and variables will have the meanings given below (unless otherwise specified).
Variables:
x = data value
n = number of values in a sample data set
N = number of values in a population data set
f = frequency of a data class
Symbol:
∑ = the sum of all values for the following variable or expression.
Ex: Using our notation, we can write the statement that the sum of the frequencies in a frequency table should equal the number of values in the sample data set as follows:
$$\sum f = n$$
Cumulative Frequency:
The cumulative frequency of a data class is the number of data elements in that class and all previous classes. (It can be ascending or descending.)
Ex:
Class     Frequency (f)     Cumulative Frequency
90-99     4                 4
80-89     6                 10
70-79     4                 14
60-69     3                 17
50-59     2                 19
40-49     1                 20
Notice that the last entry in the cumulative frequency column is n = 20.
Class Exercise: Add a cumulative frequency column to the table of gasoline purchases.
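A cumulative frequency column is just a running total, which the short Python sketch below illustrates. The frequencies 4, 6, 4, 3, 2, 1 are an assumption: they are the counts obtained by tallying the 20 statistics exam grades listed in section 2.2 into these six classes.

```python
from itertools import accumulate

classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]  # assumed tallies of the 20 exam grades

# Each entry is the frequency of its class plus all previous classes.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}  {f:>2}  {cf:>2}")
```

The last running total should always equal n, which is a quick check on the tallies.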
Relative Frequency:
The relative frequency of a data class is the proportion (or percentage) of data elements in that class. We can calculate the relative frequency for each class as follows:
$$\text{relative frequency} = \frac{f}{n}$$
Ex:
Class     Frequency (f)     Cumulative Frequency     Relative Frequency (f/n)
90-99     4                 4                        .20
80-89     6                 10                       .30
70-79     4                 14                       .20
60-69     3                 17                       .15
50-59     2                 19                       .10
40-49     1                 20                       .05
Note: The sum of the relative frequencies should be 1 (or 100%):
$$\sum \frac{f}{n} = 1$$
Class Exercise: Add a relative frequency column to the table of gasoline purchases.
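As a rough illustration of the f / n calculation, the Python sketch below divides each class frequency by the total. The frequencies are assumed to be 4, 6, 4, 3, 2, 1, which is consistent with the relative frequencies .20, .30, .20, .15, .10, .05 and n = 20.

```python
frequencies = [4, 6, 4, 3, 2, 1]  # assumed class frequencies, n = 20

n = sum(frequencies)
relative = [f / n for f in frequencies]

for f, rf in zip(frequencies, relative):
    print(f"{f:>2}  {rf:.2f}  ({rf:.0%})")

# The relative frequencies should add up to 1 (or 100%).
total = sum(relative)
```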
DESCRIBING DATA SETS:
Bar graphs: Vertical bars represent number or percentage of responses for each variable (also see p. 59, Pareto Chart).
Histograms: Like a bar graph, except the data is continuous, so bars touch.
Frequency Polygon: Connect the midpoints of each class to make a polygon (another name for this is a line graph).
Stem and Leaf Plots: maintain the exact data (section 2.2).
Box Plots: based on quartiles (section 2.5).
A cumulative frequency graph or distribution is also called an ogive.
Histograms:
A histogram is a graphical representation of the information in a frequency table using a bar graph with sides touching. The histogram should have the variable being measured in the data set as its horizontal axis, and the class frequency as the vertical axis. Each data class will be represented by a vertical bar whose height is the frequency of the class and whose width is the class width.
Example: Created in Excel from the data used in the previous examples.
Notice that the bar for each class is centered at the class midpoint, and the bars for successive classes touch.
Class Exercise: Construct a histogram for the frequency table of gasoline purchases.
Frequency Polygon:
A frequency polygon is a line graph representation of the information in a frequency table. Like a histogram, the vertical axis represents frequency and the horizontal axis represents the variable being measured in the data set. To construct the graph, a point is plotted for each class at its midpoint and with height given by the frequency of the class. The points are then connected by straight lines.
Ex: Created in Excel using the same data as in the previous examples.
Class Exercise: Construct a frequency polygon from the gasoline purchase frequency table. Now construct a cumulative frequency polygon (ogive) from the gasoline purchase frequency table. What does the slope of the line segments tell you in either case? What does a line segment with zero slope (flat) tell you in a cumulative frequency polygon?
2.2 More Graphs and Displays:
Stem and Leaf Plot:
A stem and leaf plot reports the exact data by the leftmost number(s) being part of the stem, and the rightmost number(s) being the leaves.
Ex: Given the previous statistics exam grades for 20 statistics students, let us create a stem and leaf plot.
97, 92, 88, 75, 83, 67, 89, 55, 72, 78, 81, 91, 57, 63, 67, 74, 87, 84, 98, 46
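One possible way to build the stem and leaf plot for the 20 exam grades is sketched below in Python. It assumes two-digit data, so the tens digit is the stem and the ones digit is the leaf; the function name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict

grades = [97, 92, 88, 75, 83, 67, 89, 55, 72, 78,
          81, 91, 57, 63, 67, 74, 87, 84, 98, 46]

def stem_and_leaf(data):
    """Map each tens digit (stem) to its sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(grades)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

The number of leaves on each stem is the frequency of the corresponding class (40-49, 50-59, and so on), so the plot doubles as a sideways histogram.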
2.3 Measures of Central Tendency:
A measure of central tendency is a value used to represent the typical or average value in a data set.
Three Common Measures of Central Tendency:
• Mean – (average) the sum of all data values divided by the number of values in the data set. The mean of a sample data set is denoted by $\bar{x}$ and the mean of a population data set by the Greek letter µ.
Sample data set:
$$\bar{x} = \frac{\sum x}{n}$$
Population data set:
$$\mu = \frac{\sum x}{N}$$
Exercise: Find the mean of the following data set:
Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8
• Median – the value which separates the largest 50% of data values from the lowest 50%. To calculate the median, place data values in number order. If n is odd, the middle value is the median. If n is even, the mean of the two middle values is the median.
Exercise: Find the median value for the set of quiz scores.
Find the median if the low score of 1 is dropped.
• Mode – the data value (or values) which appears the largest number of times in the set. If no data value is repeated, we say that there is no mode.
Exercise: Find the mode(s) of the quiz score data set.
• Outlier – a data entry far removed from the other entries in the data set.
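The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later (for `multimode`); the variable names are illustrative.

```python
import statistics

quiz_scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]

# Mean: sum of all data values divided by the number of values.
mean = statistics.mean(quiz_scores)

# Median: middle value (n odd) or mean of the two middle values (n even).
median = statistics.median(quiz_scores)
median_without_low = statistics.median(sorted(quiz_scores)[1:])

# Mode(s): every value that appears the largest number of times.
modes = statistics.multimode(quiz_scores)

print(round(mean, 2), median, median_without_low, modes)
```

Dropping the low score of 1 changes n from odd to even, so the second median is an average of two middle values.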
Properties of Mean, Median, and Mode:
• Mean is the most commonly used measure of central tendency.
• The mode has the advantage that it can be used to measure data sets even if they contain only qualitative data. A disadvantage is that a data set may not have a mode.
Example: good use: modal college major; not such good use: modal height in millimeters.
Weighted Means:
A weighted mean is used when we want some data values in a set to factor more often into the calculation of the mean than others. In this case, we attach a numerical weight (w) to each value and calculate the mean as follows:
$$\bar{x} = \frac{\sum (x \times w)}{\sum w}$$
Note: This is equivalent to counting each data value the number of times given by its weight.
Ex: Grade point average. We assign the letter grades the number values A=4, B=3, C=2, D=1, F=0, and then each grade value is counted into the GPA according to the number of credits earned (the course's weight) with that course grade.
Ex: Course grade. The final grade in a course is calculated according to the following scale: quizzes count for 15%, the average of 3 exams counts for 60%, and the final exam is worth 25%. We can weight the score for each component of the final grade with its percentage to calculate the final grade.
Exercises (use the information above):
• Calculate the GPA of a student who has earned 12 credits of A's, 21 credits of B's, 5 credits of C's and 3 credits of D's.
• Calculate the final score for a student who has scored 95 on quizzes, has exam scores of 83, 94, and 77, and a final exam score of 88.
Now, suppose we had the following test scores in a class: 70, 80, 80, 80, 80, 80, 80, 80, 80, 80, 90, 100, 100
Create a frequency distribution with 4 (single value) classes. What is the mode and the median? How would you calculate the mean?
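The weighted mean formula can be sketched in a few lines of Python and applied to both the GPA and course grade exercises. The helper name `weighted_mean` is hypothetical, and the course grade calculation assumes the three exam scores are averaged before the 60% weight is applied, as the grading scale describes.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# GPA: grade points weighted by the credits earned at each grade.
grade_points = [4, 3, 2, 1]   # A, B, C, D
credits = [12, 21, 5, 3]
gpa = weighted_mean(grade_points, credits)

# Course grade: component scores weighted by their percentages.
exam_average = (83 + 94 + 77) / 3
components = [95, exam_average, 88]   # quizzes, exam average, final exam
percentages = [15, 60, 25]
final_score = weighted_mean(components, percentages)

print(round(gpa, 2), round(final_score, 2))
```

The same function answers the last question: with the values 70, 80, 90, 100 and their frequencies as weights, the weighted mean is the ordinary mean of the 13 test scores.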
Estimating a Mean from a Frequency Table:
Given the frequency distribution of a data set, we can make the best estimate of the mean for the data set by using a weighted mean.
1. Calculate the class midpoint for each data class:
$$x_{mid} = \frac{\text{lower} + \text{upper}}{2}$$
These will be our data values for calculating the weighted mean.
2. Use the frequency of the data class as the weight for each data class.
3. Calculate the weighted mean by the weighted mean formula, or:
$$\bar{x} = \frac{\sum (x_{mid} \times f)}{\sum f}$$
Exercise: Estimate the mean of the data set whose frequency distribution is given by:
Class     Frequency (f)
90-99     4
80-89     6
70-79     4
60-69     3
50-59     2
40-49     1
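The three steps above can be sketched in Python as follows. The frequencies 4, 6, 4, 3, 2, 1 are an assumption (the tallies of the 20 exam grades listed in section 2.2), and the variable names are illustrative.

```python
class_limits = [(90, 99), (80, 89), (70, 79), (60, 69), (50, 59), (40, 49)]
frequencies = [4, 6, 4, 3, 2, 1]  # assumed class frequencies

# Step 1: the class midpoints stand in for the data values.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Steps 2 and 3: weight each midpoint by its class frequency.
estimated_mean = (sum(m * f for m, f in zip(midpoints, frequencies))
                  / sum(frequencies))

print(midpoints, estimated_mean)
```

The result is an estimate because every value in a class is treated as if it sat at the class midpoint.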
Shapes of Data Distributions:
Symmetric – The data distribution is approximately the same shape on either side of a central dividing line. The mean and median (and mode if unimodal) are equal in a symmetric distribution.
Ex: Men's Heights, SAT Math scores
Left-Skewed – A few data values are much lower than the majority of values in the set. (Tail extends to the left.) Generally the mean is less (to the left) than the median (and mode) in a left-skewed distribution.
Ex: Exam scores with a few students doing poorly
Right-Skewed – A few data values are much higher than the majority of values in the set. (Tail extends to the right.) Generally the mean is greater (to the right) than the median (and mode) in a right-skewed distribution.
Examples: Personal Income in the U.S., Men's Weights
Uniform – All data values are equally represented.
Example: Number rolled on a die
2.4 Measures of Variation:
Variation in a data set is the amount of difference between data values. In a data set with little variation, almost all data values would be close to one another. The histogram of such a data set would be narrow and tall.
Example: Quiz Scores: 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5
In a data set with a great deal of variation, the data values would be spread widely. The histogram of this data set would be low and wide.
Example: Quiz Scores: 1, 3, 4, 5, 6, 6, 7, 8, 8, 9, 10
Common Measures of Variation:
1. Range – the difference between the largest and smallest data values in a data set.
$$\text{range} = \text{highest value} - \text{lowest value}$$
2. Standard Deviation – The most commonly used measure of variation. A measure of the average distance of a data value from the mean for the data set. Standard deviation is calculated using two different formulae depending on whether the data set being considered is a population data set or a sample data set.
Population standard deviation, sigma (σ), is calculated using the following formula:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample standard deviation, s, is calculated using the following formula:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
3. Variance – the square of the standard deviation. Population variance is represented by σ² and sample variance by s².
Calculating Standard Deviation Using the Formula:
1. Calculate the mean of the data set.
2. Subtract the mean from each data value in the set. These values are called the deviations of the data values.
3. Square each of the deviations calculated in Step 2.
4. Sum the squares calculated from Step 3.
5. Divide the sum from Step 4 by the population size for population standard deviation or the sample size minus 1 for sample standard deviation.
6. Take the square root of the result of Step 5.
Exercise: Find the range and standard deviation of the data set of quiz scores used in the previous example:
Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8
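The six steps translate almost line for line into Python. The sketch below is only illustrative (the function name `standard_deviation` and its `sample` flag are made up here); Python's built-in `statistics.stdev` and `statistics.pstdev` give the same sample and population results.

```python
import math

def standard_deviation(data, sample=True):
    """Follow Steps 1-6; divide by n - 1 for a sample, N for a population."""
    mean = sum(data) / len(data)                      # Step 1
    deviations = [x - mean for x in data]             # Step 2
    squares = [d ** 2 for d in deviations]            # Step 3
    total = sum(squares)                              # Step 4
    divisor = len(data) - 1 if sample else len(data)  # Step 5
    return math.sqrt(total / divisor)                 # Step 6

quiz_scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]

data_range = max(quiz_scores) - min(quiz_scores)
s = standard_deviation(quiz_scores)                   # sample
sigma = standard_deviation(quiz_scores, sample=False) # population

print(data_range, round(s, 3), round(sigma, 3))
```

Because the sample formula divides by the smaller number n - 1, s is always a little larger than σ for the same data.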
Estimating Standard Deviation using a Frequency Table (Grouped Data):
Given the frequency distribution of a data set, we can make the best estimate of the standard deviation for the data set by using the same technique as for the mean.
1. Calculate the class midpoint for each data class. These will be our data values for calculating the standard deviation.
2. Use the frequency of the data class as the weight for each data class midpoint. (That is, multiply by the frequency rather than having to sum that many times.)
3. Calculate the standard deviation by using the formula:
$$s = \sqrt{\frac{\sum (x_{mid} - \bar{x})^2 \times f}{n - 1}} \quad \text{(sample)}$$
OR
$$\sigma = \sqrt{\frac{\sum (x_{mid} - \mu)^2 \times f}{N}} \quad \text{(population)}$$
Exercise: Estimate the standard deviation of the data set whose frequency distribution is given by:
Class     Frequency (f)
90-99     4
80-89     6
70-79     4
60-69     3
50-59     2
40-49     1
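A minimal Python sketch of the grouped-data calculation follows. It assumes the class frequencies are 4, 6, 4, 3, 2, 1 (the tallies of the 20 exam grades in section 2.2) and uses the class midpoints 94.5 down to 44.5 as the data values.

```python
import math

midpoints = [94.5, 84.5, 74.5, 64.5, 54.5, 44.5]
frequencies = [4, 6, 4, 3, 2, 1]  # assumed class frequencies

n = sum(frequencies)
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Each squared deviation is multiplied by its class frequency
# instead of being added that many times.
weighted_squares = sum((m - mean) ** 2 * f
                       for m, f in zip(midpoints, frequencies))

s = math.sqrt(weighted_squares / (n - 1))   # sample estimate
sigma = math.sqrt(weighted_squares / n)     # population estimate

print(mean, round(s, 2), round(sigma, 2))
```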
Using the TI-83 for Mean, Median & Standard Deviation:
Step 1: Enter the Data Values
Press [STAT]. A menu will appear in which EDIT is selected; choose EDIT by pressing [ENTER]. Enter the data values into one of the lists L1, L2, etc. Use the arrow keys or press [ENTER] to enter the next value in the list. Press [2nd][M
Step 2: Calculate
Press [STAT]. Use the arrow keys to highlight the CALC menu. Select the first entry, 1-Var Stats, from the CALC menu by pressing [ENTER].
Step 3: Read Values
x̄ is the mean of the data set
Sx is the standard deviation if the set is a sample
σx is the standard deviation if the set is a population
n is the number of data values
Med is the median of the data set
Using the TI-83 with a Frequency Table:
The estimated mean, median, and standard deviation for data in a frequency table can be calculated using the TI-83 as follows:
Step 1: Enter the Midpoints and Frequencies
Enter the class midpoints in L1 and the corresponding frequencies in L2.
Step 2: Calculate
Choose 1-Var Stats from the CALC menu as before and, when it appears on the screen, choose L1 comma L2: 1-Var Stats L1, L2. Press [ENTER] to calculate.
Step 3: Read Values
The same calculations as before will be displayed.
Theorems Involving Standard Deviation:
The standard deviation of a data set is an important quantity because it limits the number of data values that can be very far (high or low) from average.
The Empirical Rule (68-95-99.7 Rule):
• Applies only to bell-shaped distributions.
• Approximately 68% of data values must be within 1 standard deviation of the mean.
• Approximately 95% of data values must be within 2 standard deviations of the mean.
• Approximately 99.7% of data values must be within 3 standard deviations of the mean.
Ex: Men's heights have a bell-shaped distribution with a mean of 69.2 inches and a standard deviation of 2.9 inches. Between what heights does 95% of the male population lie?
Chebychev's Theorem:
• Applies to any data set.
• The portion (%) of data values that must be within k (for k > 1) standard deviations of the mean is at minimum:
$$1 - \frac{1}{k^2}$$
Ex: Let's try k = 2, 3, 4, 5. What happens as k increases? (See p. 83, ex. 8.)
Ex: A class of 30 statistics students has a mean exam score of 73 with a standard deviation of 7 points. At minimum, 88.9% of the students scored between what scores? Approximately how many students, at minimum, scored within 2 standard deviations of 73?
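Both theorems reduce to simple arithmetic, sketched here in Python for the examples above. The function names `interval_within` and `chebychev_minimum` are hypothetical helpers, and rounding 22.5 students up to 23 is an assumption about how to read "at minimum" for a whole number of students.

```python
import math

def interval_within(mean, sd, k):
    """Endpoints lying k standard deviations below and above the mean."""
    return (mean - k * sd, mean + k * sd)

def chebychev_minimum(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Empirical Rule: about 95% of men's heights lie within 2 standard deviations.
low, high = interval_within(69.2, 2.9, 2)

# Chebychev's Theorem for k = 2, 3, 4, 5.
bounds = {k: chebychev_minimum(k) for k in (2, 3, 4, 5)}

# Exam scores: 88.9% corresponds to k = 3, since 1 - 1/9 is about 0.889.
score_low, score_high = interval_within(73, 7, 3)

# At least 75% of the 30 students are within 2 standard deviations.
min_students = math.ceil(chebychev_minimum(2) * 30)

print(round(low, 1), round(high, 1), bounds, score_low, score_high, min_students)
```

As k increases, the guaranteed proportion climbs toward 1 but never reaches it.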
Note: Chebychev's Theorem gives only cautious lower bounds for the proportion of data values, whereas the Empirical Rule gives approximations. If a data distribution is known to be bell-shaped, the Empirical Rule should be used.
2.5 Measures of Position:
Fractiles divide a data set into consecutive intervals so that each interval has (at least approximately) the same number of data values. The most common fractiles are:
• Percentiles – divide a data set into 100 parts. For example, the 36th percentile is the value which separates the lowest 36% of data values from the highest 64% of data values and is denoted by P36.
• Quartiles – divide a data set into 4 parts, denoted Q1, Q2, and Q3.
• Deciles – divide a data set into 10 parts. For example, the 7th decile is the value which separates the lowest 7/10 of data values from the highest 3/10 of data values and is denoted by D7.
Note: There are 99 percentiles P1-P99, 3 quartiles Q1-Q3, and 9 deciles D1-D9.
Note: P50 = Q2 = D5 = Median
Ex: Using the quiz scores from before, put them in order, then find and interpret Q1-Q3.
Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8
A Box (and Whisker) Plot illustrates the range, Q1, Q2 (median) and Q3. Let's draw one and discuss it.
(We can also do all of this in our calculators, anyone interested? p -25)
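Quartile conventions differ between textbooks and software, so the Python sketch below states its assumption plainly: Q2 is the median, and Q1 and Q3 are the medians of the lower and upper halves of the ordered data (excluding the middle value when n is odd). The function name `quartiles` is illustrative.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # values before the middle
    upper = ordered[len(ordered) - half:]    # values after the middle
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

quiz_scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]
q1, q2, q3 = quartiles(quiz_scores)

# The five values a box and whisker plot is drawn from.
five_number_summary = (min(quiz_scores), q1, q2, q3, max(quiz_scores))
print(five_number_summary)
```

A calculator or spreadsheet may report slightly different quartiles for other data sets if it uses a different convention.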
Ex: If your doctor tells you your 3 year old is in the 50th percentile for height and the 35th percentile for weight, what does that mean?
The Standard Score:
The standard score (or z-score) of a data value is the number of standard deviations that the value lies above or below the mean for a bell-shaped distribution.
Think about it. The larger the z-score, the ____________________ the mean. The ____________________ the mean, the ___________ the percentage of data between the mean and that z-score.
Standard scores can be calculated using the formula:
$$z = \frac{x - \mu}{\sigma}$$
Exercise: Men have a mean height of 69.2 inches with a standard deviation of 2.9 inches. Find the standard score (z-score) of a man who is:
6 feet tall
5'1/2"
6'3"
Now find the percentile for the last two men above.
Note: The z-score of a value is positive if the value is above the mean and negative if it is below the mean. The mean itself always has a z-score of _____.
A data value is considered to be unusual if it is more than two standard deviations from the mean. A data value is unusually high if it has a z-score larger than 2 and unusually low if it has a z-score of less than -2.
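The z-score formula and the "unusual" rule can be sketched in Python as below. The helper names are hypothetical, and heights are converted to inches first (6 feet = 72 inches, 6'3" = 75 inches).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual means more than two standard deviations from the mean."""
    return abs(z) > 2

MEAN_HEIGHT = 69.2  # inches
SD_HEIGHT = 2.9     # inches

z_six_feet = z_score(72, MEAN_HEIGHT, SD_HEIGHT)
z_six_three = z_score(75, MEAN_HEIGHT, SD_HEIGHT)

print(round(z_six_feet, 2), round(z_six_three, 2))
```

A height of 6'3" sits almost exactly two standard deviations above the mean, right at the boundary of what counts as unusually high.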
Think about the Empirical Rule and Chebychev's Theorem. Why does this make sense?
Ex: p. 112 #32, 36 (look at uses & abuses charts, p. 115)