Student Lecture Notes
Chapter 2 Probability, Random Variables and Probability Distributions
Learning Objectives
1. Differences between the Two Types of Random Variables
2. Discrete Random Variables
   1. Describe Discrete Random Variables
   2. Compute the Expected Value & Variance of Discrete Random Variables
3. Continuous Random Variables
   1. Describe Normal Random Variables
   2. Introduce the Normal Distribution
   3. Calculate Probabilities for Continuous Random Variables
   4. Assessing Normality
Random Variables
of each possible value in the population.
Data Types
Data: Numerical (Discrete or Continuous) or Qualitative
Types of Random Variables
Discrete Random Variables
• Whole numbers (0, 1, 2, 3, etc.): a countable (often finite) number of values
• Jump from one value to the next and cannot take any values in between
Continuous Random Variables
• Obtained by measuring: an infinite number of values in an interval
• Too many to list, unlike a discrete variable
Discrete Random Variable Examples
Experiment | Random Variable | Possible Values
Children of One Gender in Family | # Girls | 0, 1, 2, ..., 10?
Count Cars at Toll Between 11:00 & 1:00 | # Cars Arriving | 0, 1, 2, ...
Discrete Probability Distribution
1. List of All Possible [x, p(x)] Pairs
   x = Value of Random Variable (Outcome)
   p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. $0 \le p(x) \le 1$
5. $\sum p(x) = 1$
Marilyn says: "It may sound strange, but more families of 4 children have 3 of one gender and 1 of the other than any other combination." Explain this. Construct a sample space, look at the total number of combinations that can occur, and calculate frequencies.
• Are all 16 combinations equally likely? Is the sex of each child independent of the other three? With P(girl) = 1/2 and P(boy) = 1/2, P(BBBB) = ½ × ½ × ½ × ½ = 1/16.
• If you have a family of four, what is the probability of…
  P(all girls or all boys) = 2/16 = 1/8
  P(2 boys, 2 girls) = 6/16 = 3/8 (six different ways to have 2 boys and 2 girls)
  P(3 boys, 1 girl or 3 girls, 1 boy) = 8/16 = 1/2 (8 ways to have 3 of one gender and 1 of the other)
Sample Space
BBBB GBBB BGBB BBGB BBBG GGBB GBGB GBBG
BGGB BGBG BBGG BGGG GBGG GGBG GGGB GGGG
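A minimal Python sketch (an illustration, not part of the original notes) that enumerates the 16 birth orders and tallies the gender splits, assuming each child is independently a boy or a girl with probability 1/2. The helper name `split` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**4 = 16 birth orders for four children, e.g. ('G', 'B', 'B', 'B').
outcomes = list(product("BG", repeat=4))

def split(outcome):
    """Gender split ignoring order: (count of majority gender, count of minority gender)."""
    girls = outcome.count("G")
    return (max(girls, 4 - girls), min(girls, 4 - girls))

counts = Counter(split(o) for o in outcomes)

# Each outcome is assumed equally likely, so P(split) = (number of outcomes) / 16.
probs = {s: Fraction(n, len(outcomes)) for s, n in counts.items()}

for s, p in sorted(probs.items(), reverse=True):
    print(s, p)
```

The (3, 1) split collects 8 of the 16 orders, more than any other split, which is Marilyn's point.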
Assume the random variable X represents the number of girls in a family of 4 kids. (Lowercase x is a particular value of X, e.g., x = 3 girls in the family.)
Sample Space → Random Variable X
BBBB → x = 0
GBBB, BGBB, BBGB, BBBG → x = 1
GGBB, GBGB, GBBG, BGGB, BGBG, BBGG → x = 2
BGGG, GBGG, GGBG, GGGB → x = 3
GGGG → x = 4
Number of Girls, x | Probability, P(x)
0 | 1/16
1 | 4/16
2 | 6/16
3 | 4/16
4 | 1/16
What is the probability of exactly 3 girls in 4 kids? P(X = 3) = 4/16
What is the probability of at least 3 girls in 4 kids? P(X ≥ 3) = 5/16
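The mapping from outcomes to X can be checked the same way. This sketch, under the same equal-probability assumption, builds P(X = x) by counting outcomes and then answers the two questions. `Fraction` keeps exact values, so 4/16 is displayed reduced to 1/4.

```python
from fractions import Fraction
from itertools import product

# Sample space: 16 equally likely birth orders (assumes P(girl) = 1/2, independent births).
outcomes = list(product("BG", repeat=4))

# X = number of girls; each outcome adds 1/16 to P(X = its number of girls).
dist = {x: Fraction(0) for x in range(5)}
for o in outcomes:
    dist[o.count("G")] += Fraction(1, len(outcomes))

p_exactly_3 = dist[3]                                   # P(X = 3)
p_at_least_3 = sum(dist[x] for x in dist if x >= 3)     # P(X >= 3) = P(3) + P(4)
print(dist)
print(p_exactly_3, p_at_least_3)
```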
Visualizing Discrete Probability Distributions
Listing: {(0, 1/16), (1, .25), (2, 3/8), (3, .25), (4, 1/16)}
Table:
Number of Girls, x | Probability, P(x)
0 | 1/16
1 | 4/16
2 | 6/16
3 | 4/16
4 | 1/16
Total | 16/16 = 1.00
Graph: [Figure: bar graph of probability, P(x), versus number of girls, x = 0 to 4; bar heights 1/16, 4/16, 6/16, 4/16, 1/16]
X is random and x is fixed. We can calculate the probability that different values of X will occur and make a probability distribution.
Probability Distributions
[Figure: probability histogram of P(x) versus number of girls, x = 0 to 4]
Probability distributions can be written as probability histograms.
Cumulative probabilities: adding up the probabilities of a range of values.
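A short sketch of cumulative probabilities for the number-of-girls distribution, using `itertools.accumulate` for the running sum; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# P(x) for the number of girls in a family of 4 (from the table above).
xs = [0, 1, 2, 3, 4]
px = [Fraction(n, 16) for n in (1, 4, 6, 4, 1)]

# Cumulative probability P(X <= x): running sum of P(x).
cdf = dict(zip(xs, accumulate(px)))

# A range probability is a difference of cumulative values:
# P(1 <= X <= 3) = P(X <= 3) - P(X <= 0).
p_1_to_3 = cdf[3] - cdf[0]
print(cdf)
print(p_1_to_3)
```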
Washington State Population Survey and Random Variables
A telephone survey of households throughout Washington State. But some households don't have phones.
Number of telephones, x | P(x)
0 | 0.03500
1 | 0.70553
2 | 0.21769
3 | 0.02966
4 | 0.00775
5 | 0.00332
6 | 0.00088
8 | 0.00000
9 | 0.00015
Total | 1.00000
Probabilities about Telephones in Washington State
• What is the probability that a household will have no telephone?
• What is the probability that a household will have 2 or more telephone lines?
• What is the probability that a household will have 2 to 4 phone lines?
• What is the probability that a household will have […] lines or more than 4 phone lines?
• Who do you think is in that 3.5% of the population?
• What are the implications of this for the quality of the survey?
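One way to answer the first three questions is to sum P(x) over the qualifying values of x. This sketch copies the probabilities from the telephone table above exactly as printed (that table has no x = 7 row, so a sum can differ from 1 minus the other probabilities in the fifth decimal). The `prob` helper is hypothetical.

```python
# P(x) from the Washington State telephone-line table (x = number of lines), as printed.
p = {0: 0.03500, 1: 0.70553, 2: 0.21769, 3: 0.02966, 4: 0.00775,
     5: 0.00332, 6: 0.00088, 8: 0.00000, 9: 0.00015}

def prob(condition):
    """Sum P(x) over every listed x that satisfies the condition."""
    return sum(px for x, px in p.items() if condition(x))

p_none = prob(lambda x: x == 0)                 # no telephone
p_two_or_more = prob(lambda x: x >= 2)          # 2 or more lines
p_two_to_four = prob(lambda x: 2 <= x <= 4)     # 2 to 4 lines
print(round(p_none, 5), round(p_two_or_more, 5), round(p_two_to_four, 5))
```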
[Figure: Probability Histogram of Telephone Lines, 1998; P(x) versus number of telephone lines, x = 0 to 9]
Summary Measures
1. Expected Value (μ, "mu"): weighted average of all possible values
   $\mu = E(X) = \sum x\,p(x)$
2. Variance (σ², "sigma-squared"): weighted average squared deviation about the mean
   $\sigma^2 = V(X) = E[(X - \mu)^2] = \sum (x - \mu)^2\,p(x)$
   $\sigma^2 = V(X) = E(X^2) - [E(X)]^2$
3. Standard Deviation
   $\sigma = \sqrt{\sigma^2} = SD(X)$
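Applying the three formulas to the number-of-girls distribution is a quick way to see that both variance formulas agree. A minimal sketch with exact fractions:

```python
import math
from fractions import Fraction

# Number of girls in a family of 4: x and p(x) from the table earlier in these notes.
dist = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
        3: Fraction(4, 16), 4: Fraction(1, 16)}

mu = sum(x * p for x, p in dist.items())                      # E(X) = sum of x p(x)
var_def = sum((x - mu) ** 2 * p for x, p in dist.items())     # E[(X - mu)^2]
var_short = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2  # E(X^2) - [E(X)]^2
sigma = math.sqrt(var_def)                                    # SD(X)
print(mu, var_def, var_short, sigma)
```

Both variance formulas give σ² = 1 here, so σ = 1 and the mean is μ = 2 girls.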
What is the average number of telephones in Washington households, and how much does the number vary from the average?
# of Phones, x | Frequency | P(x) | xP(x) | (x − μ) | (x − μ)² | (x − μ)²P(x) | x² | x²P(x)
0 | 198,286 | 0.04 | 0.00 | −1.3 | 1.65 | 0.06 | 0 | 0.00
1 | 4,142,030 | 0.71 | 0.71 | −0.3 | 0.08 | 0.06 | 1 | 0.71
2 | 1,278,026 | 0.22 | 0.44 | 0.7 | 0.51 | 0.11 | 4 | 0.87
3 | 174,110 | 0.03 | 0.09 | 1.7 | 2.94 | 0.09 | 9 | 0.27
4 | 45,499 | 0.01 | 0.03 | 2.7 | 7.38 | 0.06 | 16 | 0.12
5 | 19,473 | 0.00 | 0.02 | 3.7 | 13.81 | 0.05 | 25 | 0.08
6 | 5,170 | 0.00 | 0.01 | 4.7 | 22.24 | 0.02 | 36 | 0.03
7 | 118 | 0.00 | 0.00 | 5.7 | 32.67 | 0.00 | 49 | 0.00
8 | - | 0.00 | 0.00 | 6.7 | 45.10 | 0.00 | 64 | 0.00
9 | 897 | 0.00 | 0.00 | 7.7 | 59.53 | 0.01 | 81 | 0.01
Sum | 5,863,609 | 1.00 | μ = 1.28 | | | σ² = 0.45 | | E(X²) = 2.10
Approach 1 (deviation columns): $\sigma^2 = \sum (x - \mu)^2 p(x) = 0.45$
Approach 2 (squared-value columns): $\sigma^2 = E(X^2) - \mu^2 \approx 0.45$ (using unrounded values)
Standard deviation: $\sigma = \sqrt{0.45} \approx 0.67$
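The same calculation for the telephone data, as a sketch that uses the five-decimal probabilities from the earlier telephone table rather than the rounded two-decimal column. Because that printed table has no x = 7 row, the two approaches differ slightly in far decimal places; both round to σ² ≈ 0.45.

```python
import math

# P(x) from the telephone-line probability table earlier in these notes
# (rows exactly as printed; there is no x = 7 row).
p = {0: 0.03500, 1: 0.70553, 2: 0.21769, 3: 0.02966, 4: 0.00775,
     5: 0.00332, 6: 0.00088, 8: 0.00000, 9: 0.00015}

mu = sum(x * px for x, px in p.items())                  # expected value

# Approach 1: weighted average squared deviation about the mean.
var1 = sum((x - mu) ** 2 * px for x, px in p.items())

# Approach 2: E(X^2) - [E(X)]^2.
ex2 = sum(x ** 2 * px for x, px in p.items())
var2 = ex2 - mu ** 2

sigma = math.sqrt(var1)
print(round(mu, 2), round(var1, 2), round(var2, 2), round(sigma, 2))
```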
Chebyshev's Theorem
• […] a value of a standard deviation
• The empirical rule applies only to data sets with a bell-shaped distribution
• Chebyshev's theorem applies to ANY data set, but its results are very approximate (it gives only a lower bound)
Chebyshev's Theorem
• The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$ (where K > 1)
• For K = 2 and K = 3, the results are as follows:
  At least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean
  At least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean
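A small sketch of Chebyshev's bound, checked against the number-of-girls distribution (μ = 2 and σ = 1 for that distribution). The function name `chebyshev_lower_bound` is illustrative, not from any library.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """At least 1 - 1/k**2 of any distribution lies within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / Fraction(k) ** 2

# Number of girls in a family of 4.
dist = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
        3: Fraction(4, 16), 4: Fraction(1, 16)}
mu, sigma = 2, 1

within = {}
for k in (2, 3):
    # Probability strictly inside (mu - k*sigma, mu + k*sigma).
    within[k] = sum(p for x, p in dist.items() if abs(x - mu) < k * sigma)
    print(k, within[k], chebyshev_lower_bound(k))
```

The actual probabilities (7/8 and 1) sit above the guaranteed minimums (3/4 and 8/9), as the theorem promises for any distribution.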
Chebyshev's Rule and Empirical Rule for a Discrete Random Variable
Let x be a discrete random variable with probability distribution p(x), mean μ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made:
Probability | Chebyshev's Rule | Empirical Rule
Applies to | any probability distribution (e.g., telephones) | probability distributions that are mound-shaped and symmetric (e.g., girls born […] children)
P(μ − σ < x < μ + σ) | ≥ 0 | ≈ .68
P(μ − 2σ < x < μ + 2σ) | ≥ 3/4 | ≈ .95
P(μ − 3σ < x < μ + 3σ) | ≥ 8/9 | ≈ 1.00
Continuous Random Variable all intervals
Continuous Random Variable Examples
Experiment | Random Variable | Possible Values
Weigh 100 People | Weight | 45.1, 78, ...
Measure Part Life | Hours | 900, 875.9, ...
Measure Time Between Arrivals | Inter-Arrival Time | 0, 1.3, 2.78, ...
Continuous Probability Density Function
1. Mathematical Formula
2. Shows All Values, x, & Frequencies, f(x)
   f(x) Is Not Probability
3. Properties
   Area under the curve sums to 1
   Adding up the area under the curve up to a specific value gives the probability of being less than that value
[Figure: density curve f(x) plotted as (Value, Frequency) pairs, with values a and b marked on the x axis]
Continuous Random Variable Probability
Probability Is Area Under Curve!
[Figure: density curve f(x) with the area between c and d shaded]
Continuous Probability Distribution Models
Continuous Probability Distribution: Uniform, Normal, Exponential
Importance of Normal Distribution
1. Describes Many Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
3. Basis for Classical Statistical Inference
Normal Distribution
1. 'Bell-Shaped' & Symmetrical
2. Mean, Median, Mode Are Equal
3. 'Middle Spread' (Interquartile Range) Is About 1.35σ
4. Random Variable Has Infinite Range
[Figure: bell-shaped curve f(X) versus X, with mean, median, and mode at the center]
Normal Distribution Useful Properties
• Half of the "weight" lies below the mean (because the curve is symmetrical)
• About 68% of probability lies within 1 standard deviation of the mean (at μ ± σ the curve changes curvature)
• About 95% of probability lies within 2 standard deviations
• More than 99% of probability lies within 3 standard deviations
[Figure: normal curve f(X) with μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ marked; mean = median = mode]
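The three percentages can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+). The sketch also measures the interquartile range ("middle spread") of a normal curve in units of σ.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability within k standard deviations of the mean: P(-k < Z < k).
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")

# Interquartile range in units of sigma: Q3 - Q1 of the standard normal.
iqr_in_sigmas = z.inv_cdf(0.75) - z.inv_cdf(0.25)
print(f"IQR = {iqr_in_sigmas:.3f} sigma")
```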
Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
where
x = Value of Random Variable (−∞ < x < ∞)
σ = Population Standard Deviation
π = 3.14159
e = 2.71828
μ = Mean of Random Variable x
Don’t memorize this!
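For reference only, the density formula translates directly into code. This sketch cross-checks it against `statistics.NormalDist.pdf`; the function name `normal_pdf` is hypothetical. The last line illustrates why f(x) is not a probability: with a small σ the density exceeds 1.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(1/2) * ((x - mu) / sigma)**2)."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Cross-check against the standard library for X ~ N(5, 10).
reference = NormalDist(mu=5, sigma=10)
for x in (-5.0, 5.0, 6.2, 20.0):
    assert math.isclose(normal_pdf(x, 5, 10), reference.pdf(x), rel_tol=1e-12)

# A density is a height, not a probability: with sigma = 0.1 the peak is about 3.99.
print(normal_pdf(0, 0, 0.1))
```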
Notation
X is N(μ, σ): the random variable X has a normal distribution (N) with mean μ and standard deviation σ.
Examples: X is N(40, 1); X is N(10, 5); X is N(50, 3)
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves, A, B, and C, of f(X) versus X with different means and standard deviations]
Normal Distribution Probability
Probability is area under curve!
$P(c \le x \le d) = \int_c^d f(x)\,dx$
[Figure: normal curve f(x) with the area between c and d shaded]
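To make "probability is area under the curve" concrete, this sketch approximates the integral with the trapezoid rule and compares it with the exact value from the cumulative distribution. The helper `area_under` and the choice of N(5, 10) with c = 2.9, d = 7.1 are illustrative assumptions.

```python
from statistics import NormalDist

def area_under(f, c, d, n=10_000):
    """Approximate the integral of f from c to d with the trapezoid rule on n slices."""
    h = (d - c) / n
    total = 0.5 * (f(c) + f(d)) + sum(f(c + i * h) for i in range(1, n))
    return total * h

X = NormalDist(mu=5, sigma=10)
c, d = 2.9, 7.1
numeric = area_under(X.pdf, c, d)
exact = X.cdf(d) - X.cdf(c)          # P(c <= X <= d) from the cumulative distribution
print(round(numeric, 6), round(exact, 6))
```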
Infinite Number of Tables
Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That's an infinite number!
[Figure: several normal curves f(X) with different means and standard deviations]
Standardize the Normal Distribution
$Z = \frac{X - \mu}{\sigma}$
Normal Distribution (X, with mean μ and standard deviation σ) → Standardized Normal Distribution: Z is N(0, 1), with μ = 0 and σ = 1.
One table!
Standardizing Example
$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = .12$
Normal Distribution: μ = 5, σ = 10, X = 6.2 → Standardized Normal Distribution: μ = 0, σ = 1, Z = .12
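Obtaining the Probability
Standardized Normal Probability Table (Portion): area between 0 and Z
Z | .00 | .01 | .02
0.0 | .0000 | .0040 | .0080
0.1 | .0398 | .0438 | .0478
0.2 | .0793 | .0832 | .0871
0.3 | .1179 | .1217 | .1255
For Z = .12 (row 0.1, column .02), the probability between 0 and Z is .0478.
[Figure: standardized normal curve (μ = 0, σ = 1) with the area .0478 between 0 and Z = .12 shaded; shaded area exaggerated]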
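The table entries are areas between 0 and z under the standard normal curve, so each equals Φ(z) − 0.5, where Φ is the cumulative distribution. This sketch (an illustration using `statistics.NormalDist`; `table_area` is a hypothetical helper) standardizes X = 6.2 and rebuilds the table portion shown above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def table_area(z):
    """Area between 0 and z under the standard normal curve: P(0 <= Z <= z)."""
    return Z.cdf(z) - 0.5

# Standardize X = 6.2 when X is N(5, 10), then look up the area.
z = (6.2 - 5) / 10
print(round(z, 2), round(table_area(z), 4))

# Rebuild the table portion (rows 0.0-0.3, columns .00-.02).
for row in (0.0, 0.1, 0.2, 0.3):
    cells = [f"{table_area(round(row + col, 2)):.4f}" for col in (0.00, 0.01, 0.02)]
    print(f"{row:.1f}", *cells)
```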
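Example P(3.8 ≤ X ≤ 5)
$Z = \frac{3.8 - 5}{10} = -.12$
P(3.8 ≤ X ≤ 5) = P(−.12 ≤ Z ≤ 0) = .0478 (by symmetry, the same area as between 0 and .12 in the table)
[Figure: normal distribution (μ = 5, σ = 10) with the area between 3.8 and 5 shaded, and the standardized normal distribution (μ = 0, σ = 1) with the area .0478 between −.12 and 0 shaded; shaded area exaggerated]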
Example P(2.9 ≤ X ≤ 7.1)
$Z = \frac{2.9 - 5}{10} = -.21 \qquad Z = \frac{7.1 - 5}{10} = .21$
P(2.9 ≤ X ≤ 7.1) = P(−.21 ≤ Z ≤ .21) = .0832 + .0832 = .1664
[Figure: normal distribution (μ = 5, σ = 10) with the area between 2.9 and 7.1 shaded, and the standardized normal distribution (μ = 0, σ = 1) with areas .0832 on each side of 0 between −.21 and .21; shaded area exaggerated]
Example P(X ≥ 8)
$Z = \frac{8 - 5}{10} = .30$
P(X ≥ 8) = P(Z ≥ .30) = .5000 − .1179 = .3821
[Figure: normal distribution (μ = 5, σ = 10) with the area above 8 shaded, and the standardized normal distribution (μ = 0, σ = 1) showing .1179 between 0 and .30 and .3821 above .30; shaded area exaggerated]
Example P(7.1 ≤ X ≤ 8)
$Z = \frac{7.1 - 5}{10} = .21 \qquad Z = \frac{8 - 5}{10} = .30$
P(7.1 ≤ X ≤ 8) = P(.21 ≤ Z ≤ .30) = .1179 − .0832 = .0347
[Figure: normal distribution (μ = 5, σ = 10) with the area between 7.1 and 8 shaded, and the standardized normal distribution (μ = 0, σ = 1) with the area .0347 between .21 and .30 shaded; shaded area exaggerated]
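The four worked examples can be checked without a table. A sketch with `statistics.NormalDist` for X ~ N(5, 10):

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=10)  # X is N(5, 10), as in the examples above

examples = {
    "P(3.8 <= X <= 5)": X.cdf(5) - X.cdf(3.8),
    "P(2.9 <= X <= 7.1)": X.cdf(7.1) - X.cdf(2.9),
    "P(X >= 8)": 1 - X.cdf(8),
    "P(7.1 <= X <= 8)": X.cdf(8) - X.cdf(7.1),
}
for label, p in examples.items():
    print(f"{label} = {p:.4f}")
```

Small differences in the fourth decimal (for example, .1663 here versus .1664 by hand) come from adding rounded table entries.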
Travel Time and the Normal Distribution
To help people plan their travel, WSDOT estimates that an average trip from Seattle to Bellevue at […] pm (at peak) takes 11 minutes, with a standard deviation of 10 minutes. They also believe this travel time approximates a normal distribution.
What proportion of trips take less than […] minutes?
Process
1. Draw a picture and write down the probability you need.
2. Convert the values to standard scores (Z).
3. Find the cumulative probability in the table.
More Travel Time from Bellevue
What proportion of trips will make it in that time?
$P(10 < X < 15) = P\left(\frac{10 - 11}{10} < Z < \frac{15 - 11}{10}\right)$
$= P(-0.1 < Z < 0.4)$
$= 1 - P(Z < -0.1) - P(Z > 0.4)$
Since normal curves are symmetrical:
$= 1 - P(Z > 0.1) - P(Z > 0.4) = 1 - (.5 - .0398) - (.5 - .1554) = 1 - .4602 - .3446 = .1952$
19.5% of trips will take between 10 and 15 minutes.
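A one-line check of the travel-time answer, assuming (as the slide does) that trip time is normal with mean 11 and standard deviation 10 minutes:

```python
from statistics import NormalDist

T = NormalDist(mu=11, sigma=10)   # travel time in minutes, per the WSDOT assumption
p = T.cdf(15) - T.cdf(10)         # P(10 < X < 15); endpoints carry zero probability
print(f"{p:.4f}")
```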
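Finding Z Values for Known Probabilities
Standardized Normal Probability Table (Portion)
What is Z given that the area between 0 and Z is .1217?
Z | .00 | .01 | .02
0.0 | .0000 | .0040 | .0080
0.1 | .0398 | .0438 | .0478
0.2 | .0793 | .0832 | .0871
0.3 | .1179 | .1217 | .1255
The value .1217 appears in row 0.3, column .01, so Z = .31.
[Figure: standardized normal curve (μ = 0, σ = 1) with the area .1217 between 0 and Z = .31 shaded; shaded area exaggerated]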
Finding Z Values for Known Probabilities
[Figure: normal distribution (μ = 5, σ = 10) with the area .1217 between 5 and X = ? shaded, and the standardized normal distribution (μ = 0, σ = 1) with the area .1217 between 0 and Z = .31 shaded; shaded areas exaggerated]
$X = \mu + Z\sigma = 5 + (.31)(10) = 8.1$
Travel Times, Take 3
[…] the time?
Finding Z Values for Known Probabilities
1. Draw a picture: P(Z < ____) = .99
2. Look up the Z value in the table: P(Z < 2.325) = .99
3. Convert to X using the mean and SD: X = μ + Zσ, so X = 11 + (2.325)(10) = 34.25
So, the trip can be made 99% of the time in 34.25 minutes.
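Going the other way (probability to value) uses the inverse cumulative distribution. This sketch reproduces both examples with `NormalDist.inv_cdf`; the exact 99th-percentile z is about 2.326, so the result differs slightly from the interpolated table value 2.325.

```python
from statistics import NormalDist

Z = NormalDist()

# Table example: area between 0 and z is .1217, so P(Z < z) = .5 + .1217.
z1 = Z.inv_cdf(0.5 + 0.1217)
x1 = 5 + z1 * 10            # X = mu + Z * sigma with X ~ N(5, 10)

# Travel time: 99th percentile of N(11, 10).
z99 = Z.inv_cdf(0.99)
x99 = 11 + z99 * 10         # equivalent to NormalDist(11, 10).inv_cdf(0.99)
print(round(z1, 2), round(x1, 1), round(z99, 3), round(x99, 2))
```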
Assessing Normality
1. A histogram of the data is mound-shaped and symmetrical about the mean.
2. Determine the percentage of measurements falling in each of the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100%, respectively.
3. Find the interquartile range, IQR, and the standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.
Assessing Normality: Is Class Height Normally Distributed?
1. How does the histogram look?
SPSS can produce the line of the normal curve for you. In SPSS, select GRAPH, HISTOGRAM. After you choose the variable you want, click on the box "Display Normal Curve" and you'll get something that looks like this.
[Figure: histogram of Height 527 2005 (frequency versus height, 60 to 72) with a fitted normal curve; Mean = 66.52, Std. Dev. = 3.117, N = 23]
Assessing Normality: Is Class Height Normally Distributed?
Height 527 2005
Height | Frequency | Percent | Valid Percent | Cumulative Percent
60 | 1 | 4.3 | 4.3 | 4.3
62 | 1 | 4.3 | 4.3 | 8.7
63 | 3 | 13.0 | 13.0 | 21.7
64 | 2 | 8.7 | 8.7 | 30.4
65 | 1 | 4.3 | 4.3 | 34.8
66 | 3 | 13.0 | 13.0 | 47.8
67 | 2 | 8.7 | 8.7 | 56.5
68 | 2 | 8.7 | 8.7 | 65.2
69 | 5 | 21.7 | 21.7 | 87.0
70 | 1 | 4.3 | 4.3 | 91.3
71 | 1 | 4.3 | 4.3 | 95.7
72 | 1 | 4.3 | 4.3 | 100.0
Total | 23 | 100.0 | 100.0 |
Interval | Anticipated Percent | Actual Percent
x̄ ± s [63.40, 69.64] | 68% | 65%
x̄ ± 2s [60.29, 72.75] | 95% | 96%
x̄ ± 3s [57.17, 75.87] | 100% | 100%
SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES
Assessing Normality: Is Class Height Normally Distributed?
3. Does IQR/s ≈ 1.3?
IQR = 69 − 64 = 5; IQR/s = 5/3.117 = 1.6
Statistics: Height 527 2005
N Valid | 23
N Missing | 0
Std. Deviation | 3.117
Percentile 25 | 64.00
Percentile 50 | 67.00
Percentile 75 | 69.00
SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES, then click on STATISTICS and choose the ones you want.
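Checks 2 and 3 can be reproduced from the SPSS frequency table. This sketch rebuilds the 23 heights from the table's frequencies and uses Python's `statistics` module. `statistics.quantiles` with its default `'exclusive'` method uses (n + 1)p positions, which here happens to match the SPSS percentiles (64, 67, 69).

```python
import statistics

# Heights rebuilt from the SPSS frequency table for "Height 527 2005".
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3, 67: 2,
        68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
heights = [h for h, n in freq.items() for _ in range(n)]

xbar = statistics.mean(heights)
s = statistics.stdev(heights)          # sample standard deviation (divides by n - 1)

# Check 2: share of heights within xbar +/- k*s.
within = {k: sum(abs(h - xbar) <= k * s for h in heights) / len(heights)
          for k in (1, 2, 3)}

# Check 3: IQR / s.
q1, q2, q3 = statistics.quantiles(heights, n=4)
ratio = (q3 - q1) / s
print(round(xbar, 2), round(s, 3), within, (q1, q2, q3), round(ratio, 1))
```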
Assessing Normality: Is Class Height Normally Distributed?
4. What does the normal probability plot look like?
SPSS: Graphs > Q-Q; set Test Distribution to Normal and click "Estimate distribution parameters from data."
[Figure: Normal Q-Q Plot of Height 527 2005; Expected Normal Value (60 to 74) versus Observed Value (60 to 74)]
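A sketch of what the Q-Q plot is built from: the sorted observations paired with quantiles of a normal distribution fitted with x̄ and s. It assumes Blom's plotting positions (i − 3/8)/(n + 1/4); the exact SPSS formula is not stated in these notes, so treat the expected values as approximate. The hypothetical `pearson` helper summarizes straightness: a correlation near 1 means the points lie close to a line.

```python
import statistics
from statistics import NormalDist

# Heights rebuilt from the SPSS frequency table for "Height 527 2005".
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3, 67: 2,
        68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
observed = sorted(h for h, n in freq.items() for _ in range(n))
n = len(observed)

# Expected normal values: quantiles of N(xbar, s) at Blom positions (assumption).
fitted = NormalDist(statistics.mean(observed), statistics.stdev(observed))
expected = [fitted.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

r = pearson(observed, expected)
for obs, exp in zip(observed, expected):
    print(obs, round(exp, 2))
print("correlation:", round(r, 3))
```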
Exercise 1
• Identify the given random variable as being discrete or continuous:
a) The weight of the cola in a randomly selected can.
b) The cost of a randomly selected can of Coke.
c) The time it takes to fill a can of Pepsi.
Exercise 2
• Below is a case where a probability distribution is described. Find its mean and standard deviation. In a study of the MicroSort gender-selection method, couples in a control group are not given a treatment, and they each have three children. The probability distribution for the number of girls is given.
x | P(x)
0 | 0.125
1 | 0.375
2 | 0.375
3 | 0.125
Exercise 3
• Below is a case where a probability distribution is described. Find its mean and standard deviation. To settle a paternity suit, two different people are given blood tests. If x is the number having group A blood, then x can be 0, 1, or 2, and the corresponding probabilities are 0.36, 0.48, and 0.16, respectively.
Exercise 4
• Let the random variable x represent the number of girls in a family of four children. Construct a table describing the probability distribution, then find the mean and standard deviation.
Exercise 5
• Assume that the readings on the thermometers are normally distributed with a mean of 0º and a standard deviation of 1.00º C. A thermometer is randomly selected and tested. In each case, draw a sketch, and find the probability of each reading in degrees.
a) Between 0 and 1.50
b) Between −1.96 and 0
c) Less than −1.79
d) Greater than 2.05
e) Between 0.50 and 1.50
f) P(−1.96 < z < 1.96)
g) P(z > −2.575)
Exercise 6
• Assume that a test is designed to measure a person's […] normally distributed with a mean of 10 and a standard deviation of 2. Draw a graph, find the relevant z score, then find the indicated value.
a) Find the score separating the top 10% from the bottom 90%.
b) […] 75%.
c) Find the score separating the bottom 20% from the top 80%.