Overview of Probability and Statistics
Gregory Rahn & Regina Rahn
Copyright 2001 Genemetrix
2 Overview of Probability and Statistics
Probability Theory: start with a known distribution or population
• Population parameters are known with certainty: mean (µ), variance (σ²), and shape parameters (skewness and kurtosis)
• Use the distribution to acquire probabilities of the occurrence of certain events
• Defined explicitly for the distribution

Statistics: start with data (observed values from an unknown "empirical" distribution)
• Functions of the data that estimate parameters {mean, variance, skewness, and kurtosis}
• Estimate probabilities
Statistics: Estimation of Parameters
Measures of Location
Average ($\bar{X}$): the most common measure of central tendency
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Median (Md): the value that divides the ranked observations in half
$Md = X_{((n+1)/2)}$ if n is odd
$Md = \frac{X_{(n/2)} + X_{(n/2+1)}}{2}$ if n is even
Mode (Mo): the most frequent data point
Ex. Data {3, 2, 9, 1, 6, 8, 2}
Ranked Data {1, 2, 2, 3, 6, 8, 9}
$\bar{X}$ = (3+2+9+1+6+8+2)/7 = 31/7 = 4.43
Md = $X_{((7+1)/2)}$ = $X_{(4)}$ = 3
Mo = 2, the most frequent observation (occurred twice)
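The three measures of location for this data set can be checked with a short sketch. It relies only on Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
import statistics

data = [3, 2, 9, 1, 6, 8, 2]

average = statistics.mean(data)    # sum of the values divided by n
median = statistics.median(data)   # middle value of the ranked data
mode = statistics.mode(data)       # most frequent value

print(round(average, 2), median, mode)
```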
Properties of the Average
• $\sum (X_i - \bar{X})^2$ is less than or equal to the sum of squared deviations from any other estimate. Ex. $\sum (X_i - \bar{X})^2 \le \sum (X_i - Md)^2$, so the average is the minimum variance estimate
• Gets pulled in the direction of extreme points
Example
Data {1, 2, 3, 4, 9}
Including X5 = 9: $\bar{X}$ = 3.8, Md = 3
Excluding X5 = 9: $\bar{X}$ = 2.5, Md = 2.5
[Figure: dot plots of the data with and without X5, marking the positions of $\bar{X}$ and Md]
• The average can be very sensitive to extreme points, while the median is fairly robust. Sensitivity depends upon the sample size and the deviation of the extreme point!
• Assumption of $\bar{X}$: the Xi's are independently and identically distributed (i.i.d.). This is often not a good assumption!
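Both properties can be illustrated numerically with the example data. The following is a minimal sketch; the helper `sum_sq_dev` is a hypothetical name introduced here for illustration.

```python
import statistics

def sum_sq_dev(values, center):
    """Sum of squared deviations of the values about a chosen center."""
    return sum((x - center) ** 2 for x in values)

data = [1, 2, 3, 4, 9]
mean_all = statistics.mean(data)
median_all = statistics.median(data)

# The average gives a smaller sum of squared deviations than the median.
ss_about_mean = sum_sq_dev(data, mean_all)
ss_about_median = sum_sq_dev(data, median_all)

# Dropping the extreme point X5 = 9 moves the average more than the median.
trimmed = data[:-1]
mean_trim = statistics.mean(trimmed)
median_trim = statistics.median(trimmed)

print(mean_all, median_all, mean_trim, median_trim)
```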
Measures of Dispersion
Range (R)
$R = X_{(n)} - X_{(1)}$ = largest value − smallest value
• Must sort data from low ($X_{(1)}$) to high ($X_{(n)}$)
Ex. Data {3, 2, 9, 1, 6, 8, 2}
Ranked Data {1, 2, 2, 3, 6, 8, 9}
R = 9 − 1 = 8
Properties of the Range
Bad: It only uses two pieces of information.
Good: It is easy to compute manually.
Uses of the Range
• Range itself is useful for characterizing a distribution (order statistics)
• Range can be used to estimate the standard deviation ($\hat{\sigma} = R/d_2$)
• Many practical applications once the standard deviation is approximated: Control Charts, Process Capability, Gage Repeatability & Reproducibility
Problems when using the Range to Approximate the Standard Deviation
The d2 coefficient depicts the relationship between the range and the standard deviation for a normal distribution. Thus, the Range method for estimating the standard deviation is only valid if the parent distribution is normally distributed.
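As a rough illustration of the range method, the sketch below applies it to the example data. The d2 value for a sample of 7 is an assumption taken from standard control-chart constant tables; it does not appear in these notes, and the estimate is only meaningful if the data are approximately normal.

```python
# d2 for a sample of size 7, taken from standard control-chart constant
# tables (an assumed value, not given in the text above).
D2_N7 = 2.704

data = [3, 2, 9, 1, 6, 8, 2]
ranked = sorted(data)

data_range = ranked[-1] - ranked[0]   # largest value - smallest value
sigma_hat = data_range / D2_N7        # range-based estimate of sigma

print(data_range, round(sigma_hat, 3))
```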
Sample Variance ($S^2$)
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$
Most common and reliable measure of dispersion
Ex. Data {3, 2, 9, 1, 6, 8, 2}
Ranked Data {1, 2, 2, 3, 6, 8, 9}
$S^2 = \frac{(3 - 4.43)^2 + (2 - 4.43)^2 + \dots + (2 - 4.43)^2}{7 - 1}$ = 61.71/6 = 10.286
$S = \sqrt{S^2}$ = 3.207

Xi     (Xi − X̄)    (Xi − X̄)²
3      −1.43         2.04
2      −2.43         5.90
9       4.57        20.88
1      −3.43        11.76
6       1.57         2.46
8       3.57        12.74
2      −2.43         5.90
Average = 4.43      Sum = 61.71

Importance of S:
• Same units as measurements
• Positive numbers that increase when variability increases
Sample variance is the unbiased and minimum variance estimate for the population variance (irrespective of the distribution type)
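The sample variance calculation for the example data can be reproduced step by step. This is a minimal sketch using only the standard library; the comparison against `statistics.variance` is included because that function also divides by n − 1.

```python
import math
import statistics

data = [3, 2, 9, 1, 6, 8, 2]
n = len(data)
average = sum(data) / n

# Sum of squared deviations about the average, divided by n - 1
sum_of_squares = sum((x - average) ** 2 for x in data)
sample_variance = sum_of_squares / (n - 1)
sample_std = math.sqrt(sample_variance)

# The library routine should agree with the hand calculation
matches_library = math.isclose(sample_variance, statistics.variance(data))

print(round(sum_of_squares, 2), round(sample_variance, 3), round(sample_std, 3))
```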
The sample variance is really an average of the squared deviations.
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$
Why (n−1) degrees of freedom? Only (n−1) independent deviations!
Since $\bar{X} = \sum X_i / n$:
$\sum (X_i - \bar{X}) = \sum X_i - n\bar{X} = \sum X_i - \sum X_i = 0$
Ex. Data {1, 2, 3, 4, 5} ($\bar{X}$ = 3)

Xi     Dev
1      −2
2      −1
3       0
4       1
5       2
Sum Dev = 0
Grand Average and Pooled Variance Estimates
• Subgroup averages and variances are merged into historical estimates of average and variance – used for control chart centerlines
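• Grand average ($\bar{\bar{X}}$) = average of subgroup averages
• Pooled variance ($S_p^2$) = average of subgroup variances

$\bar{\bar{X}} = \frac{\sum_{i=1}^{m} \bar{X}_i}{m}$ if n is constant
$\bar{\bar{X}} = \frac{\sum_{i=1}^{m} n_i \bar{X}_i}{\sum_{i=1}^{m} n_i}$ if $n_i$ is variable (always correct)

$S_p^2 = \frac{\sum_{i=1}^{m} S_i^2}{m}$ if n is constant
$S_p^2 = \frac{\sum_{i=1}^{m} \nu_i S_i^2}{\sum_{i=1}^{m} \nu_i}$ if $n_i$ is variable (always correct), where $\nu_i = n_i - 1$ is the degrees of freedom of subgroup i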
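The weighted ("always correct") forms of the grand average and pooled variance can be sketched as follows. The function names and the subgroup values are hypothetical, chosen only to illustrate the formulas; the degrees of freedom are taken as n_i − 1.

```python
def grand_average(sizes, averages):
    """Weight each subgroup average by its subgroup size n_i."""
    return sum(n * xbar for n, xbar in zip(sizes, averages)) / sum(sizes)

def pooled_variance(sizes, variances):
    """Weight each subgroup variance by its degrees of freedom n_i - 1."""
    dof = [n - 1 for n in sizes]
    return sum(v * s2 for v, s2 in zip(dof, variances)) / sum(dof)

# Hypothetical subgroups, for illustration only
sizes = [5, 3, 4]
averages = [10.0, 12.0, 11.0]
variances = [4.0, 1.0, 2.5]

x_grand = grand_average(sizes, averages)
s2_pooled = pooled_variance(sizes, variances)

print(round(x_grand, 4), round(s2_pooled, 4))
```

With equal subgroup sizes the weights cancel and both functions reduce to the simple averages given for constant n.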
Probability Theory
Distribution Functions
Discrete Distributions
Discrete Probability Density Function (pdf): f(x) = Pr[X = x]
Properties of the discrete pdf
1) f(x) ≥ 0: each probability is greater than or equal to 0.
2) $\sum_{\forall x} f(x) = 1$: the sum of the probabilities equals 1.0.
Discrete Cumulative Distribution Function (cdf): $F(x) = \Pr[X \le x] = \sum_{t \le x} f(t)$
Ex. Binomial (n = 5 trials, p = 0.2)
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$
$f(0) = \binom{5}{0} (0.2)^0 (1-0.2)^5$ = 0.32768 (probability of 0 successes in 5 trials)
$f(1) = \binom{5}{1} (0.2)^1 (1-0.2)^4$ = 0.4096 (probability of 1 success in 5 trials)
$f(2) = \binom{5}{2} (0.2)^2 (1-0.2)^3$ = 0.2048 (probability of 2 successes in 5 trials)
$f(3) = \binom{5}{3} (0.2)^3 (1-0.2)^2$ = 0.0512 (probability of 3 successes in 5 trials)
$f(4) = \binom{5}{4} (0.2)^4 (1-0.2)^1$ = 0.0064 (probability of 4 successes in 5 trials)
$f(5) = \binom{5}{5} (0.2)^5 (1-0.2)^0$ = 0.00032 (probability of 5 successes in 5 trials)
f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = 1.0 (property of a pdf)
F(2) = Pr[X ≤ 2] = f(0) + f(1) + f(2) = 0.94208
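The binomial table above can be regenerated with a few lines of code. This is a sketch using `math.comb` from the standard library (Python 3.8 or later); the function names are illustrative.

```python
from math import comb

def binomial_pdf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """Probability of x or fewer successes: the sum of f(t) for t <= x."""
    return sum(binomial_pdf(t, n, p) for t in range(x + 1))

n, p = 5, 0.2
probabilities = [binomial_pdf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probabilities):
    print(x, round(prob, 5))
print("F(2) =", round(binomial_cdf(2, n, p), 5))
```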
Joint Probability of Multiple Events
P = probability of success
F = probability of failure = 1 − P
Pr[success on 1st trial] = P
Pr[success on 1st trial "and" success on 2nd trial] = P·P
Pr[success on 1st trial "and" success on 2nd trial "and" failure on 3rd trial] = P·P·F = P²(1 − P)
Therefore: Pr[x successes in n trials, in one specific order] = $P^x (1-P)^{n-x}$

Combinations:
Number of combinations = $\binom{n}{x} = \frac{n!}{x!(n-x)!}$

Number of combinations of 1 success in 5 trials = $\binom{5}{1} = \frac{5!}{1!(5-1)!}$ = 5
1) S F F F F
2) F S F F F
3) F F S F F
4) F F F S F
5) F F F F S

Number of combinations of 2 successes in 5 trials = $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5 \times 4}{2!}$ = 10
1) S S F F F
2) S F S F F
3) S F F S F
4) S F F F S
5) F S S F F
6) F S F S F
7) F S F F S
8) F F S S F
9) F F S F S
10) F F F S S
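The listings of success and failure patterns can be enumerated programmatically and compared against the combination count. A small sketch, with the helper name `arrangements` introduced here for illustration:

```python
from itertools import combinations
from math import comb

def arrangements(n, x):
    """All orderings of x successes (S) and n - x failures (F) in n trials."""
    rows = []
    for success_positions in combinations(range(n), x):
        rows.append("".join("S" if i in success_positions else "F" for i in range(n)))
    return rows

one_success = arrangements(5, 1)
two_successes = arrangements(5, 2)

print(len(one_success), one_success)
print(len(two_successes), two_successes)
```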
Continuous Distributions
Continuous Probability Density Function (pdf): f(x) does not equal a probability
Properties of the continuous pdf
1) f(x) ≥ 0: the function is non-negative over the whole region of X
2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$: the total area under the curve equals 1.0 (probability)
Cumulative Distribution Function (cdf): $F(x) = \Pr[X \le x] = \int_{-\infty}^{x} f(t)\,dt$
[Figure: curve of f(x); F(x) = area under f(x) to the left of x]
Ex. f(x) = 2x (0 ≤ x ≤ 1)
$\int_0^1 2x\,dx = x^2 \Big|_0^1 = 1^2 - 0^2 = 1$: the area under the curve equals 1, thus proving it is a pdf
$F(x) = \int_0^x 2t\,dt = t^2 \Big|_0^x = x^2$
F(0.5) = 0.5² = 0.25 = Pr[X ≤ 0.5]
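The two integrals in this example can also be checked numerically. The sketch below uses a simple midpoint rule written by hand (not a library routine); the function names and step count are arbitrary choices for illustration.

```python
def pdf(x):
    """f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

total_area = integrate(pdf, 0, 1)      # area under the whole curve
cdf_at_half = integrate(pdf, 0, 0.5)   # F(0.5), area to the left of 0.5

print(round(total_area, 6), round(cdf_at_half, 6))
```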
Probabilities and Percentage Points (Variates) from Common Distributions
Tables or functions exist for common distributions such as Z, t, F, and chi-squared to:
• determine the lower tail probability for a given value of x
• determine the value of x based on the lower tail probability
Area between two limits
$\Pr[a < X < b] = \int_a^b f(x)\,dx = F(b) - F(a)$
= Pr[Conformance] if a = LSL and b = USL
Expectations
Discrete Distributions
Let the possible values (sample space) for X be denoted by x1, x2, ..., xn and f(xi) = Pr[X = xi]
$E[X] = \sum_{i=1}^{n} x_i f(x_i)$
$E[X^2] = \sum_{i=1}^{n} x_i^2 f(x_i)$
$E[u(X)] = \sum_{i=1}^{n} u(x_i) f(x_i)$
Ex. Binomial (n = 5, p = 0.2)
E[X] = 0(0.32768) + 1(0.4096) + 2(0.2048) + 3(0.0512) + 4(0.0064) + 5(0.00032) = 1.0 = np (binomial property)
E[X²] = 0²(0.32768) + 1²(0.4096) + 2²(0.2048) + 3²(0.0512) + 4²(0.0064) + 5²(0.00032) = 1.8
Consideration: What if f(xi) was a constant ∀ i? (ex. 1/n)
$E[X] = \sum_{i=1}^{n} x_i (1/n) = \frac{\sum_{i=1}^{n} x_i}{n}$ = sample average
The sample average puts an equal weighting on all observations.
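The discrete expectations for the binomial example can be computed directly from the definition. A minimal sketch; the generic `expectation` helper is a hypothetical name that mirrors E[u(X)].

```python
from math import comb

n, p = 5, 0.2
# f maps each possible value x to Pr[X = x]
f = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

def expectation(u, pdf):
    """E[u(X)]: sum of u(x) weighted by the probability of x."""
    return sum(u(x) * prob for x, prob in pdf.items())

e_x = expectation(lambda x: x, f)
e_x2 = expectation(lambda x: x**2, f)

print(round(e_x, 6), round(e_x2, 6))
```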
Continuous Distributions
$E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx$
$E[u(X)] = \int_{-\infty}^{+\infty} u(x) f(x)\,dx$
Ex. f(x) = 2x (0 ≤ x ≤ 1)
$E[X] = \int_0^1 x \cdot 2x\,dx = \int_0^1 2x^2\,dx = \frac{2}{3} x^3 \Big|_0^1 = 2/3$
$E[X^2] = \int_0^1 x^2 \cdot 2x\,dx = \int_0^1 2x^3\,dx = \frac{2}{4} x^4 \Big|_0^1 = 2/4$
Variance is an Expectation
$VAR[X] = E[(X - E[X])^2]$, where E[X] = µ
$VAR[X] = E[X^2] - \{E[X]\}^2$ when multiplied out
VAR[X] = "Expected value of the product minus the product of the expected values"
Ex. Binomial (n = 5, p = 0.2)
VAR[X] = (1.8) − {1}² = 0.8 = npq = np(1 − p) (binomial property)
Ex. f(x) = 2x (0 ≤ x ≤ 1)
VAR[X] = (2/4) − {2/3}² = 0.0556
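Both variance examples can be verified exactly with rational arithmetic. This sketch uses the standard `fractions` module; for the continuous case it simply plugs in the values E[X] = 2/3 and E[X²] = 2/4 obtained from the integrals, rather than integrating symbolically.

```python
from fractions import Fraction
from math import comb

# Binomial (n = 5, p = 0.2), kept as exact fractions
n, p = 5, Fraction(1, 5)
f = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
e_x = sum(x * prob for x, prob in f.items())
e_x2 = sum(x**2 * prob for x, prob in f.items())
var_binomial = e_x2 - e_x**2          # E[X^2] - {E[X]}^2

# f(x) = 2x on [0, 1]: E[X] = 2/3 and E[X^2] = 2/4
var_continuous = Fraction(2, 4) - Fraction(2, 3) ** 2

print(var_binomial, float(var_binomial))
print(var_continuous, float(var_continuous))
```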
Example: Discrete Expected Value
Daily sales records for a computer manufacturing firm show that it will sell 0, 1, or 2 mainframe computer systems with probabilities as listed.

Number of sales (x)    Probability f(x)
0                      0.7
1                      0.2
2                      0.1

A) Find the expected value and standard deviation of daily sales.
Expected value of daily sales:
E[x] = Σ x f(x) = (0)(0.7) + (1)(0.2) + (2)(0.1) = 0.4 mainframe computers
Standard deviation (σx) of daily sales:
σx² = E[x²] − E[x]²
E[x²] = Σ x² f(x) = (0²)(0.7) + (1²)(0.2) + (2²)(0.1) = 0.6
σx² = 0.6 − (0.4)² = 0.44
σx = (0.44)^(1/2) = 0.6633

B) The firm's daily fixed cost is $30,000 and their marginal cost is $200,000 (cost per unit). If a mainframe system sells for $500,000, what is the expected daily profit?
Daily profit = Revenues − costs
Fixed daily cost = $30,000; Cost per unit = $200,000; Revenue per unit = $500,000
Daily profit = (revenue per unit)(expected value sold) − fixed daily cost − (cost per unit)(expected value sold)
= (500000)(0.4) − (30000) − (200000)(0.4) = $90,000 per day
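The daily sales example translates directly into code. A minimal sketch; the constant and variable names are illustrative.

```python
import math

# Probability of selling 0, 1, or 2 systems in a day
sales_pdf = {0: 0.7, 1: 0.2, 2: 0.1}

expected_sales = sum(x * prob for x, prob in sales_pdf.items())
expected_sq = sum(x**2 * prob for x, prob in sales_pdf.items())
variance = expected_sq - expected_sales**2
std_dev = math.sqrt(variance)

FIXED_COST = 30_000     # per day
UNIT_COST = 200_000     # marginal cost per system
UNIT_PRICE = 500_000    # revenue per system
expected_profit = (UNIT_PRICE - UNIT_COST) * expected_sales - FIXED_COST

print(round(expected_sales, 2), round(std_dev, 4), round(expected_profit))
```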
Example: Continuous Expected Value
The outside diameter of washers is a continuous random variable, x, distributed uniformly from 300 to 320 mm. Calculate:

A) f(x)
Let x = outside diameter. This is a uniform distribution, so f(x) = c, a constant.
For a pdf: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
$\int_{300}^{320} c\,dx = 1$, so c(320 − 300) = 1; solving for c gives c = 1/20
Therefore: f(x) = 1/20 for 300 < x < 320, and f(x) = 0 elsewhere

B) E[x]
$E[x] = \int_{300}^{320} x f(x)\,dx = \int_{300}^{320} \frac{x}{20}\,dx = \frac{1}{40} x^2 \Big|_{300}^{320} = \frac{1}{40}(102400 - 90000)$ = 310 mm

C) VAR[x]
VAR[x] = E[x²] − E[x]²
$E[x^2] = \int_{300}^{320} x^2 f(x)\,dx = \int_{300}^{320} \frac{x^2}{20}\,dx = \frac{1}{60} x^3 \Big|_{300}^{320}$ = 96133.33
Therefore: VAR[x] = 96133.33 − (310)² = 33.33
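The washer example can be checked exactly by evaluating the antiderivatives at the limits. This is a sketch using rational arithmetic; the closed forms in the comments follow from integrating x·c and x²·c over the interval.

```python
from fractions import Fraction

a, b = 300, 320
c = Fraction(1, b - a)                  # constant height of the uniform pdf

e_x = c * Fraction(b**2 - a**2, 2)      # integral of x * c from a to b
e_x2 = c * Fraction(b**3 - a**3, 3)     # integral of x^2 * c from a to b
variance = e_x2 - e_x**2

print(e_x, float(e_x2), float(variance))
```

The result agrees with the general uniform-distribution variance (b − a)²/12.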
Median and Mode
Median: value of the 50th percentile, i.e. the value Md where F(Md) = 0.5
[Figure: cdf F(x) versus X, with Md marked where F(x) = 0.5]
Mode: value with the largest f(x); the value of X where the derivative of f(x) equals 0
[Figure: pdf f(x) versus X, with Mo marked at the peak]
Specific Distributions
Discrete
Binomial
X = N_n = number of successes in n trials
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = {0, 1, ..., n}
$F(x) = \sum_{t=0}^{x} \binom{n}{t} p^t (1-p)^{n-t}$
E[X] = np = µ
VAR[X] = npq = σ²
Example
The probability that a piece of luggage will survive the stress test is 0.65. If six bags are randomly tested:

A) What is the probability that exactly four will survive?
Given: P(luggage survives) = 0.65, P(luggage fails) = 0.35
Exactly 4 bags survive; let x = number that survive. This is binomial with p = 0.65, q = 0.35, n = 6.
$P(x = 4) = \binom{6}{4} p^4 q^2 = \binom{6}{4} (0.65)^4 (0.35)^2$ = (15)(0.1785)(0.1225) = 0.3280

B) Given that the 1st and 2nd bags survived, what is the probability that the 3rd and 4th bags will fail?
Note here that the trials are independent, and that trials 1 and 2 already occurred, so the probability of their occurrence = 1.
Let x = number that survive, with p = 0.65, q = 0.35, n = 2.
$P(x = 0) = \binom{2}{0} p^0 q^2 = \binom{2}{0} (0.65)^0 (0.35)^2$ = 0.1225
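Both parts of the luggage example reduce to evaluating the binomial pdf. A short sketch, with illustrative names:

```python
from math import comb

def binomial_pdf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_survive = 0.65

# A) exactly four of six bags survive
prob_four_of_six = binomial_pdf(4, 6, p_survive)

# B) bags 3 and 4 both fail: zero survivors in two independent trials
prob_both_fail = binomial_pdf(0, 2, p_survive)

print(round(prob_four_of_six, 4), round(prob_both_fail, 4))
```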
Poisson
X = N(t) = number of arrivals occurring in a given time interval
$f(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$, x = {0, 1, ..., ∞}
$F(x) = \sum_{i=0}^{x} \frac{e^{-\lambda t} (\lambda t)^i}{i!}$
E[X] = λt = µ
VAR[X] = λt = σ²
Example
The manufacturing defect rate of a product is 0.005 defects per unit. What is the probability of zero defects occurring in 100 units?
λt = 0.005 DPU × 100 units = 0.5
$f(0) = \frac{e^{-0.5} (0.5)^0}{0!}$ = 0.60653
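The Poisson calculation can be reproduced with the standard `math` module. A minimal sketch; `poisson_pdf` and `poisson_cdf` are names chosen here for illustration.

```python
import math

def poisson_pdf(x, mean):
    """Probability of exactly x arrivals when the expected count is `mean`."""
    return math.exp(-mean) * mean**x / math.factorial(x)

def poisson_cdf(x, mean):
    """Probability of x or fewer arrivals."""
    return sum(poisson_pdf(i, mean) for i in range(x + 1))

mean_defects = 0.005 * 100            # defects per unit times number of units
prob_zero_defects = poisson_pdf(0, mean_defects)

print(round(prob_zero_defects, 5))
```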
Continuous
Normal
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$, −∞ ≤ x ≤ ∞
$F(x) = \int_{-\infty}^{x} f(t)\,dt$, which is evaluated numerically.
Since an infinite number of mean-variance combinations exist, a standardized variable was developed.

Standard Normal Transformation
$Z = \frac{X - \mu}{\sigma}$
Transforms all the observations of any normal random variable X to a new set of observations of a standard normal variable Z.
E[X] = µ, E[Z] = 0
VAR[X] = σ², VAR[Z] = 1
Proof:
$E[Z] = \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0$
$VAR[Z] = VAR\left[\frac{X - \mu}{\sigma}\right] = VAR\left[\frac{X}{\sigma}\right] = \frac{1}{\sigma^2} VAR[X] = \frac{\sigma^2}{\sigma^2} = 1$
Corollary: VAR[cX] = c² VAR[X]
X ~ N(µ, σ²) ⇒ Z ~ N(0, 1)
Importance: a single table of Z probabilities can be used for all combinations of (µ, σ²).
FYI
S² is an unbiased estimate of σ², but $S = \sqrt{S^2}$ is a biased estimate of σ. Some authors espouse using a C4 index to compensate for the bias induced by taking the square root. The problem with the C4 index is that VAR[Z] no longer equals 1 as described above.
Two Types of Normal Distribution Problems
1) 3 Knowns → Transform to Z → Find corresponding probability
Example
Given a normal distribution with µ = 50 and σ = 10, find the probability that X falls within its specification limits of 45 and 62.
$\Pr[45 \le X \le 62] = \Pr\left[\frac{45 - 50}{10} \le \frac{X - \mu}{\sigma} \le \frac{62 - 50}{10}\right] = \Pr[-0.5 \le Z \le 1.2]$
Pr[Z ≤ 1.2] = 0.8849
Pr[Z < −0.5] = 0.3085
Pr[−0.5 ≤ Z ≤ 1.2] = 0.8849 − 0.3085 = 0.5764
[Figure: normal curve with the region between 45 and 62 shaded in X space (centered at 50), equivalently between −0.5 and 1.2 in Z space (centered at 0)]
2) Known probability → Find corresponding Z value → Solve for 1 unknown given 2 knowns
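Example
On an examination, the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades follow a normal distribution, what is the lowest possible A?
Pr[Z < z] = 0.88, so z = 1.175
$1.175 = \frac{X - 74}{7}$, so X = 82.225
[Figure: normal curve with the upper 12% shaded above 82.225 in X space (centered at 74), equivalently above 1.175 in Z space (centered at 0)]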
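Both types of normal problem can be worked without a printed Z table. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later): `cdf` plays the role of the table lookup and `inv_cdf` the reverse lookup. Results differ slightly from the hand calculation because table values are rounded.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Type 1: known mu, sigma, and limits; find the probability
mu, sigma = 50, 10
z_low = (45 - mu) / sigma
z_high = (62 - mu) / sigma
prob_in_spec = z.cdf(z_high) - z.cdf(z_low)

# Type 2: known probability; find the Z value, then solve for X
z_cut = z.inv_cdf(0.88)        # 88% of grades fall below the lowest A
lowest_a = 74 + 7 * z_cut

print(round(prob_in_spec, 4), round(z_cut, 3), round(lowest_a, 2))
```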
Two Types of Sampling Normal Distribution Problems
1) 4 Knowns → Transform to Z → Find corresponding probability
Example
Given a normal distribution with µ = 50, σ = 10, and a sample size of n = 40, find the probability that $\bar{X}$ falls within its control limits of 47 and 54.
$\Pr[47 \le \bar{X} \le 54] = \Pr\left[\frac{47 - 50}{10/\sqrt{40}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{54 - 50}{10/\sqrt{40}}\right] = \Pr[-1.90 \le Z \le 2.53]$
Pr[Z ≤ 2.53] = 0.9943
Pr[Z < −1.90] = 0.0287
Pr[−1.90 ≤ Z ≤ 2.53] = 0.9943 − 0.0287 = 0.9656
2) Known probability → Find corresponding Z value → Solve for 1 unknown given 3 knowns
Example
A drilling operation produces holes with diameters that are approximately normally distributed. If the process mean and variance are 2.1 and 0.0225, respectively, what should be the sample size to ensure that no more than 14% of the sample means will be greater than 2.15?
This is normally distributed, where µ = 2.1 and σ² = 0.0225 (σ = 0.15). We want to find n.
Given: P($\bar{x}$ > 2.15) < 0.14 or P($\bar{x}$ < 2.15) > 0.86
Transform to Z: P(Z < Z*) > 0.86
Look up the Z value in the table: Z* > 1.08
Now: $Z^* = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.15 - 2.1}{\sqrt{0.0225}/\sqrt{n}} = \frac{(0.05)\sqrt{n}}{0.15} > 1.08$
Solve for n: $\sqrt{n}$ > (1.08)(0.15)/(0.05) = 3.24
n > 10.4976
Therefore: n ≥ 11 (Need a whole number sample.)
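The two sampling-distribution problems follow the same pattern as before, with σ replaced by the standard error σ/√n. This is a sketch using `statistics.NormalDist`; values differ slightly from the table-based hand calculation because no rounding of Z is applied.

```python
import math
from statistics import NormalDist

z = NormalDist()   # standard normal

# Type 1: probability that the sample mean falls between 47 and 54
mu, sigma, n = 50, 10, 40
std_error = sigma / math.sqrt(n)
prob_in_limits = z.cdf((54 - mu) / std_error) - z.cdf((47 - mu) / std_error)

# Type 2: smallest n so that at most 14% of sample means exceed 2.15
mu_d, sigma_d = 2.1, math.sqrt(0.0225)
z_star = z.inv_cdf(0.86)
# sqrt(n) must exceed z_star * sigma / (limit - mu); round n up to a whole number
n_required = math.ceil((z_star * sigma_d / (2.15 - mu_d)) ** 2)

print(round(prob_in_limits, 4), round(z_star, 3), n_required)
```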
Assumptions of the standard normal distribution
1) X is normally distributed
2) µ is known with certainty
3) σ² is known with certainty
4) observations (xi) are independently and identically distributed (i.i.d.)

When the population variance is known the Z distribution is used.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
When the population variance is unknown, there is uncertainty in the estimate of σ². Therefore, a wider distribution was developed to account for this uncertainty.
t Distribution: $t = \frac{\bar{X} - \mu}{\sqrt{S_p^2 / n}}$, where $S_p^2$ = pooled variance
Probabilities and percentage points can be obtained from a t table. E[t] = 0, and VAR[t] = ν/(ν − 2) for ν > 2 degrees of freedom, which is slightly greater than 1 and approaches 1 as ν increases.
Example
The outside diameter of washers follows a normal distribution with a mean of 1.20 inches. A sample of 9 washers will result in a sample standard deviation of 0.03". Calculate the probability that a sample mean will lie between 1.18140 and 1.22306.
This is normally distributed, where µ = 1.20", s = 0.03", and n = 9, so there are ν = n − 1 = 8 degrees of freedom.
P(1.18140 < $\bar{X}$ < 1.22306) = P(t1 < t < t2)
Use the transformation:
$t_1 = \frac{1.18140 - 1.20}{0.03/\sqrt{9}} = -1.86$
$t_2 = \frac{1.22306 - 1.20}{0.03/\sqrt{9}} = 2.306$
Therefore: P(−1.86 < t < 2.306) = P(t < 2.306) − P(t < −1.86) = 0.975 − 0.05 = 0.925 (Look up values.)
2.1 Types of Inferences
Gather some knowledge concerning the population using data
Considerations:
1) Are the samples representative of the population? (Sampling)
2) How do we make inferences about the population parameters?
3) How reliable are these inferences?

Sampling: In order to obtain valid inferences of the population, we must obtain samples that are representative of the population.
Random Sample
• observations are made independently (x1, x2, ..., xn) and randomly
• each value (xi) came from distributions having the same pdf {f(x)}
• i.i.d.: independently and identically distributed
Importance
• Joint probability equals the product of the marginal probabilities.
• COV[X1, X2] = 0, so the variance of the sum equals the sum of the variances
• Rational sample
Hypothesis Testing
Make a hypothesis (assumption) about the population parameter of interest
H0: Null hypothesis, ex. H0: µ = 4
HA: Alternative hypothesis (complement of H0), ex. HA: µ ≠ 4
[Figure: reference distribution centered at µ0 with rejection regions of area α/2 in each tail]
Two Conclusions:
1) Reject H0
2) Cannot reject H0: we can never "accept" because we don't know what the true parameter really is; however, we can conclude that it is not some value.
Hypothesis Testing of the Mean
Test Statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Critical Values (define rejection regions): Zcrit = $Z_{\alpha/2}$ and $Z_{1-\alpha/2}$
Compute the test statistic (Zcalc): where does the observed value fall with respect to the assumed reference distribution?
Rejection Criterion: Given a mean of µ0, (1−α) of the values will fall between −Zcrit and Zcrit. If the calculated statistic (Zcalc) falls in the rejection regions, then with a probability of (1−α) this sample did not come from a population with mean µ0, i.e. µ ≠ µ0.
Possible Situations:

                    H0 is True          H0 is False
Cannot Reject H0    Correct Decision    Type II Error
Reject H0           Type I Error        Correct Decision

Type I Error: "Wrongful rejection", rejection of the null hypothesis when it is true. Pr[Type I Error] = α
Type II Error: "Wrongful acceptance", "acceptance" of the null hypothesis when it is false. Pr[Type II Error] = β
Pr[Rejection] = α when the null hypothesis is true
Pr[Rejection] = 1 − β = "power" when the null hypothesis is false
Hypothesis Testing of the Variance
Test Statistic: $\chi^2 = \frac{(n - 1) S^2}{\sigma^2}$
Example: Hypothesis Test Using a Z Distribution
For a random sample of 50 measurements on the breaking strength of cotton threads, the mean breaking strength was found to be 210 grams and the standard deviation 18 grams.

A) The manufacturer claims that the population mean is 215 grams. State the hypothesis and solve for α = 0.10.
The claim is that µ = 215 g. This is a two-tailed test.
Null hypothesis: H0: µ = 215
Alternative hypothesis: HA: µ ≠ 215
Zcritical = ±$Z_{0.10/2}$ = ±$Z_{0.05}$ = ±1.645
$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 215}{18/\sqrt{50}}$ = −1.96
Compare Zcalc to Zcritical: −1.96 is less than −1.645
Therefore: Reject H0 that µ = 215 for α = 0.10. Manufacturer's claim is invalid.

C) Is there evidence that the population mean of breaking strength exceeds 218 grams? State the hypothesis and solve for α = 0.05.
Using α = 0.05, check if the population mean is > 218 g. This is a one-tailed test.
Null hypothesis: H0: µ ≤ 218
Alternative hypothesis: HA: µ > 218
Zcritical = $Z_{\alpha}$ = $Z_{0.05}$ = 1.645
$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 218}{18/\sqrt{50}}$ = −3.143
Compare Zcalc to Zcritical: −3.143 is not greater than 1.645
Therefore: The null hypothesis, H0: µ ≤ 218, cannot be rejected at α = 0.05.
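The two Z tests on the cotton thread data can be sketched in code. As in the worked example, the sample standard deviation of 18 is used in place of σ; critical values come from `statistics.NormalDist.inv_cdf` rather than a table, and the function name `z_statistic` is illustrative.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Distance of the sample mean from mu0 in standard-error units."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

std_normal = NormalDist()

# A) two-tailed test of H0: mu = 215 at alpha = 0.10
z_a = z_statistic(210, 215, 18, 50)
z_crit_a = std_normal.inv_cdf(1 - 0.10 / 2)
reject_a = abs(z_a) > z_crit_a

# C) upper one-tailed test of H0: mu <= 218 at alpha = 0.05
z_c = z_statistic(210, 218, 18, 50)
z_crit_c = std_normal.inv_cdf(1 - 0.05)
reject_c = z_c > z_crit_c

print(round(z_a, 3), round(z_crit_a, 3), reject_a)
print(round(z_c, 3), round(z_crit_c, 3), reject_c)
```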
Example: Hypothesis Test Using a t Distribution
An auto company states that its new compact car has an average fuel economy (miles per gallon) greater than or equal to 55 mpg on the highway. Eight cars were randomly selected and driven. The results of the study were: 57, 52, 50, 49, 53, 51, 47, and 55. State the hypothesis and solve for α = 0.05.
The claim is that the average mpg ≥ 55 mpg. This is a one-tailed test with α = 0.05 and ν = (n − 1) = (8 − 1) = 7 degrees of freedom.
Null hypothesis: H0: µ ≥ 55
Alternative hypothesis: HA: µ < 55
Since the true variance is unknown a t distribution will be used.
$\bar{X}$ = (57 + 52 + 50 + 49 + 53 + 51 + 47 + 55) / 8 = 51.75 mpg
$S_p^2 = \frac{\sum_{i=1}^{8} (x_i - 51.75)^2}{n - 1} = \frac{73.5}{7}$ = 10.5, so s = 3.24037
$s_{\bar{X}} = \sqrt{S_p^2 / n} = \frac{s}{\sqrt{n}} = \frac{3.24037}{\sqrt{8}}$ = 1.14564
tcritical = −$t_{\nu, 1-\alpha}$ = −$t_{7, 0.95}$ = −1.895 (from tables)
$t_{calc} = \frac{\bar{X} - \mu}{\sqrt{S_p^2 / n}} = \frac{51.75 - 55}{1.14564}$ = −2.8368
Compare tcalc to tcritical: −2.8368 < −1.895
Therefore: Reject the null hypothesis, H0: µ ≥ 55, at α = 0.05. Manufacturer's claim is invalid.
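The t statistic for the fuel economy data can be computed with the standard library. Because Python's standard library has no t distribution, this sketch hard-codes the critical value quoted from the t table in the example rather than computing it.

```python
import math
import statistics

mpg = [57, 52, 50, 49, 53, 51, 47, 55]
mu0 = 55
n = len(mpg)

x_bar = statistics.mean(mpg)
s = statistics.stdev(mpg)            # sample standard deviation, divides by n - 1
std_error = s / math.sqrt(n)
t_calc = (x_bar - mu0) / std_error

# Lower-tail critical value for 7 degrees of freedom at alpha = 0.05,
# taken from the t table value quoted in the example.
T_CRITICAL = -1.895
reject_h0 = t_calc < T_CRITICAL

print(x_bar, round(s, 5), round(std_error, 5), round(t_calc, 4), reject_h0)
```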
Example: Hypothesis Test Using a χ² Distribution
The same auto company as in the previous example claims that the true variance of fuel economy (mpg) is less than or equal to 5. Using the same data, state the hypothesis and solve for α = 0.01.
The company claims that σ² of car fuel economy ≤ 5. This is a one-tailed test with α = 0.01.
Null hypothesis: H0: σ² ≤ 5
Alternative hypothesis: HA: σ² > 5
From the previous example, $S_p^2$ = 10.5 and ν = 7
χ²critical = $\chi^2_{\nu, \alpha}$ = $\chi^2_{7, 0.01}$ = 18.475 (upper-tail value, from tables)
$\chi^2_{calc} = \frac{(n - 1) S_p^2}{\sigma^2} = \frac{(7)(10.5)}{5}$ = 14.7
Compare χ²calc to χ²critical: 14.7 is not greater than 18.475
Therefore: The null hypothesis, H0: σ² ≤ 5, cannot be rejected at α = 0.01. The data do not invalidate the manufacturer's claim.
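The χ² statistic for the variance test is a one-line calculation once the sample variance is known. In this sketch the critical value is an assumption hard-coded from a standard chi-squared table (upper 1% point, 7 degrees of freedom), since the standard library provides no chi-squared quantile function.

```python
import statistics

mpg = [57, 52, 50, 49, 53, 51, 47, 55]
sigma2_claimed = 5

s2 = statistics.variance(mpg)                     # sample variance, divides by n - 1
chi2_calc = (len(mpg) - 1) * s2 / sigma2_claimed

# Upper 1% point of the chi-squared distribution with 7 degrees of freedom,
# taken from a standard table (assumed value).
CHI2_CRITICAL = 18.475
reject_h0 = chi2_calc > CHI2_CRITICAL

print(s2, round(chi2_calc, 2), reject_h0)
```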
Decision Making Using Conditional Probabilities
Example: Number of people in a small town

                   Male (M)    Female (M̄)    Total
Employed (E)          50           25           75
Unemployed (Ē)        10           15           25
Total                 60           40          100

Marginal Probabilities
• Consider only one distribution
Pr[Employed] = 75 / 100 = 0.75
Pr[Unemployed] = 25 / 100 = 0.25
Pr[Male] = 60 / 100 = 0.6
Pr[Female] = 40 / 100 = 0.4
Joint Probabilities
• Consider more than one distribution
• Pr[A, B] = Probability that event A occurred and event B occurred
Pr[Employed, Male] = 50 / 100 = 0.50
Pr[Employed, Female] = 25 / 100 = 0.25
Pr[Unemployed, Male] = 10 / 100 = 0.10
Pr[Unemployed, Female] = 15 / 100 = 0.15

Conditional Probabilities
• Pr[A | B] = Probability that event A will occur given that event B has already occurred
Pr[Employed | Male] = 50 / 60 = 0.833
Pr[Unemployed | Male] = 10 / 60 = 0.167
Pr[Employed | Female] = 25 / 40 = 0.625
Pr[Unemployed | Female] = 15 / 40 = 0.375
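The marginal, joint, and conditional probabilities for the small-town example can all be derived from the four cell counts. A minimal sketch; the function names and the dictionary layout are illustrative choices.

```python
# Cell counts keyed by (employment status, sex)
counts = {
    ("Employed", "Male"): 50,
    ("Employed", "Female"): 25,
    ("Unemployed", "Male"): 10,
    ("Unemployed", "Female"): 15,
}
total = sum(counts.values())

def marginal(label, axis):
    """Marginal probability; axis 0 = employment status, axis 1 = sex."""
    return sum(c for key, c in counts.items() if key[axis] == label) / total

def joint(status, sex):
    """Pr[status, sex]: both events occur."""
    return counts[(status, sex)] / total

def conditional(status, sex):
    """Pr[status | sex] = Pr[status, sex] / Pr[sex]."""
    return joint(status, sex) / marginal(sex, 1)

print(marginal("Employed", 0), joint("Employed", "Male"))
print(round(conditional("Employed", "Male"), 3),
      round(conditional("Unemployed", "Female"), 3))
```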