Probability and Statistics Notes

Overview of Probability and Statistics

Gregory Rahn & Regina Rahn

Copyright 2001 Genemetrix

Probability Theory - known distribution or population

• Population parameters are known with certainty
  - mean (µ)
  - variance (σ²)
  - shape parameters (skewness & kurtosis)
• Use the distribution to acquire probabilities of the occurrence of certain events
• Defined explicitly for the distribution

Statistics - start with data (observed values from an unknown "empirical" distribution)

• Functions of the data that estimate parameters {mean, variance, skewness, and kurtosis}
• Estimate probabilities


Statistics - Estimation of Parameters

Measures of Location

Average ($\bar{X}$): most common measure of central tendency

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Median (Md): the value that divides the ranked observations in half

$$Md = X_{(n+1)/2} \ \text{if } n \text{ is odd}, \qquad Md = \frac{X_{n/2} + X_{(n/2)+1}}{2} \ \text{if } n \text{ is even}$$

Mode (Mo): the most frequent data point

Ex. Data {3, 2, 9, 1, 6, 8, 2}     Ranked Data {1, 2, 2, 3, 6, 8, 9}

$\bar{X}$ = (3+2+9+1+6+8+2)/7 = 31/7 = 4.43
Md = X(7+1)/2 = X4 = 3
Mo = 2, the most frequent observation (occurred twice)
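The three measures of location for this data set can be checked with a few lines of Python. This is a minimal sketch that assumes the standard-library `statistics` module is an acceptable stand-in for the hand calculation; the variable names are illustrative.

```python
import statistics

data = [3, 2, 9, 1, 6, 8, 2]

mean = statistics.mean(data)      # sum of the values divided by n (31/7)
median = statistics.median(data)  # middle value of the ranked data
mode = statistics.mode(data)      # most frequent value

print(round(mean, 2), median, mode)
```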


Properties of the Average

• Σ(Xi - $\bar{X}$)² is less than the sum of squared deviations from any other estimate
  Ex. Σ(Xi - $\bar{X}$)² ≤ Σ(Xi - Md)²
  - average is the minimum variance estimate
• Gets pulled in the direction of extreme points

Example: Data {1, 2, 3, 4, 9}

Including X5 = 9:   $\bar{X}$ = 3.8    Md = 3
Excluding X5 = 9:   $\bar{X}$ = 2.5    Md = 2.5

• Average can be very sensitive towards extreme points, while the median is fairly robust. Sensitivity depends upon the sample size and the deviation of the extreme point!

• Assumption of $\bar{X}$: the Xi's are independently and identically distributed (i.i.d.). This is often not a good assumption!
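Both properties can be verified numerically for the data {1, 2, 3, 4, 9}. The sketch below uses a hypothetical helper, `sum_sq_dev`, to compare the squared deviations about the average and about the median, then drops the extreme point to show how each measure moves.

```python
import statistics

def sum_sq_dev(data, center):
    """Sum of squared deviations of the data about a chosen center."""
    return sum((x - center) ** 2 for x in data)

data = [1, 2, 3, 4, 9]
mean, median = statistics.mean(data), statistics.median(data)
ss_mean = sum_sq_dev(data, mean)      # deviations about the average
ss_median = sum_sq_dev(data, median)  # deviations about the median

trimmed = data[:-1]  # exclude the extreme point X5 = 9
mean_trim = statistics.mean(trimmed)
median_trim = statistics.median(trimmed)

print(ss_mean, ss_median, mean_trim, median_trim)
```

The average shifts from 3.8 to 2.5 when the extreme point is removed, while the median only moves from 3 to 2.5.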


Measures of Dispersion

Range (R)

R = Xn - X1 = largest value - smallest value

• Must sort data from low (X1) to high (Xn)

Ex. Data {3, 2, 9, 1, 6, 8, 2}     Ranked Data {1, 2, 2, 3, 6, 8, 9}

R = 9 - 1 = 8

Properties of the Range
Bad: It only uses two pieces of information.
Good: It is easy to compute manually.

Uses of the Range
• Range itself is useful for characterizing a distribution (order statistics)
• Range can be used to estimate the standard deviation ($\hat{\sigma} = R/d_2$)
• Many practical applications once the standard deviation is approximated:
  - Control Charts
  - Process Capability
  - Gage Repeatability & Reproducibility

Problems when using the Range to Approximate the Standard Deviation
The d2 coefficient depicts the relationship between the range and standard deviation for a normal distribution. Thus, the Range method for estimating standard deviation is only valid if the parent distribution is normally distributed.
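As a rough illustration of the range method, the sketch below applies R/d2 to the example data. The d2 values are the commonly tabulated control-chart constants for sample sizes 2 through 7; they are an assumption brought in from standard tables, not from these notes, and the function name is hypothetical.

```python
# d2 constants for sample sizes 2..7 (assumed from standard control-chart tables)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704}

def sigma_from_range(data):
    """Estimate the standard deviation as R / d2 (valid only for normal data)."""
    sample_range = max(data) - min(data)
    return sample_range / D2[len(data)]

data = [3, 2, 9, 1, 6, 8, 2]
sample_range = max(data) - min(data)
sigma_hat = sigma_from_range(data)

print(sample_range, round(sigma_hat, 3))
```

In control-chart practice the range is usually averaged over many subgroups before dividing by d2; a single sample is used here only to keep the example short.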


Sample Variance (S²)

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$

Most common and reliable measure of dispersion

Ex. Data {3, 2, 9, 1, 6, 8, 2}     Ranked Data {1, 2, 2, 3, 6, 8, 9}

$$S^2 = \frac{(3-4.43)^2 + (2-4.43)^2 + \dots + (2-4.43)^2}{7-1} = 61.71/6 = 10.286$$

$$S = \sqrt{S^2} = 3.207$$

Xi     (Xi - X̄)     (Xi - X̄)²
3      -1.43        2.04
2      -2.43        5.90
9       4.57        20.88
1      -3.43        11.76
6       1.57        2.46
8       3.57        12.74
2      -2.43        5.90
Average = 4.43      Sum = 61.71

Importance of S:
• Same units as measurements
• Positive numbers that increase when variability increases

Sample variance is the unbiased and minimum variance estimate for the population variance (irrespective of the distribution type)
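The sum of squares, sample variance, and standard deviation from the worked example can be reproduced directly from the definition. This is a small sketch; `statistics.variance` is used only as a cross-check because it also divides by n - 1.

```python
import math
import statistics

data = [3, 2, 9, 1, 6, 8, 2]
n = len(data)
x_bar = sum(data) / n

sum_of_squares = sum((x - x_bar) ** 2 for x in data)  # numerator of S^2
s2 = sum_of_squares / (n - 1)                         # divide by degrees of freedom
s = math.sqrt(s2)

library_s2 = statistics.variance(data)  # cross-check, also uses n - 1

print(round(sum_of_squares, 2), round(s2, 3), round(s, 3))
```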

The sample variance is really an average of the squared deviations.

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$

Why (n-1) degrees of freedom? Only (n-1) independent deviations!

$$\sum (X_i - \bar{X}) = 0 \ \Rightarrow\ \sum X_i - n\bar{X} = 0 \ \Rightarrow\ \sum X_i - \sum X_i = 0, \qquad \text{since } \bar{X} = \sum X_i / n$$

Ex. Data {1, 2, 3, 4, 5}  ($\bar{X}$ = 3)

Xi     Dev
1      -2
2      -1
3       0
4       1
5       2
Sum Dev = 0


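Grand Average and Pooled Variance Estimates

• Subgroup averages and variances are merged into historical estimates of average and variance, used for control chart centerlines
• Grand average ($\bar{\bar{x}}$) = average of subgroup averages
• Pooled variance ($S_p^2$) = average of subgroup variances

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{m} \bar{X}_i}{m} \qquad \text{if } n \text{ is constant}$$

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{m} n_i \bar{X}_i}{\sum_{i=1}^{m} n_i} \qquad \text{if } n_i \text{ is variable (always correct)}$$

$$S_p^2 = \frac{\sum_{i=1}^{m} S_i^2}{m} \qquad \text{if } n \text{ is constant}$$

$$S_p^2 = \frac{\sum_{i=1}^{m} \nu_i S_i^2}{\sum_{i=1}^{m} \nu_i} \qquad \text{if } n_i \text{ is variable (always correct)}$$

Here m is the number of subgroups and ν_i = n_i - 1 is the degrees of freedom of subgroup i.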
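The "always correct" weighted forms can be sketched as two short functions. The subgroup summaries below are made up for illustration, and the weights for the pooled variance assume ν_i = n_i - 1.

```python
def grand_average(means, sizes):
    """Average of subgroup means, weighted by subgroup size n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def pooled_variance(variances, sizes):
    """Average of subgroup variances, weighted by degrees of freedom n_i - 1."""
    dof = [n - 1 for n in sizes]
    return sum(v * s2 for v, s2 in zip(dof, variances)) / sum(dof)

# Hypothetical subgroup summaries (not from the notes)
means = [10.0, 12.0, 11.0]
variances = [4.0, 1.0, 2.5]
sizes = [5, 3, 4]

x_grand = grand_average(means, sizes)
sp2 = pooled_variance(variances, sizes)
print(round(x_grand, 4), round(sp2, 4))
```

When every subgroup has the same size, both functions reduce to the simple averages given for constant n.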


Probability Theory

Distribution Functions

Discrete Distributions

Discrete Probability Density Function (pdf): f(x) = Pr[X = x]

Properties of the discrete pdf
1) f(x) ≥ 0: each probability is greater than or equal to 0.
2) $\sum_{\forall x} f(x) = 1$: the sum of the probabilities equals 1.0.

Discrete Cumulative Distribution Function (cdf):

$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$

Ex. Binomial (n = 5 trials, p = .2)

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad \binom{n}{x} = \frac{n!}{x!(n-x)!}$$

$f(0) = \binom{5}{0} (.2)^0 (1-.2)^5 = 0.32768$     Probability of 0 successes in 5 trials
$f(1) = \binom{5}{1} (.2)^1 (1-.2)^4 = 0.4096$      Probability of 1 success in 5 trials
$f(2) = \binom{5}{2} (.2)^2 (1-.2)^3 = 0.2048$      Probability of 2 successes in 5 trials
$f(3) = \binom{5}{3} (.2)^3 (1-.2)^2 = 0.0512$      Probability of 3 successes in 5 trials
$f(4) = \binom{5}{4} (.2)^4 (1-.2)^1 = 0.0064$      Probability of 4 successes in 5 trials
$f(5) = \binom{5}{5} (.2)^5 (1-.2)^0 = 0.00032$     Probability of 5 successes in 5 trials

f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = 1.0     Property of a pdf

F(2) = P(X ≤ 2) = f(0) + f(1) + f(2) = 0.94208
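The binomial table above can be regenerated with a short script. This is a minimal sketch using `math.comb` for the number of combinations; the function names `binom_pmf` and `binom_cdf` are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr[X = x] for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Pr[X <= x], the running sum of the pmf."""
    return sum(binom_pmf(t, n, p) for t in range(x + 1))

pmf = [binom_pmf(x, 5, 0.2) for x in range(6)]
total = sum(pmf)             # should equal 1 (property of a pdf)
f_2 = binom_cdf(2, 5, 0.2)   # F(2) = f(0) + f(1) + f(2)

for x, fx in enumerate(pmf):
    print(x, round(fx, 5))
```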


Joint Probability of Multiple Events

P = probability of success
F = probability of failure = 1 - P

Pr[success on 1st trial] = P
Pr[success on 1st trial "and" success on 2nd trial] = P*P
Pr[success on 1st trial "and" success on 2nd trial "and" failure on 3rd trial] = P*P*F = P²(1-P)

Therefore: Pr[x successes in n trials, in one particular order] = $P^x (1-P)^{n-x}$

Combinations:

$$\text{Number of combinations} = \binom{n}{x} = \frac{n!}{x!(n-x)!}$$

Number of combinations of 1 success in 5 trials: $\binom{5}{1} = \frac{5!}{1!(5-1)!} = 5$

1) S F F F F
2) F S F F F
3) F F S F F
4) F F F S F
5) F F F F S

Number of combinations of 2 successes in 5 trials: $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5 \times 4}{2!} = 10$

1)  S S F F F
2)  S F S F F
3)  S F F S F
4)  S F F F S
5)  F S S F F
6)  F S F S F
7)  F S F F S
8)  F F S S F
9)  F F S F S
10) F F F S S
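The listings of success/failure sequences can be generated rather than written out by hand. The sketch below, with a hypothetical helper `sequences`, enumerates the positions of the successes using `itertools.combinations` and confirms that the counts match the combination formula.

```python
from itertools import combinations
from math import comb

def sequences(n, x):
    """All length-n strings of S/F that contain exactly x successes."""
    result = []
    for positions in combinations(range(n), x):
        result.append("".join("S" if i in positions else "F" for i in range(n)))
    return result

one_success = sequences(5, 1)
two_successes = sequences(5, 2)

print(len(one_success), one_success)
print(len(two_successes), two_successes)
```

Each sequence has the same probability, so multiplying the count by the probability of one sequence gives the binomial probability of x successes in n trials.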

Continuous Distributions

Continuous Probability Density Function (pdf): f(x) does not equal a probability

Properties of the continuous pdf
1) f(x) ≥ 0: the function is nonnegative over the entire region of X
2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$: the total area under the curve equals 1.0 (probability)

Cumulative Distribution Function (cdf):

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

F(x) = area under f(x) to the left of x

Ex. f(x) = 2x  (0 ≤ x ≤ 1)

$$\int_{0}^{1} 2x\,dx = x^2 \Big|_{0}^{1} = 1^2 - 0^2 = 1$$

The area under the curve equals 1, thus proving it is a pdf.

$$F(x) = \int_{0}^{x} 2t\,dt = t^2 \Big|_{0}^{x} = x^2$$

F(.5) = 0.5² = 0.25 = Pr[X ≤ .5]

Probabilities and Percentage Points (Variates) from Common Distributions

Tables or functions exist for common distributions such as Z, t, F, and chi-squared to:
• determine the lower tail probability for a given value of x
• determine the value of x based on the lower tail probability

Area between two limits

$$\Pr[a < X < b] = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$

= Pr[Conformance] if a = LSL and b = USL
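For the Z distribution, both lookups are available in Python's standard library. The sketch below assumes `statistics.NormalDist` (Python 3.8 or later) as the "function" in place of a printed table; the limits -1 and 2 are arbitrary illustration values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

lower_tail = z.cdf(1.2)              # lower tail probability for x = 1.2
percentage_point = z.inv_cdf(0.88)   # x whose lower tail probability is 0.88

# Area between two limits: F(b) - F(a), with arbitrary limits a = -1, b = 2
a, b = -1.0, 2.0
area = z.cdf(b) - z.cdf(a)

print(round(lower_tail, 4), round(percentage_point, 3), round(area, 4))
```

The standard library has no equivalent for t, F, or chi-squared, so those distributions still require tables or a third-party package.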


Expectations

Discrete Distributions

Let the possible values (sample space) for X be denoted by x1, x2, ..., xn and f(xi) = Pr[X = xi]

$$E[X] = \sum_{i=1}^{n} x_i f(x_i)$$

$$E[X^2] = \sum_{i=1}^{n} x_i^2 f(x_i)$$

$$E[u(X)] = \sum_{i=1}^{n} u(x_i) f(x_i)$$

Ex. Binomial (n=5, p=.2)

E[X] = 0 (0.32768) + 1 (0.4096) + 2 (0.2048) + 3 (0.0512) + 4 (0.0064) + 5 (0.00032) = 1.0 = np   *binomial property

E[X²] = 0² (0.32768) + 1² (0.4096) + 2² (0.2048) + 3² (0.0512) + 4² (0.0064) + 5² (0.00032) = 1.8

Consideration: What if f(xi) was a constant ∀ i? (ex. 1/n)

$$E[X] = \sum_{i=1}^{n} x_i (1/n) = \frac{\sum_{i=1}^{n} x_i}{n} = \text{sample average}$$

The sample average puts an equal weighting on all observations.

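Continuous Distributions

$$E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx$$

$$E[u(X)] = \int_{-\infty}^{+\infty} u(x) f(x)\,dx$$

Ex. f(x) = 2x  (0 ≤ x ≤ 1)

$$E[X] = \int_{0}^{1} x \cdot 2x\,dx = \int_{0}^{1} 2x^2\,dx = \tfrac{2}{3} x^3 \Big|_{0}^{1} = 2/3$$

$$E[X^2] = \int_{0}^{1} x^2 \cdot 2x\,dx = \int_{0}^{1} 2x^3\,dx = \tfrac{2}{4} x^4 \Big|_{0}^{1} = 2/4$$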


Variance is an Expectation

VAR[X] = E[(X - E[X])²]      where E[X] = µ
VAR[X] = E[X²] - {E[X]}²     when factored out
VAR[X] = "Expected value of the product minus the product of the expected values"

Ex. Binomial (n=5, p=.2)

VAR[X] = (1.8) - {1}² = 0.8 = npq = np(1-p)   *binomial property

Ex. f(x) = 2x  (0 ≤ x ≤ 1)

VAR[X] = (2/4) - {2/3}² = 0.055555
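Both variance examples can be checked with one script. This is a sketch under two assumptions: the discrete expectations are plain sums over the binomial pmf, and the continuous integrals are approximated with a midpoint rule (the hypothetical helper `expect_continuous`) rather than solved analytically.

```python
from math import comb

# Discrete case: Binomial(n=5, p=0.2)
n, p = 5, 0.2
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
ex_binom = sum(x * fx for x, fx in pmf.items())        # E[X]
ex2_binom = sum(x**2 * fx for x, fx in pmf.items())    # E[X^2]
var_binom = ex2_binom - ex_binom**2                    # E[X^2] - E[X]^2

def expect_continuous(u, f, a, b, steps=100_000):
    """Approximate E[u(X)] = integral of u(x) f(x) dx on [a, b] (midpoint rule)."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += u(x) * f(x)
    return total * h

# Continuous case: f(x) = 2x on 0 <= x <= 1
f = lambda x: 2 * x
ex_cont = expect_continuous(lambda x: x, f, 0.0, 1.0)
ex2_cont = expect_continuous(lambda x: x * x, f, 0.0, 1.0)
var_cont = ex2_cont - ex_cont**2

print(round(var_binom, 4), round(var_cont, 6))
```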


Example: Discrete Expected Value

Daily sales records for a computer manufacturing firm show that it will sell 0, 1, or 2 mainframe computer systems with probabilities as listed.

Number of sales (x)     0       1       2
Probability f(x)        0.7     0.2     0.1

A) Find the expected value and standard deviation of daily sales.

Expected value of daily sales:

E(x) = Σ x f(x) = (0)(0.7) + (1)(0.2) + (2)(0.1) = 0.4 mainframe computers   (sum over the 3 possible values)

Standard deviation (σx) of daily sales:

σx² = E[x²] - E[x]²

E[x²] = Σ x² f(x) = (0²)(0.7) + (1²)(0.2) + (2²)(0.1) = 0.6

σx² = 0.6 - (0.4)² = 0.44  ⇒  σx = (0.44)^(1/2) = 0.6633

B) The firm's daily fixed cost is $30,000 and their marginal cost is $200,000 (cost per unit). If a mainframe system sells for $500,000, what is the expected daily profit?

Daily profit = Revenues - costs

Fixed daily cost = $30,000; Cost per unit = $200,000; Revenue per unit = $500,000

Daily profit = (revenue per unit)(expected value sold) - fixed daily cost - (cost per unit)(expected value sold)
             = (500000)(0.4) - (30000) - (200000)(0.4) = $90,000 per day
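The daily sales example reduces to a few weighted sums. The sketch below stores the probabilities in a dictionary (a representation chosen here for convenience) and recomputes the expected value, standard deviation, and expected profit.

```python
import math

pmf = {0: 0.7, 1: 0.2, 2: 0.1}  # number of sales -> probability

mean_sales = sum(x * p for x, p in pmf.items())                    # E[x]
var_sales = sum(x**2 * p for x, p in pmf.items()) - mean_sales**2  # E[x^2] - E[x]^2
sd_sales = math.sqrt(var_sales)

fixed_cost, unit_cost, unit_price = 30_000, 200_000, 500_000
expected_profit = unit_price * mean_sales - fixed_cost - unit_cost * mean_sales

print(round(mean_sales, 2), round(sd_sales, 4), round(expected_profit))
```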

Example: Continuous Expected Value

The outside diameter of washers is a continuous random variable, x, distributed uniformly from 300 to 320 mm. Calculate:

A) f(x)

Let x = outside diameter. This is a uniform distribution ⇒ f(x) = c, a constant.

For a pdf:

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1 \ \Rightarrow\ \int_{300}^{320} c\,dx = 1 \ \Rightarrow\ c(320 - 300) = 1 \ \Rightarrow\ c = 1/20$$

Therefore: f(x) = 1/20 for 300 < x < 320; f(x) = 0 elsewhere

B) E[x]

$$E[x] = \int_{300}^{320} x f(x)\,dx = \int_{300}^{320} \frac{x}{20}\,dx = \frac{1}{40} x^2 \Big|_{300}^{320} = \frac{1}{40}(102400 - 90000) = 310 \text{ mm}$$

C) VAR[x]

VAR[x] = E[x²] - E[x]²

$$E[x^2] = \int_{300}^{320} x^2 f(x)\,dx = \int_{300}^{320} \frac{x^2}{20}\,dx = \frac{1}{60} x^3 \Big|_{300}^{320} = 96133.33$$

Therefore: Var[x] = 96133.33 - (310)² = 33.33
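The washer calculation can be verified by evaluating the antiderivatives at the limits. This is a small sketch with illustrative variable names; the last line compares the result with the standard closed form (b - a)²/12 for a uniform variance, which is not stated in the notes.

```python
a, b = 300.0, 320.0

c = 1 / (b - a)                 # height of the uniform pdf, 1/20
mean = c * (b**2 - a**2) / 2    # integral of x * c from a to b
ex2 = c * (b**3 - a**3) / 3     # integral of x^2 * c from a to b
var = ex2 - mean**2

closed_form_var = (b - a) ** 2 / 12  # standard uniform-variance formula (assumed)

print(c, mean, round(ex2, 2), round(var, 2))
```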

Median and Mode

Median - value of the 50th percentile

[Figure: cdf F(x) versus X, with Md marked where F(x) = 0.5]

Mode - value with the largest f(x) ⇒ value of X where the derivative of f(x) equals 0

[Figure: pdf f(x) versus X, with Mo marked at the peak]

Specific Distributions

Discrete

Binomial

X = Nn = number of successes in n trials

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = \{0, 1, \dots, n\}$$

$$F(x) = \sum_{t=0}^{x} \binom{n}{t} p^t (1-p)^{n-t}$$

E[X] = np = µ          VAR[X] = npq = σ²

Example: The probability that a piece of luggage will survive the stress test is 0.65. If six bags are randomly tested:

A) What is the probability that exactly four will survive?

Given: P(luggage survives) = 0.65, P(luggage fails) = 0.35
Exactly 4 bags survive; let x = number that survive. This is binomial ⇒ p = 0.65, q = 0.35, n = 6

$$P(x = 4) = P(4) = \binom{6}{4} p^4 q^2 = \binom{6}{4} (0.65)^4 (0.35)^2 = (15)(0.1785)(0.1225) = 0.3280$$

B) Given that the 1st and 2nd bags survived, what is the probability that the 3rd and 4th bags will fail?

Note here that the trials are independent, and that trials 1 and 2 already occurred, so the probability of their occurrence = 1. Let x = number that survive ⇒ p = 0.65, q = 0.35, n = 2

$$P(x = 0) = P(0) = \binom{2}{0} p^0 q^2 = \binom{2}{0} (0.65)^0 (0.35)^2 = 0.1225$$
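The two luggage probabilities follow directly from the binomial formula. A minimal sketch, with variable names chosen for this illustration:

```python
from math import comb

p, q = 0.65, 0.35  # survive, fail

# A) exactly 4 of 6 bags survive
p_four_survive = comb(6, 4) * p**4 * q**2

# B) bags 3 and 4 both fail (0 survivors in the 2 remaining independent trials)
p_next_two_fail = comb(2, 0) * p**0 * q**2

print(round(p_four_survive, 4), round(p_next_two_fail, 4))
```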

Poisson

X = N(t) = number of arrivals occurring in a given time interval

$$f(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \qquad x = \{0, 1, \dots, \infty\}$$

$$F(x) = \sum_{i=0}^{x} \frac{e^{-\lambda t} (\lambda t)^i}{i!}$$

E[X] = λt = µ          VAR[X] = λt = σ²

Example: The manufacturing defect rate of a product is 0.005 defects per unit. What is the probability of zero defects occurring in 100 units?

λt = 0.005 DPU * 100 units = 0.5

$$f(0) = \frac{e^{-0.5} (0.5)^0}{0!} = 0.60653$$
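The Poisson pdf and cdf translate almost literally into code. In this sketch the parameter `rate` stands for the product λt, and the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, rate):
    """Pr[X = x] where rate = lambda * t."""
    return exp(-rate) * rate**x / factorial(x)

def poisson_cdf(x, rate):
    """Pr[X <= x], the running sum of the pmf."""
    return sum(poisson_pmf(i, rate) for i in range(x + 1))

rate = 0.005 * 100             # defects per unit times number of units
p_zero = poisson_pmf(0, rate)  # probability of zero defects in 100 units
p_at_most_one = poisson_cdf(1, rate)

print(round(p_zero, 5), round(p_at_most_one, 5))
```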

Continuous

Normal

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2} \frac{(x-\mu)^2}{\sigma^2}\right], \qquad -\infty \le x \le \infty$$

$$F(x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{which is estimated numerically.}$$

Since an infinite number of mean-variance combinations exist, a standardized variable was developed.

Standard Normal Transformation

$$Z = \frac{X - \mu}{\sigma}$$

Transforms all the observations of any normal random variable X to a new set of observations of a standard normal variable Z.

E[X] = µ        E[Z] = 0
VAR[X] = σ²     VAR[Z] = 1

Proof:

$$E[Z] = \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0$$

$$VAR[Z] = VAR\left[\frac{X - \mu}{\sigma}\right] = VAR\left[\frac{X}{\sigma}\right] = \frac{1}{\sigma^2} VAR[X] = \frac{\sigma^2}{\sigma^2} = 1$$

Corollary: VAR[cX] = c² VAR[X]

X ~ N(µ, σ²) ⇒ Z ~ N(0,1)

Importance: a single table of Z probabilities can be used for all combinations of (µ, σ²).

FYI: S² is an unbiased estimate of σ², but S = √S² is a biased estimate of σ. Some authors espouse using a C4 index to compensate for the bias induced by taking the square root. The problem with the C4 index is that VAR[Z] no longer equals 1 as described above.


Two Types of Normal Distribution Problems

1) 3 Knowns ⇒ Transform to Z ⇒ Find corresponding probability

Example: Given a normal distribution with µ = 50 and σ = 10, find the probability that X falls within its specification limits of 45 and 62.

$$\Pr[45 \le X \le 62] = \Pr\left[\frac{45 - 50}{10} \le \frac{X - \mu}{\sigma} \le \frac{62 - 50}{10}\right] = \Pr[-0.5 \le Z \le 1.2]$$

Pr[Z ≤ 1.2] = 0.8849
Pr[Z < -0.5] = 0.3085
Pr[-0.5 ≤ Z ≤ 1.2] = 0.8849 - 0.3085 = 0.5764

[Figure: 45, 50, 62 in X space correspond to -0.5, 0, 1.2 in Z space]

2) Known probability ⇒ Find corresponding Z value ⇒ Solve for 1 unknown given 2 knowns

Example: On an examination, the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades follow a normal distribution, what is the lowest possible A?

Pr[Z < z] = 0.88 ⇒ z = 1.175

$$1.175 = \frac{X - 74}{7} \ \Rightarrow\ X = 82.225$$

[Figure: 74 and 82.225 in X space correspond to 0 and 1.175 in Z space]
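Both problem types map onto one function call each when a normal cdf and its inverse are available. The sketch below assumes `statistics.NormalDist` from the Python standard library in place of the Z table; small differences from the hand results come from table rounding.

```python
from statistics import NormalDist

# Type 1: known mu, sigma, and limits -> find the probability
x = NormalDist(mu=50, sigma=10)
p_within_spec = x.cdf(62) - x.cdf(45)

# Type 2: known probability -> find the corresponding value
grades = NormalDist(mu=74, sigma=7)
lowest_a = grades.inv_cdf(1 - 0.12)  # grade with 88% of the class below it

print(round(p_within_spec, 4), round(lowest_a, 3))
```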

Two Types of Sampling Normal Distribution Problems

1) 4 Knowns ⇒ Transform to Z ⇒ Find corresponding probability

Example: Given a normal distribution with µ = 50, σ = 10, and a sample size of n = 40, find the probability that $\bar{X}$ falls within its control limits of 47 and 54.

$$\Pr[47 \le \bar{X} \le 54] = \Pr\left[\frac{47 - 50}{10/\sqrt{40}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{54 - 50}{10/\sqrt{40}}\right] = \Pr[-1.90 \le Z \le 2.53]$$

Pr[Z ≤ 2.53] = 0.9943
Pr[Z < -1.90] = 0.0287
Pr[-1.90 ≤ Z ≤ 2.53] = 0.9943 - 0.0287 = 0.9656

2) Known probability ⇒ Find corresponding Z value ⇒ Solve for 1 unknown given 3 knowns

Example: A drilling operation produces holes with diameters that are approximately normally distributed. If the process mean and variance are 2.1 and 0.0225, respectively, what should be the sample size to ensure that no more than 14% of the sample means will be greater than 2.15?

This is normally distributed, where µ = 2.1 and σ² = 0.0225 (σ = 0.15). We want to find n.

Given: P($\bar{x}$ > 2.15) ≤ 0.14 ⇒ P($\bar{x}$ < 2.15) ≥ 0.86
Transform to Z ⇒ P(Z < Z*) ≥ 0.86
Look up Z-value in the table ⇒ Z* ≥ 1.08

$$Z^* = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.15 - 2.1}{\sqrt{0.0225}/\sqrt{n}} = \frac{(0.05)\sqrt{n}}{0.15}$$

Solve for n: $\sqrt{n} \ge (1.08)(0.15)/(0.05) = 3.24$ ⇒ n ≥ 10.50

Therefore: n ≥ 11 (Need a whole number sample.)
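The sampling versions of the two problems differ only in that the standard deviation of the sample mean, σ/√n, replaces σ. This sketch again assumes `statistics.NormalDist` as the source of Z probabilities, so the results differ slightly from values read off a two-decimal table.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Type 1: probability that the sample mean falls between 47 and 54
mu, sigma, n = 50, 10, 40
x_bar = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean
p_in_control = x_bar.cdf(54) - x_bar.cdf(47)

# Type 2: smallest n so that at most 14% of sample means exceed 2.15
mu_d, sigma_d = 2.1, sqrt(0.0225)
z_star = NormalDist().inv_cdf(0.86)
n_min = ceil((z_star * sigma_d / (2.15 - mu_d)) ** 2)  # round up to a whole sample

print(round(p_in_control, 4), round(z_star, 3), n_min)
```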


Assumptions of the standard normal distribution
1) X is normally distributed
2) µ is known with certainty
3) σ² is known with certainty
4) observations (xi) are independently and identically distributed (i.i.d.)

When the population variance is known the Z distribution is used.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

When the population variance is unknown, there is uncertainty in the estimate of σ². Therefore, a wider distribution was developed to account for this uncertainty.

t Distribution:

$$t = \frac{\bar{X} - \mu}{\sqrt{S_p^2/n}}, \qquad S_p^2 = \text{pooled variance}$$

Probabilities and percentage points can be obtained from a t table. E[t] = 0 and VAR[t] = ν/(ν - 2) for ν > 2, which approaches 1 as the degrees of freedom ν increase.

Example: The outside diameter of washers follows a normal distribution with a mean of 1.20 inches. A sample of 9 washers results in a sample standard deviation of 0.03". Calculate the probability that a sample mean will lie between 1.18140 and 1.22306.

The population is normally distributed, where µ = 1.20", s = 0.03", and n = 9 (ν = 8).

P(1.18140 < $\bar{X}$ < 1.22306) = P(t1 < t < t2)

Use the transformation:

$$t_1 = \frac{1.18140 - 1.20}{0.03/\sqrt{9}} = -1.86, \qquad t_2 = \frac{1.22306 - 1.20}{0.03/\sqrt{9}} = 2.306$$

Therefore: P(-1.86 < t < 2.306) = P(t < 2.306) - P(t < -1.86) = 0.975 - 0.05 = 0.925 (Look up values.)
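The Python standard library has no t distribution, so this sketch integrates the Student t density numerically with Simpson's rule instead of using a table. The density formula is the standard one for ν degrees of freedom; the helper names `t_pdf` and `t_prob_between` are assumptions for this illustration, and s = 0.03 with n = 9 is taken from the washer example.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Student t density with nu degrees of freedom."""
    coef = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_prob_between(lo, hi, nu, steps=2000):
    """Pr[lo < t < hi] by Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    total = t_pdf(lo, nu) + t_pdf(hi, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(lo + k * h, nu)
    return total * h / 3

mu, s, n = 1.20, 0.03, 9
std_err = s / sqrt(n)            # estimated standard deviation of the sample mean
t1 = (1.18140 - mu) / std_err
t2 = (1.22306 - mu) / std_err
prob = t_prob_between(t1, t2, nu=n - 1)

print(round(t1, 3), round(t2, 3), round(prob, 3))
```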


2.1 Types of Inferences
- Gather some knowledge concerning the population using data

Considerations:
1) Are the samples representative of the population? (Sampling)
2) How do we make inferences about the population parameters?
3) How reliable are these inferences?

Sampling
- In order to obtain valid inferences of the population, we must obtain samples that are representative of the population.

Random Sample
- observations are made independently (x1, x2, ..., xn) and randomly
- each value (xi) came from distributions having the same pdf {f(x)}
- i.i.d.: independently and identically distributed

Importance
- Joint probability equals the product of the marginal probabilities.
- COV[X1,X2] = 0, so the variance of the sum equals the sum of the variances
- Rational sample


Hypothesis Testing

Make a hypothesis (assumption) about the population parameter of interest

H0: Null hypothesis                                  ex. H0: µ = 4
HA: Alternative hypothesis (complement of H0)        ex. HA: µ ≠ 4

[Figure: reference distribution centered at µ0 with a rejection region of area α/2 in each tail]

Two Conclusions:
1) Reject H0
2) Cannot reject H0
   - Can never "accept" because we don't know what the true parameter really is; however, we can conclude that it is not some value.


Hypothesis Testing of the Mean

Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Critical Values (define rejection regions): Zcrit = Z(α/2) and Z(1-α/2)

Compute test statistic (Zcalc): where does the observed value fall with respect to the assumed reference distribution?

Rejection Criterion: Given a mean of µ0, (1-α) of the values will fall between -Zcrit and Zcrit. If the calculated statistic (Zcalc) falls in the rejection regions, then we conclude with (1-α) confidence that this sample did not come from a population with mean µ0, i.e. µ ≠ µ0.

Possible Situations:

                     H0 is True           H0 is False
Cannot Reject H0     Correct Decision     Type II Error
Reject H0            Type I Error         Correct Decision

Type I Error - "Wrongful rejection" - rejection of null hypothesis when it is true
Pr[Type I Error] = α

Type II Error - "Wrongful acceptance" - "acceptance" of the null hypothesis when it is false
Pr[Type II Error] = β

Pr[Rejection] = α when null hypothesis is true
Pr[Rejection] = 1 - β = "power" when null hypothesis is false


Hypothesis Testing of the Variance

Test Statistic:

$$\chi^2 = \frac{(n-1) S^2}{\sigma^2}$$

Example: Hypothesis Test Using a Z Distribution

For a random sample of 50 measurements on the breaking strength of cotton threads, the mean breaking strength was found to be 210 grams and the standard deviation 18 grams.

A) The manufacturer claims that the population mean is 215 grams. State the hypothesis and solve for α = 0.10.

The claim is that µ = 215 g. This is a two-tailed test.

Null hypothesis: HO: µ = 215
Alternative hypothesis: HA: µ ≠ 215

Zcritical = Z0.10/2 = Z0.05 = 1.645 (rejection regions: Zcalc < -1.645 or Zcalc > 1.645)

$$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 215}{18/\sqrt{50}} = -1.96$$

Compare Zcalc to Zcritical: -1.96 is less than -1.645

Therefore: Reject HO that µ = 215 for α = 0.10. Manufacturer's claim is invalid.

C) Is there evidence that the population mean of breaking strength exceeds 218 grams? State the hypothesis and solve for α = 0.05.

Using α = 0.05, check if the population mean is > 218 g. This is a one-tailed test.

Null hypothesis: HO: µ ≤ 218
Alternative hypothesis: HA: µ > 218

Zcritical = Zα = Z0.05 = 1.645

$$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 218}{18/\sqrt{50}} = -3.143$$

Compare Zcalc to Zcritical: -3.143 is not greater than 1.645

Therefore: The null hypothesis, HO: µ ≤ 218, cannot be rejected at α = 0.05.
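Both parts of the cotton thread example can be recomputed from the summary statistics (n = 50, mean 210, standard deviation 18). This sketch assumes `statistics.NormalDist` for the critical values instead of a Z table, and treats the sample standard deviation as σ, as the example does.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 50, 210, 18
std_err = s / sqrt(n)

# A) two-tailed test of H0: mu = 215 at alpha = 0.10
z_a = (x_bar - 215) / std_err
z_crit_a = NormalDist().inv_cdf(1 - 0.10 / 2)
reject_a = abs(z_a) > z_crit_a

# C) upper-tailed test of H0: mu <= 218 at alpha = 0.05
z_c = (x_bar - 218) / std_err
z_crit_c = NormalDist().inv_cdf(1 - 0.05)
reject_c = z_c > z_crit_c

print(round(z_a, 3), reject_a)
print(round(z_c, 3), reject_c)
```

With these inputs the statistic for part C evaluates to about -3.14, and the null hypothesis µ ≤ 218 is not rejected.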


Example: Hypothesis Test Using a t Distribution

An auto company states that its new compact car has an average fuel economy (miles per gallon) greater than or equal to 55 mpg on the highway. Eight cars were randomly selected and driven. The results of the study were: 57, 52, 50, 49, 53, 51, 47, and 55. State the hypothesis and solve for α = 0.05.

The claim is that the average mpg ≥ 55 mpg. This is a one-tailed test with α = 0.05 and ν = (n-1) = (8-1) = 7 degrees of freedom.

Null hypothesis: HO: µ ≥ 55
Alternative hypothesis: HA: µ < 55

Since the true variance is unknown a t distribution will be used.

$\bar{X}$ = (57 + 52 + 50 + 49 + 53 + 51 + 47 + 55) / 8 = 51.75 mpg

$$S_p^2 = \frac{\sum_{i=1}^{8} (x_i - 51.75)^2}{n-1} = \frac{73.5}{7} = 10.5 \ \Rightarrow\ s = 3.24037$$

$$s_{\bar{X}} = \sqrt{S_p^2/n} = \frac{s}{\sqrt{n}} = \frac{3.24037}{\sqrt{8}} = 1.14564$$

tcritical = -t(ν, 1-α) = -t(7, 0.95) = -1.895 (from tables)

$$t_{calc} = \frac{\bar{X} - \mu}{\sqrt{S_p^2/n}} = \frac{51.75 - 55}{1.14564} = -2.8368$$

Compare tcalc to tcritical: -2.8368 < -1.895

Therefore: Reject the null hypothesis, HO: µ ≥ 55, at α = 0.05. Manufacturer's claim is invalid.
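The t statistic for the fuel economy data can be computed from the raw observations. In this sketch the critical value -1.895 is entered by hand from the t table (7 degrees of freedom, α = 0.05, lower tail), because the standard library does not provide a t distribution.

```python
from math import sqrt
from statistics import mean, variance

mpg = [57, 52, 50, 49, 53, 51, 47, 55]
n = len(mpg)

x_bar = mean(mpg)
s2 = variance(mpg)       # sample variance with n - 1 degrees of freedom
std_err = sqrt(s2 / n)   # estimated standard deviation of the sample mean

t_calc = (x_bar - 55) / std_err
t_crit = -1.895          # from the t table: 7 df, alpha = 0.05, lower tail
reject = t_calc < t_crit

print(x_bar, round(s2, 2), round(std_err, 5), round(t_calc, 4), reject)
```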


Example: Hypothesis Test Using a χ² Distribution

The same auto company as in the previous example claims that the true variance of fuel economy (mpg) is less than or equal to 5. Using the same data, state the hypothesis and solve for α = 0.01.

The company claims that σ² of car fuel economy ≤ 5. This is a one-tailed test with α = 0.01.

Null hypothesis: HO: σ² ≤ 5
Alternative hypothesis: HA: σ² > 5

From exercise 2-7, Sp² = 10.5 and ν = 7

χ²critical = χ²(ν, α) = χ²(7, .01) = 18.475 (from tables)

$$\chi^2_{calc} = \frac{(n-1) S_p^2}{\sigma^2} = \frac{(7)(10.5)}{5} = 14.7$$

Compare χ²calc to χ²critical: 14.7 is not greater than 18.475

Therefore: The null hypothesis, HO: σ² ≤ 5, cannot be rejected at α = 0.01. The data do not show the manufacturer's claim to be invalid.
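The chi-squared test statistic is a one-line calculation. In this sketch the critical value is entered by hand as the tabulated upper 1% point of the chi-squared distribution with 7 degrees of freedom (18.475), since the standard library has no chi-squared functions.

```python
n = 8
s2 = 10.5          # sample variance from the fuel economy data
sigma0_sq = 5      # hypothesized upper bound on the population variance

chi2_calc = (n - 1) * s2 / sigma0_sq
chi2_crit = 18.475  # tabulated upper 1% point, 7 degrees of freedom
reject = chi2_calc > chi2_crit

print(round(chi2_calc, 2), reject)
```

Because 14.7 does not exceed the critical value, the null hypothesis σ² ≤ 5 is not rejected at α = 0.01.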


Decision Making Using Conditional Probabilities

Example - Number of people in a small town

                        Male (M)     Female (not M)     Total
Employed (E)            50           25                 75
Unemployed (not E)      10           15                 25
Total                   60           40                 100

Marginal Probabilities
• Consider only one distribution

Pr[Employed] = 75 / 100 = 0.75
Pr[Unemployed] = 25 / 100 = 0.25
Pr[Male] = 60 / 100 = 0.6
Pr[Female] = 40 / 100 = 0.4

Joint Probabilities
• Consider more than one distribution
• Pr[A, B] = Probability that event A occurred and event B occurred

Pr[Employed, Male] = 50 / 100 = 0.50
Pr[Employed, Female] = 25 / 100 = 0.25
Pr[Unemployed, Male] = 10 / 100 = 0.10
Pr[Unemployed, Female] = 15 / 100 = 0.15

Conditional Probabilities
• Pr[A | B] = Probability that event A will occur given that event B has already occurred

Pr[Employed | Male] = 50 / 60 = 0.833
Pr[Unemployed | Male] = 10 / 60 = 0.167
Pr[Employed | Female] = 25 / 40 = 0.625
Pr[Unemployed | Female] = 15 / 40 = 0.375
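The marginal, joint, and conditional probabilities all come from the same table of counts. The sketch below stores the counts in a dictionary keyed by (status, sex), a layout chosen only for this illustration, and computes each conditional probability as joint divided by marginal.

```python
counts = {
    ("Employed", "Male"): 50,
    ("Employed", "Female"): 25,
    ("Unemployed", "Male"): 10,
    ("Unemployed", "Female"): 15,
}
total = sum(counts.values())

def marginal_sex(sex):
    """Pr[sex], summing over employment status."""
    return sum(c for (_, s), c in counts.items() if s == sex) / total

def joint(status, sex):
    """Pr[status, sex]."""
    return counts[(status, sex)] / total

def conditional(status, sex):
    """Pr[status | sex] = Pr[status, sex] / Pr[sex]."""
    return joint(status, sex) / marginal_sex(sex)

for sex in ("Male", "Female"):
    for status in ("Employed", "Unemployed"):
        print(status, "|", sex, round(conditional(status, sex), 3))
```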

