05 : Sampling Distributions and Methods of Estimation
05. Sampling and Sampling Distributions
We are often interested in calculating some properties (called parameters) of a population. For a very large population, the exact calculation of a parameter is typically prohibitive. A more economical and sensible approach is to take a random sample from the population of interest, calculate a statistic related to the parameter of interest, and then make an inference about the parameter based on the value of the statistic. This is called statistical inference. The distribution of a statistic is called a sampling distribution. The sampling distribution helps us understand how close a statistic is to its corresponding population parameter. Typical parameters of interest include:
Mean
Proportion
Variance
The standard statistic used to make inferences about the population mean is the sample mean.
Definitions:
(1) A population is defined as the aggregate of all individuals, elements, or objects under consideration. These are called statistical units.
Examples
(i). The manager of an automobile agency is interested in the fuel economy of the Suzuki cars in the company's fleet. Here the population consists of all Suzuki cars in the fleet. The elements of the population are the individual cars.
(ii). A quality assurance manager wishes to obtain information about the quality level of the firm's process for manufacturing light bulbs. Here the population consists of all the bulbs that could be produced by the process. The elements of the population are the individual electric bulbs.
(2) A population containing a finite or fixed number of elements is called a finite population; otherwise it is infinite.
(3) A population which consists of concrete objects is called an existent population; otherwise it is called a hypothetical population.
(4) A small part of a population is called a sample.
(5) The technique of selecting a representative sample is called sampling. We'll discuss sampling later.
(6) The sampling distribution of the mean is the probability distribution, or the relative frequency distribution, of the means $\bar{X}$ of all possible random samples of the same size that could be selected from a given population. The mean of this distribution is represented by $\mu_{\bar{x}}$ and the standard deviation, which is called the standard error of the mean, by $\sigma_{\bar{x}}$.
(7) Sampling is said to be with replacement if the selected unit is returned to the population before the next unit is selected. Thus a sampling unit can be selected more than once.
Case (i) when sampling is without replacement from a finite population:
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
Muhammad Naeem Sandhu, Assistant Professor, Department of Mathematics, University of Engineering and Technology, Lahore
where $\sqrt{\frac{N-n}{N-1}}$ is called the 'finite population multiplier' or the 'finite population correction factor'. If the sampling fraction $n/N$ is less than 0.05, then the finite population multiplier need not be used. For a large N this factor approaches 1 and hence can be ignored. The usual rule of thumb is to consider N large enough if it is at least 20 times larger than n.
Case (ii) when the sampling is with replacement or from an infinite population:
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
(8) An element of a sample is called a sample unit. A complete list of all possible sampling units is called a sampling frame.
(9) Numerical values computed from the population are called parameters, for example the population mean $\mu$ and the population standard deviation $\sigma$.
(10) Numerical values computed from a sample are called statistics. A statistic varies from sample to sample drawn from the same population. Examples are the sample mean $\bar{X}$ and the sample standard deviation S.
(11) The difference between a parameter and the corresponding statistic that arises because only a small sample is observed is called sampling error. It can be reduced by increasing the sample size to a sufficient level.
$$\text{sampling error} = \bar{X} - \mu$$
(12) Non-sampling errors are those which arise from a defective sampling frame or from information not being provided correctly. For example, income, sales, production, age, etc. are not reported correctly in most cases.
(13) Bias is a cumulative component of error which arises from defective selection of the sample or negligence of the investigator. Errors due to bias increase with an increase in the size of the sample.
(14) A population in which every sampling unit has similar characteristics and an equal chance of selection in the sample is called a homogeneous population.
Definition (Sampling) Sampling techniques are used to estimate population parameters on the basis of sample measures, called statistics. The quantities usually inferred are the mean, variance, standard deviation, skewness, kurtosis, etc. That is why we discuss sampling distributions here as an application of these inferences.
Sampling methods
(1) Probability Sampling
When each unit in the population has a known non-zero (not necessarily equal) probability of being included in the sample, the sampling is said to be probability sampling. It is also called random sampling, e.g. simple random sampling, stratified sampling, systematic sampling, cluster sampling, etc.
(2) Non-probability Sampling
Non-probability sampling is a process in which personal judgment determines which units of the population are selected for the sample. It is also called non-random or judgment sampling.
Types of Sampling
Random or Probability Sampling
Non-random or Judgment Sampling
In probability sampling or random sampling, all the items in the population have a chance of being chosen in the sample. In judgment sampling, personal knowledge and opinion are used to identify the items from the population that are to be included in the sample. Sometimes a judgment sample is used as a pilot or trial sample to decide how to take a random sample later. Rigorous statistical analysis can be done only with probability samples.
Types of Random Sampling
(i) Simple Random Sampling
Goldfish Bowl Procedure: In this procedure each unit of the population is allotted a different serial number from 1 to N, and each number is recorded on a card or on a slip of paper. Place these numbered cards or folded slips of paper in a bowl or a basket and mix them thoroughly. Then blindly draw out the desired number of cards or folded slips of paper, one by one, for the sample.
Using a Random Number Table: Assign a number from 1 to N to each of the N units in the population. Consult a random number table and read digits from it vertically, horizontally or diagonally, in groups of two, three or more according to the largest number assigned to a unit in the population. The units whose numbers are read make up the sample; numbers larger than N are skipped.
(ii) Systematic Sampling
A sample of size n is defined to be a systematic random sample if it is obtained by choosing one unit at random from the first k units and thereafter selecting every kth unit, after the N units in the population have been serially numbered from 1 to N or arranged in a systematic way.
(iii) Stratified Sampling
A sample of size n is defined to be a stratified random sample if it is selected from a population which has been divided into a number of non-overlapping groups called strata, such that a part of the sample is drawn at random from each stratum.
(iv) Cluster Sampling
A random sample is said to be a cluster sample if it is obtained by first selecting at random groups of individual units, called clusters, into which the population can be divided, and then including in the sample either all the units from each of the chosen clusters or a random sample of the units which each chosen cluster comprises.
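The four random sampling schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 20 numbered units, the two strata, the five clusters, and all function names are hypothetical and are not part of the notes.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n distinct units, each subset being equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return units[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from every stratum."""
    sample = []
    for stratum in strata.values():
        sample.extend(rng.sample(stratum, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep all of their units."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)                      # fixed seed for repeatability
population = list(range(1, 21))             # units numbered 1..N with N = 20
strata = {"low": population[:10], "high": population[10:]}
clusters = {c: population[4 * c:4 * c + 4] for c in range(5)}

srs = simple_random_sample(population, 5, rng)
sys_sample = systematic_sample(population, 4, rng)
strat = stratified_sample(strata, 2, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_sample, strat, clus, sep="\n")
```

Here the cluster sample keeps every unit of each chosen cluster; drawing a further random sample inside each chosen cluster would give the two-stage variant described in the definition.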
Sampling Distribution
A frequency distribution of the means of all possible samples is called the sampling distribution of the mean.
Explanation: Suppose we draw samples from a normally distributed population with a mean of 100 and a standard deviation of 25. We draw samples of 5 items each and calculate their means.
[Figure: relationship between the population distribution and the sampling distribution of the mean for a normal population]
Suppose we increase our sample size from 5 to 20. This would increase the effect of averaging in each sample, and we would expect even less dispersion among the sample means.
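The shrinking dispersion can be checked numerically. The sketch below compares the theoretical standard error $\sigma/\sqrt{n}$ with the spread of simulated sample means for n = 5 and n = 20, using the population mean 100 and standard deviation 25 from the explanation. The number of repetitions, the seed, and the function names are assumptions made for illustration.

```python
import math
import random
import statistics

MU, SIGMA = 100, 25          # population mean and standard deviation

def standard_error(sigma, n):
    """Theoretical standard deviation of the sample mean."""
    return sigma / math.sqrt(n)

def simulated_sd_of_means(n, reps, rng):
    """Draw `reps` samples of size n and return the spread of their means."""
    means = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

rng = random.Random(1)       # fixed seed, chosen arbitrarily
se5, se20 = standard_error(SIGMA, 5), standard_error(SIGMA, 20)
sim5 = simulated_sd_of_means(5, 2000, rng)
sim20 = simulated_sd_of_means(20, 2000, rng)
print(f"n=5 : theory {se5:.2f}, simulated {sim5:.2f}")
print(f"n=20: theory {se20:.2f}, simulated {sim20:.2f}")
```

Quadrupling the sample size halves the standard error, which is why the sample means for n = 20 cluster more tightly around 100.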
Examples (1) Consider the data concerning the experience of five motorcycle owners with the life of their tires.
Owner        Tire life (in months)
Carl          3
Debbie        3
Elizabeth     7
Frank         9
George       14
Total        36
Because only five people are involved, the population is too small to be approximated by a normal distribution. We will take all of the possible samples of the owners in groups of three. Compute the sample means $\bar{X}$, list them, and compute the mean of the sampling distribution, $\mu_{\bar{x}}$.
Solution The calculation of the sample means of tire life with n = 3 is given below:
Sample of three   Sample data (tire lives)   Sample mean
EFG               7+9+14                     10
DFG               3+9+14                     8 2/3
DEG               3+7+14                     8
DEF               3+7+9                      6 1/3
CFG               3+9+14                     8 2/3
CEG               3+7+14                     8
CEF               3+7+9                      6 1/3
CDF               3+3+9                      5
CDE               3+3+7                      4 1/3
CDG               3+3+14                     6 2/3
Total                                        72
$\mu_{\bar{x}} = 72/10 = 7.2$
The calculations show that even when the population is not normal, the mean of the sampling distribution, $\mu_{\bar{x}}$, is still equal to the population mean $\mu$ (here 36/5 = 7.2). Comparing the two distributions, we observe that the distribution of the population is not normal, whereas the sampling distribution of the mean looks a little like a bell shape.
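The enumeration in Example (1) is small enough to verify by machine. The following sketch lists every sample of three owners and averages the sample means exactly using fractions. The owner initials follow the example; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Tire life in months, keyed by the owners' initials from Example (1)
tire_life = {"C": 3, "D": 3, "E": 7, "F": 9, "G": 14}

# All samples of three owners, drawn without replacement
samples = list(combinations(tire_life, 3))
sample_means = {
    "".join(s): Fraction(sum(tire_life[owner] for owner in s), 3)
    for s in samples
}

mean_of_means = sum(sample_means.values()) / len(sample_means)
population_mean = Fraction(sum(tire_life.values()), len(tire_life))

for name, m in sample_means.items():
    print(name, float(m))
print("mean of sample means:", float(mean_of_means))
print("population mean     :", float(population_mean))
```

Working with `Fraction` avoids rounding the thirds, so the equality between the mean of the sample means and the population mean is exact.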
As the sample size is increased, the sampling distribution of the mean looks more and more like the bell shape of the normal distribution.
Now we state the central limit theorem, which supports the arguments cited above.
Central Limit Theorem
The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.
The central limit theorem explains:
the mean of the sampling distribution of the mean will equal the population mean
the sampling distribution of the mean approaches normal as the sample size increases
It is a relationship between the shape of the population distribution and the shape of the sampling distribution of the mean.
Examples (2) A population consists of 5 numbers 2, 3, 6, 8 and 9. Consider all possible samples of size 3 that can be drawn with replacement from this population. Find (a) the mean of the population, (b) the standard deviation of the population, (c) the mean of the sampling distribution of means and (d) the standard deviation of the sampling distribution. Using software Minitab, this question may be solved.
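As an alternative to Minitab, Example (2) can be worked by brute force in Python. The sketch below enumerates all $5^3 = 125$ ordered samples drawn with replacement and compares the results with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The variable names are illustrative assumptions, not part of the notes.

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [2, 3, 6, 8, 9]
n = 3

mu = fmean(population)            # (a) population mean
sigma = pstdev(population)        # (b) population standard deviation

# Every ordered sample of size 3 drawn with replacement
sample_means = [fmean(s) for s in product(population, repeat=n)]

mu_xbar = fmean(sample_means)     # (c) mean of the sampling distribution
sigma_xbar = pstdev(sample_means) # (d) standard error of the mean

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"mu_xbar = {mu_xbar:.4f}, sigma_xbar = {sigma_xbar:.4f}")
print(f"sigma / sqrt(n) = {sigma / math.sqrt(n):.4f}")
```

The population standard deviation uses divisor N (`pstdev`) because the five numbers are the whole population, not a sample.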
06. POINT ESTIMATION AND INTERVAL ESTIMATION
The sample mean $\bar{x}$ is the best estimator of the population mean $\mu$. It is unbiased, consistent, and the most efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution can be approximated by the normal distribution, as the central limit theorem says.
Definition (Point Estimate) A point estimate of a population parameter is a single numerical value of a sample statistic.
(1) Point Estimates
(a) The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$, with
$$E(\bar{x}) = \mu$$
i.e. $\bar{x}$ is an unbiased estimate of the population mean $\mu$.
(b) $s^2$ is a point estimate of the population variance $\sigma^2$:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
i.e. an unbiased estimate of the population variance.
(c) $s^2$ computed with divisor n is also a point estimate of the population variance $\sigma^2$:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
i.e. a biased estimate of the population variance.
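The two variance estimates differ only in the divisor. A minimal sketch using Python's standard `statistics` module is given below; the five-value sample is hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, pvariance, variance

sample = [2, 3, 6, 8, 9]          # hypothetical sample values
n = len(sample)

x_bar = mean(sample)              # point estimate of the population mean
s2_unbiased = variance(sample)    # divisor n - 1
s2_biased = pvariance(sample)     # divisor n

print(f"x_bar = {x_bar}")
print(f"s^2 with divisor n-1 = {s2_unbiased:.4f}")
print(f"s^2 with divisor n   = {s2_biased:.4f}")
```

The biased estimate is always the unbiased one multiplied by $(n-1)/n$, so the gap between them shrinks as n grows.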
Examples (3) A bank calculates that its individual savings accounts are normally distributed with a mean of $2000 and a standard deviation of $600. If the bank takes random samples of 100 accounts, what is the probability that the sample mean will lie between $1900 and $2050?
Solution First we calculate the standard error of the mean (for an infinite population):
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{100}} = 60 \text{ (dollars)}$$
To determine the probability that the sample mean will lie between $1900 and $2050, we find the corresponding values $z_1$ and $z_2$ using
$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
which converts any normal random variable to a standard normal random variable.
For $\bar{x} = 1900$:
$$z_1 = \frac{1900 - 2000}{60} = -1.67$$
For $\bar{x} = 2050$:
$$z_2 = \frac{2050 - 2000}{60} = 0.83$$
Using the table, the total area between $z_1$ and $z_2$ is 0.7492, i.e.
$$P[1900 \le \bar{x} \le 2050] = P[-1.67 \le z \le 0.83] = 0.7492$$
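The table lookup can be cross-checked with the standard library's `statistics.NormalDist`. The sketch below models the sample mean as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ and uses the numbers from Example (3). The function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < x_bar < high) when x_bar ~ N(mu, sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    x_bar_dist = NormalDist(mu, standard_error)
    return x_bar_dist.cdf(high) - x_bar_dist.cdf(low)

p = prob_sample_mean_between(mu=2000, sigma=600, n=100, low=1900, high=2050)
print(f"P(1900 < x_bar < 2050) = {p:.4f}")
```

The result is close to, but not identical with, the table value 0.7492, because the table method rounds the z values to two decimals before looking up the areas.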
Examples (4) In a sample of 25 observations from a normal distribution with mean 98.6 and standard deviation 17.2 (SC 6.5)
(a) what is the standard error of the mean?
(b) what is $P[92 < \bar{x} < 102]$?
(c) Find the corresponding probability given a sample of 36.
Solution
(a) n = 25, $\mu$ = 98.6, $\sigma$ = 17.2
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{17.2}{\sqrt{25}} = 3.44 \text{ (standard error)}$$
(b)
$$P[92 < \bar{x} < 102] = P\left[\frac{92 - 98.6}{3.44} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{102 - 98.6}{3.44}\right] = P[-1.92 < z < 0.99] = 0.4726 + 0.3389 = 0.8115$$
(c) n = 36
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{17.2}{\sqrt{36}} = 2.87$$
$$P[92 < \bar{x} < 102] = P\left[\frac{92 - 98.6}{2.87} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{102 - 98.6}{2.87}\right] = P[-2.30 < z < 1.18] = 0.4893 + 0.3810 = 0.8703$$
Examples (5) Mary Bartel, an auditor for a large credit card company, knows that, on average, the monthly balance of any given customer is $112 and the standard deviation is $56. If Mary audits 50 randomly selected accounts, what is the probability that the sample average monthly balance is (AIOU p-321 SC 6.5)
(i) below $100
(ii) between $100 and $130.
Solution
n = 50, $\sigma$ = 56, $\mu$ = 112
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{56}{\sqrt{50}} = 7.92$$
(i)
$$P[\bar{x} < 100] = P\left[\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{100 - 112}{7.92}\right] = P[z < -1.52] = 0.5 - 0.4357 = 0.0643$$
(ii)
$$P[100 < \bar{x} < 130] = P\left[\frac{100 - 112}{7.92} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{130 - 112}{7.92}\right] = P[-1.52 < z < 2.27] = 0.4357 + 0.4884 = 0.9241$$
Examples (6) In a sample of 16 observations from a normal distribution with a mean of 150 and a variance of 256, what is (AIOU p-321 Prob. 6.27)
(i) $P[\bar{x} < 160]$?
(ii) $P[\bar{x} > 142]$?
If, instead of 16 observations, 9 observations are taken, find
(iii) $P[\bar{x} < 160]$?
(iv) $P[\bar{x} > 142]$?
Examples (7) From a population of 125 items with a mean of 105 and a standard deviation of 17, 64 items were chosen. (AIOU p-327 SC 6.7)
(a) what is the standard error of the mean?
(b) what is $P(107.5 < \bar{X} < 109)$?
Solution N = 125, $\mu$ = 105, $\sigma$ = 17 and n = 64
(a)
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{17}{8} \sqrt{\frac{61}{124}} = 1.4904$$
(b)
$$P(107.5 < \bar{X} < 109) = P\left(\frac{107.5 - 105}{1.4904} < \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} < \frac{109 - 105}{1.4904}\right) = P(1.68 < z < 2.68) = 0.4963 - 0.4535 = 0.0428$$
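The finite population correction is easy to wrap in a helper. The sketch below reproduces the standard error from Example (7) and shows the value without the correction for comparison. The function name and the optional argument `N` are assumptions made for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    If the population size N is given, the finite population
    multiplier sqrt((N - n) / (N - 1)) is applied (sampling
    without replacement from a finite population).
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

se_finite = standard_error(sigma=17, n=64, N=125)
se_infinite = standard_error(sigma=17, n=64)
print(f"with correction   : {se_finite:.4f}")
print(f"without correction: {se_infinite:.4f}")
```

Here n/N = 64/125 is far above 0.05, so the correction changes the standard error substantially and cannot be ignored.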
Examples (8) From a population of 75 items with a mean of 364 and a variance of 18, 32 items were randomly selected without replacement. (AIOU p-327 Prob. 6.40)
(a) What is the standard error of the mean?
(b) What is $P[363 < \bar{x} < 366]$?
(c) What would your answer to part (a) be if we sampled with replacement?
Examples (9) Given a population of size N = 80 with a mean of 8.2 and standard deviation of 3.2. What is the probability that a sample of 25 will have a mean between 21 and 23.5? (AIOU p-327 Prob. 6.41)
Examples (10) For a population of size N = 80 with a mean of 8.2 and a standard deviation of 2.1, find the S.E. of the mean for the following sample sizes: (a) n = 16, (b) n = 25, (c) n = 49 (AIOU p-327 Prob. 6.42)
Examples (11) Data on pull-off force (pounds) for connectors used in an automobile engine application are as follows: (Douglas Montgomery Ch 7 page 228)
79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7, 75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8.
(a) Calculate a point estimate of the mean pull-off force of all connectors in the population. State which estimator you used and why.
(b) Calculate point estimates of the population variance and the population standard deviation.
(c) Calculate the standard error of the point estimate found in part (a). Provide an interpretation of the standard error.
(d) Calculate a point estimate of the proportion of all connectors in the population whose pull-off force is less than 73 pounds.
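One way to obtain the point estimates asked for in Example (11) is sketched below with Python's `statistics` module. It assumes the sample mean, the variance with divisor n − 1, and the sample proportion as the estimators; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

pull_off_force = [
    79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5,
    74.0, 74.7, 75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2,
    74.9, 78.0, 75.1, 76.8,
]
n = len(pull_off_force)

x_bar = mean(pull_off_force)          # (a) point estimate of the mean
s2 = variance(pull_off_force)         # (b) variance, divisor n - 1
s = stdev(pull_off_force)             # (b) standard deviation
se = s / sqrt(n)                      # (c) estimated standard error of x_bar
p_hat = sum(x < 73 for x in pull_off_force) / n   # (d) sample proportion

print(f"n = {n}, x_bar = {x_bar:.3f}")
print(f"s^2 = {s2:.3f}, s = {s:.3f}, SE = {se:.4f}")
print(f"proportion below 73 lb = {p_hat:.4f}")
```

The standard error here is an estimate, since the unknown $\sigma$ is replaced by the sample standard deviation s.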
Examples (12) Data on oxide thickness of semiconductors are as follows: (Douglas Montgomery Ch 7 page 228)
425, 431, 416, 419, 421, 436, 418, 410, 431, 433, 423, 426, 410, 435, 436, 428, 411, 426, 409, 437, 422, 428, 413, 416.
(a) Calculate a point estimate of the mean oxide thickness for all wafers in the population.
(b) Calculate a point estimate of the standard deviation of oxide thickness for all wafers in the population.
(c) Calculate the standard error of the point estimate from part (a).
(d) Calculate a point estimate of the proportion of wafers in the population that have oxide thickness greater than 430 angstrom.
07. INTERVAL ESTIMATION
An interval estimate for a population parameter is called a confidence interval. A confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter.
Objective Construction of confidence intervals on the mean of a normal distribution using either the normal distribution or the t-distribution method
Motivation Whenever we use a mathematical approximation formula, we should try to find out how much the approximate value can at most deviate from the unknown true value. For example, suppose that in a certain case we obtain 2.47 as an approximate value from a given formula and ± 0.02 as the maximum possible deviation from the unknown exact value. Then we are sure that the values 2.47 – 0.02 = 2.45 and 2.47 + 0.02 = 2.49 include the unknown exact value. In estimating a parameter $\theta$, the corresponding problem would be the determination of two numerical quantities $\theta_1$ and $\theta_2$ that depend on the sample values and include the unknown value of the parameter with certainty. However, we already know that from a sample we cannot draw conclusions about the corresponding population that are 100% certain. So we choose a probability $1-\alpha$ close to 1 (for example $1-\alpha$ = 95%, 99%). Then we determine two quantities $\theta_1$ and $\theta_2$ such that the probability that $\theta_1$ and $\theta_2$ include the exact unknown value of the parameter is equal to $1-\alpha$, i.e.
$$P(\theta_1 \le \theta \le \theta_2) = 1 - \alpha$$
The number $1-\alpha$ is called the confidence coefficient or the confidence level. It represents the probability associated with the interval. We should choose $\alpha$ by considering the affordable risk of making a false decision. The interval $(\theta_1, \theta_2)$ is called a $100(1-\alpha)\%$ confidence interval for the unknown parameter $\theta$. If $\alpha$ = 0.05, then the probability that the interval $(\theta_1, \theta_2)$ contains $\theta$ is 0.95.
Confidence Interval of a Population Mean
To compute a confidence interval for the population mean $\mu$, we have to see whether or not
a. The population is normal
b. The population standard deviation is known
c. The sample size is small
(a) Confidence Interval on the Mean of a Normal Distribution with known $\sigma^2$
Let a sample of size n be drawn from a normal population with an unknown mean $\mu$ and known variance $\sigma^2$. Then the sampling distribution of the mean $\bar{x}$ will be normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Then
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is standard normal, regardless of how small the sample size is.
The probability that Z falls in the interval $(-Z_{\alpha/2}, Z_{\alpha/2})$ is $1-\alpha$, and the corresponding interval is:
$$-Z_{\alpha/2} \le Z \le Z_{\alpha/2}$$
$$-Z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}$$
$$-Z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{x} - \mu \le Z_{\alpha/2}\,\sigma/\sqrt{n}$$
$$-\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} \le -\mu \le -\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}$$
$$\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} \ge \mu \ge \bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n}$$
i.e.
$$\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}$$
which is the $100(1-\alpha)\%$ confidence interval for $\mu$ of a normal distribution with known $\sigma^2$.
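The interval derived above translates directly into code. The sketch below obtains $Z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf` instead of a table and applies it to values matching Example (13): variance 9, n = 100 and sample mean 5. The function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean of a normal population, sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# variance 9 means sigma = 3
low, high = z_confidence_interval(x_bar=5, sigma=3, n=100, confidence=0.95)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")

# The interval widens as n shrinks
for n in (30, 15):
    lo_n, hi_n = z_confidence_interval(5, 3, n)
    print(f"n = {n}: ({lo_n:.3f}, {hi_n:.3f})")
```

Since the half-width is proportional to $1/\sqrt{n}$, cutting the sample size from 100 to 15 makes the interval more than twice as wide.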
Examples (13) Determine a 95% confidence interval for the mean of a normal distribution with variance $\sigma^2$ = 9, using a sample of 100 values taken with replacement with mean $\bar{x}$ = 5. Repeat the example with n = 30, 15.
Examples (14) A confidence interval is constructed from a sample of size 25 taken with replacement for the mean of a normal population with $\sigma$ = 50. The limits for the interval are 110.2 and 135.8. Find the confidence coefficient (or the confidence level).
Examples (15) A sample of size n = 200 selected without replacement from a population of size N = 1000 with $\sigma$ = 1.08 showed that $\bar{x}$ = 69.2. Construct a 95% confidence interval for the true mean of the population.
Examples (16) Find a 90% confidence interval for the mean of a normal distribution with $\sigma$ = 3 given the sample (2.3, -0.2, -0.4, -0.9).
(b) Confidence Intervals for $\mu$ of a Normal Distribution with unknown $\sigma^2$
In practice, the population variance $\sigma^2$ is usually not known and is estimated from the sample data. So when the sample size n is small (n < 30) and $\sigma^2$ is replaced with its unbiased estimate
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
the statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is used, where $\nu = n-1$ is called the degrees of freedom, i.e. the number of values we can choose freely. The corresponding interval is:
$$-t_{\alpha/2}(\nu) \le t \le t_{\alpha/2}(\nu)$$
$$-t_{\alpha/2}(\nu) \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}(\nu)$$
$$-t_{\alpha/2}(\nu)\,s/\sqrt{n} \le \bar{x} - \mu \le t_{\alpha/2}(\nu)\,s/\sqrt{n}$$
$$-\bar{x} - t_{\alpha/2}(\nu)\,s/\sqrt{n} \le -\mu \le -\bar{x} + t_{\alpha/2}(\nu)\,s/\sqrt{n}$$
$$\bar{x} + t_{\alpha/2}(\nu)\,s/\sqrt{n} \ge \mu \ge \bar{x} - t_{\alpha/2}(\nu)\,s/\sqrt{n}$$
i.e.
$$\bar{x} - t_{\alpha/2}(\nu)\,s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2}(\nu)\,s/\sqrt{n}$$
which is the $100(1-\alpha)\%$ confidence interval for $\mu$ of a normal distribution with unknown $\sigma^2$.
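A small sketch of the t interval follows, applied to the five flash-point measurements of Example (17). Python's standard library has no t quantile function, so the critical value is passed in as read from a t table; $t_{0.005}(4) = 4.604$ is the tabulated value assumed here for 99% confidence with 4 degrees of freedom. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """Confidence interval for the mean, sigma unknown.

    t_crit is t_{alpha/2}(n - 1), looked up in a t table by the caller.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                 # uses divisor n - 1
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

flash_points = [144, 147, 146, 142, 144]
T_CRIT_99_DF4 = 4.604                 # table value: 99% confidence, 4 d.f.

low, high = t_confidence_interval(flash_points, T_CRIT_99_DF4)
print(f"99% CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing the critical value explicitly keeps the sketch dependency-free; with a statistics package available, the table lookup could be replaced by that package's t quantile function.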
Examples (17) Five independent measurements of the point of inflammation (flash point) of diesel oil gave the values 144, 147, 146, 142, 144. Assuming normality, determine a 99% confidence interval for the mean.
Examples (18) Find a 99% confidence interval for the mean of a normal population from the sample:
(a) 425, 420, 425, 435
(b) length of 20 bolts with sample mean 20.2 cm and sample variance 0.04 cm$^2$
(c) Knoop hardness of diamond: 9500, 9800, 9750, 9200, 9400, 9550
(d) copper content (%) of brass: 66, 66, 65, 64, 66, 67, 64, 65, 63, 64
(e) melting point ($^\circ$C) of aluminium: 660, 667, 654, 663, 662
Examples (19) A sample of size 16 from a normal population with unknown standard deviation gave $\bar{x}$ = 14.5 and s = 5. Find a 90% confidence interval for the mean.
Examples (20) For the following sample sizes and confidence levels, find the appropriate t value for constructing confidence intervals: (i) n = 28, 95% (ii) n = 8, 98% (iii) n = 13, 90% (iv) n = 10, 95% (v) n = 25, 99%
Examples (21) Seven homemakers were randomly sampled, and it was determined that the distances they walked in their housework had an average of 39.2 miles per week and a sample standard deviation of 3.2 miles per week. Construct a 95% confidence interval for the population mean.
Examples (22) Given the following sample sizes and t values used to construct confidence intervals, find the corresponding confidence level:
(i). n = 27, t = 2.056
(ii). n = 5, t = 2.132
Examples (23) For the sample size 10 and confidence level 99%, find the appropriate t value for constructing confidence intervals. Given the sample size 18 and the t value t = 2.898 used to construct confidence intervals, find the corresponding confidence level.
Practice Problems
(1) If X ~ N(80, 25), find
a. a point that has 14% area below it
b. a point that has 85.31% area above it
c. a point that has 30.5% area above it
d. two points symmetrical to the mean containing 92% area between them
(2) If X ~ N(24, 16), find
a. lower and upper quartiles
b. 37th percentile
c. median
d. mode
(3) In a sample of 36 observations taken without replacement from a normal distribution with mean 98.5 and standard deviation 16.5
(i) what is $P[85 < \bar{x} < 100]$?
(ii) Find the corresponding probability given a sample of 36.