Course Syllabus

Course Number: Stat 1
Course Title: Elementary Statistics
Course Credit: 3 units

Course Requirements:
   Prelim Exam         25%
   Midterm Exam        30%
   Final Exam          30%
   Quizzes/Homework    15%
   Total              100%
Passing Mark: 60%

Course Outline:
I. Introduction
   a) Definition of Statistics
   b) Levels of Measurement
   c) Data Collection and Sampling Techniques
II. Frequency Distribution and Graphs
   a) Frequency Distribution
   b) Histograms, Frequency Polygons and Ogives
   c) Other Types of Graphs
III. Data Description
   a) Measures of Central Tendency
   b) Measures of Variation
   c) Measures of Position
IV. Counting Techniques
   a) Tree Diagram and the Fundamental Principle of Counting
   b) Permutations
   c) Combinations
V. Probability
   a) Probability
   b) General Addition Rule
   c) Conditional Probability
   d) Bayes' Theorem
VI. Probability Distribution
   a) Probability Distribution
   b) Mean, Variance and Expectation
   c) The Binomial Distribution
VII. The Normal Distribution
   a) The Normal Distribution
   b) The Standard Normal Curve
   c) Standardizing a Normal Curve
   d) The Normal Curve as a Probability
   e) The Central Limit Theorem
VIII. Confidence Intervals
   a) Confidence Interval for the Mean when σ is known or n≥30
   b) Confidence Interval for the Mean when σ is unknown or n<30
   c) Confidence Interval for Variances and Standard Deviation
IX. Hypothesis Testing with One Sample
   a) Null and Alternative Hypothesis
   b) Outcomes and the Type I and Type II Errors
   c) Distribution Needed for Hypothesis Testing
   d) Rare Events, the Sample, Decision and Conclusion
   e) Additional Information and Full Hypothesis Test Examples
X. Hypothesis Testing with Two Samples
   a) Steps in Hypothesis Testing
   b) Test on Large Sample Mean
   c) Test on Small Sample Mean
   d) Test on Standard Deviation or Variance
References:
1. Alferez, M. & Duro, M.C. (2006). MSA Statistics and Probability. Cainta, Philippines: MSA Publishing House.
2. Caras, M. et al. (2009). Statistics and Probability: A Simplified Approach. Navotas City, Philippines: Navotas Press.
Prepared by: Cristyflor M. Escordial June 2016
Chapter 1 Introduction

I. Definition of Statistics

Plural sense: a set of measurements
Singular sense: the branch of science that deals with the collection, organization or presentation, analysis, and interpretation of data (COPAI)
Statistics has two aspects: theoretical and applied.
1. Theoretical statistics deals with the development, derivation, and proof of statistical theorems, formulas, rules, and laws.
2. Applied statistics involves the application of these theorems, rules, and laws to solve real-world problems.

To gain information, a statistician collects data for the variables used to describe an event. Data are the values that the variables can assume. Variables whose values are determined by chance are called random variables.

Two Types of Variables
1. Qualitative variable - words or codes that represent a class or category.
2. Quantitative variable - numbers that represent an amount or a count.
   a) Discrete variable - can be assigned values such as 1, 2, 3, ... and is said to be countable.
   b) Continuous variable - can assume all values between two specific values, such as 0.5, 1.2, etc.

Types of Statistics
1. Descriptive statistics - summarizes or describes the important characteristics of a known set of data. For example, the National Statistics Office conducts surveys to determine the average age, income, and other characteristics of the Filipino population.
2. Inferential statistics - uses sample data to make inferences about a population. It consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions. It uses the concept of probability, the chance that an event will happen.

In statistics, we commonly use the terms population and sample.
Population - the complete collection of elements to be studied
Sample - a subset of a population
Parameter - a numerical measurement describing some characteristic of a population
Statistic - a numerical measurement describing some characteristic of a sample

II. Levels of Measurement
1. Nominal Level
   Characterized by data that consist of names, labels, or categories only.
   For example:
   a) classifying the instructors in a university as male or female
   b) classifying residents according to their area codes
2. Ordinal Level
   Involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
   For example: a grading system involving letters (A, B, C, D, F)
3. Interval Level
   Same as the ordinal level, with the additional property that meaningful differences between data values can be determined.
   Data at this level may lack an inherent zero point.
   For example: temperature, IQ level
4. Ratio Level
   The interval level modified to include an inherent zero starting point.
   Differences and ratios of data are both meaningful.
   This is the highest level of measurement.
   For example: weight (in kg), height (in)
III. Data Collection and Sampling Techniques

Data can be collected in different ways. The most common way is through a survey, conducted by telephone, mailed questionnaire, or personal interview. Other methods of collecting data include surveying records and direct observation.
Four Basic Methods of Sampling
1. Random sampling
   This is done by using chance methods or random numbers. For example, number each subject in the population, write each number on a card, place the cards in a bowl, and draw as many cards as needed. The subjects whose numbers are drawn compose the sample.
2. Systematic sampling
   This is done by numbering each subject of the population and then selecting every kth subject. For example, suppose there are 5000 families in a city and fifty families are needed as a sample for an experiment. Since 5000/50 = 100, k = 100. This means that every 100th subject is selected. The first subject, however, is selected at random from subjects 1 to 100. If subject 88 is selected first, the sample consists of the subjects numbered 88, 188, 288, and so on until 50 families are obtained.
3. Stratified sampling
   If a population has distinct groups, it is possible to divide the population into these groups and to draw a simple random sample from each group. The groups are called strata. Strata are designed so that the members of each stratum are more homogeneous, that is, more similar to each other. The results are then combined to form the sample. This technique is useful in populations that can be stratified by gender, race, or geography.
4. Cluster sampling
   This method uses intact groups called clusters. Suppose a medical researcher wants to study the patients in Metro Manila. It would be very costly and time-consuming to obtain a random sample, since the patients are spread over different parts of Metro Manila. Instead, a few hospitals could be selected at random, and the patients in these hospitals would be studied as a cluster.
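The systematic sampling example above (5000 families, a sample of 50) can be sketched in Python as follows. This is only an illustration: the function name systematic_sample and its start parameter are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as it is in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the subject numbers (1-based) picked by systematic sampling.

    Assumes population_size is an exact multiple of sample_size.
    """
    k = population_size // sample_size        # sampling interval, e.g. 5000/50 = 100
    if start is None:
        start = random.randint(1, k)          # first subject chosen at random from 1..k
    # Every kth subject after the starting subject
    return [start + i * k for i in range(sample_size)]

# Reproduce the example in the text, where subject 88 is drawn first
sample = systematic_sample(5000, 50, start=88)
print(sample[:3], "...", sample[-1])
```

Passing start=88 fixes the first subject so that the result matches the example; leaving it out lets the function draw the starting subject at random.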
Chapter 2 Frequency Distribution and Graphs

2.1 Frequency Distributions
- A frequency distribution is a collection of observations sorted into classes, showing the frequency (number of occurrences) in each class.
- There are three basic types of frequency distribution: categorical, ungrouped, and grouped.

2.1.1 The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal or ordinal level data.

Example 1: The following data give the results of a sample survey. The letters A, B, and C represent the three categories.

A C B B B C A C C A
B A C B C C C C A B
C C B B C B C C C A
Construct a frequency distribution table for these data.

Solution: The categories are letters. Record these categories in the first column. Then read each result from the given data and mark a tally, denoted by "|", in the second column next to the corresponding category. The tallies are marked in blocks of five for counting convenience. Lastly, record the number of tallies for each category in the third column. This column is called the frequency column.

Category   Tally                  Frequency
A          |||| - |               6
B          |||| - ||||            9
C          |||| - |||| - ||||     15
                                  sum = 30

The sum of the entries in the frequency column gives the sample size or total frequency.

Exercise 1: Twenty-five students were given a blood test to determine their blood types. The data set is as follows. Complete the frequency distribution table.

A  B  B  AB O
O  O  B  AB B
B  B  O  A  O
A  A  O  O  A
AB O  O  B  AB

Category   Tally   Frequency
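The tallying procedure used in Example 1 can be checked with a short Python sketch. It relies on collections.Counter from the standard library; the variable names are illustrative only.

```python
from collections import Counter

# The 30 survey results from Example 1
responses = (
    "A C B B B C A C C A "
    "B A C B C C C C A B "
    "C C B B C B C C C A"
).split()

# Counter performs the tallying: one count per category
frequency = Counter(responses)

for category in sorted(frequency):
    print(category, frequency[category])
print("sum =", sum(frequency.values()))
```

The same approach applies to the blood type data in Exercise 1 by replacing the list of responses.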
When observations are sorted into classes of single values, the result is called a frequency distribution for ungrouped data. When the observations are sorted into classes of more than one value, the result is called a frequency distribution for grouped data.
Weekly Expenses of 80 Employees

Weekly Expenses   Number of Employees
100-104           5
105-109           16
110-114           11
115-119           40
120-124           8

In this table:
- variable: Weekly Expenses
- 2nd class: 105-109
- frequency of the 2nd class: 16
- lower limit of the 4th class: 115
- upper limit of the 4th class: 119
Terminology associated with frequency tables:
1. Lower class limit - the smallest data value that can be included in the class.
2. Upper class limit - the largest data value that can be included in the class.
3. Class boundaries - the values used to separate the classes so that there are no gaps in the frequency distribution.
4. Class marks - the midpoints of the classes.
5. Class width - the difference between two consecutive lower class limits. The class width of the preceding distribution is 5 (105 - 100 = 5).
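As a small illustration of these terms, the following Python sketch computes the class width, class boundaries, and class marks for the weekly expenses table. It assumes the data are recorded in whole units, so that each boundary lies 0.5 below a lower limit or 0.5 above an upper limit; the variable names are illustrative.

```python
# Class limits from the weekly expenses table: (lower limit, upper limit)
classes = [(100, 104), (105, 109), (110, 114), (115, 119), (120, 124)]

# Class width: difference between two consecutive lower class limits
width = classes[1][0] - classes[0][0]

# Class boundaries: assumes whole-unit data, so the gap between classes is split at 0.5
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in classes]

# Class marks: midpoint of each class
marks = [(lower + upper) / 2 for lower, upper in classes]

print("class width:", width)
for (lower, upper), (lb, ub), mark in zip(classes, boundaries, marks):
    print(f"{lower}-{upper}  boundaries {lb} - {ub}  class mark {mark}")
```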
Steps in constructing a frequency table:
1. Decide on the number of classes your frequency table will have. Usually, it is between 5 and 20.
2. Find the range. This is the difference between the highest and lowest scores.
3. Find the class width. Divide the range by the number of classes and round up. The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data.
4. Select a starting point, either the lowest score or a convenient lower class limit. Add the class width to the starting point to get the second lower class limit, and so on. Then enter the upper class limits.
5. Find the boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
6. Represent each score by a tally.
7. Count the total frequency for each class.

Example 2: When 40 people were surveyed at Greenbelt 3, they reported the distance they drove to the mall. The results (in kilometers) are given below.
 2  8  1  5  9 15  4 10  6  5
25 40 31 24 20 25  8  1  1 16
 5  5 20 23 14  1  3 18 10  8
 9 25 31 12 15 21 20 10 15 12

Construct a frequency distribution table.
Solution:
Step 1: The number of classes is 8 (chosen arbitrarily).
Step 2: Range = highest - lowest = 40 - 1 = 39
Step 3: Class width = range / number of classes = 39/8 = 4.875 ≈ 5
Step 4: Determine the lower class limits: 1, 6, 11, 16, 21, 26, 31, 36. Subtract 1 unit from the lower class limit of the second class to obtain the upper limit of the first class: 6 - 1 = 5. Then add the width to get the succeeding upper class limits.

Class limits
1 - 5
6 - 10
11 - 15
16 - 20
21 - 25
26 - 30
31 - 35
36 - 40

Step 5: Determine the class boundaries.

Class limits   Class boundaries
1 - 5          0.5 - 5.5
6 - 10         5.5 - 10.5
11 - 15        10.5 - 15.5
16 - 20        15.5 - 20.5
21 - 25        20.5 - 25.5
26 - 30        25.5 - 30.5
31 - 35        30.5 - 35.5
36 - 40        35.5 - 40.5

Step 6: Tally the scores.

Class limits   Class boundaries   Tally
1 - 5          0.5 - 5.5          |||| - |||| - |
6 - 10         5.5 - 10.5         |||| - ||||
11 - 15        10.5 - 15.5        |||| - |
16 - 20        15.5 - 20.5        ||||
21 - 25        20.5 - 25.5        |||| - |
26 - 30        25.5 - 30.5
31 - 35        30.5 - 35.5        ||
36 - 40        35.5 - 40.5        |

Step 7: Make the frequency distribution table.

Class limits   Class boundaries   Tally              Frequency
1 - 5          0.5 - 5.5          |||| - |||| - |    11
6 - 10         5.5 - 10.5         |||| - ||||        9
11 - 15        10.5 - 15.5        |||| - |           6
16 - 20        15.5 - 20.5        ||||               5
21 - 25        20.5 - 25.5        |||| - |           6
26 - 30        25.5 - 30.5                           0
31 - 35        30.5 - 35.5        ||                 2
36 - 40        35.5 - 40.5        |                  1
A variation of the standard frequency table is used when cumulative totals are desired. In a table whose classes are in increasing order, the cumulative frequency of a class is the sum of the frequencies of that class and all previous classes. A class with zero frequency therefore keeps the cumulative frequency of the class before it.

Class limits   Class boundaries   Class midpoints   Tally              Frequency   Cumulative frequency
1 - 5          0.5 - 5.5          3                 |||| - |||| - |    11          11
6 - 10         5.5 - 10.5         8                 |||| - ||||        9           20
11 - 15        10.5 - 15.5        13                |||| - |           6           26
16 - 20        15.5 - 20.5        18                ||||               5           31
21 - 25        20.5 - 25.5        23                |||| - |           6           37
26 - 30        25.5 - 30.5        28                                   0           37
31 - 35        30.5 - 35.5        33                ||                 2           39
36 - 40        35.5 - 40.5        38                |                  1           40
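The seven steps of Example 2 can be followed in a short Python sketch. It is a minimal illustration rather than a general-purpose routine: it assumes whole-number data, rounds the class width up with math.ceil, and starts the first class at the lowest score, as the worked solution does. All variable names are illustrative.

```python
import math
from itertools import accumulate

# Distances (in kilometers) from Example 2
distances = [
    2, 8, 1, 5, 9, 15, 4, 10, 6, 5,
    25, 40, 31, 24, 20, 25, 8, 1, 1, 16,
    5, 5, 20, 23, 14, 1, 3, 18, 10, 8,
    9, 25, 31, 12, 15, 21, 20, 10, 15, 12,
]

num_classes = 8                                    # Step 1: chosen arbitrarily
data_range = max(distances) - min(distances)       # Step 2: 40 - 1 = 39
width = math.ceil(data_range / num_classes)        # Step 3: 4.875 rounded up to 5

# Step 4: class limits, starting at the lowest score
lower_limits = [min(distances) + i * width for i in range(num_classes)]
classes = [(lo, lo + width - 1) for lo in lower_limits]

# Steps 6 and 7: count the scores that fall within each class
frequencies = [sum(lo <= x <= hi for x in distances) for lo, hi in classes]

# Cumulative frequency: running total of the frequencies
cumulative = list(accumulate(frequencies))

for (lo, hi), f, cf in zip(classes, frequencies, cumulative):
    lb, ub = lo - 0.5, hi + 0.5                    # Step 5: class boundaries
    midpoint = (lo + hi) / 2                       # class midpoint
    print(f"{lo:>2} - {hi:<2}  {lb:>4} - {ub:<4}  {midpoint:>4}  {f:>2}  {cf:>2}")
```

Because the cumulative column is a running total, the empty class 26 - 30 carries over the total of the class before it.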
Note: When constructing frequency tables,
1. The classes must be mutually exclusive; each score must belong to only one class.
2. Include all classes, even if their frequency is zero.
3. Make sure that all classes have the same width.
4. Try to select convenient numbers for class limits.
5. Keep the number of classes between 5 and 20.

Exercise 2. Construct a frequency table using 6 classes for the IQ scores of a group of thirty-five high school students.

 91 110  80  75  90  95  77
 87 112  69 105  79 100 108
 95  85 109 100  86  98  90
123  96  90  99  90  80 103
 98  71  84  94  93 104  89

Class limits   Class boundaries   Class midpoints   Tally   Frequency   Cumulative frequency
Exercises
1. The following table gives the frequency distribution of ages for all employees of a company.

Ages        Number of Employees
18 to 30    12
31 to 43    17
44 to 56    14
57 to 69    7

a) Find the class boundaries and class midpoints.
b) Do all classes have the same class width? If yes, what is the class width?
c) Prepare the cumulative frequency column.

2. The following data represent the cholesterol levels (in milligrams per 100 milliliters) of 20 patients. Construct a frequency table using a class width of 5.

210 202 221 210 208 208 207 212 200 199
217 209 218 213 203 215 210 210 208 218
2.2 Histograms, Frequency Polygons and Ogives

2.2.1 Histogram - a graph that displays the data by using vertical bars of various heights to represent the frequencies. To draw a histogram, first mark the classes on the horizontal axis and the frequencies on the vertical axis. Next, draw a bar for each class so that its height represents the frequency of that class. The bars are drawn adjacent to each other.

2.2.2 Frequency Polygon - a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. To draw a frequency polygon, mark a dot above the midpoint of each class at a height equal to the frequency of that class. Next, mark two more classes, one at each end, and mark their midpoints. Note that these two classes have zero frequencies. Lastly, join consecutive dots with straight lines.

2.2.3 Ogive - a graph that represents the cumulative frequencies of the classes. To draw an ogive, mark the class boundaries on the horizontal axis and the cumulative frequencies on the vertical axis. Plot the cumulative frequencies at each upper class boundary. Upper class boundaries are used because the cumulative frequencies represent the number of observations accumulated up to the upper boundary of each class.
Example 1: Using the data below, construct a histogram, a frequency polygon and an ogive.

Class limits   Class boundaries   Class midpoints   Tally              Frequency   Cumulative frequency
1 - 5          0.5 - 5.5          3                 |||| - |||| - |    11          11
6 - 10         5.5 - 10.5         8                 |||| - ||||        9           20
11 - 15        10.5 - 15.5        13                |||| - |           6           26
16 - 20        15.5 - 20.5        18                ||||               5           31
21 - 25        20.5 - 25.5        23                |||| - |           6           37
26 - 30        25.5 - 30.5        28                                   0           37
31 - 35        30.5 - 35.5        33                ||                 2           39
36 - 40        35.5 - 40.5        38                |                  1           40

Solution:
1. Histogram
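As a rough sketch, the points needed to draw the three graphs for this example can be computed in Python from the class limits and frequencies. The sketch only produces coordinates; drawing them is left to graph paper or a plotting tool. The variable names are illustrative.

```python
# Class limits and frequencies from the table in this example
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
frequencies = [11, 9, 6, 5, 6, 0, 2, 1]
width = classes[1][0] - classes[0][0]

# Histogram: one bar per class, from lower to upper class boundary, height = frequency
bars = [(lo - 0.5, hi + 0.5, f) for (lo, hi), f in zip(classes, frequencies)]

# Frequency polygon: (midpoint, frequency), plus one zero-frequency class at each end
midpoints = [(lo + hi) / 2 for lo, hi in classes]
polygon = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

# Ogive: (upper class boundary, cumulative frequency)
ogive = []
running_total = 0
for (lo, hi), f in zip(classes, frequencies):
    running_total += f
    ogive.append((hi + 0.5, running_total))

print("bars:   ", bars)
print("polygon:", polygon)
print("ogive:  ", ogive)
```

The two extra polygon points come from the added classes with zero frequency, whose midpoints lie one class width beyond the first and last midpoints.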
Exercises
1. Thirty cars were tested for fuel efficiency, in kilometers per liter. The following frequency distribution was obtained. Construct a histogram, frequency polygon, and ogive for the data.

km per liter   Number of cars
2.5 - 3.5      2
3.5 - 4.5      4
4.5 - 5.5      6
5.5 - 6.5      10
6.5 - 7.5      8
2.3 Other Types of Graphs

2.3.1 Pareto Graph - used to represent a frequency distribution for categorical or qualitative data; the frequencies are displayed by the heights of vertical bars.

2.3.2 Time Series Graph - used to represent data that occur over a specific period of time.
2.3.3 Pie Graph - a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
Example: In a survey, 500 families were asked the question "Where are you planning to spend your vacation this summer?" The survey resulted in the following distribution.

Place      Number of People
Davao      50
Boracay    200
Palawan    125
Tagaytay   90
Baguio     35

Construct a pie graph for the data and summarize the results.

Solution:
Step 1. Since there are 360° in a circle, the frequency for each class must be converted into a proportional part of the circle:
   Degrees = (f/n)*360°, where f is the frequency of the class and n is the sum of the frequencies.
   Davao: (50/500)*360 = 36°
   Boracay: (200/500)*360 = 144°
   Palawan: (125/500)*360 = 90°
   Tagaytay: (90/500)*360 = 64.8°
   Baguio: (35/500)*360 = 25.2°
Step 2. Convert each frequency to a percentage: Percent = (f/n)*100%.
   Davao: (50/500)*100 = 10%
   Boracay: (200/500)*100 = 40%
   Palawan: (125/500)*100 = 25%
   Tagaytay: (90/500)*100 = 18%
   Baguio: (35/500)*100 = 7%
Step 3. Using a protractor, draw the graph and label each section with the name and percentage.
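Steps 1 and 2 of this solution can be checked with a short Python sketch that converts each frequency into degrees and a percentage. The dictionary and variable names are illustrative.

```python
# Survey results from the example: place -> number of people
vacation = {"Davao": 50, "Boracay": 200, "Palawan": 125, "Tagaytay": 90, "Baguio": 35}

n = sum(vacation.values())            # sum of the frequencies

degrees = {}
percent = {}
for place, f in vacation.items():
    degrees[place] = f * 360 / n      # Step 1: share of the 360-degree circle
    percent[place] = f * 100 / n      # Step 2: share of the total, in percent
    print(f"{place:<9} {degrees[place]:6.1f} degrees  {percent[place]:5.1f}%")
```

As a check, the degrees should add up to 360 and the percentages to 100.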
Exercises
1. Construct a Pareto graph for the number of health conditions per 100 reported by the elderly in a survey.

Condition       Number
Arthritis       48
Hypertension    36
Heart Disease   32
Cataracts       17
Diabetes        11
2. Draw a time series graph to represent the number of road accidents for the given years.

Year     2000  2001  2002  2003  2004  2005
Number   440   440   312   250   210   185

3. In a survey of 100 males concerning the sports they play, the following data were obtained. Construct a pie graph.

Sport       Number
Golf        45
Tennis      20
Swimming    10
Badminton   25