Demand Estimation by Using Regression Analysis

Regression analysis is a statistical method used to establish a relationship between one variable (the dependent variable) and the other factors that affect it (the independent variables). This relationship can be expressed in a functional form:

Q = a0 + a1 A + a2 B + a3 C
Demand estimation for a product or service using regression analysis is important in the business world, especially to corporate executives and managers, because it enables them to make reasonable forecasts for their goods and services in the near future. Managers can narrow down the factors that are important in influencing their sales and thereby formulate appropriate strategies or policies to achieve their management objectives.
The actual process of regression analysis can be very complex, but it can be summarized in FOUR important steps:
1. Model specification: set the objective and identify the important variables which have an influence on the dependent variable.
2. Collection of data for all the variables specified.
3. Choice of a functional form, e.g. linear or non-linear.
4. Estimation and interpretation of results.
1. Model Specification
If we want to study the factors affecting the demand for automobiles (Qx) in the country, we must identify the most important variables that are believed to affect the demand for automobiles, e.g.
a) Price of the automobile (Px)
b) Per capita income (Yc)
c) No. of working population (L)
d) Rate of interest, etc. (I)
Qx = f(Px, Yc, L, I,…..)
2. Data Collection
There are two types of data:
a) Time series data: data are collected for each variable over time (yearly, quarterly, monthly, daily, etc.).
b) Cross-sectional data: data are collected for the same time period but from different sections or geographical areas of the society.
The type of data to be used depends on the availability of data, which may come from two sources:
a) Primary data: data collected from the field through market surveys, sampling, etc.
b) Secondary data: data published by a relevant authority, such as the Statistical Department, Economic Reports, etc.

3. Specifying the Form of the Equation

i) Linear model
The simplest model to deal with, and the one which is often also the most realistic, is the linear model, e.g.

Qx = a0 + a1 Px + a2 Yc + a3 L + a4 I + …….. + e

where a0, a1, …., a4 are parameters (coefficients) to be estimated and e is the disturbance term or error term.

ii) Non-linear model
Sometimes a non-linear form may fit the data better than a linear equation, e.g.

Qx = a0 · Px^α1 · Yc^α2 · L^α3 · I^α4    (Power Function)
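A power function becomes linear once logarithms are taken, so it can be estimated with the same least-squares approach as the linear model. The sketch below illustrates the idea for a single regressor only. The price and quantity figures are hypothetical values generated from Q = 100 · P^(-1.5); they are not data from the manual, and the variable names are illustrative.

```python
import math

# Hypothetical data generated from Q = 100 * P**(-1.5) (illustration only).
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
quantities = [100.0 * p ** -1.5 for p in prices]

# Taking logs gives ln Q = ln a0 + alpha1 * ln P, which is linear in the logs.
log_p = [math.log(p) for p in prices]
log_q = [math.log(q) for q in quantities]

n = len(log_p)
mean_lp = sum(log_p) / n
mean_lq = sum(log_q) / n

# Ordinary least-squares slope and intercept on the logged data.
s_pq = sum((lp - mean_lp) * (lq - mean_lq) for lp, lq in zip(log_p, log_q))
s_pp = sum((lp - mean_lp) ** 2 for lp in log_p)
alpha1 = s_pq / s_pp
a0 = math.exp(mean_lq - alpha1 * mean_lp)

print(f"alpha1 = {alpha1:.3f}, a0 = {a0:.1f}")
```

In this log-linear form the exponent can be read directly as an elasticity: here alpha1 is the price elasticity of demand.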
4. Testing the (Econometric) Result
To evaluate the regression results, several statistics are examined:
a) The sign of each estimated coefficient, which must be checked to see if it conforms to what is expected on theoretical grounds
b) Coefficient of determination, R²
c) t-tests (coefficients)
d) Durbin-Watson statistic, etc.
e) The F-statistic (F-stat)

Note: The statistical procedure for solving multiple regression problems can be very complicated. Fortunately, there are many computer software packages available for this purpose, e.g. TSP (Time-Series Processor) or SPSS.
REGRESSION ANALYSIS
It describes the way in which one variable is related to another. Regression analysis derives an equation that can be used to estimate the unknown values of one variable on the basis of known values of another variable.

(a) Simple Regression Analysis

Y = a + bX

where Y is sales volume and X is advertising expenditure.
Example 1 (Taken from ECO556 Manual Table 4.1, page 136)

Year    Sales (Y)            Advertising Expenditure (X)
        (million dollars)    (million dollars)
1997    44                   10
1998    58                   13
1999    48                   11
2000    46                   12
2001    42                   11
2002    60                   15
2003    52                   12
2004    54                   13
2005    56                   14
2006    40                    9
The result from the computer printout:

LS // Dependent variable is SAL
SMPL range: 1986 - 1995
Number of observations: 10

Variable    Coefficient    Std. Error    T-Stat       2-Tail Sig.
C           7.6000000      6.332345      1.2001912    0.264
ADV         3.5333333      0.5222813     6.751919     0.000

R-squared             0.851212     Mean of dependent var    50.00000
Adjusted R-squared    0.832614     S.D. of dependent var    6.992059
S.E. of regression    2.860653     Sum of squared resid     65.46667
Durbin-Watson stat    1.224915     F-statistic              45.76782
Log likelihood        -23.58417

Ŷ = â + b̂X   =>   Ŷ = 7.6 + 3.53X
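The coefficients in the printout can be checked by hand. The following is a minimal sketch, assuming the usual ordinary least-squares formulas b = Sxy / Sxx and a = mean(Y) − b · mean(X); the variable names are illustrative and do not come from the manual.

```python
# Data from Example 1 (million dollars).
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X

n = len(sales)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
s_xx = sum((x - mean_x) ** 2 for x in advertising)

b = s_xy / s_xx            # slope: effect of advertising on sales
a = mean_y - b * mean_x    # intercept

print(f"Y-hat = {a:.2f} + {b:.2f} X")
```

Rounded to two decimals, this gives the same intercept (7.60) and slope (3.53) as the fitted equation.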
(b) Multiple Regression Analysis

Y = a1 + b1 X1 + b2 X2

where
Y is sales volume,
X1 is advertising expenditure,
X2 is the price of the product,
a1 is the intercept,
b1 is ΔY/ΔX1, the marginal effect of advertising on sales,
b2 is ΔY/ΔX2, the marginal effect of price on sales.
Example 2 (Taken from ECO556 Manual Table 4.3, page 141)

Year    Sales (Y)            Advertising Expenditure (X1)    Price (X2)
        (million dollars)    (million dollars)               (million dollars)
1997    44                   10                              1.0
1998    58                   13                              1.2
1999    48                   11                              2.0
2000    46                   12                              1.8
2001    42                   11                              2.1
2002    60                   15                              0.8
2003    52                   12                              1.4
2004    54                   13                              2.0
2005    56                   14                              1.5
2006    40                    9                              1.0
The result from the computer printout:

LS // Dependent variable is SAL
SMPL range: 1986 - 1995
Number of observations: 10

Variable    Coefficient    Std. Error    T-Stat        2-Tail Sig.
C           11.60403       6.9633945     1.6665152     0.140
ADV         3.4936051      0.5078770     6.8788413     0.000
P           -2.3836921     1.9495316     -1.2226999    0.261

R-squared             0.877397     Mean of dependent var    50.00000
Adjusted R-squared    0.842367     S.D. of dependent var    6.992059
S.E. of regression    2.776058     Sum of squared resid     53.94549
Durbin-Watson stat    1.41         F-statistic              25.04734

Ŷ = â1 + b̂1X1 + b̂2X2   =>   Ŷ = 11.60 + 3.49X1 - 2.38X2
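With two regressors the estimates come from solving two normal equations simultaneously. The sketch below assumes the standard deviation-form solution (sums of squares and cross-products around the means, solved by Cramer's rule). The helper names `mean` and `cross` are hypothetical, and this is not necessarily how TSP or SPSS computes the result.

```python
# Data from Example 2.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]                # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]           # X1
price = [1.0, 1.2, 2.0, 1.8, 2.1, 0.8, 1.4, 2.0, 1.5, 1.0]      # X2


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


s11 = cross(advertising, advertising)
s22 = cross(price, price)
s12 = cross(advertising, price)
s1y = cross(advertising, sales)
s2y = cross(price, sales)

# Solve the two normal equations for the slopes.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a1 = mean(sales) - b1 * mean(advertising) - b2 * mean(price)

print(f"Y-hat = {a1:.2f} + {b1:.2f} X1 + ({b2:.2f}) X2")
```

The slopes agree with the printout, and the intercept agrees to about two decimal places. In practice a statistics package would normally be used, as the notes suggest.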
Evaluation of Results (Computer Printouts)
These are the important statistical results that should be interpreted:
a. The sign of each estimated coefficient
b. Coefficient of determination (R²)
c. Standard error of estimate (Se)
d. The t-statistics (t-stats)
e. The Durbin-Watson statistic
f. The F-statistic (F-stat)
Interpretation :
a. The sign of each estimated coefficient
It must be checked to see if it conforms to what is expected on theoretical grounds. From Example 1:

Ŷ = 7.6 + 3.53X

The estimated coefficient of X is positive (+3.53), so it conforms to economic theory: sales are expected to rise with advertising. Since both variables are measured in million dollars, each additional $1 million spent on advertising (X) is associated with an increase in sales (Y) of about $3.53 million.
b. Coefficient of determination (R²)
The value of R² ranges from 0 to 1.

R² = 0: none of the variation in the dependent variable is explained by the independent variables.

R² = 1: all of the variation in the dependent variable is explained by variation in the independent variables.

R² = 0.85 (Example 1): 85% of the variation in the dependent variable (sales) is explained by variation in the independent variable, advertising expenditure. The other 15% cannot be explained by the regression. This may be due to the omission of some important independent variables.
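The R² figures in the Example 1 printout can be recovered from the other reported statistics. This is a small sketch, assuming that the "S.D. of dependent var" uses n − 1 in its denominator, so that the total sum of squares is (n − 1) · SD², and that adjusted R² uses the usual degrees-of-freedom correction.

```python
# Values taken from the Example 1 printout.
n, k = 10, 2          # observations, coefficients estimated
sd_y = 6.992059       # S.D. of dependent var
ssr = 65.46667        # Sum of squared resid (unexplained variation)

sst = (n - 1) * sd_y ** 2          # total variation in sales
r2 = 1 - ssr / sst                 # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

print(f"R-squared = {r2:.6f}, adjusted R-squared = {adj_r2:.6f}")
```

Adjusted R² penalizes additional regressors, which is why it is slightly lower than R².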
c. Standard error of estimate (Se)
It is a measure of the dispersion of data points around the line of best fit (regression line). Actual points do not lie exactly on the regression line but are dispersed above and below it. Thus, a value predicted by the regression line is subject to error, and Se measures the probable error in the predicted value.

For example, in Table 4.1, when advertising expenditure is $9 million, actual sales are $40 million. Using the regression results, predicted sales are $39.37 million, so the predicted value is in error by $0.63 million. The standard error of estimate is calculated with the following formula:

Se = √[ Σ (Yt − Ŷt)² / (n − k) ],   with the sum taken over t = 1, …, n

where Yt is the actual value and Ŷt is the value predicted by the regression for observation t.
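As a check on the formula, the sketch below recomputes Se for Example 1 from the residuals, using the coefficients reported in the printout. Variable names are illustrative.

```python
import math

# Data from Example 1 (million dollars).
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]

a, b = 7.6, 3.5333333   # coefficients from the Example 1 printout
k = 2                   # coefficients estimated (intercept and slope)
n = len(sales)

# Residual = actual sales minus sales predicted by the regression line.
residuals = [y - (a + b * x) for x, y in zip(advertising, sales)]
ssr = sum(e ** 2 for e in residuals)   # sum of squared residuals
se = math.sqrt(ssr / (n - k))

print(f"SSR = {ssr:.5f}, Se = {se:.6f}")
```

The result matches the "Sum of squared resid" and "S.E. of regression" lines of the printout.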
Se is useful for estimating the range within which the dependent variable will lie at a specified probability. At 95% probability, the dependent variable will lie in the interval:

Ŷ ± t(n − k) · Se

where Ŷ is the predicted value of the dependent variable based on the regression; n − k is the degrees of freedom (df), used to look up the critical value from the Student's t distribution; n is the number of observations; and k is the number of coefficients estimated.
Example: Se = 2.8

The 95% confidence interval for sales when Adv. Exp. (X) = 9 and Ŷ = 39.37 is:

Ŷ ± t(n − k) · Se
=> 39.37 ± (2.306)(2.8)
=> 39.37 ± 6.457

Thus, at the 95% confidence level, when advertising expenditure is $9 million, sales range from $32.913 million to $45.827 million.
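The interval arithmetic can be wrapped in a small helper. This sketch follows the simplified formula used in these notes (predicted value plus or minus t times Se), with the critical value 2.306 taken from the notes rather than computed. The function name `prediction_interval` is hypothetical.

```python
def prediction_interval(y_hat, t_crit, se):
    """Return (lower, upper) bounds for y_hat plus or minus t_crit * se."""
    margin = t_crit * se
    return y_hat - margin, y_hat + margin


# Figures from the worked example: X = 9, predicted sales 39.37, Se = 2.8,
# critical t for 8 degrees of freedom at 95% = 2.306.
lower, upper = prediction_interval(y_hat=39.37, t_crit=2.306, se=2.8)

print(f"Sales range from {lower:.3f} to {upper:.3f} million dollars")
```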
d. t-Statistics
The t-statistic is used in the t-test to determine whether there is a significant relationship between the dependent variable and each of the independent variables. To carry out this test, we need the standard error of the coefficient (Sb) to calculate the t value. Then we compare the calculated t value with the critical t value from the Student's t distribution table.

The t value is calculated by dividing the value of the coefficient (b) by Sb:

Calculated t = b / Sb

i.e. Calculated t = 3.53 / 0.52 = 6.79

To find the critical value from the Student's t distribution table: n − k = 10 − 2 = 8 df at the 95% confidence level, so t critical = 2.306.

Since t computed (6.79) > t critical (2.306), advertising expenditure is statistically significant in explaining the variation in sales at the 95% confidence level.

Note: if there is more than one independent variable, the significance test has to be carried out for each independent variable.
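To illustrate testing every independent variable, the sketch below applies the same rule to both regressors of Example 2, using the coefficients and standard errors from its printout. The critical value 2.365 (two-tailed, 5%, 10 − 3 = 7 df) is an assumption taken from a standard t table; it does not appear in the notes.

```python
def t_statistic(coefficient, std_error):
    """Calculated t = b / Sb."""
    return coefficient / std_error


# (coefficient, standard error) pairs from the Example 2 printout.
estimates = {
    "ADV": (3.4936051, 0.5078770),
    "P": (-2.3836921, 1.9495316),
}

# Assumption: two-tailed 5% critical value for 7 df from a standard t table.
T_CRITICAL = 2.365

results = {}
for name, (coef, std_err) in estimates.items():
    t = t_statistic(coef, std_err)
    significant = abs(t) > T_CRITICAL   # compare in absolute value
    results[name] = (t, significant)
    print(f"{name}: t = {t:.4f}, significant at 95%: {significant}")
```

With these figures advertising passes the test while price does not, which agrees with the 2-tail significance values of 0.000 and 0.261 in the printout.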
e. Durbin-Watson Statistic
It indicates the presence or absence of autocorrelation, a problem that can arise in regression analysis with time series data. The statistic ranges from 0 to 4, and a value close to 2 suggests no first-order autocorrelation. There are three situations in which autocorrelation or multicollinearity problems can arise:
• When independent variables are interrelated or duplicated
• Where independent variables have been mis-specified
• Where important independent variables are missing
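The notes do not give the formula for this statistic. The sketch below assumes the standard definition, DW = Σ(e_t − e_{t−1})² / Σ e_t², where e_t is the residual for period t, and applies it to Example 1 using the coefficients from the printout.

```python
# Data from Example 1, in time order (1997 to 2006).
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]

a, b = 7.6, 3.5333333   # coefficients from the Example 1 printout

residuals = [y - (a + b * x) for x, y in zip(advertising, sales)]

# Squared changes between successive residuals, relative to total squared residuals.
numerator = sum(
    (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
)
denominator = sum(e ** 2 for e in residuals)
dw = numerator / denominator

print(f"Durbin-Watson = {dw:.6f}")
```

This reproduces the Durbin-Watson value reported in the Example 1 printout. Whether that value signals autocorrelation is judged against the lower and upper bounds in a Durbin-Watson table.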
f. F-Statistic
It is another test of the overall explanatory power of the regression analysis. (Refer to page 147 of the manual.)
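Since the notes defer the details to the manual, the sketch below assumes the standard relationship between the F-statistic and R², F = [R² / (k − 1)] / [(1 − R²) / (n − k)], and applies it to the R² values from both printouts. The function name `f_statistic` is hypothetical.

```python
def f_statistic(r_squared, n, k):
    """F = explained variation per regressor / unexplained variation per df."""
    explained = r_squared / (k - 1)
    unexplained = (1 - r_squared) / (n - k)
    return explained / unexplained


# R-squared values from the Example 1 and Example 2 printouts.
f_simple = f_statistic(0.851212, n=10, k=2)
f_multiple = f_statistic(0.877397, n=10, k=3)

print(f"Example 1: F = {f_simple:.2f}")
print(f"Example 2: F = {f_multiple:.2f}")
```

Both values agree with the F-statistic lines of the printouts to two decimal places. The calculated F is then compared with the critical F from an F table for (k − 1, n − k) degrees of freedom.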
----end of short notes on demand estimation----