Reminders/announcements.
Proposal due today. Midterm Friday. Format? Like the problem sets and quizzes. Coverage? Twelve days of class, the materials posted to Moodle, the three problem sets and their keys, sections 1-8 of chapter 11, sections 1-4 and section 8 of chapter 12, and section 2 of chapter 13. The take-home quiz (quiz 2) is almost ready and will be posted to Moodle by 6:00 PM this evening. It is due Friday when you come to take the midterm exam.
Two things on the agenda for today: (1) a quick recap and review of multiple-category dummy variables (see section 8 of chapter 12 and section 2 of chapter 13), and (2) t tests (section 5 of chapter 11 and section 4 of chapter 12).
Dummy variables (see section 8 of chapter 12 and section 2 of chapter 13).
Can you put qualitative factors into a regression
analysis (for example, the location of a property, or the ethnicity or political
affiliation of a person)? Yes! We define a variable, called a dummy or indicator
variable, that represents that characteristic numerically. See problem 1 on
problem set 3 (employee salary based on an employee's gender and years of
experience).
More than two categories to a qualitative factor? That is, you are not looking
at something like gender, which here is divided into two categories (male or
female), but at a qualitative factor that can be broken down into three or
more categories? Follow the dummy variable rule. What is this?
Dummy variable rule. Use one fewer dummy variable than the number of categories.
If a qualitative factor has two categories, it would be silly to use two dummy
variables (e.g. for gender, a dummy variable for male, which is 1 if a person is
male and 0 if female, along with a dummy variable for female, defined as 1 if
female and 0 if male. What is the one telling you that the other one isn't?).
Similarly, if you have three categories for a qualitative factor you should use
two dummy variables, four categories then three dummies, five then four, and so
on. The category you do NOT create a dummy variable for is referred to as the
omitted category or the base case. The estimated coefficients are then
interpreted relative to that omitted category or base case (problems 3 and 5 on
problem set 3). Forgetting this rule leads to what is often referred to as the
"dummy variable trap" (see Newbold, Carlson and Thorne, p. 568).
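A minimal sketch of the dummy variable rule, for review. The factor ("region"), its three categories, and the six observations are made up for illustration and are not from the problem sets; `make_dummies` is a hypothetical helper, not a library function. It builds one 0/1 column for every category except the base case, then shows why including a dummy for every category causes trouble: the full set of dummies adds up to 1 in every row, which duplicates the column of ones that goes with the intercept b0.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, skipping the base (omitted) category."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError("base category must appear in the data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != base
    }


# Made-up data: a qualitative factor with three categories.
regions = ["north", "south", "west", "south", "north", "west"]

# Three categories -> two dummies; "north" is the omitted category (base case).
dummies = make_dummies(regions, base="north")
print(dummies)

# The trap: build a dummy for EVERY category, including the base case.
all_dummies = {
    cat: [1 if v == cat else 0 for v in regions]
    for cat in sorted(set(regions))
}

# Each observation falls in exactly one category, so the dummies sum to 1
# in every row, exactly matching the intercept's column of ones.
row_sums = [sum(col[i] for col in all_dummies.values()) for i in range(len(regions))]
print(row_sums)
```

With "north" as the base case, the coefficient on the "south" dummy would be read as the estimated difference in y between south and north, all else equal.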
Part 1 of t tests with multiple regression results.
As you know from the assigned reading, the formula for the regression model in general is:
y = β0 + β1*x1 + β2*x2 + ... + βk*xk + e
(The parameters of the above equation are estimated using least squares (LS),
which generates the sample regression equation, or equation of best fit:
ŷ = b0 + b1*x1 + b2*x2 + ... + bk*xk )
You specify the dependent variable that you are trying to better understand. You
then list the explanatory factors or "x's" that you think significantly
influence that y. The claim that one of the explanatory variables in the model
is NOT significant would be written as:
H0: βi = 0 (where i = the particular variable you want to test: the first x, the
second one, the third, and so on.)
And the alternative that xi IS significantly related to y can be written in
one of the following three ways:
Ha: βi > 0 OR Ha: βi < 0 OR Ha: βi ≠ 0. Of course, how you set up the
alternative depends on whether you think that factor has a statistically
significant positive effect on y, a statistically significant negative effect on
y, or you think that there is a statistically significant relationship between y
and this x but you are not sure whether it is positive or negative.
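A minimal sketch of how the test of H0: βi = 0 works once you have regression output. The coefficient, standard error, and degrees of freedom are hypothetical numbers chosen for illustration, not results from any class example. The test statistic is t = bi / s_bi, with n - k - 1 degrees of freedom, and the critical value is assumed to be looked up in a t table (all of alpha in one tail for a one-sided alternative, alpha/2 in each tail for a two-sided one).

```python
def t_statistic(b_i, se_b_i):
    """t statistic for H0: beta_i = 0, i.e. the estimate over its standard error."""
    return b_i / se_b_i


def reject_h0(t_stat, t_crit, alternative):
    """Decide whether to reject H0, given a positive critical value from a t table.

    alternative: "greater"   for Ha: beta_i > 0
                 "less"      for Ha: beta_i < 0
                 "two-sided" for Ha: beta_i != 0
    """
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "less":
        return t_stat < -t_crit
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical regression output (illustration only).
b_i = 1.5      # estimated coefficient on x_i
se_b_i = 0.5   # standard error of that coefficient
# Assume n - k - 1 = 20 degrees of freedom and alpha = 5%.
# From a t table: 2.086 (two-sided, 0.025 in each tail), 1.725 (one-sided).

t_stat = t_statistic(b_i, se_b_i)
print("t =", t_stat)
print("Reject H0 against Ha: beta_i != 0?", reject_h0(t_stat, 2.086, "two-sided"))
print("Reject H0 against Ha: beta_i < 0? ", reject_h0(t_stat, 1.725, "less"))
```

Note that the same t statistic leads to different conclusions depending on how the alternative is set up, which is why the direction of Ha should be decided before looking at the results.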
Examples: Using an alpha of 5%, determine (1) if AE in the Fresh Detergent
example (#6, problem set 3, part c) has a statistically significant relationship
to sales, (2) if the price differential in the Fresh Detergent example has a
statistically significant effect on sales (#6, problem set 3, part d), and (3)
if traffic flow has a statistically significant effect on daily revenues in the
restaurant example (#6, problem set 3, part c).
To test your understanding, as you review your notes from today's class, use a
5% level of significance and, with t tests, identify which variables in the
regression of hours of internet use per week on number of kids, income,
education, and number of computers are significant. Do the same for the small
appliance sales example.
Part 2 of t tests with multiple regression results: Once you've established that
a statistically significant relationship between a particular explanatory
variable and the dependent variable likely exists, you can then test more
specific claims. For example, in the appliance sales example, can we reject the
claim that, all else equal, an additional dollar of advertising will on average
increase sales by no more than $2?
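A minimal sketch of this kind of test, assuming sales and advertising are both measured in dollars. The claim "no more than $2" becomes H0: β ≤ 2 against Ha: β > 2, and the test statistic subtracts the hypothesized value before dividing by the standard error: t = (b - 2) / s_b. The numbers below are hypothetical and are not the appliance sales results.

```python
def t_statistic_vs_value(b, se_b, hypothesized):
    """t statistic for testing a coefficient against a nonzero hypothesized value."""
    return (b - hypothesized) / se_b


# Hypothetical regression output (illustration only).
b_adv = 3.0    # estimated coefficient on advertising
se_adv = 0.4   # standard error of that coefficient
claim = 2.0    # H0: beta <= 2, Ha: beta > 2

# Assume n - k - 1 = 20 degrees of freedom and alpha = 5%, upper tail only.
t_crit = 1.725  # from a t table

t_stat = t_statistic_vs_value(b_adv, se_adv, claim)
reject = t_stat > t_crit

print("t =", round(t_stat, 3))
print("Reject the claim that the effect is no more than $2?", reject)
```

Rejecting H0 here would mean the sample gives evidence that an additional dollar of advertising raises sales by more than $2 on average, all else equal.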