Class outline Day 12: 12-14-11 Wednesday week 5

Reminders/announcements.

Proposal due today.  Midterm Friday.  Format?  Like the problem sets and quizzes.  Coverage? The twelve days of class, materials posted to Moodle, the three problem sets and their keys, sections 1-8 of chapter 11, sections 1-4 and section 8 of chapter 12, and section 2 of chapter 13.  The take-home quiz (quiz 2) is almost ready and will be posted to Moodle by 6:00 PM this evening. It is due Friday when you come to take the midterm exam.

Two things on the agenda for today.  (1) Quick recap and review of multiple-category dummy variables (see section 8 of chapter 12 and section 2 of chapter 13); (2) "t" tests (section 5 of chapter 11 and section 4 of chapter 12).

 

Dummy variables (see section 8 of chapter 12 and section 2 of chapter 13).

Can you put qualitative factors into a regression analysis (for example, the location of a property, or the ethnicity or political affiliation of a person)? Yes! We define a variable, called a dummy or indicator variable, that represents that characteristic numerically. See problem 1 on problem set 3 (employee salary based on an employee's gender and years of experience).

More than two categories to a qualitative factor? Suppose you are not looking at something like gender, which is divided into two categories (male or female), but at a qualitative factor that can be broken down into three or more categories. Follow the dummy variable rule. What is this?

Dummy variable rule. If a qualitative factor has two categories, it would be silly to use two dummy variables (e.g. for gender, a dummy variable for male, which is 1 if a person is male and 0 if female, along with a dummy variable for female, defined as 1 if female and 0 if male. What is the one variable telling you that the other one isn't?). Use one dummy variable instead. Similarly, if you have three categories for a qualitative factor you should use two dummy variables, four categories then three dummies, five then four, and so on: always one fewer dummy than the number of categories. The category you do NOT create a dummy variable for is referred to as the omitted category or the base case. The estimated coefficients are then interpreted relative to that omitted category or base case (see problems 3 and 5 on problem set 3). Forgetting this rule leads to what is often referred to as the "dummy variable trap" (see Newbold, Carlson and Thorne, p. 568).
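A minimal sketch of the rule in Python, for a three-category factor. The property locations below are made-up illustration data, and make_dummies is a hypothetical helper written for this outline, not a function from any library. It builds one 0/1 column per category except the base case, then shows why a dummy for every category is a trap: the dummy columns add up to 1 in every row, which duplicates the intercept.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, skipping the base (omitted) category."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base category {base!r} not found in data")
    kept = [c for c in categories if c != base]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical data: location of six properties (three categories).
location = ["urban", "suburban", "rural", "urban", "rural", "suburban"]

# Three categories -> two dummies; "rural" is the omitted category / base case.
dummies = make_dummies(location, base="rural")
for name, column in dummies.items():
    print(name, column)

# The trap: create a dummy for EVERY category and add them up row by row.
all_dummies = {c: [1 if v == c else 0 for v in location]
               for c in sorted(set(location))}
row_sums = [sum(col[i] for col in all_dummies.values())
            for i in range(len(location))]
print("row sums with all three dummies:", row_sums)
```

Because the row sums are all 1, the three dummies together carry exactly the same information as the constant term, so least squares cannot separate their effects.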

 

Part 1 of t tests with multiple regression results.

As you know from the assigned reading, the formula for the regression model in general is:

y = β0 + β1*x1 + β2*x2 + … + βk*xk + e

(The parameters, or constants, of the above equation are estimated by least squares (LS), which generates the sample regression equation, or equation of best fit:

ŷ = b0 + b1*x1 + b2*x2 + … + bk*xk )
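For anyone curious what the software is doing when it reports b0, b1, …, bk, here is a rough sketch of least squares using only plain Python. It solves the normal equations (X'X)b = X'y by elimination. The function name least_squares and the data are assumptions made up for illustration; in practice you would read the coefficients off your regression output rather than compute them this way.

```python
def least_squares(X, y):
    """Estimate b0, b1, ..., bk by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]          # add the intercept column of 1s
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity among the x's)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the fit should recover those three constants.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]

b = least_squares(X, y)
print([round(value, 6) for value in b])
```

With real data the error term e is not zero, so the b's are estimates of the β's rather than exact copies; that is why the t tests in the rest of this outline are needed.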

You specify the dependent variable that you are trying to better understand. You then list the explanatory factors or "x's" that you think significantly influence that y. The claim that one of the explanatory variables in the model is NOT significant would be written as:

H0: βi = 0 (where i = the particular variable you want to test: the first x, the second one, the third, and so on.)

And the alternative, that xi IS significantly related to y, can be written in one of the following three ways:

Ha: βi > 0 OR Ha: βi < 0 OR Ha: βi ≠ 0. Of course, how you set up the alternative depends on whether you think that factor has a statistically significant positive effect on y, a statistically significant negative effect on y, or a statistically significant relationship with y whose direction (positive or negative) you are not sure of.
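A small sketch of how the test is carried out once you have regression output. The test statistic is t = bi / s_bi (the estimated coefficient divided by its standard error), compared with a critical value from the t table with n - k - 1 degrees of freedom. The coefficient, standard error, and sample size below are made-up numbers, and the function names are hypothetical; the critical value is typed in by hand from a t table.

```python
def t_statistic(b, se, hypothesized=0.0):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (b - hypothesized) / se


def reject_null(t, critical, alternative):
    """Decision rule for the three ways of writing Ha; critical must be positive."""
    if alternative == "greater":      # Ha: beta_i > 0
        return t > critical
    if alternative == "less":         # Ha: beta_i < 0
        return t < -critical
    if alternative == "two-sided":    # Ha: beta_i != 0
        return abs(t) > critical
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Made-up regression output: n = 30 observations, k = 3 explanatory variables.
n, k = 30, 3
df = n - k - 1                        # 26 degrees of freedom
b_i, se_i = 2.5, 0.8                  # hypothetical coefficient and standard error

t = t_statistic(b_i, se_i)
critical_two_sided = 2.056            # t table, alpha = 5% two-tailed, df = 26
print("df =", df, " t =", round(t, 3))
print("reject H0 (two-sided)?", reject_null(t, critical_two_sided, "two-sided"))
```

For a one-sided alternative at the same alpha, all 5% goes in one tail, so the critical value is smaller (1.706 with 26 degrees of freedom) and the sign of t matters.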

Examples: Using alpha of 5% determine (1) if AE in the Fresh Detergent example (#6, problem set 3, part c) has a statistically significant relationship to sales (2) if the price differential in the Fresh Detergent example has a statistically significant effect on sales (#6, problem set 3, part d) and (3) if traffic flow has a statistically significant effect on daily revenues in the restaurant example (#6, problem set 3, part c).

To test your understanding, as you review your notes from today's class, use a 5% level of significance and, with t tests, identify which variables in the regression of hours of internet use per week on number of kids, income, education, and number of computers are significant. Do the same for the small appliance sales example.

Part 2 of t tests with multiple regression results: Once you've established that it is likely that a statistically significant relationship exists between a particular explanatory variable and the dependent variable, you can then test more specific claims. For example, in the appliance sales example, can we reject the claim that, all else equal, an additional dollar of advertising will on average increase sales by no more than $2?
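A sketch of that kind of test. The claim "no more than $2" becomes H0: β ≤ 2 against Ha: β > 2, and the test statistic is t = (b - 2) / s_b. The numbers below are invented for illustration and are NOT the actual appliance sales output; the critical value is read by hand from a t table.

```python
def t_statistic(b, se, hypothesized):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (b - hypothesized) / se


# Invented values standing in for the advertising coefficient and its standard error.
b_adv, se_adv = 2.9, 0.4
claimed_value = 2.0                   # H0: beta <= 2, Ha: beta > 2

t = t_statistic(b_adv, se_adv, claimed_value)

critical_one_sided = 1.706            # t table, alpha = 5% one-tailed, df = 26 (assumed)
reject = t > critical_one_sided       # upper-tail test

print("t =", round(t, 3))
print("reject the 'no more than $2' claim?", reject)
```

Note that the only change from the significance test is the value subtracted in the numerator: 2 instead of 0.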