Day 6 Class outline BUSN212. 

Monday week 3 11/28/2011

___________________________________________________________________________________

Preliminaries. Hope you all really enjoyed the long weekend. Quiz 1 will be this Friday. SAS assignment #1 is due Friday. Details regarding the group project will be posted to Moodle by Wednesday AM. After this week there are two more weeks until Christmas break; in that time you will have another quiz and the midterm, and your group's project proposal will be due. Please try to keep all of this in mind.

Also, if you were not here on Wednesday, it is very important that you get the notes, REVIEW THESE NOTES, and review the class outline from last Wednesday's class.

TODAY:

(1) Introduction to regression analysis.

When we reject the null hypothesis in a test of significance on rho (the population correlation coefficient), this indicates that there is a significant linear (line-like) relationship between the two variables, as in the training time and performance time example.
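For reference, the standard test of H0: rho = 0 plugs the sample correlation coefficient r, computed from n pairs of observations, into a t statistic with n - 2 degrees of freedom. This assumes that this is the form of the test covered last week. We reject H0 when |t| exceeds the critical value t(alpha/2, n - 2), or equivalently when the p-value is below alpha.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{df} = n-2
$$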

What sort of information does a straight-line equation yield? (1) The slope, (2) the y intercept, and (3) predictions of y based on x.
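Written out, the fitted line has the form below. In it, b0 is the estimated y intercept and b1 is the estimated slope, meaning the change in predicted y for a one-unit increase in x. The term y-hat is the predicted value of y at a given x.

$$
\hat{y} = b_0 + b_1 x
$$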

The technique used to fit the line is OLS (ordinary least squares), or LS (least squares) for short.
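LS chooses b0 and b1 to minimize the sum of the squared residuals e_i = y_i - y-hat_i. Solving that minimization gives the closed-form estimates below.

As a worked check, take the 17 observations of training time (x = TT) and performance time (y = PT) in the SAS program later in this outline. This is a hand calculation, not course-provided output. The sums are Σx = 389, Σy = 279, Σx² = 9195, and Σxy = 6225.

$$
\min_{b_0,\,b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2
$$

$$
S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad
S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}, \qquad
b_1 = \frac{S_{xy}}{S_{xx}}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
$$

$$
S_{xx} = 9195 - \frac{389^2}{17} = \frac{4994}{17} \approx 293.76, \qquad
S_{xy} = 6225 - \frac{389 \cdot 279}{17} = -\frac{2706}{17} \approx -159.18
$$

$$
b_1 = \frac{-2706}{4994} \approx -0.542, \qquad
b_0 = \frac{279}{17} - (-0.542)\frac{389}{17} \approx 28.81, \qquad
\hat{y} \approx 28.81 - 0.542\,x
$$

Read this way, each additional hour of training is associated with roughly 0.54 fewer minutes of performance time.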

 

Outlier analysis. 

Like the sample correlation coefficient, LS estimates are sensitive to outliers. With simple regression (only one explanatory variable), we can usually identify outliers visually, for example from a scatterplot. A technique for identifying outliers statistically was introduced in problem set 2: least median of squares (LMS). LMS is an extremely powerful technique for identifying outliers.

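The idea is that, as you know, medians are far less sensitive to outliers than means are. LMS exploits this by replacing the sum of squared residuals that LS minimizes with the median of the squared residuals.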
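In symbols, let e_i = y_i - b0 - b1 x_i be the residuals. The two criteria differ only in how the squared residuals are combined.

$$
\text{LS:}\quad \min_{b_0,\,b_1} \sum_{i=1}^{n} e_i^{2}
\qquad\qquad
\text{LMS:}\quad \min_{b_0,\,b_1} \ \operatorname{med}_{i}\, e_i^{2}
$$

The median does not depend on how large the biggest squared residuals are. As a result, close to half of the points can lie arbitrarily far from the line without pulling the LMS fit toward them. In Rousseeuw's usual approach, observations whose standardized LMS residual exceeds 2.5 in absolute value are then flagged as outliers.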



 

Writing SAS code to generate LMS statistics:

As an example, take the following program:

TITLE'first example of LMS';
TITLE2'PERFORMANCE TIME';
DATA TIME;
LABEL TT='TRAINING TIME (HR)'
      PT='PERFORMANCE TIME (MIN)';
INPUT TT PT;
DATALINES;
27 15
28 14
22 18
22 17
15 22
29 13
24 15
20 16
26 14
15 21
22 16
25 18
23 17
28 13
20 15
25 15
18 20
;

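proc iml;
use time;
read all var {TT} into x;
/*you would list the explanatory variables here*/
read all var {PT} into y;
/*you would list the dependent variable here*/
optn=j(8,1,.);   /*8x1 option vector; missing entries take their default values*/
optn[2]=2;       /*amount of printed output*/
optn[3]=1;       /*also compute and print the classical LS regression*/
optn[8]=0;       /*do not compute covariance matrices*/
call lms(sc,coef,wgt,optn,y,x);
run;
/*ignore the red and leave the six lines above as they are here*/
quit;            /*ends PROC IML*/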