CLASS OUTLINE DAY 8:

Friday 12-2-2011.  Week 3.  Winter 2011-2012

(1) Quiz 1 and SAS assignment #1 have both been moved to next Monday, December 5th.  (2) We started "regression" analysis on Monday. (3) Note the handout on the project, posted to Moodle Wednesday morning.  Please take some time to read it; I should be meeting with your group soon to discuss project ideas with you.  (4) The midterm exam is scheduled for Friday of week five (12-16-2011). (5) Keys to problem sets 1 and 2 have been posted to Moodle.  Note the lab times below for help with both class material and SAS and JMP.

Stats II

Kevin Zaker

Sundays 5:00-7:00p.m. Olin 109

Tuesdays 4:00-6:00p.m. Evald 305

Thursdays  3:00-5:00p.m. Evald 305

 

Lynn Reinacher

Mondays & Wednesdays 2:00-4:00p.m.  Evald 305

 

Jessica DuPerow

Tuesdays & Thursdays 1:00-3:00p.m. Evald 305

 


Jacob O’Rourke

Tuesdays & Thursdays 12:30-2:30p.m. Evald 311

 

SAS LAB 

Kevin Zaker

Sundays 5:00-7:00p.m. Olin 109  

Tuesdays & Thursdays  3:00-5:00p.m. Evald 305

 

Lynn Reinacher

Mondays & Wednesdays 2:00-4:00p.m.  Evald 305

 

Justin Sell

Tuesdays & Wednesdays 5:00-7:00p.m.  Olin 109

 

 

Stats II and SAS

Alex Kurian

Email as needed: alexkurian09@augustana.edu

 




The coefficient of determination, R²

Note where on the SAS or JMP output you find the coefficient of determination, R².

R squared measures how well the "line of best fit" (the least squares sample regression equation) fits the data.

It is calculated as the ratio of the "explained" variation in y to the total variation in y.

The total variation in y (SST, the total sum of squares) is the numerator of the sample variance formula, Σ(y − ȳ)².  The "explained" variation in y is the total variation you start with minus the variation left unaccounted for by the estimated regression line, SSE, the error sum of squares (sum of squared residuals).  So R² = (SST − SSE)/SST = 1 − SSE/SST.
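To make the ratio concrete, here is a minimal Python sketch (not SAS output, and not how SAS computes it internally) that fits the least squares line and then forms R² = (SST − SSE)/SST. The helper name `r_squared` is hypothetical; the data are the training-time/performance-time pairs from the LMS example program in these notes.

```python
def r_squared(x, y):
    """Fit y = b0 + b1*x by least squares and return R^2 = (SST - SSE) / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # LS slope
    b0 = y_bar - b1 * x_bar        # LS intercept
    # Total variation: numerator of the sample variance of y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Unexplained variation: error sum of squares (sum of squared residuals).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Explained variation is SST - SSE; R^2 is its share of the total.
    return (sst - sse) / sst


# Training time (hr) and performance time (min) from the LMS example data.
tt = [27, 28, 22, 22, 15, 29, 24, 20, 26, 15, 22, 25, 23, 28, 20, 25, 18]
pt = [15, 14, 18, 17, 22, 13, 15, 16, 14, 21, 16, 18, 17, 13, 15, 15, 20]
print(round(r_squared(tt, pt), 4))
```

For a tiny check by hand: with x = 1, 2, 3 and y = 1, 3, 2, the LS line is ŷ = 1 + 0.5x, SSE = 1.5 and SST = 2, so R² = 0.5/2 = 0.25.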


 

Outlier analysis. 

As noted on Wednesday, least squares (LS) estimates, like the sample correlation coefficient, are sensitive to outliers.  With simple regression we can usually identify outliers visually.  A technique for identifying outliers statistically was introduced in problem set 2: least median of squares (LMS).  LMS is an extremely powerful technique for identifying outliers.

The idea is that, as you know, medians are far less sensitive to outliers than means are.  LMS chooses the line that minimizes the median of the squared residuals rather than their sum.
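A quick Python illustration of why medians resist outliers, using small hypothetical numbers (not course data): changing one value drags the mean a long way while the median does not move.

```python
from statistics import mean, median

clean = [13, 14, 15, 15, 16]
with_outlier = [13, 14, 15, 15, 60]   # hypothetical: last value replaced by an outlier

print(mean(clean), median(clean))                 # 14.6 15
print(mean(with_outlier), median(with_outlier))   # 23.4 15
```

LS regression minimizes a sum (a mean, up to a constant) of squared residuals, so it reacts like the mean here; LMS uses a median-type criterion, so it reacts like the median.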


Keep in mind the basic four-step process:
        (1) Run LMS.
        (2) Go to the LMS residuals section of the printout.
        (3) Look at the Res/S column (each LMS residual divided by the robust scale estimate S).
        (4) If Res/S for a particular observation is greater than 2.5 in absolute value, we regard that observation as an outlier.
The logic behind LMS and the 2.5 cutoff have been discussed in class.
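Step (4) is just a threshold rule. A minimal Python sketch of it, assuming you already have the residuals and the scale S (for example, copied from the LMS printout); the function name `flag_outliers` and the numbers in the usage line are hypothetical.

```python
def flag_outliers(residuals, scale, cutoff=2.5):
    """Return the 1-based observation numbers whose |residual / scale| exceeds cutoff."""
    flagged = []
    for i, r in enumerate(residuals, start=1):
        if abs(r / scale) > cutoff:   # strictly greater than the cutoff
            flagged.append(i)
    return flagged


# Hypothetical residuals and scale: only observation 3 has |Res/S| > 2.5 (4.1 / 1.2 ≈ 3.4).
print(flag_outliers([0.3, -0.8, 4.1, 0.2], 1.2))
```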


 

Writing SAS code to generate LMS statistics:

As an example, take the following program:

TITLE 'first example of LMS';
TITLE2 'PERFORMANCE TIME';

DATA TIME;
   LABEL TT='TRAINING TIME (HR)'
         PT='PERFORMANCE TIME (MIN)';
   INPUT TT PT;
   DATALINES;
27 15
28 14
22 18
22 17
15 22
29 13
24 15
20 16
26 14
15 21
22 16
25 18
23 17
28 13
20 15
25 15
18 20
;

proc iml;
   use time;
   read all var {TT} into x;   /* list the explanatory variable(s) here */
   read all var {PT} into y;   /* list the dependent variable here */
   optn = j(8,1,.);            /* 8 options; missing values mean SAS defaults (an intercept is added) */
   optn[2] = 2;                /* print level that includes the residual table (Res/S) */
   optn[3] = 1;                /* also run weighted LS on the observations with small LMS residuals */
   optn[8] = 0;                /* keep as given */
   call lms(sc, coef, wgt, optn, y, x);
quit;
/* leave the six lines above (optn through quit) as they are */

Note that whenever I want to run LMS, I simply come to this program, copy it, paste it into the SAS program editor, and modify it to fit the problem I am working on at the time.
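For intuition about what CALL LMS is doing, here is a rough Python sketch for simple regression. It is not SAS's algorithm: it only searches lines through pairs of points (an elemental-subset search, so it approximates the exact LMS line), it assumes the quantile h = (n + p + 1) // 2 (our reading of the SAS/IML default; other texts use the plain median), and it uses the preliminary robust scale of Rousseeuw and Leroy, so its Res/S values may differ somewhat from the printout. The names `lms_line` and `lms_outliers` are hypothetical.

```python
import itertools
import math


def lms_line(x, y):
    """Approximate least-median-of-squares line for simple regression.

    Tries the line through every pair of points with distinct x values and keeps
    the one whose h-th smallest squared residual is smallest.
    """
    n, p = len(x), 2                 # p = 2 parameters: intercept and slope
    h = (n + p + 1) // 2             # assumed quantile (see lead-in)
    best = None
    for i, j in itertools.combinations(range(n), 2):
        if x[i] == x[j]:
            continue                 # vertical line: slope undefined
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        sq = sorted((yk - (b0 + b1 * xk)) ** 2 for xk, yk in zip(x, y))
        crit = sq[h - 1]             # h-th smallest squared residual
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    crit, b0, b1 = best
    # Preliminary robust scale (Rousseeuw & Leroy style); SAS may print a refined S.
    s = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(crit)
    return b0, b1, max(s, 1e-12)     # guard against an exact fit (s = 0)


def lms_outliers(x, y, cutoff=2.5):
    """Return (1-based numbers of flagged observations, list of Res/S values)."""
    b0, b1, s = lms_line(x, y)
    res_over_s = [(yk - (b0 + b1 * xk)) / s for xk, yk in zip(x, y)]
    flagged = [k + 1 for k, r in enumerate(res_over_s) if abs(r) > cutoff]
    return flagged, res_over_s


# Training time (hr) and performance time (min) from the SAS program above.
tt = [27, 28, 22, 22, 15, 29, 24, 20, 26, 15, 22, 25, 23, 28, 20, 25, 18]
pt = [15, 14, 18, 17, 22, 13, 15, 16, 14, 21, 16, 18, 17, 13, 15, 15, 20]
flagged, res_over_s = lms_outliers(tt, pt)
print("flagged observations:", flagged)
```

The pairwise search keeps the sketch short and exact enough for small class data sets; for real work, use the SAS output, which is what the four-step process refers to.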



MULTIPLE REGRESSION (SECTIONS 1-3 OF CHAPTER 12) (Note: not on Monday's quiz!)

R square; "multiple regression."

Hours of internet use example: simple regression of hours of internet use on number of kids, and its R squared (i.e., the coefficient of determination).

MLR (multiple linear regression): estimation of regression planes versus regression lines.  With two explanatory variables, the estimated equation describes a plane in three dimensions rather than a line in two.
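A minimal Python sketch of how a regression plane is estimated: least squares again, now solving the normal equations (X'X)b = X'y with a column of ones for the intercept. The function `fit_mlr` and the toy data (built to lie exactly on y = 1 + 2·x1 + 3·x2) are hypothetical; in class we get these estimates from SAS or JMP.

```python
def fit_mlr(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # leading column of ones = intercept
    m = len(X[0])
    # Build X'X (m x m) and X'y (length m).
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(m)]
    # Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    A = [xtx[a] + [xty[a]] for a in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [A[a][m] / A[a][a] for a in range(m)]


# Hypothetical data lying exactly on the plane y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [3, 4, 6, 8, 13]
print(fit_mlr(rows, y))   # approximately [1.0, 2.0, 3.0]
```

With one explanatory variable the same function returns an intercept and slope, i.e., a regression line; each extra variable adds one coefficient and one dimension.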

        (a) Running MLR with SAS:
                proc reg; model H=C I E N; run;

        (b) Interpretation of results (PowerPoint).

        (c) Forecasts using the estimated regression equation.

        (d) Using JMP to get MLR results.

        (e) Reading SAS output.
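Item (c) amounts to plugging new values of the explanatory variables into the estimated equation ŷ = b0 + b1·x1 + ... + bk·xk. A minimal Python sketch; the function `forecast` and the coefficient values are hypothetical, chosen only to mirror a four-predictor model like H = C I E N, and are not course results.

```python
def forecast(coefs, x_new):
    """Plug new predictor values into an estimated equation b0 + b1*x1 + ... + bk*xk."""
    b0, slopes = coefs[0], coefs[1:]
    if len(slopes) != len(x_new):
        raise ValueError("need one value per explanatory variable")
    return b0 + sum(b * x for b, x in zip(slopes, x_new))


# Hypothetical estimates: intercept, then coefficients for C, I, E, N.
coefs = [2.0, 0.5, 1.5, -0.25, 3.0]
print(forecast(coefs, [4, 2, 8, 1]))   # 8.0
```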

 

 
