Suppose for the same 3000 women, we also measured their weight. Suppose the data were again normally distributed. The average weight is 143 lbs., and the standard deviation is 30 lbs. Suppose the correlation between height and weight is r = +.69.
What is the slope of the best-fit line to predict height from weight? What is the intercept of the line? (Make height the Y variable and weight the X variable.)
Write a sentence or two that says what the equation of the line tells you about the relation between the variables: that is, when weight increases by one pound, how much does the predicted value of height increase?
What is your best guess for the height of a woman who weighs 143 lbs? What is your best guess for the height of a woman who weighs 158 lbs?
4 points for part A: 1 point for correct final answer, 1 point for showing work – this applies to both slope and intercept. You don’t need any original data here – you should use the shortcut formulas in the book, first for the slope, and then for the Y-intercept.
Slope = .? Intercept = ?
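The shortcut formulas referred to above are the standard ones: slope = r × (SD of Y / SD of X), and intercept = (mean of Y) − slope × (mean of X). The sketch below shows the arithmetic. The weight statistics and r come from the problem statement, but the height mean and SD are given in Question 1 and are not repeated here, so the height values in the code are hypothetical placeholders that you would replace with the real ones. The function name `regression_line` is also just an illustrative choice.

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope, intercept) for predicting Y from X using summary statistics."""
    slope = r * sd_y / sd_x                 # shortcut formula for the slope
    intercept = mean_y - slope * mean_x     # line passes through (mean_x, mean_y)
    return slope, intercept


# Weight (X) statistics and r are taken from the problem statement.
R = 0.69
MEAN_WEIGHT, SD_WEIGHT = 143.0, 30.0

# HYPOTHETICAL placeholders: substitute the height mean and SD from Question 1.
MEAN_HEIGHT, SD_HEIGHT = 64.0, 2.5

slope, intercept = regression_line(R, MEAN_WEIGHT, SD_WEIGHT, MEAN_HEIGHT, SD_HEIGHT)
print(f"slope = {slope:.4f} inches per pound")
print(f"intercept = {intercept:.2f} inches")
```

Because the intercept is defined from the means, the fitted line always passes through the point (mean weight, mean height), whatever height values you plug in.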
2 points for part B: The slope is ______, so predicted height will _______ with weight. For each additional pound added, one should predict an increase of _____ inches in height.
4 points for part C: 2 points for each guess: 1 for correct final answer, 1 for showing work.
Plug and chug into regression equation: Y = ____________
For the second, Y = _________
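A quick way to sanity-check both guesses is to work in standard-deviation units: the predicted z-score of Y is r times the z-score of X. The sketch below uses only the weight statistics and r from the problem statement. The helper name `predicted_z_y` is an illustrative choice, not something from the book.

```python
def predicted_z_y(r, x, mean_x, sd_x):
    """Predicted z-score of Y for a given raw X value: z_y' = r * z_x."""
    z_x = (x - mean_x) / sd_x
    return r * z_x


R = 0.69
MEAN_WEIGHT, SD_WEIGHT = 143.0, 30.0

for weight in (143.0, 158.0):
    z_height = predicted_z_y(R, weight, MEAN_WEIGHT, SD_WEIGHT)
    print(f"weight {weight:.0f} lbs: predicted height is {z_height:+.3f} SD from the mean height")
```

To convert back to inches, multiply the predicted z-score by the height SD and add the height mean (both from Question 1). The result should match what you get by plugging the weight into the regression equation.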
3. (20 pts) For the same data as in Questions 1-2, make a fairly detailed drawing by hand of what the scatterplot would look like. (You don’t have the original data, but you can actually provide quite a bit of information about the scatterplot!) Be sure to clearly indicate each of the following: which variable is X or Y, the range on the X and Y axis of each of the variables (you can figure out the approximate range that will include most of the scores of height and weight by knowing the means and standard deviations, and by using Table A in Appendix D), the equation of the best-fit line for predicting Y from X, and a sense of the dispersion of points that is close to a correlation of +.69 (see examples in chapter 6). Be sure to accurately draw the regression line on the scatterplot, label it with its equation (taken from Question 2), and also to include a few sample deviates for the best-fit line (again, see examples in chapter 6); make sure the deviates go in the right direction. (Note: You don’t need to draw all 3000 data points – just include enough to give a sense of the spread of the data.)
Point allocation:
2 points for labeling X axis _______
2 points for labeling Y axis __________
2 points for correct units and range on X axis (± 3 standard deviations – the N is large)
2 points for correct units and range on Y axis (± 3 standard deviations – the N is large)
6 points for correct best-fit line for predicting Y (drawn precisely and labeled with the equation from question #2)
2 points for a scatter of points that is roughly like r = +.69 (see examples in chapter 6)
2 points for sample deviates from best fit line for predicting Y ( )
1 point for title for the graph
1 point for concentration of points in middle of graph, reflecting normally distributed variables
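If you want to see what a cloud of points with r near +.69 looks like before drawing it by hand, the sketch below may help. It computes the mean ± 3 SD axis range for weight (values from the problem statement) and simulates 3000 standardized pairs with the target correlation. It uses only the Python standard library. The function names, the seed, and the choice to simulate in z-score units are all illustrative assumptions, and no plotting is included.

```python
import math
import random

R = 0.69
MEAN_WEIGHT, SD_WEIGHT = 143.0, 30.0


def axis_range(mean, sd, k=3):
    """Range covering mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


def simulate_z_pairs(r, n, seed=0):
    """Draw n (z_x, z_y) pairs from a bivariate normal with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z_x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mix z_x with independent noise so that corr(z_x, z_y) = r.
        z_y = r * z_x + math.sqrt(1 - r * r) * noise
        pairs.append((z_x, z_y))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


print("weight axis range (lbs):", axis_range(MEAN_WEIGHT, SD_WEIGHT))
pairs = simulate_z_pairs(R, 3000)
print(f"sample r of simulated points = {pearson_r(pairs):.2f}")
```

Each simulated pair can be turned into pounds and inches by multiplying by the relevant SD and adding the relevant mean. The points should be densest near the center and thin out toward the ± 3 SD edges.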
4. (15 pts) Use the SPSS data file ‘hw3-spring2015’ for this problem. The data represent the scores of some students on two tests. Make a scatterplot of the data (however, you don’t need to include it when you turn in your assignment).
A. Is the relation linear or nonlinear? Is it perfect or imperfect? Is it positive or negative?
B. What is the value of the correlation coefficient between the scores for each student on the first and second exams?
C. What is the equation of the best-fit line when you try to predict a student’s score on the second test (Y) by looking at their score on the first test (X)?
D. What would you predict as a student’s score on the second test if that student scored a 75 on the first test? What would you predict as a student’s score on the second test if that student scored an 88 on the first test? (Do these by hand – show your work, and round each answer to the nearest whole number.)
The SPSS file is on Blackboard.
A. 3 points, 1 for each subquestion: The relationship is _____, ____, ________.
B. 1 point, straight out of SPSS. The correlation coefficient is _______.
C. 5 points: 1 for correct slope = ______, 1 for correct intercept = _____, 1 for correct general form of the equation: ____________, 2 for naming the variables (1 point for _____ and 1 point for ______, rather than just generic X and Y).
D. 6 points: for each part, 1 for correct final answer, 2 for showing work.
For the first part:
Answer is ________. You should just plug 75 into the regression equation in Part C and show your work.
For the second part:
Answer is ______. You should just plug 88 into the regression equation in Part C and show your work.
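For reference, here is a minimal sketch of what the least-squares fit in Part C and the plug-in step in Part D amount to. The scores below are hypothetical and are not taken from the SPSS file, and the names `fit_line`, `test1`, and `test2` are illustrative. Use the slope and intercept that SPSS reports for the real data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL scores, for illustration only (not the hw3-spring2015 data).
test1 = [60, 70, 80, 90]   # first test (X)
test2 = [63, 75, 77, 89]   # second test (Y)

slope, intercept = fit_line(test1, test2)
print(f"Predicted Test 2 = {intercept:.2f} + {slope:.2f} * Test 1")

for score in (75, 88):
    predicted = intercept + slope * score
    # Round to the nearest whole number, as Part D asks.
    print(f"Test 1 = {score}: predicted Test 2 = {round(predicted)}")
```

Writing the equation with the variable names (rather than generic X and Y) mirrors what Part C asks for.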
5. (20 pts) Answer each of the following True-False questions. Assume that all of the assumptions for correlation and linear regression have been met (including the assumption that the X and Y variables are each normally distributed). In your write-up, just list the sub-question letter (A-J) and whether the statement is True or False – no need to restate the question or to justify your answer.
A. Correlation always implies causation.
B. If a correlation is negative, then as it becomes even more negative, r² decreases.
C. If the units used to measure the Y variable change (like from inches to centimeters), then the value of r will change.
D. As |r| increases, the average deviation of data from the predicted value (according to the best-fit regression line) increases.
E. The best-fit regression line to predict Y when you know X will always go through the point ZX = 0 and ZY = 0.
F. If a positive correlation exists between X and Y, and the range of X is then greatly restricted, |r| must increase.
G. If a positive correlation exists between X and Y, and a new data point is added whose ZX = 3 and ZY = 0, the correlation will increase. (Note: for G, H, I, and J, assume there are many, many data points in the dataset, so that the introduction of new data points doesn’t change the values of the averages in any meaningful way.)
H. If no correlation exists between X and Y, and a new data point is added whose ZX = 2.5 and ZY = 2.5, r will increase.
I. If a negative correlation exists between X and Y, and a new data point is added whose ZX = 2.5 and ZY = 2.5, |r| will decrease.
J. If a positive correlation exists between X and Y, and a new data point is added whose ZX = 2.5 and ZY = 2.5, |r| will decrease.
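One way to check your reasoning on the items about added data points is to simulate a large dataset, add one point, and compare r before and after. The sketch below does this for a single scenario: roughly uncorrelated, roughly standardized data with a new point at (2.5, 2.5). The helper names, the seed, and the sample size of 2000 are illustrative assumptions. Changing the simulated data or the new point lets you explore the other scenarios.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_before_and_after(xs, ys, new_x, new_y):
    """Return r for the original data and r after appending one new point."""
    before = pearson_r(xs, ys)
    after = pearson_r(xs + [new_x], ys + [new_y])
    return before, after


# Simulated data: two independent standard normal variables, so r is near 0
# and the values are approximately z-scores.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [rng.gauss(0, 1) for _ in range(2000)]

before, after = r_before_and_after(xs, ys, 2.5, 2.5)
print(f"r before = {before:+.4f}, r after = {after:+.4f}")
```

With many data points the change in r from one new point is small, so compare the two values to several decimal places.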