Statistics
posted by Pam
Instructions for Data Sets: Choose one of the data sets A − J below or as assigned by your instructor. Only the first 10 observations are shown for each data set (files are on the CD). In each data set, the dependent variable (response) is the first variable. Choose the independent variables (predictors) as you judge appropriate. Use a spreadsheet or a statistical package (e.g., MegaStat or MINITAB) to perform the necessary regression calculations and to obtain the required graphs. Write a concise report answering questions 13.9 through 13.25 (or a subset of these questions assigned by your instructor). Label sections of your report to correspond to the questions. Insert tables and graphs in your report as appropriate. You may work with a partner if your instructor allows it.
13.9 Is this cross-sectional data or time-series data? What is the unit of observation (e.g., firm, individual, year)?
13.10 Are the X and Y data well-conditioned? If not, make any transformations that may be necessary and explain.
13.11 State your a priori hypotheses about the sign (+ or −) of each predictor and your reasoning about cause and effect. Would the intercept have meaning in this problem? Explain.
13.12 Does your sample size fulfill Evans's Rule (n/k ≥ 10) or at least Doane's Rule (n/k ≥ 5)?
13.13 Perform the regression and write the estimated regression equation (round off to 3 or 4 significant digits for clarity). Do the coefficient signs agree with your a priori expectations?
13.14 Does the 95 percent confidence interval for each predictor coefficient include zero? What conclusion can you draw? Note: Skip this question if you are using MINITAB, since predictor confidence intervals are not shown.
13.15 Do a two-tailed t test for zero slope for each predictor coefficient at α = .05. State the degrees of freedom and look up the critical value in Appendix D (or from Excel).
13.16 (a) Which p-values indicate predictor significance at α = .05? (b) Do the p-values support the conclusions you reached from the t tests? (c) Do you prefer the t test or the p-value approach? Why?
13.17 Based on the R² and ANOVA table for your model, how would you describe the fit?
13.18 Use the standard error to construct an approximate prediction interval for Y. Based on the width of this prediction interval, would you say the predictions are good enough to have practical value?
13.19 (a) Generate a correlation matrix for your predictors. Round the results to three decimal places. (b) Based on the correlation matrix, is collinearity a problem? What rules of thumb (if any) are you using?
13.20 (a) If you did not already do so, rerun the regression requesting variance inflation factors (VIFs) for your predictors. (b) Do the VIFs suggest that multicollinearity is a problem? Explain.
13.21 (a) If you did not already do so, request a table of standardized residuals. (b) Are any residuals outliers (three standard errors) or unusual (two standard errors)?
13.22 If you did not already do so, request leverage statistics. Are any observations influential? Explain.
13.23 If you did not already do so, request a histogram of standardized residuals and/or a normal probability plot. Do the residuals suggest non-normal errors? Explain.
13.24 If you did not already do so, request a plot of residuals versus the fitted Y. Is heteroscedasticity a concern?
13.25 If you are using time-series data, perform one or more tests for autocorrelation (visual inspection of residuals plotted against observation order, runs test, Durbin-Watson test). Is autocorrelation a concern?
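For questions 13.12 and 13.15, the sample-size check and the degrees of freedom are simple arithmetic. The following is a minimal sketch, not part of the textbook: the (n, k) pairs are copied from the data set headings in this post, the function names are made up for illustration, and the degrees of freedom assume a model with an intercept, so df = n − k − 1.

```python
# (n, k) pairs copied from the data set headings in this post.
DATA_SETS = {
    "A": (43, 4), "C": (32, 5), "D": (41, 4), "E": (50, 8), "F": (55, 4),
    "G": (35, 7), "H": (50, 5), "I": (50, 8), "J": (637, 4),
}


def sample_size_rule(n, k):
    """Classify n/k against Evans's Rule (>= 10) and Doane's Rule (>= 5)."""
    ratio = n / k
    if ratio >= 10:
        return "Evans"
    if ratio >= 5:
        return "Doane"
    return "neither"


def t_test_df(n, k):
    """Degrees of freedom for the t test on one slope (model with intercept)."""
    return n - k - 1


for name, (n, k) in DATA_SETS.items():
    print(name, round(n / k, 2), sample_size_rule(n, k), t_test_df(n, k))
```

The critical value for the two-tailed test still has to be looked up for the reported degrees of freedom, as 13.15 asks.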
DATA SET A Mileage and Other Characteristics of Randomly Selected Vehicles
(n = 43, k = 4) Mileage
Obs Vehicle City Length Width Weight Japan
1 Acura CL 20 192 69 3,450 1
2 Acura TSX 23 183 59 3,320 1
3 BMW 3-Series 19 176 69 3,390 0
4 Buick Century 20 195 73 3,350 0
5 Buick Rendezvous 18 187 74 4,230 0
6 Cadillac Seville 18 201 75 4,050 0
7 Chevrolet Corvette 19 180 74 3,255 0
8 Chevrolet Silverado 1500 14 228 79 4,935 0
9 Chevrolet TrailBlazer 15 192 75 4,660 0
10 Chrysler Pacifica 17 199 79 4,660 0
...
City = EPA miles per gallon in city driving, Length = vehicle length (inches), Width = vehicle width (inches), Weight = weight (pounds), Japan = 1 if carmaker is Japanese, 0 otherwise.
Source: Consumer Reports New Car Buying Guide 2003–2004 (Consumers Union, 2003). Sampling methodology was to select the vehicle on every fifth page starting at page 40. Data are intended for purposes of statistical education and should not be viewed as a guide to vehicle performance.
DATA SET C Assessed Value of Small Medical Office Buildings (n = 32, k = 5)
Assessed
Obs Assessed Floor Offices Entrances Age Freeway
1 1,796 4,790 4 2 8 0
2 1,544 4,720 3 2 12 0
3 2,094 5,940 4 2 2 0
4 1,968 5,720 4 2 34 1
5 1,567 3,660 3 2 38 1
6 1,878 5,000 4 2 31 1
7 949 2,990 2 1 19 0
8 910 2,610 2 1 48 0
9 1,774 5,650 4 2 42 0
10 1,187 3,570 2 1 4 1
...
Assessed = assessed value (thousands of dollars), Floor = square feet of floor space, Offices = number of offices in the building, Entrances = number of customer entrances (excluding service doors), Age = age of the building (years), Freeway = 1 if within one mile of freeway, 0 otherwise.
DATA SET D Changes in Consumer Price Index and Money Supply Components
(n = 41, k = 4) Money
Year ChgCPI CapUtil ChgM1 ChgM2 ChgM3
1960 0.7 80.1 0.5 4.9 5.2
1961 1.3 77.3 3.2 7.4 8.1
1962 1.6 81.4 1.8 8.1 8.9
1963 1 83.5 3.7 8.4 9.3
1964 1.9 85.6 4.6 8.0 9.0
1965 3.5 89.5 4.7 8.1 9.0
1966 3 91.1 2.5 4.6 4.8
1967 4.7 87.2 6.6 9.3 10.4
1968 6.2 87.1 7.7 8.0 8.8
1969 5.6 86.6 3.3 3.7 1.4
1978 13.3 85.2 8.0 7.5 11.8
1979 12.5 85.3 6.9 7.9 10.0
...
ChgCPI = percent change in the Consumer Price Index (CPI) over previous year, CapUtil = percent utilization of manufacturing capacity in current year, ChgM1 = percent change in currency and demand deposits (M1) over previous year, ChgM2 = percent change in small time deposits and other near-money (M2) over previous year, ChgM3 = percent change in large time deposits, Eurodollars, and other institutional balances (M3) over previous year.
Source: Economic Report of the President, 2003. These variables are selected from LearningStats (TimeSeries Data).
DATA SET E College Graduation Rate and Selected Characteristics of U.S. States in 1990 (n = 50, k = 8)
ColGrads
State ColGrad% Dropout EdSpend Urban Age Femlab Neast Seast West
AL 15.6 35.3 3627 60.4 33.0 51.8 0 1 0
AK 23.0 31.6 8330 67.5 29.4 64.9 0 0 1
AZ 20.3 27.5 4309 87.5 32.2 55.6 0 0 1
AR 13.4 23.3 3700 53.5 33.8 53.6 0 1 0
CA 23.4 32.2 4491 92.6 31.5 56.0 0 0 1
CO 27.0 25.9 5064 82.4 32.5 63.3 0 0 1
CT 27.2 25.1 7602 79.1 34.4 63.9 1 0 0
DE 21.4 31.5 5865 73.0 32.9 60.7 1 0 0
FL 18.3 38.9 5276 84.8 36.4 54.6 0 1 0
GA 19.3 37.3 4466 63.2 31.6 57.4 0 1 0
...
ColGrad% = percent of state population with a college degree, Dropout = percent of high school students who do not graduate, EdSpend = per capita spending on K–12 education, Urban = percent of state population living in urban areas, Age = median age of state's population, Femlab = percent of adult females who are in the labor force, Neast = 1 if state is in the Northeast, 0 otherwise, Seast = 1 if state is in the Southeast, 0 otherwise, West = 1 if state is in the West, 0 otherwise. Source: Statistical Abstract of the United States. These variables are selected from LearningStats (States).
DATA SET F Characteristics of Selected Piston Aircraft (n = 55, k = 4)
CruiseSpeed
Obs Manufacturer/Model Cruise Year TotalHP NumBlades Turbo
1 Cessna Turbo Stationair TU206 148 1981 310 3 1
2 Cessna 310 R 194 1975 570 3 0
3 Piper 125 Tri Pacer 107 1951 125 2 0
4 Maule Comet 115 1996 180 2 0
5 Cessna P210 186 1982 285 3 0
6 Piper Dakota 147 1979 235 2 0
7 Cessna 1825 Skylane 140 1997 230 3 0
8 Cessna 421B 234 1974 750 3 0
9 Cessna T210K 190 1970 285 3 1
10 Piper Super Cab 100 1975 150 2 0
...
Cruise = best cruise speed (knots indicated air speed) at 65–75 percent power, Year = year of manufacture, TotalHP = total horsepower (both engines if twin), NumBlades = number of propeller blades, Turbo = 1 if turbocharged, 0 otherwise. Source: Flying Magazine (various issues). Data are for educational purposes only and not as a guide to performance. These variables are selected from LearningStats (Technology Data).
DATA SET G Characteristics of Randomly Chosen Hydrocarbons (n = 35, k = 7) Retention
Obs Name Ret MW BP RI H1 H2 H3 H4 H5
1 2,4,4-trimethyl-2-pentene 153.57 112.215 105.06 1.4135 0 1 0 0 0
2 1,5-cyclooctadiene 237.56 108.183 150.27 1.4905 0 0 0 1 0
3 methylcyclohexane 153.57 98.188 101.08 1.4206 0 0 1 0 0
4 m-diethylbenzene 281.50 134.221 181.29 1.4931 0 0 0 0 1
5 2,2,4-trimethylpentane 139.24 114.231 99.39 1.3890 1 0 0 0 0
6 undecane 288.50 156.310 196.00 1.4170 1 0 0 0 0
7 toluene 174.00 92.140 110.60 1.4970 0 0 0 0 1
8 1,7-octadiene 172.20 110.200 117.00 1.4220 0 1 0 0 0
9 beta-pinene 254.00 136.240 165.00 1.4780 0 0 0 1 0
10 methylcyclohexane 152.40 98.190 101.00 1.4220 0 0 1 0 0
...
Ret = chromatographic retention time (seconds), MW = molecular weight (gm/mole), BP = boiling point in °C, RI = refractive index (dimensionless), Class = hydrocarbon class (H1 = acyclic saturated, H2 = acyclic unsaturated, H3 = cyclic saturated, H4 = cyclic unsaturated, H5 = aromatic). Source: Data are courtesy of John Seeley of Oakland University. This is a 50 percent sample of the full data set found in LearningStats (Technology Data).
DATA SET H Michigan High School Top 50 Football Players in 2003 (n = 50, k = 5)
Football50
Obs Position Weight Height Line LB DB RB
1 OT 317 77 1 0 0 0
2 LB/FB 235 74 0 1 0 0
3 LB 220 73 0 1 0 0
4 RB 190 70 0 0 0 1
5 DT 285 76 1 0 0 0
6 OL 315 75 1 0 0 0
7 DE 245 76 1 0 0 0
8 DL 255 77 1 0 0 0
9 WR 180 69 0 0 0 1
10 LB 212 75 0 1 0 0
...
Weight = weight in pounds, Height = height in inches, Line = lineman (0, 1), LB = linebacker (0, 1), DB = defensive back (0, 1), RB = running back (0, 1). Source: Detroit Free Press, February 5, 2004, p. 9D.
DATA SET I Body Fat and Personal Measurements for Males (n = 50, k = 8) BodyFat2
Obs Fat% Age Weight Height Neck Chest Abdomen Hip Thigh
1 12.6 23 154.25 67.75 36.2 93.1 85.2 94.5 59.0
2 6.9 22 173.25 72.25 38.5 93.6 83.0 98.7 58.7
3 24.6 22 154.00 66.25 34.0 95.8 87.9 99.2 59.6
4 10.9 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1
5 27.8 24 184.25 71.25 34.4 97.3 100.0 101.9 63.2
6 20.6 24 210.25 74.75 39.0 104.5 94.4 107.8 66.0
7 19.0 26 181.00 69.75 36.4 105.1 90.7 100.3 58.4
8 12.8 25 176.00 72.50 37.8 99.6 88.5 97.1 60.0
9 5.1 25 191.00 74.00 38.1 100.9 82.5 99.9 62.9
10 12.0 23 198.25 73.50 42.1 99.6 88.6 104.1 63.1
...
Fat% = percent body fat, Age = age (yrs.), Weight = weight (lbs.), Height = height (in.), Neck = neck circumference (cm), Chest = chest circumference (cm), Abdomen = abdomen circumference (cm), Hip = hip circumference (cm), Thigh = thigh circumference (cm). Data are a subsample of 252 males analyzed in Roger W. Johnson (1996), "Fitting Percentage of Body Fat to Simple Body Measurements," Journal of Statistics Education 4, no. 1.
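For questions 13.19 and 13.20, a correlation matrix of the predictors can be computed directly. The sketch below is only an illustration under stated assumptions: it uses three predictors chosen arbitrarily from Data Set I and only the ten observations shown above (not the full n = 50), and the function names are hypothetical. The VIF shortcut 1/(1 − r²) holds only for a model with exactly two predictors; with more predictors each VIF needs its own auxiliary regression.

```python
from math import sqrt

# First ten observations of Data Set I as shown in the post (not the full file).
COLUMNS = {
    "Weight": [154.25, 173.25, 154.00, 184.75, 184.25,
               210.25, 181.00, 176.00, 191.00, 198.25],
    "Abdomen": [85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.5, 88.6],
    "Hip": [94.5, 98.7, 99.2, 101.2, 101.9, 107.8, 100.3, 97.1, 99.9, 104.1],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, rounded to three decimals."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 3) for b in names}
            for a in names}


def vif_two_predictors(r):
    """VIF when the model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


matrix = correlation_matrix(COLUMNS)
for name, row in matrix.items():
    print(name, row)
```

Correlations from ten rows are only a rough guide; the same functions can be applied to the full columns once the data file is loaded.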
DATA SET J Used Vehicle Prices (n = 637 observations, k = 4 predictors)
Vehicles
Obs Model Price Age Car Truck SUV
1 Astro GulfStream Conversion 12,988 3 0 0 0
2 Astro LS 4.3L V6 5,950 9 0 0 0
3 Astro LS V6 19,995 4 0 0 0
4 Astro V6 7 Passenger 5,763 6 0 0 0
5 Avalanche 4×4 20,988 3 0 1 0
6 Avalanche 5.3L V8 4×4 22,700 3 0 1 0
7 Avalanche V8 4 DR 23,671 2 0 1 0
8 Avalanche V8 4×4 24,995 2 0 1 0
9 Avalanche V8 4×4 & 2×2 19,990 3 0 1 0
10 Avalanche Z71 20,995 3 0 1 0
...
Price = asking price ($), Age = vehicle age (yrs), Car = 1 if passenger car, 0 otherwise, Truck = 1 if truck, 0 otherwise, SUV = 1 if sport utility vehicle, 0 otherwise. (Van is the omitted fourth binary.) Source: DetroitAutoFocus 4, Issue 38 (Sept. 17–23, 2004). Data are for educational purposes only and should not be used as a guide to depreciation.

We do not do your work for you. Once you have answered your questions, we will be happy to give you feedback on your work. Although it might require more time and effort, you will learn more if you do your own work. Isn't that why you go to school?
Since we are volunteers, and our time is also committed to other activities, it would help to include just one of these problems for us to respond to.
Using Data Set I
State your a priori hypotheses about the sign (+ or −) of each predictor and your reasoning about cause and effect. Would the intercept have meaning in this problem? Explain.
Perform the regression and write the estimated regression equation (round off to 3 or 4 significant digits for clarity). Do the coefficient signs agree with your a priori expectations?
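As a starting point for the regression step, here is a minimal sketch of ordinary least squares in plain Python. It is an illustration only, with several assumptions: it uses just the ten observations of Data Set I shown in the post, it picks Abdomen and Weight as predictors purely as an example, and the helper names `solve` and `ols` are made up here. A spreadsheet or statistical package run on the full file will give different numbers and the full output (t statistics, p-values, ANOVA table) that the questions ask for.

```python
from math import sqrt

# First ten observations of Data Set I as shown in the post (not the full file).
FAT = [12.6, 6.9, 24.6, 10.9, 27.8, 20.6, 19.0, 12.8, 5.1, 12.0]
ABDOMEN = [85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.5, 88.6]
WEIGHT = [154.25, 173.25, 154.00, 184.75, 184.25,
          210.25, 181.00, 176.00, 191.00, 198.25]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y, *predictors):
    """Return (coefficients, standard_error); coefficients[0] is the intercept."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]
    p = len(rows[0])  # k predictors plus the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sqrt(sse / (len(y) - p))  # divisor is n - k - 1


if __name__ == "__main__":
    beta, se = ols(FAT, ABDOMEN, WEIGHT)
    print("intercept, Abdomen, Weight:", [round(b, 4) for b in beta])
    # Rough prediction interval for the first observation: fitted value +/- 2 SE.
    y_hat = beta[0] + beta[1] * ABDOMEN[0] + beta[2] * WEIGHT[0]
    print("approx. interval:", round(y_hat - 2 * se, 1), round(y_hat + 2 * se, 1))
```

The ± 2 standard errors interval is the quick approximation question 13.18 refers to; with so few degrees of freedom the exact t multiplier is larger than 2, so the interval printed here is somewhat too narrow. Comparing the signs of the printed slopes with your a priori hypotheses is the part that needs your own reasoning.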