# Statistics

Question 1 - Use the dataset InkjetPrinters.txt to answer the following questions:

Data was collected on a sample of 20 different consumer inkjet printers. The dataset contains 5 different properties:

| Variable | Description |
|---|---|
| Printing rate (PPM) | Printing rate (pages per minute) for a print job |
| Photo Time | Time (in seconds) to print 4 x 6 color photos |
| Cost BW | Cost per page (in cents) for printing in black & white |
| Cost Color | Cost per page (in cents) for printing in color |
| Price | Typical retail price (in dollars) |

a. Suppose we want to predict the retail price of an inkjet printer. Which of the following explanatory variables is the best predictor of retail price? Circle the best choice. [1 mark]

- PPM
- Photo Time
- Cost BW
- Cost Color

Provide relevant statistics to support your answer in 1a. [1 mark]

b. Assume the printing rate (PPM) is the best predictor. Fit the regression line to the data.
Answer (keep 2 decimal places)
i. Write down the equation of the regression line [2 marks]

ii. Construct an appropriate plot to assess the adequacy of the linear fit in part b (i). Is the fit adequate? Explain. Insert the plot in the Appendix section (Page 10). [2 marks]

c. In each of the following situations, predict the retail price of the printer if it is appropriate to do so; otherwise, explain why a prediction is not appropriate. [3 marks]
i. The laser printer has a printing rate of 2 pages per minute.

ii. The inkjet printer has a printing rate of 4 pages per minute.

iii. The inkjet printer has a printing rate of 6 pages per minute.

d. Interpret the slope of the regression equation. [1 mark]

Question 2 - Below is the final grade report for two sections of a business class:

| Section | <50 | 50 - 63 | 64 - 75 | 76 - 89 | 90 - 100 |
|---|---|---|---|---|---|
| Morning | 10 | 48 | 100 | 105 | 18 |
| Afternoon | 9 | 82 | 106 | 109 | 33 |

The following is a complaint from one of the students in the Morning Section.
"I know a lot of people who got a low mark on the final although they are academically outstanding and their average is above 90! Regardless of class average, there are only 18 people in the Morning section who passed this course with a grade above 90, but in the Afternoon section, 33 people got 90+."
a. Why is it not appropriate to compare the performance between the two sections based on the number of students? (E.g. the complaint compared the number of students, 18 vs 33.) [2 marks]

b. To reply to the student's complaint, the department head hired a statistician to analyze the data. The statistician concluded that the difference in the number of students who received a final grade of 90 or higher is NOT statistically significant. What does this mean? [2 marks]
Hint: Read Chapter 11 - Observational Studies and Experiments (pp. 141-155)

c. If students truly performed better in the afternoon section than in the morning section, can we conclude the instructor in the morning did a poor job (should he/she be fired?) compared to the afternoon instructor? [2 marks]
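As a small illustration of the issue raised in part a, the counts in the grade report can be converted to within-section proportions. The sketch below uses only the numbers from the table; the variable names are illustrative.

```python
# Counts from the grade report, in the order <50, 50-63, 64-75, 76-89, 90-100.
grades = {
    "Morning": [10, 48, 100, 105, 18],
    "Afternoon": [9, 82, 106, 109, 33],
}

# Section sizes differ, so raw counts are not directly comparable.
section_size = {name: sum(counts) for name, counts in grades.items()}

# Share of each section with a final grade of 90 or higher (last column).
share_90_plus = {name: counts[-1] / section_size[name] for name, counts in grades.items()}

for name in grades:
    print(f"{name}: n = {section_size[name]}, share with 90+ = {share_90_plus[name]:.4f}")
```

Comparing these proportions rather than the counts 18 and 33 accounts for the sections having different numbers of students.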

Question 3 - Use the dataset RestaurantTips.txt to answer the following questions:

A new restaurant, "Steak Out Fish & Chips", collected a sample of dining visits over a one week period. The dataset contains 149 dining visits on the following 7 variables:

| Variable | Description |
|---|---|
| Bill | Total bill payment (in dollars) not including tip |
| Tip | Total tip amount (in dollars) |
| Credit | Whether bill was paid by credit card (y) or not (n) |
| Guests | Number of guests seated per dining visit |
| Day | The day of the week (m,t,w,th,f) |
| Server | Server servicing the guest(s) |
| PctTip | Total tip / Total bill payment (%) |

a. Construct an appropriate graphical display to visualize the relationship between the tip amount (Tip) and the number of guests seated per dining visit (Guests). [2 marks]
(Attach graphical display in the appendix)

b. Provide one measure of centre and one measure of spread appropriate for the data. Compare the centre, spread and presence of outliers of the distributions. [4 marks]
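One possible way to organise the comparison in part b is to summarise Tip separately for each value of Guests. The sketch below assumes the median and IQR are the chosen measures and uses the common 1.5 x IQR rule to flag outliers; both are choices made for illustration, not requirements of the question. The `tips_by_guests` values are hypothetical, not taken from RestaurantTips.txt, and `summarize` is an invented helper name.

```python
from statistics import median, quantiles

# Hypothetical tips grouped by number of guests; NOT the RestaurantTips.txt data.
tips_by_guests = {
    1: [2.0, 2.5, 3.0, 3.5, 4.0],
    2: [3.0, 4.0, 4.5, 5.0, 15.0],
}


def summarize(values):
    """Return the median, IQR and any values outside the 1.5 x IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median(values), "iqr": iqr, "outliers": outliers}


for guests, tips in tips_by_guests.items():
    print(guests, summarize(tips))
```

The median and IQR are less affected by extreme values than the mean and standard deviation, which is why they are a common choice when outliers are present.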

Question 4 - Use the dataset StudentServey.txt to answer the following questions:
Data was collected on a sample of 325 different students. The dataset contains the following 14 variables:

| Variable | Description |
|---|---|
| Year | Current year level of the student |
| Gender | Gender of the student |
| Smoke | Whether the student smokes (Yes) or not (No) |
| Exercise | Average amount of exercise (in hours) per week |
| TV | Average amount of TV (in hours) per week |
| Height | Height of the student (in feet) |
| Weight | Weight of the student (in lbs) |
| BMI | Body mass index (BMI) of the student |
| BMI Categories | BMI Classification based on BMI |
| VerbalSAT | Verbal SAT score when admitted |
| MathSAT | Math SAT score when admitted |
| SAT | SAT score when admitted |
| GPA | Current GPA of the student |
| Pulse | Resting pulse rate of the student (in bpm) |

In each of the following situations, construct the appropriate graphical display that will help us visualize:
(Attach graphical displays in the appendix)
a. The relationship between pulse rate and weight. [2 marks]
b. The relationship between year of study and smoking status. [2 marks]
c. The distribution of the BMI categories. [2 marks]
d. The distribution of GPA. [2 marks]
e. How SAT scores of students are related to their BMI categories. [2 marks]

Question 5 - Suppose it is known that at a certain large college, the amount of money students spend on textbooks in a semester has mean \$605 and standard deviation \$80. The bookstore will give a small voucher to students who spend more than \$600 on textbooks in a semester.
Part (a) - Assuming the amount of money students spend on textbooks in a semester follows a Normal distribution, find the following probabilities: [6 marks]
Note: In each of the following probability problems, you must
- define the random variable of interest with an appropriate notation or describe the random variable in words,
- determine the probability distribution the random variable follows and
- show detailed calculation and a picture (if necessary)

I. A student is selected at random from the college. What is the probability that the student spends between \$400 and \$450 on textbooks in a semester? [2 marks]

Probability (keep 4 decimal places)

II. A sample of 5 students is selected at random from the college. What is the probability that at least 3 students spend between \$400 and \$450 on textbooks in a semester? [3 marks]

Probability (keep 4 decimal places)

III. A sample of 5 students is selected at random from the college. What is the probability that the average amount spent on textbooks by these 5 students falls between \$600 and \$620? [3 marks]

Probability (keep 4 decimal places)

IV. A sample of 50 students is selected at random from the college. What is the probability that the average amount spent on textbooks by these 50 students falls between \$600 and \$620? [3 marks]

Probability (keep 4 decimal places)

V. A sample of 50 students is selected at random from the college. What is the probability that 30 or fewer of the students get a voucher? [3 marks]

Probability (keep 4 decimal places)
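The calculations in Part (a) can be checked numerically. The sketch below is one way to set them up with the Python standard library, assuming X ~ Normal(605, 80) for a single student, a Normal distribution with standard deviation 80/sqrt(n) for the mean of n students, and a Binomial model for counts of independent students. The function and variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

MU, SIGMA = 605.0, 80.0           # mean and standard deviation given in the question
spend = NormalDist(MU, SIGMA)     # X: amount one student spends in a semester


def normal_between(dist, low, high):
    """P(low < X < high) for a Normal random variable."""
    return dist.cdf(high) - dist.cdf(low)


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_dist(n):
    """Sampling distribution of the mean of n students (Normal population assumed)."""
    return NormalDist(MU, SIGMA / sqrt(n))


# I. One student spends between $400 and $450.
p_single = normal_between(spend, 400, 450)

# II. At least 3 of 5 students spend between $400 and $450: Binomial(5, p_single).
p_at_least_3 = sum(binom_pmf(k, 5, p_single) for k in range(3, 6))

# III and IV. The sample mean falls between $600 and $620.
p_mean_5 = normal_between(mean_dist(5), 600, 620)
p_mean_50 = normal_between(mean_dist(50), 600, 620)

# V. 30 or fewer of 50 students get a voucher: Binomial(50, P(X > 600)).
p_voucher = 1 - spend.cdf(600)
p_30_or_fewer = sum(binom_pmf(k, 50, p_voucher) for k in range(0, 31))

for label, value in [("I", p_single), ("II", p_at_least_3), ("III", p_mean_5),
                     ("IV", p_mean_50), ("V", p_30_or_fewer)]:
    print(f"{label}: {value:.4f}")
```

Parts II and V use the exact Binomial sum here; a Normal approximation to the Binomial is an alternative for part V and would give a slightly different value.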

Part (b) - If the amount spent on textbooks in a semester DOES NOT follow a Normal distribution, will the probabilities computed in Part (a) still be accurate? [5 marks]
I. Will the probability computed in Part (a)(I) still be accurate? ___ Yes or ___ No.
Explain.

II. Will the probability computed in Part (a)(II) still be accurate? ___ Yes or ___ No.
Explain.

III. Will the probability computed in Part (a)(III) still be accurate? ___ Yes or ___ No.
Explain.

IV. Will the probability computed in Part (a)(IV) still be accurate? ___ Yes or ___ No.
Explain.

V. Will the probability computed in Part (a)(V) still be accurate? ___ Yes or ___ No.
Explain.
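A simulation can help build intuition for Part (b). The sketch below assumes one particular non-Normal population, a shifted exponential with the same mean (\$605) and standard deviation (\$80), chosen purely for illustration; the question does not specify the true shape. It compares simulated probabilities with the Normal-theory values. The function names, the seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA = 605.0, 80.0
rng = random.Random(0)  # fixed seed so the run is repeatable


def skewed_spend():
    """One draw from a shifted exponential with mean 605 and sd 80 (assumed shape)."""
    return (MU - SIGMA) + rng.expovariate(1.0 / SIGMA)


def simulated_prob(n, low, high, reps=10_000):
    """Estimate P(low < mean of n draws < high) by simulation."""
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(skewed_spend() for _ in range(n))
        if low < sample_mean < high:
            hits += 1
    return hits / reps


def normal_theory_prob(n, low, high):
    """The same probability computed as if the population were Normal."""
    dist = NormalDist(MU, SIGMA / sqrt(n))
    return dist.cdf(high) - dist.cdf(low)


for n, low, high in [(1, 400, 450), (5, 600, 620), (50, 600, 620)]:
    sim = simulated_prob(n, low, high)
    theory = normal_theory_prob(n, low, high)
    print(f"n = {n}: simulated = {sim:.4f}, Normal theory = {theory:.4f}")
```

Under this assumed population no student spends less than \$525, so the single-student probability for \$400 to \$450 is zero, while the Normal calculation is not. For the mean of 50 students, the simulated value should be much closer to the Normal-theory value, which is the behaviour the Central Limit Theorem describes.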

