The following data shows the annual salaries of 12 randomly selected employees of a
company (in £1000s) on the X-axis and the average number of days they have taken off
work in the last 10 years on the Y-axis.
Y      X      Y      X
11.1   29.6   9.6    39.2
8.4    58.6   10.9   46.8
8.7    71.2   6.8    70.0
6.3    73.9   10.4   33.5
7.8    70.7   11.0   35.3
8.2    46.3   12.5   30.8
[Note: Use hand calculation to answer this Question. You may use a calculator to help
you. However, a statistical package or worksheet output will not be accepted for this
question.]
(i) Calculate the least-squares estimates of slope and intercept for the simple linear
regression of Days off (Y) on Annual Salaries (X).
(ii) Construct the analysis of variance table and perform an F-test to find out the
overall significance of the regression model. What conclusion can you make?
(iii) Perform a T-test for the estimate of the slope and comment on the result. Do you
see a relationship between the test statistic for the T-test and the F-test?
Investigate why an F-distribution is used to model the quantity MSR/MSE while
testing the overall significance of the regression model.
(iv) Using the fitted regression line, find the number of days that an employee with an
annual salary of £20,000 is expected to take in a year and obtain a 95%
prediction interval for it. Give the meaning of this prediction interval.
(v) The manager of the study suggested that another variable, money spent on health
benefits of an employee, might be important. To analyse the same data as earlier
with the additional variable, a multiple linear regression model was proposed with
the form Y = β0 + β1 X1 + β2 X2 + ε, where ε ∼ N(0, σ^2), X1 represents the annual
salary, and X2 represents the money spent on health benefits of an employee. It
was found that the increase of the R^2-value of this new model was 2.40%. Using
this result, apply an appropriate test to decide whether it was necessary to include
the money spent on health benefits of an employee (X2) in the final model.

You want to set up a games stall at the Winter Wonderland. You have a single pack of 52
cards and decide to play card games with your customers. The game is designed as follows. A
customer draws 4 cards at random. For every “Ace” that the customer draws, he/she wins £3
and for every face card (“Jack”, “Queen” or “King”), he/she wins £1.
Let the two discrete random variables U and V, be the number of aces and face cards
obtained, respectively.
(i) Derive the joint probability mass function p(U, V).
(ii) Find the marginal probability mass functions of U and V.
(iii) Are U and V independent? Specify the reason.
(iv) Find E(U), E(V), Var(U), Var(V) and Cov(U, V) and interpret the obtained values.
(v) Find the minimum price for the game, so that you don’t bear a loss if a very large
number of games are played.
(vi) If you set the price for each game by rounding up the value found in (

(i) To calculate the least-squares estimates of slope and intercept for the simple linear regression of Days off (Y) on Annual Salaries (X), follow these steps:

1. Calculate the means of X and Y:
- X̄ = (29.6 + 39.2 + 58.6 + 46.8 + 71.2 + 70.0 + 73.9 + 33.5 + 70.7 + 35.3 + 46.3 + 30.8) / 12 = 605.9 / 12 ≈ 50.49
- Ȳ = (11.1 + 9.6 + 8.4 + 10.9 + 8.7 + 6.8 + 6.3 + 10.4 + 7.8 + 11.0 + 8.2 + 12.5) / 12 = 111.7 / 12 ≈ 9.31

2. Calculate the deviations of each X value from the mean of X (X - X̄) and the deviations of each Y value from the mean of Y (Y - Ȳ).

3. Calculate the product of the deviations of X and Y for each data point (i.e., (X - X̄) * (Y - Ȳ)).

4. Calculate the sum of the product of deviations of X and Y (Σ((X - X̄) * (Y - Ȳ))).

5. Calculate the sum of the squared deviations of X (Σ((X - X̄)^2)).

6. Calculate the slope (β1) as Σ((X - X̄) * (Y - Ȳ)) / Σ((X - X̄)^2).

7. Calculate the intercept (β0) as Ȳ - β1 * X̄.
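
As a check on the hand calculation, the seven steps above can be sketched in plain Python. This is a minimal sketch, not a replacement for the hand working the question asks for; the variable names (`s_xy`, `s_xx`, `b0`, `b1`) are illustrative choices, and the data are transcribed from the table in the question, reading each row as two (Y, X) pairs.

```python
# Data transcribed from the question: salary X (in £1000s) and days off Y.
x = [29.6, 39.2, 58.6, 46.8, 71.2, 70.0, 73.9, 33.5, 70.7, 35.3, 46.3, 30.8]
y = [11.1, 9.6, 8.4, 10.9, 8.7, 6.8, 6.3, 10.4, 7.8, 11.0, 8.2, 12.5]

n = len(x)

# Step 1: sample means.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Steps 2-4: sum of products of deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Step 5: sum of squared deviations of X.
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Steps 6-7: least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"n = {n}, x_bar = {x_bar:.4f}, y_bar = {y_bar:.4f}")
print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.4f}")
print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```

With these data the slope comes out at roughly -0.096 and the intercept at roughly 14.14, i.e. each extra £1000 of salary is associated with about 0.1 fewer days off.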

(ii) To construct the analysis of variance table and perform an F-test to find out the overall significance of the regression model, follow these steps:

1. Calculate the sum of squares total (SST) by summing the squared deviations of each Y value from the mean of Y (i.e., Σ((Y - Ȳ)^2)).

2. Calculate the sum of squares regression (SSR) by multiplying the squared slope by the sum of squared deviations of X (i.e., β1^2 * Σ((X - X̄)^2)).

3. Calculate the sum of squares error (SSE) by subtracting SSR from SST (i.e., SST - SSR).

4. Calculate the mean square regression (MSR) by dividing SSR by the degrees of freedom of the regression (which is 1 in simple linear regression).

5. Calculate the mean square error (MSE) by dividing SSE by the degrees of freedom of the error (which is n - 2, where n is the number of data points).

6. Calculate the F-statistic by dividing MSR by MSE.

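7. Use the F-distribution with degrees of freedom (1, n - 2), here (1, 10) since n = 12, to determine the p-value associated with the F-statistic. The hypotheses are H0: β1 = 0 against Ha: β1 ≠ 0; reject H0, and conclude that the regression is significant, if the p-value is below the chosen significance level (e.g., 0.05).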
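
The ANOVA quantities above can be cross-checked with a short Python sketch. It assumes the data as transcribed from the question, and instead of computing a p-value it compares the F-statistic with the tabulated upper 5% point of F(1, 10), which is 4.96; `F_CRIT` is an illustrative name for that table value.

```python
# Data transcribed from the question: salary X (in £1000s) and days off Y.
x = [29.6, 39.2, 58.6, 46.8, 71.2, 70.0, 73.9, 33.5, 70.7, 35.3, 46.3, 30.8]
y = [11.1, 9.6, 8.4, 10.9, 8.7, 6.8, 6.3, 10.4, 7.8, 11.0, 8.2, 12.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx

# Sums of squares, following steps 1-3.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = b1 ** 2 * s_xx
sse = sst - ssr

# Mean squares and F-statistic, following steps 4-6.
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

F_CRIT = 4.96  # tabulated upper 5% point of F(1, 10)

print("Source       df        SS        MS        F")
print(f"Regression  {1:>3} {ssr:>9.4f} {msr:>9.4f} {f_stat:>8.2f}")
print(f"Error       {n - 2:>3} {sse:>9.4f} {mse:>9.4f}")
print(f"Total       {n - 1:>3} {sst:>9.4f}")
print("Reject H0 at 5%:", f_stat > F_CRIT)
```

For these data the F-statistic is roughly 31.7, well above 4.96, so the regression of days off on salary is significant at the 5% level.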

(iii) To perform a T-test for the estimate of the slope, follow these steps:

1. Calculate the standard error of the slope by taking the square root of MSE divided by the sum of squared deviations of X (i.e., sqrt(MSE / Σ((X - X̄)^2))).

2. Calculate the t-statistic by dividing the estimated slope (β1) by the standard error of the slope.

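3. Use the t-distribution with n - 2 degrees of freedom to determine the p-value associated with the t-statistic. In simple linear regression the two tests are equivalent: the square of the t-statistic equals the F-statistic from part (ii), and both give the same p-value.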
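
A small Python sketch of the slope t-test, again assuming the data as transcribed from the question. It also computes the F-statistic so that the relationship t^2 = F, which holds in simple linear regression, can be seen numerically. `T_CRIT` is an illustrative name for the tabulated two-sided 5% critical value of the t-distribution with 10 degrees of freedom (2.228).

```python
import math

# Data transcribed from the question: salary X (in £1000s) and days off Y.
x = [29.6, 39.2, 58.6, 46.8, 71.2, 70.0, 73.9, 33.5, 70.7, 35.3, 46.3, 30.8]
y = [11.1, 9.6, 8.4, 10.9, 8.7, 6.8, 6.3, 10.4, 7.8, 11.0, 8.2, 12.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx

sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = b1 ** 2 * s_xx
mse = (sst - ssr) / (n - 2)

# Step 1: standard error of the slope.
se_b1 = math.sqrt(mse / s_xx)

# Step 2: t-statistic for H0: beta1 = 0.
t_stat = b1 / se_b1

# F-statistic from the ANOVA table, for comparison with t^2.
f_stat = ssr / mse

T_CRIT = 2.228  # tabulated two-sided 5% critical value, t with 10 df

print(f"se(b1) = {se_b1:.5f}, t = {t_stat:.3f}")
print(f"t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
print("Reject H0 at 5%:", abs(t_stat) > T_CRIT)
```

The t-statistic is roughly -5.63, and its square matches the F-statistic of roughly 31.7.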

(iv) To find the number of days that an employee with an annual salary of £20,000 is expected to take in a year and obtain a 95% prediction interval for it using the fitted regression line, follow these steps:

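1. Plug the annual salary value into the fitted regression line equation (Y = β0 + β1 * X) to get the predicted value of Y. Because X is measured in £1000s, £20,000 corresponds to X0 = 20.

2. Calculate the standard error of the prediction (SE) using the formula SE = sqrt(MSE * (1 + 1/n + ((X0 - X̄)^2) / Σ((X - X̄)^2))), where X0 is the given annual salary value and n is the number of data points. The leading 1 accounts for the variability of a single new observation; without it the formula gives the narrower confidence interval for the mean response.

3. Calculate the 95% prediction interval as predicted Y ± t * SE, where t is the critical value from the t-distribution with n - 2 degrees of freedom (for n = 12 and a 95% level, t = 2.228 with 10 degrees of freedom, not the normal value 1.96). The interval is the range expected to contain the days off of one new employee on that salary with 95% confidence. Note that X0 = 20 lies below the smallest salary in the data (29.6), so the prediction is an extrapolation.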
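
The prediction for a £20,000 salary can be sketched in Python as follows. Two assumptions are made explicit here: the salary is entered as 20 because X is recorded in £1000s, and the standard error is the one for predicting a single new observation, sqrt(MSE * (1 + 1/n + (X0 - X̄)^2 / Σ((X - X̄)^2))), which includes the extra 1 term. `T_CRIT` is an illustrative name for the tabulated t value with 10 degrees of freedom (2.228).

```python
import math

# Data transcribed from the question: salary X (in £1000s) and days off Y.
x = [29.6, 39.2, 58.6, 46.8, 71.2, 70.0, 73.9, 33.5, 70.7, 35.3, 46.3, 30.8]
y = [11.1, 9.6, 8.4, 10.9, 8.7, 6.8, 6.3, 10.4, 7.8, 11.0, 8.2, 12.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
mse = (sst - b1 ** 2 * s_xx) / (n - 2)

x0 = 20.0  # £20,000 expressed in £1000s

# Point prediction from the fitted line.
y_hat = b0 + b1 * x0

# Standard error for a single new observation (note the leading 1).
se_pred = math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))

T_CRIT = 2.228  # tabulated two-sided 5% critical value, t with 10 df
lower = y_hat - T_CRIT * se_pred
upper = y_hat + T_CRIT * se_pred

print(f"predicted days off = {y_hat:.2f}")
print(f"95% prediction interval = ({lower:.2f}, {upper:.2f})")
```

For these data the predicted value is roughly 12.2 days, with a 95% prediction interval of roughly 9.7 to 14.8 days.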

(v) To decide whether it was necessary to include the money spent on health benefits of an employee (X2) in the final model, perform a partial F-test based on the increase in R^2:

1. Note that the increase in R^2 is given in the question as 2.40% (0.024). The R^2 of the simple linear regression model is SSR / SST from part (ii), so the R^2 of the multiple linear regression model is SSR / SST + 0.024.

2. State the hypotheses for the test of whether X2 improves the model:

- Null hypothesis (H0): β2 = 0 (X2 adds nothing once X1 is in the model).
- Alternative hypothesis (Ha): β2 ≠ 0.

3. Calculate the F-statistic as F = (increase in R^2 / 1) / ((1 - R^2 of the multiple regression model) / (n - p - 1)), and use the F-distribution with degrees of freedom (1, n - p - 1) to determine the associated p-value.

- n is the number of data points, and p is the number of predictors (excluding the intercept) in the multiple linear regression model, so here n - p - 1 = 12 - 2 - 1 = 9.

4. If the p-value is less than a predetermined significance level (e.g., 0.05), reject the null hypothesis and conclude that it was necessary to include the money spent on health benefits of an employee (X2) in the final model. Otherwise, fail to reject the null hypothesis and conclude that there is no significant evidence to suggest the inclusion of X2.
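
The test for X2 can be sketched in Python using the partial F-statistic F = (increase in R^2 / 1) / ((1 - R^2 of the two-predictor model) / (n - 3)). This is a sketch under stated assumptions: the simple-regression R^2 is computed from the data as transcribed from the question, the 2.40% increase is taken from the question as 0.024, and `F_CRIT` is an illustrative name for the tabulated upper 5% point of F(1, 9), which is 5.12.

```python
# Data transcribed from the question: salary X (in £1000s) and days off Y.
x = [29.6, 39.2, 58.6, 46.8, 71.2, 70.0, 73.9, 33.5, 70.7, 35.3, 46.3, 30.8]
y = [11.1, 9.6, 8.4, 10.9, 8.7, 6.8, 6.3, 10.4, 7.8, 11.0, 8.2, 12.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# R^2 of the simple linear regression (SSR / SST).
r2_simple = (s_xy ** 2 / s_xx) / sst

# Increase in R^2 stated in the question (2.40%).
delta_r2 = 0.024
r2_full = r2_simple + delta_r2

# Partial F-statistic for adding one predictor to the model.
p_full = 2                 # predictors in the full model, excluding the intercept
df_error = n - p_full - 1  # 12 - 2 - 1 = 9
f_partial = (delta_r2 / 1) / ((1 - r2_full) / df_error)

F_CRIT = 5.12  # tabulated upper 5% point of F(1, 9)

print(f"R^2 simple = {r2_simple:.4f}, R^2 full = {r2_full:.4f}")
print(f"partial F = {f_partial:.3f} on (1, {df_error}) df")
print("Reject H0 at 5%:", f_partial > F_CRIT)
```

For these data the partial F-statistic is close to 1, far below 5.12, so there is no significant evidence at the 5% level that X2 needs to be included.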