Instructions: Prepare your answers using Excel and show your work to get full or partial credit.

Question 1:

Using the file Wages.xlsx which contains information of working individuals, for a nationwide firm (Value 7 points).

a) Run Descriptive Statistics for the variables Education, Number of siblings, Mother’s years of education and Father’s years of education. Show your output. What is the mean of each of these variables? What is the number of observations (n)?

b) Estimate a regression equation of education on the number of siblings, mother’s years of education and father’s years of education as independent variables. Show the Excel output.

c) Write the Sample Regression Function

d) Interpret the coefficients of the regression

e) Are the coefficients statistically significant at the 95% confidence level? Write down the hypothesis that you are testing for each coefficient, and explain your conclusions.

f) Suppose that Tom has 3 siblings, and his mother has 10 years of education and his father has 14 years of education. Henry has no siblings, and his mother and father each has 16 years of education. Using the regression equation, calculate the estimated education for Tom and Henry

g) What is the predicted difference in years of education between Tom and Henry?

a) To run Descriptive Statistics in Excel for the given variables Education, Number of siblings, Mother's years of education, and Father's years of education, follow these steps:

1. Open the "Wages.xlsx" file in Excel.

2. Select the columns containing the variables Education, Number of siblings, Mother's years of education, and Father's years of education.

3. Go to the "Data" tab and click on the "Data Analysis" option in the "Analysis" group. If you don't see the "Data Analysis" option, you may need to enable it by going to File -> Options -> Add-Ins -> Excel Add-ins -> Click on "Go" -> Check "Analysis ToolPak" -> Click "OK".

4. In the "Data Analysis" dialog box, select the "Descriptive Statistics" option and click "OK".

5. In the "Descriptive Statistics" dialog box, select the range of the input variables and check the options for "Mean" and "Observations".

6. Specify the output range where you want the output to be displayed.

7. Click "OK" to run the Descriptive Statistics analysis.

8. The output will show the mean for each variable under the "Mean" column and the number of observations (n) under the "N" column.

b) To estimate a regression equation of education on the number of siblings, mother's years of education, and father's years of education, follow these steps:

1. In a new sheet, set up the following column headers: Column A - Education, Column B - Number of siblings, Column C - Mother's years of education, and Column D - Father's years of education.

2. Enter the corresponding data for each variable in the rows below the column headers.

3. Go to the "Data" tab and click on the "Data Analysis" option in the "Analysis" group.

4. In the "Data Analysis" dialog box, select the "Regression" option and click "OK".

5. In the "Regression" dialog box, specify the input range for the dependent variable (Education) and the independent variables (Number of siblings, Mother's years of education, and Father's years of education).

6. Check the option for "Labels" if your data has column labels.

7. Specify the output range where you want the regression analysis output to be displayed.

8. Click "OK" to run the regression analysis.

9. The output will show the regression equation coefficients, standard errors, t-values, and p-values.

c) The Sample Regression Function is the estimated regression equation. It can be written as:

Education = b0 + b1(Number of siblings) + b2(Mother's years of education) + b3(Father's years of education)

where b0, b1, b2, and b3 are the estimated coefficients obtained from the regression analysis.

d) The coefficients of the regression represent the impact of each independent variable on the dependent variable (Education). The coefficient of the variable Number of siblings indicates how education changes for each additional sibling. The coefficients of Mother's years of education and Father's years of education represent how education changes for each additional year of education for the mother and the father, respectively.

e) To determine if the coefficients are statistically significant at the 95% confidence level, check the p-values associated with each coefficient in the regression output. The hypothesis being tested is whether the coefficient is significantly different from zero (null hypothesis: coefficient = 0). If the p-value is less than 0.05 (95% confidence level), then the coefficient is considered statistically significant.

f) To calculate the estimated education for Tom and Henry using the regression equation, substitute the values of Number of siblings, Mother's years of education, and Father's years of education into the equation:

For Tom:
Number of siblings = 3
Mother's years of education = 10
Father's years of education = 14

For Henry:
Number of siblings = 0
Mother's years of education = 16
Father's years of education = 16

Substitute these values into the regression equation and calculate the estimated education for Tom and Henry.

g) To calculate the predicted difference in years of education between Tom and Henry, subtract the estimated education for Henry from the estimated education for Tom.

To answer these questions using Excel, follow these steps:

Step 1: Open the file "Wages.xlsx" in Excel.

Step 2: Locate the variables 'Education,' 'Number of siblings,' 'Mother's years of education,' and 'Father's years of education' in the data. The file should contain a column for each variable.

Step 3: Run Descriptive Statistics for the variables 'Education,' 'Number of siblings,' 'Mother's years of education,' and 'Father's years of education'. To do this in Excel:

- Select the range of data for the variable 'Education.'
- Go to the 'Data' tab and click on 'Data Analysis' (if you don't see 'Data Analysis,' you may need to enable it).
- Choose 'Descriptive Statistics' from the list and click 'OK.'
- In the 'Descriptive Statistics' dialog box, select the range of data for the variable 'Education' as the 'Input Range.'
- Check the 'Summary Statistics' option.
- Choose a location in your workbook to output the results (e.g., a new worksheet or a specific range). Select the 'Output Range.'
- Click 'OK' to run the descriptive statistics analysis.
- Repeat the above steps for the variables 'Number of siblings,' 'Mother's years of education,' and 'Father's years of education.'

The output will display the mean for each variable and the number of observations (n).

Step 4: Estimate a regression equation of education on the number of siblings, mother's years of education, and father's years of education as independent variables. To do this in Excel:

- Select the range of data for the variables 'Education,' 'Number of siblings,' 'Mother's years of education,' and 'Father's years of education.'
- Go to the 'Data' tab and click on 'Data Analysis.'
- Choose 'Regression' from the list and click 'OK.'
- In the 'Regression' dialog box, select the range of data for the dependent variable (education) and the range of data for the independent variables (number of siblings, mother's years of education, and father's years of education).
- Check the 'Labels' and 'Confidence Level' (use 95% for this question) options.
- Choose a location in your workbook to output the results (e.g., a new worksheet or a specific range). Select the 'Output Range.'
- Click 'OK' to run the regression analysis.

The output will show coefficients, standard errors, t-statistics, p-values, and other regression-related information.

Step 5: Write the Sample Regression Function using the estimated regression equation. The regression equation will be displayed in the output from Step 4.

Step 6: Interpret the coefficients of the regression. Look at the output from Step 4 to interpret the coefficients. The coefficients represent the change in the dependent variable (education) for a one-unit change in the corresponding independent variable (number of siblings, mother's years of education, and father's years of education). Interpret each coefficient in the context of the variables.

Step 7: Test the statistical significance of the coefficients at the 95% confidence level. To do this, look at the p-values associated with each coefficient in the regression output from Step 4. The null hypothesis is that the coefficient is equal to zero. If the p-value is less than 0.05 (assuming a 5% significance level), you can reject the null hypothesis and conclude that the coefficient is statistically significant.

Step 8: Calculate the estimated education for Tom and Henry using the regression equation. Substitute the values for the independent variables (number of siblings, mother's years of education, and father's years of education) into the regression equation and solve for the dependent variable (education). For Tom's case, substitute 3 for the number of siblings, 10 for the mother's years of education, and 14 for the father's years of education. For Henry's case, substitute 0 for the number of siblings, 16 for the mother's years of education, and 16 for the father's years of education.

Step 9: Calculate the predicted difference in years of education between Tom and Henry. Subtract the estimated education for Tom from the estimated education for Henry to get the predicted difference.

By following these steps and using Excel, you can answer all the questions and show your work.