5. HR Assistant Survey

A sample of HR assistants in Plano, Texas was collected. It was suggested that the data be analyzed with a regression model to determine whether salary is related to years of experience, aptitude test scores, EI (emotional intelligence) scores, number of foreign languages spoken, word processing speed, and employee satisfaction. The data are shown in Table 1. Sorry, the units of y are difficult to read: they are thousands of dollars ($000).

a. Using a correlation matrix, determine which variables you might flag going into the initial model. Explain your choices.
b. Determine an initial regression model including all variables. Discuss your adjusted coefficient of determination. Use it in a clear, easy-to-understand sentence.
c. Check your significance levels and residuals. Clean up your model based on these results. Explain your step-by-step results with analytical reasons.
d. Forecast a salary for an applicant with 7 years of experience, an EI score of 50, an aptitude score of 82, a speed of 25, a satisfaction score of 10, and 6 languages. Use only those x’s you have determined should be in your final model; do not go back to the original model before cleanup. How comfortable are you with this prediction?

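a. A correlation matrix shows the correlation coefficient for every pair of variables, so it answers two separate questions before any model is fit. First, look at the salary row: predictors whose correlation with salary is close to 1 or -1 have a strong linear relationship with it and are likely to matter, while predictors whose correlation is close to 0 are flagged as candidates that may turn out to be insignificant. Second, look at the correlations among the predictors themselves: two predictors that are highly correlated with each other are flagged for possible multicollinearity, because they carry overlapping information and can make each other's coefficients unstable. Flagging does not remove anything at this stage, since part b asks for an initial model with all variables; the flags tell us what to watch for when we read the regression output.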
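As a rough sketch of the flagging step, the Python below computes pairwise Pearson correlations and lists predictor pairs that are strongly correlated with each other. The column values are made up for illustration (Table 1 is not reproduced here), and the 0.7 cutoff is an assumed rule of thumb, not a value taken from the question.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

def flag_collinear_pairs(columns, response, threshold=0.7):
    """List predictor pairs with |r| above the threshold (an assumed cutoff)."""
    predictors = [name for name in columns if name != response]
    corr = correlation_matrix(columns)
    return [
        (a, b, round(corr[(a, b)], 3))
        for i, a in enumerate(predictors)
        for b in predictors[i + 1:]
        if abs(corr[(a, b)]) > threshold
    ]

# Made-up illustrative values, NOT the data from Table 1.
sample = {
    "salary": [30, 34, 37, 41, 46, 50],
    "experience": [1, 2, 4, 5, 7, 9],
    "aptitude": [70, 72, 78, 80, 85, 90],
    "languages": [2, 0, 1, 3, 0, 1],
}

flagged = flag_collinear_pairs(sample, response="salary")
print(flagged)
```

With these invented numbers, only the experience and aptitude pair is flagged. With the real Table 1, the same check would be run across all six predictors.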

b. The initial regression model includes all six independent variables: years of experience, aptitude test score, EI score, number of foreign languages spoken, word processing speed, and employee satisfaction, with salary as the dependent variable. The adjusted coefficient of determination (adjusted R-squared) measures the proportion of the variance in salary that the model explains, after a penalty for the number of predictors. Unlike plain R-squared, it does not automatically rise when a variable is added, so it is the fairer figure for comparing models of different sizes. Its value is at most 1 and can fall below 0 for a very poor model; higher values indicate a better fit. A clear sentence takes the form: "After adjusting for the number of predictors, the model explains about X% of the variation in salary," where X is the adjusted R-squared from the regression output for Table 1.
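The adjustment itself is a one-line formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The sketch below applies it to hypothetical inputs; the R-squared of 0.80 and sample size of 30 are placeholders, not results from Table 1.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n, k = n_observations, n_predictors
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical inputs: R^2 = 0.80, 30 observations, 6 predictors.
adj = adjusted_r_squared(0.80, n_observations=30, n_predictors=6)
print(f"After adjusting for the number of predictors, the model explains "
      f"about {adj:.1%} of the variation in salary.")

# A weak model with few observations can go negative.
weak = adjusted_r_squared(0.10, n_observations=10, n_predictors=6)
```

The second call shows why the adjusted figure is not confined to the 0 to 1 range: with a low R-squared and few observations per predictor, the penalty pushes it below zero.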

c. Next, check the significance level (p-value) of each coefficient and the residuals of the initial model. A p-value above the chosen significance level, typically 0.05, means there is not enough evidence that the variable is related to salary once the other variables are in the model, so it is a candidate for removal. Remove one variable at a time, starting with the highest p-value, then refit and recheck, because p-values change whenever a variable is dropped, especially when predictors are correlated. Residuals are the differences between the observed and predicted salaries. They should scatter randomly around zero; a curved or funnel-shaped pattern points to a poorly specified model, and unusually large residuals point to outliers or influential cases that should be investigated. Record at each step which variable was removed, its p-value, and how adjusted R-squared changed, so that every change to the model has an analytical reason.
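The residual half of this check can be sketched in a few lines of Python. The example covers only the residual screen, not the p-value step, which would normally come from regression software. The observed and fitted salaries are invented for illustration, the standardization is a rough one that ignores leverage, and the cutoff of 2 is an assumed rule of thumb.

```python
import math

def standardized_residuals(observed, fitted, n_predictors):
    """Divide each residual by the regression standard error s.

    s = sqrt(SSE / (n - k - 1)); this rough version ignores leverage.
    """
    residuals = [y - y_hat for y, y_hat in zip(observed, fitted)]
    dof = len(residuals) - n_predictors - 1
    s = math.sqrt(sum(e ** 2 for e in residuals) / dof)
    return [e / s for e in residuals]

def flag_large_residuals(observed, fitted, n_predictors, cutoff=2.0):
    """Return the row indices whose |standardized residual| exceeds the cutoff."""
    z = standardized_residuals(observed, fitted, n_predictors)
    return [i for i, value in enumerate(z) if abs(value) > cutoff]

# Made-up salaries in $000, NOT values from Table 1.
observed = [31.0, 35.5, 38.2, 40.1, 44.8, 47.3, 52.0, 49.9, 60.5, 41.0]
fitted = [30.6, 35.9, 37.8, 40.6, 44.5, 47.7, 51.5, 50.2, 54.0, 41.3]

# Assume the fitted values came from a model with 2 predictors.
suspects = flag_large_residuals(observed, fitted, n_predictors=2)
print(suspects)
```

A flagged row is a prompt to inspect that observation, not an automatic reason to delete it.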

d. To forecast the salary, use the final model from part c and plug in only the values for the variables that model retains; the applicant's values for any variables that were dropped are ignored. The result is in thousands of dollars. How comfortable we are with the prediction depends on how well the final model fits (adjusted R-squared, the standard error of the estimate, and the residual analysis) and on whether the applicant's values fall within the range of the sample data. A value outside the range observed in Table 1 makes the forecast an extrapolation, which deserves less confidence.
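The plug-in step might look like the sketch below. The applicant's values come from part d, but the retained variables, the coefficients, and the observed ranges are all hypothetical placeholders, since the actual final model depends on the Table 1 regression output.

```python
# Hypothetical final model: the retained variables and coefficients are
# placeholders, NOT estimates from Table 1.
final_model = {"intercept": 20.0, "experience": 2.5, "aptitude": 0.1}

# Applicant values given in part d.
applicant = {
    "experience": 7, "ei": 50, "aptitude": 82,
    "speed": 25, "satisfaction": 10, "languages": 6,
}

# Hypothetical min/max of each retained variable in the sample.
observed_ranges = {"experience": (0, 12), "aptitude": (60, 95)}

def forecast(model, x):
    """Evaluate intercept + sum(b_j * x_j) over the variables kept in the model."""
    return model["intercept"] + sum(
        coef * x[name] for name, coef in model.items() if name != "intercept"
    )

def outside_observed_range(x, ranges):
    """Return retained variables whose applicant value lies outside the sample range."""
    return [name for name, (low, high) in ranges.items() if not low <= x[name] <= high]

salary_thousands = forecast(final_model, applicant)
extrapolated = outside_observed_range(applicant, observed_ranges)
print(f"Forecast: ${salary_thousands:.1f} thousand; outside range: {extrapolated}")
```

Only the variables present in `final_model` enter the calculation, which mirrors the instruction not to go back to the original model. If `extrapolated` came back non-empty for the real data, that would be a reason to be less comfortable with the forecast.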