Influential observations and outliers: are we just splitting hairs? If so, explain. If not, how do you differentiate between an influential observation and an outlier?

Since this is not my area of expertise, I searched Google for the keywords "influential observation outlier difference" and found these possible sources:

http://www.google.com/search?client=safari&rls=en&q=influential+observation+outlier+difference&ie=UTF-8&oe=UTF-8

In the future, you can find the information you want more quickly if you run your own search with appropriate keywords. Also see http://hanlib.sou.edu/searchtools/.

The distinction between influential observations and outliers is subtle but real: the two terms have distinct meanings and purposes in statistics. Here is the difference and how you can tell them apart.

Outliers, in statistical terms, are observations that deviate markedly from the bulk of the data: values that are unusually high or low compared with the rest of the dataset. In regression, the term usually refers to a point whose response lies far from the fitted trend, that is, a point with a large residual. Outliers can arise from measurement errors, data entry mistakes, or genuine rare occurrences. They can strongly affect non-robust statistics like the mean and standard deviation, skewing the results if not handled carefully.
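To make the effect on the mean and standard deviation concrete, here is a minimal sketch using Python's standard `statistics` module. The numbers are made up for illustration, and 100 plays the role of the suspected outlier.

```python
import statistics

# Hypothetical measurements; 100 is the suspected outlier.
values = [10, 11, 12, 13, 14]
with_outlier = values + [100]

for label, data in (("without outlier", values), ("with outlier", with_outlier)):
    print(
        f"{label}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data):.2f}, "
        f"stdev={statistics.stdev(data):.2f}"
    )
```

In this toy data the mean moves from 12 to about 26.7, while the median only moves from 12 to 12.5, which is why the median is often preferred when outliers are suspected.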

Influential observations, on the other hand, are data points whose presence materially changes the results of a statistical analysis or model, such as the coefficients in a regression. An influential observation is not necessarily an outlier in the response: its value can look ordinary and its residual can be small, precisely because it pulls the fitted line toward itself. Influence typically arises when a point has high leverage (an unusual predictor value) and does not follow the trend of the remaining data. Removing such a point noticeably changes the parameter estimates, regression coefficients, and overall model fit.
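The contrast can be illustrated with a small constructed example. The sketch below assumes a simple straight-line least-squares fit; the data, the helper names `fit_line` and `residual`, and the two added points are all hypothetical. One point is far from the trend but sits at the center of the x range; the other sits far out in x and off the trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual(point, slope, intercept):
    """Observed y minus fitted y for a single (x, y) point."""
    x, y = point
    return y - (intercept + slope * x)


# Made-up base data lying exactly on y = 2x + 1.
base_x = [float(x) for x in range(1, 11)]
base_y = [2 * x + 1 for x in base_x]

# Two hypothetical extra points, added one at a time.
y_outlier = (5.5, 22.0)       # far above the trend, at the center of the x range
high_leverage = (30.0, 30.0)  # far out in x, well below the trend (2*30 + 1 = 61)

base_slope, _ = fit_line(base_x, base_y)
results = {}
for name, point in (("y-outlier", y_outlier), ("high-leverage", high_leverage)):
    slope, intercept = fit_line(base_x + [point[0]], base_y + [point[1]])
    results[name] = (slope, residual(point, slope, intercept))
    print(f"{name}: slope {base_slope:.2f} -> {slope:.2f}, "
          f"own residual {results[name][1]:.2f}")
```

In this constructed data, the first point has a large residual (roughly 9) yet leaves the slope at 2, so it is an outlier that is not influential for the slope. The second point drags the slope down to roughly 0.9 while its own residual stays modest (roughly -3.7), so it is influential without looking like an outlier in the residuals.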

Differentiating between outliers and influential observations requires some analytical techniques and judgment:

1. Visualization: Start by plotting your data, for example in a scatter plot. Outliers are usually easy to spot as points that fall far from the general trend of the data. Influential observations can be harder to see; look for points isolated at the extremes of the predictor range, and use a residuals-versus-leverage plot to check whether they are pulling the fit toward themselves.

2. Statistical diagnostics: Numerical measures can flag both kinds of points. Outlier measures like the Z-score (distance from the mean in standard deviations) or the modified Z-score (distance from the median, scaled by the median absolute deviation) identify extreme observations. In regression analysis, leverage statistics show which points have the potential to be influential, and Cook's distance measures how much the fitted values actually change when a point is removed.

3. Model comparison: Fit models both with and without the suspected outliers or influential observations. Compare the results and evaluate the impact of each observation on the model's fit, coefficients, and predictions. An observation whose removal changes these measures substantially can be considered influential; an outlier whose removal leaves them essentially unchanged is not.
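The diagnostics named in the list above can be sketched in plain Python for a straight-line fit. This is only an illustration under stated assumptions: the data are made up, the helper names `modified_z_scores` and `regression_diagnostics` are hypothetical, the modified Z-score is applied to the response values only, and the cutoffs (3.5 for the modified Z-score, 2p/n for leverage, 1 for Cook's distance) are common rules of thumb rather than formal tests.

```python
import statistics


def modified_z_scores(values):
    """Modified Z-scores: 0.6745 * (value - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [0.6745 * (v - med) / mad for v in values]


def regression_diagnostics(xs, ys):
    """Leverage and Cook's distance for the fit y = intercept + slope * x."""
    n, p = len(xs), 2  # p counts the fitted parameters (intercept and slope)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Made-up data: ten points on y = 2x + 1 plus one point far out in x.
xs = [float(x) for x in range(1, 11)] + [30.0]
ys = [2 * x + 1 for x in xs[:-1]] + [30.0]

mod_z = modified_z_scores(ys)
leverage, cooks = regression_diagnostics(xs, ys)

for i, (z, h, d) in enumerate(zip(mod_z, leverage, cooks)):
    flags = []
    if abs(z) > 3.5:
        flags.append("outlier in y")
    if h > 2 * 2 / len(xs):
        flags.append("high leverage")
    if d > 1:
        flags.append("influential")
    print(f"point {i}: mod_z={z:.2f} leverage={h:.2f} cook={d:.2f} {flags}")
```

With this toy data, the last point is not flagged by the modified Z-score of its response value, but it has by far the highest leverage and the largest Cook's distance. That is the pattern described above: influential without being an outlier in the response.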

In summary, outliers and influential observations are related concepts, but they are not the same. Outliers deviate markedly from the majority of the data, whereas influential observations materially change the results of a statistical analysis or model. A point can be one, both, or neither. By using visualization, diagnostic statistics, and model comparison, you can differentiate between them.