What are outliers and influential points in graphs dealing with data?

An outlier is a point that lies far away from a line or curve through the rest of the data. It may well be a mistake, such as a recording error, and if so you can throw it out, but check before discarding it, because it could also be a genuine unusual value.

An influential point is a point whose removal would noticeably change the line or curve fitted to the data. Typically it is not close to the other data points in x coordinate, which gives it a lot of leverage over the fit.
If such a point lies close to a line or curve through the rest of the data, it changes the fit very little. If it does not lie close to that line or curve, it pulls the fit toward itself, and in that case the influential point is also an outlier.
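
To make the difference concrete, here is a small sketch in Python with NumPy. The data are made up for illustration and are not from the question. It fits a least-squares line three times: once to the base data, once with an extra point far out in x that sits on the line, and once with an extra point at the same x that sits far off the line.

```python
import numpy as np

def fit_slope(x, y):
    """Return the least-squares slope of y on x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Made-up data lying close to the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])
base_slope = fit_slope(x, y)

# Add one point far out in x that sits on the line y = 2x.
slope_on_line = fit_slope(np.append(x, 20.0), np.append(y, 40.0))

# Add one point at the same x that sits far below the line.
slope_off_line = fit_slope(np.append(x, 20.0), np.append(y, 10.0))

print(f"slope without the extra point:    {base_slope:.2f}")
print(f"slope with extra point on line:   {slope_on_line:.2f}")
print(f"slope with extra point off line:  {slope_off_line:.2f}")
```

The first two slopes should come out nearly equal, while the third should be very different, which is what makes that last point influential.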

Outliers and influential points are both related to the analysis of data in graphs. Let me explain each of them separately:

1. Outliers:

Outliers are data points that significantly deviate from the overall pattern or trend of the data. These points are either unusually high or low in value compared to the rest of the data points. Outliers can occur due to various reasons such as measurement errors, data entry mistakes, or exceptional cases within the studied population. It's important to identify and handle outliers appropriately, as they can affect the accuracy and reliability of your analysis.

To identify outliers in a graph, you can follow these steps:

a. Plot your data points on a scatter plot or line graph to visualize the overall pattern.
b. Look for data points that lie far from the main cluster or trend line.
c. Examine the values of these points and compare them with the rest of the data.
d. Determine if these points are indeed outliers or if they have a valid reason for their deviation.
e. Decide on the appropriate action based on the specific context of your analysis. You can either remove the outliers if they are due to measurement errors, or you can treat them as legitimate data points if they represent important information.
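
The visual check in the steps above can be backed up with a simple numeric one. The sketch below is one possible approach, not the only one: it fits a straight line, divides each residual by the residual standard deviation, and flags points beyond a cutoff. The function name `flag_outliers`, the cutoff of 2, and the data are assumptions chosen for illustration.

```python
import numpy as np

def flag_outliers(x, y, threshold=2.0):
    """Flag points whose standardized residual from a fitted line exceeds threshold.

    The threshold of 2 is a common rule of thumb, not a fixed standard.
    """
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # Residual standard deviation with n - 2 degrees of freedom
    # (two parameters, slope and intercept, were estimated).
    sigma = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    z = residuals / sigma
    return np.flatnonzero(np.abs(z) > threshold), z

# Made-up data near y = 2x + 1, with one value pushed far above the trend.
x = np.arange(1, 11, dtype=float)
noise = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.0, 0.2, -0.1, 0.1])
y = 2.0 * x + 1.0 + noise
y[6] += 15.0  # the planted outlier, at x = 7

flagged, z_scores = flag_outliers(x, y)
print("flagged indices:", flagged.tolist())
```

A flagged point still needs the judgment in steps d and e: the code only says the point is unusual, not why.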

2. Influential points:

Influential points are data points whose presence has a large impact on the fitted regression line or the overall analysis: removing one noticeably changes the estimated coefficients, standard errors, or statistical significance of the regression model. How influential a point is depends on its leverage (how far its predictor values are from the average of the predictor variables) and on its residual (how far it lies from the fitted line).

To detect influential points in a regression analysis, you can use various diagnostic techniques such as:

a. Cook's distance: This is a statistical measure that quantifies the influence of each observation by how much the fitted values change when that observation is left out. Points whose Cook's distance is much larger than the rest are considered influential.
b. Studentized residuals: These are residuals divided by their estimated standard deviations. Points with large absolute studentized residuals are outliers in y, and they can be influential, especially when they also have high leverage.
c. Leverage values: Leverage measures how far each data point is from the average value of predictors. Points with high leverage can have a disproportionate influence on the regression model.
d. Influence plots or added variable plots: These graphical techniques can help visualize the impact of each point on the regression model.
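
For a straight-line fit, the first three diagnostics can be computed directly from the hat matrix. The following is a minimal sketch using only NumPy, with made-up data. The function name `regression_diagnostics` is hypothetical, and the studentized residuals here are the internally studentized kind (each residual divided by its estimated standard deviation).

```python
import numpy as np

def regression_diagnostics(x, y):
    """Leverage, internally studentized residuals, and Cook's distance
    for a simple linear regression of y on x."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])    # design matrix: intercept, x
    p = X.shape[1]                          # number of fitted parameters
    hat = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix H
    leverage = np.diag(hat)                 # h_ii
    residuals = y - hat @ y                 # e = y - H y
    s2 = np.sum(residuals**2) / (n - p)     # residual variance estimate
    studentized = residuals / np.sqrt(s2 * (1.0 - leverage))
    cooks = studentized**2 * leverage / (p * (1.0 - leverage))
    return leverage, studentized, cooks

# Made-up data near y = 2x, plus one point far out in x and far off the line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 20.0])

leverage, studentized, cooks = regression_diagnostics(x, y)
for i in range(len(x)):
    print(f"x={x[i]:5.1f}  leverage={leverage[i]:.3f}  "
          f"studentized={studentized[i]:+.2f}  cooks={cooks[i]:.3f}")
```

In this example the last point should stand out on both leverage and Cook's distance. For real work, a statistics package will usually provide these measures ready-made.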

If influential points are detected, it's important to examine them further and evaluate their impact on your analysis. You can assess and address their influence by re-analyzing the data with or without these points, performing sensitivity analyses, or considering robust regression techniques that are less affected by influential observations.
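
As one illustration of a robust technique, the sketch below compares an ordinary least-squares slope with a Theil-Sen slope (the median of the slopes between all pairs of points). Theil-Sen is just one robust option picked for this example, and the data and the helper name `theil_sen` are made up for illustration.

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen estimate: slope is the median of all pairwise slopes,
    intercept is the median of y - slope * x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

# Made-up data near y = 2x, plus one influential point at x = 20.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 20.0])

ols_slope, ols_intercept = np.polyfit(x, y, 1)
ts_slope, ts_intercept = theil_sen(x, y)

print(f"least-squares slope: {ols_slope:.2f}")
print(f"Theil-Sen slope:     {ts_slope:.2f}")
```

The least-squares slope gets dragged toward the influential point, while the median-based slope should stay close to the trend of the other eight points. A large gap between the two is itself a hint that a few observations are driving the ordinary fit.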

Remember, both outliers and influential points can significantly impact the interpretation and conclusions drawn from your data analysis, so it's essential to handle them appropriately based on the context and objectives of your study.