Correlation and Least Squares Regression

A long term study of changing environmental conditions in Chesapeake Bay found the following
annual average salinity readings in one location in the bay:

Year Salinity (%)
1971 13.4
1972 9.8
1973 15.1
1974 14.7
1975 15.1
1976 14
1977 15.7
1978 16
1979 16
1980 13.8
1981 17.9
1982 20
1983 16
1984 15.4

a) Make a plot of salinity against time. Was salinity generally increasing or decreasing
over these years? Is there an overall straight line trend over time?

b) What is the correlation between salinity and year? What percentage of the observed
variation in salinity is accounted for by straight line change over time?

c) Find the least squares regression line for predicting salinity from year. Explain in simple
language what the slope of this line tells you about Chesapeake Bay.

d) If the trend in these past data had continued, what would be the average salinity at this point in the bay in 1988?

a) To make a plot of salinity against time, you will need to create a scatter plot with the year on the x-axis and the salinity on the y-axis. Each data point represents a year and its corresponding salinity reading. By visually examining the plot, you can determine whether the salinity is generally increasing or decreasing over the years and whether there is an overall straight line trend.
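If no plotting tool is at hand, a rough text rendering is enough to see the pattern. The sketch below is only an illustration: the names `years`, `salinity`, and `text_plot` are made up for this example, and the year runs down the page instead of along the x-axis, with each `*` placed further right for a higher salinity reading.

```python
# Salinity readings copied from the table in the question.
years = list(range(1971, 1985))
salinity = [13.4, 9.8, 15.1, 14.7, 15.1, 14, 15.7, 16, 16, 13.8, 17.9, 20, 16, 15.4]


def text_plot(xs, ys, width=40):
    """Return one text line per point, with '*' placed in proportion to y."""
    lo, hi = min(ys), max(ys)
    lines = []
    for x, y in zip(xs, ys):
        # Scale y into a column between 0 (lowest value) and width - 1 (highest).
        col = round((y - lo) / (hi - lo) * (width - 1))
        lines.append(f"{x} |{' ' * col}* {y}")
    return lines


plot_lines = text_plot(years, salinity)
print("\n".join(plot_lines))
```

In the output the markers drift to the right as the years go by, which suggests that salinity was generally increasing. The points are fairly scattered, though, with 1972 standing out as unusually low and 1982 as unusually high.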

b) To find the correlation between salinity and year, you can calculate the correlation coefficient. The correlation coefficient measures the strength and direction of the linear relationship between two variables. In this case, the year and the salinity are the two variables. The correlation coefficient ranges from -1 to 1. A positive value indicates a positive correlation, meaning that as one variable increases, the other tends to increase as well. On the other hand, a negative value indicates a negative correlation, meaning that as one variable increases, the other tends to decrease. A correlation coefficient close to 0 indicates a weak or no linear relationship.

To calculate the correlation coefficient, you can use the following formula:

r = (n * Σ(xy) - Σx * Σy) / sqrt((n * Σx^2 - (Σx)^2) * (n * Σy^2 - (Σy)^2))

In this formula, n represents the number of data points (here n = 14), Σxy is the sum of the products of each year and its corresponding salinity, Σx is the sum of the years, Σy is the sum of the salinity values, Σx^2 is the sum of the squared years, and Σy^2 is the sum of the squared salinity values.

To calculate the percentage of the observed variation in salinity accounted for by the straight line change over time, you can square the correlation coefficient (r) and multiply it by 100. This will give you the percentage of variation in salinity that can be explained by the linear relationship with the year.
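The correlation formula can be applied to the table's data directly. The following is a minimal sketch, not the only way to do the calculation; the function name `correlation` and the variable names are chosen just for this example.

```python
import math

# Salinity readings copied from the table in the question.
years = list(range(1971, 1985))
salinity = [13.4, 9.8, 15.1, 14.7, 15.1, 14, 15.7, 16, 16, 13.8, 17.9, 20, 16, 15.4]


def correlation(xs, ys):
    """Correlation coefficient r, computed from the raw sums in the formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(years, salinity)
percent_explained = r ** 2 * 100  # share of variation explained, in percent
print(f"r = {r:.3f}, r^2 = {percent_explained:.1f}%")
```

For these data r comes out at roughly 0.64, a moderate positive correlation, so about 41% of the observed variation in salinity is accounted for by the straight line change over time.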

c) To find the least squares regression line for predicting salinity from the year, you will need to calculate the slope and y-intercept of the line. The slope tells us how much the predicted salinity changes for each additional year. The y-intercept is the predicted salinity when the year is 0; that is far outside the range of the data, so it has no practical meaning here and serves only to position the line. The equation of the least squares regression line can be written as:

y = mx + b

where y is the predicted salinity, x is the year, m is the slope, and b is the y-intercept.

To find the slope and y-intercept, you can use the following formulas:

m = (n * Σ(xy) - Σx * Σy) / (n * Σx^2 - (Σx)^2)

b = (Σy - m * Σx) / n
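The two formulas translate into a few lines of code. This is a sketch under the same assumptions as the formulas above; `least_squares_line`, `slope`, and `intercept` are names introduced only for this example.

```python
# Salinity readings copied from the table in the question.
years = list(range(1971, 1985))
salinity = [13.4, 9.8, 15.1, 14.7, 15.1, 14, 15.7, 16, 16, 13.8, 17.9, 20, 16, 15.4]


def least_squares_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


slope, intercept = least_squares_line(years, salinity)
print(f"salinity = {slope:.4f} * year + ({intercept:.2f})")
```

The slope is about 0.35, so in simple language: over these years, the salinity at this location rose by roughly 0.35 percentage points per year on average. The intercept is about -682, which is not a possible salinity and confirms that it should not be interpreted on its own.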

d) To predict the average salinity in 1988 based on the trend in the past data, you can use the equation of the least squares regression line. Simply substitute the year 1988 into the equation and calculate the corresponding predicted salinity value.
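The substitution can be sketched as follows. The block recomputes the slope and intercept so that it runs on its own; `predict_salinity` and the other names are illustrative only.

```python
# Salinity readings copied from the table in the question.
years = list(range(1971, 1985))
salinity = [13.4, 9.8, 15.1, 14.7, 15.1, 14, 15.7, 16, 16, 13.8, 17.9, 20, 16, 15.4]

# Slope m and intercept b from the least squares formulas.
n = len(years)
sum_x = sum(years)
sum_y = sum(salinity)
sum_xy = sum(x * y for x, y in zip(years, salinity))
sum_x2 = sum(x * x for x in years)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n


def predict_salinity(year):
    """Predicted salinity from the fitted line y = m*x + b."""
    return m * year + b


predicted_1988 = predict_salinity(1988)
print(f"Predicted salinity in 1988: {predicted_1988:.1f}")
```

The prediction is about 18.9. Because 1988 lies four years beyond the last observation and the line explains well under half of the variation in salinity, this figure should be treated as a rough extrapolation, not a reliable forecast.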