LibGuides: Statistics: Correlation Analysis

A correlation exists between two variables when one of them is related to the other in some way. Correlation analysis works with paired data: each value of one variable is matched with one value of the other, giving pairs (x, y).

Linear Correlation

•A linear correlation exists if there is a straight-line relationship between two variables. It is also called simple (Pearson) correlation.

The linear correlation coefficient is also known as the “Pearson correlation coefficient”.
It measures the strength and direction of the linear relationship between the paired values of quantitative data in a sample.
It is represented by r. For a sample of n pairs of two variables, x and y, the linear correlation coefficient r can be computed by the formula:
$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$

Notations:
n: Number of data pairs (sample size)
x: x-value (independent variable, e.g., cost of a pizza slice)
y: y-value (dependent variable, e.g., subway fare)
Σ: sum over all n data pairs (e.g., Σxy is the sum of the products x·y)
r: Linear correlation coefficient for a sample (measures the strength and direction of the linear relationship between x and y)
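The computational formula for r, written in terms of the sums Σx, Σy, Σx², Σy² and Σxy, translates directly into code. The following is a minimal Python sketch, assuming the data are two plain lists of equal length; the function name `pearson_r` is our own and not part of any library.

```python
import math

def pearson_r(xs, ys):
    """Sample linear (Pearson) correlation coefficient r, using the sums formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # r is undefined when all x values (or all y values) are identical
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, positive: r = 1 up to rounding
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, negative: r = -1 up to rounding
```

In practice a library routine would normally be used; the point here is only that r needs nothing more than the five column sums and n.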

Linear Correlation Coefficient

•It is a sample statistic: it is computed from sample data, and its value always lies between −1 and +1.

•The value of r does not change if all values of either variable are converted to a different scale.

Value of r	Interpretation
r > 0	Positive correlation
r < 0	Negative correlation
|r| = 1	Perfect correlation
r = 0	No linear correlation
|r| close to 1	Strong correlation
|r| close to 0	Weak correlation

Inferring a linear correlation between two variables in a population from a sample

Linear correlation between paired variables in a population
- If we had all population values for x and y, the result of using the formula would be a population parameter representing the correlation coefficient between the paired variables.
- The linear correlation coefficient for a population is represented by ρ (rho).
For this inference, we will use the standard table of critical values for the Pearson correlation coefficient.
- We will NOT find the value of ρ itself.
- Instead, we will use the sample to infer whether there really is a linear correlation between the paired variables in the population.
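Why is an inference step needed at all? Even when the population has no linear correlation (population coefficient ρ = 0), a small sample can produce a sizeable r purely by chance. The Python sketch below is an illustration under assumed conditions (x and y drawn independently from a standard normal distribution, n = 5, a fixed random seed). It estimates how often |r| exceeds 0.878, the table value for n = 5 at α = 0.05 that is used in the example; the proportion should come out at roughly 5%, which is what a critical value at that significance level means.

```python
import math
import random

def sample_r(xs, ys):
    """Pearson r computed from deviations about the sample means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def share_exceeding(critical, n=5, trials=20_000, seed=1):
    """Fraction of simulated samples with |r| > critical when x and y are independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(sample_r(xs, ys)) > critical:
            hits += 1
    return hits / trials

print(share_exceeding(0.878))  # expected to be roughly 0.05
```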

The Pearson Correlation Coefficient Standard Table

The standard table gives the minimum value that |r| from a sample must exceed in order to support (not prove) the conclusion of a linear correlation between the variables in the population. This minimum depends on the sample and test properties:
- Sample size
- Significance level

The values in the standard table can be viewed as critical values for the inference test.
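Printed tables of critical values are typically derived from the t distribution: assuming the paired data come from a bivariate normal population with ρ = 0, the statistic t = r√(n − 2) / √(1 − r²) follows a t distribution with n − 2 degrees of freedom. The Python sketch below (standard library only; all function names are our own) computes the two-tailed p-value of r through the regularized incomplete beta function, using the continued-fraction method described in Numerical Recipes, and finds the critical value by bisection. A statistics package would normally be used instead; this is only meant to show where the table values come from.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_regularized(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def r_p_value(r, n):
    """Two-tailed p-value for H0: rho = 0, given sample r from n pairs (df = n - 2)."""
    df = n - 2
    # P(|T| > t) = I_{df/(df + t^2)}(df/2, 1/2), and df/(df + t^2) simplifies to 1 - r^2
    return betainc_regularized(df / 2.0, 0.5, 1.0 - r * r)

def r_critical(n, alpha=0.05):
    """Smallest |r| that is significant at level alpha (two-tailed), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_p_value(mid, n) > alpha:
            lo = mid  # not significant yet: the critical value is larger
        else:
            hi = mid
    return hi

for n in (4, 5, 10):
    print(n, round(r_critical(n), 3))
```

For n = 5 and α = 0.05 this reproduces the table value 0.878 used in the example.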

The inferences & result interpretation

Calculate the linear correlation coefficient (𝑟) for the paired values in a sample.

If |r| exceeds the critical value in the table, conclude that there is a linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.
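The decision rule can be written as a small function. In this Python sketch the critical values are an excerpt of the standard table for α = 0.05, keyed by sample size n (n = 4 to 10); the dictionary and function names are hypothetical, and values for other n or α would have to come from the full table.

```python
# Excerpt of the Pearson critical-value table, alpha = 0.05 (two-tailed), keyed by n
CRITICAL_R_ALPHA_05 = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632}

def linear_correlation_supported(r, n, table=CRITICAL_R_ALPHA_05):
    """Decision rule: a linear correlation is supported only if |r| exceeds the critical value."""
    if n not in table:
        raise KeyError(f"no critical value stored for n = {n}")
    return abs(r) > table[n]

print(linear_correlation_supported(0.9955, 5))  # True: 0.9955 > 0.878
print(linear_correlation_supported(-0.80, 6))   # False: 0.80 < 0.811
```

Because the rule uses |r|, a strong negative correlation is supported in exactly the same way as a strong positive one.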

Requirements for the sample

The sample of paired (x, y) data must be a random sample of independent quantitative data.

Visual examination of the scatter-plot must confirm that the points approximate a straight-line pattern

Any outliers must be removed if they are known to be errors; otherwise, their effect must be considered by calculating r both with and without them.

Example

In a production operation to make steel plates, the price of a kilogram of steel was tracked against the price of a kilowatt-hour of electricity used to run the production equipment.

The production engineer in charge suspected that a relationship existed between these two prices.

The following table was recorded over a 5-week period. Find the correlation coefficient r for the paired steel price / electricity price values given in the table.

Week	Steel Price (£/kg)	Electricity Price (£/kWh)
Week 1	£2.50	£0.10
Week 2	£2.55	£0.25
Week 3	£2.64	£0.40
Week 4	£2.78	£0.60
Week 5	£2.92	£0.90

•x = steel price, y = electricity price, n = 5 (one data pair per week)

Steel (x)	Electricity (y)	x²	y²	xy
2.50	0.10	6.2500	0.0100	0.2500
2.55	0.25	6.5025	0.0625	0.6375
2.64	0.40	6.9696	0.1600	1.0560
2.78	0.60	7.7284	0.3600	1.6680
2.92	0.90	8.5264	0.8100	2.6280
Σx = 13.39	Σy = 2.25	Σx² = 35.9769	Σy² = 1.4025	Σxy = 6.2395
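Substituting the column totals (with n = 5) into the computational formula for r gives:

```latex
\begin{aligned}
n\sum xy - \left(\sum x\right)\left(\sum y\right) &= 5(6.2395) - (13.39)(2.25) = 31.1975 - 30.1275 = 1.0700 \\
n\sum x^2 - \left(\sum x\right)^2 &= 5(35.9769) - (13.39)^2 = 179.8845 - 179.2921 = 0.5924 \\
n\sum y^2 - \left(\sum y\right)^2 &= 5(1.4025) - (2.25)^2 = 7.0125 - 5.0625 = 1.9500 \\
r &= \frac{1.0700}{\sqrt{0.5924}\,\sqrt{1.9500}} = \frac{1.0700}{1.0748} \approx 0.9955
\end{aligned}
```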

Based on the computed value of r, is it safe to support the conclusion of a linear correlation at a 95% confidence level?

•For a 95% confidence level, α = 0.05; n = 5.

•From the Pearson correlation coefficient table, the critical value for n = 5 (df = n − 2 = 3) at α = 0.05 is 0.878.

If the absolute value of the computed value of r exceeds the value in the table, conclude that there is a linear correlation

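Here, the computed value is r ≈ 0.9955.

Critical value from the table = 0.878

Since |r| = 0.9955 > 0.878, we can conclude that there is a linear correlation between steel price and electricity price at the 95% confidence level (α = 0.05).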
