LibGuides: Statistics : Regression Analysis

Suppose we know that two paired variables have a linear relationship.

Linear regression is a tool for quantitatively describing the linear relationship between two variables.

This is done by finding the graph and the equation of the straight line that best represents the relationship.

The best-fitting straight line is called the regression line, and its equation is called the regression equation.

Equation of Linear Regression

For two paired variables that have a perfect linear correlation, we know from algebra that there is a straight line (all paired values lie exactly on the line) with the equation shown below:

y = mx + b

where m is the slope of the line and b is its y-intercept.

Basic Concepts of Regression

In most cases, two variables do not have a perfect linear correlation, so even the best-fitting straight line cannot pass exactly through every pair of values.

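Given a collection of paired sample data, the regression equation (the equation of the best-fitting line) is expressed as:

ŷ = mx + b

where ŷ is the predicted value of y, m is the slope, and b is the y-intercept.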

Residuals

For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the value ŷ that is predicted by using the regression equation:

Residual = y - ŷ

Example:

Find the regression equation for the following paired data:

x = 1, 2, 4, 5

y = 4, 24, 8, 32
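The example can be worked through with a few lines of Python. This is a minimal sketch, not the guide's own method: the helper name `fit_line` is hypothetical, and it assumes the mean-deviation form of the least-squares slope (the sum of products of deviations divided by the sum of squared x deviations). It also lists the residual y - ŷ for each pair.

```python
# Paired sample data from the example above.
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data.

    Assumption: uses the mean-deviation form of the slope formula.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Predicted values and residuals (observed y minus predicted y).
predicted = [slope * x + intercept for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"regression equation: y_hat = {slope:g}x + {intercept:g}")
for x, y, y_hat, res in zip(xs, ys, predicted, residuals):
    print(f"x={x}  y={y}  y_hat={y_hat:g}  residual={res:g}")
```

For these four points the residuals sum to zero, which is a general property of a least-squares line that includes an intercept.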

How to find the regression line and equation

The least-squares method

The regression line is the line whose equation makes the sum of the squares of the residuals as small as possible. For a given sample of paired x and y values, the slope m and the intercept b of the regression equation can be found with the following equations:

m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

b = (ΣyΣx² - ΣxΣxy) / (nΣx² - (Σx)²)

The slope and the intercept of the regression equation can also be computed with the following equations:

m = r · sy / sx

where r is the linear correlation coefficient, sy is the standard deviation of the observed y values, and sx is the standard deviation of the observed x values.

b = ȳ - m·x̄

where ȳ is the mean of the observed y values and x̄ is the mean of the observed x values.
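The two ways of computing the slope and intercept can be compared numerically. The sketch below assumes the usual textbook forms of both (a sum-based form using n, Σx, Σy, Σx², and Σxy, and a correlation-based form using r, sy, sx, and the two means). The function names are hypothetical, and the data are the x = 1, 2, 4, 5 and y = 4, 24, 8, 32 pairs from the earlier example. It also checks the least-squares property by comparing the sum of squared residuals of the fitted line with that of slightly shifted lines.

```python
from math import sqrt
from statistics import mean, stdev

xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]


def line_from_sums(xs, ys):
    """Slope and intercept from n and the sums of x, y, x^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return slope, intercept


def line_from_r(xs, ys):
    """Slope as r * sy / sx, intercept as y_bar - slope * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # linear correlation coefficient
    slope = r * stdev(ys) / stdev(xs)
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(slope, intercept):
    """Sum of squared residuals of the line y_hat = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


m1, b1 = line_from_sums(xs, ys)
m2, b2 = line_from_r(xs, ys)
print(f"from sums: m = {m1:.4f}, b = {b1:.4f}")
print(f"from r:    m = {m2:.4f}, b = {b2:.4f}")

# Shifting the fitted line in any direction should increase the sum.
best = sum_squared_residuals(m1, b1)
shifted = [sum_squared_residuals(m1 + dm, b1 + db)
           for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]]
print(f"fitted line: {best:.2f}; shifted lines: {[round(s, 2) for s in shifted]}")
```

Both routes give the same line up to floating-point rounding, so the choice between them mostly depends on which summary statistics are already available.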

Example

In New York in the 1960s, it was noted that the cost of a slice of pizza was the same as the cost of a subway ride. Over the years, it was also noted that the two costs seemed to increase by about the same amounts.

Year	Pizza Slice	Subway Fare
1960	$0.15	$0.15
1973	$0.35	$0.35
1986	$1.00	$1.00
1995	$1.25	$1.35
2002	$1.75	$1.50
2003	$2.00	$2.00

Cost of Pizza Slice (x)	Subway Fare (y)	x²	xy
$0.15	$0.15	$0.0225	$0.0225
$0.35	$0.35	$0.1225	$0.1225
$1.00	$1.00	$1.0000	$1.0000
$1.25	$1.35	$1.5625	$1.6875
$1.75	$1.50	$3.0625	$2.6250
$2.00	$2.00	$4.0000	$4.0000
Σ x = $6.50	Σ y = $6.35	Σ x² = 9.77	Σ xy = 9.4575

n = 6
Σx = 6.50
Σy = 6.35
Σx² = 9.77
Σxy = 9.4575

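Calculations:

Slope: m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (6 × 9.4575 - 6.50 × 6.35) / (6 × 9.77 - 6.50²) = 15.47 / 16.37 ≈ 0.9450
Intercept: b = (Σy - mΣx) / n = (6.35 - 0.9450 × 6.50) / 6 ≈ 0.0346

Final Equation:

ŷ = 0.9450x + 0.0346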
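The arithmetic in this example can be rechecked with a short script. The sketch below recomputes the column sums from the table and then applies the sum-based slope formula, with the intercept taken as the mean of y minus the slope times the mean of x. The variable names are illustrative only, and results are rounded to four decimal places for comparison with the values above.

```python
# Costs from the table above: pizza slice (x) and subway fare (y), in dollars.
pizza = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
subway = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]

n = len(pizza)
sum_x = sum(pizza)
sum_y = sum(subway)
sum_x2 = sum(x * x for x in pizza)
sum_xy = sum(x * y for x, y in zip(pizza, subway))

# Least-squares slope from the sums, then intercept as y_bar - m * x_bar.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

print(f"n = {n}, sum x = {sum_x:.2f}, sum y = {sum_y:.2f}")
print(f"sum x^2 = {sum_x2:.4f}, sum xy = {sum_xy:.4f}")
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
```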

Requirements for the sample

The sample of paired (x, y) data is a random sample of independent quantitative data.

Visual examination of the scatter-plot must confirm that the points approximate a straight-line pattern.

Outliers known to be errors should be removed, and outliers that are not errors should be considered.

Using Regression for Predictions

Considerations:
Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables.

Use the regression line for predictions only if the values involved do not go much beyond the scope of the available sample data.
Only make predictions if the regression line fits the points on the graph reasonably well.

If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean.
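These considerations can be sketched as a small decision routine. Everything named here is an assumption for illustration: the function `best_predicted_value` is hypothetical, the critical value of r must be supplied by the caller (0.811 is the commonly tabulated value for n = 6 at the 0.05 significance level), and the pizza cost of $1.50 is a made-up input. The visual check that the line fits the scatterplot is not automated here.

```python
from math import sqrt


def best_predicted_value(xs, ys, x_new, r_critical):
    """Return (predicted y, note) for x_new.

    Assumption: r_critical comes from a table of critical values of the
    linear correlation coefficient for the given sample size.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)

    # No evidence of linear correlation: fall back to the sample mean of y.
    if abs(r) <= r_critical:
        return y_bar, "no linear correlation: using the sample mean of y"

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    note = "using the regression equation"
    # Flag predictions outside the range of the observed x values.
    if not min(xs) <= x_new <= max(xs):
        note += " (x is outside the sample range; treat with caution)"
    return slope * x_new + intercept, note


pizza = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
subway = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]

value, note = best_predicted_value(pizza, subway, x_new=1.50, r_critical=0.811)
print(f"predicted subway fare: ${value:.2f} ({note})")

# With an artificially strict threshold, the routine falls back to the mean.
fallback, fallback_note = best_predicted_value(pizza, subway, 1.50, r_critical=0.999)
print(f"fallback prediction: ${fallback:.2f} ({fallback_note})")
```

Returning a note alongside the number keeps the reason for each prediction visible, which matters because the fallback to the sample mean ignores x entirely.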