
10-4 Variation and Prediction Intervals

Explained and unexplained variation

In this section, we study two measures used in correlation and regression studies: the coefficient of determination and the standard error of estimate. We also learn how to construct a prediction interval for y using a regression line and a given value of x. To study these concepts, we need to understand and calculate the total deviation, the explained deviation, and the unexplained deviation for each ordered pair in a data set.

Assume that we have a collection of paired data containing the sample point $(x_i, y_i)$, that $\hat{y}_i$ is the predicted value of y at $x_i$, and that the mean of the sample y-values is $\bar{y}$.

The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

$$\text{total variation} = \sum (y_i - \bar{y})^2$$

The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.

$$\text{explained variation} = \sum (\hat{y}_i - \bar{y})^2$$

The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.

$$\text{unexplained variation} = \sum (y_i - \hat{y}_i)^2$$

The sum of the explained and unexplained variations is equal to the total variation.

Total variation = Explained variation + Unexplained variation
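The identity above can be checked numerically. The following is a minimal sketch, assuming the eight advertising and sales pairs tabulated later in this section and a least-squares line fitted directly from them; the variable names are illustrative and not part of the original notes.

```python
# Advertising expenses (x) and company sales (y), both in 1000s of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept, computed from the data rather than
# taken as rounded values, so the identity holds up to floating-point error.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Predicted y-value for each x.
y_hat = [slope * xi + intercept for xi in x]

total = sum((yi - y_bar) ** 2 for yi in y)
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"total variation       = {total:.3f}")
print(f"explained variation   = {explained:.3f}")
print(f"unexplained variation = {unexplained:.3f}")
print(f"explained + unexplained = {explained + unexplained:.3f}")
```

The identity is exact only for the least-squares line; plugging in rounded coefficients such as 50.729 and 104.061 would reproduce it only approximately.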

As its name implies, the explained variation can be explained by the relationship between x and y. The unexplained variation cannot be explained by the relationship between x and y and is due to chance or other variables.

Consider the advertising and sales data used throughout this section, with a regression line of $\hat{y} = 50.729x + 104.061$.

Using the data point (2.0, 220), we can find the total, explained, and unexplained deviation:
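The arithmetic for this point can be sketched as follows, assuming $\bar{y} = 1634/8 = 204.25$ (the mean of the sales column in the data table of this section) and using the rounded regression coefficients given above:

$$
\begin{aligned}
\hat{y} &= 50.729(2.0) + 104.061 = 205.519 \\
\text{total deviation} &= y - \bar{y} = 220 - 204.25 = 15.75 \\
\text{explained deviation} &= \hat{y} - \bar{y} = 205.519 - 204.25 = 1.269 \\
\text{unexplained deviation} &= y - \hat{y} = 220 - 205.519 = 14.481
\end{aligned}
$$

The explained and unexplained deviations add up to the total deviation (1.269 + 14.481 = 15.75). Squaring each deviation and summing over all eight points gives the corresponding variations.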

The Coefficient of Determination

The coefficient of determination $r^2$ is the ratio of the explained variation to the total variation.

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

We can compute $r^2$ by using the definition or by squaring the linear correlation coefficient $r$.
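Both routes can be compared side by side. This is a minimal sketch, assuming the advertising and sales pairs from this section and the usual definition of the sample correlation coefficient, which these notes do not write out; all variable names are illustrative.

```python
import math

# Advertising expenses (x) and company sales (y), both in 1000s of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is the total variation
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: square the linear correlation coefficient r.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared_from_r = r ** 2

# Route 2: explained variation divided by total variation,
# using the least-squares line fitted to the same data.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
explained = sum((slope * xi + intercept - y_bar) ** 2 for xi in x)
r_squared_from_ratio = explained / s_yy

print(f"r = {r:.3f}")
print(f"r^2 from squaring r       = {r_squared_from_r:.4f}")
print(f"r^2 from explained / total = {r_squared_from_ratio:.4f}")
```

The two results agree up to floating-point error. Working from the unrounded r gives roughly 0.833, whereas squaring the rounded value 0.913 gives roughly 0.834, so small differences in the third decimal place are a rounding effect.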

Ex 1)

The correlation coefficient for the following advertising expenses and company sales data is 0.913. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation? ($r = 0.913$ suggests a strong positive linear correlation.)

$$r^2 = (0.913)^2 \approx 0.834$$

About 83.4% of the variation in the company sales can be explained by the variation in the advertising expenditures. About 16.6% of the variation is unexplained and is due to chance or other variables.

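| Advertising expenses (1000s of $), x | Company sales (1000s of $), y | xy | x² | y² |
|---|---|---|---|---|
| 2.4 | 225 | 540 | 5.76 | 50,625 |
| 1.6 | 184 | 294.4 | 2.56 | 33,856 |
| 2.0 | 220 | 440 | 4 | 48,400 |
| 2.6 | 240 | 624 | 6.76 | 57,600 |
| 1.4 | 180 | 252 | 1.96 | 32,400 |
| 1.6 | 184 | 294.4 | 2.56 | 33,856 |
| 2.0 | 186 | 372 | 4 | 34,596 |
| 2.2 | 215 | 473 | 4.84 | 46,225 |

Sums: Σx = 15.8, Σy = 1634, Σxy = 3289.8, Σx² = 32.44, Σy² = 337,558

$\bar{y} = 1634/8 = 204.25$, $\bar{x} = 15.8/8 = 1.975$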
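The column sums are enough to recover the regression line and the correlation coefficient quoted in this section. The sketch below assumes the standard computational formulas for the least-squares slope, intercept, and r, which these notes use but do not state; the variable names are illustrative.

```python
import math

# Column sums from the table above (n = 8 ordered pairs).
n = 8
sum_x = 15.8
sum_y = 1634
sum_xy = 3289.8
sum_x2 = 32.44
sum_y2 = 337558

x_bar = sum_x / n
y_bar = sum_y / n

# Shared numerator of the slope and of r.
numerator = n * sum_xy - sum_x * sum_y

# Least-squares slope and intercept of the regression line.
slope = numerator / (n * sum_x2 - sum_x ** 2)
intercept = y_bar - slope * x_bar

# Linear correlation coefficient.
r = numerator / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.2f}")
print(f"regression line: y_hat = {slope:.3f} x + {intercept:.3f}")
print(f"r = {r:.3f}")
```

Rounded to three decimal places, the slope, intercept, and r match the values 50.729, 104.061, and 0.913 used in the text.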

The Standard Error of Estimate

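The standard error of estimate $s_e$ is the standard deviation of the observed $y_i$-values about the predicted $\hat{y}$-value for a given $x_i$-value. It is given by

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

where $n$ is the number of ordered pairs in the data set.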
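As an illustration of the definition, the sketch below computes the standard error of estimate for the advertising and sales data of this section. It assumes a least-squares line fitted directly from the eight pairs; the variable names are illustrative.

```python
import math

# Advertising expenses (x) and company sales (y), both in 1000s of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares regression line fitted to the data.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Unexplained variation: squared residuals about the regression line.
unexplained = sum(
    (yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)
)

# Standard error of estimate, with n - 2 in the denominator.
s_e = math.sqrt(unexplained / (n - 2))

print(f"unexplained variation = {unexplained:.3f}")
print(f"s_e = {s_e:.3f}")
```

For this data set the result is about 10.29, in the same units as y (thousands of dollars).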

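Or as the following equivalent formula, where $m$ is the slope and $b$ is the y-intercept of the regression line $\hat{y} = mx + b$:

$$s_e = \sqrt{\frac{\sum y^2 - b \sum y - m \sum xy}{n - 2}}$$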

What percentage of variation is explained by the regression line? In Section 9.1, we calculated that $r = -0.969$, so $r^2 = 0.939$ and 93.9% of the variation is explained by the regression line (and 6.1% is due to random and unexplained factors).

2. A study involved comparing the per capita income (in thousands of dollars) to the number of medical doctors per 10,000 residents.

Author: Melody

CreationDate: Tue Apr 20 13:30:44 2010
