Curvature and Nonconstant Variance - Winona State University
Testing for Curvature ~ Black Cherry Tree Data
We begin by examining a scatterplot matrix for these data.
The marginal response plot of Vol vs. D shows evidence of a nonlinearity. The Ht vs. D panel suggests that Ht and D are linearly related. It is also worthwhile examining a 3-D spin plot of Vol vs. (Ht,D). Try fitting a plane and examining the residuals graphically. Also try removing linear trend from the plot and spinning. Is there evidence of lack of fit for the planar model exhibited in the 3-D plot? (yes, check it out)
We will now examine the results of fitting the mean function
E(Vol | Ht, D) = η0 + η1 Ht + η2 D
The regression summary is given below:
Data set = Trees, Name of Fit = L1
Normal Regression
Kernel mean function = Identity
Response = Vol
Terms = (D Ht)
Coefficient Estimates
Label      Estimate    Std. Error   t-value
Constant   -57.9877    8.63823      -6.713
D          4.70816     0.264265     17.816
Ht         0.339251    0.130151     2.607
R Squared: 0.94795
Sigma hat: 3.88183
Number of cases: 31
Degrees of freedom: 28
Summary Analysis of Variance Table
Source        df   SS        MS        F        p-value
Regression    2    7684.16   3842.08   254.97   0.0000
Residual      28   421.921   15.0686
Lack of fit   26   421.716   16.2199   158.24   0.0063
Pure Error    2    0.205     0.1025
Note that the p-value for the lack-of-fit (LOF) test (p = 0.0063) suggests that this model may be inappropriate. We will now examine a plot of the residuals vs. the fitted values.
There is definitely evidence of lack of fit. The linear mean function above is not supported by this residual plot.
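The lack-of-fit line in the ANOVA table can be checked by hand. The sketch below is not part of Arc; the function name lof_f_test is made up for illustration. It recomputes the F statistic from the sums of squares in the table and gets the p-value from a closed form that only applies when the pure-error term has 2 degrees of freedom, as it does here.

```python
# Values copied from the Summary Analysis of Variance Table for fit L1.
ss_lof, df_lof = 421.716, 26   # lack of fit
ss_pe, df_pe = 0.205, 2        # pure error

def lof_f_test(ss_lof, df_lof, ss_pe, df_pe):
    """Return (F, p-value) for the lack-of-fit test (hypothetical helper)."""
    ms_lof = ss_lof / df_lof
    ms_pe = ss_pe / df_pe
    f = ms_lof / ms_pe
    if df_pe != 2:
        raise ValueError("the closed-form tail below assumes 2 denominator df")
    # If F ~ F(n, 2) then 1/F ~ F(2, n), whose CDF is 1 - (1 + 2x/n)^(-n/2).
    # Hence P(F > f) = P(1/F < 1/f) = 1 - (1 + 2/(n*f))^(-n/2).
    p = 1.0 - (1.0 + 2.0 / (df_lof * f)) ** (-df_lof / 2.0)
    return f, p

f_stat, p_value = lof_f_test(ss_lof, df_lof, ss_pe, df_pe)
print(round(f_stat, 2), round(p_value, 4))   # 158.24 0.0063
```

With only 2 pure-error degrees of freedom the test has little power, so the residual plot remains the more informative diagnostic.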
Testing for Curvature ~ Tukey's Test for Nonadditivity
Following the procedure outlined in class, we first fit the linear model above and save the fitted values by selecting the Add to dataset… option from the model's pull-down menu. Then use the Transform… option to square the fitted values. Now regress Vol on Ht, D, and L1.Fit-Values^2. The results are shown below:
Normal Regression
Kernel mean function = Identity
Response = Vol
Terms = (D Ht L1.Fit-Values^2)
Coefficient Estimates
Label             Estimate      Std. Error    t-value
Constant          -16.5517      8.93717       -1.852
D                 1.48988       0.560572      2.658
Ht                0.207008      0.0891259     2.323
L1.Fit-Values^2   0.00971504    0.00160720    6.045
R Squared: 0.977882
Sigma hat: 2.5769
Number of cases: 31
Degrees of freedom: 27
Summary Analysis of Variance Table
Source        df   SS        MS        F        p-value
Regression    3    7926.79   2642.26   397.91   0.0000
Residual      27   179.291   6.64042
Lack of fit   25   179.086   7.16346   69.89    0.0142
Pure Error    2    0.205     0.1025
t dist. with 27 df, value = 6.045, two-tail probability = 1.87924e-06
The test statistic for Tukey's Test for Nonadditivity is the t-value for the L1.Fit-Values^2 term in this model. Here t = 6.045 on 27 df (two-tailed p-value = .0000019), so we reject the null hypothesis (NH) that the coefficient of the squared fitted values is zero and conclude there is sufficient evidence of nonadditivity.
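The three steps of the procedure (fit, square the fitted values, refit) can be sketched outside Arc as follows. This is a minimal illustration rather than Arc's implementation. The helper names solve, ols, and tukey_t are made up, and the data are synthetic values generated to have curvature, not the black cherry tree measurements.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least squares for y on X (X already holds the constant column).
    Returns (coefficients, fitted values, t-values)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / (n - p)
    # j-th diagonal element of (X'X)^-1, obtained by solving against e_j.
    diag = [solve(xtx, [1.0 if k == j else 0.0 for k in range(p)])[j]
            for j in range(p)]
    t = [b / math.sqrt(sigma2 * d) for b, d in zip(beta, diag)]
    return beta, fitted, t

def tukey_t(predictors, y):
    """t-value of the squared fitted values added to the linear model."""
    X1 = [[1.0] + list(row) for row in predictors]
    _, fitted, _ = ols(X1, y)                       # step 1: linear fit
    X2 = [row + [f * f] for row, f in zip(X1, fitted)]  # step 2: add fit^2
    return ols(X2, y)[2][-1]                        # step 3: t for fit^2

# Synthetic data (illustration only): the response grows like d^2 * h.
d = [8.0 + 0.4 * i for i in range(31)]
h = [60.0 + (7 * i) % 31 for i in range(31)]
vol = [0.002 * di * di * hi + 0.1 * math.sin(i)
       for i, (di, hi) in enumerate(zip(d, h))]

t_curv = tukey_t(list(zip(d, h)), vol)
print(t_curv > 2.0)   # True: the curvature is detected
```

The resulting t-value is referred to a t distribution with n - p - 1 degrees of freedom, where p counts the coefficients in the original linear fit (31 - 3 - 1 = 27 for the tree data).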
An easier way to obtain these results is to use the Residual plots… option from the model menu. Below is a plot of the residuals vs. the fitted values obtained in this manner.
Notice that the results of Tukey's Test for Nonadditivity are given above the plot. You can also test for significant nonadditivity as a function of the individual terms in the model. By clicking on the slider for the horizontal axis we can obtain plots of the residuals vs. Ht & D as well as the corresponding nonadditivity test. The results are shown below:
The test for nonadditivity as a function of diameter (D) is significant, while the nonadditivity test for height (Ht) is mildly significant (not shown, p < .10). Based on these results we would conclude there is significant curvature present and some reformulation of the mean function is needed. For example, log transforming Vol, Ht, and D does not result in significant nonadditivity.
Testing for Nonconstant Variance ~ Transaction Data
Below is a plot of the residuals vs. the fitted values from fitting the mean function
E(Time | T1, T2) = η0 + η1 T1 + η2 T2
In working with these data previously you have used weighted least squares to account for the fact that Var(Time|T1,T2) is not constant. We will now examine formal tests to determine whether nonconstant variance is present and, secondly, to gain some insight into the form of the variance function. First we will test whether the variance depends significantly on the estimated mean function. In Arc select Nonconstant variance plot… from the model menu, which gives the plot and test result shown below.
The plot clearly shows nonconstant error variance. The score test for nonconstant variance suggests that the variance changes with the value of the estimated mean, E(Time|T1,T2) (χ² = 61.66, df = 1, p < .001).
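For readers who want to see what lies behind such a statistic, here is a minimal sketch of the usual score test when the variance is allowed to depend on a single variate, here the fitted values. It assumes the standard form of the test: the squared residuals are scaled by RSS/n, regressed on the variate, and half the regression sum of squares is referred to a chi-square distribution with 1 df. Arc's internal computation may differ in detail. The function name ncv_score_test and the residuals are made up for illustration.

```python
import math

def ncv_score_test(fitted, residuals):
    """Score test for variance depending on one variate (hypothetical helper).
    Returns (score statistic, upper-tail chi-square p-value with 1 df)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n    # RSS / n, divisor n not n - p
    u = [e * e / sigma2 for e in residuals]       # scaled squared residuals
    zbar = sum(fitted) / n
    ubar = sum(u) / n
    sxy = sum((z - zbar) * (ui - ubar) for z, ui in zip(fitted, u))
    sxx = sum((z - zbar) ** 2 for z in fitted)
    ss_reg = sxy * sxy / sxx                      # regression SS of u on fitted
    score = ss_reg / 2.0
    # For 1 df the chi-square upper tail is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(score / 2.0))
    return score, p

# Made-up residuals whose spread grows with the fitted value.
fit = [float(i) for i in range(1, 51)]
res = [(-1) ** i * 0.1 * f for i, f in enumerate(fit)]
score, p = ncv_score_test(fit, res)
print(score > 3.84, p < 0.05)   # True True
```

With several variance terms, as in the T1 and T2 test discussed next, the same idea applies with a multiple regression of the scaled squared residuals on those terms and df equal to the number of terms.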
You can specify a different linear combination of T1 and T2 to use for testing the nonconstancy of the variance by clicking on the Variance terms menu to the left of the NCV plot. Here we can specify the variance depends on T1 and T2, but not in the exact form specified by the estimated mean function. The result is shown below.
The score test again suggests that the variance is not constant (χ² = 82.93, df = 2, p < .001). Which of these two forms for the variance is more appropriate for these data? We can perform a hypothesis test to answer this question.
NH: variance changes with the estimated mean E(Time|T1,T2)
log(Var(Time|T1,T2)) = log(σ²) + γ E(Time|T1,T2)
AH: variance changes as a function of T1 and T2
log(Var(Time|T1,T2)) = log(σ²) + α1 T1 + α2 T2
The test statistic is the difference in the chi-square statistics above.
χ²diff = χ²AH − χ²NH = 82.93 − 61.66 = 21.27 ~ χ² with df = 2 − 1 = 1
The associated p-value for the test statistic is:
Chisq dist. with 1 df, value = 21.27, upper-tail probability = .000
Thus we reject the NH and conclude that the estimated mean does not adequately model the variance and an arbitrary linear combination of T1 and T2 is needed. In similar fashion we could test if the total number of transactions, S = T1 + T2, adequately models the variance. (Note: to do this in Arc simply add the variate S = T1 + T2 to the data set.) The results of the test suggest that S = T1 + T2 does not adequately model the variance either.
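The arithmetic of the nested comparison above is easy to reproduce. The sketch below uses only the two score statistics reported earlier. The helper chi2_upper_tail is a made-up name and handles just the 1 and 2 df cases, which have simple closed forms.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability for df = 1 or 2 (hypothetical helper)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

chi2_nh, df_nh = 61.66, 1   # variance depends on the estimated mean
chi2_ah, df_ah = 82.93, 2   # variance depends on T1 and T2

diff = chi2_ah - chi2_nh
df_diff = df_ah - df_nh
p_diff = chi2_upper_tail(diff, df_diff)
print(round(diff, 2), df_diff, p_diff < 0.001)   # 21.27 1 True
```

The same two lines of arithmetic apply to the comparison with S = T1 + T2, using the score statistic Arc reports for S in place of chi2_nh.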