# In regression, when is the coefficient of determination zero?

The coefficient of determination is a measure of how much of the original uncertainty in the data is explained by the regression model.

The coefficient of determination, $r^2$, is defined as

$r^2=\frac{S_t-S_r}{S_t}$

where

$S_t$ = sum of the squares of the differences between the observed y values and the average value of y, $S_t=\sum_{i=1}^{n}(y_i-\bar{y})^2$

$S_r$ = sum of the squares of the residuals, a residual being the difference between an observed value and the value predicted by the regression curve, $S_r=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$
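
To make these definitions concrete, here is a minimal Python sketch (not from the original post; the data values are made up for illustration) that computes $S_t$, $S_r$, and $r^2$ for a straight-line least-squares fit using numpy.polyfit:

```python
import numpy as np

# Made-up data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Fit a straight line y = a + b*x by least squares
# np.polyfit returns the coefficients highest degree first: [slope, intercept]
b, a = np.polyfit(x, y, 1)
y_pred = a + b * x

S_t = np.sum((y - np.mean(y)) ** 2)   # sum of squares about the mean of y
S_r = np.sum((y - y_pred) ** 2)       # sum of squares of the residuals

r_squared = (S_t - S_r) / S_t
print(S_t, S_r, r_squared)
```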

The coefficient of determination varies between 0 and 1. A value of zero means that no benefit is gained by doing the regression. When can that be?

One case comes to mind right away – what if you have only one data point? For example, if I have only one student in my class and the class average is 80, I know just from the class average that the student's score is 80. Regressing the student's score against the number of hours studied, GPA, or gender would not be of any benefit. In this case, the value of the coefficient of determination is zero.

What if we have more than one data point? Is it possible to get the coefficient of determination to be zero?

The answer is yes. Look at the following data pairs (1,3), (3,-2), (5,4), (7,-5), (9,4.2), (11,3), (2,4). If one regresses this data to a general straight line

$y=a+bx$,

one gets the regression line to be

$y=1.6$

In fact, 1.6 is the average value of the given y values. Is this a coincidence? No (see Question 2 below). Because the regression line is the constant line $y=\bar{y}$, every predicted value equals the average of the y values, so $S_t=S_r$, implying $r^2=0$.
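
A quick way to confirm this result is to fit the line numerically. The following Python sketch (added here as an illustration, using numpy.polyfit) reproduces the zero slope, the intercept of 1.6, and $r^2=0$ up to round-off:

```python
import numpy as np

# The data pairs from the example above
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 2.0])
y = np.array([3.0, -2.0, 4.0, -5.0, 4.2, 3.0, 4.0])

slope, intercept = np.polyfit(x, y, 1)   # least-squares line y = intercept + slope*x
y_pred = intercept + slope * x

S_t = np.sum((y - np.mean(y)) ** 2)
S_r = np.sum((y - y_pred) ** 2)
r_squared = (S_t - S_r) / S_t

print(slope, intercept)   # slope is numerically zero, intercept is 1.6 = mean of y
print(r_squared)          # zero, up to floating-point round-off
```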

QUESTIONS

1. Given (1,3), (3,-2), (5,4), (7,a), (9,4.2), find the value of a that gives the coefficient of determination $r^2=0$. Hint: Write the expression for $S_r$ for the regression line $y=mx+c$. We now have three unknowns: $m$, $c$, and $a$. The three equations then are $\frac{\partial S_r}{\partial m}=0$, $\frac{\partial S_r}{\partial c}=0$, and $S_t=S_r$. (A short numerical check for a candidate value of $a$ is sketched after these questions.)
2. Show that if n data pairs $(x_1,y_1),\ldots,(x_n,y_n)$ are regressed to a straight line, and the regression straight line turns out to be a constant line, then the equation of the constant line is always $y=\bar{y}$, the average value of the y-values.
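
For readers who would like to check a candidate value of a numerically, before or after solving the three equations by hand, here is a small Python sketch; the function name r_squared_for_a and the trial value 0.0 are placeholders chosen for illustration, not part of the original post:

```python
import numpy as np

def r_squared_for_a(a):
    """Fit a straight line to the Question 1 data with the given value of a
    and return (slope, intercept, r^2)."""
    x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
    y = np.array([3.0, -2.0, 4.0, a, 4.2])

    m, c = np.polyfit(x, y, 1)            # least-squares line y = m*x + c
    y_pred = m * x + c

    S_t = np.sum((y - np.mean(y)) ** 2)
    S_r = np.sum((y - y_pred) ** 2)
    return m, c, (S_t - S_r) / S_t

# Example usage: replace 0.0 with your own candidate value of a
print(r_squared_for_a(0.0))
```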

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu

## Author: Autar Kaw

Autar Kaw (http://autarkaw.com) is a Professor of Mechanical Engineering at the University of South Florida. He has been at USF since 1987, the same year in which he received his Ph.D. in Engineering Mechanics from Clemson University. He is a recipient of the 2012 U.S. Professor of the Year Award. With major funding from NSF, he is the principal and managing contributor in developing the multiple award-winning online open courseware for an undergraduate course in Numerical Methods. The OpenCourseWare (nm.MathForCollege.com) annually receives 1,000,000+ page views, 1,000,000+ views of the YouTube audiovisual lectures, and 150,000+ page views at the NumericalMethodsGuy blog. His current research interests include engineering education research methods, adaptive learning, open courseware, massive open online courses, flipped classrooms, and learning strategies. He has written four textbooks and 80 refereed technical papers, and his opinion editorials have appeared in the St. Petersburg Times and Tampa Tribune.

## 2 thoughts on “In regression, when is the coefficient of determination zero”

1. Russ Aveney says:

Dr. Kaw,
I get two values for a: a = 78.1994 and a = -147.079. The equation I solved after setting $S_r=S_t$ was quadratic, so it makes sense to have two roots, but would they both be valid answers? I solved it with paper and pencil, not MATLAB.
Can you give me a hint for part 2)?


1. What needs to be done now is to take each value of a and see if both give $a_0=\bar{y}$ (an intercept equal to the average of the y values) and $a_1=0$ (a zero slope).
