COVID19 Regression Model and Other Thoughts

March 28, 2020
I am not an epidemiologist, but I do regression modeling for a living. I have used regression models to predict student grades just after the first test, and we will now be using adaptive learning metrics to improve our identification schemes even earlier in the semester. The goal is not to spell boom or doom for a student but only to intervene early with personalized recommendations.
Looking at whatever data I can use and have time to scrape, four things about COVID19 are reasonably clear to me at this time.
1) First, the number of infections grows exponentially at first, but it does not stay like that forever. The logistic shape of the infection curve is analogous to how the mass of a moving rocket decreases as it burns up its fuel. F=ma, but m is not a constant.
2) Second, President Trump is finally thinking about quarantining the NY, NJ, CT area. It is a little late, but it will definitely decrease the power of the exponent.
3) Third, we have to get more testing done, and it has to be testing that is totally random. With it, we could have found the effect of the spring breakers coming to FL and of the college kids being sent home to parents who are having kids late in life. Please do not send your grandkids to grandpa/grandma's retirement home. They may be the children of the corn.
4) Fourth, Florida and Louisiana need to get their heads straightened out and use tougher rules to keep people inside and a method to keep outsiders out. They are the next hot zone.

Sum of the residuals for the linear regression model is zero.

Prove that the sum of the residuals for the linear regression model is zero.

[Images: sum of residuals is zero, Page 1 and Page 2]
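The proof itself was posted as scanned pages. What follows is only a sketch of the standard argument, assuming the straight-line model y = a_0 + a_1 x is fitted by least squares to n data points (x_i, y_i). The symbols a_0, a_1, E_i and S_r are assumptions made for this sketch and may differ from the notation on the original pages.

```latex
\begin{align*}
S_r &= \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2 \\
\frac{\partial S_r}{\partial a_0} &= -2\sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right) = 0 \\
\Rightarrow \quad \sum_{i=1}^{n} E_i &= \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right) = 0
\end{align*}
```

The result relies on the model having the constant term a_0. For a model forced through the origin, y = a_1 x, this condition is not part of the minimization, so the residuals need not sum to zero.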


Effect of Significant Digits: Example 2: Regression Formatting in Excel

As part of a series of pragmatic examples on the effect of significant digits, we discuss the influence of using the default and scientific formats in the trendline function of Microsoft Excel.  This is the second example in the series (the first example was on a beam deflection problem).

____________________________________________

This post is brought to you by Holistic Numerical Methods: Numerical Methods for the STEM undergraduate at http://numericalmethods.eng.usf.edu, the textbook on Numerical Methods with Applications available from the lulu storefront, the textbook on Introduction to Programming Concepts Using MATLAB, and the YouTube video lectures available at http://numericalmethods.eng.usf.edu/videos.  Subscribe to the blog via a reader or email to stay updated with this blog. Let the information follow you.

Does the solve command in MATLAB not give you an answer?

Recently, I assigned a project to my class where they needed to regress n x-y data points to the nonlinear regression model y=exp(b*x).  However, they were NOT allowed to transform the data, that is, transform the data such that linear regression formulas can be used to find the constant of regression b.  They had to do it the new-fashioned way: find the sum of the squares of the residuals and then minimize the sum with respect to the constant of regression b.

To do this, they conducted the following steps:

  1. set up the equation by declaring b as a syms variable,
  2. calculate the sum of the squares of the residuals using a loop,
  3. use the diff command to differentiate the sum with respect to b,
  4. use the solve command to solve for b where the derivative is zero.

However, the solve command gave some odd answer like log(z1)/5 + (2*pi*k*i)/5.  The students knew that the equation has only one real solution – this was deduced from the physics of the problem. 

We did not want to set up a separate function mfile to use the numerical solvers such as fsolve.  To circumvent the setting up of a separate function mfile, we approached it as follows.  If dbsr=0 is the equation you want to solve, use

F = vectorize(inline(char(dbsr)))
fsolve(F, -2.0)

The char command converts the function dbsr to a string, the inline command constructs an inline function from that string, and the vectorize command replaces *, / and ^ in the formula with their element-wise forms .*, ./ and .^ (I am not sure whether this last step is needed when there is only one unknown).
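For comparison, here is a minimal sketch of a different route that also avoids a separate function mfile: write the derivative as an anonymous function and pass it to fzero. This is not the approach used in class. It swaps the symbolic diff for a derivative written out by hand, and fsolve for fzero. The data x and y below are hypothetical, made up only for illustration.

```matlab
% Hypothetical x-y data, made up for illustration (roughly y = exp(-0.5*x))
x = [0 1 2 3];
y = [1.00 0.62 0.36 0.23];

% Sum of the squares of the residuals for the model y = exp(b*x)
Sr = @(b) sum((y - exp(b*x)).^2);

% Derivative of Sr with respect to b, written out by hand
dSr = @(b) sum(-2*x.*exp(b*x).*(y - exp(b*x)));

% For this data, dSr is negative at b = -2 and positive at b = 0,
% so fzero can search that interval for the root of dSr = 0
b = fzero(dSr, [-2 0]);

fprintf('b = %g, Sr = %g\n', b, Sr(b))
```

Giving fzero an interval with a sign change, rather than a single starting guess, keeps the search on the real root that the physics of the problem points to.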


Does it make a large difference if we transform data for nonlinear regression models?


To prove that the regression model corresponds to a minimum of the sum of the square of the residuals

When regression models are derived in books, often only the first derivative test is shown to find the formulas for the constants of the model.  Here we take a simple example to go through the complete derivation.

[Images: Finding minimum of sum of square of residuals; Minimum of sum of square of residuals]
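The derivation itself was posted as images. As a sketch of what a complete derivation looks like, assume the simplest case of a straight line through the origin, y = a_1 x, regressed to n data points (x_i, y_i). The choice of model and the symbols are assumptions for this sketch and may not match the example on the original pages.

```latex
\begin{align*}
S_r &= \sum_{i=1}^{n} \left(y_i - a_1 x_i\right)^2 \\
\frac{dS_r}{da_1} &= -2\sum_{i=1}^{n} x_i\left(y_i - a_1 x_i\right) = 0
\quad \Rightarrow \quad a_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \\
\frac{d^2S_r}{da_1^2} &= 2\sum_{i=1}^{n} x_i^2 > 0 \quad \text{(provided at least one } x_i \neq 0\text{)}
\end{align*}
```

The first derivative test only locates the critical point. The positive second derivative shows it is a local minimum, and because S_r is a quadratic in a_1 with a positive leading coefficient, that local minimum is also the absolute minimum.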


How do I do polynomial regression in MATLAB?

Many students ask me how to do this or that in MATLAB, so I thought, why not devote my next few blogs to a small series that answers them.  In this blog, I show you how to do polynomial regression.

  • The MATLAB program link is here.
  • The HTML version of the MATLAB program is here.
  • If you copy and paste the program below, check that the single quotes come through as plain single quotes in the MATLAB editor.  Downloading the MATLAB program avoids the issue.

%% HOW DO I DO THAT IN MATLAB SERIES?
% In this series, I am answering questions that students have asked
% me about MATLAB.  Most of the questions relate to a mathematical
% procedure.

%% TOPIC
% How do I do polynomial regression?

%% SUMMARY

% Language : Matlab 2008a;
% Authors : Autar Kaw;
% Mfile available at
% http://nm.mathforcollege.com/blog/regression_polynomial.m;
% Last Revised : August 3, 2009;
% Abstract: This program shows you how to do polynomial regression.
clc
clear all
clf

%% INTRODUCTION

disp('ABSTRACT')
disp('   This program shows you how to do polynomial regression')
disp(' ')
disp('AUTHOR')
disp('   Autar K Kaw of https://autarkaw.wordpress.com')
disp(' ')
disp('MFILE SOURCE')
disp('   http://nm.mathforcollege.com/blog/regression_polynomial.m')
disp(' ')
disp('LAST REVISED')
disp('   August 3, 2009')
disp(' ')

%% INPUTS
% y vs x data to regress
% x data
x=[-340  -280  -200  -120  -40  40  80];
% y data
y=[2.45  3.33  4.30   5.09  5.72  6.24  6.47];
% x values at which you want to predict the y values
xin=[-300 -100 20  125];
%% DISPLAYING INPUTS
disp('  ')
disp('INPUTS')
disp('________________________')
disp('     x         y  ')
disp('________________________')
dataval=[x;y]';
disp(dataval)
disp('________________________')
disp('   ')
disp('The x values where you want to predict the y values')
dataval=xin';
disp(dataval)
disp('________________________')
disp('  ')

%% THE CODE
% Using polyfit to conduct polynomial regression to a polynomial of order 1
pp=polyfit(x,y,1);
% Predicting values at given x values
yin=polyval(pp,xin);
% This is only for plotting the regression model
% Find the number of data points
n=length(x);
xplot=x(1):(x(n)-x(1))/10000:x(n);
yplot=polyval(pp,xplot);
%% DISPLAYING OUTPUTS
disp('  ')
disp('OUTPUTS')
disp('________________________')
disp('   xasked   ypredicted  ')
disp('________________________')
dataval=[xin;yin]';
disp(dataval)
disp('________________________')

% Plot first; the labels and title are added afterwards because the
% first plot command on a fresh axes would reset them
plot(x,y,'o','MarkerSize',5,'MarkerEdgeColor','b','MarkerFaceColor','b')
hold on
plot(xin,yin,'o','MarkerSize',5,'MarkerEdgeColor','r','MarkerFaceColor','r')
plot(xplot,yplot,'LineWidth',2)
xlabel('x');
ylabel('y');
title('y vs x ');
legend('Points given','Points found','Regression Curve','Location','East')
hold off
disp('  ')


Finding the optimum polynomial order to use for regression

Many a time, you may not have the privilege or knowledge of the physics of the problem to dictate the type of regression model. You may want to fit the data to a polynomial. But then how do you choose what order of polynomial to use?

Do you choose based on the polynomial order for which the sum of the squares of the residuals, Sr, is a minimum? If that were the case, we could always get Sr=0 by choosing the polynomial order to be one less than the number of data points. The polynomial would then pass through every data point exactly.

So what do we do? We choose the degree of polynomial for which the variance as computed by

Sr(m)/(n-m-1)

is a minimum or when there is no significant decrease in its value as the degree of polynomial is increased. In the above formula,

Sr(m) = sum of the square of the residuals for the mth order polynomial

n= number of data points

m=order of polynomial (so m+1 is the number of constants of the model)

Let’s look at an example where the coefficient of thermal expansion is given for a typical steel as a function of temperature. We want to relate the two using polynomial regression.

Temperature (°F)    Instantaneous thermal expansion (1E-06 in/(in °F))
   80               6.47
   40               6.24
    0               6.00
  -40               5.72
  -80               5.43
 -120               5.09
 -160               4.72
 -200               4.30
 -240               3.83
 -280               3.33
 -320               2.76

If a first order polynomial is chosen, we get

alpha=0.009147T+5.997, with Sr=0.3138.

If a second order polynomial is chosen, we get

alpha=-0.00001189T^2+0.006292T+6.015 with Sr=0.003047.

Below is the table for the order of polynomial, the Sr value and the variance value, Sr(m)/(n-m-1)

Order of polynomial, m    Sr(m)        Sr(m)/(n-m-1)
1                         0.3138       0.03486
2                         0.003047     0.0003808
3                         0.0001916    0.000027371
4                         0.0001566    0.0000261
5                         0.0001541    0.00003082
6                         0.0001300    0.0000325

So what order of polynomial would you choose?

From the above table, and the figure below, it looks like the second or third order polynomial would be a good choice as very little change is taking place in the value of the variance after m=2.

[Figure: Optimum order of polynomial for regression]
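The table can be generated with a short loop over the polynomial order. This is a minimal sketch rather than the program used for the post. The variable names are made up for illustration, and the temperatures are centered and scaled before calling polyfit (an added step, not mentioned in the post) only to keep the higher-order fits well conditioned. Scaling does not change the residuals.

```matlab
% Data from the table in the post: temperature (F) and
% instantaneous thermal expansion (1E-06 in/(in F))
T = 80:-40:-320;
alpha = [6.47 6.24 6.00 5.72 5.43 5.09 4.72 4.30 3.83 3.33 2.76];

n = length(T);
maxOrder = 6;
Sr = zeros(1, maxOrder);
variance = zeros(1, maxOrder);

% Center and scale T; a polynomial of order m in Ts spans the same
% family of curves as a polynomial of order m in T
Ts = (T - mean(T))/std(T);

for m = 1:maxOrder
    p = polyfit(Ts, alpha, m);            % least-squares fit of order m
    residuals = alpha - polyval(p, Ts);   % observed minus predicted
    Sr(m) = sum(residuals.^2);            % sum of the squares of the residuals
    variance(m) = Sr(m)/(n - m - 1);      % the criterion used in the post
end

% Columns: order m, Sr(m), Sr(m)/(n-m-1)
disp([(1:maxOrder)' Sr' variance'])
```

Plotting variance against the order m on a log scale (for example with semilogy) is one way to see where the decrease levels off.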


Data for aluminum cylinder in iced water experiment

A colleague asked me what if he did not have time or resources to do the experiments that have been developed at University of South Florida (USF) for numerical methods. He asked if I could share the data taken at USF.

Why not – here is the data for the experiment where an aluminum cylinder is placed in iced water. This link also has the exercises that the students were asked to do.

The temperature vs. time data is as follows: (0,23.3), (5,16.3), (10,13), (15,11.8), (20,11), (25,10.7), (30,9.6), (35,8.9), (40,8.4). Time is in seconds and temperature is in Celsius. The other data needed is

Ambient temperature of iced water = 1.1 °C

Diameter of cylinder = 44.57 mm

Length of cylinder = 105.47 mm

Density of aluminum = 2700 kg/m^3

Specific heat of aluminum = 901 J/(kg-°C)

Thermal conductivity of aluminum = 240 W/(m-K)

Table 1. Coefficient of thermal expansion vs. temperature for aluminum (Data taken from http://www.llnl.gov/tid/lof/documents/pdf/322526.pdf by using mid values of temperatures at which CTE is reported)

Temperature (°C)    Coefficient of thermal expansion (μm/m/°C)
  -10               58
   12.5             59
   37.5             60
   62.5             62
   87.5             66
  112.5             71


In regression, when is coefficient of determination zero

The coefficient of determination is a measure of how much of the original uncertainty in the data is explained by the regression model.

The coefficient of determination, r^2 is defined as

r^2=\frac{S_t-S_r}{S_t}

where

S_t = sum of the square of the differences between the y values and the average value of y

S_r = sum of the square of the residuals, the residual being the difference between the observed and predicted values from the regression curve.

The coefficient of determination varies between 0 and 1. A coefficient of determination of zero means that no benefit is gained by doing regression. When can that be?

One case comes to mind right away: what if you have only one data point? For example, if I have only one student in my class and the class average is 80, I know just from the class average that the student's score is 80. Regressing the student's score to the number of hours studied, or to his GPA, or to his gender would not be of any benefit. In this case, S_t and S_r are both zero, so the regression explains nothing beyond what the average already tells us, and we take the coefficient of determination to be zero.

What if we have more than one data point? Is it possible to get the coefficient of determination to be zero?

The answer is yes. Look at the following data pairs (1,3), (3,-2), (5,4), (7,-5), (9,4.2), (11,3), (2,4). If one regresses this data to a general straight line

y=a+bx,

one gets the regression line to be

y=1.6

[Figure: When is rsquared zero?]

In fact, 1.6 is the average value of the given y values. Is this a coincidence? Because the regression line is the constant line at the average of the y values, S_t=S_r, implying r^2=0.
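The numbers above are easy to check. Here is a minimal sketch that regresses the seven data pairs to a straight line with polyfit and computes S_t, S_r and r^2 from their definitions. The variable names are made up for illustration.

```matlab
% Data pairs from the post
x = [1 3 5 7 9 11 2];
y = [3 -2 4 -5 4.2 3 4];

% Straight-line regression y = a + b*x; polyfit returns [b a]
p = polyfit(x, y, 1);
b = p(1);
a = p(2);

% St: sum of squares of differences between y and its average
% Sr: sum of squares of the residuals about the regression line
St = sum((y - mean(y)).^2);
Sr = sum((y - polyval(p, x)).^2);
r2 = (St - Sr)/St;

fprintf('a = %g, b = %g, r^2 = %g\n', a, b, r2)
```

Up to round-off, the slope b and r^2 should come out as zero and the intercept a as the average of the y values.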

QUESTIONS

  1. Given (1,3), (3,-2), (5,4), (7,a), (9,4.2), find the value of a that gives the coefficient of determination, r^2=0. Hint: Write the expression for S_r for the regression line y=mx+c. We now have three unknowns, m, c and a. The three equations then are \frac{\partial S_r} {\partial m} =0, \frac{\partial S_r} {\partial c} =0 and S_t=S_r.
  2. Show that if n data pairs (x_1,y_1), \ldots, (x_n,y_n) are regressed to a straight line, and the regression straight line turns out to be a constant line, then the equation of the constant line is always y=average value of the y-values.
