In the previous post, I spoke about multiple linear regression. I will build onto regression analysis by moving onto a special case of linear regression which is referred to as polynomial regression. There are 2 ways to check for linearity of a model:
- To check the scatterplot of the residuals versus the fitted values
- To check the scatterplot of the residuals versus each parameter.
In some instances, a plot of the residuals versus a predictor may suggest there is a non-linear relationship. One way to account for such a relationship is through a polynomial regression model. Such a model, for a single predictor can be shown as:
Y – Dependent variable
X – Independent variable
β0, β1,…, βh – coefficients of X variable
h-the degree of the polynomial
For lower degrees, the relationship has a specific name (i.e. h=2 is called quadratic, h=3 is called, h=4 is called quadratic, and so on).
As you can see, this equation allows a non-linear relationship between the response and predictor variables, however it is essentially considered linear regression due to the linear regression coefficients β1,β2,…,βh.
Now that we are familiar with what polynomial regression is, let us see how it can be implemented using R.
Step 1: Loading the input
As input, I would be using a data set of counts of variables decreasing over time.
A <- structure(list(Time = c(0, 1, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30),
Counts = c(126.6, 101.8, 71.6, 101.6, 68.1, 62.9, 45.5, 41.9, 46.3, 34.1, 38.2, 41.7, 24.7, 41.5, 36.6, 19.6, 22.8, 29.6, 23.5, 15.3, 13.4, 26.8, 9.8, 18.8, 25.9, 19.3)),
.Names = c(“Time”, “Counts”),
row.names = c(1L, 2L, 3L, 5L, 7L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 31L),
class = “data.frame”)
Once you are done with this, attach the dataset so that you can refer to all the variables directly by name, i.e.
Step2: Fit a linear model first
Before directly fitting a polynomial model, try to see if a linear model can be fitted.
linear.model = lm (Counts ~ Time)
Let us use the plot function to plot the counts over time and superpose the linear model as:
plot (Time, Counts, pch=16, ylab = “Counts “, cex.lab = 1.3, col = “red” )
abline (lm(Counts ~ Time), col = “blue”)
The output of the above piece of a code can be shown as:
Fig 1: Output of a linear fitting
Though this model looks fine, we can see that the plot has a curvature that is not explained well by a linear model. Hence there is a need to fit it using a polynomial model.
Step 3: Fitting a polynomial model
Let us create a variable named Time2 which is the square of Time variable. So using this variable to fit a quadratic model:
Time2 = Time ^ 2
quadratic.model = lm(Counts ~ Time + Time2)
This produces the following output:
As you can see in this figure, the assumed model achieves a high level of significance and it can be mathematically estimated as:
Step 4: Prediction
To obtain predicted values and to plot them subsequently, let us first create a grid of time values running from 0 to 30 seconds in increments of 0.1s.
timevalues = seq (0, 30, 0.1)
Next, let use the predict function to obtain a few predicted values and store it in a variable.
predictedcounts <- predict (quadratic.model,list (Time=timevalues, Time2=timevalues^2))
Let us plot Count and Time variables by using the plot function, and then include the quadratic model to the plot using the lines () command:
plot (Time, Counts, pch=16, xlab = “Time (s)”, ylab = “Counts”, cex.lab = 1.3, col = “blue”)
lines (timevalues, predictedcounts, col = “darkgreen”, lwd = 3)
As we can see, the quadratic model fits the data better than its linear counterpart.
Hope you have developed some understanding of what polynomial regression is about and its implementation mechanism.
- Polynomial regression example in R – http://rtutorialseries.blogspot.in/2010/02/r-tutorial-series-basic-polynomial.html
- For a tutorial on polynomial regression, visit this link.
- Few ideas were implemented by using this for reference.