What is Linear Regression? Part 4: Non-linear Transformations of the Predictors

In the previous post, we discussed a way to introduce Interaction Terms into our Linear Model, and the posts before that covered Simple Linear Regression and Multiple Linear Regression. Now we are ready to talk about situations where the target is related to a predictor in a non-linear manner, for example medv being related to lstat² and not just lstat.

This non-linear relationship can be accommodated by the lm() function. One thing we need to be careful of: inside an R formula, the power character “^” has a special meaning (it denotes the crossing of terms), so we cannot use it directly for exponentiation. Wrapping the expression in the I() function tells R to treat “^” as the ordinary arithmetic power operator. Here is how it’s done:

lm.fit4 <- lm(medv ~ lstat + I(lstat ^ 2), data = boston_data)
summary(lm.fit4)

Call:
lm(formula = medv ~ lstat + I(lstat^2), data = boston_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.2834  -3.8313  -0.5295   2.3095  25.4148 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 42.862007   0.872084   49.15   <2e-16 ***
lstat       -2.332821   0.123803  -18.84   <2e-16 ***
I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.524 on 503 degrees of freedom
Multiple R-squared: 0.6407, Adjusted R-squared: 0.6393 
F-statistic: 448.5 on 2 and 503 DF, p-value: < 2.2e-16

When you look at the summary results, you can see that the quadratic term lstat² is highly significant, and including it makes the model better than using lstat by itself. Remember our Simple Linear Regression post? We saw evidence there of a non-linear relationship between medv and lstat. As an exercise, plot the residuals of this model, which contains the non-linear term. You will be glad to see that the pattern that was clear when we had only the linear term disappears once the non-linear term is included. Not seeing a discernible pattern when you plot the residuals is an indicator that you are accounting for the structure in the data and are left only with variation due to random chance.
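
If you want to try the exercise, here is a minimal sketch. It refits the simple model from the earlier post under the name lm.fit1 (a name I am assuming here) so the two residual plots can sit side by side:

# Refit the simple model for comparison
lm.fit1 <- lm(medv ~ lstat, data = boston_data)

# Residuals against fitted values, side by side
par(mfrow = c(1, 2))
plot(predict(lm.fit1), residuals(lm.fit1),
     main = "medv ~ lstat", xlab = "Fitted values", ylab = "Residuals")
plot(predict(lm.fit4), residuals(lm.fit4),
     main = "medv ~ lstat + I(lstat^2)", xlab = "Fitted values", ylab = "Residuals")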


You can also include cubic and higher-order terms in the model. One can write them as I(lstat^3) and so on and add them to the formula, but this quickly becomes tedious. R gives us the poly() function to create polynomials of a given degree. The first argument is the variable and the second is the degree of the polynomial (the highest power of the variable); for example, poly(x, 2) creates a polynomial of degree 2, corresponding to x + x². Plugging this into our lm() function fits a linear model with the “+” sign between the terms of our model:

lm.fit4 <- lm(medv ~ poly(lstat, 5), data = boston_data)

This is the same as regressing medv onto terms of lstat all the way up to the fifth power: poly(lstat, 5) fits the same model as writing lstat + I(lstat ^ 2) + I(lstat ^ 3) + I(lstat ^ 4) + I(lstat ^ 5). One caveat: by default poly() produces orthogonal polynomials rather than raw powers, so the individual coefficients differ from the raw-power version, but the fitted values and predictions are identical (pass raw = TRUE if you want the raw powers).
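
If you are curious, here is a quick sanity check of that claim (a sketch, using the same boston_data data frame as above):

fit_orth <- lm(medv ~ poly(lstat, 5), data = boston_data)
fit_raw <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = boston_data)

# Identical fitted values (to numerical precision), different coefficients
all.equal(fitted(fit_orth), fitted(fit_raw))  # TRUE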


summary(lm.fit4)

Call:
lm(formula = medv ~ poly(lstat, 5), data = boston_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.5433  -3.1039  -0.7052   2.0844  27.1153 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       22.5328     0.2318  97.197  < 2e-16 ***
poly(lstat, 5)1 -152.4595     5.2148 -29.236  < 2e-16 ***
poly(lstat, 5)2   64.2272     5.2148  12.316  < 2e-16 ***
poly(lstat, 5)3  -27.0511     5.2148  -5.187 3.10e-07 ***
poly(lstat, 5)4   25.4517     5.2148   4.881 1.42e-06 ***
poly(lstat, 5)5  -19.2524     5.2148  -3.692 0.000247 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.215 on 500 degrees of freedom
Multiple R-squared: 0.6817, Adjusted R-squared: 0.6785 
F-statistic: 214.2 on 5 and 500 DF, p-value: < 2.2e-16

The summary of this model shows that including terms up to the fifth power actually improves our model: every polynomial term is significant, and the R-squared rises from 0.64 for the quadratic fit to 0.68 here.
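
A more formal way to compare nested models like these is an F-test via R's anova() function. Here is a minimal sketch (refitting the quadratic model under a new name, since we reused lm.fit4 above):

# Quadratic model vs. degree-5 polynomial model
fit_quad <- lm(medv ~ lstat + I(lstat^2), data = boston_data)
fit_poly5 <- lm(medv ~ poly(lstat, 5), data = boston_data)

# F-test of whether the cubic, quartic and quintic terms improve the fit
anova(fit_quad, fit_poly5)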


There are other transformations one can use, for example a logarithmic transformation, such as regressing medv onto log(rm); a short example is given at the end of this post. Machine learning modeling is an art, as I'm sure you have heard before. We are trying to approximate real-life situations with mathematical models, so we sometimes have to go through a fair amount of trial and error before we find a model that works. Upcoming posts will discuss how to test a model's accuracy using validation and test sets. Before closing off this part about the basics of Linear Regression, there is one more thing we need to talk about: how to deal with qualitative variables.
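
As promised, here is what that logarithmic transformation looks like (a minimal sketch; rm, the average number of rooms per dwelling, is one of the columns in the Boston data). Note that, unlike “^”, log() has no special meaning in a formula, so no I() wrapper is needed:

lm.fit.log <- lm(medv ~ log(rm), data = boston_data)
summary(lm.fit.log)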