# Simple Linear Regression using R’s lm() function

Regression

By Kuya Kevin

R can be pretty useful for doing statistics. It’s advisable for classroom instructors to integrate this tool because it’s free and open source. One of the most used tools in R is the `lm()` function, which fits a linear regression. (You can run `?lm` for the full documentation of the function.)

Say that there has been a chemical factory accident, and radiation levels (measured in sieverts) are recorded at 7 separate distances from the factory. We wish to find out whether there is a linear relationship between distance and radiation level, and to create a prediction tool that forecasts the radiation level at a given distance.

We have this data:

| Distance | Radiation level |
|---------:|----------------:|
| 10       | 155             |
| 20       | 121             |
| 50       | 110             |
| 100      | 84              |
| 200      | 45              |
| 500      | 10              |
| 1000     | 4               |

We begin by writing the data in R. We can set the vector names as x and y for us to simply call those vectors:

```r
x <- c(10, 20, 50, 100, 200, 500, 1000)
y <- c(155, 121, 110, 84, 45, 10, 4)
```

It is common practice to assign the fitted regression to a variable. In our case, we name our variable `model`. The code for running the linear regression is:

```r
model <- lm(y ~ x)
```

This tells R to fit the least squares regression model and store it in `model`. To print the summary of the regression, we simply run:

```r
summary(model)
```

And the results are:

```
Call:
lm(formula = y ~ x)

Residuals:
      1       2       3       4       5       6       7
 45.019  12.350   5.342 -14.004 -39.696 -34.774  25.763

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 111.3112    16.5231   6.737  0.00109 **
x            -0.1331     0.0383  -3.475  0.01776 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 34.21 on 5 degrees of freedom
Multiple R-squared:  0.7072,	Adjusted R-squared:  0.6486
F-statistic: 12.07 on 1 and 5 DF,  p-value: 0.01776
```

##### Interpreting the results

The first thing we look for after running the regression is the fitted model itself, which we read off from the coefficient estimates. With these, the model is:

``Radioactive level = 111.3112 - 0.1331*Distance``
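Beyond writing the equation by hand, the fitted model can also be used directly in R. A short sketch, using the `model` object from above, that extracts the coefficients and forecasts the radiation level at new distances (the values 300 and 750 are arbitrary examples, not part of the original data):

```r
coef(model)  # intercept and slope of the fitted line

# Forecast radiation levels at new distances; the predictor must be
# named "x", matching the variable used in lm(y ~ x)
new_points <- data.frame(x = c(300, 750))
predict(model, newdata = new_points)

# predict() can also return a prediction interval for each forecast
predict(model, newdata = new_points, interval = "prediction")
```

This is exactly the "prediction tool" we set out to build: any new distance plugged into `predict()` returns the forecast radiation level from the fitted line.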

###### Testing the Overall Model

Once we have established the model, the second thing we need to assess is the significance of the overall model. This is done with the F-test reported at the bottom of the summary. Since R computes it for us, we simply look at the p-value and check whether it is lower than our chosen level of significance.

In this case, the p-value of 0.01776 is lower than our significance level of 0.05. We therefore have sufficient evidence to conclude that the model explains a significant proportion of the variation in radiation levels.
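If you need this p-value programmatically (say, in a script that fits many models), one sketch is to pull the F-statistic and its degrees of freedom from the summary object and recompute the tail probability with `pf()`:

```r
fstat <- summary(model)$fstatistic  # F value plus numerator and denominator df

# Upper-tail probability of the F distribution = the overall-model p-value
p_value <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
p_value  # about 0.01776, matching the printed summary
```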

###### Interpreting R-Squared

If the model is significant, the next thing to do is interpret the coefficient of determination. It gives us the proportion of the variation in the response that is explained by the model. In this case, we interpret R-squared as “70.72% of the variation in radiation levels is explained by the model”.
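Both versions of R-squared can likewise be read straight from the summary object rather than from the printed output:

```r
summary(model)$r.squared      # multiple R-squared, about 0.7072
summary(model)$adj.r.squared  # adjusted R-squared, about 0.6486
```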

##### Diagnostics

After interpreting the results of the regression and testing whether the model is significant, we proceed to analyze the residuals. We can do this by passing the model to the `plot()` function, which produces four diagnostic plots:

```r
plot(model)
```

The first plot is the “Residuals vs Fitted” plot. The thing to check here is that there should be no discernible pattern in the points. Since the residuals represent the randomness left over by the regression, they should not be correlated with any predictor. If a pattern is visible, the relationship between the dependent and independent variable is likely non-linear, and a transformation of one of the variables will be needed.

The second plot shows the normal Q-Q plot of the residuals. Regression assumes that the residuals are approximately normal. If the points deviate substantially from the Q-Q line, this assumption is violated, making the regression results problematic.

The third plot, scale-location, checks the homoskedasticity assumption of regression. The main indicator is the scale line: if it is approximately flat, we can conclude that the variance of the residuals is roughly constant along the regression. A sloped line indicates that the spread is not constant, implying heteroskedasticity.

The last plot, residuals versus leverage, shows the effect of any outliers. If some points fall outside the Cook’s distance contours, we can say that an outlier substantially affects the whole model.
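These visual checks can be complemented with numeric ones. A small sketch, using base-R functions on the same `model` object (the interpretation comments are rules of thumb, not hard cutoffs):

```r
# Normality of residuals: a large p-value means no evidence against normality
shapiro.test(residuals(model))

# Influence of each observation: values well above the rest flag points
# that pull the fitted line toward themselves
cooks.distance(model)
```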

To show all four plots at once, we can set up a 2-by-2 plotting grid with `par(mfrow = c(2, 2))` before calling `plot()`:

```r
par(mfrow = c(2, 2))
plot(model)
```

Only once all these checks are satisfied can we conclude that the results are valid.
