Simple linear regression

Hi all, this blog is about regression. I will be writing a series of blogs to explain different regression methods. To start with, what is regression? It is a statistical method for finding the relationship between variables.

One regression method is simple linear regression, which is used when there is only one explanatory variable. Wondering what an “explanatory” variable is? It is nothing but an independent variable.

Basically, simple linear regression is used to predict the value of a dependent variable based on an independent variable. There are some statistical terms to understand before going on to regression, so here we go.

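  • SUM OF SQUARED ERRORS (SSE): It is a measure of the deviation between the data and an estimated model, computed as the sum of the squared differences between the observed and predicted values. The purpose of simple linear regression is to find the model that minimizes the SSE.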
  • In simple linear regression we follow the equation

    Y = α + (β * X) + ε

where

X –> independent variable, Y –> dependent variable,

α, β –> parameters, i.e. the constant (intercept) and the coefficient of the variable (slope), and

ε is the error term.

  • Correlation: correlation is a measure of the extent to which two or more variables vary together.
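The terms above can be tried out on a few numbers. The snippet below is only a sketch: the x and y values are made up for illustration and are not taken from the unemployment data, and the helper sse() is a hypothetical name. It computes the SSE of a guessed line, the SSE of the least-squares line, and the correlation between the two variables.

```r
# Hypothetical toy data, not taken from the unemployment data set
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# SSE of a candidate line y = a + b * x (sse is an illustrative helper)
sse <- function(a, b) sum((y - (a + b * x))^2)

# Closed-form least-squares estimates of the slope and the intercept
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)

sse(0, 2)          # SSE of a guessed line
sse(a_hat, b_hat)  # SSE of the least-squares line, never larger than any guess
cor(x, y)          # Pearson correlation between x and y
```

The estimates a_hat and b_hat should agree with what lm(y ~ x) reports, since lm() minimizes the same SSE.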

To apply simple linear regression using R, follow the steps below.

STEP 1: Load the data into R

You can get the data set from the link provided.

Data <- read.csv("Unemployment_rate.csv")

The first few records of the data are shown below.

head(Data)

[Image: output of head(Data)]
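If the CSV file is not at hand, the loading step can still be tried with a stand-in. This is a sketch under assumptions: the values are simulated, the column names unemployment_male and unemployment_female are guesses based on the code later in this post, and the file is written to a temporary path rather than to Unemployment_rate.csv.

```r
# Hypothetical stand-in for Unemployment_rate.csv; all values are simulated
set.seed(42)
male <- round(runif(10, min = 2, max = 8), 1)
toy <- data.frame(
  unemployment_male = male,
  unemployment_female = round(1 + 0.9 * male + rnorm(10, sd = 0.3), 1)
)

# Write the stand-in to a temporary file and read it back, as in STEP 1
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
Data <- read.csv(path)

head(Data)  # first six records
str(Data)   # column names and types
```

str() is a quick way to confirm that both columns were read as numbers before fitting a model.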

In the records shown, for example, a male unemployment rate of 2.9 corresponds to a female unemployment rate of 4.0.

Unemployment_rate_for_male –> Independent variable.

Unemployment_rate_for_female –> Dependent variable.

Plotting the data gives a better understanding of it.

plot(Data)

[Figure: plot1, scatter plot of the data]

STEP 2: Create the linear model

Now that we know which variable is independent and which is dependent, let's create the linear model and see the relationship that exists between them.

# Dependent variable on the left of ~, independent variable on the right
linearmodel <- lm(unemployment_female ~ unemployment_male, data = Data)

The summary of the linear model can be viewed using the command

summary(linearmodel)

[Image: summary.png, output of summary(linearmodel)]
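The individual numbers in the summary can also be pulled out directly. The snippet below is a sketch on simulated data, with assumed column names and a model object called fit (a name chosen only for this example), so the values will differ from those in the summary of the real data.

```r
# Simulated stand-in data; the column names are assumptions
set.seed(42)
male <- runif(30, min = 2, max = 8)
toy <- data.frame(
  unemployment_male = male,
  unemployment_female = 1 + 0.9 * male + rnorm(30, sd = 0.3)
)

fit <- lm(unemployment_female ~ unemployment_male, data = toy)
s <- summary(fit)

coef(fit)              # estimated intercept (alpha) and slope (beta)
s$coefficients         # estimates with standard errors, t values and p-values
s$r.squared            # share of the variance explained by the model
sum(residuals(fit)^2)  # SSE of the fitted model
```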

For a clear understanding of summary(linearmodel), go to the following link.

Now, abline(linearmodel) draws the linear model line on the plot, which visualizes the model that we just created.

[Figure: plot2.png, the data with the fitted regression line]
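Putting the scatter plot and the line together might look like the sketch below. It uses simulated data with assumed column names, and it draws into a PDF file in a temporary folder (a choice made here so the code also runs without a display), so the picture will not match the one from the real data.

```r
# Simulated stand-in data; the column names are assumptions
set.seed(42)
male <- runif(30, min = 2, max = 8)
toy <- data.frame(
  unemployment_male = male,
  unemployment_female = 1 + 0.9 * male + rnorm(30, sd = 0.3)
)
fit <- lm(unemployment_female ~ unemployment_male, data = toy)

# Draw to a PDF file so the sketch also runs without a display
out <- file.path(tempdir(), "regression_line.pdf")
pdf(out)
plot(toy$unemployment_male, toy$unemployment_female,
     xlab = "Unemployment rate for males",
     ylab = "Unemployment rate for females")
abline(fit, col = "red")  # adds the fitted line to the current plot
invisible(dev.off())      # close the device so the file is written
```

abline() only adds to an existing plot, so it has to be called after plot() on the same device.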

STEP 3: Finally, predict the dependent variable using the linear model

The value of the dependent variable can be predicted using the predict() function.

[Image: result, output of predict()]
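As a rough sketch of this step, a predict() call could look like the one below. The data are simulated and the column names unemployment_male and unemployment_female are assumptions, so the predicted value will not match the one obtained from the real data set.

```r
# Simulated stand-in data; the column names are assumptions
set.seed(42)
male <- runif(30, min = 2, max = 8)
toy <- data.frame(
  unemployment_male = male,
  unemployment_female = 1 + 0.9 * male + rnorm(30, sd = 0.3)
)
fit <- lm(unemployment_female ~ unemployment_male, data = toy)

# New observation: the column name must match the predictor in the formula
new_obs <- data.frame(unemployment_male = 2)

pred <- predict(fit, newdata = new_obs)
pred                                                      # point prediction
predict(fit, newdata = new_obs, interval = "prediction")  # with lower and upper bounds
```

This only works when the formula uses plain column names together with the data argument. If the formula is written with data$column terms, predict() cannot match the columns of newdata and falls back to the original observations.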

Hence, for a male unemployment rate of 2, the model predicts a female unemployment rate of 2.823, as per the data given.

