Hi all, this blog is about regression. I will be writing a series of blog posts explaining different regression methods. Let's start with what regression is: it is a statistical method for finding the relationship between variables.
One regression method is simple linear regression, which is used when there is only one explanatory variable. Wondering what an “explanatory” variable is? It is simply the independent variable.
Basically, simple linear regression is used to predict the value of the dependent variable from the value of the independent variable. There are a few statistical terms to understand before getting into regression, so here we go.
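- SUM OF SQUARED ERRORS (SSE): It is a measure of the deviation between the data and an estimated model, computed by squaring the difference between each observed value and the model's prediction and adding them up. The purpose of simple linear regression is to find the line that minimizes SSE.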
- In simple linear regression we use the equation Y = α + βX + ε, where
X –> independent variable, Y –> dependent variable,
α –> the intercept (the constant), β –> the slope (the coefficient of X), and
ε is the error term.
- Correlation: a measure of the extent to which two or more variables vary together.
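A minimal R sketch of these terms, using small hypothetical toy vectors `x` and `y` (made up for illustration, not taken from the unemployment data). It computes the least-squares estimates of α and β with the standard closed-form formulas, the resulting SSE, and the correlation, and checks that `lm()` returns the same coefficients.

```r
# Hypothetical toy data, made up for illustration only
# (not the unemployment data set used later in this post).
x <- c(2.0, 3.0, 4.0, 5.0, 6.0)  # independent (explanatory) variable
y <- c(2.5, 3.4, 4.6, 5.5, 6.5)  # dependent variable

# SSE for a candidate line y = a + b * x: sum of squared vertical deviations
sse_of <- function(a, b) sum((y - (a + b * x))^2)

# Closed-form least-squares estimates, i.e. the a and b that minimise SSE:
#   beta  = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   alpha = mean(y) - beta * mean(x)
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)
sse <- sse_of(alpha_hat, beta_hat)

# Nudging the slope away from the estimate makes SSE larger
sse_nudged <- sse_of(alpha_hat, beta_hat + 0.1)

# Pearson correlation between x and y
r <- cor(x, y)

# lm() fits the same model; its coefficients match the closed-form ones
fit <- lm(y ~ x)

cat("alpha =", alpha_hat, " beta =", beta_hat, "\n")
cat("SSE =", sse, " SSE with nudged slope =", sse_nudged, "\n")
cat("correlation =", r, "\n")
print(coef(fit))
```

For this toy data the estimates come out as α = 0.46 and β = 1.01, and any other line (such as the one with the nudged slope) has a larger SSE.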
To apply simple linear regression using R, follow the steps below.
STEP 1: Load the data into R
You can get the data set from the link provided.
data <- read.csv("Unemployment_rate.csv")
The first few records of the data are as shown (head(data) prints them).
For example, when the unemployment rate for males is 2.9, the unemployment rate for females is 4.0.
Unemployment_rate_for_male –> Independent variable.
Unemployment_rate_for_female –> Dependent variable.
I am plotting the data for a better understanding.
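The plotting code isn't shown here, so this is a minimal sketch of one way to do it with base R's `plot()`. It assumes the columns are named `unemployment_male` and `unemployment_female` (the names used in the `lm()` call in Step 2) and builds a small made-up data frame in place of the CSV so that it runs on its own; with the real file, use the data frame returned by `read.csv()` instead.

```r
# Made-up stand-in for Unemployment_rate.csv (hypothetical values; the
# column names are assumed from the lm() call used in this post)
data <- data.frame(
  unemployment_male   = c(2.0, 3.0, 4.0, 5.0, 6.0),
  unemployment_female = c(2.5, 3.4, 4.6, 5.5, 6.5)
)

# Scatter plot: independent variable on the x-axis, dependent on the y-axis
plot(data$unemployment_male, data$unemployment_female,
     xlab = "Unemployment rate (male)",
     ylab = "Unemployment rate (female)",
     main = "Female vs. male unemployment rate",
     pch = 19)
```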
STEP 2: Create the linear model
Now that we know which variable is independent and which is dependent, let's create the linear model and see the relationship between them. In an R formula the dependent variable goes on the left of ~ and the independent variable on the right.
linearmodel <- lm(unemployment_female ~ unemployment_male, data = data)
The summary of the linear model can be viewed using the command
summary(linearmodel)
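A minimal sketch of pulling individual pieces out of the summary, again on a small hypothetical data frame whose column names are assumed from the `lm()` call above. `coef()` on the summary gives the coefficient table, and `r.squared` is the share of the variance in the dependent variable explained by the model.

```r
# Hypothetical stand-in data (values made up; column names assumed)
data <- data.frame(
  unemployment_male   = c(2.0, 3.0, 4.0, 5.0, 6.0),
  unemployment_female = c(2.5, 3.4, 4.6, 5.5, 6.5)
)

# Dependent variable on the left of ~, independent on the right
linearmodel <- lm(unemployment_female ~ unemployment_male, data = data)
model_summary <- summary(linearmodel)
print(model_summary)

# Coefficient table: estimates of alpha and beta, std. errors, t values, p-values
print(coef(model_summary))

# Proportion of the variance in Y explained by X
print(model_summary$r.squared)
```

With a single explanatory variable and an intercept, R-squared equals the square of the correlation between the two variables.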
For a detailed explanation of the output of summary(linearmodel), see the following link.
Now, abline(linearmodel) draws the fitted regression line on the plot, which visualizes the model we just created.
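A minimal sketch that draws the scatter plot, adds the regression line with `abline()`, and marks each residual with a dashed vertical segment, which ties the picture back to SSE. The data frame is hypothetical and the column names are assumed from the `lm()` call above.

```r
# Hypothetical stand-in data (values made up; column names assumed)
data <- data.frame(
  unemployment_male   = c(2.0, 3.0, 4.0, 5.0, 6.0),
  unemployment_female = c(2.5, 3.4, 4.6, 5.5, 6.5)
)
linearmodel <- lm(unemployment_female ~ unemployment_male, data = data)

# abline() adds to an existing plot, so draw the scatter plot first
plot(data$unemployment_male, data$unemployment_female,
     xlab = "Unemployment rate (male)",
     ylab = "Unemployment rate (female)",
     pch = 19)

# Fitted line Y = alpha + beta * X, read from the model's coefficients
abline(linearmodel, col = "red")

# Dashed segments from each point to the line: the residuals
segments(data$unemployment_male, fitted(linearmodel),
         data$unemployment_male, data$unemployment_female, lty = 2)

# SSE is the sum of the squared residuals
sse <- sum(residuals(linearmodel)^2)
print(sse)
```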
STEP 3: Finally, use the linear model to predict the dependent variable.
The value of the dependent variable can be predicted using the predict() function.
Hence, according to the model fitted on the given data, a male unemployment rate of 2 corresponds to a predicted female unemployment rate of 2.823.
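A minimal sketch of `predict()` on a small hypothetical data frame (column names assumed from the `lm()` call above, values made up, so the numbers differ from the post's result). The new values go in a data frame whose column has the same name as the predictor in the formula.

```r
# Hypothetical stand-in data (values made up; column names assumed)
data <- data.frame(
  unemployment_male   = c(2.0, 3.0, 4.0, 5.0, 6.0),
  unemployment_female = c(2.5, 3.4, 4.6, 5.5, 6.5)
)
linearmodel <- lm(unemployment_female ~ unemployment_male, data = data)

# newdata needs a column named exactly like the predictor in the formula
new_rates <- data.frame(unemployment_male = c(2, 3.5))
predicted <- predict(linearmodel, newdata = new_rates)
print(predicted)

# The same predictions by hand: alpha + beta * x
manual <- coef(linearmodel)[1] + coef(linearmodel)[2] * new_rates$unemployment_male
print(manual)
```

This is also why the formula uses bare column names together with `data = data`: if the formula were written with `data$...` terms, `predict()` could not match the columns in `newdata` and would return fitted values for the original rows instead, with a warning.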