Hello everyone, this blog is about a regression technique called Poisson regression. Poisson regression is used to model **count variables**.

Let me explain about count data first..!! It is type of data in which observations can take only non-negative integer values and these integers arise from counting rather than ranking. **Count variable** is an individual piece of count data.

An** example** of where Poisson regression can be used will help you understand better; Poisson regression can be used to estimate number of people in line in front of you at the grocery store. Predictors may include the number of items currently offered at a special discounted price and whether a special event (e.g., a holiday, a big sporting event) is a few days away.

Difference between a Poisson regression and a linear regression is that the linear regression assumes that true values of dependent variable are distributed normally around the expected value and can take any real, positive, negative or fractional value.

Whereas** Poisson regression** is used to model count of events and typical scenarios are no: of orders placed by a customer in a time frame or no: of visits to the website by an individual user.

Now let’s discuss how to create a Poisson regression model using R.

I will use the data on the distribution of 3605 individual trees of *Beilschmiedia pendula(a tree species)* in 500 x 1000 m forest plot in Panama. The dataset is freely available as a part of the R’s spatstat library.

**Step 1**: loading the data and plotting it for better understanding.

install.packages(“spatstat”)

library(spatstat)

library(raster)

library(sp)

plot(bei$x, bei$y, pch = 19, cex = 0.5, main = “Spatial distribution of individuals in the 50-ha Barro Colorado plot”,

xlab = “x coordinate [m]”, ylab = “y coordinate [m]”, frame = FALSE)

abline(h = 0, col = “black”)

abline(h = 500, col = “black”)

abline(v = 0, col = “black”)

abline(v = 1000, col = “black”)

The dataset comes with two environmental layers elevation, slope. **My goal is to model density of tree individuals as a function of elevation**. I am interested in predicting density of the trees i.e. number **n** of individuals per unit area. Hence, I will resample the data into a grid of 50 x 50 m:

elev <- raster(bei.extra[[1]]) #coarsening the predictor data into the 50*50 grid by taking mean of the 5*5m grid cells

ext <- extent(2.5, 1002.5, 2.5, 1002.5) # cropping the data so that they have exactly 500 x 1000 cells

elev <- crop(elev, ext)

elev50 <- aggregate(elev, fact = 10, fun = mean) # aggregating the elevation data

xy <- data.frame(x = bei$x, y = bei$y) # fitting the point data into the 50 x 50 m grid

n50 <- rasterize(xy, elev50, fun = “count”)

n50[is.na(n50)] <- 0 # replacing the NA values by 0

**STEP 2**: Initial data visualization.

plot(elev50[], n50[], cex = 1, pch = 19, col = “grey”, ylab = “# of Individuals”,

xlab = “Mean Elevation [m]”)

**STEP 3**:centering and standardization

I find it necessary to center (to 0 mean) and standardize (to variance of 1) the variables. Hence,

scale2 <- function(x) {

sdx <- sqrt(var(x))

meanx <- mean(x)

return((x – meanx)/sdx)

}

elev50 <- scale2(elev50[])

pow.elev50 <- elev50^2

n50 <- n50[]

**Step 4**: Fitting the model using glm()

Fitting the model with the glm() function is easy. You just need to specify that the data is drawn from Poisson distribution and that is modeled in logarithmic space.

m.glm <- glm(n50 ~ elev50 + pow.elev50, family = “poisson”)

summary(m.glm)

Follow this link to understand the summary of m.glm.

Now, this helps me in creating a **smooth prediction** curve.

elev.seq <- seq(-3, 2, by = 0.05) #generates a sequence of numbers

new.data <- data.frame(elev50 = elev.seq, pow.elev50 = elev.seq^2)# The numbers generated by sequence are stored in data frame new.predict <- predict(m.glm, newdata = new.data, type = “response”) #generating the prediction curve

lines(elev.seq, new.predict, col = “red”, lwd = 2) #plotting the predicted curve

With the help of this prediction curve the prediction can be made.!!

References:

http://www.ats.ucla.edu/stat/r/dae/poissonreg.htm