Hi everyone,

In this post, I explain how to work with the normal distribution in R.

A **normal distribution** is a very important statistical distribution that occurs in many natural phenomena, such as height, blood pressure, and the lengths of objects produced by machines. When such data are graphed as a histogram (values on the horizontal axis, frequency on the vertical axis), they form a bell-shaped curve known as a normal curve or normal distribution.

Normal distributions are symmetrical with a single central peak at the mean (average) of the data. The shape of the curve is described as bell-shaped with the graph falling off evenly on either side of the mean. Fifty percent of the distribution lies to the left of the mean and fifty percent lies to the right of the mean.

Graph 1: A sample normal distribution
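The symmetry described above can be checked numerically. The following is a minimal sketch; the mean of 50, the standard deviation of 8 and the offset of 5 are arbitrary values assumed purely for illustration.

```r
# Illustrative parameters (assumed for this sketch)
mu <- 50
sigma <- 8

# Half of the probability lies to the left of the mean
left_half <- pnorm(mu, mean = mu, sd = sigma)

# Symmetry: the density is the same at equal distances on either side of the mean
d <- 5
dens_left  <- dnorm(mu - d, mean = mu, sd = sigma)
dens_right <- dnorm(mu + d, mean = mu, sd = sigma)

print(left_half)
print(all.equal(dens_left, dens_right))
```

Changing mu, sigma or d should not affect either result, since both follow from the symmetry of the curve.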

**Importance of Normal distribution.**

Many things closely follow a normal distribution:

- Heights of people
- Sizes of things produced by machines
- Errors in measurement
- Blood pressure
- Marks on a test

It is important for the following reasons:

- The normal distribution is easy to work with mathematically. In many practical cases, the methods developed using normal theory work quite well even when the distribution is not normal.
- There is a very strong connection between the size of a sample N and the extent to which a sampling distribution approaches the normal form. Many sampling distributions based on large N can be approximated by the normal distribution even though the population distribution itself is definitely not normal.
- The normal distribution is important because of the Central Limit Theorem, which tells us that the sampling distribution of the mean of samples drawn from a non-normal distribution approaches a normal distribution as the sample size increases. This lets us perform hypothesis tests on many kinds of data.
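The Central Limit Theorem can be illustrated with a small simulation. This is only a sketch: the exponential population, the seed, the sample size of 50 and the 5000 repetitions are all assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n_samples <- 5000  # number of repeated samples (assumed)
n <- 50            # size of each sample (assumed)

# Draw repeated samples from a skewed exponential population (mean 1, sd 1)
# and keep the mean of each sample
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1)))

# The theorem suggests the means are roughly normal with mean 1 and sd 1/sqrt(n)
print(mean(sample_means))
print(sd(sample_means))
print(1 / sqrt(n))
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped histogram, even though the exponential population itself is strongly skewed.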

**Normal distribution with R.**

The following functions support Normal distribution in R.

| Function | Purpose | Syntax | Example |
|---|---|---|---|
| rnorm | Generates random numbers from a normal distribution | rnorm(n,mean,sd) | rnorm(500, 3, .25) generates 500 numbers from a normal distribution with mean 3 and sd=.25 |
| dnorm | Probability density function (PDF) | dnorm(x,mean,sd) | dnorm(0, 0, .5) gives the density (height of the PDF) at 0 of the normal distribution with mean=0 and sd=.5 |
| pnorm | Cumulative distribution function (CDF) | pnorm(q,mean,sd) | pnorm(1.96, 0, 1) gives the area under the standard normal curve to the left of 1.96, i.e. ~0.975 |
| qnorm | Quantile function (inverse of pnorm) | qnorm(p,mean,sd) | qnorm(0.975, 0, 1) gives the value at which the CDF of the standard normal is .975, i.e. ~1.96 |

Table 1: Representation of functions of Normal distribution
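The functions in Table 1 are related to one another: qnorm undoes pnorm, and dnorm is the slope of pnorm. The sketch below checks both relationships at x = 1.96 on the standard normal. The step size h is an arbitrary small value assumed for the numerical derivative.

```r
x <- 1.96

p <- pnorm(x)        # area to the left of x under the standard normal
x_back <- qnorm(p)   # qnorm undoes pnorm and returns x again

# dnorm is the derivative of pnorm: approximate it with a central difference
h <- 1e-5
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)

print(c(p = p, x_back = x_back))
print(c(slope = slope, density = dnorm(x)))
```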

**Worked examples using the above functions.**

**1. Cumulative distribution function(CDF)**

**Ex 1:** Suppose the heights of class 10 students are normally distributed with the following mean and standard deviation.

Mean = 172 cm

Standard deviation = 10 cm

A. Compute the probability of a student being no taller than 180 cm. (i.e. less than or equal to 180 cm)

P(X<=180)

**Solution**:

R code: pnorm(180,mean=172,sd=10)

(Or) pnorm(180,mean=172,sd=10,lower.tail=TRUE)

[1] 0.7881446

Therefore, about 78.81% of students are no taller than 180 cm.

B. Compute the probability of a student being taller than 185 cm. (Because the normal distribution is continuous, P(X>185) and P(X>=185) are equal.)

P(X>=185)

**Solution**:

R code: 1-pnorm(185,mean=172,sd=10)

(or) pnorm(185,mean=172,sd=10,lower.tail=FALSE)

[1] 0.09680048

Therefore, we can conclude that about 9.68% of students are taller than 185 cm.
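The same approach extends to the probability that a height falls between two values, which is the difference of two CDF values. The sketch below reuses the mean and standard deviation from this example. The lower bound of 170 cm is an assumed value added for illustration.

```r
mu <- 172
sigma <- 10

# P(170 <= X <= 185): CDF at the upper bound minus CDF at the lower bound
p_between <- pnorm(185, mean = mu, sd = sigma) - pnorm(170, mean = mu, sd = sigma)

# Same probability using z-scores and the standard normal distribution
z_low  <- (170 - mu) / sigma
z_high <- (185 - mu) / sigma
p_between_z <- pnorm(z_high) - pnorm(z_low)

print(p_between)
print(p_between_z)
```

Both calculations should agree, at roughly 0.48, because subtracting the mean and dividing by the standard deviation converts any normal variable to the standard normal.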

**2**. **Quantile function**

The function qnorm(), which comes standard with R, is the inverse of pnorm(): given a probability p, it returns the value q for which P(X<=q) = p.

**Ex 1**: Suppose you want to find the 85th percentile of a normal distribution whose mean is 70 and whose standard deviation is 3. Then you ask for:

qnorm(0.85,mean=70,sd=3)

[1] 73.1093

The value 73.1093 is indeed the 85th percentile, in the sense that 85% of the values in a population that is normally distributed with mean 70 and standard deviation 3 will lie below 73.1093. In other words, if you were to pick a random member X from such a population, then

P(X<73.1093) =0.85

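**Ex 2:** Let's find the 25th percentile (the first quartile, Q1) of a normal distribution with mean 75 and standard deviation 5.

qnorm(p=0.25,mean=75,sd=5,lower.tail=TRUE)

[1] 71.62755

The first quartile is 71.63 (a value, not a percentage): 25% of the values lie below it.

P(X<71.62755) =0.25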
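qnorm is vectorised, so several percentiles can be requested at once. The sketch below uses the mean of 70 and standard deviation of 3 from Ex 1. The particular set of probabilities is an assumption chosen for illustration. It also checks that a quantile of any normal distribution is the mean plus the standard deviation times the standard normal quantile.

```r
mu <- 70
sigma <- 3
p <- c(0.25, 0.50, 0.85)  # probabilities chosen for illustration

# Quantiles for all three probabilities in one call
q <- qnorm(p, mean = mu, sd = sigma)

# Equivalent calculation from standard normal quantiles
q_scaled <- mu + sigma * qnorm(p)

print(q)
print(all.equal(q, q_scaled))

# Feeding the quantiles back into pnorm recovers the probabilities
print(pnorm(q, mean = mu, sd = sigma))
```

The 50th percentile equals the mean, which follows from the symmetry of the distribution.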

**3. Density function**

Let's create a sequence of x values and compute the density at each of them.

x <- seq(from=55,to=95,by=0.25)

density <- dnorm(x,mean=75,sd=5)

plot(x,density,type="l")

plot(x,density,type="l",main="X Normal: Mean=75,SD=5",xlab="X",ylab="Probability Density",las=1)

abline(v=75)

Graph 2: Plot of x and density function
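To see what dnorm computes, the normal density formula can be written out by hand and compared with it. The sketch below does this for the same mean, standard deviation and range of x values. It also integrates the density numerically over that range with integrate() and compares the area with the matching difference of pnorm values. The variable names are illustrative.

```r
mu <- 75
sigma <- 5
x <- seq(from = 55, to = 95, by = 0.25)

# Normal PDF written out explicitly
manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
print(all.equal(manual, dnorm(x, mean = mu, sd = sigma)))

# Area under the density between 55 and 95, by numerical integration
area <- integrate(dnorm, lower = 55, upper = 95, mean = mu, sd = sigma)$value

# The same area from the CDF
area_cdf <- pnorm(95, mean = mu, sd = sigma) - pnorm(55, mean = mu, sd = sigma)

print(c(area = area, area_cdf = area_cdf))
```

The area is very close to 1 because the range from 55 to 95 covers four standard deviations on either side of the mean.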

**4**. **A random sample from a normally distributed population**

rnorm is used to generate n normal random numbers with the arguments mean and sd.

random <- rnorm(n=40,mean=75,sd=5)

hist(random)

The plot below shows a histogram of these random numbers. With only 40 draws, the bell shape is only approximate.

Graph 3: Histogram representing Normal distribution
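Because the numbers are random, the mean and standard deviation of a sample only approximate the population values, and the approximation tends to improve with sample size. The sketch below compares a sample of 40 with a sample of 10000. The seed and the larger sample size are assumptions chosen for illustration.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible

small <- rnorm(n = 40, mean = 75, sd = 5)
large <- rnorm(n = 10000, mean = 75, sd = 5)  # larger sample, size assumed

# Sample statistics for each sample, to compare with mean 75 and sd 5
print(c(mean = mean(small), sd = sd(small)))
print(c(mean = mean(large), sd = sd(large)))
```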

**The following code demonstrates the above functions.**

set.seed(3000) # Fix the random seed so the results are reproducible

xseq <- seq(-4,4,.01) # Sequence of numbers from -4 to 4 in steps of .01

densities <- dnorm(xseq,0,1) # Values of the probability density function

cumulative <- pnorm(xseq,0,1) # Values of the cumulative distribution function

randomdeviates <- rnorm(1000,0,1) # 1000 random numbers from the standard normal distribution

par(mfrow=c(1,3), mar=c(3,4,4,2)) # Three plots side by side

# Plot of the sequence against the corresponding densities

plot(xseq, densities, col="darkgreen", xlab="", ylab="Density", type="l", lwd=2, cex=2, main="PDF of Standard Normal", cex.axis=.8)

# Plot of the sequence against the cumulative distribution function

plot(xseq, cumulative, col="darkorange", xlab="", ylab="Cumulative Probability", type="l", lwd=2, cex=2, main="CDF of Standard Normal", cex.axis=.8)

# Histogram of the random deviates

hist(randomdeviates, main="Random draws from Std Normal", cex.axis=.8, xlim=c(-4,4))

Graph 4: Plots showing normal distribution
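As a rough cross-check of how these functions fit together, the share of random draws at or below a given point should be close to the CDF at that point. The sketch below uses the same seed and number of draws as the code above. The cut-off of 1.96 is an assumed value chosen for illustration.

```r
set.seed(3000)
randomdeviates <- rnorm(1000, 0, 1)

cutoff <- 1.96  # assumed cut-off for this check

# Share of simulated values at or below the cut-off
empirical <- mean(randomdeviates <= cutoff)

# Theoretical probability from the CDF
theoretical <- pnorm(cutoff, 0, 1)

print(c(empirical = empirical, theoretical = theoretical))
```

With 1000 draws the two numbers will not match exactly, but they should differ only by a small sampling error.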

**References:**

- http://statistation.blogspot.in/2012/05/importance-of-normal-distribution.html
- http://www.r-bloggers.com/normal-distribution-functions/