Normal Distribution in R » Data Science Tutorials

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is widely used in statistics and probability theory to describe continuous random variables.

It has a bell-shaped curve that is symmetric around its peak, and its mean, median, and mode all coincide at that peak.

In a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
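
How much of the distribution falls within each band can be checked numerically. The sketch below uses pnorm() (covered later in this tutorial) on the standard normal distribution; the variable names k and within are chosen here for illustration only.

```r
# Number of standard deviations around the mean
k <- 1:3

# P(-k < Z < k) for a standard normal variable Z
within <- pnorm(k) - pnorm(-k)

round(within, 4)
# [1] 0.6827 0.9545 0.9973
```

Because any normal distribution is a shifted and rescaled standard normal, the same shares hold for every mean and standard deviation.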

This distribution is useful for modeling many natural phenomena, such as human heights, IQ scores, and manufacturing error rates.

In R, there are several functions to work with normal distributions.

The dnorm() function returns the probability density function (PDF) of the normal distribution at a given value, pnorm() calculates the cumulative distribution function (CDF), qnorm() computes the quantile function (inverse CDF), and rnorm() generates random numbers from the normal distribution.

dnorm(x, mean = 0, sd = 1, log = FALSE): This function calculates the probability density function (PDF) of the normal distribution at a given value x, with mean mean and standard deviation sd. If log is set to TRUE, the logarithm of the PDF is returned.
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): This function returns the cumulative distribution function (CDF) of the normal distribution at a given value q, with mean mean and standard deviation sd. If lower.tail is set to FALSE, the upper tail probability is returned. If log.p is set to TRUE, the logarithm of the CDF is returned.
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): This function calculates the quantile function (inverse CDF) of the normal distribution at a given probability p, with mean mean and standard deviation sd. If lower.tail is set to FALSE, the upper tail quantile is returned. If log.p is set to TRUE, the logarithm of the probability is used.
rnorm(n, mean = 0, sd = 1): This function generates n random numbers from the normal distribution with mean mean and standard deviation sd.
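
As a rough illustration of how the optional arguments behave, the sketch below checks a few identities on the standard normal distribution. The evaluation point 1.5, the probability 0.9, and the variable names are arbitrary choices made for this example.

```r
# log = TRUE returns the logarithm of the density
log_density <- dnorm(1.5, log = TRUE)
all.equal(log_density, log(dnorm(1.5)))               # TRUE

# lower.tail = FALSE returns the upper tail, i.e. 1 - CDF
upper <- pnorm(1.5, lower.tail = FALSE)
all.equal(upper, 1 - pnorm(1.5))                      # TRUE

# log.p = TRUE tells qnorm() that p is supplied on the log scale
q_log <- qnorm(log(0.9), log.p = TRUE)
all.equal(q_log, qnorm(0.9))                          # TRUE

# rnorm() returns a vector with n elements
draws <- rnorm(5)
length(draws)                                         # 5
```

The log options matter mostly for very small probabilities or densities, where working on the log scale avoids numerical underflow.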

These functions are essential for statistical analysis and modeling in R.

1. The `rnorm()` function in R

In R, the rnorm() function is used to generate random numbers from a normal distribution with a specified mean and standard deviation. This function takes as input the number of random values to generate (n) and returns a vector of random numbers drawn from the specified normal distribution.

Here is an example of generating 1000 random numbers from a normal distribution with a mean of 0 and a standard deviation of 1:

# Generate 1000 random numbers from a normal distribution
data <- rnorm(1000, mean = 0, sd = 1)

To visualize the distribution of the data, plot a histogram with the hist() function:

# Plot a histogram of the data to visualize the distribution
hist(data,
  main = "Normal Distribution",
  xlab = "Data",
  ylab = "Frequency"
)

If you want to generate random numbers from a normal distribution with a different mean or standard deviation, simply adjust the values for the mean and sd parameters accordingly.
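
A minimal sketch of that adjustment follows. The mean of 50, the standard deviation of 10, the sample size, and the seed value are all arbitrary choices for illustration. Calling set.seed() first makes the draws reproducible from run to run.

```r
# Fix the seed so the same random numbers are drawn on every run
set.seed(123)

# 10,000 draws from a normal distribution with mean 50 and sd 10
scores <- rnorm(10000, mean = 50, sd = 10)

# The sample statistics should land close to the requested parameters
mean(scores)
sd(scores)
```

With a sample this large, the sample mean and standard deviation should differ from 50 and 10 only by a small amount of sampling noise.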

2. The `dnorm()` function in R

In R, the dnorm() function is used to compute the probability density function (PDF) of the normal distribution at a given value or set of values. The syntax of the dnorm() function is as follows:

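dnorm(x, mean = 0, sd = 1)

The output below comes from a call on a vector x of 100 values (the values of x themselves are not shown). The result contains one density value per element of x:

  [1] 0.233739125 0.017084889 0.188385307 0.272568291 0.373586386 0.144752089 0.365860132 0.384779915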
  [9] 0.147678959 0.338534206 0.353151001 0.390962827 0.297257330 0.391133451 0.117495067 0.383190027
 [17] 0.023892197 0.272226033 0.398719859 0.307927605 0.121609105 0.341852595 0.380249640 0.367853079
 [25] 0.206605168 0.333416097 0.398560637 0.371930210 0.211678541 0.293681652 0.393405671 0.374058955
 [33] 0.155446389 0.009008756 0.087112549 0.326664227 0.376238285 0.210204343 0.398244162 0.310979512
 [41] 0.339018219 0.174948518 0.103788892 0.188139800 0.199243090 0.398578486 0.265473965 0.388577476
 [49] 0.379801170 0.194105896 0.363096813 0.382807954 0.106012053 0.398941893 0.269469825 0.128395923
 [57] 0.183784182 0.044026248 0.299050353 0.391241424 0.151763476 0.308618457 0.398854064 0.164238186
 [65] 0.247213665 0.171499988 0.162905118 0.220206821 0.347788264 0.338269967 0.267998510 0.391359635
 [73] 0.396530559 0.320255774 0.169151618 0.252836459 0.259766972 0.395515933 0.231694667 0.134013696
 [81] 0.395509755 0.394290213 0.311963592 0.355775391 0.122693881 0.072890003 0.257311129 0.320089837
 [89] 0.363368668 0.249414172 0.380394542 0.205330877 0.204147458 0.376235326 0.380064332 0.381959229
 [97] 0.312948351 0.373478535 0.315240301 0.186672691

Here, x is the value or vector of values at which to evaluate the PDF. The mean parameter specifies the mean of the normal distribution (default is 0), and the sd parameter specifies the standard deviation of the normal distribution (default is 1).

For example, to compute the PDF of a normal distribution with a mean of 2 and a standard deviation of 3 at the values 0, 1, 2, 3, and 4, you can use the following code:

dnorm(c(0, 1, 2, 3, 4), mean = 2, sd = 3)

[1] 0.1064827 0.1257944 0.1329808 0.1257944 0.1064827

This will produce the PDF values at the specified values, with the output being a vector of the same length as the input values.
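
To see where these numbers come from, the density can be written out by hand from the standard formula f(x) = 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2)). The helper normal_pdf() below is a hypothetical function defined only for this comparison; it is not part of R.

```r
# Hypothetical helper: the normal density written out from its formula
normal_pdf <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

x <- c(0, 1, 2, 3, 4)
manual  <- normal_pdf(x, mean = 2, sd = 3)
builtin <- dnorm(x, mean = 2, sd = 3)

# Both approaches agree up to floating-point tolerance
all.equal(manual, builtin)
# [1] TRUE
```

The values are symmetric around x = 2 because the density depends on x only through the squared distance from the mean.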

To plot the probability density function of the normal distribution, evaluate dnorm() over a grid of x-values and pass the result to plot(). The dnorm() function takes the value(s) at which to calculate the density, followed by the mean and standard deviation of the distribution.

Here is an example of plotting the probability density function of a normal distribution with a mean of 0 and a standard deviation of 1:

# Generate a sequence of 100 x-values from -4 to 4
x <- seq(-4, 4, length.out = 100)

# Calculate the PDF of the standard normal distribution
y <- dnorm(x, mean = 0, sd = 1)

# Plot the PDF of the normal distribution
plot(x, y, type = "l")
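
One sanity check on the curve is that the total area under a probability density equals 1. The sketch below uses base R's integrate() to approximate that area numerically; the interval from -1.96 to 1.96 is an illustrative choice, and the variable names are assumptions made for this example.

```r
# Total area under the standard normal density
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area under the curve between -1.96 and 1.96
central_area <- integrate(dnorm, lower = -1.96, upper = 1.96)$value

total_area    # approximately 1
central_area  # approximately 0.95
```

The density values on the y-axis are heights, not probabilities. Probabilities correspond to areas under the curve, which is what pnorm() computes directly.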

3. The `pnorm()` function in R

In R, the pnorm() function is used to calculate the cumulative distribution function (CDF) of a normal distribution. The CDF gives the probability that a random variable is less than or equal to a given value.

The pnorm() function takes the value(s) for which to calculate the CDF, followed by the mean and standard deviation of the normal distribution. By default, pnorm() calculates the area to the left of the given value(s).

Here is an example of using the pnorm() function to calculate the CDF of a normal distribution with a mean of 0 and a standard deviation of 1:

# Calculate the CDF for x = 1
pnorm(1, mean = 0, sd = 1)

# Calculate the CDF for x = -1
pnorm(-1, mean = 0, sd = 1)

# Calculate the CDF for x = 0
pnorm(0, mean = 0, sd = 1)

The output is the probability that a random variable from the given normal distribution is less than or equal to each value: approximately 0.841 for 1, approximately 0.159 for -1, and exactly 0.5 for 0.

If you want to calculate the area to the right of a given value, set the lower.tail argument to FALSE. For example:

# Calculate the area to the right of x = 1
pnorm(1, mean = 0, sd = 1, lower.tail = FALSE)

# Calculate the area to the right of x = -1
pnorm(-1, mean = 0, sd = 1, lower.tail = FALSE)
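
The two tails are complementary, and subtracting two CDF values gives the probability of landing in an interval. The sketch below shows both ideas; the distribution with mean 100 and standard deviation 15 and the interval from 90 to 110 are illustrative assumptions, not values taken from a dataset.

```r
# Lower tail and upper tail at the same point always sum to 1
lower <- pnorm(1)
upper <- pnorm(1, lower.tail = FALSE)
lower + upper
# [1] 1

# P(90 < X <= 110) for X ~ Normal(mean = 100, sd = 15)
p_between <- pnorm(110, mean = 100, sd = 15) - pnorm(90, mean = 100, sd = 15)
p_between  # roughly 0.495
```

For probabilities far out in the tail, lower.tail = FALSE is usually preferable to computing 1 - pnorm(q) by hand, since the subtraction can lose precision.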

4. The `qnorm()` function in R

In R, the qnorm() function is used to calculate the quantiles of the normal distribution. Its main arguments are:

p – the probability of getting a value less than or equal to the quantile
mean and sd – the mean and standard deviation of the normal distribution (default is mean = 0 and sd = 1)

The output of the function is the quantile for the given probability p. For example, to find the quantile for a probability of 0.95 in a normal distribution with a mean of 10 and standard deviation of 2, you can use the following code:

qnorm(0.95, mean = 10, sd = 2)
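
The sketch below runs that call and checks the result in two ways: by feeding the quantile back into pnorm(), and by rescaling the standard normal quantile. The variable name q95 is chosen here for illustration.

```r
# 95th percentile of a normal distribution with mean 10 and sd 2
q95 <- qnorm(0.95, mean = 10, sd = 2)
q95
# [1] 13.28971

# pnorm() undoes qnorm(): the quantile maps back to the probability
pnorm(q95, mean = 10, sd = 2)
# [1] 0.95

# Same quantile from the standard normal, shifted and rescaled
10 + 2 * qnorm(0.95)
# [1] 13.28971
```

In other words, 95% of the values from this distribution are expected to fall at or below roughly 13.29.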

Summary

The dnorm() function calculates the height of the normal distribution at a specific point or set of points, the rnorm() function generates random numbers from a normal distribution, the pnorm() function calculates the CDF of a normal distribution and the qnorm() function computes the quantile function of a normal distribution.

These functions are useful for different purposes in statistical analysis and data science.
