Descriptive Statistics in R: A Step-by-Step Guide

Descriptive statistics are a crucial part of data analysis, as they provide a snapshot of the central tendency and variability of a dataset.

In R, there are two primary functions that can be used to calculate descriptive statistics: `summary()` and `sapply()`.

In this article, we will explore how to use these functions to gain a deeper understanding of our data.

**Method 1: Using the summary() Function**

The `summary()` function is a simple and efficient way to calculate various descriptive statistics for each variable in a data frame. To use this function, simply call it on your data frame, like so:

summary(my_data)

For each numeric variable, the `summary()` function returns the minimum, first quartile, median, mean, third quartile, and maximum.

For example, let’s say we have the following data frame:

df <- data.frame(x=c(1, 4, 4, 5, 6, 7, 10, 12), y=c(2, 2, 3, 3, 4, 5, 11, 11), z=c(8, 9, 9, 9, 10, 13, 15, 17))

We can use the `summary()` function to calculate descriptive statistics for each variable:

summary(df)

This will output:

       x                y                z        
 Min.   : 1.000   Min.   : 2.000   Min.   : 8.00  
 1st Qu.: 4.000   1st Qu.: 2.750   1st Qu.: 9.00  
 Median : 5.500   Median : 3.500   Median : 9.50  
 Mean   : 6.125   Mean   : 5.125   Mean   :11.25  
 3rd Qu.: 7.750   3rd Qu.: 6.500   3rd Qu.:13.50  
 Max.   :12.000   Max.   :11.000   Max.   :17.00  
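To reuse individual values from a summary rather than just print it, one option is to call `summary()` on a single column, which returns a named vector. The sketch below rebuilds the same `df` so that it runs on its own; the names `x_summary`, `x_median`, and `x_mean` are illustrative only.

```r
# Rebuild the example data frame so this block is self-contained
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# summary() on one numeric column returns a named vector of six statistics
x_summary <- summary(df$x)

# Extract single statistics by name
x_median <- x_summary[["Median"]]
x_mean   <- x_summary[["Mean"]]
```

The element names (`"Min."`, `"1st Qu."`, `"Median"`, `"Mean"`, `"3rd Qu."`, `"Max."`) match the labels shown in the printed output.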

**Method 2: Using the sapply() Function**

The `sapply()` function is a more versatile option for calculating descriptive statistics. It allows us to specify a custom function to apply to each variable in the data frame.

For example, we can use the `sapply()` function to calculate the standard deviation of each variable:

sapply(df, sd, na.rm=TRUE)

This will output:

       x        y        z 
3.522884 3.758324 3.327376 
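The same pattern extends to several statistics at once. As a minimal sketch, assuming we want the mean, standard deviation, and median together, the function passed to `sapply()` can return a named vector, and `sapply()` then simplifies the results into a matrix with one column per variable. The name `stats` and the choice of statistics are illustrative.

```r
# Rebuild the example data frame so this block is self-contained
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# Each call returns a named vector of three statistics;
# sapply() binds them into a matrix (rows = statistics, columns = variables)
stats <- sapply(df, function(col) c(mean = mean(col), sd = sd(col), median = median(col)))

# Look up one value by row and column name
mean_y <- stats["mean", "y"]
```

Because the result is an ordinary matrix, it can be indexed by name, transposed with `t()`, or rounded with `round()` for display.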

We can also use the `sapply()` function to calculate more complex descriptive statistics by defining a custom function within it.

For example, let’s say we want to calculate the range of each variable:

sapply(df, function(x) max(x, na.rm=TRUE) - min(x, na.rm=TRUE))

This will output:

 x  y  z 
11  9  9 
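When a column contains missing values, `na.rm = TRUE` has to reach `max()` and `min()` (or `range()`) themselves, because an anonymous function that takes only one argument cannot accept extra arguments forwarded by `sapply()`. The sketch below uses a small hypothetical data frame, `df_na`, which is not part of the example above, and computes the range with `diff(range(...))` as one possible alternative.

```r
# Hypothetical data with missing values (not from the example above)
df_na <- data.frame(x = c(1, 4, NA, 12),
                    y = c(2, NA, 5, 11))

# Without na.rm, a single NA makes the whole result NA for that column
range_with_na <- sapply(df_na, function(col) max(col) - min(col))

# Passing na.rm = TRUE inside the function drops the missing values first
col_range <- sapply(df_na, function(col) diff(range(col, na.rm = TRUE)))
```

Here `range()` returns the minimum and maximum as a length-two vector, and `diff()` subtracts the first from the second.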

**Conclusion**

In this article, we have explored two methods for calculating descriptive statistics in R: the `summary()` function and the `sapply()` function.

The `summary()` function provides a quick and easy way to calculate common descriptive statistics for each variable in a data frame.

The `sapply()` function offers more flexibility and allows us to define custom functions to calculate more complex descriptive statistics.

By using these functions effectively, we can gain a deeper understanding of our data and make more informed decisions about our analysis and visualization strategies.
