
Data Science Tutorials

For Data Science Learners

How to Create Summary Tables in R

Posted on July 26 by Admin

The describe() and describeBy() functions from the psych package are the simplest way to create summary tables in R: describe() summarizes every variable in a data frame, and describeBy() does the same separately for each level of a grouping variable.

# install.packages("psych")  # run once if psych is not yet installed
library(psych)

Let’s create a summary table

describe(df)

To create a summary table grouped by a particular variable, use describeBy():

describeBy(df, group=df$var_name)

The following examples show how to use these functions in practice.

Example 1: Create a simple summary table

Let’s say we have the R data frame shown below:

# make a data frame
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

Now we can view the data frame

df
   team points rebounds steals
1   P1    150       17     11
2   P1    222       28    151
3   P1    229       36    152
4   P2    421       16     73
5   P2    330       17     85
6   P2    211       29     79
7   P1    219       15     58

The describe() function creates a summary table with one row for each variable in the data frame.

library(psych)

Now create the summary table:

describe(df)
         vars n   mean    sd median trimmed   mad min max range skew kurtosis    se
team*       1 7   1.43  0.53      1    1.43  0.00   1   2     1 0.23    -2.20  0.20
points      2 7 254.57 90.56    222  254.57 16.31 150 421   271 0.71    -1.03 34.23
rebounds    3 7  22.57  8.30     17   22.57  2.97  15  36    21 0.44    -1.73  3.14
steals      4 7  87.00 50.34     79   87.00 31.13  11 152   141 0.08    -1.47 19.03

Here’s how to interpret each value in the output:

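vars: The column number

n: The number of valid (non-missing) cases

mean: The mean value

sd: The standard deviation

median: The median value

trimmed: The trimmed mean (by default, 10% of observations are trimmed from each end)

mad: The median absolute deviation from the median (scaled by 1.4826, as in R's mad() function)

min: The minimum value

max: The maximum value

range: The range of values (max – min)

skew: The skewness

kurtosis: The kurtosis (reported as excess kurtosis, so values can be negative)

se: The standard error of the mean (sd divided by the square root of n)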
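To see where several of these numbers come from, the sketch below recomputes them for the points column using base R only. It is a minimal illustration, not psych's actual source code; it assumes the trimmed mean behaves like mean(x, trim = 0.1) and the MAD like R's default mad(), which reproduces the values in the table above. Skew and kurtosis are left out to keep it short.

```r
# "points" column from the example data frame
points <- c(150, 222, 229, 421, 330, 211, 219)

n       <- sum(!is.na(points))        # number of valid (non-missing) cases
avg     <- mean(points)               # mean
s       <- sd(points)                 # sample standard deviation
med     <- median(points)             # median
trimmed <- mean(points, trim = 0.1)   # 10% trimmed mean (nothing is trimmed when n = 7)
mad_val <- mad(points)                # median absolute deviation, scaled by 1.4826
rng     <- max(points) - min(points)  # range (max - min)
se      <- s / sqrt(n)                # standard error of the mean

# These should agree with the "points" row of describe(df)
round(c(n = n, mean = avg, sd = s, median = med, trimmed = trimmed,
        mad = mad_val, range = rng, se = se), 2)
```

Because floor(7 * 0.1) is 0, no observations are trimmed here, which is why the trimmed column equals the mean for every variable in this small example.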

Any variable marked with an asterisk (*) was categorical or logical and has been converted to a numeric variable whose values reflect the ordering of its categories (here, P1 becomes 1 and P2 becomes 2).

Because these codes are arbitrary labels rather than real quantities, the summary statistics for the variable “team” are not meaningful and should be ignored.
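To see what the numeric version of team looks like, the conversion can be imitated by hand. This sketch assumes it behaves like as.numeric(factor(x)), which sorts the levels alphabetically; psych's own code may differ in detail, but the resulting mean matches the 1.43 reported for team* above.

```r
# "team" column from the example data frame
team <- c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1')

# factor() sorts the levels alphabetically ("P1", "P2");
# as.numeric() then returns each value's level position
team_codes <- as.numeric(factor(team))
print(team_codes)
# [1] 1 1 1 2 2 2 1

# The mean of these arbitrary codes is what describe() reports for team*
round(mean(team_codes), 2)
# [1] 1.43
```

The mean of 1.43 only says that P1 is the more common label; it has no meaning as a team "average".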

Setting fast=TRUE computes only the most common statistics (n, mean, sd, min, max, range, and se) and skips the median, trimmed mean, mad, skew, and kurtosis.

This produces a smaller summary table. The non-numeric “team” column now shows only NaN, NA, and Inf values, which can be ignored:

describe(df, fast=TRUE)
         vars n   mean    sd min  max range    se
team        1 7    NaN    NA Inf -Inf  -Inf    NA
points      2 7 254.57 90.56 150  421   271 34.23
rebounds    3 7  22.57  8.30  15   36    21  3.14
steals      4 7  87.00 50.34  11  152   141 19.03
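For comparison, the statistics in this fast table are simple enough to reproduce in base R. The sketch below is an illustrative stand-in rather than psych's implementation; the helper name fast_stats() is hypothetical, and the numeric columns are picked by name so that team is skipped.

```r
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Hypothetical helper: the statistics that describe(..., fast = TRUE) shows
fast_stats <- function(x) {
  x <- x[!is.na(x)]                   # drop missing values first
  c(n = length(x), mean = mean(x), sd = sd(x),
    min = min(x), max = max(x), range = max(x) - min(x),
    se = sd(x) / sqrt(length(x)))
}

# sapply() returns one column per variable; t() turns that into one row per variable
fast_table <- t(sapply(df[c("points", "rebounds", "steals")], fast_stats))
round(fast_table, 2)
```

The rows should line up with the points, rebounds, and steals rows of the psych output above.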

You can also compute the summary statistics for only a subset of the data frame’s variables:

# make a summary table using only the columns "points" and "rebounds"
describe(df[ , c('points', 'rebounds')], fast=TRUE)
         vars n   mean    sd min max range    se
points      1 7 254.57 90.56 150 421   271 34.23
rebounds    2 7  22.57  8.30  15  36    21  3.14
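Selecting columns by name works well for a small data frame. An alternative, sketched below as one possible approach, is to keep every numeric column with base R's Filter(), so the coded team row never appears. The describe() call is wrapped in a requireNamespace() check only so the snippet still runs where psych is not installed.

```r
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Keep only the numeric columns (this drops "team")
num_df <- Filter(is.numeric, df)
print(names(num_df))

# Summarise them with psych when the package is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(num_df, fast = TRUE))
}
```

Selecting by type rather than by name means the summary picks up any numeric columns added to df later without editing the column list.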

Example 2: Make a summary table that is grouped by a certain variable.

The describeBy() function can be used to group the data frame’s summary table by the variable “team” using the following code.

# build the summary table, grouped by team
describeBy(df, group=df$team, fast=TRUE)

Descriptive statistics by group

group: P1
         vars n mean    sd min  max range    se
team        1 4  NaN    NA Inf -Inf  -Inf    NA
points      2 4  205 36.91 150  229    79 18.45
rebounds    3 4   24  9.83  15   36    21  4.92
steals      4 4   93 70.22  11  152   141 35.11
-------------------------------------------------------------
group: P2
         vars n   mean     sd min  max range    se
team        1 3    NaN     NA Inf -Inf  -Inf    NA
points      2 3 320.67 105.31 211  421   210 60.80
rebounds    3 3  20.67   7.23  16   29    13  4.18
steals      4 3  79.00   6.00  73   85    12  3.46

The output displays the summary statistics separately for each of the two teams (P1 and P2) in the data frame.
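If psych is not available, base R's aggregate() can produce a comparable per-team breakdown, one statistic at a time. This is a swapped-in base-R technique, not part of psych; the means and standard deviations it returns should agree with the mean and sd columns of the describeBy() output above.

```r
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# One row per team, one column per numeric variable
team_means <- aggregate(cbind(points, rebounds, steals) ~ team, data = df, FUN = mean)
team_sds   <- aggregate(cbind(points, rebounds, steals) ~ team, data = df, FUN = sd)

print(team_means)
print(team_sds)

# Group sizes (the n column of describeBy)
print(table(df$team))
```

Each call computes a single statistic, so describeBy() remains the more convenient choice when you want the full table in one step.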

Copyright © 2025 Data Science Tutorials.
