How to Count Distinct Values in R

Using the n_distinct() function from the dplyr package, you can count the number of distinct values in an R data frame with one of the following methods: counting distinct values in one column, in all columns, or by group.

The following examples show how to apply each approach in practice to a sample data frame.


First, let’s create the data frame:

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(106, 106, 108, 110, 209, 209, 122, 212),
                 assists=c(203, 206, 204, 202, 24, 25, 125, 119))
df

   team points assists
1    A    106     203
2    A    106     206
3    A    108     204
4    A    110     202
5    B    209      24
6    B    209      25
7    B    122     125
8    B    212     119

Approach 1: Count Distinct Values in One Column

The following code shows how to use n_distinct() to count the number of distinct values in the ‘team’ column:


library(dplyr)

# count the number of distinct values in the 'team' column
n_distinct(df$team)
[1] 2

There are 2 distinct values in the ‘team’ column.
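n_distinct() also has an na.rm argument that controls whether missing values are counted. The sketch below is only an illustration: the NA in the vector is hypothetical and is not part of the example data frame. It also uses length(unique()) as a base-R cross-check, which should agree with n_distinct() on a simple vector like this one.

```r
library(dplyr)

# The 'team' values from the example data frame, plus one NA
# (the NA is hypothetical, added only to show missing-value handling)
team <- c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', NA)

# By default, n_distinct() counts NA as a distinct value of its own
n_distinct(team)                  # 3

# na.rm = TRUE drops missing values before counting
n_distinct(team, na.rm = TRUE)    # 2

# Base-R cross-check: unique() also keeps NA, so the counts match
length(unique(team))              # 3
length(unique(na.omit(team)))     # 2
```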

Approach 2: Count Distinct Values in All Columns

The following code shows how to use sapply() together with n_distinct() to count the number of distinct values in each column of the data frame:

# count the number of distinct values in each column
sapply(df, function(x) n_distinct(x))

    team  points assists
      2       6       8

We can observe the following from the output:

There are 2 distinct values in the ‘team’ column.


In the ‘points’ column, there are 6 different values.

The ‘assists’ column has 8 different values.
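If you prefer to stay inside a dplyr pipeline, summarise() combined with across() can apply n_distinct() to every column and return a one-row data frame instead of a named vector. This is a sketch assuming dplyr 1.0.0 or later, where across() is available; the data frame is recreated so the block runs on its own.

```r
library(dplyr)

# Recreate the example data frame from above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(106, 106, 108, 110, 209, 209, 122, 212),
                 assists=c(203, 206, 204, 202, 24, 25, 125, 119))

# across(everything(), n_distinct) applies n_distinct() to each column;
# summarise() collapses the result into a single row
distinct_counts <- df %>%
  summarise(across(everything(), n_distinct))

# one row: team = 2, points = 6, assists = 8
distinct_counts
```

The values match the sapply() output; the difference is only the shape of the result, which can be convenient when you want to keep working with a data frame.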

Approach 3: Count Distinct Values by Group

The following code shows how to combine group_by() and summarize() with n_distinct() to count the number of distinct values by group:

# count the number of distinct 'points' values by 'team'
df %>%
  group_by(team) %>%
  summarize(distinct_points = n_distinct(points))

   team  distinct_points
  <chr>           <int>
1 A                   3
2 B                   3

We can observe the following from the output:

For team A, there are three different point values.


For team B, there are three different point values.
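n_distinct() also accepts several vectors at once, in which case it counts distinct combinations of their values. The sketch below, using the same data frame, adds an illustrative distinct_pairs column (a name chosen here, not from the original) that counts distinct (points, assists) pairs per team, and shows a base-R aggregate() alternative for the per-team count of ‘points’ values.

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(106, 106, 108, 110, 209, 209, 122, 212),
                 assists=c(203, 206, 204, 202, 24, 25, 125, 119))

# Distinct 'points' values and distinct (points, assists) pairs per team
by_team <- df %>%
  group_by(team) %>%
  summarize(distinct_points = n_distinct(points),
            distinct_pairs = n_distinct(points, assists))

# team A: 3 points values, 4 pairs; team B: 3 points values, 4 pairs
by_team

# Base-R alternative without dplyr: aggregate() with a unique-count function
base_counts <- aggregate(points ~ team, data = df,
                         FUN = function(x) length(unique(x)))
base_counts
```

Each team has only 3 distinct ‘points’ values but 4 distinct (points, assists) pairs, because rows that share a ‘points’ value still differ in ‘assists’.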