How to Find Unmatched Records in R
To retrieve all rows in one data frame that have no matching values in another data frame, use the anti_join() function from the dplyr package.
The basic syntax used by this function is as follows.
anti_join(df1, df2, by='col_name')
The usage of this syntax is demonstrated in the examples that follow.
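The by argument also accepts a named character vector, which is useful when the key column has a different name in each data frame. The following is a minimal sketch; the data frames x and y and their columns are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data: the key is called 'id' in x and 'key' in y
x <- data.frame(id = c(1, 2, 3, 4), value = c('w', 'x', 'y', 'z'))
y <- data.frame(key = c(2, 4))

# The name on the left refers to a column in x, the value on the right to a column in y
unmatched <- anti_join(x, y, by = c('id' = 'key'))
print(unmatched)
```

Only the rows of x whose id does not appear in y$key (here, ids 1 and 3) are kept, and only the columns of x are returned.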
Example 1: Use anti_join() with One Column
Suppose we have the two R data frames shown below:
# build the data frames
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'), Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'), Q3 = c(523, 324, 233, 134, 237, 141))
To return all rows in the first data frame that don’t have a matching Q1 in the second data frame, we can use the anti_join() function.
library(dplyr)
# use the 'Q1' column to perform the anti join
anti_join(df1, df2, by='Q1')
  Q1  Q2
1  c 114
2  d 218
3  e 322
4  f 323
We can see that exactly four rows in the first data frame have a Q1 value with no match in the second data frame.
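For intuition, the same single-column filter can be reproduced in base R with the %in% operator. This is only a sketch of the idea, not a description of how dplyr implements anti_join(); the variable name unmatched is introduced here for illustration.

```r
# Same data frames as in Example 1
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

# Keep the rows of df1 whose Q1 value never appears in df2$Q1
unmatched <- df1[!(df1$Q1 %in% df2$Q1), ]
print(unmatched)
```

One visible difference is that base R subsetting keeps the original row names (3 to 6 here), whereas anti_join() returns rows numbered from 1.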
Example 2: Use anti_join() with Multiple Columns
Suppose we have the two R data frames shown below.
# create the data frames
df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'), position=c('G', 'G', 'F', 'G', 'F', 'C'), points=c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'), position=c('G', 'G', 'C', 'G', 'F', 'F'), points=c(142, 214, 319, 133, 517, 422))
All rows in the first data frame that lack a matching team and position in the second data frame can be returned using the anti_join() function:
library(dplyr)
# use the 'team' and 'position' columns to perform the anti join
anti_join(df1, df2, by=c('team', 'position'))
  team position points
1    A        F    219
2    B        C    441
We can see that there are exactly two records from the first data frame that do not have a matching team name and position in the second data frame.
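One way to sanity-check a multi-column anti join is to compare it with dplyr's semi_join(), which keeps the rows of the first data frame that do have a match. Every row of df1 should land in exactly one of the two results. The sketch below assumes the same df1 and df2 as in this example; the names matched and unmatched are introduced here for illustration.

```r
library(dplyr)

# Same data frames as in Example 2
df1 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'F', 'G', 'F', 'C'),
                  points = c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'C', 'G', 'F', 'F'),
                  points = c(142, 214, 319, 133, 517, 422))

# Rows of df1 without a matching team/position pair in df2
unmatched <- anti_join(df1, df2, by = c('team', 'position'))

# Rows of df1 with a matching team/position pair in df2
matched <- semi_join(df1, df2, by = c('team', 'position'))

# The two results together account for every row of df1
nrow(unmatched) + nrow(matched) == nrow(df1)
```

Note that the points column plays no part in the comparison, because it is not listed in by. Duplicate keys in df2 also do not duplicate rows in either result, since both functions only filter df1.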