Data Science Tutorials

How to Label Outliers in Boxplots in ggplot2?

Posted on August 19 by Jim

This article shows, step by step, how to label outliers in ggplot2 boxplots.

Step 1: Create the Data Frame

First, create a data frame with the points scored by 60 basketball players spread across three teams (A, B, and C), 20 players per team.


Set a random seed so the example is reproducible:

set.seed(123)

Next, create the data frame:

df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))
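As an optional sanity check (not part of the original walkthrough), base R can confirm the structure described above: 60 rows, 20 per team, and the same player letters A through T on every team. The variable name players_by_team is just an illustrative choice.

```r
# Recreate the example data frame from Step 1
set.seed(123)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

# Total number of rows (one per player per team)
nrow(df)

# Number of rows per team
table(df$team)

# Check that every team uses the same 20 player labels, A through T
players_by_team <- split(df$player, df$team)
all(vapply(players_by_team,
           function(p) identical(as.character(p), LETTERS[1:20]),
           logical(1)))
```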

View the first six rows of the data frame:

head(df)
   team player points
1    A      A  37.84
2    A      B  42.60
3    A      C  40.96
4    A      D   5.78
5    A      E  37.65
6    A      F  24.98

Step 2: Define a Function to Identify Outliers

By default, ggplot2's geom_boxplot() treats an observation as an outlier if it meets either of the following two criteria:

The observation is more than 1.5 times the interquartile range (IQR) below the first quartile (Q1).

The observation is more than 1.5 times the interquartile range above the third quartile (Q3).
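To make the two criteria concrete, here is the fence arithmetic on a small made-up vector (the values are purely illustrative). It relies on R's default quantile() definition (type 7), which is also what IQR() uses.

```r
# Toy vector: nine ordinary values and one extreme value
x <- c(1:9, 50)

q1  <- quantile(x, 0.25)   # first quartile: 3.25
q3  <- quantile(x, 0.75)   # third quartile: 7.75
iqr <- IQR(x)              # interquartile range: 7.75 - 3.25 = 4.5

lower <- q1 - 1.5 * iqr    # lower fence: 3.25 - 6.75 = -3.5
upper <- q3 + 1.5 * iqr    # upper fence: 7.75 + 6.75 = 14.5

# Values below the lower fence or above the upper fence are outliers
x[x < lower | x > upper]   # 50
```

Only 50 lies outside the fences, so it is the only value a boxplot of x would draw as a separate point.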

The following R function returns TRUE for every observation that meets either criterion:


# Flag values more than 1.5 * IQR below Q1 or above Q3
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}
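One way to check that findoutlier() agrees with what ggplot2 actually draws is to pull the computed box statistics out of a plot with ggplot2's layer_data() helper; its outliers column holds, for each box, the values plotted beyond the whiskers. The sketch below assumes ggplot2 is installed and compares the two approaches team by team; gg_outliers and fn_outliers are illustrative names.

```r
library(ggplot2)

# findoutlier() from Step 2
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Example data from Step 1
set.seed(123)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

# Let ggplot2 compute the boxplot statistics: one row per box (teams A, B, C)
p <- ggplot(df, aes(x = team, y = points)) + geom_boxplot()
box_stats <- layer_data(p)

# Outliers as ggplot2 computed them, sorted within each team
gg_outliers <- lapply(box_stats$outliers, sort)

# Outliers as findoutlier() flags them, applied separately to each team
fn_outliers <- lapply(split(df$points, df$team),
                      function(v) sort(v[findoutlier(v)]))

# TRUE if both methods flag exactly the same values for every team
isTRUE(all.equal(gg_outliers, unname(fn_outliers)))
```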

Step 3: Label Outliers in Boxplots in ggplot2

The next step is to use the code below to label outliers in ggplot2 boxplots:

library(ggplot2)
library(dplyr)

Next, add a new column to the data frame that, within each team, holds the point value for outliers and NA for every other observation:

df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(findoutlier(points), points, NA))
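If you would rather not load dplyr, base R's ave() can apply findoutlier() within each team, much as group_by() followed by mutate() does. This is a swapped-in base R technique, not the approach used in this article; the sketch recreates the Step 1 data so it runs on its own, and is_outlier is an illustrative name.

```r
# findoutlier() from Step 2
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Example data from Step 1
set.seed(123)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

# ave() splits points by team, applies findoutlier() to each piece and puts
# the results back in the original row order. Because the results are stored
# in a copy of the numeric points vector, the flags come back as 1/0, so they
# are converted back to logical here.
is_outlier <- as.logical(ave(df$points, df$team, FUN = findoutlier))

# Keep the point value for outliers and NA for everything else
df$outlier <- ifelse(is_outlier, df$points, NA)

# Rows that will receive a label in the plot
df[!is.na(df$outlier), ]
```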

Now create a box plot of points by team and label the outliers:

ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

We can also label the outliers using a different variable.

For instance, to label them with the player's name instead of the point value, replace points with player as the second argument of ifelse() inside mutate():

library(ggplot2)
library(dplyr)
df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(findoutlier(points), player, NA))
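One caveat when labeling with player: if the yes argument of ifelse() is a factor, ifelse() returns the factor's integer codes rather than its labels. With the data frame above this does not happen in R 4.0.0 or later, because data.frame() no longer converts strings to factors by default, but it can appear in older R versions or when stringsAsFactors = TRUE. The made-up vectors below are hypothetical and only illustrate the problem and the as.character() fix.

```r
# Hypothetical example: three player names stored as a factor
player <- factor(c("N", "D", "A"))
flag <- c(TRUE, FALSE, TRUE)

# Factor levels are sorted (A, D, N), so this returns the codes 3 NA 1
ifelse(flag, player, NA)

# Converting to character first keeps the names: "N" NA "A"
ifelse(flag, as.character(player), NA)
```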

Then rebuild the box plot of points by team, this time labeling the outliers with player names:


ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

The outlier on team A is now labeled N and the outlier on team B is labeled D, since these are the players whose point values are outliers.

Copyright © 2023 Data Science Tutorials.
