How to Label Outliers in Boxplots in ggplot2


Posted on August 19 by Jim

This article walks through a detailed example of how to label outliers in ggplot2 boxplots.

Step 1: Create the Data Frame

First, create the following data frame, which holds the points scored by 60 basketball players, 20 on each of three teams.


Set a seed to make this example reproducible.

set.seed(123)

Now we can create the data frame.

df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))

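Let’s view the first six rows of the data frame. The exact values depend on the seed and on your R version's random number generator, so yours may differ from the output below.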

head(df)
   team player points
1    A      A  37.84
2    A      B  42.60
3    A      C  40.96
4    A      D   5.78
5    A      E  37.65
6    A      F  24.98
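
As a quick sanity check, a sketch like the following confirms the shape of the data: 60 rows, with 20 players on each of the three teams. It rebuilds the same data frame so that it can be run on its own. No point values are shown here because they depend on the random number generator.

```r
# Rebuild the data frame from Step 1 so this block runs on its own
set.seed(123)
df <- data.frame(team   = rep(c("A", "B", "C"), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

nrow(df)        # total number of rows: 60
table(df$team)  # 20 rows for each of teams A, B and C
```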

Step 2: Define a Function to Identify Outliers

ggplot2 draws an observation as an outlier in a boxplot if it meets either of the following two criteria:

The observation is more than 1.5 times the interquartile range (IQR) below the first quartile (Q1).

The observation is more than 1.5 times the interquartile range (IQR) above the third quartile (Q3).
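
To make the two criteria concrete, the sketch below computes the lower and upper cutoffs for a small made-up vector. The vector x and the names q1, q3, lower and upper are illustrative assumptions, not part of the original example. quantile() and IQR() are used with their default settings.

```r
# Illustrative data: seven ordinary values and one extreme value
x <- c(10, 12, 13, 14, 15, 16, 18, 40)

q1 <- quantile(x, 0.25, names = FALSE)  # first quartile: 12.75
q3 <- quantile(x, 0.75, names = FALSE)  # third quartile: 16.5

lower <- q1 - 1.5 * IQR(x)  # lower cutoff: 7.125
upper <- q3 + 1.5 * IQR(x)  # upper cutoff: 22.125

# Values outside the two cutoffs are outliers; here only 40 qualifies
x[x < lower | x > upper]
```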

We can write the following function in R to flag an observation as an outlier if it meets either of these two criteria.


# Returns TRUE for values more than 1.5 * IQR below Q1 or above Q3
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}
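
One practical caveat: quantile() and IQR() stop with an error when the input contains NA and na.rm is left at its default of FALSE, so the function above fails on a column with missing values. The variant below is a minimal sketch of one way to handle that. The name findoutlier_na and the scores vector are hypothetical and not part of the original example.

```r
# Hypothetical variant of findoutlier() that ignores missing values
# when computing the quartiles
findoutlier_na <- function(x) {
  q1  <- quantile(x, 0.25, na.rm = TRUE)
  q3  <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr
}

# Illustrative data with one missing value and one extreme value
scores <- c(10, 12, 13, 14, NA, 15, 16, 18, 40)

# The missing value stays NA, 40 is flagged TRUE, everything else is FALSE
findoutlier_na(scores)
```

Because the missing value stays NA in the result, it also ends up unlabeled when the result is passed through ifelse().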

Step 3: Label Outliers in Boxplots in ggplot2

Next, load the ggplot2 and dplyr packages:

library(ggplot2)
library(dplyr)

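Add a new column to the data frame that holds the points value for each outlier and NA for every other observation. Grouping by team first means the outliers are identified within each team, matching the separate boxes in the plot.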

df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(findoutlier(points), points, NA))
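
Before plotting, it can help to look at which rows were flagged. The sketch below repeats the setup so that it runs on its own, then keeps only the flagged rows. The name flagged is an illustrative assumption, and ungroup() is added because the data frame otherwise stays grouped by team after mutate().

```r
library(dplyr)

# Setup repeated from Steps 1 and 2 so this block is self-contained
set.seed(123)
df <- data.frame(team   = rep(c("A", "B", "C"), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

findoutlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# Flag outliers within each team, then keep only the flagged rows
flagged <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(findoutlier(points), points, NA)) %>%
  ungroup() %>%
  filter(!is.na(outlier))

flagged               # one row per flagged observation
count(flagged, team)  # number of flagged observations per team
```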

Now we can create a box plot of points by team and label the outliers.

ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

Note that we can also label these outliers using a different variable.

For instance, to label the outliers by player name instead, replace points with player as the second argument of ifelse() inside mutate().

library(ggplot2)
library(dplyr)
df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(findoutlier(points), player, NA))

Create a box plot of points by team and label the outliers.


ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

The outlier on team A is now labeled N and the outlier on team B is now labeled D, since these are the names of the players whose points values are outliers.
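
The same idea extends to labels built from more than one variable. The following is a minimal sketch, not part of the original example, that labels each outlier with the player name followed by the points value in parentheses. The names labelled and p and the hjust value of -0.2 are assumptions chosen for illustration. A smaller offset is used because hjust is measured relative to the width of the label, so longer labels move further from the point.

```r
library(ggplot2)
library(dplyr)

# Setup repeated from Steps 1 and 2 so this block is self-contained
set.seed(123)
df <- data.frame(team   = rep(c("A", "B", "C"), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

findoutlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# Build a text label such as "N (5.78)" for outliers, NA for all other rows
labelled <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(findoutlier(points),
                          paste0(player, " (", points, ")"),
                          NA_character_)) %>%
  ungroup()

# Box plot of points by team with the combined labels next to the outliers
p <- ggplot(labelled, aes(x = team, y = points)) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.2)

p
```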
