Create new variables from existing variables in R

Posted on June 21 / September 4 by Admin

To create new variables from existing variables in R, use the case_when() function from the dplyr package.

The following is the fundamental syntax for this function.

library(dplyr)
df %>%
  mutate(new_var = case_when(var1 < 25 ~ 'low',
                             var2 < 35 ~ 'med',
                             TRUE ~ 'high'))

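Conditions are evaluated in order, and each row takes the value of the first condition that is true. The final TRUE condition works like an “else” branch: it catches every row that matched none of the earlier conditions.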
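As a minimal sketch of how the conditions are checked from top to bottom, the following applies the same pattern to a plain vector. The vector x and its values are hypothetical and chosen only so that each branch is hit once.

```r
library(dplyr)

# Hypothetical values, chosen so that each branch is reached once
x <- c(10, 30, 50)

labels <- case_when(
  x < 25 ~ "low",   # checked first; 10 matches here and goes no further
  x < 35 ~ "med",   # only reached by values that failed x < 25
  TRUE   ~ "high"   # catch-all for everything left over
)

labels
```

Note that 10 satisfies both x < 25 and x < 35, but it is labelled "low" because that condition comes first.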

The following examples demonstrate how to use this function in practice with a sample data frame.

Let’s create a data frame

df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 position = c('R1', 'R2', 'R3', 'R4', 'R5', NA),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

Now we can view the data frame

df
  player position points assists
1      A       R1    102     405
2      B       R2    105     407
3      C       R3    219     527
4      D       R4    322     412
5      E       R5    232     211
6      F     <NA>     NA      NA

Example 1: Create New Variable from One Existing Variable

The following code demonstrates how to make a new variable named quality with values generated from the points column.

df %>%
  mutate(quality = case_when(points > 215 ~ 'high',
                             points > 120 ~ 'med',
                             TRUE ~ 'low'))
  player position points assists quality
1      A       R1    102     405     low
2      B       R2    105     407     low
3      C       R3    219     527    high
4      D       R4    322     412    high
5      E       R5    232     211    high
6      F     <NA>     NA      NA     low

The case_when() function created the values for the new column in the following way.

If the value in the points column is greater than 215, the quality column value is “high.”

Otherwise, if the value in the points column is greater than 120, the quality column value is “med.”

Otherwise, if the points column value is less than or equal to 120 (or a missing value like NA), the quality column value is “low.”
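The order of the conditions matters, because each value takes the label of the first condition it satisfies. The sketch below compares two orderings on the points values from the data frame; the names right_order and wrong_order are only illustrative.

```r
library(dplyr)

points <- c(102, 105, 219, 322, 232, NA)

# Strictest condition first: values above 215 are labelled before the broader check
right_order <- case_when(points > 215 ~ "high",
                         points > 120 ~ "med",
                         TRUE ~ "low")

# Broader condition first: it captures every value above 120,
# so the "high" branch is never reached
wrong_order <- case_when(points > 120 ~ "med",
                         points > 215 ~ "high",
                         TRUE ~ "low")

data.frame(points, right_order, wrong_order)
```

With the broader condition first, 219, 322, and 232 are all labelled "med". In both versions the NA value ends up as "low", because a comparison with NA is never TRUE and the value falls through to the final branch.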

Example 2: Create New Variable from Multiple Variables

The following code demonstrates how to make a new variable named quality with values drawn from both the points and assists columns.

df %>%
  mutate(quality = case_when(points > 215 & assists > 10 ~ 'great',
                             points > 215 & assists > 5 ~ 'good',
                             TRUE ~ 'average' ))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA average
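In the data frame above every player with more than 215 points also has more than 10 assists, so only the 'great' and 'average' labels appear. As a rough illustration of how each branch can be reached, here is the same logic applied to a small data frame with hypothetical values (toy and toy_labeled are made-up names, not part of the example data).

```r
library(dplyr)

# Hypothetical values, not taken from the data frame above
toy <- data.frame(points  = c(300, 300, 300, 100),
                  assists = c(12, 8, 3, 12))

toy_labeled <- toy %>%
  mutate(quality = case_when(points > 215 & assists > 10 ~ "great",
                             points > 215 & assists > 5  ~ "good",
                             TRUE ~ "average"))

toy_labeled
```

The four rows come out as "great", "good", "average", and "average": the last row has enough assists but too few points, so both combined conditions fail.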

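Because a comparison with NA never evaluates to TRUE, rows with missing values fall through to the final TRUE branch. To label them explicitly, test for them with the is.na() function in the first condition.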

df %>%
  mutate(quality = case_when(is.na(points) ~ 'missing',
                             points > 215 & assists > 150 ~ 'great',
                             points > 215 & assists > 100 ~ 'good',
                             TRUE ~ 'average' ))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA missing
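Two related points may be worth a quick sketch. First, a pipeline like the ones above only prints its result, so it needs to be assigned if you want to keep the new column. Second, if you would rather leave missing rows as NA than label them with a string, one option is to return NA_character_ for them. The name df_labeled and the simplified single 'great' condition are assumptions made for this illustration.

```r
library(dplyr)

df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 position = c('R1', 'R2', 'R3', 'R4', 'R5', NA),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

# Assign the result to keep the new column; df itself is left unchanged
df_labeled <- df %>%
  mutate(quality = case_when(is.na(points) ~ NA_character_,  # keep missing rows as NA
                             points > 215 & assists > 100 ~ 'great',
                             TRUE ~ 'average'))

df_labeled
```

Here player F keeps an NA in the quality column instead of being labelled 'average' or 'missing'.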

Comment (1) on “Create new variables from existing variables in R”

  1. Barry Gribben says:
    August 30 at 3:53 am

    Hi there – isn’t that example code wrong? Might be already pointed out. Gives correct result but only by chance!
    df %>%
    mutate(quality = case_when(points > 120 ~ 'high',
    points > 215 ~ 'med',
    TRUE ~ 'low'))
    should be
    > df %>%
    + mutate(quality = case_when(points > 215 ~ 'high',
    + points > 120 ~ 'med',
    + TRUE ~ 'low'))
