
Data Science Tutorials

Create new variables from existing variables in R


Posted on June 21 / June 19 by Jim

To create new variables from existing variables in R, use the case_when() function from the dplyr package, typically inside mutate().


The basic syntax is:

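library(dplyr)

# df, var1 and var2 are placeholders for your own data frame and columns
df %>%
  mutate(new_var = case_when(var1 < 25 ~ 'low',   # checked first
                             var2 < 35 ~ 'med',   # checked only where the first condition is not TRUE
                             TRUE ~ 'high'))      # every remaining row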

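The final TRUE ~ 'high' line works like an “else”: it applies to every row that did not match an earlier condition. Conditions are checked in order, and each row gets the value of the first condition that is TRUE for it; a condition that evaluates to NA counts as no match.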
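Because case_when() returns the value of the first condition that is TRUE for each element, the order of the conditions matters. The sketch below uses a small hypothetical vector x (not part of this post's data) to show the effect of order, and what happens to a missing value.

```r
library(dplyr)

# Hypothetical values used only to illustrate evaluation order
x <- c(10, 30, 50, NA)

# Conditions are checked top to bottom; each element takes the value
# of the first condition that is TRUE for it
by_order <- case_when(x < 25 ~ 'low',
                      x < 40 ~ 'med',
                      TRUE   ~ 'high')
by_order   # "low" "med" "high" "high"

# Same conditions, reversed: x < 40 now also catches 10, so the
# x < 25 branch can never be reached
reversed <- case_when(x < 40 ~ 'med',
                      x < 25 ~ 'low',
                      TRUE   ~ 'high')
reversed   # "med" "med" "high" "high"
```

The NA element ends up as 'high' in both cases: x < 25 and x < 40 are both NA for it, case_when() does not treat NA as a match, and so the TRUE catch-all applies.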

The following examples show how to use this function in practice.


First, create a sample data frame:

df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 position = c('R1', 'R2', 'R3', 'R4', 'R5', NA),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

Now view the data frame:

df
  player position points assists
1      A       R1    102     405
2      B       R2    105     407
3      C       R3    219     527
4      D       R4    322     412
5      E       R5    232     211
6      F     <NA>     NA      NA

Example 1: Create New Variable from One Existing Variable

The following code demonstrates how to make a new variable named quality with values generated from the points column.


df %>%
  mutate(quality = case_when(points > 215 ~ 'high',
                             points > 120 ~ 'med',
                             TRUE ~ 'low'))
  player position points assists quality
1      A       R1    102     405     low
2      B       R2    105     407     low
3      C       R3    219     527    high
4      D       R4    322     412    high
5      E       R5    232     211    high
6      F     <NA>     NA      NA     low

The case_when() function assigned the values in the new column as follows:

If the value in the points column is greater than 215, the quality column value is “high.”

Otherwise, if the value in the points column is greater than 120, the quality column value is “med.” No player in this data scores between 121 and 215, so “med” does not appear in the output.

Otherwise, if the points value is less than or equal to 120 (or missing, like NA), the quality column value is “low.”

The thresholds are listed from highest to lowest because case_when() stops at the first matching condition. If points > 120 ~ 'high' came first, it would also catch every score above 215, and a later points > 215 branch could never assign “med.”
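When the new variable only bins a single numeric column, base R's cut() can do a similar job. A minimal sketch, assuming the same 120 and 215 thresholds used in Example 1 and a standalone copy of the points values:

```r
# Standalone copy of the points column from the example data frame
points <- c(102, 105, 219, 322, 232, NA)

# With the default right = TRUE, the bins are (-Inf, 120], (120, 215]
# and (215, Inf], i.e. "low", "med" and "high"
quality_cut <- cut(points,
                   breaks = c(-Inf, 120, 215, Inf),
                   labels = c('low', 'med', 'high'))

quality_cut
```

Two differences are worth knowing: cut() returns a factor whose levels keep the low/med/high order, and it leaves the missing value as NA instead of assigning a catch-all label. case_when() stays the more flexible choice once the conditions involve several columns, as in the next example.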

Example 2: Create New Variable from Multiple Variables

The following code demonstrates how to make a new variable named quality with values drawn from both the points and assists columns.

df %>%
  mutate(quality = case_when(points > 215 & assists > 10 ~ 'great',
                             points > 215 & assists > 5 ~ 'good',
                             TRUE ~ 'average' ))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA average
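Player F is labeled “average” because comparing NA with a number yields NA, and combining NA with TRUE (or with another NA) through & is still NA. case_when() only assigns a value where the condition is TRUE, so such rows fall through to the default. A small sketch of this three-valued logic, using hypothetical standalone vectors:

```r
library(dplyr)

# Hypothetical values: one complete player and one with missing stats
points  <- c(219, NA)
assists <- c(527, NA)

cond <- points > 215 & assists > 10
cond            # TRUE NA: the missing row is neither TRUE nor FALSE

# NA & FALSE is FALSE, because the result is false whatever the missing value is
na_and_false <- NA & FALSE

# The NA condition is not a match, so the second row falls through
quality <- case_when(cond ~ 'great',
                     TRUE ~ 'average')
quality         # "great" "average"
```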

You can also use is.na() to give missing values an explicit label, rather than letting them fall through to the TRUE default:


df %>%
  mutate(quality = case_when(is.na(points) ~ 'missing',
                             points > 215 & assists > 150 ~ 'great',
                             points > 215 & assists > 100 ~ 'good',
                             TRUE ~ 'average'))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA missing
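To check which labels were actually assigned, you can tally the new column with dplyr's count(). A minimal sketch that rebuilds the example data frame so it runs on its own, with the assists thresholds ordered strictest first (> 150 before > 100) so that the 'good' branch is reachable:

```r
library(dplyr)

# Same data as the example data frame in this post
df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 position = c('R1', 'R2', 'R3', 'R4', 'R5', NA),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

# Label each row, then count how many rows received each label
quality_counts <- df %>%
  mutate(quality = case_when(is.na(points) ~ 'missing',
                             points > 215 & assists > 150 ~ 'great',
                             points > 215 & assists > 100 ~ 'good',
                             TRUE ~ 'average')) %>%
  count(quality)

quality_counts
```

Here “good” is absent because no player scores above 215 with more than 100 but at most 150 assists, not because the branch is unreachable. A label that never shows up in the counts is a prompt to check whether an earlier condition is shadowing it.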


Copyright © 2023 Data Science Tutorials.
