Extract patterns in R
The str_extract() function from R’s stringr package can be used to extract matching patterns from strings.
The syntax for this function is as follows:
str_extract(string, pattern)
where:
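string: A character vector to search
pattern: The pattern to extract, interpreted as a regular expression by default. Only the first match in each string is returned.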
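stringr interprets pattern as a regular expression by default, so characters such as "." have a special meaning. The minimal sketch below contrasts that default with stringr’s fixed() helper, which matches the pattern literally. The url variable is only an illustration.

```r
library(stringr)
url <- "datascience.com"
# As a regular expression, "." matches any single character, so the first character is returned
str_extract(url, ".")
# [1] "d"
# fixed() treats the pattern as a literal string, so an actual period is returned
str_extract(url, fixed("."))
# [1] "."
```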
The examples below show how to use this function in practice.
Example 1: Take a String and Extract One Pattern
The R code below shows how to extract the word “for” from a string.
library(stringr)
# Define a string
string <- "datascience.com for data science articles"
# Extract "for" from the string
str_extract(string, "for")
# [1] "for"
The pattern “for” was successfully extracted from the string.
Note that we will simply get NA if we try to extract a pattern that isn’t present in the string.
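Here is a minimal sketch of that behaviour. The pattern “python” was chosen only because it does not occur in the string. The missing match comes back as a character NA, which is.na() can detect.

```r
library(stringr)
string <- "datascience.com for data science articles"
# "python" does not occur in the string, so no match is found
result <- str_extract(string, "python")
result
# [1] NA
# is.na() reports whether the match is missing
is.na(result)
# [1] TRUE
```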
Example 2: Take String Data and Extract Numeric Values
The following code uses the regex \d+ (written “\\d+” inside an R string) to extract only the numeric values from a string.
library(stringr)
# Define a string
string <- "There are 100 phones over there"
# Extract only the numeric values from the string
str_extract(string, "\\d+")
# [1] "100"
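Because str_extract() stops at the first match, a string containing several numbers yields only the first one. The sketch below uses a modified example string, assumed for illustration. It shows stringr’s str_extract_all(), which returns every match in a list with one element per input string. It also converts the character results with as.numeric().

```r
library(stringr)
string <- "There are 100 phones and 25 cases over there"
# str_extract() returns only the first run of digits
str_extract(string, "\\d+")
# [1] "100"
# str_extract_all() returns a list; [[1]] takes the matches for the single input string
all_numbers <- str_extract_all(string, "\\d+")[[1]]
all_numbers
# [1] "100" "25"
# The matches are character strings, so convert them if numbers are needed
as.numeric(all_numbers)
# [1] 100  25
```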
Example 3: Take Strings from a Vector and Extract Characters
The code below shows how to extract only the letters from each string in a vector, using the regex [a-z]+.
library(stringr)
# Define a vector of strings
strings <- c("3 phones", "3 battery", "7 pen")
# Extract only the letters from each string in the vector
str_extract(strings, "[a-z]+")
# [1] "phones"  "battery" "pen"
Note that only the letters from each string are returned; the leading digits and spaces are dropped.
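The class [a-z] matches lowercase letters only. With mixed-case input, uppercase letters are skipped, and a string with no lowercase letters returns NA. In the sketch below the capitalised vector is an assumed variation of the one above. It shows one way around this: widening the class to [A-Za-z].

```r
library(stringr)
strings <- c("3 Phones", "3 battery", "7 PEN")
# [a-z]+ skips the uppercase "P" and finds no lowercase letters in "7 PEN"
str_extract(strings, "[a-z]+")
# [1] "hones"   "battery" NA
# Including A-Z in the class matches letters of either case
str_extract(strings, "[A-Za-z]+")
# [1] "Phones"  "battery" "PEN"
```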