Checking Missing Values in R

Posted on April 27 / April 30 by Admin

In this post on checking missing values in R, we’ll undertake data wrangling, which is the pre-processing and preparation of data.

In fact, when you practice data science, it is often said that this step will consume more than 70% of your time. We’ll only look at a few of the most important commands to keep things as simple as possible.

You will devote a significant amount of time to reshaping your data in various ways, and some valuable R packages have been built for exactly that.

So, let’s take a look at pre-processing data with R. Of course, you’ll need to have R set up first.

Checking Missing Values in R

The first thing we’ll do is look at a command that checks for missing values. Missing values in R are called NAs, and the function that checks for them is is.na().

Its help page tells you that it is in the base package and that NA stands for “not available”, that is, a missing value. So you can check for missing values, and you’ll also be able to impute them or replace them with another value.

This is quite important when we get started, because some functions don’t accept missing data and will behave in unexpected ways.

So for instance, let’s start with an example dataset: a small vector that I build with the command c(). If I print this example and then compute, say, its mean, R tells me that the result of the computation is NA.

vec <- c(1, 2, 3, 4, NA)
vec
[1]  1  2  3  4 NA
mean(vec)
[1] NA

So, it does give me a result, but the result is the value “not available”. In general this can happen for two reasons.

One is that some strings have been mixed in with the actual numbers: c() then turns the whole vector into character strings, and mean() doesn’t know how to compute the mean of strings. The other is that the vector contains actual missing values, which is the case for the vector above, whose fifth element is NA.
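As a minimal sketch of the string case, assuming a hypothetical vector named mixed that is not part of the original example, the following shows how a single string changes the type of the whole vector and what mean() then returns:

```r
# Hypothetical vector (an assumption for illustration): one string among the numbers
mixed <- c(1, 2, "three", 4, NA)

# c() coerces every element to character, so the numbers become "1", "2", "4"
class(mixed)

# Only the real NA is flagged as missing; the strings are ordinary values
is.na(mixed)

# mean() cannot average strings: it issues a warning and returns NA
result <- suppressWarnings(mean(mixed))
result
```

The warning is suppressed here only to keep the output short; in real code it is usually worth reading, since it points at the type problem.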

Either way, the output is NA.

So if at some point in your function you do a manipulation that gives you an NA, it will propagate all the way down through the different results as you go along. It therefore pays to check first: if you have any missing values, is.na() is going to tell you.
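A short sketch of how one NA carries through later steps, reusing the vec example from above (the intermediate names total, average and doubled are made up for illustration):

```r
vec <- c(1, 2, 3, 4, NA)

# One missing element makes the whole sum missing
total <- sum(vec)

# Anything computed from that result is missing as well
average <- total / length(vec)

# Element-wise arithmetic keeps the NA in the same position
doubled <- vec * 2

total
average
doubled
```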

Where are the missing values?

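is.na() answers this element by element. A string mixed in with the numbers still counts as a value, so it is not flagged; in our five-element example, the first four elements are not missing and only the fifth is recognized as missing.

That’s because NA is a reserved constant of its own in R: it is written without quotes and represents the missing value, whereas "NA" between quotes is just an ordinary string.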
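To make the difference between the bare NA and a quoted string concrete, here is a small sketch (the vector x is a made-up example):

```r
# The bare constant NA is a missing value
is.na(NA)

# The quoted text "NA" is an ordinary two-letter string, not a missing value
is.na("NA")

# In a character vector only the unquoted NA is flagged
x <- c("a", "NA", NA)
flags <- is.na(x)
flags
```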

Furthermore, you must be cautious when importing data, because other software often encodes missing values with a placeholder number such as 9999. R will not recognize that as missing and will treat it as an ordinary value; you must re-code it to NA yourself.
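One possible way to do the re-coding, sketched on a hypothetical imported vector named raw in which 9999 is assumed to be the missing-value code:

```r
# Hypothetical imported values; 9999 is assumed to mean "missing"
raw <- c(12, 9999, 7, 9999, 3)

# Without re-coding, the placeholder distorts the result
mean(raw)

# Re-code the placeholder to a real missing value
clean <- raw
clean[clean == 9999] <- NA

is.na(clean)
mean(clean, na.rm = TRUE)
```

When the data come from a file, the na.strings argument of read.csv() can perform the same re-coding at import time.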

So, here’s the small example again, where we have encoded only numbers and then one NA, so there are no characters in it. If we call is.na() on it, we get:

is.na(vec)
[1] FALSE FALSE FALSE FALSE  TRUE

According to the output, everything is FALSE except for the TRUE in the fifth position.
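On longer vectors the logical output is hard to read by eye. A brief sketch of some base R helpers that summarize it, using the same vec (the names positions and n_missing are made up for illustration):

```r
vec <- c(1, 2, 3, 4, NA)

# Positions of the missing values
positions <- which(is.na(vec))

# Number of missing values (TRUE counts as 1)
n_missing <- sum(is.na(vec))

# Quick yes/no check for any missing value
has_missing <- anyNA(vec)

positions
n_missing
has_missing
```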

So if we call mean(vec), it returns NA, because mean() returns a missing value as soon as there is one missing value anywhere in the vector.

However, many R functions accept an na.rm argument for this situation. Setting na.rm to TRUE removes the missing values first and then computes the mean for you while disregarding them.

mean(vec, na.rm = TRUE)
[1] 2.5

This is also possible for the median function and many other functions that accept na.rm. You have to be careful, though: if you have some missing values, na.rm = TRUE silently takes them out, so the result is based on fewer observations than the vector contains.
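A small sketch of the same argument in median() and sum(), together with a count of how many values were actually used (n_used and n_dropped are made-up names):

```r
vec <- c(1, 2, 3, 4, NA)

# Other base functions take the same argument
med <- median(vec, na.rm = TRUE)
tot <- sum(vec, na.rm = TRUE)

# Keep track of how many observations the results rest on
n_used <- sum(!is.na(vec))
n_dropped <- sum(is.na(vec))

med
tot
n_used
n_dropped
```

Reporting n_used next to the statistic makes it visible when a result rests on only part of the data.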

We will discuss how to replace NA values in an upcoming post.
