
Data Science Tutorials

Checking Missing Values in R

Posted on April 27 (updated April 30) by Jim

In this post on checking missing values in R, we'll look at data wrangling: the pre-processing and preparation of data.

In fact, data wrangling can consume more than 70% of your time when you practice data science. We'll only look at a few of the most important commands to keep things as simple as possible.

You will still devote a significant amount of time to reshaping your data in various ways, and several valuable R packages have been built for exactly that. Today, we'll start with the base R tools for missing values.

So, let's look at pre-processing data with R. First, of course, make sure you have R set up.

Checking Missing Values in R

First, we'll look at a command that checks for missing values. In R, missing values are represented as NA, which stands for "not available", and the function that detects them is is.na().

Its help page (?is.na) shows that it belongs to the base package. Once you can check for missing values, you can also impute them or replace them with another value.

This matters from the very start, because some functions don't accept missing data, and others behave in surprising ways when it is present.
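Here is a small sketch of two of those surprises (the variable names are illustrative, not from a package): comparing a vector with == NA does not find the missing values, and an if() statement whose condition is NA stops with an error. This is why R provides a dedicated function, is.na(), for the check.

```r
# Illustrative vector: four numbers and one NA
vec <- c(1, 2, 3, 4, NA)

# == NA does not detect missing values: any comparison
# with an unknown value is itself unknown (NA)
eq_check <- vec == NA
print(eq_check)   # NA NA NA NA NA

# is.na() is the dedicated test for missingness
na_check <- is.na(vec)
print(na_check)   # FALSE FALSE FALSE FALSE TRUE

# An NA condition makes if() fail; catch the error and keep its message
if_result <- tryCatch(
  if (NA > 1) "big" else "small",
  error = function(e) conditionMessage(e)
)
print(if_result)
```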

For instance, suppose we build a small example vector with c(), print it, and then compute its mean. R tells us that the result of this computation is NA:

vec <- c(1, 2, 3, 4, NA)   # numeric vector with one missing value
vec
[1]  1  2  3  4 NA
mean(vec)                  # a single NA makes the whole mean NA
[1] NA

So R does return a result, but that result is NA (not available). For this vector, the cause is the missing value in the fifth position.

Mixing strings in with numbers causes a separate problem: c() coerces every element to character, and mean() doesn't know how to compute the mean of strings, so it also returns NA (with a warning).
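The string problem can be reproduced with a hypothetical vector, `mixed`, that is not part of the original example. The sketch shows the coercion to character, the NA from mean(), and the fact that the string "four" is not counted as missing.

```r
# Hypothetical vector that mixes a string in with the numbers
mixed <- c(1, 2, 3, "four", NA)

# c() coerces every element to character, including the numbers
print(class(mixed))   # "character"

# mean() cannot average strings: it warns and returns NA
mixed_mean <- suppressWarnings(mean(mixed))
print(mixed_mean)     # NA

# The string "four" is not missing; only the real NA is
print(is.na(mixed))   # FALSE FALSE FALSE FALSE TRUE
```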


If a manipulation inside your function produces an NA at some point, that NA will propagate through all the subsequent results. So if there are any missing values, the output will tell you.
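A minimal sketch of that propagation, using a hypothetical vector `x` with a missing value in the second position: element-wise arithmetic keeps the NA in place, a running total carries it forward to every later element, and any summary of the result is NA.

```r
# Hypothetical vector with a missing value in the second position
x <- c(1, NA, 3, 4)

# Element-wise arithmetic keeps the NA in its position
doubled <- x * 2
print(doubled)        # 2 NA 6 8

# A running total carries the NA forward to every later element
running <- cumsum(x)
print(running)        # 1 NA NA NA

# A summary of a vector that contains NA is NA as well
total <- sum(doubled)
print(total)          # NA
```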

Where are the missing values?

is.na() answers this element by element: the first, second, third, and fourth elements aren't missing, but the fifth is recognized as missing.

That's because NA is a special value of its own: it is written without quotes, so R treats it as missing rather than as the two-letter string "NA".
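The difference between the string "NA" and the real NA value can be checked directly. This sketch uses a hypothetical character vector `chars`; it shows that is.na() ignores the quoted "NA", and one way to re-code that string into a genuine missing value.

```r
# Hypothetical character vector: the string "NA" versus the real NA value
chars <- c("a", "NA", NA)
before <- is.na(chars)
print(before)            # FALSE FALSE TRUE

# Re-code the literal string "NA" to a real missing value;
# which() skips the NA produced by comparing the real NA
chars[which(chars == "NA")] <- NA
after <- is.na(chars)
print(after)             # FALSE TRUE TRUE
```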

Be cautious when importing data, too: other software often encodes missing values with a placeholder such as 9999, which R will not recognize as missing. You must re-code such values to NA yourself.
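Here is a sketch of two ways to handle such a placeholder, assuming 9999 is the missing-value code (the data values and the inline CSV text are made up for illustration). You can re-code the value after loading, or declare it at import time through the na.strings argument of read.csv().

```r
# Hypothetical imported scores where 9999 is the missing-value code
raw <- c(12, 9999, 7, 15, 9999)
raw_mean <- mean(raw)
print(raw_mean)             # 4006.4, badly distorted by the codes

# Option 1: re-code the placeholder to NA after loading
raw[raw == 9999] <- NA
print(is.na(raw))           # FALSE TRUE FALSE FALSE TRUE

# Option 2: declare the code at import time with na.strings
df <- read.csv(text = "id,score\n1,12\n2,9999\n3,7",
               na.strings = c("NA", "9999"))
print(is.na(df$score))      # FALSE TRUE FALSE
```

Declaring the code at import time has the advantage that the column is never treated as valid data, even briefly.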

Back to our example vector: it holds four numbers and a single NA, so there are no characters in it. If we call is.na() on it:

is.na(vec)
[1] FALSE FALSE FALSE FALSE  TRUE

According to the output, every element is FALSE except the fifth, which is TRUE.
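Reading a long TRUE/FALSE vector by eye quickly gets tedious. A few base R functions summarize the is.na() result instead; the data frame `df` below is a hypothetical example added to show the column-wise version.

```r
vec <- c(1, 2, 3, 4, NA)

# Positions of the missing values
print(which(is.na(vec)))   # 5
# Number of missing values (each TRUE counts as 1)
print(sum(is.na(vec)))     # 1
# Quick yes/no check for any missing value
print(anyNA(vec))          # TRUE

# Hypothetical data frame: count NAs per column and flag complete rows
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
print(colSums(is.na(df)))  # a: 1, b: 1
print(complete.cases(df))  # TRUE FALSE FALSE
```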

That's why mean(vec) returns NA: it returns NA whenever the vector contains even one missing value.

However, many R functions offer a way around this. Set the argument na.rm to TRUE, and the function removes the missing values first, then computes the mean of the remaining values:

mean(vec, na.rm = TRUE)
[1] 2.5
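To make explicit what na.rm = TRUE does, this sketch compares it with dropping the missing values by hand, using !is.na(vec) as a logical index. For mean(), the two approaches give the same result.

```r
vec <- c(1, 2, 3, 4, NA)

# na.rm = TRUE drops the NA before averaging...
with_na_rm <- mean(vec, na.rm = TRUE)
# ...which matches averaging the non-missing values by hand
by_hand <- mean(vec[!is.na(vec)])
print(c(with_na_rm, by_hand))   # 2.5 2.5
```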

The same argument works for median() and many other functions. Be careful, though: with na.rm = TRUE, any missing values are silently taken out of the calculation.
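A short sketch of na.rm with a few other base R summaries, plus two habits that guard against the silent removal. The first is reporting how many values were dropped (the message text is just an illustration). The second is remembering the edge case where every value is missing, so nothing is left to average.

```r
vec <- c(1, 2, 3, 4, NA)

# The same na.rm argument works for other summaries
print(median(vec, na.rm = TRUE))  # 2.5
print(sum(vec, na.rm = TRUE))     # 10
print(max(vec, na.rm = TRUE))     # 4

# na.rm drops values silently, so report how many were ignored
n_missing <- sum(is.na(vec))
message(n_missing, " of ", length(vec), " values ignored")

# Edge case: if every value is missing, the mean of nothing is NaN
all_missing <- c(NA_real_, NA_real_)
print(mean(all_missing, na.rm = TRUE))  # NaN
```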

We will discuss how to replace NA values in an upcoming post.

Copyright © 2022 Data Science Tutorials.
