Checking Missing Values in R

Posted on April 27 (updated April 30) by Jim
In this tutorial on checking missing values in R, we’ll begin with data wrangling, which is the pre-processing and preparation of data.

In fact, when you practice data science, data wrangling will consume more than 70% of your time. We’ll only look at a few of the most important commands to keep things as simple as possible.

You will still devote a significant amount of time to reshaping your data in various ways, and some valuable R packages have been built for that. Today we start with functions from base R.

So, let’s get started with pre-processing data in R. Of course, you’ll need to have R set up first.

Checking Missing Values in R

The first thing we’ll look at is a command that checks for missing values. Missing values in R are called NAs, and the function that detects them is is.na().

Its help page tells you that it belongs to the base package and that NA stands for “Not Available”, R’s marker for missing values. You can check for NAs, and you can also impute them or replace them with another value.

This matters from the start because some functions don’t accept missing data, or behave in unexpected ways when they encounter it.

For instance, let’s build a small example vector with the command c(), print it, and then compute its mean. R tells us that the result of this computation is NA.

vec<-c(1,2,3,4,NA)
vec
[1]  1  2  3  4 NA
mean(vec)
[1] NA

So mean() does return a result, but that result is NA, “not available”. In general, this can happen for two reasons.

One is that strings have been mixed in with the numbers: R doesn’t know how to compute the mean of strings, so it returns NA with a warning. The other is that the data contain actual missing values, which is the case for vec, where the fifth element is NA.
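
The two causes can be sketched side by side. This is a minimal illustration with made-up vectors; the names with_strings and with_na are assumptions for this example, not part of the original post:

```r
# Cause 1: strings mixed in with numbers. c() coerces every element to
# character, and mean() cannot average character data: it warns and returns NA.
with_strings <- c(1, 2, "three", 4)
class(with_strings)                                   # "character"
mean_strings <- suppressWarnings(mean(with_strings))  # warning silenced here

# Cause 2: a genuine missing value in an otherwise numeric vector.
with_na <- c(1, 2, 3, 4, NA)
mean_na <- mean(with_na)

is.na(mean_strings)   # TRUE
is.na(mean_na)        # TRUE
```

Both results are NA, but only the first case raises a warning, so a silent NA usually points to missing values in the data.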

Either way, the output is NA.

If at some point in your function you do a manipulation that gives you an NA, it will propagate all the way down through the results that follow. The is.na() function tells you whether you have any missing values.
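
As a rough sketch of this propagation, reusing the same vec as above (the variable names total, scaled, average, and has_missing are illustrative only):

```r
vec <- c(1, 2, 3, 4, NA)

# Arithmetic with NA yields NA, so one missing value spreads to later results.
total   <- sum(vec)              # NA
scaled  <- vec * 10              # 10 20 30 40 NA
average <- total / length(vec)   # NA, because total is already NA

# anyNA() gives a single TRUE/FALSE answer for the whole vector.
has_missing <- anyNA(vec)        # TRUE
```

Checking with anyNA() or is.na() before a long computation makes it easier to see where an unexpected NA came from.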

Where are the missing values?

Applied to vec, is.na() checks each element in turn: the first, second, third, and fourth elements aren’t missing, but the fifth is recognized as missing.

That’s because NA is a special reserved constant in R. It is written without quotes, so NA is a missing value, whereas "NA" in quotes is an ordinary character string.
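
A small sketch of the difference between the NA constant and the string "NA" (the vector labels is a made-up example):

```r
is.na(NA)      # TRUE: the reserved missing-value constant
is.na("NA")    # FALSE: an ordinary two-letter string

# In a character vector, only the unquoted NA counts as missing.
labels <- c("a", "NA", NA)
flags  <- is.na(labels)   # FALSE FALSE TRUE
```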

Furthermore, you must be cautious when importing data, because other software often codes missing values with a placeholder such as 9999. R will not recognize that as missing; you must re-code it to NA yourself.
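
One possible way to do this re-coding is sketched below. The income vector and the small CSV text are invented for illustration, and the sketch assumes 9999 never occurs as a legitimate value in the data:

```r
# Hypothetical imported column where 9999 was used as a "missing" code.
income <- c(52000, 9999, 61000, 9999, 48000)
mean(income)                   # distorted: the 9999s are treated as real values

# Re-code the placeholder to NA so that R recognizes it as missing.
income[income == 9999] <- NA
is.na(income)                  # FALSE TRUE FALSE TRUE FALSE

# When reading a file, read.csv() can do the same through its na.strings argument.
csv_text <- "id,score\n1,10\n2,9999\n3,30"
scores <- read.csv(text = csv_text, na.strings = "9999")
is.na(scores$score)            # FALSE TRUE FALSE
```

Re-coding at import time with na.strings keeps the placeholder from ever entering a computation, but it applies to every column, so it is only safe when the code is unambiguous.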

Here is the output for vec, which holds only numbers and one NA, with no character strings. Calling is.na() on it gives:

is.na(vec)
[1] FALSE FALSE FALSE FALSE  TRUE

According to the output, every position is FALSE except the fifth, which is TRUE.
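
Because is.na() returns a logical vector, it can be combined with other base functions to locate and count the missing values. A minimal sketch, where missing is just an illustrative variable name:

```r
vec <- c(1, 2, 3, 4, NA)
missing <- is.na(vec)

which(missing)   # positions of the missing values: 5
sum(missing)     # number of missing values: 1 (TRUE counts as 1)
mean(missing)    # proportion of missing values: 0.2
```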

So mean(vec) returns NA: by default, mean() returns NA whenever the vector contains even one missing value.

However, many R functions accept an na.rm argument. Setting na.rm to TRUE removes the missing values before the computation, so mean() is computed over the remaining values only.

mean(vec,na.rm=TRUE)
[1] 2.5

The same is possible for the median function and many other functions. Be careful, though: with na.rm set to TRUE the missing values are dropped silently, so the result is based on fewer observations.
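
A short sketch of na.rm with a few other base functions, together with a count of how many values were actually used (all variable names here are illustrative):

```r
vec <- c(1, 2, 3, 4, NA)

median_val <- median(vec, na.rm = TRUE)   # 2.5
sum_val    <- sum(vec, na.rm = TRUE)      # 10
max_val    <- max(vec, na.rm = TRUE)      # 4

# Report how many values were kept and dropped, so the reduced sample size is visible.
n_used    <- sum(!is.na(vec))             # 4
n_dropped <- sum(is.na(vec))              # 1
```

Reporting n_dropped alongside the statistic is one simple way to avoid hiding how much data was excluded.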

We will discuss how to replace NA values in an upcoming post.
