Data Science Tutorials

Dealing With Missing Values in R

Posted on April 28 (updated April 30) by Admin

One of the issues in dealing with missing values in R is that a large data matrix in which a few columns have some missing values can be difficult to work with.

Many bivariate and multivariate analyses cannot be run on rows with missing values. As a result, we frequently want to substitute plausible values for the missing ones, which is known as data imputation.

Data imputation requires a package. We will use one called mice; if it is not installed yet, run the code below.

install.packages("mice")
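
If the script is run more than once, reinstalling every time is unnecessary. A minimal sketch, assuming you want the install step to be skipped when mice is already available (requireNamespace() is base R and only checks for the package, it does not attach it):

```r
# Install mice only if it is not already available on this machine
if (!requireNamespace("mice", quietly = TRUE)) {
  install.packages("mice")
}

# Attach the package for the rest of the session
library(mice)
```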

The package ships with some example data sets, so load it first:

library(mice)

We'll use the mammalsleep data as the example. You can read about it on its help page:

?mammalsleep

What does the data on mammal sleep tell us?

Each row is a mammal species. The variables are body weight (bw), brain weight (brw), slow-wave sleep (sws), paradoxical sleep (ps), total sleep (ts), maximum lifespan (mls), gestation time (gt), predation index (pi), sleep exposure index (sei), and overall danger index (odi).

Now that we have the data, we can examine it. The full data set is a little bigger than what would fit on the screen, so we print only the first rows with head().

head(mammalsleep)

There are 62 rows, so let's start at the beginning. You'll see the first few observations.

                    species       bw    brw sws  ps   ts  mls  gt pi sei odi
1          African elephant 6654.000 5712.0  NA  NA  3.3 38.6 645  3   5   3
2 African giant pouched rat    1.000    6.6 6.3 2.0  8.3  4.5  42  3   1   3
3                Arctic Fox    3.385   44.5  NA  NA 12.5 14.0  60  1   1   1
4    Arctic ground squirrel    0.920    5.7  NA  NA 16.5   NA  25  5   2   3
5            Asian elephant 2547.000 4603.0 2.1 1.8  3.9 69.0 624  3   5   4
6                    Baboon   10.550  179.5 9.1 0.7  9.8 27.0 180  4   4   4

The first column is the species, beginning with the African elephant, with the other mammals listed below it. Body weight varies enormously across them, and several of the sleep measurements are missing.

For example, slow-wave sleep has not been measured for the African elephant, probably because it is not practical to do so in the wild.

The missing measurements are instantly recognizable: they appear as NA.

dim() confirms the size of the data, 62 rows and 11 columns.

dim(mammalsleep)
[1] 62 11

The function that tells us how many rows contain missing values is called nic().

nic(mammalsleep)

It returns 20. The name nic() stands for number of incomplete cases.

Each row is tested: is there at least one NA in that row?

So 20 of the 62 cases are incomplete.

If we removed every row with a missing value, we would be left with 42 rows instead of 62.
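
The counting logic behind nic() can be sketched in base R. The data frame below is a hypothetical toy example, not mammalsleep, so that the numbers are easy to check by hand; complete.cases() and na.omit() are base R, and the comments note which mice function reports the same quantity.

```r
# Toy data frame (hypothetical values, not taken from mammalsleep)
toy <- data.frame(
  x = c(1, NA, 3, 4, 5),
  y = c(NA, NA, 2, 1, 0),
  z = c(10, 20, 30, NA, 50)
)

# TRUE for every row that contains at least one NA
row_has_na <- !complete.cases(toy)

n_incomplete <- sum(row_has_na)           # same quantity as mice::nic(toy)
n_complete   <- sum(complete.cases(toy))  # same quantity as mice::ncc(toy)

# Listwise deletion: keep only the rows without any NA
toy_complete <- na.omit(toy)

n_incomplete        # 3
n_complete          # 2
nrow(toy_complete)  # 2
```

Rows 1, 2, and 4 each contain an NA, so three of the five toy rows would be lost to listwise deletion.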

That discards almost a third of the data. Throwing away everything we know about those observations can damage the entire study, so instead we use what is known as imputation.

Before imputing, you need to look at where the missing data are. mice has a function that shows this.

md.pattern() displays the missing data pattern in mammalsleep.

md.pattern(mammalsleep)
   species bw brw pi sei odi ts mls gt ps sws
42       1  1   1  1   1   1  1   1  1  1   1  0
9        1  1   1  1   1   1  1   1  1  0   0  2
3        1  1   1  1   1   1  1   1  0  1   1  1
2        1  1   1  1   1   1  1   0  1  1   1  1
1        1  1   1  1   1   1  1   0  1  0   0  3
1        1  1   1  1   1   1  1   0  0  1   1  2
2        1  1   1  1   1   1  0   1  1  1   0  2
2        1  1   1  1   1   1  0   1  1  0   0  3
         0  0   0  0   0   0  4   4  4 12  14 38

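Each row of the output is one pattern, where 1 means observed and 0 means missing. The first column counts the rows with that pattern, the last column counts the missing variables in it, and the bottom row counts the missing values per variable (38 in total).

The first row shows 42 observations with no missing data, which are the complete cases.

Nine rows are missing both ps and sws, three rows are missing only gt, and two rows are missing only mls.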
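
The margins of the pattern table can be reproduced with a few base R calls. This is only a sketch on a hypothetical toy data frame (not mammalsleep), meant to show what md.pattern() is counting rather than how mice implements it.

```r
# Toy data frame (hypothetical values, not taken from mammalsleep)
toy <- data.frame(
  x = c(1, NA, 3, 4, 5),
  y = c(NA, NA, 2, 1, 0),
  z = c(10, 20, 30, NA, 50)
)

# Missing values per variable (the bottom row of md.pattern)
na_per_var <- colSums(is.na(toy))

# Missing values per row (the last column of md.pattern, before grouping)
na_per_row <- rowSums(is.na(toy))

# One string per row: 1 = observed, 0 = missing, in column order x, y, z
pattern <- apply(!is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# How many rows share each pattern (the first column of md.pattern)
pattern_counts <- table(pattern)

na_per_var
pattern_counts
```

Here the pattern "111" (everything observed) occurs twice, and each of the other three patterns occurs once.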

What exactly are we on the lookout for?

Why do we pay attention to these patterns?

When imputing data, you shouldn't have too many blocks of variables that are all missing simultaneously.

Here, no row is missing more than three variables: two rows lack ts, ps, and sws, which are the sleep variables, and one row lacks mls, ps, and sws.

So the patterns aren't overly blocky.

That's what you look for when examining what is missing. Large blocks suggest a systematic pattern, which may mean the data are missing not at random (MNAR), as opposed to missing at random (MAR) or missing completely at random (MCAR).

Checking this gives you a first sense of whether imputation is reasonable.

The actual imputation is done with the function mice(). It offers a variety of imputation methods, but a plain call with the defaults is enough to get started.

We store the result in imp, short for imputed data. mice(mammalsleep) runs an iterative procedure that, by default, fills in numeric variables by predictive mean matching, and it prints one line per iteration and imputation as it goes.

imp <- mice(mammalsleep)
iter imp variable
  1   1  sws*  ps*  ts*  mls*  gt*
  1   2  sws*  ps*  ts*  mls*  gt*
  1   3  sws*  ps*  ts*  mls*  gt*
  4   1  sws*  ps*  ts*  mls*  gt*
   .....................................................

For each missing value, it makes an educated guess at a reasonable replacement.
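
To get a filled-in data set out of the imputation, mice provides complete(). The sketch below makes two assumptions that are not in the call above: it drops the species column (a label with one value per row, which is not useful as a predictor), and it sets a seed so the random draws can be reproduced. m = 5 and maxit = 5 are the mice() defaults, written out here for clarity.

```r
library(mice)

# Assumption of this sketch: drop the species label before imputing
sleep <- mammalsleep[, -1]

# seed makes the run reproducible; printFlag = FALSE hides the iteration log
imp <- mice(sleep, m = 5, maxit = 5, seed = 123, printFlag = FALSE)

# Extract the first of the five completed data sets
completed_1 <- complete(imp, 1)

sum(is.na(sleep))        # missing cells before imputation
sum(is.na(completed_1))  # missing cells after imputation
```

Each of the five completed data sets is a different plausible version of the data, so in practice an analysis is run on all of them and the results are pooled rather than relying on one.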

If we look at head(imp), we can see what the function returns.

For sws, for example, the imputed values are listed once for each of the imputations, alongside the original data.

So the object holds both the original data, with its missing values, in $data and the call that produced it in $call.

It also holds the imputed values themselves.

Finally, it records the number of iterations that were run. We will catch up with some other interesting posts soon…
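
The components mentioned above can also be pulled out of the object one at a time. A minimal sketch, again assuming the species column is dropped and a seed is set (neither is part of the original call):

```r
library(mice)

# Assumptions of this sketch: species dropped, seed fixed, log hidden
imp <- mice(mammalsleep[, -1], seed = 123, printFlag = FALSE)

imp$call           # the call that produced the object
head(imp$data)     # the original data, NAs still in place
imp$method         # imputation method per variable ("" = nothing to impute)
head(imp$imp$sws)  # imputed sws values: one row per missing case,
                   # one column per imputation
imp$iteration      # number of iterations that were run
```

imp$imp$sws has 14 rows because sws is missing for 14 species, and 5 columns because five imputations are drawn by default.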
