Why does dplyr's filter drop NA values from a factor variable?













When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:



library(dplyr)
set.seed(919)
# Ten draws from 1, 2, 3, and NA, stored as a factor; the outer parentheses print the result
(dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = TRUE))))
# var1
# 1 <NA>
# 2 3
# 3 3
# 4 1
# 5 1
# 6 <NA>
# 7 2
# 8 2
# 9 <NA>
# 10 1

filter(dat, var1 != 1)
# var1
# 1 3
# 2 3
# 3 2
# 4 2


This does not seem ideal -- I only wanted to drop rows where var1 == 1.



It looks like this is occurring because any comparison with NA returns NA, which filter then drops. So, for example, filter(dat, !(var1 %in% 1)) produces the correct results. But is there a way to tell filter not to drop the NA values?
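The NA propagation described above can be reproduced in base R without dplyr. This is only a minimal sketch: the vector x is a made-up stand-in for dat$var1, not the data from the question.

```r
# Hypothetical stand-in for dat$var1: a factor with one missing value
x <- factor(c(1, 2, NA, 3))

# Comparing with != propagates the NA
neq <- x != 1
print(neq)     # FALSE TRUE NA TRUE

# %in% never returns NA, so its negation is TRUE for the missing value
not_in <- !(x %in% 1)
print(not_in)  # FALSE TRUE TRUE TRUE
```

A row filter that keeps only TRUE entries would therefore drop the third element under the first condition and keep it under the second.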






























  • 2
    @akrun For some reason I didn't get this notification :P. Well I thought that the OP already knows about this, as he mentioned filter(dat, !(var1 %in% 1)) which is similar, but I think this would be the only way to do it with dplyr::filter.
    – LyzandeR
    Oct 2 '15 at 14:31

  • 1
    I don't think there is a way to explicitly tell filter not to drop NA values, but in general, logical NA queries can be handled intuitively using the base %in% operator and its negation, defined as '%ni%' <- Negate('%in%'). Thus, you could use filter(dat, var1 %ni% 1), which will work. See stackoverflow.com/a/11303276/4269699 and stackoverflow.com/a/27015823/4269699
    – wjchulme
    Oct 2 '15 at 16:08

  • 1
    Yes, I did know about both this approach and the approach that @LyzandeR used for an answer. It looks like filter doesn't have an explicit option for "keep NA", so these workarounds will be fine. Thanks for your help.
    – Jake Fisher
    Oct 2 '15 at 17:07

  • Argh this happened to me and I was going crazy trying to understand why I was losing so much data. Agreed this seems like it is not ideal...
    – Arthur Yip
    Jun 23 '17 at 1:51
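The negated %in% operator suggested in the comments can be sketched in base R as follows. The vector x is a hypothetical example, and the backticks are needed so that R parses the operator name as a symbol.

```r
# Negated %in%, following the definition given in the comments
`%ni%` <- Negate(`%in%`)

# Hypothetical values with one NA
x <- c(1, 2, NA, 3)

# TRUE for every element that is not 1, including the NA
res <- x %ni% 1
print(res)  # FALSE TRUE TRUE TRUE
```

One caveat: %in% treats NA as a value that can be matched, so NA %in% NA is TRUE and x %ni% NA would be FALSE for the missing entry.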















r dplyr subset na














asked Oct 2 '15 at 13:45 by Jake Fisher; edited 6 hours ago by zx8754








1 Answer



accepted










You could add an explicit is.na() check:

filter(dat, var1 != 1 | is.na(var1))
# var1
# 1 <NA>
# 2 3
# 3 3
# 4 <NA>
# 5 2
# 6 2
# 7 <NA>

With the is.na(var1) condition added, filter keeps the NA rows.

For completeness, dropping NAs is the intended behavior of filter, as the following test shows:

test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  # x == 1 is NA for the first row, so only the two rows where x is 1 remain
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})

This test is taken from dplyr's tests for filter on GitHub.
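If the is.na() check is needed in several places, it could be wrapped in a small helper. The sketch below uses base R only; keep_na is a hypothetical name and not part of dplyr, and dat is typed in from the printed output in the question rather than regenerated with sample().

```r
# Hypothetical helper: treat an NA condition as "keep this row"
keep_na <- function(cond) cond | is.na(cond)

# Data copied from the printed output in the question
dat <- data.frame(var1 = factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1)))

# Base R equivalent of filter(dat, var1 != 1 | is.na(var1))
res <- dat[keep_na(dat$var1 != 1), , drop = FALSE]
print(res)
```

With dplyr loaded, the same helper should also work inside filter, as in filter(dat, keep_na(var1 != 1)).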

























  • 2
    Venturing a bit into opinion-based territory, do you have an idea why this is the chosen approach? This behavior was unexpected to me (I got bitten by it today).
    – Heisenberg
    Dec 9 '16 at 22:08

  • 1
    @Heisenberg I assume that, in Hadley's view, most people would not want to get any NAs back when filtering. But that is a question for the developer/maintainer, i.e. Hadley.
    – LyzandeR
    Dec 10 '16 at 18:35











answered Oct 2 '15 at 13:58 by LyzandeR; edited Oct 2 '15 at 23:24







