Why does dplyr's filter drop NA values from a factor variable?











When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:



library(dplyr)
set.seed(919)
(dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = TRUE))))
#    var1
# 1  <NA>
# 2     3
# 3     3
# 4     1
# 5     1
# 6  <NA>
# 7     2
# 8     2
# 9  <NA>
# 10    1

filter(dat, var1 != 1)
#   var1
# 1    3
# 2    3
# 3    2
# 4    2


This does not seem ideal -- I only wanted to drop rows where var1 == 1.



It looks like this is occurring because any comparison with NA returns NA, which filter then drops. So, for example, filter(dat, !(var1 %in% 1)) produces the correct results. But is there a way to tell filter not to drop the NA values?
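
To make the NA behaviour described above concrete, here is a minimal base R sketch (no dplyr required). The vector `x` is made up for illustration and is not the data from the question. It shows that `!=` propagates NA, while `%in%` only ever returns TRUE or FALSE:

```r
# Hypothetical example vector; not the data from the question.
x <- factor(c(1, 2, NA, 3))

# A comparison involving NA yields NA, so the third element is
# neither TRUE nor FALSE.
cmp <- x != 1
print(cmp)     # FALSE  TRUE    NA  TRUE

# %in% is built on match(), so the result contains no NA and
# negating it is safe for missing values.
not_in <- !(x %in% 1)
print(not_in)  # FALSE  TRUE  TRUE  TRUE
```

Any row whose condition evaluates to NA is treated by filter like a FALSE row, which is why the `!=` version loses the missing values and the `%in%` version does not.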










  • 2




    @akrun For some reason I didn't get this notification :P. Well I thought that the OP already knows about this, as he mentioned filter(dat, !(var1 %in% 1)) which is similar, but I think this would be the only way to do it with dplyr::filter.
    – LyzandeR
    Oct 2 '15 at 14:31






  • 1




    I don't think there is a way to explicitly tell filter not to drop NA values, but in general, logical queries involving NA can be handled intuitively using the base %in% operator and its negation, defined as `%ni%` <- Negate('%in%'). Thus, you could use filter(dat, var1 %ni% 1), which will work. See stackoverflow.com/a/11303276/4269699 and stackoverflow.com/a/27015823/4269699
    – wjchulme
    Oct 2 '15 at 16:08








  • 1




    Yes, I did know about both this approach and the approach that @LyzandeR used for an answer. It looks like filter doesn't have an explicit option for "keep NA", so these workarounds will be fine. Thanks for your help.
    – Jake Fisher
    Oct 2 '15 at 17:07










  • Argh this happened to me and I was going crazy trying to understand why I was losing so much data. Agreed this seems like it is not ideal...
    – Arthur Yip
    Jun 23 '17 at 1:51
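
The `%ni%` operator suggested in the comments can be sketched in base R as follows. The vector `x` is a made-up example, and the subsetting step uses plain `[` rather than dplyr so the snippet runs without extra packages:

```r
# "Not in" operator, defined as in the comment above.
`%ni%` <- Negate(`%in%`)

# Hypothetical example vector with one missing value.
x <- factor(c(1, 2, NA, 3))

# The mask contains no NA, so the missing element counts as "not 1".
keep <- x %ni% 1
print(keep)  # FALSE  TRUE  TRUE  TRUE

# Subsetting with the mask drops only the value 1 and keeps the NA.
kept <- x[keep]
print(kept)
```

With dplyr loaded, the same operator would be used as in the comment: filter(dat, var1 %ni% 1).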















r dplyr subset na














edited 6 hours ago by zx8754
asked Oct 2 '15 at 13:45 by Jake Fisher








1 Answer



accepted










You could use this:



filter(dat, var1 != 1 | is.na(var1))
#   var1
# 1 <NA>
# 2    3
# 3    3
# 4 <NA>
# 5    2
# 6    2
# 7 <NA>


With the is.na(var1) condition added, filter no longer drops the NA rows.



Also, for completeness: dropping NAs is the intended behavior of filter, as you can see from the following test:



test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})


The test above was taken from dplyr's own tests for filter on GitHub.
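
As a rough side-by-side sketch of the options discussed in this thread, the snippet below compares the default behaviour with the two workarounds. The data frame is typed out by hand to match the values printed in the question (an assumption made so the example does not depend on sample(), whose output can differ between R versions); the names `dropped`, `kept_a` and `kept_b` are illustrative only:

```r
library(dplyr)

# Values copied by hand from the printout in the question.
dat <- data.frame(var1 = factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1)))

# Default: rows where the condition is NA are dropped along with FALSE rows.
dropped <- filter(dat, var1 != 1)

# Workaround 1: explicitly keep rows where var1 is missing.
kept_a <- filter(dat, var1 != 1 | is.na(var1))

# Workaround 2: %in% does not return NA, so its negation keeps the NA rows.
kept_b <- filter(dat, !(var1 %in% 1))

nrow(dropped)  # 4
nrow(kept_a)   # 7
nrow(kept_b)   # 7
```

Both workarounds keep the three NA rows and remove only the rows where var1 is 1.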






  • 2




    Venturing a bit into opinion-based territory, do you have an idea why this is the chosen approach? This behavior was unexpected to me (I got bitten by it today).
    – Heisenberg
    Dec 9 '16 at 22:08






  • 1




    @Heisenberg I assume that, according to Hadley, most people would prefer not to get any NAs back when filtering. But that is really a question for the developer/maintainer, i.e. Hadley.
    – LyzandeR
    Dec 10 '16 at 18:35











edited Oct 2 '15 at 23:24
answered Oct 2 '15 at 13:58 by LyzandeR







