Why does dplyr's filter drop NA values from a factor variable?
When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:
library(dplyr)
set.seed(919)  # R >= 3.6.0 changed sample()'s default algorithm, so this seed may now draw different values
(dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = TRUE))))
#    var1
# 1  <NA>
# 2     3
# 3     3
# 4     1
# 5     1
# 6  <NA>
# 7     2
# 8     2
# 9  <NA>
# 10    1
filter(dat, var1 != 1)
#   var1
# 1    3
# 2    3
# 3    2
# 4    2
This does not seem ideal: I only wanted to drop the rows where var1 == 1.
It looks like this happens because any comparison with NA returns NA, which filter then drops. For example, filter(dat, !(var1 %in% 1)) produces the correct result. But is there a way to tell filter not to drop the NA values?
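The NA propagation described above can be checked with base R alone. This sketch uses a fixed, hypothetical vector copied from the printed data (so it does not depend on set.seed()) and shows that != yields NA for the missing entries, while !(... %in% ...) yields TRUE. The which() calls only illustrate "keep the positions where the condition is TRUE"; they are an assumption about the semantics, not dplyr's actual implementation.

```r
# Hypothetical stand-in for dat$var1, copied from the printed output above
var1 <- factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1))

# != propagates NA: the missing entries give NA rather than TRUE
cmp <- var1 != 1
print(cmp)

# %in% never returns NA (a missing value simply does not match 1),
# so its negation is TRUE for the missing entries
keep <- !(var1 %in% 1)
print(keep)

# Keeping only the positions where the condition is TRUE silently
# discards the NA positions of cmp, but not those of keep
rows_cmp  <- which(cmp)   # which() ignores NA
rows_keep <- which(keep)
print(length(rows_cmp))   # 4: the three NA rows are lost
print(length(rows_keep))  # 7: the NA rows are kept
```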
r dplyr subset na
edited 6 hours ago by zx8754
asked Oct 2 '15 at 13:45 by Jake Fisher
@akrun For some reason I didn't get this notification :P. Well, I thought that the OP already knew about this, as he mentioned filter(dat, !(var1 %in% 1)), which is similar, but I think this would be the only way to do it with dplyr::filter.
– LyzandeR, Oct 2 '15 at 14:31
I don't think there is a way to explicitly tell filter not to drop NA values, but in general, logical NA queries can be handled intuitively using the base %in% operator and its negation, defined as %ni% <- Negate('%in%'). Thus, you could use filter(dat, var1 %ni% 1), which will work. See stackoverflow.com/a/11303276/4269699 and stackoverflow.com/a/27015823/4269699
– wjchulme, Oct 2 '15 at 16:08
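A minimal sketch of the %ni% helper from the comment above. It subsets with base R's [ so that it runs without dplyr installed; with dplyr loaded, filter(dat, var1 %ni% 1) should keep the same rows, because the condition never contains NA. The data are hypothetical stand-in values copied from the question's printed output.

```r
# "not in": the logical negation of %in%, defined as in the comment
`%ni%` <- Negate('%in%')

# Hypothetical stand-in for the question's data frame
dat <- data.frame(var1 = factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1)))

# %in% returns FALSE (never NA) for missing values, so %ni% returns TRUE
# and the NA rows survive; only the rows equal to 1 are dropped
res <- dat[dat$var1 %ni% 1, , drop = FALSE]
print(res)
```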
Yes, I did know about both this approach and the approach that @LyzandeR used for an answer. It looks like filter doesn't have an explicit option for "keep NA", so these workarounds will be fine. Thanks for your help.
– Jake Fisher, Oct 2 '15 at 17:07
Argh, this happened to me and I was going crazy trying to understand why I was losing so much data. Agreed, this does not seem ideal...
– Arthur Yip, Jun 23 '17 at 1:51
1 Answer
Accepted answer
You could use this:
filter(dat, var1 != 1 | is.na(var1))
#   var1
# 1 <NA>
# 2    3
# 3    3
# 4 <NA>
# 5    2
# 6    2
# 7 <NA>
This keeps the NA values.
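Why this works: in R's three-valued logic NA | TRUE is TRUE, so adding | is.na(var1) turns every NA produced by != into TRUE, and the combined condition never contains NA. A base-R check using the same hypothetical stand-in values as the printed data:

```r
var1 <- factor(c(NA, 3, 3, 1, 1, NA, 2, 2, NA, 1))

# != binds tighter than |, so this is (var1 != 1) | is.na(var1).
# NA | TRUE is TRUE and FALSE | FALSE is FALSE, so no NA remains
cond <- var1 != 1 | is.na(var1)
print(cond)

# Positions kept: the NA rows plus every value other than 1
print(which(cond))
```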
Also, just for completeness, dropping NAs is the intended behavior of filter, as you can see from the following test:
library(testthat)
library(dplyr)
test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  # x == 1 evaluates to NA, TRUE, TRUE, FALSE, FALSE; only the TRUE rows are kept
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})
The test above was taken from the tests for filter in dplyr's GitHub repository.
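For comparison, base R is not uniform here either: subset() treats a missing condition as FALSE (the same as filter), whereas [ with an NA logical index returns a row filled with NA. The sketch below reuses the temp data frame from the test above and needs only base R.

```r
temp <- data.frame(
  i = 1:5,
  x = c(NA, 1L, 1L, 0L, 0L)
)

idx <- temp$x == 1   # NA TRUE TRUE FALSE FALSE

# subset() drops rows whose condition is NA, like dplyr::filter
via_subset <- subset(temp, x == 1)
print(nrow(via_subset))   # 2

# [ keeps a slot for the NA index and fills that row with NA values
via_bracket <- temp[idx, ]
print(nrow(via_bracket))  # 3
print(via_bracket)
```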
edited Oct 2 '15 at 23:24
answered Oct 2 '15 at 13:58 by LyzandeR
Venturing a bit into opinion-based territory, do you have an idea why this is the chosen approach? This behavior was unexpected to me (I got bitten by it today).
– Heisenberg, Dec 9 '16 at 22:08
@Heisenberg I assume that, in Hadley's view, most people would rather not get any NAs when filtering. But that is a question for the developer/maintainer, i.e. Hadley.
– LyzandeR, Dec 10 '16 at 18:35