How to subset data in R without losing NA rows?

I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA.

I am looking to subset my data-frame so that all Heights above a certain value are excluded from my analysis.

df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried adding an na.rm argument:

f1 <- function(x, na.rm = FALSE) {
  df2 <- subset(x, Height < 40)
}

f1(df1, na.rm = FALSE)

but this does not seem to do anything; the rows with NA still end up disappearing from my data-frame. Is there a way of subsetting my data as such, without losing the NA rows?

edited Nov 6 '16 at 5:45 by 李哲源

asked Nov 6 '16 at 5:02 by Ryan Rothman

Alternatively, we can use subset(df1, Height < 40 | is.na(Height))

– Zach
Nov 6 '16 at 5:07

For completeness' sake, a similar option from the dplyr package is filter(df1, Height < 40 | is.na(Height))

– Simon Jackson
Nov 6 '16 at 5:57

Tags: r, dataframe, subset, na

2 Answers

If we decide to use the subset function, we need to watch out for this note in its documentation (?subset):

For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.

So rows where the condition evaluates to NA are dropped, and only rows where it is TRUE are retained.

If you want to keep the NA cases, add a logical "or" condition to tell R not to drop them:

subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`

Don't use the bare comparison as a row index (explained below):

df2 <- df1[df1$Height < 40, ]

Example

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

subset(df1, Height < 40 | is.na(Height))
#  Height y
#1     NA 1
#2      2 2
#3      4 3
#4     NA 4

df1[df1$Height < 40, ]
#     Height  y
#NA       NA NA
#2         2  2
#3         4  3
#NA.1     NA NA

The latter fails because indexing by NA gives NA. Consider this simple example with a vector:

x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA  2 NA

We need to somehow replace those NA with TRUE. The most straightforward way is to add another "or" condition, is.na(ind):

x[ind | is.na(ind)]
# [1] 1 2 3

This is exactly what happens in your situation. If Height contains NA, the logical operation Height < 40 yields a mix of TRUE / FALSE / NA, so we need to replace NA with TRUE as above.
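
To make the reasoning above concrete, here is a minimal sketch that builds the logical index step by step for the same toy data. The intermediate names cond and keep are only illustrative; they are not part of the original answer.

```r
# Same toy data as in the example above
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

# Comparing NA with a number yields NA, not FALSE
cond <- df1$Height < 40
# cond is: NA TRUE TRUE NA FALSE FALSE

# NA | TRUE evaluates to TRUE, so the "or" turns every NA into TRUE
keep <- cond | is.na(df1$Height)
# keep is: TRUE TRUE TRUE TRUE FALSE FALSE

# With no NA left in the index, row indexing behaves as expected
df2 <- df1[keep, ]
```

Because keep no longer contains NA, df2 holds the two NA rows plus the two rows with Height below 40, and the y column is left intact.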

edited Nov 6 '16 at 5:39, answered Nov 6 '16 at 5:05 by 李哲源

You could also do:

df2 <- df1[df1$Height < 40 | is.na(df1$Height), ]
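
The na.rm argument in the question had no effect because the function body never used it. As a rough sketch, assuming you want a switch of that kind, the bracket-indexing approach can be wrapped in a small helper. The function name subset_below and its parameters column, limit and keep_na are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: keep rows where `column` is below `limit`.
# `keep_na` decides whether rows with NA in `column` are kept or dropped.
subset_below <- function(x, column, limit, keep_na = TRUE) {
  values <- x[[column]]
  keep <- values < limit                # NA wherever `values` is NA
  if (keep_na) {
    keep <- keep | is.na(values)        # NA | TRUE  -> TRUE  (row kept)
  } else {
    keep <- keep & !is.na(values)       # NA & FALSE -> FALSE (row dropped)
  }
  x[keep, , drop = FALSE]
}

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

with_na    <- subset_below(df1, "Height", 40)                   # rows 1 to 4
without_na <- subset_below(df1, "Height", 40, keep_na = FALSE)  # rows 2 and 3
```

Either branch leaves the index free of NA, so the result never contains the all-NA rows that the bare comparison produces.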

answered Apr 20 '17 at 14:00 by dede
