How to subset data in R without losing NA rows?





I have a data frame in R. One column, "Height", contains a few NA values.

I want to subset the data frame so that all Heights above a certain value are excluded from my analysis.

df2 <- subset(df1, Height < 40)

However, whenever I do this, R also removes every row where Height is NA. I do not want this. I have tried adding an na.rm argument:

f1 <- function(x, na.rm = FALSE) {
  df2 <- subset(x, Height < 40)
}
f1(df1, na.rm = FALSE)

but this does not seem to do anything; the rows with NA still disappear from my data frame. Is there a way to subset my data like this without losing the NA rows?







r dataframe subset na






edited Nov 6 '16 at 5:45 by 李哲源
asked Nov 6 '16 at 5:02 by Ryan Rothman













  • Alternatively, we can use subset(df1, Height < 40 | is.na(Height))

    – Zach
    Nov 6 '16 at 5:07

  • For completeness' sake, a similar option from the dplyr package is filter(df1, Height < 40 | is.na(Height))

    – Simon Jackson
    Nov 6 '16 at 5:57






















2 Answers
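If you use the subset function, watch out for how it handles NA. From its documentation:

For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.

So an NA in the condition is treated as FALSE: only rows where the condition is TRUE are retained, and rows where Height is NA are dropped.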
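To see the quoted rule at work, here is a minimal sketch on a plain vector. The data are made up for illustration, and `height`, `cond`, `kept_by_subset` and `kept_by_hand` are names introduced only for this example.

```r
# Illustrative data (made up for this sketch)
height <- c(NA, 2, 4, NA, 50, 60)

# The raw condition is a mix of TRUE, FALSE and NA
cond <- height < 40

# subset() on a plain vector
kept_by_subset <- subset(height, height < 40)

# The documented equivalent, written out by hand
kept_by_hand <- height[cond & !is.na(cond)]

print(cond)
print(kept_by_subset)
print(identical(kept_by_subset, kept_by_hand))
```

Both results contain only the elements for which the condition is TRUE; the NA entries are gone in each case.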



If you want to keep the NA cases, add a logical "or" condition that tells R not to drop them:

subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`

Do not index with the bare condition (the reason is explained below):

df2 <- df1[df1$Height < 40, ]


Example

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

subset(df1, Height < 40 | is.na(Height))

#   Height y
# 1     NA 1
# 2      2 2
# 3      4 3
# 4     NA 4

df1[df1$Height < 40, ]

#      Height  y
# NA       NA NA
# 2         2  2
# 3         4  3
# NA.1     NA NA


The latter fails because indexing by NA gives NA. Consider this simple example with a vector:

x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA  2 NA

We need to replace those NA with TRUE. The most straightforward way is to add another "or" condition, is.na(ind):

x[ind | is.na(ind)]
# [1] 1 2 3
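The "or" condition is not the only way to turn NA into TRUE. As a small sketch using base R only, `replace()` can overwrite the NA entries in a copy of the index; `keep1` and `keep2` are names introduced here for illustration.

```r
x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)

# Option 1: the "or" condition
keep1 <- ind | is.na(ind)

# Option 2: overwrite the NA entries in a copy of the index
keep2 <- replace(ind, is.na(ind), TRUE)

# Both indices are free of NA and select the same elements
print(anyNA(keep1))
print(identical(keep1, keep2))
print(x[keep2])
```

Which form to use is mostly a matter of taste; the "or" condition has the advantage of fitting directly inside a subset() call.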


This is exactly what happens in your situation. If Height contains NA, the logical operation Height < 40 yields a mix of TRUE / FALSE / NA, so we need to replace NA with TRUE as above.
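In the question's f1, the na.rm argument is accepted but never used inside the function body, so it cannot change the result. If a switch of that kind is wanted, one possible sketch is a small wrapper that builds the row index explicitly. The function name `filter_height` and its `limit` and `keep_na` arguments are hypothetical, not part of base R.

```r
# Hypothetical helper: filter_height(), limit and keep_na are
# illustrative names, not part of base R.
filter_height <- function(df, limit, keep_na = TRUE) {
  below <- df$Height < limit        # TRUE / FALSE / NA
  if (keep_na) {
    keep <- below | is.na(below)    # NA becomes TRUE: the row is kept
  } else {
    keep <- below & !is.na(below)   # NA becomes FALSE: the row is dropped
  }
  df[keep, , drop = FALSE]
}

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

with_na <- filter_height(df1, 40)
without_na <- filter_height(df1, 40, keep_na = FALSE)

print(with_na)
print(without_na)
```

Because the index never contains NA in either branch, the bracket indexing cannot produce the all-NA rows shown in the example above.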







edited Nov 6 '16 at 5:39
answered Nov 6 '16 at 5:05 by 李哲源

























You could also do:

df2 <- df1[df1$Height < 40 | is.na(df1$Height), ]
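As a quick sanity check, the bracket form can be compared with the subset() form on a small data frame. This is only a sketch: the data frame is made up for illustration, and `by_bracket` and `by_subset` are names introduced here.

```r
# Illustrative data (made up for this sketch)
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

# Bracket indexing with the NA-safe condition
by_bracket <- df1[df1$Height < 40 | is.na(df1$Height), ]

# The same condition passed to subset()
by_subset <- subset(df1, Height < 40 | is.na(Height))

# Compare the two results column by column
print(identical(by_bracket$Height, by_subset$Height))
print(identical(by_bracket$y, by_subset$y))
```

The bracket form needs the `df1$` prefix on every column reference, while subset() evaluates the condition inside the data frame.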






answered Apr 20 '17 at 14:00 by dede





























