Manipulating data into histogram-like bins

I would like to reshape my data for some code that I am working on. Below are the first 50 observations in their current format: each individual fish has its own row with the observation number, species, length (mm), weight (kg), and mesh size of the net it was caught in (inches).

fish_data <- read.table(header = TRUE,
text = "Index Species Length Weight mesh
1 SVCP 450 1.26 4
2 SVCP 584 2.24 3
3 SVCP 586 2.46 3
6 SVCP 590 2.4 3
7 SVCP 590 2.04 3
8 SVCP 594 2.62 3
9 SVCP 595 2.24 3
10 SVCP 595 2.04 3
11 SVCP 596 2.46 3
12 SVCP 603 2.6 3
13 SVCP 603 2.44 3
14 SVCP 604 2.68 3
15 SVCP 604 2.48 3
16 SVCP 606 2.06 3
17 SVCP 609 3.74 5
18 SVCP 609 2.44 3
20 SVCP 611 2.56 3
30 SVCP 618 2.52 3
31 SVCP 620 2.66 3
32 SVCP 620 2.66 3
33 SVCP 621 2.72 3
34 SVCP 625 2.8 3
36 SVCP 625 2.08 3
37 SVCP 626 2.74 3
38 SVCP 627 2.09 3
39 SVCP 627 2.82 3
40 SVCP 628 2.8 3
41 SVCP 630 2.68 3
42 SVCP 630 2.82 3
43 SVCP 637 3 3
45 SVCP 639 2.54 3
47 SVCP 640 3.01 3
49 SVCP 643 3.36 3
50 SVCP 644 6.82 4.25")

I would like to change the format to something like the example below, where the first column is the mesh size of the net and the subsequent columns are the number of observations in each length bin (for example 101-105 mm, 106-110 mm, 111-115 mm, etc.). I will be using 10 mm length bins.

52.5  52  11   1   1   0   0   0   0
54.5 102  91  16   4   4   2   0   3
56.5 295 232 131  61  17  13   3   1
58.5 309 318 362 243  95  26   4   3
60.5 118 173 326 342 199 100  10  11
62.5  79  87 191 239 202 201  39  15
64.5  27  48 111 143 133 185  72  25
66.5  14  17  44  51  52 122  74  41
68.5   8   6  14  23  25  59  65  76
70.5   7   3   8  14  15  16  34  33
72.5   0   3   1   2   5   4   6  15

  • Please review how to share your data in a reproducible format
    – Conor Neilson
    Nov 20 '18 at 23:58

  • It is not clear how the second data table is related to the first data table? Please explain how rows in the second table are computed?
    – TeeKea
    Nov 21 '18 at 0:05

  • They are not related, it is an example of what I need to do. The rows are counts for a specific mesh size and the number of fish in a size bin. For example in the 1st row: the 1st value is the mesh size of the net (52.5 units of measure), the 2nd value (52) is the number of fish in a certain size bin caught in that net.
    – fishy_stats
    Nov 21 '18 at 0:12

  • hey fishy welcome to stack. next time you post a question use that read.table() pattern to share data
    – Nate
    Nov 21 '18 at 0:21

  • hist(..., plot = FALSE) will put your data into histogram bins. Specify breaks = c(...) for your bin intervals
    – Umaomamaomao
    Nov 21 '18 at 0:23
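
The hist(..., plot = FALSE) suggestion in the last comment could look roughly like the sketch below. It is only an illustration under some assumptions: fish_sample is a hypothetical seven-row subset of the question's data, the bins are 10 mm wide and closed on the left, and the names breaks, counts_by_mesh, and bin_counts are made up for this example.

```r
# Hypothetical subset of the question's data, in the same layout as fish_data.
fish_sample <- read.table(header = TRUE, text = "
Index Species Length Weight mesh
1 SVCP 450 1.26 4
2 SVCP 584 2.24 3
3 SVCP 586 2.46 3
6 SVCP 590 2.4 3
17 SVCP 609 3.74 5
18 SVCP 609 2.44 3
50 SVCP 644 6.82 4.25
")

# Bin edges every 10 mm, wide enough to cover every observed length.
# The "+ 1" keeps a maximum that sits exactly on an edge in its own bin.
breaks <- seq(floor(min(fish_sample$Length) / 10) * 10,
              ceiling((max(fish_sample$Length) + 1) / 10) * 10,
              by = 10)

# One histogram per mesh size; keep only the counts.
# right = FALSE makes each bin include its lower edge.
counts_by_mesh <- sapply(
  split(fish_sample$Length, fish_sample$mesh),
  function(len) hist(len, breaks = breaks, right = FALSE, plot = FALSE)$counts
)

# sapply() puts bins in rows, so transpose to get one row per mesh size.
bin_counts <- t(counts_by_mesh)
colnames(bin_counts) <- head(breaks, -1)  # label each bin by its lower edge
bin_counts
```

Because every mesh size is binned with the same breaks vector, each row has the same columns, including bins in which nothing was caught.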
















Tags: r, histogram, bins






edited Nov 21 '18 at 0:20 by Nate
asked Nov 20 '18 at 23:51 by fishy_stats

1 Answer
Here's an approach using dplyr and tidyr from the tidyverse meta-package. First I create a new variable, Length_bin, to assign each fish to a bin, then count how many fish fall in each bin for each mesh size, then spread the result from long format to wide format. Note that this code uses 5 mm bins; replace both 5s with 10 to get 10 mm bins.

library(tidyverse)

fish_data %>%
  mutate(Length_bin = floor(Length / 5) * 5) %>%
  count(mesh, Length_bin) %>%
  spread(Length_bin, n, fill = 0)

# A tibble: 4 x 15
# mesh `450` `580` `585` `590` `595` `600` `605` `610` `615` `620` `625` `630` `635` `640`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 3 0 1 1 3 3 4 2 1 1 3 6 2 2 2
#2 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#3 4.25 0 0 0 0 0 0 0 0 0 0 0 0 0 1
#4 5 0 0 0 0 0 0 1 0 0 0 0 0 0 0
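
Because count() and spread() only create columns for bins that actually occur, empty bins (such as those between 450 and 580 in the output above) are missing from the result. If every bin should appear, one possible alternative is base R's cut() and table(). The following is a minimal sketch, assuming the 10 mm bins requested in the question; fish_sample is a hypothetical seven-row subset of the question's data, and bin_width, breaks, length_bin, and bin_counts are names made up for illustration.

```r
# Hypothetical subset of the question's data, in the same layout as fish_data.
fish_sample <- read.table(header = TRUE, text = "
Index Species Length Weight mesh
1 SVCP 450 1.26 4
2 SVCP 584 2.24 3
3 SVCP 586 2.46 3
6 SVCP 590 2.4 3
17 SVCP 609 3.74 5
18 SVCP 609 2.44 3
50 SVCP 644 6.82 4.25
")

bin_width <- 10

# Bin edges from the bin holding the shortest fish to just past the longest.
lower <- floor(min(fish_sample$Length) / bin_width) * bin_width
upper <- (floor(max(fish_sample$Length) / bin_width) + 1) * bin_width
breaks <- seq(lower, upper, by = bin_width)

# right = FALSE makes each bin [edge, edge + 10), which matches
# floor(Length / 10) * 10. Bins are labelled by their lower edge.
length_bin <- cut(fish_sample$Length, breaks = breaks, right = FALSE,
                  labels = as.character(head(breaks, -1)))

# table() keeps every factor level, so empty bins show up as columns of zeros.
bin_counts <- table(mesh = fish_sample$mesh, length_bin)
bin_counts
```

If a plain data frame is easier to pass on to later code, as.data.frame.matrix(bin_counts) converts the table while keeping one row per mesh size.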





answered Nov 21 '18 at 1:23 by Jon Spring