Frequencies of all subsequences of size 3 in a given 0-1 sequence?




















Given data



s<-c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)


I can count 1s and 0s with table or ftable



ftable(s,row.vars =1:1)


and the number of times each of 11, 01, 10 and 00 occurs in s with



table(s[-length(s)],s[-1])


What would be a clever way to count occurrences of 111, 011, ..., 100, 000? Ideally, I want a table of counts x like



   0 1
11 x x
01 x x
10 x x
00 x x


Is there a general way to count the occurrences of every possible subsequence of length k = 1, 2, 3, 4, ... in the data?










      r count sequence














      edited Nov 22 '18 at 5:38









      Cœur











      asked Feb 17 '10 at 7:22









      andrekos
























          2 Answers














          Well, it seems like you would first need to generate n-tuples from your vector. The following function should accomplish that:



          makeTuples <- function( x, n ){
              # Very inefficient way to loop... but what the heck
              tuples <- list()
              for( i in 1:n ){
                  # Element i holds the i-th member of every length-n window.
                  tuples[[i]] <- x[i:(length(x)-n+i)]
              }
              return(tuples)
          }
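As a possible alternative to the loop above, base R's embed() can build the same sliding windows in one call. This is only a sketch, not part of the original answer; the names k and windows are illustrative, and s is the vector from the question.

```r
# Data from the question.
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
k <- 3  # window length (illustrative name)

# embed() returns one row per window, but with the most recent value
# first, so the columns are reversed to restore the original order.
windows <- embed(s, k)[, k:1, drop = FALSE]

dim(windows)    # one row per window, one column per position
windows[1:3, ]  # first three windows
```

Each column of windows plays the role of one list element returned by makeTuples().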


          Then you could feed the results of makeTuples() to table() using do.call():



          do.call( table, makeTuples(s,3) )

          , ,  = 0

              0 1
            0 4 1
            1 3 1

          , ,  = 1

              0 1
            0 2 1
            1 0 1


          This works because makeTuples() returns a list of vectors, one per position in the tuple, and do.call() passes each of them to table() as a separate argument. The output isn't quite as nice as you wanted, but you could write a function to reformat, say:



          , ,  = 0

              0 1
            0 4 1
            1 3 1


          To:



               0 1
          00 4 1
          01 3 1


          It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table, creating row names and concatenating things together.
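One way to sidestep unwinding the array, sketched here under the assumption that k is at least 2, is to paste the first k-1 symbols of each window into a row label before calling table(). The variable names are illustrative, and this is not the approach taken in the update below.

```r
# Data from the question.
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
k <- 3  # window length; this sketch assumes k >= 2

# One row per window, columns in original order (embed() reverses them).
windows <- embed(s, k)[, k:1, drop = FALSE]

# Row label: the first k-1 symbols pasted together; column: the last symbol.
prefix <- apply(windows[, -k, drop = FALSE], 1, paste, collapse = "")
last   <- windows[, k]

counts <- table(prefix, last)
counts
```

A prefix that never occurs in the data gets no row here; converting prefix to a factor with explicit levels would be one way to keep such rows with zero counts.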




          Update




          So, I was just sitting in a stochastic processes class when I figured out a more or less straightforward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations (with repetition) of n selections from your population. The generation of permutations can be done with expand.grid(), but it needs a little sugar-coating:



          permute <- function( population, n ){
              permutations <- do.call( expand.grid, rep( list(population), n ) )
              permutations <- apply( permutations, 1, paste, collapse = '' )
              return( permutations )
          }


          The basic idea is to iterate over the list of permutations and count the number of tuples that match the given permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. Here's a function that takes a permutation of size n-1, a vector of tuples (each collapsed to a string), and the population the tuples were drawn from, and produces a named vector of match counts:



          countFrequency <- function(permutation,tuples,population){
              permutations <- paste( permutation, population, sep = '' )

              # Inner lapply applies the equality operator `==` to each
              # permutation and returns a list of TRUE/FALSE vectors.
              # Outer lapply sums the number of TRUE values in each vector.
              frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

              names( frequencies ) <- as.character( population )
              return( unlist(frequencies) )
          }
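The matching step can be seen in isolation on a toy input. The sketch below uses made-up tuple strings (not the question's data) and vapply() instead of the nested lapply() calls, purely as an illustration of the same counting idea.

```r
# Hypothetical toy data: windows already collapsed to strings.
tuples <- c("100", "000", "001", "010", "100")
prefix <- "10"           # a "permutation" of size n-1
population <- c(1, 0)    # symbols that can fill the last position

# Append each possible last symbol to the prefix: "101" and "100".
patterns <- paste0(prefix, population)

# Count how many tuples equal each pattern; vapply() enforces that
# every count is a single integer.
matches <- vapply(patterns, function(p) sum(tuples == p), integer(1))
names(matches) <- as.character(population)
matches
```

Here the prefix "10" is followed by 0 twice and by 1 never, which is what one row of the final frequency table records.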


          Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation is done using ldply() from Hadley Wickham's plyr package, as it does a nice job of preserving information such as which permutation corresponds to which row of the output:



          permutationFrequency <- function( vector, n, population = unique( vector ) ){

              # Split the vector into tuples.
              tuples <- makeTuples( vector, n )

              # Coerce and compact the tuples to a vector of strings.
              tuples <- do.call(cbind,tuples)
              tuples <- apply( tuples, 1, paste, collapse = '' )

              # Generate permutations of n-1 elements from the population.
              # Turn into a named list for ldply() to work its magic.
              permutations <- permute( population, n-1 )
              names( permutations ) <- permutations

              frequencies <- ldply( permutations, countFrequency,
                  tuples = tuples, population = population )

              return( frequencies )
          }


          And there you go:



          require( plyr )
          permutationFrequency( s, 2 )
            .id 1 0
          1   1 2 3
          2   0 2 7

          permutationFrequency( s, 3 )
            .id 1 0
          1  11 1 1
          2  01 1 1
          3  10 0 3
          4  00 2 4

          permutationFrequency( s, 4 )
            .id 1 0
          1 111 0 1
          2 011 1 0
          3 101 0 0
          4 001 1 1
          5 110 0 1
          6 010 0 1
          7 100 0 2
          8 000 2 2

          # Random input, so these counts vary from run to run.
          permutationFrequency( sample( -1:1, 10, replace = TRUE ), 2 )
            .id 1 -1 0
          1   1 1  2 0
          2  -1 0  1 2
          3   0 1  0 2


          Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler's Ruin today...
































          • Thanks very much for this, but the .id column appears to be missing in my output. Or am I missing something? The rest is exactly what I needed.

            – andrekos
            Feb 18 '10 at 1:30











          • Hmm, I noticed the .id column didn't show up if I gave an unnamed list or vector to ldply(). Did you include names(permutations) <- permutations?

            – Sharpie
            Feb 18 '10 at 1:40











          • Yes, to start with, I copy-pasted your code.

            – andrekos
            Feb 18 '10 at 8:26











          • Interesting. Could be a version thing-- I'm using R 2.10.1 and plyr 0.1.9

            – Sharpie
            Feb 18 '10 at 9:31











          • sessionInfo() reported that I was using plyr 0.1.3, and update.packages() did not help. But upgrading from R 2.9.2 did help :)

            – andrekos
            Feb 19 '10 at 0:11

































          One approach is to create a data frame of the subsequences and then use the table function:



          s<-c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
          n<-length(s)
          k<-3
          subseqs<-t(sapply(1:(n-k+1),function(i){s[i:(i+k-1)]}))
          colnames(subseqs)<-paste('Y',1:k,sep="")
          subseqs<-data.frame(subseqs)
          table(subseqs)


          This produces



          , , Y3 = 0

             Y2
          Y1  0 1
            0 4 1
            1 3 1

          , , Y3 = 1

             Y2
          Y1  0 1
            0 2 1
            1 0 1


          Call ftable either directly on the data frame or on the output of table to get a display similar to the one in your question:



          ftable(subseqs)
                Y3 0 1
          Y1 Y2
          0  0     4 2
             1     1 1
          1  0     3 0
             1     1 1




          share|improve this answer
























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f2278951%2ffrequencies-of-all-subsequences-of-size-3-in-a-given-0-1-sequence%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            5














            Well, it seems like you would first need to generate n-tuples from your vector. The following function should accomplish that:



            makeTuples <- function( x, n ){

            # Very inefficient way to loop... but what the heck
            tuples <- list()

            for( i in 1:n ){

            tuples[[i]] <- x[i:(length(x)-n+i)]

            }

            return(tuples)

            }


            Then you could feed the results of makeTuples() to table() using do.call():



            do.call( table, makeTuples(s,3) )

            , , = 0


            0 1
            0 4 1
            1 3 1

            , , = 1


            0 1
            0 2 1
            1 0 1


            This works because the makeTuples() function returns the tuples as a list of lists. The output isn't quite as nice as you wanted, but you could write a function to reformat, say:



            , ,  = 0


            0 1
            0 4 1
            1 3 1


            To:



                 0 1
            00 4 1
            01 3 1


            It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table, creating row names and concatenating things together.




            Update




            So, I was just sitting in a Stochastic processes class when I figured out a more or less straight-forward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations of n selections from your population. The generation of permutations can be done with expand.grid(), but it needs a little sugar-coating:



            permute <- function( population, n ){

            permutations <- do.call( expand.grid, rep( list(population), n ) )

            permutations <- apply( permutations, 1, paste, collapse = '' )

            return( permutations )

            }


            The basic idea is to iterate over the list of permutations and count the number of tuples that match the given permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. Here's a function that takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from and produces a named vector of match counts:



            countFrequency <- function(permutation,tuples,population){

            permutations <- paste( permutation, population, sep = '' )

            # Inner lapply applies the equality operator `==` to each
            # permutation and returns a list of TRUE/FALSE vectors.
            # Outer lapply sums the number of TRUE values in each vector.
            frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

            names( frequencies ) <- as.character( population )

            return( unlist(frequencies) )

            }


            Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation operation is done using ldply() from Hadley Wickham's plyr package as it does a nice job of preserving information such as which permutation corresponds to which row of output matches:



            permutationFrequency <- function( vector, n, population = unique( vector ) ){

            # Split the vector into tuples.
            tuples <- makeTuples( vector, n )

            # Coerce and compact the tuples to a vector of strings.
            tuples <- do.call(cbind,tuples)
            tuples <- apply( tuples, 1, paste, collapse = '' )

            # Generate permutations of n-1 elements from the population.
            # Turn into a named list for ldply() to work it's magic.
            permutations <- permute( population, n-1 )
            names( permutations ) <- permutations

            frequencies <- ldply( permutations, countFrequency,
            tuples = tuples, population = population )

            return( frequencies )

            }


            And there you go:



            require( plyr )
            permutationFrequency( s, 2 )
            .id 1 0
            1 1 2 3
            2 0 2 7

            permutationFrequency( s, 3 )
            .id 1 0
            1 11 1 1
            2 01 1 1
            3 10 0 3
            4 00 2 4

            permutationFrequency( s, 4 )
            .id 1 0
            1 111 0 1
            2 011 1 0
            3 101 0 0
            4 001 1 1
            5 110 0 1
            6 010 0 1
            7 100 0 2
            8 000 2 2

            permutationFrequency( sample( -1:1, 10, replace = T ), 2 )
            .id 1 -1 0
            1 1 1 2 0
            2 -1 0 1 2
            3 0 1 0 2


            Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler's Ruin today...






            share|improve this answer


























            • Thanks very much for this, but the .id column appears to be missing in my output. Or am I missing something? The rest is exactly what I needed.

              – andrekos
              Feb 18 '10 at 1:30











            • Hmm, I noticed the .id column didn't show up if I gave an unnamed list or vector to ldply(). Did you include names(permutations) <- permutations?

              – Sharpie
              Feb 18 '10 at 1:40











            • Yes, to start with, I copypasted your code.

              – andrekos
              Feb 18 '10 at 8:26











            • Interesting. Could be a version thing-- I'm using R 2.10.1 and plyr 0.1.9

              – Sharpie
              Feb 18 '10 at 9:31











            • SessionInfo() informed I used plyr 0.1.3, and update.packages() did not help. But upgrading from R 2.9.2 did help :)

              – andrekos
              Feb 19 '10 at 0:11
















            5














            Well, it seems like you would first need to generate n-tuples from your vector. The following function should accomplish that:



            makeTuples <- function( x, n ){

            # Very inefficient way to loop... but what the heck
            tuples <- list()

            for( i in 1:n ){

            tuples[[i]] <- x[i:(length(x)-n+i)]

            }

            return(tuples)

            }


            Then you could feed the results of makeTuples() to table() using do.call():



            do.call( table, makeTuples(s,3) )

            , , = 0


            0 1
            0 4 1
            1 3 1

            , , = 1


            0 1
            0 2 1
            1 0 1


            This works because the makeTuples() function returns the tuples as a list of lists. The output isn't quite as nice as you wanted, but you could write a function to reformat, say:



            , ,  = 0


            0 1
            0 4 1
            1 3 1


            To:



                 0 1
            00 4 1
            01 3 1


            It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table, creating row names and concatenating things together.




            Update




            So, I was just sitting in a Stochastic processes class when I figured out a more or less straight-forward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations of n selections from your population. The generation of permutations can be done with expand.grid(), but it needs a little sugar-coating:



            permute <- function( population, n ){

            permutations <- do.call( expand.grid, rep( list(population), n ) )

            permutations <- apply( permutations, 1, paste, collapse = '' )

            return( permutations )

            }


            The basic idea is to iterate over the list of permutations and count the number of tuples that match the given permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. Here's a function that takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from and produces a named vector of match counts:



            countFrequency <- function(permutation,tuples,population){

            permutations <- paste( permutation, population, sep = '' )

            # Inner lapply applies the equality operator `==` to each
            # permutation and returns a list of TRUE/FALSE vectors.
            # Outer lapply sums the number of TRUE values in each vector.
            frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

            names( frequencies ) <- as.character( population )

            return( unlist(frequencies) )

            }


            Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation operation is done using ldply() from Hadley Wickham's plyr package as it does a nice job of preserving information such as which permutation corresponds to which row of output matches:



            permutationFrequency <- function( vector, n, population = unique( vector ) ){

            # Split the vector into tuples.
            tuples <- makeTuples( vector, n )

            # Coerce and compact the tuples to a vector of strings.
            tuples <- do.call(cbind,tuples)
            tuples <- apply( tuples, 1, paste, collapse = '' )

            # Generate permutations of n-1 elements from the population.
            # Turn into a named list for ldply() to work it's magic.
            permutations <- permute( population, n-1 )
            names( permutations ) <- permutations

            frequencies <- ldply( permutations, countFrequency,
            tuples = tuples, population = population )

            return( frequencies )

            }


            And there you go:



            require( plyr )
            permutationFrequency( s, 2 )
            .id 1 0
            1 1 2 3
            2 0 2 7

            permutationFrequency( s, 3 )
            .id 1 0
            1 11 1 1
            2 01 1 1
            3 10 0 3
            4 00 2 4

            permutationFrequency( s, 4 )
            .id 1 0
            1 111 0 1
            2 011 1 0
            3 101 0 0
            4 001 1 1
            5 110 0 1
            6 010 0 1
            7 100 0 2
            8 000 2 2

            permutationFrequency( sample( -1:1, 10, replace = T ), 2 )
            .id 1 -1 0
            1 1 1 2 0
            2 -1 0 1 2
            3 0 1 0 2


            Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler's Ruin today...






            share|improve this answer


























            • Thanks very much for this, but the .id column appears to be missing in my output. Or am I missing something? The rest is exactly what I needed.

              – andrekos
              Feb 18 '10 at 1:30











            • Hmm, I noticed the .id column didn't show up if I gave an unnamed list or vector to ldply(). Did you include names(permutations) <- permutations?

              – Sharpie
              Feb 18 '10 at 1:40











            • Yes, to start with, I copypasted your code.

              – andrekos
              Feb 18 '10 at 8:26











            • Interesting. Could be a version thing-- I'm using R 2.10.1 and plyr 0.1.9

              – Sharpie
              Feb 18 '10 at 9:31











            • SessionInfo() informed I used plyr 0.1.3, and update.packages() did not help. But upgrading from R 2.9.2 did help :)

              – andrekos
              Feb 19 '10 at 0:11














            5












            5








            5







            Well, it seems like you would first need to generate n-tuples from your vector. The following function should accomplish that:



            makeTuples <- function( x, n ){

            # Very inefficient way to loop... but what the heck
            tuples <- list()

            for( i in 1:n ){

            tuples[[i]] <- x[i:(length(x)-n+i)]

            }

            return(tuples)

            }


            Then you could feed the results of makeTuples() to table() using do.call():



            do.call( table, makeTuples(s,3) )

            , , = 0


            0 1
            0 4 1
            1 3 1

            , , = 1


            0 1
            0 2 1
            1 0 1


            This works because the makeTuples() function returns the tuples as a list of lists. The output isn't quite as nice as you wanted, but you could write a function to reformat, say:



            , ,  = 0


            0 1
            0 4 1
            1 3 1


            To:



                 0 1
            00 4 1
            01 3 1


            It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table, creating row names and concatenating things together.




            Update




            So, I was just sitting in a Stochastic processes class when I figured out a more or less straight-forward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations of n selections from your population. The generation of permutations can be done with expand.grid(), but it needs a little sugar-coating:



            permute <- function( population, n ){

            permutations <- do.call( expand.grid, rep( list(population), n ) )

            permutations <- apply( permutations, 1, paste, collapse = '' )

            return( permutations )

            }


            The basic idea is to iterate over the list of permutations and count the number of tuples that match the given permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. Here's a function that takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from and produces a named vector of match counts:



            countFrequency <- function(permutation,tuples,population){

            permutations <- paste( permutation, population, sep = '' )

            # Inner lapply applies the equality operator `==` to each
            # permutation and returns a list of TRUE/FALSE vectors.
            # Outer lapply sums the number of TRUE values in each vector.
            frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

            names( frequencies ) <- as.character( population )

            return( unlist(frequencies) )

            }


            Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation operation is done using ldply() from Hadley Wickham's plyr package as it does a nice job of preserving information such as which permutation corresponds to which row of output matches:



            permutationFrequency <- function( vector, n, population = unique( vector ) ){

            # Split the vector into tuples.
            tuples <- makeTuples( vector, n )

            # Coerce and compact the tuples to a vector of strings.
            tuples <- do.call(cbind,tuples)
            tuples <- apply( tuples, 1, paste, collapse = '' )

            # Generate permutations of n-1 elements from the population.
            # Turn into a named list for ldply() to work it's magic.
            permutations <- permute( population, n-1 )
            names( permutations ) <- permutations

            frequencies <- ldply( permutations, countFrequency,
            tuples = tuples, population = population )

            return( frequencies )

            }


            And there you go:



            require( plyr )
            permutationFrequency( s, 2 )
            .id 1 0
            1 1 2 3
            2 0 2 7

            permutationFrequency( s, 3 )
            .id 1 0
            1 11 1 1
            2 01 1 1
            3 10 0 3
            4 00 2 4

            permutationFrequency( s, 4 )
            .id 1 0
            1 111 0 1
            2 011 1 0
            3 101 0 0
            4 001 1 1
            5 110 0 1
            6 010 0 1
            7 100 0 2
            8 000 2 2

            permutationFrequency( sample( -1:1, 10, replace = T ), 2 )
            .id 1 -1 0
            1 1 1 2 0
            2 -1 0 1 2
            3 0 1 0 2


            Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler's Ruin today...






            share|improve this answer















            Well, it seems like you would first need to generate n-tuples from your vector. The following function should accomplish that:



            makeTuples <- function( x, n ){

            # Very inefficient way to loop... but what the heck
            tuples <- list()

            for( i in 1:n ){

            tuples[[i]] <- x[i:(length(x)-n+i)]

            }

            return(tuples)

            }


            Then you could feed the results of makeTuples() to table() using do.call():



            do.call( table, makeTuples(s,3) )

            , , = 0


            0 1
            0 4 1
            1 3 1

            , , = 1


            0 1
            0 2 1
            1 0 1


            This works because the makeTuples() function returns the tuples as a list of lists. The output isn't quite as nice as you wanted, but you could write a function to reformat, say:



            , ,  = 0


            0 1
            0 4 1
            1 3 1


            To:



                 0 1
            00 4 1
            01 3 1


            It would require looping over the outer n-2 dimensions of the n-dimensional array returned by table, creating row names and concatenating things together.




            Update




            So, I was just sitting in a Stochastic processes class when I figured out a more or less straight-forward way to produce the output you want without trying to unwind the output of table(). First you will need a function that generates all possible permutations of n selections from your population. The generation of permutations can be done with expand.grid(), but it needs a little sugar-coating:



            permute <- function( population, n ){

            permutations <- do.call( expand.grid, rep( list(population), n ) )

            permutations <- apply( permutations, 1, paste, collapse = '' )

            return( permutations )

            }


            The basic idea is to iterate over the list of permutations and count the number of tuples that match the given permutation. Since you want the results split out into a table, we should select a permutation of n-1 elements from the population and let the last position form the columns of the table. Here's a function that takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from and produces a named vector of match counts:



            countFrequency <- function(permutation,tuples,population){

            permutations <- paste( permutation, population, sep = '' )

            # Inner lapply applies the equality operator `==` to each
            # permutation and returns a list of TRUE/FALSE vectors.
            # Outer lapply sums the number of TRUE values in each vector.
            frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

            names( frequencies ) <- as.character( population )

            return( unlist(frequencies) )

            }
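A quick way to check the counting step in isolation is to run it on a handful of hand-made tuples. The sketch below uses vapply() in place of the nested lapply() calls, so it is a swapped-in variation rather than the function above. The names countMatches and toy are hypothetical.

```r
countMatches <- function(prefix, tuples, population) {
  # Candidate tuples: the prefix followed by each possible last element.
  candidates <- paste(prefix, population, sep = "")
  # For each candidate, count how many observed tuples equal it.
  counts <- vapply(candidates, function(p) sum(tuples == p), integer(1))
  names(counts) <- as.character(population)
  counts
}

toy <- c("10", "00", "00", "01")
res <- countMatches("0", toy, population = c(1, 0))
res  # named integer vector: "01" occurs once, "00" occurs twice
```

Compared with lapply() plus unlist(), vapply() has the small advantage of failing loudly if a count is ever not a single integer.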


            Finally, all three functions can be combined into a bigger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation operation is done using ldply() from Hadley Wickham's plyr package as it does a nice job of preserving information such as which permutation corresponds to which row of output matches:



            permutationFrequency <- function( vector, n, population = unique( vector ) ){

            # Split the vector into tuples.
            tuples <- makeTuples( vector, n )

            # Coerce and compact the tuples to a vector of strings.
            tuples <- do.call(cbind,tuples)
            tuples <- apply( tuples, 1, paste, collapse = '' )

            # Generate permutations of n-1 elements from the population.
            # Turn into a named list for ldply() to work its magic.
            permutations <- permute( population, n-1 )
            names( permutations ) <- permutations

            frequencies <- ldply( permutations, countFrequency,
            tuples = tuples, population = population )

            return( frequencies )

            }


            And there you go:



            require( plyr )
            permutationFrequency( s, 2 )
              .id 1 0
            1   1 2 3
            2   0 2 7

            permutationFrequency( s, 3 )
              .id 1 0
            1  11 1 1
            2  01 1 1
            3  10 0 3
            4  00 2 4

            permutationFrequency( s, 4 )
              .id 1 0
            1 111 0 1
            2 011 1 0
            3 101 0 0
            4 001 1 1
            5 110 0 1
            6 010 0 1
            7 100 0 2
            8 000 2 2

            permutationFrequency( sample( -1:1, 10, replace = TRUE ), 2 )
              .id 1 -1 0
            1   1 1  2 0
            2  -1 0  1 2
            3   0 1  0 2


            Apologies to my stochastic processes teacher, but functional programming problems in R were just more interesting than the Gambler's Ruin today...















            edited Feb 20 '10 at 19:42

























            answered Feb 17 '10 at 20:38









            Sharpie













            • Thanks very much for this, but the .id column appears to be missing in my output. Or am I missing something? The rest is exactly what I needed.

              – andrekos
              Feb 18 '10 at 1:30











            • Hmm, I noticed the .id column didn't show up if I gave an unnamed list or vector to ldply(). Did you include names(permutations) <- permutations?

              – Sharpie
              Feb 18 '10 at 1:40











            • Yes, to start with, I copy-pasted your code.

              – andrekos
              Feb 18 '10 at 8:26











            • Interesting. Could be a version thing-- I'm using R 2.10.1 and plyr 0.1.9

              – Sharpie
              Feb 18 '10 at 9:31











            • sessionInfo() showed I was using plyr 0.1.3, and update.packages() did not help. But upgrading from R 2.9.2 did help :)

              – andrekos
              Feb 19 '10 at 0:11
































            1














            One approach is to create a data frame of the subsequences and then use the table function:



            s<-c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
            n<-length(s)
            k<-3
            subseqs<-t(sapply(1:(n-k+1),function(i){s[i:(i+k-1)]}))
            colnames(subseqs)<-paste('Y',1:k,sep="")
            subseqs<-data.frame(subseqs)
            table(subseqs)


            This produces



            , , Y3 = 0

               Y2
            Y1  0 1
              0 4 1
              1 3 1

            , , Y3 = 1

               Y2
            Y1  0 1
              0 2 1
              1 0 1


            Use ftable() instead of table(), or apply it to the output of table(), for a display similar to the one in your question:



            ftable(subseqs)
                  Y3 0 1
            Y1 Y2
            0  0     4 2
               1     1 1
            1  0     3 0
               1     1 1
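To cover the general-k part of the question, the steps above could be wrapped in a function. This is a sketch under assumptions: subseqTable is a hypothetical name, and the windows are built with outer() index arithmetic rather than the sapply() call above, so that k = 1 also works.

```r
subseqTable <- function(s, k) {
  n <- length(s)
  stopifnot(k >= 1, k <= n)
  # Row i of idx holds the positions i, i+1, ..., i+k-1 of the i-th window.
  idx <- outer(seq_len(n - k + 1), seq_len(k) - 1, `+`)
  subseqs <- as.data.frame(matrix(s[idx], ncol = k))
  names(subseqs) <- paste0("Y", seq_len(k))
  # One table dimension per position in the window.
  table(subseqs)
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
tab <- subseqTable(s, 3)
ftable(tab)                  # rows: Y1, Y2; columns: Y3
ftable(subseqTable(s, 4))    # rows: Y1..Y3; columns: Y4
```

As with table() in general, a value that never occurs at a given position gets no level in that dimension, so very short sequences may produce tables with fewer cells than expected.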















                answered Feb 18 '10 at 9:13









                Jyotirmoy Bhattacharya





























