Find duplicate lines based on column and print both lines and their numbers with awk

I have the following file:



userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234


Using AWK, I need to find both the original lines and their duplicates (lines with the same PWD_HASH in the second column), together with their row numbers, so that I get output like:



NR $0
1 test 1234
2 admin 1234
6 root 1234


I have tried the following, but it does not print the correct row number (NR) for the original line:



awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt


Any help would be appreciated!

Tags: linux awk duplicates

edited Nov 8 at 10:14
Wiktor Stribiżew

asked Nov 8 at 10:12
skazichris


4 Answers

            Accepted answer

            $ awk '
            ($2 in a) {              # look for duplicates in $2
                if (a[$2]) {         # if the first one has not been printed yet
                    print a[$2]      # output the first, stored one
                    a[$2] = ""       # mark it as printed
                }
                print NR, $0         # print the duplicated one
                next                 # skip the storing part that follows
            }
            {
                a[$2] = NR OFS $0    # store the first of each with NR and full record
            }' file


            Output (with the header in file):



            2 test 1234
            3 admin 1234
            7 root 1234





            answered Nov 8 at 11:18
            James Brown

            • Thank you so much!!!
              – skazichris
              Nov 8 at 12:27
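
            The accepted answer counts the header as line 1, so its numbers are one higher than those in the question's expected output. Below is a minimal sketch of the same single-pass idea that skips the header and prints NR-1 instead; it assumes the first line is always a header, and print_dup_lines is a hypothetical helper name, not part of the answer:

```shell
#!/bin/sh
# print_dup_lines (hypothetical name): single pass, header skipped,
# numbers are data-line numbers (header excluded).
print_dup_lines() {
    awk '
    NR == 1 { next }                  # skip the "userID PWD_HASH" header
    ($2 in first) {                   # this hash was seen before
        if (first[$2] != "") {        # first occurrence not printed yet
            print first[$2]
            first[$2] = ""            # mark it as printed
        }
        print NR - 1, $0              # print this duplicate
        next
    }
    { first[$2] = (NR - 1) OFS $0 }   # remember the first line of each hash
    ' "$@"
}

printf '%s\n' 'userID PWD_HASH' 'test 1234' 'admin 1234' 'user 6789' \
    'abcd 5555' 'efgh 6666' 'root 1234' | print_dup_lines
```

            As with the accepted answer, each group is printed as soon as its second member is seen, so when several hashes repeat the output is neither grouped strictly by hash nor sorted by line number.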














            Using GAWK (version 4.0 or later, which supports arrays of arrays), you can do this with the following construct:



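            awk '
            NR > 1 {                        # skip the header line
                a[$2][NR-1 " " $0];         # group "lineno line" strings by PWD_HASH
            }
            END {
                for (i in a)
                    if (length(a[i]) > 1)   # only hashes that occur more than once
                        for (j in a[i])
                            print j;
            }
            ' Input_File.txt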


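            This creates a two-dimensional array (an array of arrays, which is a GNU awk extension).

            The first index is the PWD_HASH ($2); the second is the line number (NR-1, so the header is not counted) concatenated with the whole line ($0).

            To display only the duplicated hashes, use the length(a[i]) > 1 condition. Note that for (i in a) and for (j in a[i]) visit indices in an unspecified order, so the lines are not guaranteed to be printed in line-number order.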







            edited Nov 8 at 10:59

            answered Nov 8 at 10:53
            GanitK

            • Thank you so much!!!
              – skazichris
              Nov 8 at 12:28
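
            If GNU awk is not available, the arrays of arrays can be avoided. A minimal POSIX-awk sketch (an alternative technique, not the answer's code) counts each hash while reading, keeps every line in memory, and prints the duplicated lines in END by walking the line numbers in order, which also makes the output order predictable. It assumes a single header line; group_dups is a hypothetical helper name:

```shell
#!/bin/sh
# group_dups (hypothetical name): print "lineno line" for every data line
# whose 2nd field occurs more than once; line 1 is assumed to be a header.
group_dups() {
    awk '
    NR == 1 { next }              # skip the header line
    {
        n = NR - 1                # data line number, header excluded
        count[$2]++               # occurrences of this hash
        hash[n] = $2              # remember the hash and the full line
        line[n] = $0
    }
    END {
        for (i = 1; i <= n; i++)  # walk the data lines in file order
            if (count[hash[i]] > 1)
                print i, line[i]
    }
    ' "$@"
}

printf '%s\n' 'userID PWD_HASH' 'test 1234' 'admin 1234' 'user 6789' \
    'abcd 5555' 'efgh 6666' 'root 1234' | group_dups
```

            Because END walks line numbers rather than hashes, lines from different duplicate groups come out interleaved in file order rather than grouped by hash.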


















            Could you please try the following.



            awk '
            FNR==NR{
                a[$2]++
                b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0
                next
            }
            a[$2]>1{
                print b[$2,FNR]
            }
            ' Input_file Input_file


            Output will be as follows.



            1 test 1234
            2 admin 1234
            6 root 1234


            Explanation of the code above:



            awk '                                    ##Starting awk program here.
            FNR==NR{                                 ##FNR==NR is TRUE only while the first copy of Input_file is being read.
                a[$2]++                              ##Array a is indexed by $2 (the hash); count how often each hash occurs.
                b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0  ##Array b is indexed by $2,FNR; store the line number (FNR-1, header not counted) followed by the whole line.
                next                                 ##Using next to skip all further statements for this line.
            }
            a[$2]>1{                                 ##Reached only on the second read of Input_file: TRUE when the hash occurs more than once.
                print b[$2,FNR]                      ##Print the stored line number and line for this hash and line.
            }
            ' Input_file Input_file                  ##Input_file is given twice: the first pass counts, the second prints.






            edited Nov 8 at 11:14

            answered Nov 8 at 11:07
            RavinderSingh13

            • Thank you so much!!!
              – skazichris
              Nov 8 at 12:27
            • And I just did. I had to gain more rep. (above 15) to be able to add more votes. Thanks again!
              – skazichris
              Nov 8 at 12:31
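
            The answer stores every line in array b during the first pass, although the second pass already has FNR and $0 available. Below is a minimal sketch of the same two-pass idea, simplified so that pass 1 only counts hashes and memory grows with the number of distinct hashes rather than with the file size. It assumes a single header line and a regular file (a pipe cannot be read twice); two_pass_dups is a hypothetical helper name:

```shell
#!/bin/sh
# two_pass_dups (hypothetical name): read the same file twice; pass 1
# counts each hash, pass 2 prints lines whose hash occurs more than once.
two_pass_dups() {
    awk '
    FNR == 1 { next }                    # skip the header in both passes
    FNR == NR { count[$2]++; next }      # pass 1: count each hash
    count[$2] > 1 { print FNR - 1, $0 }  # pass 2: print the duplicates
    ' "$1" "$1"
}

tmp=$(mktemp) || exit 1
printf '%s\n' 'userID PWD_HASH' 'test 1234' 'admin 1234' 'user 6789' \
    'abcd 5555' 'efgh 6666' 'root 1234' > "$tmp"
two_pass_dups "$tmp"
rm -f "$tmp"
```

            Like the answer, it prints the duplicates in file order.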














                Without awk, using GNU coreutils tools:



                tail -n+2 file | nl | sort -k3n | uniq -D -f2


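                tail removes the first (header) line.
                nl adds line numbers.
                sort orders the lines numerically on the 3rd field (the hash).
                uniq -D -f2 skips the first two fields (line number and userID) and prints every line whose remaining text (the hash) is duplicated.
                Note that -k3n compares the hashes as numbers, which works for the numeric hashes in the sample; hashes that do not start with a digit would all compare equal (as 0), so identical hashes might not end up on adjacent lines and uniq would miss them.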







                answered Nov 8 at 12:57
                oliv
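
                sort -k3n only groups equal hashes reliably when they start with a number. Below is a minimal sketch of the same pipeline for arbitrary text hashes; its three changes are assumptions of this sketch, not part of the answer: sort -k3,3 compares the hash as plain text, nl -b a also numbers blank lines so the numbers match the file, and LC_ALL=C makes the comparisons byte-based and independent of the locale. uniq -D still needs GNU coreutils, and coreutils_dups is a hypothetical helper name:

```shell
#!/bin/sh
# coreutils_dups (hypothetical name): number the data lines, sort them by
# the hash field as plain text, and keep every line whose hash repeats.
coreutils_dups() {
    tail -n +2 "$1" |             # drop the header line
        nl -b a |                 # number every line (number, TAB, line)
        LC_ALL=C sort -k3,3 |     # group equal hashes; ties fall back to line order
        LC_ALL=C uniq -D -f2      # GNU uniq: print all lines of each repeated group
}

tmp=$(mktemp) || exit 1
printf '%s\n' 'userID PWD_HASH' 'test ab12' 'admin cd34' 'root ab12' > "$tmp"
coreutils_dups "$tmp"
rm -f "$tmp"
```

                Without -s, sort breaks ties between equal hashes by comparing whole lines; in the C locale, nl's right-aligned numbers make that comparison follow line-number order, so each group stays in file order.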
