Find duplicate lines based on column and print both lines and their numbers with awk

I have the following file:



userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234


Using AWK,
I need to find both the original lines and their duplicates, with row numbers,
so that I get output like:



NR $0
1 test 1234
2 admin 1234
6 root 1234


I have tried the following, but it does not print the correct row number with NR:



awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt


Any help would be appreciated!
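For reference, the attempt above fails because x[$2] stores only the earlier line's text ($0), not the line number where that line was seen, so both print statements use the current NR. Reproducing it on the sample data (written here to a scratch file.txt) shows the symptom:

```shell
# The stored value n is only the earlier $0, so the duplicate's NR is
# printed for both lines, and the stored line is overwritten each time.
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt
```

On this input it prints `3 test 1234`, `3 admin 1234`, `7 admin 1234`, `7 root 1234`: the original's row number is lost, and by line 7 the stored entry has already been overwritten by `admin 1234`.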










      linux awk duplicates














edited Nov 8 at 10:14 by Wiktor Stribiżew
asked Nov 8 at 10:12 by skazichris
4 Answers

















Accepted answer (score 1)










$ awk '
($2 in a) {            # look for duplicates in $2
    if (a[$2]) {       # if found
        print a[$2]    # output the first, stored one
        a[$2] = ""     # mark it as already output
    }
    print NR, $0       # print the duplicated one
    next               # skip the storing part that follows
}
{
    a[$2] = NR OFS $0  # store the first of each with NR and the full record
}' file


          Output (with the header in file):



          2 test 1234
          3 admin 1234
          7 root 1234
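Note that the numbering above counts the header as line 1, while the question's expected output starts at 1 for `test 1234`. A small variation of the same store-then-flush idea, skipping the header and printing NR-1 (a sketch, not part of the original answer), produces the question's exact numbering:

```shell
# Same logic as the accepted answer, but the header is skipped and
# row numbers are shifted by one so they count data lines only.
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
awk 'NR==1 { next }                    # skip the header
     ($2 in a) {
         if (a[$2] != "") {            # first duplicate for this hash:
             print a[$2]               # emit the stored original line
             a[$2] = ""                # mark it as already printed
         }
         print NR-1, $0                # emit the duplicate itself
         next
     }
     { a[$2] = NR-1 OFS $0 }' file.txt # remember the first sighting
```

This prints `1 test 1234`, `2 admin 1234`, `6 root 1234`.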





answered Nov 8 at 11:18 by James Brown
• Thank you so much!!! – skazichris, Nov 8 at 12:27


















Answer (score 1)













Using GAWK, you can do this with the following construct:



awk '
NR > 1 {                       # skip the header line
    a[$2][NR-1 " " $0]         # referencing the element creates it
}
END {
    for (i in a)
        if (length(a[i]) > 1)  # only hashes seen more than once
            for (j in a[i])
                print j
}' Input_File.txt


Create a two-dimensional array.


In the first dimension, store the PWD_HASH; in the second dimension, store the line number (NR-1) concatenated with the whole line ($0).


To display only the duplicated ones, use the length(a[i]) > 1 condition.
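Two caveats worth hedging: arrays of arrays require gawk 4.0 or later, and `for (j in a[i])` iterates in an unspecified order, so the duplicates may not come out in input order. A portable sketch in plain POSIX awk that buffers the data lines and replays them in order might look like this (file name assumed):

```shell
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Count occurrences per hash while buffering each numbered data line,
# then replay the buffer in input order, keeping only repeated hashes.
awk 'NR > 1 {
         cnt[$2]++
         line[NR] = (NR-1) " " $0
         key[NR]  = $2
     }
     END {
         for (i = 2; i <= NR; i++)
             if (cnt[key[i]] > 1)
                 print line[i]
     }' file.txt
```

On the sample data this prints `1 test 1234`, `2 admin 1234`, `6 root 1234`, preserving the original order.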






edited Nov 8 at 10:59, answered Nov 8 at 10:53 by GanitK























• Thank you so much!!! – skazichris, Nov 8 at 12:28


















Answer (score 1)













Please try the following.



awk '
FNR==NR {
    a[$2]++
    b[$2,FNR] = FNR==1 ? FNR : (FNR-1) OFS $0
    next
}
a[$2]>1 {
    print b[$2,FNR]
}' Input_file Input_file


          Output will be as follows.



          1 test 1234
          2 admin 1234
          6 root 1234


Explanation of the above code:



awk '                                  ##Start the awk program.
FNR==NR{                               ##TRUE while Input_file is being read the first time.
  a[$2]++                              ##Count how many times each $2 value occurs.
  b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0  ##Store line number (FNR-1) and the full record, indexed by $2,FNR.
  next                                 ##Skip the remaining statements during the first pass.
}
a[$2]>1{                               ##On the second pass, TRUE when $2 occurred more than once.
  print b[$2,FNR]                      ##Print the stored line number and record.
}
' Input_file Input_file                ##Name the Input_file twice so it is read in two passes.
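To reproduce this, write the sample data to Input_file (the name the answer assumes) and pass it twice so awk reads it in two passes:

```shell
cat > Input_file <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Pass 1 counts each hash and stores every numbered line;
# pass 2 prints the stored lines whose hash occurred more than once.
awk 'FNR==NR { a[$2]++; b[$2,FNR] = FNR==1 ? FNR : (FNR-1) OFS $0; next }
     a[$2]>1 { print b[$2,FNR] }' Input_file Input_file
```

This prints `1 test 1234`, `2 admin 1234`, `6 root 1234`, matching the output shown above.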





edited Nov 8 at 11:14, answered Nov 8 at 11:07 by RavinderSingh13



















• Thank you so much!!! – skazichris, Nov 8 at 12:27






• And I just did. I had to gain more rep. (above 15) to be able to add more votes. Thanks again! – skazichris, Nov 8 at 12:31


















Answer (score 0)













Without awk, using GNU coreutils tools:



          tail -n+2 file | nl | sort -k3n | uniq -D -f2


tail removes the first (header) line.
nl adds line numbers.
sort sorts numerically on the 3rd field (the hash).
uniq prints only the duplicates, skipping the first two fields (-f2) so lines are compared by the 3rd field.
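Assuming the question's sample file, the pipeline can be exercised as below. Note that nl left-pads its line numbers and separates them with a tab, so the output is indented rather than matching the awk answers byte for byte:

```shell
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Drop the header, number the remaining lines, group them by the hash
# column, and keep only groups with more than one member.
tail -n +2 file.txt | nl | sort -k3n | uniq -D -f2
```

The three `1234` lines come out together (numbered 1, 2, and 6), one per duplicate.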






answered Nov 8 at 12:57 by oliv




















