Find duplicate lines based on column and print both lines and their numbers with awk

I have the following file:



userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234


Using AWK,
I need to find both the original lines and their duplicates, with row numbers,
so that I get output like:



NR $0
1 test 1234
2 admin 1234
6 root 1234


I have tried the following, but it does not print the correct row number with NR:



awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt


Any help would be appreciated!
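For reference, the attempt above fails because x[$2] stores only the earlier line's text ($0), not the line number where that line was seen, so both print statements use the current NR. Reproducing it on the sample data (written here to a scratch file.txt) shows the symptom:

```shell
# The stored value n is only the earlier $0, so the duplicate's NR is
# printed for both lines, and the stored line is overwritten each time.
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt
```

On this input it prints `3 test 1234`, `3 admin 1234`, `7 admin 1234`, `7 root 1234`: the original's row number is lost, and by line 7 the stored entry has already been overwritten by `admin 1234`.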










      linux awk duplicates














edited Nov 8 at 10:14 by Wiktor Stribiżew
asked Nov 8 at 10:12 by skazichris
4 Answers

















Accepted answer (score 1)










$ awk '
($2 in a) {            # look for duplicates in $2
    if (a[$2]) {       # if found
        print a[$2]    # output the first, stored one
        a[$2] = ""     # mark it as already output
    }
    print NR, $0       # print the duplicated one
    next               # skip the storing part that follows
}
{
    a[$2] = NR OFS $0  # store the first of each with NR and the full record
}' file


          Output (with the header in file):



          2 test 1234
          3 admin 1234
          7 root 1234
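Note that the numbering above counts the header as line 1, while the question's expected output starts at 1 for `test 1234`. A small variation of the same store-then-flush idea, skipping the header and printing NR-1 (a sketch, not part of the original answer), produces the question's exact numbering:

```shell
# Same logic as the accepted answer, but the header is skipped and
# row numbers are shifted by one so they count data lines only.
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
awk 'NR==1 { next }                    # skip the header
     ($2 in a) {
         if (a[$2] != "") {            # first duplicate for this hash:
             print a[$2]               # emit the stored original line
             a[$2] = ""                # mark it as already printed
         }
         print NR-1, $0                # emit the duplicate itself
         next
     }
     { a[$2] = NR-1 OFS $0 }' file.txt # remember the first sighting
```

This prints `1 test 1234`, `2 admin 1234`, `6 root 1234`.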





answered Nov 8 at 11:18 by James Brown
• Thank you so much!!! – skazichris, Nov 8 at 12:27


















Answer (score 1)













Using GAWK, you can do this with the following construct:



awk '
NR > 1 {                       # skip the header line
    a[$2][NR-1 " " $0]         # referencing the element creates it
}
END {
    for (i in a)
        if (length(a[i]) > 1)  # only hashes seen more than once
            for (j in a[i])
                print j
}' Input_File.txt


Create a two-dimensional array.


In the first dimension, store the PWD_HASH; in the second dimension, store the line number (NR-1) concatenated with the whole line ($0).


To display only the duplicated ones, use the length(a[i]) > 1 condition.
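Two caveats worth hedging: arrays of arrays require gawk 4.0 or later, and `for (j in a[i])` iterates in an unspecified order, so the duplicates may not come out in input order. A portable sketch in plain POSIX awk that buffers the data lines and replays them in order might look like this (file name assumed):

```shell
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Count occurrences per hash while buffering each numbered data line,
# then replay the buffer in input order, keeping only repeated hashes.
awk 'NR > 1 {
         cnt[$2]++
         line[NR] = (NR-1) " " $0
         key[NR]  = $2
     }
     END {
         for (i = 2; i <= NR; i++)
             if (cnt[key[i]] > 1)
                 print line[i]
     }' file.txt
```

On the sample data this prints `1 test 1234`, `2 admin 1234`, `6 root 1234`, preserving the original order.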






edited Nov 8 at 10:59, answered Nov 8 at 10:53 by GanitK























• Thank you so much!!! – skazichris, Nov 8 at 12:28


















Answer (score 1)













Please try the following.



awk '
FNR==NR {
    a[$2]++
    b[$2,FNR] = FNR==1 ? FNR : (FNR-1) OFS $0
    next
}
a[$2]>1 {
    print b[$2,FNR]
}' Input_file Input_file


          Output will be as follows.



          1 test 1234
          2 admin 1234
          6 root 1234


Explanation of the above code:



awk '                                  ##Start the awk program.
FNR==NR{                               ##TRUE while Input_file is being read the first time.
  a[$2]++                              ##Count how many times each $2 value occurs.
  b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0  ##Store line number (FNR-1) and the full record, indexed by $2,FNR.
  next                                 ##Skip the remaining statements during the first pass.
}
a[$2]>1{                               ##On the second pass, TRUE when $2 occurred more than once.
  print b[$2,FNR]                      ##Print the stored line number and record.
}
' Input_file Input_file                ##Name the Input_file twice so it is read in two passes.
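To reproduce this, write the sample data to Input_file (the name the answer assumes) and pass it twice so awk reads it in two passes:

```shell
cat > Input_file <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Pass 1 counts each hash and stores every numbered line;
# pass 2 prints the stored lines whose hash occurred more than once.
awk 'FNR==NR { a[$2]++; b[$2,FNR] = FNR==1 ? FNR : (FNR-1) OFS $0; next }
     a[$2]>1 { print b[$2,FNR] }' Input_file Input_file
```

This prints `1 test 1234`, `2 admin 1234`, `6 root 1234`, matching the output shown above.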





edited Nov 8 at 11:14, answered Nov 8 at 11:07 by RavinderSingh13



















• Thank you so much!!! – skazichris, Nov 8 at 12:27






• And I just did. I had to gain more rep. (above 15) to be able to add more votes. Thanks again! – skazichris, Nov 8 at 12:31


















Answer (score 0)













Without awk, using GNU coreutils tools:



          tail -n+2 file | nl | sort -k3n | uniq -D -f2


tail removes the first (header) line.
nl adds line numbers.
sort sorts numerically on the 3rd field (the hash).
uniq prints only the duplicates, skipping the first two fields (-f2) so lines are compared by the 3rd field.
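Assuming the question's sample file, the pipeline can be exercised as below. Note that nl left-pads its line numbers and separates them with a tab, so the output is indented rather than matching the awk answers byte for byte:

```shell
cat > file.txt <<'EOF'
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
EOF
# Drop the header, number the remaining lines, group them by the hash
# column, and keep only groups with more than one member.
tail -n +2 file.txt | nl | sort -k3n | uniq -D -f2
```

The three `1234` lines come out together (numbered 1, 2, and 6), one per duplicate.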






answered Nov 8 at 12:57 by oliv




















