Find duplicate lines based on column and print both lines and their numbers with awk
I have the following file:
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
Using AWK, I need to find both the original lines and their duplicates, with row numbers, so that I get output like:
NR $0
1 test 1234
2 admin 1234
6 root 1234
I have tried the following, but it does not print the correct row number with NR:
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt
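Running the attempt on the sample data makes the failure visible (a sketch; `/tmp/pwfile` is an assumed scratch path): the stored line is printed with the current line's NR, and `x[$2]=$0` keeps overwriting, so earlier duplicates are lost.

```shell
# Recreate the sample file from the question (path is an assumption)
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > /tmp/pwfile

# The attempted one-liner: when a duplicate is found, the stored line's
# original NR is no longer available, so both lines print the current NR,
# and x[$2]=$0 keeps only the most recent line per hash
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' /tmp/pwfile
# prints:
# 3 test 1234
# 3 admin 1234
# 7 admin 1234
# 7 root 1234
```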
Any help would be appreciated!
Tags: linux, awk, duplicates
asked Nov 8 at 10:12 by skazichris; edited Nov 8 at 10:14 by Wiktor Stribiżew
4 Answers
Accepted answer (James Brown, answered Nov 8 at 11:18):
$ awk '
($2 in a) { # look for duplicates in $2
if(a[$2]) { # if found
print a[$2] # output the first, stored one
a[$2]=""                 # mark it as already printed
}
print NR,$0 # print the duplicated one
next # skip the storing part that follows
}
{
a[$2]=NR OFS $0 # store the first of each with NR and full record
}' file
Output (with the header line still in file):
2 test 1234
3 admin 1234
7 root 1234
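The numbering above includes the header line. If the output should match the numbering requested in the question, a small variant (my sketch, not the answerer's code; `/tmp/pwfile` is an assumed scratch path) skips line 1 and prints NR-1 instead:

```shell
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > /tmp/pwfile

awk '
NR==1 { next }                 # skip the header line
($2 in a) {                    # hash seen before?
    if (a[$2]) {               # first duplicate: flush the stored line
        print a[$2]
        a[$2] = ""
    }
    print NR-1, $0             # print the current duplicate, header excluded
    next
}
{ a[$2] = (NR-1) OFS $0 }      # remember the first line carrying each hash
' /tmp/pwfile
# prints:
# 1 test 1234
# 2 admin 1234
# 6 root 1234
```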
Thank you so much!!! – skazichris Nov 8 at 12:27
Answer (GanitK, answered Nov 8 at 10:53, edited Nov 8 at 10:59):
Using GAWK, you can do this with the construct below:
awk '
NR > 1 {                       # skip the header line
    a[$2][NR-1 " " $0]         # gawk 2-D array: hash -> "line-number record"
}
END {
    for (i in a)
        if (length(a[i]) > 1)  # only hashes that occur more than once
            for (j in a[i])
                print j
}
' Input_File.txt
Create a two-dimensional (GNU awk) array. The first dimension stores PWD_HASH, and the second stores the line number (NR-1) concatenated with the whole line ($0). To display only the duplicates, use the length(a[i]) > 1 condition.
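One caveat: `for (j in a[i])` traverses a gawk array in unspecified order, so with more duplicates the lines may come out shuffled. GNU awk can be asked for sorted traversal via `PROCINFO["sorted_in"]` (a sketch, gawk-only; `/tmp/pwfile` is an assumed scratch path):

```shell
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > /tmp/pwfile

gawk '
BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }  # iterate indices in numeric order
NR > 1 { a[$2][NR-1 " " $0] }                     # hash -> "line-number record"
END {
    for (i in a)
        if (length(a[i]) > 1)                     # only duplicated hashes
            for (j in a[i])
                print j
}' /tmp/pwfile
```

Since each stored index starts with its line number, `@ind_num_asc` compares that leading number and yields `1 test 1234`, `2 admin 1234`, `6 root 1234` in file order.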
Thank you so much!!! – skazichris Nov 8 at 12:28
Answer (RavinderSingh13, answered Nov 8 at 11:07, edited Nov 8 at 11:14):
Could you please try the following.
awk '
FNR==NR{
a[$2]++
b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0
next
}
a[$2]>1{
print b[$2,FNR]
}
' Input_file Input_file
Output will be as follows.
1 test 1234
2 admin 1234
6 root 1234
Explanation: Following is the explanation for above code.
awk ' ##Starting awk program here.
FNR==NR{ ##Checking condition here FNR==NR which will be TRUE when first time Input_file is being read.
a[$2]++ ##Creating an array named a whose index is $2 and incrementing its value by 1 each time the same index is seen.
b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0 ##Creating array b whose index is $2,FNR, storing the line number (FNR-1) followed by the whole record.
next ##Using next for skipping all further statements from here.
}
a[$2]>1{ ##Checking condition where value of a[$2] is greater than 1, this will be executed when 2nd time Input_file read.
print b[$2,FNR] ##Printing value of array b whose index is $2,FNR here.
}
' Input_file Input_file ##Mentioning Input_file(s) names here 2 times.
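The two-pass recipe can be tried directly on the sample data (a sketch; `/tmp/pwfile` is an assumed scratch path, and note the same file is passed twice):

```shell
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > /tmp/pwfile

# Pass 1 (FNR==NR) counts each hash and stores a "line-number record";
# pass 2 prints the stored entry for every hash counted more than once
awk '
FNR==NR{
  a[$2]++
  b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0
  next
}
a[$2]>1{
  print b[$2,FNR]
}
' /tmp/pwfile /tmp/pwfile
# prints:
# 1 test 1234
# 2 admin 1234
# 6 root 1234
```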
Thank you so much!!! – skazichris Nov 8 at 12:27
And I just did. I had to gain more rep. (above 15) to be able to add more votes. Thanks again! – skazichris Nov 8 at 12:31
Answer (oliv, answered Nov 8 at 12:57):
Without using awk, but with GNU coreutils tools:
tail -n+2 file | nl | sort -k3n | uniq -D -f2
tail -n+2 removes the first (header) line.
nl adds line numbers.
sort -k3n sorts numerically on the 3rd field (the hash).
uniq -D -f2 prints all duplicates, ignoring the first 2 fields when comparing.
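Two details worth noting when trying this: the pipeline's output is ordered by hash rather than by line number, and `nl`'s default numbering is tab-separated and right-justified. A sketch that tidies both (the `-w1 -s' '` options and the trailing `sort -k1n` are my additions; `/tmp/pwfile` is an assumed scratch path):

```shell
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > /tmp/pwfile

# -w1 -s' ' makes nl emit "1 test 1234"; the final sort restores line order
tail -n+2 /tmp/pwfile | nl -w1 -s' ' | sort -k3n | uniq -D -f2 | sort -k1n
# prints:
# 1 test 1234
# 2 admin 1234
# 6 root 1234
```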