Pandas merge handling duplicates in join output
Is there a nice way to bring in only one row, preferably a random one, for one-to-many matches during a left join in pandas?
For example:
import numpy as np
import pandas as pd

left = [[1,1,1], [2,2,2], [3,3,3], [9,9,9], [1,3,2]]
right = [[1,2,2], [1,2,3], [3,2,2], [3,2,9], [3,2,2]]
left = pd.DataFrame(np.asarray(left))
right = pd.DataFrame(np.asarray(right))
# Left join on column 0 of both frames
joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])
This is what we get for left, right, and joined_left:
left:
   0  1  2
0  1  1  1
1  2  2  2
2  3  3  3
3  9  9  9
4  1  3  2
right:
   0  1  2
0  1  2  2
1  1  2  3
2  3  2  2
3  3  2  9
4  3  2  2
joined_left:
   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0
Now I want the output to be the same size as my left dataframe, and when there is more than one match in the right dataframe I want to bring in only a single random row.
Is there a nice way of doing this using pandas shortcuts?
Thank you!
python pandas dataframe random merge
edited Nov 11 at 1:26 by coldspeed
asked Nov 11 at 0:36 by YohanRoth
1 Answer (accepted)
You can shuffle right and call drop_duplicates(...[, keep='first']) before merging.
right2 = right.sample(frac=1).drop_duplicates(subset=[0])
left.merge(right2, how='left', left_on=[0], right_on=[0])
   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  2    2    2  NaN  NaN
2  3    3    3  2.0  2.0
3  9    9    9  NaN  NaN
4  1    3    2  2.0  2.0
We shuffle right first, and then drop every duplicate except the first row (considering only column 0), which amounts to randomly selecting one row of right for each key value. The output above is one possible result; it can differ between runs.
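As a self-contained sketch of the shuffle-then-deduplicate idea, the snippet below rebuilds the two frames from the question and checks that the result has the same number of rows as left. The random_state=0 seed and the name right_one_per_key are assumptions added here only to make the sketch repeatable and readable; they are not part of the original answer.

```python
import pandas as pd

left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

# Shuffle right (seeded here only so the sketch is repeatable), then keep
# the first surviving row for each value of the key column 0.
right_one_per_key = right.sample(frac=1, random_state=0).drop_duplicates(subset=[0])

# Every key now appears at most once in right, so the left join cannot
# produce more rows than left has.
joined = left.merge(right_one_per_key, how="left", on=0)

print(len(joined) == len(left))
print(joined)
```

Dropping the seed gives a different pick on each run. Because the deduplication happens once per key, every left row that shares a key receives the same right row.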
I see, so you drop duplicates for a merge key column right. Ingenious! Thank you
– YohanRoth
Nov 11 at 0:43
@YohanRoth - in this case - if your first row of the output is 1 1 1 2.0 2.0, I think that guarantees the last row is also 1 3 2 2.0 2.0 since you've dropped 1 2 3. From your question asking for a random choice, I'm a bit concerned that this may not be the behavior you want. Perhaps it's fine, but worth making sure it's consistent with what you want.
– Joel
Nov 11 at 4:47
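If an independent random pick for every left row is wanted instead, one possible variation (not from the answer above, just a sketch) is to tag each left row, do the full one-to-many merge, and then apply the same shuffle-and-deduplicate trick on the tag rather than on the key. The helper column row_id and the seed are hypothetical names and values chosen for illustration.

```python
import pandas as pd

left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

# Tag every left row so it can be identified after the merge.
# "row_id" is a hypothetical helper column, not part of the original data.
tagged = left.assign(row_id=list(range(len(left))))

# Full one-to-many left join: one output row per (left row, matching right row).
merged = tagged.merge(right, how="left", on=0)

picked = (
    merged.sample(frac=1, random_state=0)  # shuffle all candidate matches
    .drop_duplicates(subset="row_id")      # keep one candidate per left row
    .sort_values("row_id")                 # restore the order of left
    .drop(columns="row_id")
    .reset_index(drop=True)
)

print(picked)
```

Here the two left rows with key 1 are sampled separately, so they may end up with different right rows. The cost is that the full merge is materialized first, which can be large when keys repeat heavily on both sides.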
answered Nov 11 at 0:39 by coldspeed