Pandas merge handling duplicates in join output


Is there a clean way to bring in only one row, preferably a random one, when a left join in pandas finds one-to-many matches?

For example:

import numpy as np
import pandas as pd

left = [[1,1,1], [2,2,2], [3,3,3], [9,9,9], [1,3,2]]
right = [[1,2,2], [1,2,3], [3,2,2], [3,2,9], [3,2,2]]

left = np.asarray(left)
right = np.asarray(right)

left = pd.DataFrame(left)
right = pd.DataFrame(right)

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

This is what we get for left, right, and joined_left, respectively:

left
   0  1  2
0  1  1  1
1  2  2  2
2  3  3  3
3  9  9  9
4  1  3  2

right
   0  1  2
0  1  2  2
1  1  2  3
2  3  2  2
3  3  2  9
4  3  2  2

joined_left
   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0
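The 9 rows of joined_left follow directly from the one-to-many matches: each left row appears once per matching right row, or once with NaNs if nothing matches. A minimal sketch that predicts the size of the left join from per-key match counts (it builds the same frames directly from lists, which is an assumption that does not change the data):

```python
import pandas as pd

left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

# How many rows of `right` share each join key (column 0).
matches_per_key = right[0].value_counts()

# A left join emits one output row per match, or a single NaN-filled row
# when the key has no match at all (hence fillna(1)).
per_left_row = left[0].map(matches_per_key).fillna(1).astype(int)

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

print(per_left_row.tolist())                 # [2, 1, 3, 1, 2]
print(per_left_row.sum(), len(joined_left))  # 9 9
```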

I want the output to have the same number of rows as my left DataFrame: when there is more than one match in the right DataFrame, I want to bring in only a single, randomly chosen row.

Is there a neat way to do this with pandas shortcuts?

Thank you!

edited Nov 11 at 1:26 by coldspeed

asked Nov 11 at 0:36 by YohanRoth

Tags: python, pandas, dataframe, random, merge


1 Answer

Accepted answer

You can shuffle right and then call drop_duplicates on the join key (with the default keep='first') before merging.

right2 = right.sample(frac=1).drop_duplicates(subset=[0])
left.merge(right2, how='left', left_on=[0], right_on=[0])

One possible result (the rows picked change from run to run):

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  2    2    2  NaN  NaN
2  3    3    3  2.0  2.0
3  9    9    9  NaN  NaN
4  1    3    2  2.0  2.0

We shuffle right first and then drop every duplicate except the first row (considering only column 0, the join key), which amounts to randomly selecting one row per key.
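A minimal, reproducible sketch of the same shuffle-then-deduplicate idea. The random_state=0 seed is an arbitrary assumption added only so repeated runs pick the same rows, and validate="many_to_one" is an optional merge check (not in the original answer) that raises if right2 still contains duplicate keys:

```python
import pandas as pd

left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

# Shuffle all rows of `right`; the seed value is arbitrary and only
# makes the run repeatable.
shuffled = right.sample(frac=1, random_state=0)

# Keep the first row seen for each join key, i.e. one random match per key.
right2 = shuffled.drop_duplicates(subset=[0], keep="first")

# validate="many_to_one" makes pandas raise if `right2` still has
# duplicate keys, so the result cannot grow beyond len(left).
result = left.merge(right2, how="left", left_on=[0], right_on=[0],
                    validate="many_to_one")
print(result)
```

Drop random_state to get a different pick on each run. Note that both left rows with key 1 always receive the same right row here, because deduplication happens before the merge.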

answered Nov 11 at 0:39 by coldspeed


I see, so you drop duplicates on the merge-key column of right. Ingenious! Thank you.
– YohanRoth, Nov 11 at 0:43

@YohanRoth In this case, if the first row of the output is 1 1 1 2.0 2.0, I think that guarantees the last row is 1 3 2 2.0 2.0 as well, since you've dropped 1 2 3. Because your question asks for a random choice, I'm a bit concerned this may not be the behavior you want. Perhaps it's fine, but it's worth making sure it's consistent with what you want.
– Joel, Nov 11 at 4:47
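If each left row should get its own independent random match, as Joel's comment suggests may be wanted, one option (not part of the accepted answer) is to do the full join first and then keep one random row per original left row. A sketch under these assumptions: left_row is a hypothetical helper column name, and random_state=0 is an arbitrary seed for repeatability:

```python
import pandas as pd

left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

# Tag each left row with its position; "left_row" is a hypothetical helper
# column name chosen so it does not clash with the existing labels.
tagged = left.reset_index().rename(columns={"index": "left_row"})

# Full one-to-many join first: every left row keeps all of its matches.
joined = tagged.merge(right, how="left", left_on=[0], right_on=[0])

# Shuffle the joined rows, keep one row per original left row,
# then restore the original order and drop the helper column.
one_per_row = (
    joined.sample(frac=1, random_state=0)
    .drop_duplicates(subset="left_row")
    .sort_values("left_row")
    .drop(columns="left_row")
    .reset_index(drop=True)
)
print(one_per_row)
```

With this variant the two left rows with key 1 can end up with different right rows, at the cost of materializing the full one-to-many join first.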
