How to substitute string values in one dataframe based on another “corrections” dataframe without for...
Say I have two DataFrames, an "original" and "correction" dataframe such that
orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})
Name Code
A 9q8
B 7k2
corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8
How can I assign orig['Code']=corr['Code']
wherever orig['Name']==corr['Name]
without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:
Name Code
A 9R8
B 7k2
I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.
For example, if I merge orig and corr, I get:
out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2
Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8
and A 9R8
was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2
.
python pandas dataframe
add a comment |
Say I have two DataFrames, an "original" and "correction" dataframe such that
orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})
Name Code
A 9q8
B 7k2
corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8
How can I assign orig['Code']=corr['Code']
wherever orig['Name']==corr['Name]
without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:
Name Code
A 9R8
B 7k2
I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.
For example, if I merge orig and corr, I get:
out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2
Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8
and A 9R8
was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2
.
python pandas dataframe
1
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to.update()
that allows you to chose index/column
– David
Nov 19 '18 at 17:15
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right
– Chris
Nov 19 '18 at 17:41
I see, so.update()
replaces the entries in the left table with those in the right wherever the indices match,index_left==index_right
, but only for columns that are in both the left table and the right table,colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with.set_index()
.
– David
Nov 20 '18 at 17:45
1
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:df1['col1'].update(df2['col2'])
this will updatedf1['col1']
with the data fromdf2['col2']
where the indices match
– Chris
Nov 20 '18 at 19:05
add a comment |
Say I have two DataFrames, an "original" and "correction" dataframe such that
orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})
Name Code
A 9q8
B 7k2
corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8
How can I assign orig['Code']=corr['Code']
wherever orig['Name']==corr['Name]
without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:
Name Code
A 9R8
B 7k2
I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.
For example, if I merge orig and corr, I get:
out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2
Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8
and A 9R8
was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2
.
python pandas dataframe
Say I have two DataFrames, an "original" and "correction" dataframe such that
orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})
Name Code
A 9q8
B 7k2
corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8
How can I assign orig['Code']=corr['Code']
wherever orig['Name']==corr['Name]
without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:
Name Code
A 9R8
B 7k2
I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.
For example, if I merge orig and corr, I get:
out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2
Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8
and A 9R8
was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2
.
python pandas dataframe
python pandas dataframe
edited Nov 19 '18 at 16:51
David
asked Nov 19 '18 at 16:30
DavidDavid
316
316
1
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to.update()
that allows you to chose index/column
– David
Nov 19 '18 at 17:15
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right
– Chris
Nov 19 '18 at 17:41
I see, so.update()
replaces the entries in the left table with those in the right wherever the indices match,index_left==index_right
, but only for columns that are in both the left table and the right table,colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with.set_index()
.
– David
Nov 20 '18 at 17:45
1
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:df1['col1'].update(df2['col2'])
this will updatedf1['col1']
with the data fromdf2['col2']
where the indices match
– Chris
Nov 20 '18 at 19:05
add a comment |
1
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to.update()
that allows you to chose index/column
– David
Nov 19 '18 at 17:15
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right
– Chris
Nov 19 '18 at 17:41
I see, so.update()
replaces the entries in the left table with those in the right wherever the indices match,index_left==index_right
, but only for columns that are in both the left table and the right table,colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with.set_index()
.
– David
Nov 20 '18 at 17:45
1
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:df1['col1'].update(df2['col2'])
this will updatedf1['col1']
with the data fromdf2['col2']
where the indices match
– Chris
Nov 20 '18 at 19:05
1
1
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when
.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update()
that allows you to chose index/column– David
Nov 19 '18 at 17:15
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when
.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update()
that allows you to chose index/column– David
Nov 19 '18 at 17:15
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right– Chris
Nov 19 '18 at 17:41
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right– Chris
Nov 19 '18 at 17:41
I see, so
.update()
replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right
, but only for columns that are in both the left table and the right table, colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index()
.– David
Nov 20 '18 at 17:45
I see, so
.update()
replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right
, but only for columns that are in both the left table and the right table, colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index()
.– David
Nov 20 '18 at 17:45
1
1
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:
df1['col1'].update(df2['col2'])
this will update df1['col1']
with the data from df2['col2']
where the indices match– Chris
Nov 20 '18 at 19:05
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:
df1['col1'].update(df2['col2'])
this will update df1['col1']
with the data from df2['col2']
where the indices match– Chris
Nov 20 '18 at 19:05
add a comment |
1 Answer
1
active
oldest
votes
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))
orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53378961%2fhow-to-substitute-string-values-in-one-dataframe-based-on-another-corrections%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))
orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2
add a comment |
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))
orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2
add a comment |
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))
orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2
orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))
orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2
answered Nov 19 '18 at 16:56
community wiki
Wen-Ben
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53378961%2fhow-to-substitute-string-values-in-one-dataframe-based-on-another-corrections%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
orig.update(corr)
– Chris
Nov 19 '18 at 16:46
Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when
.update()
will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to.update()
that allows you to chose index/column– David
Nov 19 '18 at 17:15
update()
"Aligns on indices" and updates matching columns in the left dataframe with data from the right– Chris
Nov 19 '18 at 17:41
I see, so
.update()
replaces the entries in the left table with those in the right wherever the indices match,index_left==index_right
, but only for columns that are in both the left table and the right table,colName_left == colName_right
. And, if I want, I can change the index I'm matching on to some other arbitrary column with.set_index()
.– David
Nov 20 '18 at 17:45
1
yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index:
df1['col1'].update(df2['col2'])
this will updatedf1['col1']
with the data fromdf2['col2']
where the indices match– Chris
Nov 20 '18 at 19:05