How to substitute string values in one dataframe based on another “corrections” dataframe without for...












0















Say I have two DataFrames, an "original" and "correction" dataframe such that



orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})

Name Code
A 9q8
B 7k2

corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8


How can I assign orig['Code']=corr['Code'] wherever orig['Name']==corr['Name] without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:



Name Code
A 9R8
B 7k2


I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.



For example, if I merge orig and corr, I get:



out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2


Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8 and A 9R8 was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2.










share|improve this question




















  • 1





    orig.update(corr)

    – Chris
    Nov 19 '18 at 16:46











  • Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

    – David
    Nov 19 '18 at 17:15













  • update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

    – Chris
    Nov 19 '18 at 17:41













  • I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

    – David
    Nov 20 '18 at 17:45






  • 1





    yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

    – Chris
    Nov 20 '18 at 19:05


















0















Say I have two DataFrames, an "original" and "correction" dataframe such that



orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})

Name Code
A 9q8
B 7k2

corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8


How can I assign orig['Code']=corr['Code'] wherever orig['Name']==corr['Name] without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:



Name Code
A 9R8
B 7k2


I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.



For example, if I merge orig and corr, I get:



out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2


Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8 and A 9R8 was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2.










share|improve this question




















  • 1





    orig.update(corr)

    – Chris
    Nov 19 '18 at 16:46











  • Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

    – David
    Nov 19 '18 at 17:15













  • update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

    – Chris
    Nov 19 '18 at 17:41













  • I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

    – David
    Nov 20 '18 at 17:45






  • 1





    yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

    – Chris
    Nov 20 '18 at 19:05
















0












0








0








Say I have two DataFrames, an "original" and "correction" dataframe such that



orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})

Name Code
A 9q8
B 7k2

corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8


How can I assign orig['Code']=corr['Code'] wherever orig['Name']==corr['Name] without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:



Name Code
A 9R8
B 7k2


I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.



For example, if I merge orig and corr, I get:



out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2


Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8 and A 9R8 was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2.










share|improve this question
















Say I have two DataFrames, an "original" and "correction" dataframe such that



orig = pd.DataFrame({'Name':['A','B'],'Code':['9q8','7k2']})

Name Code
A 9q8
B 7k2

corr = pd.DataFrame({'Name':['A',],'Code':['9R8',]})
Name Code
A 9R8


How can I assign orig['Code']=corr['Code'] wherever orig['Name']==corr['Name] without merging or iterating through each correction in corr and then each original in orig? The final result should look like this:



Name Code
A 9R8
B 7k2


I did see another stack overflow post similar to my current question, but in that example the answer used a merge and then a selection mechanism to get rid of the old rows that are no longer desired. However, I have no such selection mechanism.



For example, if I merge orig and corr, I get:



out = pd.merge(orig,corr,on='Name',how='outer')
Name Code
A 9q8
A 9R8
B 7k2


Which, If I'm looking at thousands of rows, leaves me no way to know which of A 9q8 and A 9R8 was the substitution and which wasn't. I can't use an indicator column to tell substitutions from non-substitutions, because deleting all entries that are "left-only" would also remove the entry B 7k2.







python pandas dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 19 '18 at 16:51







David

















asked Nov 19 '18 at 16:30









DavidDavid

316




316








  • 1





    orig.update(corr)

    – Chris
    Nov 19 '18 at 16:46











  • Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

    – David
    Nov 19 '18 at 17:15













  • update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

    – Chris
    Nov 19 '18 at 17:41













  • I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

    – David
    Nov 20 '18 at 17:45






  • 1





    yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

    – Chris
    Nov 20 '18 at 19:05
















  • 1





    orig.update(corr)

    – Chris
    Nov 19 '18 at 16:46











  • Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

    – David
    Nov 19 '18 at 17:15













  • update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

    – Chris
    Nov 19 '18 at 17:41













  • I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

    – David
    Nov 20 '18 at 17:45






  • 1





    yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

    – Chris
    Nov 20 '18 at 19:05










1




1





orig.update(corr)

– Chris
Nov 19 '18 at 16:46





orig.update(corr)

– Chris
Nov 19 '18 at 16:46













Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

– David
Nov 19 '18 at 17:15







Thanks @Chris, I tried this and it works! But how? From the pandas documentation there aren't any examples of column-based substitution, only index-based. Then again, although there aren't examples, it does say "only values at matching index/*column* labels are updated" How can I know when .update() will update based on matching column values and when it will update based on matching index values? It doesn't look like there's an argument you can pass to .update() that allows you to chose index/column

– David
Nov 19 '18 at 17:15















update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

– Chris
Nov 19 '18 at 17:41







update() "Aligns on indices" and updates matching columns in the left dataframe with data from the right

– Chris
Nov 19 '18 at 17:41















I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

– David
Nov 20 '18 at 17:45





I see, so .update() replaces the entries in the left table with those in the right wherever the indices match, index_left==index_right, but only for columns that are in both the left table and the right table, colName_left == colName_right. And, if I want, I can change the index I'm matching on to some other arbitrary column with .set_index().

– David
Nov 20 '18 at 17:45




1




1





yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

– Chris
Nov 20 '18 at 19:05







yes that is correct. You can also specify the columns to update if the names do not match. Data will be updated based on the index: df1['col1'].update(df2['col2']) this will update df1['col1'] with the data from df2['col2'] where the indices match

– Chris
Nov 20 '18 at 19:05














1 Answer
1






active

oldest

votes


















3














orig.set_index('Name',inplace=True)
orig.update(corr.set_index('Name'))

orig.reset_index(inplace=True)
orig
Out[164]:
Name Code
0 A 9R8
1 B 7k2





share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53378961%2fhow-to-substitute-string-values-in-one-dataframe-based-on-another-corrections%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    orig.set_index('Name',inplace=True)
    orig.update(corr.set_index('Name'))

    orig.reset_index(inplace=True)
    orig
    Out[164]:
    Name Code
    0 A 9R8
    1 B 7k2





    share|improve this answer






























      3














      orig.set_index('Name',inplace=True)
      orig.update(corr.set_index('Name'))

      orig.reset_index(inplace=True)
      orig
      Out[164]:
      Name Code
      0 A 9R8
      1 B 7k2





      share|improve this answer




























        3












        3








        3







        orig.set_index('Name',inplace=True)
        orig.update(corr.set_index('Name'))

        orig.reset_index(inplace=True)
        orig
        Out[164]:
        Name Code
        0 A 9R8
        1 B 7k2





        share|improve this answer















        orig.set_index('Name',inplace=True)
        orig.update(corr.set_index('Name'))

        orig.reset_index(inplace=True)
        orig
        Out[164]:
        Name Code
        0 A 9R8
        1 B 7k2






        share|improve this answer














        share|improve this answer



        share|improve this answer








        answered Nov 19 '18 at 16:56


























        community wiki





        Wen-Ben































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53378961%2fhow-to-substitute-string-values-in-one-dataframe-based-on-another-corrections%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Guess what letter conforming each word

            Port of Spain

            Run scheduled task as local user group (not BUILTIN)