Count match in 2 pandas dataframes












1















I have 2 dataframes containing text as list in each row. This one is called df



Datum   File    File_type   Text    
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..


and i have another one, df_lm which looks like this



List_type   Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...


I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]



Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this



    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0


I hope i was clear on my question.
Thanks in advance!



EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.



The code looked like this:



result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)









share|improve this question

























  • Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

    – Rahul Agarwal
    Nov 21 '18 at 14:48











  • what do you mean by 'Rest df transpose can you take care'?

    – user10395806
    Nov 21 '18 at 14:50











  • Addition of columns in the original dataframe...

    – Rahul Agarwal
    Nov 21 '18 at 14:53











  • Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

    – user10395806
    Nov 21 '18 at 14:56


















1















I have 2 dataframes containing text as list in each row. This one is called df



Datum   File    File_type   Text    
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..


and i have another one, df_lm which looks like this



List_type   Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...


I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]



Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this



    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0


I hope i was clear on my question.
Thanks in advance!



EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.



The code looked like this:



result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)









share|improve this question

























  • Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

    – Rahul Agarwal
    Nov 21 '18 at 14:48











  • what do you mean by 'Rest df transpose can you take care'?

    – user10395806
    Nov 21 '18 at 14:50











  • Addition of columns in the original dataframe...

    – Rahul Agarwal
    Nov 21 '18 at 14:53











  • Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

    – user10395806
    Nov 21 '18 at 14:56
















1












1








1








I have 2 dataframes containing text as list in each row. This one is called df



Datum   File    File_type   Text    
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..


and i have another one, df_lm which looks like this



List_type   Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...


I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]



Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this



    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0


I hope i was clear on my question.
Thanks in advance!



EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.



The code looked like this:



result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)









share|improve this question
















I have 2 dataframes containing text as list in each row. This one is called df



Datum   File    File_type   Text    
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..


and i have another one, df_lm which looks like this



List_type   Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...


I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]



Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this



    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0


I hope i was clear on my question.
Thanks in advance!



EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.



The code looked like this:



result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)






python python-3.x pandas dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 '18 at 14:15







user10395806

















asked Nov 21 '18 at 14:10









user10395806user10395806

356




356













  • Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

    – Rahul Agarwal
    Nov 21 '18 at 14:48











  • what do you mean by 'Rest df transpose can you take care'?

    – user10395806
    Nov 21 '18 at 14:50











  • Addition of columns in the original dataframe...

    – Rahul Agarwal
    Nov 21 '18 at 14:53











  • Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

    – user10395806
    Nov 21 '18 at 14:56





















  • Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

    – Rahul Agarwal
    Nov 21 '18 at 14:48











  • what do you mean by 'Rest df transpose can you take care'?

    – user10395806
    Nov 21 '18 at 14:50











  • Addition of columns in the original dataframe...

    – Rahul Agarwal
    Nov 21 '18 at 14:53











  • Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

    – user10395806
    Nov 21 '18 at 14:56



















Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48





Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48













what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50





what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50













Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53





Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53













Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56







Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56














2 Answers
2






active

oldest

votes


















1














According to my comments you can try something like this:



The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)



desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]


If this helps you, let me know otherwise will update/delete the answer






share|improve this answer


























  • doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

    – user10395806
    Nov 21 '18 at 15:01











  • Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

    – Rahul Agarwal
    Nov 21 '18 at 15:03













  • i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

    – user10395806
    Nov 21 '18 at 15:11













  • The same example above or when u pass df column to set ?

    – Rahul Agarwal
    Nov 21 '18 at 15:13











  • your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

    – user10395806
    Nov 21 '18 at 15:14





















0














So ive come to the following solution:



    for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53413936%2fcount-match-in-2-pandas-dataframes%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    According to my comments you can try something like this:



    The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)



    desc = df_lm.iloc[0,1]
    matches = df.text.isin(desc)
    result = df.text[matches]


    If this helps you, let me know otherwise will update/delete the answer






    share|improve this answer


























    • doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

      – user10395806
      Nov 21 '18 at 15:01











    • Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

      – Rahul Agarwal
      Nov 21 '18 at 15:03













    • i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

      – user10395806
      Nov 21 '18 at 15:11













    • The same example above or when u pass df column to set ?

      – Rahul Agarwal
      Nov 21 '18 at 15:13











    • your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

      – user10395806
      Nov 21 '18 at 15:14


















    1














    According to my comments you can try something like this:



    The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)



    desc = df_lm.iloc[0,1]
    matches = df.text.isin(desc)
    result = df.text[matches]


    If this helps you, let me know otherwise will update/delete the answer






    share|improve this answer


























    • doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

      – user10395806
      Nov 21 '18 at 15:01











    • Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

      – Rahul Agarwal
      Nov 21 '18 at 15:03













    • i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

      – user10395806
      Nov 21 '18 at 15:11













    • The same example above or when u pass df column to set ?

      – Rahul Agarwal
      Nov 21 '18 at 15:13











    • your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

      – user10395806
      Nov 21 '18 at 15:14
















    1












    1








    1







    According to my comments you can try something like this:



    The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)



    desc = df_lm.iloc[0,1]
    matches = df.text.isin(desc)
    result = df.text[matches]


    If this helps you, let me know otherwise will update/delete the answer






    share|improve this answer















    According to my comments you can try something like this:



    The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)



    desc = df_lm.iloc[0,1]
    matches = df.text.isin(desc)
    result = df.text[matches]


    If this helps you, let me know otherwise will update/delete the answer







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 21 '18 at 15:17

























    answered Nov 21 '18 at 14:57









    Rahul AgarwalRahul Agarwal

    2,34051229




    2,34051229













    • doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

      – user10395806
      Nov 21 '18 at 15:01











    • Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

      – Rahul Agarwal
      Nov 21 '18 at 15:03













    • i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

      – user10395806
      Nov 21 '18 at 15:11













    • The same example above or when u pass df column to set ?

      – Rahul Agarwal
      Nov 21 '18 at 15:13











    • your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

      – user10395806
      Nov 21 '18 at 15:14





















    • doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

      – user10395806
      Nov 21 '18 at 15:01











    • Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

      – Rahul Agarwal
      Nov 21 '18 at 15:03













    • i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

      – user10395806
      Nov 21 '18 at 15:11













    • The same example above or when u pass df column to set ?

      – Rahul Agarwal
      Nov 21 '18 at 15:13











    • your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

      – user10395806
      Nov 21 '18 at 15:14



















    doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

    – user10395806
    Nov 21 '18 at 15:01





    doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

    – user10395806
    Nov 21 '18 at 15:01













    Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

    – Rahul Agarwal
    Nov 21 '18 at 15:03







    Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

    – Rahul Agarwal
    Nov 21 '18 at 15:03















    i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

    – user10395806
    Nov 21 '18 at 15:11







    i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

    – user10395806
    Nov 21 '18 at 15:11















    The same example above or when u pass df column to set ?

    – Rahul Agarwal
    Nov 21 '18 at 15:13





    The same example above or when u pass df column to set ?

    – Rahul Agarwal
    Nov 21 '18 at 15:13













    your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

    – user10395806
    Nov 21 '18 at 15:14







    your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

    – user10395806
    Nov 21 '18 at 15:14















    0














    So ive come to the following solution:



        for file in file_list:
    count_lm_constraint = 0
    count_lm_litigious = 0
    count_lm_modal_me = 0
    for word in text.split()
    if word in df_lm.iloc[0,1]:
    count_lm_constraint = count_lm_constraint +1
    if word in df_lm.iloc[1,1]:
    count_lm_litigious = count_lm_litigious +1
    if word in df_lm.iloc[2,1]:
    count_lm_modal_me = count_lm_modal_me +1
    a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
    result_list.append(a)





    share|improve this answer




























      0














      So ive come to the following solution:



          for file in file_list:
      count_lm_constraint = 0
      count_lm_litigious = 0
      count_lm_modal_me = 0
      for word in text.split()
      if word in df_lm.iloc[0,1]:
      count_lm_constraint = count_lm_constraint +1
      if word in df_lm.iloc[1,1]:
      count_lm_litigious = count_lm_litigious +1
      if word in df_lm.iloc[2,1]:
      count_lm_modal_me = count_lm_modal_me +1
      a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
      result_list.append(a)





      share|improve this answer


























        0












        0








        0







        So ive come to the following solution:



            for file in file_list:
        count_lm_constraint = 0
        count_lm_litigious = 0
        count_lm_modal_me = 0
        for word in text.split()
        if word in df_lm.iloc[0,1]:
        count_lm_constraint = count_lm_constraint +1
        if word in df_lm.iloc[1,1]:
        count_lm_litigious = count_lm_litigious +1
        if word in df_lm.iloc[2,1]:
        count_lm_modal_me = count_lm_modal_me +1
        a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
        result_list.append(a)





        share|improve this answer













        So ive come to the following solution:



            for file in file_list:
        count_lm_constraint = 0
        count_lm_litigious = 0
        count_lm_modal_me = 0
        for word in text.split()
        if word in df_lm.iloc[0,1]:
        count_lm_constraint = count_lm_constraint +1
        if word in df_lm.iloc[1,1]:
        count_lm_litigious = count_lm_litigious +1
        if word in df_lm.iloc[2,1]:
        count_lm_modal_me = count_lm_modal_me +1
        a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
        result_list.append(a)






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 21 '18 at 18:23









        user10395806user10395806

        356




        356






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53413936%2fcount-match-in-2-pandas-dataframes%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Guess what letter conforming each word

            Run scheduled task as local user group (not BUILTIN)

            Port of Spain