Count match in 2 pandas dataframes

I have 2 dataframes containing text as list in each row. This one is called df

Datum   File    File_type   Text    

Datum                                               

2000-01-27  2000-01-27  0864820040_000127_04.txt    _04     [business, date, jan, heineken, starts, integr..

and i have another one, df_lm which looks like this

List_type   Words

0   LM_cnstrain.    [abide, abiding, bound, bounded, commit, commi...

1   LM_litigius.    [abovementioned, abrogate, abrogated, abrogate...

2   LM_modal_me.    [can, frequently, generally, likely, often, ou...

3   LM_modal_st.    [always, best, clearly, definitely, definitive...

4   LM_modal_wk.    [almost, apparently, appeared, appearing, appe...

I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]

Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this

    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...

2000-01-27  ...   5            3             4

2000-02-25 ...    7            1             0

I hope i was clear on my question.
Thanks in advance!

EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.

The code looked like this:

result_list

for file in file_list:

    count_growth = 0

    for word in text.split ():

        if word in growth:

            count_growth = count_growth +1

    a={'Grwoth':count_growth}

    result_list.append(a)

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48

what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50

Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53

Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56

add a comment |

I have 2 dataframes containing text as list in each row. This one is called df

Datum   File    File_type   Text    

Datum                                               

2000-01-27  2000-01-27  0864820040_000127_04.txt    _04     [business, date, jan, heineken, starts, integr..

and i have another one, df_lm which looks like this

List_type   Words

0   LM_cnstrain.    [abide, abiding, bound, bounded, commit, commi...

1   LM_litigius.    [abovementioned, abrogate, abrogated, abrogate...

2   LM_modal_me.    [can, frequently, generally, likely, often, ou...

3   LM_modal_st.    [always, best, clearly, definitely, definitive...

4   LM_modal_wk.    [almost, apparently, appeared, appearing, appe...

I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]

Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this

    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...

2000-01-27  ...   5            3             4

2000-02-25 ...    7            1             0

I hope i was clear on my question.
Thanks in advance!

EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.

The code looked like this:

result_list

for file in file_list:

    count_growth = 0

    for word in text.split ():

        if word in growth:

            count_growth = count_growth +1

    a={'Grwoth':count_growth}

    result_list.append(a)

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48

what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50

Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53

Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56

add a comment |

I have 2 dataframes containing text as list in each row. This one is called df

Datum   File    File_type   Text    

Datum                                               

2000-01-27  2000-01-27  0864820040_000127_04.txt    _04     [business, date, jan, heineken, starts, integr..

and i have another one, df_lm which looks like this

List_type   Words

0   LM_cnstrain.    [abide, abiding, bound, bounded, commit, commi...

1   LM_litigius.    [abovementioned, abrogate, abrogated, abrogate...

2   LM_modal_me.    [can, frequently, generally, likely, often, ou...

3   LM_modal_st.    [always, best, clearly, definitely, definitive...

4   LM_modal_wk.    [almost, apparently, appeared, appearing, appe...

I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]

Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this

    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...

2000-01-27  ...   5            3             4

2000-02-25 ...    7            1             0

I hope i was clear on my question.
Thanks in advance!

EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.

The code looked like this:

result_list

for file in file_list:

    count_growth = 0

    for word in text.split ():

        if word in growth:

            count_growth = count_growth +1

    a={'Grwoth':count_growth}

    result_list.append(a)

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

I have 2 dataframes containing text as list in each row. This one is called df

Datum   File    File_type   Text    

Datum                                               

2000-01-27  2000-01-27  0864820040_000127_04.txt    _04     [business, date, jan, heineken, starts, integr..

and i have another one, df_lm which looks like this

List_type   Words

0   LM_cnstrain.    [abide, abiding, bound, bounded, commit, commi...

1   LM_litigius.    [abovementioned, abrogate, abrogated, abrogate...

2   LM_modal_me.    [can, frequently, generally, likely, often, ou...

3   LM_modal_st.    [always, best, clearly, definitely, definitive...

4   LM_modal_wk.    [almost, apparently, appeared, appearing, appe...

I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]

Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this

    Datum   ...LM_cnstrain  LM_litigius  Lm_modal_me  ...

2000-01-27  ...   5            3             4

2000-02-25 ...    7            1             0

I hope i was clear on my question.
Thanks in advance!

EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.

The code looked like this:

result_list

for file in file_list:

    count_growth = 0

    for word in text.split ():

        if word in growth:

            count_growth = count_growth +1

    a={'Grwoth':count_growth}

    result_list.append(a)

python python-3.x pandas dataframe

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

edited Nov 21 '18 at 14:15

asked Nov 21 '18 at 14:10

user10395806

356

asked Nov 21 '18 at 14:10

user10395806

356

asked Nov 21 '18 at 14:10

user10395806

356

Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48

what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50

Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53

Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56

add a comment |

Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48

what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50

Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53

Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56

Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?

– Rahul Agarwal
Nov 21 '18 at 14:48

what do you mean by 'Rest df transpose can you take care'?

– user10395806
Nov 21 '18 at 14:50

Addition of columns in the original dataframe...

– Rahul Agarwal
Nov 21 '18 at 14:53

Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...

– user10395806
Nov 21 '18 at 14:56

add a comment |

2 Answers
2

active

oldest

votes

According to my comments you can try something like this:

The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)

desc = df_lm.iloc[0,1]

matches = df.text.isin(desc)

result = df.text[matches]

If this helps you, let me know otherwise will update/delete the answer

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

|
show 2 more comments

So ive come to the following solution:

    for file in file_list:

        count_lm_constraint = 0

        count_lm_litigious = 0

        count_lm_modal_me = 0

          for word in text.split()

        if word in df_lm.iloc[0,1]:

                count_lm_constraint = count_lm_constraint +1 

            if word in df_lm.iloc[1,1]:

                count_lm_litigious = count_lm_litigious +1

            if word in df_lm.iloc[2,1]:

                count_lm_modal_me = count_lm_modal_me +1

            a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}

result_list.append(a)

answered Nov 21 '18 at 18:23

user10395806

356

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53413936%2fcount-match-in-2-pandas-dataframes%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

According to my comments you can try something like this:

The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)

desc = df_lm.iloc[0,1]

matches = df.text.isin(desc)

result = df.text[matches]

If this helps you, let me know otherwise will update/delete the answer

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

|
show 2 more comments

According to my comments you can try something like this:

The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)

desc = df_lm.iloc[0,1]

matches = df.text.isin(desc)

result = df.text[matches]

If this helps you, let me know otherwise will update/delete the answer

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

|
show 2 more comments

According to my comments you can try something like this:

The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)

desc = df_lm.iloc[0,1]

matches = df.text.isin(desc)

result = df.text[matches]

If this helps you, let me know otherwise will update/delete the answer

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

According to my comments you can try something like this:

The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)

desc = df_lm.iloc[0,1]

matches = df.text.isin(desc)

result = df.text[matches]

If this helps you, let me know otherwise will update/delete the answer

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

edited Nov 21 '18 at 15:17

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

answered Nov 21 '18 at 14:57

Rahul Agarwal

2,34051229

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

|
show 2 more comments

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell

– user10395806
Nov 21 '18 at 15:01

Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!

– Rahul Agarwal
Nov 21 '18 at 15:03

i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists

– user10395806
Nov 21 '18 at 15:11

The same example above or when u pass df column to set ?

– Rahul Agarwal
Nov 21 '18 at 15:13

your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)

– user10395806
Nov 21 '18 at 15:14

|
show 2 more comments

So ive come to the following solution:

    for file in file_list:

        count_lm_constraint = 0

        count_lm_litigious = 0

        count_lm_modal_me = 0

          for word in text.split()

        if word in df_lm.iloc[0,1]:

                count_lm_constraint = count_lm_constraint +1 

            if word in df_lm.iloc[1,1]:

                count_lm_litigious = count_lm_litigious +1

            if word in df_lm.iloc[2,1]:

                count_lm_modal_me = count_lm_modal_me +1

            a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}

result_list.append(a)

answered Nov 21 '18 at 18:23

user10395806

356

add a comment |

So ive come to the following solution:

    for file in file_list:

        count_lm_constraint = 0

        count_lm_litigious = 0

        count_lm_modal_me = 0

          for word in text.split()

        if word in df_lm.iloc[0,1]:

                count_lm_constraint = count_lm_constraint +1 

            if word in df_lm.iloc[1,1]:

                count_lm_litigious = count_lm_litigious +1

            if word in df_lm.iloc[2,1]:

                count_lm_modal_me = count_lm_modal_me +1

            a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}

result_list.append(a)

answered Nov 21 '18 at 18:23

user10395806

356

add a comment |

So ive come to the following solution:

    for file in file_list:

        count_lm_constraint = 0

        count_lm_litigious = 0

        count_lm_modal_me = 0

          for word in text.split()

        if word in df_lm.iloc[0,1]:

                count_lm_constraint = count_lm_constraint +1 

            if word in df_lm.iloc[1,1]:

                count_lm_litigious = count_lm_litigious +1

            if word in df_lm.iloc[2,1]:

                count_lm_modal_me = count_lm_modal_me +1

            a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}

result_list.append(a)

answered Nov 21 '18 at 18:23

user10395806

356

So ive come to the following solution:

    for file in file_list:

        count_lm_constraint = 0

        count_lm_litigious = 0

        count_lm_modal_me = 0

          for word in text.split()

        if word in df_lm.iloc[0,1]:

                count_lm_constraint = count_lm_constraint +1 

            if word in df_lm.iloc[1,1]:

                count_lm_litigious = count_lm_litigious +1

            if word in df_lm.iloc[2,1]:

                count_lm_modal_me = count_lm_modal_me +1

            a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}

result_list.append(a)

answered Nov 21 '18 at 18:23

user10395806

356

answered Nov 21 '18 at 18:23

user10395806

356

answered Nov 21 '18 at 18:23

user10395806

356

answered Nov 21 '18 at 18:23

user10395806

356

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Agfdhyk