Count match in 2 pandas dataframes
I have 2 dataframes containing text as list in each row. This one is called df
Datum File File_type Text
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..
and i have another one, df_lm which looks like this
List_type Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...
I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]
Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this
Datum ...LM_cnstrain LM_litigius Lm_modal_me ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0
I hope i was clear on my question.
Thanks in advance!
EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.
The code looked like this:
result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)
python python-3.x pandas dataframe
add a comment |
I have 2 dataframes containing text as list in each row. This one is called df
Datum File File_type Text
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..
and i have another one, df_lm which looks like this
List_type Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...
I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]
Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this
Datum ...LM_cnstrain LM_litigius Lm_modal_me ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0
I hope i was clear on my question.
Thanks in advance!
EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.
The code looked like this:
result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)
python python-3.x pandas dataframe
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56
add a comment |
I have 2 dataframes containing text as list in each row. This one is called df
Datum File File_type Text
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..
and i have another one, df_lm which looks like this
List_type Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...
I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]
Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this
Datum ...LM_cnstrain LM_litigius Lm_modal_me ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0
I hope i was clear on my question.
Thanks in advance!
EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.
The code looked like this:
result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)
python python-3.x pandas dataframe
I have 2 dataframes containing text as list in each row. This one is called df
Datum File File_type Text
Datum
2000-01-27 2000-01-27 0864820040_000127_04.txt _04 [business, date, jan, heineken, starts, integr..
and i have another one, df_lm which looks like this
List_type Words
0 LM_cnstrain. [abide, abiding, bound, bounded, commit, commi...
1 LM_litigius. [abovementioned, abrogate, abrogated, abrogate...
2 LM_modal_me. [can, frequently, generally, likely, often, ou...
3 LM_modal_st. [always, best, clearly, definitely, definitive...
4 LM_modal_wk. [almost, apparently, appeared, appearing, appe...
I want to create new columns in df, where the match of words should be counted, so for example how many words are there from df_lm.Words[0] in df.Text[0]
Note: df has ca 500 rows and df_lm has 6 -> so i need to create 6 new columns in df so that the updated df looks somewhat like this
Datum ...LM_cnstrain LM_litigius Lm_modal_me ...
2000-01-27 ... 5 3 4
2000-02-25 ... 7 1 0
I hope i was clear on my question.
Thanks in advance!
EDIT:
i have already done smth. similar by creating a list and loop over it, but as the lists in df_lm are very long this is not an option.
The code looked like this:
result_list
for file in file_list:
count_growth = 0
for word in text.split ():
if word in growth:
count_growth = count_growth +1
a={'Grwoth':count_growth}
result_list.append(a)
python python-3.x pandas dataframe
python python-3.x pandas dataframe
edited Nov 21 '18 at 14:15
user10395806
asked Nov 21 '18 at 14:10
user10395806user10395806
356
356
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56
add a comment |
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56
add a comment |
2 Answers
2
active
oldest
votes
According to my comments you can try something like this:
The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)
desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]
If this helps you, let me know otherwise will update/delete the answer
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
|
show 2 more comments
So ive come to the following solution:
for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53413936%2fcount-match-in-2-pandas-dataframes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
According to my comments you can try something like this:
The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)
desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]
If this helps you, let me know otherwise will update/delete the answer
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
|
show 2 more comments
According to my comments you can try something like this:
The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)
desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]
If this helps you, let me know otherwise will update/delete the answer
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
|
show 2 more comments
According to my comments you can try something like this:
The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)
desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]
If this helps you, let me know otherwise will update/delete the answer
According to my comments you can try something like this:
The below code has to run in a loop where text column from 1st df has to be matched with all 6 from next and make column with value from len(c)
desc = df_lm.iloc[0,1]
matches = df.text.isin(desc)
result = df.text[matches]
If this helps you, let me know otherwise will update/delete the answer
edited Nov 21 '18 at 15:17
answered Nov 21 '18 at 14:57
Rahul AgarwalRahul Agarwal
2,34051229
2,34051229
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
|
show 2 more comments
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
doesnt help, because for your a, i have on average a list of 250 words / cell (for 550 rows) and for b i have ca. 5000 entries/cell
– user10395806
Nov 21 '18 at 15:01
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
Doesn't matter, how long the lists are. Instead of "a" and "b" you have to pass your columns!!
– Rahul Agarwal
Nov 21 '18 at 15:03
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
i have missunderstodd you at first but if i try i get error: unhashable type: 'list' maybe i havent been clear enough: my df looks like: 0 ... [apple, pie,coke...] 1 ...[jaw, bone, cricket,...] so if i understand pandas right my column df.Text would be a series of lists
– user10395806
Nov 21 '18 at 15:11
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
The same example above or when u pass df column to set ?
– Rahul Agarwal
Nov 21 '18 at 15:13
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
your example works fine, when i pass in the columns i get the error: i did: a = df.Text b = df_lm.iloc[0,1] c= (set(a)& set(b)) len(c)
– user10395806
Nov 21 '18 at 15:14
|
show 2 more comments
So ive come to the following solution:
for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)
add a comment |
So ive come to the following solution:
for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)
add a comment |
So ive come to the following solution:
for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)
So ive come to the following solution:
for file in file_list:
count_lm_constraint = 0
count_lm_litigious = 0
count_lm_modal_me = 0
for word in text.split()
if word in df_lm.iloc[0,1]:
count_lm_constraint = count_lm_constraint +1
if word in df_lm.iloc[1,1]:
count_lm_litigious = count_lm_litigious +1
if word in df_lm.iloc[2,1]:
count_lm_modal_me = count_lm_modal_me +1
a={"File": name, "Text": text,'lm_uncertain':count_lm_uncertain,'lm_positive':count_lm_positive ....}
result_list.append(a)
answered Nov 21 '18 at 18:23
user10395806user10395806
356
356
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53413936%2fcount-match-in-2-pandas-dataframes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Are you only looking for a solution where you can match two df columns and get a count of elements matched? Rest df transpose can you take care?
– Rahul Agarwal
Nov 21 '18 at 14:48
what do you mean by 'Rest df transpose can you take care'?
– user10395806
Nov 21 '18 at 14:50
Addition of columns in the original dataframe...
– Rahul Agarwal
Nov 21 '18 at 14:53
Guess i know how to add the new columns to orginal dataframe. I would have tried smth linke: df['name_new_column'] = ...
– user10395806
Nov 21 '18 at 14:56