How to count matches in a tokenized dataframe
I have a dataframe that contains 599 tokenized texts, one per row. I also have these lists:
growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']
I want to create a new column in my dataframe for each list, holding the number of times words from that list occur in each text.
I also tried to add the counts to my original dataframe (without tokenization), but that didn't work either. I had the following code:
%%time
growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']
the = 'Wire'
# file_list and input_path are defined earlier (not shown)
result_list = []
count_growth = 0
count_human = 0
count_technology = 0
count_customers = 0
count_intangibles = 0
count_synergies = 0
count_the = 0
for file in file_list:
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')
    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1
    length = len(text.split())
    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date,
         'File_type': type_1, 'length': length, 'growth': count_growth,
         'synergies': count_synergies, 'intangibles': count_intangibles,
         'customers': count_customers, 'technology': count_technology,
         'human': count_human}
    result_list.append(a)
The problem is that this produces a running total across all files rather than a separate count for each row, as it does for length.
Thanks in advance for any solutions!
python python-3.x pandas
asked Nov 20 '18 at 16:43
user10395806
What do you mean by "it creates a total sum but not a sum for each row as it does for length"? What would you like to see at the end?
– FMarazzi
Nov 20 '18 at 16:49
In the end I would like to see, for example, in the customers column, row 1: 5, row 2: 3, and so on. Instead, it carries over the count from the previous rows and adds the next row's count to it, so if there are 5 appearances in row 1 and 3 in row 2, the value in row 2 is 8. In addition, it counts the length of each text separately, e.g. row 1: 394, row 2: 569, and not row 2: 394 + 569.
– user10395806
Nov 20 '18 at 16:52
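The difference between a running total and a per-row count comes down to where the counter is initialised. A minimal sketch of that behaviour, using two made-up texts and a shortened `customers` set (illustrative data, not the asker's files):

```python
# Hypothetical example data, chosen to reproduce the 5 / 3 / 8 numbers above.
customers = {"customer", "customers", "consumer", "consumers"}
texts = [
    "customer customers consumer consumer customer",  # 5 matches
    "consumers like the customer and the customers",  # 3 matches
]

# Counter initialised once, before the loop: values accumulate across texts.
running = []
count = 0
for text in texts:
    for word in text.split():
        if word in customers:
            count += 1
    running.append(count)

# Counter reset at the top of each iteration: one independent value per text.
per_text = []
for text in texts:
    count = 0
    for word in text.split():
        if word in customers:
            count += 1
    per_text.append(count)

print(running)   # [5, 8]
print(per_text)  # [5, 3]
```

The length column does not show this effect because `length` is reassigned from scratch for every file instead of being incremented.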
1 Answer
You simply need to reset the counter variables inside the for loop. That way it outputs a separate count for each file, as it already does for the length.
I hope I understood correctly what you wanted to do.
Code below:
for file in file_list:
    # Reset every counter at the start of each file
    count_growth = 0
    count_human = 0
    count_technology = 0
    count_customers = 0
    count_intangibles = 0
    count_synergies = 0
    count_the = 0
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')
    for word in text.split():
        if word in growth:
            count_growth = count_growth + 1
        if word in synergies:
            count_synergies = count_synergies + 1
        if word in intangibles:
            count_intangibles = count_intangibles + 1
        if word in customers:
            count_customers = count_customers + 1
        if word in technology:
            count_technology = count_technology + 1
        if word in human:
            count_human = count_human + 1
        if word == 'The':
            count_the = count_the + 1
    length = len(text.split())
    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date,
         'File_type': type_1, 'length': length, 'growth': count_growth,
         'synergies': count_synergies, 'intangibles': count_intangibles,
         'customers': count_customers, 'technology': count_technology,
         'human': count_human}
    result_list.append(a)
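As a possible variation (a sketch, not part of the answer's code), the chain of `if` statements and separate counter variables could be replaced by a dictionary of word sets and a helper that builds a fresh result for each text, so nothing can carry over between files. The name `count_categories` and the shortened `WORD_LISTS` are assumptions made for illustration:

```python
from collections import Counter

# Hypothetical, shortened category lists; sets give fast membership tests.
WORD_LISTS = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
}


def count_categories(text, word_lists=WORD_LISTS):
    """Return a fresh {category: count} dict for a single text."""
    word_counts = Counter(text.split())  # exact, case-sensitive tokens
    return {
        name: sum(word_counts[w] for w in words)
        for name, words in word_lists.items()
    }


row = count_categories("customers grow when growth reaches each customer")
print(row)  # {'growth': 2, 'customers': 2}
```

Inside the file loop, the returned dictionary could be merged into the per-file record, for example with `a.update(count_categories(text))`, before appending it to `result_list`.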
edited Nov 20 '18 at 16:59
answered Nov 20 '18 at 16:55
FMarazzi
Thanks, that solved it... so clumsy of me. Nevertheless, how would the approach work for the already tokenized text, if I don't want to integrate this into my original dataframe?
– user10395806
Nov 20 '18 at 16:58
I think that should be another question. I am unsure what you are asking, and we should not discuss too much in the comment section. Please provide examples of the desired output when asking a question.
– FMarazzi
Nov 20 '18 at 17:01
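For the already tokenized dataframe, one possible approach is sketched below. It assumes the tokens are stored as Python lists in a column called `tokens` (a hypothetical name, since the thread does not show the dataframe) and uses shortened word lists. Lowercasing each token before the lookup is also an assumption, not something the original code does.

```python
import pandas as pd

# Hypothetical data: one list of tokens per row in a column named "tokens".
df = pd.DataFrame({
    "tokens": [
        ["our", "customers", "value", "growth", "and", "innovation"],
        ["the", "team", "serves", "every", "customer"],
    ]
})

# Shortened, illustrative word lists stored as sets for fast lookups.
word_lists = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
    "human": {"employee", "employees", "team", "teams", "expertise"},
}

# One new column per list; each cell counts that row's tokens found in the set.
for name, words in word_lists.items():
    df[name] = df["tokens"].apply(
        lambda tokens, words=words: sum(token.lower() in words for token in tokens)
    )

print(df[["growth", "customers", "human"]])
```

Because `apply` evaluates each row independently, every cell holds the count for its own text only, which is the per-row behaviour asked for in the question.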