How to count matches in a tokenized DataFrame

I have a DataFrame that contains 599 tokenized texts, one per row. I also have these lists:



growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']


I want to create a new column in my DataFrame for each list that counts how often words from that list occur in each text.



I also tried adding the counts to my original DataFrame (without tokenization), but that didn't work either. I had the following code:



%%time

growth = ['growth', 'grow', 'growing', 'grows']
synergies = ['synergies', 'synergy', 'accretive', 'accretion', 'efficiencies', 'efficient', 'efficiently']
intangibles = ['brand', 'branded', 'branding', 'brands', 'goodwill', 'patent', 'patents', 'goodwil']
customers = ['customer', 'customers', 'consumer', 'consumers']
technology = ['technological', 'technologically', 'technologies', 'technology', 'innovate', 'innovation', 'innovations', 'innovative', 'innovator', 'innovators']
human = ['employee', 'employees', 'team', 'teamed', 'teaming', 'teams', 'expertise']
the = 'Wire'

result_list = []
# Counters are initialised once, before looping over the files
count_growth = 0
count_human = 0
count_technology = 0
count_customers = 0
count_intangibles = 0
count_synergies = 0
count_the = 0
for file in file_list:
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

        for word in text.split():
            if word in growth:
                count_growth = count_growth + 1
            if word in synergies:
                count_synergies = count_synergies + 1
            if word in intangibles:
                count_intangibles = count_intangibles + 1
            if word in customers:
                count_customers = count_customers + 1
            if word in technology:
                count_technology = count_technology + 1
            if word in human:
                count_human = count_human + 1
            if word == 'The':
                count_the = count_the + 1
        length = len(text.split())

    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length,
         'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles,
         'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)


The problem is that this produces a running total rather than a separate count for each row, as it does for length.

Thanks in advance for any solutions!

  • What do you mean by "it creates a total sum but not a sum for each row as it does for length"? What would you like to see at the end?

    – FMarazzi
    Nov 20 '18 at 16:49

  • At the end I would like to see, for example, 5 in row 1 of the customers column, 3 in row 2, and so on. Instead, it adds the count from the previous rows to the next one, so if row 1 has 5 occurrences and row 2 has 3, the value in row 2 is 8. Length, on the other hand, is counted for each text separately, e.g. row 1: 394, row 2: 569, not row 2: 394 + 569.

    – user10395806
    Nov 20 '18 at 16:52
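
To illustrate the difference described in this comment, here is a minimal, self-contained sketch (the two example texts are made up, and only the customers list from the question is used). A counter set to 0 once before the loop carries over from row to row, while a counter reset at the start of each iteration gives every row its own value:

```python
# Only the 'customers' list from the question, as a set for fast lookups
customers = {"customer", "customers", "consumer", "consumers"}

# Two made-up example texts standing in for two rows/files
texts = [
    "our customer base and consumer reach grew",
    "consumers like it",
]

# Counter initialised once, outside the loop: values accumulate across rows
running = []
count = 0
for text in texts:
    for word in text.split():
        if word in customers:
            count += 1
    running.append(count)

# Counter reset inside the loop: each row gets its own count
per_row = []
for text in texts:
    count = 0
    for word in text.split():
        if word in customers:
            count += 1
    per_row.append(count)

print(running)  # [2, 3]
print(per_row)  # [2, 1]
```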


python python-3.x pandas

asked Nov 20 '18 at 16:43

user10395806


1 Answer

You simply need to reset the counters inside the for loop. That way it outputs a separate count for each file, as it does for the length.



I hope I understood correctly what you wanted to do.



Code below:



for file in file_list:
    # Reset the counters for every file, so each row gets its own count
    count_growth = 0
    count_human = 0
    count_technology = 0
    count_customers = 0
    count_intangibles = 0
    count_synergies = 0
    count_the = 0
    name = file[len(input_path):]
    date = name[11:17]
    type_1 = name[17:20]
    with open(file, "r", encoding="utf-8", errors="surrogateescape") as rfile:
        # We need to encode/decode as some text files are not in utf-8 format
        text = rfile.read()
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')

        for word in text.split():
            if word in growth:
                count_growth = count_growth + 1
            if word in synergies:
                count_synergies = count_synergies + 1
            if word in intangibles:
                count_intangibles = count_intangibles + 1
            if word in customers:
                count_customers = count_customers + 1
            if word in technology:
                count_technology = count_technology + 1
            if word in human:
                count_human = count_human + 1
            if word == 'The':
                count_the = count_the + 1
        length = len(text.split())
    a = {"File": name, "Text": text, 'the': count_the, 'Datum': date, 'File_type': type_1, 'length': length,
         'growth': count_growth, 'synergies': count_synergies, 'intangibles': count_intangibles,
         'customers': count_customers, 'technology': count_technology, 'human': count_human}
    result_list.append(a)
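
A possible refactor, sketched here rather than taken from the answer: keep the word lists in a dictionary and compute the counts inside a helper function, so a fresh set of counters exists for every file and nothing has to be reset by hand. The helper name count_categories is hypothetical, and the lists are the ones from the question, stored as sets for faster lookups.

```python
from collections import Counter

# Category word lists from the question, stored as sets for fast membership tests
categories = {
    "growth": {"growth", "grow", "growing", "grows"},
    "synergies": {"synergies", "synergy", "accretive", "accretion",
                  "efficiencies", "efficient", "efficiently"},
    "intangibles": {"brand", "branded", "branding", "brands",
                    "goodwill", "patent", "patents", "goodwil"},
    "customers": {"customer", "customers", "consumer", "consumers"},
    "technology": {"technological", "technologically", "technologies",
                   "technology", "innovate", "innovation", "innovations",
                   "innovative", "innovator", "innovators"},
    "human": {"employee", "employees", "team", "teamed", "teaming",
              "teams", "expertise"},
}


def count_categories(words):
    """Return {category: number of words belonging to that category}.

    A new Counter is built on every call, so counts never leak
    from one text (row) into the next.
    """
    word_counts = Counter(words)
    # Counter returns 0 for words that never occur
    return {
        name: sum(word_counts[w] for w in vocab)
        for name, vocab in categories.items()
    }


text = "Growth strategy: grow the brand and grow the team"
# 'Growth' is not matched: the comparison is case-sensitive
print(count_categories(text.split()))
# {'growth': 2, 'synergies': 0, 'intangibles': 1, 'customers': 0, 'technology': 0, 'human': 1}
```

Inside the file loop, the per-file dictionary could then be built with something like a = {"File": name, "Datum": date, "File_type": type_1, "length": length, **count_categories(text.split())}. Like the original code, this splits on whitespace only, so a token such as "growth," (with a trailing comma) would not be counted.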

  • Thanks, that solved it... so clumsy of me. Nevertheless, how would this approach work for the already tokenized text, if I don't want to integrate it into my original DataFrame?

    – user10395806
    Nov 20 '18 at 16:58

  • I think that should be a separate question; I am unsure what you are asking, and we should not discuss too much in the comment section. Please provide examples of the desired output when asking a question.

    – FMarazzi
    Nov 20 '18 at 17:01
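
For the follow-up about the already tokenized DataFrame, one option is sketched below. It assumes each row stores its tokens as a Python list in a column called tokens (a hypothetical name; adjust it to your DataFrame) and uses Series.apply to count, per row, how many tokens fall into each word list. Only three of the question's lists are shown; the others can be added to word_lists in the same way.

```python
import pandas as pd

# Word lists from the question (as sets for fast membership tests)
word_lists = {
    "growth": {"growth", "grow", "growing", "grows"},
    "customers": {"customer", "customers", "consumer", "consumers"},
    "human": {"employee", "employees", "team", "teamed", "teaming",
              "teams", "expertise"},
}

# Hypothetical example data: one tokenized text per row in a 'tokens' column
df = pd.DataFrame({
    "tokens": [
        ["our", "customer", "and", "consumer", "growth"],
        ["the", "team", "will", "grow", "and", "grow"],
    ]
})

# One new column per list; each row counts only its own tokens
for name, vocab in word_lists.items():
    df[name] = df["tokens"].apply(
        lambda tokens: sum(token in vocab for token in tokens)
    )

print(df[["growth", "customers", "human"]])
```

Because each count is computed from that row's own token list, nothing carries over between rows. If the token column was saved to and reloaded from a CSV file, the lists may come back as strings and would need to be parsed into lists first.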

edited Nov 20 '18 at 16:59

answered Nov 20 '18 at 16:55

FMarazzi