Compare values of a specific column to different files
I have a dataframe called data
it's have 2 columns as follows:
color fruitN
red apple
yellow orange
blue banana
green avocado
Also, I have 3 different files of one columns; each contain list of fruits as follows:
file1
akee
apricot
avocado
file2
avocado
bilberry
banana
blackberry
file3
blackberry
coconut
cranberry
I need to create another dataframe called type
that have # of rows equal the same as data
= 4, and # of columns equal same as the number of files = 3
I need to check each fruit in column fruitN of the dataframe data
if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type
should be like that:
c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0
Then, concatenated with data
to be like this:
color fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0
I'm beginner with python, so any help would be appreciated.
python pandas dataframe concatenation
add a comment |
I have a dataframe called data
it's have 2 columns as follows:
color fruitN
red apple
yellow orange
blue banana
green avocado
Also, I have 3 different files of one columns; each contain list of fruits as follows:
file1
akee
apricot
avocado
file2
avocado
bilberry
banana
blackberry
file3
blackberry
coconut
cranberry
I need to create another dataframe called type
that have # of rows equal the same as data
= 4, and # of columns equal same as the number of files = 3
I need to check each fruit in column fruitN of the dataframe data
if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type
should be like that:
c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0
Then, concatenated with data
to be like this:
color fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0
I'm beginner with python, so any help would be appreciated.
python pandas dataframe concatenation
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40
add a comment |
I have a dataframe called data
it's have 2 columns as follows:
color fruitN
red apple
yellow orange
blue banana
green avocado
Also, I have 3 different files of one columns; each contain list of fruits as follows:
file1
akee
apricot
avocado
file2
avocado
bilberry
banana
blackberry
file3
blackberry
coconut
cranberry
I need to create another dataframe called type
that have # of rows equal the same as data
= 4, and # of columns equal same as the number of files = 3
I need to check each fruit in column fruitN of the dataframe data
if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type
should be like that:
c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0
Then, concatenated with data
to be like this:
color fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0
I'm beginner with python, so any help would be appreciated.
python pandas dataframe concatenation
I have a dataframe called data
it's have 2 columns as follows:
color fruitN
red apple
yellow orange
blue banana
green avocado
Also, I have 3 different files of one columns; each contain list of fruits as follows:
file1
akee
apricot
avocado
file2
avocado
bilberry
banana
blackberry
file3
blackberry
coconut
cranberry
I need to create another dataframe called type
that have # of rows equal the same as data
= 4, and # of columns equal same as the number of files = 3
I need to check each fruit in column fruitN of the dataframe data
if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type
should be like that:
c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0
Then, concatenated with data
to be like this:
color fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0
I'm beginner with python, so any help would be appreciated.
python pandas dataframe concatenation
python pandas dataframe concatenation
edited Nov 20 '18 at 0:49
eyllanesc
78.3k103156
78.3k103156
asked Nov 20 '18 at 0:27
Sara WaslSara Wasl
937
937
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40
add a comment |
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40
add a comment |
3 Answers
3
active
oldest
votes
Using isin
l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)
df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0
add a comment |
I think you can use this:
data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)
That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.
You might need to import numpy
import numpy as pd
add a comment |
At first create the data-frames like:
import pandas as pd
data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
columns=["color", "fruitN"])
data = data.set_index("fruitN")
file_1 = ["akee", "apricot", "avocado"]
file_2 = ["avocado", "bilberry", "banana", "blackberry"]
file_3 = ["blackberry", "coconut", "cranberry"]
file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])
then concatenate them with corresponding axis and set the sort to false:
data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)
then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:
res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
res.reset_index(level=0, inplace=True)
res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]
print(res)
it gives:
color fruitN type_1 type_2 type_3
0 red apple 0.0 0.0 0.0
1 yellow orange 0.0 0.0 0.0
2 blue banana 0.0 1.0 0.0
3 green avocado 1.0 1.0 0.0
I hope this may help you.
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53384545%2fcompare-values-of-a-specific-column-to-different-files%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using isin
l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)
df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0
add a comment |
Using isin
l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)
df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0
add a comment |
Using isin
l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)
df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0
Using isin
l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)
df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0
answered Nov 20 '18 at 1:58
Wen-BenWen-Ben
110k83266
110k83266
add a comment |
add a comment |
I think you can use this:
data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)
That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.
You might need to import numpy
import numpy as pd
add a comment |
I think you can use this:
data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)
That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.
You might need to import numpy
import numpy as pd
add a comment |
I think you can use this:
data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)
That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.
You might need to import numpy
import numpy as pd
I think you can use this:
data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)
That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.
You might need to import numpy
import numpy as pd
edited Nov 20 '18 at 0:49
eyllanesc
78.3k103156
78.3k103156
answered Nov 20 '18 at 0:46
JorgeJorge
1,3131921
1,3131921
add a comment |
add a comment |
At first create the data-frames like:
import pandas as pd
data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
columns=["color", "fruitN"])
data = data.set_index("fruitN")
file_1 = ["akee", "apricot", "avocado"]
file_2 = ["avocado", "bilberry", "banana", "blackberry"]
file_3 = ["blackberry", "coconut", "cranberry"]
file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])
then concatenate them with corresponding axis and set the sort to false:
data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)
then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:
res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
res.reset_index(level=0, inplace=True)
res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]
print(res)
it gives:
color fruitN type_1 type_2 type_3
0 red apple 0.0 0.0 0.0
1 yellow orange 0.0 0.0 0.0
2 blue banana 0.0 1.0 0.0
3 green avocado 1.0 1.0 0.0
I hope this may help you.
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
add a comment |
At first create the data-frames like:
import pandas as pd
data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
columns=["color", "fruitN"])
data = data.set_index("fruitN")
file_1 = ["akee", "apricot", "avocado"]
file_2 = ["avocado", "bilberry", "banana", "blackberry"]
file_3 = ["blackberry", "coconut", "cranberry"]
file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])
then concatenate them with corresponding axis and set the sort to false:
data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)
then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:
res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
res.reset_index(level=0, inplace=True)
res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]
print(res)
it gives:
color fruitN type_1 type_2 type_3
0 red apple 0.0 0.0 0.0
1 yellow orange 0.0 0.0 0.0
2 blue banana 0.0 1.0 0.0
3 green avocado 1.0 1.0 0.0
I hope this may help you.
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
add a comment |
At first create the data-frames like:
import pandas as pd
data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
columns=["color", "fruitN"])
data = data.set_index("fruitN")
file_1 = ["akee", "apricot", "avocado"]
file_2 = ["avocado", "bilberry", "banana", "blackberry"]
file_3 = ["blackberry", "coconut", "cranberry"]
file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])
then concatenate them with corresponding axis and set the sort to false:
data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)
then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:
res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
res.reset_index(level=0, inplace=True)
res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]
print(res)
it gives:
color fruitN type_1 type_2 type_3
0 red apple 0.0 0.0 0.0
1 yellow orange 0.0 0.0 0.0
2 blue banana 0.0 1.0 0.0
3 green avocado 1.0 1.0 0.0
I hope this may help you.
At first create the data-frames like:
import pandas as pd
data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
columns=["color", "fruitN"])
data = data.set_index("fruitN")
file_1 = ["akee", "apricot", "avocado"]
file_2 = ["avocado", "bilberry", "banana", "blackberry"]
file_3 = ["blackberry", "coconut", "cranberry"]
file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])
then concatenate them with corresponding axis and set the sort to false:
data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)
then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:
res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
res.reset_index(level=0, inplace=True)
res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]
print(res)
it gives:
color fruitN type_1 type_2 type_3
0 red apple 0.0 0.0 0.0
1 yellow orange 0.0 0.0 0.0
2 blue banana 0.0 1.0 0.0
3 green avocado 1.0 1.0 0.0
I hope this may help you.
answered Nov 20 '18 at 2:31
saeed heidarisaeed heidari
1644
1644
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
add a comment |
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
It's too specific
– Sara Wasl
Nov 20 '18 at 12:48
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53384545%2fcompare-values-of-a-specific-column-to-different-files%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.
– teng
Nov 20 '18 at 0:38
I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng
– Sara Wasl
Nov 20 '18 at 0:40