Compare values of a specific column to different files

I have a dataframe called data it's have 2 columns as follows:

color  fruitN

red    apple

yellow orange

blue   banana

green  avocado

Also, I have 3 different files of one columns; each contain list of fruits as follows:

file1

akee

apricot

avocado



file2

avocado

bilberry

banana

blackberry



file3

blackberry

coconut

cranberry

I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3

I need to check each fruit in column fruitN of the dataframe data if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type should be like that:

Then, concatenated with data to be like this:

color  fruitN c1 c2 c3

red    apple  0  0  0

yellow orange 0  0  0

blue   banana 0  1  0

green  avocado 1 1  0

I'm beginner with python, so any help would be appreciated.

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38

I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40

add a comment |

I have a dataframe called data it's have 2 columns as follows:

color  fruitN

red    apple

yellow orange

blue   banana

green  avocado

Also, I have 3 different files of one columns; each contain list of fruits as follows:

file1

akee

apricot

avocado



file2

avocado

bilberry

banana

blackberry



file3

blackberry

coconut

cranberry

I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3

Then, concatenated with data to be like this:

color  fruitN c1 c2 c3

red    apple  0  0  0

yellow orange 0  0  0

blue   banana 0  1  0

green  avocado 1 1  0

I'm beginner with python, so any help would be appreciated.

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38

I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40

add a comment |

I have a dataframe called data it's have 2 columns as follows:

color  fruitN

red    apple

yellow orange

blue   banana

green  avocado

Also, I have 3 different files of one columns; each contain list of fruits as follows:

file1

akee

apricot

avocado



file2

avocado

bilberry

banana

blackberry



file3

blackberry

coconut

cranberry

I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3

Then, concatenated with data to be like this:

color  fruitN c1 c2 c3

red    apple  0  0  0

yellow orange 0  0  0

blue   banana 0  1  0

green  avocado 1 1  0

I'm beginner with python, so any help would be appreciated.

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

I have a dataframe called data it's have 2 columns as follows:

color  fruitN

red    apple

yellow orange

blue   banana

green  avocado

Also, I have 3 different files of one columns; each contain list of fruits as follows:

file1

akee

apricot

avocado



file2

avocado

bilberry

banana

blackberry



file3

blackberry

coconut

cranberry

I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3

Then, concatenated with data to be like this:

color  fruitN c1 c2 c3

red    apple  0  0  0

yellow orange 0  0  0

blue   banana 0  1  0

green  avocado 1 1  0

I'm beginner with python, so any help would be appreciated.

python pandas dataframe concatenation

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

asked Nov 20 '18 at 0:27

Sara Wasl

937

asked Nov 20 '18 at 0:27

Sara Wasl

937

asked Nov 20 '18 at 0:27

Sara Wasl

937

what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38

I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40

add a comment |

what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38

I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40

what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38

I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40

add a comment |

3 Answers
3

active

oldest

votes

Using isin

l=[f1,f2,f3]

for x,y in enumerate(l):

    df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)



df

Out[144]: 

    color   fruitN  c1  c2  c3

0     red    apple   0   0   0

1  yellow   orange   0   0   0

2    blue   banana   0   1   0

3   green  avocado   1   1   0

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

add a comment |

I think you can use this:

data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)

That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.

You might need to import numpy

import numpy as pd

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

add a comment |

At first create the data-frames like:

import pandas as pd

data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],

                    columns=["color", "fruitN"])

data = data.set_index("fruitN")

file_1 = ["akee", "apricot", "avocado"]

file_2 = ["avocado", "bilberry", "banana", "blackberry"]

file_3 = ["blackberry", "coconut", "cranberry"]

file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])

file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])

file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])

then concatenate them with corresponding axis and set the sort to false:

data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)

then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:

res = data_concat.loc[["apple", "orange", "banana", "avocado"]]

res.reset_index(level=0, inplace=True)

res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]

res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]



print(res)

it gives:

    color   fruitN  type_1  type_2  type_3

0     red    apple     0.0     0.0     0.0

1  yellow   orange     0.0     0.0     0.0

2    blue   banana     0.0     1.0     0.0

3   green  avocado     1.0     1.0     0.0

I hope this may help you.

answered Nov 20 '18 at 2:31

saeed heidari

1644

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53384545%2fcompare-values-of-a-specific-column-to-different-files%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Using isin

l=[f1,f2,f3]

for x,y in enumerate(l):

    df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)



df

Out[144]: 

    color   fruitN  c1  c2  c3

0     red    apple   0   0   0

1  yellow   orange   0   0   0

2    blue   banana   0   1   0

3   green  avocado   1   1   0

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

add a comment |

Using isin

l=[f1,f2,f3]

for x,y in enumerate(l):

    df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)



df

Out[144]: 

    color   fruitN  c1  c2  c3

0     red    apple   0   0   0

1  yellow   orange   0   0   0

2    blue   banana   0   1   0

3   green  avocado   1   1   0

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

add a comment |

Using isin

l=[f1,f2,f3]

for x,y in enumerate(l):

    df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)



df

Out[144]: 

    color   fruitN  c1  c2  c3

0     red    apple   0   0   0

1  yellow   orange   0   0   0

2    blue   banana   0   1   0

3   green  avocado   1   1   0

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

Using isin

l=[f1,f2,f3]

for x,y in enumerate(l):

    df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)



df

Out[144]: 

    color   fruitN  c1  c2  c3

0     red    apple   0   0   0

1  yellow   orange   0   0   0

2    blue   banana   0   1   0

3   green  avocado   1   1   0

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

answered Nov 20 '18 at 1:58

Wen-Ben

110k83266

add a comment |

I think you can use this:

data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)

That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.

You might need to import numpy

import numpy as pd

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

add a comment |

I think you can use this:

data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)

That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.

You might need to import numpy

import numpy as pd

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

add a comment |

I think you can use this:

data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)

That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.

You might need to import numpy

import numpy as pd

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

I think you can use this:

data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)

That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.

You might need to import numpy

import numpy as pd

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

edited Nov 20 '18 at 0:49

eyllanesc

78.3k103156

answered Nov 20 '18 at 0:46

Jorge

1,3131921

answered Nov 20 '18 at 0:46

Jorge

1,3131921

answered Nov 20 '18 at 0:46

Jorge

1,3131921

add a comment |

At first create the data-frames like:

import pandas as pd

data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],

                    columns=["color", "fruitN"])

data = data.set_index("fruitN")

file_1 = ["akee", "apricot", "avocado"]

file_2 = ["avocado", "bilberry", "banana", "blackberry"]

file_3 = ["blackberry", "coconut", "cranberry"]

file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])

file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])

file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])

then concatenate them with corresponding axis and set the sort to false:

data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)

then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:

res = data_concat.loc[["apple", "orange", "banana", "avocado"]]

res.reset_index(level=0, inplace=True)

res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]

res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]



print(res)

it gives:

    color   fruitN  type_1  type_2  type_3

0     red    apple     0.0     0.0     0.0

1  yellow   orange     0.0     0.0     0.0

2    blue   banana     0.0     1.0     0.0

3   green  avocado     1.0     1.0     0.0

I hope this may help you.

answered Nov 20 '18 at 2:31

saeed heidari

1644

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

add a comment |

At first create the data-frames like:

import pandas as pd

data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],

                    columns=["color", "fruitN"])

data = data.set_index("fruitN")

file_1 = ["akee", "apricot", "avocado"]

file_2 = ["avocado", "bilberry", "banana", "blackberry"]

file_3 = ["blackberry", "coconut", "cranberry"]

file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])

file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])

file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])

then concatenate them with corresponding axis and set the sort to false:

data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)

then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:

res = data_concat.loc[["apple", "orange", "banana", "avocado"]]

res.reset_index(level=0, inplace=True)

res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]

res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]



print(res)

it gives:

    color   fruitN  type_1  type_2  type_3

0     red    apple     0.0     0.0     0.0

1  yellow   orange     0.0     0.0     0.0

2    blue   banana     0.0     1.0     0.0

3   green  avocado     1.0     1.0     0.0

I hope this may help you.

answered Nov 20 '18 at 2:31

saeed heidari

1644

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

add a comment |

At first create the data-frames like:

import pandas as pd

data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],

                    columns=["color", "fruitN"])

data = data.set_index("fruitN")

file_1 = ["akee", "apricot", "avocado"]

file_2 = ["avocado", "bilberry", "banana", "blackberry"]

file_3 = ["blackberry", "coconut", "cranberry"]

file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])

file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])

file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])

then concatenate them with corresponding axis and set the sort to false:

data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)

then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:

res = data_concat.loc[["apple", "orange", "banana", "avocado"]]

res.reset_index(level=0, inplace=True)

res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]

res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]



print(res)

it gives:

    color   fruitN  type_1  type_2  type_3

0     red    apple     0.0     0.0     0.0

1  yellow   orange     0.0     0.0     0.0

2    blue   banana     0.0     1.0     0.0

3   green  avocado     1.0     1.0     0.0

I hope this may help you.

answered Nov 20 '18 at 2:31

saeed heidari

1644

At first create the data-frames like:

import pandas as pd

data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],

                    columns=["color", "fruitN"])

data = data.set_index("fruitN")

file_1 = ["akee", "apricot", "avocado"]

file_2 = ["avocado", "bilberry", "banana", "blackberry"]

file_3 = ["blackberry", "coconut", "cranberry"]

file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])

file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])

file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])

then concatenate them with corresponding axis and set the sort to false:

data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)

then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:

res = data_concat.loc[["apple", "orange", "banana", "avocado"]]

res.reset_index(level=0, inplace=True)

res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]

res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]



print(res)

it gives:

    color   fruitN  type_1  type_2  type_3

0     red    apple     0.0     0.0     0.0

1  yellow   orange     0.0     0.0     0.0

2    blue   banana     0.0     1.0     0.0

3   green  avocado     1.0     1.0     0.0

I hope this may help you.

answered Nov 20 '18 at 2:31

saeed heidari

1644

answered Nov 20 '18 at 2:31

saeed heidari

1644

answered Nov 20 '18 at 2:31

saeed heidari

1644

answered Nov 20 '18 at 2:31

saeed heidari

1644

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

add a comment |

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

It's too specific

– Sara Wasl
Nov 20 '18 at 12:48

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Agfdhyk