Compare values of a specific column to different files












1















I have a dataframe called data it's have 2 columns as follows:



color  fruitN
red apple
yellow orange
blue banana
green avocado


Also, I have 3 different files of one columns; each contain list of fruits as follows:



file1
akee
apricot
avocado

file2
avocado
bilberry
banana
blackberry

file3
blackberry
coconut
cranberry


I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3



I need to check each fruit in column fruitN of the dataframe data if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type should be like that:



c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0


Then, concatenated with data to be like this:



color  fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0


I'm beginner with python, so any help would be appreciated.










share|improve this question

























  • what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

    – teng
    Nov 20 '18 at 0:38











  • I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

    – Sara Wasl
    Nov 20 '18 at 0:40
















1















I have a dataframe called data it's have 2 columns as follows:



color  fruitN
red apple
yellow orange
blue banana
green avocado


Also, I have 3 different files of one columns; each contain list of fruits as follows:



file1
akee
apricot
avocado

file2
avocado
bilberry
banana
blackberry

file3
blackberry
coconut
cranberry


I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3



I need to check each fruit in column fruitN of the dataframe data if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type should be like that:



c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0


Then, concatenated with data to be like this:



color  fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0


I'm beginner with python, so any help would be appreciated.










share|improve this question

























  • what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

    – teng
    Nov 20 '18 at 0:38











  • I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

    – Sara Wasl
    Nov 20 '18 at 0:40














1












1








1








I have a dataframe called data it's have 2 columns as follows:



color  fruitN
red apple
yellow orange
blue banana
green avocado


Also, I have 3 different files of one columns; each contain list of fruits as follows:



file1
akee
apricot
avocado

file2
avocado
bilberry
banana
blackberry

file3
blackberry
coconut
cranberry


I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3



I need to check each fruit in column fruitN of the dataframe data if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type should be like that:



c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0


Then, concatenated with data to be like this:



color  fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0


I'm beginner with python, so any help would be appreciated.










share|improve this question
















I have a dataframe called data it's have 2 columns as follows:



color  fruitN
red apple
yellow orange
blue banana
green avocado


Also, I have 3 different files of one columns; each contain list of fruits as follows:



file1
akee
apricot
avocado

file2
avocado
bilberry
banana
blackberry

file3
blackberry
coconut
cranberry


I need to create another dataframe called type that have # of rows equal the same as data= 4, and # of columns equal same as the number of files = 3



I need to check each fruit in column fruitN of the dataframe data if it's exist in first file, put 1 in first row first column and continue checking in file2 and file3. So, the output dataframe type should be like that:



c1 c2 c3
0 0 0
0 0 0
0 1 0
1 1 0


Then, concatenated with data to be like this:



color  fruitN c1 c2 c3
red apple 0 0 0
yellow orange 0 0 0
blue banana 0 1 0
green avocado 1 1 0


I'm beginner with python, so any help would be appreciated.







python pandas dataframe concatenation






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 '18 at 0:49









eyllanesc

78.3k103156




78.3k103156










asked Nov 20 '18 at 0:27









Sara WaslSara Wasl

937




937













  • what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

    – teng
    Nov 20 '18 at 0:38











  • I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

    – Sara Wasl
    Nov 20 '18 at 0:40



















  • what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

    – teng
    Nov 20 '18 at 0:38











  • I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

    – Sara Wasl
    Nov 20 '18 at 0:40

















what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38





what have you tried, perhaps show us some sample code? and i think color for banana should be yellow, while that for orange should be orange. just saying.

– teng
Nov 20 '18 at 0:38













I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40





I don't know how to do it exactly, and yeah for the colors but it just an example to show the idea @teng

– Sara Wasl
Nov 20 '18 at 0:40












3 Answers
3






active

oldest

votes


















1














Using isin



l=[f1,f2,f3]
for x,y in enumerate(l):
df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)

df
Out[144]:
color fruitN c1 c2 c3
0 red apple 0 0 0
1 yellow orange 0 0 0
2 blue banana 0 1 0
3 green avocado 1 1 0





share|improve this answer































    1














    I think you can use this:



    data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)


    That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.



    You might need to import numpy



    import numpy as pd





    share|improve this answer

































      0














      At first create the data-frames like:



      import pandas as pd
      data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
      columns=["color", "fruitN"])
      data = data.set_index("fruitN")
      file_1 = ["akee", "apricot", "avocado"]
      file_2 = ["avocado", "bilberry", "banana", "blackberry"]
      file_3 = ["blackberry", "coconut", "cranberry"]
      file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
      file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
      file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])


      then concatenate them with corresponding axis and set the sort to false:



      data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)


      then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:



      res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
      res.reset_index(level=0, inplace=True)
      res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
      res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]

      print(res)


      it gives:



          color   fruitN  type_1  type_2  type_3
      0 red apple 0.0 0.0 0.0
      1 yellow orange 0.0 0.0 0.0
      2 blue banana 0.0 1.0 0.0
      3 green avocado 1.0 1.0 0.0


      I hope this may help you.






      share|improve this answer
























      • It's too specific

        – Sara Wasl
        Nov 20 '18 at 12:48











      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53384545%2fcompare-values-of-a-specific-column-to-different-files%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      Using isin



      l=[f1,f2,f3]
      for x,y in enumerate(l):
      df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)

      df
      Out[144]:
      color fruitN c1 c2 c3
      0 red apple 0 0 0
      1 yellow orange 0 0 0
      2 blue banana 0 1 0
      3 green avocado 1 1 0





      share|improve this answer




























        1














        Using isin



        l=[f1,f2,f3]
        for x,y in enumerate(l):
        df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)

        df
        Out[144]:
        color fruitN c1 c2 c3
        0 red apple 0 0 0
        1 yellow orange 0 0 0
        2 blue banana 0 1 0
        3 green avocado 1 1 0





        share|improve this answer


























          1












          1








          1







          Using isin



          l=[f1,f2,f3]
          for x,y in enumerate(l):
          df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)

          df
          Out[144]:
          color fruitN c1 c2 c3
          0 red apple 0 0 0
          1 yellow orange 0 0 0
          2 blue banana 0 1 0
          3 green avocado 1 1 0





          share|improve this answer













          Using isin



          l=[f1,f2,f3]
          for x,y in enumerate(l):
          df['c'+str(x+1)]=df.fruitN.isin(y.iloc[:,0].tolist()).astype(int)

          df
          Out[144]:
          color fruitN c1 c2 c3
          0 red apple 0 0 0
          1 yellow orange 0 0 0
          2 blue banana 0 1 0
          3 green avocado 1 1 0






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 20 '18 at 1:58









          Wen-BenWen-Ben

          110k83266




          110k83266

























              1














              I think you can use this:



              data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)


              That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.



              You might need to import numpy



              import numpy as pd





              share|improve this answer






























                1














                I think you can use this:



                data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)


                That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.



                You might need to import numpy



                import numpy as pd





                share|improve this answer




























                  1












                  1








                  1







                  I think you can use this:



                  data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)


                  That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.



                  You might need to import numpy



                  import numpy as pd





                  share|improve this answer















                  I think you can use this:



                  data.loc[:, 'c1'] = np.where(data['fruitN'].isin(file1.values(), 1, 0)


                  That should create c1, repeat to create the other two columns. At the end all the information you want will be in dataframe data.



                  You might need to import numpy



                  import numpy as pd






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 20 '18 at 0:49









                  eyllanesc

                  78.3k103156




                  78.3k103156










                  answered Nov 20 '18 at 0:46









                  JorgeJorge

                  1,3131921




                  1,3131921























                      0














                      At first create the data-frames like:



                      import pandas as pd
                      data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
                      columns=["color", "fruitN"])
                      data = data.set_index("fruitN")
                      file_1 = ["akee", "apricot", "avocado"]
                      file_2 = ["avocado", "bilberry", "banana", "blackberry"]
                      file_3 = ["blackberry", "coconut", "cranberry"]
                      file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
                      file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
                      file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])


                      then concatenate them with corresponding axis and set the sort to false:



                      data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)


                      then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:



                      res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
                      res.reset_index(level=0, inplace=True)
                      res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
                      res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]

                      print(res)


                      it gives:



                          color   fruitN  type_1  type_2  type_3
                      0 red apple 0.0 0.0 0.0
                      1 yellow orange 0.0 0.0 0.0
                      2 blue banana 0.0 1.0 0.0
                      3 green avocado 1.0 1.0 0.0


                      I hope this may help you.






                      share|improve this answer
























                      • It's too specific

                        – Sara Wasl
                        Nov 20 '18 at 12:48
















                      0














                      At first create the data-frames like:



                      import pandas as pd
                      data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
                      columns=["color", "fruitN"])
                      data = data.set_index("fruitN")
                      file_1 = ["akee", "apricot", "avocado"]
                      file_2 = ["avocado", "bilberry", "banana", "blackberry"]
                      file_3 = ["blackberry", "coconut", "cranberry"]
                      file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
                      file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
                      file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])


                      then concatenate them with corresponding axis and set the sort to false:



                      data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)


                      then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:



                      res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
                      res.reset_index(level=0, inplace=True)
                      res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
                      res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]

                      print(res)


                      it gives:



                          color   fruitN  type_1  type_2  type_3
                      0 red apple 0.0 0.0 0.0
                      1 yellow orange 0.0 0.0 0.0
                      2 blue banana 0.0 1.0 0.0
                      3 green avocado 1.0 1.0 0.0


                      I hope this may help you.






                      share|improve this answer
























                      • It's too specific

                        – Sara Wasl
                        Nov 20 '18 at 12:48














                      0












                      0








                      0







                      At first create the data-frames like:



                      import pandas as pd
                      data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
                      columns=["color", "fruitN"])
                      data = data.set_index("fruitN")
                      file_1 = ["akee", "apricot", "avocado"]
                      file_2 = ["avocado", "bilberry", "banana", "blackberry"]
                      file_3 = ["blackberry", "coconut", "cranberry"]
                      file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
                      file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
                      file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])


                      then concatenate them with corresponding axis and set the sort to false:



                      data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)


                      then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:



                      res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
                      res.reset_index(level=0, inplace=True)
                      res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
                      res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]

                      print(res)


                      it gives:



                          color   fruitN  type_1  type_2  type_3
                      0 red apple 0.0 0.0 0.0
                      1 yellow orange 0.0 0.0 0.0
                      2 blue banana 0.0 1.0 0.0
                      3 green avocado 1.0 1.0 0.0


                      I hope this may help you.






                      share|improve this answer













                      At first create the data-frames like:



                      import pandas as pd
                      data = pd.DataFrame(data=[["red", "apple"], ["yellow", "orange"], ["blue", "banana"], ["green", "avocado"]],
                      columns=["color", "fruitN"])
                      data = data.set_index("fruitN")
                      file_1 = ["akee", "apricot", "avocado"]
                      file_2 = ["avocado", "bilberry", "banana", "blackberry"]
                      file_3 = ["blackberry", "coconut", "cranberry"]
                      file_1_df = pd.DataFrame(data=[1] * len(file_1), index=file_1, columns=["type_1"])
                      file_2_df = pd.DataFrame(data=[1] * len(file_2), index=file_2, columns=["type_2"])
                      file_3_df = pd.DataFrame(data=[1] * len(file_3), index=file_3, columns=["type_3"])


                      then concatenate them with corresponding axis and set the sort to false:



                      data_concat = pd.concat([data, file_1_df, file_2_df, file_3_df], axis=1, sort=False).fillna(0)


                      then pick the right indices and reformat result data as you like, I did it to get exactly what you mentioned that you need:



                      res = data_concat.loc[["apple", "orange", "banana", "avocado"]]
                      res.reset_index(level=0, inplace=True)
                      res.columns = ["fruitN", "color", "type_1", "type_2", "type_3"]
                      res = res.ix[:, ["color", "fruitN", "type_1", "type_2", "type_3"]]

                      print(res)


                      it gives:



                          color   fruitN  type_1  type_2  type_3
                      0 red apple 0.0 0.0 0.0
                      1 yellow orange 0.0 0.0 0.0
                      2 blue banana 0.0 1.0 0.0
                      3 green avocado 1.0 1.0 0.0


                      I hope this may help you.







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Nov 20 '18 at 2:31









                      saeed heidarisaeed heidari

                      1644




                      1644













                      • It's too specific

                        – Sara Wasl
                        Nov 20 '18 at 12:48



















                      • It's too specific

                        – Sara Wasl
                        Nov 20 '18 at 12:48

















                      It's too specific

                      – Sara Wasl
                      Nov 20 '18 at 12:48





                      It's too specific

                      – Sara Wasl
                      Nov 20 '18 at 12:48


















                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53384545%2fcompare-values-of-a-specific-column-to-different-files%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Guess what letter conforming each word

                      Run scheduled task as local user group (not BUILTIN)

                      Port of Spain