Creating a new row whenever a comma appears in the column












0















I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.



For example:



Name, location, type, and hours

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.



The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:



Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM


So the program wouldn't show the restaurant if it wasn't open.



So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:



import pandas as pd
import numpy as np

data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>


I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



So I tried to switch it up to:



if data['Monday'].any('Monday').str.contains(',') == 'True': 


which results in: ValueError: No axis named Monday for object type



And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:



print data[data['Monday'].astype(str).str.contains(',')]


It works and returns the result. But I can't do any kind of conditional without it throwing an error.



3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)



Thanks for reading!










share|improve this question

























  • data already exists in a dataframe? or in a json object?

    – Harikrishna
    Nov 19 '18 at 19:21













  • Yep! It exists already in the dataframe called data (from the csv)

    – Sonicarrow
    Nov 19 '18 at 19:34
















0















I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.



For example:



Name, location, type, and hours

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.



The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:



Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM


So the program wouldn't show the restaurant if it wasn't open.



So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:



import pandas as pd
import numpy as np

data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>


I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



So I tried to switch it up to:



if data['Monday'].any('Monday').str.contains(',') == 'True': 


which results in: ValueError: No axis named Monday for object type



And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:



print data[data['Monday'].astype(str).str.contains(',')]


It works and returns the result. But I can't do any kind of conditional without it throwing an error.



3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)



Thanks for reading!










share|improve this question

























  • data already exists in a dataframe? or in a json object?

    – Harikrishna
    Nov 19 '18 at 19:21













  • Yep! It exists already in the dataframe called data (from the csv)

    – Sonicarrow
    Nov 19 '18 at 19:34














0












0








0








I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.



For example:



Name, location, type, and hours

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.



The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:



Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM


So the program wouldn't show the restaurant if it wasn't open.



So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:



import pandas as pd
import numpy as np

data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>


I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



So I tried to switch it up to:



if data['Monday'].any('Monday').str.contains(',') == 'True': 


which results in: ValueError: No axis named Monday for object type



And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:



print data[data['Monday'].astype(str).str.contains(',')]


It works and returns the result. But I can't do any kind of conditional without it throwing an error.



3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)



Thanks for reading!










share|improve this question
















I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.



For example:



Name, location, type, and hours

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.



The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:



Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM

Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM


So the program wouldn't show the restaurant if it wasn't open.



So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:



import pandas as pd
import numpy as np

data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>


I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



So I tried to switch it up to:



if data['Monday'].any('Monday').str.contains(',') == 'True': 


which results in: ValueError: No axis named Monday for object type



And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:



print data[data['Monday'].astype(str).str.contains(',')]


It works and returns the result. But I can't do any kind of conditional without it throwing an error.



3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)



Thanks for reading!







python pandas numpy






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 19 '18 at 19:14









sacuL

30.2k41940




30.2k41940










asked Nov 19 '18 at 19:13









SonicarrowSonicarrow

439




439













  • data already exists in a dataframe? or in a json object?

    – Harikrishna
    Nov 19 '18 at 19:21













  • Yep! It exists already in the dataframe called data (from the csv)

    – Sonicarrow
    Nov 19 '18 at 19:34



















  • data already exists in a dataframe? or in a json object?

    – Harikrishna
    Nov 19 '18 at 19:21













  • Yep! It exists already in the dataframe called data (from the csv)

    – Sonicarrow
    Nov 19 '18 at 19:34

















data already exists in a dataframe? or in a json object?

– Harikrishna
Nov 19 '18 at 19:21







data already exists in a dataframe? or in a json object?

– Harikrishna
Nov 19 '18 at 19:21















Yep! It exists already in the dataframe called data (from the csv)

– Sonicarrow
Nov 19 '18 at 19:34





Yep! It exists already in the dataframe called data (from the csv)

– Sonicarrow
Nov 19 '18 at 19:34












2 Answers
2






active

oldest

votes


















1














If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM or PM (using a negative lookbehind). You can then use str.split and stack, after setting all the columns that you don't want to modify to the index. For example:



data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')

>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))

>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM





share|improve this answer

































    0














    try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.



    6:30-10:30AM_11:30AM-2PM_5:30-10:30PM



    Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM






    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53381170%2fcreating-a-new-row-whenever-a-comma-appears-in-the-column%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM or PM (using a negative lookbehind). You can then use str.split and stack, after setting all the columns that you don't want to modify to the index. For example:



      data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
      # Get rid of spaces in your column names
      data.columns = data.columns.str.strip(' ')

      >>> data
      Name location type hours
      0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


      new_data = (data.set_index(['Name', 'location', 'type'])
      .hours.str.split(',', expand=True)
      .stack()
      .reset_index(level=['Name', 'location', 'type']))

      >>> new_data
      Name location type 0
      0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
      1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
      2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM





      share|improve this answer






























        1














        If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM or PM (using a negative lookbehind). You can then use str.split and stack, after setting all the columns that you don't want to modify to the index. For example:



        data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
        # Get rid of spaces in your column names
        data.columns = data.columns.str.strip(' ')

        >>> data
        Name location type hours
        0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


        new_data = (data.set_index(['Name', 'location', 'type'])
        .hours.str.split(',', expand=True)
        .stack()
        .reset_index(level=['Name', 'location', 'type']))

        >>> new_data
        Name location type 0
        0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
        1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
        2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM





        share|improve this answer




























          1












          1








          1







          If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM or PM (using a negative lookbehind). You can then use str.split and stack, after setting all the columns that you don't want to modify to the index. For example:



          data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
          # Get rid of spaces in your column names
          data.columns = data.columns.str.strip(' ')

          >>> data
          Name location type hours
          0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


          new_data = (data.set_index(['Name', 'location', 'type'])
          .hours.str.split(',', expand=True)
          .stack()
          .reset_index(level=['Name', 'location', 'type']))

          >>> new_data
          Name location type 0
          0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
          1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
          2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM





          share|improve this answer















          If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM or PM (using a negative lookbehind). You can then use str.split and stack, after setting all the columns that you don't want to modify to the index. For example:



          data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
          # Get rid of spaces in your column names
          data.columns = data.columns.str.strip(' ')

          >>> data
          Name location type hours
          0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM


          new_data = (data.set_index(['Name', 'location', 'type'])
          .hours.str.split(',', expand=True)
          .stack()
          .reset_index(level=['Name', 'location', 'type']))

          >>> new_data
          Name location type 0
          0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
          1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
          2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 19 '18 at 19:29

























          answered Nov 19 '18 at 19:19









          sacuLsacuL

          30.2k41940




          30.2k41940

























              0














              try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.



              6:30-10:30AM_11:30AM-2PM_5:30-10:30PM



              Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM






              share|improve this answer




























                0














                try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.



                6:30-10:30AM_11:30AM-2PM_5:30-10:30PM



                Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM






                share|improve this answer


























                  0












                  0








                  0







                  try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.



                  6:30-10:30AM_11:30AM-2PM_5:30-10:30PM



                  Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM






                  share|improve this answer













                  try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.



                  6:30-10:30AM_11:30AM-2PM_5:30-10:30PM



                  Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 19 '18 at 19:26









                  Sudheer Kumar RSudheer Kumar R

                  11




                  11






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53381170%2fcreating-a-new-row-whenever-a-comma-appears-in-the-column%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Guess what letter conforming each word

                      Run scheduled task as local user group (not BUILTIN)

                      Port of Spain