Creating a new row whenever a comma appears in the column
I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.
For example:
Name, location, type, and hours
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.
The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM
So the program wouldn't show the restaurant if it wasn't open.
So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:
import pandas as pd
import numpy as np
data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>
I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to switch it up to:
if data['Monday'].any('Monday').str.contains(',') == 'True':
which results in: ValueError: No axis named Monday for object type
And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:
print data[data['Monday'].astype(str).str.contains(',')]
It works and returns the result. But I can't do any kind of conditional without it throwing an error.
3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)
Thanks for reading!
python pandas numpy
add a comment |
I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.
For example:
Name, location, type, and hours
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.
The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM
So the program wouldn't show the restaurant if it wasn't open.
So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:
import pandas as pd
import numpy as np
data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>
I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to switch it up to:
if data['Monday'].any('Monday').str.contains(',') == 'True':
which results in: ValueError: No axis named Monday for object type
And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:
print data[data['Monday'].astype(str).str.contains(',')]
It works and returns the result. But I can't do any kind of conditional without it throwing an error.
3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)
Thanks for reading!
python pandas numpy
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34
add a comment |
I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.
For example:
Name, location, type, and hours
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.
The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM
So the program wouldn't show the restaurant if it wasn't open.
So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:
import pandas as pd
import numpy as np
data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>
I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to switch it up to:
if data['Monday'].any('Monday').str.contains(',') == 'True':
which results in: ValueError: No axis named Monday for object type
And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:
print data[data['Monday'].astype(str).str.contains(',')]
It works and returns the result. But I can't do any kind of conditional without it throwing an error.
3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)
Thanks for reading!
python pandas numpy
I'm trying to create a mini program that will calculate the closest, open restaurant closest to my location. I have a dataset that includes restaurant names, locations, stars, and hours. However, there is a problem: Sometimes a restaurant will have multiple open/close times in a day.
For example:
Name, location, type, and hours
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
I'm trying to get the data into a CSV, but for restaurants with multiple hours (like in the example), it can't properly parse it.
The easiest solution for this would (I think) create another line with the same information, but the next set of hours. So, the example would then read:
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 11:30AM-2PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 5:30-10:30PM
So the program wouldn't show the restaurant if it wasn't open.
So I have three general questions.
1) Is there a better way to go about this than the solution I mentioned above (creating a new row for every iteration of multiple open/close hours)
2) Below, I'm having trouble with the following implementation:
import pandas as pd
import numpy as np
data = pd.import_csv(data.csv)
for row in data:
if data['hours'].str.contains(',') == 'True':
count = data['hours'].str.count(',')
data.append..
<create new row with Name[row], location[row], type[row], and hours[row] for the # of count>
I've tried google-ing around, and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to switch it up to:
if data['Monday'].any('Monday').str.contains(',') == 'True':
which results in: ValueError: No axis named Monday for object type
And I'm a bit unclear on the next steps here, or what I'm doing wrong, because if I just do:
print data[data['Monday'].astype(str).str.contains(',')]
It works and returns the result. But I can't do any kind of conditional without it throwing an error.
3) I'm also a bit confused on what to do if there are more than one comma in the row.. I have a vague idea, but if you have any hints, I'd love to hear them :)
Thanks for reading!
python pandas numpy
python pandas numpy
edited Nov 19 '18 at 19:14
sacuL
30.2k41940
30.2k41940
asked Nov 19 '18 at 19:13
SonicarrowSonicarrow
439
439
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34
add a comment |
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34
add a comment |
2 Answers
2
active
oldest
votes
If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM
or PM
(using a negative lookbehind). You can then use str.split
and stack
, after setting all the columns that you don't want to modify to the index. For example:
data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')
>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))
>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM
add a comment |
try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.
6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53381170%2fcreating-a-new-row-whenever-a-comma-appears-in-the-column%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM
or PM
(using a negative lookbehind). You can then use str.split
and stack
, after setting all the columns that you don't want to modify to the index. For example:
data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')
>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))
>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM
add a comment |
If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM
or PM
(using a negative lookbehind). You can then use str.split
and stack
, after setting all the columns that you don't want to modify to the index. For example:
data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')
>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))
>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM
add a comment |
If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM
or PM
(using a negative lookbehind). You can then use str.split
and stack
, after setting all the columns that you don't want to modify to the index. For example:
data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')
>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))
>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM
If I understand correctly, you can load the data with a regular expression as the separator, making sure that what precedes the comma is not AM
or PM
(using a negative lookbehind). You can then use str.split
and stack
, after setting all the columns that you don't want to modify to the index. For example:
data = pd.read_csv('data.csv', sep='(?<!AM|PM),')
# Get rid of spaces in your column names
data.columns = data.columns.str.strip(' ')
>>> data
Name location type hours
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM, 11:30AM-2PM,5:30-10:30PM
new_data = (data.set_index(['Name', 'location', 'type'])
.hours.str.split(',', expand=True)
.stack()
.reset_index(level=['Name', 'location', 'type']))
>>> new_data
Name location type 0
0 Blue Duck Tavern 1201 24th St NW American Restaurant 6:30-10:30AM
1 Blue Duck Tavern 1201 24th St NW American Restaurant 11:30AM-2PM
2 Blue Duck Tavern 1201 24th St NW American Restaurant 5:30-10:30PM
edited Nov 19 '18 at 19:29
answered Nov 19 '18 at 19:19
sacuLsacuL
30.2k41940
30.2k41940
add a comment |
add a comment |
try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.
6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
add a comment |
try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.
6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
add a comment |
try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.
6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
try to combine multiple hours with '_' or any other delimiter as mentioned below and take it as a whole.
6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
Blue Duck Tavern, 1201 24th St NW, American Restaurant, 6:30-10:30AM_11:30AM-2PM_5:30-10:30PM
answered Nov 19 '18 at 19:26
Sudheer Kumar RSudheer Kumar R
11
11
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53381170%2fcreating-a-new-row-whenever-a-comma-appears-in-the-column%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
data already exists in a dataframe? or in a json object?
– Harikrishna
Nov 19 '18 at 19:21
Yep! It exists already in the dataframe called data (from the csv)
– Sonicarrow
Nov 19 '18 at 19:34