Pandas Dataframes operation
I use pandas DataFrames to process my dataset. I have 3 columns: airport_id, airline_id, and delay. I want to remove all origin airports that have fewer than 5 airlines.
I did this:
grouped_size = df.groupby(['OP_CARRIER_AIRLINE_ID','ORIGIN_AIRPORT_ID']).size()
This gives me the number of airlines per airport (I hope), but I do not know how to remove the ones with fewer than 5 airlines. Thank you!
python pandas pandas-groupby
df = df[df.groupby(['OP_CARRIER_AIRLINE_ID','ORIGIN_AIRPORT_ID']).transform('count') >= 5] ?
– coldspeed
Nov 12 at 0:55
@coldspeed or use groupby(...).filter(...) - save materialising a Series if it's not being used for anything?
– Jon Clements♦
Nov 12 at 0:59
@JonClements, I'm guessing filter would require lambda? If so, I'm all in favour of avoiding lambda :).
– jpp
Nov 12 at 0:59
@coldspeed I get this error: "ValueError: Boolean array expected for the condition, not float64"
– Kaan Yolsever
Nov 12 at 1:03
@KaanYolsever Please copy the entire command properly.
– coldspeed
Nov 12 at 1:22
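The two approaches discussed in these comments, transform and filter, can be sketched as follows. This is a minimal illustration, not the commenters' exact code: the sample data is made up, and it assumes "airlines per airport" means the number of distinct OP_CARRIER_AIRLINE_ID values for each ORIGIN_AIRPORT_ID. Selecting a single column before transform gives a Series that can be compared and used directly as a row mask.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [1] * 6 + [2] * 3,
    "OP_CARRIER_AIRLINE_ID": [10, 11, 12, 13, 14, 14, 10, 10, 11],
    "delay": [5.0, 0.0, 12.0, 3.0, 7.0, 1.0, 9.0, 2.0, 4.0],
})

# Number of distinct airlines at each row's origin airport, aligned to df's index.
airlines_per_airport = (
    df.groupby("ORIGIN_AIRPORT_ID")["OP_CARRIER_AIRLINE_ID"].transform("nunique")
)

# Boolean mask: keep rows whose airport has at least 5 distinct airlines.
kept = df[airlines_per_airport >= 5]

# Same rows via filter, at the cost of a lambda called once per group.
kept_via_filter = df.groupby("ORIGIN_AIRPORT_ID").filter(
    lambda g: g["OP_CARRIER_AIRLINE_ID"].nunique() >= 5
)
```

Both variants return rows of the original DataFrame, so the delay column is preserved. If the intent is instead to count rows per airport rather than distinct airlines, "nunique" could be swapped for "size".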
edited Nov 12 at 0:55
Jon Clements♦
asked Nov 12 at 0:53
Kaan Yolsever
1 Answer
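Here is a simple way to do it:
# Count rows for each (airline, airport) pair; reset_index turns the result into a DataFrame.
grouped_size = df.groupby(['OP_CARRIER_AIRLINE_ID','ORIGIN_AIRPORT_ID']).size().reset_index()
grouped_size.columns = ['OP_CARRIER_AIRLINE_ID','ORIGIN_AIRPORT_ID', 'size']
# Keep groups with at least 5 rows (the question asks to drop those with fewer than 5).
hi_mask = grouped_size['size'] >= 5
grouped_size = grouped_size[hi_mask]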
Thank you, but I get this error: '>' not supported between instances of 'str' and 'int'
– Kaan Yolsever
Nov 12 at 1:20
Corrected the code, please try it.
– jeevs
Nov 12 at 1:33
One thing: when I do this, I lose my other columns. How can I do this while preserving them in the output data structure?
– Kaan Yolsever
Nov 12 at 1:39
Once you do a groupby, you are basically asking pandas to count on the base dataframe. Hence, you are no longer working with the base dataframe. If you want to pick only the rows that satisfy this criterion, you will need to merge the groupby df with the base df, and filter again.
– jeevs
Nov 12 at 1:46
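The merge-then-filter idea from the last comment might look like the sketch below. The sample data and the helper column name n_airlines are hypothetical, and the count is taken as distinct airlines per origin airport, which is an assumption about what the question intends.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [1] * 6 + [2] * 3,
    "OP_CARRIER_AIRLINE_ID": [10, 11, 12, 13, 14, 14, 10, 10, 11],
    "delay": [5.0, 0.0, 12.0, 3.0, 7.0, 1.0, 9.0, 2.0, 4.0],
})

# Step 1: one row per airport with its number of distinct airlines.
counts = (
    df.groupby("ORIGIN_AIRPORT_ID")["OP_CARRIER_AIRLINE_ID"]
    .nunique()
    .reset_index(name="n_airlines")  # "n_airlines" is a made-up helper name
)

# Step 2: attach the count to every original row; all original columns survive.
merged = df.merge(counts, on="ORIGIN_AIRPORT_ID", how="left")

# Step 3: filter on the count, then drop the helper column.
result = merged[merged["n_airlines"] >= 5].drop(columns="n_airlines")
```

Because the merge brings the per-airport count back onto the base rows, the result keeps delay and any other columns the base DataFrame has.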
edited Nov 12 at 1:33
answered Nov 12 at 1:12
jeevs