Groupby and cumcount for valid rows only
up vote
0
down vote
favorite
I have a dataframe looks like this:
ids valid date
0 1 False 2012-06-10
1 1 True 2012-07-11
2 1 True 2012-09-20
3 2 False 2012-01-12
4 2 True 2012-07-11
5 3 True 2012-03-09
6 3 True 2012-07-11
7 3 False 2012-12-09
8 4 False 2012-07-11
I want to count how many valid case the person has so far and going through them in ascending date order
ids valid date occur
1 False 2012-06-10 0
1 True 2012-07-11 1
1 True 2012-09-20 2
2 False 2012-01-12 0
2 True 2012-07-11 1
3 True 2012-03-09 1
3 True 2012-07-11 2
3 False 2012-12-09 0
4 False 2012-07-11 0
What I have tried so far:
df = df.sort_values(['id', 'date'])
df['occur'] = df.groupby('valid').cumcount()+1
python pandas dataframe group-by pandas-groupby
|
show 2 more comments
up vote
0
down vote
favorite
I have a dataframe looks like this:
ids valid date
0 1 False 2012-06-10
1 1 True 2012-07-11
2 1 True 2012-09-20
3 2 False 2012-01-12
4 2 True 2012-07-11
5 3 True 2012-03-09
6 3 True 2012-07-11
7 3 False 2012-12-09
8 4 False 2012-07-11
I want to count how many valid case the person has so far and going through them in ascending date order
ids valid date occur
1 False 2012-06-10 0
1 True 2012-07-11 1
1 True 2012-09-20 2
2 False 2012-01-12 0
2 True 2012-07-11 1
3 True 2012-03-09 1
3 True 2012-07-11 2
3 False 2012-12-09 0
4 False 2012-07-11 0
What I have tried so far:
df = df.sort_values(['id', 'date'])
df['occur'] = df.groupby('valid').cumcount()+1
python pandas dataframe group-by pandas-groupby
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19
|
show 2 more comments
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have a dataframe looks like this:
ids valid date
0 1 False 2012-06-10
1 1 True 2012-07-11
2 1 True 2012-09-20
3 2 False 2012-01-12
4 2 True 2012-07-11
5 3 True 2012-03-09
6 3 True 2012-07-11
7 3 False 2012-12-09
8 4 False 2012-07-11
I want to count how many valid case the person has so far and going through them in ascending date order
ids valid date occur
1 False 2012-06-10 0
1 True 2012-07-11 1
1 True 2012-09-20 2
2 False 2012-01-12 0
2 True 2012-07-11 1
3 True 2012-03-09 1
3 True 2012-07-11 2
3 False 2012-12-09 0
4 False 2012-07-11 0
What I have tried so far:
df = df.sort_values(['id', 'date'])
df['occur'] = df.groupby('valid').cumcount()+1
python pandas dataframe group-by pandas-groupby
I have a dataframe looks like this:
ids valid date
0 1 False 2012-06-10
1 1 True 2012-07-11
2 1 True 2012-09-20
3 2 False 2012-01-12
4 2 True 2012-07-11
5 3 True 2012-03-09
6 3 True 2012-07-11
7 3 False 2012-12-09
8 4 False 2012-07-11
I want to count how many valid case the person has so far and going through them in ascending date order
ids valid date occur
1 False 2012-06-10 0
1 True 2012-07-11 1
1 True 2012-09-20 2
2 False 2012-01-12 0
2 True 2012-07-11 1
3 True 2012-03-09 1
3 True 2012-07-11 2
3 False 2012-12-09 0
4 False 2012-07-11 0
What I have tried so far:
df = df.sort_values(['id', 'date'])
df['occur'] = df.groupby('valid').cumcount()+1
python pandas dataframe group-by pandas-groupby
python pandas dataframe group-by pandas-groupby
edited Nov 9 at 22:59
coldspeed
111k17101170
111k17101170
asked Nov 9 at 22:11
Matt-pow
114214
114214
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19
|
show 2 more comments
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19
|
show 2 more comments
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
Use groupby
and cumcount
:
df['occur'] = (df.groupby(['ids', 'valid'])
.cumcount()
.add(1)
.where(df.valid, 0))
print(df)
ids valid date occur
0 1 False 2012-06-10 0
1 1 True 2012-07-11 1
2 1 True 2012-09-20 2
3 2 False 2012-01-12 0
4 2 True 2012-07-11 1
5 3 True 2012-03-09 1
6 3 True 2012-07-11 2
7 3 False 2012-12-09 0
8 4 False 2012-07-11 0
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Use groupby
and cumcount
:
df['occur'] = (df.groupby(['ids', 'valid'])
.cumcount()
.add(1)
.where(df.valid, 0))
print(df)
ids valid date occur
0 1 False 2012-06-10 0
1 1 True 2012-07-11 1
2 1 True 2012-09-20 2
3 2 False 2012-01-12 0
4 2 True 2012-07-11 1
5 3 True 2012-03-09 1
6 3 True 2012-07-11 2
7 3 False 2012-12-09 0
8 4 False 2012-07-11 0
add a comment |
up vote
1
down vote
accepted
Use groupby
and cumcount
:
df['occur'] = (df.groupby(['ids', 'valid'])
.cumcount()
.add(1)
.where(df.valid, 0))
print(df)
ids valid date occur
0 1 False 2012-06-10 0
1 1 True 2012-07-11 1
2 1 True 2012-09-20 2
3 2 False 2012-01-12 0
4 2 True 2012-07-11 1
5 3 True 2012-03-09 1
6 3 True 2012-07-11 2
7 3 False 2012-12-09 0
8 4 False 2012-07-11 0
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Use groupby
and cumcount
:
df['occur'] = (df.groupby(['ids', 'valid'])
.cumcount()
.add(1)
.where(df.valid, 0))
print(df)
ids valid date occur
0 1 False 2012-06-10 0
1 1 True 2012-07-11 1
2 1 True 2012-09-20 2
3 2 False 2012-01-12 0
4 2 True 2012-07-11 1
5 3 True 2012-03-09 1
6 3 True 2012-07-11 2
7 3 False 2012-12-09 0
8 4 False 2012-07-11 0
Use groupby
and cumcount
:
df['occur'] = (df.groupby(['ids', 'valid'])
.cumcount()
.add(1)
.where(df.valid, 0))
print(df)
ids valid date occur
0 1 False 2012-06-10 0
1 1 True 2012-07-11 1
2 1 True 2012-09-20 2
3 2 False 2012-01-12 0
4 2 True 2012-07-11 1
5 3 True 2012-03-09 1
6 3 True 2012-07-11 2
7 3 False 2012-12-09 0
8 4 False 2012-07-11 0
answered Nov 9 at 22:57
coldspeed
111k17101170
111k17101170
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53233914%2fgroupby-and-cumcount-for-valid-rows-only%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
"valid" in the input is integer but is boolean in the output? How does that work?
– coldspeed
Nov 9 at 22:13
@coldspeed I multiply the column by one to translate from true false
– Matt-pow
Nov 9 at 22:16
That doesn't explain how 1 * 1= False in row #2.
– coldspeed
Nov 9 at 22:16
Made an edit to correct mistakes
– Matt-pow
Nov 9 at 22:18
My question is how is it possible for any False values to be present if all of the rows are > 0?
– coldspeed
Nov 9 at 22:19