Pandas groupby apply anomaly with datetime

While experimenting with pandas in Jupyter, I noticed a very strange symptom. I reduced it to a minimal example that demonstrates it:

import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3)]
})
df

    A   B
0   a   2018-11-01
1   b   2018-11-02
2   c   2018-11-03

def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })

df2 = df.groupby(['A']).apply(process).reset_index()
df2

    A   C
0   a   1541376000000000000
1   b   1541376000000000000
2   c   1541376000000000000

df2['C']

0    1541376000000000000
1    1541376000000000000
2    1541376000000000000
Name: C, dtype: int64

As you can see, column C ends up as int64 instead of the expected datetime64[ns]. But if I leave out column B, column C correctly ends up as datetime64[ns]:

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    # 'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3)]
})
df

    A
0   a
1   b
2   c

def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })

df2 = df.groupby(['A']).apply(process).reset_index()
df2

    A   C
0   a   2018-11-05
1   b   2018-11-05
2   c   2018-11-05

df2['C']

0   2018-11-05
1   2018-11-05
2   2018-11-05
Name: C, dtype: datetime64[ns]
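For what it's worth, the integers in the first result do not look arbitrary. Read as nanoseconds since the Unix epoch (my assumption, based on how datetime64[ns] values are stored), the value seems to map back to the date returned by process. A quick check, with raw_value copied from the output above:

```python
import pandas as pd

# The integer shown in the int64 column of the first result.
raw_value = 1541376000000000000

# Assumption: the value is a count of nanoseconds since the Unix epoch,
# which is how datetime64[ns] stores timestamps internally.
as_timestamp = pd.to_datetime(raw_value, unit="ns")

print(as_timestamp)
```

So the date itself appears to survive; only the dtype is lost somewhere along the way.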

I have no clue what is happening. Does anyone have an idea? I'm using Python 3.6 and pandas 0.23.1.

asked Nov 20 '18 at 6:40

Jake


I am using Python 2.7 and cannot reproduce it. For me, the first output is also datetime.

– Joe
Nov 20 '18 at 6:48

Tags: python, pandas, datetime, group-by


1 Answer

First, this looks like a bug.

As a workaround, you can create the new column inside each group and return the group DataFrame (gdf) instead of a Series:

def process(gdf):
    gdf['C'] = datetime(2018, 11, 5)
    return gdf

df2 = df.groupby(['A']).apply(process)
print(df2)

   A          B          C
0  a 2018-11-01 2018-11-05
1  b 2018-11-02 2018-11-05
2  c 2018-11-03 2018-11-05
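If you only need to repair a result that has already been cast to integers, a defensive conversion afterwards may be enough. This is a minimal sketch, assuming the integers are epoch nanoseconds. ensure_datetime is a hypothetical helper name, not a pandas function, and the broken frame only imitates the int64 result from the question:

```python
import pandas as pd


def ensure_datetime(frame, column):
    """Hypothetical helper: turn an integer column of epoch nanoseconds
    back into datetimes, and leave any other dtype untouched."""
    if pd.api.types.is_integer_dtype(frame[column]):
        converted = pd.to_datetime(frame[column], unit="ns")
        # assign returns a new DataFrame, so the input is not modified.
        frame = frame.assign(**{column: converted})
    return frame


# Stand-in for the int64 result shown in the question (an assumption,
# built by hand so the sketch does not depend on the pandas version).
broken = pd.DataFrame({
    "A": ["a", "b", "c"],
    "C": [1541376000000000000] * 3,
})

fixed = ensure_datetime(broken, "C")
print(fixed.dtypes)
```

Calling the helper on a column that is already datetime does nothing, so it should be safe to apply whether or not the cast happened.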

answered Nov 20 '18 at 6:50

jezrael
