Filling the missing data in a timeseries by making an average time series























I have an hourly time series in pandas covering about 10 years. In places, data is missing for 2-3 months at a time. I want to fill these gaps, and the process I have in mind is as follows.




  1. Create a one-year hourly time series from the available data by
    averaging the values for each day of the year and hour across all years.

  2. Fill the missing values from this average time series.

  3. For example, if 2009/01/28 1:00pm is missing, look up 01/28 1:00pm in the time series calculated in the first step and use that value to fill it.


I have searched a lot, but I could not work out how to do this.



Any help will be appreciated.



Edit:
This is my attempt so far. I am still testing it, but it takes a long time to run.



import math

# Step 1: for every timestamp in the one-year series, average all values
# in df that share the same month, day and hour.
for count in dfaverage.index:
    match_timestamp = '{:02}-{:02} {:02}'.format(count.month, count.day, count.hour)
    # Select the data column so that .mean() returns a single number.
    value = df.loc[df.index.strftime('%m-%d %H') == match_timestamp, 'AtmPressurekPa'].mean()
    # .at assigns in place; dfaverage.loc[count]['value'] = ... may write to a copy.
    dfaverage.at[count, 'value'] = value

# Step 2: fill every missing value in df from the matching average.
for count in df.index:
    # math.isnan needs a scalar, so read a single cell rather than a whole row.
    if math.isnan(df.at[count, 'AtmPressurekPa']):
        match_timestamp = '{:02}-{:02} {:02}'.format(count.month, count.day, count.hour)
        value = dfaverage.loc[dfaverage.index.strftime('%m-%d %H') == match_timestamp, 'value'].mean()
        df.at[count, 'AtmPressurekPa'] = value


The idea is to iterate through each element of the empty dfaverage series, find the values in the main dataframe (df) with the same month, day, and hour, average them, and assign the result to that element.

Afterwards, the dfaverage time series is used to fill the missing values in df.
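
For comparison, the two loops can probably be replaced by a single vectorized group-by. The following is only a minimal sketch under stated assumptions: the function name fill_from_hourly_average and the demo data are made up for illustration, and the index is assumed to be an hourly DatetimeIndex. It groups the values by month, day, and hour, broadcasts each group's mean back onto the original index, and uses that only where a value is missing.

```python
import numpy as np
import pandas as pd


def fill_from_hourly_average(series: pd.Series) -> pd.Series:
    """Fill NaNs with the mean of all values sharing the same month, day and hour."""
    idx = series.index
    keys = [idx.month, idx.day, idx.hour]
    # transform("mean") returns a series aligned with the input, holding each
    # group's mean; NaNs are skipped when the mean is computed.
    average = series.groupby(keys).transform("mean")
    # Existing values are kept; only the NaNs are taken from the average.
    return series.fillna(average)


# Hypothetical demo data: three years of hourly values that depend only on
# month and hour, so the multi-year average reproduces them exactly.
index = pd.date_range("2008-01-01", "2010-12-31 23:00", freq=pd.Timedelta(hours=1))
truth = index.month.to_numpy() * 100.0 + index.hour.to_numpy()
series = pd.Series(truth.copy(), index=index, name="AtmPressurekPa")

# Knock out a two-month gap, similar to the one described in the question.
series.loc["2009-02-01":"2009-03-31"] = np.nan

filled = fill_from_hourly_average(series)
print(series.isna().sum(), "missing before,", filled.isna().sum(), "missing after")
```

Two caveats with this sketch: a slot that is missing in every year stays NaN because there is nothing to average, and Feb 29 is averaged over leap years only.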






























  • 2




    Where's your attempt? Please put the code which you tried.
    – Mayank Porwal
    Nov 12 at 6:50






  • 1




    Hi. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful.
    – jezrael
    Nov 12 at 7:39















python-3.x pandas














edited Nov 13 at 19:40

























asked Nov 12 at 6:48









Anurag Mishra

























