Pandas dataframe find first and last element given condition and calculate slope
The situation:
I have a pandas dataframe with data about the production of a product. The product is produced in 3 phases. The phases are not fixed in length, meaning that the number of cycles (how long each phase lasts) varies. During each production phase, the temperature of the product is measured at every cycle.
Please see the table below:
The problem:
I need to calculate the slope for each cycle of each phase for each product, and add it to the dataframe in a new column called "Slope". The one you can see highlighted in yellow was added by me manually in an Excel file. The real dataset contains hundreds of parameters (not only temperatures), so in reality I need to calculate the slope for many, many columns. That is why I tried to define a function.
My solution is not working at all:
This is the code I tried. I am trying to get the first and last row for the given product and phase, then take the temperature from those two rows and use their difference to calculate the slope.
This is all I could come up with so far (I created another column called "Max_cylce_no", which stores the maximum cycle number for each phase):
temp_at_start = -1
def slope(col_name):
    global temp_at_start
    start_cycle_no = 1
    if row["Cycle"] == 1:
        temp_at_start = row["Temperature"]
        start_row = df.index(row)
        cycle_numbers = row["Max_cylce_no"]
        last_cycle_row = cycle_numbers + start_row
        last_temp = df.loc[last_cycle_row, "Temperature"]
And the way I would like to apply it:
df.apply(slope("Temperature"), axis=1)
Unfortunately, I get a NameError right away: name 'row' is not defined.
Could you please help me and point me in the right direction to solve this problem? It is giving me a really hard time. :(
Thank you in advance!
python pandas dataframe
asked Nov 10 at 19:19
hunsnowboarder
Providing images as a source of data is not really helpful if we want to try our solutions. Can you provide the data as text?
– Yuca
Nov 10 at 19:26
1 Answer
accepted
I believe you need GroupBy.transform with a function that subtracts the first value of each group from the last and divides by the group length:
f = lambda x: (x.iloc[-1] - x.iloc[0]) / len(x)
df['new'] = df.groupby(['Product_no','Phase_no'])['Temperature'].transform(f)
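As a minimal sketch of the approach above on made-up data, the same transform can be applied to several measurement columns at once. The sample values, the extra "Pressure" column, and the helper name add_slopes are assumptions for illustration, not part of the original question. The sketch also assumes the rows are already ordered by cycle within each phase, so that the first and last row of a group are its first and last cycle.

```python
import pandas as pd

# Hypothetical sample data: two phases of product 1 and one phase of product 2.
df = pd.DataFrame({
    "Product_no": [1, 1, 1, 1, 1, 2, 2],
    "Phase_no": [1, 1, 1, 2, 2, 1, 1],
    "Cycle": [1, 2, 3, 1, 2, 1, 2],
    "Temperature": [20.0, 23.0, 26.0, 30.0, 34.0, 10.0, 16.0],
    "Pressure": [1.0, 1.5, 4.0, 2.0, 2.0, 5.0, 3.0],
})


def add_slopes(frame, value_cols, group_cols=("Product_no", "Phase_no")):
    """Return a copy of `frame` with one '<col>_slope' column per value column.

    The slope follows the definition used in the answer:
    (last value - first value) / number of rows in the group.
    """
    out = frame.copy()
    grouped = frame.groupby(list(group_cols))
    for col in value_cols:
        # transform broadcasts the per-group scalar back onto every row of the group.
        out[col + "_slope"] = grouped[col].transform(
            lambda x: (x.iloc[-1] - x.iloc[0]) / len(x)
        )
    return out


result = add_slopes(df, ["Temperature", "Pressure"])
print(result)
```

Dividing by len(x) follows the answer as written. If the slope is meant per interval between cycles rather than per row, len(x) - 1 may be the more appropriate divisor (with a guard for single-row groups); which one is right depends on how the slope is defined for the dataset.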
answered Nov 10 at 19:27
jezrael
Nice one, I believe this is the output the OP required.
– pygo
Nov 10 at 19:49
You are amazing! Thank you so much! It works like a charm!
– hunsnowboarder
Nov 10 at 19:59
@hunsnowboarder - You are welcome!
– jezrael
Nov 10 at 19:59
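On the NameError in the question: df.apply(slope("Temperature"), axis=1) calls slope immediately, before apply ever runs, and slope has no parameter named row, so the name is undefined inside it. DataFrame.apply expects the function object itself, calls it with each row as the first argument, and forwards extra keyword arguments. The following is a minimal sketch with made-up data; the function name delta_from_start, the "Temp_at_start" and "Delta" columns, and the sample values are assumptions for illustration. It also hints at why a row-wise function is awkward for this task: each row needs a value from elsewhere in its group, which is what groupby provides directly.

```python
import pandas as pd

# Hypothetical sample data: two phases of a single product.
df = pd.DataFrame({
    "Product_no": [1, 1, 1, 1, 1],
    "Phase_no": [1, 1, 1, 2, 2],
    "Temperature": [20.0, 23.0, 26.0, 30.0, 34.0],
})

# First temperature of each (product, phase) group, repeated on every row of the group.
df["Temp_at_start"] = (
    df.groupby(["Product_no", "Phase_no"])["Temperature"].transform("first")
)


def delta_from_start(row, col_name):
    # `row` is supplied by DataFrame.apply; `col_name` arrives as a keyword argument.
    return row[col_name] - row["Temp_at_start"]


# Pass the function itself (no parentheses); apply forwards col_name to it.
df["Delta"] = df.apply(delta_from_start, axis=1, col_name="Temperature")
print(df)
```

For large frames the vectorised groupby/transform form in the answer is usually preferable to a row-wise apply, which calls the Python function once per row.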