Pandas - How to write a better for/while loop using Pandas
I'm new to Pandas. Currently I have a Series like this one:
import pandas as pd
index = [x for x in range(75860, 76510, 10)]
# number of occurrence
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7, 7, 7, 8, 6, 6, 7, 15, 23, 26, 30, 31, 28, 22, 22, 21, 19, 14, 15, 15, 14, 12, 12, 13, 14, 14, 15, 15, 19, 19, 23, 25, 34, 38, 39, 40, 41, 35, 35, 30, 26, 23, 23, 29, 25, 25, 25, 23, 21, 19, 16, 14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)
This Series represents measurements (the index labels) and how many times each one has been counted (the values).
I'm trying to calculate custom parameters, but I'm using standard Python loops. I'd like to know if there's a better way to accomplish that. Here is one of the functions.
Thanks for your help.
# Return the limits within which 68% of the total count took place.
# Starting from the most_counted length, we add the highest count closest to the most_counted length.
# If the 2 counts are equal we look at the next labels; the side with the highest count is chosen.
def active_area(sample_ser):
    # this is the label with the most occurrences
    most_counted = 76310
    target = sample_ser.sum()*0.68
    total_count = 0
    high_label = most_counted + 10
    low_label = most_counted - 10
    while total_count < target:
        # index out of bounds (low side): only the high side can grow
        if low_label < sample_ser.index[0]:
            total_count += sample_ser[high_label]
            high_label += 10
            continue
        # index out of bounds (high side): only the low side can grow
        if high_label >= sample_ser.index[-1]:
            total_count += sample_ser[low_label]
            low_label -= 10
            continue
        h_len = sample_ser[high_label]
        l_len = sample_ser[low_label]
        if h_len > l_len:
            total_count += h_len
            high_label += 10
            continue
        if h_len < l_len:
            total_count += l_len
            low_label -= 10
            continue
        if h_len == l_len:
            # tie: look further out on both sides until one side wins
            counter = 10
            while True:
                temp_high = high_label+counter
                temp_low = low_label-counter
                if temp_low < sample_ser.index[0]:
                    total_count += h_len
                    high_label += 10
                    break
                if temp_high >= sample_ser.index[-1]:
                    total_count += l_len
                    low_label -= 10
                    break
                h_len_temp = sample_ser[temp_high]
                l_len_temp = sample_ser[temp_low]
                if h_len_temp > l_len_temp:
                    total_count += h_len
                    high_label += 10
                    break
                if h_len_temp < l_len_temp:
                    total_count += l_len
                    low_label -= 10
                    break
                if h_len_temp == l_len_temp:
                    counter += 10
                    continue
    # clamp the limits to the index range
    if low_label < sample_ser.index[0]:
        low_label = sample_ser.index[0]
    if high_label >= sample_ser.index[-1]:
        high_label = sample_ser.index[-1]
    return high_label, low_label
Edit: I removed 3 of the 4 loops from the original question, to make it easier to answer.
python pandas
First, index = [x for x in range(75860, 76510, 10)] -> index = list(range(75860, 76510, 10))
– Matthieu Brucher
Nov 17 '18 at 11:05
It would be better if you ask one question per question. Right now I'm not going to even attempt to answer all of these, and answering just one is not good because (1) ultimately you're expected to "accept" one of the answers, so what if they're all independently useful, and (2) we can't know which of them is most important to you. Perhaps as a start you'd consider editing your question to only contain one chunk of code you want help with, and you can use the help you receive to work on the other chunks yourself.
– John Zwinck
Nov 17 '18 at 11:08
Hi John, thanks for the suggestion. I'll leave just one loop, then I will try to figure out the rest.
– ilma
Nov 17 '18 at 11:20
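As a small illustration of Matthieu Brucher's comment, the two ways of building the index give the same list. This is only a sketch; the names `index_comprehension` and `index_list` are made up for the comparison and do not appear in the question.

```python
# Both expressions build the same list of labels; list(range(...)) just
# skips the redundant comprehension.
index_comprehension = [x for x in range(75860, 76510, 10)]
index_list = list(range(75860, 76510, 10))

print(index_comprehension == index_list)               # True
print(len(index_list), index_list[0], index_list[-1])  # 65 75860 76500
```

The answer further down passes the `range` object to `pd.Series` directly, without converting it to a list first.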
edited Nov 17 '18 at 11:32 by ilma
asked Nov 17 '18 at 10:19 by ilma
1 Answer
Try the following (in my opinion more Pythonic) script.
I added some test printouts. In the final version, remove them
and convert the main processing part into a function.
import pandas as pd

def nxt(ser, kk: int):
    """Get key / value from ser for key == kk. If the given key is absent, return (-1, 0)"""
    if kk in ser.index:
        val = ser[kk]
        return (kk, val)
    else:
        return (-1, 0)

# Create test Series
index = range(75860, 76510, 10)
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)

# Processing
target = sample_ser.sum()*0.68  # Target limit
# Index of the max value. Low / high indices start also from here
idmax = low_ind = high_ind = sample_ser.idxmax()
trg = sample_ser[idmax]  # The max value
while True:
    # Get index / value for elements before / after the current range
    l_ind, l_val = nxt(sample_ser, low_ind - 10)
    h_ind, h_val = nxt(sample_ser, high_ind + 10)
    # Diagnostic printout - part 1
    print(f'L: {l_ind:5} {l_val:2} R: {h_ind:5} {h_val:2}', end=' ')
    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):
        # Previous element found, previous value higher,
        # sum of values within the target limit
        trg += l_val       # Add the current (left) value
        low_ind = l_ind    # Set new lower index
        side = 'Left:'     # For diagnostic printout
    elif (h_ind >= 0) and (trg + h_val) <= target:
        # Next element found, sum of values within the target limit
        trg += h_val       # Add the current (right) value
        high_ind = h_ind   # Set new upper index
        side = 'Right:'    # For diagnostic printout
    else:
        print()            # Diagnostic printout - instead of part 2
        break
    # Diagnostic printout - part 2
    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')
print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')
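The answer suggests removing the printouts and wrapping the processing part in a function. One possible way to do that is sketched below. The function name `active_range` and the parameters `fraction` and `step` are assumptions made for this sketch (they are not in the answer), and the sketch keeps the answer's rule unchanged: take the left neighbour only when it is strictly larger and still fits under the target, otherwise try the right neighbour.

```python
import pandas as pd

def active_range(ser, fraction=0.68, step=10):
    """Grow a label range outward from the max until `fraction` of the
    total would be exceeded. Hypothetical helper; `step` is the assumed
    spacing between consecutive labels."""
    target = ser.sum() * fraction
    low = high = ser.idxmax()   # start at the label with the highest count
    total = ser[low]
    while True:
        left, right = low - step, high + step
        has_left, has_right = left in ser.index, right in ser.index
        l_val = ser[left] if has_left else 0    # a missing neighbour counts as 0
        r_val = ser[right] if has_right else 0
        if has_left and l_val > r_val and total + l_val <= target:
            total += l_val
            low = left
        elif has_right and total + r_val <= target:
            total += r_val
            high = right
        else:
            break
    return int(low), int(high), int(total)

value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=range(75860, 76510, 10))

print(active_range(sample_ser))  # (76140, 76470, 778)
```

Note that this follows the answer's logic, not the question's: the count at the maximum itself is included in the total, the loop stops before exceeding the target, and ties go to the right side instead of looking further out.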
Thanks for this solution, I will try to implement it and report back. So there's no more "Pandas-onic" way to do this kind of thing? Do we need to implement a custom for/while loop?
– ilma
Nov 20 '18 at 16:52
Using Pandas methods (especially vectorized ones) is easy when the task is based on sequential processing of rows in a DataFrame or elements in a Series. Here the situation is different: you have to start from the "maximal" element in the Series and extend the range either up or down, based on the elements "enclosing" the current range. I'm afraid there is no other way than a custom loop, as I did. Or maybe someone else will propose another (more "Pandasonic") solution?
– Valdi_Bo
Nov 20 '18 at 17:21
Thank you very much. In the meantime I'm using your snippet and it works as intended! From my little knowledge of Pandas, I see no other way to implement something like this in pure Pandas, as you said. Let's see if someone can help with this subject.
– ilma
Nov 22 '18 at 7:17
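To illustrate the contrast drawn in Valdi_Bo's comment above, here is a sketch of a related but simpler task that does vectorize, because it processes the Series strictly in order: finding the first label at which the running total reaches 68% of the overall count. This is not the two-sided expansion from the question, only an example of the "sequential" kind of task; the names `running` and `first_label` are assumptions for this sketch.

```python
import pandas as pd

value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=range(75860, 76510, 10))

target = sample_ser.sum() * 0.68
running = sample_ser.cumsum()               # running total, left to right
first_label = (running >= target).idxmax()  # first label where the mask is True

print(first_label)  # 76320
```

The two-sided version cannot be written this way, since which neighbour gets added at each step depends on the choices made in all the previous steps.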
answered Nov 18 '18 at 10:09 by Valdi_Bo