Pandas - How to write a better for/while loop using Pandas












I'm new to pandas. Currently I have a Series like this one:



import pandas as pd  

index = [x for x in range(75860, 76510, 10)]
# number of occurrence
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7, 7, 7, 8, 6, 6, 7, 15, 23, 26, 30, 31, 28, 22, 22, 21, 19, 14, 15, 15, 14, 12, 12, 13, 14, 14, 15, 15, 19, 19, 23, 25, 34, 38, 39, 40, 41, 35, 35, 30, 26, 23, 23, 29, 25, 25, 25, 23, 21, 19, 16, 14, 7, 6, 4, 1]

sample_ser = pd.Series(value, index=index)


This Series represents measurements (the index labels) and how many times each one has been counted (the values).

I'm trying to calculate custom parameters, but I'm using standard Python loops. I want to know if there's a better way to accomplish that. Here is one of the functions.

Thanks for your help.



# Return the limits within which 68% of the total count took place.
# Starting from the most counted length, we add the highest count closest to it.
# If two counts are equal we look at the next labels; the side with the highest count is chosen.

def active_area(sample_ser):

    # this is the label with the most occurrences
    most_counted = 76310

    target = sample_ser.sum()*0.68

    total_count = 0

    high_label = most_counted + 10
    low_label = most_counted - 10

    while total_count < target:
        # low label out of bounds: only the high side can grow
        if low_label < sample_ser.index[0]:
            total_count += sample_ser[high_label]
            high_label += 10
            continue
        # high label out of bounds: only the low side can grow
        if high_label >= sample_ser.index[-1]:
            total_count += sample_ser[low_label]
            low_label -= 10
            continue

        h_len = sample_ser[high_label]
        l_len = sample_ser[low_label]

        if h_len > l_len:
            total_count += h_len
            high_label += 10
            continue

        if h_len < l_len:
            total_count += l_len
            low_label -= 10
            continue

        if h_len == l_len:
            # tie: look further out on both sides until one side wins
            counter = 10
            while True:

                temp_high = high_label+counter
                temp_low = low_label-counter

                if temp_low < sample_ser.index[0]:
                    total_count += h_len
                    high_label += 10
                    break

                if temp_high >= sample_ser.index[-1]:
                    total_count += l_len
                    low_label -= 10
                    break

                h_len_temp = sample_ser[temp_high]
                l_len_temp = sample_ser[temp_low]

                if h_len_temp > l_len_temp:
                    total_count += h_len
                    high_label += 10
                    break

                if h_len_temp < l_len_temp:
                    total_count += l_len
                    low_label -= 10
                    break

                if h_len_temp == l_len_temp:
                    counter += 10
                    continue

    # clamp the limits to the first and last labels of the Series
    if low_label < sample_ser.index[0]:
        low_label = sample_ser.index[0]
    if high_label >= sample_ser.index[-1]:
        high_label = sample_ser.index[-1]

    return high_label, low_label




Edit: I removed three of the four loops from the original question to make it easier to answer.
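As a hedged side note, the pieces of the function that do not depend on the step-by-step expansion can be written with plain pandas calls. The sketch below derives the most counted label with `idxmax()` instead of hard-coding it, computes the 68% target, and adds a small helper for checking how much of the total a pair of limits covers. The helper name `coverage` is hypothetical and not part of the question.

```python
import pandas as pd

index = list(range(75860, 76510, 10))
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)

# Label with the highest count, derived from the data instead of hard-coded.
most_counted = sample_ser.idxmax()

# 68% of all counts: the threshold the expansion loop works towards.
target = sample_ser.sum() * 0.68


def coverage(ser, low_label, high_label):
    """Share of the total count between two labels (hypothetical helper).

    .loc slicing by label includes both end points.
    """
    return ser.loc[low_label:high_label].sum() / ser.sum()


print(most_counted, target)
print(coverage(sample_ser, sample_ser.index[0], sample_ser.index[-1]))
```

Calling `coverage` on the limits returned by any version of the expansion loop is a quick way to confirm that the selected range really holds roughly 68% of the counts.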










python pandas






edited Nov 17 '18 at 11:32
ilma

asked Nov 17 '18 at 10:19
ilma













  • First, index = [x for x in range(75860, 76510, 10)] -> index = list(range(75860, 76510, 10))

    – Matthieu Brucher
    Nov 17 '18 at 11:05











  • It would be better if you ask one question per question. Right now I'm not going to even attempt to answer all of these, and answering just one is not good because (1) ultimately you're expected to "accept" one of the answers, so what if they're all independently useful, and (2) we can't know which of them is most important to you. Perhaps as a start you'd consider editing your question to only contain one chunk of code you want help with, and you can use the help you receive to work on the other chunks yourself.

    – John Zwinck
    Nov 17 '18 at 11:08











  • Hi John, thanks for the suggestion. I'll leave just one loop, then I will try to figure out the rest.

    – ilma
    Nov 17 '18 at 11:20
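The simplification suggested in the first comment can be checked directly. This is a minimal sketch showing that both spellings build the same list, and that `pd.Series` also accepts a `range` as its index without converting it first. The variable names are illustrative only.

```python
import pandas as pd

# The comprehension from the question and the shorter spelling from the comment.
labels_comprehension = [x for x in range(75860, 76510, 10)]
labels_list = list(range(75860, 76510, 10))
same = labels_comprehension == labels_list
print(same)

# A range can also be passed as the index directly; the scalar 0 is
# broadcast to every label.
ser = pd.Series(0, index=range(75860, 76510, 10))
print(len(ser), ser.index[0], ser.index[-1])
```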





















1 Answer
Try the following (in my opinion more pythonic) script.



I added some test printouts. In the final version, remove them
and convert the main processing part into a function.



import pandas as pd

def nxt(ser, kk : int):
    """Get key / value from ser for key == kk. If the given key absent, return (-1, 0)"""
    if kk in ser.index:
        val = ser[kk]
        return (kk, val)
    else:
        return (-1, 0)

# Create test Series
index = range(75860, 76510, 10)
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
    7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
    31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
    12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
    25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
    23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
    14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)

# Processing
target = sample_ser.sum()*0.68  # Target limit
# Index of the max value. Low / high indices start also from here
idmax = low_ind = high_ind = sample_ser.idxmax()
trg = sample_ser[idmax]  # The max value
while True:
    # Get index / value for elements before / after the current range
    l_ind, l_val = nxt(sample_ser, low_ind - 10)
    h_ind, h_val = nxt(sample_ser, high_ind + 10)
    # Diagnostic printout - part 1
    print(f'L: {l_ind:5} {l_val:2} R: {h_ind:5} {h_val:2}', end=' ')
    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):
        # Previous element found, previous value higher,
        # sum of values within the target limit
        trg += l_val      # Add the current (left) value
        low_ind = l_ind   # Set new lower index
        side = 'Left:'    # For diagnostic printout
    elif (h_ind >= 0) and (trg + h_val) <= target:
        # Next element found, sum of values within the target limit
        trg += h_val      # Add the current (right) value
        high_ind = h_ind  # Set new upper index
        side = 'Right:'   # For diagnostic printout
    else:
        print()  # Diagnostic printout - instead of part 2
        break
    # Diagnostic printout - part 2
    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')
print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')
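Following the note above about removing the printouts and turning the processing part into a function, one possible shape is sketched here. It is an illustration, not the answer author's final code: the name `active_range` and the `fraction` and `step` parameters are assumptions, and it keeps the same greedy rule as the script (take the left neighbour only when it is strictly larger, otherwise the right one, and stop before the sum would exceed the target).

```python
import pandas as pd


def active_range(ser, fraction=0.68, step=10):
    """Grow a label range outward from the peak (hypothetical function name).

    Returns (low_label, high_label, total) where total is the sum of the
    values inside the range and never exceeds fraction * ser.sum().
    """
    target = ser.sum() * fraction
    low = high = ser.idxmax()
    total = ser.loc[low]
    while True:
        left, right = low - step, high + step
        has_left = left in ser.index
        has_right = right in ser.index
        # A missing neighbour counts as 0, like nxt() returning (-1, 0).
        l_val = ser.loc[left] if has_left else 0
        r_val = ser.loc[right] if has_right else 0
        if has_left and l_val > r_val and total + l_val <= target:
            total += l_val
            low = left
        elif has_right and total + r_val <= target:
            total += r_val
            high = right
        else:
            break
    return low, high, total


index = range(75860, 76510, 10)
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)

low, high, total = active_range(sample_ser)
print(low, high, total)
```

Note that this rule differs from the function in the question in two ways: ties always go to the right instead of looking further out, and the range stops just below the target rather than just above it.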





answered Nov 18 '18 at 10:09









Valdi_Bo













  • Thanks for this solution, I will try to implement it and report back. So there's no more Panda-onic way to do this kind of thing? Do we need to implement a custom for/while loop?

    – ilma
    Nov 20 '18 at 16:52











  • Using pandas methods (especially vectorized ones) is easy when the task is based on sequential processing of rows in a DataFrame or elements in a Series. Here the situation is different: you have to start from the "maximal" element in the Series and extend the range either up or down, based on the elements "enclosing" the current range. I'm afraid there is no other way but a custom loop, as I did. Or maybe someone else will propose another (more Pandasonic) solution?

    – Valdi_Bo
    Nov 20 '18 at 17:21













  • Thank you very much. In the meantime I'm using your snippet and it works as intended! From my limited knowledge of pandas, I see no other way to implement something like this in pure pandas, as you said. Let's see if someone else can help with this subject.

    – ilma
    Nov 22 '18 at 7:17
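To illustrate the contrast drawn in the comments above, here is a sketch of a strictly sequential question on the same data that does vectorize: the first label, scanning from the left, at which the running total reaches 68% of all counts. This is a different quantity from the active area around the peak and is shown only as an example of what `cumsum()` handles without a loop.

```python
import pandas as pd

index = range(75860, 76510, 10)
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=index)

target = sample_ser.sum() * 0.68

# Running total from the first label to each label, computed in one call.
running = sample_ser.cumsum()

# Boolean mask keeps only labels where the running total has reached the
# target; the first of those is the answer.
first_label = running[running >= target].index[0]
print(first_label)
```

The expansion from the peak cannot be expressed this way because each step depends on which side was chosen in the previous step, which is why the answer falls back to an explicit loop.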




































