Pandas - How to write a better for/while loop using Pandas

I'm new to Pandas. Currently I have a Series like this one:

import pandas as pd

index = [x for x in range(75860, 76510, 10)]
# number of occurrences
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7, 7, 7, 8, 6, 6, 7, 15, 23, 26, 30, 31, 28, 22, 22, 21, 19, 14, 15, 15, 14, 12, 12, 13, 14, 14, 15, 15, 19, 19, 23, 25, 34, 38, 39, 40, 41, 35, 35, 30, 26, 23, 23, 29, 25, 25, 25, 23, 21, 19, 16, 14, 7, 6, 4, 1]

sample_ser = pd.Series(value, index=index)

This Series represents measurements and how many times each one has been counted.

I'm trying to calculate custom parameters, but I'm using standard Python loops. I'd like to know if there's a better way to accomplish that. Here is one of the functions.

Thanks for your help.

# return the limits within which 68% of the total count took place
# starting from the most_counted label, we add the higher of the two counts closest to it
# if the 2 counts are equal we look at the next labels; the side with the higher count is chosen

def active_area(sample_ser):

    # this is the label with the most occurrences
    most_counted = 76310

    target = sample_ser.sum()*0.68

    total_count = 0

    high_label = most_counted + 10
    low_label = most_counted - 10

    while total_count < target:
        # low index out of bounds
        if low_label < sample_ser.index[0]:
            total_count += sample_ser[high_label]
            high_label += 10
            continue
        # high index out of bounds
        if high_label >= sample_ser.index[-1]:
            total_count += sample_ser[low_label]
            low_label -= 10
            continue

        h_len = sample_ser[high_label]
        l_len = sample_ser[low_label]

        if h_len > l_len:
            total_count += h_len
            high_label += 10
            continue

        if h_len < l_len:
            total_count += l_len
            low_label -= 10
            continue

        if h_len == l_len:
            counter = 10
            while True:

                temp_high = high_label+counter
                temp_low = low_label-counter

                if temp_low < sample_ser.index[0]:
                    total_count += h_len
                    high_label += 10
                    break

                if temp_high >= sample_ser.index[-1]:
                    total_count += l_len
                    low_label -= 10
                    break

                h_len_temp = sample_ser[temp_high]
                l_len_temp = sample_ser[temp_low]

                if h_len_temp > l_len_temp:
                    total_count += h_len
                    high_label += 10
                    break

                if h_len_temp < l_len_temp:
                    total_count += l_len
                    low_label -= 10
                    break

                if h_len_temp == l_len_temp:
                    counter += 10
                    continue

    if low_label < sample_ser.index[0]:
        low_label = sample_ser.index[0]
    if high_label >= sample_ser.index[-1]:
        high_label = sample_ser.index[-1]

    return high_label, low_label

Edit: I removed 3 of the 4 loops from the original question to make it easier to answer.

edited Nov 17 '18 at 11:32

asked Nov 17 '18 at 10:19

ilma

First, index = [x for x in range(75860, 76510, 10)] -> index = list(range(75860, 76510, 10))

– Matthieu Brucher
Nov 17 '18 at 11:05

It would be better if you ask one question per question. Right now I'm not going to even attempt to answer all of these, and answering just one is not good because (1) ultimately you're expected to "accept" one of the answers, so what if they're all independently useful, and (2) we can't know which of them is most important to you. Perhaps as a start you'd consider editing your question to only contain one chunk of code you want help with, and you can use the help you receive to work on the other chunks yourself.

– John Zwinck
Nov 17 '18 at 11:08

Hi John, thanks for the suggestion. I'll leave just one for loop, then I will try to figure out the rest.

– ilma
Nov 17 '18 at 11:20

python pandas

1 Answer

Try the following script, which is (in my opinion) more Pythonic.

I added some diagnostic printouts. In the final version, remove them
and convert the main processing part into a function.

import pandas as pd

def nxt(ser, kk: int):
    """Get key / value from ser for key == kk. If the given key is absent, return (-1, 0)"""
    if kk in ser.index:
        val = ser[kk]
        return (kk, val)
    else:
        return (-1, 0)

# Create test Series
index = range(75860, 76510, 10)
value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
     7,  7,  8,  6,  6,  7, 15, 23, 26, 30,
    31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
    12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
    25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
    23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
    14,  7,  6,  4,  1]
sample_ser = pd.Series(value, index=index)

# Processing
target = sample_ser.sum()*0.68  # Target limit
# Index of the max value. Low / high indices also start from here
idmax = low_ind = high_ind = sample_ser.idxmax()
trg = sample_ser[idmax]    # The max value
while True:
    # Get index / value for elements before / after the current range
    l_ind, l_val = nxt(sample_ser, low_ind - 10)
    h_ind, h_val = nxt(sample_ser, high_ind + 10)
    # Diagnostic printout - part 1
    print(f'L: {l_ind:5} {l_val:2}   R: {h_ind:5} {h_val:2}', end='    ')
    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):
        # Previous element found, previous value higher,
        # sum of values within the target limit
        trg += l_val      # Add the current (left) value
        low_ind = l_ind   # Set new lower index
        side = 'Left:'    # For diagnostic printout
    elif (h_ind >= 0) and (trg + h_val) <= target:
        # Next element found, sum of values within the target limit
        trg += h_val      # Add the current (right) value
        high_ind = h_ind  # Set new upper index
        side = 'Right:'   # For diagnostic printout
    else:
        print()           # Diagnostic printout - instead of part 2
        break
    # Diagnostic printout - part 2
    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')
print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')
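
For reference, here is a minimal sketch of what the function form might look like once the printouts are removed. The name `expand_range` and the `step` and `fraction` parameters are assumptions made for illustration and are not part of the script above; the selection rule is copied from the loop as written (a missing neighbour counts as 0).

```python
import pandas as pd


def expand_range(ser, step=10, fraction=0.68):
    """Hypothetical function form of the loop above, without printouts.

    Assumes an integer index with a constant spacing of `step`.
    Returns (low label, high label, accumulated count).
    """
    target = ser.sum() * fraction
    low = high = ser.idxmax()          # start at the label of the max value
    total = ser[low]
    while True:
        left_key, right_key = low - step, high + step
        has_left = left_key in ser.index
        has_right = right_key in ser.index
        left_val = ser[left_key] if has_left else 0
        right_val = ser[right_key] if has_right else 0
        if has_left and left_val > right_val and total + left_val <= target:
            total += left_val          # extend the range to the left
            low = left_key
        elif has_right and total + right_val <= target:
            total += right_val         # extend the range to the right
            high = right_key
        else:
            break                      # neither side fits within the target
    return int(low), int(high), int(total)


value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
         7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
         31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
         12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
         25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
         23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
         14, 7, 6, 4, 1]
sample_ser = pd.Series(value, index=range(75860, 76510, 10))

print(expand_range(sample_ser))  # (76140, 76470, 778)
```

Note that this keeps the behaviour of the script: it stops before the sum exceeds 68% of the total, whereas the function in the question keeps adding until the target is reached.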

answered Nov 18 '18 at 10:09

Valdi_Bo


Thanks for this solution, I will try to implement it and report back. So there's no more Pandas-onic way to do this kind of thing? Do we need to implement a custom for/while loop?

– ilma
Nov 20 '18 at 16:52

Using Pandas methods (especially vectorized ones) is easy when the task is based on sequential processing of rows in a DataFrame or elements in a Series. Here the situation is different: you have to start from the maximal element in the Series and extend the range either up or down, based on the elements enclosing the current range. I'm afraid there is no other way but a custom loop, as I wrote. Or maybe someone else will propose another (more Pandasonic) solution?

– Valdi_Bo
Nov 20 '18 at 17:21
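
To illustrate the point about the custom loop: each step depends on the range chosen so far, so the loop stays, but it can work on positions instead of label lookups. The sketch below is not Valdi_Bo's code. `expand_by_position` is a hypothetical name, it takes plain sequences (for example `sample_ser.tolist()` and `sample_ser.index.tolist()`), and it assumes the same rule as the answer's loop, with a missing neighbour counted as 0.

```python
def expand_by_position(values, labels, fraction=0.68):
    """Greedy expansion around the largest value, using positions only.

    Hypothetical helper: `values` and `labels` are equal-length sequences.
    Returns (low label, high label, accumulated count).
    """
    n = len(values)
    target = sum(values) * fraction
    # Position of the first maximum
    lo = hi = max(range(n), key=values.__getitem__)
    total = values[lo]
    while True:
        # Neighbours just outside the current window; None when off the edge
        left = values[lo - 1] if lo > 0 else None
        right = values[hi + 1] if hi + 1 < n else None
        right_or_zero = 0 if right is None else right
        if left is not None and left > right_or_zero and total + left <= target:
            lo -= 1
            total += left
        elif right is not None and total + right <= target:
            hi += 1
            total += right
        else:
            break
    return labels[lo], labels[hi], total


values = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,
          7, 7, 8, 6, 6, 7, 15, 23, 26, 30,
          31, 28, 22, 22, 21, 19, 14, 15, 15, 14,
          12, 12, 13, 14, 14, 15, 15, 19, 19, 23,
          25, 34, 38, 39, 40, 41, 35, 35, 30, 26,
          23, 23, 29, 25, 25, 25, 23, 21, 19, 16,
          14, 7, 6, 4, 1]
labels = list(range(75860, 76510, 10))

print(expand_by_position(values, labels))  # (76140, 76470, 778)
```

Because it never touches the index inside the loop, this version does not depend on the labels being evenly spaced.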

Thank you very much. In the meantime I'm using your snippet and it works as intended! From my limited knowledge of Pandas, I see no other way to implement something like this in pure Pandas, as you said. Let's see if someone else can help with this.

– ilma
Nov 22 '18 at 7:17

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53350278%2fpandas-how-to-write-a-better-for-while-loop-using-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Try the following (in my opinion more pythonic) script.

I added some test printouts. In the final version remove them
and convert the main processintg part into a function.

import pandas as pd



def nxt(ser, kk : int):

    """Get key / value from ser for key == kk. If the given key absent, return (-1, 0)"""

    if kk in ser.index:

        val = ser[kk]

        return (kk, val)

    else:

        return (-1, 0)



# Create test Series

index = range(75860, 76510, 10)

value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,

     7,  7,  8,  6,  6,  7, 15, 23, 26, 30,

    31, 28, 22, 22, 21, 19, 14, 15, 15, 14,

    12, 12, 13, 14, 14, 15, 15, 19, 19, 23,

    25, 34, 38, 39, 40, 41, 35, 35, 30, 26,

    23, 23, 29, 25, 25, 25, 23, 21, 19, 16,

    14,  7,  6,  4,  1]

sample_ser = pd.Series(value, index=index)



# Processing

target = sample_ser.sum()*0.68  # Target limit

# Index of the max value. Low / high indices start also from here

idmax = low_ind = high_ind = sample_ser.idxmax()

trg = sample_ser[idmax]    # The max value

while 1:

    # Get index / value for elements before / after the current range

    l_ind, l_val = nxt(sample_ser, low_ind - 10)

    h_ind, h_val = nxt(sample_ser, high_ind + 10)

    # Diagnostic printout - part 1

    print(f'L: {l_ind:5} {l_val:2}   R: {h_ind:5} {h_val:2}', end='    ')

    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):

        # Previous element found, previous value higher,

        # sum of values within the target limit

        trg += l_val      # Add the current (left) value

        low_ind = l_ind   # Set new lower index

        side = 'Left:'    # For diagnostic printout

    elif (h_ind >= 0) and (trg + h_val) <= target:

        # Next element found, sum of values within the target limit

        trg += h_val      # Add the current (right) value

        high_ind = h_ind  # Set new upper index

        side = 'Right:'   # For diagnostic printout

    else:

        print()           # Diagnostic printout - instead of part 2

        break

    # Diagnostic printout - part 2

    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')

print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')

answered Nov 18 '18 at 10:09

Valdi_Bo

4,7052715

Thanks for this solution, i will try to implement and report back.... So there's no more Panda-onic way to do this kind of things? We need to implement custom for/while loop?

– ilma
Nov 20 '18 at 16:52

Using Pandas methods (especially vectorized) is easy when the task is based on sequential processing of rows in DataFrame or elements in a Series. Here the thing is different: You have to start from "maximal" element in the Series and extend the range either up or down, based on elements "enclosing" the current range. I'm afraid, there is no other way but the custom loop, as I did. Or maybe someone else will propose other (more Pandasonic) solution?

– Valdi_Bo
Nov 20 '18 at 17:21

Thank you very much, in the mean time i'm using your snippet and works as intended! From my little knowledge of Pandas i see no other way to implement something like this in pure Pandas like you said. Let's see if someone can help with this subject.

– ilma
Nov 22 '18 at 7:17

add a comment |

Try the following (in my opinion more pythonic) script.

I added some test printouts. In the final version remove them
and convert the main processintg part into a function.

import pandas as pd



def nxt(ser, kk : int):

    """Get key / value from ser for key == kk. If the given key absent, return (-1, 0)"""

    if kk in ser.index:

        val = ser[kk]

        return (kk, val)

    else:

        return (-1, 0)



# Create test Series

index = range(75860, 76510, 10)

value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,

     7,  7,  8,  6,  6,  7, 15, 23, 26, 30,

    31, 28, 22, 22, 21, 19, 14, 15, 15, 14,

    12, 12, 13, 14, 14, 15, 15, 19, 19, 23,

    25, 34, 38, 39, 40, 41, 35, 35, 30, 26,

    23, 23, 29, 25, 25, 25, 23, 21, 19, 16,

    14,  7,  6,  4,  1]

sample_ser = pd.Series(value, index=index)



# Processing

target = sample_ser.sum()*0.68  # Target limit

# Index of the max value. Low / high indices start also from here

idmax = low_ind = high_ind = sample_ser.idxmax()

trg = sample_ser[idmax]    # The max value

while 1:

    # Get index / value for elements before / after the current range

    l_ind, l_val = nxt(sample_ser, low_ind - 10)

    h_ind, h_val = nxt(sample_ser, high_ind + 10)

    # Diagnostic printout - part 1

    print(f'L: {l_ind:5} {l_val:2}   R: {h_ind:5} {h_val:2}', end='    ')

    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):

        # Previous element found, previous value higher,

        # sum of values within the target limit

        trg += l_val      # Add the current (left) value

        low_ind = l_ind   # Set new lower index

        side = 'Left:'    # For diagnostic printout

    elif (h_ind >= 0) and (trg + h_val) <= target:

        # Next element found, sum of values within the target limit

        trg += h_val      # Add the current (right) value

        high_ind = h_ind  # Set new upper index

        side = 'Right:'   # For diagnostic printout

    else:

        print()           # Diagnostic printout - instead of part 2

        break

    # Diagnostic printout - part 2

    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')

print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')

answered Nov 18 '18 at 10:09

Valdi_Bo

4,7052715

Thanks for this solution, i will try to implement and report back.... So there's no more Panda-onic way to do this kind of things? We need to implement custom for/while loop?

– ilma
Nov 20 '18 at 16:52

Using Pandas methods (especially vectorized) is easy when the task is based on sequential processing of rows in DataFrame or elements in a Series. Here the thing is different: You have to start from "maximal" element in the Series and extend the range either up or down, based on elements "enclosing" the current range. I'm afraid, there is no other way but the custom loop, as I did. Or maybe someone else will propose other (more Pandasonic) solution?

– Valdi_Bo
Nov 20 '18 at 17:21

Thank you very much, in the mean time i'm using your snippet and works as intended! From my little knowledge of Pandas i see no other way to implement something like this in pure Pandas like you said. Let's see if someone can help with this subject.

– ilma
Nov 22 '18 at 7:17

add a comment |

Try the following (in my opinion more pythonic) script.

I added some test printouts. In the final version remove them
and convert the main processintg part into a function.

import pandas as pd



def nxt(ser, kk : int):

    """Get key / value from ser for key == kk. If the given key absent, return (-1, 0)"""

    if kk in ser.index:

        val = ser[kk]

        return (kk, val)

    else:

        return (-1, 0)



# Create test Series

index = range(75860, 76510, 10)

value = [1, 1, 4, 6, 7, 7, 7, 7, 8, 7,

     7,  7,  8,  6,  6,  7, 15, 23, 26, 30,

    31, 28, 22, 22, 21, 19, 14, 15, 15, 14,

    12, 12, 13, 14, 14, 15, 15, 19, 19, 23,

    25, 34, 38, 39, 40, 41, 35, 35, 30, 26,

    23, 23, 29, 25, 25, 25, 23, 21, 19, 16,

    14,  7,  6,  4,  1]

sample_ser = pd.Series(value, index=index)



# Processing

target = sample_ser.sum()*0.68  # Target limit

# Index of the max value. Low / high indices start also from here

idmax = low_ind = high_ind = sample_ser.idxmax()

trg = sample_ser[idmax]    # The max value

while 1:

    # Get index / value for elements before / after the current range

    l_ind, l_val = nxt(sample_ser, low_ind - 10)

    h_ind, h_val = nxt(sample_ser, high_ind + 10)

    # Diagnostic printout - part 1

    print(f'L: {l_ind:5} {l_val:2}   R: {h_ind:5} {h_val:2}', end='    ')

    if (l_ind >= 0) and (l_val > h_val) and (trg + l_val <= target):

        # Previous element found, previous value higher,

        # sum of values within the target limit

        trg += l_val      # Add the current (left) value

        low_ind = l_ind   # Set new lower index

        side = 'Left:'    # For diagnostic printout

    elif (h_ind >= 0) and (trg + h_val) <= target:

        # Next element found, sum of values within the target limit

        trg += h_val      # Add the current (right) value

        high_ind = h_ind  # Set new upper index

        side = 'Right:'   # For diagnostic printout

    else:

        print()           # Diagnostic printout - instead of part 2

        break

    # Diagnostic printout - part 2

    print(f'{side:<6} {low_ind:5} {high_ind:5} {trg:3}')

print(f'Result: {low_ind:5} {high_ind:5} {trg:3}')

answered Nov 18 '18 at 10:09

Valdi_Bo

4,7052715



Thanks for this solution, I will try to implement it and report back. So there's no more Pandasonic way to do this kind of thing? Do we need to implement a custom for/while loop?

– ilma
Nov 20 '18 at 16:52

Using Pandas methods (especially vectorized ones) is easy when the task is based on sequential processing of rows in a DataFrame or elements in a Series. Here the situation is different: you have to start from the maximal element in the Series and extend the range either up or down, based on the elements enclosing the current range. I'm afraid there is no other way but a custom loop, as I did. Or maybe someone else will propose another (more Pandasonic) solution?

– Valdi_Bo
Nov 20 '18 at 17:21

Thank you very much. In the meantime I'm using your snippet and it works as intended! From my limited knowledge of Pandas, I see no other way to implement something like this in pure Pandas, as you said. Let's see if someone else can help with this subject.

– ilma
Nov 22 '18 at 7:17

