How to use dask to populate a DataFrame in a parallelized task?












I would like to use dask to parallelize a number-crunching task.

This task uses only one of the cores in my computer.

As a result of that task, I would like to add an entry to a DataFrame via shared_df.loc[len(shared_df)] = [x, 'y']. This DataFrame should be populated by all (four) parallel workers / threads in my computer.

How do I have to set up dask to do this?










  • It looks to me like the same question you asked in this comment. Have a look at my comment for a toy example. Otherwise, please share an MCVE for this particular problem. It's not clear to me what [x, 'y'] are.

    – user32185
    Nov 16 '18 at 14:50
















python pandas python-multiprocessing python-multithreading dask






asked Nov 16 '18 at 7:58







user9098935





















1 Answer
The right way to do something like this, in rough outline:

  • write a function that, for a given argument, returns a DataFrame holding one part of the total data

  • wrap this function in dask.delayed, build a list of calls, one per input argument, and create a dask DataFrame from that list with dd.from_delayed

  • if you really need the index to be sorted, and the data partitioned along different lines than the chunking applied in the previous step, you may want to call set_index

Please read the docstrings and examples for each of these steps!






        answered Nov 18 '18 at 17:23









        mdurant
