openmpi runtime error: Hello World run on hosts












I'm trying to set up a cluster. So far I'm testing it with only one master and one slave. Running the script from the master, it starts printing HelloWorld, but then I get the following error:



Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


It keeps printing HelloWorld, and after a while:



mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: 
Process name: [[62648,1],2]
Exit code: 2


Then the code stops. By chance I tried running the script from the slave and it works; I can't figure out why.
I've set up passwordless SSH, and the script is located in an NFS-mounted folder.
Can you help me?



Thanks
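
For reference, a two-node run like this needs an Open MPI hostfile that lists both machines. The sketch below uses placeholder hostnames and slot counts, not the actual cluster configuration:

    # myhosts -- one line per node; names and slot counts here are hypothetical
    master slots=2
    slave  slots=2

The job is then launched from either node with something like:

    mpirun -np 4 -hostfile myhosts python3 helloworld.py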










  • Please post your code!

    – Gilles Gouaillardet
    Nov 20 '18 at 0:00











  • It is a simple HelloWorld in Python: while True: print('HelloWorld'). Then I do: mpirun -np 4 -hostfile myhosts python3 helloworld.py. Running it from the slave, mpirun works perfectly; I'm trying to figure out why the master isn't able to do the same.

    – Fabio Semeraro
    Nov 20 '18 at 0:04













  • Can you simply run python3 helloworld.py on all your nodes?

    – Gilles Gouaillardet
    Nov 20 '18 at 0:09











  • In serial and in local parallel it works on all nodes. The error arises when I try to use both master and slave from the master, while from the slave I can run the command and everything works.

    – Fabio Semeraro
    Nov 20 '18 at 0:18











  • This program basically floods stdout, so I am not sure what you expect. What if you mpirun ... hostname? If that works, then I suggest you try an mpi4py helloworld.

    – Gilles Gouaillardet
    Nov 20 '18 at 0:24
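
Following the last suggestion, a minimal mpi4py helloworld might look like the sketch below. This assumes the mpi4py package is installed on every node; each rank prints one line and exits instead of looping forever:

    # helloworld_mpi.py -- hypothetical mpi4py test script
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    # Report rank, world size and the host this rank runs on, then exit cleanly.
    print("Hello from rank %d of %d on %s"
          % (comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))

Run with mpirun -np 4 -hostfile myhosts python3 helloworld_mpi.py, it should print exactly one line per rank; a rank that cannot start (missing interpreter, or a script not visible on one node) is what shows up as the non-zero exit status mpirun reports.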
















parallel-processing cluster-computing openmpi






asked Nov 19 '18 at 22:01









Fabio Semeraro

1 Answer

SOLVED: I went through all the configuration files I had modified and finally found a mistake in /etc/hosts. That explains why the program worked when launched from the node towards the master but not vice versa. As for the program stopping, it was related to the node not being able to find the file to run; I fixed that by setting up NFS again.
Thanks for your help, I hope this can be useful to other users.
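
For illustration, the consistency the fix relies on is that /etc/hosts on every node maps the same hostnames to addresses the other nodes can actually reach; the entries below are placeholders, not the real configuration:

    # /etc/hosts, identical on master and slave (addresses and names are hypothetical)
    127.0.0.1     localhost
    192.168.1.10  master
    192.168.1.11  slave

A common mistake of this kind is leaving a node's own hostname mapped to 127.0.0.1 or 127.0.1.1, which lets local runs succeed while remote launches from the other machine fail.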






        answered Dec 16 '18 at 20:17









Fabio Semeraro
