openmpi runtime error: Hello World run on hosts
I'm trying to set up a cluster. So far I'm testing it with only 1 master and 1 slave. When I run the script from the master, it starts printing HelloWorld, but then I get the following error:
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
It keeps printing HelloWorld, and after a while:
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[62648,1],2]
Exit code: 2
Then the program stops. By chance I tried running the script from the slave, and it works. I can't figure out why.
I've set up passwordless SSH, and I'm running a file located in an NFS-mounted folder.
Can you help me?
Thanks
parallel-processing cluster-computing openmpi
Please post your code!
– Gilles Gouaillardet
Nov 20 '18 at 0:00
It is a simple HelloWorld in Python: while True: print('HelloWorld')
Then I do: mpirun -np 4 -hostfile myhosts python3 helloworld.py
Running it from the slave, mpirun works perfectly. I'm trying to figure out why the master isn't able to do the same.
– Fabio Semeraro
Nov 20 '18 at 0:04
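For reference, a hostfile like the myhosts file passed to mpirun above typically lists one node per line, optionally with a slot count; the hostnames and slot counts here are made-up examples, not taken from the question:

```
# myhosts — hypothetical example; use your actual node names
master slots=2
slave1 slots=2
```

The names must resolve (via /etc/hosts or DNS) to addresses reachable from the node where mpirun is launched.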
Can you simply run python3 helloworld.py on all your nodes?
– Gilles Gouaillardet
Nov 20 '18 at 0:09
In serial and in local parallel it works on all nodes. The error arises when I try to use both master and slave from the master, while from the slave I can run the command and everything works.
– Fabio Semeraro
Nov 20 '18 at 0:18
This program basically overflows stdout, so I am not sure what you expect. What if you run mpirun ... hostname? If that works, then I suggest you try an mpi4py helloworld.
– Gilles Gouaillardet
Nov 20 '18 at 0:24
asked Nov 19 '18 at 22:01 by Fabio Semeraro
1 Answer
SOLVED: I went through all the configuration files I had modified, and there was a mistake in /etc/hosts. That explains why the program worked when launched from the node toward the master but not vice versa. As for the program stopping, it was related to the node not being able to find the file to run; I fixed that by setting up the NFS share again.
Thanks for your help; I hope this can be useful to other users.
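For anyone hitting the same symptom: the kind of /etc/hosts mistake described above is usually a wrong or missing IP-to-hostname mapping. A working file for a two-node setup might look like this (the addresses and hostnames are illustrative, not from the question):

```
127.0.0.1     localhost
192.168.1.10  master
192.168.1.11  slave1
```

One common pitfall on Debian/Ubuntu installs is a line like 127.0.1.1 master on the master node: the master then resolves its own name to a loopback address that the slave cannot reach, so jobs launched from the master fail while jobs launched from the slave work.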
answered Dec 16 '18 at 20:17 by Fabio Semeraro