How to use dask to populate a DataFrame in a parallelized task?
I would like to use dask to parallelize a number-crunching task.
This task utilizes only one of the cores in my computer.
As a result of that task, I would like to add an entry to a DataFrame via shared_df.loc[len(shared_df)] = [x, 'y']. This DataFrame should be populated by all four parallel workers / threads in my computer.
How do I need to set up dask to do this?
python pandas python-multiprocessing python-multithreading dask
asked Nov 16 '18 at 7:58
user9098935
It looks to me like the same question you asked in this comment. Have a look at my comment for a toy example. Otherwise, please share an MCVE for this particular problem. It's not clear to me what [x, 'y'] are.
– user32185
Nov 16 '18 at 14:50
1 Answer
The right way to do something like this, in rough outline:
- make a function that, for a given argument, returns a DataFrame holding some part of the total data
- wrap this function in dask.delayed, build a list of delayed calls (one per input argument), and make a dask DataFrame from them with dd.from_delayed
- if you really need the index to be sorted, and the index to be partitioned along different lines than the chunking you applied in the previous step, you may want to call set_index
Please read the docstrings and examples for each of these steps!
answered Nov 18 '18 at 17:23
mdurant