Spark: can I manually specify the number of partitions when calling textFile?
Spark automatically decides the number of partitions based on the size of the input file. I have two questions:
Can I specify the number of partitions myself rather than letting Spark decide how many to use?
How bad is the shuffle when repartitioning? Is it really that expensive? In my case I need to repartition to 1 in order to write a single Parquet file; the data had 31 partitions. How bad is that, and why?
apache-spark text-files hive-partitions
asked Nov 19 '18 at 5:19
bd zhang
1 Answer
repartition and coalesce are the two functions used to change the number of partitions once the data has been read.
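As a rough illustration of both points in the question, here is a minimal PySpark sketch. The function names and arguments are hypothetical; `sc` is assumed to be an existing SparkContext and `df` an existing DataFrame. It relies only on `textFile(path, minPartitions)`, `coalesce`, `repartition` and `write.parquet`. Note that `minPartitions` is a lower bound, so Spark may still create more partitions than requested.

```python
# Sketch only: `sc` (SparkContext) and `df` (DataFrame) are supplied by the
# caller. Function names and paths are hypothetical, not part of Spark.

def read_text_with_min_partitions(sc, path, min_partitions):
    """Read a text file, asking Spark for at least `min_partitions` partitions.

    The second argument of textFile is a minimum, so Spark may create more.
    """
    return sc.textFile(path, minPartitions=min_partitions)


def write_single_parquet(df, out_path, shuffle=False):
    """Collapse `df` to one partition and write it as Parquet.

    shuffle=False -> coalesce(1): no shuffle, but the work feeding the
                     write may end up running as a single task.
    shuffle=True  -> repartition(1): full shuffle of every row, while the
                     upstream stages keep their original parallelism.

    Spark writes `out_path` as a directory holding one part file.
    """
    single = df.repartition(1) if shuffle else df.coalesce(1)
    single.write.parquet(out_path)
    return single


# Hypothetical usage, assuming a running SparkSession named `spark`:
#   rdd = read_text_with_min_partitions(spark.sparkContext, "input.txt", 31)
#   write_single_parquet(df, "output_parquet")                # coalesce(1)
#   write_single_parquet(df, "output_parquet", shuffle=True)  # repartition(1)
```

On the performance side, going from 31 partitions to 1 with coalesce avoids the shuffle entirely, which is usually the cheaper choice when the upstream work is light. If the steps before the write are expensive, repartition(1) can be faster overall despite the shuffle, because those steps still run across all 31 partitions. Either way the final write is done by a single task.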
answered Nov 19 '18 at 7:40
BDA
I know, I mean the performance.
– bd zhang
Nov 19 '18 at 19:15