Presto running slower than SQL Server
Configured the SQL Server connector in Presto and tried a few simple queries, such as:
Select count(0) from table_name
or
Select sum(column_name) from table_name
Both queries run in about 300 ms directly in SQL Server, but take over 3 minutes in Presto.
Below is the EXPLAIN ANALYZE output for the second query. It seems to do a table scan and fetch a huge amount of data before computing the sum. Why couldn't it push the sum operator down to SQL Server itself?
Query Plan
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fragment 1 [SINGLE]
    Cost: CPU 2.98ms, Input: 1 row (9B), Output: 1 row (9B)
    Output layout: [sum]
    Output partitioning: SINGLE
    - Aggregate(FINAL) => [sum:double]
            Cost: ?%, Output: 1 row (9B)
            Input avg.: 1.00 lines, Input std.dev.: 0.00%
            sum := "sum"("sum_4")
        - LocalExchange[SINGLE] () => sum_4:double
                Cost: ?%, Output: 1 row (9B)
                Input avg.: 0.06 lines, Input std.dev.: 387.30%
            - RemoteSource[2] => [sum_4:double]
                    Cost: ?%, Output: 1 row (9B)
                    Input avg.: 0.06 lines, Input std.dev.: 387.30%

Fragment 2 [SOURCE]
    Cost: CPU 1.67m, Input: 220770667 rows (1.85GB), Output: 1 row (9B)
    Output layout: [sum_4]
    Output partitioning: SINGLE
    - Aggregate(PARTIAL) => [sum_4:double]
            Cost: 0.21%, Output: 1 row (9B)
            Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
            sum_4 := "sum"("total_base_dtd")
        - TableScan[sqlserver:sqlserver:table_name:ivpSQLDatabase:table_name ..
                Cost: 99.79%, Output: 220770667 rows (1.85GB)
                Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
                total_base_dtd := JdbcColumnHandle{connectorId=sqlserver, columnName=total_base_dtd, columnType=double}
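The plan splits the sum into two stages: Aggregate(PARTIAL) in Fragment 2 sums the rows its source task reads, and Aggregate(FINAL) in Fragment 1 adds up the partial results. Below is a minimal Python sketch of that two-phase pattern; the function names are hypothetical and the plain lists merely stand in for table splits. It also counts rows scanned, to show that partial aggregation only shrinks the data exchanged between Presto stages, while every row still has to be read from the source first (Fragment 2 above reports reading all 220770667 rows, 1.85GB).

```python
from typing import List, Sequence, Tuple


def partial_sum(split: Sequence[float]) -> float:
    """Like Aggregate(PARTIAL): runs in the source stage and sees every row of its split."""
    total = 0.0
    for value in split:
        total += value
    return total


def final_sum(partials: Sequence[float]) -> float:
    """Like Aggregate(FINAL): combines the one-row partial results of all source tasks."""
    return sum(partials, 0.0)


def two_phase_sum(splits: List[Sequence[float]]) -> Tuple[float, int]:
    """Returns the total and the number of rows that had to be scanned to get it."""
    partials = [partial_sum(split) for split in splits]
    rows_scanned = sum(len(split) for split in splits)
    return final_sum(partials), rows_scanned


if __name__ == "__main__":
    # Two hypothetical splits; no matter how the rows are split, all of them are scanned.
    print(two_phase_sum([[1.0, 2.0], [3.0]]))  # (6.0, 3)
```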
sql-server prestodb
asked Nov 20 '18 at 16:46 by Neo
2 Answers
Both example queries are aggregate queries that produce a single-row result.
Currently, Presto cannot push an aggregation down to the underlying data store. Conditions (filters) and column selection (narrowing projections) are pushed down, but aggregations are not.
As a result, when you query SQL Server from Presto, Presto has to read all the data from the given column to compute the aggregation, which causes a lot of disk and network traffic. It may also be that SQL Server can optimize certain aggregations away and skip reading the data entirely (I am guessing here).
Presto is not designed to be a frontend to another database. It can be used that way, but this has implications. Presto shines when it is put to work as a big data query engine (over S3, HDFS, or other object stores) or as a federated query engine, where you combine data from multiple data stores / connectors.
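To make the cost concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server (an assumption purely for illustration; no Presto code is involved). `sum_without_pushdown` mimics the plan in the question by fetching every value of the column and summing on the client side, while `sum_with_pushdown` sends the `SUM` to the source and receives a single row. Both return the sum together with the number of rows transferred; the helper names are hypothetical.

```python
import sqlite3
from typing import Tuple


def build_source(n_rows: int) -> sqlite3.Connection:
    """In-memory SQLite database standing in for SQL Server (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (total_base_dtd REAL)")
    conn.executemany(
        "INSERT INTO table_name (total_base_dtd) VALUES (?)",
        ((float(i),) for i in range(n_rows)),
    )
    return conn


def sum_without_pushdown(conn: sqlite3.Connection) -> Tuple[float, int]:
    """No pushdown: fetch every value of the column and sum it on the client."""
    total, rows_transferred = 0.0, 0
    for (value,) in conn.execute("SELECT total_base_dtd FROM table_name"):
        total += value
        rows_transferred += 1
    return total, rows_transferred


def sum_with_pushdown(conn: sqlite3.Connection) -> Tuple[float, int]:
    """Pushdown: the source evaluates SUM and returns a single row."""
    rows = conn.execute("SELECT SUM(total_base_dtd) FROM table_name").fetchall()
    return rows[0][0], len(rows)


if __name__ == "__main__":
    source = build_source(1000)
    print(sum_without_pushdown(source))  # (499500.0, 1000)
    print(sum_with_pushdown(source))     # (499500.0, 1)
```

Both calls return the same total; only the number of rows that leave the source differs, and for the table in the question that is the 220 million-row scan shown in the plan.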
answered Nov 20 '18 at 21:33 by Piotr Findeisen
Presto doesn't support aggregate pushdown, but as a workaround you can create views in the source database (SQL Server, in your case) that perform the aggregation, and query those views from Presto.
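A minimal sketch of the view workaround, again using SQLite through Python's `sqlite3` module as a stand-in for SQL Server. The view name `table_name_totals` is hypothetical. In a real setup you would create the equivalent view in SQL Server and query it from Presto like any other table, for example `SELECT * FROM sqlserver.dbo.table_name_totals` (the `dbo` schema and the view name are assumptions); the source database then computes the aggregate, and only the one-row result crosses the connector.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_source_with_view(values: Iterable[float]) -> sqlite3.Connection:
    """SQLite stand-in for SQL Server: a base table plus an aggregating view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (total_base_dtd REAL)")
    conn.executemany(
        "INSERT INTO table_name (total_base_dtd) VALUES (?)",
        ((v,) for v in values),
    )
    # Hypothetical view: the aggregation is defined, and evaluated, in the source database.
    conn.execute(
        "CREATE VIEW table_name_totals AS "
        "SELECT COUNT(*) AS row_count, SUM(total_base_dtd) AS total FROM table_name"
    )
    return conn


def read_view(conn: sqlite3.Connection) -> List[Tuple[int, float]]:
    """A plain SELECT over the view, the kind of query a connector can pass through."""
    return conn.execute("SELECT row_count, total FROM table_name_totals").fetchall()


if __name__ == "__main__":
    conn = create_source_with_view([1.5, 2.5, 4.0])
    print(read_view(conn))  # [(3, 8.0)]
```

The trade-off is that the aggregation is fixed in the view definition, so each new grouping or aggregate needs its own view in the source database.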
answered Nov 23 '18 at 9:31 by burak emre