Presto running slower than SQL Server

I configured the SQL Server connector in Presto and tried a few simple queries, such as:

Select count(0) from table_name

or,

Select sum(column_name) from table_name

Both queries run in about 300 ms on SQL Server, but take over 3 minutes in Presto.

This is the EXPLAIN ANALYZE output for the second query. It appears to do a full table scan and fetch a huge amount of data before computing the sum. Why couldn't it push the sum operator down to SQL Server itself?

    Query Plan
--------------------------------------------------------------------------------
 Fragment 1 [SINGLE]
     Cost: CPU 2.98ms, Input: 1 row (9B), Output: 1 row (9B)
     Output layout: [sum]
     Output partitioning: SINGLE
     - Aggregate(FINAL) => [sum:double]
             Cost: ?%, Output: 1 row (9B)
             Input avg.: 1.00 lines, Input std.dev.: 0.00%
             sum := "sum"("sum_4")
         - LocalExchange[SINGLE] () => sum_4:double
                 Cost: ?%, Output: 1 row (9B)
                 Input avg.: 0.06 lines, Input std.dev.: 387.30%
             - RemoteSource[2] => [sum_4:double]
                     Cost: ?%, Output: 1 row (9B)
                     Input avg.: 0.06 lines, Input std.dev.: 387.30%

 Fragment 2 [SOURCE]
     Cost: CPU 1.67m, Input: 220770667 rows (1.85GB), Output: 1 row (9B)
     Output layout: [sum_4]
     Output partitioning: SINGLE
     - Aggregate(PARTIAL) => [sum_4:double]
             Cost: 0.21%, Output: 1 row (9B)
             Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
             sum_4 := "sum"("total_base_dtd")
         - TableScan[sqlserver:sqlserver:table_name:ivpSQLDatabase:table_name  ..
                 Cost: 99.79%, Output: 220770667 rows (1.85GB)
                 Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
                 total_base_dtd := JdbcColumnHandle{connectorId=sqlserver, columnName=total_base_dtd, columnType=double}

asked Nov 20 '18 at 16:46

Neo

sql-server prestodb

2 Answers
Both example queries are aggregate queries that produce a single-row result.
Currently, Presto cannot push an aggregation down to the underlying data store. Conditions and column selection (narrowing projections) are pushed down, but aggregations are not.

As a result, when you query SQL Server from Presto, Presto needs to read all the data (from the given column) to do the aggregation, so there is a lot of disk and network traffic. Also, SQL Server may be able to optimize away certain aggregations and skip reading the data altogether (I am guessing here).
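To make the difference concrete, here is a minimal Python sketch of the two access patterns. It is not Presto code: an in-memory SQLite database stands in for the SQL Server table (an assumption made purely for illustration), and the row counts show how much data crosses the boundary in each case.

```python
import sqlite3

# In-memory SQLite stands in for the remote SQL Server table (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name REAL)")
conn.executemany(
    "INSERT INTO table_name (column_name) VALUES (?)",
    [(float(i),) for i in range(1000)],
)

# No aggregate pushdown: only the needed column is selected (projection pushdown),
# but every row is fetched and the sum is computed on the engine side.
rows = conn.execute("SELECT column_name FROM table_name").fetchall()
rows_transferred_without_pushdown = len(rows)
sum_in_engine = sum(value for (value,) in rows)

# Aggregate evaluated inside the source database: a single row comes back.
result = conn.execute("SELECT sum(column_name) FROM table_name").fetchall()
rows_transferred_with_pushdown = len(result)
sum_in_source = result[0][0]

print(rows_transferred_without_pushdown, rows_transferred_with_pushdown)
```

In the plan from the question, the first pattern corresponds to the TableScan emitting 220770667 rows (1.85GB) into the partial aggregation.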

Presto is not well suited to being a frontend to another database. It can be used as such, but this has some implications. Presto shines when it is put to work as a big data query engine (over S3, HDFS or other object stores) or as a federated query engine, where you combine data from multiple data stores / connectors.

answered Nov 20 '18 at 21:33

Piotr Findeisen


Presto doesn't support aggregate pushdown, but as a workaround you can create views in the source database (SQL Server in your case) and query those views from Presto.
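A rough sketch of the idea, again using an in-memory SQLite database as a stand-in for SQL Server. The view name `table_name_totals` and its column aliases are hypothetical; on SQL Server you would issue an equivalent `CREATE VIEW` in T-SQL and then select from the view through the Presto catalog, assuming the connector exposes views as tables.

```python
import sqlite3

# In-memory SQLite plays the role of the source database (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name REAL)")
conn.executemany(
    "INSERT INTO table_name (column_name) VALUES (?)",
    [(1.5,), (2.5,), (4.0,)],
)

# Hypothetical view defined in the source database; it holds the aggregation logic.
conn.execute(
    "CREATE VIEW table_name_totals AS "
    "SELECT count(*) AS row_count, sum(column_name) AS column_total "
    "FROM table_name"
)

# The querying engine scans the view like a table and receives one pre-aggregated row.
view_rows = conn.execute(
    "SELECT row_count, column_total FROM table_name_totals"
).fetchall()
print(view_rows)
```

The trade-off is that each aggregation you need has to be defined ahead of time in the source database, so this suits fixed reports better than ad hoc queries.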

answered Nov 23 '18 at 9:31

burak emre
