Presto running slower than SQL Server












I configured the SQL Server connector in Presto and tried a few simple queries like:

SELECT count(0) FROM table_name

or

SELECT sum(column_name) FROM table_name

Both queries run in about 300 ms directly on SQL Server, but take over 3 minutes through Presto.

This is the EXPLAIN ANALYZE output for the second query. It seems to do a full table scan and fetch a huge amount of data before computing the sum. Why can't Presto push the sum operator down to SQL Server itself?



    Query Plan                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fragment 1 [SINGLE]
    Cost: CPU 2.98ms, Input: 1 row (9B), Output: 1 row (9B)
    Output layout: [sum]
    Output partitioning: SINGLE
    - Aggregate(FINAL) => [sum:double]
            Cost: ?%, Output: 1 row (9B)
            Input avg.: 1.00 lines, Input std.dev.: 0.00%
            sum := "sum"("sum_4")
        - LocalExchange[SINGLE] () => sum_4:double
                Cost: ?%, Output: 1 row (9B)
                Input avg.: 0.06 lines, Input std.dev.: 387.30%
            - RemoteSource[2] => [sum_4:double]
                    Cost: ?%, Output: 1 row (9B)
                    Input avg.: 0.06 lines, Input std.dev.: 387.30%

Fragment 2 [SOURCE]
    Cost: CPU 1.67m, Input: 220770667 rows (1.85GB), Output: 1 row (9B)
    Output layout: [sum_4]
    Output partitioning: SINGLE
    - Aggregate(PARTIAL) => [sum_4:double]
            Cost: 0.21%, Output: 1 row (9B)
            Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
            sum_4 := "sum"("total_base_dtd")
        - TableScan[sqlserver:sqlserver:table_name:ivpSQLDatabase:table_name ..
                Cost: 99.79%, Output: 220770667 rows (1.85GB)
                Input avg.: 220770667.00 lines, Input std.dev.: 0.00%
                total_base_dtd := JdbcColumnHandle{connectorId=sqlserver, columnName=total_base_dtd, columnType=double}









      sql-server prestodb






asked Nov 20 '18 at 16:46 by Neo
























          2 Answers
Both example queries are aggregate queries that produce a single-row result. Currently, Presto cannot push an aggregation down to the underlying data store. Predicates and column selection (narrowing projections) are pushed down, but aggregations are not.

As a result, when you query SQL Server from Presto, Presto has to read every value of the referenced column to compute the aggregate, which means a lot of disk and network traffic. Your plan shows this: the TableScan returns 220770667 rows (1.85GB) and accounts for 99.79% of the cost. It is also possible that SQL Server optimizes certain aggregations so that it skips reading the data altogether (I am guessing here).

Presto is not well suited to being a front end for a single other database. It can be used that way, but with the implications described above. Presto shines when it is put to work as a big data query engine (over S3, HDFS or other object stores) or as a federated query engine, where you combine data from multiple data stores / connectors.







answered Nov 20 '18 at 21:33 by Piotr Findeisen

























Presto doesn't support aggregate pushdown, but as a workaround you can create views in the source database (SQL Server in your case) that perform the aggregation, and query those views from Presto.
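A minimal sketch of the view workaround, under stated assumptions: SQLite (via Python's sqlite3 module) stands in for SQL Server so the example can run anywhere, the three sample rows are made up, and the view name table_name_totals and its column names row_count and total_base_dtd_sum are hypothetical. The CREATE VIEW statement itself uses only COUNT and SUM, so the same statement should also be accepted by SQL Server.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server so this sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (total_base_dtd REAL)")
conn.executemany(
    "INSERT INTO table_name (total_base_dtd) VALUES (?)",
    [(1.5,), (2.5,), (4.0,)],  # made-up sample data
)

# The view performs the aggregation inside the source database.
# The view and column names below are hypothetical.
conn.execute(
    """
    CREATE VIEW table_name_totals AS
    SELECT COUNT(*) AS row_count,
           SUM(total_base_dtd) AS total_base_dtd_sum
    FROM table_name
    """
)

# A client that can only scan what the source exposes now receives
# one pre-aggregated row instead of every base row.
rows = conn.execute(
    "SELECT row_count, total_base_dtd_sum FROM table_name_totals"
).fetchall()
print(rows)
```

From Presto the view would then be queried like any other table in the SQL Server catalog, so the scan returns a single row rather than the whole column. The trade-off is that each aggregate has to be decided in advance and defined as a view on the SQL Server side.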







answered Nov 23 '18 at 9:31 by burak emre





























