Elasticsearch: get documents only when value changes



























I have an ES index containing documents like these (from, to, timestamp):



from_1,to_1,timestamp_1
from_1,to_1,timestamp_2
from_1,to_2,timestamp_3
from_2,to_3,timestamp_4
from_1,to_2,timestamp_5
from_2,to_3,timestamp_6
from_1,to_1,timestamp_7
from_2,to_4,timestamp_8


I need a query that returns a document only if its combination of from and to values differs from that of the previously seen document with the same from value.



So with the provided sample above:




  1. document with timestamp_1 should be in the result because there is no earlier document with from_1+to_1 combination

  2. document with timestamp_2 must be skipped because its from+to combination is exactly the same as the last seen document with from = from_1

  3. document with timestamp_3 should be in the result because its to field (to_2) differs from the value of the last seen document with the same from (to_1 in the document with timestamp_2)

  4. document with timestamp_4 should be in the result

  5. document with timestamp_5 must not be in the result because it has the same combination of from+to as the last seen with from_1 (document with timestamp_3)

  6. document with timestamp_6 must not be in the result because it has the same combination of from+to as the last seen with from_2 (document with timestamp_4)

  7. document with timestamp_7 should be in the result because its combination of from+to differs from that of the last seen document with from_1 (document with timestamp_5)

  8. document with timestamp_8 should be in the result because its combination is completely new so far
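
To make the rule above concrete, here is a minimal client-side sketch in Python. It assumes the documents are read in ascending timestamp order and are small enough to stream through one process; the function name and the tuple layout are illustrative only and not part of any Elasticsearch API.

```python
def changed_documents(docs):
    """Yield each (from, to, timestamp) tuple whose `to` value differs
    from the `to` of the previously seen tuple with the same `from`.

    `docs` must already be sorted by timestamp in ascending order.
    """
    last_to_by_from = {}  # from value -> `to` of the last seen document
    for from_value, to_value, timestamp in docs:
        # An unseen `from` or a changed `to` means the document is kept.
        if last_to_by_from.get(from_value) != to_value:
            yield (from_value, to_value, timestamp)
        last_to_by_from[from_value] = to_value


sample = [
    ("from_1", "to_1", "timestamp_1"),
    ("from_1", "to_1", "timestamp_2"),
    ("from_1", "to_2", "timestamp_3"),
    ("from_2", "to_3", "timestamp_4"),
    ("from_1", "to_2", "timestamp_5"),
    ("from_2", "to_3", "timestamp_6"),
    ("from_1", "to_1", "timestamp_7"),
    ("from_2", "to_4", "timestamp_8"),
]

kept = [timestamp for _, _, timestamp in changed_documents(sample)]
```

With the sample data, `kept` holds timestamp_1, timestamp_3, timestamp_4, timestamp_7 and timestamp_8, matching the list above.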


I need to fetch all such "semi-unique" documents from the index, so it would be nice if it is possible to use a scroll request, or after_key if an aggregation is used.



Any ideas how to approach it?

















      elasticsearch
















      asked Nov 19 '18 at 15:35









      martin-g
























          1 Answer














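          The closest thing I could come up with is the following (let me know if it does not work with your data). It returns the earliest document for each distinct from+to combination, so a combination that reappears later (such as the document with timestamp_7 in your sample) is not reported a second time.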



          {
            "size": 0,
            "aggs": {
              "from_and_to": {
                "composite": {
                  "size": 5,
                  "sources": [
                    {
                      "from_to_collected": {
                        "terms": {
                          "script": {
                            "lang": "painless",
                            "source": "doc['from'].value + '_' + doc['to'].value"
                          }
                        }
                      }
                    }
                  ]
                },
                "aggs": {
                  "top_from_and_to_hits": {
                    "top_hits": {
                      "size": 1,
                      "sort": [{"timestamp": {"order": "asc"}}],
                      "_source": {"includes": ["_id"]}
                    }
                  }
                }
              }
            }
          }


          Keep in mind that the terms aggregation can return approximate results on an index with several shards.



          The composite aggregation lets you page through the buckets over the from_to_collected key: take the after_key from each response and pass it as the after parameter of the next request.
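
The paging loop could look roughly like the Python sketch below. It is only an illustration: `search` is a hypothetical callable that sends a request body and returns the parsed JSON response (for example a thin wrapper around whichever Elasticsearch client you use), and `iter_composite_buckets` is a made-up helper name. The only things taken from Elasticsearch itself are the `after` request parameter and the `after_key` response field of the composite aggregation.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed response (dict).
    """
    body = copy.deepcopy(body)  # do not mutate the caller's query
    composite = body["aggs"][agg_name]["composite"]
    while True:
        response = search(body)
        result = response["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            break  # an empty page means there is nothing left
        for bucket in buckets:
            yield bucket
        after_key = result.get("after_key")
        if after_key is None:
            break
        # Resume the next request right after the last bucket returned.
        composite["after"] = after_key
```

Each yielded bucket would carry the `top_from_and_to_hits` sub-aggregation from the query above, so the earliest document per combination can be read from it.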














                edited Nov 20 '18 at 21:00

























                answered Nov 19 '18 at 22:05









                Benjamin Trent





























