Elasticsearch: get documents only when value changes

I have an ES index with documents like these:

from_1,to_1,timestamp_1
from_1,to_1,timestamp_2
from_1,to_2,timestamp_3
from_2,to_3,timestamp_4
from_1,to_2,timestamp_5
from_2,to_3,timestamp_6
from_1,to_1,timestamp_7
from_2,to_4,timestamp_8

I need a query that returns a document only if its combination of from and to values differs from that of the previous document (in timestamp order) with the same from value.

With the sample above:

  1. The document with timestamp_1 should be in the result because there is no earlier document with the from_1+to_1 combination.

  2. The document with timestamp_2 must be skipped because its from+to combination is exactly the same as that of the last seen document with from = from_1 (the document with timestamp_1).

  3. The document with timestamp_3 should be in the result because its to value (to_2) differs from the to value of the last seen document with the same from (to_1 in the document with timestamp_1).

  4. The document with timestamp_4 should be in the result because it is the first document with from_2.

  5. The document with timestamp_5 must not be in the result because it has the same from+to combination as the last seen document with from_1 (the document with timestamp_3).

  6. The document with timestamp_6 must not be in the result because it has the same from+to combination as the last seen document with from_2 (the document with timestamp_4).

  7. The document with timestamp_7 should be in the result because its from+to combination differs from that of the last seen document with from_1 (the document with timestamp_3).

  8. The document with timestamp_8 should be in the result because its combination is completely new so far.

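The rule in the list above can be sketched client-side in Python. This is a minimal sketch, not an Elasticsearch query: it assumes the documents arrive in ascending timestamp order and represents each one as a hypothetical (from, to, timestamp) tuple, with the timestamps reduced to integers for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical document shape: (from, to, timestamp).
Doc = Tuple[str, str, int]


def semi_unique(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Yield each document whose `to` differs from the last `to` seen
    for the same `from`. Assumes `docs` is sorted by timestamp."""
    last_to: Dict[str, str] = {}  # from value -> last seen to value
    for frm, to, ts in docs:
        # An unseen `from` gives None, which never equals a string `to`.
        if last_to.get(frm) != to:
            yield (frm, to, ts)
        last_to[frm] = to


sample = [
    ("from_1", "to_1", 1), ("from_1", "to_1", 2),
    ("from_1", "to_2", 3), ("from_2", "to_3", 4),
    ("from_1", "to_2", 5), ("from_2", "to_3", 6),
    ("from_1", "to_1", 7), ("from_2", "to_4", 8),
]
print([ts for _, _, ts in semi_unique(sample)])  # [1, 3, 4, 7, 8]
```

The only state is one entry per distinct from value, so memory grows with the number of from values rather than with the number of documents.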

I need to fetch all such "semi-unique" documents from the index, so it would be nice if it were possible to use a scroll request, or after_key if an aggregation is used.

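Because the rule depends on order, a scroll request would have to be sorted by timestamp, and the last-seen state has to survive page boundaries. A minimal Python sketch under those assumptions: the hypothetical helper scroll_body builds a search body (the field names come from the question; the page size and the _source filter are illustrative choices), and the hypothetical semi_unique_hits consumes an iterable of search responses shaped like Elasticsearch's hits.hits. How the responses are fetched is left out. Documents with equal timestamps have no defined order unless a tiebreaker sort field is added.

```python
from typing import Any, Dict, Iterable, Iterator

Hit = Dict[str, Any]


def scroll_body(page_size: int = 1000) -> Dict[str, Any]:
    # Sort by timestamp so "previous" means "earlier in time".
    return {
        "size": page_size,
        "sort": [{"timestamp": {"order": "asc"}}],
        "_source": ["from", "to", "timestamp"],
    }


def semi_unique_hits(responses: Iterable[Dict[str, Any]]) -> Iterator[Hit]:
    # Kept outside the page loop so the state carries across scroll pages.
    last_to: Dict[Any, Any] = {}
    for response in responses:
        for hit in response["hits"]["hits"]:
            src = hit["_source"]
            if src["from"] not in last_to or last_to[src["from"]] != src["to"]:
                yield hit
            last_to[src["from"]] = src["to"]
```

The page split does not affect the result: with the question's eight documents spread over several responses, the yielded hits are still those with timestamp_1, timestamp_3, timestamp_4, timestamp_7 and timestamp_8.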

Any ideas how to approach it?

      asked Nov 19 '18 at 15:35
martin-g

          1 Answer

          The closest thing I could come up with is the following (let me know if it does not work with your data).

    {
      "size": 0,
      "aggs": {
        "from_and_to": {
          "composite": {
            "size": 5,
            "sources": [
              {
                "from_to_collected": {
                  "terms": {
                    "script": {
                      "lang": "painless",
                      "source": "doc['from'].value + '_' + doc['to'].value"
                    }
                  }
                }
              }
            ]
          },
          "aggs": {
            "top_from_and_to_hits": {
              "top_hits": {
                "size": 1,
                "sort": [{"timestamp": {"order": "asc"}}],
                "_source": {"includes": ["_id"]}
              }
            }
          }
        }
      }
    }

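To see what this query computes, here is a small Python simulation of it on the question's sample (a sketch, not Elasticsearch itself; the function name earliest_per_pair is hypothetical). Each bucket key is built the same way as the Painless script, the top_hits sub-aggregation with size 1 and ascending timestamp sort keeps the earliest document per key, and composite buckets come back in ascending key order, which is the default. Timestamps are reduced to integers for illustration.

```python
from typing import Dict, List, Tuple

Doc = Tuple[str, str, int]  # (from, to, timestamp), illustrative only


def earliest_per_pair(docs: List[Doc]) -> List[Tuple[str, int]]:
    earliest: Dict[str, Doc] = {}
    for doc in docs:
        key = doc[0] + "_" + doc[1]  # same key as the Painless script
        if key not in earliest or doc[2] < earliest[key][2]:
            earliest[key] = doc  # mimics top_hits: size 1, timestamp asc
    # Composite buckets are returned sorted by key (ascending by default).
    return [(key, earliest[key][2]) for key in sorted(earliest)]


sample = [
    ("from_1", "to_1", 1), ("from_1", "to_1", 2),
    ("from_1", "to_2", 3), ("from_2", "to_3", 4),
    ("from_1", "to_2", 5), ("from_2", "to_3", 6),
    ("from_1", "to_1", 7), ("from_2", "to_4", 8),
]
print(earliest_per_pair(sample))
```

On this sample the buckets point at timestamp_1, timestamp_3, timestamp_4 and timestamp_8. The question also expects timestamp_7, which repeats the from_1+to_1 pair after a change to to_2, so a grouping with one bucket per pair cannot return it.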

Keep in mind that terms aggregations are probabilistic.

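The composite aggregation lets you page through the buckets of the from_to_collected key: pass the after_key returned in each response as the after parameter of the composite aggregation in the next request.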

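The after_key paging can be sketched as a loop. This is a minimal sketch assuming a hypothetical search callable that sends a request body and returns the parsed JSON response (for example a thin wrapper around your client's search call); the generator name iter_composite_buckets is also hypothetical, and the aggregation name from_and_to matches the query above. Each request copies the body and sets composite.after to the previous response's after_key, stopping when a page comes back without buckets.

```python
import copy
from typing import Any, Callable, Dict, Iterator, Optional

Body = Dict[str, Any]


def iter_composite_buckets(
    search: Callable[[Body], Body],  # hypothetical: body -> parsed response
    body: Body,
    agg_name: str = "from_and_to",
) -> Iterator[Dict[str, Any]]:
    after: Optional[Dict[str, Any]] = None
    while True:
        request = copy.deepcopy(body)  # leave the caller's body untouched
        if after is not None:
            request["aggs"][agg_name]["composite"]["after"] = after
        agg = search(request)["aggregations"][agg_name]
        buckets = agg["buckets"]
        yield from buckets
        after = agg.get("after_key")
        # An empty page (or a missing after_key) means every bucket was seen.
        if not buckets or after is None:
            return
```

Each yielded bucket still carries its top_from_and_to_hits sub-aggregation, so the document ids can be collected page by page.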

                edited Nov 20 '18 at 21:00

                answered Nov 19 '18 at 22:05
Benjamin Trent