Elasticsearch: get documents only when value changes

I have an ES index with documents like these:

from_1,to_1,timestamp_1
from_1,to_1,timestamp_2
from_1,to_2,timestamp_3
from_2,to_3,timestamp_4
from_1,to_2,timestamp_5
from_2,to_3,timestamp_6
from_1,to_1,timestamp_7
from_2,to_4,timestamp_8

I need a query that returns a document only if its combination of from and to values differs from that of the previously seen document (in timestamp order) with the same from value.

With the sample above:

document with timestamp_1 should be in the result because there is no earlier document with the from_1+to_1 combination

document with timestamp_2 must be skipped because its from+to combination is exactly the same as that of the last seen document with from = from_1

document with timestamp_3 should be in the result because its to field (to_2) differs from the to value of the last seen document with the same from (to_1 in the document with timestamp_1)

document with timestamp_4 should be in the result because there is no earlier document with from_2

document with timestamp_5 must not be in the result because it has the same from+to combination as the last seen document with from_1 (the document with timestamp_3)

document with timestamp_6 must not be in the result because it has the same from+to combination as the last seen document with from_2 (the document with timestamp_4)

document with timestamp_7 should be in the result because its from+to combination differs from that of the last seen document with from_1 (the document with timestamp_3)

document with timestamp_8 should be in the result because its combination is completely new so far
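The rule described above can be written as a small client-side function. This is a sketch, not an Elasticsearch feature: it assumes the documents are already sorted by timestamp and are represented as hypothetical (from, to, timestamp) tuples. It only needs to remember the last to value seen for each from value.

```python
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[str, str, int]  # hypothetical (from, to, timestamp) representation


def changed_documents(docs: Iterable[Doc]) -> List[Doc]:
    """Keep a document only if its `to` differs from the last `to` seen for its `from`.

    Assumes `docs` is already sorted by timestamp (ascending).
    """
    last_to: Dict[str, str] = {}  # from value -> to value of the latest doc with that from
    result: List[Doc] = []
    for frm, to, ts in docs:
        if last_to.get(frm) != to:  # unseen from, or the to value changed
            result.append((frm, to, ts))
        last_to[frm] = to  # every document updates the state, emitted or not
    return result


sample = [
    ("from_1", "to_1", 1),
    ("from_1", "to_1", 2),
    ("from_1", "to_2", 3),
    ("from_2", "to_3", 4),
    ("from_1", "to_2", 5),
    ("from_2", "to_3", 6),
    ("from_1", "to_1", 7),
    ("from_2", "to_4", 8),
]

print([ts for _, _, ts in changed_documents(sample)])  # [1, 3, 4, 7, 8]
```

This reproduces the expected result listed above: timestamps 1, 3, 4, 7 and 8.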

I need to fetch all such "semi-unique" documents from the index, so ideally the solution would work with a scroll request, or with after_key pagination if an aggregation is used.
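Since the rule only needs the last to value per from value, it can also be applied while streaming through paginated results, provided the hits arrive in timestamp order (for example, a scroll whose request sorts on timestamp ascending). The sketch below assumes each page is a list of hits shaped like Elasticsearch search hits (a dict with _id and _source); fetching the pages is left out, and the three-page split in the example is made up. State is carried across pages, and memory grows with the number of distinct from values rather than with the number of documents.

```python
from typing import Any, Dict, Iterable, Iterator, List

Hit = Dict[str, Any]  # assumed shape: {"_id": ..., "_source": {"from": ..., "to": ..., "timestamp": ...}}


def changed_hits(pages: Iterable[List[Hit]]) -> Iterator[Hit]:
    """Yield hits whose `to` differs from the previous `to` for the same `from`.

    Assumes the pages are consecutive slices of one timestamp-ascending stream,
    so page boundaries do not affect the result.
    """
    last_to: Dict[Any, Any] = {}
    for page in pages:
        for hit in page:
            src = hit["_source"]
            frm, to = src["from"], src["to"]
            if frm not in last_to or last_to[frm] != to:
                yield hit
            last_to[frm] = to


def make_hit(i: int, frm: str, to: str) -> Hit:
    # Builds a fake hit for the example; _id and timestamp are just the row number.
    return {"_id": str(i), "_source": {"from": frm, "to": to, "timestamp": i}}


rows = [("from_1", "to_1"), ("from_1", "to_1"), ("from_1", "to_2"), ("from_2", "to_3"),
        ("from_1", "to_2"), ("from_2", "to_3"), ("from_1", "to_1"), ("from_2", "to_4")]
hits = [make_hit(i + 1, frm, to) for i, (frm, to) in enumerate(rows)]
pages = [hits[0:3], hits[3:6], hits[6:]]  # pretend these came back as three scroll batches

print([h["_id"] for h in changed_hits(pages)])  # ['1', '3', '4', '7', '8']
```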

Any ideas how to approach it?

asked Nov 19 '18 at 15:35

martin-g

12.1k1926


1 Answer

The closest thing I could come up with is the following (let me know if it does not work with your data).

{
  "size": 0,
  "aggs": {
    "from_and_to": {
      "composite": {
        "size": 5,
        "sources": [
          {
            "from_to_collected": {
              "terms": {
                "script": {
                  "lang": "painless",
                  "source": "doc['from'].value + '_' + doc['to'].value"
                }
              }
            }
          }
        ]
      },
      "aggs": {
        "top_from_and_to_hits": {
          "top_hits": {
            "size": 1,
            "sort": [{"timestamp": {"order": "asc"}}],
            "_source": false
          }
        }
      }
    }
  }
}

Each hit still carries its _id; _id is a metadata field rather than part of _source, so the source itself can be switched off.
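To see how this query relates to the rule in the question, here is a rough in-memory model of its output (plain Python, not Elasticsearch): one bucket per distinct from/to string, with keys in ascending order as a composite aggregation returns them by default, each holding the earliest document by timestamp. On the sample data it yields timestamps 1, 3, 4 and 8. timestamp_7 is absent because from_1/to_1 already has an earlier document, even though the to value changed back after timestamp_3. The tuple representation is an assumption for illustration.

```python
from typing import Dict, List, Tuple

Doc = Tuple[str, str, int]  # hypothetical (from, to, timestamp) representation


def earliest_per_pair(docs: List[Doc]) -> List[Doc]:
    """Model of composite(terms on from + '_' + to) + top_hits(size 1, timestamp asc)."""
    buckets: Dict[str, Doc] = {}
    for doc in docs:
        key = doc[0] + "_" + doc[1]  # same key the painless script builds
        if key not in buckets or doc[2] < buckets[key][2]:
            buckets[key] = doc  # keep the earliest document for this key
    # Composite buckets come back sorted by key (ascending by default).
    return [buckets[k] for k in sorted(buckets)]


sample = [
    ("from_1", "to_1", 1), ("from_1", "to_1", 2), ("from_1", "to_2", 3),
    ("from_2", "to_3", 4), ("from_1", "to_2", 5), ("from_2", "to_3", 6),
    ("from_1", "to_1", 7), ("from_2", "to_4", 8),
]

print([ts for _, _, ts in earliest_per_pair(sample)])  # [1, 3, 4, 8]
```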

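Keep in mind that the terms aggregation is probabilistic. Also note that the script reads doc values, so from and to need to be keyword (or other doc-values-enabled) fields rather than analyzed text fields.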

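Because this is a composite aggregation, you can page through the remaining buckets (ordered by the from_to_collected key) by passing the after_key from each response as the after parameter of the next request.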
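A sketch of that pagination loop, assuming a hypothetical search(body) callable that sends the request body to your index and returns the parsed JSON response (for example, a thin wrapper around your client). It reads aggregations.from_and_to.buckets and after_key from each response and copies after_key into composite.after for the next request, stopping when a page comes back with no buckets. The fake_search function is made up and exists only to make the example runnable.

```python
import copy
from typing import Any, Callable, Dict, Iterator

Body = Dict[str, Any]


def iter_composite_buckets(search: Callable[[Body], Body], body: Body,
                           agg_name: str = "from_and_to") -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a composite aggregation, one page at a time."""
    request = copy.deepcopy(body)  # don't mutate the caller's body
    while True:
        agg = search(request)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        request["aggs"][agg_name]["composite"]["after"] = after_key


# --- Fake backend, for illustration only: pages over four pre-computed keys. ---
ALL_KEYS = ["from_1_to_1", "from_1_to_2", "from_2_to_3", "from_2_to_4"]


def fake_search(body: Body) -> Body:
    comp = body["aggs"]["from_and_to"]["composite"]
    after = comp.get("after", {}).get("from_to_collected")
    remaining = [k for k in ALL_KEYS if after is None or k > after]
    buckets = [{"key": {"from_to_collected": k}} for k in remaining[: comp["size"]]]
    result: Body = {"buckets": buckets}
    if buckets:
        result["after_key"] = buckets[-1]["key"]
    return {"aggregations": {"from_and_to": result}}


script = {"lang": "painless", "source": "doc['from'].value + '_' + doc['to'].value"}
body = {"size": 0, "aggs": {"from_and_to": {"composite": {
    "size": 2, "sources": [{"from_to_collected": {"terms": {"script": script}}}]}}}}

keys = [b["key"]["from_to_collected"] for b in iter_composite_buckets(fake_search, body)]
print(keys)  # ['from_1_to_1', 'from_1_to_2', 'from_2_to_3', 'from_2_to_4']
```

Deep-copying the body keeps the caller's request reusable; only the internal copy gets the after parameter.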

edited Nov 20 '18 at 21:00

answered Nov 19 '18 at 22:05

Benjamin Trent

5,27332335
