Elasticsearch: get documents only when value changes
I have an ES index with documents like these:
from_1,to_1,timestamp_1
from_1,to_1,timestamp_2
from_1,to_2,timestamp_3
from_2,to_3,timestamp_4
from_1,to_2,timestamp_5
from_2,to_3,timestamp_6
from_1,to_1,timestamp_7
from_2,to_4,timestamp_8
I need a query that returns a document only if its combination of from and to values differs from that of the previously seen document with the same from value.
So, with the sample above:
- the document with timestamp_1 should be in the result because there is no earlier document with the from_1 + to_1 combination
- the document with timestamp_2 must be skipped because its from + to combination is exactly the same as that of the last seen document with from = from_1
- the document with timestamp_3 should be in the result because its to field (to_2) is different from the value of the last seen document with the same from (to_1 in the document with timestamp_1)
- the document with timestamp_4 should be in the result
- the document with timestamp_5 must not be in the result because it has the same from + to combination as the last seen document with from_1 (the document with timestamp_3)
- the document with timestamp_6 must not be in the result because it has the same from + to combination as the last seen document with from_2 (the document with timestamp_4)
- the document with timestamp_7 should be in the result because its from + to combination differs from that of the last seen document with from_1 (the document with timestamp_3)
- the document with timestamp_8 should be in the result because its combination is completely new so far
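To pin down the expected behaviour, here is a minimal client-side sketch of the rule in Python. It is not an Elasticsearch query; it only illustrates the semantics, assuming the documents can be read in timestamp order (for example via a sorted scroll). The names changed_only and SAMPLE are made up for this illustration, and the timestamps are replaced by the integers 1 to 8.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (from, to, timestamp); integer timestamps stand in for timestamp_1..timestamp_8
Doc = Tuple[str, str, int]

SAMPLE = [
    ("from_1", "to_1", 1),
    ("from_1", "to_1", 2),
    ("from_1", "to_2", 3),
    ("from_2", "to_3", 4),
    ("from_1", "to_2", 5),
    ("from_2", "to_3", 6),
    ("from_1", "to_1", 7),
    ("from_2", "to_4", 8),
]


def changed_only(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Yield a document only when its `to` differs from the last `to`
    seen for the same `from`, walking the documents in timestamp order."""
    last_to: Dict[str, str] = {}
    for frm, to, ts in sorted(docs, key=lambda d: d[2]):
        # A missing entry (first document for this `from`) always counts as a change.
        if last_to.get(frm) != to:
            yield (frm, to, ts)
        last_to[frm] = to


if __name__ == "__main__":
    for doc in changed_only(SAMPLE):
        print(doc)
```

On the sample this keeps the documents with timestamps 1, 3, 4, 7 and 8, which is the result described in the list above.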
I need to fetch all such "semi-unique" documents from the index, so it would be nice if it were possible to use a scroll request, or after_key if an aggregation is used.
Any ideas how to approach it?
elasticsearch
asked Nov 19 '18 at 15:35
martin-g
1 Answer
The closest thing I could come up with is the following (let me know if it does not work with your data).
{
  "size": 0,
  "aggs": {
    "from_and_to": {
      "composite": {
        "size": 5,
        "sources": [
          {
            "from_to_collected": {
              "terms": {
                "script": {
                  "lang": "painless",
                  "source": "doc['from'].value + '_' + doc['to'].value"
                }
              }
            }
          }
        ]
      },
      "aggs": {
        "top_from_and_to_hits": {
          "top_hits": {
            "size": 1,
            "sort": [{"timestamp": {"order": "asc"}}],
            "_source": {"includes": ["_id"]}
          }
        }
      }
    }
  }
}
Keep in mind that the terms aggregation is probabilistic.
The composite aggregation lets you page through the buckets: pass the after_key returned for from_to_collected as the after parameter of the next request to get the next set of buckets.
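As a rough sketch of that paging loop in Python: the request body mirrors the query above, with the composite aggregation's after parameter set from the previous response's after_key (falling back to the key of the last bucket when after_key is absent). The search argument is a hypothetical callable that sends the body to the index and returns the parsed JSON response, for example a thin wrapper around whatever client you use; build_body and iter_buckets are names invented for this illustration. Note that this yields one bucket per distinct from + to combination, so a later return to an earlier combination (such as timestamp_7 in the question) would not show up as a separate bucket.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def build_body(after_key: Optional[Dict[str, Any]] = None,
               page_size: int = 5) -> Dict[str, Any]:
    """Build the composite-aggregation request, optionally resuming after a key."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{
            "from_to_collected": {
                "terms": {
                    "script": {
                        "lang": "painless",
                        "source": "doc['from'].value + '_' + doc['to'].value",
                    }
                }
            }
        }],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume from the previous page
    return {
        "size": 0,
        "aggs": {
            "from_and_to": {
                "composite": composite,
                "aggs": {
                    "top_from_and_to_hits": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"timestamp": {"order": "asc"}}],
                            "_source": {"includes": ["_id"]},
                        }
                    }
                },
            }
        },
    }


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                 page_size: int = 5) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, requesting pages until an empty page comes back.

    `search` is an assumed callable: it takes a request body and returns
    the parsed search response as a dict.
    """
    after_key: Optional[Dict[str, Any]] = None
    while True:
        response = search(build_body(after_key, page_size))
        agg = response["aggregations"]["from_and_to"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        # Prefer the returned after_key; otherwise use the last bucket's key.
        after_key = agg.get("after_key", buckets[-1]["key"])
```

Each yielded bucket carries its earliest document under top_from_and_to_hits, so the caller can collect those hits page by page.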
edited Nov 20 '18 at 21:00
answered Nov 19 '18 at 22:05
Benjamin Trent