Efficient way to cluster a timeline OR reconstruct a batch number
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server datetime group-by sql-server-2017
add a comment |
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server datetime group-by sql-server-2017
add a comment |
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server datetime group-by sql-server-2017
I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID
won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHERE
s that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
sql sql-server datetime group-by sql-server-2017
sql sql-server datetime group-by sql-server-2017
edited Dec 4 '18 at 10:25
Salman A
178k66338430
178k66338430
asked Nov 19 '18 at 10:25
John BartJohn Bart
155
155
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53372583%2fefficient-way-to-cluster-a-timeline-or-reconstruct-a-batch-number%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
add a comment |
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
You can use window functions for this:
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
- The first cte assigns a "change flag" to each row where the value changed
- The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
- Finally you number rows within each group and select first row per group
Demo on DB Fiddle
edited Nov 19 '18 at 10:40
answered Nov 19 '18 at 10:35
Salman ASalman A
178k66338430
178k66338430
add a comment |
add a comment |
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
add a comment |
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;
answered Nov 19 '18 at 10:46
Yogesh SharmaYogesh Sharma
30.1k51436
30.1k51436
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53372583%2fefficient-way-to-cluster-a-timeline-or-reconstruct-a-batch-number%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown