Efficient way to cluster a timeline OR reconstruct a batch number

I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of their tests.

I want to do some measurements, like the pass/fail rate over a shift per product and tester. The problem is that no batch numbers are assigned, so I can't select this easily.

Consider the following subset of the whole table:

 id   tBegin                orderId
------------------------------------
 1    2018-10-20 00:00:05   1
 2    2018-10-20 00:05:15   1
 3    2018-10-20 01:00:05   1
 10   2018-10-20 10:03:05   3
 12   2018-10-20 11:04:05   8
 20   2018-10-20 14:15:05   3
 37   2018-10-20 18:12:05   1

My goal is to cluster the data into the following:

 id   tBegin                orderId   pCount
--------------------------------------------
 1    2018-10-20 00:00:05   1         3
 10   2018-10-20 10:03:05   3         1
 12   2018-10-20 11:04:05   8         1
 20   2018-10-20 14:15:05   3         1
 37   2018-10-20 18:12:05   1         1

A simple GROUP BY orderId won't do the trick, so I came up with the following:

SELECT
  MIN(c.id) AS id,
  MIN(c.tBegin) AS tBegin,
  c.orderId,
  COUNT(*) AS pCount
FROM (
    SELECT t2.id, t2.tBegin, t2.orderId,
      ( SELECT TOP 1 t.id
        FROM history t
        WHERE t.tBegin > t2.tBegin
          AND t.orderID <> t2.orderID
          AND <restrict date here further>
        ORDER BY t.tBegin
       ) AS nextId
    FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId

I left out the WHERE clauses that select the correct date and tester.
This works, but it seems very inefficient. I have worked with small databases before, but I'm new to SQL Server 2017.

I appreciate your help very much!

edited Dec 4 '18 at 10:25 by Salman A
asked Nov 19 '18 at 10:25 by John Bart

sql sql-server datetime group-by sql-server-2017

2 Answers

You can use window functions for this:

DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);

WITH cte1 AS (
    SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
    FROM @t
), cte2 AS (
    SELECT *, SUM(chg) OVER (ORDER BY tBegin) AS grp
    FROM cte1
), cte3 AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
    FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1

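The first CTE assigns a "change flag" of 1 to each row whose orderId differs from the previous row's (and to the very first row), and 0 otherwise.

The second CTE uses a running sum to convert the 1s and 0s into a group number that is the same for all rows of one uninterrupted run.

Finally, the third CTE numbers the rows within each group, and the outer query selects the first row per group.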
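
The query above returns the first row of each run, but not the pCount column asked for in the question. A minimal sketch of one way to get it, assuming the same sample table variable @t and assuming that id grows with tBegin (true for the sample rows, unverified for the real table): group by the running-sum column instead of numbering the rows. The CTE names flagged and runs are made up for this sketch.

```sql
-- Sample data from the question, held in a table variable.
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);

WITH flagged AS (
    -- chg = 1 on the first row (LAG is NULL there) and whenever
    -- orderId differs from the previous row in tBegin order.
    SELECT id, tBegin, orderId,
           CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin)
                THEN 0 ELSE 1 END AS chg
    FROM @t
), runs AS (
    -- Running sum of the flags: all rows of one uninterrupted run share a grp.
    SELECT id, tBegin, orderId,
           SUM(chg) OVER (ORDER BY tBegin ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
)
SELECT MIN(id)     AS id,      -- assumption: id grows with tBegin
       MIN(tBegin) AS tBegin,
       orderId,
       COUNT(*)    AS pCount
FROM runs
GROUP BY grp, orderId
ORDER BY MIN(tBegin);
```

For the sample rows this should give one row per run, with pCount = 3 for the first run of orderId 1 and pCount = 1 for the others. On the real table, the date and tester filters that the question left out would presumably go into the first CTE, so that LAG only compares rows from the same tester.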

Demo on DB Fiddle

edited Nov 19 '18 at 10:40
answered Nov 19 '18 at 10:35 by Salman A

You can use a cumulative approach based on the difference of two row numbers:

select min(id) as id, min(tBegin) as tBegin, orderid, count(*) as pCount
from (select h.*,
             row_number() over (order by id) as seq1,
             row_number() over (partition by orderid order by id) as seq2
      from history h
     ) h
group by orderid, (seq1 - seq2)
order by id;
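
To see why grouping by seq1 - seq2 isolates each run, it can help to look at the intermediate values. A small sketch on the question's sample rows, assuming a table variable @t in place of the history table (the variable and the diff alias exist only for this illustration):

```sql
-- Sample data from the question, held in a table variable.
DECLARE @t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO @t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);

SELECT id, orderId,
       -- Position of the row in the whole timeline.
       ROW_NUMBER() OVER (ORDER BY id) AS seq1,
       -- Position of the row among rows with the same orderId.
       ROW_NUMBER() OVER (PARTITION BY orderId ORDER BY id) AS seq2,
       -- Stays constant while consecutive rows keep the same orderId.
       ROW_NUMBER() OVER (ORDER BY id)
     - ROW_NUMBER() OVER (PARTITION BY orderId ORDER BY id) AS diff
FROM @t
ORDER BY id;
```

Within a run both counters advance by one per row, so diff stays constant: ids 1, 2 and 3 all get diff 0. The difference alone is not unique across orders, though: ids 10 and 37 both get diff 3, and ids 12 and 20 both get diff 4. That is why orderid has to be part of the GROUP BY as well.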

answered Nov 19 '18 at 10:46 by Yogesh Sharma
