Question

I'm working in SQL Server Management Studio so I suppose this is a Microsoft SQL Server T-SQL question.

The real-world scenario is this: I have multiple employees working across multiple locations, with "time in" and "time out" records for each. I have already created a unique "Shift ID" for each time interval and, based on employee, date, and location, joined the shifts of other employees to those of my keystone employee (the one against which I am comparing everyone else).

Furthermore, I have written a query that pulled each "other employee's" specific overlapping time interval with the keystone. For one shift the timeline looks like this:

 Key Emp. | 9AM------------------------6PM

 Emp. A   | 9AM------------------------6PM

 Emp. B   |         12PM-------4PM

So the discrete periods where a true "controlled" comparison can be made are among:

Key, A, and B from 12PM - 4PM

Key and A from 9AM - 12PM

Key and A from 4PM - 6PM
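
One way to derive those discrete periods, shown here only as a minimal sketch: collect every distinct "in" and "out" time as a boundary, pair each boundary with the next one, and keep the shifts that cover each resulting period completely. The table variable `@shifts` and its column names are hypothetical, the dates are placeholders that mirror the timeline above, and `LEAD` assumes SQL Server 2012 or later.

```sql
-- Hypothetical table variable and column names; values mirror the timeline above.
DECLARE @shifts TABLE (emp VARCHAR(10), time_in DATETIME2(0), time_out DATETIME2(0));

INSERT @shifts (emp, time_in, time_out)
VALUES ('Key', '2017-01-01 09:00', '2017-01-01 18:00'),
       ('A',   '2017-01-01 09:00', '2017-01-01 18:00'),
       ('B',   '2017-01-01 12:00', '2017-01-01 16:00');

WITH boundaries AS
(
  -- UNION (not UNION ALL) removes duplicate boundary times
  SELECT time_in AS t FROM @shifts
  UNION
  SELECT time_out FROM @shifts
),
periods AS
(
  -- pair each boundary with the next one (LEAD needs SQL Server 2012+)
  SELECT period_start = t,
         period_end   = LEAD(t) OVER (ORDER BY t)
  FROM   boundaries
)
SELECT   p.period_start, p.period_end, s.emp
FROM     periods AS p
JOIN     @shifts AS s
  ON     s.time_in  <= p.period_start
  AND    s.time_out >= p.period_end   -- the shift covers the whole period
WHERE    p.period_end IS NOT NULL     -- the last boundary does not start a period
ORDER BY p.period_start, s.emp;
```

With the three shifts above, the boundaries are 9AM, 12PM, 4PM and 6PM, so the query lists Key and A for 9AM - 12PM and 4PM - 6PM, and Key, A and B for 12PM - 4PM.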

The end goal is to pull all the activity (organized as events with datetime stamps in a separate table) for each employee that occurs within those time periods and compare totals for each relevant employee. So there would be a separate "Count(events)" total for each time frame only affected by the employees that share the time interval as described above.

Currently, my data is organized like this:

The "In" and "Out" columns for the key and other employees are stored as TIMESTAMPs; the "1/1,6PM" is just my crappy way of saving space in my example. Please see my consumable data at the end of this post. SSMS doesn't seem to care that I have more than one TIMESTAMP column and treats them all like DATETIME:

Key_ShiftID| Key_In | Key_Out | Oth_Emp_ShiftID | Oth_Emp_In | Oth_Emp_Out

  K          1/1,9AM   1/1,6PM     A                1/1,9AM     1/1,6PM 

  K          1/1,9AM   1/1,6PM     B                1/1,12PM    1/1,4PM

Here the Shift IDs (Key_ShiftID and Oth_Emp_ShiftID) are unique strings, and each time interval is defined by two columns (Key_In and Key_Out, Oth_Emp_In and Oth_Emp_Out) stored as datetimes/timestamps. I'm looking for discrete periods where I can compare the activity of the employees; that activity is in a separate table, with each event having a unique datetime, as mentioned earlier. Thus, I think the ending data would look something like this:

Key_ShiftID| Key_In | Key_Out | Oth_Emp_ShiftID | Oth_Emp_In  | Oth_Emp_Out

  K          1/1,9AM   1/1,6PM     A                1/1,12PM    1/1,4PM 

  K          1/1,9AM   1/1,6PM     B                1/1,12PM    1/1,4PM 

  K          1/1,9AM   1/1,6PM     A                1/1,9AM     1/1,12PM

  K          1/1,9AM   1/1,6PM     A                1/1,4PM     1/1,6PM

So I would be able to join the table above to my activity table by ShiftID and bring in the Count(events) per relevant employee

where event_datetime >= Oth_Emp_In and event_datetime <= Oth_Emp_Out
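
A minimal sketch of that join, assuming the split periods and the activity rows sit in two tables shaped like the hypothetical `@periods` and `@events` table variables below (names, types and sample rows are illustrative, not taken from the real schema). One deliberate difference from the filter above: the end of each period is tested with `<` rather than `<=`, so an event that falls exactly on a boundary such as 12PM is counted in one period only. The trade-off is that an event stamped exactly at the final "out" time is not counted.

```sql
-- Hypothetical tables: one row per (shift, discrete period) and one row per event.
DECLARE @periods TABLE (Oth_Emp_ShiftID VARCHAR(36), Oth_Emp_In DATETIME2(0), Oth_Emp_Out DATETIME2(0));
DECLARE @events  TABLE (ShiftID VARCHAR(36), event_datetime DATETIME2(0));

INSERT @periods (Oth_Emp_ShiftID, Oth_Emp_In, Oth_Emp_Out)
VALUES ('A', '2017-01-01 09:00', '2017-01-01 12:00'),
       ('A', '2017-01-01 12:00', '2017-01-01 16:00'),
       ('A', '2017-01-01 16:00', '2017-01-01 18:00'),
       ('B', '2017-01-01 12:00', '2017-01-01 16:00');

INSERT @events (ShiftID, event_datetime)
VALUES ('A', '2017-01-01 09:30'),
       ('A', '2017-01-01 12:00'),   -- on a boundary: counted in 12PM - 4PM only
       ('A', '2017-01-01 15:00'),
       ('B', '2017-01-01 13:00');

SELECT    p.Oth_Emp_ShiftID,
          p.Oth_Emp_In,
          p.Oth_Emp_Out,
          events = COUNT(e.event_datetime)  -- counts matched events; 0 when none
FROM      @periods AS p
LEFT JOIN @events  AS e
  ON      e.ShiftID        =  p.Oth_Emp_ShiftID
  AND     e.event_datetime >= p.Oth_Emp_In
  AND     e.event_datetime <  p.Oth_Emp_Out  -- half-open interval [in, out)
GROUP BY  p.Oth_Emp_ShiftID, p.Oth_Emp_In, p.Oth_Emp_Out
ORDER BY  p.Oth_Emp_ShiftID, p.Oth_Emp_In;
```

For these sample rows the counts are 1, 2 and 0 for A's three periods and 1 for B's single period. The `LEFT JOIN` keeps periods with no activity in the output so they show a count of 0 instead of disappearing.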

Additionally, as I noted before, I already wrote a query to cut down the non-key employees' shifts to reflect only the time intervals where they overlap with the key employee, so the Other_Emp_In will always be greater than or equal to the Key In time and Other_Emp_Out will always be less than or equal to the Key Out time.

Thanks in advance. I've been researching and struggling with this for around 2 days.

Here's sample data of one key shift (not the exact example above):

Also, SQL Server doesn't seem to care that I have more than one TIMESTAMP column and treats them all like DATETIME.

CREATE TABLE "sample_data" 

(

    "Employee" INT,

    "Key_ShiftID" TEXT,

    "Key_In" TIMESTAMP,

    "Key_Out" TIMESTAMP,

    "Other_Emp_ShiftID" TEXT,

    "Other_Emp_In" TIMESTAMP,

    "Other_Emp_Out" TIMESTAMP,

    "overlap_min" TIMESTAMP,

    "overlap_max" TIMESTAMP

);



INSERT INTO "sample_data" 

VALUES (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '035FA1C1-B469-44EB-B5B4-5B6948574464', '2016-09-27 08:45:00', '2016-09-27 16:15:00', '2016-09-27 14:15:00', '2016-09-27 16:15:00'),

       (78, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '74035838-FD07-4F8D-8AC4-F6407AC786D9', '2016-09-27 18:00:00', '2016-09-27 21:15:00', '2016-09-27 18:00:00', '2016-09-27 21:15:00'),

       (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', 'D7E9ADCD-8631-476D-B69F-00626F0E4B06', '2016-09-27 16:45:00', '2016-09-27 21:45:00', '2016-09-27 16:45:00', '2016-09-27 21:45:00');

Please read this for some tips on improving your question. DDL, consumable data and your queries are all helpful. — Nov 14 '18 at 20:12
So I included 2/3 of the things mentioned in your link: I have provided sample data and expected output. I'm at a loss for a creating a query that transforms the sample data into my expected output... hence the forum post. — Nov 14 '18 at 20:18
I don't see any DDL, i.e. table definitions that explain what datatype might hold "1/1,6PM" in column [Key Out]. Consumable data is input data in the form of table declarations and insert statements that provide us with easy to use samples for testing. (You can use table variables for most purposes.) And your query seems to consist of a single where clause. In what context is it used? — Nov 14 '18 at 20:24
So I reread your comment and provided the code to create an example table like the one described above. I explained in the body that the both key and other employee in and out columns (4 in total) are datetimes/timestamps, but I see that upon a quick glance that isn't necessarily apparent. — Nov 14 '18 at 20:31
Now we're getting somewhere! A timestamp can only be used in a single column per table and stores the version for the row. From ref: "The Transact-SQL rowversion data type is not a date or time data type. timestamp is a deprecated synonym for rowversion." — Nov 14 '18 at 20:40
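
Following the last comment, a version of the sample table that SQL Server itself should accept might look like the sketch below. The column types are assumptions (DATETIME2(0) for the times, VARCHAR(36) for the shift IDs), and a temp table named `#sample_data` is used so that nothing permanent is created; the three rows are the ones from the question.

```sql
-- Assumed types: DATETIME2(0) for times, VARCHAR(36) for shift IDs.
-- timestamp/rowversion is avoided because it is not a date or time type.
CREATE TABLE #sample_data
(
  Employee          INT,
  Key_ShiftID       VARCHAR(36),
  Key_In            DATETIME2(0),
  Key_Out           DATETIME2(0),
  Other_Emp_ShiftID VARCHAR(36),
  Other_Emp_In      DATETIME2(0),
  Other_Emp_Out     DATETIME2(0),
  overlap_min       DATETIME2(0),
  overlap_max       DATETIME2(0)
);

INSERT INTO #sample_data
VALUES (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '035FA1C1-B469-44EB-B5B4-5B6948574464', '2016-09-27 08:45:00', '2016-09-27 16:15:00', '2016-09-27 14:15:00', '2016-09-27 16:15:00'),
       (78,  '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', '74035838-FD07-4F8D-8AC4-F6407AC786D9', '2016-09-27 18:00:00', '2016-09-27 21:15:00', '2016-09-27 18:00:00', '2016-09-27 21:15:00'),
       (900, '545BD826-0C9A-408B-BE9F-4C3D7D307948', '2016-09-27 14:15:00', '2016-09-27 21:45:00', 'D7E9ADCD-8631-476D-B69F-00626F0E4B06', '2016-09-27 16:45:00', '2016-09-27 21:45:00', '2016-09-27 16:45:00', '2016-09-27 21:45:00');

-- Quick check that the overlap columns hold real datetimes
SELECT   Employee, Other_Emp_ShiftID, overlap_min, overlap_max
FROM     #sample_data
ORDER BY overlap_min;

DROP TABLE #sample_data;
```

If the shift IDs are always GUIDs, UNIQUEIDENTIFIER would be another reasonable choice for those two columns; VARCHAR(36) is used here only because the question describes them as unique strings.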

Alan Burstein · Accepted Answer · 2018-11-14 20:31:45Z

Welcome to StackOverflow. In the future, try to include some easily consumable sample data like what I am including in my solution below.

This is a fun little problem. For this kind of thing I leverage my patExtract8K function which leverages ngrams8K. Here's an example of how to use PatExtract; here I'm extracting money from a string:

SELECT p.* 

FROM   dbo.patextract8K('Pay me $50.17 now or $1000 later!','[^$0-9.]') AS p;

Results:

itemNumber  itemIndex  itemLength  item

----------- ---------- ----------- --------

1           8          6           $50.17

2           22         5           $1000

Now to tackle your problem:

-- Easily consumable sample data

DECLARE @table TABLE (shiftId VARCHAR(2), empKey VARCHAR(5), workDuration VARCHAR(100));

INSERT @table(shiftId,empKey,workDuration)

VALUES

('K','A','12PM - 4PM'),

('K','B','12PM - 4PM'),

('K','A','9AM - 12PM'),

('K','A','4PM - 6PM');



-- Solution

SELECT 

  shiftId   = f.shiftId, 

  KeyIn     = '1/1,'+REPLACE(CONVERT(VARCHAR(10),

               MIN(CAST(f.c1 AS TIME)) OVER (),100),':00',''),

  KeyOut    = '1/1,'+REPLACE(CONVERT(VARCHAR(10),

               MAX(CAST(f.c2 AS TIME)) OVER (),100),':00',''),

  empShift  = f.empKey,

  othEmpIn  = '1/1,'+f.c1, 

  othEmpOut = '1/1,'+f.c2

FROM

(

  SELECT      t.shiftId, t.empKey, t.workDuration, 

              c1 = MAX(CASE p.itemNumber WHEN 1 THEN p.item END), 

              c2 = MAX(CASE p.itemNumber WHEN 2 THEN p.item END)

  FROM        @table AS t

  CROSS APPLY dbo.patExtract8k(t.workDuration, '[^0-9APM]') AS p

  CROSS APPLY (VALUES(CAST(p.item AS TIME))) AS tm(N)

  GROUP BY    t.shiftId, t.empKey, t.workDuration

) AS f;

Results:

shiftId KeyIn      KeyOut       empShift othEmpIn     othEmpOut

------- ---------- ------------ -------- ------------ ------------

K       1/1,9AM    1/1,6PM      A        1/1,12PM     1/1,4PM

K       1/1,9AM    1/1,6PM      A        1/1,4PM      1/1,6PM

K       1/1,9AM    1/1,6PM      A        1/1,9AM      1/1,12PM

K       1/1,9AM    1/1,6PM      B        1/1,12PM     1/1,4PM

Note that I have no idea where the "1/1" is coming from so I just hard-coded that in.

Here are my underlying functions. All are very helpful for solving a wide array of SQL issues efficiently and with little code.

CREATE FUNCTION dbo.rangeAB

(

  @low  bigint, 

  @high bigint, 

  @gap  bigint,

  @row1 bit

)

/****************************************************************************************

[Purpose]:

 Creates up to 531,441,000,000 sequential integers beginning with @low and ending 

 with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs 

 to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweaks, 

 enhancements and added functionality. The logic for getting rn to begin at 0 or 1 

 comes from Jeff Moden's fnTally function. 



 The name "range" was chosen because the function is similar to Clojure's range function.

 The name "rangeAB" is used because "range" is a reserved SQL keyword.



[Author]: Alan Burstein



[Compatibility]: 

 SQL Server 2008+ and Azure SQL Database



[Syntax]:

 SELECT r.RN, r.OP, r.N1, r.N2

 FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;



[Parameters]:

 @low  = a bigint that represents the lowest value for n1.

 @high = a bigint that represents the highest value for n1.

 @gap  = a bigint that represents how much n1 and n2 will increase each row; @gap also

         represents the difference between n1 and n2.

 @row1 = a bit that represents the first value of rn. When @row1 = 0 then rn begins

         at 0, when @row1 = 1 then rn will begin at 1.



[Returns]:

 Inline Table Valued Function returns:

 rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can 

      start at 0 or 1 which is dictated by @row1.

 op = bigint; returns the "opposite" number that relates to rn. When rn begins with 0 and

      ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins

      with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...

 n1 = bigint; a sequential number starting at the value of @low and incrementing by the

      value of @gap while it is less than or equal to the value of @high.

 n2 = bigint; a sequential number starting at the value of @low+@gap and incrementing 

      by the value of @gap.



[Dependencies]:

N/A



[Developer Notes]:



 1. The lowest and highest possible numbers returned are whatever is allowable by a 

    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3). 

 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row

    unless it's used in a query where a filter is applied to rn.

 3. @gap must be greater than 0 or the function will not return any rows.

 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of

    rows returned minus 1

 5. If all you need is a sequential set beginning at 0 or 1 then, for best performance

    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any 

    number other than 0 or 1 or if you need a gap between your sequence of numbers. 

 6. Although @gap is a bigint it must be a positive integer or the function will

    not return any rows.

 7. The function will not return any rows when any of the following conditions is true:

      * any of the input parameters are NULL

      * @high is less than @low 

      * @gap is not greater than 0

    To force the function to return all NULLs instead of not returning anything you can

    add the following code to the end of the query:



      UNION ALL 

      SELECT NULL, NULL, NULL, NULL

      WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)



    This code was excluded as it adds a ~5% performance penalty.

 8. There is no performance penalty for sorting by rn ASC; there is a large performance 

    penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0

    If you need a descending sort then use op in place of rn and sort by rn ASC. 



Best Practices:

--===== 1. Using RN (rownumber)

 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):

 SELECT RN FROM dbo.rangeAB(1,5,1,1);

 -- (1.2) The best way to get the numbers 0,1,2...@high (e.g. 0 to 5):

 SELECT RN FROM dbo.rangeAB(0,5,1,0);



--===== 2. Using OP for descending sorts without a performance penalty

 -- (2.1) The best way to get the numbers @high...3,2,1 (e.g. 5 to 1):

 SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;

 -- (2.2) The best way to get the numbers @high-1...2,1,0 (e.g. 5 to 0):

 SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;



--===== 3. Using N1

 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):

 SELECT N1 FROM dbo.rangeAB(-3,3,1,1);

 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:

 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);

 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:

 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);



--===== 4. Using N2 and @gap

 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:

 SELECT N1 FROM dbo.rangeAB(0,100,10,1);

 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.

 --       For example, to get (0,10),(10,20),(20,30).... (90,100):

 SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);

 -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:

 SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);



[Examples]:

--===== 1. Generating Sample data (using rangeAB to create "dummy rows")

 -- The query below will generate 10,000 ids and random numbers between 50,000 and 500,000

 SELECT

   someId    = r.rn,

   someNumber = ABS(CHECKSUM(NEWID())%450000)+50001 

 FROM rangeAB(1,10000,1,1) r;



--===== 2. Create a series of dates; rn is 0 to include the first date in the series

 DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';



 SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)

 FROM dbo.rangeAB(0, DATEDIFF(dd,@startdate,@enddate),1,0) r; -- @low = 0 so the series runs through @enddate

 GO



--===== 3. Splitting (tokenizing) a string with fixed sized items

 -- given a delimited string of identifiers that are always 7 characters long

 DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';



 SELECT

   itemNumber = r.rn, -- item's ordinal position 

   itemIndex  = r.n1, -- item's position in the string (its CHARINDEX value)

   item       = SUBSTRING(@string, r.n1, 7) -- item (token)

 FROM dbo.rangeAB(1, LEN(@string), 8,1)  r;

 GO



--===== 4. Splitting (tokenizing) a string with random delimiters

 DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';



 SELECT

   itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position 

   itemIndex  = r.n1+1, -- item's position in the string (its CHARINDEX value)

   item       = SUBSTRING

               (

                 @string,

                 r.n1+1,

                 ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)

               ) -- item (token)

 FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r

 WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;

 -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/



--===== 5. Grouping by weekly intervals

 -- 5.1. how to create a series of start/end dates between @startDate & @endDate

 DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';

 SELECT 

   WeekNbr   = r.RN,

   WeekStart = DATEADD(DAY,r.N1,@StartDate), 

   WeekEnd   = DATEADD(DAY,r.N2-1,@StartDate)

 FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;

 GO



 -- 5.2. LEFT JOIN to the weekly interval table

 BEGIN

  DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';

  -- sample data 

  DECLARE @loans TABLE (loID INT, lockDate DATE);

  INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)

  FROM dbo.rangeAB(1,50,1,1) r;



  -- solution 

  SELECT 

    WeekNbr   = r.RN,

    WeekStart = dt.WeekStart, 

    WeekEnd   = dt.WeekEnd,

    total     = COUNT(l.lockDate)

  FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r

  CROSS APPLY (VALUES (

    CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE), 

    CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)

  LEFT JOIN @loans l ON l.lockDate BETWEEN  dt.WeekStart AND dt.WeekEnd

  GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;

 END;



--===== 6. Identify the first vowel and last vowel in a string along with their positions

 DECLARE @string VARCHAR(200) = 'This string has vowels';



 SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)

 FROM dbo.rangeAB(1,LEN(@string),1,1) r

 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'

 ORDER BY r.rn;



 -- To avoid a sort in the execution plan we'll use op instead of rn

 SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)

 FROM dbo.rangeAB(1,LEN(@string),1,1) r

 WHERE SUBSTRING(@string,r.op,1) LIKE '%[aeiou]%' -- test the character at op, the position being returned

 ORDER BY r.rn;



---------------------------------------------------------------------------------------

[Revision History]:

 Rev 00 - 20140518 - Initial Development - Alan Burstein

 Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section

                   - Alan Burstein

 Rev 02 - 20180613 - Complete re-design including opposite number column (op)

 Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein

****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

WITH L1(N) AS 

(

  SELECT 1

  FROM (VALUES

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0)) T(N) -- 90 values 

),

L2(N)  AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),

iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)

SELECT  

  r.RN,

  r.OP,

  r.N1,

  r.N2

FROM

(

  SELECT

    RN = 0,

    OP = (@high-@low)/@gap,

    N1 = @low,

    N2 = @gap+@low

  WHERE @row1 = 0

  UNION ALL -- COALESCE required in the TOP statement below for error handling purposes

  SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))

    RN = i.rn,

    OP = (@high-@low)/@gap+(2*@row1)-i.rn,

    N1 = (i.rn-@row1)*@gap+@low,

    N2 = (i.rn-(@row1-1))*@gap+@low

  FROM iTally AS i

  ORDER BY rn

) AS r

WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;

GO



IF OBJECT_ID('dbo.NGrams8k', 'IF') IS NOT NULL DROP FUNCTION dbo.NGrams8k;

GO

CREATE FUNCTION dbo.NGrams8k

(

  @string VARCHAR(8000), -- Input string

  @N      INT            -- requested token size

)

/*****************************************************************************************

[Purpose]:

 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens

 based on an input string (@string). Accepts strings up to 8000 varchar characters long.

 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.



[Author]: 

 Alan Burstein



[Compatibility]:

 SQL Server 2008+, Azure SQL Database



[Syntax]:

--===== Autonomous

 SELECT ng.position, ng.token 

 FROM   dbo.NGrams8k(@string,@N) AS ng;



--===== Against a table using APPLY

 SELECT      s.SomeID, ng.position, ng.token

 FROM        dbo.SomeTable AS s

 CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) AS ng;



[Parameters]:

 @string  = The input string to split into tokens.

 @N       = The size of each token returned.



[Returns]:

 Position = BIGINT; the position of the token in the input string

 token    = VARCHAR(8000); a @N-sized character-level N-Gram token



[Dependencies]:

 1. dbo.rangeAB (iTVF)



[Developer Notes]:

 1. NGrams8k is not case sensitive;



 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer

    creates a parallel execution plan. One way to get a parallel query plan (if the

    optimizer does not choose one) is to use make_parallel by Adam Machanic which can be

    found here:

 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx



 3. When @N is less than 1 or greater than the datalength of the input string then no

    tokens (rows) are returned. If either @string or @N is NULL no rows are returned.

    This is a debatable topic but the thinking behind this decision is that: because you

    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you

    can't turn anything into NULL-grams, no rows should be returned.



    For people who would prefer that a NULL input forces the function to return a single

    NULL output you could add this code to the end of the function:



    UNION ALL

    SELECT 1, NULL

    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)



 4. NGrams8k is deterministic. For more about deterministic functions see:

    https://msdn.microsoft.com/en-us/library/ms178091.aspx



[Examples]:

--===== 1. Split the string, "abcd" into unigrams, bigrams and trigrams

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',1) AS ng; -- unigrams (@N=1)

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',2) AS ng; -- bigrams  (@N=2)

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',3) AS ng; -- trigrams (@N=3)



--===== How many times the substring "AB" appears in each record

 DECLARE @table TABLE(stringID int identity primary key, string varchar(100));

 INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');



 SELECT      string, occurrences = COUNT(*)

 FROM        @table t

 CROSS APPLY dbo.NGrams8k(t.string,2) AS ng

 WHERE       ng.token = 'AB'

 GROUP BY    string;



[Revision History]:

------------------------------------------------------------------------------------------

 Rev 00 - 20140310 - Initial Development - Alan Burstein

 Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added

                     conversion to bigint in the TOP logic to remove implicit conversion

                     to bigint - Alan Burstein

 Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less

                     than the length of @string. Updated comment section. - Alan Burstein

 Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N

                     parameters to prevent a NULL string or NULL @N from causing "an

                     improper value" being passed to the TOP clause. - Alan Burstein

 Rev 05 - 20171228 - Small simplification; changed: 

                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))

                                           to:

                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))

 Rev 06 - 20180612 - Using CHECKSUM(N) to convert N in the token output instead of

                     using (CAST N as int). CHECKSUM removes the need to convert to int.

 Rev 07 - 20180612 - re-designed to: (1) use dbo.rangeAB - Alan Burstein

****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT

  position   = r.RN,

  token      = SUBSTRING(@string, CHECKSUM(r.RN), @N)

FROM  dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r

WHERE @N > 0 AND @N <= LEN(@string);

GO



CREATE FUNCTION dbo.patExtract8K

(

  @string  VARCHAR(8000),

  @pattern VARCHAR(50)

)

/*****************************************************************************************

[Description]:

 This can be considered a T-SQL inline table valued function (iTVF) equivalent of 

 Microsoft's mdq.RegexExtract except:



 1. It includes each matching substring's position in the string



 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)

    instead of nvarchar(4000) for the pattern



 3. The mask parameter is not required and therefore does not exist.



 4. You have to specify what text we're searching for as an exclusion; e.g. for numeric 

    characters you should search for '[^0-9]' instead of '[0-9]'. 



 5. There is no parameter for naming a "capture group". Using the variable below, both 

    the following queries will return the same result:



     DECLARE @string nvarchar(4000) = N'123 Main Street';



   SELECT item FROM dbo.patExtract8K(@string, '[^0-9]');

   SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);



 Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:

 http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the

 rows where [matched]=0. The key benefit is that it performs substantially better 

 because you are only returning the number of rows required instead of returning twice as

 many rows then filtering out half of them.  Furthermore, because we're 



 The following two sets of queries return the same result:



 DECLARE @string varchar(100) = 'xx123xx555xx999';

 BEGIN

 -- QUERY #1

   -- patExtract8K

   SELECT ps.itemNumber, ps.item 

   FROM dbo.patExtract8K(@string, '[^0-9]') ps;



   -- patternSplitCM   

   SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item 

   FROM dbo.patternSplitCM(@string, '[^0-9]') ps

   WHERE [matched] = 0;



 -- QUERY #2

   SELECT ps.itemNumber, ps.item 

   FROM dbo.patExtract8K(@string, '[0-9]') ps;



   SELECT itemNumber = row_number() over (order by itemNumber), item 

   FROM dbo.patternSplitCM(@string, '[0-9]')

   WHERE [matched] = 0;

 END;



[Compatibility]:

 SQL Server 2008+



[Syntax]:

--===== Autonomous

 SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item

 FROM dbo.patExtract8K(@string,@pattern) pe;



--===== Against a table using APPLY

 SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item

 FROM dbo.SomeTable t

 CROSS APPLY dbo.patExtract8K(t.someString, @pattern) pe;



[Parameters]:

 @string        = varchar(8000); the input string

 @pattern       = varchar(50); pattern to search for



[Returns]:

 itemNumber = bigint; the instance or ordinal position of the matched substring

 itemIndex  = bigint; the location of the matched substring inside the input string

 itemLength = int; the length of the matched substring

 item       = varchar(8000); the returned text



[Developer Notes]:

 1. Requires NGrams8k 



 2. patExtract8K does not return any rows on NULL or empty strings. Consider using 

    OUTER APPLY or append the function with the code below to force the function to return 

    a row on empty or NULL inputs:



    UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;



 3. patExtract8K is not case sensitive; use a case sensitive collation for 

    case-sensitive comparisons



 4. patExtract8K is deterministic. For more about deterministic functions see:

    https://msdn.microsoft.com/en-us/library/ms178091.aspx



 5. patExtract8K performs substantially better with a parallel execution plan, often

    2-3 times faster. For queries that leverage patextract8K that are not getting a 

    parallel execution plan you should consider performance testing using Traceflag 8649 

    in Development environments and Adam Machanic's make_parallel in production. 



[Examples]:

--===== (1) Basic: extract all groups of numbers:

  WITH temp(id, txt) as

 (

   SELECT * FROM (values

   (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),

   (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),

   (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\tester{}))22222444 done'))t(x,xx)

 )

 SELECT

   [temp.id] = t.id,

   pe.itemNumber,

   pe.itemIndex,

   pe.itemLength,

   pe.item

 FROM        temp AS t

 CROSS APPLY dbo.patExtract8K(t.txt, '[^0-9]') AS pe;

-----------------------------------------------------------------------------------------

Revision History:

 Rev 00 - 20170801 - Initial Development - Alan Burstein

 Rev 01 - 20180619 - Complete re-write   - Alan Burstein

*****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),

       itemIndex  = f.position,

       itemLength = itemLen.l,

       item       = SUBSTRING(f.token, 1, itemLen.l)

FROM

(

 SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))

 FROM   dbo.NGrams8k(@string, 1) AS ng

 WHERE  PATINDEX(@pattern, ng.token) <  --<< this token does NOT match the pattern

        ABS(SIGN(ng.position-1)-1) +    --<< are you the first row?  OR

        PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row

) AS f(position, token)

CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),

  DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);

GO

So HABO was confused by the representation of my data as well (1/1,6PM). Apologies. I provided code to create sample data per your and HABO's suggestion. The '1/1,6PM' is really my representation of a timestamp for ease of reading. In reality, it's "2017-01-01 18:00:00.000". That probably makes things easier. — Nov 14 '18 at 20:42
How would your solution look when using DATETIME (as provided in my easily consumable data) as opposed to having to convert my crappy example from a string into some TIME variable? — Nov 14 '18 at 21:15

Alan Burstein 3,7131713 · Accepted Answer · 2018-11-14 20:31:45Z

Welcome to StackOverflow. In the future, try to include some easily consumable sample data like what I am including in my solution below.

This is a fun little problem. For this kind of thing I leverage my patExtract8K function which leverages ngrams8K. Here's an example of how to use PatExtract; here I'm extracting money from a string:

SELECT p.* 

FROM   dbo.patextract8K('Pay me $50.17 now or $1000 later!','[^$0-9.]') AS p;

Results:

itemNumber  itemIndex  itemLength  item

----------- ---------- ----------- --------

1           8          6           $50.17

2           22         5           $1000

Now to tackle your problem:

-- Easily consumable sample data

DECLARE @table TABLE (shiftId VARCHAR(2), empKey VARCHAR(5), workDuration VARCHAR(100));

INSERT @table(shiftId,empKey,workDuration)

VALUES

('K','A','12PM - 4PM'),

('K','B','12PM - 4PM'),

('K','A','9AM - 12PM'),

('K','A','4PM - 6PM');



-- Solution

SELECT 

  shiftId   = f.shiftId, 

  KeyIn     = '1/1,'+REPLACE(CONVERT(VARCHAR(10),

               MIN(CAST(f.c1 AS TIME)) OVER (),100),':00',''),

  KeyOut    = '1/1,'+REPLACE(CONVERT(VARCHAR(10),

               MAX(CAST(f.c2 AS TIME)) OVER (),100),':00',''),

  empShift  = f.empKey,

    othEmpIn  = '1/1,'+f.c1, 

  othEmpOut = '1/1,'+f.c2

FROM

(

  SELECT      t.shiftId, t.empKey, t.workDuration, 

              c1 = MAX(CASE p.itemNumber WHEN 1 THEN p.item END), 

              c2 = MAX(CASE p.itemNumber WHEN 2 THEN p.item END)

  FROM        @table AS t

  CROSS APPLY dbo.patExtract8k(t.workDuration, '[^0-9APM]') AS p

  CROSS APPLY (VALUES(CAST(p.item AS TIME))) AS tm(N)

  GROUP BY    t.shiftId, t.empKey, t.workDuration

) AS f;

Results:

shiftId KeyIn      KeyOut       empShift othEmpIn     othEmpOut
------- ---------- ------------ -------- ------------ ------------
K       1/1,9AM    1/1,6PM      A        1/1,12PM     1/1,4PM
K       1/1,9AM    1/1,6PM      A        1/1,4PM      1/1,6PM
K       1/1,9AM    1/1,6PM      A        1/1,9AM      1/1,12PM
K       1/1,9AM    1/1,6PM      B        1/1,12PM     1/1,4PM

Note that I have no idea where the "1/1" is coming from so I just hard-coded that in.
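
If the in/out values are real DATETIME columns rather than strings like '1/1,9AM', no string parsing is needed at all. Below is a minimal sketch of one way to get the discrete periods; it has not been run against your data. The @shifts table variable and its column names are hypothetical stand-ins for the layout described in the question, and LEAD requires SQL Server 2012 or later. The idea: collect every in/out boundary that falls inside the key shift, pair each boundary with the next one, then attach every other-employee shift that covers the whole period.

```sql
-- Hypothetical sample data: one row per (key shift, other-employee shift) pair,
-- mirroring the layout described in the question. Names are assumptions.
DECLARE @shifts TABLE
(
  keyShiftId VARCHAR(10),
  keyIn      DATETIME,
  keyOut     DATETIME,
  othShiftId VARCHAR(10),
  othIn      DATETIME,
  othOut     DATETIME
);

INSERT @shifts VALUES
('K','20170101 09:00','20170101 18:00','A','20170101 09:00','20170101 18:00'),
('K','20170101 09:00','20170101 18:00','B','20170101 12:00','20170101 16:00');

WITH points AS
( -- every distinct boundary (any "in" or "out") per key shift
  SELECT      s.keyShiftId, v.pt
  FROM        @shifts AS s
  CROSS APPLY (VALUES (s.keyIn),(s.keyOut),(s.othIn),(s.othOut)) AS v(pt)
  WHERE       v.pt BETWEEN s.keyIn AND s.keyOut -- ignore boundaries outside the key shift
  GROUP BY    s.keyShiftId, v.pt
),
periods AS
( -- each boundary paired with the next one forms a discrete period
  SELECT p.keyShiftId,
         periodStart = p.pt,
         periodEnd   = LEAD(p.pt) OVER (PARTITION BY p.keyShiftId ORDER BY p.pt)
  FROM   points AS p
)
SELECT   p.keyShiftId, p.periodStart, p.periodEnd, s.othShiftId
FROM     periods AS p
JOIN     @shifts AS s
  ON     s.keyShiftId = p.keyShiftId
  AND    s.othIn     <= p.periodStart -- the other shift must cover
  AND    s.othOut    >= p.periodEnd   -- the entire period
WHERE    p.periodEnd IS NOT NULL      -- the last boundary starts no period
ORDER BY p.keyShiftId, p.periodStart, s.othShiftId;
```

For the two sample rows this should give 9AM-12PM for A, 12PM-4PM for A and B, and 4PM-6PM for A. The event counts could then come from joining the events table on the period boundaries (for example a hypothetical eventTime >= periodStart AND eventTime < periodEnd); I have left that out because I don't know that table's layout.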

Here are the underlying functions. All are very helpful for solving a wide array of SQL problems efficiently and with little code.

CREATE FUNCTION dbo.rangeAB

(

  @low  bigint, 

  @high bigint, 

  @gap  bigint,

  @row1 bit

)

/****************************************************************************************

[Purpose]:

 Creates up to 531,441,000,000 sequential integers beginning with @low and ending 

 with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs 

 to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweaks, 

 enhancements and added functionality. The logic for getting rn to begin at 0 or 1 

 comes from Jeff Moden's fnTally function. 



 It is named "range" because it's similar to Clojure's range function. The name "rangeAB"

 was used because "range" is a reserved SQL keyword.



[Author]: Alan Burstein



[Compatibility]: 

 SQL Server 2008+ and Azure SQL Database



[Syntax]:

 SELECT r.RN, r.OP, r.N1, r.N2

 FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;



[Parameters]:

 @low  = a bigint that represents the lowest value for n1.

 @high = a bigint that represents the highest value for n1.

 @gap  = a bigint that represents how much n1 and n2 will increase each row; @gap also

         represents the difference between n1 and n2.

 @row1 = a bit that represents the first value of rn. When @row1 = 0 then rn begins

         at 0, when @row1 = 1 then rn will begin at 1.



[Returns]:

 Inline Table Valued Function returns:

 rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can 

      start at 0 or 1 which is dictated by @row1.

 op = bigint; returns the "opposite number" that relates to rn. When rn begins with 0 and

      ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins

      with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...

 n1 = bigint; a sequential number starting at the value of @low and incrementing by the

      value of @gap while it remains less than or equal to the value of @high.

 n2 = bigint; a sequential number starting at the value of @low+@gap and incrementing 

      by the value of @gap.



[Dependencies]:

N/A



[Developer Notes]:



 1. The lowest and highest possible numbers returned are whatever is allowable by a 

    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3). 

 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row

    unless it's used in a query where a filter is applied to rn.

 3. @gap must be greater than 0 or the function will not return any rows.

 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of

    rows returned minus 1

 5. If all you need is a sequential set beginning at 0 or 1 then, for best performance

    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any 

    number other than 0 or 1 or if you need a gap between your sequence of numbers. 

 6. Although @gap is a bigint it must be a positive integer or the function will

    not return any rows.

 7. The function will not return any rows when one of the following conditions is true:

      * any of the input parameters are NULL

      * @high is less than @low 

      * @gap is not greater than 0

    To force the function to return all NULLs instead of not returning anything you can

    add the following code to the end of the query:



      UNION ALL 

      SELECT NULL, NULL, NULL, NULL

      WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)



    This code was excluded as it adds a ~5% performance penalty.

 8. There is no performance penalty for sorting by rn ASC; there is a large performance 

    penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0

    If you need a descending sort then use op in place of rn and sort by rn ASC. 



Best Practices:

--===== 1. Using RN (rownumber)

 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):

 SELECT RN FROM dbo.rangeAB(1,5,1,1);

 -- (1.2) The best way to get the numbers 0,1,2...@high (e.g. 0 to 5):

 SELECT RN FROM dbo.rangeAB(0,5,1,0);



--===== 2. Using OP for descending sorts without a performance penalty

 -- (2.1) The best way to get the numbers @high...3,2,1 (e.g. 5 to 1):

 SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;

 -- (2.2) The best way to get the numbers @high-1...2,1,0 (e.g. 5 to 0):

 SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;



--===== 3. Using N1

 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):

 SELECT N1 FROM dbo.rangeAB(-3,3,1,1);

 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:

 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);

 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:

 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);



--===== 4. Using N2 and @gap

 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:

 SELECT N1 FROM dbo.rangeAB(0,100,10,1);

 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.

 --       For example, to get (0,10),(10,20),(20,30).... (90,100):

 SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);

 -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:

 SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);



[Examples]:

--===== 1. Generating Sample data (using rangeAB to create "dummy rows")

 -- The query below will generate 10,000 ids and random numbers between 50,001 and 500,000

 SELECT

   someId     = r.rn,

   someNumber = ABS(CHECKSUM(NEWID())%450000)+50001 

 FROM dbo.rangeAB(1,10000,1,1) r;



--===== 2. Create a series of dates; rn is 0 to include the first date in the series

 DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';



 SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)

 FROM dbo.rangeAB(0, DATEDIFF(dd,@startdate,@enddate),1,0) r; -- @low = 0 so @enddate is included

 GO



--===== 3. Splitting (tokenizing) a string with fixed sized items

 -- given a delimited string of identifiers that are always 7 characters long

 DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';



 SELECT

   itemNumber = r.rn, -- item's ordinal position 

   itemIndex  = r.n1, -- item's position in the string (its CHARINDEX value)

   item       = SUBSTRING(@string, r.n1, 7) -- item (token)

 FROM dbo.rangeAB(1, LEN(@string), 8,1)  r;

 GO



--===== 4. Splitting (tokenizing) a string with random delimiters

 DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';



 SELECT

   itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position 

   itemIndex  = r.n1+1, -- item's position in the string (its CHARINDEX value)

   item       = SUBSTRING

               (

                 @string,

                 r.n1+1,

                 ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)

               ) -- item (token)

 FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r

 WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;

 -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/



--===== 5. Grouping by weekly intervals

 -- 5.1. how to create a series of start/end dates between @startDate & @endDate

 DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';

 SELECT 

   WeekNbr   = r.RN,

   WeekStart = DATEADD(DAY,r.N1,@StartDate), 

   WeekEnd   = DATEADD(DAY,r.N2-1,@StartDate)

 FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;

 GO



 -- 5.2. LEFT JOIN to the weekly interval table

 BEGIN

  DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';

  -- sample data 

  DECLARE @loans TABLE (loID INT, lockDate DATE);

  INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)

  FROM dbo.rangeAB(1,50,1,1) r;



  -- solution 

  SELECT 

    WeekNbr   = r.RN,

    WeekStart = dt.WeekStart, 

    WeekEnd   = dt.WeekEnd,

    total     = COUNT(l.lockDate)

  FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r

  CROSS APPLY (VALUES (

    CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE), 

    CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)

  LEFT JOIN @loans l ON l.lockDate BETWEEN  dt.WeekStart AND dt.WeekEnd

  GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;

 END;



--===== 6. Identify the first vowel and last vowel in a string, along with their positions

 DECLARE @string VARCHAR(200) = 'This string has vowels';



 SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)

 FROM dbo.rangeAB(1,LEN(@string),1,1) r

 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'

 ORDER BY r.rn;



 -- To avoid a sort in the execution plan we'll use op instead of rn

 SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)

 FROM dbo.rangeAB(1,LEN(@string),1,1) r

 WHERE SUBSTRING(@string,r.op,1) LIKE '%[aeiou]%' -- test the character at op, not rn

 ORDER BY r.rn;



---------------------------------------------------------------------------------------

[Revision History]:

 Rev 00 - 20140518 - Initial Development - Alan Burstein

 Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section

                   - Alan Burstein

 Rev 02 - 20180613 - Complete re-design including opposite number column (op)

 Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein

****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

WITH L1(N) AS 

(

  SELECT 1

  FROM (VALUES

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

   (0),(0)) T(N) -- 90 values 

),

L2(N)  AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),

iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)

SELECT  

  r.RN,

  r.OP,

  r.N1,

  r.N2

FROM

(

  SELECT

    RN = 0,

    OP = (@high-@low)/@gap,

    N1 = @low,

    N2 = @gap+@low

  WHERE @row1 = 0

  UNION ALL -- COALESCE required in the TOP statement below for error handling purposes

  SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))

    RN = i.rn,

    OP = (@high-@low)/@gap+(2*@row1)-i.rn,

    N1 = (i.rn-@row1)*@gap+@low,

    N2 = (i.rn-(@row1-1))*@gap+@low

  FROM iTally AS i

  ORDER BY rn

) AS r

WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;

GO



IF OBJECT_ID('dbo.NGrams8k', 'IF') IS NOT NULL DROP FUNCTION dbo.NGrams8k;

GO

CREATE FUNCTION dbo.NGrams8k

(

  @string VARCHAR(8000), -- Input string

  @N      INT            -- requested token size

)

/*****************************************************************************************

[Purpose]:

 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens

 based on an input string (@string). Accepts strings up to 8000 varchar characters long.

 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.



[Author]: 

 Alan Burstein



[Compatibility]:

 SQL Server 2008+, Azure SQL Database



[Syntax]:

--===== Autonomous

 SELECT ng.position, ng.token 

 FROM   dbo.NGrams8k(@string,@N) AS ng;



--===== Against a table using APPLY

 SELECT      s.SomeID, ng.position, ng.token

 FROM        dbo.SomeTable AS s

 CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) AS ng;



[Parameters]:

 @string  = The input string to split into tokens.

 @N       = The size of each token returned.



[Returns]:

 Position = BIGINT; the position of the token in the input string

 token    = VARCHAR(8000); a @N-sized character-level N-Gram token



[Dependencies]:

 1. dbo.rangeAB (iTVF)



[Developer Notes]:

 1. NGrams8k is not case sensitive;



 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer

    creates a parallel execution plan. One way to get a parallel query plan (if the

    optimizer does not choose one) is to use make_parallel by Adam Machanic which can be

    found here:

 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx



 3. When @N is less than 1 or greater than the datalength of the input string then no

    tokens (rows) are returned. If either @string or @N are NULL no rows are returned.

    This is a debatable topic but the thinking behind this decision is that: because you

    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you

    can't turn anything into NULL-grams, no rows should be returned.



    For people who would prefer that a NULL input forces the function to return a single

    NULL output you could add this code to the end of the function:



    UNION ALL

    SELECT 1, NULL

    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)



 4. NGrams8k is deterministic. For more about deterministic functions see:

    https://msdn.microsoft.com/en-us/library/ms178091.aspx



[Examples]:

--===== 1. Split the string, "abcd" into unigrams, bigrams and trigrams

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',1) AS ng; -- unigrams (@N=1)

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',2) AS ng; -- bigrams  (@N=2)

 SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',3) AS ng; -- trigrams (@N=3)



--===== 2. How many times the substring "AB" appears in each record

 DECLARE @table TABLE(stringID int identity primary key, string varchar(100));

 INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');



 SELECT      string, occurances = COUNT(*)

 FROM        @table t

 CROSS APPLY dbo.NGrams8k(t.string,2) AS ng

 WHERE       ng.token = 'AB'

 GROUP BY    string;



[Revision History]:

------------------------------------------------------------------------------------------

 Rev 00 - 20140310 - Initial Development - Alan Burstein

 Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added

                     conversion to bigint in the TOP logic to remove implicit conversion

                     to bigint - Alan Burstein

 Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less

                     than the length of @string. Updated comment section. - Alan Burstein

 Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N

                     parameters to prevent a NULL string or NULL @N from causing "an

                     improper value" being passed to the TOP clause. - Alan Burstein

 Rev 05 - 20171228 - Small simplification; changed: 

                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))

                                           to:

                (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))

 Rev 06 - 20180612 - Using CHECKSUM(N) to convert N in the token output instead of

                     using (CAST N as int). CHECKSUM removes the need to convert to int.

 Rev 07 - 20180612 - re-designed to: (1) use dbo.rangeAB - Alan Burstein

****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT

  position   = r.RN,

  token      = SUBSTRING(@string, CHECKSUM(r.RN), @N)

FROM  dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r

WHERE @N > 0 AND @N <= LEN(@string);

GO



CREATE FUNCTION dbo.patExtract8K

(

  @string  VARCHAR(8000),

  @pattern VARCHAR(50)

)

/*****************************************************************************************

[Description]:

 This can be considered a T-SQL inline table valued function (iTVF) equivalent of 

 Microsoft's mdq.RegexExtract except:



 1. It includes each matching substring's position in the string



 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)

    instead of nvarchar(4000) for the pattern



 3. The mask parameter is not required and therefore does not exist.



 4. You have to specify what text we're searching for as an exclusion; e.g. for numeric 

    characters you should search for '[^0-9]' instead of '[0-9]'. 



 5. There is no parameter for naming a "capture group". Using the variable below, both 

    the following queries will return the same result:



   DECLARE @string nvarchar(4000) = N'123 Main Street';



   SELECT item FROM dbo.patExtract8K(@string, '[^0-9]');

   SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);



 Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:

 http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the

 rows where [matched]=0. The key benefit is that it performs substantially better 

 because you are only returning the number of rows required instead of returning twice as

 many rows then filtering out half of them.  Furthermore, because we're 



 The following two sets of queries return the same result:



 DECLARE @string varchar(100) = 'xx123xx555xx999';

 BEGIN

 -- QUERY #1

   -- patExtract8K

   SELECT ps.itemNumber, ps.item 

   FROM dbo.patExtract8K(@string, '[^0-9]') ps;



   -- patternSplitCM   

   SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item 

   FROM dbo.patternSplitCM(@string, '[^0-9]') ps

   WHERE [matched] = 0;



 -- QUERY #2

   SELECT ps.itemNumber, ps.item 

   FROM dbo.patExtract8K(@string, '[0-9]') ps;



   SELECT itemNumber = row_number() over (order by itemNumber), item 

   FROM dbo.patternSplitCM(@string, '[0-9]')

   WHERE [matched] = 0;

 END;



[Compatibility]:

 SQL Server 2008+



[Syntax]:

--===== Autonomous

 SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item

 FROM dbo.patExtract8K(@string,@pattern) pe;



--===== Against a table using APPLY

 SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item

 FROM dbo.SomeTable t

 CROSS APPLY dbo.patExtract8K(t.someString, @pattern) pe;



[Parameters]:

 @string        = varchar(8000); the input string

 @pattern       = varchar(50); pattern to search for



[Returns]:

 itemNumber = bigint; the instance or ordinal position of the matched substring

 itemIndex  = bigint; the location of the matched substring inside the input string

 itemLength = int; the length of the matched substring

 item       = varchar(8000); the returned text



[Developer Notes]:

 1. Requires NGrams8k 



 2. patExtract8K does not return any rows on NULL or empty strings. Consider using 

    OUTER APPLY or append the function with the code below to force the function to return 

    a row on empty or NULL inputs:



    UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;



 3. patExtract8K is not case sensitive; use a case sensitive collation for 

    case-sensitive comparisons



 4. patExtract8K is deterministic. For more about deterministic functions see:

    https://msdn.microsoft.com/en-us/library/ms178091.aspx



 5. patExtract8K performs substantially better with a parallel execution plan, often

    2-3 times faster. For queries that leverage patextract8K that are not getting a 

    parallel execution plan you should consider performance testing using Traceflag 8649 

    in Development environments and Adam Machanic's make_parallel in production. 



[Examples]:

--===== (1) Basic extract all groups of numbers:

  WITH temp(id, txt) as

 (

   SELECT * FROM (values

   (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),

   (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),

   (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\tester{}))22222444 done'))t(x,xx)

 )

 SELECT

   [temp.id] = t.id,

   pe.itemNumber,

   pe.itemIndex,

   pe.itemLength,

   pe.item

 FROM        temp AS t

 CROSS APPLY dbo.patExtract8K(t.txt, '[^0-9]') AS pe;

-----------------------------------------------------------------------------------------

Revision History:

 Rev 00 - 20170801 - Initial Development - Alan Burstein

 Rev 01 - 20180619 - Complete re-write   - Alan Burstein

*****************************************************************************************/

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),

       itemIndex  = f.position,

       itemLength = itemLen.l,

       item       = SUBSTRING(f.token, 1, itemLen.l)

FROM

(

 SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))

 FROM   dbo.NGrams8k(@string, 1) AS ng

 WHERE  PATINDEX(@pattern, ng.token) <  --<< this token does NOT match the pattern

        ABS(SIGN(ng.position-1)-1) +    --<< are you the first row?  OR

        PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row

) AS f(position, token)

CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),

  DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);

GO
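
The three functions need to be created in dependency order (rangeAB, then NGrams8k, then patExtract8K), since each one is schema-bound to the one before it. A quick smoke test, reusing calls already shown in the comment blocks above, might look like this (a sketch; the expected values in the comments are what the function logic should produce):

```sql
-- rangeAB: ten ranges, (0,10) through (90,100)
SELECT r.RN, r.N1, r.N2
FROM   dbo.rangeAB(0,90,10,1) AS r;

-- NGrams8k: the bigrams of 'abcd' ('ab', 'bc', 'cd')
SELECT ng.position, ng.token
FROM   dbo.NGrams8k('abcd',2) AS ng;

-- patExtract8K: the runs of digits in the string ('123', '555', '999')
SELECT pe.itemNumber, pe.itemIndex, pe.itemLength, pe.item
FROM   dbo.patExtract8K('xx123xx555xx999','[^0-9]') AS pe;
```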

So HABO was confused by the representation of my data as well (1/1,6PM). Apologies. I provided code to create sample data per your and HABO's suggestion. The '1/1,6PM' is really my representation of a timestamp for ease of reading. In reality, it's "2017-01-01 18:00:00.000". That probably makes things easier. — Nov 14 '18 at 20:42
How would your solution look when using DATETIME (as provided in my easily consumable data) as opposed to having to convert my crappy example from a string into some TIME variable? — Nov 14 '18 at 21:15
