Common Table Expressions in SQL Server
Common Table Expressions (CTEs) in T-SQL are super powerful. They came out originally in SQL Server 2005 and are essentially a temporary query result that you can further apply queries to. I often use them to dedup data or find all ancestors in a child/parent table. I'll show both of these examples below.
PARTS OF A CTE EXPRESSION
CTEs are defined by a 'WITH' keyword. Other keywords are ROW_NUMBER, OVER, PARTITION BY and ORDER BY
Let's define a simple table quick:
CREATE TABLE Tags (
ID int NOT NULL IDENTITY(1, 1),
Tag varchar(50) NOT NULL,
DateAdded DATETIME2 NOT NULL
)
INSERT INTO Tags (tag, DateAdded) VALUES ('one', '1/1/2001')
INSERT INTO Tags (tag, DateAdded) VALUES ('two', '2/2/2002')
INSERT INTO Tags (tag, DateAdded) VALUES ('two', '3/3/2003')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '4/4/2004')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '5/5/2005')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '6/6/2006')
INSERT INTO Tags (tag, DateAdded) VALUES ('four', '7/7/2007')
INSERT INTO Tags (tag, DateAdded) VALUES ('five', '8/8/2008')
INSERT INTO Tags (tag, DateAdded) VALUES ('five', '9/9/2009')
Copy
As you can see, we have a several rows with duplicate tags that were created on different dates. Let's say we wanted to grab only one tag and choose it by the first date that it was entered.
DEDUPING DATA
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY tag ORDER BY dateadded) dupcount, *
FROM dbo.tags
)
SELECT * FROM CTE WHERE dupcount = 1
Copy
From this example, you can see that the ROW_NUMBER() function is counting the dup records and they are PARTITIONed by tag and ordered by the DateAdded field. The SELECT statement uses the CTE (can be any name) as the table name in the the query.
Instead of just seeing the results, you could also UPDATE dup tags or DELETE them altogether:
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY tag ORDER BY dateadded) dupcount, *
FROM dbo.tags
)
DELETE FROM CTE WHERE dupcount > 1
Copy
RECURSIVE CTES
CTEs can also be recursive. The query gets rerun until it fulfills its conditions. For this next example, let's suppose you have a self-referencing table in a child/parent fashion as follows:
CREATE TABLE Category (
Id INT IDENTITY(1,1),
ParentId INT,
Name VARCHAR(20)
);
INSERT INTO Category (ParentId, Name) VALUES
(NULL, 'Cat 1'),
(1, 'Cat 2'),
(2, 'Cat 3'),
(NULL, 'Cat 4'),
(4, 'Cat 5'),
(5, 'Cat 6'),
(1, 'Cat 7'),
(4, 'Cat 8');
Copy
Many people will query one category, find the Parent and query again. It works but is terribly inefficient. Recursive CTEs to the rescue. By using a UNION ALL statement as the CTE expression, we will recursively retrieve results until the NULL parent is found. Then we can order them in order as they were put in:
;WITH cats AS (
SELECT
Id,
Name,
0 AS Level,
CAST(Id AS VARCHAR(255)) AS Path
FROM
Category WHERE ParentId IS NULL
UNION ALL
SELECT
i.Id,
i.Name,
Level + 1,
CAST(Path + '.' + CAST(i.Id AS VARCHAR(255)) AS VARCHAR(255)) AS Path
FROM
Category i INNER JOIN
cats c ON c.Id = i.ParentId
)
SELECT * FROM cats ORDER BY Path
Copy
This gives us our results all in one quick query ordered in a very usable way.
Id | Name | Level | Path |
---|---|---|---|
1 | Cat 1 | 0 | 1 |
2 | Cat 2 | 1 | 1.2 |
3 | Cat 3 | 2 | 1.2.3 |
7 | Cat 7 | 1 | 1.7 |
4 | Cat 4 | 0 | 4 |
5 | Cat 5 | 1 | 4.5 |
6 | Cat 6 | 2 | 4.5.6 |
8 | Cat 8 | 1 | 4.8 |