Common Table Expressions in SQL Server

by Kristian Ranstrom
October 12, 2021
3 min read

Common Table Expressions (CTEs) in T-SQL are super powerful.  They came out originally in SQL Server 2005 and are essentially a temporary query result that you can further apply queries to.  I often use them to dedup data or find all ancestors in a child/parent table.  I'll show both of these examples below.

Parts of a CTE expression

CTEs are defined by a 'WITH' keyword.  Other keywords are ROW_NUMBER, OVER, PARTITION BY and ORDER BY

Let's define a simple table quick:

CREATE TABLE Tags (
ID int NOT NULL IDENTITY(1, 1),
Tag varchar(50) NOT NULL,
DateAdded DATETIME2 NOT NULL
)
INSERT INTO Tags (tag, DateAdded) VALUES ('one', '1/1/2001')
INSERT INTO Tags (tag, DateAdded) VALUES ('two', '2/2/2002')
INSERT INTO Tags (tag, DateAdded) VALUES ('two', '3/3/2003')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '4/4/2004')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '5/5/2005')
INSERT INTO Tags (tag, DateAdded) VALUES ('three', '6/6/2006')
INSERT INTO Tags (tag, DateAdded) VALUES ('four', '7/7/2007')
INSERT INTO Tags (tag, DateAdded) VALUES ('five', '8/8/2008')
INSERT INTO Tags (tag, DateAdded) VALUES ('five', '9/9/2009')

As you can see, we have a several rows with duplicate tags that were created on different dates.  Let's say we wanted to grab only one tag and choose it by the first date that it was entered.

Deduping Data

;WITH CTE
AS 
(
SELECT ROW_NUMBER() OVER(PARTITION BY tag ORDER BY dateadded) dupcount, *
   FROM   dbo.tags 
)
SELECT * FROM CTE WHERE dupcount = 1

From this example, you can see that the ROW_NUMBER() function is counting the dup records and they are PARTITIONed by tag and ordered by the DateAdded field.  The SELECT statement uses the CTE (can be any name) as the table name in the the query.

Instead of just seeing the results, you could also UPDATE dup tags or DELETE them altogether:

;WITH CTE
AS 
(
SELECT ROW_NUMBER() OVER(PARTITION BY tag ORDER BY dateadded) dupcount, *
   FROM   dbo.tags 
)
DELETE FROM CTE WHERE dupcount > 1

Recursive CTEs

CTEs can also be recursive.  The query gets rerun until it fulfills its conditions.  For this next example, let's suppose you have a self-referencing table in a child/parent fashion as follows:

CREATE TABLE Category (
 Id INT IDENTITY(1,1),  
 ParentId INT,
 Name VARCHAR(20)
 );

INSERT INTO Category (ParentId, Name) VALUES
(NULL, 'Cat 1'),
(1, 'Cat 2'),
(2, 'Cat 3'),
(NULL, 'Cat 4'),
(4, 'Cat 5'),
(5, 'Cat 6'),
(1, 'Cat 7'),
(4, 'Cat 8');

Many people will query one category, find the Parent and query again.  It works but is terribly inefficient.  Recursive CTEs to the rescue.  By using a UNION ALL statement as the CTE expression, we will recursively retrieve results until the NULL parent is found.  Then we can order them in order as they were put in:

;WITH cats AS (
SELECT 
	Id, 
	Name, 
	0 AS Level, 
	CAST(Id AS VARCHAR(255)) AS Path
 FROM 
	Category WHERE ParentId IS NULL

UNION ALL

SELECT 
	i.Id, 
	i.Name, 
	Level + 1, 
	CAST(Path + '.' + CAST(i.Id AS VARCHAR(255)) AS VARCHAR(255)) AS Path
FROM 
	Category i INNER JOIN 
	cats c ON c.Id = i.ParentId
)
SELECT * FROM cats ORDER BY Path

This gives us our results all in one quick query ordered in a very usable way.

Id Name Level Path
1 Cat 1 0 1
2 Cat 2 1 1.2
3 Cat 3 2 1.2.3
7 Cat 7 1 1.7
4 Cat 4 0 4
5 Cat 5 1 4.5
6 Cat 6 2 4.5.6
8 Cat 8 1 4.8

You can see the SQL Fiddle in action here.