John's Technical Blog: 2011

T-SQL currently supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. I'll define these rank functions in SQL Server and show you how they work, but let's first look at the test environment I'll be using to demonstrate these functions.

To obtain the data I needed, I used the following code to create the Sales.Quotas table in the SQL Server 2005 AdventureWorks sample database:

USE AdventureWorks
GO

--Drop Sales.Quotas table if it exists
IF OBJECT_ID (N'Sales.Quotas', N'U') IS NOT NULL
    DROP TABLE Sales.Quotas
GO

--Create Sales.Quotas table
SELECT e.FirstName, e.LastName, q.SalesQuota AS Quota,
    DATENAME(m,q.QuotaDate) AS [Month], YEAR(q.QuotaDate) AS [Year]
INTO Sales.Quotas
FROM Sales.SalesPersonQuotaHistory q
INNER JOIN HumanResources.vEmployee e
    ON q.SalesPersonID = e.EmployeeID
WHERE q.SalesQuota BETWEEN 210000 AND 280000
ORDER BY e.LastName, q.QuotaDate

As you can see, I simply pull data from a couple of other tables in the database to create a set of meaningful test data.

Here's the SELECT statement I use to query the new table:

SELECT
ROW_NUMBER() OVER(ORDER BY Quota DESC) AS [RowNumber],
RANK() OVER(ORDER BY Quota DESC) AS [Rank],
DENSE_RANK() OVER(ORDER BY Quota DESC) AS [DenseRank],
NTILE(5) OVER(ORDER BY Quota DESC) AS [NTile],
LastName, Quota, [Month], [Year]
FROM Sales.Quotas

The SELECT statement uses all four ranking functions to rank the rows. I include all the functions in one statement, so you can compare the results returned by each function, as shown in the following result set:

RowNumber	Rank	DenseRank	NTile	LastName	Quota	Month	Year
1	1	1	1	Campbell	280000.00	January	2002
2	1	1	1	Vargas	280000.00	January	2004
3	3	2	1	Campbell	267000.00	April	2002
4	4	3	2	Vargas	266000.00	January	2002
5	5	4	2	Ansman-Wolfe	264000.00	January	2002
6	6	5	2	Jiang	263000.00	July	2003
7	7	6	3	Saraiva	247000.00	April	2003
8	8	7	3	Vargas	244000.00	July	2001
9	9	8	3	Vargas	239000.00	January	2003
10	10	9	4	Campbell	234000.00	January	2004
11	11	10	4	Ansman-Wolfe	226000.00	October	2002
12	11	10	4	Campbell	226000.00	July	2001
13	13	11	5	Ansman-Wolfe	224000.00	April	2003
14	14	12	5	Varkey Chudukatil	217000.00	January	2003
15	15	13	5	Ansman-Wolfe	210000.00	July	2002

ROW_NUMBER function

The ROW_NUMBER function is the most basic of the ranking functions. As you can see in the result set (the RowNumber column), the function numbers each row sequentially, beginning with 1. If you refer back to the query, you'll see that the first element in the SELECT clause is the ROW_NUMBER function. When you use this function, first specify the function name, followed by the empty parentheses. You do not pass any values into the function.

After the ranking function, specify the OVER clause. Inside its parentheses, you supply an ORDER BY clause that names the column (or columns) whose values determine the ranking. In this case, I am ranking the values in the Quota column in descending order. As a result, the rows in the result set are ranked starting with the highest Quota amount. If you refer again to the result set, you'll see the row with the highest Quota value is ranked 1 and the row with the lowest value is ranked 15. (The result set contains 15 rows.) Note that the first two rows share the same Quota value; ROW_NUMBER still assigns them different numbers, and which of the two comes first is not guaranteed unless the ORDER BY includes a tiebreaker column.

That's all there is to using the ROW_NUMBER function. The other ranking functions work in much the same way; only the results are slightly different.
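Because ROW_NUMBER always hands out distinct numbers, tied rows (such as rows 1 and 2 in the result set above) can come back in either order from one run to the next. Here is a minimal sketch of one way to make the numbering repeatable. It assumes a hypothetical table variable, @Quotas, holding a few rows copied from the result set rather than the real Sales.Quotas table, and the column aliases are my own.

```sql
-- Hypothetical sample data: four rows copied from the result set above
DECLARE @Quotas TABLE (LastName varchar(50), Quota money, [Year] int);

INSERT INTO @Quotas (LastName, Quota, [Year])
SELECT 'Vargas',   280000, 2004 UNION ALL
SELECT 'Campbell', 280000, 2002 UNION ALL
SELECT 'Campbell', 267000, 2002 UNION ALL
SELECT 'Vargas',   266000, 2002;

SELECT
    -- Ordering by Quota alone: the two 280000 rows may be numbered in either order
    ROW_NUMBER() OVER (ORDER BY Quota DESC) AS LooseRowNumber,
    -- LastName and [Year] break the tie, so the numbering is deterministic
    ROW_NUMBER() OVER (ORDER BY Quota DESC, LastName, [Year]) AS StableRowNumber,
    LastName, Quota, [Year]
FROM @Quotas
ORDER BY StableRowNumber;
```

With the tiebreaker columns in place, Campbell's 280,000 row is always numbered 1 and Vargas's is always numbered 2.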

RANK function in SQL Server

The next ranking function in the SELECT list is RANK. Once again, you specify the function name, followed by the OVER clause, which again includes the ORDER BY clause. However, as you can see in the result set (the Rank column), the ranked values are slightly different from what you saw for the ROW_NUMBER function. Yes, the highest Quota value is ranked 1, but because two rows share the same highest value, they are both ranked 1.

When you use the RANK function, rows that share a value are ranked the same. But notice that the rank of the next distinct value is based on the row's position in the ordered result set, not on how many distinct values precede it: a row's rank is one plus the number of rows with a higher Quota. For example, the Quota value in the third row is 267,000. That is the second-highest Quota value, yet because two rows precede it, it receives a ranking of 3, rather than 2. The RANK function skips the 2 because the second row ties with the first. If the fourth row shared the same value as the third row, it would also be ranked 3. But because its value is lower and it is in the fourth row, it is ranked 4.
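To make the gap behaviour concrete, the sketch below computes the same number by hand: one plus the count of rows with a strictly higher Quota. The table variable @Quotas and the ManualRank alias are hypothetical names I made up for this illustration, and the four rows are copied from the result set above.

```sql
-- Hypothetical sample data: the four highest rows from the result set above
DECLARE @Quotas TABLE (LastName varchar(50), Quota money);

INSERT INTO @Quotas (LastName, Quota)
SELECT 'Campbell', 280000 UNION ALL
SELECT 'Vargas',   280000 UNION ALL
SELECT 'Campbell', 267000 UNION ALL
SELECT 'Vargas',   266000;

SELECT
    q.LastName,
    q.Quota,
    RANK() OVER (ORDER BY q.Quota DESC) AS [Rank],
    -- Hand-rolled equivalent: 1 + number of rows with a strictly higher Quota
    1 + (SELECT COUNT(*) FROM @Quotas AS h WHERE h.Quota > q.Quota) AS ManualRank
FROM @Quotas AS q
ORDER BY q.Quota DESC;
-- Both columns return 1, 1, 3, 4 for this data
```

The correlated subquery is only there to explain what RANK does; in practice the ranking function is the simpler way to write it.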

DENSE_RANK function

The DENSE_RANK function takes a different approach. As with the RANK function, the first two rows are assigned a value of 1. However, the DENSE_RANK function numbers the distinct values consecutively, with no gaps, rather than tying the rank to the row's position. As a result, the third row is assigned a value of 2 because the Quota column contains the second-highest value, the fourth row is assigned a value of 3 because it contains the third-highest value, and so on.
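One place this gap-free numbering comes in handy is picking out every row that falls within the top N distinct values. The sketch below is one way to do that, again using a hypothetical @Quotas table variable with rows copied from the result set. Because ranking functions are allowed only in the SELECT and ORDER BY clauses, the DENSE_RANK call goes in a derived table and the filter is applied outside it.

```sql
-- Hypothetical sample data: the four highest rows from the result set above
DECLARE @Quotas TABLE (LastName varchar(50), Quota money);

INSERT INTO @Quotas (LastName, Quota)
SELECT 'Campbell', 280000 UNION ALL
SELECT 'Vargas',   280000 UNION ALL
SELECT 'Campbell', 267000 UNION ALL
SELECT 'Vargas',   266000;

-- Keep every row whose Quota is one of the two highest distinct values
SELECT ranked.LastName, ranked.Quota, ranked.DenseRank
FROM (
    SELECT LastName, Quota,
           DENSE_RANK() OVER (ORDER BY Quota DESC) AS DenseRank
    FROM @Quotas
) AS ranked
WHERE ranked.DenseRank <= 2
ORDER BY ranked.DenseRank, ranked.LastName;
-- Returns three rows: both 280000 rows (DenseRank 1) and the 267000 row (DenseRank 2)
```

The same filter written with RANK would return only the two tied rows, because RANK never assigns the value 2 for this data.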

The ROW_NUMBER, RANK, and DENSE_RANK functions are similar in how they return results. The differences are whether tied rows share a number and whether the numbering leaves a gap after a tie. The NTILE function, however, is a bit different from these three functions.

NTILE function

If you refer back to the SELECT statement, you can see that when you specify the NTILE function, you pass in an integer as an argument to the function, unlike the other ranking functions, which take no argument. The NTILE function divides the result set into the number of groups specified by this argument. For example, in the SELECT statement, I specify 5, which means the result set will be split into five groups. Because there are 15 rows in the result set, each group will contain three rows. The rows are assigned to groups in the order specified by the ORDER BY clause, in this case by descending Quota value.

As a result, the three rows with the highest Quota values are in the first group and receive a ranking of 1. The three rows with the next highest Quota values are in the second group and receive a ranking of 2, and so on. Because there are only five groups, the highest ranking is 5, which is assigned to the group with the three lowest Quota values. Again, refer back to the result set to better understand how the NTILE function groups data and then ranks each group.
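The 15-row example divides evenly, so it doesn't show what happens when the row count is not a multiple of the group count. In that case, the larger groups come first. Here is a small sketch under assumed data: a hypothetical @Quotas table variable with seven distinct Quota values taken from the result set, split into three groups. The Tile, RowsInTile, HighestQuota, and LowestQuota aliases are mine.

```sql
-- Hypothetical sample data: seven distinct Quota values from the result set above
DECLARE @Quotas TABLE (Quota money);

INSERT INTO @Quotas (Quota)
SELECT 280000 UNION ALL
SELECT 267000 UNION ALL
SELECT 266000 UNION ALL
SELECT 264000 UNION ALL
SELECT 263000 UNION ALL
SELECT 247000 UNION ALL
SELECT 244000;

-- Seven rows into three groups: 7 / 3 = 2 remainder 1, so the first group gets the extra row
SELECT t.Tile,
       COUNT(*)     AS RowsInTile,
       MAX(t.Quota) AS HighestQuota,
       MIN(t.Quota) AS LowestQuota
FROM (
    SELECT Quota, NTILE(3) OVER (ORDER BY Quota DESC) AS Tile
    FROM @Quotas
) AS t
GROUP BY t.Tile
ORDER BY t.Tile;
-- Tile 1 holds 3 rows (280000 down to 266000); tiles 2 and 3 hold 2 rows each
```

Keep in mind that NTILE splits by row position, not by value, so two rows with the same Quota can land in different groups if they fall on a boundary.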

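--XML xpath example
--Note: the XML markup below is reconstructed. Element and attribute names other than
--genre come from the queries that follow; the genre element and the id and title values
--are assumed from Microsoft's sample books.xml file, which these two records match.
DECLARE @xml XML
SET @xml = N'
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer''s Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications
    with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies,
    an evil sorceress, and her own childhood to become queen
    of the world.</description>
  </book>
</catalog>'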

SELECT x.books.value('@id[1]','varchar(10)') AS bookid
    ,x.books.value('title[1]','varchar(50)') AS bookName
    ,x.books.value('author[1]','varchar(50)') AS [BookAuth]
    ,x.books.value('description[1]','varchar(500)') AS [Desc]
    ,x.books.value('price[1]','money') AS price
    ,x.books.value('publish_date[1]','datetime') AS datePublished
FROM @xml.nodes('//catalog/book[price > 5]') AS x(books)
WHERE x.books.value('price[1]','money') > 6
ORDER BY bookid

--xQuery
--Return the matching book elements unchanged
SELECT @xml.query('
for $book in //catalog/book
where $book/price > 5
order by $book/title[1]
return $book')

--Build a new Book element with Title and Author child elements
SELECT @xml.query('
for $book in //catalog/book
where $book/price > 5
order by $book/title[1]
return
element Book
{(
element Title { data($book/title)},
element Author { data($book/author)}
)}')

--Build a new Book element with Title and Author attributes
SELECT @xml.query('
for $book in //catalog/book
where $book/price > 5
order by $book/title[1]
return
element Book
{(
attribute Title { data($book/title)},
attribute Author { data($book/author)}
)}')


Tuesday, 21 June 2011

SQL: Row_Number, Rank, Dense_Rank, Ntile functions

Thursday, 2 June 2011

SQL: Checksum

SQL: xpath and xQuery
