T-SQL currently supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. I'll define these ranking functions in SQL Server and show you how they work, but let's first look at the test environment I'll be using to demonstrate them.
To obtain the data I needed, I used the following code to create the Sales.Quotas table in the SQL Server 2005 AdventureWorks sample database:
USE AdventureWorks
GO
--Drop Sales.Quotas table if it exists
IF OBJECT_ID (N'Sales.Quotas', N'U') IS NOT NULL
DROP TABLE Sales.Quotas
GO
--Create Sales.Quotas table
SELECT e.FirstName, e.LastName, q.SalesQuota AS Quota,
  DATENAME(m,q.QuotaDate) AS [Month], YEAR(q.QuotaDate) AS [Year]
INTO Sales.Quotas
FROM Sales.SalesPersonQuotaHistory q
  INNER JOIN HumanResources.vEmployee e
  ON q.SalesPersonID = e.EmployeeID
As you can see, I simply pull data from two existing objects in the database (the Sales.SalesPersonQuotaHistory table and the HumanResources.vEmployee view) to create a set of meaningful test data.
Here's the SELECT statement I use to query the new table:
SELECT
  ROW_NUMBER() OVER(ORDER BY Quota DESC) AS [RowNumber],
  RANK() OVER(ORDER BY Quota DESC) AS [Rank],
  DENSE_RANK() OVER(ORDER BY Quota DESC) AS [DenseRank],
  NTILE(5) OVER(ORDER BY Quota DESC) AS [NTile],
  LastName, Quota, [Month], [Year]
FROM Sales.Quotas
The SELECT statement uses all four ranking functions to rank the rows. I include all the functions in one statement, so you can compare the results returned by each function, as shown in the following result set:
RowNumber | Rank | DenseRank | NTile | LastName | Quota | Month | Year |
1 | 1 | 1 | 1 | Campbell | 280000.00 | January | 2002 |
2 | 1 | 1 | 1 | Vargas | 280000.00 | January | 2004 |
3 | 3 | 2 | 1 | Campbell | 267000.00 | April | 2002 |
4 | 4 | 3 | 2 | Vargas | 266000.00 | January | 2002 |
5 | 5 | 4 | 2 | Ansman-Wolfe | 264000.00 | January | 2002 |
6 | 6 | 5 | 2 | Jiang | 263000.00 | July | 2003 |
7 | 7 | 6 | 3 | Saraiva | 247000.00 | April | 2003 |
8 | 8 | 7 | 3 | Vargas | 244000.00 | July | 2001 |
9 | 9 | 8 | 3 | Vargas | 239000.00 | January | 2003 |
10 | 10 | 9 | 4 | Campbell | 234000.00 | January | 2004 |
11 | 11 | 10 | 4 | Ansman-Wolfe | 226000.00 | October | 2002 |
12 | 11 | 10 | 4 | Campbell | 226000.00 | July | 2001 |
13 | 13 | 11 | 5 | Ansman-Wolfe | 224000.00 | April | 2003 |
14 | 14 | 12 | 5 | Varkey Chudukatil | 217000.00 | January | 2003 |
15 | 15 | 13 | 5 | Ansman-Wolfe | 210000.00 | July | 2002 |
ROW_NUMBER function
The ROW_NUMBER function is the most basic of the ranking functions. As you can see in the result set (the RowNumber column), the function numbers each row sequentially, beginning with 1. If you refer back to the query, you'll see that the first element in the SELECT clause is the ROW_NUMBER function. When you use this function, first specify the function name, followed by the empty parentheses. You do not pass any values into the function.
After the ranking function, specify the OVER clause. Within this clause, you include an ORDER BY clause, which specifies the column (or columns) on which the ranking is based. In this case, I am ranking the rows by the values in the Quota column, in descending order. As a result, the rows in the result set are ranked starting with the highest Quota amount. If you refer again to the result set, you'll see the row with the highest Quota value is ranked 1 and the row with the lowest value is ranked 15. (The result set contains 15 rows.)
That's all there is to using the ROW_NUMBER function. The other ranking functions work in much the same way; only the results differ slightly.
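One detail worth seeing in isolation is how ROW_NUMBER treats rows that share a value. The following is a minimal sketch, not part of the original test environment: it assumes a hypothetical table variable named @Quotas loaded with four rows copied from the result set above, and it uses only syntax available in SQL Server 2005. Because two rows tie at 280000.00, the order in which the first ROW_NUMBER call numbers them is not guaranteed. Adding a second column to the ORDER BY clause, as in the RowNumberTieBroken column, makes the numbering repeatable.

```sql
-- Hypothetical sample data: four rows copied from the result set above
DECLARE @Quotas TABLE (LastName nvarchar(50), Quota money);

INSERT INTO @Quotas (LastName, Quota)
SELECT N'Campbell', 280000 UNION ALL
SELECT N'Vargas',   280000 UNION ALL
SELECT N'Campbell', 267000 UNION ALL
SELECT N'Vargas',   266000;

SELECT
  -- The two 280000 rows may be numbered 1 and 2 in either order
  ROW_NUMBER() OVER(ORDER BY Quota DESC) AS [RowNumber],
  -- LastName breaks the tie, so Campbell (280000) is always numbered 1
  ROW_NUMBER() OVER(ORDER BY Quota DESC, LastName) AS [RowNumberTieBroken],
  LastName, Quota
FROM @Quotas
ORDER BY Quota DESC, LastName;
```

With the tiebreaker in place, the RowNumberTieBroken column returns 1 for Campbell (280000.00), 2 for Vargas (280000.00), 3 for Campbell (267000.00), and 4 for Vargas (266000.00).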
RANK function
The next ranking function in the SELECT list is RANK. Once again, you specify the function name, followed by the OVER clause, which again includes the ORDER BY clause. However, as you can see in the result set (the Rank column), the ranked values are slightly different from what you saw for the ROW_NUMBER function. Yes, the highest Quota value is ranked 1, but, because two rows share the same highest value, they are both ranked 1.
When you use the RANK function, all rows that share a value receive the same rank. But notice that the rank value itself is based on the row's position in the ordered result set (one plus the number of rows that come before it), not on how many distinct values come before it. For example, the Quota value in the third row is 267,000. That is the second highest Quota value, yet because two rows precede it, it receives a ranking of 3, rather than 2. The RANK function skips the 2 because the second row ties with the first row. If the fourth row shared the same value as the third row, it would also be ranked 3. But because its value is lower and three rows precede it, it is ranked 4.
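To see the "what if the fourth row tied with the third" case play out, here is a small sketch under stated assumptions. The table variable @Sample and its contents are hypothetical: the first three rows come from the result set above, the fourth row is invented so that it ties with the third, and the fifth is taken from the result set again.

```sql
-- Hypothetical sample data; the Jiang row is invented to force a second tie
DECLARE @Sample TABLE (LastName nvarchar(50), Quota money);

INSERT INTO @Sample (LastName, Quota)
SELECT N'Campbell',     280000 UNION ALL
SELECT N'Vargas',       280000 UNION ALL
SELECT N'Campbell',     267000 UNION ALL
SELECT N'Jiang',        267000 UNION ALL
SELECT N'Ansman-Wolfe', 264000;

SELECT
  RANK() OVER(ORDER BY Quota DESC) AS [Rank],
  LastName, Quota
FROM @Sample
ORDER BY Quota DESC, LastName;
```

The Rank column comes back as 1, 1, 3, 3, 5. The values 2 and 4 are skipped because each rank equals one plus the number of rows with a higher Quota value.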
DENSE_RANK function
The DENSE_RANK function takes a different approach. As with the RANK function, the first two rows are assigned a value of 1. However, the DENSE_RANK function numbers the distinct values consecutively, with no gaps, rather than tying the rank to the row number. As a result, the third row is assigned a value of 2 because the Quota column contains the second highest value, the fourth row is assigned a value of 3 because it contains the third highest value, and so on.
The ROW_NUMBER, RANK, and DENSE_RANK functions are similar in how they return results. The difference lies in how they handle ties: whether tied rows share a value and whether the numbering leaves gaps after a tie. The NTILE function, however, is a bit different from these three functions.
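A side-by-side comparison on a handful of rows makes the difference between the two functions easier to spot. This is only an illustrative sketch: the table variable @Sample is hypothetical, and its four rows are copied from rows 10 through 13 of the result set above, where two rows tie at 226000.00.

```sql
-- Hypothetical sample data: rows 10 through 13 of the result set above
DECLARE @Sample TABLE (LastName nvarchar(50), Quota money);

INSERT INTO @Sample (LastName, Quota)
SELECT N'Campbell',     234000 UNION ALL
SELECT N'Ansman-Wolfe', 226000 UNION ALL
SELECT N'Campbell',     226000 UNION ALL
SELECT N'Ansman-Wolfe', 224000;

SELECT
  RANK()       OVER(ORDER BY Quota DESC) AS [Rank],      -- leaves a gap after the tie
  DENSE_RANK() OVER(ORDER BY Quota DESC) AS [DenseRank], -- no gap after the tie
  LastName, Quota
FROM @Sample
ORDER BY Quota DESC, LastName;
```

Run against these four rows alone, the Rank column returns 1, 2, 2, 4, while the DenseRank column returns 1, 2, 2, 3.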
NTILE function
If you refer back to the SELECT statement, you can see that when you specify the NTILE function, you pass in an integer as an argument to the function, unlike the other ranking functions, where you pass in no argument. The NTILE function divides the result set into the number of groups specified by this argument. For example, in the SELECT statement, I specify 5, which means the result set will be split into five groups. Because there are 15 rows in the result set, each group will contain three rows. The rows are assigned to groups according to their order by the value in the Quota column.
As a result, the three rows with the highest Quota values are in the first group and receive a ranking of 1. The three rows with the next highest Quota values are in the second group and receive a ranking of 2, and so on. Because there are only five groups, the highest ranking is 5, which is assigned to the group with the three lowest Quota values. Again, refer back to the result set to better understand how the NTILE function groups data and then ranks each group.
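The 15-row result set divides evenly into five groups, so it does not show what happens when the row count is not a multiple of the NTILE argument. The sketch below explores that case. It assumes a hypothetical table variable named @Sample holding seven rows (rows 3 through 9 of the result set above) and splits them into three groups. In SQL Server, when the rows cannot be divided evenly, the group sizes differ by one and the larger groups come first.

```sql
-- Hypothetical sample data: rows 3 through 9 of the result set above
DECLARE @Sample TABLE (LastName nvarchar(50), Quota money);

INSERT INTO @Sample (LastName, Quota)
SELECT N'Campbell',     267000 UNION ALL
SELECT N'Vargas',       266000 UNION ALL
SELECT N'Ansman-Wolfe', 264000 UNION ALL
SELECT N'Jiang',        263000 UNION ALL
SELECT N'Saraiva',      247000 UNION ALL
SELECT N'Vargas',       244000 UNION ALL
SELECT N'Vargas',       239000;

SELECT
  -- Seven rows in three groups: the first group gets the extra row
  NTILE(3) OVER(ORDER BY Quota DESC) AS [NTile],
  LastName, Quota
FROM @Sample
ORDER BY Quota DESC;
```

Here the NTile column returns 1, 1, 1, 2, 2, 3, 3: the first group holds three rows and the remaining two groups hold two rows each.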