Tuesday, 21 June 2011

SQL: Row_Number, Rank, Dense_Rank, Ntile functions

T-SQL currently supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. I'll define these rank functions in SQL Server and show you how they work, but let's first look at the test environment I'll be using to demonstrate these functions.

To obtain the data I needed, I used the following code to create the Sales.Quota table in the SQL Server 2005 AdventureWorks sample database:

USE AdventureWorks
GO

--Drop Sales.Quotas table if it exists
IF OBJECT_ID (N'Sales.Quotas', N'U') IS NOT NULL

DROP TABLE Sales.Quotas
GO


--Create Sales.Quotas table
SELECT e.FirstName, e.LastName, q.SalesQuota AS Quota,
DATENAME(m,q.QuotaDate) AS [Month], YEAR(q.QuotaDate) AS [Year]
INTO Sales.Quotas
FROM Sales.SalesPersonQuotaHistory q
INNER JOIN HumanResources.vEmployee e
ON q.SalesPersonID = e.EmployeeID

WHERE SalesQuota BETWEEN 210000 and 280000
ORDER BY e.LastName, q.QuotaDate

As you can see, I simply pull data from a couple other tables in the database in order to create a set of meaningful test data.

Here's the SELECT statement I use to query the new table:

SELECT
ROW_NUMBER() OVER(ORDER BY Quota DESC) AS [RowNumber],
RANK() OVER(ORDER BY Quota DESC) AS [Rank],
DENSE_RANK() OVER(ORDER BY Quota DESC) AS [DenseRank],
NTILE(5) OVER(ORDER BY Quota DESC) AS [NTile],
LastName, Quota, [Month], [Year]
FROM Sales.Quotas


The SELECT statement uses all four ranking functions to rank the rows. I include all the functions in one statement, so you can compare the results returned by each function, as shown in the following result set:

RowNumber

Rank

DenseRank

NTile

LastName

Quota

Month

Year

1

1

1

1

Campbell

280000.00

January

2002

2

1

1

1

Vargas

280000.00

January

2004

3

3

2

1

Campbell

267000.00

April

2002

4

4

3

2

Vargas

266000.00

January

2002

5

5

4

2

Ansman-Wolfe

264000.00

January

2002

6

6

5

2

Jiang

263000.00

July

2003

7

7

6

3

Saraiva

247000.00

April

2003

8

8

7

3

Vargas

244000.00

July

2001

9

9

8

3

Vargas

239000.00

January

2003

10

10

9

4

Campbell

234000.00

January

2004

11

11

10

4

Ansman-Wolfe

226000.00

October

2002

12

11

10

4

Campbell

226000.00

July

2001

13

13

11

5

Ansman-Wolfe

224000.00

April

2003

14

14

12

5

Varkey Chudukatil

217000.00

January

2003

15

15

13

5

Ansman-Wolfe

210000.00

July

2002


ROW_NUMBER function

The ROW_NUMBER function is the most basic of the ranking functions. As you can see in the result set (the RowNumber column), the function numbers each row sequentially, beginning with 1. If you refer back to the query, you'll see that the first element in the SELECT clause is the ROW_NUMBER function. When you use this function, first specify the function name, followed by the empty parentheses. You do not pass any values into the function.

After the ranking function, specify the OVER function. For this function, you pass in an ORDER BY clause as an argument. The clause specifies the column (or columns) you want to rank. In this case, I am ranking the values in the Quota column -- in descending order. As a result, the rows in the result set are ranked starting with the highest Quota amount. If you refer again to the result set, you'll see the row with the highest Quota value is ranked 1 and the row with the lowest value is ranked 15. (The result set contains 15 rows.)

That's all there is to using the ROW_NUMBER function, and the other ranking functions work in much the same way, only the results are slightly different.


RANK function in SQL Server

The next ranking function in the SELECT list is RANK. Once again, you specify the function name, followed by the OVER function, which again includes the ORDER BY clause. However, as you can see in the result set (the Rank column),

the ranked values are slightly different than what you saw for the ROW_NUMBER function. Yes, the highest Quota value is ranked 1, but, because two rows share the same highest value, they are both ranked 1.

When you use the RANK function, all shared values will be ranked the same. But notice that the rank value itself is based on the row's position in the result set, not on the sequential number of the row. For example, the Quota value in the third row is 267,000. That is the second highest Quota value, yet because it falls in the third row, it receives a ranking of 3, rather than 2. The RANK function skips the 2 because the second row matches the first row. If the fourth row shared the same value as the third row, it would also be ranked as 3. But because the value is lower and it is in the fourth row, it is ranked 4.


DENSE_RANK function

The DENSE_RANK function takes a different approach. Like the RANK function, the first two rows are assigned a value of 1. However, the DENSE_RANK function uses sequential numbering, rather than tying the rank to the row number. As a result, the third row is assigned a value of 2 because the Quota column contains the second highest value, and the fourth row is assigned a value of 3 because it is the third highest value, and so on.

The ROW_NUMBER, RANK, and DENSE_RANK functions are similar in how they return results. The difference is in whether the numbering is sequential and whether it is tied to the row number. The NTILE function, however, is a bit different than these three functions.


NTILE function

If you refer back to the SELECT statement, you can see that when you specify the NTILE function, you pass in an integer as an argument to the function -- unlike the other ranking functions where you pass in no argument. The NTILE function divides the result set

into the number of groups specified by this argument. For example, in the SELECT statement, I specify 5, which means the result set will be split into five groups. Because there are 15 rows in the result set, each group will contain three rows. The rows are grouped together based on the value in the Quota column.

As a result, the three rows with the highest Quota values are in the first group and receive a ranking of 1. The three rows with the next highest Quota values are in the second group and receive a ranking of 2. and so on. Because there are only five groups, the highest ranking is 5, which is assigned to the group with the three lowest Quota values. Again, refer back to the result set to better understand how the NTILE function groups data and then ranks each group.

Thursday, 2 June 2011

SQL: Checksum

Compare 2 tables for duplicates.

There are 3 types of checksum:
  1. checksum: Is not case sensitive.
  2. binary_checksum: Is case sensitive.
  3. checksum_agg: Aggregates the results to check table as a whole.
example:

SELECT *

FROM [Person].[Contact] a

inner join [Person].[Contact] b

on a.ContactID = b.ContactID

WHERE BINARY_CHECKSUM(a.[NameStyle],a.[Title],a.[FirstName]) = BINARY_CHECKSUM(b.[NameStyle],b.[Title],b.[FirstName])

SQL: xpath and xQuery

--XML xpath example

DECLARE @xml XML



SET @xml = N'


Gambardella, Matthew


Computer


44.95

2000-10-01

An in-depth look at creating applications

with XML.


Ralls, Kim

Fantasy

5.95

2000-12-16

A former architect battles corporate zombies,

an evil sorceress, and her own childhood to become queen

of the world.

';


SELECT x.books.value('@id[1]','varchar(10)') AS bookid

,x.books.value('title[1]','varchar(50)') AS bookName

,x.books.value('author[1]','varchar(50)') AS [BookAuth]

,x.books.value('description[1]','varchar(500)') AS [Desc]

,x.books.value('price[1]','money') as price

,x.books.value('publish_date[1]','datetime') as datePublished

FROM @xml.nodes('//catalog/book[price > 5]') as x(books)

where x.books.value('price[1]','money') > 6

order by bookid


--xQuery

SELECT @xml.query('

for $book in //catalog/book

where $book/price > 5

order by $book/title[1]

return $book')



SELECT @xml.query('

for $book in //catalog/book

where $book/price > 5

order by $book/title[1]

return

element Book

{(

element Title { data($book/title)},

element Author { data($book/author)}

)}

')



SELECT @xml.query('

for $book in //catalog/book

where $book/price > 5

order by $book/title[1]

return

element Book

{(

attribute Title { data($book/title)},

attribute Author { data($book/author)}

)}

')

Sunday, 24 April 2011

Kerberos Settings

Kerberos needs to be set up for any services across the network.

For example SQL server, SSRS, SharePoint.

Add:
setspn -a HTTP/URL Domain User

Remove:
setspn -d HTTP/URL Domain User

See here for further help:
http://blogs.msdn.com/b/martinkearn/archive/2007/04/23/configuring-kerberos-for-sharepoint-2007-part-1-base-configuration-for-sharepoint.aspx