Channel: Kimberly L. Tripp - Nonclustered Indexes

How can you tell if an index is REALLY a duplicate?

There's a lot of confusion around duplicate indexes and I think a lot of it has to do with how sp_helpindex (or even SSMS) displays what's in an index. What's actually in the index - and how it's structured - are not always what they seem. This was the original motivation behind my rewrite of sp_helpindex but even since then, I've still seen a lot of confusion. In today's blog post I'm going to first explain EXACTLY which indexes are the same and which aren't - as well as the faults in the tools.

So, starting with index structures... (it all starts with internals :) )

The clustered index IS the data. The KEY to the clustered index (what I often refer to as the clustering key) defines the way that the data is ordered (not necessarily truly, physically ordered on disk - but ordered LOGICALLY). And, no, I'm not going to rehash every bit of internals here... just a few reminders.

A nonclustered index is duplicated data (similar to an index in the back of a book). This duplicated data can be used to help reference the actual data (exactly like an index in the back of a book) OR can be used to actually respond to requests (for example, if you're solely looking for a count of people with a name that begins with 'S' then an index that has LastName in the index could be used to count them - without actually "looking up" the actual data). So, there are some very powerful uses to indexes. But, alas, this isn't a post on using indexes or indexing strategies - this is ALL about internals (and understanding the structure of an index). So, I'm going to cut to the chase!
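To make the counting example concrete, here's a minimal sketch. The Person table and the PersonLastName index are hypothetical names made up for illustration (they aren't part of this post's examples), and whether the optimizer actually chooses the nonclustered index depends on the data:

CREATE TABLE Person
(
    PersonID    int identity,
    FirstName   varchar(30)     NOT NULL,
    LastName    varchar(30)     NOT NULL
)
go

CREATE UNIQUE CLUSTERED INDEX PersonCL ON Person(PersonID)
go

CREATE INDEX PersonLastName ON Person(LastName)
go

-- The only column this query needs (LastName) is in PersonLastName,
-- so it can be answered from the index alone, without looking up the data rows.
SELECT COUNT(*) AS NamesStartingWithS
FROM Person
WHERE LastName LIKE 'S%'
go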

A nonclustered index always has:

  • Key (this is what defines the order of the index)
  • A leaf-level entry (this is the actual data stored in the index + the lookup value* + any included columns) - however, ALL of these columns are only stored once (and they've ALWAYS only been stored once here so, even if you reference a column that's part of the lookup value, SQL Server will NOT duplicate it again).

*So, what is this lookup value?

The lookup value is what SQL Server uses to reference the actual data row. If a table has a clustered index then the lookup value is the clustering key (and EVERY column defined by it). If a table does not have a clustered index (it's then a HEAP) then SQL Server uses something called a RID. A RID is an 8-byte structure consisting of 2:4:2 bytes which break down into 2 for the FileID, 4 for the PageID and 2 for the slot number. And while RIDs (and their history) are very interesting - the details (how they specifically work and/or their structures) aren't entirely relevant here, but if/when they're in an index, I'll list it as RID.
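If you'd like to see a RID-style location for yourself, here's a quick sketch. It relies on %%physloc%% and sys.fn_PhysLocFormatter, both of which are undocumented (available in SQL Server 2008 and later) and could change without notice, so treat this as something to try in a test database only. HeapTest is a throwaway name made up for this illustration:

CREATE TABLE HeapTest
(
    HeapTestID  int identity,
    [Name]      char(16)
)
go

-- No clustered index is created, so HeapTest is a HEAP
INSERT HeapTest ([Name]) VALUES ('a')
INSERT HeapTest ([Name]) VALUES ('b')
go

-- %%physloc%% returns the row's 8-byte physical location;
-- the formatter displays it as (FileID:PageID:SlotNumber)
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot], *
FROM HeapTest
go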

Let's now put all of this together with an example (or two) starting with a relatively easy one.

USE JunkDB
go

CREATE TABLE Test
(
    TestID  int identity,
    [Name]  char(16)
)
go

CREATE UNIQUE CLUSTERED INDEX TestCL ON Test (TestID)
go

CREATE INDEX TestName ON Test([Name])
go

sp_helpindex Test
go

The output shows:

index_name      index_description                               index_keys
TestCL          clustered, unique located on PRIMARY            TestID
TestName        nonclustered located on PRIMARY                 Name

And, while that looks right, it's horribly misleading. The TestName index also includes TestID. And, not just in the leaf level but in the tree (for ordering purposes). So, that should really show Name, TestID. But, it's a bit more confusing if you throw in this:

CREATE UNIQUE INDEX TestNameUnique ON Test([Name])
go

sp_helpindex Test
go

The output shows:

index_name      index_description                               index_keys
TestCL          clustered, unique located on PRIMARY            TestID
TestName        nonclustered located on PRIMARY                 Name
TestNameUnique  nonclustered, unique located on PRIMARY         Name

And, at this point, it doesn't look like there's any difference at all between the second and third indexes (well, except that the third index requires that the values be unique - shown in the description). But, as for the "index_keys" they look the same. However, they're actually not the same (in all parts of the tree). So, this is why I tend to differentiate between the "leaf" and the non-leaf levels of an index (when I describe them). And, it only gets more complicated when you throw in included columns (2005+).

So, how do you tell the difference? Unfortunately, there are NO tools within SQL Server (or even any 3rd party tools that I know of) that display this through the UI, etc. But, you could start by using my replacement for sp_helpindex. I've updated and rewritten it a few times but the latest one will always be in the sp_helpindex rewrites category here: http://www.sqlskills.com/BLOGS/KIMBERLY/category/sp_helpindex-rewrites.aspx. I personally think that's the easiest (and is the reason why I originally wrote it!). Using it you can see how the output is more detailed.

The output shows (specifically showing ONLY the last 2 columns):

columns_in_tree              columns_in_leaf
[TestID]                     All columns "included" - the leaf level IS the data row.
[Name], [TestID]             [Name], [TestID]
[Name]                       [Name], [TestID]

Now, we're getting somewhere. We can definitely see that these two indexes ARE different. A nonunique nonclustered needs to have the lookup value pushed up into the tree (for navigation). A unique nonclustered index does not.
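If you don't have the rewritten procedure installed, the catalog views can at least give you the inputs to that rule. The following is only a sketch against sys.indexes, sys.index_columns and sys.columns: it lists the declared key and included columns plus the is_unique flag. To the best of my knowledge, the clustering key that SQL Server adds on its own does not appear as a row in sys.index_columns, so you still have to apply the rule above yourself (nonunique = lookup value in the tree and the leaf, unique = lookup value in the leaf only):

SELECT i.name AS index_name,
       i.type_desc,
       i.is_unique,
       c.name AS column_name,
       ic.key_ordinal,              -- 0 for included (non-key) columns
       ic.is_included_column
FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id
            AND ic.index_id = i.index_id
    JOIN sys.columns AS c
        ON c.object_id = ic.object_id
            AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Test')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal
go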

Next, let's go with a [just slightly more challenging] example:

USE JunkDB
go

CREATE TABLE Member
(
    MemberNo    int     identity,
    FirstName   varchar(30)     NOT NULL,
    LastName    varchar(30)     NOT NULL,
    RegionNo    int
)
go

CREATE UNIQUE CLUSTERED INDEX MemberCL ON Member(MemberNo)
go
   
CREATE INDEX MemberIndex1
ON Member(FirstName, RegionNo, MemberNo)
INCLUDE (LastName)
go

CREATE INDEX MemberIndex2
ON Member(FirstName, RegionNo)
INCLUDE (LastName)
go

CREATE INDEX MemberIndex3
ON Member(FirstName, RegionNo)
INCLUDE (MemberNo, LastName)
go

CREATE UNIQUE INDEX MemberIndex4
ON Member(FirstName, RegionNo)
INCLUDE (MemberNo, LastName)
go

First, let's review with sp_helpindex:

sp_helpindex Member
go

The output shows:

index_name      index_description                           index_keys
MemberCL        clustered, unique located on PRIMARY        MemberNo
MemberIndex1    nonclustered located on PRIMARY             FirstName, RegionNo, MemberNo
MemberIndex2    nonclustered located on PRIMARY             FirstName, RegionNo
MemberIndex3    nonclustered located on PRIMARY             FirstName, RegionNo
MemberIndex4    nonclustered, unique located on PRIMARY     FirstName, RegionNo

Looking solely at sp_helpindex it LOOKS like the first nonclustered index is different from the others and that the 2nd, 3rd and 4th nonclustered indexes are the same. However, that's actually NOT true. Next, let's use my version of sp_helpindex.

sp_SQLskills_SQL2008_helpindex Member
go

The output (again, just showing the last two columns of output):

columns_in_tree                           columns_in_leaf
[MemberNo]                                All columns "included" - the leaf level IS the data row.
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo], [MemberNo]       [FirstName], [RegionNo], [MemberNo], [LastName]
[FirstName], [RegionNo]                   [FirstName], [RegionNo], [MemberNo], [LastName]

From this, you can see that all 4 indexes have the same leaf level but index 4 has a slightly different tree structure. Ultimately, it's indexes 1, 2 and 3 that are the same and index 4 is actually [slightly] different. How they're different (outside of the fact that the 4th index guarantees uniqueness) is a bit beyond the scope of this post. But, yes, there are some [relatively minor in this case] differences. And, since I'm ONLY looking for indexes that are the same then only 1, 2 and 3 fit that requirement.
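Without getting into the details of how they differ, one way to at least observe the difference is to compare record sizes level by level with sys.dm_db_index_physical_stats. This is a sketch only: the Member table above is empty, so you'd need to load enough rows for the indexes to have non-leaf levels (and those rows must have unique FirstName/RegionNo combinations or index 4 will reject them). No output is shown here because it depends entirely on the data you load:

SELECT i.name AS index_name,
       ps.index_level,              -- 0 = leaf level, higher numbers = the tree
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(N'Member'), NULL, NULL, 'DETAILED') AS ps
    JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
            AND i.index_id = ps.index_id
ORDER BY ps.index_id, ps.index_level
go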

And, things get MUCH more complicated if you have a multicolumn clustering key and/or more complicated INCLUDEs.

Having said that - how do you find duplicate indexes?

Well... I had started this by putting together a simple way for you to check for dupes with my version of sp_helpindex but then I figured out an issue with included columns. I'm showing the structures as they're defined (and stored). But, in terms of usage - the order of the columns in the INCLUDE does not matter. As a result, two indexes with different orders for their included columns will show as two different indexes (technically they are and they are stored differently). However, there are absolutely NO differences (in terms of usage) so I'll need to write code to adjust for that (to truly find duplicates).
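To give a feel for what that adjustment might look like, here's a rough sketch (not the final code) that builds each index's declared INCLUDE list in alphabetical order, so that two indexes whose included columns differ only in order produce the same string. It uses the FOR XML PATH concatenation trick, which works on SQL Server 2005 and later. Note the limitation: it only looks at the declared INCLUDE list from sys.index_columns and does not fold in the clustering key that SQL Server adds to the leaf level, so it is one piece of a comparison rather than a complete duplicate check:

SELECT i.index_id,
       i.name AS index_name,
       STUFF((SELECT N', ' + QUOTENAME(c.name)
              FROM sys.index_columns AS ic
                  JOIN sys.columns AS c
                      ON c.object_id = ic.object_id
                          AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id
                  AND ic.index_id = i.index_id
                  AND ic.is_included_column = 1
              ORDER BY c.name       -- alphabetical, not declaration order
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
             1, 2, N'') AS included_columns_sorted   -- NULL when nothing is included
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'Member')
    AND i.index_id > 1              -- nonclustered indexes only
ORDER BY i.index_id
go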

For now, here's some quick code to get you closer. I had written some of this when we discussed this in our London Immersion Event. However, I've tweaked it even further here after thinking about some interesting exceptions. This code will find absolute duplicates (where the order of the structure is completely identical). To use it you'll need to enter the schemaname and tablename (the two lines marked -- MODIFY below).

-- Check for the temp table by name; selecting from it would fail if it doesn't exist yet
IF OBJECT_ID('tempdb..#FindDupes') IS NOT NULL
    DROP TABLE #FindDupes
go

CREATE TABLE #FindDupes
(
 index_id           int,
 is_disabled        bit,
 index_name         sysname,
 index_description  varchar(210),
 index_keys         nvarchar(2126),
 included_columns   nvarchar(max),
 filter_definition  nvarchar(max),
 columns_in_tree    nvarchar(2126),
 columns_in_leaf    nvarchar(max)
)
go

DECLARE @SchemaName sysname,
        @TableName  sysname,
        @ExecStr    nvarchar(max)

SELECT @SchemaName = N'schemaname',     -- MODIFY
       @TableName = N'tablename'        -- MODIFY
       
SELECT @ExecStr = 'EXEC sp_SQLskills_SQL2008_helpindex '''
                    + QUOTENAME(@SchemaName)
                    + N'.'
                    + QUOTENAME(@TableName)
                    + N''''

INSERT #FindDupes
EXEC (@ExecStr)

SELECT t1.index_id, COUNT(*) AS 'Duplicate Indexes w/Lower Index_ID',
        N'DROP INDEX '
            + QUOTENAME(@SchemaName, N']')
            + N'.'
            + QUOTENAME(@TableName, N']')
            + N'.'
            + t1.index_name AS 'Drop Index Statement'
FROM #FindDupes AS t1
    JOIN #FindDupes AS t2
        ON t1.columns_in_tree = t2.columns_in_tree
            AND t1.columns_in_leaf = t2.columns_in_leaf
            AND ISNULL(t1.filter_definition, 1) = ISNULL(t2.filter_definition, 1)
            AND PATINDEX('%unique%', t1.index_description) = PATINDEX('%unique%', t2.index_description)
            AND t1.index_id > t2.index_id
GROUP BY t1.index_id, N'DROP INDEX ' + QUOTENAME(@SchemaName, N']')
                    + N'.'
                    + QUOTENAME(@TableName, N']')
                    + N'.' + t1.index_name
go

I'm planning to build a sproc around this next, as well as handle the cases where the included columns are just in different orders. It will be another sp_ for master and I do hope to wrap it up shortly! That code will REALLY find ALL unnecessary duplicates and help you to drop them! Stay tuned!!

Cheers and thanks for reading!!
kt

