Do we need to eliminate Sort operator all the time in execution plan?

Hi Friends,

As we know Performance is one area where the scope for learning will never end. Today I am going to share an article which emphasizes on Seek/Scan and Sort operators. There are many blogs and my article is nothing in front of them but I am just sharing the knowledge of what I could.

Let me start up with the demo first. We have 2 tables one containing LogID and other some random Dates. When it comes to second table as you can see it has PK/FK relationship on the column LogID and a userid column which has only 3 users (1,2,3).

CREATE TABLE [dbo].[LogTable]

(

[LogID] [int] NOT NULL IDENTITY(1, 1) ,

[DateSent] [datetime] NULL,

)

ON [PRIMARY]

ALTER TABLE [dbo].[LogTable] ADD CONSTRAINT [PK_LogTable] PRIMARY KEY CLUSTERED ([LogID]) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]

CREATE TABLE [dbo].[LogTable_Cross]

(

[LogID] [int] NOT NULL ,

[UserID] [int] NOT NULL

)

ON [PRIMARY]

ALTER TABLE [dbo].[LogTable_Cross] WITH NOCHECK ADD CONSTRAINT [FK_LogTable_Cross_LogTable] FOREIGN KEY ([LogID]) REFERENCES [dbo].[LogTable] ([LogID])

CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]

ON [dbo].[LogTable_Cross] ([UserID])

INCLUDE ([LogID])

INSERT INTO [LogTable]

SELECT TOP 100000

DATEADD(day, ( ABS(CHECKSUM(NEWID())) % 65530 ), 0)

FROM sys.sysobjects

CROSS JOIN sys.all_columns

INSERT INTO [LogTable_Cross] SELECT [LogID] ,1 FROM [LogTable] ORDER BY NEWID()

INSERT INTO [LogTable_Cross] SELECT [LogID] ,2 FROM [LogTable] ORDER BY NEWID()

INSERT INTO [LogTable_Cross] SELECT [LogID] ,3 FROM [LogTable] ORDER BY NEWID()

I will show you top 5 rows (in descending order) in each of the tables

As you can see from the script above we have clustered index on logID column and non clustered index on Datesent column for the Logtable. when it comes to other table we have non clustered index on LogID with included column as userid.

Now I want to select all those logs (from LogTable) which has given userid (user id will be checked from cross table LogTable_Cross) with datesent desc.

SELECT DI.LogID

FROM LogTable DI

INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID

WHERE DP.UserID = 1

ORDER BY DateSent DESC

If you see from the execution plan it went for Clustered Index Scan on LogTable and seek on LogTable_cross. It went for parallel execution plan and when it comes to Cost it is Sort operation which is consuming 45 %.

Let's execute the same query but this time with MAXDOP 1 and see if there is any change with respect to sort operation.

Now if you see above the execution plan we do have sort operation in place. But Now why it went with Hash Join? Why not with Merge join? Because the ordered property is false for the clustered index scan operator

When it comes to scans the database engine has 2 options.

1)Allocation Ordered Scans

2)Index Order Scan

When the plan shows a Table Scan operator, the storage engine has only one option: to use an allocation order scan. When the plan shows an Index Scan operator (clustered or nonclustered) with the property Ordered: True, the storage engine can use only an index order scan.

When the plan shows an Index Scan operator with Ordered: False, the relational engine doesn’t care in what order the rows are returned. In this case, there are two options to scan the data: allocation order scan and index order scan. It is up to the storage engine to determine which to employ.

In the above execution plan it opted for False which means it went for allocation ordered scan. As the order is false it has to go with Hash join. If it would have been true then definitely we will have Merge join operator.

Okay now in both plans (parallel and Serial) it is suggesting an missing index so let's create one and see if that eliminates sort operator.

We have an index with userid as the leading column and the missing index is suggesting to create leading column as LogID and then userID.

After creating the index I tried running the same query and I got the below execution plans for parallel and serial(maxdop 1)

So the top execution plan is for serial execution and the bottom for parallel. If you notice there is no change in the plan when it comes to parallel execution and for serial as explained above this time we got Merge join operator instead of Hash as now ordered property is true. Also it made use of the index what missing index has requested and it went with Index scan instead of seek.

Okay if you compare the costs the one with parallel outcomes the serial as it is just 19%.

Now back to my question the sort operator is there even now and it is the one which is contributing to the cost so how to get rid of this?

Well when I forced the optimizer to make use of nested joins then the sort operator got disappeared.

SELECT DI.LogID

FROM LogTable DI

INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID

WHERE DP.UserID = 1

ORDER BY DateSent DESC

OPTION (MAXDOP 1, LOOP JOIN);

As you can see the parallel and the select both have no sort operators. Wow finally great to see the execution plan with out sort operator but wait let us compare the cost with the ones where we forced Nested loop join to the default one's of what Engine thinks as better plan.

The top execution plan is what Engine has opted and the bottom is where I forced as you can see Engine's selection outperformed mine. Let's do the same with Serial plan.

Just like parallel plan even the serial plan of Engine thinks that Nested loop join is not the better plan.

Now as my question points do we need to eliminate sort all the time? Well, it depends but certainly not in this case.

In fact we can eliminate sort here by creating Materialized view but I am not going to discuss this here.

I know this is bit lengthy and even few might feel it as clumsy but I am ready to help in case if you have any queries on the same.

RDBMS HUB

Search This Blog

Do we need to eliminate Sort operator all the time in execution plan?

Comments