Friday, November 04, 2005

Variable Processing in SQL Server

or Why Won’t SQL Server Use My Index?

I ran into an interesting situation recently when I was writing a new stored procedure. I had written a very simple query that I was confident SQL Server 2000 would use an index when it executed, but SQL Server would not use the index (unless I explicitly told it to use it). I did some research and discovered why SQL Server would not use the index. The results of my research are given below.

The Problem
I have a typical order table with an OrderID column, a ProductOrderID primary key column, a bunch of other columns, and an OrderDateTime column. The clustered index for the table is on the OrderID column and there is an index on the OrderDateTime column. I needed to write a stored procedure that returned information about orders that occurred between two datetime values. I wrote a query in SQL Query Analyzer similar to the one shown below to return the information I needed:

-- Examples
DECLARE @StartDateTime DATETIME
DECLARE @EndDateTime DATETIME

SET @StartDateTime = '11/1/2005'
SET @EndDateTime = '11/2/2005'

-- Attempt 1
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime


When I ran the query in SQL Query Analyzer the performance was horrible, so I took a look at the query plan and saw that it was doing a Clustered Index Scan (which is basically a table scan) instead of doing an Index Seek on the OrderDateTime index. This baffled me so I though I would change the query to use >= and <= operators instead of the BETWEEN operator.

-- Attempt 2
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime >= @StartDateTime AND OrderDateTime <= @EndDateTime


This query behaved the exact same way. If I added a query hint to the query (as shown below) I could get it to use the index, but I really did not want to do that. (I tend to believe that SQL Server can optimize a query better than I can.)

-- Attempt 1 With Hint
SELECT ProductOrderID, TotalPrice
FROM ProductOrder WITH (INDEX(ix_ProductOrder_OrderDateTime))
WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime


This new stored procedure was actually replacing some dynamically generated SQL, so I though I would run the query with the values explicitly specified (Dynamic SQL Method) instead of using variables (Stored Procedure Method).

-- Dynamic SQL
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime BETWEEN '11/1/2005' AND '11/2/2005'


Just as I would expect, this version used the OrderDateTime index. At this point I was very confused and was beginning to doubt the conventional wisdom that stored procedures are better than dynamic SQL (but that is another topic all together).

A coworker and I did some digging and found the following useful newsgroup threads and articles:

Optimizing Variables and Parameters
By Kalen Delaney
http://www.windowsitpro.com/SQLServer/Article/ArticleID/42801/42801.html
http://sqlmag.com/t-sql/optimizing-variables-and-parameters
https://web.archive.org/web/20170621161058/http://sqlmag.com/t-sql/optimizing-variables-and-parameters

Inside Search Arguments
By Kalen Delaney
http://www.windowsitpro.com/Articles/Index.cfm?ArticleID=42349
http://sqlmag.com/t-sql/inside-search-arguments
https://web.archive.org/web/20170930231710/http://sqlmag.com/t-sql/inside-search-arguments

Dates and Indexes in Query
microsoft.public.sqlserver.programming
http://groups.google.com/group/microsoft.public.sqlserver.programming
/browse_thread/thread/a27a1d7930f42ad9/ac2b0b3a280427f2


I would recommend reading any and all of the resources listed above, but the Optimizing Variables and Parameters article said it best:


Although you have a valid SARG in the SELECT statement, when the optimizer evaluates the query, it hasn't yet executed the SET statement that assigns the specific value to the variable. The optimizer optimizes the entire batch at once. You can't put the variable assignment in a separate batch because the scope of a local variable is one batch. Because the optimizer doesn't know what value to look for in the index-statistics histogram, it has to guess.


Basically SQL Server doesn’t know what the value of the variable is going to be (even though I had just set it in the line above), so it guesses what it might be and goes from there. The article goes on to explain exactly how SQL Server guesses, but in most cases (including mine) it guesses wrong.

The Solution
So what is the solution to this problem? Put the query in a stored procedure which, ironically enough, is where it was destined for anyway.

When I put the query into the following stored procedure and ran it, it used the index just like it was supposed to.

CREATE PROCEDURE GetProductOrderInfoForRange(

    @StartDateTime DATETIME,
    @EndDateTime DATETIME) AS
BEGIN
    SELECT ProductOrderID, TotalPrice
    FROM ProductOrder
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime
END

-- Stored Procedure Method
EXEC GetProductOrderInfoForRange
    @StartDateTime = '11/1/2005',
    @EndDateTime = '11/2/2005'


Some Lessons Learned
Unlike some compiled languages that used optimizers (like C/C++, .NET, etc.) using a literal constant is different than using a variable that is set to a constant value. Do not expect the optimizer to know what the value of a variable will be or that a variable will not change.

If you are testing or analyzing the performance of a query, it is best to test it in the context that it will actually be running in. In other words, if a query is destined for a stored procedure, go ahead and test it in a stored procedure. (Of course if I had done that in the first place, I would not have learned about SARGs and how SQL Servers query optimizer works.)

No comments: