Posts Tagged ‘Database’

Something about NULL in SQL Server

Tuesday, August 10th, 2010

As developers, we all probably know what a NULL is in SQL Server. NULL is not a specific value but it stands for UNKNOWN value. In other words, NULL represents the ‘absence of data’.

The behavior and treatment of NULL in SQL server is very interesting in different context and part of T-SQL language. Let’s have a quick look on how it is treated in different ways by SQL Server.

When you use NULL (UNKNOWN value) in Arithmetic operations or in String Concatenation operations, the result is always UNKNOWN because the value which is being operated is not available or UNKNOWN. So, all the following statements yield the result of NULL:

SELECT 5 + NULL
SELECT 10 / NULL
SELECT 'Gateway' + ' Technolabs' + NULL

So beware when you are trying to do something similar as shown below:

DECLARE @TOP INT
DECLARE @SQL NVARCHAR(100)
SET @SQL = 'SELECT TOP ' + CAST(@TOP AS VARCHAR(10)) + '* FROM HumanResources.Employee'
EXECUTE SP_EXECUTESQL @SQL

The above query will never execute. Guess why?  Because you forgot to initialize @TOP variable and due to that it will contain NULL. So when you try to concatenate the value of @TOP with the value of @SQL, the resulting query is UNKNOWN.

Unlike other programming languages, SQL Server follows the rules of “Three-Valued Logic” which means that the result of a condition in T-SQL can have any of three possible values viz. TRUE, FALSE and UNKNOWN. If NULL is one of the operands in condition composed of comparison operators like =,>,<,>=,<=,<>, then the result of the expression is NULL or UNKNOWN.  For example, the result of all of the following conditions is UNKNOWN:

… WHERE 100 = NULL

… WHERE NULL > 5

… WHERE NULL < 10

Even following statement yields NULL as per SQL-92 Standards:

… WHERE NULL = NULL

Let’s create a simple table and insert some records in it.

CREATE TABLE tblPerson
(
PersonID INT
,PersonName VARCHAR(50)
,Age TINYINT
)
GO
INSERT INTO tblPerson
SELECT 1,'Person-1',25
UNION
SELECT 2,'Person-2',NULL
UNION
SELECT 3,'Person-3',35

We have a table which contains three rows.  The second row has NULL in Age for Person-2. Now, execute the following query:

SET ANSI_NULLS OFF
SELECT * FROM tblPerson WHERE Age = NULL

The above query will return 1 row for Person-2. Notice the use of SET ANSI_NULLS OFF which is not recommended way because it does not conform to SQL-92 standards. Microsoft encourages the usage of SET ANSI_NULLS ON which is considered best practice and which is also the default behavior of SQL Server. Now, consider following query:

SET ANSI_NULLS ON
SELECT * FROM tblPerson WHERE Age = NULL

The result of the condition in WHERE clause in above query will be UNKNOWN. It is neither TRUE nor FALSE. But, in this case SQL server treats NULL as if “NOT EQUAL TO” and evaluates the result of the condition to FALSE and will not return any rows. Note the use of SET ANSI_NULLS ON, which is SQL Server default.

If you want to fetch rows where Age is NULL, you should use ‘IS’ operator with ANSI_NULLS set to ON instead of using ‘=’ operator with ANSI_NULLS set to OFF. So, the preferred query should be as follows:

SET ANSI_NULLS ON
SELECT * FROM tblPerson WHERE Age IS NULL

As you can observe from above examples that using comparison operators (=,<,>,>=,<=,<>) in condition with NULL as one of its operands appearing in WHERE clause yields UNKNOWN result when you don’t specify ANSI_NULLS option or if it is set to ON.

But this is not the case when NULL appears in condition for a CHECK constraint. To see this, let’s add a CHECK constraint to our table so that age must be greater than zero when a record is inserted or updated.

ALTER TABLE tblPerson
ADD CONSTRAINT CK_tblPerson_Age CHECK (Age &gt; 0)

Now, add new row with NULL in Age as shown below:

INSERT INTO tblPerson VALUES (4,'Person-4',NULL)

The age for newly inserted row is NULL which is not greater than zero because it is UNKNOWN. But, SQL Server treats NULL differently in this case and evaluates the CHECK constraint condition as if it is TRUE and inserts new row to the table.

You saw that the condition WHERE NULL=NULL yields NULL because both the values are UNKNOWN and due to this comparison is not possible. Thus, the result will always be UNKNOWN. But again, this is not always the case. To prove my point, let’s add a UNIQUE constraint on PersonID column.

ALTER TABLE tblPerson
ADD CONSTRAINT UK_tblPerson_Age UNIQUE (PersonID)

Now insert a new row where PersonID is NULL as shown below:

INSERT INTO tblPerson VALUES (NULL,’Person-5′,30)

Because PersonID column allows NULL values, SQL Server will insert this new row. Now, try inserting one more row where PersonID is NULL:

INSERT INTO tblPerson VALUES (NULL,'Person-6',35)

As you may probably have guessed, because of UNIQUE constraint on PersonID SQL Server will not allow inserting another row with NULL for PersonID because tblPerson already contains one row with NULL in PersonID column. In this case, SQL server treats NULL even differently and considers all NULLs as if they are equal and does not allow a second NULL to be inserted. Even when SQL Server groups or orders rows by a column, it considers all the NULLs in one group as if they are all equal.

I hope this discussion will help many of you in understanding the behavior of NULL in SQL Server. Thank you for taking time to read this article. Readers’ valuable comments are always welcome!

5 Performance Enhancement Tips for LINQ Coupled With MSSQL Server

Tuesday, July 27th, 2010

LINQ is one of the most exciting enhancements made in the Microsoft.NET languages in the recent years.

Essentially, it’s a new way to represent different data models in OOPs notation.

While working extensively with LINQ in last couple of years on mission critical .NET projects, I’ve observed the following five performance enhancement possibilities.

1. Lazy Loading & Eager Loading

Lazy Loading:

By default LINQ loads the related objects lazily. Lazy loading means Loading Object with that associate Objects.

For example, If I have two tables called User (UserId, Name) and User_Car  (UserId, CarID, CarBrand), whenever we initiate a User object, it will automatically load User_Car object details also as shown in the code below.

var userObj = from user in DataContext.User
              Select user;

The above query returns the User Object with the details of associate object of User_Car.  E.g.

Int carId= userObj.User_Car.CarID;

The above statement will fire two queries on database.

First one will be fired when we write query on User Object. E.g.

var userObj = from user in datacontext.User
              Select user;
(Select ID, Name from User)

And second query will fired when we try getting the value from User_Car object.

E.g

Int carId=userObj.User_Car.CarID;
(Select User_Car.CarID, User_Car.User_ID, User_Car.CarBrand
from User_Car where User_Car. User_ID =@userID)

This leads to more round trips to the database. Instead, we can opt for Eager loading.

Eager Loading:

Defining associate Objects  which are require in advance.So when LINQ execute query then it fetch only that objects.And LINQ fire a single on database.

For Example,

DataContext dc= new DataContext ();
DataLoadOptions options = new DataLoadOptions ();
option.Loadwidth<User> (U => U.User_Car)
dc.LoadOptions = options; 

(Here, User_Car object as pre-fetch object. So when we write query on User Object then automatically User_car records get fetch.)

e.g

var userObj = from user in DataContext.User
               Select user;

SQL Query is,

(Select User.ID, User.Name, User_Car.CarID, User_Car.User_ID,
User_Car.CarBrand from User join User_Car on
User.ID=User_Car.User_ID)

And when we write

Int carId= userObj.User_Car.CarID

It fetches the record from memory only. So Eager Loading reduce the Database round Trips and Improve the performance.

2. Use DataLoadOption.AssociateWith ()

In Eager Loading, When association of Master object and child object are based on non PrimaryKey then use DataLoadOption.AssociateWith (). It will optimize SQL query and give better Performance.

3.Set ObjectTrackingEnabled = False

LINQ objects are use only for data retrieval (No insert, update, delete operations are required) then make this flag off.It will avoid Identity Field(Auto-Increment Field) management process.

4. Set Delay Loaded = True

Field which is not in use or huge in size of contain for particular object then set this property as True.So we access the field then only data get  load.

5. Use Compiled Query

Compiled Query skip the steps of  generating Expression Tree and generating SQL Query  of Process Life Cycle.When First time query get executed, the expression tree and SQL query is generated and next time system will use same for execution.

e.g

Public static Func<DataContext, int, IQueryable<User>>
    UserByID=
	    CompiledQuery.Compile ((DataContext db, int UserID) =>
            From U in db.User where U.ID == UserID select U);

When we call Function first time the expression tree and SQL query will be generate and next time system will use the same to execute query on database.

After applying one or more of the above tips, a LINQ application will definitely perform better and thus will ensure faster response times to business needs.

Smart and Effective Use of SQL Server’s Pool of Shared Memory

Wednesday, June 30th, 2010

Microsoft SQL Server has an effective pool of shared memory which ideally is used for two purposes .

  1. Data buffer;
  2. Execution plan cache.

It is commonly known that a query executed second time always gives response faster than the first time. In this post I’m going to discuss what works behind this and how can we make sure that the shared memory is effectively used.

What works behind this?
The shared memory part which is commonly known as execution plan cache (also known as procedure cache) save the execution plans of all the queries executed in SQL Server. When a query is executed this area of memory is looked into to determine if any of the existing execution plan can be re-used for this query. If the plan is not found then the query is parsed, compiled, optimized for execution and then the execution plan of the same is stored in the execution plan cache. That is the reason of queries working faster in subsequent time than the first time.

How to make effective use of execution plans
For each query executed in SQL Server an execution plan is saved but its not necessary that the execution plan stored is useful enough that its used for other queries also. To better understand this lets have an look how SQL Server saves the execution plan.

SELECT UseCounts, CacheObjType, ObjType, ST.Text FROM sys.dm_exec_cached_plans
cross apply sys.dm_exec_sql_text(plan_handle) as ST
ORDER BY ST.text



In the above screenshot let’s concentrate on the area which is in red. These three execution plans are the result of non-parameterized ad-hoc queries executed from the code of application inside an iteration. In this case none of the execution plans are effective, for each query a new execution plan is required. The query in the code was written something like below.

string sql = “SELECT * FROM URls where urlid=” + a;
where “a” is the varying value of the iteration

Had this query been a parametrized query only one execution plan would have been created and used for all the iteration. Let’s take an example to understand this better. Now my query would be something like.

string sql = “SELECT * FROM URls where urlid=@a”;
SqlCommand cmd = new SqlCommand(sql, myConn);
cmd.Parameters.Add(new SqlParameter(“@a”, a));

Now let’s have a look at the execution plans

In the above screenshot you would realize that only one execution plan has been created and its “usecounts” is 3, that means this plan has been used three time. Reusable execution plans means lesser CPU utilization , lesser utilization of memory for execution plan cache there by leaving more memory available for data buffer.

Conclusion
Though SQL Server maintains the execution plan of each query but making effective use of it is some what in our hands. There should be no piece of code in the application which is making use of ad-hoc queries it should always be parameterized queries. Though I personally prefer to use stored procedure since allowing direct access to the tables can be a security risk.

Few Smart Database Indexing Tips To Boost Application Performance

Wednesday, June 2nd, 2010

Indexes play key role in performance and scalability of any database intensive applications. Here are few tips to smartly use indexes to get the best out of it.

  1. Decide Best Candidates for Index. The column(s) which are most frequently used to filter records (WHERE clause and JOIN clause), group records (GROUP BY clause) and/or sort records (ORDER BY) are best candidates for index creating. Analyze your queries very attentively and avoid creating indexes which are seldom used.
    You should also minimize the number of indexes for the tables which has data being frequently updated/deleted/inserted to reduce index maintenance cost during each such operation.
  2. If you create a composite (multi-column) index, try to order the columns in the key as to enhance selectivity
    1. With the most selective columns to the left most of the key.
    2. The WHERE clauses of the frequently used queries match the column(s) that are leftmost in the index.

    The order of the columns in a multi column index plays a key role during query execution. The composite index used only when the left most column is included in WHERE clause of the query otherwise the index will not be hit and instead an expensive full table scan will be performed.
    For example, if you create composite index such as “Branch, Department”, then the query with the WHERE clause such as “WHERE Branch = ‘New York’” will use the index, but the query with the WHERE clause such as “WHERE Department = ‘Finance’” will not use the index.

  3. Every table should have a clustered index. Cluster index ensures that data rows stored in particular order only. If you do not create a cluster index for a table, the data rows will not be stored in any particular order, and ‘Heap’ will be used. Every time data is inserted into this table, the row will be added to the last block of the table. When many rows will be added concurrently, a “hot spot” can occur in absence of proper locking. To avoid “hot spot” and improve concurrency, you should create a clustered index for each table.
  4. Carefully create clustered index instead of non clustered to increase performance of the queries that return a range of values and for the queries that contain the GROUP BY or ORDER BY clauses and return the sort results. However, make sure to NOT create clustered index on column(s) that is updated very frequently. Because clustered index ensures that data is stored in order of key to the index, updating of key data will cause lot of index maintenance which is not desirable for performance centric application.You should choose column(s) for clustered index very very carefully because every table can have only one clustered index. First you analyze all your queries and select optimum candidate which is best suitable for the above mentioned criteria to create the clustered index.

By smartly placing indexes you can achieve great response times to your database applications and offer better user experience.

Happy Indexing!