SQL Query Optimization

Query Optimization is the process of changing a query so that SQL Server is able to return the same results more efficiently. This process also reduces the load on the database.

Below are 2 query examples that return the exact same data.
First a sub-optimal query, followed by a query with optimized SQL.

Example

#

ORDER
Id
OrderDate
OrderNumber
CustomerId
TotalAmount

Problem: List orders with customer name and number of items in the order.

SELECT FirstName = (SELECT FirstName FROM Customer C WHERE C.Id = O.CustomerId),
       LastName = (SELECT LastName FROM Customer C WHERE C.Id = O.CustomerId),
       OrderNumber,
       OrderDate,
       Items = (SELECT COUNT(Id) FROM OrderItem I WHERE I.OrderId = O.Id)
  FROM [Order] O
 ORDER BY OrderNumber

A query with 3 correlated subqueries getting data from Customer and OrderItem tables.

Result:  830 records
FirstName LastName OrderNumber OrderDate Items
Paul Henriot 542378 2012-07-04 00:00:00.000 3
Karin Josephs 542379 2012-07-05 00:00:00.000 2
Mario Pontes 542380 2012-07-08 00:00:00.000 3
Mary Saveley 542381 2012-07-08 00:00:00.000 3
Pascale Cartrain 542382 2012-07-09 00:00:00.000 3
Mario Pontes 542383 2012-07-10 00:00:00.000 3

Next, an optimized query returning the exact same results.

SELECT FirstName, LastName, OrderNumber, OrderDate,
       COUNT(I.Id) AS Items
  FROM [Order] O
  JOIN Customer C ON C.Id = O.CustomerId
  JOIN OrderItem I ON I.OrderId = O.Id
 GROUP BY FirstName, LastName, OrderNumber, OrderDate
 ORDER BY OrderNumber

A query with 3 JOIN operations and an Aggregate function. With proper indexing on primary keys, foreign keys, and aggregate columns, this query will be extremely fast, even with very large tables.

Result:  830 records
FirstName LastName OrderNumber OrderDate Items
Paul Henriot 542378 2012-07-04 00:00:00.000 3
Karin Josephs 542379 2012-07-05 00:00:00.000 2
Mario Pontes 542380 2012-07-08 00:00:00.000 3
Mary Saveley 542381 2012-07-08 00:00:00.000 3
Pascale Cartrain 542382 2012-07-09 00:00:00.000 3
Mario Pontes 542383 2012-07-10 00:00:00.000 3

What is SQL Query Optimization?

#

Optimization is the process of writing effective SQL to get fast and accurate results.

Minor changes can often have a dramatic impact on query performance.

Optimization involves indexing, selection limits, matching records, and more.

Fundamental factors for slow queries are 1) poorly designed data model, and 2) missing indexes.

How to: Query Optimization

Below are some tips and techniques on improving query and datbase performance.


Implement Proper Indexing

#

An index is a data structure that speeds up data retrieval from a database tables.

Indexes are also used to define primary keys with unique indexes to avoid duplicate key values.

Using proper indexing allows for very fast SELECT and ORDER BY operations.

Indexing may slow down INSERT, UPDATE, and DELETE operations.

When inserting a batch of a million records, it may be best to drop the index and then re-create it afterwards.

This diagram outlines the logic flow of adding indexes to a table.

Indexing throught process

Avoid using SELECT *

#

Instead of SELECT * use SELECT column_1, column_2, ....

SELECT * may return unused columns which negatively affects bandwidth and database resources.

When table changes are made, SELECT * may automatically return columns that are undesirable.


Avoid using SELECT DISTINCT

#

The DISTINCT keyword removes duplicates from a query.

To ensure there are no duplicates requires extensive processing power.

You can select more fields from the database table to create more unique results instead of using DISTINCT.

Example
SELECT DISTINCT FirstName, LastName, Country 
  FROM Customer

This query will group duplicate first name, last name, and country customers causing inaccurate results.
Imagine having John Smith or Jane Smith multiple times with exactly the same country origin but different customers.
For larger sets of data, grouping these duplicates will cause the query to be executed slowly.

SELECT FirstName, LastName, City, Country 
  FROM Customer

By omitting the DISTINCT and adding more fields to compare, the records will be easily identified as duplicated or not. The database will not group the records as well so same names and address won't be an issue.


Avoid Running Queries in Loop

#

Coding SQL queries in loop slows down the entire sequence.

Instead of writing a query that loops an INSERT statement, you can use bulk insert or update.

This way as well the application will not handle another function to loop and call the database to perform action everytime it is looped.


Use JOIN Instead of WHERE Clause

#

Consider this query below:

SELECT Product.ProductName, Supplier.CompanyName
  FROM Product, Supplier
 WHERE Product.SupplierId = Supplier.Id

This type of query is a cartesian join. In this type of join, all possible combinations are created which means it will retrieve all records first then filter it from there. This will cause the database to slow down especially for large-scale databases. To prevent this INNER JOIN can be used.

SELECT Product.ProductName, Supplier.CompanyName
   FROM Product
  INNER JOIN Supplier ON Product.SupplierId = Supplier.Id

In here, the database will only generate the desired number of records that has the same Supplier Id. Some database management system automatically recognizes the WHERE joins and run them like INNER JOIN but some does not so it is recommended to use JOIN instead.


Use EXISTS To Check If Record Is Found

#

Commonly, the COUNT() function is used to check if a record exists from the database.

This function scans the entire table even if the record is found on the first row which makes it efficient.

Instead of using COUNT(), the EXISTS() function can be used so when a record is found, the execution stops as well and return the result right away.


Use WHERE Instead of HAVING

#

HAVING statements are all calculated after the WHERE statments.

If the goal of the HAVING statement is just to filter result, it is recommended to use the WHERE statement instead for higher efficiency.

For instance, if you wish to see a date range, the HAVING statement will retrieve all records then filter them while WHERE statement instantly limits the number of records pull based on the condition.


Avoid Using Correlated Subqueries

#

Correlated subqueries is dependent on the parent or outer query.

Since this type of query executes row by row, it decreases overall database performance.

Example
SELECT ProductName,
       SupplierName = (SELECT CompanyName 
                         FROM Supplier 
                        WHERE Product.SupplierId = Supplier.Id) 
  FROM Product

The query above may have performance issues as it will run the inner query for each product row. This can easily be solved by using a JOIN instead.


Use Wildcards At The End of A Phrase Only

#

Wildcards create a widest search possible which is also considered as the most inefficient type of search.

When using a lot of wildcards to search for a string, the database is tasked to search records for a match anywhere.

As much as possible, try to limit using wildcard searching.


Retrieve Limited and Accurate Results

#

The lesser data is retrieved, the faster the query will execute.

Rather than applying many filters or conditions to the actual query, you can do this on the server or application side first which will limit the number or results the database has to retrieve.

You can also use the LIMIT statement to limit the number of results to return.

Having pagination as well on the application can help increase the database performance.


Query Optimization Reminders

  • Always look for resources allocation such as disks, CPU, and memory
  • Review queries, indexes, and perform regular database maintenance checks
  • Analyze workload using extended events or any third-party database monitoring tools
  • Always implement solutions on test environment or sandbox first before implementing them on production

You may also like



Guides


vsn 3.1