More efficient SQL with query planning and optimization

Anonymous Engineer
4 min readJun 23, 2021

Now that you’ve learned many ways of selecting data and are starting to do SELECTs across multiple tables, it's a good time to talk about the efficiency of your SQL queries - how quickly do they execute, and could they execute faster?

SQL is a declarative language — each query declares what we want the SQL engine to do, but it doesn’t say how. As it turns out, the how — the “plan” — is what affects the efficiency of the queries, however, so it’s pretty important.

Why do SQL queries need a plan?

For example, let’s say we had this simple query:

SELECT * FROM books WHERE author = "J K Rowling";

For this query, these are 2 different ways that SQL could find the results:

  • Do a “full table scan”: look at every single row in the table, return the matching rows.
  • Create an “index”: Make a copy of the table sorted by author, then do a binary search to find the row where the author is “J K Rowling”, find the matching IDs, then do a binary search on the original table that returns the rows that match the ID.

Which one is faster? It depends on the data, and on how often the query will be executed. If the table is only 10 rows long, then a full table scan only requires looking at 10 rows, and the first plan would work out well.

If the table was 10 million rows long, then that full table scan would require looking at 10 million rows. It would be faster to do a binary search on a sorted table — we only need 23 lookups to find a value in 10 million rows. However, creating the sorted table would take a while (~230 million operations, depending on our engine). If we were executing that query many times (more than 23 times) or if we already had that table created, then that second plan would be better.

How does a SQL engine decide which plan to pick? That’s an important step that we haven’t talked about yet, because we’ve been focused on the syntax of our queries, not their implementation. As you get into more advanced SQL usage on big databases, the planning step is increasingly important.

The lifecycle of a SQL query

We can think of the SQL engine going through these steps for each query we give it:

Parse, then Optimize, then Execute

  1. The query parser makes sure that the query is syntactically correct (e.g. commas out of place) and semantically correct (i.e. the tables exist), and returns errors if not. If it’s correct, then it turns it into an algebraic expression and passes it to the next step.
  2. The query planner and optimizer does the hard thinking work. It first performs straightforward optimizations (improvements that always result in better performance, like simplifying 5*10 into 50). It then considers different “query plans” which may have different optimizations, estimates the cost (CPU and time) of each query plan based on the number of rows in the relevant tables, then it picks the optimal plan and passes it on to the next step.
  3. The query executor takes the plan and turns it into operations for the database, returning the results back to us if there are any.

Where do humans come in?

The query planning and optimization happens for every query, and you could go your whole life issuing SQL queries and not realize it. However, once you start dealing with bigger data sets, you’ll start to care more about the speed of your queries, and you might find yourself wondering if there’s any way you can improve the performance of your queries.

Many times, especially for complex queries, there are indeed ways you can help optimize a query, and that’s known as “query tuning”.

The first step is to identify what queries you want to tune, which you can figure out by looking at which of your database calls are taking the longest or using the most resources, like with a SQL profiler. Sometimes, you might discover a badly performing query after it takes so long that it takes down your whole database. Hopefully, you find that out earlier.

The next step is to understand how a particular SQL engine is executing a query, and all SQL systems come with a way to ask the engine. In SQLite, you can stick EXPLAIN QUERY PLAN in front of any SQL to see what it's doing behind the scenes. If you use that, be prepared to dig deep into the EXPLAIN QUERY PLAN reference, because the "explanation" is pretty detailed and implementation specific. If you're using another SQL engine, you can search for "how do I get an execution plan in X".

Now comes the hard part: manual optimization to improve that execution plan. This is also the part that is often dependent on the particularities of the SQL engine you’re using, and the particularities of your own data.

For example, remember that query we discussed at the top? If we knew ahead of time that we would want to do hundreds of queries that restricted WHERE on the author column, then we could explicitly create the index, using CREATE INDEX. Then the SQL engine would be able to use that index to efficiently find the matching rows. You can read this guide about SQLite query planning to help you understand when indexes would help.

Creating indexes can often make repeated queries more efficient. But there are many other approaches as well. For SQLite, you can get more insight in their query planner overview and take careful note of the “manual” sections.

We can’t cover all the complexities of query optimization and query tuning here, so I recommend that you dive deeper into it when you need it.

(Here are some deep dives into different SQL query planners that I found interesting: SQL Server Query Optimizer, Oracle SQL Tuning, MSSQL Execution Plan Basics)

thanks for reading

Cheers :-)

--

--