Using an Index to Speed Up Your Database Queries

I almost didn’t write on this topic. There are a lot of people out there who know more about indexing than I do. The problem is all of the knowledge on the subject is being presented by and for database specialists. While their need is probably greater, I contend that web programmers who create dynamic sites need this information as well. So I’m going to share what I’ve learned to make my queries faster over the years. If there are any gaping holes of information, than hopefully those will get addressed in the comments.

An index is created on a particular table in your database. You specify one or more columns that you’d like included in the index, and the database then has a snapshot of what the table looks like when all the records are sorted on those columns. The idea is that the table will be able to return query results involving those columns more quickly. Here’s how it works.

Let’s say you have a simple customer table. You have the following columns: id, last_name, and first_name.

Customer Table

Now suppose you write a query where you were returning all of the customers whose last name starts with the letter A.

select * from customer where last_name like ‘A%’

In order to process this query, the processor goes through each record of the table checking to see if the last_name field begins with the letter A. If you have a few hundred thousand customer records, that can take a while. Now imagine the same scenario only add an index on the last_name column. There is now a snapshot of the table sorted by the value for last_name.

Customer Table

The processor goes through the records. As soon as it hits the first record where the last name begins with B, the query can stop. There aren’t going to be anymore “A” last names farther down the list. Instead of checking every record in the table, the processor only had to check a small fraction, resulting in a faster query.

I should point out a caveat about the data types you choose to use in an index. Numbers and dates are much easier to maintain an index on then strings. (Here’s a gross oversimplification: sorting an integer field is sorting one number while sorting a varchar field with a length of 20 is like sorting 20 separate integer fields.) Create an index on a long string field, and you may find that your web app actually run slower than it did without the index.

So the safe bet is to just create an index on every date and integer field, right? Not exactly. Every time you insert, update, or delete a record from your table, all of the indexes on that table need to be rebuilt. That rebuilding can take time.

Even with select queries, an index is not a panacea. What if we were actually looking for customers whose last names started with Z?

select * from customer where last_name like ‘Z%’

Even with the index on last_name, the processor will still have to go through every record in the table. The index doesn’t shave any time off this query.

The best way to learn is to experiment with creating and modifying an index. Unfortunately, freelance web programmers are usually involved with the initial creation of a site, when there are only a handful of test records in the database. The speed gains of an index are unnoticeable in those cases, so your experimentation won’t reveal much. When you really want to learn, you’ll have to find or create a few tables that have at least a few tens of thousands of records for you to play with. The more, the better.

Leave a Reply

Best Practices

presented by Site Potion