Chapter 15. Indexing Data

Table of Contents

15.1. Introduction to Indexing
15.2. Types of Indexes
15.3. Choosing Columns to Index
15.3.1. Reviewing Queries for Index Usage
15.3.2. Identifying Slow Queries with the Slow Query Log
15.4. Displaying Table Indexes
15.4.1. Displaying Table Indexes Using the SHOW Command
15.4.2. Displaying Table Indexes Using the INFORMATION_SCHEMA
15.4.3. Displaying Index Information Using MySQL Administrator
15.5. Creating Indexes
15.5.1. Creating Indexes with the CREATE TABLE Statement
15.5.2. Using the CREATE INDEX Syntax
15.5.3. Using ALTER TABLE to Create Indexes
15.5.4. Indexing the Prefix of a Column
15.6. Creating and Using Composite Indexes
15.7. Dropping Indexes
15.8. Using FULLTEXT Indexes
15.9. Using EXPLAIN to Optimize Indexing

15.1. Introduction to Indexing

Note: Experienced users may wish to skip this section.

An index in a relational database serves much the same purpose as an index in a book: books are not (typically) organized alphabetically by subject, but are instead organized into logical chapters and parts. This works well until you need to find all references to a specific subject.

When you find yourself in such a situation you look at the index. In the index, the contents of the book are listed in alphabetical order, with a page number displayed for each subject.

Rather than go through the book page by page, you can look a subject up in the index, then move to the page number in question and scan the page for the subject you are interested in.

In a relational database, rows are not stored in a table in any particular order. An index creates a sorted version of a column (or set of columns) from a table that can be searched more efficiently than the table itself.

Each entry in the index stores either the row itself (in the case of InnoDB), or a pointer to the row's location in the data file (similar to a page number in a book index).

Effective indexing can greatly improve the performance of SELECT queries. When a specific column value is searched for, and an appropriate index is not available, the MySQL server must read every row of the table looking for matching rows. When an appropriate index is available, the MySQL server can look in the index for the appropriate values. If you are querying only columns that appear in the index, the MySQL server skips reading the table itself and return data directly from the index.

This chapter covers the following topics:

  • Types of Indexes

    • INDEX

    • PRIMARY KEY

    • UNIQUE INDEX

    • SPATIAL INDEX

    • FULLTEXT INDEX

  • Choosing Columns to Index

  • Displaying Index Information

    • Using SHOW

    • Using the INFORMATION_SCHEMA

    • Using MySQL Administrator

  • Creating Indexes

    • Using CREATE TABLE

    • Using CREATE INDEX

    • Using ALTER TABLE

  • Composite Indexes

  • Dropping Indexes

  • Using FULLTEXT

  • Optimizing Indexes

15.2. Types of Indexes

There are five different index types available for use in MySQL:

  • INDEX: The standard table index. This type of index speeds up queries, but does not enforce uniqueness or uniquely identify the row.

  • PRIMARY: This index is placed on the PRIMARY KEY of the table. Each entry in the index must be unique and the column(s) of this index are considered to be the unique identifier of the row within the table.

  • UNIQUE: This index enforces that the column(s) being indexed are unique within the table, but is not considered to be the unique row identifier for the table.

  • FULLTEXT: This index is used to increase the efficiency of querying natural language information in CHAR, VARCHAR, and TEXT columns. The FULLTEXT index is available for the MyISAM storage engine only.

  • SPATIAL: This index is used with GIS applications to improve the performance of functions such as MBRContains() or MBRWithin(). The SPATIAL index is available for the MyISAM storage engine only.

Your choice of index usually depends on whether a given column should contain unique values: if a column contains unique values and those values are considered to uniquely identify a row (such as ISBN, UPC, or SSN values), choose a PRIMARY index. If your values are unique, but are not considered to uniquely identify the row, choose a UNIQUE index. If you wish to improve the performance of queries that match values in a column, use a regular INDEX. Only one PRIMARY index is allowed per table, all other index types can occur multiple times in a single table.

In special cases you may need to use FULLTEXT and SPATIAL indexes. If you are searching columns that contain natural language, you should use a FULLTEXT index. An example of natural language would be the titles and bodies of a collection of articles, as opposed to simple textual data such as person or place names, which should be indexed using a regular INDEX. The SPATIAL index is for use with GIS data only.

15.3. Choosing Columns to Index

To use indexes effectively, it is necessary to identify those queries that are executing slowly and are not using indexes, or columns which contain unique values. Slow queries can be identified by directly reviewing the queries you use or by using the MySQL Slow Query Log.

15.3.1. Reviewing Queries for Index Usage

To identify queries that make good candidates for indexing, look at the WHERE, GROUP BY, and ORDER BY clauses of your queries.

For example, look at the following query performed against the sakila sample database:

SELECT last_name, first_name FROM actor WHERE last_name = 'Walken'

In this case the WHERE clause contains a reference to the last_name column of the actor table.

Here is another example:


SELECT film.title, COUNT(inventory.inventory_id) AS Stock
FROM film, inventory
WHERE film.film_id = inventory.film_id
  AND inventory.store_id = 1
GROUP BY film.film_id

In this case the film_id and store_id columns are candidates for indexing.

Once you have identified candidate columns, you need to evaluate how often the column is involved in a query: the most often a column is referenced in your various queries, the stronger a candidate it becomes for indexing.

Each index you add to a table will have a negative effect on the performance of INSERT, UPDATE and DELETE queries because not only does the row data need to be changed, the index information must also be updated for each affected index. Over-indexing can lead to performance loss.

Once you have identified your candidate columns you can check whether they are already indexed by displaying the existing indexes on a table.

15.3.2. Identifying Slow Queries with the Slow Query Log

If you need to identify slow queries on a production MySQL server you may benefit from using the MySQL Slow Query Log. When the MySQL server is started with the Slow Query Log enabled, it writes all queries that take longer than a configurable number of seconds to a log file. The queries in the Slow Query Log can be further examined and optimized. For more information see Section 25.3, “The Slow Query Log”

15.4. Displaying Table Indexes

Before creating a new index, it is important to know what indexes currently exist for a given table. Index information can be retrieved using either the SHOW syntax or through use of the INFORMATION_SCHEMA. INFORMATION_SCHEMA is a standard method for accessing information, while SHOW is an extension of the SQL standard. Index information can also be browsed using the MySQL GUI tools.

15.4.1. Displaying Table Indexes Using the SHOW Command

The SHOW command can be used within the active schema to display the index information for a table:

mysql> USE sakila;
Database changed

mysql> SHOW INDEX FROM film\G
*************************** 1. row ***************************
       Table: film
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: film_id
   Collation: A
 Cardinality: 2
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 2. row ***************************
       Table: film
  Non_unique: 1
    Key_name: Title_Description_Fulltext
Seq_in_index: 1
 Column_name: title
   Collation: NULL
 Cardinality: NULL
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: FULLTEXT
     Comment:
*************************** 3. row ***************************
       Table: film
  Non_unique: 1
    Key_name: Title_Description_Fulltext
Seq_in_index: 2
 Column_name: description
   Collation: NULL
 Cardinality: NULL
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: FULLTEXT
     Comment:
3 rows in set (0.00 sec)

The output of the SHOW command shows that there are two indexes on the film table: PRIMARY and Title_Description_Fulltext (as seen from the Key_name value).

The columns being indexed are listed in the Column_name field. In this case there are indexes on the film_id, title, and description columns.

The title and description columns form two parts of the Title_Description_Fulltext index, with the title column appearing before the description column in the index, according to the Seq_in_index field.

You can determine whether or not an index enforces uniqueness by the Non_unique field: 0 indicates that the index enforces uniqueness, 1 indicates that the index does not enforce uniqueness.

15.4.2. Displaying Table Indexes Using the INFORMATION_SCHEMA

As an alternative to the SHOW command, users can query the STATISTICS table of the INFORMATION_SCHEMA.

mysql> SELECT * FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'sakila'
    ->   AND table_name = 'film'\G
*************************** 1. row ***************************
TABLE_CATALOG: NULL
 TABLE_SCHEMA: sakila
   TABLE_NAME: film
   NON_UNIQUE: 0
 INDEX_SCHEMA: sakila
   INDEX_NAME: PRIMARY
 SEQ_IN_INDEX: 1
  COLUMN_NAME: film_id
    COLLATION: A
  CARDINALITY: 2
     SUB_PART: NULL
       PACKED: NULL
     NULLABLE:
   INDEX_TYPE: BTREE
      COMMENT:
*************************** 2. row ***************************
TABLE_CATALOG: NULL
 TABLE_SCHEMA: sakila
   TABLE_NAME: film
   NON_UNIQUE: 1
 INDEX_SCHEMA: sakila
   INDEX_NAME: Title_Description_Fulltext
 SEQ_IN_INDEX: 1
  COLUMN_NAME: title
    COLLATION: NULL
  CARDINALITY: NULL
     SUB_PART: NULL
       PACKED: NULL
     NULLABLE:
   INDEX_TYPE: FULLTEXT
      COMMENT:
*************************** 3. row ***************************
TABLE_CATALOG: NULL
 TABLE_SCHEMA: sakila
   TABLE_NAME: film
   NON_UNIQUE: 1
 INDEX_SCHEMA: sakila
   INDEX_NAME: Title_Description_Fulltext
 SEQ_IN_INDEX: 2
  COLUMN_NAME: description
    COLLATION: NULL
  CARDINALITY: NULL
     SUB_PART: NULL
       PACKED: NULL
     NULLABLE: YES
   INDEX_TYPE: FULLTEXT
      COMMENT:
3 rows in set (0.00 sec)
  

The output of a query to the INFORMATION_SCHEMA closely matches that of the SHOW statement. For information on interpreting the output of a query on the INFORMATION_SCHEMA, please see Section 15.4.1, “Displaying Table Indexes Using the SHOW Command”.

One advantage of the INFORMATION_SCHEMA is that you can view the index information of more than one table at a time by modifying the TABLE_NAME portion of the WHERE clause of your query.

15.4.3. Displaying Index Information Using MySQL Administrator

Users can also view index information using MySQL Administrator. To view index information, select the desired schema in the Catalogs screen, then select the Schema Indices tab:

Figure 15.1. Displaying index information with MySQL Administrator

Displaying index information with MySQL
            Administrator

The Catalog screen displays the index information for all tables in the selected schema. To view the columns that make up a given index, click the arrow icon to the left of the index name.

15.5. Creating Indexes

Indexes can be created using either a CREATE TABLE, CREATE INDEX or ALTER TABLE syntax. Before creating an index, you must know which table the index will be added to, which column(s) the index will apply to, the type of index you will create, and whether the index will enforce uniqueness.

15.5.1. Creating Indexes with the CREATE TABLE Statement

Indexes can be created during table creation by specifying the index information as part of the CREATE TABLE statement, specifying the index information either as part of a column information or at the end of the column definitions.

At minimum, the PRIMARY KEY and UNIQUE indexes should be specified at table creation to prevent duplicate key issues from occurring when the indexes added later (if you try to add such an index after your table is populated, you may have to remove duplicate rows manually before the indexes can be created).

For example, this is a simplified version of the CREATE TABLE statement for the inventory table:


CREATE TABLE inventory (
  inventory_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  film_id INT UNSIGNED NOT NULL,
  store_id INT UNSIGNED NOT NULL,
  INDEX store_id_index (store_id)
)

The primary key was created as part of the column creation line, while the index on the store_id column was created at the end of the CREATE TABLE statement.

It is necessary to specify index information at the end of the CREATE TABLE statement if the index contains more than one column. For example, in the film_actor table a single actor cannot appear more than once in the same film, so the primary key is a combination of the film_id and actor_id columns:


CREATE TABLE film_actor (
  actor_id INT UNSIGNED NOT NULL,
  film_id INT UNSIGNED NOT NULL,
  PRIMARY KEY  (film_id, actor_id)
)

15.5.2. Using the CREATE INDEX Syntax

To create a table with the CREATE INDEX syntax, you specify the type of index, a name for the index, the table to create the index on, and a list of the columns that form the index. CREATE INDEX is a non-standard alternative syntax to ALTER TABLE, described in the section that follows.

For example, to create an index on the last_name and first_name columns of the actor table, you would execute the following statement:

 CREATE INDEX last_first_name ON actor (last_name, first_name)

This next example creates an index that enforces uniqueness on the manager column of the store, ensuring no employee is so overworked that they have to manage two locations:

CREATE UNIQUE INDEX unique_manager ON store (manager)
 

The index types can be INDEX, UNIQUE INDEX, SPATIAL INDEX, and FULLTEXT INDEX.

15.5.3. Using ALTER TABLE to Create Indexes

You can use the ALTER TABLE statement to create indexes on existing tables. The benefit of ALTER TABLE is that it can be used to create multiple indexes in a single statement, which can speed index creation when multiple indexes are needed.

For example, to create an index on last_name and first_name columns of the actor table, you would execute the following statement:

ALTER TABLE actor ADD INDEX last_first_name (last_name, first_name)

This example creates an index that enforces uniqueness on the manager column of the store:

ALTER TABLE actor ADD UNIQUE unique_manager (manager)

Here is an example of creating a primary key on the staff table:

ALTER TABLE staff ADD PRIMARY KEY (staff_id)

Primary keys can only be created with CREATE TABLE and ALTER TABLE statements, there is no CREATE INDEX equivalent for primary keys.

Multiple indexes can be created with a single ALTER TABLE statement by separating the ADD statements with commas:

 
ALTER TABLE actor ADD UNIQUE unique_manager (manager), ADD PRIMARY KEY (staff_id)
 

Other index types you can add include ADD SPATIAL and ADD FULLTEXT.

15.5.4. Indexing the Prefix of a Column

It is possible to index only a prefix of a VARCHAR, CHAR or TEXT column by placing the size of the prefix (in characters) within brackets after the column name:

 CREATE INDEX lname ON actor (last_name(5))

Indexing the prefix of a column decreases the size of the index on disk compared to indexing the entire column, which in turn increases the performance of the index. An index can prefix up to 1000 bytes of a column in MyISAM and InnoDB, and 255 bytes for all other storage engines.

One way to find the proper prefix size for a column is to perform the following SELECT query:


 SELECT COUNT(DISTINCT column_name) AS distinct_rows,
 COUNT(DISTINCT(LEFT(column_name, N))) AS prefix_distinct 
 FROM table_name

Start with a N value of 3 and increase the size of N until the value of prefix_distinct nears that of distinct_rows.

15.6. Creating and Using Composite Indexes

When executing a SELECT query, the MySQL server typically uses only one index per table involved in the query. If the WHERE clause of the query references more than one column, a single-column index may be less than optimal. For example, say you were executing the following query:


SELECT actor_id 
FROM actor 
WHERE last_name = 'Johnson'
  AND first_name = 'Robert'

If the table had an index on the last_name column, the index could be used to narrow the table down to all actors with the last name Johnson, but MySQL would still have to scan all the matched rows to find actors with the first name Robert.

By using a composite index, or an index on multiple columns, the preceding query could be fully optimized. Here is an example of a composite index on the actor table:

CREATE INDEX last_first_name ON actor (last_name, first_name)

With such a composite index, MySQL can first find the last name Johnson in the table, then search for the first name Robert in the matching index entries.

Composite indexes can also be partially used when the columns in the WHERE clause of a query appear in the left-most part of the composite index. For instance, the following query would make use of the composite index we have created:


SELECT first_name 
FROM actor 
WHERE last_name = 'Johnson'

However, the following query would not make use of our composite index:


SELECT last_name 
FROM actor 
WHERE first_name = 'Robert'

The second example does not make use of the composite index because the first_name column is not the left-most part of the index columns.

This rule applies no matter how many parts a composite index has; if you have an index on (columnA, columnB, columnC, columnD), the index will be used on queries that contain the following columns in the WHERE clause: (columnA), (columnA, columnB),(columnA, columnB, columnC), and (columnA, columnB, columnC, columnD). You would not be able to create any queries without columnA and expect the composite index to be used.

For a more detailed description on when an index will be used, see the section titled How MySQL Uses Indexes in the MySQL Reference Manual.

15.7. Dropping Indexes

Existing indexes can be dropped using either a DROP INDEX or ALTER TABLE syntax:

DROP INDEX index_name ON table_name
ALTER TABLE table_name DROP PRIMARY KEY
ALTER TABLE table_name DROP INDEX index_name

You can drop multiple indexes in a single ALTER TABLE statement by separating them with commas:

 ALTER TABLE actor DROP PRIMARY KEY, DROP INDEX last_first_name

15.8. Using FULLTEXT Indexes

While regular indexes are effective for many purposes, they are not effective for columns that contain natural language. An index can be used for single word CHAR and VARCHAR columns, but FULLTEXT indexes are designed for finding strings within larger natural language fields.

A FULLTEXT search takes a string and column list and searches the specified columns for the string, returning results ranked by relevancy.

The FULLTEXT index is available for the MyISAM storage engine only.

The syntax for creating a FULLTEXT index is listed in Section 15.5, “Creating Indexes”. Once the index is created, the MATCH ... AGAINST syntax can be used to perform FULLTEXT queries.

The MATCH clause indicates which columns are to be searched. The list of columns in the MATCH clause must be identical to the list of columns in the FULLTEXT index.

The AGAINST clause contains the string being searched for. The string in the AGAINST clause must be a constant string: you cannot use a user variable or search result in the AGAINST clause.

Here is an example of a basic FULLTEXT query that searches for movies in the film table that contain the word army in the title or description columns:

 SELECT title, description FROM film WHERE MATCH (title, description) AGAINST ('army')

*************************** 1. row ***************************
           title: ARMY FLINTSTONES
     description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
1 row in set (0.00 sec)

Results from a query with MATCH ... AGAINST in the WHERE clause will always return in descending order based on relevancy.

Here is the same query performed with a LIKE clause instead:

 SELECT title, description FROM film WHERE title LIKE '%army%' OR description LIKE '%army%'

*************************** 1. row ***************************
           title: ARMY FLINTSTONES
     description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
1 row in set (0.20 sec)

Note the performance improvement provided by the FULLTEXT index.

The MATCH ... AGAINST syntax can also provide relevancy ranking information:

 
 SELECT title, description, MATCH (title, description) AGAINST ('army') AS rank
 FROM film
 WHERE MATCH (title, description) AGAINST ('army')

*************************** 1. row ***************************
      title: ARMY FLINTSTONES
description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
       rank: 6.1943987015493
1 row in set (0.00 sec)

The relevancy scores are based on the weighting of words within the individual rows. Words that occur rarely in the table are ranked higher than words that appear in a large percentage of the rows.

For more information on the FULLTEXT search engine, see the Fulltext Search section of the MySQL Reference Manual.

15.9. Using EXPLAIN to Optimize Indexing

Sometimes it is not easy to identify which columns of a table to index, even when you have identified the slow queries in your application. The EXPLAIN statement is designed to assist in the query optimization process by providing insight into how the MySQL optimizer handles a specific query.

To analyze a query, precede the query with the EXPLAIN keyword:

 
 EXPLAIN SELECT film.title FROM actor, film, film_actor
 WHERE actor.actor_id = film_actor.actor_id
   AND film.film_id = film_actor.film_id
   AND actor.last_name = 'Walken'\G
 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 200
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: film_actor
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: sakila.actor.actor_id
         rows: 26
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: sakila.film_actor.film_id
         rows: 1
        Extra:
3 rows in set (0.00 sec)

EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the output in the order that MySQL would read them while processing the query.

The main things to look out for are rows where the key column is NULL, where the type column is range, index, or ALL, or where the Extra column contains Using filesort or Using temporary. Such queries should be closely examined for proper index usage as they generally indicate that no index is being used.

For additional information on using the EXPLAIN statement, see the EXPLAIN section of the MySQL Reference Manual.