Table of Contents
Note: Experienced users may wish to skip this section.
An index in a relational database serves much the same purpose as an index in a book: books are not (typically) organized alphabetically by subject, but are instead organized into logical chapters and parts. This works well until you need to find all references to a specific subject.
When you find yourself in such a situation you look at the index. In the index, the contents of the book are listed in alphabetical order, with a page number displayed for each subject.
Rather than go through the book page by page, you can look a subject up in the index, then move to the page number in question and scan the page for the subject you are interested in.
In a relational database, rows are not stored in a table in any particular order. An index creates a sorted version of a column (or set of columns) from a table that can be searched more efficiently than the table itself.
Each entry in the index stores either the row itself (in the case of InnoDB), or a pointer to the row's location in the data file (similar to a page number in a book index).
Effective indexing can greatly improve the performance of
SELECT queries. When a specific column value is
searched for, and an appropriate index is not
available, the MySQL server must read every row of the table
looking for matching rows. When an appropriate index is available,
the MySQL server can look in the index for the appropriate values.
If you are querying only columns that appear in the index, the
MySQL server skips reading the table itself and return data
directly from the index.
This chapter covers the following topics:
Types of Indexes
INDEX
PRIMARY KEY
UNIQUE INDEX
SPATIAL INDEX
FULLTEXT INDEX
Choosing Columns to Index
Displaying Index Information
Using SHOW
Using the INFORMATION_SCHEMA
Using MySQL Administrator
Creating Indexes
Using CREATE TABLE
Using CREATE INDEX
Using ALTER TABLE
Composite Indexes
Dropping Indexes
Using FULLTEXT
Optimizing Indexes
There are five different index types available for use in MySQL:
INDEX: The standard table index. This type
of index speeds up queries, but does not enforce uniqueness or
uniquely identify the row.
PRIMARY: This index is placed on the
PRIMARY KEY of the table. Each entry in the
index must be unique and the column(s) of this index are
considered to be the unique identifier of the row within the
table.
UNIQUE: This index enforces that the
column(s) being indexed are unique within the table, but is
not considered to be the unique row identifier for the table.
FULLTEXT: This index is used to increase
the efficiency of querying natural language information in
CHAR, VARCHAR, and
TEXT columns. The
FULLTEXT index is available for the
MyISAM
storage engine only.
SPATIAL: This index is used with
GIS applications to improve the performance
of functions such as MBRContains() or
MBRWithin(). The SPATIAL
index is available for the MyISAM storage
engine only.
Your choice of index usually depends on whether a given column
should contain unique values: if a column contains unique values
and those values are considered to uniquely identify a row (such
as ISBN, UPC, or
SSN values), choose a
PRIMARY index. If your values are unique, but
are not considered to uniquely identify the row, choose a
UNIQUE index. If you wish to improve the
performance of queries that match values in a column, use a
regular INDEX. Only one
PRIMARY index is allowed per table, all other
index types can occur multiple times in a single table.
In special cases you may need to use FULLTEXT
and SPATIAL indexes. If you are searching
columns that contain natural language, you should use a
FULLTEXT index. An example of natural language
would be the titles and bodies of a collection of articles, as
opposed to simple textual data such as person or place names,
which should be indexed using a regular INDEX.
The SPATIAL index is for use with
GIS data only.
To use indexes effectively, it is necessary to identify those queries that are executing slowly and are not using indexes, or columns which contain unique values. Slow queries can be identified by directly reviewing the queries you use or by using the MySQL Slow Query Log.
To identify queries that make good candidates for indexing, look
at the WHERE, GROUP BY,
and ORDER BY clauses of your queries.
For example, look at the following query performed against the
sakila sample database:
SELECT last_name, first_name FROM actor WHERE last_name = 'Walken'
In this case the WHERE clause contains a reference to the
last_name column of the
actor table.
Here is another example:
SELECT film.title, COUNT(inventory.inventory_id) AS Stock
FROM film, inventory
WHERE film.film_id = inventory.film_id
AND inventory.store_id = 1
GROUP BY film.film_id
In this case the film_id and
store_id columns are candidates for indexing.
Once you have identified candidate columns, you need to evaluate how often the column is involved in a query: the most often a column is referenced in your various queries, the stronger a candidate it becomes for indexing.
Each index you add to a table will have a negative effect on the
performance of INSERT,
UPDATE and DELETE queries
because not only does the row data need to be changed, the index
information must also be updated for each affected index.
Over-indexing can lead to performance loss.
Once you have identified your candidate columns you can check whether they are already indexed by displaying the existing indexes on a table.
If you need to identify slow queries on a production MySQL server you may benefit from using the MySQL Slow Query Log. When the MySQL server is started with the Slow Query Log enabled, it writes all queries that take longer than a configurable number of seconds to a log file. The queries in the Slow Query Log can be further examined and optimized. For more information see Section 25.3, “The Slow Query Log”
Before creating a new index, it is important to know what indexes
currently exist for a given table. Index information can be
retrieved using either the SHOW syntax or
through use of the INFORMATION_SCHEMA.
INFORMATION_SCHEMA is a standard method for
accessing information, while SHOW is an
extension of the SQL standard. Index information can also be
browsed using the MySQL GUI tools.
The SHOW command can be used within the
active schema to display the index information for a table:
mysql>USE sakila;Database changed mysql>SHOW INDEX FROM film\G*************************** 1. row *************************** Table: film Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: film_id Collation: A Cardinality: 2 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 2. row *************************** Table: film Non_unique: 1 Key_name: Title_Description_Fulltext Seq_in_index: 1 Column_name: title Collation: NULL Cardinality: NULL Sub_part: NULL Packed: NULL Null: Index_type: FULLTEXT Comment: *************************** 3. row *************************** Table: film Non_unique: 1 Key_name: Title_Description_Fulltext Seq_in_index: 2 Column_name: description Collation: NULL Cardinality: NULL Sub_part: NULL Packed: NULL Null: YES Index_type: FULLTEXT Comment: 3 rows in set (0.00 sec)
The output of the SHOW command shows that
there are two indexes on the film table:
PRIMARY and
Title_Description_Fulltext (as seen from the
Key_name value).
The columns being indexed are listed in the
Column_name field. In this case there are
indexes on the film_id,
title, and description
columns.
The title and description
columns form two parts of the
Title_Description_Fulltext index, with the
title column appearing before the
description column in the index, according to
the Seq_in_index field.
You can determine whether or not an index enforces uniqueness by
the Non_unique field: 0
indicates that the index enforces uniqueness,
1 indicates that the index does not enforce
uniqueness.
As an alternative to the SHOW command, users
can query the STATISTICS table of the
INFORMATION_SCHEMA.
mysql>SELECT * FROM INFORMATION_SCHEMA.STATISTICS->WHERE TABLE_SCHEMA = 'sakila'->AND table_name = 'film'\G*************************** 1. row *************************** TABLE_CATALOG: NULL TABLE_SCHEMA: sakila TABLE_NAME: film NON_UNIQUE: 0 INDEX_SCHEMA: sakila INDEX_NAME: PRIMARY SEQ_IN_INDEX: 1 COLUMN_NAME: film_id COLLATION: A CARDINALITY: 2 SUB_PART: NULL PACKED: NULL NULLABLE: INDEX_TYPE: BTREE COMMENT: *************************** 2. row *************************** TABLE_CATALOG: NULL TABLE_SCHEMA: sakila TABLE_NAME: film NON_UNIQUE: 1 INDEX_SCHEMA: sakila INDEX_NAME: Title_Description_Fulltext SEQ_IN_INDEX: 1 COLUMN_NAME: title COLLATION: NULL CARDINALITY: NULL SUB_PART: NULL PACKED: NULL NULLABLE: INDEX_TYPE: FULLTEXT COMMENT: *************************** 3. row *************************** TABLE_CATALOG: NULL TABLE_SCHEMA: sakila TABLE_NAME: film NON_UNIQUE: 1 INDEX_SCHEMA: sakila INDEX_NAME: Title_Description_Fulltext SEQ_IN_INDEX: 2 COLUMN_NAME: description COLLATION: NULL CARDINALITY: NULL SUB_PART: NULL PACKED: NULL NULLABLE: YES INDEX_TYPE: FULLTEXT COMMENT: 3 rows in set (0.00 sec)
The output of a query to the
INFORMATION_SCHEMA closely matches that of
the SHOW statement. For information on
interpreting the output of a query on the
INFORMATION_SCHEMA, please see
Section 15.4.1, “Displaying Table Indexes Using the SHOW Command”.
One advantage of the INFORMATION_SCHEMA is
that you can view the index information of more than one table
at a time by modifying the TABLE_NAME portion
of the WHERE clause of your query.
Users can also view index information using MySQL Administrator.
To view index information, select the desired schema in the
Catalogs screen, then select the
Schema Indices tab:
The Catalog screen displays the index
information for all tables in the selected schema. To view the
columns that make up a given index, click the arrow icon to the
left of the index name.
Indexes can be created using either a CREATE
TABLE, CREATE INDEX or ALTER
TABLE syntax. Before creating an index, you must know
which table the index will be added to, which column(s) the index
will apply to, the type of index you will create, and whether the
index will enforce uniqueness.
Indexes can be created during table creation by specifying the
index information as part of the CREATE TABLE
statement, specifying the index information either as part of a
column information or at the end of the column definitions.
At minimum, the PRIMARY KEY and
UNIQUE indexes should be specified at table
creation to prevent duplicate key issues from occurring when the
indexes added later (if you try to add such an index after your
table is populated, you may have to remove duplicate rows
manually before the indexes can be created).
For example, this is a simplified version of the CREATE TABLE
statement for the inventory table:
CREATE TABLE inventory (
inventory_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
film_id INT UNSIGNED NOT NULL,
store_id INT UNSIGNED NOT NULL,
INDEX store_id_index (store_id)
)
The primary key was created as part of the column creation line,
while the index on the store_id column was
created at the end of the CREATE TABLE
statement.
It is necessary to specify index information at the end of the
CREATE TABLE statement if the index contains
more than one column. For example, in the
film_actor table a single actor cannot appear
more than once in the same film, so the primary key is a
combination of the film_id and
actor_id columns:
CREATE TABLE film_actor (
actor_id INT UNSIGNED NOT NULL,
film_id INT UNSIGNED NOT NULL,
PRIMARY KEY (film_id, actor_id)
)
To create a table with the CREATE INDEX
syntax, you specify the type of index, a name for the index, the
table to create the index on, and a list of the columns that
form the index. CREATE INDEX is a
non-standard alternative syntax to ALTER
TABLE, described in the section that follows.
For example, to create an index on the
last_name and first_name
columns of the actor table, you would execute the following
statement:
CREATE INDEX last_first_name ON actor (last_name, first_name)
This next example creates an index that enforces uniqueness on
the manager column of the
store, ensuring no employee is so overworked
that they have to manage two locations:
CREATE UNIQUE INDEX unique_manager ON store (manager)
The index types can be INDEX, UNIQUE
INDEX, SPATIAL INDEX, and
FULLTEXT INDEX.
You can use the ALTER TABLE statement to
create indexes on existing tables. The benefit of ALTER
TABLE is that it can be used to create multiple
indexes in a single statement, which can speed index creation
when multiple indexes are needed.
For example, to create an index on last_name
and first_name columns of the actor table,
you would execute the following statement:
ALTER TABLE actor ADD INDEX last_first_name (last_name, first_name)
This example creates an index that enforces uniqueness on the
manager column of the
store:
ALTER TABLE actor ADD UNIQUE unique_manager (manager)
Here is an example of creating a primary key on the
staff table:
ALTER TABLE staff ADD PRIMARY KEY (staff_id)
Primary keys can only be created with CREATE
TABLE and ALTER TABLE statements,
there is no CREATE INDEX equivalent for
primary keys.
Multiple indexes can be created with a single ALTER
TABLE statement by separating the ADD
statements with commas:
ALTER TABLE actor ADD UNIQUE unique_manager (manager), ADD PRIMARY KEY (staff_id)
Other index types you can add include ADD
SPATIAL and ADD FULLTEXT.
It is possible to index only a prefix of a
VARCHAR, CHAR or
TEXT column by placing the size of the prefix
(in characters) within brackets after the column name:
CREATE INDEX lname ON actor (last_name(5))
Indexing the prefix of a column decreases the size of the index
on disk compared to indexing the entire column, which in turn
increases the performance of the index. An index can prefix up
to 1000 bytes of a column in MyISAM and
InnoDB, and 255 bytes for all other storage
engines.
One way to find the proper prefix size for a column is to
perform the following SELECT query:
SELECT COUNT(DISTINCT column_name) AS distinct_rows,
COUNT(DISTINCT(LEFT(column_name, N))) AS prefix_distinct
FROM table_name
Start with a N value of
3 and increase the size of
N until the value of
prefix_distinct nears that of
distinct_rows.
When executing a SELECT query, the MySQL server
typically uses only one index per table involved in the query. If
the WHERE clause of the query references more
than one column, a single-column index may be less than optimal.
For example, say you were executing the following query:
SELECT actor_id
FROM actor
WHERE last_name = 'Johnson'
AND first_name = 'Robert'
If the table had an index on the last_name
column, the index could be used to narrow the table down to all
actors with the last name Johnson, but MySQL
would still have to scan all the matched rows to find actors with
the first name Robert.
By using a composite index, or an index on multiple columns, the
preceding query could be fully optimized. Here is an example of a
composite index on the actor table:
CREATE INDEX last_first_name ON actor (last_name, first_name)
With such a composite index, MySQL can first find the last name
Johnson in the table, then search for the first
name Robert in the matching index entries.
Composite indexes can also be partially used when the columns in
the WHERE clause of a query appear in the
left-most part of the composite index. For instance, the following
query would make use of the composite index we have created:
SELECT first_name
FROM actor
WHERE last_name = 'Johnson'
However, the following query would not make use of our composite index:
SELECT last_name
FROM actor
WHERE first_name = 'Robert'
The second example does not make use of the composite index
because the first_name column is not the
left-most part of the index columns.
This rule applies no matter how many parts a composite index has;
if you have an index on (columnA,
columnB, columnC,
columnD), the index will be used on queries
that contain the following columns in the WHERE
clause: (columnA), (columnA,
columnB),(columnA,
columnB, columnC), and
(columnA, columnB,
columnC, columnD). You would
not be able to create any queries without
columnA and expect the composite index to be
used.
For a more detailed description on when an index will be used, see the section titled How MySQL Uses Indexes in the MySQL Reference Manual.
Existing indexes can be dropped using either a DROP
INDEX or ALTER TABLE syntax:
DROP INDEXindex_nameONtable_nameALTER TABLEtable_nameDROP PRIMARY KEYALTER TABLEtable_nameDROP INDEXindex_name
You can drop multiple indexes in a single ALTER
TABLE statement by separating them with commas:
ALTER TABLE actor DROP PRIMARY KEY, DROP INDEX last_first_name
While regular indexes are effective for many purposes, they are
not effective for columns that contain natural language. An index
can be used for single word CHAR and
VARCHAR columns, but
FULLTEXT indexes are designed for finding
strings within larger natural language fields.
A FULLTEXT search takes a string and column
list and searches the specified columns for the string, returning
results ranked by relevancy.
The FULLTEXT index is available for the
MyISAM storage engine only.
The syntax for creating a FULLTEXT index is
listed in Section 15.5, “Creating Indexes”. Once the index is
created, the MATCH ... AGAINST syntax can be
used to perform FULLTEXT queries.
The MATCH clause indicates which columns are to
be searched. The list of columns in the MATCH
clause must be identical to the list of columns in the
FULLTEXT index.
The AGAINST clause contains the string being
searched for. The string in the AGAINST clause
must be a constant string: you cannot use a user variable or
search result in the AGAINST clause.
Here is an example of a basic FULLTEXT query
that searches for movies in the film table that
contain the word army in the
title or description
columns:
SELECT title, description FROM film WHERE MATCH (title, description) AGAINST ('army')
*************************** 1. row ***************************
title: ARMY FLINTSTONES
description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
1 row in set (0.00 sec)
Results from a query with MATCH ... AGAINST in
the WHERE clause will always return in
descending order based on relevancy.
Here is the same query performed with a LIKE
clause instead:
SELECT title, description FROM film WHERE title LIKE '%army%' OR description LIKE '%army%'
*************************** 1. row ***************************
title: ARMY FLINTSTONES
description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
1 row in set (0.20 sec)
Note the performance improvement provided by the
FULLTEXT index.
The MATCH ... AGAINST syntax can also provide
relevancy ranking information:
SELECT title, description, MATCH (title, description) AGAINST ('army') AS rank
FROM film
WHERE MATCH (title, description) AGAINST ('army')
*************************** 1. row ***************************
title: ARMY FLINTSTONES
description: A Boring Saga of a Database Administrator And a Womanizer who
must Battle a Waitress in Nigeria
rank: 6.1943987015493
1 row in set (0.00 sec)
The relevancy scores are based on the weighting of words within the individual rows. Words that occur rarely in the table are ranked higher than words that appear in a large percentage of the rows.
For more information on the FULLTEXT search
engine, see the
Fulltext
Search section of the MySQL Reference Manual.
Sometimes it is not easy to identify which columns of a table to
index, even when you have identified the slow queries in your
application. The EXPLAIN statement is designed
to assist in the query optimization process by providing insight
into how the MySQL optimizer handles a specific query.
To analyze a query, precede the query with the
EXPLAIN keyword:
EXPLAIN SELECT film.title FROM actor, film, film_actor
WHERE actor.actor_id = film_actor.actor_id
AND film.film_id = film_actor.film_id
AND actor.last_name = 'Walken'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: actor
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 200
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: sakila.actor.actor_id
rows: 26
Extra: Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: film
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: sakila.film_actor.film_id
rows: 1
Extra:
3 rows in set (0.00 sec)
EXPLAIN returns a row of information for each
table used in the SELECT statement. The tables
are listed in the output in the order that MySQL would read them
while processing the query.
The main things to look out for are rows where the
key column is NULL, where
the type column is range,
index, or ALL, or where the
Extra column contains Using
filesort or Using temporary. Such
queries should be closely examined for proper index usage as they
generally indicate that no index is being used.
For additional information on using the EXPLAIN
statement, see the
EXPLAIN
section of the MySQL Reference Manual.