Table of Contents
The most common “database” format, especially for small- to medium-sized businesses, is the spreadsheet. The reason for this is fairly obvious — no special skills are required either for design or for data entry. Not only that, a spreadsheet may well be the best format for presenting and maintaining some kinds of information. If the file is not complicated, it's easy to get a quick overview of the data and sorting on a specific field is usually just a matter of clicking a column heading.
However, as the volume or complexity of information increases, this format becomes more and more cumbersome. Information becomes more difficult to retrieve and you run into the kinds of problems usually associated with flat-table databases — data duplication, for example.
This chapter deals with migrating a spreadsheet to a MySQL database. The solution presented here is operating system (OS) neutral; it works on Mac, Windows, or any Unix-like OS.
Excel is probably the most commonly used spreadsheet format but the procedure described here applies to any spreadsheet. The only requirement is that the spreadsheet data be exported as a text file so that it can be imported into MySQL.
To help facilitate things Query Browser, one of the open source
MySQL GUI Tools, will be used. Creating database objects is made
especially easy using the Table Editor, a
feature of the Query Browser also common to other GUI Tools. By
pointing and clicking you can quickly build a table without
knowing anything about data definition language (DDL). Not only
will the table editor help you work more quickly, it's also a good
way to learn MySQL's implementation of SQL. Any alterations made
to a table using the graphical interface are shown as SQL
statements, making it easy to learn the appropriate SQL commands.
We'll take advantage of this feature to document as we go.
The example spreadsheet that we'll be importing contains information about the accreditations of members of a professional association. It's not complicated so the process should be fairly easy to follow but at the same time it does highlight the major issues you might encounter and provides general guidelines for importing spreadsheets into MySQL.
The steps we'll take are as follows:
Export the spreadsheet to a text file
Import this file wholesale into a temporary table
Create and populate permanent tables
Use the mysqldump utility to export the
tables and data
Upload these tables and data to a production server
We want to import the data directly into a table that mirrors the structure of the spreadsheet. The sample spreadsheet has the following fields with maximum required lengths as shown:
firstname – 50
lastname – 50
certification – 10
expirydate – 10
streetaddress1 – 50
streetaddress2 – 50
city – 50
state – 2
zipcode – 10
certificationnumber – 10
If you wish to follow along, create a spreadsheet with these fields and enter some sample data as shown in the table below.
Table 30.1. Spreadsheet format
| firstname | lastname | certification | expirydate | street1 | street2 | city | state | zip | certnum |
|---|---|---|---|---|---|---|---|---|---|
| John | Doe | RAC | 10-Jan-08 | 10 Mulberry St | New York | NY | 30263 | C-12345 | |
| John | Doe | ARM-1 | 28-Feb-09 | 10 Mulberry St | New York | NY | 30263 | A-44456 | |
| Jane | Doe | DAC | 10-Dec-09 | 10 Mulberry St | New York | NY | 30263 | D-4567 | |
| Bob | Smith | RAC | 02-Jan-07 | 10 Main St | Apt 10 | Detroit | MI | 20789 | C-6785 |
The only requirements are:
Any identical combination of the
firstname, lastname,
and street1 is always understood to apply
to the same individual. The same individual can appear more
than once in the file. However, ensure that the
certification differs.
Make sure that the certification numbers are unique.
Format the date as 01-Dec-07; that is,
use two digits for the day, the standard month abbreviation,
and a two digit year. Use a hyphen as a separator.
Don't exceed the specified field lengths.
If you don't have a spreadsheet program at hand, you can create a tab-separated or comma-separated text file to match the structure defined above. Of course, if you do this, you won't need to export the data.
Before attempting to convert any spreadsheet, it is best to review the data for consistency. For example, make sure that every row has the same number of columns and check that all dates are formatted in the same way. This may save major headaches by helping to spot errors early.
How you convert a spreadsheet to a text file may depend upon the
OS you are using. If you are working under Windows with an Excel
spreadsheet, open the spreadsheet in Excel and choose the
menu option under the
menu. From the Save as
type list box choose the Text(Tab
delimited) option.
Under Unix, Windows, or Mac you can use OpenOffice Calc to open a
variety of spreadsheet formats, including an Excel spreadsheet.
Export the spreadsheet from the Calc application by choosing the
and menu
options. Next choose Text CSV from the
Filter list box. This opens a dialog box for
further refining your choice. Choose the character set
Unicode or a platform-specific format, if
appropriate. As a field delimiter choose the
{Tab} option and no text delimiter at all. The
drop-down list box only offers single or double quotation marks as
alternatives; to choose no delimiter simply delete the quotation
mark.
If you are using an application that won't let you save the spreadsheet in tab-separated format, a Google spreadsheet for example, then simply save the file as a CSV file. Tab-separated text files are the easiest to import into MySQL but importing a CSV file is almost as simple. Before saving the file, review the contents first and ensure that no commas appear anywhere in the data. A stray comma can cause data corruption or complete failure when importing data in CSV format.
Save the file as data.tsv (or as
data.csv if the format is comma-separated),
have a look at the exported data in a text editor. Each record
should appear on a separate line. Don't be concerned if each line
is not a uniform length. Many programs will export the column
headings as the first row of the text file. Delete this row and
resave the file, making sure that you save it as a text file and
don't introduce any formatting.
Creating a table to match the fields as described in Section 30.2, “The Spreadsheet File” is a fairly straightforward matter. For importing the data our principal concern is to get the right information in the right fields without truncating data. By treating all fields as VARCHAR we can keep things simple and only need to worry about the order of the fields and their length.
As promised we'll use the MySQL Query Browser until we're ready to create a database dump. Query Browser is a fairly intuitive tool but for a quick overview find the documentation online at http://dev.mysql.com/doc/.
Start up Query Browser and enter your credentials and the server hostname and port — we haven't created a database yet so don't worry about the Default Schema text box. When the application opens, you'll find a list of schemata (databases) on the right. The cursor should be active in the text area at the top of the screen. This text area is used for entering queries, which are executed using the button on the right. If a result set is returned, it shows in the main area in the center of the screen.
The first thing to do is create a database. Make sure that the
Schemata tab on the right is selected, right
click anywhere in this window, and choose the option from the pop-up menu. Name the database
association. To refresh the databases shown in
the Schemata window, right click in this
window and choose the refresh menu option. Next open a script
window — we'll use this window as a scratch pad to save
copies of the queries we create. Open a script tab by choosing the
option from the
menu. After doing this two tabs, one
labeled Resultset1 and the other
New Script, should be visible on the left
below the tool bar.
To create a table, right click the association
database in the Schemata panel and choose
from the pop-up menu.
This opens the table editor, in the default view with the
Columns and Indices tab active. Enter the
name alldata in the text box at the top of the
table editor. Refer to the values shown in
Section 30.2, “The Spreadsheet File”, enter a name for each column,
choose VARCHAR as the data type, and specify a
field length. You needn't worry about making any other changes at
this point. After all, the alldata table is
only temporary.
When you are finished, use the button. This button opens a dialog box showing the SQL code that will execute. Before executing this code, copy it and paste it into the script window. The code should look something like this:
CREATE TABLE `alldata` ( `lastname` VARCHAR(50) NOT NULL, `firstname` VARCHAR(50) NOT NULL, `certification` VARCHAR(10) NOT NULL, `expirydate` VARCHAR(10) NOT NULL, `streetaddress1` VARCHAR(50) NOT NULL, `streetaddress2` VARCHAR(50) NOT NULL, `city` VARCHAR(50) NOT NULL, `state` VARCHAR(2) NOT NULL, `zipcode` VARCHAR(10) NOT NULL, `certificationnumber` VARCHAR(10) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Right click the association database in the
Schemata pane and choose the option. The newly created table should appear
beneath the association database, ready for
imported data.
To import the spreadsheet data we'll use the LOAD DATA
INFILE syntax. Security considerations can sometimes
make this a frustrating exercise, so as we go, we'll try to
anticipate any problems that may arise.
Click on the Resultset1 tab and enter the following statement into the query text box (using a path appropriate to your circumstances):
LOAD DATA INFILE “/home/peter/Documents/spreadsheet/data.tsv”
INTO TABLE alldata;
Windows pathnames are also specified using forward slashes rather than backslashes. If you do use backslashes, you must double them.
To import a comma separated file on the Windows platform use the following syntax:
LOAD DATA INFILE “C:/Documents and Settings/peter/My Documents/spreadsheet/data.csv”
INTO TABLE alldata
FIELDS TERMINATED BY “,”;
The default field terminator is a tab character so if you use a different terminator you must specify it as shown in the preceding statement.
There are other possible pitfalls when executing a LOAD
DATA INFILE statement. The rules for using a relative
path are a bit tricky so always specify the complete path to the
file. Also, a data file must be readable by all. This is usually
not an issue under Windows; on Unix operating systems, if you need
to adjust the file permissions, you can readily do this using the
GUI. To make a file world-readable from the command prompt type:
shell> chmod 755 data.tsv
Finally, the user who is executing the LOAD
DATA statement must have the FILE
privilege. If you need to grant this privilege, log in as root and
execute the command:
GRANT FILE ON *.* TO 'user'@'hostname' IDENTIFIED BY 'password';
You can do this from the command line or from within Query Browser.
The FILE privilege is a global privilege and
cannot be restricted to a specific database.
So far so good, but the syntax shown to this point only works if
the text file is located on the same system as the server. If your
MySQL server is remote, you must add the keyword
LOCAL to the LOAD DATA
INFILE syntax as in the following example:
LOAD DATA LOCAL INFILE “/home/peter/Documents/spreadsheet/data.tsv”
INTO TABLE alldata;
Using LOCAL is not much different syntactically
but servers are sometimes started up with the ability to
LOAD DATA LOCAL disabled. If the server
supports LOCAL, you can start up the MySQL
client with the --local-infile option. Another
approach is to copy the text file to the server before executing
the LOAD DATA statement.
Further complications can ensue. For files created on a Windows
system, you might have to add LINES TERMINATED BY
'\r\n' to read the file properly, because Windows
programs typically use these two characters as a line
terminator. If you need to add this clause, it follows
immediately after the table name or, if a FIELD
TERMINATED BY clause is present, immediately after
this clause.
If you run into problems and require more information about
LOAD DATA INFILE refer to the manual
http://dev.mysql.com/doc/refman5.0/en/sql-syntax.html.
Before you continue, paste the appropriate version of the
LOAD DATA INFILE statement into the script
window below the alldata table definition.
After executing this statement and loading the data you can check
that it has been copied to the alldata table
using Query Browser. To inspect the data, double click the
alldata table and find the following statement
in the query text box:
SELECT * FROM alldata LIMIT 0,1000
A LIMIT clause may not appear when using
Query Browser under Windows.
Click the button and you should be able to view the data in the query window.
You might want to review the integrity of the data again at this point. A visual inspection is fine but you might also want to automate the process with an SQL statement such as the following:
SELECT * FROM alldata PROCEDURE ANALYSE();
(Note the spelling of ANALYSE.)
Among other things, this query shows actual minimum and maximum
values for data in the various fields. If any of the maximum field
length values equal the field length, then you have probably
truncated data. Empty or NULL values in some
fields may also indicate problems.
Now that the data has been copied into a MySQL database, we need
to split it up into different tables — we want a proper
relational database and not another flat database. The most
obvious entity is a member, having the attributes name and
address. The following fields from the alldata
table belong to this entity exclusively:
lastname VARCHAR(50) NOT NULL firstname VARCHAR(50) NOT NULL streetaddress1 VARCHAR(50) NOT NULL streetaddress2 VARCHAR(50) NOT NULL city VARCHAR(50) NOT NULL state VARCHAR(2) NOT NULL zipcode VARCHAR(10) NOT NULL
Removing any fields that relate to certification gives us the basis for a members table.
We need to transfer data from the alldata table
into a members table but, since members can have more than one
certification and so appear more than once in the
alldata table, we can't just copy all records
over to a members table. To make sure that we have unique records
we need a way of uniquely identifying each member. We can do this
by combining a number of fields together to create a unique value
— a combination of the firstname,
lastname, and streetaddress1
columns fits the bill. The combination of these fields could form
a primary key, but it would be a very cumbersome one. For this
reason, we're also going to add a numeric key value — an
integer AUTO_INCREMENT field. The two new
fields are:
unique_value VARCHAR(150) id INT(11)
Create this table using the table editor in the same way that you
created the alldata table. The only new element
is an integer, auto increment field. To create this field select
INTEGER as the data type and ensure that all three check boxes in
the column options frame, Primary Key,
Not NULL, and Auto
Increment, are checked. Make sure the size of the
unique_value column is adequate and add the
other columns exactly as you did before.
When you're ready apply your changes and copy the SQL from the dialog box. It should look something like the following:
CREATE TABLE `tempmembers` ( `unique_value` VARCHAR(150) DEFAULT NULL, `id` INT(11) NOT NULL AUTO_INCREMENT, `firstname` VARCHAR(30) NOT NULL DEFAULT '', `lastname` VARCHAR(40) NOT NULL DEFAULT '', `streetaddress1` VARCHAR(60) NOT NULL DEFAULT '', `streetaddress2` VARCHAR(60) NOT NULL DEFAULT '', `city` VARCHAR(60) NOT NULL DEFAULT '', `state` VARCHAR(10) NOT NULL DEFAULT '', `zipcode` VARCHAR(10) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Paste this table definition into the script window and switch to
the Resultset1 tab to create a query to
populate this table. As described above, we want to concatenate
three columns to create a unique value and also add an auto
increment column. The remaining columns come directly from the
alldata table.
To populate the tempmembers table enter the
following SQL into the query text box and execute it:
INSERT INTO tempmembers SELECT DISTINCT CONCAT(firstname, lastname, streetaddress1) AS unique_value, NULL AS id, firstname, lastname, streetaddress1, streetaddress2, city, state, zipcode FROM alldata;
Using DISTINCT with the unique_value field
should guarantee that we don't have duplicate members and
selecting NULL as the id field generates a
unique auto increment value for each record. Look at the records
in the tempmembers table to confirm that unique
id numbers have been generated.
This is fairly close to what a final version of a members table
would look like — removing the
unique_value field would be the next step to
take but as you'll see shortly, we still need this field.
The entire spreadsheet that we've imported could be described as a
table of members' different accreditations. In the previous
section we extracted the member information from the
alldata table and created a unique id number
for each member. The task now is to replace the duplicated member
information with a single unique field. In other words, we're
going to create a member accreditations table with a foreign key.
The fields in the alldata table that apply
solely to a member accreditations table are readily identified:
`certification` VARCHAR(10) NOT NULL `expirydate` VARCHAR(10) NOT NULL, `certificationnumber` VARCHAR(10) NOT NULL
So far we've treated the expirydate field as
text. While we're creating a member accreditations table we can
convert this field to the DATE data type. The new definition for
this field is:
`expirydate` DATE DEFAULT NULL
Again we want to concatenate three columns to create a unique value and also add an integer column for the member id — so we can relate the member certifications to their matching records in the members table. The two additional columns are as follows:
`unique_value` VARCHAR(150) DEFAULT NULL `memberid` INT(11) NOT NULL DEFAULT '0',
Right click the association database in the
Schemata pane and open the table editor.
You've already added VARCHAR and
INTEGER fields so adding a
DATE type field should present no problems.
Create a table named tempmemberaccreditations
and apply your changes. The resulting table should look something
like this:
CREATE TABLE `tempmemberaccreditations` ( `unique_value` VARCHAR(150) DEFAULT NULL, `certification` VARCHAR(10) NOT NULL, `memberid` INT(11) NOT NULL DEFAULT '0', `certificationnumber` VARCHAR(10) NOT NULL, `expirydate` DATE DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Don't forget to paste it into the script window before proceeding. If you do forget, retrieving the table structure is a simple matter of executing the SQL statement:
SHOW CREATE TABLE tempmemberaccreditations;
Again we need to populate this table from the
alldata table. We are going to select all the
records from the alldata table but only
selected fields. Click on the Resultset1 tab
and enter the following query into the query text box:
INSERT INTO tempmemberaccreditations SELECT CONCAT(firstname,lastname,streetaddress1) AS unique_value, certification, 0 AS memberid, certificationnumber, STR_TO_DATE(expirydate, "%d-%b-%y") FROM alldata;
Converting a string value to a date is done using the
STR_TO_DATE function. This function takes two
string arguments; the first is a string expression of the date and
the second specifies the date format. In the
alldata table dates are in the form
'12-Dec-07'. The format specifier tells MySQL exactly how to
interpret the string representation of the date. In this case the
specifier, "%d-%b-%y", means the day of the month comes first and
is expressed as two digits — it will have a leading zero
even if the value is less than 10, the month is expressed as an
abbreviated name, and the year numerically with two digits. All
values are separated by a “-”.
The complete list of specifiers is given in the manual
immediately following discussion of the
DATE_FORMAT function.
Review the data after executing the INSERT
statement. You'll see that dates are now expressed in the default
MySQL format, the year has four digits followed by a two digit
month, and a two digit day.
At this point, reviewing the data to ensure consistency is a good
idea. Any dates that were improperly formatted in the original
spreadsheet will not convert to the DATE data
type.
When reviewing the data you'll also see that the
memberid field is set to '0' for all records.
Let's update this field using the values in the tempmembers table:
UPDATE tempmemberaccreditations t2 INNER JOIN tempmembers t ON t.unique_value=t2.unique_value SET t2.memberid = t.id;
That's the last time we'll need the
unique_value field for either of our
transitional tables. We can now relate these two tables on the
numeric id field.
With both tables populated with data it's time to get rid of the
unique_value field and while doing so we should
also change the name of our tables since they are no longer
temporary or transitional tables.
Select the tempmembers table in the
Schemata pane and open the table editor.
Rename the table to members, by changing the
table name in the text box in the top left of the table editor.
Select the unique_value field and press the
Delete key to remove it. Choose to view a dialog box with the following
content:
ALTER TABLE `tempmembers` RENAME TO `members`, DROP COLUMN `unique_value`;
Making similar changes to the tempmemberaccreditations
table will result in the following DDL statement:
ALTER TABLE `tempmemberaccreditations` RENAME TO `memberaccreditations` DROP COLUMN `unique_value`;
Copy the SQL version of these table alterations to the script window.
Adding indexes to tables is also easily accomplished using the
Query Browser. Since we expect searches on the
lastname and the city fields
these two columns are ideal candidates for indexing. Again this
can be done using the table editor. Open the table editor and
click the Indices tab. Click the
button on the bottom left and a new index
called new_index appears in the list of
indexes. Change the name to lastname_idx and
drag and drop the lastname column to the
Columns text area on the right.
Create an index on the city column in the same way. When you apply your changes you should see something similar to the following:
ALTER TABLE `members` ADD KEY `lastname_idx` (`lastname`), ADD KEY `city_idx` (`city`);
The memberaccreditations table still lacks a
primary key. To remedy this, open the table editor again so that
we can add a primary key. To do this click the
button on the lower left and ensure that
PRIMARY is selected in the
Kind drop-down list box. Create a primary key
composed of two columns by dragging the
memberid column and the
certification column to the
Columns list. When applying your changes you
should see:
ALTER TABLE `memberaccreditations` ADD PRIMARY KEY (`certification`, `memberid`);
After altering database objects, it's always an idea to refresh
the view in the Schemata pane. Do this by
right clicking the association database and
choosing the option (Under Unix this
option is called .)
Looking at the data there is yet one more change we could apply.
The certification field may indicate another
database entity. Let's create a table of accreditation acronyms
with their corresponding descriptions.
One of the simplest ways to create a table and populate it using MySQL is to issue a CREATE TABLE statement in conjunction with a SELECT statement. For instance we could create our final version of the members table in the following way:
CREATE TABLE accreditations SELECT DISTINCT certification AS acronym, '' AS description FROM alldata;
At this point we don't have the information necessary to add a description so we populate this field with an empty string.
Creating and populating a table in this way is a quick and easy way to create a populated table. The downside to creating a table in this way is that the resulting table has no primary key or indexes. I'll leave it to you to add an index to this table.
At this point we've created all the necessary tables and migrated the data to those tables. We just need to check the integrity of the data before copying it to a production server.
It's always wise to check the state of your transformed data. There's no substitute for visual inspection but there are a variety of ways to check your data using SQL.
For example, there should be no orphaned member records. Since we've migrated from a flat-table database that contained all the original data, finding an id in the members table with no corresponding record in the member accreditations table would indicate that something was wrong. The following SQL statement will return all records in the members table that don't have matching records in the member accreditations table:
SELECT `t`.`id` ,`t`.`firstname`,`t`.`lastname` FROM `members` `t` LEFT JOIN `memberaccreditations` `tma` ON `t`.`id` = `tma`.`memberid` WHERE ISNULL(`tma`.`memberid`);
If the above SELECT statement returns an empty
set, there are no orphaned member records.
An easy way to reuse this SQL statement is to save it as a view.
To do this using Query Browser, make sure the
association database is active, then right
click on any one of the tables in the Schemata
window and choose the option.
Clicking after entering a view name
opens a new tab displaying the basic syntax for creating a view.
Paste the preceding SQL statement into the AS
clause and execute the query. After refreshing the schemata the
new view should show up. You can view the record set associated
with this view in exactly the same way that you would view the
record set associated with a table.
To check that there are no orphaned records in the member accreditations table execute the following query:
SELECT `tma`.`certification`, `tma`.`memberid`,`tma`.`certificationnumber`, `tma`.`expirydate` FROM (`memberaccreditations` `tma` LEFT JOIN `members` `t` ON ((`tma`.`memberid` = `t`.`id`))) WHERE ISNULL(`t`.`id`);
Again, to save this SQL statement, convert it to a view using the procedure described above.
There are also various other ways of querying your records to verify the data. For example, if all certification numbers in the member accreditations table are meant to be unique, executing the following query would determine if there are duplicates:
SELECT COUNT(t.`certificationnumber`), t.`certificationnumber` FROM memberaccreditations t GROUP BY (t.`certificationnumber`) HAVING COUNT(t.`certificationnumber`) > 1;
Checking the number of records in the
memberaccreditations table provides further
assurance of the integrity of your data. The number should exactly
match the number of records in the alldata
table.
If you notice discrepancies in the data and wish to update records you can do this from within Query Browser. Click the button and then select the record you wish to change and place the cursor in the column you wish to change. When you are finished editing click the button.
If a record set is created from a single table having a primary key, it is editable. A disabled button indicates that the record set is not editable.
Once you're satisfied with the integrity of the data, drop the
alldata table. This is easily done by right
clicking the table and choosing the
option. Before exiting Query Browser make sure that you save the
script file of all the queries.
Once you are convinced of the validity of your data, you can move
the tables to your production server. We'll do that by first using
the mysqldump utility. To export only the final
versions of the tables, go to the command line and type:
shell> mysqldump -u username -p --databases association > newdb.sql
You can open a MySQL console window from within Query Browser. Find this option under the menu.
The mysqldump utility takes many of the same
switches as the MySQL client; as you can see, you specify your
user name and password in the same way. You also need to specify
the database name you wish to dump. In this case, the output is
redirected to a script file named newdb.sql. If
you do not wish to create a database and only want to dump the
tables in the association database, execute the
preceding command without the --databases option.
For more information about the many options available with
mysqldump see
http://dev.mysql.com/doc/5.0/en/mysqldump.html.
Have a look at the contents of the script file so that you understand what it does. Any existing tables with the specified table names will be dropped and recreated and then the data will be inserted. If you are overwriting existing data, you may want to back up your data before running the script file.
How you execute the dump script file depends upon how you access your production MySQL server. If you have direct access to the server or access through ssh, transfer the script file to the machine hosting the server, and then issue the command:
shell> mysql -u username -p < newdb.sql
If you saved only the database tables, you must specify a database when issuing the preceding command.
If you have remote access to your production server simply add the
-h hostname option to
the preceding command. You may also upload your script using an
application such as phpMyAdmin. Finally, you can open and execute
the script file from within Query Browser — but more about
this in the next section.
Migrating spreadsheet data to a MySQL database can be a relatively simple task especially when using Query Browser. However, migrating users to that database can be much more difficult. You may find that you continue to receive database updates in the form of complete but modified spreadsheets.
It may seem counterintuitive, but such updates can be handled most easily by recreating the entire database again. If we script this process then updates can be done in a matter of seconds. All we need are a few modifications to the script file that we saved as we worked.
The only additions to this script are DROP
TABLE statements — making it much easier to reuse
the database that's already there. This script can be run from the
command line as described in the previous section or you can open
it within Query Browser.
To open a script file from within Query Browser choose the option under the menu. Find the script file and select it. A script file tab will open showing the contents of the file. Syntax highlighting is one of the advantages of executing a script from within Query Browser — errors are much more easily spotted. Any errors that occur during execution are displayed in a pop-up dialog, specifying the nature of the error and also the line number. You can also set break points and step through the code one line at a time if you wish.
Using Query Browser made it easy to document our actions in migrating a spreadsheet to MySQL. This documentation is easily turned into a script file so that we can recreate the process. It can also serve as a reference for techniques to use in future migrations. Find a copy of the script file in the next section.
#use database USE association; #First make copy of Excel data #treat all fields as text DROP TABLE IF EXISTS `alldata`; CREATE TABLE `alldata` ( `lastname` VARCHAR(50) NOT NULL, `firstname` VARCHAR(50) NOT NULL, `certification` VARCHAR(10) NOT NULL, `expirydate` VARCHAR(10) NOT NULL, `streetaddress1` VARCHAR(50) NOT NULL, `streetaddress2` VARCHAR(50) NOT NULL, `city` VARCHAR(50) NOT NULL, `state` VARCHAR(2) NOT NULL, `zipcode` VARCHAR(10) NOT NULL, `certificationnumber` VARCHAR(10) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1; #get data from tab-separated file created from spreadsheet #Unfortunately, variable below won't work with LOAD DATA #SET @filename = "/home/peter/Documents/spreadsheet/data.tsv"; #so hard code LOAD DATA INFILE "/home/peter/Documents/spreadsheet/data.tsv" INTO TABLE alldata; #Our server and client are on the same machine #Don't need "LOCAL" if file is on the server (local means local to the client) #will need the FILE privilege though #but if it's there you can execute this script from somewhere else on your network #Syntax with comma separated fields -- not as safe as tabs #LOAD DATA INFILE "C:/Documents and Settings/peter/My Documents/spreadsheet/data.csv" # INTO TABLE alldata # FIELDS TERMINATED BY ","; #create temporary tables #Association members information DROP TABLE IF EXISTS `tempmembers`; CREATE TABLE `tempmembers` ( `unique_value` VARCHAR(150) DEFAULT NULL, `id` INT(11) NOT NULL AUTO_INCREMENT, `firstname` VARCHAR(30) NOT NULL DEFAULT '', `lastname` VARCHAR(40) NOT NULL DEFAULT '', `streetaddress1` VARCHAR(60) NOT NULL DEFAULT '', `streetaddress2` VARCHAR(60) NOT NULL DEFAULT '', `city` VARCHAR(60) NOT NULL DEFAULT '', `state` VARCHAR(10) NOT NULL DEFAULT '', `zipcode` VARCHAR(10) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO tempmembers SELECT DISTINCT CONCAT(firstname,lastname,streetaddress1) AS unique_value, NULL AS id, firstname, lastname, streetaddress1, streetaddress2, city, state, zipcode FROM alldata; #Member accreditations DROP TABLE IF EXISTS `tempmemberaccreditations`; CREATE TABLE `tempmemberaccreditations` ( `unique_value` VARCHAR(150) DEFAULT NULL, `certification` VARCHAR(10) NOT NULL, `memberid` INT(11) NOT NULL DEFAULT '0', `certificationnumber` VARCHAR(10) NOT NULL, `expirydate` DATE DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO tempmemberaccreditations SELECT CONCAT(firstname,lastname,streetaddress1) AS unique_value, certification, 0 AS memberid, certificationnumber, STR_TO_DATE(expirydate, "%d-%b-%y") FROM alldata; #Above format for 31-Dec-07 #for 12/1/2007 use "%c/%e/%Y" #now relate the two tables and insert ids into the tempmemberaccreditations UPDATE tempmemberaccreditations t2 INNER JOIN tempmembers t ON t.unique_value=t2.unique_value SET t2.memberid = t.id; #create final version of tables DROP TABLE IF EXISTS `members`; ALTER TABLE `tempmembers` RENAME TO `members`, DROP COLUMN `unique_value`; DROP TABLE IF EXISTS `memberaccreditations`; ALTER TABLE `tempmemberaccreditations` RENAME TO `memberaccreditations`, DROP COLUMN `unique_value`; #add indices ALTER TABLE `members` ADD KEY `lastname_idx` (`lastname`), ADD KEY `city_idx` (`city`); ALTER TABLE `memberaccreditations` ADD PRIMARY KEY (`certification`, `memberid`); #create accreditations table DROP TABLE IF EXISTS accreditations; CREATE TABLE accreditations SELECT DISTINCT certification AS acronym, '' AS description FROM alldata; ALTER TABLE `accreditations` ADD PRIMARY KEY (`acronym`); #now add views DROP VIEW IF EXISTS vwOrphanedMembers; CREATE VIEW `vwOrphanedMembers` AS SELECT `t`.`id` ,`t`.`firstname`,`t`.`lastname` FROM `members` `t` LEFT JOIN `memberaccreditations` `tma` ON `t`.`id` = `tma`.`memberid` WHERE ISNULL(`tma`.`memberid`); DROP VIEW IF EXISTS vwOrphanedAccreditations; CREATE VIEW `vwOrphanedAccreditations` AS SELECT `tma`.`certification`, `tma`.`memberid`,`tma`.`certificationnumber`, `tma`.`expirydate` FROM (`memberaccreditations` `tma` LEFT JOIN `members` `t` ON ((`tma`.`memberid` = `t`.`id`))) WHERE ISNULL(`t`.`id`); #remove spreadsheet-based table DROP TABLE IF EXISTS `alldata`;