Helpful ways to nuance an INSERT statement

Like SELECT has other helpful quantifiers to weed through the data being returned, INSERT has ways of nuancing the origin of the data to be inserted as well as the timing and conditions of the insertion. The three most common ways of altering the way MySQL processes an INSERT statement are:

  • INSERT...SELECT...
  • INSERT DELAYED...
  • INSERT...ON DUPLICATE KEY UPDATE...

In the following section, we take each one in turn.

INSERT...SELECT...

Using INSERT...SELECT... we can tell MySQL to draw from different tables without having to draw them into Python or to set a variable in MySQL. It functions on the following syntactic template:

INSERT INTO <target table>(target column name) SELECT <source column name> FROM <source table>;

By default, the SELECT phrase of the sentence is greedy and will return as many hits as it can. As with a generic SELECT statement, however, we can restrict the hits returned using WHERE. See the Other helpful quantifiers section in the previous chapter for more on this critical argument to SELECT.

To understand how to use this technique well, let us switch to the world database from MySQL that was mentioned in the previous chapter.

USE world;

The database has three tables. If you forget what they are, simply type:

SHOW TABLES;

You will then be rewarded with the following output:

mysql> show tables;
+-----------------+
| Tables_in_world |
+-----------------+
| City | 
| Country | 
| CountryLanguage | 
+-----------------+
3 rows in set (0.00 sec)

In order to affect a statement using INSERT...SELECT..., it is necessary to understand the make-up of each database. Use DESCRIBE to get the definitions on each.

mysql> describe City;
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment | 
| Name | char(35) | NO | | | | 
| CountryCode | char(3) | NO | | | | 
| District | char(20) | NO | | | | 
| Population | int(11) | NO | | 0 | | 
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> describe Country;
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
| Code | char(3) | NO | PRI | | | 
| Name | char(52) | NO | | | | 
| Continent | enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') | NO | | Asia | | 
| Region | char(26) | NO | | | | 
| SurfaceArea | float(10,2) | NO | | 0.00 | | 
| IndepYear | smallint(6) | YES | | NULL | | 
| Population | int(11) | NO | | 0 | | 
| LifeExpectancy | float(3,1) | YES | | NULL | | 
| GNP | float(10,2) | YES | | NULL | | 
| GNPOld | float(10,2) | YES | | NULL | | 
| LocalName | char(45) | NO | | | | 
| GovernmentForm | char(45) | NO | | | | 
| HeadOfState | char(60) | YES | | NULL | | 
| Capital | int(11) | YES | | NULL | | 
| Code2 | char(2) | NO | | | | 
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
15 rows in set (0.01 sec)
mysql> describe CountryLanguage;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3) | NO | PRI | | | 
| Language | char(30) | NO | PRI | | | 
| IsOfficial | enum('T','F') | NO | | F | | 
| Percentage | float(4,1) | NO | | 0.0 | | 
+-------------+---------------+------+-----+---------+-------+
4 rows in set (0.01 sec)

INSERT...SELECT... allows us to draw from each of the tables to form a new one. Let's say we wanted a table Combo that operated off the same identifier as City and incorporated the names for the first 999 countries listed in that database. We would begin by creating a MySQL table for the task. Creating a MySQL table is addressed in a later chapter, so here we assume the existence of a table Combo with the following definition:

+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment | 
| Name | char(35) | NO | | NULL | | 
| CountryCode | char(3) | NO | | NULL | | 
+-------------+----------+------+-----+---------+----------------+

Having done that we can insert the desired data from City into Combo using the following INSERT command:

INSERT INTO Combo(ID, Name, CountryCode) SELECT ID, Name, CountryCode FROM City WHERE ID < 1000;

A SELECT command to the database then shows the effect. For the sake of space, let's restrict ID to 10.

mysql> SELECT * FROM Combo WHERE ID<=10;
+----+----------------+-------------+
| ID | Name | CountryCode |
+----+----------------+-------------+
| 1 | Kabul | AFG | 
| 2 | Qandahar | AFG | 
| 3 | Herat | AFG | 
| 4 | Mazar-e-Sharif | AFG | 
| 5 | Amsterdam | NLD | 
| 6 | Rotterdam | NLD | 
| 7 | Haag | NLD | 
| 8 | Utrecht | NLD | 
| 9 | Eindhoven | NLD | 
| 10 | Tilburg | NLD | 
+----+----------------+-------------+
10 rows in set (0.00 sec)

This significantly cuts down on I/O and therefore dramatically reduces processing time—whether perceived or real. It lightens the load on the network and makes it appear more responsive and able to handle more requests (all other dynamics being equal).

Note

Note that even if your program is run on the same system as the database being queried, you will still have the dynamics of a network and therefore suffer lag if your program passes too many requests to MySQL too quickly.

Sluggishness on many systems is due to excessive data transfer between processes, not because of the speed at which those processes are executed.

More information on the INSERT...SELECT... functionality can be found in the MySQL manual at:

http://dev.mysql.com/doc/refman/5.1/en/insert-select.html

INSERT DELAYED…

The DELAYED argument to INSERT causes MySQL to handle the insertion in deference to other MySQL processes. When the server is sufficiently quiet, the INSERT command is executed. Until then, MySQL keeps it on hold.

The DELAYED argument simply follows the INSERT command. Otherwise, the syntax is the same:

INSERT DELAYED INTO <some table> (<some column names>) VALUES("<some values>");

For finer details on the DELAYED argument to INSERT, see the MySQL manual at http://dev.mysql.com/doc/refman/5.1/en/insert-delayed.html

INSERT...ON DUPLICATE KEY UPDATE...

Whenever you insert a record into a large table, there is a chance of creating an identical record. If your INSERT statement would result in two identical records, MySQL will throw an error and refuse to create the record. The error you get will look something like this:

ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

To mitigate against this error and the chance of submitted data not being inserted properly, MySQL offers this further argument to the INSERT command. The syntax is as follows:

INSERT INTO <some table>(<some column names>) VALUES ("<some values>") ON DUPLICATE KEY UPDATE <change to make the data unique>

After UPDATE, simply include what you have MySQL do to the record that you would insert in order to ensure that it is no longer a duplicate. In practice, this means incrementing the Primary key identifier. So where we get an error with one statement, we can adapt the statement. In the following statement, we get an error due to a duplicate ID number:

mysql> INSERT INTO Combo(ID, Name, CountryCode) VALUES ("27", "Singapore", "SGP");
ERROR 1062 (23000): Duplicate entry '27' for key 'PRIMARY'

Using the ON DUPLICATE KEY UPDATE... argument, we can insert the value and ensure that the record is unique:

mysql> INSERT INTO Combo(ID, Name, CountryCode) VALUES ("4078", "Singapore", "SGP") ON DUPLICATE KEY UPDATE ID=ID+1;
Query OK, 1 row affected (0.00 sec)

Note that if there is no conflict in values, MySQL will process the statement as if you did not include the ON DUPLICATE KEY UPDATE... clause.

If we then run a quantified SELECT statement against the table, we see that we now have two unique records for Singapore:

mysql> select * from Combo WHERE Name="Singapore";
+------+-----------+-------------+
| ID | Name | CountryCode |
+------+-----------+-------------+
| 3208 | Singapore | SGP | 
| 4078 | Singapore | SGP | 
+------+-----------+-------------+
2 rows in set (0.00 sec)