Performance problems can occur with subqueries if they are not well constructed. There can be a performance drain when a subquery is placed within an IN() operator as part of the WHERE clause of the outer query. It’s generally better to use the = operator instead, along with AND for each column=value pair. When you suspect that a subquery is performing poorly, try rewriting the SQL statement with JOIN and compare the two versions using the BENCHMARK() function.
For ideas on improving subquery performance, see the tips Oracle provides on its site in the section on Optimizing Subqueries.
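As a rough illustration of the rewrite described above, here is a minimal sketch that uses Python’s built-in sqlite3 module and a throwaway in-memory database, so it runs without a MySQL server. The tables, columns, and rows are made up for the example. It asks the same question with an IN() subquery and with a JOIN and checks that both return the same rows. It does not measure speed, which on toy tables like these would tell you nothing; use BENCHMARK() or your own timings against real data for that.

```python
import sqlite3

# Made-up data: three members and the (hypothetical) bird IDs each has sighted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE humans (human_id INTEGER PRIMARY KEY, name_first TEXT, name_last TEXT);
    CREATE TABLE bird_sightings (sighting_id INTEGER PRIMARY KEY, human_id INTEGER, bird_id INTEGER);
    INSERT INTO humans VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park'), (3, 'Cy', 'Ruiz');
    -- member 1 sighted bird 100 twice; member 3 never sighted it
    INSERT INTO bird_sightings (human_id, bird_id) VALUES (1, 100), (1, 100), (2, 100), (3, 200);
""")

# Subquery inside IN(): members who have sighted bird 100.
in_subquery = """
    SELECT human_id, name_first, name_last FROM humans
    WHERE human_id IN (SELECT human_id FROM bird_sightings WHERE bird_id = 100)
    ORDER BY human_id
"""

# The same question rewritten with JOIN. DISTINCT is needed because a member
# with several matching sightings would otherwise appear once per sighting.
join_rewrite = """
    SELECT DISTINCT h.human_id, h.name_first, h.name_last
    FROM humans AS h
    JOIN bird_sightings AS s ON s.human_id = h.human_id
    WHERE s.bird_id = 100
    ORDER BY h.human_id
"""

in_rows = conn.execute(in_subquery).fetchall()
join_rows = conn.execute(join_rewrite).fetchall()
print(in_rows)               # [(1, 'Ann', 'Lee'), (2, 'Bo', 'Park')]
print(in_rows == join_rows)  # True
```

Note the DISTINCT in the JOIN version: without it, member 1 would show up twice, something the IN() form never does. Keep that difference in mind when you compare the two statements.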
Summary
Many developers prefer subqueries; I do. They’re easier to construct, and easier to decipher when you have problems later. If you work on a database that is very large and has a huge amount of activity, subqueries may not be a good choice, because they can sometimes hurt performance. For small databases, though, they’re fine. You should learn to use subqueries and also learn how to work without them (i.e., use JOIN) so that you can handle any situation presented to you. You cannot be sure which method your next employer and team of developers may be using. It’s best to be versatile.
As for learning to use JOIN, that’s hardly optional. Very few developers don’t use JOIN. Even if you prefer subqueries, they still call for JOIN: you can see this in almost all of the examples of subqueries in this chapter. You may rarely use UNION, but there’s not much to learn there. You should, however, be proficient in using JOIN. So don’t avoid joins; practice manually entering SQL statements that use them. The act of typing them helps.
Exercises
The goal of the following exercises is to give you practice assembling tables using JOIN and creating subqueries. As you do these exercises, think about how tables and data come together. Try to envision each table as a separate piece of paper with a list of data on it, and how you might place the pages on a desk to find information on them in relation to each other. In such a scenario, you might place your left index finger on a point on the page to your left and your right index finger on a point on another page to your right. That’s a join. The places where you point on each page are the join points. As you type the SQL statements in these exercises, think of this scene and say aloud what you’re doing, what you’re telling MySQL to do. Doing so will help you better understand how tables are joined and how subqueries are created.
1. In the birdwatchers database, there is a table called bird_sightings in which there are records of birds that members have seen in the wild. Suppose we have a contest in which we will award a prize based on the most sightings of birds from the order Galliformes. A member gets one point for each sighting of birds in this order.
Construct an SQL statement to count the number of Galliformes entries from each member. There should be two fields in the results set: one containing the human_id, with Birder as the alias, and the second containing the number of entries, with Entries as its alias. To accomplish this, join the bird_sightings table to birds, bird_families, and bird_orders. Remember that those last three tables are in a different database. You will have to use the COUNT() function and a GROUP BY clause. Do all of this with JOIN and not with subqueries. Your results should look like the following:
+--------+---------+
| Birder | Entries |
+--------+---------+
|     19 |       1 |
|     28 |       5 |
+--------+---------+
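One possible shape for the first query in this exercise is sketched below, again run through Python’s sqlite3 module against tiny made-up tables so that it is self-contained. The table layouts are simplified assumptions (only the columns the joins need), and filtering on the order’s scientific_name column is an assumption about how the order is identified. In MySQL you would also qualify the three bird tables with the name of the database that holds them.

```python
import sqlite3

# Made-up rows in stripped-down versions of the exercise's tables.
# Column names other than human_id are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bird_orders (order_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE bird_families (family_id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE birds (bird_id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE bird_sightings (sighting_id INTEGER PRIMARY KEY, human_id INTEGER, bird_id INTEGER);
    INSERT INTO bird_orders VALUES (1, 'Galliformes'), (2, 'Anseriformes');
    -- five Galliformes families (10-14) and one non-Galliformes family (20)
    INSERT INTO bird_families VALUES (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (20, 2);
    INSERT INTO birds VALUES (100, 10), (101, 10), (102, 11), (103, 12), (104, 13), (105, 14), (200, 20);
    INSERT INTO bird_sightings (human_id, bird_id) VALUES
        (1, 100), (1, 100), (1, 101), (1, 200),
        (2, 100), (2, 102), (2, 103), (2, 104), (2, 105), (2, 105);
""")

# One row per member: how many of their sightings fall in the Galliformes order.
entries_per_member = """
    SELECT s.human_id AS Birder, COUNT(*) AS Entries
    FROM bird_sightings AS s
    JOIN birds AS b ON b.bird_id = s.bird_id
    JOIN bird_families AS f ON f.family_id = b.family_id
    JOIN bird_orders AS o ON o.order_id = f.order_id
    WHERE o.scientific_name = 'Galliformes'
    GROUP BY s.human_id
    ORDER BY Birder
"""
rows = conn.execute(entries_per_member).fetchall()
print(rows)  # [(1, 3), (2, 6)]
```

Member 1’s sighting of bird 200 drops out because its family belongs to a different order; the WHERE clause on bird_orders is what restricts the count to Galliformes.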
When you have successfully constructed this SQL statement, modify it to join in the humans table. In the column list, replace the field for human_id with the first and last name of the member. Use the CONCAT() function to put them together into a single field (with a space between the names), keeping the same alias, Birder. Once you make the needed changes and execute the statement, the results should look like this, although the number of names and points may differ:
+--------------+--------+
| Birder       | Points |
+--------------+--------+
| Elena Bokova |      4 |
| Marie Dyer   |      8 |
+--------------+--------+
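The second step, joining in humans and building the name, might look like the following sketch under the same assumptions (made-up rows, simplified columns; name_first and name_last are assumed column names). Older SQLite builds lack CONCAT(), so the sketch joins the strings with ||. In MySQL, write CONCAT(h.name_first, ' ', h.name_last) instead, because under the default SQL mode || is a logical OR there.

```python
import sqlite3

# Made-up rows; the humans table and its name columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE humans (human_id INTEGER PRIMARY KEY, name_first TEXT, name_last TEXT);
    CREATE TABLE bird_orders (order_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE bird_families (family_id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE birds (bird_id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE bird_sightings (sighting_id INTEGER PRIMARY KEY, human_id INTEGER, bird_id INTEGER);
    INSERT INTO humans VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park');
    INSERT INTO bird_orders VALUES (1, 'Galliformes'), (2, 'Anseriformes');
    INSERT INTO bird_families VALUES (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (20, 2);
    INSERT INTO birds VALUES (100, 10), (101, 10), (102, 11), (103, 12), (104, 13), (105, 14), (200, 20);
    INSERT INTO bird_sightings (human_id, bird_id) VALUES
        (1, 100), (1, 100), (1, 101), (1, 200),
        (2, 100), (2, 102), (2, 103), (2, 104), (2, 105), (2, 105);
""")

# Same count as before, but labeled with the member's full name.
points_by_name = """
    SELECT h.name_first || ' ' || h.name_last AS Birder, COUNT(*) AS Points
    FROM bird_sightings AS s
    JOIN humans AS h ON h.human_id = s.human_id
    JOIN birds AS b ON b.bird_id = s.bird_id
    JOIN bird_families AS f ON f.family_id = b.family_id
    JOIN bird_orders AS o ON o.order_id = f.order_id
    WHERE o.scientific_name = 'Galliformes'
    GROUP BY h.human_id, h.name_first, h.name_last
    ORDER BY Birder
"""
rows = conn.execute(points_by_name).fetchall()
print(rows)  # [('Ann Lee', 3), ('Bo Park', 6)]
```

Grouping on the name columns as well as human_id keeps the statement valid under stricter GROUP BY rules, while still producing one row per member.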
2. In the preceding exercise, you were asked to count the number of sightings members made of birds from the Galliformes order. To make the contest more fun, instead of giving one point for each sighting in that order, give a point for only one bird species per bird family in the order. That means that a member doesn’t get more points for sighting the same bird species multiple times. A member also doesn’t get more points for spotting several birds in the same family. Instead, the member has to look through bird guides to find a species for each family and then go looking for one from each family in their area. This should make the contest more of an adventure for the members.
To allow for the change to the contest, you will need to modify the SQL statement you constructed at the end of the previous exercise. First, add DISTINCT to the start of the column list, and remove the CONCAT(), the GROUP BY, and the COUNT() that depends on it. For DISTINCT to collapse repeat sightings within a family, the column list must also include the bird family. When you’ve done that, execute the SQL statement to make sure you have no errors. You should get a results set that shows multiple entries for some members. Next, place the whole SQL statement inside another SQL statement to make it a subquery. The new outer query should include CONCAT() and GROUP BY so that it can count the single entries from each family for each member. It should return results like this:
+--------------+--------+
| Birder       | Points |
+--------------+--------+
| Elena Bokova |      1 |
| Marie Dyer   |      5 |
+--------------+--------+
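Here is a sketch of the two-level query for this exercise, using the same made-up tables and sqlite3 setup. The inner query returns one row per member per Galliformes family, because DISTINCT applies to the combination of member columns and family_id; the outer query then counts those rows per member. MySQL requires an alias on a subquery in the FROM clause; member_families is just a name chosen for this example, and in MySQL the outer || would again be CONCAT().

```python
import sqlite3

# Made-up rows; table layouts are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE humans (human_id INTEGER PRIMARY KEY, name_first TEXT, name_last TEXT);
    CREATE TABLE bird_orders (order_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE bird_families (family_id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE birds (bird_id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE bird_sightings (sighting_id INTEGER PRIMARY KEY, human_id INTEGER, bird_id INTEGER);
    INSERT INTO humans VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park');
    INSERT INTO bird_orders VALUES (1, 'Galliformes'), (2, 'Anseriformes');
    INSERT INTO bird_families VALUES (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (20, 2);
    INSERT INTO birds VALUES (100, 10), (101, 10), (102, 11), (103, 12), (104, 13), (105, 14), (200, 20);
    INSERT INTO bird_sightings (human_id, bird_id) VALUES
        (1, 100), (1, 100), (1, 101), (1, 200),
        (2, 100), (2, 102), (2, 103), (2, 104), (2, 105), (2, 105);
""")

# Inner query: one row per (member, Galliformes family) pair.
inner = """
    SELECT DISTINCT h.human_id, h.name_first, h.name_last, f.family_id
    FROM bird_sightings AS s
    JOIN humans AS h ON h.human_id = s.human_id
    JOIN birds AS b ON b.bird_id = s.bird_id
    JOIN bird_families AS f ON f.family_id = b.family_id
    JOIN bird_orders AS o ON o.order_id = f.order_id
    WHERE o.scientific_name = 'Galliformes'
"""
inner_rows = conn.execute(inner).fetchall()
print(len(inner_rows))  # 6: one for member 1, five for member 2

# Outer query: count the distinct families per member.
outer = f"""
    SELECT name_first || ' ' || name_last AS Birder, COUNT(*) AS Points
    FROM ({inner}) AS member_families
    GROUP BY human_id, name_first, name_last
    ORDER BY Birder
"""
rows = conn.execute(outer).fetchall()
print(rows)  # [('Ann Lee', 1), ('Bo Park', 5)]
```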
3. There are five families in the Galliformes bird order. For the contest described in the last two exercises, the most points a member could therefore achieve is 5.
Change the SQL statement you entered at the end of the previous exercise to list only members who have 5 points. To do this, wrap the previous SQL statement inside another, creating a nested query. When you execute the full SQL statement, the results should look like this:
+------------+--------+
| Birder     | Points |
+------------+--------+
| Marie Dyer |      5 |
+------------+--------+
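Wrapping that statement once more and filtering on Points gives the final step. The sketch below keeps the same made-up tables and simplified columns; scores is an example alias, and the ORDER BY has been dropped from the middle query because ordering inside a derived table is not guaranteed to carry through.

```python
import sqlite3

# Made-up rows; table layouts are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE humans (human_id INTEGER PRIMARY KEY, name_first TEXT, name_last TEXT);
    CREATE TABLE bird_orders (order_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE bird_families (family_id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE birds (bird_id INTEGER PRIMARY KEY, family_id INTEGER);
    CREATE TABLE bird_sightings (sighting_id INTEGER PRIMARY KEY, human_id INTEGER, bird_id INTEGER);
    INSERT INTO humans VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park');
    INSERT INTO bird_orders VALUES (1, 'Galliformes'), (2, 'Anseriformes');
    INSERT INTO bird_families VALUES (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (20, 2);
    INSERT INTO birds VALUES (100, 10), (101, 10), (102, 11), (103, 12), (104, 13), (105, 14), (200, 20);
    INSERT INTO bird_sightings (human_id, bird_id) VALUES
        (1, 100), (1, 100), (1, 101), (1, 200),
        (2, 100), (2, 102), (2, 103), (2, 104), (2, 105), (2, 105);
""")

# Innermost: one row per (member, Galliformes family) pair.
inner = """
    SELECT DISTINCT h.human_id, h.name_first, h.name_last, f.family_id
    FROM bird_sightings AS s
    JOIN humans AS h ON h.human_id = s.human_id
    JOIN birds AS b ON b.bird_id = s.bird_id
    JOIN bird_families AS f ON f.family_id = b.family_id
    JOIN bird_orders AS o ON o.order_id = f.order_id
    WHERE o.scientific_name = 'Galliformes'
"""
# Middle: points per member (the previous exercise's statement).
points = f"""
    SELECT name_first || ' ' || name_last AS Birder, COUNT(*) AS Points
    FROM ({inner}) AS member_families
    GROUP BY human_id, name_first, name_last
"""
# Outermost: keep only members with a sighting from all five families.
winners = f"SELECT Birder, Points FROM ({points}) AS scores WHERE Points = 5"

rows = conn.execute(winners).fetchall()
print(rows)  # [('Bo Park', 5)]
```

A HAVING COUNT(*) = 5 clause on the middle query would give the same result without another level of nesting, but the nested form is what this exercise asks you to practice.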