Suppose we want to get a list of species of Geese whose existence is Threatened (that’s a category of conservation states). We will need to construct a SELECT statement that takes data from the birds table and the conservation_status table. The shared data in the birds and the conservation_status tables is the conservation_status_id column of each table. We didn’t have to give the column the same name in each table, but doing so makes it easier to know where to join them.
Enter the following in the mysql client:
SELECT common_name, conservation_state
FROM birds
JOIN conservation_status
ON(birds.conservation_status_id = conservation_status.conservation_status_id)
WHERE conservation_category = 'Threatened'
AND common_name LIKE '%Goose%';
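+----------------------------+--------------------+
| common_name                | conservation_state |
+----------------------------+--------------------+
| Swan Goose                 | Vulnerable         |
| Lesser White-fronted Goose | Vulnerable         |
| Hawaiian Goose             | Vulnerable         |
| Red-breasted Goose         | Endangered         |
| Blue-winged Goose          | Vulnerable         |
+----------------------------+--------------------+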
The ON operator specifies the conservation_status_id columns from each table as the common item on which to join the tables. Because the conservation_category and common_name columns each exist in only one of the two tables, MySQL knows which table to look in for them, and pulls the rows that match.
That works fine, but it’s a lot to type. Let’s modify this statement to use the USING operator, specifying conservation_status_id just once to make the join. MySQL will understand what to do. Here’s that same SQL statement, but with the USING operator:
SELECT common_name, conservation_state
FROM birds
JOIN conservation_status USING(conservation_status_id)
WHERE conservation_category = 'Threatened'
AND common_name LIKE '%Goose%';
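One visible difference between ON and USING shows up when all columns are selected. The following is a minimal sketch that assumes hypothetical scratch tables (demo_birds and demo_status, with invented rows; they are not part of the rookery database), created as temporary tables so nothing permanent is changed.

```sql
-- Hypothetical scratch tables; names and rows are invented for illustration.
CREATE TEMPORARY TABLE demo_status (
  status_id INT PRIMARY KEY,
  state VARCHAR(30)
);
CREATE TEMPORARY TABLE demo_birds (
  bird_id INT PRIMARY KEY,
  name VARCHAR(50),
  status_id INT
);
INSERT INTO demo_status VALUES (1, 'Endangered'), (2, 'Vulnerable');
INSERT INTO demo_birds VALUES
  (1, 'Red-breasted Goose', 1),
  (2, 'Swan Goose', 2);

-- With ON, SELECT * returns status_id twice, once from each table.
SELECT * FROM demo_birds
JOIN demo_status ON(demo_birds.status_id = demo_status.status_id);

-- With USING, the shared column is returned only once.
SELECT * FROM demo_birds
JOIN demo_status USING(status_id);
```

Both statements match the same rows; only the set of columns returned by SELECT * differs.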
Now let’s modify the SQL statement to include the bird family. To do that, we’ll have to add another table, bird_families. Let’s also include Ducks in the list. Try executing the following:
SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family',
conservation_state AS 'Status'
FROM birds
JOIN conservation_status USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND common_name REGEXP 'Goose|Duck'
ORDER BY Status, Bird;
+----------------------------+----------+-----------------------+
| Bird                       | Family   | Status                |
+----------------------------+----------+-----------------------+
| Laysan Duck                | Anatidae | Critically Endangered |
| Pink-headed Duck           | Anatidae | Critically Endangered |
| Blue Duck                  | Anatidae | Endangered            |
| Hawaiian Duck              | Anatidae | Endangered            |
| Meller's Duck              | Anatidae | Endangered            |
| Red-breasted Goose         | Anatidae | Endangered            |
| White-headed Duck          | Anatidae | Endangered            |
| White-winged Duck          | Anatidae | Endangered            |
| Blue-winged Goose          | Anatidae | Vulnerable            |
| Hawaiian Goose             | Anatidae | Vulnerable            |
| Lesser White-fronted Goose | Anatidae | Vulnerable            |
| Long-tailed Duck           | Anatidae | Vulnerable            |
| Philippine Duck            | Anatidae | Vulnerable            |
| Swan Goose                 | Anatidae | Vulnerable            |
| West Indian Whistling-Duck | Anatidae | Vulnerable            |
| White-headed Steamer-Duck  | Anatidae | Vulnerable            |
+----------------------------+----------+-----------------------+
We gave two JOIN clauses in this SQL statement. It doesn’t usually matter which table is listed where. For instance, although bird_families is listed just after the join for the conservation_status table, MySQL determined that bird_families is to be joined to the birds table, because the family_id column named in USING is found in both of those tables. Without using JOIN, we would have to be more emphatic in specifying the join points, and we would have to list them in the WHERE clause. It would have to be entered like this:
SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family',
conservation_state AS 'Status'
FROM birds, conservation_status, bird_families
WHERE birds.conservation_status_id = conservation_status.conservation_status_id
AND birds.family_id = bird_families.family_id
AND conservation_category = 'Threatened'
AND common_name REGEXP 'Goose|Duck'
ORDER BY Status, Bird;
That’s a very cluttered WHERE clause, making it difficult to see clearly the conditions by which we’re selecting data from the tables. Using JOIN clauses is much tidier.
Incidentally, the SQL statement with two JOIN clauses used a regular expression (the REGEXP operator in the WHERE clause) to specify that the clause match common names containing either Goose or Duck. We also added an ORDER BY clause to order first by Status, then by Bird name.
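To see how the alternation in that pattern behaves, here is a minimal sketch that tests the pattern against string literals, so it needs no tables. The bird names are just sample strings; each expression returns 1 for a match and 0 otherwise.

```sql
-- The | in the pattern means either alternative may match, anywhere in the string.
SELECT 'Hawaiian Goose' REGEXP 'Goose|Duck';   -- returns 1
SELECT 'Laysan Duck' REGEXP 'Goose|Duck';      -- returns 1
SELECT 'Great Egret' REGEXP 'Goose|Duck';      -- returns 0

-- The same test written with LIKE needs two conditions joined by OR.
SELECT 'Laysan Duck' LIKE '%Goose%'
    OR 'Laysan Duck' LIKE '%Duck%';            -- returns 1
```

If the LIKE form is combined with other AND conditions in a WHERE clause, the OR pair should be wrapped in parentheses, because AND binds more tightly than OR.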
In this example, though, there’s little point in listing the bird family name, because the birds are all of the same family. Plus, there may be similar birds that we might like to have in the list, but that don’t have the words Goose or Duck in their name. So let’s change that in the SQL statement. Let’s also order the results differently and list birds from the least endangered to the most endangered. Enter the following:
SELECT common_name AS 'Bird from Anatidae',
conservation_state AS 'Conservation Status'
FROM birds
JOIN conservation_status AS states USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND bird_families.scientific_name = 'Anatidae'
ORDER BY states.conservation_status_id DESC, common_name ASC;
+----------------------------+-----------------------+
| Bird from Anatidae         | Conservation Status   |
+----------------------------+-----------------------+
| Auckland Islands Teal      | Vulnerable            |
| Blue-winged Goose          | Vulnerable            |
| Eaton's Pintail            | Vulnerable            |
| Hawaiian Goose             | Vulnerable            |
| Lesser White-fronted Goose | Vulnerable            |
| Long-tailed Duck           | Vulnerable            |
| Marbled Teal               | Vulnerable            |
| Philippine Duck            | Vulnerable            |
| Salvadori's Teal           | Vulnerable            |
| Steller's Eider            | Vulnerable            |
| Swan Goose                 | Vulnerable            |
| West Indian Whistling-Duck | Vulnerable            |
| White-headed Steamer-Duck  | Vulnerable            |
| Bernier's Teal             | Endangered            |
| Blue Duck                  | Endangered            |
| Brown Teal                 | Endangered            |
| Campbell Islands Teal      | Endangered            |
| Hawaiian Duck              | Endangered            |
| Meller's Duck              | Endangered            |
| Red-breasted Goose         | Endangered            |
| Scaly-sided Merganser      | Endangered            |
| White-headed Duck          | Endangered            |
| White-winged Duck          | Endangered            |
| White-winged Scoter        | Endangered            |
| Baer's Pochard             | Critically Endangered |
| Brazilian Merganser        | Critically Endangered |
| Crested Shelduck           | Critically Endangered |
| Laysan Duck                | Critically Endangered |
| Madagascar Pochard         | Critically Endangered |
| Pink-headed Duck           | Critically Endangered |
+----------------------------+-----------------------+
An obvious change to this example is the elimination of bird_families.scientific_name from the list of selected columns, so only two columns appear in the output. Another change, which is cosmetic, is to provide the alias states to the conservation_status table so we could refer to the short alias later instead of the long name.
Finally, the ORDER BY clause orders the output by conservation_status_id, because that value happens to be in the order of severity in the conservation_status table. We want to override the default order, which puts the most threatened species first, so we add the DESC option to put the least threatened first. We’re still ordering results secondarily by the common name of the birds, but using the actual column name this time instead of an alias. This is because we changed the alias for the common_name column from Bird to Bird from Anatidae, because all the results are in that family. We could have referred to the new alias in the ORDER BY clause, but because it contains spaces it would have to be quoted as an identifier (with backticks), and that’s bothersome to type.
Let’s look at one more basic example of a JOIN. Suppose we wanted to get a list of members located in Russia (i.e., where country_id has a value of ru) who have reported sighting a bird from the Scolopacidae family (shore and wader birds like Sandpipers and Curlews). Information on bird sightings is stored in the bird_sightings table. It includes GPS coordinates recorded from a bird list application on the member’s mobile phone when they note the sighting. Enter this SQL statement:
SELECT CONCAT(name_first, ' ', name_last) AS Birder,
common_name AS Bird,
location_gps AS 'Location of Sighting'
FROM birdwatchers.humans
JOIN birdwatchers.bird_sightings USING(human_id)
JOIN rookery.birds USING(bird_id)
JOIN rookery.bird_families USING(family_id)
WHERE country_id = 'ru'
AND bird_families.scientific_name = 'Scolopacidae'
ORDER BY Birder;
+-------------------+-------------------+---------------------------+
| Birder            | Bird              | Location of Sighting      |
+-------------------+-------------------+---------------------------+
| Anahit Vanetsyan  | Bar-tailed Godwit | 42.81958072; 133.02246094 |
| Elena Bokova      | Eurasian Curlew   | 51.70469364; 58.63746643  |
| Elena Bokova      | Eskimo Curlew     | 66.16051056; -162.7734375 |
| Katerina Smirnova | Eurasian Curlew   | 42.69096856; 130.78185081 |
+-------------------+-------------------+---------------------------+
This SQL statement joins four tables, two from the birdwatchers database and two from the rookery database. Look closely at this SQL statement and consider the purpose of including each of those four tables. All of them were needed to assemble the results shown. Incidentally, we used the CONCAT() function to concatenate the member’s first and last names for the Birder field in the results.
There are other types of joins besides a plain JOIN. Let’s do another SELECT using another type of JOIN. For an example of this, we’ll get a list of Egrets and their conservation status. Enter the following SQL statement:
SELECT common_name AS 'Bird',
conservation_state AS 'Status'
FROM birds
LEFT JOIN conservation_status USING(conservation_status_id)
WHERE common_name LIKE '%Egret%'
ORDER BY Status, Bird;
+--------------------+-----------------+
| Bird               | Status          |
+--------------------+-----------------+
| Great Egret        | NULL            |
| Cattle Egret       | Least Concern   |
| Intermediate Egret | Least Concern   |
| Little Egret       | Least Concern   |
| Snowy Egret        | Least Concern   |
| Reddish Egret      | Near Threatened |
| Chinese Egret      | Vulnerable      |
| Slaty Egret        | Vulnerable      |
+--------------------+-----------------+
This SELECT statement is like the previous examples, except that instead of using a JOIN, we’re using a LEFT JOIN. This type of join selects rows in the table on the left (i.e., birds) regardless of whether there is a matching row in the table on the right (i.e., conservation_status). When there is no match on the right, MySQL returns a NULL value for the columns that would have come from the table on the right. You can see this in the results: the Great Egret has a value of NULL for its Status. This is because no value was entered in the conservation_status_id column of the row related to that bird species. The Status would be NULL whenever that column is NULL, is empty (e.g., ''), or contains any value that has no match in the right table.
Because of the LEFT JOIN, the results show all birds with the word Egret in the common name even if we don’t know their conservation status. They also indicate which Egrets still need a value set for conservation_status_id. We’ll need to update that row and others like it. An UPDATE statement with this same LEFT JOIN can easily do that. We’ll show a couple in the next section.
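A common follow-up is to list only the rows that found no match. The following is a minimal sketch that assumes hypothetical scratch tables (demo_egrets and demo_egret_status, with invented rows; they are not the rookery tables). It also assumes the state column can never be NULL in the status table, so a NULL in the joined result can only come from a missing match.

```sql
-- Hypothetical scratch tables; names and rows are invented for illustration.
CREATE TEMPORARY TABLE demo_egret_status (
  status_id INT PRIMARY KEY,
  state VARCHAR(30) NOT NULL
);
CREATE TEMPORARY TABLE demo_egrets (
  bird_id INT PRIMARY KEY,
  name VARCHAR(50),
  status_id INT
);
INSERT INTO demo_egret_status VALUES (1, 'Least Concern'), (2, 'Vulnerable');
INSERT INTO demo_egrets VALUES
  (1, 'Great Egret', NULL),
  (2, 'Little Egret', 1),
  (3, 'Slaty Egret', 2);

-- Keep only the left-hand rows that found no partner on the right.
SELECT name
FROM demo_egrets
LEFT JOIN demo_egret_status USING(status_id)
WHERE state IS NULL;
```

Against these sample rows, the final SELECT returns only Great Egret, the one row whose status_id has no match.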