
Expressions and the Like

From the book Learning MySQL and MariaDB (pages 154-158)

Let’s change the latest SELECT statement to include birds from multiple orders. To do this, we’ll focus on the condition in the WHERE clause for the common_name column:

AND common_name != ''

We’ll replace this simple comparison with the LIKE operator, which we saw in Chapter 6, to select multiple names that are similar. Within many families of birds, there are often species that are similar but differ in size. The smallest is sometimes referred to as the least in its common name. So let’s search the database for birds whose common name starts with Least:

SELECT common_name AS 'Bird',
families.scientific_name AS 'Family',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND common_name LIKE 'Least%'
ORDER BY orders.scientific_name, families.scientific_name, common_name
LIMIT 10;

+------------------+---------------+------------------+
| Bird             | Family        | Order            |
+------------------+---------------+------------------+
| Least Nighthawk  | Caprimulgidae | Caprimulgiformes |
| Least Pauraque   | Caprimulgidae | Caprimulgiformes |
| Least Auklet     | Alcidae       | Charadriiformes  |
| Least Tern       | Laridae       | Charadriiformes  |
| Least Sandpiper  | Scolopacidae  | Charadriiformes  |
| Least Seedsnipe  | Thinocoridae  | Charadriiformes  |
| Least Flycatcher | Tyrannidae    | Passeriformes    |
| Least Bittern    | Ardeidae      | Pelecaniformes   |
| Least Honeyguide | Indicatoridae | Piciformes       |
| Least Grebe      | Podicipedidae | Podicipediformes |
+------------------+---------------+------------------+

In the preceding example, using the LIKE operator, MySQL selected rows in which the common_name starts with Least, followed by anything (the % wildcard matches any sequence of characters, including none). We also removed the families.order_id = 102 clause, so that we wouldn’t limit the birds to a single order. The results now have birds from several different orders.
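As a quick way to see how the LIKE wildcards behave, the following sketch compares literal strings instead of rows from the birds table, so it can be run on any server. The strings are chosen here purely for illustration. Besides %, LIKE also accepts the underscore (_) wildcard, which matches exactly one character.

```sql
-- % matches any sequence of characters, including none.
SELECT 'Least Tern' LIKE 'Least%';     -- 1: the name starts with Least
SELECT 'Least' LIKE 'Least%';          -- 1: % may match nothing at all
SELECT 'Little Tern' LIKE 'Least%';    -- 0: the name does not start with Least

-- _ matches exactly one character.
SELECT 'Least Tern' LIKE 'Least Ter_'; -- 1: _ stands in for the final n
```

Each statement returns 1 when the string matches the pattern and 0 when it does not.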

We also changed the ORDER BY clause to have MySQL order the results in the temporary table first by the bird order’s scientific name, then by the bird family’s scientific name, and then by the bird’s common name. If you look at the results, you can see that’s what it did: it sorted the orders first. If you look at the rows for the Charadriiformes, you can see that the families for that order are in alphabetical order. The two birds in the Caprimulgidae family are in alphabetical order.

NOTE

You can use column aliases in the ORDER BY clause (although not in the WHERE clause), and you can use table aliases there as well. In fact, table aliases are required if you’ve used them in the FROM clause.

The previous example used the LIKE operator, which has limited pattern matching abilities. As an alternative, you can use REGEXP, which has many pattern matching characters and classes. Let’s look at a simpler version of the previous SELECT statement, but using REGEXP. In the previous example we searched for small birds, birds with a common name starting with the word Least. The largest bird in a family is typically called Great. To add these birds, enter the following SQL statement on your server:

SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
ORDER BY family_id LIMIT 10;

+-----------------------------+
| Birds Great and Small       |
+-----------------------------+
| Great Northern Loon         |
| Greater Scaup               |
| Greater White-fronted Goose |
| Greater Sand-Plover         |
| Great Crested Tern          |
| Least Tern                  |
| Great Black-backed Gull     |
| Least Nighthawk             |
| Least Pauraque              |
| Great Slaty Woodpecker      |
+-----------------------------+

The expression we’re giving with REGEXP, within the quote marks, contains two string values: Great and Least. Note that REGEXP matches its pattern anywhere in the string, not only at the start. To require that a name begin with one of these values, insert a caret (i.e., ^) at the start of each (e.g., '^Great|^Least'). The vertical bar (i.e., |) between the two expressions signifies that either value is acceptable: it means or.
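To check how REGEXP treats the position of a match on your own server, a sketch like the following may help. It uses literal strings, chosen here for illustration, and tests patterns with and without a caret.

```sql
-- Without a caret, the pattern may match anywhere in the string.
SELECT 'Least Tern' REGEXP 'Tern';              -- 1: Tern is found at the end
-- With a caret, the match must begin at the start of the string.
SELECT 'Least Tern' REGEXP '^Tern';             -- 0: the name does not start with Tern
-- Anchoring each alternative, or a parenthesized group of alternatives.
SELECT 'Greater Scaup' REGEXP '^Great|^Least';  -- 1
SELECT 'Greater Scaup' REGEXP '^(Great|Least)'; -- 1
```

The last two patterns are equivalent; the parentheses simply let one caret apply to both alternatives.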

In the results, you can see some common bird names starting with Greater, not just Great.

If we don’t want to include the Greater birds, we can exclude them with the NOT REGEXP operator. Enter the following on your server:

SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
AND common_name NOT REGEXP 'Greater'
ORDER BY family_id LIMIT 10;

+--------------------------+
| Birds Great and Small    |
+--------------------------+
| Great Northern Loon      |
| Least Tern               |
| Great Black-backed Gull  |
| Great Crested Tern       |
| Least Nighthawk          |
| Least Pauraque           |
| Great Slaty Woodpecker   |
| Great Spotted Woodpecker |
| Great Black-Hawk         |
| Least Flycatcher         |
+--------------------------+

Using NOT REGEXP eliminated all of the Greater birds. Notice that it was added as a separate condition with AND, and not as part of the REGEXP pattern.

Incidentally, we’re ordering here by family_id to keep similar birds together in the list and to have a good mix of Great and Least birds. The results may seem awkward, though, as the names of the birds are not ordered. We could add another column to the ORDER BY clause to alphabetize them within each family.
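As a sketch of that adjustment, the previous statement could be extended with common_name as a second sort column. Only the ORDER BY clause differs from the statement shown earlier; the rows returned will depend on the data in your birds table.

```sql
SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
AND common_name NOT REGEXP 'Greater'
-- Sort by family first, then alphabetically by name within each family.
ORDER BY family_id, common_name
LIMIT 10;
```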

By default, REGEXP and NOT REGEXP are case insensitive. If we want an expression to be case sensitive, we’ll need to add the BINARY option. Let’s get another list of birds to see this. This time we’ll search for Hawks, with the first letter in uppercase. This is because we want only Hawks and not other birds that have the word hawk in their name but are not a Hawk. For instance, we don’t want Nighthawks and we don’t want Hawk-Owls. The way the data is in the birds table, each word of a common name starts with an uppercase letter: the names are in title case. So we’ll eliminate birds such as Nighthawks by using the BINARY option to require that “Hawk” be spelled with an uppercase H and the other letters in lowercase. We’ll use NOT REGEXP to exclude Hawk-Owls. Try the following on your server:

SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP BINARY 'Hawk'
AND common_name NOT REGEXP 'Hawk-Owl'
ORDER BY family_id LIMIT 10;

+-------------------+
| Hawks             |
+-------------------+
| Red-tailed Hawk   |
| Bicolored Hawk    |
| Common Black-Hawk |
| Cuban Black-Hawk  |
| Rufous Crab Hawk  |
| Great Black-Hawk  |
| Black-faced Hawk  |
| White-browed Hawk |
| Ridgway's Hawk    |
| Broad-winged Hawk |
+-------------------+

I stated that REGEXP and NOT REGEXP are case insensitive unless you add the BINARY option, as we did, to stipulate the collating method as binary (e.g., the letter H has a different binary value from the letter h). For the common_name column, though, we didn’t need to add the BINARY option because the column has a binary collation setting. We did this unknowingly when we created the rookery database near the beginning of Chapter 4. To see how we created the database, enter this from the mysql client:

SHOW CREATE DATABASE rookery \G

*************************** 1. row ***************************

Database: rookery

Create Database: CREATE DATABASE `rookery` /*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_bin */

The COLLATE clause is set to latin1_bin, meaning Latin1 binary. Any columns that we create in tables in the rookery database, unless we specify otherwise, will be collated using latin1_bin. Execute the following statement to see how the common_name column in the birds table is set:

SHOW FULL COLUMNS
FROM birds LIKE 'common_name' \G

*************************** 1. row ***************************
     Field: common_name
      Type: varchar(255)
 Collation: latin1_bin
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
   Comment:

This shows information just on the common_name column. Notice that the Collation is latin1_bin. Because of that, regular expressions using REGEXP are case sensitive without having to add the BINARY option.
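If you ever want a case-insensitive match against a column like this one, one option is to override the collation for a single comparison with a COLLATE clause. The following is only a sketch: it assumes the latin1_general_ci collation is available on your server and that the column uses the latin1 character set, as shown above. The lowercase pattern hawk is chosen for illustration.

```sql
-- Uses the column's own latin1_bin collation, so the match is case sensitive:
-- only names containing lowercase hawk (e.g., Nighthawk) can match.
SELECT common_name FROM birds
WHERE common_name REGEXP 'hawk'
LIMIT 10;

-- Overrides the collation for this comparison only (assumed collation name),
-- so the match ignores case: Hawk and hawk are treated alike.
SELECT common_name FROM birds
WHERE common_name COLLATE latin1_general_ci REGEXP 'hawk'
LIMIT 10;
```

The column definition itself is not changed by this; the override applies only within the statement.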

Looking through the birds table, we discover some common names for birds that contain the words “Hawk Owl” without the hyphen in between. We didn’t allow for that in the expression we gave. We also discover that there are birds in which the word “Hawk” is not in title case, so we can’t count on looking for the uppercase letter H. Our previous regular expression left those birds out of the results. So we’ll have to change the expression and try a different method. Enter this on your server:

SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP '[[:space:]]Hawk|[[.hyphen.]]Hawk'
AND common_name NOT REGEXP 'Hawk-Owl|Hawk Owl'
ORDER BY family_id;

This first, rather long REGEXP expression uses a character class and a character name. The format for both is to enclose the class or name in double square brackets. A character class is given between a pair of colons (e.g., [[:alpha:]] for alphabetic characters). A character name is given between two dots (e.g., [[.hyphen.]] for a hyphen). Looking at the first expression, you can deduce that we want rows in which the common_name contains either “ Hawk” or “-Hawk”, that is to say, Hawk preceded by a space or a hyphen. This won’t allow for Hawk preceded by a letter (e.g., Nighthawk). The second expression excludes Hawk-Owl and Hawk Owl.
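The first expression repeats Hawk in both alternatives. As a sketch of an equivalent, slightly shorter form, the alternation can be limited to the preceding character by grouping it in parentheses. The literal strings below are chosen for illustration only, so the statements can be tried without the birds table.

```sql
-- Hawk preceded by whitespace or by a literal hyphen.
SELECT 'Red-tailed Hawk'   REGEXP '([[:space:]]|-)Hawk'; -- 1: space before Hawk
SELECT 'Common Black-Hawk' REGEXP '([[:space:]]|-)Hawk'; -- 1: hyphen before Hawk
SELECT 'Least Nighthawk'   REGEXP '([[:space:]]|-)Hawk'; -- 0: a letter precedes hawk
```

The same pattern could be substituted for the first expression in the WHERE clause of the previous statement.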

Pattern matching with regular expressions in MySQL tends to be more verbose than it is in other languages like Perl or PHP. But it does work for basic requirements. For elaborate regular expressions, you’ll have to use an API like the Perl DBI to process the data outside of MySQL. Because that may be a performance hit, it’s better to try to accomplish such tasks within MySQL using REGEXP.
