Combining SQL and JSON - Indexing JSON Data

to you—it’s like column definitions for tables. The difference is we append a path expression on the end with the keyword PATH. This simply locates the value in the JSON document.
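As a minimal sketch, the following query applies JSON_TABLE() to a literal JSON array. The data, the column names `maker` and `volts`, and the alias `jt` are made up for illustration. Each entry in the COLUMNS clause reads like a column definition, and the PATH expression after it picks the value for that column out of each array element matched by `'$[*]'`.

```sql
-- Illustrative data only: a literal JSON array of two objects.
SELECT jt.*
FROM JSON_TABLE(
  '[{"maker": "Jasper", "volts": 110}, {"maker": "Genie", "volts": 220}]',
  '$[*]'                                  -- one output row per array element
  COLUMNS (
    row_num FOR ORDINALITY,               -- counter column: 1, 2, ...
    maker   VARCHAR(30) PATH '$.maker',   -- value of the "maker" key
    volts   INT         PATH '$.volts'    -- value of the "volts" key
  )
) AS jt;
```

This should return two rows, one per object in the array, with the columns `row_num`, `maker`, and `volts`.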

As you can imagine, you can build complex definitions that drill down to precisely the elements you want. Because the function is a recent addition (it arrived in MySQL 8.0), demand and use cases for it will likely grow. In the meantime, if you need to turn a JSON document into a result set, this function can do it, albeit with some creativity in your path expressions.
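One way to drill down is a NESTED PATH clause, which expands an array inside the document into multiple rows. The sketch below uses a literal document modeled loosely on the thermostat data later in this chapter. The values and the alias `jt` are illustrative assumptions.

```sql
-- Illustrative data modeled on a thermostat document.
SELECT jt.*
FROM JSON_TABLE(
  '{"model": "ODX-123", "modes": ["ac", "furnace"]}',
  '$'
  COLUMNS (
    model VARCHAR(20) PATH '$.model',
    -- Drill into the "modes" array: one output row per array element.
    NESTED PATH '$.modes[*]' COLUMNS (
      mode VARCHAR(20) PATH '$'
    )
  )
) AS jt;
```

The outer `model` value is repeated for each nested match, so the query should return the rows (ODX-123, ac) and (ODX-123, furnace).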

For more information about the JSON_TABLE() function, see the section entitled “JSON Table Functions” in the online MySQL reference manual.

Tip For more information about JSON functions, see the online MySQL reference manual.


WHAT ABOUT CONVERTING TEXT TO JSON?

If you have a database in which you have stored semistructured data in a TEXT or BLOB field, you may want to consider converting the data to JSON documents. The JSON functions we’ve seen in this chapter, such as JSON_ARRAY(), JSON_OBJECT(), and JSON_VALID(), are your key to converting the data successfully. I will discuss this topic further in Chapter 9, including suggestions and examples for converting existing data. You may also want to check out blogs on converting data to JSON; just search for phrases similar to “convert to JSON.” Although most of these blogs are Java-based, you can use them to get ideas for converting your own data.
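To make the idea concrete, here is a rough sketch of one possible conversion path. It is not the approach from Chapter 9. The table `legacy_devices` and its columns `id`, `name`, `color`, and `details` are hypothetical. JSON_VALID() finds text that would not survive the conversion, ALTER TABLE changes the column type once the data is clean, and JSON_OBJECT() builds documents from ordinary columns.

```sql
-- Hypothetical table holding JSON-like text in a TEXT column:
--   legacy_devices(id INT PRIMARY KEY, name VARCHAR(30),
--                  color VARCHAR(20), details TEXT)

-- 1. List rows whose text is not valid JSON. NULLs are skipped here
--    because JSON_VALID(NULL) returns NULL, not 0.
SELECT id, details
FROM legacy_devices
WHERE JSON_VALID(details) = 0;

-- 2. After fixing those rows, change the column type. MySQL parses each
--    value during the conversion and rejects the ALTER if any value is
--    not valid JSON.
ALTER TABLE legacy_devices MODIFY details JSON;

-- 3. Alternatively, build JSON documents from ordinary columns.
SELECT JSON_OBJECT('name', name, 'color', color) AS doc
FROM legacy_devices;
```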

Some may consider the restriction that prohibits indexing JSON columns an oversight, but it isn’t. Consider that JSON documents are semistructured data that need not conform to any specific layout. That is, one row could contain a JSON document that not only has different keys from the document in another row but also arranges its elements in a different order.

Although this isn’t necessarily a showstopper for indexing, and despite the special internal mechanism used to access data in the document, indexing JSON documents directly would be cumbersome and would likely perform poorly. However, all is not lost.

MySQL 5.7 introduced a new feature called generated columns (sometimes called virtual columns).

Generated columns are columns whose values are computed from an expression given in the CREATE TABLE or ALTER TABLE statement that defines them. There are two types of generated columns: virtual generated columns, which are computed on demand when rows are read and use no additional storage, and stored generated columns, which are computed when rows are inserted or updated and are stored in the rows. Virtual generated columns use the VIRTUAL option, and stored generated columns use the STORED option in the CREATE TABLE or ALTER TABLE statement.
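The difference between the two options is easiest to see side by side. The following sketch uses a hypothetical table, `test.boxes`, that has nothing to do with JSON. It defines one virtual and one stored generated column over the same base columns.

```sql
-- Hypothetical table used only to contrast the two kinds of generated column.
CREATE TABLE `test`.`boxes` (
  `id`     INT NOT NULL PRIMARY KEY,
  `width`  INT NOT NULL,
  `height` INT NOT NULL,
  -- Computed each time the row is read; occupies no space in the row.
  `area` INT GENERATED ALWAYS AS (`width` * `height`) VIRTUAL,
  -- Computed when the row is inserted or updated and written with the row.
  `perimeter` INT GENERATED ALWAYS AS (2 * (`width` + `height`)) STORED
) ENGINE=InnoDB;

-- Supply values only for the base columns; MySQL computes the generated ones.
INSERT INTO `test`.`boxes` (`id`, `width`, `height`) VALUES (1, 3, 4);

SELECT * FROM `test`.`boxes`;  -- area = 12, perimeter = 14
```

Both columns read back the same way. The choice affects only when the value is computed and whether it takes space in the row.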

So how does this work? We create a generated column that extracts data from the JSON document and then create an index on that column, which the optimizer can use to find rows more quickly. That is, if you want to group or order by an element of the JSON data, or search for the subset of rows that match a condition on it, an index on the generated column gives the optimizer a faster way to retrieve the data.

Chapter 3 JSON Documents

Let’s see an example. The following shows a table I created to store information in a JSON column.

CREATE TABLE `test`.`thermostats` (
  `model_number` char(20) NOT NULL,
  `manufacturer` char(30) DEFAULT NULL,
  `capabilities` json DEFAULT NULL,
  PRIMARY KEY (`model_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `test`.`thermostats` VALUES ('AB-90125-C1', 'Jasper',
  '{"rpm": 1500, "color": "beige", "modes": ["ac"], "voltage": 110, "capability": "auto fan"}');
INSERT INTO `test`.`thermostats` VALUES ('ODX-123', 'Genie',
  '{"rpm": 3000, "color": "white", "modes": ["ac", "furnace"], "voltage": 220, "capability": "fan"}');

Note that this table has a single JSON column, capabilities, along with two character columns: the model number, which is also the primary key, and the manufacturer. The rows contain JSON data such as the following in the capabilities column.

MySQL localhost:33060+ ssl SQL > SELECT * FROM `test`.`thermostats` LIMIT 2 \G
*************************** 1. row ***************************
model_number: AB-90125-C1
manufacturer: Jasper
capabilities: {"rpm": 1500, "color": "beige", "modes": ["ac"], "voltage": 110, "capability": "auto fan"}
*************************** 2. row ***************************
model_number: ODX-123
manufacturer: Genie
capabilities: {"rpm": 3000, "color": "white", "modes": ["ac", "furnace"], "voltage": 220, "capability": "fan"}
2 rows in set (0.00 sec)

Now suppose we want to select rows by one or more of the data elements in the JSON document. For example, suppose we want to locate rows for fans that operate at 110 volts. If the table contains hundreds of thousands or even tens of millions of rows and there is no index, the optimizer must read all the rows (a table scan). However, if there is an index on a generated column that extracts the voltage element, the optimizer can use that index to go directly to the matching rows, which is potentially far more efficient.
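You can ask the optimizer how it would run such a query before any index exists. The sketch below filters the thermostats table directly on the JSON document. The `LIKE '%fan%'` test is one assumed way to express "has a fan." Without a suitable index, EXPLAIN should report an access type of ALL, which means a full table scan.

```sql
-- Fans that run at 110 volts, filtered directly on the JSON document.
-- With no index on these elements, every row must be read and examined.
EXPLAIN SELECT model_number, manufacturer
FROM `test`.`thermostats`
WHERE capabilities->'$.voltage' = 110                  -- JSON value compared with 110
  AND capabilities->>'$.capability' LIKE '%fan%';      -- ->> returns unquoted text
```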

To mitigate the potential performance issue, we can add a virtual generated column to the table that extracts the voltage element and then index that column. The following ALTER TABLE statements add the virtual generated column and the index.

ALTER TABLE `test`.`thermostats` ADD COLUMN voltage INT GENERATED ALWAYS AS (capabilities->'$.voltage') VIRTUAL;

ALTER TABLE `test`.`thermostats` ADD INDEX volts (voltage);
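With the index in place, a query can filter on the generated column by name so that the optimizer can use the volts index. This is a sketch. On a table with realistic volumes, the key column of the EXPLAIN output should name volts. On a tiny table the optimizer may still decide a scan is cheaper.

```sql
-- Filter on the indexed generated column; the remaining JSON condition
-- is then applied only to the rows the index returns.
EXPLAIN SELECT model_number, manufacturer
FROM `test`.`thermostats`
WHERE voltage = 110
  AND capabilities->>'$.capability' LIKE '%fan%';
```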

Note If you leave off the VIRTUAL or STORED option, the generated column is a virtual generated column.
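One way to confirm which kind of generated column MySQL created is to add one without either keyword and then inspect INFORMATION_SCHEMA.COLUMNS. The `rpm` column below is a hypothetical addition made only for this check.

```sql
-- Hypothetical extra column: neither VIRTUAL nor STORED is specified.
ALTER TABLE `test`.`thermostats`
  ADD COLUMN rpm INT GENERATED ALWAYS AS (capabilities->'$.rpm');

-- EXTRA shows how each generated column is materialized.
SELECT COLUMN_NAME, EXTRA
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 'thermostats'
  AND EXTRA IN ('VIRTUAL GENERATED', 'STORED GENERATED');
```

If the default is VIRTUAL, the `rpm` row should report VIRTUAL GENERATED, just as `voltage` does.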

From the book Introducing the MySQL 8 Document Store (PDF), pages 142-145