More Advanced Query Structures

Most of this book is targeted at beginners, but because beginners can quickly become more advanced in developing SQL, I wanted to give you some ideas of what is possible when you think a little more creatively and go beyond the simplest SELECT statements. SQL is a powerful way to shape and summarize data into a wide variety of forms that can be used for many types of analyses. This chapter includes a few examples of more complex query structures.

SELECT market_year, MIN(market_date) AS first_market_date
FROM farmers_market.market_date_info
WHERE market_year = '2020'

Of course, this isn’t a sensible use case, because you could just write one query, GROUP BY market_year, and filter to WHERE market_year IN ('2019', '2020') and get the same output. There are always multiple ways to write queries, but sometimes combining two queries with identical columns selected but different criteria or different aggregation is the quickest way to get the results you want.
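The equivalence described above can be checked on a small scale. The following is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory table. The rows are invented for illustration, the farmers_market schema prefix is dropped because the toy database has no such schema, and the 2019 half of the UNION is assumed to mirror the 2020 query shown above.

```python
import sqlite3

# Toy stand-in for farmers_market.market_date_info (rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE market_date_info (market_date TEXT, market_year TEXT)")
conn.executemany(
    "INSERT INTO market_date_info VALUES (?, ?)",
    [
        ("2019-04-03", "2019"),
        ("2019-04-06", "2019"),
        ("2020-03-04", "2020"),
        ("2020-03-07", "2020"),
    ],
)

# Two queries with identical columns but different filters, combined with UNION.
# The 2019 half is an assumption; only the 2020 half appears in the text.
union_sql = """
SELECT market_year, MIN(market_date) AS first_market_date
FROM market_date_info
WHERE market_year = '2019'
UNION
SELECT market_year, MIN(market_date) AS first_market_date
FROM market_date_info
WHERE market_year = '2020'
ORDER BY market_year
"""

# The single-query alternative: filter both years, then GROUP BY market_year.
grouped_sql = """
SELECT market_year, MIN(market_date) AS first_market_date
FROM market_date_info
WHERE market_year IN ('2019', '2020')
GROUP BY market_year
ORDER BY market_year
"""

union_rows = conn.execute(union_sql).fetchall()
grouped_rows = conn.execute(grouped_sql).fetchall()
print(union_rows)
print(grouped_rows)
```

On this toy data both queries return one row per year with that year's earliest market date, which is why the UNION is not needed here.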

For a more complex example combining CTEs and UNIONs, we’ll build a report that shows the products with the largest quantities available at each market: the bulk product with the largest weight available, and the unit product with the highest count available:

WITH product_quantity_by_date AS (
    SELECT
        vi.market_date,
        vi.product_id,
        p.product_name,
        SUM(vi.quantity) AS total_quantity_available,
        p.product_qty_type
    FROM farmers_market.vendor_inventory vi
    LEFT JOIN farmers_market.product p
        ON vi.product_id = p.product_id
    GROUP BY vi.market_date, vi.product_id
)

SELECT * FROM (
    SELECT
        market_date, product_id, product_name,
        total_quantity_available, product_qty_type,
        RANK() OVER (PARTITION BY market_date ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
    WHERE product_qty_type = 'unit'
    UNION
    SELECT
        market_date, product_id, product_name,
        total_quantity_available, product_qty_type,
        RANK() OVER (PARTITION BY market_date ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
    WHERE product_qty_type = 'lbs'
) x
WHERE x.quantity_rank = 1
ORDER BY market_date

The WITH statement (CTE) at the top of this query totals up the quantity of each product that is available at each market from the vendor_inventory table, and joins in helpful information from the product table, such as the product name and the type of quantity (product_qty_type), which is either “lbs” or “unit.”

The inner part of the bottom query contains two different queries of the same view created in the CTE, product_quantity_by_date, UNIONed together. Each ranks the information available in the CTE by total_quantity_available (the sum of the quantity field, aggregated in the WITH clause), as well as returning all of the available fields. Note that both queries return the fields in the exact same order. The only difference between the two queries is their WHERE clauses, which separate the results by product_qty_type. In this case, we can’t simply GROUP BY the product_qty_type and remove the UNION, as was possible for the initial example query in this section. The RANK() window function ranks by quantity available within each query, and we want to see the top item per product_qty_type, so we want to return the top-ranked item from each set separately.
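To make the reasoning above concrete, here is a minimal sketch in Python with the standard-library sqlite3 module (window functions require SQLite 3.25 or later). The table stands in for the output of the CTE, and its rows and quantities are invented for illustration. It shows what happens if a single RANK() partitioned only by market_date is applied to both quantity types at once.

```python
import sqlite3

# Toy stand-in for the product_quantity_by_date CTE (rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE product_quantity_by_date (
           market_date TEXT,
           product_name TEXT,
           total_quantity_available REAL,
           product_qty_type TEXT)"""
)
conn.executemany(
    "INSERT INTO product_quantity_by_date VALUES (?, ?, ?, ?)",
    [
        ("2019-08-03", "Peppers", 30.0, "lbs"),
        ("2019-08-03", "Carrots", 12.0, "lbs"),
        ("2019-08-03", "Sweet Corn", 200.0, "unit"),
        ("2019-08-03", "Eggs", 40.0, "unit"),
    ],
)

# One ranking per market date, with pounds and unit counts mixed together.
single_rank_sql = """
SELECT product_name, product_qty_type
FROM (
    SELECT product_name, product_qty_type,
           RANK() OVER (PARTITION BY market_date
                        ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
) x
WHERE x.quantity_rank = 1
"""
top_overall = conn.execute(single_rank_sql).fetchall()
print(top_overall)
```

Only the unit product comes back, because 200 units outranks 30 lbs when the two types share one ranking. Filtering each type in its own query before ranking, as the UNION version does, avoids comparing pounds against counts.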

The outer part of the bottom query selects the results of the union, and filters to only the top-ranked quantities, so we get one row per market date with the highest number of lbs, and one row per market date with the highest number of units. Some results of this query are shown in Figure 11.1. You can see that for the month of August 2019, the bulk product with the highest weight each week was organic jalapeno peppers, and the product sold by unit with the highest count each week was sweet corn.

Figure 11.1

For the sake of instruction, and because I frequently say that there are multiple ways to construct queries in SQL that result in identical outputs, there is at least one other way to get the preceding output that doesn’t require a UNION. In the following query, the second query in the WITH clause queries from the first query in the WITH clause, and the final SELECT statement simply filters the result of the second query:

WITH product_quantity_by_date AS (
    SELECT
        vi.market_date,
        vi.product_id,
        p.product_name,
        SUM(vi.quantity) AS total_quantity_available,
        p.product_qty_type
    FROM farmers_market.vendor_inventory vi
    LEFT JOIN farmers_market.product p
        ON vi.product_id = p.product_id
    GROUP BY vi.market_date, vi.product_id
),
rank_by_qty_type AS (
    SELECT
        market_date, product_id, product_name,
        total_quantity_available, product_qty_type,
        RANK() OVER (PARTITION BY market_date, product_qty_type ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
)
SELECT *
FROM rank_by_qty_type
WHERE quantity_rank = 1
ORDER BY market_date

We were able to accomplish the same result without the UNION by partitioning by both the market_date and product_qty_type in the RANK() function, resulting in a ranking for each date and quantity type.
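As a rough check that the two approaches agree, the sketch below runs both query shapes against a tiny in-memory SQLite database from Python (standard-library sqlite3, SQLite 3.25 or later for window functions). The table contents are invented, the farmers_market schema prefix is omitted, only a subset of columns is selected, and the GROUP BY lists every non-aggregated column so the query does not depend on one engine's GROUP BY rules. These are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY, product_name TEXT, product_qty_type TEXT);
    CREATE TABLE vendor_inventory (
        market_date TEXT, product_id INTEGER, quantity REAL);
    -- Invented sample rows
    INSERT INTO product VALUES
        (1, 'Peppers', 'lbs'), (2, 'Carrots', 'lbs'),
        (3, 'Sweet Corn', 'unit'), (4, 'Eggs', 'unit');
    INSERT INTO vendor_inventory VALUES
        ('2019-08-03', 1, 20.0), ('2019-08-03', 1, 10.0),
        ('2019-08-03', 2, 12.0), ('2019-08-03', 3, 200.0),
        ('2019-08-03', 4, 40.0),
        ('2019-08-10', 1, 5.0), ('2019-08-10', 2, 15.0),
        ('2019-08-10', 3, 50.0), ('2019-08-10', 4, 60.0);
    """
)

# Shared first CTE: total quantity per product per market date.
cte = """
WITH product_quantity_by_date AS (
    SELECT vi.market_date, vi.product_id, p.product_name,
           SUM(vi.quantity) AS total_quantity_available, p.product_qty_type
    FROM vendor_inventory vi
    LEFT JOIN product p ON vi.product_id = p.product_id
    GROUP BY vi.market_date, vi.product_id, p.product_name, p.product_qty_type
)
"""

# Approach 1: rank each quantity type in its own query, then UNION.
union_sql = cte + """
SELECT market_date, product_name, product_qty_type FROM (
    SELECT market_date, product_name, product_qty_type,
           RANK() OVER (PARTITION BY market_date
                        ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
    WHERE product_qty_type = 'unit'
    UNION
    SELECT market_date, product_name, product_qty_type,
           RANK() OVER (PARTITION BY market_date
                        ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
    WHERE product_qty_type = 'lbs'
) x
WHERE x.quantity_rank = 1
ORDER BY market_date, product_qty_type
"""

# Approach 2: a second CTE that partitions by date and quantity type.
partition_sql = cte + """
, rank_by_qty_type AS (
    SELECT market_date, product_name, product_qty_type,
           RANK() OVER (PARTITION BY market_date, product_qty_type
                        ORDER BY total_quantity_available DESC) AS quantity_rank
    FROM product_quantity_by_date
)
SELECT market_date, product_name, product_qty_type
FROM rank_by_qty_type
WHERE quantity_rank = 1
ORDER BY market_date, product_qty_type
"""

union_rows = conn.execute(union_sql).fetchall()
partition_rows = conn.execute(partition_sql).fetchall()
print(union_rows == partition_rows)
for row in partition_rows:
    print(row)
```

Both queries return one top-ranked row per market date and quantity type on this sample. The extra product_qty_type in the ORDER BY is only there to make the row order deterministic for the comparison.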

Because I have shown two examples of UNION queries that don’t actually require UNIONs, I wanted to mention one case when a UNION is definitely required: when you have separate tables with the same columns, representing different time periods. This could happen, for example, when you have event logs (such as website traffic logs) that are stored across multiple files, and each file is loaded into its own table in the database. Or, the tables could be static snapshots of the same dynamic dataset from different points in time. Or, maybe the data was