■ What percentage of each vendor’s inventory is selling per time period?
■ Did the prices of any products change over time?
■ What are the total sales per vendor for the season?
■ How frequently do vendors discount their product prices?
■ Which vendor sold the most tomatoes last week?
We can’t answer questions about any time periods shorter than a day, because the timestamp of the sale isn’t included. We also don’t have any detailed information about customers, but because we have date, vendor, and product dimensions, we can slice and dice different metrics by those values.
We can add some calculated fields to the query for reporting purposes without pulling in any additional columns. The following query adds fields calculating the percentage of product quantity sold and the total discount and aliases them percent_of_available_sold and discount_amount, respectively. Let’s store this dataset as a view and use it to answer a business question:
CREATE VIEW farmers_market.vw_sales_per_date_vendor_product AS
SELECT
vi.market_date, vi.vendor_id, v.vendor_name, vi.product_id, p.product_name,
vi.quantity AS quantity_available, sales.quantity_sold,
ROUND((sales.quantity_sold / vi.quantity) * 100, 2) AS percent_of_available_sold,
vi.original_price,
(vi.original_price * sales.quantity_sold) - sales.total_sales AS discount_amount,
sales.total_sales
FROM farmers_market.vendor_inventory AS vi
LEFT JOIN
(
SELECT market_date, vendor_id, product_id,
SUM(quantity) AS quantity_sold,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM farmers_market.customer_purchases
GROUP BY market_date, vendor_id, product_id
) AS sales
ON vi.market_date = sales.market_date
AND vi.vendor_id = sales.vendor_id
AND vi.product_id = sales.product_id
LEFT JOIN farmers_market.vendor AS v
ON vi.vendor_id = v.vendor_id
LEFT JOIN farmers_market.product AS p
ON vi.product_id = p.product_id
In the expression (vi.original_price * sales.quantity_sold) - sales.total_sales, we’re taking the original_price of a product and multiplying it by the quantity of that product that was sold, to get the full-price value of the products sold. We then subtract the actual sales for the day, which gives the total potential revenue the vendor gave away in the form of discounts for that product on that day.
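To make the arithmetic concrete, here is a minimal sketch that applies the view’s two calculated fields to a single row of made-up numbers. The values are hypothetical and are not taken from the Farmer’s Market database: 20 units brought to market, 12 sold, an original price of 4.00, and actual sales of 42.00.

```sql
-- Hypothetical values for one product on one market date (illustrative only).
SELECT
    -- 12 sold out of 20 available: (12 / 20) * 100 = 60 percent
    ROUND((s.quantity_sold / s.quantity_available) * 100, 2) AS percent_of_available_sold,
    -- Full-price value (4.00 * 12 = 48.00) minus actual sales (42.00) = 6.00
    (s.original_price * s.quantity_sold) - s.total_sales AS discount_amount
FROM (
    SELECT
        20.00 AS quantity_available,
        12.00 AS quantity_sold,
        4.00  AS original_price,
        42.00 AS total_sales
) AS s
```

With these numbers, 60 percent of the available quantity sold, and the vendor gave up 6.00 in discounts. The number of decimal places displayed may vary by database system.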
If a vendor asked, “What percent of my sales at each market came from each product I brought?” you could use this dataset to build a report to answer the question, because you have a summary of sales per product per vendor per market date. You will need to use a window function to get the answer, because the total you are dividing by is the sum of all of a vendor’s sales per date, which means adding up the total_sales values across multiple rows of the dataset.
Once you have the total_sales for a vendor on a market date, you can divide each row’s total sales (remember there is one row per product per vendor per market date) by the vendor’s total sales of all products for the day.
The query needed to generate this report is pictured in Figure 10.8, because the window functions make the calculations quite long, and the syntax highlighting provided by the IDE helps make the various sections of the SQL statement clearer.
Let’s walk through these calculations step by step. First, note that we’re querying from the view created by the previous query, vw_sales_per_date_vendor_product. We give the total sales, which is summarized per market date, vendor, and product in the view, an alias of vendor_product_sales_on_market_date, and round it to have two digits after the decimal point, since this is a report and we want to format everything nicely.
Figure 10.8
In the next line of the SQL statement in Figure 10.8, we are summing up each vendor’s sales (of all of their products) on each market date, using a window function that partitions sales by market_date and vendor_id. (See Chapter 7 for more information about window functions.) Then we give that sum an alias of vendor_total_sales_on_market_date and round it to two decimal places.
We now have the total sales for the vendor for the day on each row, and we already had the total sales of each product the vendor sold that day. The calculation in the next line divides the product’s sales (the first dollar amount) by the vendor’s total sales for the day (the second dollar amount), which gives the percentage of the vendor’s sales on that market date represented by each product.
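The steps just described can be sketched in text form as follows. This is a reconstruction from the prose, not a copy of the query in Figure 10.8: the first two aliases come from the walkthrough, while the alias product_percent_of_vendor_sales, the multiplication by 100, the rounding to one decimal place, and the ORDER BY clause are assumptions made for illustration. It assumes a database with window function support (for example, MySQL 8.0 or later).

```sql
-- Sketch only: reconstructed from the walkthrough, not copied from Figure 10.8.
SELECT
    vendor_name,
    product_name,
    market_date,
    -- Sales of this product by this vendor on this market date
    ROUND(total_sales, 2) AS vendor_product_sales_on_market_date,
    -- Sales of all of the vendor's products on this market date
    ROUND(
        SUM(total_sales) OVER (PARTITION BY market_date, vendor_id),
        2
    ) AS vendor_total_sales_on_market_date,
    -- Product sales divided by the vendor's total for the day
    -- (assumed alias, percentage scaling, and rounding)
    ROUND(
        (total_sales
            / SUM(total_sales) OVER (PARTITION BY market_date, vendor_id)) * 100,
        1
    ) AS product_percent_of_vendor_sales
FROM farmers_market.vw_sales_per_date_vendor_product
ORDER BY market_date, vendor_name, product_name
```

Because the view is built with a LEFT JOIN from vendor_inventory, total_sales is NULL for products that were brought to market but not sold. SUM ignores those NULL values in the window total, and the percentage for such rows is NULL rather than zero.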
In the pictured rows of output at the bottom of Figure 10.8, you can see that Marco’s Peppers is only selling one product on 4/22/2020, so the sales on that row represent 100% of Marco’s sales for the day. Annie’s Pies is selling three different products, and you can see in the final column what portion of Annie’s total sales was contributed by each product.
We can write additional queries against this reusable dataset to build other reports in SQL, too. To use SQL to get the same data summary that is shown in Figure 9.18, which was grouped and visualized in Tableau, we can query the view as follows:
SELECT
market_date, vendor_name, product_name, quantity_available, quantity_sold
FROM farmers_market.vw_sales_per_date_vendor_product AS s
WHERE market_date BETWEEN '2020-06-01' AND '2020-07-31'
AND vendor_name = 'Marco''s Peppers'
AND product_id IN (2, 4)
ORDER BY market_date, product_id
A partial view of the output of this query is in Figure 10.9, and you can compare the numbers to those in the bar chart in Figure 9.18. One benefit of saving queries that generate summary datasets, so they are available to reuse as needed, is that any tool you use to pull the data will be referencing the same underlying data, table joins, and calculated values. As long as the data isn’t changing between the report generation times, everyone using the defined dataset can get the same results.
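The same view can also address some of the questions listed at the start of this section, such as total sales per vendor and how frequently vendors discount their products. The following is a rough sketch under the assumption that the view exists as defined earlier. The aliases product_days_offered and product_days_discounted are names chosen here for illustration, and the query covers every date in the view rather than a specific season.

```sql
-- Sketch only: one row per vendor, summarizing all dates in the view.
SELECT
    vendor_id,
    vendor_name,
    -- Each view row is one product offered by the vendor on one market date
    COUNT(*) AS product_days_offered,
    -- Rows where the actual sales came in below the full-price value
    SUM(CASE WHEN discount_amount > 0 THEN 1 ELSE 0 END) AS product_days_discounted,
    -- Unsold products have NULL total_sales, so treat them as zero
    ROUND(SUM(COALESCE(total_sales, 0)), 2) AS total_sales
FROM farmers_market.vw_sales_per_date_vendor_product
GROUP BY vendor_id, vendor_name
ORDER BY vendor_id
```

Adding a WHERE clause on market_date, as in the previous query, would restrict the summary to a particular season or week.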
Exercises
1. Using the view created in this chapter called farmers_market.vw_sales_by_day_vendor, referring to Figure 10.3 for a preview of the data in the dataset, write a query to build a report that summarizes the sales per vendor per market week.
2. Rewrite the query associated with Figure 7.11 using a CTE (WITH clause).
3. If you were asked to build a report of total and average market sales by vendor booth type, how might you modify the query associated with Figure 10.3 to include the information needed for your report?
Figure 10.9
Most of this book is targeted at beginners, but because beginners can quickly become more advanced in developing SQL, I wanted to give you some ideas of what is possible when you think a little more creatively and go beyond the simplest SELECT statements. SQL is a powerful way to shape and summarize data into a wide variety of forms that can be used for many types of analyses. This chapter includes a few examples of more complex query structures.