Notice that vendor_id has been removed from both the list of displayed columns and the ORDER BY clause. We want one row per customer per date, but a customer can purchase from multiple vendors on a single date. If vendor_id were included in the output, the results would no longer be aggregated at the level we want.
Even though it’s not required in order to get these results, we should also add customer_id to the GROUP BY list, so the query will work without error even when it’s not filtered to a single customer. We’ll make this change in the next query.
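To see why vendor_id has to go, here is a minimal sketch using a few made-up purchases (hypothetical rows, not data from the farmers_market database). The WITH clause stands in for the customer_purchases table so the query runs on its own, and the market_date column name is an assumption made for this illustration. Customer 3 buys from two different vendors on the same date, and grouping by customer_id and market_date rolls those purchases into a single row:

```sql
-- Hypothetical sample rows standing in for farmers_market.customer_purchases
WITH customer_purchases AS (
    SELECT 3 AS customer_id, 9 AS vendor_id, '2019-04-06' AS market_date,
           2 AS quantity, 4.00 AS cost_to_customer_per_qty
    UNION ALL SELECT 3, 4, '2019-04-06', 1, 6.50  -- same customer and date, different vendor
    UNION ALL SELECT 3, 9, '2019-04-13', 3, 4.00
)
SELECT
    customer_id,
    market_date,
    SUM(quantity * cost_to_customer_per_qty) AS total_spent
FROM customer_purchases
-- customer_id is grouped as well, so the query still works without a customer filter
GROUP BY customer_id, market_date
ORDER BY customer_id, market_date;
```

This returns two rows for customer 3: a total of 14.50 for 2019-04-06 (8.00 at vendor 9 plus 6.50 at vendor 4) and 12.00 for 2019-04-13. Adding vendor_id to the SELECT and GROUP BY lists would split the 2019-04-06 row back into one row per vendor.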
What if we want to find out how much this customer has spent at each vendor, regardless of date? Then we can group by customer_id and vendor_id:
SELECT
    customer_id,
    vendor_id,
    SUM(quantity * cost_to_customer_per_qty) AS total_spent
FROM farmers_market.customer_purchases
WHERE
    customer_id = 3
GROUP BY customer_id, vendor_id
ORDER BY customer_id, vendor_id
The results of this query are shown in Figure 6.7.
We can also remove the customer_id filter (in this case by removing the entire WHERE clause, since there are no other filter conditions) and GROUP BY customer_id only, to get a list of every customer and how much they have ever spent at the farmer’s market. The results of the following query are shown in Figure 6.8:
SELECT
    customer_id,
    SUM(quantity * cost_to_customer_per_qty) AS total_spent
FROM farmers_market.customer_purchases
GROUP BY customer_id
ORDER BY customer_id
So far, we have been doing all of this aggregation on a single table, but it can be done on joined tables, as well. It’s a good idea to join the tables without the aggregate functions first, to make sure the data is at the level of granularity you expect (and not generating duplicates) before adding the GROUP BY.
Let’s say that for the query that was grouped by customer_id and vendor_id, we want to bring in some customer details, such as first and last name, and the vendor name. We can first join the three tables together, select columns from all of the tables, and inspect the output before grouping, as shown in Figure 6.9:
SELECT
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id,
    cp.quantity * cp.cost_to_customer_per_qty AS price
FROM farmers_market.customer c
    LEFT JOIN farmers_market.customer_purchases cp
        ON c.customer_id = cp.customer_id
    LEFT JOIN farmers_market.vendor v
        ON cp.vendor_id = v.vendor_id
WHERE
    cp.customer_id = 3
ORDER BY cp.customer_id, cp.vendor_id
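To illustrate what this pre-aggregation check is guarding against, here is a minimal sketch with hypothetical rows (not from the farmers_market database). The vendor stand-in deliberately contains a duplicated vendor_id, which a properly keyed vendor table should not have, so that you can see how a bad join inflates both the row count and the total before any GROUP BY is applied:

```sql
-- Hypothetical sample rows; the WITH clause stands in for the real tables
WITH customer_purchases AS (
    SELECT 3 AS customer_id, 9 AS vendor_id, 2 AS quantity, 4.00 AS cost_to_customer_per_qty
    UNION ALL SELECT 3, 9, 1, 4.00
),
vendor AS (
    SELECT 9 AS vendor_id, 'Vendor Nine' AS vendor_name
    UNION ALL SELECT 9, 'Vendor Nine'  -- accidental duplicate key
)
SELECT
    COUNT(*) AS joined_rows,  -- 2 purchases, but each one matches 2 vendor rows
    SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spent
FROM customer_purchases cp
    LEFT JOIN vendor v
        ON cp.vendor_id = v.vendor_id;
```

Each purchase matches both vendor rows, so the join returns 4 rows instead of 2, and total_spent comes out as 24.00 rather than the true 12.00. Against the real database, comparing COUNT(*) for customer 3's rows in customer_purchases on their own with COUNT(*) for the joined query above is a quick way to confirm that the joins are not duplicating anything before you add the GROUP BY.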
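To summarize at the level of one row per customer per vendor, we will have to group by a lot more fields, including all of the customer table fields and all of the vendor table fields that we are displaying. We’ll also round total_spent to two decimal places so it displays nicely in dollar form. The results are shown in Figure 6.10: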
SELECT
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id,
    ROUND(SUM(quantity * cost_to_customer_per_qty), 2) AS total_spent
FROM farmers_market.customer c
    LEFT JOIN farmers_market.customer_purchases cp
        ON c.customer_id = cp.customer_id
    LEFT JOIN farmers_market.vendor v
        ON cp.vendor_id = v.vendor_id
WHERE
    cp.customer_id = 3
GROUP BY
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id
ORDER BY cp.customer_id, cp.vendor_id
We can also keep the same level of aggregation and filter to a single vendor instead of a single customer, to get a list of customers per vendor instead of vendors per customer, as shown in the following code. Note that the only line of code that is changed is the WHERE clause condition, because even though we’re changing the filter, we want the grouping level and the output fields to stay the same. You can see in Figure 6.11 that the customer_id column now has values other than 3, and the vendor_id column now is limited to vendor 9:
SELECT
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id,
    ROUND(SUM(quantity * cost_to_customer_per_qty), 2) AS total_spent
FROM farmers_market.customer c
    LEFT JOIN farmers_market.customer_purchases cp
        ON c.customer_id = cp.customer_id
    LEFT JOIN farmers_market.vendor v
        ON cp.vendor_id = v.vendor_id
WHERE
    cp.vendor_id = 9
GROUP BY
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id
ORDER BY cp.customer_id, cp.vendor_id
Or, we could remove the WHERE clause altogether and get one row for every customer-vendor pair in the database. This would be useful as a query to support a reporting system that allows for front-end filtering, such as Tableau. The query can provide a list of every customer who has shopped at any vendor and the sum of how much they have spent there, and the reporting tool can then allow the user to choose any customer or vendor to narrow down the results dynamically.
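Here is a minimal sketch of that unfiltered version, run against a few hypothetical rows (the names and values are made up, and the WITH clause stands in for the three farmers_market tables so the query runs on its own). Against the real database you would drop the WITH clause and keep the farmers_market table names. One side effect worth knowing: because the query LEFT JOINs from customer, removing the WHERE clause also returns a row for any customer who has never made a purchase, with NULL in every cp and v column and in total_spent:

```sql
-- Hypothetical sample rows standing in for the three farmers_market tables
WITH customer AS (
    SELECT 3 AS customer_id, 'Ana' AS customer_first_name, 'Diaz' AS customer_last_name
    UNION ALL SELECT 5, 'Ben', 'Lee'  -- has never made a purchase
),
vendor AS (
    SELECT 4 AS vendor_id, 'Bakery' AS vendor_name
    UNION ALL SELECT 9, 'Farm Stand'
),
customer_purchases AS (
    SELECT 3 AS customer_id, 9 AS vendor_id, 2 AS quantity, 4.00 AS cost_to_customer_per_qty
    UNION ALL SELECT 3, 4, 1, 6.50
    UNION ALL SELECT 3, 9, 3, 4.00
)
SELECT
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id,
    ROUND(SUM(cp.quantity * cp.cost_to_customer_per_qty), 2) AS total_spent
FROM customer c
    LEFT JOIN customer_purchases cp
        ON c.customer_id = cp.customer_id
    LEFT JOIN vendor v
        ON cp.vendor_id = v.vendor_id
-- no WHERE clause: every customer-vendor pair is returned
GROUP BY
    c.customer_first_name,
    c.customer_last_name,
    cp.customer_id,
    v.vendor_name,
    cp.vendor_id
ORDER BY cp.customer_id, cp.vendor_id;
```

This returns three rows: Ana Diaz at the Bakery (6.50), Ana Diaz at the Farm Stand (20.00, from two separate purchases), and a row for Ben Lee whose cp.customer_id, vendor columns, and total_spent are all NULL. Because cp.customer_id is NULL for that row, it sorts first in ascending order in both MySQL and SQLite.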
You can now see how all of the basic SQL components you have learned in previous chapters are coming together to build analytical reports!