3 Aggregates
> (require plisqin-examples/adventure-works/schema)
3.1 Grouping
(from p Product)
> (aw:show-table (from p Product (group-by (Color p)) (select (Color p))))
Show TableShow SQL
select p.Color as Color
from Product p
group by p.Color
Color
#<sql-null>
Black
Blue
Grey
Multi
Red
Silver
Silver/Black
White
Yellow
Note that (select (Color p)) was not a illegal select. We grouped by Color, so each group will have exactly one Color by definition.
3.2 Traditional Aggregation
It seems that group-by is mostly useless on its own. Once we add aggregates, things start to make sense. Plisqin provides the following aggregate operations: count, avg, min, max, sum.
> (aw:show-table (from p Product (group-by (Color p)) (select (Color p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))))
Show TableShow SQL
select p.Color as Color
, count(*)
, min(p.ListPrice)
, max(p.ListPrice)
from Product p
group by p.Color
Color
count(*)
min(p.ListPrice)
max(p.ListPrice)
#<sql-null>
248
0.0
229.49
Black
93
0.0
3374.99
Blue
26
34.99
2384.07
Grey
1
125.0
125.0
Multi
8
8.99
89.99
Red
38
34.99
3578.27
Silver
43
0.0
3399.99
Silver/Black
7
40.49
80.99
White
4
8.99
9.5
Yellow
36
53.99
2384.07
The previous query is an example of traditional aggregation. Soon we will contrast traditional aggregation with another style.
> (aw:show-table (from p Product (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))))
Show TableShow SQL
select count(*)
, min(p.ListPrice)
, max(p.ListPrice)
from Product p
count(*)
min(p.ListPrice)
max(p.ListPrice)
504
0.0
3578.27
Attempting to add (select (ListPrice p)) to the previous example would be another illegal select. The grouping is implicit, but the reasoning is the same – the group of all Products does not have a single List Price.
3.3 Illegal vs Quasi-Legal Selects
(from p Product (group-by (Color p)) (select (Color p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p))))
> (aw:show-table (from p Product (group-by (ProductCategoryID p)) (select (ProductCategoryID p)) (select (CategoryName p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))))
Show TableShow SQL
select subcat.ProductCategoryID as ProductCategoryID
, cat.Name as CategoryName
, count(*)
, min(p.ListPrice)
, max(p.ListPrice)
from Product p
left join ProductSubcategory subcat
on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
group by subcat.ProductCategoryID
ProductCategoryID
CategoryName
count(*)
min(p.ListPrice)
max(p.ListPrice)
#<sql-null>
#<sql-null>
209
0.0
196.92
1
Bikes
97
539.99
3578.27
2
Components
134
20.24
1431.5
3
Clothing
35
8.99
89.99
4
Accessories
29
2.29
159.0
We have already grouped by ProductCategoryID.
ProductCategoryID is a unique key of the Category table.
Therefore, for every column C of the Category table, each group contains exactly one value of C.
CategoryName is a column of the Category table.
Therefore, each group contains exactly one CategoryName.
Therefore, "What is the CategoryName of each group?" has an unambiguous answer, and it is not an illegal select.
I think I read that PostgreSQL now accepts some quasi-legal selects, but only when it can prove that they are not illegal, which is probably impossible to do with 100% accuracy.
3.4 Grouped Join Aggregation
> (define/contract (Products-by-Category cat) (-> (instanceof ProductCategory) (instanceof Product)) (join p Product #:to cat (join-type 'left) (group-by (ProductCategoryID p)) (join-on (.= (?? (ProductCategoryID p) /void) (ProductCategoryID cat)))))
According to my normal naming convention, group-of-products would be named productsG for brevity. The longer name is for clarity.
> (aw:show-table (from cat ProductCategory (select (ProductCategoryID cat)) (select (CategoryName cat)) (define group-of-products (Products-by-Category cat)) (select (count group-of-products)) (select (min (ListPrice group-of-products))) (select (max (ListPrice group-of-products)))))
Show TableShow SQL
select cat.ProductCategoryID as ProductCategoryID
, cat.Name as CategoryName
, p.__INJECT1
, p.__INJECT2
, p.__INJECT3
from ProductCategory cat
left join (
select subcat.ProductCategoryID as __INJECT0
, count(*) as __INJECT1
, min(p.ListPrice) as __INJECT2
, max(p.ListPrice) as __INJECT3
from Product p
left join ProductSubcategory subcat
on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))
group by subcat.ProductCategoryID
) p
on (p.__INJECT0 is not null and (p.__INJECT0 = cat.ProductCategoryID))
ProductCategoryID
CategoryName
__INJECT1
__INJECT2
__INJECT3
1
Bikes
97
539.99
3578.27
2
Components
134
20.24
1431.5
3
Clothing
35
8.99
89.99
4
Accessories
29
2.29
159.0
Notice that grouped join aggregation gives us better composability than traditional aggregation thanks to separation of concerns. Specifically, the "grouping" and "aggregating" concerns are now separated. The Products-by-Category procedure says nothing about aggregation; it only encodes the relationship "a ProductCategory has a group of Products." This means we can reuse it in other queries with other aggregations. And each aggregate (such as max) says nothing about grouping, but it will accept any grouped join you give it.
Also notice that we no longer have the quasi-legal select that we did in the traditional aggregation version of this query. Each select is fully legal here. This means that grouped join aggregation is usually easier to use if you are using a database that does not allow quasi-legal selects.
(define/contract (MinListPrice cat) (-> (instanceof ProductCategory) Number?) (min (ListPrice (Products-by-Category cat))))
3.5 Summary
; Traditional aggregation, this works: (from p Product (group-by (Color p)) (select (Color p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))) ; Grouped join aggregation, this does not work ; because there is no Color table: (from c Color (define productsG (Products-by-Color c)) (select (ColorName c)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p))))
; traditional aggregation: (from p Product (group-by (ProductCategoryID p)) (select (ProductCategoryID p)) ; A quasi-legal select; this won't work in some databases: (select (CategoryName p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))) ; grouped join aggregation: (from cat ProductCategory ; All selects are fully legal here: (select (ProductCategoryID cat)) (select (CategoryName cat)) (define group-of-products (Products-by-Category cat)) (select (count group-of-products)) (select (min (ListPrice group-of-products))) (select (max (ListPrice group-of-products))))
> (aw:show-table (from p Product (group-by (ProductCategoryID p)) (select (ProductCategoryID p)) (select (CategoryName p)) (select (count p)) (select (min (ListPrice p))) (select (max (ListPrice p)))))
Show TableShow SQL
select subcat.ProductCategoryID as ProductCategoryID
, cat.Name as CategoryName
, count(*)
, min(p.ListPrice)
, max(p.ListPrice)
from Product p
left join ProductSubcategory subcat
on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
group by subcat.ProductCategoryID
ProductCategoryID
CategoryName
count(*)
min(p.ListPrice)
max(p.ListPrice)
#<sql-null>
#<sql-null>
209
0.0
196.92
1
Bikes
97
539.99
3578.27
2
Components
134
20.24
1431.5
3
Clothing
35
8.99
89.99
4
Accessories
29
2.29
159.0