2 Using define-schema
For this walkthrough, you are a new employee at AdventureWorks. AdventureWorks is a fictional company that sells bicycles and related products. The company has a mature database, but the SQL that has been written so far is scattered all over the place, making it difficult to find and reuse.
Your boss is going to ask you for a series of reports. As you produce these reports, you will capture certain facts about this database using define-schema, making it easier for yourself and other programmers to find and reuse them. You will also learn how Plisqin offers opportunities for code reuse that are simply impossible using plain SQL.
2.1 Teaser
> (aw:show-table (from p Product (limit 3) (select (ProductName p)) (select (CategoryName p)) (select (TotalSales p)) (order-by 'desc (TotalSales p))))
Show TableShow SQL
select p.Name as ProductName
, cat.Name as CategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
from Product p
left join (
select detailsG.ProductID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
from SalesOrderDetail detailsG
group by detailsG.ProductID
) detailsG
on (detailsG.__INJECT0 = p.ProductID)
left join ProductSubcategory subcat
on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
order by round(detailsG.__INJECT1, 2) desc
limit 3
ProductName
CategoryName
TotalSales
Mountain-200 Black, 38
Bikes
4400592.8
Mountain-200 Black, 42
Bikes
4009494.76
Mountain-200 Silver, 38
Bikes
3693678.03
Click on "Show SQL" and you might be surprised! The generated SQL is much larger than the Plisqin query. This may seem like "too much magic" right now, but once you learn how it works it is actually pretty formulaic.
2.2 Getting Started
> (aw:show-table "select datetime('now')")
Show TableShow SQL
select datetime('now')
datetime('now')
2025-02-01 16:27:02
If you really want to know how to automatically generate this, see Appendix A: Generating the Initial Schema Definition.
> (aw:show-table (from pc ProductCategory (select (Name pc))))
Show TableShow SQL
select pc.Name as Name
from ProductCategory pc
Name
Bikes
Components
Clothing
Accessories
Now that aw-schema.rkt is somewhat large, DrRacket may take a long time to run it. For this reason, I recommend that you require it from another file, where you will do any work that does not require a change to the define-schema code. (The answer keys that I will share with you do not follow this advice.)
> (adventure-works-schema '(_ ProductCategory)) '(ModifiedDate Name ProductCategoryID rowguid)
> (adventure-works-schema '(ProductCategoryID _)) '(ProductCategory ProductSubcategory)
(The previous REPL interaction is very useful for discovering relationships between tables in this database. This is one reason that I prefer primary key columns to be named like "ProductCategoryID" rather than "ID".)
(adventure-works-schema 'tables)
2.3 The Tasks
Create a query.
Repeatedly refactor that query. In this part, you will add definitions to your aw-schema.rkt file.
Recap and verify that your refactorings were correct. Here I will provide an answer key that you can check.
Proceed to the next task.
During refactoring, I will link you to refactoring recipes that you will follow. These recipes are meant to be very thorough. As you build confidence with the recipes, you might start taking shortcuts. This is fine. You can use the Recap to make sure you are in sync.
2.3.1 Task 1: Subcategories & Categories
I want to see a list of Subcategories with the Category that they belong to. |
|
Remember you can use (adventure-works-schema 'tables) to list all the tables.
> (aw:show-table (from subcat ProductSubcategory (limit 5)))
Show TableShow SQL
select subcat.*
from ProductSubcategory subcat
limit 5
ProductSubcategoryID
ProductCategoryID
Name
rowguid
ModifiedDate
1
1
Mountain Bikes
{2D364ADE-264A-433C-B092-4FCBF3804E01}
2008-04-30 00:00:00
2
1
Road Bikes
{000310C0-BCC8-42C4-B0C3-45AE611AF06B}
2008-04-30 00:00:00
3
1
Touring Bikes
{02C5061D-ECDC-4274-B5F1-E91D76BC3F37}
2008-04-30 00:00:00
4
2
Handlebars
{3EF2C725-7135-4C85-9AE6-AE9A3BDD9283}
2008-04-30 00:00:00
5
2
Bottom Brackets
{A9E54089-8A1E-4CF5-8646-E3801F685934}
2008-04-30 00:00:00
> (adventure-works-schema '(ProductCategoryID _)) '(ProductCategory ProductSubcategory)
(from subcat ProductSubcategory (limit 5) (join cat ProductCategory (join-on (.= (ProductCategoryID cat) (ProductCategoryID subcat)))))
> (aw:show-table (from subcat ProductSubcategory (limit 5) (join cat ProductCategory (join-on (.= (ProductCategoryID cat) (ProductCategoryID subcat)))) (select (Name subcat)) (select (Name cat))))
Show TableShow SQL
select subcat.Name as Name
, cat.Name as Name
from ProductSubcategory subcat
inner join ProductCategory cat
on (cat.ProductCategoryID = subcat.ProductCategoryID)
limit 5
Name
Name
Mountain Bikes
Bikes
Road Bikes
Bikes
Touring Bikes
Bikes
Handlebars
Components
Bottom Brackets
Components
Now the query seems to be returning the correct data. We saw the "Mountain Bikes" subcategory earlier, but now we also see that it belongs to the "Bikes" category. One obvious problem is that both columns are shown as "Name". We will immediately fix that during refactoring.
(from subcat ProductSubcategory (limit 5) (join cat ProductCategory (join-on (.= (ProductCategoryID cat) (ProductCategoryID subcat)))) (select (SubcategoryName subcat)) (select (Name cat)))
(from subcat ProductSubcategory (limit 5) (join cat ProductCategory (join-on (.= (ProductCategoryID cat) (ProductCategoryID subcat)))) (select (SubcategoryName subcat)) (select (CategoryName cat)))
(from subcat ProductSubcategory (limit 5) (join cat (ProductCategory subcat)) (select (SubcategoryName subcat)) (select (CategoryName cat)))
(from subcat ProductSubcategory (limit 5) (join cat (ProductCategory subcat)) (select (SubcategoryName subcat)) (select (CategoryName (ProductCategory subcat))))
But we are still not done refactoring. Use the Scalar Flattening recipe to create the following equivalent query:
> (aw:show-table (from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat))))
Show TableShow SQL
select subcat.Name as SubcategoryName
, cat.Name as CategoryName
from ProductSubcategory subcat
inner join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
limit 5
SubcategoryName
CategoryName
Mountain Bikes
Bikes
Road Bikes
Bikes
Touring Bikes
Bikes
Handlebars
Components
Bottom Brackets
Components
And now we are done!
Refactoring RecapWhile refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(SubcategoryName ProductSubcategory) (CategoryName ProductCategory) (ProductCategory ProductSubcategory) (CategoryName ProductSubcategory)
My solution for this section is here.
2.3.2 Task 2: Products & Subcategories & Categories
Show me a list of Products with Subcategory and Category names. |
|
> (aw:show-table (from prd Product (limit 5)))
Show TableShow SQL
select prd.*
from Product prd
limit 5
ProductID
Name
ProductNumber
MakeFlag
FinishedGoodsFlag
Color
SafetyStockLevel
ReorderPoint
StandardCost
ListPrice
Size
SizeUnitMeasureCode
WeightUnitMeasureCode
Weight
DaysToManufacture
ProductLine
Class
Style
ProductSubcategoryID
ProductModelID
SellStartDate
SellEndDate
DiscontinuedDate
rowguid
ModifiedDate
1
Adjustable Race
AR-5381
0
0
#<sql-null>
1000
750
0
0.0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
2008-04-30 00:00:00
#<sql-null>
#<sql-null>
{694215B7-08F7-4C0D-ACB1-D734BA44C0C8}
2014-02-08 10:01:36.827000000
2
Bearing Ball
BA-8327
0
0
#<sql-null>
1000
750
0
0.0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
2008-04-30 00:00:00
#<sql-null>
#<sql-null>
{58AE3C20-4F3A-4749-A7D4-D568806CC537}
2014-02-08 10:01:36.827000000
3
BB Ball Bearing
BE-2349
1
0
#<sql-null>
800
600
0
0.0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
1
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
2008-04-30 00:00:00
#<sql-null>
#<sql-null>
{9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E}
2014-02-08 10:01:36.827000000
4
Headset Ball Bearings
BE-2908
0
0
#<sql-null>
800
600
0
0.0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
2008-04-30 00:00:00
#<sql-null>
#<sql-null>
{ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B}
2014-02-08 10:01:36.827000000
316
Blade
BL-2036
1
0
#<sql-null>
800
600
0
0.0
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
1
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
#<sql-null>
2008-04-30 00:00:00
#<sql-null>
#<sql-null>
{E73E9750-603B-4131-89F5-3DD15ED5FF80}
2014-02-08 10:01:36.827000000
The code (?? expr /void) says that "if expr is null, it should not be considered equal to anything." You can read more about this at Nullability.
(from prd Product (limit 5) (join subcat ProductSubcategory (join-type 'left) (join-on (.= (ProductSubcategoryID subcat) (?? (ProductSubcategoryID prd) /void)))))
> (aw:show-table (from prd Product (limit 5) (join subcat ProductSubcategory (join-type 'left) (join-on (.= (ProductSubcategoryID subcat) (?? (ProductSubcategoryID prd) /void)))) (select (Name prd)) (select (ProductNumber prd)) (select (SubcategoryName subcat))))
Show TableShow SQL
select prd.Name as Name
, prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
from Product prd
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
limit 5
Name
ProductNumber
SubcategoryName
Adjustable Race
AR-5381
#<sql-null>
Bearing Ball
BA-8327
#<sql-null>
BB Ball Bearing
BE-2349
#<sql-null>
Headset Ball Bearings
BE-2908
#<sql-null>
Blade
BL-2036
#<sql-null>
> (adventure-works-schema '(CategoryName _)) '(ProductCategory ProductSubcategory)
Sweet! We can see that CategoryName is defined for ProductSubcategory. We did this as part of the previous task’s refactoring. That investment pays off now, because our current query has an instance of ProductSubcategory, so we can pass it into CategoryName:
> (aw:show-table (from prd Product (limit 5) (join subcat ProductSubcategory (join-type 'left) (join-on (.= (ProductSubcategoryID subcat) (?? (ProductSubcategoryID prd) /void)))) (select (Name prd)) (select (ProductNumber prd)) (select (SubcategoryName subcat)) (select (CategoryName subcat))))
Show TableShow SQL
select prd.Name as Name
, prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, cat.Name as CategoryName
from Product prd
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
limit 5
Name
ProductNumber
SubcategoryName
CategoryName
Adjustable Race
AR-5381
#<sql-null>
#<sql-null>
Bearing Ball
BA-8327
#<sql-null>
#<sql-null>
BB Ball Bearing
BE-2349
#<sql-null>
#<sql-null>
Headset Ball Bearings
BE-2908
#<sql-null>
#<sql-null>
Blade
BL-2036
#<sql-null>
#<sql-null>
This query looks good!
(from prd Product (limit 5) (join subcat (ProductSubcategory prd)) (select (Name prd)) (select (ProductNumber prd)) (select (SubcategoryName subcat)) (select (CategoryName subcat)))
(from prd Product (limit 5) (join subcat (ProductSubcategory prd)) (select (Name prd)) (select (ProductNumber prd)) (select (SubcategoryName (ProductSubcategory prd))) (select (CategoryName (ProductSubcategory prd))))
(from prd Product (limit 5) (select (Name prd)) (select (ProductNumber prd)) (select (SubcategoryName prd)) (select (CategoryName prd)))
Use the Name Clarification recipe to create the following equivalent query:
> (aw:show-table (from prd Product (limit 5) (select (ProductName prd)) (select (ProductNumber prd)) (select (SubcategoryName prd)) (select (CategoryName prd))))
Show TableShow SQL
select prd.Name as ProductName
, prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, cat.Name as CategoryName
from Product prd
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
limit 5
ProductName
ProductNumber
SubcategoryName
CategoryName
Adjustable Race
AR-5381
#<sql-null>
#<sql-null>
Bearing Ball
BA-8327
#<sql-null>
#<sql-null>
BB Ball Bearing
BE-2349
#<sql-null>
#<sql-null>
Headset Ball Bearings
BE-2908
#<sql-null>
#<sql-null>
Blade
BL-2036
#<sql-null>
#<sql-null>
And now we are done!
Refactoring RecapWhile refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(ProductSubcategory Product) (SubcategoryName Product) (CategoryName Product) (ProductName Product)
My solution for this section is here.
2.3.3 Task 3: Products with Non-Zero Sales
Show me a list of Products that have non-zero sales, with Subcategory and Category names. |
|
(adventure-works-schema 'tables)
(adventure-works-schema '(_ SalesOrderHeader)) (adventure-works-schema '(_ SalesOrderDetail))
> (aw:show-table (from prd Product (limit 5) (select (ProductName prd)) (select (ProductNumber prd)) (select (SubcategoryName prd)) (select (CategoryName prd)) (where (exists (from dtl SalesOrderDetail (where (.= (ProductID dtl) (ProductID prd))))))))
Show TableShow SQL
select prd.Name as ProductName
, prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, cat.Name as CategoryName
from Product prd
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
where exists (
select dtl.*
from SalesOrderDetail dtl
where (dtl.ProductID = prd.ProductID)
)
limit 5
ProductName
ProductNumber
SubcategoryName
CategoryName
Sport-100 Helmet, Red
HL-U509-R
Helmets
Accessories
Sport-100 Helmet, Black
HL-U509
Helmets
Accessories
Mountain Bike Socks, M
SO-B909-M
Socks
Clothing
Mountain Bike Socks, L
SO-B909-L
Socks
Clothing
Sport-100 Helmet, Blue
HL-U509-B
Helmets
Accessories
This query looks good!
> (aw:show-table (from prd Product (limit 5) (select (ProductName prd)) (select (ProductNumber prd)) (select (SubcategoryName prd)) (select (CategoryName prd)) (where (HasSales? prd))))
Show TableShow SQL
select prd.Name as ProductName
, prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, cat.Name as CategoryName
from Product prd
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
left join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
where exists (
select dtl.*
from SalesOrderDetail dtl
where (dtl.ProductID = prd.ProductID)
)
limit 5
ProductName
ProductNumber
SubcategoryName
CategoryName
Sport-100 Helmet, Red
HL-U509-R
Helmets
Accessories
Sport-100 Helmet, Black
HL-U509
Helmets
Accessories
Mountain Bike Socks, M
SO-B909-M
Socks
Clothing
Mountain Bike Socks, L
SO-B909-L
Socks
Clothing
Sport-100 Helmet, Blue
HL-U509-B
Helmets
Accessories
And now we are done!
Refactoring RecapWhile refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.(HasSales? Product)
Notice the encapsulation that (HasSales? Product) provides. Today it is implemented using exists, but in the future we might decide to denormalize and add a HasSales column to the Product table. The important point is that we can choose a different implementation without breaking any calling code - the calling code will always remain (HasSales? product). This is not true in SQL - switching from an "exists" implementation to a simple column access would require updating all the call sites.
2.3.4 Task 4: Sales by Product
Show me a list of the best-selling Products of all time. Sort by total revenue. Include total quantity sold and subcategory. |
|
So far, each query prior to refactoring has been similar to how you would write the query using plain SQL. This time, the initial query will use grouped join aggregation, a technique that is unlike anything in SQL. You don’t need to understand this right now in order to perform the refactoring, but if you are curious, the Aggregates section should be next on your reading list.
We need to create a query. The first step is to determine the table we need to query. This query will be of the Product table. But in order to get "total revenue" and "total quantity sold", we will need to use some aggregations. We recall from the previous task that the SalesOrderDetail table points to the Product table. From the other direction, we can say that each Product has a group of SalesOrderDetail records – the records whose ProductID matches the Product. We build the detailsG grouped join and perform aggregations over it to solve this task.
> (aw:show-table (from prd Product (limit 5) (select (ProductNumber prd)) (select (SubcategoryName prd)) (join detailsG SalesOrderDetail (join-type 'left) (group-by (ProductID detailsG)) (join-on (.= (ProductID detailsG) (ProductID prd)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
Show TableShow SQL
select prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from Product prd
left join (
select detailsG.ProductID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
group by detailsG.ProductID
) detailsG
on (detailsG.__INJECT0 = prd.ProductID)
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
order by detailsG.__INJECT1 desc
limit 5
ProductNumber
SubcategoryName
TotalSales
TotalQtySold
BK-M68B-38
Mountain Bikes
4400592.8
2977
BK-M68B-42
Mountain Bikes
4009494.76
2664
BK-M68S-38
Mountain Bikes
3693678.03
2394
BK-M68S-42
Mountain Bikes
3438478.86
2234
BK-M68S-46
Mountain Bikes
3434256.94
2216
This query looks good!
> (aw:show-table (from prd Product (limit 5) (select (ProductNumber prd)) (select (SubcategoryName prd)) (join detailsG (DetailsG prd)) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
Show TableShow SQL
select prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from Product prd
inner join (
select detailsG.ProductID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
group by detailsG.ProductID
) detailsG
on (detailsG.__INJECT0 = prd.ProductID)
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
order by detailsG.__INJECT1 desc
limit 5
ProductNumber
SubcategoryName
TotalSales
TotalQtySold
BK-M68B-38
Mountain Bikes
4400592.8
2977
BK-M68B-42
Mountain Bikes
4009494.76
2664
BK-M68S-38
Mountain Bikes
3693678.03
2394
BK-M68S-42
Mountain Bikes
3438478.86
2234
BK-M68S-46
Mountain Bikes
3434256.94
2216
You might want to go further with the refactoring by using the Inline Join recipe to move detailsG inline, then using the Scalar -> Schema Definition recipe to define (TotalSales Product) and (TotalQtySold Product). This might be a good idea, and you are welcome to do so. But I am going to stop here for now.
Refactoring RecapWhile refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.(DetailsG Product)
My solution for this section is here.
2.3.5 Task 5: Sales by Subcategory
Show me a list of the best-selling Subcategories of all time. Sort by total revenue. Include total quantity sold and category name. |
|
> (aw:show-table (from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG SalesOrderDetail (join-type 'left) (join prd Product (join-on (.= (ProductID prd) (ProductID detailsG)))) (group-by (ProductSubcategoryID prd)) (join-on (.= (?? (ProductSubcategoryID prd) /void) (ProductSubcategoryID subcat)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
Show TableShow SQL
select subcat.Name as SubcategoryName
, cat.Name as CategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from ProductSubcategory subcat
left join (
select prd.ProductSubcategoryID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
inner join Product prd
on (prd.ProductID = detailsG.ProductID)
group by prd.ProductSubcategoryID
) detailsG
on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))
inner join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
order by detailsG.__INJECT1 desc
limit 5
SubcategoryName
CategoryName
TotalSales
TotalQtySold
Road Bikes
Bikes
43909437.51
47196
Mountain Bikes
Bikes
36445443.94
28321
Touring Bikes
Bikes
14296291.26
14751
Mountain Frames
Components
4713930.23
11621
Road Frames
Components
3851350.6
11753
This query looks good!
(from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG SalesOrderDetail (join-type 'left) (join prd (Product detailsG)) (group-by (ProductSubcategoryID prd)) (join-on (.= (?? (ProductSubcategoryID prd) /void) (ProductSubcategoryID subcat)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG))))
(from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG SalesOrderDetail (join-type 'left) (join prd (Product detailsG)) (group-by (ProductSubcategoryID (Product detailsG))) (join-on (.= (?? (ProductSubcategoryID (Product detailsG)) /void) (ProductSubcategoryID subcat)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG))))
(from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG SalesOrderDetail (join-type 'left) (group-by (ProductSubcategoryID detailsG)) (join-on (.= (?? (ProductSubcategoryID detailsG) /void) (ProductSubcategoryID subcat)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG))))
Use the Grouped Join -> Schema Definition recipe to create the following equivalent query:
> (aw:show-table (from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG (DetailsG subcat)) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
Show TableShow SQL
select subcat.Name as SubcategoryName
, cat.Name as CategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from ProductSubcategory subcat
inner join (
select prd.ProductSubcategoryID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
inner join Product prd
on (prd.ProductID = detailsG.ProductID)
group by prd.ProductSubcategoryID
) detailsG
on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))
inner join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
order by detailsG.__INJECT1 desc
limit 5
SubcategoryName
CategoryName
TotalSales
TotalQtySold
Road Bikes
Bikes
43909437.51
47196
Mountain Bikes
Bikes
36445443.94
28321
Touring Bikes
Bikes
14296291.26
14751
Mountain Frames
Components
4713930.23
11621
Road Frames
Components
3851350.6
11753
Again, you might want to go further with the refactoring and define properties like (TotalSales ProductSubcategory) and (TotalQtySold ProductSubcategory). This might be a good idea, and you are welcome to do so. But I am going to stop here for now.
Refactoring RecapWhile refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(Product SalesOrderDetail) (ProductSubcategoryID SalesOrderDetail) (DetailsG ProductSubcategory)
My solution for this section is here.
2.3.6 Task 6: Sales by Anything
Show me a list of the best-selling Products of all time. Sort by total revenue. Include total quantity sold and subcategory. |
|
Show me a list of the best-selling Subcategories of all time. Sort by total revenue. Include total quantity sold and category name. |
|
We can generalize this to Show me a list of the best-selling THINGS of all time. Sort by total revenue. Include total quantity sold and SOME_OTHER_STUFF.
; Sales by Product: (from prd Product (limit 5) (select (ProductNumber prd)) (select (SubcategoryName prd)) (join detailsG (DetailsG prd)) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))) ; Sales by Subcategory: (from subcat ProductSubcategory (limit 5) (select (SubcategoryName subcat)) (select (CategoryName subcat)) (join detailsG (DetailsG subcat)) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG))))
> (define original (from p Product (select (ProductNumber p)) (select (ProductName p)) (select (SubcategoryName p)) (select (CategoryName p))))
> (define first-half (from p Product (select (ProductNumber p)) (select (ProductName p))))
> (define both-halves ; This appends two more clauses to `first-half` (from x first-half (select (SubcategoryName x)) (select (CategoryName x)))) > (equal? original both-halves) #t
(define (sales-report first-half) (from x first-half (limit 5) (join detailsG (DetailsG x)) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
> (aw:show-table (sales-report (from prd Product (select (ProductNumber prd)) (select (SubcategoryName prd)))))
Show TableShow SQL
select prd.ProductNumber as ProductNumber
, subcat.Name as SubcategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from Product prd
inner join (
select detailsG.ProductID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
group by detailsG.ProductID
) detailsG
on (detailsG.__INJECT0 = prd.ProductID)
left join ProductSubcategory subcat
on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))
order by detailsG.__INJECT1 desc
limit 5
ProductNumber
SubcategoryName
TotalSales
TotalQtySold
BK-M68B-38
Mountain Bikes
4400592.8
2977
BK-M68B-42
Mountain Bikes
4009494.76
2664
BK-M68S-38
Mountain Bikes
3693678.03
2394
BK-M68S-42
Mountain Bikes
3438478.86
2234
BK-M68S-46
Mountain Bikes
3434256.94
2216
> (aw:show-table (sales-report (from subcat ProductSubcategory (select (SubcategoryName subcat)) (select (CategoryName subcat)))))
Show TableShow SQL
select subcat.Name as SubcategoryName
, cat.Name as CategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from ProductSubcategory subcat
inner join (
select prd.ProductSubcategoryID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
inner join Product prd
on (prd.ProductID = detailsG.ProductID)
group by prd.ProductSubcategoryID
) detailsG
on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))
inner join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
order by detailsG.__INJECT1 desc
limit 5
SubcategoryName
CategoryName
TotalSales
TotalQtySold
Road Bikes
Bikes
43909437.51
47196
Mountain Bikes
Bikes
36445443.94
28321
Touring Bikes
Bikes
14296291.26
14751
Mountain Frames
Components
4713930.23
11621
Road Frames
Components
3851350.6
11753
Interesting! The sales-report procedure accepts a query of any table for which DetailsG is defined! It appends some more clauses to create a larger query.
My solution for this section is here.
We didn’t add anything to our schema definition in this section.
2.3.7 Task 7: Sales by Anything with Date Range
(define (sales-report first-half #:start-date [start-date #f] #:end-date [end-date #f]) (from x first-half (limit 5) (join detailsG (DetailsG x) ; Like queries, joins are also appendable. ; The following clauses are appended to the join ; that (DetailsG x) returned: (join soh SalesOrderHeader (join-on (.= (SalesOrderID soh) (SalesOrderID detailsG)))) (when start-date (where (.>= (OrderDate soh) start-date))) (when end-date (where (.< (OrderDate soh) end-date)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
> (aw:show-table (sales-report #:start-date (val "2012-01-01" Datetime?) #:end-date (val "2013-01-01" Datetime?) (from subcat ProductSubcategory (select (SubcategoryName subcat)) (select (CategoryName subcat)))))
Show TableShow SQL
select subcat.Name as SubcategoryName
, cat.Name as CategoryName
, round(detailsG.__INJECT1, 2) as TotalSales
, detailsG.__INJECT2 as TotalQtySold
from ProductSubcategory subcat
inner join (
select prd.ProductSubcategoryID as __INJECT0
, sum(detailsG.LineTotal) as __INJECT1
, sum(detailsG.OrderQty) as __INJECT2
from SalesOrderDetail detailsG
inner join SalesOrderHeader soh
on (soh.SalesOrderID = detailsG.SalesOrderID)
inner join Product prd
on (prd.ProductID = detailsG.ProductID)
where (soh.OrderDate >= '2012-01-01')
and (soh.OrderDate < '2013-01-01')
group by prd.ProductSubcategoryID
) detailsG
on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))
inner join ProductCategory cat
on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))
order by detailsG.__INJECT1 desc
limit 5
SubcategoryName
CategoryName
TotalSales
TotalQtySold
Road Bikes
Bikes
17193556.04
19460
Mountain Bikes
Bikes
11791959.8
9034
Road Frames
Components
1692467.03
5564
Mountain Frames
Components
1541340.67
3168
Wheels
Components
492218.86
3802
My solution for this section is here.
We didn’t add anything to our schema definition in this section.
2.3.7.1 Extra Credit 1
(define (sales-report some-query #:start-date [start-date #f] #:end-date [end-date #f]) (from x some-query (limit 5) (join detailsG (DetailsG x) (when start-date (where (.>= (OrderDate detailsG) start-date))) (when end-date (where (.< (OrderDate detailsG) end-date)))) (select (>> (round (sum (LineTotal detailsG)) 2) #:as 'TotalSales)) (select (>> (sum (OrderQty detailsG)) #:as 'TotalQtySold)) (order-by 'desc (sum (LineTotal detailsG)))))
(SalesOrderHeader SalesOrderDetail)
(OrderDate SalesOrderDetail)
My solution for this section is here.
2.3.7.2 Extra Credit 2 (More Challenging)
Extra Credit: Extend the definition of DetailsG so that it is defined for the ProductCategory, SalesTerritory, and SalesPerson tables. Try using sales-report with these tables. Here are some hints:
The primary key of the ProductCategory table is ProductCategoryID, so you will need to group the SalesOrderDetail records by ProductCategoryID. You will need to join from SalesOrderDetail to Product to ProductSubcategory. The ProductSubcategory table has a ProductCategoryID column.
The primary key of the SalesTerritory table is TerritoryID, so you will need to group the SalesOrderDetail records by TerritoryID. You will need to join from SalesOrderDetail to SalesOrderHeader. The SalesOrderHeader table has a TerritoryID column.
The primary key of the SalesPerson table is BusinessEntityID. There is a foreign key from SalesOrderHeader.SalesPersonID to SalesPerson.BusinessEntityID. You probably have already defined the join from SalesOrderDetail to SalesOrderHeader. This is sufficient to solve the task, but the SalesPerson table does not contain much information. If you want to see, for example, the full name of the SalesPerson you will have to join to the Person table.
Part 2: My definition of (DetailsG SalesTerritory) is here.
Part 3: My definition of (DetailsG SalesPerson) is here.
2.4 Appendix A: Generating the Initial Schema Definition
Extracting table and column information for use with define-schema is currently an ad-hoc process. Plisqin may make this easier in the future. But it is already pretty easy if you are using SQL Server or PostgreSQL.
The AdventureWorks database is originally for SQL Server; Plisqin provides a port of this database to SQLite. So to generate the initial AdventureWorks schema, I ran this query against the SQL Server original. This produces Racket code that just needs some simple text massaging.
> (aw:show-table "select type, name from sqlite_master where name like 'Business%'")
Show TableShow SQL
select type, name from sqlite_master where name like 'Business%'
type
name
table
BusinessEntity
table
BusinessEntityAddress
table
BusinessEntityContact
> (aw:show-table "pragma table_info(BusinessEntity)")
Show TableShow SQL
pragma table_info(BusinessEntity)
cid
name
type
notnull
dflt_value
pk
0
BusinessEntityID
integer_id
1
#<sql-null>
0
1
rowguid
text
1
#<sql-null>
0
2
ModifiedDate
text
1
#<sql-null>
0