7 Summarizing
This operation summarizes a data frame into a smaller one by computing summary statistics over whole columns, using vectorized operations.
syntax
(aggregate df [new-column (bound-column ...) body ...] ...)
df : (or/c data-frame? grouped-data-frame?)
Each clause specifies one new column. The resulting column is named new-column, and its value is computed by the expressions in body.
The variables available in body are those named by bound-column. Unlike create, every variable bound in body holds the entirety of its column as a vector. body is expected to produce a single value, which is the "aggregation" of those vectors.
The binding structure of aggregate is like let: all bound-columns come from df, not from columns created by other clauses in the same aggregate.
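The clause semantics above can be pictured with a small plain-Racket sketch that does not use the library at all. It treats a clause as a function from whole column vectors to a single value. The column data and the names count-agg and ratio-agg are hypothetical, and sum is a helper defined here on the assumption that the sum used in the examples adds up the elements of a vector.

```racket
#lang racket/base

;; Hypothetical column data, chosen only for illustration.
(define adult (vector 1 2 3 4 5))
(define juv (vector 10 20 30 40 50))

;; Assumed helper: add up the elements of a vector.
(define (sum vec)
  (for/sum ([v (in-vector vec)]) v))

;; A clause such as [count (adult) (vector-length adult)] behaves roughly
;; like applying a function to the whole column at once.
(define count-agg
  ((lambda (adult) (vector-length adult)) adult)) ; 5

;; A clause may bind several columns; each one arrives as a full vector.
(define ratio-agg
  ((lambda (adult juv) (/ (sum juv) (sum adult))) adult juv)) ; 10
```

Each body returns one value, so each clause contributes exactly one cell per group (or one cell in total for an ungrouped data frame).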
If the input is a grouped data frame, the aggregation is computed within each group, and the last layer of grouping is implicitly removed afterwards.
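As a rough illustration of grouped aggregation, the following plain-Racket sketch partitions rows by two keys and reduces each partition to one summary row. It does not use the library. The row data is hypothetical, chosen to be consistent with the outputs shown in the examples (the actual definition of example-df is not given in this section), and summarize-by-grp-trt is a name made up for this sketch.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical rows in the order (trt grp juv adult).
(define rows
  '(("b" "a" 10 1)
    ("b" "a" 20 2)
    ("a" "b" 30 3)
    ("b" "b" 40 4)
    ("b" "b" 50 5)))

;; Partition the rows by the (grp, trt) pair, then reduce each partition
;; to a single summary row of the form (grp trt adult-sum juv-sum).
(define (summarize-by-grp-trt rows)
  (for/list ([group (in-list (group-by (lambda (r) (list (second r) (first r)))
                                       rows))])
    (list (second (first group))          ; grp, shared by the whole group
          (first (first group))           ; trt, shared by the whole group
          (apply + (map fourth group))    ; adult-sum
          (apply + (map third group)))))  ; juv-sum

(define summary (summarize-by-grp-trt rows))
```

The sketch yields one row per (grp, trt) combination. It does not model what happens to the grouping itself: in the library, the result would stay grouped by grp only, since the last layer (trt) is removed.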
> (~> example-df (aggregate [sum (adult) (vector-length adult)]) show)
data-frame: 1 rows x 1 columns
┌───┐
│sum│
├───┤
│5  │
└───┘
> (~> example-df
      (group-with "grp")
      (aggregate [adult-sum (adult) (sum adult)]
                 [juv-sum (juv) (sum juv)])
      show)
data-frame: 2 rows x 3 columns
┌───────┬─────────┬───┐
│juv-sum│adult-sum│grp│
├───────┼─────────┼───┤
│30     │3        │a  │
├───────┼─────────┼───┤
│120    │12       │b  │
└───────┴─────────┴───┘
> (~> example-df
      (group-with "grp" "trt")
      (aggregate [adult-sum (adult) (sum adult)]
                 [juv-sum (juv) (sum juv)])
      show)
data-frame: 3 rows x 4 columns
groups: (grp)
┌───┬─────────┬───────┬───┐
│grp│adult-sum│juv-sum│trt│
├───┼─────────┼───────┼───┤
│a  │3        │30     │b  │
├───┼─────────┼───────┼───┤
│b  │3        │30     │a  │
├───┼─────────┼───────┼───┤
│b  │9        │90     │b  │
└───┴─────────┴───────┴───┘