3 Grouping and splitting

The overwhelming majority of operations in Sawzall respect the "grouping" of a data-frame. Because most operations are performed on groups defined by one or more variables, grouping takes an existing data-frame and converts it into a grouped one, on which later operations are applied group by group.

3.1 Grouping

procedure

(grouped-data-frame? v) → boolean?

  v : any/c
Determines whether the input v is a grouped data-frame. Grouped data-frames can only be constructed by group-with, or as the result of another operation on an existing grouped data-frame.
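As a rough illustration of the predicate, the sketch below checks a plain frame and a grouped one. The data is hypothetical and only mirrors the example-df used on this page; row-df is assumed to be Sawzall's row-wise constructor, so substitute whatever constructor you normally use if it is not available.

```racket
#lang racket
(require sawzall data-frame)

;; Hypothetical data mirroring the example-df shown on this page.
;; Assumption: row-df builds a data-frame row by row from the listed columns.
(define example-df
  (row-df [grp trt adult juv]
          "a" "b" 1 10
          "a" "b" 2 20
          "b" "a" 3 30
          "b" "b" 4 40
          "b" "b" 5 50))

;; A freshly constructed frame is not grouped.
(grouped-data-frame? example-df)                     ; => #f

;; group-with is the documented way to obtain a grouped frame.
(grouped-data-frame? (group-with example-df "grp"))  ; => #t
```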

procedure

(group-with df var ...) → grouped-data-frame?

  df : data-frame?
  var : string?
Takes an existing data-frame df and groups it by each of the given variables var in order, returning a grouped data-frame.

This does not change how the data-frame is displayed with show or introspect, but the result is internally different, and cannot be used with regular data-frame operators like df-select.

Example:
> (~> example-df
      (group-with "grp" "trt")
      show)

data-frame: 5 rows x 4 columns
groups: (trt grp)
┌───┬───┬─────┬───┐
│grp│trt│adult│juv│
├───┼───┼─────┼───┤
│a  │b  │1    │10 │
├───┼───┼─────┼───┤
│a  │b  │2    │20 │
├───┼───┼─────┼───┤
│b  │a  │3    │30 │
├───┼───┼─────┼───┤
│b  │b  │4    │40 │
├───┼───┼─────┼───┤
│b  │b  │5    │50 │
└───┴───┴─────┴───┘

procedure

(ungroup df) → data-frame?

  df : (or/c data-frame? grouped-data-frame?)
Removes all levels of grouping from a grouped data-frame, returning a single, ungrouped data-frame. In most cases, you’ll want to do this before passing your wrangled data to some other application.

If df is not grouped, this does nothing.

ungroup-once removes only the last level of grouping from a grouped data-frame. For example, if a frame is grouped by X and then Y, running ungroup-once on it leaves it grouped by just X.

If df is not grouped, this does nothing.
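The difference between the two ungrouping procedures can be sketched as follows. The data is hypothetical, and row-df is assumed to be Sawzall's row-wise constructor; everything else uses only the procedures described on this page plus data-frame? and df-row-count from the data-frame library.

```racket
#lang racket
(require sawzall data-frame)

;; Hypothetical data mirroring the example-df shown on this page.
;; Assumption: row-df builds a data-frame row by row from the listed columns.
(define example-df
  (row-df [grp trt adult juv]
          "a" "b" 1 10
          "a" "b" 2 20
          "b" "a" 3 30
          "b" "b" 4 40
          "b" "b" 5 50))

;; Two levels of grouping: first by "grp", then by "trt".
(define grouped (group-with example-df "grp" "trt"))

;; Drop only the last level; per the description above,
;; the result should remain grouped by "grp".
(define by-grp (ungroup-once grouped))

;; Drop every level and get back a plain data-frame.
(define flat (ungroup grouped))

(data-frame? flat)          ; => #t
(grouped-data-frame? flat)  ; => #f

;; show accepts both grouped and plain frames.
(show by-grp)
(show flat)
```

Calling ungroup at the end of a pipeline is the safer default when the result is handed to code that expects an ordinary data-frame.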

3.2 Splitting

The following operations behave similarly to their counterparts above, but they return a list of data-frames instead of a grouped data-frame, so you must use map to split further or to apply an operation to each piece.

These operations are also notably less performant due to the amount of copying involved.

procedure

(split-with df var) → (listof data-frame?)

  df : data-frame?
  var : string?
Splits the given data-frame df along the input variable var, returning a list of data-frames, one for each distinct value of var.

procedure

(combine df ...) → data-frame?

  df : data-frame?
Appends the shared series of the input data-frames into a single data-frame.
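A minimal sketch of a split, map, combine round trip, under the same assumptions as the rest of this page's examples: the data is hypothetical, and row-df is assumed to be Sawzall's row-wise constructor. The order of the frames returned by split-with is not stated here, so the sketch avoids depending on it.

```racket
#lang racket
(require sawzall data-frame)

;; Hypothetical data mirroring the example-df shown on this page.
;; Assumption: row-df builds a data-frame row by row from the listed columns.
(define example-df
  (row-df [grp trt adult juv]
          "a" "b" 1 10
          "a" "b" 2 20
          "b" "a" 3 30
          "b" "b" 4 40
          "b" "b" 5 50))

;; One plain data-frame per value of "grp".
(define parts (split-with example-df "grp"))

;; Because the result is an ordinary list, per-piece work goes through map.
(define sizes (map df-row-count parts))

;; Stitch the pieces back into a single data-frame.
(define rejoined (apply combine parts))

(length parts)             ; number of pieces
(df-row-count rejoined)    ; should match the original row count
```

When every later step is a Sawzall operation, group-with is usually the better fit; split-with is mainly useful when each piece has to pass through code that only understands plain data-frames.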