10 Pivoting

10 Pivoting🔗ℹ

The majority of data found in the wild will not be tidy, and therefore not work with the above operations. The goal of pivoting is to help make your data tidy.

Tidy data is data where:

Every column is a variable.
Every row is an observation.
Every cell is a single value.

So, for example, if you were to have a column corresponding to a value and not a variable (such as a site), or a column corresponding to a "type" of observation, these operations would help.

syntax
(pivot-longer df slice-spec
              #:names-to names-to #:values-to values-to)

   df : data-frame?
   names-to : string?
   values-to : string?

Returns a new data-frame that is the input df pivoted "longer", so less columns, more rows. This is useful for tidying wide-form data.

slice-spec is an expression in the slice sub-language. See Slicing for more information on this language. Any not column returned by evaluating slice-spec will be brought along so that former observations line up, but its data will not be modified.

All the names of every column selected by slice-spec is brought into a new column with name names-to, and all the values are brought into a new column with name values-to.

This function does not work with grouped data frames, as it has potential to destroy some internal invariants.

Examples:

> (define wide-df
    (row-df [day hour a  b  c]
             1   10   97 84 55
             2   11   78 47 54))
> (~> wide-df
      (pivot-longer ["a" "b" "c"] #:names-to "site" #:values-to "catch")
      show)
data-frame: 6 rows x 4 columns
┌─────┬────┬────┬───┐
│catch│hour│site│day│
├─────┼────┼────┼───┤
│97   │10  │a   │1  │
├─────┼────┼────┼───┤
│78   │11  │a   │2  │
├─────┼────┼────┼───┤
│84   │10  │b   │1  │
├─────┼────┼────┼───┤
│47   │11  │b   │2  │
├─────┼────┼────┼───┤
│55   │10  │c   │1  │
├─────┼────┼────┼───┤
│54   │11  │c   │2  │
└─────┴────┴────┴───┘

procedure
(pivot-wider df
#:names-from names-from
#:values-from values-from) → data-frame?
  df : data-frame?
  names-from : string?
  values-from : string?

Returns a new data-frame that is the input df pivoted "wider", so less rows, more columns. This is useful for putting data into "wide form", optimized for data entry for export into spreadsheet software like Excel, or for some convoluted tidying pipeline.

names-from is the column to create new columns from, and values-from is the column to get the corresponding data from.

If a value is not found in the long format data-frame, it will be replaced with "NA" (#f).

This function does not work with grouped data frames, as it has potential to destroy some internal invariants.

Examples:

> (define long-df1
    (row-df [day grp val]
             1   "A" 10
             1   "B" 20
             2   "B" 30))
> (~> long-df1
      (pivot-wider #:names-from "grp" #:values-from "val")
      show)
data-frame: 2 rows x 3 columns
┌──┬───┬──┐
│A │day│B │
├──┼───┼──┤
│10│1  │20│
├──┼───┼──┤
│#f│2  │30│
└──┴───┴──┘
> (define long-df2
    (row-df [day hour grp val]
             1   10   "a" 83
             1   10   "b" 78
             1   11   "a" 80
             1   11   "b" 105
             2   10   "a" 95
             2   10   "b" 77
             2   11   "a" 96
             2   11   "b" 99))
> (~> long-df2
      (pivot-wider #:names-from "grp" #:values-from "val")
      show)
data-frame: 4 rows x 4 columns
┌──┬────┬───┬───┐
│a │hour│b  │day│
├──┼────┼───┼───┤
│83│10  │78 │1  │
├──┼────┼───┼───┤
│80│11  │105│1  │
├──┼────┼───┼───┤
│95│10  │77 │2  │
├──┼────┼───┼───┤
│96│11  │99 │2  │
└──┴────┴───┴───┘

1	Constructing data-frames
2	Displaying data
3	Grouping and splitting
4	Filtering
5	Slicing
6	Creating and modifying columns
7	Summarizing
8	Joining
9	Sorting
10	Pivoting
11	Separating variables
12	Unnesting
13	Missing values