10 Pivoting
The majority of data found in the wild will not be tidy, and therefore not work with the above operations. The goal of pivoting is to help make your data tidy.
Every column is a variable.
Every row is an observation.
Every cell is a single value.
So, for example, if you were to have a column corresponding to a value and not a variable (such as a site), or a column corresponding to a "type" of observation, these operations would help.
syntax
(pivot-longer df slice-spec #:names-to names-to #:values-to values-to)
df : data-frame?
names-to : string?
values-to : string?
slice-spec is an expression in the slice sub-language. See Slicing for more information on this language. Any not column returned by evaluating slice-spec will be brought along so that former observations line up, but its data will not be modified.
All the names of every column selected by slice-spec is brought into a new column with name names-to, and all the values are brought into a new column with name values-to.
This function does not work with grouped data frames, as it has potential to destroy some internal invariants.
> (define wide-df (row-df [day hour a b c] 1 10 97 84 55 2 11 78 47 54))
> (~> wide-df (pivot-longer ["a" "b" "c"] #:names-to "site" #:values-to "catch") show)
data-frame: 6 rows x 4 columns
┌────┬─────┬────┬───┐
│hour│catch│site│day│
├────┼─────┼────┼───┤
│10 │97 │a │1 │
├────┼─────┼────┼───┤
│11 │78 │a │2 │
├────┼─────┼────┼───┤
│10 │84 │b │1 │
├────┼─────┼────┼───┤
│11 │47 │b │2 │
├────┼─────┼────┼───┤
│10 │55 │c │1 │
├────┼─────┼────┼───┤
│11 │54 │c │2 │
└────┴─────┴────┴───┘
procedure
(pivot-wider df #:names-from names-from #:values-from values-from) → data-frame? df : data-frame? names-from : string? values-from : string?
names-from is the column to create new columns from, and values-from is the column to get the corresponding data from.
If a value is not found in the long format data-frame, it will be replaced with "NA" (#f).
This function does not work with grouped data frames, as it has potential to destroy some internal invariants.
> (define long-df1 (row-df [day grp val] 1 "A" 10 1 "B" 20 2 "B" 30))
> (~> long-df1 (pivot-wider #:names-from "grp" #:values-from "val") show)
data-frame: 2 rows x 3 columns
┌──┬──┬───┐
│B │A │day│
├──┼──┼───┤
│20│10│1 │
├──┼──┼───┤
│30│#f│2 │
└──┴──┴───┘
> (define long-df2 (row-df [day hour grp val] 1 10 "a" 83 1 10 "b" 78 1 11 "a" 80 1 11 "b" 105 2 10 "a" 95 2 10 "b" 77 2 11 "a" 96 2 11 "b" 99))
> (~> long-df2 (pivot-wider #:names-from "grp" #:values-from "val") show)
data-frame: 4 rows x 4 columns
┌───┬──┬────┬───┐
│b │a │hour│day│
├───┼──┼────┼───┤
│78 │83│10 │1 │
├───┼──┼────┼───┤
│77 │95│10 │2 │
├───┼──┼────┼───┤
│105│80│11 │1 │
├───┼──┼────┼───┤
│99 │96│11 │2 │
└───┴──┴────┴───┘