12 Unnesting
Sometimes, data that’s read into a data-frame will have deeply nested structure (for example, when reading JSON directly into a data-frame). These operations convert these variables into tidy data, as described in Pivoting; with enough unnesting, these operations can turn this data into a tidy data-frame (when all hashes, vectors, lists, etc are removed).
These operations only work on "list-like" (the sequence returns one value per iteration) or "dictionary-like" (two values per iteration) sequences.
> (define deep-df (column-df [character #("Toothless" "Dory" "Holly")] [metadata (vector (hash 'species "dragon" 'color "black" 'films (vector "How to Train Your Dragon" "How to Train Your Dragon 2" "How to Train Your Dragon: The Hidden World")) (hash 'species "blue tang" 'color "blue" 'films (vector "Finding Nemo" "Finding Dory")) (hash 'species "glaceon" 'color "also blue" 'films #f))]))
procedure
(unnest-wider df column-name [ #:index-prefix index-prefix #:remove? remove?]) → data-frame? df : data-frame? column-name : string? index-prefix : string? = "idx-" remove? : boolean? = #t
If column-name’s data is "list-like" (returns one element per iteration), the column names will be index-prefix concatenated with its index plus one. Otherwise, the column names will be the "keys" (first element) of the sequence.
If the keys of the given sequence are not strings, they will be converted to strings with ~a.
If remove? is true, column-name will be removed from the result data-frame.
> (define expanded-metadata (~> deep-df (unnest-wider "metadata") introspect))
data-frame: 3 rows x 4 columns
┌─────────┬─────────┬─────────┬─────────┐
│character│color │films │species │
├─────────┼─────────┼─────────┼─────────┤
│Toothless│black │#<vector>│dragon │
├─────────┼─────────┼─────────┼─────────┤
│Dory │blue │#<vector>│blue tang│
├─────────┼─────────┼─────────┼─────────┤
│Holly │also blue│#f │glaceon │
└─────────┴─────────┴─────────┴─────────┘
> (~> expanded-metadata (unnest-wider "films") (slice ["character" "idx-1" "idx-2" "idx-3"]) show)
data-frame: 3 rows x 4 columns
┌─────────┬──────────────────────────────────────────┬────────────────────────┬──────────────────────────┐
│character│idx-3 │idx-1 │idx-2 │
├─────────┼──────────────────────────────────────────┼────────────────────────┼──────────────────────────┤
│Toothless│How to Train Your Dragon: The Hidden World│How to Train Your Dragon│How to Train Your Dragon 2│
├─────────┼──────────────────────────────────────────┼────────────────────────┼──────────────────────────┤
│Dory │#f │Finding Nemo │Finding Dory │
├─────────┼──────────────────────────────────────────┼────────────────────────┼──────────────────────────┤
│Holly │#f │#f │#f │
└─────────┴──────────────────────────────────────────┴────────────────────────┴──────────────────────────┘
procedure
(unnest-longer df column-name [ #:keys-to keys-to #:values-to values-to #:remove? remove?]) → data-frame? df : data-frame? column-name : string? keys-to : (or/c string? #f) = #f values-to : (or/c string? #f) = #f remove? : boolean? = #t
If values-to is unspecified or #f, it defaults to column-name.
If keys-to is unspecified or #f, it defaults to (string-append column-name "-keys").
If column-name’s data is "list-like" (returns one element per iteration), the column keys-to is not added, even if it specified.
If remove? is #t, column-name will be removed from the result data-frame, unless column-name is equal to values-to.
> (~> deep-df (unnest-longer "metadata") show)
data-frame: 9 rows x 3 columns
┌─────────────┬─────────┬─────────┐
│metadata-keys│character│metadata │
├─────────────┼─────────┼─────────┤
│color │Toothless│black │
├─────────────┼─────────┼─────────┤
│species │Toothless│dragon │
├─────────────┼─────────┼─────────┤
│films │Toothless│#<vector>│
├─────────────┼─────────┼─────────┤
│color │Dory │blue │
├─────────────┼─────────┼─────────┤
│species │Dory │blue tang│
├─────────────┼─────────┼─────────┤
│films │Dory │#<vector>│
└─────────────┴─────────┴─────────┘
3 rows, 0 cols elided
(use (show df everything #:n-rows 'all) for full frame)
> (~> expanded-metadata (unnest-longer "films") show)
data-frame: 6 rows x 4 columns
┌──────────────────────────────────────────┬─────────┬─────────┬─────────┐
│films │color │species │character│
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│How to Train Your Dragon │black │dragon │Toothless│
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│How to Train Your Dragon 2 │black │dragon │Toothless│
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│How to Train Your Dragon: The Hidden World│black │dragon │Toothless│
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│Finding Nemo │blue │blue tang│Dory │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│Finding Dory │blue │blue tang│Dory │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┤
│#f │also blue│glaceon │Holly │
└──────────────────────────────────────────┴─────────┴─────────┴─────────┘