2 Reference

Alongside the monomorphic, dtype-suffixed bindings (series-new-i32, series-sum-f64, and friends), polars provides a small, "rackety" high-level layer: series and dataframe wrapper values reached through a handful of purpose-named generic operations. The generics dispatch at runtime on the wrapper’s type, or — for the reductions — on the series’ dtype (read via dtype).

2.1 Series

A series wraps a typed column and prints in the REPL the way Polars prints it; series? is its predicate. (The underlying foreign pointer is an implementation detail and not part of the public series API.)

procedure
(series? v) → boolean?
v : any/c

Returns #t if v is a series.

procedure
(series elements [#:name name #:dtype dtype]) → series?
  elements : (or/c list? vector?)
  name : string? = ""
  dtype : (or/c #f symbol? pair?) = #f

Builds a series from a list or vector. When #:dtype is omitted the dtype is inferred from the elements; otherwise it is taken from dtype. Both short spellings ('i32, 'f64, 'str, 'bool) and canonical symbols ('int32, 'float64, 'string, 'boolean) are accepted. Use polars-null for missing values. Exact integers are coerced to flonums when the target dtype is floating point.
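A minimal sketch of the constructor, assuming the polars collection is installed and the keyword arguments behave as described above. The element values and series names are made up for illustration.

```racket
#lang racket/base
(require polars)

;; dtype inferred from the elements
(define inferred (series '(1 2 3) #:name "n"))

;; explicit dtype via the short spelling; a vector works as well as a list,
;; and the exact integers are coerced to flonums for a floating-point dtype
(define prices (series (vector 1 2 3) #:name "price" #:dtype 'f64))

;; polars-null marks a missing value
(define with-gap
  (series (list 1.5 polars-null 3.5) #:name "x" #:dtype 'float64))
```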

procedure
(series->string s) → string?
s : series?

Renders s in Polars’ series format (a shape line, a Series: 'name' [dtype] line, then the bracketed values, truncated to the first and last five when longer than ten). This is also what a series prints as in the REPL.

value
polars-null : any/c
procedure
(polars-null? v) → boolean?
v : any/c

polars-null is the sentinel marking a missing value: pass it among the elements given to series to produce nulls, and it is what ref returns for a null entry. polars-null? tests for it.

procedure
(dtype s) → (or/c symbol? pair?)
  s : has-dtype?
procedure
(len x) → exact-nonnegative-integer?
  x : sized?
procedure
(null-count s) → exact-nonnegative-integer?
  s : has-null-count?

Generic series accessors. dtype returns the canonical dtype symbol (e.g. 'int32, 'float64, '(datetime milliseconds #f)). len returns the number of elements (and, on a dataframe, the number of rows). null-count returns the number of null entries.
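The accessors can be sketched on a small series with one missing value. The series contents are illustrative, and the comments restate the behaviour documented above rather than verified output.

```racket
#lang racket/base
(require polars)

(define s (series (list 10 polars-null 30) #:name "qty" #:dtype 'i32))

(dtype s)                 ; the canonical symbol, so 'int32 rather than 'i32
(len s)                   ; number of elements
(null-count s)            ; one null entry
(polars-null? (ref s 1))  ; ref returns polars-null for a null entry
```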

procedure
(sum v ...) → any/c
  v : any/c
procedure
(mean v ...) → any/c
  v : any/c
procedure
(min v ...) → any/c
  v : any/c
procedure
(max v ...) → any/c
  v : any/c

These dispatch on their argument. Applied to a single series, they reduce it (dispatching on its dtype): min, max and sum preserve the input dtype, while mean always returns a float64, so the mean of an integer series is a flonum (see dtype promotion). Applied to a single expression or a bare column-name string, they produce the corresponding aggregation expression — so (sum (col "value")) reads like Polars’ col("value").sum() and is used inside agg (see Fluent pipelines). Applied to anything else they fall back to the usual numeric behaviour, so (max 1 2 3) still works.
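The three dispatch arms can be sketched side by side. The series and the column name "value" are illustrative, and the comments follow the rules stated above.

```racket
#lang racket/base
(require polars)

(define s (series '(1 2 3 4) #:name "value" #:dtype 'i32))

;; Series arm: reduce the column.
(sum s)   ; input dtype is preserved
(mean s)  ; always a flonum, even for an integer series
(min s)
(max s)

;; Fallback arm: plain numbers still work.
(max 1 2 3)

;; Expression arm: these are aggregation expressions for use inside agg,
;; not numbers. A bare column-name string is accepted too.
(define total-expr (sum (col "value")))
(define total-expr/short (sum "value"))
```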

procedure
(rename s new-name) → series?
  s : series?
  new-name : string?
procedure
(rename! s new-name) → void?
  s : series?
  new-name : string?
procedure
(clone s) → series?
  s : series?
procedure
(series-clone s) → series?
  s : series?

rename! renames a series in place (matching Polars), returning void as is conventional for ! mutators; rename returns a renamed copy and leaves the original untouched. clone (and its series-specific alias series-clone) returns an independent copy.
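A short sketch of the copying and mutating variants, assuming they behave as described above. The names "a", "b" and "scratch" are illustrative.

```racket
#lang racket/base
(require polars)

(define s (series '(1 2 3) #:name "a"))

(define t (rename s "b"))   ; renamed copy; s keeps the name "a"
(define c (clone s))        ; independent copy of s
(rename! c "scratch")       ; renames c in place and returns void

;; series->string shows the name each series now carries
(display (series->string s))
(display (series->string t))
(display (series->string c))
```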

2.1.1 dtype promotion

Reductions follow a simple, predictable rule. The widening order, narrow to wide, is

'int8 < 'int16 < 'int32 < 'int64
'uint8 < 'uint16 < 'uint32 < 'uint64
any integer < 'float32 < 'float64

sum, min and max preserve the input dtype. mean promotes to 'float64. Use series-cast to change a series’ dtype explicitly.

2.2 DataFrames

A dataframe is a collection of equal-length named series. Like a series it is a wrapper value (dataframe?) carrying the column data; it prints as a Polars table, so display (or ~a, or the REPL) renders it without a separate display-dataframe call.

procedure
(dataframe? v) → boolean?
v : any/c

Returns #t if v is a dataframe.

procedure
(dataframe columns) → dataframe?
columns : (listof series?)

Builds a dataframe from a list of equal-length series. The columns may be series wrappers built with series; their names become the column names.
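As a minimal sketch, a two-column dataframe can be assembled from two named series and printed with plain display. The column names and values are illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

(dataframe? df)  ; #t
(display df)     ; prints the Polars table
```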

procedure
(shape x) → (listof exact-nonnegative-integer?)
  x : has-shape?
procedure
(shape/values x) → (values exact-nonnegative-integer? ...)
  x : has-shape?
procedure
(height d) → exact-nonnegative-integer?
  d : dataframe?
procedure
(width d) → exact-nonnegative-integer?
  d : dataframe?

shape returns the dimensions as a list — (list rows cols) for a dataframe and (list n) for a series — mirroring Polars’ shape tuples. shape/values returns the same dimensions as multiple values, for callers that want to bind them positionally with let-values or define-values. height and width return the row and column counts of a dataframe; height is also (len d).
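The dimension accessors can be sketched on a three-row, two-column dataframe. The data is illustrative, and the expected values in the comments follow from the description above.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

(shape df)                       ; (list rows cols), here '(3 2)
(shape (series '(1 2 3)))        ; (list n), here '(3)

;; shape/values returns the same dimensions as multiple values
(define-values (rows cols) (shape/values df))

(height df)  ; 3, the same as (len df)
(width df)   ; 2
```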

procedure
(column-names d) → (listof string?)
  d : dataframe?
procedure
(column-name d i) → string?
  d : dataframe?
  i : exact-nonnegative-integer?

column-names returns all column names in order; column-name returns the name of the column at index i.

procedure
(ref x [key #:columns columns #:rows rows]) → any/c
  x : has-ref?
  key : (or/c exact-nonnegative-integer? string?) = absent
  columns : (or/c exact-nonnegative-integer? string?
                  (listof (or/c exact-nonnegative-integer? string?)))
          = absent
  rows : any/c = absent

The generic element and column accessor. On a series, (ref s i) returns the element at index i. On a dataframe, a single selector, given positionally or as #:columns, returns that column (by name or index) as a series, and a list of selectors returns a column-projected dataframe. Because it takes the data as its first argument, it threads with ~>. #:rows is reserved for row slicing and currently raises an error. Provided by the gen:has-ref interface.
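The selector forms can be sketched as follows, together with column-names. The dataframe is illustrative, and zero-based indexing is an assumption based on the exact-nonnegative-integer? contracts.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

(column-names df)                ; names in order: '("group" "value")

(define value (ref df "value"))  ; one column by name, as a series
(ref df 1)                       ; the same column by index (assumed zero-based)
(ref df #:columns "value")       ; keyword spelling of the single selector
(ref value 0)                    ; element access on a series

;; a list of selectors gives a column-projected dataframe
(define projected (ref df #:columns '("value")))

;; data-first, so it threads
(~> df (ref "value") (len))
```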

procedure
(describe x) → dataframe?
x : (or/c series? dataframe?)

Mirrors Polars’ .describe(): returns a summary-statistics dataframe (which prints as a table). For a series the result has a "statistic" column and a "value" column, with rows adapted to the dtype — a numeric series gets "count", "null_count", "mean", "std", "min", "25%", "50%", "75%" and "max"; a boolean series drops "std" and the quantiles; other dtypes (string, temporal) keep just "count", "null_count", "min" and "max". For a dataframe the result uses Polars’ fixed nine-row layout (a "statistic" column plus one column per input column), leaving a cell polars-null where a column has no value for that statistic. Quantiles use nearest interpolation. Dispatches on series? / dataframe?.
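A minimal sketch of both dispatch arms. The data is illustrative, and the comments restate the layout described above rather than captured output.

```racket
#lang racket/base
(require polars)

(define x (series '(1.0 2.0 3.0 4.0) #:name "x"))
(define df
  (dataframe (list x (series '("p" "q" "r" "s") #:name "label"))))

;; Series arm: a "statistic" column and a "value" column.
(define series-summary (describe x))
(column-names series-summary)

;; Dataframe arm: a "statistic" column plus one column per input column.
(define frame-summary (describe df))
(display frame-summary)
```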

2.2.1 Low-level DataFrame API

The generic layer above is built on a set of monomorphic dataframe-* bindings that operate directly on the foreign dataframe. They remain exported and accept the dataframe wrapper (it marshals transparently); the generic operations are simply the preferred surface.

procedure
(dataframe-new columns) → dataframe?
  columns : (listof series?)
procedure
(dataframe-shape d) → (values exact-nonnegative-integer? exact-nonnegative-integer?)
  d : dataframe?
procedure
(dataframe-height d) → exact-nonnegative-integer?
  d : dataframe?
procedure
(dataframe-width d) → exact-nonnegative-integer?
  d : dataframe?
procedure
(dataframe-column d name) → series?
  d : dataframe?
  name : string?
procedure
(dataframe-column-name d i) → string?
  d : dataframe?
  i : exact-nonnegative-integer?
procedure
(dataframe-column-names d) → (listof string?)
  d : dataframe?
procedure
(dataframe-select d names) → dataframe?
  d : dataframe?
  names : (listof string?)
procedure
(display-dataframe d [out]) → void?
  d : dataframe?
  out : output-port? = (current-output-port)

The low-level dataframe operations underlying dataframe, shape, height, width, ref, column-name, and column-names. display-dataframe prints the Polars table to out; since a dataframe now prints itself, prefer plain display.
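For comparison with the generic layer, a sketch of the same queries through the low-level bindings, using the signatures listed above. The dataframe is illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe-new
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

;; dataframe-shape returns rows and columns as two values
(define-values (rows cols) (dataframe-shape df))

(dataframe-column-names df)             ; like (column-names df)
(dataframe-column df "value")           ; like (ref df "value")
(define only-value
  (dataframe-select df '("value")))     ; like (ref df #:columns '("value"))
```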

2.2.2 Reading & writing

procedure
(dataframe-write-csv d path) → void?
  d : dataframe?
  path : path-string?
procedure
(dataframe-read-csv path) → dataframe?
  path : path-string?
procedure
(dataframe-write-parquet d path) → void?
  d : dataframe?
  path : path-string?
procedure
(dataframe-read-parquet path) → dataframe?
  path : path-string?
procedure
(dataframe-write-json-lines d path) → void?
  d : dataframe?
  path : path-string?
procedure
(dataframe-read-json-lines path) → dataframe?
  path : path-string?

Round-trip a dataframe through CSV, Parquet, or newline-delimited JSON.
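A CSV round trip might look like the sketch below. The file name polars-demo.csv and the data are illustrative; the Parquet and JSON-lines pairs take the same arguments.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

;; write to a scratch file in the system temp directory
(define path (build-path (find-system-path 'temp-dir) "polars-demo.csv"))
(dataframe-write-csv df path)

;; read it back; the dimensions should match the original
(define back (dataframe-read-csv path))
(shape back)

(delete-file path)
```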

2.3 Fluent pipelines

A data-first layer that mirrors Polars’ Python method chaining. Because each operation takes the frame as its first argument, a pipeline reads as a thread-first ~> chain (re-provided from threading, so (require polars) is enough):

(~> df
    (filter (> (col "value") 15))
    (group-by "group")
    (agg (alias (sum (col "value")) "sum_value")))

procedure
(> a b ...) → any/c
  a : any/c
  b : any/c
procedure
(< a b ...) → any/c
  a : any/c
  b : any/c
procedure
(>= a b ...) → any/c
  a : any/c
  b : any/c
procedure
(<= a b ...) → any/c
  a : any/c
  b : any/c
procedure
(= a b ...) → any/c
  a : any/c
  b : any/c
procedure
(!= a b ...) → any/c
  a : any/c
  b : any/c

Overloaded comparison operators. If an operand is an expression, they build a comparison expression (scalars are lifted automatically), so (> (col "value") 15) reads like col("value") > 15. If an operand is a series, they build an eager boolean-mask series — element-wise over every numeric dtype, including 'int64 — so (> (ref df #:columns "value") 15) is a mask. Otherwise they fall back to the numeric racket/base operator and stay variadic, so (> 3 2) and (< 1 2 3) still work. != has no racket/base spelling; on numbers it is (not (= a b)). These shadow the racket/base comparisons; see Shadowed bindings.
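The three behaviours of the overloaded operators can be sketched together. The dataframe is illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe (list (series '(10 20 30) #:name "value"))))

;; Expression operand: builds a comparison expression (15 is lifted).
(define expr (> (col "value") 15))

;; Series operand: builds an eager boolean-mask series.
(define mask (> (ref df #:columns "value") 15))

;; Plain numbers: falls back to racket/base and stays variadic.
(> 3 2)     ; #t
(< 1 2 3)   ; #t
(!= 1 2)    ; #t, i.e. (not (= 1 2))
```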

procedure
(filter d predicate) → dataframe?
d : dataframe?
predicate : any/c

Keeps the rows of d matching predicate, which may be a boolean expression — (filter df (> (col "value") 15)) — or a precomputed boolean-mask series. Returns a new dataframe. Applied to a non-dataframe it falls back to racket/base’s filter, so (filter even? '(1 2 3 4)) is '(2 4).
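Both predicate forms, and the list fallback, can be sketched like this. The dataframe is illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

;; predicate as a boolean expression
(define by-expr (filter df (> (col "value") 15)))

;; predicate as a precomputed boolean-mask series
(define by-mask (filter df (> (ref df "value") 15)))

;; non-dataframe arguments fall back to racket/base's filter
(filter even? '(1 2 3 4))  ; '(2 4)
```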

procedure
(sort d names [#:descending descending]) → dataframe?
  d : dataframe?
  names : (or/c string? (listof string?))
  descending : (or/c boolean? (listof boolean?)) = #f

Sorts d by one or more columns. #:descending is a single boolean applied to all keys, or a per-key list. Applied to a non-dataframe it falls back to racket/base’s sort, so (sort '(3 1 2) <) is '(1 2 3).
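A sketch of single-key and multi-key sorting, plus the list fallback. The dataframe is illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

;; one key, one flag applied to it
(define by-value-desc (sort df "value" #:descending #t))

;; two keys with a per-key list: group ascending, value descending
(define by-group-then-value
  (sort df '("group" "value") #:descending '(#f #t)))

;; non-dataframe arguments fall back to racket/base's sort
(sort '(3 1 2) <)  ; '(1 2 3)
```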

procedure
(group-by d key ...) → grouped?
  d : dataframe?
  key : (or/c string? any/c)
procedure
(agg g agg-expr ...) → dataframe?
  g : grouped?
  agg-expr : any/c
procedure
(grouped? v) → boolean?
  v : any/c

group-by captures d and one or more group keys in a deferred grouped handle — no work happens yet — so it threads cleanly. agg consumes the handle, computing the aggregation expressions per group in a single pass, and returns a dataframe with one row per group. The split mirrors df.group_by("g").agg(...); the row order of the result is not guaranteed.
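The two-step split can be sketched without threading, which makes the deferred handle visible. The dataframe and the result name "total" are illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a") #:name "group")
         (series '(10 20 30) #:name "value"))))

;; no work happens here; g only records the frame and its keys
(define g (group-by df "group"))
(grouped? g)  ; #t

;; agg consumes the handle: one row per group
(define totals (agg g (alias (sum (col "value")) "total")))
(height totals)  ; two groups, so 2 (their order is not guaranteed)
```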

procedure
(count x) → any/c
  x : any/c
procedure
(n-unique x) → any/c
  x : any/c
procedure
(median x) → any/c
  x : any/c
procedure
(std x [#:ddof ddof]) → any/c
  x : any/c
  ddof : exact-nonnegative-integer? = 1
procedure
(var x [#:ddof ddof]) → any/c
  x : any/c
  ddof : exact-nonnegative-integer? = 1
procedure
(alias e name) → any/c
  e : any/c
  name : string?

Aggregation-expression builders for use inside agg, alongside the expression arms of sum, mean, min and max. Each accepts an expression or a bare column-name string (lifted with col), so (count "value") and (count (col "value")) are equivalent. alias names a result, matching Polars’ .alias: (alias (sum (col "value")) "total"). std and var take a #:ddof degrees-of-freedom adjustment, defaulting to 1.
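Several builders can be combined in one agg call. This sketch uses bare column-name strings throughout and sorts afterwards, since the group order is otherwise unspecified. The data and result names are illustrative.

```racket
#lang racket/base
(require polars)

(define df
  (dataframe
   (list (series '("a" "b" "a" "b") #:name "group")
         (series '(10 20 30 40) #:name "value"))))

(define summary
  (~> df
      (group-by "group")
      (agg (alias (sum "value") "total")
           (alias (mean "value") "avg")
           (alias (count "value") "n")
           (alias (n-unique "value") "distinct")
           (alias (std "value" #:ddof 0) "sd"))  ; population std
      (sort "group")))

(display summary)
```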

procedure
(first x) → any/c
x : (or/c string? pair? any/c)
procedure
(last x) → any/c
x : (or/c string? pair? any/c)

Dual-purpose. On an expression or column-name string they build the first/last-element aggregation (Polars’ .first() / .last()), for use inside agg. On a list they are the ordinary list accessors, so (first '(1 2 3)) is 1 and (last '(1 2 3)) is 3 — matching racket/list.
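A sketch of both uses under #lang racket/base, where only the polars versions are in scope. The dataframe and the result names are illustrative.

```racket
#lang racket/base
(require polars)

;; list accessors
(first '(1 2 3))  ; 1
(last '(1 2 3))   ; 3

;; aggregation builders inside agg
(define df
  (dataframe
   (list (series '("a" "b" "a" "b") #:name "group")
         (series '(10 20 30 40) #:name "value"))))

(define ends
  (~> df
      (group-by "group")
      (agg (alias (first "value") "first_value")
           (alias (last (col "value")) "last_value"))))
```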

Name clash. racket/list also exports first and last (along with count and group-by, which polars exports too). Requiring both modules explicitly — (require racket/list polars) — is an error (identifier already required). A plain #lang racket/base program is unaffected, because racket/base does not export these names. See Shadowed bindings for how to take control.

2.3.1 Shadowed bindings

(require polars) re-exports a handful of generic operations whose names also live in racket/base (min, max, sort, filter, >, <, >=, <=, =) and in racket/list (first, last, count, group-by). Under #lang racket/base this is seamless — these names are either not bound (so polars simply provides them) or bound only by the module language (which an explicit require silently shadows), and the polars versions intentionally fall back to the numeric/list behaviour for non-frame arguments.

A conflict arises only when another module providing the same name is also required explicitly — most commonly racket/list. Resolve it with the usual require sub-forms:

; keep polars' first/last/count/group-by, drop racket/list's:
(require (except-in racket/list first last count group-by) polars)

; keep racket/list's, reach polars' under a prefix:
(require racket/list (prefix-in pl: polars))
; then (pl:first (col "v")) for the Expr, (first '(1 2 3)) for the list

; keep polars', reach racket/list's under a prefix:
(require polars (prefix-in list: racket/list))

2.4 Generic interfaces

The high-level operations are small, purpose-named racket/generic interfaces. A wrapper implements the interface for each capability it has — a series and a dataframe both have a len and a shape, so both implement gen:sized and gen:has-shape; only a series has a dtype. Each interface exports its method(s) and a predicate that recognises values implementing it.

syntax
gen:has-ref
procedure
(has-ref? v) → boolean?
v : any/c

The ref capability (method: ref). Implemented by series and dataframes.

syntax
gen:sized
procedure
(sized? v) → boolean?
v : any/c

The len capability (method: len). Implemented by series (number of elements) and dataframes (number of rows).

syntax
gen:has-shape
procedure
(has-shape? v) → boolean?
v : any/c

The shape capability (method: shape). Implemented by series and dataframes.

syntax
gen:has-dtype
procedure
(has-dtype? v) → boolean?
v : any/c

The dtype capability (method: dtype). Implemented by series.

syntax
gen:has-null-count
procedure
(has-null-count? v) → boolean?
v : any/c

The null-count capability (method: null-count). Implemented by series.
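The predicates make it possible to write code against capabilities rather than concrete wrapper types. In this sketch, summarise is a hypothetical helper and not part of polars.

```racket
#lang racket/base
(require polars)

;; Hypothetical helper: describe a value by the capabilities it implements.
;; has-dtype? is tested first because a series also implements gen:has-shape.
(define (summarise x)
  (cond
    [(has-dtype? x) (list 'series (dtype x) (len x))]
    [(has-shape? x) (list 'frame (shape x))]
    [else 'other]))

(define s (series '(1 2 3) #:name "n" #:dtype 'i32))
(define df (dataframe (list s)))

(summarise s)   ; '(series int32 3)
(summarise df)  ; '(frame (3 1))
(summarise 42)  ; 'other
```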
