
1 Guide

This guide is a short, narrative tour of polars, modelled section by section on the upstream Polars getting-started guide. Every snippet below has a complete, runnable counterpart under user-guide/getting-started/ in the project repository, paired with an equivalent Python program so the two read side by side. To run one such pair from inside the dev shell:

racket user-guide/getting-started/series-and-dataframes.rkt

python user-guide/getting-started/series_and_dataframes.py

For the full definition of every binding mentioned here, see the Reference.

1.1 Series

A series is a typed, one-dimensional column. The generic series constructor infers a dtype from the values, or takes an explicit #:dtype; polars-null marks missing entries. A series is a wrapper value (series?) that prints in Polars’ format.

(define s (series '(1 2 3 4 5) #:name "a"))
(describe s)
(list (sum s) (min s) (max s) (mean s))

sum, min and max preserve the dtype; mean returns a float64. The generic accessors len, dtype and null-count read a series’ length, element dtype and null count.

1.2 DataFrames

A dataframe is a collection of equal-length named series, built with the dataframe constructor. It is a wrapper value (dataframe?) that prints as a Polars table, so plain display (or ~a) shows it. gregor datetimes become Polars datetime columns.

(require gregor)
(define df
  (dataframe
   (list (series (list (datetime 2025 1 1) (datetime 2025 1 2)) #:name "date")
         (series '(1.0 2.0) #:name "float")
         (series '("a" "b") #:name "string"))))
(displayln df)

Inspect it with shape, height, width, column-names, and describe. ref is the generic accessor: given a series it returns an element, and given a dataframe it returns a column or, when #:columns is a list of names, a projection onto those columns.

(shape df)
(column-names df)
(ref (ref df #:columns "float") 0)
(ref df #:columns '("date" "float"))

1.3 Reading & writing

DataFrames round-trip through CSV, Parquet, and newline-delimited JSON.

(dataframe-write-csv df "data.csv")
(dataframe-read-csv "data.csv")
 
(dataframe-write-parquet df "data.parquet")
(dataframe-read-parquet "data.parquet")
 
(dataframe-write-json-lines df "data.jsonl")
(dataframe-read-json-lines "data.jsonl")
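
For readers unfamiliar with the newline-delimited JSON format named above, the following is a minimal sketch in plain Python (standard library only, not the polars bindings) of what such a file holds: one JSON object per row, one row per line. The helper names write_json_lines and read_json_lines are hypothetical, and an in-memory buffer stands in for a file such as data.jsonl.

```python
import io
import json


def write_json_lines(rows, fp):
    """Write each row (a dict) as one JSON object per line."""
    for row in rows:
        fp.write(json.dumps(row) + "\n")


def read_json_lines(fp):
    """Parse one JSON object from every non-empty line."""
    return [json.loads(line) for line in fp if line.strip()]


rows = [{"float": 1.0, "string": "a"}, {"float": 2.0, "string": "b"}]

buffer = io.StringIO()  # stands in for a file such as data.jsonl
write_json_lines(rows, buffer)
buffer.seek(0)
round_tripped = read_json_lines(buffer)
```

Reading the buffer back yields the same rows that were written, which is the round-trip property this section describes.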

1.4 Expressions

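Expressions (col, expr-mul, expr-sum, expr-alias, …) describe transformations, and the same expressions are reused across the select, with_columns, filter, and group_by/agg contexts. The snippets below assume a df with group, value, and cost columns, not the frame built in 1.2.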

; select: choose and transform columns
(dataframe-select-exprs
 df
 (list (col "group")
       (expr-alias (expr-mul (col "value") (col "cost")) "spend")))
 
; with_columns: add derived columns
(dataframe-with-columns
 df
 (list (expr-alias (expr-mul (col "value") 2) "double_value")))
 
; filter: keep matching rows
(dataframe-filter-expr
 df
 (expr-and (expr-gt (col "value") 15)
           (expr-lt (col "cost") 3.0)))
 
; group_by + agg: aggregate per group
(dataframe-group-by-agg
 df '("group")
 (list (expr-alias (expr-sum (col "value")) "sum_value")
       (expr-alias (expr-count (col "value")) "n")))
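
The idea that one expression value can be reused across several contexts can be modelled in a few lines of plain Python. This is a toy sketch using only the standard library, not how Polars or these bindings are implemented. The names Expr, col, select and filter_rows are hypothetical, a frame is represented as a dict of equal-length lists, and the sample values are made up.

```python
import operator


class Expr:
    """A deferred description of a computation over named columns."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function: frame -> list of values

    def evaluate(self, frame):
        return self._evaluate(frame)

    def _binary(self, other, op):
        # Combine with another Expr, or broadcast a plain scalar.
        def run(frame):
            left = self.evaluate(frame)
            if isinstance(other, Expr):
                right = other.evaluate(frame)
            else:
                right = [other] * len(left)
            return [op(a, b) for a, b in zip(left, right)]

        return Expr(run)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __gt__(self, other):
        return self._binary(other, operator.gt)


def col(name):
    """Refer to a column by name; nothing is read until evaluation."""
    return Expr(lambda frame: frame[name])


def select(frame, named_exprs):
    """select context: build a new frame from named expressions."""
    return {name: expr.evaluate(frame) for name, expr in named_exprs.items()}


def filter_rows(frame, predicate):
    """filter context: keep rows where the predicate expression is true."""
    mask = predicate.evaluate(frame)
    return {
        name: [v for v, keep in zip(values, mask) if keep]
        for name, values in frame.items()
    }


frame = {"group": ["x", "y", "x"], "value": [10, 20, 30], "cost": [1.5, 2.0, 4.0]}

spend = col("value") * col("cost")          # built once
selected = select(frame, {"spend": spend})  # used in a select context
filtered = filter_rows(frame, spend > 35)   # reused in a filter context
```

The point of the sketch is that spend is only a description until a context evaluates it against a frame, which is why the same value can appear in both calls.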

1.5 Method chaining with ~>

Polars’ Python API reads as a chain of methods:

(
    df.filter(pl.col("value") > 15)
    .group_by("group")
    .agg(pl.col("value").sum().alias("sum_value"))
)

The same pipeline reads as a thread-first ~> chain. The ~> form is re-provided from polars, so (require polars) is all you need. Each step takes the frame as its first argument, so the frame flows through the chain:

(~> df
    (filter (> (col "value") 15))
    (group-by "group")
    (agg (alias (sum (col "value")) "sum_value")))
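
To make the threading rule concrete, here is a small plain-Python sketch of a thread-first helper. It illustrates the general technique only and is not the ~> macro itself. thread_first and the step functions keep_above, sort_by and pluck are hypothetical names, rows are plain dicts, and the sample values are made up.

```python
from functools import reduce


def thread_first(value, *steps):
    """Pass value through each step as that step's first argument.

    A step is either a callable or a tuple (callable, arg1, arg2, ...).
    """

    def apply_step(acc, step):
        if callable(step):
            return step(acc)
        func, *args = step
        return func(acc, *args)

    return reduce(apply_step, steps, value)


def keep_above(rows, key, threshold):
    return [row for row in rows if row[key] > threshold]


def sort_by(rows, key):
    return sorted(rows, key=lambda row: row[key])


def pluck(rows, key):
    return [row[key] for row in rows]


rows = [
    {"group": "a", "value": 30},
    {"group": "b", "value": 10},
    {"group": "a", "value": 20},
]

# Same as pluck(sort_by(keep_above(rows, "value", 15), "value"), "value"),
# but written in the order the steps run.
result = thread_first(
    rows,
    (keep_above, "value", 15),
    (sort_by, "value"),
    (pluck, "value"),
)
```

Because every step takes the data as its first argument, the nested call and the threaded form compute the same thing; the threaded form simply reads top to bottom.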

The comparison operators (>, <, >=, <=, =, !=) are overloaded on their operands. When an operand is an expression, as in (> (col "value") 15), they build an expression. When an operand is a series, as in (> (ref df #:columns "value") 15), they compute an eager boolean-mask series. Otherwise they fall back to the numeric racket/base operator, so (> 3 2) still works.

filter, sort, group-by and agg are data-first, so they thread; group-by returns a deferred handle that agg consumes. Inside agg, sum, mean, min, max, count, first and friends take an expression (or a bare column name) and produce an aggregation, matching pl.col("value").sum() in Python. See Fluent pipelines for the full set, including the first/last name clash with racket/list.
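
The three-way dispatch of the comparison operators can be sketched in plain Python. This models only the decision rule. Expr, Series, describe and gt are hypothetical stand-ins rather than the bindings' actual types, and checking for an expression before a series is an assumption based on the order in which the cases are listed.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A lazy description, e.g. the result of col("value")."""

    text: str


@dataclass(frozen=True)
class Series:
    """An eager column of values."""

    values: tuple


def describe(operand):
    """Render an operand for inclusion in an expression's text."""
    return operand.text if isinstance(operand, Expr) else repr(operand)


def gt(left, right):
    """Greater-than that dispatches on the kind of its operands."""
    if isinstance(left, Expr) or isinstance(right, Expr):
        # Any expression operand: build a new expression, evaluate nothing.
        return Expr(f"({describe(left)} > {describe(right)})")
    if isinstance(left, Series) or isinstance(right, Series):
        # Any series operand: compute a boolean mask right away,
        # broadcasting a scalar on the other side.
        n = len(left.values) if isinstance(left, Series) else len(right.values)
        lhs = left.values if isinstance(left, Series) else (left,) * n
        rhs = right.values if isinstance(right, Series) else (right,) * n
        return Series(tuple(a > b for a, b in zip(lhs, rhs)))
    # Neither: fall back to the ordinary numeric comparison.
    return operator.gt(left, right)


lazy = gt(Expr('col("value")'), 15)  # an Expr, nothing computed yet
mask = gt(Series((10, 20, 30)), 15)  # a Series of booleans
plain = gt(3, 2)                     # an ordinary boolean
```

The design choice this illustrates is that a single operator name can serve lazy pipelines, eager columns and ordinary numbers, at the cost of the result type depending on the arguments.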

1.6 Combining DataFrames

Join two frames on a key (eagerly, or as part of a lazy plan), and stack rows with dataframe-vstack.

(lazyframe-collect
 (lazyframe-join (dataframe-lazy users) (dataframe-lazy orders)
                 #:on '("uid") #:how 'inner))
 
(dataframe-vstack users more-users)
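
As a reference for what the two operations compute, here is a plain-Python sketch of an inner join on a key and of row stacking, with frames modelled as lists of dicts. inner_join and vstack are hypothetical helpers written for illustration, the sample rows are made up, and the real bindings operate on Polars frames rather than Python lists.

```python
def inner_join(left, right, on):
    """Pair every left row with every right row that has the same key."""
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


def vstack(top, bottom):
    """Append the rows of bottom below the rows of top (same columns)."""
    if top and bottom and top[0].keys() != bottom[0].keys():
        raise ValueError("frames must have the same columns")
    return top + bottom


users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
more_users = [{"uid": 3, "name": "cy"}]
orders = [
    {"uid": 1, "item": "pen"},
    {"uid": 1, "item": "ink"},
    {"uid": 3, "item": "cap"},
]

joined = inner_join(users, orders, on="uid")  # only uids present on both sides
stacked = vstack(users, more_users)           # three user rows
```

In this sketch the user with uid 2 has no orders and the order with uid 3 has no matching user, so neither appears in the joined result; that is the defining behaviour of an inner join.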

For complete, runnable versions of all of the above — including the Python references — see user-guide/getting-started/.