1 Guide
This guide is a short, narrative tour of polars, modelled section by section on the upstream Polars getting-started guide. Every snippet below has a complete, runnable counterpart under user-guide/getting-started/ in the project repository, paired with an equivalent Python program so the two read side by side. Inside the dev shell:
racket user-guide/getting-started/series-and-dataframes.rkt
python user-guide/getting-started/series_and_dataframes.py
For the full definition of every binding mentioned here, see the Reference.
1.1 Series
A series is a typed, one-dimensional column. The generic series constructor infers a dtype from the values, or takes an explicit #:dtype; polars-null marks missing entries. A series is a wrapper value (series?) that prints in Polars’ format.
(define s (series '(1 2 3 4 5) #:name "a"))
(describe s)
(list (sum s) (min s) (max s) (mean s))
sum, min and max preserve the dtype; mean returns a float64. The generic accessors len, dtype and null-count read a series’ length, element dtype and null count.
1.2 DataFrames
A dataframe is a collection of equal-length named series, built with the dataframe constructor. It is a wrapper value (dataframe?) that prints as a Polars table, so plain display (or ~a) shows it. gregor datetimes become Polars datetime columns.
(require gregor)
(define df
  (dataframe
   (list (series (list (datetime 2025 1 1) (datetime 2025 1 2)) #:name "date")
         (series '(1.0 2.0) #:name "float")
         (series '("a" "b") #:name "string"))))
(displayln df)
Inspect it with shape, height, width, column-names, and describe. ref is the generic accessor: it extracts an element from a series, or a column (or, given a list of names, a projection) from a dataframe.
(shape df)
(column-names df)
(ref (ref df #:columns "float") 0)
(ref df #:columns '("date" "float"))
1.3 Reading & writing
DataFrames round-trip through CSV, Parquet, and newline-delimited JSON.
(dataframe-write-csv df "data.csv")
(dataframe-read-csv "data.csv")
(dataframe-write-parquet df "data.parquet")
(dataframe-read-parquet "data.parquet")
(dataframe-write-json-lines df "data.jsonl")
(dataframe-read-json-lines "data.jsonl")
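As a rough illustration of what the newline-delimited JSON format looks like on disk, the following Python sketch round-trips a small table using only the standard library. The helper names write_json_lines and read_json_lines are hypothetical and are not part of this library or of Polars; the sketch only shows the one-object-per-line layout.

```python
import io
import json


def write_json_lines(rows, fh):
    """Write each row (a dict) as one JSON object per line."""
    for row in rows:
        fh.write(json.dumps(row) + "\n")


def read_json_lines(fh):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in fh if line.strip()]


rows = [{"float": 1.0, "string": "a"}, {"float": 2.0, "string": "b"}]

buf = io.StringIO()          # in-memory stand-in for a .jsonl file
write_json_lines(rows, buf)
text = buf.getvalue()

buf.seek(0)
round_tripped = read_json_lines(buf)
```

Each line is a complete JSON document, which is what lets readers stream such a file one row at a time.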
1.4 Expressions
Expressions (col, expr-mul, expr-sum, expr-alias, …) describe transformations and can be reused across the select, with_columns, filter, and group_by/agg contexts. The snippets below assume df has group, value, and cost columns.
; select: choose and transform columns
(dataframe-select-exprs
 df
 (list (col "group")
       (expr-alias (expr-mul (col "value") (col "cost")) "spend")))

; with_columns: add derived columns
(dataframe-with-columns
 df
 (list (expr-alias (expr-mul (col "value") 2) "double_value")))

; filter: keep matching rows
(dataframe-filter-expr
 df
 (expr-and (expr-gt (col "value") 15)
           (expr-lt (col "cost") 3.0)))

; group_by + agg: aggregate per group
(dataframe-group-by-agg
 df
 '("group")
 (list (expr-alias (expr-sum (col "value")) "sum_value")
       (expr-alias (expr-count (col "value")) "n")))
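To make the idea of a reusable expression concrete, here is a minimal Python sketch, using only the standard library, in which an expression is a deferred computation evaluated against a dict of columns. The names Expr, col, select, with_columns and filter_rows are hypothetical; this is a toy model under those assumptions, not how this library or Polars is implemented.

```python
class Expr:
    """A deferred computation: an output name plus a function frame -> column."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def alias(self, name):
        return Expr(name, self.fn)

    def __mul__(self, other):
        return Expr(self.name, lambda frame: [
            a * b for a, b in zip(self.fn(frame), _column(other, frame))])

    def __gt__(self, other):
        return Expr(self.name, lambda frame: [
            a > b for a, b in zip(self.fn(frame), _column(other, frame))])


def _column(value, frame):
    """Evaluate an Expr, or broadcast a scalar to the frame's height."""
    if isinstance(value, Expr):
        return value.fn(frame)
    height = len(next(iter(frame.values()), []))
    return [value] * height


def col(name):
    return Expr(name, lambda frame: list(frame[name]))


def select(frame, exprs):
    """Keep only the columns the expressions produce."""
    return {e.name: e.fn(frame) for e in exprs}


def with_columns(frame, exprs):
    """Keep the existing columns and add (or replace) the derived ones."""
    return {**frame, **select(frame, exprs)}


def filter_rows(frame, predicate):
    """Keep the rows where the boolean expression is true."""
    mask = predicate.fn(frame)
    return {k: [v for v, keep in zip(vals, mask) if keep]
            for k, vals in frame.items()}


# Toy data, invented for this sketch.
df = {"group": ["a", "b", "a"], "value": [10, 20, 30], "cost": [1.5, 2.0, 2.5]}

spend = (col("value") * col("cost")).alias("spend")   # built once...
selected = select(df, [col("group"), spend])          # ...used in select
extended = with_columns(df, [spend])                  # ...and in with_columns
filtered = filter_rows(df, col("value") > 15)
```

The point of the sketch is that spend is built once and nothing is computed until a context evaluates it against a frame.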
1.5 Method chaining with ~>
Polars’ Python API reads as a chain of methods:
(
    df.filter(pl.col("value") > 15)
    .group_by("group")
    .agg(pl.col("value").sum().alias("sum_value"))
)
The same pipeline reads as a thread-first ~> chain. The ~> form is re-provided from polars, so (require polars) is all you need. Each step takes the frame as its first argument, so the frame flows through the chain:
(~> df
    (filter (> (col "value") 15))
    (group-by "group")
    (agg (alias (sum (col "value")) "sum_value")))
The comparison operators (>, <, >=, <=, =, !=) are overloaded. They build an expression when an operand is an expression, as in (> (col "value") 15); they build an eager boolean-mask series when an operand is a series, as in (> (ref df #:columns "value") 15); otherwise they fall back to the numeric racket/base operator, so (> 3 2) still works. filter, sort, group-by and agg are data-first, so they thread; group-by returns a deferred handle that agg consumes. Inside agg, sum, mean, min, max, count, first and friends take an expression (or a bare column name) and produce an aggregation, matching pl.col("value").sum() in Python. See Fluent pipelines for the full set, including the first/last name clash with racket/list.
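The three-way dispatch described above can be sketched in Python as a single function that inspects its operands. The classes Expr and Series and the function gt are hypothetical stand-ins written for this illustration; the library's actual dispatch may be organized differently.

```python
class Expr:
    """Stand-in for a lazy expression; it only records its own text."""

    def __init__(self, text):
        self.text = text


class Series:
    """Stand-in for an eager column of values."""

    def __init__(self, values):
        self.values = list(values)


def _text(x):
    return x.text if isinstance(x, Expr) else repr(x)


def gt(a, b):
    # 1. Any expression operand: build a new (unevaluated) expression.
    if isinstance(a, Expr) or isinstance(b, Expr):
        return Expr(f"({_text(a)} > {_text(b)})")
    # 2. Any series operand: compute a boolean mask now, broadcasting scalars.
    if isinstance(a, Series) or isinstance(b, Series):
        n = len(a.values) if isinstance(a, Series) else len(b.values)
        left = a.values if isinstance(a, Series) else [a] * n
        right = b.values if isinstance(b, Series) else [b] * n
        return Series(l > r for l, r in zip(left, right))
    # 3. Otherwise: plain comparison of ordinary values.
    return a > b


lazy = gt(Expr('col("value")'), 15)   # an Expr, nothing computed yet
mask = gt(Series([10, 20, 30]), 15)   # a Series of booleans
plain = gt(3, 2)                      # an ordinary bool
```

Checking for expressions first means a comparison that mixes an expression with a series stays lazy in this sketch; whether the library makes the same choice is not stated here.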
1.6 Combining DataFrames
Join two frames on a key (eagerly, or as part of a lazy plan), and stack rows with dataframe-vstack.
(lazyframe-collect
 (lazyframe-join (dataframe-lazy users)
                 (dataframe-lazy orders)
                 #:on '("uid")
                 #:how 'inner))
(dataframe-vstack users more-users)
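As a reminder of what an inner join and a vertical stack compute, here is a small Python sketch over lists of row dicts, using only the standard library. The functions inner_join and vstack and the sample data are hypothetical; they model the semantics only, and say nothing about how the library or Polars performs these operations.

```python
def inner_join(left, right, on):
    """Pair every left row with every right row that shares the key;
    rows with no match on either side are dropped."""
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


def vstack(top, bottom):
    """Append the rows of bottom below top; columns must match."""
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("column names differ")
    return top + bottom


# Toy data, invented for this sketch.
users = [
    {"uid": 1, "name": "ada"},
    {"uid": 2, "name": "bob"},
    {"uid": 3, "name": "cy"},
]
orders = [
    {"uid": 1, "item": "pen"},
    {"uid": 1, "item": "ink"},
    {"uid": 3, "item": "cap"},
]
more_users = [{"uid": 4, "name": "dee"}]

joined = inner_join(users, orders, "uid")
stacked = vstack(users, more_users)
```

In this data, user 2 has no orders and so does not appear in joined, while user 1 appears twice, once per matching order.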
For complete, runnable versions of all of the above — including the Python references — see user-guide/getting-started/.