8.15.0.12
12.2.1 Regexp Patterns

The portion of a rx or rx_in form within '' is a pattern that is written with regexp pattern operators. Some pattern operators overlap with expression operators, but they have different meanings and precedence in a pattern. For example, the pattern operator * creates a repetition pattern, instead of multiplying like the expression * operator.

space

rx

The space for pattern operators that can be used within rx and rx_in forms.

regexp operator

#%literal string

 

regexp operator

#%literal bytes

 

~stronger_than: ~other

A literal string or byte string can be used as a pattern. It matches the string’s characters or bytes literally. See also case_insensitive.

> rx'"hello"'.match("hello")

RXMatch("hello", [], {})

> rx'"hello"'.match("olleh")

#false

> rx'#"a"'.match(#"a")

RXMatch(Bytes.copy(#"a"), [], {})

regexp operator

pat #%juxtapose pat

 

regexp operator

pat ++ pat

 

regexp operator

pat #%call (pat)

 

~order: rx_concatenation

Patterns that are adjacent in a larger pattern match in sequence. The ++ operator can be used to make sequencing explicit. An implicit #%call form is treated like #%juxtapose, consistent with implicit uses of parentheses for grouping as handled by #%parens.

> rx'"hello" " " "world"'.match("hello world")

RXMatch("hello world", [], {})

> rx'"hello" ++ " " ++ "world"'.match("hello world")

RXMatch("hello world", [], {})

> rx'"hello"

       ++ " "

       ++ "world"'.match("hello world")

RXMatch("hello world", [], {})

regexp operator

pat || pat

 

~order: rx_disjunction

Matches as either the first pat or second pat. The first pat is tried first.

> rx'"a" || "b"'.match("a")

RXMatch("a", [], {})

> rx'"a" || "b"'.match("b")

RXMatch("b", [], {})

> rx'"a" || "b"'.match("c")

#false
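The effect of trying the first pat first shows up most clearly with match_in, which does not have to consume the whole input. The following is a minimal sketch; it assumes the rx forms are made available by importing rhombus/rx, which this section does not state.

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// match_in stops at the first alternative that succeeds
rx'"a" || "ab"'.match_in("ab")  // RXMatch("a", [], {})
rx'"ab" || "a"'.match_in("ab")  // RXMatch("ab", [], {})

// match must reach the end of input, so it backtracks
// into the second alternative
rx'"a" || "ab"'.match("ab")     // RXMatch("ab", [], {})
```

Ordering alternatives from most specific to least specific is therefore mainly a concern for unanchored matching.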

regexp operator

#%parens (pat)

 

~order: rx_concatenation

A parenthesized pattern is equivalent to the pat inside the parentheses. That is, parentheses are just for grouping and resolving precedence mismatches. See $ for information about capture groups, which are not implicitly created by parentheses (as they are in some traditional regexp languages).

> rx'"a" || "b" ++ "c"'.match("ac")

#false

> rx'("a" || "b") ++ "c"'.match("ac")

RXMatch("ac", [], {})

regexp operator

#%brackets [charset]

 

regexp operator

pat #%index [charset]

 

~order: rx_concatenation

A [] pattern, which is an implicit use of #%brackets, matches a single character or byte, where charset determines the matching characters or bytes. An implicit #%index form (see Implicit Forms) is treated as a sequence of a pat and #%brackets.

See Regexp Character Sets for character set forms that can be used in charset.

> rx'["a"-"z"]'.match("m")

RXMatch("m", [], {})

> rx'["a"-"z"]'.match("0")

#false

regexp operator

pat *

 

regexp operator

pat * mode

 

~order: rx_repetition

 

mode

 = 

~greedy

 | 

~nongreedy

 | 

~possessive

Matches a sequence of 0 or more matches to pat.

> rx'any*'.match("abc")

RXMatch("abc", [], {})

> rx'any*'.match("")

RXMatch("", [], {})

By default, the match uses ~greedy mode, where a larger number of matches is tried first—but subsequent patterns may cause backtracking to a shorter match. In ~nongreedy mode, shorter matches are tried first. The ~possessive mode is like ~greedy, but without backtracking (i.e., the longest match must succeed overall for the enclosing pattern); see also cut.

> rx'($head: any*) ($tail: any*)'.match("abc")

RXMatch("abc", ["abc", ""], {#'head: 1, #'tail: 2})

> rx'($head: any* ~nongreedy) ($tail: any*)'.match("abc")

RXMatch("abc", ["", "abc"], {#'head: 1, #'tail: 2})

> rx'any* ~greedy "z"'.match("abcz")

RXMatch("abcz", [], {})

> rx'any* ~possessive "z"'.match("abcz")

#false

regexp operator

pat +

 

regexp operator

pat + mode

 

~order: rx_repetition

Like *, but matches 1 or more instances of pat.

> rx'any+'.match("abc")

RXMatch("abc", [], {})

> rx'any+'.match("")

#false

regexp operator

pat ?

 

regexp operator

pat ? mode

 

~order: rx_repetition

Similar to *, but matches 0 or 1 instances of pat.

> rx'any?'.match("a")

RXMatch("a", [], {})

> rx'any?'.match("")

RXMatch("", [], {})

> rx'any?'.match("abc")

#false

regexp operator

pat #%comp {count}

 

regexp operator

pat #%comp {min ..}

 

regexp operator

pat #%comp {min .. max}

 

~order: rx_repetition

Using {} after a pattern, which is a use of the implicit #%comp form, specifies a repetition like * or +, but more generally. If a single count is provided, it specifies an exact number of repetitions. If just min is provided, then it specifies a minimum number of repetitions, and there is no maximum. Finally, min and max can both be specified. Write 0 .. max to provide only an upper bound. Note that the expression form .. max creates a range that starts at #neginf, and the intent of requiring a min for a regexp repetition is to avoid suggesting that negative counts are possible. A count, min, or max must be a literal nonnegative integer.

> rx'any{2}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2}'.match("aaa")

#false

> rx'any{2..}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2..}'.match("aaa")

RXMatch("aaa", [], {})

> rx'any{2..3}'.match("aa")

RXMatch("aa", [], {})

> rx'any{2..3}'.match("aaa")

RXMatch("aaa", [], {})

> rx'any{2..3}'.match("aaaa")

#false
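The examples above do not cover the upper-bound-only case, which is written with an explicit 0 as min. A small sketch, assuming the rx forms come from importing rhombus/rx:

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// 0 .. 2 allows zero, one, or two repetitions
rx'any{0..2}'.match("")     // RXMatch("", [], {})
rx'any{0..2}'.match("aa")   // RXMatch("aa", [], {})
rx'any{0..2}'.match("aaa")  // #false
```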

regexp operator

.

 

regexp operator

any

 

regexp operator

char

 

regexp operator

byte

Matches a single character or byte. The . pattern matches any character or byte except a newline, while any also matches a newline. The char and byte forms are like any and also imply that the enclosing regexp matches strings or byte strings, respectively.

> rx'.'.match("a")

RXMatch("a", [], {})

> rx'.'.match("\n")

#false

> rx'any'.match("\n")

RXMatch("\n", [], {})

> rx'char'.match("\n")

RXMatch("\n", [], {})

> rx'byte'.match("\n")

RXMatch(Bytes.copy(#"\n"), [], {})

regexp operator

.*

 

regexp operator

.* mode

 

regexp operator

.+

 

regexp operator

.+ mode

 

regexp operator

.?

 

regexp operator

.? mode

Equivalent to . *, . +, and . ?, but allowing the space between the operators to be omitted.

> rx'.*'.match("abc")

RXMatch("abc", [], {})

regexp operator

bof

 

regexp operator

bol

Matches the start of input with bof or the position after a newline with bol.

A regexp created with rx (as opposed to rx_in) is implicitly prefixed with bof for use with methods like Regexp.match (as opposed to Regexp.match_in).

> rx'bof "a"'.match_in("a")

RXMatch("a", [], {})

> rx'bol "a"'.match_in("x\na")

RXMatch("a", [], {})

> rx'bof "a"'.match_in("x\na")

#false

regexp operator

eof

 

regexp operator

eol

Matches the end of input with eof or the position before a newline with eol.

A regexp created with rx (as opposed to rx_in) is implicitly suffixed with eof for use with methods like Regexp.match (as opposed to Regexp.match_in).

> rx'"a" eof'.match_in("a")

RXMatch("a", [], {})

> rx'"a" eol'.match_in("a\nx")

RXMatch("a", [], {})

> rx'"a" eof'.match_in("a\nx")

#false
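The implicit bof and eof for match can be seen by comparing it against match_in on the same pattern. This is an illustrative sketch, assuming the rx forms are imported from rhombus/rx:

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// match behaves as if the pattern were wrapped in bof ... eof
rx'"a"'.match("xa")               // #false
rx'"a"'.match("ab")               // #false

// match_in searches for the pattern anywhere in the input
rx'"a"'.match_in("xa")            // RXMatch("a", [], {})

// writing the anchors explicitly makes match_in just as strict
rx'bof "a" eof'.match_in("xa")    // #false
```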

regexp operator

$ identifier: pat

 

regexp operator

$ identifier

 

regexp operator

$ int

 

regexp operator

$ expr

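The $ operator is overloaded for related uses:

$ identifier: pat matches the same as pat and captures the matched input as a group named identifier.

$ identifier or $ int matches as a backreference when identifier names a capture group or int is a capture group index.

$ expr is an escape to the expression expr, whose result is spliced into the pattern.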
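Judging from the forms listed for $ and from the examples elsewhere on this page, a named capture can be referenced later in the same pattern either by name or by index. The sketch below is an illustration under that reading, and it assumes the rx forms are imported from rhombus/rx:

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// $x: any captures one character as group x (index 1);
// the later $x must match the same text again
rx'($x: any) $x'.match("aa")  // RXMatch("aa", ["a"], {#'x: 1})
rx'($x: any) $x'.match("ab")  // #false

// the same group can be referenced by its index
rx'($x: any) $1'.match("aa")  // RXMatch("aa", ["a"], {#'x: 1})
```

The parentheses around the capture are needed because the pat after $x: otherwise extends to the end of the enclosing pattern.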

regexp operator

~~ pat

Matches pat as an unnamed capture group. The capture group’s match can only be referenced by index (counting from 1).

> rx'any ~~any any*'.match("abc")[1]

"b"

> rx'any ~~any $1'.match("abb")

RXMatch("abb", ["b"], {})

regexp operator

lookahead(pat)

 

regexp operator

lookbehind(pat)

 

regexp operator

! lookahead(pat)

 

regexp operator

! lookbehind(pat)

Matches an empty position in the input where the subsequent (for lookahead) or preceding (for lookbehind) input matches pat, or does not match pat when a ! prefix is used.

> rx'. "a" lookahead("p")'.match_in("cat nap")

RXMatch("na", [], {})

> rx'. "a" !lookahead("t")'.match_in("cat nap")

RXMatch("na", [], {})

> rx'lookbehind("n") "a" .'.match_in("cat nap")

RXMatch("ap", [], {})

> rx'!lookbehind("c") "a" .'.match_in("cat nap")

RXMatch("ap", [], {})

regexp operator

word_boundary

 

regexp operator

word_continue

Matches an empty position in the input. The word_boundary pattern matches between an alphanumeric ASCII character (a-z, A-Z, or 0-9) or _ and another character that is not alphanumeric or _. The word_continue pattern matches positions that do not match word_boundary.

> rx'any+ ~nongreedy word_boundary'.match_in("cat nap")

RXMatch("cat", [], {})

> rx'any+ ~nongreedy word_continue'.match_in("cat nap")

RXMatch("c", [], {})

regexp operator

if lookahead(pat) | then_pat | else_pat

 

regexp operator

if lookbehind(pat) | then_pat | else_pat

 

regexp operator

if ! lookahead(pat) | then_pat | else_pat

 

regexp operator

if ! lookbehind(pat) | then_pat | else_pat

 

regexp operator

if $ identifier | then_pat | else_pat

 

regexp operator

if $ int | then_pat | else_pat

Matches as then_pat or else_pat, depending on the form immediately after if, which must be either a lookahead, lookbehind, or backreference pattern.

> rx'($x: "x")* if $x | "s" | "."'.match_in("xxxs")

RXMatch("xxxs", ["x"], {#'x: 1})

> rx'($x: "x")* if $x | "s" | "."'.match_in(".")

RXMatch(".", [#false], {#'x: 1})
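The examples above use a backreference as the condition. A lookahead condition works the same way: the test consumes no input and only selects which branch is tried. A minimal sketch, assuming the rx forms are imported from rhombus/rx:

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// if the next character is "a", the pattern must match "ab";
// otherwise it must match "cd"
rx'if lookahead("a") | "ab" | "cd"'.match("ab")  // RXMatch("ab", [], {})
rx'if lookahead("a") | "ab" | "cd"'.match("cd")  // RXMatch("cd", [], {})

// the condition holds but then_pat fails; else_pat is not tried
rx'if lookahead("a") | "ab" | "cd"'.match("ad")  // #false
```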

regexp operator

cut

Matches an empty position in the input. The first potential match that reaches cut is the only one that is allowed to succeed. Note that a possessive repetition mode like * ~possessive is equivalent to using cut after the repetition.

In the case of a rx_in pattern or use of RX.match_in, cut applies only to a match attempt at a given input position. It does not prevent trying the match at a later position.

> rx'("ax" || "a") cut "x"'.match("ax")

#false

> rx'("a" || "ax") cut "x"'.match("ax")

RXMatch("ax", [], {})
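Two points from the description can be sketched concretely: cut after a greedy repetition acts like ~possessive, and with match_in a failed attempt at one position does not rule out a match at a later position. The import of rhombus/rx is an assumption, since this section does not say where rx comes from.

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// without cut, any* backtracks to leave the final "z" available
rx'any* ~greedy "z"'.match("abcz")      // RXMatch("abcz", [], {})
// with cut, the longest repetition is committed, so "z" cannot match
rx'any* ~greedy cut "z"'.match("abcz")  // #false

// at position 0, "ax" is committed and the following "x" fails;
// match_in then retries from later positions and succeeds at "axx"
rx'("ax" || "a") cut "x"'.match_in("ax axx")  // RXMatch("axx", [], {})
```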

regexp operator

bytes: pat

 

regexp operator

string: pat

Matches the same as pat, but specifies explicitly either byte-string mode or string mode.

> rx'string: "a"'.match("a")

RXMatch("a", [], {})

> rx'bytes: "a"'.match("a")

RXMatch(Bytes.copy(#"a"), [], {})

> rx'string: any'.match(#"\x80")

#false

> rx'bytes: any'.match(#"\x80")

RXMatch(Bytes.copy(#"\200"), [], {})

regexp operator

case_sensitive: pat

 

regexp operator

case_insensitive: pat

Adjusts the treatment of literal strings and ranges in pat to match case-sensitively (the default) or case-insensitively. In case-insensitive mode, characters are folded individually (as opposed to folding a string sequence, which can change its length).

> rx'"hello"'.match("HELLO")

#false

> rx'case_insensitive: "hello"'.match("HELLO")

RXMatch("HELLO", [], {})
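Since the mode applies to ranges as well as literal strings, a range written in lowercase also accepts uppercase input under case_insensitive. A short sketch, assuming the rx forms are imported from rhombus/rx:

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// by default, the range only covers lowercase letters
rx'["a"-"z"]'.match("M")                    // #false
// in case-insensitive mode, the same range accepts "M"
rx'case_insensitive: ["a"-"z"]'.match("M")  // RXMatch("M", [], {})
```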

regexp operator

alpha

 

regexp operator

upper

 

regexp operator

lower

 

regexp operator

digit

 

regexp operator

xdigit

 

regexp operator

alnum

 

regexp operator

word

 

regexp operator

blank

 

regexp operator

newline

 

regexp operator

space

 

regexp operator

graph

 

regexp operator

print

 

regexp operator

cntrl

 

regexp operator

ascii

 

regexp operator

latin1

 

regexp operator

unicode.Ll

 

regexp operator

unicode.Lu

 

regexp operator

unicode.Lt

 

regexp operator

unicode.Lm

 

regexp operator

unicode.Lx

 

regexp operator

unicode.Lo

 

regexp operator

unicode.L

 

regexp operator

unicode.Nd

 

regexp operator

unicode.Nl

 

regexp operator

unicode.No

 

regexp operator

unicode.N

 

regexp operator

unicode.Ps

 

regexp operator

unicode.Pe

 

regexp operator

unicode.Pi

 

regexp operator

unicode.Pf

 

regexp operator

unicode.Pc

 

regexp operator

unicode.Pd

 

regexp operator

unicode.Po

 

regexp operator

unicode.P

 

regexp operator

unicode.Mn

 

regexp operator

unicode.Mc

 

regexp operator

unicode.Me

 

regexp operator

unicode.M

 

regexp operator

unicode.Sc

 

regexp operator

unicode.Sk

 

regexp operator

unicode.Sm

 

regexp operator

unicode.So

 

regexp operator

unicode.S

 

regexp operator

unicode.Zl

 

regexp operator

unicode.Zp

 

regexp operator

unicode.Zs

 

regexp operator

unicode.Z

 

regexp operator

unicode.Cc

 

regexp operator

unicode.Cf

 

regexp operator

unicode.Cs

 

regexp operator

unicode.Cn

 

regexp operator

unicode.Co

 

regexp operator

unicode.C

Each of these names is bound both as a character set and as a pattern that can be used directly, instead of wrapping in []. See the alpha, etc., character set for more information.

> rx'alpha'.match("m")

RXMatch("m", [], {})

> rx'alpha'.match("0")

#false
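Because each name is both a pattern and a character set, it can be combined directly with repetition operators or used inside []. The following sketch illustrates both spellings with digit; the import of rhombus/rx is an assumption.

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// digit used directly as a pattern
rx'digit+'.match("2024")    // RXMatch("2024", [], {})
rx'digit+'.match("20x4")    // #false

// digit used as a character set inside []
rx'[digit]+'.match("2024")  // RXMatch("2024", [], {})
```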

operator order

operator_order.def rx_repetition

 

operator order

operator_order.def rx_subtraction

 

operator order

operator_order.def rx_enumeration

 

operator order

operator_order.def rx_conjunction:

  ~weaker_than:

    rx_repetition

    rx_enumeration

 

operator order

operator_order.def rx_disjunction:

  ~weaker_than:

    rx_conjunction

    rx_repetition

    rx_enumeration

 

operator order

operator_order.def rx_concatenation:

  ~weaker_than:

    ~other

  ~stronger_than:

    rx_conjunction

    rx_disjunction
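Taken together, these orders mean that repetition binds tighter than concatenation, which in turn binds tighter than disjunction. As an illustration, the pattern below groups as "a" || ("b" ("c"*)); the sketch assumes the rx forms are imported from rhombus/rx.

```rhombus
#lang rhombus
import:
  rhombus/rx open  // assumed import; not stated in this section

// * applies only to "c", and || separates "a" from the whole "b" "c"* sequence
rx'"a" || "b" "c"*'.match("bccc")  // RXMatch("bccc", [], {})
rx'"a" || "b" "c"*'.match("a")     // RXMatch("a", [], {})

// neither a repeated "bc" nor "a" followed by "c" is accepted
rx'"a" || "b" "c"*'.match("bcbc")  // #false
rx'"a" || "b" "c"*'.match("ac")    // #false
```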

Provided as meta.

rx_meta.space is a compile-time value that identifies the same space as rx. See also SpaceMeta.

rx.macro is like expr.macro, but it defines a new regexp operator.

rx.macro 'upto_e($(n :: Int))':

  let n = n.unwrap()

  if n == 1

  | 'digit'

  | '["1"-"9"] digit{$(n-1)} || upto_e($(n-1))'

rx.macro 'pct':

  '("100" || upto_e(2)) "%"'

> rx'pct "/" pct "/" pct'.is_match("1%/42%/100%")

#true

syntax class

syntax_class rx_meta.Parsed

 

syntax class

syntax_class rx_meta.AfterPrefixParsed(name :: Name)

 

syntax class

syntax_class rx_meta.AfterInfixParsed(name :: Name)

Provided as meta.

Analogous to expr_meta.Parsed, etc., but for regexp patterns.