8.15.0.12
12.2.2 Regexp Character Sets

A character set is written with [] in a regexp pattern (via the implicit #%brackets operator). A character set represents a set of Chars, but as long as the characters range in Unicode value from 0 to 255, a character set can be used as a set of bytes to match for a byte-mode regexp.

The rx_charset space is the space for character set operators that can be used within [] in a regexp pattern.

regexp charset operator

#%literal string

 

regexp charset operator

#%literal bytes

A literal string or byte string can be used as a character set. Each character or byte is part of the set.

> rx'["a"]'.is_match("a")

#true

> rx'["a"]'.is_match("b")

#false

> rx'["abc"]'.is_match("b")

#true

regexp charset operator

charset #%juxtapose charset

 

regexp charset operator

charset || charset

 

regexp charset operator

charset #%call (charset)

Character sets that are adjacent or joined with || form a larger character set that includes all combined elements, i.e., a union of the sets. An implicit #%call form is treated like #%juxtapose, consistent with implicit uses of parentheses for grouping as handled by #%parens.

> rx'["a" "b"]'.is_match("a")

#true

> rx'["a" "b"]'.is_match("b")

#true

> rx'["a" "b"]'.is_match("c")

#false

> rx'["a" || "b"]'.is_match("b")

#true

regexp charset operator

#%parens (charset)

 

~order: rx_concatenation

A parenthesized character set is equivalent to the charset inside the parentheses. That is, parentheses are just for grouping and resolving precedence mismatches.

> rx'["a" "b" "c"]'.is_match("a")

#true

> rx'[("a" "b") "c"]'.is_match("a")

#true

regexp charset operator

charset - charset

 

~order: rx_enumeration

Assuming that each charset contains a single character, creates a charset that has those two characters and all characters in between (based on Char.to_int values). An error is reported if either charset has zero or multiple characters.

> rx'["a" - "y"]'.is_match("a")

#true

> rx'["a" - "y"]'.is_match("x")

#true

> rx'["a" - "y"]'.is_match("z")

#false
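Ranges can be combined with the union forms described earlier. The following is a small sketch, assuming the same REPL setup as the examples above; the parentheses are there only to keep the grouping explicit rather than relying on operator precedence.

```rhombus
// union of two ranges: "a" through "f" plus "0" through "9"
> rx'[("a" - "f") || ("0" - "9")]'.is_match("c")
#true
> rx'[("a" - "f") || ("0" - "9")]'.is_match("7")
#true
> rx'[("a" - "f") || ("0" - "9")]'.is_match("g")
#false
```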

regexp charset operator

charset && charset

 

~order: rx_conjunction

Creates a character set that has each character in both the first charset and the second charset, i.e., an intersection of the sets.

> rx'[("a" - "f") && ("c" - "h")]'.is_match("a")

#false

> rx'[("a" - "f") && ("c" - "h")]'.is_match("d")

#true

regexp charset operator

charset -- charset

 

~order: rx_subtraction

Creates a character set that starts with the characters of the first charset and removes each character of the second charset, i.e., set difference.

> rx'[("a" - "z") -- ("m" - "p")]'.is_match("n")

#false

> rx'[("a" - "z") -- ("m" - "p")]'.is_match("a")

#true

regexp charset operator

! charset

 

~weaker_than: ~other

Inverts charset by creating a character set that has every character not in charset.

> rx'[! "a" - "z"]'.is_match("n")

#false

> rx'[! "a" - "z"]'.is_match("0")

#true
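Because ! is declared weaker than other operators, the examples above invert the whole range "a" - "z". A sketch of the same inversion with the grouping written out explicitly, assuming the same REPL setup as the other examples:

```rhombus
// explicit parentheses: invert the range "a" through "z"
> rx'[! ("a" - "z")]'.is_match("n")
#false
> rx'[! ("a" - "z")]'.is_match("0")
#true
```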

regexp charset operator

any

A character set that has all characters.

> rx'[any]'.is_match("a")

#true

regexp charset operator

alpha

 

regexp charset operator

upper

 

regexp charset operator

lower

The alpha character set has all ASCII letters: a-z and A-Z. The upper character set has just A-Z, while the lower character set has just a-z.

> rx'[alpha]'.is_match("a")

#true

> rx'[alpha]'.is_match("0")

#false

> rx'[alpha]'.is_match("λ")

#false

> rx'[upper]'.is_match("A")

#true

> rx'[upper]'.is_match("a")

#false

> rx'[lower]'.is_match("a")

#true

> rx'[lower]'.is_match("A")

#false

regexp charset operator

digit

 

regexp charset operator

xdigit

The digit character set has all ASCII digits: 0-9. The xdigit character set adds the remaining hexadecimal digits: a-f and A-F.

> rx'[digit]'.is_match("0")

#true

> rx'[digit]'.is_match("a")

#false

> rx'[xdigit]'.is_match("0")

#true

> rx'[xdigit]'.is_match("a")

#true

> rx'[xdigit]'.is_match("z")

#false
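For comparison, a set with the same members as xdigit can be assembled from digit and two ranges. This is only an illustrative sketch under the same REPL assumptions as above, not a statement about how xdigit is implemented.

```rhombus
// digits plus a-f and A-F, spelled out with union and range operators
> rx'[digit || ("a" - "f") || ("A" - "F")]'.is_match("F")
#true
> rx'[digit || ("a" - "f") || ("A" - "F")]'.is_match("g")
#false
```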

regexp charset operator

alnum

 

regexp charset operator

word

The alnum character set has all ASCII letters and digits: 0-9, a-z, and A-Z. The word character set adds _.

> rx'[alnum]'.is_match("0")

#true

> rx'[alnum]'.is_match("z")

#true

> rx'[alnum]'.is_match("_")

#false

> rx'[word]'.is_match("_")

#true
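As a sketch of how word relates to alnum, juxtaposing alnum with the literal "_" yields a set that also accepts an underscore (same REPL assumptions as the examples above):

```rhombus
// alnum unioned with "_" by juxtaposition
> rx'[alnum "_"]'.is_match("_")
#true
// word has letters, digits, and "_", but not other punctuation
> rx'[word]'.is_match("a")
#true
> rx'[word]'.is_match("-")
#false
```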

regexp charset operator

newline

 

regexp charset operator

blank

 

regexp charset operator

space

The newline character set has just the newline character (Char.to_int value 10). The blank character set has space (Char.to_int value 32) and tab (Char.to_int value 9). The space character set combines those and adds return (Char.to_int value 13) and form feed (Char.to_int value 12).

> rx'[blank]'.is_match(" ")

#true
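A few more cases may help distinguish the three whitespace sets. This sketch assumes the same REPL setup as above and the usual string escapes "\n", "\t", and "\r" for newline, tab, and return.

```rhombus
// newline has only the newline character
> rx'[newline]'.is_match("\n")
#true
// blank has space and tab, but not newline
> rx'[blank]'.is_match("\t")
#true
> rx'[blank]'.is_match("\n")
#false
// space includes newline and blank, plus return and form feed
> rx'[space]'.is_match("\n")
#true
> rx'[space]'.is_match("\r")
#true
```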

regexp charset operator

graph

 

regexp charset operator

print

The graph character set has all ASCII characters that print with ink. The print character set adds space (Char.to_int value 32) and tab (Char.to_int value 9).
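The difference between graph and print shows up with the space character. A short sketch, assuming the same REPL setup as the other examples on this page:

```rhombus
// letters and punctuation print with ink
> rx'[graph]'.is_match("a")
#true
> rx'[graph]'.is_match("!")
#true
// a space uses no ink, so it is in print but not in graph
> rx'[graph]'.is_match(" ")
#false
> rx'[print]'.is_match(" ")
#true
```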

regexp charset operator

cntrl

The cntrl character set has all ASCII control characters (Char.to_int values 0 through 31).

> rx'[cntrl]'.is_match("\n")

#true

> rx'[cntrl]'.is_match("a")

#false

regexp charset operator

ascii

 

regexp charset operator

latin1

The ascii character set has all ASCII characters (Char.to_int values 0 through 127), and the latin1 character set has all Latin-1 characters (Char.to_int values 0 through 255).

> rx'[ascii]'.is_match("a")

#true

> rx'[ascii]'.is_match("é")

#false

> rx'[latin1]'.is_match("é")

#true

> rx'[latin1]'.is_match("λ")

#false

regexp charset operator

unicode.Ll

 

regexp charset operator

unicode.Lu

 

regexp charset operator

unicode.Lt

 

regexp charset operator

unicode.Lm

 

regexp charset operator

unicode.Lx

 

regexp charset operator

unicode.Lo

 

regexp charset operator

unicode.L

 

regexp charset operator

unicode.Nd

 

regexp charset operator

unicode.Nl

 

regexp charset operator

unicode.No

 

regexp charset operator

unicode.N

 

regexp charset operator

unicode.Ps

 

regexp charset operator

unicode.Pe

 

regexp charset operator

unicode.Pi

 

regexp charset operator

unicode.Pf

 

regexp charset operator

unicode.Pc

 

regexp charset operator

unicode.Pd

 

regexp charset operator

unicode.Po

 

regexp charset operator

unicode.P

 

regexp charset operator

unicode.Mn

 

regexp charset operator

unicode.Mc

 

regexp charset operator

unicode.Me

 

regexp charset operator

unicode.M

 

regexp charset operator

unicode.Sc

 

regexp charset operator

unicode.Sk

 

regexp charset operator

unicode.Sm

 

regexp charset operator

unicode.So

 

regexp charset operator

unicode.S

 

regexp charset operator

unicode.Zl

 

regexp charset operator

unicode.Zp

 

regexp charset operator

unicode.Zs

 

regexp charset operator

unicode.Z

 

regexp charset operator

unicode.Cc

 

regexp charset operator

unicode.Cf

 

regexp charset operator

unicode.Cs

 

regexp charset operator

unicode.Cn

 

regexp charset operator

unicode.Co

 

regexp charset operator

unicode.C

Each of these character sets contains all Unicode characters that have the named general category, such as Ll for lowercase letters. Each single-letter name, such as unicode.L, unions all of the other general categories that start with the same letter. The unicode.Lx character set unions unicode.Ll, unicode.Lu, unicode.Lt, and unicode.Lm.

> rx'[unicode.Ll]'.is_match("λ")

#true
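Some further cases, sketched under the same REPL assumptions, illustrate the difference between a specific category and a single-letter union. The sample characters are chosen here for illustration: "Λ" is an uppercase letter and "7" is a decimal digit.

```rhombus
// "Λ" is in category Lu, not Ll
> rx'[unicode.Lu]'.is_match("Λ")
#true
> rx'[unicode.Ll]'.is_match("Λ")
#false
// unicode.L unions all letter categories, so it accepts both cases
> rx'[unicode.L]'.is_match("λ")
#true
> rx'[unicode.L]'.is_match("7")
#false
// "7" is a decimal digit, category Nd
> rx'[unicode.Nd]'.is_match("7")
#true
```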

Provided as meta.

rx_charset_meta.space is a compile-time value that identifies the same space as rx_charset. See also SpaceMeta.

rx_charset.macro is like expr.macro, but defines a character set operator.

rx_charset.macro 'octal': '"0"-"7"'

rx_charset.macro 'maybe $charset': '$charset "?"'

> rx'[maybe(octal) "!"]*'.match("3?!4")

RXMatch("3?!4", [], {})

> rx'[maybe(octal)]'.match("8")

#false
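A macro-defined operator can also be used as an operand of the built-in operators. The following sketch follows the pattern of the octal definition above; vowel is a hypothetical name introduced only for this illustration.

```rhombus
// `vowel` is a hypothetical operator defined just for this sketch
rx_charset.macro 'vowel': '"aeiou"'

> rx'[vowel]'.is_match("e")
#true
// lowercase ASCII letters with the vowels removed
> rx'[lower -- vowel]'.is_match("e")
#false
> rx'[lower -- vowel]'.is_match("k")
#true
```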

Provided as meta.

rx_charset_meta.Parsed, rx_charset_meta.AfterPrefixParsed, and rx_charset_meta.AfterInfixParsed are analogous to expr_meta.Parsed, etc., but for regexp character sets.