9.0.0.1

12.2.3 Regexp Objects

class

class RX()

Represents a regexp as created with rx or rx_in. This class cannot be instantiated directly.

property

property (regexp :: RX).num_captures :: Int

 

property

property (regexp :: RX).capture_names :: Map

 

property

property (regexp :: RX).has_backreference :: Boolean

 

property

property (regexp :: RX).element :: (#'char || #'byte)

Properties of a regexp: its number of capture groups, a mapping of symbolic capture-group names to indices (counting from 1), whether the regexp is implemented with backreferences (which affects regexp splicing via $), and whether a match is in terms of characters or bytes.
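
As a minimal sketch of these properties, the following queries a pattern with one named capture group. The pattern is only illustrative; going by the description above, the count should be 1 and the map should associate the symbol for last with index 1.

> def r = rx'any "x" ($last: any)'

> r.num_captures

> r.capture_names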

method

method (regexp :: RX).match(input :: String || Bytes || Port.Input,

                            ~start: start :: Int = 0,

                            ~end: end :: maybe(Int) = #false,

                            ~input_prefix: input_prefix :: Bytes = #"",

                            ~unmatched_out: out :: maybe(Port.Output)

                                              = #false)

  :: maybe(RXMatch)

 

method

method (regexp :: RX).match_in(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).match_range(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).match_range_in(....) :: maybe(RXMatch)

 

method

method (regexp :: RX).is_match(....) :: Boolean

 

method

method (regexp :: RX).is_match_in(....) :: Boolean

Attempts to match a regular expression to input. For a regexp created with rx, the entire content (between start and end) must match for RX.match, while RX.match_in can match a portion of the input. For a regexp created with rx_in, both RX.match and RX.match_in can match a portion of the input.

The RX.match_range and RX.match_range_in methods are like RX.match and RX.match_in, but the resulting RXMatch object reports Range results instead of String or Bytes results. Range positions are relative to the start of the input, not to start, so when start is not 0, every reported range begins at start or later. Such ranges can be used with String.substring or Bytes.subbytes to obtain the results that RX.match and RX.match_in would have produced.

The RX.is_match and RX.is_match_in methods are like RX.match and RX.match_in, but report just a boolean instead of assembling an RXMatch value in the case of a match.

> rx'"a"'.match("a")

RXMatch("a", [], {})

> rx'"a"'.match("ab")

#false

> rx'"a"'.match_in("ab")

RXMatch("a", [], {})

> rx'"a"'.is_match("ab")

#false

> rx'"a"'.is_match_in("ab")

#true
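
The range-reporting variants are called the same way. In this sketch, which uses an illustrative input, the resulting RXMatch should hold a Range covering the two a characters in place of the string "aa"; per the description above, passing that range to String.substring recovers the matched text.

> rx'"a"+'.match_range_in("xaay")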

The start and end arguments select the portion of the input to which the match is applied, where #false for end corresponds to the end of input. The start and end positions count characters when the input is a string, and they count bytes when the input is a byte string or input port. Portions of input outside of that range are ignored. For example, bof matches at the start offset, not only at the beginning of the full input.

> rx'"a"*'.match_in("a aa aaa", ~start: 2)

RXMatch("aa", [], {})

The input_prefix argument specifies bytes that effectively precede input for the purposes of bol and other lookbehind matching. For example, a #"" prefix means that bof matches at the beginning of the input, whereas a #"\n" prefix means that bol can match at the beginning of the input but bof cannot.

> rx'bol "a"*'.match_in("aaa")

RXMatch("aaa", [], {})

> rx'bol "a"*'.match_in("aaa", ~input_prefix: #"x")

#false

If out is provided as an output port for the ~unmatched_out argument, the part of input from its beginning (including before start) that precedes the match is written to the port. All input up to end is written to out if no match is found. This functionality is most useful when input is an input port.

> def out = Port.Output.open_string()

> rx'"a"+'.match_in("before aaa after", ~unmatched_out: out)

RXMatch("aaa", [], {})

> out.get_string()

"before "

method

method (regexp :: RX).try_match(input :: Port.Input,

                                ~start: start :: Int = 0,

                                ~end: end :: maybe(Int) = #false,

                                ~input_prefix: input_prefix :: Bytes = #"",

                                ~unmatched_out: out :: maybe(Port.Output)

                                                  = #false)

  :: maybe(RXMatch)

 

method

method (regexp :: RX).try_match_in(....) :: maybe(RXMatch)

Like RX.match and RX.match_in, but no bytes are consumed from input if the pattern does not match.

> def p = Port.Input.open_string("hello")

> rx'"hi"'.try_match(p)

#false

> p.peek_char()

Char"h"

> rx'"hi"'.match(p)

#false

> p.peek_char()

Port.eof

method

method (regexp :: RX).matches(input :: String || Bytes || Port.Input,

                              ~start: start :: Int = 0,

                              ~end: end :: maybe(Int) = #false,

                              ~input_prefix: input_prefix :: Bytes = #"")

  :: List.of(String || Bytes)

 

method

method (regexp :: RX).split(input :: String || Bytes || Port.Input,

                            ~start: start :: Int = 0,

                            ~end: end :: maybe(Int) = #false,

                            ~input_prefix: input_prefix :: Bytes = #"")

  :: List.of(String || Bytes)

Like RX.match_in, but finding all non-overlapping matches. The RX.matches method returns the found matches, and RX.split returns the complement, i.e., the strings that are between matches. The result from RX.split will start or end with empty strings if the regexp matches the start or end of the input, respectively.

> rx'any ["abc"] any'.matches("xbx ycy")

["xbx", "ycy"]

> rx'any ["abc"] any'.matches(#"xbx ycy")

[Bytes.copy(#"xbx"), Bytes.copy(#"ycy")]
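
RX.split can be sketched in the same way. These inputs are illustrative only; going by the description above, the first call should produce the three fields between the commas, and the second should produce a list that starts with an empty string because the regexp matches at the start of the input.

> rx'","'.split("a,b,c")

> rx'","'.split(",a,b")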

method

method (regexp :: RX).replace(

  input :: String || Bytes,

  insert :: (String || Bytes || Function.of_arity(1+num_captures)),

  ~input_prefix: input_prefix :: Bytes = #""

) :: String || Bytes

 

method

method (regexp :: RX).replace_all(....)

  :: String || Bytes

Like RX.match_in, but restricted to string and byte string inputs, and returning the input with the partial matches replaced by insert. The RX.replace method replaces only the first partial match, while RX.replace_all replaces all non-overlapping partial matches.

If insert is a string or byte string, then it is used in place of a match for the output. If insert is a function, then it receives the matched string or byte string as its first argument, plus an additional argument for each capture group in the regular expression; the result of calling insert for each match is used as the replacement for the match.

> rx'any "x" any'.replace("extra text", "_")

"_ra text"

> rx'any "x" any'.replace_all("extra text", "_")

"_ra t_"

> rx'any "x" any'.replace("extra text", fun (s): "(" ++ s ++ ")")

"(ext)ra text"

> rx'any "x" any'.replace_all("extra text", fun (s): "(" ++ s ++ ")")

"(ext)ra t(ext)"

> rx'any "x" ($last: any)'.replace_all("extra text",

                                       fun (s, l): "(" ++ l ++ ")")

"(t)ra t(t)"

method

method (regexp :: RX).max_lookbehind()

Reports the maximum number of characters or bytes needed before the start of a match.

> rx'lookbehind("abc")'.max_lookbehind()

3

> rx'any lookbehind("abc")'.max_lookbehind()

2

> rx'any'.max_lookbehind()

0

property

property (regexp :: RX).handle

 

property

property (regexp :: RX).in_handle

The RX.handle and RX.in_handle properties produce a Racket-level regular expression object that corresponds to RX.match and RX.match_in, respectively.

function

fun RX.from_handles(handle,

                    in_handle,

                    num_captures :: NonnegInt,

                    vars :: Map.of(Symbol, NonnegInt),

                    ~has_backref: has_backref = #false,

                    ~source: source :: String = "rx '....'")

  :: RX

Constructs an RX object given Racket-level regular expressions for whole-input and partial-input matching, the number of capture groups in the pattern (which should be the same for both handles), and a mapping from capture-group names, if any, to indices. The optional has_backref argument determines whether the RX can be spliced into other regexp patterns. The optional source string is the printed representation of the pattern.