12.2 Regular Expressions
import: rhombus/rx | package: rhombus-lib
A regular expression, or regexp, can be matched against the content of a string, byte string, or input port. The rx and rx_in forms create regexps, which are represented as RX objects. A successful match is represented as an RXMatch object, which reports either a matching (byte) string or a range of the input.
The rx and rx_in binding forms match input while directly binding named capture groups within the regexp pattern, instead of returning an RXMatch object.
A regexp matches in either character or byte mode. The mode is inferred from the elements of the pattern, but bytes or string can force a choice of mode. A regexp in character mode can be matched against a byte string or input port, in which case it matches UTF-8 sequences whose decoding matches the character regexp. A regexp in byte mode can similarly be matched against a string, in which case it matches a string whose UTF-8 encoding matches the byte regexp. Matches are reported in terms of strings only when the regexp is in character mode and the input is a string; otherwise, matches are reported in terms of bytes.
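As a rough illustration of how the input type affects the reported match, the sketch below applies one regexp to a string and to a byte string. It reuses the pattern from this section's own example and assumes that the pattern is inferred to be in character mode. The `open` modifier on the import and the name `letters` are choices made for this sketch, not something the section prescribes.

```rhombus
import:
  rhombus/rx open

// Pattern borrowed from this section's match_range example;
// assumed here to be inferred as a character-mode regexp.
def letters = rx_in'["a"-"z"]+'

// String input with a character-mode regexp:
// the match should be reported in terms of strings.
RX.match(letters, "_abc_")

// Byte string input: the match should be reported in terms of bytes,
// whichever mode the regexp is in.
RX.match(letters, #"_abc_")
```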
See Regexp Patterns for patterns that can be used in pat.
The rx form produces a regexp that matches with RX.match only when the whole input string, byte string, or port content matches the pattern. An rx_in regexp matches with RX.match the same as with RX.match_in, which means that it can always match against a portion of the input.
RXMatch("abc", ["bc"], {#'more: 1})
"bc"
RXMatch("abc", [], {})
> rx_in'["a"-"z"]+'.match_range("_abc_")
RXMatch(1 .. 4, [], {})
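The difference between rx and rx_in can be seen by running the same pattern through both forms. This is only a sketch: it reuses the pattern from the example above, the names `whole` and `part` are hypothetical, and the `open` import modifier is an assumption made so that rx and RX can be used without a prefix. Results are left out on purpose. The comments state what the description above leads one to expect.

```rhombus
import:
  rhombus/rx open

def whole = rx'["a"-"z"]+'
def part = rx_in'["a"-"z"]+'

// rx with RX.match: the whole input must match the pattern,
// so "_abc_" is expected not to match because of the underscores.
RX.match(whole, "_abc_")

// rx with RX.match_in: a portion of the input may match.
RX.match_in(whole, "_abc_")

// rx_in with RX.match: expected to behave the same as RX.match_in.
RX.match(part, "_abc_")
```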
See Regexp Patterns for patterns that can be used in pat.
The rx and rx_in binding forms are analogous to the rx and rx_in expression forms: rx matches only when the whole input matches, and rx_in can match a part of the input.