7.3 Characters

8.18.0.13


A character is a Unicode code point.

Characters are comparable, which means that generic operations like < and > work on characters, comparing them by their Unicode values.

annotation

Char

Matches characters.

expression

Char single_char_str

repetition

Char single_char_str

binding operator

Char single_char_str

Produces or matches a character. The single_char_str literal string must contain exactly one character, and that character is produced or matched.

> Char"a"
Char"a"
> match Char"a"
| Char"a" || Char"b": "yes"
| ~else: "no"
"yes"
> Char"too long"
Char: expected a literal single-character string
> Char"a" matches Char"too long"
Char: expected a literal single-character string

method

method (ch :: Char).to_int() :: NonnegInt

Returns the Unicode value of a character.

function

fun Char.from_int(i :: Int.in(0x0, 0x10FFFF)
                         && !Int.in(0xD800, 0xDFFF))
  :: Char

Returns the character corresponding to a Unicode value. The given i must be in the range 0x0 to 0x10FFFF (inclusive) and must not be in the range 0xD800 to 0xDFFF (inclusive), which is reserved for UTF-16 surrogates.
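The following sketch shows Char.to_int and Char.from_int as inverses of each other. It is a plain #lang rhombus module whose top-level expressions print their results; the names upper_a, code, and grin are only illustrative.

```rhombus
#lang rhombus

def upper_a = Char"A"
def code = upper_a.to_int()        // 65
Char.from_int(code) == upper_a     // #true: the conversion round-trips

// U+1F600 lies outside the Basic Multilingual Plane, but it is
// still a single character.
def grin = Char.from_int(0x1F600)
grin.to_int() == 0x1F600           // #true

// Char.from_int(0xD800) is rejected by the annotation on i,
// because 0xD800 is a surrogate value.
```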

method

method (ch :: Char).upcase() :: Char

method

method (ch :: Char).downcase() :: Char

method

method (ch :: Char).foldcase() :: Char

method

method (ch :: Char).titlecase() :: Char

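Unicode case conversions. Each method uses Unicode's simple (one-to-one) case mappings, and foldcase uses simple case folding; a character with no applicable mapping is returned unchanged.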
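Here is a small sketch of the four conversions. It uses Char.from_int for the non-ASCII cases so that the code points involved are explicit. The Greek final sigma shows how foldcase differs from downcase, and the "dž" digraph shows why titlecase is a separate operation. Variable names are illustrative only.

```rhombus
#lang rhombus

def lower_a = Char"a"
def upper_a = Char"A"
lower_a.upcase()        // Char"A"
upper_a.downcase()      // Char"a"
upper_a.foldcase()      // Char"a"

// A character with no case mapping comes back unchanged.
def seven = Char"7"
seven.upcase()          // Char"7"

// Final sigma (U+03C2) is already lowercase, so downcase leaves it
// alone, but case folding maps it to the ordinary sigma (U+03C3).
def final_sigma = Char.from_int(0x3C2)
final_sigma.downcase() == final_sigma              // #true
final_sigma.foldcase() == Char.from_int(0x3C3)     // #true

// "dž" (U+01C6) has distinct uppercase (U+01C4) and titlecase (U+01C5) forms.
def dz = Char.from_int(0x1C6)
dz.upcase() == Char.from_int(0x1C4)                // #true
dz.titlecase() == Char.from_int(0x1C5)             // #true
```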

method

method (ch :: Char).is_alphabetic() :: Boolean

method

method (ch :: Char).is_lowercase() :: Boolean

method

method (ch :: Char).is_uppercase() :: Boolean

method

method (ch :: Char).is_titlecase() :: Boolean

method

method (ch :: Char).is_numeric() :: Boolean

method

method (ch :: Char).is_symbolic() :: Boolean

method

method (ch :: Char).is_punctuation() :: Boolean

method

method (ch :: Char).is_graphic() :: Boolean

method

method (ch :: Char).is_whitespace() :: Boolean

method

method (ch :: Char).is_blank() :: Boolean

method

method (ch :: Char).is_extended_pictographic() :: Boolean

method

method (ch :: Char).general_category() :: Symbol

method

method (ch :: Char).grapheme_break_property() :: Symbol

The classification methods correspond to the following Unicode definitions:

alphabetic: Unicode “Alphabetic” property
lowercase: Unicode “Lowercase” property
uppercase: Unicode “Uppercase” property
titlecase: Unicode general category Lt
numeric: Unicode “Numeric_Type” property other than None
symbolic: Unicode general category Sm, Sc, Sk, or So
punctuation: Unicode general category Pc, Pd, Ps, Pe, Pi, Pf, or Po
graphic: alphabetic, numeric, symbolic, or punctuation, or Unicode general category Ll, Lm, Lo, Lt, Lu, Nd, Nl, No, Mn, Mc, or Me
whitespace: Unicode “White_Space” property
blank (horizontal whitespace): Unicode general category Zs, or the tab character
ISO control: Unicode value between 0x0 and 0x1F (inclusive) or between 0x7F and 0x9F (inclusive)
extended pictographic: Unicode “Extended_Pictographic” property
general category: #'lu, #'ll, #'lt, #'lm, #'lo, #'mn, #'mc, #'me, #'nd, #'nl, #'no, #'ps, #'pe, #'pi, #'pf, #'pd, #'pc, #'po, #'sc, #'sm, #'sk, #'so, #'zs, #'zp, #'zl, #'cc, #'cf, #'cs, #'co, or #'cn
grapheme break property: #'Other, #'CR, #'LF, #'Control, #'Extend, #'ZWJ, #'Regional_Indicator, #'Prepend, #'SpacingMark, #'L, #'V, #'T, #'LV, or #'LVT
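To make the classifications above concrete, this sketch applies several of them to ASCII characters and to a newline. The newline case shows the difference between whitespace and blank: a newline has the White_Space property, but its general category is Cc rather than Zs, and it is not a tab. The variable names are illustrative only.

```rhombus
#lang rhombus

def letter = Char"a"
def digit = Char"7"
def plus = Char"+"
def comma = Char","
def space = Char" "
def newline = Char.from_int(0xA)

letter.is_alphabetic()              // #true
letter.general_category()           // #'ll
digit.is_numeric()                  // #true
digit.general_category()            // #'nd
plus.is_symbolic()                  // #true (category Sm)
comma.is_punctuation()              // #true (category Po)
space.is_blank()                    // #true (category Zs)

// Whitespace, but not blank.
newline.is_whitespace()             // #true
newline.is_blank()                  // #false
newline.grapheme_break_property()   // #'LF
```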

method

method (ch :: Char).grapheme_step(state :: Int)
  :: (Boolean, Int)

Encodes a state machine for Unicode’s grapheme-cluster specification on a sequence of code points. It is called on the next character in a sequence together with the current state, and it returns two values: whether a (single) grapheme cluster has terminated since the most recently reported termination (or the start of the stream), and a new state to be used with Char.grapheme_step and the next character.

A value of 0 for state represents the initial state or a state where no characters are pending toward a new boundary. Thus, if a sequence of characters is exhausted and the accumulated state is not 0, the end of the stream creates one last grapheme-cluster boundary. When Char.grapheme_step produces a true value as its first result and a non-0 value as its second result, the given ch must be the only character pending toward the next grapheme cluster (by the rules of Unicode grapheme clustering).

Char.grapheme_step produces a result for any fixnum state, but a non-0 state has no specified meaning beyond this: passing a state produced by one call to Char.grapheme_step into the next call continues detecting grapheme-cluster boundaries in the sequence.
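A minimal sketch of driving the state machine by hand follows. It starts from state 0, counts every step that reports a terminated cluster, and adds one final cluster if the state is still non-0 when the input runs out, as the rules above describe. The helper grapheme_count is hypothetical and takes a list of characters to keep the iteration simple.

```rhombus
#lang rhombus

// Hypothetical helper: counts grapheme clusters in a list of
// characters by threading the state through Char.grapheme_step.
fun grapheme_count(chars :: List) :: Int:
  def (total, final_state):
    for values(count = 0, state = 0):
      each ch: chars
      // done: whether a cluster ended since the last reported end
      def (done, next_state) = Char.grapheme_step(ch, state)
      def next_count = if done | count + 1 | count
      values(next_count, next_state)
  // A non-0 final state means characters are still pending, so
  // the end of the sequence closes one last cluster.
  if final_state == 0 | total | total + 1

grapheme_count([Char"a", Char"b"])                          // 2
grapheme_count([Char"e", Char.from_int(0x301)])             // 1: "e" plus combining acute
grapheme_count([Char.from_int(0xD), Char.from_int(0xA)])    // 1: CR LF
grapheme_count([])                                          // 0
```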

annotation

CharCI

A veneer for characters that redirects comparison operations like < and > to case-insensitive comparisons, equivalent to applying Char.foldcase to each character before comparing.

As always for a veneer, CharCI works only in static mode (see use_static) to help ensure that it has the intended effect.

> Char"a" < Char"B"
#false
> (Char"a" :: CharCI) < (Char"B" :: CharCI)
#true
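The veneer's effect also carries through function parameters annotated with CharCI. The sketch below assumes a module in static mode (via use_static). The function ci_less is a hypothetical helper, and the comparison inside it is case-insensitive only because its parameters carry CharCI static information.

```rhombus
#lang rhombus
use_static

// Hypothetical helper: with CharCI annotations, < compares the
// arguments as if each had been passed through Char.foldcase.
fun ci_less(a :: CharCI, b :: CharCI) :: Boolean:
  a < b

ci_less(Char"a", Char"B")   // #true: compares folded "a" with "b"
ci_less(Char"B", Char"a")   // #false
```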
