8.17.0.6

7.3 Characters

A character is a Unicode code point.

Characters are comparable, which means that generic operations like < and > work on characters.
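As a brief illustration of the generic comparison operators on characters, the following sketch compares two lowercase letters; the results follow from their Unicode values (97 for a, 98 for b):

```rhombus
> Char"a" < Char"b"
#true
> Char"b" > Char"a"
#true
```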

annotation

Char

Matches characters.

expression

Char single_char_str

 

repetition

Char single_char_str

 

binding operator

Char single_char_str

Produces or matches a character. The single_char_str literal string must contain exactly one character, and that character is produced or matched.

> Char"a"

Char"a"

> match Char"a"

  | Char"a" || Char"b": "yes"

  | ~else: "no"

"yes"

> Char"too long"

Char: expected a literal single-character string

> Char"a" matches Char"too long"

Char: expected a literal single-character string

method

method (ch :: Char).to_int() :: NonnegInt

Returns the Unicode value of a character.
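A short sketch of the conversion, written here in function-call form (Char.to_int applied to a character):

```rhombus
> Char.to_int(Char"a")
97
> Char.to_int(Char"λ")
955
```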

function

fun Char.from_int(i :: Int.in(0x0, 0x10FFFF)

                    && !Int.in(0xD800, 0xDFFF))

  :: Char

Returns the character corresponding to a Unicode value. The given i must be in the range 0x0 to 0x10FFFF (inclusive) and not in the range 0xD800 to 0xDFFF (inclusive).
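The following sketch shows the conversion in the other direction, including a round trip through Char.to_int at the top of the allowed range:

```rhombus
> Char.from_int(97)
Char"a"
> Char.from_int(0x3BB)
Char"λ"
> Char.to_int(Char.from_int(0x10FFFF))
1114111
```

A value such as 0xD800 lies in the excluded range, so it does not satisfy the annotation on i.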

method

method (ch :: Char).upcase() :: Char

 

method

method (ch :: Char).downcase() :: Char

 

method

method (ch :: Char).foldcase() :: Char

 

method

method (ch :: Char).titlecase() :: Char

Unicode case conversions. Each method returns a single character.
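For an ASCII letter, the four conversions can be sketched as follows (function-call form; only the simplest case is shown, not characters with more unusual case mappings):

```rhombus
> Char.upcase(Char"a")
Char"A"
> Char.downcase(Char"A")
Char"a"
> Char.foldcase(Char"A")
Char"a"
> Char.titlecase(Char"a")
Char"A"
```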

method

method (ch :: Char).is_alphabetic() :: Boolean

 

method

method (ch :: Char).is_lowercase() :: Boolean

 

method

method (ch :: Char).is_uppercase() :: Boolean

 

method

method (ch :: Char).is_titlecase() :: Boolean

 

method

method (ch :: Char).is_numeric() :: Boolean

 

method

method (ch :: Char).is_symbolic() :: Boolean

 

method

method (ch :: Char).is_punctuation() :: Boolean

 

method

method (ch :: Char).is_graphic() :: Boolean

 

method

method (ch :: Char).is_whitespace() :: Boolean

 

method

method (ch :: Char).is_blank() :: Boolean

 

method

method (ch :: Char).is_extended_pictographic() :: Boolean

 

method

method (ch :: Char).general_category() :: Symbol

 

method

method (ch :: Char).grapheme_break_property() :: Symbol

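Unicode character classifications. The predicates return a boolean, while Char.general_category and Char.grapheme_break_property return one of the symbols listed: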

  • alphabetic: Unicode “Alphabetic” property

  • lowercase: Unicode “Lowercase” property

  • uppercase: Unicode “Uppercase” property

  • titlecase: Unicode general category Lt

  • numeric: Unicode “Numeric_Type” property other than None

  • symbolic: Unicode general category Sm, Sc, Sk, or So

  • punctuation: Unicode general category Pc, Pd, Ps, Pe, Pi, Pf, or Po

  • graphic: alphabetic, numeric, symbolic, punctuation, or Unicode general category is Ll, Lm, Lo, Lt, Lu, Nd, Nl, No, Mn, Mc, or Me

  • whitespace: Unicode “White_Space” property

  • blank (horizontal whitespace): Unicode general category is Zs or the Tab character

  • ISO control: Unicode value between 0x0 and 0x1F (inclusive) or between 0x7F and 0x9F (inclusive)

  • extended pictographic: Unicode “Extended_Pictographic” property

  • general category: #'lu, #'ll, #'lt, #'lm, #'lo, #'mn, #'mc, #'me, #'nd, #'nl, #'no, #'ps, #'pe, #'pi, #'pf, #'pd, #'pc, #'po, #'sc, #'sm, #'sk, #'so, #'zs, #'zp, #'zl, #'cc, #'cf, #'cs, #'co, or #'cn

  • grapheme break property: #'Other, #'CR, #'LF, #'Control, #'Extend, #'ZWJ, #'Regional_Indicator, #'Prepend, #'SpacingMark, #'L, #'V, #'T, #'LV, or #'LVT
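A few of the classifications above, sketched on ASCII characters whose Unicode properties are well known (a is a lowercase letter, + has general category Sm, ! has general category Po, and the space character has general category Zs):

```rhombus
> Char.is_alphabetic(Char"a")
#true
> Char.is_uppercase(Char"a")
#false
> Char.is_numeric(Char"1")
#true
> Char.is_symbolic(Char"+")
#true
> Char.is_punctuation(Char"!")
#true
> Char.is_blank(Char" ")
#true
> Char.general_category(Char"A")
#'lu
> Char.grapheme_break_property(Char"a")
#'Other
```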

method

method (ch :: Char).grapheme_step(state :: Int)

  :: (Boolean, Int)

Encodes a state machine for Unicode’s grapheme-cluster specification on a sequence of code points. It accepts a character for the next code point in a sequence, and it returns two values: whether a (single) grapheme cluster has terminated since the most recently reported termination (or the start of the stream), and a new state to be used with Char.grapheme_step and the next character.

A value of 0 for state represents the initial state or a state where no characters are pending toward a new boundary. Thus, if a sequence of characters is exhausted and accumulated state is not 0, then the end of the stream creates one last grapheme-cluster boundary. When Char.grapheme_step produces a true value as its first result and a non-0 value as its second result, then the given ch must be the only character pending toward the next grapheme cluster (by the rules of Unicode grapheme clustering).

The Char.grapheme_step function will produce a result for any fixnum state, but the meaning of a non-0 state is specified only in that providing such a state produced by Char.grapheme_step in another call to Char.grapheme_step continues detecting grapheme-cluster boundaries in the sequence.
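The state-threading protocol described above can be sketched as a small cluster counter. The helper name count_clusters is hypothetical and not part of the Char API; it assumes the input is a list of characters and relies only on the rules stated above: start from state 0, count each reported termination, and count one final cluster if the last state is not 0.

```rhombus
// Hypothetical helper (not part of the Char API): counts grapheme
// clusters by threading the state through Char.grapheme_step.
fun count_clusters(chars :: List.of(Char)) :: Int:
  def values(count, final_state):
    for values(n = 0, state = 0) (ch: chars):
      // `ended` is true when a cluster has terminated since the
      // most recently reported termination
      def values(ended, next_state) = Char.grapheme_step(ch, state)
      def next_n = if ended | n + 1 | n
      values(next_n, next_state)
  // A non-0 final state means characters are still pending, so the
  // end of the sequence creates one last cluster boundary
  if final_state == 0 | count | count + 1
```

Under the rules described above, a call such as count_clusters([Char"a", Char"b"]) is expected to report two clusters, while a base letter followed by a combining mark (for example, Char.from_int(0x301)) is expected to count as one.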

See also String.grapheme_span and String.grapheme_count.

annotation

CharCI

A veneer for a character that redirects comparable operations like < and > to case-insensitive comparisons, equivalent to using Char.foldcase on each character before comparing.

As always for a veneer, CharCI works only in static mode (see use_static) to help ensure that it has the intended effect.

> Char"a" < Char"B"

#false

> (Char"a" :: CharCI) < (Char"B" :: CharCI)

#true