
randstr: Random String Generator

Hugefiver

(require randstr)

package: randstr

Version 0.1.1

A library for generating random strings based on regex-like patterns.

1 Features

Generate random strings from regex-like patterns
Support for character classes, ranges, and quantifiers
POSIX character classes ([:alpha:], [:digit:], etc.)
Unicode property support (\\p{L}, \\p{Script=Han}, etc.)
Normal distribution quantifiers for realistic length variation
Named groups and backreferences for pattern reuse
Command-line interface for quick generation
Fair distribution: duplicate characters in classes are deduplicated
Cryptographically secure random number generation option

2 Functions

procedure
(randstr pattern) → string?
pattern : string?

Generate a random string based on the given pattern.

Examples:

(randstr "[a-z]{5}")
(randstr "[0-9][a-z]+")
(randstr "(abc|def)+")

procedure
(randstr* pattern n) → (listof string?)
pattern : string?
n : exact-positive-integer?

Generate a list of n random strings based on the given pattern.

Examples:

(randstr* "[0-9]{3}" 5)

parameter
(randstr-secure-random?) → boolean?
(randstr-secure-random? secure?) → void?
secure? : boolean?

When set to #t, randstr draws all of its random numbers from a cryptographically secure source (via crypto-random-bytes). The default is #f.

Use this for security-sensitive applications like generating tokens, passwords, or secrets.

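;; Enable secure randomness for the body of the parameterize form only
(parameterize ([randstr-secure-random? #t])
  (randstr "[A-Za-z0-9]{32}"))

;; Or enable it for all subsequent calls
(randstr-secure-random? #t)
(randstr "[0-9]{6}")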

Note: Cryptographically secure generation is slower than the default pseudo-random number generator, but its output is unpredictable, which makes it suitable for security purposes.
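As an illustration of what drawing characters from crypto-random-bytes can involve, here is a minimal sketch. It is not the library's implementation: secure-random-index and secure-token are hypothetical helpers, and the rejection-sampling step is one common way to avoid bias, assumed here for illustration only.

```racket
#lang racket/base
(require racket/random) ; provides crypto-random-bytes

;; secure-random-index (hypothetical helper, not part of randstr):
;; returns an integer in [0, n) for 0 < n <= 2^32. Values at or above the
;; largest multiple of n not exceeding 2^32 are rejected and redrawn, so
;; every index is equally likely.
(define (secure-random-index n)
  (define range (expt 2 32))
  (define limit (- range (remainder range n)))
  (let loop ()
    (define v (integer-bytes->integer (crypto-random-bytes 4) #f))
    (if (< v limit)
        (remainder v n)
        (loop))))

;; secure-token (hypothetical helper): builds a string of len characters,
;; each chosen uniformly from alphabet.
(define (secure-token alphabet len)
  (build-string len
                (lambda (_)
                  (string-ref alphabet
                              (secure-random-index (string-length alphabet))))))

(secure-token "ABCDEF0123456789" 32)
```

Taking a plain remainder of the random bytes would slightly favor low indices whenever the alphabet size does not divide 2^32; the redraw removes that skew at the cost of an occasional extra call.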

3 Pattern Syntax

The following pattern syntax is supported:

{n} - Exactly n repetitions
{n+} - Normal distribution with mean n (2nd order)
{n++} - Normal distribution with mean n (3rd order, more concentrated)
{n1+n2} - Normal distribution in range n1..n2 (2nd order)
{n1++n2} - Normal distribution in range n1..n2 (3rd order)
{+n} - Shorthand for {0+n} (range 0..n)
{++n} - Shorthand for {0++n} (range 0..n, 3rd order)
(?<name>...) - Named group (captures pattern for later reference)
\\k<name> - Backreference to named group
[abc] - Choose randomly from characters a, b, or c
[a-z] - Choose randomly from lowercase letters a through z
(abc|def) - Choose randomly between "abc" or "def"
a* - Zero or more repetitions of the preceding element (here, a)
a+ - One or more repetitions of the preceding element
a? - Zero or one occurrence of the preceding element
. - Any character
[:alpha:] - Alphabetic characters
[:digit:] - Numeric characters
[:alnum:] - Alphanumeric characters (POSIX standard); [:alphanum:] is accepted as an alias
[:word:] - Word characters (alphanumeric plus underscore)
[:blank:] - Blank characters (space and tab)
[:space:] - Whitespace characters
[:upper:] - Uppercase letters
[:lower:] - Lowercase letters
[:ascii:] - ASCII characters
[:cntrl:] - Control characters
[:graph:] - Printable characters except space
[:print:] - Printable characters including space
[:punct:] - Punctuation characters
[:xdigit:] - Hexadecimal digits
\\p{L} - Unicode letters
\\p{N} - Unicode numbers
\\p{P} - Unicode punctuation
\\p{M} - Unicode marks
\\p{S} - Unicode symbols
\\p{Z} - Unicode separators
\\p{C} - Unicode "Other" category (control characters and similar)
\\p{Lu} - Unicode uppercase letters
\\p{Ll} - Unicode lowercase letters
\\p{Nd} - Unicode decimal numbers
\\p{Letter} - Unicode letters (alias for \\p{L})
\\p{Number} - Unicode numbers (alias for \\p{N})
\\p{Punctuation} - Unicode punctuation (alias for \\p{P})
\\p{Script=Han} - Unicode characters from Han script
\\p{Script=Latin} - Unicode characters from Latin script
\\p{Block=Basic_Latin} - Unicode characters from Basic Latin block
\\p{Block=CJK_Unified_Ideographs} - Unicode characters from CJK Unified Ideographs block
\\p{Alphabetic} - Unicode alphabetic characters
\\p{Uppercase} - Unicode uppercase characters
\\p{Lowercase} - Unicode lowercase characters
\\p{White_Space} - Unicode whitespace characters
\\p{Cased} - Unicode characters with case distinctions
\\p{Dash} - Unicode dash characters
\\p{Emoji} - Unicode emoji characters
\\p{Emoji_Component} - Unicode emoji component characters
\\p{Emoji_Modifier} - Unicode emoji modifier characters
\\p{Emoji_Modifier_Base} - Unicode emoji modifier base characters
\\p{Emoji_Presentation} - Unicode emoji presentation characters
\\p{Extended_Pictographic} - Unicode extended pictographic characters
\\p{Hex_Digit} - Unicode hexadecimal digits
\\p{ID_Continue} - Unicode identifier continuation characters
\\p{ID_Start} - Unicode identifier start characters
\\p{Ideographic} - Unicode ideographic characters
\\p{Math} - Unicode mathematical symbols
\\p{Quotation_Mark} - Unicode quotation mark characters
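To make the general-category entries such as \\p{Lu} and \\p{Nd} more concrete, the sketch below collects the characters of one category from a code-point range using Racket's built-in char-general-category. The helper chars-in-category is hypothetical and says nothing about how randstr builds its own character pools.

```racket
#lang racket/base

;; chars-in-category (hypothetical helper, not part of randstr):
;; lists every character between code points lo and hi (inclusive) whose
;; Unicode general category equals cat, e.g. 'lu, 'll or 'nd.
;; Surrogate code points are skipped because they are not valid characters.
(define (chars-in-category cat lo hi)
  (for/list ([cp (in-range lo (add1 hi))]
             #:unless (<= #xD800 cp #xDFFF)
             #:when (eq? (char-general-category (integer->char cp)) cat))
    (integer->char cp)))

;; Within ASCII, 'lu selects A through Z and 'nd selects 0 through 9.
(list->string (chars-in-category 'lu 0 #x7F))
(list->string (chars-in-category 'nd 0 #x7F))
```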

4 Advanced Examples

Beyond the basic patterns shown earlier, the library supports POSIX character classes and Unicode properties:

(randstr "[[:alpha:]]{5}")
(randstr "[[:digit:]]{3}")
(randstr "[[:alnum:]]{4}")
(randstr "[[:word:]]+")
(randstr "[[:upper:]0-9]+")
(randstr "[[:lower:]_]+")
(randstr "[[:alpha:]0-9]+")
(randstr "\\p{L}{5}")
(randstr "\\p{N}{3}")
(randstr "\\p{P}{2}")
(randstr "\\p{Lu}{3}\\p{Ll}{3}")
(randstr "\\p{Letter}{5}")
(randstr "\\p{Number}{3}")
(randstr "\\p{Script=Han}{2}")
(randstr "\\p{Block=Basic_Latin}{5}")
(randstr "\\p{Alphabetic}{4}")
(randstr "\\p{White_Space}{3}")

5 Normal Distribution Quantifiers

Generate strings with lengths following a normal distribution for more realistic random data:

(randstr "\\w{10+}")
(randstr "\\w{10++}")
(randstr "\\w{5+15}")
(randstr "\\w{5++15}")
(randstr "\\d{+10}")
(randstr "\\d{++10}")

A higher order (more + signs) concentrates the generated lengths more tightly around the mean.
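The following sketch shows one way a length could be drawn from a normal distribution and kept inside a range. It rests on assumptions that the text above does not confirm: the mean is taken as the midpoint of the range, the order is read as the number of standard deviations between the mean and each end of the range, and out-of-range draws are clamped. The helper names standard-normal and sample-length are hypothetical.

```racket
#lang racket/base
(require racket/math) ; provides pi

;; Box-Muller transform: one standard-normal sample from two uniform samples.
(define (standard-normal)
  (define u1 (- 1.0 (random))) ; stays above 0, so (log u1) is finite
  (define u2 (random))
  (* (sqrt (* -2.0 (log u1))) (cos (* 2.0 pi u2))))

;; sample-length (hypothetical helper, not part of randstr):
;; draws an integer length in [lo, hi] centred on the midpoint.
;; ASSUMPTION: order sets the spread as sd = (hi - lo) / (2 * order),
;; so a larger order gives a narrower distribution.
(define (sample-length lo hi order)
  (define mean (/ (+ lo hi) 2.0))
  (define sd (/ (- hi lo) (* 2.0 order)))
  (define raw (+ mean (* sd (standard-normal))))
  ;; Round to an exact integer, then clamp into the allowed range.
  (max lo (min hi (inexact->exact (round raw)))))

(for/list ([i (in-range 10)]) (sample-length 5 15 2))
(for/list ([i (in-range 10)]) (sample-length 5 15 3))
```

Under these assumptions the second list tends to stay closer to 10 than the first, which mirrors the described difference between one and two + signs.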

6 Named Groups and Backreferences

Capture generated content and reuse it later in the pattern:

(randstr "(?<word>\\w{4})-\\k<word>")
(randstr "(?<id>\\d{3}):\\k<id>")
(randstr "(?<a>[A-Z]{2})(?<b>\\d{2})-\\k<a>\\k<b>")

Named groups are defined with (?<name>...) and referenced with \\k<name>. The backreference will produce the exact same string that was generated by the named group.
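The capture-and-replay behavior described above can be sketched with a small generator that stores each named group's output in a hash table. This is only an illustration: the node representation (lit, group, ref) and the helpers random-digits and generate are hypothetical and are not taken from randstr's source.

```racket
#lang racket/base

;; random-digits (hypothetical): stands in for a sub-pattern such as \d{3}.
(define (random-digits n)
  (build-string n (lambda (_) (integer->char (+ 48 (random 10))))))

;; generate (hypothetical): walks the nodes left to right. A group node
;; runs its generator and records the result under its name; a ref node
;; replays the recorded string instead of generating a new one.
(define (generate nodes)
  (define captures (make-hash))
  (apply string-append
         (for/list ([node (in-list nodes)])
           (case (car node)
             [(lit) (cadr node)]
             [(group) (let ([s ((caddr node))])
                        (hash-set! captures (cadr node) s)
                        s)]
             [(ref) (hash-ref captures (cadr node))]
             [else (error 'generate "unknown node: ~e" node)]))))

;; Rough analogue of the pattern "(?<id>\\d{3}):\\k<id>".
(define example-nodes
  (list (list 'group 'id (lambda () (random-digits 3)))
        (list 'lit ":")
        (list 'ref 'id)))

(generate example-nodes)
```

Because the reference looks the string up rather than rerunning the generator, both halves of the result are always identical.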

7 Character Class Duplicate Handling

When a character class contains duplicate elements, each unique character is treated equally regardless of how many times it appears in the class. For example:

[aaabbbccc] - Each of a, b, c has equal probability (1/3 each); the repeated occurrences have no effect
[a-cb-e] - Each of a, b, c, d, e has equal probability (1/5 each)
[[:digit:]0-2] - Digits 0, 1, 2 appear in both the POSIX class and the range, but each digit still has equal probability

This ensures fair distribution of character selection in all character classes.
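A minimal sketch of the idea, assuming the class is first expanded into a list of characters and then deduplicated before a uniform pick. The helpers expand-range, class-members, and pick-char are hypothetical, and remove-duplicates is used here for brevity rather than as a claim about the library's internals.

```racket
#lang racket/base
(require racket/list) ; provides remove-duplicates

;; expand-range (hypothetical): characters from lo to hi, inclusive.
(define (expand-range lo hi)
  (for/list ([cp (in-range (char->integer lo) (add1 (char->integer hi)))])
    (integer->char cp)))

;; class-members (hypothetical): the distinct characters of a class,
;; in order of first appearance.
(define (class-members chars)
  (remove-duplicates chars))

;; pick-char (hypothetical): uniform choice among the distinct members.
(define (pick-char chars)
  (define members (class-members chars))
  (list-ref members (random (length members))))

;; [aaabbbccc] reduces to a, b, c
(class-members (string->list "aaabbbccc"))
;; [a-cb-e] reduces to a, b, c, d, e
(class-members (append (expand-range #\a #\c) (expand-range #\b #\e)))
(pick-char (string->list "aaabbbccc"))
```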

8 Changelog

8.1 Version 0.1.1

New: Normal distribution quantifiers ({n+}, {n++}, etc.) for realistic length variation
New: Range normal distribution ({n1+n2}, {n1++n2}, {+n}, {++n})
New: Named groups (?<name>...) for capturing generated content
New: Backreferences \\k<name> for reusing captured content

8.2 Version 0.1.0

Initial stable release
Fixed: \\W no longer incorrectly matches underscore
Performance: Optimized character class deduplication with O(1) hash-set lookups
Cleaned up internal code architecture

9 License

This project is licensed under the MIT License. See the "LICENSE" file for details.
