By default, Guile supports POSIX extended regular expressions. That means that the characters ‘(’, ‘)’, ‘+’ and ‘?’ are special, and must be escaped if you wish to match the literal characters.
This regular expression interface was modeled after that implemented by SCSH, the Scheme Shell. It is intended to be upwardly compatible with SCSH regular expressions.
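As a rough illustration of the special characters mentioned above, the sketch below contrasts an unescaped ‘+’ with escaped ‘+’, ‘(’ and ‘)’. It assumes the (ice-9 regex) module is loaded and uses string-match, described later in this section, together with match:substring from Match Structures; the sample strings are made up for the example.

```scheme
(use-modules (ice-9 regex))

;; Unescaped, "+" means "one or more of the preceding item".
(define repeated (string-match "a+" "caaat"))
(match:substring repeated)          ; ⇒ "aaa"

;; Escaped, "+" matches a literal plus sign.  The backslash is doubled
;; because "\\" is how a Scheme string literal spells one backslash.
(define literal-plus (string-match "1\\+1" "so 1+1=2"))
(match:substring literal-plus)      ; ⇒ "1+1"

;; Literal parentheses need the same treatment.
(define literal-parens (string-match "f\\(x\\)" "y = f(x)"))
(match:substring literal-parens)    ; ⇒ "f(x)"
```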
Scheme Procedure: string-match pattern str [start]
Compile the string pattern into a regular expression and compare it with str. The optional numeric argument start specifies the position in str at which to begin matching.
string-match returns a match structure which describes what, if anything, was matched by the regular expression. See Match Structures. If str does not match pattern at all, string-match returns #f.
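A minimal sketch of string-match, assuming the (ice-9 regex) module is loaded. The accessors match:substring and match:start come from Match Structures; the input string is invented for illustration.

```scheme
(use-modules (ice-9 regex))

(define input "abc 123 def 456")

;; Without start, the search begins at position 0.
(define first-number (string-match "[0-9]+" input))
(match:substring first-number)      ; ⇒ "123"
(match:start first-number)          ; ⇒ 4

;; With start = 7, the search begins after the first number.
(define second-number (string-match "[0-9]+" input 7))
(match:substring second-number)     ; ⇒ "456"

;; When nothing matches, the result is #f rather than a match structure.
(define no-number (string-match "[0-9]+" "no digits here"))
no-number                           ; ⇒ #f
```

Because a failed match is #f, the result can be tested directly in a conditional before any accessor is applied to it.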
Each time string-match is called, it must compile its
pattern argument into a regular expression structure. This
operation is expensive, which makes string-match inefficient if
the same regular expression is used several times (for example, in a
loop). For better performance, you can compile a regular expression in
advance and then match strings against the compiled regexp.
Scheme Procedure: make-regexp pat flag…
Compile the regular expression described by pat, and return the compiled regexp structure. If pat does not describe a legal regular expression, throw a regular-expression-syntax error.
The flags change the behavior of the compiled regexp. The following flags may be supplied:
- regexp/icase: Consider uppercase and lowercase letters to be the same when matching.
- regexp/newline: If a newline appears in the target string, then permit the ‘^’ and ‘$’ operators to match immediately after or immediately before the newline, respectively. Also, the ‘.’ and ‘[^...]’ operators will never match a newline character. The intent of this flag is to treat the target string as a buffer containing many lines of text, and the regular expression as a pattern that may match a single one of those lines.
- regexp/basic: Compile a basic (“obsolete”) regexp instead of the extended (“modern”) regexps that are the default. Basic regexps do not consider ‘|’, ‘+’ or ‘?’ to be special characters, and require the ‘{...}’ and ‘(...)’ metacharacters to be backslash-escaped (see Backslash Escapes). There are several other differences between basic and extended regular expressions, but these are the most significant.
- regexp/extended: Compile an extended regular expression rather than a basic regexp. This is the default behavior; this flag will not usually be needed. If a call to make-regexp includes both regexp/basic and regexp/extended flags, the one which comes last will override the earlier one.
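The flags above can be sketched as follows. This is an illustrative example rather than a complete tour: the patterns and strings are invented, regexp-exec is the matching procedure described next, and the behavior of regexp/basic shown here assumes a POSIX-conforming regex library underneath.

```scheme
(use-modules (ice-9 regex))

;; regexp/icase: "hello" also matches "HELLO".
(define rx-icase (make-regexp "hello" regexp/icase))
(define icase-match (regexp-exec rx-icase "Say HELLO"))
(match:substring icase-match)                   ; ⇒ "HELLO"

;; regexp/newline: ^ and $ may match at line boundaries inside the string.
(define text "foo\nbar\nbaz")
(define whole-string-only
  (regexp-exec (make-regexp "^bar$") text))     ; ⇒ #f
(define per-line
  (regexp-exec (make-regexp "^bar$" regexp/newline) text))
(match:start per-line)                          ; ⇒ 4

;; regexp/basic: "+" is an ordinary character, so "a+" matches literally.
(define rx-basic (make-regexp "a+" regexp/basic))
(define basic-match (regexp-exec rx-basic "a+b"))
(match:substring basic-match)                   ; ⇒ "a+"

;; An illegal pattern throws with the key regular-expression-syntax.
(define compile-result
  (catch 'regular-expression-syntax
    (lambda () (make-regexp "(") 'compiled)
    (lambda (key . args) 'bad-pattern)))
compile-result                                  ; ⇒ bad-pattern
```

Several flags can be passed in one call, for example (make-regexp "^hello$" regexp/icase regexp/newline).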
Scheme Procedure: regexp-exec rx str [start [flags]]
Match the compiled regular expression rx against str. If the optional integer argument start is provided, begin matching from that position in the string. Return a match structure describing the results of the match, or #f if no match could be found. The optional integer argument flags changes the behavior of the matching, similar to the flags of make-regexp.
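To illustrate the advice about compiling in advance, here is a small sketch that compiles a pattern once with make-regexp and reuses it for every element of a list. The helper name all-digit-strings and the sample data are hypothetical, chosen only for this example.

```scheme
(use-modules (ice-9 regex) (srfi srfi-1))

;; Compiled once, outside the loop.
(define digits-rx (make-regexp "^[0-9]+$"))

;; Hypothetical helper: keep only the strings made up entirely of digits.
(define (all-digit-strings strings)
  (filter (lambda (s) (regexp-exec digits-rx s)) strings))

(define kept (all-digit-strings '("42" "4x2" "007" "")))
kept                                ; ⇒ ("42" "007")

;; The optional start argument works as it does for string-match.
(define number-rx (make-regexp "[0-9]+"))
(define later (regexp-exec number-rx "10 apples, 25 pears" 2))
(match:substring later)             ; ⇒ "25"
```

Calling string-match inside the lambda would give the same result, but the pattern would be recompiled for each element.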
Scheme Procedure: regexp? obj
Return #t if obj is a compiled regular expression, or #f otherwise.
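One possible use of this predicate is a procedure that accepts either a pattern string or an already compiled regexp. The helper ensure-regexp below is hypothetical and not part of Guile; it is only a sketch of that idea.

```scheme
(define compiled (make-regexp "[0-9]+"))

(regexp? compiled)                  ; ⇒ #t
(regexp? "[0-9]+")                  ; ⇒ #f, a string is not a compiled regexp

;; Hypothetical helper: compile only when given a pattern string.
(define (ensure-regexp pattern-or-rx)
  (if (regexp? pattern-or-rx)
      pattern-or-rx
      (make-regexp pattern-or-rx)))

(define from-string (ensure-regexp "[0-9]+"))
(define from-regexp (ensure-regexp compiled))
```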
Regular expressions are commonly used to find patterns in one string and replace them with the contents of another string.
Scheme Procedure: regexp-substitute port match item …
Write to the output port port selected contents of the match structure match. Each item specifies what should be written, and may be one of the following arguments:
- A string. String arguments are written out verbatim.
- An integer. The submatch with that number is written.
- The symbol ‘pre’. The portion of the matched string preceding the regexp match is written.
- The symbol ‘post’. The portion of the matched string following the regexp match is written.
port may be #f, in which case nothing is written; instead, regexp-substitute constructs a string from the specified items and returns that.
The following example takes a regular expression that matches a standard
yyyymmdd-format date such as "20020828". The
regexp-substitute call returns a string computed from the
information in the match structure, consisting of the fields and text
from the original string reordered and reformatted.
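(use-modules (ice-9 regex))   ; provides string-match and regexp-substitute

;; Three submatches: year (1), month (2) and day (3).
(define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(define sm (string-match date-regex s))
(regexp-substitute #f sm 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
⇒ "Date 04-29-2002 12am. (20020429)"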
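When port is an actual output port, the same items are written to it instead of being returned. The sketch below captures that output with call-with-output-string so the result can be inspected; the pattern and input are invented for illustration.

```scheme
(use-modules (ice-9 regex))

;; Submatch 1 is the user name, submatch 2 the host name.
(define mail-match
  (string-match "([a-z]+)@([a-z.]+)" "mail bob@example.org now"))

;; Write the pieces to a string port, swapping the two submatches.
(define written
  (call-with-output-string
    (lambda (port)
      (regexp-substitute port mail-match 'pre 2 " / " 1 'post))))
written                             ; ⇒ "mail example.org / bob now"
```

Passing (current-output-port) as port would print the same text directly.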
Scheme Procedure: regexp-substitute/global port regexp target item…
Similar to regexp-substitute, but can be used to perform global substitutions on target. Instead of taking a match structure as an argument, regexp-substitute/global takes two string arguments: a regexp string describing a regular expression, and a target string which should be matched against this regular expression.
Each item behaves as in regexp-substitute, with the following exceptions:
- A function may be supplied. When this function is called, it will be passed one argument: a match structure for a given regular expression match. It should return a string to be written out to port.
- The ‘post’ symbol causes regexp-substitute/global to recurse on the unmatched portion of the target string. This must be supplied in order to perform global search-and-replace on the target string; if it is not present among the items, then regexp-substitute/global will return after processing a single match.
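A short sketch of global substitution, assuming the (ice-9 regex) module is loaded. The first call replaces every run of whitespace with a dash; the second passes a function item that rewrites each match. The input strings and the doubling function are made up for the example.

```scheme
(use-modules (ice-9 regex))

;; 'pre copies the text before each match, "-" replaces the match itself,
;; and 'post continues with the rest of the string.
(define dashed
  (regexp-substitute/global #f "[ \t]+" "this   is   the text"
                            'pre "-" 'post))
dashed                              ; ⇒ "this-is-the-text"

;; A function item receives the match structure and returns a string.
(define (double-number m)
  (number->string (* 2 (string->number (match:substring m)))))

(define doubled
  (regexp-substitute/global #f "[0-9]+" "3 apples and 12 pears"
                            'pre double-number 'post))
doubled                             ; ⇒ "6 apples and 24 pears"
```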