========
Patterns
========
.. Modified: 2018-10-23/05:38-0400 btiffin
.. Copyright 2016 Brian Tiffin
.. GPL 3.0+ :ref:`license`
.. This file is part of the Unicon Programming documentation
.. image:: images/unicon.png
:align: center
.. only:: html
:ref:`genindex`
:floatright:`Unicon`
.. index::
pair: pattern; scanning
single: patterns
.. _patterns:
Unicon Pattern data
===================
.. index::
pair: patterns; SNOBOL
.. _snobol patterns:
SNOBOL patterns
---------------
Unicon version 13 alpha has :ref:`SNOBOL`-inspired pattern matching. New
functions and operators were added to emulate the powerful and well-studied
``SNOBOL`` pattern matching features. This augments ``String scanning``
quite nicely. These features introduce a new datatype, ``pattern``.
Details are in Technical Report UTR18a, http://unicon.org/utr/utr18.pdf.
SNOBOL is still relevant to many developers and SNOBOL4 implementations have
been made freely available, thanks in large part to Catspaw Inc.
There is also a very comprehensive tutorial hosted at http://www.snobol4.org.
Chapter 4 of the tutorial is about Pattern Matching.
http://www.snobol4.org/docs/burks/tutorial/ch4.htm
This is a conversion (with some changes to add a test pass and to output
results) of the small program listed in section 4.7 of that page:
.. literalinclude:: examples/snobols.icn
:language: unicon
:start-after: ##+
.. only:: html
.. rst-class:: rightalign
:download:`examples/snobols.icn`
.. program-output:: unicon -s snobols.icn -x
:cwd: examples
:ref:`Clint`, along with Sudarshan Gaikaiwari and John Goettsche, carefully
designed this feature set to correspond almost one-to-one with
``SNOBOL`` patterns. It provides a highly viable path for porting old,
beloved ``SNOBOL`` programs to Unicon.
*Unicon currently lacks the full* ``eval`` *potential of* :ref:`SNOBOL` *but
ameliorates that downside, somewhat, by allowing invocation of functions and
methods along with variable and field references inside patterns.*
.. index:: patterns; internals
Internals
.........
To see a little bit of how the implementation actually works, let's take a
look at the preprocessor output. *The listing below has extra blank lines
squeezed out,* ``cat -s``\ *, and is reformatted,* ``fmt``\ *. This is only
for human curiosity; the listing below is not the version sent to the
compiler.*
.. command-output:: unicon -s -E snobols.icn | cat -s | fmt
:cwd: examples
:shell:
Nice. The SNOBOL operators are actually a new class of functions.
*I talked with Clinton about this, and for now, those functions are for
compiler internal use only. Much smarter, and cleaner, to use the operators.*
.. index:: regular expressions, patterns; regex
.. _regular expressions:
Regular expressions
-------------------
When SNOBOL patterns were added to Unicon, regular expression features were
also added. This means Unicon has the power of :ref:`string scanning`,
:ref:`snobol patterns` and ``regular expressions`` available. And all three
features can be freely mixed in string manipulation expressions. *Raising the
bar*.
Regular expression literals are surrounded by angle brackets, not quotes.
Pattern matching uses the ``??`` operator. As of early Unicon
release 13, regular expressions are limited to ``basic`` regex patterns.
.. literalinclude:: examples/hello-regex.icn
:language: unicon
:start-after: ##+
The program displays a message when the subject includes some form of ``Hello, world``.
In the example, the first and last elements of the string list do not match.
The regular expression looks for Hello with or without a capital H, an
optional comma, any number of spaces or tabs (including zero), followed by
World (or world), with an optional exclamation mark.
.. program-output:: unicon -s hello-regex.icn -x
:cwd: examples
.. index:: patterns; operators
Pattern operators
-----------------
- ``??`` - a variant form of string scanning: ``s ?? p`` matches a pattern, not a
general Unicon expression as with ``?`` scanning. Unanchored.
- ``=p`` - anchored match of pattern, p.
- ``.|`` - a pattern alternation. Accepts Unicon expressions as an operand.
- ``->`` - conditional assignment.
- ``=>`` - immediate assignment (made regardless of whether the overall match
eventually succeeds).
- ``.>`` - cursor position assignment.
- ``<r>`` - a regular expression literal is surrounded by angle brackets
(chevrons).
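
As a rough illustration of how a few of these operators combine, here is a
minimal sketch. The variable names (``pet``, ``found``) and the subject
string are made up for this example. It assumes that string operands of
``.|`` are converted to patterns and that ``->`` can assign to an ordinary
local variable. It has not been checked against any particular Unicon 13
build, so treat it as a starting point rather than a reference.

.. code-block:: unicon

    procedure main()
        local pet, found

        # .| builds a pattern that matches either alternative
        # (assumption: plain strings are accepted as operands)
        pet := "cat" .| "dog"

        # ?? is unanchored, so the pattern may match anywhere in the subject.
        # -> is conditional assignment: found is set only if the match succeeds
        # (assumption: an ordinary local variable is a valid target)
        if "hotdog stand" ?? (pet -> found) then
            write("found: ", found)
        else
            write("no match")
    end

The parentheses around ``pet -> found`` are there to avoid relying on the
relative precedence of ``??`` and ``->``.
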
.. index:: patterns; syntax
Regex syntax
------------
Regular expressions can include the following components:
- ``r`` - an ordinary symbol, which matches itself.
- ``r1 r2`` - juxtaposition is concatenation.
- ``r1 | r2`` - regular expression alternation (not a generator).
- ``r*`` - match zero or more occurrences of r.
- ``r+`` - match one or more occurrences of r.
- ``r?`` - match zero or one occurrences of r.
- ``r{n}`` - braces surround an integer count, match n occurrences of r.
- ``"lit"`` - match the literal string, with the usual escapes allowed.
- ``'lit'`` - cset literal matching any one character of the set, escapes
allowed.
- ``[chars]`` - cset literal with dash range syntax.
- ``.`` - match any character *except newline*.
- ``(r)`` - parentheses are used for grouping.
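
Several of the components above can be combined in a single regular
expression literal. The following is a small sketch only: the variable
names and the sample strings are invented for illustration, and it assumes
the repetition count ``{n}`` can be applied directly to a bracketed cset.
It has not been checked against any particular Unicon 13 build.

.. code-block:: unicon

    procedure main()
        local re, s

        # one or more digits, a literal dash, then a run of four digits
        # (assumption: {4} may follow a [chars] cset)
        re := <[0-9]+"-"[0-9]{4}>

        # ?? is unanchored, so the digits may appear anywhere in the subject
        every s := !["call 555-1234 now", "no digits here"] do
            if s ?? re then write("match: ", s)
            else write("no match: ", s)
    end

The dash is written as a quoted literal, ``"-"``, so that it cannot be
confused with the range syntax used inside ``[chars]``.
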
.. only:: html
..
--------
:ref:`genindex` | Previous: :doc:`strings` | Next: :doc:`objects`
|