Search:

Regular Expressions

yawk's search function looks for regular expressions. If you don't have your regexp documentation at hand, here's an excerpt from gawk's manpage. Formatting mistakes are mine.

-
    Regular Expressions
       Regular expressions are the extended kind found in  egrep.
       They are composed of characters as follows:
       c          matches the non-metacharacter c.
       \c         matches the literal character c.
       .          matches any character including newline.
       ^          matches the beginning of a string.
       $          matches the end of a string.
       [abc...]   character  list,  matches any of the characters
                  abc....
       [^abc...]  negated character list, matches  any  character
                  except abc....
       r1|r2      alternation: matches either r1 or r2.
       r1r2       concatenation: matches r1, and then r2.
       r+         matches one or more r's.
       r*         matches zero or more r's.
       r?         matches zero or one r's.
       (r)        grouping: matches r.

       \y         matches  the  empty string at either the begin-
                  ning or the end of a word.

       \B         matches the empty string within a word.

       \<         matches the empty string at the beginning of  a
                  word.

       \>         matches  the empty string at the end of a word.

       \w         matches any word-constituent character (letter,
                  digit, or underscore).

       \W         matches  any  character  that  is not word-con-
                  stituent.

       \`         matches the empty string at the beginning of  a
                  buffer (string).

       \'         matches  the  empty  string  at  the  end  of a
                  buffer.

       The escape sequences that are valid  in  string  constants
       (see below) are also valid in regular expressions.

       Character  classes  are  a  new  feature introduced in the
       POSIX standard.  A character class is a  special  notation
       for  describing  lists  of characters that have a specific
       attribute, but where the actual characters themselves  can
       vary  from country to country and/or from character set to
       character set.  For example, the  notion  of  what  is  an
       alphabetic character differs in the USA and in France.

       A  character  class  is only valid in a regular expression
       inside  the  brackets  of  a  character  list.   Character
       classes  consist  of [:, a keyword denoting the class, and
       :].  The character classes defined by the  POSIX  standard
       are:

       [:alnum:]  Alphanumeric characters.

       [:alpha:]  Alphabetic characters.

       [:blank:]  Space or tab characters.

       [:cntrl:]  Control characters.

       [:digit:]  Numeric characters.

       [:graph:]  Characters that are both printable and visible.
                  (A space is printable, but not  visible,  while
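                  an a is both.)

       [:lower:]  Lower-case alphabetic characters.

       [:print:]  Printable characters (characters  that  are  not
                  control characters).

       [:punct:]  Punctuation characters (characters that are  not
                  letters, digits, control  characters,  or  space
                  characters).

       [:space:]  Space characters (such as space, tab, and  form
                  feed, to name a few).
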
       [:upper:]  Upper-case alphabetic characters.

       [:xdigit:] Characters that are hexadecimal digits.

       For  example, before the POSIX standard, to match alphanu-
       meric  characters,   you   would   have   had   to   write
       /[A-Za-z0-9]/.  If your character set had other alphabetic
       characters in it, this would not match them, and  if  your
       character  set collated differently from ASCII, this might
       not even match the ASCII  alphanumeric  characters.   With
       the  POSIX character classes, you can write /[[:alnum:]]/,
       and this matches the alphabetic and numeric characters  in
       your character set.

       Two  additional  special sequences can appear in character
       lists.  These apply to non-ASCII character sets, which can
       have  single  symbols (called collating elements) that are
       represented with more than one character, as well as  sev-
       eral  characters  that  are  equivalent  for collating, or
       sorting, purposes.  (E.g., in French, a plain  "e"  and  a
       grave-accented "è" are equivalent.)

       Collating Symbols
              A  collating  symbol is a multi-character collating
              element enclosed in [.  and .].  For example, if ch
              is a collating element, then [[.ch.]]  is a regular
              expression that  matches  this  collating  element,
              while  [ch]  is  a  regular expression that matches
              either c or h.

       Equivalence Classes
              An equivalence class is a locale-specific name  for
              a list of characters that are equivalent.  The name
              is enclosed in [= and =].  For example, the name  e
              might  be  used  to  represent all of "e," "é," and
              "è."  In this case, [[=e=]] is a regular expression
              that matches any of e, é, or è.

       These  features  are very valuable in non-English speaking
       locales.  The library functions that gawk uses for regular
       expression matching currently only recognize POSIX charac-
       ter classes; they do not recognize  collating  symbols  or
       equivalence classes.

       The  \y, \B, \<, \>, \w, \W, \`, and \' operators are
       specific to gawk; they are extensions based on facilities
       in the GNU regular expression libraries.
-
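
To get a feel for the operators in the excerpt, here is a small sketch you can try at a shell prompt. It is only an illustration: the sample lines are made up, and it assumes that yawk's search treats a pattern the same way awk does when the pattern is written between slashes. Only the portable operators are used, so any awk with POSIX regular expression support should behave the same.

```shell
# Hypothetical sample input, one candidate per line.
sample='gawk 3.1
mawk
yawk_search
awk!'

# Grouping plus alternation, anchored at both ends:
# matches lines that are exactly "gawk" or "mawk".
exact=$(printf '%s\n' "$sample" | awk '/^(g|m)awk$/')

# POSIX character class inside a character list:
# matches lines made up only of alphanumerics and underscores.
words=$(printf '%s\n' "$sample" | awk '/^[[:alnum:]_]+$/')

# r? and r+ together: an optional lower-case letter, then "awk",
# a space, and one or more digits.
versioned=$(printf '%s\n' "$sample" | awk '/^[a-z]?awk [0-9]+/')

printf 'exact:\n%s\n\n' "$exact"
printf 'words:\n%s\n\n' "$words"
printf 'versioned:\n%s\n' "$versioned"
```

The \y, \B, \<, \>, \w, \W, \` and \' operators are left out on purpose. As the excerpt says, they are specific to gawk, so patterns using them may not work with other awk implementations.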

< dag | at | awk-scripting.de >