awk(1)                                                               awk(1)


 NAME
      awk - pattern-directed scanning and processing language


 SYNOPSIS
      awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...]


 DESCRIPTION
      awk scans each input file for lines that match any of a set of
      patterns specified literally in program or in one or more files
      specified as -f progfile.  With each pattern there can be an
      associated action that is to be performed when a line in a file
      matches the pattern.  Each line is matched against the pattern portion
      of every pattern-action statement, and the associated action is
      performed for each matched pattern.  The file name - means the
      standard input.  Any file of the form var=value is treated as an
      assignment, not a filename.  Such an assignment takes effect at the
      point where it would have been opened if it were a filename; an
      assignment given with the -v option instead takes effect before the
      BEGIN action.
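      The following sketch shows when an operand assignment takes effect.
      It assumes a POSIX shell, a POSIX-conforming awk on the PATH (not
      necessarily HP-UX awk), and the common but non-POSIX mktemp -d; the
      file names and the variable x are illustrative only.

```shell
# Operand assignments (var=value) take effect when awk reaches them in
# the argument list: BEGIN does not see x, but each file sees its own value.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a"
printf 'two\n' > "$dir/b"
out=$(awk 'BEGIN { print "BEGIN x=[" x "]" }
           { print "x=" x, $0 }' x=1 "$dir/a" x=2 "$dir/b")
rm -rf "$dir"
printf '%s\n' "$out"
```

      BEGIN prints an empty x because the operand x=1 has not yet been
      reached when BEGIN runs.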


      An input line is made up of fields separated by white space, or by
      regular expression FS.  The fields are denoted $1, $2, ...; $0 refers
      to the entire line.
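      A short sketch of field access, assuming a POSIX shell and a
      POSIX-conforming awk; the input line is made up for illustration.
      It also relies on the POSIX rule that assigning to a field rebuilds
      $0 using the output field separator OFS.

```shell
# $1..$NF are the fields; $NF is the last one.
out=$(echo '  alpha   beta gamma' | awk '{
    print NF, $1, $NF        # leading blanks are skipped by the default FS
    $2 = "BETA"              # modifying a field ...
    print                    # ... rebuilds $0 joined by OFS (a single space)
}')
printf '%s\n' "$out"
```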


    Options
      awk recognizes the following options and arguments:


           -F fs          Specify regular expression used to separate
                          fields.  The default is to recognize space and tab
                          characters, and to discard leading spaces and
                          tabs.  If the -F option is used, leading input
                          field separators are no longer discarded.


           -f progfile    Specify an awk program file.  Up to 100 program
                          files can be specified.  The pattern-action
                          statements in these files are executed in the same
                          order as the files were specified.


           -v var=value   Cause var=value assignment to occur before the
                          BEGIN action (if it exists) is executed.
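      The three options can be tried with a sketch like the following.  It
      assumes a POSIX shell, a POSIX-conforming awk, and mktemp -d; the
      program file names p1.awk and p2.awk and the variables n and word
      are hypothetical.

```shell
dir=$(mktemp -d)

# -F: with the default FS leading blanks are discarded; with -F a leading
# separator is significant and yields an empty $1.  fs is a regular expression.
a=$(echo '  x y' | awk '{ print NF ":" $1 }')
b=$(echo ':x:y' | awk -F: '{ print NF ":" $1 ":" $2 }')
c=$(echo 'k1=v1;k2=v2' | awk -F '[=;]' '{ print $4 }')

# -f: several program files; their statements run in command-line order.
printf '{ print "first:", $0 }\n'  > "$dir/p1.awk"
printf '{ print "second:", $0 }\n' > "$dir/p2.awk"
d=$(echo hi | awk -f "$dir/p1.awk" -f "$dir/p2.awk")

# -v: the assignment is made before BEGIN, so BEGIN can use it.
e=$(awk -v n=3 -v word=ab 'BEGIN { for (i = 0; i < n; i++) s = s word; print s, n * 2 }')

rm -rf "$dir"
printf '%s\n' "$a" "$b" "$c" "$d" "$e"
```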


    Statements
      A pattern-action statement has the form:


           pattern { action }


      A missing { action } means print the line; a missing pattern always
      matches.  Pattern-action statements are separated by new-lines or
      semicolons.
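      A minimal sketch of a missing action and a missing pattern, assuming
      a POSIX shell and a POSIX-conforming awk; the fruit names are sample
      input only.

```shell
# "/an/" has no action, so matching lines are printed;
# "{ n++ }" has no pattern, so it runs for every line.
out=$(printf 'apple\nbanana\ncherry\n' | awk '/an/; { n++ }; END { print n " lines" }')
printf '%s\n' "$out"
```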


      An action is a sequence of statements.  A statement can be one of the
      following:






 Hewlett-Packard Company            - 1 -    HP-UX Release 10.20:  July 1996

           if(expression) statement [else statement]
           while(expression) statement
           for(expression;expression;expression) statement
           for(var in array) statement
           do statement while(expression)
           break
           continue
           {[statement ...]}
           expression                   # commonly var=expression
           print [expression-list] [> expression]
           printf format [, expression-list] [> expression]
           return [expression]
           next           # skip remaining patterns on this input line.
           delete array[expression]           # delete an array element.
           exit [expression]      # exit immediately; status is expression.
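      The sketch below exercises several of these statements in one
      program.  It assumes a POSIX shell and a POSIX-conforming awk; the
      array name seen and the other variables are illustrative.

```shell
out=$(printf '1\n2\n3\n4\n5\n' | awk '
$1 == 2 { next }                        # skip the remaining patterns for this line
{ if ($1 % 2) kind = "odd"; else kind = "even"; seen[$1] = kind }
$1 == 4 { exit }                        # stop reading input; END still runs
END {
    for (k in seen) n++                 # count the array elements (1, 3, 4)
    delete seen[1]                      # remove one element
    do { i++ } while (i < 3)
    for (j = 1; j <= 10; j++) {
        if (j == 2) continue
        if (j > 3) break
        last = j
    }
    has1 = (1 in seen)
    print n, has1, i, last
}')
printf '%s\n' "$out"
```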


      Statements are terminated by semicolons, newlines or right braces.  An
      empty expression-list stands for $0.  String constants are quoted
      (""), with the usual C escapes recognized within.  Expressions take on
      string or numeric values as appropriate, and are built using the
      operators +, -, *, /, %, ^ (exponentiation), and concatenation
      (indicated by a blank).  The operators ++, --, +=, -=, *=, /=, %=, ^=,
      **=, >, >=, <, <=, ==, !=, and ?: are also available in expressions.
      Variables can be scalars, array elements (denoted x[i]) or fields.
      Variables are initialized to the null string, which has the numeric
      value zero.  Array subscripts can be any string, not necessarily
      numeric (this allows for a form of associative memory).  Multiple
      subscripts such as [i,j,k] are permitted.  The constituents are
      concatenated, separated by the value of SUBSEP.
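      A sketch of expressions and arrays, assuming a POSIX shell and a
      POSIX-conforming awk; the array names count, grid, and part are
      hypothetical.  Note that concatenation has lower precedence than the
      arithmetic operators, so "n=" 1 + 2 yields n=3.

```shell
out=$(printf 'a b a\nb a\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }    # any string can be a subscript
END {
    print "a=" count["a"], "b=" count["b"]
    print 2^10, 7 % 3, "n=" 1 + 2            # ^ is exponentiation; concatenation binds looser than +
    grid[1, 2] = "x"                          # subscript is "1" SUBSEP "2"
    for (k in grid) { split(k, part, SUBSEP); print part[1], part[2] }
    if ((1, 2) in grid) print "found"
    z = (u == "") (u == 0)                    # unset u compares equal to "" and to 0
    print z
}')
printf '%s\n' "$out"
```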


      The print statement prints its arguments on the standard output (or on
      a file if >file or >>file is present or on a pipe if |cmd is present),
      separated by the current output field separator, and terminated by the
      output record separator.  file and cmd can be literal names or
      parenthesized expressions.  Identical string values in different
      statements denote the same open file.  The printf statement formats
      its expression list according to the format (see printf(3S)).
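      A minimal sketch of output redirection, assuming a POSIX shell, a
      POSIX-conforming awk, and mktemp -d; the file name out.txt is
      illustrative.  Because both print statements use the same string
      value f, the second one writes to the already open file.

```shell
dir=$(mktemp -d)
awk -v f="$dir/out.txt" 'BEGIN {
    OFS = "-"
    print "a", "b" > f              # first ">" to f creates/truncates the file
    print "c", "d" > f              # same string value, same open file: appended
    printf "%5.2f\n", 3.14159 > f   # printf-style formatting
}'
out=$(cat "$dir/out.txt")
rm -rf "$dir"
printf '%s\n' "$out"
```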


    Built-In Functions
      The built-in function close(expr) closes the file or pipe expr opened
      by a print or printf statement or a call to getline with the same
      string-valued expr.  This function returns zero if successful;
      otherwise, it returns non-zero.
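      One common use of close is rereading a file that the same program
      has just written.  This sketch assumes a POSIX shell, a
      POSIX-conforming awk, and mktemp -d; the file name tmp.txt is
      illustrative.

```shell
dir=$(mktemp -d)
out=$(awk -v f="$dir/tmp.txt" 'BEGIN {
    print "x" > f
    print "y" > f
    r = close(f)                          # flush and close the output file
    while ((getline line < f) > 0) n++    # the same name now reopens it for reading
    close(f)
    print r, n
}')
rm -rf "$dir"
printf '%s\n' "$out"
```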


      The customary functions exp, log, sqrt, sin, cos, atan2 are built in.
      Other built-in functions are:


         blength[([s])]    Length of its associated argument (in bytes)
                           taken as a string, or of $0 if no argument.


         length[([s])]     Length of its associated argument (in characters)
                           taken as a string, or of $0 if no argument.


         rand()            Returns a random number n such that 0 <= n < 1.


         srand([expr])     Sets the seed value for rand, and returns the
                           previous seed value.  If no argument is given,
                           the time of day is used as the seed value;
                           otherwise, expr is used.


         int(x)            Truncates x toward zero to an integer value.


         substr(s, m[, n]) Return the at most n-character substring of s
                           that begins at position m, numbering from 1.  If
                           n is omitted, the substring is limited by the
                           length of string s.


         index(s, t)       Return the position, in characters, numbering
                           from 1, in string s where string t first occurs,
                           or zero if it does not occur at all.


         match(s, ere)     Return the position, in characters, numbering
                           from 1, in string s where the extended regular
                           expression ere occurs, or 0 if it does not.  The
                           variables RSTART and RLENGTH are set to the
                           position and length of the matched string.


         split(s, a[, fs]) Splits the string s into array elements a[1],
                           a[2], ..., a[n], and returns n.  The separation
                           is done with the regular expression fs, or with
                           the field separator FS if fs is not given.


         sub(ere, repl [, in])
                           Substitutes repl for the first occurrence of the
                           extended regular expression ere in the string in.
                           If in is not given, $0 is used.


         gsub(ere, repl [, in])
                           Same as sub except that all occurrences of the
                           regular expression are replaced; sub and gsub
                           return the number of replacements.


         sprintf(fmt, expr, ...)
                           String resulting from formatting expr ...
                           according to the printf(3S) format fmt.


         system(cmd)       Executes cmd and returns its exit status.


         toupper(s)        Converts the argument string s to uppercase and
                           returns the result.


         tolower(s)        Converts the argument string s to lowercase and
                           returns the result.
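      A sketch exercising most of these functions, assuming a POSIX shell
      and a POSIX-conforming awk; blength is HP-UX specific and is left
      out.  The expected output of each line is shown as a comment.

```shell
out=$(awk 'BEGIN {
    s = "Hello, World"
    print length(s), substr(s, 8), index(s, "o")   # 12 World 5
    if (match(s, /W[a-z]+/)) print RSTART, RLENGTH  # 8 5
    n = split("a:b:c", part, ":"); print n, part[3] # 3 c
    t = "aaa"; k = gsub(/a/, "b", t); print k, t    # 3 bbb
    u = "aaa"; k = sub(/a/, "b", u); print k, u     # 1 baa
    print toupper("abc") tolower("DEF")             # ABCdef
    print sprintf("%03d|%-4s|", 7, "x")             # 007|x   |
    print int(3.9), int(-3.9)                       # 3 -3
    srand(1); a = rand(); srand(1); b = rand()      # same seed, same sequence
    ok = (a == b && a >= 0 && a < 1); print ok      # 1
}')
printf '%s\n' "$out"
```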


      The built-in function getline sets $0 to the next input record from
      the current input file; getline < file sets $0 to the next record from
      file.  getline x sets variable x instead.  Finally, cmd | getline
      pipes the output of cmd into getline; each call of getline returns the
      next line of output from cmd.  In all cases, getline returns 1 for a
      successful input, 0 for end of file, and -1 for an error.
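      The getline forms can be combined as in the sketch below.  It
      assumes a POSIX shell, a POSIX-conforming awk, and mktemp -d; the
      file name data and the variables nxt, line, and p are hypothetical.
      Under POSIX, getline var also advances NR.

```shell
dir=$(mktemp -d)
printf 'first\nsecond\n' > "$dir/data"
out=$(printf 'l1\nl2\nl3\n' | awk -v f="$dir/data" '
NR == 1 {
    getline nxt                    # next input line into nxt; NR advances, $0 is unchanged
    print $0, nxt, NR
    while ((getline line < f) > 0) # read another file; 1 = record, 0 = EOF, -1 = error
        print "file:", line
    close(f)
    "echo piped" | getline p       # first line of the command output
    close("echo piped")
    print p
}
END { print "records:", NR }')
rm -rf "$dir"
printf '%s\n' "$out"
```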


    Patterns
      Patterns are arbitrary Boolean combinations (with ! || &&) of regular
      expressions and relational expressions.  awk supports Extended Regular
      Expressions as described in regexp(5).  Isolated regular expressions
      in a pattern apply to the entire line.  Regular expressions can also
      occur in relational expressions, using the operators ~ and !~.  /re/
      is a constant regular expression; any string (constant or variable)
      can be used as a regular expression, except in the position of an
      isolated regular expression in a pattern.
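      A sketch of regular-expression patterns, assuming a POSIX shell and
      a POSIX-conforming awk; the variable re and the sample input are
      illustrative.  The first pattern uses a string variable as a regular
      expression on the right of ~.

```shell
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk -v re='^b' '
$1 ~ re              { print "dynamic:", $1 }   # a string variable used as an ERE
/rr/ || $2 > 10      { print "either:", $1 }    # Boolean combination of patterns
$1 !~ /a/            { print "no a:", $1 }')
printf '%s\n' "$out"
```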


      A pattern can consist of two patterns separated by a comma; in this
      case, the action is performed for all lines from an occurrence of the
      first pattern through an occurrence of the second.
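      A range pattern can be tried with the sketch below, assuming a POSIX
      shell and a POSIX-conforming awk; the input is made up.  The second
      range is never closed by /stop/, so it extends to the end of input.

```shell
out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/, /stop/')
printf '%s\n' "$out"
```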


      A relational expression is one of the following:


           expression matchop regular-expression
           expression relop expression
           expression in array-name
           (expr,expr,...) in array-name


      where a relop is any of the six relational operators in C, and a
      matchop is either ~ (matches) or !~ (does not match).  A conditional
      is an arithmetic expression, a relational expression, or a Boolean
      combination of the two.
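      Two of these relational forms are sketched below, assuming a POSIX
      shell and a POSIX-conforming awk; the array name seen is
      hypothetical.  Fields that look numeric compare numerically, while
      string constants compare as strings.

```shell
# "expression in array-name" as a pattern: print each line the first time only.
a=$(printf 'x\ny\nx\nz\ny\n' | awk '!($0 in seen) { seen[$0] = 1; print }')
# relop: numeric-looking fields compare as numbers, string constants as strings.
b=$(echo '10 9' | awk '{ num = ($1 > $2); str = ("10" > "9"); print num, str }')
printf '%s\n%s\n' "$a" "$b"
```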


      The special patterns BEGIN and END can be used to capture control
      before the first input line is read and after the last.  BEGIN and END
      do not combine with other patterns.
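      A minimal sketch of BEGIN and END, assuming a POSIX shell and a
      POSIX-conforming awk; it shows NR before any input is read and after
      the last record.

```shell
out=$(printf 'a\nb\n' | awk 'BEGIN { print "before: NR=" NR } END { print "after: NR=" NR }')
printf '%s\n' "$out"
```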


    Special Characters
      The following special escape sequences are recognized by awk in both
      regular expressions and strings:


           Escape    Meaning
             \a      alert character
             \b      backspace character
             \f      form-feed character
             \n      new-line character
             \r      carriage-return character
             \t      tab character
             \v      vertical-tab character
             \nnn    1- to 3-digit octal value nnn
             \xhhh   1- to n-digit hexadecimal number
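      A sketch of escape sequences in strings and in a regular expression,
      assuming a POSIX shell and a POSIX-conforming awk.  The hexadecimal
      form is left out because its handling varies between awk
      implementations.

```shell
out=$(awk 'BEGIN {
    printf "a\tb\n"            # \t is a tab, \n a newline
    print "\101\102"           # octal 101 and 102 are A and B
    m = ("x\ty" ~ /\t/)        # the same escape inside a regular expression
    print m
}')
printf '%s\n' "$out"
```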


    Variable Names
      Variable names with special meanings are:


           FS                Input field separator regular expression; a
                             space character by default; also settable by
                             option -Ffs.


           NF                The number of fields in the current record.


           NR                The ordinal number of the current record from
                             the start of input. Inside a BEGIN action the
                             value is zero. Inside an END action the value
                             is the number of the last record processed.


           FNR               The ordinal number of the current record in the
                             current file. Inside a BEGIN action the value
                             is zero. Inside an END action the value is the
                             number of the last record processed in the last
                             file processed.


           FILENAME          A pathname of the current input file.


           RS                The input record separator; a newline character
                             by default.


           OFS               The print statement output field separator; a
                             space character by default.


           ORS               The print statement output record separator; a
                             newline character by default.


           OFMT              Output format for numbers (default %.6g).  If
                             the value of OFMT is not a floating-point
                             format specification, the results are
                             unspecified.


           CONVFMT           Internal conversion format for numbers (default
                             %.6g).  If the value of CONVFMT is not a
                             floating-point format specification, the
                             results are unspecified.


           SUBSEP            The subscript separator string for multi-
                             dimensional arrays; the default value is "\034".


           ARGC              The number of elements in the ARGV array.


           ARGV              An array of command line arguments, excluding
                             options and the program argument, numbered from
                             zero to ARGC-1.


                             The arguments in ARGV can be modified or added
                             to; ARGC can be altered. As each input file
                             ends, awk will treat the next non-null element
                             of ARGV, up to the current value of ARGC-1,
                             inclusive, as the name of the next input file.
                             Thus, setting an element of ARGV to null means
                             that it will not be treated as an input file.
                             The name - indicates the standard input. If an
                             argument matches the format of an assignment
                             operand, this argument will be treated as an
                             assignment rather than a file argument.


           ENVIRON           Array of environment variables; subscripts are
                             names.  For example, if environment variable
                             V=thing, ENVIRON["V"] produces thing.


           RSTART            The starting position of the string matched by
                             the match function, numbering from 1. This is
                             always equivalent to the return value of the
                             match function.


           RLENGTH           The length of the string matched by the match
                             function.
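      Several of these variables are shown together in the sketch below.
      It assumes a POSIX shell, a POSIX-conforming awk, and mktemp -d; the
      file names f1 and f2 and the environment variable V are illustrative.
      FILENAME is trimmed to its last component because the temporary
      directory name varies.

```shell
dir=$(mktemp -d)
printf 'a b\nc\n' > "$dir/f1"
printf 'd e f\n' > "$dir/f2"
out=$(V=thing awk '
BEGIN    { OFS = ","; print ARGC, ENVIRON["V"] }
FNR == 1 { name = FILENAME; sub(/.*\//, "", name); print "file", name }
         { print FNR, NR, NF }' "$dir/f1" "$dir/f2")
rm -rf "$dir"
printf '%s\n' "$out"
```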


      Functions can be defined (at the position of a pattern-action
      statement) as follows:


           function foo(a, b, c) { ...; return x }


      Parameters are passed by value if scalar, and by reference if array
      name.  Functions can be called recursively.  Parameters are local to
      the function; all other variables are global.
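      A sketch of user-defined functions, assuming a POSIX shell and a
      POSIX-conforming awk; fact, bump, and sum are hypothetical names.
      The extra parameters i and s in sum are never passed by the caller,
      which is the usual way to obtain local variables.

```shell
out=$(awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursion
function bump(x, arr) { x++; arr["k"]++ }                  # scalar by value, array by reference
function sum(a, n,    i, s) {                              # i, s: extra params used as locals
    for (i = 1; i <= n; i++) s += a[i]
    return s
}
BEGIN {
    print fact(5)
    v = 1; t["k"] = 1; bump(v, t); print v, t["k"]
    split("4 5 6", nums, " "); print sum(nums, 3)
}')
printf '%s\n' "$out"
```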


      Note that if pattern-action statements are used in an HP-UX command
      line as an argument to the awk command, the pattern-action statement
      must be enclosed in single quotes to protect it from the shell.  For
      example, to print lines longer than 72 characters, the pattern-action
      statement as used in a script (-f progfile command form) is:


           length > 72


      The same pattern-action statement used as an argument to the awk
      command is quoted in this manner:


           awk 'length > 72'
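      The two forms can be checked against each other with a sketch like
      the following, assuming a POSIX shell, a POSIX-conforming awk, and
      mktemp -d; the program file name long.awk is hypothetical.

```shell
dir=$(mktemp -d)
printf 'length > 72\n' > "$dir/long.awk"                 # program file: no shell quoting needed
long=$(printf '%080d' 0)                                  # an 80-character line
a=$(printf 'short\n%s\n' "$long" | awk -f "$dir/long.awk")
b=$(printf 'short\n%s\n' "$long" | awk 'length > 72')    # quoted so > is not a shell redirection
rm -rf "$dir"
printf '%s\n' "$a"
```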


 EXTERNAL INFLUENCES
    Environment Variables
      LANG           Provides a default value for the internationalization
                     variables that are unset or null.  If LANG is unset or
                     null, the default value of "C" (see lang(5)) is used.
                     If any of the internationalization variables contains
                     an invalid setting, awk will behave as if all
                     internationalization variables are set to "C".  See
                     environ(5).


      LC_ALL         If set to a non-empty string value, overrides the
                     values of all the other internationalization variables.


      LC_CTYPE       Determines the interpretation of text as single and/or
                     multi-byte characters, the classification of characters
                     as printable, and the characters matched by character
                     class expressions in regular expressions.


      LC_NUMERIC     Determines the radix character used when interpreting
                     numeric input, performing conversion between numeric
                     and string values and formatting numeric output.
                     Regardless of locale, the period character (the
                     decimal-point character of the POSIX locale) is the
                     decimal-point character recognized in processing awk
                     programs (including assignments in command-line
                     arguments).


      LC_COLLATE     Determines the locale for the behavior of ranges,
                     equivalence classes and multi-character collating
                     elements within regular expressions.


      LC_MESSAGES    Determines the locale that should be used to affect the
                     format and contents of diagnostic messages written to
                     standard error and informative messages written to
                     standard output.


      NLSPATH        Determines the location of message catalogues for the
                     processing of LC_MESSAGES.


      PATH           Determines the search path when looking for commands
                     executed by system(cmd), or input and output pipes.


      In addition, all environment variables will be visible via the awk
      variable ENVIRON.


    International Code Set Support
      Single- and multi-byte character code sets are supported except that
      variable names must contain only ASCII characters and regular
      expressions must contain only valid characters.


 DIAGNOSTICS
      awk supports up to 199 fields ($1, $2, ..., $199) per record.


 EXAMPLES
      Print lines longer than 72 characters:


           length > 72


      Print first two fields in opposite order:


           { print $2, $1 }


      Same, with input fields separated by comma and/or blanks and tabs:


           BEGIN { FS = ",[ \t]*|[ \t]+" }
                 { print $2, $1 }


      Add up first column, print sum and average:


                   { s += $1 }
           END     { print "sum is", s, "average is", s/NR }


      Print all lines between start/stop pairs:


           /start/, /stop/


      Simulate echo command (see echo(1)):


           BEGIN   {                             # Simulate echo(1)
                   for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                   printf "\n"
                   exit }


 AUTHOR
      awk was developed by AT&T, IBM, OSF, and HP.


 SEE ALSO
      lex(1), sed(1).
      A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming
      Language, Addison-Wesley, 1988.


 STANDARDS CONFORMANCE
      awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2