/*************************************************************************\
* Copyright (c) 2002 The University of Chicago, as Operator of Argonne
*     National Laboratory.
* Copyright (c) 2002 The Regents of the University of California, as
*     Operator of Los Alamos National Laboratory.
* EPICS BASE Versions 3.13.7
* and higher are distributed subject to a Software License Agreement found
* in file LICENSE that is included with this distribution.
\*************************************************************************/


NAME

     flex - fast lexical analyzer generator


SYNOPSIS

     flex [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] [filename ...]


DESCRIPTION

     flex is a  tool  for  generating  scanners:  programs  which
     recognize  lexical  patterns  in text.  flex reads the given
     input files, or its standard input  if  no  file  names  are
     given,  for  a  description  of  a scanner to generate.  The
     description is in the form of pairs of  regular  expressions
     and  C  code,  called  rules.  flex  generates as output a C
     source file, lex.yy.c, which defines a routine yylex(). This
     file is compiled and linked with the -lfl library to produce
     an executable.  When the executable is run, it analyzes  its
     input  for occurrences of the regular expressions.  Whenever
     it finds one, it executes the corresponding C code.

     For full documentation, see flexdoc(1). This manual entry is
     intended for use as a quick reference.


OPTIONS

     flex has the following options:

     -b   Generate  backtracking  information  to  lex.backtrack.
          This  is  a  list of scanner states which require back-
          tracking and the input characters on which they do  so.
          By adding rules one can remove backtracking states.  If
          all backtracking states are eliminated and -f or -F  is
          used, the generated scanner will run faster.

     -c   is a do-nothing, deprecated option included  for  POSIX
          compliance.

          NOTE: in previous releases of flex -c specified  table-
          compression  options.   This functionality is now given
          by the -C flag.  To ease the impact  of  this  change,
          when  flex encounters -c, it currently issues a warning
          message and assumes that -C was  desired  instead.   In
          the future this "promotion" of -c to -C will go away in
          the name of full POSIX  compliance  (unless  the  POSIX
          meaning is removed first).

     -d   makes the generated scanner run in debug  mode.   When-
          ever   a   pattern   is   recognized   and  the  global
          yy_flex_debug is non-zero (which is the  default),  the
          scanner will write to stderr a line of the form:

              --accepting rule at line 53 ("the matched text")

          The line number refers to the location of the  rule  in
          the  file defining the scanner (i.e., the file that was
          fed to flex).  Messages are  also  generated  when  the
          scanner  backtracks,  accepts the default rule, reaches
          the end of its input buffer (or encounters a  NUL;  the
          two  look  the same as far as the scanner's concerned),
          or reaches an end-of-file.

     -f   specifies (take your pick) full table or fast  scanner.
          No  table compression is done.  The result is large but
          fast.  This option is equivalent to -Cf (see below).

     -i   instructs flex to generate a case-insensitive  scanner.
          The  case  of  letters given in the flex input patterns
          will be ignored,  and  tokens  in  the  input  will  be
          matched  regardless of case.  The matched text given in
          yytext will have the preserved case (i.e., it will  not
          be folded).

     -n   is another do-nothing, deprecated option included  only
          for POSIX compliance.

     -p   generates a performance report to stderr.   The  report
          consists  of  comments  regarding  features of the flex
          input file which will cause a loss  of  performance  in
          the resulting scanner.

     -s   causes the default rule (that unmatched  scanner  input
          is  echoed to stdout) to be suppressed.  If the scanner
          encounters input that does not match any of its  rules,
          it aborts with an error.

     -t   instructs flex to write the  scanner  it  generates  to
          standard output instead of lex.yy.c.

     -v   specifies that flex should write to stderr a summary of
          statistics regarding the scanner it generates.

     -F   specifies that the fast  scanner  table  representation
          should  be  used.  This representation is about as fast
          as the full table representation  (-f),  and  for  some
          sets  of patterns will be considerably smaller (and for
          others, larger).  See flexdoc(1) for details.

          This option is equivalent to -CF (see below).

     -I   instructs flex to generate an interactive scanner, that
          is, a scanner which stops immediately rather than look-
          ing ahead if it knows that the currently  scanned  text
          cannot  be  part  of a longer rule's match.  Again, see
          flexdoc(1) for details.

          Note, -I cannot be used in  conjunction  with  full  or
          fast tables, i.e., the -f, -F, -Cf, or -CF flags.

     -L   instructs flex not  to  generate  #line  directives  in
          lex.yy.c. The default is to generate such directives so
          error messages in the actions will be correctly located
          with  respect  to the original flex input file, and not
          to the fairly meaningless line numbers of lex.yy.c.

     -T   makes flex run in trace mode.  It will generate  a  lot
          of  messages to stdout concerning the form of the input
          and the resultant non-deterministic  and  deterministic
          finite  automata.   This  option  is  mostly for use in
          maintaining flex.

     -8   instructs flex to generate an 8-bit scanner.   On  some
          sites,  this is the default.  On others, the default is
          7-bit characters.  To see which is the case, check  the
          verbose  (-v) output for "equivalence classes created".
          If the denominator of the number shown is 128, then  by
          default  flex is generating 7-bit characters.  If it is
          256, then the default is 8-bit characters.

     -C[efmF]
          controls the degree of table compression.

          -Ce directs  flex  to  construct  equivalence  classes,
          i.e.,  sets  of characters which have identical lexical
          properties.  Equivalence classes usually give  dramatic
          reductions  in the final table/object file sizes (typi-
          cally  a  factor  of  2-5)   and   are   pretty   cheap
          performance-wise   (one  array  look-up  per  character
          scanned).

          -Cf specifies that the full scanner  tables  should  be
          generated;  flex  should  not  compress  the  tables by
          taking advantage of similar  transition  functions  for
          different states.

          -CF specifies that the alternate fast scanner represen-
          tation (described in flexdoc(1)) should be used.

          -Cm directs flex to construct meta-equivalence classes,
          which  are  sets of equivalence classes (or characters,
          if equivalence classes are not  being  used)  that  are
          commonly  used  together.  Meta-equivalence classes are
          often a big win when using compressed tables, but  they
          have  a  moderate  performance  impact (one or two "if"
          tests and one array look-up per character scanned).

          A lone -C specifies that the scanner tables  should  be
          compressed  but  neither  equivalence classes nor meta-
          equivalence classes should be used.

          The options -Cf or  -CF  and  -Cm  do  not  make  sense
          together - there is no opportunity for meta-equivalence
          classes if the table is not being  compressed.   Other-
          wise the options may be freely mixed.

          The default setting is -Cem, which specifies that  flex
          should   generate   equivalence   classes   and   meta-
          equivalence classes.  This setting provides the highest
          degree  of  table  compression.   You  can  obtain
          faster-executing scanners at the cost of larger  tables
          with the following generally being true:

              slowest & smallest
                    -Cem
                    -Cm
                    -Ce
                    -C
                    -C{f,F}e
                    -C{f,F}
              fastest & largest


          -C options are not cumulative;  whenever  the  flag  is
          encountered, the previous -C settings are forgotten.

     -Sskeleton_file
          overrides the default skeleton  file  from  which  flex
          constructs its scanners.  You'll never need this option
          unless you are doing flex maintenance or development.
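
     The cost quoted for -Ce above (one array look-up per  char-
     acter scanned) can be pictured with a small hand-built table.
     The following is only an illustrative sketch in C and is not
     what flex actually generates: the names ec_of, next_state,
     init_ec and count_digit_runs are hypothetical, and the
     two-state automaton merely counts runs of digits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of equivalence classes; not flex's real tables. */

enum { EC_OTHER = 0, EC_DIGIT = 1, NUM_EC = 2 };
enum { ST_OUT = 0, ST_IN_NUMBER = 1, NUM_STATES = 2 };

/* Maps each of the 256 byte values to a class.  All ten digits behave
 * identically in this automaton, so they share one class. */
static unsigned char ec_of[256];

void init_ec(void)
{
    int c;

    for (c = 0; c < 256; c++)
        ec_of[c] = EC_OTHER;
    for (c = '0'; c <= '9'; c++)
        ec_of[c] = EC_DIGIT;
}

/* Transition table indexed by [state][class]: 2 x 2 entries rather
 * than the 2 x 256 a table indexed directly by character would need. */
static const unsigned char next_state[NUM_STATES][NUM_EC] = {
    /* ST_OUT       */ { ST_OUT, ST_IN_NUMBER },
    /* ST_IN_NUMBER */ { ST_OUT, ST_IN_NUMBER },
};

/* Counts the maximal runs of digits in text[0..len). */
int count_digit_runs(const char *text, size_t len)
{
    int state = ST_OUT;
    int runs = 0;
    size_t i;

    assert(text != NULL);
    for (i = 0; i < len; i++) {
        /* The one extra array look-up per character scanned. */
        int cls = ec_of[(unsigned char) text[i]];
        int next = next_state[state][cls];

        if (state == ST_OUT && next == ST_IN_NUMBER)
            runs++;
        state = next;
    }
    return runs;
}
```

     Call init_ec() once before count_digit_runs().  The point of
     the sketch is the size trade: the transition table shrinks
     from one column per character to one column per class, in
     exchange for the ec_of[] look-up on every character.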


SUMMARY OF FLEX REGULAR EXPRESSIONS

     The patterns in the input are written using an extended  set
     of regular expressions.  These are:

         x          match the character 'x'
         .          any character except newline
         [xyz]      a "character class"; in this case, the pattern
                      matches either an 'x', a 'y', or a 'z'
         [abj-oZ]   a "character class" with a range in it; matches
                      an 'a', a 'b', any letter from 'j' through 'o',
                      or a 'Z'
         [^A-Z]     a "negated character class", i.e., any character
                      but those in the class.  In this case, any
                      character EXCEPT an uppercase letter.
         [^A-Z\n]   any character EXCEPT an uppercase letter or
                      a newline
         r*         zero or more r's, where r is any regular expression
         r+         one or more r's
         r?         zero or one r's (that is, "an optional r")
         r{2,5}     anywhere from two to five r's
         r{2,}      two or more r's
         r{4}       exactly 4 r's
         {name}     the expansion of the "name" definition
                    (see flexdoc(1))
         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \X.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)


         rs         the regular expression r followed by the
                      regular expression s; called "concatenation"


         r|s        either an r or an s


         r/s        an r but only if it is followed by an s.  The
                      s is not part of the matched text.  This type
                      of pattern is called "trailing context".
         ^r         an r, but only at the beginning of a line
         r$         an r, but only at the end of a line.  Equivalent
                      to "r/\n".


         <s>r       an r, but only in start condition s (see
                    flexdoc(1) for discussion of start conditions)
         <s1,s2,s3>r
                    same, but in any of start conditions s1,
                    s2, or s3


         <<EOF>>    an end-of-file
         <s1,s2><<EOF>>
                    an end-of-file when in start condition s1 or s2

     The regular expressions listed above are  grouped  according
     to  precedence, from highest precedence at the top to lowest
     at the bottom.   Those  grouped  together  have  equal  pre-
     cedence.
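
     The escape rules in the table (\X, \123, \x2a) can be  made
     concrete with a small decoder.  This is a sketch in C, not
     flex's own code: decode_escape is a hypothetical helper, and
     it assumes that an octal escape takes at most three digits
     and a hexadecimal escape at most two, which this manual does
     not state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decodes the text that follows a backslash in a
 * pattern and returns the character value it stands for.
 * Assumption (not from the manual): at most 3 octal or 2 hex digits. */
int decode_escape(const char *s)
{
    int value = 0;
    int digits = 0;

    assert(s != NULL && s[0] != '\0');

    /* The letters with an ANSI-C interpretation. */
    switch (s[0]) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    default:  break;
    }

    /* \123 style: octal digits. */
    if (s[0] >= '0' && s[0] <= '7') {
        while (digits < 3 && s[digits] >= '0' && s[digits] <= '7') {
            value = value * 8 + (s[digits] - '0');
            digits++;
        }
        return value;
    }

    /* \x2a style: 'x' followed by hexadecimal digits. */
    if (s[0] == 'x' && isxdigit((unsigned char) s[1])) {
        for (digits = 1;
             digits <= 2 && isxdigit((unsigned char) s[digits]);
             digits++) {
            int c = tolower((unsigned char) s[digits]);

            value = value * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        return value;
    }

    /* Anything else is a literal, e.g. \* for a plain '*'. */
    return (unsigned char) s[0];
}
```

     For instance, decode_escape("123") yields octal 123 and
     decode_escape("*") yields a plain '*', matching the two
     halves of the \X entry in the table.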

     Some notes on patterns:

     -    Negated character classes match  newlines  unless  "\n"
          (or  an equivalent escape sequence) is one of the char-
          acters explicitly  present  in  the  negated  character
          class (e.g., "[^A-Z\n]").

     -    A rule can have at most one instance of  trailing  con-
          text (the '/' operator or the '$' operator).  The start
          condition, '^', and "<<EOF>>" patterns can  only  occur
          at the beginning of a pattern, and, as well as with '/'
          and '$', cannot be  grouped  inside  parentheses.   The
          following are all illegal:

              foo/bar$
              foo|(bar$)
              foo|^bar
              <sc1>foo<sc2>bar



SUMMARY OF SPECIAL ACTIONS

     In addition to arbitrary C code, the following can appear in
     actions:

     -    ECHO copies yytext to the scanner's output.

     -    BEGIN followed by the name of a start condition  places
          the scanner in the corresponding start condition.

     -    REJECT directs the scanner to proceed on to the "second
          best"  rule which matched the input (or a prefix of the
          input).  yytext and yyleng are  set  up  appropriately.
          Note that REJECT is a particularly expensive feature in
          terms of scanner performance; if it is used in any of the
          scanner's   actions  it  will  slow  down  all  of  the
          scanner's matching.  Furthermore, REJECT cannot be used
          with the -f or -F options.

          Note also that unlike the other special actions, REJECT
          is  a  branch;  code  immediately  following  it in the
          action will not be executed.

     -    yymore() tells  the  scanner  that  the  next  time  it
          matches  a  rule,  the  corresponding  token  should be
          appended onto the current value of yytext  rather  than
          replacing it.

     -    yyless(n) returns all but the first n characters of the
          current token back to the input stream, where they will
          be rescanned when the scanner looks for the next match.
          yytext  and  yyleng  are  adjusted appropriately (e.g.,
          yyleng will now be equal to n).

     -    unput(c) puts the  character  c  back  onto  the  input
          stream.  It will be the next character scanned.

     -    input() reads the next character from the input  stream
          (this  routine  is  called  yyinput() if the scanner is
          compiled using C++).

     -    yyterminate() can be used in lieu of a return statement
          in  an action.  It terminates the scanner and returns a
          0 to the scanner's caller, indicating "all done".

          By default, yyterminate() is also called when  an  end-
          of-file is encountered.  It is a macro and may be rede-
          fined.

     -    YY_NEW_FILE is an  action  available  only  in  <<EOF>>
          rules.   It  means "Okay, I've set up a new input file,
          continue scanning".

     -    yy_create_buffer( file, size ) takes a FILE pointer and
          an integer size. It returns a YY_BUFFER_STATE handle to
          a new input buffer  large  enough  to  accommodate  size
          characters and associated with the given file.  When in
          doubt, use YY_BUF_SIZE for the size.

     -    yy_switch_to_buffer(   new_buffer   )   switches    the
          scanner's  processing to scan for tokens from the given
          buffer, which must be a YY_BUFFER_STATE.

     -    yy_delete_buffer( buffer ) deletes the given buffer.
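
     The effect of yyless(n) on yytext, yyleng and the input
     position can be modelled in a few lines of C.  This is a toy
     model only: struct toy_scanner, toy_match and toy_yyless are
     hypothetical names that do not exist in flex, and the model
     ignores buffering entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; none of these names appear in flex. */
struct toy_scanner {
    const char *input;  /* whole input, NUL-terminated               */
    size_t      pos;    /* index of the next unscanned character     */
    const char *text;   /* plays the role of yytext (not terminated) */
    size_t      leng;   /* plays the role of yyleng                  */
};

/* Pretends that a rule has just matched the next len characters. */
void toy_match(struct toy_scanner *s, size_t len)
{
    assert(s != NULL && s->pos + len <= strlen(s->input));
    s->text = s->input + s->pos;
    s->leng = len;
    s->pos += len;
}

/* Models yyless(n): keep the first n characters of the token and
 * return the rest to the input, where they will be rescanned. */
void toy_yyless(struct toy_scanner *s, size_t n)
{
    assert(s != NULL && n <= s->leng);
    s->pos -= s->leng - n;  /* give back all but the first n */
    s->leng = n;            /* the token is now n characters long */
}
```

     With the input "foobar", matching all six characters and then
     calling toy_yyless with n equal to 3 leaves a three-character
     token and puts "bar" back to be scanned next.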


VALUES AVAILABLE TO THE USER

     -    char *yytext holds the text of the current  token.   It
          may not be modified.

     -    int yyleng holds the length of the current  token.   It
          may not be modified.

     -    FILE *yyin is the file  which  by  default  flex  reads
          from.   It  may  be  redefined  but doing so only makes
          sense before scanning begins.  Changing it in the  mid-
          dle of scanning will have unexpected results since flex
          buffers its input.  Once scanning terminates because an
          end-of-file   has   been  seen,  void  yyrestart(  FILE
          *new_file ) may be called to  point  yyin  at  the  new
          input file.

     -    FILE *yyout is the file to which ECHO actions are done.
          It can be reassigned by the user.

     -    YY_CURRENT_BUFFER returns a YY_BUFFER_STATE  handle  to
          the current buffer.


MACROS THE USER CAN REDEFINE

     -    YY_DECL controls how the scanning routine is  declared.
          By  default, it is "int yylex()", or, if prototypes are
          being used, "int yylex(void)".  This definition may  be
          changed  by  redefining the "YY_DECL" macro.  Note that
          if you give arguments to the scanning routine  using  a
          K&R-style/non-prototyped function declaration, you must
          terminate the definition with a semi-colon (;).

     -    The nature of how the scanner gets  its  input  can  be
          controlled    by   redefining   the   YY_INPUT   macro.
          YY_INPUT's         calling         sequence          is
          "YY_INPUT(buf,result,max_size)".    Its  action  is  to
          place up to max_size characters in the character  array
          buf  and  return  in the integer variable result either
          the number of characters read or the  constant  YY_NULL
          (0  on  Unix  systems)  to  indicate  EOF.  The default
          YY_INPUT reads from the global file-pointer "yyin".   A
          sample  redefinition  of  YY_INPUT  (in the definitions
          section of the input file):

              %{
              #undef YY_INPUT
              #define YY_INPUT(buf,result,max_size) \
                  { \
                  int c = getchar(); \
                  result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
                  }
              %}


     -    When the scanner  receives  an  end-of-file  indication
          from  YY_INPUT,  it  then checks the yywrap() function.
          If yywrap() returns false (zero), then  it  is  assumed
          that  the  function  has  gone ahead and set up yyin to
          point to another input file,  and  scanning  continues.
          If  it  returns  true (non-zero), then the scanner ter-
          minates, returning 0 to its caller.

          The default yywrap() always returns 1.   Presently,  to
          redefine  it  you  must first "#undef yywrap", as it is
          currently implemented as a macro.  It  is  likely  that
          yywrap()  will  soon be defined to be a function rather
          than a macro.

     -    YY_USER_ACTION can be redefined to  provide  an  action
          which  is  always  executed prior to the matched rule's
          action.

     -    The macro YY_USER_INIT may be redefined to  provide  an
          action which is always executed before the first scan.

     -    In the generated scanner, the actions are all  gathered
          in  one  large  switch  statement  and  separated using
          YY_BREAK, which may be redefined.  By  default,  it  is
          simply  a  "break", to separate each rule's action from
          the following rule's.
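
     The YY_INPUT and yywrap() conventions described above can be
     combined to scan a sequence of in-memory strings.  The
     following is a minimal sketch in C under stated assumptions:
     read_source, next_source and the sources[] array are
     hypothetical, and the hookup shown in the closing comment
     assumes that the scanner calls YY_INPUT again after yywrap()
     returns zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers; none of these names come from flex itself. */
static const char *sources[] = { "first chunk\n", "second chunk\n" };
enum { NUM_SOURCES = 2 };

static size_t cur_source = 0;  /* index into sources[]                   */
static size_t cur_offset = 0;  /* read position inside the current entry */

/* Copies up to max_size characters of the current source into buf.
 * Returns the number copied, or 0 (YY_NULL on Unix systems) once the
 * current source is exhausted. */
int read_source(char *buf, int max_size)
{
    size_t left, n;

    assert(buf != NULL && max_size > 0);
    if (cur_source >= NUM_SOURCES)
        return 0;

    left = strlen(sources[cur_source]) - cur_offset;
    n = left < (size_t) max_size ? left : (size_t) max_size;
    memcpy(buf, sources[cur_source] + cur_offset, n);
    cur_offset += n;
    return (int) n;
}

/* yywrap()-style hook: returns 0 if another source has been set up,
 * 1 if there is nothing left to scan. */
int next_source(void)
{
    if (cur_source < NUM_SOURCES)
        cur_source++;
    cur_offset = 0;
    return cur_source >= NUM_SOURCES;
}

/*
 * Possible hookup in the definitions section of the flex input
 * (a sketch only, not verified against a particular flex release):
 *
 *   #undef YY_INPUT
 *   #define YY_INPUT(buf,result,max_size) \
 *       { result = read_source(buf, max_size); }
 *   #undef yywrap
 *   #define yywrap() next_source()
 */
```

     Unlike the getchar() sample, read_source() fills as much of
     buf as it can on each call, so the scanner is handed whole
     blocks rather than single characters.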


FILES

     flex.skel
          skeleton scanner.

     lex.yy.c
          generated scanner (called lexyy.c on some systems).

     lex.backtrack
          backtracking information for -b flag (called lex.bck on
          some systems).

     -lfl
          library with which to link the scanners.


SEE ALSO

     flexdoc(1), lex(1), yacc(1), sed(1), awk(1).

     M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator


DIAGNOSTICS

     reject_used_but_not_detected undefined or

     yymore_used_but_not_detected undefined -  These  errors  can
     occur  at compile time.  They indicate that the scanner uses
     REJECT or yymore() but that flex failed to notice the  fact,
     meaning that flex scanned the first two sections looking for
     occurrences of these actions and failed  to  find  any,  but
     somehow  you  snuck  some in (via a #include file, for exam-
     ple).  Make an explicit reference to the action in your flex
     input   file.    (Note  that  previously  flex  supported  a
     %used/%unused mechanism for dealing with this problem;  this
     feature  is  still supported but now deprecated, and will go
     away soon unless the author hears from people who can  argue
     compellingly that they need it.)

     flex scanner jammed - a scanner compiled with -s has encoun-
     tered  an  input  string  which wasn't matched by any of its
     rules.

     flex input buffer overflowed -  a  scanner  rule  matched  a
     string  long enough to overflow the scanner's internal input
     buffer  (16K   bytes   -   controlled   by   YY_BUF_MAX   in
     "flex.skel").

     scanner  requires  -8  flag  -  Your  scanner  specification
     includes  recognizing  8-bit  characters  and  you  did  not
     specify the -8 flag (and your site has  not  installed  flex
     with -8 as the default).

     fatal flex scanner internal error--end of  buffer  missed  -
     This  can  occur  in  a  scanner  which is reentered after a
     long-jump has jumped out (or over) the scanner's  activation
     frame.  Before reentering the scanner, use:

         yyrestart( yyin );


     too many %t classes! - You managed to put every single char-
     acter  into  its  own %t class.  flex requires that at least
     one of the classes share characters.


AUTHOR

     Vern Paxson, with the help of many ideas and  much  inspira-
     tion from Van Jacobson.  Original version by Jef Poskanzer.

     See flexdoc(1) for additional credits  and  the  address  to
     send comments to.


DEFICIENCIES / BUGS

     Some trailing context patterns cannot  be  properly  matched
     and  generate  warning  messages  ("Dangerous  trailing con-
     text").  These are patterns where the ending  of  the  first
     part  of  the rule matches the beginning of the second part,
     such as "zx*/xy*", where the 'x*' matches  the  'x'  at  the
     beginning  of  the  trailing  context.  (Note that the POSIX
     draft states that the text matched by such patterns is unde-
     fined.)

     For some trailing context rules, parts  which  are  actually
     fixed-length  are  not  recognized  as such, so the rule is
     handled as the more expensive  variable  trailing  context,
     with  a  corresponding  performance  loss.   In particular,
     parts using '|' or {n} (such as "foo{3}") are  always  con-
     sidered variable-length.

     Combining trailing context with the special '|'  action  can
     result  in fixed trailing context being turned into the more
     expensive variable trailing context,  as  in  the  following
     example:

         %%
         abc      |
         xyz/def


     Use of unput() invalidates yytext and yyleng.

     Use of unput() to push back more text than was  matched  can
     result  in the pushed-back text matching a beginning-of-line
     ('^') rule even though it didn't come at  the  beginning  of
     the line (though this is rare!).

     Pattern-matching  of  NULs  is  substantially  slower   than
     matching other characters.

     flex does not generate correct  #line  directives  for  code
     internal to the scanner; thus, bugs in flex.skel yield bogus
     line numbers.

     Due to both buffering of input and  read-ahead,  you  cannot
     intermix  calls to <stdio.h> routines, such as, for example,
     getchar(), with flex rules and  expect  it  to  work.   Call
     input() instead.

     The total number of table entries listed  by  the  -v  flag
     excludes  the table entries needed to determine which rule
     has been matched.  The number of such entries is equal  to
     the  number  of  DFA  states  if  the  scanner does not use
     REJECT, and somewhat greater than the number of states  if
     it does.

     REJECT cannot be used with the -f or -F options.

     Some of the macros, such as  yywrap(),  may  in  the  future
     become  functions which live in the -lfl library.  This will
     doubtless break a lot of  code,  but  may  be  required  for
     POSIX-compliance.

     The flex internal algorithms need documentation.