The Scanner Class

Setting up the Scanner Class

The scanner library is used in much the same way as the parser library (see Setting up the Parser Class).

To make a C++ class for the scanner, you should have a Lex input file that looks like

    %{ 
    ... // Normal declaration stuff goes here 
    #define YLMM_SCANNER_CLASS scanner
    #include <ylmm/lexmm.hh> 
    %}
    ... // The rest is as usual

The header ylmm/lexmm.hh redefines the macros YY_FATAL_ERROR(msg) and YY_INPUT(buf,res,max) to forward calls to the user-defined class.

The file ylmm/lexmm.hh defines the interface to the generated C function via a C++ class, where YLMM_SCANNER_CLASS is the name of the user scanner class. The developer can specify this either as a specific instantiation of ylmm::basic_scanner or as a subclass of such an instantiation.
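
The forwarding described above can be pictured with a small stand-alone sketch. It does not use the library: demo_scanner, its members read and fatal, and the macro DEMO_SCANNER_CLASS are hypothetical names invented for illustration, and the real macro definitions in the ylmm header may well differ. The sketch only shows the general idea of a static pointer to a user class that the two Lex macros delegate to.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for a user scanner class; this is NOT the ylmm API.
class demo_scanner {
public:
  explicit demo_scanner(const std::string& input) : _input(input), _pos(0) {}

  // Copy up to max pending characters into buf and return how many were copied.
  int read(char* buf, int max) {
    std::size_t left = _input.size() - _pos;
    std::size_t want = static_cast<std::size_t>(max);
    std::size_t n    = left < want ? left : want;
    std::memcpy(buf, _input.data() + _pos, n);
    _pos += n;
    return static_cast<int>(n);
  }

  // Remember the most recent fatal error message.
  void fatal(const char* msg) { _last_error = msg; }
  const std::string& last_error() const { return _last_error; }

private:
  std::string _input;
  std::size_t _pos;
  std::string _last_error;
};

// The class name is chosen with a macro, and a static pointer of that
// type receives every call (assumed names, for illustration only).
#define DEMO_SCANNER_CLASS demo_scanner
static DEMO_SCANNER_CLASS* _scanner = 0;

// Redefine the Lex hooks so that they delegate to the user object.
#define YY_INPUT(buf, res, max) ((res) = _scanner->read((buf), (max)))
#define YY_FATAL_ERROR(msg)     (_scanner->fatal(msg))
```

With this arrangement the generated C code never needs to know the concrete class: it only expands the macros, and a result of 0 from the input hook signals end of input to Lex.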

See also simple_scanner.ll for an example of usage.

Example Lexical Definition

The Lex input file is basically not altered by the use of this package. However, the static variable _scanner is defined as a pointer to the user scanner class, and can be used in the definition to process the read tokens.

First we set up the usual stuff (see Setting up the Scanner Class).

%{
#include "simple_scanner.hh"
#ifdef HAVE_CONFIG_H
# include "config.hh"
#endif
#define YLMM_SCANNER_CLASS simple_scanner
#define LEXDEBUG 1
#include <ylmm/lexmm.hh>
%}

Then we define any shorthand regular expressions we'll use later on (this example needs none).

%%

And finally, we write the lexical rules. Note that the static variable _scanner already has the right type.

[0-9]+       { return _scanner->integer(yytext, yyleng); }
"\n"         { return _scanner->newline(yytext, yyleng); }
"quit"       { return 0;                                 }
.            { if (yytext[0] == EOF) return 0;           }

%%

The Scanner Class

Now, we need to make our scanner class.

#include <sstream>  // std::stringstream, used in integer()

/** Example of a scanner class
    @ingroup simple
*/
class simple_scanner : public ylmm::basic_scanner<int>
{
public:
  /** Constructor 
      @param buf The buffer to read from. */
  simple_scanner(ylmm::basic_buffer* buf=0) 
    : ylmm::basic_scanner<int>(buf) 
  { 
    _current->auto_increment(true); 
  }
  /** Destructor */
  virtual ~simple_scanner() {}
  /** Send a message to the user
      @param text The text to write.
      @param len  The length of @a text */
  void output(const char* text, int len) {
    if (_messenger)
      _messenger->error("At %d,%d: unrecognised token '%s'\n",
                 _current->line(), _current->column(), text);
  }
  /** Send a message to the user
      @param t The character to write.*/
  void output(const char t) {
    if (_messenger)
      _messenger->error("At %d,%d: unrecognised token '%c'\n",
                 _current->line(), _current->column(), t);
  }

And finally, we define the member functions to process the various lexical tokens and pass the token ID on to the parser. In the member functions integer and floating, we use the data member ylmm::basic_scanner::_err_stream to output errors. The idea is that we should set up this stream to be the same as used elsewhere, so that all error messages can be handled the same way.

  /** Process an integer literal
      @param str The integer literal as a string
      @param len The length of the string
      @return integer literal token ID number @c NUM */
  int integer(const char* str, int len) 
  { 
    std::stringstream s(str);
    s >> _token;
    return NUM;
  }
  /** Process a newline
      @param str The new line string
      @param len The length of @a str
      @return The newline token ID @c NEWLINE */
  int newline(const char* str, int len) { return NEWLINE; }
}; 
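
The integer member function above converts the matched text with a std::stringstream, ignores its len argument, and does not check whether the conversion succeeded. As a possible variation, the following stand-alone sketch uses only the first len characters and reports failure. The helper name parse_integer is hypothetical and is not part of the library or of the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: convert the first len characters of str to an int.
// Returns true and stores the value in out on success; on failure it
// returns false and leaves out untouched.
bool parse_integer(const char* str, int len, int& out)
{
  std::string text(str, static_cast<std::size_t>(len));
  std::istringstream s(text);
  int value = 0;
  if (!(s >> value)) return false;  // no leading digits, or extraction failed
  out = value;
  return true;
}
```

A scanner member function could call such a helper and store the result in _token only when it returns true, sending a message to the user otherwise.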

The data member _current is defined in the base class ylmm::basic_scanner. It is a pointer to a ylmm::basic_buffer, a class that represents an input stream to the lexical scanner. The member functions ylmm::basic_buffer::increment_column and ylmm::basic_buffer::increment_line are used to track the input position of the scanner, so that the parser may use that information in error messages (see the Bison documentation).
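
The position tracking described above can be sketched without the library. In the following stand-alone example only the member names increment_line, increment_column, line and column are taken from this page; the class demo_buffer, the signatures, the starting position (line 1, column 0) and the helper advance are assumptions made for illustration.

```cpp
// Hypothetical stand-in for an input buffer that tracks its position;
// the real ylmm::basic_buffer may use different signatures and conventions.
class demo_buffer {
public:
  demo_buffer() : _line(1), _column(0) {}

  // Move to the start of a later line.
  void increment_line(int n = 1)   { _line += n; _column = 0; }
  // Move right within the current line.
  void increment_column(int n = 1) { _column += n; }

  int line()   const { return _line; }
  int column() const { return _column; }

  // Advance the position over len characters of matched text.
  void advance(const char* text, int len) {
    for (int i = 0; i < len; ++i) {
      if (text[i] == '\n') increment_line();
      else                 increment_column();
    }
  }

private:
  int _line;
  int _column;
};
```

For example, advancing a fresh demo_buffer over the four characters "12\n3" leaves it at line 2, column 1, which is the kind of information an error message can then report.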

Christian Holm
Last update Fri Jul 8 12:58:03 2005