Flex: inheriting from its base class
by mark | 12 Nov 2022, 1:45 p.m.
Updated by mark | 13 Nov 2022, 8:16 a.m.
As mentioned earlier, I am building a QIF parsing thing so I can force various financial institutions' data into a semi-coherent whole. There is some documentation available on getting Flex to behave with C++, but it is scant and mostly scattered across contradictory Stack Overflow posts. So here are some pointers.
Flex Options
The definitions section (the first part of a flex file) lets you specify %option directives. If you are building a C++ lexing class you already have a class name in mind for your scanner. I landed on the following layout:
%option c++ noyywrap outfile="Scanner.cpp" noyylineno batch yyclass="qif::Scanner" prefix="qif"
What this does, one at a time:
- c++: Says this is a C++ scanner. The classic C interface reaches back to the 1970s, which I am not interested in.
- noyywrap: flex offers the ability to read multiple input files one at a time through a yywrap() call. This is C++; you create objects one at a time, one for each input stream; you don't want flex to wrap inputs too.
- outfile: This one just sets what flex should call its generated scanner source, instead of the legacy 1970s default it will use otherwise.
- noyylineno: You do want line numbers. However you get better features using bison locations instead of flex positioning. So we disable flex's line counts here.
- batch: flex can do interactive scanners if you are going to be responding to a command line. But I am not. So I put it into batch mode for a marginal performance gain.
- yyclass: I want the output class to be called something specific.
- prefix: I want the base class from which we inherit to be called qifFlexLexer, not yyFlexLexer, in case I define multiple scanners in the same project.
Flex Includes
You need to produce, yourself, two include files. This is because the flex system header #include <FlexLexer.h> is designed to be included multiple times to build multiple classes. This leads to very difficult situations if you don't take great care around where and when you include stuff. So I've figured it out.
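If the multiple-inclusion idea sounds odd, here is a flex-free sketch of the mechanism as I understand it. The FAKE_FLEXLEXER_H macro is a made-up stand-in for the real header (which is much larger), and ofxFlexLexer is a hypothetical second scanner; only the define, include, undef dance mirrors what you do with FlexLexer.h.

```cpp
#include <string>
#include <type_traits>

// Two-step stringification so the argument is macro-expanded first.
#define STR2(x) #x
#define STR(x) STR2(x)

// Hypothetical stand-in for <FlexLexer.h>: every expansion defines a class
// named by whatever yyFlexLexer expands to at that point.
#define FAKE_FLEXLEXER_H                                              \
    struct yyFlexLexer {                                              \
        static std::string class_name() { return STR(yyFlexLexer); }  \
    };

// First "inclusion": produces qifFlexLexer.
#define yyFlexLexer qifFlexLexer
FAKE_FLEXLEXER_H
#undef yyFlexLexer

// Second "inclusion" for a hypothetical second scanner: produces ofxFlexLexer.
#define yyFlexLexer ofxFlexLexer
FAKE_FLEXLEXER_H
#undef yyFlexLexer

// The two results are unrelated types, so each scanner can have its own base.
static_assert(!std::is_same<qifFlexLexer, ofxFlexLexer>::value,
              "each inclusion yields a distinct class");
```

Each expansion yields a separate class, which is why the rename has to be in force at the moment the header is pulled in, and why it is undefined again straight afterwards.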
First file you make is called Scanner.hpp (or whatever, aligned with outfile). This is very simple:
    #pragma once
    #define yyFlexLexer qifFlexLexer
    #include <FlexLexer.h>
    #undef yyFlexLexer
    #include "Scanner-Internal.hpp"
That's it. The other internal file referred to in the header is more complicated:
    #pragma once
    #include "Parser.hpp"

    namespace qif {

    class Scanner : public qifFlexLexer {
    public:
        Scanner(std::istream& arg_yyin, std::ostream& arg_yyout)
            : qifFlexLexer(arg_yyin, arg_yyout) {}
        Scanner(std::istream* arg_yyin = nullptr, std::ostream* arg_yyout = nullptr)
            : qifFlexLexer(arg_yyin, arg_yyout) {}

        int lex(Parser::semantic_type *yylval, Parser::location_type *yyloc);
    };

    }
Of course this depends on a Bison-generated file, so actually compiling this will be a bit of a pain. You can create a stub bison file to produce the bison header so it compiles for now, or you can use the base lex definition for now (i.e. don't specify int lex() in the class). The important things about the internal scanner header are:
- This inherits from qifFlexLexer, and FlexLexer.h itself can be included again because the part that defines yyFlexLexer is not behind an ordinary include guard.
- The lex function is set up to be useful, in that you can use it to communicate with Bison, both in terms of lexemes and locations. There is a nasty chicken-and-egg situation, as you need the parser to generate the parser include file, but you can't parse without first lexing...
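One way around the chicken-and-egg problem while the grammar does not exist yet is a hand-written stand-in for Parser.hpp. This is not what Bison generates, just a guess at the minimum the scanner header needs; the file name Parser-stub.hpp and the member fields are my own inventions, and the real semantic_type and location_type will look different.

```cpp
// Parser-stub.hpp (hypothetical): just enough for Scanner-Internal.hpp to compile.
#include <string>

namespace qif {

class Parser {
public:
    // Placeholder for the value Bison attaches to each token.
    struct semantic_type {
        std::string text;
    };

    // Placeholder for Bison's source location.
    struct location_type {
        int line = 1;
        int column = 1;
    };
};

} // namespace qif
```

Give it an include guard when it lives in its own file, and swap it for the generated header once a real grammar exists.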
The two files have the following purpose. The main one is included whenever you need to refer to the scanner. This will ensure the system wide FlexLexer.h file is brought in exactly once per scanner (with the right defines) per translation unit so everything is OK. The internal one is needed so that you can include it in your flex file directly, as this already includes the system wide header and you can't get around it.
If you keep my copy above you'll need to add the following into the prologue of your .l file:
    #undef YY_DECL
    #define YY_DECL int qif::Scanner::lex(qif::Parser::semantic_type* const yylval, qif::Parser::location_type* yyloc)
YY_DECL is by default a simple int function. We need it to match what we have declared in the class. Of course the default 1970s definition is still there, so you need to undefine it first.
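The reason the redefinition works is that the generated scanner writes its scanning function as YY_DECL followed by a body, so whatever the macro expands to becomes the function header. Here is a stripped-down sketch with no flex involved; the demo namespace, Value and Location are hypothetical stand-ins for the qif and Parser types, and the body is nothing like a real scanner.

```cpp
namespace demo {

struct Value    { int number = 0; };  // stand-in for Parser::semantic_type
struct Location { int line = 1; };    // stand-in for Parser::location_type

class Scanner {
public:
    // Declared in the class, as in Scanner-Internal.hpp.
    int lex(Value* yylval, Location* yyloc);
};

} // namespace demo

// What the .l prologue does: drop any earlier definition, then supply ours.
#undef YY_DECL
#define YY_DECL int demo::Scanner::lex(demo::Value* const yylval, demo::Location* yyloc)

// What the generated code does, greatly simplified: the macro is the header
// of the function definition, so the body lands in demo::Scanner::lex.
YY_DECL
{
    yylval->number = 42;  // pretend we scanned a number
    yyloc->line += 1;     // pretend we consumed a line
    return 1;             // pretend token id
}
```

The top-level const on yylval in the macro does not change the function's type, which is why it still matches the declaration in the class.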
That is the first of several flex related headaches. There will be others. I'll talk about newlines next time. They need a little bit of care.