Blog postings
I didn't do it
I said some time ago I'd write about how I got line endings working. This is that post.
There are three problems with Flex newlines:
- Windows
- Line counts
- Posix line definition
Windows
Windows issues come from the fact that Flex is primarily a Unix tool, so it expects lines to end in \n. But if you read Windows files you get \r\n. You can possibly get \r, or even \n\r, if you read files from old Macs or old BBC Micros, somehow. The Flex problem, then, is telling it how to handle Windows line endings. It is actually straightforward. You have a rule
(\n|\r\n?)
which will catch Unix line endings, Windows line endings or old Mac line endings. Because Flex always takes the longest match, \r\n is consumed as one line ending rather than two. I last used a BBC Micro in 2001 as part of a sixth year project (they had a pH probe that was connected to a BBC Micro), so I am not interested in reading those files, but the rule could be adapted. In practice Unix and Windows line endings are all you need. The only other convention in common use is on mainframes, but as a normal person you are unlikely to see one of those files, because the people who own them go to great lengths to hide EBCDIC from you.
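To make the longest-match behaviour concrete, here is a minimal plain C++ sketch, with no Flex involved, that splits a buffer the way the (\n|\r\n?) rule would. The function name split_lines is made up for illustration. Note that \n\r comes out as two line endings with an empty line between them, which is why BBC Micro files would need a different rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split text into lines using the same idea as the Flex rule (\n|\r\n?):
// "\r\n" is one line ending (longest match wins), and a lone "\r" or a
// lone "\n" is also one line ending.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\n') {                       // Unix ending
            lines.push_back(current);
            current.clear();
        } else if (c == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;                           // Windows ending: swallow the \n too
            }                                  // otherwise an old Mac ending
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        lines.push_back(current);              // last line had no terminator
    }
    return lines;
}
```

For example, split_lines("a\r\nb") yields "a" and "b", exactly as split_lines("a\nb") does.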
An alternative is just to consume \r with an empty action and let the trailing \n drive any actions. But that breaks on files that use a bare \r, and I have a small handful of Mac files I need to deal with. Finally, you could have two versions of your program for the different line endings. QIFs are just plain ASCII, so the control codes only appear where they should.
Line counts
This is probably the nastiest part of the whole thing, as you really need to update the column count as well. If you are not insane you are using Bison's complete symbols with %locations enabled, which provides a location type. So "all" you have to do is update it with every token. And if you are using complete symbols you define your own parsing context type that is passed to the lexer. This means you can access the context (i.e., the embedded location object) in every action. Phew!
Your context object will look like this:
#pragma once
#include <memory>
#include "location.hh" // generated by Bison

namespace my_ns {
class Node; // your own tree node type (name assumed)

class Context // again more like a struct
{
public:
    Context() : done(false) { loc.initialize(); }
    bool done;                  // set to true at EOF
    std::unique_ptr<Node> node; // each lexeme is actually a tree node
    location loc;               // Bison-provided location class
};
}
and your lexer function declaration for Flex will be
#define YY_DECL my_ns::Parser::symbol_type my_ns::Scanner::lex(my_ns::Context *context)
You will need a custom user action that ensures consumed characters are reflected in the location; this goes immediately after the YY_DECL definition:
#define YY_USER_ACTION context->loc.step(); context->loc.columns(yyleng);
and finally your newline rule updates the line counter whenever it matches:
(\n|\r\n?) { context->loc.lines(); return my_ns::Parser::make_ENDL(context->loc); }
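To see what these three calls do to the location, the sketch below replays them in plain C++. MiniLocation is a hypothetical stand-in modelled on the step(), columns() and lines() methods of the Bison-generated location class, not the class itself; Token and locate are likewise made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Bison location class: begin/end positions
// plus step(), columns() and lines(), nothing else.
struct Pos { int line = 1; int column = 1; };

struct MiniLocation {
    Pos begin, end;
    void step() { begin = end; }               // next token starts where the last ended
    void columns(int n) { end.column += n; }   // consume n characters on this line
    void lines(int n = 1) {                    // jump to the start of a later line
        if (n != 0) { end.line += n; end.column = 1; }
    }
};

struct Token { std::string text; bool is_newline; };

// Replays YY_USER_ACTION plus the newline rule for each match and records
// the location that would be handed to each token.
std::vector<MiniLocation> locate(const std::vector<Token>& tokens) {
    MiniLocation loc;
    std::vector<MiniLocation> out;
    for (const Token& t : tokens) {
        loc.step();                                   // YY_USER_ACTION, part 1
        loc.columns(static_cast<int>(t.text.size())); // YY_USER_ACTION, part 2 (yyleng)
        if (t.is_newline) loc.lines();                // newline rule action
        out.push_back(loc);
    }
    return out;
}
```

Feeding it T100.00, \r\n and ^ gives the ^ a location starting at line 2, column 1: the two characters of the Windows line ending were first counted as columns, and then lines() moved the end position to the start of the next line.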
Phew. The YY_USER_ACTION gets performed for every match so that, even if you discard characters, the location is correct. You have to be very careful with newline matching, though, because only the newline rule moves the line counter. A pattern that could possibly match a newline isn't good enough: your patterns either always contain newlines or they never contain newlines.
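The always-or-never rule exists because only the newline rule moves the line counter. This hypothetical sketch (End, Match and track are made-up names) tracks just the end position and compares a lexer whose newline rule calls lines() against one where a whitespace pattern such as [ \t\n]+ has swallowed the newline.

```cpp
#include <string>
#include <vector>

// Hypothetical cut-down location: just the end position, updated the way
// the Bison location's columns() and lines() update it.
struct End { int line = 1; int column = 1; };

// One Flex match: its text, and whether the rule that matched it runs
// context->loc.lines().
struct Match { std::string text; bool calls_lines; };

End track(const std::vector<Match>& matches) {
    End end;
    for (const Match& m : matches) {
        end.column += static_cast<int>(m.text.size()); // YY_USER_ACTION
        if (m.calls_lines) {                            // newline rule
            end.line += 1;
            end.column = 1;
        }
    }
    return end;
}
```

With A, a newline matched by the newline rule, then B, the end position after B is line 2, column 2. If the newline is buried in a whitespace match that never calls lines(), B is still reported on line 1 (column 6), and every later location is off by a line.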
Posix
Posix declares that every line ends with a newline. In practice, many files end their last line with an EOF, not a newline. What do? Flex lets you match an EOF, and you can use a push parser approach so that a raw EOF is always turned into a newline and an EOF, tracking in the context whether the last thing was a newline. But in reality it is better handled in the grammar. If there are only a small number of tokens (possibly one or two) that a legally constructed file, modulo terminal newline, can end on, then you can use the Bison pseudotoken YYEOF to say "EOF acceptable". For example, QIF files always end in a record separator, ^, but that last line is not always a well-formed POSIX line because the trailing newline may be missing. So the Bison rule is
separator_row: SEPARATOR ENDL | SEPARATOR YYEOF;
where SEPARATOR is just the ^ character. This means a file can end in any number (including zero) of newlines (including Windows newlines) as long as the last record includes a record separator at the end. Injecting optional newlines into the token stream via Flex can be done, but it is very irritating and ugly. Bison does the right thing at end of file (i.e. it'll reduce as much as it can), so you can generally forget about the end-of-file marker except where you need to catch optional end of lines.
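The grammar-side approach can be sketched without Bison as a small recursive-descent recogniser. Everything here is an assumption for illustration: the token names, the simplified QIF-like grammar in the comment, and the choice to allow blank lines between records (which is what lets a file end in several newlines). END plays the part of YYEOF.

```cpp
#include <cstddef>
#include <vector>

enum class Tok { FIELD, SEPARATOR, ENDL, END };

// Hypothetical recogniser for a simplified QIF-like grammar:
//   file          : { blank | record } END
//   record        : { FIELD ENDL } separator_row
//   separator_row : SEPARATOR ENDL | SEPARATOR END   (END stands in for YYEOF)
//   blank         : ENDL                             (assumed: blank lines allowed)
bool accepts(const std::vector<Tok>& toks) {
    std::size_t i = 0;
    auto at = [&](Tok t) { return i < toks.size() && toks[i] == t; };
    while (i < toks.size()) {
        if (at(Tok::END)) return i + 1 == toks.size(); // clean end of input
        if (at(Tok::ENDL)) { ++i; continue; }          // blank line
        while (at(Tok::FIELD)) {                       // field rows of a record
            ++i;
            if (!at(Tok::ENDL)) return false;          // each needs a newline
            ++i;
        }
        if (!at(Tok::SEPARATOR)) return false;         // record must end in ^
        ++i;
        if (at(Tok::ENDL)) { ++i; continue; }          // SEPARATOR ENDL
        if (at(Tok::END)) return i + 1 == toks.size(); // SEPARATOR YYEOF
        return false;
    }
    return false; // tokens ran out without an END
}
```

A file ending SEPARATOR END (no final newline) and one ending SEPARATOR ENDL ENDL END are both accepted, while a file whose last record has no separator is rejected.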
The context class above contains a flag that tells the code calling Bison whether the EOF Flex returned is an actual EOF. That matters if you are doing something interesting like an interpreter (where you use EOFs to capture line endings, making Bison reduce the input and execute it). It is easy to set. It is already initialised to false, so to make it true:
<<EOF>> { context->done = true; return my_ns::Parser::make_YYEOF(context->loc); }
What happens in interpreters is that newlines are used to make a YYEOF, which causes Bison to reduce the input (this almost always means evaluating the syntax tree) and return from yyparse() (the parse() member function if you use Bison's C++ skeleton). The thing that called yyparse() then examines the context to see whether Flex actually reached the end of input. If it was just a newline and not a "real" EOF, the calling routine loops.
This is what command-line interpreters really do: basically loop over a fancy tree-building and reduction routine. The context can hold symbol tables etc., which allows things to persist between Bison calls. Neat!
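A minimal sketch of that loop in plain C++, assuming a toy line-based language where name=value assigns and a bare name looks its value up. ReplContext, parse_one_line and repl are hypothetical: parse_one_line stands in for one parse() call, and std::getline stands in for a lexer that turns each newline into YYEOF. There is no error handling; std::stoi throws on a bad number.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical interpreter context: a done flag like the Context class
// above, plus state that survives between "parse" calls.
struct ReplContext {
    bool done = false;                  // set only at a real end of input
    std::map<std::string, int> symbols; // persists across calls
    std::vector<std::string> output;
};

// Stand-in for one parse() call: read up to a newline, "reduce" (evaluate)
// the line, then return to the caller.
void parse_one_line(std::istream& in, ReplContext& ctx) {
    std::string line;
    if (!std::getline(in, line)) { ctx.done = true; return; } // real EOF
    if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate \r\n
    std::size_t eq = line.find('=');
    if (eq != std::string::npos) {                             // name=value
        ctx.symbols[line.substr(0, eq)] = std::stoi(line.substr(eq + 1));
    } else if (!line.empty()) {                                // bare name
        auto it = ctx.symbols.find(line);
        ctx.output.push_back(it == ctx.symbols.end()
                                 ? "undefined"
                                 : std::to_string(it->second));
    }
    if (in.eof()) ctx.done = true; // last line had no newline: also a real EOF
}

// The caller: keep looping until the context reports a real EOF.
void repl(std::istream& in, ReplContext& ctx) {
    while (!ctx.done) parse_one_line(in, ctx);
}
```

Run over a std::istringstream holding the lines x=2, y=5, x, z and y (the last without a trailing newline), the output is 2, undefined, 5, and done is set after the final line because it ended in a real EOF rather than a newline.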
Feels like winter is drawing to a close. But what about second winter?
In Norway there are mountain huts you are allowed to just use. They are similar to bothies in the UK, but while bothies are in very remote places to stop people setting fire to them, you can get to a hytte reasonably easily. I went to the one near Fister. There is a waymarked path up the mountain behind the school and you basically keep going up. It's quite steep but also only a few km, and the views are wonderful.
Here are some images.
The hut itself. Plonked on top of a mountain. Some of them are delivered by helicopter in sections and put together. Note the frost clinging to things; it was quite cold.
First hints of the view keeking over the shoulder of the hut. In summer I can imagine this gets rammed with people drinking 39 NOK beers.
I walked up to the edge and took this. This is the village / townlet of Fister, north of Stavanger. The view is just so nice. At the top they have a wee book you can fill in to say "I was here". It's in a sealed metal trap thing with a pen. "Jeg kommer fra Skottland til her" ("I come from Scotland to here").
Now looking towards the sun but still over Fister (just). This was wonderful. Very calming. Much relax. Little heat.
The hut had a fireplace in it. We brought some logs and immolated them in the appropriate place. You'd need to get through maybe six or seven logs to fully warm the place up. We only brought three, which did take the edge off, but it was still cold inside. I just find it astonishing that the huts are allowed to exist; someone would wreck them in the UK.
Spending a week near Stavanger. Here are some snaps.
Above is the view up a fjord. It's nice.
Stavanger at night
And again
Finally the view of the fjord again