
This is the file `INTERNALS' of blogme.
This file describes the internals of blogme2, which are quite different
from the internals of (the old) blogme; blogme2 is MUCH cleaner.
Author:  Eduardo Ochs <edrx@mat.puc-rio.br>
Version: 2005sep09
License: GPL
Site:    <http://angg.twu.net/>



The main tables used by blogme
==============================
 _G: Lua globals (<http://www.lua.org/manual/5.0/manual.html#predefined>)
 _W: blogme words
 _P: low-level parsers
 _A: argument-parsing functions for blogme words
 _AA: abbreviations for argument-parsing functions (see `def')
 _V: blogme variables (see "$" and `withvars')



Blogme words (the tables _W and _A)
===================================
Let's examine an example. When blogme processes:
    [HREF http://foo bar]
it expands it to:
    <a href="http://foo">bar</a>

When the blogme evaluator processes a bracketed expression (a "brexp")
it first obtains the first "word" of the brexp (called the "head" of
the brexp), which in this case is "HREF"; then it parses and evaluates
the "arguments" of the brexp, and invokes the function associated with
the word "HREF" using those arguments. Different words may have
different ways of parsing and evaluating their arguments; this is like
the distinction in Lisp between functions and special forms, and like
special words such as LIT in Forth. Here are the hairy details: if
HREF is defined by

  HREF = function (url, str)
      return "<a href=\""..url.."\">"..str.."</a>" end
  _W["HREF"] = HREF
  _A["HREF"] = vargs2

then the "value" of [HREF http://foo bar] will be the same as the
value returned by HREF("http://foo", "bar"), because

  _W["HREF"](_A["HREF"]())

will be the same as:

  HREF(vargs2())

When vargs2 is run, the parser is positioned just after the end of the
word "HREF" in the brexp; running vargs2() there parses the rest of
the brexp and returns two strings, "http://foo" and "bar".
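
The dispatch described above can be sketched in a few lines of Lua.
This is only an illustration, not blogme's actual code: the names
vargs2_sketch and eval_brexp_sketch are hypothetical, the argument
parser is a simplified stand-in for vargs2, and nested brexps are not
handled. It assumes a Lua where "^" anchors string.find at the given
start position.

```lua
-- Globals named in the text: the subject string and the parser position.
subj, pos = "", 1
_W, _A = {}, {}

-- Hypothetical, simplified stand-in for vargs2: starting at pos, skip
-- spaces, read one run of non-space chars, skip spaces again, and take
-- everything up to the closing "]" as the second argument.
function vargs2_sketch()
  local _, e, first, rest =
    string.find(subj, "^%s*(%S+)%s*([^%]]*)%]", pos)
  pos = e + 1                 -- now just after the closing "]"
  return first, rest
end

HREF = function (url, str)
    return "<a href=\""..url.."\">"..str.."</a>" end
_W["HREF"] = HREF
_A["HREF"] = vargs2_sketch

-- Hypothetical evaluator for a single, non-nested brexp starting at pos.
function eval_brexp_sketch()
  local _, e, head = string.find(subj, "^%[(%S+)", pos)
  pos = e + 1                 -- now just after the head
  -- The argument parser runs first; its return values become the
  -- arguments of the function associated with the word.
  return _W[head](_A[head]())
end

subj, pos = "[HREF http://foo bar]", 1
result = eval_brexp_sketch()
print(result)
```

Because _A[head]() is the last expression in the argument list, all of
its return values are passed on to _W[head], which is what lets one
argument parser return two strings and another return, say, one.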

See: (info "(elisp)Function Forms")
and: (info "(elisp)Special Forms")



The blogme parsers (the table _P)
=================================
Blogme has a number of low-level parsers, each one identified by a
string (a "blogme pattern"); the (informal) "syntax" of those blogme
patterns was vaguely inspired by Lua5's syntax for patterns.
(See: <http://www.lua.org/manual/5.0/manual.html#pm>).
In the table below "BP" stands for "blogme pattern".

  BP    Long name/meaning      Corresponding Lua pattern
 -----+----------------------+--------------------------
 "%s" | space char           | "[ \t\n]"
 "%w" | word char            | "[^%[%]]"
 "%c" | normal char          | "[^ \t\n%[%]]"
 "%B" | bracketed expression | "%b[]"
 "%W" | bigword              | "(%w*%b[]*)*" (but not the empty string!)

The low-level parsing functions of blogme are of two kinds (levels):
* Functions in the "parse only" level only succeed or fail. When they
  succeed they return true and advance the global variable `pos'; when
  they fail they return nil and leave pos unchanged (*).
* Functions in the "parse and process" level are like the functions in
  the "parse only" level, but with something extra: when they succeed
  they store in the global variable `val' the "semantic value" of the
  thing that they parsed. When they fail they are allowed to garble
  `val', but they won't change `pos'.
See: (info "(bison)Semantic Values")

These low-level parsing functions are stored in the table `_P',
indexed by their "blogme patterns". They use the global variables
`subj', `pos', `b', `e', and `val'.

An example: running _P["%w+"]() tries to parse a (non-empty) series of
word chars starting at pos; running _P["%w+:string"]() does the same,
but in case of success the semantic value is stored into `val' as a
string -- the comment ":string" in the name of the pattern indicates
that this is a "parse and process" function, and says something about
how the semantic value is built.
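
A minimal sketch of the two levels, under some assumptions: the
function bodies below are illustrative and not blogme's own, the
sample subject string is made up, and the Lua pattern for a word char
is taken from the table above. The begin and end positions are kept in
local variables here; how blogme uses the globals `b' and `e' is not
shown.

```lua
-- Globals named in the text. The sample subject is made up.
subj, pos, val = "foo bar[baz]", 1, nil
_P = {}

-- One or more word chars (see the table above), anchored so that the
-- match can only start at the position given to string.find.
local WORD_CHARS = "^[^%[%]]+"

-- "Parse only": on success advance pos and return true; on failure
-- return nil and leave pos unchanged.
_P["%w+"] = function ()
  local b, e = string.find(subj, WORD_CHARS, pos)
  if not b then return nil end
  pos = e + 1
  return true
end

-- "Parse and process": as above, but also store the parsed text in
-- val as a string.
_P["%w+:string"] = function ()
  local b, e = string.find(subj, WORD_CHARS, pos)
  if not b then return nil end
  val = string.sub(subj, b, e)
  pos = e + 1
  return true
end

ok1 = _P["%w+:string"]()   -- parses "foo bar" and stops before "["
ok2 = _P["%w+"]()          -- fails at "[": returns nil, pos unchanged
```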

(*): Blogme patterns containing a semicolon (";") violate the
convention that says that patterns that fail do not advance pos.
Parsing "A;B" means first parsing "A", not caring if it succeeds or
fails, discarding its semantic value (if any), then parsing "B", and
returning the result of parsing "B". If "A" succeeds but "B" fails then
"A;B" will fail, but pos will have been advanced to the end of "A".
"A" is usually "%s*".
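
The behaviour of "A;B" can be sketched as a small combinator. The name
parse_a_then_b is hypothetical, and the two component parsers are
simplified stand-ins written for this example (using the Lua patterns
from the table above), not blogme's own code.

```lua
subj, pos, val = "", 1, nil
_P = {}

-- "%s*": zero or more space chars; this can never fail.
_P["%s*"] = function ()
  local _, e = string.find(subj, "^[ \t\n]*", pos)
  pos = e + 1
  return true
end

-- "%w+:string": one or more word chars, stored into val on success.
_P["%w+:string"] = function ()
  local b, e = string.find(subj, "^[^%[%]]+", pos)
  if not b then return nil end
  val = string.sub(subj, b, e)
  pos = e + 1
  return true
end

-- Hypothetical combinator: build the parser for "A;B" from parsers
-- for "A" and "B".
function parse_a_then_b(parse_a, parse_b)
  return function ()
    parse_a()          -- result ignored; pos may have been advanced
    return parse_b()   -- the result of "A;B" is the result of "B"
  end
end

_P["%s*;%w+:string"] = parse_a_then_b(_P["%s*"], _P["%w+:string"])

-- Success: the spaces are skipped, then "foo" is parsed.
subj, pos, val = "  foo", 1, nil
ok1 = _P["%s*;%w+:string"]()
pos1, val1 = pos, val

-- Failure: "B" fails at "[", but pos stays after the skipped spaces.
subj, pos, val = "  [x]", 1, nil
ok2 = _P["%s*;%w+:string"]()
pos2 = pos
```

In the failing case pos ends up at 3 rather than back at 1, which is
exactly the exception to the "failure leaves pos unchanged" convention
noted above.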