

BlogMe - an extensible language for generating HTML

(2007oct25: This page refers to BlogMe2, and is obsolete... The docs for BlogMe3 are at this link; they're very messy at the moment, but I'm working on them, and they should be ready in a few days (top priority this time, really).)

(2007apr18: Hey! The rest of this page refers to BlogMe2, which is obsolete... I just finished rewriting it (-> BlogMe3), but I haven't had the time yet to htmlize its docs...)

(2005sep28: I wrote this page in a hurry by htmlizing two of blogme's documentation files, README and INTERNALS, which are not very clean...)

See also the entry about BlogMe in my page about little languages.


The "language" that blogme2.lua accepts is extensible and can deal with input having a lot of explicit mark-up, like this,

[HLIST2 Items:
  [HREF http://foo/bar a link]
  [HREF http://another/link]
  [IT Italic text]
  [BF Boldface]
]

and conceivably also with input that has a lot of implicit mark-up and control structures, like these examples (which haven't been implemented yet):

  Tuesday, February 15, 2005

  I usually write my notes in plain text files using Emacs; in
  these files "["s and "]"s can appear unquoted, urls appear
  anywhere without any special markup (like http://angg.twu.net/)
  and should be recognized and htmlized to links, some lines are
  dates or "anchors" and should be treated in special ways, the
  number of blank lines between paragraphs matters, in text
  paragraphs maybe _this markup_ should mean bold or italic, and
  there may be links to images that should be inlined, etc etc

[IF LOCAL==true
    [INCLUDE todo-list.blogme]
]

BlogMe also supports executing blocks of Lua code on-the-fly, like this:

   -- We can put any block of Lua code here
   -- as long as its "["s and "]"s are balanced.
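The page doesn't show the head that introduces such a block, so here is a sketch with a made-up head, RUNLUA, of one way a head could run its rest as Lua code: compile the rest, run it, and splice whatever it returns into the output. RUNLUA is hypothetical and is not blogme2.lua's own mechanism.

```lua
-- RUNLUA is a made-up head name, not blogme2.lua's: it compiles its rest
-- as a chunk of Lua code, runs it, and splices the chunk's return value
-- (or nothing, if the chunk returns nil) into the output.
function RUNLUA(rest)
  local loader = loadstring or load   -- loadstring in Lua 5.1, load in 5.2+
  local chunk = assert(loader(rest))
  local result = chunk()
  if result == nil then return "" end
  return tostring(result)
end

print(RUNLUA("local x = 6 * 7  return 'x is ' .. x"))   --> x is 42
print(RUNLUA("-- only comments here") == "")           --> true
```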

How the language works

BlogMe's language has only one special syntactic construct, "[...]". There are only four classes of characters: "[", "]", whitespace, and "word". "[...]" blocks in the text are treated specially, and we use Lua's "%b[]" regexp-ish construct to skip over the body of a "[...]" quickly, skipping over all balanced "[]" pairs inside. The first "word" of such a block (we call it the "head" of the block) determines how to deal with the "rest" of the block.
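A small illustration in plain Lua of the two tricks just described: "%b[]" jumps over a whole "[...]" block, nested balanced pairs included, and the head is the first run of "word" characters after the opening "[". The helper split_brexp is hypothetical; it is not part of blogme2.lua.

```lua
-- split_brexp is a hypothetical helper, not part of blogme2.lua.
-- It finds the first "[...]" block in subj at or after init; "%b[]"
-- skips over any balanced "[]" pairs nested inside the block.
function split_brexp(subj, init)
  local b, e = string.find(subj, "%b[]", init)
  if not b then return nil end
  local inner = string.sub(subj, b + 1, e - 1)   -- text between the outer brackets
  -- head: the first run of "word" chars (neither whitespace nor brackets)
  local head, rest = string.match(inner, "^%s*([^%s%[%]]+)%s*(.*)$")
  return head, rest, e + 1                       -- e + 1: where parsing resumes
end

local s = "Go to [HREF http://foo/bar a [IT nice] link] now"
local head, rest, resume = split_brexp(s, 1)
print(head)                               --> HREF
print(rest)                               --> http://foo/bar a [IT nice] link
print(string.sub(s, resume) == " now")    --> true
```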

To "evaluate" an expression like

[HREF http://foo/bar a link]

we only parse its "head" ("HREF") and then run the Lua function called HREF. It is up to HREF to parse what comes after the head (the "rest"): HREF may evaluate the []-expressions in the rest, use the rest without evaluating it, or even ignore the rest completely. After HREF runs, parsing resumes from the point after the associated "]".
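A toy version of this evaluation loop, under simplifying assumptions: every head is a global Lua function that receives its rest as one raw string, and text outside brackets is copied unchanged. The names expand, IT and COMMENT are illustrative only; blogme2.lua's real evaluator passes arguments through _A, as described in the next section.

```lua
-- A toy evaluator, not blogme2.lua's real one. Simplifying assumption:
-- each head is a global function that receives its raw rest as a string.
function expand(subj)
  local out, pos = {}, 1
  while true do
    local b, e = string.find(subj, "%b[]", pos)
    if not b then break end
    out[#out + 1] = string.sub(subj, pos, b - 1)     -- plain text before the "["
    local head, rest =
      string.match(string.sub(subj, b + 1, e - 1), "^%s*([^%s%[%]]+)%s*(.*)$")
    local f = head and _G[head]
    if type(f) ~= "function" then error("Unknown head: " .. tostring(head)) end
    out[#out + 1] = (f(rest)) or ""                  -- the head decides what its rest means
    pos = e + 1                                      -- resume after the matching "]"
  end
  out[#out + 1] = string.sub(subj, pos)              -- plain text after the last block
  return table.concat(out)
end

-- IT evaluates the []-expressions in its rest...
function IT(rest) return "<i>" .. expand(rest) .. "</i>" end
-- ...while COMMENT ignores its rest completely.
function COMMENT(rest) return "" end

print(expand("a [IT b [IT c]] d[COMMENT [whatever] here]"))
--> a <i>b <i>c</i></i> d
```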

How []-expressions are evaluated

Actually the evaluation process is a bit more subtle than that. In the last example, BlogMe doesn't just execute HREF(); it uses an auxiliary table, _A, and it executes:

HREF(_A["HREF"]())

_A["HREF"] holds a function, vargs2, that uses the rest to produce arguments for HREF. Running vargs2() in that situation returns

"http://foo/bar", "a link"

and HREF is called as HREF("http://foo/bar", "a link"). So, to define HREF as a head all we would need to do ("would" because it's already defined) is:

HREF = function (url, text)
    return "<a href=\""..url.."\">"..text.."</a>" end
_A["HREF"] = vargs2
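A simplified stand-in for vargs2, to check by hand that HREF receives whatever _A["HREF"] produces from the rest. vargs2_sketch is hypothetical: the real vargs2 is called with no arguments and reads the rest from the parser's global state, while this one receives the rest as an explicit string, returns its first word and everything after it, and does not evaluate nested []-expressions.

```lua
-- vargs2_sketch is a hypothetical, simplified stand-in for vargs2. The real
-- vargs2 is called with no arguments and reads the rest from the parser's
-- state; this one gets the rest as a string, returns its first word and
-- everything after that word, and does not evaluate nested []-expressions.
function vargs2_sketch(rest)
  local first, others = string.match(rest, "^%s*(%S+)%s*(.-)%s*$")
  return first, others
end

HREF = function (url, text)
    return "<a href=\""..url.."\">"..text.."</a>" end
_A = _A or {}
_A["HREF"] = vargs2_sketch

-- What evaluating [HREF http://foo/bar a link] amounts to:
print(HREF(_A["HREF"]("http://foo/bar a link")))
--> <a href="http://foo/bar">a link</a>
```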

Defining new words in Lua with def

Defining new heads is so common - and writing out the full Lua code for a new head, as above, is so boring - that there are several tools to help us with that. I will explain only one of them, "def":

def [[ HREF 2 url,text  "<a href=\"$url\">$text</a>" ]]

"def" is a Lua function taking one argument, a string; it splits that string into its first three "words" (delimited by blanks) and a "rest". Here is its definition:

-- abbreviations accepted as the second word of a def
restspecs = {
  ["1"]=vargs1,    ["2"]=vargs2,    ["3"]=vargs3,    ["4"]=vargs4,
  ["1L"]=vargs1_a, ["2L"]=vargs2_a, ["3L"]=vargs3_a, ["4L"]=vargs4_a
}
def = function (str)
    -- split str into name, restspec, arglist, and body
    local _, __, name, restspec, arglist, body =
      string.find (str, "^%s*([^%s]+)%s+([^%s]+)%s+([^%s]+)%s(.*)")
    -- the head's function...
    _G[name] = lambda(arglist, undollar(body))
    -- ...and its argument-parsing function
    _A[name] = restspecs[restspec] or _G[restspec]
      or error("Bad restspec: "..name)
  end
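The definitions of lambda and undollar are not shown on this page. The self-contained sketch below runs def's splitting pattern on the HREF example and then builds the function with two hypothetical reimplementations, undollar_sketch and lambda_sketch, assuming that undollar turns each "$name" inside a string literal into a concatenation with the variable name and that lambda compiles "function (arglist) return body end".

```lua
-- def's splitting pattern, applied to the HREF example:
local str = [[ HREF 2 url,text  "<a href=\"$url\">$text</a>" ]]
local _, __, name, restspec, arglist, body =
  string.find(str, "^%s*([^%s]+)%s+([^%s]+)%s+([^%s]+)%s(.*)")
print(name, restspec, arglist)   -- HREF, 2 and url,text
print(body)                      -- the quoted HTML template, $url and $text included

-- Hypothetical reimplementations, NOT blogme2.lua's own undollar and lambda.
-- Assumption: "$name" only appears inside double-quoted strings, so it can
-- be rewritten as "..name.." (close the string, concatenate, reopen it).
function undollar_sketch(code)
  return (string.gsub(code, "%$([%a_][%w_]*)", "\"..%1..\""))
end
-- Assumption: lambda(arglist, body) compiles "function (arglist) return body end".
function lambda_sketch(args, code)
  local loader = loadstring or load   -- loadstring in Lua 5.1, load in 5.2+
  return assert(loader("return function (" .. args .. ") return " .. code .. " end"))()
end

local href = lambda_sketch(arglist, undollar_sketch(body))
print(href("http://foo", "bar"))   --> <a href="http://foo">bar</a>
```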

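The first "word" ("name") is the name of the head that we're defining; the second "word" ("restspec") determines the argument-parsing function for that head, _A[name], and it may be either a special string (one of the ones registered in the table "restspecs") or the name of a global function. The third "word" ("arglist") is the list of parameters, and the rest ("body"), after undollar has processed its "$"s, becomes the body of the new function.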

The internals of blogme2.lua:

The main tables used by the program

  • _G: Lua's table of globals
  • _W: blogme words
  • _P: low-level parsers
  • _A: argument-parsing functions for blogme words
  • _AA: abbreviations for argument-parsing functions (see `def')
  • _V: blogme variables (see "$" and `withvars')

Blogme words (the tables _W and _A)

(Source code: the function `run_head', at the end of blogme2-inner.lua.)

Let's examine an example. When blogme processes:

[HREF http://foo bar]

it expands it to:

<a href="http://foo">bar</a>

When the blogme evaluator processes a bracketed expression it first obtains the first "word" of the brexp (called the "head" of the brexp), which in this case is "HREF"; then it parses and evaluates the "arguments" of the brexp, and invokes the function associated with the word "HREF" using those arguments. Different words may have different ways of parsing and evaluating their arguments; this is like the distinction in Lisp between functions and special forms, and like special words such as LIT in Forth. Here are the hairy details: if HREF is defined by

HREF = function (url, str)
    return "<a href=\""..url.."\">"..str.."</a>" end
_A["HREF"] = vargs2

then the "value" of [HREF http://foo bar] will be the same as the value returned by HREF("http://foo", "bar"), because HREF is called with the values returned by _A["HREF"], that is, by vargs2. When vargs2 is run the parser is just after the end of the word "HREF" in the brexp, and running vargs2() there parses the rest of the brexp and returns two strings, "http://foo" and "bar".

See: (info "(elisp)Function Forms")
and: (info "(elisp)Special Forms")

The blogme parsers (the table _P)

(Corresponding source code: most of blogme2-inner.lua.)

Blogme has a number of low-level parsers, each one identified by a string (a "blogme pattern"); the (informal) "syntax" of those blogme patterns was vaguely inspired by Lua5's syntax for patterns. In the table below "BP" stands for "blogme pattern".

 BP   | Long name/meaning    | Corresponding Lua pattern
 "%s" | space char           | "[ \t\n]"
 "%w" | word char            | "[^%[%]]"
 "%c" | normal char          | "[^ \t\n%[%]]"
 "%B" | bracketed expression | "%b[]"
 "%W" | bigword              | "(%w*%b[]*)*" (but not the empty string!)

The low-level parsing functions of blogme are of two kinds (levels):

  • Functions in the "parse only" level only succeed or fail. When they succeed they return true and advance the global variable `pos'; when they fail they return nil and leave pos unchanged (*).
  • Functions in the "parse and process" level are like the functions in the "parse only" level, but with something extra: when they succeed they store in the global variable `val' the "semantic value" of the thing that they parsed. When they fail they are allowed to garble `val', but they won't change `pos'.

See: (info "(bison)Semantic Values")

These low-level parsing functions are stored in the table `_P', indexed by their "blogme patterns". They use the global variables `subj', `pos', `b', `e', and `val'.

An example: running _P["%w+"]() tries to parse a (non-empty) series of word chars starting at pos; running _P["%w+:string"]() does the same, but in case of success the semantic value is stored into `val' as a string -- the comment ":string" in the name of the pattern indicates that this is a "parse and process" function, and tells something about how the semantic value is built.
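A sketch of the two conventions, assuming the globals subj, pos and val described above and taking "%w" to be "[^%[%]]" as in the table. These two entries are illustrative and are not copied from blogme2-inner.lua.

```lua
-- Illustrative entries, not copied from blogme2-inner.lua. They use the
-- globals subj, pos and val, and take "%w" to be "[^%[%]]" as in the table.
_P = _P or {}

-- "parse only": on success advance pos and return true;
-- on failure return nil and leave pos untouched.
_P["%w+"] = function ()
  local newpos = string.match(subj, "^[^%[%]]+()", pos)
  if newpos then pos = newpos; return true end
  return nil
end

-- "parse and process": the same, plus the semantic value in val.
_P["%w+:string"] = function ()
  local oldpos = pos
  if _P["%w+"]() then
    val = string.sub(subj, oldpos, pos - 1)
    return true
  end
  return nil
end

subj, pos = "foo bar[IT x]", 1
local ok = _P["%w+:string"]()
print(ok, pos, val)      -- true  8  foo bar
ok = _P["%w+"]()
print(ok, pos)           -- nil  8  (a "[" is not a word char, so pos stays put)
```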

(*): Blogme patterns containing a semicolon (";") violate the convention that says that patterns that fail do not advance pos. Parsing "A;B" means first parsing "A", not caring whether it succeeds or fails, discarding its semantic value (if any), then parsing "B", and returning the result of parsing "B". If "A" succeeds but "B" fails then "A;B" will fail, but pos will have been advanced to the end of "A". "A" is usually "%s*".
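The "A;B" rule can be sketched as a combinator. semicolon_sketch and close_bracket_sketch are hypothetical names, and the "%s*" parser is written by hand for the example; the point is that pos is not restored when B fails.

```lua
-- semicolon_sketch is a hypothetical helper for "A;B", and the two parsers
-- below are written by hand for this example; all use the globals subj/pos/val.
_P = _P or {}

-- "%s*" always succeeds, possibly consuming nothing.
_P["%s*"] = function ()
  pos = string.match(subj, "^[ \t\n]*()", pos)
  return true
end

-- A parser for a single "]", only for the demonstration.
function close_bracket_sketch()
  if string.sub(subj, pos, pos) == "]" then pos = pos + 1; return true end
  return nil
end

-- "A;B": run A without caring whether it succeeds, discard its semantic
-- value, then return whatever B returns. pos is NOT restored if B fails.
function semicolon_sketch(A, B)
  return function ()
    A()
    val = nil
    return B()
  end
end

local p = semicolon_sketch(_P["%s*"], close_bracket_sketch)
subj, pos = "   x", 1
local ok = p()
print(ok, pos)   -- nil  4  (B failed, but A's three spaces were consumed)
```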


(To do: write this stuff, organize.)


There is no .tar.gz yet (coming soon!).

Help needed

Lua seems to be quite popular in the M$-Windows world, but I haven't used W$ for anything significant since 1994, so I can't help with W$-related questions. If you want to try BlogMe on W$ then please consider writing something about your experience to help the people coming after you.


A BlogMe mode for Emacs and a way to switch modes quickly (with M-m).

A note on usage (see the corresponding source code):

blogme2.lua -o foo.html -i foo.blogme

This behaves in a way that is a bit unexpected: what gets written to foo.html is not the result of "expanding" the contents of foo.blogme - it's the contents of the variable blogme_output. The function (or "blogme word") htmlize sets this variable. Its source code is here.
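To make that behavior concrete, here is a hypothetical outline; only the names htmlize and blogme_output come from the text above. The stand-in htmlize, its (title, body) arguments, the page template and output_for are all made up for illustration; the real htmlize's arguments are not documented here.

```lua
-- Hypothetical outline, not blogme2.lua's code. Only the names htmlize and
-- blogme_output come from the text; output_for, the expand argument and
-- the page template are made up for illustration.
blogme_output = nil

-- Stand-in for the blogme word htmlize: it stores a whole page in
-- blogme_output as a side effect, and itself expands to "".
function htmlize(title, body)
  blogme_output = "<html><head><title>" .. title .. "</title></head>\n" ..
                  "<body>\n" .. body .. "\n</body></html>\n"
  return ""
end

-- What "-o foo.html" receives: the value of blogme_output after the
-- input has been expanded, not the result of the expansion itself.
function output_for(expand, text)
  blogme_output = nil
  expand(text)                 -- the expansion's own value is thrown away
  return blogme_output
end

-- With an input that calls htmlize, the output is a full page...
print(output_for(function (t) return htmlize("Foo", t) end, "Hello"))
-- ...and with one that doesn't, blogme_output stays nil.
print(output_for(function (t) return t end, "Hello"))   --> nil
```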

History: BlogMe is the result of many years playing with little languages; see this page. BlogMe borrowed many ideas from Forth, Tcl and Lisp.

How to get in touch with the author.