These notes are the very bare beginnings of a technical report. I felt
that it would be immoral to keep them to myself until I could publish
them, as that could take months or years; so, here they are. Enjoy,
and please get in touch if you have any comments.

  Eduardo Ochs

Bootstrapping a Forth-like language in 50 lines of Lua code

If we define a Forth-like language as being one in which the
interpreter parses a word, executes it immediately and repeats the
process indefinitely, then the code below is an implementation of a
Forth-like language:

  res = {}
  -- Compile a regex once and cache it in res under the given name.
  re = function( res, name, def )
    if res[name] then return res[name] end
    res[name] = regex(def)
    return res[name]
  end
  re(res, "getline",   "^([^\n]*)(\n?)")
  re(res, "getspaces", "^([^ \t]*)")
  re(res, "getword",   "^[ \t]*([^ \t\n]*)")

  program = {}
  program.string = readfile(arg[1])
  program.pos = 0

  -- Skip leading blanks and return the next word ("" at an end of line).
  getword = function( )
    local _, mall, m1 = regmatch(res.getword, program.string, program.pos)
    program.pos = program.pos + strlen(mall)
    return m1
  end
  -- Return the rest of the current line and move past its newline;
  -- return nil when nothing is left.
  getline = function( )
    local _, mall, m1, nl = regmatch(res.getline, program.string, program.pos)
    program.pos = program.pos + strlen(mall)
    if mall ~= "" then return m1 end
  end
  -- Return the text up to the next match of delimre and move past the match.
  getuntilre = function( delimre )
    local offset, mdelim =
      regmatch(re(res, delimre, delimre), program.string, program.pos)
    local m1 = strsub(program.string, program.pos+1, program.pos+offset)
    program.pos = program.pos+offset+strlen(mdelim)
    return m1
  end

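  dict = {}
  dict[""] = function( ) getline() end
  -- Parse a delimiter word, then evaluate everything up to its next
  -- occurrence as Lua code.
  dict["lua-until"] = function( )
    dostring(getuntilre(getword()))
  end

  -- Main loop: parse a word, look it up in the dictionary, execute it.
  while 1 do
    dict[getword()]()
  end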

The last block is the main loop: it parses a word with getword(),
converts it to a function by looking it up in the dictionary, and
executes that function. The second-to-last block defines the only two
words with which the dictionary starts. The first is "", which is
executed every time the parser reaches an end of line and simply
advances the parser pointer (stored in program.pos) past the
end-of-line char. The second is "lua-until", which parses a string up
to a given delimiter and evaluates that string as Lua code. The idea
is that we can use that code to add more words to the dictionary, to
replace the interpreter's main loop by something else, or whatever;
thus "lua-until" is essentially all that is needed to bootstrap a more
powerful system.
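
As a rough illustration of the two blocks just described, here is a
sketch of the same bootstrap interpreter in standard Lua (5.1 or
later). It is not the original code: it assumes Lua string patterns in
place of regex/regmatch, load (or loadstring) in place of dostring, a
plain-string delimiter in a hypothetical getuntil, and a hypothetical
run function that reads the program from a string and stops at the end
of the input instead of looping forever.

```lua
-- Sketch of the two-word bootstrap interpreter in standard Lua.
-- Assumptions: string patterns replace regex/regmatch, load/loadstring
-- replaces dostring, and run() stops at the end of the input.
local loadcode = loadstring or load   -- loadstring only exists in Lua 5.1

program = { string = "", pos = 0 }    -- pos = number of characters consumed
dict = {}

-- Skip blanks and tabs, then return the next word ("" at an end of line).
function getword()
  local s, e, word = string.find(program.string, "^[ \t]*([^ \t\n]*)", program.pos + 1)
  program.pos = e
  return word
end

-- Consume the rest of the current line, including its newline.
function getline()
  local s, e, line = string.find(program.string, "^([^\n]*)\n?", program.pos + 1)
  program.pos = e
  return line
end

-- Return the text up to the next occurrence of the plain string delim
-- and move past the delimiter.
function getuntil(delim)
  local s, e = string.find(program.string, delim, program.pos + 1, true)
  assert(s, "delimiter not found: " .. delim)
  local text = string.sub(program.string, program.pos + 1, s - 1)
  program.pos = e
  return text
end

-- The only two words the dictionary starts with.
dict[""] = function() getline() end
dict["lua-until"] = function() assert(loadcode(getuntil(getword())))() end

-- Main loop: parse a word, look it up, execute it.
function run(source)
  program = { string = source, pos = 0 }
  while program.pos < #program.string do
    local word = getword()
    local action = dict[word] or error("unknown word: " .. word)
    action()
  end
end

run([[
lua-until EOL
  greeting = "Hello"
EOL
]])
print(greeting)   -- prints Hello
```

The evaluated code runs in the global environment, which is why
program and dict are kept global here: a lua-until block can then
reach them and add new words.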

The execution of lua-until is a bit tricky, so let's see it in detail.
Consider the following miniforth program:

  lua-until EOL
   print("Hello")
  exit()
  EOL
  this is not executed

The meaning of "lua-until" is given by

  dict["lua-until"] = function( )
    dostring(getuntilre(getword()))
  end

so the execution of lua-until in the block above consists of parsing a
word ("EOL", in that case), then running getuntilre("EOL") to parse
everything up to its next occurrence -- getuntilre("EOL") will return
the string '\n print("Hello")\nexit()\n' -- and evaluating that with
dostring, which will print "Hello" and leave miniforth. Note that the
parser won't ever touch what comes after the second EOL -- the "this
is not executed".
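
The parsing step of that walk-through can be checked in isolation. The
sketch below is in standard Lua and only splits the program text; it
does not evaluate it. The function getuntil is a hypothetical
plain-string stand-in for getuntilre, and the variable names are
chosen for illustration only.

```lua
-- Sketch only: getuntil is a hypothetical plain-string stand-in for getuntilre.
-- It returns the text between position pos and the next occurrence of delim,
-- plus the position of the last character of that delimiter.
function getuntil(source, pos, delim)
  local s, e = string.find(source, delim, pos + 1, true)
  assert(s, "delimiter not found: " .. delim)
  return string.sub(source, pos + 1, s - 1), e
end

source = 'lua-until EOL\n print("Hello")\nexit()\nEOL\nthis is not executed\n'
pos = string.len("lua-until EOL")       -- the parser has just read the word "EOL"
code, newpos = getuntil(source, pos, "EOL")
rest = string.sub(source, newpos + 1)   -- never parsed once exit() has run

assert(code == '\n print("Hello")\nexit()\n')
assert(rest == '\nthis is not executed\n')
```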

This is an example of a slightly less trivial miniforth program in
which the lua-until block is used to define two new words:

  lua-until EOL
    dict["hello"] = function( ) print("hello") end
    dict["bye"]   = exit
  EOL

This is another one, in which we define two words that parse the
following words themselves (actually `#' parses all the rest of the
current line). Note that `p' evaluates the word as Lua code, and so it
is fairly versatile; "p exit()", for example, leaves miniforth.

  lua-until EOL
    dict["p"] = function( ) pa(eval(getword())) end
    dict["#"] = getline
  EOL
  p "Hello"  p 1+2  p dict  # comment
  p exit()
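
To make the "words that parse the following words" idea concrete, here
is a self-contained sketch in standard Lua (5.1 or later). It makes
several assumptions that are not in the original: eval is implemented
with load("return " .. expr), the words are defined directly in Lua
rather than through a lua-until block, a hypothetical run function
stops at the end of the input, and instead of printing with pa the
word `p' appends its result to a hypothetical table called output so
that the effect can be inspected.

```lua
-- Sketch of parsing words; eval, run and output are assumptions.
local loadcode = loadstring or load   -- loadstring only exists in Lua 5.1

program = { string = "", pos = 0 }
dict = {}
output = {}                           -- collects what p would display

-- Skip blanks and tabs, then return the next word ("" at an end of line).
function getword()
  local s, e, word = string.find(program.string, "^[ \t]*([^ \t\n]*)", program.pos + 1)
  program.pos = e
  return word
end

-- Consume the rest of the current line, including its newline.
function getline()
  local s, e, line = string.find(program.string, "^([^\n]*)\n?", program.pos + 1)
  program.pos = e
  return line
end

-- Evaluate a string as a Lua expression and return its value.
function eval(expr)
  return assert(loadcode("return " .. expr))()
end

dict[""]  = function() getline() end
-- p parses the next word itself and evaluates it as Lua code.
dict["p"] = function() table.insert(output, tostring(eval(getword()))) end
-- # parses (and discards) the rest of the current line.
dict["#"] = getline

function run(source)
  program = { string = source, pos = 0 }
  while program.pos < #program.string do
    local word = getword()
    local action = dict[word] or error("unknown word: " .. word)
    action()
  end
end

run('p "Hello"  p 1+2  # comment p 99\np 2*21\n')
-- output now holds "Hello", "3" and "42"; the "p 99" after # was skipped.
```

Since getword splits on blanks, the argument of `p' must not contain
spaces in this sketch.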

And this is the classic ": square dup * ; : cube dup square * ;"
example -- but without bytecodes:

  lua-until EOL
    dstack = {}
    rstack = {program}
    dpush = function( val )  tinsert(dstack, 1, val) end
    dpop  = function( )      return tremove(dstack, 1) end
    rpush = function( prog ) tinsert(rstack, 1, prog); program = prog end
    rpop  = function( )      tremove(rstack, 1); program = rstack[1] end
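    dict[""] = function( )
        -- End of the current program: return to the caller. Otherwise skip
        -- the end of line, as before (this else branch is an assumption).
        if program.pos == strlen(program.string) then rpop() else getline() end
    end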

    f = function( code ) rpush({string=code, pos=0}) end

    re(res, ";;", "[ \t\n];;([ \t\n]|$)")
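    -- ":: word body ;;" defines a word whose body is miniforth code.
    dict["::"] = function( )
        local word, code = getword(), getuntilre(";;")
        dict[word] = function( ) f(%code) end
    end
    -- "::lua word body ;;" defines a word whose body is Lua code.
    dict["::lua"] = function()
        local word, code = getword(), getuntilre(";;")
        dict[word] = dostring(format("return function() %s\nend", code))
    end
  EOL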

  ::lua * dpush(dpop()*dpop()) ;;
  ::lua dup dpush(dstack[1]) ;;
  ::lua . pa(dpop()) ;;
  ::lua val dpush(eval(getword())) ;;

  :: square dup * ;;
  :: cube dup square * ;;
  val 5 cube .

  val exit()
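
The same square/cube example can be sketched in standard Lua (5.1 or
later) to show how the return stack drives nested calls. This is an
illustration under stated assumptions, not the original code: string
patterns replace regex/regmatch, load/loadstring replaces dostring, the
words "::" and "::lua" are defined directly in Lua, `val' uses tonumber
instead of eval, and the hypothetical helpers getuntil_semis and run
are introduced here. In this sketch the "" word either returns to the
caller (at the end of the current program) or skips the end of line,
and run stops when the return stack becomes empty.

```lua
-- Sketch of colon definitions with a return stack; names not in the
-- original (getuntil_semis, run) are assumptions.
local loadcode = loadstring or load   -- loadstring only exists in Lua 5.1

dict, dstack, rstack = {}, {}, {}
program = nil

-- Both stacks keep their top element at index 1, as in the original.
function dpush(val)  table.insert(dstack, 1, val) end
function dpop()      return table.remove(dstack, 1) end
function rpush(prog) table.insert(rstack, 1, prog); program = prog end
function rpop()      table.remove(rstack, 1); program = rstack[1] end

function getword()
  local s, e, word = string.find(program.string, "^[ \t]*([^ \t\n]*)", program.pos + 1)
  program.pos = e
  return word
end

function getline()
  local s, e = string.find(program.string, "^[^\n]*\n?", program.pos + 1)
  program.pos = e
end

-- Return the text up to the next ";;" that stands between whitespace
-- (or at the very end of the program) and move past it.
function getuntil_semis()
  local text = program.string
  local s, e = string.find(text, "[ \t\n];;[ \t\n]", program.pos + 1)
  if not s then s, e = string.find(text, "[ \t\n];;$", program.pos + 1) end
  assert(s, "missing ;;")
  local body = string.sub(text, program.pos + 1, s - 1)
  program.pos = e
  return body
end

-- At the end of the current program, return to the caller;
-- otherwise just skip the end of line.
dict[""] = function()
  if program.pos >= #program.string then rpop() else getline() end
end
-- A "::" word runs its body as a new program pushed on the return stack.
dict["::"] = function()
  local word = getword()
  local code = getuntil_semis()
  dict[word] = function() rpush({ string = code, pos = 0 }) end
end
-- A "::lua" word compiles its body into a Lua function.
dict["::lua"] = function()
  local word = getword()
  local code = getuntil_semis()
  dict[word] = assert(loadcode("return function() " .. code .. "\nend"))()
end

function run(source)
  rstack = {}
  rpush({ string = source, pos = 0 })
  while program do
    local word = getword()
    local action = dict[word] or error("unknown word: " .. word)
    action()
  end
end

run([[
::lua * dpush(dpop()*dpop()) ;;
::lua dup dpush(dstack[1]) ;;
::lua val dpush(tonumber(getword())) ;;
:: square dup * ;;
:: cube dup square * ;;
val 5 cube
]])
-- dstack[1] is now 125
```

Calling "cube" pushes its body on the return stack; "square" pushes a
second program on top of it, and each "" at the end of a body pops back
to the caller, which resumes right after the word that made the call.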

# (find-fline "~/miniforth/")
# (find-fline "~/miniforth/miniforth1.lua")