-- Some notes for a text on blogme3, that will be implemented as an
-- extension of miniforth.
-- Links:
--   http://angg.twu.net/blogme3.html
--   http://angg.twu.net/miniforth/blogme3.txt
--   http://angg.twu.net/miniforth/blogme3.txt.html
--   http://angg.twu.net/miniforth/miniforth3.lua
--   http://angg.twu.net/littlelangs.html

def [[ * 2 a,b  return a*b ]]
def [[ + 2 a,b  return a+b ]]
[* [+ 1 2] [+ 3 4]]

[if ...]


A more precise description
==========================
The core of Blogme is made of a parser that recognizes a very simple
language, and an interpreter coupled to the parser; as the parser goes
on processing the input text, the interpreter takes the outputs of the
parser and interprets them immediately.

This core engine should be thought of as having layers. At the base, a
(formal) grammar; then functions that parse and recognize constructs
from that grammar; then functions that take what the parser reads,
assemble that into commands and arguments for those commands, and
execute those commands.

I think that the best way to describe Blogme is to describe these
three layers and the implementation of the top two layers - the
grammar layer doesn't correspond to any code.

Looking at the actual code of the core is very important; the core is
not a black box at all - the variables are made to be read and changed
by user scripts, and most functions are intended to be replaced by the
user eventually, either by less simplistic versions with more
features, or, sometimes, by functions only thinly connected to the
original ones.


Influences and rationale
========================
I know that it sounds pretentious to say this, but it's true: Blogme
descends from three important "extensible" programming languages -
Forth, Lisp, and Tcl - and borrows some of its ideas from each of
them.

(1) Forth. This is a Forth program that prints "3 Hello20":

  1 2 + . ." Hello" 4 5 * .
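
To make the program above concrete, here is a toy sketch in Python of
an interpreter that reads one whitespace-delimited word at a time and
executes it immediately. It is not real Forth and not Blogme code: the
names run_forth, next_word, dot_quote and the "state" dictionary are
made up for this illustration, and only the five words used in the
example are defined. The point to notice is that the code for `."'
moves the input position by itself.

```python
import io
import operator


def run_forth(src):
    """Toy interpreter (an illustration only): read a word, run it, repeat."""
    out = io.StringIO()
    stack = []
    state = {"src": src, "pos": 0}  # "pos" is the position in the input text

    def next_word():
        # Skip leading whitespace, then collect one word.
        s, pos = state["src"], state["pos"]
        while pos < len(s) and s[pos].isspace():
            pos += 1
        start = pos
        while pos < len(s) and not s[pos].isspace():
            pos += 1
        # Leave pos just after the single delimiter that follows the word.
        state["pos"] = min(pos + 1, len(s))
        return s[start:pos] or None

    def dot():
        # Print (and drop) the number at the top of the stack, then a space.
        out.write(f"{stack.pop()} ")

    def dot_quote():
        # This word reads the input text itself: everything from pos up to
        # the next double quote is printed, and pos is moved past the quote.
        s, pos = state["src"], state["pos"]
        end = s.index('"', pos)
        out.write(s[pos:end])
        state["pos"] = end + 1

    def binary(op):
        def word():
            b = stack.pop()
            a = stack.pop()
            stack.append(op(a, b))
        return word

    words = {
        "+": binary(operator.add),
        "*": binary(operator.mul),
        ".": dot,
        '."': dot_quote,
    }

    while (word := next_word()) is not None:
        if word in words:
            words[word]()
        else:
            stack.append(int(word))  # anything else is assumed to be a number
    return out.getvalue()


print(run_forth('1 2 + . ." Hello" 4 5 * .'))
```

The main loop never sees the text Hello" as a word, because dot_quote
has already moved pos past it when control returns to the loop.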

Forth reads one word at a time and executes it immediately (sometimes
it "compiles" the word instead of running it, but we can ignore this
for now). `.' is a word that prints the number at the top of the
stack, followed by a space; `."' is a word that prints a string. It is
a tricky word because it _interferes with the parsing_ to get the
string to be printed. I have always thought that this permission to
interfere with the parsing was one of Forth's most powerful features,
and I have long wondered how to implement something like that - maybe
as an extension - in other languages.

So: the Forth interpreter (actually the "outer interpreter", in
Forth's jargon; the "inner interpreter" is the one that executes
bytecodes) reads the word `."' and calls the associated code to
execute it. At that point the pointer to the input text - let's call
it "pos" - is after the space after the `."', that is, at the `H'. The
code for `."' advances pos past the `Hello"' and prints the "Hello";
after that, control returns to the outer interpreter, which happily
goes on to interpret "4 5 * .", without ever touching the `Hello"'.

(2) Lisp. In Lisp all data structures are built from "atoms" (numbers,
strings, symbols) and "conses"; a list like (1 2 3) is a cons - a
pair - holding the "first element of the list", 1, and the "rest of
the list", which is the cons that represents the list (2 3). Trees are
also built from conses and atoms, and programs are trees - there is no
distinction between code and data. The Lisp parser is very simple, and
most of the semantics of Lisp lies in the definition of the "eval"
function.
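
The cons structure described above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions:
conses are modelled as Python 2-tuples, the empty list as None, and
the names cons, NIL, lisp_list, to_python and tiny_eval are invented
here; tiny_eval handles only "+" and "*" applied to numbers, so it is
far from a real Lisp "eval".

```python
from functools import reduce
import operator

NIL = None  # stands in for the empty list


def cons(first, rest):
    """A cons is just a pair: first element, rest of the list."""
    return (first, rest)


def lisp_list(*items):
    """Build the chain of conses that represents a list."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def to_python(cell):
    """Walk a chain of conses and collect the elements in a Python list."""
    items = []
    while cell is not NIL:
        first, cell = cell
        items.append(first)
    return items


FUNCTIONS = {"+": operator.add, "*": operator.mul}


def tiny_eval(expr):
    """Evaluate a program that is itself a tree of conses and atoms."""
    if not isinstance(expr, tuple):  # an atom: here, always a number
        return expr
    name, args = expr
    values = [tiny_eval(arg) for arg in to_python(args)]
    return reduce(FUNCTIONS[name], values)


one_two_three = lisp_list(1, 2, 3)
print(one_two_three)      # (1, (2, (3, None)))
print(one_two_three[1])   # the "rest of the list": (2, (3, None))

# The program (* (+ 1 2) (+ 3 4)) is built with the same conses as data.
program = lisp_list("*", lisp_list("+", 1, 2), lisp_list("+", 3, 4))
print(tiny_eval(program))  # 21
```

The same lisp_list function builds both the data list and the program,
which is the sense in which code and data are not distinguished.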

The main idea that I borrowed from Lisp's "eval" is that of having two
kinds of evaluation strategies: in

  (* (+ 1 2) (+ 3 4))

the "*" is a "normal" function, which receives the _results_ of
(+ 1 2) and (+ 3 4) and returns the result of multiplying those two
results; but in

  (if flag (message "yes") (message "no"))

the "if" is a "special form", which receives its three arguments
unevaluated, then evaluates the first one, "flag", to decide whether
it is going to evaluate the second one or the third one.

(3) Tcl. In Tcl the main data structure is the string, and Tcl doesn't
even have the distinction that Lisp has between atoms and conses - in
Tcl numbers, lists, trees and program code are just strings that can
be parsed in certain ways. Tcl has an evaluation strategy, given by 11
rules, that describes how to "expand", or "substitute", the parts of
the program that are inside ""s, []s, and {}s (plus rules for "$"s for
variables, "#"s for comments, and a few other things). The
""-contexts and []-contexts can nest inside one another, and what is
between {}s is not expanded, except for a few backslash sequences. In
a sense, what is inside []s is "active code", to be evaluated
immediately, while what is inside {}s is "passive code", to be
evaluated later, if at all. Here are some examples of Tcl code:

  set foo 2+3
  set bar [expr 2+3]
  puts $foo=$bar                  ;# Prints "2+3=5"
  proc square {x} { expr $x*$x }
  puts "square 5 = [square 5]"    ;# Prints "square 5 = 25"

Blogme descends from a "language" for generating HTML that I
implemented on top of Tcl in 1999; it was called TH. The crucial
feature of Tcl on which TH depended was that _in ""-expansions the
whitespace is preserved, but []-blocks are evaluated_. TH scripts
could be as simple as this:

  htmlize {Title of the page} {
    [P A paragraph with a [HREF http://foo/bar/ link].]
  }

but it wasn't hard to construct slightly longer TH scripts in which a
part of the "body of the page" - the second argument to htmlize -
would become, say, an ASCII diagram that would be formatted as a
<pre>...</pre> block in the HTML output, keeping all the whitespace
that it had in the script.

That would be a bit hard to do in Lisp; _it is only trivial to
implement new languages on top of Lisp when the code for programs in
those new languages is made of atoms and conses_. I wanted something
more free-form than that, and I couldn't do it in Lisp because the
Lisp parser can't be easily changed. Also, sometimes, if a portion of
the source script became, say, a cons, I would like to be able to take
this cons and discover which part of the source script it came from.
In Blogme this is trivial to do, as []-blocks in the current Blogme
scripts are represented simply by a number - the position in the
script just after the "[".

The Lisp parser can't be easily changed to