/*
 * descrip
 * 95 jun 28
 *
 * an extract from a letter I sent describing Crim;
 * might be a good introduction.
 *

  I've created an alternative inner interpreter that allows for much
smaller code and it's easier to learn than the traditional ones; in
fact, I have teached it successfully to people that never programmed
before in about a half-hour. Its main characteristics are:

  * CFAs (I call them "heads") are only one byte long.

  * A word may have many heads (I borrowed this idea from HS-Forth; I
still don't know where it appeared first). For example, HEX 99 VALUE A
may compile this:

  +--+--+--+--+--+--+--+
  |05|04|03|  00000099 |
  +==+==+==+--+--+--+--+
  09 0A 0B

    Then, calling the word that starts at 0B (the one with 03 as head)
fetches the value of A onto the stack; calling 0A stores the last
value on the stack into A, and calling 09 puts the address of the data
region of A (0C) on the stack.

  * The inner interpreter becomes a state machine, with at least two
states (more about them soon). We consider that it always interprets
what it finds at IP, using the state to decide its meaning: In
"threaded" state, it reads either one byte, in the case of the short
built-in instructions, or two bytes, that are interpreted as an
address to be called; in "head" state, it reads one byte and executes
some corresponding action. For example, a head of 00 just advances IP
and changes the state to "threaded". The engine is written as to be
easy to have new head actions, new short instructions and new states
added. In particular, it should be easy to add a state that acts
exactly like your favourite high-performance inner interpreter,
another that runs, say, SPITBOL threaded code, and so on.

  * To simplify the creation of words that read the data that follow
their calls, like <."> and 0BRANCH, we add a third stack, called the
"streams stack", or S-stack. Instead of creating <."> directly, we
first create S<.">, that reads the lenght of a string and then the
string itself from the address at the top of the S-stack, instead of
from some position at the R-stack. Then we define <."> as being the
"RSR'ing" version of S<.">, that is:

    1) move the address past the call from the R-stack to the S-stack,

    2) place the address of an S-EXIT at the R-stack, so that S<.">
will execute that S-EXIT after it returns; then the S-EXIT will
transfer the execution to the address at the S-stack, making we return
to the position after the string.

    3) execute S<.">; it will return to an S-EXIT, doing the action
described above.

    It sounds messy at first, but, anyway, it's an advanced feature,
and it can unify many types of parsing that are usually kept apart.


  I've done a working prototype, using PFE on Linux. The inner engine
and some related words are implemented in C as user-defined
primitives, in less than 200 lines; all dumping, debugging,
single-stepping and compiling tools are implemented in forth, again in
less that 200 lines. The syntax for compiling a program for the engine
is like this:

  : S@U1, ( s // -- s+1 // 0..255 )
      S> DUP C@ SWAP 1+ >S ;
  : S@S1, ( s // -- s+1 // -128..127 ) S@U1, 128 SIGN-EXTEND ;

  : SBRANCH ( s // -- s+1+displ // ) S@S1, S> + >S ;

              ' DUP     %-> %DUP      \ "herits" DUP from PFE
              ' *       %-> %*
  RSR BRANCH  ' SBRANCH %-> %SBRANCH  \ two words glued into a word
                                      \ with two heads
  %: %SQUARE  %DUP %* %;

  I'm working on a version with normal Forth syntax, but it's not
ready yet. The present version, that is meant to expose the ideas and
to attrach collaborators, is public domain and is available for
anonymous FTP at saci.mat.puc-rio.br (139.82.27.51), directory
/pub/crim, file crim000.tgz. The name "Crim" is just a short for
"Crimble Crumble", a character in one of Syd Barrett's songs, and have
little or no relation with either "Cream" or "Crime".