/*
 * ETC
 *   95 jun 27
 *
 * I've put in this file all information that didn't fit anywhere else.
 *

The basic ideas:

  1) The virtual machine has a "state" that specifies how it will
execute what it finds at *ip. In this way we can unify the notions of
executing the head of a word (I think it's called a "CFA" in
traditional terminology), and that of deciding if the code is a short
instruction or a function call and executing it. That makes the
machine less intricate and more extensible, but probably slower.

  2) Word heads are one byte long, and a word may have many heads. I
took the idea of words with many heads from HS-Forth.

  3) We have a third stack, the streams stack, that is used both for
general parsing and for parsing immediate arguments, like the string
following a <.">. See example 2.
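
A rough C sketch of idea 1, where one "state" variable decides how the
byte at *ip is interpreted. The opcode values are the ones listed
further down in this file; every name (vm, vm_step, ST_HEAD, ...) is
made up for illustration, ip is kept in a field instead of on the
R-stack, and crim.c may well be organized differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Opcode values from the tables in this file; the names are hypothetical.
enum { H_CRIM = 0x00, S_LIT4 = 0xFD, S_END_CRIM = 0xFC };
enum vm_state { ST_HEAD, ST_CRIM, ST_STOPPED };

struct vm {
    const uint8_t *ip;      // next byte to interpret
    enum vm_state state;    // how to interpret *ip
    int32_t ds[8];          // data stack
    int dsp;                // number of cells on the data stack
};

// Interpret one byte at *ip according to the current state.
static void vm_step(struct vm *vm) {
    uint8_t b = *vm->ip;
    if (vm->state == ST_HEAD) {
        // Only the Crim head is sketched: skip it, switch to Crim mode.
        assert(b == H_CRIM);
        vm->ip += 1;
        vm->state = ST_CRIM;
    } else if (vm->state == ST_CRIM) {
        if (b == S_LIT4) {
            // A four-byte literal in native byte order follows the opcode.
            int32_t lit;
            assert(vm->dsp < 8);
            memcpy(&lit, vm->ip + 1, sizeof lit);
            vm->ds[vm->dsp++] = lit;
            vm->ip += 5;
        } else {
            // Besides LIT4, only END-CRIM is sketched.
            assert(b == S_END_CRIM);
            vm->state = ST_STOPPED;
        }
    }
}

// Run from `start` in the given initial state until END-CRIM.
static void vm_run(struct vm *vm, const uint8_t *start, enum vm_state st) {
    vm->ip = start;
    vm->state = st;
    while (vm->state != ST_STOPPED)
        vm_step(vm);
}
```

Running the seven bytes 00, FD, the number 6 in four bytes, FC from
the head state leaves 6 on the data stack and stops.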


The files:

ETC          - this file
letter       - a description of Crim (taken from a letter I sent)
announce     - the short announcement that appeared on comp.lang.forth

crim.c       - the inner interpreter and some auxiliary words
crimcomp.4th - used to enter Crim programs
autodoc.4th  - used to make "automatic documentation" for the examples

patchpfe     - instructions to link crim.c with PFE

1.4th        - first example - version that just compiles and runs
1-autod.4th  - long version, used to produce 1.aud
1.aud        - docs for the first example

2.4th        - second example
2-autod.4th
2.aud


------ Example 1 - how it would be in Forth -------

  4 VALUE A
  : SQUARE DUP * ;
  : ^2+6A SQUARE 6 A * + ;

  5 ^2+6A                  \ the result is 49


------ Example 2 - approximate translation, ignoring the weird features -------

  : CLASSIFY ( c -- )  TO CASE-V
      CASE-V ASCII 0 ASCII 9 IN[,] IF ." digit" ELSE
      CASE-V BL = IF ." space" ELSE
      ." not digit or space" THEN THEN ;
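
For readers more at home in C, the approximate CLASSIFY above could be
rendered as below. This is only a sketch: it assumes that IN[,] tests
an inclusive range, and it returns the message instead of printing it
so that the result can be checked.

```c
#include <assert.h>
#include <string.h>

// C rendering of the approximate CLASSIFY. The argument plays the
// role of CASE-V.
static const char *classify(char c) {
    if (c >= '0' && c <= '9')   // assumed meaning of IN[,]: inclusive range
        return "digit";
    if (c == ' ')               // BL is the blank (space) character
        return "space";
    return "not digit or space";
}

// Small self-check covering the three branches.
static void classify_selfcheck(void) {
    assert(strcmp(classify('7'), "digit") == 0);
    assert(strcmp(classify(' '), "space") == 0);
    assert(strcmp(classify('x'), "not digit or space") == 0);
}
```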


------ Example 2 - a more faithful translation, but using -------
------ some words you'll have to guess what they do       -------

RSR CZ=    : SCZ= ( a // -- a+2|a+2+j // ) CASE-V S@U1, = S0BRANCH ;
RSR CZ[,]  : SCZ[,] ( a // -- a+3|a+3+j // )
               CASE-V S@U1, S@U1, IN[,] S0BRANCH ;

: CLASSIFY ( c -- ) BEGIN-CASE
    CZ[,] [ ASCII 0 1,  ASCII 9 1, ]{ ." digit" }
    CZ=   [ BL 1, ]{ ." space" }
    ." not digit or space" END-CASE ;


Boxes with "==" on the floor are heads. Something like 12345678 (all
digits together) represents a four-byte number stored in the order
preferred by your machine; for 80x86s, 78 56 34 12.
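
A sketch of what "the order preferred by your machine" means in C.
The helper names are invented here; memcpy is used because a literal
embedded in Crim code can sit at any address, aligned or not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Store a four-byte number at p in the host's native byte order.
static void store4(uint8_t *p, uint32_t n) {
    memcpy(p, &n, sizeof n);
}

// Fetch a four-byte number previously stored by store4.
static uint32_t fetch4(const uint8_t *p) {
    uint32_t n;
    memcpy(&n, p, sizeof n);
    return n;
}

// Return 1 on a little-endian host such as the 80x86, 0 otherwise.
static int host_is_little_endian(void) {
    uint8_t b[4];
    store4(b, 1u);
    return b[0] == 1;
}

// Store 12345678 into cell; on a little-endian host the bytes end up
// as 78 56 34 12. The round trip holds on any host.
static void demo_layout(uint8_t cell[4]) {
    store4(cell, 0x12345678u);
    assert(fetch4(cell) == 0x12345678u);
}
```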

Meaning of the characters on the floors of the boxes:

  "==" - head
  "--" - Crim instruction
  ">>" - branch destination or string length
  ".." - anything else


Meaning of the inner engine instructions:

  heads:

  00 - Crim - pass to Crim mode
  01 - Forth - execute PFE word and return
  02 - RSR - see example 2
  03 - Value_1 (fetch)
  04 - Value_2 (store)
  05 - Value_3 (get address)

  Short instructions:

  FF - exit
  FE - s-exit - see example 2
  FD - four-byte literal
  FC - end-crim


Detailed description of some heads:

00 - Crim - ip+=1, switch to Crim mode

01 - Forth - call the PFE word whose Xt follows, and return

02 - RSR - (used in example 2) push RS[1] (the address where we should
return to) onto the Streams stack, replacing it with the address of an
S-EXIT; that is, instead of returning directly to RS[1] we return
(this is the action of S-EXIT) to the address stored at the top of SS,
which has probably been changed in the meantime. Check, in example 2,
how this allowed us to define BRANCH from SBRANCH using just one more
byte. These "S"-words are much easier to debug, and it's easier to
compose their actions.

"Return" always means "drop ip from the Rstack, then continue
execution in Crim mode".


Random notes about example 1:

Note that we are providing two extra heads for %A that are being
wasted. The second one (H_VALUE_2) executes as "TO A", setting %A's
value, and the third one (H_VALUE_3) executes like "' A", getting its
address. Also, the %SEXIT would only be used in example 2.
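
To make the idea of a word with three heads concrete, here is a C
sketch. The layout is an ASSUMPTION (three consecutive head bytes
followed directly by the four-byte cell), and so is the choice of
pushing the cell's offset for H_VALUE_3; check 1.aud and crim.c for
the real thing. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Head codes from the table in this file.
enum { H_VALUE_1 = 0x03, H_VALUE_2 = 0x04, H_VALUE_3 = 0x05 };

// ASSUMED layout of a value word such as %A:
//   offset 0: H_VALUE_1   offset 1: H_VALUE_2   offset 2: H_VALUE_3
//   offsets 3..6: the four-byte cell
struct dstack { int32_t cell[8]; int n; };

// Execute the value head found at mem[entry].
static void exec_value_head(uint8_t *mem, uint16_t entry, struct dstack *ds) {
    uint8_t head = mem[entry];
    int32_t v;
    assert(head >= H_VALUE_1 && head <= H_VALUE_3);
    // Each head sits one byte closer to the cell than the previous one.
    uint16_t data = (uint16_t)(entry + (H_VALUE_3 - head) + 1);
    switch (head) {
    case H_VALUE_1:                     // fetch, like "A"
        assert(ds->n < 8);
        memcpy(&v, mem + data, 4);
        ds->cell[ds->n++] = v;
        break;
    case H_VALUE_2:                     // store, like "TO A"
        assert(ds->n > 0);
        v = ds->cell[--ds->n];
        memcpy(mem + data, &v, 4);
        break;
    default:                            // H_VALUE_3: get the address
        assert(ds->n < 8);
        ds->cell[ds->n++] = data;       // offset of the cell (assumed)
        break;
    }
}
```

Entering the same word one byte later is all it takes to select a
different behaviour, which is why the extra heads cost only one byte
each.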


The notation used for the stack maps is

  rstack /// sstack // dstack
  rstack ///           dstack
             sstack // dstack
                       dstack
                              :: state


The default state transitions are ( :: head -- :: crim ) for heads and
( :: crim -- :: crim ) for non-heads. H_RSR is an exception:

H_CRIM   ( ip0 ///        :: head   --   ip0+1 ///        :: crim )

H_RSR    ( r ip0 ///      :: head   --   &sexit ip0+1 /// r // :: head )

S_EXIT   ( adr ip0 ///    :: crim   --   adr ///          :: crim )
S_SEXIT  ( ip0 /// adr // :: crim   --   adr ///          :: crim )
S_LIT4   ( ip0 ///        :: crim   --   ip0+5 /// lit4   :: crim )

S0BRANCH ( adr // 0 -- adr+1+drw_s1 // )
     or: ( adr // t -- adr+1 // )


A too-short tutorial on writing Crim programs
(preliminary, bare metal version)

Chapter one:

  To write the shortest possible Crim program, just write the code for
the instruction END-CRIM somewhere in memory, put its address in the
IP (by pushing it onto the R-stack, maybe), set CRIM-STATE to CRIM and
call the engine. It will execute this instruction immediately,
returning to PFE.

Chapter two:

  Let's write a slightly larger program: a single sequence of
instructions. Write the code for LIT4, then the number 6, using four
bytes, then the code for multiplying, which will probably be a
two-byte value, with the MOST SIGNIFICANT BYTE FIRST; that's important
to let the engine distinguish correctly between one-byte and two-byte
instructions. After that, write the code for END-CRIM. Put the address
of the LIT4 in IP, set CRIM-STATE to Crim, and start the engine. That
will multiply the number on top of the data stack by six and return to
PFE.

  You may have noticed that, unless we modify crim.c (a very
instructive task), we wouldn't have an instruction code for
multiplying. There's a workaround for this in the next chapter.
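
The byte layout of the chapter-two program, and the reason for putting
the most significant byte first, might be sketched in C like this. It
rests on an ASSUMPTION: that every first byte from FC up is a one-byte
instruction (this file only lists FC..FF; crim.c may draw the line
elsewhere). mul_code is a hypothetical two-byte code for multiplying.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { S_LIT4 = 0xFD, S_END_CRIM = 0xFC };
enum { FIRST_SHORT = 0xFC };   // ASSUMED start of the one-byte codes

// Length in bytes of the instruction starting at p, judged from its
// first byte alone. A two-byte code is stored most significant byte
// first, so any code below FIRST_SHORT * 256 starts with a byte below
// FIRST_SHORT and cannot be mistaken for a short instruction.
static int instr_len(const uint8_t *p) {
    if (p[0] == S_LIT4) return 5;          // opcode plus four-byte literal
    if (p[0] >= FIRST_SHORT) return 1;
    return 2;
}

// Decode a two-byte code, most significant byte first.
static uint16_t code2(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

// Assemble "LIT4 6, <mul>, END-CRIM" into buf (at least 8 bytes).
// Returns the number of bytes written.
static int assemble_times6(uint8_t *buf, uint16_t mul_code) {
    int32_t six = 6;
    int n = 0;
    assert(mul_code < FIRST_SHORT * 256);
    buf[n++] = S_LIT4;
    memcpy(buf + n, &six, 4);               // literal in native byte order
    n += 4;
    buf[n++] = (uint8_t)(mul_code >> 8);    // most significant byte first
    buf[n++] = (uint8_t)(mul_code & 0xFF);
    buf[n++] = S_END_CRIM;
    return n;
}
```

For mul_code = 0123 the buffer holds FD, the four bytes of 6, 01 23
and FC: eight bytes in all. Had the code been stored the other way
round, a code such as 01FD would begin with FD and look like a LIT4.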

Chapter three:

  Series of instructions that execute sequentially and end by
returning directly to Forth are not very useful, so we'll learn how to
write subroutines. The code for multiplication could be the address of
a subroutine for multiplication; that's exactly what we do, but to let
these addresses be only two bytes long, we write them relative to some
memory position, namely the one stored at CRIM_BEGMEM. Not all
subroutines will be sequences of built-in instructions and addresses
of other subroutines, so we need one byte at the beginning of each
subroutine to tell what kind of subroutine it is; we call this code
the "head" of the subroutine. It is obvious that the task of selecting
one kind of subroutine from one head is different from executing
instructions and function calls, so we say that the engine can be in a
lot of "states", switching between them tirelessly in some predefined
way. Two of these states are the "Head" state and the "Crim" state.
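
The two-byte relative addresses can be pictured with a few lines of C.
This is a sketch only: crim_begmem stands in for the position stored
at CRIM_BEGMEM, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Stand-in for the memory position stored at CRIM_BEGMEM.
static uint8_t *crim_begmem;

// Two-byte code -> absolute address of the subroutine's head.
static uint8_t *code_to_adr(uint16_t code) {
    return crim_begmem + code;
}

// Absolute address -> two-byte code. Only positions less than 64 KB
// above crim_begmem are representable.
static uint16_t adr_to_code(const uint8_t *adr) {
    assert(adr >= crim_begmem && adr - crim_begmem < 0x10000);
    return (uint16_t)(adr - crim_begmem);
}

// The head is the single byte found at the subroutine's address; it
// tells the engine what kind of subroutine this is.
static uint8_t head_of(uint16_t code) {
    return *code_to_adr(code);
}
```

Calling a subroutine then amounts to turning its code into an address
with code_to_adr and letting the engine, in Head state, dispatch on
the byte that head_of returns.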

  Now it's time to take a look at 1.aud, 2.aud, crimcomp.4th
(esp. SHOW-STACKS) and crim.c to see how this works in practice. Note
how EXIT keeps us in Crim state, so we need to push the address of an
END-CRIM on the stack to be able to call a subroutine by its head and
return to PFE.

Chapter four:

  Here I should be discussing the RSR construct, which (let me be
pretentious for a short while) is the pearl of Crim, just like CREATE
DOES> is the pearl of Forth. (Thanks.) But I haven't thought enough
about how to explain it clearly, so just look at example 2.


Legal status:

Public domain.


Contributing:

Please send suggestions, comments, code, related material,
references for what I'm reinventing, corrections to my English,
ANYTHING (even questions!) to me at

  edrx@saci.mat.puc-rio.br (e-mail)

or upload stuff to the directory /pub/crim/uploads, at
saci.mat.puc-rio.br (139.82.27.51).


Thanks to:

  Pedro Campos
  Lea Tavora
  Ricardo Cezar
  Bruno Miranda
  Gonzalo Contreras
  Otton ____ (I forgot the last name)

for letting me try to explain Crim to them, and

  George Svetlichny
  Carlos Tomei
  Roberto Ierusalimschy
  Paulo Henrique V. Barros

for encouragement.