/*
 * ETC
 * 95 jun 27
 *
 * I've put in this file all information that didn't fit anywhere else.
 * The basic ideas:

1) The virtual machine has a "state" that specifies how it will execute
   what it finds at *ip. In this way we can unify the notions of
   executing the head of a word (I think it's called a "CFA" in
   traditional terminology), and that of deciding if the code is a short
   instruction or a function call and executing it. That makes the
   machine less intricate and more extensible, but probably slower.

2) Word heads are one byte long, and a word may have many heads. I took
   the idea of words with many heads from HS-Forth.

3) We have a third stack, the streams stack, that is used both for
   general parsing and for parsing immediate arguments, like the string
   following a <.">. See example 2.

The files:

ETC          - this file
letter       - a description of Crim (taken from a letter I sent)
announce     - the short announcement that appeared on comp.lang.forth
crim.c       - the inner interpreter and some auxiliary words
crimcomp.4th - used to enter Crim programs
autodoc.4th  - used to make "automatic documentation" for the examples
patchpfe     - instructions to link crim.c with PFE
1.4th        - first example - version that just compiles and runs
1-autod.4th  - long version, used to produce 1.aud
1.aud        - docs for the first example
2.4th        - second example
2-autod.4th
2.aud

------ Example 1 - how it would be in Forth -------

4 VALUE A
: SQUARE DUP * ;
: ^2+6A SQUARE 6 A * + ;
5 ^2+6A \ the result is 49

------ Example 2 - approximate translation, ignoring the weird features -------

: CLASSIFY ( c -- )
  TO CASE-V
  CASE-V ASCII 0 ASCII 9 IN[,] IF ." digit" ELSE
  CASE-V BL = IF ." space" ELSE
  ." not digit or space"
  THEN THEN ;

------ Example 2 - a more faithful translation, but using -------
------ some words you'll have to guess what they do -------

RSR CZ= : SCZ= ( a // -- a+2|a+2+j // ) CASE-V S@U1, = S0BRANCH ;
RSR CZ[,] : SCZ[,] ( a // -- a+3|a+3+j // ) CASE-V S@U1, S@U1, IN[,] S0BRANCH ;

: CLASSIFY ( c -- )
  BEGIN-CASE
    CZ[,] [ ASCII 0 1, ASCII 9 1, ]{ ." digit" }
    CZ= [ BL 1, ]{ ." space" }
    ." not digit or space"
  END-CASE ;

Boxes with "==" on the floor are heads. Something like 12345678 (all
digits together) represents a four-byte number stored in the order
preferred by your machine; for 80x86s, 78 56 34 12.

Meaning of the characters at the floors of the boxes:

"==" - head
"--" - Crim instruction
">>" - branch destination or string length
".." - anything else

Meaning of the inner engine instructions:

heads:
  00 - Crim - pass to Crim mode
  01 - Forth - execute PFE word and return
  02 - RSR - see example 2
  03 - Value_1 (fetch)
  04 - Value_2 (store)
  05 - Value_3 (get address)

Short instructions:
  FF - exit
  FE - s-exit - see example 2
  FD - four-byte literal
  FC - end-crim

Detailed description of some heads:

00 - Crim  - ip+=1, switch to Crim mode
01 - Forth - call the PFE word whose Xt follows, and return
02 - RSR   - (used in example 2) push RS[1] (the address where we should
             return to) onto the streams stack, replacing it by the
             address of an S-EXIT; that is, instead of returning
             directly to RS[1] we would return (this is the action of
             S-EXIT) to the address stored at the top of SS, which has
             probably been changed. Check, in example 2, how this
             allowed us to define BRANCH from SBRANCH using just one
             byte more. These "S"-words are much easier to debug and
             it's easier to compose their actions.

"Return" always means "drop ip from the Rstack, then continue execution
in Crim mode".

Random notes about example 1:

Note that we are providing two extra heads for %A that are being wasted.
The second one (H_VALUE_2) executes as "TO A", setting %A's value, and
the third one (H_VALUE_3) as "' A". Also, the %SEXIT would only be used
in example 2.

The notation used for the stack maps is

  rstack /// sstack // dstack
  rstack /// dstack
  sstack // dstack
  dstack :: state

The default state transitions are ( :: head -- :: crim ) for heads and
( :: crim -- :: crim ) for non-heads. H_RSR is an exception:

H_CRIM   ( ip0 /// :: head -- ip0+1 /// :: crim )
H_RSR    ( r ip0 /// :: head -- &sexit ip0+1 /// r // :: head )
S_EXIT   ( adr ip0 /// :: crim -- adr /// :: crim )
S_SEXIT  ( ip0 /// adr // :: crim -- adr /// :: crim )
S_LIT4   ( ip0 /// :: crim -- ip0+5 /// lit4 :: crim )
S0BRANCH ( adr // 0 -- adr+1+drw_s1 // )
     or: ( adr // t -- adr+1 // )

A too-short tutorial on writing Crim programs
(preliminary, bare metal version)

Chapter one:

To write the shortest possible Crim program, just write the code for the
instruction END-CRIM somewhere in memory, put this address in the IP (by
pushing it on the R-stack, maybe), set CRIM-STATE to CRIM and call the
engine. It would execute this instruction immediately, returning to PFE.

Chapter two:

Let's write a slightly larger program: a single sequence of
instructions. Write the code for LIT4, then the number 6, using four
bytes, then the code for multiplying, which will probably be a two-byte
value, with the MOST SIGNIFICANT BYTE FIRST; that's important to let the
engine distinguish correctly between one-byte and two-byte instructions.
After that, the code for END-CRIM. Put the address of the LIT4 in the
IP, set CRIM-STATE to Crim, and start the engine. That would multiply
the number on top of the data stack by six and return to PFE.

You may have noticed that, unless we modify crim.c (a very instructive
task), we wouldn't have an instruction code for multiplying. There's a
workaround for this in the next chapter.

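The byte layout of chapter two can be sketched in C. This is a toy
decoder, not crim.c: the function names are made up, the rule "a first
byte of FC or above is a one-byte instruction" is only inferred from the
list of short instructions above, and TOY_MUL is a hypothetical two-byte
code standing in for multiplication. It shows why the most significant
byte must come first: the engine looks at one byte to decide how many
bytes the instruction has.

```c
// Toy decoder for the chapter two byte layout. NOT crim.c: the names, the
// 0xFC threshold and the TOY_MUL code are assumptions made for illustration.
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    S_END_CRIM  = 0xFC,   // short instruction codes as listed in this file
    S_LIT4      = 0xFD,
    FIRST_SHORT = 0xFC,   // assumption: first bytes >= 0xFC are one-byte codes
    TOY_MUL     = 0x0010  // hypothetical two-byte code meaning "multiply"
};

// Lay out "LIT4 6, multiply, END-CRIM" at mem; returns the number of bytes.
static size_t build_times_six(uint8_t *mem)
{
    int32_t six = 6;
    size_t n = 0;
    mem[n++] = S_LIT4;
    memcpy(mem + n, &six, 4);           // literal in the machine's own order
    n += 4;
    mem[n++] = (TOY_MUL >> 8) & 0xFF;   // most significant byte first
    mem[n++] = TOY_MUL & 0xFF;
    mem[n++] = S_END_CRIM;
    return n;
}

// Run from ip until END-CRIM. ds is the data stack and *depth its item
// count; the caller must leave room for pushes.
// Returns 0 on END-CRIM, -1 on a code this toy does not know.
static int toy_run(const uint8_t *ip, int32_t *ds, int *depth)
{
    for (;;) {
        uint8_t b = *ip++;
        if (b >= FIRST_SHORT) {         // one-byte instruction
            if (b == S_END_CRIM)
                return 0;
            if (b == S_LIT4) {
                int32_t v;
                memcpy(&v, ip, 4);      // fetch the four-byte literal
                ip += 4;
                ds[(*depth)++] = v;
                continue;
            }
            return -1;                  // EXIT and S-EXIT are not modelled
        }
        // two-byte code: b is the high byte, the next byte is the low byte
        unsigned code = ((unsigned)b << 8) | *ip++;
        if (code != TOY_MUL)
            return -1;
        assert(*depth >= 2);
        ds[*depth - 2] *= ds[*depth - 1];
        (*depth)--;
    }
}
```

Calling build_times_six on a buffer and then toy_run on it with 7 on the
data stack leaves 42 there. Because any two-byte code below FC00 starts
with a byte below FC, the first byte alone settles the instruction
length; with the low byte first that would not be true.
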
Chapter three:

Series of instructions that execute sequentially and end by returning
directly to Forth are not very useful, so we'll learn how to write
subroutines. The code for multiplication could be the address of a
subroutine for multiplication; that's exactly what we do, but to let
these addresses be only two bytes long, we write them relative to some
memory position, namely the one stored at CRIM_BEGMEM.

Not all subroutines will be sequences of built-in instructions and
addresses of other subroutines, so we need one byte at the beginning of
each subroutine to tell what kind of subroutine it is; we call this code
the "head" of the subroutine. It is obvious that the task of selecting
one kind of subroutine from one head is different from executing
instructions and function calls, so we say that the engine can be in a
lot of "states", switching between them tirelessly in some predefined
way. Two of these states are the "Head" state and the "Crim" state.

Now it's time to take a look at 1.aud, 2.aud, crimcomp.4th (esp.
SHOW-STACKS) and crim.c to see how this works in practice. Note how EXIT
keeps us in Crim state, so we need to push the address of an END-CRIM on
the stack to be able to call a subroutine by its head and return to PFE.

Chapter four:

Here I should have been discussing the RSR construct, which (let me be
pretentious for a short while) is the pearl of Crim, just like CREATE
DOES> is the pearl of Forth. (Thanks.) But I haven't thought enough
about how to explain it clearly, so just look at example 2.

Legal status:

Public domain.

Contributing:

Please send suggestions, comments, code, related material, references
for what I'm reinventing, corrections to my English, ANYTHING (even
questions!) to me at edrx@saci.mat.puc-rio.br (e-mail) or upload stuff
to the directory /pub/crim/uploads, at saci.mat.puc-rio.br
(139.82.27.51).

Thanks to:

  Pedro Campos
  Lea Tavora
  Ricardo Cezar
  Bruno Miranda
  Gonzalo Contreras
  Otton ____ (I forgot the last name)

for letting me try to explain Crim to them, and

  George Svetlichny
  Carlos Tomei
  Roberto Ierusalimchy
  Paulo Henrique V. Barros

for encouragement.