/* * descrip * 95 jun 28 * * an extract from a letter I sent describing Crim; * might be a good introduction. * I've created an alternative inner interpreter that allows for much smaller code and it's easier to learn than the traditional ones; in fact, I have teached it successfully to people that never programmed before in about a half-hour. Its main characteristics are: * CFAs (I call them "heads") are only one byte long. * A word may have many heads (I borrowed this idea from HS-Forth; I still don't know where it appeared first). For example, HEX 99 VALUE A may compile this: +--+--+--+--+--+--+--+ |05|04|03| 00000099 | +==+==+==+--+--+--+--+ 09 0A 0B Then, calling the word that starts at 0B (the one with 03 as head) fetches the value of A onto the stack; calling 0A stores the last value on the stack into A, and calling 09 puts the address of the data region of A (0C) on the stack. * The inner interpreter becomes a state machine, with at least two states (more about them soon). We consider that it always interprets what it finds at IP, using the state to decide its meaning: In "threaded" state, it reads either one byte, in the case of the short built-in instructions, or two bytes, that are interpreted as an address to be called; in "head" state, it reads one byte and executes some corresponding action. For example, a head of 00 just advances IP and changes the state to "threaded". The engine is written as to be easy to have new head actions, new short instructions and new states added. In particular, it should be easy to add a state that acts exactly like your favourite high-performance inner interpreter, another that runs, say, SPITBOL threaded code, and so on. * To simplify the creation of words that read the data that follow their calls, like <."> and 0BRANCH, we add a third stack, called the "streams stack", or S-stack. Instead of creating <."> directly, we first create S<.">, that reads the lenght of a string and then the string itself from the address at the top of the S-stack, instead of from some position at the R-stack. Then we define <."> as being the "RSR'ing" version of S<.">, that is: 1) move the address past the call from the R-stack to the S-stack, 2) place the address of an S-EXIT at the R-stack, so that S<."> will execute that S-EXIT after it returns; then the S-EXIT will transfer the execution to the address at the S-stack, making we return to the position after the string. 3) execute S<.">; it will return to an S-EXIT, doing the action described above. It sounds messy at first, but, anyway, it's an advanced feature, and it can unify many types of parsing that are usually kept apart. I've done a working prototype, using PFE on Linux. The inner engine and some related words are implemented in C as user-defined primitives, in less than 200 lines; all dumping, debugging, single-stepping and compiling tools are implemented in forth, again in less that 200 lines. The syntax for compiling a program for the engine is like this: : S@U1, ( s // -- s+1 // 0..255 ) S> DUP C@ SWAP 1+ >S ; : S@S1, ( s // -- s+1 // -128..127 ) S@U1, 128 SIGN-EXTEND ; : SBRANCH ( s // -- s+1+displ // ) S@S1, S> + >S ; ' DUP %-> %DUP \ "herits" DUP from PFE ' * %-> %* RSR BRANCH ' SBRANCH %-> %SBRANCH \ two words glued into a word \ with two heads %: %SQUARE %DUP %* %; I'm working on a version with normal Forth syntax, but it's not ready yet. The present version, that is meant to expose the ideas and to attrach collaborators, is public domain and is available for anonymous FTP at saci.mat.puc-rio.br (139.82.27.51), directory /pub/crim, file crim000.tgz. The name "Crim" is just a short for "Crimble Crumble", a character in one of Syd Barrett's songs, and have little or no relation with either "Cream" or "Crime".