# 2000jul07
sendemail -s 'perpol' freestuff@boswa.removethis.com <<'%%%'
Hi! I wrote the text below in a fit of inspiration after taking a look
at perpol, and I will recycle it in at least two ways: by placing it on
my page and by sending a version of it to the MISC list. So if you find
it boring you can just discard it without problems. But I had to send
you the first copy...

--snip--snip--

Hi Darrell!

I've just downloaded perpol and easynasm and played a bit with them,
and I'm very impressed! The least I can say is that the ratio of good
ideas to lines of code is fantastic, and I'm going to take a lot of
inspiration from them.

Some years ago I spent a lot of time doing prototypes for a modified
Forth inner interpreter; I called it "Crim" for obscure reasons. It was
intended to have a better bytecode than the usual Linux Forths, where
"better" means "better to write by hand" and "more fun to hack". Here
are its main ideas.

Crim's Forth instructions, like "DUP", "*" and "EXIT" in
": SQUARE DUP * ;", can be either one or two bytes long. The
interpreter reads one byte, and if it is in a certain range then it is
a one-byte instruction: either one of the primitives or something that
should be converted to two bytes by looking at a table. If the byte is
not in that range, the interpreter reads another byte, assembles an
address (large Forth programs take only a few K, so two bytes are
enough) and does a call: push the next IP onto the return stack, jump
to the address, etc.

One point that was never made clear in any of the Forth books (or docs,
or code) that I've read is that the inner interpreter can be in any of
a few "modes", and in each of these modes it executes what it finds in
a completely different manner.
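As an aside on the one- or two-byte encoding described above, the decoding step can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the range boundary `ONE_BYTE_FIRST`, the big-endian byte order, and the names `decode`, `primitives` and `short_table` are all hypothetical and not taken from Crim itself.

```python
ONE_BYTE_FIRST = 0xC0   # assumption: bytes >= 0xC0 are one-byte instructions

def decode(mem, ip, primitives, short_table):
    """Decode one instruction at mem[ip]; return (kind, value, next_ip).

    primitives:  dict mapping a one-byte opcode to a primitive name
    short_table: dict mapping a one-byte opcode to a two-byte address
    (both tables are hypothetical stand-ins for Crim's real ones)
    """
    b = mem[ip]
    if b >= ONE_BYTE_FIRST:
        if b in primitives:
            return ("prim", primitives[b], ip + 1)
        # One-byte short form: look up the full address in the table.
        return ("call", short_table[b], ip + 1)
    # Not in the one-byte range: read a second byte and assemble an
    # address (high byte first is an assumption of this sketch).
    addr = (b << 8) | mem[ip + 1]
    return ("call", addr, ip + 2)

mem = bytes([0xC1, 0xF0, 0x12, 0x34])
primitives = {0xC1: "DUP"}
short_table = {0xF0: 0x0200}
```

For a "call" result the interpreter would then push `next_ip` onto the return stack and jump to the address. Putting the high byte first is what lets a single byte decide between the two forms, which in turn means that no word may live at an address whose high byte falls in the one-byte range.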
In "head" mode it reads, for example, the DOCOL that corresponds to the
executable part of ": SQUARE", which, in most Forths, is the address of
a routine written in machine language (well, it's not exactly a routine
because it doesn't return, but that doesn't matter). We can think that
after saving the address that comes after the DOCOL the inner
interpreter jumps to the machine-language definition of DOCOL, but in,
say, "assembler" mode.

It takes a certain imagination to conceive that the inner interpreter
will leave the assembler mode, but it is easy to see the transitions
between the two other main modes: after some juggling the inner
interpreter, which was in "head" mode when it read the DOCOL, reaches
the address of the "DUP" in "Forth" mode, and after more juggling the
address of "*", again in "Forth" mode, then the "EXIT", etc. I found
that when I explain Forth to people it is very convenient to start by
describing the transitions between the "head" and "Forth" modes, and
then show how these modes correspond to particular cases of an eternal
ugly assembler mode that the processor cannot leave.

At this point it is clear that we can add more states to the inner
interpreter, and that in fact we may have been doing that without
noticing; for example, after a <."> the, huh, inner interpreter falls
into a string-gobbling mode from which it will only recover at the end
of the literal string. And note that we have other interpreters that
traverse the same code: "SEE", for example, understands the common
Forth words plus things like BRANCH, 0BRANCH, LIT and EXIT, which may
take literal data after them and may or may not end the word
definition. Even more curiously, SEE always goes straight ahead without
needing to follow the function calls...

But implementing words like SEE, or even like <.">, that sort of
implement other states of the inner interpreter, is a boring and
error-prone task (or at least it was for me. :-).
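To make the SEE remark concrete, here is a small Python sketch of such a straight-ahead traversal. The representation is an assumption chosen for brevity: the compiled code is a Python list where word names are strings, branch targets are absolute list indices, and the string after <."> is a single cell. The rule used to decide whether an EXIT ends the definition (no earlier branch jumps past it) is a common decompiler heuristic, not something taken from a particular Forth.

```python
def see(code, start=0):
    """Walk the code linearly from `start`; return one text line per word."""
    out = []
    ip = start
    furthest = start              # furthest branch target seen so far
    while ip < len(code):
        word = code[ip]
        ip += 1
        if word in ("LIT", "BRANCH", "0BRANCH"):
            arg = code[ip]        # these words take one inline cell
            ip += 1
            out.append(f"{word} {arg}")
            if word != "LIT":
                furthest = max(furthest, arg)
        elif word == '<.">':
            out.append(f'<."> "{code[ip]}"')   # skip the inline string
            ip += 1
        elif word == "EXIT":
            out.append("EXIT")
            if ip > furthest:     # nothing branches past this EXIT
                break
        else:
            out.append(word)      # ordinary word: not followed, just listed
    return out

# Roughly:  0BRANCH L1  <."> "yes"  EXIT  L1: LIT 7 . EXIT   (then junk)
code = ["0BRANCH", 5, '<.">', "yes", "EXIT", "LIT", 7, ".", "EXIT", "JUNK"]
listing = see(code)
```

Every word that takes inline data needs its own case here, and each new such word means touching the traversal again, which is the boring and error-prone part.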
There's a simple solution: consider that the inner interpreter has a
third stack; let's call it "S" (for "streams").

If you're working on a Forth where the dictionary and the code are kept
apart, then it makes sense to have words with multiple heads. For
example:

    AT_FOO: db DOAT
    TO_FOO: db DOTO
    FOO:    db DOCON
            dw 222

The Forth word with head at FOO acts as a word defined by CONSTANT and
returns 222; the one with head at TO_FOO will take the top item of the
data stack and store it where the 222 is now, and the one at AT_FOO
will return the address of the 222. With good code for DOAT and DOTO
and with immediate words AT and TO (with obvious definitions) we can
save both program space and execution time.

Now we can start to use the third stack. Consider a word "S$@,", that
will advance over a counted string whose address is in the S-stack:

    : S$@, ( d:: s:: adr -- d:: adr+1 len s:: adr+len+1 )
      S> COUNT 2DUP + >S ;

and consider that we can use a certain DORSR for heads (this is the
novelty!). What it does is to execute the following head "with the last
address in the return stack moved to the S-stack", and then, after
finishing its execution, it moves the top value in the S-stack (that
probably has changed) back to the return stack. I will give the details
soon, after an example.

With this code in pseudo-assembler we define at the same time <."> and
S<.">, where S<."> is the same as <."> but reads the string from the
S-stack and advances the S-stack.

    <.">:   db DORSR
    S<.">:  db DOCOL
            dw S$@,
            dw TYPE
            db EXIT

And this code will do something similar, but taking and printing two
strings, with a CR in between:

    2<.">:  db DORSR
    S2<.">: db DOCOL
            dw S<.">
            dw CR
            dw S<.">
            db EXIT

    DEMO:   db DOCOL
            dw 2<.">
            db 5; db "Hello"
            db 5; db "there"
            db EXIT

The details: the execution of DORSR is a bit strange because after
executing the head that follows it, it has to move a value back from
the S-stack to the return stack.
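A rough Python simulation of the DEMO above may help in following those details. It is a sketch under several assumptions: memory is a list of cells rather than bytes, the IP is kept in a variable instead of on top of the return stack, and the "move back" step is done by a sentinel return address called RSREXIT here. The helper names `counted`, `assemble` and `run`, and the HALT sentinel, are hypothetical.

```python
HALT, RSREXIT = -1, -2        # sentinel return addresses (assumption)

def counted(text):
    """Counted string: one length cell followed by one cell per character."""
    return [len(text), *text]

def assemble(source):
    """Two-pass toy assembler: resolve ("call", label) cells to addresses."""
    labels, addr = {}, 0
    for name, cells in source:
        labels[name] = addr
        addr += len(cells)
    mem = [labels[c[1]] if isinstance(c, tuple) else c
           for _, cells in source for c in cells]
    return mem, labels

def run(mem, entry):
    """Execute the word whose head is at `entry`; return the text it types."""
    d, r, s, out = [], [HALT], [], []
    head, ip = entry, None
    while True:
        if head is not None:                  # "head" mode
            kind = mem[head]
            if kind == "DORSR":
                s.append(r.pop())             # caller's IP goes to the S-stack
                r.append(RSREXIT)             # arrange for the move back
                head += 1                     # then run the following head
            elif kind == "DOCOL":
                ip, head = head + 1, None     # switch to "Forth" mode
            else:
                raise ValueError(f"unknown head {kind!r}")
            continue
        cell = mem[ip]                        # "Forth" mode
        ip += 1
        if isinstance(cell, int):             # a call to another word
            r.append(ip)
            head = cell
        elif cell == "EXIT":
            ip = r.pop()
            if ip == RSREXIT:                 # EXIT to the value on the S-stack
                ip = s.pop()
            if ip == HALT:
                return "".join(out)
        elif cell == "S$@,":                  # s:: adr -- d:: adr+1 len s:: adr+len+1
            adr = s.pop()
            n = mem[adr]
            d += [adr + 1, n]
            s.append(adr + 1 + n)
        elif cell == "TYPE":
            n, adr = d.pop(), d.pop()
            out.append("".join(mem[adr:adr + n]))
        elif cell == "CR":
            out.append("\n")
        else:
            raise ValueError(f"unknown word {cell!r}")

SOURCE = [
    ('<.">',   ["DORSR"]),
    ('S<.">',  ["DOCOL", "S$@,", "TYPE", "EXIT"]),
    ('2<.">',  ["DORSR"]),
    ('S2<.">', ["DOCOL", ("call", 'S<.">'), "CR", ("call", 'S<.">'), "EXIT"]),
    ("DEMO",   ["DOCOL", ("call", '2<.">'),
                *counted("Hello"), *counted("there"), "EXIT"]),
]
mem, labels = assemble(SOURCE)
text = run(mem, labels["DEMO"])
```

Running DEMO types "Hello", a newline, and "there". The two inline strings are never seen by the Forth-mode loop: the S-stack pointer walks over them, and RSREXIT resumes DEMO at the cell just after the second string.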
Let's follow what it should do, taking the execution of the DORSR in
<."> as an example and describing the transitions of the inner
interpreter. Consider that the IP (in whatever mode) is the top element
of the return stack.

    before DORSR:    state:: head    r:: x <.">           s::      d::
    after DORSR:     state:: head    r:: RSREXIT <.">+1   s:: x    d::

where RSREXIT is the address of a Forth word that will execute like an
"EXIT to the value in the S-stack":

    before RSREXIT:  state:: Forth   r:: RSREXIT          s:: x+   d::
    after RSREXIT:   state:: Forth   r:: x+               s::      d::

That's it.

I did a toy implementation of it using C and PFE (PFE was an old free
Linux Forth), and from that time on I only drafted some things, made
plans to write something more serious to link with C, PForth and Tcl,
and learned the basic techniques. A few weeks ago I started to play
with eForth for Linux (especially because I heard that there's a
version of it for the F21/MuP21, and it seemed like a good step in the
twisted path to make F/P21 emulators for Linux), but it is not trivial
to add dynamic linking and C library calls to it, etc, etc, etc.

Well, your system provides many of the missing parts I was looking for.

--snip--snip--

Some links:

  http://angg.twu.net/e/fortho.e.html#perpol
  http://angg.twu.net/e/anatocc.e.html#easynasm
  http://angg.twu.net/e/fortho.e.html#eforth
  http://angg.twu.net/e/forth.e.html#pforth_and_tcl
  http://angg.twu.net/forth.html

Cheers,
  Eduardo Ochs
  edrx@inx.com.br
  http://angg.twu.net/
%%%