# 2000jul07

sendemail -s 'perpol' freestuff@boswa.removethis.com <<'%%%'
Hi! I wrote the text below in a fit of inspiration after taking a look
at perpol and I will recycle it in at least two ways, by placing it on
my page and by sending a version of it to the MISC list; so, if you
find it boring you can just discard it without problems. But I had to
send you the first copy...

--snip--snip--

Hi Darrell! I've just downloaded perpol and easynasm and played a bit
with them, and I'm very impressed! The least I can say is that the
ratio of good ideas to lines of code is fantastic and I'm going to
take a lot of inspiration from them.

I spent a lot of time some years ago doing prototypes for a
modified Forth inner interpreter; I called it "Crim" for obscure
reasons. It was intended to have a better bytecode than the usual Linux
Forths, where "better" means "better to write by hand" and "more fun
to hack". Here are its main ideas.

Crim's Forth instructions, like "DUP", "*" and "EXIT" in ": SQUARE DUP
* ;", can be either one or two bytes long. The interpreter reads one
byte; if it is in a certain range then it is a one-byte instruction,
either one of the primitives or something that should be converted to
two bytes by looking at a table. If the byte is not in that range then
the interpreter reads another byte, assembles an address (large Forth
programs take only a few K, so two bytes are enough) and does a call:
it pushes the next IP onto the return stack, jumps to the address, etc.
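
The decoding step can be sketched in Python like this. It is only a
sketch: the three byte ranges, the high-byte-first order and the names
PRIM_LIMIT, SHORT_LIMIT and short_table are assumptions made for the
illustration, not Crim's actual encoding.

```python
# Hypothetical layout (an assumption, not Crim's real byte ranges):
#   0x00-0x3F  one-byte primitive
#   0x40-0x7F  one-byte index into a table of two-byte addresses
#   0x80-0xFF  high byte of a two-byte address; the low byte follows
PRIM_LIMIT = 0x40
SHORT_LIMIT = 0x80

def decode(mem, ip, short_table):
    """Decode one instruction at mem[ip]; return (kind, value, next_ip)."""
    b = mem[ip]
    if b < PRIM_LIMIT:
        return ("prim", b, ip + 1)
    if b < SHORT_LIMIT:
        # One byte in the code, expanded to a full address by the table.
        return ("call", short_table[b - PRIM_LIMIT], ip + 1)
    # Two-byte instruction: assemble the address from both bytes.
    return ("call", (b << 8) | mem[ip + 1], ip + 2)

mem = bytes([0x01, 0x41, 0x90, 0x20])
short_table = [0x8000, 0x8123]
```

With this layout a "call" result still has to push next_ip onto the
return stack and jump; the sketch covers only the decoding.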

One point that was never made clear in any of the Forth books (or
docs, or code) that I've read is that the inner interpreter can be in
any of a few "modes", and in each of these modes it executes what it
finds in a completely different manner. In "head" mode it reads, for
example, the DOCOL that corresponds to the executable part of ":
SQUARE", which, in most Forths, is the address of a routine written in
machine language (well, it's not exactly a routine because it doesn't
return, but that doesn't matter); we can think that after saving the
address that comes after the DOCOL the inner interpreter jumps to the
machine-language definition of DOCOL, but in, say, "assembler"
mode. It takes a certain imagination to conceive that the inner
interpreter will leave the assembler mode, but it is easy to see the
transitions between the two other main modes: after some juggling the
inner interpreter, that was in "head" mode when it read the DOCOL,
reaches the address of the "DUP" in "Forth" mode, and after more
juggling the address of "*", again in "Forth" mode, then the "EXIT",
etc. I found that when I explain Forth to people it is very convenient
to start by describing the transitions between the "head" and "Forth"
modes and then show how these modes correspond to particular cases of
an eternal ugly assembler mode that the processor cannot leave.

At this point it is clear that we can add more states to the inner
interpreter, and that in fact we may have been doing that without
noticing; for example, after a "<.">" the, huh, inner interpreter
falls into a string-gobbling mode from which it will only recover at
the end of the literal string. And note that we have other interpreters
that traverse the same code: "SEE", for example, understands the common
Forth words plus things like BRANCH, 0BRANCH, LIT and EXIT, that may
take literal data after them and may or may not end the word
definition. Even more curiously, SEE always goes straight ahead
without the need of following the function calls...
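
Here is a rough Python sketch of such a straight-ahead traversal. The
code representation (a list of word names with inline data in the
following cell), the absolute branch targets and the function name
see() are all assumptions made for the example; a real SEE works on
compiled addresses.

```python
def see(code):
    """Walk a compiled body linearly; return the decompiled items."""
    out, i, furthest = [], 0, 0
    while i < len(code):
        w = code[i]
        i += 1
        if w in ("LIT", "BRANCH", "0BRANCH"):
            arg = code[i]                  # one inline cell follows
            i += 1
            out.append(f"{w} {arg}")
            if w != "LIT":
                # Assumption: branch arguments are absolute cell indices.
                furthest = max(furthest, arg)
        elif w == '<.">':
            out.append(f'." {code[i]}"')   # inline string follows
            i += 1
        else:
            out.append(w)                  # ordinary word: never followed
            if w == "EXIT" and i > furthest:
                break                      # no branch lands past this EXIT
    return out

body = ["DUP", "0BRANCH", 5, "NEGATE", "EXIT", '<.">', "hi", "EXIT", "JUNK"]
```

The first EXIT does not end the definition because a branch still
targets the cell after it; the second one does, so "JUNK" is never
reached.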

But implementing words like SEE, or even like <.">, that sort of
implement other states in the inner interpreter, is a boring and
error-prone task (or at least it was for me. :-). There's a simple
solution: consider that the inner interpreter has a third stack; let's
call it "S" (for "streams"). Before using it, one preliminary: if
you're working on a Forth where the dictionary and the code are kept
apart, then it makes sense to have words with multiple heads. For
example:

  AT_FOO:  db DOAT
  TO_FOO:  db DOTO
  FOO:     db DOCON
           dw 222

The Forth word with head at FOO acts as a word defined by CONSTANT and
returns 222; the one with head at TO_FOO will take the top item of
the data stack and store it where the 222 is now, and the one at
AT_FOO will return the address of the 222. With good code for DOAT and
DOTO and with immediate words AT and TO (with obvious definitions) we
can save both program space and execution time.
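
A small Python model of the three heads, just to make the address
arithmetic explicit. It is a sketch under assumptions: every head and
the value occupy one list cell each (the listing above uses db for
heads and dw for the value), and execute() is a name invented here.

```python
DOAT, DOTO, DOCON = "DOAT", "DOTO", "DOCON"

# Memory image of the three-headed word; addresses are list indices.
mem = [DOAT, DOTO, DOCON, 222]
AT_FOO, TO_FOO, FOO = 0, 1, 2

def execute(head, stack):
    """Run the word whose head is at address `head` on data stack `stack`."""
    op = mem[head]
    if op == DOCON:
        stack.append(mem[head + 1])   # the value follows the head
    elif op == DOTO:
        mem[head + 2] = stack.pop()   # skip DOCON, store into the value cell
    elif op == DOAT:
        stack.append(head + 3)        # skip DOTO and DOCON, push the address
    return stack
```

Each head only needs to know its own fixed distance to the value cell,
which is why the three heads can share one body.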

Now we can start to use the third stack. Consider a word "S$@,", that
will advance over a counted string whose address is in the S-stack:

  : S$@, ( d::  s:: adr -- d:: adr+1 len  s:: adr+len+1 )
      S> COUNT 2DUP + >S ;

and consider that we can use a certain DORSR for heads (this is the
novelty!); what it does is execute the following head "with the
last address in the return stack moved to the S-stack", and then,
after finishing its execution, it moves the top value in the S-stack
(which has probably changed) back to the return stack. I will give the
details soon, after an example. With this code in pseudo-assembler we
define at the same time <."> and S<.">, where S<."> is the same as
<."> but reads the string from the S-stack and advances the S-stack.

  <.">:   db DORSR
  S<.">:  db DOCOL
          dw S$@,
          dw TYPE
          db EXIT
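
The stack effect of S$@, as used in S<."> can be checked with a
Python model. This is a sketch: the stacks are plain lists, memory is
a bytes object, and the name s_str_fetch is made up for the example.

```python
def s_str_fetch(mem, d, s):
    """S$@,  ( d:: -- adr+1 len   s:: adr -- adr+len+1 )"""
    adr = s.pop()                 # S>
    length = mem[adr]             # COUNT reads the length byte ...
    d.append(adr + 1)             # ... and leaves adr+1 len
    d.append(length)
    s.append(adr + 1 + length)    # 2DUP + >S

# Two counted strings in a row, as in a stream of inline literals.
mem = bytes([5]) + b"Hello" + bytes([5]) + b"there"
d, s = [], [0]
s_str_fetch(mem, d, s)
first = (list(d), list(s))
s_str_fetch(mem, d, s)
second = (list(d), list(s))
```

After each call the S-stack points just past the string that was read,
so the next call picks up the following string.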

And this code will do something similar, but taking and printing two
strings, with a CR in between:

  2<.">:  db DORSR
  S2<.">: db DOCOL
          dw S<.">
          dw CR
          dw S<.">
          db EXIT

  DEMO:   db DOCOL
          dw 2<.">
          db 5; db "Hello"
          db 5; db "there"
          db EXIT

The details: the execution of DORSR is a bit strange because after
executing the word that follows it, it has to move a value back
from the S-stack to the return stack. Let's follow what it should do,
taking the execution of the DORSR in <."> as an example and describing
the transitions of the inner interpreter. Consider that the IP (in
whatever mode) is the top element of the return stack.

  before DORSR:   state:: head  r:: x <.">          s::    d::
   after DORSR:   state:: head  r:: RSREXIT <.">+1  s:: x  d::

where RSREXIT is the address of a Forth word that will execute like an
"EXIT to the value in the S-stack":

  before RSREXIT:  state:: Forth  r:: RSREXIT  s:: x+  d::
   after RSREXIT:  state:: Forth  r:: x+       s::     d::
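
The transitions above can be tried out with a toy inner interpreter in
Python. It is only a sketch and not the C/PFE implementation: memory
is a list with one cell per item, heads and EXIT/RSREXIT are stored as
strings, primitives are Python functions, and names such as run, emit
and DOT_Q are invented for the example.

```python
def run(mem, prims, rsrexit, entry):
    """Run the word whose head is at `entry`; return the printed text."""
    d, s, out = [], [], []       # data stack, S-stack, output buffer
    r = [None, entry]            # return stack; its top element is the IP
    mode = "head"
    while r[-1] is not None:     # None marks the return to the outside
        ip = r[-1]
        cell = mem[ip]
        if mode == "head":
            if cell == "DOCOL":          # enter the body in Forth mode
                r[-1] = ip + 1
                mode = "forth"
            elif cell == "DORSR":        # r:: x head -> r:: RSREXIT head+1, s:: x
                s.append(r[-2])
                r[-2] = rsrexit
                r[-1] = ip + 1           # next head, still in head mode
            else:                        # primitive: run it, then return
                prims[cell](mem, d, s, out)
                r.pop()
                mode = "forth"
        elif cell == "EXIT":
            r.pop()
        elif cell == "RSREXIT":          # "EXIT to the value in the S-stack"
            r[-1] = s.pop()
        else:                            # call: save next IP, go to the head
            r[-1] = ip + 1
            r.append(cell)
            mode = "head"
    return "".join(out)

def s_str_fetch(mem, d, s, out):         # S$@,
    adr = s.pop()
    d += [adr + 1, mem[adr]]
    s.append(adr + 1 + mem[adr])

def type_(mem, d, s, out):               # TYPE
    n, adr = d.pop(), d.pop()
    out.append("".join(mem[adr:adr + n]))

def cr(mem, d, s, out):                  # CR
    out.append("\n")

prims = {"S$@,": s_str_fetch, "TYPE": type_, "CR": cr}
mem = []

def emit(*cells):
    """Append cells to memory; return the address of the first one."""
    adr = len(mem)
    mem.extend(cells)
    return adr

RSREXIT = emit("RSREXIT")
S_FETCH, TYPE, CR = emit("S$@,"), emit("TYPE"), emit("CR")
DOT_Q = emit("DORSR")                                      # <.">
S_DOT_Q = emit("DOCOL", S_FETCH, TYPE, "EXIT")             # S<.">
TWO_DOT_Q = emit("DORSR")                                  # 2<.">
S_TWO_DOT_Q = emit("DOCOL", S_DOT_Q, CR, S_DOT_Q, "EXIT")  # S2<.">
DEMO = emit("DOCOL", TWO_DOT_Q, 5, *"Hello", 5, *"there", "EXIT")

print(run(mem, prims, RSREXIT, DEMO))
```

Note that DORSR never "returns" by itself: it only plants the address
of RSREXIT under the IP, and the ordinary EXIT of the word that follows
does the rest.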

That's it. I did a toy implementation of it using C and PFE (PFE was
an old free Linux Forth), and from that time on I only drafted some
things, made plans to write something more serious to link with C,
PForth and Tcl, and learned the basic techniques. A few weeks ago I
started to play with eForth for Linux (especially because I heard that
there's a version of it for the F21/MuP21, and it seemed like a good
step in the twisted path to make F/P21 emulators for Linux), but it is
not trivial to add dynamic linking and C library calls to it, etc,
etc, etc. Well, your system provides many of the missing parts I was
looking for.

--snip--snip--

  Some links:
    http://angg.twu.net/e/fortho.e.html#perpol
    http://angg.twu.net/e/anatocc.e.html#easynasm
    http://angg.twu.net/e/fortho.e.html#eforth
    http://angg.twu.net/e/forth.e.html#pforth_and_tcl
    http://angg.twu.net/forth.html

  Cheers,
    Eduardo Ochs
    edrx@inx.com.br
    http://angg.twu.net/
%%%