### Dednat6 - a TeX preprocessor to typeset trees and diagrams


2018may08: I submitted this abstract for a presentation about Dednat6 - called "Dednat6: an extensible (semi-)preprocessor for LuaLaTeX that understands diagrams in ASCII art" - to TUG2018 (program). Warning: it mentions a way to load dednat6 with a single "\directlua{dofile"loaddednat6.lua"}" that I have not implemented yet.

2017sep19: I've stopped trying to document dednat6 because 1) I don't have a mental image of who I am writing for, 2) I get far too little feedback, 3) all of the feedback that I got came from people who felt that I was not writing for them - my approach, tone and choice of pre-requisites were all wrong. If you would like to try dednat6, get in touch, let's chat - please. By the way, practically all of my recent uploads here can be turned into a complete tarball, including the current version of dednat6, all tex dependencies, and compilation instructions, with just a few keystrokes (I use this script). If you want to start by trying to compile one of them, get in touch.

## 1. Testing

To download and test dednat6 on a *NIX-based system, do this:

```
mkdir /tmp/dednat6/
cd /tmp/dednat6/
wget http://angg.twu.net/dednat6/dednat6.zip
unzip dednat6.zip
cd tests/
lualatex 0.tex
lualatex 2.tex
lualatex 3.tex
```

This should produce 0.pdf, 2.pdf, and 3.pdf. Here are direct links to 0.tex, 2.tex, 3.tex, and the resulting PDFs:

```
http://angg.twu.net/dednat6/tests/0.tex.html
http://angg.twu.net/dednat6/tests/0.tex
http://angg.twu.net/dednat6/tests/0.pdf
http://angg.twu.net/dednat6/tests/2.tex.html
http://angg.twu.net/dednat6/tests/2.tex
http://angg.twu.net/dednat6/tests/2.pdf
http://angg.twu.net/dednat6/tests/3.tex.html
http://angg.twu.net/dednat6/tests/3.tex
http://angg.twu.net/dednat6/tests/3.pdf
```

If you are on Windows, a friend of mine who uses TeXnicCenter told me that this works:

I don't have more details at the moment, sorry. =(

## 2. Dednat4 vs. Dednat6

Note: Dednat4 and Dednat6 are very similar. Dednat4 is easier to explain, because it is just a preprocessor that we have to run like this:

```
dednat4 foo-4.tex
latex foo-4.tex
```

Dednat6, in contrast, is easier to use - we just need this:

 lualatex foo-6.tex 

In Dednat4 all the Lua code is run before running LaTeX; in Dednat6, Lua is run from LuaLaTeX (with the command "\pu") to process chunks of foo-6.tex bit by bit.

(Lua)(La)TeX treats lines starting with "%" as comments, and ignores them. This means that we can put anything we want in these "%" lines - even code to be processed by other programs besides *TeX.

Dednat4/6 read TeX files and pay attention only to the lines that begin with some special sequences of characters (called "heads"), all starting with "%":

```
%L    Lua code
%R    Lua code with rectangles
%:*   define abbreviations
%:    derivation trees (two-dimensional)
%D    definitions of diagrams (in a stack language)
```
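The head dispatch can be illustrated with a minimal Python sketch. Python is used here only as a neutral illustration, since dednat4/6 are written in Lua, and the names `HEADS`, `head_of`, and `dednat_lines` are hypothetical. The sketch assumes that heads are plain prefixes and that the longest matching head wins, which is what keeps "%:*" apart from "%:":

```python
# Hypothetical sketch (not dednat's code): find the "head" of each line.
# Longer heads are tried first, so "%:*" is never mistaken for "%:".
HEADS = {
    "%L": "Lua code",
    "%R": "Lua code with rectangles",
    "%:*": "abbreviation definitions",
    "%:": "derivation trees",
    "%D": "diagram definitions",
}

def head_of(line):
    """Return the head that `line` starts with, or None for ordinary TeX."""
    for head in sorted(HEADS, key=len, reverse=True):
        if line.startswith(head):
            return head
    return None

def dednat_lines(lines):
    """Yield (line_number, head, rest) for each line that starts with a head."""
    for n, line in enumerate(lines, start=1):
        head = head_of(line)
        if head is not None:
            yield n, head, line[len(head):]

example = ["\\begin{document}", "%D 2Dx 100 +20 +20", "%:*->*\\to *"]
for n, head, rest in dednat_lines(example):
    print(n, HEADS[head], repr(rest))
```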

Dednat4 processes a TeX file, say, foo-4.tex, and produces an auxiliary TeX file, foo-4.dnt, containing the TeX code to typeset the derivation trees and diagrams of foo-4.tex. Dednat6 does something similar, but the TeX code is usually not saved to a file; instead, it is processed by TeX immediately. Let's look at two examples (in Dednat6 syntax):

User code (a diagram):

```
%D diagram T:F->G
%D 2Dx     100 +20     +20
%D 2D  100     A
%D 2D        / - \
%D 2D       /  |  \
%D 2D      v   v   v
%D 2D  +25 FA  ------> GA
%D 2D             TA
%D (( A FA ->   A GA ->
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))
%D enddiagram
$$\pu \diag{T:F->G}$$
```

LaTeX (generated):

```
$$\defdiag{T:F->G}{
    \morphism(300,0)/->/%
      <-300,-375>[{A}{FA};{}]
    \morphism(300,0)/->/%
      <300,-375>[{A}{GA};{}]
    \morphism(0,-375)|b|/->/%
      <600,0>[{FA}{GA};{TA}]
    \morphism(300,0)/|->/%
      <0,-375>[{A}{\phantom{O}};{}]
  }
  \diag{T:F->G}$$
```

User code (two derivation trees):

```
%:                    P\&Q
%:                    ----
%:             P\&Q     Q
%:             ----     :f
%:  P\&Q          P     R
%: :(P\&)f        -------
%:  P\&R            P\&R
%:
%:  ^t1             ^t2
%:
$$\pu \ded{t1} := \ded{t2}$$
```

LaTeX (generated):

```
$$\defded{t1}{
    \infer*[{(P\&)f}]{ \mathstrut P\&R }{
      \mathstrut P\&Q } }
  \defded{t2}{
    \infer[{}]{ \mathstrut P\&R }{
      \infer[{}]{ \mathstrut P }{
        \mathstrut P\&Q } &
      \infer*[{f}]{ \mathstrut R }{
        \infer[{}]{ \mathstrut Q }{
          \mathstrut P\&Q } } } }
  \ded{t1} := \ded{t2}$$
```

## 4. "\pu": process all dednat code until the current line

The variable tf in dednat6 holds a TexFile object, and it is initialized by this code in LuaLaTeX:

 \directlua{texfile(tex.jobname)} 

If the current .tex file is foo-6.tex then tex.jobname is "foo-6", and this runs:

 tf = TexFile.read("foo-6.tex") 

which does, among other things,

```
tf.lines = splitlines(readfile "foo-6.tex")
tf.nline = 1
```

If LuaLaTeX encounters the command \pu at line 23 of foo-6.tex, then it runs this in Lua:

 tf:processuntil(23) 

Since tf.nline = 1, Dednat6 has not yet processed any dednat lines (the ones beginning with "%D", "%:", "%L", etc.), so it processes everything between lines 1 and 22, and the next chunk will start at line 23. The result, which is typically TeX code containing a series of "\def"s, "\defdiag"s, and "\defded"s, is run at the current point.

To understand this, look again at the examples in section 2: the user code contains high-level dednat blocks, and the generated LaTeX is the corresponding low-level code, in which the "\pu"s have been replaced by the "\defdiag"s and "\defded"s corresponding to the diagrams and trees defined in the "%D" and "%:" lines.

If LuaLaTeX encounters the next \pu in foo-6.tex at line 54, then Dednat6 will process the dednat lines between lines 23 and 53 of foo-6.tex, and LaTeX will run the resulting "\def"s, "\defdiag"s, and "\defded"s.
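The line bookkeeping behind "\pu" can be sketched in a few lines of Python. This is a hypothetical illustration, not Dednat6's Lua code. It reuses the names TexFile, lines, nline, and processuntil from the text, assumes 1-based line numbers, and returns the dednat lines of each chunk instead of turning them into TeX code:

```python
# Hypothetical sketch of the bookkeeping behind \pu (not dednat6's code).
# nline is the first line not yet processed; processuntil(n) handles the
# dednat lines in nline..n-1 and then advances nline to n.
class TexFile:
    HEADS = ("%L", "%R", "%:", "%D")  # "%:*" lines also start with "%:"

    def __init__(self, text):
        self.lines = text.split("\n")
        self.nline = 1

    def processuntil(self, n):
        """Return the dednat lines in [nline, n) and mark them as processed."""
        chunk = []
        for i in range(self.nline, n):
            line = self.lines[i - 1]
            if line.startswith(self.HEADS):
                chunk.append((i, line))
        self.nline = n
        return chunk  # dednat6 would turn these into \def/\defdiag/\defded code


tf = TexFile("\n".join(["%D a", "x", "\\pu", "%: b", "\\pu"]))
print(tf.processuntil(3))  # the \pu at line 3 scans lines 1 and 2
print(tf.processuntil(5))  # the \pu at line 5 scans lines 3 and 4
```

Each call scans only lines that no earlier \pu has seen, so every dednat line is processed exactly once.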

## 5. "output(...)"

The functions from Dednat6 that produce LaTeX code ("\def"s, "\defdiag"s, "\defded"s) use the function output(...), defined here, to send that code to LaTeX for execution. All the tests contain this line:

 \directlua{verbose()} 

It puts output(...) in verbose mode: every piece of code that will be sent to LaTeX is also printed to standard output.

The opposite of verbose() is:

 \directlua{quiet()} 

I am not sure whether this verbose-mode output is also sent to the ".log" file; I think it should go there too.
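A verbose/quiet switch of this kind needs only a flag. The Python below is a hypothetical sketch of the behavior described above. The list `sent` stands in for tex.print, and none of these names are taken from Dednat6's source:

```python
# Hypothetical sketch (not Dednat6's code) of output() with a verbose switch.
# The list `sent` stands in for tex.print, which would hand the code to TeX.
verbose_mode = False
sent = []

def verbose():
    global verbose_mode
    verbose_mode = True

def quiet():
    global verbose_mode
    verbose_mode = False

def output(code):
    """Send TeX code to TeX; in verbose mode, also print it to stdout."""
    if verbose_mode:
        print(code)
    sent.append(code)

verbose()
output("\\defded{t1}{...}")   # printed and sent
quiet()
output("\\defded{t2}{...}")   # only sent
```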

## 6. Special characters

LuaLaTeX is UTF-8-based. This means that we can use UTF-8 chars in our .tex files if we do things like this,

```
\catcode`∀=13 \def∀{\forall}
\catcode`Θ=13 \defΘ{\Theta}
```

but some tricks that I used a lot no longer work. They depended on all characters being one byte long and on all codes between 0 and 255 being valid, including the ranges 1-7, 14-31, and 160-191.
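The byte-level reason is easy to check. This Python snippet is only an illustration and is unrelated to Dednat6's code. It shows that "∀" and "Θ" occupy several bytes in UTF-8, and that a lone byte from the 160-191 range is a valid Latin-1 character but not valid UTF-8:

```python
# Bytes per character in UTF-8 versus the one-byte-per-character assumption.
for ch in ["A", "Θ", "∀"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())

# A byte in 160-191 (0xA0-0xBF) is a character in Latin-1, but in UTF-8 it
# can only be a continuation byte, so on its own it is not valid input.
lone = bytes([0xA0])
print(repr(lone.decode("latin-1")))
try:
    lone.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```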

The red stars ("*"s) in this document and in the page about dednat4 stand for "\^O"s; see this intro, especially the section "Red stars" at the end.

I heard that LuaLaTeX on Windows rejects files with "*"s, but I don't have the means for testing this myself or for finding workarounds.

```
#:*->*\to *
#:*|->*\mapsto *
```

In dednat6, the best way to get the same effect without using "*"s is:

 %L abbrevs:add("->", "\\to ", "|->", "\\mapsto ")

In the tests for dednat6 I am trying to have some that use only ASCII, some that use Latin-1, some that are "pure UTF-8", and a few that use the characters that may be causing problems with LuaLaTeX on Windows.

(...but at the moment very few test files are ready...)

## 7. LuaTeX

Dednat6 uses very little of LuaTeX at the moment - essentially just tex.jobname, tex.inputlineno, tex.print from the Lua side, and \directlua from TeX.

The following hacks were needed. 1) dednat6.lua loads this to make require behave like the require from Lua. 2) Dednat6's output function runs deletecomments to filter out comments before sending code to tex.print. 3) I had to use a

 \catcode`\^^J=10

in the demos - 0.tex, 2.tex, 3.tex - to avoid having newlines become spurious "Ω"s.

My guess is that (2) and (3) are needed because tex.print and \input use different catcode tables. I tried to check the details using this script, which runs Rob Hoelz's lua-repl from LuaLaTeX, but eventually I gave up.

One of the items in my to-do list is to make it easy to load and run lua-repl from dednat6.

(2015sep07: The following sections were copied verbatim from my page about Dednat4 - there are many details in them that need to be updated!)

## 8. A first example

If foo.tex contains:

```
\documentclass{book}
\usepackage{proof}
\def\defded#1#2{\expandafter\def\csname ded-#1\endcsname{#2}}
\def\ded#1{\csname ded-#1\endcsname}
\begin{document}
\input foo.dnt
\def\<{\langle}
\def\>{\rangle}

%:*|->*\mapsto *
%:*->*\to *
%:*\\*\lambda *
%:*:*{:}*
%:
%:        [a,b]^1                        [d:A×B]^1
%:        -------                        ---------
%: [a,b]^1   b   b|->c        [d:A×B]^1   \pi_2d:B   f:B->C
%: -------  -----------       ---------  -------------------
%:    a          c               a:A         f(\pi_2b):C
%:   --------------              -----------------------
%:        a,c                             \:A×C
%:     ---------1             --------------------------------1
%:     a,b|->a,c                   \\d:A×B.\:A×B->A×C
%:
%:     ^Atimes-DNC-notation        ^Atimes-conventional
%:
$$\ded{Atimes-DNC-notation} \qquad \ded{Atimes-conventional}$$

\end{document}
```

then running "dednat4.lua foo.tex" and then "latex foo.tex" will produce this,

because "dednat4.lua foo.tex" creates a file foo.dnt containing this:

```
\defded{Atimes-DNC-notation}{ % (find-fline "foo.tex" 27)
  \infer[{1}]{ \mathstrut a,b\mapsto a,c }{
    \infer{ \mathstrut a,c }{
      \infer{ \mathstrut a }{
        \mathstrut [a,b]^1 } &
      \infer{ \mathstrut c }{
        \infer{ \mathstrut b }{
          \mathstrut [a,b]^1 } &
        \mathstrut b\mapsto c } } } }
\defded{Atimes-conventional}{ % (find-fline "foo.tex" 27)
  \infer[{1}]{ \mathstrut \lambda d{:}A×B.\{:}A×B\to A×C }{
    \infer{ \mathstrut \{:}A×C }{
      \infer{ \mathstrut a{:}A }{
        \mathstrut [d{:}A×B]^1 } &
      \infer{ \mathstrut f(\pi_2b){:}C }{
        \infer{ \mathstrut \pi_2d{:}B }{
          \mathstrut [d{:}A×B]^1 } &
        \mathstrut f{:}B\to C } } } }
```

"\usepackage{proof}" loads Makoto Tatsuta's proof.sty package, which defines \infer. Dednat4's routines for tree output also support Paul Taylor's "proofs" package, and inserting a line like

 %L tex_tree_function = tex_tree_paultaylor 

in foo.tex anywhere before the

 %: ^Atimes-DNC-notation ^Atimes-conventional 

line would make dednat4.lua spit out code for Paul Taylor's package instead.

## 9. Words

A word in dednat4's terminology (and in Forth terminology) is a sequence of non-whitespace characters, delimited by whitespace; the only characters that dednat4 considers as whitespace are " ", TAB, NL and CR (chars 32, 9, 10 and 13 respectively). The characters in the head of a line are removed before splitting it into words.
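The "%:" and "%D" parsers later compare the horizontal positions of words, so it helps to keep each word's columns. Here is a hypothetical Python sketch (not dednat4's code) of word splitting that uses exactly the four whitespace characters listed above:

```python
import re

# Only space, TAB, NL and CR (chars 32, 9, 10, 13) separate words.
WORD = re.compile(r"[^ \t\n\r]+")

def words_with_columns(line, head_len):
    """Split line[head_len:] into (word, first_col, last_col) triples.

    Columns are 0-based and inclusive, counted from the start of `line`,
    so that words on different lines can later be compared by position.
    """
    return [(m.group(), m.start(), m.end() - 1)
            for m in WORD.finditer(line, head_len)]

print(words_with_columns("%: ^Atimes-DNC-notation ^Atimes-conventional", 2))
```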

## 10. Everything about "%:" lines (and abbreviations)

The "%:" lines -- and also the "%D" lines, which we will describe soon -- are processed word by word. Since heads are not part of any word, the line

 %: ^Atimes-DNC-notation ^Atimes-conventional 

has two words, "^Atimes-DNC-notation" and "^Atimes-conventional". In "%:" lines only the words that start with "^" are "active": "^Atimes-conventional" means "process the deduction tree whose root node is two lines above the "^" and output a block of TeX code of the form \defded{Atimes-conventional}{...} -- a definition for a deduction called Atimes-conventional; the definition is invoked with \ded{Atimes-conventional}.

Deduction trees are made of "nodes" and "bars". Both nodes and bars are words. The TeX code for a node is obtained by expanding all the abbreviations in the word (the functions that do that are here). Note that the expansion of an abbreviation can contain spaces -- for example:

 %:*|->*\mapsto *
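The expansion of abbreviations can be sketched as a left-to-right scan. In this hypothetical Python version, the table mirrors the abbreviations defined in the foo.tex example of section 8, and the longest abbreviation starting at each position wins. Longest-match is an assumption of the sketch, chosen so that "|->" is not read as "|" followed by "->":

```python
# Hypothetical sketch of abbreviation expansion (not dednat's code).
ABBREVS = {"->": "\\to ", "|->": "\\mapsto ", ":": "{:}", "\\\\": "\\lambda "}

def expand(word, abbrevs=ABBREVS):
    """Replace, at each position, the longest abbreviation starting there."""
    out, i = [], 0
    longest = max(map(len, abbrevs), default=0)
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in abbrevs:
                out.append(abbrevs[piece])
                i += size
                break
        else:                      # no abbreviation starts here
            out.append(word[i])
            i += 1
    return "".join(out)

print(expand("b|->c"))     # b\mapsto c
print(expand("f:B->C"))    # f{:}B\to C
```

With this table, the word "\\d:A×B.\:A×B->A×C" from the foo.tex example expands to the same text that appears in the generated \defded.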

Bars are always words that start with either a sequence of one or more "-"s, or a sequence of one or more "="s (for double bars). The "rest" of the word of a bar, when it exists, has its abbreviations expanded and the resulting TeX code is typeset at the right of the bar.
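Here is a hypothetical Python sketch of the bar rule above. It splits a bar word into its kind (single or double) and its "rest", and leaves out the abbreviation expansion of the rest:

```python
import re

# A bar is one or more "-" (single) or one or more "=" (double), followed
# by an optional "rest" that is typeset at the right of the bar.
BAR = re.compile(r"(-+|=+)(.*)")

def parse_bar(word):
    """Return (kind, rest) for a bar word, or None if the word is a node."""
    m = BAR.fullmatch(word)
    if m is None:
        return None
    kind = "single" if m.group(1)[0] == "-" else "double"
    return kind, m.group(2)

print(parse_bar("---------1"))           # ('single', '1')
print(parse_bar("===stuffattheright"))   # ('double', 'stuffattheright')
print(parse_bar("b|->c"))                # None
```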

A node can either have a bar above it or have nothing above it; a bar can have any number of nodes above it. Here "above" means "immediately above", and two words are only considered to be one above the other when their horizontal ranges have at least one character in common. In the example below the node "notabove" is not considered to be above the bar.

```
%: abovethebar  alsoabove    notabove
%:   ===stuffattheright
%:      belowthebar
```
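The "immediately above" test reduces to an overlap check on inclusive column ranges. In this hypothetical Python sketch, the ranges are illustrative values for a layout like the one above, where the bar covers the first two words but not "notabove":

```python
# Hypothetical sketch: two words are "one above the other" only when their
# inclusive column ranges (first_col, last_col) share at least one column.
def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Illustrative column ranges for a layout like the one above.
upper_line = {"abovethebar": (3, 13), "alsoabove": (16, 24), "notabove": (29, 36)}
bar = (5, 22)   # "===stuffattheright"
print([w for w, cols in upper_line.items() if overlaps(cols, bar)])
```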

## 11. A second example: categorical diagrams

The part of the source file that starts at this point implements the support for "%D" lines. This part of dednat4 is a front-end for Michael Barr's diagxy package, which in its turn is a front-end for XYpic.

"%D" lines are processed one by one, and each word in them (except the head) is parsed (the code for the parser starts here) and then is executed. And there is a trick: some words advance the input pointer during their execution, and process the text between the old "pos" and the new "pos" in their own ways; usually they either read some words or read everything up to the end of the line.
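The idea of words that move the input pointer can be shown with a tiny Forth-like reader. This Python sketch is hypothetical, and the class and its fields are not Dednat's. Every word is pushed on a stack by default, except "diagram", which reads the next word itself and uses it as the diagram's name:

```python
class Reader:
    """Hypothetical Forth-like reader for the body of one "%D" line."""
    SPACES = " \t\r\n"

    def __init__(self, text):
        self.text, self.pos = text, 0
        self.stack, self.diagram_name = [], None

    def getword(self):
        """Skip whitespace, return the next word and move pos past it."""
        while self.pos < len(self.text) and self.text[self.pos] in self.SPACES:
            self.pos += 1
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] not in self.SPACES:
            self.pos += 1
        return self.text[start:self.pos] or None

    def run(self):
        while (word := self.getword()) is not None:
            if word == "diagram":
                # This word advances the input pointer itself: it consumes
                # the next word and uses it as the name of the diagram.
                self.diagram_name = self.getword()
            else:
                self.stack.append(word)  # default action in this sketch

r = Reader(" diagram T:F->G")
r.run()
print(r.diagram_name, r.stack)
```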

Here is our first example of code with "%D" lines. Suppose that the file foo2.tex contains this:

```
\documentclass{book}
\input diagxy
\def\defdiag#1#2{\expandafter\def\csname diag-#1\endcsname{#2}}
\def\diag#1{\bfig\csname diag-#1\endcsname\efig}
\begin{document}
\input foo2.dnt

%D diagram T:F->G
%D 2Dx     100 +20     +20
%D 2D  100     A
%D 2D        / - \
%D 2D       /  |  \
%D 2D      v   v   v
%D 2D  +25 FA  ------> GA
%D 2D             TA
%L PP(nodes)   -- Lua code: dump the table nodes'
%D (( A FA ->   A GA ->
%L PP(ds)      -- Lua code: dump the table ds'
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))
%D enddiagram

$$\diag{T:F->G}$$

\end{document}
```

The first word parsed is "diagram". When it is executed it reads the next word, "T:F->G", sets the name of the current diagram to that, clears the tables that hold the catalog of known nodes and arrows, and does a few other things; its code is here and here.

## 11.1. The 2D grid

Both "2Dx" and "2D" are words that parse everything up to the end of the line and interpret what they read in their own ways, as a grid with coordinates and nodes. Only columns that are below the first character of a number in the "2Dx" line have a horizontal coordinate. Only lines that start with a number have a vertical coordinate; these "numbers" can start with "+", which means "the previous value plus this". In the grid in foo2.tex only these six positions, marked as a', b', c', d', e', and f' below, have both a horizontal and a vertical coordinate; e' has coordinates (140, 125).

```
%D 2Dx     100 +20     +20
%D 2D  100 a   b       c
%D 2D
%D 2D
%D 2D
%D 2D
%D 2D  +25 c   d       e
%D 2D
```

Some words in the grid in foo2.tex (namely "A", "FA", "------>", and "GA") are over positions with both coordinates, and those words become names of nodes with those coordinates. Most of the things that we drew on the grid are just "decorations" that "2D" ignores; they are there only to make the ASCII diagram look like the real diagram. Note that "------>", which in a sense is just a decoration, is not ignored: it becomes a node. However, unused nodes don't appear in the picture and don't generate TeX code, so we can ignore it.

After the grid in foo2.tex there's a Lua line with a command to dump the array of nodes; the source for PP is here, and the result (that is printed to stdout) is this, modulo whitespace:

```
{1={"noden"=1, "tag"="A", "x"=120, "y"=100},
 2={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 3={"noden"=3, "tag"="------>", "x"=120, "y"=125},
 4={"noden"=4, "tag"="GA", "x"=140, "y"=125},
 "------>"={"noden"=3, "tag"="------>", "x"=120, "y"=125},
 "A"={"noden"=1, "tag"="A", "x"=120, "y"=100},
 "FA"={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 "GA"={"noden"=4, "tag"="GA", "x"=140, "y"=125}
}
```

Note that entries in that table can be accessed either by a numeric id (the "noden") or by the name of the node ("tag"); some of the subtables are shared -- nodes[1] = nodes["A"] -- but the output of PP doesn't make that explicit.

There's also a table called "arrows", but at this point it is empty.
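The grid rules of this section can be tested with a short Python sketch. It is a hypothetical reimplementation, not dednat's code. It assumes that a word becomes a node exactly when its first character sits in a column that has a horizontal coordinate, on a line that has a vertical one. It stores each node under both its number and its tag, sharing one record, as in the dump above. The grid in the example is one layout consistent with the node coordinates shown in that dump:

```python
import re

NUMBER = re.compile(r"\+?\d+")
WORD = re.compile(r"\S+")

def parse_grid(lines):
    """Return {noden: node, tag: node} for words that have both coordinates."""
    xs, nodes, y, n = {}, {}, None, 0
    for line in lines:
        words = [(m.start(), m.group()) for m in WORD.finditer(line)]
        if len(words) < 2:
            continue
        kind, rest = words[1][1], words[2:]      # words[0] is the "%D" head
        if kind == "2Dx":
            x = None
            for col, w in rest:                  # "+20" = previous x plus 20
                x = x + int(w[1:]) if w.startswith("+") else int(w)
                xs[col] = x
        elif kind == "2D" and rest and NUMBER.fullmatch(rest[0][1]):
            num = rest[0][1]
            y = y + int(num[1:]) if num.startswith("+") else int(num)
            for col, w in rest[1:]:
                if col in xs:                    # word starts on an x column
                    n += 1
                    node = {"noden": n, "tag": w, "x": xs[col], "y": y}
                    nodes[n] = nodes[w] = node   # one shared record
    return nodes

GRID = [
    "%D 2Dx     100 +20     +20",
    "%D 2D  100     A",
    "%D 2D        / - \\",
    "%D 2D  +25 FA  ------> GA",
    "%D 2D             TA",
]
nodes = parse_grid(GRID)
for key in ("A", "FA", "------>", "GA"):
    print(key, nodes[key]["x"], nodes[key]["y"])
```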

## 11.2. Building arrows

After that comes this code:

```
%D (( A FA ->   A GA ->
%L PP(ds)      -- Lua code: dump the table ds'
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))
```

Dednat4 has a data stack ("ds"; we will see a dump of it soon), like Forth; it doesn't have a "return stack" like the one in Forth, as we don't need subroutines in the obvious sense of the term, at least not in the kernel; it's easy to define new words in Lua, and usually that's enough.

"((" pushes a value onto an auxiliary stack, called "depths", to remember how deep the data stack is at that point. In the next lines we will put many new objects on the data stack, and "))" will get rid of all of them: it drops everything above the stored depth. "((" and "))" help keep the data stack tidy.

"A" and "FA" put two nodes on the data stack; "->" (the words for arrows are defined here) creates a new arrow, going from "A" to "FA", and puts it both on the data stack (after "A" and "FA") and on the list of arrows; the main thing that "enddiagram" does is to output TeX code for all the defined arrows, i.e., to draw them. This can only be done at the end, because some words modify attributes of arrows: the code ".plabel= b TA", a few lines after that, adds a "label" and a "position" to the arrow at the top of the stack: the text of the label is "TA", and it is to be TeXed below the arrow.

When "PP(ds)" (in Lua) dumps the data stack what we see is this, modulo whitespace:

```
{1={"arrown"=2, "from"=1, "shape"="->", "to"=4},
 2={"noden"=4, "tag"="GA", "x"=140, "y"=125},
 3={"noden"=1, "tag"="A", "x"=120, "y"=100},
 4={"arrown"=1, "from"=1, "shape"="->", "to"=2},
 5={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 6={"noden"=1, "tag"="A", "x"=120, "y"=100}
}
```

A full description of the "node" and "arrow" structures can be found here; arrows have many optional fields. Note that the top of the stack is ds[1].
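Putting the words of this section together, here is a hypothetical Python sketch that replays "(( A FA -> A GA -> FA GA -> .plabel= b TA ))". Names and data layout are illustrative, not dednat's Lua code. The fields arrown, from, shape, and to follow the dump above, while label and position follow the prose. Unlike dednat, the sketch keeps the top of the stack at the end of a Python list:

```python
# Hypothetical sketch: "((", arrow words, ".plabel=" and "))" on a data stack.
# The top of the stack is the END of this list (dednat's dumps use ds[1]).
ds, depths, arrows = [], [], []
NODES = {"A": {"noden": 1}, "FA": {"noden": 2}, "GA": {"noden": 4}}

def open_paren():              # "((": remember the current depth
    depths.append(len(ds))

def close_paren():             # "))": drop everything pushed since "(("
    del ds[depths.pop():]

def push(tag):                 # a node name: push that node
    ds.append(NODES[tag])

def arrow(shape):              # "->": arrow from 2nd item to the top item
    new = {"arrown": len(arrows) + 1, "from": ds[-2]["noden"],
           "shape": shape, "to": ds[-1]["noden"]}
    arrows.append(new)         # kept for "enddiagram"
    ds.append(new)             # and left on the stack for ".plabel="

def plabel(position, label):   # ".plabel= b TA": decorate the top arrow
    ds[-1]["position"], ds[-1]["label"] = position, label

open_paren()
push("A"); push("FA"); arrow("->")
push("A"); push("GA"); arrow("->")
push("FA"); push("GA"); arrow("->"); plabel("b", "TA")
print(len(ds))                 # 9 items before "))"
close_paren()
print(len(ds), len(arrows))    # the stack is empty again; 3 arrows remain
```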

The only other new thing in this diagram is "midpoint". It is defined here, and it takes the two nodes at the top of the stack and replaces them (on the stack only!) by a new node, lying halfway between them.
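Here is a hypothetical Python sketch of "midpoint". Nodes are plain dictionaries, and the top of the stack is the end of the list:

```python
# Hypothetical sketch of "midpoint": pop the two nodes at the top of the
# stack and push one new node halfway between them. The new node lives
# only on the stack, not in the table of named nodes.
def midpoint(ds):
    b, a = ds.pop(), ds.pop()
    ds.append({"x": (a["x"] + b["x"]) / 2, "y": (a["y"] + b["y"]) / 2})

ds = [{"tag": "A", "x": 120, "y": 100},
      {"tag": "FA", "x": 100, "y": 125},
      {"tag": "GA", "x": 140, "y": 125}]
midpoint(ds)          # as in "A FA GA midpoint"
print(ds)
```

After this, "|->" sees A and the new node as the two top elements. That is how the mapsto arrow in the example ends at the middle of the bottom edge.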

We have already described "))"; in this case it makes the depth of ds go back to zero. After it "enddiagram" appends this to foo2.dnt,

```
\defdiag{T:F->G}{   % (find-fline "foo2.tex" 9)
  \morphism(300,0)/->/<-300,-375>[{A}{FA};{}]
  \morphism(300,0)/->/<300,-375>[{A}{GA};{}]
  \morphism(0,-375)|b|/->/<600,0>[{FA}{GA};{TA}]
  \morphism(300,0)/|->/<0,-375>[{A}{\phantom{O}};{}]
  }
```

And TeX typesets it into this:
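The numbers in the \morphism lines above can be related to the grid coordinates of section 11.1. The Python sketch below is hypothetical and reverse-engineered from this one example. A scale factor of 15 and an origin at (100, 100), with the y axis flipped, reproduce all four \morphism lines, but they are inferences, not values taken from dednat's source:

```python
# Hypothetical sketch: from grid coordinates to diagxy's
#   \morphism(x,y)|placement|/shape/<dx,dy>[{from}{to};{label}]
# SCALE and the origin are inferred from the example output, not taken
# from dednat's source. TeX's y axis points up, the grid's points down.
SCALE, X0, Y0 = 15, 100, 100

def to_tex(x, y):
    return (x - X0) * SCALE, -(y - Y0) * SCALE

def morphism(src, dst, shape="->", label="", placement=None):
    (x1, y1), (x2, y2) = to_tex(*src["xy"]), to_tex(*dst["xy"])
    p = f"|{placement}|" if placement else ""
    return (f"\\morphism({x1},{y1}){p}/{shape}/<{x2 - x1},{y2 - y1}>"
            f"[{{{src['tex']}}}{{{dst['tex']}}};{{{label}}}]")

A = {"xy": (120, 100), "tex": "A"}
FA = {"xy": (100, 125), "tex": "FA"}
GA = {"xy": (140, 125), "tex": "GA"}
MID = {"xy": (120, 125), "tex": "\\phantom{O}"}   # midpoint of FA and GA

print(morphism(A, FA))
print(morphism(A, GA))
print(morphism(FA, GA, label="TA", placement="b"))
print(morphism(A, MID, shape="|->"))
```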

## 11.3. More tricks for 2D diagrams

(Describe how to use "@" to refer to the elements pushed on the stack after the last "((", how to use ".tex" and ".TeX" to have several nodes with the same TeX text; also: other "shapes" of floating arrows (=>, for example), "place" and the pullback symbol; discuss the source code of the BCC diagram below; discuss the extensions in experimental.lua)

## 11.4. A bigger diagram

The diagram below - the Beck-Chevalley condition in a certain notation -

was produced by this code:

 %D diagram LCCC-BCC %D 2Dx 100 +30 +25 +30 %D 2D 100 {}d ===============> c,d{} %D 2D - /\ - ^ %D 2D | \\ |-> | |\BCC %D 2D v \\ v - %D 2D +20 {}c,d <=\\=========== c,d{}{} %D 2D /\ \\ /\ %D 2D +10 \\ d ===============> c,d{{}} %D 2D \\ - \\ - %D 2D \\ | <-| \\ |\id %D 2D \\ v \\ v %D 2D +20 c,d <============== c,d{{}}{} %D 2D %D 2D +10 a,b,c |----------> a,b %D 2D - _| - %D 2D \ \ %D 2D v v %D 2D +35 a,c |--------------> a %D 2D %D (( {}d c,d{} # 0 1 %D {}c,d c,d{}{} # 2 3 %D d c,d{{}} # 4 5 %D c,d c,d{{}}{} # 6 7 %D @ 0 @ 1 => %D @ 0 @ 2 |-> @ 1 @ 3 |-> sl_ .plabel= l \natural %D @ 1 @ 3 <-| sl^ .plabel= r \mathrm{BCC} %D @ 0 @ 3 harrownodes nil 20 nil |-> %D @ 2 @ 3 <= %D @ 0 @ 4 <= @ 2 @ 6 <= @ 3 @ 7 <= %D @ 0 @ 2 midpoint @ 4 @ 6 midpoint dharrownodes nil 14 nil <-| %D @ 4 @ 5 => @ 4 @ 6 |-> @ 5 @ 7 |-> .plabel= r \mathrm{id} %D @ 4 @ 7 harrownodes nil 20 nil <-| %D @ 6 @ 7 <= %D )) %D (( a,b,c a,b %D a,c a %D @ 0 @ 1 |-> @ 0 @ 2 |-> @ 1 @ 3 |-> @ 2 @ 3 |-> %D @ 0 relplace 15 7 \pbsymbol{7} %D )) `