Dednat6: an extensible (semi-)preprocessor for LuaLaTeX that understands diagrams in ASCII art

2019may21: Dednat6 now has a git repository (with instructions!)
2019may15: A new package of Dednat6 is ready!
You can download it from the first link below.
Note that "dednat6.zip" always points to the latest version.


The .zip includes copies of these PDFs:

http://angg.twu.net/LATEX/2018tug-dednat6.pdf  (slides, TUG2018)
http://angg.twu.net/LATEX/2018tugboat-rev1.pdf (TUGBoat article) ***
http://angg.twu.net/LATEX/2018dednat6-extras.pdf  (extra features)

The source for the minimal test file is here (with hyperlinks):


The rest of this page needs lots of updates.
Many things changed after I gave a talk at TUG2018.
The current code for dednat6 is here: http://angg.twu.net/LATEX/dednat6/.
There are older, obsolete versions at http://angg.twu.net/dednat6/
and at http://angg.twu.net/dednat4/ and http://angg.twu.net/dednat/.

Recent questions (by me) on mailing lists:
2019-01-16: Inspecting TeX tokens from a Lua REPL
2019-01-19: Doubts about syntax: can <cs> and <pos> expand to <empty>?

1. A quick start guide for beginners

If you use TeXstudio, TeXworks or something similar, then start with this.

  1. Download and unpack dednat6.zip
  2. Edit 2018dednat6-minimal.tex
  3. Set your editor font to some monospaced font
  4. Set your compiler to lualatex
  5. Compile 2018dednat6-minimal.tex and view the PDF
  6. Make small changes to the deduction tree in ascii art, recompile, view
  7. Read the TUGBoat article.

2. Testing

Here is a way to test dednat6 on a *NIX system: we download the .zip, delete the PDF files and recompile (i.e., re-tex) everything. We compile "2018dednat6-no-lua" and "2018dednat6-preproc" twice: once in a directory with access to the Lua files of dednat6, and a second time in a directory without them but with access to the ".dnt" file.

rm -Rfv /tmp/dn6-test/
mkdir   /tmp/dn6-test/
cd      /tmp/dn6-test/
wget http://angg.twu.net/dednat6.zip
unzip dednat6.zip
rm -v *.pdf
lualatex 2018dednat6-minimal.tex
lualatex 2018tugboat-rev1.tex
lualatex 2018tugboat-rev1.tex
lualatex 2018tug-dednat6.tex
lualatex 2018tug-dednat6.tex
lualatex 2018dednat6-extras.tex
lualatex 2018dednat6-extras.tex

lualatex 2018dednat6-no-lua.tex
mkdir no-lua/
cd    no-lua/
cp -v ../2018dednat6-no-lua.tex ../2018dednat6-no-lua.dnt .
pdflatex 2018dednat6-no-lua.tex
cd ..

./dednat6load.lua -4 2018dednat6-preproc.tex
pdflatex             2018dednat6-preproc.tex
mkdir preproc/
cd    preproc/
cp -v ../2018dednat6-preproc.tex ../2018dednat6-preproc.dnt .
pdflatex 2018dednat6-preproc.tex
cd ..

Another way is to unpack only the source code of dednat6 - i.e., dednat6/* and dednat6load.lua - and 2018dednat6-minimal.tex, and run lualatex on 2018dednat6-minimal.tex:

rm -Rfv /tmp/dn6-test-min/
mkdir   /tmp/dn6-test-min/
cd      /tmp/dn6-test-min/
wget http://angg.twu.net/dednat6.zip
unzip dednat6.zip "dednat6/**" dednat6load.lua 2018dednat6-minimal.tex
lualatex 2018dednat6-minimal.tex

Note that the .zip includes lots of strange files - that's because I use flsfiles.lua to produce it. They are needed to compile some of the PDFs.

3. Dednat4 vs. Dednat6

Note: Dednat4 and Dednat6 are very similar. Dednat4 is easier to explain, because it is just a preprocessor that we have to run like this:

dednat4 foo-4.tex
latex   foo-4.tex

Dednat6, in contrast, is easier to use - we just need this:

lualatex foo-6.tex

In Dednat4 all the Lua code is run before running LaTeX; in Dednat6, Lua is run from LuaLaTeX (with the command "\pu") to process chunks of foo-6.tex bit by bit. This is explained in the TUGBoat article, in section 3 ("semi-preprocessors"). To generate a .dnt file with dednat6, see the next section.

4. Producing a .tex/.dnt pair that doesn't need LuaLaTeX

The file 2018dednat6-no-lua.tex in the package shows how to use dednat6 in situations where you have to generate code that compiles with just pdflatex, without lualatex - for example, when you need to produce LaTeX code acceptable to arXiv (without dirty tricks). To test 2018dednat6-no-lua.tex, run this:

rm -Rfv /tmp/dn6-test-no-lua/
mkdir   /tmp/dn6-test-no-lua/
cd      /tmp/dn6-test-no-lua/
wget http://angg.twu.net/dednat6.zip
unzip dednat6.zip "dednat6/**" dednat6load.lua 2018dednat6-no-lua.tex
lualatex 2018dednat6-no-lua.tex
mkdir no-lua/
cd    no-lua/
cp -v ../2018dednat6-no-lua.tex ../2018dednat6-no-lua.dnt .
pdflatex 2018dednat6-no-lua.tex
xpdf     2018dednat6-no-lua.pdf

The line "lualatex 2018dednat6-no-lua.tex" generates a .dnt file; the commands after that create a directory with just the .tex and the .dnt, and compile the .tex with pdflatex.

A .tex file that supports being compiled in this way has this structure:

\input diagxy
...
\usepackage{ifluatex}
\ifluatex
  \directlua{dofile "dednat6load.lua"}
\else
  ...
\fi
...
%L write_dnt_file()
\pu

Note the "\usepackage{ifluatex}", the "\ifluatex / \else / \fi" block, and the "%L write_dnt_file()" followed by a "\pu".

5. Main idea: heads

(Lua)(La)TeX treats lines starting with "%" as comments, and ignores them. This means that we can put anything we want in these "%" lines - even code to be processed by other programs besides *TeX.

Dednat4/6 read TeX files and pay attention only to the lines that begin with some special sequences of characters (called "heads"), all starting with "%":

Head   Interpreted as
%L     Lua code
%R     Lua code with rectangles
%:*    define abbreviations
%:     derivation trees (two-dimensional)
%D     definitions of diagrams (in a stack language)
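
As a rough illustration of the table above, here is a Python sketch (not dednat6's own Lua code) of how a line can be classified by its head. The names `HEADS` and `head_of` are hypothetical, and the sketch assumes that a head is recognized by a plain prefix test; the one point it makes is that "%:*" has to be tried before "%:", because the shorter head is a prefix of the longer one.

```python
# Hypothetical sketch: classify a line of a .tex file by its dednat "head".
# Longer heads come first so that "%:*" is not mistaken for "%:".
HEADS = ["%:*", "%L", "%R", "%:", "%D"]

def head_of(line):
    """Return the head that `line` starts with, or None for ordinary lines."""
    for head in HEADS:
        if line.startswith(head):
            return head
    return None

samples = ["%L PP(nodes)", "%:*->*\\to *", "%:  P\\&Q", "% a plain comment"]
for line in samples:
    print(repr(head_of(line)), line)
```

Lines for which `head_of` returns None are left for TeX alone: either TeX code, or comments that no dednat head claims.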

Dednat4 processes a TeX file, say, foo-4.tex, and produces an auxiliary TeX file, foo-4.dnt, containing the TeX code to typeset the derivation trees and diagrams of foo-4.tex. Dednat6 does something similar, but the TeX code is usually not saved to a file; instead, it is processed by TeX immediately. Let's look at two examples (in Dednat6 syntax):

Example 1, user code (left column):

%D diagram T:F->G
%D 2Dx    100   +20  +20
%D 2D 100       A
%D 2D         / - \
%D 2D        /  |  \
%D 2D       v   v   v
%D 2D +25 FA ------> GA
%D 2D          TA
%D (( A FA -> A GA ->
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))
%D enddiagram
$$\pu \diag{T:F->G}$$

Example 1, result (right column): [image: simple 2D diagram]

Example 2, user code (left column):

%:                   P\&Q
%:                   ----
%:             P\&Q   Q
%:             ----   :f
%:  P\&Q        P     R
%:    :(P\&)f   -------
%:  P\&R          P\&R
%:  ^t1           ^t2
$$\pu \ded{t1} := \ded{t2}$$

Example 2, LaTeX generated (middle column):

\defded{t1}{
 \infer*[{(P\&)f}]{ \mathstrut P\&R }{
  \mathstrut P\&Q } }
\defded{t2}{
 \infer[{}]{ \mathstrut P\&R }{
  \infer[{}]{ \mathstrut P }{
   \mathstrut P\&Q } &
  \infer*[{f}]{ \mathstrut R }{
   \infer[{}]{ \mathstrut Q }{
    \mathstrut P\&Q } } } }
\ded{t1} := \ded{t2}

Example 2, result (right column): [image: simple 2D diagram]

6. "\pu": process all dednat code until the current line

The variable tf in dednat6 holds a TexFile object, and it is initialized by this code in LuaLaTeX:


If the current .tex file is foo-6.tex then tex.jobname is "foo-6", and this runs:

tf = TexFile.read("foo-6.tex")

which does, among other things,

tf.lines = splitlines(readfile "foo-6.tex")
tf.nline = 1

If LuaLaTeX encounters the command \pu at line 23 of foo-6.tex, then it runs this, in Lua:


As tf.nline = 1, this means that Dednat6 has not processed any dednat lines - the ones beginning with "%D", "%:", "%L", etc - yet; Dednat6 processes everything between lines 1 and 22, and the result, which typically is some TeX code containing a series of "\def"s, "\defdiag"s, and "\defded"s, is run at the current point.

To understand this, take another look at the table in section 5: its left column contains high-level code with dednat blocks, and its middle column contains the corresponding low-level code, in which the "\pu"s have been replaced by the "\defdiag"s and "\defded"s for the diagrams and trees defined in the "%D" and "%:" lines.

If LuaLaTeX encounters the next \pu in foo-6.tex at line 54, then Dednat6 will process the dednat lines between lines 23 and 53 of foo-6.tex, and LaTeX will run the resulting "\def"s, "\defdiag"s, and "\defded"s.
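
The bookkeeping described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the Lua code of dednat6: the class name `TexFileSketch` and the method `process_until` are made up, and "processing" is reduced to collecting the dednat lines of the current chunk. Only the fields `lines` and `nline` come from the text above.

```python
# Hypothetical sketch of what a "\pu" does; only `lines` and `nline`
# are names taken from the text, everything else is made up.
HEADS = ("%L", "%R", "%:", "%D")        # "%:*" also starts with "%:"

class TexFileSketch:
    def __init__(self, text):
        self.lines = text.split("\n")   # self.lines[0] is line 1 of the file
        self.nline = 1                  # first line that was not processed yet

    def process_until(self, puline):
        """Collect the dednat lines in nline..puline-1, then advance nline."""
        chunk = [(n, self.lines[n - 1])
                 for n in range(self.nline, puline)
                 if self.lines[n - 1].startswith(HEADS)]
        self.nline = puline
        return chunk

# A fake 60-line file with dednat lines at lines 5, 10 and 30.
text = "\n".join("%: tree line" if n in (5, 10, 30) else "plain TeX"
                 for n in range(1, 61))
tf = TexFileSketch(text)
first = tf.process_until(23)    # a \pu at line 23 sees lines 1..22
second = tf.process_until(54)   # the next \pu, at line 54, sees lines 23..53
print([n for n, _ in first], [n for n, _ in second])
```

In the real program each chunk is turned into "\def"s, "\defdiag"s and "\defded"s that are handed back to TeX at the position of the "\pu"; here the chunk is just returned.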

7. "output(...)"

The functions from Dednat6 that produce LaTeX code - "\def"s, "\defdiag"s, "\defded"s - use the function output(...) to send that code to LaTeX to be executed. In all the tests we have this:


This makes "output(...)" verbose, i.e., it always prints to the standard output the defs that will be sent to LaTeX.

The opposite of verbose() is:


I am not sure whether this verbose-mode output is also sent to the ".log" file; I think it should go there too.

8. Special characters

LuaLaTeX is UTF-8-based. This means that we can use UTF-8 chars in our .tex files if we do things like this,

\catcode`∀=13 \def∀{\forall}
\catcode`Θ=13 \defΘ{\Theta}

but some tricks that I used a lot do not work: they depended on all characters being 1-byte long and all codes between 0 and 255 being valid, including the ranges 1-7, 14-31, and 160-191.

The red stars ("*"s) in this document and in the page about dednat4 stand for "\^O"s; see this intro, especially the section "Red stars" at the end.

I heard that LuaLaTeX on Windows rejects files with "*"s, but I don't have the means for testing this myself or for finding workarounds.

In dednat4, the standard way of adding "abbreviations" was this:

%:*->*\to *
%:*|->*\mapsto *

In dednat6 the best way to do the equivalent - without using "*"s - is:

%L abbrevs:add("->", "\\to ", "|->", "\\mapsto ")

In the tests for dednat6 I am trying to have some tests that use only ascii, some other ones that are latin-1, some that are "pure UTF-8", and a few tests that use the characters that may be causing problems with LuaLaTeX on Windows.

(...but at the moment very few test files are ready...)

9. LuaTeX

Dednat6 uses very little of LuaTeX at the moment - essentially just tex.jobname, tex.inputlineno, tex.print from the Lua side, and \directlua from TeX.

The following hacks were needed. 1) dednat6.lua loads this to make require behave like the require from Lua. 2) Dednat6's output function runs deletecomments to filter out comments before sending code to tex.print. 3) I had to use a


in the demos - 0.tex, 2.tex, 3.tex - to avoid having newlines become spurious "Ω"s.

My guess is that (2) and (3) are needed because tex.print and \input use different catcode tables. At one point I tried to check the details of this using this script to run Rob Hoelz's lua-repl from LuaLaTeX, but at some point I gave up.

One of the items in my to-do list is to make it easy to load and run lua-repl from dednat6.

(2015sep07: The following sections were copied verbatim from my page about Dednat4 - there are many details in them that need to be updated!)

10. A first example

If foo.tex contains:

\def\defded#1#2{\expandafter\def\csname ded-#1\endcsname{#2}}
\def\ded#1{\csname ded-#1\endcsname}

\input foo.dnt

%:*|->*\mapsto *
%:*->*\to *
%:*\\*\lambda *
%:          [a,b]^1	                   [d:A×B]^1             
%:          -------	   	           ---------             
%:  [a,b]^1    b     b|->c   	[d:A×B]^1  \pi_2d:B     f:B->C   
%:  -------    -----------   	---------  -------------------   
%:     a            c	 	  a:A         f(\pi_2b):C        
%:     --------------	 	  -----------------------        
%:         a,c		 	    \<a,f(\pi_2b)\>:A×C          
%:       ---------1	 	--------------------------------1
%:       a,b|->a,c	 	\\d:A×B.\<a,f(\pi_2b)\>:A×B->A×C  
%:       ^Atimes-DNC-notation	 ^Atimes-conventional            
$$\ded{Atimes-DNC-notation} \qquad \ded{Atimes-conventional}$$


then running "dednat4.lua foo.tex" and then "latex foo.tex" will produce this,

because "dednat4.lua foo.tex" creates a file foo.dnt containing this:

\defded{Atimes-DNC-notation}{    % (find-fline "foo.tex" 27)
 \infer[{1}]{ \mathstrut a,b\mapsto a,c }{
  \infer{ \mathstrut a,c }{
   \infer{ \mathstrut a }{
    \mathstrut [a,b]^1 } &
   \infer{ \mathstrut c }{
    \infer{ \mathstrut b }{
     \mathstrut [a,b]^1 } &
    \mathstrut b\mapsto c } } } }

\defded{Atimes-conventional}{    % (find-fline "foo.tex" 27)
 \infer[{1}]{ \mathstrut \lambda d{:}A×B.\<a,f(\pi_2b)\>{:}A×B\to A×C }{
  \infer{ \mathstrut \<a,f(\pi_2b)\>{:}A×C }{
   \infer{ \mathstrut a{:}A }{
    \mathstrut [d{:}A×B]^1 } &
   \infer{ \mathstrut f(\pi_2b){:}C }{
    \infer{ \mathstrut \pi_2d{:}B }{
     \mathstrut [d{:}A×B]^1 } &
    \mathstrut f{:}B\to C } } } }

"\usepackage{proof}" loads Makoto Tatsuta's proof.sty package, which defines \infer. Dednat4's routines for tree output also support Paul Taylor's "proofs" package, and inserting a line like

%L tex_tree_function = tex_tree_paultaylor

in foo.tex anywhere before the

%:       ^Atimes-DNC-notation	 ^Atimes-conventional

line would make dednat4.lua spit out code for Paul Taylor's package instead.

11. Words

A word in dednat4's terminology (and in Forth terminology) is a sequence of non-whitespace characters, delimited by whitespace; the only characters that dednat4 considers as whitespace are " ", TAB, NL and CR (chars 32, 9, 10 and 13 respectively). The characters in the head of a line are removed before splitting it into words.
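
A minimal Python sketch of this splitting rule follows; it is not the code of dednat4, and the names `WORD` and `words_of` are hypothetical. The sketch keeps the column where each word starts, counted from the beginning of the full line, on the assumption that the position of a word matters later, when words on different lines are compared.

```python
import re

# Whitespace is only space, TAB, NL and CR; everything else is part of a word.
WORD = re.compile(r"[^ \t\n\r]+")

def words_of(line, head):
    """Split a dednat line into (column, word) pairs, skipping the head."""
    assert line.startswith(head)
    return [(m.start(), m.group()) for m in WORD.finditer(line, len(head))]

line = "%:       ^Atimes-DNC-notation    ^Atimes-conventional"
print([word for _, word in words_of(line, "%:")])
```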

12. Everything about "%:" lines (and abbreviations)

The "%:" lines -- and also the "%D" lines, that we will describe soon -- are processed word by word. As heads don't count to form words, the line

%:       ^Atimes-DNC-notation    ^Atimes-conventional

has two words, "^Atimes-DNC-notation" and "^Atimes-conventional". In "%:" lines only the words that start with "^" are "active": "^Atimes-conventional" means "process the deduction tree whose root node is on the line just above the "^" and output a block of TeX code of the form \defded{Atimes-conventional}{...}", i.e., a definition for a deduction called Atimes-conventional; the definition is invoked with \ded{Atimes-conventional}.

Deduction trees are made of "nodes" and "bars". Both nodes and bars are words. The TeX code for a node is obtained by expanding all the abbreviations in the word. Note that the expansion of an abbreviation can contain spaces -- for example:

%:*|->*\mapsto *
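
Here is a Python sketch of abbreviation expansion, under two assumptions that the text does not confirm: that the fields of a "%:*" line are simply separated by "*", and that at each position of a word the longest matching abbreviation wins. The function names `parse_abbrev_line` and `expand` are hypothetical. Preferring the longest match is what keeps "|->" from being read as "|" followed by "->".

```python
def parse_abbrev_line(line):
    """Turn a line like '%:*|->*\\mapsto *' into ('|->', '\\mapsto ')."""
    fields = line[len("%:"):].split("*")    # ['', abbreviation, expansion, '']
    return fields[1], fields[2]

def expand(word, abbrevs):
    """Expand abbreviations left to right, preferring the longest match."""
    keys = sorted(abbrevs, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(abbrevs[key])
                i += len(key)
                break
        else:                               # no abbreviation starts here
            out.append(word[i])
            i += 1
    return "".join(out)

abbrevs = dict(parse_abbrev_line(line)
               for line in [r"%:*|->*\mapsto *", r"%:*->*\to *"])
print(expand("a,b|->a,c", abbrevs))         # prints a,b\mapsto a,c
```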

Bars are always words that start with either a sequence of one or more "-"s, or a sequence of one or more "="s (for double bars). The "rest" of the word of a bar, when it exists, has its abbreviations expanded and the resulting TeX code is typeset at the right of the bar.

A node can either have a bar above it or have nothing above it; a bar can have any number of nodes above it. Here "above" means "immediately above", and two words are only considered to be one above the other when their horizontal ranges have at least one character in common. In the example below the node "notabove" is not considered to be above the bar.

%: abovethebar    alsoabove    notabove
%:           ===stuffattheright
%:                            belowthebar
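
The "above" test can be sketched in Python as an overlap test between column ranges. This is an illustration only, with hypothetical names (`spans`, `overlaps`), applied to the first two lines of the example above.

```python
import re

WORD = re.compile(r"[^ \t\n\r]+")

def spans(line, head="%:"):
    """Map each word of a "%:" line to its (first, last) column."""
    return {m.group(): (m.start(), m.end() - 1)
            for m in WORD.finditer(line, len(head))}

def overlaps(a, b):
    """True when two (first, last) column ranges share at least one column."""
    return a[0] <= b[1] and b[0] <= a[1]

upper = spans("%: abovethebar    alsoabove    notabove")
lower = spans("%:           ===stuffattheright")
bar = lower["===stuffattheright"]
for word, span in upper.items():
    print(word, overlaps(span, bar))
```

With these two lines the sketch reports "abovethebar" and "alsoabove" as above the bar and "notabove" as not above it; "abovethebar" qualifies by a single shared column.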

13. A second example: categorical diagrams

The support for "%D" lines is a separate part of dednat4: a front-end for Michael Barr's diagxy package, which in turn is a front-end for XYpic.

"%D" lines are processed one by one; each word in them (except the head) is parsed and then executed. And there is a trick: some words advance the input pointer during their execution, and process the text between the old "pos" and the new "pos" in their own ways; usually they either read some words or read everything up to the end of the line.

Here is our first example of code with "%D" lines. Suppose that the file foo2.tex contains this:

\input diagxy
\def\defdiag#1#2{\expandafter\def\csname diag-#1\endcsname{#2}}
\def\diag#1{\bfig\csname diag-#1\endcsname\efig}

\input foo2.dnt

%D diagram T:F->G
%D 2Dx    100   +20  +20
%D 2D 100       A
%D 2D         / - \  
%D 2D        /  |  \
%D 2D       v   v   v
%D 2D +25 FA ------> GA
%D 2D          TA
%L PP(nodes)                 -- Lua code: dump the table `nodes'
%D (( A FA -> A GA ->
%L    PP(ds)                 -- Lua code: dump the table `ds'
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))
%D enddiagram


The first word parsed is "diagram". When it is executed it reads the next word, "T:F->G", sets the name of the current diagram to that, clears the tables that hold the catalog of known nodes and arrows, and does a few other things.

13.1. The 2D grid

Both "2Dx" and "2D" are words that parse everything to the end of the line and treat what they read in their own ways - as a grid with coordinates and nodes. Only columns that are below the first character of a number in the "2Dx" line have a horizontal coordinate; only lines that start with a number have a vertical coordinate (these "numbers" can start with "+", which means "the previous value plus this"). In the grid in foo2.tex only these six positions, marked as `a', `b', `c', `d', `e', and `f' below, have both a horizontal and a vertical coordinate; `f' has coordinates (140, 125).

%D 2Dx    100   +20  +20
%D 2D 100 a     b    c
%D 2D
%D 2D
%D 2D
%D 2D +25 d     e    f
%D 2D

Some words in the grid in foo2.tex -- namely, "A", "FA", "------>", and "GA", are over positions with both coordinates; those words become names of nodes with those coordinates. Most of the things that we drew on the grid are just "decorations" that are ignored by "2D"; they are there just to make the ASCII diagram look like a textual representation of the real diagram. Note that the "------>", that in a sense is just a decoration, is not ignored -- it becomes a node, but as unused nodes don't show on the picture and don't generate TeX code, we can ignore it.

After the grid in foo2.tex there's a Lua line with a command to dump the array of nodes; the result of PP (which is printed to stdout) is this, modulo whitespace:

{1={"noden"=1, "tag"="A", "x"=120, "y"=100},
 2={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 3={"noden"=3, "tag"="------>", "x"=120, "y"=125},
 4={"noden"=4, "tag"="GA", "x"=140, "y"=125},
 "------>"={"noden"=3, "tag"="------>", "x"=120, "y"=125},
 "A"={"noden"=1, "tag"="A", "x"=120, "y"=100},
 "FA"={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 "GA"={"noden"=4, "tag"="GA", "x"=140, "y"=125}}

Note that entries in that table can be accessed either by a numeric id (the "noden") or by the name of the node ("tag"); some of the subtables are shared -- nodes[1] = nodes["A"] -- but the output of PP doesn't make that explicit.

There's also a table called "arrows", but at this point it is empty.
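
The rules for the grid can be sketched in Python as follows. This is a simplified illustration, not the code of dednat4: the names `advance` and `parse_grid` are hypothetical, columns are counted from the start of the full line, and a word that covers several columns with coordinates is assumed to take the first one.

```python
import re

WORD = re.compile(r"[^ \t\n\r]+")
NUMBER = re.compile(r"\+?\d+")

def advance(previous, text):
    """'100' is an absolute value; '+20' means the previous value plus 20."""
    return previous + int(text[1:]) if text.startswith("+") else int(text)

def parse_grid(lines):
    xs, nodes = {}, {}          # column -> x coordinate; node name -> (x, y)
    x = y = 0
    for line in lines:
        words = [(m.start(), m.group()) for m in WORD.finditer(line)]
        if len(words) < 3:      # just the head and "2D": nothing to read
            continue
        kind, rest = words[1][1], words[2:]   # words[0] is the head "%D"
        if kind == "2Dx":
            for col, text in rest:
                x = advance(x, text)
                xs[col] = x     # the column of the number's first character
        elif kind == "2D" and NUMBER.fullmatch(rest[0][1]):
            y = advance(y, rest[0][1])
            for col, text in rest[1:]:
                hits = [c for c in range(col, col + len(text)) if c in xs]
                if hits:        # the word lies over a column that has an x
                    nodes[text] = (xs[hits[0]], y)
    return nodes

GRID = r"""
%D 2Dx    100   +20  +20
%D 2D 100       A
%D 2D         / - \
%D 2D        /  |  \
%D 2D       v   v   v
%D 2D +25 FA ------> GA
%D 2D          TA
""".strip("\n").split("\n")

for name, xy in parse_grid(GRID).items():
    print(name, xy)
```

On the grid of foo2.tex this yields the same four nodes, with the same coordinates, as the dump above; the decoration lines are skipped because they do not start with a number.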

13.2. Building arrows

After that comes this code:

%D (( A FA -> A GA ->
%L    PP(ds)                 -- Lua code: dump the table `ds'
%D    FA GA -> .plabel= b TA
%D    A FA GA midpoint |->
%D ))

Dednat4 has a data stack ("ds"; we will see a dump of it soon), like Forth; it doesn't have a "return stack" like the one in Forth, as we don't need subroutines in the obvious sense of the term, at least not in the kernel; it's easy to define new words in Lua, and usually that's enough.

"((" puts a value in an auxiliary stack, called "depths", to remember how deep the data stack is at that point; in the next lines we will put many new objects in the data stack, and the "))" will get rid of all these new objects: it will drop everything above the stored depth. "((" and "))" help keep the data stack tidy.

"A" and "FA" put two nodes on the data stack; "->" creates a new arrow, going from "A" to "FA", and puts it both on the data stack (after "A" and "FA") and on the list of arrows; the main thing that "enddiagram" does is to output TeX code for all the defined arrows, i.e., to draw them. This can only be done at the end, because some words modify attributes of arrows: the code ".plabel= b TA", a few lines after that, adds a "label" and a "position" to the arrow at the top of the stack: the text of the label is "TA", and it is to be TeXed below the arrow.

When "PP(ds)" (in Lua) dumps the data stack what we see is this, modulo whitespace:

{1={"arrown"=2, "from"=1, "shape"="->", "to"=4},
 2={"noden"=4, "tag"="GA", "x"=140, "y"=125},
 3={"noden"=1, "tag"="A", "x"=120, "y"=100},
 4={"arrown"=1, "from"=1, "shape"="->", "to"=2},
 5={"noden"=2, "tag"="FA", "x"=100, "y"=125},
 6={"noden"=1, "tag"="A", "x"=120, "y"=100}}

A full description of the "node" and "arrow" structures can be found here; arrows have many optional fields. Note that the top of the stack is ds[1].

The only other new thing in this diagram is "midpoint". It takes the two nodes at the top of the stack and replaces them (on the stack only!) by a new node, lying halfway between them.
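
The stack discipline described in this section can be sketched in Python. This is a toy model under stated assumptions, not the code of dednat4: the function `run` is hypothetical, nodes are reduced to (x, y) pairs, ".plabel=" is left out, and the top of the stack is the end of a Python list (ds[-1]) rather than ds[1].

```python
def run(words, nodes):
    """Execute a list of words; return the list of arrows and the final stack."""
    ds, depths, arrows = [], [], []          # data stack: top is ds[-1]
    for word in words:
        if word == "((":
            depths.append(len(ds))           # remember the current depth
        elif word == "))":
            del ds[depths.pop():]            # drop everything pushed since "(("
        elif word in ("->", "|->"):
            arrow = {"from": ds[-2], "to": ds[-1], "shape": word}
            arrows.append(arrow)
            ds.append(arrow)                 # the two nodes stay on the stack
        elif word == "midpoint":
            (x1, y1), (x2, y2) = ds.pop(), ds.pop()
            ds.append(((x1 + x2) / 2, (y1 + y2) / 2))
        else:
            ds.append(nodes[word])           # a node name pushes that node
    return arrows, ds

nodes = {"A": (120, 100), "FA": (100, 125), "GA": (140, 125)}
code = "(( A FA -> A GA -> FA GA -> A FA GA midpoint |-> ))"
arrows, ds = run(code.split(), nodes)
for arrow in arrows:
    print(arrow)
```

The arrows survive the "))" because they live in their own list; only the data stack is trimmed back to the depth recorded by "((".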

We have already described "))"; in this case it makes the depth of ds go back to zero. After it "enddiagram" appends this to foo2.dnt,

\defdiag{T:F->G}{    % (find-fline "foo2.tex" 9)

And TeX typesets it into this:

13.3. More tricks for 2D diagrams

(Describe how to use "@" to refer to the elements pushed on the stack after the last "((", how to use ".tex" and ".TeX" to have several nodes with the same TeX text; also: other "shapes" of floating arrows (=>, for example), "place" and the pullback symbol; discuss the source code of the BCC diagram below; discuss the extensions in experimental.lua)

13.4. A bigger diagram

The diagram below - the Beck-Chevalley condition in a certain notation -

was produced by this code:

%D diagram LCCC-BCC
%D 2Dx         100     +30        +25      +30
%D 2D  100   {}d ===============> c,d{}		 
%D 2D          - /\               - ^		 
%D 2D          |  \\   |->        | |\BCC		 
%D 2D          v   \\             v -		 
%D 2D  +20  {}c,d <=\\=========== c,d{}{}		 
%D 2D           /\   \\             /\		 
%D 2D  +10       \\    d ===============> c,d{{}}	 
%D 2D             \\   -              \\   -	 
%D 2D              \\  |       <-|     \\  |\id	 
%D 2D               \\ v                \\ v	 
%D 2D  +20            c,d <============== c,d{{}}{}
%D 2D                                              
%D 2D  +10     a,b,c |----------> a,b		 
%D 2D              -  _|              -		 
%D 2D               \                  \		 
%D 2D                v                  v		 
%D 2D  +35            a,c |--------------> a	 
%D 2D                                              
%D ((  {}d      c,d{}		 # 0   1
%D    {}c,d     c,d{}{}	         # 2   3 
%D           d        c,d{{}}    #   4   5
%D          c,d       c,d{{}}{}  #   6   7
%D    @ 0 @ 1 =>
%D    @ 0 @ 2 |-> @ 1 @ 3 |-> sl_ .plabel= l \natural
%D                @ 1 @ 3 <-| sl^ .plabel= r \mathrm{BCC}
%D            @ 0 @ 3 harrownodes nil 20 nil |->
%D    @ 2 @ 3 <=
%D    @ 0 @ 4 <=  @ 2 @ 6 <=  @ 3 @ 7 <=
%D    @ 0 @ 2 midpoint @ 4 @ 6 midpoint dharrownodes nil 14 nil <-|
%D    @ 4 @ 5 =>  @ 4 @ 6 |-> @ 5 @ 7 |-> .plabel= r \mathrm{id}
%D            @ 4 @ 7 harrownodes nil 20 nil <-|
%D    @ 6 @ 7 <=
%D ))
%D (( a,b,c     a,b		 
%D          a,c     a	 
%D    @ 0 @ 1 |->  @ 0 @ 2 |->  @ 1 @ 3 |->  @ 2 @ 3 |->
%D    @ 0 relplace 15 7 \pbsymbol{7}
%D ))