Peek.lua

Inspirations:
  PiL:     (find-pilw3m "25.3.html" "A Generic Call Function")
  cdecl:   (find-es "anatocc" "cdecl")
  cinvoke: (find-es "lua5" "cinvoke")
  evil.rb: (find-es "ruby" "evil.rb")
  gdb:     (find-es "gdb" "C-types")
  swig:    (find-es "swig" "C-types-as-strings")

Note: most of this is quite old (from 2007). The updates (2011/2012) are at the bottom of the page.

The code of peek.lua doesn't do much at this moment - it understands files like these, builds the internal structures, and is able to "peek" whole structures from the memory at once (no field extractions yet). But the code doesn't matter so much - what's really important is that its representation for C types as strings gives us a way to reason about C objects from inside Lua programs - or from programs written in Ruby, Python, Forth, etc - the idea is portable. In particular, peek.lua's representation of types can be used to explain and compare how swig, tolua, cinvoke, etc access C objects from Lua.

The part in C of peek.lua is just this, and it will not grow.

(find-es "davinci" "peek.lua:doc")

We need to start by creating a variant of C - let me call it MiddleC - in which some unary operators - the [...]s for array declarations and accesses, and the (...)s for function declarations and calls - appear on the opposite side from where they appear in C.

Declarations
============
In C:                    In MiddleC:
-----                    -----------
char a[4];               char[4] a;
char *(*mptr)[4][7];     char*[7][4]* mptr;
int f(float x, char c);  int(float x, char c) f;
                     or: int(float,char) f;

Expressions
===========
In C:                 In MiddleC:
a[3]                  [3]a
*(*mptr)[3][6]        *[6][3]*mptr
f(0.99)               (0.99)f

The original docs (incomplete, as always...):

Primitive types
===============
"char", "int", etc are primitive types, with sizeofs 1, 4, etc.

Names of types and "TD" ("type data")
=====================================
For each type name "t" the table TD has an entry describing its "type
data" - its sizeof, its name, how it reacts to the ampersand and star
operators, etc.

The table TD lets us refer to types by their names.

Array types
===========
If "t" is a type and "n" is a non-negative integer, then "t[n]" is a
type with sizeof = n * TD["t"].sizeof.

When "t" doesn't have a sizeof we can't create a type "t[n]".

If "t" is a type with a sizeof then "t[]" is a type without a sizeof.

When "t" doesn't have a sizeof then we can't create a type "t[]". We
can't create types like "char[5][][2]" or "char[5][][]".

Pointer types
=============
If "t" is a type then "t*" is a type with sizeof 4.

A type like "char[5][]**[2]" is valid.
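The array and pointer rules can be collected into one small function. What follows is a minimal sketch in C, not the code of peek.lua: the name td_sizeof and the return codes NO_SIZEOF and BAD_TYPE are hypothetical, and it only knows the primitive types "char" and "int" (sizeofs 1 and 4, as in this text), the special type "void" described just below, and the pointer size 4 used here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NO_SIZEOF (-1)   /* a valid type that has no sizeof     */
#define BAD_TYPE  (-2)   /* a type name that cannot be created  */

/* Sizeof of a type name made of "char", "int" or "void" followed by
   the suffixes "*", "[n]" and "[]", read from left to right. */
long td_sizeof(const char *name) {
  const char *p = name;
  long size;
  assert(name != NULL);
  if      (strncmp(p, "char", 4) == 0) { size = 1;         p += 4; }
  else if (strncmp(p, "int",  3) == 0) { size = 4;         p += 3; }
  else if (strncmp(p, "void", 4) == 0) { size = NO_SIZEOF; p += 4; }
  else return BAD_TYPE;
  while (*p != '\0') {
    if (*p == '*') {                  /* "t*" always has sizeof 4 */
      size = 4;
      p++;
    } else if (*p == '[') {
      p++;
      if (size == NO_SIZEOF)          /* "t[n]" and "t[]" need a sizeof for "t" */
        return BAD_TYPE;
      if (*p == ']') {                /* "t[]" is a type without a sizeof */
        size = NO_SIZEOF;
        p++;
      } else {                        /* "t[n]" has sizeof n * sizeof(t) */
        long n = 0;
        if (!isdigit((unsigned char)*p)) return BAD_TYPE;
        while (isdigit((unsigned char)*p)) n = n * 10 + (*p++ - '0');
        if (*p != ']') return BAD_TYPE;
        p++;
        size *= n;
      }
    } else {
      return BAD_TYPE;
    }
  }
  return size;
}
```

Under these assumptions td_sizeof("char[5][]**[2]") is 8, td_sizeof("char[]") is NO_SIZEOF, and td_sizeof("char[5][][2]") is BAD_TYPE.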

"void" is a special primitive type with no sizeof.

"void*" is a type with sizeof 4.

Arrays and pointers in C and in peek.lua
========================================
A declaration in C like

  char *(*mptr)[4][7];

corresponds to:

  char *(((*mptr)[4])[7]);

If we write the "[...]"s at the left this becomes:

  char *([7]([4](*mptr)));

This is the order that we will use for type names in peek.lua; it lets
us get rid of the parentheses.
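
With the suffixes in this order, applying a prefix operator to an object simply removes the rightmost suffix from the type name of that object. The sketch below illustrates this in C; it is not taken from peek.lua, the name td_apply is hypothetical, and it only handles the operators "*" and "[k]".

```c
#include <assert.h>
#include <string.h>

/* Type name of the result of applying one prefix operator to an
   object of type `type`. op is '*' for a dereference and '[' for an
   indexing "[k]". The result is written to out (which may be the same
   buffer as type). Returns 0 on success and -1 when the operator does
   not match the rightmost suffix or the result does not fit. */
int td_apply(const char *type, char op, char *out, size_t outsz) {
  size_t len;
  assert(type != NULL && out != NULL);
  len = strlen(type);
  if (len == 0) return -1;
  if (op == '*' && type[len - 1] == '*') {
    len -= 1;                                      /* drop the final "*" */
  } else if (op == '[' && type[len - 1] == ']') {
    while (len > 0 && type[len - 1] != '[') len--; /* back up to the "[" */
    if (len == 0) return -1;
    len -= 1;                                      /* drop the "[" too */
  } else {
    return -1;
  }
  if (len == 0 || len + 1 > outsz) return -1;
  memmove(out, type, len);                         /* safe when out == type */
  out[len] = '\0';
  return 0;
}
```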

In peek.lua, after a declaration corresponding to the one above, the
variable

  "mptr" would have type "char*[7][4]*";
  "*mptr" would have type "char*[7][4]",
  "[3]*mptr" would have type "char*[7]",
  "[6][3]*mptr" would have type "char*", and
  "*[6][3]*mptr" would have type "char".

Struct types
============

If "t", "u" and "v" are types with sizeofs, then a declaration like
this in C,

  struct s {
    t a;
    u b;
    v c;
  };

corresponds to defining these types in TD:

  "struct:s"
  "struct{t:a;u:b;v:c;}"

The TD entry for "struct:s" points to the entry for
"struct{t:a;u:b;v:c;}".

When we omit the name of the struct in C, as in

  struct {
    t a;
    u b;
    v c;
  };

then this corresponds to having just the entry 

  "struct{t:a;u:b;v:c;}"

in TD; the entry "struct:s" is not created.

The sizeof of the resulting "struct" types is the sum of the sizeofs
of the fields of the struct.

We can only create a struct type when all the fields have sizeofs.

We can create a struct with a field of type "t*" (which has sizeof = 4)
even if the type "t" has no sizeof, or if its sizeof is unknown at the
moment of the creation of the struct type.

Union types
===========
"union" types are like "struct" types, with "union" replacing "struct"
everywhere.

The sizeof of a "union" type is the maximum of the sizeofs of its
fields.
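
The two sizeof rules, sum for "struct" and maximum for "union", can be written down directly. This is a sketch in C with hypothetical names (struct_sizeof, union_sizeof, BAD_FIELD); it follows the rules exactly as stated in this text, so it does not add any of the padding that real compilers insert between fields.

```c
#include <assert.h>
#include <stddef.h>

#define BAD_FIELD (-1)   /* some field has no sizeof: the type cannot be created */

/* Sizeof of a struct: the sum of the sizeofs of its fields.
   A negative entry stands for a field type without a sizeof. */
long struct_sizeof(const long *field_sizeofs, size_t n) {
  long total = 0;
  size_t i;
  assert(field_sizeofs != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (field_sizeofs[i] < 0) return BAD_FIELD;
    total += field_sizeofs[i];
  }
  return total;
}

/* Sizeof of a union: the maximum of the sizeofs of its fields. */
long union_sizeof(const long *field_sizeofs, size_t n) {
  long max = 0;
  size_t i;
  assert(field_sizeofs != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (field_sizeofs[i] < 0) return BAD_FIELD;
    if (field_sizeofs[i] > max) max = field_sizeofs[i];
  }
  return max;
}
```

For fields of types "char", "int" and "char*" (sizeofs 1, 4 and 4) this gives 9 for the struct and 4 for the union.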

peek
====
We will only need to define one Lua function in C: peek. "peek(addr,
len)" returns the result of reading len bytes from the memory,
starting from the address addr, as a Lua string len bytes long.

The definition of peek is as simple as possible, and it will happily
segfault when given a bad addr.
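
The core of such a function can be sketched in plain C as below. This is an illustration and not the actual C part of peek.lua: the name peek_bytes is hypothetical, and the Lua side is left out so that the example needs no Lua headers. A real binding would presumably hand the same address and length to lua_pushlstring to build the Lua string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes of memory, starting at the address addr, into out.
   There are no checks on addr at all: a bad address will crash the
   program, just like the peek described above. */
void peek_bytes(uintptr_t addr, size_t len, unsigned char *out) {
  assert(out != NULL);
  memcpy(out, (const void *)addr, len);
}
```

Reading 5 bytes from the address of an array initialized with {'@', 'A', 'B', 'C', 0} yields the bytes 0x40, 0x41, 0x42, 0x43, 0x00.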

C objects
=========
If "t" is a type with a sizeof, then a "C object with type "t" in
memory" can be seen as a triple: {addr=a, type="t", value=v}, where a
is an integer - the address where the object starts in the memory -
and v is its "value", as a sequence of bytes (a string).

If "t" is a type without a sizeof - for example, "char[]" - then we
can represent a C object of type "t" in memory as just a pair {addr=a,
type="t"}.

We can represent an "immediate object" of type "t" - for example, an
integer that is the result of an expression - as a pair {type="t",
value=v}, with v being a string sizeof "t" bytes long.

We can't have an immediate object of type "t" when "t" doesn't have a
sizeof: it wouldn't have an addr, and it couldn't have a value.

A possible notation for C objects: (type) value at addr. Examples:

  (char[5]) {'@', 'A', 'B', 'C', 0} at 1000
  (char[5]) $4041424300 at 1000
  (char) '@' at 1000
  (char) $40 at 1000
  (char*) 1000 at 2000
  (char*) 1000
  (char[]) at 1000
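
The "$" form of this notation is easy to generate from the triple {addr, type, value}. The sketch below is an assumption about how that could be done in C, not code from peek.lua: the struct CObject and the function cobject_format are hypothetical names, and uppercase hex digits are an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  const char *type;            /* type name, e.g. "char[5]"               */
  const unsigned char *value;  /* the bytes of the value, or NULL if none */
  size_t len;                  /* number of bytes in value                */
  int has_addr;                /* 0 for an immediate object               */
  unsigned long addr;          /* address, used only when has_addr != 0   */
} CObject;

/* Write "(type) $hexbytes at addr" into out, leaving out the parts
   that the object does not have. Returns 0, or -1 if out is too small. */
int cobject_format(const CObject *o, char *out, size_t outsz) {
  size_t i, pos;
  int n;
  assert(o != NULL && out != NULL);
  n = snprintf(out, outsz, "(%s)", o->type);
  if (n < 0 || (size_t)n >= outsz) return -1;
  pos = (size_t)n;
  if (o->value != NULL) {
    if (pos + 2 + 2 * o->len >= outsz) return -1;  /* " $" plus 2 digits per byte */
    strcpy(out + pos, " $");
    pos += 2;
    for (i = 0; i < o->len; i++, pos += 2)
      snprintf(out + pos, 3, "%02X", (unsigned)o->value[i]);
  }
  if (o->has_addr) {
    n = snprintf(out + pos, outsz - pos, " at %lu", o->addr);
    if (n < 0 || (size_t)n >= outsz - pos) return -1;
  }
  return 0;
}
```

With type "char[5]", the five bytes 0x40 0x41 0x42 0x43 0x00 and address 1000, the result is the second example line above.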

Function types
==============

Consider the following program in C:

  char a[] = {'@', 'A', 'B', 'C', 0};
  char f(int i) {
    return a[i];
  }
  char (*fp)(int i) = f;

The linker sees "a" as being a certain (fixed) position in the
initialized data segment - the beginning of the five bytes of the
array - and "f" as being a certain (fixed) position in the code
segment. Also,

  a[2] is a char,
  f(2) is a char,
  (*fp)(2) is a char -

so arrays and functions are similar, and if we write the "(...)"s at
the left as we did with the "[...]"s, we have that

  "[2]a" is of type "char",
     "a" is of type "char[]",
  "(2)f" is of type "char",
     "f" is of type "char(int:i)",
  "(2)*fp" is of type "char",
     "*fp" is of type "char(int:i)",
      "fp" is of type "char(int:i)*".

The rule for constructing function types is this: if "t" is either a
type with a sizeof or "void", and if "u", "v", and "w" are either
types with sizeofs or types of the form "x[]", and if "a", "b", and
"c" are names for variables, then

  "t(u:a,v:b,w:c)",
  "t(u:a,v:b,w:c,...)",
  "t(void)",
  "t()"

are types with no sizeofs.

Dereferencing
=============

In C, arrays are "dereferenced" into pointers when they are used as
values, and if "t" is a type with a sizeof, then "t*" and "t[]" are
almost equivalent when used as types in argument lists of functions.

The code generated for the functions g1 and g2 below is the same:

  char a[] = {'@', 'A', 'B', 'C', 0};
  char b[] = {'@', 'A', 'B', 'C', 0};
  int g1(char *cp) {
    return cp[2];
  }
  int g2(char cp[]) {
    return cp[2];
  }
  int h() { g1(a); g2(a); return 0; }

[Note that C adjusts a parameter declared as "char cp[]" to "char
*cp", so inside both g1 and g2 a line like "cp = b" is valid. The
differences appear with a real array like "a": a line like "a = b" is
invalid, and "a" and "cp" react differently to "&" - give details
using a notation like "(char*)1000" and "(char[]) at 1000"...]

The first prototype implementation, written in 2007,
  (find-angg "DAVINCI/peek.lua")
  (find-angg "DAVINCI/peek-luadecls-1.txt")
  (find-angg "DAVINCI/peek-luadecls-2.txt")
didn't work well because when I wrote it I didn't know that GCC aligns
all "int"s and "short"s in structures... and it would have been so
hard to add alignment information to that program that I gave up, and
decided to rewrite everything from scratch.
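
The usual layout rule that causes this can be sketched as follows: each field starts at the next multiple of its alignment, and the total sizeof is rounded up to the largest alignment among the fields. This is an illustration in C, not the code of either prototype; the names Field and struct_layout are hypothetical, and the alignments used in the example (1 for "char", 2 for "short", 4 for "int") are what GCC uses on typical x86 targets, not something guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  size_t size;    /* sizeof of the field type            */
  size_t align;   /* required alignment, in bytes (> 0)  */
} Field;

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align) {
  return (n + align - 1) / align * align;
}

/* Store the offset of each field in offsets[0..n-1] and return the
   sizeof of the whole struct, including the padding. */
size_t struct_layout(const Field *fields, size_t n, size_t *offsets) {
  size_t pos = 0, max_align = 1, i;
  for (i = 0; i < n; i++) {
    assert(fields[i].align > 0);
    pos = round_up(pos, fields[i].align);   /* padding before the field */
    offsets[i] = pos;
    pos += fields[i].size;
    if (fields[i].align > max_align) max_align = fields[i].align;
  }
  return round_up(pos, max_align);          /* padding at the end */
}
```

For a struct with a "char", an "int" and a "short", in that order, this gives the offsets 0, 4 and 8 and a sizeof of 12, while the plain sum of the field sizeofs is 7.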

The current prototype, written in December 2011,
  (find-angg "peek/")
  (find-angg "peek/ctypes2.lua")
  (find-angg "peek/peek-0.0.1-0.rockspec")
is part of a bigger project that intends to add some introspection
facilities to Lua, and to create a (VERY unsafe) patch to the Lua
interpreter to let us run debug.getinfo and debug.setinfo on stack
frames starting in arbitrary memory addresses. Links:
  (find-lua51manualw3m       "#lua_getinfo")
  (find-lua51manualw3m "#pdf-debug.getinfo")
  (find-lua51manualw3m       "#lua_setinfo")
  (find-lua51manualw3m "#pdf-debug.setinfo")
  (find-angg "LUA/lua50init.lua" "mytraceback")
  (find-angg "LUA/lua50init.lua" "errorfb_line")
  (find-es "lua5" "xpcall")