Primitive types
===============
"char", "int", etc. are primitive types, with sizeofs 1, 4, etc.

Names of types and "TD" ("type data")
=====================================
For each type name "t" the table TD has an entry describing its "type
data": its sizeof, its name, how it reacts to the ampersand and star
operators, etc.

The table TD lets us refer to types by their names.

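
A minimal sketch of such a table in C, assuming that an entry holds only a type name and an optional sizeof. The names td_entry, td and td_lookup are hypothetical, and the part of the type data that says how a type reacts to "&" and "*" is left out.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a TD-like table (hypothetical representation). */
struct td_entry {
    const char *name;
    int has_sizeof;   /* 0 for types like "void" or "char[]" */
    size_t size;      /* meaningful only when has_sizeof is 1 */
};

/* A tiny TD holding a few primitive types, with the sizeofs used in the text. */
static const struct td_entry td[] = {
    { "char", 1, 1 },
    { "int",  1, 4 },
    { "void", 0, 0 },
};

/* Look a type up by its name; returns NULL when TD has no such entry. */
const struct td_entry *td_lookup(const char *name) {
    for (size_t i = 0; i < sizeof td / sizeof td[0]; i++)
        if (strcmp(td[i].name, name) == 0)
            return &td[i];
    return NULL;
}
```
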
Array types
===========
If "t" is a type with a sizeof and "n" is a non-negative integer, then
"t[n]" is a type with sizeof = n * TD["t"].sizeof. When "t" doesn't
have a sizeof we can't create a type "t[n]".

If "t" is a type with a sizeof then "t[]" is a type without a sizeof.
When "t" doesn't have a sizeof then we can't create a type "t[]".
Since "char[5][]" has no sizeof, we can't create types like
"char[5][][2]" or "char[5][][]".

Pointer types
=============
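If "t" is any type, with or without a sizeof, then "t*" is a type with
sizeof 4 (pointers are 4 bytes here, as on a 32-bit machine). So a
type like "char[5][]**[2]" is valid: "char[5][]" has no sizeof, but
"char[5][]**" has sizeof 4, so "char[5][]**[2]" has sizeof 2 * 4 = 8.

"void" is a special primitive type with no sizeof, and "void*" is a
type with sizeof 4.
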
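
The array and pointer rules above are mechanical enough to check with a small program. This is a minimal sketch in C, not peek.lua's own code: the function type_sizeof and the enum ts are hypothetical, only "char", "int" and "void" are known base types, pointers get sizeof 4 as in the text, and overflow in n * sizeof is not checked.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Outcome of checking a type name: it has a sizeof, it is a valid
   type without a sizeof, or it cannot be created. */
enum ts { TS_SIZE, TS_NOSIZE, TS_INVALID };

/* Check names like "char[5][]**[2]": a base type followed by the
   suffixes "[n]", "[]" and "*", applied left to right.
   On TS_SIZE, *size holds the sizeof. */
enum ts type_sizeof(const char *name, size_t *size) {
    size_t len = strcspn(name, "[*");
    enum ts st;
    if (len == 4 && strncmp(name, "char", 4) == 0) {
        st = TS_SIZE; *size = 1;
    } else if (len == 3 && strncmp(name, "int", 3) == 0) {
        st = TS_SIZE; *size = 4;
    } else if (len == 4 && strncmp(name, "void", 4) == 0) {
        st = TS_NOSIZE;              /* "void" has no sizeof */
    } else {
        return TS_INVALID;           /* unknown base type */
    }

    const char *p = name + len;
    while (*p != '\0') {
        if (*p == '*') {
            /* "t*" exists for any t and has sizeof 4 */
            st = TS_SIZE; *size = 4; p++;
        } else if (*p == '[') {
            /* "t[n]" and "t[]" can only be built when t has a sizeof */
            if (st != TS_SIZE) return TS_INVALID;
            if (p[1] == ']') {       /* "t[]": valid, but without a sizeof */
                st = TS_NOSIZE; p += 2;
                continue;
            }
            if (!isdigit((unsigned char)p[1])) return TS_INVALID;
            char *end;
            unsigned long n = strtoul(p + 1, &end, 10);
            if (*end != ']') return TS_INVALID;
            *size *= n;              /* sizeof "t[n]" = n * sizeof "t" */
            p = end + 1;
        } else {
            return TS_INVALID;
        }
    }
    return st;
}
```

For instance, "char[5][]**[2]" gives TS_SIZE with sizeof 8, "void*" gives sizeof 4, and "char[5][][2]" is rejected.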
Arrays and pointers in C and in peek.lua
========================================
A declaration in C like::

    char *(*mptr)[4][7];

corresponds to::

    char *(((*mptr)[4])[7]);

If we write the "[...]"s at the left this becomes::

    char *([7]([4](*mptr)));

This is the order that we will use for type names in peek.lua; it lets
us get rid of the parentheses.

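In peek.lua, after a declaration corresponding to the one above:

- "mptr" would have type "char*[7][4]*",
- "*mptr" would have type "char*[7][4]",
- "[3]*mptr" would have type "char*[7]",
- "[6][3]*mptr" would have type "char*", and
- "*[6][3]*mptr" would have type "char".

The operators nearest to "mptr" apply first, and each "*" or "[i]"
removes the last "*" or "[n]" from the type name; for example,
"[3]*mptr" is what C writes as "(*mptr)[3]".
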
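
The same chain can be checked against a real C compiler with C11 static assertions. This sketch uses sizeof(char *) instead of 4, because pointer sizes depend on the platform; sizeof never evaluates its operand, so mptr is never dereferenced and can stay NULL.

```c
#include <stddef.h>

/* The declaration from the text; in peek.lua's notation mptr has type "char*[7][4]*". */
char *(*mptr)[4][7] = NULL;

/* Each step removes one level from the type, as in the list above. */
_Static_assert(sizeof(*mptr) == 4 * 7 * sizeof(char *), "*mptr: char*[7][4]");
_Static_assert(sizeof((*mptr)[3]) == 7 * sizeof(char *), "[3]*mptr: char*[7]");
_Static_assert(sizeof((*mptr)[3][6]) == sizeof(char *), "[6][3]*mptr: char*");
_Static_assert(sizeof(*(*mptr)[3][6]) == 1, "*[6][3]*mptr: char");
```

With 4-byte pointers, as in the text, these sizes are 112, 28, 4 and 1.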
Struct types
============
If "t", "u" and "v" are types with sizeofs, then a declaration like
this in C::

    struct s {
        t a;
        u b;
        v c;
    };

corresponds to defining these types in TD::

    "struct:s"
    "struct{t:a;u:b;v:c;}"

The TD entry for "struct:s" points to the entry for
"struct{t:a;u:b;v:c;}".

When we omit the name of the struct in C, as in::

    struct {
        t a;
        u b;
        v c;
    };

then this corresponds to having just the entry
"struct{t:a;u:b;v:c;}" in TD; the entry "struct:s" is not created.

The sizeof of the resulting "struct" types is the sum of the sizeofs
of the fields of the struct (real C compilers may also insert
alignment padding between fields, which this rule ignores).

We can only create a struct type when all the fields have sizeofs.
We can create a struct with a field of type "t*" (that has sizeof 4)
even if the type "t" has no sizeof, or if its sizeof is unknown at the
moment of the creation of the struct type.

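
A sketch of how the TD name and the sizeof of an anonymous struct type could be computed from its fields, assuming each field's sizeof is already known (a "t*" field is passed with sizeof 4). The names field and make_struct_type are hypothetical, and, like the rule above, the sketch adds no padding.

```c
#include <stdio.h>
#include <string.h>

/* A field in the peek.lua model (hypothetical representation):
   has_sizeof is 0 for types like "char[]" or "void". */
struct field {
    const char *type;
    const char *name;
    int has_sizeof;
    size_t size;
};

/* Write "struct{t:a;u:b;v:c;}" for the fields f[0..n-1] into buf and
   return the sum of their sizeofs. Returns -1 if some field has no
   sizeof (the struct type can't be created), -2 if buf is too small. */
long make_struct_type(const struct field *f, size_t n, char *buf, size_t bufsize) {
    size_t total = 0, used;
    int k = snprintf(buf, bufsize, "struct{");
    if (k < 0 || (size_t)k >= bufsize) return -2;
    used = (size_t)k;
    for (size_t i = 0; i < n; i++) {
        if (!f[i].has_sizeof) return -1;
        total += f[i].size;
        k = snprintf(buf + used, bufsize - used, "%s:%s;", f[i].type, f[i].name);
        if (k < 0 || (size_t)k >= bufsize - used) return -2;
        used += (size_t)k;
    }
    k = snprintf(buf + used, bufsize - used, "}");
    if (k < 0 || (size_t)k >= bufsize - used) return -2;
    return (long)total;
}
```

With the fields char a, int b and char* c, this yields the name "struct{char:a;int:b;char*:c;}" and sizeof 1 + 4 + 4 = 9.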
Union types
===========
"union" types are like "struct" types, with "union" replacing "struct"
everywhere.

The sizeof of a "union" type is the maximum of the sizeofs of its
fields.

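
A minimal sketch of the union rule, in the same hypothetical style: union_sizeof takes the sizeofs of the fields and returns their maximum, or -1 if some field has no sizeof. A real C compiler may round the size of a union up to satisfy alignment, which this rule ignores.

```c
#include <stddef.h>

/* sizeof of a union type in the peek.lua model: the largest field sizeof.
   has_sizeof[i] is 0 when field i has no sizeof; then the union type
   can't be created and -1 is returned. */
long union_sizeof(const size_t *size, const int *has_sizeof, size_t n) {
    size_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (!has_sizeof[i]) return -1;
        if (size[i] > max) max = size[i];
    }
    return (long)max;
}
```
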
peek
====
We will only need to define one Lua function in C: peek.
"peek(addr, len)" returns the result of reading len bytes from the
memory, starting at the address addr, as a Lua string len bytes long.

The definition of peek is as simple as possible, and it will happily
segfault when given a bad addr.

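
A sketch of the memory-reading core of peek in plain C, without the Lua glue; the name peek_copy is hypothetical. In a Lua binding, one plausible approach (an assumption, not peek.lua's actual source) is to read addr and len with luaL_checkinteger and pass the address straight to lua_pushlstring(L, (const char *)addr, len), which copies len bytes into a new Lua string.

```c
#include <stdint.h>
#include <string.h>

/* Copy len bytes starting at the integer address addr into out.
   Like peek, it does no checking at all: a bad addr can segfault. */
void peek_copy(uintptr_t addr, size_t len, char *out) {
    memcpy(out, (const void *)addr, len);
}
```
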
C objects
=========
If "t" is a type with a sizeof, then a C object of type "t" in memory
can be seen as a triple {addr=a, type="t", value=v}, where a is an
integer (the address where the object starts in the memory) and v is
its "value", as a sequence of bytes (a string).

If "t" is a type without a sizeof (for example, "char[]") then we can
represent a C object of type "t" in memory as just a pair {addr=a,
type="t"}.

We can represent an "immediate object" of type "t" (for example, an
integer that is the result of an expression) as a pair {type="t",
value=v}, with v being a string whose length is the sizeof of "t".

We can't have an immediate object of type "t" when "t" doesn't have a
sizeof: it wouldn't have an addr, and it couldn't have a value.

A possible notation for C objects is "(type) value at addr". Examples::

    (char[5]) {'@', 'A', 'B', 'C', 0} at 1000
    (char[5]) $4041424300 at 1000
    (char) '@' at 1000
    (char) $40 at 1000
    (char*) 1000 at 2000
    (char*) 1000
    (char[]) at 1000

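
A sketch of this notation in C, assuming a hypothetical struct cobj in which two flags mark whether the addr and the value are present. The function cobj_format (also hypothetical) writes the value only in the "$" hex form, not in the "{'@', ...}", "'@'" or decimal forms.

```c
#include <stdio.h>
#include <string.h>

/* A C object in the peek.lua sense: addr and value are both optional. */
struct cobj {
    const char *type;
    int has_addr;
    unsigned long addr;
    int has_value;
    const unsigned char *value;
    size_t len;
};

/* Append s to the string in buf without writing past bufsize bytes. */
static void append(char *buf, size_t bufsize, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < bufsize)
        strncat(buf, s, bufsize - used - 1);
}

/* Write "(type) $hex at addr" into buf, skipping the missing parts. */
void cobj_format(const struct cobj *o, char *buf, size_t bufsize) {
    char tmp[32];
    if (bufsize == 0) return;
    buf[0] = '\0';
    append(buf, bufsize, "(");
    append(buf, bufsize, o->type);
    append(buf, bufsize, ")");
    if (o->has_value) {
        append(buf, bufsize, " $");
        for (size_t i = 0; i < o->len; i++) {
            snprintf(tmp, sizeof tmp, "%02X", (unsigned)o->value[i]);
            append(buf, bufsize, tmp);
        }
    }
    if (o->has_addr) {
        snprintf(tmp, sizeof tmp, " at %lu", o->addr);
        append(buf, bufsize, tmp);
    }
}
```

The five bytes of "@ABC" plus the final 0 at address 1000, with type "char[5]", come out as "(char[5]) $4041424300 at 1000"; an object with no value comes out as "(char[]) at 1000".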
Function types
==============
Consider the following program in C::

    char a[] = {'@', 'A', 'B', 'C', 0};
    char f(int i) {
        return a[i];
    }
    char (*fp)(int i) = f;

The linker sees "a" as being a certain (fixed) position in the
initialized data segment (the beginning of the five bytes of the
array) and "f" as being a certain (fixed) position in the code
segment. Also,

- "a[2]" is a char,
- "f(2)" is a char,
- "(*fp)(2)" is a char,

so arrays and functions are similar, and if we write the "(...)"s at
the left as we did with the "[...]"s, we have that

- "[2]a" is of type "char",
- "a" is of type "char[]",
- "(2)f" is of type "char",
- "f" is of type "char(int:i)",
- "(2)*fp" is of type "char",
- "*fp" is of type "char(int:i)",
- "fp" is of type "char(int:i)*".

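
The program above compiles as it is. The helper all_agree below is hypothetical, added only to check that indexing "a", calling "f" and calling through "fp" give the same char.

```c
/* The program from the text. */
char a[] = {'@', 'A', 'B', 'C', 0};

char f(int i) {
    return a[i];
}

char (*fp)(int i) = f;

/* a[i], f(i) and (*fp)(i) all have type char and the same value. */
int all_agree(int i) {
    return a[i] == f(i) && f(i) == (*fp)(i);
}
```

In C the explicit "*" is optional in a call through a function pointer, so "fp(2)" also gives 'B'.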
The rule for constructing function types is this: if "t" is either a
type with a sizeof or "void", if "u", "v", and "w" are either types
with sizeofs or types of the form "x[]", and if "a", "b", and "c" are
names for variables, then

- "t(u:a,v:b,w:c)",
- "t(u:a,v:b,w:c,...)",
- "t(void)", and
- "t()"

are types with no sizeofs.

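
A sketch of how these type names could be built as strings, with the hypothetical names param and make_function_type. It assumes the caller has already checked that "t" and the parameter types are allowed by the rule above, it produces "t(void)" when there are no parameters, and it does not produce the "t()" form.

```c
#include <stddef.h>
#include <string.h>

/* One parameter: its type name and its variable name. */
struct param {
    const char *type;
    const char *name;
};

/* Append s to the string in buf; returns -1 (buf unchanged) if it won't fit. */
static int append(char *buf, size_t bufsize, const char *s) {
    size_t used = strlen(buf), len = strlen(s);
    if (used + len + 1 > bufsize) return -1;
    memcpy(buf + used, s, len + 1);
    return 0;
}

/* Build "t(u:a,v:b)", "t(u:a,v:b,...)" or "t(void)" into buf.
   Returns 0 on success, -1 if buf is too small (then buf is unusable). */
int make_function_type(const char *ret, const struct param *p, size_t n,
                       int variadic, char *buf, size_t bufsize) {
    int err = 0;
    if (bufsize == 0) return -1;
    buf[0] = '\0';
    err |= append(buf, bufsize, ret);
    err |= append(buf, bufsize, "(");
    if (n == 0)
        err |= append(buf, bufsize, "void");
    for (size_t i = 0; i < n; i++) {
        if (i > 0) err |= append(buf, bufsize, ",");
        err |= append(buf, bufsize, p[i].type);
        err |= append(buf, bufsize, ":");
        err |= append(buf, bufsize, p[i].name);
    }
    if (n > 0 && variadic)
        err |= append(buf, bufsize, ",...");
    err |= append(buf, bufsize, ")");
    return err ? -1 : 0;
}
```

For the function "f" of the previous example, the single parameter {"int", "i"} and the return type "char" give "char(int:i)".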
Dereferencing
=============
In C, arrays are "dereferenced" into pointers (the usual term is
"decay") when they are used as values, and if "t" is a type with a
sizeof, then "t*" and "t[]" are almost equivalent when used as types
in argument lists of functions. The code generated for the functions
g1 and g2 below is the same::

    char a[] = {'@', 'A', 'B', 'C', 0};
    char b[] = {'@', 'A', 'B', 'C', 0};
    int g1(char *cp) {
        return cp[2];
    }
    int g2(char cp[]) {
        return cp[2];
    }
    void h(void) { g1(a); g2(a); }

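[Two differences, in the way peek.lua would represent the "cp"s:
inside g1 we could add a line like "cp = b", but inside g2 that would
be invalid; and inside g1 and g2 the "cp"s would react differently to
"&". Give details using a notation like "(char*) 1000" and
"(char[]) at 1000"... Note that a C compiler itself adjusts the
parameter "char cp[]" to "char *cp", so it accepts "cp = b" inside g2
too.]
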
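
For comparison with plain C, here is a compilable version of g1 and g2. It assumes a standard C compiler, which adjusts a parameter declared as "char cp[]" to "char *cp"; the extra function g3 is hypothetical and shows that such a compiler also accepts an assignment like "cp = b" there. So any difference between the two "cp"s presumably has to come from how peek.lua represents them, not from the C compiler.

```c
char a[] = {'@', 'A', 'B', 'C', 0};
char b[] = {'@', 'A', 'B', 'C', 0};

int g1(char *cp) {
    return cp[2];
}

/* "char cp[]" is adjusted to "char *cp", so this is the same function as g1. */
int g2(char cp[]) {
    return cp[2];
}

/* Hypothetical g3: cp is really a char*, so assigning to it is accepted. */
int g3(char cp[]) {
    cp = b;
    return cp[3];
}
```
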