The version of the diagnostics described here is the first version. It has only been tested with TDF produced from C programs. There are known to be certain deficiencies relative to other languages (in particular to FORTRAN). A later version will correct these deficiencies. The changes already envisaged are detailed in section 4, and would have minimal (if any) impact on C producers.
The diagnostic system introduces one new type of TDF linkable entity, and currently adds two new units to the bitstream representation of TDF.
Much of the actual annotation of procedure bodies is currently done by reserved TOKENs, which installers recognize specially. These TOKENs are described in section 3.
There is a resemblance between the TDF diagnostic information and Unix International's DWARF format. DWARF has similar aims to the TDF diagnostics, and ensuring that complete DWARF information could be generated provided a useful check during the development of the TDF diagnostics. However, the TDF diagnostics are intended to be architecture (and format) neutral. No inference should be made about any link (present or future) between DWARF and TDF diagnostics.
DIAG_TYPEs describe programming language types (e.g. arrays, structs...). DIAG_TQs are qualifiers of DIAG_TYPEs used for attributes like volatile and const.
FILENAMEs and SOURCEMARKs describe source files and locations within them.
DIAG_TAGs associate integers with DIAG_TYPEs. They are used in a similar manner to normal TDF TAGs, and are held in a (TDF) linkable unit called a DIAG_TYPE_UNIT.
DIAG_UNITs hold a collection of DIAG_DESCRIPTORs, used for information outside procedure bodies.
DIAG_DESCRIPTORs are used to associate names in the source program with diagnostic items.
src_name: TDFSTRING(k, n) whence: SOURCEMARK found_at: EXP POINTER(al) type: DIAG_TYPE -> DIAG_DESCRIPTOR
Generates a descriptor for an identifier (of DIAG_TYPE type), whose source name was src_name from source location whence. The EXP found_at describes how to access the value. Note that the EXP need not be unique (e.g. FORTRAN EQUIVALENCE might be implemented this way).
src_name: TDFSTRING(k, n) whence: SOURCEMARK new_type: DIAG_TYPE -> DIAG_DESCRIPTOR
Generates a descriptor whose source name was src_name. new_type must be either a DIAG_STRUCT, DIAG_UNION or DIAG_ENUM. This construct is obsolete.
src_name: TDFSTRING(k, n) whence: SOURCEMARK new_type: DIAG_TYPE -> DIAG_DESCRIPTOR
Generates a descriptor for a type new_type whose source name was src_name. Note that diag_desc_typedef is used for associating a name with a type, rather than for any name given in the initial description of the type (e.g. in C this is used for typedef, not for struct/union/enum tags).
A DIAG_UNIT is a TDF unit containing DIAG_DESCRIPTORs. A DIAG_UNIT is used to contain descriptions of items outside procedure bodies (e.g. global variables, global type definitions).
no_labels: TDFINT descriptors: SLIST(DIAG_DESCRIPTOR) -> DIAG_UNIT
Create a DIAG_UNIT containing DIAG_DESCRIPTORs. no_labels is the number of local labels used in descriptors (for conditionals).
DIAG_TAGs are used inter alia to break cyclic diagnostic types. They are (TDF) linkable entities. A DIAG_TAG is made from a number, and used in use_diag_tag to obtain the DIAG_TYPE associated with that number by make_diag_tagdef.
num: TDFINT -> DIAG_TAG
Create a DIAG_TAG from num.
DIAG_TAGDEFs associate DIAG_TAGs with DIAG_TYPEs.
tno: TDFINT dtype: DIAG_TYPE -> DIAG_TAGDEF
Associates tag number tno with dtype.
A DIAG_TYPE_UNIT is a TDF unit containing DIAG_TAGDEFs.
no_labels: TDFINT tagdefs: SLIST(DIAG_TAGDEF) -> DIAG_TYPEUNIT
Create a DIAG_TYPEUNIT containing DIAG_TAGDEFs. no_labels is the number of local labels used in tagdefs (for conditionals).
DIAG_TYPEs are used to provide diagnostic information about data types.
token_value: TOKEN token_args: BITSTREAM -> DIAG_TYPE
The token is applied to the arguments to give a DIAG_TYPE. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.
element_type: DIAG_TYPE stride: EXP OFFSET(p,p) lower_bound: EXP INTEGER(v) upper_bound: EXP INTEGER(v) index_type: DIAG_TYPE -> DIAG_TYPE
An array of element_type objects. stride is the OFFSET between elements of the array (i.e. p is described by element_type). The bounds are in general not runtime constants, hence the values are EXPs (not say SIGNED_NAT). The VARIETY v is described by index_type. As in TDF there is no multi-dimensional array primitive.
type: DIAG_TYPE number_of_bits: NAT -> DIAG_TYPE
Describes number_of_bits, which when extracted will have DIAG_TYPE type.
base_type: DIAG_TYPE enum_name: TDFSTRING(k, n) values: LIST(ENUM_VALUES) -> DIAG_TYPE
An enumeration to be stored in an object of type base_type. If enum_name is a string containing zero characters this signifies no source tag.
var: FLOATING_VARIETY -> DIAG_TYPE
Creates a DIAG_TYPE to describe a FLOATING_VARIETY var.
object: DIAG_TYPE qualifier: DIAG_TQ -> DIAG_TYPE
Records the existence of an item of DIAG_TYPE object, qualified by qualifier. diag_loc is used for variables (which may of course not actually occupy a memory location).
params: LIST(DIAG_TYPE) optional_args: BOOL result_type: DIAG_TYPE -> DIAG_TYPE
Describes a procedure taking n parameters. optional_args is true if and only if the make_proc which this diag_proc describes had vartag present.
object: DIAG_TYPE qualifier: DIAG_TQ -> DIAG_TYPE
Describes a pointer to an object of DIAG_TYPE object. The DIAG_TQ qualifier qualifies the pointer, not the object pointed to.
tdf_shape: SHAPE src_name: TDFSTRING(k, n) fields: LIST(DIAG_FIELD) -> DIAG_TYPE
Describes a structure. If src_name is a string containing zero characters this signifies no source tag for the whole structure. tdf_shape allows the total size to be computed.
-> DIAG_TYPE
A null DIAG_TYPE.
tdf_shape: SHAPE src_name: TDFSTRING(k, n) fields: LIST(DIAG_FIELD) -> DIAG_TYPE
Describes a union. If src_name is a string containing zero characters this signifies no source tag for the whole union. tdf_shape allows the total size to be computed.
var: VARIETY -> DIAG_TYPE
Creates a DIAG_TYPE to describe an integer VARIETY var.
dtag: DIAG_TAG -> DIAG_TYPE
Obtains the DIAG_TYPE associated with DIAG_TAG dtag.
value: EXP sh src_name: TDFSTRING(k, n) -> ENUM_VALUES
ENUM_VALUES describe elements of an enumerated type. src_name is the source language name. value evaluates to a value of SHAPE sh. Note that all members of a LIST(ENUM_VALUES) must have the same sh.
field_name: TDFSTRING(k, n) found_at: EXP OFFSET(ALIGNMENT whole, ALIGNMENT this_field) field_type: DIAG_TYPE -> DIAG_FIELD
DIAG_FIELDs describe one field of a structure or union. field_name is the source language name. found_at is the OFFSET between whole (the enclosing structure or union), and this field (this_field). field_type is the DIAG_TYPE of the field.
DIAG_TQs are type qualifiers, used to qualify DIAG_TYPEs. A DIAG_TQ is constructed from diag_tq_null and the various add_diag_XXX operations.
qual: DIAG_TQ -> DIAG_TQ
Marks a DIAG_TQ type qualifier as being const in the ANSI C sense.
qual: DIAG_TQ -> DIAG_TQ
Marks a DIAG_TQ type qualifier as being volatile in the ANSI C sense.
-> DIAG_TQ
Create a null DIAG_TQ type qualifier.
FILENAMEs record details of source files used in producing a CAPSULE. They can be tokenised for abbreviation.
token_value: TOKEN token_args: BITSTREAM -> FILENAME
The token is applied to the arguments to give a FILENAME. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.
date: NAT machine: TDFSTRING(k1, n1) file: TDFSTRING(k2, n2) -> FILENAME
Create a FILENAME for file file, dated date (a UNIX timestamp; seconds since 1 Jan 1970) on machine machine.
A SOURCEMARK records a location in the source program. Present SOURCEMARKs assume that a location can be described by one or two numbers within a FILENAME.
file: FILENAME line_no: NAT char_offset: NAT -> SOURCEMARK
Create a SOURCEMARK referencing the char_offset'th character on line line_no in file file. char_offset is counted from 1, 0 meaning that no character offset is available.
TOKENs were used for diagnostic extensions to EXPs, to avoid adding new constructs to the contents of an existing UNIT. All other parts of the diagnostic system occur in other UNITs.
body: EXP sh from: SOURCEMARK to: SOURCEMARK -> EXP sh
Records that the EXP body arose from translating the program between SOURCEMARK from and SOURCEMARK to (inclusive).
body: EXP sh name: TDFSTRING(k, n) access: EXP POINTER(al) type: DIAG_TYPE -> EXP sh
Within the EXP body a variable named name of DIAG_TYPE type can be accessed via the EXP access.
body: EXP sh name: TDFSTRING(k, n) type: DIAG_TYPE -> EXP sh
Within the EXP body a source language type named name of DIAG_TYPE type is valid.
body: EXP sh name: TDFSTRING(k, n) type: DIAG_TYPE -> EXP sh
This TOKEN is obsolete.
In several places below the absence of "standardised methods" is noted. These are cases where TDF can express some operation in several ways, and the installer cannot be expected to spot all of them and generate new diagnostic info.
EXPs are sufficiently powerful, but need some rules.
VARIETYs etc. used for them. Present installers for 32-bit machines cannot distinguish between int and long when generating diagnostics, other than by means of the standard token names which form part of the C producer language interface.
Some DIAG_TYPEs have names, some don't. I suspect we should make a separate is_named operation and remove the other names.
The current C producer does this for some of the constructs, but not in any systematic manner; perhaps it will change.
Throughout this document, references like (S5.1) are headings in the TDF specification, Issue 4.0. I use the term "compiling" or "producing" to mean the production of TDF from some source language and "translating" to mean making a program for some specific platform from TDF.
I use the first person where I am expressing my own opinions or preferences; these should not be taken as official opinions of DRA or the TenDRA team.
The question then arises of what is meant by overflow in an operation
which is meant to deliver an integer of this VARIETY - is it when
the integer result is outside the range (1..10) or outside the range
(0..255)? For purely pragmatic reasons, TDF chooses the latter - the
result is overflowed when it is outside its representational range
(0..255). If the program insists that it must be within (1..10), then
it can always test for it. If the program uses the error handling
mechanism and the result is outside (1..10) but still within the representational
limits, then, in order for the program to be portable, the error
handling actions must in some sense be "continuous" with
the normal action. This would not be the case if, for example, the
value was used to index an array with bounds (1..10), but will usually
be the case where the value is used in further arithmetic operations
which have similar error handling. The arithmetic will continue to
give the mathematically correct result provided the representational
bounds are not exceeded.
The limits in a VARIETY are there to provide a guide to its representation,
and not to give hard limits to its possible values. This choice is
consistent with the general TDF philosophy of how exceptions are to
be treated. If, for example, one wishes to do array-bound checking,
then it must be done by explicit tests on the indices and jumping
to some exception action if they fail. Similarly, explicit tests can
be made on an integer value, provided its representational limits
are not exceeded. It is unlikely that a translator could produce any
more efficient code, in general, if the tests were implicit. The representational
limits can be exceeded in arithmetic operations, so facilities are
provided either to ignore it, to allow one to jump to a label, or to obey a TDF exception handler if it happens.
The ERROR_TREATMENT impossible is an assertion by the producer
that overflow will not occur; on its head be it if it does.
The ERROR_TREATMENTS continue and wrap give "fixup" values
for the result. For continue the fixup value is undefined. For wrap,
the answer will be modulo 2 to the power of the number of bits in the representational variety. Thus, integer arithmetic with byte
representational variety is done modulo 256. This just corresponds
to what happens in most processors and, incidentally, the definition
of C.
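As a concrete illustration of the wrap treatment in C terms (a minimal sketch, not taken from the TDF specification), unsigned char arithmetic in C wraps modulo 256 in exactly this way:

    #include <stdio.h>

    int main(void)
    {
        unsigned char b = 250;          /* representational variety: 8 bits, 0..255 */

        b = (unsigned char)(b + 10);    /* 260 wraps modulo 256 */
        printf("%d\n", b);              /* prints 4 */
        return 0;
    }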
The ERROR_TREATMENT that one would use if one wished to jump to a
label is error_jump:
The ERROR_TREATMENT trap(overflow) will raise a TDF exception (see section 6.3 on page 35) with ERROR_CODE overflow if overflow occurs.
Conversions from integer to floating are done by float_int and from
floating to integers by round_with_mode. This latter constructor
has a parameter of SORT ROUNDING_MODE which effectively gives the
IEEE rounding mode to be applied to the float to produce its integer
result.
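The C99 <fenv.h> facilities give a rough analogue of round_with_mode; the sketch below is illustrative only and is not how a translator need implement the construct:

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Strictly, #pragma STDC FENV_ACCESS ON should be in effect. */
        double x = 2.5;

        fesetround(FE_TONEAREST);       /* ties to even          */
        printf("%ld\n", lrint(x));      /* 2                     */
        fesetround(FE_DOWNWARD);        /* towards minus infinity */
        printf("%ld\n", lrint(x));      /* 2                     */
        fesetround(FE_UPWARD);          /* towards plus infinity  */
        printf("%ld\n", lrint(x));      /* 3                     */
        return 0;
    }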
One can extract the real and imaginary parts of a complex FLOATING
using real_part and imaginary_part. A complex FLOATING can be constructed
using make_complex. Normal complex arithmetic applies to all the other
FLOATING constructors except for those explicitly excluded (e.g. floating_abs, floating_max etc.).
Constant sized array values may be constructed using make_nof, make_nof_int
(see section 8.7 on page 42), and n_copies. Again,
they only occur in C as constants in global definitions.
An initialisation of a TAGDEF is a constant in this sense*; this allows one
to ignore any difficulties about their order of evaluation in the
UNIT and consequently the order of evaluation of UNITs. Once again
all the EXPs which are initialisations must be evaluated before the
program is run; this obviously includes any make_proc or make_general_proc. The limitation on an initialisation EXP to ensure this is basically
that one cannot take the contents of a variable declared outside the
EXP after all tokens and conditional evaluation is taken into account.
In other words, each TDF translator effectively has a TDF interpreter
which can do evaluation of expressions (including conditionals etc.)
involving only constants such as numbers, sizes and addresses of globals.
This corresponds very roughly to the kind of initialisations of globals
that are permissible in C; for a more precise definition, see (S7.3).
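The following C fragment is a sketch of that analogy (not a definition of the TDF rules): it shows the kind of global initialisations such a link-time interpreter can evaluate, and one that it cannot.

    int  table[4] = { 1, 2, 3, 4 };                   /* constant data        */
    int  elements = sizeof table / sizeof table[0];   /* sizes are constants  */
    int *first    = &table[0];                        /* address of a global  */
    int *last     = &table[3];                        /* address plus offset  */

    /* Not a constant in this sense: it needs the contents of a variable,
       which is only known once the program runs.
       int copy = *first;                                                  */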
Constants for both floats and strings use STRINGs. A constant string
is just a particular example of make_nof_int:
A floating constant uses a STRING which contains the ASCII characters of an expansion of the number to some base in make_floating:
The make_floating construct does not apply to a complex FLOATING_VARIETY
f; to construct a complex constant use make_complex with two
make_floating arguments.
Constants are also provided to give unique null values for pointers,
label values and procs i.e.: make_null_ptr, make_null_local_lv and
make_null_proc. Any significant use of these values (e.g. taking the
contents of a null pointer) is undefined, but they can be assigned
and used in tests in the normal way.
The most commonly quoted example is the representation of the type
FILE and its related operations in C. The ANSI C definition gives
no common representation for FILE; its implementation is defined to
be platform-dependent. A TDF producer can assume nothing about FILE;
not even that it is a structure. The only things that can alter or
create FILEs are also entities in the ANSI C API and they will always
refer to FILEs via a C pointer. Thus TDF abstracts FILE as a SHAPE
TOKEN with no parameters, make_tok(T_FILE) say. Any program that uses
FILE would have to include a TOKDEC introducing T_FILE:
Many of the C operations which use FILEs are explicitly allowed to
be expanded as either procedure calls or as macros. For example, putc(c,f)
may be implemented either as a procedure call or as the expansion
of a macro which uses the fields of f directly. Thus, it is quite natural
for putc(c, f) to be represented in TDF as an EXP TOKEN with two EXP
parameters which allows it to be expanded in either way. Of course,
this would be quite distinct from the use of putc as a value (as a
proc parameter of a procedure for example) which would require some
other representation. One such representation that comes to mind might
be simply to make a TAGDEC for the putc value, supplying its TAGDEF in the ANSI API CAPSULE for the platform. This might prove to be rather
short-sighted, since it denies us the possibility that the putc value
itself might be expanded from other values and hence it would be better
as another parameterless TOKEN. I have not come across an actual API
expansion for the putc value as other than a simple TAG; however the
FILE* value stdin is sometimes expressed as:
There will be CAPSULEs for each API on each platform giving the expansions
for the TOKENs involved, usually as uses of identifiers which will
be supplied by system linking from some libraries. These CAPSULEs
would be derived from the header files on the platform for the API
in question, usually using some automatic tools. For example, there
will be a TDF CAPSULE (derived from <stdio.h>) which defines
the TOKEN T_FILE as the SHAPE for FILE, together with definitions
for the TOKENs for putc, stdin, etc., in terms of identifiers which
will be found in the library libc.a.
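For illustration only, a platform's <stdio.h> from which such a capsule might be derived could look something like the following; the struct members and the names __iob and _flsbuf are hypothetical, not part of any TDF or ANSI definition:

    typedef struct {
        int   _cnt;                       /* characters left in the buffer */
        char *_ptr;                       /* next character position       */
        char *_base;                      /* start of the buffer           */
        int   _flag;
        int   _file;
    } FILE;                               /* the expansion behind T_FILE   */

    extern FILE __iob[];
    #define stdin (&__iob[0])             /* stdin as a macro, not a simple TAG */

    extern int _flsbuf(int, FILE *);
    #define putc(c, f) (--(f)->_cnt >= 0                                   \
                          ? (int)(unsigned char)(*(f)->_ptr++ = (char)(c)) \
                          : _flsbuf((c), (f)))          /* putc as a macro */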
An interesting example in the use of TOKENs comes in abstracting field
names. Often, an API will say something like "the type Widget
is a structure with fields alpha, beta ..." without specifying
the order of the fields or whether the list of fields is complete.
The field selection operations for Widget should then be expressed
using EXP OFFSET TOKENs; each field would have its own TOKEN giving
its offset which will be filled in when the target is known. This
gives implementors on a particular platform the opportunity to reorder
fields or add to them as they like; it also allows for extension of
the standard in the same way.
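In C terms, the values that would eventually stand behind such EXP OFFSET TOKENs are what offsetof delivers once a platform has fixed the layout; the Widget definition below is of course only a hypothetical example:

    #include <stddef.h>

    typedef struct {
        int    alpha;
        double beta;
        /* an implementation may add or reorder fields here */
    } Widget;

    /* Known only when the target's layout of Widget is fixed. */
    static const size_t widget_alpha_offset = offsetof(Widget, alpha);
    static const size_t widget_beta_offset  = offsetof(Widget, beta);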
The most common SORTs of TOKENs used for APIs are SHAPEs to represent
types, and EXPs to represent values, including procedures and constants.
NATs and VARIETYs are also sometimes used where the API does not specify
the types of integers involved. The other SORTs are rarely used in
APIs; indeed it is difficult to imagine any realistic use of
TOKENs of SORT BOOL. However, the criterion for choosing which SORTs
are available for TOKENisation is not their immediate utility, but
that the structural integrity and simplicity of TDF is maintained.
It is fairly obvious that having BOOL TOKENs will cause no problems,
so we may as well allow them.
The notion of a producer for any language working to an LPI specific
to the constructs of the language is very attractive. It could use
different TOKENs to reflect the subtle differences between uses of
similar constructs in different languages which might be difficult
or impossible to detect from their expansions, but which could allow
better optimisations in the object code. For example, Fortran procedures
are slightly different from C procedures in that they do not allow
aliasing between parameters and globals. While application of the
standard TDF procedure calls would be semantically correct, knowledge that the non-aliasing rule applies would allow some procedures
to be translated to more efficient code. A translator without knowledge
of the semantics implicit in the TOKENs involved would still produce
correct code, but one which knew about them could take advantage of
that knowledge.
I also think that LPIs would be a very useful tool for crystallising
ideas on how languages should be translated, allowing one to experiment
with expansions not thought of by the producer writer. This decoupling
is also an escape clause allowing the producer writer to defer the
implementation of a construct completely to translate-time or link-time,
as is done at the moment in C for off-stack allocation. As such it
also serves as a useful test-bed for TOKEN constructions which may
in future become new constructors of core TDF.
Why would a translator wish to make the more complicated form from
the simpler one? This would usually depend on the particular forms
of e1 and e2 as well as the machine idioms available for implementing
the assignment. If, for example, the joint evaluation of e1 and e2
used more evaluation registers than are available, the transformation
is probably a good idea. It would not necessarily commit one to putting
the new tag value into the stack; some other more global criteria
might lead one to allocate it into a register disjoint from the evaluation
registers. In general, this kind of transformation is used to modify
the operands of TDF constructions so that the code-production phase
of the translator can just "churn the handle" knowing that
the operands are already in the correct form for the machine idioms.
Transformations like this are also used to give optimisations which
are largely independent of the target architecture. In general, provided
that the sequencing rules are not violated, any EXP construction,
F(X), say, where X is some inner EXP, can be replaced by:
Most of the transformations performed by translators are of the above
form, but there are many others. Particular machine idioms might lead
to transformations like changing a test (i>=1) to (i>0) because
the test against zero is faster; replacing multiplication by a constant
integer by shifts and adds because multiplication is slow; and so
on. Target independent transformations include things like procedure
inlining and loop unrolling. Often these target independent transformations
can be profitably done in terms of the TOKENs of an LPI; loop unrolling
is an obvious example.
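Expressed in C source terms (the transformations themselves of course happen at the TDF or machine level), the two rewrites mentioned above look like this:

    int is_positive(int i)
    {
        return i >= 1;          /* may be emitted as the cheaper test i > 0 */
    }

    unsigned times_eight(unsigned x)
    {
        return x * 8u;          /* may be emitted as the shift x << 3 */
    }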
Many of these transformations often only come into play when some
previous transformation reveals some otherwise hidden information.
For example, after procedure in-lining given by {U} or loop un-rolling
given by {S}, a translator can often deduce the behaviour of a _test
constructor, replacing it by either a make_top or a goto. This may
allow one to apply either {J} or {H} to eliminate dead code in sequences
and in turn {N} or {P} to eliminate entire conditions and so on.
Application of transformations can also give expansions which are
rather pathological in the sense that a producer is very unlikely
to form them. For example, a procedure which returns no result would
have result statements of the form return(make_top()). In-lining such
a procedure by {U} would have a form like:
From the definition, we find that:
The most common use of OFFSETs is in add_to_ptr to compute the address
of a structure or array element. Looking again at its signature in
a slightly different form:
The difficulties inherent in this may be illustrated by attempting
to construct an array of bitfields using the nof constructor. Assuming
a standard architecture, we find that we cannot usefully define an
object of SHAPE nof(N, bitfield(bfvar_bits(b, M))) without padding
this shape out to some integer variety which can contain M bits. In
addition, they can only be usefully indexed (using bitfield_contents) either
if M is some power of 2 or M*N is less than the length of the maximum
integer variety. Thus a producer must be sure that these conditions
hold if he is to generate and index this object simply. Even here
he is in some difficulty, since he does not know the representational
varieties of the integers available to him; also it is difficult for
him to ensure that the alignment of the entire array is in some sense
minimal. Similar difficulties occur with bitfields in structures -
they are not restricted to arrays.
The solution to this conundrum in its full generality requires knowledge
of the available representational varieties. Particular languages
have restrictions which mean that sub-optimal solutions will satisfy their specifications on the use of bitfields. For example, C is satisfied
with bitfields of maximum length 32 and simple alignment constraints.
However, for the general optimal solution, I can see no reasonable
alternative to the installer defining some tokens to produce bitfield
offsets which are guaranteed to obey the alignment rules and also
give optimal packing of fields and alignments of the total object
for the platform in question. I believe that three tokens are sufficient
to do this; these are analogous to the constructors offset_zero, offset_pad
and offset_mult with ordinary alignments and their signatures could
be:
These tokens can be expressed in TDF if the lengths of the available
varieties are known, i.e., they are installer defined*. They ought to be used in place of offset_zero, offset_pad and offset_mult wherever
the alignment or shape (required to construct a SHAPE or an argument
to the bitfield constructs) is a pure bitfield. The constructor nof
should never be used on a pure bitfield; instead it should be replaced
by:
In common with most homomorphisms, the mappings of constructions can
exhibit more equivalences than are given by the abstract algebra.
The use of these extra equivalences is the basis of most of the target-dependent
optimisations in a TDF translator; it can make use of "idioms"
of the target architecture to produce equivalent constructions which
may work faster than the "obvious" translation. In addition,
we may find that more predicates are satisfied in a
mapping than would be in the abstract algebra. A particular concrete
mapping might allow more constructions to be well-formed than are
permitted in the abstract; a producer can use this fact to target
its output to a particular class of architectures. In this case, the
producer should produce TDF so that any translator not targeted to
this class can fail gracefully.
Giving a complete mapping for a particular architecture here is tantamount
to writing a complete translator. However, the mappings for the small
but important sub-algebra concerned with OFFSETs and ALIGNMENTs illustrate many of the main principles. What follows are two sets of mappings
for disparate architectures; the first gives a more or less standard
meaning to ALIGNMENTs but the second may be less familiar.
We are not allowed to construct pointers where REPR[A] = 1, but we
still have offsets whose second alignment is a bitfield. Thus offsets to bitfields are represented differently from offsets to other alignments:
A value with SHAPE OFFSET(A, B) where REPR(B) ≠ 1 is represented
by a 32-bit byte-offset.
A value with SHAPE OFFSET(A, B) where REPR(B) = 1 is represented by
a 32-bit bit-offset.
Let us assume that the machine has bit-addressing to avoid the bit
complications already covered in the first model. Also assume that
instruction blocks are just scalar blocks and that block addresses
are aligned on 32-bit boundaries.
A value with SHAPE OFFSET(A, B) where ¬ REPR[ A ].b is represented
by an integer bit-displacement.
A value with SHAPE OFFSET(A, B) where REPR[ A ].b is represented by
a pair of bit-displacements, one relative to a scalar-block and the
other to an address-block. Denote this by:
Unlike the previous one, this model of ALIGNMENTs would reject OFFSETs
such as OFFSET({long_variety}, {pointer}) but not OFFSET( {pointer},
{long_variety}) since:
I shall continue tracking further revisions of the TDF specification
in later releases or appendices to this document.
Throughout this document, references like (S5.1) are headings in the TDF specification. Changes between Issue 3.0 and Issue 4.0 are marked by change bars.
Also I use the first person where I am expressing my own opinions
or preferences; these should not be taken as official opinions of
DRA or the TenDRA team.
At the moment, there are 58 different SORTS, from ACCESS to VARIETY
given in tables 1 and 2. Some of these have familiar analogues in
standard language construction as with EXP and TAG above. Others will
be less familiar since TDF must concern itself with issues not normally
addressed in language definitions. For example, the process of linking
together TDF programs is at the root of the architecture neutrality
of TDF and so must form an integral part of its definition. On the
other hand, TDF is not meant to be a language readily accessible to
the human reader or writer; computers handle it much more easily.
Thus a great many choices have been made in the definition which would
be intolerable in a standard language definition for the human programmer
but which, paradoxically enough, make it much simpler for a computer
to produce and analyse TDF.
The SORTs and constructors in effect form a multi-sorted algebra.
There were two principal reasons for choosing this algebraic form
of definition. First, it is easy to extend - a new operation on existing
constructs simply requires a new constructor. Secondly, the algebraic
form is highly amenable to the automatic construction of programs.
Large parts of both TDF producers and TDF translators have been created
by automatic transformation of the text of the specification document
itself, by extracting the algebraic signature and constructing C programs
which can read or produce TDF. To this extent, one can regard the
specification document as a formal description of the free algebra
of TDF SORTs and constructors. Of course, most of the interesting
parts of the definition of TDF lies in the equivalences of parts of
TDF, so this formality only covers the easy bit.
Another distinction between the TDF definition and language syntactic
description is that TDF is to some extent conscious of its own SORTs
so that it can specify a new construction of a given SORT. The analogy
in normal languages would be that one could define a new construction
with new syntax and say this is an example of an <Expression>,
for example; I don't know of any standard language which permits this,
although those of you with a historical bent might remember Algol-N
which made a valiant attempt at it. Of course, the algebraic method
of description makes it much easier to specify, rather than having to provide the syntax for the new construction in a language.
Every TOKEN has a result SORT, i.e. the SORT of its resulting expansion
and before it can be expanded, one must have its parameter SORTs.
Thus, you can regard a TOKEN as having a type defined by its result
and parameter SORTs and the _apply_token as the operator which expands
the encapsulation and substitutes the parameters.
However, if we look at the signature of exp_apply_token:
To explain BITSTREAMs requires a diversion into the bit-encoding of
TDF. Constructors for a particular SORT are represented in a number
of bits depending on the number of constructors for that SORT; the
context will determine the SORT required, so no more bits are required.
Thus since there is only one constructor for UNITs, no bits are required
to represent make_unit; there are about 120 different constructors
for EXPs so 7 bits are required to cover all the EXPs. The parameters
of each constructor have known SORTs and so their representations
are just concatenated after the representation of the constructor*. While this is a
very compact representation, it suffers from the defect that one must
decode it even just to skip over it. This is very irksome in some applications, notably the TDF linker, which is not interested in detailed
expansions. Similarly, in translators there are places where one wishes
to skip over a token application without knowledge of the SORTs of
its parameters. Thus a BITSTREAM is just an encoding of some TDF,
preceded by the number of bits it occupies. Applications can then
skip over BITSTREAMs trivially. Similar considerations apply to BYTESTREAMs
used elsewhere; here the encoding is preceded by the number of bytes
in the encoding and is aligned to a byte boundary to allow fast copying.
Hence the BITSTREAM param_sorts(token_value) in the
actual parameter of exp_apply_token above is simply formed by the
catenation of constructions of the SORTs given by the SORTNAMEs in
the tok_params of the TOKEN being expanded.
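The point of the length prefix can be shown with a small C sketch; the 16-bit prefix and the reader structure used here are hypothetical and do not reproduce the real TDF encoding:

    #include <stddef.h>

    struct reader {
        const unsigned char *buf;
        size_t bit_pos;                     /* current position, in bits */
    };

    static unsigned read_bit(struct reader *r)
    {
        unsigned byte = r->buf[r->bit_pos / 8];
        unsigned bit  = (byte >> (7 - r->bit_pos % 8)) & 1u;
        r->bit_pos++;
        return bit;
    }

    /* Step over a BITSTREAM without decoding its contents. */
    static void skip_bitstream(struct reader *r)
    {
        unsigned nbits = 0;
        int i;
        for (i = 0; i < 16; i++)            /* hypothetical length prefix */
            nbits = (nbits << 1) | read_bit(r);
        r->bit_pos += nbits;                /* skip the body undecoded */
    }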
Usually one gives a name to a TOKEN_DEFN to form a TOKDEF using make_tokdef:
The significance of the signature parameter is discussed in
section 3.2.2 on page 18.
Often, one wishes a token without giving its definition - the definition
could, for example, be platform-dependent. A TOKDEC introduces such
a token using make_tokdec:
One can also use a TOKEN_DEFN in an anonymous fashion by giving it
as an actual parameter of a TOKEN which itself demands a TOKEN parameter.
To do this one simply uses use_tokdef:
The crucial use of TOKENs in TDF is to provide abstractions of APIs
(see section 10 on page 45) but they
are also used as shorthand for commonly occurring constructions. For
example, given the TDF constructor plus, mentioned above, we could
define a plus with only two EXP parameters more suitable to C by using
the wrap constructor as the ERROR_TREATMENT:
Some of these constructors are implicit. For example, there are no
explicit constructors for LIST or SLIST which are both used to form
lists of SORTs; their construction is simply part of the encoding
of TDF. However, it is foreseen that LIST constructors would be highly desirable and there will probably be extensions to TDF to promote LIST
from a second-class SORT to a first-class one. This will not apply
to SLIST or to the other SORTs which have implicit constructions.
These include BITSTREAM, BYTESTREAM, TDFINT, TDFIDENT and TDFSTRING.
The diagram gives an example of a CAPSULE using the same components
as in the following text.
Each CAPSULE has its own name-space, distinct from all other CAPSULEs'
name-spaces and also from the name-spaces of its component UNITs (see
section 3.1.2 on page 14). There are several different
kinds of names in TDF and each name-space is further subdivided into
one for each kind of name. The number of different kinds of names
is potentially unlimited but only three are used in core-TDF, namely
"tag", "token" and "al_tag". Those names
in a "tag" name-space generally correspond to identifiers
in normal programs and I shall use these as the paradigm for the properties
of them all.
The actual representation of a "tag" name in a given name-space is an integer, described as SORT TDFINT. These integers are drawn
from a contiguous set starting from 0 up to some limit given by the
constructor which introduces the name-space. For CAPSULE name-spaces,
this is given by the capsule_linking parameter of make_capsule:
First we give the limits of the various name-spaces local to the UNIT
in the local_vars parameter:
Note that names from the CAPSULE name-space only arise in two places,
LINKs and LINK_EXTERNs. Every other use of names is derived from
some UNIT name-space.
There are three constructors for TAGDEFs, each with slightly different
properties. The simplest is make_id_tagdef:
The two other constructors for TAGDEF, make_var_tagdef and common_tagdef
both define variable tags and have the same signature:
The ACCESS parameter gives various properties required for the tag
being defined and is discussed in section
5.3.2 on page 30.
The initialisation EXPs of TAGDEFs will be evaluated before the "main"
program is started. An initialisation EXP must either be a constant
(in the sense of section 9 on page 43)
or reduce (either directly or by token or _cond expansions) to
an initial_value:
Most of the program will appear in the "tagdefs" UNITs - they will
include the definitions of the procedures of the program which in
turn will include local definitions of tags for the locals of the
procedures.
The standard TDF linker allows one to link CAPSULEs together using
the name identifications given in the LINKEXTERNs, perhaps hiding
some of them in the final CAPSULE. It does this just by generating
a new CAPSULE name-space, grouping together component UNITs of the
same kind and replacing their lks parameters with values derived
from the new CAPSULE name-space without changing the UNITs' name-spaces
or their props parameters. The operation of grouping together
UNITs is effectively assumed to be associative, commutative and idempotent
e.g. if the same tag is declared in two capsules it is assumed to
be the same thing. It also means that there is no implied order of evaluation of UNITs or of their component TAGDEFs.
Different languages have different conventions for deciding how programs
are actually run. For example, C requires the presence of a suitably
defined "main" procedure; this is usually enforced by requiring
the system ld utility to bind the name "main" along with
the definitions of any library values required. Otherwise, the C conventions
are met by standard TDF linking. Other languages have more stringent
requirements. For example, C++ requires dynamic initialisation of
globals, using initial_value. As the only runnable code in TDF is
in procedures, C++ would probably require an additional linking phase
to construct a "main" procedure which calls the initialisation
procedures of each CAPSULE involved if the system linker did not provide
suitable C++ linking.
Similar considerations apply to TOKDEFs and TOKDECs; the "type"
of a TOKEN may not have any familiar analogue in most HLLs, but the
principle remains the same.
In the TDF specification of EXPs, you will observe that all EXPs in constructor signatures are qualified by the SHAPE name; for example,
a parameter might be EXP INTEGER(v). This merely means that for the
construct to be meaningful the parameter must be derived from a constructor
defined to be an EXP INTEGER(v). You might be forgiven for assuming
that TDF is hence strongly-typed by its SHAPEs. This is not true;
the producer must get it right. There are some checks in translators,
but these are not exhaustive and are more for the benefit of translator
writers than for the user. A tool for testing the SHAPE correctness
of a TDF program would be useful but has yet to be written.
The other way of constructing a VARIETY is to specify the number of
bits required for its 2s-complement representation using var_width:
A FLOATING_VARIETY for a real number is constructed using fvar_parms:
Complex numbers have a floating variety constructed by complex_parms
which has the same signature as fvar_parms. The representation of these numbers is likely to be a pair of real numbers each defined as if by fvar_parms with the same arguments. The real and imaginary parts of a complex number can be extracted using real_part and imaginary_part; these could have been injected into the complex number
using make_complex or any of the complex operations. Many translators
will simply transform complex numbers into COMPOUNDs consisting of
two floating point numbers, transforming the complex operations into
floating point operations on the fields.
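In C99 terms the correspondence looks like this; the sketch below shows the kind of transformation a translator might make, not a prescription:

    #include <complex.h>
    #include <stdio.h>

    struct compound_complex { double re, im; };   /* the COMPOUND form */

    int main(void)
    {
        double complex z = 3.0 + 4.0 * I;                   /* ~ make_complex */
        struct compound_complex c = { creal(z), cimag(z) }; /* real/imaginary parts */

        printf("%f %f\n", c.re, c.im);                      /* 3.000000 4.000000 */
        return 0;
    }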
The abstraction of alignments clearly has to cover compound objects
as well as primitive ones like integers. For example, if a field of
a structure in C is to be accessed efficiently, then the alignment of the field will influence the alignment of the structure as a whole;
the structure itself could be a component of a larger object whose
alignment must then depend on the alignment of the structure and so
on. In general, we find that a compound alignment is given by the
maximum alignment of its components, regardless of the form of the
compound object e.g. whether it is a structure, union, array or whatever.
This gives an immediate handle on the abstraction of the alignment
of a compound object - it is just the set of abstractions of the alignments
of its components. Since "maximum" is associative, commutative
and idempotent, the component sets can be combined using normal set-union
rules. In other words, a compound alignment is abstracted as the set
of alignments of the primitive objects which make up the compound
object. Thus the alignment abstraction of a C structure with only
float fields is the singleton set containing the alignment of a float
while that of a C union of an int and this structure is a pair of
the alignments of an int and a float.
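The C11 alignof operator shows the same rule at work on a real platform (an illustration of the principle, not part of TDF):

    #include <stdalign.h>
    #include <stdio.h>

    struct only_floats   { float a, b; };
    union  int_or_floats { int i; struct only_floats f; };

    int main(void)
    {
        /* The struct's alignment is that of float; the union's is the
           maximum of the alignments of int and float.                 */
        printf("%zu %zu %zu\n",
               alignof(float),
               alignof(struct only_floats),
               alignof(union int_or_floats));
        return 0;
    }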
The constructor alignment gives the ALIGNMENT of a given SHAPE
according to the rules given in the definition. These rules effectively
define the primitive ALIGNMENTs as in the ALIGNMENT column
of table 3. Those for PROC, all OFFSETs and all POINTERs are constants
regardless of any SHAPE qualifiers. Each of the INTEGER VARIETYs,
each of the FLOATING VARIETYs and each of the BITFIELD VARIETYs have
their own ALIGNMENTs. These ALIGNMENTs will be bound to values apposite
to the particular platform at translate-time. The ALIGNMENT of TOP
is conventionally taken to be the empty set of ALIGNMENTs (corresponding
to the minimum alignment on the platform).
The alignment of a procedure parameter clearly has to include the
alignment of its SHAPE; however, most ABIs will mandate a greater
alignment for some SHAPEs e.g. the alignment of a byte parameter is
usually defined to be on a 32-bit rather than an 8-bit boundary. The
constructor, parameter_alignment, gives the ALIGNMENT of a parameter
of given SHAPE.
The alignment of a code address is {code} given by code_alignment;
this will be the alignment of a pointer given by make_local_lv giving
the value of a label.
The other special ALIGNMENTs are considered to include all of the
others, but remain distinct. They are all concerned with offsets and
pointers relevant to procedure frames, procedure parameters and local
allocations and are collectively known as frame alignments. These
frame alignments differ from the normal alignments in that their mapping
to a given architecture is rather more than just saying that it describes
some n-bit boundary. For example, alloca_alignment describes the alignment
of dynamic space produced by local_alloc (roughly the C alloca). Now,
an ABI could specify that the alloca space is a stack disjoint from
the normal procedure stack; thus manipulations of space at alloca_alignment
may involve different code to space generated in other ways.
Similar considerations apply to the other special alignments, callees_alignment(b),
callers_alignment(b) and locals_alignment. The first two give the
alignments of the bases of the two different parameter spaces in procedures
(q.v.) and locals_alignment gives the alignment of the base of locally
declared tags within a procedure. The exact interpretation of these
depends on how the frame stack is defined in the target ABI, e.g.
does the stack grow downwards or upwards?
The final special alignment is var_param_alignment. This describes
the alignment of a special kind of parameter to a procedure which
can be of arbitrary length (see section 5.1.1
on page 27).
The other compound SHAPEs are produced using compound:
The SHAPE of a C structure consisting of a char followed by an int
would require x to be the set consisting of two INTEGER VARIETYs,
one for int and one for char, and sz would probably have been
constructed like:
The iAPX-432 is a fairly extreme example of such a machine; it is
a "capability" machine which must segregate pointer values
and non-pointer values into different spaces. On this machine a value
of SHAPE POINTER({pointer, int}) (e.g. a pointer to a structure
containing both integers and pointers) could have two components;
one referring to the pointers and another to the integers. In general,
offsets from this pointer would also have two components, one to pick
out any pointer values and the other the integer values. This would
obviously be the case if the original POINTER referred to an array
of structures containing both pointers and integers; an offset to
an element of the array would have SHAPE OFFSET({pointer, int},{pointer
, int}); both elements of the offset would have to be used as
displacements to the corresponding elements of the pointer to extract
the structure element. The OFFSET ordering is now given by the comparison
of both displacements. Using this method, one finds that pointers
in store to non-pointer alignments are two words in different blocks
and pointers to pointer-alignments are four words, two in one block
and two in another. This sounds a very unwieldy machine compared to
normal machines with linear addressing. However, who knows what similar
strange machines will appear in future; the basic conflicts between
security, integrity and flexibility that the iAPX-432 sought to resolve
are still with us. For more on the modelling of pointers and offsets
see section 13 on page 56.
TDF Version 3.0 does not allow one to construct a pointer of SHAPE
POINTER(b) where b consists entirely of bitfield alignments; this
relieves the translators of the burden of doing general bit-addressing.
Of course, this simply shifts the burden to the producer. If the high
level language requires to construct a pointer to an arbitrary bit
position, then the producer is required to represent such a pointer
as a pair consisting of pointer to some alignment including the required
bitfield and an offset from this alignment to the bitfield. For example,
Ada may require the construction of a pointer to a boolean for use
as the parameter to a procedure; the SHAPE of the representation
of this Ada pointer could be a COMPOUND formed from a POINTER({x,b})
and an OFFSET({x, b}, b) where b is the alignment given by a 1 bit
alignment. To access the boolean, the producer would use the elements
of this pair as arguments to bitfield_assign and bitfield_contents
(Assigning and extracting bitfields on page
39.).
All executable code in TDF will arise from an EXP PROC made by either
make_proc or make_general_proc. They differ in their treatment of
how space for the actual parameters of a call is managed; in particular,
is it the caller or the callee which deallocates the parameter space?
With make_proc, this management is conceptually done by the caller
at an apply_proc; i.e. the normal C situation. This suffers from the
limitation that tail-calls of procedures are then only possible in
restricted circumstances (e.g. the space for the parameters of the
tail-call must be capable of being included in the caller's parameters)
and could only be implemented as an optimisation within a translator.
A producer could not predict these circumstances in a machine independent
manner, whether or not it knew that a tail-call was valid.
An alternative would be to make the management of parameter space
the responsibility of the called procedure. Rather than do this, make_general_proc
(and apply_general_proc) splits the parameters into two sets, one
whose allocation is the responsibility of the caller and the other
whose allocation is dealt with by the callee. This allows an explicit
tail_call to be made to a procedure with new callee parameters; the
caller parameters for the tail_call will be the same as (or some initial
subset of) the caller parameters of the procedure containing the tail_call.
A further refinement of make_general_proc is to allow access to the
caller parameter space in a postlude at the call of the procedure
using an apply_general_proc. This allows simple implementations of
Ada out_parameters, or more generally, multiple results of procedures.
Each straightforward formal parameter is introduced by an auxiliary
SORT TAGSHACC using make_tagshacc:
Within body, the formal will be accessed using tg_intro;
it is always considered to be a pointer to the space of SHAPE sha
allocated by apply_proc, hence the pointer SHAPE.
For example, if we had a simple procedure with one integer parameter,
var_intro would be empty and params_intro might be:
Procedures, whether defined by make_proc or make_general_proc, will usually terminate and deliver their result with a return:
Note that the body has SHAPE bottom since all possible terminations
to a procedure have SHAPE BOTTOM.
Procedures defined by make_proc are called using apply_proc:
The SHAPE of the result of the call is given by result_shape
which must be identical to the result_shape of the make_proc.
This is a slightly different method of giving a variable number of
parameters to a procedure, rather than simply giving more actuals
than formals. The principal difference is in the alignment of the
components of varparam; these will be laid out according to
the default padding defined by the component shapes. In most ABIs,
this padding is usually different to the way parameters are laid out;
for example, character parameters are generally padded out to a full
word. Thus a sequence of parameters of given shape has a different
layout in store to the same sequence of shapes in a structure. If
one wished to pass an arbitrary structure to a procedure, one would
use the varparam option rather than passing the fields individually
as extra actual parameters.
The result_shape and body have the same general properties
as in make_proc. In addition prcprops gives other information
both about body and the way that the procedure is called.
PROCPROPS are a set drawn from check_stack, inline, no_long_jump_dest,
untidy, var_callees and var_callers. The set is composed using add_procprops.
The PROCPROPS no_long_jump_dest is a property of body only;
it indicates that none of the labels within body will be the
target of a long_jump construct. The other properties should also
be given consistently at all calls of the procedure; they are discussed
in section 5.2.2 on page 29.
A procedure, p, constructed by make_general_proc is called
using apply_general_proc:
The actual callee_params may be constructed in three different
ways. The usual method is to use make_callee_list, giving a list of
actual EXP parameters, corresponding to the caller_intro
list in the obvious way. The constructor same_callees allows one to
use the callees of the current procedure as the callees of the call;
this, of course, assumes that the formals of the current procedure
are compatible with the formals required for the call. The final method
allows one to construct a dynamically sized set of CALLEES; make_dynamic_callees
takes a pointer and a size (expressed as an OFFSET) to make the CALLEES;
this will be used in conjunction with a var_callees PROCPROPS (see
section 5.2.2 on page 29).
Some procedures can be expressed using either make_proc or make_general_proc.
For example:
make_proc(S, L, empty, B) = make_general_proc(S, var_callers, L, empty, B)
One condition for such a tail call to be applicable is knowing that
g does not require any pointers to locals of f; this
is often implicit in the language involved. Equally important is that
the action on the return from f is indistinguishable from the
return from g. For example, if it were the caller's responsibility to pop the space for the parameters on return from a call, then
the tail call of g would only work if g had the same
parameter space as f.
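A C fragment makes the shape of the situation clear (the decision to reuse the frame is of course the translator's, not the programmer's):

    int g(int x) { return x * 2; }

    int f(int x)
    {
        int local = x + 1;      /* dead by the time g is entered          */
        return g(local);        /* tail position: f's frame may be reused
                                   for the call of g                      */
    }

    int main(void) { return f(3) == 8 ? 0 : 1; }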
This is the justification for splitting the parameter set of a general
proc; it is (at least conceptually) the caller's responsibility for
popping the caller-parameters only - the callee-parameters are dealt
with by the procedure itself. Hence we can define tail_call which
uses the same caller-parameters, but a different set of callee-parameters:
tail_call(P, p, C) = return(apply_general_proc(S, P, p, L, C, make_top()))
However an implementation is expected to conserve stack by using the
same space for the call of p as the current procedure.
The presence of untidy means that body may be terminated by
an untidy_return. This returns the result of the procedure as in return,
but the lifetime of the local space of the procedure is extended (in
practice this is performed by not returning the stack to its original
value at the call). A procedure containing an untidy_return is a generalisation
of a local_alloc (see
section 5.3.4 on page 32). For example the procedure
could do some complicated local allocation (a triangular array, say)
and untidily return a pointer to it so that the space is still valid
in the calling procedure. The space will remain valid for the lifetime
of the calling procedure unless some local_free is called within it,
just as if the space had been generated by a local_alloc in the calling
procedure.
The presence of inline is just a hint to the translator that the procedure
body is a good candidate for inlining at the call.
The presence of check_stack means that the static stack requirements
of the procedure will be checked on entry to see that they do not
exceed the limits imposed by set_stack_limit; if they are exceeded
a TDF exception with ERROR_CODE stack_overflow (see section
6.3 on page 35) will be raised.
The other kind of local definition is variable:
The basic model used for the locals and parameters of a procedure
is a frame within a stack of nested procedure calls. One could implement
a procedure by allocating space according to SHAPEs of all of the
parameter and local TAGs so that the corresponding values are at fixed
offsets either from the start of the frame or some pointer within
it.
Indeed, if the ACCESS opt_access parameter in a TAG definition
is produced by visible, then a translator is almost bound to do just
that for that TAG. This is because it allows for the possibility of
the value to be accessed in some way other than by using obtain_tag
which is the standard way of recovering the value bound to the TAG.
The principal way that this could happen within TDF is by the combined
use of env_offset to give the offset and current_env to give a pointer
to the current frame (see section 5.3.3 on page 31).
The out_par ACCESS is only applicable to caller parameters of procedures;
it indicates that the value of the TAG concerned will be accessed by the postlude part of an apply_general_proc. Hence, the value of the
parameter must be accessible after the call; usually this will be
on the stack in the callers frame.
The long_jump_access flag is used to indicate that the tag must be
available after a long_jump. In practice, if either visible or long_jump_access
is set, most translators would allocate the space for the declaration
on the main-store stack rather than in an available register. If it
is not set, then a translator is free to use its own criteria for
whether space which can fit into a register is allocated on the stack
or in a register, provided there is no observable difference (other
than time or program size) between the two possibilities.
Some of these criteria are rather obvious; for example, if a pointer
to a local variable is passed outside the procedure in an opaque manner,
then it is highly unlikely that one can allocate the variable in a
register. Some might be less obvious. If the only uses of a TAG t
were in obtain_tag(t)s which are operands of contents or the left-hand operands of assigns, most ABIs would allow the tag to be placed in
a register. We do not necessarily have to generate a pointer value
if it can be subsumed by the operations available.
A POINTER tag with ACCESS no_other_read or no_other_write is asserting
that there are no "aliassed" accesses to the contents of
the pointer. For example, when applied to a parameter of a procedure,
it is saying that the original pointer of the tag is distinct from
any other tags used (reading/writing) in the lifetime of the tag.
These other tags could either be further parameters of the procedure
or globals. Clearly, this is useful for describing the limitations
imposed by Fortran parameters, for example.
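The C99 restrict qualifier makes a closely related assertion and may help fix the idea (an analogy, not the TDF mechanism itself):

    /* restrict asserts that dst and src do not alias, so the accesses
       may be reordered or cached freely - much as no_other_read and
       no_other_write license a TDF translator to do.                  */
    void scale(double *restrict dst, const double *restrict src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = 2.0 * src[i];
    }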
Offsets from the current_env of a procedure to a tag declared in the
procedure are constructed by env_offset:
The offset arithmetic operations allow one to access the values of
tags non-locally using values derived from current_env and env_offset.
They are effectively defined by the following identities:
The importance of this is that env_offset(t) is a constant OFFSET
and can be used anywhere within the enclosing UNIT, in other procedures
or as part of a constant TAGDEF; remember that the TDFINT underlying
t is unique within the UNIT. The result of a current_env could be
passed to another procedure (as a parameter, say) and this new procedure
could then access a local of the original by using its env_offset.
This would be the method one would use to access non-local, non-global
identifiers in a language which allowed one to define procedures within
procedures such as Pascal or Algol. Of course, given the stack-based
model, the value given by current_env becomes meaningless once the
procedure in which it is invoked is exited.
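A rough C rendering of this mechanism (all of the names below are invented for illustration) passes the address of the outer frame to the inner procedure, which then reaches an outer local at a fixed offset; current_env plays the part of &frame and env_offset that of offsetof:
#include <stddef.h>

struct outer_frame { int count; double total; };   /* locals of the outer procedure */

/* Inner procedure: reaches an outer local through the frame pointer,
   cf. add_to_ptr(current_env of outer, env_offset(count)). */
static void bump(void *env, size_t count_offset)
{
    int *count = (int *)((char *)env + count_offset);
    (*count)++;
}

void outer(void)
{
    struct outer_frame frame = { 0, 0.0 };
    bump(&frame, offsetof(struct outer_frame, count));
    /* &frame, like the value of current_env, is meaningless once outer has returned */
}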
The alignment of the pointer result is alloca_alignment, which must include
all SHAPE alignments.
There are two constructors for releasing space generated by local_alloc.
To release all such space generated in the current procedure one does
local_free_all(); this reduces the size of the current frame to its
static size.
The other constructor is local_free which is effectively a "pop"
to local_alloc's "push":
The use of a procedure with an untidy_return is just a generalisation
of the idea of local_alloc and the space made available by its use
can be freed in the same way as normal local allocations. Of course,
given that it could be the result of the procedure it can be structured
in an arbitrarily complicated way.
A translator is free to rearrange the order of evaluation if there
is no observable difference other than in time or space. This applies
anywhere I say "something is evaluated and then ...". We
find this kind of statement in definitions of local variables in
section 5.3 on page 30, and in the controlling
parts of the conditional constructions below.
For a more precise discussion of allowable reorderings see (S7.14).
Note that control does not automatically pass from one EXP to the
next; if this is required the appropriate EXP must end with an explicit
goto.
One can also pick up a label value of SHAPE POINTER {code} (usually
implemented as a program address) using make_local_lv for later use
by an "indirect jump" such as goto_local_lv . Here the same
prohibition holds - the construction which introduced the LABEL must
still be active.
The construction goto_local_lv only permits one to jump within the
current procedure; if one wished to do a jump out of a procedure into
a calling one, one uses long_jump which requires a pointer to the
destination frame (produced by current_env in the destination procedure)
as well as the label value. If a long_jump is made to a label, only
those local TAGs which have been defined with a visible ACCESS are
guaranteed to have preserved their values; the translator could allocate
the other TAGs in scope as registers whose values are not necessarily
preserved.
A slightly "shorter" long jump is given by return_to_label.
Here the destination of the jump must be a label value in the calling procedure. Usually this value would be passed as a parameter of the call to provide an alternative exit from the procedure.
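The closest everyday C analogue (a sketch only; the names are invented) passes a jmp_buf to the callee so that it can take the alternative exit:
#include <setjmp.h>

/* Callee: takes the alternative exit instead of returning normally. */
static int checked_div(int a, int b, jmp_buf *err_exit)
{
    if (b == 0)
        longjmp(*err_exit, 1);   /* cf. return_to_label with a label value from the caller */
    return a / b;
}

int divide_or_default(int a, int b)
{
    jmp_buf err;
    if (setjmp(err) != 0)        /* the "label" in the calling procedure */
        return 0;
    return checked_div(a, b, &err);
}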
Some of the constructors for NTESTs are disallowed for some _tests (e.g. proc_test) while others are redundant for some _tests; for example, not_greater_than is the same as less_than_or_equal for all except possibly floating_test, where the use of NaNs (in the IEEE sense) as operands may give different results.
Obviously, this kind of conditional is rather different to those found
in traditional high-level languages which usually have three components,
a boolean expression, a then-part and an else-part. Here, the first
component includes both the boolean expression and the then-part;
usually we find that it is a sequence of the tests (branching to alt_label_intro
) forming the boolean expression followed by the else-part. This
formulation means that HLL constructions like "andif" and
"orelse" do not require special constructions in TDF.
A simple loop can be expressed using repeat:
Just as with conditionals, the tests for continuing or breaking the loop are included in body and require no special constructions.
Each of these exceptions has an ERROR_CODE ascribed to it, namely
overflow, nil_access and stack_overflow. Each ERROR_CODE can be made
into a distinct NAT by means of the constructor error_val; these NATs
could be used, for example, to discriminate the different kinds of
errors using a case construction.
When one of these exceptions is raised, the translator will arrange
that a TOKEN, ~Throw, is called with the appropriate ERROR_CODE as
its (sole) parameter. Given that every language has a different way
of both characterising and handling exceptions, the exact expansion
of ~Throw must be given by the producer for that language - usually
it will involve doing a long_jump to some label specifying a signal
handler and translating the ERROR_CODE into its language-specific
representation.
The expansion of ~Throw forms part of the language specific environment
required for the translation of TDF derived from the language, just
as the exact shape of FILE must be given for the translation of C.
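In outline, an expansion of ~Throw for a C-like language might look like the following C rendering (handler_env and the run-time convention are invented for illustration):
#include <setjmp.h>

extern jmp_buf handler_env;   /* assumed to be established by the language run-time */

/* Translate the TDF ERROR_CODE (already made into a NAT via error_val) into
   the language's own representation and long_jump to the current handler.
   The code mapping here is purely illustrative. */
void throw_error(int error_code)
{
    longjmp(handler_env, error_code == 0 ? 1 : error_code);  /* longjmp may not deliver 0 */
}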
If a TAG, id, is introduced by an identify, then the value bound is
fixed by its definition argument. If, on the other hand, v
was a TAG introduced by a variable definition, then the value bound
to v is a pointer to fixed space in the procedure frame (i.e. the
left-hand value in C).
Some languages give results to assignments. For example, C defines
the result of an assignment to be its right-hand expression, so that
if the result of (v = exp) was required, it would probably be expressed
as:
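(a sketch of one possible expansion, not necessarily the producer's exact TDF; nt is an arbitrary new TAG)
identify(empty, nt, exp,
    sequence((assign(obtain_tag(v), obtain_tag(nt))),
        obtain_tag(nt)))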
Note that the value assigned in assign has some fixed SHAPE; its size is known at translate-time. Variable-sized objects can be moved by move_some:
The TRANSFER_MODE md parameter controls the way that the move
will be performed. If overlap is present, then the translator will
ensure that the move is equivalent to moving the source into new space
and then copying it to the destination; it would probably do this
by choosing a good direction in which to step through the value. The
alternative, standard_transfer_mode, indicates that it does not matter.
If the TRANSFER_MODE trap_on_nil is present and
arg1 is a nil pointer, a TDF exception with ERROR_CODE nil_access
is raised.
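The distinction is essentially that between memmove and memcpy in C:
#include <string.h>

/* Overlapping move: requires the effect of copying via fresh space,
   i.e. the overlap TRANSFER_MODE. */
void shift_left(char *buf, size_t len)
{
    if (len > 1)
        memmove(buf, buf + 1, len - 1);
}

/* Known-disjoint move: standard_transfer_mode is enough. */
void duplicate(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}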
There are variants of both the contents and assign constructors. The
signature of contents_with_mode is:
Similar considerations apply to assign_with_mode; here the overlap
TRANSFER_MODE is also possible with the same meaning as in move_some.
This constraint means that producers cannot expect that arbitrary
bitfield lengths can be accommodated without extra padding; clearly
it also means that the maximum bitfield length possible is the maximum
size of integer variety that can be implemented on the translator
concerned (this is defined to be at least 32). On standard architectures, the producer can expect that an array of bitfields of length 2^n will be packed without padding; this, of course, includes arrays
of booleans. For structures of several different bitfields, he can
be sure of no extra padding bits if the total number of bits involved
is less than or equal to 32; similarly if he can subdivide the bitfields
so that each of the subdivisions (except the last) is exactly equal
to 32 and the last is less than or equal to 32. This could be relaxed
to 64 bits if the translator deals with 64 bit integer varieties,
but would require that conditional TDF is produced to maintain portability
to 32 bit platforms - and on these platforms the assurance of close
packing would be lost.
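In C terms the guarantee is of the following kind (a hypothetical example; anything beyond these limits is back in implementation-defined territory):
/* Total of the field lengths is 2 + 6 + 8 + 16 = 32 bits, so on a
   standard 32-bit translator the producer may rely on no extra padding. */
struct packed_flags {
    unsigned kind  : 2;
    unsigned state : 6;
    unsigned tag   : 8;
    unsigned count : 16;
};

/* Here the lengths cannot be subdivided into exact 32-bit groups (8 + 32),
   so no close-packing assurance is given; the translator may well pad
   this structure to two 32-bit words. */
struct loose_flags {
    unsigned kind  : 2;
    unsigned state : 6;
    unsigned id    : 32;
};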
Since a producer is ignorant of the exact representational varieties
used by a translator, the onus is on the translator writer to provide
standard tokens which can be used by a producer to achieve both optimum
packing of bitfields and minimum alignments for COMPOUNDs containing
them (see Bitfield offsets on page 54).
These tokens would allow one to construct an offset of the form OFFSET(x,
b) (where b is some bitfield alignment and x is the `minimum' alignment
which could contain it) in a manner analogous to the normal padding
operations for offsets. This offset could then be used both in the construction
of a compound shape and in the extraction and assignment constructors.
The assignment of bitfields follows the same pattern with the same
constraints using bitfield_assign:
8.1. VARIETY and overflow
An INTEGER VARIETY, for example, is defined by some range of signed
natural numbers. A translator will fit this range into some possibly
larger range which is convenient for the processor in question. For
example, the integers with variety(1,10) would probably be represented
as unsigned characters with range (0..255), a convenient representation
for both storage and arithmetic.
8.1.1. ERROR_TREATMENT
Taking integer addition as an example, plus has signature:
ov_err: ERROR_TREATMENT
arg1: EXP INTEGER(v)
arg2: EXP INTEGER(v)
-> EXP INTEGER(v)
The result of the addition has the same integer VARIETY as its parameters.
If the representational bounds of v are exceeded, then the
action taken depends on the ERROR_TREATMENT ov_err.
One possible ERROR_TREATMENT is a jump to a label:
lab: LABEL
-> ERROR_TREATMENT
A branch to lab will occur if the result overflows.
8.3. change_variety
Conversions between the various INTEGER varieties are provided for
by change_variety:
ov_err: ERROR_TREATMENT
r: VARIETY
arg1: EXP INTEGER(v)
-> EXP INTEGER(r)
If the value arg1 is outside the limits of the representational
variety of r, then the ERROR_TREATMENT ov_err will be
invoked.
8.4. and, or, not, xor
The standard logical operations and, not, or and xor are provided for all integer varieties. Since integer varieties are defined to be represented in twos-complement, the results of these operations are well defined.
8.5. Floating-point operations, ROUNDING_MODE
All of the floating-point (including complex) operations include ERROR_TREATMENTs.
If the result of a floating-point operation cannot be represented
in the desired FLOATING_VARIETY, the error treatment is invoked. If
the ERROR_TREATMENT is wrap or impossible, the result is undefined;
otherwise the jump operates in the same way as for integer operations.
Both floating_plus and floating_mult are defined as n-ary operations.
In general, floating addition and multiplication are not associative,
but a producer may not care about the order in which they are to be
performed. Making them appear as though they were associative allows
the translator to choose an order which is convenient to the hardware.
8.6. change_bitfield_to_int, change_int_to_bitfield
There are two bit-field operations, change_bitfield_to_int and change_int_to_bitfield, to transform between bit-fields and integers. If the varieties do not fit, the result is undefined; the producer can always get it right.
8.7. make_compound, make_nof, n_copies
There is one operation to make values of COMPOUND SHAPE, make_compound:
arg1: EXP OFFSET(base, y)
arg2: LIST(EXP)
-> EXP COMPOUND(sz)
The OFFSET arg1 is evaluated as a translate-time constant to
give sz, the size of the compound object. The EXPs of arg2
are alternately OFFSETs (also translate-time constants) and values
which will be placed at those offsets. This constructor is used to
construct values given by structure displays; in C, these only occur with constant values in global definitions. It is also used to provide union injectors; here sz would be the size of the union and the list would probably have two elements, with the first being an offset_zero.
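For instance, a C global definition such as the hypothetical one below would give rise to a make_compound whose list alternates the translate-time field OFFSETs with the corresponding constant values:
/* The producer emits make_compound with the pairs
   (offset of re, 1.0) and (offset of im, 0.0). */
struct complexd { double re, im; };
struct complexd unit = { 1.0, 0.0 };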
9 Constants
The representation of constants clearly has peculiar difficulties
in any architecture neutral format. Leaving aside any problems of
how numbers are to be represented, we also have the situation where
a "constant" can have different values on different platforms.
An obvious example would be the size of a structure which, although
it is a constant of any particular run of a program, may have different
values on different machines. Further, this constant is in general
the result of some computation involving the sizes of its components
which are not known until the platform is chosen. In TDF, sizes are
always derived from some EXP OFFSET constructed using the various
OFFSET arithmetic operations on primitives like shape_offset and offset_zero.
Most such EXP OFFSETs produced are in fact constants of the platform;
they include field displacements of structures as well as their sizes.
TDF assumes that, if these EXPs can be evaluated at translate-time
(i.e. when the sizes and alignments of primitive objects are known),
then they must be evaluated there. An example of why this is so arises
in make_compound; the SHAPE of its result EXP depends on its arg1
EXP OFFSET parameter and all SHAPEs must be translate-time values.
9.1. _cond constructors
Another place where translate-time evaluation of constants is mandated
is in the various _cond constructors which give a kind of "conditional
compilation" facility; every SORT which has a SORTNAME, other
that TAG, TOKEN and LABEL, has one of these constructors e.g. exp_cond:
control: EXP INTEGER(v)
e1: BITSTREAM EXP x
e2: BITSTREAM EXP y
-> EXP x or EXP y
The constant, control, is evaluated at translate time. If it
is not zero the entire construction is replaced by the EXP in e1;
otherwise it is replaced by the one in e2. In either case,
the other BITSTREAM is totally ignored; it does not even need to be sensible TDF. This kind of construction is used extensively in C pre-processing directives e.g.:
#if (sizeof(int) == sizeof(long)) ...
9.2. Primitive constant constructors
Integer constants are constructed using make_int:
v: VARIETY
value: SIGNED_NAT
-> EXP INTEGER(v)
The SIGNED_NAT value is an encoding of the binary value required
for the integer; this value must lie within the limits given by v.
I have been rather slip-shod in writing down examples of integer constants
earlier in this document; where I have written 1 as an integer EXP,
for example, I should have written make_int(v, 1) where v is some
appropriate VARIETY.
Constant strings of integers are constructed using make_nof_int:
v: VARIETY
str: STRING(k, n)
-> EXP NOF(n, INTEGER(v))
Each unsigned integer in str must lie in the variety v
and the result is the constant array whose elements are the integers
considered to be of VARIETY v. An ASCII C constant string might have v = variety(-128, 127) and k = 7; however, make_nof_int can be used to make strings of any INTEGER VARIETY; the elements of a Unicode string, for example, would be integers of size 16 bits.
Floating-point constants are constructed similarly, from a FLOATING_VARIETY, a ROUNDING_MODE and a digit string:
f: FLOATING_VARIETY
rm: ROUNDING_MODE
sign: BOOL
mantissa: STRING(k, n)
base: NAT
exponent: SIGNED_NAT
-> EXP FLOATING(f)
For a normal floating point number, each integer in mantissa is either the ASCII `.'-symbol or the ASCII representation of a digit of the representation in the given base; i.e. if c is the ASCII symbol, the digit value is c-'0'. The resulting floating point number has SHAPE FLOATING(f) and value mantissa * base^exponent rounded according to rm. Usually the base will be 10 (sometimes 2) and the rounding mode to_nearest. Any floating-point evaluation of expressions done at translate-time will be done to an accuracy greater than that implied by the FLOATING_VARIETY involved, so that floating constants will be as accurate as the platform permits.
10 Tokens and APIs
All of the examples of the use of TOKENs so far given have really
been as abbreviations for commonly used constructs, e.g. the EXP OFFSETS
for fields of structures. However, the real justification for TOKENs is their use as abstractions for things defined in libraries or application program interfaces (APIs).
10.1. Application programming interfaces
APIs usually do not give complete language definitions of the operations
and values that they contain; generally, they are defined informally
in English giving relationships between the entities within them.
An API designer should allow implementors the opportunity of choosing
actual definitions which fit their hardware and the possibility of
changing them as better algorithms or representations become available.
For example, a producer working to an API which uses the C type FILE need make no assumptions about its representation; a target-independent CAPSULE would simply declare a TOKEN, T_FILE say, standing for the SHAPE of FILE:
make_tokdec(T_FILE, empty, shape())
and anywhere that it wished to refer to the SHAPE of FILE it would
do:
shape_apply_token(make_tok(T_FILE), ())
Before this program is translated on a given platform, the actual
SHAPE of FILE must be supplied. This would be done by linking a TDF
CAPSULE which supplies the TOKDEF for the SHAPE of FILE which is particular
to the target platform.
A particular implementation might, for instance, define stdin by the macro:
#define stdin &_iob[0]
which illustrates the point. It is better to have all of the interface of an API expressed as TOKENs to give both generality and flexibility across different platforms.
10.2. Linking to APIs
In general, each API requires platform-dependent definitions to be
supplied by a combination of TDF linking and system linking for that
platform. This is illustrated in the following diagram giving the
various phases involved in producing a runnable program.
10.2.1. Target independent headers, unique_extern
Any producer which uses an API will use system independent information
to give the common interface TOKENs for this API. In the C producer,
this is provided by header files using pragmas, which tell the producer
which TOKENs to use for the particular constructs of the API. In
any target-independent CAPSULE which uses the API, these TOKENs would
be introduced as TOKDECs and made globally accessible by using make_linkextern.
For a world-wide standard API, the EXTERNAL "name" for a
TOKEN used by make_linkextern should be provided by an application
of unique_extern on a UNIQUE drawn from a central repository of names
for entities in standard APIs; this repository would form a kind of
super-standard for naming conventions in all possible APIs. The mechanism
for controlling this super-standard has yet to be set up, so at the
moment all EXTERN names are created by string_extern.
10.3. Language programming interfaces
So far, I have been speaking as though a TOKENised API could only
be some library interface, built on top of some language, like xpg3,
posix, X etc. on top of C. However, it is possible to consider the
constructions of the language itself as ideal candidates for TOKENisation.
For example, the C for-statement could be expressed as a TOKEN with four parameters*. This TOKEN could
be expanded in TDF in several different ways, all giving the correct
semantics of a for-statement. A translator (or other tools) could
choose the expansion it wants depending on context and the properties
of the parameters. The C producer could give a default expansion which
a lazy translator writer could use, but others might use expansions
which might be more advantageous. This idea could be extended to virtually
all the constructions of the language, giving what is in effect a
C-language API; perhaps this might be called more properly a language
programming interface (LPI). Thus, we would have TOKENs for C for-statements,
C conditionals, C procedure calls, C procedure definitions etc.
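As a purely schematic illustration in C rather than TDF, the two functions below have the same semantics for a simple counted loop; a translator offered a hypothetical ~for TOKEN could choose between expansions of either shape according to the target:
extern void do_work(int i);

/* Test-at-the-top expansion of for (i = 0; i < n; i++) body; */
void count_up_a(int n)
{
    for (int i = 0; i < n; i++)
        do_work(i);
}

/* Equivalent test-at-the-bottom expansion, sometimes preferable when
   the loop body is known to be entered at least once. */
void count_up_b(int n)
{
    int i = 0;
    if (i < n) {
        do {
            do_work(i);
            i++;
        } while (i < n);
    }
}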
11 TDF transformations
TDF to TDF transformations form the basis of most of the tools of
TDF, including translators. TDF has a rich set of easily performed
transformations; this is mainly due to its algebraic nature, the liberality
of its sequencing rules, and the ease with which one can introduce
new names over limited scopes. For example, a translator is always
free to transform:
assign(e1, e2)
to:
identify(empty, new_tag, e1, assign(obtain_tag(new_tag), e2))
i.e. identify the evaluation of the left-hand side of the assignment
with a new TAG and use that TAG as the left-hand operand of a new
assignment in the body of the identification. Note that the reverse
transformation is only valid if the evaluation of e1 does not side-effect
the evaluation of e2. A producer would have had to use the second
form if it wished to evaluate e1 before e2. The definition of assign
allows its operands to be evaluated in any order while identify insists
that the evaluation of its definition is conceptually complete
before starting on its body.
More generally, any EXP operand X of a suitable constructor F can be taken outside it, transforming F(X) to:
identify(empty, new_tag, X, F(obtain_tag(new_tag))).
This includes the extraction of expressions which are constant over
a loop; if F was some repeat construction and one can show that the
EXP X is invariant over the repeat, the transformation does the constant
extraction.
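In C terms this is ordinary loop-invariant code motion; schematically:
extern double get(int i);

/* Before: a * b is recomputed on every iteration. */
double sum_scaled(int n, double a, double b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += (a * b) * get(i);
    return s;
}

/* After the transformation: the invariant EXP is identified once outside
   the repeat, cf. identify(empty, new_tag, X, repeat(...)). */
double sum_scaled_hoisted(int n, double a, double b)
{
    double s = 0.0;
    double ab = a * b;
    for (int i = 0; i < n; i++)
        s += ab * get(i);
    return s;
}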
11.1. Transformations as definitions
As well as being a vehicle for expressing optimisation, TDF transformations
can be used as the basis for defining TDF. In principle, if we were
to define all of the allowable transformations of the TDF constructions,
we would have a complete definition of TDF as the initial model of
the TDF algebra. This would be a fairly impracticable project, since
the totality of the rules including all the simple constructs would
be very unwieldy, difficult to check for inconsistencies and would
not add much illumination to the definition. However, knowledge of
allowable transformations of TDF can often answer questions of the
meaning of diverse constructs by relating them to a single construct.
What follows is an alphabet of generic transformations which can often
help to answer knotty questions. Here, E[X \ Y] denotes an EXP E with
all internal occurrences of X replaced by Y.
{A} If F is any non order-specifying* EXP constructor and E is one of the EXP operands of F, then:
F(..., E, ...) ⇒ identify(empty, newtag, E, F(..., obtain_tag(newtag), ...))
{B} If E is a non side-effecting* EXP and none of the variables used in E are assigned to in B:
identify(v, tag, E, B) ⇒ B[obtain_tag(tag) \ E]
{C} If all uses of tg in B are of the form contents(shape(E), obtain_tag(tg)):
variable(v, tg, E, B) ⇒ identify(v, nt, E, B[contents(shape(E), obtain_tag(tg)) \ obtain_tag(nt)])
{D} sequence((S1, ..., Sn), sequence((P1, ..., Pm), R)) ⇔ sequence((S1, ..., Sn, P1, ..., Pm), R)
{E} If Si = sequence((P1, ..., Pm), R):
sequence((S1, ..., Sn), T) ⇔ sequence((S1, ..., Si-1, P1, ..., Pm, R, Si+1, ..., Sn), T)
{F} E ⇔ sequence((), E)
{G} If D is either identify or variable:
D(v, tag, sequence((S1, ..., Sn), R), B) ⇒ sequence((S1, ..., Sn), D(v, tag, R, B))
{H} If Si is an EXP BOTTOM, then:
sequence((S1, S2, ..., Sn), R) ⇒ sequence((S1, ..., Si-1), Si)
{I} If E is an EXP BOTTOM, and if D is either identify or variable:
D(v, tag, E, B) ⇒ E
{J} If Si is make_top(), then:
sequence((S1, S2, ..., Sn), R) ⇔ sequence((S1, ..., Si-1, Si+1, ..., Sn), R)
{K} If Sn is an EXP TOP:
sequence((S1, ..., Sn), make_top()) ⇔ sequence((S1, ..., Sn-1), Sn)
{L} If E is an EXP TOP and E is not side-effecting:
E ⇒ make_top()
{M} If C is some non order-specifying and non side-effecting constructor, and Si is C(P1, ..., Pm) where P1..m are the EXP operands of C:
sequence((S1, ..., Sn), R) ⇒ sequence((S1, ..., Si-1, P1, ..., Pm, Si+1, ..., Sn), R)
{N} If none of the Si use the label L:
conditional(L, sequence((S1, ..., Sn), R), A) ⇒ sequence((S1, ..., Sn), conditional(L, R, A))
{O} If there are no uses of L in X*:
conditional(L, X, Y) ⇒ X
{P} conditional(L, E, goto(Z)) ⇒ E[L \ Z]
{Q} If EXP X contains no use of the LABEL L:
conditional(L, conditional(M, X, Y), Z) ⇒ conditional(M, X, conditional(L, Y, Z))
{R} repeat(L, I, E) ⇒ sequence((I), repeat(L, make_top(), E))
{S} repeat(L, make_top(), E) ⇒ conditional(Z, E[L \ Z], repeat(L, make_top(), E))
{T} If there are no uses of L in E:
repeat(L, make_top(), sequence((S, E), make_top())) ⇒ conditional(Z, S[L \ Z], repeat(L, make_top(), sequence((E, S), make_top())))
{U} If f is a procedure defined* as:
make_proc(rshape, formal1..n, vtg, B(return R1, ..., return Rm))
where:
formali = make_tagshacc(si, vi, tgi)
and B is an EXP with all of its internal return constructors indicated parametrically, then, if Ai has SHAPE si:
apply_proc(rshape, f, (A1, ..., An), V) ⇒
variable(empty, newtag, make_value((rshape = BOTTOM) ? TOP : rshape),
labelled((L),
variable(v1, tg1, A1, ..., variable(vn, tgn, An,
variable(empty, vtg, V,
B(sequence((assign(obtain_tag(newtag), R1)), goto(L)),
...,
sequence((assign(obtain_tag(newtag), Rm)), goto(L)))))),
contents(rshape, obtain_tag(newtag))))
{V} assign(E, make_top()) ⇒ sequence((E), make_top())
{W} contents(TOP, E) ⇒ sequence((E), make_top())
{X} make_value(TOP) ⇒ make_top()
{Y} component(s, contents(COMPOUND(S), E), D) ⇒ contents(s, add_to_ptr(E, D))
{Z} make_compound(S, ((E1, D1), ..., (En, Dn))) ⇒
variable(empty, nt, make_value(COMPOUND(S)),
sequence((assign(add_to_ptr(obtain_tag(nt), D1), E1),
...,
assign(add_to_ptr(obtain_tag(nt), Dn), En)),
contents(S, obtain_tag(nt))))
11.1.1. Examples of transformations
Any of these transformations may be performed by the TDF translators.
The most important is probably {A} which allows one to reduce all
of the EXP operands of suitable constructors to obtain_tags. The expansion
rules for identification, {G}, {H} and {I}, gives definition to complicated
operands as well as strangely formed ones, e.g. return(... return(X)...).
Rule {A} also illustrates neatly the lack of ordering constraints
on the evaluation of operands. For example, mult(et, exp1, exp2) could
be expanded by applications of {A} to either:
identify(empty, t1, exp1,
identify(empty, t2, exp2, mult(et, obtain_tag(t1), obtain_tag(t2))) )
or:
identify(empty, t2, exp2,
identify(empty, t1, exp1, mult(et, obtain_tag(t1), obtain_tag(t2))) )
Both orderings of the evaluations of exp1 and exp2 are acceptable,
regardless of any side-effects in them. There is no requirement that
both expansions should produce the same answer for the multiplications;
the only person who can say whether either result is "wrong"
is the person who specified the program.
For example, applying rule {U} to a call of a procedure whose result SHAPE is TOP gives a body of the form:
variable(empty, nt, make_value(TOP),
labelled( (L),
... sequence((assign(obtain_tag(nt), make_top())),
goto(L)) ...
contents(TOP, obtain_tag(nt))
)
)
The rules {V}, {W} and {X} allow this to be replaced by:
variable(empty, nt, make_top(),
labelled( (L),
... sequence((obtain_tag(nt)), goto(L)) ...
sequence((obtain_tag(nt)), make_top())
)
)
The obtain_tags can be eliminated by rule {M} and then the sequences
by {F}. Successive applications of {C} and {B} then give:
labelled( (L),
... goto(L) ...
make_top()
)
11.1.2. Programs with undefined values
The definitions of most of the constructors in the TDF specification
are predicated by some conditions; if these conditions are not met
the effect and result of the constructor is not defined for all possible
platforms*. Any value
which is dependent on the effect or result of an undefined construction
is also undefined. This is not the same as saying that a program is
undefined if it can construct an undefined value - the dynamics of
the program might be such that the offending construction is never
obeyed.
12 TDF expansions of offsets
Consider the C structure defined by:
typedef struct{ int i; double d; char c;} mystruct;
Given that sh_int, sh_char and sh_double are the SHAPEs for int, char
and double, the SHAPE of mystruct is constructed by:
SH_mystruct = compound(S_c)
where:
S_c = offset_add(O_c, shape_offset(sh_char))
where:
O_c = offset_pad(alignment(sh_char), S_d)
where:
S_d = offset_add(O_d, shape_offset(sh_double))
where:
O_d = offset_pad(alignment(sh_double), S_i)
where*:
S_i = offset_add(O_i, shape_offset(sh_int)) and: O_i = offset_zero(alignment(sh_int))
Each of S_c, S_d and S_i gives the minimum "size" of the space required up to and including the fields c, d and i respectively.
Each of O_c, O_d and O_i gives the OFFSET "displacement"
from a pointer to a mystruct required to select the fields
c, d and i respectively. The C program fragment:
mystruct s;
.... s.d = 1.0; ...
would translate to something like:
variable(empty, tag_s, make_value(compound(S_c)),
sequence( ...
assign(add_to_ptr(obtain_tag(tag_s), O_d), 1.0)
...
)
)
Each of the OFFSET expressions above are ideal candidates for tokenisation;
a producer would probably define tokens for each of them and use
exp_apply_token to expand them at each of their uses.
The minimum space required for the whole structure is thus:
S_c = shape_offset(SH_mystruct)
i.e. an OFFSET(alignment(sh_int) ∪ alignment(sh_char) ∪ alignment(sh_double), {})
This would not be the OFFSET required to describe sizeof(mystruct)
in C, since this is defined to be the difference between successive
elements of an array of mystructs. The sizeof OFFSET would
have to pad S_c to the alignment of SH_mystruct:
offset_pad(alignment(SH_mystruct), S_c)
This is the OFFSET that one would use to compute the displacement
of an element of an array of mystructs using offset_mult with
the index.
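In C the same displacement is the one computed from sizeof and offsetof; schematically:
#include <stddef.h>

typedef struct { int i; double d; char c; } mystruct;

/* The displacement of a[j].d from a is j * sizeof(mystruct) + offsetof(mystruct, d):
   an offset_mult of the padded size with the index, followed by an
   offset_add of O_d. */
double *field_d(mystruct *a, int j)
{
    return (double *)((char *)a + (size_t)j * sizeof(mystruct)
                      + offsetof(mystruct, d));
}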
Looking at the signature of add_to_ptr:
arg1: EXP POINTER(y ∪ A)
arg2: EXP OFFSET(y, z)
-> EXP POINTER(z)
... for any ALIGNMENT A
one sees that arg2 can measure an OFFSET from a value of a
"smaller" alignment than the value pointed at by arg1.
If arg2 were O_d, for example, then arg1 could be a
pointer to any structure of the form struct {int i; double d; ...},
not just mystruct. The general principle is that an OFFSET
to a field constructed in this manner is independent of any fields
after it, corresponding to normal usage in both languages and machines.
A producer for a language which conflicts with this would have to
produce less obvious TDF, perhaps by re-ordering the fields, padding
the offsets by later alignments or taking maxima of the sizes of the
fields.
12.1. Bitfield offsets
Bitfield offsets are governed by rather stricter rules. In order to
extract or assign a bitfield, we have to find an integer variety which
entirely contains the bitfield. Suppose that we wished to extract
a bitfield by:
bitfield_contents(v, p:POINTER(X), b:OFFSET(Y, B))
Y must be an alignment(I) where I is some integer SHAPE, contained
in X. Further, this has to be equivalent to:
bitfield_contents(v, add_to_ptr(p, d:OFFSET(Y,Y)), b':OFFSET(Y, B))
for some d and b' such that:
offset_pad(v, shape_offset(I)) >= b'
and
offset_add(offset_pad(v, offset_mult(d, sizeof(I))), b') = b
Clearly, we have a limitation on the length of bitfields to the maximum
integer variety available; in addition, we cannot have a bitfield
which overlaps two such varieties.
The standard TOKENs provided for constructing bitfield offsets are as follows.
~Bitfield_offset_zero:
n: NAT
issigned: BOOL
-> EXP OFFSET(A, bfvar_bits(issigned, n))
Here the result is a zero offset to the bitfield with `minimum' integer variety alignment A.
~Bitfield_offset_pad:
n: NAT
issigned: BOOL
sh: SHAPE
-> EXP OFFSET(alignment(sh) xc8 A, bfvar_bits(issigned,
n))
Here the result is the shape_offset of sh padded with the `minimum' alignment A so that it can accommodate the bitfield. Note that this may involve padding
sh with the alignment of the maximum integer variety if there are not enough bits left at the end of
sh.
~Bitfield_offset_mult:
n: NAT
issigned: BOOL
ind: EXP INTEGER(v)
-> EXP OFFSET(A, bfvar_bits(issigned, n))
Here the result is an offset which gives the displacement of the ind-th
element of an array of n-bit bitfields with `minimum' alignment A. Note that this will correspond to a normal multiplication only if
n is a power of 2 or ind*n <= length of the maximum integer variety.
For example, one could construct:
S = compound(~Bitfield_offset_mult(M, b, N))
to give a shape, S, representing an array of N M-bit bitfields. This
may not be just N*M bits; for example ~Bitfield_offset_mult may be
implemented to pack an array of 3-bit bitfields as 10 fields to a
32-bit word. In any case, one would expect that normal rules for offset
arithmetic are preserved, e.g.
offset_add(~Bitfield_offset_pad(M, b, S), size(bitfield(bfvar_bits(b, N))) )
= ~Bitfield_offset_mult(M, b, N+1)
where size(X) = offset_pad(alignment(X), shape_offset(X))
13 Models of the TDF algebra
TDF is a multi-sorted abstract algebra. Any implementation of TDF
is a model of this algebra, formed by a mapping of the algebra into
a concrete machine. An algebraic mapping gives a concrete representation
to each of the SORTs in such a way that the representation of any
construction of TDF is independent of context; it is a homomorphism.
In other words if we define the mapping of a TDF constructor, C, as
MAP[C] and the representation of a SORT, S, as REPR[S] then:
REPR[ C(P1, ..., Pn) ] = MAP[C](REPR[ P1 ], ..., REPR[ Pn ])
Any mapping has to preserve the equivalences of the abstract algebra,
such as those exemplified by the transformations {A} - {Z} in
section 11.1 on page 48. Similarly, the
mappings of any predicates on the constructions, such as those giving
"well-formed" conditions, must be satisfied in terms of
the mapped representations.
13.1. Model for a 32-bit standard architecture
Almost all current architectures use a "flat-store" model
of memory. There is no enforced segregation of one kind of data from
another - in general, one can access one unit of memory as a linear
offset from any other. Here, TDF ALIGNMENTs are a reflection of constraints
for the efficient access of different kinds of data objects - usually
one finds that 32-bit integers are most efficiently accessed if they
start at 32 bit boundaries and so on.
13.1.1. Alignment model
The representation of ALIGNMENT in a typical standard architecture
is a single integer where:
REPR [ { } ] = 1
REPR[ {bitfield} ] = 1
REPR[ {char_variety} ] = 8
REPR[ {short_variety} ] = 16
Otherwise, for all other primitive ALIGNMENTS a:
REPR [ {a} ] = 32
The representation of a compound ALIGNMENT is given by:
REPR[ A ∪ B ] = Max(REPR[ A ], REPR[ B ])
i.e. MAP[ unite_alignment ] = Max
while the ALIGNMENT inclusion predicate is given by:
REPR[ A ... B ] = REPR[ A ] ≥ REPR[ B ]
All the constructions which make ALIGNMENTs are represented here and
they will always reduce to an integer known at translate-time. Note
that the mappings for ∪ and the inclusion relation must preserve the basic algebraic properties derived from sets; for example the mapping of ∪ must be idempotent, commutative and associative, which is true for Max.
13.1.2. Offset and pointer model
Most standard architectures use byte addressing; to address bits requires
more complication. Hence, a value with SHAPE POINTER(A) where REPR[ A ] ≠ 1 is represented by a 32-bit byte address.
13.1.3. Size model
In principle, the representation of a SHAPE is a pair of an ALIGNMENT
and a size, given by shape_offset applied to the SHAPE. This pair
is constant which can be evaluated at translate time. The construction,
shape_offset(S), has SHAPE OFFSET(alignment(S), { }) and hence is
represented by a bit-offset:
REPR[ shape_offset(top()) ] = 0
REPR[ shape_offset(integer(char_variety)) ] = 8
REPR[ shape_offset(integer(short_variety)) ] = 16
.... etc. for other numeric varieties
REPR[ shape_offset(pointer(A)) ]= 32
REPR[ shape_offset(compound(E)) ] = REPR[ E ]
REPR[ shape_offset(bitfield(bfvar_bits(b, N))) ] = N
REPR[ shape_offset(nof(N, S)) ] = N * REPR[ offset_pad(
alignment(S), shape_offset(S)) ]
where S is not a bitfield shape
Similar considerations apply to the other offset-arithmetic constructors.
In general, we have:
REPR [ offset_zero(A) ] = 0 for all A
REPR[ offset_pad(A, X: OFFSET(C, D)) ]
= ((REPR[ X ] + REPR[ A ] - 1) / REPR[ A ]) * REPR[ A ] / 8
if REPR[ A ] ≠ 1 ∧ REPR[ D ] = 1
Otherwise:
REPR[ offset_pad(A, X: OFFSET(C, D)) ]
= ((REPR[ X ] + REPR[ A ] - 1) / REPR[ A ]) * REPR[ A ]
REPR[ offset_add(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= REPR[ X ] * 8 + REPR[ Y ]
if REPR[ B ] ≠ 1 ∧ REPR[ D ] = 1
Otherwise:
REPR[ offset_add(X, Y) ] = REPR[ X ] + REPR[ Y ]
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= Max(REPR[ X ], 8 * REPR[ Y ])
if REPR[ B ] = 1 ∧ REPR[ D ] ≠ 1
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= Max(8 * REPR[ X ], REPR[ Y ])
if REPR[ D ] = 1 ∧ REPR[ B ] ≠ 1
Otherwise:
REPR[ offset_max(X, Y) ] = Max(REPR[ X ], REPR[ Y ])
REPR[ offset_mult(X, E) ] = REPR[ X ] * REPR[ E ]
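These equations are simple enough to transcribe directly; a sketch in C of the all-bit-offset cases (the mixed bit/byte cases above are omitted):
/* Offsets and alignments in the 32-bit model, with every OFFSET held as
   a number of bits; each function mirrors the corresponding REPR rule. */
typedef unsigned repr_alignment;   /* 1, 8, 16 or 32 */
typedef unsigned repr_offset;      /* measured in bits */

static repr_offset repr_offset_zero(repr_alignment a) { (void)a; return 0; }

/* round x up to a multiple of the alignment */
static repr_offset repr_offset_pad(repr_alignment a, repr_offset x)
{
    return ((x + a - 1) / a) * a;
}

static repr_offset repr_offset_add(repr_offset x, repr_offset y)  { return x + y; }
static repr_offset repr_offset_max(repr_offset x, repr_offset y)  { return x > y ? x : y; }
static repr_offset repr_offset_mult(repr_offset x, unsigned n)    { return x * n; }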
A translator working to this model maps ALIGNMENTs into the integers
and their inclusion constraints into numerical comparisons. As a result,
it will correctly allow many OFFSETs which are disallowed in general;
for example, OFFSET({pointer}, {char_variety}) is allowed since REPR[ {pointer} ] ≥ REPR[ {char_variety} ]. Rather fewer of these extra relationships are allowed in the next model considered.
13.2. Model for machines like the iAPX-432
The iAPX-432 does not have a linear model of store. The address of
a word in store is a pair consisting of a block-address and a displacement
within that block. In order to take full advantage of the protection
facilities of the machine, block-addresses are strictly segregated
from scalar data like integers, floats, displacements etc. There are
at least two different kinds of blocks, one which can only contain
block-addresses and the other which contains only scalar data. There
are clearly difficulties here in describing data-structures which
contain both pointers and scalar data.
13.2.1. Alignment model
An ALIGNMENT is represented by a pair consisting of an integer, giving
the natural alignments for scalar data, and a boolean to indicate the
presence of a block-address. Denote this by:
(s: alignment_of_scalars, b: has_blocks)
We then have:
REPR[ alignment({ }) ] = (s: 1, b: FALSE)
REPR[ alignment({char_variety}) ] = (s: 8, b: FALSE)
... etc. for other numerical and bitfield varieties.
REPR[ alignment({pointer}) ] = (s: 32, b: TRUE)
REPR[ alignment({proc}) ] = (s: 32, b: TRUE)
REPR[ alignment({local_label_value}) ] = (s: 32, b: TRUE)
The representation of a compound ALIGNMENT is given by:
REPR[ A ∪ B ] = (s: Max(REPR[ A ].s, REPR[ B ].s), b: REPR[ A ].b ∨ REPR[ B ].b )
and their inclusion relationship is given by:
REPR[ A ... B ] = (REPR[ A ].s ≥ REPR[ B ].s) ∧ (REPR[ A ].b ∨ ¬ REPR[ B ].b)
13.2.2. Offset and pointer model
A value with SHAPE POINTER A where ¬ REPR[ A ].b is represented
by a pair consisting of a block-address of a scalar block and an integer
bit-displacement within that block. Denote this by:
(sb: scalar_block_address, sd: bit_displacement)
A value with SHAPE POINTER A where REPR[ A ].b is represented by a
quad word consisting of two block-addresses and two bit-displacements
within these blocks. One of these block addresses will contain the
scalar information pointed at by one of the bit-displacements; similarly, the other pair will point at the block in which the block-addresses contained in the data are held.
Denote this by:
(sb: scalar_block_address, ab: address_block_address,
sd: scalar_displacement, ad: address_displacement )
A value of OFFSET SHAPE whose alignment includes block-addresses is similarly represented by a pair of displacements:
( sd: scalar_displacement, ad: address_displacement )
13.2.3. Size model
The sizes given by shape_offset are now:
REPR[shape_offset(integer(char_variety)) ] = 8
... etc. for other numerical and bitfield varieties.
REPR[ shape_offset(pointer(A)) ] = ( REPR[ A ].b ) ? (sd: 64, ad: 64) : (sd: 32, ad: 32)
REPR[ shape_offset(offset(A, B)) ] = (REPR[ A ].b) ? 64 : 32
REPR[ shape_offset(proc) ] = (sd: 32, ad: 32)
REPR[ shape_offset(compound(E)) ] = REPR[ E ]
REPR[ shape_offset(nof(N, S)) ]
= N * REPR[ offset_pad(alignment(S), shape_offset(S)) ]
REPR[ shape_offset(top) ] = 0
13.2.4. Offset arithmetic
The other OFFSET constructors are given by:
REPR[ offset_zero(A) ] = 0 if ¬ REPR[ A ].b
REPR[ offset_zero(A) ] = (sd: 0, ad: 0) if REPR[ A ].b
REPR[ offset_add(X: OFFSET(A, B), Y: OFFSET(C, D)) ] = REPR[ X ] + REPR[ Y ]
if ¬ REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_add(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: REPR[ X ].sd + REPR[ Y ].sd, ad: REPR[ X ].ad + REPR[ Y ].ad )
if REPR[ A ].b ∧ REPR[ C ].b
REPR[ offset_add(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: REPR[ X ].sd + REPR[ Y ], ad: REPR[ X ].ad )
if REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_pad(A, Y: OFFSET(C, D)) ] = (REPR[ Y ] + REPR[ A ].s - 1) / REPR[ A ].s
if ¬ REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_pad(A, Y: OFFSET(C, D)) ]
= ( sd: (REPR[ Y ] + REPR[ A ].s - 1) / REPR[ A ].s, ad: REPR[ Y ].ad )
if REPR[ C ].b
REPR[ offset_pad(A, Y: OFFSET(C, D)) ]
= ( sd: (REPR[ Y ] + REPR[ A ].s - 1) / REPR[ A ].s, ad: 0 )
if REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= Max(REPR[ X ], REPR[ Y ])
if ¬ REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: Max(REPR[ X ].sd, REPR[ Y ].sd),
ad: Max(REPR[ X ].ad, REPR[ Y ].ad) )
if REPR[ A ].b ∧ REPR[ C ].b
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: Max(REPR[ X ].sd, REPR[ Y ]), ad: REPR[ X ].ad )
if REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_max(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: Max(REPR[ Y ].sd, REPR[ X ]), ad: REPR[ Y ].ad )
if REPR[ C ].b ∧ ¬ REPR[ A ].b
REPR[ offset_subtract(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= REPR[ X ] - REPR[ Y ]
if ¬ REPR[ A ].b ∧ ¬ REPR[ C ].b
REPR[ offset_subtract(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= ( sd: REPR[ X ].sd - REPR[ Y ].sd, ad: REPR[ X ].ad - REPR[ Y ].ad )
if REPR[ A ].b ∧ REPR[ C ].b
REPR[ offset_subtract(X: OFFSET(A, B), Y: OFFSET(C, D)) ]
= REPR[ X ].sd - REPR[ Y ]
if REPR[ A ].b ∧ ¬ REPR[ C ].b
.... and so on.
Note also that the inclusion predicate now gives, for example:
REPR [ {long_variety} ... {pointer} ] = FALSE
but:
REPR [ {pointer} ... {long_variety} ] = TRUE
This just reflects the fact that there is no way that one can extract
a block-address necessary for a pointer from a scalar-block, but since
the representation of a pointer includes a scalar displacement, one
can always retrieve a scalar from a pointer to a pointer.
14 Conclusion
This commentary is not complete. I have tended to go into considerable detail on aspects which I consider might be unfamiliar and skip over others which occur in most compiling systems. I also have a tendency
to say things more than once, albeit in different words; however if
something is worth saying, it is worth saying twice.
1 Introduction
This memo is intended to be a fairly detailed commentary on the specification
of TDF, a kind of Talmud to the Torah. If it conflicts with the specification
document, it is wrong. The aim is to elucidate the various constructions
of TDF, giving examples of usages both from the point of view of a
producer of TDF and how it is used to construct programs on particular
platforms using various installers or translators. In addition, some
attempt is made to give the reasons why the particular constructions
have been chosen. Most of the commentary is a distillation of questions
and answers raised by people trying to learn TDF from the specification
document.
2 SORTs and TOKENs
In the syntax of a language like C or Pascal, we find various syntactic
units like <Expression>, <Identifier> etc. A SORT bears
the same relation to TDF as these syntactic units bear to the language;
roughly speaking, the syntactic unit <Expression> corresponds
to the SORT EXP and <Identifier> to TAG. However, instead of
using BNF to compose syntactic units from others, TDF uses explicit
constructors to compose its SORTs; each constructor uses other pieces
of TDF of specified SORTs to make a piece of its result SORT. For
example, the constructor plus uses an ERROR_TREATMENT and two EXPs
to make another EXP.
2.1. Token applications and first-class SORTs
A new construction is introduced by the SORT TOKEN; the constructors
involving TOKENs allow one to give an expansion for the TOKEN in terms
of other pieces of TDF, possibly including parameters. We can encapsulate
a (possibly parameterised) fragment of TDF of a suitable SORT by giving
it a TOKEN as identification. Not all of the SORTs are available for
this kind of encapsulation - only those which have a SORTNAME constructor
(from access to variety). These are the "first-class" SORTs
given in table 1 on page 8. Each of these has an appropriate _apply_token constructor (e.g. exp_apply_token) to give the expansion. Most of these also have _cond constructors (e.g. see exp_cond in section 9.1 on page 43) which allow translate-time conditional expansion of the SORT.
Looking at the signature of exp_apply_token:
token_value: TOKEN
token_args: BITSTREAM param_sorts(token_value)
-> EXP x
we are confronted by the mysterious BITSTREAM where one might expect
to find the actual parameters of the TOKEN.
2.2. Token definitions and declarations
Thus the token_args parameter of exp_apply_token is just the
BITSTREAM formed from the actual parameters in the sequence described
by the definition of the token_value parameter. This
will be given in a TOKEN_DEFN somewhere with constructor token_definition:
result_sort: SORTNAME
tok_params: LIST(TOKFORMALS)
body: result_sort
-> TOKEN_DEFN
The result_sort is the SORT of the construction of body;
e.g. if result_sort is formed from exp then body would
be constructed using the EXP constructors and one would use exp_apply_token
to give the expansion. The list tok_params gives the formal
parameters of the definition in terms of TOKFORMALS constructed using
make_tok_formals:
sn: SORTNAME
tk: TDFINT
-> TOKFORMALS
The TDFINT tk will be the integer representation of the formal
parameter expressed as a TOKEN whose result sort is sn (see
more about name representation in section
3.1 on page 13). To use the parameter in the body of the TOKEN_DEFN,
one simply uses the _apply_token appropriate to sn. Note that
sn may be a TOKEN but the result_sort may not.
A TOKEN is defined by wrapping its TOKEN_DEFN in a TOKDEF, constructed by make_tokdef:
tok: TDFINT
signature: OPTION(STRING)
def: BITSTREAM TOKEN_DEFN
-> TOKDEF
Here, tok gives the name that will be used to identify the
TOKEN whose expansion is given by def. Any use of this TOKEN
(e.g. in exp_apply_token) will be given by make_token(tok
) . Once again, a BITSTREAM is used to encapsulate the TOKEN_DEFN.
A TOKEN may instead simply be declared, using make_tokdec:
tok: TDFINT
signature: OPTION(STRING)
s: SORTNAME
-> TOKDEC
Here the SORTNAME, s, is given by token:
result: SORTNAME
params: LIST(SORTNAME)
-> SORTNAME
which gives the result and parameter SORTs of tok.
tdef: BITSTREAM TOKEN_DEFN
-> TOKEN
2.3. A simple use of a TOKEN
For example, a TOKEN for addition with wrap-round overflow, taking two EXP parameters l and r, could be defined by:
make_tokdef (C_plus, empty,
    token_definition(
        exp(),
        (make_tok_formals(exp(), l), make_tok_formals(exp(), r)),
        plus(wrap(), exp_apply_token(l, ()), exp_apply_token(r, ()))
    )
)
2.4. Second class SORTs
Second class SORTs (given in table 2 on page 11) cannot be TOKENised.
These are the "syntactic units" of TDF which the user cannot
extend; he can only produce them using the constructors defined in
core-TDF.
3 CAPSULEs and UNITs
A CAPSULE is typically the result of a single compilation - one could
regard it as being the TDF analogue of a Unix .o file. Just as with
.o files, a set of CAPSULEs can be linked together to form another.
Similarly, a CAPSULE may be translated to make a program for some platform,
provided certain conditions are met. One of these conditions is obviously
that a translator exists for the platform, but there are others. They
basically state that any names that are undefined in the CAPSULE can
be supplied by the system in which it is to be run. For example, the
translator could produce assembly code with external identifiers which
will be supplied by some system library.
3.1. make_capsule and name-spaces
The only constructor for a CAPSULE is make_capsule. Its basic function
is to compose together UNITs which contain the declarations and definitions
of the program. The signature of make_capsule looks rather daunting
and is probably best represented graphically.
capsule_linking: SLIST(CAPSULE_LINK)
In the most general case in core-TDF, there would be three entries
in the list introducing limits using make_capsule_link for each of
the "tag", "token" and "al_tag" name-spaces
for the CAPSULE. Thus if:
capsule_linking = (make_capsule_link("tag", 5),
make_capsule_link("token", 6),
make_capsule_link("al_tag", 7))
there are 5 CAPSULE "tag" names used within the CAPSULE,
namely 0, 1, 2, 3 and 4; similarly there are 6 "token" names
and 7 "al_tag" names.3.1.1. External linkages
The context of usage will always determine when and how an integer
is to be interpreted as a name in a particular name-space. For example,
a TAG in a UNIT is constructed by make_tag applied to a TDFINT which
will be interpreted as a name from that UNIT's "tag" name-space.
An integer representing a name in the CAPSULE name-space would be
found in a LINKEXTERN of the external_linkage
parameter of make_capsule.
external_linkage: SLIST(EXTERN_LINK)
Each EXTERN_LINK is itself formed from an SLIST of LINKEXTERNs given
by make_extern_link. The order of the EXTERN_LINKs determines which
name-space one is dealing with; they are in the same order as given
by the capsule_linking parameter. Thus, with the capsule_linking given above, the
first EXTERN_LINK would deal with the "tag" name-space;
Each of its component LINKEXTERNs constructed by make_linkextern
would be identifying a tag number with some name external to the CAPSULE;
for example one might be:
make_linkextern (4, string_extern("printf"))
This would mean: identify the CAPSULE's "tag" 4 with a name called "printf", external to the module. The name "printf" would be used for linkage external to the CAPSULE; any name required
outside the CAPSULE would have to be linked like this.
3.1.2. UNITs
This name "printf", of course, does not necessarily mean
the C procedure in the system library. This depends both on the system
context in which the CAPSULE is translated and also the meaning of
the CAPSULE "tag" name 4 given by the component UNITs of
the CAPSULE in the groups parameter of make_capsule:
groups: SLIST(GROUP)
Each GROUP in the groups SLIST will be formed by sets of UNITs
of the same kind. Once again, there are a potentially unlimited number
of kinds of UNITs but core-TDF only uses those named "tld","al_tagdefs",
"tagdecs", "tagdefs", "tokdecs" and
"tokdefs"
*. These names will
appear (in the same order as in groups) in the prop_names
parameter of make_capsule, one for each kind of UNIT appearing
in the CAPSULE:
prop_names: SLIST(TDFIDENT)
Thus if:
prop_names = ("tagdecs", "tagdefs")
then, the first element of groups would contain only "tagdecs"
UNITs and the second would contain only "tagdefs" UNITs.
A "tagdecs" UNIT contains things rather like a set of global
identifier declarations in C, while a "tagdefs" UNIT is
like a set of global definitions of identifiers.
3.1.3. make_unit
Now we come to the construction of UNITs using make_unit, as in the
diagram below
local_vars: SLIST(TDFINT)
Just in the same way as with external_linkage, the numbers
in local_vars correspond (in the same order) to the spaces indicated
in capsule_linking in section 3.1 on page 13.
With our example, the first element of local_vars gives the
number of "tag" names local to the UNIT, the second gives
the number of "token" names local to the UNIT etc. These
will include all the names used in the body of the UNIT. Each declaration
of a TAG, for example, will use a new number from the "tag"
name-space; there is no hiding or reuse of names within a UNIT.
3.1.4. LINK
Connections between the CAPSULE name-spaces and the UNIT name-spaces
are made by LINKs in the lks parameter of make_unit:
lks: SLIST(LINKS)
Once again, lks is effectively indexed by the kind of name-space. Each LINKS is an SLIST of LINKs, each of which establishes an
identity between names in the CAPSULE name-space and names in the
UNIT name-space. Thus if the first element of lks contains:
make_link(42, 4)
then, the UNIT "tag" 42 is identical to the CAPSULE "tag"
4.
3.2. Definitions and declarations
The encoding in the properties: BYTESTREAM parameter of a UNIT
is a PROPS , for which there are five constructors corresponding to
the kinds of UNITs in core-TDF, make_al_tagdefs, make_tagdecs, make_tagdefs,
make_tokdefs and make_tokdecs. Each of these will declare or define
names in the appropriate UNIT name-space which can be used by make_link
in the UNIT's lks
parameter as well as elsewhere in the properties parameter.
The distinction between "declarations" and "definitions"
is rather similar to C usage; a declaration provides the "type"
of a name, while a definition gives its meaning. For tags, the "type"
is the SORT SHAPE (see below). For tokens, the "type" is
a SORTNAME constructed from the SORTNAMEs of the parameters and result
of the TOKEN using token:
params: LIST(SORTNAME)
result: SORTNAME
-> SORTNAME
Taking make_tagdefs as a paradigm for PROPS, we have:
no_labels: TDFINT
tds: SLIST(TAGDEF)
-> TAGDEF_PROPS
The no_labels parameter introduces the size of yet another
name-space local to the PROPS, this time for the LABELs used in the
TAGDEFs. Each TAGDEF in tds will define a "tag" name
in the UNIT's name-space. The order of these TAGDEFs is immaterial
since the initialisations of the tags are values which can be solved
at translate time, load time or as unordered dynamic initialisations.
The first kind of TAGDEF (the analogue of make_id_tagdec below) has signature:
t: TDFINT
signature: OPTION(STRING)
e: EXP x
-> TAGDEF
Here, t is the tag name and the evaluation of e will
be the value of SHAPE x of an obtain_tag(t) in an EXP.
Note that t is not a variable; the value of obtain_tag(t) will
be invariant. The signature parameter gives a STRING (see
section 3.2.3 on page 18) which may be used as an
name for the tag, external to TDF and also as a check introduced by
the producer that a tagdef and its corresponding tagdec have the same
notion of the language-specific type of the tag.
Variable tags are defined by make_var_tagdef (common_tagdef has the same form):
t: TDFINT
opt_access: OPTION(ACCESS)
signature: OPTION(STRING)
e: EXP x
-> TAGDEF
Once again t is the tag name, but now e is the initialisation
of the variable t. A use of obtain_tag(t) will give
a pointer to the variable (of SHAPE POINTER x), rather than its contents
*. There can only be
one make_var_tagdef of a given tag in a program, but there may be
more than one common_tagdef, possibly with different initialisations;
however these initialisations must overlap consistently just as in
common blocks in FORTRAN.
Dynamic initialisations are expressed by wrapping the initialising EXP in initial_value:
init: EXP s
-> EXP s
The translator will arrange that init will be evaluated once
only before any procedure application, other than those themselves
involved in initial_values, but after any constant initialisations.
The order of evaluation of different initial_values is arbitrary.
3.2.1. Scopes and linking
Only names introduced by AL_TAGDEFS, TAGDEFS, TAGDECs, TOKDECs and
TOKDEFs can be used in other UNITs (and then, only via the lks
parameters of the UNITs involved). You can regard them as being similar
to C global declarations. Token definitions include their declarations
implicitly; however this is not true of tags. This means that any
CAPSULE which uses or defines a tag across UNITs must include a TAGDEC
for that tag in its "tagdecs" UNITs. A TAGDEC is constructed
using either make_id_tagdec, make_var_tagdec or common_tagdec, all
with the same form:
t_intro: TDFINT
acc: OPTION(ACCESS)
signature: OPTION(STRING)
x: SHAPE
-> TAGDEC
Here the tag name is given by t_intro; the SHAPE x will define the space and alignment required for the tag (this is analogous
to the type in a C declaration). The acc field will define
certain properties of the tag not implicit in its SHAPE; I shall return
to the kinds of properties envisaged in discussing local declarations
in section 5.3 on page 30.
3.2.2. Declaration and definition signatures
The signature arguments of TAGDEFs and TAGDECs are designed
to allow a measure of cross-UNIT checking when linking independently
compiled CAPSULEs. Suppose that we have a tag, t, used in one
CAPSULE and defined in another; the first CAPSULE would have to have
a TAGDEC for t whose TAGDEF is in the second. The signature
STRING of both could be arranged to represent the language-specific
type of t as understood at compilation-time. Clearly, when
the CAPSULEs are linked the types must be identical and hence their
STRING representation must be the same - a translator will reject
any attempt to link definitions and declarations of the same object
with different signatures.
3.2.3. STRING
The SORT STRING is used in various constructs other than declarations
and definitions. It is a first-class SORT with string_apply_token
and string_cond. A primitive STRING is constructed from a TDFSTRING(k,n)
which is an encoding of n integers, each of k bits, using make_string:
arg: TDFSTRING(k, n)
-> STRING(k, n)
STRINGs may be concatenated using concat_string:
arg1: STRING(k, n)
arg2: STRING(k,m)
-> STRING(k, n+m)
Being able to compose strings, including token applications etc, means
that late-binding is possible in signature checking in definitions
and declarations. This late-binding means that the representation
of platform-dependent HLL types need only be fully expanded at install-time
and hence the types could be expressed in their representational form
on the specific platform.
4 SHAPEs, ALIGNMENTs and OFFSETs.
In most languages there is some notion of the type of a value. This
is often an uncomfortable mix of a definition of a representation
for the value and a means of choosing which operators are applicable
to the value. The TDF analogue of the type of a value is its SHAPE (S3.20).
A SHAPE is only concerned with the representation of a value, being
an abstraction of its size and alignment properties. Clearly an architecture-independent
representation of a program cannot say, for example, that a pointer
is 32 bits long; the size of pointers has to be abstracted so that
translations to particular architectures can choose the size that
is apposite for the platform.
4.1. Shapes
There are ten different basic constructors for the SORT SHAPE from
bitfield to top as shown in table 3. SHAPEs arising from those constructors
are used as qualifiers (just using an upper case version of the constructor
name) to various SORTs in the definition; for example, EXP TOP is
an expression with top SHAPE. This usage is purely for definitional purposes;
there is no SORT SHAPENAME as one has SORTNAME.
4.1.1. TOP, BOTTOM, LUB
Two of the SHAPE constructions are rather specialised; these are
TOP and BOTTOM. The result of any expression with a TOP shape will
always be discarded; examples are those produced by assign and integer_test.
A BOTTOM SHAPE is produced by an expression which will leave the
current flow of control, e.g. goto. The significance of these SHAPEs
only really impinges on the computation of the shapes of constructs
which have alternative expressions as results. For example, the result
of conditional is the result of one of its component expressions.
In this case, the SHAPE of the result is described as the LUB of the
SHAPEs of the components. This simply means that if one of the component
SHAPEs is TOP then the resulting SHAPE is TOP; if one is BOTTOM then
the resulting SHAPE is the SHAPE of the other; otherwise both component
SHAPEs must be equal and this is the resulting SHAPE. Since this operation
is associative, commutative and idempotent, we can speak quite unambiguously
of the LUB of several SHAPEs.
4.1.2. INTEGER
Integer values in TDF have shape INTEGER(v) where v is of SORT VARIETY.
The constructor for this SHAPE is integer with a VARIETY parameter.
The basic constructor for VARIETY is var_limits which has a pair of
signed natural numbers as parameters giving the limits of possible
values that the integer can attain. The SHAPE required for a 32 bit
signed integer would be:
integer(var_limits(-2^31, 2^31 - 1))
while an unsigned char is:
integer(var_limits(0, 255))
A translator should represent each integer variety by an object big
enough (or bigger) to contain all the possible values with limits
of the VARIETY. That being said, I must confess that most current
translators do not handle integers of more than the maximum given
naturally by the target architecture, but this will be rectified in
due course. A VARIETY may also be specified directly by a signedness flag
and a bit width, using the var_width constructor:
signed_width: BOOL
width: NAT
-> VARIETY
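Using this second form, and writing the BOOL argument informally as true, the
32 bit signed integer above could equally be expressed as:
integer(var_width(true, 32))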
4.1.3. FLOATING and complex
Similarly, floating point and complex numbers have shape FLOATING
qualified by a FLOATING_VARIETY.
base: NAT
mantissa_digits: NAT
minimum_exponent: NAT
maximum_exponent: NAT
-> FLOATING_VARIETY
A FLOATING_VARIETY specifies the base, number of mantissa digits,
and maximum and minimum exponent. Once again, it is intended that
the translator will choose a representation which will contain all
possible values, but in practice only those which are included in
IEEE float, double and extended are actually implemented.
4.1.4. BITFIELD
A number of contiguous bits have shape BITFIELD, qualified by a BITFIELD_VARIETY
(S3.4) which gives the number of bits involved and whether these bits
are to be treated as signed or unsigned integers. Current translators
put a maximum of 32 or 64 on the number of bits.
4.1.5. PROC
The representational SHAPE of procedure values is given by PROC with
constructor proc. I shall return to this in the description of the
operations which use it.
4.1.6. Non-primitive SHAPEs
The construction of the other four SHAPEs involves either existing
SHAPEs or the alignments of existing SHAPEs. These are constructed
by compound, nof, offset and pointer. Before describing these, we
require a digression into what is meant by alignments and offsets.
4.2. Alignments
In most processor architectures there are limitations on how one can
address particular kinds of objects in convenient ways. These limitations
are usually defined as part of the ABI for the processor. For example,
in the MIPS processor the fastest way to access a 32-bit integer is
to ensure that the address of the integer is aligned on a 4-byte boundary
in the address space; obviously one can extract a mis-aligned integer
but not in one machine instruction. Similarly, 16-bit integers should
be aligned on a 2-byte boundary. In principle, each primitive object
could have similar restrictions for efficient access and these restrictions
could vary from platform to platform. Hence, the notion of alignment
has to be abstracted to form part of the architecture independent
TDF - we cannot assume that any particular alignment regime will hold
universally.
4.2.1. ALIGNMENT constructors
The TDF abstraction of an alignment has SORT ALIGNMENT. The constructor,
unite_alignments, gives the set-union of its ALIGNMENT parameters;
this would correspond to taking a maximum of two real alignments in
the translator.
4.2.2. Special alignments
There are several other special ALIGNMENTs; those which appear later in this
guide include locals_alignment, callers_alignment(b) and callees_alignment(b)
(see section 5.3.3) and alloca_alignment (see section 5.3.4).
4.2.3. AL_TAG, make_al_tagdef
Alignments can also be named as AL_TAGs using make_al_tagdef. There
is no corresponding make_al_tagdec since AL_TAGs are implicitly declared
by their constructor, make_al_tag. The main reason for having names
for alignments is to allow one to resolve the ALIGNMENTs of recursive
data structures. If, for example, we have mutually recursive structures,
their ALIGNMENTs are best named and given as a set of equations formed
by AL_TAGDEFs. A translator can then solve these equations trivially
by substitution; this is easy because the only significant operation
is set-union.
4.3. Pointer and offset SHAPEs
A pointer value must have a form which reflects the alignment of the
object that it points to; for example, in the MIPS processor, the
bottom two bits of a pointer to an integer must be zero. The TDF SHAPE
for a pointer is POINTER qualified by the ALIGNMENT of the object
pointed to. The constructor pointer uses this alignment to make a
POINTER SHAPE.
4.3.1. OFFSET
Expressions which give sizes or offsets in TDF have an OFFSET SHAPE.
These are always described as the difference between two pointers.
Since the alignments of the objects pointed to could be different,
an OFFSET is qualified by these two ALIGNMENTs. Thus an EXP OFFSET(X,Y)
is the difference between an EXP POINTER(X) and an EXP POINTER(Y).
In order for the alignment rules to apply, the set X of alignments
must include Y. The constructor offset uses two such alignments to
make an OFFSET SHAPE. However, many instances of offsets will be produced
implicitly by the offset arithmetic, e.g., offset_pad:
a: ALIGNMENT
arg1: EXP OFFSET(z, t)
-> EXP OFFSET(z ∪ a, a)
This gives the next OFFSET greater or equal to arg1 at which
an object of ALIGNMENT a can be placed. It should be noted
that the calculation of shapes and alignments are all translate-time
activities; only EXPs should produce runnable code. This code, of
course, may depend on the shapes and alignments involved; for example,
offset_pad might round up arg1 to be a multiple of four bytes
if a was an integer ALIGNMENT and z was a character
ALIGNMENT. Translators also do extensive constant analysis, so if
arg1 was a constant offset, then the round-off would be done
at translate-time to produce another constant.
4.4. Compound SHAPEs
The alignments of compound SHAPEs (i.e. those arising from the constructors
compound and nof) are derived from the constructions which produced
the SHAPE. To take the easy one first, the constructor nof has signature:
n: NAT
s: SHAPE
-> SHAPE
This SHAPE describes an array of n values all of SHAPE s;
note that n is a natural number and hence is a constant known
to the producer. Throughout the definition this is referred to as
the SHAPE NOF(n, s). The ALIGNMENT of such a value is alignment(s);
i.e. the alignment of an array is just the alignment of its elements.
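For example, an array of ten of the unsigned chars defined earlier would be
constructed by:
nof(10, integer(var_limits(0, 255)))
and its alignment would be just that of a single unsigned char.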
The other constructor, compound, has signature:
sz: EXP OFFSET(x, y)
-> SHAPE
The sz parameter gives the minimum size which can accommodate
the SHAPE.
4.4.1. Offset arithmetic with compound shapes
The constructors offset_add , offset_zero and shape_offset are used
together with offset_pad to implement (inter alia) selection
from structures represented by COMPOUND SHAPEs. Starting from the
zero OFFSET given by offset_zero, one can construct an EXP which is
the offset of a field by padding and adding offsets until the required
field is reached. The value of the field required could then be extracted
using component or add_to_ptr. Most producers would define a TOKEN
for the EXP OFFSET of each field of a structure or union used in the
program, simply to reduce the size of the TDF. For example, the minimum size
sz of a structure consisting of a char followed by an int might be computed as:
sz = offset_add(offset_pad(int_al, shape_offset(char)), shape_offset(int))
The various rules for the ALIGNMENT qualifiers of the OFFSETs give
the required SHAPE; these rules also ensure that offset arithmetic
can be implemented simply using integer arithmetic for standard architectures
(see section 13.1 on page 56). Note that
the OFFSET computed here is the minimum size for the SHAPE. This would
not in general be the same as the difference between successive elements
of an array of these structures which would have SHAPE OFFSET(x,
x) as produced by offset_pad(x, sz). For examples
of the use of OFFSETs to access and create structures, see
section 12 on page 53.
4.4.2. offset_mult
In C, all structures have size known at translate-time. This means
that OFFSETs for all field selections of structures and unions are
translate-time constants; there is never any need to produce code
to compute these sizes and offsets. Other languages (notably Ada)
do have variable size structures and so sizes and offsets within these
structures may have to be computed dynamically. Indexing in C will
require the computation of dynamic OFFSETs; this would usually be
done by using offset_mult to multiply an offset expression representing
the stride by an integer expression giving the index:
arg1: EXP OFFSET(x, x)
arg2: EXP INTEGER(v)
-> EXP OFFSET(x, x)
and using add_to_ptr with a pointer expression giving the base of
the array with the resulting OFFSET.
4.4.3. OFFSET ordering and representation
There is an ordering defined on OFFSETs with the same alignment qualifiers,
as given by offset_test and offset_max having properties like:
shape_offset(S) ≥ offset_zero(alignment(S))
A ≥ B iff offset_max(A, B) = A
offset_add(A, B) ≥ A where B ≥ offset_zero(some compatible alignment)
In most machines, OFFSETs would be represented as single integer values
with the OFFSET ordering corresponding to simple integer ordering.
The offset_add constructor just translates to simple addition with
offset_zero as 0 with similar correspondences for the other offset
constructors. You might well ask why TDF does not simply use integers
for offsets, instead of introducing the rather complex OFFSET SHAPE.
The reasons are two-fold. First, following the OFFSET arithmetic rules
concerned with the ALIGNMENT qualifiers will ensure that one never
extracts a value from a pointer with the wrong alignment by, for example,
applying contents to an add_to_ptr. This frees TDF from having
to define the effect of strange operations like forming a float by
taking the contents of a pointer to a character which may be mis-aligned
with respect to floats - a heavy operation on most processors. The
second reason is quite simple; there are machines which cannot represent
OFFSETs by a single integer value.
4.5. BITFIELD alignments
Even in standard machines, one finds that the size of a pointer may
depend on the alignment of the data pointed at. Most machines do not
allow one to construct pointers to bits with the same facility as
other alignments. This usually means that pointers in memory to BITFIELD
VARIETYs must be implemented as two words with an address and bit
displacement. One might imagine that a translator could implement
BITFIELD alignments so that they are the same as the smallest natural
alignment of the machine and avoid the bit displacement, but this
is not the intention of the definition. On any machine for which it
is meaningful, the alignment of a BITFIELD must be one bit; in other
words successive BITFIELDs are butted together with no padding bits
*. Within the limits
of what one can extract from BITFIELDs, namely INTEGER VARIETYs, this
is how one should implement non-standard alignments, perhaps in constructing
data, such as protocols, for exchange between machines. One could
implement some Ada representational statements in this way; certainly
the most commonly used ones.
5 Procedures and Locals
All procedures in TDF are essentially global; the only values which
are accessible from the body of a procedure are those which are derived
from global TAGs (introduced by TAGDEFs or TAGDECs), local TAGs defined
within the procedure and parameter TAGs of the procedure.
5.1. make_proc and apply_proc
The make_proc constructor has signature:
result_shape: SHAPE
params_intro: LIST(TAGSHACC)
var_intro: OPTION(TAGACC)
body: EXP BOTTOM
-> EXP PROC
The params_intro and var_intro parameters introduce
the formal parameters of the procedure which may be used in body.
The procedure result will have SHAPE result_shape and will
be usually given by some return construction within body. The
basic model is that space will be provided to copy actual parameters
(into space supplied by some apply_proc) by value into these formals
and the body will treat this space effectively as local variables.
Each TAGSHACC in params_intro is constructed using make_tagshacc:
sha: SHAPE
opt_access: OPTION(LIST(ACCESS))
tg_intro: TAG POINTER(alignment(sha))
-> TAGSHACC
For example, a single integer formal parameter might be introduced by:
params_intro = make_tagshacc(integer(v), empty, make_tag(13))
Then, TAG 13 from the enclosing UNIT's name-space is identified with
the formal parameter with SHAPE POINTER(INTEGER(v)). Any use of obtain_tag(make_tag(13))
in body will deliver a pointer to the integer parameter. I
shall return to the meaning of opt_access and the ramifications
of the scope and extent of TAGs involved in conjunction with local
declarations in section 5.3.1 on page 30.
The result of the procedure is delivered by the return constructor:
arg1: EXP x
-> EXP BOTTOM
Here x must be identical to the result_shape of the
call of the procedure. There may be several returns in body, and the
SHAPE x in each will be the same. Some languages allow different
types to be returned depending on the particular call. The producer
must resolve this issue. For example, C allows one to deliver void
if the resulting value is not used. In TDF a dummy value must be provided
at the return; for example make_value(result_shape).
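Putting these pieces together, a minimal sketch of a procedure which simply
returns its single integer parameter (re-using the make_tagshacc example above
and the contents constructor of section 7.1) might be:
make_proc(integer(v), (make_tagshacc(integer(v), empty, make_tag(13))), empty,
return(contents(integer(v), obtain_tag(make_tag(13)))))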
A procedure is called using apply_proc:
result_shape: SHAPE
arg1: EXP PROC
arg2: LIST(EXP)
varparam: OPTION(EXP)
-> EXP result_shape
Here arg1 is the procedure to be called and arg2 gives
the actual parameters. There must be at least as many actual parameters
as given (with the same SHAPE) in the params_intro of the corresponding
make_proc for arg1
*. The values of arg2
will be copied into space managed by the caller.
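Continuing the sketch above, and assuming f is a TAG identified with such a
procedure value and that the integer literal is built with make_int (an assumed
constructor name), a call with a single constant argument might look like:
apply_proc(integer(v), obtain_tag(f), (make_int(v, 42)), empty)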
5.1.1. vartag, varparam
Use of the var_intro OPTION in make_proc and the corresponding
varparam in apply_proc allows one to have a parameter of any
SHAPE, possibly differing from call to call where the actual SHAPE
can be deduced in some way by the body of the make_proc . One
supplies an extra actual parameter, varparam, which usually
would be a structure grouping some set of values. The body of the
procedure can then access these values using the pointer given by
the TAG var_intro, using add_to_ptr with some computed offsets
to pick out the individual fields.
5.2. make_general_proc and apply_general_proc
A make_general_proc has signature:
result_shape: SHAPE
prcprops: OPTION(PROCPROPS)
caller_intro: LIST(TAGSHACC)
callee_intro: LIST(TAGSHACC)
body: EXP BOTTOM
-> EXP PROC
Here the formal parameters are split into two sets, caller_intro
and callee_intro, each given by a list of TAGSHACCs just as
in make_proc. The distinction between the two sets is that the make_general_proc
is responsible for de-allocating any space required for the callee
parameter set; this really only becomes obvious at uses of tail_call
within body.
The corresponding apply_general_proc has signature:
result_shape: SHAPE
prcprops: OPTION(PROCPROPS)
p: EXP PROC
caller_params: LIST(OTAGEXP)
callee_params: CALLEES
postlude: EXP TOP
-> EXP result_shape
The actual caller parameters are given by caller_params as
a list of OTAGEXPs constructed using make_otagexp:
tgopt: OPTION(TAG x)
e: EXP x
-> OTAGEXP
Here, e is the value of the parameter and tgopt, if
present, is a TAG which will be bound to the final value of the parameter
(after body is evaluated) in the postlude expression
of the apply_general_proc
*. Clearly, this allows
one to use a caller parameter as an extra result of the procedure;
for example, as in Ada out-parameters.
5.2.1. tail_call
Often the result of a procedure, f, is simply given by the
call of another (or the same) procedure, g. In appropriate
circumstances, the same stack space can be used for the call of g
as the call of f. This can be particularly important where
heavily recursive routines are involved; some languages even use tail
recursion as the preferred method of looping.
The tail_call constructor has signature:
prcprops: OPTION(PROCPROPS)
p: EXP PROC
callee_params: CALLEES
-> EXP BOTTOM
The procedure p will be called with the same caller parameters as
the current procedure and the new callee_params and return
to the call site of the current procedure. Semantically, if S is the
return SHAPE of the current procedure and L is its caller-parameters, the
effect is that of returning the result of applying p to L and callee_params.
5.2.2. PROCPROPS
The presence of var_callees (or var_callers) means that the procedure
can be called with more actual callee (or caller) parameters than
are indicated in callee_intro (or caller_intro
). These extra parameters would be accessed within body using
offset calculations with respect to the named parameters. The offsets
should be calculated using parameter_alignment to give the packing
of the parameter packs.
5.3. Defining and using locals
5.3.1. identify, variable
Local definitions within the body of a procedure are given
by two EXP constructors which permit one to give names to values over
a scope given by the definition. Note that this is somewhat different
to declarations in standard languages where the declaration is usually
embedded in a larger construct which defines the scope of the name;
here the scope is explicit in the definition. The reason for this
will become more obvious in the discussion of TDF transformations.
The simpler constructor is identify:
opt_access: OPTION(ACCESS)
name_intro: TAG x
definition: EXP x
body: EXP y
-> EXP y
The definition is evaluated and its result is identified with
the TAG given by name_intro within its scope body. Hence
the use of any obtain_tag(name_intro) within
body is equivalent to using this result. Anywhere else, obtain_tag(name_intro
) is meaningless, including in other procedures.
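As a small sketch of its use (with p some TAG delivering a pointer, d a
suitable OFFSET expression, E an expression of SHAPE s, and assign, contents
and add_to_ptr as described elsewhere in this guide), a field address can be
computed once, named t, and then used both to write and to re-read the field:
identify(empty, t, add_to_ptr(obtain_tag(p), d),
sequence((assign(obtain_tag(t), E)), contents(s, obtain_tag(t))))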
The other constructor, variable, has a similar signature:
opt_access: OPTION(ACCESS)
name_intro: TAG x
init: EXP x
body: EXP y
-> EXP y
Here the init EXP is evaluated and its result serves as an
initialisation of space of SHAPE x local to the procedure.
The TAG name_intro is then identified with a pointer to that space
within body. A use of obtain_tag(name_intro) within body
is equivalent to using this pointer and is meaningless outside body
or in other procedures. Many variable declarations in programs are
uninitialised; in this case, the init argument could be provided
by make_value which will produce some value with SHAPE given by its
parameter.
5.3.2. ACCESS
The ACCESS SORT given in tag declarations is a way of describing a
list of properties to be associated with the tag. They are basically
divided into two classes, one which describes global properties of
the tag with respect to the model for locals and the other which gives
"hints" on how the value will be used. Any of these can
be combined using add_access.
5.3.2.1. Locals model
At the moment the first class of ACCESS constructors consists of
standard_access (the default), visible,
out_par and long_jump_access.
5.3.2.2. Access "hints"
A variable tag with ACCESS constant is a write-once value; once it
is initialised the variable will always contain the initialisation.
In other words the tag is a pointer to a constant value; translators
can use this information to apply various optimisations.
5.3.3. current_env, env_offset
The constructor current_env gives a pointer to the current procedure
frame of SHAPE POINTER(fa) where fa
depends on how the procedure was defined and will be some set of
the special frame ALIGNMENTs. This set will always include locals_alignment
- the alignment of any locals defined within the procedure. If the
procedure has any caller-parameters, the set will also include callers_alignment(b)
where b indicates whether there can be a variable number of them;
similarly for callee-parameters.
The constructor env_offset gives the OFFSET of a TAG within the frame:
fa: ALIGNMENT
y: ALIGNMENT
t: TAG x
-> EXP OFFSET(fa,y)
The frame ALIGNMENT fa will be the appropriate one for the
TAG t; i.e. if t is a local then the fa will
be locals_alignment; if t is a caller parameter, fa
will be callers_alignment(b); if t is a callee_parameter, fa
will be callees_alignment(b). The alignment y will be the alignment
of the initialisation of t.
If TAG t is derived from a variable definition:
add_to_ptr(current_env(), env_offset(locals_alignment, A, t)) = obtain_tag(t)
if TAG t is derived from an identify definition:
contents(S, add_to_ptr(current_env(), env_offset(locals_alignment, A, t))) = obtain_tag(t)
if TAG t is derived from a caller parameter:
add_to_ptr(current_env(), env_offset(callers_alignment(b), A, t)) = obtain_tag(t)
if TAG t is derived from a callee parameter:
add_to_ptr(current_env(), env_offset(callees_alignment(b), A, t)) = obtain_tag(t)
These identities are valid throughout the extent of t, including in
inner procedure calls. In other words, one can dynamically create
a pointer to the value by composing current_env and env_offset.
5.3.4. local_alloc, local_free_all, last_local
The size of stack frame produced by variable and identify definitions
is a translate-time constant since the frame is composed of values
whose SHAPEs are known. TDF also allows one to produce dynamically
sized local objects which are conceptually part of the frame. These
are produced by local_alloc:
arg1: EXP OFFSET(x, y)
-> EXP POINTER(alloca_alignment)
The operand arg1 gives the size of the new object required
and the result is a pointer to the space for this object "on
top of the stack" as part of the frame. The quotation marks indicate
that a translator writer might prefer to maintain a dynamic stack
as well as a static one. There are some disadvantages in putting everything
into one stack which may well out-weigh the trouble of maintaining
another stack which is relatively infrequently used. If a frame has
a known size, then all addressing of locals can be done using a stack-front
register; if it is dynamically sized, then another frame-pointer register
must be used - some ABIs make this easy but not all. The majority
of procedures contain no local_allocs, so their addressing of locals
can always be done relative to a stack-front; only the others have
to use another register for a frame pointer.
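For instance, a minimal sketch of allocating space for a dynamically sized
array of integers (re-using the int_al alignment name from section 4.4 and
assuming n is a variable TAG holding the element count) might be:
local_alloc(offset_mult(offset_pad(int_al, shape_offset(integer(v))), contents(integer(v), obtain_tag(n))))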
The corresponding freeing operation takes an offset and a pointer:
a: EXP OFFSET(x, y)
p: EXP POINTER(alloca_alignment)
-> EXP TOP
Here p must evaluate to a pointer generated either by local_alloc
or last_local . The effect is to free all of the space locally allocated
after p. The usual implementation (with a downward growing stack)
of this is that p becomes the "top of stack" pointer.
5.4. Heap storage
At the moment, there are no explicit constructors for creating dynamic
off-stack storage in TDF. Any off-stack storage requirements must
be met by the API in which the system is embedded, using the standard
procedural interface. For example, the ANSI C API allows the creation
of heap space using standard library procedures like malloc.
6 Control Flow within procedures
6.1. Unconditional flow
6.1.1. sequence
To perform a sequential set of operations in TDF, one uses the constructor
sequence:
statements: LIST(EXP)
result: EXP x
-> EXP x
Each of the statements is evaluated in order, throwing away
their results. Then, result is evaluated and its result is
the result of the sequence.
6.2. Conditional flow
6.2.1. labelled, make_label
All simple changes of flow of control within a TDF procedure are done
by jumps or branches to LABELs, mirroring what actually happens in
most computers. There are three constructors which introduce LABELs;
the most general is labelled which allows arbitrary jumping between
its component EXPs:
placelabs_intro: LIST(LABEL)
starter: EXP x
places: LIST(EXP)
-> EXP w
Each of the EXPs in places is labelled by the corresponding
LABEL in placelabs_intro; these LABELs are constructed by
make_label applied to a TDFINT uniquely drawn from the LABEL name-space
introduced by the enclosing PROPS. The evaluation starts by evaluating
starter; if this runs to completion the result of the labelled
is the result of starter. If there is some jump to a LABEL
in placelabs_intro then control passes to the corresponding
EXP in places and so on. If any of these EXPs runs to
completion then its result is the result of the labelled; hence the
SHAPE of the result, w, is the LUB of the SHAPEs of the component
EXPs.
6.2.2. goto, make_local_lv, goto_local_lv, long_jump, return_to_label
The unconditional goto is the simplest method of jumping. In common
with all the methods of jumping using LABELs, its LABEL parameter
must have been introduced in an enclosing construction, like labelled,
which scopes it.
6.2.3. integer_test, NTEST
Conditional branching is provided by the various _test constructors,
one for each primitive SHAPE except BITFIELD. I shall use integer_test
as the paradigm for them all:
nt: NTEST
dest: LABEL
arg1: EXP INTEGER(v)
arg2: EXP INTEGER(v)
-> EXP TOP
The NTEST nt chooses a dyadic test (e.g. =, >=, <, etc.)
that is to be applied to the results of evaluating
arg1 and arg2. If arg1 nt arg2 then the result
is TOP; otherwise control passes to the LABEL dest. In other
words, integer_test acts like an assertion where if the assertion
is false, control passes to the LABEL instead of continuing in the
normal fashion.
6.2.4. case
There are only two other ways of changing flow of control using LABELs.
One arises in ERROR_TREATMENTs which will be dealt with in the arithmetic
operations. The other is case:
exhaustive: BOOL
control: EXP INTEGER(v)
branches: LIST(CASELIM)
-> EXP (exhaustive ? BOTTOM : TOP)
Each CASELIM is constructed using make_caselim:
branch: LABEL
lower: SIGNED_NAT
upper: SIGNED_NAT
-> CASELIM
In the case construction, the control EXP is evaluated and
tested to see whether its value lies inclusively between some lower
and upper in the list of CASELIMs. If so, control passes to
the corresponding branch. The order in which these tests are
performed is undefined, so it is probably best if the tests are exclusive.
The exhaustive flag being true asserts that one of the branches will
be taken and so the SHAPE of the result is BOTTOM. Otherwise, if none
of the branches are taken, its SHAPE is TOP and execution carries
on normally.
6.2.5. conditional, repeat
Besides labelled, two other constructors, conditional and repeat,
introduce LABELs which can be used with the various jump instructions.
Both can be expressed as labelled, but have extra constraints which
make assertions about the use of the LABELs introduced and could usually
be translated more efficiently; hence producers are advised to use
them where possible. A conditional expression or statement would usually
be done using conditional:
alt_label_intro: LABEL
first: EXP x
alt: EXP y
-> EXP(LUB(x, y))
Here first is evaluated; if it terminates normally, its result
is the result of the conditional. If a jump to alt_label_intro
occurs then alt is evaluated and its result is the result of
the conditional. Clearly, this, so far, is just the same as
labelled((alt_label_intro), first, (alt)). However, conditional imposes the
constraint that alt cannot use alt_label_intro. All
jumps to alt_label_intro are "forward jumps" - a
useful property to know in a translator.
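For example, a C statement of the form if (x == 0) S1; else S2; might be
sketched as follows, where x is a variable TAG holding an integer, S1 and S2
stand for EXP TOPs, and the equality NTEST and the integer literal constructor
are written as equal and make_int (both assumed names):
conditional(make_label(7),
sequence((integer_test(equal, make_label(7), contents(integer(v), obtain_tag(x)), make_int(v, 0))), S1),
S2)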
The repeat constructor has signature:
repeat_label_intro: LABEL
start: EXP TOP
body: EXP y
-> EXP y
The EXP start is evaluated, followed by body which is
labelled by repeat_label_intro. If a jump to repeat_label_intro
occurs in body, then body is re-evaluated. If body
terminates normally then its result is the result of the repeat. This
is just the same as:
labelled((repeat_label_intro), sequence((start), goto(repeat_label_intro)), (body))
except that no jumps to repeat_label_intro are allowed in start
- a useful place to do initialisations for the loop.
6.3. Exceptional flow
A further way of changing the flow of control in a TDF program is
by means of exceptions. TDF exceptions currently arise from three
sources - overflow in arithmetic operations with trap ERROR_TREATMENT (see
section 8.1.1 on page 40), an attempt
to access a value via a nil pointer using assign_with_mode, contents_with_mode
or move_some (see section 7.3 on page 38),
or a stack overflow on procedure entry with PROCPROPS check_stack (see
section 5.2.2 on page 29) or a local_alloc_check.
7 Values, variables and assignments.
TAGs in TDF fulfil the role played by identifiers in most programming
languages. One can apply obtain_tag to find the value bound to the
TAG. This value is always a constant over the scope of a particular
definition of the TAG. This may sound rather strange to those used
to the concepts of left-hand and right-hand values in C, for example,
but is quite easily explained as follows.
7.1. contents
A TAG introduced by a variable definition, say v, is bound to a pointer to
the space allocated, and this binding remains constant over its scope even
though the contents of the space may change (the left-hand value in C).
In order to get the contents of this space (the right-hand value in
C), one must apply the contents operator to the pointer:
contents(shape(v), obtain_tag(v))
In general, the contents constructor takes a SHAPE and an expression
delivering a pointer:
s: SHAPE
arg1: EXP POINTER(x)
-> EXP s
It delivers the value of SHAPE s, pointed at by the evaluation
of arg1. The alignment of s need not be identical to
x. It only needs to be included in it; this would allow one,
for example, to pick out the first field of a structure from a pointer
to it.
7.2. assign
A simple assignment in TDF is done using assign:
arg1: EXP POINTER(x)
arg2: EXP y
-> EXP TOP
The EXPs arg1 and arg2 are evaluated (no ordering implied)
and the value of SHAPE y given by arg2 is put into the
space pointed at by arg1. Once again, the alignment of y
need only be included in x, allowing the assignment to the
first field of a structure using a pointer to the structure. An assignment
has no obvious result so its SHAPE is TOP.
A C assignment used as an expression, delivering the assigned value exp,
could therefore be represented by naming the value and sequencing the
assignment before delivering it:
identify(empty, newtag, exp,
sequence((assign(obtain_tag(v), obtain_tag(newtag))), obtain_tag(newtag)))
From the definition of assign, the destination argument, arg1,
must have a POINTER shape. This means that, given some TAG id,
assign(obtain_tag(id), lhs) is only legal if the definition
of its identify had a POINTER SHAPE. A trivial example would be if
id was defined:
identify(empty, id, obtain_tag(v), assign(obtain_tag(id), lhs))
This identifies id with the variable v which has a POINTER SHAPE,
and assigns lhs to this pointer. Given that id does not occur in lhs,
this is identical to:
assign(obtain_tag(v), lhs).
Equivalences like this are widely used for transforming TDF in translators
and other tools (see section 11 on page 48).
7.3. TRANSFER_MODE operations
The TRANSFER_MODE operations allow one to do assignment and contents
operations with various qualifiers to control how the access is done
in a more detailed manner than the standard contents and assign operations.
The move_some constructor copies a block of data from one place to another:
md: TRANSFER_MODE
arg1: EXP POINTER(x)
arg2: EXP POINTER(y)
arg3: EXP OFFSET(z, t)
-> EXP TOP
The EXP arg1 is the destination pointer, and arg2 is
a source pointer. The amount moved is given by the OFFSET arg3.
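For example, a sketch of copying a whole object of SHAPE s between two pointer
TAGs dst and src (hypothetical names), using the trap_on_nil TRANSFER_MODE
mentioned below, might be:
move_some(trap_on_nil, obtain_tag(dst), obtain_tag(src), shape_offset(s))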
The contents_with_mode constructor has signature:
md: TRANSFER_MODE
s: SHAPE
arg1: EXP POINTER(x)
-> EXP s
Here, the only significant TRANSFER_MODE constructors md are
trap_on_nil and volatile. The latter is principally intended to implement
the C volatile construction; it certainly means that the contents_with_mode
operation will never be "optimised" away.
7.4. Assigning and extracting bitfields
Since pointers to bits are forbidden, two special operations are provided
to extract and assign bitfields. These require the use of a pointer
value and a bitfield offset from the pointer. The signature of bitfield_contents
which extracts a bitfield in this manner is:
v: BITFIELD_VARIETY
arg1: EXP POINTER(x)
arg2: EXP OFFSET(y,z)
-> EXP bitfield(v)
Here arg1 is a pointer to an alignment x which includes
v, the required bitfield alignment. In practice, x must
include an INTEGER VARIETY whose representation can contain the entire
bitfield. Thus on a standard architecture, if v is a 15 bit bitfield,
x must include at least a 16 bit integer variety; a 27 bit bitfield would
require a 32 bit integer variety and so on. Indeed the constraint
is stronger than this since there must be an integer variety, accessible
from arg1, which entirely contains the bitfield.
The corresponding constructor for assigning a bitfield, bitfield_assign, has signature:
arg1: EXP POINTER(x)
arg2: EXP OFFSET(y,z)
arg3: EXP BITFIELD_VARIETY(v)
-> EXP TOP