The A68k Assembler


  
In addition to the GNU Assembler, the TIGCC package also includes the A68k assembler by Charlie Gibbs (a slightly modified Amiga version). Although it is considerably less capable than the GNU assembler, it is included here because almost all assembly programs for the TI-89 and TI-92+ were written for this assembler. Including it in the package therefore allows existing ASM programs to be assembled as well. As this part of the TIGCC package was developed completely independently of the rest of the TIGCC project (and long before the TIGCC project was even started), it is somewhat inconsistent with the rest of the project.

This assembler comes with its own set of header files. They are included mainly for compatibility reasons (some of them are deprecated, obsolete, inconsistent or even obscure), so they will not be described here. Information about them may be found in various ASM tutorials for the TI-89 and TI-92+. These tutorials are also deprecated, but note that nearly 95% of all ASM programs for the TI-89 and TI-92+ are written in this now-deprecated way, because much information about the system was not available at the time these programs were created. What is presented here, for completeness, is the original A68k documentation, written by Charlie Gibbs himself. Note that his documentation is also somewhat incomplete: for example, you will not find a complete list of supported assembly directives here. Everything below is Charlie's original documentation, untouched apart from being converted to HTML.


A68k - a freely distributable assembler for the Amiga
by Charlie Gibbs
with special thanks to
Brian R. Anderson and Jeff Lydiatt
(Version 2.70 - February 25, 1991)

NOTE: This program is Freely Distributable, as opposed to Public Domain. Permission is given to freely distribute this program provided no fee is charged, and this documentation file is included with the program.

This assembler is based on Brian R. Anderson's 68000 cross-assembler published in Dr. Dobb's Journal, April through June 1986. I have converted it to produce AmigaDOS-format object modules, and have made many enhancements, such as macros and INCLUDE files.

My first step was to convert the original Modula-2 code into C. I did this for two reasons. First, I had access to a C compiler, but not a Modula-2 compiler. Second, I like C better anyway.

The executable code generator code (GetObjectCode and MergeModes) is essentially the same as in the original article, aside from its translation into C. I have almost completely rewritten the remainder of the code, however, in order to remove restrictions, add enhancements, and adapt it to the AmigaDOS environment. Since the only reference book available to me was the AmigaDOS Developer's Manual (Bantam, February 1986), this document describes the assembler in terms of that book.

Restrictions

Let's get these out of the way first:

Extensions

Now for the good stuff:

The Small Code/Data model

Version 2.4 implements a rudimentary small code/data model. It consists of converting any data reference to one of the following three addressing modes:
These conversions do not take place unless a NEAR directive is encountered. The NEAR directive can take one operand, which must be either an address register or a symbol which has been equated (using EQUR) to an address register. Register A7 (SP) may not be used. If no register is given, A4 is assumed.

Conversion is done for all operands until a FAR directive is encountered. NEAR and FAR directives can occur any number of times, enabling conversion to be turned on and off at will.

Backward references which cannot be converted (e.g. external labels declared as XREF) will remain as absolute long addressing. All forward references are assumed to be convertible, since during pass 1 A68k has no way of telling whether conversion is possible. If conversion turns out to be impossible, invalid object code will be generated - an error message ("Invalid forward reference") will indicate when this occurs.

Although the small code/data model can greatly reduce the size of assembled programs, several restrictions apply:

I'll be the first to admit that this is a very crude and ugly implementation. I hope to improve it in future versions.

Files

A68k uses the following files:

File Names

The names of the above files can be explicitly specified. However, A68k will generate default file names in the following cases:

A default name is generated by deriving a stem name from the source code file name, and appending '.o' for an object code file name ('.s' if the '-s' switch is specified to produce Motorola S-records), '.equ' for an equate file name, or '.lst' for a listing file name. The stem name consists of all characters of the source file name up to the last period (or the entire source file name if it contains no period). Here are some examples of default names:

Source file     Object file     Equate file     Listing file
myprog.asm      myprog.o        myprog.equ      myprog.lst
myprog          myprog.o        myprog.equ      myprog.lst
new.prog.asm    new.prog.o      new.prog.equ    new.prog.lst
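
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not A68k's own code; the function name and the `s_records` parameter are made up for the example, and directory paths containing periods are not considered.

```python
def default_names(source, s_records=False):
    """Derive default file names from a source file name (illustrative only)."""
    # Stem: everything up to the last period, or the whole name if there is none.
    stem = source.rsplit(".", 1)[0]
    return {
        "object": stem + (".s" if s_records else ".o"),  # '.s' models the '-s' switch
        "equate": stem + ".equ",
        "listing": stem + ".lst",
    }
```

For instance, `default_names("new.prog.asm")` yields the three names shown in the last row of the table.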

How to use A68k

The command-line syntax to run the assembler is as follows:
a68k <source file name> [<object file name>] [<listing file name>] [-d[[!]<prefix>]]
     [-e[<equate file name>]] [-f] [-g] [-h<header file name>]
     [-i<include directory list>] [-k] [-l[<listing file name>]]
     [-m<small data offset>] [-n] [-o<object file name>] [-p<page depth>]
     [-q[<quiet interval>]] [-s] [-t] [-w[<hash table size>][,<secondary heap size>]]
     [-x[<listing file name>]] [-y] [-z[<debug start line>][,<debug end line>]]
These options can be given in any order. Any parameter which is not a switch (denoted by a leading hyphen) is assumed to be a file name; up to three file names (assumed to be source, object, and listing file names respectively) can be given. A source file name is always required. If a switch is being given a value, that value must immediately follow the switch letter with no intervening spaces. For instance, to specify a page depth of 40 lines, the specification '-p40' should be used; '-p 40' will be rejected.

Switches perform the following actions:
-d

Causes symbol table entries (hunk_symbol) to be written to the object module for the use of symbolic debuggers. If the switch is followed by a string of characters, only those symbols beginning with that prefix string will be written. This can be used to suppress internal symbols generated by compilers. If the first character is an exclamation mark ('!'), only symbols which do not begin with the following characters are written out. Here are some examples:

-d      writes all symbols
-dabc   writes only symbols beginning with "abc"
-d!x    writes symbols which do not begin with "x"
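
The filtering rule behind these examples might look roughly like the Python sketch below. The function name is hypothetical, the comparison is assumed to be case-sensitive, and what A68k does with a bare '-d!' is not stated, so that case is not modelled faithfully here.

```python
def symbol_is_written(name, d_value):
    """Decide whether a symbol is written out.

    d_value is the text following '-d' on the command line
    ('' when the switch is given with no prefix).
    """
    if d_value.startswith("!"):
        # Leading '!' inverts the test: keep symbols NOT starting with the prefix.
        return not name.startswith(d_value[1:])
    # An empty prefix matches every symbol.
    return name.startswith(d_value)
```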

-e

Causes an equate file (see above) to be produced. A file name can be specified; otherwise a default name will be used.

-f

Causes any branches (Bcc, BRA, BSR) that could be converted to short form to be flagged. A68k will convert as many branches as possible to short form (unless the '-n' switch is specified), but certain combinations of instructions may set up a ripple effect where shortening one branch brings another one into range. This switch will cause A68k to flag any branches that it may have missed; during pass 2 it is possible to tell this, although during pass 1 it might not be. If the '-n' switch (see below) is specified along with this switch (suppressing all optimization), no branches will be shortened, but all branches which could be shortened will be flagged.

-g

Causes any undefined symbols to be treated as if they were externally defined (XREF), rather than being flagged as errors.

-h

Causes a header file to be read prior to the source code file. A file name must be given. The action is the same as if the first statement of the source file were an INCLUDE statement naming the header file. To find the header file, the same directories will be searched as for INCLUDE files (see the '-i' switch below).

-i

Specifies directories to be searched for INCLUDE files in addition to the current directory. Several names, separated by commas, may be specified. No embedded blanks are allowed. For example, the specification
-imylib,df1:another.lib
will cause INCLUDE files to be searched for first in the current directory, then in "mylib", then in "df1:another.lib".
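
The search order can be written out as a small Python sketch. The function name is hypothetical, and representing the current directory as an empty string is an assumption made only for this example.

```python
def include_search_dirs(i_value):
    """Return the directories searched for INCLUDE files, in search order.

    i_value is the text following '-i' (a comma-separated list with no blanks).
    """
    # The current directory comes first; it is shown here as "".
    return [""] + [d for d in i_value.split(",") if d]
```

With the example above, `include_search_dirs("mylib,df1:another.lib")` gives the current directory, then "mylib", then "df1:another.lib".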

-k

Causes the object file to be kept even if any errors were found. Otherwise, it will be scratched if any errors occur.

-l

Causes a listing file to be produced. If you want the listing file to include a symbol table dump and cross-reference, use the '-x' switch instead (see below).

-m

Changes the assumed offset from the start of the DATA/BSS section to the base register used when the small code/data option is activated by the NEAR directive. If this parameter is not specified, the offset defaults to 32768.
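
One way to read the default of 32768, assuming the base register is used with a signed 16-bit displacement (as in the 68000's address register indirect with displacement mode; the text here does not say so explicitly): placing the base 32768 bytes past the start of the section lets displacements of -32768 through 32767 reach the first 64K of the section. A minimal Python sketch of that arithmetic, with hypothetical names:

```python
def near_displacement(target_offset, base_offset=32768):
    """Displacement from the base register to a byte in the DATA/BSS section.

    Both offsets are measured from the start of the section. Assumes a
    signed 16-bit displacement; raises ValueError if the target is out of reach.
    """
    disp = target_offset - base_offset
    if not -32768 <= disp <= 32767:
        raise ValueError("target not reachable with a 16-bit displacement")
    return disp
```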

-n

Causes all object code optimization (see above) to be disabled.

-o

Allows the default name for the object code file (see above) to be overridden.

-p

Causes the page depth to be set to the specified value. This takes the place of the PLEN directive in the Metacomco assembler. Page depth defaults to 60 lines ('-p60').

-q

Changes the interval at which A68k displays the line number it has reached in its progress through the assembly. The default is to display every 100 lines ('-q100'). Specifying larger values reduces console I/O, making assemblies run slightly faster.

If you specify a negative number (e.g. '-q-10'), line numbers will be displayed at an interval equal to the absolute value of the specified number, but will be given as positions within the current module (source, macro, or INCLUDE) rather than as a total statement count - the module name will also be displayed.

A special case is the value zero ('-q0' or just '-q') - this will cause all console output, except for error messages, to be suppressed.
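
The three cases described for '-q' can be summarized in a short Python sketch. The mode names and the function itself are made up for illustration and are not taken from A68k's source; the argument is the text that follows '-q' on the command line.

```python
def parse_q(value):
    """Classify the text following '-q' into a (mode, interval) pair."""
    n = int(value) if value else 0   # '-q' alone is the same as '-q0'
    if n == 0:
        return ("quiet", 0)          # only error messages are shown
    if n < 0:
        return ("per-module", -n)    # positions within the current module
    return ("total", n)              # total statement count
```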

-s

Causes the object file to be written in Motorola S-record format, rather than AmigaDOS format. The default name for an S-record file ends with '.s' rather than '.o'; this can still be overridden with the '-o' switch, though.

-t

Allows tabs in the source file to be passed through to the listing file, rather than being expanded. In addition, tabs will be generated in the listing file to skip from the object code to the source statement, etc. This can greatly reduce the size of the listing file, as well as making it quicker to produce. Do not use this option if you will be displaying or listing the list file on a device which does not assume a tab stop at every 8th position.

-w

Specifies the sizes of fixed memory areas that A68k allocates for its own use. You should normally never have to specify this switch, but it may be useful for tuning.

The first parameter gives the number of entries that the hash table (used for searching the symbol table) will contain. The default value of 2047 should be enough for all but the very largest programs. The assembly will not fail if this value is too small, but may slow down if too many long hash chains must be searched. The hashing statistics displayed by the '-y' switch (see below) can be used to tune this parameter. I've heard that you should really specify a prime number for this parameter, but I haven't gone into hashing theory enough to know whether it's actually necessary.

The second parameter of the '-w' switch specifies the size (in bytes) of the secondary heap, which is used to store nested macro and INCLUDE file information (see below). It defaults to 1024, which should be enough unless you use very deeply nested macros and/or INCLUDE files with long path names.

You can specify either or both parameters. For example:

-w4093        secondary heap size remains at 1024 bytes
-w,2000       hash table size remains at 2047 entries
-w4093,2000   increases the size of both areas
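
The three forms above can be read as follows. This Python sketch only mirrors the documented defaults; the function and constant names are hypothetical, and no error checking is attempted.

```python
DEFAULT_HASH_SIZE = 2047    # hash table entries
DEFAULT_HEAP2_SIZE = 1024   # secondary heap size in bytes

def parse_w(value):
    """Split the text following '-w' into (hash table size, secondary heap size)."""
    first, _, second = value.partition(",")
    hash_size = int(first) if first else DEFAULT_HASH_SIZE
    heap2_size = int(second) if second else DEFAULT_HEAP2_SIZE
    return hash_size, heap2_size
```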

If you're really tight for memory, and are assembling small modules, you can use this switch to shrink these areas below their default sizes. At the end of an assembly, a message will be displayed giving the sizes actually used, in the form of the '-w' command you would have to enter to allocate that much space. This is primarily useful to see how much secondary heap space was used.

NOTE: All other memory used by A68k (e.g. the actual symbol table) is allocated as required (currently in 8K chunks).

-x

Works the same as '-l' (see above), except that a symbol table dump, including cross-reference information, will be added to the end of the listing file.

-y

Causes hashing statistics to be displayed at the end of the assembly. First the number of symbols in the table is given, followed by a summary of hash chains by length. Chains with length zero denote unused hash table entries. Ideally (i.e. if there were no collisions) there should be as many chains with length 1 as there are symbols, and there should be no chains of length 2 or greater. I added this option to help me tune my hashing algorithm, but you can also use it to see whether you should allocate a larger hash table (using the first parameter of the '-w' switch, see above).
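
To make the summary of hash chains by length concrete, here is a small Python sketch that computes the same kind of statistic. The hash function is a deliberately simple stand-in (the real one used by A68k is not described here), and all names are hypothetical.

```python
from collections import Counter

def toy_hash(name):
    """Stand-in hash: sum of character codes (NOT A68k's real function)."""
    return sum(ord(c) for c in name)

def chain_summary(symbols, table_size):
    """Map chain length -> number of hash table entries with that length."""
    # Count how many symbols land in each hash table entry.
    per_entry = Counter(toy_hash(s) % table_size for s in symbols)
    # Count how many entries have each chain length.
    summary = Counter(per_entry.values())
    summary[0] = table_size - len(per_entry)  # unused entries
    return dict(summary)
```

With a table of 7 entries, the symbols "ab" and "ba" collide under this toy hash, so `chain_summary(["ab", "ba", "c"], 7)` reports one chain of length 2, one of length 1, and five unused entries. A larger table (the first '-w' parameter) would be worth trying when many chains are longer than 1.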

-z

This switch was provided to help debug A68k itself. It causes A68k to list a range of source lines, complete with line number and current location counter value, during both passes. Lines are listed immediately after they have been read from the source file, before any processing occurs. Here are some examples of the '-z' switch:

-z          lists all source lines
-z100,200   lists lines 100 through 200
-z100       lists all lines starting at 100
-z,100      lists the first 100 lines
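
The four forms above follow one pattern: a missing start means line 1, and a missing end means no upper limit. A Python sketch of that reading, with hypothetical function names:

```python
def parse_z(value):
    """Return (start, end) for the text following '-z'; end is None for 'no limit'."""
    first, _, second = value.partition(",")
    start = int(first) if first else 1
    end = int(second) if second else None
    return start, end

def is_listed(line_no, value):
    """True if the given source line falls inside the '-z' range."""
    start, end = parse_z(value)
    return line_no >= start and (end is None or line_no <= end)
```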

Technical Information

The actual symbol table entries (pointed to by the hash table; colliding entries are linked together) are stored in 8K chunks which are allocated as required. The first entry of each chunk is reserved as a link to the next chunk (or NULL in the last chunk) - this makes it easy to find all the chunks to free them when we're finished. All symbol table entries are stored in pass 1. During pass 2, cross-reference table entries are built in the same group of chunks, immediately following the last symbol table entry. Additional chunks will continue to be linked in if necessary.

Symbol names and macro text are stored in another series of linked chunks. These chunks consist of a link pointer followed by strings (terminated by nulls) laid end to end. Symbols are independent entries, linked from the corresponding symbol table entry. Macros are stored as consecutive strings, one per line - the end of the macro is indicated by an ENDM statement. If a macro spans two chunks, the last line in the original chunk is followed by a newline character to indicate that the macro is continued in the next chunk.

Relocation information is built during pass 2 in yet another series of linked chunks. If more than one chunk is needed to hold one section's relocation information, all additional chunks are released at the end of the section.

The secondary heap is built from both ends, and it grows and shrinks according to how many macros and INCLUDE files are currently open. At all times there will be at least one entry on the heap, for the original source code file. The expression parser also uses the secondary heap to store its working stacks - this space is freed as soon as an expression has been evaluated.

The bottom of the heap holds the names of the source code file and any macro or INCLUDE files that are currently open. The full path is given. A null string is stored for user macros. Macro arguments are stored by additional strings, one for each argument in the macro call line. All strings are stored in minimum space, similar to the labels and user macro text on the primary heap. File names are pointed to by the fixed table entries (see below) - macro arguments are accessed by stepping past the macro name to the desired argument, unless NARG would be exceeded.

The fixed portion of the heap is built down from the top. Each entry occupies 16 bytes. Enough information is stored to return to the proper position in the outer file once the current macro or INCLUDE file has been completely processed.

The diagram below illustrates the layout of the secondary heap.
Heap2 + maxheap2 ----------->  ___________________________
                              |                           |
                              |   Input file table        |
struct InFCtl *InF ---------> |___________________________|
                              |                           |
                              |   Parser operator stack   |
struct OpStack *Ops --------> |___________________________|
                              |                           |
                              |   (unused space)          |
struct TermStack *Term -----> |___________________________|
                              |                           |
                              |   Parser term stack       |
char *NextFNS --------------> |___________________________|
                              |                           |
                              |   Input file name stack   |
char *Heap2 ----------------> |___________________________|
The "high-water mark" for NextFNS is stored in char *High2, and the "low-water mark" (to stretch a metaphor) for InF is stored in struct InFCtl *LowInF. These figures are used only to determine the maximum heap usage.
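
The idea of a fixed area filled from both ends, with water marks kept only for reporting, can be sketched as below. This is a simplified Python model, not a transcription of A68k's C code: the parser stacks in the middle of the diagram are left out, the class and method names are hypothetical, and adding the two water marks together is just one plausible way of estimating the maximum usage.

```python
class TwoEndedHeap:
    """Fixed-size area: names grow up from the bottom, 16-byte entries grow down from the top."""

    def __init__(self, size):
        self.size = size
        self.next_name = 0    # next free byte at the bottom (role of NextFNS)
        self.top = size       # lowest byte used by the top table (role of InF)
        self.high = 0         # high-water mark for the bottom (role of High2)
        self.low = size       # low-water mark for the top (role of LowInF)

    def push_name(self, name):
        need = len(name) + 1  # string plus its terminating null
        if self.next_name + need > self.top:
            raise MemoryError("secondary heap overflow")
        self.next_name += need
        self.high = max(self.high, self.next_name)

    def push_entry(self, entry_size=16):
        if self.top - entry_size < self.next_name:
            raise MemoryError("secondary heap overflow")
        self.top -= entry_size
        self.low = min(self.low, self.top)

    def max_usage(self):
        # Bytes ever used at the bottom plus bytes ever used at the top.
        return self.high + (self.size - self.low)
```

Opening a single source file named "prog.asm" in a 1024-byte heap would use 9 bytes at the bottom and 16 at the top in this model, so `max_usage()` returns 25.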

And Finally...

Please send me any bug reports, flames, etc. I can be reached on Mind Link (604/533-2312), at any meeting of the Commodore Computer Club / Panorama (PAcific NORthwest AMiga Association), or via Jeff Lydiatt or Larry Phillips. I don't have the time or money to live on Compuserve or BIX, but my Usenet address is Charlie_Gibbs@mindlink.UUCP (...uunet!van-bc!rsoft!mindlink!a218).

Charlie Gibbs
2121 Rindall Avenue
Port Coquitlam, B.C.
Canada
V3C 1T9
