asm.doc

This document describes the Uxn assembler, including the nonstandard
extensions of the implementation included in this repository.

Specify two or three command-line arguments:

* Input file name (you can specify /dev/stdin to read from stdin).

* Output file name (you can specify /dev/stdout to write to stdout, or you
can specify /dev/null to discard the output).

* Name output file name (you can specify /dev/stdout to write to stdout).
This is optional, and if specified, then it will write a list of labels
(including unused labels too) and their addresses to this file. (This is a
nonstandard feature.)

Switches are also allowed if preceding the other arguments:

 -a will set the starting address to 0x0100.

 -i followed by a token will parse that token before parsing the input
 file. This must be a single token, and not the beginning of a macro or a
 comment. This switch may occur multiple times.


=== Tokens ===

A token consists of a sequence of one or more non-whitespace characters
(you can have whitespace characters to separate them). The maximum length
of each token is sixty-three characters.

All numbers must be lowercase hexadecimal, and normally must be either two
or four hex digits.

The following standard tokens are available:

 ( which begins a block comment; all further tokens up to ) are ignored.
 Comments can be nested.

 [ or ] has no meaning. Anything inside this block is still assembled as
 though the [ and ] commands are not present.

 ~ and a file name will include that file as a source file.

 % and a macro name will define a macro; you will need tokens { and }
 around the definition of the macro.

 The name of a previously defined macro without % in front will call that
 macro, as though its definition is substituted in its place.

 | and a number or label name will set the current address to that number
 (or to the address of that label). Any address is possible, but you cannot
 assemble anything in the zero page nor can anything be assembled out of
 order, although you can skip any space.

 $ and a number or label name will advance the address by that number (or
 by the address of that label). (The padding will  be all zeros if not
 filled in.)

 @ and a name will make a label with that name. If the label name starts
 with a uppercase letter, then warnings about that label being unused are
 suppressed. (Otherwise, a warning is emitted if the label is never used.)

 & and a name will make a label whose name is the same as the most recent
 @ command and then / and then the name specified by the & command.

 # and a number will make a LIT or LIT2 opcode with the specified number
 (the number of hex digits determines which opcode).

 A number without # is similar but the opcode is omitted, so raw data is
 emitted instead.

 . and a name will make a LIT opcode whose value is the low 8-bits of the
 address of the specified name.

 , and a name will make a LIT opcode whose value is a signed 8-bit value
 which is a relative offset of the address of the specified name; this
 should be used immediately preceding a JMP, JSR, or JCN instruction.

 ; and a name will make a LIT2 opcode whose value is the address of the
 specified name.

 : and a name will write the 16-bit address of the specified name directly,
 without any opcode. (It is like ; but excluding the LIT2 instruction.)

 = and a name is the same as : and a name.

 - and a name is like . but without the LIT opcode.

 _ and a name is like , but without the LIT opcode.

 ? and a name will make a JCI instruction jumping to that label.

 ! and a name will make a JMI instruction jumping to that label.

 / and a name will treat it like & but instead of making the label, it will
 be a JSI to that label (since & cannot be used with JSI due to being used
 for a different purpose, / is available for this purpose instead).

 " and any characters (other than whitespaces) will emit that sequence of
 characters without any opcodes or delimiters. (If you need spaces, then
 you must use separate tokens such as 20 for spaces.)

 { and } can be used around a lambda block (they can also be used for
 macros as described above). This is the same as the nonstandard [& ]@
 block (described below), but internally uses a separate stack. You can
 also use other runes at the beginning e.g. ?{ } is like the [? ]@ block.

 An instruction opcode consists of three uppercase letters, optionally
 followed by any combination of instruction flags which can be "2" for
 16-bit operands, "r" to work with the other stack, and/or "k" to keep
 the input operands in the stack. (You cannot code the JCI, JMI, and JSI
 instructions in this way, although BRK can be coded in this way.)

 A name by itself to make a JSI instruction (if it does not match any
 of the other tokens mentioned above).

The following nonstandard tokens are available:

 \\ begins a line comment, which ends at the next line break.

 \~ and a file name will include the contents of the binary file.

 \. or \, is like . or , but excluding the LIT instruction. (These are
 like - and _ but are nonstandard; they were added before the - and _
 signs were made standard.)

 \ and a number pushes the number to the special stack (which is a stack
 which exists only at compile-time).

 \: and a name will push the address of the name to the special stack.
 The label must come before this command; forward references cannot be
 used with this command, and & also cannot be used.

 \@ and a name will pop an address from the special stack and define a
 label whose address is the address which has been popped in this way.

 \_ and further text is a special calculation; see the below section
 about special calculations. These are executed immediately at compile
 time, before the compiled program is written to the output file.

 \` is like \_ but the calculation is deferred until when references
 are being resolved. By this time, any references that come before this
 command will already be resolved, and the memory containing them can
 be read and overwritten by the calculation if desired.

 \- or \= followed by eight more characters and optionally ; will make
 a graphical tile; you should have eight such commands per tile, and
 a semicolon at the end of the last one. Use \- for 1bpp tiles and \=
 for 2bpp tiles. The colour codes are . or 0 for the background colour,
 and numbers 1 2 3 for the other colours.

 [@ makes a anonymous label; the matching ] command will reference the
 same anonymous label as this one. (Nesting is allowed.)

 ]@ will make a anonymous label which the matching [ before it will
 reference it.

 [, [. [; [: [? [! will act like a , . ; : ? ! referencing the anonymous
 label defined by the matching ]@ command.

 ], ]. ]; ]: ]? ]! will act like a , . ; : ? ! referencing the anonymous
 label defined by the matching [@ command.

 [& and ]& are like [! and ]! but the opcode is JSI instead of JMI.

 ]\ acts like \: for the matching [@ command.

 ][ is like [! corresponding to a later ] and then ]@ corresponding to a
 earlier [ command (it can be used like a else block).

For the . , : ; commands, the name can optionally start with & to mean
that it is interpreted the same as for a & command.

Another nonstandard feature is that if you define a macro whose name
starts with a character which has a special meaning, then that special
meaning is ignored in the rest of the file.


=== Special calculations ===

Each instruction is a single character.

Numbers 0 to 9 and a to f push that number to the special stack.

'  ( y z -- yz )  Calculates z+(y<<4).

"  ( w x y z -- wxyz )  Calculates z+(y<<4)+(x<<8)+(w<<12).

_  ( -- )  Set loop point.

!  ( x -- )  Set current address.

@  ( -- x )  Get current address.

#  ( x -- )  Set current length. (This length includes the zero page,
even though the zero page is not written to the output file.)

$  ( -- x )  Get current length.

+  ( x y -- z )  Addition x+y.

-  ( x y -- z )  Subtraction x-y.

*  ( x y -- z )  Multiplication x*y.

/  ( x y -- z )  Division x/y.

%  ( x y -- z )  Modulo x&y.

&  ( x y -- z )  Bitwise AND.

|  ( x y -- z )  Bitwise OR.

^  ( x y -- z )  Bitwise XOR.

<  ( x y -- z )  Left shift x by y bits.

>  ( x y -- z )  Right shift x by y bits.

:  ( x -- x x )  Duplicate.

,  ( x y -- y x )  Swap.

.  ( x -- )  Discard.

E  ( code -- )  Emit an error and stop compiling.

g  ( address -- value )  Read 8-bits from memory.

G  ( address -- value )  Read 16-bits from memory.

k  ( count -- count? )  Similar to o (below) but ends the loop if the new
count is zero instead of the old count.

L  ( condition -- )  Go back to beginning of loop if condition is nonzero.

o  ( count -- count? )  If count is nonzero, decrement and go back to the
beginning; if it is already zero then just discards it and continues the
execution from here instead.

p  ( value address -- )  Write 8-bits to memory. You can write anywhere
including the zero page and can also overwrite memory that has already
been written. Anything written to the zero page can be read back by other
special calculations but will not be written to the output file; anything
written outside of the zero page is written to the output file too.

P  ( value address -- )  Write 16-bits to memory.

w  ( value -- )  Write 8-bits at the current address and advance. You
cannot write in the zero page or overwrite what is already written.

W  ( value -- )  Write 16-bits at the current address and advance. You
cannot write in the zero page or overwrite what is already written.

x  ( length -- )  Add padding into expanded memory.

()  ( condition -- )  Put other commands in the () block; it will be
executed only if the condition is zero, and skipped if nonzero. Conditional
blocks of the same kind cannot be nested.

[]  ( condition -- )  Put other commands in the [] block; it will be
executed only if the condition is nonzero, and skipped if zero. Conditional
blocks of the same kind cannot be nested.


=== Name output format ===

The name output format has each record on a line by itself, and each
record will be like:   |xxxx @yyyy

xxxx is four lowercase hex digits being the address of that label.

yyyy is the name of that label.


=== Expanded memory ===

A nonstandard feature allows writing large files that have preset data in
expanded memory. Any expanded memory data comes after the conventional
memory data. It is possible that some data will fit in conventional memory
in which case such data will be entered into conventional memory, but you
can override this using alignment and padding commands.

The following nonstandard tokens are used for expanded memory:

 \* followed by a name will make a label that refers to the address within
 expanded memory. The plain name refers to the low 16-bits of the address
 and the name followed by ^ refers to the high 16-bits (the page number).
 This also sets the initial scope name of any sub-assemblies that follow.

 \^ followed by a file name is similar to \~ but writes the data into
 expanded memory.

 \B \E \F followed by a digit 0 to 9 or a letter a to u (representing a
 number ten to thirty) will begin a new expanded memory block which is
 aligned to begin or end at an address divisible by two to the power of
 that number, or to fit into an aligned space with that size. For example,
 \Bg will require the data to begin on a page boundary, while \Fg will
 require this block to fit in a single page (which might be the same page
 as the previous block if it fits).

 \A followed by a number (which can be a 32-bit number) will begin a new
 expanded memory block which begins at the specified absolute address. If
 everything before it is already beyond that address, then it is an error.

 \C followed by a number will begin a new expanded memory block which must
 fit into the specified number of pages, otherwise it is an error. If
 necessary, it will align it to a new page, but if the data is small enough
 then this will be unnecessary and it will not do alignment.

 \D followed by a number will begin a new expanded memory block which must
 start on the specified page but not necessarily at the beginning.

 [^ ]^ surround a block which will be entered into expanded memory. This
 can contain assembly code which makes up to sixty-four bytes of data, and
 should not contain any labels nor references (including anonymous labels
 and references due to the [@ ]@ blocks). (However, \: is allowed.)

 ^ followed by a file name makes a sub-assembly. This is similar to ~ but
 the data which is assembled from that file is written to expanded memory
 like \^ instead of like \~ does. This is parsed after the rest of the main
 file is parsed, instead of before like the ~ command does. The initial
 internal address in the sub-assembly, which is also the starting address
 that data will be copied from, is the same as the outer address; you can
 use | to change this if desired. (The intended use of this is for making
 overlays which contain absolute addressing within the overlay. If you
 only use relative addressing then this is not important.) A temporary file
 named ".uxn_subassembly" will be created; it can safely be deleted.

 \+ followed by a name of a label or macro will export that label or macro
 to all sub-assemblies; inside of a sub-assembly it will export a name (you
 cannot export a macro from a sub-assembly) into the main code (which can
 be exported to later sub-assemblies, but not ones that come before). Even
 if it is exported from the main code, a label defined by \* cannot be
 referenced by sub-assemblies in the same or an earlier block than the \*
 is in, but can be referenced by sub-assemblies in any later block.