mirror of
https://sourceware.org/git/binutils-gdb.git
synced 2025-01-18 12:24:38 +08:00
269bbc74e0
Add docs for tc_frag_data.
1534 lines
59 KiB
Plaintext
\input texinfo
|
|
@setfilename internals.info
|
|
@node Top
|
|
@top Assembler Internals
|
|
@raisesections
|
|
@cindex internals
|
|
|
|
This chapter describes the internals of the assembler. It is incomplete, but
|
|
it may help a bit.
|
|
|
|
This chapter was last modified on $Date$. It is not updated regularly, and it
|
|
may be out of date.
|
|
|
|
@menu
|
|
* GAS versions:: GAS versions
|
|
* Data types:: Data types
|
|
* GAS processing:: What GAS does when it runs
|
|
* Porting GAS:: Porting GAS
|
|
* Relaxation:: Relaxation
|
|
* Broken words:: Broken words
|
|
* Internal functions:: Internal functions
|
|
* Test suite:: Test suite
|
|
@end menu
|
|
|
|
@node GAS versions
|
|
@section GAS versions
|
|
|
|
GAS has acquired layers of code over time. The original GAS only supported the
|
|
a.out object file format, with three sections. Support for multiple sections
|
|
has been added in two different ways.
|
|
|
|
The preferred approach is to use the version of GAS created when the symbol
|
|
@code{BFD_ASSEMBLER} is defined. The other versions of GAS are documented for
|
|
historical purposes, and to help anybody who has to debug code written for
|
|
them.
|
|
|
|
The type @code{segT} is used to represent a section in code which must work
|
|
with all versions of GAS.
|
|
|
|
@menu
|
|
* Original GAS:: Original GAS version
|
|
* MANY_SEGMENTS:: MANY_SEGMENTS gas version
|
|
* BFD_ASSEMBLER:: BFD_ASSEMBLER gas version
|
|
@end menu
|
|
|
|
@node Original GAS
|
|
@subsection Original GAS
|
|
|
|
The original GAS only supported the a.out object file format with three
|
|
sections: @samp{.text}, @samp{.data}, and @samp{.bss}. This is the version of
|
|
GAS that is compiled if neither @code{BFD_ASSEMBLER} nor @code{MANY_SEGMENTS}
|
|
is defined. This version of GAS is still used for the m68k-aout target, and
|
|
perhaps others.
|
|
|
|
This version of GAS should not be used for any new development.
|
|
|
|
There is still code that is specific to this version of GAS, notably in
|
|
@file{write.c}. There is no way for this code to loop through all the
|
|
sections; it simply looks at global variables like @code{text_frag_root} and
|
|
@code{data_frag_root}.
|
|
|
|
The type @code{segT} is an enum.
|
|
|
|
@node MANY_SEGMENTS
|
|
@subsection MANY_SEGMENTS gas version
|
|
@cindex MANY_SEGMENTS
|
|
|
|
The @code{MANY_SEGMENTS} version of gas is only used for COFF. It uses the BFD
|
|
library, but it writes out all the data itself using @code{bfd_write}. This
|
|
version of gas supports up to 40 normal sections. The section names are stored
|
|
in the @code{seg_name} array. Other information is stored in the
|
|
@code{segment_info} array.
|
|
|
|
The type @code{segT} is an enum. Code that wants to examine all the sections
|
|
can use a @code{segT} variable as a loop index from @code{SEG_E0} up to but not
|
|
including @code{SEG_UNKNOWN}.
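
The loop described above can be sketched as follows.  This is only an
illustration: the enumerators between @code{SEG_E0} and @code{SEG_UNKNOWN},
the @code{toy_section_size} table, and the function @code{toy_total_size}
are hypothetical stand-ins, not the real definitions from the GAS headers.

@verbatim
```c
/* Hypothetical stand-in for the MANY_SEGMENTS segT enum; the real
   enumerators and their number are defined by GAS, not here.  */
typedef enum { SEG_E0, SEG_E1, SEG_E2, SEG_E3, SEG_UNKNOWN } segT;

/* Hypothetical per-section data, standing in for segment_info.  */
static const unsigned long toy_section_size[SEG_UNKNOWN] = { 16, 8, 0, 4 };

/* Visit every normal section: SEG_E0 up to, but not including,
   SEG_UNKNOWN, using a segT variable as the loop index.  */
unsigned long
toy_total_size (void)
{
  unsigned long total = 0;
  segT seg;

  for (seg = SEG_E0; seg < SEG_UNKNOWN; seg++)
    total += toy_section_size[seg];
  return total;
}
```
@end verbatim

Incrementing an enum variable like this is valid C, which is what makes an
enum @code{segT} usable as a loop index.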
|
|
|
|
Most of the code specific to this version of GAS is in the file
|
|
@file{config/obj-coff.c}, in the portion of that file that is compiled when
|
|
@code{BFD_ASSEMBLER} is not defined.
|
|
|
|
This version of GAS is still used for several COFF targets.
|
|
|
|
@node BFD_ASSEMBLER
|
|
@subsection BFD_ASSEMBLER gas version
|
|
@cindex BFD_ASSEMBLER
|
|
|
|
The preferred version of GAS is the @code{BFD_ASSEMBLER} version. In this
|
|
version of GAS, the output file is a normal BFD, and the BFD routines are used
|
|
to generate the output.
|
|
|
|
@code{BFD_ASSEMBLER} will automatically be used for certain targets, including
|
|
those that use the ELF, ECOFF, and SOM object file formats, and also all Alpha,
|
|
MIPS, PowerPC, and SPARC targets. You can force the use of
|
|
@code{BFD_ASSEMBLER} for other targets with the configure option
|
|
@samp{--enable-bfd-assembler}; however, it has not been tested for many
|
|
targets, and can not be assumed to work.
|
|
|
|
@node Data types
|
|
@section Data types
|
|
@cindex internals, data types
|
|
|
|
This section describes some fundamental GAS data types.
|
|
|
|
@menu
|
|
* Symbols:: The symbolS structure
|
|
* Expressions:: The expressionS structure
|
|
* Fixups:: The fixS structure
|
|
* Frags:: The fragS structure
|
|
@end menu
|
|
|
|
@node Symbols
|
|
@subsection Symbols
|
|
@cindex internals, symbols
|
|
@cindex symbols, internal
|
|
@cindex symbolS structure
|
|
|
|
The definition for @code{struct symbol}, also known as @code{symbolS}, is
|
|
located in @file{struc-symbol.h}. Symbol structures contain the following
|
|
fields:
|
|
|
|
@table @code
|
|
@item sy_value
|
|
This is an @code{expressionS} that describes the value of the symbol. It might
|
|
refer to one or more other symbols; if so, its true value may not be known
|
|
until @code{resolve_symbol_value} is called in @code{write_object_file}.
|
|
|
|
The expression is often simply a constant. Before @code{resolve_symbol_value}
|
|
is called, the value is the offset from the frag (@pxref{Frags}). Afterward,
|
|
the frag address has been added in.
|
|
|
|
@item sy_resolved
|
|
This field is non-zero if the symbol's value has been completely resolved. It
|
|
is used during the final pass over the symbol table.
|
|
|
|
@item sy_resolving
|
|
This field is used to detect loops while resolving the symbol's value.
|
|
|
|
@item sy_used_in_reloc
|
|
This field is non-zero if the symbol is used by a relocation entry. If a local
|
|
symbol is used in a relocation entry, it must be possible to redirect those
|
|
relocations to other symbols, or this symbol cannot be removed from the final
|
|
symbol list.
|
|
|
|
@item sy_next
|
|
@itemx sy_previous
|
|
These pointers to other @code{symbolS} structures describe a singly or doubly
|
|
linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the
|
|
@code{sy_previous} field will be omitted; @code{SYMBOLS_NEED_BACKPOINTERS} is
|
|
always defined if @code{BFD_ASSEMBLER} is defined.) These fields should be accessed with
|
|
the @code{symbol_next} and @code{symbol_previous} macros.
|
|
|
|
@item sy_frag
|
|
This points to the frag (@pxref{Frags}) that this symbol is attached to.
|
|
|
|
@item sy_used
|
|
Whether the symbol is used as an operand or in an expression. Note: Not all of
|
|
the backends keep this information accurate; backends which use this bit are
|
|
responsible for setting it when a symbol is used in backend routines.
|
|
|
|
@item sy_mri_common
|
|
Whether the symbol is an MRI common symbol created by the @code{COMMON}
|
|
pseudo-op when assembling in MRI mode.
|
|
|
|
@item bsym
|
|
If @code{BFD_ASSEMBLER} is defined, this points to the BFD @code{asymbol} that
|
|
will be used in writing the object file.
|
|
|
|
@item sy_name_offset
|
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of
|
|
the symbol's name in the string table of the object file. On some formats,
|
|
this will start at position 4, with position 0 reserved for unnamed symbols.
|
|
This field is not used until @code{write_object_file} is called.
|
|
|
|
@item sy_symbol
|
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the
|
|
format-specific symbol structure, as it would be written into the object file.
|
|
|
|
@item sy_number
|
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol
|
|
number, for use in constructing relocation table entries.
|
|
|
|
@item sy_obj
|
|
This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by
|
|
that name is defined in @file{obj-format.h}, this field is not defined.
|
|
|
|
@item sy_tc
|
|
This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro
|
|
by that name is defined in @file{targ-cpu.h}, this field is not defined.
|
|
|
|
@item TARGET_SYMBOL_FIELDS
|
|
If this macro is defined, it defines additional fields in the symbol structure.
|
|
This macro is obsolete, and should be replaced when possible by uses of
|
|
@code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
|
|
@end table
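
As a rough sketch of how @code{sy_value}, @code{sy_frag},
@code{sy_resolved}, and @code{sy_resolving} might cooperate, the toy
resolver below adds the frag address to a frag-relative value and uses a
``resolving'' flag to detect definition loops.  The types
@code{toy_symbol} and @code{toy_frag} and the function @code{toy_resolve}
are assumptions made for illustration; the real @code{resolve_symbol_value}
handles full @code{expressionS} values and is considerably more involved.

@verbatim
```c
#include <stddef.h>

/* Toy stand-ins; the real structures live in struc-symbol.h and as.h.  */
struct toy_frag { long fr_address; };

struct toy_symbol
{
  long value;                 /* Frag offset, or constant added to ADD.  */
  struct toy_symbol *add;     /* Symbol this one is defined in terms of.  */
  struct toy_frag *sy_frag;
  unsigned sy_resolved : 1;
  unsigned sy_resolving : 1;
};

/* Return 0 on success, -1 if a definition loop is detected.  */
int
toy_resolve (struct toy_symbol *sym)
{
  if (sym->sy_resolved)
    return 0;
  if (sym->sy_resolving)
    return -1;                /* Already being resolved: a loop.  */

  sym->sy_resolving = 1;
  if (sym->add != NULL)
    {
      /* Defined as "other symbol + constant".  */
      if (toy_resolve (sym->add) != 0)
        {
          sym->sy_resolving = 0;
          return -1;
        }
      sym->value += sym->add->value;
    }
  else
    /* Frag-relative: add the frag address in.  */
    sym->value += sym->sy_frag->fr_address;

  sym->sy_resolving = 0;
  sym->sy_resolved = 1;
  return 0;
}
```
@end verbatim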
|
|
|
|
There are a number of access routines used to extract the fields of a
|
|
@code{symbolS} structure. When possible, these routines should be used rather
|
|
than referring to the fields directly. These routines will work for any GAS
|
|
version.
|
|
|
|
@table @code
|
|
@item S_SET_VALUE
|
|
@cindex S_SET_VALUE
|
|
Set the symbol's value.
|
|
|
|
@item S_GET_VALUE
|
|
@cindex S_GET_VALUE
|
|
Get the symbol's value. This will cause @code{resolve_symbol_value} to be
|
|
called if necessary, so @code{S_GET_VALUE} should only be called when it is
|
|
safe to resolve symbols (i.e., after the entire input file has been read and
|
|
all symbols have been defined).
|
|
|
|
@item S_SET_SEGMENT
|
|
@cindex S_SET_SEGMENT
|
|
Set the section of the symbol.
|
|
|
|
@item S_GET_SEGMENT
|
|
@cindex S_GET_SEGMENT
|
|
Get the symbol's section.
|
|
|
|
@item S_GET_NAME
|
|
@cindex S_GET_NAME
|
|
Get the name of the symbol.
|
|
|
|
@item S_SET_NAME
|
|
@cindex S_SET_NAME
|
|
Set the name of the symbol.
|
|
|
|
@item S_IS_EXTERNAL
|
|
@cindex S_IS_EXTERNAL
|
|
Return non-zero if the symbol is externally visible.
|
|
|
|
@item S_IS_EXTERN
|
|
@cindex S_IS_EXTERN
|
|
A synonym for @code{S_IS_EXTERNAL}. Don't use it.
|
|
|
|
@item S_IS_WEAK
|
|
@cindex S_IS_WEAK
|
|
Return non-zero if the symbol is weak.
|
|
|
|
@item S_IS_COMMON
|
|
@cindex S_IS_COMMON
|
|
Return non-zero if this is a common symbol. Common symbols are sometimes
|
|
represented as undefined symbols with a value, in which case this function will
|
|
not be reliable.
|
|
|
|
@item S_IS_DEFINED
|
|
@cindex S_IS_DEFINED
|
|
Return non-zero if this symbol is defined. This function is not reliable when
|
|
called on a common symbol.
|
|
|
|
@item S_IS_DEBUG
|
|
@cindex S_IS_DEBUG
|
|
Return non-zero if this is a debugging symbol.
|
|
|
|
@item S_IS_LOCAL
|
|
@cindex S_IS_LOCAL
|
|
Return non-zero if this is a local assembler symbol which should not be
|
|
included in the final symbol table. Note that this is not the opposite of
|
|
@code{S_IS_EXTERNAL}. The @samp{-L} assembler option affects the return value
|
|
of this function.
|
|
|
|
@item S_SET_EXTERNAL
|
|
@cindex S_SET_EXTERNAL
|
|
Mark the symbol as externally visible.
|
|
|
|
@item S_CLEAR_EXTERNAL
|
|
@cindex S_CLEAR_EXTERNAL
|
|
Mark the symbol as not externally visible.
|
|
|
|
@item S_SET_WEAK
|
|
@cindex S_SET_WEAK
|
|
Mark the symbol as weak.
|
|
|
|
@item S_GET_TYPE
|
|
@itemx S_GET_DESC
|
|
@itemx S_GET_OTHER
|
|
@cindex S_GET_TYPE
|
|
@cindex S_GET_DESC
|
|
@cindex S_GET_OTHER
|
|
Get the @code{type}, @code{desc}, and @code{other} fields of the symbol. These
|
|
are only defined for object file formats for which they make sense (primarily
|
|
a.out).
|
|
|
|
@item S_SET_TYPE
|
|
@itemx S_SET_DESC
|
|
@itemx S_SET_OTHER
|
|
@cindex S_SET_TYPE
|
|
@cindex S_SET_DESC
|
|
@cindex S_SET_OTHER
|
|
Set the @code{type}, @code{desc}, and @code{other} fields of the symbol. These
|
|
are only defined for object file formats for which they make sense (primarily
|
|
a.out).
|
|
|
|
@item S_GET_SIZE
|
|
@cindex S_GET_SIZE
|
|
Get the size of a symbol. This is only defined for object file formats for
|
|
which it makes sense (primarily ELF).
|
|
|
|
@item S_SET_SIZE
|
|
@cindex S_SET_SIZE
|
|
Set the size of a symbol. This is only defined for object file formats for
|
|
which it makes sense (primarily ELF).
|
|
@end table
|
|
|
|
@node Expressions
|
|
@subsection Expressions
|
|
@cindex internals, expressions
|
|
@cindex expressions, internal
|
|
@cindex expressionS structure
|
|
|
|
Expressions are stored in an @code{expressionS} structure. The structure is
|
|
defined in @file{expr.h}.
|
|
|
|
@cindex expression
|
|
The macro @code{expression} will create an @code{expressionS} structure based
|
|
on the text found at the global variable @code{input_line_pointer}.
|
|
|
|
@cindex make_expr_symbol
|
|
@cindex expr_symbol_where
|
|
A single @code{expressionS} structure can represent a single operation.
|
|
Complex expressions are formed by creating @dfn{expression symbols} and
|
|
combining them in @code{expressionS} structures. An expression symbol is
|
|
created by calling @code{make_expr_symbol}. An expression symbol should
|
|
naturally never appear in a symbol table, and the implementation of
|
|
@code{S_IS_LOCAL} (@pxref{Symbols}) reflects that. The function
|
|
@code{expr_symbol_where} returns non-zero if a symbol is an expression symbol,
|
|
and also returns the file and line for the expression which caused it to be
|
|
created.
|
|
|
|
The @code{expressionS} structure has two symbol fields, a number field, an
|
|
operator field, and a field indicating whether the number is unsigned.
|
|
|
|
The operator field is of type @code{operatorT}, and describes how to interpret
|
|
the other fields; see the definition in @file{expr.h} for the possibilities.
|
|
|
|
An @code{operatorT} value of @code{O_big} indicates either a floating point
|
|
number, stored in the global variable @code{generic_floating_point_number}, or
|
|
an integer too large to store in an @code{offsetT} type, stored in the global
|
|
array @code{generic_bignum}. This rather inflexible approach makes it
|
|
impossible to use floating point numbers or large integers in complex
|
|
expressions.
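
The way a complex expression is built out of single-operation structures
can be sketched with a simplified model.  Everything prefixed with
@code{toy_} or @code{TOY_} below is hypothetical; the field names are
modelled on @file{expr.h}, but the rule that a binary operation evaluates
to the combination of its two symbols plus the number field is an
assumption of this sketch, not a statement about the real evaluator.

@verbatim
```c
#include <stddef.h>

typedef enum
{
  TOY_O_constant, TOY_O_symbol, TOY_O_add, TOY_O_multiply
} toy_operatorT;

struct toy_symbol;

/* One structure describes at most one operation.  */
typedef struct
{
  struct toy_symbol *X_add_symbol;   /* First operand, if any.  */
  struct toy_symbol *X_op_symbol;    /* Second operand, if any.  */
  long X_add_number;                 /* Constant part.  */
  toy_operatorT X_op;
} toy_expressionS;

/* A symbol whose value is itself an expression.  */
struct toy_symbol { toy_expressionS value; };

long toy_eval (const toy_expressionS *e);

static long
toy_symbol_value (const struct toy_symbol *s)
{
  return toy_eval (&s->value);
}

long
toy_eval (const toy_expressionS *e)
{
  switch (e->X_op)
    {
    case TOY_O_constant:
      return e->X_add_number;
    case TOY_O_symbol:
      return toy_symbol_value (e->X_add_symbol) + e->X_add_number;
    case TOY_O_add:
      return toy_symbol_value (e->X_add_symbol)
             + toy_symbol_value (e->X_op_symbol) + e->X_add_number;
    case TOY_O_multiply:
      return toy_symbol_value (e->X_add_symbol)
             * toy_symbol_value (e->X_op_symbol) + e->X_add_number;
    }
  return 0;
}
```
@end verbatim

In this model, @code{(2 + 3) * 4} needs an intermediate symbol holding
@code{2 + 3}, playing the role of an expression symbol, which the outer
multiplication then refers to through @code{X_add_symbol}.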
|
|
|
|
@node Fixups
|
|
@subsection Fixups
|
|
@cindex internals, fixups
|
|
@cindex fixups
|
|
@cindex fixS structure
|
|
|
|
A @dfn{fixup} is basically anything which can not be resolved in the first
|
|
pass. Sometimes a fixup can be resolved by the end of the assembly; if not,
|
|
the fixup becomes a relocation entry in the object file.
|
|
|
|
@cindex fix_new
|
|
@cindex fix_new_exp
|
|
A fixup is created by a call to @code{fix_new} or @code{fix_new_exp}. Both
|
|
take a frag (@pxref{Frags}), a position within the frag, a size, an indication
|
|
of whether the fixup is PC relative, and a type. In a @code{BFD_ASSEMBLER}
|
|
GAS, the type is nominally a @code{bfd_reloc_code_real_type}, but several
|
|
targets use other type codes to represent fixups that can not be described as
|
|
relocations.
|
|
|
|
The @code{fixS} structure has a number of fields, several of which are obsolete
|
|
or are only used by a particular target. The important fields are:
|
|
|
|
@table @code
|
|
@item fx_frag
|
|
The frag (@pxref{Frags}) this fixup is in.
|
|
|
|
@item fx_where
|
|
The location within the frag where the fixup occurs.
|
|
|
|
@item fx_addsy
|
|
The symbol this fixup is against. Typically, the value of this symbol is added
|
|
into the object contents. This may be NULL.
|
|
|
|
@item fx_subsy
|
|
The value of this symbol is subtracted from the object contents. This is
|
|
normally NULL.
|
|
|
|
@item fx_offset
|
|
A number which is added into the fixup.
|
|
|
|
@item fx_addnumber
|
|
Some CPU backends use this field to convey information between
|
|
@code{md_apply_fix} and @code{tc_gen_reloc}. The machine independent code does
|
|
not use it.
|
|
|
|
@item fx_next
|
|
The next fixup in the section.
|
|
|
|
@item fx_r_type
|
|
The type of the fixup. This field is only defined if @code{BFD_ASSEMBLER} is defined, or
|
|
if the target defines @code{NEED_FX_R_TYPE}.
|
|
|
|
@item fx_size
|
|
The size of the fixup. This is mostly used for error checking.
|
|
|
|
@item fx_pcrel
|
|
Whether the fixup is PC relative.
|
|
|
|
@item fx_done
|
|
Non-zero if the fixup has been applied, and no relocation entry needs to be
|
|
generated.
|
|
|
|
@item fx_file
|
|
@itemx fx_line
|
|
The file and line where the fixup was created.
|
|
|
|
@item tc_fix_data
|
|
This has the type @code{TC_FIX_TYPE}, and is only defined if the target defines
|
|
that macro.
|
|
@end table
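
To make the roles of @code{fx_addsy}, @code{fx_subsy}, @code{fx_offset},
and @code{fx_pcrel} concrete, here is a minimal sketch of how a value
might be computed for a fixup whose symbols are already known.  The
structures and @code{toy_fix_value} are hypothetical, and the PC-relative
convention used (subtracting the address of the fixup itself) is an
assumption; real targets differ, and the real work is done in
@code{fixup_segment} and @code{md_apply_fix}.

@verbatim
```c
#include <stddef.h>

struct toy_symbol { long value; };

struct toy_fix
{
  long frag_address;                  /* Address of the frag in fx_frag.  */
  long fx_where;                      /* Offset of the fixup in the frag.  */
  const struct toy_symbol *fx_addsy;  /* May be NULL.  */
  const struct toy_symbol *fx_subsy;  /* Normally NULL.  */
  long fx_offset;
  int fx_pcrel;
};

/* Combine the pieces of a fixup into the value to be stored.  */
long
toy_fix_value (const struct toy_fix *fix)
{
  long value = fix->fx_offset;

  if (fix->fx_addsy != NULL)
    value += fix->fx_addsy->value;
  if (fix->fx_subsy != NULL)
    value -= fix->fx_subsy->value;
  if (fix->fx_pcrel)
    /* Assumed convention: relative to the fixup's own address.  */
    value -= fix->frag_address + fix->fx_where;
  return value;
}
```
@end verbatim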
|
|
|
|
@node Frags
|
|
@subsection Frags
|
|
@cindex internals, frags
|
|
@cindex frags
|
|
@cindex fragS structure
|
|
|
|
The @code{fragS} structure is defined in @file{as.h}. Each frag represents a
|
|
portion of the final object file. As GAS reads the source file, it creates
|
|
frags to hold the data that it reads. At the end of the assembly the frags and
|
|
fixups are processed to produce the final contents.
|
|
|
|
@table @code
|
|
@item fr_address
|
|
The address of the frag. This is not set until the assembler rescans the list
|
|
of all frags after the entire input file is parsed. The function
|
|
@code{relax_segment} fills in this field.
|
|
|
|
@item fr_next
|
|
Pointer to the next frag in this (sub)section.
|
|
|
|
@item fr_fix
|
|
Fixed number of characters we know we're going to emit to the output file. May
|
|
be zero.
|
|
|
|
@item fr_var
|
|
Variable number of characters we may output, after the initial @code{fr_fix}
|
|
characters. May be zero.
|
|
|
|
@item fr_offset
|
|
The interpretation of this field is controlled by @code{fr_type}. Generally,
|
|
if @code{fr_var} is non-zero, this is a repeat count: the @code{fr_var}
|
|
characters are output @code{fr_offset} times.
|
|
|
|
@item line
|
|
Holds line number info when an assembler listing was requested.
|
|
|
|
@item fr_type
|
|
Relaxation state. This field indicates the interpretation of @code{fr_offset},
|
|
@code{fr_symbol} and the variable-length tail of the frag, as well as the
|
|
treatment it gets in various phases of processing. It does not affect the
|
|
initial @code{fr_fix} characters; they are always supposed to be output
|
|
verbatim (fixups aside). See below for specific values this field can have.
|
|
|
|
@item fr_subtype
|
|
Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is
|
|
assumed to be an index into @code{TC_GENERIC_RELAX_TABLE} for the generic
|
|
relaxation code to process (@pxref{Relaxation}). If @code{md_relax_frag} is
|
|
defined, this field is available for any use by the CPU-specific code.
|
|
|
|
@item fr_symbol
|
|
This normally indicates the symbol to use when relaxing the frag according to
|
|
@code{fr_type}.
|
|
|
|
@item fr_opcode
|
|
Points to the lowest-addressed byte of the opcode, for use in relaxation.
|
|
|
|
@item tc_frag_data
|
|
Target specific fragment data of type @code{TC_FRAG_TYPE}.
|
|
Only present if @code{TC_FRAG_TYPE} is defined.
|
|
|
|
@item fr_file
|
|
@itemx fr_line
|
|
The file and line where this frag was last modified.
|
|
|
|
@item fr_literal
|
|
Declared as a one-character array, this last field grows arbitrarily large to
|
|
hold the actual contents of the frag.
|
|
@end table
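
The trailing @code{fr_literal} field relies on a common C idiom: the
structure is allocated with extra room after it, and the one-character
array is treated as the start of that room.  The sketch below shows the
idiom with a hypothetical @code{toy_frag} and @code{toy_frag_alloc}; it
uses @code{malloc} for simplicity and does not model how GAS actually
allocates frags.

@verbatim
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_frag
{
  long fr_fix;                /* Number of bytes stored in fr_literal.  */
  struct toy_frag *fr_next;
  char fr_literal[1];         /* Really as long as the contents.  */
};

/* Allocate a frag with room for LEN bytes of contents and copy DATA
   into it.  Returns NULL if allocation fails.  */
struct toy_frag *
toy_frag_alloc (const char *data, size_t len)
{
  size_t header = offsetof (struct toy_frag, fr_literal);
  struct toy_frag *frag = malloc (header + (len != 0 ? len : 1));

  if (frag == NULL)
    return NULL;
  frag->fr_fix = (long) len;
  frag->fr_next = NULL;
  if (len != 0)
    memcpy ((char *) frag + header, data, len);
  return frag;
}
```
@end verbatim

Code written today would more likely declare a C99 flexible array member
instead of a one-element array, but the layout idea is the same.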
|
|
|
|
These are the possible relaxation states, provided in the enumeration type
|
|
@code{relax_stateT}, and the interpretations they represent for the other
|
|
fields:
|
|
|
|
@table @code
|
|
@item rs_align
|
|
@itemx rs_align_code
|
|
The start of the following frag should be aligned on some boundary. In this
|
|
frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes.
|
|
(For example, if alignment on an 8-byte boundary were desired, @code{fr_offset}
|
|
would have a value of 3.) The variable characters indicate the fill pattern to
|
|
be used. The @code{fr_subtype} field holds the maximum number of bytes to skip
|
|
when doing this alignment. If more bytes are needed, the alignment is not
|
|
done. An @code{fr_subtype} value of 0 means no maximum, which is the normal
|
|
case. Target backends can use @code{rs_align_code} to handle certain types of
|
|
alignment differently.
|
|
|
|
@item rs_broken_word
|
|
This indicates that ``broken word'' processing should be done (@pxref{Broken
|
|
words}). If broken word processing is not necessary on the target machine,
|
|
this enumerator value will not be defined.
|
|
|
|
@item rs_fill
|
|
The variable characters are to be repeated @code{fr_offset} times. If
|
|
@code{fr_offset} is 0, this frag has a length of @code{fr_fix}. Most frags
|
|
have this type.
|
|
|
|
@item rs_leb128
|
|
This state is used to implement the DWARF ``little endian base 128''
|
|
variable length number format. The @code{fr_symbol} is always an expression
|
|
symbol, as constant expressions are emitted directly. The @code{fr_offset}
|
|
field is used during relaxation to hold the previous size of the number so
|
|
that we can determine if the fragment changed size.
|
|
|
|
@item rs_machine_dependent
|
|
Displacement relaxation is to be done on this frag. The target is indicated by
|
|
@code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the
|
|
particular machine-specific addressing mode desired. @xref{Relaxation}.
|
|
|
|
@item rs_org
|
|
The start of the following frag should be pushed back to some specific offset
|
|
within the section. (Some assemblers use the value as an absolute address; GAS
|
|
does not handle final absolute addresses, but rather requires that the linker
|
|
set them.) The offset is given by @code{fr_symbol} and @code{fr_offset}; one
|
|
character from the variable-length tail is used as the fill character.
|
|
@end table
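
The size arithmetic implied by @code{rs_fill} and @code{rs_align} can be
written out as a small sketch.  The function names are hypothetical, and
the code only restates the rules given in the table above (repeat count
for fills; log2 alignment and a maximum skip for alignment); it is not
taken from @code{relax_segment}.

@verbatim
```c
/* rs_fill: fr_fix fixed characters, followed by the fr_var variable
   characters repeated fr_offset times.  */
unsigned long
toy_fill_size (unsigned long fr_fix, unsigned long fr_var,
               unsigned long fr_offset)
{
  return fr_fix + fr_var * fr_offset;
}

/* rs_align: number of padding bytes needed so that ADDRESS becomes a
   multiple of 1 << FR_OFFSET.  MAX_SKIP plays the role of fr_subtype:
   0 means no maximum; otherwise, if more than MAX_SKIP bytes would be
   needed, the alignment is not done and 0 is returned.  */
unsigned long
toy_align_padding (unsigned long address, int fr_offset,
                   unsigned long max_skip)
{
  unsigned long boundary = 1UL << fr_offset;
  unsigned long pad = (boundary - (address & (boundary - 1)))
                      & (boundary - 1);

  if (max_skip != 0 && pad > max_skip)
    return 0;
  return pad;
}
```
@end verbatim

For instance, with @code{fr_offset} equal to 3 (an 8-byte boundary), an
address of 13 needs 3 bytes of padding, while an address of 9 needs 7
bytes and is left unaligned if the maximum skip is 4.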
|
|
|
|
@cindex frchainS structure
|
|
A chain of frags is built up for each subsection. The data structure
|
|
describing a chain is called a @code{frchainS}, and contains the following
|
|
fields:
|
|
|
|
@table @code
|
|
@item frch_root
|
|
Points to the first frag in the chain. May be NULL if there are no frags in
|
|
this chain.
|
|
@item frch_last
|
|
Points to the last frag in the chain, or NULL if there are none.
|
|
@item frch_next
|
|
Next in the list of @code{frchainS} structures.
|
|
@item frch_seg
|
|
Indicates the section this frag chain belongs to.
|
|
@item frch_subseg
|
|
Subsection (subsegment) number of this frag chain.
|
|
@item fix_root
@itemx fix_tail
|
|
(Defined only if @code{BFD_ASSEMBLER} is defined.) These point to the first and last
|
|
@code{fixS} structures associated with this subsection.
|
|
@item frch_obstack
|
|
Not currently used. Intended to be used for frag allocation for this
|
|
subsection. This should reduce frag generation caused by switching sections.
|
|
@item frch_frag_now
|
|
The current frag for this subsegment.
|
|
@end table
|
|
|
|
A @code{frchainS} corresponds to a subsection; each section has a list of
|
|
@code{frchainS} records associated with it. In most cases, only one subsection
|
|
of each section is used, so the list will only be one element long, but any
|
|
processing of frag chains should be prepared to deal with multiple chains per
|
|
section.
|
|
|
|
After the input files have been completely processed, and no more frags are to
|
|
be generated, the frag chains are joined into one per section for further
|
|
processing. After this point, it is safe to operate on one chain per section.
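
Joining the per-subsection chains of one section might look roughly like
the sketch below.  The @code{toy_} structures carry only the fields
needed here, section identifiers are plain integers, and the list of
chains is assumed to be already ordered by subsection number; none of
this is the actual code from @file{write.c}.

@verbatim
```c
#include <stddef.h>

struct toy_frag { struct toy_frag *fr_next; };

struct toy_frchain
{
  struct toy_frag *frch_root;      /* First frag, or NULL.  */
  struct toy_frag *frch_last;      /* Last frag, or NULL.  */
  struct toy_frchain *frch_next;   /* Next chain in the global list.  */
  int frch_seg;                    /* Section (an int in this sketch).  */
  int frch_subseg;
};

/* Splice together, in list order, the frags of every chain that
   belongs to section SEG.  Returns the first frag of the joined list,
   or NULL if the section has no frags.  */
struct toy_frag *
toy_join_chains (struct toy_frchain *chains, int seg)
{
  struct toy_frag *head = NULL;
  struct toy_frag *tail = NULL;
  struct toy_frchain *c;

  for (c = chains; c != NULL; c = c->frch_next)
    {
      if (c->frch_seg != seg || c->frch_root == NULL)
        continue;
      if (head == NULL)
        head = c->frch_root;
      else
        tail->fr_next = c->frch_root;
      tail = c->frch_last;
    }
  return head;
}
```
@end verbatim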
|
|
|
|
The assembler always has a current frag, named @code{frag_now}. More space is
|
|
allocated for the current frag using the @code{frag_more} function; this
|
|
returns a pointer to the requested amount of space. Relaxing is done using
|
|
variant frags allocated by @code{frag_var} or @code{frag_variant}
|
|
(@pxref{Relaxation}).
|
|
|
|
@node GAS processing
|
|
@section What GAS does when it runs
|
|
@cindex internals, overview
|
|
|
|
This is a quick look at what an assembler run looks like.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The assembler initializes itself by calling various init routines.
|
|
|
|
@item
|
|
For each source file, the @code{read_a_source_file} function reads in the file
|
|
and parses it. The global variable @code{input_line_pointer} points to the
|
|
current text; it is guaranteed to be correct up to the end of the line, but not
|
|
farther.
|
|
|
|
@item
|
|
For each line, the assembler passes labels to the @code{colon} function, and
|
|
isolates the first word. If it looks like a pseudo-op, the word is looked up
|
|
in the pseudo-op hash table @code{po_hash} and dispatched to a pseudo-op
|
|
routine. Otherwise, the target dependent @code{md_assemble} routine is called
|
|
to parse the instruction.
|
|
|
|
@item
|
|
When pseudo-ops or instructions output data, they add it to a frag, calling
|
|
@code{frag_more} to get space to store it in.
|
|
|
|
@item
|
|
Pseudo-ops and instructions can also output fixups created by @code{fix_new} or
|
|
@code{fix_new_exp}.
|
|
|
|
@item
|
|
For certain targets, instructions can create variant frags which are used to
|
|
store relaxation information (@pxref{Relaxation}).
|
|
|
|
@item
|
|
When the input file is finished, the @code{write_object_file} routine is
|
|
called. It assigns addresses to all the frags (@code{relax_segment}), resolves
|
|
all the fixups (@code{fixup_segment}), resolves all the symbol values (using
|
|
@code{resolve_symbol_value}), and finally writes out the file (in the
|
|
@code{BFD_ASSEMBLER} case, this is done by simply calling @code{bfd_close}).
|
|
@end itemize
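
The per-line dispatch step in the list above can be illustrated with a
deliberately small model.  The table layout mirrors the
@code{poc_name}, @code{poc_handler}, and @code{poc_val} fields of
@code{pseudo_typeS}, but everything prefixed with @code{toy_} is
hypothetical, a leading @samp{.} is assumed to be what makes a word look
like a pseudo-op, and a linear search replaces the @code{po_hash} lookup.

@verbatim
```c
#include <stddef.h>
#include <string.h>

typedef struct
{
  const char *poc_name;            /* Name without the leading dot.  */
  void (*poc_handler) (int);
  int poc_val;                     /* Argument passed to the handler.  */
} toy_pseudo_typeS;

int toy_pseudo_calls;              /* Counters, for demonstration only.  */
int toy_insn_calls;

static void
toy_s_section (int arg)
{
  (void) arg;
  toy_pseudo_calls++;
}

/* Stand-in for the target's md_assemble.  */
static void
toy_md_assemble (const char *line)
{
  (void) line;
  toy_insn_calls++;
}

static const toy_pseudo_typeS toy_po_table[] = {
  { "text", toy_s_section, 0 },
  { "data", toy_s_section, 1 },
  { NULL, NULL, 0 }
};

/* Returns 1 if LINE was handled as a pseudo-op, 0 if it was passed to
   the instruction assembler, -1 for an unknown pseudo-op.  */
int
toy_dispatch (const char *line)
{
  const toy_pseudo_typeS *p;
  size_t len;

  if (line[0] != '.')
    {
      toy_md_assemble (line);
      return 0;
    }
  len = strcspn (line + 1, " \t");   /* Isolate the first word.  */
  for (p = toy_po_table; p->poc_name != NULL; p++)
    if (strlen (p->poc_name) == len
        && strncmp (line + 1, p->poc_name, len) == 0)
      {
        p->poc_handler (p->poc_val);
        return 1;
      }
  return -1;
}
```
@end verbatim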
|
|
|
|
@node Porting GAS
|
|
@section Porting GAS
|
|
@cindex porting
|
|
|
|
Each GAS target specifies two main things: the CPU file and the object format
|
|
file. Two main switches in the @file{configure.in} file handle this. The
|
|
first switches on CPU type to set the shell variable @code{cpu_type}. The
|
|
second switches on the entire target to set the shell variable @code{fmt}.
|
|
|
|
The configure script uses the value of @code{cpu_type} to select two files in
|
|
the @file{config} directory: @file{tc-@var{CPU}.c} and @file{tc-@var{CPU}.h}.
|
|
The configuration process will create a file named @file{targ-cpu.h} in the
|
|
build directory which includes @file{tc-@var{CPU}.h}.
|
|
|
|
The configure script also uses the value of @code{fmt} to select two files:
|
|
@file{obj-@var{fmt}.c} and @file{obj-@var{fmt}.h}. The configuration process
|
|
will create a file named @file{obj-format.h} in the build directory which
|
|
includes @file{obj-@var{fmt}.h}.
|
|
|
|
You can also set the emulation in the configure script by setting the @code{em}
|
|
variable. Normally the default value of @samp{generic} is fine. The
|
|
configuration process will create a file named @file{targ-env.h} in the build
|
|
directory which includes @file{te-@var{em}.h}.
|
|
|
|
Porting GAS to a new CPU requires writing the @file{tc-@var{CPU}} files.
|
|
Porting GAS to a new object file format requires writing the
|
|
@file{obj-@var{fmt}} files. There is sometimes some interaction between these
|
|
two files, but it is normally minimal.
|
|
|
|
The best approach is, of course, to copy existing files. The documentation
|
|
below assumes that you are looking at existing files to see usage details.
|
|
|
|
These interfaces have grown over time, and have never been carefully thought
|
|
out or designed. Nothing about the interfaces described here is cast in stone.
|
|
It is possible that they will change from one version of the assembler to the
|
|
next. Also, new macros are added all the time as they are needed.
|
|
|
|
@menu
|
|
* CPU backend:: Writing a CPU backend
|
|
* Object format backend:: Writing an object format backend
|
|
* Emulations:: Writing emulation files
|
|
@end menu
|
|
|
|
@node CPU backend
|
|
@subsection Writing a CPU backend
|
|
@cindex CPU backend
|
|
@cindex @file{tc-@var{CPU}}
|
|
|
|
The CPU backend files are the heart of the assembler. They are the only parts
|
|
of the assembler which actually know anything about the instruction set of the
|
|
processor.
|
|
|
|
You must define a reasonably small list of macros and functions in the CPU
|
|
backend files. You may define a large number of additional macros in the CPU
|
|
backend files, not all of which are documented here. You must, of course,
|
|
define macros in the @file{.h} file, which is included by every assembler
|
|
source file. You may define the functions as macros in the @file{.h} file, or
|
|
as functions in the @file{.c} file.
|
|
|
|
@table @code
|
|
@item TC_@var{CPU}
|
|
@cindex TC_@var{CPU}
|
|
By convention, you should define this macro in the @file{.h} file. For
|
|
example, @file{tc-m68k.h} defines @code{TC_M68K}. You might have to use this
|
|
if it is necessary to add CPU specific code to the object format file.
|
|
|
|
@item TARGET_FORMAT
|
|
This macro is the BFD target name to use when creating the output file. This
|
|
will normally depend upon the @code{OBJ_@var{FMT}} macro.
|
|
|
|
@item TARGET_ARCH
|
|
This macro is the BFD architecture to pass to @code{bfd_set_arch_mach}.
|
|
|
|
@item TARGET_MACH
|
|
This macro is the BFD machine number to pass to @code{bfd_set_arch_mach}. If
|
|
it is not defined, GAS will use 0.
|
|
|
|
@item TARGET_BYTES_BIG_ENDIAN
|
|
You should define this macro to be non-zero if the target is big endian, and
|
|
zero if the target is little endian.
|
|
|
|
@item md_shortopts
|
|
@itemx md_longopts
|
|
@itemx md_longopts_size
|
|
@itemx md_parse_option
|
|
@itemx md_show_usage
|
|
@cindex md_shortopts
|
|
@cindex md_longopts
|
|
@cindex md_longopts_size
|
|
@cindex md_parse_option
|
|
@cindex md_show_usage
|
|
GAS uses these variables and functions during option processing.
|
|
@code{md_shortopts} is a @code{const char *} which GAS adds to the machine
|
|
independent string passed to @code{getopt}. @code{md_longopts} is a
|
|
@code{struct option []} which GAS adds to the machine independent long options
|
|
passed to @code{getopt}; you may use @code{OPTION_MD_BASE}, defined in
|
|
@file{as.h}, as the start of a set of long option indices, if necessary.
|
|
@code{md_longopts_size} is a @code{size_t} holding the size of @code{md_longopts}.
|
|
GAS will call @code{md_parse_option} whenever @code{getopt} returns an
|
|
unrecognized code, presumably indicating a special code value which appears in
|
|
@code{md_longopts}. GAS will call @code{md_show_usage} when a usage message is
|
|
printed; it should print a description of the machine specific options.
|
|
|
|
@item md_begin
|
|
@cindex md_begin
|
|
GAS will call this function at the start of the assembly, after the command
|
|
line arguments have been parsed and all the machine independent initializations
|
|
have been completed.
|
|
|
|
@item md_cleanup
|
|
@cindex md_cleanup
|
|
If you define this macro, GAS will call it at the end of each input file.
|
|
|
|
@item md_assemble
|
|
@cindex md_assemble
|
|
GAS will call this function for each input line which does not contain a
|
|
pseudo-op. The argument is a null terminated string. The function should
|
|
assemble the string as an instruction with operands. Normally
|
|
@code{md_assemble} will do this by calling @code{frag_more} and writing out
|
|
some bytes (@pxref{Frags}). @code{md_assemble} will call @code{fix_new} to
|
|
create fixups as needed (@pxref{Fixups}). Targets which need to do special
|
|
purpose relaxation will call @code{frag_var}.
|
|
|
|
@item md_pseudo_table
|
|
@cindex md_pseudo_table
|
|
This is a const array of type @code{pseudo_typeS}. It is a mapping from
|
|
pseudo-op names to functions. You should use this table to implement
|
|
pseudo-ops which are specific to the CPU.
|
|
|
|
@item tc_conditional_pseudoop
|
|
@cindex tc_conditional_pseudoop
|
|
If this macro is defined, GAS will call it with a @code{pseudo_typeS} argument.
|
|
It should return non-zero if the pseudo-op is a conditional which controls
|
|
whether code is assembled, such as @samp{.if}. GAS knows about the normal
|
|
conditional pseudo-ops, and you should normally not have to define this macro.
|
|
|
|
@item comment_chars
|
|
@cindex comment_chars
|
|
This is a null terminated @code{const char} array of characters which start a
|
|
comment.
|
|
|
|
@item tc_comment_chars
|
|
@cindex tc_comment_chars
|
|
If this macro is defined, GAS will use it instead of @code{comment_chars}.
|
|
|
|
@item line_comment_chars
|
|
@cindex line_comment_chars
|
|
This is a null terminated @code{const char} array of characters which start a
|
|
comment when they appear at the start of a line.
|
|
|
|
@item line_separator_chars
|
|
@cindex line_separator_chars
|
|
This is a null terminated @code{const char} array of characters which separate
|
|
lines (the semicolon is such a character by default, and need not be listed in
|
|
this array).
|
|
|
|
@item EXP_CHARS
|
|
@cindex EXP_CHARS
|
|
This is a null terminated @code{const char} array of characters which may be
|
|
used as the exponent character in a floating point number. This is normally
|
|
@code{"eE"}.
|
|
|
|
@item FLT_CHARS
|
|
@cindex FLT_CHARS
|
|
This is a null terminated @code{const char} array of characters which may be
|
|
used to indicate a floating point constant. A zero followed by one of these
|
|
characters is assumed to be followed by a floating point number; thus they
|
|
operate the way that @code{0x} is used to indicate a hexadecimal constant.
|
|
Usually this includes @samp{r} and @samp{f}.
|
|
|
|
@item LEX_AT
|
|
@cindex LEX_AT
|
|
You may define this macro to the lexical type of the @kbd{@@} character. The
|
|
default is zero.
|
|
|
|
Lexical types are a combination of @code{LEX_NAME} and @code{LEX_BEGIN_NAME},
|
|
both defined in @file{read.h}. @code{LEX_NAME} indicates that the character
|
|
may appear in a name. @code{LEX_BEGIN_NAME} indicates that the character may
|
|
appear at the beginning of a name.
|
|
|
|
@item LEX_BR
|
|
@cindex LEX_BR
|
|
You may define this macro to the lexical type of the brace characters @kbd{@{},
|
|
@kbd{@}}, @kbd{[}, and @kbd{]}. The default value is zero.
|
|
|
|
@item LEX_PCT
|
|
@cindex LEX_PCT
|
|
You may define this macro to the lexical type of the @kbd{%} character. The
|
|
default value is zero.
|
|
|
|
@item LEX_QM
|
|
@cindex LEX_QM
|
|
You may define this macro to the lexical type of the @kbd{?} character. The
|
|
default value is zero.
|
|
|
|
@item LEX_DOLLAR
|
|
@cindex LEX_DOLLAR
|
|
You may define this macro to the lexical type of the @kbd{$} character. The
|
|
default value is @code{LEX_NAME | LEX_BEGIN_NAME}.
|
|
|
|
@item SINGLE_QUOTE_STRINGS
|
|
@cindex SINGLE_QUOTE_STRINGS
|
|
If you define this macro, GAS will treat single quotes as string delimiters.
|
|
Normally only double quotes are accepted as string delimiters.
|
|
|
|
@item NO_STRING_ESCAPES
|
|
@cindex NO_STRING_ESCAPES
|
|
If you define this macro, GAS will not permit escape sequences in a string.
|
|
|
|
@item ONLY_STANDARD_ESCAPES
|
|
@cindex ONLY_STANDARD_ESCAPES
|
|
If you define this macro, GAS will warn about the use of nonstandard escape
|
|
sequences in a string.
|
|
|
|
@item md_start_line_hook
|
|
@cindex md_start_line_hook
|
|
If you define this macro, GAS will call it at the start of each line.
|
|
|
|
@item LABELS_WITHOUT_COLONS
|
|
@cindex LABELS_WITHOUT_COLONS
|
|
If you define this macro, GAS will assume that any text at the start of a line
|
|
is a label, even if it does not have a colon.
|
|
|
|
@item TC_START_LABEL
|
|
@cindex TC_START_LABEL
|
|
You may define this macro to control what GAS considers to be a label. The
|
|
default definition is to accept any name followed by a colon character.
|
|
|
|
@item NO_PSEUDO_DOT
|
|
@cindex NO_PSEUDO_DOT
|
|
If you define this macro, GAS will not require pseudo-ops to start with a
|
|
@kbd{.} character.
|
|
|
|
@item TC_EQUAL_IN_INSN
|
|
@cindex TC_EQUAL_IN_INSN
|
|
If you define this macro, it should return nonzero if the instruction is
|
|
permitted to contain an @kbd{=} character. GAS will use this to decide if a
|
|
@kbd{=} is an assignment or an instruction.
|
|
|
|
@item TC_EOL_IN_INSN
|
|
@cindex TC_EOL_IN_INSN
|
|
If you define this macro, it should return nonzero if the current input line
|
|
pointer should be treated as the end of a line.
|
|
|
|
@item md_parse_name
|
|
@cindex md_parse_name
|
|
If this macro is defined, GAS will call it for any symbol found in an
|
|
expression. You can define this to handle special symbols in a special way.
|
|
If a symbol always has a certain value, you should normally enter it in the
|
|
symbol table, perhaps using @code{reg_section}.
|
|
|
|
@item md_undefined_symbol
|
|
@cindex md_undefined_symbol
|
|
GAS will call this function when a symbol table lookup fails, before it
|
|
creates a new symbol. Typically this would be used to supply symbols whose
|
|
name or value changes dynamically, possibly in a context sensitive way.
|
|
Predefined symbols with fixed values, such as register names or condition
|
|
codes, are typically entered directly into the symbol table when @code{md_begin}
|
|
is called.
|
|
|
|
@item md_operand
|
|
@cindex md_operand
|
|
GAS will call this function for any expression that can not be recognized.
|
|
When the function is called, @code{input_line_pointer} will point to the start
|
|
of the expression.
|
|
|
|
@item tc_unrecognized_line
|
|
@cindex tc_unrecognized_line
|
|
If you define this macro, GAS will call it when it finds a line that it can not
|
|
parse.
|
|
|
|
@item md_do_align
|
|
@cindex md_do_align
|
|
You may define this macro to handle an alignment directive. GAS will call it
|
|
when the directive is seen in the input file. For example, the i386 backend
|
|
uses this to generate efficient nop instructions of varying lengths, depending
|
|
upon the number of bytes that the alignment will skip.
|
|
|
|
@item HANDLE_ALIGN
|
|
@cindex HANDLE_ALIGN
|
|
You may define this macro to do special handling for an alignment directive.
|
|
GAS will call it at the end of the assembly.
|
|
|
|
@item md_flush_pending_output
|
|
@cindex md_flush_pending_output
|
|
If you define this macro, GAS will call it each time it skips any space because of a
|
|
space filling or alignment or data allocation pseudo-op.
|
|
|
|
@item TC_PARSE_CONS_EXPRESSION
|
|
@cindex TC_PARSE_CONS_EXPRESSION
|
|
You may define this macro to parse an expression used in a data allocation
|
|
pseudo-op such as @code{.word}. You can use this to recognize relocation
|
|
directives that may appear in such directives.
|
|
|
|
@item BITFIELD_CONS_EXPRESSION
|
|
@cindex BITFIELD_CONS_EXPRESSION
|
|
If you define this macro, GAS will recognize bitfield instructions in data
|
|
allocation pseudo-ops, as used on the i960.
|
|
|
|
@item REPEAT_CONS_EXPRESSION
|
|
@cindex REPEAT_CONS_EXPRESSION
|
|
If you define this macro, GAS will recognize repeat counts in data allocation
|
|
pseudo-ops, as used on the MIPS.
|
|
|
|
@item md_cons_align
|
|
@cindex md_cons_align
|
|
You may define this macro to do any special alignment before a data allocation
|
|
pseudo-op.
|
|
|
|
@item TC_CONS_FIX_NEW
|
|
@cindex TC_CONS_FIX_NEW
|
|
You may define this macro to generate a fixup for a data allocation pseudo-op.
|
|
|
|
@item TC_INIT_FIX_DATA (@var{fixp})
|
|
@cindex TC_INIT_FIX_DATA
|
|
A C statement to initialize the target specific fields of fixup @var{fixp}.
|
|
|
|
@item TC_FIX_DATA_PRINT (@var{stream}, @var{fixp})
|
|
@cindex TC_FIX_DATA_PRINT
|
|
A C statement to output target specific debugging information for
|
|
fixup @var{fixp} to @var{stream}. This macro is called by @code{print_fixup}.
|
|
|
|
@item md_number_to_chars
|
|
@cindex md_number_to_chars
|
|
This should just call either @code{number_to_chars_bigendian} or
|
|
@code{number_to_chars_littleendian}, whichever is appropriate. On targets like
|
|
the MIPS which support options to change the endianness, which function to call
|
|
is a runtime decision. On other targets, @code{md_number_to_chars} can be a
|
|
simple macro.
|
|
|
|
@item md_reloc_size
|
|
@cindex md_reloc_size
|
|
This variable is only used in the original version of gas (not
|
|
@code{BFD_ASSEMBLER} and not @code{MANY_SEGMENTS}). It holds the size of a
|
|
relocation entry.
|
|
|
|
@item WORKING_DOT_WORD
|
|
@itemx md_short_jump_size
|
|
@itemx md_long_jump_size
|
|
@itemx md_create_short_jump
|
|
@itemx md_create_long_jump
|
|
@cindex WORKING_DOT_WORD
|
|
@cindex md_short_jump_size
|
|
@cindex md_long_jump_size
|
|
@cindex md_create_short_jump
|
|
@cindex md_create_long_jump
|
|
If @code{WORKING_DOT_WORD} is defined, GAS will not do broken word processing
|
|
(@pxref{Broken words}). Otherwise, you should set @code{md_short_jump_size} to
|
|
the size of a short jump (a jump that is just long enough to jump around a long
|
|
jmp) and @code{md_long_jump_size} to the size of a long jump (a jump that can
|
|
go anywhere in the function). You should define @code{md_create_short_jump} to
|
|
create a short jump around a long jump, and define @code{md_create_long_jump}
|
|
to create a long jump.
|
|
|
|
@item md_estimate_size_before_relax
|
|
@cindex md_estimate_size_before_relax
|
|
This function returns an estimate of the size of a @code{rs_machine_dependent}
|
|
frag before any relaxing is done. It may also create any necessary
|
|
relocations.
|
|
|
|
@item md_relax_frag
|
|
@cindex md_relax_frag
|
|
This macro may be defined to relax a frag. GAS will call this with the frag
|
|
and the change in size of all previous frags; @code{md_relax_frag} should
|
|
return the change in size of the frag. @xref{Relaxation}.
|
|
|
|
@item TC_GENERIC_RELAX_TABLE
|
|
@cindex TC_GENERIC_RELAX_TABLE
|
|
If you do not define @code{md_relax_frag}, you may define
|
|
@code{TC_GENERIC_RELAX_TABLE} as a table of @code{relax_typeS} structures. The
|
|
machine independent code knows how to use such a table to relax PC relative
|
|
references. See @file{tc-m68k.c} for an example. @xref{Relaxation}.
|
|
|
|
@item md_prepare_relax_scan
|
|
@cindex md_prepare_relax_scan
|
|
If defined, it is a C statement that is invoked prior to scanning
|
|
the relax table.
|
|
|
|
@item LINKER_RELAXING_SHRINKS_ONLY
|
|
@cindex LINKER_RELAXING_SHRINKS_ONLY
|
|
If you define this macro, and the global variable @samp{linkrelax} is set
|
|
(because of a command line option, or unconditionally in @code{md_begin}), a
|
|
@samp{.align} directive will cause extra space to be allocated. The linker can
|
|
then discard this space when relaxing the section.
|
|
|
|
@item md_convert_frag
|
|
@cindex md_convert_frag
|
|
GAS will call this for each @code{rs_machine_dependent} fragment.
|
|
The instruction is completed using the data from the relaxation pass.
|
|
It may also create any necessary relocations.
|
|
@xref{Relaxation}.
|
|
|
|
@item md_apply_fix
|
|
@cindex md_apply_fix
|
|
GAS will call this for each fixup. It should store the correct value in the
|
|
object file.
|
|
|
|
@item TC_HANDLES_FX_DONE
|
|
@cindex TC_HANDLES_FX_DONE
|
|
If this macro is defined, it means that @code{md_apply_fix} correctly sets the
|
|
@code{fx_done} field in the fixup.
|
|
|
|
@item tc_gen_reloc
|
|
@cindex tc_gen_reloc
|
|
A @code{BFD_ASSEMBLER} GAS will call this to generate a reloc. GAS will pass
|
|
the resulting reloc to @code{bfd_install_relocation}. This currently works
|
|
poorly, as @code{bfd_install_relocation} often does the wrong thing, and
|
|
instances of @code{tc_gen_reloc} have been written to work around the problems,
|
|
which in turn makes it difficult to fix @code{bfd_install_relocation}.
|
|
|
|
@item RELOC_EXPANSION_POSSIBLE
|
|
@cindex RELOC_EXPANSION_POSSIBLE
|
|
If you define this macro, it means that @code{tc_gen_reloc} may return multiple
|
|
relocation entries for a single fixup. In this case, the return value of
|
|
@code{tc_gen_reloc} is a pointer to a null terminated array.
|
|
|
|
@item MAX_RELOC_EXPANSION
|
|
@cindex MAX_RELOC_EXPANSION
|
|
You must define this if @code{RELOC_EXPANSION_POSSIBLE} is defined; it
|
|
indicates the largest number of relocs which @code{tc_gen_reloc} may return for
|
|
a single fixup.
|
|
|
|
@item tc_fix_adjustable
|
|
@cindex tc_fix_adjustable
|
|
You may define this macro to indicate whether a fixup against a locally defined
|
|
symbol should be adjusted to be against the section symbol. It should return a
|
|
non-zero value if the adjustment is acceptable.
|
|
|
|
@item MD_PCREL_FROM_SECTION
|
|
@cindex MD_PCREL_FROM_SECTION
|
|
If you define this macro, it should return the offset between the address of a
|
|
PC relative fixup and the position from which the PC relative adjustment should
|
|
be made. On many processors, the base of a PC relative instruction is the next
|
|
instruction, so this macro would return the length of an instruction.
|
|
|
|
@item md_pcrel_from
|
|
@cindex md_pcrel_from
|
|
This is the default value of @code{MD_PCREL_FROM_SECTION}. The difference is
|
|
that @code{md_pcrel_from} does not take a section argument.
|
|
|
|
@item tc_frob_label
|
|
@cindex tc_frob_label
|
|
If you define this macro, GAS will call it each time a label is defined.
|
|
|
|
@item md_section_align
|
|
@cindex md_section_align
|
|
GAS will call this function for each section at the end of the assembly, to
|
|
permit the CPU backend to adjust the alignment of a section.
|
|
|
|
@item tc_frob_section
|
|
@cindex tc_frob_section
|
|
If you define this macro, a @code{BFD_ASSEMBLER} GAS will call it for each
|
|
section at the end of the assembly.
|
|
|
|
@item tc_frob_file_before_adjust
|
|
@cindex tc_frob_file_before_adjust
|
|
If you define this macro, GAS will call it after the symbol values are
|
|
resolved, but before the fixups have been changed from local symbols to section
|
|
symbols.
|
|
|
|
@item tc_frob_symbol
|
|
@cindex tc_frob_symbol
|
|
If you define this macro, GAS will call it for each symbol. You can indicate
|
|
that the symbol should not be included in the object file by defining this
|
|
macro to set its second argument to a non-zero value.
|
|
|
|
@item tc_frob_file
|
|
@cindex tc_frob_file
|
|
If you define this macro, GAS will call it after the symbol table has been
|
|
completed, but before the relocations have been generated.
|
|
|
|
@item tc_frob_file_after_relocs
|
|
If you define this macro, GAS will call it after the relocs have been
|
|
generated.
|
|
|
|
@item LISTING_HEADER
|
|
A string to use on the header line of a listing. The default value is simply
|
|
@code{"GAS LISTING"}.
|
|
|
|
@item LISTING_WORD_SIZE
|
|
The number of bytes to put into a word in a listing. This affects the way the
|
|
bytes are clumped together in the listing. For example, a value of 2 might
|
|
print @samp{1234 5678} where a value of 1 would print @samp{12 34 56 78}. The
|
|
default value is 4.
|
|
|
|
@item LISTING_LHS_WIDTH
|
|
The number of words of data to print on the first line of a listing for a
|
|
particular source line, where each word is @code{LISTING_WORD_SIZE} bytes. The
|
|
default value is 1.
|
|
|
|
@item LISTING_LHS_WIDTH_SECOND
|
|
Like @code{LISTING_LHS_WIDTH}, but applying to the second and subsequent lines
|
|
of the data printed for a particular source line. The default value is 1.
|
|
|
|
@item LISTING_LHS_CONT_LINES
|
|
The maximum number of continuation lines to print in a listing for a particular
|
|
source line. The default value is 4.
|
|
|
|
@item LISTING_RHS_WIDTH
|
|
The maximum number of characters to print from one line of the input file. The
|
|
default value is 100.
|
|
@end table
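The effect of @code{LISTING_WORD_SIZE} on the way bytes are clumped can be sketched in a few lines of C. This is only an illustration of the grouping rule described above; @code{format_listing_words} is a hypothetical helper and is not part of GAS.

```c
#include <stddef.h>
#include <stdio.h>

/* Format LEN bytes from DATA as hex, grouped WORD_SIZE bytes per clump,
   with a single space between clumps.  WORD_SIZE must be non-zero and
   OUT must hold at least 2 * LEN + LEN / WORD_SIZE + 1 characters.
   Hypothetical helper, not a GAS function.  */
void
format_listing_words (const unsigned char *data, size_t len,
                      size_t word_size, char *out)
{
  size_t i;
  char *p = out;

  for (i = 0; i < len; i++)
    {
      /* Start a new clump every WORD_SIZE bytes, except at the start.  */
      if (i != 0 && i % word_size == 0)
        *p++ = ' ';
      p += sprintf (p, "%02x", (unsigned int) data[i]);
    }
  *p = '\0';
}
```

With the four bytes 0x12, 0x34, 0x56, 0x78, a word size of 2 gives the first form quoted above and a word size of 1 gives the second.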
|
|
|
|
@node Object format backend
|
|
@subsection Writing an object format backend
|
|
@cindex object format backend
|
|
@cindex @file{obj-@var{fmt}}
|
|
|
|
As with the CPU backend, the object format backend must define a few things,
|
|
and may define some other things. The interface to the object format backend
|
|
is generally simpler; most of the support for an object file format consists of
|
|
defining a number of pseudo-ops.
|
|
|
|
The object format @file{.h} file must include @file{targ-cpu.h}.
|
|
|
|
This section will only define the @code{BFD_ASSEMBLER} version of GAS. It is
|
|
impossible to support a new object file format using any other version anyhow,
|
|
as the original GAS version only supports a.out, and the @code{MANY_SEGMENTS}
|
|
GAS version only supports COFF.
|
|
|
|
@table @code
|
|
@item OBJ_@var{format}
|
|
@cindex OBJ_@var{format}
|
|
By convention, you should define this macro in the @file{.h} file. For
|
|
example, @file{obj-elf.h} defines @code{OBJ_ELF}. You might have to use this
|
|
if it is necessary to add object file format specific code to the CPU file.
|
|
|
|
@item obj_begin
|
|
If you define this macro, GAS will call it at the start of the assembly, after
|
|
the command line arguments have been parsed and all the machine independent
|
|
initializations have been completed.
|
|
|
|
@item obj_app_file
|
|
@cindex obj_app_file
|
|
If you define this macro, GAS will invoke it when it sees a @code{.file}
|
|
pseudo-op or a @samp{#} line as used by the C preprocessor.
|
|
|
|
@item OBJ_COPY_SYMBOL_ATTRIBUTES
|
|
@cindex OBJ_COPY_SYMBOL_ATTRIBUTES
|
|
You should define this macro to copy object format specific information from
|
|
one symbol to another. GAS will call it when one symbol is equated to
|
|
another.
|
|
|
|
@item obj_fix_adjustable
|
|
@cindex obj_fix_adjustable
|
|
You may define this macro to indicate whether a fixup against a locally defined
|
|
symbol should be adjusted to be against the section symbol. It should return a
|
|
non-zero value if the adjustment is acceptable.
|
|
|
|
@item obj_sec_sym_ok_for_reloc
|
|
@cindex obj_sec_sym_ok_for_reloc
|
|
You may define this macro to indicate that it is OK to use a section symbol in
|
|
a relocation entry. If it is not, GAS will define a new symbol at the start
|
|
of a section.
|
|
|
|
@item EMIT_SECTION_SYMBOLS
|
|
@cindex EMIT_SECTION_SYMBOLS
|
|
You should define this macro with a zero value if you do not want to include
|
|
section symbols in the output symbol table. The default value for this macro
|
|
is one.
|
|
|
|
@item obj_adjust_symtab
|
|
@cindex obj_adjust_symtab
|
|
If you define this macro, GAS will invoke it just before setting the symbol
|
|
table of the output BFD. For example, the COFF support uses this macro to
|
|
generate a @code{.file} symbol if none was generated previously.
|
|
|
|
@item SEPARATE_STAB_SECTIONS
|
|
@cindex SEPARATE_STAB_SECTIONS
|
|
You may define this macro to indicate that stabs should be placed in separate
|
|
sections, as in ELF.
|
|
|
|
@item INIT_STAB_SECTION
|
|
@cindex INIT_STAB_SECTION
|
|
You may define this macro to initialize the stabs section in the output file.
|
|
|
|
@item OBJ_PROCESS_STAB
|
|
@cindex OBJ_PROCESS_STAB
|
|
You may define this macro to do specific processing on a stabs entry.
|
|
|
|
@item obj_frob_section
|
|
@cindex obj_frob_section
|
|
If you define this macro, GAS will call it for each section at the end of the
|
|
assembly.
|
|
|
|
@item obj_frob_file_before_adjust
|
|
@cindex obj_frob_file_before_adjust
|
|
If you define this macro, GAS will call it after the symbol values are
|
|
resolved, but before the fixups have been changed from local symbols to section
|
|
symbols.
|
|
|
|
@item obj_frob_symbol
|
|
@cindex obj_frob_symbol
|
|
If you define this macro, GAS will call it for each symbol. You can indicate
|
|
that the symbol should not be included in the object file by defining this
|
|
macro to set its second argument to a non-zero value.
|
|
|
|
@item obj_frob_file
|
|
@cindex obj_frob_file
|
|
If you define this macro, GAS will call it after the symbol table has been
|
|
completed, but before the relocations have been generated.
|
|
|
|
@item obj_frob_file_after_relocs
|
|
If you define this macro, GAS will call it after the relocs have been
|
|
generated.
|
|
@end table
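As a rough illustration of the ``set its second argument to a non-zero value'' convention used by @code{obj_frob_symbol}, the sketch below drops labels whose names begin with @samp{.L}. Everything prefixed with @code{demo_} is invented for the example, including the stand-in symbol type and the choice of which symbols to drop; a real backend would operate on the GAS symbol type.

```c
#include <string.h>

/* Stand-in for the real GAS symbol type; only the name matters here.  */
struct demo_symbol
{
  const char *name;
};

/* Hypothetical definition: drop assembler-local labels (names starting
   with ".L") from the object file by setting PUNT to a non-zero value.  */
#define demo_obj_frob_symbol(sym, punt)            \
  do                                               \
    {                                              \
      if (strncmp ((sym)->name, ".L", 2) == 0)     \
        (punt) = 1;                                \
    }                                              \
  while (0)

/* Mimic the caller: return non-zero if SYM would be kept.  */
int
demo_symbol_is_kept (const struct demo_symbol *sym)
{
  int punt = 0;

  demo_obj_frob_symbol (sym, punt);
  return !punt;
}
```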
|
|
|
|
@node Emulations
|
|
@subsection Writing emulation files
|
|
|
|
Normally you do not have to write an emulation file. You can just use
|
|
@file{te-generic.h}.
|
|
|
|
If you do write your own emulation file, it must include @file{obj-format.h}.
|
|
|
|
An emulation file will often define @code{TE_@var{EM}}; this may then be used
|
|
in other files to change the output.
|
|
|
|
@node Relaxation
|
|
@section Relaxation
|
|
@cindex relaxation
|
|
|
|
@dfn{Relaxation} is a generic term used when the size of some instruction or
|
|
data depends upon the value of some symbol or other data.
|
|
|
|
GAS knows how to relax a particular type of PC relative relocation using a table.
|
|
You can also define arbitrarily complex forms of relaxation yourself.
|
|
|
|
@menu
|
|
* Relaxing with a table:: Relaxing with a table
|
|
* General relaxing:: General relaxing
|
|
@end menu
|
|
|
|
@node Relaxing with a table
|
|
@subsection Relaxing with a table
|
|
|
|
If you do not define @code{md_relax_frag}, and you do define
|
|
@code{TC_GENERIC_RELAX_TABLE}, GAS will relax @code{rs_machine_dependent} frags
|
|
based on the frag subtype and the displacement to some specified target
|
|
address. The basic idea is that several machines have different addressing
|
|
modes for instructions that can specify different ranges of values, with
|
|
successive modes able to access wider ranges, including the entirety of the
|
|
previous range. Smaller ranges are assumed to be more desirable (perhaps the
|
|
instruction requires one word instead of two or three); if this is not the
|
|
case, don't describe the smaller-range, inferior mode.
|
|
|
|
The @code{fr_subtype} field of a frag is an index into a CPU-specific
|
|
relaxation table. That table entry indicates the range of values that can be
|
|
stored, the number of bytes that will have to be added to the frag to
|
|
accommodate the addressing mode, and the index of the next entry to examine if
|
|
the value to be stored is outside the range accessible by the current
|
|
addressing mode. The @code{fr_symbol} field of the frag indicates what symbol
|
|
is to be accessed; the @code{fr_offset} field is added in.
|
|
|
|
If the @code{TC_PCREL_ADJUST} macro is defined, which currently should only happen
|
|
for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to
|
|
compute an adjustment to be made to the displacement.
|
|
|
|
The value fitted by the relaxation code is always assumed to be a displacement
|
|
from the current frag. (More specifically, from @code{fr_fix} bytes into the
|
|
frag.)
|
|
@ignore
|
|
This seems kinda silly. What about fitting small absolute values? I suppose
|
|
@code{md_assemble} is supposed to take care of that, but if the operand is a
|
|
difference between symbols, it might not be able to, if the difference was not
|
|
computable yet.
|
|
@end ignore
|
|
|
|
The end of the relaxation sequence is indicated by a ``next'' value of 0. This
|
|
means that the first entry in the table can't be used.
|
|
|
|
For some configurations, the linker can do relaxing within a section of an
|
|
object file. If call instructions of various sizes exist, the linker can
|
|
determine which should be used in each instance, when a symbol's value is
|
|
resolved. In order for the linker to avoid wasting space and having to insert
|
|
no-op instructions, it must be able to expand or shrink the section contents
|
|
while still preserving intra-section references and meeting alignment
|
|
requirements.
|
|
|
|
For the i960 using b.out format, no expansion is done; instead, each
|
|
@samp{.align} directive causes extra space to be allocated, enough that when
|
|
the linker is relaxing a section and removing unneeded space, it can discard
|
|
some or all of this extra padding and cause the following data to be correctly
|
|
aligned.
|
|
|
|
For the H8/300, I think the linker expands calls that can't reach, and doesn't
|
|
worry about alignment issues; the cpu probably never needs any significant
|
|
alignment beyond the instruction size.
|
|
|
|
The relaxation table type contains these fields:
|
|
|
|
@table @code
|
|
@item long rlx_forward
|
|
Forward reach, must be non-negative.
|
|
@item long rlx_backward
|
|
Backward reach, must be zero or negative.
|
|
@item rlx_length
|
|
Length in bytes of this addressing mode.
|
|
@item rlx_more
|
|
Index of the next-longer relax state, or zero if there is no next relax state.
|
|
@end table
|
|
|
|
The relaxation is done in @code{relax_segment} in @file{write.c}. The
|
|
difference in the length fields between the original mode and the one finally
|
|
chosen by the relaxing code is taken as the size by which the current frag will
|
|
be increased in size. For example, if the initial relaxing mode has a length
|
|
of 2 bytes, and because of the size of the displacement, it gets upgraded to a
|
|
mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes.
|
|
(The initial two bytes should have been part of the fixed portion of the frag,
|
|
since it is already known that they will be output.) This growth must be
|
|
effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field
|
|
by the appropriate size, and fill in the appropriate bytes of the frag.
|
|
(Enough space for the maximum growth should have been allocated in the call to
|
|
@code{frag_var} as the second argument.)
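The table lookup and the growth computation described above can be sketched as follows. This is a simplified model, not the code in @file{write.c}: the structure mirrors the four fields listed earlier, while the table contents, the @code{demo_} names, and the reaches and lengths are assumptions chosen to reproduce the 2-byte to 6-byte example.

```c
/* Local copy of the four fields described above; the real type is
   relax_typeS.  */
struct demo_relax_type
{
  long rlx_forward;           /* Forward reach, non-negative.  */
  long rlx_backward;          /* Backward reach, zero or negative.  */
  unsigned char rlx_length;   /* Bytes needed by this addressing mode.  */
  unsigned int rlx_more;      /* Next wider state, or 0 if none.  */
};

/* Entry 0 is unusable because a "next" value of 0 ends the chain.
   The reaches and lengths are invented for illustration.  */
const struct demo_relax_type demo_relax_table[] =
{
  { 0, 0, 0, 0 },                           /* 0: unused.  */
  { 127, -128, 2, 2 },                      /* 1: 8-bit displacement.  */
  { 32767, -32768, 4, 3 },                  /* 2: 16-bit displacement.  */
  { 0x7fffffffL, -0x7fffffffL - 1, 6, 0 }   /* 3: 32-bit displacement.  */
};

/* Starting from state START, follow rlx_more until DISP is in range.
   Return the chosen state and store the frag growth in *GROWTH.  */
unsigned int
demo_relax (unsigned int start, long disp, int *growth)
{
  unsigned int state = start;

  while (demo_relax_table[state].rlx_more != 0
         && (disp > demo_relax_table[state].rlx_forward
             || disp < demo_relax_table[state].rlx_backward))
    state = demo_relax_table[state].rlx_more;

  /* The frag grows by the difference between the two length fields.  */
  *growth = demo_relax_table[state].rlx_length
            - demo_relax_table[start].rlx_length;
  return state;
}
```

Starting in state 1, a displacement that needs the 32-bit form ends in state 3 with a growth of 4 bytes, matching the example in the text.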
|
|
|
|
If relocation records are needed, they should be emitted by
|
|
@code{md_estimate_size_before_relax}. This function should examine the target
|
|
symbol of the supplied frag and correct the @code{fr_subtype} of the frag if
|
|
needed. When this function is called, if the symbol has not yet been defined,
|
|
it will not become defined later; however, its value may still change if the
|
|
section it is in gets relaxed.
|
|
|
|
Usually, if the symbol is in the same section as the frag (given by the
|
|
@var{sec} argument), the narrowest likely relaxation mode is stored in
|
|
@code{fr_subtype}, and that's that.
|
|
|
|
If the symbol is undefined, or in a different section (and therefore moveable
|
|
to an arbitrarily large distance), the largest available relaxation mode is
|
|
specified, @code{fix_new} is called to produce the relocation record,
|
|
@code{fr_fix} is increased to include the relocated field (remember, this
|
|
storage was allocated when @code{frag_var} was called), and @code{frag_wane} is
|
|
called to convert the frag to an @code{rs_fill} frag with no variant part.
|
|
Sometimes changing addressing modes may also require rewriting the instruction.
|
|
It can be accessed via @code{fr_opcode} or @code{fr_fix}.
|
|
|
|
Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not
|
|
called. I'm not sure, but I think this is to keep @code{fr_fix} referring to
|
|
an earlier byte, and @code{fr_type} set to @code{rs_machine_dependent} so
|
|
that @code{md_convert_frag} will get called.
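The two usual cases handled by @code{md_estimate_size_before_relax} (symbol in the same section, versus undefined or in another section) can be modelled roughly as below. The types, the @code{demo_} names, the subtype numbers, and the 4-byte field width are all assumptions made for the sketch; a real backend would call @code{fix_new} and @code{frag_wane} where the comments say so.

```c
/* Stand-ins for GAS types; a section is modelled as an int and 0 means
   "undefined".  */
struct demo_symbol
{
  int section;
};

struct demo_frag
{
  unsigned int fr_subtype;             /* Index into the relax table.  */
  int fr_fix;                          /* Bytes known to be output.  */
  int needs_reloc;                     /* Stands in for calling fix_new.  */
  const struct demo_symbol *fr_symbol; /* Symbol being accessed.  */
};

enum { DEMO_NARROWEST = 1, DEMO_WIDEST = 3, DEMO_WIDEST_FIELD_BYTES = 4 };

/* Pick the initial relaxation mode for FRAG, which lives in section SEC.
   Return non-zero if the frag was finalized (the point at which real
   code would call frag_wane), zero if it is left for relaxation.  */
int
demo_choose_initial_mode (struct demo_frag *frag, int sec)
{
  if (frag->fr_symbol->section == sec)
    {
      /* Same section: start narrow and let relaxation widen it.  */
      frag->fr_subtype = DEMO_NARROWEST;
      return 0;
    }

  /* Undefined or in another section: commit to the widest form now.  */
  frag->fr_subtype = DEMO_WIDEST;
  frag->needs_reloc = 1;                      /* Real code: fix_new.  */
  frag->fr_fix += DEMO_WIDEST_FIELD_BYTES;    /* Field becomes fixed.  */
  return 1;
}
```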
|
|
|
|
@node General relaxing
|
|
@subsection General relaxing
|
|
|
|
If using a simple table is not suitable, you may implement arbitrarily complex
|
|
relaxation semantics yourself. For example, the MIPS backend uses this to emit
|
|
different instruction sequences depending upon the size of the symbol being
|
|
accessed.
|
|
|
|
When you assemble an instruction that may need relaxation, you should allocate
|
|
a frag using @code{frag_var} or @code{frag_variant} with a type of
|
|
@code{rs_machine_dependent}. You should store some sort of information in the
|
|
@code{fr_subtype} field so that you can figure out what to do with the frag
|
|
later.
|
|
|
|
When GAS reaches the end of the input file, it will look through the frags and
|
|
work out their final sizes.
|
|
|
|
GAS will first call @code{md_estimate_size_before_relax} on each
|
|
@code{rs_machine_dependent} frag. This function must return an estimated size
|
|
for the frag.
|
|
|
|
GAS will then loop over the frags, calling @code{md_relax_frag} on each
|
|
@code{rs_machine_dependent} frag. This function should return the change in
|
|
size of the frag. GAS will keep looping over the frags until none of the frags
|
|
changes size.
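The ``loop until nothing changes'' behaviour can be illustrated with a toy model. The sketch below is not GAS code: the frag structure, the 2-byte and 6-byte branch forms, and the @code{demo_} functions are assumptions, and addresses are recomputed from scratch rather than passed in as the accumulated change in size of previous frags.

```c
#include <stddef.h>

struct demo_frag
{
  long size;      /* Current size in bytes.  */
  int is_branch;  /* Non-zero if the size depends on a displacement.  */
  size_t target;  /* Index of the frag the branch jumps to.  */
};

/* Address of frag INDEX: the sum of the sizes of all earlier frags.  */
static long
demo_address (const struct demo_frag *frags, size_t index)
{
  long addr = 0;
  size_t i;

  for (i = 0; i < index; i++)
    addr += frags[i].size;
  return addr;
}

/* Stand-in for md_relax_frag: return the change in size of frag INDEX.
   A branch grows from 2 to 6 bytes when its target is out of 8-bit
   range; it never shrinks again, so the outer loop terminates.  */
static long
demo_relax_frag (struct demo_frag *frags, size_t index)
{
  long disp, old_size = frags[index].size;

  if (!frags[index].is_branch || old_size == 6)
    return 0;
  disp = demo_address (frags, frags[index].target)
         - (demo_address (frags, index) + old_size);
  if (disp < -128 || disp > 127)
    frags[index].size = 6;
  return frags[index].size - old_size;
}

/* Keep making passes until a whole pass changes nothing.  Return the
   number of passes made.  */
int
demo_relax_all (struct demo_frag *frags, size_t count)
{
  int passes = 0, changed;

  do
    {
      size_t i;

      changed = 0;
      for (i = 0; i < count; i++)
        if (demo_relax_frag (frags, i) != 0)
          changed = 1;
      passes++;
    }
  while (changed);
  return passes;
}
```

More than one pass is needed because widening a later branch can push the target of an earlier branch out of range.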
|
|
|
|
@node Broken words
|
|
@section Broken words
|
|
@cindex internals, broken words
|
|
@cindex broken words
|
|
|
|
Some compilers, including GCC, will sometimes emit switch tables specifying
|
|
16-bit @code{.word} displacements to branch targets, and branch instructions
|
|
that load entries from that table to compute the target address. If this is
|
|
done on a 32-bit machine, there is a chance (at least with really large
|
|
functions) that the displacement will not fit in 16 bits. The assembler
|
|
handles this using a concept called @dfn{broken words}. This idea is well
|
|
named, since there is an implied promise that the 16-bit field will in fact
|
|
hold the specified displacement.
|
|
|
|
If broken word processing is enabled, and a situation like this is encountered,
|
|
the assembler will insert a jump instruction into the instruction stream, close
|
|
enough to be reached with the 16-bit displacement. This jump instruction will
|
|
transfer to the real desired target address. Thus, as long as the @code{.word}
|
|
value really is used as a displacement to compute an address to jump to, the
|
|
net effect will be correct (minus a very small efficiency cost). If
|
|
@code{.word} directives with label differences for values are used for other
|
|
purposes, however, things may not work properly. For targets which use broken
|
|
words, the @samp{-K} option will warn when a broken word is discovered.
|
|
|
|
The broken word code is turned off by the @code{WORKING_DOT_WORD} macro. It
|
|
isn't needed if @code{.word} emits a value large enough to contain an address
|
|
(or, more correctly, any possible difference between two addresses).
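The condition that makes a word ``broken'', and the value stored once a secondary jump has been inserted, can be written out as a small sketch. Both functions are hypothetical illustrations of the arithmetic, assuming a signed 16-bit @code{.word}; they are not taken from the GAS sources.

```c
/* Return non-zero if the difference TARGET - BASE cannot be stored in a
   signed 16-bit .word, that is, if the word would be "broken".  */
int
demo_word_is_broken (long target, long base)
{
  long disp = target - base;

  return disp < -32768L || disp > 32767L;
}

/* Once a secondary jump has been placed at JUMP_ADDR, close enough to
   be reached, the .word holds the displacement to that jump instead of
   the displacement to the real target.  */
long
demo_patched_word (long jump_addr, long base)
{
  return jump_addr - base;
}
```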
|
|
|
|
@node Internal functions
|
|
@section Internal functions
|
|
|
|
This section describes basic internal functions used by GAS.
|
|
|
|
@menu
|
|
* Warning and error messages:: Warning and error messages
|
|
* Hash tables:: Hash tables
|
|
@end menu
|
|
|
|
@node Warning and error messages
|
|
@subsection Warning and error messages
|
|
|
|
@deftypefun @{@} int had_warnings (void)
|
|
@deftypefunx @{@} int had_errors (void)
|
|
Returns non-zero if any warnings or errors, respectively, have been printed
|
|
during this invocation.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} void as_perror (const char *@var{gripe}, const char *@var{filename})
|
|
Displays a BFD or system error, then clears the error status.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} void as_tsktsk (const char *@var{format}, ...)
|
|
@deftypefunx @{@} void as_warn (const char *@var{format}, ...)
|
|
@deftypefunx @{@} void as_bad (const char *@var{format}, ...)
|
|
@deftypefunx @{@} void as_fatal (const char *@var{format}, ...)
|
|
These functions display messages about something amiss with the input file, or
|
|
internal problems in the assembler itself. The current file name and line
|
|
number are printed, followed by the supplied message, formatted using
|
|
@code{vfprintf}, and a final newline.
|
|
|
|
An error indicated by @code{as_bad} will result in a non-zero exit status when
|
|
the assembler has finished. Calling @code{as_fatal} will result in immediate
|
|
termination of the assembler process.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
|
@deftypefunx @{@} void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
|
These variants permit specification of the file name and line number, and are
|
|
used when problems are detected when reprocessing information saved away when
|
|
processing some earlier part of the file. For example, fixups are processed
|
|
after all input has been read, but messages about fixups should refer to the
|
|
original filename and line number that they are applicable to.
|
|
@end deftypefun
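The following stand-alone sketch mimics the behaviour described for @code{as_bad_where}: a location prefix, the formatted message, a final newline, and a non-zero exit status once any error has been reported. The @code{demo_} functions are stand-ins written for this example, they write into a caller-supplied buffer rather than to the error stream, and the exact @samp{file:line: } prefix layout is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>

/* Number of errors reported so far.  */
static int demo_error_count;

/* Format "FILE:LINE: message\n" into BUF (at most SIZE bytes) and
   record that an error occurred.  Stand-in for as_bad_where.  */
void
demo_bad_where (char *buf, size_t size, const char *file, unsigned int line,
                const char *format, ...)
{
  char message[256];
  va_list args;

  va_start (args, format);
  vsnprintf (message, sizeof message, format, args);
  va_end (args);

  snprintf (buf, size, "%s:%u: %s\n", file, line, message);
  demo_error_count++;
}

/* Exit status the assembler would use once it has finished.  */
int
demo_exit_status (void)
{
  return demo_error_count != 0;
}
```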
|
|
|
|
@deftypefun @{@} void fprint_value (FILE *@var{file}, valueT @var{val})
|
|
@deftypefunx @{@} void sprint_value (char *@var{buf}, valueT @var{val})
|
|
These functions are helpful for converting a @code{valueT} value into printable
|
|
format, in case it's wider than modes that @code{*printf} can handle. If the
|
|
type is narrow enough, a decimal number will be produced; otherwise, it will be
|
|
in hexadecimal. The value itself is not examined to make this determination.
|
|
@end deftypefun
|
|
|
|
@node Hash tables
|
|
@subsection Hash tables
|
|
@cindex hash tables
|
|
|
|
@deftypefun @{@} @{struct hash_control *@} hash_new (void)
|
|
Creates the hash table control structure.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} void hash_die (struct hash_control *)
|
|
Destroys a hash table.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} PTR hash_delete (struct hash_control *, const char *)
|
|
Deletes entry from the hash table, returns the value it had.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} PTR hash_replace (struct hash_control *, const char *, PTR)
|
|
Updates the value for an entry already in the table, returning the old value.
|
|
If no entry was found, just returns NULL.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} @{const char *@} hash_insert (struct hash_control *, const char *, PTR)
|
|
Inserting a value already in the table is an error.
|
|
Returns an error message or NULL.
|
|
@end deftypefun
|
|
|
|
@deftypefun @{@} @{const char *@} hash_jam (struct hash_control *, const char *, PTR)
|
|
Inserts if the value isn't already present, updates it if it is.
|
|
@end deftypefun
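
The difference between the insert, replace, and jam operations is easiest to
see side by side. The sketch below mimics only the contract described above;
it is not the GAS implementation. The @code{demo_} names and the error strings
are hypothetical, and a plain linked list stands in for real hashing because
only the calling behaviour is being illustrated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in types; not the GAS hash_control structure.  */
struct demo_entry
{
  struct demo_entry *next;
  const char *key;              /* Not copied; the caller keeps it alive.  */
  void *value;
};

struct demo_table
{
  struct demo_entry *head;
};

/* Return the link that points at KEY's entry, or the NULL link at the end
   of the list if KEY is absent.  */
static struct demo_entry **
demo_find (struct demo_table *t, const char *key)
{
  struct demo_entry **link;

  assert (t != NULL && key != NULL);
  link = &t->head;
  while (*link != NULL && strcmp ((*link)->key, key) != 0)
    link = &(*link)->next;
  return link;
}

/* Insert only: an existing key is an error.  Returns an error message, or
   NULL on success.  */
const char *
demo_insert (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry **link = demo_find (t, key);
  struct demo_entry *e;

  if (*link != NULL)
    return "exists";
  e = malloc (sizeof *e);
  if (e == NULL)
    return "out of memory";
  e->next = NULL;
  e->key = key;
  e->value = value;
  *link = e;
  return NULL;
}

/* Replace only: returns the old value, or NULL if KEY is absent (in which
   case nothing is added).  */
void *
demo_replace (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry **link = demo_find (t, key);
  void *old;

  if (*link == NULL)
    return NULL;
  old = (*link)->value;
  (*link)->value = value;
  return old;
}

/* Insert or update, whichever applies.  */
const char *
demo_jam (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry **link = demo_find (t, key);

  if (*link == NULL)
    return demo_insert (t, key, value);
  (*link)->value = value;
  return NULL;
}

/* Remove KEY and return the value it had, or NULL if it was absent.  */
void *
demo_delete (struct demo_table *t, const char *key)
{
  struct demo_entry **link = demo_find (t, key);
  struct demo_entry *e = *link;
  void *old;

  if (e == NULL)
    return NULL;
  old = e->value;
  *link = e->next;
  free (e);
  return old;
}
```

In this sketch a caller that must not overwrite an existing definition uses
the insert form and checks the returned message, while a caller that wants the
latest definition to win uses the jam form.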

@node Test suite
@section Test suite
@cindex test suite

The test suite is fairly weak for most processors. Often it only checks
whether a couple of files can be assembled without the assembler reporting any
errors. For more complete testing, write a test which either examines the
assembler listing, or runs @code{objdump} and examines its output. For the
latter, the Tcl procedure @code{run_dump_test} may come in handy. It takes the
base name of a file, and looks for @file{@var{file}.d}. This file should
contain as its initial lines a set of variable settings in @samp{#} comments,
in the form:

@example
#@var{varname}: @var{value}
@end example

The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case
it specifies the options to be passed to the named program. Exactly one of
@code{objdump} or @code{nm} must be specified, since that choice also
determines which program is run after the assembler has finished. If
@var{varname} is @code{source}, it specifies the name of the source file; if no
@code{source} setting is given, @file{@var{file}.s} is used. If @var{varname}
is @code{name}, it specifies the name of the test to be used in the @code{pass}
or @code{fail} messages.
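
For instance, the header of a hypothetical test description might read as
follows. The file name, test name, and option values are invented for
illustration; only the variable names come from the description above.

```
#as: --defsym demo=1
#objdump: -dr
#name: hypothetical move instructions
#source: demo-move.s
```

Here only @code{objdump} is given, not @code{nm}, so @code{objdump} would be
the program run on the assembler's output.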

The non-commented parts of the file are interpreted as regular expressions, one
per line. Blank lines in the @code{objdump} or @code{nm} output are skipped,
as are blank lines in the @file{.d} file. Each remaining line of program
output is tested against the corresponding regular expression; if any line does
not match, the test fails.
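
The comparison just described can be approximated in C as below. This is only
a sketch: the real procedure is written in Tcl, whereas this version uses
POSIX extended regular expressions, whose syntax differs in places. The
@code{demo_} names are hypothetical, and two behaviours are assumptions of the
sketch rather than statements from the text above: each pattern is anchored to
the whole line, and leftover lines on either side count as a failure.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Return non-zero if LINE is empty or contains only whitespace.  */
static int
demo_is_blank (const char *line)
{
  return line[strspn (line, " \t\r\n")] == '\0';
}

/* Compare program output against expected patterns, one per line.
   Blank lines are skipped on both sides.  Returns 1 for a pass and 0
   for a failure.  */
int
demo_dump_matches (const char *const *output, size_t n_output,
                   const char *const *patterns, size_t n_patterns)
{
  size_t o = 0, p = 0;

  assert (output != NULL && patterns != NULL);
  for (;;)
    {
      char anchored[512];
      regex_t re;
      int n, status;

      while (o < n_output && demo_is_blank (output[o]))
        o++;
      while (p < n_patterns && demo_is_blank (patterns[p]))
        p++;
      if (o == n_output || p == n_patterns)
        break;

      /* Assumption: a pattern must match the entire output line.  */
      n = snprintf (anchored, sizeof anchored, "^(%s)$", patterns[p]);
      if (n < 0 || (size_t) n >= sizeof anchored)
        return 0;
      if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
      status = regexec (&re, output[o], 0, NULL, 0);
      regfree (&re);
      if (status != 0)
        return 0;
      o++;
      p++;
    }
  /* Assumption: pass only if both sides were fully consumed.  */
  return o == n_output && p == n_patterns;
}
```

The patterns passed in are the non-comment lines of the @file{.d} file; the
@samp{#} header lines would be stripped off before this step.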

Note that this means the tests must be modified if the @code{objdump} output
style is changed.

@bye
@c Local Variables:
@c fill-column: 79
@c End: