summaryrefslogtreecommitdiff
path: root/dis88/dis88.1
blob: 66a48f53b9760ac0feecf011537d9b6fc2e88ec3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -f -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified.  The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1).  If a symbol table is present in ifile, labels and
references will be symbolic in the output.  If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed.  If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment.  This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
If the "-f" option appears, dis88 will attempt to disassemble
any file whatsoever. It has to assume that the file begins
at address zero.
.PP
The program always outputs the current machine address
before disassembling an opcode.  If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally.  Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address.  In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table.  Internal data and
address references will already be resolved in the object file,
and cannot be recreated.  Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op.  This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition.  No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate.  This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes.  Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol.  If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type.  Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way.  Execution time is thus likely to increase
geometrically with input file size.  The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases.  If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature.  To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error.  In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.