Programming for pleasure: a 68k disassembler written in 68k assembly

Something a bit different: a post about programming for the sheer fun of it.

Though I haven’t posted here in a while I am still very busy with my hobbies and am at least two blog posts behind:

  1. I have ported my Matrix Display project (GitHub) to the ESP32. This was a fair chunk of work, but it was mostly enjoyable. I made a video demoing this and have put it on YouTube. In summary I’m very pleased with this; it’s now extremely easy to get this project running on any ESP32 board equipped with a connection to a 64×32 HUB75 panel. At least one off the shelf board is available. It also runs on an off the shelf Pi Pico 2 W based board. And I now have two panels in my house, the original one in the living room and a brand new ESP32-based panel in my office/lab!
  2. My iCE40HX development board (GitHub) runs well with its HDMI add-on board. I’ve currently parked these efforts but will document the work done as I’m rather chuffed to have built a board that makes use of HDMI.

And not wanting to miss out on all this AI nonsense, I’ve recently purchased an NVIDIA Thor AGX. The initial target for this is my Home Assistant setup, and applying the Thor to run local LLMs for such things as doorbell camera image analysis and, of course, for interactive use via a Nabu Casa Voice Preview Edition, but these systems have virtually unlimited uses. For completeness, I’d never cheapen my blog posts by relying on AI writing tools. I’ve also experimented with using AI for coding, but found it to be utterly hopeless at 68k assembly so have given up with it. I did use it for writing some comments early on though. But to me the coding, and other activities like circuit design, that I discuss in this blog are about fun and personal fulfillment; I have no desire to give that activity to a machine.

So back to the topic of this post: writing a 68000 disassembler in 68000 assembly.

There was and is a reason for this project. It is not a one-off; I am back on my 68k projects and the 68k is once again the main outlet for my project time. The initial goal is to resurrect and clean up my monitor code (GitHub) and make it a useful jumping off point to work on a full blown, from scratch multitasking Operating System, following some ideas from MAXI09OS (GitHub). My MC6809 targeted OS was a huge load of fun to do and it would be immensely satisfying to work on an OS again. And with an MC68030 processor at the heart of the MAXI030 board the possibilities for an OS for such a processor are pretty much limitless, as the Linux port shows. I’m already dreaming of my own multitasking graphical desktop. Who knows; I might even end up wanting to work on a new 68k based computer board. My good friend Steve Moody has made amazing progress with his FPGA based graphics card, and other software besides an OS could grab my interest, namely a game which makes use of yet to be implemented graphics card features. I might yet spin a new graphics card design for MAXI030 that borrows from what I learned producing a working HDMI expansion card for my iCE40HX development board.

On top of that I’ve found out, only a few days ago, that the Linux performance regression with the m68k port appears to have been fixed, meaning it should be possible for MAXI030 to run the latest Linux kernel and userland at a decent speed. I’ll be sure to give those changes a spin.

But back to the here and now.

Long time (very long time, it was more than a decade ago) readers of this blog might remember that I wrote an MC6809 disassembler in MC6809 assembly, so this is hardly a new idea for me. However, when starting out on this project I was extremely daunted by the relative complexity of the 68000 ISA compared to the 6809. At the very outset I decided I would limit myself to the core 68000 ISA and not look at any of the additional instructions and addressing modes available on the 68020 and later processors.

Before tackling a disassembler I knew that there were things I had to do first.

And the very first thing to do was to blow literal dust off my old girl, the MAXI030 board. It was great to see “her” running again!

(Yes I need to solder the rest of the expansion connectors.)

Observant readers might notice that there are two SIMMs attached. MAXI030 now has 64MB of RAM attached. A lofty amount and no mistake! I found a supplier on eBay who had matching 32MB parts available. Basically the same speed and capabilities as the previous stick I used. I confirmed operation by adjusting my memory controller VHDL code, and the 68000 routines to test both sticks and had no errors after leaving it running for a few hours. This was looking good. The memory controller is still not as fast as it could be, but that will come later.

So, onto the disassembler.

A key document I used as a sort of cheat-sheet is this Opcode Map (PDF):

I’ve included a screenshot of about a quarter of the document here, just as a frame of reference for my discussion on the writing of a disassembler.

The 68k uses a 16 bit (word, in 68k terminology) instruction word. An instruction is encoded into one or more words, with some instructions using multiple trailing extension words.

The colour coding above indicates some regularity to the instruction layout, which is essentially a requirement of the decode logic going on inside the control unit within the processor. For instance, the uppermost nibble at 15:12 is what’s known as the “line”. Apart from the (not shown) move instruction, it is never anything other than a constant value. The lowest three bits at 2:0 are usually a register index, either an address register or data register. And if the operation takes both register types, the type is encoded in another bit elsewhere in the word. Some operations also take a second register, usually a data register, at 11:9. You can also see the size of the operation (S) being encoded either in one bit or two bits; some instructions can only operate on words or longs, some can operate on bytes, words or longs. Other common bit positions are used for things like the branch condition, or the 3 bit wide immediate field in add and subtract “quick” as well as rotates and shifts, where it is used for the shifting amount.
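As a rough illustration of these common field positions, here is a small Python sketch (Python is used purely for readability; the disassembler itself is 68k assembly, and the function and key names are invented for this example). It pulls the fields out of 0x5681, which encodes addq.l #3,%d1:

```python
def common_fields(word):
    """Split a 16 bit instruction word into its commonly used bit fields."""
    return {
        "line": (word >> 12) & 0xF,    # bits 15:12, the "line"
        "reg_hi": (word >> 9) & 0x7,   # bits 11:9, second register or 3 bit immediate
        "size": (word >> 6) & 0x3,     # bits 7:6, when the size uses two bits
        "reg": word & 0x7,             # bits 2:0, register index
    }

# addq.l #3,%d1 encodes as 0101 0110 1000 0001
print(common_fields(0x5681))
```

Not every instruction uses every field, so a real decoder only looks at the fields that the matched instruction defines.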

After studying this PDF for maybe an hour the sinking sense of dread was replaced with hope. The ISA was more “regular” than I had assumed and it looked like a disassembler wouldn’t be an impossible task for me, even if it was written in assembler. It was also starting to look like it would be a lot of fun to write.

A key field, besides the register selection fields, is the “mode”. This is the 3 bit field marked M and indicates the addressing mode. As you can see from the table at the top right, there are many addressing modes and this includes such things as address register indirect (called Address in the table) and the postincrement and predecrement modes. But it is the “index” modes where things really get complicated, as these use the trailing extension words. A simpler example, which uses extension words but is not an index mode, is the instruction:

jmp (0xff002000)

Confusingly it seems there is no set standard for the usage of parens in this kind of instruction. Currently my disassembler outputs them, but I may take them out. GAS accepts and ignores them if they are given.

Anyway, this instruction would jump to the address given. It is encoded as:

4ef9 ff00 2000

The upper byte of the first word is 4e which is 0100 1110 in binary, with the upper two bits of the lower byte being 11. This forms the fixed portion of the instruction word, 10 bits in all. Looking at the opcode map it is easy to match this with the jmp instruction.

The remaining six bits, 11 1001, form the mode (left most 3) and register (right most 3). Splitting it up we have:

  • Mode of 111
  • Register of 001

Consulting the table at the top right of the image, we can see that this is the “Absolute Long” mode. Interestingly a Mode of 111 is like an escape sequence, with the final mode being encoded using the register field. The absolute long in question is held in two extension words, ff00 and 2000 in the example above, forming the final address for the jump.

We have just decoded an instruction!
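The same walk-through can be written as a minimal Python sketch. This is only an illustration of the steps above, not the disassembler's code; the function name is made up and only the Absolute Long case is handled:

```python
def decode_jmp_abs_long(words):
    """Decode a jmp with an Absolute Long operand from a list of 16 bit words.

    Returns the instruction text and the number of words consumed.
    """
    opcode = words[0]
    # The fixed portion is the upper 10 bits: 0100 1110 11
    if opcode & 0xFFC0 != 0x4EC0:
        raise ValueError("not a jmp")
    mode = (opcode >> 3) & 0x7   # bits 5:3
    reg = opcode & 0x7           # bits 2:0
    if mode != 0b111 or reg != 0b001:
        raise ValueError("only Absolute Long is handled in this sketch")
    # The address is held in the two extension words, high word first
    address = (words[1] << 16) | words[2]
    return f"jmp (0x{address:08x})", 3

print(decode_jmp_abs_long([0x4EF9, 0xFF00, 0x2000]))
```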

At this point I decided I’d spoil myself and finally buy a paper copy of the official Motorola Programmer’s Reference Manual (PDF). I’d been looking for a paper copy of this book for a while, and finally found one on eBay, shipped from the US. It’s strange, but I still love programming books. Much better than a Kindle, or trying to read large amounts of text on an LCD screen.

I decided that I didn’t want to rewrite the flash memory in MAXI030 every time I wanted to test out the disassembler code, so using the Ethernet transfer mechanism that I’d previously created to transfer the Linux kernel to MAXI030 was the way to go. I also needed a way to extend the monitor commands dynamically, that is, from the machine code blob transferred with the disassembler code. This ended up being easier than I expected; essentially the command array that describes the available monitor commands has an end marker and this now points to the next chain of commands in RAM. If the monitor “extension” hasn’t been loaded yet the pointers in RAM will be 0, ending the list. Otherwise another command array describes the extension commands. Of course before starting the disassembler coding I wrote a few test commands.
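The chaining idea can be modelled in a few lines of Python. This is a loose sketch under assumptions: the class, the command names and the handlers are all hypothetical, and a None link stands in for the zeroed pointers in RAM:

```python
class CommandTable:
    """A command array whose end marker links to an optional next table."""

    def __init__(self, commands, next_table=None):
        self.commands = commands        # list of (name, handler) pairs
        self.next_table = next_table    # None models a zero pointer

def lookup(table, name):
    """Walk the chain of tables until the command is found or the chain ends."""
    while table is not None:
        for cmd_name, handler in table.commands:
            if cmd_name == name:
                return handler
        table = table.next_table
    return None

# The built-in table, before any extension has been loaded
rom = CommandTable([("help", lambda: "built-in help")])
print(lookup(rom, "disasm"))            # extension not loaded yet

# Loading the extension fills in the link, making its commands visible
rom.next_table = CommandTable([("disasm", lambda: "extension disassembler")])
print(lookup(rom, "disasm")())
```

The appeal of this arrangement is that the lookup loop never needs to know whether an extension is present; an empty link simply ends the search.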

It was then time, at last, to start on the actual disassembler programming.

The basic mechanism is best described as a single pass disassembler; that is, instructions are decoded as they are read sequentially from memory and printed out. There is no provision for going back through the decoded instructions to pretty-up the output with such things as labels. Essentially each instruction is a singular piece of information; the instruction word followed by any needed extension words are read in, decoded, and printed to the console, with the process repeated for subsequent instructions.

The instruction encoding described above permits some patterns in the code to be employed, and a certain amount of reuse.

The core element in the code is a macro which is used to describe a single instruction:

An instruction then has the following fields:

  1. A “label”. This is just an internal item that links the instruction name, which is a string, to the rest of the structure.
  2. The instruction name, which the disassembler will output for this instruction
  3. A bit pattern for the instruction word
  4. A bit mask for the instruction word
  5. There are four subroutine pointers, which are all optional, which are used to decode and print the various fields:
    1. The condition which is used by branches
    2. The width (.b for byte etc)
    3. The source operand
    4. The destination operand

Here’s a very typical set of instructions, showing how they are defined:

These four instructions all follow the same structure. The mask field (the right one expressed in binary) is the starting point when understanding how the decoding works; it shows what bits are variable in an instruction and what bits are fixed. When walking this list of instructions, which is done top to bottom, the word to be decoded is masked off with this mask and then tested against the pattern. As soon as a match is found the instruction name is output and then each of the four decoding routines, if specified, is called in order.

So for these four instructions there is no condition to decode (if there is one, it is printed immediately after the instruction name without a space). There is however an instruction width, which is in the common position of 7:6 and decodes to either .b, .w, or .l (byte, word, long). There is also no source operand, only a destination operand, and it uses the common positions of 2:0 for the register and 5:3 for the mode.
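To make the mask and pattern walk concrete, here is a minimal Python model of the idea. The table entries are illustrative assumptions (negx, clr, neg and not share this layout on the 68000), the decoder names are invented, and the destination decoder only handles the two register direct modes:

```python
def width(word):
    """Decode the two bit size field at 7:6."""
    return (".b", ".w", ".l", ".?")[(word >> 6) & 0x3]

def dest(word):
    """Decode the destination from mode (5:3) and register (2:0)."""
    mode, reg = (word >> 3) & 0x7, word & 0x7
    if mode == 0:
        return f"%d{reg}"
    if mode == 1:
        return f"%a{reg}"
    return "<ea>"   # other modes are left out of this sketch

# name, pattern, mask, then the four optional decoders:
# condition, width, source operand, destination operand
TABLE = [
    ("negx", 0x4000, 0xFF00, None, width, None, dest),
    ("clr",  0x4200, 0xFF00, None, width, None, dest),
    ("neg",  0x4400, 0xFF00, None, width, None, dest),
    ("not",  0x4600, 0xFF00, None, width, None, dest),
]

def disassemble(word):
    """Walk the table top to bottom and decode the first matching entry."""
    for name, pattern, mask, cond, size, src, dst in TABLE:
        if word & mask == pattern:
            text = name
            if cond:
                text += cond(word)
            if size:
                text += size(word)
            operands = [decoder(word) for decoder in (src, dst) if decoder]
            if operands:
                text += " " + ",".join(operands)
            return text
    return "???"

print(disassemble(0x4242))
```

Adding an instruction to a table like this is a one line change as long as the decoders it needs already exist.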

There’s an important limitation of this disassembler that becomes apparent here: there is no attempt to block invalid addressing modes from being decoded for certain instructions. A specific example of this is the following instruction:

not.l %a0

This is not valid 68000 assembly; it is not possible to invert an address register, nor is it possible to, or logical to think you can, invert an immediate data item. However my disassembler will happily decode such instruction sequences if it finds them in memory.

The beauty, in my biased opinion, in laying out the disassembler in this way is that instructions can be added with very little work, assuming the “pieces of the puzzle” are already written. The general process was to look at an instruction in the Opcode Map and see if I had already written the individual decoders. If I had then great: all I had to do was add the macro call to the code to define the instruction using whatever existing decoder subroutines it needed. If it didn’t then I had to write the new decoder.

Upon each call, a decoder subroutine finds the registers to be in a given state, for instance %d2 holds the full instruction word. A typical decoder is this one:

This will print the 3 bit immediate at 11:9. It’s pretty simple:

  1. Output a hash
  2. Get the instruction word from %d2 and put it in %d0
  3. Shift the bits down into the 2:0 position
  4. Mask off all but the low 3 bits
  5. Print the result padded to a byte as hex (with the 0x prefix)
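The same five steps, expressed as a Python sketch rather than 68k assembly (the function name is made up, and the string is returned instead of being written to the console):

```python
def decode_imm3(word):
    """Format the 3 bit immediate at 11:9 as a hash followed by a hex byte."""
    value = word >> 9          # shift the field down into the 2:0 position
    value &= 0x7               # mask off all but the low 3 bits
    return f"#0x{value:02x}"   # hash, then the value padded to a byte

# 0x5681 is addq.l #3,%d1
print(decode_imm3(0x5681))
```

Note that on the 68000 a field value of 0 stands for 8 in addq, subq and the shift and rotate counts; this sketch, like the steps above, prints the raw field.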

Other decoders are a lot more complex. Besides the address mode decoder, an overview of which is given below, the one that took the most work was the movem (move multiple) register decoder. In assembly you typically write something like this:

movem.l %d2-%d4/%a0,-(%sp)

This will stack D2, D3, D4 and A0. The registers are represented in an extension word as a bitmask; from the manual:

Decoding this word and turning it into a reasonable looking string was a little bit of a challenge. My implementation does not print nice register ranges (like %d2-%d4 in the example above) but instead just prints each register with a slash between them. This could result in some extremely long lines!
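A minimal Python sketch of this slash separated style follows. It relies on the 68000's documented mask layout: bit 0 is D0 and bit 15 is A7, except in the predecrement mode where the order is reversed. The function name and the use of %a7 rather than %sp are choices made for this example only:

```python
def movem_registers(mask, predecrement=False):
    """Turn a movem register mask into a slash separated register list."""
    names = [f"%d{n}" for n in range(8)] + [f"%a{n}" for n in range(8)]
    selected = []
    for index, name in enumerate(names):
        # With -(An) as the destination the mask is bit reversed
        bit = 15 - index if predecrement else index
        if mask & (1 << bit):
            selected.append(name)
    return "/".join(selected)

# movem.l %d2-%d4/%a0,-(%sp) carries the mask word 0x3880
print(movem_registers(0x3880, predecrement=True))
```

Collapsing runs such as %d2/%d3/%d4 into %d2-%d4 would need an extra pass over the selected registers, which is why the simple form is so much easier to write.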

Turning now to the addressing mode decoding. From the Opcode Map document:

The first job is to decode the 3 bit M field. For the Data Register and Address Register modes the job is simple; a generic “print data register” subroutine was written which is called from various places. The same was done for address registers. For Addresses, Addresses with Postincrement and Addresses with Predecrement the task is much the same, only some additional characters need to be added to the output.
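The five simplest modes can be sketched as a lookup on the M field. This is an illustrative Python model with an invented function name; modes that need extension words are deliberately left out and raise a KeyError:

```python
def simple_mode(mode, reg):
    """Format the addressing modes that need no extension words."""
    templates = {
        0b000: "%d{r}",      # Data Register
        0b001: "%a{r}",      # Address Register
        0b010: "(%a{r})",    # Address
        0b011: "(%a{r})+",   # Address with Postincrement
        0b100: "-(%a{r})",   # Address with Predecrement
    }
    return templates[mode].format(r=reg)

for mode in range(5):
    print(simple_mode(mode, 3))
```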

For the Address with Displacement mode a 16 bit displacement (a word) must be read from the instruction stream. It is output along with the register. Address with Index is the most complicated. This mode supports compounded offsets which come from both an immediate 8 bit field and an additional data or address register. Here is an atypical instruction as it would appear in assembly source which uses the Address with Index mode for both the source and destination operands:

move.l (-12,%d2.l,%a4),(34,%d1.w,%a5)

This is thankfully as complicated as the 68000 gets, and nicely shows the classic CISC nature of the 68k. You can see that the address register is summed with an immediate displacement and a further displacement which comes from a data register, but as another complication the data register is either used as is (in long mode) or sign extended from a word. Also, instead of a data register an address register can be used.

This is all encoded in an extension word:

This is the version of this word from the 68020 and above; the 68000 does not support the scale field so we can ignore that:

  • D/A: sets whether the index register is an address or data register
  • REGISTER: the additional index register to use
  • W/L: whether the index register is to be used in word or long mode
  • DISPLACEMENT: the 8 bit displacement

So to decode an Address with Index operand we must extract the fields from this extension word, printing the base address register from the main instruction word, the index register (considering its type using the D/A field and whether it is used in word or long mode) and the displacement.

Currently all displacements are printed in unsigned hex. Not the most friendly, but it will suffice for now.
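Here is a hedged Python sketch of that extraction, using the 68000 layout of the extension word (D/A at bit 15, REGISTER at 14:12, W/L at bit 11, DISPLACEMENT at 7:0). The function names are invented, and the operand order simply mirrors the example instruction above:

```python
def index_fields(ext):
    """Extract the 68000 fields from an Address with Index extension word."""
    return {
        "is_address": bool(ext & 0x8000),   # D/A, bit 15
        "register": (ext >> 12) & 0x7,      # REGISTER, bits 14:12
        "is_long": bool(ext & 0x0800),      # W/L, bit 11
        "displacement": ext & 0xFF,         # DISPLACEMENT, bits 7:0
    }

def format_indexed(base_reg, ext):
    """Format the operand with the displacement in unsigned hex."""
    f = index_fields(ext)
    kind = "a" if f["is_address"] else "d"
    size = "l" if f["is_long"] else "w"
    return (f"(0x{f['displacement']:02x},"
            f"%{kind}{f['register']}.{size},%a{base_reg})")

# Extension words for the two operands of the example move.l
print(format_indexed(4, 0x28F4))
print(format_indexed(5, 0x1022))
```

The first operand shows the cost of unsigned hex: a displacement of -12 comes out as 0xf4.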

Program Counter modes are handled by the “print an address register” subroutine consulting another register and deciding if it should print %pc for the register or fall through to printing the address register.

The remaining modes, Absolute Short, Absolute Long and Immediate are fairly trivial in comparison to the Index modes. All of them extract data from the stream as extension words. The immediate data is sized according to the size as extracted from the instruction word, with bytes padded to words when they are read in.
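Reading sized immediate data might look like the following Python sketch. The function name and the single letter size codes are assumptions made for this example:

```python
def read_immediate(words, size):
    """Read immediate data from a list of extension words.

    Returns the value and the number of words consumed.
    """
    if size == "b":
        # A byte still occupies a whole word; only the low half is used
        return words[0] & 0xFF, 1
    if size == "w":
        return words[0], 1
    if size == "l":
        # A long is spread over two words, high word first
        return (words[0] << 16) | words[1], 2
    raise ValueError(f"unknown size {size!r}")

print(read_immediate([0x0012], "b"))
print(read_immediate([0x1234, 0x5678], "l"))
```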

A further complication with the mode decoding exists because the same code is used to decode the operand whether it is in the left (higher) or right (lower) bit positions within the instruction word. The right position is used in many instructions, whilst the left position is used only once, but in a very important instruction: move. There are other interesting corner cases which use the mode decoding code, but aren’t decoded from a 3 bit field; the cmpm (compare memory) instruction, which only supports Address with Postincrement, being one example. This decoding borrows code from the mode decoder.

There is one final important aspect of the instruction decoding to discuss, that of instruction match ordering. There are at least two places in the opcode map where instructions essentially overlap; the only difference between two completely different instructions is the modes supported. One example is the movep (move to/from peripheral) and bset (bit set) instructions. The movep instruction uses the same position in the map except that it fixes the mode at 001, Address Register, which is not supported by bset. To decode both instructions correctly my disassembler must test for movep first. If it tested for bset first it would never match a movep instruction.
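The overlap is easy to demonstrate with the two encodings. In this Python sketch the patterns and masks are taken from the 68000 encodings of movep and of bset with a data register bit number; the tuple layout and function name are invented for the example:

```python
# (name, pattern, mask)
MOVEP = ("movep", 0x0108, 0xF138)   # 0000 ddd1 xx00 1aaa, mode fixed at 001
BSET = ("bset", 0x01C0, 0xF1C0)     # 0000 ddd1 11mm mrrr, any mode

def first_match(word, table):
    """Return the name of the first table entry that matches the word."""
    for name, pattern, mask in table:
        if word & mask == pattern:
            return name
    return None

word = 0x01C8   # movep.l %d0,(0,%a0), first word only
print(first_match(word, [MOVEP, BSET]))   # the more specific entry is tested first
print(first_match(word, [BSET, MOVEP]))   # wrong order hides movep
```

A word such as 0x01C1 (bset %d0,%d1) fails the movep test and falls through to bset as intended.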

This pretty much covers the instruction decoding programming.

The last part of the disassembler work was to make the output pretty. I wanted to include the raw instruction words in the output, which all useful disassemblers do. This required some changes to the serial output mechanism, since the decoded instructions needed to go in their own column. Essentially the instruction text goes into its own buffer which is only printed after all instruction words have been read in and printed.

Here is a comparison of the objdump output vs my disassembler.

First objdump:

And now my disassembler:

A rough commentary on the things the disassembler does well, and the things it does not do so well:

  • The opening instruction is a movem to a fixed address. You can see that it does not factor in ranges of registers, and in fact the line doesn’t even fit in a standard terminal window. On the plus side I’m not sure objdump’s rendering of %d0-%sp is very clear either, though this is a bit of a strange instruction.
  • Objdump’s use of decimal integers for addresses, and signed ones at that, makes no sense at all.
  • The lea (3rd instruction) uses extended 68020 addressing modes, which my disassembler does not fully decode. The trailing extension words are rendered as ori.b instructions, which is obviously not correct.
  • In the bsr.w instructions you can see that whilst my disassembler does not know anything about symbols it does at least calculate the target of the branch.
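For reference, the branch target calculation for a word sized branch is short. On the 68000 the 16 bit displacement is signed and is relative to the address of the instruction plus 2. A Python sketch, with an invented function name:

```python
def branch_target(instruction_address, displacement_word):
    """Compute the target of a bsr.w or bra.w style branch."""
    # Sign extend the 16 bit displacement
    if displacement_word & 0x8000:
        displacement = displacement_word - 0x10000
    else:
        displacement = displacement_word
    # The displacement is relative to the instruction address plus 2
    return (instruction_address + 2 + displacement) & 0xFFFFFFFF

print(hex(branch_target(0x1000, 0x0010)))
print(hex(branch_target(0x1000, 0xFFFE)))
```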

All in all, I think my disassembler is pretty useful. Looking at objdump’s output might give something away of what the monitor will eventually be used for.

But most importantly this was a lot of fun to write; something few people will, probably, understand.

The code, for anyone interested, is in the GitHub repo, along with the rest of my monitor code.

On top of this work, I’ve also been part-way down an interesting side road: rewriting the MAXI030 FPGA design in Verilog.

My recent softcore projects have convinced me that I much prefer writing programmable logic in Verilog compared to VHDL and dragging out Quartus 9 to hack on my old FPGA code was fairly unpleasant. I knew that rewriting the glue code would be a fairly large investment in time but I also knew that the rewards would be substantial. I’d also have the opportunity to introduce test benches to the design, and at the same time I decided I could automate the compilation process by controlling the Quartus 9 installation, running on an old Windows 7 virtual machine, using SSH. Combined with tools I’d previously used for my softcore projects, namely Verilator and iVerilog, I could side-step some of the quirks of using an ancient version of Quartus and produce a tolerable workflow involving my Linux desktop running VSCode and a Makefile to hide much of the dirty grunt work required to produce and load an FPGA bitstream into the FPGA on the MAXI030 board.

Despite making some progress with this, I’m still currently running my old VHDL glue code, with some small improvements in the SIMM controller. Hopefully I can finish off the Verilog implementation, as it’s much more pleasant to work on. In the meantime I’m at least able to work on the VHDL implementation of the glue code from within VSCode, which is no small improvement.

I’m unsure what my next post will focus on. We shall see! So many interesting projects I could write about…
