The machine-code monitor for my MINI000 68000 SBC is coming along nicely. It’s been great fun and an excellent introduction to writing 68000 assembly.
Command processing consists of the following steps:
- Read a line of text from the serial port.
- Parse the line into command and argument portions.
- Locate the command from the set available. This will result in a “command description” pointer.
- Using data in the command description, validate the arguments.
- Call the command handler subroutine, passing in the arguments.
Reading in a command line is trivial and was done to verify UART functions. The getstr routine cannot yet deal with backspace though, just like the 6809 monitor in the early days.
Parsing the command line consists of extracting the command name (copying characters up to the first space, or null), and then extracting the hexadecimal values. This makes use of the generic asciitoint routine, previously written, which is used to populate two arrays, one of types (0=end of list, 1=byte, 2=word, 3=long) and another of the values, in long form. The type is the smallest that will hold the value input, which might well be used in a wider form by the command handler. This routine is, imaginatively enough, called parser.
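To make the type rule concrete, here is a minimal C model of what parser produces. It is a sketch under stated assumptions rather than the monitor's code. parse_args and smallest_type are hypothetical names. The standard strtoul stands in for asciitoint, which makes the sketch more lenient than a hand-written routine would be (for instance, it accepts a 0x prefix).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Type codes as described: 0 = end of list, 1 = byte, 2 = word, 3 = long. */
enum { T_END = 0, T_BYTE = 1, T_WORD = 2, T_LONG = 3 };

/* The smallest type that will hold the value. */
static uint16_t smallest_type(uint32_t value)
{
    if (value <= 0xffu)
        return T_BYTE;
    if (value <= 0xffffu)
        return T_WORD;
    return T_LONG;
}

/* Hypothetical stand-in for parser: reads space-separated hex values,
   filling types[] (max + 1 entries, terminated with T_END) and values[]
   (max entries, each held in long form). Returns the number of values,
   or -1 on a bad token or too many values. Values wider than 32 bits
   are silently truncated in this sketch. */
static int parse_args(const char *args, uint16_t *types, uint32_t *values, int max)
{
    int count = 0;
    char *end;

    for (;;) {
        while (*args == ' ')
            args++;
        if (*args == '\0')
            break;
        if (count == max)
            return -1;
        unsigned long v = strtoul(args, &end, 16);
        if (end == args || (*end != ' ' && *end != '\0'))
            return -1;                 /* not a clean hex token */
        values[count] = (uint32_t)v;
        types[count] = smallest_type((uint32_t)v);
        count++;
        args = end;
    }
    types[count] = T_END;
    return count;
}
```

For the input 10000 1 it yields the types 3, 1, 0 and the values 0x10000 and 1.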
Determining the command to be executed consists of walking the list of available commands and looking for a match. This routine is generic (not limited to use within the monitor) and is called strmatcharray. It uses a new, and mostly trivial, strcmp routine. The result is a pointer to a command description structure, consisting of a command handler subroutine pointer and a pointer to an array of maximum type widths. This array lists the widest argument the command will accept in each position. For example, a possible command “writebyte” might take two parameters:
- Address: maximum width: long (3)
- Value: maximum width: byte (1)
Next, the input types are compared against the types the command will accept. This consists of walking the two arrays in step and checking that each argument is no wider than the command will accept. This routine is called checktypes and probably has no use outside of the monitor.
So a writebyte call with the (hex) parameters 10000, 1 would be accepted: 10000 needs a long, which the address allows, and 1 fits in a byte. Any argument narrower than its maximum is simply zero-extended. But a writebyte call with the parameters 10000, 100 would fail, because 100 needs a word, which is wider than a byte.
A complication here is wanting to deal with sequences of the same type. This is used by an improved form of writebyte called, imaginatively enough, writebytes. This command will write any number of bytes in sequence. To handle this, the type-checking routine has an additional capability: if the high bit of an entry in the command’s maximum width list is set, that argument is repeatable. The bit may only be set on the last entry in the command definition.
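The checking rule, including the repeatable high bit, can be sketched in C as follows. check_types is a hypothetical stand-in for checktypes. Its NULL case mirrors the nocheckcommand variant described further on. Accepting zero occurrences of a repeatable argument is my assumption, not something the monitor is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPEAT_BIT 0x8000u  /* high bit of a max-width entry: repeatable */

/* Returns 1 if the input types (0-terminated) fit the command's maximum
   widths (0-terminated), else 0. */
static int check_types(const uint16_t *types, const uint16_t *max_widths)
{
    if (max_widths == NULL)          /* no list: skip checking entirely */
        return 1;

    while (*types != 0) {
        uint16_t max = *max_widths;
        if (max == 0)                /* more arguments than the command takes */
            return 0;
        if (*types > (max & ~REPEAT_BIT))
            return 0;                /* argument wider than allowed */
        types++;
        if (!(max & REPEAT_BIT))     /* a repeatable entry is not consumed */
            max_widths++;
    }
    /* Too few arguments fails, unless only the repeatable entry remains. */
    return *max_widths == 0 || (*max_widths & REPEAT_BIT);
}
```

With the maximum widths 3, 1|0x8000 (writebytes), the types 3, 1, 1, 1 pass. Against 3, 1 (writebyte) the same types fail, because there are too many arguments.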
To try to make things clearer it might be helpful to describe the command description in pseudo-C:
struct command_description { void *handler; uint16_t *max_widths; };
And the top level record:
struct command { char *name; struct command_description *cd; };
The commands are held in an array of struct command. The reason for using two structures instead of combining them into one is the command locating routine: if it is generic, and not specific to monitor commands, it can be used outside the monitor. It’s possible a future “named thing finder” will need to return pointers to other record types, and this would be messy if it had to return multiple pointers. That said, I’m sure there are other ways to structure the data. One that comes to mind is to pass a record length into the command finder routine and have it hop over the fields that don’t concern it to reach the next command name. On a match it would return a pointer to the matching struct command record rather than the cd pointer.
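For illustration, here is how the two structures and a generic lookup in the style of strmatcharray might look in C. The names find_command and handler_fn, and the NULL-name entry terminating the array, are assumptions made for this sketch. The table entries correspond roughly to what the checkcommand macro emits for writebytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*handler_fn)(const uint16_t *types, const uint32_t *values);

struct command_description {
    handler_fn handler;
    const uint16_t *max_widths;      /* NULL: arguments are not checked */
};

struct command {
    const char *name;
    const struct command_description *cd;
};

/* Hypothetical lookup: walk an array terminated by a NULL name and return
   the description of the first entry whose name matches, or NULL. */
static const struct command_description *
find_command(const struct command *list, const char *name)
{
    for (; list->name != NULL; list++)
        if (strcmp(list->name, name) == 0)
            return list->cd;
    return NULL;
}

/* Example entry, roughly what checkcommand builds for writebytes. */
static void writebytes(const uint16_t *types, const uint32_t *values)
{
    (void)types;
    (void)values;                    /* the real handler writes the bytes */
}

static const uint16_t maxtypes_writebytes[] = { 3, 1 | 0x8000, 0 };
static const struct command_description com_writebytes = {
    writebytes, maxtypes_writebytes
};
static const struct command commands[] = {
    { "writebytes", &com_writebytes },
    { NULL, NULL }                   /* end of list (assumed) */
};
```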
Building up the command array in memory is done using a macro which takes the command name and the maximum argument width array as a vararg:
.macro checkcommand name,maxtypes:vararg
        .section .rodata.com.name
        .align 2
name_\name:                             | name string of the command
        .asciz "\name"
        .section .rodata.com.maxtypes
        .align 2
maxtypes_\name:                         | zero-terminated max width list
        .word \maxtypes
        .word 0
        .section .rodata.com
        .align 2
com_\name:                              | the command description
        .long \name                     | handler pointer
        .long maxtypes_\name            | list of maxtypes pointer
        .section .rodata
        .align 2
        .long name_\name                | name pointer (top-level array entry)
        .long com_\name                 | command descriptor
.endm
This macro is hard to follow but it basically creates the needed constants, for each of:
- The top level command array.
- The maximum argument width array.
- The command name.
- The command description.
Each item goes in its own subsection of .rodata, which causes the linker to group like items together into lists. The items are referenced using labels formed from the kind of thing being referenced and the command name.
Of course the command data could be constructed by hand, but the macro turns defining a command into a one-liner, e.g.:
checkcommand "writebytes", 3,1+0x8000
Note that the command name is also the name of the command’s subroutine.
There is also a variant macro, nocheckcommand. It creates a command description with a null pointer to the maximum width array, which causes the argument check routine to be skipped. This allows arguments of any width to be passed in, with validation possibly done by the command handler itself. The parser test command uses this to accept any combination of arguments (or none).
Inside the command handler subroutine itself, obtaining the arguments is trivial: the type array is in one address register and the value array is in another. Obtaining a particular argument’s value can be done using address register indirect addressing with a constant displacement. Normally the command is not interested in the type (width) of the argument, which simplifies things even further.
So far I’ve implemented seven commands (ignoring a help command and one to test the parser). Four of the seven are variants, so in reality there are just three commands.
dump address.l length.l
(I’ve borrowed the .l, .w and .b notation from 68K assembly.)
This is the most complicated command. It dumps out words from the given address. The length is the number of bytes to show. The output is prettified with an ASCII display. This command is similar to the one I wrote for the 6809, although the implementation is quite a bit simpler thanks to the number of registers in the 68000.
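The general shape of a dump line can be sketched in C. The layout below is a hypothetical format, not necessarily the monitor’s exact output: the address, then big-endian words, then an ASCII column with '.' for unprintable bytes. dump_line is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats up to 16 bytes (n even) as one dump line into out, which must
   hold at least 80 chars: address, words, then printable ASCII. Short
   lines are not padded, so their ASCII column is not aligned. */
static void dump_line(char *out, uint32_t addr, const uint8_t *mem, size_t n)
{
    char *p = out;

    p += sprintf(p, "%08lx:", (unsigned long)addr);
    for (size_t i = 0; i + 1 < n; i += 2)        /* one word at a time */
        p += sprintf(p, " %02x%02x", (unsigned)mem[i], (unsigned)mem[i + 1]);
    *p++ = ' ';
    *p++ = ' ';
    for (size_t i = 0; i < n; i++)               /* ASCII column */
        *p++ = (mem[i] >= 0x20 && mem[i] < 0x7f) ? (char)mem[i] : '.';
    *p = '\0';
}
```

For example, the bytes 48 69 21 00 at 0x1000 format as 00001000: 4869 2100  Hi!.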
readlong address.l
Displays the long value at the given address. The address itself points to the most significant byte of the long. readword and readbyte commands are also available. For words and longs the address must be even; otherwise an address error exception (not currently handled) will be generated. Also, since each argument value is always zero-extended into a long held in memory, a byte-sized argument is obtained at address register + (arg number * 4) + 3, because it is the least significant byte of the long and so comes last. A word-sized argument is at + 2.
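The offset arithmetic follows from the 68000 being big-endian: each argument occupies a four-byte slot with its most significant byte first. The small C model below uses hypothetical helpers, store_long_be and arg_offset. It shows why a byte argument sits at slot + 3 and a word argument at slot + 2.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a long into a four-byte slot as the 68000 would: most
   significant byte at the lowest address. */
static void store_long_be(uint8_t *slot, uint32_t v)
{
    slot[0] = (uint8_t)(v >> 24);
    slot[1] = (uint8_t)(v >> 16);
    slot[2] = (uint8_t)(v >> 8);
    slot[3] = (uint8_t)v;
}

/* Byte offset of argument n within the values array, for an operand of
   size 1, 2 or 4 bytes: the low-order part is at the end of the slot. */
static unsigned arg_offset(unsigned n, unsigned size)
{
    return n * 4 + (4 - size);
}
```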
writelongs address.l value1.l value2.l …
Writes the list of long values into memory, starting at the specified address. As with readlong, the address points to the most significant byte. writewords and writebytes are also available. For the word and long variants, the address must be even.
An example use of these commands is as follows:
The commands have been used to manipulate the LED and make the buzzer buzz, in addition to playing with memory.
The code, for anyone interested, is in the 68k-monitor repo on my GitHub account.
I’ve also used the monitor to exercise an eight bit addressable latch, a 74HC574, attached to some DIP switches. This is the same IC used to attach digital joysticks to the 6809 in the MAXI09 board. Looking at the datasheet, the part appeared fast enough to operate without any wait states, and sure enough it worked fine. From the monitor, the state of the switches could be obtained with a readbyte of the appropriate address. Here’s a picture of the breadboard:
The /OE pin on the ‘574 is attached to USER0 on the MINI000 expansion connector. This is asserted by the address decoder within the CPLD when A23..A20 = 0011, i.e. in the region starting at address 0x00300000. Because the latch is attached to the data bus via D7..D0, the IC can only be read at odd addresses, e.g. 0x00300001. I’m not qualifying the access with /READ, which I really should be doing, since a write to 0x00300001 will cause contention between the latch and the MPU. This is something to improve if I use simple addressable latches in a later board.
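The decode and byte-lane rules above can be modelled in a few lines of C. This sketch covers the logic only. The CPLD equations presumably also involve strobes such as /AS and /LDS, which are ignored here. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* USER0 (driving the latch's /OE) is asserted when A23..A20 = 0011.
   Only the low 24 address bits reach the 68000's bus. */
static int user0_selected(uint32_t addr)
{
    return ((addr >> 20) & 0xfu) == 0x3u;
}

/* On the 68000, the byte at an even address travels on D15..D8 and the
   byte at an odd address on D7..D0, so a device wired to D7..D0 is only
   readable at odd addresses. */
static int on_low_byte_lane(uint32_t addr)
{
    return (addr & 1u) != 0;
}
```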
I’ve also had success in unlocking my previously useless AT28C256 EEPROMs:
The part is, again, attached to the low half of the data bus, so when dumping out the ROM the upper byte of each word is random. Nonetheless, it’s possible to verify that the EEPROM is storing written data:
This was tested with my only spare usable AT28C256. The next step was to verify that it was possible to lock and unlock the memory (called software data protection in the datasheet). The following is an excerpt from the datasheet showing the special memory writes that lock and unlock the device:
Translating this to code was trivial enough:
.equ EB,0x300000                        | base address of the EEPROM
unlock:                                 | disable software data protection
        move.b #0xaa,EB+(0x5555*2)
        move.b #0x55,EB+(0x2aaa*2)
        move.b #0x80,EB+(0x5555*2)
        move.b #0xaa,EB+(0x5555*2)
        move.b #0x55,EB+(0x2aaa*2)
        move.b #0x20,EB+(0x5555*2)
        rts
lock:                                   | enable software data protection
        move.b #0xaa,EB+(0x5555*2)
        move.b #0x55,EB+(0x2aaa*2)
        move.b #0xa0,EB+(0x5555*2)
        rts
The *2 shifts the address up one position. This is necessary because the EEPROM’s A0 is wired to the CPU’s A1, its A1 to the CPU’s A2, and so on.
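The address translation and the two write sequences from the listing can be expressed in C as a quick sanity check of the arithmetic. cpu_addr and the sdp_* tables are hypothetical names; EB matches the .equ in the listing.

```c
#include <assert.h>
#include <stdint.h>

#define EB 0x300000u  /* EEPROM base address, as in the .equ */

/* EEPROM A0 is wired to CPU A1, A1 to A2 and so on, so EEPROM address e
   appears at CPU byte address EB + e * 2. */
static uint32_t cpu_addr(uint16_t eeprom_addr)
{
    return EB + (uint32_t)eeprom_addr * 2u;
}

/* Software data protection sequences as (EEPROM address, data) pairs. */
struct sdp_write { uint16_t addr; uint8_t data; };

static const struct sdp_write sdp_lock[] = {
    { 0x5555, 0xaa }, { 0x2aaa, 0x55 }, { 0x5555, 0xa0 },
};
static const struct sdp_write sdp_unlock[] = {
    { 0x5555, 0xaa }, { 0x2aaa, 0x55 }, { 0x5555, 0x80 },
    { 0x5555, 0xaa }, { 0x2aaa, 0x55 }, { 0x5555, 0x20 },
};
```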
These lock and unlock commands work perfectly:
You can see the write operation does nothing whilst the memory is locked.
The next step was to try unlocking the memories I purchased previously, which I could not use with my home-made programmer. As a reminder, I received two sets of five:
- The first set had data loaded in them but refused to write.
- The second set read back 0xff at every location, refused to write, and were suspected fakes.
The first set unlocked fine. For completeness, I have verified that I can now program them with my home-made programmer.
The next step was to try to unlock the “fakes”. I’m embarrassed to report they unlocked fine; they are, in fact, perfectly good parts. This means I have no fewer than 11 spare, usable AT28C256s.
The next IC to attach will be an MC68230, a parallel interface and timer similar in functionality to the 6522. The only interesting angle here is that the 68230 I have is only rated at 10MHz, but I have a 16MHz oscillator driving the 68HC000. This will therefore require the insertion of wait states.
I believe I’m at a point where I can give some initial thoughts on what the 68K family is like to program in assembly. Bear in mind that my point of comparison is mainly my experience of programming the 6809. The jump up from a lesser 8 bit CPU, like the 6502, would seem even bigger. Also bear in mind that I’ve not looked into the processor much from a systems point of view, only from a user programmer’s point of view. For instance, I have not looked at exceptions or other things that would concern a systems programmer, e.g. someone writing an OS.
First up, the good.
The number of registers is a massive step up compared to what I’m used to. I have yet to have the need to place local variables on the stack; I just use another register.
Obviously, manipulating 32 bit quantities is no harder than manipulating 8 bit quantities, though there are some exceptions, described below. Referencing the registers as .b, .w and .l is far more elegant than the x86 approach of giving the wider registers longer names.
As I mentioned in a previous post, I’ve not had the luxury of a dbra instruction before, and I find I’m using it rather a lot. It’s a very compact way to implement loops.
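dbra (an assembler alias for dbf) decrements only the low word of the data register. It falls through once that word reaches -1, so the loop body runs one more time than the starting count. Here is a C model of the iteration count, using the hypothetical name dbra_iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Models "loop: ... dbra %d0,loop" and returns how many times the loop
   body runs for a given starting d0. Only the low word of d0 takes part;
   the loop exits when that word wraps from 0 to 0xffff (-1). */
static unsigned long dbra_iterations(uint32_t d0)
{
    unsigned long runs = 0;
    uint16_t counter = (uint16_t)d0;   /* upper word is ignored */

    for (;;) {
        runs++;                        /* the loop body */
        counter--;                     /* dbra: decrement low word */
        if (counter == 0xffffu)        /* reached -1: fall through */
            break;
    }
    return runs;
}
```

This is why loops are normally entered with the count minus one in the register. It also means a starting value of 0x10000 runs the body once, not 65537 times.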
Data moves that don’t go through a register are superb. For example, to move a byte from a UART receive register to memory, through an address register with post-increment:
move.b RHB2681,(%a0)+
Testing and then setting or clearing a particular bit in one go is a nice “high level” instruction with a surprising amount of utility. These are the bset and bclr instructions.
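bset and bclr first set the Z flag from the addressed bit (Z is set if the bit was 0), then set or clear that bit. A single instruction therefore both tests and updates. Below is a C sketch of the memory (byte) form, with hypothetical names. For a memory operand the bit number is taken modulo 8.

```c
#include <assert.h>
#include <stdint.h>

/* Models "bset #bit,<memory>": returns the resulting Z flag. */
static int bset_byte(uint8_t *mem, unsigned bit)
{
    uint8_t mask = (uint8_t)(1u << (bit % 8));
    int z = (*mem & mask) == 0;        /* test the old value */
    *mem |= mask;                      /* then set it */
    return z;
}

/* Models "bclr #bit,<memory>": returns the resulting Z flag. */
static int bclr_byte(uint8_t *mem, unsigned bit)
{
    uint8_t mask = (uint8_t)(1u << (bit % 8));
    int z = (*mem & mask) == 0;        /* test the old value */
    *mem &= (uint8_t)~mask;            /* then clear it */
    return z;
}
```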
There are, essentially, 7 stack pointers. Other than the return address for a subroutine, which must use A7, the rest of the address registers can all be used to create stacks. The 6809 almost had this with its pre-decrement and post-increment addressing modes, but pushes and pulls could only be done on s and u. The 68000 equivalent, movem (move multiple), can use any address register.
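Pushing and popping through an arbitrary address register is just pre-decrement and post-increment addressing. Here is a C model of a descending stack on an arbitrary pointer. push and pop are hypothetical helpers, and they handle one long at a time rather than movem’s register lists.

```c
#include <assert.h>
#include <stdint.h>

/* Like "move.l %d0,-(%a1)": decrement the pointer, then store. */
static void push(uint32_t **sp, uint32_t v)
{
    *--(*sp) = v;
}

/* Like "move.l (%a1)+,%d0": load, then increment the pointer. */
static uint32_t pop(uint32_t **sp)
{
    return *(*sp)++;
}
```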
Now the not-so-good or, perhaps, the unexpected.
First up, the general state of consistency within the ISA. In general it is reasonably consistent. It is certainly not a completely orthogonal ISA, and some deliberate inconsistencies result in a better overall instruction set for writing real code. An example of this is the moveq (move quick) instruction. It detracts from the orthogonality but is clearly there for a reason. Loading a data register with a small value is a common occurrence, and moveq does it in a single instruction word, where move.l with a 32 bit immediate would take three. There are, interestingly, no increment instructions in the 68K. Instead there is addq, which adds a value from 1 to 8 to a data register, an address register or memory in a single instruction word. This should probably be considered a positive, since otherwise manipulating address registers (moving them forward a few bytes) would require at least two instruction words.
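moveq holds its 8 bit immediate inside the opcode word and sign-extends it to all 32 bits of the data register, so moveq #-1 gives 0xffffffff. Here is a C model of both the result and the encoding (0111 rrr0 iiiiiiii), using hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* The value moveq leaves in the register: the immediate, sign-extended. */
static uint32_t moveq(int8_t imm)
{
    return (uint32_t)(int32_t)imm;
}

/* The single opcode word: 0111, register number, 0, 8 bit immediate. */
static uint16_t moveq_opcode(unsigned dreg, int8_t imm)
{
    return (uint16_t)(0x7000u | ((dreg & 7u) << 9) | (uint8_t)imm);
}
```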
One of the key areas is the split between address and data registers. The 68K is fairly unusual here, I believe, in separating out the registers in this way. It feels like a middle ground between the classical accumulator/index register orientated model of the 8 bit MPUs and the later RISC model where all registers are equal. In summary, I believe that once you are used to the split the 68K forces on register usage, it feels natural and is not a hindrance. One thing that is a little peculiar about the split, however, is that loading an address register does not change the condition codes. This is sometimes useful, sometimes not. It is the reason a subroutine which returns an address would usually put it in a data register: the caller can then use the zero flag to detect an error (for example, a memory allocation routine returning a null pointer).
In general, displacements are limited to 16 bits. This, amazingly, includes branches (which also have a short 8 bit form) as well as address register indirect with displacement. This was corrected (or improved, depending on how you look at it) in the 68020 and later processors, but the lack of relative branches over the full address range in the 68000 seems like an unfortunate inconvenience to me.
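A 68000 branch displacement is measured from the branch instruction’s address plus two, and the long form holds it in a 16 bit extension word. Here is a C check of whether a target is reachable; fits_16bit_branch is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if target can be reached by a Bcc/BRA at branch_addr using the
   16 bit displacement form. */
static bool fits_16bit_branch(uint32_t branch_addr, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)branch_addr + 2);
    return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

Anything out of range has to use jmp or jsr with an absolute address instead.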
The operation of bit shifting is strange. If shifting a memory location (however that memory location was arrived at), the operand must be a word and only one bit can be shifted. However, if you are shifting a data register, the shift count can be anything from 0 to 63 (it is taken modulo 64) if it comes from another data register, or from 1 to 8 if it is an immediate value. This behaviour likely reveals much about the internal workings of the machine. It’s also interesting to note that the 68000 lacks a barrel shifter, so the time taken grows with the count: each bit shifted costs two clock cycles. I assume that a barrel shifter was added in later family models, but I’m not certain.
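The shift-count rules can be summarised in C. The helpers are hypothetical. The cycle formula follows the register-shift timings in the 68000’s instruction timing tables as I understand them (6+2n clocks for byte and word, 8+2n for long), and it is worth double-checking against Motorola’s manual.

```c
#include <assert.h>
#include <stdint.h>

/* "lsl.l %d1,%d0": the count register is used modulo 64. */
static uint32_t lsl_l_by_register(uint32_t value, uint32_t count_reg)
{
    unsigned count = count_reg % 64u;
    return count >= 32u ? 0u : value << count;  /* avoid C's UB for >= 32 */
}

/* "lsl.l #n,%d0": a 3 bit field in the opcode where 0 encodes 8. */
static unsigned immediate_count(unsigned field)
{
    field &= 7u;
    return field == 0u ? 8u : field;
}

/* Register shift time in clocks: two clocks per bit shifted. */
static unsigned shift_cycles(unsigned n, int is_long)
{
    return (is_long ? 8u : 6u) + 2u * n;
}
```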
Bit tests, as I mentioned above, are very useful. But again, if the item being tested is data in memory, only bits 0 to 7 can be tested (the bit number is taken modulo 8), i.e. it is a byte operation. If instead a data register is being tested, any of the full 32 bits can be used. I suspect one reason this was not considered a major concern is that a key use of this instruction is testing IO ports, which (at least on early 68000 systems) were more often than not 8 bits wide. Another, more obvious, reason why this is not really an issue is that the effective address can be adjusted to access the correct byte.
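Adjusting the effective address works because of the big-endian layout. Bit n of a long at address a lives in the byte at a + 3 - n/8, as bit n mod 8 of that byte. Here is a C sketch with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address holding bit n (0..31) of a big-endian long at address a. */
static uint32_t bit_byte_addr(uint32_t a, unsigned n)
{
    return a + 3u - n / 8u;
}

/* Bit number within that byte. */
static unsigned bit_in_byte(unsigned n)
{
    return n % 8u;
}
```

So testing bit 12 of a long at 0x1000 becomes btst #4 on the byte at 0x1002.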
For some strange reason, bitwise XOR (the eor instruction) and bitwise AND (the and instruction) have different constraints. For eor the source must be a data register and cannot be a memory location, whereas and also accepts a memory source when the destination is a data register.
Memory indirect addressing is not available at all, even with constants. On the 6809 one could use the following instruction:
lda [,x]
This would place the content of the x register on the address bus, read the two bytes at that location into a private register, put those back on the address bus and then read the byte at that address. This is a pretty complex instruction for an 8 bit processor that was not microcoded. It is not even the most complex, since it is also possible to load a 16 bit register using memory indirect addressing. Under the 68K family this instruction looks like:
move.b ([%a0]),%d0
This is not available in the 68000, but is available in the 68020 and later. This is not a massive problem, as it is trivial to add an additional move:
movea.l (%a0),%a1       | fetch the pointer stored at (a0)
move.b (%a1),%d0        | then the byte it points to
The reason for using the a1 register here is to avoid changing a0. If that does not matter, a0 could be reused. The above would, of course, be slower than a single instruction.
That’s probably about it. I’m sure I’ll find more good and bad aspects as I continue learning about and writing 68K code. Overall it is a much more pleasant experience writing code for it than for the, already solid, 6809. But there are, perhaps, more subtle restrictions to consider.
One excellent reference I found documenting the quirks of the 68000, which I believe is based on contemporary text, can be found here.
I have a few ideas for things I could look at next:
- Possibly the most interesting, and challenging, would be to start working on a VGA signal generator. I need to order a few things before tackling that though, like a 25.175MHz oscillator for generating the pixel clock.
- I could explore interrupts. The 68000 has a significantly more complex way of dealing with interrupts than the 6809.
- Related to the above, I could write some exception handlers. It would be nice to report on address errors and other exceptions instead of just leaving the MPU to run some random code.
- There’s also the above mentioned MC68230 to breadboard up. It would be nice to get that working as it could be used to prove I fully understand wait states and the usage of the /DTACK signal.
- The monitor could be extended to run, and debug, user code. This would involve not only showing the status of registers when the monitor was re-entered through a trap, but also possibly single stepping via usage of the trace mode.
As usual, there’s lots and lots of choices!