More MAXI000 construction and progress towards an EmuTOS port

The absolute minimum hardware requirements to port EmuTOS to a 68000-based micro are straightforward enough: all that is needed is upwards of 512KB of RAM, about 256KB of ROM and some kind of character IO device.

Of course such a setup would not be very useful, as it would have no GUI or storage. Only a fairly limited command-line, called EmuCON, would be available.

EmuTOS was originally written as a TOS re-implementation targeted for running on Atari ST and derivative machines. But it has since been ported to run on new 68K based hardware, including the impressive looking Firebee and even the Amiga. It also runs on the 68008 based Kiwi.

After cloning the repository, it was pretty simple to add a new hardware definition to describe the variances needed for MAXI000: its amount of SRAM, and the fact it was not an Atari ST-variant computer.

Pretty soon I realised it would be a good idea to join the EmuTOS-devel mailing list. I found lots of amazing advice there.

Character IO was initially tied to the VGA display, with the Amiga 600 keyboard used for input. With some help from the folks on the mailing list, I managed to get an initial boot.

The simplest possible approach to hooking the existing console-handling code into EmuTOS was to link in my assembly routines, written for my machine-code monitor, and then call them from within EmuTOS’s BIOS handling in C. This required learning what calling conventions were used, and writing a C language header for the assembly subroutines that needed to be called, but it was very simple to get an initial boot up generating output on the console:

The “junk” characters are VT52 escape sequences for changing text colours and styles which the console handling code does not yet honor.

At this point input was not working. Further, lack of a timer interrupt was creating issues. To get the boot process to work at all a delay loop had to be commented out.

Adding a timer interrupt was pretty interesting.

The 68000 has a fairly sophisticated interrupt mechanism. There are 7 priorities and the vector (index into the table of interrupt service routines) can be obtained either from the priority level (autovectoring) or directly from the databus. It is possible for a single computer to use both methods.

The rough sequence for an autovectored interrupt is as follows:

  1. Glue logic asserts an interrupt level
  2. Processor notices the interrupt at the start of the next instruction, and if it has a higher priority than the current level, it acts on it
  3. Processor indicates it is running an interrupt acknowledgement cycle by setting the FC outputs to 111, with the interrupt level it is acknowledging going on A1 to A3
  4. Glue logic ensures /DTACK remains in the negated state and asserts /VPA
  5. Processor starts the interrupt routine at a calculated address using the table of 7 autovectors as the base and the interrupt priority as the offset

Vectored interrupts are processed in the same way initially, but from step 4:

  1. Glue logic asserts /DTACK and proceeds as if this is an ordinary read cycle, placing the vector number (0 to 255) on the low half of the databus
  2. Processor runs the indicated exception, regardless of what it is

The 256 entry exception vector table, fixed on the 68000 at location 0, has all but 64 entries reserved for use by the user (the 64 allocated entries are used for things like division by zero, illegal instruction, and the 7 autovectors). Glue logic can place any vector number on the databus, though it could get confusing if an interrupt causes the division by zero vector to run!
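As a rough illustration of how a vector number turns into a table location in both cases, here is a minimal C sketch. It assumes the standard 68000 numbering, in which the autovectors for levels 1 to 7 are vectors 25 to 31 and every table entry is a 4 byte handler address. The function names are made up for this example.

```c
#include <stdint.h>

/* Each exception vector table entry is a 32 bit handler address. */
#define VECTOR_SIZE 4u

/* On the 68000 the autovectors for levels 1..7 are vectors 25..31
 * (vector 24 is the spurious interrupt vector). */
#define AUTOVECTOR_BASE 24u

/* Table entry address for a vector number supplied on the databus. */
static uint32_t vector_entry_address(uint8_t vector)
{
    return (uint32_t)vector * VECTOR_SIZE;
}

/* Table entry address used for an autovectored interrupt level (1..7). */
static uint32_t autovector_entry_address(uint8_t level)
{
    return vector_entry_address((uint8_t)(AUTOVECTOR_BASE + level));
}
```

With these assumptions a level 1 autovector lives at 0x64, level 7 at 0x7C, and the first user vector (64) at 0x100.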

Writing the VHDL for a simple timer in Beta was easy enough. A 16 bit addressable register holds the countdown interval, and each time the countdown counter reaches zero it is reset to this value and an interrupt is generated. The interrupt is cleared by writing a zero into the maximum value register, which saves adding another address for this purpose. This mechanism has the key advantage of keeping the timer interrupt regular regardless of how long the processor takes to acknowledge the interrupt. 16 bits of counter does not allow long counts however – 8.2ms assuming an 8MHz clock. The counters could either be widened and spread across two 16 bit addressable registers or pre-scaled, to allow longer intervals.

Processing interrupts on the MAXI000 board required some additional changes because although Beta receives external interrupt signals from peripheral ICs (like the SC16C654 quad UART) it does not know when the 68000 is running an interrupt acknowledge cycle, as it does not have access to the Function Code signals. The solution to this was to use one of the three signals between Alpha and Beta to indicate this state. Beta can then present the vector number on the databus: it knows that an interrupt is pending from, for example, its own timer state; it knows that the processor is running an interrupt acknowledgement cycle from the new signal; and it knows which interrupt priority is being acknowledged by examining A1 to A3, which it has access to (Beta is attached to A1 to A5).

As usual all the hardware hacking was prototyped in the monitor. The timer ISR simply increments a counter after clearing the interrupt by writing a zero to the maximum count register. Setup consists of setting up the exception vector for the timer interrupt and configuring the timer interval.

This all works pretty well. The timer interval set gives a 200Hz interval with a system clock at 8MHz. 200Hz is the TOS standard timer frequency and it seemed a reasonable number to start with.
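The arithmetic behind the interval value can be sketched in C as below. This is only an illustration: the function names are invented, and it assumes the counter ticks once per system clock, which is what the 8.2ms figure for a 16 bit counter implies. Whether the hardware counts N or N+1 ticks for a register value of N is not covered here.

```c
#include <stdint.h>

/* Number of clock ticks between interrupts for a given rate (rate_hz > 0).
 * Returns 0 if the value does not fit in the 16 bit interval register. */
static uint16_t timer_interval_for_rate(uint32_t clock_hz, uint32_t rate_hz)
{
    uint32_t ticks = clock_hz / rate_hz;
    return (ticks > 0xFFFFu) ? 0 : (uint16_t)ticks;
}

/* Longest interval a 16 bit counter can time, in microseconds. */
static uint32_t timer_max_interval_us(uint32_t clock_hz)
{
    return (uint32_t)((65536ull * 1000000ull) / clock_hz);
}
```

At 8MHz a 200Hz tick needs an interval of 40,000, which fits comfortably, while a 100Hz tick (80,000) would already need a wider counter or a pre-scaler.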

This only covers vectored interrupts. To prove my understanding I did have a go at autovectored interrupts as well. This required using the third signal between Alpha and Beta to indicate that Alpha should negate /DTACK for the duration of the interrupt acknowledge cycle while Beta asserts /VPA. It cannot do this unconditionally because the interrupt in question might be an ordinary vectored one. This all worked as expected, but because it uses an additional signal (out of only three) between Alpha and Beta and because vectored interrupts are generally superior anyway, I will stick to using vectored interrupts.

Note that unlike the MAXI09 board it is currently not possible to configure interrupt routing from the processor.

For completeness, and because it would end up being useful under EmuTOS, a vertical blank interrupt has also been implemented. This interrupt triggers at the bottom of the visible portion of each video frame. To save writing too much extra VHDL code, this interrupt is reset (cleared) by reading from the timer interval register, something that ordinarily is never done.

After sorting out basic interrupts, and getting EmuCON to allow input, the next step was to attach the IDE-related hardware and test it out with the monitor. Here’s a picture of the board as it currently stands, with an IDE to CompactFlash adapter in place:

This all worked as expected. MAXI000 incorporates some hardware not present in MINI000’s IOBoard IDE interface, namely some bus drivers and terminating resistors. The databus uses a pair of 74HC245 (PDF) bi-directional bus drivers, which were a first for me (MAXI09 included them on the expansion connector but I never got around to exercising them). The resistor arrays were a little tricky to solder but otherwise this part of the build presented no issues.

I have tested the little “performance hack” in the IDE wiring whereby the address bus is shifted up one position. This allows the 68000 to perform 32 bit reads and writes on the 16 bit IDE databus, since the data register is essentially presented at two adjacent 16 bit registers, which saves a few cycles, and an instruction, per operation. The 68000 even does the operation in the correct order, so 32 bit longs do not need their words to be swapped in code. I’m not sure if the order of such operations is defined in the 68000 user manual or if this works only through luck.

Ignoring the keyboard, an essential input device for an OS with a GUI is the mouse. I’d previously experimented with a PS/2 port bodged onto the MINI000 board, and MAXI000 includes such a port. After soldering the connector and its four pull-up resistors, and pasting the previously written VHDL into the project for Beta, I verified that the PS/2 port could be read (and written) by the 68000, just like before. The next step was to write a proper test routine for the monitor. This test routine would read mouse movements and draw a dotted line when the left mouse button was pushed.

This test routine first required a bitmap mode running on the Beta FPGA. I had previously experimented with a monochrome bitmap mode, but Atari ST TOS usually runs in four colours at a much lower resolution, and EmuTOS copies this format. It made sense, at this point, to extend the bitmap mode to the mode needed by EmuTOS, which supports higher resolutions and colour depths with the correct hardware. Interestingly the memory layout is a little weird: instead of either interleaving bits so each byte would hold the complete colour for four pixels, or using distinct bitplanes like the Amiga, the Atari hardware interleaves words: one 16 bit word holding bit 0 of 16 consecutive pixels, followed by a 16 bit word holding bit 1 of the same 16 pixels.

Implementing this mode was pretty easy. The palette is hard coded in the VHDL to the Atari defaults: white, green, red and black. Here’s a simple test image:

This image was generated by writing 0x00ff on word plane 0 and 0x0f0f on word plane 1. The stripes are therefore four pixels wide.
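To see why those two values give four pixel stripes, the colour index of each pixel can be worked out from the pair of plane words. The C sketch below is illustrative only: the function name is made up, and it assumes the leftmost pixel is the most significant bit of each word and that plane 0 supplies the low bit of the colour index.

```c
#include <stdint.h>

/* Colour index (0..3) of pixel x within a 16 pixel group.
 * Assumes the leftmost pixel is the most significant bit of each word
 * and that plane 0 supplies the low bit of the colour index. */
static unsigned pixel_colour(uint16_t plane0, uint16_t plane1, unsigned x)
{
    unsigned bit = 15u - (x % 16u);
    unsigned lo = (plane0 >> bit) & 1u;
    unsigned hi = (plane1 >> bit) & 1u;
    return (hi << 1) | lo;
}
```

Under those assumptions, plane words of 0x00ff and 0x0f0f give colour index 0 for pixels 0 to 3, 2 for pixels 4 to 7, 1 for pixels 8 to 11 and 3 for pixels 12 to 15: four stripes, each four pixels wide.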

Getting the mouse position dot to move about the screen was a little harder. To make this code easier to write, I wrote it in C, linking the C file into the monitor with a command wrapper in assembler that ran the C function.

Once instructed to do so with a special command byte, a PS/2 mouse generates a stream of 3 byte packets in the following format:

The fiddly part is that the x and y offsets are 9 bit two’s complement values, with the MSB in a bit in byte 0. The conversion is done with the following C code:

x_pos += mp->x_delta - ((mp->mouse_state) << 4 & 0x100);
y_pos -= mp->y_delta - ((mp->mouse_state) << 3 & 0x100);

if (x_pos < 0) x_pos = 0;
if (x_pos > 639) x_pos = 639;
if (y_pos < 0) y_pos = 0;
if (y_pos > 479) y_pos = 479;

This computation is done after receiving the 3 bytes that make up one mouse packet. Building up the mp (mouse packet) structure is done outside of the ISR, which only pulls off a single byte:

void __attribute__ ((interrupt)) mouseisr(void)
{
        /* Grab the received byte and flag it for the main loop */
        data = READ_BYTE(PS2ASCANCODE);
        new_data = 1;
}

The new_data flag is marked volatile because it is used both inside and outside the interrupt handler. The READ_BYTE macro reads a byte from an IO port (or memory), taking care of casting. And the __attribute__ ((interrupt)) modifier, which is GCC specific, makes the compiler use an rte instruction to end the subroutine instead of an rts.
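The non-ISR side is not shown here, but it might look something like the following sketch. The struct layout and the mouse_feed_byte name are assumptions made for illustration, not the monitor's actual code. In the monitor the main loop would call something like this each time new_data is seen set. The delta conversion and clamping are the same as in the snippet further up.

```c
#include <stdint.h>

/* Hypothetical packet layout: state byte first, then the X and Y deltas. */
struct mouse_packet {
    uint8_t mouse_state;
    uint8_t x_delta;
    uint8_t y_delta;
};

static struct mouse_packet packet;
static unsigned bytes_seen;
static int x_pos, y_pos;

/* Feed one received byte; returns 1 once a full 3 byte packet is applied. */
static int mouse_feed_byte(uint8_t byte)
{
    switch (bytes_seen) {
    case 0:  packet.mouse_state = byte; break;
    case 1:  packet.x_delta = byte;     break;
    default: packet.y_delta = byte;     break;
    }
    if (++bytes_seen < 3)
        return 0;
    bytes_seen = 0;

    /* Bits 4 and 5 of the state byte are the X and Y sign bits; moving
     * them up to bit 8 and subtracting yields the signed 9 bit delta. */
    x_pos += packet.x_delta - ((packet.mouse_state << 4) & 0x100);
    y_pos -= packet.y_delta - ((packet.mouse_state << 3) & 0x100);

    if (x_pos < 0) x_pos = 0;
    if (x_pos > 639) x_pos = 639;
    if (y_pos < 0) y_pos = 0;
    if (y_pos > 479) y_pos = 479;
    return 1;
}
```

For example, starting from (100, 100), the bytes 0x18, 0xFF, 0x05 decode as an X delta of -1 and a Y delta of +5, moving the dot to (99, 95) since the screen Y axis is inverted.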

Setting a pixel on the screen consists of four steps:

  1. Calculating the video memory address
  2. Reading the current state of the video memory at that address
  3. Setting a pixel in the retrieved byte
  4. Writing the bytes back out

This translates quite easily into C code:

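void draw_dot_at_pos(void)
{
        /* Word address of the plane pair holding this pixel */
        uint32_t address = (y_pos * LINE_LEN_WORDS) + (x_pos / 16 * 2);
        /* 0 selects the upper byte of the word, 1 the lower byte */
        uint16_t odd = (x_pos / 8) % 2;

        /* Each access to VGADATA advances the pointer by one word */
        WRITE_LONG(VGARWADDRHI, address);
        uint8_t plane1 = READ_BYTE(VGADATA + odd);
        uint8_t plane2 = READ_BYTE(VGADATA + odd);

        /* Toggle the pixel's bit in both planes */
        plane1 ^= 1 << (7 - (x_pos % 8));
        plane2 ^= 1 << (7 - (x_pos % 8));

        /* Rewind the pointer and write both bytes back */
        WRITE_LONG(VGARWADDRHI, address);
        WRITE_BYTE(VGADATA + odd, plane1);
        WRITE_BYTE(VGADATA + odd, plane2);
}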

The initial READ_WORD is a dummy read, needed when changing the read/write address. When updating this address a single 32 bit operation is used to more efficiently update two 16 bit registers, in a similar way to how the IDE data register is manipulated.

This code also uses a recently introduced feature of the VRAM updating logic, within Beta. The /UDS and /LDS signals are now honored when reading or writing the video memory such that a single byte, and not just the whole word, can be read or written. This is done, in 68000 code, by writing either to the VGADATA register in a word operation, or writing to either VGADATA or VGADATA+1 with a byte operation. Perhaps confusingly, the VGARWADDR 32 bit pointer is still a word address, and not a byte address, and each read or write to VGADATA will advance the pointer by a full word. The purpose of this change is primarily to make updating a single character square possible, without having to read in two 8 pixel wide squares at once and then OR-ing the needed changes.
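The mapping from a pixel coordinate to a word address, byte lane and bit can be pulled out into a small helper, which makes it easier to check on a PC. This is a sketch only: locate_pixel and its struct are invented names, and LINE_LEN_WORDS is assumed to be 80 (640 / 16 * 2) for the 640 pixel wide, two plane mode.

```c
#include <stdint.h>

/* Assumed from the 640 pixel wide, two plane layout: 640 / 16 * 2 words. */
#define LINE_LEN_WORDS 80u

struct pixel_location {
    uint32_t word_address; /* value for the read/write address pointer */
    uint8_t  odd;          /* 0 = VGADATA byte, 1 = VGADATA + 1 byte */
    uint8_t  mask;         /* bit for the pixel within that byte */
};

static struct pixel_location locate_pixel(unsigned x, unsigned y)
{
    struct pixel_location loc;

    loc.word_address = (y * LINE_LEN_WORDS) + (x / 16u * 2u);
    loc.odd = (uint8_t)((x / 8u) % 2u);
    loc.mask = (uint8_t)(1u << (7u - (x % 8u)));
    return loc;
}
```

Pixel (0, 0) maps to word 0, the VGADATA byte and mask 0x80, while pixel (15, 0) is in the same word but in the VGADATA+1 byte with mask 0x01.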

Obviously this is not all of the code. There is more code which, outside of the ISR, waits for a PS/2 byte and updates the mp struct, and there is code for clearing the screen, etc.

Here’s a picture of this little demo in action:

I was pretty pleased with this.

Back in EmuTOS land, and with the knowledge gained, it was pretty easy to add a PS/2 ISR for reading the mouse data, translating it to the needed EmuTOS data format, and calling a hook with that packet.

The first approach to getting the desktop to appear on the screen was extremely crude and simply consisted of copying, in 68000 assembly, the video buffer from main memory to the Beta data register. This was done inside the timer interrupt. After this was added there was some level of success. First the boot screen appeared, in glorious colour:

And then the “little green desktop”:

Unfortunately I then had two significant problems:

  • It was very slow to update the screen. The timer ISR, which ran at 200Hz, could only do an update on every 32nd timer tick, otherwise the CPU would not have enough time left to do useful work
  • The 68000 was generating every type of crash exception available, randomly

I thought it would be easier to fix the first problem than the second one, so I set about updating the screen data using DMA.

There are, broadly, two approaches to DMA in a system like MAXI000:

  • Act as a middle man and read data into a temporary buffer before writing it on the next clock cycle
  • Select the source device and address and have the target read it by directly pulling it from the databus – only possible if the destination is an IO port

The second approach is not a universal solution to DMA as it requires additional control signals, but it is faster as the busses are, essentially, used only for reading.

Alpha now has three new registers:

  1. A 24 bit (16 + 8) source address
  2. A 16 bit length
  3. A 16 bit flags

The flags register is used to control the transfer, which happens in one chunk. It has a single bit: whether or not the source address should increment through the transfer. It’s not useful in this case, but could be useful for other types of transfers.

The most interesting part of this mechanism is that Alpha can now act as a bus master. This uses two so far unused 68000 signals:

  • /BR – Bus Request – an input that is asserted by an external device to indicate that it wants to take over the busses
  • /BG – Bus Grant – an output that indicates the 68000 has given up the busses

There is a third signal, /BGACK – an input that indicates the bus takeover is complete. It is not needed when there is only one alternative bus master, in which case it can be tied high.

The sequence to complete a transfer is the following:

  1. The 68000 writes the source address and length to the relevant registers on Alpha
  2. The 68000 writes to the flags register on Alpha, which sets the busses as being wanted by setting /BR low
  3. Once the 68000 has finished with the current bus cycle it will assert /BG, indicating that it will give up the bus
  4. Once /BG has been asserted, and /AS has been negated, Alpha can commence the DMA operation
    1. On each clock it will perform a read operation at start address + count, up to the transfer length
  5. When the transfer is complete Alpha will negate /BR
  6. The 68000 then negates /BG and takes the busses back and resumes executing instructions

During a transfer the address presented on the main address bus will increment on each clock edge. Meanwhile a signal, /DMA, will be active on Beta, which will “see” the data output by whatever device Alpha has selected (likely the SRAM). Whilst /DMA is active Beta will increment its own video memory address, assert the needed video memory Chip Select and write line, and copy the system databus onto the video memory databus. The video memory will therefore, over the course of the transfer, copy main memory (if that is the source of the transfer) into video memory. The initial video memory address comes from the read/write address pointer register, the same one the MPU manipulates when doing non DMA accesses of video memory.

The controller is mostly contained in the following module:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Needed for arithmetic directly on STD_LOGIC_VECTOR
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity dmacontroller is
    port ( CLOCK : in STD_LOGIC;
           SRC : in STD_LOGIC_VECTOR (23 downto 1);
           LEN : in STD_LOGIC_VECTOR (15 downto 0);
           INC_SRC : in STD_LOGIC;
           A : out STD_LOGIC_VECTOR (23 downto 1);
           TRIGGER : in STD_LOGIC;
           RnW : out STD_LOGIC;
           RUNNING : out STD_LOGIC);
end entity;

architecture behavioral of dmacontroller is
    signal COUNTER : STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
    signal RUNNING_STATE : STD_LOGIC := '0';
    signal LAST_TRIGGER : STD_LOGIC := '0';
begin
    process (CLOCK)
    begin
        if (CLOCK'Event and CLOCK = '1') then
            -- A rising edge on TRIGGER starts a new transfer
            LAST_TRIGGER <= TRIGGER;
            if (TRIGGER = '1' and LAST_TRIGGER = '0') then
                COUNTER <= x"0000";
                RUNNING_STATE <= '1';
            end if;

            -- Count one word per clock until LEN words have been presented
            if (RUNNING_STATE = '1') then
                if (COUNTER = LEN - 1) then
                    RUNNING_STATE <= '0';
                end if;
                COUNTER <= COUNTER + '1';
            end if;
        end if;
    end process;

    -- Drive the RUNNING output from the internal state
    RUNNING <= RUNNING_STATE;

    A <=
        SRC + COUNTER when (INC_SRC = '1') else
        SRC when (INC_SRC = '0') else
        (others => '0');

    -- The controller only ever reads from the source
    RnW <= '1';
end architecture;

There is still a fair amount of logic “in front of” the DMA controller module. This is because Alpha also needs to now do address decoding for the address generated by the DMA controller, as well as the processor, which is all it did before.
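One way to sanity check the counter logic away from the hardware is a small software model. The C sketch below mirrors the clocked process in the module above as I read it, with each edge deciding its next state from the state before the edge. The struct and function names are invented for this example, and the 0x7FFFFF mask simply reflects the 23 bit word address.

```c
#include <stdbool.h>
#include <stdint.h>

struct dma_model {
    uint16_t counter;
    bool running;
    bool last_trigger;
};

/* One rising clock edge. All decisions use the state from before the
 * edge, mirroring VHDL signal assignment semantics. */
static void dma_clock(struct dma_model *m, bool trigger, uint16_t len)
{
    struct dma_model next = *m;

    next.last_trigger = trigger;
    if (trigger && !m->last_trigger) {
        next.counter = 0;
        next.running = true;
    }
    if (m->running) {
        if (m->counter == (uint16_t)(len - 1u))
            next.running = false;
        next.counter = (uint16_t)(m->counter + 1u);
    }
    *m = next;
}

/* Word address driven onto the address outputs for the current state. */
static uint32_t dma_address(const struct dma_model *m, uint32_t src, bool inc_src)
{
    return inc_src ? ((src + m->counter) & 0x7FFFFFu) : src;
}

/* Runs one transfer and returns how many clocks the model stayed running.
 * The last address presented is stored through last_addr. */
static unsigned dma_run(uint32_t src, uint16_t len, uint32_t *last_addr)
{
    struct dma_model m = { 0, false, false };
    unsigned clocks = 0;

    dma_clock(&m, true, len);           /* rising edge on TRIGGER */
    while (m.running) {
        *last_addr = dma_address(&m, src, true);
        clocks++;
        dma_clock(&m, false, len);
    }
    return clocks;
}
```

For a length of 16 the model stays running for exactly 16 clocks and presents word addresses from the source address up to source + 15, one per clock.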

Here’s a logic analyser trace of a 16 word transfer showing the key signals:

The MPU is still only being clocked at 8MHz here (I call the signal 16M out of habit), since faster speeds make captures from my logic analyser harder. The transfer rate is therefore a fairly respectable 16MB/s. You can see A1 through A4 being counted up through the transfer, the R/W line being held high for a read operation, and of course the /BR and /BG signals.

Before looking at modifying EmuTOS to update the screen using this method, it is helpful to consider the timing aspects.

A 640×480 screen at 2 bits per pixel needs 640 / 16 * 2 * 480 words, or 38,400 words (76,800 bytes). At 8,000,000 words a second it still takes 4.8ms to copy a whole screen worth of data.

The DMA controller also introduces a downside compared to updating the video memory via MPU code: whilst the update is going on the video memory can’t be read, even to draw the display on the screen. This means that the update needs to happen during the vertical blank interval, but since it is only 1.8ms long the update needs to be broken into sections: I’ve chosen to do the update over 8 VBL intervals. Splitting it up over 8 chunks also means interrupts, including the PS/2 byte read interrupt, are blocked for the shortest time possible. In fact it is that which is causing one outstanding problem: because the PS/2 interrupt has the highest priority it will delay the processing of the vertical blank interrupt, causing the update via DMA to happen whilst the screen is being drawn. The result of this is black bars at the top of the screen whilst the mouse is moving around.
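The numbers above are easy to reproduce. The following C sketch (with made-up function names, and assuming one word is moved per clock as in the trace) computes the screen size in words and the time needed for a full copy and for one of the eight slices.

```c
#include <stdint.h>

/* Words needed for a screen built from interleaved 16 pixel word planes. */
static uint32_t screen_words(uint32_t width, uint32_t height, uint32_t planes)
{
    return width / 16u * planes * height;
}

/* Time in microseconds to move a number of words at one word per clock. */
static uint32_t transfer_time_us(uint32_t words, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)words * 1000000u) / clock_hz);
}
```

A full 38,400 word screen takes 4,800µs at 8MHz, whereas each 4,800 word slice takes about 600µs, which leaves some margin inside the vertical blank interval.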

Modifying the screen update code to use the new DMA controller instead of writing to the VGADATA register in code was pretty easy. Here is the vertical blank interrupt handler:

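static void __attribute__ ((interrupt)) maxi000_int_vbl(void)
{
        /* Restart from the top of the screen on the first slice */
        if (slice_count == 0)
        {
                this_slice_ad = (uint32_t) v_bas_ad;
                VGARWADDR = 0;
        }

        /* Writing the flags register starts the DMA transfer */
        DMASRC = this_slice_ad;
        DMAFLAGS = 1;

        this_slice_ad += SLICE_LEN * 2;

        /* Move on to the next slice, wrapping after the eighth */
        slice_count++;
        if (slice_count > 7) slice_count = 0;
}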

this_slice_ad rotates through each eighth of a screen of video memory, starting at v_bas_ad, which is a TOS system variable pointing to the start of screen memory. SLICE_LEN is the length of one eighth of the screen memory in words, hence this_slice_ad is moved forward by twice SLICE_LEN, since it counts in bytes and not words.

I suspect the problem with the black bars due to bytes being received by the PS/2 port would be trivial to fix by introducing a FIFO to hold the 3 bytes in a typical mouse packet. PS/2 bytes would therefore not be lost if they happened whilst DMA was ongoing. There are doubtless other solutions. Fixing the exceptions is a different story.

For anyone interested, my changes to EmuTOS for the MAXI000 board have been pushed into my own fork of the repository, available here.

Out of suspicion for the system SRAM, I have written a fairly extensive memory test routine that runs within the monitor. It has not detected any problems. It’s also interesting that I have never seen any problems when using the monitor, or whilst running EhBASIC, which I had running at one point. One possibility, therefore, is some problem with the toolchain used to build EmuTOS. However I am using the recommended version.

Roughly speaking I have the following options available for things to work on next:

  • Diagnose what is causing the EmuTOS crashes and continue to refine my port
  • Give up with EmuTOS and start planning and writing my own Operating System
  • Related to the above, it would be fascinating to look at extending the video implementation further: more colours, accelerated drawing, etc
  • Finish the hardware build of the MAXI000 board, including the SIMM slot which remains to be done
  • Do something completely different

I must say I’m not ecstatic about working on EmuTOS. It’s a “tough” codebase to learn and work on, especially for someone like me who is not deeply familiar with Atari TOS.

One positive is that EmuTOS writes exception data directly to the screen:

Unfortunately there seems to be little pattern to the crashes I’m seeing, making debugging extremely difficult.

I certainly will finish the hardware build for MAXI000 at some point. And working on the graphics side has been fun, so I will absolutely come back to that.

But because of wanting a temporary change of pace from the 68000 (as much as I love the processor) I’m seriously thinking of going off on a tangent and tackling something new and exciting next.

Namely to fulfil a boyhood dream and attempt to design my own processor…


4 thoughts on “More MAXI000 construction and progress towards an EmuTOS port”

  1. Paul Wratt

    Bus Error – something to do with odd addresses

    I believe this is actually an easy fix, since you are only using ROM code atm. So short of one of your extra pieces of code “breaking something” (I doubt that), or one of the ALPHA/BETA interactions “breaking something” (maybe, somewhere in the new stuff? since previous code showed no issues), pop a post on the EmuTOS mailing list, and I am sure you will get a solution.

    The fact that it is “random” tells me it’s not, it’s just happening at random times (maybe?) or random places, which just makes it look random.

    Seriously, I think there is a simple solution here, and you may kick yourself when you find the problem

    Cheers, and thanks for putting the effort into getting EmuTOS running on your MAXI000. The more platforms it can run on the better


    1. aslak Post author

      Bus Error is just one of the errors, unfortunately. The error occurs after call_mousevector() call, which is within the PS/2 handler.

      FWIW the hardware does implement the bus error signal, on an invalid address.

      I’ve also had Address Errors, Illegal Instructions and even a “no free EVBs” error.

      I’ve run memory tests (which have the timer and VBL ISRs running) for hours and not seen any issues. It’s really quite baffling. 🙁


