Whilst I did attach the graphics card to MAXI030 it was only to confirm that I could turn the card’s LED on and off. The VHDL for the Cyclone FPGA is in a sorry state and needs major work before I use it in any kind of meaningful way.
Continuing on with MAXI030, my first port of call was to revisit the MC68882 and figure out why it was “half working”. My first attempt at solving this problem was to assume it was a bad solder joint and reflow the processor and FPU pins. This yielded no improvement, so the next thing to do was to look at the VHDL.
The coprocessor is selected under the following conditions:
- A falling clock
- /AS (Address Strobe) is low : the processor address bus contains a valid address
- FC (Function Code) is 111 : the processor is running a “CPU space” cycle, i.e. it is not an ordinary memory access
- A (the address bus) 19 down to 16 is 0010 : this indicates a coprocessor access within the CPU space
- A 15 down to 13 is 001 : this indicates an FPU coprocessor access (coprocessor ID 1), vs an MMU one
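For illustration, the decode conditions above can be modelled in C. This is purely a sketch of the logic, not the actual VHDL; the helper name and argument types are my own.

```c
#include <stdbool.h>

#define FPU_CPID 0x1   /* CpID 1 conventionally selects the 68881/68882 */

/* C model of the FPU chip-select decode described above. fc is the
 * 3-bit Function Code, addr is the address bus, as_n is the
 * active-low Address Strobe. */
static bool fpu_selected(unsigned fc, unsigned long addr, bool as_n)
{
    if (as_n)
        return false;                        /* /AS high: no valid address */
    if (fc != 0x7)
        return false;                        /* FC must be 111: CPU space  */
    if (((addr >> 16) & 0xf) != 0x2)
        return false;                        /* A19-A16 = 0010: coprocessor */
    return ((addr >> 13) & 0x7) == FPU_CPID; /* A15-A13: FPU, not MMU */
}
```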
Because the address decoding logic needed by MAXI030 is not at all trivial, it is currently implemented as sequential logic, synchronous to the processor clock, since this is the easier way to write it. The trigger for the decoder is the falling edge of the clock, which allows it to be written as a nice collection of nested if blocks.
The issue with this code is that the Chip Selects for the various system peripherals, memory, and the hardware registers implemented in the FPGA itself are asserted on the following falling clock edge, instead of as soon as /AS is strobed low. This is not ideal.
Here is a timing diagram:
This scan is taken from the Alan Clements book Microprocessor Systems Design: M68000 Hardware, Software, and Interfacing. Note that the timing diagrams for the 68030 are the same as for the 68020, at least where asynchronous (non burst mode) bus cycles are concerned.
You can see that the next negative clock edge after /AS is asserted is the start of S3. The Chip Selects won’t be asserted until S5. This only works because the /DSACK handling introduces a wait state for all accesses.
After rewriting the decoder as combinational logic, I could not get the computer to start completely, though the logic analyser revealed the processor was “mostly” working: it was hanging in the bootloader, just prior to outputting its banner on the flash reprogramming UART channel. At this point the processor is running code that was previously copied to the SIMM. Eventually I figured out the problem: a longstanding issue with my SIMM controller, where it would release the wait state for an extra cycle, the precharge cycle, before returning to the idle state.
With this fixed, and the combinational decoder in play, the computer worked, but I found that even small changes to the VHDL would yield a non-starting board. The odd thing is that the timing report indicates a more than adequate fMax. For now I’m back to a synchronous decoder.
But the good news is I am now clocking it on the rising clock edge: the Chip Select is asserted no later than S3, and I can eliminate the delayed /DSACK. This means that for zero waitstate parts the board completes a machine cycle in the minimum 3 clocks. Talking of waitstates, the current waitstate setup is:
- SST39SF040 (PDF) flash: 1 waitstate
- SC26C94 (PDF) QUART: 1 waitstate
- RTL8019AS (PDF) Ethernet NIC: 2 waitstates
These timings were determined by looking at the datasheets and, where the timing graphs were not clear – such as in the case of the RTL8019AS – experimentally.
The other good news is the 68882 FPU is now working, thanks to the decoding being implemented on the rising edge of the clock. In doing this I have added a new signal, combinationally generated. This signal, cleverly named cycle_type, indicates the type of cycle the processor is currently running:
- CYCLE_NULL: this is the case when /AS is high
- CYCLE_FPU: the processor wants access to the FPU
- CYCLE_INT_ACK: the processor is running an interrupt acknowledge cycle and wants an interrupt vector
- CYCLE_NORMAL: a regular memory or IO cycle is being run
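As a C model of this signal (the real implementation is VHDL; the interrupt-acknowledge encoding below follows the 68030’s standard CPU-space conventions, and the FPU vs. MMU sub-decode on A15-A13 is omitted for brevity):

```c
#include <stdbool.h>

/* C model of the combinationally generated cycle_type signal
 * described above; names mirror the ones in the text. */
enum cycle_type { CYCLE_NULL, CYCLE_FPU, CYCLE_INT_ACK, CYCLE_NORMAL };

static enum cycle_type classify_cycle(bool as_n, unsigned fc,
                                      unsigned long addr)
{
    if (as_n)
        return CYCLE_NULL;            /* /AS high: no cycle in progress */
    if (fc == 0x7) {                  /* CPU space cycle */
        unsigned space = (addr >> 16) & 0xf;
        if (space == 0x2)             /* coprocessor access */
            return CYCLE_FPU;
        if (space == 0xf)             /* interrupt acknowledge */
            return CYCLE_INT_ACK;
    }
    return CYCLE_NORMAL;              /* regular memory or IO cycle */
}
```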
This cycle_type signal is fed into the sequential address decoder to remove some of the logic there, as well as being used for the FPU chip select, which is generated synchronously to the rising edge of the clock along with all the other decoding. Other actions using cycle_type, such as asserting interrupt vectors – which I’ve not yet looked into – might not be done synchronously.
Next, I thought I should solder some more of the board, starting with the RTL8019AS Ethernet controller and associated parts:
Soldering this QFP package was achieved using the usual technique of tacking opposite corners then individually soldering pins, after applying lots of flux.
On the software side, the starting point was the driver for an RTL8019AS add-on card for the RC2014, a Z80-based retro computer. Essentially the ne2k.c source was modified slightly and glued into my monitor program, with commands added for sending and receiving Ethernet frames. Arbitrarily, I used the EtherType 0x0888 for my packets.
Testing was achieved by plugging the MAXI030 board into my desktop 1Gbit network switch and sending packets between it and my Linux box. This in turn required some Linux software, since there seem to be no standard tools for generating Ethernet frames, though tcpdump can be used to show any received frames. In the end I found two standalone programs (send and receive) on GitHub which were useful for teaching me how to generate raw frames from Linux userspace.
The result was, after some fiddling, success: I could send and receive Ethernet frames on MAXI030. Here’s a shot of the Linux side receiving a packet sent by MAXI030:
Note the hard-coded MAC address for the MAXI030 board: 11:22:33:44:55:66 which is configured into the RTL8019AS by the processor at initialisation time, and the fact that the packets are padded to 64 bytes, which appears to be the minimum size for an Ethernet frame. This padding is done at the sending end.
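For illustration, building such a frame, including the minimum-size padding, might look like this in C. This is a sketch of the general technique, not the actual monitor code; note that the 64-byte on-the-wire minimum includes the 4-byte FCS, which the NIC appends itself, hence the 60-byte constant here.

```c
#include <stdint.h>
#include <string.h>

#define ETH_MIN_FRAME 60   /* 64-byte wire minimum minus the 4-byte FCS
                              appended by the NIC */

/* Build an Ethernet II frame into buf (which must hold at least 1514
 * bytes) and return its length, padding short frames with zeros. */
static size_t build_eth_frame(uint8_t *buf,
                              const uint8_t dst[6], const uint8_t src[6],
                              uint16_t ethertype,
                              const uint8_t *payload, size_t len)
{
    memcpy(buf, dst, 6);
    memcpy(buf + 6, src, 6);
    buf[12] = (uint8_t)(ethertype >> 8);   /* EtherType is big endian */
    buf[13] = (uint8_t)(ethertype & 0xff);
    memcpy(buf + 14, payload, len);
    size_t total = 14 + len;
    if (total < ETH_MIN_FRAME) {           /* pad short frames */
        memset(buf + total, 0, ETH_MIN_FRAME - total);
        total = ETH_MIN_FRAME;
    }
    return total;
}
```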
I’ve since found out that a MAC address with bit 0 of the first byte set denotes a multicast (or broadcast) address, and bit 1 denotes a locally administered address, so I’ve subsequently picked a different MAC address for MAXI030: 00:11:22:33:44:55.
I was, and am, very pleased that MAXI030 has working Ethernet!
After this, some more mundane soldering: the PS/2 port, serial keyboard port, and DS1307 (PDF) Real Time Clock parts were all soldered and tested out fine. There is nothing really to say here as the work was just a repeat of the work done for MIDI020.
The 68K port of Linux (for parts with MMUs) is one of the oldest ports, possibly even the first, though the Alpha port may be older. Indeed, the framebuffer console was originally written for the 68K Apple Macintosh port, since the Mac has only bitmap graphics output and no PC-style support for outputting text.
These days the port, with the architecture name m68k, is still actively maintained. Though this is specific to the kernel: distributions like Debian previously supported running directly on Amigas and other 68K machines, so long as they had an MMU and a reasonable amount of RAM, but this hasn’t been true for some time, unfortunately. Nonetheless, ready-to-run file system tarballs of m68k Debian are available, albeit for older releases. The last m68k release, 4.0, is also available in the official Debian apt repositories, as is the current “ports” release, which holds mostly up-to-date builds.
In tackling this port I used, as a guide, the work done by Will Sowerbutts to run Linux on the KISS-68030 retrobrew computer. Much of this guidance was around what is actually required to create a platform (as Linux terms a particular model of computer which uses a certain processor type or architecture). The KISS-68030 uses some of the same components (e.g. IDE) as MAXI030, but is largely a different animal.
One thing to bear in mind is I am not a Linux kernel hacking expert. I have dabbled with making small changes over the years, and am certainly familiar with the process of configuring the kernel, doing builds and so on, but this is far and away the deepest I’ve gone into the kernel in the nearly 25 years (it’s scary to type that!) I’ve been using it.
It took a fair amount of work, but over the last few weeks I’ve made good progress.
Something required before working on my port was a way to transfer a kernel file to the board.
The initial solution to this problem was to copy the compiled kernel, after turning the file from ELF into a raw binary with objcopy, onto a Compact Flash card which was then moved to the IDE to CF adapter plugged into MAXI030. No filesystem was used; the kernel was copied onto the raw disk using dd. This quickly became tedious however.
The improvement on this was to use the network. Since I had a working Ethernet link between MAXI030 and my Linux box, the solution was a simple protocol concocted out of raw Ethernet frames. In this description the client is MAXI030 and the server is the Linux box:
- Client sends a request block consisting of a filename padded to 256 bytes
- Server returns with a file length, or 0 if no file is at this name, in Big Endian byte order
- Client then acknowledges this reply, which is the trigger to the server to enter the sending loop, which continues until the file has been sent
- Server sends 1024 bytes of the file, which is written into memory on the client
- Client then sends exactly what it got back again so the server can check it
- If it’s good, server sends a 0 reply and the client advances the write pointer
- If it’s bad the client and server won’t advance the pointers so it can be retried
- This retrying continues indefinitely
- Client acknowledges, which means the server can send the next block (or the same block again if the last one failed)
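The framing just described can be sketched in C. The constants and field layouts here are my illustration of the protocol, not the actual monitor or server code:

```c
#include <stdint.h>
#include <string.h>

#define REQ_NAME_LEN 256
#define BLOCK_LEN    1024

/* Client request: just the filename, zero-padded to 256 bytes. */
static void encode_request(uint8_t out[REQ_NAME_LEN], const char *name)
{
    memset(out, 0, REQ_NAME_LEN);
    strncpy((char *)out, name, REQ_NAME_LEN - 1);
}

/* Server reply: the file length (0 = no such file), big endian. */
static void encode_length(uint8_t out[4], uint32_t len)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
}

/* How many 1024-byte blocks the sending loop will produce. */
static uint32_t block_count(uint32_t file_len)
{
    return (file_len + BLOCK_LEN - 1) / BLOCK_LEN;
}
```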
The point of the back-and-forth packets is to avoid a situation where the Linux box is sending back-to-back packets. While this would probably be alright if done once or twice, in testing I have easily swamped the MAXI030 packet reception code. This happens for various reasons including:
- The Linux box is on Gigabit networking, whereas MAXI030 is on 10Mbit
- There is no flow control in the protocol
- Simple PIO is used to move packets on and off the NIC IC
Nevertheless this simple mechanism works well. Retries are very rare and the time to send a 4MB kernel file is around 40 seconds. Using TCP would of course be preferable, though that will have to wait.
After receiving the kernel image and writing it into memory, 4KB (a page) in from the start of RAM, a final step is to add the bootinfo record to the end of the kernel. This record, specific to the m68k Linux port, is normally created by a boot loader and tells the kernel some fundamental characteristics of its environment:
- What platform it is running on (a simple enumerated value, which has been extended to now include MAXI030)
- What processor is installed (68030)
- What FPU is installed (68882)
- The type of MMU in use (68030)
- The amount of installed memory and its starting point in the memory map (32MB from location 0)
- The kernel command line (“console=ttySC0 root=/dev/sda1”)
- An end of record marker
Ordinarily a boot loader would be used, but as the monitor I wrote for MAXI030 is currently being used to start Linux, the monitor is used instead. This record is itself an array of Tag Length Value entries.
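As a sketch, appending one such Tag Length Value entry might look like this in C. The layout follows the kernel’s m68k bi_record format (big-endian 16-bit tag, then a 16-bit size that includes the 4-byte header; the tag values themselves live in the kernel’s asm/bootinfo.h); this is not the monitor’s actual code.

```c
#include <stdint.h>
#include <string.h>

#define BI_LAST 0  /* end-of-record tag, per the kernel's asm/bootinfo.h */

/* Append one bootinfo entry at p: big-endian 16-bit tag, 16-bit size
 * (including this 4-byte header, rounded up to a multiple of 4), then
 * the data. Returns the next write position. */
static uint8_t *bi_append(uint8_t *p, uint16_t tag,
                          const void *data, uint16_t len)
{
    uint16_t size = (uint16_t)((4 + len + 3) & ~3u);
    p[0] = (uint8_t)(tag >> 8);  p[1] = (uint8_t)tag;
    p[2] = (uint8_t)(size >> 8); p[3] = (uint8_t)size;
    memset(p + 4, 0, size - 4);  /* zero any alignment padding */
    if (len)
        memcpy(p + 4, data, len);
    return p + size;
}

/* The record ends with a BI_LAST marker. */
static uint8_t *bi_finish(uint8_t *p)
{
    return bi_append(p, BI_LAST, NULL, 0);
}
```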
After downloading the image and appending the bootinfo record, starting the kernel is easy: it is simply jumped to.
Perhaps the best way to explain how this port, which is far from finished, was achieved is to go over the key files added and changed to the Linux kernel source.
This is the startup entry point of the kernel. The main changes here, for the newly created MAXI030 platform, were to:
- Configure the 68030’s MMU for the MAXI030 memory map. Linux, on the m68k architecture at least, expects main memory to be at logical address 0x80000000. Unfortunately, at present this address is used by the first expansion slot. Because only the first 32MB needs to be mapped, as that’s how much memory is currently on the board, this can be worked around by mapping logical addresses 0x80000000 to 0x81ffffff to physical addresses 0x00000000 to 0x01ffffff. 0x84000000 is used for onboard peripherals and is directly mapped, logical to physical, using the 68030’s TT registers.
- At some point I will renumber the memory map to make 0x80000000 available for memory. I can then directly map the expansion card address spaces.
- Use the SC26C94 (PDF) UART for diagnostics. head.S contains a mechanism for outputting very, very low-level debug text, so it was extended to use the MAXI030’s UART.
- Because I had some problems passing this stage, I added some additional debug messages. These will eventually be taken out.
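The memory mapping described in the first bullet above amounts to a simple address translation, modelled here in C for illustration only; the real mapping is of course done by the 68030’s MMU tables and TT registers, not by code like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the MAXI030 logical-to-physical mapping described above:
 * 0x80000000-0x81ffffff maps to physical 0x00000000 (the 32MB of RAM),
 * and the 0x84000000 peripheral region is transparently translated
 * one-to-one. */
static bool maxi030_log_to_phys(uint32_t log, uint32_t *phys)
{
    if (log >= 0x80000000u && log <= 0x81ffffffu) {
        *phys = log - 0x80000000u;       /* RAM window */
        return true;
    }
    if ((log & 0xff000000u) == 0x84000000u) {
        *phys = log;                     /* transparently translated I/O */
        return true;
    }
    return false;                        /* unmapped */
}
```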
A change is required here to call the config_maxi030() subroutine in the following module, to configure platform-specific (i.e. not core to the m68k architecture) hardware.
This is the main module for tailoring the hardware drivers to the specific hardware on the board.
The scheduler is linked to a timer interrupt generator, which was added to the FPGA design, on autovectored interrupt #1. Autovectored interrupt #2 is used by the UART channels, autovector #3 is reserved for the IDE interface, and autovector #4 is used by the RTL8019AS NIC.
This module also contains the definitions for the resources (essentially memory areas and IRQs) used by the various drivers.
This module contains the service routine for the timer interrupt, which is used to “nudge” the scheduler. This interrupt fires at a rate of 100 Hz, per the standard for smaller Linux systems. After I was unsuccessful in using the timer in the SC26C94 (PDF) for this purpose, a dedicated component was created within the FPGA design.
This module also defines what Linux terms an “irq_chip”: a software construct for hiding the details of how interrupts are routed on a board. A simple interrupt router has been added to the FPGA: a register is exposed which sets which interrupts should be passed (bit is one) or blocked (bit is zero), and this irq_chip contains pointers to subroutines which manipulate this register.
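The register manipulation behind those irq_chip callbacks can be sketched as follows. The bit-per-autovector assignment matches the description above, but the register is modelled as a plain variable here; on real hardware it would be a volatile pointer to the FPGA register, at an address this sketch does not presume to know.

```c
#include <stdint.h>

/* Model of the FPGA interrupt router register: one bit per autovector,
 * 1 = pass the interrupt, 0 = block it. */
static uint8_t irq_mask_reg;

/* Counterparts to the irq_chip's unmask/mask callbacks. */
static void maxi030_irq_unmask(unsigned int autovec)
{
    irq_mask_reg |= (uint8_t)(1u << autovec);
}

static void maxi030_irq_mask(unsigned int autovec)
{
    irq_mask_reg &= (uint8_t)~(1u << autovec);
}
```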
It took me a while to find it, but mainline Linux contains a driver for the SC26C92 (PDF) DUART, and indeed other parts which have a similar register API, including the MC68681 (PDF). The interesting thing for MAXI030 is that the SC26C92 is essentially half of an SC26C94. Thus this port makes use of this driver to access Port A and Port B, which are the ports on RJ45 connectors.
The one change I had to make to this driver was to switch it over to using traditional Linux ISRs instead of the threaded kind. It seems that m68k Linux does not support this newer infrastructure for interrupts, which would make sense since I suspect that using a special thread for ISRs only benefits larger systems.
The reason a timer was added to the FPGA is that it seemed to not be possible to share the UART’s interrupt between the timer, needed by the scheduler, and the serial driver. A little annoying, but no big problem.
This is a new driver, but it is largely based off the zorro8390.c driver, a driver for an Amiga Zorro NIC which uses an RTL8019AS, the same as MAXI030.
This was by far the biggest bit of work done for this port, since the Zorro parts needed to be removed and the driver retargeted as a “platform” driver. Its hardware address details and the IRQ to be used originate in the MAXI030 configuration module.
Other than currently being limited to 8 bit transfers, this driver appears to work well. I was pretty excited to see MAXI030 send pings to google.com!
In terms of IDE, the pata_platform driver is used. This is a highly adaptable generic IDE driver which can be configured with different register mappings.
The usage of this driver does not currently use interrupts, and certainly does not use MAXI030’s DMA abilities – which have not even been prototyped – but it does make use of 32 bit wide transfer instructions when doing block reads and writes.
The transfer speed is still pretty poor: 1MB/sec, tested using a simple dd read.
I’ve pushed up my changes to a fork of the Linux kernel repository, on my github account, for anyone interested in how this was all done.
The last area worthy of some discussion is the userland side. After playing with running a relatively up-to-date Debian and being disappointed with the speed (or lack thereof), I’ve switched to using a build based on Etch (4.0, released in 2007). I do want to understand why modern Debian, on m68k at least, has regressed so badly: a compile of “Hello World” goes from about 15 seconds on Etch to 50 seconds. I think it is something to do with the introduction of the Native POSIX Thread Library (NPTL), which replaced the earlier LinuxThreads implementation.
As it seemed the most sensible approach, the Linux root file system running on MAXI030 is produced using a simple script which leverages multistrap, a program for producing debootstrap-like file system images for architectures other than the one the host is running. The script I wrote carries out the following steps:
- Runs multistrap to pull down the core Debian packages
- Creates a few essential system configuration files including the fstab and inittab (the inittab is uninteresting except it runs a login getty on each serial port)
- Configures the DNS resolver and network interfaces
- Runs “dpkg --configure” to configure each package previously installed
- Installs additional packages including links, ircII, Apache, as well as more mundane things like text editors and network utilities
- Copies in the kernel modules which were built as part of the kernel build process
- Creates a non-privileged user, and sets passwords for that user account and the root user
The script makes use of QEMU’s m68k support. This allows Linux programs compiled for m68k to be run on other architectures. The process is in fact completely transparent and a simple chroot command can be used. My script is therefore full of commands like this:
chroot $DIR apt-get clean
This will run the apt-get command, which is an m68k binary, inside the target file system.
The script is rough, a true hack; it lacks any error checking whatsoever. But it works well enough.
All told I’m very pleased with how my little port is coming along. So pleased, in fact, that I made a video:
As well as being a demonstration of the Linux port, it also goes over the MAXI030 hardware.
Even with an older Linux distribution, the performance is still not great in some places. But the good news is there are many reasons for this, and exploring them all is going to be a whole lot of fun.
To briefly list some ideas for future work on the Linux port, some of which will no doubt require changes to the FPGA design and testing out with the monitor, in no particular order:
- A Real Time Clock driver for the DS1307 (PDF) would be useful. Linux contains a driver for the DS1307, but the piece that I will have to write is the I2C master implementation which will drive the I2C controller registers on the FPGA. This could be fairly trivial, or it could be very difficult.
- While I’m there I should probably look at exposing other I2C peripherals to the kernel and/or userspace.
- On the IDE controller front, interrupts would be helpful. It appears to just be an option, though a quick try out was not successful. More useful on hard disks as opposed to Compact Flashes, but it would still be nice to get this working.
- Following on from that, and it’s almost a project in its own right, is to get DMA transfers working in the IDE driver.
- I’ve noticed that the UART driver does not support setting alternative baud rates. This needs correcting.
- 16 bit transfers for the RTL8019AS driver would certainly be nice to have.
Outside of the Linux port, I really want to look soon at the SIMM controller VHDL to see if I can speed it up further. I’ve managed to eliminate some wait states, but it still does not make use of most memory modules’ Fast Page Mode ability. Making use of the 68030’s burst mode would be great as well, though I suspect it will be a lot of work…