VGA improvements


After trying to make some small improvements to my VGA implementation I’ve come to the conclusion that it is deeply flawed.

Whilst I can produce a working video controller that appears to operate reliably, if I make even an insignificant change to the VHDL code then writes into the video memory usually become corrupted. It’s also clear, from looking at the timing analysis, that even when it is operating correctly it is right “at the edge” of not working. The big problem, which led me to take drastic action, was that no matter what I tried I could not get reliable operation after adding a simple address decoder for the second 32KB 12ns video SRAM.

The crux of the problem with the old implementation was that whilst the display of characters, and the reads from video memory required to do this, were done under the direction of the 25.175MHz pixel clock, writes into memory under the control of the MPU were done by a process clocked by the MPU. The arbitration needed for the video memory bus could not be properly handled because signals were crossing clock domains.

The upshot of this problem is that I have almost entirely rewritten the VHDL code for the VGA controller. As well as all the other changes, internal signals are now all using positive logic, where the asserted state is represented by a ‘1’. External signals which are active low are prefixed with an “n” and are inverted internally and assigned to a signal without the “n” prefix. This should go some way to making the code easier to follow.
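As a minimal sketch of that naming convention (the entity and the port names nCS, nWRITE and WRITE_STROBE are made up for illustration and are not taken from the real design), the inversion happens once at the boundary and everything inside uses positive logic:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example of the naming convention only; these ports are
-- assumptions and do not come from the actual VGA controller.
entity polarity_example is
	port (
		nCS : in std_logic;           -- active-low chip select from the board
		nWRITE : in std_logic;        -- active-low write strobe from the board
		WRITE_STROBE : out std_logic  -- active-high result, for illustration
	);
end entity;

architecture behavioral of polarity_example is
	-- Internal, positive-logic copies: '1' means asserted
	signal CS : std_logic;
	signal WRITE : std_logic;
begin
	-- Invert once at the boundary, then only use CS and WRITE inside
	CS <= not nCS;
	WRITE <= not nWRITE;

	-- Internal logic reads naturally: selected and being written to
	WRITE_STROBE <= CS and WRITE;
end architecture;
```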

In addition to being a much cleaner codebase with properly partitioned parts and, critically, one that is now reliable and can be extended without immediately regressing, it has some nice new features:

  1. A simple bitmap mode is available. Only a 1 bit per pixel (or 8 pixels per byte) mode is supported at present.
  2. It now makes use of 2 of the 32KB SRAMs. The 3rd remains unused, as it would require the use of 17-bit addresses, which are awkward for obvious reasons.
  3. Character mode now uses two bytes per character square; the second byte holds an attribute value in the format RGB plus a brightness bit, for both the foreground and the background.
  4. If an attribute byte is zero – the redundant black on black – the attribute comes from a register. This attribute is also used by the bitmap mode and is intended to quickly (without video RAM writes) change the colour of the display.
  5. Read-back of video memory data is now implemented, ie. the MPU can read the contents of the video memory.

The new implementation moves the MPU-directed read and write operations under the control of the VGA pixel clock. It also now properly deals with metastability problems inherent in clock domain crossing to give reliable operation.

The core of the implementation is a process which manages the video memory’s three operations: write under MPU control, read under MPU control, and read under the control of the display generator. This process is wrapped in an entity, to make it nice and self contained, and testable on its own. The process is relatively small so I will paste it here in its entirety:

process (CLOCK)
begin
	if (CLOCK'Event and CLOCK = '1') then
		WRITE_ENABLE2 <= WRITE_ENABLE; -- METASTABLE!
		WRITE_ENABLE1 <= WRITE_ENABLE2; -- STABLE!
		WRITE_ENABLE0 <= WRITE_ENABLE1; -- STABLE!
		READ_ENABLE2 <= READ_ENABLE;
		READ_ENABLE1 <= READ_ENABLE2;
		READ_ENABLE0 <= READ_ENABLE1;
		-- Latch enable edges so they are not missed if STATE /= IDLE.
		-- WRITE_ENABLE0 is the older sample, so this matches a '1' to '0' change.
		if (WRITE_ENABLE0 = '1' and WRITE_ENABLE1 = '0') then
			WRITE_PENDING <= '1';
		elsif (READ_ENABLE0 = '1' and READ_ENABLE1 = '0') then
			READ_PENDING <= '1';
		end if;

		case STATE is
			when IDLE =>
				if (WRITE_PENDING = '1') then
					DATA <= WRITE_DATA;
					ADDR <= READWRITE_ADDR;
					READ <= '0';
					WRITE <= '0';
					STATE <= W1;
				elsif (READ_PENDING = '1') then
					DATA <= "ZZZZZZZZ";
					ADDR <= READWRITE_ADDR;
					READ <= '1';
					WRITE <= '0';
					STATE <= R1;
				elsif (DISPLAY_ENABLE = '1') then
					DATA <= "ZZZZZZZZ";
					ADDR <= DISPLAY_ADDR;
					READ <= '1';
					WRITE <= '0';
					STATE <= D1;
				else
					READ <= '0';
					WRITE <= '0';
				end if;
					
			when W1 =>
				WRITE <= '1';
				STATE <= W2;
			when W2 =>
				WRITE <= '0';
				WRITE_PENDING <= '0';
				STATE <= IDLE;
					
			when R1 =>
				READ_DATA <= DATA;
				READ_PENDING <= '0';
				STATE <= IDLE;

			when D1 =>
				DISPLAY_DATA <= DATA;
				STATE <= IDLE;

			when others =>
				STATE <= IDLE;
		end case;
	end if;			
end process;

WRITE_ENABLE and READ_ENABLE are two input signals which are controlled by the addressable register access and are thus set (and cleared) in the MPU clock domain. DISPLAY_ENABLE is set in the logic which builds up the content to display on screen and is thus set and cleared in the VGA clock domain. READ, WRITE, DATA and ADDR are all attached to the video SRAMs.

A state-machine is the core of this process. There are five possible states:

  • IDLE : Nothing to do
  • W1 : First write state
  • W2 : Second write state
  • R1 : Reading for the MPU
  • D1 : Reading for the display

The first thing that this process does is factor out the metastability problem inherent in using signals from slower clock domains by registering the MPU-clocked signals through three flip-flops in a chain. Nandland, a YouTuber who makes VHDL and Verilog tutorials, describes the problem and solution to metastability well in a video.
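The flip-flop chain can also be sketched on its own, separate from the state machine. The entity below is a hypothetical helper, not part of the real code; the names sync_edge, ASYNC_IN, RISE_PULSE and FALL_PULSE are assumptions. It exposes both edge directions so that either can be used by the consumer:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone synchroniser with edge detection
entity sync_edge is
	port (
		CLOCK : in std_logic;       -- destination (pixel) clock
		ASYNC_IN : in std_logic;    -- level driven from the other clock domain
		SYNC_OUT : out std_logic;   -- synchronised copy of ASYNC_IN
		RISE_PULSE : out std_logic; -- high for one CLOCK period after a 0 to 1 change
		FALL_PULSE : out std_logic  -- high for one CLOCK period after a 1 to 0 change
	);
end entity;

architecture behavioral of sync_edge is
	-- CHAIN(2) samples the asynchronous input and may go metastable;
	-- CHAIN(1) is the newest settled sample and CHAIN(0) the one before it.
	signal CHAIN : std_logic_vector (2 downto 0) := (others => '0');
begin
	process (CLOCK)
	begin
		if (CLOCK'Event and CLOCK = '1') then
			-- Shift the new sample in at the top of the chain
			CHAIN <= ASYNC_IN & CHAIN (2 downto 1);
		end if;
	end process;

	SYNC_OUT <= CHAIN (1);
	-- Only the two settled samples are compared; CHAIN(2) is never used directly
	RISE_PULSE <= '1' when (CHAIN (1) = '1' and CHAIN (0) = '0') else '0';
	FALL_PULSE <= '1' when (CHAIN (1) = '0' and CHAIN (0) = '1') else '0';
end architecture;
```

The important design point is that no logic other than the next flip-flop ever looks at the first register in the chain.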

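These enable signals have their edges latched so that the read or write action is still noticed even if the state machine is not in the IDLE state at the time of the transition. Note that, as written, the comparison matches the synchronised signal going from '1' back to '0' (WRITE_ENABLE0 being the older sample), which is the edge generated when the MPU-side process clears the enable.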

When one of the three actions is required, and at the same time as setting the new state, the video memory signals are also manipulated. This is done to minimize the number of states (and thus clock periods) needed:

  • On a write the data to write and its address are asserted on the buses. At this point the video memory write line cannot immediately be put in the asserted state as the buses need time to settle (the setup time).
    • In the first write state (W1) the write line is asserted.
    • In the second state (W2) the write line is de-asserted.
  • On a read the video memory READ line is immediately asserted along with the needed address. This can be done here because a read is not destructive; if the wrong address is momentarily on the bus then no harm will be done.
    • In the first and only read state (R1) the databus contents are latched.
  • A display read is basically the same as an MPU initiated read, except that the metastability problem is not factored in; video memory reads occur in step with the process initiating the read.

After putting it off for literally years, I’ve finally invested the time to learn about VHDL testbenches. These are analogous to unit tests in ordinary programming languages. They are used for simulation and, as in this case, to render waveforms based on stimulus:

The software used here is ModelSim, which was originally written by Mentor Graphics, though Altera/Intel also supply it with the Quartus software. This software has a very 90s feel to it, despite being only a few years old.

The inputs, and generated outputs, are scaled to real time with an MPU clock of 16MHz. Currently I have only looked at idealised simulations; FPGA internals are not factored into the generated waveforms.

The testbench used has two processes, one for the VGA pixel clock (CLOCK in the above code) and one for the MPU clock. The pixel clock process runs for 3 clock ticks, then asserts DISPLAY_ENABLE for one clock. This is the frequency with which the current display generator needs to be able to read bytes out of the video memory. The MPU clock runs for 3 clocks, then asserts WRITE_ENABLE for one clock, then runs for another 3 clocks before asserting READ_ENABLE for one clock. This is essentially the worst (or best, depending on your point of view) case for the 68000 bus cycle: a memory or IO access every 4 clock ticks. In reality bus cycles are needed for reading the instruction stream so reads and writes to IO will be less frequent.
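The stimulus described above could be sketched roughly as follows. This is not the testbench from the repository: the entity name is made up, the unit under test is deliberately left out because its port list is not reproduced here, and the half periods are rounded (the simulator resolution needs to be 1ps or finer for them to be honoured).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stimulus-only testbench; it drives the enables but
-- does not instantiate the memory controller itself.
entity stimulus_sketch_tb is
end entity;

architecture behavioral of stimulus_sketch_tb is
	-- 25.175MHz is roughly 39.722ns per cycle; 16MHz is exactly 62.5ns
	constant VGA_HALF_PERIOD : time := 19861 ps;
	constant MPU_HALF_PERIOD : time := 31250 ps;

	signal CLOCK : std_logic := '0';
	signal MPU_CLOCK : std_logic := '0';
	signal DISPLAY_ENABLE : std_logic := '0';
	signal WRITE_ENABLE : std_logic := '0';
	signal READ_ENABLE : std_logic := '0';
begin
	-- Free-running clocks; stop the simulation with a fixed run time
	CLOCK <= not CLOCK after VGA_HALF_PERIOD;
	MPU_CLOCK <= not MPU_CLOCK after MPU_HALF_PERIOD;

	-- Pixel clock domain: idle for 3 ticks, request a display read for 1
	process
	begin
		for I in 1 to 3 loop
			wait until rising_edge (CLOCK);
		end loop;
		DISPLAY_ENABLE <= '1';
		wait until rising_edge (CLOCK);
		DISPLAY_ENABLE <= '0';
	end process;

	-- MPU clock domain: alternate a write and a read, one access per 4 ticks
	process
	begin
		for I in 1 to 3 loop
			wait until rising_edge (MPU_CLOCK);
		end loop;
		WRITE_ENABLE <= '1';
		wait until rising_edge (MPU_CLOCK);
		WRITE_ENABLE <= '0';

		for I in 1 to 3 loop
			wait until rising_edge (MPU_CLOCK);
		end loop;
		READ_ENABLE <= '1';
		wait until rising_edge (MPU_CLOCK);
		READ_ENABLE <= '0';
	end process;
end architecture;
```

Because the two clocks are unrelated in frequency, the enables drift against the pixel clock over time, which is exactly the situation the synchroniser chain has to cope with.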

The memory addresses accessed in the simulation are as follows:

  • Display : an incrementing address starting from 0
  • Write : 0x2222 is written to
  • Read : 0x1111 is read from

When memory is read the testbench puts 0x33 on the video memory databus, regardless of whether it is an MPU-driven read or a display-driven read. Not shown in the above waveform screenshot are the resultant READ_DATA and DISPLAY_DATA signals, but they are propagated back from the memory databus DATA signal correctly.

It’s interesting to see that the DISPLAY_ENABLE initiated reads for the display are locked out (skipped over) when either a WRITE_ENABLE or a READ_ENABLE cycle is going on. You can see this clearly by virtue of the missing addresses in the video memory address sequence:

  1. 0x0000
  2. 0x0001
  3. (missing due to write)
  4. 0x0003
  5. (missing due to read)
  6. 0x0005

This is of course due to the ordering of the if-elsif-elsif-else conditionals in the process given above.

It’s also interesting to see the delay introduced when registering a read or write event. This is due not only to the multiple clocks taken by the state changes, but also to registering the READ_ENABLE and WRITE_ENABLE signals through the metastability-compensating flip-flop chain.

There are, no doubt, more corners to explore with the simulation of this process under the testbench. For example, it might be interesting to simulate what would happen if the DISPLAY_ENABLE signal was cycled more frequently than every 4 clock ticks.

The above described memory controller (the entity vramcontroller in the code) is only one part of the VGA display generator. It happens, though, to be the part I’ve spent the most time rewriting and tweaking.

The core VGA timing generator has remained mostly unchanged. This is the entity which generates the horizontal and vertical sync signals. It now generates one new signal, H_DISPLAYING_RANGE, which is set when the video memory needs to be read. This range begins 8 pixels (one byte) to the left of the left edge of the screen, because each read fetches the next set of eight pixels to be displayed. This change means memory is no longer read continually.

The address generator code has also been rewritten. I was previously not happy with the general mess of this code, but it is now nice and “clean”. This code is responsible for generating the DISPLAY_ADDR, which is fed into the vramcontroller. This generates the address for both text and bitmap modes. The main difference between the two modes is that in text mode the address must wrap back to the beginning of each character row to read that row again, only one pixel down the screen – except on the bottom of each character, when it continues on to the next character row.

The other difference is that in text mode the address is also incremented half way across each character square so that the attribute byte can be read.

The rest of the implementation is concerned with two things: dealing with the MPU reading and writing the registers, and the logic for driving the display, which makes use of the vramcontroller entity and others.

On the subject of addressable registers, there are currently 8 register addresses, 7 of which are in use:

  • 0 : DATA – used for reading and writing bytes to the video memory
  • 1 : DEFAULT_ATTRIBUTE – used when the attribute byte is 0, and by bitmap mode
  • 2 : MODE1 – if bit 0 is 1 then bitmap mode is enabled, otherwise text mode is used
  • 3 : MODE2 – currently unused
  • 4 and 5 : READWRITE_ADDR_HI/LO – a pointer to the address to read and write
  • 6 and 7 : OFFSET_ADDR_HI/LO – the address to reset to at the top of each screen, which is used to facilitate hardware scrolling

Only the DATA register is readable by the MPU, to save FPGA resources.

From an implementation standpoint, the only interesting angle is how the READWRITE_ADDR registers operate. Note that the same pointer is used for reading and writing video memory.

A critical detail is that the address must be incremented after a read or write, but only after the vramcontroller process has latched it. It does this by incrementing it prior to the next read or write:

process (MPU_CLOCK)
begin
	if (MPU_CLOCK'Event and MPU_CLOCK = '1') then
		-- This should only be needed if the MPU is delayed due to /DTACK
		LAST_WRITE <= WRITE;
		LAST_READ <= READ;
		
		if (WRITE_ENABLE = '1' or READ_ENABLE = '1') then
			-- This falling edge will propagate to the VRAM handler
			WRITE_ENABLE <= '0';
			READ_ENABLE <= '0';
			NEXT_READWRITE_ADDR <= READWRITE_ADDR + '1';
		end if;

		if (WRITE = '1' and LAST_WRITE = '0') then
			if (REG_SELECTS(REG_DATA) = '1') then
				WRITE_DATA <= D;
				-- Trigger a write in the VGA clock domain
				WRITE_ENABLE <= '1';
				-- Use the last write's next write
				READWRITE_ADDR <= NEXT_READWRITE_ADDR;
			elsif (REG_SELECTS(REG_DEFAULT_ATTRIBUTE) = '1') then
				DEFAULT_ATTRIBUTE <= D;
			elsif (REG_SELECTS(REG_MODE1) = '1') then
				BITMAP_MODE <= D (0);
			elsif (REG_SELECTS(REG_MODE2) = '1') then
				-- Unused
			elsif (REG_SELECTS(REG_READWRITE_ADDR_HI) = '1') then
				NEXT_READWRITE_ADDR <= D (7 downto 0) & NEXT_READWRITE_ADDR (7 downto 0);
			elsif (REG_SELECTS(REG_READWRITE_ADDR_LO) = '1') then
				NEXT_READWRITE_ADDR <= NEXT_READWRITE_ADDR (15 downto 8) & D;
			elsif (REG_SELECTS(REG_OFFSET_ADDR_HI) = '1') then
				OFFSET_ADDR <= D (7 downto 0) & OFFSET_ADDR (7 downto 0);
			elsif (REG_SELECTS(REG_OFFSET_ADDR_LO) = '1') then
				OFFSET_ADDR <= OFFSET_ADDR (15 downto 8) & D;
			end if;
		end if;

		if (READ = '1' and LAST_READ = '0') then
			if (REG_SELECTS(REG_DATA) = '1') then
				-- A read in the VGA clock domain fills out READ_DATA
				READ_ENABLE <= '1';
				READWRITE_ADDR <= NEXT_READWRITE_ADDR;
			end if;
		end if;
	end if;
end process;

NEXT_READWRITE_ADDR holds the address used on the next read or write; the READWRITE_ADDR is updated from it on each read or write, and it is incremented when READ_ENABLE or WRITE_ENABLE is set, which happens when the MPU accesses the DATA register.

The rest of this process is pretty simple stuff.

OFFSET_ADDR_HI/LO is updated on a write to its register addresses. This is the address that the DISPLAY_ADDR is set to at the top of the frame.

MODE1 and MODE2 are 16 bits of arbitrary “mode”, though only the first bit of MODE1 is currently used; it is used to enable BITMAP_MODE.

There is one quirk with the operation of the read function via the DATA register: the first read returns a nonsense value. This is because the value presented to the MPU’s databus is the last value read, and on the first read nothing has yet been read. This quirk is also present on the V9958, and presumably other ICs up and down this series which share this mechanic. One possible solution to this “problem” would be to introduce wait states, so that the MPU is paused while the read happens, but this seems like a sub-optimal solution to a very minor quirk that is easy to work around in MPU code.

Read-back has so far been exercised by implementing a scrolling mechanism for the text mode console. While there are a few different methods to implement scrolling, the method I’ve chosen is to use the hardware scrolling functionality in the FPGA to shift the view up, but for only one screen worth of text. After a whole screen has been scrolled in this way, the last written screen is copied back to the now hidden screen area before the offset is reset to the top. It works quite well, and is much nicer than my old V9958 text console, which had to copy the whole screen of text up one line on each and every scrolled line.

The rest of the VHDL is concerned with the actual building up of the display. This is surprisingly straightforward. First up the text mode handling:

-- Text mode: ignore flyback and leave memory bus idle
if (BITMAP_MODE = '0' and H_DISPLAYING_RANGE = '1' and V_VISIBLE = '1') then
	if (H_INTERCHAR = "000") then
		-- Clear the font ROM clock for the previous read
		FONT_ROM_CLOCK <= '0';
		-- Request a memory read, answer in DISPLAY_DATA, in 3 ticks for the character
		DISPLAY_ENABLE <= '1';
	elsif (H_INTERCHAR = "001") then
		-- Clear the memory read request
		DISPLAY_ENABLE <= '0';
	elsif (H_INTERCHAR = "011") then
		-- The data obtained forms the font ROM read address
		DISPLAY_DATA_FOR_ROM <= DISPLAY_DATA;
	elsif (H_INTERCHAR = "100") then
		-- Request a memory read, answer in DISPLAY_DATA, in 3 ticks for the attribute
		DISPLAY_ENABLE <= '1';
	elsif (H_INTERCHAR = "101") then
		-- Clear the memory read request
		DISPLAY_ENABLE <= '0';
		-- Request a memory read on the font ROM, answer in FONT_ROM_DATA in next clock
		FONT_ROM_CLOCK <= '1';
	elsif (H_INTERCHAR = "111") then
		if (NEXT_READWRITE_ADDR (15 downto 1) = DISPLAY_ADDR (15 downto 1) and FRAME_COUNT (5) = '1') then
			-- Swap attributes for the cursor
			if (DISPLAY_DATA /= x"00") then
				ATTRIBUTE_BYTE <= DISPLAY_DATA (3 downto 0) & DISPLAY_DATA (7 downto 4);
			else
				-- Memory attribute is 0, so swap the two halves of the default instead
				ATTRIBUTE_BYTE <= DEFAULT_ATTRIBUTE (3 downto 0) & DEFAULT_ATTRIBUTE (7 downto 4);
			end if;
		else
			-- Use the DEFAULT_ATTRIBUTE if memory attribute is 0
			if (DISPLAY_DATA /= x"00") then
				ATTRIBUTE_BYTE <= DISPLAY_DATA;
			else
				ATTRIBUTE_BYTE <= DEFAULT_ATTRIBUTE;
			end if;
		end if;
		-- BYTE_TO_DISPLAY is set by text and bitmap modes
		BYTE_TO_DISPLAY <= FONT_ROM_DATA;
	end if;
end if;

The sequencing is driven by the low 3 bits of the horizontal pixel count, i.e. the pixels across the face of an 8 pixel wide character. The processing is broken up cleanly into “left hand” and “right hand” halves of each character: on the left hand (first) half, the character to be drawn is read from memory by toggling the DISPLAY_ENABLE signal. On the right hand half of the character the attribute byte is read. An additional task is to read the character ROM, which is held in FPGA RAM bits; the address to read being determined from the character to show and the low 3 bits of the vertical position.

Instead of drawing the cursor position by using the inverted space character at 0x90, which is how it was previously drawn, the cursor is now drawn by swapping the foreground and background attributes for the character at the NEXT_READWRITE_ADDR position, i.e. where the next read or write will land. Additional conditionals are used to get the attributes from the DEFAULT_ATTRIBUTE register if the attribute byte read out of video memory is zero.
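The attribute selection can also be expressed as a small function, which makes the two decisions (default or memory attribute, swapped or not) easier to see. This is only a sketch: the package name, the function name EFFECTIVE_ATTRIBUTE and the AT_CURSOR parameter are hypothetical, and it picks the attribute first and swaps second so that the default attribute is swapped in the same way as one read from memory.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; not part of the published code
package attribute_helpers is
	function EFFECTIVE_ATTRIBUTE (
		MEMORY_ATTRIBUTE : std_logic_vector (7 downto 0);
		DEFAULT_ATTRIBUTE : std_logic_vector (7 downto 0);
		AT_CURSOR : std_logic
	) return std_logic_vector;
end package;

package body attribute_helpers is
	function EFFECTIVE_ATTRIBUTE (
		MEMORY_ATTRIBUTE : std_logic_vector (7 downto 0);
		DEFAULT_ATTRIBUTE : std_logic_vector (7 downto 0);
		AT_CURSOR : std_logic
	) return std_logic_vector is
		variable CHOSEN : std_logic_vector (7 downto 0);
	begin
		-- A zero attribute byte (black on black) means "use the register"
		if (MEMORY_ATTRIBUTE /= x"00") then
			CHOSEN := MEMORY_ATTRIBUTE;
		else
			CHOSEN := DEFAULT_ATTRIBUTE;
		end if;

		-- At the cursor, exchange the foreground and background nibbles
		if (AT_CURSOR = '1') then
			return CHOSEN (3 downto 0) & CHOSEN (7 downto 4);
		end if;
		return CHOSEN;
	end function;
end package body;
```

In the listing above, AT_CURSOR would correspond to the address comparison combined with the FRAME_COUNT bit that makes the cursor blink.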

BYTE_TO_DISPLAY holds the 8 bits representing the 8 pixels that will find their way onto the screen. This signal is also populated by the bitmap generator code:

-- Bitmap mode: ignore flyback and leave memory bus idle
if (BITMAP_MODE = '1' and H_DISPLAYING_RANGE = '1' and V_VISIBLE = '1') then
	if (H_INTERCHAR = "000") then
		-- Request a memory read, answer in DISPLAY_DATA, in 3 ticks for the pixel byte
		DISPLAY_ENABLE <= '1';
	elsif (H_INTERCHAR = "001") then
		-- Clear the memory read request
		DISPLAY_ENABLE <= '0';
	elsif (H_INTERCHAR = "111") then
		-- BYTE_TO_DISPLAY is on the common path for text and bitmap modes
		BYTE_TO_DISPLAY <= DISPLAY_DATA;
		-- Use the register for the attribute in bitmap mode
		ATTRIBUTE_BYTE <= DEFAULT_ATTRIBUTE;
	end if;
end if;

Quite a bit simpler than text mode! It simply has to read a byte from the video memory and stick it into BYTE_TO_DISPLAY. The attributes used (what colours to show for a one or zero in the read bitmap) are obtained from the DEFAULT_ATTRIBUTE register.

The final task, for the display at least, is to actually feed the BYTE_TO_DISPLAY into the RED, GREEN and BLUE external outputs:

process (VGA_CLOCK)	
begin
	if (VGA_CLOCK'Event and VGA_CLOCK = '1') then
		if (VISIBLE = '1') then
			-- Read the pixel state out of BYTE_TO_DISPLAY, reading one bit at a time, left to right 
			GLYTH_PIXEL <= BYTE_TO_DISPLAY (to_integer(unsigned(not H_INTERCHAR)));
		else
			-- Not a visible pixel; make pixel always off
			GLYTH_PIXEL <= '0';
		end if;
	end if;
end process;

-- Cater for visible or not, on or off, and bright or dark pixels
RED <=	F_RED & F_RED & F_RED 	when (GLYTH_PIXEL = '1' and F_BRIGHT = '1' and VISIBLE = '1') else
	'0' & F_RED & F_RED	when (GLYTH_PIXEL = '1' and F_BRIGHT = '0' and VISIBLE = '1') else
	B_RED & B_RED & B_RED	when (GLYTH_PIXEL = '0' and B_BRIGHT = '1' and VISIBLE = '1') else
	'0' & B_RED & B_RED	when (GLYTH_PIXEL = '0' and B_BRIGHT = '0' and VISIBLE = '1') else
	"000";

Green and blue are obviously managed the same way.

The first task is to extract the pixel state into GLYTH_PIXEL. The name (“glyph” being the correct spelling) originally referred to the font data, but the signal also carries the current pixel from the bitmap byte being drawn.

There are five “modes” for a pixel, which set the ultimate colour to be shown:

  1. The source pixel is set and the foreground brightness attribute is set
  2. The source pixel is set and the foreground brightness attribute is clear
  3. The source pixel is clear and the background brightness attribute is set
  4. The source pixel is clear and the background brightness attribute is clear
  5. The pixel to be drawn is out of the visible range (not in the 640×480 window)

If a pixel is in the bright state the most significant bit on the red, green and blue components comes from the attribute, otherwise it is always zero.
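Since the three colour channels follow the same rule, the five cases can be folded into one small reusable entity. This is a sketch of an alternative arrangement rather than what the published code does; the entity name colour_channel and its port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical per-channel colour generator, instantiated once each
-- for red, green and blue
entity colour_channel is
	port (
		PIXEL : in std_logic;     -- '1' for foreground, '0' for background
		VISIBLE : in std_logic;   -- '1' inside the 640x480 window
		F_COLOUR : in std_logic;  -- foreground attribute bit for this channel
		F_BRIGHT : in std_logic;  -- foreground brightness attribute bit
		B_COLOUR : in std_logic;  -- background attribute bit for this channel
		B_BRIGHT : in std_logic;  -- background brightness attribute bit
		LEVEL : out std_logic_vector (2 downto 0)
	);
end entity;

architecture behavioral of colour_channel is
	signal COLOUR : std_logic;
	signal BRIGHT : std_logic;
begin
	-- Pick the foreground or background attribute bits for this pixel
	COLOUR <= F_COLOUR when (PIXEL = '1') else B_COLOUR;
	BRIGHT <= F_BRIGHT when (PIXEL = '1') else B_BRIGHT;

	-- The top bit is only driven when the bright bit is set; outside
	-- the visible window the channel is forced to black
	LEVEL <= "000" when (VISIBLE = '0') else
		(COLOUR and BRIGHT) & COLOUR & COLOUR;
end architecture;
```

For a set pixel with F_COLOUR = '1' this yields "111" when bright and "011" when not, matching the first two branches of the RED assignment above.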

Well, that is quite a bit of explanation. Hopefully it makes some sense. Because I’m quite pleased with the code – the first time I can say that about my VHDL – the code is available in my GitHub account. Note that I have only committed the core VHDL file and the testbench, which is definitely a work in progress, along with the 3 font files in MIF format. The design is not targeted at a particular FPGA and it should be possible to use the design with most FPGAs. Tweaking the code to use different memory arrangements (eg. one large 512KB SRAM) should not be difficult.

I’ll end with a couple of pictures. First up an image from the bitmap mode:

This image, originally a JPG, was converted to the BMP format, and a simple reader was written to copy the contents of the file’s pixel data to the DATA register on the VGA controller. The only chore involved in doing this is that BMP data is arranged with the origin at the bottom left, which means lines have to be loaded into memory in reverse. I’ve also managed to break yet another USB Compact Flash reader so will have to buy another one before I can load data via Compact Flash. Fortunately the two colour 640×480 BMP file is just small enough to fit within the two 32KB EEPROMs present on the MINI000 board. At some point I need to write a file transfer routine that can transfer files across a UART channel, probably using the same file transfer server that I wrote for MAXI09OS.

Here is a picture showing the attribute byte in use:

At some point I will modify the 68K code to make use of coloured text when running monitor commands. For instance, different coloured text for each column in the dump command’s output might look nice.

All in all, I’m very pleased with how my VGA implementation has matured and become much more dependable. I could spend still more time working on improving it further, but I think instead the next thing to do is to work on my Surface Mount Technology soldering skills…
