Register Unit

Overview

The MUPS/16 CPU has 16 registers in total, each 16 bits wide. Because the instruction encoding has only 3 bits for each of the source and destination registers, only the first 8 registers are addressable in most instructions. The upper 8 are system registers, and can only be manipulated by certain instructions (in particular, the mfs and mts instructions, which move values between the lower and upper registers).

The 8 user registers are:

Index Mnemonic Description
0 r0 Zero register. Always zero, no matter what value is written to it
1-4 r1-r4 General-purpose user registers
5 sp Stack pointer
6 r5 General-purpose user register
7 ra Return address register. Return address is saved in this register by jal and jalr instructions

The 8 system registers are:

Index Mnemonic Description
8 spc Saved PC. The pc register is copied to this register before a trap handler is invoked
9-11 s1-s3 Used to transfer arguments to trap handlers, and to pass arguments to the rfe instruction
12 tsp Task state pointer. Has no architectural significance, but intended to hold a pointer to a memory area in which current process state can be saved on a trap
13 tmp Temporary register. Used by the CPU to hold intermediate results in some instructions
14 pc Program counter. Contains the address of the next instruction to execute
15 flg State flags. Used to hold the privilege level (user or system), whether the boot process is complete, and the current IRQ mask
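As a quick cross-check of the two tables, the register map can be written down as a lookup table. This is a minimal Python sketch: the names `REGISTERS`, `INDEX_OF` and `directly_addressable` are hypothetical and not part of any MUPS/16 tooling. It only shows why a 3-bit register field reaches indices 0-7 and nothing above.

```python
# Register indices and mnemonics, copied from the user and system register tables.
USER_REGS = {0: "r0", 1: "r1", 2: "r2", 3: "r3", 4: "r4", 5: "sp", 6: "r5", 7: "ra"}
SYSTEM_REGS = {8: "spc", 9: "s1", 10: "s2", 11: "s3",
               12: "tsp", 13: "tmp", 14: "pc", 15: "flg"}

REGISTERS = {**USER_REGS, **SYSTEM_REGS}
INDEX_OF = {name: idx for idx, name in REGISTERS.items()}

FIELD_BITS = 3  # width of a register field in most instructions


def directly_addressable(name: str) -> bool:
    """True if the register's index fits in a 3-bit field (0-7)."""
    return INDEX_OF[name] < (1 << FIELD_BITS)
```

With this table, `directly_addressable("ra")` is true while `directly_addressable("spc")` is false, which is why the system registers have to go through mfs and mts.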

The odd split between r1-r4 and r5 is an artifact of the original design, which had pc as a user-addressable register at index 5 (and indeed, the PCB for the register board still shows that). When the PC was made into a system register on the control board (so that it could auto-increment without needing the ALU) it freed up a slot for another general-purpose register.

Implementation Details

The register file is implemented on two boards. Registers 0 to 13 are on the register board, which is made up of four major parts:

  • control-signal decoding logic
  • reset logic, to initialise all registers to zero
  • some single-byte and sign-extension logic (used when loading a single byte from memory into a register)
  • the flipflops and output drivers for each of the 14 registers

The pc and flg registers are separate, and implemented in the control unit, as they need extra connections for their inputs and outputs.

There are four buses on the main register board: the external data input bus, the A and B output buses, and an internal data bus. The reset and sign-extension units drive the internal data bus, and all registers read their content from it. The sign-extension unit is the only part that actually reads the external data bus; it either passes the data through to the internal bus or modifies it as necessary.

Overall schematic

Schematic

There is also a PDF of this image, and the original KiCad schematic file will be put on GitHub once I've fixed the known problems.

Control signal decoding logic

The register unit has 17 control lines:

  • ~SET: active-low signal. If low, causes the register with the index given in the four SETIDX lines to latch a new value on the next clock.
  • ~OUTA: active-low signal. If low, causes the register with the index given in the four OUTAIDX lines to output its value to the A bus.
  • ~OUTB: same as above, for the B bus.
  • ~SEBYTE: active-low signal. If low, causes only the lower 8 bits of the data bus to be used on the next latch. The contents of the upper 8 bits are determined by the ~SEUNSIGNED signal.
  • ~SEUNSIGNED: active-low signal. If this is low, and ~SEBYTE is low, then the upper 8 bits latched on the next clock cycle will all be zero. If ~SEUNSIGNED is high (and ~SEBYTE is low), the upper 8 bits are sign-extended from bit 7 of the data bus.
  • ~RESET: active-low signal. If low, causes all the registers' SET signals to go low, and also asserts all zeros onto the internal data bus. This will cause all registers to reset to zero on the next clock cycle.
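Taken together, the control lines define what the board does on each clock edge. The following is a minimal behavioural sketch in Python, not a gate-level model; the class and function names are hypothetical, active-low lines are represented as 0 when asserted, and an undriven bus is shown as `None`. Treating writes to r0 as ignored is an assumption taken from the register table, since how the board enforces that is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NUM_REGS = 14  # registers 0-13 are on the register board; pc and flg live elsewhere


@dataclass
class Controls:
    """Register-board control lines. Active-low lines are 0 when asserted."""
    set_n: int = 1
    set_idx: int = 0
    outa_n: int = 1
    outa_idx: int = 0
    outb_n: int = 1
    outb_idx: int = 0
    sebyte_n: int = 1
    seunsigned_n: int = 1
    reset_n: int = 1


def byte_extend(data: int, sebyte_n: int, seunsigned_n: int) -> int:
    """Apply the ~SEBYTE / ~SEUNSIGNED rules to a 16-bit data-bus value."""
    if sebyte_n:  # not a byte load: pass all 16 bits through
        return data & 0xFFFF
    fill = seunsigned_n & ((data >> 7) & 1)  # 0 for unsigned loads, else bit 7
    return (0xFF00 if fill else 0x0000) | (data & 0x00FF)


@dataclass
class RegisterBoard:
    regs: List[int] = field(default_factory=lambda: [0] * NUM_REGS)

    def outputs(self, c: Controls) -> Tuple[Optional[int], Optional[int]]:
        """Values on the A and B buses; None means the bus is not driven."""
        def read(strobe_n: int, idx: int) -> Optional[int]:
            return self.regs[idx] if strobe_n == 0 and idx < NUM_REGS else None
        return read(c.outa_n, c.outa_idx), read(c.outb_n, c.outb_idx)

    def clock(self, c: Controls, data_in: int) -> None:
        """One rising clock edge."""
        if c.reset_n == 0:
            # ~RESET forces every ~SET low and zeros the internal bus.
            self.regs = [0] * NUM_REGS
        elif c.set_n == 0 and c.set_idx < NUM_REGS:
            # Assumption: writes to r0 are discarded so it always reads zero.
            if c.set_idx != 0:
                self.regs[c.set_idx] = byte_extend(data_in, c.sebyte_n, c.seunsigned_n)
```

For example, clocking in `0x0080` with ~SET and ~SEBYTE low and ~SEUNSIGNED high leaves `0xFF80` in the target register.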

Schematic

This is fairly simple logic. Most of the work here is in deciding which registers should have their ~SET, ~OUTA or ~OUTB lines set. Each of these lines is active-low (the naming on the schematic is misleading, as I didn't mark them as active-low; I may go back and fix that). Since we have 14 registers, we have 14 output lines for each of the three signals.

As each of the set/outa/outb indices is a 4-bit value, the logical solution would be to use a 4-16 demultiplexer, but I couldn't find any of these in the HCT logic series. Fortunately, the 74HCT138 3-8 demultiplexer has a nice feature that makes it trivial to chain two together into a single 4-16 decoder: each 138 chip has 3 separate enable lines, two of which are active-low and one of which is active-high. If any of the three is not asserted, the chip outputs high on all outputs, regardless of the value of the select bits. The truth table looks something like:

E1 ~E2 ~E3 S0 S1 S2 Output
L X X X X X HHHHHHHH
X H X X X X HHHHHHHH
X X H X X X HHHHHHHH
H L L L L L LHHHHHHH
H L L H L L HLHHHHHH
H L L L H L HHLHHHHH
H L L H H L HHHLHHHH

etc.

Since our outputs are all active-low, this means that if any of the enable lines is deasserted, all the outputs will be off.
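The truth table can be captured in a few lines of Python. This sketch follows the enable naming used in the table above (E1 active-high, ~E2 and ~E3 active-low); vendor datasheets may label the enable pins differently, so treat the names as this document's convention rather than the datasheet's. Outputs are listed Y0 to Y7, with 1 for H and 0 for L.

```python
from typing import List


def hct138(e1: int, e2_n: int, e3_n: int, s0: int, s1: int, s2: int) -> List[int]:
    """3-to-8 decoder: one active-low output when all three enables are asserted."""
    outputs = [1] * 8  # everything high (inactive) by default
    if e1 == 1 and e2_n == 0 and e3_n == 0:
        outputs[s0 | (s1 << 1) | (s2 << 2)] = 0  # pull the selected line low
    return outputs


def as_levels(outputs: List[int]) -> str:
    """Render outputs in the H/L style of the truth table."""
    return "".join("H" if y else "L" for y in outputs)
```

`as_levels(hct138(1, 0, 0, 1, 0, 0))` gives `HLHHHHHH`, matching the second enabled row of the table.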

To chain two demultiplexers together we simply connect the 4th (most significant) bit of the select lines, bit 3, to ~E3 on the first chip and to E1 on the second. If the input index is between 0 and 7 then bit 3 will be zero, so the first demux is enabled and outputs low on one of output lines 0-7, while the second demux is disabled and outputs high on all of lines 8-15. If the input index is between 8 and 15 then bit 3 will be high, and we get high on outputs 0-7 and low on one of 8-15.

We also take advantage of one of the remaining unused enable lines on each chip to handle the overall ~SET, ~OUTA and ~OUTB lines. By wiring the appropriate line to ~E2 on both 138 chips in a pair, we can ensure that the demuxes do not output low on any output line unless the overall signal is low (asserted).
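Putting the two chips and the overall strobe together gives a 4-to-16 active-low decoder. In this hypothetical Python sketch, the enables that the text leaves unmentioned (E1 on the first chip, ~E3 on the second) are assumed to be tied permanently to their asserted levels; the function names are illustrative only.

```python
from typing import List


def hct138(e1: int, e2_n: int, e3_n: int, select: int) -> List[int]:
    """3-to-8 decoder using this document's enable naming; outputs Y0..Y7."""
    outputs = [1] * 8
    if e1 == 1 and e2_n == 0 and e3_n == 0:
        outputs[select & 0b111] = 0
    return outputs


def decode_4_to_16(strobe_n: int, index: int) -> List[int]:
    """Active-low one-of-16 decode from two 138s.

    strobe_n is the overall ~SET, ~OUTA or ~OUTB line, wired to ~E2 on both chips.
    """
    bit3 = (index >> 3) & 1
    # First chip: E1 assumed tied high, bit 3 on ~E3 -> enabled for indices 0-7.
    low = hct138(e1=1, e2_n=strobe_n, e3_n=bit3, select=index)
    # Second chip: bit 3 on E1, ~E3 assumed tied low -> enabled for indices 8-15.
    high = hct138(e1=bit3, e2_n=strobe_n, e3_n=0, select=index)
    return low + high
```

With the strobe low, exactly one of the 16 lines goes low; with the strobe high, all 16 stay high whatever the index.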

There is also a solitary NOT gate here, to provide an inverted RESET signal. We use this inverted value in the sign-extension unit below.

Reset logic

When the register unit is first powered on, each register will hold a random value. This is a particular problem for the PC register, as we don't really want to start executing at a random address. We use the reset logic here to ensure that all registers can be set to zero by setting ~RESET low and pulsing the clock.

The first part of the reset logic works by setting all registers' ~SET lines low if the ~RESET signal is low. We do that by simply ANDing each of the ~SETx outputs from the control decoding logic above with the ~RESET signal:
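Because both inputs are active-low, ANDing them behaves as an OR of the underlying active-high conditions: a register is written if its own ~SETx is asserted or if ~RESET is asserted. A short Python sketch of that gating (the function name is hypothetical):

```python
from typing import List


def gate_set_lines(set_n_lines: List[int], reset_n: int) -> List[int]:
    """AND each active-low ~SETx with ~RESET, one 2-input gate per register.

    A low ~RESET forces every output low (every register latches);
    a high ~RESET leaves the decoded ~SETx lines unchanged.
    """
    return [s & reset_n for s in set_n_lines]
```

For instance, `gate_set_lines([1] * 14, 0)` returns fourteen zeros, so every register latches on the next clock.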

Schematic

The second part of reset logic ensures that the internal data bus is set to zero while ~RESET is low. We do this with a pair of 74HCT541 line drivers, each of which has all its inputs tied to GND, and ~RESET wired to its active-low G2 input:
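The reset drivers behave like a constant-zero source that only appears on the bus during reset. In this Python sketch a 74HCT541 is modelled as an octal buffer that drives its inputs only when both active-low output enables are low, returning `None` for high impedance. Tying the other enable low is an assumption, as is every function name here.

```python
from typing import Optional


def hct541(oe1_n: int, oe2_n: int, inputs: int) -> Optional[int]:
    """Octal buffer: drives its 8 inputs while both active-low enables are low."""
    if oe1_n == 0 and oe2_n == 0:
        return inputs & 0xFF
    return None  # outputs are high impedance


def reset_drivers(reset_n: int) -> Optional[int]:
    """Two 541s with inputs tied to GND and ~RESET on one enable of each."""
    low_byte = hct541(0, reset_n, 0x00)   # other enable assumed tied low
    high_byte = hct541(0, reset_n, 0x00)
    if low_byte is None or high_byte is None:
        return None  # not driving the internal bus
    return (high_byte << 8) | low_byte
```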

Schematic

Note that there's an obvious optimisation here, which I probably should have taken: we don't really need to zero out all registers on a reset. We could have just zeroed out the PC register, and let software take care of setting the other registers before they're used. This would have saved four 74HCT08 chips (we could just use a pair of the tiny 74HCT1G08 chips, one for the ~SETF6 signal and one for sign-extension).
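The chip count follows from simple gate arithmetic. Assuming, as an inference from the text rather than something it states, that the board needs one AND gate per register ~SET line plus one for the sign bit, a quick Python check:

```python
import math

SET_LINES = 14     # one gated ~SETx line per register on the board (inferred)
SIGN_GATES = 1     # the ~SEUNSIGNED AND bit-7 gate in the sign-extension unit
GATES_PER_08 = 4   # a 74HCT08 is a quad 2-input AND

full_reset_chips = math.ceil((SET_LINES + SIGN_GATES) / GATES_PER_08)
pc_only_single_gates = 1 + SIGN_GATES  # one gate for the PC's ~SET, one for the sign bit
```

Fifteen gates need four quad packages, whereas a PC-only reset would need just two single-gate parts.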

Sign extension logic

The sign extension logic is fairly simple:

Schematic

The byte and sign-extension logic can only affect the upper 8 bits of the values off the data bus, and sign-extension is only done if ~SEBYTE is low. This is implemented by a pair of quad 2-1 74HCT157 multiplexers, with the ~SEBYTE line connected to the select line of each. This has the effect of either passing the upper byte from the data bus through if ~SEBYTE is high, or outputting the sign bit on all 8 upper lines otherwise.

The sign bit is computed simply by ANDing together the ~SEUNSIGNED line and bit 7 from the data bus. If ~SEUNSIGNED is low (asserted) then the sign bit will always be zero; if it is high then the value of bit 7 is used as the sign bit.
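The whole unit reduces to a few bit operations. This Python sketch mirrors the structure described above (an AND gate computing the fill bit, then a 2-1 multiplexer per upper bit selected by ~SEBYTE); the function name is hypothetical.

```python
def sign_extend_unit(data: int, sebyte_n: int, seunsigned_n: int) -> int:
    """Model of the byte/sign-extension logic on a 16-bit data-bus value."""
    low = data & 0x00FF                        # lower byte always passes through
    fill = seunsigned_n & ((data >> 7) & 1)    # AND of ~SEUNSIGNED and bit 7
    if sebyte_n:                               # 157 select high: pass upper byte
        high = (data >> 8) & 0xFF
    else:                                      # 157 select low: replicate fill bit
        high = 0xFF if fill else 0x00
    return (high << 8) | low
```

So `0x1280` passes through unchanged for a word load, becomes `0xFF80` for a signed byte load and `0x0080` for an unsigned one.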

The last part of this unit is a 74HCT541 line driver, which is there to ensure that we don't attempt to drive the internal data bus at the same time as the reset logic is doing so. The output of the two 157 multiplexers is fed through the line driver, which is enabled by the inverted reset signal, so that it drives the bus only when the reset unit isn't doing so.
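The point of the inverted reset is that exactly one source drives the internal bus at any time. This Python sketch treats the sign-extension output path as a single logical driver and raises an error on contention or a floating bus; the function name and the error are illustrative only.

```python
def internal_bus(reset_n: int, se_output: int) -> int:
    """Resolve the internal data bus from the reset and sign-extension drivers."""
    inverted_reset = 1 - reset_n  # output of the solitary NOT gate
    drivers = []
    if reset_n == 0:              # reset 541s: active-low enable tied to ~RESET
        drivers.append(0x0000)
    if inverted_reset == 0:       # sign-extension 541: enable tied to NOT ~RESET
        drivers.append(se_output & 0xFFFF)
    if len(drivers) != 1:
        raise RuntimeError("bus contention or floating bus")
    return drivers[0]
```

Because the two enables are complements of each other, the error branch can never be reached for either value of ~RESET.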

I'm not 100% sure that the byte and sign-extension logic really belongs in the register unit: some sign extension will also be needed on immediate values output from the control unit, so perhaps it would have been better to have a dedicated sign-extension unit, but for now it's here.

Flipflops and output drivers

Single register schematic

PDF of this image.

This is the guts of the whole register unit. Each register consists of a pair of 8-bit 74HCT377 flipflops, and four 74HCT541 bus output drivers. These are grouped into one set of three chips for the low 8 bits and another for the high 8 bits, which simplified the PCB layout significantly.

Each flipflop is connected to the relevant bits of the internal data bus. The 377 chips do not have tri-state outputs, so their outputs are connected directly to two line drivers, one driving the A bus and the other the B bus.

Single register

Each register has three control lines:

  • a SET line that controls whether the register latches a new value on the next clock.
  • OUTA and OUTB lines that control whether the corresponding line drivers drive the A or B bus, or remain in a high-impedance state.
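A single register can be sketched behaviourally as below. The 74HCT377 latches on the rising clock edge only when its active-low data-enable input is low, and its outputs are always driven; the 541s then gate those outputs onto each bus. The class name is hypothetical, and the two 8-bit halves are merged into one 16-bit value for brevity.

```python
from typing import Optional, Tuple


class Register16:
    """One register: two 74HCT377 flip-flops feeding A-bus and B-bus 541 drivers."""

    def __init__(self) -> None:
        self.q = 0  # 377 outputs: always driven, and visible on the LEDs

    def clock(self, set_n: int, internal_bus: int) -> None:
        """Rising clock edge: latch the internal bus only if ~SET is low."""
        if set_n == 0:
            self.q = internal_bus & 0xFFFF

    def drive(self, outa_n: int, outb_n: int) -> Tuple[Optional[int], Optional[int]]:
        """Values placed on the A and B buses; None means high impedance."""
        a = self.q if outa_n == 0 else None
        b = self.q if outb_n == 0 else None
        return a, b
```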

Since the flipflops are always outputting, they are also directly connected to 8 LEDs to display their contents. The LEDs used here are low-power, high-brightness green Wurtz diodes. I was a little concerned that driving these directly from the flipflops might cause problems, but so far it doesn't seem to have done. The LED current is limited by 4.7 kΩ resistors, in neat little 4-unit resistor networks.
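A rough estimate shows why driving the LEDs directly is unlikely to stress the flipflop outputs. This Python sketch assumes a 5 V supply (HCT is a 5 V logic family) and an LED forward voltage somewhere between 2 V and 3 V; the actual forward voltage of these parts is not given, and the output stage's own voltage drop is ignored, so the real current will be slightly lower.

```python
SUPPLY_V = 5.0      # assumed logic supply
R_OHMS = 4700.0     # series resistor from the text


def led_current_ma(forward_v: float, supply_v: float = SUPPLY_V, r: float = R_OHMS) -> float:
    """Ohm's law estimate of LED current in mA, ignoring driver output drop."""
    return (supply_v - forward_v) / r * 1000.0


for vf in (2.0, 3.0):
    print(f"Vf = {vf} V -> {led_current_ma(vf):.2f} mA")
```

Under these assumptions each LED draws roughly 0.4 to 0.6 mA, well under a milliamp per output.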

PCB layout

The PCB is a two-layer, 300mm by 150mm board, with a ground fill on the bottom layer. I could have made it slightly smaller, but I liked the nice round numbers of these dimensions, and it should make sizing the remaining boards in the CPU easier. I would love to have gone to 4 layers for this, as it would have massively simplified routing (getting power and ground everywhere it was needed made the signal traces far more complicated than I'd like), but the cost of a 4-layer board this large was prohibitive.

Routing was done by hand; I didn't even try using an auto-router for this. Since there is a lot of repetition in the layout I used the excellent replicate layout plugin. With this I could lay out a single register as I wanted it and then duplicate the layout for the other 12 registers. The only restriction was that the process of updating the other registers after a change would clobber any other traces or silkscreening that fell within the bounding rectangle containing all the components for the dependent registers. This meant that I had to have the layout of the registers pretty much perfect before adding the long top-to-bottom traces carrying the buses and control lines. In practice this wasn't a major problem, and given how much time the plugin saved me, a very minor inconvenience.

Problems

As this was only the second PCB I've ever designed, I fully expect the design to have issues, especially considering the length of some of the traces and the incomplete ground fill, but thus far testing has gone pretty well. There are three problems that I know of so far:

  • I accidentally swapped lines 14 and 15 of the data bus on the input to the sign extension multiplexer, so those two bits are swapped when reading data. Fortunately, since this isn't an internal connection I can work around this by swapping the wires connecting those lines to the external data bus.
  • More seriously, there's a transient(ish) issue with bit 11 getting stuck on when reading, affecting all registers. This is almost certainly down to my shoddy soldering, rather than a design problem, but I've not nailed it down yet. I suspect a bad joint that's leaving an input to one of the register output multiplexers floating somewhere.
  • There's also a cosmetic error in the silk-screen layer, with the four multiplexers at the top that control the output lines for each register incorrectly labelled as SET DECODING.
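The workaround for the first problem relies on the swap being its own inverse. This Python sketch (hypothetical function name) models the mis-wired bits; applying it twice, once on the board and once in the crossed external wiring, returns the original value.

```python
def swap_bits_14_15(value: int) -> int:
    """Exchange bits 14 and 15 of a 16-bit value, as the mis-wired input does."""
    b14 = (value >> 14) & 1
    b15 = (value >> 15) & 1
    rest = value & 0x3FFF  # bits 0-13 are unaffected
    return rest | (b14 << 15) | (b15 << 14)
```

For example, `0x4123` reads as `0x8123` on the unmodified board, and crossing the two external wires restores `0x4123`.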