Notes on changes and improvements to control

Posted on Fri 12 July 2024 in MUPS16

Random notes on things I've noticed over the last few weeks that could be improved:

PC

I think I made a mistake with the j and jal instructions. They currently work like MIPS: they leave the top 4 bits of PC unchanged and replace the bottom 12 bits with the 11-bit immediate shifted left by one. This means that jumps cannot cross a 4KB boundary. That's been fine with the small bits of code I've written so far, because they all sat within a single 4KB chunk and I could tell whether a boundary was being crossed. Having spent a few evenings working on the LLVM backend, though, this is going to be a problem. We don't know where the code will end up until the linker has finished, so unless we want to support relaxation we have to use the full jr or jalr forms anyway.
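To make the boundary problem concrete, here is a rough Python model of the current target calculation. The function names are made up for illustration, and I'm assuming the top bits come from the address of the jump instruction itself (rather than the following instruction).

```python
def absolute_jump_target(pc: int, imm11: int) -> int:
    """Current MIPS-style j/jal: keep the top 4 bits of PC and replace
    the bottom 12 bits with the 11-bit immediate shifted left by one."""
    return (pc & 0xF000) | ((imm11 & 0x7FF) << 1)


def reachable_absolute(pc: int, target: int) -> bool:
    """A target can be encoded only if it is 2-byte aligned and sits in
    the same 4KB chunk as the jump (assumption: chunk is taken from pc)."""
    return target % 2 == 0 and (pc & 0xF000) == (target & 0xF000)


# A jump at the very end of one chunk cannot reach the next instruction
# over the boundary, even though it is only 2 bytes away.
print(reachable_absolute(0x1FFE, 0x2000))
print(reachable_absolute(0x1000, 0x1FFE))
```

The first check fails and the second succeeds, which is exactly the situation the linker can put us in without warning.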

RISC-V uses different semantics, and its j is PC-relative. If I used that instead it would mean the jump is always valid, and you can jump anywhere within the offset's distance from PC. You still probably can't use this for function calls outside the current translation unit, since they may end up more than the maximum offset apart, but it would still be useful for control flow or intra-object-file calls. This would be trivial to implement (it's just a microcode change). It also has the nice effect that we no longer need to feed the immediate unit's output directly into the PC, since it would go via the ALU to be added to PC. This would simplify the PC/Imm board a bit (and decouple the two units nicely).
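A minimal sketch of what the PC-relative version might compute. The encoding here is an assumption, not a decision: I've kept the same 11-bit immediate, treated it as signed, shifted it left by one, and added it to the address of the jump instruction itself. That would give a reach of -2048 to +2046 bytes.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def relative_jump_target(pc: int, imm11: int) -> int:
    """Hypothetical PC-relative j/jal: PC + (signed imm11 << 1),
    wrapped to the 16-bit address space."""
    offset = sign_extend(imm11, 11) << 1
    return (pc + offset) & 0xFFFF


# The jump that was impossible before (0x1FFE -> 0x2000) is now just +2.
print(hex(relative_jump_target(0x1FFE, 0x001)))
# And backwards across the same boundary with an immediate of -1.
print(hex(relative_jump_target(0x2000, 0x7FF)))
```

The trade-off is that the reach is now centred on PC instead of covering a fixed 4KB chunk, so the maximum distance in either direction is about half of what the absolute form could cover.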

Registers

If we're changing the PC anyway, another simplification would be to change the contract of the system registers, and perhaps dedicate one of them to holding an argument when an exception handler is invoked. Currently, the exception handling procedure is that the hardware takes the exception index (between 0 and 15), shifts it left by one and then jumps to the resulting address. This means we need a 32-byte jump table in the first page of RAM in every process. One alternative would be to always jump to address 0x0, but pass the exception index in, say, rs4. The handler at 0x0 could then just use the value in the register to decide where to jump. This won't really save much space in memory, but it would simplify the PC circuit even more, as there would then only remain two ways for it to be set: by incrementing itself, or by reading from the data bus. The main downside is that rs4 would not be available to any other code, as it could be clobbered at any time if an interrupt or fault happened.
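The two dispatch schemes side by side, as a rough Python model. The function names, the dictionary standing in for the system registers, and the handler addresses in the example table are all made up for illustration.

```python
def vectored_entry(index: int) -> int:
    """Current scheme: hardware jumps straight to (index << 1), which
    lands on a 2-byte slot in the jump table at the start of RAM."""
    return (index << 1) & 0xFFFF


def fixed_entry(index: int, sysregs: dict) -> int:
    """Proposed scheme: hardware stashes the index in rs4 and always
    jumps to address 0x0. Whatever was in rs4 before is lost."""
    sysregs["rs4"] = index
    return 0x0000


def handler_at_zero(sysregs: dict, handlers: list) -> int:
    """Software side of the proposed scheme: the code at 0x0 reads rs4
    and picks where to go next (hypothetical handler table)."""
    return handlers[sysregs["rs4"]]


sysregs = {"rs4": 0xBEEF}               # some value other code was relying on
handlers = [0x0100, 0x0120, 0x0140]     # made-up handler addresses

print(hex(vectored_entry(2)))                    # hardware does the indexing
print(hex(fixed_entry(2, sysregs)))              # hardware always picks 0x0
print(hex(handler_at_zero(sysregs, handlers)))   # software does the indexing
```

Note how `fixed_entry` overwrites whatever was in rs4, which is the clobbering problem in miniature.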

Also falling out of the LLVM work, it might be worth expanding the register set. The main restriction is that we can only address 8 registers with the three bits we've got in the instruction formats, so we can't have more than 8 general-purpose registers. We do, however, have a few spare instructions. We could conceivably have, for example, a special base-pointer (bp) register that can be set to the top of the stack, and a ldbp $rdd, $imm instruction that loads from an 8-bit offset from that register. This would give the compiler a bit more flexibility in how it refers to local variables. At the moment we can only offset between -15 and +16 bytes from a register, which means that, realistically, many stack loads and stores are going to require the full three-instruction liu/lui/lw triple to put a 16-bit value into a register and then read from the memory it points to. I'm not sure it's necessarily worth it, but it'll be interesting to see how much it matters. Of course, another alternative that doesn't require any more registers would be to add a three-register form of lw/sw that computes the address as $rs1 + $rs2, instead of $rs1 + $imm. This would mean we could reach roughly ±128 bytes from the stack pointer with two instructions (li r1, 100 to load an 8-bit signed offset, then lw r2, r1, sp to read from r1 + sp).
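A quick sketch of the address arithmetic for the three-register form. This only models the effective address, not the memory access, and it assumes li sign-extends its 8-bit immediate to 16 bits and that the add wraps at 16 bits. The function names are hypothetical.

```python
def li(imm8: int) -> int:
    """Model of li: sign-extend an 8-bit immediate (assumed behaviour)
    and return it as a 16-bit register value."""
    imm8 &= 0xFF
    signed = (imm8 ^ 0x80) - 0x80
    return signed & 0xFFFF


def effective_address(rs1: int, rs2: int) -> int:
    """Proposed three-register lw/sw: address = $rs1 + $rs2, 16-bit wrap."""
    return (rs1 + rs2) & 0xFFFF


sp = 0x8000                                   # example stack pointer value

# li r1, 100 ; lw r2, r1, sp  -> reads from sp + 100
print(hex(effective_address(li(100), sp)))

# li r1, -100 ; lw r2, r1, sp -> reads from sp - 100
print(hex(effective_address(li(-100), sp)))
```

With a two's-complement 8-bit immediate the exact reach under these assumptions is -128 to +127 bytes from sp, for a cost of two instructions and one scratch register.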