LLVM Development Diary

This is just a general dump of notes I make as I work on implementing an LLVM backend for MUPS/16.

20240711

No progress, but some reading on the train:

https://lwn.net/Articles/276782/ - Ian Lance Taylor's (author of the Gold linker) series on how linkers work
http://www.staroceans.org/e-book/LinkersAndLoaders.pdf - the classic Linkers and Loaders book. I used to have a copy of this book, but I think it got loaned to the library at work and never came back. Might have to order another copy.

20240709

OK, sometimes it's better to step back and not just keep hacking at things when you're tired. Thinking a bit more about the address problem, of course I can't do what I was trying to do (resolve addresses to constant loads at compile time). Things like global variables don't really have an address until they're fully linked, since we don't know exactly where in the .data segment they will end up (that depends on what other translation units define), or even necessarily what the address of the instruction that's referencing the variable is (again, that depends on how objects are linked together). It seems blindingly obvious in retrospect, but that's why both the MIPS and Sparc backends jump through hoops to emit custom assembler directives wrapping the address. For Sparc, what gets generated is something like:

sethi %hi(msg),%o1
or    %o1,%lo(msg),%o1

where %hi and %lo are assembler operators that emit relocations for their parameter (of type R_SPARC_HI22 and R_SPARC_LO10 respectively). The linker then takes care of extracting the high 22 or low 10 bits of the final address once it's known. More info on relocation types.

So, I guess I need to do something similar, and I will need those custom node types after all. Hopefully it's a bit simpler now I think I understand why it's doing what it's doing.

Plan:

add a new custom instruction selection DAG node (other targets call this Wrapper - I prefer something a little more descriptive, like Address. We'll see.)
add a pair of type flags that can be associated with the address in Mups16MCExpr (which doesn't exist yet)
implement a printImpl function in this new class that looks at the type of the operation and if it's one of the new ones, wraps the expression in either %hi() or %lo() (not sure about the naming, but these seem good enough). Presumably this will also require some support in the assembly parsing code, too
change the address lowering functions to extract the address and convert it to a TargetXXAddress twice, once with a with a 'hi' flag, and once with a 'low' one.
change the return of the address lowering functions to be an SDNode with the new MUPS16ISD::Address type

20240708

Progress, of a sort. I now have my test function compiling and producing assembly output. The output is wrong, but at least we're getting somewhere.

The main change is I started looking at the Sparc backend for inspiration, instead of the MIPS one. There are so many ways of expressing things in tablegen that even though they're doing largely the same thing with large immediates (splitting into low and high parts, and using multiple instructions to assemble the value into a register). The main difference between what they do and what Mups16 does is that in both Sparc and MIPS architectures, the 'load high' instruction (lui for MIPS, sethi for Sparc), zeros out the bottom part of the target register. This means they use something like (assuming I can remember my Sparc syntax):

sethi 0xA,%r5    // Load 0x0A into high part
ori 0x1,%r5,%r5  // Load 0x01 into low part

That said, the basic idea should be the same. While I was digging, I thought I understood the tablegen matching a bit better, and perhaps I could actually do this without lots of wrappers and custom code, since a lot of that seemed to be about supporting various code models, PIC, etc., none of which I'm trying to support.

So, the next step was to simplify the loading of normal, non-address immediates, using this idea, to see if it was viable. Changing the immediate definition to

def : Pat<(i16 imm:$imm), (LUI (LIU (LO8 imm:$imm)), (HI8 imm:$imm))>;

seems to actually work for actual integer constants, without needing a pseudo-instruction at all.

So, I tried changing my Mups16TargetLowering::LowerGlobalAddress function back to just outputting DAG.getTargetGlobalAddress(...) directly (no ISD::Wrapper), and tried doing exactly the same for tglobaladdr. No joy, though:

def : Pat<(i16 tglobaladdr:$in), (LUI (LIU (LO8 tglobaladdr:$in)), (HI8 tglobaladdr:$in))>;

This doesn't seem to get matched. The generated IR just tries to use the address directly in a lw instruction, and dies producing assembly output. Last IR dump before failure:

# *** IR Dump After Live DEBUG_VALUE analysis ***:
...
renamable $r1 = LW @print.msg, 0 :: (dereferenceable load 2 from @print.msg)

and the error:

        .globl  print                           ; -- Begin function print
        .p2align        1
        .type   print,@function
print:                                  ; @print
; %bb.0:                                ; %entry
        sw -2($sp), $r5
        addi $r5, $sp, -2
        addi $sp, $sp, -4
        lw $r1, 0($llc: /home/ali/git/llvm-project/llvm/include/llvm/MC/MCInst.h:65: unsigned int llvm::MCOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.

with llvm::Mups16InstPrinter::printMemOperand further in the stack. I get exactly the same error if my pattern is removed, too, which means it's not being used.

Thinking about this a bit more, though, I wonder if I'm barking up the wrong tree. Can we actually know the value of a global at compile time? Surely that's deferred until the link stage? In which case this approach just can't work.

20240706

Following on from yesterday, some progress on globals. It seems that what we need is something that matched the global address and turns it into a LI/LUI pair. It looks like tablegen turns the global address into a tglobaladdress, so in theory it seems like we should be able to just define a pattern that matches i16 tglobaladdr:$addr and converts to LoadImm tglobaladdr:$addr (my pseudo-instruction that resolves to a LI/LUI pair). I didn't try that, since https://discourse.llvm.org/t/a-few-questions-from-a-newbie/12771 suggests that it will result in a cycle, so I followed what every other target does and added C++ code to match GlobalAddress and convert it to a Wrapper node in the DAG that contains the global address.

Added to Mups16TargetLowering::LowerOperation:

  case ISD::GlobalAddress:    return LowerGlobalAddress(Op, DAG);

and added a Mups16TargetLowering::LowerGlobalAddress that creates a Mups16ISD::Wrapper node wrapping the address.

This gets further now, failing with:

llc: /home/ali/git/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:536: int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.

Re-running with --debug-pass=Details to see where it's failing gives

[2024-07-06 09:44:32.146747150] 0x55a0b6de7220     Executing Pass 'Post-RA pseudo instruction expansion pass' on Function 'print'...
0x55a0b6dec930       Required Analyses: Machine Module Information
llc: /home/ali/git/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:536: int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.

So, it's a lot further on. Running with --print-after-all, this is the last dump (and presumably what's getting fed into the failing pass):

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function print: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
Frame Objects:
  fi#0: size=2, align=2, at location [SP-2]

bb.0.entry:
  successors: %bb.1(0x80000000); %bb.1(100.00%)

  frame-setup SW $sp, -2, $r5
  $r5 = frame-setup ADDI $sp, -2
  $sp = frame-setup ADDI $sp, -4
  renamable $r1 = LoadImm @print.msg
  renamable $r1 = LW killed renamable $r1, 0 :: (dereferenceable load 2 from @print.msg)
  SW $r5, 0, killed renamable $r1 :: (store 2 into %ir.ptr)
  J %bb.1

bb.1.while.cond:
; predecessors: %bb.0, %bb.2
  successors: %bb.2, %bb.3

  renamable $r1 = LW $r5, 0 :: (dereferenceable load 2 from %ir.ptr)
  renamable $r1 = LBU killed renamable $r1, 0 :: (load 1 from %ir.1)
  BZ killed renamable $r1, %bb.3
  J %bb.2

bb.2.while.body:
; predecessors: %bb.1
  successors: %bb.1(0x80000000); %bb.1(100.00%)

  renamable $r1 = LW $r5, 0 :: (dereferenceable load 2 from %ir.ptr)
  renamable $r1 = LBU killed renamable $r1, 0 :: (load 1 from %ir.3)
  renamable $r5 = LoadImm 2560
  SB killed renamable $r1, killed renamable $r5, 0 :: (store 1 into `i8* inttoptr (i16 2560 to i8*)`)
  J %bb.1

bb.3.while.end:
; predecessors: %bb.1

  $sp = ADDI $r5, 0
  $r5 = LW $r5, -4
  RetRA

# End machine code for function print.

The pass name and error together suggest that the problem might be a pseudo-instruction that expects an immediate and isn't getting one. Perhaps it's that renamable $r1 = LoadImm @print.msg? Actually, looking more closely, that seems almost certain. The stack trace has

#11 0x00005586f645384d llvm::Mups16InstrInfo::expandLoadImm(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>)

Catching this assertion failure in GDB, we have

(gdb) p *this
$1 = {OpKind = 10, SubReg_TargetFlags = 0, TiedTo = 15, IsDef = 1, IsImp = 1, IsDeadOrKill = 1, IsRenamable = 1, IsUndef = 1, IsInternalRead = 1, IsEarlyClobber = 1, IsDebug = 1, SmallContents = {
    RegNo = 0, OffsetLo = 0}, ParentMI = 0x5555599ced00, Contents = {MBB = 0x5555599967f0, CFP = 0x5555599967f0, CI = 0x5555599967f0, ImmVal = 93825063806960, RegMask = 0x5555599967f0,
    MD = 0x5555599967f0, Sym = 0x5555599967f0, CFIIndex = 1503225840, IntrinsicID = 1503225840, Pred = 1503225840, ShuffleMask = {Data = 0x5555599967f0, Length = 140733193388032}, Reg = {
      Prev = 0x5555599967f0, Next = 0x7fff00000000}, OffsetedInfo = {Val = {Index = 1503225840, SymbolName = 0x5555599967f0 " \202\232YUU", GV = 0x5555599967f0, BA = 0x5555599967f0}, OffsetHi = 0}}}
(gdb)

and sure enough, OpKind of 10 means GlobalAddress, not Immediate:

(gdb) ptype MachineOperandType
type = enum llvm::MachineOperand::MachineOperandType : unsigned char {llvm::MachineOperand::MO_Register, llvm::MachineOperand::MO_Immediate, llvm::MachineOperand::MO_CImmediate,
    llvm::MachineOperand::MO_FPImmediate, llvm::MachineOperand::MO_MachineBasicBlock, llvm::MachineOperand::MO_FrameIndex, llvm::MachineOperand::MO_ConstantPoolIndex,
    llvm::MachineOperand::MO_TargetIndex, llvm::MachineOperand::MO_JumpTableIndex, llvm::MachineOperand::MO_ExternalSymbol, 
    // ----- here ----->
    llvm::MachineOperand::MO_GlobalAddress,
    // <----- here -----
    llvm::MachineOperand::MO_BlockAddress, llvm::MachineOperand::MO_RegisterMask, llvm::MachineOperand::MO_RegisterLiveOut, llvm::MachineOperand::MO_Metadata, llvm::MachineOperand::MO_MCSymbol,
    llvm::MachineOperand::MO_CFIIndex, llvm::MachineOperand::MO_IntrinsicID, llvm::MachineOperand::MO_Predicate, llvm::MachineOperand::MO_ShuffleMask, llvm::MachineOperand::MO_Last = 19}

We obviously need to convert the address to a value somehow. Alternatively, I wonder if this is because I'm trying to convert the address to a LoadImm instruction, which is a pseudo-instruction, and somehow this is off the beaten path?

Two options suggest themselves:

Try explicitly handling address types in the Mups16InstrInfo::expandLoadImm function
Try Mips-style explicitly building the address from high and low parts;

The first looks like less work, so I'll try that first.

Update: first option seems like a bit of a dead end. Firstly, no other target does this, which suggests maybe it's a bad idea. Secondly, there doesn't seem to be a way to extract the actual constant value out of a GlobalValue. The best I can see is getUniqueInteger(), but that only works if the type is exactly ConstantInt or a vector of them. Time to try the second option.

20240705

Getting back to this after a few years (!) while I wait for my new control unit boards to arrive, and making some quick notes on how to build LLVM, since I've forgotten everything.

Building

cd git
git clone git@github.com:pottsali/llvm-project.git
cd llvm-project
gco feature/mups16

Configure:

mkdir build
cd build
cmake ../llvm

Build everything (could probably skip this stage and only build the Mups16 backend, but I can kick it off and let it build over lunch).

cmake --build . -j8

Linking failed first time around with OOM errors, so trying again without -j to re-do the link step one at a time. Looks like linking libLTO.so alone requires ~25GB RAM. Linking without -j worked.

Building with better options, including clang:

cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang" -D LLVM_TARGETS_TO_BUILD="Mips" -D LLVM_EXPERIMENTAL_TARGETS_TO_BUILD="Mups16" -DLLVM_OPTIMIZED_TABLEGEN=On ../llvm
cmake --build . -j8

This just builds the Mups16 and MIPS targets.

You can also just build llc for slightly faster iteration:

cmake --build . --target llc -j8

Running

With that built, I get a promising failure in the assembler when compiling

unsigned short* const CONSOLE=(unsigned short*const)0xA00;
void print()
{
    *CONSOLE = 'H';
}

into object code:

$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c
/tmp/main-b5d338.s: Assembler messages:
/tmp/main-b5d338.s:7: Error: no such instruction: `sw -2($sp),$fp'
/tmp/main-b5d338.s:8: Error: no such instruction: `addi $fp,$sp,-2'
/tmp/main-b5d338.s:9: Error: no such instruction: `addi $sp,$sp,-4'
/tmp/main-b5d338.s:10: Error: no such instruction: `sw -2($fp),$r2'
/tmp/main-b5d338.s:11: Error: no such instruction: `li $r1,72'
/tmp/main-b5d338.s:12: Error: no such instruction: `liu $r2,0'
/tmp/main-b5d338.s:13: Error: no such instruction: `lui $r2,10'
/tmp/main-b5d338.s:14: Error: no such instruction: `sw 0($r2),$r1'
/tmp/main-b5d338.s:15: Error: no such instruction: `lw $r2,-2($fp)'
/tmp/main-b5d338.s:16: Error: no such instruction: `addi $sp,$fp,0'
/tmp/main-b5d338.s:17: Error: no such instruction: `lw $fp,-4($fp)'
/tmp/main-b5d338.s:18: Error: no such instruction: `jr 0($ra)'
clang-11: error: assembler command failed with exit code 1 (use -v to see invocation)

Obviously I got a bit further 4 years ago than I remembered. Trying to expand this to a loop failed, though:

unsigned short* const CONSOLE=(unsigned short*const)0xA00;

void print()
{
    static const char* msg = "Hello, world!";
    const char* ptr = msg;
    while (*ptr)
    {
        *CONSOLE = *ptr;
    }
}

$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c -emit-llvm
        .text
        .file   "main.c"
LLVM ERROR: Cannot select: t1: i16 = GlobalAddress<i8** @print.msg> 0
In function: print
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -
1.      Running pass 'Function Pass Manager' on module 'main.bc'.
2.      Running pass 'Mups16 DAG->DAG Pattern Instruction Selection' on function '@print'
...

Debugging

Build IR with clang first:

$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c -emit-llvm

Then compile with llc:

bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -debug --debug-only=isel

For some reason running with -debug doesn't actually show any debugging. I'm not sure why, but it seems that --debug-only=isel gives a bit more of what I want:

===== Instruction selection begins: %bb.0 'entry'
ISEL: Starting selection on root node: t8: ch = br t6, BasicBlock:ch<while.cond 0x56124dbca610>
ISEL: Starting pattern match
  Morphed node: t8: ch = J BasicBlock:ch<while.cond 0x56124dbca610>, t6
ISEL: Match complete!

ISEL: Starting selection on root node: t6: ch = store<(store 2 into %ir.ptr)> t4:1, t4, FrameIndex:i16<0>, undef:i16
ISEL: Starting pattern match
  Initial Opcode index to 82
  Skipped scope entry (due to false predicate) at index 92, continuing at 108
  Morphed node: t6: ch = SW<Mem:(store 2 into %ir.ptr)> TargetFrameIndex:i16<0>, TargetConstant:i16<0>, t4, t4:1
ISEL: Match complete!

ISEL: Starting selection on root node: t4: i16,ch = load<(dereferenceable load 2 from @print.msg)> t0, GlobalAddress:i16<i8** @print.msg> 0, undef:i16
ISEL: Starting pattern match
  Initial Opcode index to 4
  Skipped scope entry (due to false predicate) at index 13, continuing at 29
  Skipped scope entry (due to false predicate) at index 30, continuing at 46
  Morphed node: t4: i16,ch = LW<Mem:(dereferenceable load 2 from @print.msg)> GlobalAddress:i16<i8** @print.msg> 0, TargetConstant:i16<0>, t0
ISEL: Match complete!

ISEL: Starting selection on root node: t7: ch = BasicBlock<while.cond 0x56124dbca610>

ISEL: Starting selection on root node: t1: i16 = GlobalAddress<i8** @print.msg> 0
ISEL: Starting pattern match
  Initial Opcode index to 0
  Match failed at index 0
LLVM ERROR: Cannot select: t1: i16 = GlobalAddress<i8** @print.msg> 0

Still not sure why, but at least there's something to go on.

OK, after a couple of hours of digging, I think this is what is going on. In Mups16ISelLowering.cpp, we have

setOperationAction(ISD::GlobalAddress,    MVT::i16,   Custom);

This indicates that we'll handle the lowering of global addresses ourselves, in the Mups16TargetLowering::LowerOperation function. However, this is the current contents of that function:

SDValue Mups16TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
    switch (Op.getOpcode())
    {
    }
    return {};
}

Not only does this not handle anything, but it doesn't even fail. Changing this to

SDValue Mups16TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
  switch (Op.getOpcode())
  {
    default:
      llvm_unreachable("unimplemented operand");
  }
  return {};
}

confirms that we're failing here:

$ bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -debug --debug-only=all
unimplemented operand
UNREACHABLE executed at /home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16ISelLowering.cpp:327!

I don't think we actually need to do anything special with global addresses. We don't have a global data pointer to use via offsets or anything nice like that, so maybe I can just change the Custom to Expand and let it just load literals?

Update: nope. Same error we started with. Looking at all the other targets, they all use Custom, so I guess I'll have to write some code. Not today.

20201010

Spent a while trying to work out how to load large (>255) constants into registers. The instruction sequence that needs to be generated is simple. For 0x4280 we need:

liu $r1, 0x80
lui $r1, 0x42

I thought this would be an easy pattern to express, but couldn't find a way to do it in tablegen.

I looked at what other RISC-y backends do, and they all work in roughly the same way, with a large immediate load expanding to something equivalent to

lui $r1, 0x42
ori $r1, $r1, 0x80

This works because they're using 32-bit instructions and can encode a halfword in the immediate field of their ori instruction. I can't, with only 5 bits available.

In order to load a register in two cycles I have to use two single-register instruction, as they're the only ones with 8-bit immediates, and my lui instruction has to have slightly unusual semantics, in that it leaves the low 8 bits untouched (most instruction sets that have a lui set the low 8 bits to 0). This works nicely, but the problem now is that the lui instruction effectively uses its register operand as both a source and destination (what it does is effectively reg = (reg & 0xff) | (imm << 8) in a single instruction). I couldn't find a way to express this cleanly in the .td file.

My first attempt was a basic pattern:

def : Pat<(i16 imm:$imm), (LUI (LIU (LO8 imm:$imm)), (HI8 imm:$imm))>;

where LO8 and HI8 are just SDNodeXForms that extract the low and high byte respectively. This doesn't work, as the lui node isn't defined to take an input register. I played around with a few variations on this, before deciding that I'd just use a pseudo-instruction and expand it in code.

def LoadImm:    MupsPseudo<(outs IntReg:$rdd), (ins imm16:$imm),
                            [(set IntReg:$rdd, imm:$imm)]>;

and expanded in Mups16InstrInfo.cpp as

void Mups16InstrInfo::expandLoadImm(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const
{
    auto& reg = I->getOperand(0);
    BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::LIU))
        .addReg(reg.getReg())
        .addImm(I->getOperand(1).getImm() & 0xff);
    BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::LUI))
        .addReg(reg.getReg())
        .addImm((I->getOperand(1).getImm() >> 8) & 0xff);
}

which seems to work, with the intermediate DAG node %3:intregs = LoadImm 256 expanding to

liu $r2, 0
lui $r2, 1

Next up: try loops again.

20201008

No progress, worked on soldering up the memory board instead

20201007

Spent a couple of frustrating evenings trying to work out why LLVM was failing to match byte loads:

ISEL: Starting selection on root node: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16
ISEL: Starting pattern match
  Initial Opcode index to 4
  Skipped scope entry (due to false predicate) at index 13, continuing at 29
  Skipped scope entry (due to false predicate) at index 30, continuing at 46
  Skipped scope entry (due to false predicate) at index 47, continuing at 61
  Match failed at index 11
LLVM ERROR: Cannot select: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16

I have a pair of byte-loading instructions defined:

def LB  : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),                                                                                                                                                                                                                                                                                                   │Included from /home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16.td:19:
                      "lb $rdd, $src",                                                                                                                                                                                                                                                                                                                             │/home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16InstrInfo.td:334:58: error: expected ')' in dag init
                      [(set IntReg:$rdd, (sextloadi8 addr:$src))]>;                                                                                                                                                                                                                                                                                                │def : Pat<(i16 imm:$imm), (LUI (LIU (LO8 imm:$imm)), HI8 imm:$imm)>;
def LBU : InstMupsID1<11, (outs IntReg:$rdd), (ins memsrc:$src),                                                                                                                                                                                                                                                                                                   │[7/27] Building Mups16GenInstrInfo.inc...
                      "lbu $rdd, $src",                                                                                                                                                                                                                                                                                                                            │FAILED: lib/Target/Mups16/Mups16GenInstrInfo.inc
                      [(set IntReg:$rdd, (zextloadi8 addr:$src))]>;

i.e. there's one for sign-extending loads (sextloadi8) and one for zero-extended loads (zextloadi8).

Wasted a lot of time thinking it was something to do with the instruction having two register arguments instead of reg+imm, but that was just me getting confused by the t6 argument, which I think is just the ch result of the previous instruction, passed in as part of instruction chaining.

Next suspicion was that it was because the load doesn't have a frame index, and that this would somehow screw up matching with the addr operand (defined as def addr: ComplexPattern<iPTR, 2, "selectAddr", [frameindex], []>;), checking the C++ selectAddr function, etc. No joy here.

Finally came across this article about the LLVM backend which had a very useful hint to look in the generated matching table in build/lib/Target/Mups16/Mups16GenDAGISel.inc, to see which predicates were actually failing. Lo and behold, it's pretty obvious:

  static const unsigned char MatcherTable[] = {
/*     0*/ OPC_SwitchOpcode /*9 cases */, 75, TARGET_VAL(ISD::LOAD),// ->79
/*     4*/  OPC_RecordMemRef,
/*     5*/  OPC_RecordNode, // #0 = 'ld' chained node
/*     6*/  OPC_RecordChild1, // #1 = $src
/*     7*/  OPC_CheckPredicate, 0, // Predicate_unindexedload
/*     9*/  OPC_CheckType, MVT::i16,
/*    11*/  OPC_Scope, 16, /*->29*/ // 4 children in Scope
/*    13*/   OPC_CheckPredicate, 1, // Predicate_sextload
/*    15*/   OPC_CheckPredicate, 2, // Predicate_sextloadi8
/*    17*/   OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/*    20*/   OPC_EmitMergeInputChains1_0,
/*    21*/   OPC_MorphNodeTo1, TARGET_VAL(MUPS::LB), 0|OPFL_Chain|OPFL_MemRefs,
                 MVT::i16, 2/*#Ops*/, 2, 3,
             // Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_sextload>><<P:Predicate_sextloadi8>> - Complexity = 13
             // Dst: (LB:{ *:[i16] } addr:{ *:[i16] }:$src)
/*    29*/  /*Scope*/ 16, /*->46*/
/*    30*/   OPC_CheckPredicate, 3, // Predicate_zextload
/*    32*/   OPC_CheckPredicate, 2, // Predicate_zextloadi8
/*    34*/   OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/*    37*/   OPC_EmitMergeInputChains1_0,
/*    38*/   OPC_MorphNodeTo1, TARGET_VAL(MUPS::LBU), 0|OPFL_Chain|OPFL_MemRefs,
                 MVT::i16, 2/*#Ops*/, 2, 3,
             // Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_zextload>><<P:Predicate_zextloadi8>> - Complexity = 13
             // Dst: (LBU:{ *:[i16] } addr:{ *:[i16] }:$src)
/*    46*/  /*Scope*/ 14, /*->61*/
/*    47*/   OPC_CheckPredicate, 4, // Predicate_load
/*    49*/   OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/*    52*/   OPC_EmitMergeInputChains1_0,
/*    53*/   OPC_MorphNodeTo1, TARGET_VAL(MUPS::LW), 0|OPFL_Chain|OPFL_MemRefs,
                 MVT::i16, 2/*#Ops*/, 2, 3,
             // Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_load>> - Complexity = 13
             // Dst: (LW:{ *:[i16] } addr:{ *:[i16] }:$src)
/*    61*/  0, /*End of Scope*/

Matching to the Skipped scope entry (due to false predicate) at index XX messages this gives a pretty clear picture:

/*    13*/   OPC_CheckPredicate, 1, // Predicate_sextload
/*    30*/   OPC_CheckPredicate, 3, // Predicate_zextload
/*    47*/   OPC_CheckPredicate, 4, // Predicate_load

Further down in the file, we have the predicate-matching code:

  case 1: {
    // Predicate_sextload
    SDNode *N = Node;
    (void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::SEXTLOAD) return false;
return true;

  }
  // skipping 2...
  }
  case 3: {
    // Predicate_zextload
    SDNode *N = Node;
    (void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::ZEXTLOAD) return false;
return true;

  }
  case 4: {
    // Predicate_load
    SDNode *N = Node;
    (void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::NON_EXTLOAD) return false;
return true;
  }

So, these will only match if the extension type is one of SEXTLOAD, ZEXTLOAD or NON_EXTLOAD, but apparently not EXTLOAD (which is what anyext maps to in C++). I naively assumed that anyext would happily match either sign- or zero-extended loads, but apparently it doesn't.

Borrowing from the Lanai backend again I added a standalone pattern for anyext loads that maps to the LBU instruction (I don't think we generally want sign extension unless it's specifically requested):

// Pattern for anyext loads, since they don't seem to match either of the above
def : Pat<(extloadi8  addr:$src), (i16 (LBU addr:$src))>;

and with that the load matched:

ISEL: Starting selection on root node: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16
ISEL: Starting pattern match
  Initial Opcode index to 4
  Skipped scope entry (due to false predicate) at index 13, continuing at 29
  Skipped scope entry (due to false predicate) at index 30, continuing at 46
  Skipped scope entry (due to false predicate) at index 47, continuing at 61
  Morphed node: t17: i16,ch = LBU<Mem:(load 1 from %ir.0)> t7, TargetConstant:i16<0>, t6
ISEL: Match complete!

Next failure is one I knew was coming: loading immediates >= 256 into a register, since it can't be done in a single instruction. Hopefully another simple pattern can fix that.

20201004

Implemented eliminateFrameIndex, and got my first successful compile of a function! Admittedly it was the empty print function, but still.

Spent a while tracking down this error:

Use of %1 does not have a corresponding definition on every path:
48r %1:intregs = LI %1:intregs
LLVM ERROR: Use not jointly dominated by defs.

This should have been moving the constant 42 into $r1 prior to returning it, but somehow the DAG

  t0: ch = EntryToken
      t2: i16,ch = CopyFromReg t0, Register:i16 %0
    t6: ch = store<(store 2 into %ir.str.addr)> t0, t2, FrameIndex:i16<0>, undef:i16
  t9: ch,glue = CopyToReg t6, Register:i16 $r1, Constant:i16<42>
  t10: ch = Mups16ISD::Ret t9, Register:i16 $r1, t9:1

was getting transformed into

  t0: ch = EntryToken
  t7: i16 = LI t7
      t2: i16,ch = CopyFromReg t0, Register:i16 %0
    t6: ch = SW<Mem:(store 2 into %ir.str.addr)> t2, TargetFrameIndex:i16<0>, TargetConstant:i16<0>, t0
  t9: ch,glue = CopyToReg t6, Register:i16 $r1, t7
  t10: ch = RetRA Register:i16 $r1, t9, t9:1

The load of Constant:i16<42> had disappeared, replaced by t7: i16 = LI t7. Turned out to be just that the li instruction was defined as

def LI  : InstMupsIID<13, (outs IntReg:$rdd), (ins imm8:$imm8),
                      "li $rdd, $imm8",
                      [(set IntReg:$rdd, i16:$imm8)]>;

Changing the pattern in the last line to [(set IntReg:$rdd, imm8:$imm8)]>; fixed it.

Further minor problem: sw instructions that are generated from a pattern (rather than the ones created explicitly in the lowering C++ functions) had the operands round the wrong way, leading to an assertion printing the instructions:

Assertion `Disp.isImm() && "Expected immediate in displacement field"' failed.

Easily fixed by swapping the operands in the instruction definition, to match the sw <dest_addr>, <source_reg> layout:

// Incorrect
def SW   : InstMupsI2<17, (IntReg:$rs2, ins memdst:$dst), ...
// Correct
def SW   : InstMupsI2<17, (ins memdst:$dst, IntReg:$rs2), ...

20201003

Progress. Decided to try compiling a completely empty function, just to try to make some progress on the structure. My function now looks like:

static char* const TERM_ADDR=(char*)0x100
void print(const char* str_)
{
}

With this, I immediately get an assertion that I put in my Mups16FrameLowering::emitPrologue function, which is good. Decided to use register 5 (the old PC) as a frame pointer for now, just for simplicity.

Other piece of good news is I found a much simpler example to copy from: the Google Lanai backed. That appears to be a very straightforward 32-bit RISC CPU, and the backend code is very concise. I've adapted its emitPrologue and emitEpilogue functions, and now llc is hanging, which is good news. I think that means I need to implement eliminateFrameIndex, judging from a few comments I've seen in code (mainly in the CPU0 backend, I think). Makes a pleasant change to have a hang instead of a crash.

20201002

No new code today. Spent a couple of hours trying to understand how the prologue/epilogue generation works in LLVM. Still somewhat confused. I'm wondering how this works at all if you don't have a frame register (every example I can see does have one). Maybe I'll have to sacrifice one of my precious general-purpose registers as a frame register. Might turn out to be really useful that I redesigned the program counter as a dedicated circuit and freed up register 6.

Got distracted reading about liveness analysis, after seeing lots of calls to .live_in() and .kill() in various places in the register spilling/popping.

20201001

After much fruitless banging of head against wall, I decided to debug this the old-fashioned way. No, not with printf, the other way: by deleting half the code from the print() function I'm compiling and seeing if my problem goes away. Hopefully I should be able to work out if it's the branch instructions or the load/store ones that're causing the problems.

Removing the branches, so the function is just a straightforward

static char* const TERM_ADDR=(char*)0x100
void print(const char* str_)
{
    *TERM_ADDR = *str_;
}

got me a little further, to a segfault in LowerReturn, which wasn't surprising since I hadn't implemented that function yet. Implemented that based mostly on CPU0 and RISCV examples. Only wrinkle was what to do with the ret instruction, which isn't natively supported. Ended up doing something like CPU0, and adding a ret pseudo-instruction (Mups16ISD::Ret) that's expanded in a post-register-allocation pass by the Mups16InstrInfo::expandPostRAPseudo function into

BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::JR)).addReg(MUPS::RA);

There's probably a simpler way, but this seems to work. Next problem is missing function prologue/epilogue code.

20200923

I'm wondering if the assertion from last night is something to do with LLVM not knowing that my load instruction are actually loads (there's an offhand comment in http://llvm.org/docs/CodeGenerator.html#selectiondag-select-phase that says "We don’t automatically infer flags like isStore/isLoad yet"). Most other backends seem to have implemented the isLoadFromStackSlot function in the XXXInstrInfo.cpp file. I'll try adding that. Straws firmly clutched.

Update: nope.

20200922

Decided to take a break from beating my head against the br_cc problem, and look at load/store instead. Partly just for a change, but also in case it turns out that llvm is trying to match the whole fragment I mentioned yesterday, in which case it will need the load pattern anyway.

First stab at this:

def memsrc : Operand<i16> {
  let PrintMethod = "printMemOperand";
  let MIOperandInfo = (ops IntReg, imm5);
  let ParserMatchClass = MemAsmOperand;
}

def LB  : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),
                      "lb $rdd, $src",
                      [(set IntReg:$rdd, (sextloadi8 memsrc:$src))]>;

This doesn't work. It appears that memsrc isn't a valid leaf in the dag:

Unknown leaf kind: memsrc:{ *:[i16] }:$src

Attempt two was to use iPTR for the type:

                      [(set IntReg:$rdd, (sextloadi8 iPTR:$src))]>;

which gives a more interesting error:

LB:     (LB:{ *:[i16] } iPTR:{ *:[i16] }:$src)
Included from /xx/git/llvm-project/llvm/lib/Target/Mups16/Mups16.td:19:
/home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16InstrInfo.td:118:1: error: In LB: Instruction 'LB' expects more than the provided 1 operands!

Presumably this is because the instruction is defined to take memsrc as its input operand, which is a compound operand that takes an IntReg, imm5 pair to allow for the constant offset from the source register. Maybe I need to add a custom pattern that can match this?

Later:

Success! Not only on memory loads, but on the br_cc problem from yesterday.

Memory first: adding a ComplexPattern and some code to match the base+offset seems to have worked. In the Mups16InstrInfo.td file:

// Leaf pattern for matching memory addresses
def addr: ComplexPattern<iPTR, 2, "selectAddr", [frameindex]>;

// ...

def LB  : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),
                      "lb $rdd, $src",
                      [(set IntReg:$rdd, (sextloadi8 addr:$src))]>;

with a matching Mups16DAGToDAGISel::selectAddr(SDValue Addr, SDValue &Base, SDValue &Offset) worked a treat. This function is called whenever an address is matched, and can set the Base and Offset ref arguments.

The br_cc fix was embarassingly simple: just change the definitions in the legalization DAG in Mups16ISelLowering.cpp to expand br_cc, and make brcond legal:

      setOperationAction(ISD::BR_CC,            MVT::i8,    Promote);
      setOperationAction(ISD::BR_CC,            MVT::i16,   Expand);
      setOperationAction(ISD::BRCOND,           MVT::Other, Legal);

With store patterns added for sw and sb, the selection now gets as far as the second load before failing:

===== Instruction selection begins: %bb.0 'entry'

ISEL: Starting selection on root node: t23: ch = br t26, BasicBlock:ch<if.then 0x7fffed961eb0>
ISEL: Starting pattern match
  Skipped scope entry (due to false predicate) at index 2, continuing at 63
  Skipped scope entry (due to false predicate) at index 64, continuing at 111
  Skipped scope entry (due to false predicate) at index 112, continuing at 161
  Skipped scope entry (due to false predicate) at index 162, continuing at 175
  Morphed node: t23: ch = J BasicBlock:ch<if.then 0x7fffed961eb0>, t26
ISEL: Match complete!

ISEL: Starting selection on root node: t26: ch = brcond t29, t31, BasicBlock:ch<if.end 0x7fffed961f88>
ISEL: Starting pattern match
  Skipped scope entry (due to false predicate) at index 2, continuing at 63
  Skipped scope entry (due to false predicate) at index 64, continuing at 111
  Morphed node: t26: ch = BZ t36, BasicBlock:ch<if.end 0x7fffed961f88>, t29
ISEL: Match complete!

ISEL: Starting selection on root node: t36: i16,ch = load<(dereferenceable load 1 from %ir.b), zext from i8> t29, FrameIndex:i16<1>, undef:i16
ISEL: Starting pattern match
  Skipped scope entry (due to false predicate) at index 14, continuing at 30
Creating constant: t38: i16 = TargetConstant<0>
  Morphed node: t36: i16,ch = LBU<Mem:(dereferenceable load 1 from %ir.b)> TargetFrameIndex:i16<1>, TargetConstant:i16<0>, t29
ISEL: Match complete!

ISEL: Starting selection on root node: t29: ch = store<(store 1 into %ir.b), trunc to i8> t28:1, t28, FrameIndex:i16<1>, undef:i16
ISEL: Starting pattern match
  Skipped scope entry (due to false predicate) at index 2, continuing at 63
  Morphed node: t29: ch = SB<Mem:(store 1 into %ir.b)> TargetFrameIndex:i16<1>, TargetConstant:i16<0>, t28, t28:1
ISEL: Match complete!

ISEL: Starting selection on root node: t28: i16,ch = load<(load 1 from %ir.0), anyext from i8> t10, t7, undef:i16
ISEL: Starting pattern match
  Skipped scope entry (due to false predicate) at index 14, continuing at 30
  Skipped scope entry (due to false predicate) at index 31, continuing at 47
  Skipped scope entry (due to false predicate) at index 48, continuing at 62
  Match failed at index 12
  Continuing at 63
  Match failed at index 64
  Continuing at 111
  Match failed at index 112
  Continuing at 161
  Match failed at index 162
  Continuing at 175
  Match failed at index 176
  Continuing at 193
llc: /home/ali/git/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:900: bool llvm::SelectionDAG::RemoveNodeFromCSEMaps(llvm::SDNode*): Assertion `N->getOpcode() != ISD::EntryToken && "EntryToken in CSEMap!"' failed.

I've no idea what's going on here. Tomorrow's problem.

20200921

Some progress with llvm, and lots of frustration. Good news is that I got my Mups16 instruction selected from an llvm one:

===== Instruction selection begins: %bb.0 'entry'

ISEL: Starting selection on root node: t23: ch = br t37, BasicBlock:ch<if.then 0x7fffc3fd73b0>
ISEL: Starting pattern match
  Morphed node: t23: ch = J BasicBlock:ch<if.then 0x7fffc3fd73b0>, t37
ISEL: Match complete!

Unfortunately, the next instruction is a conditional branch, and I can't for the life of me work out how to match that:

ISEL: Starting selection on root node: t37: ch = br_cc t29, seteq:ch, t36, Constant:i16<0>, BasicBlock:ch<if.end 0x7fffc3fd7488>
ISEL: Starting pattern match
  Initial Opcode index to 0
  Match failed at index 0
LLVM ERROR: Cannot select: t37: ch = br_cc t29, seteq:ch, t36, Constant:i16<0>, BasicBlock:ch<if.end 0x7fffc3fd7488>
  t36: i16,ch = load<(dereferenceable load 1 from %ir.b), zext from i8> t29, FrameIndex:i16<1>, undef:i16
    t12: i16 = FrameIndex<1>
    t5: i16 = undef
  t27: i16 = Constant<0>
In function: print

I've tried what seemed obvious:

def BZ  : InstMupsII1<20, (ins IntReg:$rs1, brtarget:$offset),
                      "bz $rs1, $offset",
                      [(brcond (i16 (seteq IntReg:$rs1, 0)), bb:$offset)]>;

and made sure that the unsupported comparisons are expanded in the legalization DAG:

    setOperationAction(ISD::SETCC,            MVT::i8,    Promote);

    // Valid condcode actions (which comparisons are natively supported by the CPU)
    setCondCodeAction(ISD::SETOLE, MVT::i16, Expand);
    setCondCodeAction(ISD::SETOGE, MVT::i16, Expand);
    setCondCodeAction(ISD::SETOGT, MVT::i16, Expand);
    setCondCodeAction(ISD::SETULE, MVT::i16, Expand);
    setCondCodeAction(ISD::SETUGE, MVT::i16, Expand);
    setCondCodeAction(ISD::SETUGT, MVT::i16, Expand);

which should, if I understand it right, mean that comparisons like A <= B get transformed into !(B < A), which is supported by the slt instruction.

What's a little frustrating is that I can't see any references to a br_cc LLVM instruction (ISD::BR_CC exists in the code, but you can't pattern match on br_cc in the TD files), and most other examples I can see online show conditional branches represented as brcond.

I'm wondering now if it's actually a problem with the conditional branch at all now. Perhaps LLVM is trying to match the entire block, and the error is coming from the fact that I haven't added patterns for load yet?

20200920

Managed to get past the sentinel assertion in llc