LLVM Development Diary
This is just a general dump of notes I make as I work on implementing an LLVM backend for MUPS/16.
20240711
No progress, but some reading on the train:
- https://lwn.net/Articles/276782/ - Ian Lance Taylor's (author of the Gold linker) series on how linkers work
- http://www.staroceans.org/e-book/LinkersAndLoaders.pdf - the classic Linkers and Loaders book. I used to have a copy of this book, but I think it got loaned to the library at work and never came back. Might have to order another copy.
20240709
OK, sometimes it's better to step back and not just keep hacking at things when you're tired.
Thinking a bit more about the address problem, of course I can't do what I was trying to do (resolve addresses to constant loads at compile time). Things like global variables don't really have an address until they're fully linked, since we don't know exactly where in the .data segment they will end up (that depends on what other translation units define), or even necessarily what the address of the instruction that's referencing the variable is (again, that depends on how objects are linked together). It seems blindingly obvious in retrospect, but that's why both the MIPS and Sparc backends jump through hoops to emit custom assembler directives wrapping the address. For Sparc, what gets generated is something like:
sethi %hi(msg),%o1
or %o1,%lo(msg),%o1
where %hi and %lo are assembler operators that emit relocations for their parameter (of type R_SPARC_HI22 and R_SPARC_LO10 respectively). The linker then takes care of extracting the high 22 or low 10 bits of the final address once it's known. More info on relocation types.
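To convince myself of the arithmetic, here's a minimal standalone sketch of what the linker ends up doing for the two relocations, assuming a 32-bit address split into a high 22-bit field and a low 10-bit field. All the function names are made up for illustration; none of this is LLVM or linker code.

```cpp
#include <cassert>
#include <cstdint>

// Value the linker would patch into the sethi instruction for %hi(sym):
// the top 22 bits of the final 32-bit address.
constexpr std::uint32_t hi22(std::uint32_t addr) { return addr >> 10; }

// Value patched into the or instruction for %lo(sym): the low 10 bits.
constexpr std::uint32_t lo10(std::uint32_t addr) { return addr & 0x3ffu; }

// What the sethi/or pair leaves in the register once both fields are patched.
constexpr std::uint32_t materialise(std::uint32_t hi, std::uint32_t lo) {
    return (hi << 10) | lo;
}

// Hypothetical link step: the symbol's address is only known at this point,
// which is why the compiler can't fold it into a constant earlier.
inline std::uint32_t linkAndLoad(std::uint32_t finalAddress) {
    const std::uint32_t hi = hi22(finalAddress);
    const std::uint32_t lo = lo10(finalAddress);
    assert(lo < 1024u);  // the low field must fit in 10 bits
    return materialise(hi, lo);
}
```

The point is that the instructions themselves are fixed at compile time and only the two immediate fields get filled in later.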
So, I guess I need to do something similar, and I will need those custom node types after all. Hopefully it's a bit simpler now I think I understand why it's doing what it's doing.
Plan:
- add a new custom instruction selection DAG node (other targets call this Wrapper - I prefer something a little more descriptive, like Address. We'll see.)
- add a pair of type flags that can be associated with the address in Mups16MCExpr (which doesn't exist yet)
- implement a printImpl function in this new class that looks at the type of the operation and if it's one of the new ones, wraps the expression in either %hi() or %lo() (not sure about the naming, but these seem good enough). Presumably this will also require some support in the assembly parsing code, too
- change the address lowering functions to extract the address and convert it to a TargetXXAddress twice, once with a 'hi' flag, and once with a 'low' one.
- change the return of the address lowering functions to be an SDNode with the new MUPS16ISD::Address type
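A rough standalone sketch of the printing part of the plan, just to pin down the idea. AddrPart, printAddrExpr and resolveAddrPart are hypothetical names, and I'm assuming the two halves of a 16-bit address are simply its high and low bytes; the real class would derive from LLVM's expression types rather than being free functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flags that an expression class like Mups16MCExpr might carry.
enum class AddrPart { None, Hi, Lo };

// Sketch of a printImpl-style function: wrap the symbol in %hi() or %lo(),
// or print it unchanged when no flag is set.
inline std::string printAddrExpr(AddrPart part, const std::string &symbol) {
    switch (part) {
    case AddrPart::Hi: return "%hi(" + symbol + ")";
    case AddrPart::Lo: return "%lo(" + symbol + ")";
    case AddrPart::None: break;
    }
    return symbol;
}

// What would be substituted once the 16-bit address is finally known,
// assuming the halves are the plain high and low bytes.
inline std::uint8_t resolveAddrPart(AddrPart part, std::uint16_t address) {
    assert(part != AddrPart::None && "only the hi/lo parts fit in 8 bits");
    return part == AddrPart::Hi ? static_cast<std::uint8_t>(address >> 8)
                                : static_cast<std::uint8_t>(address & 0xff);
}
```

With this, the same symbol gets printed twice, once per flag, which is what the two TargetXXAddress nodes in the plan are for.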
20240708
Progress, of a sort. I now have my test function compiling and producing assembly output. The output is wrong, but at least we're getting somewhere.
The main change is I started looking at the Sparc backend for inspiration, instead of the MIPS one. There are so many ways of expressing things in tablegen that the two backends look quite different, even though they're doing largely the same thing with large immediates (splitting into low and high parts, and using multiple instructions to assemble the value into a register). The main difference between what they do and what Mups16 does is that in both Sparc and MIPS architectures, the 'load high' instruction (lui for MIPS, sethi for Sparc) zeros out the bottom part of the target register. This means they use something like (assuming I can remember my Sparc syntax):
sethi 0xA,%r5 // Load 0x0A into high part
ori 0x1,%r5,%r5 // Load 0x01 into low part
That said, the basic idea should be the same. While I was digging, I thought I understood the tablegen matching a bit better, and perhaps I could actually do this without lots of wrappers and custom code, since a lot of that seemed to be about supporting various code models, PIC, etc., none of which I'm trying to support.
So, the next step was to simplify the loading of normal, non-address immediates, using this idea, to see if it was viable. Changing the immediate definition to
def : Pat<(i16 imm:$imm), (LUI (LIU (LO8 imm:$imm)), (HI8 imm:$imm))>;
seems to actually work for actual integer constants, without needing a pseudo-instruction at all.
So, I tried changing my Mups16TargetLowering::LowerGlobalAddress function back to just outputting DAG.getTargetGlobalAddress(...) directly (no ISD::Wrapper), and tried doing exactly the same for tglobaladdr. No joy, though:
def : Pat<(i16 tglobaladdr:$in), (LUI (LIU (LO8 tglobaladdr:$in)), (HI8 tglobaladdr:$in))>;
This doesn't seem to get matched. The generated IR just tries to use the address directly in a lw instruction, and dies producing assembly output. Last IR dump before failure:
# *** IR Dump After Live DEBUG_VALUE analysis ***:
...
renamable $r1 = LW @print.msg, 0 :: (dereferenceable load 2 from @print.msg)
and the error:
.globl print ; -- Begin function print
.p2align 1
.type print,@function
print: ; @print
; %bb.0: ; %entry
sw -2($sp), $r5
addi $r5, $sp, -2
addi $sp, $sp, -4
lw $r1, 0($llc: /home/ali/git/llvm-project/llvm/include/llvm/MC/MCInst.h:65: unsigned int llvm::MCOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
with llvm::Mups16InstPrinter::printMemOperand further in the stack. I get exactly the same error if my pattern is removed, too, which means it's not being used.
Thinking about this a bit more, though, I wonder if I'm barking up the wrong tree. Can we actually know the value of a global at compile time? Surely that's deferred until the link stage? In which case this approach just can't work.
20240706
Following on from yesterday, some progress on globals. It seems that what we need is something that matches the global address and turns it into a LI/LUI pair. It looks like tablegen turns the global address into a tglobaladdress, so in theory it seems like we should be able to just define a pattern that matches i16 tglobaladdr:$addr and converts to LoadImm tglobaladdr:$addr (my pseudo-instruction that resolves to a LI/LUI pair). I didn't try that, since https://discourse.llvm.org/t/a-few-questions-from-a-newbie/12771 suggests that it will result in a cycle, so I followed what every other target does and added C++ code to match GlobalAddress and convert it to a Wrapper node in the DAG that contains the global address.
Added to Mups16TargetLowering::LowerOperation:
case ISD::GlobalAddress: return LowerGlobalAddress(Op, DAG);
and added a Mups16TargetLowering::LowerGlobalAddress that creates a Mups16ISD::Wrapper node wrapping the address.
This gets further now, failing with:
llc: /home/ali/git/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:536: int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
Re-running with --debug-pass=Details to see where it's failing gives
[2024-07-06 09:44:32.146747150] 0x55a0b6de7220 Executing Pass 'Post-RA pseudo instruction expansion pass' on Function 'print'...
0x55a0b6dec930 Required Analyses: Machine Module Information
llc: /home/ali/git/llvm-project/llvm/include/llvm/CodeGen/MachineOperand.h:536: int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
So, it's a lot further on. Running with --print-after-all, this is the last dump (and presumably what's getting fed into the failing pass):
# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function print: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
Frame Objects:
fi#0: size=2, align=2, at location [SP-2]
bb.0.entry:
successors: %bb.1(0x80000000); %bb.1(100.00%)
frame-setup SW $sp, -2, $r5
$r5 = frame-setup ADDI $sp, -2
$sp = frame-setup ADDI $sp, -4
renamable $r1 = LoadImm @print.msg
renamable $r1 = LW killed renamable $r1, 0 :: (dereferenceable load 2 from @print.msg)
SW $r5, 0, killed renamable $r1 :: (store 2 into %ir.ptr)
J %bb.1
bb.1.while.cond:
; predecessors: %bb.0, %bb.2
successors: %bb.2, %bb.3
renamable $r1 = LW $r5, 0 :: (dereferenceable load 2 from %ir.ptr)
renamable $r1 = LBU killed renamable $r1, 0 :: (load 1 from %ir.1)
BZ killed renamable $r1, %bb.3
J %bb.2
bb.2.while.body:
; predecessors: %bb.1
successors: %bb.1(0x80000000); %bb.1(100.00%)
renamable $r1 = LW $r5, 0 :: (dereferenceable load 2 from %ir.ptr)
renamable $r1 = LBU killed renamable $r1, 0 :: (load 1 from %ir.3)
renamable $r5 = LoadImm 2560
SB killed renamable $r1, killed renamable $r5, 0 :: (store 1 into `i8* inttoptr (i16 2560 to i8*)`)
J %bb.1
bb.3.while.end:
; predecessors: %bb.1
$sp = ADDI $r5, 0
$r5 = LW $r5, -4
RetRA
# End machine code for function print.
The pass name and error together suggest that the problem might be a pseudo-instruction that expects an immediate and isn't getting one. Perhaps it's that renamable $r1 = LoadImm @print.msg?
Actually, looking more closely, that seems almost certain. The stack trace has
#11 0x00005586f645384d llvm::Mups16InstrInfo::expandLoadImm(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>)
Catching this assertion failure in GDB, we have
(gdb) p *this
$1 = {OpKind = 10, SubReg_TargetFlags = 0, TiedTo = 15, IsDef = 1, IsImp = 1, IsDeadOrKill = 1, IsRenamable = 1, IsUndef = 1, IsInternalRead = 1, IsEarlyClobber = 1, IsDebug = 1, SmallContents = {
RegNo = 0, OffsetLo = 0}, ParentMI = 0x5555599ced00, Contents = {MBB = 0x5555599967f0, CFP = 0x5555599967f0, CI = 0x5555599967f0, ImmVal = 93825063806960, RegMask = 0x5555599967f0,
MD = 0x5555599967f0, Sym = 0x5555599967f0, CFIIndex = 1503225840, IntrinsicID = 1503225840, Pred = 1503225840, ShuffleMask = {Data = 0x5555599967f0, Length = 140733193388032}, Reg = {
Prev = 0x5555599967f0, Next = 0x7fff00000000}, OffsetedInfo = {Val = {Index = 1503225840, SymbolName = 0x5555599967f0 " \202\232YUU", GV = 0x5555599967f0, BA = 0x5555599967f0}, OffsetHi = 0}}}
(gdb)
and sure enough, OpKind of 10 means GlobalAddress, not Immediate:
(gdb) ptype MachineOperandType
type = enum llvm::MachineOperand::MachineOperandType : unsigned char {llvm::MachineOperand::MO_Register, llvm::MachineOperand::MO_Immediate, llvm::MachineOperand::MO_CImmediate,
llvm::MachineOperand::MO_FPImmediate, llvm::MachineOperand::MO_MachineBasicBlock, llvm::MachineOperand::MO_FrameIndex, llvm::MachineOperand::MO_ConstantPoolIndex,
llvm::MachineOperand::MO_TargetIndex, llvm::MachineOperand::MO_JumpTableIndex, llvm::MachineOperand::MO_ExternalSymbol,
// ----- here ----->
llvm::MachineOperand::MO_GlobalAddress,
// <----- here -----
llvm::MachineOperand::MO_BlockAddress, llvm::MachineOperand::MO_RegisterMask, llvm::MachineOperand::MO_RegisterLiveOut, llvm::MachineOperand::MO_Metadata, llvm::MachineOperand::MO_MCSymbol,
llvm::MachineOperand::MO_CFIIndex, llvm::MachineOperand::MO_IntrinsicID, llvm::MachineOperand::MO_Predicate, llvm::MachineOperand::MO_ShuffleMask, llvm::MachineOperand::MO_Last = 19}
We obviously need to convert the address to a value somehow. Alternatively, I wonder if this is because I'm trying to convert the address to a LoadImm instruction, which is a pseudo-instruction, and somehow this is off the beaten path?
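To make the failure mode concrete, here's a toy model of it. The enum reproduces only the first eleven kinds from the gdb output above, in the same order; ToyOperand and tryGetImm are made-up stand-ins, not the real MachineOperand API, which asserts rather than returning an empty value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// First few operand kinds, in the order shown in the gdb ptype output.
enum OperandKind : unsigned char {
    MO_Register, MO_Immediate, MO_CImmediate, MO_FPImmediate,
    MO_MachineBasicBlock, MO_FrameIndex, MO_ConstantPoolIndex,
    MO_TargetIndex, MO_JumpTableIndex, MO_ExternalSymbol, MO_GlobalAddress
};
static_assert(MO_GlobalAddress == 10, "matches OpKind = 10 in the dump");

// Hypothetical stand-in for an operand; not the real MachineOperand class.
struct ToyOperand {
    OperandKind kind;
    std::int64_t imm;  // only meaningful when kind == MO_Immediate

    bool isImm() const { return kind == MO_Immediate; }

    // Checked accessor: returns an empty optional for the wrong kind,
    // where the real getImm() would hit the assertion seen above.
    std::optional<std::int64_t> tryGetImm() const {
        if (!isImm()) return std::nullopt;
        return imm;
    }
};
```

An expansion function written against something like this would have to decide what to do in the empty case, which is exactly the question the rest of this entry wrestles with.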
Two options suggest themselves:
- Try explicitly handling address types in the Mups16InstrInfo::expandLoadImm function
- Try Mips-style explicitly building the address from high and low parts
The first looks like less work, so I'll try that first.
Update: first option seems like a bit of a dead end. Firstly, no other target does this, which suggests maybe it's a bad idea. Secondly, there doesn't seem to be a way to extract the actual constant value out of a GlobalValue. The best I can see is getUniqueInteger(), but that only works if the type is exactly ConstantInt or a vector of them. Time to try the second option.
20240705
Getting back to this after a few years (!) while I wait for my new control unit boards to arrive, and making some quick notes on how to build LLVM, since I've forgotten everything.
Building
cd git
git clone git@github.com:pottsali/llvm-project.git
cd llvm-project
gco feature/mups16
Configure:
mkdir build
cd build
cmake ../llvm
Build everything (could probably skip this stage and only build the Mups16 backend, but I can kick it off and let it build over lunch).
cmake --build . -j8
Linking failed first time around with OOM errors, so trying again without -j to re-do the link step one at a time. Looks like linking libLTO.so alone requires ~25GB RAM. Linking without -j worked.
Building with better options, including clang:
cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang" -D LLVM_TARGETS_TO_BUILD="Mips" -D LLVM_EXPERIMENTAL_TARGETS_TO_BUILD="Mups16" -DLLVM_OPTIMIZED_TABLEGEN=On ../llvm
cmake --build . -j8
This just builds the Mups16 and MIPS targets.
You can also just build llc for slightly faster iteration:
cmake --build . --target llc -j8
Running
With that built, I get a promising failure in the assembler when compiling
unsigned short* const CONSOLE=(unsigned short*const)0xA00;
void print()
{
*CONSOLE = 'H';
}
into object code:
$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c
/tmp/main-b5d338.s: Assembler messages:
/tmp/main-b5d338.s:7: Error: no such instruction: `sw -2($sp),$fp'
/tmp/main-b5d338.s:8: Error: no such instruction: `addi $fp,$sp,-2'
/tmp/main-b5d338.s:9: Error: no such instruction: `addi $sp,$sp,-4'
/tmp/main-b5d338.s:10: Error: no such instruction: `sw -2($fp),$r2'
/tmp/main-b5d338.s:11: Error: no such instruction: `li $r1,72'
/tmp/main-b5d338.s:12: Error: no such instruction: `liu $r2,0'
/tmp/main-b5d338.s:13: Error: no such instruction: `lui $r2,10'
/tmp/main-b5d338.s:14: Error: no such instruction: `sw 0($r2),$r1'
/tmp/main-b5d338.s:15: Error: no such instruction: `lw $r2,-2($fp)'
/tmp/main-b5d338.s:16: Error: no such instruction: `addi $sp,$fp,0'
/tmp/main-b5d338.s:17: Error: no such instruction: `lw $fp,-4($fp)'
/tmp/main-b5d338.s:18: Error: no such instruction: `jr 0($ra)'
clang-11: error: assembler command failed with exit code 1 (use -v to see invocation)
Obviously I got a bit further 4 years ago than I remembered. Trying to expand this to a loop failed, though:
unsigned short* const CONSOLE=(unsigned short*const)0xA00;
void print()
{
static const char* msg = "Hello, world!";
const char* ptr = msg;
while (*ptr)
{
*CONSOLE = *ptr;
}
}
$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c -emit-llvm
.text
.file "main.c"
LLVM ERROR: Cannot select: t1: i16 = GlobalAddress<i8** @print.msg> 0
In function: print
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -
1. Running pass 'Function Pass Manager' on module 'main.bc'.
2. Running pass 'Mups16 DAG->DAG Pattern Instruction Selection' on function '@print'
...
Debugging
Build IR with clang first:
$ ./bin/clang -target mups16 -c ~/git/cpu/cpp/target/hello_world/main.c -emit-llvm
Then compile with llc:
bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -debug --debug-only=isel
For some reason running with -debug doesn't actually show any debugging. I'm not sure why, but it seems that --debug-only=isel gives a bit more of what I want:
===== Instruction selection begins: %bb.0 'entry'
ISEL: Starting selection on root node: t8: ch = br t6, BasicBlock:ch<while.cond 0x56124dbca610>
ISEL: Starting pattern match
Morphed node: t8: ch = J BasicBlock:ch<while.cond 0x56124dbca610>, t6
ISEL: Match complete!
ISEL: Starting selection on root node: t6: ch = store<(store 2 into %ir.ptr)> t4:1, t4, FrameIndex:i16<0>, undef:i16
ISEL: Starting pattern match
Initial Opcode index to 82
Skipped scope entry (due to false predicate) at index 92, continuing at 108
Morphed node: t6: ch = SW<Mem:(store 2 into %ir.ptr)> TargetFrameIndex:i16<0>, TargetConstant:i16<0>, t4, t4:1
ISEL: Match complete!
ISEL: Starting selection on root node: t4: i16,ch = load<(dereferenceable load 2 from @print.msg)> t0, GlobalAddress:i16<i8** @print.msg> 0, undef:i16
ISEL: Starting pattern match
Initial Opcode index to 4
Skipped scope entry (due to false predicate) at index 13, continuing at 29
Skipped scope entry (due to false predicate) at index 30, continuing at 46
Morphed node: t4: i16,ch = LW<Mem:(dereferenceable load 2 from @print.msg)> GlobalAddress:i16<i8** @print.msg> 0, TargetConstant:i16<0>, t0
ISEL: Match complete!
ISEL: Starting selection on root node: t7: ch = BasicBlock<while.cond 0x56124dbca610>
ISEL: Starting selection on root node: t1: i16 = GlobalAddress<i8** @print.msg> 0
ISEL: Starting pattern match
Initial Opcode index to 0
Match failed at index 0
LLVM ERROR: Cannot select: t1: i16 = GlobalAddress<i8** @print.msg> 0
Still not sure why, but at least there's something to go on.
OK, after a couple of hours of digging, I think this is what is going on. In Mups16ISelLowering.cpp, we have
setOperationAction(ISD::GlobalAddress, MVT::i16, Custom);
This indicates that we'll handle the lowering of global addresses ourselves, in the Mups16TargetLowering::LowerOperation function. However, this is the current contents of that function:
SDValue Mups16TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
}
return {};
}
Not only does this not handle anything, but it doesn't even fail. Changing this to
SDValue Mups16TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
default:
llvm_unreachable("unimplemented operand");
}
return {};
}
confirms that we're failing here:
$ bin/llc -O0 -march=mups16 -relocation-model=static -filetype=asm main.bc -o -debug --debug-only=all
unimplemented operand
UNREACHABLE executed at /home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16ISelLowering.cpp:327!
I don't think we actually need to do anything special with global addresses. We don't have a global data pointer to use via offsets or anything nice like that, so maybe I can just change the Custom to Expand and let it just load literals?
Update: nope. Same error we started with. Looking at all the other targets, they all use Custom, so I guess I'll have to write some code. Not today.
20201010
Spent a while trying to work out how to load large (>255) constants into registers. The instruction sequence that needs to be generated is simple. For 0x4280 we need:
liu $r1, 0x80
lui $r1, 0x42
I thought this would be an easy pattern to express, but couldn't find a way to do it in tablegen.
I looked at what other RISC-y backends do, and they all work in roughly the same way, with a large immediate load expanding to something equivalent to
lui $r1, 0x42
ori $r1, $r1, 0x80
This works because they're using 32-bit instructions and can encode a halfword in the immediate field of their ori instruction. I can't, with only 5 bits available.
In order to load a register in two cycles I have to use two single-register instructions, as they're the only ones with 8-bit immediates, and my lui instruction has to have slightly unusual semantics, in that it leaves the low 8 bits untouched (most instruction sets that have a lui set the low 8 bits to 0). This works nicely, but the problem now is that the lui instruction effectively uses its register operand as both a source and destination (what it does is effectively reg = (reg & 0xff) | (imm << 8) in a single instruction). I couldn't find a way to express this cleanly in the .td file.
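For reference, the register semantics can be written down as a small standalone model. This is only a sketch: I'm assuming liu sets the register to the zero-extended 8-bit immediate, the function names are made up, and lo8/hi8 simply take the low and high byte of a 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Low and high byte of a 16-bit value.
constexpr std::uint8_t lo8(std::uint16_t v) { return static_cast<std::uint8_t>(v & 0xff); }
constexpr std::uint8_t hi8(std::uint16_t v) { return static_cast<std::uint8_t>(v >> 8); }

// Assumed liu semantics: the register becomes the zero-extended immediate.
constexpr std::uint16_t liu(std::uint8_t imm) { return imm; }

// Mups16-style lui: keep the low byte, replace the high byte.
// Note that the old register value is an input as well as the output.
constexpr std::uint16_t luiPreserving(std::uint16_t reg, std::uint8_t imm) {
    return static_cast<std::uint16_t>((reg & 0xff) | (imm << 8));
}

// Conventional lui, for contrast: the low byte is cleared.
constexpr std::uint16_t luiZeroing(std::uint8_t imm) {
    return static_cast<std::uint16_t>(imm << 8);
}

// liu followed by lui, as in the two-instruction sequence for 0x4280.
constexpr std::uint16_t loadImm16(std::uint16_t value) {
    return luiPreserving(liu(lo8(value)), hi8(value));
}

static_assert(loadImm16(0x4280) == 0x4280, "liu 0x80 then lui 0x42");
static_assert(luiZeroing(0x42) == 0x4200, "a zeroing lui needs a following ori");
```

The extra reg parameter on luiPreserving is the whole problem in miniature: the instruction needs its destination as an input, and that's the part I couldn't express in a pattern.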
My first attempt was a basic pattern:
def : Pat<(i16 imm:$imm), (LUI (LIU (LO8 imm:$imm)), (HI8 imm:$imm))>;
where LO8 and HI8 are just SDNodeXForms that extract the low and high byte respectively. This doesn't work, as the lui node isn't defined to take an input register. I played around with a few variations on this, before deciding that I'd just use a pseudo-instruction and expand it in code.
def LoadImm: MupsPseudo<(outs IntReg:$rdd), (ins imm16:$imm),
[(set IntReg:$rdd, imm:$imm)]>;
and expanded in Mups16InstrInfo.cpp as
void Mups16InstrInfo::expandLoadImm(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const
{
auto& reg = I->getOperand(0);
BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::LIU))
.addReg(reg.getReg())
.addImm(I->getOperand(1).getImm() & 0xff);
BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::LUI))
.addReg(reg.getReg())
.addImm((I->getOperand(1).getImm() >> 8) & 0xff);
}
which seems to work, with the intermediate DAG node %3:intregs = LoadImm 256 expanding to
liu $r2, 0
lui $r2, 1
Next up: try loops again.
20201008
No progress, worked on soldering up the memory board instead
20201007
Spent a couple of frustrating evenings trying to work out why LLVM was failing to match byte loads:
ISEL: Starting selection on root node: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16
ISEL: Starting pattern match
Initial Opcode index to 4
Skipped scope entry (due to false predicate) at index 13, continuing at 29
Skipped scope entry (due to false predicate) at index 30, continuing at 46
Skipped scope entry (due to false predicate) at index 47, continuing at 61
Match failed at index 11
LLVM ERROR: Cannot select: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16
I have a pair of byte-loading instructions defined:
def LB : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),
"lb $rdd, $src",
[(set IntReg:$rdd, (sextloadi8 addr:$src))]>;
def LBU : InstMupsID1<11, (outs IntReg:$rdd), (ins memsrc:$src),
"lbu $rdd, $src",
[(set IntReg:$rdd, (zextloadi8 addr:$src))]>;
i.e. there's one for sign-extending loads (sextloadi8) and one for zero-extended loads (zextloadi8).
Wasted a lot of time thinking it was something to do with the instruction having two register arguments instead of reg+imm, but that was just me getting confused by the t6 argument, which I think is just the ch result of the previous instruction, passed in as part of instruction chaining.
Next suspicion was that it was because the load doesn't have a frame index, and that this would somehow screw up matching with the addr operand (defined as def addr: ComplexPattern<iPTR, 2, "selectAddr", [frameindex], []>;), checking the C++ selectAddr function, etc. No joy here.
Finally came across this article about the LLVM backend which had a very useful hint to look in the generated matching table in build/lib/Target/Mups16/Mups16GenDAGISel.inc, to see which predicates were actually failing. Lo and behold, it's pretty obvious:
static const unsigned char MatcherTable[] = {
/* 0*/ OPC_SwitchOpcode /*9 cases */, 75, TARGET_VAL(ISD::LOAD),// ->79
/* 4*/ OPC_RecordMemRef,
/* 5*/ OPC_RecordNode, // #0 = 'ld' chained node
/* 6*/ OPC_RecordChild1, // #1 = $src
/* 7*/ OPC_CheckPredicate, 0, // Predicate_unindexedload
/* 9*/ OPC_CheckType, MVT::i16,
/* 11*/ OPC_Scope, 16, /*->29*/ // 4 children in Scope
/* 13*/ OPC_CheckPredicate, 1, // Predicate_sextload
/* 15*/ OPC_CheckPredicate, 2, // Predicate_sextloadi8
/* 17*/ OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/* 20*/ OPC_EmitMergeInputChains1_0,
/* 21*/ OPC_MorphNodeTo1, TARGET_VAL(MUPS::LB), 0|OPFL_Chain|OPFL_MemRefs,
MVT::i16, 2/*#Ops*/, 2, 3,
// Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_sextload>><<P:Predicate_sextloadi8>> - Complexity = 13
// Dst: (LB:{ *:[i16] } addr:{ *:[i16] }:$src)
/* 29*/ /*Scope*/ 16, /*->46*/
/* 30*/ OPC_CheckPredicate, 3, // Predicate_zextload
/* 32*/ OPC_CheckPredicate, 2, // Predicate_zextloadi8
/* 34*/ OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/* 37*/ OPC_EmitMergeInputChains1_0,
/* 38*/ OPC_MorphNodeTo1, TARGET_VAL(MUPS::LBU), 0|OPFL_Chain|OPFL_MemRefs,
MVT::i16, 2/*#Ops*/, 2, 3,
// Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_zextload>><<P:Predicate_zextloadi8>> - Complexity = 13
// Dst: (LBU:{ *:[i16] } addr:{ *:[i16] }:$src)
/* 46*/ /*Scope*/ 14, /*->61*/
/* 47*/ OPC_CheckPredicate, 4, // Predicate_load
/* 49*/ OPC_CheckComplexPat, /*CP*/0, /*#*/1, // selectAddr:$src #2 #3
/* 52*/ OPC_EmitMergeInputChains1_0,
/* 53*/ OPC_MorphNodeTo1, TARGET_VAL(MUPS::LW), 0|OPFL_Chain|OPFL_MemRefs,
MVT::i16, 2/*#Ops*/, 2, 3,
// Src: (ld:{ *:[i16] } addr:{ *:[iPTR] }:$src)<<P:Predicate_unindexedload>><<P:Predicate_load>> - Complexity = 13
// Dst: (LW:{ *:[i16] } addr:{ *:[i16] }:$src)
/* 61*/ 0, /*End of Scope*/
Matching this against the "Skipped scope entry (due to false predicate) at index XX" messages gives a pretty clear picture:
/* 13*/ OPC_CheckPredicate, 1, // Predicate_sextload
/* 30*/ OPC_CheckPredicate, 3, // Predicate_zextload
/* 47*/ OPC_CheckPredicate, 4, // Predicate_load
Further down in the file, we have the predicate-matching code:
case 1: {
// Predicate_sextload
SDNode *N = Node;
(void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::SEXTLOAD) return false;
return true;
}
// skipping 2...
}
case 3: {
// Predicate_zextload
SDNode *N = Node;
(void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::ZEXTLOAD) return false;
return true;
}
case 4: {
// Predicate_load
SDNode *N = Node;
(void)N;
if (cast<LoadSDNode>(N)->getExtensionType() != ISD::NON_EXTLOAD) return false;
return true;
}
So, these will only match if the extension type is one of SEXTLOAD, ZEXTLOAD or NON_EXTLOAD, but apparently not EXTLOAD (which is what anyext maps to in C++). I naively assumed that anyext would happily match either sign- or zero-extended loads, but apparently it doesn't.
Borrowing from the Lanai backend again I added a standalone pattern for anyext loads that maps to the LBU instruction (I don't think we generally want sign extension unless it's specifically requested):
// Pattern for anyext loads, since they don't seem to match either of the above
def : Pat<(extloadi8 addr:$src), (i16 (LBU addr:$src))>;
and with that the load matched:
ISEL: Starting selection on root node: t17: i16,ch = load<(load 1 from %ir.0), anyext from i8> t6, t7, undef:i16
ISEL: Starting pattern match
Initial Opcode index to 4
Skipped scope entry (due to false predicate) at index 13, continuing at 29
Skipped scope entry (due to false predicate) at index 30, continuing at 46
Skipped scope entry (due to false predicate) at index 47, continuing at 61
Morphed node: t17: i16,ch = LBU<Mem:(load 1 from %ir.0)> t7, TargetConstant:i16<0>, t6
ISEL: Match complete!
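The matching behaviour boils down to something like the following toy model. The enum and function names are invented for illustration and this is not the generated matcher; it just shows each predicate accepting exactly one extension kind, with the any-extending case falling through unless the extra pattern is present.

```cpp
#include <cassert>

// Hypothetical mirror of the four load extension kinds discussed above.
enum class LoadExt { NonExt, AnyExt, SignExt, ZeroExt };
enum class Selected { LB, LBU, LW, NoMatch };

// Each predicate accepts exactly one kind, like the generated checks.
constexpr bool isSextLoad(LoadExt e) { return e == LoadExt::SignExt; }
constexpr bool isZextLoad(LoadExt e) { return e == LoadExt::ZeroExt; }
constexpr bool isPlainLoad(LoadExt e) { return e == LoadExt::NonExt; }

// Try the three original scopes in order; withAnyExtPattern models the
// extra standalone pattern that sends any-extending loads to LBU.
constexpr Selected selectLoad(LoadExt e, bool withAnyExtPattern) {
    return isSextLoad(e)  ? Selected::LB
         : isZextLoad(e)  ? Selected::LBU
         : isPlainLoad(e) ? Selected::LW
         : (withAnyExtPattern && e == LoadExt::AnyExt) ? Selected::LBU
                                                       : Selected::NoMatch;
}

static_assert(selectLoad(LoadExt::AnyExt, false) == Selected::NoMatch,
              "without the extra pattern nothing accepts an anyext load");
```

Picking LBU for the any-extending case is a choice rather than a requirement, since the upper bits are unspecified either way.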
Next failure is one I knew was coming: loading immediates >= 256 into a register, since it can't be done in a single instruction. Hopefully another simple pattern can fix that.
20201004
Implemented eliminateFrameIndex, and got my first successful compile of a function! Admittedly it was the empty print function, but still.
Spent a while tracking down this error:
Use of %1 does not have a corresponding definition on every path:
48r %1:intregs = LI %1:intregs
LLVM ERROR: Use not jointly dominated by defs.
This should have been moving the constant 42 into $r1 prior to returning it, but somehow the DAG
t0: ch = EntryToken
t2: i16,ch = CopyFromReg t0, Register:i16 %0
t6: ch = store<(store 2 into %ir.str.addr)> t0, t2, FrameIndex:i16<0>, undef:i16
t9: ch,glue = CopyToReg t6, Register:i16 $r1, Constant:i16<42>
t10: ch = Mups16ISD::Ret t9, Register:i16 $r1, t9:1
was getting transformed into
t0: ch = EntryToken
t7: i16 = LI t7
t2: i16,ch = CopyFromReg t0, Register:i16 %0
t6: ch = SW<Mem:(store 2 into %ir.str.addr)> t2, TargetFrameIndex:i16<0>, TargetConstant:i16<0>, t0
t9: ch,glue = CopyToReg t6, Register:i16 $r1, t7
t10: ch = RetRA Register:i16 $r1, t9, t9:1
The load of Constant:i16<42> had disappeared, replaced by t7: i16 = LI t7.
Turned out to be just that the li instruction was defined as
def LI : InstMupsIID<13, (outs IntReg:$rdd), (ins imm8:$imm8),
"li $rdd, $imm8",
[(set IntReg:$rdd, i16:$imm8)]>;
Changing the pattern in the last line to [(set IntReg:$rdd, imm8:$imm8)]>; fixed it.
Further minor problem: sw instructions that are generated from a pattern (rather than the ones created explicitly in the lowering C++ functions) had the operands round the wrong way, leading to an assertion when printing the instructions:
Assertion `Disp.isImm() && "Expected immediate in displacement field"' failed.
Easily fixed by swapping the operands in the instruction definition, to match the sw <dest_addr>, <source_reg> layout:
// Incorrect
def SW : InstMupsI2<17, (ins IntReg:$rs2, memdst:$dst), ...
// Correct
def SW : InstMupsI2<17, (ins memdst:$dst, IntReg:$rs2), ...
20201003
Progress. Decided to try compiling a completely empty function, just to try to make some progress on the structure. My function now looks like:
static char* const TERM_ADDR=(char*)0x100;
void print(const char* str_)
{
}
With this, I immediately get an assertion that I put in my Mups16FrameLowering::emitPrologue function, which is good. Decided to use register 5 (the old PC) as a frame pointer for now, just for simplicity.
Other piece of good news is I found a much simpler example to copy from: the Google Lanai backend. That appears to be a very straightforward 32-bit RISC CPU, and the backend code is very concise. I've adapted its emitPrologue and emitEpilogue functions, and now llc is hanging, which is good news. I think that means I need to implement eliminateFrameIndex, judging from a few comments I've seen in code (mainly in the CPU0 backend, I think). Makes a pleasant change to have a hang instead of a crash.
20201002
No new code today. Spent a couple of hours trying to understand how the prologue/epilogue generation works in LLVM. Still somewhat confused. I'm wondering how this works at all if you don't have a frame register (every example I can see does have one). Maybe I'll have to sacrifice one of my precious general-purpose registers as a frame register. Might turn out to be really useful that I redesigned the program counter as a dedicated circuit and freed up register 6.
Got distracted reading about liveness analysis, after seeing lots of calls to .live_in() and .kill() in various places in the register spilling/popping.
20201001
After much fruitless banging of head against wall, I decided to debug this the old-fashioned way. No, not with printf, the other way: by deleting half the code from the print() function I'm compiling and seeing if my problem goes away. Hopefully I should be able to work out if it's the branch instructions or the load/store ones that're causing the problems.
Removing the branches, so the function is just a straightforward
static char* const TERM_ADDR=(char*)0x100;
void print(const char* str_)
{
*TERM_ADDR = *str_;
}
got me a little further, to a segfault in LowerReturn, which wasn't surprising since I hadn't implemented that function yet. Implemented that based mostly on CPU0 and RISCV examples. Only wrinkle was what to do with the ret instruction, which isn't natively supported. Ended up doing something like CPU0, and adding a ret pseudo-instruction (Mups16ISD::Ret) that's expanded in a post-register-allocation pass by the Mups16InstrInfo::expandPostRAPseudo function into
BuildMI(MBB, I, I->getDebugLoc(), get(MUPS::JR)).addReg(MUPS::RA);
There's probably a simpler way, but this seems to work. Next problem is missing function prologue/epilogue code.
20200923
I'm wondering if the assertion from last night is something to do with LLVM not knowing that my load instructions are actually loads (there's an offhand comment in http://llvm.org/docs/CodeGenerator.html#selectiondag-select-phase that says "We don’t automatically infer flags like isStore/isLoad yet"). Most other backends seem to have implemented the isLoadFromStackSlot function in the XXXInstrInfo.cpp file. I'll try adding that.
Straws firmly clutched.
Update: nope.
20200922
Decided to take a break from beating my head against the br_cc problem, and look at load/store instead. Partly just for a change, but also in case it turns out that llvm is trying to match the whole fragment I mentioned yesterday, in which case it will need the load pattern anyway.
First stab at this:
def memsrc : Operand<i16> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops IntReg, imm5);
let ParserMatchClass = MemAsmOperand;
}
def LB : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),
"lb $rdd, $src",
[(set IntReg:$rdd, (sextloadi8 memsrc:$src))]>;
This doesn't work. It appears that memsrc isn't a valid leaf in the dag:
Unknown leaf kind: memsrc:{ *:[i16] }:$src
Attempt two was to use iPTR for the type:
[(set IntReg:$rdd, (sextloadi8 iPTR:$src))]>;
which gives a more interesting error:
LB: (LB:{ *:[i16] } iPTR:{ *:[i16] }:$src)
Included from /xx/git/llvm-project/llvm/lib/Target/Mups16/Mups16.td:19:
/home/ali/git/llvm-project/llvm/lib/Target/Mups16/Mups16InstrInfo.td:118:1: error: In LB: Instruction 'LB' expects more than the provided 1 operands!
Presumably this is because the instruction is defined to take memsrc as its
input operand, which is a compound operand that takes an IntReg, imm5 pair
to allow for the constant offset from the source register. Maybe I need to add
a custom pattern that can match this?
Later:
Success! Not only on memory loads, but on the br_cc problem from yesterday.
Memory first: adding a ComplexPattern and some code to match the base+offset
seems to have worked. In the Mups16InstrInfo.td file:
// Leaf pattern for matching memory addresses
def addr: ComplexPattern<iPTR, 2, "selectAddr", [frameindex]>;
// ...
def LB : InstMupsID1<10, (outs IntReg:$rdd), (ins memsrc:$src),
"lb $rdd, $src",
[(set IntReg:$rdd, (sextloadi8 addr:$src))]>;
with a matching Mups16DAGToDAGISel::selectAddr(SDValue Addr, SDValue &Base, SDValue &Offset)
worked a treat. This function is called whenever an address
is matched, and can set the Base and Offset ref arguments.
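As a rough illustration of what the base+offset matching does, here's a standalone sketch using a toy node type instead of SDValue. Everything in it is an assumption made for the example: the Node struct, the signed range used for imm5 (-16 to 15), and the fallback of "whole address as base, offset 0". It is not the actual Mups16DAGToDAGISel code.

```cpp
// Toy stand-in for a selection DAG node; this is NOT the LLVM SDValue API.
struct Node {
    enum Kind { Register, Constant, FrameIndex, Add } kind;
    int value;        // register number, constant value, or frame index
    const Node *lhs;  // operands, only meaningful when kind == Add
    const Node *rhs;
};

// Assumed signed 5-bit range for the offset field (hypothetical; the real
// imm5 operand may be unsigned or scaled).
constexpr int kMinImm5 = -16;
constexpr int kMaxImm5 = 15;

// Split an address into a base node and a constant offset. Always reports a
// match: anything that isn't "base + small constant" is used as the base
// with an offset of 0.
bool selectAddr(const Node &addr, const Node *&base, int &offset) {
    if (addr.kind == Node::Add && addr.rhs->kind == Node::Constant &&
        addr.rhs->value >= kMinImm5 && addr.rhs->value <= kMaxImm5) {
        base = addr.lhs;
        offset = addr.rhs->value;
        return true;
    }
    base = &addr;
    offset = 0;
    return true;
}
```

With this shape, a bare frame index ends up as base = the frame index, offset = 0, which is at least consistent with the TargetFrameIndex / TargetConstant<0> pairs that show up in the selection log further down.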
The br_cc fix was embarrassingly simple: just change the definitions in the legalization DAG in Mups16ISelLowering.cpp
to expand br_cc, and make brcond legal:
setOperationAction(ISD::BR_CC, MVT::i8, Promote);
setOperationAction(ISD::BR_CC, MVT::i16, Expand);
setOperationAction(ISD::BRCOND, MVT::Other, Legal);
With store patterns added for sw and sb, the selection now gets as far as the second load before
failing:
===== Instruction selection begins: %bb.0 'entry'
ISEL: Starting selection on root node: t23: ch = br t26, BasicBlock:ch<if.then 0x7fffed961eb0>
ISEL: Starting pattern match
Skipped scope entry (due to false predicate) at index 2, continuing at 63
Skipped scope entry (due to false predicate) at index 64, continuing at 111
Skipped scope entry (due to false predicate) at index 112, continuing at 161
Skipped scope entry (due to false predicate) at index 162, continuing at 175
Morphed node: t23: ch = J BasicBlock:ch<if.then 0x7fffed961eb0>, t26
ISEL: Match complete!
ISEL: Starting selection on root node: t26: ch = brcond t29, t31, BasicBlock:ch<if.end 0x7fffed961f88>
ISEL: Starting pattern match
Skipped scope entry (due to false predicate) at index 2, continuing at 63
Skipped scope entry (due to false predicate) at index 64, continuing at 111
Morphed node: t26: ch = BZ t36, BasicBlock:ch<if.end 0x7fffed961f88>, t29
ISEL: Match complete!
ISEL: Starting selection on root node: t36: i16,ch = load<(dereferenceable load 1 from %ir.b), zext from i8> t29, FrameIndex:i16<1>, undef:i16
ISEL: Starting pattern match
Skipped scope entry (due to false predicate) at index 14, continuing at 30
Creating constant: t38: i16 = TargetConstant<0>
Morphed node: t36: i16,ch = LBU<Mem:(dereferenceable load 1 from %ir.b)> TargetFrameIndex:i16<1>, TargetConstant:i16<0>, t29
ISEL: Match complete!
ISEL: Starting selection on root node: t29: ch = store<(store 1 into %ir.b), trunc to i8> t28:1, t28, FrameIndex:i16<1>, undef:i16
ISEL: Starting pattern match
Skipped scope entry (due to false predicate) at index 2, continuing at 63
Morphed node: t29: ch = SB<Mem:(store 1 into %ir.b)> TargetFrameIndex:i16<1>, TargetConstant:i16<0>, t28, t28:1
ISEL: Match complete!
ISEL: Starting selection on root node: t28: i16,ch = load<(load 1 from %ir.0), anyext from i8> t10, t7, undef:i16
ISEL: Starting pattern match
Skipped scope entry (due to false predicate) at index 14, continuing at 30
Skipped scope entry (due to false predicate) at index 31, continuing at 47
Skipped scope entry (due to false predicate) at index 48, continuing at 62
Match failed at index 12
Continuing at 63
Match failed at index 64
Continuing at 111
Match failed at index 112
Continuing at 161
Match failed at index 162
Continuing at 175
Match failed at index 176
Continuing at 193
llc: /home/ali/git/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:900: bool llvm::SelectionDAG::RemoveNodeFromCSEMaps(llvm::SDNode*): Assertion `N->getOpcode() != ISD::EntryToken && "EntryToken in CSEMap!"' failed.
I've no idea what's going on here. Tomorrow's problem.
20200921
Some progress with llvm, and lots of frustration. Good news is that I got my Mups16 instruction selected from an llvm one:
===== Instruction selection begins: %bb.0 'entry'
ISEL: Starting selection on root node: t23: ch = br t37, BasicBlock:ch<if.then 0x7fffc3fd73b0>
ISEL: Starting pattern match
Morphed node: t23: ch = J BasicBlock:ch<if.then 0x7fffc3fd73b0>, t37
ISEL: Match complete!
Unfortunately, the next instruction is a conditional branch, and I can't for the life of me work out how to match that:
ISEL: Starting selection on root node: t37: ch = br_cc t29, seteq:ch, t36, Constant:i16<0>, BasicBlock:ch<if.end 0x7fffc3fd7488>
ISEL: Starting pattern match
Initial Opcode index to 0
Match failed at index 0
LLVM ERROR: Cannot select: t37: ch = br_cc t29, seteq:ch, t36, Constant:i16<0>, BasicBlock:ch<if.end 0x7fffc3fd7488>
t36: i16,ch = load<(dereferenceable load 1 from %ir.b), zext from i8> t29, FrameIndex:i16<1>, undef:i16
t12: i16 = FrameIndex<1>
t5: i16 = undef
t27: i16 = Constant<0>
In function: print
I've tried what seemed obvious:
def BZ : InstMupsII1<20, (ins IntReg:$rs1, brtarget:$offset),
"bz $rs1, $offset",
[(brcond (i16 (seteq IntReg:$rs1, 0)), bb:$offset)]>;
and made sure that the unsupported comparisons are expanded in the legalization DAG:
setOperationAction(ISD::SETCC, MVT::i8, Promote);
// Valid condcode actions (which comparisons are natively supported by the CPU)
setCondCodeAction(ISD::SETOLE, MVT::i16, Expand);
setCondCodeAction(ISD::SETOGE, MVT::i16, Expand);
setCondCodeAction(ISD::SETOGT, MVT::i16, Expand);
setCondCodeAction(ISD::SETULE, MVT::i16, Expand);
setCondCodeAction(ISD::SETUGE, MVT::i16, Expand);
setCondCodeAction(ISD::SETUGT, MVT::i16, Expand);
which should, if I understand it right, mean that comparisons like A <= B get
transformed into !(B < A), which is supported by the slt instruction.
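The identities behind that expansion can be checked in isolation. The sketch below models slt as a plain function and builds the other signed comparisons on top of it; the function names are made up for the example, and whether the legalizer picks exactly these rewrites for each condition code is an assumption on my part, not something I've confirmed in the LLVM source.

```cpp
#include <cstdint>

// Model of a set-on-less-than instruction: 1 if a < b (signed), else 0.
inline int slt(std::int16_t a, std::int16_t b) { return a < b ? 1 : 0; }

// A <= B  is  !(B < A): swap the operands, then invert the result.
inline int setle(std::int16_t a, std::int16_t b) { return slt(b, a) ^ 1; }

// A > B  is  B < A: just swap the operands.
inline int setgt(std::int16_t a, std::int16_t b) { return slt(b, a); }

// A >= B  is  !(A < B): same operands, inverted result.
inline int setge(std::int16_t a, std::int16_t b) { return slt(a, b) ^ 1; }
```

So a CPU that only has a less-than comparison needs at most an operand swap and an invert to cover the rest of the signed comparisons.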
What's a little frustrating is that I can't see any references to a br_cc
LLVM instruction (ISD::BR_CC exists in the code, but you can't pattern match
on br_cc in the TD files), and most other examples I can see online show
conditional branches represented as brcond.
I'm wondering now if it's actually a problem with the conditional branch at
all. Perhaps LLVM is trying to match the entire block, and the error is coming
from the fact that I haven't added patterns for load yet?
20200920
Managed to get past the sentinel assertion in llc