Metadata
Abstract
http://llvm.org/devmtg/2013-04/ — Introduction to LLVM - Eric Christopher, Johannes Doerfert
Slides: Coming Soon — Introduction to LLVM. — Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
Tags and Collections
- Keywords: 05 Finished; Compiler; IRFuzzer; LLVM
Comments
Annotations
Notes
Source Layout
Each top-level directory of the LLVM project is a subproject (e.g. clang, llvm, lldb (debugger), lld (linker), etc.). Each subproject has roughly the same layout (include/, lib/, test/, utils/)
- clang: language front-end
- llvm: LLVM core (middle-end, back-end)
To build:
- debug builds are the default (and slow & big); consider changing the build type (CMAKE_BUILD_TYPE, e.g. Release)
- ccache can save a lot of time if you rebuild a lot
- ninja speeds up rebuilds (after updating files and recompiling)
Useful options:
Compilers 101
After the Clang AST, LLVM internally uses control-flow graphs (CFGs): each function is a graph of basic blocks connected by branches.
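To make the CFG idea concrete, here is a toy model in C++ (hypothetical types, not the LLVM API): each basic block lists the labels its terminator can branch to, and predecessor lists are derived by inverting those edges. BasicBlockNode and predecessors() are names invented for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG node (hypothetical, not llvm::BasicBlock): a label plus the
// labels its terminator can branch to.
struct BasicBlockNode {
  std::string label;
  std::vector<std::string> successors;
};

// Derive predecessor lists by inverting every successor edge.
std::map<std::string, std::vector<std::string>>
predecessors(const std::vector<BasicBlockNode>& cfg) {
  std::map<std::string, std::vector<std::string>> preds;
  for (const BasicBlockNode& bb : cfg) {
    preds[bb.label];  // make sure blocks without predecessors appear too
    for (const std::string& succ : bb.successors)
      preds[succ].push_back(bb.label);
  }
  return preds;
}
```

For an if/else diamond (entry branches to then/else, both jump to end), the end block gets two predecessors, which is exactly where a phi node would be needed.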
Life of a program
C → LLVM-IR (.ll; human-readable IR via -S -emit-llvm)
-O3 adds extra information such as type-based aliasing (TBAA) metadata; otherwise clang compiles at -O0, which marks every function noinline and optnone (“do not inline and do not optimize”), so running opt on that IR afterwards leaves those functions untouched (passing -Xclang -disable-O0-optnone avoids this)
Clang emits stack allocations (alloca + load/store) to communicate values across basic blocks; we force it into another form
We want values to live in registers, so we run the memory-to-register pass (mem2reg) in opt
instnamer gives every unnamed instruction a name (instead of numbered values such as %0, %1); Release builds of clang discard value names
LLVM-IR
LLVM module ~= translation unit
data layout encodes target/architecture-dependent information like pointer size, address spaces, structure padding, etc
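As an illustration of how alignment rules in a data layout produce structure padding, the sketch below lays out struct fields in order, assuming x86-64-like alignments (i8: 1 byte, i32: 4, i64: 8). FieldInfo, alignTo and layoutStruct are hypothetical helpers, not LLVM's StructLayout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one struct field: its size and ABI alignment in
// bytes, as the data layout would specify them for the target.
struct FieldInfo {
  std::size_t size;
  std::size_t align;
};

// Round offset up to the next multiple of align (align must be nonzero).
std::size_t alignTo(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Lay out fields in order: pad before each field so it is aligned, then add
// tail padding so the total size is a multiple of the largest alignment.
// Returns the total size and fills offsets with each field's offset.
std::size_t layoutStruct(const std::vector<FieldInfo>& fields,
                         std::vector<std::size_t>& offsets) {
  std::size_t offset = 0;
  std::size_t maxAlign = 1;
  offsets.clear();
  for (const FieldInfo& f : fields) {
    offset = alignTo(offset, f.align);
    offsets.push_back(offset);
    offset += f.size;
    if (f.align > maxAlign) maxAlign = f.align;
  }
  return alignTo(offset, maxAlign);
}
```

Under these assumptions { i8, i64, i32 } gets offsets 0, 8, 16 and a size of 24 bytes: 7 bytes of padding after the i8 and 4 bytes of tail padding.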
target triple = architecture (x86_64), vendor (unknown in this case), OS and environment (linux-gnu), e.g. x86_64-unknown-linux-gnu
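A minimal sketch of splitting a triple into its components. TripleParts and splitTriple are hypothetical; the real class is llvm::Triple, which also normalizes names and handles many special cases this toy ignores.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified view of a target triple.
struct TripleParts {
  std::string arch, vendor, os, environment;
};

// Split "arch-vendor-os[-environment]" on '-' (no normalization).
TripleParts splitTriple(const std::string& triple) {
  std::vector<std::string> parts;
  std::stringstream ss(triple);
  std::string item;
  while (std::getline(ss, item, '-')) parts.push_back(item);
  TripleParts t;
  if (parts.size() > 0) t.arch = parts[0];
  if (parts.size() > 1) t.vendor = parts[1];
  if (parts.size() > 2) t.os = parts[2];
  if (parts.size() > 3) t.environment = parts[3];
  return t;
}
```

For x86_64-unknown-linux-gnu this yields arch x86_64, vendor unknown, OS linux and environment gnu.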
<type> %var (e.g. i64 %var): function parameter
%var = ...: defines a value
add nsw i64 %a, %b: add instruction on i64 integers with the nsw (no signed wrap) flag
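With nsw, if the add overflows as a signed operation its result is a poison value, so the optimizer may assume such inputs never occur. The hypothetical helper below only detects when that would happen; it checks before adding because signed overflow is itself undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if a + b would overflow as a signed 64-bit add, i.e. the
// inputs for which "add nsw i64 %a, %b" produces poison.
bool addWouldSignedWrap(std::int64_t a, std::int64_t b) {
  if (b > 0) return a > std::numeric_limits<std::int64_t>::max() - b;
  if (b < 0) return a < std::numeric_limits<std::int64_t>::min() - b;
  return false;  // adding 0 never overflows
}
```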
; starts a comment (the rest of the line is ignored)
#n (e.g. #0) at the end of a function header = reference to an attribute group defined elsewhere in the module
%var = phi i64 [ %var1, %branch1 ], [ %var2, %branch2 ] ...: selects a value depending on which predecessor block was executed
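A toy evaluation of a phi node, assuming we already know which predecessor block control came from. ToyPhi is a hypothetical type mirroring the [ value, block ] pairs of the IR syntax, not llvm::PHINode.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy phi node: (incoming value, predecessor label) pairs, mirroring
// "phi i64 [ %v1, %bb1 ], [ %v2, %bb2 ]".
struct ToyPhi {
  std::vector<std::pair<std::int64_t, std::string>> incoming;

  // The phi's value is the one paired with the block control came from.
  std::int64_t evaluate(const std::string& cameFrom) const {
    for (const auto& entry : incoming)
      if (entry.second == cameFrom) return entry.first;
    throw std::invalid_argument("not a predecessor: " + cameFrom);
  }
};
```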
attributes #0 = { nounwind uwtable } (nounwind: the function never throws an exception or unwinds the stack; uwtable: the ABI requires an unwind table entry for the function even if no exception passes through it); an attribute can be applied to a specific call, a function parameter, an entire function, a return value, etc.
instructions can also carry additional flags (e.g. nsw, nuw, exact)
lines starting with an exclamation point (!) define metadata
SSA (static single assignment): every value in LLVM-IR is assigned exactly once; this restriction enables many optimizations. SSA holds even when the IR keeps variables in stack allocations: the memory can be stored to many times, but each loaded value is still defined only once.
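For straight-line code (no branches), converting to SSA is just renaming: each assignment creates a fresh version of the variable and each use refers to the latest version. The sketch below assumes a hypothetical three-address format (Assign); with branches you would also need phi nodes, which it deliberately ignores.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy three-address assignment "dst = lhs op rhs" (hypothetical format).
struct Assign {
  std::string dst, op, lhs, rhs;
};

// Rename straight-line code into SSA form: every assignment to a variable
// creates a fresh version (x1, x2, ...), and each use refers to the most
// recent version. Names never assigned (inputs, literals) stay unchanged.
std::vector<Assign> toSSA(const std::vector<Assign>& code) {
  std::map<std::string, int> version;
  auto use = [&](const std::string& v) {
    auto it = version.find(v);
    return it == version.end() ? v : v + std::to_string(it->second);
  };
  std::vector<Assign> out;
  for (const Assign& a : code) {
    std::string lhs = use(a.lhs), rhs = use(a.rhs);  // read before writing dst
    int n = ++version[a.dst];
    out.push_back({a.dst + std::to_string(n), a.op, lhs, rhs});
  }
  return out;
}
```

So x = a + b; x = x * 2; y = x + 1 becomes x1 = a + b; x2 = x1 * 2; y1 = x2 + 1.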
LLVM-IR Hierarchy
- llvm::Module
  - llvm::GlobalVariable
  - llvm::Function (declarations / definitions)
    - llvm::BasicBlock (if a function contains basic blocks, it’s a definition; if not, it’s a declaration)
      - llvm::Instruction (concrete instructions are subclasses, e.g.:)
        - llvm::ICmpInst (integer compare)
        - llvm::BranchInst
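The containment hierarchy can be mirrored with a few toy structs (hypothetical types, not the real LLVM classes) to show the declaration/definition rule: a function without basic blocks is only a declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy mirror of Module -> Function -> BasicBlock -> Instruction.
struct ToyInstruction { std::string opcode; };
struct ToyBasicBlock { std::string label; std::vector<ToyInstruction> insts; };
struct ToyFunction {
  std::string name;
  std::vector<ToyBasicBlock> blocks;
  // Same rule as in the notes: no basic blocks means declaration.
  bool isDeclaration() const { return blocks.empty(); }
};
struct ToyModule {
  std::vector<std::string> globals;
  std::vector<ToyFunction> functions;
};

// Count instructions across all function definitions in a module.
std::size_t countInstructions(const ToyModule& m) {
  std::size_t n = 0;
  for (const ToyFunction& f : m.functions)
    for (const ToyBasicBlock& bb : f.blocks) n += bb.insts.size();
  return n;
}
```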
LLVM-IR instructions
Each basic block ends with exactly one terminator instruction (e.g. br, ret, switch)
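A verifier-style check of the terminator rule, sketched on plain opcode lists. The terminator set below is a real but incomplete subset of LLVM-IR terminators, and hasValidTerminator is a hypothetical helper, not part of LLVM's Verifier.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Subset of LLVM-IR terminator opcodes (the real list is longer).
const std::set<std::string> kTerminators = {"br", "ret", "switch", "unreachable"};

// A well-formed block is non-empty, ends with a terminator, and has no
// terminator anywhere before its last instruction.
bool hasValidTerminator(const std::vector<std::string>& opcodes) {
  if (opcodes.empty()) return false;
  for (std::size_t i = 0; i + 1 < opcodes.size(); ++i)
    if (kTerminators.count(opcodes[i])) return false;
  return kTerminators.count(opcodes.back()) == 1;
}
```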
LLVM-IR (cont)
IR is in SSA form with an unlimited number of virtual registers
organization:
- module: list of global symbols
- function: list of basic blocks that form a control-flow graph (CFG)
- basic block: list of instructions terminated by a branch/return
- instruction: typed, assembly-like operation
- constant: constant literals and globals (a function pointer is also a constant)
- value: almost anything is a value; mostly constants & instructions
- global symbols: @xxx
- local symbols: %xxx
- to use a basic block (e.g. as a branch target): %xxx
- to define a basic block: xxx:
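The sigils above are enough to tell most IR names apart. This hypothetical classifier looks only at the first or last character of a token; real IR lexing has more cases (quoted names, numbered values, etc.) that it ignores.

```cpp
#include <cassert>
#include <string>

enum class SymbolKind { Global, Local, LabelDef, Metadata, AttrGroup, Other };

// Classify a single LLVM-IR token by its sigil (simplified sketch).
SymbolKind classify(const std::string& tok) {
  if (tok.empty()) return SymbolKind::Other;
  switch (tok.front()) {
    case '@': return SymbolKind::Global;     // @main, @counter
    case '%': return SymbolKind::Local;      // %x, or a block used as %entry
    case '!': return SymbolKind::Metadata;   // !0, !dbg
    case '#': return SymbolKind::AttrGroup;  // #0
  }
  if (tok.back() == ':') return SymbolKind::LabelDef;  // e.g. "entry:"
  return SymbolKind::Other;
}
```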
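Common pitfalls
- LLVM's loop analyses only deal with reducible (natural) loops
- irreducible loops are not recognized as loops
- SSA is only guaranteed in reachable code (blocks with a path from the entry block); only iterate over reachable blocks, otherwise you might find IR that looks broken
- address spaces are part of the pointer type
- null (0) can be a valid pointer: in non-default address spaces, and even in address space 0 (the default) if the function has the null_pointer_is_valid attribute
- integer types are sign-agnostic (i32, not signed/unsigned int); signedness lives in the operations (e.g. sdiv vs. udiv, icmp slt vs. ult)
- llvm::Constant includes values that only become constant at start-up (load) time, e.g. addresses of globals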
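Following the reachability pitfall, a pass would typically collect the reachable blocks first and skip the rest, since unreachable blocks may contain IR that would otherwise look invalid (for example, an instruction using its own result). A minimal sketch with an iterative DFS over a toy adjacency-list CFG (the CFG alias and reachableBlocks are hypothetical):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: block label -> successor labels.
using CFG = std::map<std::string, std::vector<std::string>>;

// Collect the blocks reachable from the entry block (iterative DFS).
std::set<std::string> reachableBlocks(const CFG& cfg, const std::string& entry) {
  std::set<std::string> seen;
  std::vector<std::string> worklist{entry};
  while (!worklist.empty()) {
    std::string bb = worklist.back();
    worklist.pop_back();
    if (!seen.insert(bb).second) continue;  // already visited
    auto it = cfg.find(bb);
    if (it == cfg.end()) continue;          // block with no recorded edges
    for (const std::string& succ : it->second) worklist.push_back(succ);
  }
  return seen;
}
```

In a CFG where entry → loop, loop → {loop, exit} and a block dead → exit exists, dead is not reachable and would be skipped.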
Useful commandline options
Target and Code Generation
Example: lea
Each target (e.g. x86) has subtargets (mostly feature/attribute-based)
- Target/
  - TargetMachine contains module-level lowering information
    - object file / OS ABI information
  - Subtarget
    - contains function-level lowering information, e.g. limiting instruction selection (“limit this function to use SSE”)
    - a subtarget is primarily defined by CPU & features
    - SubtargetFeature = ISA-level features (e.g. SSE, AVX)
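Subtarget features are usually written as a comma-separated +feature/-feature list (the syntax accepted by, e.g., llc -mattr). The sketch below applies such a string on top of an assumed default feature set for a CPU. applyFeatures is hypothetical; LLVM's real handling (SubtargetFeatures plus TableGen-generated implied features) is more involved.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Apply a feature string such as "+sse4.2,-avx" to a CPU's default set.
std::set<std::string> applyFeatures(std::set<std::string> features,
                                    const std::string& featureString) {
  std::stringstream ss(featureString);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.size() < 2) continue;  // skip empty or malformed entries
    std::string name = item.substr(1);
    if (item[0] == '+') features.insert(name);
    else if (item[0] == '-') features.erase(name);
  }
  return features;
}
```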
Machine IR
- CodeGen/MachineInstr.h
- similar hierarchy to LLVM-IR (MachineFunction, MachineBasicBlock, MachineInstr)
- not completely free of LLVM-IR (e.g. a MachineFunction still refers to its IR Function)
- tons of MIR construction APIs
- target-dependent AND target-independent opcodes & registers
- can be produced with llc
FastISel
- generates straightforward instructions
- not optimized
Lowering from LLVM-IR to MIR happens after the IR-level optimizations
Instruction Selection and Register Allocation
- CodeGen/
  - 3 instruction selectors
    - SelectionDAG
    - FastISel
    - GlobalISel
  - 2 typical register allocators
    - RegAllocFast
    - RegAllocGreedy
Object Generation - Assembly Printing
- the assembler is built in (integrated assembler)
- CodeGen/AsmPrinter
  - target-independent MIR to assembly
  - calls into the backends' Target/*AsmPrinter
    - target-specific opcode & operand printing
AsmParser/ MC/
- target-independent object file construction / encoding
Target/{AsmParser,MCTargetDesc}
- parse target-specific assembly & encoding
Object Reading
Object/ DebugInfo/
- parse object files & target-independent data
- calls into Target/ for instruction decoding
Target/Disassembler
- target-specific instruction decoding