Metadata
Abstract
http://llvm.org/devmtg/2019-10/
Getting Started With LLVM: Basics - Jessica Paquette, Florian Hahn
Slides:
This tutorial serves as a tour of LLVM, geared towards beginners interested in implementing LLVM passes. Both LLVM middle-end (IR) and back-end (MIR) passes are covered. At the end of this tutorial, newcomers will be armed with the tools necessary to create their own passes and improve upon existing passes.
This tutorial contains:
- A brief, high-level explanation of LLVM’s pass-based architecture.
- An explanation of analysis and transformation passes, and how they interact.
- Examples of important analysis passes, such as Dominator Trees and Target Transform Information.
- An introduction to fundamental data structures and APIs for LLVM pass development.
- A sample project which ties together the tutorial material, for use as a reference.
Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
Tags and Collections
- Keywords: 05 Finished; IRFuzzer; Introductory; LLVM
Comments
Annotations
Notes
See: LLVM
LLVM IR = generic assembly language
- easy to transform
- easy to lower (codegen)
- improvements to IR propagate to target code
IR Elements
Instruction components
- zero or one result value
- opcode
- explicit operand type
- operands
Main instruction classes
- arithmetic
- compare
- control flow (jumps, conditional jumps, return, etc)
- call (direct/indirect)
- load/store
Basic blocks
- can be labeled or implicitly numbered (either way, they can be referred to)
- contain instructions
- end with a terminator instruction; there is no fallthrough
Function
- name + type signature
- made of basic blocks (their order is irrelevant apart from the entry block, since every BB ends with a terminator that names its successors explicitly)
- first BB is special (“entry block”)
Module
- container for program
- contains
  - functions
  - decls
  - globals
  - …
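The containment hierarchy above (module, function, basic block, instruction) can be sketched with a toy model. This is not LLVM's API: ToyModule, ToyFunction, ToyBlock, ToyInst and isWellFormed are hypothetical names made up for illustration, and the check only encodes the rules from these notes (a block is non-empty, ends with a terminator, and has no terminator in the middle).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the IR elements in the notes; real LLVM classes
// are far richer than this.
enum class Opcode { Add, ICmp, Load, Store, Call, Br, CondBr, Ret };

struct ToyInst {
    std::string result;                 // empty string means "no result value"
    Opcode op;                          // opcode
    std::string type;                   // explicit operand type, e.g. "i32"
    std::vector<std::string> operands;  // operands, by name
};

bool isTerminator(const ToyInst &I) {
    return I.op == Opcode::Br || I.op == Opcode::CondBr || I.op == Opcode::Ret;
}

struct ToyBlock {
    std::string label;
    std::vector<ToyInst> insts;
};

struct ToyFunction {
    std::string name;
    std::string signature;
    std::vector<ToyBlock> blocks;       // blocks.front() is the entry block
};

struct ToyModule {
    std::vector<ToyFunction> functions;
    std::vector<std::string> globals;
};

// A block is well formed if it is non-empty, its last instruction is a
// terminator, and no earlier instruction is a terminator.
bool isWellFormed(const ToyBlock &BB) {
    if (BB.insts.empty()) return false;
    for (std::size_t I = 0; I + 1 < BB.insts.size(); ++I)
        if (isTerminator(BB.insts[I])) return false;
    return isTerminator(BB.insts.back());
}

// A function is well formed if it has an entry block and all blocks are.
bool isWellFormed(const ToyFunction &F) {
    if (F.blocks.empty()) return false;
    for (const ToyBlock &BB : F.blocks)
        if (!isWellFormed(BB)) return false;
    return true;
}

// Builds a one-block function in the spirit of:
//   entry:  %r = add i32 %x, 1
//           ret i32 %r
ToyFunction makeAdd1() {
    ToyBlock Entry;
    Entry.label = "entry";
    Entry.insts.push_back(ToyInst{"%r", Opcode::Add, "i32", {"%x", "1"}});
    Entry.insts.push_back(ToyInst{"", Opcode::Ret, "i32", {"%r"}});
    ToyFunction F;
    F.name = "add1";
    F.signature = "i32 (i32)";
    F.blocks.push_back(Entry);
    assert(isWellFormed(F) && "example function should satisfy the rules");
    return F;
}
```

Dropping the final ret from the entry block makes isWellFormed return false, which mirrors the "no fallthrough" rule.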
Instructions – def, use and users
- a user of a value is an instruction that takes that value as an operand; each such operand reference is one use
- the def of a value must dominate all of its uses (every path from the entry block to a use passes through the def)
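The def/use/user vocabulary can be made concrete with a small sketch. Everything here is a toy model rather than LLVM's API: SimpleInst, computeUses, countUsers, defsPrecedeUses and this replaceAllUsesWith are hypothetical helpers. The code is restricted to straight-line code, where the def-before-use rule reduces to "the def appears earlier than the use".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: defines `def` (may be empty) from `operands`.
struct SimpleInst {
    std::string def;
    std::vector<std::string> operands;
};

// Maps a value name to the index of the using instruction, once per use.
using UseMap = std::map<std::string, std::vector<std::size_t>>;

UseMap computeUses(const std::vector<SimpleInst> &Insts) {
    UseMap Uses;
    for (std::size_t I = 0; I < Insts.size(); ++I)
        for (const std::string &Op : Insts[I].operands)
            Uses[Op].push_back(I);   // one entry per operand slot (a "use")
    return Uses;
}

// Number of distinct instructions (users) that use Name.
std::size_t countUsers(const UseMap &Uses, const std::string &Name) {
    auto It = Uses.find(Name);
    if (It == Uses.end()) return 0;
    return std::set<std::size_t>(It->second.begin(), It->second.end()).size();
}

// Rewrites every operand equal to From to refer to To; returns the number of
// uses rewritten.
std::size_t replaceAllUsesWith(std::vector<SimpleInst> &Insts,
                               const std::string &From, const std::string &To) {
    assert(From != To && "replacing a value with itself is a no-op");
    std::size_t Rewritten = 0;
    for (SimpleInst &I : Insts)
        for (std::string &Op : I.operands)
            if (Op == From) { Op = To; ++Rewritten; }
    return Rewritten;
}

// Straight-line check: every operand is an argument or was defined earlier.
bool defsPrecedeUses(const std::vector<SimpleInst> &Insts,
                     std::set<std::string> Defined) {
    for (const SimpleInst &I : Insts) {
        for (const std::string &Op : I.operands)
            if (Defined.count(Op) == 0) return false;
        if (!I.def.empty()) Defined.insert(I.def);
    }
    return true;
}

// %a = op %x, %y ; %b = op %a, %a ; %c = op %b, %a
std::vector<SimpleInst> makeExample() {
    std::vector<SimpleInst> Insts;
    Insts.push_back(SimpleInst{"%a", {"%x", "%y"}});
    Insts.push_back(SimpleInst{"%b", {"%a", "%a"}});
    Insts.push_back(SimpleInst{"%c", {"%b", "%a"}});
    return Insts;
}
```

In the example, %a has three uses but only two users, because the second instruction uses it in both operand slots.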
Three IR forms:
Textual IR (.ll)
- human readable
- pass dev & debugging
Bitcode (.bc)
- storage
- backwards compat
In-memory
- objects / data structure
- interact with in C++
IR Transformation
Removing Dead Blocks
- dead block: a block that is orphaned (not connected to the CFG) and is not the entry block
- a Function is iterable (over its BBs)
- getting predecessors: BBs are also values, which have users (a BB that is not orphaned is used in a terminator)
- use make_early_inc_range(Func) to get iterators that allow deletion from the parent container (the range increments early, so it never touches the iterator of the element just erased)
- when modifying/deleting a BB, watch out for:
  - whether or not it’s the entry block
  - whether or not it has any predecessors (BB.users(), checking whether each user is a terminator with isTerminator())
  - if the BB is to be deleted, also let its successors know they lost a predecessor (successors(BB), succ->removePredecessor(&BB))
  - if the BB is to be deleted, remove all uses of its instructions (start from back(); replace all uses with UndefValue, then erase the instruction from its parent)
  - make sure to return true if the function is modified
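The deletion steps can be sketched on a toy CFG. This is a stdlib-only model, not the pass from the talk: CfgBlock, CfgFunction, addEdge and this removePredecessor are hypothetical, and instructions are omitted, so the "replace uses with undef" step is not modeled. The manual "advance the iterator before erasing" idiom stands in for make_early_inc_range. Repeating the sweep until nothing changes is an assumption of this sketch, so that blocks which become orphaned by an earlier deletion are also removed.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical toy basic block: only CFG edges are modeled.
struct CfgBlock {
    std::string name;
    std::vector<std::string> preds;  // blocks whose terminator targets this one
    std::vector<std::string> succs;  // targets of this block's terminator
};

using CfgFunction = std::list<CfgBlock>;  // front() is the entry block

CfgBlock *findBlock(CfgFunction &F, const std::string &Name) {
    for (CfgBlock &BB : F)
        if (BB.name == Name) return &BB;
    return nullptr;
}

void addEdge(CfgFunction &F, const std::string &From, const std::string &To) {
    CfgBlock *A = findBlock(F, From);
    CfgBlock *B = findBlock(F, To);
    assert(A && B && "both blocks must exist");
    A->succs.push_back(To);
    B->preds.push_back(From);
}

// Tells BB that Pred no longer branches to it.
void removePredecessor(CfgBlock &BB, const std::string &Pred) {
    BB.preds.erase(std::remove(BB.preds.begin(), BB.preds.end(), Pred),
                   BB.preds.end());
}

// Deletes non-entry blocks without predecessors; returns true if F changed.
bool removeDeadBlocks(CfgFunction &F) {
    bool Changed = false;
    bool Progress = true;
    while (Progress) {
        Progress = false;
        for (auto It = F.begin(); It != F.end();) {
            auto Cur = It++;  // early increment: advance before erasing Cur
            if (Cur == F.begin() || !Cur->preds.empty())
                continue;     // keep the entry block and reachable-looking blocks
            for (const std::string &S : Cur->succs)
                if (CfgBlock *Succ = findBlock(F, S))
                    removePredecessor(*Succ, Cur->name);
            F.erase(Cur);     // std::list::erase only invalidates Cur
            Changed = Progress = true;
        }
    }
    return Changed;
}

// entry -> exit, plus an orphaned chain dead1 -> dead2 -> exit.
CfgFunction makeExample() {
    CfgFunction F;
    for (const char *Name : {"entry", "dead2", "dead1", "exit"}) {
        CfgBlock BB;
        BB.name = Name;
        F.push_back(BB);
    }
    addEdge(F, "entry", "exit");
    addEdge(F, "dead1", "dead2");
    addEdge(F, "dead2", "exit");
    return F;
}
```

Because a block counts as dead only when it has no predecessors, this sketch does not remove unreachable cycles; a reachability walk from the entry block would be needed for that.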
Target-specific Pass
- middle-end passes minimize # of instructions and simplify control flow
- but different targets have different optimization needs, which target-independent IR doesn’t capture
- so transformations are allowed to query target info (e.g. target-defined TargetTransformInfo/TTI)
- use TTI hooks to get target-specific info, e.g. instruction or type costs
- can’t model everything with IR & IR passes, e.g. target-specific instruction features
- use codegen passes (aka backend passes) to do target-specific optimization
  - they run after instruction selection (isel)
  - they run on MIR, which sits between IR and assembly
  - MIR has target-specific opcodes, virtual registers (e.g. %0, %1) and physical registers (e.g. $eflags)
  - MIR is not guaranteed to be in SSA form
- pass order matters in codegen
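The idea of a target-independent transformation consulting target hooks can be illustrated with a contrived sketch. ToyTargetInfo, SlowMulTarget, getCost and lowerMulByConst are all hypothetical and are not the TargetTransformInfo interface; the cost numbers are invented. The decision shown (multiply versus shift) is chosen only because it is easy to follow, not because LLVM makes this particular choice via TTI.

```cpp
#include <cassert>

enum class Op { Mul, Shl };

// Hypothetical cost interface, loosely modeled on the idea of TTI hooks.
struct ToyTargetInfo {
    virtual ~ToyTargetInfo() = default;
    // Default cost model: every operation costs 1.
    virtual unsigned getCost(Op) const { return 1; }
};

// A made-up target on which multiplies are expensive.
struct SlowMulTarget : ToyTargetInfo {
    unsigned getCost(Op O) const override { return O == Op::Mul ? 4 : 1; }
};

bool isPowerOfTwo(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

unsigned log2Exact(unsigned V) {
    assert(isPowerOfTwo(V) && "caller must check for a power of two");
    unsigned Shift = 0;
    while (V > 1) { V >>= 1; ++Shift; }
    return Shift;
}

// The chosen operation and its constant operand.
struct Rewrite {
    Op op;
    unsigned operand;
};

// Decides how to compute x * C: use a shift only when C is a power of two
// and the target reports the shift as strictly cheaper.
Rewrite lowerMulByConst(unsigned C, const ToyTargetInfo &TTI) {
    if (isPowerOfTwo(C) && TTI.getCost(Op::Shl) < TTI.getCost(Op::Mul))
        return {Op::Shl, log2Exact(C)};
    return {Op::Mul, C};
}
```

The transformation itself stays target-independent; only the answers returned by the cost hook differ between the default model and SlowMulTarget.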
test codegen pass: