Metadata
Abstract
http://llvm.org/devmtg/2019-10/
An overview of Clang - Sven van Haastregt, Anastasia Stulova
Slides: Coming soon
This tutorial will give an overview of Clang. We will cover the distinction between the Clang compiler driver and the Clang language frontend, with an emphasis on the latter. We will examine the different Clang components that a C program goes through when being compiled, i.e., lexing, parsing, semantic analysis, and LLVM IR generation. This includes some of the Clang Abstract Syntax Tree (AST), Type, and the Diagnostics infrastructure. We will conclude by explaining the various ways in which Clang is tested.
The tutorial is aimed at newcomers who have a basic understanding of compiler concepts and wish to learn about the architecture of Clang or start contributing to Clang.
Videos Filmed & Edited by Bash Films: http://www.BashFilms.com
Tags and Collections
- Keywords: 05 Finished; IRFuzzer; LLVM; clang
Notes
Introduction
- clang is both a compiler driver and a C language frontend
- compiler driver: drives the compiler invocation (preprocess, compile, link)
- C language frontend (aka CFE = C frontend, cc1, clang_cc1): doesn’t do any optimization by itself; it only converts C-like source to LLVM IR and calls into other parts of LLVM to generate machine code
Compiler driver phases
- C file + flags
- preprocessor
- frontend
- middle end (optimization on IR)
- backend (codegen/produce assembly)
- assembler (produce object files)
- linker (link object files)
- print the phases the driver will go through:
clang -ccc-print-phases file.c
- print the commands the driver would run for those phases:
clang -### file.c
(“toilet brush option”) - dry run: prints all commands that would be run, without running them
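To try the two driver commands above, any small C file will do. The following is a hypothetical file.c (the names SQUARE and square_plus_one are made up for illustration); it uses a macro so that the preprocessor phase has some visible work to do.

```c
/* file.c: hypothetical input for experimenting with the driver. */

/* Expanded by the preprocessor before the frontend sees the function body. */
#define SQUARE(x) ((x) * (x))

/* Returns n*n + 1; small enough that each phase's output stays readable. */
int square_plus_one(int n) {
    return SQUARE(n) + 1;
}
```

Running clang -E file.c should print the source after preprocessing, with the macro expanded, and clang -### -c file.c should list the commands for the phases up to the assembler without executing them.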
Preprocessor & Frontend
- Besides driving the compilation process, clang is mainly responsible for the frontend (converting source code to LLVM IR).
Preprocessor + frontend pipeline (intermediate products in parentheses):
- preprocessor & lexer
- (tokens)
- parser
- sema: checks semantics
- (AST)
- codegen
- (LLVM IR)
All of these parts need:
- TargetInfo
- Diagnostics subsystem (error checking)
- LangOptions
- SourceMgr
Lexer
- converts source to tokens
- performance critical
- preprocessing is included
- takes shortcuts (e.g. skips #if 0 blocks)
- supports tentative parsing (the parser’s ability to parse in multiple ways / backtrack until it successfully parses the current bit of code)
- inspect tokens:
clang -c -Xclang -dump-tokens file.c
- tokens are consumed by the parser
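As a rough illustration of "source to tokens", here is a toy tokenizer. It is only a sketch and is not how Clang's Lexer is implemented: the names TokKind, Token, next_token and count_tokens are invented for this example, and it ignores keywords, comments, multi-character operators and preprocessing entirely.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy token kinds; Clang's real token set is far larger. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start; /* first character of the token in the source buffer */
    size_t len;        /* number of characters in the token */
} Token;

/* Scans one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    Token t;

    while (isspace((unsigned char)*p)) p++;   /* skip leading whitespace */
    t.kind = TOK_EOF;
    t.start = p;
    t.len = 0;
    if (*p == '\0') { *cursor = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        t.kind = TOK_PUNCT;                   /* single-character punctuation */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}

/* Counts the tokens in src, not including the end-of-input marker. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (next_token(&src).kind != TOK_EOF) n++;
    return n;
}
```

For the input "int x = 42;" this toy sees five tokens: int, x, =, 42 and ;. The token stream can then be handed to a parser one token at a time, which is the role the notes describe for the real lexer.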
Parser
- recursive-descent parser
- tentative parsing: look ahead, and backtrack if parsing fails
- tries to recover from parse errors and continue parsing the rest of the program (and tries to suggest fixes for the program)
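The recursive-descent idea is that each grammar rule becomes one function, and the functions call each other the way the rules refer to each other. The sketch below applies this to a tiny arithmetic grammar rather than to C. All names (Parser, parse_expr, parse_term, parse_factor, eval) are hypothetical, and it does none of the tentative parsing or error recovery mentioned above: it simply records the first failure.

```c
#include <ctype.h>

/* Grammar handled by this sketch:
 *   expr   := term   ('+' term)*
 *   term   := factor ('*' factor)*
 *   factor := NUMBER | '(' expr ')'
 */
typedef struct {
    const char *pos; /* current position in the input */
    int failed;      /* set to 1 on the first syntax error */
} Parser;

static int parse_expr(Parser *ps);

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->pos)) ps->pos++;
}

/* factor := NUMBER | '(' expr ')' */
static int parse_factor(Parser *ps) {
    int value = 0;
    skip_spaces(ps);
    if (*ps->pos == '(') {
        ps->pos++;                       /* consume '(' */
        value = parse_expr(ps);          /* recurse into the nested rule */
        skip_spaces(ps);
        if (*ps->pos == ')') ps->pos++;  /* consume ')' */
        else ps->failed = 1;             /* missing ')' */
        return value;
    }
    if (!isdigit((unsigned char)*ps->pos)) {
        ps->failed = 1;                  /* expected a number or '(' */
        return 0;
    }
    while (isdigit((unsigned char)*ps->pos)) {
        value = value * 10 + (*ps->pos - '0');
        ps->pos++;
    }
    return value;
}

/* term := factor ('*' factor)* */
static int parse_term(Parser *ps) {
    int value = parse_factor(ps);
    skip_spaces(ps);
    while (*ps->pos == '*') {
        ps->pos++;
        value *= parse_factor(ps);
        skip_spaces(ps);
    }
    return value;
}

/* expr := term ('+' term)* */
static int parse_expr(Parser *ps) {
    int value = parse_term(ps);
    skip_spaces(ps);
    while (*ps->pos == '+') {
        ps->pos++;
        value += parse_term(ps);
        skip_spaces(ps);
    }
    return value;
}

/* Parses and evaluates src; stores 1 in *ok on success, 0 on a syntax error. */
int eval(const char *src, int *ok) {
    Parser ps;
    int value;
    ps.pos = src;
    ps.failed = 0;
    value = parse_expr(&ps);
    skip_spaces(&ps);
    if (*ps.pos != '\0') ps.failed = 1;  /* trailing input that no rule accepted */
    *ok = !ps.failed;
    return value;
}
```

Operator precedence falls out of the call structure: parse_expr calls parse_term, so multiplication binds tighter than addition without any explicit precedence table.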
Sema
- a successful parse only means the program is syntactically correct
- tightly coupled with the parser
- most diagnostics come from sema
Idealized procedure:
- check for valid semantics
- create AST if valid (or if problems aren’t severe), produce diagnostics if invalid
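The difference between "parses" and "is semantically valid" can be seen in a small, hypothetical input (the function name scaled_length and the macro SHOW_SEMA_ERROR are made up here). Either version of the return statement is syntactically a return followed by an expression, but only one of them type-checks.

```c
/* Hypothetical input: define SHOW_SEMA_ERROR to select the invalid version. */
int scaled_length(int n) {
#ifdef SHOW_SEMA_ERROR
    /* Syntactically fine, but multiplying a string literal by an int has no
       meaning in C, so semantic analysis should reject this line. */
    return "text" * n;
#else
    /* Well-typed: int * int yields int. */
    return 2 * n;
#endif
}
```

Compiling with clang -fsyntax-only -DSHOW_SEMA_ERROR file.c should report an error about the operands of the multiplication, while the same command without the define should succeed silently.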
Diagnostics Subsystem
- communicates problems in a program
- a diagnostic consists of:
  - severity: note, warning, error
  - source location: file, line, col
  - message: e.g. “unknown type name 'i'”
- defined in TableGen (most in DiagnosticSemaKinds.td)
- all message strings are found in TableGen, never in C++ code
- diagnostic messages are emitted through Diag()
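To make the three components of a diagnostic concrete, here is a toy model in C. It is not Clang's API: ToyDiagnostic, Severity and format_diagnostic are invented names, and the only thing borrowed from Clang is the familiar file:line:col: severity: message layout of its command-line output.

```c
#include <stddef.h>
#include <stdio.h>

/* The three severities listed in the notes. */
typedef enum { SEV_NOTE, SEV_WARNING, SEV_ERROR } Severity;

/* Toy diagnostic: severity + source location + message. */
typedef struct {
    Severity severity;
    const char *file;
    unsigned line, col;
    const char *message;
} ToyDiagnostic;

static const char *severity_name(Severity s) {
    switch (s) {
    case SEV_NOTE:    return "note";
    case SEV_WARNING: return "warning";
    case SEV_ERROR:   return "error";
    }
    return "unknown";
}

/* Writes "file:line:col: severity: message" into buf.
   Returns what snprintf returns: the length the full text needs. */
int format_diagnostic(char *buf, size_t size, const ToyDiagnostic *d) {
    return snprintf(buf, size, "%s:%u:%u: %s: %s",
                    d->file, d->line, d->col,
                    severity_name(d->severity), d->message);
}
```

Filling in an error at file.c, line 3, column 5 with the message unknown type name 'i' yields the text file.c:3:5: error: unknown type name 'i'. In Clang itself the message text would live in a .td file rather than in the code that reports the problem.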
AST
- faithful representation of code
- AST is mostly immutable
- dump the AST:
clang -c -Xclang -ast-dump file.c
AST Nodes
- Type
  - BuiltinType (e.g. int)
  - QualType (e.g. const int)
  - PointerType (e.g. int*)
  - ArrayType (e.g. int[])
- ValueStmt: a Stmt that may have a value and a Type
- Stmt
  - e.g. IfStmt, ReturnStmt, DeclStmt
- Expr: a ValueStmt
  - e.g. IntegerLiteral, BinaryOperator, DeclRefExpr (e.g. var)
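The node kinds above are easier to recognize against a concrete input. The following hypothetical function (clamp_to_limit is a made-up name) is annotated with the node kinds from these notes that its statements and expressions correspond to; it can be fed to the -ast-dump command shown earlier.

```c
/* Hypothetical input for inspecting AST nodes. */
int clamp_to_limit(int value) {
    int limit = 10;        /* DeclStmt; the initializer 10 is an IntegerLiteral */
    if (value > limit) {   /* IfStmt; the condition is a BinaryOperator */
        return limit;      /* ReturnStmt; 'limit' is a DeclRefExpr */
    }
    return value + 1;      /* BinaryOperator over a DeclRefExpr and an IntegerLiteral */
}
```

The actual dump will likely contain more nodes than these notes list, for example CompoundStmt for the braces and ImplicitCastExpr around variable reads, which is part of what "faithful representation" means in practice.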
Frontend CodeGen
- not the same thing as LLVM codegen (which produces machine code); frontend codegen produces LLVM IR
- uses ASTVisitor, IRBuilder, TargetInfo
- CodeGenModule
  - keeps global state (e.g. type cache)
  - emits global & shared entities
- CodeGenFunction
  - stores per-function state
  - generates LLVM IR for function body
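A small input with one global and one function shows the split between the two classes. This is a hypothetical example (call_count and next_id are made-up names), and the comments only restate the division of labour described in the notes; they do not trace Clang's actual code paths.

```c
/* Global variable: a module-level entity, the kind of thing the notes
   assign to CodeGenModule. */
int call_count = 0;

/* Function body: its statements and locals are per-function state,
   the kind of thing the notes assign to CodeGenFunction. */
int next_id(int base) {
    call_count = call_count + 1;
    return base + call_count;
}
```

Running clang -S -emit-llvm file.c -o - should print the generated LLVM IR, in which a global named @call_count and a function definition for @next_id can be expected to appear.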
Working on Clang codebase
Build clang (CMake option -DLLVM_ENABLE_PROJECTS=clang):
- builds clang-tblgen and uses it to produce .inc files from .td files
- then builds the rest of clang
TableGen
- TableGen (.td) code is interpreted to generate C++ source
- Attr.td: attributes
- Diagnostic*Kinds.td: diagnostics
- *Options.td: command-line options
- arm_neon.td, OpenCLBuiltins.td: builtin functions
Testing
make check-clang
- clang/unittests: unit tests
- clang/test: contains small C/C++ programs for llvm-lit to test various clang functionalities (source input edge cases, faithful AST, faithful LLVM IR, valid diagnostics)
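A test under clang/test is typically an ordinary source file whose comments tell llvm-lit what to run and what to expect. The sketch below is a hypothetical test, not one taken from the Clang tree: %clang_cc1 and %s are lit substitutions for the frontend invocation and the test file itself, and -verify asks the frontend to compare the diagnostics it emits against the expected-* comments in the file.

```c
// RUN: %clang_cc1 -fsyntax-only -verify %s
// expected-no-diagnostics

// Hypothetical regression test: this input must compile without any
// warnings or errors, otherwise -verify makes the RUN line fail.
int add(int a, int b) {
    return a + b;
}
```

A test for an expected problem would instead place a comment such as expected-error or expected-warning, with the message text in double braces, on the offending line.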