Metadata

  • Type: VideoRecording
  • Title: An overview of Clang
  • Year: 2019

This tutorial will give an overview of Clang. We will cover the distinction between the Clang compiler driver and the Clang language frontend, with an emphasis on the latter. We will examine the different Clang components that a C program goes through when being compiled, i.e., lexing, parsing, semantic analysis, and LLVM IR generation. This includes some of the Clang Abstract Syntax Tree (AST), Type, and the Diagnostics infrastructure. We will conclude by explaining the various ways in which Clang is tested.

The tutorial is aimed at newcomers who have a basic understanding of compiler concepts and wish to learn about the architecture of Clang or start contributing to Clang.

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com

Notes

See: LLVM, clang

Introduction

  • clang is both a compiler driver and the C-language frontend
  • as a driver, it drives the compiler invocation (preprocess, compile, assemble, link)
  • as the C-language frontend (aka CFE = C frontend, cc1, %clang_cc1 in tests), it does no optimization by itself: it only converts C-like source to LLVM-IR and calls into other parts of LLVM to optimize it and generate machine code

Compiler driver phases

  • input: C file + flags
  • preprocessor
  • frontend
  • middle end (optimization on IR)
  • backend (codegen: produce assembly)
  • assembler (produce object files)
  • linker (link object files)

  • print the phases: clang -ccc-print-phases file.c
  • clang -### file.c (the “toilet brush option”)
    • dry run: prints all the commands that would be run, without running them
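A toy C++ model of the phase list the driver plans for one C input. The enum and function names are hypothetical (this is not clang's Driver API); the labels are meant to mirror the ones -ccc-print-phases prints. In practice, with the integrated assembler, clang -c runs preprocessing, compiling, and assembling inside a single clang -cc1 invocation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of driver phases for a single C input.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

// Returns the phases the driver would run. `compileOnly` models `-c`:
// stop after producing an object file and skip the link step.
std::vector<Phase> plannedPhases(bool compileOnly) {
  std::vector<Phase> phases = {Phase::Preprocess, Phase::Compile,
                               Phase::Backend, Phase::Assemble};
  if (!compileOnly) phases.push_back(Phase::Link);
  return phases;
}

// Human-readable label for each phase.
std::string phaseName(Phase p) {
  switch (p) {
    case Phase::Preprocess: return "preprocessor";
    case Phase::Compile:    return "compiler";   // frontend: source -> IR
    case Phase::Backend:    return "backend";    // IR -> assembly
    case Phase::Assemble:   return "assembler";  // assembly -> object file
    case Phase::Link:       return "linker";     // object files -> executable
  }
  return "unknown";
}
```

With -c the list ends at the assembler; without it, the linker is appended.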

Preprocessor & Frontend

  • Apart from driving the compilation process, clang is mainly responsible for the frontend (converting source code to LLVM-IR).

Preprocessor + frontend (items in parentheses are the data passed between stages):

  • preprocessor & lexer
  • (tokens)
  • parser
  • sema: checks semantics
  • (AST)
  • codegen
  • (LLVM-IR)

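All of these parts need:

  • TargetInfo: properties of the target (e.g. the sizes of int and long)
  • Diagnostics subsystem: reports errors and warnings
  • LangOptions: the language dialect and options in effect (e.g. C99 vs C11)
  • SourceManager: owns the source buffers and maps source locations to file, line, and column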
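The SourceManager's job of turning a compact location into line and column can be sketched with a table of line-start offsets plus a binary search. This is a hypothetical single-file toy: a location here is just a byte offset into one buffer, whereas Clang's SourceLocation is a compact integer encoding that spans every loaded file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-buffer source manager: a "location" is a byte offset.
class ToySourceManager {
public:
  explicit ToySourceManager(std::string buffer) : buffer_(std::move(buffer)) {
    lineStarts_.push_back(0);  // line 1 starts at offset 0
    for (std::size_t i = 0; i < buffer_.size(); ++i)
      if (buffer_[i] == '\n') lineStarts_.push_back(i + 1);
  }

  // Maps a byte offset to a 1-based (line, column) pair via binary search
  // over the recorded line-start offsets.
  std::pair<unsigned, unsigned> lineCol(std::size_t offset) const {
    auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - lineStarts_.begin());
    std::size_t col = offset - lineStarts_[line - 1] + 1;
    return {static_cast<unsigned>(line), static_cast<unsigned>(col)};
  }

private:
  std::string buffer_;
  std::vector<std::size_t> lineStarts_;
};
```

For the buffer "int x;\nfoo y;\n", offset 11 (the y) maps to line 2, column 5, which is what a diagnostic pointing at y would print.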

Lexer

  • turns source text into tokens
  • performance critical
  • the preprocessor is integrated with the lexer (no separate preprocessing pass)
  • takes shortcuts (e.g. quickly skips the contents of #if 0 blocks)
  • supports tentative parsing: tokens can be cached and replayed, so the parser can try to parse a construct in more than one way and backtrack until one succeeds
  • inspect tokens: clang -c -Xclang -dump-tokens file.c
  • tokens are consumed by the parser
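A minimal hand-written lexer in the same spirit: source text in, flat token stream out. The token kinds loosely follow names that -dump-tokens prints (identifier, numeric_constant), but everything here is a hypothetical simplification: no preprocessor, no multi-character punctuators such as ==, and only a few keywords.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy token kinds; Clang's real set of token kinds is much larger.
enum class TokKind { Keyword, Identifier, NumericConstant, Punctuator, Eof };

struct Token {
  TokKind kind;
  std::string text;
};

// Turns source text into tokens, skipping whitespace.
std::vector<Token> lex(const std::string &src) {
  std::vector<Token> toks;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) { ++i; continue; }
    std::size_t start = i;
    if (std::isalpha(c) || c == '_') {
      // identifier or keyword: [A-Za-z_][A-Za-z0-9_]*
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
      std::string word = src.substr(start, i - start);
      bool isKeyword = word == "int" || word == "return" || word == "if";
      toks.push_back({isKeyword ? TokKind::Keyword : TokKind::Identifier, word});
    } else if (std::isdigit(c)) {
      // decimal integer literal
      while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
      toks.push_back({TokKind::NumericConstant, src.substr(start, i - start)});
    } else {
      // any other character becomes a single-character punctuator
      toks.push_back({TokKind::Punctuator, std::string(1, src[i])});
      ++i;
    }
  }
  toks.push_back({TokKind::Eof, ""});  // the parser stops at end-of-file
  return toks;
}
```

Lexing "int x = 42;" yields keyword int, identifier x, punctuator =, numeric constant 42, punctuator ;, then end-of-file.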

Parser

  • hand-written recursive descent
  • tentative parsing: look ahead, and backtrack if the attempted parse fails
  • recovers from parse errors and keeps parsing the rest of the program, often suggesting hints (fix-its) for fixing the code
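A toy recursive-descent parser showing both tentative parsing (save the position, try a rule, backtrack on failure) and simple error recovery (skip past the next ;). It works on pre-split string tokens, and the grammar is hypothetical: it treats IDENT IDENT ; as a declaration, whereas Clang's parser consults Sema to decide whether an identifier names a type.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy recursive-descent parser over pre-split tokens; one method per rule:
//   stmt    := decl | expr ';'
//   decl    := IDENT IDENT ';'        (first IDENT taken as a type name)
//   expr    := primary ('+' primary)*
//   primary := NUMBER | IDENT | '(' expr ')'
class ToyParser {
public:
  explicit ToyParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

  // Returns one label per statement: "decl", "expr", or "error".
  std::vector<std::string> parseProgram() {
    std::vector<std::string> results;
    while (pos_ < toks_.size()) results.push_back(parseStmt());
    return results;
  }

private:
  std::vector<std::string> toks_;
  std::size_t pos_ = 0;

  std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : ""; }
  bool consume(const std::string &t) {
    if (peek() != t) return false;
    ++pos_;
    return true;
  }
  static bool isIdent(const std::string &t) {
    return !t.empty() && (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
  }
  static bool isNumber(const std::string &t) {
    return !t.empty() && std::isdigit(static_cast<unsigned char>(t[0]));
  }

  std::string parseStmt() {
    if (tryParseDecl()) return "decl";
    if (parseExpr() && consume(";")) return "expr";
    // Error recovery: skip to just past the next ';' and keep going.
    while (pos_ < toks_.size() && !consume(";")) ++pos_;
    return "error";
  }

  // Tentative parsing: remember the position, try the rule,
  // and backtrack to the saved position if it does not match.
  bool tryParseDecl() {
    std::size_t saved = pos_;
    if (isIdent(peek())) {
      ++pos_;
      if (isIdent(peek())) {
        ++pos_;
        if (consume(";")) return true;
      }
    }
    pos_ = saved;  // backtrack
    return false;
  }

  bool parseExpr() {
    if (!parsePrimary()) return false;
    while (consume("+"))
      if (!parsePrimary()) return false;
    return true;
  }

  bool parsePrimary() {
    if (isNumber(peek()) || isIdent(peek())) { ++pos_; return true; }
    if (consume("(")) return parseExpr() && consume(")");
    return false;
  }
};
```

For x + ( 1 + 2 ) ; the parser first tries the declaration rule, consumes x, fails on +, backtracks to x, and then succeeds with the expression rule. A stray + ; is reported as an error and parsing resumes after the semicolon.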

Sema

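  • the parser only checks that code is syntactically correct; Sema checks that it also makes sense semantically (e.g. names are declared, operand types are compatible)
  • tightly coupled with the parser: the parser calls into Sema as it recognizes each construct
  • most diagnostic errors come from Sema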

Idealized procedure:

  • check that the semantics are valid
  • create the AST node if valid (or if the problems aren’t severe); produce diagnostics for any problems found
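The idealized procedure applied to one construct, a binary +, as a hypothetical Sema-like check: compute the result type and build a node when the operands are valid, otherwise record a diagnostic and build nothing. The type rules are a small subset of C's (int, double, int *), and the message text only imitates Clang's wording.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class Ty { Int, Double, IntPtr };

std::string spell(Ty t) {
  switch (t) {
    case Ty::Int:    return "int";
    case Ty::Double: return "double";
    case Ty::IntPtr: return "int *";
  }
  return "?";
}

// Stand-in for an AST node produced by a successful check.
struct AddExpr {
  Ty type;  // result type computed during semantic analysis
};

// Hypothetical Sema-like check for `lhs + rhs` (C rules, heavily simplified).
std::optional<AddExpr> checkAdd(Ty lhs, Ty rhs, std::vector<std::string> &diags) {
  bool lhsPtr = lhs == Ty::IntPtr, rhsPtr = rhs == Ty::IntPtr;
  if (lhsPtr || rhsPtr) {
    Ty other = lhsPtr ? rhs : lhs;
    if (other == Ty::Int) return AddExpr{Ty::IntPtr};  // pointer arithmetic
    // pointer + pointer or pointer + double is invalid: diagnose instead
    diags.push_back("error: invalid operands to binary expression ('" +
                    spell(lhs) + "' and '" + spell(rhs) + "')");
    return std::nullopt;  // severe problem: no node is built
  }
  // Usual arithmetic conversions, reduced to: double wins over int.
  return AddExpr{(lhs == Ty::Double || rhs == Ty::Double) ? Ty::Double : Ty::Int};
}
```

Requires C++17 for std::optional.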

Diagnostics Subsystem

  • communicates problems in a program
  • each diagnostic has:
    • severity: note, warning, error
    • source location: file, line, column
    • message: e.g. “unknown type name ‘i’”
  • defined in TableGen (most in DiagnosticSemaKinds.td)
    • all message strings live in the TableGen files, never in C++ code
  • diagnostics are emitted through Diag()
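A toy diagnostics engine: a table of (severity, message format) entries stands in for what TableGen generates from the .td files, and rendering produces the familiar file:line:col: severity: message form. The IDs, the table, and renderDiag are hypothetical; Clang's Diag() returns a builder that arguments are streamed into with <<, which this sketch reduces to a single %0 argument.

```cpp
#include <cassert>
#include <string>

enum class Severity { Note, Warning, Error };

struct SourceLoc {
  std::string file;
  unsigned line;
  unsigned col;
};

// Hypothetical diagnostic IDs. In Clang, IDs and message strings are
// generated from the .td files by TableGen rather than written by hand.
enum ToyDiagID { kErrUnknownTypeName, kWarnUnusedVariable };

struct DiagInfo {
  Severity severity;
  const char *format;  // "%0" marks where the argument goes
};

// Indexed by ToyDiagID; stands in for the generated table.
const DiagInfo kDiagTable[] = {
    {Severity::Error, "unknown type name '%0'"},
    {Severity::Warning, "unused variable '%0'"},
};

const char *severityName(Severity s) {
  switch (s) {
    case Severity::Note:    return "note";
    case Severity::Warning: return "warning";
    case Severity::Error:   return "error";
  }
  return "?";
}

// Formats a diagnostic as "file:line:col: severity: message".
std::string renderDiag(const SourceLoc &loc, ToyDiagID id, const std::string &arg) {
  const DiagInfo &info = kDiagTable[id];
  std::string msg = info.format;
  std::string::size_type p = msg.find("%0");
  if (p != std::string::npos) msg.replace(p, 2, arg);  // substitute the argument
  return loc.file + ":" + std::to_string(loc.line) + ":" +
         std::to_string(loc.col) + ": " + severityName(info.severity) + ": " + msg;
}
```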

AST

  • a faithful representation of the source code (e.g. parentheses are kept as ParenExpr nodes)
  • the AST is mostly immutable once built
  • dump the AST: clang -c -Xclang -ast-dump file.c

AST Nodes

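  • Type
    • BuiltinType (e.g. int)
    • PointerType (e.g. int*)
    • ArrayType (e.g. int[])
    • QualType: not a Type subclass but a Type plus qualifiers (e.g. const int)
  • Stmt
    • e.g. IfStmt, ReturnStmt, DeclStmt
  • ValueStmt: a Stmt that can produce a value
  • Expr: a ValueStmt; every Expr has a type (a QualType)
    • e.g. IntegerLiteral, BinaryOperator, DeclRefExpr (a reference to a declaration, e.g. a variable)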
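A toy mirror of the Stmt → ValueStmt → Expr layering with a few leaf nodes and an -ast-dump-like one-line dump. The class names are borrowed from Clang for readability, but the members, the dump format, and the simplified typing rule are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Base of all statements; dump() gives a one-line summary of the subtree.
struct Stmt {
  virtual ~Stmt() = default;
  virtual std::string dump() const = 0;
};

// A statement that can produce a value.
struct ValueStmt : Stmt {};

// Every expression carries a type (a QualType in Clang; a string here).
struct Expr : ValueStmt {
  std::string type;
  explicit Expr(std::string t) : type(std::move(t)) {}
};

struct IntegerLiteral : Expr {
  long value;
  explicit IntegerLiteral(long v) : Expr("int"), value(v) {}
  std::string dump() const override {
    return "IntegerLiteral '" + type + "' " + std::to_string(value);
  }
};

// A reference to a declared entity, e.g. a variable.
struct DeclRefExpr : Expr {
  std::string name;
  DeclRefExpr(std::string n, std::string t) : Expr(std::move(t)), name(std::move(n)) {}
  std::string dump() const override { return "DeclRefExpr '" + type + "' " + name; }
};

struct BinaryOperator : Expr {
  char op;
  std::unique_ptr<Expr> lhs, rhs;
  // Toy typing rule: the result has the LHS type.
  BinaryOperator(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : Expr(l->type), op(o), lhs(std::move(l)), rhs(std::move(r)) {}
  std::string dump() const override {
    std::string s = "BinaryOperator '" + type + "' '";
    s += op;
    s += "' (" + lhs->dump() + ", " + rhs->dump() + ")";
    return s;
  }
};

// A statement that is not an expression.
struct ReturnStmt : Stmt {
  std::unique_ptr<Expr> value;
  explicit ReturnStmt(std::unique_ptr<Expr> v) : value(std::move(v)) {}
  std::string dump() const override { return "ReturnStmt (" + value->dump() + ")"; }
};
```

Building return x + 1; by hand and dumping it gives ReturnStmt (BinaryOperator 'int' '+' (DeclRefExpr 'int' x, IntegerLiteral 'int' 1)).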

Frontend CodeGen

  • not the same thing as LLVM CodeGen (which lowers LLVM-IR to machine code)
  • uses AST visitors, LLVM’s IRBuilder, and TargetInfo
  • CodeGenModule
    • keeps module-wide (cross-function) state (e.g. type cache)
    • emits global and shared entities (e.g. global variables)
  • CodeGenFunction
    • stores per-function state
    • generates LLVM-IR for a function body
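A toy of the frontend CodeGen idea: walk an expression tree and emit textual LLVM-IR, with a hypothetical FunctionEmitter holding the per-function state (output buffer and value counter), the role CodeGenFunction plays in Clang. Real Clang builds IR objects through IRBuilder rather than strings, and its -O0 output is noisier (allocas, loads, stores, and add nsw for signed int).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Tiny expression tree: a parameter reference, an int constant, or an addition.
struct Node {
  enum Kind { Param, Const, Add } kind = Param;
  std::string name;                // Param: parameter name
  int value = 0;                   // Const: the constant
  std::unique_ptr<Node> lhs, rhs;  // Add: operands
};

std::unique_ptr<Node> param(std::string n) {
  auto p = std::make_unique<Node>();
  p->kind = Node::Param;
  p->name = std::move(n);
  return p;
}
std::unique_ptr<Node> constant(int v) {
  auto p = std::make_unique<Node>();
  p->kind = Node::Const;
  p->value = v;
  return p;
}
std::unique_ptr<Node> add(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
  auto p = std::make_unique<Node>();
  p->kind = Node::Add;
  p->lhs = std::move(l);
  p->rhs = std::move(r);
  return p;
}

// Hypothetical per-function emitter: output buffer + SSA value counter.
class FunctionEmitter {
public:
  // Emits instructions for `n` and returns the operand naming its value.
  std::string emitExpr(const Node &n) {
    switch (n.kind) {
      case Node::Param: return "%" + n.name;
      case Node::Const: return std::to_string(n.value);
      case Node::Add: {
        std::string l = emitExpr(*n.lhs), r = emitExpr(*n.rhs);  // operands first
        std::string res = "%" + std::to_string(nextId_++);
        body_ += "  " + res + " = add i32 " + l + ", " + r + "\n";
        return res;
      }
    }
    return "";
  }

  // Emits `int fn(int param) { return <ret>; }` as textual IR.
  std::string emitFunction(const std::string &fn, const std::string &paramName,
                           const Node &ret) {
    body_.clear();
    nextId_ = 0;  // unnamed values restart at %0 in each function
    std::string v = emitExpr(ret);
    return "define i32 @" + fn + "(i32 %" + paramName + ") {\nentry:\n" + body_ +
           "  ret i32 " + v + "\n}\n";
  }

private:
  std::string body_;
  int nextId_ = 0;
};
```

For return x + 1; this emits define i32 @f(i32 %x) with a single %0 = add i32 %x, 1 followed by ret i32 %0.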

Working on the Clang codebase

Building clang (CMake option LLVM_ENABLE_PROJECTS=clang):

  • builds clang-tblgen first and uses it to generate .inc files from .td files
  • builds the rest of clang

TableGen

  • a domain-specific language: clang-tblgen reads .td files and generates C++ source (.inc files)
  • Attr.td: attributes
  • Diagnostic*Kinds.td: diagnostics
  • *Options.td: command-line options
  • arm_neon.td, OpenCLBuiltins.td: builtin functions
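Many generated .inc files are consumed with the X-macro pattern: the file is a list of macro invocations, one per record, and each includer defines the macro to pull out the fields it needs. A minimal sketch, with a hypothetical inline list macro standing in for a generated .inc file:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a generated .inc file: one macro invocation per record.
// (Hypothetical records; real ones would come from a .td file.)
#define TOY_DIAG_RECORDS(X)                             \
  X(err_toy_undeclared, "use of undeclared identifier") \
  X(warn_toy_unused, "unused variable")

// Expansion 1: an enum of diagnostic IDs.
enum ToyDiag {
#define X(id, msg) id,
  TOY_DIAG_RECORDS(X)
#undef X
  NUM_TOY_DIAGS
};

// Expansion 2: a message table in the same order as the enum.
const char *const kToyDiagMessages[] = {
#define X(id, msg) msg,
    TOY_DIAG_RECORDS(X)
#undef X
};
```

In Clang the list would instead come from #include-ing the generated file between such #define/#undef pairs; the inline TOY_DIAG_RECORDS macro only keeps this sketch self-contained. Because both expansions walk the same records, the enum and the table cannot drift out of sync.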

Testing

  • make check-clang (or ninja check-clang with the Ninja generator): runs the Clang tests
  • clang/unittests: C++ unit tests (Google Test)
  • clang/test: small C/C++ programs that llvm-lit runs to test clang functionality (source input edge cases, faithful AST, faithful LLVM-IR, correct diagnostics)
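Many clang/test CodeGen tests pipe %clang_cc1 -emit-llvm output into FileCheck, which checks that the CHECK: lines in the test file appear in the output in order. A hypothetical toy version of just that core idea, using plain substring matching (the real FileCheck also supports regexes, variables, CHECK-NEXT, CHECK-NOT, and more):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Toy FileCheck: every "CHECK: <text>" line in `checks` must appear in
// `output`, in order, each match starting after the previous one.
bool toyFileCheck(const std::string &checks, const std::string &output) {
  const std::string prefix = "CHECK: ";
  std::istringstream in(checks);
  std::string line;
  std::size_t from = 0;
  while (std::getline(in, line)) {
    std::size_t p = line.find(prefix);
    if (p == std::string::npos) continue;  // not a check line
    std::string pattern = line.substr(p + prefix.size());
    std::size_t hit = output.find(pattern, from);
    if (hit == std::string::npos) return false;  // missing or out of order
    from = hit + pattern.size();
  }
  return true;
}
```

The check lines can sit in // comments of the test source, as they do in lit tests; lines without the prefix are ignored.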