

Nevertheless, it’s important to note that LLVM is not intended to be a universal compiler IR.


It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are "universal IR's", allowing many source languages to be mapped to them).



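  • “I would like to know whether there are forms other than SSA [1], so that I can understand the advantages and disadvantages of using SSA as a compiler intermediate representation.”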
# [LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation](../../../Essays/IR/2004.LLVM-A_Compilation_Framework_for_Lifelong_Program_Analysis_Transformation.pdf) Reading Notes



Allowing lifelong reoptimization of the program gives architects the power to evolve processors and exposed interfaces in more flexible ways [2, 3], while allowing legacy applications to run well on new systems.



2: A. Chernoff, et al. FX!32: A profile-directed binary translator. IEEE Micro, 18(2):56–64, 1998.

3: K. Ebcioglu and E. R. Altman. DAISY: Dynamic compilation for 100% architectural compatibility. In ISCA, pages 26–37, 1997.

This paper presents LLVM — Low-Level Virtual Machine — a compiler framework that aims to make lifelong program analysis and transformation available for arbitrary software, and in a manner that is transparent to programmers. LLVM achieves this through two parts: (a) a code representation with several novel features that serves as a common representation for analysis, transformation, and code distribution...


Because of the differing goals and representations, LLVM is complementary to high-level virtual machines (e.g., SmallTalk [18], Self [43], JVM [4], Microsoft’s CLI [33], and others), and not an alternative to these systems. It differs from these in three key ways. First, LLVM has no notion of high-level constructs such as classes, inheritance, or exception-handling semantics, even when compiling source languages with these features. Second, LLVM does not specify a runtime system or particular object model: it is low-level enough that the runtime system for a particular language can be implemented in LLVM itself. Indeed, LLVM can be used to implement high-level virtual machines. Third, LLVM does not guarantee type safety, memory safety, or language interoperability any more than the assembly language for a physical processor does.

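I am curious about this: of the examples listed here I only recognize the JVM, but why is the JVM called a high-level virtual machine? Isn't Java compiled to bytecode as well? I would like to find time to look at the Java manual from that period. Did the 1997 version of the Java virtual machine perhaps not have bytecode yet? Or is bytecode really a high-level language? However, the earliest version I can find is the 2006 one, i.e. Java SE 6; see the Java Language and Virtual Machine Specifications page on Oracle's website.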


4: T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, 1997.

> We evaluate the effectiveness of the LLVM system with respect to three issues: (a) the size and effectiveness of the representation, including the ability to extract useful type information for C programs;


The detailed syntax and semantics of the representation are defined in the LLVM reference manual [5].



2.1 Overview of the LLVM Instruction Set

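ABORT: I would like to know whether there are forms other than SSA [1], so that I can understand the advantages and disadvantages of using SSA as a compiler intermediate representation.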

The entire LLVM instruction set consists of only 31 opcodes.


LLVM uses SSA form as its primary code representation, i.e., each virtual register is written in exactly one instruction, and each use of a register is dominated by its definition. Memory locations in LLVM are not in SSA form because many possible locations may be modified at a single store through a pointer, making it difficult to construct a reasonably compact, explicit SSA code representation for such locations.
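To make the "written in exactly one instruction" property concrete, here is a minimal sketch in Python. It is not LLVM code: the tuple encoding `(dest, op, args)`, the function names, and the `name.N` renaming scheme are all assumptions made for illustration. It only handles straight-line code, so it never needs the phi nodes that real SSA construction uses at control-flow joins.

```python
from collections import defaultdict


def is_single_assignment(instrs):
    """Return True if every destination register is written exactly once."""
    dests = [dest for dest, _, _ in instrs]
    return len(dests) == len(set(dests))


def rename_straight_line(instrs):
    """Rename a straight-line instruction list so each register is written once.

    Hypothetical helper: names never written in the list (e.g. function
    arguments) are treated as incoming values and left unchanged.
    """
    version = defaultdict(int)  # number of writes seen so far per source name
    current = {}                # source name -> latest SSA name
    out = []
    for dest, op, args in instrs:
        # Uses refer to the most recent definition of each operand.
        new_args = tuple(current.get(a, a) for a in args)
        version[dest] += 1
        ssa_dest = f"{dest}.{version[dest]}"
        current[dest] = ssa_dest
        out.append((ssa_dest, op, new_args))
    return out


# 'x' is written twice, so this list is not in SSA form.
code = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", "c")),
    ("y", "sub", ("x", "a")),
]
ssa = rename_straight_line(code)
for instr in ssa:
    print(instr)
```

After renaming, the second write to `x` becomes `x.2` and reads `x.1`, so every use has exactly one reaching definition, and that definition precedes it.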


SSA form provides a compact def-use graph that simplifies many dataflow optimizations and enables fast, flow-insensitive algorithms to achieve many of the benefits of flow-sensitive algorithms without expensive dataflow analysis. Non-loop transformations in SSA form are further simplified because they do not encounter anti- or output dependences on SSA registers. Non-memory transformations are also greatly simplified because (unrelated to SSA) registers cannot have aliases.
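One way to see why the def-use graph is compact: with a single definition per register, going from a use to its definition is a dictionary lookup, and the list of users of a register can be built in one pass. The sketch below uses that to remove dead instructions without any iterative dataflow analysis. The instruction encoding and function names are assumptions for illustration, and the sketch assumes every instruction is free of side effects.

```python
from collections import defaultdict


def build_def_use(instrs):
    """Map each register to the registers whose defining instructions use it."""
    users = defaultdict(list)
    for dest, _, args in instrs:
        for arg in args:
            users[arg].append(dest)
    return users


def eliminate_dead(instrs, live_out):
    """Keep only instructions that the registers in live_out depend on.

    Assumes SSA form (one definition per register) and no side effects.
    """
    defs = {dest: args for dest, _, args in instrs}
    needed = set()
    worklist = [reg for reg in live_out if reg in defs]
    while worklist:
        reg = worklist.pop()
        if reg in needed:
            continue
        needed.add(reg)
        # Follow each operand back to its unique definition, if it has one.
        worklist.extend(arg for arg in defs[reg] if arg in defs)
    return [instr for instr in instrs if instr[0] in needed]


ssa = [
    ("t1", "add", ("a", "b")),
    ("t2", "mul", ("t1", "c")),
    ("t3", "sub", ("a", "c")),   # result is never used
    ("t4", "add", ("t2", "a")),
]
users = build_def_use(ssa)
kept = eliminate_dead(ssa, live_out={"t4"})
print([dest for dest, _, _ in kept])  # ['t1', 't2', 't4']
```

The worklist never consults program order or control flow, which is the sense in which the algorithm is flow-insensitive.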


2.2 Language-independent Type Information, Cast, and GetElementPtr

Every SSA register and explicit memory object has an associated type, and all operations obey strict type rules.


This type information enables a broad class of high-level transformations on low-level code (for example, see Section 4.1.1).

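DONE: Read Section 4.1.1 to see how this kind of type system supports high-level transformations on a low-level language. What is the paper actually saying there?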

The LLVM type system includes source-language-independent primitive types with predefined sizes (void, bool, signed/unsigned integers from 8 to 64 bits, and single- and double-precision floating-point types). ... LLVM also includes (only) four derived types: pointers, arrays, structures, and functions. We believe that most high-level language data types are eventually represented using some combination of these four types in terms of their operational behavior. For example, C++ classes with inheritance are implemented using structures, functions, and arrays of function pointers, as described in Section 4.1.2.


Primitive types:

  • void
  • bool
  • signed/unsigned integers from 8 to 64 bits
  • single-/double-precision floating-point

Derived types:

  • pointers
  • arrays
  • structures
  • functions
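The claim that four derived types are enough can be tried out with a small model. The sketch below is an illustration in Python, not LLVM's actual type classes or syntax: the class names, the primitive names `int32` and `int8`, the printed notation, and the two-slot virtual table are all assumptions. It composes a C++-style class with virtual methods out of a structure, a function type, and an array of function pointers, as the quoted passage describes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class Pointer:
    pointee: object


@dataclass(frozen=True)
class Array:
    element: object
    length: int


@dataclass(frozen=True)
class Struct:
    fields: Tuple[object, ...]


@dataclass(frozen=True)
class Function:
    ret: object
    params: Tuple[object, ...]


def show(t):
    """Render a type in an ad hoc notation (not real LLVM syntax)."""
    if isinstance(t, Primitive):
        return t.name
    if isinstance(t, Pointer):
        return show(t.pointee) + "*"
    if isinstance(t, Array):
        return f"[{t.length} x {show(t.element)}]"
    if isinstance(t, Struct):
        return "{ " + ", ".join(show(f) for f in t.fields) + " }"
    if isinstance(t, Function):
        return f"{show(t.ret)} ({', '.join(show(p) for p in t.params)})"
    raise TypeError(f"unknown type node: {t!r}")


INT32 = Primitive("int32")  # hypothetical primitive names
INT8 = Primitive("int8")

# A virtual method taking an untyped 'this' pointer and returning an integer.
method = Function(INT32, (Pointer(INT8),))
# The virtual table: an array of pointers to such methods.
vtable = Array(Pointer(method), 2)
# The object layout: a pointer to the virtual table plus one data field.
cpp_class = Struct((Pointer(vtable), INT32))

print(show(cpp_class))  # { [2 x int32 (int8*)*]*, int32 }
```

Nothing in the model knows about classes or inheritance; the source-level concept exists only as a particular arrangement of the four derived types.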


DONE: Read Section 4.1.2 closely to understand how the types LLVM provides are combined into complex types. What is the paper actually saying there?

2.3 Explicit Memory Allocation and Unified Memory Model

In LLVM, all addressable objects (“lvalues”) are explicitly allocated.


2.4 Function Calls and Exception Handling