想找历史版本的LLVM,可以看这里http://releases.llvm.org/

2019.10.21看到一个好玩的,在这篇文章的第二页,Introduction部分,

Nevertheless, it’s important to note that LLVM is not intended to be a universal compiler IR.

然而在上面那个网站的Introduction的部分,

It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are "universal IR's", allowing many source languages to be mapped to them).

说法完全相反,哈哈哈哈哈哈哈哈哈哈!😅😂🤣

ABORT:

  • “想知道除了SSA1,还有没有其他形式,这样才能知道SSA为什么用于编译器中间表示的优势和劣势。”
2019.10.20
# [LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation](../../../Essays/IR/2004.LLVM-A_Compilation_Framework_for_Lifelong_Program_Analysis_Transformation.pdf)阅读笔记

ABSTRACT

1. INTRODUCTION

Allowing lifelong lifelong reoptimization of the program gives architects the power to evolve processors and exposed interfaces in more flexible ways 2 3, while allowing legacy applications to run well on new systems.

这个不就和二进制翻译的目标一致嘛,这样看来这个LLVM有点厉害了。看引用竟然就是两个有名的二进制翻译系统FX!32和DAISY嘛,有空可以看看这两篇文章。

2

A. Chernoff, et al. FX!32: A profile-directed binary translator. IEEE Micro, 18(2):56–64, 1998. 3: K. Ebcioglu and E. R. Altman. DAISY: Dynamic compilation for 100% architectural compatibility. In ISCA, pages 26–37, 1997.

This paper presents LLVM — Low-Level Virtual Machine — a compiler framework that aims to make lifelong program analysis and transformation available for arbitrary software, and in a manner that is transparent to programmers. LLVM achieves this through two parts: (a) a code representation with several novel features that serves as a common representation for analysis, transformation, and code distribution...

加粗部分便是我这次读这篇文章的重点关注点!


Because of the differing goals and representations, LLVM is complementary to high-level virtual machines (e.g., Small Talk [18], Self [43], JVM 4, Microsoft’s CLI [33], and others), and not an alternative to these systems. It differs from these in three key ways. First, LLVM has no notion of high-level constructs such as classes, inheritance, or exception-handling semantics, even when compiling source languages with these features. Second, LLVM does not specify a runtime system or particular object model: it is low-level enough that the runtime system for a particular language can be implemented in LLVM itself. Indeed, LLVM can be used to implement high-level virtual machines. Third, LLVM does not guarantee type safety, memory safety, or language interoperability any more than the assembly language for a physical processor does.

很好奇这里举的例子只认识JVM,但为什么称JVM为high-level virtual machines呢?java不也是bytecode嘛?好想有时间去了解了解当时的JAVA手册,不知道是不是1997年版本的java虚拟机还没有bytecode?或是bytecode确实是高级语言?不过最早只能看到2006年的版本的即java se6见oracle官网的Java Language and Virtual Machine Specifications页面。

4

T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, 1997.


2019.10.21
> We evaluate the effectiveness of the LLVM system with respect to three issues: (a) the size and effectiveness of the representation, including the ability to extract useful type information for C programs;

提到了本文将要讲中间表示的效率问题!


The detailed syntax and semantics of the representation are defined in the LLVM reference manual 5.

2. PROGRAM REPRESENTATION

详细的语法参考手册,

2.1 Overview of the LLVM Instruction Set

ABORT: 想知道除了SSA1,还有没有其他形式,这样才能知道SSA为什么用于编译器中间表示的优势和劣势。

The entire LLVM instruction set consists of only 31 opcodes.

仅仅只包含31操作码!!!真精简!


LLVM uses SSA form as its primary code representation, i.e., each virtual register is written in exactly one instruction, and each use of a register is dominated by its definition. Memory locations in LLVM are not in SSA form because many possible locations may be modified at a single store through a pointer, making it difficult to construct a reasonably compact, explicit SSA code representation for such locations.

这里提到了访存没有用SSA,这就很有意思了,能够在LLVM里比较SSA和非SSA的区别,有必要好好看一看这一部分!


SSA form provides a compact def-use graph that simplifies many dataflow optimizations and enables fast, flow-insensitive algorithms to achieve many of the benefits of flow-sensitive algorithms without expensive dataflow analysis. Non-loop transformations in SSA form are further simplified because they do not encounter anti- or output dependences on SSA registers. Non-memory transformations are also greatly simplified because (unrelated to SSA) registers cannot have aliases.

感觉这里每一句话都是一个研究方向,所以是不是有必要去查一下这些我目前还不懂的内容?(感觉有很多东西在目前看来是理所应当的吧)

2.2 Language-independent Type Information, Case, and GetElementPtr

Every SSA register and explicit memory object has an associated type, and all operations obey strict type rules.

在我看来就是将弱类型的编程语言(比如python,尽管那个时候还不支持python,但LLVM是有野心支持尽可能多的编程语言嘛;在比如说c里允许类型转换)尽量转变为强类型的编程语言。


This type information enables a broad class of high-level transformations on low-level code (for example, see Section 4.1.1).

DONE: 去看看4.1.1节,这样的类型系统如何支持在低层次语言上对高层次语言进行变换。文章里写的什么东西?


The LLVM type system includes source-language-indep endent primitive types with predefined sizes (void, bool, signed/unsigned integers from 8 to 64 bits, and single- and double-precision floating-point types). ... LLVM also includes (only) four derived types: pointers, arrays, structures, and functions. We believe that most high-level language data types are eventually represented using some combination of these four types in terms of their operational behavior. For example, C++ classes with inheritance are implemented using structures, functions, and arrays of function pointers, as described in Section 4.1.2.

4种基本类型,

  • void
  • bool
  • signed/unsigned integers from 8 to 64 bits
  • single-/double-precision floating-point

4种衍生类型,

  • pointers
  • arrays
  • structures
  • functions

LLVM的设计者认为其他复杂的类型皆可以又这些类型组成。

DONE: 详细阅读4.1.2节了解LLVM提供的类型如何组成复杂的类型。文章里的写的是什么东西?

2.3 Explicit Memory Allocation and Unified Memory Model

In LLVM, all addressable objects (“lvalues”) are explicitly allocated.

应该是左值的意思吧。左值全部都需要显式的分配空间。

2.4 Function Calls and Exception Handling