2019.composite_isa.venkat.hpca.17.md

2023.02.09

指令超集，每个核的指令集是子集。

从另一方面讲，每个核可以有不同的指令集拓展。

问题：软件碎片化？x86有很多拓展集，x86怎么解决的？fat binary或是损失性能？

Core	ISA
single
multi
heterogeneous	single
heterogeneous	multi
	composite

3. ISA Feature Set Derivation

Register Depth: 8, 16, 32, 64
- register spill (store)
- register refill (load)
Register Width: 32, 64
Opcode and Addressing mode complexity: RISC(m
Predication
- Predication的含义参考 https://en.wikipedia.org/wiki/Predication_(computer_architecture) 比如x86的CMOV，FCMOV。
Data-Parallel Execution

2022.04.16

4. Compiler And Runtime Strategy

应该和2012.process_migration.devuyst.asplos.83类似

Furthermore, we increase the width of the macro-op queue by 2 bytes to account for the extra prefixes. Since predication support and greater register depth in our superset ISA could potentially require wider micro-op ISA encoding, we increase the width of the micro-op cache and the micro-op queue by 2 bytes. Finally, for our microx86 implementations, we replace the complex 1:4 decoder with another simple 1:1 decoder and forgo the microsequencing ROM.

Figure 4的图很具有参考意义。

好奇uop cache内部构造，如何做到精确中断，类似reorder buffer的结构嘛？

2022.04.18

6. Methodology

micro-op fusion技术 [112].

Intel R 64 and IA-32 Architectures Optimization Reference Manual

使用了SimPoint简单搜索了一下，是用来加速仿真的。知乎：Simpoint 在 GEM5 里加速仿真, 1 minute = N day(s)

使用Synopsys Design Compiler测decoder面积和功耗
使用McPAT测其他流水层次的面积和功耗。

使用超算进行计算消耗49733 core hours。

多核的方法未详细说。

7 Result

7.B Feature Sensitivity Analysis

Figure 10减少某一维度的异构，对面积造成的影响。这些异构的影响的面积，挤占cache的面积。

xieby1's notes

思考

3. ISA Feature Set Derivation

4. Compiler And Runtime Strategy

5. Decoder Design

Instruction Encoding

6. Methodology

7 Result

7.B Feature Sensitivity Analysis