指令超集,每个核的指令集是子集。
从另一方面讲,每个核可以有不同的指令集拓展。
问题:软件碎片化?x86有很多拓展集,x86怎么解决的?fat binary或是损失性能?
思考
Core | ISA |
---|---|
single | |
multi | |
heterogeneous | single |
heterogeneous | multi |
composite |
3. ISA Feature Set Derivation
- Register Depth: 8, 16, 32, 64
- register spill (store)
- register refill (load)
- Register Width: 32, 64
- Opcode and Addressing mode complexity: RISC(m
- Predication
- Predication的含义参考 https://en.wikipedia.org/wiki/Predication_(computer_architecture) 比如x86的CMOV,FCMOV。
- Data-Parallel Execution
4. Compiler And Runtime Strategy
应该和2012.process_migration.devuyst.asplos.83类似
5. Decoder Design
Instruction Encoding
Furthermore, we increase the width of the macro-op queue by 2 bytes to account for the extra prefixes. Since predication support and greater register depth in our superset ISA could potentially require wider micro-op ISA encoding, we increase the width of the micro-op cache and the micro-op queue by 2 bytes. Finally, for our microx86 implementations, we replace the complex 1:4 decoder with another simple 1:1 decoder and forgo the microsequencing ROM.
Figure 4的图很具有参考意义。
好奇uop cache内部构造,如何做到精确中断,类似reorder buffer的结构嘛?
6. Methodology
micro-op fusion技术 [112].
Intel R 64 and IA-32 Architectures Optimization Reference Manual
使用了SimPoint简单搜索了一下,是用来加速仿真的。 知乎:Simpoint 在 GEM5 里加速仿真, 1 minute = N day(s)
- 使用Synopsys Design Compiler测decoder面积和功耗
- 使用McPAT测其他流水层次的面积和功耗。
使用超算进行计算消耗49733 core hours。
多核的方法未详细说。
7 Result
7.B Feature Sensitivity Analysis
Figure 10减少某一维度的异构,对面积造成的影响。 这些异构的影响的面积,挤占cache的面积。