常用数据

  • uop长度56 bits
  • uop容量2K uops
2022.02.21

Abstract

earlier detection of mispredicted branches是什么原理

2022.02.22

Intro

Such an ISA level abstraction enables processor vendors to implement an x86 instruction differently based on their custom micro-architectures

不同微架构uop设计不同

2022.09.28

Background

X86处理器解码部分由3部分构成

  • instruction cache
  • uop cache
    • [TODO] 2003: Micro-operation cache: a power aware frontend for variable instruction length isa
    • [TODO] 2016: Empirical study of the power consumption of the x86-64 instruction decoder
  • loop cache
    • [TODO] Intel Performance Optimization Manual https://cutt.ly/PX7i5v.
    • [TODO] 1999: Energy and performance improvements in microprocessor design using a loop cache
    • [TODO] 2000: Effective hardware-based two-way loop cache for high performance low power processors
    • [TODO] 1999: Instruction fetch energy reduction using loop caches for embedded applications with small tight loops
    • [TODO] 2018: Mobilizing the micro-ops: Exploiting context sensitive decoding for security and energy efficiency

定长微码设计[TODO] 2006: An approach for ing applications,” https://mahout.apache.org. implementing efficient superscalar cisc processors

Trace cache的缺点

  • 重复代码,可能会存下3条trace CB、DB、EB都包含基本块B,自修该代码很麻烦
  • 多次分支预测带来能耗损失

4. Methodology

使用的内部的仿真器。 参卡RTL模拟性能、硅性能,仿真器的性能精准。

5. Optimized Uop Cache Design

全文基本没有做控制信息存储增加的实现。 增加算法、增加cache line tag对空间和性能的影响的分析。

  • 软件压缩
  • ucache压缩
  • 和tcache比较
  • code顺序调整