常用数据
- uop长度56 bits
- uop容量2K uops
2022.02.21
Abstract
earlier detection of mispredicted branches是什么原理
2022.02.22
Intro
Such an ISA level abstraction enables processor vendors to implement an x86 instruction differently based on their custom micro-architectures
不同微架构uop设计不同
2022.09.28
Background
X86处理器解码部分由3部分构成
- instruction cache
- uop cache
- [TODO] 2003: Micro-operation cache: a power aware frontend for variable instruction length isa
- [TODO] 2016: Empirical study of the power consumption of the x86-64 instruction decoder
- loop cache
- [TODO] Intel Performance Optimization Manual https://cutt.ly/PX7i5v.
- [TODO] 1999: Energy and performance improvements in microprocessor design using a loop cache
- [TODO] 2000: Effective hardware-based two-way loop cache for high performance low power processors
- [TODO] 1999: Instruction fetch energy reduction using loop caches for embedded applications with small tight loops
- [TODO] 2018: Mobilizing the micro-ops: Exploiting context sensitive decoding for security and energy efficiency
定长微码设计[TODO] 2006: An approach for ing applications,” https://mahout.apache.org. implementing efficient superscalar cisc processors
Trace cache的缺点
- 重复代码,可能会存下3条trace CB、DB、EB都包含基本块B,自修该代码很麻烦
- 多次分支预测带来能耗损失
4. Methodology
使用的内部的仿真器。 参卡RTL模拟性能、硅性能,仿真器的性能精准。
5. Optimized Uop Cache Design
全文基本没有做控制信息存储增加的实现。 增加算法、增加cache line tag对空间和性能的影响的分析。
7. Related Work
- 软件压缩
- ucache压缩
- 和tcache比较
- code顺序调整