2023.12.01
A Top-Down Method for Performance Analysis and Counters Architecture
Introduction
Modern processors expose hundreds of performance events, any of which may or may not relate to the bottlenecks of a particular workload.
Q: [TODO]有没有什么方法/工具能利用起所有的performance events?
Background
Ivy Bridge的微架构图很有意思,重画了一遍:
Top-Down Analysis
Top Level breakdown
这是咋分类的? 没看懂它的分类依据啊!
Counter Architecture
如何判断性能计数器的开销高低? 文章没有详细说,只是给了几个开销高的引用。
low-cost hardware
Neither at-retirement tagging is required as in IBM POWER [6], nor complex structures with latency counters as in Accurate CPI Stacks proposals[1][8][9].
Results
Case Study 1: Matrix-Multiply
multiply2 => multiply3性能提升,IPC下降。 这反映出IPC或是占比并不是评价性能的唯一指标。 看绝对时间,或者看各部分的耗时、而不是占比。