2022.09.14

Micro Cache Adapter in gem5

Micro Cache Adapter (MCA, UCA)

git commit: 95d7c1d

configs/common/Caches.py:

class L1Cache(Cache):
    assoc = 2
    tag_latency = 2
    data_latency = 2
    response_latency = 2
    mshrs = 4
    tgts_per_mshr = 20

class L1_ICache(L1Cache):
    is_read_only = True
    # Writeback clean lines as well
    writeback_clean = True

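The icache tag-lookup latency and data latency are both 2 cycles. To allow a comparison with unmodified gem5, the UCA latency is set to the icache latency, 2+2=4. The instruction issue distributions of the two runs are listed below.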
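As a small sketch of the latency matching described here: the two constants are copied from the L1Cache class quoted from configs/common/Caches.py, while `matched_uca_latency` and `uca_latency` are hypothetical names used only for illustration. The sum mirrors the 2+2 in the note and is not taken from gem5's own cache timing code.

```python
# Latencies copied from L1Cache in configs/common/Caches.py (commit 95d7c1d).
ICACHE_TAG_LATENCY = 2   # cycles for the tag lookup
ICACHE_DATA_LATENCY = 2  # cycles for the data access


def matched_uca_latency(tag_latency: int, data_latency: int) -> int:
    """Hypothetical helper: a UCA latency that mirrors an icache hit,
    counted as tag lookup followed by data access (the note's 2+2)."""
    return tag_latency + data_latency


uca_latency = matched_uca_latency(ICACHE_TAG_LATENCY, ICACHE_DATA_LATENCY)
print(uca_latency)
```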

gem5-orig

system.cpu.numIssuedDist::samples              309598                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::mean               2.450507                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::stdev              2.065197                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::underflows                0      0.00%      0.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::0                     72040     23.27%     23.27% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::1                     46305     14.96%     38.23% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::2                     55312     17.87%     56.09% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::3                     46645     15.07%     71.16% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::4                     31274     10.10%     81.26% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::5                     23686      7.65%     88.91% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::6                     24134      7.80%     96.70% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::7                      8138      2.63%     99.33% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::8                      2064      0.67%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::overflows                 0      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::min_value                 0                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::max_value                 8                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::total                309598                       # Number of insts issued each cycle (Count)

xa64-bt

system.cpu.numIssuedDist::samples             1629797                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::mean               0.334398                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::stdev              0.501654                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::underflows                0      0.00%      0.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::0                   1106546     67.89%     67.89% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::1                    502574     30.84%     98.73% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::2                     20204      1.24%     99.97% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::3                       132      0.01%     99.98% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::4                        94      0.01%     99.98% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::5                       238      0.01%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::6                         7      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::7                         1      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::8                         1      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::overflows                 0      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::min_value                 0                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::max_value                 8                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::total               1629797                       # Number of insts issued each cycle (Count)

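xa64-bt is slower than gem5-orig: its mean issue rate is lower by a factor of 7.328115001 (2.450507 / 0.334398). This suggests that the main performance bottleneck is here, in the UCA. Prefetching will be used to mitigate it.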
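The comparison of the two runs can be reproduced from the stat dumps themselves. The following is a minimal sketch, assuming stats.txt-style lines in the layout shown above; `read_stat` is a hypothetical helper, not a gem5 API, and only the `samples` and `mean` lines are copied in. It computes both the ratio of the mean issue rates and the ratio of the sample counts (one sample per cycle, going by the stat description).

```python
GEM5_ORIG = """\
system.cpu.numIssuedDist::samples              309598                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::mean               2.450507                       # Number of insts issued each cycle (Count)
"""

XA64_BT = """\
system.cpu.numIssuedDist::samples             1629797                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::mean               0.334398                       # Number of insts issued each cycle (Count)
"""


def read_stat(stats_text: str, name: str) -> float:
    """Hypothetical helper: return the first value of the stat `name`."""
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return float(fields[1])
    raise KeyError(name)


PREFIX = "system.cpu.numIssuedDist::"

# How many times lower the mean issue rate of xa64-bt is.
mean_ratio = read_stat(GEM5_ORIG, PREFIX + "mean") / read_stat(XA64_BT, PREFIX + "mean")

# How many times more cycles were sampled in the xa64-bt run.
samples_ratio = read_stat(XA64_BT, PREFIX + "samples") / read_stat(GEM5_ORIG, PREFIX + "samples")

print(f"mean issue rate ratio: {mean_ratio:.6f}")
print(f"sampled cycles ratio:  {samples_ratio:.6f}")
```

The two ratios differ because the runs do not issue the same total number of instructions, so neither one alone should be read as the exact slowdown.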

2022.09.15

Fetching about 8 uops at a time changes the performance to:

system.cpu.numIssuedDist::samples              466450                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::mean               1.204727                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::stdev              1.267099                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::underflows                0      0.00%      0.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::0                    182367     39.10%     39.10% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::1                    115977     24.86%     63.96% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::2                     95149     20.40%     84.36% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::3                     42788      9.17%     93.53% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::4                     24408      5.23%     98.76% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::5                      4947      1.06%     99.83% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::6                       762      0.16%     99.99% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::7                        49      0.01%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::8                         3      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::overflows                 0      0.00%    100.00% # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::min_value                 0                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::max_value                 8                       # Number of insts issued each cycle (Count)
system.cpu.numIssuedDist::total                466450                       # Number of insts issued each cycle (Count)

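The mean number of instructions issued per cycle rises from 0.334398 to 1.204727, a factor of 3.6. Overall performance improves by a factor of 3.47.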
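As a cross-check, the reported mean can be recomputed from the histogram buckets. The sketch below assumes the bucket counts of the 2022.09.15 run as printed above; `dist_mean` is a hypothetical helper name.

```python
# Bucket counts copied from the 2022.09.15 numIssuedDist output;
# list index = number of instructions issued in that cycle (0..8).
ISSUED_COUNTS = [182367, 115977, 95149, 42788, 24408, 4947, 762, 49, 3]

# Mean issue rate of the earlier xa64-bt run, before fetching ~8 uops at a time.
MEAN_BEFORE = 0.334398


def dist_mean(counts):
    """Hypothetical helper: weighted mean of a histogram indexed by value."""
    total = sum(counts)
    return sum(value * count for value, count in enumerate(counts)) / total


mean_after = dist_mean(ISSUED_COUNTS)
improvement = mean_after / MEAN_BEFORE
zero_issue_share = ISSUED_COUNTS[0] / sum(ISSUED_COUNTS)

print(f"mean issued per cycle: {mean_after:.6f}")
print(f"improvement factor:    {improvement:.2f}")
print(f"cycles issuing nothing: {zero_issue_share:.2%}")
```

The share of cycles that issue nothing is the figure to watch when judging further fetch-side changes, since it is where the gap to the native run is largest.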

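This is currently about 70% of native x86 performance. The mean issue rate is still low: only 1.204727, versus 2.450507 natively. The native run spends 69007 cycles in fetch stalls, while xa64-bt spends 345589+2788=348377, about 5 times as many. squashCycles is also fairly high, but I think the main problem is still on the fetch side, as reflected in the mean issue rate. A prefetch mechanism for the ucache is clearly needed.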
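The arithmetic behind the fetch-stall comparison, as a short sketch. All numbers are copied from the note; the variable names are made up for illustration, and the two xa64-bt terms are kept unnamed because the note does not say which stats they come from.

```python
NATIVE_FETCH_STALL_CYCLES = 69007
# The two terms the note adds up for xa64-bt (stat names not given in the note).
XA64_BT_FETCH_STALL_PARTS = (345589, 2788)

NATIVE_MEAN_ISSUE = 2.450507
XA64_BT_MEAN_ISSUE = 1.204727

xa64_bt_fetch_stall_cycles = sum(XA64_BT_FETCH_STALL_PARTS)
stall_ratio = xa64_bt_fetch_stall_cycles / NATIVE_FETCH_STALL_CYCLES
issue_rate_fraction = XA64_BT_MEAN_ISSUE / NATIVE_MEAN_ISSUE

print(f"xa64-bt fetch stall cycles: {xa64_bt_fetch_stall_cycles}")
print(f"stall ratio vs native:      {stall_ratio:.2f}")
print(f"issue rate vs native:       {issue_rate_fraction:.2%}")
```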