2022.06.28

O3 CPU in gem5

Q

  • TODO: Is renaming architecture-specific or general?
  • TODO: Is rename tied to a thread? I guess a CPU (hardware) thread, not an OS thread.

CPU Stages

build/X86/cpu/o3/cpu.cc:

  • fetch
  • decode
  • rename
  • iew (issue, execute, writeback)
  • commit
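A toy sketch of how an instruction moves through the five stages listed above, one stage per cycle. This is not gem5 code: `ToyPipeline`, `latch`, and `ticksToCommit` are hypothetical names, and the one-cycle-per-stage timing is an assumption made only to keep the example small.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

// The five stages from the list above.
enum Stage { Fetch, Decode, Rename, IEW, Commit, NumStages };

struct ToyPipeline {
    // latch[s] holds the id of the instruction currently in stage s, if any.
    std::array<std::optional<int>, NumStages> latch{};
    std::vector<int> committed;

    // One cycle: shift from back to front so every instruction advances
    // exactly one stage, then let the commit stage retire what it holds.
    void tick(std::optional<int> fetched) {
        for (int s = Commit; s > Fetch; --s) {
            latch[s] = latch[s - 1];
        }
        latch[Fetch] = fetched;
        if (latch[Commit]) {
            committed.push_back(*latch[Commit]);
            latch[Commit].reset();
        }
    }
};

// Feeds instructions 0..numInsts-1 (one per cycle) and returns how many
// cycles pass until all of them have committed.
int ticksToCommit(int numInsts) {
    assert(numInsts >= 0);
    ToyPipeline pipe;
    int ticks = 0;
    while (static_cast<int>(pipe.committed.size()) < numInsts) {
        std::optional<int> next;
        if (ticks < numInsts) {
            next = ticks;
        }
        pipe.tick(next);
        ++ticks;
    }
    return ticks;
}
```

With this model, a single instruction needs five cycles, and each further instruction adds one cycle once the pipeline is full.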

Fetch (Actual Decode)

  • Q: DONE: Why does the fetch stage contain decode? src/cpu/o3/fetch.cc: 1199 auto *dec_ptr = decoder[tid];

  • A: src/cpu/o3/decode.hh:

    Decode class handles both single threaded and SMT decode. Its width is specified by the parameters; each cycle it tries to decode that many instructions. Because instructions are actually decoded when the StaticInst is created, this stage does not do much other than check any PC-relative branches.

  • src/cpu/o3/fetch.cc

    fetch.tick()

    • src/cpu/o3/fetch.cc

      void Fetch::fetch(bool &status_change) {
        ...
        /// `staticInst` microop is fetched from `curMacroop`.
        /// `curMacroop` is a `staticInst` macroop.
        /// File decode.md notes how binary code is decoded to macroop `staticInst`.
        ...
        DynInstPtr instruction = buildInst(tid, staticInst, curMacroop, this_pc, *next_pc, true);
      }
      

      buildInst connects DynInst and StaticInst!

      • src/cpu/o3/fetch.cc:

        DynInstPtr instruction = new (arrays) DynInst(
          arrays, staticInst, curMacroop, this_pc, next_pc, seq, cpu);
        

        TODO: How does new (arrays) DynInst work? This is placement-new syntax.
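On the TODO above: the `new (arrays) T(...)` form passes `arrays` as an extra argument to an `operator new` overload of the class. The sketch below shows that C++ mechanism only. `Inst`, `Arrays`, and their fields are hypothetical stand-ins, and the idea that the overload reserves trailing storage in the same allocation is an assumption about why such an overload is useful, not a copy of gem5's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the `arrays` argument: it says how many extra
// slots to reserve and receives a pointer to where they were placed.
struct Arrays {
    std::size_t numDests = 0;        // input: trailing slots wanted
    std::uint16_t *dests = nullptr;  // output: start of those slots
};

class Inst {
  public:
    // Selected by `new (arrays) Inst(...)`; `count` is sizeof(Inst).
    static void *operator new(std::size_t count, Arrays &arrays) {
        const std::size_t total =
            count + arrays.numDests * sizeof(std::uint16_t);
        auto *buf = static_cast<unsigned char *>(::operator new(total));
        // The extra slots live directly behind the object itself.
        arrays.dests = reinterpret_cast<std::uint16_t *>(buf + count);
        return buf;
    }
    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void *ptr, Arrays &) { ::operator delete(ptr); }
    // Ordinary delete, used by `delete inst`.
    static void operator delete(void *ptr) { ::operator delete(ptr); }

    Inst(const Arrays &arrays, std::uint64_t seq)
        : seqNum(seq), numDests(arrays.numDests), dests(arrays.dests) {
        for (std::size_t i = 0; i < numDests; ++i) {
            dests[i] = 0;
        }
    }

    void setDest(std::size_t i, std::uint16_t physReg) {
        assert(i < numDests);
        dests[i] = physReg;
    }
    std::uint16_t dest(std::size_t i) const {
        assert(i < numDests);
        return dests[i];
    }

    std::uint64_t seqNum;

  private:
    std::size_t numDests;
    std::uint16_t *dests;
};

// Allocates one Inst with two trailing slots, writes one, reads it back.
std::uint16_t demoPlacementNew() {
    Arrays arrays;
    arrays.numDests = 2;
    Inst *inst = new (arrays) Inst(arrays, 42);
    inst->setDest(1, 7);
    const std::uint16_t value = inst->dest(1);
    delete inst;
    return value;
}
```

The object and its operand slots come from a single allocation, so one `delete` releases both.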

Rename

  • src/cpu/o3/dyn_inst.hh:

    setIntRegOperand

    • renamedDestIdx
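The name renamedDestIdx suggests that each destination operand carries a renamed (physical) register index. Below is a minimal sketch of generic register renaming with a map and a free list. `RenameMapSketch` and its methods are hypothetical and only illustrate the textbook technique; they are not gem5's rename map API.

```cpp
#include <cassert>
#include <deque>
#include <vector>

class RenameMapSketch {
  public:
    // Initially architectural register i maps to physical register i;
    // the remaining physical registers go on the free list.
    RenameMapSketch(int numArch, int numPhys) : map(numArch) {
        assert(numPhys >= numArch);
        for (int a = 0; a < numArch; ++a) {
            map[a] = a;
        }
        for (int p = numArch; p < numPhys; ++p) {
            freeList.push_back(p);
        }
    }

    // Source operands read the current mapping.
    int lookup(int archReg) const { return map.at(archReg); }

    // A destination operand gets a fresh physical register.
    // Returns -1 when none is free (a real pipeline would stall here).
    int renameDest(int archReg) {
        if (freeList.empty()) {
            return -1;
        }
        const int phys = freeList.front();
        freeList.pop_front();
        map.at(archReg) = phys;
        return phys;
    }

  private:
    std::vector<int> map;      // architectural index -> physical index
    std::deque<int> freeList;  // unused physical registers
};
```

Two writes to the same architectural register receive different physical registers, which is what removes the write-after-write dependence between them.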

IEW

  • src/cpu/o3/cpu.cc:

    gem5::o3::CPU::tick

    • src/cpu/o3/iew.cc:

      gem5::o3::IEW::tick

      • dispatch(tid);

      • executeInsts()

      • schedule the instructions that will execute next cycle

        TODO: Why is no thread ID involved here?

        instQueue.scheduleReadyInsts();

        • src/cpu/o3/dyn_inst.cc:

          gem5::o3::DynInst::execute

          • build/X86/arch/x86/generated/exec-ns.cc.inc:

            gem5::X86ISAInst::LimmBig::execute

TODO: How does DynInst differ from StaticInst?
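A sketch of my current understanding for the TODO above: a StaticInst describes a decoded instruction independent of any particular execution, while a DynInst is one in-flight instance with its own sequence number and renamed registers (buildInst ties the two together). `StaticInstSketch`, `DynInstSketch`, and all field names and values below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Decoded, reusable description of an instruction.
struct StaticInstSketch {
    std::string mnemonic;
    int destArchReg;
};

// One in-flight instance of a static instruction.
struct DynInstSketch {
    std::shared_ptr<const StaticInstSketch> staticInst;
    std::uint64_t seqNum;  // unique per dynamic instance
    std::uint64_t pc;
    int renamedDest = -1;  // physical register chosen at rename
};

// Models the same instruction being fetched in two loop iterations.
std::vector<DynInstSketch> fetchSameInstTwice() {
    auto limm = std::make_shared<const StaticInstSketch>(
        StaticInstSketch{"limm", 3});
    std::vector<DynInstSketch> inFlight;
    std::uint64_t nextSeq = 1;
    for (int iter = 0; iter < 2; ++iter) {
        DynInstSketch dyn{limm, nextSeq++, 0x401000};
        dyn.renamedDest = 40 + iter;  // arbitrary illustrative values
        inFlight.push_back(dyn);
    }
    return inFlight;
}
```

Both dynamic instances point at the same static object but differ in sequence number and renamed destination.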

Issue

  • Q: DONE: What is listOrder in src/cpu/o3/inst_queue.hh?

  • A: One of the internal data structures of InstructionQueue.

List that contains the age order of the oldest instruction of each ready queue. Used to select the oldest instruction available among op classes.

/** List of ready instructions, per op class.  They are separated by op
*  class to allow for easy mapping to FUs.
*/
ReadyInstQueue readyInsts[Num_OpClasses];

The age order relies on a global ordering of all dynamic instructions:

// inst sequence type, used to order instructions in the ready list,
// if this rolls over the ready list order temporarily will get messed
// up, but execution will continue and complete correctly
typedef uint64_t InstSeqNum;
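The two comments above can be sketched as follows: one ready queue per op class, each ordered by sequence number, plus an age order over the queue heads that picks the oldest ready instruction across classes. This is a simplified illustration. The three-entry `OpClass` enum, `ReadySketch`, and the choice to rebuild the order on every call (rather than maintain it incrementally) are assumptions made for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

using InstSeqNum = std::uint64_t;

// Reduced, hypothetical set of op classes for the example.
enum OpClass { IntAlu, IntMult, MemRead, Num_OpClasses };

// Min-heap: the smallest (oldest) sequence number is on top.
using ReadyInstQueue =
    std::priority_queue<InstSeqNum, std::vector<InstSeqNum>,
                        std::greater<InstSeqNum>>;

// One entry per non-empty ready queue.
struct ListOrderEntry {
    OpClass queueType;
    InstSeqNum oldestInst;
};

struct ReadySketch {
    ReadyInstQueue readyInsts[Num_OpClasses];

    void addReady(OpClass oc, InstSeqNum seq) { readyInsts[oc].push(seq); }

    // Age order of the oldest instruction of each ready queue.
    std::vector<ListOrderEntry> listOrder() const {
        std::vector<ListOrderEntry> order;
        for (int oc = 0; oc < Num_OpClasses; ++oc) {
            if (!readyInsts[oc].empty()) {
                order.push_back(
                    {static_cast<OpClass>(oc), readyInsts[oc].top()});
            }
        }
        std::sort(order.begin(), order.end(),
                  [](const ListOrderEntry &a, const ListOrderEntry &b) {
                      return a.oldestInst < b.oldestInst;
                  });
        return order;
    }

    // Drains all queues, always taking the globally oldest instruction.
    std::vector<InstSeqNum> issueAll() {
        std::vector<InstSeqNum> issued;
        for (auto order = listOrder(); !order.empty(); order = listOrder()) {
            const ListOrderEntry &oldest = order.front();
            assert(!readyInsts[oldest.queueType].empty());
            issued.push_back(oldest.oldestInst);
            readyInsts[oldest.queueType].pop();
        }
        return issued;
    }
};
```

Because InstSeqNum is global, comparing queue heads is enough to find the oldest ready instruction without merging the per-class queues.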

FU (Functional Unit)

  • Q: Where is instruction latency handled?

  • A: lyw: src/cpu/o3/inst_queue.cc: void InstructionQueue::scheduleReadyInsts()

  • Q: DONE: How are microop and opClass related?

  • A: The x86 microop constructor passes its opClass to the base class, e.g.

    LimmBig::LimmBig(...) :
      X86ISA::RegOpT<...> (..., IntAluOp, ...)
    

The FU pool is initialized from Python; see m5out/config.ini.

Example

./build/X86/gem5.debug --debug-flags Exec,IntRegs configs/example/se.py --cpu-type O3CPU --caches -c ~/Gist/hello/hello

system.cpu.fuPool:

FUList  opList  opClass         opLat  pipelined
0       -       IntAlu          1      true
1       0       IntMult         3      true
1       1       IntDiv          1      false
2       0       FloatAdd        2      true
2       1       FloatCmp        2      true
3       0       FloatMult       4      true
3       1       FloatMultAcc    5      true
3       2       FloatMisc       3      true
3       3       FloatDiv        12     false
3       4       FloatSqrt       24     false
4       0       MemRead         1      true
4       1       FloatMemRead    1      true
7       0       MemWrite        1      true
7       1       FloatMemWrite   1      true
8       0       MemRead         1      true
8       1       MemWrite        1      true
8       2       FloatMemRead    1      true
8       3       FloatMemWrite   1      true
9       -       IprAccess       3      false

The SIMD FUs seem to be uninitialized:

FUList  opList  opClass             opLat  pipelined
5       00      SimdAdd             1      true
5       01      SimdAddAcc          1      true
5       02      SimdAlu             1      true
5       03      SimdCmp             1      true
5       04      SimdCvt             1      true
5       05      SimdMisc            1      true
5       06      SimdMult            1      true
5       07      SimdMultAcc         1      true
5       08      SimdShift           1      true
5       09      SimdShiftAcc        1      true
5       10      SimdDiv             1      true
5       11      SimdSqrt            1      true
5       12      SimdFloatAdd        1      true
5       13      SimdFloatAlu        1      true
5       14      SimdFloatCmp        1      true
5       15      SimdFloatCvt        1      true
5       16      SimdFloatDiv        1      true
5       17      SimdFloatMisc       1      true
5       18      SimdFloatMult       1      true
5       19      SimdFloatMultAcc    1      true
5       20      SimdFloatSqrt       1      true
5       21      SimdReduceAdd       1      true
5       22      SimdReduceAlu       1      true
5       23      SimdReduceCmp       1      true
5       24      SimdFloatReduceAdd  1      true
5       25      SimdFloatReduceCmp  1      true
6       -       SimdPredAlu         1      true
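A rough sketch of how the opLat and pipelined columns in the FU pool listing above could affect issue. It models one unit with the FloatMult, FloatDiv, and FloatSqrt rows of FUList 3. `FuPoolSketch`, `issue`, the single-unit count, and the exact busy rule (a pipelined op frees the unit after one cycle, a non-pipelined op holds it for the whole latency) are assumptions for illustration, not gem5's FUPool interface.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One row of an opList: which op class, how long, and whether pipelined.
struct OpDesc {
    std::string opClass;
    int opLat;
    bool pipelined;
};

struct FuncUnitSketch {
    std::vector<OpDesc> opList;
    int busyUntil = 0;  // first cycle at which the unit accepts a new op
};

class FuPoolSketch {
  public:
    void addUnit(std::vector<OpDesc> opList) {
        FuncUnitSketch fu;
        fu.opList = std::move(opList);
        units.push_back(std::move(fu));
    }

    // Tries to issue `opClass` at cycle `now`. Returns the cycle at which
    // the result is ready, or -1 if no capable unit is free.
    int issue(const std::string &opClass, int now) {
        for (FuncUnitSketch &fu : units) {
            if (fu.busyUntil > now) {
                continue;
            }
            for (const OpDesc &op : fu.opList) {
                if (op.opClass != opClass) {
                    continue;
                }
                assert(op.opLat >= 1);
                fu.busyUntil = op.pipelined ? now + 1 : now + op.opLat;
                return now + op.opLat;
            }
        }
        return -1;
    }

  private:
    std::vector<FuncUnitSketch> units;
};

// Latencies taken from the FUList 3 rows; the single unit is an assumption.
FuPoolSketch makeFloatMultDivPool() {
    FuPoolSketch pool;
    pool.addUnit({{"FloatMult", 4, true},
                  {"FloatDiv", 12, false},
                  {"FloatSqrt", 24, false}});
    return pool;
}
```

In this model a FloatDiv issued at cycle 0 blocks the unit until cycle 12, while back-to-back FloatMult ops can start on consecutive cycles.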

Memory Model

TODO: needsTSO