xieby1's Manual for isa, a new DSL(Domain Specific Language)

2022.06.21

TLDR: isa_parser.py

  • defines a DSL (Domain Specific Language) called isa, using Lex & Yacc,
  • generates decoder code (C++) from isa source; this generator is called the isa parser.

Resources:

  • https://www.gem5.org/documentation/general_docs/architecture_support/isa_parser/
  • gem5/src/arch/isa_parser/isa_parser.py

2022.07.16

declaration section

Format definitions

Syntax

def_format : DEF FORMAT ID LPAREN param_list RPAREN CODELIT SEMI
  • ID: python string
  • param_list: a Python parameter list, so an asterisk (*) can be used for variadic arguments
  • CODELIT: Python code
    • The variables name and Name are available; both hold the inst name, see the inst rule in the decode section.

Source

def p_def_format(self, t):
  (id, params, code) = (t[3], t[5], t[7])
  self.defFormat(id, params, code, t.lexer.lineno)
class Format(object):
  def __init__(self, id, params, code):
    ...
    self.user_code = compile(fixPythonIndentation(code), label, 'exec')
    ...
    f = '''def defInst(_code, _context, %s):
            my_locals = vars().copy()
            exec(_code, _context, my_locals)
            return my_locals\n''' % param_list
    c = compile(f, label + ' wrapper', 'exec')
    exec(c, globals())
    self.func = defInst
  ...

Format.__init__ defines a function defInst on the fly. defInst is called by defineInst while the decode section is parsed, so CODELIT is executed by defineInst. Before the call, defineInst adds the name and Name of the inst being defined to the exec context.

class Format(object):
  ...
  def defineInst(self, parser, name, args, lineno):
    ...
    context.update({ 'name' : name, 'Name' : Name })
    ...
    vars = self.func(self.user_code, context, *args[0], **args[1])
    ...
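The wrapper trick above can be tried in isolation. The following is a minimal sketch, not the parser's code: make_def_inst, the one-line format body and the sample arguments are made up for illustration, and the wrapper is placed in a private namespace instead of globals().

```python
def make_def_inst(param_list):
    """Build a defInst-style wrapper whose signature comes from param_list."""
    src = (
        "def defInst(_code, _context, %s):\n"
        "    my_locals = vars().copy()\n"
        "    exec(_code, _context, my_locals)\n"
        "    return my_locals\n"
    ) % param_list
    namespace = {}
    exec(compile(src, "<format wrapper>", "exec"), namespace)
    return namespace["defInst"]


# Hypothetical format body; in an isa file this would be the CODELIT text.
user_code = compile(
    "decode_block = 'return new %s(%s);' % (Name, ', '.join(opTypeSet))",
    "<format body>", "exec")

def_inst = make_def_inst("*opTypeSet")
context = {"name": "mov", "Name": "Mov"}   # what defineInst adds to the context
result = def_inst(user_code, context, "Bv", "Iv")
print(result["decode_block"])
```

The format parameters become locals of the wrapper, so the format body sees them next to name and Name, and everything the body assigns comes back in the returned dict.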

Template

definition

The suffix does not influence template behavior; it is just a naming convention. gem5 uses the following 4 suffixes:

  • Declare: declaration (header output) templates

  • Decode: decode-block templates

  • Constructor: decoder output templates

  • Execute: exec output templates

  • src/arch/isa_parser/isa_parser.py:

    def p_def_template(self, t):
        'def_template : DEF TEMPLATE ID CODELIT SEMI'
        if t[3] in self.templateMap:
            print("warning: template %s already defined" % t[3])
        self.templateMap[t[3]] = Template(self, t[4])
    

    Every template definition is added to templateMap.

Substitution

  • src/arch/isa_parser/isa_parser.py:

    class Template(object):
      ...
      def subst(self, d):
        ...
        myDict = self.parser.templateMap.copy()
        ...
        if isinstance(d, InstObjParams):
          ...
          myDict.update(d.__dict__)
          ...
          myDict['op_decl'] = operands.concatAttrStrings('op_decl')
          ...
          myDict['op_rd'] = operands.concatAttrStrings('op_rd')
          ...
          for op_desc in reordered:
            ...
            op_wb_str = op_desc.op_wb + op_wb_str
          myDict['op_wb'] = op_wb_str
        ...
        return template % myDict
    

    The code is searched for operands; for more details see Operand definitions.

    The op_decl, op_rd and op_wb entries are generated by operand analysis. User-defined replacements (the fields of d) are substituted as well.
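The substitution step boils down to Python's %-formatting with a dict. Below is a minimal sketch under that assumption; ParamsSketch, subst as a free function, the template text and the placeholder op_* strings are all hypothetical stand-ins, not what the parser generates.

```python
class ParamsSketch:
    """Hypothetical stand-in for InstObjParams: it only carries attributes."""
    def __init__(self, class_name, code):
        self.class_name = class_name
        self.code = code


def subst(template, template_map, params, op_strings):
    my_dict = template_map.copy()      # previously defined templates stay visible
    my_dict.update(params.__dict__)    # user-defined replacements, e.g. 'code'
    my_dict.update(op_strings)         # op_decl / op_rd / op_wb from operand analysis
    return template % my_dict


template = """%(class_name)s::execute()
{
    %(op_decl)s
    %(op_rd)s
    %(code)s
    %(op_wb)s
}
"""

# Placeholder strings only; the real ones come from the operand descriptors.
op_strings = {"op_decl": "/* op_decl */", "op_rd": "/* op_rd */",
              "op_wb": "/* op_wb */"}
params = ParamsSketch("Limm", "DestReg = merge(DestReg, dest, imm64, dataSize);")
output = subst(template, {"OtherTemplate": "unused"}, params, op_strings)
print(output)
```

Keys that the template never mentions are simply ignored by %-formatting, which is why the whole templateMap can be copied into the dict.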

Output blocks

Let blocks

Bitfield definitions: def [signed] bitfield NAME ...

Operand definitions

Syntax

def operands {{
  'op_name': tuple
}};
  • tuple:

    (base_cls_name, dflt_ext, reg_spec, flags, sort_pri[, read_code[, write_code[, read_predicate[, write_predicate]]]])
    

Source

src/arch/x86/isa/operands.isa:

def operands {{
  ...
  'DestReg': intReg('dest', 5),
  ...
}};

intReg is defined in the same file.

def intReg(idx, id):
  return ('IntReg', 'uqw', idx, 'IsInteger', id)

This isa file is parsed by src/arch/isa_parser/isa_parser.py.

  • src/arch/isa_parser/isa_parser.py:

    class ISAParser(Grammar):
      ...
      def p_def_operands(self, t):
        'def_operands : DEF OPERANDS CODELIT SEMI'
        ...
        user_dict = eval('{' + t[3] + '}', self.exportContext)
        ...
        self.buildOperandNameMap(user_dict, t.lexer.lineno)
      ...
    

    After parsing, user_dict contains an entry user_dict['DestReg'] = ('IntReg', 'uqw', 'dest', 'IsInteger', 5).
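The eval step can be reproduced with plain Python. This is a minimal sketch, assuming the export context only needs to provide intReg; export_context and body are illustrative names, and body stands for the text between {{ and }}.

```python
def intReg(idx, id):
    # Same helper as in operands.isa above.
    return ('IntReg', 'uqw', idx, 'IsInteger', id)


# Assumption: a tiny export context that only knows about intReg.
export_context = {'intReg': intReg}

# The CODELIT text of a "def operands" block, reduced to one entry.
body = """
  'DestReg': intReg('dest', 5),
"""

# Wrapping the body in braces turns it into a dict literal.
user_dict = eval('{' + body + '}', export_context)
print(user_dict)
```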

    • src/arch/isa_parser/isa_parser.py:

      def buildOperandNameMap(self, user_dict, lineno):
        ...
        for op_name, val in user_dict.items():
          val += (None, None, None, None)
          base_cls_name, dflt_ext, reg_spec, flags, sort_pri, \
          read_code, write_code, read_predicate, write_predicate = val[:9]
          ...
          cls_name = base_cls_name + '_' + op_name
          ...
          base_cls = eval(base_cls_name + 'Operand')
          ...
          operand_name[op_name] = type(cls_name, (base_cls,), tmp_dict)
        self.operandNameMap.update(operand_name)
      

      buildOperandNameMap shows the available args of an operand definition. It derives a class named cls_name from base_cls, where cls_name = IntReg_DestReg and base_cls = IntRegOperand. Everything in the operand definition is added to the derived class IntReg_DestReg. Finally, an entry {"DestReg": class IntReg_DestReg} is added to operandNameMap.
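The class derivation can be sketched with the three-argument form of type(). This is an illustration under assumptions: IntRegOperand here is an empty stand-in, the base class is looked up in a dict instead of via eval, and the exact contents of tmp_dict are a guess at "everything in the operand definition".

```python
class IntRegOperand:
    """Hypothetical, empty stand-in for the parser's base class."""


def build_operand_name_map(user_dict, base_classes):
    operand_name = {}
    for op_name, val in user_dict.items():
        # Pad the tuple so the optional fields default to None.
        val += (None, None, None, None)
        (base_cls_name, dflt_ext, reg_spec, flags, sort_pri,
         read_code, write_code, read_predicate, write_predicate) = val[:9]
        cls_name = base_cls_name + '_' + op_name
        base_cls = base_classes[base_cls_name + 'Operand']
        # Assumption: the tuple fields become class attributes.
        tmp_dict = {
            'dflt_ext': dflt_ext, 'reg_spec': reg_spec, 'flags': flags,
            'sort_pri': sort_pri, 'read_code': read_code,
            'write_code': write_code, 'read_predicate': read_predicate,
            'write_predicate': write_predicate,
        }
        operand_name[op_name] = type(cls_name, (base_cls,), tmp_dict)
    return operand_name


user_dict = {'DestReg': ('IntReg', 'uqw', 'dest', 'IsInteger', 5)}
name_map = build_operand_name_map(user_dict, {'IntRegOperand': IntRegOperand})
dest_cls = name_map['DestReg']
print(dest_cls.__name__, dest_cls.__mro__[1].__name__, dest_cls.sort_pri)
```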

Operand type definitions

Syntax

def operand_types {{
  'typename' : 'ctype'
}};

Source

src/arch/x86/isa/microops/limmop.isa:

def template MicroLimmOpExecute {{
    Fault
    %(class_name)s::execute(ExecContext *xc,
            Trace::InstRecord *traceData) const
    {
        %(op_decl)s;
        %(op_rd)s;
        %(code)s;
        %(op_wb)s;
        return NoFault;
    }
}};
iop = InstObjParams("limm", "Limm", base,
      {"code" : "DestReg = merge(DestReg, dest, imm64, dataSize);"})
exec_output += MicroLimmOpExecute.subst(iop)

%(code)s is replaced by the code passed to InstObjParams above. op_decl, op_rd and op_wb are generated by parsing that code to find out what it needs, i.e. which operands it uses.

  • src/arch/isa_parser/isa_parser.py:

    class InstObjParams(object):
      def __init__(self, parser, mnem, class_name, base_class = '',
                   snippets = {}, opt_args = []):
        ...
        self.operands = OperandList(parser, compositeCode)
        ...
    

    compositeCode is the code passed as a parameter to InstObjParams; here it is "DestReg = merge(DestReg, dest, imm64, dataSize);".

    • ./src/arch/isa_parser/operand_list.py:

      class OperandList(object):
        def __init__(self, parser, code):
          ...
          for match in parser.operandsRE().finditer(code):
            ...
      

      A RE is applied to code. The definition of RE is listed below.

      • src/arch/isa_parser/isa_parser.py:

        def operandsRE(self):
            if not self._operandsRE:
                self.buildOperandREs()
            return self._operandsRE
        
        def buildOperandREs(self):
          operands = list(self.operandNameMap.keys())
          ...
          operandsREString = r'''
          (?<!\w)      # neg. lookbehind assertion: prevent partial matches
          ((%s)(?:_(%s))?)   # match: operand with optional '_' then suffix
          (?!\w)       # neg. lookahead assertion: prevent partial matches
          ''' % ('|'.join(operands), '|'.join(extensions))
        

        The RE is (?<!\w)((%s)(?:_(%s))?)(?!\w). It contains 3 capture groups,

        • group1: ((%s)(?:_(%s))?)
        • group2: the first (%s)
        • group3: the second (%s)

        Note: https://regexr.com/ is a handy place to experiment with REs and pick them up quickly.
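The three groups can be checked directly with Python's re module. This is a minimal sketch: the operand and extension lists are made up, and re.VERBOSE is assumed so that the commented pattern compiles.

```python
import re

operands = ['DestReg', 'SrcReg1']   # hypothetical operand names
extensions = ['uqw', 'sqw']         # hypothetical type extensions

operands_re = re.compile(r'''
    (?<!\w)            # neg. lookbehind assertion: prevent partial matches
    ((%s)(?:_(%s))?)   # match: operand with optional '_' then suffix
    (?!\w)             # neg. lookahead assertion: prevent partial matches
    ''' % ('|'.join(operands), '|'.join(extensions)), re.VERBOSE)

code = "DestReg = merge(DestReg, dest, imm64, dataSize);"
matches = [m.groups() for m in operands_re.finditer(code)]
print(matches)

# An extension fills group 3; a name embedded in a longer word is skipped.
other = [m.groups() for m in operands_re.finditer("SrcReg1_sqw + MyDestReg")]
print(other)
```

Each tuple is (group1, group2, group3), i.e. the full match, the operand name and the optional extension.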

      RE will capture all operands in code. The captured operands are processed as below.

      class OperandList(object):
        def __init__(self, parser, code):
          ...
          for match in parser.operandsRE().finditer(code):
            op = match.groups()
            (op_full, op_base, op_ext) = op
            ...
            # see if we've already seen this one
            op_desc = self.find_base(op_base)
            if op_desc:
              ...
            else:
              # new operand: create new descriptor
              op_desc = parser.operandNameMap[op_base](parser,
              op_full, op_ext, is_src, is_dest)
              ...
      

      Each matched operand is checked against the operand list to see whether it has already been added. I focus on the case where it has not, in which a new operand descriptor is created. Assume the operand to be created is DestReg, so op_base is DestReg. operandNameMap[op_base] is then class IntReg_DestReg, which was created in the operand definitions section and is a subclass of IntRegOperand.

      As a result, the constructor of class IntReg_DestReg will be called.

      Finally,

      class OperandList(object):
        def __init__(self, parser, code):
          ...
          for op_desc in self.items:
            op_desc.finalize(self.predRead, self.predWrite)
      

      self.predRead and self.predWrite are bool variables, which indicate whether conditional read/write exists in current operand list.

      • ./arch/isa_parser/operand_types.py:

      I focus on finalizing a dest reg.

      def finalize(self, predRead, predWrite):
        ...
        if self.is_dest:
          self.op_wb = self.makeWrite(predWrite)
          self.op_dest_decl = self.makeDecl()
        else:
          self.op_wb = ''
          self.op_dest_decl = ''
        ...
      
      • ./arch/isa_parser/operand_types.py:

        def makeWrite(self, predWrite):
          ...
          if predWrite:
            ...
          else:
              wcond = ''
              windex = '%d' % self.dest_reg_idx
          wb = '''
          %s
          {
              %s final_val = %s;
              xc->setIntRegOperand(this, %s, final_val);\n
              if (traceData) { traceData->setData(final_val); }
          }''' % (wcond, self.ctype, self.base_name, windex)
          return wb
        
        • With no conditional code, wcond = ''.
        • self.ctype is set in src/arch/isa_parser/operand_types.py: Operand: __init__. It depends on the result of parsing def operand_types, which is simple. Here self.ctype is uint64_t, according to dflt_ext = uqw.
        • self.base_name is set in src/arch/isa_parser/isa_parser.py and is equal to op_name, i.e. DestReg.
        • windex = self.dest_reg_idx is set in ./src/arch/isa_parser/operand_list.py, where operands are sorted by sort_pri. There is only one dest, so windex here is 0.
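Plugging the four values from the list above into the write-back template gives the text that ends up in op_wb. A minimal sketch, assuming the unpredicated path only; make_write is a free-function rewrite for illustration, not the method itself.

```python
def make_write(ctype, base_name, dest_reg_idx):
    wcond = ''                      # no write predicate in this example
    windex = '%d' % dest_reg_idx
    wb = '''
    %s
    {
        %s final_val = %s;
        xc->setIntRegOperand(this, %s, final_val);
        if (traceData) { traceData->setData(final_val); }
    }''' % (wcond, ctype, base_name, windex)
    return wb


# Values taken from the walk-through: uqw -> uint64_t, DestReg, index 0.
op_wb = make_write('uint64_t', 'DestReg', 0)
print(op_wb)
```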

Namespace declaration: namespace NAME;

decode section

specifying instruction formats

Syntax

inst : ID LPAREN arg_list RPAREN
  • ID: python string: inst name
    • arg_list:
inst : ID DBLCOLON ID LPAREN arg_list RPAREN

decode block defaults

preprocessor directive handling

TODO:

0x17: MOV(Bv,Iv); => ?

  • ./src/arch/isa_parser/isa_parser.py

    def p_inst_0(self, t):
      'inst : ID LPAREN arg_list RPAREN'
      ...
      codeObj = currentFormat.defineInst(self, t[1], t[3], t.lexer.lineno)
      ...
    

    or

    def p_inst_1(self, t):
      'inst : ID DBLCOLON ID LPAREN arg_list RPAREN'
      ...
      codeObj = format.defineInst(self, t[3], t[5], t.lexer.lineno)
      ...
    

    The previously defined format is used to define an inst. MOV uses the Inst format, which is defined in src/arch/x86/isa/formats/multi.isa.

    def format Inst(*opTypeSet) {{
        blocks = specializeInst(Name, list(opTypeSet), EmulEnv())
        (header_output, decoder_output,
         decode_block, exec_output) = blocks.makeList()
    }};
    
    • ./src/arch/isa_parser/isa_parser.py:

      class Format(object):
        ...
        def defineInst(self, parser, name, args, lineno):
          ...
          vars = self.func(self.user_code, context, *args[0], **args[1])
          ...
          return GenCode(parser, **vars)
      

      name is MOV, args are [Bv, Iv], and user_code is the code of the Inst format in multi.isa, see above.

      Executing user_code is a very long procedure, so I describe GenCode first.

      After the isa file is parsed, the GenCode object's emit() is called. For details, see the inst-related functions with the p_ prefix in isa_parser.py.

      • ./src/arch/isa_parser/isa_parser.py:

        class GenCode(object):
          ...
          def emit(self):
            if self.header_output:
              self.parser.get_file('header').write(self.header_output)
            if self.decoder_output:
              self.parser.get_file('decoder').write(self.decoder_output)
            if self.exec_output:
              self.parser.get_file('exec').write(self.exec_output)
            if self.decode_block:
              self.parser.get_file('decode_block').write(self.decode_block)
          ...
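The routing done by emit() can be imitated with in-memory files. This is a minimal sketch under assumptions: GenCodeSketch, the files dict (standing in for parser.get_file) and the two output strings are placeholders made up for illustration.

```python
import io


class GenCodeSketch:
    # section name -> attribute that holds the text for that section
    section_attrs = {
        'header': 'header_output',
        'decoder': 'decoder_output',
        'exec': 'exec_output',
        'decode_block': 'decode_block',
    }

    def __init__(self, files, **outputs):
        self.files = files
        for attr in self.section_attrs.values():
            setattr(self, attr, outputs.get(attr, ''))

    def emit(self):
        # Write every non-empty output into the file of its section.
        for section, attr in self.section_attrs.items():
            text = getattr(self, attr)
            if text:
                self.files[section].write(text)


files = {name: io.StringIO() for name in GenCodeSketch.section_attrs}
gen = GenCodeSketch(files,
                    header_output='/* declaration */\n',
                    decode_block='/* allocator */\n')
gen.emit()
print({name: f.getvalue() for name, f in files.items()})
```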
        

      Before GenCode, user_code is executed.

      • src/arch/x86/isa/specialize.isa:

        def specializeInst(Name, opTypes, env):
          ...
          return genMacroop(Name, env)
        

        specializeInst adds a suffix to Name and adds regs to env, both according to opTypes.

        E.g. Name="MOV", opTypes=[Bv, Iv]. Bv appends _R to Name and calls env.addReg(InstRegIndex); Iv appends _I to Name. After specialization, Name="MOV_R_I".
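The naming part of this example can be written down in a few lines. This is a heavily simplified sketch, not specializeInst itself: the table only covers the two operand types mentioned here, and SUFFIX_FOR_OPTYPE and specialize_name are hypothetical names.

```python
# Assumption: only the two cases from the example are modelled.
SUFFIX_FOR_OPTYPE = {'Bv': '_R', 'Iv': '_I'}


def specialize_name(name, op_types):
    """Append one suffix per operand type, in order."""
    for op_type in op_types:
        name += SUFFIX_FOR_OPTYPE[op_type]
    return name


specialized = specialize_name('MOV', ['Bv', 'Iv'])
print(specialized)
```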

        • src/arch/x86/isa/macroop.isa:

          def genMacroop(Name, env):
              ...
              macroop = macroopDict[Name]
              if not macroop.declared:
                  ...
                  blocks.header_output = macroop.getDeclaration()
                  blocks.decoder_output = macroop.getDefinition(env)
                  macroop.declared = True
              blocks.decode_block = "return %s;\n" % macroop.getAllocator(env)
              return blocks
          

          macroopDict is created in src/arch/x86/isa/microasm.isa. In short, each element of macroopDict is an X86Macroop, see src/arch/x86/isa/macroop.isa: class X86Macroop(Combinational_Macroop). The microops contained in a macroop are declared in src/arch/x86/isa/microops/*.isa. For details, see microop.md.

          Here we focus on macroop.getDeclaration(), macroop.getDefinition(env) and macroop.getAllocator(env).

          • src/arch/x86/isa/macroop.isa:

            def getDeclaration(self):
              ...
              iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
                {"code" : "",
                 "declareLabels" : declareLabels
                })
              return MacroDeclare.subst(iop);
            

            By using MacroDeclare template, macroop declaration is generated.

            • src/arch/x86/isa/macroop.isa:

              def template MacroDeclare {{
                ...
              }};
              

            The output is written to

            • build/X86/arch/x86/generated/decoder-ns.hh.inc:

              ...
                class MOV_R_I : public Macroop
                {...};
              ...
              
          • src/arch/x86/isa/macroop.isa:

            def getDefinition(self, env):
              ...
              for op in self.microops:
                ...
                allocMicroops += \
                  "microops[%d] = %s;\n" % \
                  (micropc, op.getAllocator(flags))
                ...
              ...
              iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
                {"code" : "", "num_microops" : numMicroops,
                 "alloc_microops" : allocMicroops,
                 "adjust_env" : self.adjust_env,
                 "adjust_imm" : self.adjust_imm,
                 "adjust_disp" : self.adjust_disp,
                 "disassembly" : env.disassembly,
                 "regSize" : regSize,
                 "init_env" : self.initEnv})
              return MacroConstructor.subst(iop) + MacroDisassembly.subst(iop);
            

            MacroConstructor generates the C++ class constructor, e.g. x86_macroop::MOV_R_I::MOV_R_I(...){...}.

            MacroDisassembly generates generateDisassembly, e.g. std::string x86_macroop::MOV_R_I::generateDisassembly(...){...}.

          • src/arch/x86/isa/macroop.isa:

            def getAllocator(self, env):
              return "new x86_macroop::%s(machInst, %s)" % \
                (self.name, env.getAllocator())
            

            macroop.getAllocator(env) expands to

            new x86_macroop::MOV_R_I(machInst, EmulEnv((OPCODE_OP_BOTTOM3 | (REX_B << 3)),
                                     0,
                                     1,
                                     ADDRSIZE,
                                     STACKSIZE))
            

            The expansion of env (env.getAllocator()) is as follows,

            • src/arch/x86/isa/macroop.isa:

              def getAllocator(self):
                ...
                return '''EmulEnv(%(reg)s,
                                  %(regm)s,
                                  %(dataSize)s,
                                  %(addressSize)s,
                                  %(stackSize)s)''' % \
                    self.__dict__
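Both allocators are plain string formatting, so the expansion shown earlier can be reproduced. A minimal sketch: EmulEnvSketch is a stand-in whose field values are copied from the expanded output above, the template is put on one line, and macroop_allocator is a free-function version of the macroop's getAllocator.

```python
class EmulEnvSketch:
    """Hypothetical stand-in for EmulEnv, with the fields already filled in."""
    def __init__(self):
        self.reg = "(OPCODE_OP_BOTTOM3 | (REX_B << 3))"
        self.regm = "0"
        self.dataSize = "1"
        self.addressSize = "ADDRSIZE"
        self.stackSize = "STACKSIZE"

    def getAllocator(self):
        # %(key)s placeholders are looked up in the instance's attribute dict.
        return ("EmulEnv(%(reg)s, %(regm)s, %(dataSize)s, "
                "%(addressSize)s, %(stackSize)s)") % self.__dict__


def macroop_allocator(name, env):
    return "new x86_macroop::%s(machInst, %s)" % (name, env.getAllocator())


allocator = macroop_allocator("MOV_R_I", EmulEnvSketch())
print(allocator)
```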
              

Appendix

TODO:

Instruction behaviors are described by

  • C++ code
  • bitfield operators
  • operand type qualifiers

generated files


TODO:

generic_cpu_exec.cc inst-constrs.cc

section       file
decode_block  decode-method.cc.inc
header        decoder.hh, decoder-g.hh.inc, decoder-ns.hh.inc
decoder       decoder.cc, decoder-g.cc.inc, decoder-ns.cc.inc
exec          exec-g.cc.inc, exec-ns.cc.inc

see src/arch/isa_parser/isa_parser.py:

# Get the file object for emitting code into the specified section
# (header, decoder, exec, decode_block).
def get_file(self, section):
  ...

# Change the file suffix of a base filename:
#   (e.g.) decoder.cc -> decoder-g.cc.inc for 'global' outputs
def suffixize(self, s, sec):
  ...

suffix

  • -ns: namespace
  • -g: global
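The renaming described in the suffixize comment can be sketched with a single regular expression. This is one way to do it, not necessarily the parser's own code; the in_namespace flag is an assumed way of choosing between the two suffixes.

```python
import re


def suffixize(filename, in_namespace):
    """Insert -ns or -g before the extension and append .inc."""
    extn = re.compile(r'(\.[^.]+)$')          # isolate the file extension
    tag = '-ns' if in_namespace else '-g'
    return extn.sub(tag + r'\1.inc', filename)


print(suffixize('decoder.cc', in_namespace=False))
print(suffixize('decoder.hh', in_namespace=True))
```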