xieby1's Manual for isa, a new DSL(Domain Specific Language)
TLDR: isa_parser.py
- defines a DSL(Domain Specific Language) by Lex&Yacc, called isa,
- generates decoder code (cpp) from isa language, called isa parser.
Resources:
- https://www.gem5.org/documentation/general_docs/architecture_support/isa_parser/
- gem5/src/arch/isa_parser/isa_parser.py
declaration section
Format definitions
Syntax
def_format : DEF FORMAT ID LPAREN param_list RPAREN CODELIT SEMI
- ID: python string
- param_list: python argument, therefore use asterisk(*) to represent variadic arguments
- CODELIT: python code
- Variables name and Name are available; they hold the inst name, see inst in the decode section.
Source
def p_def_format(self, t):
  (id, params, code) = (t[3], t[5], t[7])
  self.defFormat(id, params, code, t.lexer.lineno)
class Format(object):
  def __init__(self, id, params, code):
    ...
    self.user_code = compile(fixPythonIndentation(code), label, 'exec')
    ...
    f = '''def defInst(_code, _context, %s):
            my_locals = vars().copy()
            exec(_code, _context, my_locals)
            return my_locals\n''' % param_list
    c = compile(f, label + ' wrapper', 'exec')
    exec(c, globals())
    self.func = defInst
  ...
Format.__init__ defines a function defInst on the fly.
defInst is called by defineInst in the decode section.
Therefore CODELIT is executed by defineInst.
defineInst adds the name and Name of the defined inst to the exec context.
class Format(object):
  ...
  def defineInst(self, parser, name, args, lineno):
    ...
    context.update({ 'name' : name, 'Name' : Name })
    ...
    vars = self.func(self.user_code, context, *args[0], **args[1])
    ...
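The on-the-fly wrapper can be tried outside gem5. The following is a minimal sketch, not the parser's actual code: make_def_inst, the format body string and the "mov"/"Mov" names are hypothetical, chosen only to show how the parameter list, the exec context and the returned locals interact.

```python
def make_def_inst(param_list):
    # Build the wrapper source the same way Format.__init__ does,
    # splicing the format's parameter list into the signature.
    src = (
        "def defInst(_code, _context, %s):\n"
        "    my_locals = vars().copy()\n"
        "    exec(_code, _context, my_locals)\n"
        "    return my_locals\n"
    ) % param_list
    namespace = {}
    exec(compile(src, "<format wrapper>", "exec"), namespace)
    return namespace["defInst"]

# Hypothetical format body: it reads Name (from the context) and
# opTypeSet (a format parameter), and sets decode_block.
user_code = compile(
    "decode_block = 'return new %s(%s);' % (Name, ', '.join(opTypeSet))",
    "<format body>", "exec")

def_inst = make_def_inst("*opTypeSet")
context = {"name": "mov", "Name": "Mov"}   # what defineInst adds
result = def_inst(user_code, context, "Bv", "Iv")
print(result["decode_block"])
```

Every variable the format body assigns ends up in the returned dictionary, which is how output variables such as decode_block travel back to the caller.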
Template
definition
The suffix does not influence template behavior; it is just a naming convention. gem5 uses the following 4 suffixes:
- Declare: declaration (header output) templates
- Decode: decode-block templates
- Constructor: decoder output templates
- Execute: exec output templates
- src/arch/isa_parser/isa_parser.py:
  def p_def_template(self, t):
    'def_template : DEF TEMPLATE ID CODELIT SEMI'
    if t[3] in self.templateMap:
      print("warning: template %s already defined" % t[3])
    self.templateMap[t[3]] = Template(self, t[4])
  Every template definition is added to templateMap.
Substitution
- src/arch/isa_parser/isa_parser.py:
  class Template(object):
    ...
    def subst(self, d):
      ...
      myDict = self.parser.templateMap.copy()
      ...
      if isinstance(d, InstObjParams):
        ...
        myDict.update(d.__dict__)
        ...
        myDict['op_decl'] = operands.concatAttrStrings('op_decl')
        ...
        myDict['op_rd'] = operands.concatAttrStrings('op_rd')
        ...
        for op_desc in reordered:
          ...
          op_wb_str = op_desc.op_wb + op_wb_str
        myDict['op_wb'] = op_wb_str
      ...
      return template % myDict
  code is searched for operands; for more details see Operand definitions. op_decl, op_rd and op_wb are generated by operand analysis. User-defined replacements are also allowed.
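The core of subst is Python's %-formatting against a dictionary. Below is a minimal sketch under simplifying assumptions: TemplateSketch and the "Header" entry are hypothetical, templateMap holds plain strings here, and the operand analysis is left out.

```python
class TemplateSketch:
    def __init__(self, template_map, text):
        self.template_map = template_map   # stands in for parser.templateMap
        self.text = text

    def subst(self, params):
        # Start from the other templates, then let the caller's
        # entries (class_name, code, ...) override or extend them.
        my_dict = dict(self.template_map)
        my_dict.update(params)
        return self.text % my_dict

template_map = {"Header": "// generated"}
tmpl = TemplateSketch(
    template_map,
    "%(Header)s\n%(class_name)s::execute() { %(code)s }")
out = tmpl.subst({"class_name": "Limm", "code": "DestReg = imm64;"})
print(out)
```

Because the dictionary starts as a copy of the template map, one template can reference another by name with the same %(...)s syntax.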
Output blocks
Let blocks
Bitfield definitions: def [signed] bitfield NAME ...
Operand definitions
Syntax
def operands {{
  'op_name': tuple
}};
- tuple: (base_cls_name, dflt_ext, reg_spec, flags, sort_pri[, read_code[, write_code[, read_predicate[, write_predicate]]]])
Source
src/arch/x86/isa/operands.isa:
def operands {{
  ...
  'DestReg': intReg('dest', 5),
  ...
}};
intReg is defined in the same file.
def intReg(idx, id):
  return ('IntReg', 'uqw', idx, 'IsInteger', id)
This isa file is parsed by src/arch/isa_parser/isa_parser.py.
- src/arch/isa_parser/isa_parser.py:
  class ISAParser(Grammar):
    ...
    def p_def_operands(self, t):
      'def_operands : DEF OPERANDS CODELIT SEMI'
      ...
      user_dict = eval('{' + t[3] + '}', self.exportContext)
      ...
      self.buildOperandNameMap(user_dict, t.lexer.lineno)
      ...
  After parsing, user_dict contains an entry user_dict['DestReg'] = ('IntReg', 'uqw', 'dest', 'IsInteger', 5).
- src/arch/isa_parser/isa_parser.py:
  def buildOperandNameMap(self, user_dict, lineno):
    ...
    for op_name, val in user_dict.items():
      val += (None, None, None, None)
      base_cls_name, dflt_ext, reg_spec, flags, sort_pri, \
      read_code, write_code, read_predicate, write_predicate = val[:9]
      ...
      cls_name = base_cls_name + '_' + op_name
      ...
      base_cls = eval(base_cls_name + 'Operand')
      ...
      operand_name[op_name] = type(cls_name, (base_cls,), tmp_dict)
    self.operandNameMap.update(operand_name)
  buildOperandNameMap shows the available args for an operand definition. It derives a class named cls_name from base_cls, where cls_name = IntReg_DestReg and base_cls = IntRegOperand. Everything in the operand definition is added to the derived class IntReg_DestReg. Finally, an entry {"DestReg": class IntReg_DestReg} is added to operandNameMap.
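The class derivation relies on the three-argument form of the built-in type(). A minimal sketch, assuming a stub IntRegOperand base class and a lookup table in place of the eval call; build_operand_name_map_sketch is a hypothetical name, not a gem5 function:

```python
class IntRegOperand:
    """Stub base class; the real one lives in operand_types.py."""

def build_operand_name_map_sketch(user_dict):
    base_classes = {"IntRegOperand": IntRegOperand}  # replaces eval()
    name_map = {}
    for op_name, val in user_dict.items():
        # Pad the tuple so the optional fields default to None.
        val += (None, None, None, None)
        (base_cls_name, dflt_ext, reg_spec, flags, sort_pri,
         read_code, write_code, read_predicate, write_predicate) = val[:9]
        cls_name = base_cls_name + "_" + op_name
        base_cls = base_classes[base_cls_name + "Operand"]
        attrs = {
            "base_name": op_name, "dflt_ext": dflt_ext,
            "reg_spec": reg_spec, "flags": flags, "sort_pri": sort_pri,
            "read_code": read_code, "write_code": write_code,
        }
        # type(name, bases, dict) creates a new class at run time.
        name_map[op_name] = type(cls_name, (base_cls,), attrs)
    return name_map

name_map = build_operand_name_map_sketch(
    {"DestReg": ("IntReg", "uqw", "dest", "IsInteger", 5)})
dest_cls = name_map["DestReg"]
print(dest_cls.__name__, dest_cls.sort_pri)
```

Storing the definition as class attributes means each operand instance created later shares the same defaults without copying them.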
Operand type definitions
Syntax
def operand_types {{
  'typename' : 'ctype'
}};
Source
src/arch/x86/isa/microops/limmop.isa:
def template MicroLimmOpExecute {{
    Fault
    %(class_name)s::execute(ExecContext *xc,
            Trace::InstRecord *traceData) const
    {
        %(op_decl)s;
        %(op_rd)s;
        %(code)s;
        %(op_wb)s;
        return NoFault;
    }
}};
iop = InstObjParams("limm", "Limm", base,
      {"code" : "DestReg = merge(DestReg, dest, imm64, dataSize);"})
exec_output += MicroLimmOpExecute.subst(iop)
code will be replaced by the code defined in InstObjParams above.
op_decl, op_rd and op_wb are generated by parsing which operands the code needs, i.e. which operands are used in code.
- src/arch/isa_parser/isa_parser.py:
  class InstObjParams(object):
    def __init__(self, parser, mnem, class_name, base_class = '', snippets = {}, opt_args = []):
      ...
      self.operands = OperandList(parser, compositeCode)
      ...
  compositeCode is the code passed as a parameter to InstObjParams, i.e. "DestReg = merge(DestReg, dest, imm64, dataSize);".
- ./src/arch/isa_parser/operand_list.py:
  class OperandList(object):
    def __init__(self, parser, code):
      ...
      for match in parser.operandsRE().finditer(code):
        ...
  A regular expression (RE) is applied to code. The definition of the RE is listed below.
- src/arch/isa_parser/isa_parser.py:
  def operandsRE(self):
    if not self._operandsRE:
      self.buildOperandREs()
    return self._operandsRE
  def buildOperandREs(self):
    operands = list(self.operandNameMap.keys())
    ...
    operandsREString = r'''
    (?<!\w)      # neg. lookbehind assertion: prevent partial matches
    ((%s)(?:_(%s))?)   # match: operand with optional '_' then suffix
    (?!\w)      # neg. lookahead assertion: prevent partial matches
    ''' % ('|'.join(operands), '|'.join(extensions))
  The RE is (?<!\w)((%s)(?:_(%s))?)(?!\w). It contains 3 capture groups:
  - group1: ((%s)(?:_(%s))?)
  - group2: the first (%s)
  - group3: the second (%s)
  Note: https://regexr.com/ is a good tool for getting familiar with REs quickly.
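The three groups are easiest to see by running the RE on the limm code string. This is a minimal sketch: the operand list ["DestReg", "SrcReg1"] and the extension list ["uqw", "sqw"] are assumptions for illustration, and build_operands_re is a hypothetical helper.

```python
import re

def build_operands_re(operands, extensions):
    pattern = r'''
    (?<!\w)             # neg. lookbehind: prevent partial matches
    ((%s)(?:_(%s))?)    # operand with optional '_' then suffix
    (?!\w)              # neg. lookahead: prevent partial matches
    ''' % ('|'.join(operands), '|'.join(extensions))
    # VERBOSE lets the pattern carry whitespace and comments.
    return re.compile(pattern, re.VERBOSE)

operands_re = build_operands_re(["DestReg", "SrcReg1"], ["uqw", "sqw"])

code = "DestReg = merge(DestReg, dest, imm64, dataSize);"
# Each tuple is (op_full, op_base, op_ext).
matches = [m.groups() for m in operands_re.finditer(code)]
typed = operands_re.search("DestReg_uqw = 0;").groups()
partial = operands_re.search("MyDestReg = 0;")   # blocked by lookbehind
print(matches, typed, partial)
```

Note that dest, imm64 and dataSize are not matched, since only names present in the operand map take part in the alternation.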
  The RE captures all operands in code. The captured operands are processed as below.
  class OperandList(object):
    def __init__(self, parser, code):
      ...
      for match in parser.operandsRE().finditer(code):
        op = match.groups()
        (op_full, op_base, op_ext) = op
        ...
        # see if we've already seen this one
        op_desc = self.find_base(op_base)
        if op_desc:
          ...
        else:
          # new operand: create new descriptor
          op_desc = parser.operandNameMap[op_base](parser, op_full, op_ext, is_src, is_dest)
          ...
  Each matched operand is checked against the operand list. I focus on the case where the operand has not been added yet, in which a new operand is created. Assume the to-be-created operand is DestReg, so op_base is DestReg, and operandNameMap[op_base] is class IntReg_DestReg, which was created in the operand definitions section. class IntReg_DestReg is a subclass of IntRegOperand. As a result, the constructor of class IntReg_DestReg is called.
  Finally,
  class OperandList(object):
    def __init__(self, parser, code):
      ...
      for op_desc in self.items:
        op_desc.finalize(self.predRead, self.predWrite)
  self.predRead and self.predWrite are bool variables that indicate whether a conditional read/write exists in the current operand list.
- ./arch/isa_parser/operand_types.py:
  I focus on finalizing a dest reg.
  def finalize(self, predRead, predWrite):
    ...
    if self.is_dest:
      self.op_wb = self.makeWrite(predWrite)
      self.op_dest_decl = self.makeDecl()
    else:
      self.op_wb = ''
      self.op_dest_decl = ''
    ...
- ./arch/isa_parser/operand_types.py:
  def makeWrite(self, predWrite):
    ...
    if predWrite:
      ...
    else:
      wcond = ''
      windex = '%d' % self.dest_reg_idx
    wb = '''
    %s
    {
        %s final_val = %s;
        xc->setIntRegOperand(this, %s, final_val);\n
        if (traceData) { traceData->setData(final_val); }
    }''' % (wcond, self.ctype, self.base_name, windex)
    return wb
  - If we focus on no conditional code, then wcond = ''.
  - self.ctype is defined in src/arch/isa_parser/operand_types.py: Operand: __init__. It relies on the parser's def_operand_types handling, which is simple. self.ctype is uint64_t, according to dflt_ext = uqw.
  - self.base_name is defined in src/arch/isa_parser/isa_parser.py, and is equal to op_name, aka DestReg.
  - windex = self.dest_reg_idx is defined in ./src/arch/isa_parser/operand_list.py, which is sorted by sort_pri. There is only one dest, so windex here is 0.
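Plugging the values from the bullets above into the write-back template gives the concrete op_wb text for DestReg. A minimal sketch, assuming the unconditional case; make_write_sketch is a hypothetical stand-alone function, not the method in operand_types.py:

```python
def make_write_sketch(ctype, base_name, dest_reg_idx, wcond=""):
    # Unconditional write: wcond stays empty and the index is the
    # position of this operand among the destination registers.
    windex = "%d" % dest_reg_idx
    wb = '''
    %s
    {
        %s final_val = %s;
        xc->setIntRegOperand(this, %s, final_val);
        if (traceData) { traceData->setData(final_val); }
    }''' % (wcond, ctype, base_name, windex)
    return wb

# ctype from dflt_ext = uqw, base_name = op_name, only one dest reg.
op_wb = make_write_sketch("uint64_t", "DestReg", 0)
print(op_wb)
```

This string is what replaces %(op_wb)s in a template such as MicroLimmOpExecute.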
 
Namespace declaration: namespace NAME;
decode section
specifying instruction formats
Syntax
inst : ID LPAREN arg_list RPAREN
- ID: python string: inst name
- arg_list:
 
inst : ID DBLCOLON ID LPAREN arg_list RPAREN
decode block defaults
preprocessor directive handling
TODO:
0x17: MOV(Bv,Iv); => ?
- ./src/arch/isa_parser/isa_parser.py
  def p_inst_0(self, t):
    'inst : ID LPAREN arg_list RPAREN'
    ...
    codeObj = currentFormat.defineInst(self, t[1], t[3], t.lexer.lineno)
    ...
  or
  def p_inst_1(self, t):
    'inst : ID DBLCOLON ID LPAREN arg_list RPAREN'
    ...
    codeObj = format.defineInst(self, t[3], t[5], t.lexer.lineno)
    ...
  The defined format is used to define an inst. MOV uses the Inst format, which is defined in src/arch/x86/isa/formats/multi.isa.
  def format Inst(*opTypeSet) {{
    blocks = specializeInst(Name, list(opTypeSet), EmulEnv())
    (header_output, decoder_output, decode_block, exec_output) = blocks.makeList()
  }};
- ./src/arch/isa_parser/isa_parser.py:
  class Format(object):
    ...
    def defineInst(self, parser, name, args, lineno):
      ...
      vars = self.func(self.user_code, context, *args[0], **args[1])
      ...
      return GenCode(parser, **vars)
  name is MOV. args are [Bv, Iv]. user_code is the Inst format code in multi.isa, see above.
  Executing user_code is a very long procedure, so I would like to depict GenCode first. After parsing the isa file, the GenCode object's emit() is called. For details, see the p_prefixed and inst related functions in isa_parser.py.
- ./src/arch/isa_parser/isa_parser.py:
  class GenCode(object):
    ...
    def emit(self):
      if self.header_output:
        self.parser.get_file('header').write(self.header_output)
      if self.decoder_output:
        self.parser.get_file('decoder').write(self.decoder_output)
      if self.exec_output:
        self.parser.get_file('exec').write(self.exec_output)
      if self.decode_block:
        self.parser.get_file('decode_block').write(self.decode_block)
      ...
  Before GenCode, user_code is executed.
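The emit step amounts to routing each non-empty output string to the file of its section. A minimal sketch, assuming in-memory buffers in place of real files; ParserSketch and GenCodeSketch are hypothetical stand-ins for the parser and GenCode:

```python
import io

class ParserSketch:
    def __init__(self):
        self.files = {}

    def get_file(self, section):
        # One buffer per section, created on first use.
        return self.files.setdefault(section, io.StringIO())

class GenCodeSketch:
    def __init__(self, parser, header_output="", decoder_output="",
                 exec_output="", decode_block=""):
        self.parser = parser
        self.outputs = {
            "header": header_output, "decoder": decoder_output,
            "exec": exec_output, "decode_block": decode_block,
        }

    def emit(self):
        # Empty sections are skipped, as in the if-chain above.
        for section, text in self.outputs.items():
            if text:
                self.parser.get_file(section).write(text)

parser = ParserSketch()
GenCodeSketch(parser, header_output="class MOV_R_I;\n",
              decode_block="return new MOV_R_I;\n").emit()
written = {sec: f.getvalue() for sec, f in parser.files.items()}
print(written)
```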
- src/arch/x86/isa/specialize.isa:
  def specializeInst(Name, opTypes, env):
    ...
    return genMacroop(Name, env)
  specializeInst adds suffixes to Name and adds regs to env, both according to opTypes. E.g. Name="MOV", opTypes=[Bv, Iv]. Bv appends _R to Name and calls env.addReg(InstRegIndex); Iv appends _I to Name. After specialization, Name="MOV_R_I".
- src/arch/x86/isa/macroop.isa:
  def genMacroop(Name, env):
    ...
    macroop = macroopDict[Name]
    if not macroop.declared:
      ...
      blocks.header_output = macroop.getDeclaration()
      blocks.decoder_output = macroop.getDefinition(env)
      macroop.declared = True
    blocks.decode_block = "return %s;\n" % macroop.getAllocator(env)
    return blocks
  macroopDict is created in src/arch/x86/isa/microasm.isa. In short, each element of macroopDict is a class X86Macroop(Combinational_Macroop), defined in src/arch/x86/isa/macroop.isa. The microops contained in a macroop are declared in src/arch/x86/isa/microops/*.isa. For details, see microop.md.
  Here we focus on macroop.getDeclaration(), macroop.getDefinition(env) and macroop.getAllocator(env).
- src/arch/x86/isa/macroop.isa:
  def getDeclaration(self):
    ...
    iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
        {"code" : "", "declareLabels" : declareLabels })
    return MacroDeclare.subst(iop);
  The macroop declaration is generated using the MacroDeclare template.
  - src/arch/x86/isa/macroop.isa:
    def template MacroDeclare {{
      ...
    }};
  The output is written to
  - build/X86/arch/x86/generated/decoder-ns.hh.inc:
    ...
    class MOV_R_I : public Macroop {...};
    ...
- src/arch/x86/isa/macroop.isa:
  def getDefinition(self, env):
    ...
    for op in self.microops:
      ...
      allocMicroops += \
        "microops[%d] = %s;\n" % \
        (micropc, op.getAllocator(flags))
      ...
    ...
    iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
        {"code" : "", "num_microops" : numMicroops,
         "alloc_microops" : allocMicroops,
         "adjust_env" : self.adjust_env,
         "adjust_imm" : self.adjust_imm,
         "adjust_disp" : self.adjust_disp,
         "disassembly" : env.disassembly,
         "regSize" : regSize,
         "init_env" : self.initEnv})
    return MacroConstructor.subst(iop) + MacroDisassembly.subst(iop);
  MacroConstructor generates the cpp class constructor, e.g. x86_macroop::MOV_R_I::MOV_R_I(...){...}. MacroDisassembly generates generateDisassembly, e.g. std::string x86_macroop::MOV_R_I::generateDisassembly(...){...}.
- src/arch/x86/isa/macroop.isa:
  def getAllocator(self, env):
    return "new x86_macroop::%s(machInst, %s)" % \
      (self.name, env.getAllocator())
  macroop.getAllocator(env) expands to
  new x86_macroop::MOV_R_I(machInst, EmulEnv((OPCODE_OP_BOTTOM3 | (REX_B << 3)), 0, 1, ADDRSIZE, STACKSIZE))
  The expansion of env (env.getAllocator()) is as follows,
  - src/arch/x86/isa/macroop.isa:
    def getAllocator(self):
      ...
      return '''EmulEnv(%(reg)s, %(regm)s, %(dataSize)s, %(addressSize)s, %(stackSize)s)''' % \
        self.__dict__
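The two getAllocator calls compose by plain string formatting. A minimal sketch that reproduces the MOV_R_I expansion: EmulEnvSketch and macroop_allocator are hypothetical names, and the field values are simply read off the expanded string above rather than computed as gem5 does.

```python
class EmulEnvSketch:
    def __init__(self, reg, regm, dataSize, addressSize, stackSize):
        self.reg = reg
        self.regm = regm
        self.dataSize = dataSize
        self.addressSize = addressSize
        self.stackSize = stackSize

    def getAllocator(self):
        # %(key)s looks each field up in the instance dictionary.
        return ("EmulEnv(%(reg)s, %(regm)s, %(dataSize)s, "
                "%(addressSize)s, %(stackSize)s)") % self.__dict__

def macroop_allocator(name, env):
    return "new x86_macroop::%s(machInst, %s)" % (name, env.getAllocator())

env = EmulEnvSketch("(OPCODE_OP_BOTTOM3 | (REX_B << 3))", "0", "1",
                    "ADDRSIZE", "STACKSIZE")
allocator = macroop_allocator("MOV_R_I", env)
decode_block = "return %s;\n" % allocator   # as in genMacroop
print(decode_block)
```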
 
Appendix
TODO:
Instruction behaviors are described by
- C++ code
- bitfield operators
- operand type qualifiers
generated files
TODO:
generic_cpu_exec.cc inst-constrs.cc
| section | file | 
|---|---|
| decode_block | decode-method.cc.inc | 
| header | decoder.hh, decoder-g.hh.inc, decoder-ns.hh.inc | 
| decoder | decoder.cc, decoder-g.cc.inc, decoder-ns.cc.inc | 
| exec | exec-g.cc.inc, exec-ns.cc.inc | 
see src/arch/isa_parser/isa_parser.py:
# Get the file object for emitting code into the specified section
# (header, decoder, exec, decode_block).
def get_file(self, section):
  ...
# Change the file suffix of a base filename:
#   (e.g.) decoder.cc -> decoder-g.cc.inc for 'global' outputs
def suffixize(self, s, sec):
  ...
suffix
- -ns: namespace
- -g: global
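The renaming rule in the suffixize comment can be sketched as follows. This is an illustration under assumptions, not the body of the real method: suffixize_sketch is a hypothetical name, and the in_namespace flag is an assumed stand-in for however the parser decides between the -ns and -g variants.

```python
import re

def suffixize_sketch(filename, in_namespace):
    # Insert the tag before the extension and append '.inc',
    # e.g. decoder.cc -> decoder-g.cc.inc for 'global' outputs.
    tag = "-ns" if in_namespace else "-g"
    return re.sub(r"(\.[^.]+)$",
                  lambda m: tag + m.group(1) + ".inc",
                  filename)

global_name = suffixize_sketch("decoder.cc", in_namespace=False)
namespace_name = suffixize_sketch("decoder.hh", in_namespace=True)
print(global_name, namespace_name)
```

Both results appear in the generated files table above.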