xieby1's Manual for isa, a new DSL(Domain Specific Language)
TLDR: isa_parser.py
- defines a DSL (Domain Specific Language) called isa, using Lex & Yacc,
- generates decoder code (C++) from isa sources; this generator is called the isa parser.
Resources:
- https://www.gem5.org/documentation/general_docs/architecture_support/isa_parser/
gem5/src/arch/isa_parser/isa_parser.py
declaration section
Format definitions
Syntax
def_format : DEF FORMAT ID LPAREN param_list RPAREN CODELIT SEMI
- ID: python string
- param_list: Python parameter list, so use an asterisk (*) for variadic arguments
- CODELIT: Python code
  - The variables name and Name are available inside CODELIT; they hold the inst name, see inst.
Source
    def p_def_format(self, t):
        (id, params, code) = (t[3], t[5], t[7])
        self.defFormat(id, params, code, t.lexer.lineno)
    class Format(object):
        def __init__(self, id, params, code):
            ...
            self.user_code = compile(fixPythonIndentation(code), label, 'exec')
            ...
            f = '''def defInst(_code, _context, %s):
                    my_locals = vars().copy()
                    exec(_code, _context, my_locals)
                    return my_locals\n''' % param_list
            c = compile(f, label + ' wrapper', 'exec')
            exec(c, globals())
            self.func = defInst
            ...
Format.__init__ defines a function defInst on the fly. defInst is called by defineInst in the decode section, so CODELIT is executed by defineInst. defineInst adds the name and Name of the inst being defined to the exec context.
    class Format(object):
        ...
        def defineInst(self, parser, name, args, lineno):
            ...
            context.update({ 'name' : name, 'Name' : Name })
            ...
            vars = self.func(self.user_code, context, *args[0], **args[1])
            ...
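The wrapper mechanism above can be tried outside gem5. The following is a minimal sketch, not the parser's actual code: make_def_inst, the sample format body and the context values are made up for illustration; only the defInst wrapper text follows the source quoted above.

```python
def make_def_inst(param_list):
    """Build a defInst function on the fly, like Format.__init__ does.

    make_def_inst is a hypothetical helper; the wrapper text mirrors the
    one quoted from isa_parser.py.
    """
    wrapper_src = (
        "def defInst(_code, _context, %s):\n"
        "    my_locals = vars().copy()\n"
        "    exec(_code, _context, my_locals)\n"
        "    return my_locals\n"
    ) % param_list
    namespace = {}
    exec(compile(wrapper_src, "<format wrapper>", "exec"), namespace)
    return namespace["defInst"]


# A made-up format body (the CODELIT part). It can see the format
# parameters (opTypeSet) as locals and name/Name through the context.
user_code = compile(
    "decode_block = 'return new %s(%s);' % (Name, ', '.join(opTypeSet))",
    "<format body>",
    "exec",
)

def_inst = make_def_inst("*opTypeSet")
context = {"name": "mov", "Name": "Mov"}  # what defineInst adds (values assumed)
result = def_inst(user_code, context, "Bv", "Iv")

print(result["decode_block"])
```

Variables assigned by the format body (here decode_block) end up in the returned dict, which is how the format code hands its output blocks back to the caller.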
Template
definition
The suffix does not influence template behavior; it is just a naming convention. gem5 uses the following 4 suffixes:
- Declare: declaration (header output) templates
- Decode: decode-block templates
- Constructor: decoder output templates
- Execute: exec output templates
- src/arch/isa_parser/isa_parser.py:
    def p_def_template(self, t):
        'def_template : DEF TEMPLATE ID CODELIT SEMI'
        if t[3] in self.templateMap:
            print("warning: template %s already defined" % t[3])
        self.templateMap[t[3]] = Template(self, t[4])
Every template definition is added to templateMap.
Substitution
- src/arch/isa_parser/isa_parser.py:
    class Template(object):
        ...
        def subst(self, d):
            ...
            myDict = self.parser.templateMap.copy()
            ...
            if isinstance(d, InstObjParams):
                ...
                myDict.update(d.__dict__)
                ...
                myDict['op_decl'] = operands.concatAttrStrings('op_decl')
                ...
                myDict['op_rd'] = operands.concatAttrStrings('op_rd')
                ...
                for op_desc in reordered:
                    ...
                    op_wb_str = op_desc.op_wb + op_wb_str
                myDict['op_wb'] = op_wb_str
            ...
            return template % myDict
code is searched for operands; for more details see Operand definitions. op_decl, op_rd and op_wb are generated by this operand analysis. User-defined replacements are also allowed.
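Since subst ends with template % myDict, the substitution itself is ordinary Python %-formatting with a dict. A minimal sketch, assuming a simplified subst that skips the operand analysis (the function signature, the template text and the parameter values here are illustrative, not gem5's):

```python
def subst(template, template_map, params):
    """Simplified stand-in for Template.subst: merge dicts, then %-format."""
    my_dict = dict(template_map)  # other templates are visible as keys
    my_dict.update(params)        # per-instruction values override them
    return template % my_dict


exec_template = (
    "Fault %(class_name)s::execute() const\n"
    "{\n"
    "    %(op_decl)s;\n"
    "    %(code)s;\n"
    "    return NoFault;\n"
    "}\n"
)

# Made-up values; in gem5, op_decl comes from the operand analysis.
params = {
    "class_name": "Limm",
    "op_decl": "uint64_t DestReg = 0",
    "code": "DestReg = imm64",
}

output = subst(exec_template, {}, params)
print(output)
```

Extra keys in the dict are simply ignored by %-formatting, which is why the whole templateMap can be copied into myDict without harm.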
Output blocks
Let blocks
Bitfield definitions: def [signed] bitfield NAME ...
Operand definitions
Syntax
    def operands {{
        'op_name': tuple
    }};
- tuple: (base_cls_name, dflt_ext, reg_spec, flags, sort_pri[, read_code[, write_code[, read_predicate[, write_predicate]]]])
Source
src/arch/x86/isa/operands.isa:
    def operands {{
        ...
        'DestReg': intReg('dest', 5),
        ...
    }};
intReg is defined in the same file.
    def intReg(idx, id):
        return ('IntReg', 'uqw', idx, 'IsInteger', id)
This isa file is parsed by src/arch/isa_parser/isa_parser.py.
- src/arch/isa_parser/isa_parser.py:
    class ISAParser(Grammar):
        ...
        def p_def_operands(self, t):
            'def_operands : DEF OPERANDS CODELIT SEMI'
            ...
            user_dict = eval('{' + t[3] + '}', self.exportContext)
            ...
            self.buildOperandNameMap(user_dict, t.lexer.lineno)
            ...
After parsing, user_dict contains an entry user_dict['DestReg'] = ('IntReg', 'uqw', 'dest', 'IsInteger', 5).
- src/arch/isa_parser/isa_parser.py:
    def buildOperandNameMap(self, user_dict, lineno):
        ...
        for op_name, val in user_dict.items():
            val += (None, None, None, None)
            base_cls_name, dflt_ext, reg_spec, flags, sort_pri, \
            read_code, write_code, read_predicate, write_predicate = val[:9]
            ...
            cls_name = base_cls_name + '_' + op_name
            ...
            base_cls = eval(base_cls_name + 'Operand')
            ...
            operand_name[op_name] = type(cls_name, (base_cls,), tmp_dict)
        self.operandNameMap.update(operand_name)
buildOperandNameMap shows the available arguments for an operand definition. buildOperandNameMap derives a class named cls_name from base_cls, where cls_name = IntReg_DestReg and base_cls = IntRegOperand. Everything in the operand definition is added to the derived class IntReg_DestReg. Finally, an entry {"DestReg": class IntReg_DestReg} is added to operandNameMap.
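The two steps above (eval of the operands block, then deriving one class per operand with type()) can be sketched in isolation. This is a simplified illustration under assumptions: IntRegOperand is an empty stand-in class, build_operand_name_map and its base_classes argument are hypothetical, and only a few tuple fields are stored on the derived class.

```python
class IntRegOperand:
    """Empty stand-in for the parser's integer register operand base class."""


def intReg(idx, id):
    # Same helper as in operands.isa, quoted above.
    return ('IntReg', 'uqw', idx, 'IsInteger', id)


def build_operand_name_map(user_dict, base_classes):
    """Derive one class per operand, loosely following buildOperandNameMap."""
    operand_name_map = {}
    for op_name, val in user_dict.items():
        val += (None, None, None, None)  # pad the optional trailing fields
        (base_cls_name, dflt_ext, reg_spec, flags, sort_pri,
         read_code, write_code, read_predicate, write_predicate) = val[:9]
        cls_name = base_cls_name + '_' + op_name
        # The real parser uses eval() here; a dict lookup keeps the sketch simple.
        base_cls = base_classes[base_cls_name + 'Operand']
        attrs = {'dflt_ext': dflt_ext, 'reg_spec': reg_spec,
                 'flags': flags, 'sort_pri': sort_pri}
        operand_name_map[op_name] = type(cls_name, (base_cls,), attrs)
    return operand_name_map


# The body of "def operands {{ ... }};" is evaluated as a dict literal.
export_context = {'intReg': intReg}
body = "'DestReg': intReg('dest', 5),"
user_dict = eval('{' + body + '}', export_context)

name_map = build_operand_name_map(user_dict, {'IntRegOperand': IntRegOperand})
print(name_map['DestReg'].__name__)
```

Using type() with three arguments is what lets the parser create a distinct class per operand at parse time without any class statement in the source.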
Operand type definitions
Syntax
    def operand_types {{
        'typename' : 'ctype'
    }};
Source
src/arch/x86/isa/microops/limmop.isa:
    def template MicroLimmOpExecute {{
        Fault
        %(class_name)s::execute(ExecContext *xc,
                Trace::InstRecord *traceData) const
        {
            %(op_decl)s;
            %(op_rd)s;
            %(code)s;
            %(op_wb)s;
            return NoFault;
        }
    }};

    iop = InstObjParams("limm", "Limm", base,
            {"code" : "DestReg = merge(DestReg, dest, imm64, dataSize);"})
    exec_output += MicroLimmOpExecute.subst(iop)
%(code)s is replaced by the code defined in InstObjParams above. %(op_decl)s, %(op_rd)s and %(op_wb)s are generated by parsing what code needs, that is, which operands it uses.
- src/arch/isa_parser/isa_parser.py:
    class InstObjParams(object):
        def __init__(self, parser, mnem, class_name, base_class = '',
                     snippets = {}, opt_args = []):
            ...
            self.operands = OperandList(parser, compositeCode)
            ...
compositeCode is the code passed as a parameter to InstObjParams, i.e. "DestReg = merge(DestReg, dest, imm64, dataSize);".
- ./src/arch/isa_parser/operand_list.py:
    class OperandList(object):
        def __init__(self, parser, code):
            ...
            for match in parser.operandsRE().finditer(code):
                ...
A RE is applied to code. The definition of the RE is listed below.
- src/arch/isa_parser/isa_parser.py:
    def operandsRE(self):
        if not self._operandsRE:
            self.buildOperandREs()
        return self._operandsRE

    def buildOperandREs(self):
        operands = list(self.operandNameMap.keys())
        ...
        operandsREString = r'''
        (?<!\w)      # neg. lookbehind assertion: prevent partial matches
        ((%s)(?:_(%s))?)   # match: operand with optional '_' then suffix
        (?!\w)       # neg. lookahead assertion: prevent partial matches
        ''' % ('|'.join(operands), '|'.join(extensions))
The RE is (?<!\w)((%s)(?:_(%s))?)(?!\w). It contains 3 capture groups:
- group1: ((%s)(?:_(%s))?)
- group2: the first (%s)
- group3: the second (%s)
Note: https://regexr.com/ is a good way to get familiar with a RE quickly.
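To see what the three groups capture, the RE can be built and run by hand. This is a small sketch: the operand list and extension list below are illustrative (SrcReg1 and sb are assumed names, not taken from operands.isa), and the flags are chosen so the commented, multi-line pattern compiles.

```python
import re

# Illustrative operand names and type extensions (partly assumed).
operands = ['DestReg', 'SrcReg1']
extensions = ['uqw', 'sb']

operands_re = re.compile(r'''
    (?<!\w)             # neg. lookbehind assertion: prevent partial matches
    ((%s)(?:_(%s))?)    # match: operand with optional '_' then suffix
    (?!\w)              # neg. lookahead assertion: prevent partial matches
    ''' % ('|'.join(operands), '|'.join(extensions)),
    re.MULTILINE | re.VERBOSE)

code = "DestReg = merge(DestReg, dest, imm64, dataSize);"
matches = [m.groups() for m in operands_re.finditer(code)]
print(matches)

# With a type extension, group3 holds the suffix.
with_ext = [m.groups() for m in operands_re.finditer("SrcReg1_sb + 1")]
print(with_ext)

# The lookaround assertions reject names that merely contain an operand name.
partial = [m.groups() for m in operands_re.finditer("MyDestReg DestRegs")]
print(partial)
```

Note that dest in the sample code is not matched: matching is case sensitive and only names present in the operand list are recognized.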
The RE captures all operands in code. The captured operands are processed as below.
    class OperandList(object):
        def __init__(self, parser, code):
            ...
            for match in parser.operandsRE().finditer(code):
                op = match.groups()
                (op_full, op_base, op_ext) = op
                ...
                # see if we've already seen this one
                op_desc = self.find_base(op_base)
                if op_desc:
                    ...
                else:
                    # new operand: create new descriptor
                    op_desc = parser.operandNameMap[op_base](parser,
                        op_full, op_ext, is_src, is_dest)
                    ...
Each matched operand is checked to see whether it has already been added to the operand list. I focus on the case where it has not, in which a new operand is created. Assume the to-be-created operand is DestReg, so op_base is DestReg, and operandNameMap[op_base] is class IntReg_DestReg, which was created in the operand definitions section. class IntReg_DestReg is a subclass of IntRegOperand. As a result, the constructor of class IntReg_DestReg is called.
Finally,
    class OperandList(object):
        def __init__(self, parser, code):
            ...
            for op_desc in self.items:
                op_desc.finalize(self.predRead, self.predWrite)
self.predRead and self.predWrite are bool variables, which indicate whether conditional reads/writes exist in the current operand list.
- ./arch/isa_parser/operand_types.py:
I focus on finalizing a dest reg.
    def finalize(self, predRead, predWrite):
        ...
        if self.is_dest:
            self.op_wb = self.makeWrite(predWrite)
            self.op_dest_decl = self.makeDecl()
        else:
            self.op_wb = ''
            self.op_dest_decl = ''
        ...
- ./arch/isa_parser/operand_types.py:
    def makeWrite(self, predWrite):
        ...
        if predWrite:
            ...
        else:
            wcond = ''
            windex = '%d' % self.dest_reg_idx
        wb = '''
        %s
        {
            %s final_val = %s;
            xc->setIntRegOperand(this, %s, final_val);\n
            if (traceData) { traceData->setData(final_val); }
        }''' % (wcond, self.ctype, self.base_name, windex)
        return wb
- If we focus on code without conditions, then wcond = ''.
- self.ctype is defined in src/arch/isa_parser/operand_types.py:Operand: __init__. It needs the parser's def_operand_types, which is simple. self.ctype is uint64_t, according to dflt_ext = uqw.
- self.base_name is defined in src/arch/isa_parser/isa_parser.py, and is equal to op_name, aka DestReg.
- windex = self.dest_reg_idx is defined in ./src/arch/isa_parser/operand_list.py, where operands are sorted by sort_pri. There is only one dest, so windex here is 0.
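Putting the values from the list above into the write-back format string gives a concrete op_wb. A minimal sketch of the unconditional case only; make_write is a free function written for this illustration, not the method from operand_types.py:

```python
def make_write(ctype, base_name, dest_reg_idx, wcond=''):
    """Format the write-back snippet for an integer dest reg (no predicate)."""
    windex = '%d' % dest_reg_idx
    return '''%s
    {
        %s final_val = %s;
        xc->setIntRegOperand(this, %s, final_val);
        if (traceData) { traceData->setData(final_val); }
    }''' % (wcond, ctype, base_name, windex)


# Values discussed above: ctype from dflt_ext = uqw, base_name = DestReg,
# and dest_reg_idx = 0 because DestReg is the only destination.
op_wb = make_write('uint64_t', 'DestReg', 0)
print(op_wb)
```

The resulting text is what replaces %(op_wb)s in the MicroLimmOpExecute template.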
Namespace declaration: namespace NAME;
decode section
specifying instruction formats
Syntax
inst : ID LPAREN arg_list RPAREN
- ID: python string: inst name
- arg_list:
inst : ID DBLCOLON ID LPAREN arg_list RPAREN
decode block defaults
preprocessor directive handling
TODO:
0x17: MOV(Bv,Iv);
=> ?
- ./src/arch/isa_parser/isa_parser.py
    def p_inst_0(self, t):
        'inst : ID LPAREN arg_list RPAREN'
        ...
        codeObj = currentFormat.defineInst(self, t[1], t[3], t.lexer.lineno)
        ...
or
    def p_inst_1(self, t):
        'inst : ID DBLCOLON ID LPAREN arg_list RPAREN'
        ...
        codeObj = format.defineInst(self, t[3], t[5], t.lexer.lineno)
        ...
Use the defined format to define an inst. MOV uses the Inst format, which is defined in src/arch/x86/isa/formats/multi.isa.
    def format Inst(*opTypeSet) {{
        blocks = specializeInst(Name, list(opTypeSet), EmulEnv())
        (header_output, decoder_output,
         decode_block, exec_output) = blocks.makeList()
    }};
- ./src/arch/isa_parser/isa_parser.py:
    class Format(object):
        ...
        def defineInst(self, parser, name, args, lineno):
            ...
            vars = self.func(self.user_code, context, *args[0], **args[1])
            ...
            return GenCode(parser, **vars)
name is MOV. args are [Bv, Iv]. user_code is the Inst format code in multi.isa, see above.
Executing user_code is a very long procedure, so I would like to describe GenCode first.
After the isa file is parsed, the GenCode object's emit() is called. For details, see the p_-prefixed, inst-related functions in isa_parser.py.
- ./src/arch/isa_parser/isa_parser.py:
    class GenCode(object):
        ...
        def emit(self):
            if self.header_output:
                self.parser.get_file('header').write(self.header_output)
            if self.decoder_output:
                self.parser.get_file('decoder').write(self.decoder_output)
            if self.exec_output:
                self.parser.get_file('exec').write(self.exec_output)
            if self.decode_block:
                self.parser.get_file('decode_block').write(self.decode_block)
            ...
Before GenCode, user_code is executed.
- src/arch/x86/isa/specialize.isa:
    def specializeInst(Name, opTypes, env):
        ...
        return genMacroop(Name, env)
specializeInst adds suffixes to Name according to opTypes, and adds regs to env according to opTypes.
E.g. Name="MOV", opTypes=[Bv, Iv]. Bv appends _R to Name and calls env.addReg(InstRegIndex); Iv appends _I to Name. After specialization, Name="MOV_R_I".
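The naming part of this specialization can be pictured as a table lookup. The sketch below is only a toy model of the MOV example: SUFFIX_FOR_OP_TYPE and specialize_name are hypothetical, the table contains just the two entries implied by the text, and the real specializeInst in specialize.isa also parses the operand type strings and updates env, which is omitted here.

```python
# Hypothetical suffix table: only the two entries implied by the MOV example.
SUFFIX_FOR_OP_TYPE = {"Bv": "_R", "Iv": "_I"}


def specialize_name(name, op_types):
    """Append one suffix per operand type (toy model of the naming step)."""
    for op_type in op_types:
        name += SUFFIX_FOR_OP_TYPE[op_type]
    return name


specialized = specialize_name("MOV", ["Bv", "Iv"])
print(specialized)
```

The specialized name is then used as the key into macroopDict.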
src/arch/x86/isa/macroop.isa:
    def genMacroop(Name, env):
        ...
        macroop = macroopDict[Name]
        if not macroop.declared:
            ...
            blocks.header_output = macroop.getDeclaration()
            blocks.decoder_output = macroop.getDefinition(env)
            macroop.declared = True
        blocks.decode_block = "return %s;\n" % macroop.getAllocator(env)
        return blocks
macroopDict is created in src/arch/x86/isa/microasm.isa. In short, each element of macroopDict is a src/arch/x86/isa/macroop.isa:class X86Macroop(Combinational_Macroop). The microops contained in a macroop are declared in src/arch/x86/isa/microops/*.isa. For details, see microop.md.
Here we focus on macroop.getDeclaration(), macroop.getDefinition(env) and macroop.getAllocator(env).
- src/arch/x86/isa/macroop.isa:
    def getDeclaration(self):
        ...
        iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
                {"code" : "",
                 "declareLabels" : declareLabels
                })
        return MacroDeclare.subst(iop);
The macroop declaration is generated using the MacroDeclare template.
- src/arch/x86/isa/macroop.isa:
    def template MacroDeclare {{
        ...
    }};
The output is written to
- build/X86/arch/x86/generated/decoder-ns.hh.inc:
    ...
    class MOV_R_I : public Macroop {...};
    ...
- src/arch/x86/isa/macroop.isa:
    def getDefinition(self, env):
        ...
        for op in self.microops:
            ...
            allocMicroops += \
                "microops[%d] = %s;\n" % \
                (micropc, op.getAllocator(flags))
            ...
        ...
        iop = InstObjParams(self.getMnemonic(), self.name, "Macroop",
                            {"code" : "", "num_microops" : numMicroops,
                             "alloc_microops" : allocMicroops,
                             "adjust_env" : self.adjust_env,
                             "adjust_imm" : self.adjust_imm,
                             "adjust_disp" : self.adjust_disp,
                             "disassembly" : env.disassembly,
                             "regSize" : regSize,
                             "init_env" : self.initEnv})
        return MacroConstructor.subst(iop) + MacroDisassembly.subst(iop);
MacroConstructor generates the cpp class constructor, e.g. x86_macroop::MOV_R_I::MOV_R_I(...){...}. MacroDisassembly generates generateDisassembly, e.g. std::string x86_macroop::MOV_R_I::generateDisassembly(...){...}.
- src/arch/x86/isa/macroop.isa:
    def getAllocator(self, env):
        return "new x86_macroop::%s(machInst, %s)" % \
                (self.name, env.getAllocator())
macroop.getAllocator(env) expands to new x86_macroop::MOV_R_I(machInst, EmulEnv((OPCODE_OP_BOTTOM3 | (REX_B << 3)), 0, 1, ADDRSIZE, STACKSIZE))
The expansion of env (env.getAllocator()) is as follows,
- src/arch/x86/isa/macroop.isa:
    def getAllocator(self):
        ...
        return '''EmulEnv(%(reg)s,
                          %(regm)s,
                          %(dataSize)s,
                          %(addressSize)s,
                          %(stackSize)s)''' % \
            self.__dict__
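The two getAllocator calls are plain string formatting, so the expansion shown above can be reproduced with a few lines. In this sketch EmulEnvSketch and macroop_allocator are stand-ins written for illustration; the attribute values are the ones visible in the expanded string, and how the real EmulEnv arrives at them is not modeled.

```python
class EmulEnvSketch:
    """Stand-in for EmulEnv holding only the fields used by getAllocator."""

    def __init__(self, reg, regm, dataSize, addressSize, stackSize):
        self.reg = reg
        self.regm = regm
        self.dataSize = dataSize
        self.addressSize = addressSize
        self.stackSize = stackSize

    def getAllocator(self):
        # %-formatting with self.__dict__, as in macroop.isa.
        return ("EmulEnv(%(reg)s, %(regm)s, %(dataSize)s, "
                "%(addressSize)s, %(stackSize)s)") % self.__dict__


def macroop_allocator(name, env):
    """Mirror of X86Macroop.getAllocator, written as a free function."""
    return "new x86_macroop::%s(machInst, %s)" % (name, env.getAllocator())


env = EmulEnvSketch("(OPCODE_OP_BOTTOM3 | (REX_B << 3))", "0", "1",
                    "ADDRSIZE", "STACKSIZE")
allocator = macroop_allocator("MOV_R_I", env)
print("return %s;" % allocator)
```

The final print corresponds to the decode_block that genMacroop builds from the allocator string.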
Appendix
TODO:
Instruction behaviors are described by
- C++ code
- bitfield operators
- operand type qualifiers
generated files
TODO:
generic_cpu_exec.cc inst-constrs.cc
section | file |
---|---|
decode_block | decode-method.cc.inc |
header | decoder.hh, decoder-g.hh.inc, decoder-ns.hh.inc |
decoder | decoder.cc, decoder-g.cc.inc, decoder-ns.cc.inc |
exec | exec-g.cc.inc, exec-ns.cc.inc |
see src/arch/isa_parser/isa_parser.py:
    # Get the file object for emitting code into the specified section
    # (header, decoder, exec, decode_block).
    def get_file(self, section):
        ...
    # Change the file suffix of a base filename:
    #   (e.g.) decoder.cc -> decoder-g.cc.inc for 'global' outputs
    def suffixize(self, s, sec):
        ...
suffix
- -ns: namespace
- -g: global
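The renaming rule in the table can be reproduced with one regular expression substitution. This is a sketch of the rule only: the body of the parser's suffixize is elided above, so the implementation and the in_namespace flag below are assumptions that merely match the file names in the table.

```python
import re


def suffixize(filename, in_namespace):
    """Insert -ns or -g before the extension and append .inc (assumed rule)."""
    extension = re.compile(r'(\.[^.]+)$')  # isolate the last extension
    tag = '-ns' if in_namespace else '-g'
    return extension.sub(tag + r'\1.inc', filename)


print(suffixize('decoder.cc', in_namespace=False))
print(suffixize('decoder.hh', in_namespace=True))
```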