Instruction-Level Simulation And Tracing
This is $Revision: 1.107 $, last updated $Date: 2004/06/08 17:36:44 $.
For an up-to-date version, please check www.xsim.com/bib
.
WARNING: THIS PAGE IS STILL UNDER CONSTRUCTION.
Places that are known to have dubious or absent information are marked with .
There is also a simulators mailing list. To subscribe, write to <majordomo@xsim.com>
. For sample messages see here.
Quick Index
-
Glossary of terms used here
.
-
Who is who
in simulation and tracing (forever incomplete).
A Quick Overview of Instruction-Set Simulation and Tracing
The most important thing is what does it do? If you are building or using a simulator you need to be concerned at some level about the implementation. But first you need to figure out what you want it to do.
Why aren't you using the real thing? Do you want an accurate simulation? If yes, use real hardware. If no, make up the numbers. NullSIM. It's not as accurate, but it's cheaper and faster than any other simulation tool. It's the only universal simulator! Tired of configuring your simulator to do exactly what you want? Use NullSIM, with a familiar user interface and predictable results!
Instruction-set simulators can execute programs written or compiled for computers that do not yet exist, which no longer exist, or which are more expensive to purchase than to simulate. Simulators can also provide access to internal state that is invisible on the real hardware, can provide deterministic execution in the face of races, and can be used to ``stress test'' for situations that are hard to produce on real hardware.
Instruction-level tracing can provide detailed information about the behavior of programs; that information drives analyzers that analyze or predict behavior of various system components and which can, in turn, improve the design and implementation of everything from architectures to compilers to applications.
Although simulators and tracing tools appear to perform different tasks, they in practice do much the same work: both manipulate machine-level details, and both use similar implementation techniques.
This web page is a jumping-off point for lots of work related to instruction-level simulation and tracing. Please contribute! Please send comments, contributions, and suggestions to [pardo@xsim.com](http://www.xsim.com/index.html). If you'd like to help edit this page, there is lots that needs to be done; your help is appreciated.
This web page also lists a few OS emulation tools. Although these don't specifically fit the category of tools covered by this page, it's interesting to consider whether you could glue together a processor emulator and an OS emulator and wind up with a whole simulated system. To date, whole simulated systems are built as integrated tools, rather than being assembled modularly.
Terminology
Some terminology:
- Simulation is recreating an environment in enough detail that desired effects of a ``real'' system can be observed.
- Instruction-Set Simulation is simulating a processor at the instruction-set level, in enough detail to run executable programs intended for the machine being simulated. Other levels of detail are possible: timing-accurate and RTL (register transfer level) simulation are more detailed, while bus architecture and cluster simulation are less detailed.
- Emulation is simulation that uses special hardware assistance [RFD 72], [Tucker 65], [Wilkes 69].
- The target machine is the one being simulated; the host machine is the one where the simulation runs. This terminology parallels retargetable compiler terminology. However, there is no standard terminology where the simulation framework is produced on yet a third platform. That is, a target simulator which runs on special host hardware often has the simulation software compiled on a general-purpose machine. Some have suggested ``generation host'' or ``ghost'' for the machine where the software is created, which suggests that the place where the simulator runs is the ``runtime host'' or ``rhost'' (pronounced "roast").
See also the Glossary.
A Brief Categorization
A list of tools, organized according to various interesting features. See also a listing of tools ordered alphabetically. Interesting things about the tools include:
- Purpose of the tool
- Supports buggy applications (that is: is the tool robust in the face of application errors?).
- Supports dynamic instruction space modification (a.k.a. ``Dynamic Linking'', ``Runtime Code Generation'', or ``Self-Modifying Code'')
- Supports multiple target processors
- Supports multiple protection domains (address spaces)
- Supports signals, exceptions and asynchronous events
- Supports system-mode simulation or tracing
- Implementation:
- Timing simulation
- Performance of the tool
- Product status
Purpose Of The Tool
Simulation and tracing tools can perform a wide variety of tasks. Here are some common uses:
-
atr
: address tracing
Classical ``address tracing'' gathers a list of instruction and/or data memory references performed by a system. There are many variations, such as tracing only targets of control transfers or tracing other resources.
-
db
: debugging
A simulator can help with debugging because: it runs deterministically and repeatably; it is possible to query system state without disturbing it; the simulator can be backed up to an earlier checkpoint in order to implement reverse execution (```foo` is twelve ... what was the value of `bar` in the routine we just returned from?''); and because a simulator can perform consistency checks that cannot be done on real hardware.
-
otr
: other tracing and event counting
A generalization of address tracing is to trace, count, or categorize any kind of processor or system event or resource. For example, a tool may collect the common values of variables, register usage patterns, interrupt or exception event counts, timing information, and so on.
-
sim
: (instruction set) simulation
Simulators commonly implement a processor architecture that does not yet exist or no longer exists. Simulators can also implement other devices such as memory, bus, I/O devices, user input, and so on.
-
tb
: tool building
Here, ``tool building'' is meant to encompass tools that are used to build other tools. For example, a tool that builds various tracing tools is a tool-building tool, whereas a configurable cache simulator is not. The usual distinction is that a tool-building tool can be extended [NG87, NG88] using a general-purpose programming language (e.g. C, C++, ...), whereas a configurable tool is programmed with a less-powerful language, e.g. a list of cache size, line size, associativity, etc.
In addition, some tools are used for
-
os
: operating system (OS) Emulation
Compare OS emulation ``as a purpose'' with simulators that emulate the OS for simplicity (see system-mode simulation or tracing).
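To make the ``address tracing'' purpose above concrete, here is a minimal sketch of a tracing interpreter. The three-instruction toy machine, the tuple encoding, and the `I`/`R`/`W` record kinds are all assumptions made up for illustration; they do not describe any of the tools listed on this page.

```python
def run_with_trace(program, data):
    """Interpret a toy program and return its list of memory references.

    Each trace record is (kind, address), where kind is "I" for an
    instruction fetch, "R" for a data read and "W" for a data write.
    """
    trace, regs, pc = [], {}, 0
    while pc < len(program):
        trace.append(("I", pc))                 # every instruction is fetched
        op, reg, addr = program[pc]
        if op == "load":
            trace.append(("R", addr))
            regs[reg] = data[addr]
        elif op == "store":
            trace.append(("W", addr))
            data[addr] = regs[reg]
        elif op == "inc":
            regs[reg] += 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return trace


# Hypothetical program: load a word, increment it, store it elsewhere.
PROGRAM = [("load", "r0", 100), ("inc", "r0", None), ("store", "r0", 104)]
DATA = {100: 41, 104: 0}
TRACE = run_with_trace(PROGRAM, DATA)
```

A variation that records only the targets of control transfers, or only data references, would simply drop some of the `trace.append` calls.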
Handles Application Bugs Robustly
- No: Application errors such as stores to random memory locations may cause the simulation or tracing tool to fail or produce spurious answers, or may cause the application program to fail in an unexpected (unintended) way or produce spurious answers.
- Some: Certain kinds of errors are detected or serviced. For example, application errors may be constrained so that they can clobber application data in random ways but cannot cause the simulation or tracing tool to fail or produce erroneous results.
- Yes: Application errors are detected and handled in some predictable way. Typically, ``predictable'' means that the error model is the same as a reference for the target architecture.
- Yes*: Selectable; turning on checking may slow execution.
Works with Self-Modifying Code
THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
- No
- Yes, but not all kinds
- Yes
Multiple Processors
THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
- No
- Y1: multiplexes all target processors on a single host processor
- Y=: same number of host and target processors (to be precise, should be a ``Y-'' category for several host processors per target processor).
- Y+: can multiplex a large number of target processors onto a potentially smaller number of host processors
Support for Multiple Protection Domains
THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
- No
- Yes
Signals and Exceptions
THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
- No
- S: yes, but not all kinds. For example, a tracing tool might execute the traced program correctly but fail to trace signal handlers.
- Yes
Support for System-Mode Code
(Detail) THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
-
d: device
-
u: user
-
s
: system
Note: the system mode may be marked in parentheses, e.g. (s), indicating that the host processor does not have a distinct system mode in hardware, but the tool is intended to work with (simulate, trace, etc.) operating system code.
Processor simulators typically implement either a full processor or just the user-mode part of the instruction set. A full simulation is more precise and allows analysis of operating systems, etc. However, it also requires implementing the processor's protected mode architecture, simulated devices, etc. An alternative is to implement just the user-mode portion of the ISA and to implement system calls (transitions to protected mode) using simulator code rather than by simulating the operating system. OS emulation is typically less accurate.
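The user-mode alternative can be sketched as follows: the simulator interprets ordinary instructions, and when the target executes a system call the simulator performs the service itself in host code rather than simulating a kernel. The instruction names, the register convention (`r0` holds the call number, `r1` the argument), and the two call numbers are hypothetical.

```python
import io

SYS_WRITE, SYS_EXIT = 1, 2      # made-up system call numbers


def run_user_mode(program, out):
    """Run a toy user-mode program; system calls are served by host code."""
    regs, pc = {"r0": 0, "r1": 0}, 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "li":                      # load immediate
            regs[args[0]] = args[1]
        elif op == "syscall":
            # No target OS code runs here: the simulator emulates the call.
            if regs["r0"] == SYS_WRITE:
                out.write(chr(regs["r1"]))
            elif regs["r0"] == SYS_EXIT:
                return regs["r1"]
            else:
                raise ValueError(f"unsupported system call {regs['r0']}")
        else:
            raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("li", "r0", SYS_WRITE), ("li", "r1", ord("A")), ("syscall",),
    ("li", "r0", SYS_EXIT), ("li", "r1", 0), ("syscall",),
]
OUT = io.StringIO()
STATUS = run_user_mode(PROGRAM, OUT)
```

Nothing inside the emulated call is visible to a tracing tool built this way. This is one reason the approach is less accurate than full system simulation.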
Input Representation
THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.
- asm: assembly code
- exe: executable code, no symbol table information
- exe*: executable code, with symbol table information
- hll: high-level language
Implementation: Decompilation Technology
``Decompilation technology'' here refers to the process of analyzing a (machine code) fragment and, through analysis, creating some higher-level information about the fragment. For simulation and tracing tools, decompilation is typically simpler than [static program decompilation](http://www.csee.uq.edu.au/csm/decompilation), in which the goal is to read a binary program and produce source code for it in some high-level language. Simulation and tracing ``has it easy'' in comparison because it is possible to get by with a lower-level representation and also to punt hard problems to the runtime, when more information is available.
Even so, executable machine code is difficult to simulate and trace efficiently (within 2 orders of magnitude of the performance of native execution) when using ``naive'' instruction-by-instruction translation, because lots of relevant information is unavailable statically. For example, every instruction is potentially a branch target; every word of memory is potentially used both as code and as data; every mutable word of memory is potentially executed, modified (at runtime), and then executed again; and so on.
Executable machine code is also inherently (target) machine-dependent and thus lexing and parsing the machine code is a source of potential portability problems. (Note that some tools use a high-level input, so that relatively little analysis is needed to determine the original programmer's intent, at least at a level needed to simulate the program with modest efficiency.)
The following is a list of tools and papers that show how to reduce the overhead of analyzing each instruction; how to reduce the number of times each instruction is analyzed; how to perform optimistic analysis and recover when it's wrong; and how to improve the abstraction of machine-dependent parts of the tool.
A short list:
- FlashPort
- g88
- Mimic
- MINT
- Moxie
- New Jersey Machine Code Toolkit
- OM
- qp/qpt
- [Pittman 95]
- SELF
- Shade
- SimICS
- SimOS
- SoftPC
- ST-80
- Vest
A slightly longer list:
- Accelerator
- ATOM
- CRISP
- Cygnus
- dcc, [Cifuentes 94b]
- dis+mod+run
- EEL
- Executor
- FlashPort
- g88
- GNU Simulators
- Mable
- Migrant
- Mimic
- MINT
- Moxie
- MPtrace
- New Jersey Machine Code Toolkit
- OM
- Pixie
- qp/qpt
- SELF
- Shade
- SimICS
- SimOS
- SoftPC
- SPIM
- ST-80
- Tapeworm II
- Titan tracing
- TRAPEDS
- Vest
Implementation: Simulation Technology
The ``simulation technology'' is how the original machine instructions (or other source representation) get translated into an executable representation that is suitable for simulation and/or tracing. Choices include:
-
ddi: Decode-and-dispatch
interpretation: the input representation for an operation is fetched and decoded each time it is executed.
-
pdi: Predecode
interpretation: the input form is translated into a form that is faster to decode; that form is then saved so that successive invocations (e.g. subsequent iterations of a loop) need only fetch and decode the ``fast'' form. Note that
- The translation may happen before program invocation, during startup, or incrementally during execution; and that the translated form may be discarded and regenerated.
- If the original instructions change, the translated form becomes incoherent with the original representation; a system that fails to update (invalidate) the translated form before it is reexecuted will simulate the old instructions instead of the new ones. For some systems (e.g., those with hardware coherent instruction caches) such behavior is erroneous.
-
tci: Threaded code
interpretation: a particularly common and efficient form of predecode interpretation.
-
scc: Static cross-compilation
: The input form is statically (before program execution) translated from the target instruction set to the host instruction set. Note that:
- All translation costs are paid statically, so runtime efficiency may be very good. In contrast, dynamic analysis and transformation costs are paid during simulation, and so it may be necessary to ``cut corners'' with dynamic translation in order to manage the runtime cost. Cutting corners may affect both the quality of analysis of the original program and the quality of code generation.
- Instructions that cannot be located statically or which do not exist until runtime cannot be translated statically.
- Historically, it is difficult to distinguish between memory words that are used for instructions and those that are used for data; translating data as instructions may cause errors.
- Translating to machine code allows the use of the host hardware's instruction fetch/decode/dispatch hardware to help simulate the target's.
- Translating to machine code makes it easier to translate clumps of target instructions; most dispatching between target instructions is thus eliminated.
-
dcc: Dynamic Cross Compilation
: Host machine code is generated dynamically, as the program runs. Note that:
- Translating ``on demand'' eases the problem of determining what is code and what is data; a given word may even be used as both code and data.
- Translating to machine code is often more expensive than translating to other representations; both the cost of generating the machine code and the cost of executing it contribute to the overall execution time.
- Theoretical performance advantages from dynamic cross-compilation may be overwhelmed by the host's increased cache miss ratio due to dynamic cross-compilation's larger code sizes [Pittman 95].
-
aug: Augmentation
: cross-compilation where the host and target are the same machine. Note that
- Augmentation is typically done statically.
- There is a fine line between having identical host and target machines (augmentation) and having nearly-identical machines in which just a few features (e.g. memory references) are simulated, but in which the bulk of instruction sets and encodings are identical.
-
emu: Emulation: Where software simulation is sped up using hardware assistance.
``Hardware assistance'' might include special compatibility modes but might also include careful use of page mappings. (See ``emulation''.)
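The difference between decode-and-dispatch and predecode interpretation in the list above can be sketched with a toy target. The 16-bit instruction encoding, the four opcodes, and the 8-bit registers are assumptions invented for this example. The two interpreters share the same `execute` routine and differ only in how often `decode` runs.

```python
# Toy encoding (hypothetical): opcode in the top 4 bits, three 4-bit fields.
OP_LOADI, OP_ADD, OP_JNZ, OP_HALT = 0, 1, 2, 3


def encode(op, a=0, b=0, c=0):
    return (op << 12) | (a << 8) | (b << 4) | c


def decode(word):
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def execute(fields, regs, pc):
    """Apply one decoded instruction; return the next pc, or None on halt."""
    op, a, b, c = fields
    if op == OP_LOADI:                  # regs[a] = 8-bit immediate b:c
        regs[a] = (b << 4) | c
    elif op == OP_ADD:                  # regs[a] = regs[b] + regs[c] (mod 256)
        regs[a] = (regs[b] + regs[c]) & 0xFF
    elif op == OP_JNZ:                  # jump to address b:c if regs[a] != 0
        if regs[a] != 0:
            return (b << 4) | c
    elif op == OP_HALT:
        return None
    return pc + 1


def run_ddi(mem):
    """Decode-and-dispatch: decode each instruction every time it executes."""
    regs, pc, decodes = [0] * 16, 0, 0
    while pc is not None:
        fields = decode(mem[pc])
        decodes += 1
        pc = execute(fields, regs, pc)
    return regs, decodes


def run_pdi(mem):
    """Predecode: keep the decoded form so later executions skip decoding."""
    regs, pc, decodes, cache = [0] * 16, 0, 0, {}
    while pc is not None:
        if pc not in cache:
            cache[pc] = decode(mem[pc])
            decodes += 1
        pc = execute(cache[pc], regs, pc)
    return regs, decodes


# r1 = 5; r2 = 0xFF (that is, -1 mod 256); loop: r1 += r2; jnz r1, loop; halt
PROGRAM = [
    encode(OP_LOADI, 1, 0x0, 0x5),
    encode(OP_LOADI, 2, 0xF, 0xF),
    encode(OP_ADD, 1, 1, 2),
    encode(OP_JNZ, 1, 0x0, 0x2),
    encode(OP_HALT),
]
REGS_DDI, DECODES_DDI = run_ddi(PROGRAM)
REGS_PDI, DECODES_PDI = run_pdi(PROGRAM)
```

Both runs finish with the same register contents, but the predecoding interpreter decodes each of the five instructions once, while the decode-and-dispatch interpreter decodes once per executed instruction. This sketch never invalidates its cache, so it would simulate stale instructions if the program modified itself.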
Dynamic Compilation: Displaced Execution
Move an instruction from one place to another, but execute with the same host and target.
- 1951: EDSAC Debug
- 1987: Shadow
Dynamic Compilation: Cross-Compilation
Compile instruction sequences from a target machine to run on a host machine.
- 1984: ST-80
- 1987: CRISP
- 1987: Mimic
- 1988: SoftPC
- 1988: SELF
- 1991: Shade
- 1993: MINT
- 1994: Executor
- 1994: IMS.
- 1993: SimICS; in particular, ``Partial Translation''
- 1994: SimOS.
- 1994: T2.
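As a rough sketch of dynamic cross-compilation, the example below translates target basic blocks on demand and caches the result, keyed by target address. Here the ``host machine code'' is Python source compiled with the built-in `exec`, which stands in for real host instructions. The tuple-based toy target and all function names are assumptions for illustration only.

```python
def translate_block(program, start):
    """Translate the basic block beginning at `start` into a host function.

    The generated function takes the register list and returns the next
    target pc (or None to halt).
    """
    lines = ["def block(regs):"]
    pc = start
    while True:
        op, *args = program[pc]
        if op == "addi":                           # regs[reg] += imm
            reg, imm = args
            lines.append(f"    regs[{reg}] += {imm}")
            pc += 1
        elif op == "jnz":                          # block ends at a branch
            reg, target = args
            lines.append(
                f"    return {target} if regs[{reg}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)              # "code generation"
    return namespace["block"]


def run_dcc(program):
    regs, pc, cache, translations = [0] * 4, 0, {}, 0
    while pc is not None:
        if pc not in cache:                        # translate on demand
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)                       # run the translated block
    return regs, translations


PROGRAM = [
    ("addi", 0, 3),      # 0: r0 = 3
    ("addi", 1, 10),     # 1: loop head: r1 += 10
    ("addi", 0, -1),     # 2: r0 -= 1
    ("jnz", 0, 1),       # 3: repeat while r0 != 0
    ("halt",),           # 4
]
REGS, TRANSLATIONS = run_dcc(PROGRAM)
```

Within a translated block there is no dispatch between target instructions; dispatch happens only at block boundaries. Instructions 1 to 3 are translated twice, once as the tail of the block entered at address 0 and once as the block entered at address 1, which illustrates why translated code tends to be larger than the original.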
Hardware Emulation
Interpreters
Simulation and tracing tools that perform execution using interpretation; the original executable code is neither preprocessed (augmentation or static cross-compilation) nor is it dynamically compiled to host code.
- 1986: Z80MU
- 1987: Cerberus
- 1988: g88
- 1990: Spa
- 1991: SimICS
- 1991: Dynascope
- 1992: Accelerator
- 1992: GNU Simulators
- 1992: SPIM
- 1993: Cygnus
- 1993: Dynascope-II
- 1993: Executor
- 1993: MINT
- 1993: WWT
- 1994: Dynascope-II
- 1994: Talisman (also known as ```mg88`'').
- 1994: Kx10
- 1994: Mable
- 1994: Mime
Static Cross-Compilation
Statically cross-compile instruction sequences from a target machine to run on some host machine.
- 1983: dis+mod+run
- 1986: Moxie
- 1987: Cerberus
- 1992: Accelerator
- 1993: Vest and mx
- 1994: FlashPort
- 1994: FreePort Express
- 1994: Migrant
- 1994: Pixie-II
Static Augmentation
Augmentation-based tracing tools run host instructions native, but some instructions are simulated. For example, Proteus executes arithmetic and stack-relative memory reference instructions native, and simulates load and store instructions that may reference shared memory.
- 1983: Simon
- 1986: Pixie
- 1988: RPPT
- 1989: MPtrace
- 1989: Titan tracing
- 1989: TRAPEDS
- 1991: Proteus
- 1991: Tango Lite
- 1992: FAST
- 1992: OM
- 1992: Purify
- 1993: ATOM (based on OM)
- 1993: Hiprof (based on OM)
- 1993: qp/qpt
- 1993: Third Degree (based on OM)
- 1993: WWT
- 1994: IDtrace
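A minimal sketch of static augmentation, under heavy simplification: before the program runs, a rewriting pass inserts a trace operation ahead of every load and store and patches branch targets to account for the inserted code. The toy instruction tuples and the `augment` and `run` names are hypothetical, and the small interpreter loop merely stands in for native execution on the host.

```python
def augment(program):
    """Insert ("trace", op, addr) before each load/store; fix branch targets."""
    new_index, out = {}, []
    for old_pc, instr in enumerate(program):
        new_index[old_pc] = len(out)       # branches land on the trace op
        if instr[0] in ("load", "store"):
            out.append(("trace", instr[0], instr[2]))
        out.append(instr)
    return [("jnz", i[1], new_index[i[2]]) if i[0] == "jnz" else i
            for i in out]


def run(program, data):
    """Stand-in for native execution of the (possibly augmented) program."""
    regs, pc, log = [0] * 4, 0, []
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "trace":
            log.append((args[0], args[1]))
        elif op == "load":
            regs[args[0]] = data[args[1]]
        elif op == "store":
            data[args[1]] = regs[args[0]]
        elif op == "addi":
            regs[args[0]] += args[1]
        elif op == "jnz":
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "halt":
            return regs, log
        else:
            raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("load", 0, 100),    # 0: r0 = data[100]
    ("store", 0, 104),   # 1: loop head: data[104] = r0
    ("addi", 0, -1),     # 2
    ("jnz", 0, 1),       # 3
    ("halt",),           # 4
]
PLAIN_DATA, TRACED_DATA = {100: 2, 104: 0}, {100: 2, 104: 0}
PLAIN_REGS, _ = run(PROGRAM, PLAIN_DATA)
TRACED_REGS, LOG = run(augment(PROGRAM), TRACED_DATA)
```

The augmented program computes the same result as the original and additionally produces the reference log. A real tool must also cope with indirect branches and with addresses that are not known statically, which this sketch ignores.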
Multiple Strategies
Some tools rely on having multiple strategies in order to achieve their desired functionality. For the purposes here, ``untraced native execution'' counts as a translator.
- 1951: EDSAC Debug (displaced execution, native execution)
- 1991: Dynascope (interpretation, native execution)
- 1992: Accelerator (static cross-compilation, interpretation)
- 1993: MINT (dynamic cross-compilation, interpretation)
- 1993: Vest and mx (static cross-compilation, interpretation)
- 1994: Executor (interpretation, dynamic cross-compilation)
- 1994: SimICS (interpretation, dynamic cross-compilation)
- 1995: FreePort Express (static cross-compilation, interpretation; uses Vest and mx technology)
Other
Some tools/papers not listed under other headings.
Match Between Host and Target
THIS CATEGORY NOT YET ORGANIZED.
Generally, the closer the match between the host and the target, the easier it is to write a simulator, and the better the efficiency. Possible mismatches include:
- Byte or word size. For example, Kx10 simulates a machine with 36-bit words; it runs on machines with 32-bit and 64-bit words.
- Numeric representation. For example, whether integers are sign-magnitude, one's complement, or two's complement. Or, for example, Vest, which simulates all VAX floating-point formats on a host machine that lacks some of the VAX formats.
- Which instruction combinations cause exceptions, and how those exceptions are reported.
- Synchronization and atomicity. In particular, the details may be messy where the target machine synchronizes implicitly and the host does so explicitly, since all target operations that might cause synchronization generally need to be treated as if they do.
Note that target support for self-modifying code may be treated as a special case of synchronization. For example, target machines with no caches or unified instruction and data caches will typically write instructions using ordinary store instructions. Therefore, all store instructions must be treated as potential code-modifying instructions.
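One way to honor the rule that every store is a potential code modification is to have the simulated store path invalidate any cached translation of the written address. The class below is a hypothetical sketch; the two-byte toy encoding and all names are assumptions.

```python
class PredecodedMemory:
    """Target memory plus a cache of decoded instructions."""

    def __init__(self, words):
        self.words = list(words)
        self.decoded = {}                # address -> decoded instruction
        self.decode_count = 0

    @staticmethod
    def _decode(word):
        # Toy encoding: high byte is the opcode, low byte the operand.
        return (word >> 8) & 0xFF, word & 0xFF

    def fetch_decoded(self, addr):
        if addr not in self.decoded:
            self.decoded[addr] = self._decode(self.words[addr])
            self.decode_count += 1
        return self.decoded[addr]

    def store(self, addr, value):
        # Treat every store as a possible write to code:
        # drop the stale decoded form, if there is one.
        self.words[addr] = value
        self.decoded.pop(addr, None)


MEM = PredecodedMemory([0x0105, 0x0200])
FIRST = MEM.fetch_decoded(0)     # decoded and cached
MEM.store(0, 0x0307)             # the program overwrites its own instruction
SECOND = MEM.fetch_decoded(0)    # decoded again, so the new instruction is seen
```

The cost is an extra cache check on every simulated store, which is exactly the overhead that target machines with explicit instruction-cache flush operations let a simulator avoid.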
For timing-accurate simulation (see Talisman and RSIM), some matches between the host and target can improve the efficiency, but many do not.
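The word-size and numeric-representation mismatches listed above can be illustrated with a hypothetical 36-bit target simulated on a host whose integers are wider (Python integers are unbounded, so only masking is needed; a 32-bit host would need two host words per target word). The function names are made up for this sketch.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1
SIGN_BIT = 1 << (WORD_BITS - 1)


def add_twos_complement(a, b):
    """Add two 36-bit words; return (36-bit result, carry out)."""
    total = (a & WORD_MASK) + (b & WORD_MASK)
    return total & WORD_MASK, total >> WORD_BITS


def to_signed(word):
    """Interpret a 36-bit two's-complement word as a host integer."""
    return word - (1 << WORD_BITS) if word & SIGN_BIT else word


def add_ones_complement(a, b):
    """Add two 36-bit one's-complement words using end-around carry."""
    total = (a & WORD_MASK) + (b & WORD_MASK)
    total = (total & WORD_MASK) + (total >> WORD_BITS)   # end-around carry
    return total & WORD_MASK


MINUS_ONE_TWOS = WORD_MASK           # all ones
MINUS_ONE_ONES = WORD_MASK - 1       # bitwise complement of +1
```

Every simulated arithmetic instruction pays for the extra masking, which is one concrete reason a close match between host and target improves efficiency.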
Timing Simulation
THIS CATEGORY NOT YET ORGANIZED.
Some instruction-set simulators also perform timing simulation. Timing is not strictly an element of instruction-set simulation, but is often useful, since one major use for instruction set simulation is to collect information for predicting or analyzing performance. Important features of timing simulation include both the processor pipeline and the memory system (see Talisman and RSIM).
Performance
There are many ways to measure performance. Some common metrics include:
- host instructions executed per target instruction executed;
- host cycles executed per target instruction executed;
- relative wallclock time of host and target
Metrics that are more abstract have the advantage that they are typically simple to reason about and applicable across a variety of implementations. For example, host instructions may be counted relatively easily for each of a variety of target instructions, and the counts are relatively isolated from the structure of the caches and microarchitecture. Conversely, concrete metrics tend to more accurately reflect all related costs. For example, the effects of caches and microarchitectures are included.
It is worth noting that few reports give enough information about the measurement methodology to make a valid comparison. For example, if dilation is ``typically'' 20x, what is ``typical'', and what is the performance for ``non-typical'' workloads?
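The metrics above are simple ratios; the sketch below computes them and reports the spread across workloads instead of a single ``typical'' figure. The workload names and instruction counts are made-up numbers used only to show the calculation; they are not measurements of any tool.

```python
from statistics import median


def dilation(host_instructions, target_instructions):
    """Host instructions executed per target instruction executed."""
    return host_instructions / target_instructions


def relative_time(host_seconds, target_seconds):
    """Wallclock time on the host relative to the target."""
    return host_seconds / target_seconds


# Illustrative, invented counts: (host instructions, target instructions).
WORKLOADS = {
    "tight loop": (20_000_000, 1_000_000),
    "branch heavy": (55_000_000, 1_000_000),
    "self-modifying": (400_000_000, 1_000_000),
}

PER_WORKLOAD = {name: dilation(h, t) for name, (h, t) in WORKLOADS.items()}
SUMMARY = {
    "min": min(PER_WORKLOAD.values()),
    "median": median(PER_WORKLOAD.values()),
    "max": max(PER_WORKLOAD.values()),
}
```

Reporting the minimum, median, and maximum alongside the workload descriptions gives a reader enough context to judge whether a quoted figure applies to their own programs.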
Product Status
THIS CATEGORY NOT YET ORGANIZED.
The status of the tool:
- info: only information is available
- nonprod: the tool is available but is not a product
- product: the tool is a commercial product
An Alphabetical List of Tools
Just The Names
- Accelerator
- ATOM
- ATUM
- BEaT
- Cerberus
- CRISP
- Crusoe
- Cygnus
- dcc
- Decomp
- dis+mod+run
- Dynascope
- Dynascope-II
- EDSAC Debug
- EEL
- Executor
- FAST
- FlashPort
- FLEX-ES
- FreePort Express
- g88
- GNU Simulators
- gsim: see SimICS
- Hiprof
- IDtrace
- The Interpreter
- Kx10
- Mable
- mg88: see Talisman
- Migrant
- Mime
- Mimic
- MINT
- Moxie
- MPtrace
- Mshade: a component of SimOS
- mx (same description as Vest)
- New Jersey Machine Code Toolkit (NJMCT)
- OM
- PDP-8 Simulators
- PDP-11 Simulators
- Partial Emulation
- Pixie
- Pixie-II
- Proteus
- Purify
- qp/qpt
- RPPT
- RSIM
- SELF
- Shade
- Shadow
- SimICS (a follow-on to ```gsim`'')
- Simon
- SimOS (see also ``Mshade'')
- Sleipnir
- SoftPC
- Spa
- SPIM
- ST-80
- T2
- Tango Lite
- Talisman (also known as ```mg88`'')
- Tapeworm II
- Third Degree
- Titan tracing
- TRAPEDS
- Vest
- Windows x86
- Windows on Windows (WOW)
- WWT
- Z-80 Simulators
- Z80MU
Longer Writeups
Longer writeups and cross-references. Some of the tools here have bibliographic entries, home pages or online papers, noted with ``See: ...''. Many are also described and referenced in the 1994 SIGMETRICS Shade paper, noted with ``See: Shade''.
See here for a list of tools.
Accelerator
See:
Atari Emulators
The listed tools include:
- Gemulator [IBM PC] (Atari ST emulator)
- ST XFormer [Atari ST] (Atari 130XE emulator)
- PC XFormer 2 [IBM PC] (Atari 800 emulator)
- PC XFormer 3 [IBM PC] (Atari 130XE emulator)
See:
Apple II Emulators
The listed tools include Apple II emulators:
- Apple 2000 [Amiga]
- AppleOnAmiga [Amiga]
- STM [Macintosh]
- YAE [Unix/X]
See:
- Apple II Emulation FAQ.
- Apple II Emulators Home Page
- Apple II emulators at wilbur.stanford.edu.
- Apple 2000
- AppleOnAmiga
- YAE
Apple Macintosh Emulators
The listed tools include Macintosh emulators:
- AMax [Amiga] (software + hardware)
- Emplant [Amiga] (software + hardware)
- ShapeShifter [Amiga]
- MAE [Unix/X]
See:
- The Apple Emulators FAQ
- Macintosh emulators at wilbur.stanford.edu
- MAE information
- More MAE info
- [Halfhill 94b]
ATOM
ATOM is built on top of OM.
See:
- the technical report that became a PLDI paper
- the PLDI paper
- the technical report that became a USENIX paper
- the USENIX paper
- Shade
ATUM
See:
BEaT (Binary Emulation and Translation)
See:
- BEaT Web page
- A Research Summary (from Purdue's ECE Overview).
- Info about the principals, including Thai Wey Then, Lee Kiat Chia and Russell Quong.
Cerberus
As of 1994, Cerberus was being actively used and updated by <csa@transmeta.com>, who might be willing to provide information and/or code.
Commodore Emulators
See:
Amiga
- A64 [Amiga]
- C64Emulator [Amiga]
- c64 [Atari ST]
- Mac64 [Macintosh]
- X64 [Unix/X]
PET
- The PET Emulator [C64]
VIC20
- vic-emu [Amiga]
- vic-emulator [C64]
See
CRISP
See:
Crusoe
Crusoe is an x86 emulator. It both interprets x86 instructions and translates x86 instructions to a host VLIW instruction set; translations are cached for reuse. The host instruction set is not exported; only target instructions may be executed. A demonstration Crusoe executed both x86 and Java instructions.
Categories:
- Purpose: simulation
- Input representation: exe
- Detail: System
- Multiple protection domains: Y
- Multiple processors: N
- Signals and exceptions: Y
- SMC OK: Y
- Simulation technology: ddi + dcc + emu
- Tool is robust in the face of application bugs: Y
- Status: product.
See:
- Transmeta Breaks x86 Low-Power Barrier
- The Technology Behind Crusoe(tm) Processors
- ``Combining Hardware and Software to Provide an Improved Microprocessor''
- ``Memory Controller For A Microprocessor for Detecting A Failure Of Speculation On The Physical Nature Of A Component Being Addressed''
- ``Method And Apparatus for Aliasing Memory Data In An Advanced Microprocessor''
Cygnus
See:
dcc
A prototype/research vehicle for decompiling DOS EXE binary files. It uses digital signatures to determine library function calls and the original compiler.
See:
- DCC home page (1998/03). Sources are available.
- dcc home page (see also)
- ``Decompilation of Binary Programs'',
- ``Decompilation of Binary Programs'',
- ``A Methodology for Decompilation'',
- ``Interprocedural Data Flow Decompilation'',
- ``Interprocedural Data Flow Decompilation'',
- ``A Structuring Algorithm for Decompilation''
- ``Structuring Decompiled Graphs'',
- ``Reverse Compilation Techniques'',
- The ``decomp'' tool.
DEC PDP-8 Simulators
- Old emulator [fairly portable]
- N.A.B. Gray's exec8 PDP-8 simulator and tools.
- Bill Haygood's PDP-8 simulator (versions may also be available here) [portable]
- Emulator by Robert Supnik [Unix]
- Emulators available through Doug Jones' PDP-8 emulation page [Unix], also references, tools, core files, etc.
See:
DEC PDP-11 Simulators
- Ersatz-11 [IBM PC] (even has a manual).
- Russian emulator [IBM PC]
- PDP11 emulator [Unix]
- Emulator by Robert Supnik [Unix]
- Emulator by Eric Edwards [Unix]
- Emulator by der Mouse [Unix]
- Bill Haygood's LSI-11 simulator [portable]
See:
- DEC machines emulation page (contains original refs for all of the above emulators).
- 11SIM, a PDP-11 simulator that runs on a PDP-6.
Decomp
See:
dis+mod+run
See:
Dynascope
See:
- bib cite
- ftp-able papers
- WWW page
- Source, version 3.1.15 (current as of 95/04/19, but check the WWW page for newer versions).
- Shade
Dynascope-II
See:
- ``Design and Implementation of Dynascope, a Directing Platform for Compiled Programs''
- ``Dynascope: A Tool for Program Directing''
- ``The Dynascope Directing Server: Design and Implementation''
- ftp-able papers,
- Shade
EDSAC Debug
The EDSAC Debugger uses a tracing simulator that operates by: fetching the simulated instruction; decoding it to save trace information; checking to see if the instruction is a branch, and updating the simulated program counter if it is; else placing the instruction in the middle of the simulator loop and executing it directly; and then returning to the top of the simulator loop.
As an aside, the 1951 paper on the EDSAC debugger contains a pretty complete description of a modern debugger...
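The loop described above (fetch, record a trace entry, simulate branches, execute everything else directly) can be sketched as follows. This is only an illustration of the control structure, not a reconstruction of the EDSAC code: host callables stand in for instructions that are placed in the simulator loop and executed directly, and the tuple format is an assumption.

```python
def trace_run(program, state):
    """Run `program`, returning a trace of (pc, kind) for each instruction."""
    trace, pc = [], 0
    while pc < len(program):
        instr = program[pc]                    # fetch the simulated instruction
        trace.append((pc, instr[0]))           # decode enough to record a trace
        if instr[0] == "branch_if":
            # Branches are simulated: only the simulated pc is updated.
            _, predicate, target = instr
            pc = target if predicate(state) else pc + 1
        else:
            # Anything else is executed directly, then control returns
            # to the top of the simulator loop.
            _, action = instr
            action(state)
            pc += 1
    return trace


def decrement(state):
    state["acc"] -= 1


def acc_positive(state):
    return state["acc"] > 0


# Hypothetical two-instruction loop: decrement the accumulator until zero.
PROGRAM = [("op", decrement), ("branch_if", acc_positive, 0)]
STATE = {"acc": 2}
TRACE = trace_run(PROGRAM, STATE)
```

Because non-branch instructions run unmodified, a buggy instruction can damage the simulator itself, which is consistent with the ``not robust'' entry in the categories for this tool.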
Categories:
- Purpose: debugging
- Input representation: exe
- Detail: user, (system)
- Multiple protection domains: No
- Multiple processors: No
- Signals and exceptions: No
- SMC OK: Yes
- Simulation technology: Dynamic compilation: displaced execution
- Tool is robust in the face of application bugs: N
- Status: information
See:
EEL
EEL reads object files and executables and allows tools built on top of EEL to modify the machine code without needing to be concerned with details of the underlying architecture or operating system or with the consequences of adding or deleting code.
EEL appears as a C++ class. EEL is provided with an executable, which it analyzes, creating abstractions such as executable (whole program), routines, CFGs, instructions and snippets. A tool built on EEL then edits the executable by performing structured rewrites of the EEL constructs; EEL ensures that details of register allocation, branches, etc. are updated correctly in the final code.
Categories:
- Purpose: tool building
- Input representation: exe
- Detail: User
- Multiple protection domains: Yes
- Multiple processors: Y=
- Signals and exceptions: No
- SMC OK: S (dynamically-linked libraries only)
- Simulation technology: SCC + ???
- Tool is robust in the face of application bugs: N
- Status: available, nonproduct.
See:
- bib cite
- The WARTS home page; EEL is a part of WARTS.
- An EEL home page.
Executor
See:
- bib cite
- FAQ from ARDI (an HTML version might be here)
- An overview of Executor's internals
- Shade
FAST
See:
FlashPort
See:
FLEX-ES
FLEX-ES (formerly OPEN/370) provides a System/390 on a Pentium. It includes system-mode operation and runs 8 popular S/370 OS's. On a 2-processor Pentium-II/400MHz, it provides 7 to 8 MIPS on one processor and I/O functions on the other processor. They also sell installed systems (hardware/software turnkey systems).
Categories:
- Simulation technology: dynamic cross-compilation
- Units of analysis: several target instructions at a time.
- Units of translation: one target instruction at a time
- Degree of possible mismatch between host and target:
- byte order
- floating-point numeric representation
- instructions that cause exceptions
- Performance: Range of 3 to hundreds of host machine instructions per target machine instruction; typically 50.
FreePort Express
FreePort Express is a tool for converting Sun SPARC binaries to DEC Alpha AXP binaries.
See: FreePort Express web page
g88
g88 is a portable simulator that simulates both user and system-mode code. It uses threaded code to achieve performance on the order of a few tens of instructions per simulated instruction.
See:
- the USENIX paper
- Shade.
- Contact Robert Bedichek for the source code.
g88 was written by Robert Bedichek.
GNU Simulators
See:
Hiprof
Built on top of OM.
See:
IDtrace
See:
IMS
See:
The Interpreter
``The Interpreter'' is a micro-architecture that is intended for a variety of uses including emulation of existing or hypothetical machines and program profiling. An emulator is written in microcode; microinstructions executed from the microstore give both parallelism and fast execution.
Categories:
- Purpose: instruction set simulation, other tracing
- Input representation: exe
- Detail: User (system-mode execution was not discussed).
- Multiple protection domains: ??
- Multiple processors: N
- Signals and exceptions: Yes
- SMC OK: Yes
- Simulation technology: ddi, emu
- Tool is robust in the face of application bugs: Yes
- Status: ??
More detailed review:
-
A brief (1,000 word) history of microprogramming.
-
(pg. 715) Suggested applications: emulation of existing or hypothetical machines; direct execution of high-level languages; tuning the instruction set to the application (by iterative profiling and instruction-set change).
-
(pg. 715) ``Emulation is defined in this paper as the ability to execute machine language programs intended for one machine (the emulated machine) on another machine (the host machine). Within this broad definition, any machine with certain basic capabilities can emulate any other machine; however, a measure of the efficiency of operation is generally implied when the term emulation is used. For example, ... a conventional computer [has poor] emulation efficiency ... since for each machine language instruction of the emulated machine there corresponds a sequence of machine instructions to be executed on the host machine ... (called simulation ... [Husson 70] and [Tucker 65]) turns out to be significantly more efficient on microprogrammable computers. In a microprogrammed machine, the controls for performing the fetching and execution of the machine instructions of the emulated machine consist of a sequence of microinstructions which allows the increased efficiency.'' In short, as Deutsch and Schiffman point out, you get hardware support for instruction fetch and decode, which are typically multi-instruction operations in decode-and-dispatch interpreters.
-
(pg. 717) Description of Interpreter features that help it emulate a variety of machine architectures and instruction encodings.
-
(pg. 719) ``The basic items necessary to define a machine and hence emulate it are:
- Memory structure (registers, stacks, etc.),
- Machine language format (0, 1, 2, 3 address) including effective operand address calculation, and
- Operation codes and their functional meaning (instructions for arithmetic, branching, etc.).''
Note that you also need e.g. data formats, an exception model, a device or other I/O model, ...
-
(pg. 719) ``The process of writing an emulation therefore, involves the following analysis and the microprogramming of these basic items:
- Mapping the registers (stacks, etc.) from the emulated machine onto the host machine.
- Analysis of the machine language format and addressing structure.
- Analysis of each operation code defined in the machine language.''
-
(pg. 719-720)
All of the registers of the emulated machine must be mapped onto the host machine; that is, each register must have a corresponding register on the host machine. The most frequently used registers are mapped onto the registers within the Interpreter Logic Unit (e.g., registers A1, A2, A3). The remaining registers are stored either in the main system memory or in special high speed registers depending on the desired emulation speed [[Which, I assume, means do you want R5 fast and R6 slow or R6 fast and R5 slow; it doesn't make sense to me that they'd offer you slower emulation as a feature --pardo]]. The machine language format may be 0, 1, 2, 3 address or variable length for increased code density, and may involve indexing, indirection, relative addressing, stacks and complex address conversions. Figure 14 shows the general microprogram requirements (MPM Map) and operating procedures for the emulation task.
-
Summary: You're probably already familiar with the concepts in this paper. The paper describes the overall structure of a classic decode-and-dispatch interpreter; this one happens to use microcode, but many of the features are the same when using normal machine code. The opportunity with microcode (which tends to be poorly stated in all of these papers) is that writable microcode allows the use of a machine with a very fast and very flexible but very space-consuming instruction set; microcode makes such an instruction set useful by providing a fast mechanism for mapping that instruction set to a denser representation (the one stored in primary memory). In the particular case of emulation, much of the interpreter can be written directly in the low-density machine code and can take advantage of that code's flexibility and performance without being hurt by the low encoding density.
-
See also: [Rosin 69] and Deutsch's ST-80 VM, written (largely) in Xerox Dorado microcode.
See:
This review/summary by Pardo.
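The decode-and-dispatch structure discussed in this review can be sketched in ordinary code rather than microcode. The following is a minimal sketch, assuming a made-up 1-address accumulator machine; the opcode names, their numbering, and the `run` function are hypothetical and do not come from the paper. It shows the three items the paper lists: a register mapped onto a host variable, a fixed instruction format, and one action per operation code.

```python
# Toy decode-and-dispatch interpreter. The 1-address accumulator machine
# and its opcode numbering are invented for illustration only.
LOAD, ADD, STORE, JNZ, HALT = range(5)

def run(program, memory):
    """Interpret (opcode, operand) pairs until HALT; return the accumulator."""
    acc = 0   # emulated accumulator, mapped onto a host variable
    pc = 0    # emulated program counter
    while True:
        opcode, operand = program[pc]   # fetch and decode
        pc += 1
        if opcode == LOAD:              # dispatch on the operation code
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == JNZ:
            if acc != 0:
                pc = operand
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"illegal opcode {opcode} at {pc - 1}")

memory = [2, 3, 0]
program = [(LOAD, 0), (ADD, 1), (STORE, 2), (HALT, 0)]
result = run(program, memory)
```

Each simulated instruction costs several host operations for fetch, decode, and dispatch; that overhead is what the microcoded Interpreter moves into hardware.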
Kx10
See:
Mable
See:
Migrant
See:
Mime
See:
Mimic
See:
MINT
See:
Moxie
See:
MPtrace
MPtrace statically augments parallel programs written for the i386-based Sequent Symmetry multiprocessor. The instrumented programs are then run to generate multiprocessor address traces.
See:
- the SIGMETRICS paper
- the MPtrace home page [Link broken, please e-mail <pardo@xsim.com> to get it fixed.] (includes an abstract and a reference for the SIGMETRICS paper)
- Shade
- The source code. For details on how to get it, see The MPtrace home page. [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
MPtrace was written by David Keppel and Eric J. Koldinger under the supervision of Susan J. Eggers and Henry M. Levy.
MSX
Emulators:
- fMSX
- MSX-1 emulator for IBM PC.
- MSX-1 emulator for Amiga written by people from .CL (see also/alternatively another site)
- [MSX-2 emulator for IBM PC beta]
- MSX-1 emulator for Atari ST.
- Amiga fMSX [Amiga]
- MSX-1 emulator [Atari ST]
- MSX-1 emulator [IBM PC]
- MSX-2 emulator [IBM PC]
See:
- MSX Emulation Page which summarizes all of the above.
New Jersey Machine Code Toolkit (NJMCT)
The New Jersey Machine Code Toolkit lets programmers decode and encode machine instructions symbolically, guided by machine specifications that describe mappings between symbolic and machine (binary) forms. It thus helps programmers write applications such as assemblers, disassemblers, linkers, run-time code generators, tracing tools, and other tools that consume or produce machine code.
Questions and comments can be sent to ``toolkit@cs.princeton.edu''.
See:
- An overview paper (also available as an earlier technical report)
- A paper on architecture specifications
- The reference manual
- The toolkit home page (Includes links to the distribution, which is also available via ftp from ftp://ftp.cs.princeton.edu/pub/toolkit).
OM
See:
- the OM TR that became a JPL article
- the JPL article on OM
- the OM home page
- tools built on top of OM, including ATOM, Hiprof and Third Degree.
Partial Emulation
Summary:
Virtual machines (VMs) provide greater flexibility and protection but require the ability to run one operating system (OS) under the control of another. In the absence of virtualization hardware, VMs are typically built by porting the OS to run in user mode, using a special kernel-level environment or as a system-level simulator.
``Partial Emulation'' or a ``Lightweight Virtual Machine'' is an augmentation-based approach to system-level simulation: directly execute most instructions, statically rewrite and virtualize those instructions which are ``tricky'' due to running in a VM environment. Compared to the other approaches, partial emulation offers fewer OS modifications than user-mode execution (user-mode Linux requires a machine description around 33,000 lines) and higher performance than a full (all instructions) simulator (Bochs is about 10x slower than native execution).
The implementation described here emulates all privileged instructions and some non-privileged instructions. One approach replaces each ``interesting'' instruction with illegal instruction traps. A second approach is to call emulation subroutines. ``Rewriting'' is done during compilation, and the current implementation requires OS source code [EY 03].
The approach here must: detect and emulate privileged and some non-privileged instructions; redirect system calls and page faults to the user-level OS; emulate an MMU; emulate devices.
The implementation with illegal instruction traps uses a companion process and debugger-type accesses to simulate interesting instructions. Otherwise, the user-level OS and its processes are executed in a single host process. The ``illegal instruction trap'' approach inserts an illegal instruction before each ``interesting'' instruction. The companion process then skips the illegal instruction, simulates the ``interesting'' instruction, then restarts the process. It is about 1,500 lines of C code. The ``procedure call'' approach is about 1,400 lines but is faster. There are still out-of-process traps due to e.g., MMU emulation (a la SimOS).
For IA-32, the ``interesting'' instructions are `mov`, `push`, and `pop` instructions that manipulate segment registers; `call`, `jmp`, and `ret` instructions that cross segment boundaries; `iret`; instructions that manipulate special registers; and instructions that read and write (privileged bits of) the flag register.
Not all host OSs have the right facilities to implement a partial emulator.
Some target OS changes were needed. For NetBSD, six address constants were changed to avoid host OS conflicts, and device drivers were removed. For FreeBSD, they also replaced BIOS calls with code that returned the needed values; had they tried to implement (run) the BIOS, the system would need to execute virtual 8086 mode.
User-level execution speed was similar to native. For OS-intensive microbenchmarks, the ``illegal instruction trap'' implementation was at least 100x slower than native (non-virtual) execution and slower than Bochs. The ``procedure call'' approach was 3-5x faster, but a little slower than Bochs and still 10x slower than VMware, which was in turn 4x-10x slower than native. A test benchmark (`patch`) was 15x slower using illegal instruction traps and about 5x slower using procedure calls. For comparison, VMware was about 1.1x slower.
The paper proposes using a separate host process for each page table base register value in order to reduce overhead for MMU emulation.
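The two rewriting strategies summarized above can be sketched as a pass over an assembly listing. This is a rough illustration only, not the tool from [EY 03]: it assumes AT&T-syntax input, recognizes only part of the ``interesting'' set (segment-register moves, far control transfers, `iret`), uses `ud2` as a stand-in illegal instruction, and the `emulate_N` labels are hypothetical.

```python
# Sketch of static rewriting for partial emulation. Assumptions (not from
# the paper): AT&T syntax, "ud2" as the illegal instruction, and
# hypothetical per-site "emulate_<n>" subroutine labels.
SEGMENT_REGS = {"%cs", "%ds", "%es", "%fs", "%gs", "%ss"}
FAR_TRANSFERS = {"iret", "lcall", "ljmp", "lret"}

def is_interesting(line):
    """Rough classifier covering only part of the 'interesting' set."""
    parts = line.replace(",", " ").split()
    if not parts:
        return False
    mnemonic, operands = parts[0], parts[1:]
    if mnemonic in FAR_TRANSFERS:
        return True
    if mnemonic.startswith(("mov", "push", "pop")):
        return any(op in SEGMENT_REGS for op in operands)
    return False

def rewrite_with_traps(lines, trap="ud2"):
    """Insert an illegal instruction before each interesting instruction."""
    out = []
    for line in lines:
        if is_interesting(line):
            out.append(trap)   # the companion process skips this, then emulates
        out.append(line)
    return out

def rewrite_with_calls(lines):
    """Replace each interesting instruction with a call to an emulation routine."""
    out = []
    for index, line in enumerate(lines):
        if is_interesting(line):
            out.append(f"call emulate_{index}")
        else:
            out.append(line)
    return out
```

The trap variant keeps the original instruction in place so that an out-of-process emulator can inspect it; the call variant stays in-process, which is consistent with the reported speed difference.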
Categories:
- Purpose: simulation
- Input representation: asm
- Detail: System
- Multiple protection domains: Y
- Multiple processors: not sure
- Signals and exceptions: Y
- SMC OK: No
- Simulation technology: aug + scc
- Tool is robust in the face of application bugs: Y
- Status: information only.
Further reading: ``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions'' [EY 03].
Pixie
See:
Pixie-II
See:
Proteus
See:
Purify
See:
qp/qpt
See:
RPPT
See:
RSIM
Simulates pipeline-level parallelism and memory system behavior.
See:
SELF
See:
- an OOPSLA paper
- an ECOOP paper
- a PLDI paper
- The SELF project home page
- The SELF-inspired Cecil project's home page
- Shade
Shade
Shade combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program; the cost is as low as two instructions per simulated instruction. The user may control the extent of tracing in various ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets.
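The idea of dynamically compiling and caching simulation code can be illustrated with a toy translator. This is a minimal sketch under stated assumptions, not Shade's implementation: the three-register-style toy instruction set, the `translate` and `run` names, and the use of generated Python source as ``host code'' are all invented for illustration.

```python
# Toy dynamic translation with a translation cache. The target instruction
# set and all names here are hypothetical; Python source stands in for
# generated host machine code.
def translate(program, pc):
    """Generate and compile host code for the block that starts at pc."""
    body = []
    while True:
        instr = program[pc]
        pc += 1
        op = instr[0]
        if op == "addi":                       # reg += immediate
            _, reg, imm = instr
            body.append(f"    r[{reg!r}] += {imm}")
        elif op == "add":                      # dst += src
            _, dst, src = instr
            body.append(f"    r[{dst!r}] += r[{src!r}]")
        elif op == "jnz":                      # block ends at a branch
            _, reg, target = instr
            body.append(f"    return {target} if r[{reg!r}] != 0 else {pc}")
            break
        elif op == "halt":
            body.append("    return None")
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    namespace = {}
    exec("def block(r):\n" + "\n".join(body), namespace)
    return namespace["block"]

def run(program):
    """Run to halt; return final registers and the number of translations."""
    cache = {}                  # target PC -> compiled host function
    regs = {"n": 0, "sum": 0}
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:       # miss: translate once, reuse afterwards
            block = cache[pc] = translate(program, pc)
            translations += 1
        pc = block(regs)        # each block returns the next target PC
    return regs, translations

program = [
    ("addi", "n", 4),
    ("add", "sum", "n"),        # loop head at PC 1
    ("addi", "n", -1),
    ("jnz", "n", 1),
    ("halt",),
]
```

The loop body is translated once and then executed from the cache on every later iteration, which is where the low per-instruction cost comes from.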
See:
- Download Shade.
- the technical report
- the SIGMETRICS paper
- an introduction to Shade
- the Shade manual pages
- Chapter 2 of [Conte & Gimarc 95]
Shade was written by Bob Cmelik, with help from David Keppel.
SimICS
SimICS is a multiprocessor simulator. SimICS simulates both the user and system modes of 88000 and SPARC processors and is used for simulation, debugging, and prototyping.
See:
- A SimICS web page.
- An ftp'able collection of papers at ftp://sics.se/users/psm/simics-papers/, including the SimICS papers listed here
- an introduction
- partial translation
- the compact IR
- efficient memory simulation
- Efficient Memory Simulation in SimICS
- efficient SPARC simulation
- Shade
SimICS should soon be available under license. Contact Peter Magnusson.
SimICS is a rewrite of `gsim`, which, in turn, was derived from `g88`. SimICS was written by Peter Magnusson, David Samuelsson, Bengt Werner and Henrik Forsberg.
Sinclair ZX Spectrum Emulators
- Spectrum [Amiga]
- ZXAM [Amiga]
- KGB [Amiga]
- !MZX [Archimedes]
- !Speccy [Archimedes]
- Speculator [Archimedes]
- ZX-SPECTRUM Emulator [Atari]
- JPP [IBM PC]
- Z80 [IBM PC]
- SpecEm [IBM PC]
- SP [IBM PC]
- SPECTRUM [IBM PC]
- VGASpec [IBM PC]
- Elwro 800-3 Jr v1.0 [IBM PC]
- MacSpeccy [Macintosh]
- PowerSpectum [PowerMAC]
- xzx [Unix/X]
- xz80 [Unix/X]
See:
Shadow
See:
Simon
See:
SimOS
SimOS emulates both user-mode and system-mode code for a MIPS-based multiprocessor. It uses a combination of direct-execution (some OS rewrites may be required) and dynamic cross-compilation (no rewrites needed) in order to emulate and, to some degree, instrument.
Categories:
- Purpose: otr, sim.
- Input representation: exe
- Detail: user, system
- Multiple protection domains: Yes
- Multiple processors: Yes, on one processor
- Signals and exceptions: Yes
- SMC OK: Yes
- Simulation technology: dcc, aug, emu.
- Tool is robust in the face of application bugs: Y
- Status: information
See:
Sleipnir
Sleipnir is an instruction-level simulator generator in the style of yacc. The configuration file is extended C, with special constructs to describe bit-level encodings and common code and support for generation of a threaded-code simulator.
For example, `0b_10ii0sss_s0iidddd` specifies a 16-bit pattern with constant values which must match and named ``don't care'' fields `i` (split over two locations), `s`, and `d`. Sleipnir combines the various patterns to create an instruction decoder. Named fields are substituted in action rules for an instruction. For example, `add 0b_10ii0sss_s0iidddd { GP(reg[$d]) = GP(reg[$s]) + $^c }`. Here, `^` indicates sign-extension. Threaded-code dispatch is implied.
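A bit pattern of this form can be turned into a mask, a match value, and a set of field extractors. The following is a minimal sketch of that idea, not Sleipnir's generated decoder; the function names `parse_pattern`, `decode`, and `sign_extend` are hypothetical.

```python
# Sketch: decode an instruction word against a Sleipnir-style bit pattern.
# Function names are invented; this is not Sleipnir's own decoder.
def parse_pattern(pattern):
    """Return (width, mask, match, fields) for a pattern like 0b_10ii0sss_s0iidddd."""
    if not pattern.startswith("0b"):
        raise ValueError("pattern must start with 0b")
    bits = pattern[2:].replace("_", "")
    width = len(bits)
    mask = match = 0
    fields = {}                         # field name -> bit positions, MSB first
    for index, ch in enumerate(bits):
        pos = width - 1 - index
        if ch in ("0", "1"):            # constant bit: must match exactly
            mask |= 1 << pos
            match |= int(ch) << pos
        else:                           # named field bit
            fields.setdefault(ch, []).append(pos)
    return width, mask, match, fields

def decode(word, mask, match, fields):
    """Return field values if word matches the pattern, else None."""
    if word & mask != match:
        return None
    values = {}
    for name, positions in fields.items():
        value = 0
        for pos in positions:           # gather split fields bit by bit
            value = (value << 1) | ((word >> pos) & 1)
        values[name] = value
    return values

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

width, mask, match, fields = parse_pattern("0b_10ii0sss_s0iidddd")
```

For this pattern the constant bits are 15, 14, 11, and 6, so a generated decoder would test `word & 0xC840 == 0x8000` before extracting `i`, `s`, and `d`.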
For simple machines, Sleipnir can generate cycle-accurate simulators. For more complex machines, it generates ISA-level simulators. Threaded-code simulators are typically weak at VLIW simulation and machines with some kinds of exposed latencies. Threaded-code simulators typically simulate one instruction entirely before starting the next, but with VLIW and exposed latencies, the effects of a single instruction are spread over the execution of several instructions. Sleipnir supports some kinds of exposed latencies by running an `after()` function after each instruction. Simulator code that creates values writes them into buffers, and code in `after()` can copy the values as needed to memory, the PC, and so on.
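One way to picture the buffered-write scheme is a queue of pending results that an `after()` hook drains. The sketch below is a guess at the general idea only: the `Machine` class, the per-write `delay` count, and the `write_later` method are assumptions, not Sleipnir's actual interface.

```python
# Sketch of exposed-latency handling with an after() hook. The Machine
# class, write_later(), and the delay counter are hypothetical.
class Machine:
    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        self.pending = []               # entries: [instructions_left, reg, value]

    def write_later(self, reg, value, delay):
        """Buffer a result that becomes visible after `delay` instructions."""
        self.pending.append([delay, reg, value])

    def after(self):
        """Run after every simulated instruction; commit results that are due."""
        still_pending = []
        for entry in self.pending:
            entry[0] -= 1
            if entry[0] <= 0:
                self.regs[entry[1]] = entry[2]
            else:
                still_pending.append(entry)
        self.pending = still_pending
```

With `delay=2`, the instruction that immediately follows still reads the old register value, which models a one-instruction exposed latency while each instruction is still simulated in one step.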
Reported machine description sizes, speeds, and level of accuracy include the following. ``Speed'' is based on a 250 MHz MIPS R10000-based machine.
Architecture | MD lines | Sim. speed | Accuracy |
---|---|---|---|
MIPS-I (integer) | 700 | 5.1 MIPS | ISA |
M*Core | 970 | 6.4 MIPS | Cycle |
ARM/Thumb | 2,812 | 3.6 MIPS | ISA |
TI C6201 | 5,231 | 3.4 MIPS | Cycle |
Lucent DSP1600 | 3,903 | 3.7 MIPS | Cycle |
In Norse mythology, ``Sleipnir'' is an eight-legged horse that could travel over land and sea and through the air.
See:
SoftPC
SoftPC is an 8086/80286 emulator which runs on a variety of host machines. The first version implemented an 8086 processor core using an interpreter. It provided device emulators for EGA/VGA and Hercules graphics, hard disks, floppies, and an interrupt controller.
In about 1986, Steve Chamberlain developed a dynamic cross-compiler for the Sun 3/260. The basic emulation structure is an array of bytes for simulated memory and an ``action'' array, which is a same-size array of bytes. There are then three arrays `R`, `W`, and `X` for reads, writes, and execution; each is subscripted by the ``action'' byte and contains a pointer to the corresponding read, write, or execute action. For example, a read of location 17 is implemented by reading `a = action[17]`, then branching to `R[a]`. Similarly, executing location 17 is implemented by reading `a = action[17]`, then branching to `X[a]`. The default action is that each instruction is interpreted.
Each branch invokes the translator. The translator (dynamic cross-compiler) generates a translation that starts at the last branch and goes through the current branch. SoftPC then records the current branch target, which is the starting place for the next branch's translation. SoftPC ``installs'' the translation by allocating a byte subscript `a`, then it fills in the action table with the value `a` and sets `R[a]` to act as a normal read; `W[a]` to invalidate the corresponding translation; and `X[a]` to point to the new translation. For each byte ``covered'' by the translation, the action table is set to a byte value that will invalidate the translation. For each translation, SoftPC also sets a back-pointer in a 256-entry table so that when a particular translation is being invalidated it is easy to find the location in the ``action'' table which currently uses that translation.
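The action-table dispatch and the install/invalidate cycle described above can be sketched as follows. This is a simplified illustration, not SoftPC's code: it models only the entry byte of a translation (the ``covered'' bytes are not handled), the memory size is tiny, and every function name is hypothetical.

```python
# Sketch of SoftPC-style action tables. Only the translation's entry byte
# is modeled; handling of "covered" bytes is omitted. Names are invented.
MEM_SIZE = 64
DEFAULT = 0                        # action value: plain read/write, interpret

memory = bytearray(MEM_SIZE)
action = bytearray(MEM_SIZE)       # one action byte per memory byte
back_pointer = [None] * 256        # action value -> address currently using it
log = []                           # records what execute() ended up doing

def plain_read(addr):
    return memory[addr]

def plain_write(addr, value):
    memory[addr] = value

def interpret(addr):
    log.append(("interpret", addr))

R = [plain_read] * 256
W = [plain_write] * 256
X = [interpret] * 256

def read(addr):
    return R[action[addr]](addr)

def write(addr, value):
    W[action[addr]](addr, value)

def execute(addr):
    X[action[addr]](addr)

def install(addr, a, translation):
    """Install `translation` for the code at addr under action value a."""
    def invalidating_write(waddr, value):
        entry = back_pointer[a]    # find the action-table slot to reset
        action[entry] = DEFAULT
        back_pointer[a] = None
        W[a] = plain_write
        X[a] = interpret
        memory[waddr] = value      # then perform the store itself
    action[addr] = a
    back_pointer[a] = addr
    R[a] = plain_read
    W[a] = invalidating_write
    X[a] = translation

install(17, 1, lambda addr: log.append(("translated", addr)))
execute(17)          # dispatches through X[1] to the translation
write(17, 0x90)      # a store into translated code invalidates it
execute(17)          # back to the default action: interpret
```

Because every access goes through one byte-indexed table lookup, self-modifying code is detected without a separate check on each store.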
There are thus a maximum of 256 translations at any time (actually 254 due to reserved byte values). The simulated system had up to 1MB RAM. In about 1988 Henry ??? extended the system to use the low bit of the address as part of the subscript, in order to expand the table to 512 translations. This was used in the first Apple Macintosh target of SoftPC.
SoftPC emulates many devices, including EGA, VGA, and Hercules video; disks, including floppies and hard disks; the interrupt controller; and so on. In about 1987, Steve Chamberlain implemented an 8087 (FP coprocessor) that was not a faithful 8087 (e.g., did not provide full 80-bit FP) but which provided sufficient accuracy to run common applications.
Categories:
- Purpose: sim.
- Input representation: exe
- Detail: user, system, device. Note: the 8086 does not have a distinct ``system'' mode.
- Multiple protection domains: No. Note that the 8086 target does not have multiple protection domains.
- Multiple processors: No.
- Signals and exceptions: Yes.
- SMC OK: Yes
- Simulation technology: dcc, ddi.
- Tool is robust in the face of application bugs: Y
- Status: product
See:
-
home page.
-
FWB Software, which bought SoftWindows and RealPC from Insignia Solutions.
Spa
See:
- bib cite
- brief writeup [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
- Shade
- The Spa source code
SPIM
See:
Spix
See:
ST-80
See:
STonX
An Atari ST emulator that runs on (at least) a Sun SPARC IPC under SunOS 4.1; it emulates an MC68000, RAM, ROM, Atari ST graphics, keyboard, BIOS, clock and maybe some other stuff. On a SPECint=13.8 machine it runs, on average, at half the speed of a real ST.
See:
T2
T2 is a SPARCle/Fugu simulator that is implemented by dynamically cross-compiling SPARCle code to SPARC code. It simulates both user and system mode code and was used for doing program development before the arrival of SPARCle hardware.
The name T2 is short for ``Talisman-2''. Note that, despite the similarity in names, Talisman and T2 share little in implementation or core features: the former uses a threaded code implementation and provides timing simulation of an m88k, while the latter uses dynamic cross-compilation and provides fast simulation of a SPARCle.
Tango Lite
See:
- a bib cite
- another bib cite
- source code (search around)
- Shade
Talisman
Talisman is a fast timing-accurate simulator for an 88000-based multiple-processor machine. Talisman provides both user-mode and system mode simulation and can boot OS kernels. Simulation is reasonably fast, on the order of a hundred instructions per simulated instruction. Talisman also does low-level timing simulation and typically produces estimated running times that are within a few percent of running times on real hardware. Note that e.g. turning off dynamic RAM refresh simulation makes the timing accuracy substantially worse!
See:
- a bib cite
- the simulator chapter of Robert Bedichek's dissertation
- Source for Talisman. `tar`'d and `gzip`'d.
- A demonstration of Meerkat simulated using Talisman. `tar`'d and `gzip`'d.
- Shade (look under ``mg88'')
Tapeworm II
See:
Third Degree
Built on top of OM.
See:
Titan tracing
See:
TRAPEDS
See:
VAX-11 RSX Emulator
See:
Vest and mx
See:
- bib cite
- bib cite
- Shade
- DECmigrate Products home page.
Windows x86
According to a Microsoft information release, "Windows x86" is a user-space x86 emulator with an OS interface to 32-bit Microsoft Windows (tm).
Windows on Windows (WOW)
According to a Microsoft information release, "Windows on Windows" is a user-space x86 emulator with an interface to 16-bit Microsoft Windows (tm).
Wabi
See:
- Wabi for Solaris and distribution info and the ``technical knowledge page''
- Wabi for IBM web page
- SunPC web page and another.
Wine
Wine is a Microsoft Windows(tm) OS emulator for i*86 systems. Most of the application's code runs native, but calls to ``OS'' functions are transformed into calls into Unix/X. Some programs require enhanced mode device drivers and will (probably) never run under Wine. Wine is neither a processor emulator nor a tracing tool.
See:
WWT
See:
xtrs
See:
Z-80 Simulators
- Bill Haygood's Z-80 simulator [portable]
Z80MU
See:
8051 Emulators
Simulators
- 2500 A.D.
- Avocet Systems
(also compilers and assemblers).
- ChipTools
on a 33 MHz 486 matches the speed of a 12 MHz 8051
- Cybernetic Micro Systems
- Dunfield Development Systems
Low cost $50.00
500,000+ instructions/second on 486/33
Can interface to target system for physical I/O
Includes PC hosted "on chip" debugger with identical user
interface
- HiTech Equipment Corp.
- Iota Systems, Inc.
- J & M Microtek, Inc.
- Keil Electronics
- Lear Com Company
- Mandeno Granville Electronics, Ltd
- Micro Computer Control Corporation
Simulator/source code debugger ($79.95)
- Microtek Research
- Production Languages Corp.
- PseudoCorp
Emulators ($$$ - high, $$ - medium, $ - low priced)
- Advanced Micro Solutions $$
- Advanced Microcomputer Systems, Inc. $
- American Automation $$$ $$
- Applied Microsystems $$
- ChipTools (front end for Nohau's emulator)
- Cybernetic Micro Systems $
- Dunfield Development Systems $
plans for pseudo-ice using Dallas DS5000/DS2250
used together with their resident monitor and host debugger
- HBI Limited $
- Hewlett-Packard $$$
- HiTech Equipment Corp.
- Huntsville Microsystems $$
- Intel Corporation $$$
- Kontron Electronics $$$
- Mandeno Granville Electronics, Ltd
full line covering everything from the Atmel flash to the
Siemens powerhouse 80c517a
- MetaLink Corporation $$ $
- Nohau Corporation $$
- Orion Instruments $$$
- Philips $
DS-750 pseudo-ICE developed by Philips and CEIBO
real-time emulation and simulator debug mode
source-level debugging for C, PL/M, and assembler
programs 8xC75x parts
low cost - only $100
DOS and Windows versions available
- Signum Systems $$
- Sophia Systems $$$
- Zax Corporation
- Zitek Corporation $$$
(Contacts listed in FAQ below).
See:
Glossary
A glossary of some terms used here and in the cited works.
See also Terminology.
-
An
application
is some code (program or program fragment) that is executed or traced by one of the tools described here.
Note that an operating system is considered an application: it is thus possible to speak distinctly of the host and target operating systems.
The target operating system may itself be managing programs; these are considered to be a part of the OS ``application'' and are referred to as ``user-mode parts of the application''.
-
Emulation
is simulating a
target
machine using both software and a
host
machine that has special hardware to help speed the simulation.
See: [Tucker 65]; referenced by [Wilkes] as the original definition.
-
Fidelity
From Paul A. Fishwick
:
``Simulation fidelity'' is usually captured under the title ``Validation'' within the simulation literature, and within modeling literature in general. A good place to start with validation is the proceedings of the Winter Simulation Conference since the first part of the proceedings is dedicated to tutorials and introductions. Recently, Sargent had a tutorial on validation and you may find others as well.
-
The
host
machine is the ``real machine'' where the simulation or tracing is finally run. Compare to the
target
machine, which is the machine that is being simulated or traced.
Note that the host and the target may be the same machine, e.g. a V8 SPARC simulator that runs on a V8 SPARC. See also virtual host.
There are many other terms that can and have been used for host and target. For example, [Wilkes] refers to them as the ``object machine'' and ``subject machine''.
-
Static
analysis, optimization, etc. is performed using the static code but no runtime data.
Compare to dynamic or runtime operations, which may use program data and which may be interleaved with program execution.
Note that static execution is possible, but is limited to pieces that do not depend on program data or places where data values are speculated and a ``backup'' mechanism is available where the speculation was erroneous.
-
The
target
machine is the machine that is being simulated or traced. The target machine may be old hardware (e.g. machines that no longer exist), proposed hardware (e.g. machines that do not yet exist), or machines that do currently exist, but for which it is nonetheless valuable to perform simulation or tracing. Compare to the
host
machine, which is the real machine that actually executes the simulation and tracing code.
Note that the host and target may be the same machine, e.g. a V8 SPARC simulator that runs on a V8 SPARC.
There's reportedly an IBM paper that refers to the target as the ``guest'' machine.
-
The term
virtual host
may be used when there are several levels of simulation and tracing. For example,
SoftPC
can run on a SPARC and simulate an 8086; that simulated 8086 can then execute
Z80MU
, which runs on an 8086 and simulates a Z80. As far as Z80MU is concerned, it is running on an 8086 host; the simulated 8086 provided by SoftPC is thus a virtual host for Z80MU.
Note that the real host and the virtual host may be the same machine. For example, Shade runs on a V8 SPARC and simulates a V8 SPARC, and so Shade can simulate a V8 SPARC that is running Shade that is simulating a V8 SPARC that is running an application.
Bibliography
Titles
- ``The Accuracy of Trace-Driven Simulations of Multiprocessors''
- ``Address Tracing for Parallel Machines''
- ``An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures''
- ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools''
- ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools'' (2)
- ``ATOM: A System for Building Customized Program Analysis Tools''
- ``ATOM: A System for Building Customized Program Analysis Tools'' (2)
- ``ATUM: A New Technique for Capturing Address Traces Using Microcode''
- ``Binary Translation''
- ``Binary Translation'' (2)
- ``Branch Folding in the CRISP Microprocessor: Reducing Branch Delay to Zero''
- ``A Case for Runtime Code Generation''
- ``The Cerberus Multiprocessor''
- Steve Chamberlain, Personal communication
- ``Combining Hardware and Software to Provide an Improved Microprocessor''
- ``A Compact Intermediate Format for SimICS''
- ``Computer Organization and Design: The Hardware-Software Interface''
- ``The Cygnus Simulator Proposal''
- Decomp
- ``Decompilation of Binary Programs''
- ``Decompilation of Binary Programs'' (2)
- ``Design and Implementation of Dynascope, a Directing Platform for Compiled Programs''
- ``A Design For Efficient Simulation of a Multiprocessor''
- ``The Diagnosis Of Mistakes In Programmes on the EDSAC''
- ``The Dorado Smalltalk-80 Implementation: Hardware Architecture's Impact on Software Architecture''
- ``DOS on the Dock''
- ``The Dynamic Incremental Compiler of APL{$\backslash$}3000''
- ``Dynascope: A Tool for Program Directing''
- ``The Dynascope Directing Server: Design and Implementation''
- ``EEL: Machine-Independent Executable Editing''
- ``Efficient Implementation of the Smalltalk-80 System''
- ``An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes''
- ``Efficient Instruction Level Simulation of Computers''
- ``Efficient Memory Simulation in SimICS''
- ``Efficient Program Monitoring Techniques''
- ``Efficient Program Tracing''
- ``EMMY -- An Emulation System for User Microprogramming''
- ``Emulators and Emulation''
- ``Enhancement through Extension: The Extension Interpreter''
- ``An Environment for the Reverse Engineering of Executable Programs''
- [``Emulation of Large Computer Systems'' Tucker 65]
- ``Emulation: RISC's Secret Weapon''
- ``Engineering a RISC Compiler System''
- ``Evaluating Runtime-Compiled Value-Specific Optimizations''
- ``Extension and Software Development''
- ``Fast Accurate Simulation of Large Shared Memory Multiprocessors''
- ``Fast and Accurate Multiprocessor Simulation: The SimOS Approach''
- Fast Simulation of Computer Architectures
- FlashPort product literature
- ``Generation and Analysis of Very Long Address Traces''
- GNU debugger and simulator
- Stu Grossman, Personal communication
- ``The Growth of Interest in Microprogramming: A Literature Survey''
- ``The Hardware Architecture of the CRISP Microprocessor''
- ``How To Detect Self-Modifying Code During Instruction-Set Simulation''
- ``IDtrace -- A Tracing Tool for i486 Simulation''
- ``Interprocedural Data Flow Decompilation''
- ``Interprocedural Data Flow Decompilation'' (2)
- The Interpreter -- A Microprogrammable Building Block System
- ``Introduction to Shade''
- ``Introduction to Shadow''
- ``IMS Demonstrates x86 Emulation Chip''
- Gordon Irlam, Personal communication
- Earl Killian, Personal communication
- ``Long Address Traces from RISC Machines: Generation and Analysis''
- ``Low-cost Concurrent Checking of Pointer and Array Access in C Programs''
- ``Mable: A Technique for Efficient Machine Simulation''
- ``The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture''
- ``Memory Controller For A Microprocessor for Detecting A Failure Of Speculation On The Physical Nature Of A Component Being Addressed''
- ``Memory System Performance of UNIX on CC-NUMA Multiprocessors''
- ``Method And Apparatus for Aliasing Memory Data In An Advanced Microprocessor''
- ``A Methodology for Decompilation''
- ``Migrating a CISC Computer Family onto RISC via Object Code Translation''
- ``Mimic: A Fast S/370 Simulator''
- ``Mime: A Tool for Random Emulation and Feedback Trace Collection''
- ``Mint Tutorial and User Manual''
- ``MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors''
- ``Multiplexed Busses: The Endian Wars Continue''
- ``Multiprocessor Simulation and Tracing Using Tango''
- ``The New Jersey Machine-Code Toolkit''
- ``The New Jersey Machine-Code Toolkit'' (2)
- ``New Jersey Machine-Code Toolkit Architecture Specifications''
- ``New Jersey Machine-Code Toolkit Reference Manual''
- ``Optimally Profiling and Tracing Programs''
- ``Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches''
- ``Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback''
- ``Partial Translation''
- ``A Portable Interface for On-The-Fly Instruction Space Modification''
- ``A Practical System for Intermodule Code Optimization at Link-Time''
- ``A Practical System for Intermodule Code Optimization at Link-Time'' (2)
- ``PROTEUS: A High-Performance Parallel-Architecture Simulator''
- ``Purify: Fast Detection of Memory Leaks and Access Errors''
- ``Reverse Compilation Techniques''
- ``Rewriting Executable Files to Measure Program Behavior''
- ``The Rice Parallel Processing Testbed''
- ``RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors''
- ``The RISC Penalty''
- ``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions''
- ``Shade: A Fast Instruction-Set Simulator for Execution Profiling''
- ``Shade: A Fast Instruction-Set Simulator for Execution Profiling'' (2)
- ``Shade: A Fast Instruction-Set Simulator for Execution Profiling'' (3)
- ``The Shade User's Manual''
- ``Simon: A Simulator of Multicomputer Networks''
- ``SimOS: A Platform for Complete Workload Studies''
- Sleipnir -- An Instruction-Level Simulator Generator
- ``A Software High Performance APL Interpreter''
- ``Some Efficient Architecture Simulation Techniques''
- ``Some Efficient Techniques for Simulating Memory''
- ``SpixTools Introduction and User's Manual''
- ``A Structuring Algorithm for Decompilation''
- ``Structuring Decompiled Graphs''
- ``System Level Interpretation of the SPARC V8 Instruction Set Architecture''
- ``Talisman: Fast and Accurate Multicomputer Simulation''
- ``Tapeworm~II: A New Method for Measuring OS Effects on Memory''
- ``Techniques for Efficient Inline Tracing on a Shared-Memory Multiprocessor''
- ``Threaded Code Interpreter for Object Code''
- ``Method and Apparatus for Correcting Errors in Computer Systems''
- ``Tracing With Pixie''
- ``Transmeta Breaks x86 Low-Power Barrier''
- ``Trap-driven Simulation with Tapeworm II''
- ``TRAPEDS: Producing Traces for Multicomputers via Execution Driven Simulation''
- ``Two-Level Hybrid Interpreter/Native Code Execution for Combined Space-Time Program Efficiency''
- ``The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers''
- ``Z80MU''
- ``680x0 emulation on x86 (ARDI's syn68k used in Executor)''
Bibliography
-
[ASH 86]
\bibitem{ASH:86} Anant Agarwal, Richard L. Sites and Mark Horowitz, ``ATUM: A New Technique for Capturing Address Traces Using Microcode,'' Proceedings of the 13th International Symposium on Computer Architecture (ISCA-13), June 1986, pp.~119-127.
-
[AS 92]
\bibitem{AS:92} Kristy Andrews and Duane Sand, ``Migrating a CISC Computer Family onto RISC via Object Code Translation,'' Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), October 1992, pp.~213-222.
-
[BL 94]
\bibitem{BL:94} Thomas Ball and James R. Larus, ``Optimally Profiling and Tracing Programs,'' ACM Transactions on Programming Languages and Systems, (16)2, May 1994.
-
[Baumann 86]
\bibitem{Baumann:86} Robert A. Baumann, ``Z80MU,'' Byte Magazine, October 1986, pp.~203-216.
-
[Jeremiassen 00]
\bibitem{Jeremiassen:00} Tor E. Jeremiassen, ``Sleipnir --- An Instruction-Level Simulator Generator,'' International Conference on Computer Design, pp.~23--31. IEEE, 2000.
-
[Bedichek 90]
\bibitem{Bedichek:90} Robert Bedichek, ``Some Efficient Architecture Simulation Techniques,'' Winter 1990 USENIX Conference, January 1990, pp.~53-63. PostScript(tm) paper [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
-
[Bedichek 94]
\bibitem{Bedichek:94} Robert Bedichek, ``The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture,'' Doctoral Dissertation, University of Washington Department of Computer Science and Engineering technical report 94-06-06, 1994.
-
[Bedichek 95]
\bibitem{Bedichek:95}
@inproceedings(Bedichek:95, author = "Robert C. Bedichek", title = "Talisman: Fast and Accurate Multicomputer Simulation", booktitle="Proceedings of the 1995 ACM SIGMETRICS Conference on Modeling and Measurement of Computer Systems", month=May, year="1995", page=14--24 )
-
[BKLW 89]
\bibitem{BKLW:89} Anita Borg, R. E. Kessler, Georgia Lazana and David W. Wall, ``Long Address Traces from RISC Machines: Generation and Analysis,'' Digital Equipment Western Research Laboratory Research Report 89/14, (appears in shorter form as~\cite{BKW:90}) September 1989. Abstract/paper.
-
[BKW 90]
\bibitem{BKW:90} Anita Borg, R. E. Kessler and David W. Wall, ``Generation and Analysis of Very Long Address Traces,'' Proceedings of the 17th Annual Symposium on Computer Architecture (ISCA-17), May 1990, pp.~270-279.
-
[Boothe 92]
\bibitem{Boothe:92} Bob Boothe, ``Fast Accurate Simulation of Large Shared Memory Multiprocessors,'' technical report UCB/CSD 92/682, University of California, Berkeley, Computer Science Division, April 1992.
-
[BDCW 91]
\bibitem{BDCW:91} Eric A. Brewer, Chrysanthos N. Dellarocas, Adrian Colbrook and William E. Weihl, ``{\sc Proteus}: A High-Performance Parallel-Architecture Simulator,'' Massachusetts Institute of Technology technical report MIT/LCS/TR-516, 1991.
-
[BAD 87]
\bibitem{BAD:87} Eugene D. Brooks III, Timothy S. Axelrod and Gregory A. Darmohray, ``The Cerberus Multiprocessor,'' Lawrence Livermore National Laboratory technical report, Preprint UCRL-94914, 1987.
-
[Chamberlain 94]
\bibitem{Chamberlain:94} Steve Chamberlain, Personal communication, 1994.
-
[CUL 89]
\bibitem{CUL:89} Craig Chambers, David Ungar and Elgin Lee, ``An Efficient Implementation of {\sc Self}, a Dynamically-Typed Object-Oriented Language Based on Prototypes,'' OOPSLA '89 Proceedings, October 1989, pp.~49-70.
-
[CHRG 95]
%A John Chapin %A Steve Herrod %A Mendel Rosenblum %A Anoop Gupta %T Memory System Performance of UNIX on CC-NUMA Multiprocessors %J ACM SIGMETRICS '95 %P 1-13 %D May 1995 %W ftp://www-flash.stanford.edu/pub/hive/numa-os.ps
-
[CHKW 86]
\bibitem{CHKW:86} Fred Chow, A. M. Himelstein, Earl Killian and L. Weber, ``Engineering a RISC Compiler System,'' IEEE COMPCON, March 1986.
-
[CG 93]
\bibitem{CG:93} Cristina Cifuentes and K.J. Gough, ``A Methodology for Decompilation,'' In Proceedings of the XIX Conferencia Latinoamericana de Informatica, pp. 257-266, Buenos Aires, Argentina, August 1993. PostScript(tm) paper, PostScript(tm) paper. (Note: these papers may have moved to here.)
-
[CG 94]
\bibitem{CG:94} Cristina Cifuentes and K.J. Gough, ``Decompilation of Binary Programs,'' Technical report 3/94, Queensland University of Technology, School of Computing Science, 1994. PostScript(tm) paper. (Note: these papers may have moved to here.)
-
[CG 95]
\bibitem{CG:95} C. Cifuentes and K. John Gough, ``Decompilation of Binary Programs,'' Software -- Practice & Experience, July 1995. PostScript(tm) paper. Describes general techniques and an 80286/DOS to C converter.
-
[Cifuentes 93]
\bibitem{Cifuentes:93} C. Cifuentes, ``A Structuring Algorithm for Decompilation'', Proceedings of the XIX Conferencia Latinoamericana de Informatica, Aug 1993, Buenos Aires, pp. 267 - 276. PostScript(tm) paper
-
[Cifuentes 94a]
\bibitem{Cifuentes:94a} Cristina Cifuentes, ``Interprocedural Data Flow Decompilation,'' Technical report 4/94, Queensland University of Technology, School of Computing Science, 1994. PostScript(tm) paper. (Note: these papers may have moved to here.)
-
[Cifuentes 94b]
\bibitem{Cifuentes:94b} Cristina Cifuentes, ``Reverse Compilation Techniques,'' Doctoral dissertation, Queensland University of Technology, July 1994. PostScript(tm) paper (474MB).
-
[Cifuentes 94c]
\bibitem{Cifuentes:94c} C. Cifuentes, ``Structuring Decompiled Graphs,'' Technical Report 4/94, Queensland University of Technology, Faculty of Information Technology, April 1994. PostScript(tm)
-
[Cifuentes 95]
\bibitem{Cifuentes:95} C. Cifuentes, ``Interprocedural Data Flow Decompilation'', Journal of Programming Languages. In print, 1995. PostScript(tm) paper
-
[Cifuentes 95b]
\bibitem{Cifuentes:95b} C. Cifuentes, ``An Environment for the Reverse Engineering of Executable Programs''. To appear: Proceedings of the Asia-Pacific Software Engineering Conference (APSEC). IEEE. Brisbane, Australia. December 1995. PostScript(tm) paper
-
[Conte & Gimarc 95]
``Fast Simulation of Computer Architectures'', Thomas M. Conte and Charles E. Gimarc, Editors. Kluwer Academic Publishers, 1995. ISBN 0-7923-9593-X. See here for ordering information.
-
[CDKHLWZ 00]
%A Robert F. Cmelik %A David R. Ditzel %A Edmund J. Kelly %A Colin B. Hunter %A Douglas A. Laird %A Malcolm John Wing %A Gregorz B. Zyner %T Combining Hardware and Software to Provide an Improved Microprocessor %R United States Patent #US6031992
Available as of 2000/03 via http://www.patents.ibm.com/details?&pn=US06031992__
-
[98]
US06011908 01/04/2000 Gated store buffer for an advanced microprocessor. Available as of 2000/03.
-
[98]
US05958061 09/28/1999 Host microprocessor with apparatus for temporarily holding target processor state. Available as of 2000/03.
-
[Cmelik 93a]
\bibitem{Cmelik:93a} Robert F. Cmelik, ``Introduction to Shade,'' Sun Microsystems Laboratories, Incorporated, February 1993.
-
[Cmelik 93b]
\bibitem{Cmelik:93b} Robert F. Cmelik, ``The Shade User's Manual,'' Sun Microsystems Laboratories, Incorporated, February 1993.
-
[Cmelik 93c]
\bibitem{Cmelik:93c} Robert F. Cmelik, ``SpixTools Introduction and User's Manual,'' Sun Microsystems Laboratories, Incorporated, technical report TR93-6, February 1993. Html pointer
-
[CK 93]
\bibitem{CK:93} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Sun Microsystems Laboratories, Incorporated, and the University of Washington, technical report SMLI 93-12 and UWCSE 93-06-06, 1993. Html pointer, PostScript(tm) paper.
-
[CK 94]
\bibitem{CK:94} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1994, pp.~128-137. Html pointer, PostScript(tm) paper. [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
-
[CK 95]
\bibitem{CK:95} Robert F. Cmelik, and David Keppel, ``Shade: A Fast Instruction-Set Simulator for Execution Profiling,'' Appears as Chapter~2 of [Conte & Gimarc 95], pp.~5-46.
-
[CMMJS 88]
\bibitem{CMMJS:88} R. C. Covington, S. Madala, V. Mehta, J. R. Jump and J. B. Sinclair, ``The Rice Parallel Processing Testbed,'' Proceedings of the 1988 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1988, pp.~4-11.
-
[DLHH 94]
\bibitem{DLHH:94} Peter Davies, Philippe LaCroute, John Heinlein and Mark Horowitz, ``Mable: A Technique for Efficient Machine Simulation,'' Quantum Effect Design, Incorporated, and Stanford University technical report CSL-TR-94-636, 1994.
-
[DGK 91]
\bibitem{DGH:91} Helen Davis, Stephen R. Goldschmidt and John Hennessy, ``Multiprocessor Simulation and Tracing Using Tango,'' Proceedings of the 1991 International Conference on Parallel Processing (ICPP, Vol. II, Software), August 1991, pp.~II 99-107.
-
[Deutsch 83]
\bibitem{Deutsch:83} Peter Deutsch, ``The Dorado Smalltalk-80 Implementation: Hardware Architecture's Impact on Software Architecture,'' Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 1983, pp.~113-126. Review/summary by Pardo: Describes a mostly-microcode implementation of the ST-80 VM. Runs on a Xerox Dorado; fastest ST-80 implementation in its day. About 85-95% of the execution time is spent in the Dorado's ST-80 microcode.
-
[DS 84]
\bibitem{DS:84} Peter Deutsch and Alan M. Schiffman, ``Efficient Implementation of the Smalltalk-80 System,'' 11th Annual Symposium on Principles of Programming Languages (POPL-11), January 1984, pp.~297-302.
-
[DM 87]
\bibitem{DM:87} David R. Ditzel and Hubert R. McLellan ``Branch Folding in the CRISP Microprocessor: Reducing Branch Delay to Zero,'' Proceedings of the 14th Annual International Symposium on Computer Architecture; Computer Architecture News, Volume 15, Number 2, June 1987, pp.~2-9.
-
[DMB 87]
\bibitem{DMB:87} David R. Ditzel, Hubert R. McLellan and Alan D. Berenbaum, ``The Hardware Architecture of the CRISP Microprocessor,'' Proceedings of the 14th Annual International Symposium on Computer Architecture; Computer Architecture News, Volume 15, Number 2, June 1987, pp.~309-319.
-
[EKKL 90]
\bibitem{EKKL:90} Susan J. Eggers, David Keppel, Eric J. Koldinger and Henry M. Levy, ``Techniques for Efficient Inline Tracing on a Shared-Memory Multiprocessor,'' Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1990, pp.~37-47.
-
[ES 94]
\bibitem{ES:94} Alan Eustace and Amitabh Srivastava, ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools,'' Technical note TN-44, Digital Equipment Corporation Western Research Laboratory, July 1994. Html.
-
[ES 95]
\bibitem{ES:95} Alan Eustace and Amitabh Srivastava, ``ATOM: A Flexible Interface for Building High Performance Program Analysis Tools,'' Proceedings of the USENIX 1995 Technical Conference on UNIX and Advanced Computing Systems, New Orleans, Louisiana, January 16-20, 1995, pp. 303-314.
-
[EY 03]
\bibitem{EY:03} Hideki Eiraku and Yasushi Shinjo, ``Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions,'' Proceedings of BSDCon '03, San Mateo, CA, USA, 8-12 September 2003.
-
[Fujimoto 83]
\bibitem{Fujimoto:83} Richard M. Fujimoto, ``Simon: A Simulator of Multicomputer Networks,'' technical report UCB/CSD 83/137, ERL, University of California, Berkeley, 1983.
-
[FC 88]
\bibitem{FC:88} Richard M. Fujimoto, and William B. Campbell, ``Efficient Instruction Level Simulation of Computers,'' Transactions of The Society for Computer Simulation 5(2), April 1988, pp.~109-123.
-
[FP 94]
\bibitem{FP:94} FlashPort product literature, AT&T Bell Laboratories, August 1994.
-
[FN75]
\bibitem{FN:75} M. J. Flynn, C. Neuhauser, ``EMMY -- An Emulation System for User Microprogramming,'' National Computer Conference, 1975, pp.~85-89.
Review/summary by [Pardo](http://www.xsim.com/bib/index1.d/Index.html#Whos-Pardo):
EMMY is a user-microprogrammable machine designed to allow easy microprogramming in order to emulate other machines. The goal is to facilitate inter-architecture comparisons, analyze the effectiveness of architectures and compilers (optimizations) through ``software probes'', and to develop new architectures (in order to solve the ``semantic gap'' problem).
Features of special hardware include: field handling and selection, shifting, extensive bit testing and flexible specification of data paths (which they call ``residual control''; I don't understand the last one).
EMMY was designed with ``severe'' cost constraints. Goal is to fit whole CPU on one PC board. It is 32-bit, 4,096 words (32-bit) of microstore. There are 7 general-purpose registers and an eighth status register that includes condition codes, memory busy, status (halt/run/disable interrupts) and microinstruction PC.
Each microinstruction has ``a high degree of parallelism'' for a ``machine of this size'', by which I assume they mean something that is ~vertically microcoded. Each 32-bit microinstruction is, basically, a 2-wide VLIW, where one half performs logical and branch operations and the other branch and memory operations.
Implemented in TTL plus some MECL-10K (a Motorola ECL). The CPU is on a 12" x 15" card with micromemory and console logic on separate cards. The clock is 25ns; microstorage has 60ns to access, with a 200ns cycle time; microinstructions are typically executed every 200ns; they estimate it takes ~10 microseconds to execute each simulated instruction.
No discussion of multitasking/privilege/VM/etc.
-
[Gill 51]
\bibitem{Gill:51} S. Gill, ``The Diagnosis Of Mistakes In Programmes on the EDSAC,'' Proceedings of the Royal Society Series A Mathematical and Physical Sciences, 22 May 1951, (206)1087, pp.~538-554, Cambridge University Press London and New York. The scanned article is available via here. [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
-
[GDB 94]
\bibitem{GDB:94} GNU debugger and simulator, Internet Universal Resource Locator {\mbox{\tt ftp://prep.ai.mit.edu/pub/gnu}}, GDB distribution, {{\tt sim}} subdirectory. Note that (as of 1998) for each simulator included with GDB there is also a GCC target and a set of runtime libraries.
-
[GH 92]
\bibitem{GH:92} Stephen R. Goldschmidt and John L. Hennessy, ``The Accuracy of Trace-Driven Simulations of Multiprocessors,'' Stanford University Computer Systems Laboratory, technical report CSL-TR-92-546, September 1992.
-
[Granlund 94]
\bibitem{Granlund:94} Torbj\"{o}rn Granlund, ``The Cygnus Simulator Proposal,'' Cygnus Support, Mountain View, California, March 1994.
-
[Grossman 94]
\bibitem{Grossman:94} Stu Grossman, Personal communication, November 1994.
-
[Halfhill 94]
\bibitem{Halfhill:94} Tom R. Halfhill, ``Emulation: RISC's Secret Weapon,'' Byte, April 1994, pp.~119-130.
-
[Halfhill 00]
\bibitem{Halfhill:00} Tom R. Halfhill, ``Transmeta Breaks x86 Low-Power Barrier,'' Microprocessor Report, February 14, 2000.
Review/summary by [Pardo](http://www.xsim.com/bib/index1.d/Index.html#Whos-Pardo):
Basic architecture: a VLIW core plus software to translate x86 code to native VLIW code at run time.
Related tools include: FWB's SoftWindows for the Macintosh and Unix, Connectix's Virtual PC for the Macintosh, FX!32 for Alpha, Sun's HotSpot Java JIT.
All are "emulators" -- optimizing and caching translated code is a performance-enhancing technique that does not change the fundamentals of what is going on.
Crusoe has special hardware to assist emulation.
Related tools include International Meta Systems (IMS) 3250, a never-produced design to emulate x86, 68K and 6502 using customizable microcode plus ... (See uPR 5/6/92-03, ``Microcode Engine Offers Enhanced Emulation'').
Crusoe hardware is not specifically for x86 emulation; it is meant to boost performance of any non-native executables. For example, see Transmeta Java->VLIW demonstration. Still, it is probably more than coincidence that Crusoe chips have 80-bit FP, partial register writes, etc.
Crusoe's most important accomplishments: combining VLIW and emulation; HW/SW technology to vary processor voltage and frequency adaptively depending on workload (``LongRun''); new standard for low power among x86-compatible processors; sacrifice less performance to emulation than other software translators -- Crusoe chips are slower than other similar-frequency x86 processors, but because the core is optimized for low power, not high performance.
A 700MHz Crusoe is about 70% the speed of a 700-MHz Pentium III.
Integrated PCI controller, DRAM controllers, and other components of a traditional north bridge. Gains power consumption and efficiency of on-chip memory controllers.
Architectural features are of little importance to SW developers because nobody writes software for the native architecture.
Different Crusoe models are not compatible at the host level, but are compatible at the target level.
Discussion of LongRun. [[Not relevant to this page except to note that it is transparent. --pardo]]
Overhead of emulation is highly variable. Traditionally about 10:1. Modern emulators use caching, optimized recompilation for 4:1.
For mobile markets, x86 has historically been ousted by RISCs due to lower power consumption of RISCs.
x86 instruction decoding in software.
Current Intel/AMD processors convert x86 to micro-ops, while Cyrix and Centaur execute x86 directly. Micro-op translation adds a pipe stage and has more overhead (microcode call) for complex instructions.
"Micro-op" conversion in software saves control logic and simplifies chip design.
Can fix many kinds of bugs in software.
Emulation is at least as old as 1964, when IBM's S/360 provided emulation for IBM's older 1401s. [[Arguably, [Gill 51]. See also [Tucker 65] and [Wilkes 69] -- pardo]]
What is new: emulation is not an alternative, it is the whole strategy. [[Arguably, microcode is "the whole strategy" for many processors; but Crusoe uses dynamic compilation of microcode and skips most hardware used in a traditional microcode approach. --pardo]]
HW features to assist CMS: register files, many with shadows; gated store buffer; 80-bit FP registers; per-instruction "commit" bit; alias hardware to allow memory reference reordering; MMU protection of translated memory; special caches for translation software.
Shadow state lets Crusoe execute speculatively and out of order. On an exception, can roll back to most recent committed state by a simple copy. Preserves precise exception model.
On boot, reserves fixed block (e.g., 16MB) for translator and recompiled code.
Interpretation requires at least 12 clock cycles per x86 instruction.
Dynamically profile code and select it for translation, maybe optimization.
Granularity of translation is one or more basic blocks.
Familiar optimizations: loop unrolling, common subexpression elimination, loop-invariant code removal, ... Some are x86-specific: skip redundant sets of x86 condition codes. Some are VLIW-specific: combine multiple x86 blocks into one VLIW block.
Code expansion: by increasing the number of instructions to do the same work; or by translating compact x86 instructions to longer VLIW equivalents.
Example from [Transmeta 00]: 20 x86 instructions to 10 VLIW instructions but 23 "useful" VLIW packets plus 7 NOP packets. Total 32 subinstructions, 50% more than the original x86 code.
Further expansion because VLIW packets are 32b, but "typical" x86 instructions are 16-24b.
I$ expansion may be 33% to 150%.
Reduces effective size of caches including translation cache.
If translation cache flushing is frequent, it is hard to amortize the cost of translation.
Cost of extra RAM for translation cache.
No free lunch -- tradeoff to get lower power.
It is a VLIW, efficiency depends on scheduling -- no dynamic reordering hardware. For comparison, the IA-64 group has been refining Multiflow's VLIW compiler for years; Transmeta started from scratch, and the compiler has to run in real time, not overnight.
Factor in Crusoe's favor: monitor actual usage.
At time of writing this article, all results are based on Transmeta claims, no independent results.
Crusoe's technology defies benchmarking: benchmarks too heavy on repetitive loops overestimate performance; conversely, low-repeat benchmarks may underestimate real performance. Battery tests are similar: unless they mirror real-life use closely, they will not represent what average users can expect.
Emulation is no longer new technology: Java JIT compilers, Windows emulators, dozens of emulators including otherwise-dead machines like Apple II, Atari 2600, Commodore 64.
-
[Haygood 1999]
%A Bill Haygood %T Emulators and Emulation %J Self (http://www.haygood.org/~bill/emul/index.html) %D 1999
Review/summary by Pardo: Available here (see also here, reprinted with permission). See ``Bill Haygood'' for contact information.
Briefly describes implementation techniques for three simulators used to simulate the PDP-8/e, Zilog Z80A, and DEC LSI-11.
Implementation tradeoffs favor large lookup tables to reduce computation. Instruction handlers are dispatched using a table indexed by opcode; condition codes on the Z-80 emulator are computed by indexing a 2^16-entry table by both 8-bit operands.
Details of finding out what the actual operations do.
PDP-8a FP coprocessor. It uses IEEE host hardware with 6 bits less precision and a smaller exponent than the PDP-8a.
-
[HJ 92]
\bibitem{HJ:92} Reed Hastings and Bob Joyce, ``Purify: Fast Detection of Memory Leaks and Access Errors,'' Proceedings of the Winter USENIX Conference, January 1992, pp.~125-136.
-
[HP 93]
\bibitem{HP:93} John Hennessy and David Patterson, ``Computer Organization and Design: The Hardware-Software Interface'' (Appendix A, by James R. Larus), Morgan Kaufmann, 1993.
-
[HCU 91]
\bibitem{HCU:91} Urs H\"{o}lzle, Craig Chambers and David Ungar, ``Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches,'' Proceedings of the European Conference on Object-Oriented Programming (ECOOP), July 1991, pp.~21-38.
-
[HU 94]
\bibitem{HU:94} Urs H\"{o}lzle and David Ungar, ``Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback,'' Proceedings of the 1994 ACM Conference on Programming Language Design and Implementation (PLDI), June 1994, pp.~326-335.
-
[Hsu 89]
\bibitem{Hsu:89} Peter Hsu, ``Introduction to Shadow,'' Sun Microsystems, Incorporated, July 1989.
-
[IMS 94]
\bibitem{IMS:94} ``IMS Demonstrates x86 Emulation Chip,'' Microprocessor Report, 9 May 1994, pp.~5 and~15.
-
[Irlam 93]
\bibitem{Irlam:93} Gordon Irlam, Personal communication, February 1993.
-
[James 90]
\bibitem{James:90} David James, ``Multiplexed Busses: The Endian Wars Continue,'' IEEE Micro Magazine, June 1990, pp.~9-22.
-
[Johnston 79]
\bibitem{Johnston:79} Ronald L. Johnston, ``The Dynamic Incremental Compiler of APL{$\backslash$}3000,'' APL Quote Quad 9(4), Association for Computing Machinery (ACM), June 1979, pp.~82-87.
-
[KCW 98]
%A Edmund J. Kelly %A Robert F. Cmelik %A Malcolm John Wing %T Memory Controller For A Microprocessor for Detecting A Failure Of Speculation On The Physical Nature Of A Component Being Addressed %D 1998/11/03 %R United States Patent #05832205
Available as of 2000/03 via http://www.patents.ibm.com/details?pn=US05832205__.
-
[Keppel 91]
\bibitem{Keppel:91} David Keppel, ``A Portable Interface for On-The-Fly Instruction Space Modification,'' Proceedings of the 1991 Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), April 1991, pp.~86-95 (source code is also available via anonymous ftp from ftp://ftp.cs.washington.edu/pub/pardo/fly-1.1.tar.gz). PostScript(tm) paper.
-
[KEH 91]
\bibitem{KEH:91} David Keppel, Susan J. Eggers and Robert R. Henry, ``A Case for Runtime Code Generation,'' University of Washington Computer Science and Engineering technical report UWCSE TR 91-11-04, November 1991. Html pointer [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
-
[KEH 93]
\bibitem{KEH:93} David Keppel, Susan J. Eggers and Robert R. Henry, ``Evaluating Runtime-Compiled Value-Specific Optimizations,'' University of Washington Computer Science and Engineering technical report UWCSE TR 93-11-02, November 1993. Html pointer [Link broken, please e-mail <pardo@xsim.com> to get it fixed.]
-
[Killian 94]
\bibitem{Killian:94} Earl Killian, Personal communication, February 1994.
-
[KKB 98]
%A Alex Klaiber %A David Keppel %A Robert Bedichek %T Method and Apparatus for Correcting Errors in Computer Systems %D 18 May 1999 %R United States Patent #05905855
Available as of 2000/03 via http://www.patents.ibm.com/details?pn=US05905855__
-
[LOS 86]
\bibitem{LOS:86} T. G. Lang, J. T. O'Quin II and R. O. Simpson, ``Threaded Code Interpreter for Object Code,'' IBM Technical Disclosure Bulletin, 28(10), March 1986, pp.~4238-4241.
-
[Larus 93]
\bibitem{Larus:93} James R. Larus, ``Efficient Program Tracing,'' IEEE Computer 26(5), May 1993, pp.~52-61.
-
[LB 94]
\bibitem{LB:94} James R. Larus and Thomas Ball, ``Rewriting Executable Files to Measure Program Behavior,'' Software -- Practice and Experience 24(1), February 1994, pp.~197-218.
-
[LS 95]
\bibitem{LS:95} James R. Larus and Eric Schnarr, ``EEL: Machine-Independent Executable Editing,'' to appear: SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 291-300, June 1995. PostScript(tm) paper, or an alternate site for the same paper.
-
[Magnusson 93a]
\bibitem{Magnusson:93a} Peter S. Magnusson, ``A Design For Efficient Simulation of a Multiprocessor,'' Proceedings of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), La Jolla, California, January 1993, pp.~69-78. PostScript(tm) paper.
-
[Magnusson 93b]
\bibitem{Magnusson:93b} Peter S. Magnusson, ``Partial Translation,'' Swedish Institute for Computer Science technical report T93:05, 1993. PostScript(tm) paper.
-
[MS 94]
\bibitem{MS:94a} Peter S. Magnusson, and David Samuelsson, ``A Compact Intermediate Format for SimICS,'' Swedish Institute of Computer Science technical report R94:17, September 1994. PostScript(tm) paper.
-
[MW 94]
\bibitem{MW:94} Peter S. Magnusson, and Bengt Werner, ``Some Efficient Techniques for Simulating Memory,'' Swedish Institute of Computer Science technical report R94:16, September 1994. PostScript(tm) paper.
-
[MW 95]
\bibitem{MW:95} Peter S. Magnusson, and Bengt Werner, ``Efficient Memory Simulation in SimICS''. In 28th International Annual Simulation Symposium, Phoenix, AZ. April 1995.
-
[Matthews 94]
\bibitem{Matt:94} Clifford T. Matthews, ``680x0 emulation on x86 (ARDI's syn68k used in Executor)'' USENET \code{comp.emulators.misc} posting, 3 November, 1994. plain text document, plain text document.
-
[May 87]
\bibitem{May:87} Cathy May, ``Mimic: A Fast S/370 Simulator,'' Proceedings of the ACM SIGPLAN 1987 Symposium on Interpreters and Interpretive Techniques; SIGPLAN Notices 22(7), June 1987, pp.~1-13.
-
[Nielsen 91]
\bibitem{Nielsen:91} Robert D. Nielsen, ``DOS on the Dock,'' NeXTWorld, March/April 1991, pp.~50-51.
-
[NG 87]
\bibitem{NG:87} David Notkin and William G. Griswold, ``Enhancement through Extension: The Extension Interpreter,'' Proceedings of the ACM SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques, June 1987, pp.~45-55.
-
[NG 88]
\bibitem{NG:88} David Notkin and William G. Griswold, ``Extension and Software Development,'' Proceedings of the 10th International Conference on Software Engineering, Singapore, April 1988, pp.~274-283.
-
[Kep 03]
\bibitem{Keppel:03} David Keppel, ``How to Detect Self-Modifying Code During Instruction-Set Simulation'', April, 2003. Available as of 2003/10 from ``xsim.com'' (papers).
-
[PM 94]
\bibitem{PM:94} Jim Pierce and Trevor Mudge, ``IDtrace -- A Tracing Tool for i486 Simulation,'' Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), January 1994.
-
[PRA 97]
\bibitem{PRA:97} Vijay S. Pai, Parthasarathy Ranganathan, and Sarita V. Adve. "RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors". In Proceedings of the Third Workshop on Computer Architecture Education. February 1997.
-
[Patil 96]
@TECHREPORT{Patil96, TITLE = "Efficient {P}rogram {M}onitoring {T}echniques", AUTHOR = "Harish Patil", INSTITUTION = "Computer Sciences department, University of Wisconsin", YEAR = 1996, MONTH = "July", TYPE = "{TR} 1320: Ph.D. Dissertation", ADDRESS = "Madison, Wisconsin", }
-
[Patil 97]
@ARTICLE{patil97, TITLE = "Low-cost, Concurrent Checking of Pointer and Array Accesses in {C} Programs", JOURNAL = "Software - Practice and Experience", AUTHOR = "Harish Patil and Charles Fischer", VOLUME = 27, NUMBER = 1, YEAR = 1997, MONTH = "January", PAGES = "87-110", }
-
[PM 94]
\bibitem{PM:94} Jim Pierce and Trevor Mudge, ``IDtrace -- A Tracing Tool for i486 Simulation,'' Technical report 203-94, University of Michigan, March 1994. PostScript(tm) paper
-
[Pittman 87]
\bibitem{Pittman:87} Thomas Pittman, ``Two-Level Hybrid Interpreter/Native Code Execution for Combined Space-Time Program Efficiency,'' Proceedings of the 1987 ACM SIGPLAN Symposium on Interpreters and Interpretive Techniques, June 1987, pp.~150-152.
-
[Pittman 95]
\bibitem{Pittman:95} Thomas Pittman, ``The RISC Penalty,'' IEEE Micro, December 1995, pp.~5, 76-80.
Brief summary by [Pardo](http://www.xsim.com/bib/index1.d/Index.html#Whos-Pardo): The paper analyzes the costs of RISC due to higher (instruction) cache miss rates. Demonstrated by comparing the inner loop code for an interpretive ([ddi](http://www.xsim.com/bib/index1.d/Index.html#Category-sim-int-ddi)) processor emulator to the inner loop code for a [dynamic cross-compiler](http://www.xsim.com/bib/index1.d/Index.html#Category-sim-int-dcc). With perfect cache hit ratios, the former would take 61 cycles while the latter would take 18. However, due to cache miss costs, the ``18-cycle'' version took longer to run.
-
[RF 94a]
@techreport{ramsey:tk-architecture, author="Norman Ramsey and Mary F. Fernandez", title="The {New} {Jersey} Machine-Code Toolkit", number="TR-469-94", institution="Department of Computer Science, Princeton University", year="1994" }
PostScript(tm) paper, conference paper
-
[RF 94b]
@techreport{ramsey:tk-architecture, author="Norman Ramsey and Mary F. Fernandez", title="{New} {Jersey} {Machine-Code} {Toolkit} Architecture Specifications", number="TR-470-94", institution="Department of Computer Science, Princeton University", month="October", year="1994" }
WWW page, PostScript(tm) paper.
-
[RF 94c]
@techreport{ramsey:tk-reference, author="Norman Ramsey and Mary F. Fernandez", title="{New} {Jersey} {Machine-Code} {Toolkit} Reference Manual", number="TR-471-94", institution="Department of Computer Science, Princeton University", month="October", year="1994" }
WWW page, PostScript(tm) paper.
-
[RF 95]
\bibitem{Ramsey:95} Norman Ramsey and Mary F. Fernandez, ``The {New} {Jersey} Machine-Code Toolkit,'' Proceedings of the Winter 1995 USENIX Conference, New Orleans, Louisiana, January 1995, pp.~289-302.
@inproceedings{ramsey:jersey, refereed=1, author="Norman Ramsey and Mary F. Fernandez", title="The {New} {Jersey} Machine-Code Toolkit", booktitle="Proceedings of the 1995 USENIX Technical Conference", address="New Orleans, LA", pages="289-302", month=January, year="1995" }
-
[RFD 72]
\bibitem{RFD:72} E. W. Reigel, U. Faber, D. A. Fisher, ``The Interpreter -- A Microprogrammable Building Block System,'' Spring Joint Computer Conference, 1972, pp.~705-723.
-
[RHLLLW]
\bibitem{RHLLLW:93} Steven K. Reinhardt, Mark D. Hill, James R. Larus, Alvy R. Lebeck, J. C. Lewis and David A. Wood, ``The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers,'' Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, June 1993, pp.~48-60.
-
[Reuter 8X]
\bibitem{Reuter:8X} Jim Reuter, ``Decomp,'' circa 1985. [source code available from ``ftp://ftp.cs.washington.edu/pub/decomp.tar.Z''](ftp://ftp.cs.washington.edu/pub/decomp.tar.Z), and [sample inputs available from ``ftp://ftp.cs.washington.edu/pub/pardo/decomp-samples.tar.gz''](ftp://ftp.cs.washington.edu/pub/decomp-samples.tar.gz).
-
[RHWG 95]
\bibitem{RHWG:95} Mendel Rosenblum, Stephen A. Herrod, Emmett Witchel and Anoop Gupta, ``Complete Computer System Simulation: The SimOS Approach,'' IEEE Parallel and Distributed Technology: Systems and Applications, 3(4):34-43, Winter 1995. Abstract, Compressed PostScript® (57 KB).
-
[RW 94]
\bibitem{RW:94} Mendel Rosenblum and Emmett Witchel, ``SimOS: A Platform for Complete Workload Studies,'' Personal communication (submitted for publication), November 1994.
-
[SW 79]
\bibitem{SW:79} H. J. Saal and Z. Weiss, ``A Software High Performance APL Interpreter,'' APL Quote Quad 9(4), June 1979, pp.~74-81.
-
[Samuelsson 94]
\bibitem{Samuelsson:94} David Samuelsson, ``System Level Interpretation of the SPARC V8 Instruction Set Architecture,'' Research report 94:23, Swedish Institute of Computer Science, 1994.
-
[Sathaye 94]
\bibitem{Sath:94} Sumedh W. Sathaye, ``Mime: A Tool for Random Emulation and Feedback Trace Collection,'' Masters thesis, Department of Electrical and Computer Engineering, University of South Carolina, Columbia, South Carolina, 1994.
-
[SE 93]
\bibitem{SE:93} Gabriel M. Silberman and Kemal Ebcio\u{g}lu, ``An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures,'' IEEE Computer, June 1993, pp.~39-56.
-
[SCKMR 92]
\bibitem{SCKMR:92} Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks and Scott G. Robinson, ``Binary Translation,'' Digital Technical Journal Vol. 4 No. 4 Special Issue 1992. Html paper, PostScript(tm) paper.
-
[SCKMR 93]
\bibitem{SCKMR:93} Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks and Scott G. Robinson, ``Binary Translation,'' Communications of The ACM (CACM) 36(2), February 1993, pp.~69-81.
-
[Smith 91]
\bibitem{Smith:91} M. D. Smith, ``Tracing With Pixie,'' Technical Report CSL-TR-91-497, Stanford University, Computer Systems Laboratory, November 1991. PostScript(tm).
-
[Sosic 92]
\bibitem{Sosic:92} Rok Sosi\v{c}, ``Dynascope: A Tool for Program Directing,'' Proceedings of the 1992 ACM Conference on Programming Language Design and Implementation (PLDI), June 1992, pp.~12-21.
-
[Sosic 94]
\bibitem{Sosic:94} Rok Sosi\v{c}, ``Design and Implementation of Dynascope, a Directing Platform for Compiled Programs,'' technical report CIT-94-7, School of Computing and Information Technology, Griffith University, 1994.
-
[Sosic 94b]
\bibitem{Sosic:94b} Rok Sosi\v{c}, ``The Dynascope Directing Server: Design and Implementation,'' Computing Systems, 8(2):107-134, Spring 1994.
-
[SE 94a]
\bibitem{SE:94a} Amitabh Srivastava and Alan Eustace, ``ATOM: A System for Building Customized Program Analysis Tools,'' Research Report 94/2, Digital Equipment Corporation Western Research Laboratory, March 1994.
-
[SE 94b]
\bibitem{SE:94b} Amitabh Srivastava and Alan Eustace, ``ATOM: A System for Building Customized Program Analysis Tools,'' Proceedings of the 1994 ACM Conference on Programming Language Design and Implementation (PLDI), June 1994, pp.~196-205.
-
[SW 92]
\bibitem{SW:92} Amitabh Srivastava and David W. Wall, ``A Practical System for Intermodule Code Optimization at Link-Time,'' Research Report 92/6, Digital Equipment Corporation Western Research Laboratory, December 1992.
-
[SW 93]
\bibitem{SW:93} Amitabh Srivastava and David W. Wall, ``A Practical System for Intermodule Code Optimization at Link-Time,'' Journal of Programming Languages, March 1993.
-
[SF 89]
\bibitem{SF:89} Craig B. Stunkel and W. Kent Fuchs, ``{\sc Trapeds}: Producing Traces for Multicomputers via Execution Driven Simulation,'' ACM Performance Evaluation Review, May 1989, pp.~70-78.
-
[SJF 91]
\bibitem{SJF:91} Craig B. Stunkel, Bob Janssens and W. Kent Fuchs, ``Address Tracing for Parallel Machines,'' IEEE Computer 24(1), January 1991, pp.~31-38.
-
[Tucker 65]
%A S. G. Tucker %T Emulation of Large Systems %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 753-761
ABSTRACT: The conversion problem and a new technique called emulation are discussed. The technique of emulation is developed and includes sections on both the Central Processing Unit (CPU) and the Input/Output (I/O) unit. This general treatment is followed by three sections that describe in greater detail the implementation of compatibility features using the emulation techniques for the IBM 7074, 7080 and 7090 systems on the IBM System/360.
Cited by [Wilkes 69]. Pardo has a copy.
-
[UNMS 94a]
\bibitem{UNMS:94a} Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest, ``Tapeworm~II: A New Method for Measuring OS Effects on Memory Architecture Performance,'' Technical Report, University of Michigan, Electrical Engineering and Computer Science Department, May 1994.
-
[UNMS 94b]
\bibitem{UNMS:94b} Richard Uhlig, David Nagle, Trevor Mudge and Stuart Sechrest, ``Trap-driven Simulation with Tapeworm II,'' Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), San Jose, California, October 5-7, 1994.
-
[Veenstra 93]
\bibitem{Veenstra:93} Jack E. Veenstra, ``Mint Tutorial and User Manual,'' University of Rochester Computer Science Department, technical report 452, May 1993.
-
[VF 94]
\bibitem{VF:94} Jack E. Veenstra and Robert J. Fowler, ``{\sc Mint}: A Front End for Efficient Simulation of Shared-Memory Multiprocessors,'' Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), January 1994, pp.~201-207.
-
[Wilkes 69]
\bibitem{Wilkes:69} Maurice V. Wilkes, ``The Growth of Interest in Microprogramming: A Literature Survey,'' Computing Surveys 1(3):139-145, September 1969.
Excerpts (pg 142):
1965 and 1966 saw papers on an entirely new subject, namely the emulation of one computer by another [42-44]. Tucker [38] defined an emulator as a package that includes special hardware and a complementary set of software routines. An emulator runs five or even ten times as fast as a purely software simulator. Tucker goes on to discuss the design of emulators for large systems. It is only in very unusual circumstances that it is practicable to write a microprogram that implements directly on the object machine [[host --pardo]] the instruction set of the subject machine [[target --pardo]]; this is because of differences in word length, processor structure, and so on. Tucker recommends that in order to design an emulator, one should first study a simulator and see in what areas it spends most of its time. This analysis will generally lead to the identification, as candidates for microprogramming, of a group of special instructions which are related not to specific instructions of the subject machine but rather to problems common to many such instructions. The most important of these special instructions is likely to be one that performs a similar function to the main loop in an interpreter and sends control to an appropriate software simulator for each instruction interpreted. Another will probably be an instruction that performs a conditional test in the way that it is performed on the subject machine. It may also be worthwhile adding special instructions to deal with such instructions of the subject machine as are difficult to simulate. If this procedure is carried to the extreme, the software simulation disappears altogether and we have a full hardware feature. Full hardware features are economically practicable only for small machines (McCormack, et al [41]).
Sometimes the design of an emulator can be much simplified if a small change or addition is made to the register interconnection logic of the object machine; an example, cited by Tucker, is the addition of a small amount of logic to the IBM System 360/65 processor in order to facilitate the emulation of overflow detection on IBM 7090 shifts. Such additions (if made) can enable the efficiency of the emulator as a whole to be improved to a useful extent. Sometimes more substantial additions are worthwhile, such as hardware registers intended to correspond to particular registers on the subject machine. By careful design of an emulator it is even possible to handle correctly certain types of function that are time-dependent on the subject machine. McCormack, et al [41] gives an example of a case in which hardware additions to the object machine were necessary in order to enable it, when running under the emulator, to handle data at the rates required by certain peripheral devices. It is generally found that, in order to accommodate an emulator, it is necessary to provide a section to the read-only memory approximately equal in size to the section that holds the microprogram for the basic instruction set. There is no doubt that emulators will be of great economic importance to the computer industry in the future, and the fact that they can be provided relatively easily on a microprogrammed computer is an argument in favor of microprogramming as a design method.... Opler [45] has suggested the term firmware [... and ...] suggests that firmware may take its place along with software and hardware as one of the main commodities of the computer field.
[38] Tucker, S. G. Emulation of large systems, Comm. ACM 8, 12 (Dec. 1965), 753-761. (See [Tucker 65].)
[41] McCormack, M. A., Schansman, T. T., and Womack, K. K. 1401 compatability feature on the IBM system/360 model 30. Comm. ACM 8, 12 (Dec. 1965), 773-776.
[42] Benjamin, R. I. The Spectra 70/45 emulator for the RCA 301. Comm. ACM 8, 12 (Dec. 1965), 748-752.
[43] Green, J. Microprogramming, emulators and programming languages. Comm. ACM 9, 3 (Mar. 1966), 230-232.
[44] Campbell, C. R., and Neilson, D. A. Microprogramming the Spectra 70/35, Datamation 12, 9 (1966), 64.
[45] Opler, A. Fourth generation software. Datamation 13, 1 (1967), 22.
-
[Wilner 72]
\bibitem{Wilner:72} W. T. Wilner, ``Design of the Burroughs B1700,'' Fall Joint Computer Conference, 1972, 489-497.
Excerpts:
The B1700 uses user-programmed microcode that is dynamically swapped to and from main memory. The goal is to support multitasking in an environment where there are many macro-machines, each tailored to a particular high-level language or machine simulation.
(pp. 489-490) The design rationale is that good performance is achieved only if programs are executed by a machine that closely resembles the language (the ``semantic gap'' argument). Includes target machine emulation as a language.
(pg. 491) The B1700 includes: bit-level memory addressability, microcode (conceptually) executed from main memory to allow interleaved program execution; 16-bit microinstructions; 2/4/6 MHz micro-issue; 14-53 microseconds to reconfigure (swap instruction sets) at 6MHz; compilers ``may not'' generate code for the microengine, thus ensuring portability.
(pg. 492) All memory accesses are via a transliteration processor that maps variable-width bit-addressed references onto main memory.
(pg. 493) Switching interpreters between instructions is useful not only for context switches but also for switching e.g. between traced and non-traced execution.
(pg. 493) Microcode logically executes out of main memory, but may actually use processor-local memory.
(pg. 495) Running-time comparisons indicate that an RPG-II program runs an order of magnitude faster than on an IBM System/3 that has a similar lease rate (memory bound); another set of (banking) benchmarks ran in 50% to 110% of the System/3 time (not clear if I/O bound, but the B1700 had a slower card reader). Notes that B1700 supports bit-variable addressing, segmented virtual memory, multitasking, ...
Microarchitecture not clearly described. Note that it uses 16-bit (vertical) microcode. Might actually be a two-level scheme as in The Interpreter.
-
[WK 99]
%A Malcolm J. Wing %A Edmund C. Kelly %T Method And Apparatus for Aliasing Memory Data In An Advanced Microprocessor %D 20 July 1999 %R United States Patent #05926832
Available as of 2000/03 via http://www.patents.ibm.com/details?pn=US05926832__
Comments on Some Of The Papers
- EDSAC Debug introduces several things including: multiple strategy execution that switches between simulation for tracing and direct execution for speed; and displaced execution that simulates an instruction by copying it from its original location. EDSAC Debug also describes a fairly modern debugger; the paper is from 1951!
- ST-80 introduces the idea of dynamically cross-compiling groups of instructions from the simulated target machine to run directly on the host machine. Mimic extends this idea to work on a real (non-virtual) instruction set and shows how several important optimizations can be performed using runtime information.
- g88 shows how a simple and portable interpreter can simulate a processor with modest slowdown. g88 also introduces fast sophisticated address translation used to implement multiple-domain and system-mode simulation. One of the SimICS papers shows a construction for optimizing and abstracting address translation. SimOS shows how to use the host's MMU to perform address translation even faster.
- CRISP introduces the idea of dynamic compilation of machine instructions to microcode.
- Kx10 shows how to simulate a 36-bit DEC-10 on machines that have 32-bit words.
- Mime introduces the notion of checkpointing and rollback of simulator state, used to simulate and trace the effects of e.g. speculative execution.
- OM describes tool-building tools for tracing.
- Talisman introduces fast approximate timing simulation that can estimate program running time with high accuracy (within a few percent; turning off e.g. dynamic RAM refresh simulation reduces accuracy dramatically), while adding only several tens of instructions to the cost of each simulated instruction.
- Mshade (a component of SimOS) and T2 use dynamic cross-compilation to implement system mode code. There is often less benefit over interpretation because system code tends to have checking overheads that narrow the gap between translation and interpretation.
- WWT applies the idea of exploiting hardware features to use distributed shared virtual memory for user-mode multiprocessor simulation and tracing. Tapeworm II extends the idea to use general memory reference exceptions for general user and system-mode tracing.
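The checkpoint-and-rollback idea credited to Mime in the list above can be sketched in a few lines of Python. This is a toy illustration only, not Mime's implementation: the `ToySim` class, its register names, and the snapshot-by-copy strategy are all assumptions made for the example.

```python
import copy


class ToySim:
    """Hypothetical toy simulator state: a few registers plus sparse memory."""

    def __init__(self):
        self.regs = {"pc": 0, "r1": 0}
        self.mem = {}            # address -> value
        self._checkpoints = []   # stack of saved (regs, mem) snapshots

    def checkpoint(self):
        # Save a deep copy so later writes cannot alias the snapshot.
        self._checkpoints.append(copy.deepcopy((self.regs, self.mem)))

    def rollback(self):
        # Discard everything done since the most recent checkpoint.
        self.regs, self.mem = self._checkpoints.pop()

    def commit(self):
        # Speculation turned out to be correct: drop the snapshot.
        self._checkpoints.pop()

    def store(self, addr, value):
        # Simulate one store instruction and advance the program counter.
        self.mem[addr] = value
        self.regs["pc"] += 4


sim = ToySim()
sim.store(0x100, 1)   # committed work
sim.checkpoint()      # e.g. at a predicted branch
sim.store(0x104, 2)   # speculative work, which could still be traced
sim.rollback()        # misprediction: undo the speculative store
```

A real tool would more likely log only the locations it modifies rather than copy all state; full copying is used here just to keep the sketch short.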
Who's Who
This really isn't who's who -- it's a list of some of the people who were easy to find and who seem to spend a lot of their time doing simulation and tracing.
The Lists
A list of names of some people doing simulation and tracing.
- Anant Agarwal
- Kristy Andrews
- Thomas Ball
- Robert A. Baumann
- Robert Bedichek
- Anita Borg
- Bob Boothe
- Eric A. Brewer
- Steve Chamberlain
- Lee Kiat Chia
- Cristina Cifuentes
- Bob Cmelik
- Thomas M. Conte
- Don Eastlake
- Alan Eustace
- U. Faber
- D. A. Fisher
- David Keppel: see Pardo
- Richard M. Fujimoto
- Torbjorn Granlund
- Bill Haygood
- Tom R. Halfhill
- Steve Herrod
- Mark Horowitz
- Tor Jeremiassen
- R. E. Kessler
- James R. Larus
- Georgia Lazana
- Peter Magnusson
- Vijay Pai
- Pardo
- Russell Quong
- Norman Ramsey
- E. W. Reigel
- Steven K. Reinhardt
- Mendel Rosenblum
- Duane Sand
- Richard L. Sites
- M. D. Smith
- Amitabh Srivastava
- Rok Sosic
- Richard M. Stallman
- Thai Wey Then
- David Wall
- Emmett Witchel
- Marinos "nino" Yannikos
- Maurice Wilkes
Details About Who's Who
See here for a list of names.
Anant Agarwal
Kristy Andrews
Kristy Andrews (``kristy.andrews@compaq.com'' as of 1999/09) is a co-developer of Accelerator.
Thomas Ball
Thomas Ball (``tball@research.bell-labs.com'' as of 1999) works on improving software production through domain-specific languages, automated program analysis, and software visualization. He has helped build tools such as qp/qpt and the Hot Path Browser.
See also The Twelve Days of Christmas, Reverse-Engineered.
Robert A. Baumann
Robert Bedichek
Robert Bedichek (``robert@bedicheck.org'' as of 1999/07) wrote the [g88](http://www.xsim.com/bib/index1.d/Index.html#Tool-g88) simulator while at Tektronix, Talisman while at the University of Washington, and T2 while at MIT.
Robert Bedichek is interested in computer architecture and operating systems and has built Meerkat, a modestly-scalable multiple-processor machine. The lack of good systems analysis tools, however, keeps driving him back to tool-building.
Anita Borg
Bob Boothe
Eric A. Brewer
Steve Chamberlain
Steve Chamberlain (``sac *** pobox ; com'', as of 1999/07) has written a series of amazing virtual machines including SoftPC and the GNU Simulators. He has also done a lot of work on BFD, GAS, GCC, GLD, etc. for a wide variety of machines.
Lee Kiat Chia
Lee Kiat Chia (chia@ecn.purdue.edu as of 1995/06) is part of Purdue's Binary Emulation and Translation group.
Cristina Cifuentes
Cristina Cifuentes (``cristina@csee.uq.edu.au'' or ``cristina@it.uq.edu.au'', both as of 1998) has studied decompilation extensively and wrote [dcc](http://www.xsim.com/bib/index1.d/Index.html#Tool-dcc). Cristina was previously at UTAS ([here](http://www.cs.utas.edu.au/Staff/Cifuentes,Cristina/CNC.html), ``C.N.Cifuentes@cs.utas.edu.au'' as of 1994).
Bob Cmelik
Bob Cmelik (``Bob.Cmelik@Sun.com'' as of 1995/03 [Link broken, please e-mail pardo@xsim.com to get it fixed.]) wrote the Spix static instrumentation tools and the Shade simulation and tracing tool while at Sun Microsystems, and helped to design and implement Crusoe at Transmeta.
Thomas M. Conte
Thomas M. Conte (conte@ncsu.edu as of 2001/08/31) is one of the editors of [Conte & Gimarc 95].
Don Eastlake
Don Eastlake (dee@world.std.com as of July 1995) wrote the instruction execution engine of 11SIM.
Alan Eustace
Alan Eustace (``eustace@pa.dec.com'' as of 1994) worked with Amitabh Srivastava to develop ATOM.
U. Faber
D. A. Fisher
Richard M. Fujimoto
Richard M. Fujimoto (``fujimoto@cc.gatech.edu'', as of 1994) has worked on several simulators, including dis+mod+run, Simon, and a variety of time-warp simulation systems.
Torbjorn Granlund
Torbjorn Granlund (``tege@cygnus.com'', as of 1994) has worked on simulators both at the Swedish Institute for Computer Science and at Cygnus.
Note: the second ``o'' in ``Torbjorn'' should have an umlaut over it, but so far no umlaut appears here.
Bill Haygood
Bill Haygood (bill@haygood.org as of July 1999) wrote portable PDP-8, Z-80, and LSI-11 simulators. His home page contains a short writeup [Haygood 1999] on computation/space tradeoffs (e.g., lookup tables for condition codes).
Tom R. Halfhill
Tom R. Halfhill (halfhill@mdr.cahners.com and halfhill@hooked.net as of March 2000) writes for Microprocessor Report and before that wrote for Byte and other technology magazines. He has been watching and writing about emulation for quite a while. Articles include [Halfhill 94], [Halfhill 94b], and [Halfhill 00].
Steve Herrod
Steve Herrod (herrod@cs.stanford.edu or herrod@vmware.com as of January 2002) has been involved with Tango Lite; a paper called ``Memory System Performance of UNIX on CC-NUMA Multiprocessors'', a hardware, trace-based evaluation of IRIX on the Stanford DASH multiprocessor; SimOS; the Crusoe processor; and VMWare.
Mark Horowitz
Tor Jeremiassen
(``tor**ti;*com`' as of 2003/10).
R. E. Kessler
James R. Larus
James R. Larus (``larus *** microsoft ; com'' as of 2003/11) specializes in compiler- and architecture-related projects and has worked on EEL, SPIM, qp/qpt and WWT.
Georgia Lazana
Peter S. Magnusson
Peter Magnusson (``psm *** virtutech ; com'' as of 2003/10) built [SimICS](http://www.xsim.com/bib/index1.d/Index.html#Tool-SimICS) and its predecessor, [gsim](http://www.xsim.com/bib/index1.d/Index.html#Tool-gsim), while at the Swedish Institute for Computer Science. As of 2003/10 he is president and CEO of Virtutech.
Cathy May
Cathy May (may *** watson *;* ibm *;* com) is the author of Mimic, which performed dynamic translation of groups of blocks of target code to groups of blocks of host code.
Vijay S. Pai
Vijay S. Pai (vijaypai *** rice *;* edu as of 2003/11) was coauthor of RSIM at Rice.
Pardo
Pardo (``pardo *** xsim ; com'' as of 1999/03) helped with the design and implementation of MPtrace and the design of Shade, both while at the University of Washington. He was an original Crusoe architect and implementor.
Pardo is most infamous for his shameless promotion of Run-Time Code Generation (also known as self-modifying code), and he also suffers from interests in compilers, computer architecture, operating systems, performance analysis, and a bunch of other stuff.
Russell Quong
Russell W. Quong (at Sun Microsystems as of 2002/10) directed Purdue's Binary Emulation and Translation group and also built very-large workload simulators at Sun.
Norman Ramsey
Norman Ramsey (``norman**eecs;purdue;*edu`' as of 2003) spends a lot of time trying to solve portability problems and is responsible for the New Jersey Machine Code Toolkit. He also has an ongoing interest in debuggers, interpreters, linkers, and so on.
E. W. Reigel
Steven K. Reinhardt
Steven K. Reinhardt (``stever@cs.wisc.edu'' as of 1994) spends a lot of time simulating multiple-processor machines. He's spent a lot of time working on WWT.
Mendel Rosenblum
Mendel Rosenblum (``mendel@cs.stanford.edu'' as of 1999, also probably ``mendel@vmware.com'') has spent a lot of time simulating multiple-processor machines and, lately, at VMWare, simulating uniprocessors nested virtual machines.
Duane Sand
Duane Sand (``duane.sand@compaq.com'', as of 1999/09) designed and helped write Accelerator, used to migrate Tandem's application base and OS from their proprietary processor to a MIPS-based processor.
Richard L. Sites
M. D. Smith
Michael D. Smith (smith@eecs.harvard.edu as of 1999/08) works on computer architectures and compilation for those architectures. Instruction-Set Simulator and tracing tool papers include Pixie.
Rok Sosic
Rok Sosic (sosic@cit.gu.edu.au as of 1995/09) wrote Dynascope and Dynascope-II. Note: The ``c'' in Rok's name should have a ``v''-shaped accent over it, but HTML doesn't seem to have that accent.
Amitabh Srivastava
Amitabh Srivastava (``amitabh@pa.dec.com'' as of 1994) worked with David W. Wall to develop OM and with Alan Eustace to develop ATOM.
Richard M. Stallman
Richard M. Stallman (rms@gnu.ai.mit.edu as of July 1995) wrote the device emulation engine of 11SIM.
Thai Wey Then
Thai Wey Then (at Purdue as of 1995/06) is part of Purdue's Binary Emulation and Translation group.
David Wall
David Wall (wall@mti.sgi.com as of 1995/08) has worked on several compiler tools that operate at or near link time, including Titan tracing and OM.
Maurice V. Wilkes
Maurice V. Wilkes is generally considered the inventor of microcode. Wilkes cites various authors who've proposed or used microcode to implement high-performance emulators.
Wilkes is also one of the ``grandparents'' of computing. He was around the day that EDSAC became the world's first operational general-purpose programmable computer. He is credited with saying that they ``discovered'' debugging that very same day while attempting to execute a simple program for generating a table of prime numbers (see ``The Multics System'' by Elliot I. Organick, The MIT Press 1972, pg. 127).
Emmett Witchel
Emmett Witchel (``witchel@lcs.mit.edu'' as of 1995, ``witchel@cs.stanford.edu'' as of 1994) worked on SimOS.
Marinos "nino" Yannikos
Marinos "nino" Yannikos (nino@mips.complang.tuwien.ac.at) is the author of STonX and helped with this web page.
See Also: Related Work
A variety of work that seems relevant and isn't folded in elsewhere.
- Archive of simulators-related announcements and call-for-paper messages.
- comp.emulators.misc FAQ (also available non-HTML'ized here and here): a large list of available emulators and other information about simulation.
- Archive of postings to the simulators mailing list
- Archive of conference calls
- Home page on emulator home pages. References Snes9x.com and ``PE2000.net'' which turns up nothing as of 2003/02/03.
- Emu article -- ``Steve Snake'' here.
- Article about game emulators.
- A Emulation Software R&D WWW Page
- Computers and Emulation -- a web page on topics including emulation.
- The Retrocomputing Museum keeps track of emulators for retro-machines.
- The Charles Babbage Institute has a link to a history of simulation site, but the link wasn't working when Pardo tried it. See perhaps their TR or the TR's author, Paul A. Fishwick. The TR itself is about general simulation, rather than ISA simulation (simulating instruction sets).
- FPGA's for emulation.
- General Simulation and more general simulation (not just instruction-set simulation).
- The Comprehensive Computer Catalogue (CCC), listing thousands of machines starting with the ZS1.
See also more general information discovery sites
To Do: Work In Progress
-
Other tracing work by the
Computer Architecture
folks at the University of Washington.
-
A history of computers
, which includes information on simulators.
-
Larus's ``AE'' (abstract execution), including James R. Larus, ``Abstract Execution: A Technique for Efficiently Tracing Programs'', Software--Practice & Experience, 20(12):1241-1258, December 1990; UW CS TR 912.
-
Pure Software, United States Patent 5,193,180 March 1993.
-
Robert Wahbe, Steven Lucco, Thomas E. Anderson and Susan L. Graham ``Efficient Software-Based Fault Isolation,'' Proceedings of the Fourteenth ACM Symposium on Operating System Principles (SOSP), December 1993, pp.~203-216.
PostScript(tm) paper
-
Robert Wahbe, Steven Lucco and Susan L. Graham, ``Practical Data Breakpoints: Design and Implementation,'' In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation. SIGPLAN, ACM, 1993. Sigplan Notices, Volume 28, Number 6.
PostScript(tm) paper
-
Robert Wahbe. Efficient Data Breakpoints. In ASPLOS V. ACM, 1992. PostScript(tm) paper
-
Mark Weiser, ``Program Slicing,'' IEEE Transactions on Software Engineering, SE-10(4):352-357, July 1984.
-
The glossary should have a ``short list'', and should define runtime, dynamic, speculation and rollback.
-
STonX
not integrated.
-
Apple Macintosh emulators
not integrated.
-
Apple II emulators
not integrated.
-
DEC PDP-8 emulators
not integrated.
-
DEC PDP-11 emulators
not integrated.
-
SimCPM
(see the bottom of the page) isn't listed.
-
xtrs
isn't integrated.
-
MSX emulators
aren't integrated.
-
Sinclair ZX Spectrum Emulators
aren't integrated.
-
VAX-11 RSX Emulator
isn't integrated.
-
Wabi
isn't integrated.
-
8051 Emulators
aren't integrated.
-
Emulator for disk drive processors
-
AXP simulators
including Mannequin, ISP, AUD, and AUDI. Also in
PostScript(tm)
-
Simulation of the DEC NVAX:
text
and
PostScript(tm)
-
Intel 80960 Emulators
. (this is just a list of mfg., but lists a few emulator mfg.)
-
Wine
is not integrated.
-
DOSEMU (from Linux) isn't listed.
-
The section
Implementation: Simulation Technology
needs to be rewritten to make orthogonal static/dynamic and level of IR.
-
Need an entry for
\bibitem{Rosin:69}
R. F. Rosin, ``Contemporary Concepts of Microprogramming and Emulation,'' Computing Surveys, Vol. 1, No. 4, Dec. 1969, pp. 197-212. Cited by
[RFD 72]
.
-
Need an entry for
\bibitem{McCormack:65}
M. A. McCormack, T. T. Schansman, and K. K. Womack, ``1401 compatability feature on the IBM system/360 model 30,'' Comm. ACM 8, 12 (Dec. 1965), 773-776. Cited by
[Wilkes 69]
.
-
Need an entry for
\bibitem{Benjamin:65}
R. I. Benjamin, ``The Spectra 70/45 emulator for the RCA 301,'' Comm. ACM 8, 12 (Dec. 1965), 748-752. Cited by
[Wilkes 69]
.
-
Need an entry for
\bibitem{Green:66}
J. Green, ``Microprogramming, emulators and programming languages,'' Comm. ACM 9, 3 (Mar. 1966), 230-232. Cited by
[Wilkes 69]
and
[RFD 72]
.
-
Need an entry for
\bibitem{CN:66}
C. R. Campbell, and D. A. Neilson, ``Microprogramming the Spectra 70/35,'' Datamation 12, 9 (1966), 64. Cited by
[Wilkes 69]
.
-
[Wilkes 69]
isn't yet incorporated fully into this document.
-
Need to merge interpreter types (e.g. ddi, tci, etc.) into glossary.
-
Cons up ``Who's Who'' entries for all the authors.
-
Need to read/incorporate
Computer Structures: Readings and Examples
, by Bell and Newell, especially Weber's paper comparing native issue rates of microcode and emulated ``native'' code. (Reference courtesy of
Duane Sand
).
-
\bibitem{Wilner:72}
is not integrated.
-
[FN75]
is not integrated. It should have a ``Tool'' listing and the review under the paper should be moved up to the tool listing.
-
Need to find/read/incorporate
R. F. Rosin, ``Contemporary Concepts of Microprogramming and Emulation,'' Computing Surveys, Volume 1, Number 4, December 1969, pp.~197-212
, found in [Tucker & Flynn, Dynamic Microprogramming: Processor Organization and Programming, 1971].
-
Need to find/incorporate papers from
Techewb
, including
Try a search
here
using `emulation' as the search key.
-
Emulation under GNU HURD
-
Need to find/read/incorporate
[HV 79]
\bibitem{HV:79} R. N. Horspool and N. Marovac, ``An Approach to the Problem of Detranslation of Computer Programs,'' The Computer Journal, 23(3):223-229, 1979.
C. Cifuentes says that it may not apply to e.g. x86 architectures; limits also mentioned in one of May's papers
-
Duane Sand
notes that
dynamic cross compilation
-- anything using
Runtime Code Generation (RTCG)
[Link broken, please e-mail
to get it fixed.] -- has at least one advantage over some other forms of emulation:
It is less liable to cause legal troubles with copyright owners' rights to control all derivative works, because the RTCG's result is only a transient copy rather than a permanently stored codefile. RTCG-based emulation techniques are legal IFF the 1995 generation of chips' hardware-implemented transformations at icache load time are legal.
-
Need to read/categorize/etc. the paper ``A $\mu$ Simulator and $\mu$ Debugger for the CAL DATA 135'', Fredirck L. Ross, M. S. Thesis, Department of Computer Science, Southern Illinois University, July 1978. See
Pardo
and ask him to ask Ebeling for a copy.
-
Need to find/read/categorize/etc. the paper
%A Tom Thompson, %T Building the Better Virtual CPU %J Byte %D August 1995 %P 149-150
which,
Duane Sand
says (paraphrasing):
Describes two variations of Apple's 68K interpreter it used in the initial PowerMacs. Both variations identify frequently executed blocks of 68K code, compile them with trivial peephole optimizations into host RISC code, hold the code in a software-managed "cache" until it's full, then throw it all away and start over. One variation is used on Unix emulations of Apple, and the other variation is used on the 'Power Mac 9500', in combination with a modified interpreter with a smaller lookup table footprint than in the first generation PowerMacs. (The original interpreter used so much lookup table space that it ran poorly on the original PPC 603 chips, which held up Apple's plans for laptop PowerMacs for a year.) On PowerMacs that are able to run both old & new versions, the new version averages 20-30% speedup over the entire (nonnative) application.
-
Integrate
[PM 94]
.
-
Investigate/integrate
Redo
project, using decompilation for software maintenance and reverse engineering.
-
Investigate
Slack: A New Performance Metric for Parallel Programs
which has to do with profiling.
-
Find and incorporate
-
[Graham 65]
%A S. Graham %T The Semi-Automatic Computer Conversion System (SACCS) %J Presented at the ACM Reprogramming Conference %C Princeton, New Jersey %D June 1965 %W Referenced by [Gaines 65]
-
-
Find and incorporate
-
[Gaines 65]
%A R. Stockton Gaines %T On the Translation of Machine Language Programs %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 736-741
Pardo
has a copy.
-
-
Find and incorporate
-
[Dellert 65]
%A George T. Dellert, Jr. %T A Use of Macros in Translation of Symbolic Assembly Language of One Computer to Another %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 742-748
Pardo
has a copy.
-
-
Find and incorporate
-
[Benjamin 65]
%A R. I. Benjamin %T The Spectra 70/45 Emulator for the RCA 301 %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 748-752
Pardo
has a copy.
-
-
Find and incorporate
%A Thomas M. Olsen %T Philco/IBM Translation at Problem-Oriented Symbolic and Binary Levels %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 762-768
Pardo
has a copy.
-
Find and incorporate
%A Marvin Lowell Graham %A Peter Zilahy Ingerman %T An Assembly Language for Reprogramming %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 769-773
Pardo
has a copy.
-
Find and incorporate
%A M. A. McCormack %A T. T. Schansman %A K. K. Womack %T 1401 Compatibility Feature on the IBM System/360 Model 30 %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 773-776
Pardo
has a copy.
-
Find and incorporate
%A Donald M. Wilson %A David J. Moss %T CAT: A 7090-3600 Computer-Aided Translation %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 777-781
Pardo
has a copy.
-
Find and incorporate
%A Mark I. Halpern %T Machine Independence: Its Technology and Economics %J Communications of the ACM (CACM) %V 8 %N 12 %D December 1965 %P 782-785
Pardo
has a copy.
-
Integrate
FreePort Express
-
Find and integrate MIT AI Lab Technical reports about the ``
11SIM
'' PDP-11 simulator that ran on a PDP-6 and later a DEC KA-10. Written by
Don Eastlake
and
Richard M. Stallman
. See
MIT TR e-mail
.
-
Find and integrate
OmniVM
.
-
Write a PDP-1 emulator in
Java
,
OmniVM
or whatever.
-
One advantage of running a simulator on a segmented machine (such as the 80386) is that you can set aside a segment to use for application addresses. Then, bogus application addresses can't ``wrap'' to valid simulator addresses. The advantage is that you can get the benefits of full address mapping (e.g.,
g88
,
SimOS
) without paying the translation overhead. Of course segmented machines are harder to simulate efficiently...
-
gsim
gsim
is derived from
g88
and a predecessor of
SimICS
. Get a straight history from
Peter Magnusson
and write up a separate entry.
-
Library signatures
Integrate this:
[Emmerik 94] M van Emmerik, ``Signatures for Library Functions in Executable Files'', Technical Report 2/94, Queensland University of Technology, Faculty of Information Technology, April 1994. Submitted ... 1994. PostScript(tm)
-
Yannikos Simulators
Search outward from
Marinos "nino" Yannikos
's page on
simulators
.
-
Integrate this:
LazyFP (???)
%A David Wakeling %T A Throw-away Compiler for a Lazy Functional Language %J Proceedings of the Fuji International Workshop on Functional and Logic Programming %C Susono, Japan %D July 1995 %P 203--216 %W David Wakeling <david@dcs.exeter.ac.uk> ``http://www.dcs.ex.ac.uk/~david'' %X Dynamic cross-compiler for a virtual machine used to run lazy languages.
-
Subsections
This file is easiest to edit and browse (sometimes) if it is one big file, but it (a) takes forever to load and (b) screws with some browsers. Ideally, it should be possible to build a script that takes the big thing as input and makes it into a bunch of little things. Doing it in a couple of passes would even be alright, so that intra-document links would get fixed up correctly. For example:
- Build a tool that generates one .html file for each header up to the next header. Do one file per
or one file per
and
or
(currently only nested to
).
- Scan each HTML file for name="..." constructs and build an index of
{"#name", "file#name"}
tuples. It is easy to distinguish file-relative references from absolute names because the absolute ones all start with "http:" or "ftp:" or whatever, while the relative ones start with "Tool-" or "Bib-" or "Whos-" or whatever.
- Go back and edit every .html file, replacing each occurrence of
href="#name"
with href="file#name"
, as described by the index tuple file.
It's a simple Perl script, write it and get credit for helping out!
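The three passes above can be sketched as follows. The text asks for a Perl script; this sketch uses Python instead and works on strings in memory rather than on files. The output names (part0.html, part1.html, ...), the choice to split at every h1 through h6 tag, and the function names are assumptions made for the example.

```python
import re

HEADER_RE = re.compile(r"<h[1-6][^>]*>", re.IGNORECASE)
NAME_RE = re.compile(r'name="([^"]+)"')
HREF_RE = re.compile(r'href="#([^"]+)"')   # only file-relative links match


def split_at_headers(html):
    """Pass 1: cut the page into pieces, each running from one header to the next."""
    starts = [m.start() for m in HEADER_RE.finditer(html)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)                # keep any text before the first header
    ends = starts[1:] + [len(html)]
    return {f"part{i}.html": html[s:e]
            for i, (s, e) in enumerate(zip(starts, ends))}


def build_index(parts):
    """Pass 2: map each anchor "#name" to "file#name"."""
    index = {}
    for filename, text in parts.items():
        for name in NAME_RE.findall(text):
            index["#" + name] = f"{filename}#{name}"
    return index


def fix_links(parts, index):
    """Pass 3: rewrite href="#name" using the index; unknown names are left alone."""
    def repl(match):
        key = "#" + match.group(1)
        return f'href="{index.get(key, key)}"'
    return {fn: HREF_RE.sub(repl, text) for fn, text in parts.items()}


def split_page(html):
    parts = split_at_headers(html)
    return fix_links(parts, build_index(parts))
```

Absolute links such as href="http:..." never match HREF_RE, so they pass through unchanged.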
-
S21
A simulator for the
MuP21
MISC (
Minimal Instruction-Set Computer
; c.f., Mutable Instruction-Set Computer) by
Ultra Technology
-
Arm Simulators
According to Michael Williams
michael.williams@armltd.co.uk
, simulators for the ARM6 and ARM7 are available as part of the ARM GNU toolchain, from
ftp://ftp.cl.cam.ac.uk/arm/gnu/armul-1.0.tar.gz
and maybe also VHDL/Verilog models from ARM partners, see
http://www.arm.com/
.
-
Ape 2600 -- Atari Simulator
An
ongoing project
to emulate an Atari 2600 on a generic machine.
-
Cosimulation
Find and incorporate
[Rowson 94] @InProceedings{Rowson:94, author = "James A. Rowson", title = "Hardware/Software Co-Simulation", booktitle = "Proc.~of the 31st Design Automation Conference (DAC~'94)", year = "1994", organization = "ACM", address = "San Diego, CA", OPTmonth = "June", note = "(Tutorial)", OPTannote = "" }
-
Speculative Loads
Find and incorporate
@InProceedings{Rogers:92, author = "Anne Rogers and Kai Li", title = "Software Support for Speculative Loads", pages = "38-50", booktitle = "Proc.~of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems", year = "1992", month = "October" }
Evidently contains information about a cycle-level simulator. More in
@TechReport{Rogers:93, author = "Anne Rogers and Scott Rosenberg", title = "Cycle Level {SPIM}", institution = "Department of Computer Science, Princeton University", year = 1993, address = "Princeton, NJ", month = "October" }
-
SSIM
Find and incorporate
ssim: A Superscalar Simulator Mike Johnson AMD M. D. Smith Stanford Univ.
Pixie front ends in
ftp://velox.stanford.edu/pub
-
Find and incorporate (based on work with
`pixie' and
`ssim'; may tell about them?)
Johnson, Mike: Superscalar Microprocessor Design. Englewood Cliffs, NJ: Prentice Hall, 1991. xxiv, 288 pp., illustrated. (Prentice-Hall series in innovative technology.) Bibliography: pp. 273-278. ISBN 0-13-875634-1
-
Find and incorporate (mostly results of tracing, but may discuss simulation and tracing):
@Book{Huck:89, author = {Jerome C. Huck and Michael J. Flynn}, title = {Analyzing Computer Architectures}, publisher = {IEEE Computer Society Press}, year = 1989, address = {Washington, DC} }
-
Find out more about
Robert Bedichek
's
T2 simulator
.
-
Yaze Z80 and CP/M emulator.
more info
and
source code
.
-
WinDLX
, MSWindows GUI for DLX. Also include information about DLX from
[Hennessy & Patterson 93]
-
UAE
Commodore Amiga hardware emulator (incomplete).
-
DEC FX!32
binary translation/emulation system for running Microsoft Windows applications.
-
Find and incorporate
%A Max Copperman %A Jeff Thomas %T Poor Man's Watchpoints %J ACM SIGPLAN Notices %V 30 %N 1 %D January 1995 %P 37-44
Pardo
has a copy. Executive summary: debugging tool; statically patches loads and stores with code to check for data breakpoints.
Amusing story: The processor they were running on has load delay slots and does not have pipeline interlocks. Their tool replaces each load or store with several instructions; it patched a piece of user-mode code of the form
load addr -> r5
store r5 -> addr2
Before patching, the code saved the
old
value of
r5
to
addr2
. After patching, it saved the new value. Technically, this code was broken already because the symptom could have also been exhibited by an interrupt or exception between the load and the store.
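The load-delay-slot effect in this story can be reproduced with a toy machine model. The sketch below is not the tool from the paper: the three-operation instruction set is hypothetical, and a single "nop" stands in for the several checking instructions the patch inserts.

```python
def run(program, regs, mem):
    """Run instructions on a toy machine with one load delay slot and no
    interlocks: a loaded value reaches its register only after the
    instruction following the load has executed."""
    pending = None                       # (register, value) from the previous load
    for op, a, b in program:
        arriving = pending
        pending = None
        if op == "load":                 # load a -> b   (a: address, b: register)
            pending = (b, mem[a])
        elif op == "store":              # store a -> b  (a: register, b: address)
            mem[b] = regs[a]             # reads the register before write-back
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unknown op {op!r}")
        if arriving is not None:         # delayed write-back lands here
            reg, value = arriving
            regs[reg] = value
    if pending is not None:              # drain a load issued by the last instruction
        regs[pending[0]] = pending[1]
    return regs, mem


original = [("load", "addr", "r5"), ("store", "r5", "addr2")]
patched = [("load", "addr", "r5"), ("nop", None, None), ("store", "r5", "addr2")]
```

With r5 initially 1 and mem["addr"] holding 2, the original sequence stores 1 (the old r5) to addr2, while the patched sequence stores 2.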
-
Find and incorporate information about Spike. Referenced in
[Conte & Gimarc 95]
, Tom Conte
conte@eos.ncsu.edu
says (paraphrased):
``Spike was built inside GNU GCC by Michael Golden and myself. It includes a lot of features that have appeared in ATOM, including combining the simulator with the benchmark into a single
``self-tracing'' binary. The instruction trace was based on an abstract machine model distilled from GCC's RTL; it had both a high-level and a low-level form. Spike is still in occasional use, but has never been released.''
-
Find and incorporate information about Reiser & Skudlarek's paper "Program Profiling Problems, and a Solution via Machine Language Rewriting", from ACM SIGPLAN Notices, V29, #1, January 1994.
Pardo
has a copy.
Basic summary: Wanted to profile. -p/-pg code is larger and slower by enough to make it hard to justify profiling as the default. Assumes the entire source is available. For these and other reasons, wrote jprof which operates with disassembly, analysis and rewriting. Discusses sampling errors, expected accuracy, stability, randomness, etc. Describes jprof: counters and stopwatches; subroutine call graph. Domain/OS on HP/Apollo using 68030. Discusses shared libraries. Can also use page-fault clock. 4-microsecond clocks. Some lessons/observations. Doesn't explain how program running time is affected by jprof.
-
Design tradeoffs
among various 68k implementations (
comp.arch
posting).
-
More on
decompilation of PC executables
-
Update the reference for
Alvin R. "Alvy" Lebeck
.
-
Review and include FX!32. March 5 1996 Microprocessor Report. Jim Turley, "Alpha Runs x86 Code with FX!32".
Summary: DEC is running Win32 application binaries on Alpha by a new combination of interpreter and static translator. The static translator runs in the background, between the first and second executions of the application. It uses info collected by the interpreter during the 1st run, to reliably distinguish active code paths from r/o data and work out the effects of indirect jumps. Static analysis can't do this automatically on its own, for typical x86 binaries.
-
Add info about Doug Kwan (author of
YAE
, an Apple ][ emulator) to "Who's who" section. Nino says: only freely available dynamic recompilation. (Dynamic recompilation for SPARC and MIPS). Information forwarded by Marinos Yannikos nino@complang.tuwien.ac.at.
-
Find, read, and incorporate decompilation info (also cites a program verification dissertation):
%A P. J. Brown %T Re-creation of Source Code from Reverse Polish Form %J Software \- Practice & Experience %V 2 %N 3 %P 275-278 %D 1972
Note: there's a slightly later SPE that has a follow-up article explaining how to do it faster/more efficiently.
-
Xref: uw-beaver comp.compilers:10907
Path: uw-beaver!uhog.mit.edu!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!ivan.iecc.com!ivan.iecc.com!not-for-mail
From: faase@cs.utwente.nl (Frans F.J. Faase)
Newsgroups: comp.compilers
Subject: Re: Need decompiler for veryyy old code....
Date: 29 Apr 1996 23:11:51 -0400
Organization: University of Twente, Dept. of Computer Science
Lines: 29
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: 96-04-144@comp.compilers
References: 96-04-110@comp.compilers
NNTP-Posting-Host: localhost.iecc.com
Keywords: disassemble, IBM
> Currently I am undertaking to modify some very old IBM code (at least
> 20 years old. I believe that the code is either Assembler or Cobol.
I do not know whether the following is of use for you, but I do maintain a WWW page about decompilation, which has some links to other resources as well.
http://www.cs.utwente.nl/~faase/Ha/decompile.html
Maybe, you should contact Martin Ward Martin.Ward@durham.ac.uk: http://www.dur.ac.uk/~dcs0mpw/
Or Tim Bull tim.bull@durham.ac.uk: http://www.dur.ac.uk/~dcs1tmb/home.html
Frans
(P.S. Email to PROCUNIERA@ucfv.bc.ca bounced with 451 error)
--
Frans J. Faase
Information Systems Group, Department of Computer Science, University of Twente
PO box 217, 7500 AE Enschede, The Netherlands
Tel: +31-53-4894232, secr.: +31-53-4893690, Fax: +31-53-4892927
Email: faase@cs.utwente.nl
http://www.cs.utwente.nl/~faase/
--
Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com.
-
A Java runtime, which generates native code at runtime:
Softway's
Guava
. Info from Jeremy Fitzhardinge (
jeremy@suede.sw.oz.au
)
-
Find, read, and summarize the following:
%A Ariel Pashtan %T A Prolog Implementation of an Instruction-Level Processor Simulator %J Software \- Practice and Experience %V 17 %N 5 %P 309-318 %D May 1987
-
Find, read and summarize "Augmint". According to Anthony-Trung Nguyen anguyen@csrd.uiuc.edu, it is based on MINT, and understands x86 instruction set and runs on Intel x86 boxes with UNIX (Linux, Unixware, etc.) or Windows NT. It is described further at
http://www.csrd.uiuc.edu/iacoma/augmint.html
and there was a paper in ICCD-96, available from
ftp://ftp.csrd.uiuc.edu/pub/Projects/iacoma/aug.ps
.
-
Find, read and summarize "Etch". See
http://memsys.cs.washington.edu/memsys/html/etch.html
. Etch is an x86 Windows/NT tool for annotating x86 binaries, without source code.
-
Find, read and summarize "Etch".
From: bchen@eecs.harvard.edu (Brad Chen)
Newsgroups: comp.arch
Subject: Windows x86 Address Traces Available
Date: 7 Oct 1996 22:20:30 GMT
Organization: Harvard University EECS
Lines: 15
Message-ID: <53bvne$5lb@necco.harvard.edu>
NNTP-Posting-Host: steward.harvard.edu
Keywords: Windows x86 address traces
A collection of x86 memory reference traces from Win32 applications are now available from the following URL:
http://etch.eecs.harvard.edu/traces/index.html
. The collection includes traces from both commercial and public-domain applications. The collection currently includes:
- Perl
- MPeg Play
- Borland C++
- Microsoft Visual C
- Microsoft Word
These traces were created using Etch, an instrumentation and optimization tool for Win32 executables. For more information on Etch see the above URL.
(
etch-info@cs.washington.edu
)
-
Add information on
. Here's a summary from
Peter Kuhn
:
Peter Kuhn
Institute for Integrated Circuits (LIS)
Technical University of Munich
Arcisstr. 21, D-80290 Munich, Germany
voice: +49-89-289-23092
fax1: +49-89-289-28323
fax2: +49-89-289-25304
email: P_Kuhn@lis.e-technik.tu-muenchen.de
http://www.lis.e-technik.tu-muenchen.de/people/kp.html
- portable to GNU gcc/g++ supported platforms, operating systems and processors
- detailed instrumentation of instruction usage
- no source code modification necessary
- no restrictions for the application programmer (only "-a" switch for gcc/g++ compilers)
- applicable to statically linked libraries
- minimal slow down of program execution time (about 5%)
- fast: no source recompilation necessary for repeated simulation runs
- less trace data produced
- high reliability: no executable modification
- covered by GNU Public License
- available via anonymous ftp at: ftp://ftp.lis.e-technik.tu-muenchen.de/pub/iprof
The operation is: With gcc/g++ option -a (above version 2.6.3) you can produce a basic block statistics file (bb.out), which contains the number of times each basic block of the program is accessed during runtime.
iprof
processes this basic block statistics file and accesses the program's executable to summarize the machine instructions used for each basic block. So
iprof
doesn't make any modifications to the gcc/g++ and is easily portable among gcc/g++ supported architectures. Currently binaries for LINUX 486, Pentium and Sparc Solaris are provided, ports to other architectures are straightforward.
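The summarizing step described above amounts to weighting each basic block's static instruction mix by its execution count. The sketch below illustrates that idea only: it does not parse the real bb.out format or disassemble an executable, and the function name and input dictionaries are assumptions.

```python
from collections import Counter


def instruction_histogram(block_counts, block_instructions):
    """Combine per-block execution counts with per-block instruction lists.

    block_counts:       block id -> times the block was executed
    block_instructions: block id -> mnemonics found in that block
    """
    totals = Counter()
    for block, count in block_counts.items():
        for mnemonic in block_instructions.get(block, []):
            totals[mnemonic] += count
    return totals


# Made-up example: a setup block run once and a loop body run 100 times.
counts = {"bb0": 1, "bb1": 100}
instructions = {"bb0": ["mov", "call"], "bb1": ["add", "cmp", "jne", "add"]}
usage = instruction_histogram(counts, instructions)
```

Because the counts come from the compiler's own block profiling, the executable only has to be read, never modified.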
-
There are many ways to measure slowdown. Each has certain benefits, each has shortcomings.
- Time to execute target code on simulated target vs. native target running time. This is particularly interesting if you are trying to determine relative performance for a commercial product such as SoftPC or if you're otherwise interested in real-time response. However, it ignores the implementation technology of the host machine. For example, a simulated Z-80 on a SPARC will be faster than a simulated SPARC on a Z-80, and performance may vary by 6X depending on which Z-80 you use.
- The time or number of host instructions to execute the workload vs. executing the workload native on the host tells you the most about simulation efficiency if the host and the target are the same machine. The numbers get less useful if the host and target are different; there are also differences if the simulator executes some part of the program "native" (e.g., system calls). For example, a workload compiled for the EDSAC (17-bit words) and then run on a MIPS is unlikely to be close to the performance of the workload compiled and run on the SPARC.
- Number of host instructions per target instruction captures more of the "simulation efficiency" without getting caught in the confusion of processor implementation technologies. However, it potentially does the least accurate job of predicting real-time performance, as it may be unduly hurt by real-world concerns such as the number of cache misses. For example, SimICS got faster when the IR got smaller but more complicated to decode. The number of host instructions increased, but the overall running time decreased.
- Multiprocessor performance is even harder to judge. For example, multiplexing target processors on a single host processor may induce TLB, cache and paging misses that lead to much worse performance. Conversely, I/O effects may be overlapped with simulation of other processors, reducing the effective overhead of simulation.
- Simulating more costs more; simulators such as Shade, FX!32, etc. are as fast as they are in part because some parts of the overall workload (e.g., OS code) are executed native on the host machine, rather than simulating all host OS code.
So what we see includes:
- You can't measure the running time of a workload on a target that does not yet or no longer exists.
- Anything that uses elapsed running times depends strongly on the implementation technology.
- The real-world performance does vary depending on the implementation technology.
- The host/target ratio fails to capture some significant effects, e.g., the SimICS example.
- Multiprocessor simulation may cause higher miss rates in the processor cache, TLB and paging memory. Conversely, simulation may be overlapped with computation.
- Running more of the application as host code improves the observed running time and host/target instruction ratio.
(I forget the details, but I'd definitely check out some of the early SimICS papers for a discussion of running times, Peter has more to say.)
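The first three measures above are simple ratios, and a small sketch shows how they can disagree. The function name, dictionary keys, and all numbers below are made up for illustration; they are not measurements of SimICS or any other simulator.

```python
def slowdown_metrics(sim_seconds, native_target_seconds, native_host_seconds,
                     host_instructions, target_instructions):
    """Return three slowdown ratios for one simulation run."""
    return {
        # simulated running time vs. the same workload on a real target
        "vs_native_target": sim_seconds / native_target_seconds,
        # simulated running time vs. the workload compiled for the host
        "vs_native_host": sim_seconds / native_host_seconds,
        # host instructions executed per simulated target instruction
        "host_instr_per_target_instr": host_instructions / target_instructions,
    }


# Two hypothetical builds of one simulator running the same workload:
# the compact IR executes more host instructions but misses the cache less.
simple_ir = slowdown_metrics(10.0, 2.0, 1.0, 40_000_000, 1_000_000)
compact_ir = slowdown_metrics(8.0, 2.0, 1.0, 50_000_000, 1_000_000)
```

In this made-up case the compact build looks worse by instruction ratio (50 vs. 40) yet better by elapsed time (4x vs. 5x the native target), which is the kind of disagreement the SimICS example describes.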
-
Find and incorporate Harish Patil's dissertation
on ``efficient program monitoring''. See
the TR
. Or, try
here
.
From: Harish Patil
Newsgroups: comp.compilers
Subject: Thesis available: Program Monitoring
Date: 29 Jan 1997 11:21:02 -0500
Organization: Compilers Central
Lines: 59
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <97-01-223@comp.compilers>
Reply-To: Harish Patil
NNTP-Posting-Host: ivan.iecc.com
Keywords: report, available, performance
Hello everyone:
I am glad to announce that my Ph.D. thesis, titled "Efficient Program Monitoring Techniques", is available on-line. This thesis was completed under the supervision of Prof. Charles Fischer at the department of Computer Sciences, University of Wisconsin--Madison. The thesis is available as technical report # 1320. Please check it out at the URL:
http://www.cs.wisc.edu/Dienst/UI/2.0/Describe/ncstrl.uwmadison%2fCS-TR-96-1320
An abstract of the thesis follows.
Regards,
-Harish
Efficient Program Monitoring Techniques
---------------------------------------
Programs need to be monitored for many reasons, including performance evaluation, correctness checking, and security. However, the cost of monitoring programs can be very high. This thesis contributes two techniques for reducing the high execution time overhead of program monitoring: 1) customization and 2) shadow processing. These techniques have been tested using a memory access monitoring system for C programs.
"Customization" reduces the cost of monitoring programs by decoupling monitoring from original computation. A user program can be customized for any desired monitoring activity by deleting computation not relevant for monitoring. The customized program is smaller, easier to analyze, and almost always faster than the original program. It can be readily instrumented to perform the desired monitoring. We have explored the use of program slicing technology for customizing C programs. Customization can cut the overhead of memory access monitoring by up to half.
"Shadow processing" hides the cost of on-line monitoring by using idle processors in multiprocessor workstations. A user program is partitioned into two run-time processes. One is the main process executing as usual, without any monitoring code. The other is a shadow process following the main process and performing the desired monitoring. One key issue in the use of shadow process is the degree to which the main process is burdened by the need to synchronize and communicate with the shadow process. We believe the overhead to the main process must be very modest to allow routine use of shadow processing for heavily-used production programs. We therefore limit the interaction between the two processes to communicating certain irreproducible values. In our experimental shadow processing system for memory access checking the overhead to the main process is very low - almost always less than 10%. Further, since the shadow process avoids repeating some of the computations from the main program, it runs much faster than a single process performing both the computation and monitoring.
==========================================================================
Harish Patil: Massachusetts Language Lab - Hewlett Packard
Mail Stop CHR02DC, 300 Apollo Drive, Chelmsford MA 01824
Phone: 508 436 5717 Fax: 508 436 5135
Email: patil@apollo.hp.com
CGuard
Categories:
- Purpose: debugging
- Input representation: hll
- Detail: User
- Multiple protection domains: No
- Multiple processors: No
- Signals and exceptions: No
- SMC OK: S (dynamically-linked libraries only)
- Simulation technology: augmentation
- Tool is robust in the face of application bugs: N
- Status: information.
See:
-
Related to OM/ATOM/Hiprof: There are also derivative products, for example "Client Server News Issue 192 (G-2 Computer Intelligence Inc, 3 Maple Place, PO Box 7, Glen Head, New York 11545-9864, USA Telephone: 516-759-7025 Fax: 516-759-7028)" reports
CS192-24 TRACEPOINT NAMES ITS FIRST PRODUCT DEC spin-off Tracepoint Technology named its first product HiProf, as we suspected it would (CSN No 185), and described it as a graphical hierarchical profiler that will enable C++ developers to analyze the binaries of 32-bit x86 applications and figure out where modifications should be made. The first of a family, the tool is based on a patented Binary Code Instrumentation technology that displays a detailed analysis of an application's execution in Tracepoint's IDE. The company's core framework can handle executables and .dlls that have been generated by compiling software as well. Therefore, source code shouldn't have to be recompiled. The data can be viewed on a threads basis. HiProf is due out next month at $599 and runs on Win95 or NT 3.51 or later. It supports apps developed with VC++ 2.0 or above and Microsoft Developer Studio 4.03.
Newsgroups: comp.compilers
Subject: ANNOUNCE - Fast Code Coverage Tool
Date: 8 May 1997 21:27:24 -0400
Organization: Tracepoint/DIGITAL
Lines: 33
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: 97-05-111@comp.compilers
Reply-To: jgarvin@scruznet.com
NNTP-Posting-Host: ivan.iecc.com
Keywords: testing, tools, available
ANNOUNCING - TestTrack, Fast Code Coverage Tool for 32-bit Windows Apps
TracePoint Technology has just opened the beta for TestTrack - an advanced code coverage tool that analyses test results and identifies areas in your code that have not been tested. Since TestTrack works on compiled and linked binary code (no source code or obj files required), there's no need for recompiling or preprocessing so the entire process is dramatically quicker than with past generation tools.
TestTrack analyzes and reports on coverage of several different types including: function coverage, class coverage, line coverage, branch coverage, multiple condition coverage, call-pair coverage and more. TestTrack allows you to selectively exclude portions of the code base, if desired, so you can analyze only those portions of an app that concern you.
A robust and intuitive GUI displays results in "live" pie charts or bar graphs that let you drill down into the code represented with just a mouse click, extensive reporting capabilities include the ability to publish reports in html, and a powerful merge function allows you to merge the results of several test runs for total coverage analysis. In addition, TestTrack identifies dead code in your app which is no longer used but which can slow performance and bloat program size.
An evaluation copy of the latest TestTrack beta is available for free download from TracePoint at www.tracepoint.com. TestTrack works on 32-bit apps generated with VC++ 2.x - 5.0.
TracePoint is a recent spin-off of DIGITAL Equipment Corp, whose mission is to create and market advanced development tools for 32-bit Windows apps.
For further information on TracePoint visit our web site or call 888-688-2504.
-
From: dcpi-czar@pa.dec.com (Lance Berc)
Newsgroups: comp.arch,comp.sys.dec,comp.unix.osf.osf1,comp.compilers
Subject: New Alpha Performance Analysis Tools
Date: 20 Jun 1997 21:43:17 -0400
Organization: Digital Equipment Corporation, Systems Research Center
Lines: 33
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: 97-06-084@comp.compilers
NNTP-Posting-Host: ivan.iecc.com
Keywords: tools, available
Version 2.2 of the DIGITAL Continuous Profiling Infrastructure, a set of performance tools for Digital Alpha systems running Digital Unix, is available for general use.
The Digital Continuous Profiling Infrastructure for Digital Alpha platforms permits continuous low-overhead profiling of entire systems, including the kernel, user programs, drivers, and shared libraries. The system is efficient enough that it can be left running all the time, allowing it to be used to drive online profile-based optimizations for production systems.
The Continuous Profiling Infrastructure maintains a database of profile information that is incrementally updated for every executable image that runs. A suite of profile analysis tools analyzes the profile information at various levels. At one extreme, the tools show what fraction of cpu cycles were spent executing the kernel and each user program. At the other extreme, the tools show how long a particular instruction stalls on average, e.g., because of a D-cache miss.
DCPI runs under Digital Unix V3.2 and V4.x, with a port to WindowsNT underway. It is free of charge. Further information, including papers and man pages, can be found at:
http://www.research.digital.com/SRC/dcpi
The system was developed at Digital's Systems Research Center and Western Research Laboratory, both in Palo Alto, California. A paper describing the system will appear at SOSP-16 in October.
See
http://www.research.digital.com/SRC/dcpi
-
SIS
-- a SPARC V7 instruction set simulator, cycle accurate including parallel execution of IU and FPU and operand dependency stalls. Comments to Jiri Gaisler
.
-
From: el@compelcon.se (Erik Lundh)
Newsgroups: comp.compilers
Subject: Re: asm -> structured form
Date: 14 Jan 1998 14:28:38 -0500
Organization: Algonet/Tninet
Lines: 22
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: 98-01-055@comp.compilers
References: 98-01-013@comp.compilers
NNTP-Posting-Host: ivan.iecc.com
Keywords: disassemble, tools, comment
Have a look at Christina Cifuentes' work with decompilers at http://www.it.uq.edu.au/groups/csm/dcc.html
Also, have a look at Frans Faase's excellent compilation of decompiler efforts at http://wwwis.cs.utwente.nl:8080/~faase/Ha/decompile.html
(There is a disclaimer at the top of the page that Mr Faase has left the faculty and might be unable to maintain the page. But the last update is dated in December 1997... Hope he can keep it!)
Best Regards,
Erik Lundh
Compelcon AB
SWEDEN
Alexander Kjeldaas wrote:
[I'm impressed -- it does a better job of decompiling than anything I've seen elsewhere. It's still a far cry from the original source, but good enough to be a big help figuring out what a dusty old program does. -John]
--
Send compilers articles to compilers@iecc.com, meta-mail to compilers-request@iecc.com. Archives at http://www.iecc.com/compilers
-
Daisy
, a VLIW + dynamic translator project at IBM.
-
Date: Sun, 22 Feb 1998 22:09:04 -0800
Message-Id: 199802230609.WAA08308@ncube
From: Steve Herrod
To: simos-release@Crissy.Stanford.EDU
Subject: Announcing SimOS Release 2.0!
Content-Type: text
Content-Length: 1628
The SimOS team at Stanford University is pleased to announce the second release of our complete machine simulation environment. If you are receiving this email, then you have downloaded an earlier version of SimOS or were deemed "someone who may be interested". If you would like to be taken off this infrequently used list, send mail to "simos@cs.stanford.edu" and we'll take you off of it immediately.
For those of you who need a refresher, SimOS is a "complete machine simulator" in that it models the hardware of uniprocessor and multiprocessor computers in enough detail to boot and run commercial operating systems as well as applications designed for these operating systems. This includes databases, web servers, and other workloads that traditional simulation tools have trouble supporting. Furthermore, SimOS executes these workloads at high speeds and provides support for easily collecting detailed hardware and software performance information.
There have been substantial improvements and enhancements since the first SimOS release including:
* Support for the Digital Alpha architecture running the Digital Unix operating system.
* Support for the MIPS 64-bit architecture.
* More modular hardware simulator interfaces that simplify the process of adding new processor, memory system, and device models.
SimOS is available free of charge for the research community and runs on several different hardware platforms. For download information, research papers, a discussion group, and more, visit the new SimOS web site at:
-
Connectix VirtualPC simulates a complete PC system, including VGA, audio, and Ethernet hardware, and uses sophisticated dynamic translation to achieve reasonable speeds. It is not clear exactly how well that works, but it seems to achieve between 25% and 80% of native speed, and Connectix claims it is ``up to twice as fast as the competition.''
http://www.connectix.com/html/connectix_virtualpc.html
and
http://www.byte.com/art/9711/sec4/art4.htm
-
SoftWindows 98 is
FWB Software's
competing product. RealPC, also by FWB, is closer to VirtualPC in its design: it is a hardware-level emulator. See
http://www.fwb.com
. Note that both were originally developed by Insignia Solutions, see
http://www.insignia.com
.
-
Another interesting product is Inferno. See
http://inferno.lucent.com
. It describes the Inferno operating system, which is available both in native and application form and is VM-based, using dynamic translation to achieve (allegedly) a 1.5-2.5 times slowdown over native code. It is similar to TAOS in that it achieves application portability via a virtual machine.
-
Date: Sun, 21 Jun 1998 10:50:58 -0400
Reply-To: History of Computing Issues
Sender: History of Computing Issues
From: Lee Wittenberg
Subject: SSEM Simulator
To: SHOTHC-L@SIVM.SI.EDU
X-UIDL: 2b629fc6064ad7c8f2c0919e41c76276
To coincide with the 50th Anniversary of the Small Scale Experimental Machine at Manchester, I am releasing the first "official" version of an SSEM simulator written in Java, and therefore (presumably) platform-independent. Source and binaries are available at
ftp://samson.kean.edu/pub/leew/ssem/
-
Bochs
-
The New Mexico State University Parallel Trace Archive
-
The Paradyn system uses runtime (during execution) code generation (instrumentation).
-
Method for verifying contiquity of a binary translated block of instructions by attaching a compare and/or branch instruction to predecessor block of instructions
Abstract
A method for enabling a first block of instructions to verify whether the first block of instructions follows a second block of instructions in an order of execution. The method includes appending a compare instruction to the first block of instructions. The compare instruction compares a first value from the first block of instructions with a second value from the second block of instructions, which precedes the first block of instructions in the order of execution. The method further includes appending a branching instruction to the first block of instructions. The branching instruction is executed in response to the first value being unequal to the second value. The branching instruction, when executed, branches to an alternative look-up routine to obtain a block of instructions that follows the second block of instructions in the order of execution.
http://patents.uspto.gov/cgi-bin/ifetch4?INDEX+PATBIB-ALL+0+24884+0+6+20371+OF+1+1+1+PN%2f5721927
What is claimed is: 1. A computer-implemented method for enabling a first block of instructions to verify whether the first block of instructions follows a second block of instructions in an order of execution the method comprising the steps of:
a) appending a compare instruction to the first block of instructions, the compare instruction when executed compares a first value from the first block of instructions with a second value from the second block of instructions, said second block of instructions preceding said first block of instructions in the order of execution; and b) appending a branching instruction to the first block of instructions, said branching instruction is executed in response to the first value being unequal to the second value, said branching instruction, when executed, branches to an alternative look-up routine to obtain a block of instructions that follows the second block of instructions in the order of execution.
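The compare-and-branch check in the claim above can be modelled roughly as follows. This is only a sketch under stated assumptions: the `Block` class, the use of source addresses as the compared values, and the dictionary standing in for the "alternative look-up routine" are all hypothetical and are not taken from the patent text.

```python
class Block:
    """A translated block (hypothetical model, not the patent's encoding)."""
    def __init__(self, source_pc, next_source_pc):
        self.source_pc = source_pc            # value checked when the block is entered
        self.next_source_pc = next_source_pc  # value left behind for the successor
        self.fallthrough = None               # block laid out right after this one

def run_blocks(table, start_pc, steps):
    """Follow `steps` transitions; return (visited source pcs, look-ups taken)."""
    block = table[start_pc]
    visited, lookups = [block.source_pc], 0
    for _ in range(steps):
        expected = block.next_source_pc
        succ = block.fallthrough
        # Appended compare: does the physically following block really follow?
        if succ is None or succ.source_pc != expected:
            # Appended branch: fall back to the (assumed) look-up routine.
            succ = table[expected]
            lookups += 1
        block = succ
        visited.append(block.source_pc)
    return visited, lookups

# Blocks a and b are contiguous; c was laid out after b but does not follow it.
a, b, c = Block(0x100, 0x110), Block(0x110, 0x100), Block(0x120, 0x100)
a.fallthrough, b.fallthrough = b, c
table = {blk.source_pc: blk for blk in (a, b, c)}
visited, lookups = run_blocks(table, 0x100, 3)
```

In this toy run the a-to-b transitions pass the check and only the b-to-a transition pays for a look-up.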
U.S. REFERENCES: (No patents reference this one)
Patent Inventor Issued Title
5167023 De Nicolas et al. 11 /1992 Translating a dynamic transfer control instruction address in a simulated CPU processor
ABSTRACT: The system and method of this invention simulates the flow of control of an application program targeted for a specific instruction set of a specific processor by utilizing a simulator running on a second processing system having a second processor with a different instruction set. The simulator reduces the number of translated instructions needed to simulate the flow of control of the first processor instructions when translating the address of the next executable instruction resulting from a dynamic transfer of control, i.e., resulting from a return instruction. The simulator compares the address that is loaded at run time by the return instruction with the return address previously executed by that instruction. If the last return address matches, the location of the return is the same. If the last return does not match, a translate look-aside buffer is used to determine the address. If the translate look-aside buffer does not find the address, then a binary tree look up mechanism is used to determine the address of the next instruction after a return. The performance of the simulator is enhanced by utilizing the easiest approaches first in the chance that a translated instruction will result most efficiently.
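The three-step return-address lookup described in this abstract (last return address, then a look-aside buffer, then a tree search) might be sketched as below. The class name, the direct-mapped buffer, and the use of `bisect` over a sorted list as a stand-in for the binary tree are assumptions made for illustration, not details from the patent.

```python
import bisect

class ReturnTranslator:
    """Cheapest-first translation of dynamic return addresses (illustrative)."""
    def __init__(self, translations, tlb_size=4):
        # translations: source return address -> translated code address
        self.keys = sorted(translations)
        self.values = [translations[k] for k in self.keys]
        self.tlb = [None] * tlb_size   # direct-mapped (source, target) pairs
        self.last = {}                 # per return site: last (source, target)
        self.hits = {"last": 0, "tlb": 0, "tree": 0}

    def translate(self, site, source):
        cached = self.last.get(site)
        if cached is not None and cached[0] == source:
            self.hits["last"] += 1     # step 1: same return address as last time
            return cached[1]
        slot = source % len(self.tlb)
        entry = self.tlb[slot]
        if entry is not None and entry[0] == source:
            self.hits["tlb"] += 1      # step 2: look-aside buffer hit
            target = entry[1]
        else:
            # Step 3: binary search, standing in for the binary-tree lookup.
            i = bisect.bisect_left(self.keys, source)
            if i == len(self.keys) or self.keys[i] != source:
                raise KeyError(source)
            self.hits["tree"] += 1
            target = self.values[i]
            self.tlb[slot] = (source, target)
        self.last[site] = (source, target)
        return target

rt = ReturnTranslator({0x10: 0x1000, 0x20: 0x2000})
results = [rt.translate("r1", 0x10), rt.translate("r1", 0x10),
           rt.translate("r2", 0x10), rt.translate("r1", 0x20)]
```

The four calls exercise, in order, the tree search, the last-address check, the look-aside buffer, and the tree search again after a conflict in the buffer.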
5287490 Sites 2 /1994 Identifying plausible variable length machine code of selecting address in numerical sequence, decoding code strings, and following execution transfer paths
ABSTRACT: Information about the location of untranslated instructions in an original program is discovered during execution of a partial translation of the program, and that information is used later during re-translation of the original program. Preferably the information includes origin addresses of translated instructions and corresponding destination address of untranslated instructions of execution transfers that occur during the execution of the partial translation. Preferably this feedback of information from execution to re-translation is performed after each execution of the translated program so that virtually all of the instructions in the original program will eventually be located and translated. To provide an indication of the fraction of the code that has been translated, the program is scanned to find plausible code in the areas of memory that do not contain translated code. The plausible code is identified by selecting addresses according to three different scanning modes and attempting to decode variable-length instructions beginning at the selected addresses. The scanning modes include a first mode in which addresses are selected in numerical sequence by a scan pointer, a second mode in which addresses are selected in instruction-length sequence by an instruction decode pointer, and a third mode in which the selected addresses are destination addresses of previously-decoded execution transfer instructions.
What is claimed is: 26. A method of operating a digital computer having an addressable memory, said addressable memory containing a computer program, said computer program including instructions and data at respective address locations of said addressable memory, each of said instructions consisting of contents of a variable number of contiguous ones of said address locations depending upon an operation specified by said each of said instructions, said method identifying address locations of said addressable memory that appear to contain said instructions of said computer program, said method comprising the steps of:
a) selecting program addresses in numerical sequence, and attempting to decode an instruction in said addressable memory at each program address until an initial instruction is decoded; and when said initial instruction is decoded, then b) attempting to decode a string of instructions immediately following said initial instruction until an execution transfer instruction is decoded, and when an attempt to decode an instruction fails, continuing said selecting program addresses and said attempting to decode an instruction at each program address as set out in said step a), and when an execution transfer instruction is decoded, then c) attempting to decode an instruction at a destination address of the decoded execution transfer instruction, and when the attempt to decode an instruction at the destination address of the decoded execution transfer instruction fails, continuing said selecting program addresses and said attempting to decode an instruction at each program address as set out in step a), and when the attempt to decode an instruction at the destination address of the decoded execution transfer instruction succeeds, then identifying, as said address locations of said addressable memory that appear to contain said instructions of said computer program, the address locations including said initial instruction and said string of instructions including said execution transfer instruction, wherein some program addresses of said computer program are known to contain instructions, and wherein said step a) skips over the program addresses that are known to contain instructions, wherein the decoding of an instruction is not permitted when an instruction being decoded partially overlaps program addresses known to contain an instruction, and wherein said step a) skips over a program address containing a value that is included in a predefined set of values, regardless of whether an attempt to decode an instruction starting at the program address would be successful, 
wherein said set of values includes values that indicate instructions having a length of one program address location, said set of values includes opcodes of privileged instructions, and said set of values includes the value of zero, and wherein said step a) skips over a program address that is the first address of a string of at least four printable ASCII alphanumeric characters.
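A much-simplified sketch of the three scanning modes in the claim above: numerical scan, instruction-length decode, and a check of the transfer destination. The toy instruction set (`LENGTHS`, `JUMP`) and the skip set are invented for illustration. The claim's rules about privileged opcodes, ASCII strings, and partially overlapping instructions are left out.

```python
LENGTHS = {0x01: 1, 0x10: 2, 0x20: 3, 0xE0: 2}  # hypothetical opcode -> length
JUMP = 0xE0                                      # toy jump: opcode, absolute target
SKIP_AS_START = {0x00, 0x01}                     # zero and one-byte opcodes

def decode(mem, pc):
    """Return the instruction length at pc, or None if nothing decodes there."""
    if not 0 <= pc < len(mem):
        return None
    n = LENGTHS.get(mem[pc])
    if n is None or pc + n > len(mem):
        return None
    return n

def plausible_code(mem):
    """Return the set of addresses that appear to hold instructions."""
    found = set()
    for scan in range(len(mem)):                 # mode 1: numerical sequence
        if scan in found or mem[scan] in SKIP_AS_START:
            continue
        pc, run, ok = scan, [], False
        while True:                              # mode 2: instruction-length sequence
            n = decode(mem, pc)
            if n is None:
                break                            # decode failed: resume the scan
            run.extend(range(pc, pc + n))
            if mem[pc] == JUMP:
                # Mode 3: the jump destination must itself decode.
                ok = decode(mem, mem[pc + 1]) is not None
                break
            pc += n
        if ok:
            found.update(run)
    return found

mem = bytes([0x00, 0x41, 0x10, 0x05, 0xE0, 0x02, 0xFF, 0x20, 0x01])
code = plausible_code(mem)
```

Here only bytes 2 through 5 are accepted: a two-byte instruction followed by a jump whose destination (address 2) decodes.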
5560013 Scalzi et al. 9 /1996 Method of using a target processor to execute programs of a source architecture that uses multiple address spaces
ABSTRACT: A method of utilizing large virtual addressing in a target computer to implement an instruction set translator (IST) for dynamically translating the machine language instructions of an alien source computer into a set of functionally equivalent target computer machine language instructions, providing in the target machine, an execution environment for source machine operating systems, application subsystems, and applications. The target system provides a unique pointer table in target virtual address space that connects each source program instruction in the multiple source virtual address spaces to a target instruction translation which emulates the function of that source instruction in the target system. The target system efficiently stores the translated executable source programs by actually storing only one copy of any source program, regardless of the number of source address spaces in which the source program exists. The target system efficiently manages dynamic changes in the source machine storage, accommodating the nature of a preemptive, multitasking source operating system. The target system preserves the security and data integrity for the source programs on a par with their security and data integrity obtainable when executing in source processors (i.e. having the source architecture as their native architecture). The target computer execution maintains source-architected logical separations between programs and data executing in different source address spaces--without a need for the target system to be aware of the source virtual address spaces.
Having thus described our invention, what we claim as new and desire to secure by Letters patent is: 1. An emulation method for executing individual source instructions in a target processor to execute source programs requiring source processor features not built into the target processor, comprising the steps of:
inputting instructions of a source processor program to an emulation target processor having significant excess virtual addressing capacity compared to a virtual addressing capacity required for a source processor to natively execute the source processor program, and supporting multiple source virtual address spaces in the operation of the source processor, building a virtual ITM (instruction translation map) in a target virtual address space supported by the target processor, the virtual ITM containing an ITM entry for each source instruction addressable unit, each source instruction addressable unit beginning on a source storage instruction boundary, structuring each ITM entry for containing a translation address to a target translation program that executes a source instruction having a source address associated with the ITM entry, determining a ratio R by dividing the length of each ITM entry by the length of each source instruction addressable unit, accessing an ITM entry for an executing source instruction by: generating a source aggregate virtual address for the source instruction by combining the source address of the source instruction with a source address space identifier of a source virtual address space containing the instruction, multiplying the source aggregate virtual address by R to obtain a target virtual address component, and inserting the target virtual address component into a predetermined component location in a target virtual address to generate an ITM entry target virtual address for locating an ITM entry associated with the source instruction in order to obtain a one-to-one addressing relationship between ITM entry target virtual addresses and source instruction addresses.
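The ITM address computation in the claim above reduces to a shift, a multiply, and an insertion into the target address. A worked sketch follows. Every constant in it (address width, instruction boundary, entry size, and where the ITM sits in the target address space) is an assumption chosen for illustration, not a value from the patent.

```python
SOURCE_ADDR_BITS = 31   # assumed width of a source virtual address
SOURCE_UNIT = 2         # assumed: source instructions start on 2-byte boundaries
ITM_ENTRY = 8           # assumed: each ITM entry holds one 8-byte pointer
ITM_BASE = 1 << 48      # assumed location of the ITM in target virtual space
MAX_ASID = 1 << 15      # keeps the scaled value below ITM_BASE

# Ratio R: ITM entry length divided by source instruction unit length.
R = ITM_ENTRY // SOURCE_UNIT

def itm_entry_address(asid, source_addr):
    """Map (address space id, source instruction address) to its ITM entry."""
    if source_addr % SOURCE_UNIT or not 0 <= source_addr < (1 << SOURCE_ADDR_BITS):
        raise ValueError("not a valid source instruction address")
    if not 0 <= asid < MAX_ASID:
        raise ValueError("address space id out of range")
    # Source aggregate virtual address: space identifier combined with address.
    aggregate = (asid << SOURCE_ADDR_BITS) | source_addr
    # Multiply by R, then insert the result into the target virtual address.
    return ITM_BASE | (aggregate * R)
```

With these numbers, consecutive source instruction boundaries map to consecutive 8-byte ITM entries, and each source address space gets its own disjoint slice of the table, which is the one-to-one relationship the claim asks for.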
5619665 Emma 4 /1997 Method and apparatus for the transparent emulation of an existing instruction-set architecture by an arbitrary underlying instruction-set architecture
ABSTRACT: The invention provides means and methods for extending an instruction-set architecture without impacting the software interface. This circumvents all software compatibility issues, and allows legacy software to benefit from new architectural extensions without recompilation and reassembly. The means employed are a translation engine for translating sequences of old architecture instructions into primary, new architecture instructions, and an extended instruction (EI) cache memory for storing the translations. A processor requesting a sequence of instructions will look first to the EI-cache for a translation, and if translations are unavailable, will look to a conventional cache memory for the sequence, and finally, if still unavailable, will look to a main memory.
I claim: 1. A method for translating a series of one or more instructions of a first semantic type into one or more instructions of a second semantic type, comprising the steps of:
providing a first memory; providing a second memory; translating a sequence of instructions of the first semantic type stored in the first memory into one or more primary instructions of the second semantic type and storing the instructions of the second type in the second memory; upon a request from the processor for the sequence of instructions of the first semantic type: providing the corresponding instructions of the second semantic type if available in the second memory; providing the sequence of instructions of the first semantic type if the corresponding instructions of the second semantic type are not available in the second memory.
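The fetch order in this abstract (EI-cache first, then the conventional cache, then main memory) can be sketched as below. The `Fetcher` class and the trivial `translate` stand-in are hypothetical. The choice to translate on a miss is also an assumption, since the abstract does not say when the translation engine runs.

```python
def translate(old_instrs):
    """Stand-in translation engine: tag each old instruction as translated."""
    return [("new", ins) for ins in old_instrs]

class Fetcher:
    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> sequence of old instructions
        self.icache = {}                # conventional cache of old instructions
        self.ei_cache = {}              # extended-instruction (translation) cache
        self.log = []                   # which level served each fetch

    def fetch(self, addr):
        if addr in self.ei_cache:       # 1. a translation is available
            self.log.append("ei")
            return self.ei_cache[addr]
        if addr in self.icache:         # 2. conventional cache
            self.log.append("icache")
            old = self.icache[addr]
        else:                           # 3. main memory
            self.log.append("memory")
            old = self.icache[addr] = self.main_memory[addr]
        # Assumed policy: translate now so the next fetch hits the EI-cache.
        self.ei_cache[addr] = translate(old)
        return old                      # old-architecture instructions this time

f = Fetcher({0: ["add", "ld"]})
first, second = f.fetch(0), f.fetch(0)
```

The first fetch returns the untranslated sequence from memory and the second returns the stored translation, matching the two "providing" steps of the claim.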
[Others found.]
4347565 Kareda et al. 8 /1982 Address control system for software simulation
ABSTRACT: An address control system for software simulation in a virtual machine system having a virtual storage function. When a simulator program is simulating an instruction of a program to be simulated, an address translation of an operand address in the program to be simulated is achieved using a translation lookaside buffer, thereby greatly reducing the overhead for the address translation during the simulator program execution.
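A software analogue of the idea in this abstract: cache recent operand-address translations so the simulator does not walk the simulated page table on every access. The page size, the direct-mapped organization, and the dictionary page table are assumptions. The patent itself describes using a translation lookaside buffer in a virtual machine system, not this code.

```python
PAGE_BITS = 12                      # assumed 4 KiB simulated pages
PAGE_MASK = (1 << PAGE_BITS) - 1

class SoftTLB:
    def __init__(self, page_table, entries=16):
        self.page_table = page_table    # virtual page number -> physical page number
        self.entries = [None] * entries # direct-mapped (vpn, ppn) pairs
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_BITS, vaddr & PAGE_MASK
        slot = vpn % len(self.entries)
        entry = self.entries[slot]
        if entry is None or entry[0] != vpn:
            # Slow path: consult the simulated page table and refill the slot.
            self.misses += 1
            entry = self.entries[slot] = (vpn, self.page_table[vpn])
        return (entry[1] << PAGE_BITS) | offset

tlb = SoftTLB({0: 5, 1: 9})
```

Repeated accesses to the same simulated page then cost one list index and one comparison instead of a page-table lookup.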
4638423 Ballard 1 /1987 Emulating computer
ABSTRACT: An apparatus and method is disclosed for providing an emulating computer. The present invention consists of a computer having a storage area, processing unit, control circuits and translation circuit. The original instructions are first loaded into the storage area. When the processor attempts to operate an instruction the control circuit loads a section of the instructions into the translating circuit. These instructions are then translated and stored in a memory area of the translating circuit having the address of the original instruction. The processor unit then accesses the storage area and retrieves the translated instruction.
What is claimed is: 7. A method of emulating a computer comprising the steps of:
transmitting an instruction to a processing unit; checking a cache memory for a translated instruction; loading an instruction block into an instruction memory if said translated instruction is not in said cache memory; translating an instruction of said instruction block providing a translated instruction; storing said translated instruction in said cache memory; and transmitting said translated instruction from said cache memory to said processing unit.
-
http://www.nwlink.com/~tigger/altair.html
-
Find/write up a 1984 bib cite on a Bell Labs project to emulate the PDP-11. They implemented it in portable FORTRAN (minus some host-specific workarounds to handle random-access files for swapping the simulated memory). They were able to boot Unix straight from distribution tapes. The work was done in 1981-82, I believe. The intention was to simplify bootstrapping Unix on new hardware in environments that did not have an existing Unix machine. The objective more or less failed: by the time they got it working, Unix was so successful that few locations were that desperate. Their slowdown was about 120, and among other ideas they said that they could reimplement the interpreter kernel using threaded code for performance.
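For readers unfamiliar with the threaded-code idea mentioned above, here is a minimal sketch: the program is decoded once into a list of handler routines, so the inner loop only fetches and calls. The three-opcode stack machine is invented for illustration and has nothing to do with the PDP-11 emulator itself.

```python
def op_push(vm, arg):
    vm["stack"].append(arg)

def op_add(vm, arg):
    b, a = vm["stack"].pop(), vm["stack"].pop()
    vm["stack"].append(a + b)

def op_halt(vm, arg):
    vm["running"] = False

HANDLERS = {0: op_push, 1: op_add, 2: op_halt}  # hypothetical opcode numbering

def thread(program):
    """Decode once: replace each opcode number with its handler routine."""
    return [(HANDLERS[opcode], arg) for opcode, arg in program]

def run_threaded(threaded):
    """Inner loop with no decoding: fetch the next routine and call it."""
    vm = {"stack": [], "running": True, "pc": 0}
    while vm["running"]:
        handler, arg = threaded[vm["pc"]]
        vm["pc"] += 1
        handler(vm, arg)
    return vm["stack"]

result = run_threaded(thread([(0, 2), (0, 3), (1, None), (2, None)]))
```

In a compiled host language the list entries would be routine addresses and the loop an indirect jump. Python can only approximate that, so no speedup should be inferred from this sketch.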
-
TeraGen emulating microcontroller. Following from an EE Times article by David Lammers,
TeraGen architecture primes single engine for multiple instruction sets
(01/25/99, 02:08:32 PM EDT).
-
TeraGen Corp., Sunnyvale CA.
-
Microcontroller.
-
Translates multiple ISAs on the fly to ``POPs'' (primitive operations) for scheduling on a VLIW.
-
(Unclear how this differs from superscalar or simultaneous multithreading).
-
TeraGen cofounder Don Sollers was a principal architect of the DSP architecture being brought to market by ZSP Corp., which uses conventional superscalar techniques to increase signal-processing throughput. Sollers earlier worked on processors at Digital and Sun and was principal architect of the Supersparc II.
-
The key advantage ... will be its ability to execute the code from several different processing cores on one engine.
-
The TeraGen engine is adapted to additional instruction sets by adding a ``small block of fast ROM'' to govern the translation of the new instructions into POPs.
-
A ROM can also be set up to translate a set of hardware functions into POPs. The TeraGen engine could thus be configured to emulate peripheral functions as well as other processors.
-
Instruction streams for all processors/emulated peripherals flow to the scheduler, where each is translated into POPs. The resulting streams of VLIW-like operations are then scheduled for execution.
-
A key feature is ... ``a large, fast data cache used for register emulation. By allocating cache locations to represent each of the registers for each of the instruction sets it is emulating, the TeraGen engine apparently can blend POPs from different streams of instructions into a single flow. It thus can theoretically find opportunities for parallelism that would escape a conventional superscalar or VLIW architecture.''
-
Put another way, the system uses a virtual register file in a cache to emulate the register file. Sollers said ``This is part of our secret sauce: The cache can be accessed as a register.''
-
Sollers said ``We have the capability to manage and schedule operations within an RTOS. This approach would allow the control logic to run at the same speed as the data path, which is what real-world multiprocessing is all about.''
-
Note that all emulated processors/devices get faster together as the TeraGen is made faster.
-
CEO George Alexy was recruited from Cirrus Logic Inc. in mid-1998 to head up TeraGen.
-
Alexy said two semiconductor companies have taken licenses, initially for 8-bit applications; they will reach silicon within the year.
-
Will Strauss, principal at Forward Concepts (Tempe, Ariz.), says there may be legal questions about emulation.
-
Strauss says most DSP designs use a Harvard architecture; TeraGen employs a unique register-file approach.
-
Quoting from the above article, ``The ability to reuse code while combining a DSP and an MCU may be unique to TeraGen, Sollers said. The StarCore approach now being developed by Motorola and Lucent is working toward combining a DSP and an MCU on the same die, but Sollers claimed that the StarCore effort "will almost be forced to adopt a new ISA. In our approach, we allow people to use a familiar ISA. From a top-level perspective, what we are doing is allowing people to configure a system-on-chip through software. That is where the flexibility of this approach comes from."''
-
TeraGen "breaks very complex tasks into primitives very quickly, to achieve an advantage that way. The POPs are long instructions -- a native instruction set that is dramatically different from what previous architectures have attempted. How we hierarchically establish our instructions is our inherent advantage."
-
TeraGen has a staff of about 20 engineers.
-
Analyst Strauss said TeraGen may quickly run into intellectual-property issues.
-
TeraGen has attracted $9 million in investment capital from Sequoia Capital Partners and InterWest Partners.
-
-
DATE: June 10, Thursday, 2:30
TITLE: Jalapeno --- a new Java Virtual Machine for Servers
SPEAKER: Vivek Sarkar, IBM T. J. Watson Research Center
ABSTRACT: In this talk, we give an overview of the Jalapeno Java Virtual Machine (JVM) research project at the IBM T. J. Watson Research Center. The goal of Jalapeno is to expand the frontier of JVM technologies for server nodes --- especially in the areas of dynamic optimized compilation and specialization, scalable exploitation of multiple processors in SMPs, and the use of a JVM as a 7x24 application server. The Jalapeno JVM has two key distinguishing features. First, the Jalapeno JVM takes a compile-only approach to program execution. Instead of providing both an interpreter and a JIT/dynamic compiler, it provides two dynamic compilers --- a quick non-optimizing "baseline" compiler, and a slower production-strength optimizing compiler. Both compilers share the same interfaces with the rest of the JVM, thus making it easy to mix execution of unoptimized methods with optimized methods. Second, the Jalapeno JVM is itself implemented in Java! This design choice brings with it several advantages as well as technical challenges. The advantages include a uniform memory space for JVM objects and application objects, and ease of portability. The key technical challenge is to overcome the large performance penalties of executing Java code (compared to native code) that has been the experience of current JVMs; if we succeed in doing so, we will simultaneously improve the performance of our JVM as well as of the applications running on our JVM. The Jalapeno project was initiated in January 1998 and is still work in progress. This talk will highlight our design decisions and early experiences in working towards our goal of building a high-performance JVM for SMP servers.
-
Get info on
EPP
(
Ball
,
Larus
, et al.).
-
Get info on xtrace.
-
Jack Veenstra (MINT) is probably at MIPS (1998).
-
Paint
, a PA-RISC simulator based on Mint. See also
http://www.cs.utah.edu/projects/avalanche/avalanche-publications.html http://www.cs.utah.edu/projects/avalanche/paint.ps http://www.hensa.ac.uk/parallel/simulation/architectures/paint/paint.tar.Z
-
Get more info on this quote:
"What's visible about software is the effect it has on something else. If two thoroughly different programs have the same observable effects, you cannot tell which one has executed. If a given portion of a program has no observable effects, then you have no way of knowing if it is executing, if it has finished, if it got part way through and then stopped, or if it produced 'the right answer.' Programmers nearly always must rely on highly indirect measures to determine what happens when their programs execute. This is one reason why debugging is so difficult."
[Digital Woes, Lauren Ruth Weiner, 1993, Addison-Wesley]
-
Dixie: M. Fernández and R. Espasa. Dixie Architecture Reference Manual (version 1.0). TR UPC-DAC-1998-55, Computer Architecture Department, Universitat Politècnica de Catalunya-Barcelona, 1998.
-
SimpleScalar: D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. TR CS-TR-97-1342, Computer Sciences Department, University of Wisconsin-Madison, 1997.
-
Get, read and incorporate: R. Uhlig and T. N. Mudge, ``Trace-Driven Memory Simulation: A Survey'', ACM Computing Surveys, pages 128-170, Feb. 1997.
-
Get, read, and incorporate: J. E. Veenstra and R. J. Fowler. MINT: A front end for efficient simulation of shared-memory multiprocessors. In Proceedings of the Second International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 201-207, January 1994.
-
Get, read, and incorporate: B.-S. Yang, S.-M. Moon, S. Park, J. Lee, S. Lee, J. Lee, K. Ebcioglu, and E. Altman. LaTTe: A Java VM Just-in-Time Compiler with Fast and Efficient Register Allocation. In 1999 International Conference on Parallel Architectures and Compilation Techniques, October 1999.
-
Get, read, and incorporate: Michal Cierniak, James M. Stichnoth, Guei-Yuan Lueh. "Support for Garbage Collection at Every Instruction in a Java Compiler." In 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999.
-
Get, read, and incorporate: A. Krall and M. Probst. "Monitors and Exceptions: How to Implement Java Efficiently." In ACM 1998 Workshop on Java for High-Performance Network Computing, 1998.
-
Get, read, and incorporate: T. Wilkinson, "Kaffe: A JIT and Interpreting Virtual Machine to Run Java Code." See
http://www.transvirtual.com
-
Get, read, and incorporate: Kemal Ebcioglu and Erik R. Altman, "DAISY: Dynamic Compilation for 100% Architectural Compatibility." In Proceedings of the 24th International Symposium on Computer Architecture, pp. 26-37, June 1997.
-
MAME is a ``Multiple Arcade Machine Emulator''. See http://mame.retrogames.com for more info. Note that MAME even runs on a Digita camera! Courtesy of James W. Surine.
-
[Transmeta 00]
%A Alexander Klaiber %I Transmeta Corporation %T The Technology Behind Crusoe(tm) Processors %R From http://www.transmeta.com/pdf/white_papers/paper_aklaiber_19jan00.pdf as of 2002/08/19. %D 2000
White paper on Crusoe, emulation.
[Halfhill 94b]
Tom R. Halfhill, ``Apple's 680x0 Emulation for Unix,'' Byte, April 1994.
[Scantlin 96]
Scantlin. ``RISC architecture computer configured for emulation of the instruction set of a target computer.'' 1996/11, United States Patent #5574927.
[Baraz et al 98]
Baraz et al. ``Method for verifying contiquity of a binary translated block of instructions by attaching a compare and/or branch instruction to predecessor block of instructions.'' 1998/02, United States Patent #5721927.
[Klein et al 98]
Klein et al. ``Optimizing hardware and software co-simulator.'' 1998/06, United States Patent #5768567.
[Bunza 98]
Bunza. ``System and method for simulation of computer systems combining hardware and software interaction.'' 1998/11, U.S. Patent #5838948
Find, read and incorporate:
Kumar et al., Emulation Verification of the Motorola 68060, Proceedings, ICCD, 1995, pp. 150-158.
Note et al., Rapid Prototyping of DSP Systems: Requirements and Solutions, 6th IEEE Int'l Wkshp on RSP, 1995, pp. 40-47.
Tremblay et al., A Fast and Flexible Performance Simulator for Micro-Architecture Trade-off Analysis on Ultrasparc-1, 1995, pp. 2.
Rosenberg, J.M., Dictionary of Computers, Information Processing & Telecommunications, John Wiley & Sons, pp. 382.
Transitive Technologies Ltd., a spinoff of Manchester University, HQ'd in San Diego, CA.
``Dynamite'' dynamic translator. Claims: can do several machines; can run above or below the OS; is not quite shrink-wrap -- ``a licensing proposition''; can do instruction decoding very quickly; has issues around interrupt/exception handling; can lead to speedups during translation; less than six months to port; is ready for production. Story at EE Times, 11 June 2001.
ACM Transactions on Computer Systems (TOCS), Volume 15, Issue 4 (November 1997)
Continuous profiling: where have all the cycles gone?
Authors Jennifer M. Anderson Digital Equipment Corp., Palo Alto, CA William E. Weihl Digital Equipment Corporation, Palo Alto, CA Lance M. Berc Digital Equipment Corp., Palo Alto, CA Jeffrey Dean Digital Equipment Corp., Palo Alto, CA Sanjay Ghemawat Digital Equipment Corp., Palo Alto, CA Monika R. Henzinger Digital Equipment Corp., Palo Alto, CA Shun-Tak A. Leung Digital Equipment Corporation, Palo Alto, CA Richard L. Sites Digital Equipment Corporation, Palo Alto, CA Mark T. Vandevoorde Digital Equipment Corporation, Palo Alto, CA Carl A. Waldspurger Digital Equipment Corporation, Palo Alto, CA
Publisher ACM Press New York, NY, USA Pages: 357 - 390 Periodical-Issue-Article Year of Publication: 1997 ISSN:0734-2071
ABSTRACT: This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1-3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
http://www.dynarec.com/~victor/Project/Bibliography/Bibliography.htm
-
Dynamically Recompiling ARM Emulator, Julian Brown (May 1, 2000) WEB
-
Generator: A Sega Genesis Emulator, James Ponder (1997-1998) DOC
-
A Robust Foundation for Binary Translation of x86 Code, Liang Chuan Hsu, University of Illinois, DOC WEB
-
DAISY: Dynamic Compilation for 100% Architectural Compatibility, Kemal Ebcioglu, Erik R. Altman, IBM Research Division Yorktown Center [Computer Science RC 20538 08/05/96] DOC WEB
-
DAISY Dynamic Binary Translation Software, Erik R. Altman, Kemal Ebcioglu, IBM T. J. Watson Research Center, 2000 DOC SOURCE WEB
-
DAISY: Dynamic Compilation for 100% Architectural Compatibility, Kemal Ebcioglu, Erik R. Altman, IBM T. J. Watson Research Center [Presentation] PRESENTATION WEB
-
UQBT: A Resourceable and Retargetable Binary Translator (Web Page), Cristina Cifuentes, Mike Van Emmerik (Queensland University), Norman Ramsey (Harvard University), December 1999 DOC WEB
-
UQBT: Adaptable Binary Translation at Low Cost, Cristina Cifuentes, Mike Van Emmerik, PACT99, DOC WEB
-
Machine-Adaptable Dynamic Binary Translation, David Ung, Cristina Cifuentes (University of Queensland, davidu@csee.uq.edu.au, cristina@csee.uq.edu.au, 1999?) DOC WEB
-
Binary Translation: Static, Dynamic, Retargetable?, Cristina Cifuentes (University of Queensland, cristina@cs.uq.edu.au), Vishv Malhotra (University of Tasmania, vmm@cs.utas.edu.au) 1996? DOC WEB
-
The Design of a Resourceable and Retargetable Binary Translator, Cristina Cifuentes, Mike Van Emmerik (Queensland University), Norman Ramsey (Virginia University), 1999? DOC WEB
-
Transparent Dynamic Optimization, Vasanth Bala (vas@hpl.hp.com), Evelyn Duesterwald, Sanjeev Banerjia, HPL Cambridge, June 1997, DOC(short), DOC(large), WEB [DYNAMO]
-
Dynamo: A Transparent Dynamic Optimization System, Vasanth Bala (vas@hpl.hp.com), Evelyn Duesterwald (duester@hpl.hp.com), Sanjeev Banerjia (sbanerjia@incert.com), HPL Cambridge, 2000, DOC WEB
-
Microprocessor = Dynamic Optimization Software + CPU Hardware, HPL (DYNAMO), TALK, WEB
-
An Out-of-Order Execution Technique for Runtime Binary Translators, Bich C. Le (PDL-HP, leb@cup.hp.com 1998) [1998 ACM 1-58113-107-0/98/0010] DOC WEB
-
Migrating a CISC Computer Family onto RISC via Object Code Translation, Kristy Andrews, Duane Sand (Tandem Computers Inc. 1992) [1992 ACM 0-89791-535-6/92/0010/0213] DOC WEB
-
Optimizations and Oracle Parallelism with Dynamic Translation, Kemal Ebcioglu, Erik R. Altman, Sumedh Sathaye, Michael Gschwind (IBM Yorktown Center, kemal@watson.ibm.com, erik@watson.ibm.com, sathaye@watson.ibm.com, mikeg@watson.ibm.com, 1999) [1072-4451/99 1999 IEEE] DOC PRESENTATION WEB
-
BOA: Targeting Multi-GHz with Binary Translation, Erik Altman and others, IBM Research, PRESENTATION WEB
-
Dynamic and Transparent Binary Translation, Michael Gschwind, Erik R. Altman, Sumedh Sathaye, Paul Ledak, David Appenzeller, PACT99, DOC WEB
-
Binary Translation, Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks, Scott G. Robinson (Digital 1993) [ACM 1993, vol. 36 no. 2] DOC WEB
-
Binary Translation: A Short Tutorial, Erik R. Altman, David Kaeli, Yaron Sheffer (PACT99) DOC WEB
-
Welcome to the opportunities of Binary Translation, Erik R. Altman, David Kaeli, Yaron Sheffer, PACT99 DOC WEB
-
PA-RISC to IA-64: Transparent Execution, No Recompilation, Cindy Zheng, Carol Thompson (HP March 2000) [0018-9162/00 IEEE March 2000] DOC WEB
-
Embra: Fast and Flexible Machine Simulation, Emmett Witchel, Mendel Rosenblum (MIT witchel@lcs.mit.edu, Stanford mendel@cs.stanford.edu 1996) [Sigmetrics96] DOC1 DOC2 WEB
-
A Structuring Algorithm for Decompilation, Cristina Cifuentes, Queensland University, August 1993 DOC WEB
-
The Impact of Copyright on the Development of Cutting Edge Binary Reverse Engineering Technology, Cristina Cifuentes, U. Queensland DOC WEB
-
A Transformational Approach to Binary Translation of Delayed Branches, Norman Ramsey, Cristina Cifuentes, 1999
-
Software Profiling for Hot Path Prediction: Less is More, Evelyn Duesterwald (duester@hpl.hp.com), Vasanth Bala (vas@hpl.hp.com), HPL 2000 DOC WEB
-
A JAVA ILP Machine Based on Fast Dynamic Compilation, Kemal Ebcioglu (kemal@watson.ibm.com), Erik Altman (erik@watson.ibm.com), Erdem Hokenek (hokenek@watson.ibm.com), IBM Yorktown Center, 1997 DOC WEB
-
Timing Insensitive Binary to Binary Translation of Real Time Systems, Bryce Cogswell (CMU), Zary Segall (U. Oregon), 1994 DOC WEB
-
DIGITAL FX!32: Combining Emulation and Binary Translation, Raymond J. Hookway and Mark A. Herdeg (DIGITAL Technical Journal, 28 August 1997) DOC WEB
-
White Paper: How DIGITAL FX!32 works, DIGITAL DOC WEB
-
Executor Internals: How to Efficiently Run Mac Programs on PCs, Mathew J. Hostetter mat@ardi.com, Clifford T. Matthews cmt@ardi.com (Ardi 1996) DOC WEB
-
Syn68k: ARDI's dynamically compiling 68LC040 emulator, Mat Hostetter mat@ardi.com (Ardi October 27 1995) DOC WEB
-
The Technology Behind Crusoe Processors, Alexander Klaiber, Transmeta Corp., January 2000 DOC WEB
-
Combining hardware and software to provide an improved microprocessor (Transmeta's Crusoe patent), Robert F. Cmelik, et al. (US Patent and Trademark Office, February 29, 2000) DOC WEB
-
Building the Virtual PC, Eric Traut, Core Technologies, Nov 1995 DOC WEB
-
The DR Emulator (Technote PT39), Eric Traut, Apple, February 1995 DOC WEB
-
HW 28 DR Emulator Caches, Apple, 8 April 1996 DOC WEB
-
Java Hot Spot Documentation DOC WEB
-
Opendoc and Java Beans, Gregg Williams DOC WEB
-
What is Binary Translation (Freeport Express), Richard Gorton, Digital 1995 DOC WEB
-
DRFAQ: Dynamic Recompilation Frequently Asked Questions, Michael König (M.I.K.e), mike@dynarec.com (www.dynarec.com/~mike/drfaq.html 29-8-2000). DOC WEB
-
Emulators and Emulation, Bill Haygood 1999
-
Interview with Jeremy Chadwick
-
VCODE: A Retargetable, Extensible, Very Fast Dynamic Code Generation System, Dawson R. Engler (MIT 1995) [ACM PACT96] DOC WEB
-
A VCODE Tutorial, Dawson R. Engler (MIT, April 28 1996) DOC WEB
-
DCG: An Efficient, Retargetable Dynamic Code Generation System, Dawson R. Engler (MIT), Todd A. Proebsting (University of Arizona) 1993? DOC WEB
-
Dynamo: A Staged Compiler Architecture for Dynamic Program Optimization, Mark Leone, R. Kent Dybvig (Indiana University 1997) DOC WEB
-
Efficient Compilation and Profile-Driven Dynamic Recompilation in Scheme, Robert G. Burger, Indiana University, March 97 DOC WEB
-
‘C: A Language for High-Level, Efficient and Machine-Independent Dynamic Code Generation, Dawson R. Engler, Wilson C. Hsieh, M. Frans Kaashoek, MIT 1995 (ACM) DOC WEB
-
tcc: A System for Fast, Flexible and High-Level Dynamic Code Generation, Massimiliano Poletto, Dawson R. Engler, M. Frans Kaashoek, MIT 1996 DOC WEB
-
tcc: A Template-Based Compiler for ‘C, Massimiliano Poletto, Dawson R. Engler, M. Frans Kaashoek, MIT 1996 DOC WEB
-
Fast, Effective Dynamic Compilation, Joel Auslander, Matthai Philipose, Craig Chambers, Susan J. Eggers, and Brian Bershad, University of Washington, 1996 DOC WEB
-
Fast and Effective Procedure Inlining, Oscar Waddell, R. Kent Dybvig, Indiana University, June 11 1997 DOC WEB [DELETE]
-
Some Efficient Architecture Simulation Techniques, Robert Bedichek, University of Washington, 1990 DOC WEB
-
SUPERSIM -- A New Technique for Simulation of Programmable DSP Architectures, V. Zivojnovic, S. Pees, Ch. Schläger, R. Weber, H. Meyr, Aachen University (Germany), 1995, DOC [DELETE?]
http://research.ac.upc.es/pact01/wbt/davidson.pdf Kevin Scott and Jack Davidson, Strata: A software dynamic translation infrastructure. (kscott@cs.virginia.edu, jwd@microsoft.com).
bintrans by Mark Probst. See http://www.complang.tuwien.ac.at/schani/bintrans/
Strata -- see http://www.cs.virginia.edu/~skadron/Papers/strata_tr2001_18.pdf.
Dynamic Optimization Infrastructure and Algorithms for IA-64 (includes discussion of delayed compilation and dynamic translation). See http://www.tinker.ncsu.edu/theses/thesis-kim-hazelwood.ps.
``Binary Translation: Classification of emulators'' by Arjan Tijms, Leiden Institute for Advanced Computer Science, atijms@liacs.nl. Notes some early work (e.g., IBM 1401). Read and review. See http://www.liacs.nl/~atijms/bintrans.pdf as of 2003/01/30.
``Studying the Performance of the FX!32 Binary Translation System'', see http://citeseer.nj.nec.com/233168.html as of 2003/01/30.
``Three factors contribute to the success of a microprocessor: price, performance, and software availability.'' Digital Technical Journal Vol. 9, No. 1, 1997. See http://citeseer.nj.nec.com/279579.html as of 2003/01/30.
`` B. Case, "Rehosting Binary Code for Software Portability," Microprocessor Report, Vol. 3, No. 1, Jan. 1989, pp. 4-9.'' See http://citeseer.nj.nec.com/context/1112089/0 as of 2003/01/30.
Valgrind is a memory tracing tool by Julian Seward jseward@acm.org and Nick Nethercote njn25@cam.ac.uk. Information at http://developer.kde.org/~sewardj including ``The design and implementation of Valgrind'' from http://developer.kde.org/~sewardj/docs/techdocs.html.
Notes: GPL'd. For finding memory-management problems. Targeted for x86 GNU/Linux executables. Dynamically linked with the executable using `-z initfirst` so it runs before any other object in the image; Valgrind thus gains control. Translates and instruments basic blocks; translations are saved in a translation cache and mapped by a translation table. The fast map typically has a 98% hit rate. Translations are procedures; they return the virtual (not physical) `eip`. The main loop does all dispatch. Translation storage management uses an approximate LRU algorithm. Multiple main loop entry points handle special cases such as returns of system call handlers and calls to `malloc()`, `free()`, etc.
User memory is tracked at the level of allocation blocks. Tracking information includes the call stack at allocation time. Calls to `free()` look for tracking information; if there is none, the block has already been freed or was never allocated, and an error is signaled. If found, the block is marked inaccessible and the tracking information goes on a free list to look for invalid accesses. Various per-byte information is kept via a sparse map of the entire 4G address space -- 64K sections mapping 64KB each, indexed by the high and low 16 bits of the address, respectively.
Valgrind supports `--stop-after=` to return to native execution. Among other things, this helps with a binary search for Valgrind bugs. To support native execution, memory must be identical whether or not running on the simulated CPU. Signals are handled ``differently'' -- because they arrive asynchronously, `--stop-after=` varies. Thus, there is a mode to run all signals native. Valgrind also uses a different stack format than native, so a given signal must be handled all-native or all-Valgrind.
Contains numerous assertions and internal checks, which are enabled all the time; more expensive checks are done once every N uses of the relevant code. Some very slow checks are only enabled optionally: low-level memory management complete checks; symbol table reader; per-byte tracking; translation IR sanity checking; system call wrapper sanity checks; main loop `%ebp`; register allocator checks.
Most symbols are prefixed using a CPP macro; `malloc()`, `free()`, etc. are intercepted and have the usual names. Is largely independent of other shared objects, which avoids conflicts if Valgrind and the application simultaneously execute impure code. Also, `glibc` defines `sigset_t` differently than the kernel, and Valgrind uses the kernel definition. Still imports too many ``unsafe'' `.h` files.
Limitations include: no threads (``most-requested feature'', needs fast mutex and race safety), no MMX, no support for non-POSIX signals; the Linux API is not a module, so it is nontrivial to port to other APIs.
Does not use `%edi` in order to avoid save/restore issues around ``special uses''. Translations are position-independent and are moved in the TC as part of LRU management. `%ebp` always points at a fixed region, except that on translation return a special value indicates that the returned virtual pc requires special handling. Thus, virtual state accesses use `N(%ebp)` addressing, with the most common using 8-bit immediates. Values are re-homed at the end of every translation. Snapshotting initial state takes place before Valgrind is set up; therefore, dump state to a static area, initialize, home.
System calls operate on the virtual state except the program counter is the real (Valgrind) value. Need to save Valgrind state while faking up system call state. Horror: may need to handle a second system call while blocked inside a first system call, so there is actually a stack of Valgrind save areas.
Translation is based on UCode to avoid details of the x86. UCode is approximately 2-address code, with a third field for immediates. UCode is much simpler than x86; for example, only `LOAD` and `STORE` touch memory. Register allocation is done on UCode but some fixup is done later. Valgrind does not use the FPU, so simulation is much easier. Valgrind does not cache FPU state across multiple FPU instructions. Optimisation passes can be disabled. Instrumentation can be disabled. Can single-step.
UCode is annotated with memory instrumentation code. Code generation aided by GNU superopt v2.5. Memory instrumentation is 1==problem 0==valid; unintuitive, but faster to test against 0 than all 1's. Code generation is paranoid about partial-register updates. Simulated `%esp` is not updated lazily because the instrumenter needs the current value. Instrumentation is at the UCode level, rather than the x86 level. Valgrind attempts to track defined-ness of memory at the bit level, but ``throws in the towel'' on things that call helper functions. Optimizes bit-level references that touch a whole object.
Valgrind runs on top of a `setjmp`-installed exception handler so segfaults, etc., can bail out of the current translation. `%eip` is kept within 4 bytes of real. Signals are queued and checked every 1000 blocks. Signal frames have a Valgrind return address for frame cleanup, but some signal handlers exit via, e.g., `longjmp()`. No systematic verification or regression testing. Has a cache simulator -- tracks which instructions have effects, not just the cache effects. The data structure for cache/instruction must not be discarded when the instruction is discarded. Projects include user-defined permission ranges.
Questions: Can Valgrind run itself? The `-z` trick suggests no. Probably no SSE/SSE2. Further explanation of nested system calls (how they arise) would be useful.
KCachegrind: http://kcachegrind.sourceforge.net/ visualization tool for data from Cachegrind.
http://www.cs.mtu.edu/~jmbastia/project_proposal.pdf An architectural description language for simulator and other tools construction.
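The sparse per-byte map mentioned in the Valgrind notes above can be sketched roughly as follows. This is a minimal illustration in Python, not Valgrind's code: the class name `SparseByteMap`, its methods, and the default value of 0 are assumptions made for the example. Only the layout comes from the notes: 64K sections of 64KB each, selected by the high 16 bits of a 32-bit address and indexed by the low 16 bits.

```python
SECTION_BITS = 16
SECTION_SIZE = 1 << SECTION_BITS   # 64KB covered by each secondary section
NUM_SECTIONS = 1 << SECTION_BITS   # 64K sections cover the 4G address space


class SparseByteMap:
    """Hypothetical two-level per-byte map over a 32-bit address space."""

    def __init__(self, default=0):
        self.default = default
        # Primary table: one slot per 64KB section, filled in lazily.
        self.primary = [None] * NUM_SECTIONS

    @staticmethod
    def _split(addr):
        if not 0 <= addr < (1 << 32):
            raise ValueError("address outside the 32-bit space")
        # High 16 bits pick the section, low 16 bits pick the byte in it.
        return addr >> SECTION_BITS, addr & (SECTION_SIZE - 1)

    def get(self, addr):
        hi, lo = self._split(addr)
        section = self.primary[hi]
        return self.default if section is None else section[lo]

    def set(self, addr, value):
        hi, lo = self._split(addr)
        section = self.primary[hi]
        if section is None:
            # Allocate the 64KB secondary only when it is first written.
            section = bytearray([self.default]) * SECTION_SIZE
            self.primary[hi] = section
        section[lo] = value

    def allocated_sections(self):
        return sum(1 for s in self.primary if s is not None)
```

For example, after `m = SparseByteMap()` and `m.set(0x08048123, 1)`, only one 64KB section is allocated, and reads of untouched addresses return the default without allocating anything. The point of the two-level layout is that the cost of the map grows with the memory actually touched rather than with the size of the address space.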
http://www.eros-os.org/design-notes/IA32-Emulation.html Links to Wine/Plex86/VMware/etc.
- RSIM
- V. S. Pai, P. Ranganathan, S. V. Adve, "RSIM Reference Manual", Version 1.0.
- V. S. Pai, P. Ranganathan, S. V. Adve, "RSIM: An Execution-Driven Simulator for ILP-Based Shared Memory Multiprocessors and Uniprocessors", IEEE TCCA Newsletter, Oct., 1997.
- SIMOS
- The SimOS User Guide
- M. Rosenblum, E. Bugnion, S. Devine, S. A. Herrod, "Using the SimOS Machine Simulator to Study Complex Computer Systems", ACM Transactions on Modeling and Computer Simulation, Vol. 7, No. 1, Jan., 1997.
- M. Rosenblum, S. Herrod, E. Witchel, A. Gupta, "Complete Computer System Simulation: The SimOS Approach".
- Power PC (II), MET
- M. Moudgill, J. Wellman, and J. H. Moreno, "Environment for PowerPC Microarchitecture Exploration", IEEE Micro 99.
- W. Anderson, "An Overview of Motorola's PowerPC Simulator Family", Comm. of the ACM, June, 1994, vol. 37, No. 6.
- A. Poursepanj, "The PowerPC Performance Modeling Methodology", Comm. of the ACM, June, 1994, vol. 37, No. 6.
- SimpleScalar
- The Simple Scalar Tool Set, Version 2.0.
- SimplePower -- execution-driven datapath energy estimation tool based on SimpleScalar
- CGEN -- The Cpu tools GENerator, SID
- Manual
- WARTS/QPT
- WWT2 - Wisconsin Wind Tunnel II - multiprocessor simulator
- James R. Larus, "Efficient Program Tracing," IEEE Computer, 26, 5, May 1993, pp 52-61.
- Thomas Ball and James R. Larus, "Optimally Profiling and Tracing Programs," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 16, no. 4, July 1994, pp. 1319-1360.
- MINT and Augmint
- MINT Tutorial and User Manual
- A-T. Nguyen, M. Michael, A. Sharma, J. Torrellas, The Augmint Multiprocessor Simulation Toolkit for Intel x86 Architectures, Proceedings of International Conference on Computer Design, October 1996.
- Augmint User's guide
- ABSS: a Sparc Simulator
- D. Sunada, D. Glasco, M. Flynn, ABSS v2.0: a SPARC Simulator, SASIMI '98.
- Tullsen's SMTSIM Multithreading Simulator
- D.M. Tullsen, S.J. Eggers, and H.M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism", In 22nd Annual International Symposium on Computer Architecture, June, 1995.
- D.M. Tullsen, Simulation and Modeling of a Simultaneous Multithreading Processor, In the 22nd Annual Computer Measurement Group Conference, December, 1996
- SIMOS-PPC
- SimpleScalar PPC
- CACTI -- Cache Access and Cycle Time Information
- An Integrated Cache Timing and Power Model
- CACTI: An Enhanced Cache Access and Cycle Time Model PPC
[RF 97]
@article{rf-specifying-instructions:97, author="Norman Ramsey and Mary F. Fernandez", title="{S}pecifying {R}epresentations of {M}achine {I}nstructions", journal="ACM Transactions on Programming Languages and Systems", volume = "19", number = "3", pages = "492--524", month="May", year="1997" }
[Larsson 97]
@techreport{larsson-sim-from-spec:97, author = "F. Larsson", title="{G}enerating {E}fficient {S}imulators from a {S}pecification {L}anguage", institution="Swedish Institute of Computer Science", year="1997" }
[PZRM 97]
@article{pzrm-fast-320C54x:97, author="S. Pees, V. Zivojnovic, A. Ropers, H. Meyr", title="{F}ast {S}imulation of the {T}{I} {T}{M}{S}320{C}54x {D}{S}{P}", journal="International Conference on Signal Processing Applications and Technology", pages = "995-999", month="September", year="1997" }
- Bergh et al., HP 3000 Emulation on HP Precision Architecture Computers, Hewlett-Packard Journal, Dec. 1987, pp. 87-89.
- Hunter et al., DOS at RISC, Byte, Nov. 1989, pp. 361-368.
- Saari, Michael, 68000 Binary Code Translator, FORML Conference Proceedings, 1987, pp. 48-52.
- Banning, John, The XDOS Binary Code Conversion System, IEEE, 1989, pp. 282-287.
SESC simulator -- superscalar simulator for in-order and out-of-order, also several multiprocessor configurations. As of 2003/12, supports only MIPS ISA.
ReXSim: A Retargetable Framework for Instruction-Set Architecture Simulation, Mehrdad Reshadi, Prabhat Mishra, Nikhil Bansal, Nikil Dutt
SICS technical reports, ``T97-01 Generating Efficient Simulators from a Specification Language 51 pages (PostScript) Fredrik Larsson
A simulator is a powerful tool for hardware as well as software development. However, implementing an efficient simulator by hand is a very labour intensive and error-prone task. This paper describes a tool for automatic generation of efficient instruction set architecture (ISA) simulators. A specification file describing the ISA is used as input to the tool. Besides a simulator, the tool also generates an assembler and a disassembler for the architecture. We present a method where statistics is used to identify frequently used instructions. Special versions of these instructions are then created by the tool in order to speed up the simulator. With this technique we have generated a SPARC V8 simulator which is more efficient than our hand-coded and hand-optimized one.''
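The specialization idea in the abstract above can be illustrated with a small sketch. This is not the tool's actual mechanism: the toy instruction format `(opcode, rd, rs, imm)`, the function names, and the `threshold` parameter are all assumptions for the example. It counts how often each exact instruction form occurs in a profiling trace, builds handlers with the operands fixed in advance for the frequent forms, and falls back to a generic handler for everything else.

```python
from collections import Counter


def generic_exec(regs, instr):
    """Generic handler: decodes the operands on every execution."""
    op, rd, rs, imm = instr
    if op == "addi":
        regs[rd] = regs[rs] + imm
    elif op == "mov":
        regs[rd] = regs[rs]
    else:
        raise ValueError(f"unknown opcode {op!r}")


def specialize(instr):
    """Build a handler with the operands fixed at generation time."""
    op, rd, rs, imm = instr
    if op == "addi":
        def run(regs):
            regs[rd] = regs[rs] + imm
    elif op == "mov":
        def run(regs):
            regs[rd] = regs[rs]
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return run


def build_special_versions(profile_trace, threshold):
    """Specialize every instruction form seen at least `threshold` times."""
    counts = Counter(profile_trace)
    return {i: specialize(i) for i, n in counts.items() if n >= threshold}


def run_program(program, regs, special):
    for instr in program:
        handler = special.get(instr)
        if handler is not None:
            handler(regs)              # frequent form: pre-decoded version
        else:
            generic_exec(regs, instr)  # rare form: generic version
    return regs
```

A program run with or without the special versions must produce the same register state; only the amount of per-instruction decoding differs. A generated simulator would presumably emit the special versions as compiled code rather than closures, which is where the real speedup would come from.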
R97-03 SimGen: Development of Efficient Instruction Set Simulators (abstract, PostScript) Fredrik Larsson, Peter Magnusson, Bengt Werner
T97-02 Performance Debugging and Tuning using an Instruction-Set Simulator (PostScript) Peter S. Magnusson, Johan Montelius
Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic-allowing bugs to be recreated or measurements repeated. Though often viewed as being too slow for use as a general programming tool, in the last several years their performance has improved considerably. We describe SIMICS, an instruction set simulator of SPARC-based multiprocessors developed at SICS, in its role as a general programming tool. We discuss some of the benefits of using a tool such as SIMICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases, where we've used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over a magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems. (NOTE: A later version of this report was published in ILPS'97)
``IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium(R)-based systems'', Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Kaletsky, Yun Wang, and Yigal Zemach. IEEE MICRO-36 2003.
Simulators to keep old software alive. Products include PDP-11 on ?, VAX on ?, and Alpha on Itanium. [According to one reader, they reported about 10 VAX MIPS on a 600 MHz P-III, with 4X memory expansion.].
http://cap.anu.edu.au/cap/projects/sulima/: Sulima SPARC V9 system simulator (as of 2004/02). Also a paper.
Anything you know about that I haven't included and any bugs you find that I haven't fixed.
-
Selected Recent Changes
- 2003/01
- Added TO.DO about Valgrind
- 2003/06
- Added TO.DO on several systems.
- 2003/10
- Added TO.DO on several systems.
- Added bib cite [Kep 03] ``How to Detect Self-Modifying Code During Instruction-Set Simulation''.
- 2003/11
- Added Sleipnir info.
- Started obfuscating e-mail addresses and updating contact info in Details About Who's Who.
- 2003/12
- Add various TO.DO references, fix some others.
- Write up Partial Emulation.
- Note that Crusoe is available (it's only been a couple years, now!).
- 2004/02
- Sulima -- SPARC V9 system simulator
- 2004/06
- Fix ``download Shade'' link.
Acknowledgements
This page was prepared by Pardo, based in large part on the SIGMETRICS '94 Shade paper, and thus with help from Bob Cmelik and the other people who helped with the Shade paper.
The magic script that splits a single large HTML source file into various-size pages, for ease of browsing, was written by Marinos "nino" Yannikos.
Many individual entries in the TO.DO section are contributions by readers too numerous to list. Many thanks, all!
Copyright (c) 1999 by Pardo. All rights reserved.
Please address comments and suggestions to pardo@xsim.com (http://www.xsim.com/index.html).