2020.2.7

Superset Disassembly: Statically Rewriting x86 Binaries Without Heuristics

这篇文章提到了很多历史上的用于转换二进制而文件到某个中间表示的项目,且列出了参考文献,值得参考!(SASI、PIITTSFIELD、CFI、XFI、NACL)

Thus, nearly all static Intel CISC binary rewriters in the literature to date rely upon various strong assumptions about target binaries in order to successfully transform them. While each is suitable for particular applications, they each lack generality. End users cannot be confident of the correctness of the rewritten code, since many of the algorithms’ underlying assumptions can be violated in real-world binaries.

两个亟待解决的问题和本文给出的解决方案:

  1. 如何能够正确地反汇编所有代码(CISC变长,且和数据混合在一起,有难度);

    所谓的superset disassembly即是反汇编所有偏移处的代码(简单分析可知:指令数量<=二进制文件大小/1B)。

    🤔这样做出来的静态翻译效率怎样比动态翻译效率高?

  2. 如何保证翻译出来的二进制与原二进制之间有相同的语义;

    建立原二进制文件到翻译后二进制文件的地址映射表,由此可以解决间接跳转的问题。

    🤔映射表岂不是很大?和二进制文件一个量级?