
Reassemble .py File From Bytecode

Problem Statement I have a file (no extension) with some nicely formatted python opcodes that I would like to reassemble into the original .py file (or as close as I can). Recreati

Solution 1:

There is some confusion about what uncompyle6 does. It starts with Python bytecode, or more accurately "wordcode" for Python 3.6 or later. It can also be given a Python compiled file, which contains bytecode; that is how it is most often used.

Judging from what you show above, what I believe you want to do is start with a text representation of bytecode, as produced by the version-specific disassembler that ships with Python (and only fully works for the version of Python that is running it).

Here is the reason you get that strange "Import Error" message above from uncompyle6. It looks at the beginning of the text file, which you have passed off as a Python compiled file. That file starts with the ASCII-encoded string "1", and uncompyle6 interprets it according to the format of a Python compiled file, where the beginning of the file holds an encoded Python version identifier, technically called a "magic number".
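To make the magic-number check concrete, here is a minimal sketch using only the standard library. It is not how uncompyle6 itself does the check; the helper name looks_like_pyc and the file names are made up for illustration, and it only recognizes the magic number of the interpreter running it.

```python
import importlib.util
import os
import py_compile
import tempfile


def looks_like_pyc(path):
    """Hypothetical helper: True if the file starts with the magic
    number of the interpreter that is running this script."""
    with open(path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER


workdir = tempfile.mkdtemp()

# A text disassembly listing, like the file in the question.
listing_path = os.path.join(workdir, "listing")
with open(listing_path, "w") as f:
    f.write("1           0 LOAD_CONST               0 (1)\n")

# A real compiled file produced from a tiny source file.
source_path = os.path.join(workdir, "test.py")
with open(source_path, "w") as f:
    f.write("a = 1\n")
pyc_path = py_compile.compile(source_path, os.path.join(workdir, "test.pyc"))

text_is_pyc = looks_like_pyc(listing_path)
real_is_pyc = looks_like_pyc(pyc_path)
print(text_is_pyc, real_is_pyc)
```

The text listing fails the check because its first four bytes are ordinary ASCII characters rather than a magic number.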

Never fear though, I have written a few more tools to get you closer to where you want to get to. Specifically, I wrote a Python cross-version assembler to match Python's built-in disassembler.

This is in my github project python-xasm.

Using that, you can produce real Python bytecode which can be run. And if the code you wrote really is like something Python produced, it can probably be decompiled back into high-level Python.

However, xasm currently needs a little more help than what you have above. Specifically, it won't guess from opcode names which Python version(s) they belong to. Matching opcode names with acceptable Python versions is harder than you might think. If you see LOAD_CONST, you also need to consider whether this instruction takes 2 bytes or 3. If 2, it is Python 3.6 or greater; otherwise it is Python < 3.6. And if that is not hard enough already, some versions of Python change the opcode value for a particular opcode name! So you might not be able to determine exactly which Python interpreter some assembly came from. But I am assuming you don't care, as long as whatever you come up with is consistent.
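As a small illustration of why the opcode name alone is not enough, the sketch below asks the running interpreter for the numeric value it assigns to LOAD_CONST and checks the 2-byte wordcode layout. It uses only the standard dis module and assumes it is run on Python 3.6 or later; the number printed can differ from one Python version to another.

```python
import dis
import sys

# Compile the same three-line program used in this answer.
code = compile("a = 1\nb = 2\nprint(a + b)\n", "<example>", "exec")

# Numeric opcode the running interpreter uses for the name LOAD_CONST.
load_const_value = dis.opmap["LOAD_CONST"]

# From Python 3.6 on, bytecode is made of 2-byte units ("wordcode"),
# so the raw bytecode string always has an even length.
is_wordcode = sys.version_info >= (3, 6)
even_length = len(code.co_code) % 2 == 0

print("Python", sys.version_info[:2], "LOAD_CONST =", load_const_value)
print("wordcode:", is_wordcode, "even length:", even_length)
```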

With that background, back to your question.

First, produce real bytecode. You could do it like this:

import py_compile
# Compile the source file /tmp/test.py into the bytecode file /tmp/test.pyc
py_compile.compile("/tmp/test.py", "/tmp/test.pyc", 'exec')
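One detail worth knowing about that call: the third positional argument of py_compile.compile is dfile, the source file name recorded inside the bytecode, not a compile mode. The sketch below reads the code object back out of the compiled file to show this. It writes to a temporary directory instead of /tmp, and the header sizes are an assumption that holds for CPython 3.3 through current releases (12 bytes up to 3.6, 16 bytes from 3.7).

```python
import marshal
import os
import py_compile
import sys
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "test.py")
with open(source_path, "w") as f:
    f.write("a = 1\nb = 2\nprint(a + b)\n")

# Same call shape as above: 'exec' lands in the dfile parameter.
pyc_path = py_compile.compile(
    source_path, os.path.join(workdir, "test.pyc"), "exec"
)

# Skip the .pyc header, then unmarshal the module's code object.
header_size = 16 if sys.version_info >= (3, 7) else 12
with open(pyc_path, "rb") as f:
    f.seek(header_size)
    code = marshal.load(f)

embedded_name = code.co_filename
print(embedded_name)
```

This is why the file name reported by tools that read this .pyc is "exec" rather than the path of the source file.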

Now, instead of using the built-in Python disassembler, use pydisasm, the cross-version disassembler I wrote that comes with xdis. Pass it the --asm option, which outputs the assembly in an xasm-friendly way:

$ pydisasm --asm /tmp/test.pyc
# pydisasm version 4.0.0-git
# Python bytecode 3.6 (3379)
# Disassembled from Python 3.6.5 (default, Aug 12 2018, 16:37:27)
# [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
# Timestamp in code: 1554492841 (2019-04-05 15:34:01)
# Source code size mod 2**32: 23 bytes
# Method Name:       <module>
# Filename:          exec
# Argument count:    0
# Kw-only arguments: 0
# Number of locals:  0
# Stack size:        3
# Flags:             0x00000040 (NOFREE)
# First Line:        1
# Constants:
#    0: 1
#    1: 2
#    2: None
# Names:
#    0: a
#    1: b
#    2: print
  1:
            LOAD_CONST           (1)
            STORE_NAME           (a)
  2:
            LOAD_CONST           (2)
            STORE_NAME           (b)
  3:
            LOAD_NAME            (print)
            LOAD_NAME            (a)
            LOAD_NAME            (b)
            BINARY_ADD
            CALL_FUNCTION        1
            POP_TOP
            LOAD_CONST           (None)
            RETURN_VALUE

Notice all of the additional information in the comments at the top of the file, including some really arcane stuff like "stack size" and "flags". This, along with most of the other header information, needs to be stored in a Python bytecode file.
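Those header fields correspond to attributes of a Python code object. As a rough sketch, the same kind of information can be read from a code object with the standard library alone; the dictionary keys below simply mirror the labels in the listing, and the stack size, flags and constants you get depend on the Python version you run this with.

```python
# Compile the same program, using "exec" as the recorded file name.
code = compile("a = 1\nb = 2\nprint(a + b)\n", "exec", "exec")

# Code-object attributes that line up with the commented header fields.
metadata = {
    "Method Name": code.co_name,
    "Filename": code.co_filename,
    "Argument count": code.co_argcount,
    "Kw-only arguments": code.co_kwonlyargcount,
    "Number of locals": code.co_nlocals,
    "Stack size": code.co_stacksize,
    "Flags": hex(code.co_flags),
    "First Line": code.co_firstlineno,
    "Constants": code.co_consts,
    "Names": code.co_names,
}

for label, value in metadata.items():
    print("# {}: {!r}".format(label, value))
```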

So save this to a file, assemble that file to bytecode, and then decompile it:

$ ./xasm/xasm_cli.py /tmp/test.pyasm
Wrote /tmp/test.pyc
$ uncompyle6 /tmp/test.pyc
# uncompyle6 version 3.2.6
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.6.5 (default, Aug 12 2018, 16:37:27)
# [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
# Embedded file name: exec
# Compiled at: 2019-04-05 15:34:01
# Size of source mod 2**32: 23 bytes
a = 1
b = 2
print(a + b)
# okay decompiling /tmp/test.pyc
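Independently of decompiling, it can be worth checking that a compiled file actually runs. The sketch below is one way to do that with the standard library. Since it cannot assume xasm is installed, it uses py_compile to create a stand-in .pyc in a temporary directory; the module name "test" is arbitrary, and the file must match the running interpreter's bytecode version.

```python
import contextlib
import io
import os
import py_compile
import tempfile
from importlib.machinery import SourcelessFileLoader

# Stand-in for an assembled file: compile the same three-line program.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "test.py")
with open(source_path, "w") as f:
    f.write("a = 1\nb = 2\nprint(a + b)\n")
pyc_path = py_compile.compile(source_path, os.path.join(workdir, "test.pyc"))

# Load the code object straight from the .pyc; no source file is consulted.
loader = SourcelessFileLoader("test", pyc_path)
code = loader.get_code("test")

# Run it and capture what it prints.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(code, {"__name__": "__main__"})
output = captured.getvalue()
print(output, end="")
```

Passing the .pyc path directly to a python executable of the matching version should have the same effect.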

I gave a lightning talk at Pycon2018 in Medellín, Colombia related to this. Sorry you missed it, but you can find a video of it here: http://rocky.github.io/pycon2018-light.co

It shows how to:

  • produce a Python compiled file from an ASCII-encoded Python source text,
  • modify it to remove tail recursion,
  • write that back out to a Python compiled file, and then
  • run the code.

Of course, you can't decompile that, because there is no Python source that closely mimics it: the bytecode was modified by hand.

Lastly it seems like you are also interested in how the bytecode and the source code are related. So I'll mention that uncompyle6 has options --tree and the even more verbose --grammar which will show the steps taken to reconstruct the Python from the Python bytecode.
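For a quick look at how source lines map to instructions without any extra tools, the standard dis module can group opcode names by the line they come from. This is only a sketch of that mapping, not what --tree or --grammar print, and the opcode names shown depend on the Python version running it.

```python
import dis

source = "a = 1\nb = 2\nprint(a + b)\n"
code = compile(source, "<example>", "exec")

# Map bytecode offsets to the source line that starts there.
line_starts = dict(dis.findlinestarts(code))

# Walk the instructions and collect opcode names per source line.
by_line = {}
current_line = None
for instruction in dis.get_instructions(code):
    if instruction.offset in line_starts:
        current_line = line_starts[instruction.offset]
    by_line.setdefault(current_line, []).append(instruction.opname)

for number, text in enumerate(source.splitlines(), start=1):
    print("{}: {}".format(number, text))
    for opname in by_line.get(number, []):
        print("        " + opname)
```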
