- 1. Implement type confusion through stack out-of-bounds read and write
- 2. Arbitrary address reading and writing
- 3. Locate the offset of the ByteArray related member
- 4. 1st shellcode
- 5. Bypass ROP
- 6. Bypass CFG
- 7. Exploitation under 64-bit
jinquan of Qihoo 360 Core Security @jq0904
On June 1, 2018, the 360 Advanced Threat Response Team discovered an in-the-wild Flash 0-day vulnerability. Last week, Unit 42 published more details about the operation. Subsequently, Kaspersky tweeted that the APT group behind the attack was FruityArmor.
In this blog we will disclose further details of this exploit.
The original sample can only be triggered through interaction with the cloud, which makes analysis inconvenient. We therefore spent some time studying the sample and managed to reverse the complete exploit code. The code fragments in the following analysis are all reversed code. The original exploit supports all platforms: xp/win7/win8/win8.1/win10, x86/x64. The analysis environment below is Windows 7 sp1 x86 + Flash 220.127.116.11. The exploitation process on 64-bit is briefly covered in the last section.
The original sample first defines two very similar classes, class_5 and class_7. The first member variable of class_7 is a class_5 object pointer:
Afterwards, the replace function is called to try to trigger the vulnerability. You can see that replace defines a class_5 object and a class_7 object, and passes the two objects alternately as arguments to trigger_vul().
As the figure below shows, trigger_vul takes a total of 256 parameters: 128 class_5 objects and 128 class_7 objects, appearing alternately. This prepares for the type confusion in a later step.
First, a class_6 object is created inside trigger_vul to trigger the vulnerability:
Calling li(123456) in class_6 triggers a RangeError. Because the bytecode has been modified, the exception reaches the following catch logic (pseudo code). You can see that two stack variables (local_448 and local_449) are swapped in the catch block. The attacker precisely controls the jit stack so that the two swapped stack variables are a cls5 object pointer and a cls7 object pointer that were pushed earlier. At this point, the type confusion is achieved.
After the pointers are swapped, the modified stack data (the 256 parameters) is assigned back to a cls5_vec object and a cls7_vec object, respectively. Finally, cls5_vec is returned. It now contains one cls7 object; all the other elements are cls5.
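The effect of the swap on the two vectors can be illustrated with a small toy model. This is only a sketch: the pointer values are made up, and the swapped slot pair (100 and 101) is an arbitrary choice, since the text does not say which pair the real exploit hits.

```python
# Toy model of the 256 argument slots; the addresses are made-up values.
CLS5_PTR = 0x0A001000  # hypothetical address of the class_5 object
CLS7_PTR = 0x0A002000  # hypothetical address of the class_7 object


def build_argument_slots():
    """Return 256 slots alternating cls5, cls7, cls5, cls7, ..."""
    return [CLS5_PTR if i % 2 == 0 else CLS7_PTR for i in range(256)]


def split_back(slots):
    """Even slots go back into cls5_vec, odd slots into cls7_vec."""
    return slots[0::2], slots[1::2]


slots = build_argument_slots()
# Stand-in for the catch block: swap two neighbouring slots.
# Which pair is swapped in the real exploit is not modelled here.
slots[100], slots[101] = slots[101], slots[100]

cls5_vec, cls7_vec = split_back(slots)
confused_indices = [i for i, ptr in enumerate(cls5_vec) if ptr != CLS5_PTR]
print(confused_indices)
```

In this model exactly one element of cls5_vec ends up holding the cls7 pointer, and the matching element of cls7_vec holds the cls5 pointer.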
The above process in windbg is as follows:
According to the shaded areas, you can see that a cls5 object pointer and a cls7 pointer on the stack are interchanged after the vulnerability is triggered:
After returning to trigger_vul, the code traverses the members of cls5_vec and finds the element whose m_p1 is not 0x11111111. This element is actually the confused class_7 object. The problematic "cls5" object and the cls7 object are then saved to static members.
After trigger_vul returns, the exploit determines whether the current environment is x86 or x64 by checking whether _cls5.m_p6 is 0. It then initializes a class_8 object with the two confused objects (cls5 and cls7), which is used to implement arbitrary address reading and writing.
The class_8 class is a utility class constructed by the attacker to implement arbitrary address reading and writing. Based on it, a series of read and write functions for x86/x64 are implemented. This section focuses on the implementation of readDWORD32 and writeDWORD32.
Since the first member of cls7 (var_114) is a class_5 object, once cls5 is confused with cls7, modifying cls5.m_p1 actually modifies cls7.var_114. Suppose we want to read a 32-bit address addr. We only need to assign addr-0x10 to cls5.m_p1, which is equivalent to setting cls7.var_114 to addr-0x10. Reading cls7.var_114.m_p1 then treats the value stored in cls7.var_114 as a class_5 object pointer and reads its first member variable. In other words, addr-0x10 is treated as a class_5 object, and the four bytes at addr-0x10+0x10, that is, at addr, are read.
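The scan can be sketched as follows. This is a toy model under stated assumptions: memory is a plain dictionary, all addresses are invented, and the 0x10 first-member offset is taken from the object layout discussed in the next section.

```python
M_P1_OFFSET = 0x10   # first member sits right after the 16-byte object header
MARKER = 0x11111111  # value every genuine class_5 object keeps in m_p1

CLS5_PTR = 0x0A001000        # hypothetical object addresses
CLS7_PTR = 0x0A002000
INNER_CLS5_PTR = 0x0A003000  # hypothetical target of cls7.var_114

# Sparse toy memory: address -> 32-bit value.
memory = {
    CLS5_PTR + M_P1_OFFSET: MARKER,          # cls5.m_p1
    CLS7_PTR + M_P1_OFFSET: INNER_CLS5_PTR,  # cls7.var_114 (same offset)
}


def read_m_p1(obj_ptr):
    """Read the first member of whatever object obj_ptr points at."""
    return memory[obj_ptr + M_P1_OFFSET]


def find_confused(cls5_vec):
    """Return the index of the element whose m_p1 is not the marker."""
    for index, obj_ptr in enumerate(cls5_vec):
        if read_m_p1(obj_ptr) != MARKER:
            return index
    return None


cls5_vec = [CLS5_PTR] * 128
cls5_vec[50] = CLS7_PTR  # the one slot that was swapped
print(find_confused(cls5_vec))
```

The check works because reading m_p1 through the confused reference actually returns cls7.var_114, which is a pointer and therefore not the marker value.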
The following figure explains why addr-0x10 is required on 32-bit. Because of inheritance, the first 16 bytes of every as3 object structure are fixed ("pvtbl" is a C++ virtual table pointer; the "composite", "vtable" and "delegate" members can be understood by referring to the ScriptObject implementation in the avmplus source), so the first member variable of a class object is located at the object's base address +0x10 (by analogy, the 64-bit version uses addr-0x20):
Figure: From the memory point of view, after the confusion, operations on cls5 actually affect the corresponding memory in cls7, so the value at any addr can be read by accessing cls7.var_114.m_p1.
The principle of writeDWORD32 is similar to that of readDWORD32 and is not repeated here.
In class_8, the attacker implements a series of helper functions based on the above two functions, as follows:
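The addr-0x10 arithmetic can be checked against a flat toy memory. This is a minimal sketch, not the attacker's code: ToyMemory, the base address and the object placement are all hypothetical, and only the 0x10 header size comes from the text.

```python
import struct

HEADER_SIZE = 0x10  # pvtbl, composite, vtable, delegate: 4 bytes each on x86


class ToyMemory:
    """Flat little-endian memory starting at a made-up base address."""

    def __init__(self, base, size):
        self.base = base
        self.data = bytearray(size)

    def read32(self, addr):
        return struct.unpack_from("<I", self.data, addr - self.base)[0]

    def write32(self, addr, value):
        struct.pack_into("<I", self.data, addr - self.base, value)


def read_dword32(mem, cls7_addr, addr):
    # cls5.m_p1 = addr - 0x10; the store lands on cls7.var_114 because the
    # confused reference makes both names refer to the same 4 bytes.
    mem.write32(cls7_addr + HEADER_SIZE, addr - HEADER_SIZE)
    # cls7.var_114.m_p1: follow var_114 as if it were a class_5 object and
    # read the first member at +0x10, which is exactly addr.
    fake_cls5 = mem.read32(cls7_addr + HEADER_SIZE)
    return mem.read32(fake_cls5 + HEADER_SIZE)


mem = ToyMemory(base=0x10000, size=0x100)
mem.write32(0x10080, 0xDEADBEEF)
print(hex(read_dword32(mem, cls7_addr=0x10000, addr=0x10080)))
```

Subtracting the header size before the store and adding it back during the member access cancel out, which is why the read lands on addr itself.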
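As a sketch of how wider helpers can be layered on a 32-bit read primitive, the snippet below builds a 64-bit read and a byte-range read from a single read function. The helper names read_qword and read_bytes are hypothetical and are not taken from the sample.

```python
def make_readers(read_dword32):
    """Build wider reads from a 32-bit read primitive (addr -> int)."""

    def read_qword(addr):
        low = read_dword32(addr)
        high = read_dword32(addr + 4)
        return (high << 32) | low

    def read_bytes(addr, length):
        out = bytearray()
        for offset in range(0, length, 4):
            out += read_dword32(addr + offset).to_bytes(4, "little")
        return bytes(out[:length])

    return read_qword, read_bytes


# Toy backing store standing in for process memory.
fake_memory = {0x1000: 0x44434241, 0x1004: 0x48474645}
read_qword, read_bytes = make_readers(lambda addr: fake_memory[addr])
print(hex(read_qword(0x1000)), read_bytes(0x1000, 6))
```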
Although the attacker did not use ByteArray to implement arbitrary address reading and writing, they still need to know the memory offsets of the ByteArray-related members in the current Flash version in order to use ByteArray conveniently later. To this end, the attacker defines a class_15 class, which uses the arbitrary address read/write primitive to search for the offsets of particular members and saves them for later use.
Part of the setOffset32 logic:
The following members of class_15 are used to save dynamically searched memory offsets.
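One generic way to discover a member offset with a read primitive is to plant a known value in the object and scan for it. The sketch below shows that idea only; it is an assumption about how such a search could work, not a reconstruction of setOffset32, and the object address, planted value and 0x18 offset are invented.

```python
def find_member_offset(read_dword32, obj_addr, expected, limit=0x100):
    """Scan an object dword by dword for a value we already know."""
    for offset in range(0, limit, 4):
        if read_dword32(obj_addr + offset) == expected:
            return offset
    return None


# Toy object at a made-up address whose length field happens to sit at +0x18.
OBJ = 0x5000
KNOWN_LENGTH = 0x00013370
toy = {OBJ + off: 0 for off in range(0, 0x100, 4)}
toy[OBJ + 0x18] = KNOWN_LENGTH

length_offset = find_member_offset(lambda a: toy[a], OBJ, KNOWN_LENGTH)
print(hex(length_offset))
```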
Once the relevant offsets are found, the attacker immediately begins constructing and executing the shellcode. The 1st-stage shellcode is built in, but it has 7 DWORD32 fields that must be filled in dynamically. The 2nd-stage shellcode is passed in dynamically via a ByteArray, which is the _bArr member in the setOffset function above. Since we did not obtain the attacker's 2nd-stage shellcode, the 2nd-stage shellcode we used comes from the leaked HackingTeam code, which launches the calculator.
The attacker first stores a 1st-stage shellcode template in a ByteArray (ba). The disassembly is as follows. The purple areas are the fields that need to be filled in dynamically. The meaning of these fields is as follows:
Then initialize a new ByteArray object (ba2) and initialize the first 16 bytes of its array area as follows:
In order to construct the ROP, the attacker specifically defines a helper class, class_25, which implements the following functions:
The attacker first finds the address of GetDC in User32.dll through the IAT of the flash module, and then finds the address of RtlUnwind in ntdll.dll through the IAT of User32.dll.
Then the function offsets of NtProtectVirtualMemory and NtPrivilegedServiceAuditAlarm are found in the EAT AddressOfFunctions array of ntdll.dll, and the corresponding function addresses are calculated.
The attacker's idea here is to take the SSDT index of NtProtectVirtualMemory and the address of NtPrivilegedServiceAuditAlarm+0x5 for later use.
Calling NtPrivilegedServiceAuditAlarm+0x5 with the SSDT index of NtProtectVirtualMemory bypasses the ROP detection. Since the ROP detection does not hook NtPrivilegedServiceAuditAlarm as a key function, the call never enters the ROP detection logic, and all ROP checks are bypassed.
Then the following ROP gadgets are searched for and saved for later use.
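One way to read the +0x5 is that 32-bit ntdll system call stubs normally begin with a 5-byte `mov eax, imm32` that loads the service index, so entering a stub 5 bytes in skips that load and leaves the index to whatever the caller put in eax. That reading is an assumption based on the usual stub layout, not something the analysis states. The sketch below only decodes such a prologue from made-up bytes.

```python
import struct

MOV_EAX_IMM32 = 0xB8   # opcode of "mov eax, imm32"
MOV_EAX_LENGTH = 5     # 1 opcode byte + 4 immediate bytes


def service_index(stub):
    """Return the imm32 loaded into eax by the first instruction of a stub."""
    if stub[0] != MOV_EAX_IMM32:
        raise ValueError("stub does not start with mov eax, imm32")
    return struct.unpack_from("<I", stub, 1)[0]


def entry_after_index_load(stub_addr):
    """Address of the first instruction after the mov eax, imm32."""
    return stub_addr + MOV_EAX_LENGTH


# Illustrative bytes only: the index 0xAB and the address are made up.
fake_stub = bytes([0xB8, 0xAB, 0x00, 0x00, 0x00])
print(hex(service_index(fake_stub)), hex(entry_after_index_load(0x77001000)))
```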
The above information is then returned to the upper caller:
Afterwards, some of these values are filled into the first 5 patterns of the 1st-stage shellcode.
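Filling a placeholder pattern in a byte template amounts to a search followed by a 4-byte overwrite. The snippet below is a minimal sketch of that step; the placeholder patterns, the fill values and the filler bytes are all invented, since the real field values are not given in the text.

```python
import struct


def fill_pattern(template, pattern, value):
    """Replace the first little-endian occurrence of pattern with value."""
    offset = template.find(struct.pack("<I", pattern))
    if offset < 0:
        raise ValueError(f"pattern {pattern:#x} not found")
    struct.pack_into("<I", template, offset, value & 0xFFFFFFFF)
    return offset


# Opaque filler bytes with two made-up placeholder patterns embedded.
template = bytearray(
    b"\x00\x00" + struct.pack("<I", 0x11110001)
    + b"\x00" + struct.pack("<I", 0x11110002)
)
first = fill_pattern(template, 0x11110001, 0x77001005)
second = fill_pattern(template, 0x11110002, 0x000000AB)
print(first, second, template.hex())
```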
This sample bypasses CFG by overwriting the jit stack on 32-bit. The attacker first defines two similar classes, class_26 and class_27. Both define a method called method_87. The difference is that class_26.method_87 accepts only two arguments, while class_27.method_87 accepts 256 arguments and saves all passed arguments back to the caller.
The attacker first initializes a class_26 object cls26 and a class_27 object cls27, and then replaces the jit address of cls26.method_87 with the jit address of cls27.method_87 through arbitrary address reading and writing.
Then cls26.method_87 is called a second time, and this time cls27.method_87 is what actually runs. Since the call site of cls26.method_87 passes only 2 arguments while cls27.method_87 reads 256, a large amount of data on the jit stack is leaked. The attacker then uses the leaked data to find the address of a jit parameter stack, and calls cls27.method_87 a second time to overwrite a return address on the jit stack, taking control of eip when the corresponding function returns.
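The leak caused by the argument-count mismatch can be modelled with a list standing in for the jit stack. This is a toy sketch: the slot index, the stale stack contents and the function names are hypothetical.

```python
ARG_BASE = 10        # made-up index of the first argument slot
DECLARED_ARGS = 256  # parameter count of class_27.method_87

# Pretend stale stack contents left behind by earlier jit frames.
stack = [0xA0000000 + 4 * i for i in range(300)]


def method_87_of_class_27():
    """Callee compiled for 256 parameters: reads 256 slots unconditionally."""
    return stack[ARG_BASE:ARG_BASE + DECLARED_ARGS]


def call_like_class_26(arg0, arg1):
    """Call site compiled for class_26.method_87: writes only 2 slots."""
    stack[ARG_BASE] = arg0
    stack[ARG_BASE + 1] = arg1
    return method_87_of_class_27()  # the jit pointer has been redirected


leaked = call_like_class_26(1, 2)
print(len(leaked), [hex(v) for v in leaked[:4]])
```

Only the first two returned values were supplied by the caller; the remaining 254 are whatever happened to be on the stack.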
Observe the above process in windbg:
According to this article, the +0x08 of the cls26 object is a vTable object pointer, and the +0x48 of the vTable object is a MethodEnv object pointer. The MethodEnv object contains its own _implGPR function pointer and a MethodInfo object pointer. The MethodInfo object also contains a pointer to the _implGPR function. The addressing relationships among these structures in memory are as follows:
So the replace_jit_addr function essentially replaces the jit address of cls26.method_87 with the jit address of cls27.method_87. But the jit address of cls26.method_87 is stored in several places (as shown in the figure above, MethodEnv._implGPR and MethodEnv.MethodInfo._implGPR both store the address of cls26.method_87), so which one should be overwritten?
The answer lies in the jit assembly code of the class_21$/executeShellcodeWithCfg32 function, part of which is shown below. The two lines of code enclosed in the red box clearly show how the function pointer is resolved when cls26.method_87 is called for the second time. Obviously, MethodEnv._implGPR is used here.
As for the address of cls27.method_87, it can be read from any of the places that store its jit address. (You can also use the method of reading the jit function pointer from the HackingTeam code, as follows.) So there are three ways to read it in total: two from the Exp code, plus one from the HackingTeam code. But there is only one place to write. Through the above steps, the jit address is successfully replaced.
In a 2016 article summarizing Flash exploitation, the author introduced a method of hijacking eip by overwriting MethodInfo._implGPR. The two methods are very similar, but not identical.
On the second call to cls27.method_87, the parameters passed by the attacker are as follows, where retn is gadget03 (addr_of_ret) found above. The remaining important parameters are explained in the comments. The first 12 bytes of ba2_array are the first-stage shellcode address (ba_array), 0x1000 and 0, which correspond exactly to the first three parameters required by NtProtectVirtualMemory.
Let's take a closer look at the logic inside cls27.method_87. If the first parameter is 0x85868788, the method calls itself recursively 20 times. This lays out the jit stack so that eip can be overwritten conveniently:
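The pointer chain and the choice of slot can be sketched over a dictionary-backed toy memory. Only the +0x08 and +0x48 offsets come from the text; the positions of _implGPR and of the MethodInfo pointer inside their structures are placeholders chosen for illustration, as are all addresses.

```python
OBJ_TO_VTABLE = 0x08   # from the text: object +0x08 -> vTable
VTABLE_TO_ENV = 0x48   # from the text: vTable +0x48 -> MethodEnv
ENV_IMPL_GPR = 0x04    # assumption: position of MethodEnv._implGPR
ENV_TO_INFO = 0x08     # assumption: position of the MethodInfo pointer
INFO_IMPL_GPR = 0x04   # assumption: position of MethodInfo._implGPR


def impl_gpr_slots(read32, obj_addr):
    """Return the addresses of both slots that hold the method's jit pointer."""
    vtable = read32(obj_addr + OBJ_TO_VTABLE)
    env = read32(vtable + VTABLE_TO_ENV)
    info = read32(env + ENV_TO_INFO)
    return env + ENV_IMPL_GPR, info + INFO_IMPL_GPR


def build_object(memory, obj, vtable, env, info, jit):
    """Lay out one toy object, its vTable, MethodEnv and MethodInfo."""
    memory[obj + OBJ_TO_VTABLE] = vtable
    memory[vtable + VTABLE_TO_ENV] = env
    memory[env + ENV_IMPL_GPR] = jit
    memory[env + ENV_TO_INFO] = info
    memory[info + INFO_IMPL_GPR] = jit


memory = {}
build_object(memory, 0x1000, 0x2000, 0x3000, 0x4000, jit=0xAAAA0000)  # cls26
build_object(memory, 0x5000, 0x6000, 0x7000, 0x8000, jit=0xBBBB0000)  # cls27

env_slot_26, info_slot_26 = impl_gpr_slots(memory.__getitem__, 0x1000)
env_slot_27, _ = impl_gpr_slots(memory.__getitem__, 0x5000)
memory[env_slot_26] = memory[env_slot_27]  # overwrite MethodEnv._implGPR only
print(hex(memory[env_slot_26]), hex(memory[info_slot_26]))
```

In the model, the MethodInfo copy keeps the old pointer, which matches the observation that only the MethodEnv slot matters for this call path.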
In the last call, cls27.method_87 will use the previously leaked jit stack address to find the stack address pRetAddr where the eip will be overwritten and save the original return address.
Subsequently, to avoid a crash after the hijack, the attacker passes in the original return address and modifies the 1st-stage shellcode a second time, filling the last two patterns with the correct values so that execution returns normally after the shellcode finishes:
By overwriting a return address on the stack to hijack the control flow, the CFG check is avoided, which bypasses CFG.
Through debugging, we found that the overwritten return address belongs to one of the 20 recursive calls of cls27.method_87 on the jit stack.
Finally, when one of the recursive calls returns, eip is successfully hijacked to the first-stage ROP. The subsequent process as observed in windbg is as follows:
After the 2nd-stage shellcode is executed, execution continues returning from the recursive calls of cls27.method_87 and then resumes the normal logic of Flash. This process causes no crashes or hangs, and the entire exploitation is very stable.
The original exploit code also supports 64-bit environments. The vulnerability trigger code on 64-bit is no different from that on 32-bit. They differ only in the Bypass CFG section. Two CFG bypass methods appear in the original exploit code, which are described below.
If the following gadget can be found in ntdll.dll in the current 64-bit environment, branch 1 is taken. The purpose of this gadget is clear from the assembly code in the comment: it pops the four values at the top of the stack into the registers that carry the first four arguments under the x64 calling convention, and then returns.
Then the address of kernel32!VirtualProtect is located and passed, together with the incoming shellcode, to the function shown in the figure below. The curruptJitStack function overwrites a return address on the jit stack (this process is very similar to the 32-bit one), so that when the jit function returns, the ROP marks the shellcode memory as executable. replaceJitApply64 is then called to execute the shellcode. The replaceJitApply64 function uses the method leaked from HackingTeam to bypass CFG, which is to overwrite the virtual table address of the FunctionObject.Apply() method. replaceJitApply64 is described under branch 2.
If the gadget required by branch 1 is not found in the ntdll.dll of the current process, branch 2 is taken, which overwrites the virtual table address of the FunctionObject.Apply() method.
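The effect of such a gadget can be modelled with a list and a dictionary. The Windows x64 calling convention passes the first four integer arguments in rcx, rdx, r8 and r9; the order in which the real gadget pops them is not shown in the text, so the order below is an assumption, and all stack values are made up.

```python
ARGUMENT_REGISTERS = ("rcx", "rdx", "r8", "r9")  # Windows x64 calling convention


def run_pop_gadget(stack):
    """Pop four values into the argument registers, then 'ret'."""
    registers = {}
    for name in ARGUMENT_REGISTERS:  # pop order is an assumption
        registers[name] = stack.pop(0)
    return_target = stack.pop(0)     # ret consumes the next stack value
    return registers, return_target


# Made-up values laid out the way a VirtualProtect call would need them:
# lpAddress, dwSize, flNewProtect, lpflOldProtect, then the address to return to.
PAGE_EXECUTE_READWRITE = 0x40
stack = [0x20000000, 0x1000, PAGE_EXECUTE_READWRITE, 0x20001000, 0x7FF800001234]
registers, target = run_pop_gadget(stack)
print(registers, hex(target))
```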
Let's take a closer look at replaceJitApply64. If you are familiar with the previous exploit code from HackingTeam, it is easy to understand the following code:
Branch 2 calls the replaceJitApply64 function twice. The first call invokes kernel32!VirtualProtect to make the shellcode executable. The function first defines a ByteArray object ba and places the shellcode at the head of ba.array.
It then locates the virtual table of the ExecMgr object and copies the 8 bytes preceding the virtual table, plus the first 0xE4/8 virtual function addresses of the table, into ba.array starting at offset len(shellcode) (the forged virtual table).
Next, the 8 bytes at +0x30 in the forged ExecMgr virtual table, which hold the virtual function address corresponding to the apply method, are overwritten. The virtual table pointer at the head of the ExecMgr object is then overwritten, and the relevant registers and object offsets are set to build the four parameters required by VirtualProtect. Afterwards, the apply method is called, which invokes VirtualProtect; once the call ends, all overwritten values are restored to their previous values to avoid a crash. A detailed description of this part can be found in this blog. The comments in the figure below are also very detailed:
After the call, control returns to the previous function, which then calls replaceJitApply64 again. This time the virtual function address corresponding to the apply method is replaced with the address of shellcode+0x8, and the shellcode is executed. After the shellcode runs and control returns to the Flash code, the whole process causes no crash.
CVE-2018-5002 is a high-risk vulnerability in the avm2 interpreter. The vulnerability is of high quality and has a wide range of potential targets. The compilation log of the original flash exploit shows that the framework was compiled as early as February 7, 2018. Our analysis shows that the exploit code is very versatile and stable. In the hands of an attacker, it may cause great damage.
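The forge, call and restore sequence described for replaceJitApply64 can be modelled on a flat toy memory. This is a sketch under stated assumptions: the +0x30 slot is taken to be relative to the start of the forged table (after the 8 copied prefix bytes), 0xE4/8 is treated as integer division, the "virtual call" only returns the address it would jump to, and all addresses are invented.

```python
import struct

PTR = 8
APPLY_SLOT = 0x30          # from the text: offset of the apply entry
ENTRY_COUNT = 0xE4 // 8    # from the text; integer division is an assumption


class ToyMemory:
    def __init__(self, base, size):
        self.base = base
        self.data = bytearray(size)

    def read64(self, addr):
        return struct.unpack_from("<Q", self.data, addr - self.base)[0]

    def write64(self, addr, value):
        struct.pack_into("<Q", self.data, addr - self.base, value)

    def copy(self, dst, src, length):
        chunk = self.data[src - self.base:src - self.base + length]
        self.data[dst - self.base:dst - self.base + length] = chunk


def dispatch_apply(mem, obj_addr):
    """Toy virtual call: return the address the apply slot points at."""
    return mem.read64(mem.read64(obj_addr) + APPLY_SLOT)


def call_through_forged_vtable(mem, obj_addr, scratch_addr, target):
    real_vtable = mem.read64(obj_addr)
    # Copy the 8 bytes before the table plus the first ENTRY_COUNT entries.
    mem.copy(scratch_addr, real_vtable - PTR, PTR + ENTRY_COUNT * PTR)
    forged_vtable = scratch_addr + PTR
    mem.write64(forged_vtable + APPLY_SLOT, target)
    mem.write64(obj_addr, forged_vtable)    # swap the vtable pointer
    try:
        return dispatch_apply(mem, obj_addr)
    finally:
        mem.write64(obj_addr, real_vtable)  # restore to avoid a crash


mem = ToyMemory(base=0x1000, size=0x1000)
OBJ, VTABLE, SCRATCH = 0x1000, 0x1100, 0x1400
mem.write64(OBJ, VTABLE)
for i in range(ENTRY_COUNT):
    mem.write64(VTABLE + i * PTR, 0x7000 + i)
called = call_through_forged_vtable(mem, OBJ, SCRATCH, target=0xDEAD0000)
print(hex(called), hex(dispatch_apply(mem, OBJ)))
```

Working on a copy of the table rather than the original keeps the real virtual table untouched, so restoring a single pointer is enough to return the object to its previous state.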