Leveraging Java Bytecode for Fun & Analysis

Summary

An approach similar to modifying assembly code to direct control flow can be used to de-obfuscate and reverse-engineer Java malware or any compiled Java classes for that matter. In this post we will look at one such instance where this technique proved useful.

TL;DR – Skip to Bytecode

Background

During the second week of December 2021, news of the Log4Shell vulnerability (CVE-2021-44228) wreaked havoc in every sector that uses Log4j, a widely used Java logging framework, either directly or via third-party modules. We have all heard about and dealt with the fallout ad nauseam, so I will refrain from discussing the vulnerability itself. Instead, we will look at the analysis of a sample from the Khonsari malware family.

We've updated the vx-underground Malware Sample collection with more malware using the LOG4J exploit.

We now have 70 samples utilizing the exploit – a small number compared to what is probably present in the wild by now.

Check it out here: https://t.co/XOKsN4zXSx pic.twitter.com/HiYfNmFha1
— vx-underground (@vxunderground) December 14, 2021

vx-ug has been collecting samples that leverage the Log4Shell vulnerability, and this sample (SHA-256: 86fc70d24f79a34c46ef66112ef4756639fcad2f2d7288e0eeb0448ffab90428) was obtained from that collection, where it is filed under Orcus RAT. I attribute this sample to Khonsari, even though it is filed under Orcus RAT, because of the package name of its classes, which we will see during the analysis. Additionally, as per this article by Bitdefender, the final payload injected into conhost.exe is Orcus RAT.

Java malware is seldom complicated. Most samples can be reverse-engineered by analysing the decompiled classes, with dynamic analysis used in tandem to enrich the findings from static analysis.

Initial Analysis

After busting open the JAR sample, we can see three classes under the khonsari package. Notice also that it bundles the JNA library, presumably for native calls. The manifest file tells us that the entry point of this JAR is khonsari.A.
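The same triage can be done programmatically without running anything from the archive. The sketch below is only an illustration using the JDK's java.util.jar API; the class name JarTriage and the default file name sample.jar are made up for this example and are not part of the original analysis.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reads JAR metadata only, never loads or executes its classes.
class JarTriage {
    /** Returns the Main-Class attribute of the manifest, or null if there is none. */
    static String mainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Returns the names of all .class entries, e.g. "khonsari/A.class". */
    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            jar.stream()
               .map(JarEntry::getName)
               .filter(name -> name.endsWith(".class"))
               .forEach(names::add);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "sample.jar"; // assumed file name
        System.out.println("Main-Class: " + mainClass(path));
        classEntries(path).forEach(System.out::println);
    }
}
```

Entries outside the khonsari package, such as the bundled JNA classes, show up in the same listing, which makes third-party code easy to spot.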

Lossy Decompilation

We can use any decompiler to look at the classes. My usual go-to is jd-gui. However, decompiling Java bytecode back to Java source is not lossless, and jd-gui failed on some code blocks, especially those involved in decoding obfuscated strings.

Even with this minor setback we could figure out the tasks that this sample is carrying out just by doing some static analysis.

Method A.R(Object[] paramArrayOfObject) is being used for HTTP communication.

//   34: invokestatic a : (III)Ljava/lang/String;
//   37: invokespecial <init> : (Ljava/lang/String;)V
//   40: invokevirtual openConnection : ()Ljava/net/URLConnection;
//   43: checkcast java/net/HttpURLConnection

The same method is carrying out some sort of decryption.

//   321: invokestatic a : (III)Ljava/lang/String;
//   324: invokestatic getInstance : (Ljava/lang/String;)Ljavax/crypto/Cipher;
//   327: astore #13
//   329: aload #13
//   331: iconst_2
//   332: new javax/crypto/spec/SecretKeySpec
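The iconst_2 pushed right before the SecretKeySpec is what points to decryption: Cipher.DECRYPT_MODE has the constant value 2. The sketch below mirrors the shape of that bytecode in source form. The transformation "AES/ECB/PKCS5Padding" and the key are assumptions made purely for illustration, since the sample's real algorithm name is hidden behind the obfuscated string and its key is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of Cipher.getInstance(...) followed by init(2, new SecretKeySpec(...)).
class DecryptModeSketch {
    /** Runs a cipher in the given mode; mode 2 is Cipher.DECRYPT_MODE. */
    static byte[] run(int mode, String transformation, byte[] key, byte[] input)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(transformation);
        String keyAlgorithm = transformation.split("/")[0]; // "AES" for "AES/ECB/PKCS5Padding"
        cipher.init(mode, new SecretKeySpec(key, keyAlgorithm));
        return cipher.doFinal(input);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String transformation = "AES/ECB/PKCS5Padding"; // assumed, not taken from the sample
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII); // made-up 16-byte key
        byte[] ciphertext = run(Cipher.ENCRYPT_MODE, transformation, key,
                "payload".getBytes(StandardCharsets.UTF_8));
        // The literal 2 here corresponds to the iconst_2 seen in the listing.
        byte[] plaintext = run(2, transformation, key, ciphertext);
        System.out.println(Cipher.DECRYPT_MODE == 2);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
```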

Method a() in each of the classes is used for string deobfuscation, since we see it being called frequently at points where string values are assigned.

invokestatic a : (III)Ljava/lang/String;
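For readers less used to JVM descriptors, (III)Ljava/lang/String; means a method that takes three int arguments and returns a java.lang.String. As a quick sanity check, the JDK can parse the descriptor for us. The class name DescriptorSketch is made up for this example.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper that decodes a JVM method descriptor with the JDK's own parser.
class DescriptorSketch {
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        MethodType type = parse("(III)Ljava/lang/String;");
        System.out.println("parameters: " + type.parameterList());
        System.out.println("returns:    " + type.returnType());
    }
}
```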

Judging by the Windows API calls, P.k() performs some code injection into another process.

Despite these findings, simply executing the JAR does not produce any of the activity described above.

More Trials, More Errors

I considered Bytecode Viewer as well, since it offers multiple decompilers to choose from. Unfortunately, none of them produced valid source code. It was interesting to see static initialization blocks containing return and break, which are not allowed, but that is to be expected since the compilation <-> decompilation round trip is not lossless.
Debugging the sample was a futile endeavour too, since it requires visibility into the stack and local variables. Here is an old but decent article on how to debug bytecode using Dr. Garbage.
I then decided to go the route of editing the bytecode so that the relevant code blocks execute and de-obfuscate the strings. I came across JBE, but unfortunately it reported a lot of syntax errors, so saving the bytecode after edits was out of the question.
Finally, I landed on Recaf. Initially, it was a little disappointing to find that it uses the same decompilers as Bytecode Viewer and, as expected, the same parsing errors stopped me from editing the classes. It turns out that Recaf has other Class Modes that do allow editing.

At this point I suspected that these classes were written directly in bytecode to make analysis difficult, but that is mere speculation.

Bytecode

Bytecode is the instruction set executed by the Java Virtual Machine (JVM). All Java source is translated to bytecode when compiled. If we can modify the bytecode of these classes, we can control the entire flow of the malware.

We are going to modify the method epilogue of A.a(), the method responsible for decoding strings. Printing the string to stdout just before the method returns it will do the trick.

The four added instructions do the following:

Duplicate the final String value on the stack, so that we do not lose the reference to it.

Print it out using our good old friend System.out.println.
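The patched instructions are not reproduced in the text, so the following is a sketch of what such a patch typically looks like, not the exact edit. The Javadoc comment shows one plausible four-instruction sequence (an assumption), and the Java method shows the source-level equivalent. decode() is a made-up stand-in for the sample's real decoding logic.

```java
// Hypothetical source-level view of the patched string decoder.
class EpilogueSketch {
    /** Made-up stand-in for the real decoder, which is not reproduced here. */
    static String decode(int a, int b, int c) {
        return new String(new char[] {(char) (a ^ c), (char) (b ^ c)});
    }

    /**
     * Equivalent of a patched a(III)Ljava/lang/String;.
     * One plausible form of the added epilogue (assumed, not copied from the sample):
     *   dup                                                    // keep a copy of the String for areturn
     *   getstatic     java/lang/System.out : Ljava/io/PrintStream;
     *   swap                                                   // stack becomes [String, PrintStream, String]
     *   invokevirtual java/io/PrintStream.println : (Ljava/lang/String;)V
     *   areturn                                                // original return, left untouched
     */
    static String a(int x, int y, int z) {
        String decoded = decode(x, y, z);
        System.out.println(decoded); // the injected side effect
        return decoded;              // callers still receive the same value
    }

    public static void main(String[] args) {
        a(0x48, 0x69, 0);
    }
}
```

javac would compile this source with a local variable (astore/aload) instead of dup/swap. The stack-only form is convenient when patching by hand because it does not touch the method's local variable table.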


Dynamic + Static Analysis

When we execute the JAR again, four String values are printed. The first three can be ignored, as they are the initial values of class variables. The fourth is interesting: as inferred from the partially decompiled code, it is compared against the argument passed to main. Let’s supply it as a command-line argument when executing the JAR.

Presto! We are blind no more. If we follow the code and match the output against the A.a() call sites, it is clear that method A.y() is used for persistence. As the decoded string shows, a Run registry key is added.

The next strings relate to the HTTP communication in A.R(). The URL suggests that the file dorflersaladreviews.bin.encrypted is fetched via an HTTP GET request. The other file under the Orcus RAT URL where vx-ug uploaded the samples is this same encrypted file, with hash 295aa53d4f104ee8532593b17eaf6b31b8c065de922e4507879cecb13f0d3504. Simulating a fake network and hosting this encrypted file allows the code to execute further.
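Any tool that answers the GET request with the encrypted file will do for the fake network. As a minimal sketch, the JDK's built-in com.sun.net.httpserver can serve it. The class name PayloadServer, the default port 80, and the choice to answer every path with the same bytes are assumptions for this example. The isolated analysis machine would also need the sample's domain to resolve to this server, for instance via a hosts file entry.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical lab helper: answers every HTTP request with the bytes of one file.
class PayloadServer {
    static HttpServer serve(Path payload, int port) throws IOException {
        byte[] body = Files.readAllBytes(payload);
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions; pass the file path and port explicitly if they differ.
        Path payload = Paths.get(args.length > 0 ? args[0] : "dorflersaladreviews.bin.encrypted");
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80;
        HttpServer server = serve(payload, port);
        System.out.println("Serving " + payload + " on port " + server.getAddress().getPort());
    }
}
```

Run this only inside an isolated analysis environment, since the point is to let the sample continue executing.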

HTTP GET request for the encrypted payload

All decoded strings are printed out showing decryption and process injection

As the code and our stdout output indicate, the encrypted payload is decrypted, passed to method P.k(), and eventually injected into conhost.exe.

Conclusion

Sometimes simple tricks go a long way when reverse-engineering malware. In our case, a handful of added instructions was enough to gain more insight into the workings of this sample.

Indicators of Compromise

295AA53D4F104EE8532593B17EAF6B31B8C065DE922E4507879CECB13F0D3504	Encrypted Payload
86FC70D24F79A34C46EF66112EF4756639FCAD2F2D7288E0EEB0448FFAB90428	Malicious JAR
test.verble.rocks	Domain where both these files were hosted