Note: this write-up stays disciplined about what is directly supported by the recovered loader, unpacked payload, helper scripts, and HLIL. Where the sample clearly supports a behaviour, I say so. Where the capability would require recovered modules or live traffic to prove, I leave it open.
I - What we are trying to understand
I came across this sample by starting with the loader first. I already knew the loader was malware from the outset; I had pulled it from a repository of malicious samples. What I wanted to understand was what it actually unpacked and how much of the real implant logic was already present in the recovered stage.
I did not chase the live server-side modules or try to recover every follow-on capability the operator might have delivered later. That was not really the point of this pass.
What interested me was that most of the important machinery is already here.
Even without chasing the server-delivered modules, this sample still exposes:
- the loader-to-payload chain
- the unpacking logic
- the runtime bootstrap
- the import hashing
- the HTTP transport
- the cookie-based C2 wrapping
- the task and module execution framework
So this is not a complete write-up of every possible malicious capability in the broader intrusion set. It is a write-up of the parts that are already present and legible in the recovered artefacts.
What we actually have is a staged chain:
- a Windows PE loader
- an encrypted blob embedded in `.data`
- a decrypted raw shellcode-style payload
- a second stage that resolves APIs manually, builds an HTTP C2 channel, and executes structured task blobs
So the practical reversing question is not just "is this ChChes?".
It is:
What is the loader actually unpacking, what is the payload actually doing once it starts, and how do we write small Python tooling that proves each step instead of glossing over it?
Based on the URL pattern, tasking model, and overall structure, this lines up well with ChChes / APT10-linked reporting from JPCERT/CC, LAC, and MITRE ATT&CK. But the useful part of the exercise is not the family label. The useful part is making the loader-to-payload chain concrete enough that you can reproduce the analysis yourself, while staying honest about what this recovered stage does and does not prove.
II - The workflow at a glance
The whole reversing path looks like this:

- recover the encrypted blob, key, and transform from the loader
- reproduce the unpacking in a standalone Python script
- load the recovered raw payload at its proper base and entry
- resolve the ROR7 import hashes with a second script
- read the bootstrap, transport, and tasking logic in HLIL
That sequence matters.
If you start from the raw payload without understanding how it was recovered, you miss why the base address and entry matter.
If you look at the HLIL before resolving the hashes, you end up staring at a forest of constants.
If you assign capabilities too early, you end up claiming more than the sample actually proves.
III - What the loader is actually doing
The key loader function in HLIL is sub_408fd0.
Even before naming anything, the structure is recognisable:
```
0040902b  int32_t var_38 = 0xe6fa
00409041  int32_t* ebx = sub_4092cd()
0040904c  sub_409f00(ebx, 0x424000, 0xe6fa)
0040905f  sub_4017c0(0x434828, ebx, 0xe6fa)
0040906e  int32_t* dwSize = sub_401cf0(esi, ebx, 0xe6fa)
00409080  int32_t eax_6 = VirtualAlloc(
              lpAddress: nullptr,
              dwSize,
              flAllocationType: MEM_COMMIT | MEM_RESERVE,
              flProtect: PAGE_EXECUTE_READWRITE)
00409086  *arg1 = eax_6
```

This tells us a few things immediately:
- there is a concrete blob size: `0xe6fa`
- bytes are copied from a concrete VA: `0x424000`
- some transform is applied before the output size is derived
- the output is intended to be executed, not just stored, because the loader allocates RWX memory with `VirtualAlloc`
That is the point where writing a Python script becomes the right move.
Not because "automation is nice", but because the loader has already shown us that the packed data and the unpacking path are concrete enough to reproduce.
IV - Why write the unpacker in Python
When you already have HLIL, it is tempting to keep reversing inside the disassembler and just inspect memory live.
That works for a quick sanity check, but it is weaker than writing the transformation down.
A Python unpacker gives you three things:
- repeatability
- a way to validate your reading of the loader
- a clean payload artefact that you can reload in tooling without depending on the original process state
The mental process behind unpack_stub.py is straightforward:
- identify where the blob lives
- identify where the key lives
- work out how the loader converts VA to raw bytes
- reproduce the decryption
- reproduce the container unpacking
- verify that the output behaves like the next stage you expect
That is the important teaching point. You are not writing "a malware script". You are writing an executable statement of your reverse-engineering hypothesis.
V - Writing the unpacker step by step
The first problem is mundane but critical: Binary Ninja IL text is not the source of truth for bytes. The PE file is.
So the script starts by reading the original sample with pefile and converting virtual addresses into file offsets:
```python
import pefile


def va_to_file_offset(pe: pefile.PE, va: int) -> int:
    rva = va - pe.OPTIONAL_HEADER.ImageBase

    for sec in pe.sections:
        start = sec.VirtualAddress
        end = start + max(sec.Misc_VirtualSize, sec.SizeOfRawData)

        if start <= rva < end:
            return sec.PointerToRawData + (rva - start)

    raise ValueError(f"VA {va:#x} / RVA {rva:#x} not inside any section")
```

This is the first real engineering decision in the script.
Why write this helper first?
Because once the loader HLIL says "copy bytes from 0x424000" and "read the key from 0x4326fc", you need a byte-accurate way to answer those requests from the PE on disk.
After that, the script can read the embedded blob and AES key directly:
```python
BLOB_VA = 0x424000
BLOB_SIZE = 0xE6FA

AES_KEY_VA = 0x4326FC
AES_KEY_SIZE = 16

blob = read_va(pe, raw, BLOB_VA, BLOB_SIZE)
aes_key = read_va(pe, raw, AES_KEY_VA, AES_KEY_SIZE)
```

At this point the script is not doing anything magical. It is just turning hardcoded observations from the loader into concrete bytes.
VI - Reproducing the decryption logic
The loader clearly transforms the blob before allocating the decoded result. The reconstruction that matched the recovered artefact layout here was AES-ECB over the aligned main body.
The concrete values used by the unpacker were:
```
Encrypted blob VA:   0x424000
Encrypted blob size: 0xe6fa
AES key VA:          0x4326fc
AES key:             21 a1 95 06 2f af 32 a6 ab f7 15 8f 09 cf 4f 3f
```

The corresponding Python is small:
```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def aes_ecb_decrypt_blocks(data: bytes, key: bytes) -> bytes:
    full_len = (len(data) // 16) * 16
    tail = data[full_len:]

    cipher = Cipher(algorithms.AES(key), modes.ECB())
    dec = cipher.decryptor()

    return dec.update(data[:full_len]) + dec.finalize() + tail
```

There are two useful lessons in that function.
First, do not overbuild. If the sample is using a simple block transform, mirror the observed behaviour exactly.
Second, preserve the tail if the blob length is not block-aligned. That reflects the practical reality of reversing real loader code: your script should mimic what the sample does, not what an ideal crypto wrapper would do in a clean-room design.
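The align-and-carry pattern is worth isolating. The sketch below exercises it with a toy byte-inverting transform standing in for AES, purely to show the tail handling without the crypto dependency; nothing here models the sample's actual cipher:

```python
def transform_aligned(data: bytes, block_transform, block_size: int = 16) -> bytes:
    # Apply the transform only to the block-aligned body; pass any
    # trailing partial block through untouched, as the loader does.
    full_len = (len(data) // block_size) * block_size
    return block_transform(data[:full_len]) + data[full_len:]


def invert(chunk: bytes) -> bytes:
    # Toy stand-in for the real block decryption.
    return bytes(b ^ 0xFF for b in chunk)
```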
Once decrypted, the container header becomes readable:
```python
mode, key_len, key_off, out_len, payload_off = struct.unpack_from("<5I", aes_plain, 0)
```

That header tells the unpacker how to interpret the next stage:

`mode`, `key_len`, `key_off`, `out_len`, `payload_off`
This is one of the key transitions in the workflow. Before this point, we are dealing with an encrypted loader blob. After this point, we are dealing with a structured container whose fields we can reason about directly.
VII - Reproducing the container unpacking
The next step in unpack_stub.py is the container-specific decode:
```python
def unpack_container(aes_plain: bytes) -> tuple[bytes, dict]:
    mode, key_len, key_off, out_len, payload_off = struct.unpack_from("<5I", aes_plain, 0)

    if mode != 1:
        raise ValueError(f"Unexpected container mode: {mode:#x}")

    xor_key = aes_plain[key_off:key_off + key_len]
    payload = aes_plain[payload_off:payload_off + out_len]

    final_xor_byte = xor_key[-1]
    unpacked = bytes(b ^ final_xor_byte for b in payload)

    meta = {
        "mode": mode,
        "key_len": key_len,
        "key_off": key_off,
        "out_len": out_len,
        "payload_off": payload_off,
    }
    return unpacked, meta
```

This is the second major engineering idea in the script.
The goal is not just "get output bytes". The goal is to capture the assumptions the reverse-engineering depends on:
- `mode` should be what we observed
- the key material should not be truncated
- the payload length should match the advertised output length
- the final XOR stage should match the loader's real behaviour
That is why the script returns both the unpacked bytes and metadata. When you are validating a reversing hypothesis, introspection matters.
The recovered output is not a PE. It is a raw shellcode-style payload stage.
That result is important because it tells us we have probably decoded the right thing. If the output had looked like random data or a broken PE, the script would have been wrong or incomplete.
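One cheap way to gain that kind of confidence in a decode is a synthetic round-trip: build a container that obeys the header layout above, push it through the same logic, and check the output. The version below restates the decode inline, returning only the unpacked bytes; every value is invented for the test, not taken from the sample:

```python
import struct


def unpack_container_bytes(aes_plain: bytes) -> bytes:
    # Same decode as the script's unpack_container, bytes only.
    mode, key_len, key_off, out_len, payload_off = struct.unpack_from("<5I", aes_plain, 0)
    if mode != 1:
        raise ValueError(f"Unexpected container mode: {mode:#x}")
    xor_key = aes_plain[key_off:key_off + key_len]
    payload = aes_plain[payload_off:payload_off + out_len]
    return bytes(b ^ xor_key[-1] for b in payload)


# Synthetic container: 20-byte header, 4-byte XOR key, then the XORed payload.
plain = b"synthetic next stage"
key = b"\x01\x02\x03\xaa"                     # invented; only the last byte is used
header = struct.pack("<5I", 1, len(key), 20, len(plain), 24)
container = header + key + bytes(b ^ key[-1] for b in plain)
assert unpack_container_bytes(container) == plain
```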
VIII - The unpacking workflow as a graph
The loader-to-payload script logic can be drawn very simply:

```
read blob + key from PE -> AES-ECB decrypt aligned body -> parse container header
    -> XOR payload with final key byte -> raw shellcode-style stage
```
This is exactly the sort of malware workflow that benefits from a small script.
Each box is concrete.
Each edge corresponds to an observation from the loader.
And once you can run it end-to-end, the reverse-engineering stops being an informal story and becomes a reproducible pipeline.
IX - What comes out of the loader
Once unpacked, the output is not another PE. It is a raw shellcode-style stage.
That changes how you load it in your tooling and how you interpret the next analysis steps. If you expect imports, sections, and a friendly PE entry point, you will waste time.
The recovered entry chain was:

```
sub_2a1fec -> sub_2a1ce0 -> sub_2a2065 -> sub_2afd60
```

And the entry itself is visible in HLIL:

```
002a1fec  void shellcode_entry() __noreturn

002a1ff0      shellcode_main_init()
002a2009      struct runtime_ctx* eax_1 = init_api_table(&g_ctx)
```

This is the next good teaching point.
When a raw stage enters a small setup routine and then immediately starts building a runtime context, you are not at the payload's real mission logic yet. You are looking at bootstrap code preparing the environment.
X - What the export resolver is actually doing
The payload does not rely on the normal import table for the interesting functionality. Instead, it walks module exports and resolves them by a ROR7-based hash.
The HLIL is clear enough to show the whole idea:
```
002a1f17  if (*ebx == 0x5a4d)
002a1f31      if (*(eax_1 + ebx) == 0x4550 && *(eax_1 + ebx + 0x7c) != 0)
...
002a1f68  char* edx_3 = *(esi_2 + (edx_1 << 2)) + ebx
...
002a1f79  int32_t ebx_3 = ror.d(var_8_1, 7) + sx.d(eax_2.b)
...
002a1f91  if (var_8_1 == arg2)
002a1fbd      return *(ecx_4 + (zx.d(*(edi_2 + (var_c_1 << 1))) << 2)) + ebx
```

This tells us:
- the function validates `MZ` and `PE`
- it walks the export name table
- it computes a rotate-right-7 plus additive hash over the export name bytes
- it compares the result to the caller's constant
- it returns the resolved export VA on a match
This is one of those cases where HLIL really earns its keep. We are not vaguely feeling that "some hash loop exists". We can read the control-flow well enough to reimplement it faithfully.
XI - Why write a second Python script for the hashes
Once the resolver is understood, the next bottleneck is readability.
The payload is full of constants like 0xbbafdf85, 0x0c917432, and 0x04a7c4c8. Until you resolve them, init_api_table is just an ocean of opaque numbers.
This is the reason for resolve_hashes.py.
Again, the process matters more than the file itself:
- derive the hash algorithm from HLIL
- reproduce it exactly
- enumerate exports from likely DLLs
- match concrete hashes back to API names
- feed those names back into the reverse-engineering
The core reimplementation is tiny:
```python
def ror32(x: int, n: int) -> int:
    x &= 0xFFFFFFFF
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


def api_hash(name: bytes) -> int:
    h = 0
    for b in name:
        h = (ror32(h, 7) + b) & 0xFFFFFFFF
    return h
```

This is exactly what we want from a helper script.
It takes one local reversing insight and turns it into leverage across the rest of the sample.
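To see that leverage concretely, here is the matching step reduced to its core: hash every candidate export name and look the constants up. The hash functions are restated so the snippet runs standalone; the export list is a stand-in for real DLL enumeration (which the actual script would do with pefile), and the check uses a constant-to-name pair the sample itself confirms:

```python
def ror32(x: int, n: int) -> int:
    x &= 0xFFFFFFFF
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


def api_hash(name: bytes) -> int:
    h = 0
    for b in name:
        h = (ror32(h, 7) + b) & 0xFFFFFFFF
    return h


def match_hashes(targets, exports):
    """Map hash constants back to dll!export names whose ROR7 hash matches."""
    hits = {}
    for dll, names in exports.items():
        for name in names:
            h = api_hash(name.encode("ascii"))
            if h in targets:
                hits[h] = f"{dll}!{name}"
    return hits


# Stand-in export list; real tooling would enumerate exports from the DLLs.
hits = match_hashes({0x0C917432}, {"kernel32.dll": ["LoadLibraryA", "Sleep"]})
```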
XII - What the hash script gives us back
Once the hashes are resolved, the bootstrap stops looking mysterious.
A few examples are enough to make the point:
```
0xbbafdf85 -> kernel32.dll!GetProcAddress
0x0c917432 -> kernel32.dll!LoadLibraryA
0x5d0fb57d -> kernel32.dll!SetErrorMode
0x04a7c4c8 -> winhttp.dll!WinHttpGetIEProxyConfigForCurrentUser
0x8f5ef202 -> winhttp.dll!WinHttpGetProxyForUrl
0x491b47ec -> wininet.dll!InternetInitializeAutoProxyDll, jsproxy.dll!InternetInitializeAutoProxyDll
0x1a7d2670 -> advapi32.dll!CredEnumerateW
```

That immediately changes what we can say about the payload.
It is not just hiding LoadLibraryA and GetProcAddress to be annoying. It is building a substantial runtime around:
- HTTP communications
- proxy discovery and PAC / WPAD handling
- host identity
- task execution
- proxy authentication compatibility
This is also where disciplined reading matters. The presence of CredEnumerateW does not automatically prove generic credential theft as an operator goal. In this sample, the surrounding resolved APIs support a narrower and better-grounded interpretation: the implant appears designed to operate reliably in enterprise proxy environments.
XIII - What we are actually seeing during bootstrap
Once the hash constants are readable, init_api_table becomes much easier to explain.
In HLIL we can see the payload recover GetProcAddress, then LoadLibraryA, then load additional DLLs:
```
002a20f5  void* GetProcAddress = resolve_export_by_ror7_hash(field_38, __saved_ecx_2, var_10c_2)
002a20fd  g_ctx->pGetProcAddress = GetProcAddress
...
002a210d  resolve_export_by_ror7_hash(g_ctx->field_38, 0xc917432, GetProcAddress)
002a2115  g_ctx->pLoadLibraryA = eax_10
...
002a2211  __builtin_strcpy(dest: &ebp_1[-0x20], src: "winhttp.dll")
...
002a22e2  ebp_1[-1] = pLoadLibraryA(&ebp_1[-0xb])
002a233e  ebp_1[-3] = g_ctx->pLoadLibraryA(&ebp_1[-0x20])
```

From there the bootstrap flow is straightforward: resolve GetProcAddress and LoadLibraryA by hash, use them to load additional DLLs such as winhttp.dll, then populate the runtime context with the rest of the hashed imports.
That is not the shape of a tiny single-purpose downloader. It is the shape of a backdoor building itself a runtime for real-world operation.
XIV - What the C2 channel is actually doing
The hardcoded URL is visible directly in the HLIL export:
```
hxxp://zebra[.]wthelpdesk[.]com/%r.htm
```

The payload is also explicit about the transport wrapper it wants to use:

```
002adf30  int32_t encode_c2_message_into_cookie(int32_t arg1, int32_t arg2, int32_t arg3)
...
002adf9b  __builtin_strncpy(dest: &var_12c, src: "Cookie", count: 7)
```

And elsewhere:

```
002a67d8  __builtin_strcpy(dest: &var_24, src: "Cookie:")
002a7702  __builtin_strcpy(dest: &var_40, src: "Set-Cookie:")
```

That is a much stronger statement than "it communicates over HTTP".
It tells us the payload deliberately packages outbound data into cookie headers so the traffic blends into ordinary-looking web requests more easily.
The message builder also gives us protocol classes we can observe directly:
```
002ade47  if (arg2 != 0x400)
002ade89      if (arg2 == 0x401)
002ade91          *(0(result) + result - 1) = 0x42
...
002adeb3      if (arg2 == 0x406)
002aded7          *(0(result) + result - 1) = 0x43
...
002adf1d      *(0(result) + result - 1) = 0x44
002ade47  else
002ade49      int32_t eax_6 = build_initial_host_profile()
002ade57      *(0(result) + result - 1) = 0x41
```

From that, we can map:
- `0x400` / marker `A`: initial registration with host profile
- `0x401` / marker `B`: polling / check-in
- `0x406` / marker `C`: result upload
- marker `D`: another control or status path reached through the remaining branch in this message builder
Again, this is the kind of detail that is worth teaching carefully. We are not just naming a function and projecting behaviour onto it. We can see the message-type branch, the marker assignment, and the initial host-profile path in the HLIL itself.
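Those branch observations can be captured as a tiny model. This is a reading of the HLIL, not recovered code; the marker bytes 0x41 to 0x44 are ASCII 'A' to 'D':

```python
# Message type -> marker byte, as read from the message-builder branches.
MARKERS = {0x400: 0x41, 0x401: 0x42, 0x406: 0x43}


def marker_for(msg_type: int) -> int:
    # Types without an explicit branch fall through to marker 'D' (0x44),
    # matching the remaining path in the HLIL.
    return MARKERS.get(msg_type, 0x44)
```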
XV - How the payload and C2 actually communicate
At a high level, the payload does not just open an HTTP connection and dump obvious tasking data into the body. It appears to build an internal message format first, then wrap that message for HTTP transport inside cookie headers.
The visible path looks like this:

```
build_c2_message(...) -> encode_c2_message_into_cookie(...) -> HTTP request to the hardcoded URL
```
The interesting part is that the payload seems to separate logical protocol messages from transport encoding.
In other words:
- `build_c2_message(...)` appears to construct the implant's real application-layer message
- `encode_c2_message_into_cookie(...)` appears to take that message and turn it into something safe to carry in a `Cookie` header
- the HTTP layer then sends that wrapped data to the C2 URL
That is important because it tells us the cookie is not the protocol itself. The cookie is the carrier.
The protocol appears to have at least four message classes:
- initial registration
- periodic polling
- result upload
- an additional control or status path
And we can see those classes being built before the cookie transport step:
```
002ae47b  int32_t eax_1 = build_c2_message(var_34, 0x400, nullptr)
002ae48f  int32_t eax_4 = encode_c2_message_into_cookie(eax_1, esi[0x19](eax_1), 1)

002ae6b3  int32_t eax = build_c2_message(*0x1c, 0x401, nullptr)
002ae6c7  int32_t eax_2 = encode_c2_message_into_cookie(eax, 0(eax), 1)

002aeb14  int32_t eax_9 = build_c2_message(&var_20, 0x406, 0x48)
```

That gives us a reasonably clean model for the wire behaviour.
On first contact, the implant likely builds a registration message containing host profiling data and sends that via the cookie path.
After that, it appears to send periodic poll messages to ask for work.
When task execution finishes, it appears to package the result into a result message and send that back through the same transport wrapper.
So the traffic model looks like this:

```
register (A, 0x400) -> poll for work (B, 0x401) -> execute tasking -> upload results (C, 0x406) -> keep polling
```
There are also signs that the server side may use cookie-related response handling as well, which is why Set-Cookie: appears in the payload alongside Cookie:. The careful reading is that Set-Cookie: strongly suggests cookie-aware response parsing, while the strongest directly supported conclusion is the outbound use of Cookie as a C2 transport container.
This is a useful design choice for the operator.
Normal-looking HTTP requests with cookie headers are less conspicuous than bespoke plaintext tasking fields, and the payload can still preserve its own internal protocol structure behind that wrapper.
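To make the carrier idea concrete without over-claiming: the sketch below packs an opaque message into a Cookie header and recovers it. The cookie name and the base64 encoding are assumptions chosen purely for illustration; the sample's real cookie encoding is not reconstructed in this write-up:

```python
import base64


def wrap_in_cookie(message: bytes, name: str = "SESSIONID") -> str:
    # Illustrative carrier only: the implant's actual cookie name and
    # encoding scheme are not established here.
    return f"Cookie: {name}={base64.b64encode(message).decode('ascii')}"


def unwrap_cookie(header: str) -> bytes:
    # Inverse view, roughly what a cookie-aware server side would do.
    value = header.split("=", 1)[1]
    return base64.b64decode(value)
```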
XVI - What the tasking model actually looks like
The recovered payload is best understood as a modular HTTP backdoor, not a monolithic stealer.
The reason is in the execution path. Inbound work is handled as structured blobs with validation, compatibility checks, checksums, worker-thread execution, caching, and explicit status reporting.
A useful HLIL slice is this one:
```
002aa422  if (*(edx_5 + ecx_4 + 0x24) == ecx_5)
002aa43e      esi_3 = sub_2a9690(edx_5 + ecx_4 + 0x10, esi_1, &var_14)
...
002aa45e  if (esi_3[0xa] == esi_3[6] + esi_3[5] + esi_3[1] + *esi_3)
002aa498      var_8 = sub_2a9050(esi_3, *ebx, &var_c)
```

Even without perfect names, that is not the shape of:
receive command string -> run command -> send stdout
It is the shape of:
- parse a wrapped task or module format
- validate structure
- verify integrity
- perform compatibility or activation checks
- unpack or materialise the module
- run it via a worker execution path
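The integrity step in that list is directly visible in the HLIL slice above: dword 0xa of the parsed structure must equal the sum of dwords 0, 1, 5, and 6. Modelled in Python (the field meanings beyond the check itself are not recovered, and the header values below are invented to exercise it):

```python
def task_checksum_ok(header: list) -> bool:
    # header[0xa] must equal header[0] + header[1] + header[5] + header[6],
    # reduced mod 2**32 as the 32-bit adds in the payload would be.
    expected = (header[0] + header[1] + header[5] + header[6]) & 0xFFFFFFFF
    return header[0xA] == expected


# Invented header values purely to exercise the check.
hdr = [0] * 11
hdr[0], hdr[1], hdr[5], hdr[6] = 0x10, 0x20, 0x30, 0x40
hdr[0xA] = 0xA0
```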
The diagnostic strings support the same reading:
1"Invalid data received!"
2"Module is not found!"
3"Activation is required!"
4"Incompatible module received!"
5"Checksum error!"
6"Result is empty!"Those strings tell us the operator expects module failures to happen in specific ways and wants detailed feedback.
That is a framework mindset, not a one-shot payload mindset.
XVII - What is proven and what is not
From this recovered payload stage, we can prove:
- staged loader-to-payload execution
- embedded blob extraction and decryption
- raw shellcode-style second-stage recovery
- manual import resolution using export hashing
- HTTP C2
- cookie-based transport
- proxy-aware communications
- structured task or module execution
- cached module handling
- result upload with diagnostic status strings
What we cannot honestly claim from this payload alone is:
- confirmed interactive shell functionality
- confirmed screenshot capture
- confirmed keylogging
- confirmed standalone file theft
- confirmed credential exfiltration as a primary mission
Those may exist in modules delivered later. This recovered stage does not prove them on its own.
XVIII - Summary
The cleanest description of this sample is:
ChChes here appears to be a staged, modular HTTP implant that recovers a raw shellcode-style payload from an embedded encrypted blob, bootstraps itself with ROR7-based API resolution, communicates through cookie-encoded HTTP traffic, receives structured task or module blobs from C2, executes them in worker threads, caches reusable modules, and reports results back to the operator.
From a reversing perspective, the broader lesson is just as useful as the family identification.
Split the problem into loader, bootstrap, transport, and tasking.
Then, when one of those stages is concrete enough, write the smallest Python script that forces your understanding to become testable.
That is what turned this sample from "interesting malware with opaque constants" into a legible workflow from embedded blob to operator tasking.