kernel.engineering - Research & Writing

Note: This post covers baseline flattening only. We are not looking at advanced hardening like opaque predicates or encrypted state here.

Code Repository

I - What we are trying to understand

Control-Flow Flattening (CFF) is a structural obfuscation technique. It takes predictable flow like if/else chains and loops, then hides the real block-to-block transitions behind a dispatcher.

Instead of blocks jumping to their real successors, execution gets routed through a central control point.

In this post, we build a minimal flattener, look at what it becomes in a compiled binary, then patch it back into direct flow.

The goal is simple. Make the mechanism concrete enough that you can build it yourself, recognise it in a binary, and undo it without guessing.

II - The core concept

A normal function exposes its logic through direct transitions. A conditional jump tells you that one block can go to the true branch or the false branch.

Flattening breaks that relationship.

We introduce a state variable and a dispatcher. Instead of a direct jump, each transition now takes three steps:

A block computes the next state and writes it into the state variable.
The block jumps back to the dispatcher.
The dispatcher checks the state variable and jumps to the matching block.

In C, the basic transformation looks like this:

int state = ENTRY_STATE;
while (state != EXIT_STATE) {
    switch (state) {
        case K1:
            /* block 1 logic */
            state = K2;
            break;
        case K2:
            /* block 2 logic */
            state = EXIT_STATE;
            break;
    }
}

The program still does the same thing. What changes is the structure. The real CFG edges are replaced with state updates and dispatcher routing.
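
The same model can be simulated in a few lines of Python to check that routing through a dispatcher does not change behaviour. This is a minimal sketch, not part of the project files: `run_direct`, `run_flattened`, `DISPATCH`, and the numeric state values are hypothetical names and choices made for illustration, with K1 assumed to be the entry state.

```python
# Hypothetical illustration: each block does its work and returns the next
# state instead of jumping to its successor.
K1, K2, EXIT_STATE = 1, 2, 0
ENTRY_STATE = K1  # assumption: the entry state is K1


def run_direct(trace):
    """Original shape: block 1 falls straight through to block 2."""
    trace.append("block 1")
    trace.append("block 2")
    return trace


def block_1(trace):
    trace.append("block 1")
    return K2  # equivalent of "state = K2; break;"


def block_2(trace):
    trace.append("block 2")
    return EXIT_STATE  # equivalent of "state = EXIT_STATE; break;"


# The dispatcher: a table from state value to the block that handles it.
DISPATCH = {K1: block_1, K2: block_2}


def run_flattened(trace):
    """Flattened shape: every transition goes through the dispatcher loop."""
    state = ENTRY_STATE
    while state != EXIT_STATE:
        state = DISPATCH[state](trace)
    return trace


print(run_direct([]) == run_flattened([]))
```

Both versions visit the blocks in the same order. The only difference is that the flattened one never names a successor directly, which is exactly the information a reverse engineer wants.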

III - A simple example

To make this easy to inspect, we use a small password check:

void check_password(int input) {
    if (input == 1337) {
        printf("Access Granted!\n");
    } else {
        printf("Access Denied.\n");
    }
}

A normal compile produces a CFG with the expected diamond shape:

[Diagram: CFG of check_password with the expected diamond shape]

That graph is readable because the branch is still doing the routing directly.

After flattening, the branch condition stops routing control flow itself. It becomes data that selects the next state instead. Below is an example of the flattened shape. It is illustrative, not an exact preview of the compiled output used later.

[Diagram: illustrative flattened CFG, with multiple blocks jumping back into the dispatcher]

Notice how multiple blocks jump back into the dispatcher. That high fan-in is the main structural signature of a flattened function.

IV - Building the flattener

To implement this, we can write a script (flattener.py) using pycparser. It parses the C code into an AST, rewrites the function into a state machine, and emits the transformed C.

For this example, the script finds check_password, extracts the original conditional logic, and builds explicit cases for the true and false paths. Those cases then get wrapped in a while loop and a switch statement.

The critical step is converting the original branch condition into a state assignment:

python

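from pycparser import c_ast

# node is the FuncDef for check_password; its first statement is the if/else
original_if = node.body.block_items[0]

# state = <original condition> ? 2 : 3;
dispatcher_logic = c_ast.Assignment(
    "=",
    c_ast.ID("state"),
    c_ast.TernaryOp(
        original_if.cond,
        c_ast.Constant("int", "2"),
        c_ast.Constant("int", "3")
    )
)

# case 1: the entry block, which only picks the next state
case1 = c_ast.Case(c_ast.Constant("int", "1"), [dispatcher_logic, c_ast.Break()])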

The condition in original_if.cond is no longer acting as a direct control-flow fork. It is now just the expression used to compute the next state.

The generated C (flattened.c) matches the state-machine model directly:

void check_password(int input)
{
  int state = 1;
  while (state != 0)
  {
    switch (state)
    {
      case 1:
        state = (input == 1337) ? (2) : (3);
        break;

      case 2:
        printf("Access Granted!\n");
        state = 0;
        break;

      case 3:
        printf("Access Denied.\n");
        state = 0;
        break;
    }
  }
}

The logic is preserved exactly. But the structural intent is now hidden inside the dispatcher and the state transitions.
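
To see the shape of the rewrite without installing pycparser, here is a text-template sketch that emits the same flattened function. It is not how flattener.py works (that operates on the AST). `flatten_if_else` is a hypothetical helper, and it only handles a single if/else with one statement per branch.

```python
def flatten_if_else(signature, cond, then_stmt, else_stmt):
    """Emit flattened C for one if/else. Hypothetical helper, text-based only."""
    # State numbering follows the generated code above: 1 = entry, 0 = exit.
    cases = {
        1: [f"state = ({cond}) ? (2) : (3);"],  # the branch becomes data
        2: [then_stmt, "state = 0;"],           # former true branch
        3: [else_stmt, "state = 0;"],           # former false branch
    }
    out = [
        signature,
        "{",
        "  int state = 1;",
        "  while (state != 0)",
        "  {",
        "    switch (state)",
        "    {",
    ]
    for value, stmts in cases.items():
        out.append(f"      case {value}:")
        out += [f"        {stmt}" for stmt in stmts]
        out.append("        break;")
    out += ["    }", "  }", "}"]
    return "\n".join(out)


flattened = flatten_if_else(
    "void check_password(int input)",
    "input == 1337",
    'printf("Access Granted!\\n");',
    'printf("Access Denied.\\n");',
)
print(flattened)
```

A string template like this breaks down as soon as the body has nested control flow, which is why a real flattener rewrites the AST instead.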

V - Recovering the real CFG

Once this is compiled to a binary (flattened_bin), the original logical flow is no longer obvious from the CFG alone. The next step is to recover the real edges.

A script (deflatten.py) using angr can do this without tracing the entire program from start to finish. For this case, it is enough to take each basic block in the function, create a blank state at that address, and step forward by one block.

python

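import angr

# Assumed to be set up earlier in deflatten.py:
#   proj   - the angr.Project loaded from flattened_bin
#   blocks - start addresses of the basic blocks in check_password
edges = set()

for addr in blocks:
    state = proj.factory.blank_state(addr=addr)

    # Stop angr from panicking over uninitialised variables
    state.options.add(angr.options.ZERO_FILL_UNCONSTRAINED_REGISTERS)
    state.options.add(angr.options.ZERO_FILL_UNCONSTRAINED_MEMORY)

    simgr = proj.factory.simulation_manager(state)
    simgr.step()

    # Every active successor is one outgoing edge of this block
    for succ in simgr.active:
        edges.add((addr, succ.addr))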

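We need both zero-fill options because execution is starting in the middle of a function. Registers and stack memory, including the slot that holds the state variable, have not been initialised by any earlier code. If angr branches on that unconstrained symbolic data, it will explode into useless paths. Zero-filling gives every unread register and memory cell a concrete value, so each block follows a single concrete path and we can recover the architectural jumps cleanly. The trade-off is that a block whose branch depends on input only reports the successor taken for zero-valued data.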

The script outputs a DOT graph of the recovered edges:

dot

"0x401149" -> "0x4011c4";
"0x401179" -> "0x4011c4";
"0x40118b" -> "0x4011c4";
"0x401190" -> "0x4011c4";
"0x4011c4" -> "0x4011ca";
"0x4011ca" -> "0x0";

Look at 0x4011c4. Four different blocks jump into it. That is the dispatcher. The flattening structure is now visible directly in the compiled binary.
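
Spotting the dispatcher by eye works for six edges, but the same check is easy to automate by counting incoming edges. A minimal sketch, using the edge list from the DOT output above. `find_dispatcher` is a hypothetical helper, not a function from deflatten.py.

```python
from collections import Counter

# Edges copied from the DOT output above, as (source, destination) pairs.
edges = [
    (0x401149, 0x4011C4),
    (0x401179, 0x4011C4),
    (0x40118B, 0x4011C4),
    (0x401190, 0x4011C4),
    (0x4011C4, 0x4011CA),
    (0x4011CA, 0x0),
]


def find_dispatcher(edges):
    """Return (address, fan_in) for the block with the most incoming edges."""
    fan_in = Counter(dst for _, dst in edges)
    return fan_in.most_common(1)[0]


addr, count = find_dispatcher(edges)
print(hex(addr), count)  # 0x4011c4 4
```

Highest fan-in is only a heuristic. It works here because the function is tiny and unhardened. An ordinary loop header or a shared exit block can score just as high in larger functions.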

VI - Patching the binary

To patch the binary, we need to understand the exact assembly pattern the compiler generated.

In a basic flattened binary, the state variable is concrete. The case constants are concrete. The jump targets are concrete. That gives the patcher a lot to work with.

A patcher (simple_patcher.py) can use Capstone to disassemble the function and identify the compare-and-branch pattern inside the dispatcher:

asm

cmp dword ptr [rbp - 4], 1
je 0x40117b
cmp dword ptr [rbp - 4], 2
je 0x401184

From that, we can recover two things:

The stack offset of the state variable, such as rbp - 4.
The mapping from state values to code addresses.

The extraction logic is straightforward:

python

def build_state_case_map(proj, func):
    case_map = {}
    # ... loop through capstone insns, pairing each instruction (ins_a)
    # with the one that follows it (ins_b) ...
    if ins_a.mnemonic == "cmp" and ins_b.mnemonic in {"je", "jz"}:
        # op_str looks like "dword ptr [rbp - 4], 1"
        parts = [p.strip() for p in ins_a.op_str.split(",")]
        disp = mem_disp_from_op(ins_a.op_str)     # extracts rbp offset
        state_val = parse_int_from_op(parts[1])   # extracts state constant
        target = parse_int_from_op(ins_b.op_str)  # extracts jump address

        case_map[state_val] = target
    return case_map
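
Since the snippet above leaves out the loop and the parsing helpers, here is a self-contained sketch of the same idea that runs on plain (mnemonic, op_str) pairs instead of live Capstone objects. `build_case_map` and `CMP_RE` are hypothetical names, and the regex assumes the state variable is always addressed relative to rbp, as in the listing above.

```python
import re

# Matches operands such as "dword ptr [rbp - 4], 1" and captures the sign,
# the rbp offset, and the immediate being compared against.
CMP_RE = re.compile(r"\[rbp\s*([+-])\s*(0x[0-9a-f]+|\d+)\],\s*(0x[0-9a-f]+|\d+)$")


def build_case_map(insns):
    """insns: list of (mnemonic, op_str) pairs in address order.

    Returns (state_disp, case_map), where state_disp is the rbp offset of the
    state variable and case_map maps state constants to jump targets.
    """
    state_disp = None
    case_map = {}
    # Walk adjacent instruction pairs, looking for cmp followed by je/jz.
    for (mn_a, ops_a), (mn_b, ops_b) in zip(insns, insns[1:]):
        if mn_a != "cmp" or mn_b not in {"je", "jz"}:
            continue
        m = CMP_RE.search(ops_a)
        if m is None:
            continue
        sign, off, const = m.groups()
        state_disp = int(sign + off, 0)           # e.g. "-4" -> -4
        case_map[int(const, 0)] = int(ops_b, 0)   # state constant -> target
    return state_disp, case_map


# The dispatcher listing from above.
insns = [
    ("cmp", "dword ptr [rbp - 4], 1"),
    ("je", "0x40117b"),
    ("cmp", "dword ptr [rbp - 4], 2"),
    ("je", "0x401184"),
]
state_disp, case_map = build_case_map(insns)
print(state_disp, {k: hex(v) for k, v in case_map.items()})
```

This sketch accepts any rbp-relative cmp. A more careful version would also confirm that every matching cmp uses the same offset before trusting the map.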

Once the case_map is built, the patcher scans the tail of each block. If a block assigns a constant to the state variable and then jumps back to the dispatcher, the real target is already known.

For example:

asm

0x401182: mov dword ptr [rbp - 4], 2
0x401189: jmp 0x4011c4 ; Jump to dispatcher

If state 2 maps to 0x401184, then the dispatcher is just overhead. We can replace the dispatcher jump with a direct jump to 0x401184.
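
The tail check can be sketched the same way. The function below takes the last two instructions of a block and returns the address of the jump to rewrite together with its real target. `resolve_tail` and `STORE_RE` are hypothetical names, and the sketch only handles the constant-store pattern shown above, not blocks that compute the state conditionally.

```python
import re

# Matches the operands of e.g. "mov dword ptr [rbp - 4], 2".
STORE_RE = re.compile(r"\[rbp\s*([+-])\s*(0x[0-9a-f]+|\d+)\],\s*(0x[0-9a-f]+|\d+)$")


def resolve_tail(tail, state_disp, dispatcher, case_map):
    """Return (jmp_addr, real_target) for a block tail, or None if no match.

    tail is the last two instructions of a block, each given as
    (address, mnemonic, op_str).
    """
    (_, mn_a, ops_a), (jmp_addr, mn_b, ops_b) = tail
    m = STORE_RE.search(ops_a)
    if mn_a != "mov" or mn_b != "jmp" or m is None:
        return None
    if ops_b != hex(dispatcher):          # must jump straight back to the dispatcher
        return None
    sign, off, const = m.groups()
    if int(sign + off, 0) != state_disp:  # must write the state variable's slot
        return None
    target = case_map.get(int(const, 0))  # states missing from the map are left alone
    return None if target is None else (jmp_addr, target)


tail = [
    (0x401182, "mov", "dword ptr [rbp - 4], 2"),
    (0x401189, "jmp", "0x4011c4"),
]
case_map = {1: 0x40117B, 2: 0x401184}
result = resolve_tail(tail, -4, 0x4011C4, case_map)
print([hex(x) for x in result])
```

Returning None for anything that does not match exactly keeps the patcher conservative: an unrecognised block keeps its dispatcher jump and still runs correctly.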

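The patcher uses normal x86 relative jump encodings, chosen by how many bytes the original dispatcher jump occupies. A 2-byte slot gets a short jump (EB rel8), which only reaches targets within -128 to +127 bytes of the next instruction. A slot of five bytes or more gets a near jump (E9 rel32), with any leftover bytes padded with NOP instructions.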

python

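def patch_jmp(buf, elf, src_addr, dst_addr, overwrite_len, image_base=0):
    if overwrite_len == 2:
        # Short jump: displacement is relative to the end of the 2-byte instruction.
        # to_bytes raises OverflowError if the target is out of rel8 range.
        rel8 = dst_addr - (src_addr + 2)
        payload = b"\xEB" + int(rel8).to_bytes(1, "little", signed=True)
    elif overwrite_len >= 5:
        # Near jump: displacement is relative to the end of the 5-byte instruction
        rel32 = dst_addr - (src_addr + 5)
        payload = b"\xE9" + int(rel32).to_bytes(4, "little", signed=True)
        payload += b"\x90" * (overwrite_len - 5)
    else:
        # Other slot sizes are not handled; fail loudly instead of leaving payload unset
        raise ValueError(f"cannot patch a {overwrite_len}-byte slot at {src_addr:#x}")

    # Write payload into raw ELF buffer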
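
To make the displacement arithmetic concrete, here is a standalone encoder that returns the bytes instead of writing them into the ELF. It is a variant rather than the patcher's own code: `encode_jmp` is a hypothetical name, and unlike patch_jmp it picks the short form whenever the target is in rel8 range and pads the slot with NOPs.

```python
def encode_jmp(src_addr, dst_addr, overwrite_len):
    """Return exactly overwrite_len bytes that jump from src_addr to dst_addr."""
    # Displacements are measured from the address of the NEXT instruction.
    rel8 = dst_addr - (src_addr + 2)
    if overwrite_len >= 2 and -128 <= rel8 <= 127:
        payload = b"\xEB" + rel8.to_bytes(1, "little", signed=True)
    elif overwrite_len >= 5:
        rel32 = dst_addr - (src_addr + 5)
        payload = b"\xE9" + rel32.to_bytes(4, "little", signed=True)
    else:
        raise ValueError("slot too small for the required jump encoding")
    # Fill whatever is left of the overwritten slot with NOPs.
    return payload + b"\x90" * (overwrite_len - len(payload))


# The dispatcher jump at 0x401189, redirected to 0x401184:
# rel8 = 0x401184 - (0x401189 + 2) = -7, which encodes as 0xF9.
print(encode_jmp(0x401189, 0x401184, 2).hex())  # ebf9

# A target outside rel8 range falls back to the 5-byte form:
# rel32 = 0x402000 - (0x401000 + 5) = 0xFFB.
print(encode_jmp(0x401000, 0x402000, 5).hex())  # e9fb0f0000
```

The usual off-by-N bug in patchers like this is measuring the displacement from the start of the jump instead of from the instruction after it.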

By rewriting those jumps, we bypass the dispatcher entirely. The binary goes back to using direct block-to-block transitions, which is what the original function had before flattening.

VII - Running it yourself

You can run the full sequence with the project files:

bash

# 1. Flatten the C code
python3 flattener.py

# 2. Compile it
gcc flattened.c -o flattened_bin

# 3. Recover the CFG edges and output DOT graph
python3 deflatten.py

# 4. Patch the binary to restore direct flow
python3 simple_patcher.py flattened_bin -o patched_bin --dot check_password_flow.dot

When you run patched_bin, it executes the same logic as the original, unflattened target.c, but the dispatcher overhead is gone.

VIII - Summary

Control-flow flattening replaces direct structural edges with state-driven routing through a dispatcher.

For baseline flattening, that structure is also brittle. The state machine is built from explicit constants, predictable comparisons, and obvious jump patterns. Once those pieces are identified, automated analysis can recover the real transitions and patch the binary back into direct flow.

If you want CFF to hold up better against automated recovery, the recovery assumptions themselves need to break. That means harder dispatcher fingerprinting, harder state tracking, and fewer nice clean cmp and je patterns to abuse.

Baseline flattening is still worth building though, because it shows very quickly where the real strength and weakness of the technique sits.