Note: The code repository for this project will be released soon.
I - Intro
I recently built a practical reversing experiment: obfuscating arithmetic in .NET IL, then immediately trying to recover it with a paired deobfuscator to see what actually survives.
I kept the project pretty structured: deterministic template selection, optional nonlinear templates, telemetry output, and a deobfuscator broken into simple tiers.
The goal isn't to create "unsimplifiable code" (which is mostly a myth anyway). The idea is to make automated simplification work harder while keeping correctness and overhead under control.
Note: my IL implementation isn't perfect and I may be wrong in places, but the results speak for themselves. This project was also done over 3 days on a collective 9 hours of sleep, so take most of it with a grain of salt.
II - Model and Notation
Before getting into the pipeline, here is the small model I use throughout the post.
- $n$ is the machine word size.
- $\mathbb{B}^n$ is the set of $n$-bit vectors.
- $[\![\cdot]\!]$ is expression evaluation under unchecked two's-complement semantics.
- $+$, $-$, and $\times$ are arithmetic modulo $2^n$.
- $\wedge$, $\vee$, $\oplus$, $\lnot$, $\ll$, and $\gg$ are the usual bitwise and shift operators over $\mathbb{B}^n$.
The rule for each replacement $e'$ of an original expression $x + y$ is simple:

$$\forall x, y \in \mathbb{B}^n:\quad [\![e']\!](x, y) = (x + y) \bmod 2^n$$

At method level, that means the transformed method $M'$ must be observationally equivalent to the original $M$:

$$\forall v:\quad [\![M']\!](v) = [\![M]\!](v)$$
In practice, that meant using identities that cancel cleanly, only rewriting sites with the right stack shape, and normalising branches afterwards.
III - Threat Model and Goals
So, who are we defending against? I model the attacker as an analyst who has:
- static IL access,
- symbolic simplification tooling,
- pattern-based deobfuscation capability.
In practice, that analyst is trying to recover the `x + y` intent, collapse all the algebraic noise, and normalise the IL back to simpler equivalent forms.
For our defence, we want to:
- Preserve runtime semantics under unchecked 32-bit arithmetic.
- Increase expression diversity and rewrite effort.
- Keep growth and runtime overhead controllable.
- Quantify simplification success and residual complexity after deobfuscation.
IV - System Overview
The workflow is split into four cooperating components:
- `mba-obfuscation`: the IL transformer that replaces eligible `add` sites with layered MBA templates.
- `mba-deobfuscation`: canonicalisation + exact rewrite engine + residual analysis.
- `mba-eval`: experiment runner producing CSV/JSON metrics and bundled artefacts.
- `mba-sample`: a deterministic sample program used for baseline sweeps.
Dataflow is pretty simple: Build the sample -> Obfuscate it -> Deobfuscate it -> Execute and compare -> Export telemetry.
V - Let's build
This is the sequence I followed while building it. If you want to reimplement it, I suggest following the same order:
- Build safe IL mutation infrastructure.
- Identify valid `int32 add` candidate sites.
- Create a template catalogue (linear/nonlinear).
- Add layer-by-layer expansion.
- Make template selection deterministic.
- Add growth budgeting.
- Emit telemetry.
- Build tiered deobfuscation.
- Build the eval harness.
Let's dive into the details.
VI - Safe IL Rewrite Surface
The first thing I focused on was mutation safety. The obfuscator runs on dnlib and edits instruction lists in place. Any sloppy rewrite breaks everything quickly.
I used some simple guardrails: skip methods with exception handlers, rewrite in descending index order to avoid index drift, and normalise branches (SimplifyBranches) after successful rewrites.
Most of the early failures had nothing to do with the algebra. They came from IL hygiene issues like broken branch targets or bad stack shape, so rewrite safety became a hard precondition.
```csharp
if (body.ExceptionHandlers.Count > 0)
{
    telemetry.SkipReason = MethodSkipReason.ExceptionHandlersPresent;
    return telemetry;
}

var byIndex = adds.Select(a => il.IndexOf(a)).OrderByDescending(i => i).ToList();
foreach (var idx in byIndex)
{
    // remove original add and insert replacement layers
    il.RemoveAt(idx);
    var insertAt = idx;
    InsertBinaryMbaLayer(...);
    for (var layer = 1; layer < layers; layer++)
        InsertUnaryMbaLayer(...);
}

if (telemetry.TransformedSites > 0)
{
    body.SimplifyBranches();
    body.OptimizeBranches();
}
```

VII - Finding 'int32 add' Sites
Blindly rewriting every add is a good way to break IL. Before I touch a site, I want to know that both stack inputs are really int32.
So I first split the method into basic blocks, then run a lightweight stack simulation. The Sk domain (I32, I64, R4...) is enough to decide if an add is safe.
```csharp
// mark only add ops whose two stack inputs are int32
if (insn.OpCode == OpCodes.Add && st.Count >= 2 && st[^2] == Sk.I32 && st[^1] == Sk.I32)
    hits?.Add(insn);
```

Once I added stack merges and block simulation, the invalid replacements in branch-heavy methods mostly disappeared.
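To make the join handling concrete, here is a minimal sketch of merging abstract stack states where two basic blocks meet. The string kinds stand in for the post's `Sk` domain, and the merge rule (mismatch poisons the slot) is my assumption, not the project's actual implementation:

```csharp
using System;

// Merge two abstract stack states element-wise at a basic-block join.
// Any disagreement degrades the slot to "Unknown", which disqualifies
// an add consuming that slot from being rewritten.
string[] Merge(string[] a, string[] b)
{
    if (a.Length != b.Length)
        throw new InvalidOperationException("stack depth mismatch at join");
    var merged = new string[a.Length];
    for (var i = 0; i < a.Length; i++)
        merged[i] = a[i] == b[i] ? a[i] : "Unknown"; // mismatch poisons the slot
    return merged;
}
```

Under this rule, an `add` site only qualifies when both of its input slots are exactly `I32` along every predecessor path.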
VIII - Template Catalog
Now for the fun part: the template catalogue itself.
Linear templates:
Things like (x ^ y) + 2*(x & y). I keep these intentionally recoverable so deobfuscation can measure exact match rates.
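The carry identity behind that template is worth spelling out: `x ^ y` is addition without carries, and `(x & y) << 1` reinjects exactly the carries, so the sum survives 32-bit wraparound. A quick self-check under unchecked semantics:

```csharp
using System;

// Verify x + y == (x ^ y) + 2*(x & y) under unchecked 32-bit arithmetic,
// including the overflow cases where checked arithmetic would throw.
static int Template(int x, int y) => unchecked((x ^ y) + 2 * (x & y));
```

The identity is exact over all of $\mathbb{B}^{32}$, which is precisely what makes this class of template recoverable by an exact-match rewriter.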
Nonlinear templates: For these, I only introduced mild multiplicative expansions with explicit cancellation terms. These are marked best-effort and are the material I expect to survive exact-match rewriting.
```csharp
// (x + y) + ((x + c)*(y + c) - (x*y + c*x + c*y + c*c))
private static void EmitBinaryAffineProductCancel(...)
{
    var c = PickConst(method, originalAddIndex, layer, 0x63);
    il.Insert(insertAt++, OpCodes.Ldloc.ToInstruction(lx));
    il.Insert(insertAt++, OpCodes.Ldloc.ToInstruction(ly));
    il.Insert(insertAt++, OpCodes.Add.ToInstruction());
    // ... expansion terms ...
    il.Insert(insertAt++, OpCodes.Sub.ToInstruction());
    il.Insert(insertAt++, OpCodes.Add.ToInstruction());
}
```

I kept this emitter intentionally explicit so you can audit every cancellation term directly in IL.
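Why this template is exact even with multiplication involved: $(x + c)(y + c)$ expands to $xy + cx + cy + c^2$, so the subtracted tail cancels to zero in the ring $\mathbb{Z}/2^{32}$, and the whole expression reduces to $x + y$ even when the intermediate products overflow. A compact check of the same expression in C#:

```csharp
using System;

// The full nonlinear template as one expression: the product and its
// expansion cancel exactly under unchecked wraparound, leaving x + y.
static int AffineProductCancel(int x, int y, int c) =>
    unchecked((x + y) + ((x + c) * (y + c) - (x * y + c * x + c * y + c * c)));
```

Because the cancellation is a ring identity, it holds for every constant `c`, which is what lets the emitter pick `c` deterministically per site without a correctness risk.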
IX - Layered Generation
Each selected add site expands as:
- Layer 0 (binary): consumes original two operands.
- Layers 1..L-1 (unary): repeatedly wraps the evolving expression with unary MBA identities.
The sequence applies a single binary base transformation followed by iterative unary wrapping to increase depth.
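Two illustrative unary identities of the kind a wrapping layer can use (these specific ones are my examples, not necessarily the project's actual catalogue): `v == (v ^ c) ^ c`, since xor by any constant twice cancels, and `v == unchecked(-(~v) - 1)`, since `~v == -v - 1` in two's complement:

```csharp
using System;

// Both wrappers are identity functions on int32 under unchecked
// semantics, so stacking them only adds syntactic depth.
static int WrapXor(int v, int c) => (v ^ c) ^ c;
static int WrapNegNot(int v) => unchecked(-(~v) - 1);
```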
```csharp
il.RemoveAt(idx);
var insertAt = idx;
InsertBinaryMbaLayer(method, il, ref insertAt, lx, ly, idx, 0, enableNonlinear, nonlinearRatio, nonlinearStartLayer, telemetry);

for (var layer = 1; layer < layers; layer++)
    InsertUnaryMbaLayer(method, il, ref insertAt, lx, ly, idx, layer, enableNonlinear, nonlinearRatio, nonlinearStartLayer, telemetry);
```

X - Deterministic Randomness
I wanted reproducible runs before tuning anything, so template choice and constants come from a deterministic hash of the method token, add index, layer index, salt, and user seed.
That gives me the same output every time I rerun the same configuration. It also lets me control nonlinear behaviour proportionally without losing determinism at each (method, site, layer).
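A sketch of what deterministic selection can look like. The FNV-1a-style mix and the field order are my assumptions; the property that matters is that the same (token, site, layer, salt, seed) tuple always yields the same pick:

```csharp
using System;

// Fold one field into the running hash (FNV-1a step).
static uint Mix(uint h, uint v) => unchecked((h ^ v) * 16777619u); // FNV prime

// Deterministically pick a template index from the site's identity.
static int PickTemplate(uint methodToken, int addIndex, int layer,
                        uint salt, uint seed, int templateCount)
{
    var h = 2166136261u; // FNV offset basis
    h = Mix(h, methodToken);
    h = Mix(h, (uint)addIndex);
    h = Mix(h, (uint)layer);
    h = Mix(h, salt);
    h = Mix(h, seed);
    return (int)(h % (uint)templateCount);
}
```

Changing the seed reshuffles every choice at once, while keeping any single configuration byte-for-byte reproducible.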
XI - Growth Budget
Without a budget, expression layering blows methods up fast. So each method gets a max IL growth budget, and rewrites that would go over it are skipped.
Enforcing a growth budget prevents a small number of heavy sites from dominating the total binary size growth.
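The budgeting logic can be sketched as a simple charge-before-emit check; the accounting unit (extra instruction count) and the cap value here are illustrative, not the project's exact policy:

```csharp
using System;

// Per-method growth budget: a rewrite is only emitted if charging its
// estimated instruction growth keeps the method under the cap.
var used = 0;
var max = 100; // illustrative cap

bool TryCharge(int extraInstructions)
{
    if (used + extraInstructions > max) return false; // skip this site
    used += extraInstructions;                        // commit the growth
    return true;
}
```

Sites that fail the charge are skipped and counted in telemetry rather than silently dropped, which is what makes the `skipped-growth-budget` number in the results meaningful.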
XII - Telemetry
I treated telemetry as part of the build from the start. The obfuscator emits module-level totals, per-method telemetry, and a template usage histogram. Without that, any claim about an obfuscator's quality is just an opinion.
XIII - Deobfuscation Tiers
To keep the evaluation grounded, I built a paired deobfuscator that mirrors a simple attacker workflow in tiers:
- Tier 0: Canonicalisation (normalise commutative operand ordering, constant-fold)
- Tier 1: Exact matching (match known linear templates and rewrite them)
- Tier 2: Residual analysis (measure residual fragments and complexity)
The point of splitting it this way is to separate exact recovery from partial cleanup, and then look at what kind of structure is still left afterwards.
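To illustrate Tier 0, here is a minimal canonicalisation sketch on a flattened n-ary add — a toy stand-in for the real IL handling, not the project's implementation: fold every constant operand into one, and sort the variable operands so `x + y` and `y + x` canonicalise identically:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Canonicalise the operand list of an n-ary add: constants are folded
// under unchecked semantics, variables get a stable ordinal ordering.
List<string> CanonAdd(IEnumerable<string> operands)
{
    var folded = 0;
    var vars = new List<string>();
    foreach (var op in operands)
    {
        if (int.TryParse(op, out var k)) folded = unchecked(folded + k);
        else vars.Add(op);
    }
    vars.Sort(StringComparer.Ordinal); // stable commutative ordering
    vars.Add(folded.ToString());       // single folded constant last
    return vars;
}
```

Exact matching in Tier 1 only has to recognise one spelling of each template once operands arrive in this normal form, which is exactly why canonicalisation runs first.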
XIV - Evaluation
To evaluate the whole thing properly, I automated the full loop in mba-eval. Each run is indexed by the configuration tuple

$$(\texttt{layers},\ \texttt{seed},\ \texttt{nonlinear},\ \texttt{nonlinear\_ratio},\ \texttt{growth\_budget})$$

which is serialised into a concrete, stable identifier.
That stable identifier makes it easy to group results by configuration and rerun the same experiment later.
One annoying bug showed up while I was building the evaluator. My first parser used naive substring matching on the obfuscator's telemetry line, so transformed accidentally matched inside methods-transformed and briefly skewed some downstream ratios. I fixed that by parsing the telemetry as exact key=value fields instead.
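The fix amounts to tokenising and splitting on the first `=`, so `transformed` can never match inside `methods-transformed` the way a substring search did. A sketch of that parser (the field names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parse a telemetry line of space-separated key=value fields into an
// exact-keyed dictionary; malformed tokens are simply dropped.
Dictionary<string, string> ParseTelemetry(string line) =>
    line.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(tok => tok.Split('=', 2))
        .Where(kv => kv.Length == 2)
        .ToDictionary(kv => kv[0], kv => kv[1]);
```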
XV - Results
Here are selected slices from the current sweep artefacts (eval-artifacts/metrics-20260403-165636.csv), aggregated across seeds at growth_budget = 10000.
The first thing I checked was correctness and raw transformation coverage. On these selected runs, every obfuscated and deobfuscated assembly matched the baseline output, and the obfuscator's own module telemetry reported attempted=15, transformed=15, and skipped-growth-budget=0. So on this sample, the interesting movement is not in whether rewrites happen, but in how much structural cost and diversity they add.
Cost Curves
With correctness confirmed, the direct cost mostly moves with the number of layers $L$ and how much nonlinear material gets injected.

I tracked those costs as ratios against the baseline build: mean runtime overhead $t_{\text{obf}} / t_{\text{base}}$ and mean size growth $s_{\text{obf}} / s_{\text{base}}$:
| Layers | NL Ratio | Budget | Mean Runtime Overhead | Mean Size Growth |
|---|---|---|---|---|
| 1 | 0.5 | 10000 | 0.88x | 1.03x |
| 3 | 0.5 | 10000 | 0.87x | 1.18x |
On this small sample, runtime ratios are noisy and mostly stay below 1.0x, so size growth is the cleaner signal. The stable effect here is IL/code-size expansion, not a decisive runtime slowdown.
Template Entropy

Structural complexity in the generated IL depends a lot on template diversity. For a given run with a template histogram where template $t$ has count $c_t$, its empirical probability is $p_t = c_t / \sum_u c_u$. The overall entropy is:

$$H = -\sum_t p_t \log_2 p_t$$
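The entropy computation over the histogram counts is short enough to state directly ($p_t = c_t / \sum_u c_u$, $H = -\sum_t p_t \log_2 p_t$):

```csharp
using System;
using System.Linq;

// Shannon entropy (in bits) of a template usage histogram.
double Entropy(int[] counts)
{
    double total = counts.Sum();
    return -counts.Where(c => c > 0)
                  .Sum(c => c / total * Math.Log2(c / total));
}
```

A uniform spread over four templates gives exactly 2 bits; a run that only ever emits one template gives 0, so the table below is directly comparable across configurations.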
| Layers | NL Ratio | Mean Entropy | Mean Diversity |
|---|---|---|---|
| 1 | 0.5 | 2.02 | 4.33 |
| 3 | 0.5 | 2.99 | 9.00 |
That trend is the clearest structural result in Part 1: adding layers increases the number of template instances and broadens the observed template mix, even when the runtime signal stays noisy.
XVI - End word
Well, I think that we are done!
You should treat obfuscation as systems engineering, not just algebraic mutation. CFG safety, stack typing, and mutation hygiene matter just as much as clever identities. Build evaluation in from day one, make the transformations measurable, and keep complexity under policy control.
Part 2 stays on the attacker side of the workflow: canonicalisation strategy details, exact-match limits, and what the residual scan still sees after automated cleanup.