Note: The code repository for this project will be released soon.
I - Intro
I recently built a practical reversing experiment: obfuscating arithmetic in .NET IL, then immediately trying to recover it with a paired deobfuscator to see what actually survives.
I kept the project pretty structured: deterministic template selection, optional nonlinear templates, telemetry output, and a deobfuscator broken into simple tiers.
The goal isn't to create "unsimplifiable code" (which is mostly a myth anyway). The idea is to make automated simplification work harder while keeping correctness and overhead under control.
Note: my IL implementation isn't perfect and I may be wrong in places, but the results speak for themselves. This project was also done over 3 days on a collective 9 hours of sleep, so take most of it with a grain of salt.
II - Model and Notation
Before getting into the pipeline, here is the small model I use throughout the post.
- $n$ is the machine word size.
- $\mathbb{B}^n$ is the set of $n$-bit vectors.
- $[\![\cdot]\!]$ is expression evaluation under unchecked two's-complement semantics.
- $+$, $-$, and $\times$ are arithmetic modulo $2^n$.
- $\wedge$, $\vee$, $\oplus$, $\lnot$, $\ll$, and $\gg$ are the usual bitwise and shift operators over $\mathbb{B}^n$.
The rule for each replacement $e'$ of an original expression $x + y$ is simple:

$$\forall x, y \in \mathbb{B}^n:\quad [\![e']\!](x, y) = (x + y) \bmod 2^n$$

At method level, that means the transformed method $M'$ must be observationally equivalent to the original $M$:

$$\forall v:\quad [\![M']\!](v) = [\![M]\!](v)$$
In practice, that meant using identities that cancel cleanly, only rewriting sites with the right stack shape, and normalising branches afterwards.
III - Threat Model and Goals
So, who are we defending against? I model the attacker as an analyst who has:
- static IL access,
- symbolic simplification tooling,
- pattern-based deobfuscation capability.
In practice, that analyst is trying to recover the `x + y` intent, collapse all the algebraic noise, and normalise the IL back to simpler equivalent forms.
For our defence, we want to:
- Preserve runtime semantics under unchecked 32-bit arithmetic.
- Increase expression diversity and rewrite effort.
- Keep growth and runtime overhead controllable.
- Quantify simplification success and residual complexity after deobfuscation.
IV - System Overview
The workflow is split into four cooperating components:
- `mba-obfuscation`: the IL transformer that replaces eligible `add` sites with layered MBA templates.
- `mba-deobfuscation`: canonicalisation + exact rewrite engine + residual analysis.
- `mba-eval`: experiment runner producing CSV/JSON metrics and bundled artefacts.
- `mba-sample`: a deterministic sample program used for baseline sweeps.
Dataflow is pretty simple: Build the sample -> Obfuscate it -> Deobfuscate it -> Execute and compare -> Export telemetry.
V - Let's build
This is the sequence I followed while building it. If you want to reimplement it, I suggest following the same order:
- Build safe IL mutation infrastructure.
- Identify valid `int32 add` candidate sites.
- Create a template catalogue (linear/nonlinear).
- Add layer-by-layer expansion.
- Make template selection deterministic.
- Add growth budgeting.
- Emit telemetry.
- Build tiered deobfuscation.
- Build the eval harness.
Let's dive into the details.
VI - Safe IL Rewrite Surface
The first thing I focused on was mutation safety. The obfuscator runs on dnlib and edits instruction lists in place. Any sloppy rewrite breaks everything quickly.
I used some simple guardrails: skip methods with exception handlers, rewrite in descending index order to avoid index drift, and normalise branches (SimplifyBranches) after successful rewrites.
Most of the early failures had nothing to do with the algebra. They came from IL hygiene issues like broken branch targets or bad stack shape, so rewrite safety became a hard precondition.
```csharp
if (body.ExceptionHandlers.Count > 0)
{
    telemetry.SkipReason = MethodSkipReason.ExceptionHandlersPresent;
    return telemetry;
}

var byIndex = adds.Select(a => il.IndexOf(a)).OrderByDescending(i => i).ToList();
foreach (var idx in byIndex)
{
    // remove original add and insert replacement layers
    il.RemoveAt(idx);
    var insertAt = idx;
    InsertBinaryMbaLayer(...);
    for (var layer = 1; layer < layers; layer++)
        InsertUnaryMbaLayer(...);
}

if (telemetry.TransformedSites > 0)
{
    body.SimplifyBranches();
    body.OptimizeBranches();
}
```

VII - Finding 'int32 add' Sites
Blindly rewriting every add is a good way to break IL. Before I touch a site, I want to know that both stack inputs are really int32.
So I first split the method into basic blocks, then run a lightweight stack simulation. The Sk domain (I32, I64, R4...) is enough to decide if an add is safe.
```csharp
// mark only add ops whose two stack inputs are int32
if (insn.OpCode == OpCodes.Add && st.Count >= 2 && st[^2] == Sk.I32 && st[^1] == Sk.I32)
    hits?.Add(insn);
```

Once I added stack merges and block simulation, the invalid replacements in branch-heavy methods mostly disappeared.
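To make the join handling concrete, here is a minimal sketch of merging abstract stack states where two basic blocks meet. The string kinds stand in for the post's `Sk` domain, and the merge rule (mismatch poisons the slot) is my assumption, not the project's actual implementation:

```csharp
using System;

// Merge two abstract stack states element-wise at a basic-block join.
// Any disagreement degrades the slot to "Unknown", which disqualifies
// an add consuming that slot from being rewritten.
string[] Merge(string[] a, string[] b)
{
    if (a.Length != b.Length)
        throw new InvalidOperationException("stack depth mismatch at join");
    var merged = new string[a.Length];
    for (var i = 0; i < a.Length; i++)
        merged[i] = a[i] == b[i] ? a[i] : "Unknown"; // mismatch poisons the slot
    return merged;
}
```

Under this rule, an `add` site only qualifies when both of its input slots are exactly `I32` along every predecessor path.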
VIII - Template Catalog
Now for the fun part: the template catalogue itself.
Linear templates:
Things like (x ^ y) + 2*(x & y). I keep these intentionally recoverable so deobfuscation can measure exact match rates.
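The carry identity behind that template is worth spelling out: `x ^ y` is addition without carries, and `(x & y) << 1` reinjects exactly the carries, so the sum survives 32-bit wraparound. A quick self-check under unchecked semantics:

```csharp
using System;

// Verify x + y == (x ^ y) + 2*(x & y) under unchecked 32-bit arithmetic,
// including the overflow cases where checked arithmetic would throw.
static int Template(int x, int y) => unchecked((x ^ y) + 2 * (x & y));
```

The identity is exact over all of $\mathbb{B}^{32}$, which is precisely what makes this class of template recoverable by an exact-match rewriter.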
Nonlinear templates: For these, I only introduced mild multiplicative expansions with explicit cancellation terms. These are marked best-effort and are the material I expect to survive exact-match rewriting.
```csharp
// (x + y) + ((x + c)*(y + c) - (x*y + c*x + c*y + c*c))
private static void EmitBinaryAffineProductCancel(...)
{
    var c = PickConst(method, originalAddIndex, layer, 0x63);
    il.Insert(insertAt++, OpCodes.Ldloc.ToInstruction(lx));
    il.Insert(insertAt++, OpCodes.Ldloc.ToInstruction(ly));
    il.Insert(insertAt++, OpCodes.Add.ToInstruction());
    // ... expansion terms ...
    il.Insert(insertAt++, OpCodes.Sub.ToInstruction());
    il.Insert(insertAt++, OpCodes.Add.ToInstruction());
}
```

I kept this emitter intentionally explicit so you can audit every cancellation term directly in IL.
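Why this template is exact even with multiplication involved: $(x + c)(y + c)$ expands to $xy + cx + cy + c^2$, so the subtracted tail cancels to zero in the ring $\mathbb{Z}/2^{32}$, and the whole expression reduces to $x + y$ even when the intermediate products overflow. A compact check of the same expression in C#:

```csharp
using System;

// The full nonlinear template as one expression: the product and its
// expansion cancel exactly under unchecked wraparound, leaving x + y.
static int AffineProductCancel(int x, int y, int c) =>
    unchecked((x + y) + ((x + c) * (y + c) - (x * y + c * x + c * y + c * c)));
```

Because the cancellation is a ring identity, it holds for every constant `c`, which is what lets the emitter pick `c` deterministically per site without a correctness risk.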
IX - Layered Generation
Each selected add site expands as:
- Layer 0 (binary): consumes original two operands.
- Layers 1..L-1 (unary): repeatedly wraps the evolving expression with unary MBA identities.
The sequence applies a single binary base transformation followed by iterative unary wrapping to increase depth.
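Two illustrative unary identities of the kind a wrapping layer can use (these specific ones are my examples, not necessarily the project's actual catalogue): `v == (v ^ c) ^ c`, since xor by any constant twice cancels, and `v == unchecked(-(~v) - 1)`, since `~v == -v - 1` in two's complement:

```csharp
using System;

// Both wrappers are identity functions on int32 under unchecked
// semantics, so stacking them only adds syntactic depth.
static int WrapXor(int v, int c) => (v ^ c) ^ c;
static int WrapNegNot(int v) => unchecked(-(~v) - 1);
```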
```csharp
il.RemoveAt(idx);
var insertAt = idx;
InsertBinaryMbaLayer(method, il, ref insertAt, lx, ly, idx, 0, enableNonlinear, nonlinearRatio, nonlinearStartLayer, telemetry);

for (var layer = 1; layer < layers; layer++)
    InsertUnaryMbaLayer(method, il, ref insertAt, lx, ly, idx, layer, enableNonlinear, nonlinearRatio, nonlinearStartLayer, telemetry);
```

X - Deterministic Randomness
I wanted reproducible runs before tuning anything, so template choice and constants come from a deterministic hash of the method token, add index, layer index, salt, and user seed.
That gives me the same output every time I rerun the same configuration. It also lets me control nonlinear behaviour proportionally without losing determinism at each (method, site, layer).
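A sketch of what deterministic selection can look like. The FNV-1a-style mix and the field order are my assumptions; the property that matters is that the same (token, site, layer, salt, seed) tuple always yields the same pick:

```csharp
using System;

// Fold one field into the running hash (FNV-1a step).
static uint Mix(uint h, uint v) => unchecked((h ^ v) * 16777619u); // FNV prime

// Deterministically pick a template index from the site's identity.
static int PickTemplate(uint methodToken, int addIndex, int layer,
                        uint salt, uint seed, int templateCount)
{
    var h = 2166136261u; // FNV offset basis
    h = Mix(h, methodToken);
    h = Mix(h, (uint)addIndex);
    h = Mix(h, (uint)layer);
    h = Mix(h, salt);
    h = Mix(h, seed);
    return (int)(h % (uint)templateCount);
}
```

Changing the seed reshuffles every choice at once, while keeping any single configuration byte-for-byte reproducible.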
XI - Growth Budget
Without a budget, expression layering blows methods up fast. So each method gets a max IL growth budget, and rewrites that would go over it are skipped.
Enforcing a growth budget prevents a small number of heavy sites from dominating the total binary size growth.
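The budgeting logic can be sketched as a simple charge-before-emit check; the accounting unit (extra instruction count) and the cap value here are illustrative, not the project's exact policy:

```csharp
using System;

// Per-method growth budget: a rewrite is only emitted if charging its
// estimated instruction growth keeps the method under the cap.
var used = 0;
var max = 100; // illustrative cap

bool TryCharge(int extraInstructions)
{
    if (used + extraInstructions > max) return false; // skip this site
    used += extraInstructions;                        // commit the growth
    return true;
}
```

Sites that fail the charge are skipped and counted in telemetry rather than silently dropped, which is what makes the `skipped-growth-budget` number in the results meaningful.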
XII - Telemetry
I treated telemetry as part of the build from the start. The obfuscator emits module-level totals, per-method telemetry, and a template usage histogram. Without that, any claim about an obfuscator's quality is just an opinion.
XIII - Deobfuscation Tiers
To keep the evaluation grounded, I built a paired deobfuscator that mirrors a simple attacker workflow in tiers:
- Tier 0: Canonicalisation (normalise commutative operand ordering, constant-fold)
- Tier 1: Exact matching (match known linear templates and rewrite them)
- Tier 2: Residual analysis (measure residual fragments and complexity)
The point of splitting it this way is to separate exact recovery from partial cleanup, and then look at what kind of structure is still left afterwards.
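To illustrate Tier 0, here is a minimal canonicalisation sketch on a flattened n-ary add — a toy stand-in for the real IL handling, not the project's implementation: fold every constant operand into one, and sort the variable operands so `x + y` and `y + x` canonicalise identically:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Canonicalise the operand list of an n-ary add: constants are folded
// under unchecked semantics, variables get a stable ordinal ordering.
List<string> CanonAdd(IEnumerable<string> operands)
{
    var folded = 0;
    var vars = new List<string>();
    foreach (var op in operands)
    {
        if (int.TryParse(op, out var k)) folded = unchecked(folded + k);
        else vars.Add(op);
    }
    vars.Sort(StringComparer.Ordinal); // stable commutative ordering
    vars.Add(folded.ToString());       // single folded constant last
    return vars;
}
```

Exact matching in Tier 1 only has to recognise one spelling of each template once operands arrive in this normal form, which is exactly why canonicalisation runs first.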
XIV - Evaluation
To evaluate the whole thing properly, I automated the full loop in mba-eval. Each run is indexed by the configuration tuple

$$(\texttt{layers},\ \texttt{seed},\ \texttt{nonlinear},\ \texttt{nonlinear\_ratio},\ \texttt{growth\_budget})$$

which is serialised into a concrete, stable identifier.
That stable identifier makes it easy to group results by configuration and rerun the same experiment later.
One annoying bug showed up while I was building the evaluator. My first parser used naive substring matching on the obfuscator's telemetry line, so transformed accidentally matched inside methods-transformed and briefly skewed some downstream ratios. I fixed that by parsing the telemetry as exact key=value fields instead.
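The fix amounts to tokenising and splitting on the first `=`, so `transformed` can never match inside `methods-transformed` the way a substring search did. A sketch of that parser (the field names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parse a telemetry line of space-separated key=value fields into an
// exact-keyed dictionary; malformed tokens are simply dropped.
Dictionary<string, string> ParseTelemetry(string line) =>
    line.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(tok => tok.Split('=', 2))
        .Where(kv => kv.Length == 2)
        .ToDictionary(kv => kv[0], kv => kv[1]);
```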
XV - Results
Here are selected slices from the current sweep artefacts (eval-artifacts/metrics-20260403-165636.csv), aggregated across seeds at growth_budget = 10000.
The first thing I checked was correctness and raw transformation coverage. On these selected runs, every obfuscated and deobfuscated assembly matched the baseline output, and the obfuscator's own module telemetry reported attempted=15, transformed=15, and skipped-growth-budget=0. So on this sample, the interesting movement is not in whether rewrites happen, but in how much structural cost and diversity they add.
Cost Curves
With correctness confirmed, the direct cost mostly moves with the number of layers $L$ and how much nonlinear material gets injected.

I tracked those costs as ratios against the baseline build: mean runtime overhead $t_{\text{obf}} / t_{\text{base}}$ and mean size growth $s_{\text{obf}} / s_{\text{base}}$:
| Layers | NL Ratio | Budget | Mean Runtime Overhead | Mean Size Growth |
|---|---|---|---|---|
| 1 | 0.5 | 10000 | 0.88x | 1.03x |
| 3 | 0.5 | 10000 | 0.87x | 1.18x |
On this small sample, runtime ratios are noisy and mostly stay below 1.0x, so size growth is the cleaner signal. The stable effect here is IL/code-size expansion, not a decisive runtime slowdown.
Template Entropy

Structural complexity in the generated IL depends a lot on template diversity. For a given run with a template histogram where template $t$ has count $c_t$, its empirical probability is $p_t = c_t / \sum_u c_u$. The overall entropy is:

$$H = -\sum_t p_t \log_2 p_t$$
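The entropy computation over the histogram counts is short enough to state directly ($p_t = c_t / \sum_u c_u$, $H = -\sum_t p_t \log_2 p_t$):

```csharp
using System;
using System.Linq;

// Shannon entropy (in bits) of a template usage histogram.
double Entropy(int[] counts)
{
    double total = counts.Sum();
    return -counts.Where(c => c > 0)
                  .Sum(c => c / total * Math.Log2(c / total));
}
```

A uniform spread over four templates gives exactly 2 bits; a run that only ever emits one template gives 0, so the table below is directly comparable across configurations.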
| Layers | NL Ratio | Mean Entropy | Mean Diversity |
|---|---|---|---|
| 1 | 0.5 | 2.02 | 4.33 |
| 3 | 0.5 | 2.99 | 9.00 |
That trend is the clearest structural result in Part 1: adding layers increases the number of template instances and broadens the observed template mix, even when the runtime signal stays noisy.
XVI - End word
Well, I think that we are done!
You should treat obfuscation as systems engineering, not just algebraic mutation. CFG safety, stack typing, and mutation hygiene matter just as much as clever identities. Build evaluation in from day one, make the transformations measurable, and keep complexity under policy control.
Part 2 stays on the attacker side of the workflow: canonicalisation strategy details, exact-match limits, and what the residual scan still sees after automated cleanup.