Note: The code repository for this project will be released soon.
I - Intro
In Part 1, I built a layered MBA (Mixed Boolean-Arithmetic) obfuscator with deterministic template selection, optional nonlinear templates, and a small eval harness around it. The whole point was to add algebraic noise to 32-bit arithmetic without losing correctness.
The maths model is still the same: 32-bit bit vectors, evaluated under unchecked two's-complement semantics. For any transformed site, the original expression `e` and its obfuscated replacement `e'` must evaluate identically on every input: `e(x) = e'(x) (mod 2^32)` for all `x`.
Because every site replacement preserves its expression's denotation, functional equivalence of the whole method follows by composition.
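As a concrete sanity check, here is a minimal sketch of that site-level guarantee, using the classic linear rewrite `x + y → (x ^ y) + 2 * (x & y)` as a stand-in template (illustrative, not necessarily one the obfuscator emits):

```csharp
using System;

// Illustrative site-level check: a classic linear MBA identity that
// preserves 32-bit two's-complement semantics on every input:
//   x + y == (x ^ y) + 2 * (x & y)   (mod 2^32)
var rng = new Random(42);
for (var i = 0; i < 1_000; i++)
{
    var x = (uint)rng.Next() ^ ((uint)rng.Next() << 16);
    var y = (uint)rng.Next() ^ ((uint)rng.Next() << 16);
    if (Original(x, y) != Obfuscated(x, y))
        throw new Exception($"mismatch at x={x}, y={y}");
}
Console.WriteLine("site-level equivalence holds on the sample");

static uint Original(uint x, uint y) => unchecked(x + y);
static uint Obfuscated(uint x, uint y) => unchecked((x ^ y) + 2u * (x & y));
```

Random testing like this is not a proof, of course; it just catches broken templates cheaply before a solver-grade equivalence check.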
But making the IL uglier is only half the job. The more interesting question is what still survives once you point a simplifier at it.
So in this post, I switch to the attacker's side. I run the obfuscated binaries through a tiered .NET IL deobfuscator and look at which simplifications fire, which ones fail, and how much structural "slag" is left when the automated cleanup stops.
II - The Rewrite Pipeline and Effort
A realistic attacker does not rely on a single magic pass to clean up code. Instead, they use a tiered pipeline. I structured mba-deobfuscation to mimic this strategy.
The pipeline applies two main tiers in a loop until no more changes can be made:
- Canonicalisation (`canonicalisation_rewrites`): commutes operands into a standard order and performs constant folding. This strips away syntactic noise and prepares the IL for pattern matching.
- Exact matching (`exact_simplifications`): searches for known linear MBA identities and replaces each one with its simplified equivalent (e.g. a single `add`).
Here is what the core loop looks like inside MbaRewriter.cs:
```csharp
var keepGoing = true;
while (keepGoing)
{
    keepGoing = false;
    var passCanonicalisationRewrites = IlPatternHelpers.Canonicalise(body, il);
    if (passCanonicalisationRewrites > 0)
    {
        canonicalisationRewrites += passCanonicalisationRewrites;
        keepGoing = true;
    }

    for (var i = 0; i < il.Count; i++)
    {
        if (MbaPatternMatcher.TryMatchUnary(il, body, i, out var unaryLen, out var v))
        {
            IlPatternHelpers.ReplaceRange(il, i, unaryLen);
            exactSimplifications++;
            keepGoing = true;
            break;
        }

        // ... TryMatchBinary ...
    }
}
```

For rewrite effort, I just count how many canonicalisation rewrites and exact simplifications fired in a given run.
As `layers` goes up, the rewrite effort grows too, because canonicalisation has to peel back the generated expressions step by step.
III - Recovery Effectiveness and Failures
I track two simple ratios here to get a rough feel for how the deobfuscator is doing against the obfuscated output: `simplification_rate = exact_simplifications / transformed_sites` and `residual_rate = residual_fragments / transformed_sites`.
Both names need a little context in the current implementation.
`exact_simplifications` counts rewrite events, not recovered sites. One transformed site can trigger multiple exact rewrites, so `simplification_rate` is really an "exact simplification events per transformed site" signal, and it can exceed 1.0.
`residual_fragments` also needs context. The residual pass scans the final IL after rewriting and groups surviving arithmetic/bitwise sequences into coarse fragments. So `residual_rate` is not a clean probability, and it is not guaranteed to drop to zero even when all emitted linear templates are removed.
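A minimal sketch of how the two ratios fall out of the per-run counters (the `RunMetrics` record and its field names here are hypothetical stand-ins for the sweep CSV columns):

```csharp
using System;
using System.Globalization;

// Hypothetical per-run record; the values below mirror the layer-3 linear
// slice from the results tables (15 sites, 45 events, 21 fragments).
var run = new RunMetrics(TransformedSites: 15, ExactSimplifications: 45, ResidualFragments: 21);
var (simp, resid) = Ratios(run);
Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "{0:0.00} {1:0.00}", simp, resid));

// simplification_rate counts exact rewrite *events* per transformed site
// (so it can exceed 1.0); residual_rate counts coarse surviving fragments
// per transformed site (so it is not a probability).
static (double SimplificationRate, double ResidualRate) Ratios(RunMetrics m) =>
    ((double)m.ExactSimplifications / m.TransformedSites,
     (double)m.ResidualFragments / m.TransformedSites);

record RunMetrics(int TransformedSites, int ExactSimplifications, int ResidualFragments);
```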
With linear-only templates (`nonlinear=false`, or `nonlinear_ratio=0.0`), the deobfuscator can exactly match the templates it knows about, and the exact simplification count scales strongly with layers.
Once nonlinear templates enter the mix, the exact matchers still strip the recognisable linear wrappers, but the multiplicative expansions survive exact matching. In the current metrics, that shows up less as a collapse in exact rewrites and more as an increase in residual density and residual complexity.
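To make that concrete, here is a sketch using one well-known nonlinear MBA identity from the literature (illustrative only; I am not claiming it is one of the obfuscator's actual templates):

```csharp
using System;

// A well-known nonlinear MBA identity:
//   x * y == (x | y) * (x & y) + (x & ~y) * (~x & y)   (mod 2^32)
// The right-hand side is a sum of products of bitwise terms, so there is
// no linear combination for a linear-identity matcher to recognise exactly.
var rng = new Random(7);
for (var i = 0; i < 1_000; i++)
{
    var x = (uint)rng.Next() ^ ((uint)rng.Next() << 16);
    var y = (uint)rng.Next() ^ ((uint)rng.Next() << 16);
    if (Plain(x, y) != Expanded(x, y))
        throw new Exception($"identity violated at x={x}, y={y}");
}
Console.WriteLine("nonlinear expansion is semantics-preserving");

static uint Plain(uint x, uint y) => unchecked(x * y);
static uint Expanded(uint x, uint y) =>
    unchecked((x | y) * (x & y) + (x & ~y) * (~x & y));
```

Semantically the two sides are identical, but structurally the expanded form keeps its multiplications, which is exactly what the residual metrics then pick up.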
IV - Residual Structure Analysis
After simplification, what is actually left behind? I wanted a rough way to score how difficult the remaining code looks, which is what I call residual_complexity.
Instead of just counting instructions, I assign weights based on how annoying the operations usually are to analyse. Multiplications score highest, then bitwise logic and shifts, while plain arithmetic scores lowest.
```csharp
private static int GetResidualWeight(Code code) =>
    code switch
    {
        Code.Mul => 4,
        Code.And or Code.Or or Code.Xor => 3,
        Code.Shl or Code.Shr or Code.Shr_Un => 2,
        Code.Add or Code.Sub or Code.Not or Code.Neg => 1,
        _ => 0
    };
```

After the rewrite loop finishes, the deobfuscator walks the remaining IL, groups arithmetic/bitwise sequences into `residual_fragments`, and calculates the total `residual_complexity`.
This is intentionally coarse. It does not prove that every counted fragment came from a failed MBA simplification, but it is still useful when comparing runs from the same pipeline.
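As a worked example of the scoring, here is a sketch that applies the same weights to one hypothetical surviving fragment (opcode name strings stand in for the dnlib `Code` values, so the snippet runs on its own):

```csharp
using System;
using System.Linq;

// Illustrative fragment scoring with a string-keyed stand-in for the
// Code enum above. A surviving sequence like
//   mul, xor, shl, add
// scores 4 + 3 + 2 + 1 = 10.
var fragment = new[] { "mul", "xor", "shl", "add" };
Console.WriteLine(fragment.Sum(Weight));

static int Weight(string op) => op switch
{
    "mul" => 4,
    "and" or "or" or "xor" => 3,
    "shl" or "shr" => 2,
    "add" or "sub" or "not" or "neg" => 1,
    _ => 0
};
```

One multiplication outweighs an `add`/`xor` pair on its own, which is why the nonlinear slices pull ahead so quickly in the tables below.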
V - Results
Now for the numbers. To report these metrics across a set of sweep runs, I aggregate each per-run metric by taking its mean over the runs in the slice.
The tables below use selected slices from eval-artifacts/metrics-20260403-165636.csv, aggregated across seeds at growth_budget = 10000. One small note: during development I found and fixed a parser bug in mba-eval where transformed was being read from inside methods-transformed, so the ratios below assume corrected TransformedSites values.
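A minimal sketch of that aggregation, with made-up per-seed rows standing in for the real sweep data (field names are assumptions, not the actual CSV schema):

```csharp
using System;
using System.Globalization;
using System.Linq;

// Group per-run rows by the (layers, nl_ratio) slice key and take the
// mean of a metric across seeds. The rows are illustrative values only.
var runs = new[]
{
    (Layers: 1, NlRatio: 0.5, Seed: 101, ResidualComplexity: 46.0),
    (Layers: 1, NlRatio: 0.5, Seed: 102, ResidualComplexity: 50.0),
    (Layers: 3, NlRatio: 0.5, Seed: 101, ResidualComplexity: 77.0),
};
foreach (var slice in runs.GroupBy(r => (r.Layers, r.NlRatio)))
{
    var mean = slice.Average(r => r.ResidualComplexity);
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
        "layers={0} nl={1}: {2:0.0}", slice.Key.Layers, slice.Key.NlRatio, mean));
}
```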
Table B: Rewrite Effort and Recovery
| Layers | NL Ratio | Exact Simplifications | Canonicalisation Rewrites | Simplification Rate | Residual Rate |
|---|---|---|---|---|---|
| 1 | 0.0 | 15.0 | 0.0 | 1.00 | 1.40 |
| 3 | 0.0 | 45.0 | 0.0 | 3.00 | 1.40 |
| 1 | 0.5 | 14.0 | 3.0 | 0.93 | 1.82 |
| 3 | 0.5 | 43.0 | 2.0 | 2.87 | 2.20 |
The linear rows are the cleanest. As layering rises from 1 to 3, exact simplifications scale from 15 to 45, which is what you would expect when every emitted linear wrapper is recoverable by the current matcher set. Because the simplification rate counts events per transformed site, the layer-3 linear slice rises above 1.0: each transformed site triggers multiple exact simplification events on the way back down.
The nonlinear rows are where it gets more interesting. At layer 1, the exact simplification rate drops below 1.0, which tells us that some transformed sites are no longer fully recoverable through exact matching alone. By layer 3, the pipeline still strips a lot of the linear wrapping, but the residual metrics climb more sharply: more of the final IL survives into the post-pass scan, and the weighted complexity goes up with it.
Table C: Residual Hardness
| Layers | NL Ratio | Avg Fragments | Avg Complexity |
|---|---|---|---|
| 1 | 0.0 | 21.0 | 25.0 |
| 1 | 0.5 | 27.3 | 48.0 |
| 3 | 0.5 | 33.0 | 77.0 |
There is one important subtlety here. The linear baseline is not zero because the residual scan is method-wide; it still sees ordinary arithmetic and bitwise structure in the final IL. What matters is the delta. Moving from the linear layer-1 slice to the mixed nonlinear layer-1 slice raises mean residual complexity from 25.0 to 48.0. Pushing the same nonlinear mix to layer 3 raises that to 77.0, about 1.6x above the layer-1 nonlinear slice.
VI - Conclusion
This is why I wanted measurement in the loop. Generating huge IL blocks is easy; making them harder to simplify without breaking correctness or blowing up size is the harder part.
Building the obfuscator and deobfuscator side by side made a few things pretty clear:
- Linear MBA is brittle against canonicalisation + pattern matching, merely imposing a computational tax.
- Nonlinear expansions break exact structural matchers and leave measurably heavier residual structure in the current pipeline.
- Complexity can be explicitly budgeted and mathematically measured, taking the guesswork out of protection efficacy.
In the end, obfuscation is still about shifting the economics of reverse engineering. This prototype does not create an unbreakable binary, but it does leave more surviving structure for an automated cleanup pass to deal with while keeping measured size growth bounded on the current sample.