Decompiler Tech: Blog

Blog

2018-01-27: Debugging For The Elite!
2016-11-14: How To Output Info On Non-6502 Assembly Files
2016-11-12: Decompiler Now Outputting Useful Info For Reverse Engineers
2016-03-23: Initial Work on ARM Decompiler
2016-03-04: On the Philosophy of Apathy
2016-02-28: New Feature: Removing Erroneous BLocal Initialisation
2016-01-21: Breakthrough: Recompilation
2016-01-13: First recompiled decompiled code and Selective Recompilability
2015-12-28: Initial Work on Java Decompiler
2015-12-22: Auto-Detecting Arguments and Return Values
2015-12-19: Goto's Considered Okay
2015-12-16: Global X-Referencing
2015-12-08: Commodore Free Issue 90
2015-11-22: Notes on static analysis
2015-11-12: Notes on bytecode
2015-10-02: RevEngE Explained
2015-10-01: We're using Lisp!
2015-09-11: VB5/VB6 Native Decompiling Service is Back
2015-09-07: VB5/VB6 Native Decompiler Backported
2015-08-27: 68000 CPU Ported to RevEngE Decompiler
2015-08-02: The 6502 Decompiler, RevEngE6502, is now available for download!

2018-01-27: Debugging For The Elite

It's been a couple of years since this software was revised, but it deserves posting now.

SuperSAM is the C64 emulator Frodo with debug extensions. It has a separate window (the 'log' window) which shows events such as memory reads and writes as they happen.

SuperSAM:
< expression        Set refuse-write from here up
> expression        Set refuse-write from here down
* expression        Set update display from this address
! expression        Add read log point to list (no arg clears list)
w expression        Add write log point to list (no arg clears list)
$ expression        Add execute log point to list (no arg clears list)
U expression        Old/new value pokefinder
u expression        +/- 1 pokefinder
D "file"            Load up interleaved decompile dump
J                   Show last 4 JSR's
R                   Reset JSR count
I [choice] [start] [end]  Show functions with JSR count >0 in memory range (default choice=0 start=0 end=ffff).

For the 'D' command, the file starts with a function name, followed by pairs of lines: the first line of each pair is the address, the second is the code:

void MainMenu() {
19585
Global_90 = 1;
19587
usub_1CFB();
19605
usub_1D9A();
19608
usub_439A();
19611
usub_49E4();
19614
DrawMenuLines();
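
As a rough illustration of the file layout described above, here is a minimal Python sketch that reads one function from such a dump. The function name parse_decompile_dump is made up for this example, and it assumes the addresses are decimal (as they appear in the sample) and that the file holds a single function. It is not SuperSAM's own loader.

```python
def parse_decompile_dump(text):
    """Return (function_header, [(address, code), ...]) for one function."""
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty dump")
    header, body = lines[0], lines[1:]
    if len(body) % 2 != 0:
        raise ValueError("expected address/code pairs after the header")
    pairs = []
    for i in range(0, len(body), 2):
        # Assumption: addresses are written in decimal, as in the sample.
        pairs.append((int(body[i]), body[i + 1]))
    return header, pairs
```

Each pair can then be looked up by address, for example to show the decompiled line next to a log entry.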

The rest of the commands above are self-explanatory for now. Just play around and see what happens.

Get SuperSAM here.

2016-11-24: How To Output Info On Non-6502 Assembly Files

The decompiler can process a 6502 assembly file and output the debug info, but what about other .asm files?

Well, only a few things need doing.

An x86 file looks like this:
- 00000001 jnz 00000005h
- 00000002 call 0046CF38h
- 00000003 jmp 00000004h
- 00000004 xor eax,eax
- 00000005 ret
Converted to 6502, it becomes:
- 0001 BNE 0005
- 0003 JMP 0004
- 0004 NOP
- 0005 RTS

See how easy it is to convert it?

Any jnz is converted to the 6502 equivalent (BNE)
Any jmp is converted to a 6502 JMP
Any line that is the target of a jnz or jmp, and is not itself a branch or a ret, is turned into a NOP
Any ret is turned into RTS
All other assembly lines are ignored

With the above conversion, you can now feed any x86 assembly file through the 6502 module!
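
The conversion rules above can be sketched in Python as follows. This is only an illustration, not the decompiler's own code. The function name convert_x86_to_6502 is invented here, and it assumes each input line is "address mnemonic operand" with hex addresses and a trailing h on branch targets, as in the sample.

```python
# x86 branch mnemonics and their 6502 stand-ins, per the rules above.
BRANCHES = {"jnz": "BNE", "jmp": "JMP"}

def _target(operand):
    # "00000005h" -> 5
    return int(operand.rstrip("hH"), 16)

def convert_x86_to_6502(lines):
    parsed = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        address = int(parts[0], 16)
        mnemonic = parts[1].lower()
        operand = parts[2].strip() if len(parts) > 2 else ""
        parsed.append((address, mnemonic, operand))

    # Every address that some jnz/jmp points at.
    targets = {_target(op) for _, m, op in parsed if m in BRANCHES}

    output = []
    for address, mnemonic, operand in parsed:
        if mnemonic in BRANCHES:
            output.append(f"{address:04X} {BRANCHES[mnemonic]} {_target(operand):04X}")
        elif mnemonic == "ret":
            output.append(f"{address:04X} RTS")
        elif address in targets:
            output.append(f"{address:04X} NOP")
        # Anything else is ignored.
    return output
```

Run on the five x86 lines shown earlier, this yields the same four 6502 lines as the hand conversion.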

2016-11-12: Decompiler Now Outputting Useful Info For Reverse Engineers

I made my decompiler output useful information for reverse-engineers. The information below ends up in a path like: jobs/zol/logs/zol.usub_6967.txt.debug.txt

File: zol.usub_ED6B.txt

Inter-loop Jumps: (ED95)
Jumps: NIL
Main Path: (ED6B ED6E ED71 ED74 ED77 ED79 ED7C ED7E (0 ED80 ED82 ED84 ED86 ED88) ED8A ED8C ED8E ED90 ED92 ED95 ED97 ED98 ED9A ED9C (0 ED9E EDA0) EDA2 EDA5 EDA7 EDA9 EDAB EDAD EDAF)
(1 is Else, 0 is If)
Number of loops: 1
Loop 0: (WHILEBRANCH = -1) (EXIT-DO-DESTINATION = EDA5) (END = EDA2) (REALSTART = ED8C) (START = ED8C) (STARTED = 0) (ENDED = 0) (WHILE1 = NIL) (SCANNED = NIL) (ISFOR = NIL) (DONE-FIRST-EXIT-DO = NIL) (PASS = 1)

"Inter-loop Jumps" tells us any jumps into or out of a loop.
"Jumps" lists any jumps from inside one path to another.
"Main Path" is a list. Any sub-list beginning '1' is an Else block, and '0' is an If block. Otherwise, it's just addresses in the root of the list.
- For example, the If block (0 ED9E EDA0) in the main path decompiles to:
  - if ((Global_66 > 255)) {
  - Global_67 = (Global_67 + 1);
  - }//EndIF; EDA0
- The assembly for that block is:
  - ED9E: bcc EDA2
  - EDA0: inc 43
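
Since the "Main Path" line is just a nested list, it is easy to read back into other tools. The Python sketch below parses it and prints the If/Else nesting. The names parse_main_path and describe are invented for this example, and it assumes the parentheses are balanced.

```python
def parse_main_path(text):
    """Turn "(A (0 B C) D)" into ["A", ["0", "B", "C"], "D"]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])            # start a new (sub-)list
        elif token == ")":
            finished = stack.pop()      # close it and attach to its parent
            stack[-1].append(finished)
        else:
            stack[-1].append(token)
    return stack[0][0]

def describe(path, depth=0):
    """Return indented lines; sub-lists starting 0 are If, 1 are Else."""
    labels = {"0": "If", "1": "Else"}
    lines = []
    for item in path:
        if isinstance(item, list):
            if item and item[0] in labels:
                label, body = labels[item[0]], item[1:]
            else:
                label, body = "Block", item
            lines.append("  " * depth + label + ":")
            lines.extend(describe(body, depth + 1))
        else:
            lines.append("  " * depth + item)
    return lines
```

Calling describe(parse_main_path(...)) on the main path above lists the root addresses with the two If blocks indented beneath their own headings.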

Maybe people will use it now...

2016-03-23: Initial Work on ARM Decompiler

We've got the ARM compiler toolchain and we're writing simple, general code, disassembling it (using objdump) and then decompiling that assembly code. The ARM module is 458 lines of code, and the decompiler engine itself works fine as usual.

We've done:

Strings (taken from arm.bin, which is a compiled binary)
Loops
Branches
Loop variables and Branch variables
Function variables
Basic structs (e.g. FILE *)
No registers in produced code
3-operand handling
Conditional execution

We haven't done:

Arrays (these are tricky)
Structs with no types
Figuring out statically linked library functions (but we're working on it).

Next up will be a sample to show just how well it works.

2016-03-04: On the Philosophy of Apathy

On reverse engineering forums, you inevitably get someone finding a particular problem which a decompiler will fail at.

Well, let's say the decompiler can do everything else, and such problems are in the minority.

My point is this: is it better to say it's never going to be perfect, so don't attempt it, or to just try and succeed 99% of the time?

2016-02-28: New Feature: Removing Erroneous BLocal Initialisation

Look at the following code:

int BLocal1;
BLocal1 = 0;
if (((CreatureUnderCursor_HealthEtcAttributes)[2] < 144)) {
       BLocal1 = 1;
} else {
       if (((CreatureUnderCursor_HealthEtcAttributes)[4] == 255)) {
               BLocal1 = 2;
       } else {
               BLocal1 = 3;
       }//EndIF; 67E2
}//EndIF; 67E2
myregs.pc = 0x4341;//DrawBorderTile
myregs.a = BLocal1;
_sys(&myregs);

Basically, the 'BLocal1 = 0' assignment is redundant.

That's because if the '< 144' condition is true, BLocal1 becomes 1. Else it goes on to the '== 255' condition. If true, it's 2. If the '== 255' condition is false, BLocal1 becomes 3.

Whichever path is taken, BLocal1 is assigned again before it is read, so the 'BLocal1 = 0' is simply not needed and the decompiled code now omits it.
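
The reasoning above can be sketched as a small check: an initial assignment is redundant if every path through the following code assigns the variable again before reading it. The Python below is one hypothetical way to express that, not necessarily how RevEngE does it. Statements are modelled as tuples ("assign", name), ("use", name) and ("if", then_block, else_block). These shapes and the function names are invented for the example, and loops are not handled.

```python
ASSIGNED, USED, NEITHER = "assigned", "used", "neither"

def first_access(block, var):
    """Say what happens to `var` first along the paths through `block`."""
    for stmt in block:
        kind = stmt[0]
        if kind == "assign" and stmt[1] == var:
            return ASSIGNED
        if kind == "use" and stmt[1] == var:
            return USED
        if kind == "if":
            then_result = first_access(stmt[1], var)
            else_result = first_access(stmt[2], var)
            if USED in (then_result, else_result):
                return USED          # some path reads the old value
            if then_result == else_result == ASSIGNED:
                return ASSIGNED      # every path overwrites it
            # Otherwise a branch left it untouched, so the statements
            # after the if decide the outcome.
    return NEITHER

def initial_store_is_redundant(rest_of_function, var):
    # Conservative: only report redundancy when every path reassigns.
    return first_access(rest_of_function, var) == ASSIGNED
```

For the example above, the nested if/else assigns BLocal1 on all three paths before 'myregs.a = BLocal1' reads it, so the check reports the initial store as redundant. Drop one of the else branches and it no longer does.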

2016-01-21: Breakthrough: Recompilation

As you probably know, there are 2 kinds of decompiled code: one that humans need to edit and fix, and one that recompiles with no human input and runs identically to the original code.

You might also know that 6502 code has various gotchas which are supposed to make decompilation impossible.

Well, now we prove the doubters wrong. Thanks to help from the CC65 mailing list, we have a bunch of decompiler-generated code which runs identically to the code it was decompiled from!

We compiled the code below with CC65 to a binary file of around 600 bytes. Then we loaded up Temple of Terror (the text adventure the code below is from) and loaded the binary file in at $0801. Next we edited the DropAll subroutine so that the first thing it did was JMP to $080D, which is inside the loaded binary. Then we ran the game and typed 'DROP ALL'. If anything had been wrong, it would have bombed out. Instead, it worked and dropped all the objects the player was holding. Success!

This, of course, means that the decompiler will soon be able to output code decompiled from other CPUs without any hand editing.

#include <stdio.h>
#include <stdlib.h>
#include <6502.h>
typedef unsigned char *PUC;
#define PlayerLocation (*(PUC)1024)
#define ObjectIn (*(PUC)1026)
#define NumberOfObjectsHeld (*(PUC)1029)
#define NumberOfObjectsInGame (*(PUC)1030)
#define ObjectInRoomTable ((PUC)1086)
void DropAll() {
int LLocal1;
if ((NumberOfObjectsHeld != 0)) {
        LLocal1 = NumberOfObjectsInGame;
        do {
                if ((ObjectInRoomTable[LLocal1] == ObjectIn)) {
                        ObjectInRoomTable[LLocal1] = PlayerLocation;
                }//EndIF; 313B
                LLocal1 = (LLocal1 - 1);
        } while (LLocal1 != 0);//LoopEndWh 313F
        NumberOfObjectsHeld = 0;
        __asm__("jsr $30B8");
        return;// 3147
} else {
        __asm__("jsr $2E84");//Inventory
        return;// 314A
}//EndIF; 314A
return;// 314A
};
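
The patching step described above (load the compiled binary at $0801, then make the old routine JMP into it) can be sketched on a 64K memory image as below. This is an illustration in Python rather than something done inside the emulator. The helper names are invented, and the address of DropAll is not given in this post, so the caller has to supply it.

```python
JMP_ABSOLUTE = 0x4C  # 6502 opcode for JMP with a 16-bit absolute operand

def load_binary(memory, data, address):
    """Copy `data` into the memory image starting at `address`."""
    if address + len(data) > len(memory):
        raise ValueError("binary does not fit in memory")
    memory[address:address + len(data)] = data

def patch_jmp(memory, at, target):
    """Overwrite three bytes at `at` with JMP `target` (low byte first)."""
    memory[at] = JMP_ABSOLUTE
    memory[at + 1] = target & 0xFF
    memory[at + 2] = (target >> 8) & 0xFF
```

With memory = bytearray(65536), the sequence would be load_binary(memory, compiled, 0x0801) followed by patch_jmp(memory, drop_all_address, 0x080D), where drop_all_address stands for wherever DropAll starts in the game.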

2016-01-13: First recompiled decompiled code and Selective Recompilability

It's a landmark: the first time Detech has put decompiled code through a C compiler with the aim of having it run exactly the same as the original assembly code.

The compiler used is CC65, a C compiler for the 6502, which outputs a 6502 binary.

PUC is a pointer to an unsigned char, i.e. a pointer to the byte at a given address.

#define Redraw_PointerToColourMap *((PUC*)6)
#define BorderDrawParamColour *((PUC*)133)
#define GUI_Draw_Pointer *((PUC*)4)
#define Global18 *((PUC*)18)
#define Global135 *((PUC*)135)

void DrawBorderTile(int Arg_acc) {
int LLocal1;
Global135 = Arg_acc;
usub_4313();
LLocal1 = 7;
do {
       ((PUC *)GUI_Draw_Pointer)[LLocal1] = ((PUC *)Global18)[LLocal1];
       LLocal1 = (LLocal1 - 1);
} while (LLocal1 >= 0);//LoopEndWh 434D
((PUC *)Redraw_PointerToColourMap)[0] = BorderDrawParamColour;
return();// 4355
};

This code (decompiled from just 12 lines of assembly code) compiles to a 672-byte C64 .PRG file.

It hasn't been tested yet, but that will follow in a couple of days.

For now, just bask in the glory!

This is part of Decompiler Tech's new strategy on decompilation: Selective recompilability.

The goal is to make it possible to recompile the code output by RevEngE. The twist on the normal strategy is that, rather than decompiling the entire program, people can recompile just the subroutines they want to deal with, and patch the program so the new code is called instead of the old code.

This strategy also means that once it works for a fully defined decompiler module such as 6502, it should follow quite easily for any other CPU we can fully define.

As a side note, 'fully defined' means every CPU opcode is accounted for in the module. 6502 has all opcodes in the module but, of course, x86 doesn't (and Java soon will).

2015-12-28: Initial Work on Java Decompiler

We've started working on a Java module for RevEngE (decompiling Java, not running it). After 2 days, we've got a basic sketch that fits entirely within the RevEngE API. The module is about 150 lines of code. It can handle loops, loop variables, and the stack-ish stuff that Java does a lot.

The stack is emulated in the same way as a normal one (e.g. the x86 stack), which needed changes to just 15 lines of main engine code.
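
To give a feel for what emulating the stack means when decompiling, here is a toy Python sketch. Instead of pushing values, it pushes expression text, so that a store pops out a finished source line. The four instruction names (load, push, add, store) are made up for the example and are not real JVM mnemonics, nor is this the RevEngE code.

```python
def symbolic_execute(instructions):
    """Turn simple stack instructions into assignment statements."""
    stack = []
    statements = []
    for op, *args in instructions:
        if op == "push":        # push a constant
            stack.append(str(args[0]))
        elif op == "load":      # push the value of a local variable
            stack.append(args[0])
        elif op == "add":       # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} + {right})")
        elif op == "store":     # pop the top of stack into a local
            statements.append(f"{args[0]} = {stack.pop()};")
        else:
            raise ValueError(f"unknown instruction: {op}")
    return statements
```

Feeding it load LLocal1, push 1.0, add, store LLocal1 produces a single assignment to LLocal1, with no trace of the stack left in the output.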

In a few days, it should be good enough for more ambitious input code.

Decompiled Java code:
public static void main(java.lang.String[]) {
	int BLocal1;
	java.lang.System.out.println("Hello, World");
	float LLocal1 = 0.0;
	do {
		if (50.0 <= LLocal1) {
			break;
		};
		java.lang.System.out.println(LLocal1);
		LLocal1 = (1.0 + LLocal1);
	}//LoopEnd 1C
	String local2 = "hi there";
	java.lang.System.out.println(local2.charAt(3));
	int local4 = 2;
	BLocal1 = 1;
	if ((2 == local4)) {
			BLocal1 = 1;
	} else {
			BLocal1 = 2;
	}//EndIF; 3F
	java.lang.System.out.println(BLocal1);
	return();// 47
};

Original Java code:
public static void main(String[] args) {
	// Prints "Hello, World" to the terminal window.
	System.out.println("Hello, World");
	for (float i = 0; i < 50; i++)
	{
		System.out.println(i);
	}

	int f = 2;
	
	String tempstr = "hi there";
	System.out.println(tempstr.charAt(3));

	int blocal = 1;
	if (f == 2)
	{
		blocal = 1;
	}
	else
	{
		blocal = 2;
	}
	System.out.println(blocal);
}

2015-12-22: Auto-Detecting Arguments and Return Values

One of the things the RevEngE decompiler does is find which registers a function has as its input, and also which registers the function outputs.

if ((Global163 + Global138) & 1) {
        usub_B6A1(207,1,0);
}

The function header generated for B6A1 is: void tirnanog.usub_b6a1.txt(int Arg_acc,int Arg_y,int Arg_x), so the three values in the call above are passed in the accumulator, Y and X registers.
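
One simple way to think about input detection is that a register is an argument if the function reads it before writing it. The Python sketch below applies that idea to a straight-line list of 6502 instructions. It is an assumption-laden illustration, not RevEngE's algorithm: the effects table covers only a handful of opcodes, branches and subroutine calls are ignored, and the function name is invented.

```python
# (registers read, registers written) for a few 6502 mnemonics.
EFFECTS = {
    "LDA": ("", "a"), "LDX": ("", "x"), "LDY": ("", "y"),
    "STA": ("a", ""), "STX": ("x", ""), "STY": ("y", ""),
    "TAX": ("a", "x"), "TAY": ("a", "y"),
    "TXA": ("x", "a"), "TYA": ("y", "a"),
    "INX": ("x", "x"), "INY": ("y", "y"),
    "DEX": ("x", "x"), "DEY": ("y", "y"),
}

def detect_register_arguments(instructions):
    """List registers that are read before the function writes them."""
    written = set()
    arguments = []
    for instruction in instructions:
        mnemonic, _, operand = instruction.strip().partition(" ")
        reads, writes = EFFECTS.get(mnemonic.upper(), ("", ""))
        reads = set(reads)
        # Indexed addressing modes also read the index register.
        operand = operand.upper().replace(" ", "")
        if ",X" in operand:
            reads.add("x")
        if ",Y" in operand:
            reads.add("y")
        for register in sorted(reads):
            if register not in written and register not in arguments:
                arguments.append(register)
        written.update(writes)
    return arguments
```

A routine that begins by storing A and X to memory, for example, would be reported as taking both as arguments.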

2015-12-19: Goto's Considered Okay

There are two problems in decompilation. The first is when people use goto's in their code, and the second is the way conditional blocks made from || and && create crazy goto's.

So the only solution is to recognise when to use goto's, and that must be as rare as possible.

We recognise goto's that are things like:

Jumps out of loops
Jumps into loops
Jumps out of a subroutine right into the middle of another
Crazy conditional blocks

Whenever they occur, we put a goto in the decompiled code, so that the program is readable most of the time, and goto's are only used in extreme cases.

A goto used in a conditional block can, in some cases, be turned back into the original conditions, so goto's can be optimised out this way.
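
For the first two cases in the list above, the test is just whether a jump crosses a loop boundary. Here is a minimal Python sketch, assuming a loop is described by its first and last address. The function name and the return strings are made up for the example.

```python
def classify_jump(source, target, loop_start, loop_end):
    """Say whether a jump crosses the boundary of one loop."""
    source_inside = loop_start <= source <= loop_end
    target_inside = loop_start <= target <= loop_end
    if source_inside and not target_inside:
        return "out of loop"
    if target_inside and not source_inside:
        return "into loop"
    return "no loop boundary crossed"
```

A real decompiler would still need to separate an ordinary loop exit (a break) from a jump that lands somewhere unexpected. This only covers the boundary test.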

Just another example of how we get readable, bug-free code 99.5% of the time, and leave the nasty stuff for the last 0.5%.

2015-12-16: Global X-Referencing

There are now listings of Global variables, showing which functions read or write each one.

See the writes here and the reads here.

2015-12-08: Commodore Free Issue 90

The new issue of C= Free is out, and Decompiler Tech is mentioned! Which is great!

Also in this issue is "Crack and Train Like a Pro", which has a great tip for finding the start address of a C64 program: Hunt for these bytes: H 0000 0800 A9 37 85 01

In Frodo/SuperSAM, that's basically q 0000 ffff a9 37 85 01.
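
The bytes A9 37 85 01 are LDA #$37 followed by STA $01. Outside a monitor, the same hunt can be done over a memory dump with a few lines of Python. This sketch assumes the dump is a bytes or bytearray object. The function name find_pattern is invented for the example.

```python
START_PATTERN = bytes([0xA9, 0x37, 0x85, 0x01])  # LDA #$37 / STA $01

def find_pattern(memory, pattern, start=0x0000, end=0xFFFF):
    """Return every address in start..end where `pattern` fully fits and matches."""
    matches = []
    position = memory.find(pattern, start, end + 1)
    while position != -1:
        matches.append(position)
        position = memory.find(pattern, position + 1, end + 1)
    return matches
```

Calling find_pattern(dump, START_PATTERN) returns a list of candidate addresses, which may be empty or hold several hits to check by hand.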

I just found out the start address for Super G-Man! Great tip.

2015-11-22: Notes on static analysis

Because the decompiler simulates the code, it should be possible to find function calls that were previously hidden, and also trace variables across the code base. This should make it useful for analysing binaries for malicious code.

2015-11-12: Notes on bytecode

If a program (or game) has its own internal bytecode (like assembly code), people say you can't decompile it. But you can!

First you decompile the program and figure out the bytecodes it uses.

Then you disassemble the bytecode and put THAT into RevEngE. So it'll decompile the bytecode!

2015-10-01: RevEngE Explained

The new version of the decompiler is called RevEngE. This stands for: Reverse Engineering Emulator (not Engine!).

2015-10-01: We're using Lisp!

The first version of our decompiler, called VBRB (VB Right Back) was written in C++. But it was rewritten after the author learned Scheme at University. The rewrite was from the ground up in Common Lisp.

Advantages include complex data structures made easy, and it's as fast as, if not faster than, C++.

Another advantage is memory usage. I found out C++ was hogging tons of memory in various strings, maps and vectors, and Lisp is much better (around 12MB with a typical decompiled binary).

2015-09-11: VB5/VB6 Decompiling Service is Back

More great news for VB5/6 native code customers.

We have fully resurrected our VB5/6-specific code. It's now working with the new engine (see below). The great thing is the new engine means the decompiled VB code is now quite a bit more accurate than before.

2015-09-07: VB5/VB6 Native Decompiler Backported

The good news is that all the code dealing with VB5/VB6 native executables has been merged back into the most recent decompiler engine.

Even better news: we can now decompile any VB5/VB6 native executable! And all the improvements to the structure of decompiled programs are available for VB5/VB6 too!

Feel free to send us a message regarding VB5/VB6. Don't worry if you don't know if it's native or P-code, we'll tell you that for free!

2015-08-27: 68000 CPU Ported to RevEngE Decompiler

It took just 3 days to port RevEngE to deal with 68000 code. We took a small sample C program (DeHex from Fred's Fish Disks) and implemented all the opcodes it uses, without any custom opcodes needed (except movem but that's not part of the actual code). About 18 opcodes were used in this program.

It's the same as 6502. There are NO registers in the code, and just a few special variable types, which aren't changed from 6502 (or even x86).

This has only just been done, and it could take a couple of weeks to implement the rest of the opcodes and operand data types. But this is the future!

Sample here.

2015-08-02: The 6502 Decompiler, RevEngE6502, is now available for download!

Some features in the paid version are:

Interleaved code output, where each line is prefixed by the address of that line.
Outputs code to a .code.txt file.
Recurses to find function arguments and return values.
Outputs disassembly to a HTML file.
Outputs Global variable cross-references to a HTML file (where each Global is listed with the functions that reference it, with a clickable link to the function code files).

For now, the demo version:

Outputs code to the (black) console window
Has a limit of 60 assembly lines per file

Also available is a version of the C64 Frodo emulator, with extra code in the SAM machine code monitor. This makes it easier to find areas of code to disassemble.

See here for downloadable binaries.