It's been a couple of years since this software was revised, but it deserves posting now.
SuperSAM is the C64 emulator Frodo with debug extensions. It has a separate window (the 'log' window) which shows events such as memory writes and memory reads as they happen.
SuperSAM:
<  expression               Set refuse-write from here up
>  expression               Set refuse-write from here down
*  expression               Set update display from this address
!  expression               Add read log point to list (no arg clears list)
w  expression               Add write log point to list (no arg clears list)
$  expression               Add execute log point to list (no arg clears list)
U  expression               Old/new value pokefinder
u  expression               +/- 1 pokefinder
D  "file"                   Load up interleaved decompile dump
J                           Show last 4 JSR's
R                           Reset JSR count
I  [choice] [start] [end]   Show functions with JSR count >0 in memory range (default choice=0 start=0 end=ffff)
For the 'D' argument, the file looks like this, with a function name, then 2 lines at a time, the first the address, the second the code:
void MainMenu() {
19585
Global_90 = 1;
19587
usub_1CFB();
19605
usub_1D9A();
19608
usub_439A();
19611
usub_49E4();
19614
DrawMenuLines();
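As a rough illustration of the interleaved format, here is a minimal Python sketch that reads such a dump back into (address, code) pairs per function. It is not SuperSAM's own loader: the name parse_dump is made up for this example, the header test is a guess based on the sample above, and addresses are kept as text because the dump does not say which number base they use.

```python
def parse_dump(text):
    """Parse an interleaved dump into {function_name: [(address, code), ...]}.

    Assumed layout (taken from the sample, not from SuperSAM's source):
    a header line such as "void MainMenu() {", then pairs of lines,
    the first holding the address and the second holding the code.
    """
    functions = {}
    current = None          # pair list of the function being read
    pending_address = None  # address line waiting for its code line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if pending_address is not None:
            # The line after an address is always the code for that address.
            current.append((pending_address, line))
            pending_address = None
        elif line.endswith("{") and "(" in line:
            # Function header: take the word just before the "(".
            name = line.split("(")[0].split()[-1]
            current = functions.setdefault(name, [])
        elif current is not None:
            # Addresses are kept as text; the number base is not stated.
            pending_address = line
    return functions


SAMPLE = """void MainMenu() {
19585
Global_90 = 1;
19587
usub_1CFB();
"""

dump = parse_dump(SAMPLE)
for address, code in dump["MainMenu"]:
    print(address, code)
```

Keeping the address and the code as a pair makes it easy to look up the decompiled line for whatever address the log window reports.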
The rest of the commands above are self explanatory right now. Just play around and see what happens.
Get SuperSAM here.
The decompiler can process a 6502 assembly file to output the debug info, but what about other .asm files?
Well, only 3 things need doing.
See how easy it is to convert it?
With the data converted as above, you can now feed any x86 assembly file into the 6502 module!
I made my decompiler output useful information for reverse-engineers. The information below ends up in a path like: jobs/zol/logs/zol.usub_6967.txt.debug.txt
File: zol.usub_ED6B.txt
Maybe people will use it now...
We've got the ARM compiler toolchain and we're writing simple, general code then disassembling it (using objdump) and then decompiling that assembly code. There is an ARM module which is 458 lines of code, but the decompiler itself works fine as usual.
We've done:
We haven't done:
Next up will be a sample to show just how well it works.
On reverse engineering forums, you inevitably get someone finding a particular problem which a decompiler will fail at.
Well, let's say the decompiler can do everything else, and such problems are in the minority.
My point is this: Is it better to say it's not going to be perfect, so don't attempt it OR just try and get success 99% of the time?
Look at the following code:
int BLocal1;
BLocal1 = 0;
if (((CreatureUnderCursor_HealthEtcAttributes)[2] < 144)) {
    BLocal1 = 1;
} else {
    if (((CreatureUnderCursor_HealthEtcAttributes)[4] == 255)) {
        BLocal1 = 2;
    } else {
        BLocal1 = 3;
    }//EndIF; 67E2
}//EndIF; 67E2
myregs.pc = 0x4341;//DrawBorderTile
myregs.a = BLocal1;
_sys(&myregs);
Basically, the 'BLocal1 = 0' is code that is irrelevant.
That's because if the '< 144' condition is true, BLocal1 becomes 1. Else it goes on to the '== 255' condition. If true, it's 2. If the '== 255' condition is false, BLocal1 becomes 3.
Whatever happens, the 'BLocal1 = 0' is simply not needed, so the decompiled code omits that 'BLocal1 = 0'.
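The reasoning above is a dead-store check: an assignment can go if every path after it overwrites the variable before reading it. Here is a minimal Python sketch of that check, not the decompiler's actual code. The tuple-based statement format and all function names are invented for this example, and it only handles plain assignments and if/else.

```python
# Statements are tuples (a made-up format for this sketch):
#   ("assign", target, reads)                  e.g. BLocal1 = 0
#   ("if", cond_reads, then_block, else_block)

def stmt_status(stmt, var):
    """Return 'read', 'killed' or 'open' for how stmt first touches var."""
    if stmt[0] == "assign":
        _, target, reads = stmt
        if var in reads:
            return "read"
        return "killed" if target == var else "open"
    _, cond_reads, then_block, else_block = stmt
    if var in cond_reads:
        return "read"
    branches = (block_status(then_block, var), block_status(else_block, var))
    if "read" in branches:
        return "read"
    # Only killed if BOTH branches overwrite the variable.
    return "killed" if branches == ("killed", "killed") else "open"


def block_status(block, var):
    """First decisive access to var in a list of statements."""
    for stmt in block:
        status = stmt_status(stmt, var)
        if status != "open":
            return status
    return "open"  # never touched here: the value may still be needed later


def remove_dead_stores(block):
    """Drop top-level assignments whose value is always overwritten unread."""
    kept = []
    for i, stmt in enumerate(block):
        if stmt[0] == "assign" and block_status(block[i + 1:], stmt[1]) == "killed":
            continue
        kept.append(stmt)
    return kept


# The example from the text, reduced to which variables are read and written.
code = [
    ("assign", "BLocal1", set()),                      # BLocal1 = 0
    ("if", {"Attr2"},
        [("assign", "BLocal1", set())],                # BLocal1 = 1
        [("if", {"Attr4"},
            [("assign", "BLocal1", set())],            # BLocal1 = 2
            [("assign", "BLocal1", set())])]),         # BLocal1 = 3
    ("assign", "a", {"BLocal1"}),                      # myregs.a = BLocal1
]
cleaned = remove_dead_stores(code)
```

The check is deliberately conservative: if the block ends without a decisive access, the assignment is kept, because the value might be used by code that comes later.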
As you probably know, there are two kinds of decompiled code: one that humans need to edit and fix, and one that recompiles with no human input and runs identically to the original code.
You might also know that 6502 code has various gotchas which should make decompilation impossible.
Well, now we prove the doubters wrong. Thanks to help from the CC65 mailing list, we have a bunch of decompiler-generated code which runs identically to the code it was decompiled from!
What we did was, we compiled the code below with CC65 to a binary file of around 600 bytes. Then we loaded up Temple of Terror (the text adventure the code below is from) and loaded in the binary file at $0801. Then we edited the DropAll subroutine so the first thing it did was JMP to $080D which is from the binary file. Then we ran the game and typed 'DROP ALL'. If anything was wrong, it would have bombed out, but instead, it worked and dropped all objects the player has. Success!
This, of course, means that the decompiler will soon be able to output code from other CPUs that recompiles without editing.
#include <stdio.h>
#include <stdlib.h>
#include <6502.h>

typedef unsigned char *PUC;

#define PlayerLocation (*(PUC)1024)
#define ObjectIn (*(PUC)1026)
#define NumberOfObjectsHeld (*(PUC)1029)
#define NumberOfObjectsInGame (*(PUC)1030)
#define ObjectInRoomTable ((PUC)1086)

void DropAll() {
    int LLocal1;
    if ((NumberOfObjectsHeld != 0)) {
        LLocal1 = NumberOfObjectsInGame;
        do {
            if ((ObjectInRoomTable[LLocal1] == ObjectIn)) {
                ObjectInRoomTable[LLocal1] = PlayerLocation;
            }//EndIF; 313B
            LLocal1 = (LLocal1 - 1);
        } while (LLocal1 != 0);//LoopEndWh 313F
        NumberOfObjectsHeld = 0;
        __asm__("jsr $30B8");
        return;// 3147
    } else {
        __asm__("jsr $2E84");//Inventory
        return;// 314A
    }//EndIF; 314A
    return;// 314A
};
It's a landmark: the first time Detech has put decompiled code into a C compiler and had it run exactly the same as the original assembly code.
The compiler used is CC65, a C compiler for the 6502, which outputs a 6502 binary.
PUC is a pointer to an unsigned char, i.e. to a byte at a given address.
#define Redraw_PointerToColourMap *((PUC*)6)
#define BorderDrawParamColour *((PUC*)133)
#define GUI_Draw_Pointer *((PUC*)4)
#define Global18 *((PUC*)18)
#define Global135 *((PUC*)135)

void DrawBorderTile(int Arg_acc) {
    int LLocal1;
    Global135 = Arg_acc;
    usub_4313();
    LLocal1 = 7;
    do {
        ((PUC *)GUI_Draw_Pointer)[LLocal1] = ((PUC *)Global18)[LLocal1];
        LLocal1 = (LLocal1 - 1);
    } while (LLocal1 >= 0);//LoopEndWh 434D
    ((PUC *)Redraw_PointerToColourMap)[0] = BorderDrawParamColour;
    return();// 4355
};
This code (decompiled from just 12 lines of assembly code) compiles to a 672-byte C64 .PRG file.
It hasn't yet been tested but this is to follow in a couple of days.
For now, just bask in the glory!
This is part of Decompiler Tech's new strategy on decompilation: Selective recompilability.
The goal is to make it possible to recompile the code output by RevEngE. The twist on the normal strategy is that, rather than decompiling the entire program, people can recompile just the subroutines they want to deal with, and patch the program so the new code is called instead of the old code.
And this strategy also means that for a fully-defined decompiler module such as 6502, if we get it working with 6502, it will follow quite easily for any CPU we can fully define.
As a side note, 'define' means every CPU opcode is accounted for and defined. 6502 has all opcodes in the module, but of course, x86 doesn't (and Java soon will).
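The patching step can be sketched in a few lines of Python. This is only an illustration of the idea, not our tooling: it overwrites the first three bytes of an old subroutine with a 6502 JMP (opcode $4C followed by the low byte, then the high byte of the target). The target $080D is the one from the DropAll test; the old routine's address used here is a made-up placeholder.

```python
JMP_ABSOLUTE = 0x4C  # 6502 opcode for JMP with a 16-bit absolute address


def patch_jump(memory, routine_address, target_address):
    """Overwrite the start of an old routine with JMP target_address.

    memory is a mutable 64K image (bytearray). The 6502 stores addresses
    low byte first, so $080D is written as 0D 08.
    """
    memory[routine_address] = JMP_ABSOLUTE
    memory[routine_address + 1] = target_address & 0xFF
    memory[routine_address + 2] = (target_address >> 8) & 0xFF


memory = bytearray(0x10000)
OLD_ROUTINE = 0x3100   # placeholder address, NOT the real DropAll entry point
NEW_CODE = 0x080D      # entry point of the recompiled routine, as in the test
patch_jump(memory, OLD_ROUTINE, NEW_CODE)
print(memory[OLD_ROUTINE:OLD_ROUTINE + 3].hex(" "))
```

Because only three bytes of the original program change, everything that already calls the old address keeps working and simply lands in the recompiled code.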
We've started working on a Java module for RevEngE (decompiling Java, not running it). After 2 days, we've got a basic sketch that fits entirely within the RevEngE API. The module is about 150 lines of code. It can handle loops, loop variables, and the stack-ish stuff that Java does a lot.
The stack is emulated the same as normal (eg, x86 stack), by changing just 15 lines of main engine code.
In a few days, it should be good enough for more ambitious input code.
Decompiled Java code:

public static void main(java.lang.String[]) {
    int BLocal1;
    java.lang.System.out.println("Hello, World");
    float LLocal1 = 0.0;
    do {
        if (50.0 <= LLocal1) { break; };
        java.lang.System.out.println(LLocal1);
        LLocal1 = (1.0 + LLocal1);
    }//LoopEnd 1C
    String local2 = "hi there";
    java.lang.System.out.println(local2.charAt(3));
    int local4 = 2;
    BLocal1 = 1;
    if ((2 == local4)) {
        BLocal1 = 1;
    } else {
        BLocal1 = 2;
    }//EndIF; 3F
    java.lang.System.out.println(BLocal1);
    return();// 47
};

Original Java code:

public static void main(String[] args) {
    // Prints "Hello, World" to the terminal window.
    System.out.println("Hello, World");
    for (float i = 0; i < 50; i++) {
        System.out.println(i);
    }
    int f = 2.0;
    String tempstr = "hi there";
    System.out.println(tempstr.charAt(3));
    int blocal = 1;
    if (f == 2) {
        blocal = 1;
    } else {
        blocal = 2;
    }
    System.out.println(blocal);
}
One of the things the RevEngE decompiler does is find which registers a function has as its input, and also which registers the function outputs.
if ((Global163 + Global138) & 1) { usub_B6A1(207,1,0); }
B6A1 has as its function header: void tirnanog.usub_b6a1.txt(int Arg_acc,int Arg_y,int Arg_x)
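One simple way to think about register inputs is "read before written": if a function reads a register before it has stored anything into it, the value must have come from the caller. The Python sketch below shows that idea only. It is not RevEngE's algorithm: it handles straight-line code with no branches, the small EFFECTS table covers just a few 6502 mnemonics, and it ignores addressing modes (so indexed loads that also read X or Y are not modelled).

```python
# Per mnemonic: (registers read, registers written). Deliberately incomplete,
# and addressing modes are ignored in this sketch.
EFFECTS = {
    "LDA": (set(), {"a"}),
    "LDX": (set(), {"x"}),
    "LDY": (set(), {"y"}),
    "STA": ({"a"}, set()),
    "STX": ({"x"}, set()),
    "STY": ({"y"}, set()),
    "TAX": ({"a"}, {"x"}),
    "TAY": ({"a"}, {"y"}),
    "TXA": ({"x"}, {"a"}),
    "TYA": ({"y"}, {"a"}),
}


def register_io(mnemonics):
    """Return (inputs, outputs) for a straight-line list of mnemonics.

    inputs  = registers read before the function itself wrote them
    outputs = registers the function wrote (candidate return values)
    """
    inputs, written = set(), set()
    for mnemonic in mnemonics:
        reads, writes = EFFECTS[mnemonic]
        inputs |= reads - written   # value must have come from the caller
        written |= writes
    return inputs, written


# A made-up routine that saves A, Y and X, then loads A and copies it to X.
inputs, outputs = register_io(["STA", "STY", "STX", "LDA", "TAX"])
print(sorted(inputs), sorted(outputs))
```

For this made-up routine the inputs come out as a, x and y, which is the kind of result that becomes a header with Arg_acc, Arg_y and Arg_x. A real analysis also has to follow branches and decide which written registers the caller actually uses.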
There are two problems in decompilation. The first is when people use goto's in their code, and the second is the way conditional blocks made from || and && create crazy goto's.
So the only solution is to recognise when to use goto's, and that must be as rare as possible.
We recognise goto's that are things like:
Whenever they occur, we put a goto in the decompiled code, so that the program is readable most of the time, and goto's are only used in extreme cases.
In some cases, a goto used in a conditional block can actually be turned back into the original conditions, so goto's can be optimised out this way.
Just another example of how we get readable, bug-free code 99.5% of the time, and leave the nasty stuff for the last 0.5%.
There are now listings of Global variables, saying which function reads or writes to them.
See the writes here and the reads here.
The new issue of C= Free is out, and Decompiler Tech is mentioned! Which is great!
Also in this issue is the Crack and Train Like a Pro which has a great tip for finding the start address of a C64 program: Hunt for these bytes: H 0000 0800 A9 37 85 01
In Frodo/SuperSAM, that's basically q 0000 ffff a9 37 85 01.
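The bytes A9 37 85 01 are LDA #$37 followed by STA $01, which is why they are worth hunting for. If you have a memory dump outside the emulator, the same hunt can be sketched in Python. The function name hunt and the inclusive end address are assumptions made for this example, not the monitor's documented behaviour.

```python
def hunt(memory, pattern, start=0x0000, end=0xFFFF):
    """Return every address in [start, end] where pattern begins.

    memory is a bytes or bytearray image; the whole pattern must fit
    inside the range (end is treated as inclusive in this sketch).
    """
    hits = []
    pos = memory.find(pattern, start, end + 1)
    while pos != -1:
        hits.append(pos)
        pos = memory.find(pattern, pos + 1, end + 1)
    return hits


PATTERN = bytes([0xA9, 0x37, 0x85, 0x01])   # LDA #$37 / STA $01

# Demo image: a blank 64K with the pattern planted at a made-up address.
memory = bytearray(0x10000)
memory[0x080D:0x0811] = PATTERN
found = hunt(memory, PATTERN)
print([f"${address:04X}" for address in found])
```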
I just found out the start address for Super G-Man! Great tip.
Because the decompiler simulates the code, it should be possible to find function calls that were previously hidden, and also trace variables across the code base. This should make it useful for analysing binaries for malicious code.
If a program (or game) has its own internal bytecode (like assembly code), people say you can't decompile it. But you can!
First you decompile the program and figure out the bytecodes it uses.
Then you disassemble the bytecode and put THAT into RevEngE. So it'll decompile the bytecode!
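To make the second step concrete, here is a minimal Python sketch of a bytecode disassembler. Everything in it is hypothetical: the opcode table, the mnemonics and the output layout are invented for illustration and do not come from any real game or from RevEngE. In practice the table is whatever you worked out from decompiling the interpreter.

```python
# Hypothetical bytecode: opcode -> (mnemonic, number of operand bytes).
# In practice this table comes from decompiling the program's interpreter.
OPCODES = {
    0x01: ("PUSH", 1),
    0x02: ("ADD", 0),
    0x03: ("PRINT", 0),
    0xFF: ("END", 0),
}


def disassemble(code):
    """Turn raw bytecode into one text line per instruction."""
    lines = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode not in OPCODES:
            # Unknown byte: emit it as data so nothing is silently lost.
            lines.append(f"{pc:04X} .byte ${opcode:02X}")
            pc += 1
            continue
        name, operand_count = OPCODES[opcode]
        operands = code[pc + 1:pc + 1 + operand_count]
        text = " ".join([name] + [f"${byte:02X}" for byte in operands])
        lines.append(f"{pc:04X} {text}")
        pc += 1 + operand_count
    return lines


listing = disassemble(bytes([0x01, 0x05, 0x01, 0x07, 0x02, 0x03, 0xFF]))
print("\n".join(listing))
```

The resulting listing is ordinary assembly-style text, which is the form a decompiler module for that bytecode would then take as input.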
The new version of the decompiler is called RevEngE. This stands for: Reverse Engineering Emulator (not Engine!).
The first version of our decompiler, called VBRB (VB Right Back) was written in C++. But it was rewritten after the author learned Scheme at University. The rewrite was from the ground up in Common Lisp.
Advantages include complex data structures made easy, and it's as fast as C++, if not faster.
Another advantage is memory usage. I found out C++ was hogging tons of memory in various strings, maps and vectors, and Lisp is much better (around 12MB with a typical decompiled binary).
More great news for VB5/6 native code customers.
We have fully resurrected our VB5/6-specific code. It's now working with the new engine (see below). The great thing is the new engine means the decompiled VB code is now quite a bit more accurate than before.
The good news is that all the code dealing with VB5/VB6 native executables has been merged back into the most recent decompiler engine.
Therefore, even better news than usual: We can now decompile any VB5/VB6 native executables! And even better, all the improvements to structure of your programs are available for VB5/VB6!
Feel free to send us a message regarding VB5/VB6. Don't worry if you don't know if it's native or P-code, we'll tell you that for free!
It took just 3 days to port RevEngE to deal with 68000 code. We took a small sample C program (DeHex from Fred's Fish Disks) and implemented all the opcodes, without any custom opcodes needed (except movem but that's not part of the actual code). About 18 opcodes were used in this program.
It's the same as 6502. There are NO registers in the code, and just a few special variable types, which aren't changed from 6502 (or even x86).
This has just been done, and could take a couple of weeks to implement the rest of the opcodes and operand data types. But this is the future!
Sample here.
See here for downloadable binaries.