Map and projects (the most frequently updated page of this blog)


It's just a flesh wound

Section-less PE file (updated)
You may not expect a PE to be valid without all its standard structure:
Dos Header, Nt Headers, File Header, Optional Header, Data Directories, Section Headers.
TinyPE already proved that the Data Directories are not compulsory; sections are not always required either.

If the alignment is smaller than 1000h (800h or less) and the number of sections is zero, the loader maps the file as-is (RVA = offset). And since there are no sections, you don't need a section table at all.
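To make the rule concrete, here is a simplified Python model of RVA-to-offset translation under the conditions stated above: a section-less file with an alignment below 1000h is mapped as-is, otherwise the section table is walked. It's a sketch, not the Windows loader: raw-pointer rounding and header mapping are ignored, and `rva_to_offset` is a hypothetical helper name.

```python
import struct

def rva_to_offset(pe: bytes, rva: int) -> int:
    """Simplified RVA -> file offset translation (hypothetical helper, not the real loader)."""
    e_lfanew, = struct.unpack_from("<I", pe, 0x3C)
    number_of_sections, = struct.unpack_from("<H", pe, e_lfanew + 0x06)
    size_of_optional_header, = struct.unpack_from("<H", pe, e_lfanew + 0x14)
    # The optional header starts at NT headers + 0x18; SectionAlignment is at +0x20 in it.
    section_alignment, = struct.unpack_from("<I", pe, e_lfanew + 0x18 + 0x20)

    if number_of_sections == 0 and section_alignment < 0x1000:
        # Section-less, low alignment: the file is mapped as-is, so RVA == offset.
        return rva

    table = e_lfanew + 0x18 + size_of_optional_header
    for i in range(number_of_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData follow the 8-byte Name.
        _vsize, va, raw_size, raw_ptr = struct.unpack_from("<4I", pe, table + 40 * i + 8)
        if va <= rva < va + raw_size:
            return raw_ptr + (rva - va)
    raise ValueError(f"RVA {rva:#x} is not backed by file data")
```

With NumberOfSections = 0 and SectionAlignment = 200h, any RVA comes back unchanged; with a normal 1000h alignment, the section table decides.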


From the madness of colors, I want to pick other flowers too

a PE Headers graph
If you're looking for a good representation of the PE format, Ero Carrera's poster on OpenRCE is the standard.
I tried making my own representation and started a multi-page one, lighter to open and easier to print (A4 format), where elements are shown differently depending on their importance.


We'll have more bread on the board, because the board will have burned

messing with sections' physical offsets
With a high alignment (>= 1000h), nothing prevents 2 sections from being loaded from the same physical data.
Thus, if 2 sections with different virtual addresses have the same PointerToRawData and SizeOfRawData, their initial content will be identical. Relocations and imports are applied afterwards, though.
; section 0
.VirtualSize dd Section0Size
.VirtualAddress dd Section0Start - IMAGEBASE
.SizeOfRawData dd Section0Size
.PointerToRawData dd Section0Start - IMAGEBASE
; section 1: different virtual address, same physical data
.VirtualSize dd Section0Size ; same
.VirtualAddress dd Section1Start - IMAGEBASE
.SizeOfRawData dd Section0Size ; same
.PointerToRawData dd Section0Start - IMAGEBASE ; same
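A toy Python loader makes the consequence visible: both section entries in the listing above copy the same file bytes, so the two virtual ranges start out identical. The tuple layout and the helper name `map_sections` are illustrative assumptions; relocations and imports, which the loader applies afterwards, are not modelled.

```python
def map_sections(file_data: bytes, sections: list[tuple[int, int, int, int]],
                 size_of_image: int) -> bytearray:
    """Toy loader: copy each section's raw data to its virtual address.

    Each section is (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData).
    """
    image = bytearray(size_of_image)  # memory not covered by raw data stays zero-filled
    for va, vsize, raw_ptr, raw_size in sections:
        chunk = file_data[raw_ptr:raw_ptr + min(raw_size, vsize)]
        image[va:va + len(chunk)] = chunk
    return image

# Two sections with different VirtualAddress values but the same raw data.
file_data = bytes(0x200) + bytes(range(0x10))
sections = [(0x1000, 0x10, 0x200, 0x10), (0x2000, 0x10, 0x200, 0x10)]
image = map_sections(file_data, sections, 0x3000)
print(image[0x1000:0x1010] == image[0x2000:0x2010])  # True
```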


Policeman got no gun, U don't have 2 run

SMSW based anti-emulator/stepping
SMSW (Store Machine Status Word) stores the 16 lowest bits of CR0 in its operand. With a reg32 operand, the high word is officially undefined, although it always seems to be 8001h.

This makes it a weird reg32 opcode (why accept a 32-bit operand if the high bits are undefined and a 16-bit counterpart exists?), but it definitely changes the high word (some disassemblers invariably show a word operand, which is wrong).

While 'mov eax, cr0' is a privileged instruction, SMSW isn't.
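One possible way to turn this into an emulator check is to preload EAX with a sentinel, execute `smsw eax`, and look at the high word. The Python sketch below only models that final decision; the sentinel value is a hypothetical choice, and relying on 8001h is an assumption based on the observation above, since the high word is officially undefined.

```python
SENTINEL = 0xDEAD0000  # hypothetical value loaded into EAX before `smsw eax`

def smsw_looks_native(eax_after: int) -> bool:
    """Decide, from EAX after `smsw eax`, whether the high word was overwritten.

    Real CPUs were observed to leave 8001h there; an emulator that (by assumption)
    treats the instruction as a 16-bit store leaves the sentinel's DEADh in place.
    """
    high_word = (eax_after >> 16) & 0xFFFF
    return high_word == 0x8001
```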


The hen never laid and the corn never growed

anti-* with the GS register
On a thread switch, the GS register value is not restored (32-bit only).
It's a simple statement that leads to anti-* tricks (debugger/tracing/emulator) that defy common sense (one of my favorite anti-*, since it doesn't call any API and requires thinking outside the box).

When stepping, threads are switched, so your debugger might lose the right value.
Try it yourself:
  1. open debugger
  2. set GS to a non-zero value
  3. step, even once
  4. GS might be zero already!

So it makes for an easy anti-stepping trick.


You can rock this land, baby

the other subsystems
In your Windows directory, most drivers have many sections, including PAGE and INIT (where the EP is). All this looks pretty scary, yet in the end only a very small amount of information (compared to a GUI PE) is needed to create a working driver:
as expected, the Subsystem has to be set to NATIVE; relocations are compulsory, since you can't tell in advance where the driver will be loaded; and a correct PE checksum is required for the driver to run.
And that's all!
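Since a correct checksum is mandatory here, a minimal Python sketch of the commonly described PE checksum algorithm (the value Windows computes through imagehlp's CheckSumMappedFile) may help: sum the file as little-endian 16-bit words with carries folded back in, treat the CheckSum field itself as zero, then add the file size. Treat it as a re-implementation under those assumptions, not Microsoft's code.

```python
import struct

def pe_checksum(data: bytes) -> int:
    """PE checksum sketch: folded 16-bit word sum (CheckSum field skipped) plus file size."""
    e_lfanew, = struct.unpack_from("<I", data, 0x3C)
    # NT headers: 4-byte signature + 20-byte file header, then the optional header,
    # whose CheckSum field sits at offset 0x40 (PE32 and PE32+ alike).
    checksum_offset = e_lfanew + 4 + 20 + 0x40
    padded = data + b"\0" if len(data) % 2 else data  # odd trailing byte counts as a word
    total = 0
    for i in range(0, len(padded), 2):
        if checksum_offset <= i < checksum_offset + 4:
            continue  # the CheckSum field itself counts as zero
        total += padded[i] | (padded[i + 1] << 8)
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (total + len(data)) & 0xFFFFFFFF
```

Write the result back into the CheckSum field after every other byte of the driver is final, since any later change invalidates it.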


Sail to the edge and I'd be there

Messing with the TLS
TLS, aka Thread Local Storage, is a way to execute some code before the EntryPoint or after ExitThread/ExitProcess.
The 10th Data Directory points to a structure, and one of its elements (AddressOfCallBacks, a VA, not an RVA) points to a null-terminated list of callbacks, which are called one after the other.
The list entries are VAs too (i.e., they include the ImageBase), which is quite uncommon among PE structures.
AddressOfCallBacks dd Callbacks ; VA, not RVA

Callbacks:
dd TLS ; VA of the callback
dd 0 ; null-terminated list

The Size field of this Data Directory is not taken into account. However, some tools may wrongly ignore the TLS if it's not set.
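Walking that list from a mapped image shows why the VAs matter: every pointer must have the ImageBase subtracted before it can be used as an RVA. A sketch in Python, assuming a mapped image (indexed by RVA) and the 32-bit TLS directory layout, where AddressOfCallBacks sits at offset 0Ch; `tls_callbacks` is a hypothetical helper.

```python
import struct

def tls_callbacks(image: bytes, image_base: int, tls_dir_rva: int) -> list[int]:
    """Return the RVAs of the TLS callbacks of a mapped 32-bit image."""
    # IMAGE_TLS_DIRECTORY32.AddressOfCallBacks (offset 0x0C) is a VA, not an RVA.
    address_of_callbacks, = struct.unpack_from("<I", image, tls_dir_rva + 0x0C)
    offset = address_of_callbacks - image_base
    callbacks = []
    while True:
        va, = struct.unpack_from("<I", image, offset)
        if va == 0:  # the list is null-terminated
            return callbacks
        callbacks.append(va - image_base)  # entries are VAs as well
        offset += 4
```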

Callbacks are executed on thread start (before the thread runs) and on thread exit (after it ends). However (credits go to Peter Ferrie and Kris Kaspersky here), TLS callbacks won't be executed if none of the imported DLLs imports kernel32 itself. So, if kernel32.dll is the only 'official' import (which doesn't mean it's the only DLL in the process space), the callbacks are not executed.


If you got the money honey, we got your disease

Messing with the EntryPoint
In most files, the EP is in the first section. In many packers or file infectors, it is in another section. It's actually also common for it to be in the header itself (Upack, FSG), and sometimes (as in collapsed.asm, among others) it's at RVA 0, in which case the MZ signature is simply executed as dec ebp, pop edx, which is harmless. Many packers just put some trampoline code at RVA 0, with the rest of the code further on.
So, usually:
Section0 VA <= EntryPoint <= Section0 VA + Physical Size
and more generally:
0 <= EntryPoint <= SizeOfImage
But no check is actually done on the EntryPoint value!
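Since the loader doesn't validate it, any tool that cares has to decide for itself where an EntryPoint lands. A small Python sketch of such a classification, following the cases above; the section tuples and the returned labels are illustrative assumptions, and section ranges use VirtualSize without alignment rounding.

```python
def classify_entry_point(ep: int, size_of_headers: int,
                         sections: list[tuple[int, int]], size_of_image: int) -> str:
    """sections: (VirtualAddress, VirtualSize) pairs in section-table order."""
    for index, (va, vsize) in enumerate(sections):
        if va <= ep < va + vsize:
            return "first section" if index == 0 else f"section {index}"
    if ep < size_of_headers:
        # e.g. RVA 0: the 'MZ' bytes 4D 5A execute as dec ebp / pop edx
        return "header"
    if ep < size_of_image:
        return "inside image, outside any section"
    return "outside image"
```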


With a rebel yell! more, more, more!

Description of a compiled PE header
In my previous posts, I started exploring PE headers with a minimum amount of information (as opposed to the official specifications). On the other hand, standard tools like MASM add more elements (not necessarily documented), on top of defining, as you would expect, most elements of the structures.

To understand things correctly, I assembled and linked a simple HelloWorld source with MASM, and reproduced the complete structure of the executable with a YASM source (that defines every byte of the header manually).


Hey, hey, hey, what's in your head?

PE Header holes / filling them
Since the Windows PE loader is so flexible, most of the PE header information can be discarded.
As the Tiny PE project proved, it's possible to get a 97-byte PE! It also proved that a valid PE can't be smaller, as 97 bytes is the minimum size to fit all the structures up to OPTIONAL_HEADER.Subsystem, the last compulsory element.
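The arithmetic behind that number, as a short Python sketch using the PE32 offsets from the PE/COFF specification. It assumes, as TinyPE does, that the NT headers start at offset 4 inside the DOS header; that a 97-byte file suffices (rather than 98, for the full WORD) is assumed to rely on the loader zero-filling what lies past the end of the file, so only Subsystem's low byte has to be present.

```python
SIGNATURE_SIZE = 4            # "PE\0\0"
FILE_HEADER_SIZE = 20         # IMAGE_FILE_HEADER
SUBSYSTEM_IN_OPTIONAL = 0x44  # offset of Subsystem inside the optional header

def subsystem_offset(e_lfanew: int) -> int:
    """File offset of OPTIONAL_HEADER.Subsystem for NT headers starting at e_lfanew."""
    return e_lfanew + SIGNATURE_SIZE + FILE_HEADER_SIZE + SUBSYSTEM_IN_OPTIONAL

offset = subsystem_offset(4)  # assumption: NT headers at offset 4, as in TinyPE
print(offset)       # 96
print(offset + 1)   # 97: the file ends right after Subsystem's low byte
```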

In my short one-section file header (which I use in my helloworld.asm example), I define a minimal (though not absolutely minimal) set of PE structure elements, to get a file with Imports, a Section and an EntryPoint (none of which is strictly necessary):
e_magic (constant)
Signature (constant)
Machine (almost constant)
NumberOfSections (not strictly necessary)
Magic (almost constant)
AddressOfEntryPoint (not strictly necessary)
MajorSubsystemVersion (almost constant)
NumberOfRvaAndSizes (not strictly necessary)
ImportsVA (not strictly necessary)
VirtualAddress (not strictly necessary)
SizeOfRawData (not strictly necessary)
PointerToRawData (not strictly necessary)
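For reference, here is where those fields live in a PE32 file, as a Python sketch built from the PE/COFF specification offsets. SizeOfOptionalHeader isn't in the list above but is needed to locate the section table, so the sketch takes it as a parameter; `field_offsets` is a hypothetical helper, and "ImportsVA" is taken to mean DataDirectory[1].VirtualAddress.

```python
# PE32 field offsets relative to the NT headers (the "PE\0\0" signature).
# The optional header starts at +0x18.
NT_HEADER_FIELDS = {
    "Signature": 0x00,
    "Machine": 0x04,                # IMAGE_FILE_HEADER +0x00
    "NumberOfSections": 0x06,       # IMAGE_FILE_HEADER +0x02
    "Magic": 0x18,                  # optional header +0x00
    "AddressOfEntryPoint": 0x28,    # optional header +0x10
    "MajorSubsystemVersion": 0x48,  # optional header +0x30
    "NumberOfRvaAndSizes": 0x74,    # optional header +0x5C
    "ImportsVA": 0x80,              # DataDirectory[1].VirtualAddress, optional header +0x68
}
# Offsets inside a 40-byte IMAGE_SECTION_HEADER.
SECTION_FIELDS = {"VirtualAddress": 0x0C, "SizeOfRawData": 0x10, "PointerToRawData": 0x14}

def field_offsets(e_lfanew: int, size_of_optional_header: int) -> dict[str, int]:
    """File offsets of the fields listed above, for the first section header."""
    offsets = {"e_magic": 0x00}
    offsets.update({name: e_lfanew + off for name, off in NT_HEADER_FIELDS.items()})
    # The section table follows the optional header.
    first_section = e_lfanew + 0x18 + size_of_optional_header
    offsets.update({name: first_section + off for name, off in SECTION_FIELDS.items()})
    return offsets
```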


They say jump, you say how high

Various ways of JMPing
Jumping, aka branching, is one of the most common operations.

I wrote a file that implements many forms of jumping, whether common, obfuscated, or rare. Not everything is detailed in this post; check the source for further information.

First, Jumps,

EB 07 JMP SHORT 004000F9
E9 07000000 JMP 00400105
FFE7 JMP EDI ; 00400113
FF25 19014000 JMP DWORD PTR DS:[400119] ; 00400124
EA 32014000 1B00 JMP FAR 001B:00400132
FF2D 38014000 JMP FAR DS:[400138] ; DS:[00400138]=001B:00400145
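The relative forms encode a displacement from the end of the instruction, while the direct far forms store a ptr16:32 with the offset first and the selector last. A Python sketch decoding both; it assumes the EB 07 sits at 004000F0, which is what its listed target implies, and the helper names are hypothetical.

```python
def rel_target(address: int, code: bytes) -> int:
    """Target of a relative JMP SHORT (EB), JMP (E9) or CALL (E8) located at `address`."""
    op = code[0]
    if op == 0xEB:
        size, rel = 2, int.from_bytes(code[1:2], "little", signed=True)
    elif op in (0xE8, 0xE9):
        size, rel = 5, int.from_bytes(code[1:5], "little", signed=True)
    else:
        raise ValueError(f"not a relative branch: {op:02X}")
    # The displacement is relative to the next instruction; wrap to 32 bits.
    return (address + size + rel) & 0xFFFFFFFF

def far_pointer(code: bytes) -> tuple[int, int]:
    """(selector, offset) of a direct far JMP (EA) or CALL (9A): offset comes first."""
    offset = int.from_bytes(code[1:5], "little")
    selector = int.from_bytes(code[5:7], "little")
    return selector, offset

print(hex(rel_target(0x4000F0, bytes.fromhex("EB07"))))        # 0x4000f9
print(hex(rel_target(0x4000F9, bytes.fromhex("E907000000"))))  # 0x400105
```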

then CALLs,

E802 CALL 00400103
9A 7C014000 1B00 CALL FAR 001B:0040017C

and finally, returns,

C3 RETN ; Return to 004001DE
CB RETF ; Return to 001B:004001EC
CF IRETD ; Return to 001B:004001FB, flags = 206



Storm warning, but there's no fear

relocater < mutater < virtualiser
I already wrote about a relocater and different kinds of virtual machines. Between the two, there is another kind of executable, simpler than a virtual machine but particularly suitable for obfuscation:
a mutater, or polymorphic code.
As with virtual machines, some data represents the code to execute. However, in this case, the architecture is strictly the same as the CPU's. The main point of mutation is randomization. Add some junk code in between, and you get what viruses do when they modify themselves from one infected file to the next.
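The randomization part can be sketched at the listing level in Python: each instruction is swapped for a random equivalent and junk is sprinkled in between, so two runs give different-looking code with the same behavior. The substitution table, the junk list and `mutate` are all hypothetical illustrations (working on assembly text, not machine code), and the equivalences ignore flag differences.

```python
import random

# Hypothetical substitution table: alternatives with the same effect on the
# destination register (flags may differ, e.g. inc vs add, mov vs xor).
EQUIVALENTS = {
    "xor eax, eax": ["xor eax, eax", "sub eax, eax", "mov eax, 0"],
    "inc ecx": ["inc ecx", "add ecx, 1", "sub ecx, -1"],
}
# Junk that leaves registers unchanged.
JUNK = ["nop", "xchg eax, eax", "mov edx, edx"]

def mutate(program: list[str], rng: random.Random, junk_rate: float = 0.5) -> list[str]:
    """Return a randomized but equivalent listing of `program`."""
    mutated = []
    for insn in program:
        mutated.append(rng.choice(EQUIVALENTS.get(insn, [insn])))
        if rng.random() < junk_rate:
            mutated.append(rng.choice(JUNK))
    return mutated

program = ["xor eax, eax", "inc ecx", "push ebx"]
print(mutate(program, random.Random(1)))
print(mutate(program, random.Random(2)))
```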


And go where you're going to

Handmade PE
To be able to create custom PEs, I wrote a simple script that helps with simple tasks such as generating import structures, computing the PE checksum and setting default values.

So, write all the PE structures by hand (or better, reuse the same ones over and over), generate the imports, and voilà! You have a handmade PE file in which you control every byte.

I haven't (yet?) extended that script to Exports/Resources/Relocations/TLS/Sections, because I don't use them that often.
Also, different Section/File alignments are not supported. Once again, I don't really need them (often).

Source directory
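To give an idea of what "generating import structures" involves, here is a minimal Python sketch, not the script above: it lays out the import descriptors, one null-terminated IAT per DLL, then the DLL names and word-aligned hint/name entries. It leaves OriginalFirstThunk at 0 so the loader reads the names from FirstThunk, an assumption that suits minimal handmade files but isn't what linkers emit; `build_imports` is a hypothetical name and `base_rva` is assumed to be even.

```python
import struct

def build_imports(base_rva: int, imports: dict[str, list[str]]) -> tuple[bytes, dict[str, int]]:
    """Lay out descriptors, IATs and names at `base_rva`; return the blob and IAT slot RVAs."""
    iat_start = 20 * (len(imports) + 1)  # descriptors + null terminator
    strings_start = iat_start + sum(4 * (len(f) + 1) for f in imports.values())

    descriptors, iats, strings = bytearray(), bytearray(), bytearray()
    iat_slots = {}
    for dll, functions in imports.items():
        dll_name_rva = base_rva + strings_start + len(strings)
        strings += dll.encode("ascii") + b"\0"
        first_thunk_rva = base_rva + iat_start + len(iats)
        for name in functions:
            if len(strings) % 2:
                strings += b"\0"  # hint/name entries are word-aligned
            hint_name_rva = base_rva + strings_start + len(strings)
            strings += struct.pack("<H", 0) + name.encode("ascii") + b"\0"  # hint 0
            iat_slots[name] = base_rva + iat_start + len(iats)
            iats += struct.pack("<I", hint_name_rva)
        iats += struct.pack("<I", 0)  # end of this DLL's thunks
        # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
        descriptors += struct.pack("<5I", 0, 0, 0, dll_name_rva, first_thunk_rva)
    descriptors += bytes(20)  # null descriptor ends the table
    return bytes(descriptors + iats + strings), iat_slots
```

The returned slot RVAs are what the code then calls through, e.g. `call [IMAGEBASE + slot]`, and the blob's start RVA goes into the Imports Data Directory.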


Useless but original

A different form of junk code
You probably know about the overlapping instruction technique used to fool disassemblers:
due to the way x86 CPUs work, jumping over an E8 byte makes a bogus CALL instruction appear in the disassembly.

If you use a longer instruction like IMUL, you can fit any short instruction inside its operands, so from the outside, whether in hex or in assembly, the code looks quite blocky:
EB 02 JMP SHORT 004000F4
69846A 40681C01 4000EB02 IMUL EAX,[EDX+EBP*2+11C6840],2EB0040
698468 22014000 9090EB02 IMUL EAX,[EAX+EBP*2+400122],2EB9090
69846A 00E81E00 0000EB02 IMUL EAX,[EDX+EBP*2+1EE800],2EB0000
69846A 00E81900 00005461 IMUL EAX,[EDX+EBP*2+19E800],61540000

while the execution trace looks almost normal.
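The trace itself isn't reproduced here, but it can be recovered from the bytes of the listing. The toy Python decoder below knows only the opcodes this snippet executes (EB, 6A, 68, 90, E8), assumes the first byte sits at 004000F0 (as the JMP SHORT 004000F4 target implies), follows the EB 02 hops without listing them, and records calls without following them. It's an illustration, not a real disassembler.

```python
BASE = 0x4000F0
CODE = bytes.fromhex(
    "EB02"
    "69846A40681C014000EB02"
    "698468220140009090EB02"
    "69846A00E81E000000EB02"
    "69846A00E8190000005461"
)

def trace_snippet(code: bytes, base: int) -> list[str]:
    """Follow execution through the snippet, decoding only the opcodes it uses."""
    out = []
    addr = base
    while base <= addr < base + len(code):
        i = addr - base
        op = code[i]
        if op == 0xEB:  # JMP SHORT rel8: followed, not listed
            addr += 2 + int.from_bytes(code[i + 1:i + 2], "little", signed=True)
        elif op == 0x6A:  # PUSH imm8
            out.append(f"{addr:08X} PUSH {code[i + 1]:X}")
            addr += 2
        elif op == 0x68:  # PUSH imm32
            value = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"{addr:08X} PUSH {value:08X}")
            addr += 5
        elif op == 0x90:  # NOP
            out.append(f"{addr:08X} NOP")
            addr += 1
        elif op == 0xE8:  # CALL rel32: recorded, assumed to return
            target = addr + 5 + int.from_bytes(code[i + 1:i + 5], "little", signed=True)
            out.append(f"{addr:08X} CALL {target:08X}")
            addr += 5
        else:
            break  # opcode outside this toy decoder's table
    return out

print("\n".join(trace_snippet(CODE, BASE)))
```

The hidden instructions come out as PUSH 40, PUSH 0040011C, PUSH 00400122, two NOPs, PUSH 0, CALL 0040012F, PUSH 0, CALL 00400135: an ordinary-looking call sequence, once the IMUL wrappers are skipped.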

You'll stumble in my footsteps

A different flow obfuscation: a relocater
I wrote a simple executable, implementing an idea by Piotr Krysiuk, where all routines are executed at the same address. Because of that, following the flow is potentially difficult, and creating a direct dump could be annoying, as no disassembler allows different pieces of code to be present at the same address.

To give you an example, here are the 2 functions of that binary upon their execution:
004000FA 6A 40 PUSH 40
004000FC 68 6E014000 PUSH 0040016E ; ASCII "Tada!"
00400101 68 74014000 PUSH 00400174 ; ASCII "Hello World!"
00400106 6A 00 PUSH 0
00400108 E8 55000000 CALL 00400162 ; MessageBoxA

004000FA 6A 00 PUSH 0
004000FC E8 67000000 CALL 00400168 ; ExitProcess


when CPUs have too many opcodes...

Back to real machines after my last post: I decided to release, as-is, a YASM source that contains most x86 32-bit opcodes, including SSE, AVX, FPU...

My conclusion is that there are way too many!

You can use it out of curiosity or to test your favorite disassembler.

Source Code (Yasm)

The longest mnemonic (as a word) is
vaeskeygenassist xmm0, xmm0, 0
even though the recent
vbroadcastf128 ymm0, [0]
is not far behind.