1. Moving data
Before looking at how data is moved in assembly, there is an important point from the Intel manual to keep in mind.
In 64-bit mode, the operand size determines the number of valid bits in the destination general-purpose register:
- 64-bit operands generate a 64-bit result in the destination general-purpose register.
- 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
- 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly extend the register to the full 64 bits.
Because the upper 32 bits of the 64-bit general-purpose registers are undefined in 32-bit modes, the upper 32 bits of any general-purpose register are not preserved when switching from 64-bit mode to a 32-bit mode (protected mode or compatibility mode). Software must not depend on these bits to maintain a value after a 64-bit to 32-bit mode switch.
This may seem abstract for now, but it becomes clearer with practice, a little further on in this tutorial.
1.1 What are the instructions for moving data?
On x86 and x86-64, the most commonly used instruction for moving data is:
- MOV
It accepts several operand combinations (the destination comes first, the source second):
- mov register, register
- mov register, immediateValue
- mov register, memory
- mov memory, immediateValue
- mov memory, register
Data can also be moved with the instruction:
- XCHG
As its name suggests, this instruction exchanges the contents of its two operands:
- xchg register, memory
- xchg register, register
Finally, there is one more very important instruction:
- LEA (Load Effective Address)
This instruction computes the address of its memory operand and stores that address in the destination register, without reading memory. It is typically used to load a pointer to a label:
- lea rax, [label]
1.2 Practice
To better understand what is explained above, let's test the following code:
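global _start

section .text
_start:
    ; mov register, immediateValue
    mov rax, 0xaabbccddeeff1122   ; rax = 0xaabbccddeeff1122
    mov eax, 0x33445566           ; rax = 0x0000000033445566 (upper 32 bits zeroed)
    mov ax, 0xaabb                ; rax = 0x000000003344aabb (upper 48 bits untouched)
    mov al, 0x88                  ; rax = 0x000000003344aa88
    mov ah, 0x99                  ; rax = 0x0000000033449988
    mov ecx, 0xaabbccdd           ; rcx = 0x00000000aabbccdd

    ; mov register, register
    mov rbx, rax                  ; rbx = 0x0000000033449988
    mov al, dl                    ; copy the low byte of rdx into al
    mov dx, bx                    ; dx = 0x9988

    ; mov register, memory (the register gives the operand size)
    mov rax, [var1]               ; loads 8 bytes
    mov ebx, [var2]               ; loads 4 bytes, rbx is zero-extended
    mov cx, [var3]                ; loads 2 bytes ('he')
    mov rbx, 0xbbccddeeff112233

    ; mov memory, register
    mov [var1], rbx               ; stores 8 bytes
    mov word [var2], cx           ; stores 2 bytes
    mov dword [var1], ecx         ; stores 4 bytes

    ; mov memory, immediateValue (the size must be given explicitly)
    mov byte [var1], 0xbb

    ; lea
    lea rdx, [var1]               ; rdx = address of var1
    lea eax, [var2]               ; eax = low 32 bits of the address of var2
    lea bx, [var3]                ; bx = low 16 bits of the address of var3

    ; xchg register, register
    mov rax, 0xaabbccddeeff1122
    mov rbx, 0x1122334455667788
    xchg rax, rbx                 ; rax and rbx swap values

    ; xchg register, memory
    xchg rax, [var1]              ; rax and the 8 bytes at var1 swap values

    ; exit(0), so execution does not run past the end of the code
    mov eax, 60
    xor edi, edi
    syscall

section .data
var1: db 0x11, 0x22, 0x33, 0x44, 0x55, 0xAA, 0xBB, 0xCC   ; 8 bytes, so qword accesses stay inside var1
var2: dq 0xaabbccddeeff88
var3: dw 'hello'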
2. How does the stack work?
This section covers a very important aspect of how a program runs: the stack.
The stack is a variable-size memory area used to store a lot of useful information. Its main advantage is fast read/write access, which makes it the area of choice for temporary data such as local variables and function parameters (on 32-bit x86).
The stack works in LIFO (Last In, First Out) order, which means that the last element pushed onto the stack is the first one to come off it. Note also that the stack grows from high addresses toward low addresses.
2.1 How to manipulate the stack?
To manipulate the stack, there are some assembly instructions:
- push
- pop
- any operation on the RSP register (mov, add, sub, etc.)
The top of the stack is pointed to by the RSP register. PUSH and POP update RSP automatically, so after each stack operation the register still points to the top of the stack.
When a value is added to the stack with the PUSH instruction, RSP is decreased by 8 (in 64-bit mode) and the value is stored at the new top of the stack.
When a value is removed with the POP instruction, the value at the top of the stack is copied to the destination and RSP is increased by 8, so it points back to where it was before the PUSH.
2.2 Practice
As before, it is important to practice, so here is a small program:
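section .text
global _start
_start:
    mov rax, 0x1122334455667788
    push rax            ; push a register: RSP decreases by 8, then [rsp] = rax
    push var1           ; push the address of var1 (an immediate)
    push qword [var1]   ; push the 8 bytes stored at var1
    pop rcx             ; rcx = the 8 bytes read from var1 (last in, first out)
    pop rbx             ; rbx = address of var1
                        ; the value pushed from rax is still on the stack

    ; exit(0)
    mov eax, 60
    xor edi, edi
    syscall

section .data
var1: db 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x11, 0x22   ; 8 bytes, so the qword push stays inside var1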
3. Procedure + stack frame
Now that you are comfortable with how the stack works, there are two more stack-related topics to cover: procedures and the stack frame.
3.1 What is a procedure?
A procedure is a sequence of instructions that can be compared to a function in a higher-level language. Like any function, a procedure can be called from anywhere in the code.
To call a procedure, simply write:
- call procedureName
A procedure must end with the ret instruction, which returns to the caller so that execution resumes right after the call.
To understand ret, we first need to look at what call does. When the CPU executes a call, it automatically pushes the return address (the address of the instruction following the call) onto the stack, then jumps to the procedure.
The ret instruction does the reverse: it behaves like a pop rip, taking the return address off the stack and jumping to it.
3.1.1. Example
int add(int a, int b)
{
    return a + b;
}

int main(int ac, char **av)
{
    int a = 1;
    int b = 2;
    int c = 0;

    c = add(a, b);
    return 0;
}
Let's analyze what happens with gdb:
Dump of assembler code for function main:
0x00000000004004a9 <+0>: push rbp
0x00000000004004aa <+1>: mov rbp,rsp
0x00000000004004ad <+4>: sub rsp,0x20
0x00000000004004b1 <+8>: mov DWORD PTR [rbp-0x14],edi
0x00000000004004b4 <+11>: mov QWORD PTR [rbp-0x20],rsi
0x00000000004004b8 <+15>: mov DWORD PTR [rbp-0xc],0x1
0x00000000004004bf <+22>: mov DWORD PTR [rbp-0x8],0x2
0x00000000004004c6 <+29>: mov edx,DWORD PTR [rbp-0x8]
0x00000000004004c9 <+32>: mov eax,DWORD PTR [rbp-0xc]
0x00000000004004cc <+35>: mov esi,edx
0x00000000004004ce <+37>: mov edi,eax
0x00000000004004d0 <+39>: call 0x400487 <add> <--------- PUT YOUR BREAKPOINT HERE
0x00000000004004d5 <+44>: mov DWORD PTR [rbp-0x4],eax
0x00000000004004d8 <+47>: mov eax,0x0
0x00000000004004dd <+52>: leave
0x00000000004004de <+53>: ret
End of assembler dump.
With execution stopped at the breakpoint on the call, let's look at the state of the stack:
gdb-peda$ x/30wx $rsp
0x7fffffffdc80: 0xffffdd88 0x00007fff 0x004003b0 0x00000001
0x7fffffffdc90: 0xffffdd80 0x00000001 0x00000002 0x00000000
0x7fffffffdca0: 0x004004e0 0x00000000 0xf7a44021 0x00007fff
0x7fffffffdcb0: 0x00040000 0x00000000 0xffffdd88 0x00007fff
0x7fffffffdcc0: 0xf7b9b088 0x00000001 0x004004a9 0x00000000
0x7fffffffdcd0: 0x00000000 0x00000000 0x452aded6 0x688910f0
0x7fffffffdce0: 0x004003b0 0x00000000 0xffffdd80 0x00007fff
0x7fffffffdcf0: 0x00000000 0x00000000
Now we step into the add function and look at the stack again:
gdb-peda$ x/30wx $rsp
0x7fffffffdc78: 0x004004d5 0x00000000 0xffffdd88 0x00007fff
0x7fffffffdc88: 0x004003b0 0x00000001 0xffffdd80 0x00000001
0x7fffffffdc98: 0x00000002 0x00000000 0x004004e0 0x00000000
0x7fffffffdca8: 0xf7a44021 0x00007fff 0x00040000 0x00000000
0x7fffffffdcb8: 0xffffdd88 0x00007fff 0xf7b9b088 0x00000001
0x7fffffffdcc8: 0x004004a9 0x00000000 0x00000000 0x00000000
0x7fffffffdcd8: 0x452aded6 0x688910f0 0x004003b0 0x00000000
0x7fffffffdce8: 0xffffdd80 0x00007fff
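Note that RSP has moved down by 8 bytes (from 0x7fffffffdc80 to 0x7fffffffdc78) and that the top of the stack now holds 0x004004d5, the address of the instruction that follows the call in main. This is the return address.
This is a very important notion, and it will be used extensively when developing shellcode.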
3.2 How to pass arguments to a procedure?
Passing arguments to a procedure is relatively simple. They can be passed in several ways:
- In registers (as in the disassembly of main above, which places a and b in edi and esi before the call)
- On the stack (commonly used on 32-bit x86)
- As data structures in memory, referenced by a register or by a pointer on the stack
3.3 What is a stack frame?
To cope with the many unknowns of the execution environment, functions are usually set up with a "stack frame" that gives access to both the function parameters and the local variables. The idea behind the stack frame is that each subroutine can act independently of its location on the stack, and each subroutine can act as if it were at the top of the stack.
When a function is called, a new stack frame is created at the current RSP location. A stack frame acts as a partition on the stack. Everything that belongs to the calling functions sits higher on the stack and should not be modified. The current function has access to the rest of the stack, from its stack frame to the end of the stack. Because the current function always has access to the "top" of the stack, functions do not need to take the memory usage of other functions or programs into account.
A stack frame is managed by two short instruction sequences:
- Prologue (creates the frame on entry) => push rbp; mov rbp, rsp
- Epilogue (removes the frame before ret) => mov rsp, rbp; pop rbp
The leave instruction performs both steps of the epilogue in a single instruction.
4. Example
Let's compile the following C code and look at how it works:
int add(int a, int b)
{
    return a + b;
}

int main(int ac, char **av)
{
    int a = 1;
    int b = 2;
    int c = 0;

    c = add(a, b);
    return 0;
}
If we look at the function add with gdb, we see:
gdb-peda$ disass add
Dump of assembler code for function add:
0x0000000000400487 <+0>: push rbp ; prologue
0x0000000000400488 <+1>: mov rbp,rsp ; prologue
=> 0x000000000040048b <+4>: mov DWORD PTR [rbp-0x4],edi
0x000000000040048e <+7>: mov DWORD PTR [rbp-0x8],esi
0x0000000000400491 <+10>: mov edx,DWORD PTR [rbp-0x4]
0x0000000000400494 <+13>: mov eax,DWORD PTR [rbp-0x8]
0x0000000000400497 <+16>: add eax,edx
0x0000000000400499 <+18>: pop rbp ; epilog
0x000000000040049a <+19>: ret
End of assembler dump.
5. Practice
Now let's write our own assembly procedure to better understand what is going on:
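section .text
global _start
_start:
    mov rax, 0x1122334455667788
    mov rsi, 0x6f6c6c6568   ; "hello" in little-endian byte order
    push rsi                ; store the string on the stack
    mov rsi, rsp            ; rsi = pointer to the string (2nd argument of write)
    call myproc             ; pushes the return address, then jumps to myproc

    ; exit(0)
    mov rax, 60             ; syscall number of exit
    xor rdi, rdi            ; exit status 0
    syscall

myproc:
    ; prologue
    push rbp
    mov rbp, rsp

    ; write(1, rsi, 5)
    mov rax, 0x1            ; syscall number of write
    mov rdi, 0x1            ; file descriptor 1 (stdout)
    mov rdx, 0x5            ; number of bytes to write
    syscall

    ; epilogue: leave is equivalent to
    ;   mov rsp, rbp
    ;   pop rbp
    leave
    ret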