1. Creating a shellcode

To create our shellcode, I recommend reading Part 3 (NASM anatomy / Syscall / Passing argument) first, so that you know the order in which parameters are passed to a function.
As said before, our shellcode should be as short as possible and, above all, must contain no NULL Byte.

We will write a shellcode that is useless in itself, but it will help you understand how to remove NULL Bytes. I encourage you to modify the instructions and try to produce an even smaller shellcode.

Take our Exit shellcode:

global _start

section .text
_start:
    mov rax, 60 ; exit syscall number
    mov rdi, 0x12 ; exit value
    syscall

Let's see what objdump shows for our program.

$ objdump -M intel -d a 

a:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:
  400080:   b8 3c 00 00 00          mov    eax,0x3c
  400085:   bf 12 00 00 00          mov    edi,0x12
  40008a:   0f 05                   syscall

We notice the presence of NULL Bytes (the 00 bytes in both mov instructions), which is not good.

1.1. How to remove NULL Bytes?

To remove NULL Bytes, it is necessary to understand how the assembler encodes our instructions. In our program, we put the value 60 in the RAX register. This value fits in 8 bits, but the assembler encodes it as a 32-bit immediate (mov eax, 0x3c), so the three unused bytes of the immediate are filled with NULL Bytes.
To correct this, we zero the register with xor and then load the value into its 8-bit sub-register (AL for RAX, DIL for RDI).

global _start
section .text

_start:
    xor rax, rax ; set rax to 0
    mov al, 60 
    xor rdi, rdi 
    mov dil, 0x12
    syscall

Here is what this code gives once assembled:

$ objdump -M intel -d a

a:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:
  400080:   48 31 c0            xor    rax,rax
  400083:   b0 3c               mov    al,0x3c
  400085:   48 31 ff            xor    rdi,rdi
  400088:   40 b7 12            mov    dil,0x12
  40008b:   0f 05               syscall

Bingo! We no longer have any NULL Bytes.
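
Rather than reading the objdump output by eye, we can also count the NULL Bytes with a few lines of C. This is only a minimal sketch: count_null_bytes, exit_v1 and exit_v2 are names I made up for the illustration, and the byte arrays are simply copied from the two listings above.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 0x00 bytes in a buffer of machine code.
 * count_null_bytes is a hypothetical helper name, not part of any tool. */
size_t count_null_bytes(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00)
            count++;
    }
    return count;
}

/* First version: 32-bit immediates, copied from the first listing. */
static const unsigned char exit_v1[] = {
    0xb8, 0x3c, 0x00, 0x00, 0x00,   /* mov eax, 0x3c */
    0xbf, 0x12, 0x00, 0x00, 0x00,   /* mov edi, 0x12 */
    0x0f, 0x05                      /* syscall */
};

/* Second version: xor + 8-bit moves, copied from the second listing. */
static const unsigned char exit_v2[] = {
    0x48, 0x31, 0xc0,               /* xor rax, rax */
    0xb0, 0x3c,                     /* mov al, 0x3c */
    0x48, 0x31, 0xff,               /* xor rdi, rdi */
    0x40, 0xb7, 0x12,               /* mov dil, 0x12 */
    0x0f, 0x05                      /* syscall */
};
```

Calling count_null_bytes(exit_v1, sizeof exit_v1) reports 6 NULL Bytes, while the same call on exit_v2 reports none.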

1.1.1. Exercise

It is possible to reduce the size of our shellcode even further, so I leave you the pleasure of finding a way :).

1.2. Extract our shellcode from the compiled program

To convert machine code to "shellcode", I advise you to use this function from CommandLineFu.com:

function objdumptoshellcode (){
    for i in $(objdump -d $1 -M intel |grep "^ " |cut -f2); do echo -En '\x'$i; done;echo 
}

If we use it, it gives:

$ objdumptoshellcode a
\x48\x31\xc0\xb0\x3c\x48\x31\xff\x40\xb7\x12\x0f\x05

We will now try to execute our shellcode:

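#include <stdio.h>
#include <string.h>

unsigned char shellcode[] =
"\x48\x31\xc0\xb0\x3c\x48\x31\xff\x40\xb7\x12\x0f\x05";

int main(void)
{
    /* strlen returns a size_t, hence %zu */
    printf("Shellcode Length:  %zu\n", strlen((const char *)shellcode));

    /* Treat the byte array as a function and call it. */
    int (*ret)(void) = (int (*)(void))shellcode;

    ret();

    return 0;
}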

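We run our program and, bingo, it exits with status 18 (0x12), which we can check with echo $?. We have just created our first functional shellcode. Note that on current Linux systems the data section is usually not executable, so the program may crash with a segmentation fault unless it is linked with gcc -z execstack, and depending on the kernel version even that flag may only affect the stack.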
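
If the test program above crashes because the shellcode array sits in non-executable memory, one possible workaround is to copy the bytes into a mapping created with mmap and PROT_EXEC. The sketch below is an assumption about how such a loader could look, not the method used in this tutorial: run_shellcode_in_child and exit_shellcode are hypothetical names, the code only does something on Linux/x86-64, and it runs the shellcode in a child process so that we can read its exit status.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS with strict -std= flags */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The exit(0x12) shellcode extracted earlier. */
static const unsigned char exit_shellcode[] =
    "\x48\x31\xc0\xb0\x3c\x48\x31\xff\x40\xb7\x12\x0f\x05";

/* Run `code` in a child process and return the child's exit status.
 * Returns -1 if the platform is not Linux/x86-64, if the executable
 * mapping or the fork fails, or if the child did not exit normally.
 * run_shellcode_in_child is a hypothetical name used for illustration. */
int run_shellcode_in_child(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
#if defined(__linux__) && defined(__x86_64__)
    /* Ask for memory that is readable, writable and executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        ((void (*)(void))mem)();   /* jump into the copied bytes */
        _exit(127);                /* only reached if the shellcode returns */
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    munmap(mem, len);
    return result;
#else
    (void)code;
    (void)len;
    return -1;
#endif
}
```

On a Linux/x86-64 machine that allows such mappings, run_shellcode_in_child(exit_shellcode, sizeof exit_shellcode - 1) should return 0x12, the value we placed in RDI. The "- 1" drops the terminating NULL Byte that C adds to the string literal.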

1.3. Shellcode development technique

There are several shellcode development techniques, but you should know that two of them are used very often:

  • JMP CALL POP Technique
  • STACK Technique

1.3.1. JMP CALL POP Technique

Before explaining how it works, here is the skeleton of a program that prints a hello message:

    jmp call_shellcode

    shellcode:
        pop rsi
        ...
        ...

    call_shellcode:
        call shellcode
        hello: db "HELLO", 0xa

If you have some assembly knowledge, you know the mechanism behind a call. For the others, here is a short explanation.

When we execute "call shellcode", the processor pushes the address of the instruction that follows the call (the return address) onto the stack. In our program, what follows the call is our message, so the address of our message ends up on the stack.
That's why we do a "pop rsi": the RSI register then contains the address of our message.

We do this because a shellcode must not contain any hardcoded address. Remember that we are injecting our shellcode into a program that is already running, so we cannot know in advance at which address it will be loaded.

global _start

section .text
_start:
    jmp call_shellcode

shellcode:
    pop rsi
    xor rax, rax
    xor rdi, rdi
    xor rdx, rdx

    mov al, 0x1
    mov dil, 0x1
    ; rsi already set
    mov dl, 0x6
    syscall

    ; exit
    mov al, 60
    syscall

call_shellcode:
    call shellcode
    hello: db "Hello", 0xa

By repeating the steps seen previously, we can confirm that the shellcode works correctly. Perfect.

1.3.2. Stack technique

One of the advantages of this technique over the previous one is the size of the shellcode. Indeed, by using the stack, the shellcode will be shorter.
Important points when using this technique:

  • We must push our string onto the stack in reverse order (last characters first), because the stack grows from high addresses to low addresses
  • We need a reference to our string, which we get from RSP
  • Think about endianness: the bytes of each pushed value are stored in little-endian order

global _start

section .text
_start:
    xor rax, rax
    xor rdi, rdi
    xor rdx, rdx

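    push rdx            ; 8 zero bytes, acting as a terminator
    push 0x0a6f6c6c     ; "llo\n", stored in memory as 6c 6c 6f 0a
    push word 0x6548    ; "He", stored in memory as 48 65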
    mov al, 0x1
    mov dil, 0x1
    mov rsi, rsp
    mov dl, 6
    syscall
    mov al, 60
    syscall
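
The immediates pushed in the listing above (0x0a6f6c6c and 0x6548) can be derived from the string instead of being worked out by hand. Here is a small sketch of that calculation; pack_le is a hypothetical helper name of mine, and it computes the value with shifts so the result does not depend on the machine that runs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack up to 8 characters into an integer so that chunk[0] ends up in the
 * lowest byte. On a little-endian machine, that is the byte stored at the
 * lowest address once the value is pushed on the stack.
 * pack_le is a hypothetical helper name used only for this sketch. */
uint64_t pack_le(const char *chunk, size_t n)
{
    assert(chunk != NULL && n <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++)
        value |= (uint64_t)(unsigned char)chunk[i] << (8 * i);
    return value;
}
```

With "Hello\n" split into "He" and "llo\n", pack_le("llo\n", 4) gives 0x0a6f6c6c and pack_le("He", 2) gives 0x6548. The last chunk is pushed first, so the string reads in the right order starting at RSP.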

1.3.3. RIP Relative Addressing Technique

The x86-64 architecture brings a new shellcode development technique. Indeed, a new addressing mode has been added for this architecture: RIP-relative addressing, which NASM exposes through the REL keyword. It allows us to:

  • write position-independent code (PIC)
  • compute an address relative to RIP
  • write, for example: lea rsi, [rel label]

This is what our shellcode looks like using this technique:

global _start

_start:
    jmp main
    hello: db "Hello", 0xA

main:
    xor rax, rax
    xor rdi, rdi
    xor rdx, rdx

    mov al, 0x1
    mov dil, 0x1
    lea rsi, [rel hello]
    mov dl, 6
    syscall
    mov al, 60
    syscall

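We have to jump over our string and declare it before main so that the displacement from RIP to the string is negative. A small negative displacement is padded with 0xff bytes, whereas a small positive one would be padded with NULL Bytes.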

If we look at what this code gives once assembled:

$ objdump -M intel -d a

a:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <_start>:
  400080:   eb 06                   jmp    400088 <main>

0000000000400082 <hello>:
  400082:   48                      rex.W
  400083:   65 6c                   gs ins BYTE PTR es:[rdi],dx
  400085:   6c                      ins    BYTE PTR es:[rdi],dx
  400086:   6f                      outs   dx,DWORD PTR ds:[rsi]
  400087:   0a                      .byte 0xa

0000000000400088 <main>:
  400088:   48 31 c0                xor    rax,rax
  40008b:   48 31 ff                xor    rdi,rdi
  40008e:   48 31 d2                xor    rdx,rdx
  400091:   b0 01                   mov    al,0x1
  400093:   40 b7 01                mov    dil,0x1
  400096:   48 8d 35 e5 ff ff ff    lea    rsi,[rip+0xffffffffffffffe5]        # 400082 <hello>
  40009d:   b2 06                   mov    dl,0x6
  40009f:   0f 05                   syscall
  4000a1:   b0 3c                   mov    al,0x3c
  4000a3:   0f 05                   syscall

We can see the RIP-relative reference used to locate our character string.
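
To see where the bytes e5 ff ff ff in the lea come from, we can redo the calculation: the displacement is the address of the target minus the address of the instruction that follows the lea. The sketch below assumes nothing beyond the addresses in the listing above; rip_displacement and has_null_byte are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement stored in a RIP-relative operand: the target address minus
 * the address of the instruction that follows the lea.
 * rip_displacement is a hypothetical helper name. */
int32_t rip_displacement(uint64_t target, uint64_t next_instruction)
{
    int64_t diff = (int64_t)(target - next_instruction);
    assert(diff >= INT32_MIN && diff <= INT32_MAX);
    return (int32_t)diff;
}

/* Returns 1 if any of the four encoded bytes of disp is 0x00, else 0. */
int has_null_byte(int32_t disp)
{
    uint32_t bits = (uint32_t)disp;
    for (int i = 0; i < 4; i++) {
        if (((bits >> (8 * i)) & 0xff) == 0)
            return 1;
    }
    return 0;
}
```

With the values of the listing, rip_displacement(0x400082, 0x40009d) is -27, that is 0xffffffe5, encoded as e5 ff ff ff with no NULL Byte. If the string were placed 8 bytes after the lea instead, the displacement would be 8, encoded as 08 00 00 00, and has_null_byte would report a problem.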
I think you have all the information you need to build your own shellcodes. By the way, here are some exercise ideas.