1. How does a computer system work?

Before going into the details of assembly language, it helps to briefly review how a computer system is organized and how it operates.

1.1 Control Unit

The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. It tells the computer's memory, arithmetic/logic unit and input and output devices how to respond to a program's instructions.

It directs the operation of the other units by providing timing and control signals. Most computer resources are managed by the CU, which also directs the flow of data between the CPU and the other devices. In modern computer designs, the control unit is typically an internal part of the CPU, with its overall role and operation unchanged since its introduction.

1.2 Arithmetic and Logic Unit (ALU)

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs.

The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed; the ALU's output is the result of the performed operation. In many designs, the ALU also has status inputs or outputs, or both, which convey information about a previous operation or the current operation, respectively, between the ALU and external status registers.
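
As a rough illustration of the paragraph above, the sketch below models a toy ALU in Python: two operands and an operation code go in, a result and a few status outputs come out. The 8-bit width, the opcode names (`ADD`, `SUB`, `AND`, `OR`, `XOR`) and the `alu` function itself are assumptions made for this example; they do not describe any real processor.

```python
WIDTH = 8                 # assumed word size of this toy ALU
MASK = (1 << WIDTH) - 1   # 0xFF: keeps values within 8 bits


def alu(opcode, a, b):
    """Return (result, status) for two unsigned operands and an opcode."""
    a &= MASK
    b &= MASK
    if opcode == "ADD":
        raw = a + b
    elif opcode == "SUB":
        raw = a - b
    elif opcode == "AND":
        raw = a & b
    elif opcode == "OR":
        raw = a | b
    elif opcode == "XOR":
        raw = a ^ b
    else:
        raise ValueError(f"unknown opcode: {opcode}")

    result = raw & MASK
    status = {
        "zero": result == 0,                      # every result bit is 0
        "carry": raw != result,                   # carry out (ADD) or borrow (SUB)
        "negative": bool(result >> (WIDTH - 1)),  # top bit of the result is set
    }
    return result, status


if __name__ == "__main__":
    print(alu("ADD", 200, 100))
    print(alu("SUB", 5, 5))
```

The status dictionary plays the role of the status outputs mentioned above: an external status register would store these bits so that later instructions can test them.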

1.3 Registers

Registers are very fast storage locations inside the processor itself.
There are many registers, including:

  • Memory Address Register (MAR) : holds the address of the memory location to be read or written
  • Memory Data Register (MDR) : holds the data just read from, or about to be written to, memory
  • Program Counter (PC) : holds the address of the next instruction to be fetched
  • Instruction Register (IR) : holds the instruction currently being executed
  • General Purpose Registers : can be used freely by the programmer
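
To see how these registers cooperate, here is a minimal sketch of a fetch, decode and execute loop in Python. Everything in it is hypothetical: the three-instruction set (`LOAD`, `ADD`, `HALT`), the single accumulator `ACC` standing in for a general purpose register, and the `run` function are assumptions chosen to keep the example short.

```python
def run(memory):
    """Run a toy program and return the final value of the accumulator."""
    regs = {"PC": 0, "MAR": 0, "MDR": None, "IR": None, "ACC": 0}

    while True:
        # Fetch: the address of the next instruction goes into MAR,
        # and the content of memory at that address arrives in MDR.
        regs["MAR"] = regs["PC"]
        regs["MDR"] = memory[regs["MAR"]]
        regs["IR"] = regs["MDR"]   # the instruction now being executed
        regs["PC"] += 1            # PC already designates the next instruction

        # Decode and execute.
        op, *args = regs["IR"]
        if op == "LOAD":
            regs["ACC"] = args[0]
        elif op == "ADD":
            regs["ACC"] += args[0]
        elif op == "HALT":
            return regs["ACC"]
        else:
            raise ValueError(f"unknown instruction: {op}")


program = [("LOAD", 2), ("ADD", 40), ("HALT",)]
print(run(program))  # prints 42
```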

2. Organization of a process

Now that we have a better understanding of how a CPU works, let's look at how a Linux system organizes a process.
What is a process?
A process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.

A computer program is a passive collection of instructions, while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several instances of the same program often means more than one process is being executed.
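
The difference between a program and a process can be observed with a short Python sketch that starts the same program twice and collects the identifier of each resulting process. The child program (a Python one-liner that sleeps briefly) and the `spawn_instances` helper are assumptions made for this illustration.

```python
import os
import subprocess
import sys

# Hypothetical child program: it only sleeps briefly so both instances overlap.
CHILD_CODE = "import time; time.sleep(0.2)"


def spawn_instances(count=2):
    """Start `count` instances of the same program and return their PIDs."""
    children = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE])
        for _ in range(count)
    ]
    pids = [child.pid for child in children]  # one identifier per process
    for child in children:
        child.wait()                          # let each instance finish
    return pids


if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PIDs:", spawn_instances())
```

Both children run the same executable, yet each one is a separate process with its own identifier.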

On a Linux system, there are several ways to view information about a process:

  • /proc
  • pmap
  • GDB

When a program is executed, the kernel assigns the resulting process a unique identifier called the PID (process ID). All information about this process is available in the "/proc/$pid/" directory. Let's take a closer look at its contents.

$ ping localhost&
[1] 31089

$ ls /proc/31089/
attr       cgroup      comm             cwd      fd       io        map_files  mountinfo   net        oom_adj        pagemap      root       setgroups     stack  status   timers         wchan
autogroup  clear_refs  coredump_filter  environ  fdinfo   limits    maps       mounts      ns         oom_score      personality  schedstat  smaps         stat   syscall  timerslack_ns
auxv       cmdline     cpuset           exe      gid_map  loginuid  mem        mountstats  numa_maps  oom_score_adj  projid_map   sessionid  smaps_rollup  statm  task     uid_map

As we can see, the directory contains a lot of files, but we will focus on the memory mapping of our process.

$ cat /proc/31089/maps

555555554000-555555562000 r-xp 00000000 fe:02 24647249                   /usr/bin/iputils-ping
555555761000-555555762000 r--p 0000d000 fe:02 24647249                   /usr/bin/iputils-ping
555555762000-555555763000 rw-p 0000e000 fe:02 24647249                   /usr/bin/iputils-ping
555555763000-5555557a7000 rw-p 00000000 00:00 0                          [heap]
7ffff7024000-7ffff702f000 r-xp 00000000 fe:02 24650579                   /usr/lib/libnss_files-2.26.so
7ffff702f000-7ffff722e000 ---p 0000b000 fe:02 24650579                   /usr/lib/libnss_files-2.26.so
7ffff722e000-7ffff722f000 r--p 0000a000 fe:02 24650579                   /usr/lib/libnss_files-2.26.so
7ffff722f000-7ffff7230000 rw-p 0000b000 fe:02 24650579                   /usr/lib/libnss_files-2.26.so
7ffff7230000-7ffff7236000 rw-p 00000000 00:00 0 
7ffff7236000-7ffff73e1000 r-xp 00000000 fe:02 24650591                   /usr/lib/libc-2.26.so
7ffff73e1000-7ffff75e1000 ---p 001ab000 fe:02 24650591                   /usr/lib/libc-2.26.so
7ffff75e1000-7ffff75e5000 r--p 001ab000 fe:02 24650591                   /usr/lib/libc-2.26.so
7ffff75e5000-7ffff75e7000 rw-p 001af000 fe:02 24650591                   /usr/lib/libc-2.26.so
7ffff75e7000-7ffff75eb000 rw-p 00000000 00:00 0 
7ffff75eb000-7ffff75fe000 r-xp 00000000 fe:02 24650563                   /usr/lib/libresolv-2.26.so
7ffff75fe000-7ffff77fe000 ---p 00013000 fe:02 24650563                   /usr/lib/libresolv-2.26.so
7ffff77fe000-7ffff77ff000 r--p 00013000 fe:02 24650563                   /usr/lib/libresolv-2.26.so
7ffff77ff000-7ffff7800000 rw-p 00014000 fe:02 24650563                   /usr/lib/libresolv-2.26.so
7ffff7800000-7ffff7802000 rw-p 00000000 00:00 0 
7ffff7802000-7ffff79ac000 r-xp 00000000 fe:02 24661577                   /usr/lib/libcrypto.so.42.0.0
7ffff79ac000-7ffff7bab000 ---p 001aa000 fe:02 24661577                   /usr/lib/libcrypto.so.42.0.0
7ffff7bab000-7ffff7bc8000 r--p 001a9000 fe:02 24661577                   /usr/lib/libcrypto.so.42.0.0
7ffff7bc8000-7ffff7bce000 rw-p 001c6000 fe:02 24661577                   /usr/lib/libcrypto.so.42.0.0
7ffff7bce000-7ffff7bd2000 rw-p 00000000 00:00 0 
7ffff7bd2000-7ffff7bd6000 r-xp 00000000 fe:02 24647647                   /usr/lib/libcap.so.2.25
7ffff7bd6000-7ffff7dd6000 ---p 00004000 fe:02 24647647                   /usr/lib/libcap.so.2.25
7ffff7dd6000-7ffff7dd7000 r--p 00004000 fe:02 24647647                   /usr/lib/libcap.so.2.25
7ffff7dd7000-7ffff7dd8000 rw-p 00005000 fe:02 24647647                   /usr/lib/libcap.so.2.25
7ffff7dd8000-7ffff7dfd000 r-xp 00000000 fe:02 24650573                   /usr/lib/ld-2.26.so
7ffff7fd8000-7ffff7fdc000 rw-p 00000000 00:00 0 
7ffff7ff7000-7ffff7ffa000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00024000 fe:02 24650573                   /usr/lib/ld-2.26.so
7ffff7ffd000-7ffff7ffe000 rw-p 00025000 fe:02 24650573                   /usr/lib/ld-2.26.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]

Legend:

  • Address : start and end address of the section
  • Permission : permissions on the section:
      • r = readable
      • w = writable
      • x = executable
      • p = private (not shared)
      • s = shared
  • Offset : offset in the file for memory-mapped files, 0 otherwise
  • Device : major and minor number of the device the file is loaded from
  • Inode : inode number
  • Path : file path
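
The columns described above can also be split programmatically. The following Python sketch parses one line of the maps file into named fields; the `Mapping` type and the `parse_maps_line` function are names invented for this example, and the sample line is the stack entry from the listing above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    device: str
    inode: int
    path: str


def parse_maps_line(line):
    """Split one line of /proc/<pid>/maps into its columns."""
    # The path column is optional, so split at most 5 times.
    fields = line.split(None, 5)
    addresses, perms, offset, device, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addresses.split("-"))
    return Mapping(start, end, perms, int(offset, 16), device, int(inode), path)


if __name__ == "__main__":
    sample = "7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]"
    entry = parse_maps_line(sample)
    print(entry.path, entry.perms, hex(entry.end - entry.start))
```

On a Linux machine, the same function can be applied to every line of "/proc/self/maps" to inspect the mappings of the running interpreter.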

3. Registers

Now that we know a little more about CPU operation and process management, it's time to start our adventure in the world of assembly. We will first look at the registers available on the x86_64 architecture and what they are used for.

64 bits  Lower 32 bits  Lower 16 bits  High 8 bits of 16  Low 8 bits of 16  Usage
RAX      EAX            AX             AH                 AL                General purpose / Function return value / Syscall number / Arithmetic operation result
RBX      EBX            BX             BH                 BL                General purpose
RCX      ECX            CX             CH                 CL                General purpose / Loop counter / Fourth parameter of function
RDX      EDX            DX             DH                 DL                General purpose / Remainder of divisions and upper half of multiplication results / Third parameter of function
RSI      ESI            SI             N/A                SIL               String source / Second parameter of function
RDI      EDI            DI             N/A                DIL               String destination / First parameter of function
RSP      ESP            SP             N/A                SPL               Stack pointer
RBP      EBP            BP             N/A                BPL               Stack base pointer
RIP      N/A            N/A            N/A                N/A               Next instruction to be executed
R8       R8D            R8W            N/A                R8B               General purpose / Fifth parameter of function
R9       R9D            R9W            N/A                R9B               General purpose / Sixth parameter of function
R10      R10D           R10W           N/A                R10B              General purpose
R11      R11D           R11W           N/A                R11B              General purpose
R12      R12D           R12W           N/A                R12B              General purpose
R13      R13D           R13W           N/A                R13B              General purpose
R14      R14D           R14W           N/A                R14B              General purpose
R15      R15D           R15W           N/A                R15B              General purpose

The function parameter order shown here is the one defined by the System V AMD64 calling convention, which is used on Linux.
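
The relation between a 64-bit register and its smaller names is a matter of masking and shifting. The Python sketch below shows, for an arbitrary sample value, what would be read through each name of the RAX family; the `sub_registers` function and the sample value are assumptions made for this illustration.

```python
def sub_registers(rax):
    """Return the values seen through each name of the RAX family."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # lower 32 bits
        "AX": rax & 0xFFFF,               # lower 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8 to 15 (high byte of AX)
        "AL": rax & 0xFF,                 # bits 0 to 7 (low byte of AX)
    }


if __name__ == "__main__":
    for name, value in sub_registers(0x1122334455667788).items():
        print(f"{name:>3} = {value:#x}")
```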

4. EFLAGS

EFLAGS is a register used as a collection of bits, each representing a Boolean value, that stores the results of operations and the state of the processor. On x86_64 it is extended to 64 bits under the name RFLAGS, with the upper 32 bits reserved.

[Figure: EFLAGS]
For example, the flag "ZF" (zero flag) is set to 1 when the result of an operation is zero, which is the case when a comparison finds its two operands equal. We will see the use of the flags in more detail during this tutorial.
When using gdb with the PEDA extension, here is what you can see:

[Figure: gdb_eflags]

Thanks to such extensions, it becomes easier to find interesting information.
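
Without such an extension, the raw value of the register can still be decoded by hand. The Python sketch below does so for the most common flags; the `decode_eflags` function is a name invented for this example and the value 0x246 is only a sample, not output taken from the session above.

```python
# Bit positions of the most common status and control flags in EFLAGS.
FLAG_BITS = {
    "CF": 0,   # carry
    "PF": 2,   # parity
    "AF": 4,   # auxiliary carry
    "ZF": 6,   # zero
    "SF": 7,   # sign
    "TF": 8,   # trap
    "IF": 9,   # interrupt enable
    "DF": 10,  # direction
    "OF": 11,  # overflow
}


def decode_eflags(value):
    """Return the names of the flags whose bit is set in `value`."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


if __name__ == "__main__":
    print(decode_eflags(0x246))  # prints ['PF', 'ZF', 'IF']
```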

5. Sections / Segments

The primary purpose of segment registers is to keep track of where specific segments are located in virtual memory. Each 16-bit register may hold the location of one segment; for example, the CS register refers to the code segment.
The processor can then use this register to find out where the code is in memory and access offsets within it accordingly. Because segment registers are only 16 bits wide, they cannot hold a full address; they only hold a reference to the place where a segment is loaded for a given process. Segmentation is largely unnecessary on 64-bit systems; however, registers such as FS and GS remain important because operating systems use them to point to their own per-thread data structures, as Windows does.

There are 6 segment registers:

  • Code segment (CS) : The code segment contains the executable instructions of an object file and is sometimes called the text segment. Because it has read and execute permissions but no write permission, multiple instances of the program can run concurrently. The CS register often points to the start address of the executable code of a given process.
  • Stack segment (SS) : The SS register holds the location of the stack. Specifically, SS generally refers to the memory region containing the stack, while the stack pointer (RSP) points to the top of the stack frame in use.
  • Data segments (DS, ES, FS, GS) : Four segment registers can point to different data segments: the data segment (DS), the extra segment (ES), FS and GS. FS has a notable use on Windows.

5.1. Sections

An assembly program can be divided into three distinct sections:

  • Text section (.text) : contains the program code
  • Data section (.data) : contains initialized variables
  • BSS section (.bss) : contains uninitialized variables
