Leveraging glibc in exploitation - Part 2: Fingerprinting glibc
In part one of this series, we examined the GNU C library and its relationship with a program and the operating system. We also reviewed tools and methods for figuring out important details such as glibc’s version and where it is loaded in a program’s memory at runtime. In this part, we will look at what is involved in locating glibc in memory by leveraging a program’s memory layout, and identifying glibc’s version based on code loaded in memory.
Posts in this series
- Part one: What is glibc?
- Part two: Fingerprinting glibc
- Part three: Defenses
- Part four: An example
Dealing with address space layout randomization (ASLR)
Before we can begin poking a program or glibc at the binary level, we need to discuss ASLR and why it makes doing so challenging.
ASLR is a general strategy for randomizing where memory regions are mapped in user space processes. A memory-mapped region can be anything from a dynamically linked library, to the call stack or heap space. Implementations of ASLR vary from one operating system to another, which means each implementation has its nuances and quirks. For example, Jacob Thompson’s Mandiant blog post describes how Microsoft’s design choices led to several peculiar behaviors in Windows' ASLR implementation. 1
As discussed in part one, dynamically linked libraries are mapped into the memory of a dependent program. The kernel maps library code into reserved segments of a program’s memory, which allows that code to be referenced by memory address. On Linux, ASLR effectively randomizes where those memory regions are mapped each time the dependent program runs. This makes it difficult for a hacker to predict the memory addresses of useful code without risking a memory segmentation violation - that is, accessing memory that has not been allocated to the process.
Like most “mitigations”, ASLR attempts to address a symptom of a problem rather than the underlying issue itself. That is: if a hacker exploits a memory management bug, ASLR should make it very difficult to predict where any hacker-controlled code or other useful memory-mapped data will appear in memory. This theoretically increases the monetary and time costs required to research and develop a successful exploit.
It is not controversial to say ASLR is just another hindrance to skilled exploit developers. In Brad Spengler’s 2013 blog post on kernel space ASLR, he states:
ASLR was always meant to be a temporary measure and its survival for this long speaks much less to its usefulness than our inability to get our collective acts together and develop/deploy actual defenses against the remaining exploit techniques. 2
Besides being an apparent headache, ASLR practically created its own niche market of bugs and research. While this post will only explore ASLR on Linux at a high level, there are many excellent papers that discuss bypassing ASLR. I recommend checking out “return-to-csu: A New Method to Bypass 64-bit Linux ASLR” by Doctors Hector Marco and Ismael Ripoll. 3 Their research takes a look at ASLR on 64-bit x86 systems, as well as a potential technique to bypass it using glibc.
“From IP ID to Device ID and KASLR Bypass” by Amit Klein and Benny Pinkas, as well as “Remote iPhone Exploitation Part 2: Bringing Light into the Darkness – a Remote ASLR Bypass” by Samuel Groß are also worth checking out. 4 5 Those researchers explore side channels and the limitations of different operating systems' architectures to bypass K/ASLR remotely.
ASLR on Linux
Linux offers a runtime lever to control ASLR in the form of a pseudo file located at /proc/sys/kernel/randomize_va_space. Writing to this pseudo file will change how ASLR behaves, and reading from it will return the current ASLR mode. These modifications only persist until the machine is rebooted. There are three possible values that can be written. These are documented in the Linux kernel sysctl documentation under randomize_va_space:
0 - Turn the process address space randomization off. This is the default for architectures that do not support this feature anyways, and kernels that are booted with the “norandmaps” parameter.
1 - Make the addresses of mmap base, stack and VDSO page randomized. This, among other things, implies that shared libraries will be loaded to random addresses. Also for PIE-linked binaries, the location of code start is randomized. This is the default if the CONFIG_COMPAT_BRK option is enabled.
2 - Additionally enable heap randomization. This is the default if CONFIG_COMPAT_BRK is disabled. 6
One quick way to view ASLR in action is with /proc/self/maps. If you recall from the previous part, the /proc pseudo-filesystem provides a convenient interface into the memory mappings of a process. The magic self/ directory aliases to the current process. This can be combined with the grep program to view its memory mappings when ASLR is enabled or disabled:
# With ASLR disabled:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
0
$ /usr/bin/grep libc /proc/self/maps
7ffff7dd3000-7ffff7df5000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
$ /usr/bin/grep libc /proc/self/maps
7ffff7dd3000-7ffff7df5000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
$ /usr/bin/grep libc /proc/self/maps
7ffff7dd3000-7ffff7df5000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
As you can see from the output above, the C library is loaded at the same address (0x7ffff7dd3000) during each execution of grep when ASLR is disabled.
Now, go ahead and enable ASLR and run grep again:
# With ASLR enabled:
$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
2
$ /usr/bin/grep libc /proc/self/maps
7f41c6bf7000-7f41c6c19000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
$ /usr/bin/grep libc /proc/self/maps
7fbeb9f31000-7fbeb9f53000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
$ /usr/bin/grep libc /proc/self/maps
7f6d33a04000-7f6d33a26000 r--p 00000000 fd:03 15207062 /usr/lib/x86_64-linux-gnu/libc-2.31.so
# ...
You may have noticed a pattern between the ASLR’d addresses. Let’s take a closer look at these addresses:
# Note: Addresses have been padded with zeros because they are 64-bit
# integers and "maps" does not include the leading zeros. I also added
# spaces and the number of bits to make the randomness easier to identify.
#
# Bits: 8 16 24 32 40 48 56 64
# 0x00 00 7f 41 c6 bf 70 00
# 0x00 00 7f be b9 f3 10 00
# 0x00 00 7f 6d 33 a0 40 00
As you can see, not all bits in each address are randomized.
Why is the glibc object being loaded at partially-randomized locations? Partly because that is just how the Linux kernel works. User space starts at 0x00007fffffffffff, growing downward with new allocations. 7 While the deeper mechanics of this are outside the scope of this blog post, I did spend quite some time trying to understand this behavior. Marco Bonelli wrote an excellent summary on stackoverflow.com on the subject. 8
At the time of writing this post, the general assessment is that libraries will be loaded at addresses starting with 0x00007f, and program data (such as functions) at addresses starting with 0x000055. In any case, this means the upper 24 bits of a memory-mapped library will be predictable.
As for the lower 12 bits, Linux maps libraries on memory page boundaries. 9 10 Since the usual page size of 4,096 bytes takes exactly 12 bits to address (2^12 = 4,096), the kernel would not be able to guarantee memory-mapped page alignment if those lower 12 bits were randomized (the page size can be retrieved with getconf PAGE_SIZE). You can test this by converting one of the memory addresses to base 10 and dividing it by the memory page size in bytes. The resulting number will have no remainder.
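If you would rather not do that arithmetic by hand, the same check can be expressed in a few lines of C. This is only a small sketch; the address below is one of the ASLR’d glibc addresses from the earlier grep output, and any mapped address from your own system can be substituted:
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* One of the randomized glibc addresses observed earlier. */
    uint64_t addr = 0x7f41c6bf7000;

    /* sysconf() reports the page size at runtime (usually 4096 bytes). */
    long page_size = sysconf(_SC_PAGESIZE);

    /* A page-aligned address divides evenly by the page size. */
    printf("page size: %ld bytes\n", page_size);
    printf("remainder: %lu\n", (unsigned long)(addr % (uint64_t)page_size));
    return 0;
}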
This leaves us with 28 bits of entropy (64 - 24 - 12). While that is still a lot of bits of entropy, you can see how this might be problematic on a 32-bit CPU, which has similar constraints and less entropy. Keep in mind many IoT devices and embedded computers still use 32-bit CPUs. Such constraints still play a role in modern computers.
Another consequence of ASLR on Linux is that it only influences where “objects” (like the call stack or heap) are mapped. In other words, ASLR cannot slice up a dynamically linked library and map arbitrary functions to random locations. This means leaking the mapped location (or “base address”) of a library will allow us to locate other pieces of code deterministically if we know their offsets relative to the beginning of the object.
Abusing the call stack
One of the most reliable ways to bypass ASLR is through information leaks that reveal the addresses of specific memory segments. The “call stack” (often called “the stack”) is a good target for such leaks. This is where data related to the process' execution state resides. One way to think of this is to imagine the call stack like a tower of Jenga blocks, where each block in the stack of blocks is a statement in your program’s source code. The current executing function’s context is just a small slice of the call stack, and it is identified by beginning and end memory addresses stored in CPU registers. State from a parent function can be resumed by simply changing the beginning and end memory addresses stored in the corresponding CPU registers.
Even with ASLR, the layout of the call stack will be similar between executions of the program - regardless of the computer. This is because the size of data stored on the call stack must be known at a program’s compile time. The layout of the call stack is effectively part of the compiled program. Dynamically sized data is stored elsewhere on the “heap”. Allocating heap memory is more computationally expensive, and not guaranteed to succeed. You can see why programmers may be tempted to store as much as they can on the call stack, increasing the likelihood of stack-based memory management mistakes.
The main consequence of this design is: if you leak data at a relative offset from the call stack, you can expect that data to be in the same relative location across executions of the program.
Another byproduct of this design is that the entire stack is readable by the code in the program. Imagine if your programming language of choice permitted you to read the value of a local variable from another function - that is effectively the capability a call stack-based information leak provides.
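To make that idea concrete, here is a small, contrived C sketch (not taken from the example program below) in which one function leaves a pointer to a glibc function on the stack and a second function reads it back through an uninitialized local array. Reading uninitialized memory is undefined behavior, and the result depends on the compiler and optimization level (it behaves as described when built with gcc -O0), so treat it purely as an illustration:
#include <stdio.h>

/* Leaves a pointer to a glibc function (printf) in a local variable.
 * The local lives on the call stack and is not scrubbed on return. */
static void leave_pointer_behind(void)
{
    void *leftovers[4] = { (void *)&printf, 0, 0, 0 };
    (void)leftovers;
}

/* Reuses roughly the same region of the stack. The uninitialized array
 * may still contain whatever the previous call left there. */
static void read_leftovers(void)
{
    void *stale[4];
    printf("possible leftover glibc pointer: %p\n", stale[0]);
}

int main(void)
{
    leave_pointer_behind();
    read_leftovers();
    return 0;
}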
An example
Let’s take a look at this functionality using another example program: a very simple TCP listener created by Professors Bryant and O’Hallaron of Carnegie Mellon University 11:
src: tcpserver.c
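The full listing is collapsed above. As a stand-in for readers who just want something to compile, the sketch below is a simplified, single-connection version I wrote to mirror the behavior this post relies on: a fixed-size stack buffer that is zeroed with bzero, filled by read, printed, and echoed back with a strlen-derived length. It is not the original CMU listing, and its line numbers do not match the ones referenced later:
/* tcpserver-sketch.c - a simplified stand-in for tcpserver.c (not the original). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFSIZE 1024

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        return 1;
    }

    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons((unsigned short)atoi(argv[1]));
    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 1);

    int childfd = accept(listenfd, NULL, NULL);

    char buf[BUFSIZE];                        /* fixed-size local buffer on the call stack */
    bzero(buf, BUFSIZE);                      /* zero the buffer (line 126 in the original) */
    int n = (int)read(childfd, buf, BUFSIZE);

    printf("server received %d bytes: %s", n, buf); /* breakpoint target (line 130 in the original) */
    write(childfd, buf, strlen(buf));               /* echo back; length comes from strlen() */

    close(childfd);
    close(listenfd);
    return 0;
}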
The program takes one argument: the port number to listen on. It then starts a TCP listener on loopback at that port, reads some data from a TCP client, and writes it to both standard output and back to the client. Compile it and set a breakpoint with gdb at the call to printf after read on line 130:
# The "-g" will include debugging information in the resulting executable.
gcc -g -o tcpserver tcpserver.c
gdb ./tcpserver
# The following lines are executed in the gdb shell.
(gdb) b tcpserver.c:130
(gdb) r 6666
The r 6666 command starts the program with the argument 6666 (the TCP port to listen on). We can hit the breakpoint by making a TCP connection to that port on loopback and writing some data to it with netcat (nc). Execute the following in another shell:
echo 'AAAA' | nc 127.0.0.1 6666
Back in the debugger, you should see a note from gdb that the breakpoint was hit:
Breakpoint 1, main (argc=2, argv=0x7fffffffe788) at tcpserver.c:130
130 printf("server received %d bytes: %s", n, buf);
We can use this opportunity to take a look at what is on the call stack using the “examine” command - the syntax being:
x/<number of memory chunks><size of each chunk> <start address>
If you are new to gdb, bear with me - it is definitely not straightforward.
For example, x/14a $rsp will retrieve 14 chunks of memory starting at the memory address stored in the rsp CPU register (often called “the stack pointer” register). The rsp register stores the address of the top of the call stack.
The size of a single chunk is determined by the second argument. In this case, that is the size of a single memory address (“a” for “address”) for the current CPU. On a 64-bit CPU, a single address (or pointer) is 8 bytes (64 bits). As a result, this retrieves 14 pointers-worth of memory starting at the top of the call stack:
# As previously noted, this assumes you are on an x86 64-bit processor.
(gdb) x/14a $rsp
0x7fffffffe220: 0x7fffffffe788 0x2ffffe2b0
0x7fffffffe230: 0x7fffffffe2c0 0x100000010
0x7fffffffe240: 0x300001f90 0x500000004
0x7fffffffe250: 0x7ffff7fc2b80 0x7ffff7fc6510
0x7fffffffe260: 0x100007f901f0002 0x000
0x7fffffffe270: 0x100007fa2c80002 0x0
0x7fffffffe280: 0xa41414141 0x0
The hex-encoded integers in the leftmost column represent call stack memory addresses starting at the top of the stack, which happens to be 0x7fffffffe220. The other two columns contain the hex-encoded data found at the top of the stack, represented in little-endian order, and split into pointer-sized chunks (64 bits, or 8 bytes).
For extra confusion, gdb does not display the leading zeros that would convey the memory addresses or their corresponding values being 64 bits in width. The only clue about values being 64 bits is the addressing on the left, which is in increments of 16 bytes (thus each column represents 8 bytes, or 64 bits).
Looking at 0x7fffffffe280, we can see that our AAAA\n string ended up on the stack in the form of 0xa41414141. Why the stack? Because that is where the buf variable is stored. Local variables with fixed sizes (like buf) are stored on the stack. The variable’s value is reversed because x86 processors store byte sequences in little-endian order. Since the variable’s value was first zeroed out using the bzero function, we are left with 1,019 zeros trailing behind the string.
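If the byte reversal is hard to picture, this tiny C program (a sketch unrelated to the example server) shows how those same five input bytes read back as the integer 0xa41414141 on a little-endian machine:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8] = {0};                  /* zeroed, like buf after bzero()   */
    memcpy(buf, "AAAA\n", 5);           /* the bytes sent with netcat       */

    uint64_t chunk;
    memcpy(&chunk, buf, sizeof(chunk)); /* read 8 bytes back as one integer */

    /* On a little-endian CPU the first byte in memory becomes the least
     * significant byte of the integer, so this prints 0xa41414141. */
    printf("0x%lx\n", (unsigned long)chunk);
    return 0;
}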
Let’s see what else is on the stack by retrieving 152 pointers-worth of memory (1,216 bytes), again starting at the top of the stack:
(gdb) x/152a $rsp
0x7fffffffe220: 0x7fffffffe788 0x2ffffe2b0
0x7fffffffe230: 0x7fffffffe2c0 0x100000010
0x7fffffffe240: 0x300001f90 0x500000004
0x7fffffffe250: 0x7ffff7fc2b80 0x7ffff7fc6510
0x7fffffffe260: 0x100007f901f0002 0x000
0x7fffffffe270: 0x100007fa2c80002 0x0
0x7fffffffe280: 0xa41414141 0x0
0x7fffffffe290: 0x0 0x0
# ...
0x7fffffffe670: 0x0 0x0
0x7fffffffe680: 0x7fffffffe780 0xbd40dd910d243300
0x7fffffffe690: 0x0 0x7ffff7dfa0b3 <__libc_start_main+243>
0x7fffffffe6a0: 0x7ffff7ffc620 0x7fffffffe788
0x7fffffffe6b0: 0x200000000 0x55555555538f <main>
0x7fffffffe6c0: 0x5555555556a0 0xe33c7eb32d1202ea
0x7fffffffe6d0: 0x555555555280 0x7fffffffe780
Why are we looking at pointer-sized memory chunks, you ask? Since the stack is creatively re-used to store data for different function calls, we might be able to locate pointers (memory addresses) that were pushed onto the stack by previous function calls.
One helpful gdb feature is that it automatically annotates addresses that point to known “things” like global variables and function addresses. These types of human-readable identifiers are colloquially known as symbols. You can see this in the output above in the form of text flanked by < >. For example, 0x55555555538f is where the main function is mapped to. By simply looking for these helpful annotations in the call stack, perhaps we can find a pointer to a known glibc symbol…
Sure enough, we can find one glibc symbol: __libc_start_main+243. The +243 indicates that the address points to the glibc __libc_start_main function plus 243 bytes (base 10). I am not sure why gdb displays a base 10 offset alongside a base 16 memory address… again, another sub-optimal gdb-ism. On a side note, I only know that __libc_start_main is a glibc symbol because I searched around in Google.
We can confirm that this is indeed a glibc symbol by examining the memory mappings. The address (0x7ffff7dfa0b3 from the output above) fits in the second memory region, which happens to map to glibc:
(gdb) info proc mappings
# ...
0x7ffff7dd3000 0x7ffff7df8000 0x25000 0x0 /usr/lib/x86_64-linux-gnu/libc-2.31.so
0x7ffff7df8000 0x7ffff7f70000 0x178000 0x25000 /usr/lib/x86_64-linux-gnu/libc-2.31.so
0x7ffff7f70000 0x7ffff7fba000 0x4a000 0x19d000 /usr/lib/x86_64-linux-gnu/libc-2.31.so
Call stack layout reproducibility
This is also a good opportunity to double-check that the call stack layout remains the same between executions of the program. If that assumption is true, we will be able to locate this glibc pointer in the same location relative to, say, the buf variable across executions of the program. First we need to calculate the relative offset between where the __libc_start_main+243 pointer is stored and where the local buf variable is stored. We can accomplish this using gdb:
# Get the exact address of "__libc_start_main+243" pointer:
(gdb) x/1a 0x7fffffffe690+8
0x7fffffffe698: 0x7ffff7dfa0b3 <__libc_start_main+243>
# Get the address of the "buf" variable:
(gdb) p &buf
$1 = (char (*)[1024]) 0x7fffffffe280
# Subtract the address of "buf" from the address of the glibc pointer:
(gdb) print 0x7fffffffe698 - 0x7fffffffe280
$2 = 1048
# Double check that buf+1048 contains the "__libc_start_main+243" pointer:
(gdb) x/1a buf+1048
0x7fffffffe698: 0x7ffff7dfa0b3 <__libc_start_main+243>
Now that we know the pointer is stored at an offset of 1,048 bytes from buf, we can rerun the program, retrieve a pointer-sized chunk at buf+1048, and confirm that the returned value is __libc_start_main+243:
# Note: gdb disables ASLR by default. To really demonstrate that this
# technique is unaffected by ASLR, you need to stop gdb from disabling
# ASLR using the following command:
(gdb) set disable-randomization off
(gdb) kill
Kill the program being debugged? (y or n) y
[Inferior 1 (process 32177) killed]
(gdb) r 6666
Starting program: /tcpserver 6666
server established connection with localhost (127.0.0.1)
Breakpoint 2, main (argc=2, argv=0x7ffe45c2f9a8) at tcpserver.c:130
130 printf("server received %d bytes: %s", n, buf);
(gdb) x/1a buf+1048
0x7ffe45c2f8b8: 0x7f37b30570b3 <__libc_start_main+243>
This demonstrates that certain data can be predictably found on the call stack at a consistent relative location, regardless of whether ASLR is enabled. The only way to discover what data is stored on the stack is to research the application or library code. In this case, we did this using dynamic analysis.
The significance of seemingly random glibc addresses
You are probably wondering why we are so interested in glibc addresses, and why we can find them on the stack at all. These addresses are artifacts left behind from glibc’s initialization. When a program starts, code inserted by the C compiler executes before the programmer’s code. This leaves behind data such as pointers to code in glibc. These artifacts are typically overwritten - unbeknownst to the programmer - when memory is zeroed out or variables are initialized. However, that does not guarantee all such artifacts are scrubbed from the process' memory.
The significance of these addresses is that they can hint at the version of glibc used by a vulnerable program. Once we know the glibc version, we can derive where other glibc functions or global variables are mapped in memory relative to glibc’s base address. Open-source databases such as niklasb/libc-database and blukat29/search-libc (a web UI wrapper for the former project) can identify or suggest glibc versions based on the offsets of leaked addresses. The more addresses you can leak, the more accurate these tools become.
Even with a single address, these tools can still be effective. If you take the address of __libc_start_main+243 from the previous output and subtract 243, you will get the actual address of that symbol:
# <__libc_start_main+243> <243 base 16> <__libc_start_main addr>
0x7f37b30570b3 - 0xf3 = 0x7f37b3056fc0
Go ahead and drop the symbol name __libc_start_main and that address into libc.blukat.me. You may be surprised to see it suggests only three possible glibc versions:
libc6_2.31-0ubuntu9.1_amd64
libc6_2.31-0ubuntu9.2_amd64
libc6_2.31-0ubuntu9_amd64
In my case, libc6_2.31-0ubuntu9.2_amd64 is the version installed in the Docker container I used for this example. While the database presented three possibilities, a hacker can potentially automate testing for the correct version. If you are unable to find any matching glibc versions for an address, it is possible that the database does not know about the particular version or OS-specific variant you are using (I ran into this with Kali Linux, which apparently maintains its own glibc packages).
How was the database able to work its way back to glibc versions from a single ASLR’d glibc address? If you recall from earlier, memory mappings in Linux must be page-aligned. This results in glibc being mapped to a base address whose lower 12 bits are zero, which means the lower 12 bits of any address within glibc stay consistent regardless of whether ASLR is enabled.
In other words, the database is simply looking for instances of __libc_start_main with addresses whose lower 12 bits match the lower bits of the address we supplied. Recall that ASLR only influences where objects are mapped in memory (the object in this case being glibc). ASLR does not have the capability to randomize where arbitrary data within an object are mapped.
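The matching boils down to comparing the low 12 bits of the leaked address against the low 12 bits of each candidate symbol offset. Here is a small sketch of that comparison in C; the candidate offset (0x26fc0) is the __libc_start_main offset for the 2.31 build shown in the readelf output below:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t leaked           = 0x7f37b3056fc0; /* __libc_start_main after removing +243 */
    uint64_t candidate_offset = 0x26fc0;        /* symbol offset in a candidate glibc build */

    /* ASLR cannot change the low 12 bits (the offset within a page), so a
     * plausible candidate must agree with the leak in those bits. */
    if ((leaked & 0xfff) == (candidate_offset & 0xfff))
        puts("candidate matches the leak");
    else
        puts("candidate ruled out");
    return 0;
}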
Calculating glibc’s base address
So far, we have identified some key pieces of information:
- Where to find a glibc pointer / address
- The address of the __libc_start_main glibc function
- One or more possible versions of glibc
Putting together everything we have discussed so far: we know that ASLR has limitations regarding where it can map objects, and what it can randomize. Since a memory mapping is effectively a copy of the object being mapped, we can subtract the offset of the symbol in the library shared object file from its memory-mapped location. This will reveal where the library itself is mapped in memory for a running process. This location is often referred to as the “base address”.
There are several ways to find symbol addresses in a shared object file. The readelf tool is the most straightforward, as it abstracts dumping all symbols with the -s option:
$ readelf -s /usr/lib/x86_64-linux-gnu/libc-2.31.so | grep __libc_start_main
2235: 0000000000026fc0 483 FUNC GLOBAL DEFAULT 16 __libc_start_main@@GLIBC_2.2.5
You can also use the slightly-more-unwieldy objdump tool, which can analyze file formats other than ELF. Depending on how glibc was compiled, you can likely use the -T option to dump the dynamic symbol table:
# Note: The "-r", "-R", "-t", and "-T" options all dump different symbol-related
# tables. The correct option to use may depend on the compile-time options used.
$ objdump -T /lib/x86_64-linux-gnu/libc-2.31.so | grep __libc_start_main
0000000000026fc0 g DF .text 00000000000001e3 GLIBC_2.2.5 __libc_start_main
The glibc database tool (particularly, blukat) also tells you the function’s offset relative to the beginning of the file if you click on a glibc version.
The first address listed in both readelf and objdump (0x026fc0) is the offset of the function relative to the start of the glibc shared object. If we subtract that from the function’s memory-mapped address, we get the base address of the glibc library at runtime:
# <function addr at runtime> <offset in file> <glibc base addr>
0x7f37b3056fc0 - 0x026fc0 = 0x7f37b3030000
We can easily narrow down glibc candidates because we have direct access to the vulnerable program at runtime. In the real world, a hacker would need to calculate addresses for different glibc versions and test them. The difficulty of this depends on whether the vulnerable program restarts automatically if it exits unexpectedly. Referencing incorrect memory addresses often results in the program crashing or being killed by the operating system. A crash can be used as a test condition by the hacker.
I have seen such predictors described as “crash oracles”, although that might be a bit sloppy of a characterization. 12 In any case, a TCP reset or other transport-level event can serve as a good signal for such an oracle.
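In practice a crash oracle can be as simple as: connect, send a candidate payload, and see whether the connection dies. The C sketch below illustrates that shape only; the host, port, payload, and the definition of “crashed” are all hypothetical and would depend on the target:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the service appears to have crashed after receiving the
 * payload (connection refused, reset, or closed without a reply).
 * This is a hypothetical helper, not code from the example server. */
static int probe_crashes_target(const char *ip, unsigned short port,
                                const void *payload, size_t len)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return 1;  /* could not connect - possibly still down from a crash */
    }

    write(fd, payload, len);

    char reply[64];
    ssize_t n = read(fd, reply, sizeof(reply)); /* a healthy echo server replies */
    close(fd);
    return n <= 0; /* reset/EOF instead of a reply counts as a crash */
}

int main(void)
{
    char payload[16];
    memset(payload, 'A', sizeof(payload));
    printf("crashed: %d\n",
           probe_crashes_target("127.0.0.1", 6666, payload, sizeof(payload)));
    return 0;
}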
Finding addresses of other glibc code
Locating other glibc code in memory is easy once you identify the glibc version and its base address. Simply take the offset of the desired code in glibc’s shared object file and add that offset to glibc’s base address. Let’s try it out with the system function:
$ readelf -s /usr/lib/x86_64-linux-gnu/libc-2.31.so | grep system
1427: 0000000000055410 45 FUNC WEAK DEFAULT 16 system@@GLIBC_2.2.5
# <glibc base addr> <offset> <"system" function addr>
# 0x7f37b3030000 + 0x055410 = 0x7f37b3085410
We can confirm that the address is correct using gdb:
(gdb) p &system
$1 = (int (*)(const char *)) 0x7f37b3085410 <__libc_system>
We will examine how this can be used in an exploit in part four.
The importance of zeroing memory
I was initially surprised that there were so few glibc pointers littering the stack of this program. Based on my previous experience with CTF challenges, I simply expected to see more addresses leftover from glibc initialization. Thinking back to one CTF challenge in particular, I realized why my expectations were skewed: the CTF program did not zero out memory before storing input data. The tcpserver program, on the other hand, does so on line 126. Let’s see what happens when we comment that logic out:
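(The listing is collapsed here; the change amounts to commenting out the single zeroing call described above. A rough sketch of the affected lines, assuming the variable names used earlier, follows.)
/* bzero(buf, BUFSIZE); */         /* line 126: zeroing disabled for this experiment */
n = read(childfd, buf, BUFSIZE);   /* buf now keeps whatever was already on the stack */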
After re-compiling the program, set a breakpoint for line 130 again, and write some data using nc like before:
(gdb) b tcpserver.c:130
Breakpoint 1 at 0x1607: file tcpserver.c, line 130.
(gdb) r 6666
Starting program: /tcpserver 6666
server established connection with localhost (127.0.0.1)
Breakpoint 1, main (argc=2, argv=0x7fffffffe788) at tcpserver.c:130
130 printf("server received %d bytes: %s", n, buf);
(gdb) x/32a buf
0x7fffffffe280: 0x7f0a41414141 0x7ffff7ffe4f8
0x7fffffffe290: 0x0 0x7ffff7fcd1a8
0x7fffffffe2a0: 0x7ffff7ff42bf 0x1
0x7fffffffe2b0: 0xffffffff 0x7fffffffe304
0x7fffffffe2c0: 0x7ffff7dd9790 0x7ffff7fc3000
0x7fffffffe2d0: 0x7fffffffe4f0 0x7fffffffe3c0
0x7fffffffe2e0: 0x0 0x7fffffffe300
0x7fffffffe2f0: 0x7fffffffe3f0 0x7ffff7fe1bcc
0x7fffffffe300: 0x7ffff7ffc739 <_rtld_global_ro+281> 0x7ffff7fcffb0
0x7fffffffe310: 0x7fffffffe450 0x7
0x7fffffffe320: 0x800000007 0x7ffff7fcf580
0x7fffffffe330: 0x7ffff7ffd9e8 <_rtld_global+2440> 0x7ffff7fdcf14
0x7fffffffe340: 0x9 0x7ffff7fdd799
0x7fffffffe350: 0x7fffffffe3a0 0x7ffff7dedab0
0x7fffffffe360: 0x7ffff7fc3000 0x0
0x7fffffffe370: 0x7fffffffe420 0x0
While there are no glibc pointers in this memory, there are pointers to dynamic linker data. Zeroing out memory is standard practice for most C programmers, and I think this really demonstrates why it is important to do so.
Saved by the call stack canary
There is another neat learning opportunity in the tcpserver program. In C, strings are handled by either carefully saving their size, or by placing a “null” (0x00) byte at the end of the string. The latter practice is known as “null terminating a string”. A relatively common bug in C programs is failing to place a null terminator in a string or buffer variable, which is the case in this example program.
Even though the code zeros out the buffer, a hacker can provide a string that is exactly the length of the buffer variable - thus overwriting all of the 0x00 bytes. This becomes a problem when the program tries to write the user-supplied data back to the user because it uses the strlen function:
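(Again the listing is collapsed; the relevant call looks roughly like the following, assuming the same variable names as before.)
n = write(childfd, buf, strlen(buf)); /* echo back as many bytes as strlen() counts */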
The strlen function finds the length of a string or buffer variable by counting until it finds a 0x00 byte. This can be very unsafe if the buffer is not null-terminated, as there is no limit to strlen’s counting. In this case, strlen is used to figure out how many bytes of buf should be written back to the user. This bug can be used to leak information from the process.
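To see the over-read in isolation, here is a small, self-contained C sketch (unrelated to the server’s actual stack layout) in which a completely filled, unterminated buffer sits directly next to data the program never intended to send. Strictly speaking the over-read is undefined behavior, but it reliably demonstrates the point:
#include <stdio.h>
#include <string.h>

/* The two arrays are placed in one struct so their adjacency is guaranteed. */
struct frame {
    char buf[8];    /* attacker-controlled, filled completely (no null terminator) */
    char secret[8]; /* adjacent data the program never meant to send */
};

int main(void)
{
    struct frame f;
    memcpy(f.buf, "AAAAAAAA", 8);   /* exactly fills buf, overwriting every 0x00 */
    memcpy(f.secret, "secret!", 8); /* includes a trailing null byte */

    /* strlen() keeps counting past buf and into secret until it hits 0x00. */
    printf("strlen(buf) = %zu\n", strlen(f.buf)); /* prints 15, not 8 */
    printf("leaked: %s\n", f.buf);                /* prints AAAAAAAAsecret! */
    return 0;
}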
[Un]lucky for us, modern versions of gcc automatically apply the call stack canary mitigation. Without going into too much detail, this means a null terminator is always placed after local variables at runtime. As a result, the strlen here may still leak some data, but it will never leak sensitive program state stored on the “other side” of the canary.
We will discuss this behavior in more detail in the next post.
Summary
We covered quite a bit in this post, including:
- ASLR on Linux and its limitations (at a high-level)
- The memory layout of the call stack is reproducible across executions of a program
- Linux maps libraries to addresses starting with 0x00007f and program data to addresses starting with 0x000055
- How to determine the glibc version and the addresses of code within using a hypothetical call stack-based information leak
- The importance of zeroing out memory and initializing variables - especially before returning that data to users
- The importance of null-terminating buffers or byte strings
- How to examine memory and locate variables using gdb
In part three, we will take a look at a purposely-vulnerable program and the built-in defenses it offers.
References
1. Thompson, Jacob. 2020, March 17. “Six Facts about Address Space Layout Randomization on Windows”.
2. Spengler, Brad. 2013, March 20. “KASLR: An Exercise in Cargo Cult Security”. Note: Also at: forums.grsecurity.net
3. Marco-Gisbert, Hector and Ripoll, Ismael. 2018, March 20. “return-to-csu: A New Method to Bypass 64-bit Linux ASLR”. Note: Also at: www.semanticscholar.org
4. Klein, Amit and Pinkas, Benny. 2019, June. “From IP ID to Device ID and KASLR Bypass”. Note: Also at: ui.adsabs.harvard.edu
5. Groß, Samuel. 2020, January 9. “Remote iPhone Exploitation Part 2: Bringing Light into the Darkness – a Remote ASLR Bypass”.
6. www.kernel.org. Accessed: 2022, January 15. “Documentation for /proc/sys/kernel/* kernel version 2.2.10”.
7. www.kernel.org. Accessed: 2022, January 15. “Complete virtual memory map with 4-level page tables”.
8. Bonelli, Marco. 2020, May 2. “Why does Linux favor 0x7f mappings?”.
9. cs4401.walls.ninja. Accessed: 2022, January 15. “Lecture Notes: Address Space Layout Randomization”.
10. kitctf.de. Accessed: 2022, January 15. “The Tools We Built”.
11. Bryant, Randal and O’Hallaron, David. 1999, December 7. “tcpserver.c”. Note: Related material: www.cs.cmu.edu
12. Groß, Samuel. 2020, April 28. “Fuzzing ImageIO”.