Fuzzing Made Easy Part #1: A beginner’s guide to writing a fuzzing harness

h0mbre: “Harnessing targets is some of the most satisfying work related to fuzzing. It feels great to successfully sandbox a target and have it play nice with your fuzzer.”

Key takeaways

Identify the target function: Choose your fuzzing candidate wisely. Examples and tutorials of the target library are usually a good guide on how to interface the harness effectively.
Resource management: Implement a cleanup to prevent a memory leak and to maintain consistency across fuzzing iterations (see xmlFreeTextReader(doc)).
Validation checks: Always check (and maybe parse) the returned result of the function you are fuzzing. Look out for pointers to invalid memory addresses and remember that you must trigger the library’s parsing functionality as completely as possible.
Keep Improving: Fuzzing is an iterative process, and your harness may evolve as you gain insights into your application's behavior under various inputs.


In this first article of our series, we focus on customizing a fuzzing harness—the key to effective fuzz testing. We assume you know the basics of fuzzing and how to use tools like AFL++ and Honggfuzz.

Fuzzing is a powerful technique, but without a harness, even advanced tools like AFL++ and Honggfuzz are constrained in their ability to test complex targets. A harness allows these tools to interface seamlessly with intricate APIs or non-standard input formats. That way, deeper vulnerabilities can be uncovered more efficiently. To follow along, you need basic C coding skills and experience with Linux.

Introduction

A fuzzing harness is a program that enables fuzz testing of specific functions. It connects the fuzzer to the software under test (“SUT”). The harness translates test inputs from the fuzzing engine into a format the SUT can understand.

Figure 1. Fuzzing: automated data mutation to trigger bugs

Besides being the interface between the fuzzer and the target, a harness also initializes the target API for testing.
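To make the idea of "translating" inputs concrete, here is a minimal sketch. It assumes a hypothetical target function, count_digits(), that expects a NUL-terminated string, while the fuzzer hands over a raw byte buffer with an explicit length. Both the function and its behavior are invented for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical target: expects a NUL-terminated string and returns
// the number of ASCII digits it contains.
int count_digits(const char *text) {
  int digits = 0;
  for (; *text != '\0'; text++) {
    if (*text >= '0' && *text <= '9') digits++;
  }
  return digits;
}

// The harness translates the fuzzer's raw bytes into the
// NUL-terminated string the target expects.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char *text = malloc(size + 1);
  if (text == NULL) return 0;
  if (size > 0) memcpy(text, data, size);
  text[size] = '\0';

  count_digits(text);

  free(text);  // release per-iteration memory
  return 0;
}
```

Passing data directly to count_digits() would be a harness bug, not a target bug: the fuzzer's buffer is not guaranteed to end with a NUL byte, so the target could read past its end.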

In some cases, for example, when testing command-line applications that read input files, a custom harness isn’t strictly necessary. However, creating one can improve fuzzing speed and coverage even in those cases.

Benefits of a good fuzzing harness

A well-designed fuzzing harness uncovers hidden bugs that might otherwise go unnoticed. Here’s how it boosts your fuzzing efforts:

1. Input Translation: First, the harness ensures the fuzzer interacts with the target SUT correctly. Many applications require inputs in specific formats or through specific interfaces, and a well-built harness ensures that the fuzzing tool can meet those requirements.

2. Maximized Code Coverage: Next, a solid harness optimizes code coverage by activating various API configurations. This helps the fuzzer explore more code, increasing the likelihood of finding vulnerabilities.

3. Continuous Fuzzing: When it comes to continuous fuzzing, especially in CI/CD pipelines, a good harness constantly probes your code for vulnerabilities (Klooster et al. 2023). A harness providing scalable tests can make all the difference, helping you catch bugs early and adapt to changes in the codebase. (Though, as we’ll explain in a future post, there are better ways to approach this :-))

4. Targeted Testing: Lastly, a harness can be customized to focus on specific parts of an application. This allows for more precise bug hunting, especially when paired with advanced techniques like in-memory fuzzing or detailed coverage metrics (Haboob).

In summary, whether testing a small app or a complex system, a well-crafted fuzzing harness is your best friend in identifying security issues before others do.
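One common way to "activate various API configurations" is to let the fuzzer choose them. The sketch below assumes a hypothetical target, count_a(), with two invented option flags. The harness spends the first input byte on the options and passes the rest as the payload.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical option flags of a hypothetical target API.
#define OPT_IGNORE_CASE 0x1u
#define OPT_STOP_AT_NUL 0x2u

// Hypothetical target: counts occurrences of 'a' under the given options.
size_t count_a(const uint8_t *buf, size_t len, unsigned options) {
  size_t hits = 0;
  for (size_t i = 0; i < len; i++) {
    uint8_t c = buf[i];
    if ((options & OPT_STOP_AT_NUL) && c == 0) break;
    if ((options & OPT_IGNORE_CASE) && c == 'A') c = 'a';
    if (c == 'a') hits++;
  }
  return hits;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 1) return 0;

  // The first byte selects the options; mask it to the defined bits.
  unsigned options = data[0] & (OPT_IGNORE_CASE | OPT_STOP_AT_NUL);

  // The remaining bytes are the actual payload.
  count_a(data + 1, size - 1, options);
  return 0;
}
```

The trade-off is that existing seed files need an extra leading byte, so this layout is best decided before a corpus is collected.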

Limitations of fuzzing

Let’s keep realistic expectations, though. While fuzzing is a powerful tool for finding vulnerabilities, it's just one part of a much bigger picture in a comprehensive Secure Development Lifecycle (SDL). Fuzzing is a key puzzle piece—useful, but you still need other practices (like threat modeling, code reviews, and penetration testing) to cover all security bases.

Where does fuzzing fall short compared to other practices?

Logic flaws: Fuzzing can’t catch everything. It’s fantastic at spotting certain bugs quickly, especially memory issues, but it’s not so great with logical flaws, authentication problems, or encryption weaknesses.

Coverage: Even with advanced, coverage-guided fuzzers, achieving full code coverage is tough. Some parts of the code might rely on specific conditions or rare inputs that the fuzzer doesn’t generate—like magic values or deep program states. In some cases, certain code paths are unreachable, depending on the harness setup.
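The magic-value problem can be illustrated with a small sketch. The parse_packet() function and its "FUZ1" header are hypothetical. A purely random mutator has roughly a 1 in 2^32 chance per attempt of producing four specific bytes, so most inputs never get past the first check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical target: the interesting code is only reached when the
// input starts with a specific 4-byte magic value.
int parse_packet(const uint8_t *data, size_t size) {
  static const uint8_t magic[4] = {'F', 'U', 'Z', '1'};

  if (size < sizeof(magic) || memcmp(data, magic, sizeof(magic)) != 0) {
    return -1;  // most random inputs end here
  }

  // Deeper parsing logic would start here; we just report the
  // number of bytes that follow the magic value.
  return (int)(size - sizeof(magic));
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  parse_packet(data, size);
  return 0;
}
```

In practice, a seed file that already starts with the magic value, or a dictionary entry containing it, usually gets the fuzzer past such a gate.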

False positives and negatives: Depending on what and how you fuzz, you may find a fair share of false positives and negatives. You may miss vulnerabilities due to limited edge-case testing, complex bug conditions, or issues like injection flaws, weak cryptography, or missing authorization checks. Conversely, your fuzzer can trigger crashes that wouldn’t occur in normal use, for example, when pre-handlers filter inputs before they actually reach the function you are fuzzing directly.
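Here is a minimal sketch of how such a false positive can arise. All names are hypothetical. The internal function store_name() trusts its caller to have checked the length, and the public entry point handle_request() does exactly that.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME_LEN 15

// Hypothetical internal function: trusts its caller to guarantee
// len <= MAX_NAME_LEN. Called with a longer input, it overflows dst.
static void store_name(char *dst, const uint8_t *src, size_t len) {
  if (len > 0) memcpy(dst, src, len);
  dst[len] = '\0';
}

// Hypothetical public entry point: filters the input before it
// reaches the internal function.
int handle_request(const uint8_t *data, size_t size) {
  char name[MAX_NAME_LEN + 1];

  if (size > MAX_NAME_LEN) return -1;  // the "pre-handler" check
  store_name(name, data, size);
  return (int)strlen(name);
}

// Harnessing the public entry point keeps the filter in the loop.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  handle_request(data, size);
  return 0;
}
```

A harness that called store_name() directly with a fixed 16-byte buffer would report an overflow for every long input, even though no caller in the real program can trigger it. Whether such a finding still deserves a fix is a judgment call, but it is not an exploitable bug in this sketch.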

Context: Finally, fuzzers aren't "aware" of the bigger picture. They don't know what's considered "correct" behavior beyond spotting crashes or exceptions. So, they can't catch more subtle issues, like data leaks or compliance violations.
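A harness can hand the fuzzer a little of that missing context by turning an expectation into a crash. The sketch below uses a hypothetical hex encoder and decoder and checks a round-trip property: decoding the encoded input must give back the original bytes. If it does not, the harness calls abort(), which the fuzzer records like any other crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical target pair: hex encoding and decoding.
static void hex_encode(const uint8_t *in, size_t len, char *out) {
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < len; i++) {
    out[2 * i] = digits[in[i] >> 4];
    out[2 * i + 1] = digits[in[i] & 0x0f];
  }
}

static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decodes 2 * len hex characters into len bytes; returns -1 on bad input.
static int hex_decode(const char *in, size_t len, uint8_t *out) {
  for (size_t i = 0; i < len; i++) {
    int hi = hex_value(in[2 * i]);
    int lo = hex_value(in[2 * i + 1]);
    if (hi < 0 || lo < 0) return -1;
    out[i] = (uint8_t)((hi << 4) | lo);
  }
  return 0;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size == 0) return 0;

  char *hex = malloc(2 * size);
  uint8_t *back = malloc(size);
  if (hex != NULL && back != NULL) {
    hex_encode(data, size, hex);
    // A wrong result would not crash on its own, so we force a crash.
    if (hex_decode(hex, size, back) != 0 || memcmp(data, back, size) != 0) {
      abort();
    }
  }

  free(hex);
  free(back);
  return 0;
}
```

This only covers properties you can state in code. It does not make the fuzzer aware of authentication rules or compliance requirements.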

While fuzzing is a great tool, it’s not a one-stop solution for securing software.

Getting started: Don’t despair

Fuzzing is a skill that takes time to master, and it is normal to feel overwhelmed at first. There are a few common challenges that most beginners run into, but don’t worry, you’re not alone (Nourry et al. 2023).

Start with a simple target. Setting up the environment is often one of the first hurdles, with issues like build failures, missing dependencies, or system configurations that just don’t want to cooperate. Then there’s the tricky part of choosing the right target to fuzz. Many newcomers dive straight into overly complex programs, only to feel frustrated when progress is slow, or results are hard to come by. Start testing simple programs or utilities that process straightforward inputs. File format converters are a good start.

Be patient. Many fuzzing tools aren’t exactly beginner-friendly. They often require you to know your way around low-level programming, debugging, or even reverse engineering. And even when you start finding crashes, reproducing them reliably or figuring out the root cause can feel like solving a puzzle without all the pieces. On top of that, fuzzing can be a resource hog, demanding plenty of computational power and time to yield meaningful results. Instead of reinventing the wheel, consider using prebuilt tools and frameworks. Tools like AFL++, libFuzzer, and Honggfuzz have helpful documentation and examples to guide you through the setup process and avoid common mistakes.

Optimize. As you dive deeper, focus on coverage-guided fuzzers. Tools like AFL++ provide feedback on explored code paths, giving you a clearer picture of what's happening and where to focus your efforts.

To investigate identified crashes, familiarize yourself with tools like GDB, Valgrind, and reverse engineering platforms like Ghidra or IDA Pro. These tools are essential for analyzing crashes and understanding the program's behavior during fuzzing.

You'll need to tweak your target code for fuzzing. This might involve specific changes, like disabling checksums or modifying dependencies on global state, to ensure your code is fuzzing-friendly while minimizing false positives.
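As a sketch of what "disabling checksums" can look like, the hypothetical parse_message() below skips its checksum test when FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is defined. That macro name is a convention for fuzzing-only code changes. To our knowledge the AFL++ compiler wrappers define it automatically, while with other toolchains you may have to pass it yourself with -D. The message format and checksum are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical checksum: sum of all payload bytes modulo 256.
uint8_t checksum(const uint8_t *payload, size_t len) {
  uint8_t sum = 0;
  for (size_t i = 0; i < len; i++) sum = (uint8_t)(sum + payload[i]);
  return sum;
}

// Hypothetical target: the last byte of a message is the checksum of
// the bytes before it. Returns the payload length, or -1 on rejection.
int parse_message(const uint8_t *msg, size_t size) {
  if (size < 1) return -1;
  size_t payload_len = size - 1;

#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Production builds reject messages with a wrong checksum. In a
  // fuzzing build this check is compiled out, so mutated inputs are
  // not discarded before they reach the parsing logic.
  if (checksum(msg, payload_len) != msg[payload_len]) return -1;
#endif

  // Payload parsing would happen here.
  return (int)payload_len;
}
```

Without this change, almost every mutation would break the checksum and be rejected immediately. Keep in mind that crashes found this way need a valid checksum added before they reproduce against a production build.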

Keep on. These obstacles aren’t there to stop you. Everyone starts somewhere, and most seasoned practitioners of fuzz testing have faced these same struggles early on. The key is to take it one step at a time, focus on small victories, and trust that with persistence you’ll build the skills to tackle even the most advanced fuzzing challenges. Keep going, you’ve got this! And this blog post series is here to guide you along your path.

You are not alone—join online communities. Platforms like Discord's Awesome Fuzzing or Reddit’s r/ExploitDev are great for connecting with others in the field. Sharing advice and learning from others’ experiences can help you stay motivated and overcome challenges.

Prepare for writing a fuzzing harness

In this article, we focus on writing harnesses for C/C++ targets. The approach is similar in other programming languages like Rust, Go, and Python. We’ll feature these in another article.

For C/C++, there is a de facto standard entry point for fuzzing harnesses, introduced by libFuzzer and also supported by AFL++ and Honggfuzz: LLVMFuzzerTestOneInput():

#include <stdio.h>
#include <stdint.h>

// If the target needs to be initialized once then put
// these calls in this function:
int LLVMFuzzerInitialize(int *argc, char ***argv) {
  return 0;
}

// This is the harness function, here you apply the mutated
// input in “data” with size “size” to the target function.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // 1. Check first that “size” is the necessary length we
  // need and return otherwise

  // 2. Transform “data” of “size” to the necessary 
  // parameters of the target functions

  // 3. Call the initial functions necessary to set up the 
  // call you want to fuzz

  // 4. Call the target function with the mutated input

  // 5. Check return code and returned data validity

  // 6. If necessary, perform cleanup and reset the state.
  // Avoid memory leaks by any means!

  return 0;
}
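To show how the six steps can look once filled in, here is a sketch against a hypothetical target, repeat_text(), which returns a newly allocated string. The target, the input layout (first byte is the repeat count, the rest is the text), and the limit of 8 repetitions are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical target: returns a newly allocated string holding `text`
// repeated `times` times, or NULL if the allocation fails.
char *repeat_text(const char *text, unsigned times) {
  size_t len = strlen(text);
  char *out = malloc(len * times + 1);
  if (out == NULL) return NULL;
  for (unsigned i = 0; i < times; i++) memcpy(out + i * len, text, len);
  out[len * times] = '\0';
  return out;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // 1. We need one byte for the count plus at least one byte of text.
  if (size < 2) return 0;

  // 2. First byte -> repeat count (kept small), rest -> C string.
  unsigned times = data[0] % 8;
  size_t text_len = size - 1;
  char *text = malloc(text_len + 1);
  if (text == NULL) return 0;
  memcpy(text, data + 1, text_len);
  text[text_len] = '\0';

  // 3. This target needs no further setup.

  // 4. Call the target function with the mutated input.
  char *result = repeat_text(text, times);

  // 5. Check the result: its length must match what we asked for.
  if (result != NULL && strlen(result) != strlen(text) * times) abort();

  // 6. Free everything allocated in this iteration.
  free(result);
  free(text);
  return 0;
}
```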


In this example, we inspect libxml2, a highly versatile and widely used library for parsing, validating, and manipulating XML and HTML documents. Many higher-level libraries (e.g., lxml in Python) use libxml2 as their underlying engine to provide user-friendly APIs while retaining the power of C-level performance. libxml2 works across various platforms including Linux, Windows, macOS, and embedded systems.

First, we download libxml2 and compile it for fuzzing. For beginners, the hardest part is taming the build system to successfully compile the target for fuzzing. In this example, we fuzz using AFL++, but the approach works just as well with libFuzzer, Honggfuzz, or LibAFL.

To avoid frustration, follow these easy steps to download and compile libxml2 for fuzzing:

1. Download the latest version of libxml2

git clone https://github.com/GNOME/libxml2
cd libxml2


2. Compile libxml2 for fuzzing

We will do this in three steps:

2.1. Run the ls command

The listing shows that the project can be built with either cmake or autogen.sh. Let us use autogen.sh:

./autogen.sh

2.2. Configure the libxml2 library

Configure it as a static library and use the AFL++ compiler wrapper, afl-cc, so that the target is instrumented for fuzzing. (We also turn off Python support because we do not need it. You can and should disable everything you do not need or are not interested in):

CC=afl-cc ./configure --disable-shared --without-python


2.3. Compile it

Run the following command:

make -j4


Creating a fuzzing harness for a specific application requires two key preparations:

1. Identifying the target function

2. Understanding the input format and the function interface

1. Identify the target function

Choose a function or API that processes user input or handles potentially untrusted data. Wanting to fuzz the libxml2 parsing library, we start by looking at examples. These are located in the ./example/ directory.

Looking at the parse*.c and read*.c files, we find out that two different interfaces are exposed: The first one is called “xmlReadFile” and the second one “xmlReaderForFile”, a newer implementation.

Both read from files. When we fuzz, however, we try to avoid writing test files for each iteration, as the increased I/O takes time and stresses the disk. Looking at the other examples or the include files that export those read functions, we find alternative entry points that are perfect for fuzzing—“xmlReadMemory” and “xmlReaderForMemory” respectively.

We decide to focus on “xmlReaderForMemory”.

2. Understand the input format and the function interface

We now need to examine the function signature and understand its parameters and return values. Let’s determine what type of input the target function expects (byte array, file, options, etc.).

In our example, this is quite straightforward:

xmlTextReaderPtr xmlReaderForMemory(const char *buffer, int size, const char *URL, const char *encoding, int options);


We can pass our input data and its length as the buffer and size arguments. The documentation, the examples, and the source code show that URL and encoding may be NULL and that options may be 0.

Create a fuzzing harness

Now, let’s create a harness for our target!

We use reader1.c as a template: we change the function signature of streamFile() to that of LLVMFuzzerTestOneInput(), read from memory instead of from a file, and drop the process_node debug output. This gives us the following simple harness:

#include <libxml/xmlreader.h>

int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) {

  // Assign our fuzzing data to the xmlReaderForMemory function
  // Note that this acquires memory we must free before any return!
  xmlTextReaderPtr doc = xmlReaderForMemory((const char *)data, size, NULL, NULL, 0);

  if (doc) {  // If this succeeded:
    // Ensure we parse the whole XML document
    while (xmlTextReaderRead(doc) == 1);

    // Free the memory, otherwise we have a memory leak.
    xmlFreeTextReader(doc);
  }

  // Done.
  return 0;
}

Compile the harness and fuzz

To compile the harness, we need to link the libxml2 static library and all additional libraries that libxml2 itself requires. Undefined reference errors from the linker indicate which libraries are still missing. We can also check the generated Makefile. This approach should work for most users:

afl-cc -o harness -fsanitize=fuzzer harness.c .libs/libxml2.a -Iinclude -lz -lm


Finding out which libraries to link against is often the hardest part, as many build systems are opaque. Running “make -n” prints the build commands that make would execute for an example application without running them, which reveals the libraries it links against.

To fuzz, we use the ./test/ XML example directory of the library as the initial seeds:

afl-fuzz -i test -o out ./harness


Congratulations! We have successfully created our first simple harness. In the next article of this series, we dive deep into creating high-performance harnesses, identifying which parts of a library are not fully covered by fuzzing.

To learn more about AFL++ and fuzzing we highly recommend this link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/fuzzing_in_depth.md

Conclusion

To run an effective fuzzing campaign, you need to write a custom harness. Its main functionalities are initialization, input delivery, and result analysis.

Some key takeaways from our harness development process include:

1. Identify the target function: Choose your fuzzing candidate wisely. Examples and tutorials of the target library are usually a good guide on how to interface the harness effectively.

2. Resource management: Implement a cleanup to prevent a memory leak and to maintain consistency across fuzzing iterations (see xmlFreeTextReader(doc) above).

3. Validation checks: Always check (and maybe parse) the returned result of the function you are fuzzing. Look out for pointers to invalid memory addresses and remember that you must trigger the library’s parsing functionality as completely as possible.

4. Keep Improving: Fuzzing is an iterative process, and your harness may evolve as you gain insights into your application's behavior under various inputs. Stay curious and keep refining your approach.

Now that you have written your first harness, you are ready to dive into common pitfalls and best practices to refine your fuzzing harnesses!


In this article series, we put together all pieces of the jigsaw puzzle:

#2: Common fuzzing mistakes and further best practices

#3: How to write harnesses for Go and fuzz Go applications

#4: How to write harnesses for Rust and Python and fuzz them

#5: How to scope a software target for APIs to fuzz

#6: The different types of fuzzing harnesses

#7: Effective seeding

#8: How to perform coverage analysis

#9: How to run fuzzing campaigns

#10: Continuous fuzzing campaigns

Special thanks to Reviewer Stephan Zeisberg.

Photo by Clark Van Der Beken on Unsplash

Editing by Maria A. Sivenkova
