The Motivation And Design Behind Autogenerated Challenges

Christopher Ganas

April 11, 2016

In nearly all CTF competitions, organizers spend dozens of hours creating challenges that are compiled once, with no thought for variation or alternate deployments. For example, a challenge may hard-code a flag, making it hard to change later, or hard-code a system-specific resource.

At ForAllSecure, we are working to build automatically generated challenges from templates. For example, when creating a buffer overflow challenge, you should be able to generate 10 different instances to practice on, and those instances should be deployable anywhere at a moment's notice. While we can't automate away the placement of subtle bugs and clever tricks, we can add meaningful sources of variance to challenges without much additional effort, with the added bonus that the challenges become easier to deploy.

Defining Challenge Specifications

We differentiate between autogenerated challenges and standard challenges. A standard challenge is built once for the purpose of a contest. An autogenerated challenge is built from a framework, and can be rebuilt and redeployed on demand.

The key features in our autogeneration framework are:

The ability to rebuild a challenge on demand.
The ability to template a problem so that we can create many instances that vary only in the templated (and possibly randomly generated) parameters.
The ability to deterministically rebuild and redeploy without changing the secret flags. Deterministic replay only works if the random parameters are replayable.

In our approach, an autogenerated challenge consists of challenge metadata, a generation script, and templatable challenge files (source code and other problem resources). Together these make up a self-contained challenge specification, which we can use to create an endless stream of similar challenge instances. The metadata provides information about the challenge such as the author, category, point value, and any hints. The generation script templates and compiles challenge resources into a specific challenge instance (read: a fully realized and deployed challenge) that competitors can play.

The value of deterministic rebuild and redeployment without changing the secret flags was not obvious to us at first. However, we've found it to be an important property. If we need to edit a problem mid-contest, we want the flags to stay the same so that the contest remains consistent. We make generation a deterministic process by enforcing that all challenge generation scripts rely on a single source of provided randomness. In our implementation, we derive the randomness for instance generation from a global generation secret, the challenge_name, and the number assigned to a particular instance (typically 0…n).

By providing an instance number we can generate unique entropy for any number of instances for a particular challenge.
By providing a challenge_name we ensure that no two problems have the same source of entropy for a given instance number.
And finally by providing a global secret we can trivially change all problem instances from competition to competition.
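
As a rough illustration of the derivation described above, the sketch below seeds a PRNG from the three inputs. The HMAC-SHA256 construction, the 32-character hex flag format, and the names instance_rng and instance_flag are assumptions made for this example; they are not taken from the picoCTF implementation.

```python
import hashlib
import hmac
import random


def instance_rng(global_secret: bytes, challenge_name: str, instance: int) -> random.Random:
    """Return a PRNG seeded only by the three inputs named in the text."""
    # Hypothetical derivation: HMAC-SHA256 keyed by the global secret over
    # "<challenge_name>:<instance>". The real framework may combine them differently.
    message = f"{challenge_name}:{instance}".encode()
    digest = hmac.new(global_secret, message, hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))


def instance_flag(global_secret: bytes, challenge_name: str, instance: int) -> str:
    """Draw a 32-hex-character flag from the instance's PRNG (format is assumed)."""
    rng = instance_rng(global_secret, challenge_name, instance)
    return "".join(rng.choice("0123456789abcdef") for _ in range(32))


# Rebuilding with the same inputs reproduces the same flag.
same_a = instance_flag(b"contest-2016", "Pwn1", 0)
same_b = instance_flag(b"contest-2016", "Pwn1", 0)
# Changing any one input gives an unrelated flag.
other_instance = instance_flag(b"contest-2016", "Pwn1", 1)
other_secret = instance_flag(b"contest-2017", "Pwn1", 0)
```

Because every random choice flows from the one seeded PRNG, editing a challenge's source and rebuilding it leaves the flag untouched as long as the generation script draws its random values in the same order.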

Case Study: Variable Length Buffers And Canaries

Note: The examples below are targeted at the picoCTF implementation of autogeneration which also includes system dependency management and other subsystems. They serve as a great introduction to the power of autogenerated challenges.

To give a more concrete example of what autogeneration can look like for the average challenge author, I’m going to walk through creating Pwn1, a local pwnable with an autogenerated buffer size and stack canary, by commenting on its challenge specification.

To reiterate, our challenge specification consists of a problem.json for the challenge metadata, a challenge.py for challenge generation, and finally vuln.c for the actual source code. This walkthrough assumes you have some understanding of Python, JSON, and C. Some familiarity with the picoCTF autogeneration framework would help; however, the code should read as meaningful pseudocode without it.

problem.json

{
    "name": "Pwn1",
    "category": "Binary Exploitation",
    "description": "Exploit the challenge here and read the flag file.",
    "score": 50,
    "hints": [],
    "author": "Christopher Ganas",
    "organization": "ForAllSecure"
}

These fields are specific to the picoCTF framework; however, it is worth noting that the description field for the challenge instance is also templated. In this case, we are leveraging the implicit fields of our deployment to include the location of the local binary on our challenge server. This is our basic challenge metadata.

vuln.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#define FLAG_LEN
char flag[FLAG_LEN];
void vuln(){
    char canary[] = "{canary}";
    char buf[{buffer_length}];
    gets(buf);
    puts(buf);
    fflush(stdout);
    if(strncmp(canary, "{canary}", {len(canary)}) != 0) {
        puts("BUFFER OVERFLOW DETECTED!");
        exit(1);
    }
}
void call_me() {
    puts("Congratulations! Have a shell.");
    char *args[2];
    args[0] = "/bin/sh";
    args[1] = 0;
    execve(args[0], args, NULL);
}
int main(int argc, char **argv){
    // Set the gid to the effective gid
    // this prevents /bin/sh from dropping the privileges
    gid_t gid = getegid();
    setresgid(gid, gid, gid);
    puts("Echo Server 0.1 ALPHA");
    vuln();
    return 0;
}

Our challenge source code takes advantage of the templating offered by our generation framework to provide a stack canary and a buffer_length for our challenge to use. Authors are given full control to place Python variables and functions into the templating scope for this purpose.
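
To make the templating step more tangible, here is a minimal stand-in renderer. It is only a sketch: the real framework uses its own template engine, and the render function, the single-brace placeholder pattern, and the example values (canary "4821", buffer length 40) are all assumptions for illustration. The placeholder names canary, buffer_length, and len come from the fields set in challenge.py.

```python
import re

# Matches {name} or {func(name)}. The C block braces in TEMPLATE are left
# alone because none of them is followed by an identifier and a closing brace.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)(?:\(([A-Za-z_]\w*)\))?\}")


def render(template, scope):
    """Substitute placeholders using values and callables found in scope."""
    def substitute(match):
        head, arg = match.group(1), match.group(2)
        if arg is None:
            return str(scope[head])          # {name}
        return str(scope[head](scope[arg]))  # {func(name)}
    return PLACEHOLDER.sub(substitute, template)


# A shortened, hypothetical version of the vulnerable function.
TEMPLATE = '''void vuln(){
    char canary[] = "{canary}";
    char buf[{buffer_length}];
    gets(buf);
    if(strncmp(canary, "{canary}", {len(canary)}) != 0) {
        exit(1);
    }
}
'''

# Example scope: two values plus the builtin len, exposed as a callable.
scope = {"canary": "4821", "buffer_length": 40, "len": len}
rendered = render(TEMPLATE, scope)
print(rendered)
```

Exposing len in the scope is what lets the template compute the comparison length from the canary itself, so the two can never drift apart between instances.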

challenge.py

from hacksport.problem_templates import CompiledBinary
import string

StarterTemplate = CompiledBinary(sources=["vuln.c"], flag_file="flag.txt")

class Problem(StarterTemplate):
    def initialize(self):
        self.len = len
        self.canary = "".join(self.random.sample(string.digits, 4))
        self.buffer_length = 4 * self.random.randint(8, 17)

The picoCTF framework goes above and beyond to provide some generic challenge templates, in our case CompiledBinary, to promote code reuse. This template in particular will take care of compiling vuln.c as a 32-bit setgid binary and creating a protected flag.txt file with the instance’s flag.

We then extend CompiledBinary with our Problem class definition by supplying an initialize function that will be called before challenge templating. Any class fields will automatically be placed into the templating scope. In the case of Pwn1, we include a random 4-digit canary string and a random buffer length that is a multiple of 4 between 32 and 68.
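
The sketch below mimics that flow without any picoCTF code, to show how fields set in initialize could end up in a templating scope. The stand-in Problem class, the seeded self.random attribute, and the templating_scope and build helpers are hypothetical. The two random draws are copied from challenge.py: sampling 4 distinct digits, and 4 * randint(8, 17), which yields multiples of 4 from 32 through 68 because randint includes both endpoints.

```python
import random
import string


class Problem:
    """Stand-in for the walkthrough's Problem class; no picoCTF code is used."""

    def __init__(self, seed):
        # Hypothetical: the framework is assumed to hand each instance
        # a seeded PRNG as self.random.
        self.random = random.Random(seed)

    def initialize(self):
        self.len = len
        self.canary = "".join(self.random.sample(string.digits, 4))
        self.buffer_length = 4 * self.random.randint(8, 17)


def templating_scope(problem):
    """Collect the instance fields, skipping the PRNG itself."""
    return {k: v for k, v in vars(problem).items() if k != "random"}


def build(seed):
    """Create one instance and return the scope its templates would see."""
    problem = Problem(seed)
    problem.initialize()
    return templating_scope(problem)


first = build(0)
again = build(0)  # same seed, so the same canary and buffer length
print(first["canary"], first["buffer_length"])
```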

The rest of the picoCTF shell framework will take care of packaging and deploying our instances. So our responsibility as a challenge author ends with the completion of the challenge specification.

If you are interested in seeing more examples there are a few on the picoCTF wiki:

Deploy a unique flag for every instance of a binary exploitation challenge.
Change the encrypted key and template the running service for a crypto challenge.
Autogenerate web exploitation challenges in a consistent and portable way.

Some Benefits Of Autogenerated Challenges

Replayability

One of the most helpful aspects of autogenerated challenges (from a learning perspective) is the ability to play multiple versions of the same challenge. A practical example is randomizing the stack canary length of a beginner pwnable (an adaptation of the problem we created in the walkthrough) with a trivial EIP overwrite and a print_flag function. Competitors will run into subtle padding issues when trying to adapt their exploit from instance to instance. The insight gained from working around these differences allows competitors to master concepts faster and, more importantly, to demonstrate learning in an educational setting. For much the same reason that a Calculus I class asks students to master integration of similar functions, autogeneration allows us to apply the same learning principles to topics like binary exploitation and applied cryptography.

Challenge integrity

When running a competition with an open forum (e.g., IRC or Slack), it can be a nightmare for organizers to moderate the channel effectively. Beyond the issue of unauthorized hint distribution, organizers face a unique challenge in mitigating the damage from competitors dropping flags in a global channel. Replacing the flags in static challenges can be impractical, particularly for crypto or forensics challenges where the flag is hidden in a binary blob or a filesystem image. With autogeneration, however, replacing challenge flags is trivial. Additionally, in a setup where users have a 1/n chance of being assigned to any given challenge instance, only a subset of users would even be affected. Redeploying an autogenerated challenge with a different secret allows any administrator to change its flag without first having to understand the challenge. This minimizes concerns over broken patches and keeps the competition moving.

If it becomes necessary, autogenerating new challenge instances can also reduce the effects of competitors finding unauthorized challenge writeups or sharing solve scripts. If you redeploy a challenge, causing buffer sizes and names to change, it is unlikely that contestants will remain able to solve it by copying and pasting exploits from a walkthrough. Of course, this does not prevent them from taking the code given and making the necessary modifications to solve the challenge, but it does make it a more involved process. They might even learn something about the challenge while doing it!

Cheating detection

Provided your competition infrastructure is set up to randomly assign challenge instances from a set of reasonable size, you can use autogeneration to detect flag sharing and even track down the distributor of leaked problems. In the case of autogenerated flags, where a significant portion of the string is a hash, it is very unlikely that a competitor assigned instance 0 would happen to submit the correct flag for instance 5 by chance. If a user claims to have accidentally mistyped the 32 characters of a friend's hash, the evidence becomes substantially more damning if the trend continues over multiple problems. If the userbase is similar in size to the set of challenge instances, it becomes trivial to determine the other member of the cheating pair with convincing probability. Likewise, if you find a set of challenges you wanted to keep private out in the world and wish to identify the individual responsible, the pool of possible candidates shrinks exponentially with each problem leaked. Because of this exponential partitioning, only a few problems have to be recovered to arrive at a match.
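
Both checks can be sketched in a few lines. Everything here is hypothetical: the hash-based assigned_instance stands in for whatever random assignment your infrastructure uses, and the function names and data shapes are invented for the example. The arithmetic behind the exponential claim is that with U users and n instances per problem, roughly U / n^k users are expected to match k leaked problems.

```python
import hashlib


def assigned_instance(user, problem, n_instances):
    """Hypothetical assignment: hash (user, problem) into one of n instances."""
    digest = hashlib.sha256(f"{user}|{problem}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_instances


def shared_flag_source(user, problem, submitted, flags):
    """Return the instance whose flag was submitted, if it is not the user's own.

    flags maps instance number (0..n-1) to that instance's flag.
    """
    own = assigned_instance(user, problem, len(flags))
    for instance, flag in flags.items():
        if flag == submitted and instance != own:
            return instance
    return None


def leak_candidates(users, leaked, n_instances):
    """Intersect, problem by problem, the users assigned each leaked instance.

    leaked maps problem name to the instance number of the leaked copy.
    """
    candidates = set(users)
    for problem, instance in leaked.items():
        candidates = {u for u in candidates
                      if assigned_instance(u, problem, n_instances) == instance}
    return candidates


users = [f"user{i}" for i in range(1000)]
problems = ["Pwn1", "Crypto1", "Web1", "Forensics1"]
culprit = "user42"
# Simulate a leak: the copies found in the wild are the culprit's instances.
leaked = {p: assigned_instance(culprit, p, 10) for p in problems}
suspects = leak_candidates(users, leaked, 10)
```

With 1000 users and 10 instances per problem, four leaked problems are expected to leave about 1000 / 10^4 = 0.1 innocent users in the suspect set alongside the culprit.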

The Future of Challenge Development

Challenge authors are still in complete control of the static and dynamic parts of their challenges when creating challenge specifications, making autogenerated challenges appropriate for use in all typical CTF categories. The picoCTF-shell-manager framework was designed to handle packaging and autogeneration of challenges with or without the rest of the picoCTF infrastructure, making it a viable solution for any organization. The versatility and power of autogenerated challenges are actively being explored by ForAllSecure in its cybersecurity training programs. For the sake of organizers and competitors, consider taking advantage of autogenerated problems in your next competition.
