AI Ruined CTFs
I love Capture-The-Flag competitions.
Or should I rather say: I used to. We decided it's best to spend some time apart, you know? With AI entering the picture… I don't want to end up in a love triangle. CTFs made it clear: their heart belongs to someone else.
Love at first sight
CTF competitions hooked me in 2019, after I participated in PicoCTF. I was entranced immediately. Soon after, COVID-19 struck and I was stuck at home with nothing to do, so instead of doing boring school work, I participated in every CTF I could find. The most notable was CSCG 2020, where I placed 21st in the global category.
In 2022, I pivoted toward competitive programming, but returned to CTFs in 2023. I founded our university's cybersecurity society in 2024, and our team ranked number one in South Africa in 2024 and 2025, closely followed by… myself! During this time I also won the BSides Joburg 2025 CTF.
What makes a good CTF challenge
Good CTF challenges teach through friction. The struggle is the point.
One of my first encounters with software security puzzles was this challenge from Securify. It is one of the best examples of a well-designed CTF challenge: simple, with the goal right there in front of you. The only thing keeping you from solving the problem is a lack of knowledge, and with a bit of reading and experimentation you can find that knowledge and eventually solve the challenge. The depth of understanding you gain from bashing your head against such a challenge is incredible.
Contrast that with a bad challenge. For example: crack this hash.
955b322ab45819879603fe24584dc4604a1a412681a0884b34a573e1a21a217b
While a well-designed crypto challenge might give you a nice Python script with an implementation flaw that you have to exploit mathematically to recover the original string, a challenge like this teaches nothing. It becomes a question of who can paste the hash into CrackStation first. Not very helpful.
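To illustrate why this kind of challenge rewards no skill: it reduces to a dictionary lookup. A minimal sketch (the target hash and wordlist here are my own illustrative examples, not the challenge's — this target is simply sha256("password")):

```python
import hashlib

# Deliberately weak example target: sha256("password").
target = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

# Tiny stand-in for a real wordlist like rockyou.txt.
wordlist = ["letmein", "hunter2", "password", "qwerty"]

def crack(target_hex, candidates):
    """Return the first candidate whose SHA-256 digest matches, else None."""
    for word in candidates:
        if hashlib.sha256(word.encode()).hexdigest() == target_hex:
            return word
    return None

print(crack(target, wordlist))  # -> password
```

That is the entire "solve": whoever has the bigger wordlist (or the faster paste into CrackStation) wins.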
The cracks
At the start of 2025, I began to notice AI creeping in. First it was one of my teammates using his ChatGPT Pro subscription to farm challenges in a friendly competition, then it was other teams slowly but surely catching on. I even experimented a bit myself, building my own harness called tertullian. Overnight, an arms race started.
At first, challenge authors could keep up. AI solved the beginner challenges, the ones nobody really cared about. But eventually even the strongest oak must fall: AI models could soon solve close to 80% of the challenges in a typical CTF. Agentic tools exploded onto the scene. Suddenly, AI companies made hacking a benchmark, and CTF challenges became training material. Most CTF challenges can now be solved in half an hour with the right prompt. Success no longer correlates with a team's problem-solving skills, but with the number of tokens they can afford.
Authors responded by adding a layer of obscurity. For instance, they might remove whitespace and rename the variables of the above PHP script in an attempt to confuse the AI. I'm personally not fond of this style of challenge, but reading lightly obfuscated code is still a skill, so I don't object to it.
The problem is not bad CTF challenges; those have existed since the dawn of CTFs. The problem is the transformation of challenges with excellent educational value into bad ones.
The breakup
Teams leaning on AI recognised this trend too, and models started to get better at solving these obfuscated challenges. Authors had to adapt again, this time by adding another level of indirection.
The straw that broke the camel's back for me was when I encountered this challenge: instead of just putting your SQLi payload into the application, you first had to XOR-encrypt it with the challenge author's name and a value posted in the Discord. Other authors pile on further guessing layers that reward no skill at all. Luck becomes the defining factor, or the ability to run 40 agents against a single challenge to find the top-secret method the string is encoded in.
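For the curious, the encoding layer described above amounts to something like the following sketch. This is my reconstruction, not the actual challenge code, and the key is a made-up placeholder (the real one combined the author's name with a value from Discord):

```python
def xor_encode(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))

payload = b"' OR 1=1 --"          # the actual SQLi payload
key = b"authorname1337"           # hypothetical: real key was hidden in Discord

encoded = xor_encode(payload, key)

# XOR is its own inverse, so the same function decodes:
assert xor_encode(encoded, key) == payload
```

Nothing about this layer is hard once you know the key; the only "skill" being tested is whether you happened to read the right Discord message.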
Could we get back together again?
The real problem is not that AI can solve challenges, but that its presence changes what challenge authors design for.
This is an incredibly hard problem to solve. How do you reward teams who use human effort, while keeping challenges fun? It is not realistic to try to ban AI usage in CTFs. How do you showcase something cool without an AI tool bypassing the learning opportunity? How do you satisfy competitive people (like myself) who want to compete, but can't compete against a machine that automatically solves your opponents' challenges for them?
These are my unanswered questions. For now, until we have a proper solution, I'm retiring from CTF competitions, with the exception of small local events and private competitions where I trust people not to spoil the fun.